Companies developing software systems for internal use or software products for sale need to ensure compliance with personal data protection norms, such as the European General Data Protection Regulation (GDPR), which was adopted in 2016. Providing such assurance remains challenging, though, because of the gaps between the legal and technological aspects of software system compliance, hence the need for an interdisciplinary perspective. In this blog post, we elaborate on those challenges and focus specifically on the cornerstone notion of ‘anonymous data’.
Regulation with a tall order
Challenges surrounding the compliance of software systems largely stem from three key issues rooted in the very nature of GDPR: (1) its goal-based nature, which makes its norms vague and abstract from a software engineering perspective; (2) its focus on organisations rather than systems; and (3) last but not least, the close interrelation between non-technical organisational measures and technical compliance measures. These aspects make some practitioners question whether it is possible at all to demonstrate unambiguously that a given software system is GDPR-compliant, considering that the regulation recommends rather than requires producers of software systems to take personal data protection norms into account (recital 78). Furthermore, while GDPR highlights an opportunity for software certification, the development of such certification faces the same problems. So, what exactly makes it so challenging to develop software systems that ‘comply’ with GDPR provisions?
The traditional approach to complying with GDPR is to conduct regulatory requirements engineering as part of the software engineering process. In this approach, regulatory norms are translated into requirements describing the properties that a software-intensive product and its development process must have in order to comply with the norms. Yet, such an approach is labour-intensive and error-prone as, again, different stakeholders may interpret GDPR provisions differently. A potential way to implement the ‘technical measures’ for data privacy compliance mandated in the GDPR text is to use Privacy-Enhancing Technologies (PETs). These are methods or solutions that can help to minimise the effort required from software engineers to achieve compliance, for example, by anonymising data while preserving as much of its utility as possible. Of particular interest in this context is the notion of Differential Privacy (DP), which provides a mathematically grounded definition of privacy with respect to an individual’s data. It also quantifies privacy loss and thus makes it possible to control the trade-off between privacy and utility. Yet, both of these approaches encounter the problems of vagueness and abstractness caused by GDPR’s goal-based nature. Furthermore, their applicability in the compliance process is unclear.
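To make the idea of a quantified privacy loss more tangible, the following is a minimal sketch of the Laplace mechanism, one standard way of achieving ε-differential privacy for a counting query. The function names, the sample records and the chosen value of ε are illustrative assumptions and are not taken from GDPR or from any particular library.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-mean Laplace distribution.

    The difference of two independent Exp(1) samples follows Laplace(0, 1),
    so multiplying it by `scale` yields Laplace(0, scale).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records matching `predicate`.

    A counting query has sensitivity 1: adding or removing one person
    changes the true count by at most 1. Laplace noise with scale
    1 / epsilon therefore gives epsilon-differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    people = [{"age": age} for age in (25, 41, 67, 70, 33)]
    rng = random.Random()
    # Smaller epsilon means more noise: stronger privacy, lower utility.
    for epsilon in (0.1, 1.0, 10.0):
        noisy = dp_count(people, lambda p: p["age"] > 65, epsilon, rng)
        print(f"epsilon={epsilon}: noisy count = {noisy:.2f}")
```

The parameter ε is what makes the privacy/utility trade-off explicit. Which value of ε, if any, would make the released figure ‘anonymous’ in the legal sense is a question the sketch cannot answer, and it is exactly the kind of question that GDPR’s goal-based nature leaves open.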
This goal-based nature is reflected in the fact that GDPR establishes high-level goals rather than detailed implementable measures and procedures which are sufficiently concrete from a software engineering perspective. Some may claim that this feature of GDPR stems from its status as a high-level pan-European regulation. By contrast, we consider it to be a regulatory approach that is characteristic of personal data protection overall nowadays. For example, the Standard Data Protection Model (SDM), which was designed to ‘translate regulatory norms’, breaks those norms down into particular sub-goals rather than into concrete technical measures or procedures which could be unambiguously translated into (testable) requirements. While the motivation for a goal-based approach can be to provide flexibility for software engineers, we contend that in its current form it creates legal uncertainty in software engineering independently of the applied compliance methods. To demonstrate the issue, we use one of the core concepts of personal data legislation: ‘anonymous data’, together with ‘anonymisation’ as the process of transforming personal data into anonymous data.
Anonymous data: a concept to find
According to regulators, the processing of anonymous data does not pose any threat to data subjects, hence data protection rules should not apply to anonymous data. Accordingly, anonymising data or collecting anonymous data by design promises to be a suitable strategy for software developers to minimise the need for compliance procedures and mitigate legal risks in advance. At first glance, such anonymisation has a straightforward technical implementation: removing any personal identifiers and preventing their further processing. This way, anonymisation should still allow the processing of large data sets required for the regular operation of software systems without additional regulatory burdens.
The legal perspective on anonymisation is, though, far broader as it is driven by the goal of protecting personal data independently of where and how it is processed. Hence, regulators perceive data as anonymous only when it can be related neither to a person already identified nor to an identifiable person. ‘Identifiability’ as a feature qualifying data as personal requires not only the removal of identifiers but also the prevention of re-identification, of singling out an individual, of linking records relating to an individual and of inferring information concerning an individual. While such a broad approach to identifiability seems feasible and rational from a legal perspective, it remains too vague and abstract for effective implementation in software engineering and thus creates legal uncertainty.
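The gap between removing identifiers and preventing singling out can be illustrated with a small sketch. It strips direct identifiers from a few records and then checks whether any individual can still be singled out through a unique combination of the remaining attributes. The field names, the sample records and the choice of which attributes an attacker might know are hypothetical assumptions made for illustration.

```python
from collections import Counter

# Assumed split of attributes; in practice this split is itself a judgment call.
DIRECT_IDENTIFIERS = {"name", "email"}
INDIRECT_ATTRIBUTES = ("postcode", "birth_date", "gender")


def strip_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record without the direct identifiers."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def group_sizes(records, attributes=INDIRECT_ATTRIBUTES) -> Counter:
    """Count how many records share each combination of the given attributes."""
    return Counter(tuple(r[a] for a in attributes) for r in records)


def singled_out(records, attributes=INDIRECT_ATTRIBUTES) -> list:
    """Return the records whose attribute combination is unique in the data set."""
    sizes = group_sizes(records, attributes)
    return [r for r in records if sizes[tuple(r[a] for a in attributes)] == 1]


# Hypothetical records, for illustration only.
records = [
    {"name": "A. Example", "email": "a@example.org", "postcode": "80333",
     "birth_date": "1985-02-11", "gender": "f", "diagnosis": "asthma"},
    {"name": "B. Example", "email": "b@example.org", "postcode": "80333",
     "birth_date": "1985-02-11", "gender": "f", "diagnosis": "diabetes"},
    {"name": "C. Example", "email": "c@example.org", "postcode": "80469",
     "birth_date": "1990-07-30", "gender": "m", "diagnosis": "migraine"},
]

stripped = [strip_direct_identifiers(r) for r in records]
unique_records = singled_out(stripped)  # records that can still be singled out
smallest_group = min(group_sizes(stripped).values())
```

In this sample, the third record remains unique after its name and e-mail address have been removed, so the person behind it can still be singled out. A check like this covers only one of the criteria listed above and says nothing about inference or about data held by other parties.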
Legal Uncertainty or Uncertain Legality?
GDPR itself does not provide a detailed perspective on the notion of anonymous data or anonymisation. However, this gap is filled by the pre-GDPR Opinion 05/2014 on Anonymisation Techniques by the Article 29 Data Protection Working Party. Back in 2014, this opinion recognised not only that anonymisation and re-identification are active fields of research with new discoveries emerging regularly, but also that anonymisation-related risks should be reassessed regularly by data controllers.
Clearly, as volumes of available data grow rapidly and re-identification attacks become more sophisticated, re-identification is becoming feasible in ever more settings, which leads to a rapidly growing list of mismatches with privacy expectations. Some research shows that re-identification can be achieved using publicly available data sets only. For example, the use of triples of quasi-identifiers (post code, birth date and gender) nowadays makes it possible to achieve notable success in re-identification attacks.
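A linkage attack of this kind can be sketched in a few lines. The example joins a released data set without names against a public data set with names, using the quasi-identifier triple mentioned above. Both data sets, all field names and the function name are hypothetical; real attacks typically have to cope with noisy and incomplete data.

```python
QUASI_IDENTIFIERS = ("postcode", "birth_date", "gender")


def link_records(released, public, quasi_identifiers=QUASI_IDENTIFIERS):
    """Join two data sets on quasi-identifiers.

    Returns (name, released_record) pairs for every released record that
    matches exactly one person in the public data set.
    """
    # Index the public data set by its quasi-identifier combination.
    index = {}
    for person in public:
        key = tuple(person[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(person)

    matches = []
    for record in released:
        key = tuple(record[q] for q in quasi_identifiers)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # unambiguous match: the record is re-identified
            matches.append((candidates[0]["name"], record))
    return matches


# Hypothetical "anonymised" release: direct identifiers have been removed.
released = [
    {"postcode": "80333", "birth_date": "1985-02-11", "gender": "f", "diagnosis": "asthma"},
    {"postcode": "80469", "birth_date": "1990-07-30", "gender": "m", "diagnosis": "migraine"},
]

# Hypothetical publicly available data set containing names.
public = [
    {"name": "A. Example", "postcode": "80333", "birth_date": "1985-02-11", "gender": "f"},
    {"name": "B. Example", "postcode": "80333", "birth_date": "1985-02-11", "gender": "f"},
    {"name": "C. Example", "postcode": "80469", "birth_date": "1990-07-30", "gender": "m"},
]

reidentified = link_records(released, public)
```

Here the second released record matches exactly one person in the public data and is re-identified, while the first stays ambiguous because two people share the same triple. Whether a release is safe therefore depends on data that the releasing party neither holds nor controls.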
In this context, it seems unrealistic to expect data processors to re-assess their re-identification risks on the basis of new research results. Firstly and most importantly, there is no clear guidance on when, how and by whom such risks need to be re-assessed. Secondly, it seems improbable that, without any clear guidance or coordination, software developers will be able to assess such recent research results and establish data subjects’ ‘identifiability’ in a consistent manner. Such a state of regulation not only creates barriers to both software engineering and the adoption of PETs, but also undermines the consistent enforcement of personal data protection regulations.
Interdisciplinary ‘middle ground’ to address compliance
There is thus a tension between the clear, prescriptive definition of anonymous data that software engineering requires and the broad, widely applicable approach to anonymous data that better serves regulatory purposes. The solution clearly lies somewhere between the two.
On the regulatory side, it is sometimes necessary to rethink the protection of data subjects and to make the context that is relevant for identifiability more concrete. For example, this could be achieved by considering identifiability only within the context of a data processing organisation or of an ecosystem of data processors exchanging the data. Such relevant factors can be generalised by data protection supervisory authorities and published in guidelines, recommendations and best practices.
On the software engineering side, it is becoming increasingly important to promote interpretation of ‘anonymous data’ as a concept in each individual case. That requires capturing the broader social context of a system (e.g., number and type of data exchange partners) in combination with the technical implementation details (e.g., available data processing capacities). In our experience, this can be well supported by applying a concept known as ‘artefact-orientation’ in requirements engineering. When modelling artefacts as a reference model for engineers (to be used as a guiding blueprint for specifying requirements), we may capture different layers of information relevant to software engineering. Here, we can define sources of the relevant information (e.g., inter-organisational contracts, user stories, data models) and, thus, raise engineers’ awareness of relevant content and relationships. Furthermore, this allows for the incorporation of different levels of abstraction in the contents and facilitates reasoning about ‘anonymous data’ in each particular context as well as the achievement of personal data protection compliance.
We argue that effective implementation of software systems’ regulatory compliance with GDPR requirements should be considered from interdisciplinary positions, integrating both legal and engineering perspectives. That makes future research on interdisciplinarity in research and practice crucial for advancements in this field.
The blogs published by the bidt represent the views of the authors; they do not reflect the position of the Institute as a whole.