Rationale for De-identification

This package provides methods for handling personally identifiable features in data sets, so it is worth addressing why this is useful and necessary.

‘Personal data’ is information that relates to an identified or identifiable individual[1]. In the United Kingdom, the use of personal data is governed by the Data Protection Act 2018[2], with the Information Commissioner’s Office (ICO) acting as the independent organisation that upholds information rights. Data protection laws essentially protect a person’s right to privacy, which is considered a fundamental human right in many places around the world[3–5]. The misuse of personal data can harm individuals, for example through identity theft, discrimination or physical harm.

In the UK and across Europe, the use, or processing, of personal data must have one of six lawful bases. These bases are predicated on the personal data being necessary for a specific purpose; essentially, you cannot collect and use personal data simply because it might prove helpful at some point in the future.

The law in the UK recognizes the importance of research and that some aspects of data protection law could compromise its integrity. For example, research is exempt from the article of UK GDPR that gives data subjects the right to have their personal data erased. This is particularly pertinent to medical research, where removing data could skew the results and make a drug or treatment look better or worse than it really is.

Even though there are research exemptions, the first port of call for the collection or processing of personal data is to ascertain whether non-identifying or anonymous data can be collected instead. Hence, researchers need a simple, reliable and transparent methodology for the anonymization of data sets. Notably, because routines can be scripted and shared, this package lets researchers design their anonymization pipeline and share it with the data controllers, with the added benefit of no proprietary licensing costs.
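To make the idea of a scripted, shareable pipeline concrete, the following is a minimal sketch in Python using only the standard library. It is not this package's interface: the function names (drop, pseudonymize, band, run_pipeline), the field names and the key are all hypothetical and chosen for illustration. Note also that replacing an identifier with a keyed hash is pseudonymization rather than full anonymization, since whoever holds the key can reproduce the mapping.

```python
import hashlib
import hmac


def drop(field):
    """Return a step that removes a field entirely (hypothetical helper)."""
    def step(record):
        return {k: v for k, v in record.items() if k != field}
    return step


def pseudonymize(field, key):
    """Return a step that replaces a field with a truncated HMAC-SHA256 digest."""
    def step(record):
        out = dict(record)
        digest = hmac.new(key, str(record[field]).encode("utf-8"), hashlib.sha256)
        out[field] = digest.hexdigest()[:16]
        return out
    return step


def band(field, width):
    """Return a step that replaces an integer with a range label such as '30-39'."""
    def step(record):
        out = dict(record)
        low = (record[field] // width) * width
        out[field] = f"{low}-{low + width - 1}"
        return out
    return step


def run_pipeline(records, steps):
    """Apply every step, in order, to every record and return new records."""
    result = []
    for record in records:
        for step in steps:
            record = step(record)
        result.append(record)
    return result


# Assumption: the key is a secret held by the data controller, not the researcher.
SECRET_KEY = b"replace-with-a-secret-held-by-the-data-controller"

# The pipeline is plain data, so it can be reviewed and shared before any data moves.
PIPELINE = [drop("name"), pseudonymize("patient_id", SECRET_KEY), band("age", 10)]

records = [
    {"name": "A. Example", "patient_id": "P001", "age": 34, "outcome": "improved"},
    {"name": "B. Example", "patient_id": "P002", "age": 58, "outcome": "no change"},
]
deidentified = run_pipeline(records, PIPELINE)
```

Because the pipeline is an ordered list of small steps, a data controller can inspect exactly which fields are removed, replaced or coarsened before agreeing to run it on the real data.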

[1]

Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data (United Kingdom General Data Protection Regulation). Chapter 1; Article 4: Definitions. 2016.

[2]

Data Protection Act 2018. 2018.

[3]

Diggelmann O, Cleis MN. How the Right to Privacy Became a Human Right. Human Rights Law Review 2014;14:441–58. https://doi.org/10.1093/hrlr/ngu014.

[4]

Rights UNH. International Covenant on Civil and Political Rights. Article 17. Human Rights 2007.

[5]

Krotoszynski RJ. Privacy revisited: A global perspective on the right to be left alone. Oxford University Press; 2016.