New Pseudonymisation Data Security and Privacy Requirements Apply Globally
-- by Arthur J Musgrove
Recent data security and privacy obligations from the EU apply to companies in the US and indeed worldwide.
The European Union (EU) General Data Protection Regulation (GDPR) 2016/679 places substantial technical obligations for privacy and data protection on both data controllers and, in a major change from the prior Data Protection Directive 95/46/EC, on data processors.
These obligations apply to any organisation that is either established in the EU or processes personal data of anyone residing in the EU, whether or not the organisation has a presence in the EU or the data subject is an EU citizen.
Failure to comply with the obligations can result in dramatic fines of up to the greater of €20m or 4% of worldwide annual turnover.
One of these obligations that is causing much confusion, in both understanding and implementation, is the requirement for pseudonymisation.
Pseudonymisation is a process of 'de-associating' a data subject's Personally Identifiable Information (PII) from the data being processed for that data subject. By way of example, this might mean:
In an insurance system, de-associating the identity of a policyholder from their claim history.
In an online shopping system, de-associating the identity of customers from their orders.
In an online discussion forum, de-associating the identity of the participants from their posts.
To begin to understand why this may be complex, you need to consider what Personally Identifiable Information (PII) is.
The definition varies somewhat across jurisdictions, but in all cases it means information that can be used individually or in combination with other information to identify an individual.
You probably immediately think of a name, a National Insurance number (UK) or a social security number (US).
But it goes far beyond that.
Neither a postcode (or ZIP+4 in the US) nor a date of birth will identify an individual on its own, but in combination the two are very likely to identify a specific individual.
Add in any other piece of information, such as gender, and it is a virtual certainty that the individual can be identified.
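To make the combination effect concrete, here is a minimal Python sketch that measures how many records in a data set become unique as more fields are combined. The records, field names and the unique_fraction helper are all invented for illustration; real data sets will behave differently, but the trend is typically the same.

```python
from collections import Counter

# Hypothetical records: no names at all, only quasi-identifiers.
records = [
    {"postcode": "AB1 2CD", "dob": "1980-03-14", "gender": "F"},
    {"postcode": "AB1 2CD", "dob": "1980-03-14", "gender": "M"},
    {"postcode": "AB1 2CD", "dob": "1975-11-02", "gender": "F"},
    {"postcode": "EF3 4GH", "dob": "1980-03-14", "gender": "F"},
    {"postcode": "EF3 4GH", "dob": "1992-07-30", "gender": "M"},
]


def unique_fraction(rows, fields):
    """Return the fraction of rows whose combination of `fields` is unique."""
    combos = Counter(tuple(row[f] for f in fields) for row in rows)
    unique = sum(1 for row in rows if combos[tuple(row[f] for f in fields)] == 1)
    return unique / len(rows)


if __name__ == "__main__":
    for fields in (["postcode"], ["dob"], ["postcode", "dob"],
                   ["postcode", "dob", "gender"]):
        print(fields, unique_fraction(records, fields))
```

In this toy data set no record is unique by postcode alone, but every record is unique once postcode, date of birth and gender are combined, even though no name is stored anywhere.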
One challenge for system architects, designers and operators that has no clear answer is log files.
For instance, a major and often overlooked piece of PII is the IP address. An IP address is often a long-lived assignment, and given that address it is often trivial to associate other records with an individual.
Given the usefulness of directly associating an IP address with online activity, it is not immediately apparent how to de-associate it in the context of the GDPR. There are of course many other types of PII that might 'leak' into log files.
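One possible approach, offered here only as a sketch and not as a statement of what the GDPR requires, is to replace each IP address with a keyed hash before the log line is written. The same address then always maps to the same token, so activity can still be correlated, but the token cannot be turned back into an address without the key. The key value, the function names and the simple IPv4 pattern below are assumptions made for illustration; IPv6 addresses and other leaked PII would need their own handling.

```python
import hashlib
import hmac
import re

# Assumption: in a real system this key would be loaded from a secrets store
# that is kept separate from the logs. It is hard-coded here for illustration.
SECRET_KEY = b"example-key-kept-outside-the-log-store"

# Simple IPv4 matcher; it will also match other dotted quads such as
# four-part version numbers, so treat it as a starting point only.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonymise_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable token for `ip`, derived with HMAC-SHA256."""
    digest = hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()
    return "ip-" + digest[:16]


def scrub_log_line(line: str, key: bytes = SECRET_KEY) -> str:
    """Replace every IPv4 address in `line` with its pseudonym."""
    return IPV4_PATTERN.sub(lambda m: pseudonymise_ip(m.group(0), key), line)


if __name__ == "__main__":
    print(scrub_log_line('203.0.113.7 - - "GET /orders/42" 200'))
```

A keyed hash is used because the IPv4 address space is small enough that an unkeyed hash could be reversed by simply hashing every possible address. Whoever holds the key can still re-link tokens to addresses, so the output is pseudonymised, not anonymised.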
Before continuing, you should understand core definitions in this area.
When talking about pseudonymisation, we refer to Personally Identifiable Information.
According to the GDPR Article 4(1), an identifiable natural person is "one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person" (emphasis added).
The GDPR Article 4(5) defines pseudonymisation as "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person".
Another useful definition of pseudonymisation comes from NIST IR 8053, which describes it as a "particular type of anonymization that both removes the association with a data subject and adds an association between a particular set of characteristics relating to the data subject and one or more pseudonyms".
Anonymisation versus Pseudonymisation
For software development professionals working on data privacy, it is important to understand the distinction between anonymisation and pseudonymisation.
Most software developers are familiar with anonymisation in the context of test data: the test data we use is derived from real user data that has been transformed in such a way as to permanently remove the identification of the real user.
Anonymisation differs from pseudonymisation in that, once data is anonymised, the identity of the original, real user can never be recovered. With pseudonymisation, anyone who obtains the association keys can re-associate the data with the real users.
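The difference can be sketched in a few lines of Python. The record layout and the function names (pseudonymise, anonymise, reidentify) are hypothetical and chosen only to illustrate the distinction; note also that simply dropping a name field does not make data anonymous if other identifying fields remain.

```python
import secrets


def pseudonymise(records, id_field="customer"):
    """Replace `id_field` with a random pseudonym; return (data, key_table)."""
    key_table = {}  # identity -> pseudonym: the 'additional information'
    data = []
    for record in records:
        identity = record[id_field]
        if identity not in key_table:
            key_table[identity] = secrets.token_hex(8)
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["pseudonym"] = key_table[identity]
        data.append(cleaned)
    return data, key_table


def anonymise(records, id_field="customer"):
    """Drop `id_field` and keep no mapping, so there is nothing to reverse."""
    return [{k: v for k, v in r.items() if k != id_field} for r in records]


def reidentify(data, key_table, id_field="customer"):
    """Re-associate pseudonymised rows with identities using the key table."""
    reverse = {pseudonym: identity for identity, pseudonym in key_table.items()}
    return [{**row, id_field: reverse[row["pseudonym"]]} for row in data]


if __name__ == "__main__":
    orders = [
        {"customer": "Alice Example", "item": "kettle"},
        {"customer": "Bob Example", "item": "toaster"},
        {"customer": "Alice Example", "item": "teapot"},
    ]
    data, key_table = pseudonymise(orders)
    print(data)                         # no names, but rows are still linkable
    print(reidentify(data, key_table))  # names recovered via the key table
    print(anonymise(orders))            # no names and no way back
```

The key table is the piece that must be stored separately and protected: with it the pseudonymised rows can be traced back to real people, and without it they cannot.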
Implications for System Design
Pseudonymisation must be built into the system right from design time.
In fact, Article 25 of the GDPR requires that data protection be "by design and by default".
The most expedient method of compliance may be to segregate all PII from operational data and only associate the two with an association key of some sort, but that may not be the most effective.
One consideration is that PII is sometimes useful for analytic processing.
Think of the example of locations and dates of birth.
One way of handling this that complies with the GDPR and still enables analytic processing is to remove precision.
For instance, you could keep only the first three characters of a postcode, or in the case of ZIP codes drop the +4 suffix, and convert the date of birth into an age range.
This gives a data set still useful for analytic processing while obfuscating the data association.
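Removing precision in this way might look like the following Python sketch. The function names, the three-character prefix and the ten-year bands are illustrative assumptions; the right level of coarseness depends on the data set and on how many people fall into each resulting group.

```python
from datetime import date


def generalise_postcode(postcode: str, keep: int = 3) -> str:
    """Keep only the first `keep` characters of a postcode, ignoring spaces."""
    return postcode.replace(" ", "").upper()[:keep]


def generalise_zip(zip_code: str) -> str:
    """Drop the +4 suffix from a ZIP+4 code, leaving the 5-digit ZIP."""
    return zip_code.split("-")[0]


def age_range(date_of_birth: date, today: date, width: int = 10) -> str:
    """Convert a date of birth into an age band such as '30-39'."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    age = today.year - date_of_birth.year - (0 if had_birthday else 1)
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


if __name__ == "__main__":
    print(generalise_postcode("ab1 2cd"))                      # AB1
    print(generalise_zip("12345-6789"))                        # 12345
    print(age_range(date(1984, 6, 1), today=date(2024, 5, 31)))  # 30-39
```

The coarser values still support questions such as sales by region or by age band, while many individuals now share each combination of values.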
GDPR Requirements as Good Design
The GDPR in general, and pseudonymisation in particular, has developed a reputation for being onerous. It shouldn't have.
It really represents current best practice with regard to data privacy and codifies what most organisations should already be doing.
In particular, the Article 25 requirement of privacy "by design and by default" is simply good practice that organisations really should do whether or not it is a regulatory requirement.
Since it is, this good advice is now a must, and is backed up by quite severe penalties.
The following references will be useful for further details.
All views expressed in this article are my own and do not represent the
opinions of any other entity whatsoever with which I have been, am now
or will ever be affiliated. No assurance of accuracy is given and
any use of any information provided is entirely at your own risk. The
author assumes no responsibility or liability for any errors or
omissions in the content of this article. No information provided is intended
to be a source of investment advice or credit analysis with respect to
any material presented or otherwise. Nothing contained in this article
is intended to defame or harm any person, business or other entity.
The author retains sole and exclusive
ownership of all material herein.