The Data Protection Commissioner ("DPC") recently published guidance on the use of data anonymisation and pseudonymisation techniques. In our last blog, we examined these concepts and some of the key points in the DPC's guidance. We focused, in particular, on the difficulties in implementing these techniques and the scope of what is considered to be personal data.
In this blog, we take a closer look at some of the anonymisation techniques referenced in the DPC's guidance. We also consider the data protection law obligations arising for organisations wishing to anonymise or pseudonymise certain data sets.
The Data Protection Acts 1988 and 2003 (the "Acts") do not explicitly recognise the concepts of data anonymisation and pseudonymisation, meaning there is no prescriptive standard of anonymisation under Irish law. Consequently, an organisation hoping to employ anonymisation techniques will have to decide, on a case-by-case basis, what techniques to use to sufficiently anonymise data sets. In this regard, the DPC's guidance note is useful for organisations wanting to assess what techniques (or combination of techniques) they should use.
The DPC's guidance note discusses the two main forms of anonymisation, namely randomisation and generalisation.
Randomisation techniques involve altering personal data in order to remove the link between the data and the individual. There are a host of randomisation techniques available including "noise addition" and "permutation".
Noise addition (or "noise injection") involves the addition of random values to personal data to reduce the risk that an individual can be identified from the data. For example, in a database which records the height of individuals, each individual's height could be increased, or decreased, by a small amount. The recorded height can then be stated to be accurate only within a certain range, such as +/-10cm.
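By way of illustration, the height example could be sketched in Python along the following lines. This is a minimal sketch only: the function name `add_noise`, the sample heights and the use of a uniform distribution are our own assumptions and are not taken from the DPC's guidance.

```python
import random


def add_noise(heights_cm, max_shift_cm=10.0, seed=None):
    """Return a copy of the heights, each shifted by a random amount.

    Each value is moved up or down by at most max_shift_cm, so the
    output is accurate only to within +/- max_shift_cm.
    The uniform distribution is an illustrative assumption.
    """
    rng = random.Random(seed)
    return [h + rng.uniform(-max_shift_cm, max_shift_cm) for h in heights_cm]


# Hypothetical sample data (heights in centimetres).
heights = [162.0, 175.5, 181.0, 158.5]
noisy_heights = add_noise(heights, max_shift_cm=10.0, seed=42)
```

A fixed `seed` is used here only so that the example is repeatable. In practice, whether a given range of noise is sufficient to prevent identification would need to be assessed on a case-by-case basis.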
Permutation, on the other hand, involves the swapping or shuffling of data between the records of individuals, making it harder to identify a particular individual. For example, a data set containing the height of individuals could be "randomised" by shuffling the height values so that they are no longer connected to other information about the individual. These techniques are useful in reducing the risk of inference and the matching up of data between data sets.
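The shuffling described above could be sketched in Python as follows. Again, this is an illustrative sketch under our own assumptions: the function name `permute_column`, the field names and the sample records are hypothetical and do not come from the DPC's guidance.

```python
import random


def permute_column(records, column, seed=None):
    """Return new records with the values of one column shuffled.

    The set of values in the column is preserved, but each value is
    (in general) no longer attached to the record it came from.
    The input records are left unmodified.
    """
    rng = random.Random(seed)
    values = [record[column] for record in records]
    rng.shuffle(values)
    return [{**record, column: value} for record, value in zip(records, values)]


# Hypothetical sample data.
records = [
    {"id": "A", "county": "Dublin", "height_cm": 162},
    {"id": "B", "county": "Cork", "height_cm": 175},
    {"id": "C", "county": "Galway", "height_cm": 181},
    {"id": "D", "county": "Kerry", "height_cm": 158},
]
shuffled = permute_column(records, "height_cm", seed=7)
```

Note that a random shuffle may leave some values attached to their original records, particularly in small data sets, and that the overall distribution of heights is unchanged. Permutation on its own may therefore not be enough to anonymise a data set.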
Generalisation involves the dilution of identifiers attributable to data subjects so that...