Is 'Hashed' Data Anonymized Data?
To determine whether hashed data is anonymized data, it is useful first to examine the concept of hashing. Hashing is the process of taking an input of any length and converting it, by means of an algorithm, into a fixed-length output. It is often loosely described as "one-way encryption", but unlike encryption it has no key that reverses the operation: a hash function is designed so that the input cannot be computed back from the output. The value obtained after this operation is called the "hash value".
The hash value depends on the algorithm used. In addition, replacing a single letter of the input with another, or even changing a letter from uppercase to lowercase, produces a completely different hash value.
For example, hashing the words "KPVeri" and "kpveri" with the SHA256 and SHA1 algorithms yields four entirely different values, even though the two words differ only in letter case.
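The effect described above can be reproduced with a short Python sketch that uses only the standard hashlib module. The variable names are illustrative and the digests are printed rather than quoted here; the point is simply that each combination of word and algorithm gives a different value.

```python
import hashlib

# The two example words differ only in letter case.
words = ["KPVeri", "kpveri"]
algorithms = ["sha256", "sha1"]

# Compute the hex digest for every (word, algorithm) combination.
digests = {}
for word in words:
    for name in algorithms:
        data = word.encode("utf-8")  # hash functions operate on bytes
        digests[(word, name)] = hashlib.new(name, data).hexdigest()

for (word, name), value in digests.items():
    print(f"{name:>6}  {word}: {value}")
```

A SHA256 digest is always 64 hexadecimal characters and a SHA1 digest always 40, regardless of the length of the input.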
So, when data or a data set is hashed using various algorithms, do the resulting hash values always qualify as anonymized data?
In Turkish legislation, hashing is not defined in any law or secondary regulation, including Law No. 6698 on the Protection of Personal Data ("KVKK"). Nor is the method directly regulated in international privacy legislation, not even in the European Union General Data Protection Regulation ("GDPR"), which is widely regarded as the most advanced privacy legislation today. In this sense, hashing is a technical matter rather than a subject of legislation.
The definitions article of the KVKK defines personal data as "any information relating to an identified or identifiable natural person". Anonymization, on the other hand, is the process of making it impossible to associate personal data with an identified or identifiable natural person under any circumstances, even when the data is matched with other data.
On the basis of these definitions, some commentators in the doctrine consider hashed data to be personal data, while others consider it to be anonymized data. To reach a conclusion, it is therefore useful to review the decisions and publications of international authorities:
In its guidance on hashing, the Spanish Data Protection Authority refers to Opinion 05/2014 on Anonymisation Techniques (issued by the Article 29 Working Party), which treats hashing as a method of de-identification. According to the guidance, the use of hashing to pseudonymize or anonymize personal data must be justified by a re-identification risk analysis of the specific hashing method used in the data processing. Such a risk analysis must cover both the hashing process and the other elements that make up the hashing method, with priority given to information that is or may be linked to the value represented by the hash. The analysis should result in an objective assessment of the probability of "resolvability" in the long run, that is, of the likelihood and risk that the hashed values can be traced back to the original data over time.
Some international authorities may accept hashing as an anonymization technique if certain conditions are met. According to the guidance of the Spanish Data Protection Authority, the risk assessment must show that the following requirements are satisfied for hashing to be accepted as an anonymization method:
Taking the necessary administrative measures to guarantee the elimination of all information that would allow the data to be re-identified,
Guaranteeing system security beyond the expected lifetime of personal data.
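To make the first requirement more concrete, the following is a minimal Python sketch of one possible approach, a keyed hash (HMAC) from the standard library. It is an assumption made for illustration and is not taken from the guidance: the idea is that the secret key is part of the "information that allows re-identification", so a party that never holds the key cannot reproduce the hash even if it guesses the original value. The function name keyed_hash and the sample e-mail address are hypothetical. Whether eliminating such a key satisfies an authority's criteria in a given case remains a legal assessment that the sketch does not settle.

```python
import hashlib
import hmac
import secrets

def keyed_hash(value: str, key: bytes) -> str:
    """Return the HMAC-SHA256 hex digest of value under a secret key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Secret key known only to the party that performs the hashing (illustrative).
key = secrets.token_bytes(32)
token = keyed_hash("alice@example.com", key)

# A third party that guesses the correct input but lacks the key
# cannot reproduce the token with a plain hash or with a different key.
plain_guess = hashlib.sha256("alice@example.com".encode("utf-8")).hexdigest()
wrong_key_guess = keyed_hash("alice@example.com", secrets.token_bytes(32))

print(token == plain_guess)      # False
print(token == wrong_key_guess)  # False
```

As long as the key exists, its holder can still recompute the token from a known input, which is why the sketch speaks to pseudonymization rather than to anonymization on its own.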
In light of these criteria, it would not be correct to conclude categorically either that all hashed data retains its character as personal data or that all hashed data loses that character and becomes anonymous. The issue is highly controversial, and it is more accurate to assess each concrete case on its own facts. The assessment should focus on "resolvability", that is, on whether the hashed data can be traced back to a person.
For example, suppose a data set is hashed and transferred abroad. If the recipient abroad (whether a data controller or a data processor) can resolve the hashed data by matching it against any other data, the data retains its character as personal data. If, however, the recipient has no data to match against and therefore cannot resolve the hashed data, the data can be considered to have lost its character as personal data and to have become anonymized.
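The matching scenario in this example can be sketched in a few lines of Python. The e-mail addresses, variable names and the choice of unsalted SHA256 are all hypothetical and serve only to illustrate the mechanism: a recipient that already holds candidate values can hash them itself and compare the results with the hashes it received.

```python
import hashlib

def sha256_hex(value: str) -> str:
    """Return the SHA256 hex digest of a string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Hypothetical data set transferred abroad: only the hashes are shared.
shared_hashes = [sha256_hex(v) for v in ["alice@example.com", "bob@example.com"]]

# Hypothetical records the recipient already holds in its own systems.
recipient_records = ["carol@example.com", "alice@example.com"]

# The recipient hashes its own records and looks for matches.
lookup = {sha256_hex(record): record for record in recipient_records}
reidentified = [lookup[h] for h in shared_hashes if h in lookup]

print(reidentified)  # ['alice@example.com']
```

The hash of "bob@example.com" stays unresolved only because the recipient holds no matching record, which is the distinction the example above draws.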
A German court (VG Bayreuth, Beschluss v. 08.05.2018 - B 1 S 18.105) likewise assessed the hashing method in the concrete case. It found that the hashed data Facebook received from third-party data controllers could correspond to data already stored in Facebook's database, so that Facebook could identify the data despite the hashing, and it concluded that the hashed data had not become anonymized.
In its decision on the subject (Summary of Decision dated May 20, 2020 and numbered 2020/404), the Personal Data Protection Authority ("Authority") emphasized that biometric data does not lose its character as biometric data when it is hashed. The decision is correct, but the reasoning behind it deserves attention. In the concrete case, only a single piece of data (the biometric data) was hashed, and the hashed data could be matched with the biometric data held in the data controller's own systems. Because the original biometric data therefore remained accessible by resolving the hash, the Authority concluded that the data had not become anonymized.
Consequently, whether hashed data loses its personal data status should be evaluated on a case-by-case basis:
If the hashed data can be resolved back to a person, it remains personal data. In that case, hashing is more accurately regarded as a method of pseudonymization rather than anonymization, and serves only as a security measure for the personal data.
Hashing can be regarded as a method of anonymization if all data that would allow the hashed data to be resolved is guaranteed to be removed, all necessary measures are taken to that end, and system security is technically guaranteed beyond the expected lifetime of the personal data, so that no possibility of resolving the data remains.