Data Analysis in Internal Compliance Assessments


Because of the large volumes of data involved, forensic data analysis requires a strategic approach. In the future, technical analysis performed by software will reduce the risk of analytical results being distorted by bias. But already today, an analytical team should remain open to a result that deviates from its initial hypothesis.

When potential misconduct by employees (including management) is assessed, it is mainly the data of the employee under suspicion (e-mails, calendar entries, files) that will be analyzed. Because of the continuously growing volume of data, this is primarily a technical and management challenge.

From the perspective of data protection, internal assessments are a rather minor challenge (as far as German law is applicable). It is not the aim of the General Data Protection Regulation or of German data protection law to prevent access to data produced by employees, particularly not if the employee is suspected of a criminal offense. This has been ruled by the Federal Labor Court in its judgment of August 23, 2018 (2 AZR 133/18), with the key phrase: “Data protection shall not prevent prosecution.” Even if data has been generated, stored and processed in violation of the applicable data protection law, this does not prevent the results of such analyses from being used to initiate claims.

The growing volume of data, on the other hand, is a challenge, and it needs to be approached systematically. Initially, the data must be stored in a forensically secure manner in order to prevent any suspicion of manipulation. A complete, untouched copy of all data must be saved and set aside as evidence. Another copy of the data may then be processed. Every file receives a unique hash value; the file can thereafter not be changed without also changing the hash value. This may be compared with the principle of a blockchain, in which every change – to the extent possible at all – is documented. Next, the pure system data, i.e. data that was not generated by the user but automatically by the computer, is filtered out. It is important, however, that the metadata attached to the user-generated files remains within the data set to be assessed. This metadata contains information such as the author, the date of last change, the date of last saving, the file format and so on.
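To illustrate the hashing step described above, here is a minimal Python sketch (the directory and manifest names are purely illustrative, not part of any specific forensic product): it walks a working copy of the secured data, computes a SHA-256 hash for every file and writes a manifest that can later serve as a reference to show that no file has been altered.

```python
import hashlib
import json
from pathlib import Path


def hash_file(path: Path) -> str:
    """Compute the SHA-256 hash of a single file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: str, manifest_path: str) -> None:
    """Hash every file under data_dir and store the results in a manifest."""
    manifest = {
        str(file): hash_file(file)
        for file in Path(data_dir).rglob("*")
        if file.is_file()
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))


# Hypothetical paths -- adjust to the working copy of the secured data.
build_manifest("working_copy", "hash_manifest.json")
```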

The user-generated files may then be analyzed. As each file is tied to a hash value, any manipulation of a file itself can be reconstructed afterwards. Even after the system data has been filtered out, the volume of data to be analyzed, even if generated by only one employee, will comprise several hundred megabytes.
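A later integrity check could then recompute the hashes and flag every file whose current hash no longer matches the recorded one. The following self-contained sketch assumes a manifest like the one produced above; file names are again illustrative.

```python
import hashlib
import json
from pathlib import Path


def verify_manifest(manifest_path: str) -> list[str]:
    """Return all files whose current SHA-256 hash differs from the manifest."""
    manifest = json.loads(Path(manifest_path).read_text())
    changed = []
    for file, recorded_hash in manifest.items():
        current = hashlib.sha256(Path(file).read_bytes()).hexdigest()
        if current != recorded_hash:
            changed.append(file)
    return changed


# Files listed here were altered after the manifest was created.
print(verify_manifest("hash_manifest.json"))
```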

These files may now – if the volume is not too large – be assessed manually. This should be possible particularly if the suspicion points in a specific direction. Any common storage system should allow for keyword searches. Should the data volume be larger, analytical software should be applied. Such software tools are used in particular in document review projects, which typically take place in common-law jurisdictions. These analytical tools generally allow for the application of hash values, the filtering of system files, and the analysis of data by way of keyword searches or even by describing the facts of the case in a longer text. If the latter is applied, the software is able to identify analogous files. The results are then evaluated in several passes by teams of attorneys, and their assessments are fed back into the search software so that the search results improve. This process is called Technology Assisted Review ("TAR"). Applying such analytical tools of course makes the analysis much more efficient.
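As a very rough illustration of how documents can be ranked by their similarity to a textual description of the case facts, the following Python sketch uses TF-IDF vectors and cosine similarity. The scikit-learn library, the directory name and the example query are assumptions for illustration and not part of any specific review product.

```python
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank_by_similarity(doc_dir: str, case_facts: str, top_n: int = 10):
    """Rank text files in doc_dir by similarity to the case description."""
    paths = list(Path(doc_dir).rglob("*.txt"))
    texts = [p.read_text(errors="ignore") for p in paths]

    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(texts + [case_facts])

    # Compare every document vector with the vector of the case description.
    scores = cosine_similarity(matrix[:-1], matrix[-1]).ravel()
    ranked = sorted(zip(paths, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Hypothetical example: describe the suspected conduct in free text.
for path, score in rank_by_similarity("exported_mails", "payments to shell companies"):
    print(f"{score:.3f}  {path}")
```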

Should the result of such an analysis be that no proof was found, it may well be that the initial suspicion is unfounded. Every analytical team should remain open to such a result. The research team should in particular keep an eye out for evidence that may exculpate the employee under suspicion.

If the research team does not stay strictly neutral, the so-called “inertia effect” kicks in. This is a very human phenomenon that weakens the result of the analysis. The inertia effect means that humans, whether intentionally or unintentionally, always seek to confirm decisions they have already taken. The person will rule out potential alternative facts if they contradict the earlier decision. The analyst thus loses his or her impartiality. This puts the whole analytical team at risk of pursuing wrong leads during their assessments. Particularly in internal compliance assessments this should be viewed critically: wrongful suspicions may lead to damage that can never be rectified.

Therefore the analytical team should from time to time change its perspective and reassess the analyzed data critically. For example, if the team has identified multiple communications (via WhatsApp, messenger, e-mail), but none between the supposed members of a “gang”, or if the supposed gang members never communicated digitally about the matter that caused the suspicion, then the belief in the initially stated hypothesis should be weakened. A simple cross-check of this kind is sketched below.
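One way to make such a cross-check explicit is to tabulate who communicated with whom and verify whether any messages exist between the persons under suspicion. The following sketch uses invented names and an invented message log purely for illustration.

```python
from itertools import combinations

# Hypothetical message log: (sender, recipient) pairs extracted from
# e-mails and chat exports.
messages = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
]

suspects = {"alice", "dave"}


def pairs_without_contact(messages, suspects):
    """Return suspect pairs with no direct message in either direction."""
    contacted = {frozenset(pair) for pair in messages}
    return [
        pair for pair in combinations(sorted(suspects), 2)
        if frozenset(pair) not in contacted
    ]


# If every suspect pair appears here, the collusion hypothesis lacks
# digital support and should be re-examined.
print(pairs_without_contact(messages, suspects))
```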

The risk of coming to a wrongful conclusion due to bias could be reduced by applying software that assesses all data without specific targets. The task to be assigned would be: “Analyze these files and describe who did what and when. Describe up to five alternative scenarios.” Such an analysis would become even more realistic if the software had access to additional external data and could thus put whatever happened into a broader context. Of course, an assessment based purely on artificial intelligence can only discover the truth, or several alternative possible truths, if the software has access to all relevant data, i.e. also data resulting from mobile technology (mobile phones, laptops and tablets), including movement data, as well as data from the cloud.
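Such software does not yet exist in the form described, but one small building block of a “who did what when” overview can already be sketched today: a chronological timeline derived from file-system metadata. The directory name is illustrative; real tools would also parse document-internal metadata such as author fields.

```python
from datetime import datetime, timezone
from pathlib import Path


def build_timeline(data_dir: str):
    """Collect (timestamp, owner, file) events and return them in order."""
    events = []
    for file in Path(data_dir).rglob("*"):
        if file.is_file():
            stat = file.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            try:
                owner = file.owner()  # not available on all platforms
            except (KeyError, NotImplementedError):
                owner = "unknown"
            events.append((modified, owner, str(file)))
    return sorted(events)


for when, who, what in build_timeline("working_copy"):
    print(when.isoformat(), who, what)
```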

Such a purely technology-based assessment would be completely free of bias. Unfortunately, it will only become possible in a couple of years. But then investigators, analysts, attorneys and courts of law may refer to such assessments as a continuous means of critically testing their hypotheses. Should their version of “reality” not be among the alternatives produced by the computer, it should be self-evident that the human might be wrong.
