The GDPR v Machine Learning Algorithms
10 May 2019
Matthew Fisher is a doctor and aspiring barrister with an interest and experience in MedTech.
Josef K., the protagonist of Kafka's novel 'The Trial', was an ambitious and successful banker prior to his unexpected arrest. The criminal charges brought against him were never explained because they were beyond the comprehension of all but the most senior judges. Attempting to understand his guilt consumed K.'s every thought: he was distracted at work, subservient to his lawyer and ultimately docile when led to his execution. 'The Trial' eloquently argued that transparency is a prerequisite of accountability. In the Age of the Algorithm, Kafka's novel is more relevant than ever.
Machine learning algorithms increasingly regulate our lives, making decisions about us in finance, education, employment and justice. Ultimately, they will become pervasive in most, if not all, aspects of decision making in the foreseeable future. But what is a machine learning algorithm? How does it decide? What rights do data subjects have? This article aims to answer all three of these questions.
What are Machine Learning Algorithms?
An algorithm is a set of instructions which are followed to complete a task: for example, place a bowl on the table, pour in cereal and milk, then eat with a spoon. A more complex example from healthcare is the CHA2DS2-VASc score, which allows clinicians to make evidence-based decisions when prescribing blood thinners for patients at risk of stroke. The score comprises eight separate questions covering factors such as age, sex and high blood pressure. The answers to these questions are the algorithm's variables, which together determine the CHA2DS2-VASc score. Two of the variables, age of 75 or over and a previous stroke, are double-weighted (two points instead of one) to reflect their significant predictive value.
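To make the idea of weighted variables concrete, here is a minimal Python sketch of the score described above (formally CHA2DS2-VASc), following its usual published point values. The `Patient` fields and the `cha2ds2_vasc` function are hypothetical names chosen for illustration; this is not a clinical tool. Note that every weight is a constant typed in by a person, and that the function can report exactly how many points each answer contributed.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record holding the answers to the score's questions."""
    age: int
    female: bool
    congestive_heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    previous_stroke: bool = False  # stroke, TIA or thromboembolism
    vascular_disease: bool = False


def cha2ds2_vasc(p: Patient) -> tuple[int, dict[str, int]]:
    """Return the total score and the points contributed by each answer."""
    points = {
        "congestive heart failure": 1 if p.congestive_heart_failure else 0,
        "hypertension": 1 if p.hypertension else 0,
        # The two age questions: 2 points at 75 or over, 1 point at 65-74.
        "age": 2 if p.age >= 75 else (1 if p.age >= 65 else 0),
        "diabetes": 1 if p.diabetes else 0,
        "previous stroke": 2 if p.previous_stroke else 0,  # double-weighted
        "vascular disease": 1 if p.vascular_disease else 0,
        "female sex": 1 if p.female else 0,
    }
    return sum(points.values()), points


score, breakdown = cha2ds2_vasc(Patient(age=80, female=True, hypertension=True))
print(score, breakdown)
```

For an 80-year-old woman with high blood pressure this gives a score of 4: two points for age, one for hypertension and one for female sex. Because the score is a simple sum, the breakdown is itself a complete explanation of the result.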
The CHA2DS2-VASc algorithm is the product of research studies performed by human clinicians, and its weighted variables are fixed. Machine learning algorithms, by contrast, do not need humans to set their weights: the weights are learned from data and can change to reflect new inputs and outputs. Machine learning is a form of artificial intelligence because it allows computers to draw inferences automatically when presented with new data, without being explicitly programmed for the task.
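The contrast between hand-set and learned weights can be sketched in a few lines. The example below uses logistic regression updated one example at a time (a simple stand-in chosen for illustration, not the method of any particular clinical system). The class name, learning rate and toy data are all hypothetical; the data are invented, not clinical.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range 0-1."""
    return 1.0 / (1.0 + math.exp(-z))


class OnlineLogisticModel:
    """A minimal logistic-regression model whose weights are learned, not hand-set."""

    def __init__(self, n_features: int, learning_rate: float = 0.5) -> None:
        self.weights = [0.0] * n_features  # start with no knowledge at all
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: list[float]) -> float:
        """Predicted probability of the outcome for one example."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def update(self, x: list[float], y: int) -> None:
        """One stochastic-gradient step on the log-loss for a single new example."""
        error = self.predict(x) - y  # positive if the model over-predicted
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


# Hypothetical toy data: [age 75 or over, previous stroke] -> outcome (1 = stroke).
# In this invented data the outcome depends only on previous stroke.
examples = [
    ([0.0, 0.0], 0),
    ([1.0, 0.0], 0),
    ([0.0, 1.0], 1),
    ([1.0, 1.0], 1),
]

model = OnlineLogisticModel(n_features=2)
for _ in range(1000):       # repeated passes over the data
    for x, y in examples:
        model.update(x, y)  # each example nudges the weights

print([round(w, 2) for w in model.weights], round(model.bias, 2))
```

No one tells the model which variable matters: it ends up with a large weight for previous stroke and a much smaller one for age, and feeding it further examples would keep shifting those weights. Humans still choose the model and the data, which is why "no human input" should be read as "no hand-set weights".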
A common type of machine learning algorithm is the artificial neural network, which is loosely modelled on the human brain. A neural network functions via interconnected artificial neurons, and the strength (weight) of each connection is one of the algorithm's weighted variables. During training, connections are strengthened or weakened according to whether they push the network towards or away from the correct answer. The system has an input layer (e.g. data on age, sex and blood pressure), one or more hidden layers and an output layer (e.g. % risk of stroke). There are large numbers of connections between adjacent layers, each of which can be refined. Given time and large data sets, billions of such refinements can produce an algorithm that is very successful at its given task.
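The layered structure can be illustrated with a tiny network in plain Python. This is a sketch under stated assumptions: the layer sizes, the input scaling and the random weights are hypothetical, and the network is untrained, so the risk it outputs is meaningless. It only shows how inputs flow through weighted connections and hidden layers to a single output.

```python
import math
import random


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_layer(n_in: int, n_out: int, rng: random.Random) -> list[list[float]]:
    """One row per neuron: a weight for each incoming connection, then a bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer: list[list[float]], inputs: list[float], activation) -> list[float]:
    """Weighted sum of the inputs at each neuron, passed through the activation."""
    outputs = []
    for row in layer:
        *weights, bias = row
        z = bias + sum(w * x for w, x in zip(weights, inputs))
        outputs.append(activation(z))
    return outputs


# Input layer: 3 values; two hidden layers of 4 neurons; output layer: 1 value.
LAYER_SIZES = [3, 4, 4, 1]
rng = random.Random(0)  # fixed seed so the untrained weights are reproducible
network = [make_layer(n_in, n_out, rng)
           for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])]


def predict_risk(age: float, female: bool, systolic_bp: float) -> float:
    # Hypothetical scaling to bring the inputs roughly into the range 0-1.
    activations = [age / 100, 1.0 if female else 0.0, systolic_bp / 200]
    for i, layer in enumerate(network):
        is_output_layer = i == len(network) - 1
        activations = forward(layer, activations,
                              sigmoid if is_output_layer else relu)
    return activations[0]  # a value between 0 and 1, read as "% risk"


# Connections between neurons (biases not counted): 3*4 + 4*4 + 4*1 = 32.
n_connections = sum(a * b for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:]))
print(n_connections, f"{predict_risk(80, True, 150):.1%}")
```

Even this toy has 32 connections; widening each hidden layer to a few thousand neurons pushes the count into the millions, which is where training on large data sets comes in.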
How do Machine Learning Algorithms decide?
A complex machine learning algorithm is one with many variables. In such algorithms the input and output layers are known, but the hidden decision-making layers remain opaque. These complex models cannot be explained in their entirety, which has led the media to label them 'black boxes'.
To understand why, it is necessary to consider the 'curse of dimensionality' from computer science. Data can be represented geometrically: with two variables, all the data can be displayed on a two-dimensional xy graph; with three variables, on a three-dimensional xyz graph.
Complex systems, however, have thousands of variables, requiring thousands of dimensions. It is important at this point to distinguish between low-dimensional settings, such as the three-dimensional physical space of everyday experience, and the high-dimensional spaces which arise when analysing data.
In high-dimensional spaces such as those of complex machine learning algorithms, as the number of dimensions (variables) increases, the number of ways in which all the potential values can be combined grows exponentially.
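The exponential growth is easy to see with simple yes/no variables. The sketch below assumes every variable has the same number of possible values (a simplification; real variables such as age or blood pressure have many more), and `combinations` is a hypothetical helper name.

```python
def combinations(n_variables: int, values_per_variable: int = 2) -> int:
    """Number of distinct ways to combine the values of n_variables variables."""
    return values_per_variable ** n_variables


for d in (2, 3, 10, 20, 30):
    print(f"{d:>2} yes/no variables -> {combinations(d):,} combinations")

# Each extra yes/no variable doubles the count; with 1,000 of them the
# number of combinations (2 ** 1000) has 302 digits.
print(len(str(combinations(1000))))
```

This prints 4, 8, 1,024, 1,048,576 and 1,073,741,824 combinations, then 302. Two or three variables fit on a graph; thirty already give over a billion combinations, far more than any person could inspect.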
It is the ‘curse of dimensionality’ that turns complex machine learning algorithms into black boxes because their hidden decision-making layers are beyond human comprehension. Arguably, it is possible to provide an explanation for a specific variable in a complex system. However, this is challenging for several reasons, which are beyond the scope of this article.
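One simple way to probe a single variable in an opaque model is a one-at-a-time sensitivity check: nudge that variable while holding the others fixed and see how the output moves. The sketch below is an illustration only, not a standard or complete explanation method; `black_box` is a hypothetical stand-in model and all names are invented. It also shows one reason such explanations are hard: the effect of a variable can depend on the values of the others.

```python
import math
from typing import Callable, Sequence


def one_at_a_time_sensitivity(model: Callable[[Sequence[float]], float],
                              x: Sequence[float], index: int,
                              delta: float = 1.0) -> float:
    """Change in the model's output when only x[index] is increased by delta."""
    nudged = list(x)
    nudged[index] += delta
    return model(nudged) - model(x)


def black_box(x: Sequence[float]) -> float:
    # Hypothetical stand-in for an opaque model: x = [age, previous stroke,
    # hypertension, an input the model ignores]. Previous stroke only matters
    # when hypertension is also present (an interaction between variables).
    z = 0.04 * x[0] + 1.5 * x[1] * x[2] - 5.0
    return 1.0 / (1.0 + math.exp(-z))


patient = [70.0, 0.0, 0.0, 3.0]
print(one_at_a_time_sensitivity(black_box, patient, index=0))  # +1 year of age
print(one_at_a_time_sensitivity(black_box, patient, index=1))  # 0.0: no hypertension
print(one_at_a_time_sensitivity(black_box, [70.0, 0.0, 1.0, 3.0], index=1))
```

For this patient a previous stroke appears to make no difference, yet for a patient with hypertension it raises the predicted risk. A check like this describes the model's behaviour around one patient; it does not explain the model as a whole.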
The Rights of Data Subjects – Article 22(1) of the General Data Protection Regulation
The GDPR has been incorporated into the Data Protection Act 2018 and, under section 3 of the European Union (Withdrawal) Act 2018, it will remain part of domestic law after the United Kingdom's exit from the EU.
Article 22(1) of the GDPR states that
the data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.
The Article 29 Data Protection Working Party, an advisory body to the EU on data protection and privacy, has provided some examples of "legal effects": cancellation of a contract, denial of social benefits or the refusal of citizenship. It has also provided examples of "similarly significant" effects, which include "decisions that affect someone's access to health services", for example a patient whose risk of stroke, and therefore treatment, is determined solely by a machine learning algorithm.
This is not science fiction. The Topol Review, an independent report written on behalf of the Health Secretary, states that
rather than relying on a concept of the normal derived from population studies [e.g. CHA2DS2-VASc], AI techniques such as deep learning will be used to define normality for an individual, and hence identify any deviation from it, using that individual's genomic, anatomical, phenotypic and environmental data, and its variations over time.
This is truly personalised medicine: by combining all the variables that make up you, a stroke prevention management plan can be tailored specifically to you. Personalised medicine will result in considerably better patient outcomes. However, only Dr Algorithm can administer such a complex system. He will make decisions for you and about you. In this scenario, human doctors and health professionals will be akin to the low-level court officials and guards in 'The Trial', merely implementing the unexplainable decisions made by a higher authority.
However, this dystopian (or utopian) future must first get past section 98 of the Data Protection Act 2018:
Right to information about decision-making. (1) Where: (a) the controller processes personal data relating to a data subject, and (b) results produced by the processing are applied to the data subject, the data subject is entitled to obtain from the controller, on request, knowledge of the reasoning underlying the processing.
Section 98 only applies when the processing is done solely by automated means, but in the above scenario humans remain 'in the loop'. However, human involvement can be rendered nominal by "automation bias", a phenomenon whereby humans either over- or under-rely on decision-making tools. It is fair to assume that human doctors will over-rely on complex machine learning algorithms, whether by choice or for insurance reasons, rendering their involvement in the decision-making process illusory.
If the role of human doctors proves to be illusory, s.98 will apply, requiring the data controller to provide "knowledge of the reasoning underlying the processing". However, as established above, it is not possible to provide meaningful explanations of the decision-making processes underlying complex machine learning algorithms. A simple algorithm, with only a few variables combined in a straightforward way, is easier to explain but tends not to perform as well. We therefore face a trade-off between performance and explicability. Is a Kafkaesque world a price worth paying for dramatically improved health outcomes?
The technology behind today's machine learning algorithms is not new; most of it dates from the '70s, '80s and '90s. What has changed is the vast quantity of data that corporations and governments store on all of us, feeding the algorithms which make decisions for us and about us. As we leave 4G behind and enter a new world of 5G and the 'internet of things', our data trails will grow exponentially, as will the role that algorithms play in our lives.
The GDPR has given data subjects powerful rights and imposed serious obligations on controllers. The law must now determine how best to interpret and implement this piece of legislation. If this is done well, machine learning algorithms will lead to a fairer, more prosperous society. If it is done badly, they will lead to a wildly unequal society and give rise to a new digital aristocracy. Stephen Hawking shared this sentiment:
the rise of powerful AI will be either the best, or the worst thing, ever to happen to humanity. We do not yet know which.