Machine Learning in Healthcare: Regulating Transparency
18 June 2020
PHG, linked with Cambridge University, provides independent advice and evaluations of biomedical and digital innovations in healthcare. PHG has recently published a series of reports exploring the interpretability of machine learning in this context. The one I will focus on in this post is the report considering the requirements of the GDPR for machine learning in healthcare and medical research by way of transparency, interpretability, or explanation. Links to the other reports are given at the end of this post.
First, a brief summary of machine learning in healthcare (for the detail, see PHG’s report Machine Learning Landscape).
Machine learning typically denotes “methods that only have task-specific intelligence and lack the broad powers of cognition feared when ‘AI’ is mentioned”. Artificial intelligence (AI) can be defined as “the science and engineering of making computers behave in ways that, until recently, we thought required human intelligence.” We are only beginning to realise the scope of intelligence that is silicon-based, rather than meat-based, in the reductionist words of neuroscientist and author Sam Harris. It is important too to grasp the difference between types of programming. As this report puts it,
Machine learning as a programming paradigm differs from classical programming in that machine learning systems are trained rather than explicitly programmed. Classical programming combines rules and data to provide answers. Machine learning combines data and answers to provide the rules.
The challenge posed by machine learning is, as with AI, the “black box” problem. In other words, the “machine” maps input variables to output variables without explaining what happens in between.
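To make the contrast concrete, here is a minimal Python sketch of my own (it is not taken from the report). The BMI cut-off, the toy data and the function names are all hypothetical. The first function is a rule written by hand; the second searches for a rule from data and answers, in the manner of the simplest possible learned classifier.

```python
def classical_program(bmi: float) -> bool:
    """Classical programming: a human writes the rule; data in, answer out."""
    return bmi >= 30.0  # hypothetical hand-written rule


def learn_threshold(values: list[float], answers: list[bool]) -> float:
    """Machine learning in miniature: data and answers in, rule out.

    Tries each midpoint between neighbouring sorted values as a cut-off
    and keeps the one that misclassifies the fewest examples.
    """
    points = sorted(set(values))
    if len(points) < 2:
        raise ValueError("need at least two distinct values to learn a cut-off")
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]

    def errors(threshold: float) -> int:
        # Count examples where the candidate rule disagrees with the answer.
        return sum((v >= threshold) != a for v, a in zip(values, answers))

    return min(candidates, key=errors)


# Toy, made-up training data: measurements and the "answers" supplied with them.
bmis = [19.0, 22.5, 27.0, 31.0, 34.5, 40.0]
labels = [False, False, False, True, True, True]

learned = learn_threshold(bmis, labels)
print(f"hand-written rule: bmi >= 30.0; learned rule: bmi >= {learned}")
```

With a single input the learned rule is easy to read off. The “black box” problem arises when the same search is run over thousands of inputs and parameters, so that the resulting rule can no longer be stated in a sentence.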
Until C-19 and Lockdown, very few people will have heard of NHSX, a virtual body incorporating teams from the Department of Health and Social Care and the NHS with responsibility for establishing a framework for AI in the health and care system in the UK. Now of course with its centralised contact tracing app it has come out of the shadows, and all eyes are on the success or otherwise of this particular technology. See our earlier post on the privacy concerns regarding contact tracing here, and note that the NHS has now abandoned its bespoke tracing app to move to Apple and Google’s decentralised technology.
But it has long been established that AI of this sort – and machine learning – promises to change numerous diverse parts of medical research and its practice, and the report sets out a very clear table of applications in this area, ranging from plotting genetic variation to the survey of novel candidates for anti-microbial therapy and other types of tailored drug discovery. It has obvious applications in public health, the most topical being the prediction of novel zoonotic diseases. Radiology departments in US medical schools find themselves depleted of students, as the time-consuming task of manually delineating radiological images is increasingly given over to faster and arguably more accurate machines.
It is clear, from reading these tables, that “medical research often blends into healthcare, research underpinning the delivery of care but also often constituting care, healthcare and including therapeutic intent as well.”
But, the authors conclude in this report,
There is increasing recognition that determining appropriate ethical and regulatory oversight of AI is a universal challenge which is best met by consistent and harmonised approaches.
Which brings us on to the detailed consideration of GDPR in the Regulating Transparency report.
The authors point out at the outset that the GDPR is not the only source of law that might generate a duty of transparency, interpretability, or explainability. There are a number of other ethical, common law and EU law bases for this, chiefly the ‘patient centred’ standard of care for communicating risk in Montgomery v Lanarkshire Health Board [2015] UKSC 11, which may require models used in a clinical context to be rendered at least somewhat interpretable, to avoid claims in professional negligence.
The authors also remind us that the common belief that the Data Protection Act 2018 is the UK implementation of the GDPR is false:
The GDPR is an EU regulation and so has been directly applicable (a part of UK law) since its publication, although only in force since the 25th of May 2018. It is contrary to EU law even to transcribe a regulation’s requirements into domestic law. Consequently, those looking for a domestic foothold for the GDPR should look no further than the GDPR itself.
After the Brexit transition period ends in December 2020 the GDPR will (“all going to plan”) become the ‘applied GDPR’ or ‘UK GDPR,’ being transferred under the authority of the European Union (Withdrawal) Act 2018.
The report provides a comprehensive explanation of the territorial reach, application and rights bestowed by the GDPR, which I will not go into here; I will just point to some salient examples of how important its provisions are in the context of machine learning in medical care.
The authors give an example where a machine learning model may require a set of inputs for the model to process for a particular instance. For example, a model to assess surgical risk might require inputs such as age, height, and BMI to provide the output of surgical risk for any given patient. In this way, data as an input for a particular instance of processing might be caught as “personal data” under the GDPR, giving rise to obligations for the data controllers and rights for the individuals whose data are being processed.
In the context of healthcare and research, ‘biometric data’, ‘genetic data’, and ‘data concerning health’ all count as special category data under the GDPR and are subject to special restrictions and safeguards.
Some machine learning for healthcare and research will either use personal data to train the model or, as a part of the function of the model itself, process personal data. At various stages the purposes for processing might change. For instance, a university researcher might ‘process’ personal data to develop a model for research purposes. Subsequently, when the trained machine learning model is deployed, it might use the personal data of patients to provide predictions for healthcare purposes. The rights to information may apply to processing for both of these purposes, requiring the disclosure of information to data subjects.
There is a specific derogation available to data controllers under Article 9(2). In the context of health research, controllers will commonly rely on this derogation to process special category data for research purposes. In addition to Article 9(2), Article 89 allows derogation from the listed rights where the data is processed for research purposes subject to the caveat that the right “would otherwise likely render impossible/severely impair the purposes of processing and that the derogation is accompanied by appropriate safeguards”. Reliance on the Article 89 research exemption is only possible where the processing is not likely to lead to “substantial damage or distress to the data subject”.
GDPR requirements are to be considered together with the common law right to confidentiality. In other words, where information is disclosed within the bounds of a clinician-patient relationship, those using machine learning as part of healthcare may refuse to disclose information about processing that may reveal sensitive data regarding other patients. This is recognised by the Article 14(5)(d) exception on confidentiality and professional obligations of secrecy. In summary, in the context of healthcare and research, a number of the restrictions on data subject rights and data protection principles outlined above are relevant: where the controller is no longer in a position to identify the data subject (Articles 11 and 12(2)); flexibility for research purposes found in Article 89; and the restrictions relating to disclosure of trade secrets and intellectual property in Recital 63 and Article 23(1)(i).
The report then moves on to the subject of automated decision making which is covered by Article 22 GDPR, and the “right to explanation”:
The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her. (Article 22(1))
Article 22 was designed to meet concerns about machine learning and the danger of the misuse of data processing in decision-making, which may become “a major problem in future”: “the result produced by the machine, using more and more sophisticated software, and even expert systems, has an apparently objective and incontrovertible character to which a human decision-maker may attach too much weight, thus abdicating his own responsibilities.”
There are exceptions within Article 22 to this general prohibition/right, which is disapplied where the processing is:
A. necessary for the performance of a contract;
B. authorised by EU or Member State law; or
C. based on the data subject’s explicit consent.
The authors of the report consider what kind of “decision” would be covered by Article 22 in the healthcare context. They give by way of example a histopathology machine learning system that interprets biopsies, classifies the sample as cancerous or benign, stratifies those classified as cancerous according to prognosis, and triages patients to different patient pathways (including no treatment) following the former two tasks.
Is this one “decision”, or several, for the purposes of Article 22? And is it “based solely on automated processing, including profiling”? Not an easy question to answer. At the moment anyway, conclude the authors, machine learning acts as a “decision support tool, as a second reader, or as an interpretative aid for a healthcare professional”.
Accordingly, more often than not, there will be a human who, by default, has the authority and information to provide meaningful oversight of the machine learning system. Much the same analysis probably applies in the research context, with the qualification that systems for investigational use, clinical trials, or research may be “more ambitious” in their automation than those systems in use for the healthcare system. In any event, the majority of machine learning uses for healthcare and research are not an easy fit for ‘legal effect’ under Article 22(1). For instance, systems directed toward diagnosis or treatment, although they may have grave consequences for their data subject, do not directly have legal effect.
The authors conclude in this section that only a subset of machine learning applications will count as ‘a decision based solely on automated processing.’ Further, only a subset of this subset will have legal effect or similarly significantly affect the data subject.
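The two-stage filter the authors describe, together with the three exceptions listed earlier, can be laid out as a rough Python sketch. This is my own illustration, not the report’s: the class, field and example names are hypothetical, and reducing each legal question to a yes/no flag assumes away precisely the interpretive difficulty the report highlights.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """Hypothetical yes/no answers to the Article 22 questions for one decision."""
    solely_automated: bool                # no meaningful human oversight
    legal_effect: bool                    # produces legal effects for the data subject
    similarly_significant: bool           # or similarly significantly affects them
    necessary_for_contract: bool = False  # exception A
    authorised_by_law: bool = False       # exception B
    explicit_consent: bool = False        # exception C


def in_scope(d: Decision) -> bool:
    """Subset of a subset: solely automated AND (legal or similarly significant effect)."""
    return d.solely_automated and (d.legal_effect or d.similarly_significant)


def exception_applies(d: Decision) -> bool:
    """Any one of the three listed exceptions disapplies the prohibition/right."""
    return d.necessary_for_contract or d.authorised_by_law or d.explicit_consent


def restricted(d: Decision) -> bool:
    """True where Article 22(1) bites and no exception is available."""
    return in_scope(d) and not exception_applies(d)


# Made-up scenarios, loosely following the examples discussed in the text.
second_reader = Decision(solely_automated=False, legal_effect=False,
                         similarly_significant=True)
automated_triage = Decision(solely_automated=True, legal_effect=False,
                            similarly_significant=True)
consented_triage = Decision(solely_automated=True, legal_effect=False,
                            similarly_significant=True, explicit_consent=True)

for name, d in [("second reader", second_reader),
                ("automated triage", automated_triage),
                ("consented triage", consented_triage)]:
    print(f"{name}: in scope={in_scope(d)}, restricted={restricted(d)}")
```

On these assumptions a decision support tool with a clinician in the loop falls outside Article 22 at the first step, which is the position the authors say most current healthcare uses occupy.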
Despite the difficulty of interpreting Article 22, the authors recommend in their conclusions that controllers
consider interpretability or explainability of their machine learning system throughout the development and lifecycle of their system. Specifically, transparency and associated requirements should be interpreted, being sensitive to the interests of their data subjects, keeping in mind the general precept to provide the ‘data subject with any further information necessary to ensure fair and transparent processing, whilst also taking into account the specific circumstances and context in which the personal data are processed’.
The PHG Black Box Medicine and Transparency project, funded by the Wellcome Trust, explores the technical, ethical, and legal aspects of human interpretability of machine learning for healthcare and medical research through six major reports.
· Machine learning landscape (Where is machine learning used in medical research and healthcare and where might it be used in the future?)
· Interpretable machine learning (What is human interpretability of machine learning? How may machine learning models be rendered human interpretable?)
· Ethics of transparency and explanation (Why should machine learning be transparent or be explained? What lessons can be drawn from the philosophical literature on transparency and explanation?)
· Regulating transparency (Does (and if so, to what extent) the GDPR require machine learning in the context of healthcare and research to be transparent, interpretable, or explainable?)
· Interpretability by design framework (This new framework is intended to assist developers in understanding the interpretability of their machine learning models intended for medical applications)
· Roundtables and interviews (Information on the roundtables and interviews that informed the Black box medicine and transparency reports)