Using Patient Data for Research …

  • Date Published Dec 11, 2019
  • Written by Fred M Behlen, PhD
without getting in trouble for it.

Research using patient records data is certainly in the news today. Many see patient data as a resource for machine-derived knowledge to improve diagnosis, benefit clinical workflow, post alerts to potential errors, and advance medical knowledge and technology. This is not a new idea, but the emergence of deep convolutional neural networks is finally making it computationally feasible. Many organizations are getting involved, from healthcare provider organizations to “big tech” companies to new startups. These new efforts use patient data to train deep neural networks.

Not all the news attention is favorable. Collaborations of hospitals with technology companies have attracted press attention questioning their privacy practices, have attracted litigation specialists seeking damages for erroneous anonymization of data, and have attracted congressional committees seeking wrongs to right.

As I noted in my previous post, providers have fairly broad discretion to use patient data for the improvement of the provider’s operations (“analytics”), but creation of generalizable knowledge (“research”) is highly constrained by public policy on human subjects.

The above rings a bit odd, doesn’t it? A healthcare provider has broad discretion to use patient data for internal use, but creation of generalizable knowledge is strictly constrained by federal law and regulation. In other words, if what you’re doing is for your organization’s benefit, that’s ok, but if it is for the good of society, then you’re subject to a lot of rules and scrutiny.

You might ask, how did research get such a bad name? If you want to know, check out the Tuskegee study. The Wikipedia article is a good starting point if you want to read the history. It will sicken you. The study, starting in 1932, followed the progression of syphilis in African American male subjects, and continued, with the subjects still untreated, for 30 years after penicillin was validated as an effective cure in the early 1940s. A whistleblower in the Public Health Service raised ethical and moral concerns in the late 1960s, and when they were ignored, went to the press. The story broke first in the Washington Star on July 25, 1972, and became front-page news in the New York Times the following day.

This was not research carried out in one of the dystopian regimes of the 20th century. It happened in the USA in full view of the medical establishment. I must note that just in the course of reviewing the history for this post, I was overcome by sadness about what happened there, and the things people will do in the name of advancing knowledge.

Nothing we can do with patient data remotely resembles the harms referenced above, but we are wise to remember and respect the sensitivities and sensibilities of people when they hear the word “research”.

Law and regulation

The framework of law and regulation that constrains human subjects research evolved out of reactions to the Tuskegee Study. The beautifully written Belmont Report, a worthy read, articulates the ethical and practical issues in working with human subjects. The current regulation of human subjects is codified under 45 CFR 46, and establishes organization and procedures for regulating research using human subjects. It is understood that information about people is included in the scope of human subjects.

Under the regulations, each institution conducting human subjects research must have an Institutional Review Board (IRB). Each IRB must have at least five members, with varying backgrounds to promote complete and adequate review of research activities commonly conducted by the institution. The IRB reviews and approves all proposed research protocols, guided by the regulations and by ethical and practical considerations about the proposed protocol’s goals, risks and safeguards.

The regulations also define certain categories of research that are exempt from regulation. Investigations using existing data or specimens may be exempt if they meet the conditions of §46.104(d)(4)(ii) that

“Information, which may include information about biospecimens, is recorded by the investigator in such a manner that the identity of the human subjects cannot readily be ascertained directly or through identifiers linked to the subjects, the investigator does not contact the subjects, and the investigator will not re-identify subjects;”.

These regulations, updated in 2018, relax the older regulations of 45 CFR 46.101(b)(4), which required absolute inability to identify subjects. An article that Steve Johnson and I wrote 20 years ago reviews human subject regulations for patient records research, the impossibility of absolute anonymization of patient records, and the limitations of research working with selectively de-identified data. Much has changed since then, especially the technology for re-identifying “de-identified” data, but the basic regulatory framework is the same.

OK, this is getting waaay too geeky, but the point of this post (and of the 1999 article) is that many investigators are working way too hard to qualify for the exemption when instead they should be going to their IRB for an approved protocol. Some de-identification of data under a protocol may be required by your IRB as part of the project’s risk mitigation, but you are not relying on de-identification as a critical condition for legality as an exempt project.

However, as we noted at the outset, this only applies to “research”, i.e., the discovery or creation of generalizable knowledge. If a hospital wants to reduce its readmission rate and wants to measure factors that may be markers for readmission, that is a legitimate local use of that information for the benefit of their patients. This type of investigation is often called “analytics” or some other term to distinguish it from research.

And healthcare organizations are allowed to outsource their business activities to other organizations. This has been done for decades, over a century in fact. Hospitals use outside services for many functions, from labs to laundry to the more recent outsourcing of IT functions to cloud services and contracted service providers (including data migration companies like Laitek Inc.). Any such activities involving patient information must, according to regulations promulgated under the Health Insurance Portability and Accountability Act (HIPAA), be covered by a Business Associate Agreement (BAA) between the provider and the contractor to ensure responsible handling of confidential patient data.

Coming back to the basic analytics/research distinction, where research is IRB-regulated and analytics are not, it may still be that public sensibilities are not aligned with this formalism. And this can be a liability in the present age of privacy scares and social media mobs. The intuitive political reaction to data sharing, even with an organization under a contracted BAA, may be that “hundreds of vendor engineers” should not be privy to the private data of patients. Therefore, while a provider organization is perfectly justified in outsourcing analytics to a technology company, the counsel and protection of your IRB may be worth the extra paperwork.

Machine learning is different

A more technical consideration is that machine learning blurs the line between analytics and research. Conventional analytics are based on a human-generated hypothesis of the effect of certain variables on certain outcomes and seek statistical testing of the hypothesis. The applicability of the results is local to the provider and exempt from research regulation. Machine learning involves training deep neural networks from large and diverse data sources, and the methods and coefficients of layers in the networks become part of the evolving knowledge embedded in the network. The vendor of the AI software does not rely on any knowledge that can be described as results in a research paper, but these neural network coefficients may be part of the refinements the vendor takes to other customers, and thus are in effect generalizable knowledge.
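For readers who like to see it concretely, here is a minimal sketch of what the "conventional analytics" side can look like: a person states a hypothesis (say, that one patient group is readmitted more often than another) and tests it statistically. The function name `two_proportion_z_test` and the counts are hypothetical, invented purely for illustration; they are not taken from any real hospital data, and a two-proportion z-test is only one of many tests an analyst might choose.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    events_a / n_a and events_b / n_b are the observed rates in each group
    (for example, readmissions out of discharges).
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF, computed via erf.
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Hypothetical counts: 60 readmissions in 400 discharges for group A,
# 40 readmissions in 400 discharges for group B.
z, p = two_proportion_z_test(events_a=60, n_a=400, events_b=40, n_b=400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The point of the contrast is that the hypothesis, the variables, and the result here are all legible to a human and local to the provider. A trained neural network has no such tidy statement of what it learned, which is exactly why its coefficients are harder to classify.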

What to do about it

In light of all this controversy and uncertainty, and the goals of provider research personnel to improve their professional standing through published research, my message is:

  • Learn to work with your IRB, and do it.
    Do your work within an IRB-approved research protocol. You must plan the project and identify the risks and benefits to patients and society, but you ought to do that anyway. It builds constructive relationships. It’s not that hard. It gets easier to do subsequent projects as the IRB gets to know you and trust you.
  • Your results may be publishable research if the project succeeds, and
  • You never will end up testifying before a congressional committee.


Fred Behlen, PhD is the President and Founder of Laitek, a former faculty member of the University of Chicago, and past co-chair (1999-2010) of the DICOM-HL7 joint working groups (DICOM Working Group 20 / HL7 Imaging Integration SIG). He is also past co-chair of the HL7 Structured Documents Technical Committee and a Co-Editor of the HL7 Clinical Document Architecture Release 2 (CDA).