Biases in electronic health record data due to processes within the healthcare system: retrospective observational study

Authors: Agniel D, Kohane IS, Weber GM. 

Reference: BMJ 2018; 361: k1479

Summarised on: 7 June 2018

This article considers the massive amounts of data generated by electronic health record (EHR) systems, and points to an increasing risk of biased and incorrect medical findings from the use of Big Data analytical techniques without a full understanding of the complexities and limitations of EHR data. The researchers explain that EHRs are observational databases with data reflecting both the health of patients and their interactions with the healthcare system.

Additionally, the recording process itself is affected by factors such as doctors’ decisions to order tests and treatments, and the policies and workflows of healthcare providers. The researchers argue that these effects of healthcare processes on EHR data should not be dismissed as data quality problems or noise. They are a signal in their own right, which can be used to identify subpopulations of patients and improve predictive models.

The researchers systematically evaluated the ability of 272 laboratory tests to predict three-year survival across the full patient populations seen over one year at two large hospitals in Boston. They undertook a retrospective analysis of 669,452 patients treated at the hospitals between 2005 and 2006.

The laboratory test data in the EHR were treated as having two distinct dimensions – one was the value of the test result (a measure of the patient’s pathophysiology), and the other was the timing of when the test was ordered (a marker of the underlying healthcare processes). For each laboratory test, the relative predictive accuracy for three-year survival, using the time of the day, day of the week, and ordering frequency of the test, was compared to the value of the test result.
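The comparison described above can be illustrated with a small sketch. It is not the researchers' code: the eight patients, the `daytime` feature (test ordered between 08:00 and 18:00), and the use of the area under the ROC curve (AUC) as the accuracy measure are all assumptions made for illustration, and the numbers are invented toy values rather than study data.

```python
def auc(scores, labels):
    """Chance that a randomly chosen survivor has a higher score than a
    randomly chosen non-survivor (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): one laboratory test, 8 patients.
survived = [1, 1, 1, 1, 0, 0, 0, 0]                       # three-year survival
result_value = [4.1, 3.9, 4.5, 4.0, 4.2, 3.8, 4.4, 4.0]   # test result
order_hour = [10, 14, 9, 11, 4, 3, 15, 2]                 # hour test was ordered

# Timing feature: 1 if ordered during the working day, 0 if ordered at night.
daytime = [1 if 8 <= h < 18 else 0 for h in order_hour]

auc_value = auc(result_value, survived)   # accuracy of the result itself
auc_timing = auc(daytime, survived)       # accuracy of the ordering time

print(f"AUC from test result: {auc_value:.3f}")
print(f"AUC from order timing: {auc_timing:.3f}")
```

In this toy data the timing feature separates survivors from non-survivors better than the result value, by construction. The sketch only shows the shape of the comparison; the study applied it across hundreds of tests and also considered day of the week and ordering frequency.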

The presence of a laboratory test order, regardless of any other information about the test result, had a significant association (P<0.001) with the odds of survival for 233 of the 272 tests (86%). Data about the timing of when laboratory tests were ordered were more accurate than the test results in predicting survival for 118 of 174 tests (68%).
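As a rough illustration of what an association between a test order and the odds of survival means, the sketch below computes an odds ratio from a 2x2 table, with an approximate 95% confidence interval on the log scale (Woolf's method). The counts are invented for illustration, and the study's own statistical method may well differ; the function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2x2 table.

    a: test ordered, survived      b: test ordered, died
    c: test not ordered, survived  d: test not ordered, died
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf's method).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts (NOT from the study).
or_, lo, hi = odds_ratio_ci(a=800, b=200, c=950, d=50)
print(f"odds ratio {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An odds ratio below 1 in this layout would mean that patients who had the test ordered had lower odds of survival than those who did not, which is the kind of signal that the ordering decision can carry independently of the result.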

The researchers say that EHR data, used without consideration of context, can easily lead to biased or nonsensical findings, making them unsuitable for many research questions. However, the healthcare process aspects of EHR data can be used to infer information about patients’ state of health that cannot be learned from patient pathophysiology alone. For example, a normal laboratory test result is only one indicator of a patient’s health. The fact that the test was ordered at 4am reflects the physician’s experience, intuition, and assessment of the patient’s main complaint, baseline status, and physical exam, which are usually not explicitly coded elsewhere in an EHR.

They conclude that, if explicitly modelled, the same processes that make EHR data complex can be leveraged to gain insight into patients’ state of health.


Welcome to the RNZCGP Digest. Here, we summarise recent New Zealand and overseas journal articles and publications that are of interest to general practice and those working in the primary care sector, such as those about clinical issues, health workforce, education, quality improvement, or cross-cultural care.

Some articles are available in open-access journals; others require an online subscription.

We welcome your suggestions and comments. Please contact the College's policy team.
