Health Data Standardization and the Argument for Push


By Dr. Peter S. Tippett

February 14, 2017 — For many years, a majority of healthcare industry leaders and regulators have envisioned a technology architecture in which essentially all medical information about a given person would be stored or accessible in one location. The record, with the proper rights and permissions, would then be available to every appropriate person on the care team, to scientists for research purposes, and to the patient and her personal care circle.

But in the past 30 years, none of the PC, Internet, Cloud, or Mobile revolutions have been driven by an assumption that everyone would put all relevant information into a common database, or that all information would have the same format. We don’t give everyone access to the hard drive on our PC or to our mobile device so they can take whatever they need whenever they need it. We don’t even do that in our work environments. Bosses can’t fetch a file from my hard drive when they need more information. Instead, they ask me, and I send it via email or a shared drive.

During the Internet and subsequent Big Data, Social Media, Mobile and Cloud (BigSoMoClo) eras, there have been two basic models:

  1. Big database-fetch models, in which data types are relatively few and simple, and

  2. Index-everything-for-human-consumption models, in which there are many varied data types, widely diverse data, and minimal agreed-upon structure.

Big database-oriented or link-fetch models are common where there are relatively few discrete data elements and where high levels of structure and name standardization already exist. Historically, this approach has been applied to accounting systems, which have named categories, a hierarchy of accounts and product SKUs, and widely standardized reporting. A more recent example is airline flight booking: passenger names, demographics and billing information, plus a limited set of airports, airlines, flight numbers and times, make up most of the data set.

By contrast, medical data often span thousands or tens of thousands of data element types, many with diverse naming standards, and the values in those fields are often difficult to standardize. Even something as seemingly simple as the names of imaging and other radiographic studies is inconsistent: the same study, or its views with and without contrast, commonly goes by different names at different institutions, or even in different departments of the same institution. That makes such data very difficult to manage in a system in which all data are stored centrally and fetched.

The epitome of the index-everything-for-human-consumption model is Google Web Search. As long as the content is 8-bit ASCII text, the indexing “works”; no more structure than that is required to produce acceptable results for human users. The big programming challenge in this model was separating text from graphics, ignoring most of the text formatting, and providing viewers so that people could visualize the most common graphics.
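To make the index-everything idea concrete, here is a minimal sketch of an inverted index, the basic data structure behind keyword search. It assumes nothing about the documents beyond plain text. The tokenizer (lowercase, keep runs of letters and digits), the function names and the sample notes are all illustrative assumptions; this is not a description of how Google Web Search is actually built.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Illustrative tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each token to the set of document IDs that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Return the documents that contain every query term (a simple AND query).
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical free-text notes with no agreed-upon structure.
docs = {
    "note1": "Pt found unresponsive; home glucose meter read 500.",
    "note2": "CT head w/o contrast, no acute findings.",
    "note3": "Glucose 50 per EMS, given D50.",
}
index = build_index(docs)
print(sorted(search(index, "glucose")))  # ['note1', 'note3']
```

The index happily returns every note that mentions glucose and leaves it to a person to read the value. Nothing in it knows that 500 and 50 are readings, which is fine for a human scanning results but not for an automated rule that needs a number.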

People (unlike machines) are tolerant of sloppy formatting, extraneous data, data noise and other aberrations that typically result from index-everything models. We do prefer pretty output, but some messy results are typically not a deal breaker. Most of us, when doing a Google search for a particular fact (the population of a place, say, or an actor’s real name), don’t even bother to click on any of the six or seven blobs of results; we scan them and find our answer. This is easy for humans, but not for the data triggers, alerts, analytics and other automated operations that rely on structured data.

Data Standardization

The central storage-and-fetch model therefore implies very high levels of data standardization, format standardization, data-typing standards, hierarchical standardization (which named thing is more or less, better or worse, stronger or weaker… than another thing), name consistency, and a range of other standards that are poorly defined and that are implemented very differently across the hundreds of electronic medical records (EMRs) and dozens of health information exchanges (HIEs) in the U.S. alone. Even if all of this is achieved, the approach seriously compromises two important aspects of healthcare:

1. Clinical Nuance – Making data highly structured and standardized usually limits one of the most important features of clinical information: nuance. Patient A may have obesity, diabetes, hypertension, hyperlipidemia, burning feet and painful knees, but she will not respond in the same way as Patient B with the same problems. Patient A may not be allergic to metformin or have adverse reactions to it, for example, but after working with her the clinician may find that something else works better. There are hundreds of reasons this might be true, from nausea to pill taste to psychological factors. Nuance, subtlety, and the story of the person are important not just for accurate diagnosis and treatment, but also for achieving the highest levels of compliance, self-esteem, and clinical results.

2. Access Control – Putting all data in one place and providing access to appropriate individuals also raises the question of exactly who should have access to which portion of the data, when, and by what means. With so many data types, and so many kinds of people, processes, organizations, caregivers, researchers, administrative and billing needs, family members and more, the rights management of this model is very difficult both to establish and to maintain over the long term. Structuring, providing, and maintaining the huge range of potential access rights for the vast number and diversity of users in a single health system is exceedingly challenging; doing so across a region, a nation, or the world (especially as we transition to accountable care with even more diverse, varied, and distributed caregivers) is unthinkable with current models.

Putting all of a person’s medical and health data in one central place, from which all kinds of analytics, alerts, population work and more can be accomplished, is a laudable goal. But it is also a very, very big lift, and probably an impossible one. With current thinking, it could take another 10 to 20 years to get working.

There is a persistent belief that no progress will be made in the world of health information until we achieve significant advances in consistent data structure. When the President’s Information Technology Advisory Committee (PITAC) and essentially every other standards or planning body proposed solutions to the lack of information sharing, nearly all concluded that addressing the lack of consistent structure and standards would necessarily lead to interoperability. Unfortunately, progress has been hindered by a number of market forces, and in the meantime, clinicians are left with inadequate patient information at the point of care.

The scientist in me craves structured data and would love it if the panacea of universally structured data were widely and instantly available. But the physician in me is pragmatic. I am delighted to get a phone call, fax, text, email, digital note or even a slip of paper showing that the in-home or ambulance glucose meter reading was 500 (or 50) in the unconscious patient, especially compared with getting nothing, waiting for structured data, or, worse yet, wading through 80 pages of EMR or HIE summary output.

It’s time to look at the challenge differently. We should encourage EMRs to keep doing the valuable job they are doing creating digital records, and support HIEs as they strive to become the treasure troves of structured data that promise better visibility into our national health. But instead of waiting for interoperability between competing vendors, we can use technology that is readily available today to bridge the communications gap between diverse providers and EMRs, and to “translate” complex patient records into a usable format for another hospital or provider, whether across town or around the world.

The keys to making this work now, as well as future-proofing this system, involve embracing “push” to complement “pull”; encouraging, and learning to deal with, unstructured and semi-structured data; pushing more than just data; and fully leveraging strong authentication (which I will address in my next post).
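One way to picture push complementing pull with semi-structured data is a sender-to-recipient message that always carries human-readable text and optionally carries structured fields. The sketch below is hypothetical throughout: the PushMessage and Inbox types, the push function, the glucose_mg_dl field name and the alert thresholds are invented for illustration (the thresholds are arbitrary placeholders), and authentication, left for the next post, is omitted entirely.

```python
from dataclasses import dataclass, field
from typing import Optional

# Arbitrary placeholder thresholds for this sketch only (mg/dL).
LOW_GLUCOSE, HIGH_GLUCOSE = 70, 400


@dataclass
class PushMessage:
    # Hypothetical message shape: free text always travels; structured
    # fields ride along only when the sender happens to have them.
    sender: str
    recipient: str
    text: str
    data: dict = field(default_factory=dict)


@dataclass
class Inbox:
    # The recipient keeps every message, structured or not.
    messages: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

    def receive(self, msg: PushMessage) -> None:
        self.messages.append(msg)
        glucose: Optional[float] = msg.data.get("glucose_mg_dl")
        if glucose is not None and not (LOW_GLUCOSE <= glucose <= HIGH_GLUCOSE):
            # A structured value is present, so an automated rule can act on it.
            self.alerts.append(f"glucose {glucose} from {msg.sender}")
        # With no structured value, the text still reaches a human reader.


def push(msg: PushMessage, directory: dict) -> None:
    # The sender delivers to the recipient; nobody fetches from a central store.
    directory[msg.recipient].receive(msg)


directory = {"ed-physician": Inbox()}
push(PushMessage("ambulance-12", "ed-physician",
                 "Unconscious pt, home meter read 500",
                 {"glucose_mg_dl": 500}), directory)
push(PushMessage("family", "ed-physician",
                 "She skipped lunch and seemed confused"), directory)

inbox = directory["ed-physician"]
print(len(inbox.messages), inbox.alerts)  # 2 ['glucose 500 from ambulance-12']
```

The design choice worth noting is that structure is additive rather than required: the family's unstructured note is delivered just like the ambulance reading, and only the message that carries a structured value triggers an automated alert.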

Please drop me an email; I would love to help you get started.

Peter