Why capturing patient race data is so difficult
Race can sound like straightforward information to collect from patients. But changes over time in how race is categorized, inconsistency in how patients are asked for demographic information and differences in how patients themselves think about race make it a data point worth taking with a grain of salt in patient records, experts say.
“We often think of race as a very ‘noisy’ indicator,” which isn’t necessarily fully documented or collected consistently, said Suchi Saria, professor and director of the Machine Learning and Healthcare Lab at Johns Hopkins University and CEO of Bayesian Health, a company that develops clinical decision-support artificial intelligence.
When race is used in a predictive model, Bayesian Health also uses machine-learning techniques to integrate the data while considering uncertainty around how reliably it’s collected, Saria said.
Race isn’t a fixed variable or set of categories, noted Dr. Brooke Cunningham, an internist and sociologist at the University of Minnesota Medical School.
How people think about and categorize race in the U.S. has evolved over time. The federal census, which takes place every 10 years, has frequently changed the racial groupings it collects.
Race isn’t a biological variable, and shouldn’t be used as such in medicine, Cunningham said.
A person who’s seen as Black in the U.S. would likely be labeled in a different way in Africa, Latin America or other regions. That can be confusing for recent immigrants to the U.S., and people usually aren’t given a clear definition of when to use a particular label when filling out paperwork.
That was the case for Dr. Nigam Shah, associate chief information officer for data science at Stanford Health Care and associate dean for research at Stanford University School of Medicine, when he moved to the U.S.
“When I came to the U.S., the first couple of forms I filled out I dutifully checked off ‘American Indian,’ ” Shah said. “I mean, I grew up in India and I was in America.”
American Indian is another term for Native American.
Selecting a racial category can also be confusing for some subpopulations—such as Middle Eastern or Latino people—who might be unsure which label best applies to them. Many organizations supply just five racial categories to choose from—American Indian/Alaska Native, Asian, Black/African American, Native Hawaiian/other Pacific Islander and white—with a separate ethnicity question that asks about Hispanic/Latino heritage.
A patient’s self-reported race could even change over time or be reported differently at different sites of care, depending on their understanding of race and what categories are available. Patients who identify with more than one race might choose to select just one, if they feel more closely aligned with that side of their identity or don’t know whether they can select multiple options.
Patients could also be confused about why they're being asked to share race and ethnicity data, and may decline to answer as a result.
Shah cited a quality improvement project from about a year ago, in which researchers asked patients at a family medicine clinic about their race and ethnicity and then compared their answers with the data recorded in the EHR system. According to a poster presented at a conference, patients' race and ethnicity were misclassified in the EHR roughly 37% of the time.
The project didn’t dig into reasons for the mismatch. But “the mismatch was astounding,” Shah said. “I don’t know what to do with those labels.”
Healthcare organizations vary in how they collect race and ethnicity data, as well as how consistently the information is captured, according to Dr. Peter Embi, president and CEO of the Regenstrief Institute. Embi joins Vanderbilt University Medical Center as chair of the biomedical informatics department in January.
Some healthcare organizations might have patients self-report that data on paper or electronic forms, while others might have a registrar ask a patient to identify their race and ethnicity at check-in—which staff might be uncomfortable doing. In some cases, it’s possible registrars are making an assumption about a patient’s race and ethnicity based on their appearance or name.
“I’m concerned that often times it’s not really what an individual would report as their self-identified race and ethnicity,” Embi said.
Traditionally, demographic data—including race and ethnicity—has been collected by registration staff, who enter the information into a registration or patient-intake module that sends data to the EHR. But, increasingly, the data is self-reported by patients in a patient portal, check-in kiosk or intake forms that are electronically sent to patients before an appointment.
“That, I would say, is growing,” said Hans Buitendijk, chair of the EHR Association and Cerner’s director of interoperability strategy.
Race and ethnicity data isn't always collected in the EHR itself, but to receive certification from the Health and Human Services Department's Office of the National Coordinator for Health IT, an EHR must be able to record race and ethnicity, with race defined according to standards from the Office of Management and Budget and the Centers for Disease Control and Prevention. Beginning with the 2014 certification criteria, EHRs were expected to let users record multiple races.
There are more than 900 categories related to race and ethnicity included in the CDC’s standards; while EHRs must be able to record each of those concepts, a developer isn’t required to display all of them to users.
Hospitals can choose to display race and ethnicity categories in different ways, as long as the options can be reorganized to align with OMB’s standards for federal reporting—which encompass five broad categories for race and one category for ethnicity.
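The requirement that detailed categories roll up to OMB's race categories can be illustrated with a minimal Python sketch. It assumes a small, hypothetical lookup table that uses plain-text labels rather than the CDC's official codes; a real system would load the full CDC code set. Because patients may select more than one race, one patient can map to several OMB categories, and answers such as "Declined" are set aside rather than guessed.

```python
# Minimal sketch: roll detailed race selections up to OMB's five race categories.
# DETAILED_TO_OMB is a HYPOTHETICAL excerpt for illustration only; it is not the
# CDC code set and uses plain-text labels instead of official codes.

OMB_RACE_CATEGORIES = (
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "White",
)

# Hypothetical mapping from detailed labels to OMB categories.
DETAILED_TO_OMB = {
    "Navajo": "American Indian or Alaska Native",
    "Chinese": "Asian",
    "Asian Indian": "Asian",
    "Haitian": "Black or African American",
    "Samoan": "Native Hawaiian or Other Pacific Islander",
    "Irish": "White",
}


def roll_up_to_omb(detailed_selections):
    """Return (omb_categories, unmapped_labels) for a patient's selections.

    Multiple selections are allowed, so a multiracial patient can map to more
    than one OMB category. Labels not in the table (including "Declined" or
    "Unknown") are returned separately instead of being assigned a category.
    """
    omb = set()
    unmapped = []
    for label in detailed_selections:
        category = DETAILED_TO_OMB.get(label)
        if category is None:
            unmapped.append(label)
        else:
            omb.add(category)
    # Order results the same way as OMB_RACE_CATEGORIES for stable reporting.
    ordered = [c for c in OMB_RACE_CATEGORIES if c in omb]
    return ordered, unmapped


print(roll_up_to_omb(["Chinese", "Irish"]))  # (['Asian', 'White'], [])
print(roll_up_to_omb(["Declined"]))          # ([], ['Declined'])
```

Keeping unmapped answers separate, rather than forcing them into a category, reflects the point that the display layer can vary while federal reporting still needs a clean roll-up.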
The OMB categories were last updated in 1997. An Obama-era proposal from 2016 would have combined the race and ethnicity questions and added a Middle Eastern and North African category, bringing the standards to a total of seven categories for race and ethnicity. The proposal has reportedly been picked up by the Biden administration.
UCSF Health in San Francisco has developed a multi-layered way of collecting and storing patients' race and ethnicity data that helps account for patients who identify with multiple races and ethnicities, while organizing the data so it can be brought into different analytics applications, said Dr. Russ Cucina, the system's chief health information officer.
Every patient receives a form where they can select their race (American Indian/Alaskan Native, African American/Black, Native Hawaiian/other Pacific Islander, Asian, white or other, in which case they can write their own response), ethnicity (of which there are dozens of options) and whether they consider themselves Hispanic/Latino.
Each of those questions also has an “unknown” or “decline to answer” option, and patients can select multiple races and ethnicities.
"Obviously, that makes the data more complex," Cucina said, but it is also more accurate and descriptive. "People are complex. People have complex backgrounds."
An algorithm then groups together some of the categories patients select, for example grouping Latino patients of Mexican heritage one way and Latino patients of Afro-Caribbean heritage another, while preserving the more granular details each patient reported.
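The layered idea Cucina describes, where granular answers are kept as reported and groupings are derived from them, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the record fields, the rule table and the group names are hypothetical and are not UCSF's actual algorithm.

```python
# Minimal sketch of a two-layer record: the patient's granular answers are kept
# verbatim, and analytic groups are derived from them on demand.
# GROUP_RULES and the group names are HYPOTHETICAL, not UCSF's algorithm.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DemographicResponse:
    races: frozenset = field(default_factory=frozenset)        # e.g. {"Black"}
    ethnicities: frozenset = field(default_factory=frozenset)  # e.g. {"Mexican"}
    hispanic_latino: str = "unknown"  # "yes", "no", "unknown" or "declined"


# Hypothetical rule table: (Hispanic/Latino answer, ethnicity) -> analytic group.
GROUP_RULES = {
    ("yes", "Mexican"): "Latino, Mexican heritage",
    ("yes", "Afro-Caribbean"): "Latino, Afro-Caribbean heritage",
}


def derive_groups(response):
    """Return the sorted analytic groups implied by a response.

    The response is frozen and never modified, so analyses that need the
    granular answers can still read them directly.
    """
    groups = set()
    for ethnicity in response.ethnicities:
        group = GROUP_RULES.get((response.hispanic_latino, ethnicity))
        if group is not None:
            groups.add(group)
    if not groups:
        groups.add("Ungrouped")
    return sorted(groups)


patient = DemographicResponse(
    races=frozenset({"Black", "White"}),
    ethnicities=frozenset({"Afro-Caribbean"}),
    hispanic_latino="yes",
)
print(derive_groups(patient))  # ['Latino, Afro-Caribbean heritage']
print(sorted(patient.races))   # ['Black', 'White'] (granular data still intact)
```

Deriving groups at query time, instead of overwriting the stored answers, is one way to let some analyses treat categories separately and others treat them together.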
Those algorithmically derived groups help ensure relevant data aren't lost when analytics applications are run on data from multiracial and multiethnic patients.
“There may be analytic applications where those things should be considered separately, and there may be analytic applications where those things should be considered together, as a grouping,” Cucina said. Still, he acknowledges it’s complicated work. “We’ve put a lot of time and energy into this—which doesn’t mean we think we’ve perfected it.”