
University of California researchers are making groundbreaking use of information from electronic health records with billions of data points to fast-track breakthrough insights in medical practice and treatment. This approach, enabled by data science methods, makes medical research more efficient by updating the traditional methods of collecting and analyzing clinical data.

Two recent publications demonstrate how data science in medical research has the potential to accelerate a national study of breast cancer detection methods and to improve understanding of COVID-19 breakthrough infection risk at the pace of the virus.

These studies have been made possible through UC’s Center for Data-driven Insights and Innovation (CDI2), which manages the UC Health Data Warehouse (UCHDW). The UCHDW is distinguished by its volume of information and the diversity represented in data from patients of UC’s six academic health centers—UC Davis Health, UCI Health, UCLA Health, UCR Health, UC San Diego Health and UCSF Health—going back more than 10 years. UCHDW also includes publicly available data from other sources, such as the California Department of Health Care Access and Information (HCAI). 

“This kind of data science-driven research is truly a leap forward in our ability to bring precision medicine to more people and close the health equity gap, especially in California’s most vulnerable communities,” says Atul Butte, M.D., Ph.D., University of California Health’s chief data scientist and head of CDI2. 

“In some cases, years and billions of dollars would be required for randomized clinical trials to arrive at these findings. What’s more, the diversity represented in this data allows us to study the impacts of care and treatment by age, gender, race and ethnicity, and more,” added Butte. 

The UCHDW contains data for over 9 million patients, including more than 5.2 billion vital signs and test result measurements, as well as billions of procedures and medication orders and prescriptions, tens of thousands of sequenced cancer genomes and more than a billion diagnosis codes. 

Better detection of breast cancer

The Women Informed to Screen Depending On Measures of risk (WISDOM) Study is investigating whether personalized screening based on a woman’s individual health and medical history is more effective than annual screening for breast cancer. Launched within the UC system in 2016, the study includes more than 55,000 patients (of whom 18,000 are UC patients) and has been extended to include women nationwide. Historically, collection of confirmed cancer diagnoses has relied on patients electing to self-report and on state and national cancer registries, where data validation can take months or longer to finalize.

Katherine Leggat-Barr, M.D., and colleagues at UCSF led a study to explore whether the UCHDW could play a role in speeding the collection of confirmed diagnoses for the WISDOM Study.

Published in the American Society of Clinical Oncology’s Journal of Cancer Clinical Informatics, the study by Leggat-Barr and her team shows that combining the real-world data in the UCHDW with self-reported data yields more complete information than self-reported data alone, although it is less complete than cancer registries once registry data is finalized after two years.

COVID-19 breakthrough infection insights at the pace of virus mutation

The University of California COVID Research Data Set (UC CORDS) was one of the nation’s first COVID-19-specific sets of real-time data from patient encounters, created in a matter of weeks at the start of the pandemic. UC CORDS has remained crucial to fighting the virus as well as understanding the risk and outcomes of illness for diverse populations. 

Most recently, Michael Hogarth, M.D., UC San Diego, and a research team used UC CORDS to document the odds of breakthrough infection for a wide range of specific comorbidities.

The work was published in The American Journal of the Medical Sciences. Because patient data is continually added to the UCHDW, UC CORDS keeps pace with changes in clinical data in real time. Hogarth and team’s analysis provides a repeatable framework that researchers can use as COVID-19 moves from pandemic to endemic.

Recognition for thought leadership

UC Health Chief Data Scientist Atul Butte, M.D., Ph.D.

In recognition of his contribution to the transformation of health care through the use of health data, the American Medical Informatics Association has presented Butte with the 2023 William W. Stead Award for Thought Leadership in Informatics.

As a result of Butte’s leadership, data-driven research is expanding as intended, both locally and nationwide. At the local level, individual UC academic health centers are adopting the principles modeled by CDI2, including at UC San Diego, which has emulated the UC CORDS model in creating a data warehouse for all local electronic health records and establishing a team dedicated to facilitating researchers’ use of data. At the same time, UC-initiated research has expanded to include patients beyond the UC Health system, as in the case of the WISDOM Study. The research design is a model that can be replicated by other institutions and organizations that have an interest in advancing their own data-driven efforts.

About University of California Health

University of California Health comprises six academic health centers, 20 health professional schools, a Global Health Institute and systemwide services that improve the health of patients and the University’s students, faculty and employees. All of UC’s hospitals are ranked among the best in California, and its medical schools and health professional schools are nationally ranked in their respective areas.