The role of big-data in clinical studies in laboratory medicine

Zhongheng Zhang


The primary aim of laboratory medicine is to achieve an accurate diagnosis or risk stratification. A diagnostic tool is considered useful if its diagnostic performance is good, but it cannot be routinely recommended for clinical use unless it also improves patient-important outcomes; this latter property is the effectiveness of the diagnostic tool. Although randomized controlled trials (RCTs) are the gold standard for establishing effectiveness, big-data clinical studies using electronic healthcare records (EHR) provide an alternative source of evidence to guide decision-making. RCTs are limited by strict inclusion/exclusion criteria, high costs and ethical constraints, whereas EHR contain data at patient-level granularity that can help to disentangle complex research questions. The Achilles’ heel of observational studies (big-data studies are observational in nature) is uncontrolled confounding. Although unmeasured confounding factors still cannot be controlled for, the re-use of EHR allows researchers to adjust for as many measured confounders as possible, because, compared with registries and administrative databases, EHR contain more detailed information on demographics, drugs, procedures and operations. Recent decades have witnessed an exponential increase in the application of machine learning techniques in laboratory medicine, and these techniques continue to provide advanced tools for better prediction of diseases.
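To illustrate why uncontrolled confounding is the main weakness of observational EHR studies, the following is a minimal sketch on simulated, entirely hypothetical data (not drawn from any real EHR). It assumes a single measured confounder, disease severity: severe patients are more likely to receive a laboratory test and more likely to die, while the test itself is assumed to lower the risk of death by 5 percentage points. Direct standardization by severity is used here only because it is the simplest adjustment method; regression or propensity-score approaches would serve the same purpose. Like any such method, it can only remove confounding by variables that are actually recorded.

```python
import random


def simulate_ehr(n, seed=0):
    """Simulate hypothetical EHR-like records as (severe, tested, died) tuples.

    Illustrative assumptions only: 30% of patients are severe; severe patients
    are tested with probability 0.8 (others 0.2); baseline risk of death is
    0.40 (severe) or 0.05 (non-severe); testing lowers that risk by 0.05.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        severe = rng.random() < 0.3
        tested = rng.random() < (0.8 if severe else 0.2)
        base_risk = 0.40 if severe else 0.05
        died = rng.random() < base_risk - (0.05 if tested else 0.0)
        records.append((severe, tested, died))
    return records


def risk(records, tested, severe=None):
    """Proportion who died among patients with the given test status,
    optionally restricted to one severity stratum."""
    rows = [r for r in records
            if r[1] == tested and (severe is None or r[0] == severe)]
    return sum(r[2] for r in rows) / len(rows)


def crude_rd(records):
    """Unadjusted risk difference (tested minus untested); confounded by severity."""
    return risk(records, True) - risk(records, False)


def standardized_rd(records):
    """Risk difference standardized to the severity distribution of the whole cohort."""
    n = len(records)
    total = 0.0
    for severe in (False, True):
        share = sum(1 for r in records if r[0] == severe) / n
        stratum_rd = risk(records, True, severe) - risk(records, False, severe)
        total += share * stratum_rd
    return total
```

With, for example, `records = simulate_ehr(100_000, seed=1)`, the crude risk difference is positive (about +0.14 in expectation), falsely suggesting that testing is harmful, whereas the severity-standardized estimate lies close to the simulated true effect of -0.05.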