Applications of sequential mining and data modeling for personalized medicine

Malhotra, Kunal
Navathe, Shamkant B.
Sun, Jimeng
Healthcare data modeling and analytics has gathered momentum as an area of study, especially since the increased adoption of Electronic Health Records (EHRs) by provider facilities. Clinicians gain valuable insight into the efficacy of treatments from analysis of patients' historical medical data. Not all patients respond similarly to existing treatment regimens, so it is advantageous to stratify patients so that customized treatment plans can be designed, which in turn reduces their financial burden. Personalized medicine is a relatively new area of medicine that involves identifying patient profiles and recommending appropriate medical interventions based on clinical, genomic, and social factors. Another challenge in healthcare analytics is that the standard coding structures used by EHRs today are extremely difficult to interpret, which creates a dependence on medical ontologies to improve patient profiling. Given this context, the dissertation begins by examining sequential mining approaches to study treatment patterns for a variety of diseases, ranging from very rare cancers such as glioblastoma (GBM) to some of the more prevalent diseases worldwide, including heart disease and epilepsy. We propose a non-conventional graph-based approach to mine sequential patterns from medical data and define clinically relevant constraints to be applied to the graph. Predictive models that leverage these patterns are developed for survival analysis. Treatment pathways are generated from sequential patterns to gain insight into the actual practices followed by epilepsy domain experts, and models are developed to predict the drug-resistant epilepsy patient population. We also include a study of variation in the treatment protocols followed by physicians in breast cancer, autism, and heart disease.
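As a rough illustration of what mining sequential patterns on a graph under a clinical constraint can look like, the sketch below builds a directed graph whose edges are consecutive treatment events and keeps only edges supported by enough patients. It is not the dissertation's algorithm: the function name, the `max_gap_days` constraint, the `min_support` threshold, and the toy event labels are all assumptions made for the example.

```python
from collections import defaultdict

def build_transition_graph(sequences, max_gap_days=90, min_support=2):
    """Return {(event_a, event_b): patient_count} for frequent transitions.

    sequences: dict mapping a patient id to a list of (day, event) tuples.
    max_gap_days: hypothetical clinical constraint; two consecutive events
        further apart than this are not linked.
    min_support: minimum number of distinct patients an edge must have.
    """
    supporters = defaultdict(set)
    for patient_id, events in sequences.items():
        ordered = sorted(events)  # order each patient's events by day
        for (day_a, ev_a), (day_b, ev_b) in zip(ordered, ordered[1:]):
            if day_b - day_a <= max_gap_days:
                supporters[(ev_a, ev_b)].add(patient_id)
    return {edge: len(pids) for edge, pids in supporters.items()
            if len(pids) >= min_support}

# Toy data, invented for illustration only.
toy_sequences = {
    "p1": [(0, "surgery"), (30, "radiation"), (60, "chemo")],
    "p2": [(0, "surgery"), (20, "radiation"), (200, "chemo")],
    "p3": [(0, "surgery"), (45, "chemo")],
}
frequent_edges = build_transition_graph(toy_sequences)
```

With these toy records only the surgery-to-radiation edge survives, because the other transitions are either too far apart in time or seen in a single patient.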
A known challenge in developing predictive models in healthcare is using diagnosis information in a clinically meaningful way. We develop an adaptive approach that leverages the hierarchical nature of medical ontologies to group raw diagnosis codes into optimal, clinically meaningful categories that yield clean interpretation and strong predictive performance. The thesis also addresses the management of schema evolution that arises when clinicians disagree on the common data elements (CDEs) for a particular disease. Customizing EHR systems is costly and is not the preferred choice for small medical facilities. We present an automated approach to customize the back-end database schemas without compromising consistency.
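One simple way to picture hierarchy-driven grouping of diagnosis codes is a support-based roll-up: a code that occurs too rarely is merged into its parent until the group is large enough. The sketch below is only a stand-in under stated assumptions, not the adaptive method developed in the thesis and with no claim of optimality. The `parent` function assumes a prefix-style hierarchy for dotted codes, and `min_count` is a hypothetical threshold.

```python
from collections import Counter

def parent(code):
    """Assumed hierarchy: drop the last character; 3-character codes are roots."""
    if len(code) <= 3:
        return None
    return code[:-1].rstrip(".")

def group_codes(codes, min_count=3):
    """Map each raw code to the finest ancestor whose group has enough records."""
    counts = Counter(codes)
    mapping = {code: code for code in counts}
    changed = True
    while changed:
        changed = False
        # Recompute the size of every current group.
        group_sizes = Counter()
        for code, n in counts.items():
            group_sizes[mapping[code]] += n
        # Roll sparse groups up one level, if a parent exists.
        for code in counts:
            group = mapping[code]
            up = parent(group)
            if group_sizes[group] < min_count and up is not None:
                mapping[code] = up
                changed = True
    return mapping

# Toy code list, invented for illustration only.
toy_codes = ["250.01", "250.02", "250.10"] + ["401.9"] * 4
grouping = group_codes(toy_codes)
```

Here the three sparse "250.x" codes end up in a single "250" group, while "401.9" is frequent enough to stay at its original level.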