Multidimensional statistics metric in biological data analysis

Huang, Tzu-Hsueh
Dickson, Robert M.
Sepsis, the body’s serious response to infection, is still a leading cause of death around the world. Identifying the appropriate treatment, however, can take the clinical lab more than three days. Broad-spectrum antibiotics, which may not be effective, are therefore issued before the lab results are available. Inappropriate antibiotic treatments not only increase the mortality rate but also drive bacteria to acquire new resistance. This study focuses on identifying antibiotic resistance both phenotypically and genotypically. While the phenotypic antibiotic susceptibility test (AST) is accurate, the standard AST is very slow (three days). To rapidly determine the effective treatment, antibiotic-induced bacterial damage was monitored by flow cytometry. Probability binning – signature quadratic form (PB-sQF) was developed to analyze the cytometric data. PB-sQF adaptively bins the cytometric data, so far fewer bins are needed than with regular histograms. With PB-sQF, linear distances between data sets are calculated. As a result, data taken with different settings, on different machines, or on different days can be compared directly. With only one hour of bacteria-antibiotic incubation, the effective treatment can be selected by PB-sQF. This method reduces the time to result from 48 hours to 4 hours post-blood culture. For the pre-blood culture test, the bacterial count ranges from 1 to 100 CFU/mL in the presence of ~4×10⁹ cells/mL of blood cells. To separate bacteria from blood cells, saponin was used to selectively lyse the blood cells. The isolated bacteria can then be incubated in the appropriate culture medium. With only 5 hours of incubation (compared to 24 hours of blood culture), effective treatments can be selected by analyzing the cytometric data with PB-sQF. This pre-blood culture fast AST (FAST) can be done in 8 hours instead of the more than three days required by the standard AST.
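The two ideas behind PB-sQF, adaptive binning and a quadratic form distance between the resulting bin summaries, can be illustrated with a minimal Python sketch. This is not the thesis implementation. The median-split binning rule, the Gaussian similarity kernel, and every name and parameter here (`probability_bins`, `signature`, `sqfd`, `depth`, `alpha`) are assumptions chosen for illustration only.

```python
import numpy as np

def probability_bins(data, depth):
    """Split events into 2**depth bins of (nearly) equal counts.

    Assumed rule: at each level, cut every bin at the median of its
    highest-variance dimension.
    """
    bins = [data]
    for _ in range(depth):
        next_bins = []
        for b in bins:
            if len(b) < 2:          # too few events to split further
                next_bins.append(b)
                continue
            dim = np.argmax(b.var(axis=0))
            order = np.argsort(b[:, dim], kind="stable")
            half = len(b) // 2
            next_bins.append(b[order[:half]])
            next_bins.append(b[order[half:]])
        bins = next_bins
    return bins

def signature(data, depth=3):
    """Summarize a data set as (bin centroids, bin weights)."""
    bins = probability_bins(data, depth)
    centroids = np.array([b.mean(axis=0) for b in bins])
    weights = np.array([len(b) for b in bins], dtype=float) / len(data)
    return centroids, weights

def sqfd(sig_a, sig_b, alpha=1.0):
    """Signature quadratic form distance with an assumed Gaussian kernel."""
    ca, wa = sig_a
    cb, wb = sig_b
    c = np.vstack([ca, cb])
    w = np.concatenate([wa, -wb])   # weights of the second set enter negated
    d2 = ((c[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)
    similarity = np.exp(-alpha * d2)
    return float(np.sqrt(max(w @ similarity @ w, 0.0)))

# Synthetic two-parameter "cytometry" events, purely for illustration.
rng = np.random.default_rng(0)
control = rng.normal(size=(512, 2))
treated = control + 3.0             # a shifted population
d_same = sqfd(signature(control), signature(control))
d_shift = sqfd(signature(control), signature(treated))
```

In this toy setting, a data set compared with itself gives a distance of essentially zero, while the shifted population gives a clearly larger one. Because each signature carries its own bin centroids, the two data sets do not need to share a fixed histogram grid.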
Although genotypic tests can detect only known resistance mechanisms, they can be performed much faster than the traditional AST. While the presence of a resistance gene is an important indicator of multidrug-resistant bacteria, the copy number of a given resistance gene is also a determining factor for the resistant phenotype. To estimate the copy number and determine whether there are copy number variations between the query sequence and the reference sequence, sequence analysis methods were developed. First, nearest-neighbor (NN) mapping is used to map the short reads from the next-generation sequencer to the reference sequence. NN results are linear with the number of repeated regions in the reference sequence, and NN is error-tolerant compared to mrFAST, BWA-MEM and Bowtie2. We then developed copy number variation detection with mapping multiplicity (CNVMM) to analyze the mapping results from NN. Whereas existing CNV detectors cannot properly account for multiple copies in the reference sequence, CNVMM adjusts for the repeated regions by estimating the number of copies in the reference from the NN mapping results. We demonstrate that NN-CNVMM performs better than mrFAST-mrCaNaVaR and MAQ-CNVnator. Using NN-CNVMM on short-read data from a multidrug-resistant Acinetobacter clinical isolate, we found that a carbapenem resistance-related gene has 10-fold more copies in the clinical isolate than in the sensitive reference strain.
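Why repeated regions in the reference matter for copy number estimation can be shown with a toy calculation. The sketch below is not the CNVMM algorithm, whose details are not given here. It assumes a simplified model in which each read reports how many reference locations it maps to, and in which the expected number of reads from a single gene copy (`reads_per_copy`, a hypothetical calibration value) is known. All names and numbers are made up for illustration.

```python
from statistics import median

def copy_number_ratio(multiplicities, reads_per_copy):
    """Toy estimate of query copies relative to reference copies.

    multiplicities: for every read assigned to one gene, the number of
        reference locations that read mapped to.
    reads_per_copy: assumed number of reads produced by a single copy
        at this sequencing depth (hypothetical calibration value).
    """
    # Copies in the reference, read off the typical mapping multiplicity.
    reference_copies = median(multiplicities)
    # Copies in the query, from the total number of reads for the gene.
    query_copies = len(multiplicities) / reads_per_copy
    return query_copies, reference_copies, query_copies / reference_copies

# Made-up input: 150 reads, each mapping to 3 reference locations.
query, reference, ratio = copy_number_ratio([3] * 150, reads_per_copy=25)
```

With these made-up inputs the gene would be called at 6 copies in the query against 3 in the reference, a ratio of 2. A detector that treated the reference as single-copy would instead report the raw value of 6 as the fold change, which is the kind of error a multiplicity adjustment is meant to avoid.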