Person:
Gibson, Greg

Publication Search Results

  • Item
    Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes
    (Georgia Institute of Technology, 2007-02-23) Bushel, Pierre R. ; Wolfinger, Russell D. ; Gibson, Greg
    Background: Commonly employed clustering methods for analysis of gene expression data do not directly incorporate phenotypic data about the samples. Furthermore, clustering of samples with known phenotypes is typically performed in an informal fashion. The inability of clustering algorithms to incorporate biological data in the grouping process can limit proper interpretation of the data and its underlying biology. Results: We present a more formal approach, the modk-prototypes algorithm, for clustering biological samples based on simultaneously considering microarray gene expression data and classes of known phenotypic variables such as clinical chemistry evaluations and histopathologic observations. The strategy involves constructing an objective function that measures the dissimilarity of the samples, using the sum of the squared Euclidean distances for numeric microarray and clinical chemistry data and simple matching for categorical histopathology values. Separate weighting terms are used for microarray, clinical chemistry and histopathology measurements to control the influence of each data domain on the clustering of the samples. The dynamic validity index for numeric data was modified with a category utility measure for determining the number of clusters in the data sets. A cluster's prototype, formed from the mean of the values for numeric features and the mode of the categorical values of all the samples in the group, is representative of the phenotype of the cluster members. The approach is shown to work well with a simulated mixed data set and two real data examples containing numeric and categorical data types: one from a heart disease study and another from acetaminophen (an analgesic) exposure in rat liver that causes centrilobular necrosis.
Conclusion: The modk-prototypes algorithm partitioned the simulated data into clusters with samples in their respective class group and the heart disease samples into two groups (sick and buff denoting samples having pain type representative of angina and non-angina respectively) with an accuracy of 79%. This is on par with, or better than, the assignment accuracy of the heart disease samples by several well-known and successful clustering algorithms. Following modk-prototypes clustering of the acetaminophen-exposed samples, informative genes from the cluster prototypes were identified that are descriptive of, and phenotypically anchored to, levels of necrosis of the centrilobular region of the rat liver. The biological processes cell growth and/or maintenance, amine metabolism, and stress response were shown to discern between no and moderate levels of acetaminophen-induced centrilobular necrosis. The use of well-known and traditional measurements directly in the clustering provides some guarantee that the resulting clusters will be meaningfully interpretable.
  • Item
    A mixed model approach to identify yeast transcriptional regulatory motifs via microarray experiments
    (Georgia Institute of Technology, 2004) Yu, Xiang ; Chu, Tzu-Ming ; Gibson, Greg ; Wolfinger, Russell D.
    A genome-wide location analysis method has been introduced as a means to simultaneously study protein-DNA binding interactions for a large number of genes on a microarray platform. Identification of interactions between transcription factors (TF) and genes provides insight into the mechanisms that regulate a variety of cellular responses. Drawing proper inferences from the experimental data is key to finding statistically significant TF-gene binding interactions. We describe how the analysis and interpretation of genome-wide location data can be fit into a traditional statistical modeling framework that considers the data across all arrays and formalizes appropriate hypothesis tests. The approach is illustrated with data from a yeast transcription factor binding experiment, which shows how identified TF-gene interactions can enhance initial exploration of transcriptional regulatory networks. Examples of five kinds of transcriptional regulatory structure are also demonstrated. Some stark differences with previously published results are explored.
  • Item
    Assessing gene significance from cDNA microarray expression data via mixed models
    (Georgia Institute of Technology, 2001) Wolfinger, Russell D. ; Gibson, Greg ; Wolfinger, Elizabeth D. ; Bennett, Lee ; Hamadeh, Hisham ; Ashari, Cynthia ; Paules, Richard S.
    The determination of a list of differentially expressed genes is a basic objective in many cDNA microarray experiments. We present a statistical approach that allows direct control over the percentage of false positives in such a list and, under certain reasonable assumptions, improves on existing methods with respect to the percentage of false negatives. The method accommodates a wide variety of experimental designs and can simultaneously assess significant differences between multiple types of biological samples. Two interconnected mixed linear models are central to the method and provide a flexible means to properly account for variability both across and within genes. The mixed model also provides a convenient framework for evaluating the statistical power of any particular experimental design and thus enables a researcher to a priori select an appropriate number of replicates. We also suggest some basic graphics for visualizing lists of significant genes. Analyses of published experiments studying human cancer and yeast cells illustrate the results.
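
Illustrative note on the first item (modk-prototypes): its abstract describes a weighted dissimilarity that combines squared Euclidean distance for numeric data with simple matching for categorical data, and a cluster prototype built from numeric means and categorical modes. The following is a minimal Python sketch of those two ideas only, not the authors' implementation. The function names, the dictionary keys (`expr`, `chem`, `path`), and the default weights are assumptions made for illustration; the published algorithm's exact weighting scheme and its modified dynamic validity index are not reproduced here.

```python
from collections import Counter
from statistics import mean


def mixed_dissimilarity(sample, prototype, w_expr=1.0, w_chem=1.0, w_path=1.0):
    """Weighted dissimilarity between a sample and a cluster prototype.

    Both arguments are dicts with hypothetical keys:
      'expr' - numeric gene expression values
      'chem' - numeric clinical chemistry values
      'path' - categorical histopathology labels
    """
    # Squared Euclidean distance for the two numeric domains.
    d_expr = sum((a - b) ** 2 for a, b in zip(sample["expr"], prototype["expr"]))
    d_chem = sum((a - b) ** 2 for a, b in zip(sample["chem"], prototype["chem"]))
    # Simple matching for the categorical domain: count the mismatches.
    d_path = sum(a != b for a, b in zip(sample["path"], prototype["path"]))
    # One weight per data domain controls its influence on the total.
    return w_expr * d_expr + w_chem * d_chem + w_path * d_path


def cluster_prototype(samples):
    """Prototype of a cluster: mean of numeric features, mode of categorical ones."""
    return {
        "expr": [mean(col) for col in zip(*(s["expr"] for s in samples))],
        "chem": [mean(col) for col in zip(*(s["chem"] for s in samples))],
        "path": [
            Counter(col).most_common(1)[0][0]
            for col in zip(*(s["path"] for s in samples))
        ],
    }


# Made-up toy samples, for illustration only.
a = {"expr": [1.0, 2.0], "chem": [10.0], "path": ["none", "mild"]}
b = {"expr": [3.0, 2.0], "chem": [14.0], "path": ["none", "severe"]}
c = {"expr": [2.0, 5.0], "chem": [12.0], "path": ["none", "mild"]}

proto = cluster_prototype([a, b, c])
```

In a k-prototypes style loop, each sample would be assigned to the prototype with the smallest `mixed_dissimilarity`, after which the prototypes are recomputed; lowering one of the weights reduces the pull of that data domain on the assignment.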