Person:
Voit, Eberhard O.

Publication Search Results

  • Item
    Identification of neutral sets of biochemical systems models from time series data
    (Georgia Institute of Technology, 2009-05) Vilela, Marco ; Vinga, Susana ; Grivet, Marco A. ; Maia, Mattoso ; Voit, Eberhard O. ; Almeida, Jonas S.
    Background: The major difficulty in modeling biological systems from multivariate time series is the identification of parameter sets that endow a model with dynamical behaviors sufficiently similar to the experimental data. Directly related to this parameter estimation issue is the task of identifying the structure and regulation of ill-characterized systems. Both tasks are simplified if the mathematical model is canonical, i.e., if it is constructed according to strict guidelines. Results: In this report, we propose a method for the identification of admissible parameter sets of canonical S-systems from biological time series. The method is based on a Monte Carlo process that is combined with an improved version of our previous parameter optimization algorithm. The method maps the parameter space into the network space, which characterizes the connectivity among components, by creating an ensemble of decoupled S-system models that imitate the dynamical behavior of the time series with sufficient accuracy. The concept of sloppiness is revisited in the context of these S-system models with an exploration not only of different parameter sets that produce similar dynamical behaviors but also different network topologies that yield dynamical similarity. Conclusion: The proposed parameter estimation methodology was applied to actual time series data from the glycolytic pathway of the bacterium Lactococcus lactis and led to ensembles of models with different network topologies. In parallel, the parameter optimization algorithm was applied to the same dynamical data upon imposing a pre-specified network topology derived from prior biological knowledge, and the results from both strategies were compared. The results suggest that the proposed method may serve as a powerful exploration tool for testing hypotheses and the design of new experiments.
  • Item
    Parameter optimization in S-system models
    (Georgia Institute of Technology, 2008-04) Vilela, Marco ; Chou, I-Chun ; Vinga, Susana ; Vasconcelos, Ana Tereza R. ; Voit, Eberhard O. ; Almeida, Jonas S.
    Background: The inverse problem of identifying the topology of biological networks from their time series responses is a cornerstone challenge in systems biology. We tackle this challenge here through the parameterization of S-system models. It was previously shown that parameter identification can be performed as an optimization based on the decoupling of the differential S-system equations, which results in a set of algebraic equations. Results: A novel parameterization solution is proposed for the identification of S-system models from time series when no information about the network topology is known. The method is based on eigenvector optimization of a matrix formed from multiple regression equations of the linearized decoupled S-system. Furthermore, the algorithm is extended to the optimization of network topologies with constraints on metabolites and fluxes. These constraints rejoin the system in cases where it had been fragmented by decoupling. We demonstrate with synthetic time series why the algorithm can be expected to converge in most cases. Conclusion: A procedure was developed that facilitates automated reverse engineering tasks for biological networks using S-systems. The proposed method of eigenvector optimization constitutes an advancement over S-system parameter identification from time series using a recent method called Alternating Regression. The proposed method overcomes convergence issues encountered in alternating regression by identifying nonlinear constraints that restrict the search space to computationally feasible solutions. Because the parameter identification is still performed for each metabolite separately, the modularity and linear time characteristics of the alternating regression method are preserved. Simulation studies illustrate how the proposed algorithm identifies the correct network topology out of a collection of models which all fit the dynamical time series essentially equally well.
  • Item
    Automated smoother for numerical decoupling of dynamic models
    (Georgia Institute of Technology, 2007-08) Vilela, Marco ; Borges, Carlos C. H. ; Vinga, Susana ; Vasconcelos, Ana Tereza R. ; Santos, Helena ; Voit, Eberhard O. ; Almeida, Jonas S.
    Background: Structure identification of dynamic models for complex biological systems is the cornerstone of their reverse engineering. Biochemical Systems Theory (BST) offers a particularly convenient solution because its parameters are kinetic-order coefficients which directly identify the topology of the underlying network of processes. We have previously proposed a numerical decoupling procedure that allows the identification of multivariate dynamic models of complex biological processes. While described here within the context of BST, this procedure has general applicability to signal extraction. Our original implementation relied on artificial neural networks (ANN), which caused slight, undesirable bias during the smoothing of the time courses. As an alternative, we propose here an adaptation of Whittaker's smoother and demonstrate its role within a robust, fully automated structure identification procedure. Results: In this report we propose a robust, fully automated solution for signal extraction from time series, which is the prerequisite for the efficient reverse engineering of biological systems models. Whittaker's smoother is reformulated within the context of information theory and extended by the development of adaptive signal segmentation to account for heterogeneous noise structures. The resulting procedure can be used on arbitrary time series with a nonstationary noise process; it is illustrated here with metabolic profiles obtained from in-vivo NMR experiments. The smoothed solution that is free of parametric bias permits differentiation, which is crucial for the numerical decoupling of systems of differential equations. Conclusion: The method is applicable in signal extraction from time series with nonstationary noise structure and can be applied in the numerical decoupling of systems of differential equations into algebraic equations, and thus constitutes a rather general tool for the reverse engineering of mechanistic model descriptions from multivariate experimental time series.
  • Item
    A multivariate prediction model for microarray cross-hybridization
    (Georgia Institute of Technology, 2006) Chen, Yian A. ; Chou, Cheng-Chung ; Lu, Xinghua ; Slate, Elizabeth H. ; Peck, Konan ; Xu, Wenying ; Voit, Eberhard O. ; Almeida, Jonas S.
    Background: Expression microarray analysis is one of the most popular molecular diagnostic techniques in the post-genomic era. However, this technique faces the fundamental problem of potential cross-hybridization. This is a pervasive problem for both oligonucleotide and cDNA microarrays; it is considered particularly problematic for the latter. No comprehensive multivariate predictive modeling has been performed to understand how multiple variables contribute to (cross-) hybridization. Results: We propose a systematic search strategy using multiple multivariate models [multiple linear regressions, regression trees, and artificial neural network analyses (ANNs)] to select an effective set of predictors for hybridization. We validate this approach on a set of DNA microarrays with cytochrome p450 family genes. The performance of our multiple multivariate models is compared with that of a recently proposed third-order polynomial regression method that uses percent identity as the sole predictor. All multivariate models agree that the 'most contiguous base pairs between probe and target sequences,' rather than percent identity, is the best univariate predictor. The predictive power is improved by inclusion of additional nonlinear effects, in particular target GC content, when regression trees or ANNs are used. Conclusion: A systematic multivariate approach is provided to assess the importance of multiple sequence features for hybridization and of relationships among these features. This approach can easily be applied to larger datasets. This will allow future developments of generalized hybridization models that will be able to correct for false-positive cross-hybridization signals in expression experiments.
  • Item
    Priming nonlinear searches for pathway identification
    (Georgia Institute of Technology, 2004-09-14) Veflingstad, Siren R. ; Almeida, Jonas S. ; Voit, Eberhard O.
    Background: Dense time series of metabolite concentrations or of the expression patterns of proteins may be available in the near future as a result of the rapid development of novel, high-throughput experimental techniques. Such time series implicitly contain valuable information about the connectivity and regulatory structure of the underlying metabolic or proteomic networks. The extraction of this information is a challenging task because it usually requires nonlinear estimation methods that involve iterative search algorithms. Priming these algorithms with high-quality initial guesses can greatly accelerate the search process. In this article, we propose to obtain such guesses by preprocessing the temporal profile data and fitting them preliminarily by multivariate linear regression. Results: The results of a small-scale analysis indicate that the regression coefficients reflect the connectivity of the network quite well. Using the mathematical modeling framework of Biochemical Systems Theory (BST), we also show that the regression coefficients may be translated into constraints on the parameter values of the nonlinear BST model, thereby reducing the parameter search space considerably. Conclusion: The proposed method provides a good approach for obtaining a preliminary network structure from dense time series. This will be more valuable as the systems become larger, because preprocessing and effective priming can significantly limit the search space of parameters defining the network connectivity, thereby facilitating the nonlinear estimation task.
  • Item
    XML4MAT: Inter-conversion between Matlab™ structured variables and the markup language MbML
    (Georgia Institute of Technology, 2003-12) Almeida, Jonas S. ; Wu, Shuyuan ; Voit, Eberhard O.
    The Matlab™ programming environment and related public license environments such as Octave are gaining in popularity for the identification of algorithms and the rapid prototyping of applications in bioinformatics. At the same time, there is a strong push to standardize the identification of extended modelling languages (XML) and their underlying ontologies, to facilitate bioinformatic integration of data and methods. We hereby introduce a new m-file library, XML4MAT, that supports the inter-conversion between any Matlab™ structured variable and a specialized extended markup language (XML), designated as MbML. The library developed also includes functions to import non-MbML compliant XML structures. The functionality described is achieved without object-oriented programming, which makes it ideal for inclusion in declarative programming and implicitly turns m-structures into general-purpose object models for data structures. The new library is made freely available at http://bioinformatics.musc.edu/xml4mat. It is ideally suited for 1) computation of XML structures in Matlab programming environments and 2) its inter-conversion to and from a specialized markup language, MbML. This also enables using Matlab structures as a format to identify new markup languages that are MbML compliant, with the corresponding gain in clarity and computability for bioinformatic applications in that environment.