Structured Data and Resource Sharing in Open Source Computational Chemistry

When designing drugs and materials, researchers must examine how systems of molecules interact with each other. Running such experiments in the real world and measuring their results is expensive and difficult, because physical limitations make ideal conditions hard to provide. The field of computational chemistry addresses this problem by developing advanced software packages, such as PSI4, that simulate complicated molecular interactions. These tools take large amounts of data as input parameters and generate large amounts of data as output. Depending on the desired accuracy of the results, a simulation can take hours or days to run. Sharing this data with other groups is essential for research to move quickly, yet the volume of data produced by these simulations is difficult for a single group to manage, let alone share effectively. This paper describes the implementation of a computational chemistry data storage and dissemination system that lets groups easily structure the data in chemical output for analysis and distribution. By structuring this chemical data rather than just storing it in a flat file, researchers can quickly derive important insights; initial results show that advanced queries for chemical properties of this structured data can execute in around 2 milliseconds. This data storage system introduces a flexible data structuring standard to the field of computational chemistry.
