OK, I want to start with a story. I graduated in '99, and you don't really know a whole lot about me, so I figured I'd add a little story. I had to make a decision, and I know several of you in here are maybe thinking about an academic position. For me the decision was a post-doc or an assistant professor position, and I didn't know which way to choose. I ended up working a little bit with Eli Ruckenstein, and I'm sure several of the faculty know or have heard of Eli Ruckenstein. Dr. Ruckenstein: everything that you did, he did about thirty years ago. That's if his presence at seminars is any indication. So I worked with him and got to talk to him a little bit, and when I was deciding on schools he gave me advice. He basically said, well, you can be a small fish in a big sea, these are his words, a small fish in a big sea or a big fish in a small sea. And then he followed up by discussing how you can effect change at a smaller institution. That kind of stuck with me.
So when I had the opportunity to make a decision about Tennessee Tech, it was really a school that I had an opportunity to effect change at, and that was really important for me, because maybe I'm the type of person that sees something and thinks, well, I know how to do it better than that. I kind of saw that in the Tennessee Tech program. So I just want you to think about how you can effect change. When I started at Tennessee Tech, my startup package was a computer and a comfortable chair. Literally. And I am telling you that because I went to the AIChE meeting that year and I was the only faculty member. There were no graduate students, no undergraduate students at the meeting, just myself
from Tennessee Tech, and everyone asked me, "Where is Tennessee Tech?" And I'd say, well, it's halfway between Nashville and Knoxville along I-40, to the point where I put a figure of Tennessee up and show where Tennessee Tech is. However, last year all of our faculty went to the meeting, and we had thirty-some-odd contributions. A big chunk of our graduate students went there, and about thirty undergraduate students, so we really have a much larger presence at AIChE now. And I think at some level I adhered to Dr. Ruckenstein's commentary about making an impact. I have a lot invested in Tennessee Tech, so I'm very, very proud of that.
OK, so, my summer opportunities. I don't have a startup package, so I want something to do during the summer, hopefully something that will catch on. So I started to send out information to different national labs, and thankfully Sandia ended up contacting me. They said, OK, you have this background in molecular simulations; we have this larger algorithm to do some drug design, and we need someone to calculate free energies of ligands in binding pockets. I said, OK, that's within my skill set, I can do that. So I'd be working with Jean-Loup Faulon, who was a researcher at Sandia. I show up in Albuquerque and Jean-Loup isn't there; he had actually just transferred to Livermore about two weeks before that. And by the way, they're not at step four, docking free energies, in the larger algorithm. They're actually on step one, which is chemical graph theory. So I started out at step one, in an area I didn't really know anything about, working with someone who wasn't physically where I needed him to be. And that's kind of a blessing, because oftentimes if you look at something with a unique perspective, you're not limited by what you already know about a particular field. So, with that,
that's how this topic of computer-aided molecular design comes into play, because my background is in molecular simulations. So this was working on something totally new.
About this seminar: I'd never before gotten a letter for a seminar with instructions that the first half or so of the presentation should be an overview or tutorial on your subject. I thought that was very interesting, and I wanted to adhere to it, because I've been in seminars where I didn't really understand a whole lot, as the person sitting where you are right there, trying to understand and see the presentation clearly. So I'm going to adhere to this for the first half of the seminar.
OK, so what is QSAR? QSARs are quantitative structure-activity relationships. They're quantitative in that they're not trends, where things go up or things go down; you actually have values. That's why it's quantitative. And it's structure-activity, meaning a change in the structure of the molecule correlates to a certain change in activity, hopefully in model form. So that's the definition of a quantitative structure-activity relationship.
So, the origin. In the 1800s there were structure-activity relationships identified. But really the beginning of quantitative structure-activity relationships can be traced to about 1962, with Hansch and coworkers, and really to extensions of what they'd done a little bit after 1962.
So what about QSAR right now? If you want to find a QSAR, or if you want to publish a QSAR, you can look right now in the Journal of Medicinal Chemistry or SAR and QSAR in Environmental Research. There is this one, QSAR & Combinatorial Science, which next year is becoming Molecular Informatics, and I think that's
not only a change in the focus but also kind of getting away from the word QSAR. And finally this one, the Journal of Chemical Information and Modeling. It used to be the Journal of Chemical Information and Computer Sciences, and it split off into two journals, one of which is this one, and they're really much less focused on QSAR. To the point where this article came out last year: "QSAR: dead or alive?" It's a perspective from an industrial person in the pharmaceutical industry about whether people are actually using QSARs in the way that they should be used.
So that's QSAR. What about quantitative structure-property relationships? The way I like to think about it is, well, it's just like a QSAR except it's generic. The property for QSARs is activity, but the property for QSPRs is whatever property you happen to be interested in. Now, you know them. You've likely used them, but maybe you haven't really thought about them in that way. If you look at, say, plotting the normal boiling point for normal alkanes as a function of carbon number, you can correlate that fairly well linearly, and so this is a very simplistic example of a quantitative structure-activity relationship. And if you so desire, you can pull off the shelf The Properties of Gases and Liquids, and really that's just a bunch of quantitative structure-property relationships. I'm sure I saw this book on some bookshelves today during my visit.
So what about QSPR in chemical engineering? It's not dead. How much is it not dead? Here's an article from this year in Industrial & Engineering Chemistry Research, on predicting upper flammability limits of organic compounds from molecular structures, and you can see right there: QSPR study. Same year, aniline point temperatures: QSPR. Another QSPR study, same year.
And this one here talks about a molecular descriptor, which I'll mention momentarily, but it's really another QSPR study. So these are all four from 2009, from the same journal, and this is just a handful; there are several more. So QSPR in chemical engineering is not dead. However, maybe people aren't as familiar with it as you might anticipate. I've talked to people when I've described this technique to them, and I'd mention, well, do you know what a QSPR is? And I get a head shake: no, what is that? I have to explain it, and then they understand it. But it's certainly not dead in our field.
OK, so, molecular descriptors. I mentioned a little bit that the previous paper talked about one. For QSARs and QSPRs, and by the way I'm going to use these terms interchangeably: I might say QSAR when I'm talking about a property other than activity, and you can forgive me for that. These can be linear or nonlinear relationships, depending on what you're trying to do. The property of interest is your dependent variable, and the independent variables are related to the structure of the molecules. Those independent variables are called molecular descriptors. There's literally a cottage industry of defining and using molecular descriptors in QSPR, so there are thousands of these types of descriptors available. A popular one that you might have heard of would be the connectivity index, but you can use the lowest unoccupied molecular orbital, molecular weight, shape indices, et cetera. And here are a couple of handbooks to give you an idea of how many descriptors are available. Ultimately, their goal is to discriminate. You wouldn't want to use molecular weight if you're trying to discriminate, for example, amongst isomers. So structural changes should give rise to differing values of a particular descriptor. That's what molecular descriptors are and what their goal is. So, I mentioned
the connectivity index, and even though this is not a major part of what I do research-wise, you might see it next week at the AIChE meeting, or you may have read papers on quantitative structure-property relationships, or heard someone talk about the connectivity index, so I just want to introduce it here. It's a popular topological index. A topological index is basically an operator on a 2D graph. You take a molecule, and the atoms are the nodes and the bonds are the edges of the graph; a topological index is an operator on that graph, and it gives you a value.
A connectivity index is a count of a given type of subgraph. You make subgraphs within the larger chemical graph of the molecule, and each is weighted by a function associated with the number of skeletal neighbors. Different subgraphs are associated with the path length within that subgraph, and you denote that as the order.
A quick example would be 3-methylpentane. If you look at these delta values in the hydrogen-suppressed graph, they tell you the number of bonds, or connections rather, that are not to hydrogen. So this carbon atom has only one connection that's not hydrogen, this carbon has three, this carbon has one, et cetera, and so you get these delta values. Then it's functionalized here, and your connectivity index is just the sum of this functionalization; I show a little example here. It's the Greek letter chi, pronounced "kai", and the zeroth-order chi index for 3-methylpentane is 4.991. You can develop a similar relationship for paths of length one. So here's one, two, three, four, five; there are five paths, so we get five terms here. The connectivity index at order one is given here, and this value is 2.808. You could see how you might do this for order two, order three, et cetera. So this is a sample of one type of topological index.
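As a rough check on the 3-methylpentane numbers, here is a minimal Python sketch of the zeroth- and first-order connectivity indices. It assumes the usual form of the index, in which each atom contributes its delta value raised to the power of minus one half at order zero, and each bond contributes the product of its two delta values raised to the power of minus one half at order one. The atom numbering and the function name `connectivity_indices` are illustrative and are not taken from the slides.

```python
from math import sqrt

# Hydrogen-suppressed graph of 3-methylpentane (atom numbering is arbitrary):
# C0-C1-C2-C3-C4 is the main chain, C5 is the methyl branch on C2.
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]


def connectivity_indices(bonds):
    """Return (zeroth-order chi, first-order chi) for a hydrogen-suppressed graph."""
    # delta = number of non-hydrogen neighbours of each atom
    delta = {}
    for a, b in bonds:
        delta[a] = delta.get(a, 0) + 1
        delta[b] = delta.get(b, 0) + 1
    # Order zero: one term per atom. Order one: one term per bond (path of length one).
    chi0 = sum(1 / sqrt(d) for d in delta.values())
    chi1 = sum(1 / sqrt(delta[a] * delta[b]) for a, b in bonds)
    return chi0, chi1


chi0, chi1 = connectivity_indices(BONDS)
print(f"order 0: {chi0:.3f}, order 1: {chi1:.3f}")
```

The sketch gives about 4.992 and 2.808, in line with the values quoted in the talk (the zeroth-order value differs only in the last digit). Note that isomers such as n-hexane would give different values, which is the discriminating behavior a descriptor is supposed to have.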
OK, so how are QSARs and QSPRs used? You use the model to relate a dependent variable to some independent variables, and the value or the sign of a regression coefficient that happens to make it into the model can potentially provide insights for researchers. You might have a large number of descriptors to choose from. You do a statistical analysis to determine which ones end up in your model, and the values of those regression coefficients, for example, provide insights for researchers. They might allow them to test conjectures that were made, or really give new design directions. So you might run a QSAR, and depending on what goes into your model, it might give you an idea about how to modify your training set compounds, based on the QSAR, in order to see improved activity.
However, you can do this with a little bit more of, really, an improved rational approach. A more rational approach to molecular design is called computer-aided molecular design, and that's what I'm going to talk about here. Computer-aided molecular design has some general steps, and these are certainly general. The first part is the selection of fragments or groups; it's either defined by the set you're working with, or you select a predefined number of groups. The second step is really making molecules: you put those fragments together in a certain way, based on valence arguments or other types of constraints, to make a molecule. Then ultimately you have to evaluate the fitness of that molecule you've made in some way. And then, depending on the algorithm, you might have a structure generation step here. So those are really the general steps.
The origin is from 1983. Really, this comes from Gani and his lab, and Gani is still working on this. So this is 1983, and his work looked at using functional groups
from UNIFAC in a generate-and-test approach. "Generate" means that you generate structures, and then you test the fitness of those structures, so it's called the generate-and-test approach. He used this for solvent selection. It really suffered greatly from combinatorial explosion. You can think about it: well, if I have a lot of groups and I want to put them together in a wide variety of ways, the number of combinations you could come up with could quickly get out of hand. And so more rational strategies were needed to limit this type of combinatorial explosion. And really, Gani has a technique that he still uses today that has its framework in the same general approach. It's more complex, it has more steps to it, but it's still similar to this generate-and-test technique. There was a paper on polymer design with it that was published this year, so it's really a technique that's still being used in computer-aided molecular design.
What he does is use group contribution techniques to score the fitness of the molecules. I want to take an aside to describe this for those that are unfamiliar with group contribution techniques. Groups are fragments of a molecule, and they're looked at in two dimensions. Here's methyl ethyl ketone, and you have four groups. Each group will contribute something towards a property of the molecule. Three of the groups are unique and one group is repeated. That's a group contribution technique, and I give you a little example there. By way of comparison, the formula gives you 345 and the experiment gives you 353. So there's a simple example of a group contribution technique that's used to rank the fitness in the generate-and-test approach.
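The group contribution example can be sketched in a few lines of Python. The talk does not name the method or the property on the slide, so this is an assumption: the four-group molecule is taken to be methyl ethyl ketone (2-butanone), the property is taken to be the normal boiling point in kelvin, and the method is taken to be the Joback method. The contribution values below are quoted from the published Joback table from memory and should be checked against it; `estimate_boiling_point` is an illustrative name.

```python
# Joback boiling-point contributions in kelvin. ASSUMPTION: these values are
# quoted from the published Joback table; the talk itself does not give them.
JOBACK_TB = {
    "CH3": 23.58,  # methyl group
    "CH2": 22.88,  # non-ring methylene group
    "C=O": 76.75,  # non-ring ketone carbonyl group
}
JOBACK_TB_BASE = 198.2  # constant term of the boiling-point formula (some references print 198)


def estimate_boiling_point(group_counts):
    """Sum the group contributions: Tb = base + sum(count * contribution)."""
    return JOBACK_TB_BASE + sum(
        count * JOBACK_TB[group] for group, count in group_counts.items()
    )


# Methyl ethyl ketone, CH3-C(=O)-CH2-CH3: four groups, three unique, CH3 repeated.
mek_groups = {"CH3": 2, "CH2": 1, "C=O": 1}
print(round(estimate_boiling_point(mek_groups)))
```

Under those assumptions the estimate comes out at about 345 K, against an experimental boiling point of about 353 K, which matches the two numbers quoted in the talk. In a generate-and-test loop, an estimate like this would be computed for every generated structure and used to rank it.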
But there are other approaches. You can envision computer-aided molecular design being approached as an optimization problem, so you would formulate the entire problem as an optimization problem with constraints. This was introduced really by Camarda and Maranas in '98. The objective function is for whatever property you're interested in, and it's written in terms of those connectivity indices that I spoke about earlier. Their contribution was rewriting these connectivity indices in terms of variables related to a partitioned adjacency matrix. The partitioned adjacency matrix tells you how each of those groups is connected: which ones are connected to which, and what the bonding is, single bonding, double bonding, et cetera. So they solved this as an optimization problem, because then they can use techniques from optimization theory in the solution of these problems. However, you really needed what's called templating to make efficient progress.
So what is templating? Templating is when part of the structure is fixed a priori. You have an idea of what you want the structure to look like, and so you design around a certain scaffold or template. It certainly promotes method efficiency, and it allows you to introduce some sort of expert knowledge as well. However, it's less likely to design non-intuitive compounds, because you're not really going to expand the space very much from your training set compounds if you already have this scaffold. So really you're not going to design any new scaffolds this way.
Both the generate-and-test technique of Gani and this optimization approach use templating, and I'll give some examples of this. For the generate-and-test technique, here's a case study to design active herbicides with a certain backbone, and they kept the structure that would have the highest activity. This was published in 2007. Here you see the template, and they modified it at the R1, R2, or R3 positions.
And so here they put methyl, and here they've put hydrogen, and that was the optimal structure. So that's generate and test. Here is computer-aided molecular design using mixed-integer nonlinear programming. They wanted to design a fuel additive, and this is 2005, with the following properties. You can see the template molecule that they started with, and then the additions that they've made over here. The one with the CH2OH group on top was the compound that had the properties that best matched what they were looking for.
OK, so now we get into kind of the area where I'd normally start my talk. This is what we do: computer-aided molecular design with the signature molecular descriptor. We use this technique since signature creates a useful quantitative structure-property relationship; I'm going to show in a little bit that it compares favorably to other popular molecular descriptors. It has tunable degeneracy: we can really control the number of solutions that come out of our computer-aided molecular design algorithm, and the number of structures per solution, based on the signature height we use. It's tunable, and we go from high degeneracy to non-degenerate depending on what height we go to, so the degeneracy monotonically decreases. And really the best thing about this technique, I think, is that it can develop novel, non-intuitive structures. You don't have to template, and so you're not predisposed to choosing a certain scaffold. We also have an exhaustive and efficient structure generation algorithm designed specifically for signature, and oftentimes the structure generation step is a bottleneck in some other techniques.
All right, so what is signature? It started in 1994 as part of a computer-assisted structural elucidation code, and it really ended up identifying the structure of lignin.
So this was 1994. It's really similar to the SMILES notation, if you happen to know anything about that. The signature of an atom is a rooted tree that spans all of the atoms and bonds that are a certain distance from the root. That's the atomic signature, and each atom has its own atomic signature. The molecular signature is then just a vector of the occurrence numbers of those atomic signatures. So the molecular signature describes the molecule, and the atomic signatures are its fragmental parts.
Here's a signature example. I show methanol up here, and I'm going to focus on the carbon atom that's right here; we can do this for all of the atoms in the system. What you do is draw a tree. This carbon atom is bonded to four things, three hydrogens and an oxygen, so you draw it like that. Then this oxygen is bonded to a hydrogen, and that's it; you don't backtrack. So this is your tree for carbon. At height zero, it's just the carbon atom; that's the atomic signature. At height one, it's the carbon and what it's bonded to. At height two, it's the carbon, whatever it's bonded to, and whatever those atoms are bonded to. So you see the signature at height zero, height one, and height two for that carbon atom. You would then do this for each of the six atoms in your system and add them up, and you would have a molecular signature for methanol. You see those are the occurrence numbers. You might say, well, these hydrogens are all equivalent, and that's demonstrated here, but this hydrogen is different from these, and that's shown there. I showed this as a very simple example; we can do this for cyclic structures, for aromatics, for double-bonded systems, for triple-bonded systems, et cetera.
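The methanol example can be sketched in Python as follows. This is a simplified illustration rather than the actual signature code: it writes each tree as a nested string with sorted children, and it only forbids stepping straight back to the parent atom, which is enough for an acyclic molecule like methanol but not for rings. The atom numbering, the string format, and the function names are all assumptions made for the sketch.

```python
from collections import Counter

# Methanol, CH3OH. Atom indices are arbitrary labels for this sketch.
ATOMS = {0: "C", 1: "O", 2: "H", 3: "H", 4: "H", 5: "H"}
BONDS = {0: [1, 2, 3, 4], 1: [0, 5], 2: [0], 3: [0], 4: [0], 5: [1]}


def atomic_signature(root, height, parent=None):
    """Canonical string for the tree of depth `height` rooted at atom `root`.

    Simplification: only the step straight back to the parent is forbidden,
    which is sufficient for acyclic molecules such as methanol.
    """
    label = ATOMS[root]
    if height == 0:
        return label
    # Recurse into every neighbour except the atom we just came from,
    # then sort so that the string does not depend on atom numbering.
    children = sorted(
        atomic_signature(nb, height - 1, root)
        for nb in BONDS[root]
        if nb != parent
    )
    return label + "(" + "".join(children) + ")" if children else label


def molecular_signature(height):
    """Occurrence numbers of each distinct atomic signature in the molecule."""
    return Counter(atomic_signature(atom, height) for atom in ATOMS)


for h in range(3):
    print(h, atomic_signature(0, h))
print(molecular_signature(1))
```

For the carbon atom this yields `C`, `C(HHHO)` and `C(HHHO(H))` at heights zero, one and two. At height one the molecular signature has three occurrences of `H(C)` and one of `H(O)`, which is the point made in the talk: the three hydrogens on carbon are equivalent, and the hydroxyl hydrogen is different.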
So, on signatures in these types of quantitative structure relationships: we have evaluated the biological activity of HIV-1 protease inhibitors as well as close to thirteen thousand octanol-water partition coefficients, and we compared signature to descriptors found in commercial packages. There was a whole bunch of commercial descriptors we could look at; here's just a sample, and there were hundreds of them available. We found that our correlative results were similar to those of the commercial descriptors, and really the fragment-like nature of signature allows the descriptors to be more orthogonal, which creates a more stable QSAR than what you would have from those molecular descriptors.
All right, so say you want to do computer-aided molecular design with signature. What are the steps? The first step, and I think I've mentioned this to a few faculty already today when we were talking about this technique, is that we have to start with a data set: a set of compounds that have a certain property or properties. So we need to be given a set of molecules, and then we break them down into atomic signatures. After we do that, we generate a quantitative structure-property relationship for our property or properties of interest. Then we must generate equations which describe how those atomic signatures merge together to form a new molecule; these are based on valence arguments and some other handshaking arguments that I'll show in a moment. Then we solve these equations, and those are our inverse solutions. Then we score the solutions based on the QSPRs we generated in step one. And then from that mathematical description of the molecule, from these inverse solutions, you generate an actual structure from the molecular signature: we go backwards and generate structures, and the best-scoring structures become your focused database.
It's really easier to see as an algorithm. We start with a database of N
compounds and we generate two-dimensional structures, so we draw these or we have them from somewhere. Then you calculate your unique height-h atomic signatures. You might say, well, I'm going to do this problem at height one, or I want to do this problem at height two, and you calculate all of your unique atomic signatures. From those atomic signatures you generate your constraint equations, and I will show an example of a constraint equation in a moment. So you generate these constraint equations and then you have to solve them, and the constraint equations are solved using a brute-force technique. Originally we used a Fortenbacher algorithm. These equations are Diophantine equations, which means they have integer coefficients and integer solutions. The Fortenbacher algorithm was coded, and it's an efficient algorithm for some types of problems; for our problems it would either work really well or it wouldn't work at all, and we never had an idea about when the bottleneck would be. So we designed our own algorithm, which gives us a much better handle on how long the solution to this problem is going to take. And then here's the dummy check, because you should always get back the compounds that you originally started from. You create your model or models to score the candidates that you've generated.
This box here is interesting; this is the box that really refines the problem. Nowhere before this do we impose whether or not our aromatic signatures have to obey Hückel's rule, so this is the step where we say, OK, Hückel's rule has to be enforced. This is the step where we introduce some stability calculation, so we might calculate a force-field energy, an intramolecular energy, and eliminate some of those that have higher energies. We might impose Lipinski's rule of five if you're looking for drug-likeness. We might limit the number of rings you can have in your system.
You could also put in some sort of analysis of synthetic feasibility at this point. So this step really refines what comes out of this part, and you ultimately end up with a focused database. Your hope is that this focused database contains things that you want to explore in more detail, maybe involving a synthetic chemist and then evaluating the predictions for these substances.
OK, so a simple inverse design example. You would never do something like this with just one compound, but I'm just going to show this with one compound, and it's pentane, normal pentane. It has seventeen atoms. The twelve hydrogen atoms, at height one, are all equivalent: they're all hydrogens bonded to a carbon. We'll call this variable x1. Then there are these two terminal carbon atoms here; they're each bonded to another carbon and three hydrogens, so you get two of those. And there are these three internal carbons, see here. So we represent each of the unique atomic signatures with a variable.
The graphicality equation, first, determines whether or not you can create a connected graph, and this is based on valence arguments associated with a particular atom type. This star indicates a modulus equation here. So this is the first equation, the graphicality equation; you get one per system. And then you have handshaking, or bonding, arguments. You have two types of bonding in the system, carbon bonding to hydrogen and carbon bonding to carbon, and so these are equations that relate these variables according to what type of bonding you have in your system. Depending on the types of bonding you have, you will have that many consistency equations, and those represent the set of Diophantine equations that I talked about, and you solve those.
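The pentane example can be made concrete with a small brute-force solver in Python. The slide equations are not in the transcript, so the three constraints below are a reconstruction from standard graph arguments and should be read as assumptions: x1 counts hydrogens bonded to carbon, x2 counts terminal carbons (one carbon and three hydrogen neighbors), and x3 counts internal carbons (two carbon and two hydrogen neighbors). The function name and the search bounds are illustrative.

```python
from itertools import product


def solve_pentane_constraints(max_h=12, max_c=3):
    """Enumerate non-negative integer solutions (x1, x2, x3) by brute force."""
    solutions = []
    for x1, x2, x3 in product(range(max_h + 1), range(max_c + 1), range(max_c + 1)):
        if x1 + x2 + x3 == 0:
            continue  # skip the empty molecule
        # C-H handshaking: every hydrogen listed by a carbon must exist as an H(C).
        if x1 != 3 * x2 + 2 * x3:
            continue
        # C-C handshaking: carbon-to-carbon bond ends must pair up, so their count is even.
        if (x2 + 2 * x3) % 2 != 0:
            continue
        # Graphicality: sum of (valence - 2) over atoms, plus 2, equals twice the
        # number of rings in a connected graph, so it must be even and non-negative.
        g = -x1 + 2 * x2 + 2 * x3 + 2
        if g < 0 or g % 2 != 0:
            continue
        solutions.append((x1, x2, x3))
    return solutions


for solution in solve_pentane_constraints():
    print(solution)
```

With these bounds the search returns seven solutions, including (12, 2, 3), which is n-pentane itself, so the original compound comes back as expected. Shorter alkanes such as ethane (6, 2, 0) and ring-like solutions such as (6, 0, 3) also appear, and some solutions, for example (2, 0, 1), satisfy the equations without corresponding to any real structure.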
OK, so I would like to move on to an application here. The University of New Mexico is in Albuquerque, and this was done in collaboration with Sandia National Labs, which is also in Albuquerque. UNM was interested in designing inhibitors for ICAM-1 stronger than what's currently available. They gave us sixteen cyclic peptides that were nine amino acids long, and they ranged from strong to weak to non-inhibiting. Here are our QSAR results, which, spanning a few orders of magnitude, aren't great. And for the non-binding ones, you get data and it wasn't a number; it just said greater than a thousand, and so we lumped all of those to one thousand. So for such a small set of sixteen, and for the quality of the QSAR, we were not confident in these predictions, but we went ahead and solved the problem.
We generated height-one signatures at the amino acid level. We didn't have to go to the atomic level to do this, because you don't have to go that fine-grained when you know what type of bond you have between amino acids. So we did the height-one signatures relative to amino acids, and solved the constraint equations associated with how the amino acids can combine. For our system, we created a distribution of solutions based on the quantitative structure-activity relationship that we had created. You can see the predicted IC50 values are down here, and the most active compounds are the ones with the lower IC50 values. So we gave the University of New Mexico twenty compounds, I'm sorry, twenty peptide sequences, that we had predicted to be more active than the most active in the original sixteen. They picked two here and synthesized those, and you can see how well the actual values compare with the predicted values. So this was a success of this particular technique, and right now these are the two most active compounds known to date for that particular receptor, ICAM-1.
So I've done a variety of different work with this technique in a variety of projects. This is a sample of the list I'm going to show here. We designed hydrofluoroethers as replacements for foam-blowing agents in 2005. Brown and coworkers used the signature technique in 2006 to design repeat units that had similar properties to nylon 6, if I remember correctly. We have current work on fire-retardant polymers, and so, an example of this: if you look at this data set of fire-retardant polymers here, these are three properties, heat release capacity, mass loss rate, and heat of combustion. This is the data set that we utilized. We used our technique and generated several compounds, and I'm going to focus on these two. You can see the predicted values are very low, and you want low values here. What I think is most interesting is this unit that's right here. It doesn't really appear by itself at all in this set. However, it does appear here, so certainly it's part of this compound. However, this hydrofuran ring that's right here is not in any of the training set compounds. So what's happening is that it's borrowing from this ring here: it's taking this oxygen and putting it in this ring to make this structure here, which ends up over here and forms this particular compound. So this kind of shows you, I think, some of the non-intuitive solutions that you can generate through this particular approach.
We've designed glucocorticoid receptor ligands that are nonsteroidal. We've just published a paper, it's in press, that basically extends the GlaxoSmithKline solvent selection database. They have this green database, and we've utilized that in
inverse design and extended it, and in extending it we actually identified things that are already known to be green solvents. So the fact that we've identified green solvents that weren't originally part of the solvent selection guide kind of gives us some confidence that some of our compounds would potentially be interesting things to look at for further study. We're talking about this on Wednesday afternoon at AIChE. There's also an Auburn group that's using signature in a combined product-process design.
An interesting thing about this technique is that you can kind of take it and point it in a direction. So we pointed it in the direction of strength enhancers and set retarders for the concrete industry, and certainly W. R. Grace is very interested in this. We have funding for this through NSF to design admixtures for the concrete industry using this technique, and it's a combined design-with-experiment approach. That's similar to what we're doing with the design of Alzheimer's disease aggregation inhibitors with a colleague at South Carolina.
And finally there's ionic liquids as a pretreatment for cellulosic ethanol. We are working with the Joint BioEnergy Institute on this, and I want to talk about this one in a little more detail. Tennessee has committed to switchgrass as the cellulose source to make what they call "grassoline". They're paying farmers to grow it. It's fairly robust; you can see a picture of it there. Pretreatment of the biomass is required to expose the cellulose, and there are different techniques that have advantages and disadvantages. Ionic liquids have been of interest in this area because they're identified as green solvents and they have basically negligible vapor pressure, so the energy input associated with recovering the solvent is much less. So, ionic liquids are salts that melt below 100 °C, known as environmentally friendly solvents.
They're designer solvents, because you can pick between cations and anions. Here's just a sample of some. There's projected to be a rather large number of cation and anion combinations, so you'd say, well, this is amenable to a computer-aided molecular design approach. For cellulose pretreatment we are looking at melting point, viscosity, and cellulose solubility. OK, so we're going to apply our technique here, and we need to score the inverse solutions. However, in previous studies we've had overlapping data sets when we wanted multiple properties. If we wanted to do things like boiling point and vapor-phase thermal conductivity, like we did for the hydrofluoroether foam blowing agents, the data sets were overlapping. That doesn't happen here. For ionic liquids there's very limited overlapping data: the melting point set is large and diverse; for viscosity there's data available but only minor structural diversity; and there's very limited data on solubility. So we had to modify our design strategy, because there are only a few ionic liquids that had all of the data for the desired properties. We focused on the cation side, since there's more diversity among the cations than among the anions. And so, while this is a more qualitative approach, what we've done is we've kept the anion constant for each property: for the melting point we're keeping this anion constant, for the viscosity this anion, and for the solubility this anion, and you can see we have a decreasing number of compounds available to us. So we've selected the melting point. And by the way, there are the QSAR statistics for the melting point, viscosity, and solubility work.
OK, so the inverse design process. Here's a sample of the five hundred ninety-nine training set compounds for the melting point set. What we did is we took our set of five ninety-nine and solved the inverse problem the way we've already spoken about. We get about a hundred thousand solutions, we keep those that have a melting point less than fifty, and we end up with close to three hundred thousand solutions. We don't generate structures at this point. What we do is take those solutions and score them relative to our viscosity set, and we keep those that have a predicted viscosity lower than one hundred centipoise. At this point, though, we introduce an overlap metric. What the overlap metric does is look at the number of signatures that are available to the model relative to the number of signatures in the compound you're evaluating, and if that overlap in signatures is high enough, we keep it. The conjecture is that the higher the overlap, the more confidence you would have in those predictions. So we kept those at the seventy percent level for the viscosity set. We do the same thing for the solubility set. You might think five percent solubility is small, but it's a very large number for cellulose. And if we keep those that had greater than fifty percent overlap, we end up with about twenty-two hundred solutions. Those solutions we generate structures for. Each solution doesn't necessarily have to generate a structure; however, one solution can generate multiple structures, so you don't get two thousand one hundred thirty-four structures here. We end up with a focused database of close to four hundred compounds, and here's a sample of some of them. Obviously they're going to have predicted properties that are within the range we're interested in. And by the way, here's a more conceptual-level presentation of how this actually works.
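
As an illustration of the screening steps just described, here is a minimal Python sketch. It is not the speaker's code. It assumes the overlap metric is the fraction of a candidate's signatures that also occur in the model's training set, which is one reading of the description in the talk. The `Candidate` class, the signature labels `s1` to `s9`, and all predicted values are hypothetical. The cutoffs (melting point below fifty, viscosity below one hundred centipoise, seventy percent overlap) are the ones quoted in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    signatures: frozenset      # atomic signatures found in the candidate
    melting_point: float       # predicted melting point, deg C
    viscosity: float           # predicted viscosity, cP


def overlap(candidate_sigs, model_sigs):
    """Fraction of the candidate's signatures that the model was trained on.

    Assumed definition; the talk does not give an exact formula.
    """
    if not candidate_sigs:
        return 0.0
    return len(candidate_sigs & model_sigs) / len(candidate_sigs)


def screen(candidates, model_sigs, mp_max=50.0, visc_max=100.0, min_overlap=0.70):
    """Keep candidates that pass both property cutoffs and the overlap cutoff."""
    return [
        c for c in candidates
        if c.melting_point < mp_max
        and c.viscosity < visc_max
        and overlap(c.signatures, model_sigs) >= min_overlap
    ]


# Hypothetical signature labels and predicted values, for illustration only.
viscosity_model_sigs = frozenset({"s1", "s2", "s3", "s4", "s5"})
candidates = [
    Candidate("A", frozenset({"s1", "s2", "s3", "s9"}), 35.0, 80.0),  # overlap 0.75
    Candidate("B", frozenset({"s1", "s2", "s8", "s9"}), 20.0, 60.0),  # overlap 0.50
    Candidate("C", frozenset({"s1", "s2", "s3", "s4"}), 75.0, 40.0),  # melts too high
]

kept = screen(candidates, viscosity_model_sigs)
print([c.name for c in kept])
```

With these made-up inputs only candidate A survives: B fails the overlap cutoff and C fails the melting point cutoff. In this sketch structures would be generated only for the candidates left in `kept`, mirroring the order described in the talk.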
So really, we take these two compounds here that are part of our training set, and what you're doing is cutting parts of them and making new molecules out of them. Hopefully you're designing it such that it gives you the optimal properties. And so here's a little example of one of the compounds that we've generated. So, experimental verification. I had a student who had won a fellowship that was sending him to Australia to work with Doug McFarlane to actually synthesize some of these. So he gets there, and we give him the list, and he's like, well, in a couple of months we're not going to be able to synthesize and test your top performers. So they identified those that were feasible to synthesize within the limited time frame they had during the summer. Here's an example of some training set ionic liquids; these are among the five hundred ninety-nine. And these are some of the ones that our inverse design identified. They were synthesized and experimentally tested for their melting point, and you can see how these parts come together to form the different parts of these new ionic liquids. You can see the experimental values and then the predicted values, and you can see that they're fairly close. This is very encouraging for us, because the next step is to set up a synthesis lab, and right now this is happening. We're looking to set up a synthesis lab at Tennessee Tech to synthesize some more of these designs and evaluate their melting point. We already have protocols established in our lab for cellulose solubility that we had worked on in the previous semester. I didn't discuss this at all, but we have work that looks at data mining PubChem. PubChem is a bioassay database from NIH, and we work across multiple bioassays. We have a paper on this from last year. It's really not computer-aided molecular design.
It's really an exciting technique, though, and we're trying to do this on multiple bioassays, and we have an experimental verification step that we're involved in right now to verify some of the predictions that this approach has made. Fundamentally, I'm a thermodynamicist. So I have projects involving modeling and experimental work on polyol and blowing agent solubility, and ultimately equation-of-state parameterization, using quantum mechanical approaches to aid in the equation-of-state parameterization. So those are some future directions and some additional work that I'm involved in. I know I've exceeded my time limit, but at this point I'd like to acknowledge the Department of Energy and NSF, as well as collaborators. And here's a group of students that helped with this; the ones in green are the undergraduate students, so I had several undergraduate students involved in this. So thank you for your attention, and I'll take any questions at this time.
We would. We want to do that once we experimentally determine the properties for the models. As part of our Alzheimer's disease work, it's a much larger algorithm where we're going to then refine the models based on the predictions that our models end up making. Ultimately we'd hope that we wouldn't have to do that step, but practically I suspect we will. But yeah, certainly that aspect helps us refine the models that we develop.
You mean the inverse design? Well, it really depends, so let me give an example. I don't want to bore you with the fact that if we have to draw the compounds it's going to take longer, right. But I think practically you're asking how long the computer has to work on the system, and really, if it doesn't finish
You know, in like a few days, then combinatorial explosion will kill us and we won't be able to store all the information we have. So we have an idea of how long things take. We're not talking about weeks; if it takes a few days, that's at the high end. It wouldn't take longer than that.
Yeah, I mean, I haven't used it for anything like that, so I don't know if I'd really be qualified to speak to that. I do have an idea as to why something like this wouldn't work in an area.
Well, yeah, so, depending on the states, right, you can have different atoms that have different states, and we would identify that in the atom type if that was the situation. So nitrogen can have, I think it's five or something like that. So there are ways up front: if you have different atom types, or if you have the same atom types with different numbers of bonding partners, you would denote that differently by the atom type in the signature. So you can handle something like that. I'm curious what you have in mind.
Just, people want to precipitate it.
Well, you know, we've done that, but it's not something quantitative. So yeah, that's exactly the issue. We've used water as an anti-solvent and gotten the cellulose out, but we were really most interested at that point in measuring the solubility. And it was cool for the student to boil the ionic liquid afterward and drive the water off, but certainly we have water in that. Yeah, and maybe you know that the issue with ionic liquids is that when you do that time after time, you might end up degrading the ionic liquid, and that's one of the issues in using ionic liquids in this area that people are evaluating.
Well, I mean, fundamentally, provided the groups that you've identified up front
are able to come together to do that, our technique isn't limited by that. So in, say, the generate-and-test technique, they start with like eight groups, and only those eight groups can merge together, and a group might be a benzene ring or a fluorine. So really it's a mixed approach. You can generate those molecules potentially in the same way that we have, as long as you have those predefined groups up front. Now the problem that they have is that they wouldn't have QSARs or QSPRs developed for those groups, and so they would have to end up doing that for a wide variety of systems in order for it to work, which I'd see as a drawback of that technique.
Well, ultimately, if you're using it in a predictive manner, right, ultimately you generate the compounds, you synthesize the compounds, and then you evaluate them experimentally, and that's really, I think, the only way to evaluate their predictive ability outside of the set they're using. I mean, there are techniques to evaluate the models. There are leave-one-out techniques, where you leave a molecule out, then run a variety of different models and calculate how well they predict the one that you've left out. You can leave out a whole host of them, train on the rest, and see how well the models predict the set that you've excluded. So we've done that, but ultimately it's how well it predicts the set you've created.
It's not melting quite yet. Correct. Yeah. Now we have, well, you mean for the stuff that we've generated? Yeah, we're not even, I mean, for example, the melting point data we had just gotten maybe a couple of weeks ago. Yeah, absolutely. I mean, I think that we do.
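
The leave-one-out idea mentioned above can be sketched in a few lines of Python. This is an illustration, not the speaker's workflow: a one-descriptor least-squares line stands in for a QSAR model, and the descriptor and melting point values are invented for the example.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_errors(xs, ys):
    """For each compound: refit without it, then predict it."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        errors.append(ys[i] - (slope * xs[i] + intercept))
    return errors


# Hypothetical descriptor values and melting points, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
melting_point = [12.0, 19.5, 31.0, 38.5, 52.0, 58.0]

errors = leave_one_out_errors(descriptor, melting_point)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print(f"leave-one-out RMSE: {rmse:.2f}")
```

Leaving out "a whole host" of compounds, as the answer also describes, would amount to holding back a larger slice of the lists instead of a single index. As the speaker notes, neither check replaces synthesizing and measuring the designed compounds.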
And one of the problems that you run into is that even a little bit of cellulose in an ionic liquid makes it very viscous. So one of the things that's potentially on tap is: maybe we don't use a pure ionic liquid. Can we mix it with something else, so that when you add the cellulose to it, it will retain the solubility but it won't cost you as much money, because ionic liquids are expensive, and maybe you can modify some of the properties in that way. But yeah, I would certainly expect the quality of the predictions to degrade, because the set size isn't as large.
OK. Yes. So that's really heights one and two. What happens is that if you use a higher signature height, you become too specific, because now you're looking at the atom and its nearest neighbors and its next-nearest neighbors and their next-nearest neighbors. So you get really specific signatures. When you go to a higher height, because degeneracy is a function of the height, you also end up with a reduced number of inverse solutions. Going the other way, if you end up at height zero, now we're talking about just the molecular formula. So really there is some sort of optimal height between one and two, and we've done some similarity studies that have been shown here that also call out that one and two are really where you need to be, so that's where we've ended up. We've done things where we've combined heights one and two; we've built QSARs with multiple heights. So there are opportunities to mix and match at that level.
Yes, but we haven't needed to do it yet. I mean, we've thought about ways to add in chiral descriptors through modifying what the signature would be or how we would denote the atom type. So yeah, there are ways that we could do that, and I suspect the need will come as this matures.
We might have to do something like that, because what we would do now, for example, if you have stereoisomers and the cis is active and the trans is inactive, and you map it to the graph and it's the same molecule, what do you do? We just get rid of both of them, and so we move on. Normally it doesn't come up, but in the situations where it has, we've just gotten rid of it. When it comes up more and more in the future, we can certainly consider that, absolutely.
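
To make the earlier answer about signature height concrete, here is a small Python sketch. It uses a simplified atomic signature (the atom's element followed by the sorted signatures of its neighbors, down to the given height) and is an assumption-laden illustration rather than the speaker's implementation. The function names and the hand-built molecular graphs for ethanol and dimethyl ether are hypothetical choices for the example.

```python
from collections import Counter


def neighbours(bonds, atom):
    """Atoms bonded to `atom` in an undirected bond list."""
    return [b if a == atom else a for a, b in bonds if atom in (a, b)]


def atomic_signature(atoms, bonds, atom, height, parent=None):
    """Element label plus sorted neighbour signatures, `height` bonds deep."""
    label = atoms[atom]
    if height == 0:
        return label
    branches = sorted(
        atomic_signature(atoms, bonds, n, height - 1, atom)
        for n in neighbours(bonds, atom)
        if n != parent  # do not walk straight back to where we came from
    )
    if not branches:
        return label
    return label + "(" + "".join(branches) + ")"


def molecular_signature(atoms, bonds, height):
    """Count of each atomic signature in the molecule."""
    return Counter(
        atomic_signature(atoms, bonds, a, height) for a in range(len(atoms))
    )


# Ethanol, CH3-CH2-OH, with explicit hydrogens.
ethanol_atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
ethanol_bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]

# Dimethyl ether, CH3-O-CH3, with explicit hydrogens.
ether_atoms = ["C", "O", "C", "H", "H", "H", "H", "H", "H"]
ether_bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (2, 6), (2, 7), (2, 8)]

for height in (0, 1):
    print(height, dict(molecular_signature(ethanol_atoms, ethanol_bonds, height)))
    print(height, dict(molecular_signature(ether_atoms, ether_bonds, height)))
```

At height zero both isomers reduce to the same counts of C, H, and O, which is just the molecular formula. At height one they separate, because each atom's label now includes its nearest neighbors. Like the graph-based approach described in the last answer, this sketch carries no stereochemistry, so cis and trans isomers would map to the same signature.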