My name is Mikael Brown. I'm a research engineer in the ISyE department, Industrial and Systems Engineering, for those of you who are always on this side of campus. I work under the direction of Dr. Eva Lee. She is a professor in my department, and she is also the director of the Center for Health Organization Transformation, an NSF center under my department that involves several different schools. She's also the director of the Center for Operations Research in Medicine and HealthCare. As for her educational background, she got her Ph.D. in computational and applied mathematics from Rice, and ever since then she has been doing absolutely amazing research that bridges the gap between industrial engineering and real applications in health care and medicine. Her research usually focuses on systems modeling, software development for medicine, and clinical and medical decision making, which you are going to hear about in a really exciting presentation. She works with hospitals and public organizations like the Veterans Administration, and pretty much anything you can think of that has to do with health care, she has had a hand in at some point. She is a very passionate researcher, up pretty much all hours of the night doing research, but she is also a devoted mentor, and I'm happy to have been under her tutelage as one of her graduate students. Today's talk is going to be about a machine learning framework for medical decision making. It's really exciting, and I hope you enjoy it. Thank you.

So thank you for coming to this talk so early in the morning, and thank you for the opportunity to present a talk to the I.P.P. Committee. To sum up the history, as already mentioned, we started some of this research back in 1998, when I first came to Georgia Tech, and we were very fortunate to receive National Science Foundation funding.
That award, as well as Whitaker Foundation awards, helped to establish the Center for Operations Research in Medicine and HealthCare, and that's really the first of its kind in ISyE in terms of using a systems approach to look at medicine and health care. Many of you may already know that ISyE has a history of doing health care research, but most of it is really on health care management, with graduating master's students working in hospitals as managers and doing day-to-day operations. This program really focuses on research advances in medicine, and we are very fortunate because a lot of the key players we work with are at the top medical schools. The center focuses on quite a bit of work based on health prediction, disease diagnosis, optimal treatment design for different types of diseases, drug delivery, treatment outcome, and also quality, safety, public health, global health, and policy analysis. As you can see, the reason it is so broad is partly that a lot of these problems can be looked at from a systems view, and there are many methods that one can apply to them.

Then the second center we got just several years ago. This is an NSF Industry/University collaborative partnership center, and we are partnered with the only rural public health program in the country, which is at Texas A&M. So it was the two of us that proposed it, and it is actually the first such center that focuses on health care. As you know, there's a lot in the previous slide that we mentioned.
Where that dealt with the patients and with the basic science part, this one is really the application and delivery side. I mean, if you have advances, how do you actually put them into the hospital? That is actually quite difficult, with a lot of challenges. I think we are also very fortunate to be supported by the health system leaders. They sign on as members, and they help to actually implement a lot of these.

So the methods I'm going to talk about include decision support systems, systems modeling, operations research, information technology, visualization, distributed computing, and also advanced communication technologies. These are being integrated together into a framework for assisting the doctors or assisting the managers in the hospital. Many of these techniques give you very nice, powerful tools: you can actually model at the cell level, at the molecular level, at the human level, and also at the facility and system level, globally across nations. So that's the beauty of it, and also the fun of it. Of course, students sometimes worry about exactly how they are going to focus and what they have to learn to achieve that. So we do need to spend a lot of time with the students, because it is important to figure out what they do and also which interest they connect with the most.

This work really comes at the right time, because there are major forces that make us think about healthcare reform and also about how you translate scientific work into the hospital. There is a lot of interest in this now at NIH and NSF, so it gives us a lot more motivation to do the work. I want to emphasize one point for those of you who have taken ISyE courses.
In ISyE courses you typically look at logistics, you look at scheduling, you look at things that have not much to do with the patients but more with the operations side. The work here really focuses on the doctors, the patients, and the health care systems. On the basic science side, we look at diagnosis; we look at treatment design, such as cancer and HIV treatment; and we look at delivery: once you have these advanced treatments, how do you get them to where they have to be sent? How do you look at treatment outcome? How do you predict the patient's response to different treatments and match them up, so that when a new patient comes in you can use those characteristics to guide them to the right treatment? There is also public health. On the delivery side, we look at quality, operations, and efficiency, which is always part of what ISyE is about, and information management. Many of you know that if you work in a lab you generate a huge amount of data every single day. How do you actually manage that information and use it all together? That's really the key part; a lot of times we use it just one set at a time, but not all together. And then there is change management. That's what the NSF center is about, health organization transformation.

I must mention that it has really been an interesting experience. When I first started in this area, I worked on cancer treatment design. That was about fifteen years ago at Columbia University, in the medical school, and the doctors thought that we were crazy, because we were thinking about how to do real-time planning in the operating room. They thought this was crazy because, first, the problem is too hard to solve in real time. Second, even if you do it, there is a practical obstacle.
They're not going to have the machinery that can accommodate it. So that's really the key part of why you need change management: they do need to change the configuration in the operating room to allow you to actually implement those changes. So that's basically the overarching focus.

At this time I'm going to talk about machine learning, and basically about discriminant analysis and predictive models in medicine. This is one of the hottest topics in medicine nowadays, and actually in many different areas. To outline my talk: I will talk a little bit about what discriminant analysis is and go through the methods, just to give you a general schema of what it is about and the mathematical concepts. There are many different methods, so you can always apply your favorite method; of course, some methods work better than others. That's the key part. I will show you some of the real applications, most of which are clinical applications, and then if we have time we will talk about other medical decision making, but I don't think we will have time. It's really quite long already.

So what are the challenges? When we think about systems, think about the data that is available, not just the biological data. There is the clinical data, there is the historical data, the outcomes of other patients. How do you utilize all of these data and identify some discriminatory patterns from them, so that they help you advance what you are looking for? In this case there are many things you may be trying to advance. Maybe you want to predict the health risks of an individual. You want to look at disease detection, and I will mention one disease detection that is quite novel, really early-stage detection. How do you monitor the treatment and the prognosis of the individual, and change the course of treatment when necessary?
You want to know what is going on and be able to change it. What are the therapeutic choices, and how do they bear on public health intervention and on population health? A lot of times we separate public health from medicine; there is even a School of Medicine and a separate School of Public Health. But think about how you decide when an individual should take a vaccine. That's a very important question. Who answers it: the doctor, the scientist, or public health? Someone has to say, OK, we are going to start this vaccination and this is the policy. So those are all intertwined, and I will give you one example where we are looking at exactly that.

If you look at the outside world, what do they call this area? It is medical informatics, or health informatics. And I mentioned computing; it is key to everything we are going to do. With data, you have to implement some very powerful tools so that you can solve your models, so those always go hand in hand. Health informatics really deals with the organization and optimal use of biological and medical information, data, and knowledge related to all of these diseases and clinical conditions. Notice what I mentioned: I have information, I have data, and I have knowledge. They can come in many different forms, and I would like to use them in the most effective way so that I can get some information out. The idea is that I would like this process to facilitate new discoveries and interpretations, so that what we learn from these data will help us in the development of innovative technology for early detection, intervention, diagnosis, drug delivery, optimal treatment design, outcome prediction, and population health.

So what is discriminant analysis?
One way to look at it is this: if I have groups of different things, I would like to be able to classify an entity into one of these groups. So that's the idea, and it is a very simple idea. For example, say you go in to see the doctor and you complain that you have chest pain. What the doctor has to figure out is whether you are OK and do not have any disease, or, if you do have a disease, what type of disease you have. The first step is a two-group question: normal, or not normal? If you are normal, your chest pain could just be a strain, or something happened that is not of a significance they have to worry about. Or the chest pain is not normal, and so we have to look into it and see what is going on. So those are the two groups. But that is really broad. If the doctor only knows that it is not normal, it is not going to help you, because he would not be able to decide what to do. So you need to know right away what the disease is. That's the diagnosis, and that's what you go to the doctor for.

So we take a training set, and we use the training set to develop what we call the discriminant rule, or the predictive rule. You use a small set of entities in that step, and the key part, the third part, is that when a new entity comes in, you want to be able to classify it correctly. All these diagnoses rest on evidence learned from earlier cases, which is what lets them classify the new one, because that is how the rule works. An example is the Pap smear. They take the Pap smear test and they make the diagnosis: yes, there is cancer or suspicion of cancer in this patient, or no, there is none. But do you know what the percentage of correctness of the Pap smear is? About sixty to seventy percent. So sometimes you use those rules even though you have false positives, because you cannot get a perfect percentage in terms of correctness.
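The train-then-classify idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a nearest-centroid rule as a stand-in (not the method presented in this talk), and the function names `fit_rule` and `classify`, as well as the measurements, are hypothetical.

```python
from statistics import mean


def fit_rule(training_set):
    """Learn one centroid per group from (attributes, group) pairs."""
    rows_by_group = {}
    for attributes, group in training_set:
        rows_by_group.setdefault(group, []).append(attributes)
    # The "rule" here is just the per-group average of each attribute.
    return {group: [mean(column) for column in zip(*rows)]
            for group, rows in rows_by_group.items()}


def classify(rule, attributes):
    """Assign a new entity to the group whose centroid is closest."""
    def squared_distance(centroid):
        return sum((a - c) ** 2 for a, c in zip(attributes, centroid))
    return min(rule, key=lambda group: squared_distance(rule[group]))


# Hypothetical two-attribute measurements, invented for illustration only.
training_set = [
    ([1.0, 1.2], "normal"), ([0.8, 1.0], "normal"),
    ([3.0, 3.1], "not normal"), ([3.2, 2.9], "not normal"),
]
rule = fit_rule(training_set)
print(classify(rule, [0.9, 1.1]))
print(classify(rule, [2.8, 3.3]))
```

The point of the sketch is the separation of roles: the rule is built once from entities whose group is known, and is then applied to a new entity whose group is not.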
So how do we use this type of technique, machine learning? A lot of times you hear lots of different phrases for it: machine learning, supervised learning, classification, pattern recognition. My earliest work, when I was at Columbia and started my position in the engineering school, and our first applications, were really with Wall Street and with consumer patterns. Now these are popular at Google and everywhere on the Internet. When you buy something, they say you may want to buy these too; customers who bought this item also preferred these. This is called customer pattern recognition. You look at consumers and what they like, you develop the rule, and about ninety percent of the time it is correct. Why is that the case? Why is the consumer pattern so accurate, whereas in medical diagnosis it is not? Any guess? You have a huge number of people buying, and whether you would like to be a subject of the study or not, you are enrolled the moment you click on it. That is really the key part: they can look at a very broad base and come up with the rule. Of course, it is also a bit different when you bring in biology. Each human is very different, even though we are so alike.

There is also, of course, credit history. Some of my students actually work for those companies, and they apply these techniques to decide whether you should let an individual open an account, looking at the chance of default. There is, of course, economics, where you predict the stock market, and also investment trends and economic trends. And of course there are many different forensic applications; one of my cousins actually works for the F.B.I.
They use this type of technique in forensics, picking up a little fiber from the scene and trying to reconstruct all of this information. So those you have heard about. Now, medicine. The most popular techniques in medicine have always been the t-test and logistic regression, partly because statistics has always been very readily available to medical researchers. Machine learning techniques have not really been very popular, and it is really only in the last five to ten years that this has picked up, although if you talk to people in medicine, one director has been telling me they have been looking for this type of idea for the last twenty years. So I think they really do want very sophisticated techniques.

A lot of these techniques can be used in many different ways. I listed prediction of disease; early detection and intervention I have already mentioned. You can also use them to predict the behavior of individuals when they come into the emergency department. How do you predict the outcome of this individual, and whether they will be readmitted, which is one of the very big challenges for the emergency department?

Some of the techniques I'm going to mention today involve pattern recognition, artificial intelligence, and support vector machines. The favorite methods are in the realm of statistical analysis and data mining. They have been used a lot, and they have bottlenecks. Many of you know that statistical approaches often have trouble with a huge amount of data, the curse of dimensionality: it is hard to manage, and you need a really huge sample size, a luxury you do not have in many clinical trials or clinical studies. And then there are neural networks.
A neural network is one of the easiest and most friendly approaches. It is a heuristic approach: you can write approximate algorithms and find some solutions. It is not an exact approach, but it gives you a very fast solution.

So what is the general scheme? No matter what method you are going to use, the schema is identical. First, you identify the attributes or the patterns for a given data set, any data set that you look at. Then you develop the mathematical models and the computational engine, so that you can establish the predictive rule. That is the second step. Step three is that you validate this predictive rule to see how much confidence you have in it. Then, of course, you refine and retrain: as new data come in, you refine the rule. And then there is the most important step, if you indeed want your rule to be used at all. If you don't, then you can stop at step three or step four and write papers, and that is the kind of paper you see a lot of the time. But if you really want to figure out whether there is any clinical significance, whether you can translate your work into the clinical setting, then step five is most important, because that is when you find out whether you can really predict. You may be able to fit the data really well, but if you cannot predict, the work cannot be translated into clinical usage.

So how do you identify attributes? Let's take a look. I do not know the background of the audience, so for some of you the next few slides could be really simple, and for some they may be new. I'm going to start with the idea of what an attribute means, just to build up your concept of attributes. For example, I have three types of iris plants, and I have fifty in each of these.
What I can measure as the attributes are the dimensions: the sepal length, the sepal width, the petal length, and the petal width. Those could be the attributes. Is it easy to obtain these attributes? The answer is yes, and a lot of times you want attributes to be pretty easy to obtain. What I want to know is this: if I pick an iris plant, which group does it come from? Is that easy to understand? So you have three groups and you have four attributes. Then you look at how to record the data. You can record it like this: you have the attributes, and then you have the class that this iris belongs to. So that's three groups with four attributes.

Now we move on from plants. For those of you who enjoy wine, of course there are some very good wine tasters who can taste a wine and tell you exactly where it comes from. You can also do a scientific test: you can take measurements of the constituents in the wine, and in this case you have thirteen of them. Again we have three cultivars, but unlike the iris data, we have a different sample size in each of these groups. Again these are numeric attributes; I have thirteen of them, and I know which group each sample belongs to. So that's how you represent the data.

Now we move on to humans, and it gets a bit more complicated. You don't just measure height, although of course you can measure that too; there could be different things. This one is thyroid disease. We have three groups, normal, hypo, and hyper, and we have different measurements here. So what are these measurements? If we look at medical data, it could be biological data. It could be clinical data.
It could be other kinds of data, it could be images; it could be anything. So now we move on to the real data that we are looking at. This one relates to a disease study using MRI images. This is a study with biomarkers and metabolomics. And this one is microvascular images and their patterns; I must note that the patterns here involve looking at the branching: how the vessels branch, how many branches there are, and where they are located. With these we actually look at metastasis of cancer across the body, as well as different types of diseases, for example premature aging and macular degeneration. So you notice the patterns become really difficult. Now you don't just count them or measure them; you have to come up with algorithms to look at the patterns and figure out what they are.

Once you have the attributes, and these may just be given to you, you have to develop a mathematical model and try to solve it. That is the next step. So what is classification, in a very simple sense? For example, I have two groups here, represented by the little symbols. I can put in a hyperplane, here just a straight line, and separate the two groups. What do you notice? First of all, did I do a good job in the separation, yes or no? I do not know whether I can actually separate these with a straight line; in this instance the data cannot be separated with a straight line. The objective can be to minimize the deviation, to minimize the maximum deviation, to maximize the margin, and so on, but we do have errors, as you see right away. Misclassification may not be possible to avoid, and we just have to make sure that we do the best we can.

So how do we go about doing that? Let me mention how you can classify. The first kind of discriminant rule says that if I have n groups, I am going to put an entity into one of the n groups and nothing else.
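The straight-line picture above can be made concrete with a small sketch. It assumes two-dimensional points and a candidate line written as w·x + b = 0; the data and the names `side` and `count_errors` are invented for illustration. One group-1 point is placed inside the triangle formed by the group-2 points, so no straight line separates the groups perfectly and the best one can do is compare error counts.

```python
def side(w, b, x):
    """Return group 1 if the point lies on or above the line w.x + b = 0, else group 2."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 2


def count_errors(w, b, labelled_points):
    """Number of points that the line puts on the wrong side."""
    return sum(1 for x, group in labelled_points if side(w, b, x) != group)


# Invented data: (0.5, 0.5) belongs to group 1 but sits among the group-2 points.
points = [
    ((2.0, 2.0), 1), ((3.0, 1.5), 1), ((0.5, 0.5), 1),
    ((0.0, 0.0), 2), ((1.0, 0.2), 2), ((0.2, 1.0), 2),
]

# Two candidate separating lines, each given as (w, b).
candidates = {
    "x + y = 2.0": ((1.0, 1.0), -2.0),
    "x + y = 0.8": ((1.0, 1.0), -0.8),
}

errors = {name: count_errors(w, b, points) for name, (w, b) in candidates.items()}
best = min(errors, key=errors.get)
print(errors, "-> best line:", best)
```

Searching over all possible lines for the fewest errors is the hard part; this sketch only scores two hand-picked candidates.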
This one works well in something like the previous example. What you see there is that I have two groups, and all these dots have to be on one side or the other. It works well if the groups are separable, which means it is easy to separate them and you don't have a lot of errors. Otherwise you will have a high rate of misclassification.

The next one is that, because of that, we don't classify them all into one of the n groups. For example, look at thyroid disease. I have normal, hyper, and hypo. The patient comes to the doctor, and the doctor discovers that the patient's thyroid is not normal. So is it normal, or is it not normal? That is a two-group question, so the chance of error may not be so high. But can you decide the course of treatment if I tell you the patient has an abnormal thyroid? No, because the courses of treatment for hypo and hyper are totally opposite. In this case what we are doing is lumping those subgroups together. You reduce the chance of misclassification, but it doesn't really help you take the next step. So that's the catch.

So what are we doing? This is one of the focuses in my lab. In this one I project the three groups, represented by the one, two, and three, onto a two-dimensional plane. You notice that this is the separation, and you have some errors that you can see. So this is what we call the reserved judgment. Instead of classifying only into the n groups, in this case three, we are going to create a reserved-judgment region for those entities that are fuzzy. This could be the case when a patient comes in and we run some tests, and we cannot tell whether the patient is in this group or the other group. We really do not know exactly where the patient belongs, and we don't want to miss the diagnosis.
Then you put them into that reserved-judgment region, and what it means is that you will take other measurements to separate them and to avoid misclassification. That is the key. In medicine it is a little bit different, in that classification means you are going to take the next step, the treatment or whatever it is. You really don't want to treat the patient wrongly, so you would like to take a very careful step in order to have the correct diagnosis.

So what are the advances that we have? First, we developed a viable computational model for multi-group classification. That means it doesn't matter how many groups you have. For example, I could have eight different types of skin disease that I need to tell apart, and you can apply this model. You can apply this model for cancer patients: yes, there is cancer, or no, there is no cancer. We analyze the complexity of these models and identify the characteristics of the solution space. Of course, the most important part is that we want to test the predictive ability of such a model and how it performs across different types of applications, and we also want to incorporate the predictive rule into clinical decision making.

You notice I put in parentheses here NP-complete, under theory and computation. Those of you from computer science know that these problems are the hardest in terms of computational effort, and this problem falls into that class, which means there is no easy way to solve it. So it is a very difficult problem. In general, here is how the problem can be stated: you are given groups with entities, and each of these entities has attributes. Now I'm going to have just one slide that has all the mathematics on it.
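The reserved-judgment idea from this passage can be illustrated with a deliberately simple stand-in. The sketch below is an assumption-laden toy, not the model presented in the talk: it assigns an entity to the nearest group centroid only when that centroid is clearly closer than the runner-up, and otherwise returns "reserved" so that further measurements can be taken. The centroids, the threshold `min_ratio`, and the function name are all hypothetical.

```python
import math


def classify_with_reserve(centroids, x, min_ratio=1.5):
    """Nearest-centroid rule that declines to decide when two groups are about equally close."""
    distances = sorted((math.dist(x, c), group) for group, c in centroids.items())
    (nearest, best_group), (runner_up, _) = distances[0], distances[1]
    # Commit only if the runner-up is at least min_ratio times farther away.
    if nearest == 0 or runner_up / nearest >= min_ratio:
        return best_group
    return "reserved"


# Hypothetical centroids for three groups in a two-dimensional projection.
centroids = {"normal": (0.0, 0.0), "hypo": (4.0, 0.0), "hyper": (0.0, 4.0)}

print(classify_with_reserve(centroids, (0.5, 0.5)))   # clearly nearest to one group
print(classify_with_reserve(centroids, (2.1, 0.1)))   # almost halfway between two groups
```

An entity returned as "reserved" would then go through another round with different attributes, which is the multi-stage use of the region described in the talk.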
The mathematical model basically assigns a variable to each entity. If the entity is an individual human, you assign a zero-one variable to this individual: one means it is classified correctly, zero means it is not classified correctly. Then there is a mathematical expression to describe which group it belongs to, a mathematical expression to describe that it is placed in the reserved judgment, and a constraint to restrict the percentage of misclassification in each group. The objective is really easy: it is to maximize the number of correct classifications. In a sense it does not matter what model you are going to use; all that you care about is the best result coming out of the mathematics.

The outcome, then, is a predictive rule represented by a mathematical expression. You can think of it like a regression line, or a curve: how you fit it to all the points. But fitting that line is not sufficient, because the most important thing is that when you have another entity you want to be able to predict using that particular fitted line. The outcome from our model is not a line or a curve; it is actually a transformation, a mathematical set, and that is probably the most complex part to explain to a lot of the doctors, although sometimes they really don't care, because it is just a black box in the back.

This is the only bit of mathematics that I'm going to include. This one is entity j from group g being placed in group g. So that's the symbol u_gj: it is either zero or one. If it is one, it means the entity is classified correctly, because I classify entity j from group g back into group g. And I want to maximize that, so I sum it over all the groups and I sum it over all the entities. So this is the mathematical expression.
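Written out, the objective just described might look as follows. This is a sketch reconstructed from the spoken description, not the slide itself: the index sets $\mathcal{G}$ and $\mathcal{N}_g$, the group size $n_g$, the misclassification indicator $m_{gj}$, and the bound $\alpha$ are notation assumed here for illustration, and the expressions that tie $u_{gj}$ to the attributes are omitted.

```latex
\begin{align*}
\max \quad & \sum_{g \in \mathcal{G}} \; \sum_{j \in \mathcal{N}_g} u_{gj} \\
\text{s.t.} \quad & \sum_{j \in \mathcal{N}_g} m_{gj} \;\le\; \alpha \, n_g
    && \text{for every group } g \in \mathcal{G}, \\
& u_{gj} + m_{gj} \;\le\; 1, \qquad u_{gj},\, m_{gj} \in \{0, 1\}
    && \text{for every } g \in \mathcal{G},\ j \in \mathcal{N}_g .
\end{align*}
```

Here $u_{gj} = 1$ when entity $j$ of group $g$ is placed back into group $g$, $m_{gj} = 1$ when it is placed into a wrong group, and an entity with $u_{gj} = m_{gj} = 0$ is one left in the reserved judgment.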
I won't go into detail on this one. We want to maximize the total correct classification, and we want to constrain the misclassification; in this case I put fifteen percent, but you can put any favorite number that you want. The reason this is so important is that, if you do not include the misclassification constraint, you may come back with a solution very fast, but that solution may not be any good. In some of the cases we looked at, some of the models came back with only forty percent correct classification. That is worse than flipping a coin, yes or no, where you get half and half. So you need to be able to have that property, although it is not true that all models can have it, and with it we actually came up with a very good solution.

So what are the characteristics? If you are doing computer science or systems engineering, a lot of the work is not just in the application; there is also the theory and the computation. This basically sums up the areas that you can look at. This is really the first efficient computational model for n groups. If you look at the literature, a lot of times they take a set of data and then define a model for it, so if the data only has two groups, they define a model that handles two groups. We generalized that in the model, and you can apply it to any number of groups. It has a nonlinear transformation that manages the curse of dimensionality. That means you can have a million attributes, but they will be transformed into a space whose dimension is the number of groups. This is the most powerful part, and it is exactly why the model overcomes a lot of the statistical issues. It allows constraints on the misclassification rates.
It provides a reserved judgment, which is really quite important in many different settings, and it also allows you to develop predictive rules at different levels. I will show you that.

On the theory, for those of you interested in computer science and systems: I mentioned this is NP-complete. It is also very interesting that this is one of the very few models in machine learning that is universally strongly consistent. That means that no matter what percentages and what distribution you have in your data, it will always converge to the optimal solution. That is a very beautiful theory to have in terms of your solution space. The key part here is that it really works well when the sample sizes are very disproportionate. That means I could have the normal set, the normal control set, at around one hundred patients, and a disease set of only ten patients. In many cases, methods are biased toward the sample size that is bigger, but this model actually overcomes that difficulty. It is still very difficult to understand all the characteristics. This is just a summary of the work that we do on the theoretical part, so I will skip it.

So how do you validate the predictive rule? Now you have the data, you have the attributes, and you have gone through the mathematical models, so the next thing you want to know is how accurate the predictive rule is. Basically you apply the entities to it. The key part is that establishing the predictive rule is very hard. It takes a lot of computation time; it could take months of CPU time. I have one hundred seventy-six computers available to me, and at any given time they are running lots of jobs related to medical decision making and machine learning. So getting that predictive rule is hard, but testing it on new patients is not.
I take the measurements, and it takes nanoseconds. It is very rapid. So how do you validate such a rule? Ten-fold cross-validation is a common way: you take your training set, you partition it into ten almost equal partitions, you use nine of them for training, which gives you a rule, then you test on the one remaining fold, and you repeat this ten times. This gives you an unbiased estimate of the predictive correctness based on your training set, and this is what everyone does. You can also do leave-one-out or other variants; it does not have to be ten-fold.
The next step is the training, which is easier to show in the figure here. I have experimental data and clinical data coming in. There is a pattern recognition module: as I mentioned, there could be different types of data, including imaging, and you have to come up with the algorithms to process all of these. Then there is feature selection, for example if you come up with many types of attributes, so that each individual has a million attributes. You go through this feature selection, the classifier learns and gets the classification rule for you, and you go around this loop and continue to train until the solution is satisfactory, until you say, yes, this is the correct classification rate that I set. A lot of the time over seventy-five percent is already the best result, or sometimes we are very lucky and we get ninety-five percent, which is really very good. Then the solution is reported to you as a number of different groups. For the entities placed in the reserved judgment group, we continue from here and start classifying again, selecting different types of attributes. This is the beauty: in order for us not to misclassify, we have the reserved judgment, which allows you to continue with a multi-stage classification, and this works really well in the medical arena.
Blind tests. As I mentioned, the blind test is really simple.
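The ten-fold cross-validation procedure described above can be sketched in a few lines of Python. This is only an illustration: the nearest-centroid learner is a stand-in chosen for brevity and is not the speaker's classification model, and the function names `k_fold_indices`, `train_centroids`, `predict`, and `cross_validate` are hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_centroids(X, y):
    """Stand-in learner (an assumption): one mean vector per group."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for d, value in enumerate(row):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {g: [s / counts[g] for s in acc] for g, acc in sums.items()}


def predict(centroids, row):
    """Assign the row to the group whose centroid is nearest."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda g: dist2(centroids[g]))


def cross_validate(X, y, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, repeat for every fold."""
    correct = 0
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        rule = train_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(rule, X[i]) == y[i] for i in held_out)
    return correct / len(X)  # fraction of entities classified correctly
```

Calling `cross_validate(X, y, k=len(X))` gives the leave-one-out variant mentioned in the talk, since each fold then holds a single entity.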
You just take a new patient, you take the measurements, you take those patterns that form the discriminant rule, and then you test it. I want to mention something very important: prediction is not the same as correlation. If A, then B: that is a correlation. That is what you see when you look at the phenomena; if I do the experiment and I see these cells, I see a correlation. Prediction is quite different. It means A is equivalent to B: A implies B, and the conditions in B are sufficient to really describe A, or predict A. So this is really the power. A lot of the work that looks at characteristics of things really focuses on the correlation. What I am saying is that predictive medicine is still rather new in this area, because the traditional way of thinking is always to look at correlation without the predictive power. We tend to really overfit the logistic or regression lines, because we want to look at all the characteristics without really thinking: if you have a new patient, how are you going to predict definitively? So it is a very different way of thinking.
I won't be able to go through all the cases; that is absolutely not possible. But I will show you some of these that are really exciting results, and the first two we can go through in greater detail. The first one is a very important piece of work that touches upon both medicine and public health, which is why I want to mention it. The second one is on early diagnosis of cancer. Then we have the Alzheimer's disease project, and then a metabolomics project. The last two are different, as you notice. The fifth one is really trying to look at what characteristics can predict how patients go to the emergency department. It is very interesting.
We were able to look at thirty-five pediatric sites and find the discriminant patterns, within all of these characteristics, that can predict that yes, these places will be visited by lots of children at a certain time. The characteristics are very interesting, and this helps them with policy and with operations. The last one is predicting the readmission of patients, which is also very interesting, because Children's Healthcare of Atlanta was able to capture all of the information, not only about the patients but also about the physicians, the nurses, and the doctors, what they are doing every single minute. So we were able to predict that. Those are quite interesting things, and I just throw them out here so that you get a sense that, if you are not so inclined toward the bio side, you can look at the health care delivery side and still use this type of approach.
So, predicting the immunity of vaccines. The study is to develop methodologies to predict the immunity of a vaccine without exposing individuals to infection. That really addresses a long-standing challenge in developing vaccines, where we can only determine immunity or effectiveness after the vaccination, often only when the individual has been exposed to the infection. The study was supported by biodefense funding from NIAID, and it is really trying to figure out how to develop vaccine evaluation technology so that we can tell individuals whether or not they should take a certain vaccine during an emergency or a pandemic. Two of the models have already been completed in our study.
The first one employed yellow fever as a model system, partly because the yellow fever vaccine has been administered to nearly half a billion people over the last seventy years, and a single shot induces immunity in many people for nearly thirty years. So it is a very powerful vaccine, and of course we are not using it in the United States. But despite the great success of the yellow fever vaccine, very little is known about the immunological mechanisms that make it effective. The design of the experiment is that we vaccinate a set of healthy individuals with the yellow fever vaccine and study the T cell and antibody responses in their blood. Blood is taken at different intervals: day zero, day three, day five, day seven, and all the way to day sixty. So we have lots of gene expression patterns in the white blood cells over a period of two months, and among them we have about fifty thousand gene-time signatures. Now remember, we have about twenty-five thousand genes in the human body, but these are gene-time signatures: each one is the gene and the characteristic of the gene at a certain time, so there is a time stamp that relates to it. Just to show you some of these, you can see the different days and how the T cells respond. Of course, if you just look at the experiment, you know there are differences between different patients, but we want to know whether patients respond well to this vaccine or not. That is the key part.
So the first goal is to identify distinct gene signatures that can predict the immune response and the antibody response induced by the vaccine. We want to know whether there are early signatures of innate immune activity that could predict the subsequent T cell response.
That is very important, because that is the mechanism by which the immune system responds to vaccination. And how about prediction of the B cell and antibody response? We want both the T cell response and the antibody response. The second goal is the real one. First we want to identify the discriminant patterns, and second we really want to test whether the patterns that we identified are actually useful for prediction. So we validate the findings on a separate set of healthy individuals who received the vaccine: we predict, and then we tell the clinicians and the biologists what we have found in terms of the response, and they validate it in the lab setting. That is a very easy way to understand it.
The results were very exciting. We were able to identify very distinct gene signatures that discriminate the level of T cell response and also the antibody response induced by the vaccine, and the blind prediction yields ninety percent correct classification. That is the big deal. I should mention that this is the specificity as well as the sensitivity, so it is both of them. It is important that we classify correctly and also that we do not have a high false positive rate, so we need both, and both of these are over ninety percent. This is the first study of this kind, and it really gives us the ability to look at vaccine evaluation and at how we determine the effectiveness of a vaccine. This is still a very open problem. In this country, for example, it is difficult to rally citizens to take the flu vaccine, because some of them worry about side effects and simply do not want to come. You could take a small sample, look at the gene signatures, and then we would be able to process that and know whether they will respond well or not.
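The point about needing both sensitivity and specificity can be made concrete with a short sketch. The function name `sensitivity_specificity` and the labels below are made up for illustration; they are not data from the study.

```python
def sensitivity_specificity(actual, predicted, positive):
    """Return (sensitivity, specificity) for one designated positive label.

    Assumes both classes occur in `actual`; otherwise a division by zero occurs.
    """
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # positive case, correctly identified
            else:
                fn += 1  # positive case, missed
        else:
            if p == positive:
                fp += 1  # negative case, falsely flagged (false positive)
            else:
                tn += 1  # negative case, correctly rejected
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical blind-test labels: 10 responders ("R") and 10 non-responders ("N"),
# with one mistake in each group.
actual = ["R"] * 10 + ["N"] * 10
predicted = ["R"] * 9 + ["N"] + ["N"] * 9 + ["R"]
print(sensitivity_specificity(actual, predicted, positive="R"))
```

A rule that labels every individual as a responder would score a sensitivity of 1.0 on the same labels but a specificity of 0.0, which is why a single accuracy figure is not enough.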
So these are the rules, and I know the slide is not easy to see. These are the genes at different days that we identified, and these are the rules. This one, for example, uses only three of these. You see that out of all of these fifty thousand gene signatures, the method is very selective and chooses only three to five that are critical. You can think of it from a systems viewpoint. Look at a system where you have big nodes that are connected, and thousands, or fifty thousand, of other nodes that are tiny little ones and not so important. If I pull out the connectivity of these big nodes, what happens? The system collapses right there. These are the critical nodes. So these are the critical signatures, and each of these columns corresponds to a rule. Why is this important? In a sense, for the biologists who test things, it is nice to have different options to test. This is interesting: a lot of the time we look for just one rule, but the biologists really like that there are many rules.
So what is the clinical significance? The ability to successfully predict the immunity and effectiveness of vaccines would facilitate the rapid evaluation and design of vaccines for new and emerging biological agents. That is why I mentioned that the work was actually started with the biodefense fund, when there was a great sense of urgency to be able to understand the biological agents that we are facing and how fast we can come up with an effective vaccine. It also identifies the individuals who are unlikely to be protected by the vaccine, and they could be exempted from taking the vaccine. And it answers very fundamental questions that could lead to better vaccination and prevention of disease. We have a paper on this published in Nature Immunology. So what is next?
We just finished this study, I think maybe three months ago. As you know, the first model is yellow fever, and the second model took three years just to get the sample data points. That gives you an idea of how long it can take when you use human data. There are three trials of patients, in 2008, 2009, and 2010. They involve different flu vaccines as well as FluMist, so some of them are sprayed in the nose and some of them are given by injection. We are also able to predict with ninety percent accuracy across the trials. The idea is that we use one trial to develop the predictive rule, and then we blind-predict the other two trials. What is really fascinating is that it is an expensive study, in the sense that it involves human subjects, a lot of blood sample analysis, and a lot of genomic studies, and it also takes time to collect all of this. For this paper we spent so much time just collecting the data, but the results are really very exciting. It is also published in Nature Immunology, just last month.
The next step is that we are really looking at vaccine evaluation for many different types of vaccine and at vaccine design for infectious disease outbreaks. If you just look at flu, never mind the very fancy biological agents, it takes a long time to develop the right vaccine, and you have to test it. So the idea that, three days after injection, we are able to predict whether it is actually working or not would really speed up the design of effective vaccines. This is really the key part of the study. So, the next one.
This one, as I mentioned, is both medicine and public health, because vaccination gets into the public health part, where the CDC is asking citizens to go to a certain place, and this will help them in determining who should get the vaccine. It also has a very great impact on protecting military personnel, because many times they are exposed to different types of agents, and a lot of the time they will be asked to take a vaccination, which could have grave consequences. If you can predict that yes, this individual should take it, and this individual should not because he or she may suffer an adverse reaction, that is really important. It is only ninety percent correct, but when it comes to medicine, if somebody's result is one hundred percent, you are going to start wondering exactly what that means.
The next one. I don't think I have a lot of time, but I will mention this one because it is a very futuristic type of work that everybody is really excited about, and we started this research about seven years ago. We looked at epigenetics, at sequence signatures, to see if we can predict certain phenomena. CpG islands are regions of the DNA sequence with a high concentration of CpG dinucleotides, about six to twelve percent versus the normal three percent, and across the human genome we have about twenty-nine thousand CpG islands. Why are they important in epigenetics? Look at the methylation. These are the normal cells, this is the island, and this is a schematic description of methylation and how it looks in cancer cells: they usually have hypermethylation, followed by hypomethylation.
So you get very specific information when it comes to cancer cells. Aberrant methylation, that is, abnormal methylation of CpG islands, occurs frequently in human cancer, and aberrant methylation leads to inappropriate gene silencing. A lot of these genes are responsible, in some cases, for cell death: if the genes are silenced, the cells will not die, and that is the dilemma, because it leads to cell proliferation and eventually to cancer formation. So what we are looking at is an epigenetic phenomenon, the gene silencing associated with aberrant methylation of promoter-region CpG islands, and what we want to know is whether there are any sequence signatures that will allow us to predict such phenomena. Note that this is no longer about a gene that is responsible for cancer, like the BRCA1 gene. We are looking at sequence signatures, little segments of the genes in your body, that would give you some idea of whether you are more prone to have cancer or not. Little is known about why these genes succumb to these aberrant events. But what is known, and this is the interesting part, is shown here: these are the genes, these are their chromosomal locations, these are the tumor types, and these are the genes that are silenced by methylation. You notice that a broad variety of genes is being silenced, and it covers a huge number of cancer types. That is the interesting part that we would like to look at. We want to understand the importance of CpG island methylation in cancer at the genomic level, and we want to develop predictive rules so that we can identify signatures that could predict the methylation status. The idea is really to be able to reactivate the genes being silenced by reversing the DNA
methylation. This gives you new therapeutic sites: if you can identify those sites, you can target them and reverse that reaction. You can develop novel treatment strategies by blocking or reversing the methylation status, and also develop methylation markers for cancer prediction, treatment, and prognosis. So this is really the broad scheme of it. It is interesting because, as you know, my training is in mathematics, which is exactly the opposite of biology in terms of what we know; we only know everything as equations. But this one is actually quite beautiful. We started this quite a long time ago, and some of the signatures we identified were already known to the biologists, even though they came out of our computer.
First, the biological data. We analyzed the susceptibility of seventeen hundred forty-nine unselected CpG islands to de novo methylation driven by the overexpression of DNMT1. Then we identified features and developed classification rules. We took the islands given to us by the cancer biologists, developed the rule, classified them, and determined the correctness of the rule; then we also classified other islands, and then did a prediction on the human chromosomes. So this is the general schema: very simple to understand, and very complicated to actually achieve. These are the methylation-prone CpG islands from the breast cancer cells, and these are the methylation-resistant CpG islands. That is the input, really just all the sequences, and the first step is pattern recognition. You run the sequences through our computer program, and it comes back with twenty million little sequence patterns.
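To give a feel for why the pattern recognition step produces so many candidates, here is a minimal sketch that enumerates every short subsequence of a DNA string. It is not the speaker's program: the function names `count_patterns` and `pattern_presence` are hypothetical, and the default lengths of five to twelve base pairs follow the range mentioned in the talk.

```python
from collections import Counter


def count_patterns(sequence, k_min=5, k_max=12):
    """Count every contiguous subsequence of length k_min..k_max."""
    seq = sequence.upper()
    counts = Counter()
    for k in range(k_min, k_max + 1):
        # A sequence of length n contains n - k + 1 windows of length k.
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


def pattern_presence(sequences, pattern):
    """Turn one pattern into a yes/no attribute for each sequence."""
    p = pattern.upper()
    return [p in s.upper() for s in sequences]


# Toy example with a made-up sequence and short pattern lengths.
counts = count_patterns("ACGTACGT", k_min=2, k_max=3)
print(counts["AC"], counts["ACG"], counts["TA"])
```

Each distinct pattern can then serve as one candidate attribute, for instance through `pattern_presence`, which is how a set of sequences could turn into millions of attributes before any feature selection is applied.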
How long does that take? The answer is: very long, because this is a very difficult problem even when just looking at short sequences; we look at patterns of five to twelve base pairs. Once we have the sequence patterns, we select a small subset of all of these patterns and then classify: we divide the islands into the different groups, methylation-prone and methylation-resistant, and then we apply the model to them. So that is the general schema. What do we get? On the training set we get ninety percent correct classification, which is really quite exciting, because that is the first time sequence signatures have been identified that allow prediction of methylation-prone versus methylation-resistant sequences. Then, in the blind prediction, we were fortunate to have forty-four sequences, and we achieved over eighty percent correct classification on the two groups. Now, what is quite interesting is that out of the twenty million patterns we selected only seven. This is the key part. Those of you who are very familiar with biological work and wet labs will always remember that you generate a huge amount of data. How are you going to sensibly narrow it down and get the very few that you keep? You can have a million of them and you overtrain, so you must have a very small set, and these are the key ones. Two of the seven patterns identified by the algorithm are actually known elements, so we are able to pinpoint exactly where they relate to cancer, on which chromosome, at which location, and where that particular pattern occurs. It is very interesting because it opens up ideas about how you could intervene at the therapeutic level. The results are very exciting. We published the results in PNAS, and within two months, I believe, though I forget exactly which group,
another group was able to reproduce the results using a mouse model. Once somebody tells you what those signatures are, you can apply them right away. It is very hard to find those signatures to begin with; once you have them, it is easy to test them. What we did next was to apply the rule to chromosomes twenty-one and twenty-two. As I mentioned at the beginning, there are about twenty-nine thousand CpG islands that we know of in the human genome, but over the last twenty years, using the biological approach, fewer than two thousand have been analyzed, because it is really quite expensive to analyze them and you don't know which ones to analyze first. So we applied our predictive rule to chromosomes twenty-one and twenty-two. Out of, I think, about fifteen hundred seventy islands, we identified fewer than a hundred and sixty as methylation-prone, and I sent those back to the biologists to test in the lab. In a sense this helps the experiments, because now you can prioritize which CpG islands are more interesting to analyze first. They analyzed them, and we were about eighty-five percent correct. The odd thing is that we started out with breast cancer, and we were predicting lung cancer. That is strange; from my point of view I have no understanding of the differences or similarities. But if you remember the slide where I put up all of the cancer types, there is clearly a lot of information about the cells, about cancers of different types, and about methylation and epigenetics. What most of the groups have been doing is focusing on one type of cancer at a time; that is the focus of their work. We are able to really spread it across cancer types.
So, basically, for chromosomes twenty-one and twenty-two it gives us really good results, and we have the paper in Cancer Research. The work that we are doing now is that we have moved right into the lung cancer cells and are looking at the patterns there. For that study we were able to select fifteen discriminant patterns out of well over two million that we identified. The key part is how you guide experiments in this case. It is the same as in drug design, where you look for the binding site and a lot of the time you have a few candidates. Maybe fifteen of them work really well, or there are fifteen that you want to test. How do you identify the ones that work, so that you go for those first? That is very important. In terms of broad impact, this really opens up the opportunity to identify the methylation status of all CpG islands in the human chromosomes, and it takes us really no time. To test chromosomes twenty-one and twenty-two and come back with a prediction took us only seven days, and that is about fifteen hundred islands. By contrast, the seventeen hundred fifty or so that were identified earlier took more than twenty years of work, because you have to do it experimentally. Now we can predict and check very fast. In the global aspect, we can analyze across many different types of cancer, and you notice that there is a lot of correlation between different types of cancer, between prostate cancer and breast cancer and all the others. So this is really about early intervention and novel treatment ideas, because it opens up the more likely targets for intervention.
So this one; I will only show you a few pictures. For this one we have lots of patients. This was actually the first study that I ever wanted to do when I first got into machine learning and prediction.
When I started at Columbia and was building this model, I really wanted to do Alzheimer's disease. But at that time, when I asked for patients, they told me, we have no patients; we only know definitively that they had Alzheimer's after they die and we do an autopsy. That was fifteen years ago. Maybe it was good, because it gave me a few years to develop the models, to really understand them well, to apply them to the customer and consumer market and all the things in finance, and then come back to the medical side. Now we have lots of patients, and that is the interesting thing. What do we get in this case? We have the Mini-Mental State Exam. We have different types of tasks: you ask them to do different things, you ask them to draw a clock, you ask them about the depression scale, and you ask them about words that they are familiar with. These are what we call the neuropsychological tests. They are very simple tests, but as you know, these tests have many different components, and it is difficult to figure out what we can use them for in terms of diagnosis. The first group actually used them and just finished a study with three groups of patients: those with a normal brain, those with early cognitive impairment, and those with Alzheimer's disease. We were given the neuropsychological data, the MRI imaging, and much other information. The interesting thing is that we were told there were only two groups. The doctor told us one is the normal control group and one is the patients with Alzheimer's disease, and we found three groups. We found the one group that has early signs of cognitive impairment. That is the group we would target for early-stage intervention.
In a sense that really gives you the capability to slow down the progression to Alzheimer's, or even stop it, because there are medicines that we can provide to the patients. So this is very exciting work; we recently finished it and I am finishing up the paper. The predictive rule on the first set of patients was one hundred percent correct, and that really shocked me. Of course I went back and said, all right, that is not possible. So we looked at another set of patients; we have about two thousand patients that they were able to collect. Now we have refined it, and it is about ninety to ninety-five percent, which is more comforting, because it is really difficult to claim a predictive rule that is one hundred percent correct. There have to be differences among humans. I am a good example: for everything the doctor measures on me, they always tell me I am different from everyone else. So you will always meet some exceptions.
This one is about metabolomics, just to show you. It is a kind of interesting study in which patients come in, their metabolites are measured, and from these measurements we try to predict the different metabolite levels in the body. This is actually a very important tool for some sports medicine studies. As you see in the newspaper, when a young individual collapses on the court with a heart attack, that is usually the first sign and also the last sign, because they usually die. So in some of the sports medicine work that we do, we take blood from the individuals and try to track them, not so much to track drugs or anything, but to track whether there are any
changes in these metabolites that will give us a hint of a problem or a deficiency in certain metabolites, so that we can intervene right away. These patients were healthy individuals who came into the lab and were measured.
I want to acknowledge my collaborators; there are many of them, and I don't think I can list all of them. We have been very fortunate to get funding from the National Science Foundation, several grants from the National Institutes of Health, and also grants from Georgia. Thank you for your attention. If you would like to receive some of the papers, or if you are interested in more of these projects, you are welcome to send me e-mail and ask me about them, and I would be happy to describe things. We are also leading some of the work on vaccine design; as you know, there are center grants from NIH for vaccine design, and there are grants for treatment outcome prediction, on how patients really respond, for example patients who have heart disease, how they respond, and how you decide on drugs. And of course, since I started out with cancer, we have lots of projects on cancer. Thank you.
Any questions? Yes.
The interesting part, as many of you probably know from the newspaper, is that by the time you actually see the change in the MRI, it is too late for early diagnosis. The reason the neuropsychological data is really good is, as I mentioned, that it takes only five to ten minutes to collect, so when you go in for a physical you can do it. This is quite important, and for those of you who are familiar with traumatic brain injury caused by IEDs, it is also a very good baseline measurement to take for military personnel. So we want very cheap and noninvasive measurements. Now, MRI
is noninvasive, but it is not cheap, and even a very minor change in the MRI is actually caused by quite a bit of change already in the brain. So by the time you can see it, it is a little bit too late from our point of view. But, like I say, imaging is advancing, and there could be a lot of advances, such as nanoparticle imaging, that may be able to detect the tiniest thing. In what we did, the neuropsychological data turned out to be really good, but we were also given other data, so we just used everything. What you notice is that the method will identify the key ones, and that is why those are important.
I did not use statistics, which reflects my training, because, as I said right at the beginning, it is not the p-test that is important here. The p-test is about correlation. Remember the difference: this is really about the decision-making process within the clinic, so it is very different.
Yes, there is a problem, and they realize that. As I just mentioned, at NIH they realize that statistics is very important and integral to clinical trials and studies; there is no question about it. But with a statistical approach, even if you ask someone who majored in statistics, they know about the curse of dimensionality, and that is really difficult for statistical approaches, parametric or nonparametric. That is the really difficult part. For example, in one of the studies we have something very interesting: thirteen individuals that behave in a certain way and two individuals that behave differently. If you apply almost any statistical approach, it will classify everything into the first group, and it appears to have very good results if you just look at the overall number, without noticing that you have zero correct in the other group. So that is really the major difference. So I skipped that on purpose.
I skipped that, but I already noted right at the beginning that the focus of this talk is not statistics, because I am not an expert in statistics. The good thing is that a lot of the problems that came to me have already been vetted by statisticians. For example, some of these data sets have been analyzed by statisticians, and they were not able to come up with the patterns. So that's the reason for this method.

Yeah, yeah. Absolutely. Absolutely. I mean, for those of you that know artificial intelligence, I think that was really the big thing back in the eighties, which was before I was a Ph.D. student, and then it kind of died. Part of the reason, and it's kind of interesting, is that at that time it was difficult for them to validate things. You know, you have to be really bold to say, "Yes, this is the predictive rule." What if everything is wrong? You don't always get it right, and you have to be brave. As a researcher, what I like to tell the students is that if you find out it works, that's great, we can write a paper. If you find out that it does not work, we can still write a paper, so that others do not do the same thing not knowing that it is not going to work. So I think being able to analyze it and being able to test it blindly is really important.

And you know the IBM machine that beat the humans on Jeopardy. That is actually very limited, in the sense that it holds everything in memory and tries to connect everything, right? That is not yet AI. The true AI part is the chess machine. So why is it that it beat the champion after three trials? If I were the chess champion, I would only play once. Why? Because as you are playing, the machine is learning your steps, and the machine can move maybe n steps per second, right?
So you have no chance but to be beaten, even if you are the champion. That is the whole idea of machine learning, and it's really powerful. Of course, those machines are just for playing chess, but if you look at these types of techniques, it is AI. The reason I mention AI is that it kind of died down in the nineties and the early two thousands. Now it has come back, so everyone is talking about it. I think now people understand much better what's going on. Knowing how to validate, and being able to figure out what is right and what is wrong and sort things out, is really important.

If you can do studies using multiple clinical trials, that is really beautiful. You can also do studies where, say, you have one study and you derive the rule, and then you go back and have new studies and check and validate. That is also very beautiful, because it gives you the ability to refine your rule, to validate, and to know whether the patterns you find actually are important. This I can tell you: every single day there must be millions of patterns being found, but at the end of the day there may be only two or three left, or maybe zero. I mean, on a regular basis we find lots of patterns, and my students always tell me, "Dr. Lee, we have right now several trillion of them." I tell them to continue to run, because I don't know where the validation goes. I am not going to take the risk of being sloppy.

That is important if you want to make it work in the clinical setting. In the same way, if you want to make it work in other areas, I guess those are getting very mature because they have a huge population base, right? We don't have that option. Unfortunately, clinical trials have at most maybe a hundred patients, other than the very big medical trials.
So that's why the statistical approach is very limited in that respect.

That's a very good question. Now, how do you learn this? You learn the machine learning in the systems engineering department and in computer science, and then on the biological side you also learn how to integrate all of the pieces together. That is important. So I hope I answered your question. But don't worry: as I say to everybody, all my students that started with me had no knowledge of the basics. I teach them on an individual basis, because I think sometimes there are things that they have to do, and it is a matter of knowing how to use them.

Yes. Yes, I can, sure. And I can also give you a book chapter that I have written that gives you the details on the challenges, as well as the way to go if you are a Ph.D. student who wants to go into this area. It lays out the issues that you should look at, because these are very hard problems. So it's not going to die. It's not going to go away. It's going to stay for a long time, maybe for another hundred years or so.

Any other questions? All right, thank you. If you have any questions you can always email me. Thanks.