Thank you so much. Thank you for having me here; this is a very, very impressive campus. You know this, but it's amazing to come and see it from somewhere else. Honestly, the level of interdisciplinarity that I have seen since last night over dinner and this morning makes me, I guess, jealous, but honestly it's wonderful. Anyway, I'll start with a quote, and thanks for the introduction, by the way, which is probably better than what I will be able to do for you guys. I like this review by Paul Nurse. I'm going to try to speak the language of biologists to keep them interested this morning, but also try to reach out to those who don't do biology every day, because we really need to work together. Obviously, if you do biology every day, this is a nice thing to read. Paul Nurse, you know, Nobel Prize winner for cell cycle discoveries, reminds us that there really are four pillars in biology. The cell, right, which is obviously the basic structural and functional unit of life. Just imagine a time when you did biology and you didn't know that, right? And that time of course existed before we knew about microscopes, which again were an engineering-slash-technology discovery, in a way. The gene is the basis for heredity.
We all know that now; we think we have it solved. I think there are still interesting questions to be asked, though, as to what a gene is. Evolution by natural selection is a characteristic of life. And life is based on chemistry, as many of you here know, and I've seen examples of that this morning. So those are four pillars that we all accept; it's basically, sort of, Bio 101. Now, "the gene is the basis of heredity" really started with this paper in 1941. OK, we always talk about the double helix and that sort of thing, and obviously that's important: 1953, you know, Watson and Crick. But Beadle and Tatum got the Nobel Prize themselves because they discovered this direct link between a gene, which at that time was a very theoretical concept, and enzymes. Right. So it's basically "the gene is the unit of heredity" and "life is based on chemistry." However, the reason why I put this in is: look at the abstract of that paper, 1941, just to remind us that systems biology is nothing new. What those guys say is this: an organism consists essentially of an integrated system of chemical reactions controlled in some manner by genes. 1941. Since the components of such a system are likely to be interrelated in complex ways, it would appear that there must exist orders of directness of gene control, ranging from simple one-to-one relations, which have been studied by biologists over the last twenty years in great detail, to relations of great complexity. 1941. So based on that, Paul Nurse, if you read the review (and that's why I was interested in it), imagines that really the fifth pillar, the fifth key point of biology, with which we are of course much less comfortable than the first four, is summarized here as: biological organization is based on logical and informational processes and structures.
OK, so in other words, we're at the stage now where biologists like to do their reasoning with those four key points in mind. The fifth one is much less clear, and how we actually go about it is the topic of this talk. Now, there are several ways that you can think of organization. First of all, you can see how cells are organized to form the brain, for example, or any other organ. Or you can go inside the cell. This is beautiful work, actually; we were just talking about it earlier this morning. It's cryo-electron tomography by Wolfgang Baumeister, where this is actually a model that comes from a set of pictures of the cell that are then compared to actual structures of enzymes that people know about. So for example, in green here is the ribosome. So now you see, in an intact living cell, the quote-unquote organization of those structures. And those are actin filaments here. This is great, I love to look at this, right, and Wolfgang gives amazing talks. But if you think about it from the point of view of the entire proteome, the complete set of proteins, and also other macromolecules, this is a very small number of molecules, right, in terms of the types of molecules that the genome is capable of encoding. There is another way you can think of organization, one that we know much more about for historical reasons: life is based on chemistry, and the work of biochemists has been incredible over the years. And so we already have here a notion of networks, whereby basically metabolites can be linked to each other by virtue of how they are substrates and products of common reactions. You've all seen this beautiful metabolic network. It turns out there is actually some organization here; there are properties in this network that are interesting. What we're more interested in, as you heard in the introduction, is what we call interactome networks, for lack of a better word.
OK. By interactome we mean the complete set of protein-protein interactions in a cell, or in an organ, or in an organism, right. So this is a beautiful structure, right. You have those two guys that homodimerize, and then they interact with this other one over here. What we would do if we were looking at those three interactions is summarize it this way. Beautiful three-dimensional structures, as you just saw, would be summarized down to a node for a single protein, and the interactions that we detect between them would be summarized by links, or lines, sometimes called edges, as you will hear over and over in the talk. So in '93 I was working in a lab that was very good at detecting interactions, and I started thinking: how do you actually find a way to get out of the protein that we were staring at at the time, RB? And what would the network look like if we were able to map them all, basically, so to speak, come out of our own neighborhood and start zooming out, like you could do on Google Earth, for example, or something like that? You know, pretty soon you go from the Georgia Tech campus all the way to the United States of America or something like that. And so in '93 we didn't really know the number of proteins (we'll come back to that in a second), but we knew it was a relatively large number of things, tens of thousands. And so the idea was: can we actually generate data that would give us an idea of what this network actually looks like? So the hypothesis is that global and local properties of interactome networks relate to biology. That's the major assumption of all this work. OK, essentially, that if you were able to map this, right, this huge beast, what you would find there would not be random; there would be some information there that might relate back to the biology.
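As a rough illustration of the node-and-edge summary described above, here is a minimal Python sketch. The protein labels `A` and `B` are hypothetical, and collapsing the two copies of a homodimerizing protein into one node with a self-edge is an encoding assumption made here, not something stated in the talk.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        adjacency[a].add(b)
        adjacency[b].add(a)  # for a self-pair (a, a) this records a homodimer
    return dict(adjacency)


# Hypothetical protein labels; the real proteins on the slide are not named here.
interactions = [
    ("A", "A"),  # protein A homodimerizes
    ("A", "B"),  # A also binds a third chain, B
]

network = build_network(interactions)
# Number of distinct partners per node (a self-edge counts once).
degree = {protein: len(partners) for protein, partners in network.items()}
```

All structural detail is deliberately thrown away: each protein is one key in the map, and each detected interaction is one edge.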
Remember the fifth point of Paul Nurse, that biological organization might be the fifth pillar. My main motivation was that even though I trained as an engineer, I then became a geneticist, and I realized very quickly that there were, and still are today, very many unresolved aspects of genetics, which I'll tell you about on the next slide. A second motivation for me was this idea that gene number versus biological complexity was very paradoxical. I'll show you that in a second. So those were my two main motivations. To tell you what I mean by unresolved aspects of genetics: this is very cartoonish, and it's designed that way, so I apologize to those who know a lot about genetics. But, you know, we geneticists keep saying this gene does that, this gene does that, and if we knew what all genes do, that would solve it. It's actually just not true. For one thing, every time we do an experiment where we compare wild type to mutants, right, where we mutate a particular gene, we have any one, sometimes all four, of those problems. OK. Many mutations give rise to no phenotype whatsoever, in which case what does the gene do, really? We can't detect anything, at least under normal laboratory conditions. Many mutations in one gene will give rise to many different kinds of phenotypes depending on the conditions, and sometimes stochastically, depending on the individuals that you're looking at. And more importantly, in worms, for example, we have beautiful examples where, if you compare a wild type to several mutant animals, only a small number of the mutants show a phenotype, even though they're completely genetically identical, isogenic. So their genomes are identical, yet their phenotypes can be different between each other.
That is sometimes called incomplete penetrance. And by the way, the same sort of thing can be explained with this weird word, expressivity, whereby different animals with the same genotype can actually give rise to quantitatively different kinds of phenotype. So what can explain this? Right, the sort of good old model, one gene, one function, from Beadle and Tatum on the second slide, is not going to be correct. There's just no way. And the second motivation that was mine, and I'm sure you've been hearing that argument so many times, is: how do we explain those organisms with different "complexity"? I put quotes because I get this question all the time: what is complexity, how do you measure it? I don't know, but I don't really feel like a yeast cell myself, and yet we only have [unclear] thousand genes; you know, a threefold difference in the number of genes between us. So how can we explain that, if we take the simple model that DNA makes RNA makes protein, and that provides functions, one function at a time, and we'll solve it that way? So perhaps we might retire this sort of central dogma, or at least find a complementary idea, and that would be the following: the cell is a system that's organized very differently than this sort of, you know, focused view. So the idea is that the interactome network (proteins, but by the way I also mean to include every macromolecule, RNAs, when that is relevant), that this network actually has an impact. The genome, you know, of course is important; it contains all this information. But the genome is nothing without the interactome network. Close your eyes for one second: if you isolate DNA, DNA will do nothing. There are cases where we can do a lot of beautiful biology with interacting proteins without DNA. That's really the motor of everything; that's the dynamic aspect. And obviously there's a transcriptome in between.
So, you know, those are the motivations to do this. Now, how do you start building that network? As I was trying to show you, there are several ways, and fifteen minutes is too short to really cover other people's work; I apologize to those who might be in the room, and we can take questions to compare what we do to others. Of course I'm here to tell you what our approach is. That doesn't necessarily mean that it's the best; it's meant to be complementary to what everybody does in the field. But where everybody agrees, I think, is on this one: if you had a genome, and if it was well annotated, and if you knew what all the proteins of an organism are (think about yeast, C. elegans, whatever else), then you could express all those proteins; you could make them. Remember, in network theory a lot of people map networks where the elements, the components, already exist. If you're a sociologist, people exist already; you just have to map how they interact with each other. In our field we first have to make our own versions of the components of the network before we can actually map the network. And obviously the goal is to test pairwise combinations for possible physical interactions, and this is very important for the people in the room who are more quantitative, for those who don't follow biology closely. This is an old cartoon; it's from a very famous paper published in 1989 by Stan Fields and Song, where the idea was, for us, really revolutionary. What it is is that you take two proteins, which I will call X and Y, and you make basically hybrid proteins: one hybrid between X and a DNA-binding domain that can bind DNA, and another hybrid between Y and an activation domain that can activate transcription. Only when the two proteins can interact with each other do you get a transcription factor, so to speak, and that can be measured by the turning on of the gene that's downstream here.
It's called the yeast two-hybrid system; again, I'm trying to speak about those things for people who have read different kinds of literature, and I know some of you know this. And the point of this slide is to show this idea, where you can now hope to do this in a high-throughput way: systematic, well controlled, unbiased. That's very important; mapping a network in an unbiased manner, as you'll see in a little bit, is extremely important. OK, so that tool, for us, was really what made us start thinking about this entire enterprise of looking at the network. Now, before I go into five or six slides of a little bit of history, and then we'll get into the current stuff that's not published, I want to make a couple of cautionary notes, because I get that question all the time, and it's a good question, and actually I'm writing a little perspective on this. We are mapping what we could call the biophysical network, meaning we're testing proteins two at a time, and the question we're asking is: can they interact? So that's what I mean by possible interactions, in a reasonable dynamic range. OK. This is to be contrasted with what biologists really would like to have, and that is networks depending on the conditions, the cell types, the organs or whatever, you know, different networks where we know that the interactions actually do take place in vivo and are relevant for whatever physiological reaction they're looking at. And I think that those two networks don't need to match; excuse me, the union of all those networks here doesn't necessarily need to be identical to this one, and I can discuss that during the question session. The reason is this: I think there are physical interactions that we can detect that are absolutely not artifactual, you know.
Not at all: they are happening, they're biophysical, but they are prevented from happening in vivo. Remember, we're using an artificial assay here. I can discuss this more if you want to, but I wanted to put that in. OK, so the whole thing started really with three papers: one by [inaudible], one by Stan Fields himself, who came up with the yeast two-hybrid system, and one from our lab, in 2000. What we did was basically start this whole matrix by saying, you know, doing it all without any controls is completely ridiculous; we're not going to be able to do anything with that. Let's start with known proteins, a small number, go deep into all the proteins predicted from the genome sequence, and take those starting proteins from a well-characterized biological process. So we chose at that time something that, you know, it is what it is: vulval development. These are proteins that are really well known, biochemically characterized, genetically characterized pathways; you've seen those images. Even if you don't read biology every day: those names are proteins and genes, and that was the picture of the, so to speak, regulatory pathways that lead to normal vulval development in C. elegans, as found out by a number of generations of biochemists. When we went in and said, let's be specific, be systematic, and take each one of those nodes over here and try to establish edges not only between them but with the rest of the predicted proteome, this is the network that we found. And now all of a sudden those pathways are completely collapsed into basically a very highly connected network, much more highly connected, obviously, than certainly I would have been able to imagine at that time. This is very small, though. So a couple of years later we went in and mapped what I would say is about three to four percent of the entire interactome network for this little beast.
So those networks of course are descriptive. I'll show you in a second what we do with them, but I'm showing you this slide: this is a paper that we published in 2004, and you can go have a look at it. The point is that, again, it gives you this amazing flavor of high connectivity in the cell. You know, think about networks, think again about sociological networks, how we are not more than six degrees of separation away from any other person on earth, and that sort of thing. This is the sort of analysis, the sort of organization, that we can try to understand about what proteins are doing in the cell. Now, we were not the only lab in the game. This is a map, a very early map, also a first-draft map, for yeast. This is C. elegans. This is Drosophila, and it came out from a company called CuraGen. And then just recently, and I'll say more about that in a second, we attempted to go into the human interactome network and try to map that. OK. So the bottom line about those maps, again, is that they don't cover much more, depending on the organism that you're looking at, than one, two, three percent of the problem, and I'll give you a demonstration of that in a little bit. On the other hand, they are sometimes criticized, let me put it this way, in terms of quality, and I also want to talk about the quality of this data with you guys, so that people in the room who are interested in looking at this data and doing analysis on it can finally start feeling comfortable about what's happening here. Before I do that, though, I just want to show you a few more slides to illustrate that looking at biophysical interactions between a great number of proteins is obviously not enough to understand biological processes, right, and that we proposed very early on (this is 2000, 2001) that we would need to have some dynamics.
And also functional assays, perturbation-based functional assays, with which we could then figure out a little bit better how those networks actually relate to biology. Remember, that's my original question here. So I'll just show you, again by illustration, and, you know, pointing to some papers if you guys are interested in finding out more about this. So, for example. Yeah, sorry, again I'll talk about sociology for a while. This is a very interesting case here. This graph was drawn by Mark Newman, who does incredible graph theory, I think. And so here what you're looking at is a social network in a high school, right. And I want to make two points about this, again to try to illustrate what it is that we do. One point is that what you do here is you go to students in a high school and you say, give me (I forgot what the number is) your five best friends or something like that. And so for any student, an arrow points out to their five best friends, and you do that with all students. OK. And that's all you do. You don't know how they're friends, what they do when they're together, why they're friends, when they're friends, when they see each other; all the dynamic stuff is left out. And yet you can see from this graph that interesting concepts are coming back, which actually probably remind you of your own time in high school. For example (this is somewhat of a memory for me), when I was in high school I always felt I was on that side.
You know, and also the interesting thing is that some kids say that this one is their best friend, but this one doesn't say the same on the other side. You know, that's OK. So I'm just trying to illustrate, with a funny example in a way, this whole program: even though we don't understand all the details and all the dynamics of all those relationships, there are interesting features that come back up. This color reflects ethnicity, and I didn't put the code here because, I mean, I don't even remember what it is, but you can imagine that this reflects the sort of society that we live in, and that's the point of the color code. So that's almost exactly the idea here: if we could build networks where the edges are physical interactions, but we give a color code to the nodes relative to other kinds of aspects, other kinds of parameters that can be measured for genes, then that would give us images of this kind. So one example: a graduate student, Hui Ge, who is now a Whitehead Fellow, came up with this really beautiful example where you basically take an expression cluster matrix. You're looking at microarray data, and you know how you can do microarrays on all the genes of yeast across many different conditions. Now, if you do that, the genes can be clustered with each other, and you can color them, to simplify, according to the type of expression profile that they show across many different conditions. So basically what we learned here is that we could take a very basic, static, almost boring protein-protein physical interaction network and give it back some life by just looking at the dynamics of the expression of the corresponding genes. OK. Another one: I saw you had Fabio Piano coming here.
Maybe a few months ago or something like that. So this is another example, where we had a network here of physical interactions starting, in blue, with proteins involved in the DNA damage response in C. elegans. The white stuff was completely new nodes, unknown to that point, and they were again added to the network based on the fact that those proteins can interact with the starting ones. What we did in this case is called phenotypic profiling. OK. What we did here is take the starting genes and the genes of the interactors, and then do RNAi on all of them. I think that's pretty much familiar to everybody: it's basically knocking down each one of those genes using this tool called RNAi. What's important, conceptually, is to then systematically measure different phenotypes, different readouts. So in other words, I'm going to that network here and I'm perturbing each node one at a time, sort of basically making it go away. And then I read what's at the end in a systematic manner, looking at different things that the worm is now not capable of doing: cell cycle arrest, reduced apoptosis. And the picture that Simon Boulton came up with when he was in the lab is basically a way to group genes.
They are grouped according to the kind of phenotypes that they show upon knocking them down, perturbing them (not knocking them out, by the way) by doing RNAi. OK. So those genes are closer to each other because the phenotypic profile that they show upon perturbation is very close, relative to, for example, another group of genes here that seem to be doing a different kind of biological function. Now, this next slide was done in collaboration with his lab, work by Kristin Gunsalus and many other colleagues, and now you get to a point where, in this particular case, we're studying C. elegans early embryogenesis, just two cell divisions very early on in C. elegans embryogenesis. This matrix is much more complete, obviously: this is 661 genes and 45 phenotypes, measured again upon RNAi, or perturbation, or knockdown, of every single one of them, one at a time. And now obviously you start getting really interesting profiles, right, groups of genes that tend to show a very similar profile of phenotypes. So if you look at the relationship between being close to each other, as measured here by the phenotypic correlation on this matrix, and having proteins that interact with each other, you have this very nice positive correlation between the two. So, you know, at that scale, trying to look at the organization of a cell making two and two cells making four, if you look at two very different things, phenotypic profiling and physical interactions between proteins, the genes that tend to be doing the same thing here in this matrix tend to encode interacting proteins. And from that you start getting those clusters, those cliques, not unlike the sociological network I was showing you as an illustration. OK. Now, remember we said global properties of interactome networks relate to, if you will, function. Have we solved that? Do we know that for sure?
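The comparison described above, phenotypic-profile similarity for gene pairs whose proteins interact versus pairs whose proteins do not, can be sketched roughly as follows. This is a toy illustration, not the analysis from the talk: the gene names, the five-phenotype profiles, and the interacting pairs are all invented, and Pearson correlation is assumed as the similarity measure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical data: 1 = phenotype observed after knockdown, 0 = not observed.
profiles = {
    "gene1": [1, 1, 0, 0, 1],
    "gene2": [1, 1, 0, 0, 0],
    "gene3": [0, 0, 1, 1, 0],
    "gene4": [0, 0, 1, 1, 1],
}
# Hypothetical physical interactions between the encoded proteins.
interacting = {frozenset(("gene1", "gene2")), frozenset(("gene3", "gene4"))}


def mean_similarity(profiles, interacting):
    """Mean profile correlation for interacting and for non-interacting pairs."""
    inside, outside = [], []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        (inside if frozenset((a, b)) in interacting else outside).append(r)
    return sum(inside) / len(inside), sum(outside) / len(outside)


interacting_mean, other_mean = mean_similarity(profiles, interacting)
```

In this toy data the interacting pairs have the higher mean correlation, which is the direction of the relationship described in the talk; real profiles (661 genes by 45 phenotypes) would need handling for constant profiles, where the correlation is undefined.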
Not quite. I showed you a number of correlations, I showed you a number of maps, as an attempt to try to get there. What I would like to do is to ask, you know, three basic questions in this context. OK. How am I doing on time? That's OK. So what is the size of this beast? If you think of the human (and I have that on my next slide), I think we know the number of genes, even though actually the number of genes has been... This is a slide from 1994, I think. OK. At that time the estimates ranged between fourteen thousand and three hundred thousand genes. This one, for example, from '94, is actually not bad. As a matter of fact, I love the paper because, and that's the point of the next part of my talk, it provides a framework to try to estimate the number of genes without the full genome sequence at that time. OK, so now we think it's about, you know, twenty, twenty-two thousand genes, each encoding obviously at least one protein, sometimes many more than one; you know, that's the phenomenon of isoforms coming from splice variants. So basically what you can try to do here is to look at how you want to estimate the number of protein-protein interactions in the cell. You could say, OK, I'm going to try to read everything that's known about protein-protein interactions. There are tens of thousands of papers, if not more, and there are databases that have attempted to curate that information. Five of them are represented here. So in other words, it's one way to get into the network; it's sort of bottom-up, where you're looking at single labs having worked on a couple of interactions at a time, you read tens of thousands of papers, and you try to extract a network from that.
And then, like I tried to explain to you a little bit earlier, there is (let me show you this one) sort of the converse of that: basically taking a totally systematic, unbiased approach, trying to express all the proteins, you know, and map the network out of that. This is of course what we've been trying to do, and I was referring to that paper a little bit earlier on; remember, that's the first attempt at trying to get to the human interactome network map. And I just want to make a point here about the fact that this other team in Germany, in Berlin, the team of Erich Wanker and co-workers, has also published, at about the same time as we did, the same sort of attempt. So, you know, how do you do it? What's the best way to proceed? How good is this, how bad is that, etc.? Those are the ideas that I would like to develop in the next, you know, five or six minutes. Another way to put the question is this: if we close our eyes and try to imagine the real binary biophysical interactome network that's encoded by the genome, basically by virtue of making proteins that can interact with each other, and if we look at current interactome maps, whether they be bottom-up or top-down, right, how close are we? And how well can we use this to try to estimate what the network looks like and, just simply, today, how many interactions there are? You know, how can we size this beast? OK. I have the feeling the colors are going off. Is it me? All right. So we just recently set up a framework, basically, where we realized that, you know, when people read those papers and try to compare datasets, for example the Stelzl and the Rual datasets that I showed you a couple of slides ago, a number of people forget that those networks are very limited by what we call the ORFeome. So for those who don't do biology every day: we do have a genome sequence, agreed.
Yes, those interactions are not only biophysical, but they are physiologically relevant, etc. And then we took one hundred and eighty-eight pairs completely randomly in this huge matrix that we have right now. So if you have ten thousand genes that you've cloned, you have fifty million combinations of pairs between them, right. And in those fifty million we picked about two hundred of them. So, positive control in green, negative control in red. The next slide shows a number of yeast two-hybrid assays, different versions of the system; remember the Fields and Song thing, with the reconstitution of the transcription factor. Well, it turns out that we detect about fifteen to twenty percent (depending on the versions, depending on how you titrate, depending on the stringency), fifteen to twenty percent of the green, that is, the gold-standard positives. So the detectability of those assays is about one fifth. So for one hundred interactions that we know are right, we see about twenty of them. And that's under conditions where most of the versions (although not all; some of them are a little bit more sloppy) do not detect any of the two hundred randomly selected pairs out of fifty million combinations. So that's a number that we had not measured before. And then lastly, and this is an engineering place, so I hope that I can actually do a good job at this one: if you go into the space of, let's say, our first Rual et al. paper, right, where we mapped seven thousand by seven thousand, again that's forty-nine million divided by two combinations, right. You do the assay once, and it turns out you're not successful at every test the first time around. We call this coverage. So we decided to go and use a subspace of that, and then, in that subspace, repeat our assay four times, which is a lot of work, I can tell you that.
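The pair counts quoted above follow from simple combinatorics, sketched here in Python. The function names are introduced only for this illustration, and the detection counts are round illustrative numbers taken from the "about twenty out of one hundred" and "none of the two hundred random pairs" statements, not measured values.

```python
def unordered_pairs(n_proteins: int) -> int:
    """Number of distinct protein pairs (A-B is the same as B-A, no self-pairs)."""
    return n_proteins * (n_proteins - 1) // 2


def detection_rate(detected: int, tested: int) -> float:
    """Fraction of tested pairs that score positive in an assay."""
    return detected / tested


pairs_10k = unordered_pairs(10_000)  # 49,995,000: roughly fifty million
pairs_7k = unordered_pairs(7_000)    # 24,496,500: roughly 49 million divided by 2

# Illustrative counts: 20 of 100 known interactions seen, 0 of 200 random pairs seen.
detectability = detection_rate(20, 100)
background = detection_rate(0, 200)
```

A detectability near 0.2 with no hits among random pairs means the assay misses most true interactions but rarely reports a pair that was picked at random.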
And so now what we see here is the number of interactions found the first time around, and then essentially the number of interactions found twice, found three times, found four times. Each one of those, by the way, is as reproducible as the others. It's not like those guys are of lesser quality; statistically, they have just been found once out of four times, relative to, of course, a lower number that has been found four times. And if we take this data and model it, it turns out that to reach about eighty-five to ninety percent coverage here we will need to do this test between eight and ten times, depending on the sort of thing. And finally, and I'm sure many of you who read this literature have this in mind: if you take the current interactome maps and try to imagine the real interactome network, you have to take specificity, or I should probably call it precision, into account. In other words, how many of your edges are wrong, artifactual, due to the kind of assay that you are actually using? And so we wanted to measure this. To do that we said, OK, all the tests that we've done so far have used this assay. Let's summarize: two proteins can interact with each other, one a hybrid with the DB, one a hybrid with the AD; we're in yeast cells, we're in the nucleus, and that's where we're testing the interactions. Now what we're going to do is basically get out of all this. It's the same idea: you're dealing with two proteins, and only if they interact do they reconstitute a function, but now that function is a receptor, it's in mammalian cells, and it's at the membrane. And so the problems of this one are totally eliminated by using this, so to speak, orthogonal, independent assay. In other words, the specific potential problems of using the yeast assay don't apply on this particular side. So now what we're going to do is go back to the gold-standard positives, right.
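The repeated-screen argument above can be illustrated with a deliberately simple saturation model. This is an assumption made for the sketch, not the model used by the speaker's lab: it supposes that every detectable interaction is found with the same probability `p` in each screen and that screens are independent, and the value 0.21 is hypothetical, chosen only so the output lands in the range quoted in the talk.

```python
def coverage_after(n_screens: int, p_per_screen: float) -> float:
    """Expected fraction of detectable interactions found at least once."""
    return 1.0 - (1.0 - p_per_screen) ** n_screens


def screens_needed(target_coverage: float, p_per_screen: float) -> int:
    """Smallest number of repeat screens whose expected coverage meets the target."""
    n_screens = 1
    while coverage_after(n_screens, p_per_screen) < target_coverage:
        n_screens += 1
    return n_screens


P_HYPOTHETICAL = 0.21  # assumed per-screen chance of finding a given interaction

repeats_for_85 = screens_needed(0.85, P_HYPOTHETICAL)
repeats_for_90 = screens_needed(0.90, P_HYPOTHETICAL)
```

Under these assumptions about nine to ten repeats are needed for 85 to 90 percent coverage, broadly in line with the "eight to ten times" quoted above; a model in which some interactions are harder to find than others would saturate more slowly.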
The two controls: the positive control in green, the negative control in red. And those are the actual interactions that we're testing here: real ones here, completely random ones over here, very small background. This is this new assay, called MAPPIT. It doesn't completely overlap with the different versions of the yeast two-hybrid system, but you can see there is some clustering here. Where we are with all four assays is that the union of all four is detecting about forty-five to fifty percent of what's known up there. Now, this is the important slide, at least from my perspective. OK. Remember the two high-throughput papers I showed you: Rual et al. in Nature, and Stelzl et al. from Erich Wanker's group in Berlin. This is, remember, high throughput: close your eyes, do it systematically, do it well, do it well controlled, but do it unbiased, systematic. This is literature-curated. Remember the five databases that have read those tens of thousands of papers where interactions are described one interaction at a time. If you use MAPPIT, here's the experiment. This is the rate of detectability of the gold-standard positives; in other words, it's this, right? It's your positive control. This is the gold-standard negatives, so the negative control, over here: under those conditions, no background.
This is the retest rate, the precision again if you want, of those two high-throughput datasets relative to the literature-curated ones. OK? So remember, those are the databases that take the time to read the literature one paper at a time. And what we are starting to find here, and we were so surprised by this that we also retested it by yeast two-hybrid, which is the way the first two datasets were actually generated, is that even there the literature-curated stuff doesn't look as great. The point is this: there's a lot of literature out there, and many of those papers are great, but it's very hard to read them, extract the information out of them, and rebuild your network bottom-up. That is my point here. And so I really advocate for a way to do this in a very systematic way, the way I just tried to describe to you: considering the number of nodes and the isoforms; doing this very cleanly, in an organized fashion, in a systematic way; and considering the limitations of the assays. We need more assays. That's why I was excited about talking to some of the nanotechnology people here. We are only covering twenty percent of what there is to cover, there's the money, and there's also reducing the size of the experiments, the volumes. But I think we can get there, and I think we'll be able one day to get an image of the interactome map that resembles what's actually possible out there. OK. At this stage, if you take the framework as we have it and you compute back from the number of interactions we find in a given space, and then introduce completeness, detectability and coverage, then, depending on the way we do it (and again it's a range, that's why I showed you the gene number slide), we think there might be anywhere between one hundred twenty or one hundred thirty thousand and two hundred or three hundred thousand interactions possible between human proteins. OK.
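The back-calculation described above can be sketched as a single formula: take the number of interactions observed in the tested space, keep the fraction believed to be correct, and divide by the fractions that describe how much was missed. The exact form used in the actual framework is not spelled out in the talk, so the formula below, the inclusion of a precision term, and all the input numbers are assumptions for illustration only.

```python
def estimated_network_size(observed: int,
                           precision: float,
                           completeness: float,
                           detectability: float,
                           coverage: float) -> float:
    """Rough estimate of the total number of interactions.

    observed:      interactions found in the screened space
    precision:     fraction of observed interactions that are real
    completeness:  fraction of all possible pairs that was screened
    detectability: fraction of real interactions the assay can detect
    coverage:      fraction of detectable interactions found so far
    """
    for name, value in (("precision", precision),
                        ("completeness", completeness),
                        ("detectability", detectability),
                        ("coverage", coverage)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in (0, 1]")
    return observed * precision / (completeness * detectability * coverage)


# Hypothetical inputs, not the numbers from the talk.
estimate = estimated_network_size(observed=1_000, precision=0.8,
                                  completeness=0.1, detectability=0.2,
                                  coverage=0.5)
print(round(estimate))
```

Because the three denominators multiply, modest uncertainty in each one compounds, which is one way to see why the estimate comes out as a wide range.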
If you go to the databases now, the bottom-up ones, which are not so good, as I showed you, and the top-down ones, which are getting better, you'll be in the range of five to ten thousand. So we really only have the tip of the iceberg. OK, I think that's one of the take-home messages on the next slide, but first I want to thank Kavitha Venkatesan and Jean-François Rual, the first author of the other paper, and Irma Lemmens; this was done with the lab of Jan and my own lab. So the take-home message of this, to summarize: the binary biophysical interactome network might contain in the range of a million interactions, something like that, according to the framework we now have. High-throughput data can be of higher quality than low-throughput data. Biologists, at least, are very surprised by this when I show this data, and I'm ready to discuss it with you guys during the question session. And the current challenge with those maps is not so much false positives, it's really false negatives. So again, we only have the tip of the iceberg. And I think we have provided you with a sort of new framework to map this network. OK. At this stage I'd like to put this more into perspective. I like to compare this to the genome sequencing project, because a lot of people know at least a little bit about that. You might say: well, you only have the tip of the iceberg, you're still worried about your quality, what are you doing exactly, what do those networks mean? I just want to put this into perspective. This is a very famous paper. For the people who can see this, right: one of the first sequencing reactions. Fifteen years later you have one chromosome of yeast sequenced, and fifteen years after that, now, you know what the genome sequences are like. It's amazing, it's great quality, it's out there, we all use it every day. The Fields and Song paper, which I think is almost as important as this one.
Well, it is not much more than fifteen years old; it was 1989. So when we generated those first maps, it's as if we had a small chromosome of a small organism. And now we need to get organized. We need a good framework, we need to think nano, and we need to increase the repertoire of assays to try to get there, but I think it's possible. How much time do I have? I always lose track of time and I don't want to be rude about that. So another vignette that I would like to discuss is what happens, despite all those limitations, when we start using interactome networks and put them back into a framework of human disease. OK? Because at the end of the day, and I think I have this over here, yes: if you have taken any biology class, even in high school, I'm sure you understand this. Mutations in the genome, or differences in the genome, might make us more susceptible to diseases of this kind or that kind, but really the relationship between them hasn't been that clear, except for a few examples, right? What probably really happens is in between, here: I think the nodes and/or the edges of the network, the network properties, are changing, and that gives rise to the phenotypic consequences that we see in the clinic. So we're trying to get at this idea and to relate these interactome network models to what's seen in diseases in general, and more particularly in genetic diseases. So Huda Zoghbi, who's at Baylor, is a specialist of ataxia. She was telling me: you know, we are very good at looking at patients and trying to help them, doing the pathology, figuring out the genetics, right? Mapping, cloning the genes, and then looking at their products, we're pretty good at that. And so far, I think she was telling me, and I'll show that in two slides, there are about three dozen genes, so between thirty and forty genes.
Mutations in these genes can give rise to those very bad diseases, right? What she was telling me, as I was giving a talk and met with her in her office right after that, was: I really want to see where those genes, or the products of those genes, are in your interactome networks. Are there relationships that, again, would be far from what you would expect by chance? So this is just one example of those thirty or forty: a list of thirty or forty genes with very weird names, you know, ataxin, calcium channel dependent something, TATA-box binding protein, having very little to do with each other. Not really a pathway, not some sort of really nice coherent biological story. And imagine a list that goes down to thirty; I'm sorry, I forget the actual number, between thirty and forty genes. So it's a very simple question, just to give you a sense of where this is going: what if we measured the shortest path length between those nodes? You know about this from network theory, right? You can not only measure the shortest path between two nodes, but also measure the average shortest path among a series of nodes, and that's what we did here. If we look at those proteins involved in ataxia in this otherwise, quote unquote, static and boring network, what we see is that the mean shortest path length between them is extremely different from what you would expect from a distribution generated by randomizing either the networks or the ataxia nodes. So somehow they're closer to each other, in the network models as we have them today, than what would be expected by chance. Now remember, this is important, because mutations in those nodes that are relatively close to each other lead to absolutely amazing diseases out there.
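The shortest-path comparison described above can be sketched in plain Python. The network is a toy chain of ten nodes, the "disease" set is three adjacent nodes, and the null distribution is built by drawing random node sets of the same size; the talk also mentions randomizing the network itself, which this sketch does not do. All names and data here are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def bfs_distances(graph, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_shortest_path(graph, nodes):
    """Mean shortest path length over all connected pairs within nodes."""
    lengths = []
    for a, b in combinations(nodes, 2):
        d = bfs_distances(graph, a).get(b)
        if d is not None:  # skip pairs in different components
            lengths.append(d)
    return sum(lengths) / len(lengths) if lengths else float("inf")


def empirical_p_value(graph, nodes, n_random=1000, seed=0):
    """Fraction of random node sets at least as compact as the observed set."""
    rng = random.Random(seed)
    observed = mean_shortest_path(graph, nodes)
    all_nodes = sorted(graph)
    hits = sum(
        mean_shortest_path(graph, rng.sample(all_nodes, len(nodes))) <= observed
        for _ in range(n_random)
    )
    return observed, (hits + 1) / (n_random + 1)


# Toy network: a chain 0 - 1 - ... - 9 (hypothetical, for illustration only).
toy = {i: set() for i in range(10)}
for i in range(9):
    toy[i].add(i + 1)
    toy[i + 1].add(i)

observed, p = empirical_p_value(toy, [0, 1, 2], n_random=200)
print(observed, p)
```

A small p-value here means the chosen set sits closer together than most random sets of the same size, which is the kind of statement made about the ataxia proteins.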
They look very much the same: those kids behave the same way, in a way, even though the molecular defects that affect them can sometimes be very different. So László Barabási, many of you know him, who has worked on a number of fundamental aspects of networks, biological or not, came on sabbatical to my lab, and I was telling him about this. He said: well, if you can do this with thirty to forty genes, can you do it with all of the genes in OMIM which are known to cause one or more disorders? And could you do that in a bipartite graph manner, where you would consider not only all genes involved in disorders but also all disorders that those genes are known to be involved with? OK, so if you don't do biology every day this might be a surprise to you, but there are today, in those databases, let me show you OMIM, this is the growth of the number of genes known to be involved in a number of disorders. It's absolutely phenomenal. Some fifteen hundred genes are known to be involved in one or more disorders, which I think is incredible information. So Goh, a postdoc, decided to look at this problem from a bipartite point of view. So now you have the diseases, and you want to link them to each other if they have at least one gene in common. And then you have a gene network, where you link genes to each other if they are known to be involved in at least one common disease. And that's because, remember my very first slide, we tend to hear in the news "this gene is involved in this disease", right? And what we remember from that is that there is a sort of one-to-one relationship, but that's very far from the truth. The majority of diseases can be caused by mutations in more than one gene, and vice versa. OK, anyway, so here's a network of diseases. So the nodes here are diseases: this is, say, cancer, leukemia, obesity, right?
And they are linked to each other by virtue of the fact that they share common genes, mutations in which are associated with susceptibility to those diseases. I won't give you all the details, obviously, because of time, but this was generated completely blindly. The way we drew those graphs: what we did is we colored the nodes. Somebody asked about the color code in one of the networks; well, here it relates to the kind of disease. This one is in blue here, or green; this is cancer, for example. Now, we didn't put those nodes here. OK? The algorithm, by virtue of trying to draw this network in the most, I guess, comfortable way of looking at it, ended up with this particular region including most of the cancers here. We didn't do that. OK, so what I'm trying to show you with this image is that diseases are related to each other in such a way that most of them are actually part of a main component. There are exceptions, like the ones here, which I could discuss until tonight if you want. Remember, there's the counterpart of that: this is the gene network. So now the nodes are genes, and the links between them indicate that they're involved in at least one common disease. Does that make sense? And I want to stop with that and just show you one, I think, very interesting observation that links both worlds: the world of biophysical interactions that are possible between proteins of the human genome, and the, I guess, more practical world of which genes can cause such and such diseases. It turns out that if you look at the two graphs and put them on top of each other, the relationship between genes causing common diseases and protein-protein interactions is, again, far from what you would expect from a set of randomly generated negative control networks. OK? So in other words, proteins that tend to interact with each other tend to be encoded by genes that are involved in the same sort of diseases.
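The bipartite construction described above, and the idea of laying the gene network on top of an interaction map, can be sketched as follows. The disease names, gene names and interaction pairs are all hypothetical placeholders, and the overlap count is only the raw statistic; the comparison against randomized control networks mentioned in the talk is not implemented here.

```python
from itertools import combinations


def disease_network(disease_genes):
    """Link two diseases if they share at least one gene."""
    links = set()
    for d1, d2 in combinations(sorted(disease_genes), 2):
        if disease_genes[d1] & disease_genes[d2]:
            links.add((d1, d2))
    return links


def gene_network(disease_genes):
    """Link two genes if they are involved in at least one common disease."""
    links = set()
    for genes in disease_genes.values():
        for g1, g2 in combinations(sorted(genes), 2):
            links.add((g1, g2))
    return links


def overlap_with_interactions(gene_links, interactions):
    """Number of gene-gene disease links that are also physical interactions."""
    normalized = {tuple(sorted(pair)) for pair in interactions}
    return len(gene_links & normalized)


# Hypothetical bipartite data: disease -> set of associated genes.
toy_associations = {
    "disease_A": {"gene1", "gene2"},
    "disease_B": {"gene2", "gene3"},
    "disease_C": {"gene4"},
}
# Hypothetical protein-protein interactions.
toy_interactions = [("gene2", "gene1"), ("gene3", "gene4")]

print(disease_network(toy_associations))
print(gene_network(toy_associations))
print(overlap_with_interactions(gene_network(toy_associations), toy_interactions))
```

Both projections come from the same gene-disease table, which is why the disease network and the gene network are described as counterparts of each other.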
Again, the same sort of separation I was showing you a little bit earlier with C. elegans. To finish, one very recent observation; I don't really know what to do with it. If you know anything about networks, you probably know that there's this idea that some of the nodes in those networks are called hubs, because they're highly connected. I've heard that Atlanta is a hub at least three times since I got here yesterday. So if you actually look at the distribution of the number of interactions and the proportion of nodes that have that number of interactions, you get some sort of a power law. I won't discuss the details of that, but it turns out that in the cell the proteins that tend to be highly connected like this have been shown to tend to be more essential, for example for life, than the proteins that are much less connected, again using the current interactome models. So I think that's Atlanta, right? Probably. So you've seen this before. So what we wanted to ask is: well, those global properties that we're starting to observe in those networks, there is still a lot of work to be done. For one thing, we need better maps; I've tried to tell you that two or three times today, or more. But still, given the limitations of what we have today, can we start looking at a global problem like this: when a virus infects the host cell, the virus itself has proteins, right? And those proteins of the virus are used to basically, in the worst scenario, kill the cell, but in many scenarios to rewire the network, so to speak, to change the phenotype of the cell and make sure that the cell is now doing what the virus needs the cell to do, right?
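The notion of hubs and of a degree distribution mentioned above can be made concrete with a small sketch. The edge list is a hypothetical toy network, and the degree cutoff used to call a node a hub is an arbitrary choice for illustration, not a threshold from the talk.

```python
from collections import Counter


def degrees(edges):
    """Number of interactions per node in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def degree_distribution(degree):
    """Proportion of nodes having each number of interactions."""
    counts = Counter(degree.values())
    total = len(degree)
    return {k: counts[k] / total for k in sorted(counts)}


def hubs(degree, min_degree):
    """Nodes with at least min_degree interactions (arbitrary cutoff)."""
    return sorted(node for node, k in degree.items() if k >= min_degree)


# Hypothetical toy network: one highly connected node and a few others.
toy_edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B")]

deg = degrees(toy_edges)
print(degree_distribution(deg))
print(hubs(deg, min_degree=3))
```

Plotting the output of `degree_distribution` on log-log axes for a real map is the usual way to look for the heavy-tailed shape referred to in the talk.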
So we said: well, can we actually use a biophysical interaction strategy to try to look at where the viral proteins go in the interactome network? What do they tend to interact with? So we generated an EBV dataset. Think of the yellow nodes here as viral proteins, and the green ones as the host proteins, right? So we're looking at the viral proteins interacting with the host proteins, and we've made a couple of, I think, relatively interesting observations. We can't explain them quite yet. But remember the hubs? Well, it turns out that, relative to all other nodes in the host network, the viral proteins tend to target the hubs more than you would expect by chance. And not only that: remember the idea of shortest path length between nodes, in the ataxia example? The target proteins in the host tend to be closer to all the other nodes of the network than, again, you would expect by chance. So they occupy a position that is significantly different from that of the entire set of nodes in the network. Again, we don't know what that means, but the hypothesis here, and it's really only a hypothesis, is this: is there an evolution, when there is a host-pathogen relationship, whereby the pathogen needs to, quote unquote, circumvent some of the global properties of the interactome network of the host in order to do what it's supposed to do? I don't know. So if I had five more minutes I would do a last vignette, but I can stop here. I always do this, and I always get five more minutes. I'm just trying to be nice. Is that OK? So, the last vignette is essentially to come back to the units of those networks. Here is the question that I want to finish with. I told you about interactome models. I told you about phenotypes, like in the worm, but also in people, right?
But one thing that we never really investigate in this field is whether the perturbations that we're talking about in a network are related to nodes: you remove a node, and that changes the properties of the network, and you see a phenotype. Or (and actually the two are not mutually exclusive, obviously) do you sometimes actually affect an edge? In that case the way the network is rewired is by keeping both nodes here completely normal, with the exception that they cannot see each other anymore. In other words, to talk about proteins: the protein would be a mutant, it would be abnormal, but it would have retained the ability to interact with all its partners except one. And so we wanted to ask, in human disease, considering the interactome network as we have it today for humans, can we distinguish between the two? Again, this will be a little quick for those who don't do biology every day, but I don't have much time. If you have a stop codon in a gene, right, essentially very early on, for example, you don't have the protein at all, and if you don't have the protein, the node is gone in the network. On the other hand, many mutations in patients, I'll show you in a second, are due to single amino acid changes, as we call them: the DNA is changed so that you have a new amino acid here relative to the normal one. And that potentially, not always but potentially, we thought, could be associated with the loss of an edge and not necessarily the loss of a node, right? So for example, remember all those genes in the OMIM database that are known to be associated with all those disorders? Well, it turns out that about half of the mutations available so far that have been found in patients are of this kind right here: a single, I guess, not destructive type of change.
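The distinction drawn above between losing a node and losing a single edge can be shown on a toy graph. The protein names are hypothetical placeholders, and the two functions are only an illustration of the two kinds of perturbation, not a model of any particular mutation.

```python
def remove_node(graph, node):
    """Node perturbation: the protein is gone, with all of its edges."""
    return {n: neighbors - {node} for n, neighbors in graph.items() if n != node}


def remove_edge(graph, a, b):
    """Edge perturbation: both proteins remain, one interaction is lost."""
    perturbed = {n: set(neighbors) for n, neighbors in graph.items()}
    perturbed[a].discard(b)
    perturbed[b].discard(a)
    return perturbed


def edge_count(graph):
    """Number of undirected edges in an adjacency-set graph."""
    return sum(len(neighbors) for neighbors in graph.values()) // 2


# Hypothetical protein P with three interaction partners.
toy = {"P": {"X", "Y", "Z"}, "X": {"P"}, "Y": {"P"}, "Z": {"P"}}

node_loss = remove_node(toy, "P")       # e.g. an early stop codon
edge_loss = remove_edge(toy, "P", "X")  # e.g. a single amino acid change

print(edge_count(toy), edge_count(node_loss), edge_count(edge_loss))
print(sorted(edge_loss["P"]))
```

In the edge-loss case P keeps its other partners, which is exactly the situation where the mutant protein is abnormal but has retained the ability to interact with all its partners except one.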
OK, so having that in mind, we then went ahead and said: well, we know of genes that are sometimes associated with more than one disease, right? When we have that case, and we know what mutations, what changes in those genes, are associated with those diseases, can we actually find relationships that would indicate that this model is not completely ridiculous? And we find that for dozens and dozens of genes. So let me just restate this. This is a gene: mutations in this portion of the gene, encoding this domain in the protein, will tend to give rise to this disease; mutations in that part of the gene will give rise to another type of disease, and in the clinic you really see the difference between them, supporting this model. I'll skip this one. So just to finish, I want to tell you that we're now going after this idea experimentally. We are basically cloning hundreds of alleles that are found in patients and that are known to cause disease, OK? And we are basically looking at their interaction capabilities relative to the normal protein that is present in unaffected individuals. I'll just show you one example. It's a very famous gene, mutations in which give rise to this disease. We picked those alleles that are known to have a single amino acid change in the protein, and essentially here's the analysis. Here are five examples, all of which are known to cause the disease, right? But again, when you look at the physical interactions of the corresponding proteins with three of the protein's interactors, the behaviors are very different. In this case, for example, the protein is not affected at all in its ability to interact with its partners. With those two alleles, the protein seems completely dead.
This would be more of a node perturbation. And then those guys are somewhere in between. We call them, as a joke right now, edgetic, because it's sort of the genetics of edges. So those two proteins cause an amazing disease out there and yet retain the ability to interact with some of their partners. Let me show you that as a graph. So if you don't look at yeast two-hybrid stuff every day: this is the normal guy. And what we're finding is that most mutations that cause disease really make the protein, quote unquote, dead. But many, and we have more examples than I have time to show you here, actually affect the protein in a very subtle way and only affect one, sometimes two, of the edges that that protein is capable of mediating. So I'll summarize and remind you that we're in the business of trying to tackle this question. It's probably a very important pillar of biology. I think you might now have a feeling that by systematically mapping interactome networks we're getting into finding some global properties. I didn't show you any local stuff, but I'm sure you know the work of many other people. And more particularly, I tried to tell you that we are still very early on in this field. If you look at this estimated size from the first, experimental part of the talk, it's scary: it's in the range of a million. We think it's worth it, because it looks like it might help us understand human disease. And my point here was more to confuse you. It's very early on, it's not ready to go, but just to say that, you know, we have this one-gene, one-phenotype idea; it's very possible that we could start thinking about this in terms of one edge, one function, one phenotype. And I'll put the slide back; I think I thanked the people as I was going through, and I'm running out of time. So thank you very much. I'm sorry? Yes. So, yes, well, those are even more likely to be edgetic, in a way, right?
So, obviously, as you all know, there are mutations that, apparently at least, make the protein unable to mediate some functions, and those are usually called loss-of-function, you're right, and they are usually recessive, though not always. And then you have mutations that make proteins do stuff that the wild type doesn't do. Those are called gain-of-function, and they're usually dominant, but that's also not always the case. So we've actually looked at that. One surprise that I had is this: when you look at the human population and you look at all those diseases, some are called dominant and others are called recessive, obviously. So take Huntington's disease, for example, right? That's a clear dominant one, and it's clearly a gain of function: there is this sort of polyglutamine stretch that appears in the mutant, and those proteins do things that the wild type doesn't do. However, I was surprised by the proportion, maybe because of my own lack of knowledge, but I was surprised by the number of mutations that are actually dominant in the human population (in other words, you only need to inherit one version of the allele from one of your parents and that's it) but which are actually loss-of-function when you look at the biochemistry. OK? And that actually is much more than I would have thought. It's usually explained by gene dosage, for example, or, and I had a slide that I skipped, which probably would have answered your question right here, you also have the dominant-negative mode of action, right? So I guess what I'm trying to say is that the dominant phenotypes out there in the population do not always correspond, if you look at the literature, if you look at what's known about those alleles, to gain of function. I see. Yes. OK, OK, I see your question. I'm sorry. So yes, that's right.
We've done a small number of genes so far, but we really want to go large scale with this, and we are now starting to include known dominant mutations, probably many of them gain-of-function. It's going to be a little bit harder, of course, because loss of interaction we can measure, right? I mean, if we know of interactions that the wild type can mediate, we can look at the alleles and see whether or not they can mediate them. Gain of interaction, that's much harder: we actually have to go to a screen. Basically, yes, that's right. So it's a harder problem practically, but a fascinating one. Yeah. OK, anyway. We were just talking about that; I get that question all the time. So we can estimate it, we can give a sort of ballpark. But honestly, the assay that we use is so indirect. OK? What you measure at the very end has something to do, obviously, with the ability of two proteins to interact, but it is also influenced tremendously by the stability of those two proteins, by how easily they go to the yeast nucleus, by how easily the yeast will make them or degrade them. So there are so many parameters that end up giving you what we measure here that, yes, there is a correlation overall. Actually, people just published this, so e-mail me and I can point you to it: overall, the strong interactions score very highly in the assay, and the weak ones tend to score much lower, but I wouldn't put a number on it. Now, does that mean we shouldn't do that? Yes, we should, and I'm actually really excited about this, but we don't. That's why, again, nanotechnology: right now, measuring the Kd's of thousands of edges from those networks is very, very hard, right? I mean, with the current technology that we have, Biacore would be one of them, I think, if we could do this in much smaller volumes.
Yes, it's absolutely critical that we do that, but honestly I don't think we have the tools. Of course, many people, perhaps in this room, have measured Kd's for, you know, one or two or three interactions, and have done it well, but to do it at the network level is another problem. And let me tell you why I say this; it's not just to be smiley and say "yeah, you're right". A fundamental question I have is this: if you look at the distribution of Kd's for physical interactions between proteins and compare two species, a basic question is, do we expect the same distribution? From yeast to humans, do you see different distributions? Do you go toward lower or higher Kd's? Are there changes there that evolution has given to all those proteins so that the network can have the properties it is supposed to have? So those are really fundamental questions. I love them, but honestly, practically, we need creativity from people like you guys to start at that scale. OK, so you're pointing back to the slide, right, where I was making the point that we're looking at biophysical and biological interactions. Well, what we're measuring right now, and I tried to say that, I'm glad you asked the question: imagine yeast, for example, and all the conditions possible, or at least a good number of the conditions that yeast cells can live under. So you have a bunch of different interactome networks, right? And I was trying to make the argument that if you take the union of those, that might be different from what we're measuring here, which is the network of possible interactions. OK? So that means that, if the two are not exactly identical, obviously I'm thinking that the biophysical interactome network has more interactions. OK. So the number, the delta, is totally unknown.
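The "delta" described above is simply a set difference: the interactions that are biophysically possible minus the union of the interactions actually seen under each condition. The sketch below uses made-up protein names and condition labels purely to show the bookkeeping; no real data is implied.

```python
def as_edge_set(pairs):
    """Normalize pairs so that (a, b) and (b, a) count as the same edge."""
    return {frozenset(pair) for pair in pairs}


def union_of_conditions(networks_by_condition):
    """All interactions observed under at least one condition."""
    union = set()
    for pairs in networks_by_condition.values():
        union |= as_edge_set(pairs)
    return union


def delta(possible_pairs, networks_by_condition):
    """Possible interactions that are never observed under any condition."""
    return as_edge_set(possible_pairs) - union_of_conditions(networks_by_condition)


# Hypothetical toy data.
possible = [("A", "B"), ("B", "C"), ("C", "D")]
observed = {
    "condition_1": [("A", "B")],
    "condition_2": [("B", "A"), ("C", "B")],
}

never_seen = delta(possible, observed)
print(never_seen)
```

The hard part, as the answer says, is not the subtraction but knowing either of the two sets well enough to perform it.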
But here's why I think it's an interesting problem. I tend to put words on things, you know, and in this case I'm not successful at all, because I've tried to convince people in my lab to work on those things. I call those interactions pseudo-interactions, like pseudogenes. And you might say: that's crazy, you're just saying that false positives have a cute name now. But that's not exactly true. If you sequence a genome, for example, right, you use a Sanger reaction, for example, and you go through things and you see a beautiful gene, another gene, and then at some point you see something else. At first you think it's a gene; then somebody else comes in and says no, it's a pseudogene, because of some things they've seen that you hadn't seen before. The sequence is not wrong, but the interpretation of the sequence can go back and forth. As a matter of fact, the lists of pseudogenes and real genes have been changing over time, because people are just trying to do the best job they can in trying to predict them. Nobody is wrong; it's just that it's hard to do. So if you have a biophysical network, right, it's like that. I wouldn't be surprised at all if some of those pseudo-interactions would never take place back in vivo. They would be pseudo-interactions. Now, how do you find them? How do you measure them? How do you measure their proportion, their percentage? That is very hard work. And I would like to finish this by telling you the reason why I started thinking about this; it's actually going the other way around. Think of the problem this way. Take yeast: six thousand by six thousand, right? That's eighteen million combinations. So of course no one knows what the size of the yeast
interactome network is. I just showed you an attempt at estimating the size of the human interactome; I showed you that we have a large range, and it's more methodological, that was the point of telling you that. But still, we can try to do it, right? So many biologists, biophysicists, biochemists will tell you: I don't know, I think on average a protein interacts with five others, for example. I have done this exercise many, many times, and it's sort of the number you get from what's known today. In other words, the number of expected interactions there is probably in the range of tens of thousands, something like that. So that means that from eighteen million combinations you go down to tens of thousands, something like that. So you have three orders of magnitude of pairs that apparently, in vivo, would not see each other, or could not see each other, or it would be really much better for the cell if they didn't. Now the problem becomes, from an evolutionary point of view: how do you do that? How do you actually keep proteins away from each other? And it might not always be the best solution to do it from an amino acid, you know, structural point of view. It might be that in some cases you keep two proteins away from each other by giving them two different signal peptides and sending one to the nucleus and the other one to the membrane, for instance. Now, when I do my assay, that aspect is confounded, because I'm actually taking the two genes and putting them in one assay where I'm sending everybody to the nucleus and testing whether or not they can re-form a transcription factor. You see? So that's the point. OK, and lastly, what we're trying to do right now is to look at the biophysical interaction networks and try to look for those pairs that are very clean technically but for which there is absolutely no other type of information available to
suggest that they might be real back in vivo. You see where I'm trying to go? So their genes are not co-expressed; if you know something about their phenotypes, you don't see any similarity. So you go to the other side, I guess, of the distribution of similarity of parameters. Long answer, I'm sorry, but thank you.