[00:00:05] >> So we're going to talk, and she's going to talk a little bit about her current work on using NLP and mining techniques in health. Okay, so thank you, I'm going to start this off. Today I'm going to talk about automation of evidence matching in systematic reviews using web-based medical literature. I'm going to put up a big caveat: I am not a natural language processing expert, nor am I an expert in text mining. Mostly I view this from the lens of thinking about how you can improve the learning process of models. [00:00:51] So I'll give you a bit of background as to how I got into text mining, or looking at web-based medical literature, and then hopefully you'll start to see some connections. One of the things I wanted to talk about is EHR-based phenotyping. Probably you guys have seen this slide or something along these lines, and this was joint work with Dr. Sun and myself, but it's really looking at how you can get at the clinical characteristics that are present in an EHR and use them to define a patient cohort. What you can use EHR phenotyping for is getting a better understanding of your population: [00:01:35] how many diabetics are in my population, how many are suffering from hypertension. Once you get an idea of your population, you can think about doing targeted screening and interventions specific to them, and also about who you should recruit for randomized controlled trials, which is what RCT stands for. [00:01:57] Now, traditionally, EHR-based phenotyping looks like a flowchart: you're looking at specific definitions. In this case you're looking at a case patient for type 2 diabetes. Do they have a type 1 diabetes ICD-9 code? If not, then maybe they're type 2 diabetic, so do they have a type 2 code, and if they do, are they taking insulin, and so on and so forth. Now, the downside to [00:02:28] constructing this flowchart is that it can take many hours and many days to construct. Basically what happens is you have a panel of clinicians who decide, okay, I want to look at a specific disease, and then they iteratively refine the definitions, or parts of this flowchart. From a machine learning perspective that seems really inefficient, because one, you need labels: somebody needs to figure out whether this patient is a type 2 diabetic. And the second issue is that [00:03:05] you can't learn disease subtypes. In other words, would you consider all type 2 diabetes patients to be the same? Probably not, right? It's like if I grouped you guys together: you're all taking this class right now, so you should all be the same, right? That's probably a bad [00:03:25] assumption. So what we thought about was: maybe you can take this database and run many different machine learning algorithms on it, and in particular what we've focused on is unsupervised machine learning algorithms. That means you don't need a label; you can automatically learn the prevalent types or subgroups of your population.
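To make that concrete, here is a minimal sketch of the unsupervised route using non-negative CP tensor factorization (the model that comes up on the next slide) with the tensorly library. The tensor shape, the Poisson toy data, and the variable names are illustrative assumptions, not the actual study setup.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

# Toy count tensor: 100 patients x 20 diagnosis codes x 15 medication
# classes (hypothetical sizes, not the real EHR data).
rng = np.random.default_rng(0)
X = tl.tensor(rng.poisson(0.3, size=(100, 20, 15)).astype(float))

R = 5  # number of candidate phenotypes to extract
weights, factors = non_negative_parafac(X, rank=R, n_iter_max=200)
patients, diagnoses, meds = factors

# Each candidate phenotype r is defined by its top-weighted diagnoses
# and medications; a clinician would then judge whether it makes sense.
for r in range(R):
    top_dx = np.argsort(diagnoses[:, r])[::-1][:3]
    top_rx = np.argsort(meds[:, r])[::-1][:3]
    print(f"phenotype {r}: dx codes {top_dx.tolist()}, meds {top_rx.tolist()}")
```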
[00:03:50] And what I'm showing you down on the bottom is the tensor factorization model. There's also been recent work using probabilistic graphical models as a mechanism to learn this, and you can imagine using something like latent Dirichlet allocation as a topic model, learning the topics from your patients. The nice thing is, once you do this you get these nice groupings, so you can automatically get disease subtypes. Based on how you constructed it and what data you're looking at, you might have something that says: this is a mild hypertension type, meaning patients who typically have the hypertension ICD-9 codes and are also taking ACE inhibitors and a thiazide-like diuretic. [00:04:42] Now, the nice thing about unsupervised methods is that you can quickly come up with a lot of different candidate phenotypes, groupings that you think naturally occur in your data. But then the natural question should be: how do you actually evaluate these candidate phenotypes? Because in the end you're just going to get these groupings, and how many of them are actually relevant? [00:05:11] So I'm curious: how many of you guys have run unsupervised learning? What have you guys done? Clustering? And how do you decide whether your clustering is good? That's right, so you can think about evaluating how close your patients are, but that only goes so far: it's just looking at how well-defined your clusters are and how close they are to one another. It really doesn't tell you whether, if I gave these clusters to medical professionals, they would trust them. Do you guys have any other thoughts on how to evaluate unsupervised results? [00:06:23] Right, so you're exactly right: just because items are similar doesn't mean the grouping is useful. Useful and similar can be quite different. So traditionally what we do is get external validation. If you think about the way EHR-based phenotyping has been done, it was a panel of experts doing it: they came up with the definition, and they all agreed on it. So we said, why don't we do the same thing? You take your raw EHR records, you run them through whatever automatic unsupervised machine learning algorithm you want to try, you come up with these candidate phenotypes, and then you present all of the candidate phenotypes to a panel of experts. [00:07:19] Sometimes the panel is one person, sometimes it's more than one, and hopefully what ends up happening is that the experts say, okay, these are the ones that make sense, and these other ones don't.
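As an aside, the internal "how close are your clusters" check mentioned a moment ago is cheap to compute. Here is a minimal sketch with scikit-learn's silhouette score on synthetic blobs; note that it measures geometry only, not the clinical usefulness the experts are judging.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for patient feature vectors.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Near +1: tight, well-separated clusters; near 0: overlapping clusters.
print(silhouette_score(X, labels))
```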
So you're looking at not only trying to capture as much of the data as possible, but also at usefulness from a medical professional's perspective. Now, what's the downside to a panel of experts? Expensive and slow, that's right. And if I ask any three of you to vote on something, do you think the three of you will agree? How many think everybody will agree on a cluster? How many think at least one of the three will disagree? Right, that's exactly correct. So what happens is: what do you do with disagreements? Not only is it expensive, it also takes a lot of time. [00:08:28] The question is, if two experts disagree, which expert do you trust? Or do you just assume the worst case and throw everything away? And so during the process of doing this, as I already presented, you can run all these unsupervised methods and typically you pass the results to a panel of experts. One day, one of the experts said: well, actually, the cohort you're looking at isn't my specialty. If you think about doctors, they have specialties: [00:09:00] they're experts in specific things, and they may not be experts in everything. She was willing to admit that (we were looking at resistant hypertension, and she really was not that familiar with this cohort of patients), and she asked: have you thought of looking at PubMed? [00:09:19] And I said, that's a great suggestion, because I really didn't know any better; we should definitely explore this direction. And it turns out that searching web-based medical literature is a pretty interesting area. How many of you guys have heard of PubMed? Just a handful; the rest of you have never heard of it, so I'll give you a crash course in a little bit. [00:09:47] But I was curious how things are done to date, and what I happened to run across was this article by Boland et al. in JAMIA 2015, and they had this nice little summary that said: we extracted all articles from PubMed with the term "birth month", plus additional articles referenced by the located articles, and that yielded about 156 articles; we manually reviewed all abstracts, which means they manually reviewed all 156 articles, [00:10:21] to identify 92 relevant articles and to summarize the literature. And if you look more closely at systematic reviews, which are a broader, more rigorous way of exploring the literature, what I wanted to note is that the total time you should budget, in number of hours, is based on the number of articles your search query retrieves. You'll notice there is a baseline number, which is 721 hours, and then for each additional article you add 0.243 hours, minus a small economy-of-scale term. Basically, that means at a minimum you're looking at 721 person-hours just to do any review of the literature.
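That cost curve matches the Allen and Olkin (JAMA, 1999) estimate of systematic-review effort; assuming that's the formula being quoted, a minimal sketch:

```python
def estimated_review_hours(n_citations: int) -> float:
    """Hours for a systematic review: a ~721-hour baseline, plus 0.243
    hours per retrieved citation, minus a small quadratic scale term."""
    return 721 + 0.243 * n_citations - 0.0000123 * n_citations**2

# Even the small 156-article birth-month search lands near 759 hours.
print(estimated_review_hours(156))
```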
[00:11:18] And I don't know about you, but I don't want to sit there for 721 hours for every single phenotype. Now, the reason it's so expensive: PubMed is this great website that captures all the citations for biomedical literature. It's one of three common databases (Embase is one, and I forget the name of the other), but what I want to highlight is that PubMed, as of the screenshot I grabbed, I think, last week, is at 30 million citations and growing. So if you actually want to search for something in PubMed, chances are you're going to come up with a lot of articles. In fact, if you do a simple search for diabetes and hypertension, I wanted to highlight the growth in the number of results: every year there are more and more articles getting published. So you really want to think about how you can start [00:12:24] to summarize information from PubMed and use it to drive the learning process, or even to validate phenotypes. Now, there's been a lot of work on mining biomedical literature, and what I want to showcase is what's been done to date: document classification, document retrieval, and passage retrieval, which means looking for specific things. And you might say: within that, I'm also interested in extracting information from the articles themselves, things like drug-drug interactions, protein-protein interactions, or relations between genes, and then you can think about using those for different applications as well. [00:13:19] So that's one way to lay out what's been done in biomedical text mining, but to better understand how the work has been structured, you can think about it from a natural language processing perspective, looking at different parts of the problem. At the very bottom you have passage retrieval, text classification, and ad hoc retrieval; these are all the things you might want to get directly from the text. This is asking: can I identify all the articles on obesity with certain properties, or can I find a protein-protein interaction? That's very much at the lower level, identifying individual things within a full passage. [00:14:13] Then you go up, and you have mention detection, coreference resolution, and normalization, which require a little more understanding of the text; then information extraction; and finally summarization and question answering. Now, if you recall how I got into this area: I have this candidate phenotype, say hypertension together with medications like ACE inhibitors and some other agents, [00:14:46] and the question is, which of these paradigms can I use to retrieve the relevant articles for this particular phenotype? And if you're like me, you think none of these should work, because it isn't really any one of those particular tasks: in some sense I'm doing mention detection, but across multiple different mentions at once, and I'm looking beyond just a specific category. So this is how we got into this area. [00:15:25] What we proposed was
this new tool for validating phenotypes using PubMed, which we call PheKnow-Cloud, or PIVET. The original name was PheKnow-Cloud; more recently we renamed it PIVET to make it a bit easier. The idea is: I have some mechanism for generating phenotypes, whether they come from a peer-reviewed paper or from a high-throughput method. I'm going to take every individual item in the phenotype and represent it in some way, analyze all 30 million citations in PubMed, and then present results that help determine whether or not the phenotype is valid. Comfortable with the setup? [00:16:19] So the overall process, basically, is: I pick some phenotype, shown on the left side; I generate all the synonyms associated with every single item in it; I figure out which ones are relevant; I search all of PubMed; and then I calculate how useful the items are in comparison to one another. [00:16:42] So that really consists of two parts, which I'll call the phenotypic synonym generation and the co-occurrence analysis. The first part, phenotypic synonym generation, starts from the observation that different articles may refer to the same concept differently. [00:17:06] So, out of curiosity, how many of you guys know what myocardial infarction means? You have to guess. Close, you're very close; there's a much easier name, a very, very simple word for it. That's right: myocardial infarction is also known as heart attack, and you can imagine the two being used interchangeably. Similarly, hypertension also means high blood pressure, and you can think of those being used interchangeably too. So depending on who's writing the paper, it might look quite different. What we decided was that we needed some way to get good recall; in other words, if I'm going to find articles related to hypertension, I really want to make sure that I encompass almost all the articles that are somewhat related to it. So we query the MeSH database, a terminology database made available by the National Library of Medicine. It has a nice tree structure, which means you can figure out which terms are related; under hypertension you'll see pre-hypertension and preeclampsia [00:18:32] as some interesting synonyms. MeSH just gives us a pool of candidate synonyms; what we then want is to figure out how correlated each one is with our true original word. So we re-rank them by querying the database and seeing how often they appear together. [00:18:55] And you can see hypertension itself is obviously very highly ranked, and in terms of relevance, malignant hypertension and white coat hypertension are pretty high up there as well.
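Here is a minimal sketch of that re-ranking step, with a toy in-memory "corpus" standing in for the PubMed index; in the real pipeline the candidate list comes from the MeSH tree and the counts from all 30 million citations.

```python
target = "hypertension"
candidates = ["high blood pressure", "pre-hypertension", "preeclampsia",
              "malignant hypertension", "white coat hypertension"]

abstracts = [
    "hypertension management and high blood pressure control ...",
    "malignant hypertension in the setting of renal disease ...",
    "preeclampsia screening during pregnancy ...",
]  # toy stand-in for PubMed abstracts

# Re-rank candidates by how often they co-occur with the target term.
scores = {c: sum(target in doc and c in doc for doc in abstracts)
          for c in candidates}
for cand, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(cand, score)
```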
So this is one mechanism: if I have a word, can I figure out all the synonyms that might appear in the literature? Okay. The next thing is the co-occurrence analysis, because remember, we want to know whether, say, five items together make sense. There's this concept known as lift, which measures how often a set of words occurs together compared to how often you would expect them to occur together. Formally, for three items it's defined as the probability of the intersection of those three words, normalized by the product of their individual probabilities, [00:19:51] which is what you'd get if they were each independent: lift(w1, w2, w3) = P(w1 ∩ w2 ∩ w3) / (P(w1) P(w2) P(w3)). A high lift means there's a good chance that the items are actually associated with one another, and a low lift indicates a small chance of a relationship. So what you can see is, if I'm looking at coagulation factor deficiency together with other coagulation modifiers, there's actually a very high value in the literature. [00:20:16] But if I'm looking at diabetes and antibiotics, the literature suggests there probably isn't much of a relationship between those two items; likewise for neoplasm of uncertain behavior and cardiovascular agents. So it gives us a way to rank all the items together. [00:20:38] What that means is: you take your candidate phenotype, you figure out all the relevant synonyms you might be interested in, and then you calculate the lift using those synonyms. Once you have the lift, you can assess whether or not all the items together have a good chance of a relationship. [00:21:01] And so we developed this web interface. Up here is the candidate phenotype that you input; here you can look at the impact of an individual article on the final score, because part of the goal is to generate evidence for a physician, [00:21:23] especially if it's not their specialty. Then we rank the articles based on what we think their relation to the phenotype is, to give you a better idea of whether an article should be included, and you can actually follow the link and read the abstract. So that's one way you can think about doing some text mining. But if you recall, I'm not coming at this from an NLP perspective: I really want to think about how I can improve the learning and make things more efficient. [00:21:56] So the next thing we did was ask: can we use the results of this co-occurrence analysis and synonym generation to classify whether a panel of experts would deem a phenotype clinically meaningful? In this task we focused solely on the phenotypes where the experts all agreed, either that it was meaningful or that it was not meaningful at all. [00:22:27] There were 102 different phenotypes, and we trained logistic regression and k-nearest neighbors classifiers using only features derived from the analysis. What you can see is that logistic regression gets an AUC of 0.7 and an F1 score of 0.878, and k-nearest neighbors has a slightly lower AUC but a little better accuracy. Now you might be wondering, what can I do with this? It can serve as a validation aid.
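To make the lift computation concrete, here is a minimal sketch over toy document-ID sets; in the system, each item's set would be the PubMed articles matching any of its synonyms.

```python
import math

def lift(item_doc_sets, n_docs):
    """Observed joint probability of the items co-occurring, divided by
    the product of their marginal probabilities (the independence
    baseline); > 1 suggests a real association, < 1 suggests none."""
    joint = len(set.intersection(*item_doc_sets)) / n_docs
    indep = math.prod(len(s) / n_docs for s in item_doc_sets)
    return joint / indep if indep > 0 else 0.0

N = 1000  # toy corpus size
docs_a = set(range(0, 100))    # articles mentioning item A's synonyms
docs_b = set(range(50, 160))   # articles mentioning item B's synonyms
print(lift([docs_a, docs_b], N))  # 0.05 / (0.1 * 0.11) ~= 4.5
```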
[00:23:06] So what I'm showing you are two candidate phenotypes that a couple of experts weren't sure what to make of. For each one I'm showing the diagnoses, the medications, and the comment from the annotator, along with the score you get from running our classifier and the lift-based final score that was generated. What you notice is that the first one, which the experts marked as uncertain, actually should have been deemed a clinically meaningful candidate according to our classifier, and based on the comment it seems there was some genuine uncertainty; maybe it was just misclassified at the time. [00:23:56] But the second one had this lung disease with a big question mark, and even our classifier was not really sure what to think of it, because there is some content that is potentially not relevant. You'll notice that's also reflected in the lift score, which is quite different. [00:24:16] Questions? Everyone comfortable with this? Hopefully. So I want to tie this back to what we've already covered, which is how you can incorporate this into the learning process. A couple of the constraints we wanted to encode were: if you have type 1 diabetes, you should not also have type 2 diabetes; [00:24:42] if you take medication A, you should not take medication B; and if you are pregnant, you should not take certain medications. These are the types of things you would find in the literature. Now, if you think about doing this with tensor factorization, there really is no mechanism to incorporate this knowledge. [00:25:05] One way you can think about it is to force some of the elements to be fixed. The downside is that you may bias the model towards expert knowledge once you start including that information; it still goes back to the problem of requiring lots of annotation; and it assumes the knowledge is well established, which is to say, not everything published is necessarily true. [00:25:34] Especially true of machine learning models. So the way we thought about it was: what if we use PIVET to guide the learning process? We basically run tensor factorization unsupervised, and then we identify pairs that maybe should be pruned; these are the low-probability elements. We pass those to PIVET to figure out the strength of the relationship between the items, [00:26:06] and then update the tensor factorization model. That way, we can generate more useful phenotypes. Formally, the approach works like this: we start with our observed tensor, find the low-probability elements, pass them to PIVET, and construct a matrix we call the cannot matrix. Once we have that, we prune the pairs that don't seem relevant, keep the ones with a high strength of relationship, and update the tensor decomposition model. [00:26:45] I'm not going to talk much about exactly how we did this, but we ran it on electronic health record data to predict resistant hypertension. We constructed a tensor from roughly 16,000 patients.
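Here is a minimal sketch of that prune-and-update idea; `pivet_lift` is a hypothetical stand-in for the literature query, and the quantile threshold and zero-masking are illustrative assumptions, not the paper's exact update rule.

```python
import numpy as np

def cannot_mask(factor, items, pivet_lift, prob_q=0.25, lift_thresh=1.0):
    """Flag low-probability factor entries whose item shows weak
    literature support for that phenotype; these become the "cannot"
    (forced-to-zero) constraints for the next decomposition fit."""
    low = factor < np.quantile(factor, prob_q)
    weak = np.array([[pivet_lift(items[i], r) < lift_thresh
                      for r in range(factor.shape[1])]
                     for i in range(len(items))])
    return low & weak

rng = np.random.default_rng(0)
factor = rng.random((6, 3))  # toy item-by-phenotype factor matrix
items = ["htn", "t2dm", "ckd", "ace_inhibitor", "insulin", "statin"]
fake_lift = lambda item, r: 2 * rng.random()  # hypothetical PIVET query

mask = cannot_mask(factor, items, fake_lift)
factor[mask] = 0.0  # re-fit the factorization with these entries pinned at zero
```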
A subset of them were identified by domain experts as case and control patients: 304 who they felt clearly had resistant hypertension, and 399 who for sure did not. What I'm showing you are different metrics for predicting whether a patient belongs to the case or control group. [00:27:28] What you notice: Rubik incorporates domain knowledge by forcing specific items to be present, a supervised approach published in Scientific Reports back in 2017, and you can see that with all of the labels you get 0.6466. Granite uses diversity but no supervision, and in fact Granite does better than Rubik, because forcing specific items to be on is actually quite stringent. The last one is our PIVET-guided method, and in both cases we outperform having all the label information, because we're really incorporating knowledge from the medical literature. Comfortable with this? Yes? [Audience question, partially inaudible, about whether the case/control labels leak into the classifier.] [00:28:18] So this was using the Vanderbilt Synthetic Derivative, and they have definitions in there that define resistant hypertension cases and controls. What we did was look at the date at which patients were labeled (there was a date associated with the cases and controls), and we took all the records before that date; everything from that date on we held out. So we're asking: using only the previous history, can we classify whether they're case or control? That's how we're looking at it. [00:29:32] And less so from actually leaking information from the flowchart itself. I mean, there may be some of that, yeah. Yeah, they're resistant to the medications themselves. But we weren't the ones who defined the labels, which I think makes this more trustworthy, maybe. [00:30:04] But this just gives an example of how you can use medical literature to guide learning. So to summarize this first part: we looked at producing evidence for a candidate phenotype by searching PubMed, and at judging whether a phenotype is clinically relevant without necessarily having to ask a panel of judges. We view PIVET as something that can automatically reject the really, really bad candidates, so we don't have to spend the experts' time on those; we're really asking the experts for high-quality annotations on the ones that matter. And we made some improvements to scale it: we index the entire set of PubMed Open Access articles, which is a million-plus articles, using just a regular machine, so any of your laptops could run our analysis. [00:30:57] But there was something bugging us, which is the synonym generation: it really relies on MeSH terms, and the question was, can we do better? The answer should be yes, because I'm about to present it. So, what do you know about word embeddings? This is the latest craze, I'd say, in natural language processing, and probably
[00:31:27] also in thinking about utilizing deep learning. The goal is to take every word and represent it as an m-dimensional vector. One way you can encode words is one-hot encoding: you look at the size of the vocabulary, and the vector is a one at that word's position and a zero everywhere else. Word embeddings, on the other hand, capture this notion of synonymy and polysemy (I hope I'm pronouncing that properly): the idea is that similar words should be close by in vector space. The common example used quite prevalently to explain word embeddings is: if you look at the vector for king and the vector for queen and take the difference between the two, the difference is male versus female; it gives you the same direction as if you took the difference between man and woman. So with word embeddings you can do things like [00:32:35] king minus man plus woman gives you queen, in that same representation. Part of the reason word embeddings became so popular was the word2vec model, introduced back in 2013, and the way you train the word embeddings rests on the idea that similar words occur in similar contexts. [00:33:05] So what you tell the model to learn is: for a given word, I want to be able to predict all the other words around it, using just that word's embedding. From the model's perspective, you have this target word that you encode using the neural network, and then you predict all the words to the left and to the right of that word. [00:33:30] As an example, let's say my target word is "metabolically": then I'm going to try to predict, using just that word, the surrounding words: nutritional, support, benefits, stress, surgical, and patients. That's how you train the word2vec model. Comfortable with this? And word2vec has been shown to be quite powerful. If you take two documents, say one document is "give a research talk in Boston" and the other is "had a science lecture in Seattle", then Boston and Seattle are locations appearing in very similar contexts, and lecture and talk are also very similar. What you can see is that if you learn the embeddings, Boston and Seattle end up together, talk and lecture are similar, and research and science point in similar directions. [00:34:28] So that's great. Now, if you actually try to train a standard word2vec model on PubMed literature, you get stuff like this: if I look for the words most similar to hypertension, I get hypertensive, hyper-, cardiovascular, antihypertensive, HTN, and various bits of shorthand; and if you look at diabetes, it gives you things like mellitus, prediabetes, diabetic, DM, and so on. [00:35:03] So these words are similar, but in biomedical literature there are a lot of multi-word phrases; in fact, one of the most similar terms to hypertension really should be high blood pressure, because it occurs a lot.
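Before moving on to phrases, here is a minimal skip-gram training sketch using the gensim library; the two tokenized "abstracts" are toy stand-ins for the PubMed corpus, and the hyperparameters are illustrative.

```python
from gensim.models import Word2Vec

# Toy tokenized "abstracts" standing in for PubMed text.
sentences = [
    ["nutritional", "support", "benefits", "metabolically",
     "stressed", "surgical", "patients"],
    ["hypertension", "is", "also", "known", "as",
     "high", "blood", "pressure"],
]

# sg=1 selects the skip-gram objective: predict the context words
# from the target word's embedding.
model = Word2Vec(sentences, vector_size=100, window=5,
                 sg=1, min_count=1, epochs=50)
print(model.wv.most_similar("hypertension", topn=3))
```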
So one of my students proposed PMCVec, a PubMed Central vector representation that takes multi-word phrases into account, since they happen a lot. At a high level, what we do is: first identify possible phrases in the corpus; then filter out the phrases that don't occur often enough for us to consider them meaningful; introduce a new criterion to rank the phrases so we can figure out which ones are relevant; tag them; and then run everything through a word2vec model, which is basically learning a neural network. [00:36:13] So let me showcase the existing criteria and then our criterion, which is info frequency. First, raw frequency in the corpus: the most prominent top-10 phrases are things like present study, risk factor, significant difference, baseline, results suggest, control group. These are not very exciting words. [00:36:44] I don't view them as exciting because I understand them all. And the second existing criterion is only slightly better (I don't know why rainbow trout is so common in the PubMed literature), but really, both of these criteria surface phrases that aren't particularly medically relevant. [00:37:06] Then you can look at two of the more common criteria, pointwise mutual information (PMI) and word2phrase, which was introduced with word2vec, and the phrases start to look a little more relevant to medical text. With PMI there are a lot of words that I cannot pronounce, which I think is a pretty good indicator of useful medical terms; word2phrase is partially there, but the phrases are still very short, most of them just 2-to-3-word phrases. So we needed some mechanism to balance phrase frequency against the underlying words themselves, so that we would tag the right phrases more often. We introduced a new formula called information frequency, which takes into account how often the phrase occurs, how often each individual word occurs, and the length of the phrase. What you can see is that using our info frequency criterion, you get much more interesting things, like COPD, MRIs, and polymerase chain reaction up top. [00:38:28] So there really is a good balance of phrase frequency and medical relevance. Now the real question is, does it work? So we looked at five different biomedical similarity tasks [00:39:14] that are publicly available. On the bottom are the datasets (MiniMayo, Mayo, different versions of UMNSRS, and so on), and higher similarity is better. What you can see is that our model outperforms all the existing word-embedding representations for biomedical text, and even GloVe and Google News vectors as baselines, and those perform really poorly. So that's a quantitative comparison; qualitatively, we were also interested in how ours differs from all the others. If you recall, if we used no phrases, all you got was single words.
If you use PubMed phrases, where they identified a bunch of phrases and ranked them according to different criteria, these are the ones you get. And if you look at ours, where the larger font indicates the phrases most similar to hypertension, you can see elevated blood pressure and high blood pressure are high up there, as well as arterial hypertension, blood pressure, and essential hypertension. [00:39:55] So in fact you can use word2vec; you just have to tune it slightly so that you find the right phrases. PMCVec was really about learning quality vector embeddings for single words and multi-word phrases, and the useful thing is that I can now generate multi-word phrases automatically from the corpus. I can easily get synonyms from this process (if I pass in hypertension, I can figure out that high blood pressure is a synonym), and you can use it for a variety of biomedical NLP tasks. [00:40:30] But we also had another problem, which is that when I go to read PubMed articles, I can't actually understand most of them, because they're usually quite complex. So we thought: can we use PMCVec to specifically summarize articles, [00:40:51] really targeting a variety of biomedical NLP tasks? So I first want to introduce the notion of key phrase summarization. The idea is: you're given an abstract, and you want to automatically generate all the relevant phrases that summarize the article. As an example, you could read this entire text, but these are the key phrases that were supplied by the author: [00:41:21] the first one is clinical microbiology, which occurs in the title; infection prevention, which also occurs in the abstract; Ion PGM, which I think is a specific sequencing platform; next-generation sequencing, which is also prevalent; and whole genome sequencing, which never occurs in the title or the abstract. So you can see that identifying and summarizing the key phrases for a document can actually be quite difficult. Now you might be thinking, why else would you want to do this? [00:42:05] You can use it to summarize text; you can use just the key phrases themselves to classify the text; you can use it to determine which topics are present; you can use it for recommendation purposes, say recommending articles containing a particular phrase; and you can also use it to summarize citations. [00:42:28] Now, with the prevalence of all the deep learning models, you would think this is a solved problem, but actually last year at ACL there was this quote: despite using advanced deep learning models, large quantities of data, and many days of computation, our systematic evaluation on four test datasets reveals that the state-of-the-art text summarization methods could not produce better key phrases than simpler unsupervised methods. [00:43:01] Which means that deep learning models cannot necessarily do much better than unsupervised ones. And so we thought, maybe we could do some unsupervised learning as well, utilizing PMCVec. So the overall process of NamedKeys, which is the name of the method we introduced, is: we take an abstract,
[00:43:24] and we identify the possible key phrases using two different streams: one is named entity recognition, to extract all the named entities that exist, and the other is our own noun-phrase chunking, to identify other interesting phrases that are not named entities. Then we use the word embeddings and phrase ranking to figure out which phrases are more meaningful and more similar to the document, and finally we find diverse and representative key phrases by ranking them and clustering them together. The cool thing is that we can utilize the information frequency criterion as a proxy for phrase quality. So this is showing key phrases that were correctly extracted by our method on the left, and on the right are common phrases that were incorrectly extracted by the baseline key-phrase summarization methods. [00:44:28] What you can see is that the other methods, because they know nothing about phrase quality, will generate not-very-useful things like "lower", "subjects", or "high risk", whereas because we rate candidates based on phrase quality, we have a proxy for what we think is useful. Make sense? So we basically introduced this notion of phrase quality to figure out which phrases we should actually keep, and then we cluster the ones we think are relevant. So let's take a different abstract: based on it, we've identified all these phrases through named entity recognition and phrase quality, and we automatically figure out how many clusters, that is, how many different topics, there are, [00:45:24] and then rank the phrases within each cluster as well. What you can see is that cluster one is looking at one theme, cluster two is related to some outer membrane proteins, and cluster three is looking at a particular drug efflux pump. The goal is to find diverse key phrases by choosing from each of the clusters in proportion to their size. Questions? Comfortable with this? Completely lost? [00:46:02] Okay, good. So we created a new benchmark dataset, which didn't exist before, using PubMed and the Open Access articles: we identified any abstracts with at least five author-provided key phrases that occur in the abstracts themselves. We have the title of the article, the abstract, and the list of key phrases provided by the author. What you can see in the blue line is our method: this is the F1 score, and on the x-axis is the number of extracted key phrases. You can tell the algorithm, give me the top 15, or the top 20, and F1 is the balance between precision and recall of the actual key phrases. You can see that across the board we outperform all the existing text summarization methods out there. Yes? [00:47:08] That's correct: if there are, say, 10 author-provided key phrases, we check how many of the ones we propose actually occur in that set, and how many of the set we get back. Higher is, of course, better.
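A minimal sketch of the rank-then-cluster step described above, with random vectors standing in for PMCVec embeddings and made-up quality scores standing in for info frequency:

```python
import numpy as np
from sklearn.cluster import KMeans

phrases = ["outer membrane protein", "efflux pump", "omp expression",
           "drug resistance", "membrane permeability", "pump inhibitor"]
quality = np.array([0.9, 0.8, 0.7, 0.85, 0.6, 0.75])  # info-frequency proxy

rng = np.random.default_rng(1)
vecs = rng.normal(size=(len(phrases), 50))  # stand-in phrase embeddings

k = 3  # number of topics (picked automatically in the real method)
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vecs)

# Take the highest-quality phrase from each cluster: diverse and representative.
for c in range(k):
    members = [i for i in range(len(phrases)) if labels[i] == c]
    best = max(members, key=lambda i: quality[i])
    print(f"cluster {c}: {phrases[best]}")
```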
[Audience question.] Mostly yes, because in fact a lot of the issue with the old methods is that they can't generate useful key phrases. And even with ours, there are some key phrases we don't identify as candidates, so it's hard to rank them: they don't make it through our original filter because they're just not very common overall. So they tend to drive the score down a bit. [00:48:22] But it's interesting, because we were also curious about exactly which part matters the most, since the method has quite a few different components. So we did an ablation study, which means you turn the elements on and off to see what their effect is. This is what it looks like with no named entity recognition, purely embeddings and chunking: [00:48:45] you have pretty low recall, which means that of the author-provided lists, we weren't able to recover many. Now, if you turn on named entity recognition, it helps quite a bit, especially if you look at both precision and recall, because a lot of the key phrases will be named entities; named entity recognition is that important. The next thing we were interested in was what happens if we also bring in the process of identifying additional phrases that are not named entities, [00:49:26] and what you can see is that there isn't much change from the previous setup. Chunking by itself, just finding more phrases, is not that useful, because you're not identifying useful phrases, just noisy ones. But if you add phrase quality, the thing that goes up is recall. Precision takes a bit of a hit (precision measures, of the things you propose, how many are actually relevant), but your recall of the original author key list goes up quite a bit. [00:50:02] So the phrase quality is really needed to ensure better recall of the non-named-entity phrases, though there's only so much it can do, and you'll see the same pattern across the board [00:50:35] at different numbers of key phrases you ask the algorithm to return. But it's a pretty interesting problem, because rather than having to read the abstract yourself, if you have some mechanism for summarizing the article, you can now look at it much more quickly. So NamedKeys really has the ability to identify diverse and representative key phrases and to improve document representation using key phrases, and we also created a new benchmark dataset. Now, I'm going to spend the last 15-20 minutes on the following: so far we've only been looking at validating phenotypes, using all of this in the phenotyping process, but can I think about other applications that could also benefit from this type of learning process? [00:51:10] And so what I want to cover next is evidence-based medicine. If you look at how the importance of articles or literature is ranked, you start with background information and expert information, and at the very top level is what's known as systematic reviews and meta-analyses.
[00:51:36] And the idea is that as you go up this pyramid, the quality of information gets better. You can think of the top as a survey article of machine learning methods: if you're going to tackle one new model, you're not going to try to find all the articles that does t