Thank you so much, and thank you, Jeff. Well, it's great to come over to this side of town every now and again. It's been a really wonderful experience for us working with the Sun Lab, though we've just started: we got a grant from the National Institutes of Health, and it started in August, this fall. So the work is preliminary, and I thought it would be good to give a broad background about what it is we're doing and why we're doing it. My background is as an epidemiologist, so I have a different background than you, and it's been really great to work with people with your skill set and background. In the medical field, in the health field, having data scientists and people with this expertise has been extraordinarily helpful for us; there are many, many problems to solve. So we'll start by talking about what we're doing. In this project that we're working on, our goal is around predicting hospital readmissions among transplant recipients, and we're using...
We're in year one of the project; it's a five-year project. The ultimate goal is to update a predictive analytics system within our health care system so that physicians, surgeons, and our multidisciplinary care provider teams can know who is at highest risk for poor outcomes after they get surgery, so we can identify as early as possible who needs intervention. That might mean additional medical interventions, or things that the hospital might be able to provide for the patient before sending them home after surgery. What I want to do is present a bit of a broader background about the field of kidney disease and transplant. My background in research is really focused on social determinants of health, access to care, and disparities in access to kidney transplantation, and this project focuses more on what happens after you get to the transplant and receive the surgery. There are, unfortunately, disparities throughout every step of the transplant process, and I'll talk a little bit about that, then provide some updates more specifically on some of our current work and ongoing preliminary data related to this project, and then talk for a moment about future directions. Feel free to interrupt me if you have any questions throughout the talk; I would love to hear from you. And if there are parts that aren't clear, feel free to just raise your hand and I'll explain that better.
All right, so kidney transplantation is really the optimal treatment for patients who have end-stage renal disease. There are about seven hundred thousand patients in the United States that have end-stage renal disease, typically caused by things like hypertension, diabetes, and obesity, and the best treatment for these patients is transplant. There is an alternative treatment. Has anyone heard of dialysis before? So dialysis is the alternative treatment: patients go to a dialysis facility three times a week, their blood is filtered, waste is removed, and this is something where, three times a week for multiple hours, they typically have to go to this center to receive the treatment. It's not fun, and it's associated with a lot of infections. Though it is a lifesaving treatment, dialysis is also associated with much higher morbidity and mortality compared to transplant. Transplantation is a surgery where you can get either a deceased donor organ or a living donor organ. So if your friend or family member has kidney disease, you could potentially donate one of your kidneys. You have two of them, and actually, once one of your kidneys is removed, the other one, within a few days, will provide up to ninety percent recovery of the kidney that you've lost. Or there's deceased donor transplant: if someone unfortunately passes away and they might be a good candidate for organ donation, there are deceased donor organs that can be used for people. This figure shows that the proportion of patients who have kidney disease and are on dialysis has been rising over the last several decades, due to increases in these chronic conditions: diabetes, obesity, hypertension. Unfortunately, our transplant rate has not kept up with that. So, the absolute number of transplants:
Last year we had about nineteen thousand, but there are over one hundred thousand patients who are currently on the waiting list. Unfortunately, this constraint of a limited availability of organs has resulted in a number of disparities in access to transplantation, and that's been some of the focus of our research. Before I get into some of the specifics, I wanted to explain the pathway that a patient has to go through in order to get a transplant. This is helpful because, when we talk about the data that we're going to be using to predict an outcome after transplant, we want to use some of the data from the period prior to the transplant. These steps to transplant are in some ways linear, in that you have to complete some of them in order to get to the next step. Typically, most patients start on dialysis because of that limited number of organs. Again, this is what the dialysis facility would look like: you've got a chair, you're hooked up to a machine, and you go there multiple times a week. Typically your nephrologist, or kidney doctor, is supposed to educate you about transplant as a treatment option and potentially refer you to a transplant center to undergo a medical evaluation, to say: is this patient a good surgical candidate for the transplant surgery? So, typically...
What I'll say is, in Atlanta, for example, we have two transplant centers: we have the Emory Transplant Center and we have the Piedmont transplant center. In Georgia there's another one in Augusta. But we have maybe three hundred to three hundred fifty dialysis facilities across the state, so these dialysis facilities are referring patients to one of these transplant centers. Then the patient is expected to actually show up at the transplant center. Usually these are attached to a hospital, or very close to a hospital system, somewhere equipped to be able to do the surgery, and patients go through a fairly rigorous evaluation process. They're evaluated medically, psychosocially, and financially, and they have to complete that evaluation. This may take somewhere between one hundred and two hundred days. Then they're placed on the national deceased donor waiting list, or they pursue a living donor kidney at that point. Patients at our center are really encouraged to pursue a living donor transplant, given the long wait times for a deceased donor transplant and the small supply of organs.
And then patients may wait on the waiting list for five years before getting a deceased donor transplant. That's sort of the process for these patients. What's important to know is that the complexity of care in kidney disease changes from pre-diagnosis to when they have the disease and start on dialysis. Typically they're referred for transplant, and you have this evaluation where they have sometimes two to four encounters with the health system over several years. In the dialysis facilities that may be three times a week, but at the transplant center level, where we are, they start with two to four encounters over several years. Then in the perioperative period, meaning once they actually show up at the transplant center and get the surgery, we have maybe fifteen encounters with the health care system. An encounter might be that you saw a nurse and they took your blood pressure measurement, or they saw a physician, or it's the actual surgery itself, or the follow-up visits, or the education that they have, all within the perioperative period. So they show up and they have this three- to five-day length of stay at the time of transplant. After transplant these patients are discharged and sent home, and they'll come back to the transplant center maybe ten to twenty times within the first year to have their health evaluated. Think about this in terms of all the data that we're collecting in this system when we talk later about the predictive models that we're going to use for this project.
Before I get into the specifics of the data, I want to point out a couple of things about the need, and why this is an important area of research, I think particularly for our community. One is some research we did a few years ago. This is showing every state lined up from lowest to highest transplant rates, using standardized transplant ratios at the dialysis facility level. A few years back, when we published this data, we were really surprised to see that Georgia had the lowest rate of kidney transplantation in the entire nation. That led us to a number of different research studies, and I won't go into all of them, but one of the things we found was the variation in transplant referral among these dialysis facilities. Remember that key step: facilities have to refer patients from the dialysis facility to the transplant center to undergo care. The x-axis here shows the more than three hundred dialysis facilities in Georgia, so each column is one dialysis facility, and the red line is showing the median proportion of patients referred for transplant within a year of diagnosis. What I think is really striking about this is just the extreme variability in transplant referral for these patients. As for the reasons, we have done a number of studies, and it is related to economic status, race, and distance to the transplant center. There are a number of social-behavioral factors, social determinants of health, behind why we have variation in access to transplant and why Georgia has one of the lowest rates of transplant in the nation. An update from that previous slide is that we're no longer dead last; we're in the bottom five now. So we're getting better, but it's still a problem. For example, this is some work we have done looking at access to the transplant waiting list, finding that patients in...
If you look at multiple categories of poverty level, you see significant disparities. In the last column, showing results adjusted for a number of other demographic and clinical characteristics, you still see substantial racial disparities in access to the transplant waiting list, and it's worse in those that are in the poorest communities. Even after patients are waitlisted, we see a substantial disparity in access to transplant, where different studies have shown African-Americans are between fifteen and twenty percent less likely to get a transplant. And this continues on in the post-transplant world: even after the patient gets a transplant, we see higher risk of poor outcomes. Some of the poor outcomes we care about are graft failure and mortality of our patients. For people who get a transplant surgery, one-year survival, at least if you're at Emory, for our center, is about ninety-eight or ninety-nine percent. That's pretty high, at least compared to what the alternative treatment on dialysis would be. Unfortunately, there's a disparity by race in graft survival, especially over the long term. This is five-year graft survival, and this disparity has remained despite improvement in survival over time. So one of the things of interest is hospitalization. Hospitalization is often an early marker for something like graft failure. Graft failure means the kidney has failed: they can no longer use that kidney, which cost the health system a lot of money and which is lifesaving for that patient. If the graft fails, that means they return to dialysis, which costs upwards of two hundred fifty thousand or one hundred fifty thousand dollars per year for a patient, whereas the surgery itself is about two hundred fifty thousand dollars as a one-time cost. So it's a definite drain on the health care system, as well as really terrible for the patients.
So about thirty percent of our patients are readmitted within thirty days of being discharged from the transplant surgery, and half of them are readmitted in the first year after transplant. This is also a problem for our patient population. Again, it's a signal associated with potentially worse outcomes. It's costly: each admission to the hospital costs more than ten thousand dollars, and nationally it represents twenty percent of all Medicare payments for the transplant population. As I mentioned earlier, hospitalization is associated with lower graft and patient survival. And there's some hope, in that evidence suggests that up to half of these surgical readmissions may actually be preventable, so if we were able to identify them we might be able to do something about it. Then there is the question of when these readmissions are occurring. There's a higher hazard of both early and late readmission among African-American patients compared to white patients, and sometimes the reasons for these readmissions differ depending on whether it's early or late. The causes are multifactorial. They can include infection, complications during surgery, and comorbidities. You can imagine that if someone is very heavyset, if they have a high BMI, they may be at higher risk for surgery, and that may be associated with more comorbidities. Again, these causes vary depending on when the hospital readmission occurred. This is some data from our center looking at what proportion are due to what cause. You can see that chronic disease exacerbation may only represent twelve percent at the start, but that rises to closer to twenty percent at the end. Infection, again, is a major cause. And then rejection or acute kidney injury, things that are a little bit more specific to the surgery and to how well the kidney is functioning immediately after the operation, are more prominent in the early part of the readmission period rather than the late phase. But little research has really disentangled these causes by racial and ethnic group or examined socioeconomic status as a potential cause or marker, and these factors can influence or interact with some of these other important causes like infection, complications, or comorbidities.
So I want to take a minute to say that this is an area where much of my research is focused, on health care access. Why does this matter? Why do we care about all these social determinants of health, inside and outside of transplant? This is a really interesting study that I'd urge anyone to read who is interested in the health care aspects of this. Improving access to care doesn't necessarily improve your health outcome. There are some simulations and some data put together showing that even if you could treat everyone, only about ten percent of premature deaths could be prevented. Health care really only explains a certain portion of, in this case, the outcome of premature death, whereas things like your genetics, behavioral patterns, your environment, and your social circumstances actually explain much more. So treatment is great, going to the hospital to get treated is great, receiving a transplant is great, but that's not going to be very helpful if you're not paying attention to some of these other really important and critical factors, including the social determinants of health. And why does this matter for Georgia? Those of you that are familiar with our state and its poverty levels will know that we have an above-average poverty rate here compared to many other states in our country. The South and the Southeast: we're part of what's referred to as the poverty belt. So this is really important for our patient population. And where you live matters. This is an example from Atlanta alone: a difference in life expectancy of thirteen years depending on whether you live in Bankhead versus Buckhead. So these really do have implications for health outcomes. Our group is trying to incorporate more and more of these potential exposures, which may happen early on in the course of your life and throughout the course of your disease, into our predictive models to try to identify who might be at highest risk for these poor outcomes.
So that's the background on what the problem is in transplant, and I want to turn now to some of our current research, applying this background and the rationale for why we're doing what we're doing. How do we go about predicting who is going to be at highest risk? Again, we are expecting these are going to be patients who maybe have lower socioeconomic status or are maybe minority patients, but who may also have other unknown factors driving these poor outcomes that we want to discover. Before I go on to the next section, does anyone have any questions or comments? Anything? No? OK. So, anyone that's had a transplant or been on dialysis? Or has anyone donated a kidney? Are you organ donors on your license? Something to think about. OK.
All right, so I hypothesized in this research that earlier prediction of readmission after this surgery could allow for earlier intervention. If we were to implement a system that predicted in real time who's at highest risk, couldn't we then identify the best intervention for those patients? It might be that, instead of letting them go home on day three after the surgery, we need to educate them about how to recognize the signs of a failing graft, or make sure that they're taking the medicine that's critical to saving the kidney graft that they have. Yes, a question? That's a good question. So the question is: what is the quality of referrals that are coming into the transplant center? I think it's variable, but I'd say that most of the nephrologists in the dialysis facilities are really instructed to refer patients who they think are pretty good candidates for transplant. All the transplant centers have their criteria documented on their websites, and the nephrologists are usually fairly familiar with what those criteria are. Emory, for example, has a BMI cut-off, so they would know not to refer patients who have a BMI greater than thirty-five, because they wouldn't be good candidates for surgery. There may be times when they're not really sure, so I would say there are certainly some referrals that come in where we fairly quickly say this person is not a candidate, and we're not going to consider them. Good question. Other questions? And then we think that if we were able to identify these patients and do some sort of intervention, that could decrease the disparities that we see in readmission rates, particularly by targeting interventions to those higher-risk patients. So it's a kind of population health approach to targeting that problem.
So what's been done so far? There are other studies that have used some national registry data. They've looked at demographic factors, socioeconomic factors, clinical factors, transplant surgery factors, and utilization factors, meaning how much somebody may be using the hospital system before they get the transplant, to see if that predicts hospitalization and readmission after they receive the transplant. But most of these have been static models. I'm sure you're all familiar with this: when I say static models, I mean they're not dynamic. They really just take into account non-longitudinal data. Something happens at one point in time and they're measuring that, and there's not really any follow-up on what happens at the next encounter in the health system. So you're obviously losing a lot of information if you're not incorporating that into a dynamic model. This would be an example of how this might be important for someone. It could be that this patient is at intermediate risk at first, but then they are at rising risk over a period of time, and then at decreasing risk. A patient's situation may change over time. They may be at rising risk when they move from one neighborhood to another, or when they have significant weight gain during that time period, or with the discovery of some other comorbidity. Or it could be that they then have some treatment or intervention and the risk decreases. So it's very important to document in the data what's going on with that patient over time. It seems fairly obvious, but there are a lot of limitations in the data that don't allow us to actually look at some of those changes over time. And so a lot of these static models that have been done in the field have fairly limited predictive accuracy. The C-statistic, the area under the curve, has ranged from 0.63 to 0.71. And actually, in transplant we use risk prediction models of about that quality, about 0.65 to 0.67, in our regulatory process. Transplant centers are flagged for having poor outcomes, and risk prediction models with C-statistics that low are used to guide the quality of care for transplant centers. However, we think that we can improve upon this in predicting readmissions.
So this is one example, a paper that was done at one transplant center. They looked at incorporating a bit more dynamic information, but what they did was include things from after the time of transplant. Of course, after the patient is transplanted and they have follow-up visits, you might see certain lab values or, say, that their blood pressure is rising, and that's going to lead to increased hospital readmission. And of course they did find that when you included things like that in the model, the area under the curve, the C-statistic, increased substantially. But what we're trying to do in some of the research we're doing is to say that we don't even want to look yet at the data after discharge, because what we want to do is identify risk before the patient ever leaves the hospital and goes home. That's the point of care where we really want to intervene: can we identify at that point who's at highest risk? Of course, if someone writes in a note, "This patient looks like he's about to die," and you put that into a model, that's going to be highly predictive of somebody dying or having a poor event. What we want to know is, earlier on, before they leave the health system, can we predict with some more accuracy what's going on with the patient?
So, the overview of our aims for this project, this grant that we have: we're going to try to do this in the Emory Transplant Center. A strength of the Emory Transplant Center data is that we have the highest absolute number of patients who are African-American and have received a kidney transplant. This is the population that we're very interested in, and where we've seen substantial disparities. So if we were able to reduce readmissions here, it would have a high impact on disparity reduction even nationally, just by impacting our single population. We're looking at hospitalization in these time periods: thirty days, six months, and one year. And we're pulling both structured and unstructured data from our electronic health record for these patients. Then, for the second aim: say we identify a perfect model. Maybe it's not perfect, but it's a great model, as good as it can be. We want to integrate that dynamic risk prediction model with what we call our transplant data mart, and I'll talk about that in a second. Question? Yes. Just a few slides before, you mentioned... the resettling...
Yes. So how do you... how do you guys... because that's a forty percent... large, so hard... to count, because that's what... Yeah, so I'll get to some of that soon. This isn't our research yet; I haven't gotten to that yet. This is the example paper whose limitations I'm pointing out. However, if I don't answer your question, raise your hand again. I want to get to some of our data, because that is one of the gaps that we're trying to address, and I think that's really important. Another limitation of these models is that the prior research group that looked at this really didn't look at some of the social determinants of health, which we think might explain why their area under the curve is not that strong. Again, they were able to increase it substantially by including post-transplant discharge factors, but we're asking: can we improve it over 0.63 by including not post-transplant discharge factors but some social-behavioral factors? So, just to provide the framework for where we're going (we haven't gotten there quite yet): aim two is, once we build that model, to integrate it into a sort of risk dashboard for clinicians to actually use to aid in their clinical decision-making and guide the use of scarce resources for these patients. The idea would be not just to intervene on everybody and provide lots of resources. We also have to identify who's at low risk, and for those that are at low risk, use less intervention. We don't have unlimited money to throw at this problem, so we have to de-escalate interventions in some ways and escalate interventions for those that need them the most. Our process and plan is to engage a lot of the clinicians, patients, and some of our community partners, and to draw on some of our existing resources, in order to figure out what we should do to support these patients once we identify who's at high risk.
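The escalate and de-escalate idea for the risk dashboard could be sketched roughly as follows. This is only an illustrative sketch, not the project's system: the cut-offs `LOW_RISK_MAX` and `HIGH_RISK_MIN`, the `Patient` class, and the functions `triage` and `build_dashboard` are all hypothetical names and values chosen for the example, and the talk does not state any thresholds. It assumes a model has already produced a predicted probability of readmission for each patient.

```python
from dataclasses import dataclass

# Hypothetical cut-offs for illustration only; the talk gives no thresholds.
LOW_RISK_MAX = 0.10   # at or below this, consider de-escalating intervention
HIGH_RISK_MIN = 0.50  # at or above this, consider escalating intervention


@dataclass
class Patient:
    patient_id: str
    readmission_risk: float  # assumed model output: probability in [0, 1]


def triage(patient: Patient) -> str:
    """Map a predicted readmission probability to an intervention tier."""
    if not 0.0 <= patient.readmission_risk <= 1.0:
        raise ValueError("readmission_risk must be a probability in [0, 1]")
    if patient.readmission_risk >= HIGH_RISK_MIN:
        return "escalate"
    if patient.readmission_risk <= LOW_RISK_MAX:
        return "de-escalate"
    return "standard"


def build_dashboard(patients):
    """Return (id, risk, tier) rows sorted from highest to lowest risk."""
    ranked = sorted(patients, key=lambda p: p.readmission_risk, reverse=True)
    return [(p.patient_id, p.readmission_risk, triage(p)) for p in ranked]


# Made-up patients, not real data.
example_patients = [
    Patient("A", 0.02),
    Patient("B", 0.90),
    Patient("C", 0.30),
]

for patient_id, risk, tier in build_dashboard(example_patients):
    print(f"{patient_id}: {risk:.0%} predicted risk -> {tier}")
```

Sorting by risk puts the patients who most need attention at the top of the list, and the middle tier keeps standard care for everyone who falls between the two cut-offs. In practice the cut-offs would have to be chosen with clinicians and against available resources.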
So I'll take a minute, before I go into more of the specifics, to present our guiding principles for what we're calling this transplant data mart. The idea is that you have lots of different data going into this data mart. This is the electronic medical record data for patients: all the encounters that they're having at Emory, visit encounters, anything at the transplant center level. We have national surveillance data included in there as well, appointment scheduling, pharmacy data, medications. There's a whole lot more data that I don't have listed right here: lab data, administrative and billing data, clinical notes. Really everything that you can get within the electronic medical record is there, so we have a very, very large set of data. The idea is to use that to run a model in our predictive analytics system, output that back into the data mart, and make sure that the clinicians can see, using that data, who's at high risk. We're working through some of the logistics of how exactly to do that, and that will be the second phase. But getting back to the first aim and what we're trying to do, that's what I'm going to focus the rest of the talk on: some of our preliminary data for that. So, some of the data we have. I realize you may not be able to read this from the back, but I'm highlighting some of the important items. To your point, some of the social-behavioral determinants of health are hard to measure, and we have some proxies that we use for them. For example, we think about frequency of moving. If someone changes their address many times over time, that may be a signal that the patient is moving from place to place, apartment to apartment, and that could be a socioeconomic indicator. We have things like missed clinical or lab appointments, so if someone's missed a bunch of appointments in a row, that's also associated with some other
social-behavioral issues. We have psychosocial data, such as whether the patient had depression or dementia. Our social workers take into account whether they had social support: is someone there at the clinic with them? What health behaviors do they have? In evaluating them for surgery and for transplant, they often evaluate drug and alcohol abuse and non-adherence to medications. So we have a very rich source of data. These are in clinical notes, and we'll talk a little bit about that soon. We have structured data and we have unstructured data that we're trying to consider for inclusion as risk predictors, or covariates, within our models. And then, of course, we have a number of clinical characteristics as well. And there are plans to link to neighborhood information, such as American Community Survey or census data, to get things like how far away they live from the transplant center, the poverty level or income level of their neighborhood, education, or any index. So there's a lot of aggregate neighborhood information that we can include as predictors as well. Again, our method for identifying transplant recipients at high risk of hospitalization in that thirty-day, six-month, and one-year post-transplant period is to use a number of different risk prediction algorithms, using machine learning and deep learning techniques with both our structured and unstructured data, so natural language processing. This is where we had an obvious reason to collaborate with the Sun Lab and use the strengths of Georgia Tech computing, given that we have so many data sources, and we're working to develop the most appropriate deep learning and artificial intelligence techniques for predictive models of hospitalization in this patient population. And again, this is what we're trying to go for, and this may be
hard to read, but you'd want to see something like this, where red, orange, and green might tell you: OK, on this day, among the patients that you're seeing or that you're about to discharge from the hospital, these top three have a ninety percent chance of being back in the hospital within the next thirty days. Don't let them go; let's intervene on them. Whereas someone else might have a two percent chance, and so you say, let's let this person go home a day early. So, the overview of some of the data sources we're using. In kidney disease we are lucky that we have access to a national surveillance database. For anyone, from the time that they have a diagnosis of end-stage renal disease, they're covered by Medicare, and the Medicare program documents their encounters within the medical system. So we have claims data, and we have information about their demographic and clinical characteristics. That data is linked to United Network for Organ Sharing data, which has information on when a patient is waitlisted and when a patient is transplanted. We also have our local database, as I mentioned, this Emory Transplant Center data mart, and this local database has about three...
Say about after some exclusion criteria we've got about a little bit over two thousand patients that we're working with and again we have structured data and unstructured data so with the local data we are planning to link both the local and national data together and we have you know so we're working with various parts of these data then we're working with some of the local structured data in the local unstructured data right now we have fifteen different structured data files that we're working with and these range from things like labs with millions of observations I think about from we pulled from any encounter that they've had with the health care system we have all of their labs we have information on the medications that they have any encounter and then all the clinical notes we're starting to pull So if some of unstructured data we don't have right now we're only looking at social worker notes and selection committee nodes there are lots and lots more notes but again with two thousand patients we've got a very sort of wide database not necessarily a long database and so we're working through all of these now and cleaning a lot of the data and spending a lot of time thinking about you know where we want to go where we get the most bang for buck for these data. Again with the idea of predicting readmission with this information so we talk a little bit about the local structured data first so this is where you know these are the patient information we have data on their pre-transplant details so before they came into the transplant appointments that they've had including missed appointments readmission information that will include for the outcome variable biopsies that stay there for most or all of it in the post transplant world H.L.A. 
histocompatibility matching factors, and I won't get into all the details; social and behavioral things like tobacco and alcohol use; emergency department encounters, both pre- and post-transplant. The patient's details may change over time, so their insurance status may change, for example, where they're living may change, there are a number of different things that may change. Other health status values may change: diagnoses, medications, vitals, and so on. So again, lots of data for us to examine.
And this is just showing, among the just over two thousand Emory transplant recipients, each of these models. The first column here is just looking at the demographic data, so that's one file. The second would be demographic data plus diagnoses, so if they had diagnoses for hypertension, for example, or diabetes, we would be capturing those. This third one is looking at demographics, diagnoses, and medications, and the last one is adding in the lab values. Each of these colors represents readmission for one of the time periods: thirty days is the red, green is sixty days, the light blue here is ninety days, and the purple is one year. What this is showing is the area under the curve. As you can see, the model with just the demographics doesn't do great, not that much better than chance alone at predicting readmissions, no matter what time period you're looking at. And this is using logistic regression, basically just logistic regression to start; we're comparing a number of different modeling procedures here. Yes, question?
(Audience, partly inaudible: ... you would imagine ... more and more factors ...)
That's a good question. I'm not sure if we did see an increase in the false positive rate with that. I don't know that we've looked at that, necessarily, but I think it's a good suggestion, a good thing to look at. Right, right, yeah, it's a good point.
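As a rough illustration of the two quantities discussed above, the area under the curve used to compare the feature sets and the false positive rate raised in the audience question, here is a minimal sketch in plain Python. The labels and scores are invented toy values, not results from the study, and the names `demographics_only` and `with_labs` are hypothetical stand-ins for two of the model columns on the slide.

```python
def auc(labels, scores):
    """Area under the ROC curve (c-statistic) by pairwise comparison.

    Equals the probability that a randomly chosen readmitted patient
    gets a higher score than a randomly chosen non-readmitted patient,
    counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def false_positive_rate(labels, scores, threshold):
    """Share of non-readmitted patients flagged at or above threshold."""
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not neg:
        raise ValueError("need at least one negative")
    return sum(s >= threshold for s in neg) / len(neg)


# Toy data (invented): 1 = readmitted within the window, 0 = not.
labels = [1, 0, 1, 0, 0, 1]
demographics_only = [0.50, 0.50, 0.55, 0.45, 0.50, 0.45]
with_labs = [0.80, 0.30, 0.70, 0.40, 0.75, 0.60]

print(auc(labels, demographics_only))
print(auc(labels, with_labs))
print(false_positive_rate(labels, with_labs, 0.5))
```

Reporting the false positive rate next to the AUC for each feature set would be one way to follow up on the question: a richer model can raise the AUC while still flagging more patients who never come back at a given threshold.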
But what we're finding, and I think this is not unexpected given the data that we're looking at and what I said about the causes of readmission varying over time, is that we seem to be doing a little bit better of a job predicting some of the long-term readmissions. Again, this is just a preview for just a few of the files of the structured data, and again using logistic regression just to start. We've got a long way to go to look at all the data sources we have, but just using these data files, I think we're seeing a signal that at least adding more data has helped us to predict.
And using random forest on the structured local data, I think the performance was rather similar for this group as well. So we're comparing both logistic regression and random forest, and it may be that one of these was updated and one of these wasn't; we just had some updates this afternoon. But we're trying to compare what's the best approach for this. As we continue to work through some of this data, we're hoping to add in more sociodemographic or socio-behavioral factors, but a lot of this is coming from the unstructured data, and I'm going into that in a moment.
This is data from last week, so some of this has changed slightly, I think, in some of the new models that we've run, but one of the things that we looked at was
comparing the overlapping models between the logistic regression and the random forest: what are the features that come up as the most influential? This is what the clinicians really care about. They're sort of like, great, you have this great area under the curve, but tell us what it was that drove that c-statistic up higher. They want to make sure it makes sense, they want to make sure that there's not a mistake in the data. Sometimes there may be things that are predictive that you wouldn't think of, and it's good for these models to bring those out, but clinicians really want to know why that is, what's the explanation for that. So we saw a lot of consistency between the models. In the random forest, African-American race came out as an important predictor, but it didn't in the logistic regression model. And there were a lot of things that we expected: certain medications that they give when things are not looking good in surgery are associated with worse outcomes in the thirty-day readmission; things like diabetes, uncontrolled diabetes, were important; a code for a surgical operation causing an abnormal patient reaction. A lot of things that we were expecting came out.
And then the unstructured data. Again, we were really just focusing on a couple of different notes at first: some pre-transplant social worker notes and a selection conference note, and I'll explain that in a moment. This is work that I think Sarah did here at Georgia Tech. We started with three thousand four hundred twenty social worker notes, and this is among that population of about two thousand people.
That's for those readmitted within thirty days of transplant, with some eight thousand for those without, for a total of eleven thousand seven hundred fifty-two notes. So this is a lot of data for a small number of patients. The average number of notes per patient was six, and the average note length was about eighty-seven words. Then we focused again on the social worker notes that were pre-transplant, so we're looking at a sample size of about twelve hundred notes and four hundred twenty-eight patients. So we're missing some social worker notes, and we're going to go back. This is one of the struggles that we've found working within the health care system: it's not easy to pull this data. Someone's got to pull it, and sometimes practical things change over time. The forms in which the social workers documented this information have changed over time, so how do you account for that in your models? How do you take into account that the notes have changed, when you don't want to cut out half your population and lose them? They all have a social worker note somewhere, or a clinical note somewhere. So these are some of the methods and challenges that are ongoing, that we're working through: how do we incorporate some of this important information for all the patients?
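One simple way to turn free-text notes like these into model inputs is a bag-of-words count, optionally extended to short phrases of two or three words. The sketch below is a minimal, standard-library illustration of that idea and not the team's actual pipeline; the function names, the `min_notes` filter, and the two example notes are all invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a note and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(text, max_n=1):
    """Count every phrase of 1..max_n consecutive words in one note."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def build_matrix(notes, max_n=1, min_notes=1):
    """Return (vocabulary, rows), with one count vector per note.

    Phrases appearing in fewer than min_notes notes are dropped.
    """
    per_note = [ngram_counts(note, max_n) for note in notes]
    doc_freq = Counter()
    for counts in per_note:
        doc_freq.update(counts.keys())
    vocab = sorted(t for t, df in doc_freq.items() if df >= min_notes)
    rows = [[counts[t] for t in vocab] for counts in per_note]
    return vocab, rows


# Invented example notes, not real patient text.
notes = [
    "Patient has limited social support at home.",
    "Strong social support; interpreter needed for language.",
]
vocab, rows = build_matrix(notes, max_n=2, min_notes=2)
print(vocab)
print(rows)
```

Because the representation only counts words and short phrases, it is relatively indifferent to how a form was laid out, which may help when note templates change over time, though it will not fix differences in what each template asked the social worker to record.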
The other note we're looking at is called the selection committee note. Remember when I talked about the process for getting a transplant: the patients have to go through this transplant evaluation, they undergo a number of medical tests and a psychosocial evaluation. Well, after they complete all those tests, they're presented to a selection committee. It's a multidisciplinary group, so there's surgeons there, there's nurses, there's the social workers, there's maybe coordinators, a psychologist. They're all there trying to say, is this patient fit for surgery, are they a good candidate for this transplant? And they write a note about that patient. So there are things like, OK, the patient's EKG results were normal, but the social worker might raise, well, we were a little bit worried about social support; they don't seem to have someone that might be able to drive them home from the hospital and make sure that they take their medications in that critical period right after they get the surgery. So those may be reasons that they say we're not going to transplant the patient, or it may be that
they still go ahead and proceed with the transplant. But it's a rich source of data for us to potentially use later on, after they get the transplant. Sometimes, though, this selection conference note happens several years before they actually get the transplant, so there is still some limitation to having that earlier on in the process. So, any questions about those notes?
Let me give you a little bit of an example. This may be a little hard for you to read in the back, too, but it might say: so-and-so-year-old person with chronic kidney disease stage four or five is now being evaluated for a kidney transplant. And they say things like: this patient would benefit from a transplant; will continue to follow up with their primary nephrologist; polycystic kidney disease; patient's native kidneys are enlarged; patient symptomatic from the pain and bleeding; patient may need to be evaluated by urology; hypertension, currently stable on medications; patient reportedly had an MRI of the brain done; I've discussed with the patient and the patient's (all these things have been removed to protect the patient's name, but it could be husband, wife, friend, whoever is there) the risk of infection; I've discussed wait times; at this time the patient appears to be a reasonable candidate for renal transplant.
So these are things that are potentially important. This is also where, in this note, we might see some flags, like, this patient is a pretty good candidate, but we do have some remaining concerns about X, Y, Z, that we might be able to use in our predictive model. So this is the selection conference note, that is, the SCN,
the selection conference note. We're just using a subset; again, not everybody had this selection conference note, and we're still working on pulling some of these notes, so these are preliminary data, of course. We've looked at some simple characteristics to see how similar this patient population is, this subset of those with the notes, the three hundred eight patients, compared to the six hundred ninety-one that we could have pulled from, since we're just using a smaller sample from two thousand thirteen to two thousand fifteen. And they are fairly similar.
We're looking at a few different models here. One is a model with the structured data alone, and I showed some of that before. So say we had those four files that we're working with right now; we've got thirty-three variables. In actuality we have many more variables to choose from, and we're working through the data cleaning on a lot of those variables now. So we are running the predictive analytics on the structured data alone, and then on the selection committee notes alone, and the social worker notes alone. We're using bag of words, which is simple, just for now; it's our first pass at natural language processing. In the future we want to try to grab not just one word. We're trying to look at different methods for phrases, so having two to three words in a phrase, because there may be something the clinician is saying that's really important that we're not picking up if we're just selecting one word. But for now it's bag of words for our natural language processing method, then principal component analysis, and comparing our area under the curve for
different models. Again, we've got one hundred fifty variables that we're pulling, or words, from the selection conference notes, and one hundred fifty-seven from the social worker notes. In this smaller subset of the population, we have at least some preliminary data that's giving us some hope about adding what we think captures a lot more of the social and behavioral factors, the social determinants of health, in addition to the clinical factors that we're going to capture in the structured data. Again, more to come. If you look at the structured data alone, that's the blue dotted line, somewhere here around point six eight. If you looked at just the social worker notes, that's the red line here. If you looked at the selection committee notes alone, that's the gray line. But if you put in all of them together, you get a lot more information, so you've got our area under the curve at point one. So we're working on trying to expand this to the rest of the cohort, and then again having more of the data for the structured data as well as the unstructured data.
And this is showing, within the selection committee notes and within the social worker notes, some examples of the words that are actually coming up here, that we were flagging. For some of these we say, yeah, I could see how that is, and for some it's like, I don't know if this is meaningful or not. So one might be signifying cancer, so that's super important, and that comes up as something that is in the top few words by coefficient for the selection committee note; pathology, chronic. But something like transplant, that probably pops up everywhere. So we are working on refining what might be important here, but these are some of the variables
that are coming up, the words that are coming up within those notes. Within the social worker note, there are things like language. We have a lot of patients, and we've known that patients for whom English isn't their primary language have less access to transplant and poorer outcomes after transplant. About six percent of our patient population come to us with English not being their first language, so that's an important component. So maybe a good intervention for those patients down the line could be ensuring that they have an extra person there at every appointment, an interpreter, to help explain to them what needs to be done in terms of taking their medications, for example. Things like affordable, Medicaid, psychiatry are all words that also came up. So, any questions about that?
OK, I'm going to talk just briefly about the national data work that we're doing as well. As I mentioned, this has a twofold purpose. One, we're hoping that we can incorporate some of the national data into our local data and include some of those predictors with our local data. And two, we're hoping that some of this would be generalizable to other transplant centers. If we can only do this at one center, this isn't going to help that many people. So we want to think about what some of the variables in the national data are that might be predictive, and fine-tune the predictors that we're coming up with in our local data that we can hopefully generalize. One of the limitations, I'll say, of the national surveillance data is we don't have clinical notes on patients, and it's not as robust a data source in that we don't have
data at every time point. There are certain time points where we have data on the patients, but not every time point. The advantage, of course, is that it's nationally generalizable, and there are a lot of files, there's a number of variables, and a lot of them are unexplored.
So I'll just go through briefly some of the main features that we're finding in the random forest. These are not unexpected; I think these are fairly consistent with some of the literature. But some of these do include some important social and behavioral determinants of health, related to insurance, Medicare and Medicaid, whether the patient was employed previously, whether they're working for income, as well as a number of comorbidities. This "S.W. dial" is first-week dialysis: they needed dialysis within the first week of the transplant. That's a clinical indicator that's typically associated with poor outcomes. It means their kidney is not functioning quite yet; it can function at some point, but not quite yet. And that's donor age, so the donor could be living or deceased, and the age of the donor is also associated with potentially worse outcomes, and that's been shown in the literature as well. If you have an eighty-year-old kidney versus a twenty-five-year-old kidney, you really want the twenty-five-year-old kidney. The older age can also lead to complications in the surgery, or there may be delays associated with getting that kidney to actually work in the recipient. But a lot of these, again, are socioeconomic characteristics as well. We don't have as good data on some of the social determinants of health, I would say, in the national surveillance data; that would be another limitation. But we have plans to link this to the poverty data in the American Community Survey, or census data.
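The planned link to area-level poverty data could look something like the sketch below. The talk does not say which geographic key would be used, so joining on a ZIP code field is purely an assumption here, and the field names, records, and poverty rates are all invented for illustration.

```python
def link_poverty(patients, poverty_by_area, key="zip_code"):
    """Left-join area-level poverty onto patient records.

    Patients whose area is missing from the lookup are kept, with
    poverty_rate set to None, so no one is silently dropped.
    """
    linked = []
    for patient in patients:
        row = dict(patient)  # copy so the input is not modified
        row["poverty_rate"] = poverty_by_area.get(patient.get(key))
        linked.append(row)
    return linked


# Invented records and rates, for illustration only.
patients = [
    {"patient_id": 1, "zip_code": "00001", "donor_age": 25},
    {"patient_id": 2, "zip_code": "00002", "donor_age": 80},
    {"patient_id": 3, "zip_code": None, "donor_age": 47},
]
poverty_by_area = {"00001": 0.12, "00002": 0.31}

for row in link_poverty(patients, poverty_by_area):
    print(row["patient_id"], row["poverty_rate"])
```

Keeping unmatched patients with an explicit missing value, instead of dropping them, makes it easy to see how much of the cohort the area-level measure actually covers.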
And this is the difference between thirty-day, ninety-day, and three-hundred-sixty-five-day readmission, and you're going to see that the factors do change, and that's important. This is, I think, very helpful for us to learn before we go into the second stage of this research, the aim where we get to that point. Potentially the causes are different at different times, and so the interventions that we perform are different, and these models may need to be generated at different times. So we have to be adaptable in how those are used within the whole system, and in how clinicians are made aware of what the risks are, and the changing risk for patients depending on how far out they get. For those that are at three hundred sixty-five days, some of the socioeconomic characteristics are potentially more influential: things related to their health insurance. I wouldn't worry about all the details of this, but the definitions are here, over on the right: Medicare enrollment status, Medicare enrollment reason, things that are related to their health insurance. Any questions about those?
OK. And again, this is sort of getting at that. This is a nice way to plot it, and I can't remember who did this from the Sun Lab, but it was great work, maybe Sand.
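Since a separate model is fit for each time window, each patient needs one outcome label per horizon. A minimal sketch of that labeling step follows, assuming the clock starts at the discharge date after transplant (the talk does not specify whether it starts at transplant or at discharge) and using invented dates.

```python
from datetime import date

HORIZONS = (30, 90, 365)


def readmission_labels(discharge, readmissions, horizons=HORIZONS):
    """Return {horizon_days: 0 or 1} for one transplant recipient.

    A label is 1 if any readmission falls after discharge and within
    that many days of it.
    """
    days = [(r - discharge).days for r in readmissions]
    return {
        h: int(any(0 < d <= h for d in days))
        for h in horizons
    }


# Invented dates, for illustration only.
labels = readmission_labels(
    discharge=date(2014, 3, 1),
    readmissions=[date(2014, 5, 10), date(2015, 6, 1)],
)
print(labels)
```

The labels are nested by construction: anyone positive at thirty days is also positive at ninety and three hundred sixty-five, which is worth keeping in mind when comparing which features matter at each horizon.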
But looking at feature importances and how they vary over time: a lot of the clinicians love visuals. I mean, they just love anything like that. It's like hot spotting, if you've heard of hot spotting. Everybody really wants to see who's in the red and what's going on, and we've got to find those people. So, looking at the changes over time: are things getting better or worse, and what features are most important at one time versus another? For example, it's easier to see from this graph that maybe Medicare patient status is not that important in the first thirty days, or maybe ninety days, but it gets to be more important as you go further on in the post-transplant period. So again, we have plans to link the local data with the national data and get a little bit more information from some other data sources, and I think that will, once we have additional data sources, help to improve some of the predictive models and the predictive accuracy for some of those. We are at the very beginning stages of this, and I appreciate the opportunity to present some of this work. It's, I think, really exciting. The amount of data cleaning that we've had to do has been extensive.
So I want to talk about future directions briefly and give you guys a couple of minutes to ask questions of myself and potentially others in the room who have done some of these analyses and worked on this.
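The kind of comparison behind a plot of feature importances over time can be sketched by ranking each feature within every horizon and lining the ranks up. The importance scores below are invented, and the feature names are hypothetical labels loosely based on the variables mentioned in the talk, not actual column names from the national data.

```python
def rank_features(importances):
    """Map each feature to its rank (1 = most important)."""
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def rank_table(by_horizon):
    """Return {feature: {horizon: rank}} across all horizons."""
    table = {}
    for horizon, importances in by_horizon.items():
        for name, rank in rank_features(importances).items():
            table.setdefault(name, {})[horizon] = rank
    return table


# Invented importance scores, for illustration only.
by_horizon = {
    30: {
        "first_week_dialysis": 0.40,
        "donor_age": 0.35,
        "medicare_status": 0.25,
    },
    365: {
        "first_week_dialysis": 0.30,
        "donor_age": 0.25,
        "medicare_status": 0.45,
    },
}
table = rank_table(by_horizon)
for name in sorted(table):
    print(name, table[name])
```

Ranks are used here instead of raw scores because importance values from separately fitted models are not necessarily on a comparable scale.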
So some of these future directions are methodological. We want to improve some of the features. We were going through and saying, you know, the first thought was, OK, maybe we shouldn't clean all this data first; we should just give the Georgia Tech group all of our data and say, just go. And then what we found was that we were getting a bunch of stuff that the clinicians were saying, no, this doesn't make any sense at all. So you do want to think about, for every variable, does it make sense? There's a lot of derived variables that we're finding that the clinicians really want to be involved in, and that might be really important. They're looking at the literature and saying, we know that this is really important. So derived variables being, maybe, a particular categorization of a continuous variable, where they know that once you hit a certain cut point of blood pressure or creatinine or something, that's really important. So maybe categorizing them, or looking at a combination of variables. For example, we have a risk prediction score that's been put together previously in a different paper on donor quality. So maybe looking at four different features of the donor and coming up with a derived variable for donor quality, and putting that in the model, is also helpful. So we're at the stage where we're going through and trying to improve the features that are in the model.
Again, we want to add features from the structured local data. Remember, I just talked about four of these files, and we have fifteen files to potentially work through. So we're going through a lot of data cleaning, and a lot of this is, you know, we have lab values, so many values, and medications, so many medications you couldn't imagine. So how do we categorize these? Do we need to have a focused approach? How do we group them? There's a lot of thought that
we want to put into that. A lot of this is working back and forth with clinicians, sitting down and saying, what's your sense of the most important thing for us to look at at this time, and going back to the data and saying, OK, how can we focus this, and now we'll focus on this data set and incorporate that into the model.
And again, the unstructured local data, the text, and updating the natural language processing analyses. I think we are really just at the tip of the iceberg for all the things we can do with the data we have. We have so many more notes to go after. We started with social worker notes and selection conference notes, but we haven't even looked at things like, during the perioperative period, the description of the actual transplant surgery. So you can imagine they say, like, I nicked the ureter, and that might be in the note, and that would certainly be related to complications post-transplant, and that might influence post-transplant hospitalization.
And of course, longer term, once we can identify some of these models, can we put them into a real-time system and create a dashboard? We are working with a course this semester, I forget what the course is called, but there's a group, what is it, the health informatics course, that we're working with to develop
a potential solution to integrate this model, once we can identify the final model, which I'm sure we're quite a little ways off from. But how can we put in the technology to ensure that we get that in real time, that we have updated information? Because we need the information up until the point that they're discharged, so we need this in near real time to be able to update those models before the patient is discharged from the hospital.
And then long term, working with some of the clinicians and partners to develop and test interventions. So that's what the next R01 grant might be: once you figure this out, how do we then say, OK, we've got this great infrastructure, how do we test an intervention? And thinking about how we take that model and generalize it. We've got some plans; we've got partners at other transplant centers who have similar data marts, who have written us letters of support for our grant saying, if you want to try this out at our center, you can. But it's good for us to start in one place where we have so much data, first to evaluate what are the most important predictors, and then go to other transplant centers in a more strategic manner, rather than saying, let's look at everything.
So with that, I'd just like to acknowledge my team, the Transplant Outcomes Research Group. There's a lot of folks who've worked really hard on some of this data, you know, all this data cleaning, and of course the Sun Lab. We really appreciate all the help. So thank you. Yes?
(Audience, mostly inaudible: ... why is that?)
Yeah, so I mean, you go back to looking at some of the features over time. This is with the national data, I guess, but you can look at it for both national and local data. There's a lot of research already to suggest that those causes vary by time. I think the aspect that we don't know is how some of the additional features that we're gathering behave, and whether those
have changed over time and how they interact. I think that's where the interesting science is, and where some of the clinicians are really interested. We have one clinician that's hoping to write a paper on that aspect alone: what is it that differs between the thirty-day versus ninety-day versus three-hundred-sixty-five-day, and why is that? This isn't the study design you would do to answer that question, but because we have the data, in order to answer it you can very simply design a study where you're asking the why, the causal question. So you set up a causal model looking at that. And I think there's a lot of different areas, different features or variables that we can select to ask, is this causal? But how you design that is focusing on one feature at a time and asking, is there a causal relationship between these different points? Other questions?
(Audience) So, quick question: what's the kind of standard of care now for doctors and clinicians to target patients? I assume that they have complications from the surgery; is there anything else that they do before they discharge them?
Yes. Basically, I'd say that we have a standard of care, and everybody gets the same thing. So they're educated; a nurse educator comes and talks to them about the medications that they need to take. Certainly if they had any kind of complication, what they would do is probably have them stay in the hospital a day longer, but they don't actually do anything in particular for that patient, like, OK, let's now assign them for X, Y, Z.
We have something called the watch list that's informal. One of the clinicians has really championed this, saying we need a watch list of who we think is not going to do well. It's not data-based, but it's sort of, OK, this complication seems to be happening, I'm going to put them on the watch list and have the coordinator make sure to follow up with them next week. So it's a combination of the standard of care of educating the patients before they're discharged and then putting them on a watch list. And the limitation is it's informal. I think it's helpful, but it's informal, and I think you miss things, and there's variability from provider to provider, so some people may say, this is the most important thing, and some don't. One of the studies we've thought of that we really want to do is, can we compare how the risk prediction models do with the physician gestalt of, you know, I think this is a patient that's going to come back, and see how well we do in terms of data versus the physician approach.
And I think one of the limitations of the watch list approach is it seems like they just keep adding more and more patients to the watch list and they're not really taking any patients away, so the burden on the physicians is getting greater and greater. What I like about the predictive model approach is we're thinking not only about those with rising risk but also those who are falling in risk. Can you monitor and predict people who are going to do super well, so that you have, like, the don't-watch list? And that would be saving resources and time for the clinicians. It's not something that they currently do.
Yeah. Well, thank you so much, I appreciate it. Thanks for your attention. Thanks.
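The comparison raised in the closing discussion, model predictions versus the informal watch list, could be sketched roughly as follows. This is only an illustration of one possible evaluation, not a design the speaker described in detail: the patient ids, risk scores, watch list, outcomes, and the 0.05 cutoff for a "don't-watch" list are all invented.

```python
def recall_and_precision(flagged, readmitted):
    """Share of readmitted patients caught, and share of flags that hit."""
    flagged, readmitted = set(flagged), set(readmitted)
    hits = len(flagged & readmitted)
    recall = hits / len(readmitted) if readmitted else 0.0
    precision = hits / len(flagged) if flagged else 0.0
    return recall, precision


def top_k(risk, k):
    """The k patient ids with the highest predicted risk."""
    return sorted(risk, key=risk.get, reverse=True)[:k]


def dont_watch(risk, threshold):
    """Patient ids whose predicted risk is below threshold."""
    return sorted(p for p, r in risk.items() if r < threshold)


# Invented data, for illustration only.
risk = {"A": 0.90, "B": 0.60, "C": 0.35, "D": 0.10, "E": 0.02}
watch_list = ["A", "C", "D"]
readmitted = ["A", "B"]

model_list = top_k(risk, len(watch_list))  # same size as the watch list
print(recall_and_precision(watch_list, readmitted))
print(recall_and_precision(model_list, readmitted))
print(dont_watch(risk, 0.05))
```

Holding the model's list to the same length as the clinicians' watch list keeps the follow-up workload equal, so any difference in how many readmissions each list catches is not just a result of flagging more patients.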