[00:00:05] >> Hi everyone, welcome to the IDEaS seminar series. Today I'm really happy to welcome Professor Bin Yu from UC Berkeley, who will be giving our IDEaS-TRIAD distinguished lecture. Bin is a Chancellor's Distinguished Professor and the Class of 1936 Second Chair in the Departments of Statistics and EECS at Berkeley. She is in fact a former chair of Statistics at Berkeley. Her research focuses on the practice, algorithms, and theory of statistical machine learning and causal inference, and her group has been engaged in interdisciplinary research with a wide range of scientists from genomics, neuroscience, and precision medicine. She has received a broad range of accolades through her career. She's a member of the US National Academy of Sciences and the American Academy of Arts and Sciences; she was a Guggenheim Fellow and the Tukey Memorial Lecturer of the Bernoulli Society; she was also the President of the IMS, the Institute of Mathematical Statistics; and there is a whole other list of awards which I don't think I have the time to read through. She was also the founding co-director of the Microsoft Research Asia lab at Peking University and a member of the Scientific Advisory Board of the Alan Turing Institute. So again, I am really pleased to welcome Bin for this distinguished lecture. Thank you, Bin. [00:01:25] >> Thank you, thank you for the very kind introduction. What I want to share is a project we have been doing since March. This is really kind of epidemiology, probably not what we usually do, but I will share what we have done. Thank you for having me. [00:01:41] So, we only started in late March.
I responded to a call for data science expertise from a newly founded nonprofit organization, Response4Life. To remind you guys, at that time the media was showing all these people and the urgency, so we all wanted to make an impact, any possible impact, and that was a great opportunity to walk my talk. About 12 members of my group jumped in with me. Response4Life was one week old when we joined, and we joined to really help [00:02:21] them distribute the PPE. We hopped on a call and afterwards asked for the data, and they said, we have no data, so you guys have to find the data. So this is the team that came in with me. Nick and Shell were the team leads; Nick and Shell actually led our modeling team. Shannon, at the lower right-hand corner, was my deputy and really did a fantastic job with me organizing the team. Tiffany and [inaudible] led the data team, which was curating the data, and other people did a lot of other things; everyone was really helpful. [00:03:05] We did the paper revisions and all that, so this really is teamwork. In the summer we continued with new members and interns, and it turned into a very huge team. For the first time I had about 20 people; I had never worked with or managed so many people, so that was a challenge for me as well. One point I want to make is that [00:03:31] really, without [inaudible], this would not have been possible.
Without 12 people on the team for 2 months, finishing a paper in that time would never have happened in my group; it usually takes us 2 or 3 years to finish a paper. The increase in team members really sped up the process, and it also put me in a different regime of collaboration with other people. So really this is teamwork, with a taste of everything: for 2 months we only worked on that. It was a very interesting project; I found it was like a wartime project. I have a good friend who is an emergency room doctor, and he says emergency medicine is the most interesting 15 minutes of every other specialty. I feel like our project was the most interesting few hours of every skill. Let me just share with you what skills we needed. I feel like we over-train data scientists on the technical side right now, to the point that we need to acquire other skills to really make an impact in the domain. [00:04:44] Traditionally we just do the technical part, and for this project that is just not enough. First, we had zero data to start with, and we started Googling and asking friends, because we didn't use to do epidemiology. We reached out to people to understand what could be driving the death counts. Pretty quickly we decided that we were going to look at death counts at the county level. We wanted to go to the hospital level, because that's where the PPE would be shipped, but we don't have hospital-level data. We have basic information and demographics, but we don't have death counts at hospitals, and we don't have hospitalization data either, except for some hospitalization data from UC hospitals. Everything was just happening so fast, and a lot of it went through my personal connections.
[00:05:35] Through connections I talked to medical equipment marketing people to see how the medical supply logistics work. In the beginning we were after ventilators, and later I actually pushed to support PPE. We tried to connect with FEMA, without much success. Every day at 8:30 there was a call with the leadership team, for more than 2 months, to try to get data. Because this was a fresh organization, sometimes people replied and sometimes they didn't. We even got high schoolers to call San Francisco hospitals to ask, if we have PPE, would you trust us enough to receive it? That route was not a success, because they just didn't know who you were. I talked to a couple of ER doctors to see what was going on on the ground at UCSF. I had team members working with the logistics people, who worked with the PPE supply under our severity index. We built the website because we want transparency and reproducibility, and we want to share the data. We think that if we curate the data, even if we don't do a lot with them, other people can use them; otherwise everyone goes to the same data sources and ends up with different versions. So we built a site to share that. And because we want to share algorithms and data, we have to document them, and that's why we were writing a paper. Xiao-Li Meng, who is the editor of Harvard Data Science Review, reached out to me and said he wanted to interview me, and later invited a paper; in the beginning we were not thinking about writing one. [00:07:25] And last, there is what we usually do, right: we have to do the prediction with the data we have. I don't think it was less important, but we couldn't do what we usually do without all the other things I mentioned. So it is very much all of these skills.
I think that is all part of everything, and I think it's worth it. We would like our next generation of data scientists to be able to work with people, so that you can have the data, you can have the domain knowledge, you can have the other expertise, and we get to do the modeling part in a more reproducible, transparent, and responsible way. So the first job was to find data. We scraped; you can see the list of data sources, and thanks to [inaudible], who shared some data that we couldn't find in the open sources, it is on that site. What we got is all county level. We were among the first, if not the first, to make county-level predictions of death counts. Here is the variety of hospital-level and county-level data we curated: about 7,000 hospitals in the US, with about 200 features, such as the [inaudible] identifier, the address, whether the location is urban or rural, how many beds, how many ICU beds, the ICU occupancy rate, the number of employees, and the hospital rating, because we really wanted to be able to predict death counts in the end at the hospital level. Even if our own county-level predictions had not panned out, other people can use the data. We have death counts from New York Times and USAFacts, and we also scraped [00:09:15] open-source websites for demographics: population, population density, age structure, health risk factors, diseases, and so-called socioeconomic risk factors. [inaudible] Until just recently; I think Facebook and another company reached out, they have some new data they want us to have. I think they recognized that a lot of people go to our site and our GitHub data repository. And we also have some [inaudible] or temperature data.
[00:09:46] And some [inaudible]. There is a lot here: I read newspapers, team members read newspapers, and the data team scraped and did the data curation. So this talk is not just about the prediction method; it's also about this data repository, which we hope you will help us advertise and find useful. On the data quality issues: the case number just completely depends on how many people get tested. [00:10:12] And the death count has problems too. After April 14th, counts started including probable deaths. Both USAFacts and New York Times get the counts from the same sources at the county level, but they don't always agree. We have about 3,000 counties, and we didn't have the human resources to really verify them ourselves, to get a sense of which county is very reliable. There might be delays for them to put things on a website, so a lot of things could happen. We're using this data repository for the final project in my class this semester, and I asked each team to call one county to find out about the quality, so that's just a really small subset of counties we can get. If you look at the two sources, what you have on the left is this: for each day we have a difference, we have 80 days, and we have a series for each county. You can see that there are a few big numbers, but mostly the sources agree. The vertical scale on the left is too large, so if you chop the big ones off, on the right is a zoomed-in version of the left, and you can see the differences.
[00:11:31] They go from 100 to 200, but mostly they are between minus 100 and plus 100. There was actually one correction, which [inaudible]. And the weekday is also different from the weekend; we take that into account a little bit in our predictors. The dashed line is the average of total death counts, and you see that somehow there's a peak, because there is a delay of reporting at different levels, from the hospital to the public health department in the county; I think that's the weekend effect. Let's look at the revisions. On the left is the number of revisions, in total [inaudible], and for each revision there is the delay time. You can do a histogram, or, as here, just [00:12:32] a CDF of that. If you look at the revisions and how long people waited to make the revisions, you can see it is mostly short, like 20 days at most. In terms of how much, the size of a revision is a signed magnitude, and the stripes are really because when one thing gets revised, all the future counts get revised too. There's one down there at minus 1,000, and we looked into it: it's King County in Washington State. The actual number should have been like 530-something, but it was entered as 1,500-something; somebody probably made a human error and typed an extra one, and that created the revision of about minus 1,000. [00:13:25] We do have such incidents. If such a mistake gets into your algorithm, you will have a huge peak, and it will really screw up the algorithm a bit. At least we had humans there to check on the data at that time, not a model. So here's the pipeline we developed. We curated the hospital-level and county-level data, and we started our models. First we aimed at a 7-day horizon, because that was enough for PPE shipment for the purposes of Response4Life.
[00:14:04] The PPE was made by private makers around the country, and we worked with [inaudible], which is a lot of pilots who ship the PPE for free. Seven days was enough to give them a prediction, so they could make and ship in 7 days. We inferred the hospital-level demand by predicting the death count at the county level and then using the number of employees to apportion the predicted deaths to the different hospitals, making the assumption that the size of the hospital in terms of number of employees really reflects the demand; this is kind of true. We also did a lot of scraping and reading newspapers to find hot spots, and through Response4Life we had a lot of volunteers, and we ourselves were reading too, to validate. It's really a pretty expensive operation, and you have a very short period of time. [00:15:06] I have described the data; we're now at the stage of county-level predictors. We built a lot of quick predictors and used something I had worked on years ago to combine different predictors, so that the predictors that worked well recently get more weight. So we don't have to switch; we just do soft switching, and the different predictors capture different regimes of the growth of the death counts [00:15:34] in the United States. The weights use the past 5 days' prediction errors to come up with the combined prediction. That worked pretty well. And then we have a website called covidseverity.com, which [inaudible] built to visualize. Today it has become an automatic AI system that curates the data automatically, does the prediction, and visualizes it. We update every day with a one-day delay, and our code and
[00:16:09] algorithms and data are accessible from that site, and are also going to appear in Harvard Data Science Review. OK, so this is the second part. I had a data team, right, that did that work; I also had a modeling team, and a lot of other people doing other things. [00:16:28] We decided that the best way to identify hot spots for the hospital level is really to go to the lowest level possible where we have data, which is the county level, and to use death counts. As I mentioned, death counts have problems, but they are more reliable than the case numbers. We're facing many curves, and it's very dynamic data, right: it depends on policy changes, human behavior changes, and testing issues. There's no stationarity or anything; it's just [00:17:06] quickly changing. Long-term predictions have to deal with feedback, so there's just no way; I haven't seen any model that really panned out quantitatively in anything we have seen now, seven months into the pandemic. We were also ambitious. We wanted to predict for 7,000 counties, no, this is wrong, it should be 3,000 counties; 7,000 is the number of hospitals in this country. It should be 3,000 in the continental US, because Response4Life positioned itself as a national organization and didn't want to favor California or Colorado, where it has [inaudible]. So we couldn't possibly do very detailed local analysis, even though we would have to in order to do it really well. But the blessing is that every day we have a new data point, 3,000 actually, because every county will have a new death count, and we can validate against that and do a reality check, and we have to be honest. As I mentioned, for PPE
supplies, short-term prediction is adequate. We now can do 14 days and that's OK, but what we know best is the 7-day prediction. As the data changed we designed many different predictors. In such a short period of time we didn't have a lot of epidemiology knowledge; exponential growth, right, that's what it is, and linear. That just seemed to be the [00:18:34] obvious thing we should try, using the past 5 days. Then we did shared predictors, with coefficients shared across counties to see if you can borrow strength. One of the shared-across-county predictors tries to use demographics; remember, we had all the socioeconomic and health risk factors we collected. [00:18:56] The other one is the same, but it shares using only the county case numbers and death counts, not the demographics. And we got the best 2 of the 5, which turned out to be [inaudible]. Then how do we combine them? We needed to, because we didn't believe any one of them would work that well. So we went back to something developed 20 years ago, when we did audio prediction. The signal is both music and speech; at the time Internet broadcasting was really taking off, and we were trying to do compression. To do that you need a good predictor of the signals, for both speech and music, and they require different kinds of predictors: the music is long-term and the speech is short-term. There were different predictors in a compressor, and people were switching back and forth. So we came up with the idea, let's not switch back and forth; let's just weight them, using some forgetting factor, and see who does better, and that adapts to whatever the audio signal is. What you have to do is lossless compression, so basically you do prediction and then send over the quantized residuals. Now I think Bose uses this audio code for wireless earphones, using ideas from our encoder. And this was actually some signal processing work
that actually gave us the best paper award in 2006, so I was very delighted that we could use this again. [inaudible] [00:20:28] Now the details. Say you have m predictors, and you just look at their past errors, right. You have a forgetting factor mu, which we set at 0.5. I was very worried about overfitting, since we didn't have much data, so we just used the same mu parameter from the audio coding paper. Then you have the loss l, for which we are using the square root of the count, and you just look at the last few days, with the tapering mu. So we take 5 days of the past, with a tapering, so that the prediction error from 5 days ago doesn't contribute as much as yesterday's prediction error to this weight. This way you give more weight to a predictor that has recently been performing well. And c is a parameter that controls the weighting. Suppose you have two predictors and [00:21:18] one is better than the other; c can then redistribute the weight so that you favor the better one more. By tuning c, the bigger the c, the more you favor the better one. We just took the two numbers from the audio coding paper, because I was very worried about overfitting. The weights are normalized, and then you just let the predictor automatically switch between them. We ended up using 5; [00:21:48] actually, we used 7 days, not 5, in the weighting, and the fitting was 5 days. In the weighting we used 7 days. We do the next-day prediction, and we use the imputed prediction for the next one, so we basically bootstrap it, recursively, to get the 7-day prediction.
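The weighting scheme described in this passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the group's code: the function names `clep_weights` and `combine` are made up here, the default `c=1.0` and the extra `(1 - mu)` factor in the exponent are assumptions (the talk only says that mu was set to 0.5 and that both numbers were taken from the earlier audio coding paper), and the number of past days is simply the length of the arrays passed in, since the talk mentions both 5 and 7 days.

```python
import numpy as np

def clep_weights(past_preds, past_actual, c=1.0, mu=0.5):
    """Weights for combining predictors from their recent errors.

    past_preds:  shape (n_predictors, n_days), oldest day first.
    past_actual: shape (n_days,), observed counts for the same days.
    c, mu:       assumed defaults; larger c favors the better predictor more,
                 mu is the forgetting (tapering) factor.
    """
    past_preds = np.asarray(past_preds, dtype=float)
    past_actual = np.asarray(past_actual, dtype=float)
    n_days = past_actual.shape[0]
    # Loss on the square-root scale, one value per predictor per day.
    loss = np.abs(np.sqrt(past_preds) - np.sqrt(past_actual))
    # Tapering: yesterday (last column) gets mu**1, older days get higher powers.
    decay = mu ** np.arange(n_days, 0, -1)
    score = -c * (1.0 - mu) * (loss * decay).sum(axis=1)
    # Subtract the max before exponentiating for numerical stability.
    w = np.exp(score - score.max())
    return w / w.sum()  # normalized weights

def combine(weights, next_preds):
    """Weighted average of the predictors' forecasts for the next day."""
    return float(np.dot(weights, np.asarray(next_preds, dtype=float)))
```

For example, `clep_weights([[100, 121, 144], [81, 100, 121]], [100, 121, 144])` gives the first predictor, which matched the observations exactly, more than half of the weight, and raising `c` pushes its share closer to one.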
[00:22:10] We discovered that actually two predictors are adequate. We didn't need the demographics; to our surprise they didn't help, because a lot of that information was already captured. The first one is very simple, just a linear trend for each county. For the second one we use a generalized linear model, which is basically a log-linear model, and you can see that [00:22:35] at time t it uses the log of the deaths yesterday in the same county, the same county's cases 7 days ago, the neighbors' deaths yesterday, and the neighboring counties' cases 7 days ago. So with these predictors we take into account the weekly effect. [00:23:04] That's our expanded shared predictor, shared across the counties. We call the combination CLEP, combined linear and exponential predictors, and that's what we ended up using. One more thing: we have a lot more data now. The paper was submitted in early May and came back in June, and we didn't change anything; we just had more data as the months passed, and I report results from the final paper. The paper actually got extended to about double the length. Originally we didn't [00:23:37] write on the data quality issues, and the referees really wanted us to, so the data repository section got expanded, which is good, because it helps people understand the data quality issues.
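As a rough illustration of the simplest predictor mentioned in this passage, a linear trend fitted separately for each county, here is a minimal sketch. It assumes an ordinary least-squares line through the last few days of cumulative counts (`fit_days=5` follows the 5-day fitting window mentioned in the talk); the function name `linear_trend_forecast` is hypothetical, and the group's actual fitting procedure may differ.

```python
import numpy as np

def linear_trend_forecast(cum_counts, days_ahead=1, fit_days=5):
    """Extrapolate a least-squares line fitted to the last `fit_days` values.

    cum_counts: cumulative counts for one county, oldest first.
    Returns the forecast `days_ahead` days after the last observation.
    """
    y = np.asarray(cum_counts[-fit_days:], dtype=float)
    if y.size < 2:
        raise ValueError("need at least two observations to fit a trend")
    t = np.arange(y.size)
    # np.polyfit with degree 1 returns (slope, intercept).
    slope, intercept = np.polyfit(t, y, 1)
    return float(intercept + slope * (y.size - 1 + days_ahead))
```

With a series that grows by two deaths a day, such as `[10, 12, 14, 16, 18]`, the one-day-ahead forecast is about 20 and the seven-day-ahead forecast about 32.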
So here are some results. If you know some of my other work, you know I'm very much into stability or robustness, and I actually started on this path, I should mention, when I gave the Tukey Lecture in 2012, which was really trying to [inaudible]. So I think there's another level of robustness here, in addition to data and model robustness, which is the loss function. We have 3 loss functions. The first is just the raw-scale mean absolute difference between the counts and the prediction. The second is relative error: because we look at cumulative deaths, the further you go, of course, the bigger that number is, which gets us to look at relative [00:24:41] error. The third is the square root, which we use in our weighting, so we have the square-root scale as well, just to see whether the method really is robust and stable across different performance metrics. In this validation you can see that the orange is the expanded shared predictor, using the 7-day-ago cases, the neighboring counties, and the past day, taking logs in a log-linear model; the red is the linear predictor; and the blue is the combined one, which we call CLEP. It's very nice that we have consistent [00:25:19] outperformance of the CLEP relative to the two basic predictors. Of course, because we used the square-root transformation in the weighting, the first panel, on the raw scale, is harder, because the scale is different; we didn't pick the raw scale for the weights, so that's why it is a bit worse there. But most of the time, almost all the time, you can see that our blue line is definitely the better one. The exception is, if you look at the first panel,
[00:25:58] there was a period where, on the raw scale, the linear predictor outperformed our CLEP, like in early April. But if you look using the square root as the metric, then that is not the case. We didn't want to tailor our prediction too much to the raw counts, because [inaudible], so we used the square root. It's just unfortunate that the linear predictor actually works pretty well in that regime; in April we were kind of in the linear regime. And here, unfortunately, we used the same colors, we could have used different colors: these are the same metrics at different horizons. The red is 7-day, which is what's good enough for Response4Life to ship in 7 days, and the others go up to 14 days. So of course [00:26:57] it gets harder and harder. And if you look at the first one again, because the method was not tailored for the raw scale, the mean absolute error in the beginning is just very volatile. So there were cases where, weirdly, 10 days seems to be harder than 14 days, but that's just the volatility of the data, and you can see in the other metrics that we're doing fine. [00:27:28] And this is if you want to push further into the horizon, up to 23 days. You can see that on average you have close to linear scaling for the different metrics, and you have the box plots on top of it; there are some outliers. On the raw scale you can see the error is much, much higher, because we didn't really use that [00:27:51] in the algorithm; we didn't use it to do the weights, so it doesn't do as well, but it's still kind of reasonable. So it was reassuring that all these reasonable metrics perform as expected. This is the same information in a table, which is hard to see. You can see that we also did 3 days ahead, 5 days ahead, and so on, and CLEP does consistently well against the others as a comparison, but
[00:28:24] this is the median in the middle. You can see that, well, where we're worse than the linear, it is almost as close, right: it is 5.39 and we have 5.08. And at the higher percentiles, mostly, the linear actually works well in quite a few places, but if you look at the median, for the majority of the [00:28:47] counties, CLEP is already there. Now that we have a point predictor, CLEP, we want to go for interval prediction, because you want to give people some sense when they do planning. The prediction can be 100 deaths, but you want to know the ballpark: whether it can be up to 200, or only up to like 110. This is not a general theorem; I don't know whether we can have one. [00:29:21] The problem is that this is a cutting-edge, very dynamic situation, and we just don't have a lot of data, so we have to borrow strength across counties and across different days. The simple idea we had was just to look at the past 5 days of residuals for the same county; this is held-out data. [00:29:37] And then we just look at the relative error, because the absolute error, or the square root of that, is just not comparable: we're looking at cumulative deaths, which just keep going up, so the relative error is more comparable across days.
And we take the maximum of that and just add it to our prediction; that's our prediction interval. It is kind of following Kolmogorov; there is a very nice [inaudible] book. Kolmogorov's work really crossed mine when I was working on information theory, and he was really pushing this line of algorithmic probability theory in the late sixties, trying to put probability theory on top of computation. It is a very nice way of thinking, more about the individual sequence, and it goes back to the original gambling interpretation of probability: without the sample space, probability is just frequency. So this is frequency: we look at each county separately and each date and see how many times the interval covers. It has a lot of the flavor of conformal inference already, which came up later and tries to take it back to i.i.d., but I think this is more suited to the time series situation. So that's what we did: take the last 5 days, take the maximum absolute relative error, and put it on top of the prediction. Of course this is cumulative, so if the interval goes below the last observation, you don't want that; you chop it off and make the last observation the lower end. [00:31:16] You can make some assumptions under which a result follows: if the errors over the past 5 days and the future prediction error are exchangeable, which is kind of the idea, as a mathematical concept, that comes from conformal analysis, then you can show that, when you permute them, there's a 5 over 6 chance that [00:31:42] you will get covered. That is, unless the next error is the biggest of the six, beyond the past maximum.
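The interval construction described in this passage is short enough to write out. The sketch below is an assumption-laden illustration rather than the group's implementation: the function name `max_error_interval` is hypothetical, the relative error is normalized by the prediction (the talk only says "relative error"), and positive predictions are required to avoid dividing by zero.

```python
def max_error_interval(past_preds, past_actual, new_pred, last_observed):
    """Prediction interval from the largest recent relative error.

    past_preds, past_actual: predictions and observed cumulative counts for
                             the most recent days (the talk uses 5 days).
    new_pred:                point prediction for the target day.
    last_observed:           latest observed cumulative count; the lower end
                             is not allowed to fall below it.
    """
    if any(p <= 0 for p in past_preds):
        raise ValueError("past predictions must be positive")
    # Relative error of each past prediction (normalizing by the prediction
    # is an assumption made for this sketch).
    rel_errors = [abs(y - p) / abs(p) for p, y in zip(past_preds, past_actual)]
    delta = max(rel_errors)
    lower = max(new_pred * (1.0 - delta), last_observed)  # cumulative counts cannot drop
    upper = new_pred * (1.0 + delta)
    return lower, upper
```

If the five past relative errors and the next one are exchangeable, the next error exceeds the past maximum with probability 1/6, which is where the 5/6 coverage figure mentioned in the talk comes from.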
So this interval gets 83 percent coverage if exchangeability holds. [inaudible] But really the proof is in the empirical performance, because this is really about validation, so let's look at the coverage. The 83 percent gives you some sense of what the target is. You can see that we just look at different time intervals, starting from April 11th for the first one. [00:32:22] After you have some data to train on, we're really at about 80 percent for most of them. The average size of the intervals is pretty big, but we are including the small counties as well, [inaudible], and I'll show you one that is just wild, just from the state of the data. When there is low coverage, like [inaudible] percent, it is usually after a sudden uptick, like a revision, which throws us [00:33:00] off the track, because we don't know how to correct for that error. Later we will recover, but not at the time; remember from my plot that the delay could be 10 days or something for the revisions. Ideally you should have a human, somebody checking in, for better data quality. A lot of our low coverage is because of these upticks, which our algorithm couldn't really keep up with. You do have an interval with it, and I'll show you some results where we miss [00:33:30] the observed counts; we don't miss a lot. Now let me tell you about the visualization. We have [inaudible], a very [inaudible], very smart; he just [inaudible] starting this
website, and it really got a lot of attention. People find it very useful, more so than reading the paper, which is a lot harder. Every day the data are pulled from USAFacts. The reason we use USAFacts and not New York Times is that when we went into this prediction exercise New York was really the hot spot, and USAFacts has the different boroughs of New York as different counties, while New York Times had the whole of New York City as one, and we didn't like that. That's the main reason we went with USAFacts. Now we do have some automation. The static features are already curated by humans, and we have pretty good documentation of how we made all the judgment calls, but for the new data coming in we do some basic cleaning, and that is automated. We run it in the cloud; we have a lot of credits through a [00:34:45] project with AWS. So we run it on AWS, and everything, the predictions, intervals, plots, and maps, is all automatically generated. As a side note, remember the number I showed you for King County, where we could have had a human in the loop. That's my vision of human-machine collaboration: we can have the humans do the frontier work and understand things, and when things become repetitive we train machines, but at the same time you always have human oversight and never let things run completely automatically, right. So this has now changed, actually; this is the front page of our website now. On the left at the top is a map of cumulative cases; I updated this on Tuesday, so it's pretty recent. You can say, well, that depends on the population, and you're exactly right, and we have the normalized data as well. On the right-hand side is something new. We actually have been meaning to do clustering of counties but haven't really gotten there, so we have something for other people to explore: you can put in a county, use various metrics, and look at all the other counties with similar trends. So that's what is there.
[00:36:02] Under that it's comparing different counties, and there are also two plots, for cases and deaths. The paper only discussed deaths, but the same method actually worked for cases; we didn't put that in the paper, but it's on the website. You can see the different counties, and we also do a matching to when the first case happened, so you have a time line that recognizes that different counties don't have the same starting point. If you click on the left, on the interactive blue dashboard, you get to this page. We have 8 maps on top that you can click: cumulative cases, cumulative deaths, new cases, new deaths, and the normalized versions of those. This is Tuesday's new cases per 100k. You can see that what's happening is really in the upper Midwest. [00:37:01] At the top is a place called Carter County, Montana. On the right is a search box: you can put in Montana, Carter County, and take a look. Per 100k they have 12,000; that's huge. But there's a problem, and that's why we have the next panel: if you look, Carter County only has about 1,000 people, so they only have like one case. Because of the normalization it becomes so high up there, so red, but this is not reliable. It's just because there's one case; the data are very sparse, and you cannot really talk about [00:37:44] a rate per 100k there. So that's a problem, but this search capability allows you to check the county and see whether it is a very sick place or not. I was like, what's going on? And then I realized that this just got blown up because of the normalization; it's a very, very small county. So when you look at the county level, one or two counts are not [00:38:10] very reliable. The second one was Carroll County in Mississippi, and you can see that they have 12 deaths in a larger county than the other one, so this is kind of OK.
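The small-county effect described in this passage is plain arithmetic, and a tiny sketch makes it concrete. Everything here is illustrative: the function names, the thresholds `min_count` and `min_population`, and the example numbers are hypothetical and are not taken from the website or the paper.

```python
def per_100k(count, population):
    """Rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population

def rate_is_reliable(count, population, min_count=10, min_population=10_000):
    """Crude flag (thresholds are assumptions): treat a per-100k rate as
    unreliable when it rests on very few events or very few residents."""
    return count >= min_count and population >= min_population

# Hypothetical numbers: one case in a county of 1,000 people looks exactly
# as bad per 100k as 1,000 cases in a county of one million.
small = per_100k(1, 1_000)          # 100.0
large = per_100k(1_000, 1_000_000)  # 100.0
```

Checking the raw count next to the normalized rate, as the search box on the dashboard allows, is what separates these two situations.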
And you can see our coverage for deaths is actually doing pretty well, because it hasn't been changing much. For cases we kind of underestimated, and there's a peak. I don't know whether the peak in the blue, which is cumulative cases, is maybe a correction that was made later, which is fine. That's why we missed it: with a sudden uptick we cannot keep up, because there's nothing in the past to tell you there will be an uptick, a sudden jump. [00:39:02] We basically assume the future is similar to today. And I know you guys mostly live there, so let's look at Fulton County. With the death count, again, we have pretty good coverage. For case prediction we are not as good, 0.7, but that's not too bad. Somehow, in the beginning of September, we over-predicted, but you can see it's pretty close; it's not a huge difference. To be honest, we didn't tune our method for case numbers; we tuned everything for deaths. We can actually just add a factor to get the right coverage for cases, because really the max of the relative error gives you the magnitude, the right scale, and then you multiply by a fixed factor and you seem to recover the desirable coverage. What we really gave to Response4Life was at the hospital level: we imputed the hospital-level death count using the employee numbers, because we had that data, and we used it to do that. Then we cut it into three categories, high severity, medium severity and low, and that was never passed into an optimization routine to optimize, because we had limited PPE resources. So it's very easy and reliable for them to look at the severity index for the impacted area: humans look at these indices and compare them with the requests. That was the way we worked, closely with the person who was really hands-on with that logistics system. These are some photos from the work to ship face shields to Temple University Hospital. Now Response4Life has kind of accomplished its mission and is not really in operation very much anymore. [00:41:06] So the impact is that Response4Life built a logistics system, and they appreciated that. We have not found a public health department to use it since. So they did make many shipments and worked with different organizations to do the PPE shipping early on. [00:41:34] If you go to our website and click the dashboard on the right, this is what you see. On the right-hand side are new deaths from Tuesday. You can see they are primarily in the South, and I think some of the upper Montana ones are probably like what I said: because they are small counties it looks horrible, but it's actually just one case, just because they have very few people there. So I take that with a grain of salt. And on the right is [00:42:05] the clustering, where you can choose different options. Here I have chosen Fulton County, where Atlanta is, and then you ask how to cluster. I used total deaths, cumulative deaths, to cluster, so those are the other places similar to Atlanta: you see ones in Virginia, Illinois, Louisiana and so on, and New Jersey. The last two plots shift the curves to match the first day, so it is a natural time, and cluster that way, and you also see the shifted counties. So this is just to help you explore some similarities. [00:42:53] The paper, as I said,
is finally, I think, accepted by the Harvard Data Science Review, and we had another project starting in the summer. So that's where we are. I want to make a comment on this combination: if you have your own predictors, you can combine them, say with agent-based predictions, in this combination. You put them together and just see who performs well, and we could use that in turn. [00:43:34] For the MEPI interval, you need some kind of exchangeability, and when you don't have it you can just empirically validate: every day you have coverage, and you see whether on average you have the right coverage. I worked on a paper in the summer, actually we finished this paper, about California hospitalization prediction. [00:44:00] We ended up using county data. We couldn't get access to hospital data; even when we talked to the state, they were reluctant to share data, and that still hasn't happened. So we have county-level hospitalization, and for that we actually changed the predictors. A moving average works better with hospitalization, and we also have some smoothing. So we don't use the exponential and the linear; [00:44:32] we use a moving average and something else, with the same combination and the same MEPI, where you have to scale it a bit. We also started tuning. At CMU there was a group that said that, using our method, the results are comparable to their approach. They had a very [00:44:53] strange metric for their prediction; the metric was really strange, but I liked the fact that they actually tuned our method, which they also applied to new data. So we're starting to read this CMU paper.
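The two ideas mentioned above, an interval built from the maximum recent relative error and stretched by a fixed factor, and a daily empirical check of coverage, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the choice of which recent days to pass in, and the default `scale` are all hypothetical.

```python
from typing import List, Sequence, Tuple


def max_relative_error(actual: Sequence[float],
                       predicted: Sequence[float]) -> float:
    """Largest |actual - predicted| / predicted over the supplied recent days."""
    errors = [abs(a - p) / p for a, p in zip(actual, predicted) if p > 0]
    if not errors:
        raise ValueError("need at least one day with a positive prediction")
    return max(errors)


def prediction_interval(point: float, delta_max: float,
                        scale: float = 1.0) -> Tuple[float, float]:
    """Interval around a point forecast.

    `scale` is the fixed factor mentioned in the talk; the value to use
    is an assumption and would be picked by checking coverage on past data.
    """
    half_width = scale * delta_max
    lower = max(0.0, point * (1.0 - half_width))  # counts cannot be negative
    upper = point * (1.0 + half_width)
    return lower, upper


def empirical_coverage(actual: Sequence[float],
                       intervals: List[Tuple[float, float]]) -> float:
    """Fraction of days on which the observed value fell inside its interval."""
    if not actual:
        raise ValueError("need at least one day")
    hits = sum(lo <= a <= hi for a, (lo, hi) in zip(actual, intervals))
    return hits / len(actual)
```

In use, one would compute `max_relative_error` on the last few days of forecasts, build the interval for the next forecast, and then track `empirical_coverage` over time; if coverage runs below the target, a larger `scale` widens every interval by the same factor.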
We are also updating our data and the data repository; we just added some new data. Somebody else actually reached out to us [00:45:22] to add their data. Over a two-week period a while back we had like 212,000 visits and 53 unique cloners, which means people downloaded everything, hopefully making good use of it. Both CLEP and MEPI are simple, fast and very transparent: you know exactly what you're doing. You can combine different predictors based on different mechanisms, and watching the weights gives you some sense of which regime you are in, whether you're in the regime that grows exponentially or the other one; it's very clear. This is an old slide, I should say. We kind of still have contact with Response4Life, but I don't think Response4Life is as active as before. [00:46:04] We have a website and blogs, and we are connecting with the spatial data science center at the University of Chicago to work with them. Our predictions are already at the CDC forecast hub, and we are hoping that they will be used in the weighting, since they have different groups and an ensemble that everyone can contribute predictions to. I don't find it strange that they just require people to submit predictions continuously and also to have open-source or reproducible code. [00:46:40] We're working on hospitalization prediction, and on a paper that we are kind of only thinking about, which is the causal investigation. We have already started collecting some data on policy changes at the county level, although we haven't assigned human resources to that. The clustering, which we put on our website, is a very rough first step; I wouldn't dare to call it a serious causal investigation.
[00:47:13] Things have slowed since teaching started, so that's where we are. We have the data and code at our website, and you have the visualization and the papers. Thank you. Great, thanks for the very nice talk, Bin. Any questions from the audience? Well, I had a question. So in our work, [00:47:42] with this kind of problem, we have seen that a lot of these experts also demand explanations for these predictions, like why exactly are you predicting what you're predicting. Have you seen that in your own experience as well? And if you had to give explanations to, say, Response4Life, what kind of explanations would you give? [00:48:05] Well, for Response4Life, I think they were pretty trusting of us, so we didn't have a lot of pushback from them; they're all volunteers. But in the referee process people did say that, you know, the death count has problems, it has quality problems, and we agree. I usually say this was like a wartime project: you do the best you can. This is not an academic project; you just do the best you can to help the organization ship PPE. So it's a best-[00:48:34] effort approach, and we were hoping to be useful. So did you identify any signals which are really useful for your prediction, like a particular feature you feed in? It's close to that, but not exactly a feature. Yesterday was the most useful, and then the neighboring counties were also useful, and seven days ago. So that's without any other demographics or anything else, but that is embedded in this, you know, past and neighbors stuff, right? With the neighbors, there's transmission, just people mingling, so what happens next door with my neighbor predicts what happens with me, and there's a time lag. [00:49:14] This city that we took into account bad.
Just since we were in the prediction game, if a predictor doesn't help, we don't feel the need to include those demographics, and we keep it simple. So what did we learn? Yesterday is really helpful, and then what is happening in your neighborhood. And it's not just deaths, it's cases, especially cases seven days ago. That is not a secret. I am always wondering whether the number of tests in the last week is predictive of how many deaths there will be. [00:49:49] We have another question, from Rock succeeded, who asks: Did you consider using any graph structure to improve predictions? For example, bigger cities, although not in neighboring counties, can have similar trends. Definitely, that's something people have asked. We didn't go there just because of time pressure; we needed to get something out. [00:50:09] I think if we hadn't set out to work at the national scale, I would have done it for San Francisco, California. When modeling at a national scale, we didn't have time to understand all the metropolitan areas in order to do this. People wondered, and the referees asked: why didn't you do something hierarchical, right? The clustering is a first try at going there, in a mostly human way. [00:50:36] For each metropolitan area you have to understand the structure and then fit that into the national picture; having an area level within a national level makes a lot of sense. Once you do that, I think we should really make it all open, [00:50:52] to prepare for the next pandemic, right? But for myself, teaching has just started. I was on sabbatical; that's why I could do this in March, because I wasn't teaching. But now teaching has started, and my students also have to continue with their regular jobs and everything, right? If it were a short-term thing we might have done it, but this is a long haul. [00:51:18] So.
With the right fit I might jump back in, but right now, I always want to have a very clearly defined goal, and with Response4Life it was kind of mission accomplished, right? We were just volunteers who jumped in to help. If the State Department in California had started working with us, I might have stayed. In the absence of that, I'm not that motivated to stay in, [00:51:49] just because I don't want to write another paper. Even if the paper was nice, you know, to share what we did, this is not a paper-writing game for me; I'm not in the paper-writing game anyway at this stage, right? So I need a clear goal to jump back in, or to actually see that the epidemiology community is really wanting to work on something. If I don't have an impact, then it's mostly to write another paper, and who knows who can use it; that's not appealing to me. [00:52:21] Yeah, one wants to have impact, right. One other question I had: did you consider using any of the intervention information, like the shutdowns and some of these other mandates at different county levels? We tried a little bit. We didn't have all the information; we had some data early on, and it didn't help a lot. Even the New York Times, you know, a lot more people there are predicting that. [00:52:55] That's why I said the causal angle would be nice, but I think there are so many confounding factors, and there is the compliance issue. I was going to use blue or red counties as a proxy measure for compliance, because if you have a policy and people don't follow it, it's not a real intervention. [00:53:19] So unless you have some understanding of the on-the-ground implementation, the policy is not that useful. Knowing there's a policy only tells you so much; it depends on the county and the culture, right? People in Italy tell me that now everything has reopened and it's coming back up fast, right, and then the Italians are always kissing each other and having parties in the summer.
[00:53:47] So. Without the human behavior variable, the policy doesn't tell you anything in terms of knowing what the real intervention is, right? Our next question is from Mark Barofsky, who asks: the parameters of the predictors could change in time and space. Did you find any large-scale change in the factors related to the time of the year or differences in policies? We looked at the policies a little, and [00:54:17] I didn't see anything I would believe. I think we need to understand the compliance issue before we can see the policy effect. Yeah, I mean, these need a lot more careful analysis, which we don't have the human resources for right now, because there are so many counties. [00:54:47] Common sense says that masks work and social distancing works. For me, you know, using a model to support that is overkill. It's medical knowledge, right? For me, if it's common sense and you just know basic medicine, that's enough. [00:55:09] I don't need a model to show it. I doubt that the other people, who are less convinced, would find a model convincing anyway. So sometimes common sense is good enough.
[00:55:32] So we had another question, from Alexander opulence, who asks: I saw there was a plot of similarity between counties, but I missed how the similarity is being defined or calculated. Could you go over that again briefly? For our website, what you do is this: you are only given a few choices, total deaths, cumulative deaths, or different counts, and the clustering is basically nearest neighbors. You have to specify a number, like five; we use five. So we just find the five nearest neighbors so that the curves match. The top two plots do that, and the next two shift the time a bit to match them, so that they match on the date of the first [00:56:13] case or the first death reported. It's pretty simplistic. It was kind of the beginning of trying to get into the causal investigation that summer, and I was not very convinced myself that there was any signal there, and I got busy. The students who were in charge found it very easy to say that our summer job was done and to go back to what we used to do, so that's what happened. But I'm open: if we have a collaborator and I can see the impact, I'm happy to jump back in. With the State Department of Public Health, we tried, we tried very hard. [00:56:53] How about just using our prediction intervals, right? It's a very simple idea, right, and it's fast, and that's why we thought we could really help. But they're too busy, and I think without an existing working relationship it's very hard. I can see it from their point of view: they don't know us, right? You know, probably a lot of people contact them. They did reach out to UC researchers to say that they wanted to share data, but I think the privacy issue got them, so we still haven't seen the
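The county-similarity lookup described in this answer (pick a count series, find the five nearest neighbors, optionally after shifting each curve to start at its first reported case or death) might look roughly like the sketch below. The distance measure is not specified in the talk; root-mean-square difference over the overlapping days is an assumption made here, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def align_to_first_event(series: Sequence[float]) -> List[float]:
    """Drop leading zeros so that day 0 is the first day with a nonzero count."""
    for i, value in enumerate(series):
        if value > 0:
            return list(series[i:])
    return []


def curve_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square difference over the overlapping days (assumed metric)."""
    n = min(len(a), len(b))
    if n == 0:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:n], b[:n])) / n)


def nearest_counties(target: str, curves: Dict[str, Sequence[float]],
                     k: int = 5, align: bool = True) -> List[str]:
    """Names of the k counties whose curves are closest to the target's curve."""
    prepared = {
        name: align_to_first_event(s) if align else list(s)
        for name, s in curves.items()
    }
    reference = prepared[target]
    ranked = sorted(
        (curve_distance(reference, curve), name)
        for name, curve in prepared.items()
        if name != target
    )
    return [name for _, name in ranked[:k]]
```

Setting `align=False` compares curves on calendar time, while the default compares them on time since the first reported event, which corresponds to the shifted plots described in the answer. A raw-count distance will mostly match counties of similar size; normalizing the series first would be a separate design choice.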
[00:57:33] data. The plan was to share data with many of us, like who got sick, who got tested, when they went to the hospital and when they checked out. I don't have the data; we still haven't received it. This is not ending, but it has been two or three months and I haven't seen data. Yeah, thanks. I think we covered all the questions. Thanks again to Bin for delivering the great talk. Thank you. Thank you very much for inviting me. Thank you, and see you next week for another edition of the.