Welcome to today's GPU brown bag. I believe it is also the penultimate brown bag of the semester, so only this one and one more are left. Today, we are very lucky to have Dr. Cleo Andrews, who is a professor with the School of Industrial Design and also with Interactive Computing. I completely forgot my words there for a second, but she's really cool, incredible, and awesome, and does amazing work. Honestly, I could probably spend the next hour talking about that myself. But you didn't come here to hear me talk. You came to hear her talk. So I'll just let her show you how awesome she is. And with that, Cleo, please take it away.

Thank you so much for having me. I'm going to go ahead and share my screen, and hopefully we are good and you can see it. All right? Yes, we can. All right. Thank you.

So today I'd like to introduce the project I've been working on for a little over a year now. It's called "Human Network Regions as Spatial Units for COVID-19 Policy Implementation."

A little bit about me: I'm a geographic information scientist. I've been at Tech since August 2019. Prior to that, I was at Penn State for five years. In Interactive Computing, I work with colleagues [names unclear], and it's been such a great experience. In SCaRP, the School of City and Regional Planning, I'm affiliated with CSPAV, which is the Center for Spatial Planning Analytics and Visualization. We're a large GIS group located at 1760 Spring Street, across from Coda, where perhaps you've gotten a COVID test. So come up to the second floor and see us some time. I also have a small research group called the Friendly Cities Lab. I teach various classes here at Tech, both in Interactive Computing and in planning. The newest class is called Intro to Urban Analytics. It's a new graduate class, and we have mined a lot of spatial data this semester: Twitter data, Google Street View data, and more. It's been a lot of fun.
I also wanted to make you aware that this week is Geospatial Awareness Week, so it's very fun that I coincidentally get to talk to you today. Yesterday was GIS Day, which is an internationally celebrated day. There's a new poster circulating to talk about the power of geographic information systems. I encourage you to look at the 30 Day Map Challenge on Twitter. Every day in November, people post maps on a different theme, like land or null values, and it can be really fun. Yesterday we also had our own GIS Day celebration here at Georgia Tech, which the students put together. So, happy Geospatial Awareness Week.

This project is part of a very, very fun initiative called the Geospatial Fellows program. This is an NSF-funded project at the University of Illinois. Before last summer, in 2020, they sent out a call to say, "Hey, do you have a COVID-related project? We would like you to be a geospatial fellow." They selected a group of us from across the country. The group met almost every week for a year, and we developed tools and research that combine geospatial analysis with aspects of COVID. The goal of this project was reproducibility: being able to reproduce our analysis. Through this project, again funded by NSF at the University of Illinois, we were all able to produce notebooks. So there are a lot of Jupyter notebooks out there, and mine is actually one of the few that is not in Python; it is in R. There are also blog posts on the different types of analyses we did, so I'll be sure to share those with you. I'd like to thank the Geospatial Fellows program for supporting this research.

In this talk, I'll introduce the problem, talk a little bit about regionalization, a bit about data and methods, then our results, and then a bit of conclusions and implementation.
So the main point of this research is that instead of states, we should use counties and sets of counties, or county groups, for decisions on COVID-19 NPIs, which stands for non-pharmaceutical interventions. That would be social distancing, mask wearing, and similar measures such as closures, limits on outdoor activities, et cetera. And we should use these county-based regions to communicate cases, case rates, vaccine rates, et cetera. States, we think, are not always the best option in an emergency, or with health-related data, to convey this information and to implement policies on non-pharmaceutical interventions.

The goal is to delineate new regions, based on US counties, that preserve tight-knit communities. They preserve the places you go in your daily life, preserve the typical commuter flows, preserve where you may go to meet friends or where you share information, and minimize transmission between the communities. So we want to find these little hearts of communities, geospatial communities that stay within one region and don't tend to go across regions very often.

The approach we use here is networks of hundreds of millions of geospatial commutes, migrants, GPS traces, and Facebook and Twitter ties. That would be: who are you friends with on Facebook, and who on Twitter have you co-mentioned before? These are for US contiguous counties, so this is the 48 states in the US. We apply community detection algorithms, which is a network science term, to partition the spatial networks. We chop them up into regions that make sense. Then we test for COVID-19 transmission within and across our new regions. So we have five new distinct sets of regions based on the commutes, the migrants, the traces, the Facebook ties, and the Twitter ties. And we find that some of these regions perform a lot better than states.
When we say COVID is contained, we use the word "contained" very loosely, because that can be a very clinical epidemiological term. But we can use it with some air quotes here: in some regions COVID has been a bit better contained, and the regions prevent spillover. We also find that some regions don't do a good job, and that in those cases we should stick with the states.

So what's our motivation for doing this type of research? Our motivation is that states in the US, for example, may not be the best unit for policy implementation. States are too heterogeneous. They're different in population density, and they're different in ideology. You can think of the state of Georgia or the state of California: you have urban regions and rural regions, with different types of behavior and different types of needs. State-level openings, for example in Georgia, can sometimes put too much pressure on vulnerable areas. We had an example of this last year with Albany, Georgia.

Georgia is a really good place to show what kind of tensions can exist when a state tells you to implement policies by which your local area can be hurt. For example, last year, I think around April 2020, Savannah required face masks. Governor Kemp was not very happy with that and tried to say, no, this is the state of Georgia, you're not allowed to have face mask mandates. The Georgia governor also sued Atlanta Mayor Keisha Lance Bottoms and the city council over mask mandates as coronavirus cases in the state went up. So there's been a lot of tension, as we know, between state politics and local politics. We also had a huge super-spreader event in the early days of the coronavirus in Albany, Georgia, after a funeral brought together a lot of people and spread the disease around the counties near Albany.
This article right down here says the Dougherty County Commission chairman recommends a regional coronavirus task force, meaning that they wanted rules, non-pharmaceutical interventions, and policies that were better suited to what was going on in their small region of Georgia, rather than having to say, "Whatever the state decides is fine for us," because they had a real emergency there at the beginning of the coronavirus.

Also, geographically, single cities straddle states. Here is a prime example: Washington, DC. Part of it, down here, is in Virginia. Part of it is its own entity, Washington, DC. Part of it is in Maryland, and part of it is now in West Virginia. This combined statistical area, which is this kind of large agglomerated area, may or may not have picked up the micropolitan statistical area of Chambersburg, in Franklin County, Pennsylvania. You can see that one big metropolitan area can really span across states, and it may not be suitable to have different state policies in different places, because crossing state lines can sometimes feel a bit arbitrary if you're meeting somebody at a restaurant or going to your job.

These areas change about every ten years. The Office of Management and Budget, in case you're interested, annexes and amputates counties based on the percentage of commuter flows that a county sends to the city. So when metropolitan areas are growing out, they're getting more and more counties. Atlanta has a ton of counties associated with its metropolitan statistical area. Those are based on commuter flows: how many people from that new county are now commuting to the city. And it's very natural for these to cross state lines.

Here's a beautiful example from about five years ago from Dash Nelson and Alasdair Rae. They showed different commutes as lines, just in different colors here, using census data.
The lines connect counties, or smaller areas, to the metropolitan area that they commute to. As you can see here, these different colors don't necessarily match up with states. We understand that. But this may be a good concept for mapping out how we can create these areas to implement policies that are best suited for the people who are living in that metropolitan area.

Also, for emergencies, we don't use states; we use counties in the US, so federal mandates already use counties. For example, here we have current hazards. This is from a little while ago, but it shows which hazards existed in which counties at the time. For example, a tornado watch between Mississippi and Alabama is created as an agglomeration of different counties. They don't issue a whole tornado watch for Mississippi and Alabama, because it's simply not suitable. Over on the right here is a wind advisory. When we have a wind advisory, they also do this at the county level. So being able to say, "Hey, this is an emergency situation, and we need to set groups of counties together to have well-suited policies," has been done before.

County pacts are also possible. Counties do work together to create pacts. Often these are economically driven. We have a coastal pact in Georgia. One example is the Appalachian Regional Commission, which consists of 423 counties across 13 states. They come together as part of a special region, the Appalachian region, to work together. So this is a very interesting and very good precedent for what we can do.

Also, when lives are at stake, the health field is paying attention to this approach as well. There is a very interesting paper called "Gerrymandering for Justice: Redistricting U.S. Liver Allocation." They took a really nice approach, with some really nice optimization methods, to say what the best liver zones in the country should be.
A liver zone means, for organ transplants, how those organs should be shared within a region. What they did with this type of work is they took a look at the original transplant regions that existed. And they said, "Well, there are some of these regions here, the darker blue areas, that have a ratio of available livers to patients on the waiting list that is rather high." So for every 100 people on the waiting list, there might be 50 livers available. But in these lighter-colored regions, for every 100 people, there might only be 25 livers available. It doesn't make a lot of sense to have one region where demand is really not being met and another region where demand is being met a lot better. So what they did is they recreated these regions. They re-optimized these regions for health purposes. They redrew this liver region, and now this liver region has more livers.

So we see from this motivation that new regions can be used, perhaps, for implementing stay-at-home orders, delivering aid, and capturing cohesive areas with shared movement and social networks. That's what we're trying to do here. We do realize it would be a very messy policy situation, but there are some prior precedents to help move this along.

So how do you create regions? There's a really long legacy of creating regions, and it hearkens back to a lot of really great geography research. My favorite paper is from Economic Geography in 1955. It's called "Hinterland Boundaries of New York City and Boston." The hinterland is just sort of the exurbs, the suburbs, the area around a city. This researcher worked for Stop & Shop in Boston, which is a grocery store. They wanted to better understand how to expand and better understand their clientele. This is extremely interesting to me because it was a time, long before now, when making maps was quite hard and gathering data was hard.
Nevertheless, he made a map of commuter traffic showing where there was a lot of commuter traffic to Boston and where there was less. Using these hatches, he also showed where newspaper circulation was more popular for New York newspapers or for Boston newspapers. This contour line says that if you live within this zone here around Boston, 90 percent of the people get Boston newspapers. Conversely, about 90 percent of people who live within this contour line around New York get New York-based newspapers. Then he found this 50 percent contour line here to say that this line is a toss-up: you might get a Boston-based newspaper, or you might get a New York-based newspaper. This one down here as well. He did it for telephone calls, also a toss-up, and he did it for check cashing. He also did it for where CEOs lived. And he was able to combine these and really split the two.

We've also defined regions in some more creative ways over time. This very, very famous researcher in geography, Wilbur Zelinsky, went through phone books in the late 1970s. This is from 1980: the frequency of selected regional and locational terms from phone books for 276 cities in the US and Canada. He looked for words like Creole, Aztec, Acadia, Pioneer, Pilgrim, Viking, et cetera. There are a lot of these terms that he deemed quite regional. He delineated the US based on the prevalence of these terms in phone books, like "Aztec Plumbing" or "Pilgrim Life Insurance" or something like that. He was able to say, we think that these are vernacular regions. I include this just because I think it's really fun and cool. He thought that the places in red were lacking regional identity. Perhaps some of us are from, or have lived in, some of these places. I know I have. He said that this was a toss-up zone. He had this North and South division, and he called this a zone of indifference.
You don't know whether you live in the North or the South. And then East versus West: here is a zone of indifference. So it shows from this that if you live in these very small parts of Indiana, you might just feel very, very lost. But I don't know if that's necessarily true. The point of this is that regionalization has been going on for a long time.

Today, social media has given us a new look at these kinds of pop culture regions. This one is a very lighthearted example of baseball: where baseball fans live. I think people really like looking at these, and they really like saying, "Oh, this is Mariners country, this is Red Sox country," and seeing where the administrative districts tend to follow these culture districts, where someone has a very small fan base, where there's a large fan base, et cetera. We're very proud of our Atlanta Braves right now, and it appears that many people in the South tend to root for the Braves. Clearly, there are fans all over the place, but since this was for popular culture, they wanted to make it as simple as possible and show where some of the more popular teams are.

Finally, in my last slide on the background of regionalization: there have been some really exciting innovations in being able to partition regions. I don't really go into the math in this talk, but it is quite mathematical. One of the exciting innovations that has come about now is measuring how strong the boundaries are. This was a great paper identifying spaces where people were moving, using GPS traces in China, and getting to see where there were very strong boundaries, where people really don't cross this line. If they're over here, they don't tend to cross over there. They may go up and down, north and south, or they may go somewhere else.
It's also interesting to see why these regions occur. Sometimes it's a boundary like a water boundary, a river or lake. Sometimes it's because of mountains. Sometimes it is because of administrative districts, as in, "I'm moving to Augusta, but I'm not going to live on the South Carolina side of Augusta. I'm going to stay on the Georgia side of Augusta," because of familiarity or because of some kind of personal decision. Or sometimes it's infrastructural, as in, "There isn't really a good way for me to get there. There's no highway up there." So investigating why these regions exist is also exciting for us.

That's my background on regions. Let's turn to what we did for this experiment. I have a graphical explanation of what we did, and I almost think it's a little more confusing than I wanted it to be, so I'll let you digest that. Here are the steps that we took, and I'll go through each.

The first thing we had was the county-to-county flow network. This means that for every county in the contiguous US, and we've got about 3,100 of them, we have an edge from one county to another county, and all those edges are weighted. The next thing we do is derive regions from this network. These may look like the states here, but they are bottom-up derived regions. Then we find the network of adjacent counties, with between-region edges highlighted. The adjacency network is sort of a mesh. It means that if my county touches your county, we put a little line between us. If my county doesn't touch your county, we don't get a line. Then we take a look at COVID-19 rates and cases along this adjacency network, and we find out whether the high rates and high numbers of cases are going between regions.
Our hypothesis is that high COVID rates tend to stay within a region and not go between regions. Let me dig into some more of our materials and approach. Here's the data that we use. All of these data are aggregated to the county level.

We have commutes. These are from a dataset called LODES, from LEHD, which is the Longitudinal Employer-Household Dynamics program. This is where you live and where you commute to. They have this at a very small granular unit, available from the US Census for free. It's at the block unit, and we aggregated it up to the county unit. We use this for 2015.

For almost all of these, except for the Twitter network, there are about 3,100 nodes, and those are counties. And this is the number of edges that exist between the nodes for each network. An edge connects two counties, any two counties. Edges are undirected, meaning that county A is connected to county B in the same exact way as county B is connected to county A. We just sum up the commutes or the Facebook friendships, et cetera, so it's just an interaction metric. And these edges are weighted, meaning the weight is the number of commutes, the number of Facebook friends, the number of migrants, et cetera.

We have Facebook data from Facebook; they gave this out for free to researchers. It's called the Social Connectedness Index, and they pre-compute for us the number of friends that exist between two places. It's already undirected, and they divide it by the number of Facebook users in those two places.

We have migrants from the US Census as well. These are people who start in one county and end in another county, or who start and end in the same county. We do allow for self-edges, meaning that if a commuter starts in a county and stays in that county, they're still included.

We have GPS traces from SafeGraph.
The SafeGraph trace data is a very, very large dataset, and the size of these datasets is not reflected here because we've already aggregated them up to the county level. SafeGraph has given out data for free to researchers during COVID for COVID-related research. That's the most recent dataset that we have.

And then we have Twitter co-mentions. If I mention someone's handle on Twitter and they have mentioned me back at some point, we get a link between us. This network was created by my coauthor, Tyler Colo, and he has used it in a number of his geography flow experiments in the past. And so this is the number of edges that exist between these counties. Those are the datasets that we're using.

The way we define regions from networks delves into network science. On the whole, we use something called community detection algorithms to assign each node to a community based on the network. This is a great example. I teach a spatial network analysis class here at Georgia Tech, and I like to show what we've done in class. This is an example of applying network modularity to commutes, in this instance city commutes. For this area here, we have different metropolitan statistical areas, and they are connected to other metropolitan statistical areas. This is completely aspatial. Then what the community detection algorithm does is it tries to find unique little pockets in the network that tend to make sense, meaning, "Oh, you all are sending flows to each other. You're sort of a clique; you have a nice little subgraph of your own." Then it gives them one color; it assigns them to one group. So this little aspatial network here is of metropolitan statistical areas. Then we're able to assign them back to geographic space at the end, and it creates these handy partitions.
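As an illustration of the edge construction described for the flow networks (directed flows summed into undirected weighted edges, self-edges kept, and Twitter links kept only when mentions are reciprocated), here is a minimal Python sketch. The talk's own analysis was done in R; the function names, county IDs, user handles, and counts below are hypothetical and only meant to show the idea.

```python
from collections import defaultdict

def to_undirected(flows):
    """Sum directed (origin, destination, count) flows into undirected weights."""
    weights = defaultdict(float)
    for origin, dest, count in flows:
        # (A, B) and (B, A) share one key; (A, A) is kept as a self-edge.
        key = tuple(sorted((origin, dest)))
        weights[key] += count
    return dict(weights)

def reciprocal_pairs(mentions):
    """Keep a user pair only if each user has mentioned the other."""
    seen = set(mentions)
    return {tuple(sorted((a, b))) for a, b in seen if a != b and (b, a) in seen}

# Hypothetical county IDs and commuter counts, for illustration only.
flows = [("A", "B", 120), ("B", "A", 80), ("A", "A", 500), ("B", "C", 10)]
edges = to_undirected(flows)

# Hypothetical user handles, for illustration only.
mentions = [("u1", "u2"), ("u2", "u1"), ("u1", "u3")]
links = reciprocal_pairs(mentions)
```

In this toy input, the A-to-B and B-to-A commutes collapse into a single edge, and only the `u1`/`u2` pair survives because `u3` never mentioned `u1` back. Aggregating user-level links up to counties would be a further step that is not shown here.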
Let me give you two quick examples of how this has been used in research as well. I was on a team in 2010 that was one of the first to use this method. We did it in a study called "Redrawing the Map of Great Britain from Human Interactions." What we did was use landline phone calls from British Telecom. You can see that there, with these little flows here going from where the call originated to where it may have gone. A lot of the time, calls go nearby. Especially when we used to use a lot of landlines, calls were most often pretty local. Then we applied the community detection algorithm, and we were able to come up with these different zones. When you see this patchwork here, it means that these zones are a bit of a toss-up area. They didn't necessarily belong very strongly to each other. This output was interesting to us because it divided Wales into three regions. We learned that a number of people who lived inland had ties to places on the coast, and that's why we get these lateral interactions here that divided Wales into three regions. We also learned that Scotland was very tightly its own region. About 80 percent of the calls initiated there tended to stay in this area. This was interesting because the Scottish referendum was happening around this time, and we got to see how independent they are in this calling dataset.

And then finally, this type of method has also been used for partitioning the world's flight network into different zones. So for the entire global flight network and the ties within it, a community detection algorithm was used to partition it into different regions, just from the bottom up, and it showed where there are different regions globally of highly connected flights. One thing that's interesting about this result is that Alaska becomes its own region.
That is because there are a lot of small planes that travel within Alaska to deliver mail, goods, and people. In the spatial networks class, the students recreate this research and apply the partitioning algorithm to it. So that is a couple of examples of the partitioning method that we used.

Here are the results. We tried these different community detection methods. We did this in R using the igraph package, and I highly recommend the igraph package. It was a good idea to try out different algorithms that already exist. Some of them performed poorly and some of them performed quite well. We have our five input datasets for each of these. One algorithm, for example, is called fast greedy, and others are called Infomap, the Louvain method, REDCAP, and walktrap. We have the number of communities found. We don't assign those a priori. It just tells us, "Hey, this is the number of communities we found."

And then this value here is really important to us. It's called the modularity value, or maximized modularity. We want this number to be very high. That number was really high for Scotland in that prior example. It means that if a lot of flows originate from a single node, they are destined to go within that same community. So there's a lot of churn and a lot of connectivity within one community and not a lot between communities. That's what this number means. We wanted a result where the modularity was higher and where the number of communities made sense. We chose to use the Louvain algorithm, which was created by Vincent Blondel and his team, probably a little over 10 years ago now. So we chose this algorithm to use in our study. The students in the urban analytics class this semester have also experimented with these different algorithms on these datasets, and they got to see how poorly some of them work and how good some of them are.
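To make the modularity value more concrete, here is a minimal Python sketch of the standard weighted modularity score for a given partition. It is not the R/igraph code used in the study and it does not search for the best partition (that is what algorithms like Louvain do); it only scores a partition you hand it. The toy graph, node labels, and function name are assumptions for illustration.

```python
from collections import defaultdict

def modularity(edges, community):
    """Weighted modularity of a partition of an undirected graph.

    edges: list of (u, v, weight), each undirected edge listed once.
    community: dict mapping each node to its community label.
    """
    m = sum(w for _, _, w in edges)   # total edge weight
    within = defaultdict(float)       # edge weight inside each community
    strength = defaultdict(float)     # summed node strengths per community
    for u, v, w in edges:
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            within[community[u]] += w
    # Share of weight kept inside communities, minus what a random graph
    # with the same node strengths would be expected to keep inside.
    return sum(within[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)

# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
             (3, 4, 1), (4, 5, 1), (3, 5, 1),
             (2, 3, 1)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
lumped = {n: "a" for n in range(6)}

q_split = modularity(toy_edges, split)
q_lumped = modularity(toy_edges, lumped)
```

Splitting the toy graph into its two triangles scores higher than lumping everything into one community, which scores zero. That matches the reading in the talk: a high value means most flow stays inside communities.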
So the last part of our methods here is the COVID-19 data. It is very, very hard to track transmission of COVID-19, which was a little bit of a deal breaker for this type of research. I shouldn't say a deal breaker, because we still made the deal. Maybe a pink flag. COVID-19 could be tracked geographically a little bit by variant, like the Delta variant moving. It could be tracked a little bit by serology tests: who had different antibodies and how those were changing in geographic space. But in general, it's very, very hard to track the movement and the transmission of COVID-19.

The way we did it in this experiment was to take that county adjacency network, which again is just saying, "If my county touches your county, we get a little line between us." Counties that meet at a corner usually got a line between them as well. We have a lot of those in Nebraska, Kansas, and Texas; those are square counties. So the adjacency network has about 8,935 edges. It's on this mesh, or lattice, kind of structure that we put our COVID-19 data. Each edge in this network, again undirected, has the number of COVID-19 cases that exist between those two counties. It has the COVID-19 case rate per 1,000 people, so that's the joint rate between the two. And then it has the difference in the rate, to say, "Was your rate a lot higher than mine? Was your rate a lot lower than mine?" If so, it means that there's not a lot of transmission, or we can assume that there wasn't a lot of transmission, going between us.

We did this for three waves of COVID. The first was until May 31st, 2020. The second was the summer of 2020. And the third, which had by far the most cases, was from 2020 until July 1st, 2021. Had we started this experiment recently, we would have treated the Delta variant as its own wave. But this was started quite a while ago.
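The three per-edge quantities just described can be sketched in a few lines of Python. The exact definitions used in the study are not spelled out in the talk, so this sketch assumes that the edge's cases are the sum over the two counties, that the joint rate is combined cases over combined population per 1,000 people, and that the difference is the absolute gap between the two counties' own rates. All names and numbers are hypothetical.

```python
def edge_stats(cases, population, a, b):
    """Case count, joint rate, and rate difference for adjacency edge (a, b).

    Assumed definitions (not confirmed by the talk):
      cases      = cases in a + cases in b
      joint_rate = combined cases per 1,000 combined residents
      rate_diff  = |rate in a - rate in b|, each per 1,000 residents
    """
    rate_a = 1000 * cases[a] / population[a]
    rate_b = 1000 * cases[b] / population[b]
    joint_cases = cases[a] + cases[b]
    joint_rate = 1000 * joint_cases / (population[a] + population[b])
    return {
        "cases": joint_cases,
        "joint_rate": joint_rate,
        "rate_diff": abs(rate_a - rate_b),
    }

# Hypothetical counties and counts, for illustration only.
cases = {"A": 50, "B": 10}
population = {"A": 1000, "B": 1000}
stats = edge_stats(cases, population, "A", "B")
```

Here county A has a much higher rate than county B, so the edge gets a large rate difference, which under the talk's reasoning suggests little transmission across that edge.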
So we had these three waves. Then how do we know if our regions worked? The case rates are going to be high inside the region, so we want these to be really red. The case rates are high inside the region and really low between the regions, so cases are sort of staying in these bubbles. The case rate differences will be lower within the regions and higher between the regions. So whatever the cases are, we say, "You're all talking the same language within that region, but between the regions, you're doing things differently." We think the cases will be high within the regions and low between the regions. So now I'll show you which regions performed best, and whether any of these could improve upon states. We did the same tests for the states as well.

Here are the resulting regions that we got. We have our familiar states in the top left. For our commuter regions, we have 75 commuter regions. We have 33 Facebook regions. Spoiler alert: these did not perform very well at all. It is interesting that they made the West Coast one giant agglomeration, and it was kind of the loser of the team. There are the migrant regions, which tended to have very, very stark state boundaries. Our trip regions are here; these are from our GPS traces. And then there are the Twitter regions, which had a very strong east-west kind of pattern here.

The maximum modularity was found for commuter regions and trip regions. That means it was very easy to partition these networks. The "edges between" and the "edges within" here are the number of adjacency network edges that went between regions and the number of adjacency network edges that stayed within a region. EB plus EW is always going to sum to the same exact thing, and that number is going to be a bit over 8,900. We also looked at which sets of regions were similar.
We use the z-score of the Rand coefficient here, which is a nice statistic that showed us how similar our network groupings were. We found, oddly, that the Twitter regions and the migrant regions were quite similar. We don't know why this is. We think it may have something to do with the state boundaries, but we're not sure. We also used the Jaccard index for similarity, and we found that the commuter regions and the trip regions were quite similar. That makes a lot of sense, since commuters are often found in the GPS trace network.

This is the last slide I have before we look at the COVID-19 metrics. If we overlaid all these regions together, I was really interested in seeing where there was a lot of commonality and where there wasn't. For this adjacency network, each edge is allowed to have up to six hits. You get a hit if you were in the same region as your neighbor, that is, if your county and your neighbor's county were put into the same region in one of these five region sets, and then the sixth is the states. What we want is to see which neighboring counties were often placed into the same region. It turns out that the darker counties and the dark edges here mean that those two neighboring counties tended to be in the same region with each other. What's interesting is that when we see these white boundaries, it means the counties were never placed in the same region. It means there isn't a lot of flow going back and forth across these boundaries. As somebody who lived in central Pennsylvania for a long time, I can understand why this Ohio boundary is here. Maybe some of you have some really interesting anecdotes about why certain boundaries exist. I was rather curious about this Mississippi and Alabama boundary here. I really don't know what's going on there.
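Returning briefly to the similarity comparison between region sets: the following Python sketch shows the pair-counting idea behind the Rand and Jaccard measures for two partitions of the same counties. It computes the plain Rand index, not the z-score of the Rand coefficient that the study reports, and the function name and toy partitions are assumptions for illustration.

```python
from itertools import combinations

def pair_similarity(part_a, part_b):
    """Plain Rand index and Jaccard index for two partitions of the same nodes.

    part_a, part_b: dicts mapping each node to a region label.
    """
    both = only_a = only_b = neither = 0
    for u, v in combinations(sorted(part_a), 2):
        same_a = part_a[u] == part_a[v]
        same_b = part_b[u] == part_b[v]
        if same_a and same_b:
            both += 1        # pair grouped together in both partitions
        elif same_a:
            only_a += 1      # together only in the first partition
        elif same_b:
            only_b += 1      # together only in the second partition
        else:
            neither += 1     # separated in both partitions
    total = both + only_a + only_b + neither
    rand = (both + neither) / total
    grouped = both + only_a + only_b
    jaccard = both / grouped if grouped else 1.0
    return rand, jaccard

# Hypothetical region assignments for four counties, for illustration only.
commute_regions = {"A": 0, "B": 0, "C": 1, "D": 1}
trip_regions = {"A": 0, "B": 0, "C": 0, "D": 1}
rand, jaccard = pair_similarity(commute_regions, trip_regions)
```

The Rand index rewards pairs that both partitions separate as well as pairs they both group, while the Jaccard index ignores the separated pairs, so the two can rank region sets differently.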
I thought there'd be a much stronger one between Georgia and Alabama, particularly because of the time zone change here. I was wrong. Usually when you see a city that straddles county lines, you're not going to see a stark boundary. So here, for example, is where Pittsburgh is. We have an area here where Kansas City is, and you can see that the state boundary doesn't play a huge role there. So the goal of this map is just to show you, in the continental US, where some of these boundaries are, where there aren't a lot of connections between counties: Facebook connections, Twitter connections, GPS connections, migrant connections, et cetera. This also was a really curious one here between Wisconsin and Illinois, because the Chicago metro area really spans up there, and I'm not really sure why that's the case. I'm sure someone in the audience can explain that quite well. So these are our results, and I just have a couple more slides left and then I'll wrap up. What this table is showing is those statistics that we were looking for. If the column name is yellow, it means we want very small values. So we want the case rates within a region to be pretty high. And that's not because we want there to be a lot of COVID, but because we want those cases contained in the region. And I'm going to just zip right down to wave three here, because this is where the bulk of our COVID-19 cases were. For this, the commutes tended to perform quite well here, having much higher within-region case rates and lower between-region case rates. Something that did not perform well was our Facebook regions. Their case rate within a region was about 90 cases per 1,000 residents, but their rates between regions, like when you cross regions, were much higher. And so that means it didn't do a very good job of dividing up that space. The trip regions tended to perform well. The Twitter regions did not tend to perform very well.
The states did a little bit better than we thought, so they were in the middle. And then the migration regions here, they didn't tend to perform very well; they tended to perform similarly to the Twitter regions. Then the case rate differences. We want those differences to be small inside a region, and we want them to be big between regions. And we got our wish here with the commutes. The case rate differences are small among counties within a region and they're big across regions. But this was the case generally across the board. The case rate differences were pretty small within regions, and the case rate differences were high between regions. So there was no real preference here for a type of region that we developed, or for states, in this particular example. So the states performed just as well as the other regions. The second to last one here is an odds ratio. We want the odds ratio to be small as well. So this is the ratio of cases between over cases within, divided by the edges between over the edges within. It takes a little long to unpack this one, but we want the odds ratio to be small as well. And it was the smallest for the commuter data and the second smallest for the trip data. Social media data did not perform well with this. So the Facebook regions had a very high odds ratio, and the Twitter regions had a high ratio. The states and the migrants were sort of in between. The cases within the regions versus the cases between the regions tended to be very high for commutes. They tended to be rather high for migrants. They did not do well for Facebook or the states. They did okay, but not great, for the trips. And for Twitter they did very well, surprisingly well here. That means our cases tend to be within the regions and not between the regions.
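The odds ratio described above, cases between over cases within, divided by edges between over edges within, can be written out as a small function. This is only a sketch of the arithmetic as stated in the talk; how cases are attributed to "between" and "within" is not spelled out here, and the function name and all input numbers below are made up for illustration.

```python
def odds_ratio(cases_between, cases_within, edges_between, edges_within):
    """(cases between / cases within) / (edges between / edges within).

    Smaller values mean cases sit inside regions more than the share of
    boundary-crossing edges alone would suggest.
    """
    return (cases_between / cases_within) / (edges_between / edges_within)

# Hypothetical regionalization where few cases fall between regions.
contained = odds_ratio(cases_between=100, cases_within=900,
                       edges_between=2000, edges_within=6000)

# Hypothetical regionalization where cases cross boundaries in the same
# proportion as the edges do, giving a ratio of about 1.
leaky = odds_ratio(cases_between=300, cases_within=900,
                   edges_between=2000, edges_within=6000)

print(round(contained, 3), round(leaky, 3))
```

Dividing by the edge ratio is what makes regionalizations with different numbers of boundary-crossing edges comparable with one another.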
This last one, the cases within versus between, is put last on purpose, because I think this is one of our more biased statistics. We put a lot less credence in this one, and we put more credence in case rates, case rate differences, and that odds ratio, just to summarize. And I'll wrap up here. These are the results of a permutation analysis, because we wanted to find out whether the things in that table were statistically significant or not. So we permuted: we randomly reassigned the zeros and ones, the withins and betweens. We said, we don't know what kind of edge you are. We're going to pretend you're a zero. We're going to pretend you're a one. We're going to pretend you're a between. We're going to pretend you're a within. We summed up the ones. So we summed up the ones we thought were potentially within, 1,000 different times, producing a distribution of expected sums. We compared this distribution to the sums of the actual boundary crossings, the actual betweens, and we saw whether ours were statistically different or not. I'm just going to focus in on the case rates and case rate differences here. So for the commutes and the GPS traces, this asterisk is a bit of a significance indicator here. The fact that it's on the commutes, we'll hold off on that for now. And on the Facebook, we'll hold off on that for now. But it meant that there was a statistical difference for the commutes and the GPS traces. And for the case rate difference, we found that there was a statistical difference for all of them except for states. So what this means is that for commutes and GPS traces, we're pretty happy with the fact that they performed better than states. If we used these regions, they would prevent COVID spillovers way better than states. We'll move now to our conclusions and implementation. So our conclusions are the following. We used large network datasets to define well-connected regions.
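The permutation analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's R code: each adjacency edge carries some value (for example a case rate difference), the between/within labels are shuffled 1,000 times, and the observed sum over the actual between edges is compared with the shuffled sums. The one-sided p-value, the function name `permutation_test`, and the toy numbers are assumptions made for the example.

```python
import random

def permutation_test(edge_values, is_between, n_permutations=1000, seed=0):
    """Shuffle between/within labels and compare sums over 'between' edges."""
    rng = random.Random(seed)
    # Observed statistic: sum of edge values over the actual between edges.
    observed = sum(v for v, b in zip(edge_values, is_between) if b)

    labels = list(is_between)
    null_sums = []
    for _ in range(n_permutations):
        rng.shuffle(labels)  # pretend we do not know which edges are between
        null_sums.append(sum(v for v, b in zip(edge_values, labels) if b))

    # One-sided p-value (an assumption): how often a random labelling gives
    # a sum at least as large as the observed one.
    n_extreme = sum(1 for s in null_sums if s >= observed)
    p_value = (n_extreme + 1) / (n_permutations + 1)
    return observed, null_sums, p_value

# Toy data: 12 edges, 4 of them cross a region boundary and have large values.
edge_values = [0.5, 0.4, 0.6, 0.5, 0.3, 0.4, 5.0, 6.0, 5.5, 4.5, 0.2, 0.6]
is_between = [False] * 6 + [True] * 4 + [False] * 2

observed, null_sums, p_value = permutation_test(edge_values, is_between)
print(observed, len(null_sums), p_value)
```

If the regions were no better than a random split of the edges, the observed sum would fall in the middle of the shuffled distribution; a small p-value is the kind of result the asterisks in the table flag.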
We find that regions based on commuters and GPS movements perform better, in that they lack spillovers of cases, and they tend to be more homogeneous in their case rates. That's what our experiments show. We encourage the use of these types of regions, based on the commuters, based on the movement, to help administer policies for schools, businesses, social distancing, mask mandates, etc. And I'm just going to wrap up with two fun things. Since I have been working more with the lab, and I'm very lucky to do so, we always want to create sort of an interactive tool for people to use, for people to get to experiment with the data. I was going to put this in the chat a little bit earlier, and I'm a little scared to do that right now. Oh, now I see that there's some chat here. Hopefully, I will post this in the chat if you would like to see the different tools. Hi everybody. And what this does, and there may be a little bit of lag time, is let you select your own weights here. You can weight the commuter flows, you can weight the GPS trips, you can weight the migrants, et cetera. This is an old version, I'm realizing, that doesn't have the color coordination. There's a newer version, which exists somewhere, that has the color coordination. You can see that these colors are changing every time you do it. But I hope that gives you an idea that what you can do here is devise your own regions based on whether you want one aspect to be weighted higher than the other or not. Now I'll go ahead and try to find the one with the color coordination. It was a really fun computer science challenge to get all the colors to stay the same for each city. And then finally, my last slide here, second to last slide. I had the students in urban analytics look at Oregon, where there's a conservative movement called Greater Idaho, in which some conservative individuals wanted eight counties in Oregon to join Idaho. Per NPR, Oregon may be happy to get rid of these low-income counties.
Idaho may be happy to have these high-income counties, et cetera. And we were able to find from this tool here whether that sounded like a good idea or not. So thank you so much. We have a preprint up on SocArXiv. And we wanted to thank the main grant browning, Geng Chang for helping us with that. I will conclude, and thank you very much. All right. Thank you very much. Like I said, of course, a fascinating talk, as I expected. Unfortunately, we are just about out of time, so I don't know that we'll have any time for questions. But you mentioned there was an intro to urban analytics class. Do you know the course number for that offhand? Right now, it is a special topics. Yeah, so it's an 83. Okay. Hopefully it will get its own course number. All right, one... oh yes, go ahead. How did the social network data work in the project? I think that connectivity index was too smooth, and the Twitter data performed a little bit better than we thought. I know you need to head off to classes, so we can keep chit-chatting a little later on. But I want to see you there. Thank you very much, we appreciate it. And I hope everyone has a wonderful day.