Thanks, Bob. I think this is my first visit to the campus; although I'm now a frequent flyer and have been very near here very often, I never actually made it onto the campus, so it's great to be here. As Pablo says, I have a personal history of work in science, starting off very theoretically with black hole perturbations, very mathematical activities. But it turned out that the equations of relativity are pretty difficult to get anywhere with, and so you rather quickly turn to numerical simulation. And then I found there's an entirely new world of computation out there that is absolutely critical to making progress, not only in that field but in many others. Now, from my point of view as head of the Mathematical and Physical Sciences Directorate at NSF, I see this in every single subfield of the mathematical and physical sciences, as well as in the biological sciences, the geosciences, every area that we cover, including the social, behavioral, and economic sciences. Whether it's compute intensive or data intensive, or somehow a combination of them all, it's fundamentally one of the big challenges, I think, of the twenty-first century, and that's more or less the theme of my talk.

So let me start by saying times are changing, and they're changing incredibly rapidly. Even those of us who work in this field, I think, often don't quite appreciate how fast things are moving, and so part of my talk will be to motivate not only how fast they're moving, but how fast they're moving after a very, very long period of things not changing very much. There's almost a step function if you were to graph the way science is being done: how many people are involved in a collaboration, how much data they're having to deal with, how much computation. It's just gone along flat for centuries, and then suddenly it shoots up. That's ultimately my message.

I like to start off with something I know about, which is gravitational physics, and just give a one- or two-slide history lesson. If you go back almost four centuries now, to the beginning of modern science as we know it, I think many people think of Galileo as the father of modern science. In many ways he talked about things like data as important for developing new theories; he talked about reproducibility in science, which I think is now, more than ever, a serious issue that we have to think about, given the complexity of the instruments and the computing environments we're using: how would you reproduce your results? And so on. So Galileo and then Newton really got us going. But I want to talk specifically here about the culture of science at that time. Basically you have a figure, in this case very luminary figures in science, but nonetheless scientists who would work, say, with an apprentice or a student, and more or less that's the way science is still done today: you have an advisor, but in a much more complex environment. If you were to look at the kind of data that would be collected by people back in the sixteen hundreds, it's something like a few kilobytes of data that would fit in a notebook. I hear something of an echo here; I'm not sure if I've got this too close, but if it's bothering people, let me know.
So anyway, if you were to digitize the amount of data that Galileo would have had in a notebook, it would be of order kilobytes. If you look at the kind of theory he or Newton was developing, it was really driven by data: the motions of the planets had to be explained somehow, and that ultimately led to the theory of gravitation. They were of course also doing computations to check their calculations; once Newton invented the calculus, he would do computations to figure out quantitatively whether certain motions of the planets were being explained properly by the theories, and so on. But think about how much computation someone could do without a computer. If you're very smart, you can do roughly one floating point operation per second without a calculator; maybe you could manage that if you were very good. So that's more or less the starting point, I would say, in terms of data, the way data were driving theory, and computation. In terms of collaboration, knowledge was handed down person to person, and people met very slowly at conferences, traveling by horseback and so on, so things didn't move that fast. Nonetheless, there were quite revolutionary discoveries at the time, so that gives you the state of science around sixteen fifty, more or less.

Now fast forward, just in the area of gravitational physics, about three hundred fifty years or so, and you find, starting in say nineteen seventy-two, a sort of landmark theoretical calculation done by Stephen Hawking: what would happen if two black holes were to collide? There had been some revolutions, particularly by Einstein, in our understanding of the nature of space and time, and the realization that black holes, as a theoretical construct at least at the time, might possibly exist, and Hawking was thinking about what would happen if they were to collide. You might think, well, when would that ever happen? It turns out we think black holes are actually colliding quite frequently in the universe, so it's not a silly thing to be looking at. So Hawking did this, and looking at the kind of work he did, if you were to digitize it, that's a scientific visualization of the time, around nineteen seventy-two, in which one person, working more or less in isolation but collecting ideas from others, with no computer, produced what I would call a roughly fifty-kilobyte scientific visualization. So my main point is that after about three centuries, the culture of doing science hadn't really changed very much, and I think if you go across all areas of science, of course you'll find experiments taking data and so on, but it was really still at the level of the kinds of data you could put in a notebook; that hadn't changed very much.

But now go ahead about another fifteen or twenty years, and there is a calculation that was carried out by a collaboration I was involved in, in particular with a group at Washington University; this was when I was at the University of Illinois. We did Hawking's calculation, but with a supercomputer, where we had to solve Einstein's equations in full in two dimensions. At that time it was only an axisymmetric calculation, but we needed people who were experts in writing parallel code and in figuring out how to visualize the amount of data involved at this point.
Now we're up by three orders of magnitude, to fifty megabytes of data, and that was really a lot at the time, so you needed a small team to be able to do these calculations honestly, in the sense of really solving Einstein's equations as accurately as possible on a computer. Just a few years later you go from two dimensions to three dimensions, and those of you who know some relativity will know that the number of variables explodes; the fact that you are now in three dimensions and not two is another complicating factor, and so on. So at that time we did a full three-dimensional calculation of two black holes colliding, and that's a visualization of it. We needed a larger group; this was carried out in Germany, at the Albert Einstein Institute, where we had people working with us on all aspects of this. The supercomputer goes from something like a four-processor Cray Y-MP to a two-hundred-fifty-six-processor Origin 2000. Two hundred fifty-six cores: you can almost get that in a cell phone now, but at the time that was a couple-million-dollar supercomputer. But look: it's now generating fifty gigabytes of data, so in just three more years we've gone up another three orders of magnitude. It's incredible. We're up to fifty gigabytes of data, and these trends just continue on and on; I'll show you some more examples in a minute.

But I want to point out a little bit about where the community has now gone, from these individual groups or individual professors working in isolation to large consortia. In fact, Georgia Tech here is one of the leading members of something called the Einstein Toolkit consortium, which is building software environments that can use parallel computers with many thousands of cores, tens of thousands, potentially hundreds of thousands of cores, with adaptive mesh refinement, and this collaboration now has something like sixty-five members at twenty-five sites; many countries are engaged. I want to show you a visualization of a simulation that was carried out by members of this consortium, in particular by colleagues at the Albert Einstein Institute. This visualization goes into the grid structures that are simulating this pair of black holes. The point is that you want to put the boundaries very far away, and in the middle of your grid you've got two black holes that each have spin. This is now getting into the case where you can really solve Einstein's equations, which stumped the community for about a century; now you can solve them. This group is one of the leading ones in developing and applying these technologies to all manner of problems that I'll come to in a minute, but this particular one is just the equations of relativity for black holes, which means that things like hydrodynamics and other matter are not in this calculation; it's pure gravitational physics. What you're seeing here are visualizations of the black holes, and this is actually a thing called the lapse function, which shows, in a sense, how time slows down near the edge of the black hole. The calculation is about to start, and here we go: you'll see the two black holes orbiting around each other, and the color fields show you the gravitational waves that will be emitted. I'm not going to go into a lesson on why we have these, but Einstein's theory predicts that mass that is moving around
will emit gravitational waves, which have never been detected yet but may well be soon; wait for my next slide. So they spiral in and form a big black hole, and then this burst of waves comes out, and we are expecting in the next few years to actually be able to detect such waves for the first time, from events like colliding black holes. That will not only tell us that Einstein's theories were correct, assuming they were, but we will be able to work backwards from the signals we see and figure out what was actually going on in the universe millions of years ago, because it takes a while for the signal to travel across the universe to our detectors. So what do we know about the sources of waves based on what we've actually seen? We can tell, for example, that black holes exist, or that black holes just collided, or black holes and neutron stars; what was the equation of state of the inside of a neutron star? We will learn a lot about the physics by interpreting these waves. So the point is that it takes these collaborations of people to develop the software environments, to bring in the computer science, the algorithms, the parallel computing, making it efficient, doing the visualizations, coming up with adaptive mesh structures, and so on. This is way beyond what any particular science group can do, so we're trying to develop an infrastructure involving software and science and fundamental algorithms and mathematics that allows people to do these kinds of calculations. I think this is actually representative of what this community has done: there has been a major breakthrough, in that equations we were given a century ago by Einstein but couldn't solve are now able to be solved, and this is due to inventions in computing and in computational science, in the algorithms made possible by mathematicians, and so on. It just wouldn't have been possible otherwise. So I think this is a major triumph in science.

Now, we also want to see these gravitational waves I talked about: do they really exist? We've never actually seen them directly. There was a Nobel Prize in physics, if some of you followed this, for the indirect detection of gravitational waves. Basically, there's a binary neutron star system that's orbiting, kind of like those black holes were, and it's emitting gravitational waves, which means it loses energy because the waves carry it away, and its orbit decays a little bit. You can calculate, based on the characteristics of that orbit, the rate at which it must be losing energy according to Einstein's theory, and the measurement matches the calculation exactly: the system is decaying exactly as predicted by Einstein's theory. So you've inferred that there are gravitational waves, but you haven't actually seen them directly. That's not good enough; we want to detect them directly. And so at the National Science Foundation we have a project called LIGO, the Laser Interferometer Gravitational-Wave Observatory, that is aiming to detect these waves, and we've spent almost nine hundred million dollars on it to date. It's probably, I think it is, the largest project that NSF supports. I won't go into the technologies, but I'll just point out that it's an instrument, a laser interferometer, the size of a small town, and it's going to measure changes in distance that are smaller than the diameter of a proton.
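To put a rough number on that claim, here is a back-of-the-envelope estimate; the arm length and strain sensitivity are typical figures quoted for such detectors, not numbers given in the talk. With arms of length L of about 4 km and a strain sensitivity around h ~ 10^-21, the length change being measured is

\[
\Delta L \;\approx\; h\,L \;\approx\; 10^{-21} \times 4\times 10^{3}\,\mathrm{m} \;\approx\; 4\times 10^{-18}\,\mathrm{m},
\]

which is a few thousandths of the diameter of a proton (roughly 1.7 x 10^-15 m).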
So imagine an instrument the size of this campus actually measuring vibrations in space-time at a level smaller than the size of a proton; that's really quite extraordinary. But the technology is there, and so we're anticipating some major discoveries in the next few years, when we can actually detect the waves for the first time and then interpret what they're telling us. We're building an array of these detectors around the world: the U.S. has two detectors, the Germans have built one in Germany, an Italian-French collaboration has one in Italy, and everybody is poised now for these discoveries to start coming in. So what will we be seeing? We'll be seeing things like collisions of black holes and neutron stars, we'll potentially be seeing relics of the Big Bang, we'll be seeing supernova explosions, we'll be seeing who knows what. That last one is probably the most important, because this is opening up a completely new channel of observation of the universe, one we've never had before, so although we can calculate from known sources what we might see, we have no idea what really might be out there. You can expect something really revolutionary to occur in the next few years, and there will be a front-page New York Times story, I'm quite sure, when it happens.

So let's step back and put this in perspective. Just two decades ago you had single scientists working in this area, not doing much in terms of numerical simulation, not able to do experiments like this; by the way, these experiments can generate petabytes of data as well. So this field has gone from a completely mathematical theory and discipline to one that is completely dominated by large collaborations. There are over eight hundred scientists working together in the detector community to actually dig the signals out of the noise in those detectors, and, as you've seen, there are many dozens of groups around the world doing the simulations. This has completely changed the character of this field of science, from very mathematical and theoretical to very compute and data dominated, all the way across.

And that's just the beginning. Now think about having this kind of technology at your disposal: you can detect gravitational waves very soon, and you can do calculations that, for the first time after a century, solve Einstein's equations, and begin to apply all of this. It turns out you're just getting started, because I haven't yet talked about hydrodynamics, which you need to know about in a supernova or a neutron star, or about nuclear equations of state, or radiation transport, or neutrinos. All of these have to be put in as different physics ingredients to really look at things like gamma-ray bursts in the universe, so you must bring in new areas of expertise, and it all has to be integrated well in order to attack any of these complex problems. Then look at the amount of computation that has to be done. There's a book that some of you might have seen on petascale computing, edited by David Bader here, that talks about the science case for doing very large computing.
This is something that Pablo and others and I thought about for a long time: you will easily saturate a petascale computer for as long as you want on one of these calculations if you put in all the physics that you need to understand this, even with adaptive mesh refinement and downsampling as much as possible. You can still saturate these kinds of facilities, so there will be a lot more data coming out and a lot more expertise needed in order to scale things up to the level of supercomputers that literally have hundreds of thousands of cores; actually, we're getting close to a million cores in supercomputers these days. And that creates all kinds of new problems that I'll come to in just a minute.

So, stepping back, and I want to keep driving this point home: all of this happened in the last two decades, after about four centuries of not much changing in the methodology and the culture of doing science, so we really have to rethink how we're going to do this kind of science. How do people collaborate at these scales? How do we share the data that's needed? I'm going to get into that, with this as a motivator.

Let me just go to one more level of complexity, which is that in the next ten years we will have an array of instruments scattered across the globe that will all be telling us about exactly these kinds of things: gamma-ray bursts, neutron stars, black holes, and things we haven't yet discovered going on out there. This is now more about astronomy, but it has to connect into the other physics areas I was talking about. Astronomy, basically since Galileo came up with the telescope, has been: let's look at something, let's record it, let's analyze it; then, once it was possible to take a photographic picture, you have a plate, you digitize it, and so on. That is completely changing. We will have instruments that are always scanning the skies and, in real time, saying: we've seen something in this channel; have you seen anything in gravitational waves, has it been seen in neutrinos, can you do a calculation? And how do you mobilize all of this when each of these instruments can be generating potentially petabytes of data, across communities who never really talk to each other? You have people who do neutrinos and people who do optical astronomy; they may occasionally talk to each other, but you can't just mobilize them to get in one room and try to calculate what's going on in this one thing they've just seen. So you have to develop policies that allow people to share data and information across communities, and that's another one of the challenges we have.

So let me go through a couple of the things that are being deployed now, what we have in place right now. We have optical networks that are beginning to move data around at tremendous rates, say ten gigabits a second. That's basically nothing compared to the data rates that will be coming out of these instruments quite soon. Here are two instruments. One is called the Atacama Large Millimeter Array, which is here in the Chilean desert, and it's a beautiful site; I've been there. They've just started operating with sixteen radio antennas, and they will have roughly sixty when they're finished.
It's about a billion dollars in infrastructure, and it's going to give us an unprecedented view of the universe in the radio spectrum, particularly at millimeter wavelengths. And up here we have the Expanded Very Large Array in New Mexico. These instruments are just now coming online, and they're going to be generating all of this data. We have at the South Pole a thing called IceCube, which of course Georgia Tech is engaged in as well; it has just been completed, and it will be measuring neutrino events from exotic activities in the galaxy but also potentially from far across the universe, depending on what kind of physics is actually going on out there. So a lot of data are being generated there, and once a signal comes in from the neutrinos, you can point back and say it had to be coming from that direction: please look in the radio and see what you've seen there. At the same time we have other activities, very large optical telescopes. These are being planned now, at the level of thirty to forty meters across; they're actually segmented, so the total collecting area is equivalent to a thirty- or forty-meter-diameter telescope, which is well beyond where we are now; typically eight meters is the largest telescope being built today. These are completely digital instruments; they're not taking photographs anymore, they're taking high-resolution digital images. There are the gravitational wave observatories I just talked about that are coming online; the so-called Advanced LIGO is under construction now, and it will have the sensitivity that we believe will see the first gravitational waves. It turns out that if we can put one of these detectors elsewhere on the planet, and NSF does have detectors being built, if we could place one of those gravitational wave detectors in Australia or India or somewhere far away, we'll be able to tell where on the sky the signal is coming from; it's a sort of triangulation argument, and without that we can't really tell. If we're able to do that, we'll be able to correlate gravitational wave signals from things like exploding stars or black hole collisions with optical, radio, and neutrino observations, and so on.

So it just keeps going, and it culminates in a thing called the Large Synoptic Survey Telescope, which is being planned now; in fact parts of it are being built already, and if all goes as planned it may be operating later this decade. It will again be on a similar site, on a mountaintop in the Chilean desert, and it will take a picture of the universe in a very different way from the way current telescopes operate. Some operate a bit like this already, but basically it points in one direction and takes a very high resolution picture, equivalent to fifteen hundred high-definition television images stitched together, and it does this in about fifteen seconds; then it moves a little bit, takes another, moves a little bit more, and scans the sky. It takes about three days to scan the sky, and it can see about half the sky from its location, basically the southern half, so a good portion of the universe. Every three days it scans the entire sky, and then it repeats, so over ten years we'll have a movie of the entire southern half of the universe.
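Just to connect those figures, here is a rough estimate of the nightly data volume implied by them; the bytes-per-pixel figure and the hours of observing per night are my own assumptions, not numbers from the talk:

```python
# Rough estimate of nightly raw data from the figures in the talk:
# each exposure ~ 1,500 HD frames stitched together, one exposure every ~15-20 seconds.
# Assumed (not from the talk): 2 bytes per pixel, ~8 hours of observing per night.
hd_pixels = 1920 * 1080                      # pixels in one HD frame
pixels_per_exposure = 1500 * hd_pixels       # ~3.1 gigapixels per exposure
bytes_per_exposure = pixels_per_exposure * 2 # ~6 GB of raw pixels per exposure
exposures_per_night = 8 * 3600 // 20         # ~1,440 exposures per night
terabytes_per_night = bytes_per_exposure * exposures_per_night / 1e12
print(f"~{terabytes_per_night:.0f} TB of raw pixels per night")  # roughly 9 TB
```

Even this conservative raw-pixel estimate lands within an order of magnitude of the "decade of Sloan data every night" comparison that comes next, before any processed data products are counted.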
It will be seen at a resolution and depth that have never been possible before, and of course that means they're going to see millions of events every single night with this telescope. No human is ever going to be able to look at all of those events; we need machine learning techniques, statistical techniques, all kinds of mathematics that may need to be developed in order to really dig into the data that are going to be generated. And by the way, it's sitting on top of a mountain in South America, so we have to get that data back and distributed to colleagues all around the world.

Just by comparison, there's a thing called the Sloan Digital Sky Survey that has been operating for about a decade. It points in one direction, collecting information in a few places, but it is not as comprehensive as what the LSST will do, and over a decade it collected forty terabytes of data. If you think about it, that's really a lot of data to worry about; the LSST will generate that much data every single night. So that's a decade of observing compressed into a single night, and you see this coming up in every area; I'll get to that in a minute. The data influx is so tremendous, and it's coming in so many areas of science, from biology to astronomy to high energy physics and so on. I won't even say much about this, but there's a thing called the Square Kilometre Array, planned a little further into the future, that will have three thousand radio antennas, potentially scattered across the globe but probably sited in either Australia or South Africa, and it will be generating exabytes of data, factors of thousands and potentially millions beyond even the LSST. It will be incredible. And at the same time, to come back to simulation, remember that we'll be observing through all these different channels, this multi-messenger astronomy activity, and we have to somehow coordinate all of these different things: move all this data around, make it available to the community, and compare with simulations that people are doing, say, on a large supercomputer. So this is a totally new world in the way science is being done, in astronomy in particular, and I think the activities here at Georgia Tech are really at the vanguard of what's possible; this is going to be a very exciting area.

In order to do this, of course, it's going to require integration across disciplines and end-to-end capabilities, because if any of you are working in this area, or would like to, you're going to be doing it typically from your desk, and maybe from your mobile device, and so you need to have access to all of this somehow. How do we make that possible? It will have to be done through the sharing of data. Ultimately, scientists are collaborating now through sharing data: when you send an e-mail to somebody, maybe with a manuscript, that's basically exchanging data, so in every way I think you can boil collaboration down to the sharing of data.

So here is Hurricane Katrina. I was living in Baton Rouge at the time, and I experienced it from a fairly safe distance, although it was actually quite damaging in Baton Rouge as well, though not as much as in New Orleans, of course. The black dots indicate the actual locations of Hurricane Katrina
at the measured times. This is about five days out, and this is getting into about three days out, where three days from now it's going to hit, and you can see at this time the models: these are the tracks of computer projections of where the hurricane will go. Early on we had no idea; it really could have gone anywhere, but the tracks eventually converge, and we knew it would hit the New Orleans area. But in order to get this kind of emergency forecasting done in a way that's useful, we have to combine these models of the atmosphere, so that at this time, within three days, we know it's almost certainly going to hit the New Orleans area. And that's just the atmospheric modeling. We have to integrate observations from the satellites watching this as initial conditions for each of these runs, as well as information about local conditions where the hurricane is, from sensors across the Gulf of Mexico, oil platforms; all of this needs to be integrated and put into the models. Each of the models says something different: at this stage some were going here, some to New Orleans, some even to Texas. Then, if you want to calculate the storm surge, which is really the most devastating part of a hurricane in these coastal regions (the winds are important, but the storm surge is much more devastating), it turns out that a different set of communities actually works on storm surge modeling. They have to couple their storm surge models to the atmospheric models: they take output from every one of these atmospheric models and use it as input to different storm surge models. But it's more complicated, because you have different storm surge models depending on whether the hurricane is heading into the Florida Panhandle, the New Orleans area, or the Texas coast. So there is this dynamic, data-driven activity where you must say, algorithmically: I need to run all of these different scenarios, then compare with satellite observations and see where the hurricane is actually going relative to my models, to get the best information about what to project, in terms of where people should put supplies and so on, and the chain keeps going. You need to go beyond storm surge to how the levees will actually respond when a surge occurs in a particular area, where flooding will occur, where supplies should go, and how people will respond as well. It turns out there's very little public transportation; they have to reverse the highways so people can drive out; do people have cars? You really need to know all of this in order to make reasonable projections of what action we should take as a government or as a city agency. So this is the kind of thing where you'd like to be able to assemble all of these communities, from satellite and NASA-type observations to fluid dynamicists to social, behavioral, and economic scientists, but you can't do that over a period of two decades; you need to do it in a period of five days. Ultimately we need to think about the collaborative environments, the algorithms, the cyberinfrastructure, and the policies that allow people to share things, so that we can do these kinds of calculations in short order.
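To make that workflow concrete, here is a minimal sketch of the kind of pipeline being described: run an ensemble of atmospheric scenarios, route each projected track to a region-appropriate surge model, and weight the scenarios by how well they match the latest observations. All of the names and numbers are hypothetical placeholders, not any particular forecasting system's API.

```python
# Toy sketch of a dynamic, data-driven ensemble workflow; all names are hypothetical.

def pick_surge_model(landfall_region):
    """Different coastal regions are handled by different surge models."""
    surge_models = {
        "new_orleans": "nola_surge_grid",
        "florida_panhandle": "panhandle_surge_grid",
        "texas": "texas_surge_grid",
    }
    return surge_models.get(landfall_region, "generic_gulf_surge_grid")

def run_ensemble(atmospheric_runs, observations):
    """atmospheric_runs: list of dicts with a projected 'track' and a 'landfall_region'."""
    scenarios = []
    for run in atmospheric_runs:
        surge_model = pick_surge_model(run["landfall_region"])     # couple atmosphere -> surge
        # Weight each scenario by how close its track is to the latest observed position.
        error = abs(run["track"][-1] - observations["current_position"])
        scenarios.append({"surge_model": surge_model, "weight": 1.0 / (1.0 + error), **run})
    # Planners look at the highest-weight scenarios first; re-run as new data arrive.
    return sorted(scenarios, key=lambda s: s["weight"], reverse=True)

# Example with made-up positions along each projected path:
runs = [{"landfall_region": "new_orleans", "track": [250, 180, 90, 30]},
        {"landfall_region": "texas", "track": [250, 200, 150, 110]}]
print(run_ensemble(runs, {"current_position": 35})[0]["surge_model"])
```

The point of the sketch is the coupling and the re-weighting loop, not the toy numbers: each new batch of observations re-ranks the scenarios, which is what lets disparate modeling communities contribute to a single five-day decision process.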
So this is what I refer to as grand challenge communities. The idea is that it's not just a grand challenge problem like the black hole grand challenge that Pablo and I used to work on; that was one problem that allowed eight universities to work together, but at least they were in the same communities, basically physics and computer science, fairly closely connected. In this case you have very disparate communities that need to work together on the real problems we're facing in science these days. That's one of the main messages here: aside from the computing and the cyberinfrastructure, how do we get collaborations to occur across these very different kinds of disciplines? They require more than just teams; they require communities of people who will never actually be able to work directly together. These emergency forecasting scenarios are very multi-disciplinary, and of course you can replace hurricanes with tornadoes or flu outbreaks and so on. There's real-time activity now where people are thinking about gamma-ray bursts or supernovae, time-domain astronomy: how do you integrate these different communities? Those are just a couple of examples. I'll repeat the statement I made before, that these communities really only work together, really only can work together, by sharing data, and this places requirements not only on the software, the networks, and the collaborative environments, but of course on scientific reproducibility: in these very complex environments, with so many communities contributing something, how do we make sure we could reproduce a calculation? I'm quite worried about that. What are the right university structures for this? Promotion and tenure doesn't go to a community, it goes to an individual, so how do you handle that? And what is a publication in this environment: is a dataset a publication? I'll come back to that in a few minutes, because I want to talk about how we're thinking about the future of science and engineering and the policy we need to put in place to support it.

People are beginning to experiment with this. This is some work done in particular by Erik Schnetter, who is now at the Perimeter Institute in Canada, figuring out how you can broadcast information, say from your simulation code to your collaboration, using things like Twitter. In this case you can broadcast status updates by Twitter: if you care to know whether your black holes have collided or not in your simulation, you can do that, and so can your collaboration, and so can people out in the community; school kids could be following your simulations on Twitter if they wanted to (a minimal sketch of such a broadcast hook follows below). Or you can use social networking technologies like Flickr: most people put pictures of their babies up there, but some people put up their black hole simulations, and so you can use these technologies to share information across communities. Those are just a couple of examples of where people are beginning to go with this. And here's a message that I've learned very much working at the National Science Foundation: a really critical piece of all of this is understanding the social, behavioral, and economic sciences, which are going to be crucial in helping us understand these issues of collaboration as well as policy, and so I think it's very important to include them and their research activities.
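As a concrete illustration of the broadcasting idea mentioned above, here is a minimal sketch of a status hook a simulation loop might call; the posting function is a stand-in for whatever Twitter or Flickr client library you actually use, and the milestone names are hypothetical.

```python
# Minimal sketch of broadcasting simulation milestones to a collaboration feed.
# post_update() is a placeholder; swap in an API call to your service of choice.
def post_update(message):
    print(f"[broadcast] {message}")

def report_milestones(step, sim_time, separation, merged):
    """Call this from the main evolution loop at checkpoint intervals."""
    if step % 1000 == 0:
        post_update(f"t={sim_time:.1f}M: black hole separation {separation:.2f}M after {step} steps")
    if merged:
        post_update(f"Black holes merged at t={sim_time:.1f}M -- common horizon found!")

# Example use inside a (hypothetical) evolution loop:
# report_milestones(step, current_time, bh_separation(), horizon_finder.merged())
```

The design point is simply that the simulation pushes short, human-readable events to an open channel, so collaborators, and anyone else following along, see progress without logging into the machine.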
OK, so I'm going to step back and change gears here, but I want to point out that although I've focused on environmental sciences and astrophysics, it's really everywhere: it's high energy physics, it's biology, it's geosciences. This is a nice pictorial representation of EarthCube, which you might have heard about if you're in the geosciences; there's a Dear Colleague Letter about developing services for large-scale data in environmental and earth sciences and finding ways to let people collaborate. And I want to point out something: many of you working in the biological sciences, particularly in DNA sequencing, will know that there are DNA sequencers out there that can theoretically generate data at the rate of a terabyte per minute. Now, most of that is not data you keep and use, but that's the theoretical capability now, and if you were to aggregate it up, that's basically equivalent to a Large Hadron Collider, except there's only one of those, in a central place on the globe, whereas there are versions of these sequencers in many, many of your labs; there are probably quite a few just on this campus. So it's not just the singular experiments: individuals are able to generate data sets at rates that are simply unprecedented.

So let me back up and recapitulate: things are changing dramatically, and I've tried to point out that this has happened in the last couple of decades, and most intensely in the last decade alone, after many centuries of things not changing very much. If you look at how much has changed, it's really a factor of ten to the ninth to ten to the twelfth in terms of the amount of data people are collecting, or computing capability, and the collaboration capabilities needed are at even higher levels than ever before. And so there have been all these reports, in Science, Nature, et cetera. When I started at NSF there was this kind of over-the-top article in Wired magazine that said science is now over, which I thought was amusing: my first month at the National Science Foundation, and that's on the cover of a magazine. But it says welcome to the petabyte age, and the point is that the way we do science is never going to be the same. Somewhat less alarmingly, there's a book edited by Microsoft on data-intensive science, with scientists from around the world talking about what they see for their fields, called The Fourth Paradigm, and I just want to draw attention to it; I think it's very important. So for the grad students out there: although this may sound scary, think of the advantage it gives you. We older people still think the old way, because we were brought up in the tradition of those four centuries; that is, people like me who are gray-haired and no longer students. But things have changed so much that we can't adequately deal with this through an incremental approach; we have to think differently about how we're going to support science and engineering. So, students, you should note that you are something like ten to the ninth or ten to the twelfth times more powerful than your professors were when they were your age, and you have a really great, exciting time ahead of you. Just be nice to your professors, because they still know some things.
So this creates really serious crises, I think, and I want to highlight a few of them without going into any depth. There's a crisis on every side, and my point is just that we're not on top of it: we have all of these things to worry about, whether it's computing, data, software, networks, education, organizational structures, instruments and facilities. We're grappling with all of these at NSF now as we think about what we'll be funding in the future, and I'll just point out a few of them. In terms of single computers, your cell phones now have many cores, and your supercomputers have hundreds of thousands, going up to millions, of cores, and there are lots of new technologies coming out. But I want to point out one thing. This is the number of processor units as a function of year, and a few years ago there's kind of a step function, from thousands to hundreds of thousands. It's been pointed out that there are as many cores in a supercomputer now as there were transistors in the original Motorola 68000 chip; that really dates me, that's what the Macintosh came out with. The point is, you would never have thought of programming your Macintosh at the level of sending a single bit from one transistor to another, but that's the equivalent of what you have to think about in a supercomputer these days with that many cores. It's the same scale of problem, and of course that drives thinking about higher-level abstractions and programming languages and so on. There's also fault tolerance to work on: on any single computer something is always breaking, but on one that has a million processing units, something's going to be failing at a pretty high rate, so you have to think about fault tolerance. There are many computer science challenges just at the level of how we will use these new computing facilities.

I'll talk a little bit more about data as we go forward, but it turns out that we're now generating more data every year than we have in all of human history. And software: none of this stuff works without software, and it's no longer adequate, as my advisor did, to hand someone a yellow pad of equations and say, go code these up on a supercomputer. We just can't do that anymore. Software environments are getting to be millions of lines or more; for many fields even that is almost laughably low, and there are many bugs in these things, so we have to start treating software as a first-class citizen. At NSF we now have a new program called SI², for software institutes and innovators, and the idea is that we're going to be supporting software at a level that's on par with our investments in hardware and supercomputers. That's very important. It's a step that's just started, but I can imagine we'll be investing hundreds of millions of dollars over the next decade in the development of software.

OK, so this is the graph I wanted to show you about the evolution of data, which I got from Chris Johnson at the University of Utah. This line is the amount of data you can store in your brain in a year; it's basically flat at zero, and it's the only line that's declining in time, I think.
This line is all the documents that have been digitized over human history, over forty thousand years, and this line shows how much unique data, estimated in some way I'm not sure of, we generate every year. And that's the crossover point: a few years ago this crossover happened, and we are now generating more unique data every year than has ever been generated in all of history up to now. That's an extraordinary fact to grapple with, and it means something has changed very fundamentally in the last few years. So what do we do about it? Well, there are lots and lots of reports that say there's a crisis, but we're still not quite sure what to do, so I'll leave that for now; we're only beginning to really face it, and it's going to require a lot of new developments in mathematics, statistics, algorithms, cyberinfrastructure, the ability to store all of this, et cetera, and policy.

Another crisis: how do we organize ourselves for these multidisciplinary activities? There have been many, many reports from federal committees and advisory committees. This is one from two thousand five, and if you don't know it I would encourage you to Google the PITAC report, the President's Information Technology Advisory Committee, so it's advice to the President of the United States about computational science. These points are from the executive summary of that report. They say, first of all, that universities have got to significantly change their organizational structures if we're going to remain competitive in science, and also that federal agencies have to think very hard about how to fund this in a better way. This has been one of the things I've been focusing on a lot at NSF, and I've had some impact on it, but it's going to take some years to really get right, so I encourage you all to think hard about this and to speak out about it, because I think it's one of the challenges of our time. And lastly, I would ask: how do we educate in these environments? They are running away from us and getting incredibly complex, so how do we develop a workforce that knows how to work with the new methodologies that are emerging on very rapid timescales, and how do we help universities transition to this?

So I'm going to summarize this and then move on to another topic in a minute; I'm kind of winding down here. This is my cartoon version of the national cyberinfrastructure, with people on campuses. We have these large-scale investments in supercomputers, thirty million dollars apiece for a Track 2 machine, and that's just for the hardware; Georgia Tech again plays a major role in one of these, with the Keeneland award connected to the Oak Ridge facility, and then there's the Track 1 facility being developed at the University of Illinois, a petascale machine, two hundred million dollars just for the hardware. There's all of this activity, plus data networks and so on, and then, again, on an individual campus you've got individual DNA sequencers generating as much data as a supercomputer that cost millions of dollars. All of these facilities, as people like to say now, are more silicon than steel, so they're generating all of this data. What do we do?
And now you have a graduate student who says, I don't just need to use this one machine; I may need some data from each of these things in order to do my particular science project. So how do we educate people? I think we have to think deeply about that in this new environment. And again, I've said this a couple of times, but with science and these environments becoming so complex, how do we reproduce science? If someone publishes a paper, let's say Pablo publishes a paper, and somebody says, I want to see the data behind your publication, from your code, you might have to think twice about how to do that. This is the kind of thing we're really having to think about, and in fact Science magazine now has a new policy that you have to agree to provide the data used to derive your results if you're asked for them. I asked them at a Science Board meeting, just to be provocative: if there's a result published in Science on, say, a limit on the Higgs boson, could I ask for the LHC data? And the answer was yes. So there are things like this we're going to have to grapple with: how are we going to do this kind of verification and sharing of data? It's not traditional, but I think it's ultimately going to be required for us to move forward in this environment.

All right, recommendations. I've been highlighting all these crises, but I want to point out that, particularly within the area of cyberinfrastructure, when I was leading the Office of Cyberinfrastructure we had a number of task forces, six of them, and some of you participated in them. I would encourage you to look through their reports, because a lot of work was done by the community, with many, many recommendations that are not just for NSF but for the community as well. We're taking them seriously at NSF. Here are three that I've highlighted. One is that permanent programmatic activities in computational and data-enabled science and engineering, or CDS&E, should be established within NSF, and that in a sense was the fundamental recommendation that came out of all of these; I'll just point out that you're doing that here, creating a new CSE program, which I think is quite exciting, and there are a number of them around the country, but this one seems to be organized in a particularly effective way. Another is that NSF needs to establish processes to collect requirements and long-term software road maps; we're trying to do that now. And we have to think much more about interdisciplinary research, and particularly about how we broaden participation in the science community. This is another area I discussed with people on campus today: as an engineering-oriented school, women are not highly represented, to put it mildly, in these disciplines. We have to face this seriously, and if you look ahead, the demographics are showing
that the high school class of twenty twenty-eight, which was just born last year, will be the first class in which whites are a minority, and so we have to think very hard not just about the balance of women and men but also about cultural balance, if we're going to have enough scientists and engineers to do this kind of work. So we really have to think hard about broadening participation.

All right, actions. We have a budget request, and I've learned how exciting a budget request to Congress can be, and how meaningless, because you ask Congress for something and they have not actually provided a budget for FY12 yet. But we did request an NSF budget of seven point seven billion dollars, and there were two very large activities that cut across every single unit: one called SEES, science, engineering, and education for sustainability, with nearly a billion dollars of our request dedicated to it, and CIF21, cyberinfrastructure framework for twenty-first century science and engineering. Although the budget request has not come back with the budgets we asked for, we are nonetheless moving forward as best we can, and within the CIF21 activities we are trying, within math and physical sciences, to launch something like a twenty-million-dollar activity in FY12, which we've just started, where we're thinking about how to network people together, particularly focusing on data-enabled science, looking at new computational resources, experimenting with things like GPU clusters, but particularly as applied to science, and then access and connection to resources: we have to think hard about how you get access, from the campus to your desk, to the facilities we're putting out there.

So in the last few minutes, let me say a few words about data-enabled science. The way we're seeing it, there are three levels of data-enabled science at NSF that we're trying to develop. The first is providing the data itself; this is on the cyberinfrastructure side, providing services that serve data up to the community, whether it's the physics community, chemistry, you name it. We need to find ways to do this and to curate data over long periods of time, and we're thinking hard about how. Then there's data analysis and tools, the information layer: what are the algorithms, how do you visualize the data, what about the mathematics and statistics and so on. Many new things will be happening because of the data influx, new developments in fundamental mathematics as well as applied mathematics, et cetera. And the third layer is the data-intensive sciences themselves, where things get differentiated when you apply them specifically to astronomy, physics, and so on. We're trying to ramp up in this way. For example, in my part of NSF we talk mostly about some of these areas, and a little bit about others, particularly through the Division of Mathematical Sciences, while the Office of Cyberinfrastructure focuses more on the top part, and we're trying to figure out how to pull all of this together.
Let me skip this particular slide, but say that there are many fundamental issues around data. How do we remove boundaries between disciplines? How do we incentivize sharing? I will tell you that every single project that comes before the National Science Board now is asked: what is your policy for sharing data? So increasingly you're going to be asked to share your data; it's coming right now. We have a new policy, and it doesn't yet require it, but more and more there will be incentives, and in some cases requirements, to actually share data. And how do we attribute credit to people who are developing data? We're debating, for example, right now at NSF: if you apply for a grant at NSF, you're used to having to provide your five most important publications relevant to your proposal. We're probably going to change that to the five most important products of the research supported by your previous grants, and those products can include peer-reviewed publications, citable datasets, citable software environments, and so on. So we're trying to raise the bar a little, to provide incentives, because this stuff is important and you need to get intellectual credit for it.

I'm going to conclude with a couple of fundamental points on what we're actually doing in data policy at NSF. I'll make three statements that I think people would have a hard time arguing with: publicly funded scientific data, funded by your tax dollars, ultimately should be available; the publications you produce should be available; and science and the public will benefit. It's sort of one statement, but I think it's true. And there has to be a place to put all this, obviously, if it's going to be served and available to the community; the author's website is probably not the best place, though at least for now that's what people do in some cases. And there needs to be an affordable, sustainable cost model. That's where we're really having trouble, because this costs a lot of money; publishing costs a lot of money, so we have to find the right cost model that makes things available while maintaining a viable business model.

Then you begin to ask questions. What data actually have to be shared? If you're a particle physicist at the Large Hadron Collider, you will say that we don't share our data except within our collaboration, which might have ten thousand scientists, but beyond that it's not shared, because there's a competition; they spent twenty years preparing the experiments and somebody wants to win the Nobel Prize. On the other hand, there's this pressure that it's a public good to provide the data. So which data have to be made available? How much will it cost? How much effort? A lot of questions are asked, and we haven't quite figured out at what stage data have to be provided, how much of it, and where it is placed: in a library? Does NSF provide the facility? Who pays for it? These are questions we're struggling to answer right now. How long is it made available? If Galileo had had an NSF
grant, that grant has long since expired, but we'd still probably like to see the data, right? So who's going to pay for that? The thing we've come up with, after thinking about this for years, is that there is great variability in requirements, and so peer review can help guide the process. That's currently our fundamental principle, and so what we've got is a new policy at NSF that says: if you are submitting a proposal, you must include a data management plan, or your proposal will not even be looked at. That started last January, so we're almost a year into it now, and it can be a two-page supplementary document to the existing proposal. It is not intended as yet another requirement; we are trying to grapple with what we see as trends in science, and we want the community to say, this is what I propose for my data, and your peer reviewers will look at that and comment on it. At the beginning we just want to get the process started, and from that we will derive some requirements on the actual maintenance of the data, how much of it needs to be available, and so on, as we go forward. So that's where we are.

Let me conclude, then, by saying that I hope it's clear to everybody that things are changing dramatically, and that I think we need a comprehensive approach to the problems of twenty-first century science and engineering. It's not just supercomputing, it's not just mathematical PDEs, it's not just data-intensive science or statistics; it's somehow all of this, and that's what's needed for almost any of the complex problems people are looking at in today's world. Among those, the newest thing that's really getting people's attention is the data-intensive sciences, and I was struck by that when I came into math and physical sciences: every single division is grappling with it, whether it's astronomy, physics, materials, or chemistry, and somehow universities have to rethink their academic structures and their curricula to deal with it. And we're happy to help you do that as we do it ourselves. So that's my presentation. Thanks.

[Audience question.] That's a major complicating factor in a lot of ways, and what I'll tell you might not surprise you. I deal a lot with international collaborations, because many of these projects, for example the ALMA radio astronomy telescope, are funded by many countries around the world; NSF's share is only about a third or a quarter, I think it's a third or something like that. Just last week I met with people at the European Commission and also at CERN, and the topic that always comes up is data: how do we share it, what are our policies, how do we work collaboratively? But it's still very hard for any one nation, even the U.S., to say you have to share all of your data when, for example at CERN, NSF is actually a minor player, so we really can't do that. We're trying to work together collaboratively, and it happens at all levels, from an individual PI to the heads of funding agencies, even at the diplomatic level. For example, in Antarctica there are treaties:
If you're going to do science in Antarctica, you have to agree to share the data that come out of it within a certain period. So all of these things are moving in the direction of more sharing and openness, but it's still very difficult to figure out how to do it.

Right. Well, who's supposed to do that? Everybody will agree it should be done - of course you might argue about which data are really worth curating, but that's another matter. I think that ultimately we need to develop a national, and in the global sense an international, data infrastructure. That doesn't mean we have, say, five centers where you have to place your data, but something federated across campuses, regional facilities, and centers that are perhaps discipline-specific. For example, the National Center for Atmospheric Research, NCAR, focuses a lot on curating data in the geosciences and environmental sciences; in fact they just have a new data center in Wyoming, and that's one of the aspects they're working on. And increasingly you will be able to propose, as part of your proposals to NSF and so on, that some fraction of your funding go toward curation of the data, and that will be an allowable cost. But we'll have to figure out the model that makes that possible, and there will have to be a lot of judgment calls about which data really need to be saved and curated. So we'll just kind of stumble our way through on that one, but it's a tough one.

I didn't take the time for it here, but sometimes when I give this talk I point out that I did my PhD thesis at Yale, where they have a Gutenberg Bible from fourteen fifty or so, and it's beautiful and you can still read it, while the PhD thesis I wrote is on a disk that cannot be read. So librarians are the one community that has really figured out how to keep things for a long time, and they also see, I think, that their future is increasingly digital, or at least a venue for collaboration. In fact we've had a number of workshops at NSF on digital data, and the library community has been there; for example, the Johns Hopkins library is actually one of the curators of the sky survey data, so they keep the forty terabytes and they're finding ways to serve it up to the community. So I think that's a really good point, and it's something we're working on too. Some libraries are much more forward-thinking in this area than others, but I think that's the future of libraries.

I would say the answer is no, not yet. There are communities with best practices, where they've already coalesced around a particular model for this, and there are community norms, but they vary a lot from community to community right now. So that's one of the issues. When we announced the data management plan, somebody sent me a very funny, provocative e-mail that said: OK, NSF, I will comply with this - I will provide all of my data; every student puts it on a different disk, in its own format, that can't be read by anyone aside from that student; and then I'm in compliance, right?
So we're not actually telling anyone right now that they have to store their data or serve it to anyone - we might do that in a particular solicitation for which there is a well-established community culture around doing so - but typically, right now, we're just saying we want a data management plan. It could be that you say, I don't have any data that anyone would really be interested in. You might say that, but your peer reviewers are going to look at it and say, maybe that's true, or maybe it's not. So we want to collect some information on what the community really thinks, and this is one way to do it right now.

We're doing that - in fact it will be less than five years; we're definitely already doing data mining. The interesting thing is that we've done some data mining of our own proposals, and it turns out that two years ago words like "data" and "data management" appeared in a small fraction of proposals, and now they appear in a very high fraction, so at least the community is beginning to see that this is an issue.

And you were saying, do we differentiate between science for scientists and science for the public? Right. I think that's true, but it's also true - let's just take what's going on in astronomy - that by orders of magnitude there are not enough astronomers to look at all the data out there, and our algorithms, no matter how clever, are going to miss something. So by making it available you could imagine, say, every first-grade classroom taking some degree of the sky and kind of watching it, and you can imagine all kinds of things like that happening. We've been talking a lot about the opportunities for education - rethinking what education is all about if you serve data up to the public, as opposed to just the science community, which I think would already be a big step forward. And since many states are now actually moving from paper to electronic textbooks, with iPads or Kindles and such things, you can imagine developing new kinds of teaching around those: not just technology adoption, but really rethinking the teaching of science around them, and also providing a bit more inspiration, so you don't just have a picture that comes from a neutrino observatory, say, but you actually have the real data from the observatory as it's coming in. You can actually be measuring things with it, or connecting it to simulations, and so on. There are a lot of things that could be done in the area of citizen science that I think would change the way education is done. But back to the original point: I think right now the main thing is making sure that the science is available - that the data are available, since data are increasingly the primary output of science these days - and making it available across communities.
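To make concrete the kind of term counting behind the proposal mining mentioned a moment ago - and the most basic form of making any text corpus searchable - here is a minimal sketch. It is an illustration only, not anything NSF actually runs: the directory layout (plain-text files under proposals/<year>/) and the search terms are hypothetical stand-ins.

    # Minimal sketch: fraction of proposal files per year that mention any of
    # the given terms. Layout and terms are hypothetical, for illustration only.
    from pathlib import Path

    TERMS = ("data management", "data sharing", "metadata")

    def fraction_mentioning(proposal_dir: str, terms=TERMS) -> dict:
        """Return {year: fraction of .txt files mentioning at least one term}."""
        fractions = {}
        for year_dir in sorted(Path(proposal_dir).iterdir()):
            if not year_dir.is_dir():
                continue
            files = list(year_dir.glob("*.txt"))
            if not files:
                continue
            hits = sum(
                1
                for f in files
                if any(t in f.read_text(errors="ignore").lower() for t in terms)
            )
            fractions[year_dir.name] = hits / len(files)
        return fractions

    if __name__ == "__main__":
        # e.g. {'2009': 0.12, '2011': 0.78} -- made-up numbers
        print(fraction_mentioning("proposals"))

Even a crude count like this is enough to show a term moving from a small fraction of proposals to a large one from year to year.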
But then there's also making it searchable. I think there's a lot of knowledge locked up in the literature. I once had a paper I was really excited about - I got a Physical Review Letter out of it - and it turned out that a piece of it I thought was brilliant had been very well known to the applied mathematics community for a long time. If I had had a search engine that could help me find that knowledge, in a part of the literature that I never really accessed or understood, I might have found it. And I can imagine, in time, as search engines become much more sophisticated - so that they can actually follow arguments between different bodies of literature, for example - that you could really accelerate interdisciplinary science in a much more integrated way. That means the publications have to be open as well.

I would say - so, Congress is interested in this, at the level of staffers talking to us, and sometimes I've had to testify before Congress. There are questions like this; there are metrics people are putting in place to see, for example, whether the scientific output from some of these facilities is as high as you would want it to be, whether the breakthroughs are there, and so on. So there is increasing pressure to show demonstrated results from investments of all kinds, not just cyberinfrastructure but any program; increasingly we're being asked to provide metrics so we can assess whether or not we've been successful, and cyberinfrastructure is certainly one of those.

I think it's an issue that people were somehow not as mindful of when we really began using computers a couple of decades ago, and now it's something people are thinking about much more. I think we had some pretty bad hiccups in the beginning. I hope we will just keep moving on it; especially as libraries become digital, it's going to become more of an imperative to make sure, as we move from technology to technology, that we really address this as it's being developed. So that's all I can say about it. Thanks.