Today we're very happy to have with us Professor Eric de Sturler from Virginia Tech. I've known Eric for a very long time, I won't say how long, but I was actually a student while he was a fresh graduate doing, in my opinion then and in my opinion today as well, very cool things in many areas, including this area of subspace recycling. Eric received his Ph.D. from Delft University of Technology and after that spent five years in a research position. In 1997 he was one of the prize winners of the Fox Prize in numerical analysis, one of the things I was very jealous of him about; of course I had graduated by that time. He then joined the faculty of the Department of Computer Science at the University of Illinois Urbana-Champaign, and since 2006 he has been with the math department at Virginia Tech. He is of course involved in many activities, including organizing conferences, serving on their program committees, and serving on the editorial boards of many journals, including a journal of numerical analysis. Please help me welcome Eric.

Thank you. All right, so the topic of the talk is sequences of problems and sequences of solutions, and I'm going to focus on solving sequences of linear systems. The main idea is that when we do simulations, or when we optimize, it turns out that in many cases we have to solve a long sequence of problems, and it pays off quite a bit to try to do something smarter than solving all of them as if they were individual problems.

This has been work with a range of people, several students (they seem to either go to a bank or to Sandia), but there is a large number of collaborators: problems in tomography with Misha Kilmer, some civil engineers and an aerospace engineer at the University of Illinois, people in computational physics and computational quantum chemistry, a number of people in uncertainty quantification, and recently some people doing acoustics, which typically has very hard problems; in fact one of them spent the summer working with me at Virginia Tech and we got some very nice results, so I'll come back to that. There's a list of applications right here, and this work has been funded by the National Science Foundation, the Department of Energy, and NASA.

So the overview of the talk: we're going to talk about why we want to solve sequences of linear systems, and I'll give a couple of applications and some examples so you can see how things work, what kind of steps we have in these algorithms, and to show you that it really works and pays off. Then we'll step back a little bit and look at some of the mathematics that is under the hood, what makes these methods work, because I want to show that there are very solid mathematical methods and analysis involved in making this work. Maybe a little look at the future, depending on the time, then the conclusions, including some conclusions about things I didn't present in the talk but could have if there were more time, and some future work.

OK. So many computational problems involve a sequence of systems with either small changes, or highly localized changes, or changes that have very special structure. So linear systems A x = b, where A is the matrix of the equation, and those matrices change in various ways.
And the reason to spend extra time on smarter methods is that solving these linear systems in the course of the simulation may take ninety percent or so of the time. That is to say, if you can do really well on those systems, you will speed up your application significantly.

So what kind of problems are they? I call them evolutionary problems, because some problems are time-like even though there is no real physical time in them. So you solve evolutionary problems, and you solve problems in optimization: as soon as you do anything nonlinear in more than two variables you cannot solve it directly; typically you have to solve a sequence of linear systems, and there are many Newton-type methods and so on. Parameter estimation is something that currently gets a lot of attention; of course it is a nonlinear optimization problem as well, but here you typically have problems where you have some measurements, and based on some parameter choices your model approximates reality more or less accurately. The idea is that you find the parameter set that best fits the data you have. So again you have to solve a sequence of systems; there is no time involved, but you march through some parameter space, and again you get a sequence of problems that are very similar. Many people in quantum chemistry and computational physics use so-called Monte Carlo and Markov chain Monte Carlo methods, which are methods for efficiently sampling very high-dimensional spaces, and again you solve, in this case, up to millions of systems that change very slowly.

Then there is another class of problems. In the previous cases you get matrices that at least look the same; they may only be slightly different. Here, if your discretization changes, your matrix changes completely, but the underlying physical problem may be very similar, and so again, with some extra care, you can take this into account and get more efficient solutions.

The other part that is important is that the actual simulation or optimization requires hundreds or thousands of systems that are very large; they can go into the millions, or at the national-lab level billions, of unknowns, and as I said, for some of these problems you may even have a million systems to solve.

Some applications: crack propagation, where you look at how a crack, say at the level of a micron, propagates through an airplane wing. There you can already see that you need to do this very accurately, but at the same time the structural properties of a wing will not change dramatically if that crack propagates by a micron, so you can take this into account. Topology optimization is a particular set of algorithms for optimal design, and I'll show an example of that in a moment. Tomography: you send a signal through a medium, you only have measurements on the outside, but you want to know what the medium looks like. So there are many problems, and one set of applications that is actually very interesting is so-called uncertainty quantification. Typically when we do a simulation we use idealized properties of the material. If you look at a wall, it is not going to be equally strong at every part of the wall, but there's no way you could realistically model that. So what you do is you assume some average strength, say, and some distribution of the strength around that.
So if you want to quantify the risk of failure, you need to solve many realizations and put the statistics together to get the statistics of the solution. I haven't actually done uncertainty quantification myself, but many people in this field have used these methods to solve their problems faster, and I'm going to show you an example that was obtained by some other people.

So the main idea is to exploit the slowly changing nature of the problem, or special structure in the changes, to get much faster solutions. There are a few things you can do. The obvious thing in a time-dependent problem, of course, is to use your old solution as an initial guess for the next time step. Everybody has been doing that; that's not the really interesting thing. What we'll see is that many of the methods used for solving these problems build a search space and find an approximate solution inside that search space, and building the search space takes most of the time. So what we want to do is recycle, adaptively reuse, those search spaces to jump-start the next system, from one step to the next, and make things much faster. There are some other things you can do as well. If you don't know what preconditioning is, I'll tell you in a moment, but the main idea is that it's a transformation of your linear system to another linear system that is easier to solve, and computing that transformation can be expensive. So if you have a sequence of systems, there too it may make sense to update those preconditioners every step rather than computing them from scratch. Several people have worked on that; we have done this for adaptive meshing and Monte Carlo problems, and other people have done it for fluid dynamics problems and so on.

So here's a list, and it's only a partial list, of applications that I or other groups have used these methods for, and you can see it runs through a really large range of problems. And since two weeks ago we're also talking to some people doing astrophysics, so since we do quantum Monte Carlo for electronic structure, the most basic level of atoms and electrons, and hopefully soon astrophysics too, I can really say that this method now spans all ranges of scales in the universe.

OK, so here's one application. It's a partial differential equation, a Helmholtz-type problem, which occurs in electromagnetics and in many other fields, and this is a very simple model problem; it comes from a group at another university that has been using these methods. The parameters that govern this partial differential equation are stochastic. That is to say, we don't know exactly what the diffusion coefficient is, but we know the average and some of the moments of its probability distribution, and the same for the other coefficient, which you can think of as a chemical reaction or absorption term. What we want to know is the statistics of the solutions that come out of such a system. In order to do that (there are many ways of doing uncertainty quantification) they have to solve, and this is what this graph shows along the bottom, more than five thousand systems, somewhere between five thousand and fifty-five hundred linear systems; a toy version of such a sampling loop is sketched below. These systems can be very large; since this is a model problem it's not that large here, but in realistic situations these systems can be very large.
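Just to make concrete what a sequence of slowly changing systems looks like in such a sampling study, here is a purely illustrative toy loop, not the actual problem or code from the slide: a 1D diffusion equation with a random coefficient that drifts from sample to sample, solved with a warm start from the previous solution. This shows only the simplest kind of reuse, the initial guess; the recycling discussed in this talk reuses part of the search space as well.

```python
# Illustrative toy only (not the speaker's problem or code): a 1D diffusion
# equation -(a(x) u')' = 1 whose coefficient a(x) drifts slowly from sample to
# sample, as in an MCMC-style parameter sweep. Every sample is a new linear
# system; the previous solution is reused as the initial guess.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

n = 200
h = 1.0 / (n + 1)
rng = np.random.default_rng(0)
f = np.ones(n)                            # constant source term
z = rng.standard_normal(n + 1)            # latent field driving the coefficient
u_prev = np.zeros(n)
mean_u = np.zeros(n)
iters = []
n_samples = 200

for s in range(n_samples):
    z = 0.98 * z + 0.2 * rng.standard_normal(n + 1)   # slowly drifting parameters
    a = np.exp(0.3 * z)                                # positive diffusion coefficient
    # standard finite-difference assembly of -(a u')' with Dirichlet boundaries
    main = (a[:-1] + a[1:]) / h**2
    off = -a[1:-1] / h**2
    A = diags([off, main, off], [-1, 0, 1], format="csr")
    count = [0]
    u, info = cg(A, f, x0=u_prev, callback=lambda xk: count.__setitem__(0, count[0] + 1))
    iters.append(count[0])
    u_prev = u                             # warm start for the next system
    mean_u += u / n_samples                # accumulate the statistic of interest

print("average CG iterations per system:", np.mean(iters))
print("mean solution at the midpoint:", mean_u[n // 2])
```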
Now, for each system in their study, what you see on the y-axis is the number of iterations, so the size of the search space that was built to solve that linear system. In some sense this is the number of matrix-vector products that was required, and of course if these matrices are very large and matrix-vector products are expensive, this is expensive. This is using a standard method: you can see some problems went fast, but many take a hundred and fifty or two hundred and fifty iterations; some were faster, some slower. If they use this idea of recycling, that is, using specific computed results from one system, part of the subspace (I'll show in a moment how it works), to jump-start the solution of the next, then virtually all the systems were solved in roughly twenty-five steps. For some of these that is a factor of six, or even a factor of ten, in the reduction of the total amount of work, so it can really reduce the amount of work you need to do, very drastically.

So here's another class of cases: topology optimization. The 'topology' comes from the notion that the shape, the topology, of the design is unknown in advance. Think of the following problem: I give you a fixed amount of steel, and you're asked to build the strongest bridge, say. Strongest in this case is measured by the energy taken up by the structure, which is called the compliance, u-transpose times K times u, where K is the stiffness matrix and rho is the distribution of material; so this is a measure of the energy that the structure takes up, and this equation is how the system behaves: under some load you get some solution u, which you feed in here.

So how does this work? You put a load on your system and you compute how everything deforms, a finite element analysis. Then you look at the derivative: how would the compliance, the energy taken up by the structure, change for a local redistribution of material? You compute that for every part of your domain, and then you do a global update (that can be some advanced method or just a gradient-type approach) to redistribute the material. You can see in this little cartoon how the optimal design comes out, and you have to go through this loop many times; a toy sketch of the loop is shown below.

OK, so here is an example. We have a simple structure, this just came out of the literature: it is supported at two points and you pull at two points. We only simulate one quarter of it, the rest will be the same by symmetry, and you can see that the final design looks like this structure; it's not something you could easily have imagined in advance. I'll show you a little movie of how the solution arises during the simulation; this was recorded, it is not real time. You can see that not only does the solution change (every time you solve a linear system the material is distributed differently, so the elasticity system you're solving changes), but in this case we are also changing the mesh as we go, because doing computations where you have holes is not efficient. So this is what comes out.
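To make the loop itself concrete, here is a self-contained toy sketch of such a design iteration for a 1D bar, purely my own illustration under simple assumptions and not the speaker's code or the problem on the slide. The point is the structure of the computation: every design step requires a new linear solve with a slightly changed stiffness matrix.

```python
# Toy sketch of a compliance-minimization loop (SIMP-style) for a 1D bar that
# is clamped at one end and carries a distributed axial load. Each design step:
# (1) solve K(rho) u = f, (2) compute the sensitivity of the compliance f^T u
# to each element density, (3) redistribute material. Step (1) is the linear
# solve that recycling targets; here it is tiny, in 2D/3D it dominates.
import numpy as np

n, p, vol_frac = 60, 3.0, 0.5            # elements, SIMP penalty, volume fraction
E, h = 1.0, 1.0 / 60                     # material stiffness, element length
f = np.full(n, 0.01)                     # axial load on the free nodes
rho = np.full(n, vol_frac)               # start from a uniform design

for step in range(60):
    # (1) assemble the tridiagonal stiffness matrix (node 0 clamped) and solve
    ke = E * rho**p / h                  # penalized element stiffnesses
    K = np.zeros((n, n))
    K[0, 0] = ke[0]
    for e in range(1, n):
        K[e-1, e-1] += ke[e]; K[e, e] += ke[e]
        K[e-1, e] -= ke[e];   K[e, e-1] -= ke[e]
    u = np.linalg.solve(K, f)
    compliance = f @ u
    # (2) sensitivity of compliance to each element density (standard SIMP formula)
    d = np.diff(np.concatenate(([0.0], u)))          # element elongations
    dc = -p * rho**(p - 1) * (E / h) * d**2
    # (3) optimality-criteria update, with bisection on the volume multiplier
    l1, l2 = 1e-9, 1e9
    while l2 - l1 > 1e-10 * (l1 + l2):
        lmid = 0.5 * (l1 + l2)
        rho_new = np.clip(rho * np.sqrt(-dc / lmid), 1e-3, 1.0)
        if rho_new.mean() > vol_frac:
            l1 = lmid
        else:
            l2 = lmid
    rho = rho_new

print("final compliance:", compliance)
print("density near the clamp vs. the free end:", rho[:3].round(2), rho[-3:].round(2))
```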
Here are some results, not for the specific problem that you just saw, but say for finding an optimal beam. The large problems we actually work on go up to two million unknowns, but to run a large number of cases and see how well the method works we used a smaller problem. You can see how the material distribution changes: this is the maximum material change per element, on this scale here, this is the number of iterations, and this is the number of optimization steps. You can see there is an initial phase in which the recycling does not gain a lot, but then very quickly we use many fewer iterations, up to about a factor of three; there is some overhead involved. Here is the same thing against the optimization step: it took about a hundred and forty steps to find the optimal structure, and this is the reduction in time. You can see the reduction in time is not quite a factor of three, because of the overhead; in this case it's about a factor of two. You can also see it was still improving, but unfortunately the optimization was done. So this is the result, this is what that beam should look like; again we only simulate half of it, because by symmetry we know the other half.

OK, here is one last example. This is electrical impedance tomography, a fairly new tomography method, and the nice thing about it is that it's not super accurate, but it's very cheap. This is done with people in Brazil, at the University of São Paulo, and one of the advantages is that with relatively simple equipment and a PC you can go into the jungle or some remote village, check people, get images, and tell a doctor what they need to do. So for them this is actually a very promising technology. These are the results for a model problem; I think what they are interested in in this case is lung aeration. You have a very big jump in how the signal propagates: we are mostly water, so it propagates very rapidly, but in the lungs, which are mostly air, it is slower. This was just done by putting a tube into some cylinder of water, but they actually did some tests on human beings too, and you have to sign a lot of papers to show those, so we're not doing that. Here are the results for the numerics, for slightly different variants of our algorithm. The standard algorithm takes something like a hundred and eighty to two hundred iterations, and after a few initial steps our method essentially takes fifty or sixty iterations, so between a factor of three and four reduction in work.

OK, so how does this work? How can we make things faster? I first need to tell you a little bit about the mathematical methods that we use to solve these systems, and then how we improve them. So we have a linear system A x = b, where A is a large, typically sparse, matrix, x is the vector of unknowns, and b is the right-hand side, and we have some initial guess x_0. We don't know whether we have the exact solution, because we don't know the solution, but we can measure how well we satisfy the system of equations by computing the difference between the right-hand side and A times x; that's the residual, r_0 = b - A x_0. Then, at each step, we want to iteratively approximate the solution: we compute an optimal update z from a space of vectors, the space spanned by r_0, A r_0, A squared r_0, and so on, which is called the Krylov space of dimension m. An optimal method, for example, will pick that z out of this space such that the norm of the residual is minimized; in that measure it is then the best solution you can find, and this is the search space I was talking about.
Now, just to fix ideas: if you wanted to minimize the two-norm, one thing you could do (you should not actually do this, numerically it is not stable) is just take those vectors, stick them into a matrix as columns, parameterize the solution as x_0 plus that matrix times a coefficient vector, so just a linear combination of those columns, and then it's just a least-squares problem. What efficient methods do is, at each step, efficiently and stably solve this problem as you extend this matrix; as I said, you cannot just literally build this matrix, that would be unstable. One of the most popular methods was proposed by Yousef Saad, a former colleague: the GMRES method, which stands for generalized minimal residual, and that is exactly what you're doing here, you're minimizing the residual in the two-norm. There are some older papers that do basically the same thing, but those algorithms were simply not as efficient or not as stable.

So we need an efficient method for doing this, but it's really solving least-squares problems. To understand why these methods are fast, we need to look a little bit further. If you look at these vectors, they are all powers of A times the initial vector. So if I take a linear combination of those vectors (that's of course what this is, with coefficients zeta_0, zeta_1, and so on), what I'm really building is a polynomial in the matrix: A to the power zero, A to the power one, up to A to the power m minus one. So it is a polynomial in A of degree m minus one, times the initial residual. In principle this polynomial is arbitrary, you can pick anything you want; of course, if you want the exact solution it is a very specific polynomial, but in principle you can pick what you want. But what we want is to solve the system, so we want the residual, the difference between the right-hand side and A times x, to be small, or preferably zero. This residual, it turns out, is a polynomial too, because it becomes r_0 minus A times this polynomial times r_0. So it is now a polynomial of degree one higher, but it has one annoying feature, and that is that its constant term is one. That is to say, this polynomial is not arbitrary; it has to satisfy the constraint that its constant term is one, which means it is one at the origin. So the residual is a polynomial of degree m in A, times r_0.

OK, so this is what the algorithm looks like to do this efficiently, and then we'll continue with the analysis. You want to solve the system: you take this residual r_0, you divide it by its norm so it is a unit vector v_1, and then you build the Krylov space as follows. At each step you have v_1; you compute v_2 by multiplying v_1 by A, orthogonalizing against v_1, and then normalizing; and in general you compute v_{k+1} by multiplying v_k by A, orthogonalizing it against v_1, v_2, and so on up to v_k, and normalizing. This gives you a set of orthonormal basis vectors for the Krylov space; this is the Arnoldi process.
Then you can show that the coefficients from this process give you a small, (m+1)-by-m, least-squares problem whose solution is equivalent to the solution of the large least-squares problem. So rather than solving an n-by-m problem, where n is huge, say millions, you really solve a least-squares problem of size (m+1)-by-m, and you build this small problem up as you go (a sketch of the whole iteration is shown below). This has become a very popular method; if you look it up on Google Scholar you get something like four thousand citations or so, which for a math paper is pretty spectacular. So this is what is being done, and then you update x: the new iterate is x_0 plus this set of vectors you computed, the vectors that span the space, times y, the solution of the small least-squares problem.

All right. If we had to do a million steps for a problem of size a million, this would not be efficient, but it turns out that even for very large problems we are sometimes done in fifty or one hundred steps. So why does it go so fast? The reason is that the Krylov space is a space of polynomials in the matrix times a vector, and so it inherits the approximation properties that polynomials have on the real line or in the complex plane. Let's assume A is diagonalizable, so A = V Lambda V-inverse, where V contains the (typically normalized) eigenvectors and Lambda the eigenvalues. Then powers of A: A squared will be V Lambda V-inverse times V Lambda V-inverse, the inner factors drop out and I just get Lambda squared in the middle, and more generally A to the power k is V times Lambda to the power k times V-inverse. So if I take a polynomial in A (you just write your coefficients times these powers), everything is linear, so you can move V to the left and V-inverse to the right, and what you now get, and this is really the crucial part, is V times a diagonal matrix times V-inverse, where on the diagonal you essentially apply this polynomial to every eigenvalue individually. You have to keep that in mind: approximate solutions to linear systems, eigenvalue problems, and many more problems really rely on this property.

So why does this work well? Let's assume the eigenvalues of A lie in some region, some part of the complex plane, or, if the matrix is symmetric or Hermitian, on the real line. We know polynomials have good approximation properties, so let's say you try to approximate the function one over t by a polynomial. Now, if this region that contains the eigenvalues is not too close to the origin, it turns out you don't need a high-degree polynomial to get an accurate approximation. And it means that there is some low-degree polynomial such that if you apply it to A and write out this expression, you get V times approximately one-over-lambda on the diagonal times V-inverse, times r_0; essentially you apply the inverse of A to your vector, and that's all you need if you want to solve a linear system. So if you can find a low-degree polynomial, a polynomial of modest degree, such that this is satisfied, you're in business: you have a fast solver. And it turns out that if the eigenvalues are in a region Omega that is not too bad, this can always be done for small m. So if the region Omega is nice, and that typically means that your eigenvalues are clustered away from the origin, then things go fast.
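For concreteness, here is a minimal sketch of the iteration just described, the Arnoldi build-up of the basis plus the small least-squares problem. This is the textbook version under simple assumptions (dense matrix, no restarting, no preconditioning), not the speaker's implementation.

```python
# Minimal GMRES sketch: Arnoldi basis + small (m+1)-by-m least-squares problem.
# Illustrative only; a production code would use Givens rotations, restarting,
# and a stopping test on the residual norm.
import numpy as np

def gmres_simple(A, b, x0, m):
    n = len(b)
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))                 # orthonormal Krylov basis vectors
    H = np.zeros((m + 1, m))                 # projected (Hessenberg) matrix
    V[:, 0] = r0 / beta
    k_used = m
    for k in range(m):
        w = A @ V[:, k]                      # new direction: A times latest basis vector
        for j in range(k + 1):               # orthogonalize against all previous vectors
            H[j, k] = V[:, j] @ w
            w = w - H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] < 1e-14:              # "lucky" breakdown: the space is invariant
            k_used = k + 1
            break
        V[:, k + 1] = w / H[k + 1, k]
    Hk = H[:k_used + 1, :k_used]
    rhs = np.zeros(k_used + 1)
    rhs[0] = beta                            # r0 = beta * v_1 in the new basis
    y, *_ = np.linalg.lstsq(Hk, rhs, rcond=None)   # the small least-squares problem
    return x0 + V[:, :k_used] @ y

# quick check: eigenvalues clustered in [1, 10], well away from the origin
rng = np.random.default_rng(1)
A = np.diag(np.linspace(1.0, 10.0, 200)) + 0.01 * rng.standard_normal((200, 200))
b = rng.standard_normal(200)
x = gmres_simple(A, b, np.zeros(200), 60)
print("residual norm after 60 steps:", np.linalg.norm(b - A @ x))
```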
Now of course, as you can imagine, for many hard problems the eigenvalues are not nicely clustered away from the origin. So what do you do? In that case we do the following: we find some matrix P, and think of P as a matrix that approximates the inverse of A in some sense (you don't want the actual inverse of A, that's too expensive), and then we iterate with P A x = P b. The idea is that P A will have clustered eigenvalues, but multiplying by P is still cheap, and for many problems we know exactly how to find such a P; this is the preconditioning I mentioned. Of course, computing such a preconditioner from scratch at every step would be expensive too, so that's why we also want to be able to update it rather than compute it from scratch. And this is why Krylov methods have become very popular.

So now here's the next step. We solve a sequence of problems. Let's say we know that most of our eigenvalues are in a nice region, but part of our spectrum is not: some eigenvalues are close to the origin, and our polynomial is one at the origin, so it won't damp things there; it won't look like what we want it to look like. The idea is that if we know the vectors associated with those eigenvalues, we can project them out of the system, because if we project them out, then essentially we can think of those eigenvalues as gone. It's not quite that simple, but that is by and large how it works. Then we have fast convergence, and since from one system to the next the vectors that span those invariant subspaces often don't change too much, we can project them out of every system, and the method kind of learns as you solve linear system after linear system.

So let's say we have a couple of vectors that span the space of interest, and we make them the columns of a matrix U-tilde. Then A times U-tilde, call it C-tilde, is the image of this space; that's what we're going to project against. So we do a QR decomposition: we find an orthonormal basis C for C-tilde, and R is the associated triangular factor. The updated U is never computed explicitly; whenever we need to multiply by it, we just apply R-inverse to the coefficient vector first. Now we know the range of C, so we can project it out of the right-hand side; that gives us an initial guess. And then we essentially do exactly the same algorithm we did before. This iteration is more or less the same, with one extra step: we project out everything in the directions of C before we continue. So initially we computed the new vector by multiplying the latest basis vector by A and then orthogonalizing; now we do the same thing, but we first orthogonalize against the C vectors and then orthogonalize the new vector against the previous ones. This is not very expensive; there is some overhead.

And now we have a slightly different optimization problem to solve: rather than just finding the optimal y, the optimal vector in the Krylov space, we also have to find the optimal vector in the recycled space. But it turns out that if you set up this problem the right way (we did this some time ago, actually for a different method), it splits into two completely separate optimization problems, and the least-squares problem you need to solve is exactly the same small least-squares problem that you saw for GMRES. So it's very cheap, and the coefficient vector you need for the recycled part is just minus R-inverse times B times that same y, where the matrix B contains C-star times A times the basis vectors, which you have already computed. That's all you need to do; a sketch of this projection step follows below.
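Here is a minimal sketch of that projected iteration, in the spirit of GCRO-DR / recycled GMRES. It is a simplified illustration using the notation above, not the published algorithm in full: breakdown handling, tolerances, and the update of the recycle space itself are all omitted.

```python
# One cycle of a recycled solve: given a recycle space U carried over from a
# previous system, form C = A U (orthonormalized), project the right-hand side,
# keep the Arnoldi vectors orthogonal to C, and split the correction into a
# Krylov part and a recycle-space part. Illustrative sketch only.
import numpy as np

def recycled_gmres_cycle(A, b, U, m):
    n = len(b)
    C, R = np.linalg.qr(A @ U)                # orthonormal basis for range(A U)
    x0 = U @ np.linalg.solve(R, C.T @ b)      # best correction from the recycle space
    r0 = b - A @ x0                           # residual, now orthogonal to range(C)
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))
    V[:, 0] = r0 / beta
    H = np.zeros((m + 1, m))
    B = np.zeros((C.shape[1], m))             # couplings C^T A V
    for k in range(m):
        w = A @ V[:, k]
        B[:, k] = C.T @ w                     # first orthogonalize against the recycle space
        w = w - C @ B[:, k]
        for j in range(k + 1):                # then the usual Arnoldi orthogonalization
            H[j, k] = V[:, j] @ w
            w = w - H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / H[k + 1, k]
    rhs = np.zeros(m + 1)
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)   # same small least-squares problem as GMRES
    z = np.linalg.solve(R, B @ y)                 # recycle-space coefficients: R^{-1} B y
    return x0 + V[:, :m] @ y - U @ z
```

In the actual methods, the recycle space U is then refreshed from the basis built during the solve, for example from approximate eigenvectors, before moving to the next system; a rough sketch of that selection appears later, in the discussion of the MINRES variant.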
Now, there are different things you could use for the recycle space, and doing this has the effect that you just saw for many applications: in many cases we reduce the number of iterations by, for example, a factor of ten in this uncertainty quantification problem, a factor of three or four in the tomography problem, and so on. So the idea is: Krylov methods build a search space and get the solution by projection, and building the search space is often the dominant cost. So if you can reduce the number of steps you need to take, if a smaller space gives you the right answer, you're in business. The initial convergence is often relatively poor, and only when the search space reaches a reasonable size do you get the superlinear, faster convergence. So the idea is to get that fast convergence rate, and a good initial guess if the right-hand sides are similar, immediately, by recycling selected search spaces from previous systems. That's the main idea.

There are different things we can do, and here I'm focusing somewhat on invariant subspaces, so spaces spanned by eigenvectors, essentially. Other people have done similar things for solving a single linear system, so not for a sequence; Ronald Morgan has done some work on that. For sequences of problems we proposed this; Mike Parks was one of my students and Shun Wang was another student. There are other things you can do, again mostly studied for a fixed matrix rather than for a system that changes; one of the things we've done that is really new is handling the case where the matrix changes from one step to the next, whereas other people have looked at these ideas for a fixed matrix. There is also something quite different you can do: if you look at the spaces you generate, you can look at the angles between those spaces, which means you can measure how much those spaces change from one step to the next, or from one system to the next, and you can find the components that do not change very much. Those are worthwhile keeping, because then you don't have to build them again. And obviously, in many nonlinear problems you often find that if you keep the sequence of previous solutions, you can use that to improve your convergence; Paul Fischer did this a long time ago for Navier-Stokes problems, just to get initial guesses, not necessarily to speed up the iteration itself. For tomography problems, again a nonlinear parameter optimization problem, we've been doing this more recently, even in problems where the matrix changes at every step, and we speed up the convergence. And there's a new project that I've been working on with a number of physicists, David Ceperley at the University of Illinois, in quantum Monte Carlo; there the matrix changes one row at a time at every step, and you need to solve a linear system with each of those matrices.

Just to give an idea of how we pick what we want to recycle and why that helps, here are a few convergence bounds, without going into a lot of detail. The residual r_m at step m is a polynomial of degree m in A, times r_0. Based on knowledge you may have about the matrix (in many cases, when these matrices come from partial differential equations or other problems, we actually know quite a bit about them: we know the eigenvalues lie in some region and so on), we can use this to estimate how fast a method like GMRES converges. So here is one simple bound.
It depends on what is called the condition number of the eigenvector matrix, V here being the eigenvectors of A. Often that factor is modest, but if the eigenvectors have small angles with respect to each other it can be large. So the bound is this condition number, times the initial residual norm, times the best polynomial you can find over the eigenvalues; typically we simplify this and take the polynomial over some region containing the eigenvalues, and based on that we can typically give a bound on the residual norm that, in some cases, decreases exponentially in the number of steps you take. If A has eigenvectors with small angles with respect to each other, this can be a wild overestimate, so rather than this bound we can look at a small variation. Here the eigenvector matrix plays no role, but the region over which we have to take the polynomial changes: it becomes the so-called field of values, the set of all values u-star A u where u runs over all unit vectors, and W is a region that contains that field of values. So we can look at this bound too; I won't go into much more detail, but both bounds are written out below. Based on these bounds we can see how fast the method converges, and it also allows us to say: if you reduce the spectrum in some sense, if you project things out of the matrix, how much faster will your convergence be?

So what makes for a good method? Your method needs to satisfy three main requirements. First, the space that you want to recycle, for example an invariant subspace, although more general choices are possible, has to be something you can identify and converge to quickly, because your matrix changes every step. If you don't get a good approximation from, say, the first system, the next system is going to be somewhat different, so you'll never get an effective recycle space. It doesn't have to be very accurate, but you need to be able to pick it up quickly. Second, since your system changes every step, whatever you're trying to find may have been very good for the previous system, but it's not going to be exactly right for the next system, so it has to be effective even if the approximation is quite poor. And third, whatever space you're trying to pick up, an invariant subspace or something else, is of course perturbed because you change the matrix, so you need to converge quickly to the new version. You could say: how quickly does the method learn, or adapt? The results I already showed you indicate that these methods are quite effective, so in general this works really well. There are some related questions, but those are more problem-dependent: if you change the matrix in some way, how does the invariant subspace you're interested in change? For specific problems you can analyze that; you cannot analyze it in general. And how fast is that convergence? If you're looking at Ritz vectors, for example, a lot is known, because people who solve eigenvalue problems do this all the time, and they can tell you how fast certain methods converge. The other thing is that your method needs to be cheap, because if it's too expensive you may converge faster, but you have really paid for it, so you're not gaining anything.
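For reference, here is the first of those bounds written out. This is the standard GMRES convergence bound I am paraphrasing, stated from memory rather than copied from the slide, with the field-of-values variant described in the comment.

```latex
% Standard GMRES residual bound for a diagonalizable matrix A = V \Lambda V^{-1}:
\|r_m\|_2 \;\le\; \kappa_2(V)\;
   \min_{\substack{p_m \in \mathcal{P}_m \\ p_m(0) = 1}}\;
   \max_{\lambda \in \Lambda(A)} |p_m(\lambda)| \;\, \|r_0\|_2 ,
\qquad \text{often relaxed to } \max_{\lambda \in \Omega \supseteq \Lambda(A)} |p_m(\lambda)| .
% The variant mentioned above avoids the factor \kappa_2(V) by replacing the
% eigenvalues with a region W containing the field of values
% \{\, u^{*} A u : \|u\|_2 = 1 \,\}, at the price of a modest constant and a
% possibly larger region over which the polynomial must be small.
```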
So here are some results for the crack propagation problem, in a bit more detail. This is the convergence per step, the total number of matrix-vector products, to solve five time steps of this crack propagation problem. This is the convergence of full GMRES, which is the best method you can have if you solve one system at a time, and these other lines here are our method with recycling. The thing you can see is that the rate of convergence we typically have is the rate of convergence GMRES only reaches at the end: GMRES starts slowly and then speeds up, and we are able to pick up that fast rate right away and keep it for every system. This is how the norm of the residual, the log base ten of it, goes down over the number of steps.

Now I'm going to give you one result about how fast things actually converge if you have only a very poor approximation; this is a brand-new result. We have it both for Hermitian matrices, which is somewhat easier, and for non-Hermitian matrices, which is typically much harder, and then we'll move on to some conclusions. So what do we want to say? We want to say that if we have a very modest approximation of an invariant subspace (that's what this bound is about), we still get much better convergence. As I told you, you may have some eigenvalues close to the origin, which are annoying, and others far away from the origin, which give you a polynomial with fast convergence. Typically you have some of both, so let's say Lambda_Y contains the nice eigenvalues and Lambda_Q contains the annoying ones we want to get rid of. Associated with Lambda_Q is a set of eigenvectors that I put into the matrix Q, and Y has the complementary set. So A has this kind of decomposition into invariant subspaces: Q with the Lambda_Q eigenvalues and Y with the Lambda_Y eigenvalues.

Now let's say we have computed a recycle space C that approximates Q in the following sense, not approximation in the usual sense: if I pick anything out of the range of Q (Q has orthonormal columns), take any unit vector there and project out the components in C, then what is left is a vector of size at most delta, with delta less than one. Of course, if delta is very small, then I have captured Q to very high accuracy. But what we see, just by measuring for a range of problems, is that when the matrix changes from one step to the next, delta is typically only maybe ten to the minus two. For somebody who wants to solve an eigenvalue problem that is very poor, but it's good enough for us, and that's what the next step, the bound, will show.

So at every step I'm projecting out the directions spanned by C, and let's say the matrix [C W] is unitary, so W spans the orthogonal complement. That means in my iteration I'm essentially iterating over the space spanned by the W vectors, the range of the matrix W. Now, what I can show is the following: if I pick vectors only out of that space, because that's what I'm iterating over, and I look at the range of values generated by A there, then initially, of course, that ranged over all the lambdas in Lambda_Y and all the lambdas in Lambda_Q; but now most of Q is projected out, so what is the range? Well, it looks as follows.
At the top I still have the maximum over the Lambda_Y; that didn't change, and those are, say, the eigenvalues away from the origin. At the small end it is the minimum over Lambda_Y multiplied by one minus delta squared. If delta were very close to one, this could be close to zero, but even for a delta of ten to the minus one or ten to the minus two, which is what we can easily get, this is 0.99 or 0.9999 times that eigenvalue. So if Lambda_Y really is a set of eigenvalues clustered away from the origin, what remains is a very modest perturbation of those eigenvalues. That is, even very modest accuracy will give me convergence as if all the Lambda_Q eigenvalues are gone. There is one extra term: in the case of a Hermitian positive definite problem it only adds to this, so everything moves away from the origin, and that's fine. If Lambda_Q has negative eigenvalues, they can pull this closer to the origin, but only by a modest amount, because they are multiplied by delta squared. So if the Lambda_Q are negative but not extremely large, then (for example, for delta of order ten to the minus two they get multiplied by ten to the minus four) in general they give you only a very modest perturbation of this lower bound. I won't go into the more technical stuff, but using this bound on the eigenvalues, for a Hermitian problem we can immediately write down a bound for the convergence rate, very similar to what you saw above, except that for a Hermitian matrix the condition number of the eigenvector matrix is one. For non-Hermitian problems we have similar bounds; the non-Hermitian case is always harder, so it's a little more complicated and I won't go into the details.

Let me show you a few results, just for analysis purposes. This is a Laplacian on a two-hundred-by-two-hundred grid, so about forty thousand unknowns, with a fairly standard preconditioner. What we want to test here is just how quickly the method learns, so we solve the same system multiple times. That has no practical value, but it does tell us how quickly the method improves in terms of speeding up convergence. CG and MINRES are optimal methods, but they don't recycle anything, so that is what you would get for the first system. If you do this recycling, you solve it once and then save a few vectors; what we save here is only fifteen vectors, and the forty is the size of the space over which we compute that approximation at each step. So you save only fifteen vectors, and you can already see you do much better. And as you can see, if you do multiple solves it gets better and better. This line is the bound that the convergence result on the previous slide gives you, and this curve, these red triangles, is the convergence you would get if you exactly projected out the fifteen eigenvectors corresponding to the smallest eigenvalues. You can see the bound is pretty good in this case, and in practice the method actually does better; we can explain why, because it turns out that finding the exact invariant subspace is not actually the smartest thing to do, but if somebody is interested in the details of all this, come talk to me. If we have an indefinite problem, things are typically much worse. If we keep solving, say we do two solves, then you get this curve, and again it converges very rapidly.
But after the first solve, it turns out (and this is just to show how this works) that the first four eigenvectors are very accurate but the fifth is not, so this delta squared is not really small, and you see the bound is not that good. It does essentially mimic the initial convergence rate of the method, but the fact that the method speeds up later the bound cannot capture; it misses that, and Krylov methods typically do better than these bounds suggest.

So, a few more general remarks before my conclusions. First of all, this idea gives you, with modest effort, typically much faster convergence, and a lot of people in many applications have started to use it. And of course there are harder settings: suppose you solve over a large range of parameter values, or some complicated nonlinear problem, or consider, for example, people who do reduced-order models. You have a large dynamical system, and it's too expensive to solve it accurately, but you know that for the set of inputs and outputs you're interested in, you can represent it by a much smaller system, which is the reduced model. Now people want to do this for highly nonlinear problems or for parametrized problems, so they have to build many of these reduced models. The way you compute a reduced model actually involves solving linear systems, so at the bottom level you can use this technique, and people are doing that; but you can obviously also use the same idea at this higher level, and I think that's a very interesting approach to take. You could say: if I have a reduced model for some dynamical system, and that dynamical system is parametrized, so I'm moving in some sense through parameter space, what is really the best way of tracking those reduced models over multiple parameters? It might be that you can do that much better, much more efficiently, directly at that level, rather than saying, well, we have to solve some set of linear systems again that are not too different from the previous ones, and use the recycling only at the bottom. So some new things we want to do: we want to think more, very generally and at a high level, about what computed results can be exchanged between problem instances, especially as we do more and more complicated problems where we do everything at once, both the optimization of the design and the uncertainty quantification; for example, we want a design that minimizes the risk of failure, so you're doing a lot of these things all at the same time, and you want to take that into account as you do your simulations.

So, some conclusions and future work. I hope I have shown that recycling the search space is very effective for a wide range of applications, and that those techniques are fairly cheap. They're not free: there definitely are problems out there for which it doesn't really work, say because the matrix-vector product is so cheap that as soon as you have to project out a few extra vectors you would have to reduce the iterations by a huge amount for it to pay off; but for many problems it works really well. Many problems have localized changes, and small changes in the problem mean small changes in the invariant subspaces. For example, take tomography: there the changes in the matrix from one system to the next are actually not small.
You cannot use standard approaches to show that the invariant subspace doesn't change much, but a more detailed analysis shows that it will be nearly the same, and so these methods work. That is to say, based on the background of your problem, you can analyze whether these techniques will be effective or not. One thing we have recently done is that, at least for some types of recycling, we now have an effective convergence theory, a convergence bound that tells us, even for modest approximations, how much things will improve. And that shows that the recycle space does not need to be accurate; from experiments we already more or less knew that, but now we really know it, because we proved it. If you have problems that change a lot (in uncertainty quantification, in some sense, all your matrices are approximations of one average matrix, but if you follow the evolution of a crack, your matrices really become more and more different), then you need to update fairly regularly. This is not something I showed here, but if you're interested I can show some other slides; you need to track the problem to keep fast convergence. We also saw that some of those bounds, this is for the non-Hermitian case, really depend more on the field of values than on the eigenvalues, so one thing we could do, of course, is base our algorithm directly on that. We have not done that, so that's an interesting new step we could take. I think we have a very nice bound for the non-Hermitian case, but the proof is not so nice, and that's another thing: proving a bound is one thing, it tells you something, but what you really get out of a proof is understanding of what is going on, so a clean proof has a purpose all by itself. Another proof of the same result adds nothing different mathematically, but for understanding it can be very useful. And we have software available; if these methods might be of use to you, you can ask me, and Mike Parks and other people at Sandia are actually building them into Trilinos, so if you're using that, you're more or less getting access to these methods for free. And that's it; I have one more slide with a couple of papers you may or may not want to read.

[Audience question.] Yes, so in this case this was for MINRES; you saw that in one of the plots, and the idea of the forty actually reflects the following. For methods like GMRES, if you have to do a thousand iterations you cannot keep the whole basis; if you could do just forty and be done, that would be fine. But suppose you have to do a few hundred: if your system is very large, that would mean keeping, say, four hundred vectors, and typically you can't do that, so you have to restart, and we restart in a smart way. Now, for MINRES you don't have to restart, because it's a short-recurrence method; you can just continue, you don't need to store those vectors. But of course, if we want to approximate an invariant subspace, we do have to store a few of them. So what that forty reflects is that we do forty MINRES steps, then reduce the stored vectors to an approximate subspace of dimension fifteen, then do another forty MINRES steps, take those fifty-five vectors, compress back to fifteen, and continue all the way; and even across multiple systems we just keep doing this (a rough sketch of that compression is shown below).
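As a rough illustration of what "compress those fifty-five vectors back to fifteen" can look like: given the basis V and the small projected matrix built during a cycle, one simple choice is to keep the Ritz vectors whose Ritz values lie closest to the origin. The published RMINRES / GCRO-DR methods use harmonic Ritz vectors and also fold the current recycle space into the selection, so this sketch only conveys the flavor of the idea.

```python
# Rough illustration only: selecting a recycle space at the end of a cycle,
# for the symmetric (Lanczos / MINRES) case where the small projected matrix
# is symmetric tridiagonal.
import numpy as np

def select_recycle_space(V, Hbar, k):
    """V: n x (m+1) basis, Hbar: (m+1) x m projected matrix, k: vectors to keep."""
    m = Hbar.shape[1]
    T = 0.5 * (Hbar[:m, :m] + Hbar[:m, :m].T)   # symmetrized m x m projected matrix
    theta, S = np.linalg.eigh(T)                # Ritz values and coefficient vectors
    idx = np.argsort(np.abs(theta))[:k]         # Ritz values closest to the origin
    return V[:, :m] @ S[:, idx]                 # approximate eigenvectors of A to recycle
```

The resulting vectors are what would be passed, as U, to the next cycle or the next linear system in the earlier projection sketch.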
[Another question.] Yes, you can do smarter than that, but, well, first of all you don't know what the future brings, in some sense, so you always have to make some assumptions. It turns out that, although for the analysis this choice is nice because we understand invariant subspaces very well, some other choices are better, and they adapt themselves to how the system evolves. That is the idea of looking at the canonical angles: what you will see is that as you solve the system it approximates different subspaces all the time, and that's better; for a single system that gives you better convergence.

This is linked, of course: if you do some of these methods for GMRES, where you have to do some type of restarting anyway, then it has similarities with thick restarting, where you keep, for example, an approximate subspace when you restart. The nice thing about what we do, or what we try to do, is that we do it all on the side. Now the point is that when you thick-restart, say, Arnoldi or whatever, you're only trying to find an invariant subspace: you iterate closer and closer to an invariant subspace. For us that would make no sense, because once I find an approximate invariant subspace, the components along those eigenvectors are more or less projected out of my residual. So if I did something like restarted Arnoldi, I would find a very accurate invariant subspace, but my solver would just stall. The point here is that you have to solve your linear system, and you need to do the subspace computation on the side, very cheaply; that's also why we don't get it very accurately, but it's good enough. At the level of the algorithm, though, there is obviously some similarity.

[Another question.] Yeah, sure. In fact, one person on that long list of collaborators is just starting and is planning to use it exactly for that. For example, if you use a matrix polynomial to, say, solve differential equations; well, functions of matrices have certainly also become a very hot topic, so there too, yes, at least some people are trying to use these methods. I have not done it myself, so I cannot tell you too much about it, but yes, I think so. Depending on what you want to do, you may want to apply some rational function of the matrix to a vector, and that involves a number of solves, so obviously this could be very useful there. I would say that applying a function of a matrix times a vector is, all by itself, a very active research area, and even though people started doing it a long time ago (there's a very old paper by van der Vorst about this), it actually only recently became a very hot topic.

[Another question.] It's a good question. Well, one thing that says a lot: when I started doing this, I asked people with interesting applications to try it, and nobody was interested; it was all "we just solve one system fast and we're happy." Then luckily, at Illinois, I ran into the crack propagation people and their software, and once we did it there, suddenly everybody else wanted to use it too. So, have I thought about it differently? Well, the reason it started is that we were trying to do smarter methods for single systems. For example, this method GMRES has to do the full orthogonalization of the search space.
If you had to do a thousand steps until you converged, you would have to store a thousand vectors, so you don't do that: you restart. You do twenty steps, compute a new solution, take that as your new initial guess, and do it again, and in many cases your convergence is then horrible. Even though full GMRES might converge in, say, eight hundred steps, if you restart every twenty you may never converge. So what we worked on then, and the idea was somewhat similar, was finding something from that search space to keep so that when you restart you stay close to the full GMRES convergence rate. After we did that, the idea came up to do this for matrices that change a little bit; I was working with people who also needed to do this for different systems, in a Monte Carlo setting, so we thought, well, let's just try it; it worked pretty well, and then we kept going. And of course, analyzing why it works has been one step that moved things along, and then doing it for very different problems. So that has been one line, and then there are other people. For example, in tomography we not only do it for a sequence of systems, but we also have to solve for many frequencies, so these all look like a matrix plus a constant times the identity, and you have twenty of those constants and want to do all of them at the same time. So there has been a whole range of variations. And then the next step, I think, was preconditioning: if you're still computing a preconditioner for each of these systems, let's see if we can do something there too, and some other people have already started doing that. There is a reference to people like Miroslav Tůma and others from the Czech Academy of Sciences, who already have a few papers along the lines of "de Sturler is doing this for the solver, so we can do something similar for the preconditioning"; those are for very special problems, but it worked pretty well. So that's maybe the evolution, and now we want to say: well, if you do this for sequences of linear systems inside model reduction, we should do it for the reduced model directly, rather than only at the bottom. And actually, there too, at a SIAM annual meeting I met a student of a colleague, and they are looking at some of these things as well. OK.