Today we're very happy to have with us Professor Eric de Sturler from Virginia Tech. I've known Eric for a very long time; I won't say how long, but I was actually a student when he was a fresh graduate doing, in my opinion then and in my opinion today as well, very cool things in many areas, including this area of subspace recycling. Eric received his Ph.D. from the Delft University of Technology.
After that he spent five years in his first position. In 1997 he was one of the prize winners of the Leslie Fox Prize in numerical analysis, one of the things I was very jealous of him about; of course, I had graduated by that time. He then joined the faculty of the Department of Computer Science at the University of Illinois at Urbana-Champaign, and since 2006 he has been with the math department at Virginia Tech.
He is of course involved in many activities, including organizing conferences, serving on their program committees, and serving on the editorial boards of many journals, including the Journal of Numerical Analysis. Thank you, Eric; please help me welcome him.

Thank you, and I want to thank the organizers.
All right, so the topic of the talk is sequences of problems and sequences of solutions. I'm going to focus on solving sequences of linear systems, and the main idea behind that is that when we do simulations, or when we optimize, it turns out that in many cases we have to solve a long sequence of problems, and it pays off quite a bit to try to do something smarter than solving all of them as if they were individual problems.
This has been work with a range of people: several students (they seem to either go to a bank or to Sandia), and a large number of collaborators in tomography, like Misha Kilmer; some civil engineers at the University of Illinois; an aerospace engineer; people in computational physics and computational quantum chemistry; a number of people in uncertainty quantification; and recently some people doing acoustics, which typically gives very hard problems. One of them actually spent the summer working with me in Virginia, with some very nice results, and I'll come back to that. There's a list of applications right here, and this work has been funded by the National Science Foundation, the Department of Energy, and NASA.
So the overview of the talk: we'll discuss why we want to solve sequences of linear systems, and I'll give a couple of applications and some examples, so you can see how things work and what kinds of steps we have in these algorithms, and to show you that it really works and pays off. Then we'll step back a little bit and look at some of the mathematics that is under the hood: what makes these methods work. I want to show that there are very solid mathematical methods and analysis involved in making this work. Then maybe a look at the future, somewhat depending on the time, the conclusions, and some extra slides after the conclusions for things that I didn't present in the talk but could have if there were more time, and some future work.
OK, so many computational problems involve a sequence of linear systems with either small changes, highly localized changes, or changes that have very special structure. A linear system A x = b is a matrix equation, and these systems change in various ways. The reason to spend extra time on smarter methods is that solving these linear systems in the course of the simulation may take ninety percent or so of the total time. That is to say, if you can do really well on those systems, you will speed up your application significantly.
So what kinds of problems are there? First, what I call evolutionary problems, because some problems are time-like even though there's no real physical time, and there are also truly time-dependent problems; so you solve an evolutionary problem. Then nonlinear problems in optimization: as soon as you do anything nonlinear in more than two variables, you typically cannot solve it directly, and you have to solve a sequence of linear systems; there are many Newton-type methods and so on.
Parameter estimation is something that currently gets a lot of attention; of course it is a nonlinear optimization problem as well. Here you typically have problems where you have some measurements, and, based on some parameter choices, your system approximates reality more or less accurately. The idea is that you find the parameter set that best fits the data that you have. So again you have to solve a sequence of systems; there is no time involved, but you march through some parameter space, and again you get a sequence of problems that are very similar.
Many people in quantum chemistry and computational physics use so-called Monte Carlo methods; Markov chain Monte Carlo methods are methods for efficiently sampling very high dimensional spaces, and again you solve, in this case up to millions of, systems that change very slowly over time.
Another class of problems: here typically you get meshes that at least look the same; they may be slightly different. If your discretization changes, your matrix changes completely, but the underlying physical problem may be very similar, and so again, with some extra care, you can take this into account and get more efficient solutions.
The other part that is important is that the actual simulation or optimization requires hundreds or thousands of systems that are very large; they can go into the millions, or at the national level billions, of unknowns, and as I said, for some of these problems you may have even a million systems to solve.
Some applications. One is crack propagation: you're looking at how a crack, say something at the level of a micron, propagates through an airplane wing. There already you can see that you need to do this very accurately, but at the same time the structural properties of a wing will not change dramatically if that crack propagates by a micron, so you can take this into account. Topology optimization is a particular set of algorithms for optimal design, and I'll show an example of that in a moment. In tomography you send a signal through a medium; you only have the measurements on the outside, but you want to know what the medium looks like.
So there are many problems, and one set of applications that is actually very interesting is so-called uncertainty quantification. Typically when we do a simulation, we use idealized properties of the material. If you look at a wall, it is not going to be equally strong at every part of the wall, but there's no way you could realistically model that. So what you do is you assume some average strength, say, and some distribution of strengths around that average. If you then want to quantify the risk of failure, you need to solve many realizations and gather the statistics to get the answers: the statistics of the solution.
I haven't actually done uncertainty quantification myself, but many people in this field have used these methods to solve their problems faster, and I'm going to show you an example that was obtained by some other people.
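To make that workflow concrete, here is a minimal sketch (my own toy model, not the speaker's code): a 1-D diffusion system whose material coefficient is drawn from an assumed lognormal distribution, solved once per Monte Carlo realization, with the statistics of the solution gathered at the end.

```python
import numpy as np

rng = np.random.default_rng(0)
n, samples = 50, 200

# 1-D diffusion stencil: tridiagonal second-difference matrix (model problem)
base = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
h = 1.0 / (n + 1)

midpoint_values = []
for _ in range(samples):
    # uncertain material strength: lognormal, so it stays positive
    kappa = np.exp(0.2 * rng.standard_normal())
    A = kappa * base                      # one realization of the stochastic system
    u = np.linalg.solve(A, h**2 * np.ones(n))
    midpoint_values.append(u[n // 2])

# gather the statistics of the solution over all realizations
mean_mid = float(np.mean(midpoint_values))
std_mid = float(np.std(midpoint_values))
```

Every realization changes the system only through the scalar coefficient, which is exactly the slowly changing structure that recycling exploits.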
So the main idea is to exploit the slowly changing nature of the problem, or the special structure of the changes, to get much faster solutions. There are a few things you can do. The obvious thing in a time-dependent problem, of course, is to use your old solution as an initial guess for your next time step; everybody has been doing that, and that's not the really interesting part. What we'll see is that many of the methods used for solving these problems build a search space and find an approximate solution inside that search space, and building the search space takes most of the time. So what we want to do is recycle, that is, adaptively reuse, those search spaces to jumpstart the next system from one step to the next and make things much faster.
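As a small illustration of the first, obvious trick (reusing the old solution as the initial guess), here is a sketch under my own assumptions: a random, well-conditioned model matrix that drifts slightly each step, where the warm start already satisfies the new system far better than a zero guess.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = np.eye(n) + 0.02 * rng.standard_normal((n, n))  # well-conditioned model
b = rng.standard_normal(n)

x_prev = None
ratios = []
for step in range(4):
    A = A + 1e-3 * rng.standard_normal((n, n))  # the problem drifts slowly
    if x_prev is not None:
        # initial residual of the warm start, relative to the cold start x0 = 0
        ratios.append(np.linalg.norm(b - A @ x_prev) / np.linalg.norm(b))
    x_prev = np.linalg.solve(A, b)
```

The warm start typically shrinks the initial residual by an order of magnitude or more here; recycling the search space, discussed next, goes beyond this by also transferring the convergence behavior.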
There are some other things you can do. If you don't know what preconditioning is, I'll explain it in a moment, but the main idea is that it's a transformation of your linear system to another linear system that is easier to solve, and computing that transformation can be expensive. So if you have a sequence of systems, it may make sense to update those preconditioners at every step rather than compute them from scratch. Several people have worked on that; we have done this for adaptive meshing and Monte Carlo problems, and other people have done it for fluid dynamics problems and so on.
So here's a list, and it's only a partial list, of applications that I or other groups have used these methods for, and you can see it runs through a really large range of problems. Since two weeks ago we're also talking to some people doing astrophysics, so since we do quantum Monte Carlo for electronic structure, the most basic level of atoms and electrons, and hopefully soon astrophysics too, I can say that this method now spans all ranges of scales in the universe.
OK, so here's one application. It's a partial differential equation, a Helmholtz-type model problem; this kind of problem occurs in electromagnetics and in many other fields, and this is a very simple model problem. It comes from a university group that has been using these methods. The parameters that govern this partial differential equation are stochastic; that is to say, we don't know exactly what the diffusion is, but we know the average and some of the moments of its probability distribution, and the same for the other coefficient, which you can think of as chemical reaction or absorption. What we want to know is the statistics of the solutions that come out of such a system.
In order to do that (there are many ways of doing uncertainty quantification), they have to solve, as this graph shows on the bottom, more than five thousand linear systems, somewhere between five thousand and fifty-five hundred. These systems can be very large; since this is a model problem it's not that large here, but in realistic situations these systems can be very large. For each system, what you see on the vertical axis is the number of steps, the size of the search space that was built to solve that linear system; in some sense this is the number of matrix-vector products that was required, and of course if these are very large and very expensive, this is expensive. Using a standard method, you can see some problems went fast, but many take a hundred and fifty or two hundred and fifty iterations; some were faster, and again some were slower. If they use this idea of recycling, using previously computed results from one system, so part of the subspace (I'll show in a moment how it works), to jumpstart the solution of the next, then virtually all the systems were solved in roughly twenty-five steps. For some of these that's a factor of six, for some a factor of ten, in the reduction of the total amount of work; for some only a factor of two. So it can really improve, very drastically, the amount of work that you need to do.
Here's another set of cases: topology optimization. The "topology" comes from the notion that the shape, the topology of the design, is unknown in advance. So think of the following problem: I give you a fixed amount of steel, and you're asked to build the strongest bridge, say. Strongest in this case is measured by the energy taken up by the structure, which is called the compliance: u transpose K(rho) u, where K is the stiffness matrix and rho is the distribution of material. This is a measure of the energy that the structure takes up, and this equation is how the system behaves: under some load you get some solution u, which you feed in here.
So how does this work? You put a load on your system and you compute how everything deforms: the finite element analysis. Then you look at a kind of derivative: you ask how the compliance, the energy taken up by the structure, would change for a local redistribution of material. You compute that for every part of your domain, and then you do a global update, which can be some advanced method or can just be a gradient-type approach, to redistribute the material. You can see in a little cartoon here how the optimal design comes out; you have to go through this loop many times.
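A heavily simplified sketch of that analyze-then-redistribute loop, under assumptions of mine: a 1-D chain of springs in series under a unit load, SIMP-style stiffness rho^p, normalized gradient steps, and a projected material budget. It only mimics the loop structure, not the speaker's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, budget, p = 10, 5.0, 3.0        # springs, total material, SIMP exponent
rho = rng.uniform(0.2, 1.0, n)
rho *= budget / rho.sum()          # start from a feasible random distribution

def compliance(rho):
    # springs in series under a unit load: flexibilities 1/k_i add up
    return float(np.sum(1.0 / rho**p))

c0 = compliance(rho)
for _ in range(500):
    # "analysis + sensitivity": derivative of compliance w.r.t. each density
    g = -p / rho**(p + 1)
    d = -g / np.linalg.norm(g)     # normalized descent direction
    rho = np.clip(rho + 0.02 * d, 0.05, None)
    rho *= budget / rho.sum()      # re-impose the fixed amount of material
c1 = compliance(rho)
```

For this toy the optimum is the uniform distribution, and the loop drives the compliance steadily down toward it; in the real problem each "analysis" is itself a large linear solve, which is where recycling pays off.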
OK, so here is an example. We have a simple structure, which just came out of the literature: it's supported at two points and you pull at two points. We're only going to simulate one quarter of this; the rest will be the same by symmetry, and you can see that the final design looks like this structure. It's not something you could easily imagine in advance. I'll show you a little movie of how the solution arises during the simulation. This was recorded; it's not real time. You can see here that not only does the solution change, and that means every time you solve a linear system, because the material is distributed differently, the linear system that you're solving changes, but in this case we're also changing the mesh as we go, because doing computations where you have holes is not efficient. So this is what comes out.
Here are some results, not for the specific problem that you saw, but for, say, finding an optimal beam. The large problems that we're actually doing run up to two million unknowns, but to run a large number of cases to see how well the method works, we did this on a smaller problem. You can see how the material distribution changes: this is the maximum material change per element, on this scale here, this is the number of iterations, and this is the number of optimization steps. You can see that there are some initial phases in which the recycling doesn't gain a lot, but then very quickly we use many fewer iterations, up to about a factor of three; there's some overhead involved. Here is the same thing against the optimization step: it took about a hundred and forty steps to find the optimal structure. And this is the reduction in time; you can see that the reduction in time is not quite a factor of three, because there's some overhead, so in this case it's about a factor of two. We can also see it was still improving, but unfortunately the optimization was finished. So this is the result; this is what that beam should look like. It's only half, again because by symmetry we know the other half.
OK, here is one last example. This is electrical impedance tomography, a fairly new tomography method, and the nice thing about it: it's not super accurate, but it's very cheap. This is done with people in Brazil, at the University of São Paulo, and one of the advantages is that with relatively simple equipment and a PC you can go into the jungle or some remote village, check people, get images, and tell a doctor what they need to do. So for them this is actually a very promising technology.
These are the results from some model problem. The thing they're imaging in this case is lung aeration, so you have a very big jump in how the signal propagates: we are mostly water, so the signal propagates very fast, but the lungs are mostly air, where it's slower. This test was just done by putting a tube into some cylinder with water, but they actually did some tests on human beings too; you have to sign a lot of papers to show those images, so we're not doing that.
So here are the numerical results, again for slightly different variations of our algorithm. The standard algorithm takes something like one hundred eighty to two hundred iterations, and then, after a few initial steps, our method essentially takes fifty to sixty iterations, so between a factor of three and a factor of four reduction in work.
OK, so how does this work? How can we make things faster? I first need to tell you a little bit about the mathematical methods that we use to solve these systems, and then how we improve them. We have a linear system A x = b, where A is large, x is the vector of unknowns, and b is the right-hand side, and we have some initial guess x0. We don't know if we have the exact solution, because we don't know the solution, but we can measure how well we satisfy the system of equations by computing the difference between the right-hand side and A times x; that's the residual, r0 = b - A x0. We want to iteratively approximate the solution: at each step we compute an optimal update z from a space of vectors, namely the space spanned by r0, A r0, A squared r0, and so on. This is called the Krylov space of dimension m. An optimal method, for example, may want to pick that z out of the space such that the norm of the residual is minimized; in that measure it's then the best solution you can find, and this is the search space that I was talking about.
Say you want to minimize the two-norm. One thing you could do (you should not do this; numerically it's not stable) is take those vectors, stick them into a matrix K as columns, and parameterize the solution: A times K times y is just a linear combination of those columns, and then it's just a least squares problem. So what an efficient method does is, at each step, efficiently and stably solve this problem as you extend this matrix; as I said, you cannot just literally build this matrix, because that would be unstable.
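For illustration only, here is exactly that "do not do this in practice" construction: build the Krylov basis explicitly as matrix columns and solve the least squares problem. For the small dimension and well-conditioned test matrix assumed here it works; for larger m the columns become numerically dependent and the approach breaks down.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 10
A = np.eye(n) + 0.02 * rng.standard_normal((n, n))  # eigenvalues near 1
b = rng.standard_normal(n)
r0 = b.copy()                       # zero initial guess, so r0 = b

# explicit Krylov basis: columns r0, A r0, ..., A^(m-1) r0
K = np.empty((n, m))
K[:, 0] = r0
for j in range(1, m):
    K[:, j] = A @ K[:, j - 1]

# parameterize the update as z = K y and minimize ||r0 - A K y||
y, *_ = np.linalg.lstsq(A @ K, r0, rcond=None)
x = K @ y
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
```

The power-basis columns all drift toward the dominant eigenvector, which is why stable methods orthogonalize the basis as it is built.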
One of the most popular methods was proposed by a former colleague, Yousef Saad: the GMRES method, which stands for generalized minimal residual, and that's exactly what you're doing here, minimizing the residual in the two-norm. There are some older papers that do basically the same thing, but the algorithms were not as efficient or as stable. So we need an efficient method for doing this, but it's really solving a least squares problem.
But to understand why these methods are fast, we need to look a little bit further. If you look at these vectors, they are all powers of A times the initial vector. So if I take a linear combination of those vectors, what am I really building? Starting with coefficients zeta_0, zeta_1, and so on, this is zeta_0 times A to the power zero plus zeta_1 times A to the power one, and so on up to degree m minus one: a polynomial in the matrix A, of degree m minus one, times the initial residual r0. In principle this polynomial is arbitrary; you can pick anything you want. Of course, if you want the exact solution it's a very specific polynomial, but in principle you can pick what you want. What we want is to solve the system: we want the residual, the difference between the right-hand side and A times x, to be small, or preferably zero. This residual, it turns out, is a polynomial too, because it becomes r0 minus A times this polynomial times r0. This is now a polynomial of degree one higher, but it has one annoying feature, and that is that its constant term is one. That is to say, this polynomial is not arbitrary; it has to satisfy the constraint that its constant term is one, which means it is one at the origin. So anyway, the residual is a polynomial of degree m in A, times r0.
OK, so this is what the algorithm looks like to do this efficiently, and then we'll continue with the analysis. You want to solve the system; you take the residual r0 and divide it by its norm, so now it's a unit vector v1, and then you build the Krylov space as follows. At each step, you have v1; you compute v2 by multiplying v1 by A, orthogonalizing against v1, and then normalizing. In general you compute v_{k+1} by multiplying v_k by A, orthogonalizing it against v1, v2, and so on up to v_k, and normalizing. This gives you a set of orthonormal basis vectors for the Krylov space. Then you can show that the coefficients give you a small (m+1)-by-m least squares problem whose solution is equivalent to the solution of the large least squares problem. So rather than solving a problem with n rows, where n is huge, say millions, you really solve a least squares problem of size (m+1)-by-m, and you compute this problem as you go. This has become a very popular method; if you look it up on Google Scholar, you get something like four thousand citations or so, which for a math paper is pretty spectacular. So this is what is being done, and then you update x: the computed y, the solution to the small least squares problem, times this set of vectors that span the space, gives you the update.
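The steps just described can be sketched in a few lines; the test matrix and sizes below are my own assumptions, and a production GMRES would add Givens rotations, restarts, and breakdown handling.

```python
import numpy as np

def gmres_sketch(A, b, m):
    """Minimal GMRES sketch: Arnoldi orthogonalization plus the small
    (m+1)-by-m least squares problem min ||beta e1 - H y||."""
    n = len(b)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    Q[:, 0] = b / beta                   # v1 = r0 / ||r0|| (here x0 = 0)
    for k in range(m):
        w = A @ Q[:, k]                  # multiply the last basis vector by A
        for j in range(k + 1):           # orthogonalize against v1 .. v_{k+1}
            H[j, k] = Q[:, j] @ w
            w -= H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)  # normalize to get the next vector
        Q[:, k + 1] = w / H[k + 1, k]
    e1 = np.zeros(m + 1)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H, e1, rcond=None)  # the small least squares problem
    return Q[:, :m] @ y                  # x = V_m y

rng = np.random.default_rng(2)
n = 200
A = np.eye(n) + 0.01 * rng.standard_normal((n, n))  # eigenvalues cluster at 1
b = rng.standard_normal(n)
x = gmres_sketch(A, b, 20)
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
```

Because the eigenvalues of this test matrix cluster around one, a 20-dimensional search space already reduces the residual by many orders of magnitude, which previews the analysis that follows.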
All right. If we had to do a million steps for a problem of size a million, this would not be efficient, but it turns out that even very large problems are sometimes done in fifty or one hundred steps. So why does it go so fast? The reason is that the Krylov space is a space of polynomials in the matrix times a vector, and so it inherits the approximation properties that polynomials have on the real line or in the complex plane. Let's assume A is diagonalizable, so A equals V Lambda V inverse, where V contains the (typically normalized) eigenvectors and Lambda is the diagonal matrix of eigenvalues. Then consider powers of A: A squared is V Lambda V inverse times V Lambda V inverse; the V inverse V drops out and I just get Lambda squared in the middle, and more generally A to the power k is V times Lambda to the power k times V inverse. So if I take a polynomial in A, I just write my linear coefficients times these matrix powers; since this is linear, I can move V to the left and V inverse to the right, and what I now get (and this is really the crucial part) is V times a diagonal matrix times V inverse, where on the diagonal you essentially apply the polynomial to every eigenvalue individually.
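This identity is easy to check numerically; the sketch below (my own example polynomial, not from the talk) verifies that p(A) equals V p(Lambda) V inverse, i.e. that the polynomial acts on each eigenvalue individually.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n))
lam, V = np.linalg.eig(A)

# an arbitrary example polynomial p(t) = 2 + 3 t - t^2, applied to the matrix
pA = 2 * np.eye(n) + 3 * A - A @ A

# the same polynomial applied to each eigenvalue, mapped back by V
p_lam = 2 + 3 * lam - lam**2
pA_from_eigs = V @ np.diag(p_lam) @ np.linalg.inv(V)

err = np.linalg.norm(pA - pA_from_eigs)
```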
So you have to keep that in mind: approximate solutions to linear systems, and to eigenvalue problems and many more problems, really rely on this property. So why does this work well? Let's assume the eigenvalues all lie in some region, say some part of the complex plane, or, if the matrix is symmetric or Hermitian, on the real line. We know polynomials have good approximation properties, so let's say you try to approximate the function one over t by a polynomial. Now, if the region that contains the eigenvalues is not too close to the origin, it turns out you don't need a high degree polynomial to get an accurate approximation. That means there is some low degree polynomial such that, if you apply it to A and write out the expression, you get V times one over t evaluated at the eigenvalues times V inverse times r0, up to a small remainder. Essentially, this means you apply the inverse of A to your vector, and that's all you need if you want to solve a linear system. So if you can find a low degree polynomial, a polynomial of modest degree, such that this is satisfied, you're in business: you have a fast solver. It turns out that if the eigenvalues are in some region Omega that is not too bad, this can always be done for small m. So if the region Omega is nice, and that typically means that your eigenvalues are clustered away from the origin, then things go fast.
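The following sketch (diagonal test matrices of my own choosing) shows the effect: with eigenvalues clustered in [1, 2], a 10-step Krylov space nearly solves the system, while moving just three eigenvalues next to the origin stalls convergence.

```python
import numpy as np

def gmres_relres(eigs, m):
    """Relative minimal residual over an m-step orthonormal Krylov basis,
    for a diagonal test matrix with the given eigenvalues."""
    A = np.diag(eigs)
    b = np.ones(len(eigs))                 # excite every eigendirection
    Q = np.empty((len(b), m))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(1, m):
        w = A @ Q[:, j - 1]
        w -= Q[:, :j] @ (Q[:, :j].T @ w)   # orthogonalize the new direction
        Q[:, j] = w / np.linalg.norm(w)
    y, *_ = np.linalg.lstsq(A @ Q, b, rcond=None)
    return np.linalg.norm(b - A @ (Q @ y)) / np.linalg.norm(b)

# spectrum clustered away from the origin vs. the same with 3 tiny eigenvalues
clustered = np.linspace(1.0, 2.0, 100)
with_outliers = np.concatenate([np.linspace(1.0, 2.0, 97), [1e-3, 2e-3, 5e-3]])
r_nice = gmres_relres(clustered, 10)
r_hard = gmres_relres(with_outliers, 10)
```

The residual polynomial must equal one at the origin, so it cannot be small at eigenvalues sitting next to the origin; that is exactly the obstruction the projection idea below removes.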
Now, of course, you can imagine that for many hard problems this is not the case. So what do you do? In that case we do the following: we find some matrix P, and think of P as a matrix that approximates the inverse of A in some sense. You don't want the exact inverse of A; that's too expensive. Then we iterate with P A x = P b. The idea is that P A will have clustered eigenvalues, but multiplying by P is still cheap, and for many problems we know exactly how to find such a P. Of course, computing P from scratch at every step would be expensive too, so that's why we also want to be able to update it rather than compute it from scratch. This is one reason Krylov methods have become very popular.
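A minimal sketch of that idea, with an assumed test problem: a matrix with a widely spread diagonal, preconditioned by the simple Jacobi choice P = diag(A)^{-1}, which clusters the eigenvalues of P A near one and makes the same number of Krylov steps far more effective.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 150, 15
# assumed test problem: widely spread diagonal plus a weak random coupling
A = np.diag(np.linspace(10.0, 1000.0, n)) + 0.1 * rng.standard_normal((n, n))
b = np.ones(n)

def krylov_relres(A, b, m):
    """Relative minimal residual over an m-step orthonormal Krylov basis."""
    Q = np.empty((len(b), m))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(1, m):
        w = A @ Q[:, j - 1]
        w -= Q[:, :j] @ (Q[:, :j].T @ w)   # Arnoldi-style orthogonalization
        Q[:, j] = w / np.linalg.norm(w)
    y, *_ = np.linalg.lstsq(A @ Q, b, rcond=None)
    return np.linalg.norm(b - A @ (Q @ y)) / np.linalg.norm(b)

# P approximates inv(A) cheaply; here the Jacobi choice P = diag(A)^{-1}
P = np.diag(1.0 / np.diag(A))
r_plain = krylov_relres(A, b, m)
r_prec = krylov_relres(P @ A, P @ b, m)
```

Real preconditioners (incomplete factorizations, multigrid, and so on) are more powerful than this Jacobi toy, but the mechanism is the same: multiplying by P is cheap, and the transformed system is far easier for the polynomial to handle.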
OK, so now here's the next step. We solve a sequence of problems. Let's say we know that most of our eigenvalues are in a nice region, but part of our spectrum is not: some eigenvalues are close to the origin, and our polynomial, being one at the origin, won't damp things there; it won't look like what we want it to look like. The idea is that if we know the vectors associated with those eigenvalues, we can project them out of the system, because if we project them out, then essentially we can think of those eigenvalues as gone. It's not quite that simple, but that's by and large how it works. Then we have fast convergence, and since the region often doesn't change that much, the vectors that span those invariant subspaces don't change too much either, so we can project them out of every system, and the method kind of learns as you solve linear system after linear system.
OK, so let's say we have a couple of vectors that span the space of interest, and we make them the columns of a matrix U-tilde. Then A times U-tilde, call it C-tilde, is the image of this space; that's what we're going to project out. So we do a QR decomposition: we find an orthonormal basis C for C-tilde. The corresponding U, which absorbs the inverse of the triangular factor, is never computed explicitly; whenever we need to multiply by U, we just apply that triangular solve to the coefficient vector first. So now we know the inverse of A over the range of C, and we can project that out of the right-hand side; that gives us an initial guess. And now we do essentially the same algorithm as before: the iteration is more or less Arnoldi with one extra step, namely that we project out everything in the directions of C before we continue. So where initially we computed the next vector by multiplying A times v_k and then orthogonalizing, now we do the same thing, but we first orthogonalize against the C vectors and then orthogonalize the v vectors against each other. This is not very expensive; there is some overhead. And now we have a slightly different optimization problem to solve: rather than just finding the optimal y, the optimal vector in the Krylov space, we also have to find the optimal vector in U. But it turns out that if you set up this problem the right way, and we did this some time ago, actually for a different purpose, it splits into two completely separate optimization problems: the least squares problem you need to solve is exactly the same small least squares problem that you saw before. So it's very cheap, and the update you need in U is just minus this y times a small matrix B that you computed along the way (B contains the products of C-star with A times the v vectors, so you already have it). That's all you need to do.
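Here is a sketch of the projection idea under simplifying assumptions of mine: a symmetric test matrix whose troublesome invariant subspace (three eigenvalues near the origin) is known exactly, so we can solve within range(C) first via a QR factorization and run the Krylov iteration on what remains. A real recycling solver only has an approximation to this subspace, but the mechanics are the same.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 120, 3, 12
# symmetric test matrix with three eigenvalues very close to the origin
Q0, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.concatenate([[1e-3, 2e-3, 5e-3], np.linspace(1.0, 2.0, n - k)])
A = (Q0 * eigs) @ Q0.T
b = Q0 @ np.ones(n)               # right-hand side exciting every eigendirection

def krylov_min_res(A, r, m):
    """Norm of the minimal residual over the m-step Krylov space of (A, r)."""
    Q = np.empty((len(r), m))
    Q[:, 0] = r / np.linalg.norm(r)
    for j in range(1, m):
        w = A @ Q[:, j - 1]
        w -= Q[:, :j] @ (Q[:, :j].T @ w)
        Q[:, j] = w / np.linalg.norm(w)
    y, *_ = np.linalg.lstsq(A @ Q, r, rcond=None)
    return np.linalg.norm(r - A @ (Q @ y))

# recycled space: the (here, exact) invariant subspace for the small eigenvalues
U = Q0[:, :k]
C = A @ U
Qc, Rc = np.linalg.qr(C)
x0 = U @ np.linalg.solve(Rc, Qc.T @ b)  # solve exactly within range(C)
r0 = b - A @ x0                          # residual with the small-eig part removed

r_plain = krylov_min_res(A, b, m) / np.linalg.norm(b)
r_defl = krylov_min_res(A, r0, m) / np.linalg.norm(b)
```

After the projection, the Krylov iteration only sees the nicely clustered part of the spectrum, so the same number of steps reduces the residual by many orders of magnitude more.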
OK, now there are different spaces you could use. Doing this has the effect that you just saw for many applications: in many cases we reduce the number of iterations by, for example, a factor of ten in the uncertainty quantification problem, a factor of three or four in the tomography problem, and so on. So the idea is: Krylov methods build a search space and get the solution by projection, and building the search space is often the dominant cost. If you can reduce the number of steps you need to take, building a smaller space to get the right answer, you're in business. The initial convergence is often relatively poor, and only when the search space reaches a reasonable size do you get superlinear, faster convergence. So the idea is to get the fast convergence rate, and a good initial guess if the right-hand sides are similar, immediately, by recycling selected search spaces from previous systems. That's the main idea.
There are different things we can do, and here I'm focusing somewhat on invariant subspaces, so spaces spanned by eigenvectors, essentially. Other people have done similar things for solving a single linear system, so not for a sequence; Ron Morgan has some work on that. For sequences of problems we proposed this; Mike Parks was one of my students, and Shun Wang was another student. There are other things you can do, again mostly for a fixed matrix rather than for a system that changes; one of the things we've done that is really new is to handle the matrix changing from one step to the next, where other people have looked at these cases for a fixed matrix. There's something quite different you can also do: if you look at the spaces you generate, you can look at the angles between those spaces, which means you can measure how much those spaces change from one step, or one system, to the next, and you can filter out which components do not change very much. Those are worthwhile keeping, because then you don't have to build them again.
And obviously, in many nonlinear problems, you often find that if you keep the sequence of previous solutions, you can use that to improve your convergence rate. Paul Fischer did this a long time ago for Navier-Stokes problems, just to get initial guesses, not necessarily to speed up the iteration. For tomography problems, again a nonlinear parameter optimization problem, we've been doing this more recently, even in problems where the matrix changes at every step, and we speed up the convergence. And there's a new project that I've been working on with a number of physicists, David Ceperley at the University of Illinois, in quantum Monte Carlo, where the matrix changes one row at a time at every step, and you need to solve a linear system with each of those matrices.
So just to kind of give
an idea of why how we pick what we
want to recycle and why that helps.
Here a few convergence plans and
not going into a lot of detail.
The residual are at step
is a polynomial of degree.
N. in eight times are not.
And so you may want based on some
knowledge that you may have in the Matrix
so in many cases when these majors come
from parts of differential equations or
other problems.
We actually know quite a bit about those
matrices they're not you know we know
where the eigenvalues are in
some region and so on.
So we can use this knowledge
to select Well if we
apply a method like Jim Ross how fast
the things converge so one simple bound.
It depends on what is called what you get
the condition number of the eigenvector
matrix so.
Via the eigenvectors of A So you can take
the knowledge that makes that's typically
modest but if the eigenvectors have
small angles with respect to each other.
This one tends to be large for
this can be a large factor.
At times an issue residual and
then times the best pulling only you
can find over the eigenvalues not
typically we simplify this and
take this poor woman over some region
containing the iron fields and
based on that we can
typically give a bounce.
On this norm and
then bounce increases exponentially in
terms of the number that you can take.
In some cases.
So if A has eigenvectors with small
angles with respect to each other,
this can be a wild overestimate, so
rather than look at this bound we
can look at a small variation.
Here the eigenvector
matrix plays no role, but
the region over which we have to take the
polynomial changes. This is called
the field of values: the set of all
values u*Au, where you take a unit vector u
and compute u*Au,
and you do this for all possible unit vectors u.
Then W
is a region that contains the field of
values, so we can look at this bound too.
And I won't go into
much more detail here.
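Written out, the two bounds sketched above take roughly the following standard form (my paraphrase of the slides; V is the eigenvector matrix, and the minimum is over polynomials p of degree at most N with p(0) = 1):

```latex
% GMRES residual: r_N = p_N(A)\, r_0 for some polynomial with p_N(0) = 1.
% Eigenvalue-based bound (A = V \Lambda V^{-1} diagonalizable):
\|r_N\|_2 \le \kappa_2(V)\,
  \min_{\substack{\deg p \le N \\ p(0)=1}}\;
  \max_{\lambda \in \Lambda(A)} |p(\lambda)|\; \|r_0\|_2 .

% Field-of-values variant: \kappa_2(V) disappears, but the polynomial must
% now be small on a region W containing the field of values
% F(A) = \{\, u^* A u : \|u\|_2 = 1 \,\}:
\|r_N\|_2 \le C\,
  \min_{\substack{\deg p \le N \\ p(0)=1}}\;
  \max_{z \in W} |p(z)|\; \|r_0\|_2 .
```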
So based on these bounds,
we can see how fast
the method converges, but
they also allow us to tell,
if you reduce in some sense
the spectrum, if you project
things out of the matrix, how much faster
your convergence will be.
Right.
So what makes for a good method?
Well, your method needs to satisfy
three main requirements.
The space that you want to recycle, for
example an invariant subspace (but
more general things are possible), you
have to be able to identify and
converge to quickly, because
your matrix changes at every step.
If you don't get a good approximation
from, say, the first system,
the next system is going
to be somewhat different,
so you'll never get an effective
recycle space. So it needs to converge quickly;
it doesn't have to be very accurate, but
you need to be able to pick it up,
since your system changes at every step.
Whatever you're trying to find
may have been very good for
the previous system, but
it's not going to be accurate for the next
system, so it has to be really effective
even if the approximation is quite poor.
And of course, whatever
space you're trying to pick up,
an invariant subspace or something else,
is perturbed because you change
the matrix, so you need to converge quickly
to the new version. You could ask how
quickly the method learns or adapts,
but the results already showed you
that these methods are quite effective.
So in general this works really well.
There are some related questions, but
those are more problem-dependent:
if you change the matrix in some way,
how does the invariant subspace
that you're interested in change?
For specific problems you can analyze
that; you cannot analyze it in general.
And how fast is that convergence?
For example, if you're looking
at Ritz vectors,
a lot is known, because
people who solve eigenvalue
problems do this all the time,
and they can tell you how fast
certain methods converge.
And the other thing:
typically your method needs to be cheap,
because if it's too expensive,
OK, you may converge faster, but
you really paid for it,
so you're not gaining anything.
So here are some results for
a crack propagation problem,
in more detail.
This shows the convergence
per step; this is the total number of
matrix-vector products to solve
five time steps in this
crack propagation problem.
And this is the convergence of full
GMRES; that's the best method that you can
have if you solve one system at a time.
OK, so that one starts slowly
and then takes off, and these straight
lines here are our method with recycling.
And the thing you can see is that the rate
of convergence that we typically have
is the rate of convergence
that GMRES has at the end.
Right.
So it starts
slow and then it speeds up, and we're
able to just pick up that fast rate and
have it for every step.
Right.
So this is how the norm of the residual
that we compute goes down,
the log base ten of it,
over the number of steps.
So I'm going to
give you one result. I talked about
how fast things actually converge if you
have a very poor approximation;
this is a brand-new result.
We have it both for Hermitian matrices,
which is somewhat easier, and for
non-Hermitian matrices, which
is typically much harder.
And then we'll move
on to some conclusions.
So OK, what can we say?
What we want to say is that if we have a very
modest approximation of an invariant
subspace (that's what this bound is about),
we still get much better convergence.
As I told you, typically you may
have eigenvalues close to the origin;
those are annoying.
Eigenvalues far away from the origin
will give you a polynomial
that gives you fast convergence.
So typically you have some of
both. Let's say Λ_Y
contains the nice eigenvalues and
Λ_Q
contains the annoying ones we
want to get rid of.
So associated with Λ_Q
is a set of eigenvectors that
I put into the matrix Q,
and Y has the complementary set of eigenvectors.
So A has this kind of decomposition
into invariant subspaces: Q
with the Λ_Q eigenvalues and Y
with the Λ_Y eigenvalues.
Now let's say we have computed a recycle
space C that approximates Q
in the following sense, not
approximation in the usual sense. It
approximates Q in the sense
that if I pick anything out of Q
(Q has orthonormal columns),
pick any vector out of Q and project out
the C components,
then what is left is some vector
of size at most delta, which is
less than one.
So of course, if delta is very small, then I
have captured Q to very high accuracy.
But what we see, just by measuring for
a range of problems:
if the matrix changes
from one step to the next, delta is typically only
maybe ten to the minus two.
For somebody solving eigenvalue problems,
I can tell you, that's very poor.
But it's good enough for
us, and that's what
the next bound will show.
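To make this concrete, here is a tiny numerical illustration of that delta (the sizes and the way C is generated are invented for the demo); delta is just the largest singular value of (I - CC^T)Q:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 5   # hypothetical problem size and subspace dimension

# Orthonormal basis Q of the "exact" invariant subspace (random here, for illustration).
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))

# Recycle space C: a slightly perturbed, re-orthonormalized copy of Q,
# mimicking the subspace carried over from the previous system.
C, _ = np.linalg.qr(Q + 1e-3 * rng.standard_normal((n, k)))

# delta = max over unit vectors q in range(Q) of ||(I - C C^T) q||_2,
# i.e. the largest singular value of (I - C C^T) Q.
delta = np.linalg.norm(Q - C @ (C.T @ Q), 2)
print(delta)   # on the order of 1e-2: a rather poor approximation of range(Q)
```

By eigenvalue-computation standards a delta of 1e-2 is indeed a poor approximation, which is exactly the regime the bound addresses.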
So, at every step
I'm projecting out of the matrix A
the directions
spanned by C.
And let's say the matrix [C W]
is unitary.
That means in my iteration I'm
essentially iterating over the space spanned
by the W
vectors, the range of the matrix W.
Now, what
I can show is the following. I pick vectors only
out of that space, because that's what I'm
iterating over, and I look at the range
of eigenvalues (or the field of values) that A gives there.
Initially, of course, that range ran over all
the λ in Λ_Y and all the λ in Λ_Q.
We've now projected out most of Q, so
what is the range now?
Well, it looks as follows. At the top
I still have the max over the λ in Λ_Y;
that didn't change. Those are, let's say,
the eigenvalues away from the origin. And the
small ones: well, it's the minimum over Λ_Y
multiplied by one minus delta squared.
So if delta were very close to one, this
could be close to zero, but even for
a delta of ten to the minus one or ten to
the minus two,
which we can easily get,
this is 0.9999
times that eigenvalue.
So if Λ_Y really is
a set of eigenvalues that is clustered
away from the origin, the spectrum
of the remaining part is a very
modest perturbation of those eigenvalues.
So even very modest accuracy
will give me convergence as if all the
Λ_Q eigenvalues are gone.
There is an extra term now. In the case
of a Hermitian positive definite
problem, it will only add to this,
so everything moves away
from the origin; that's fine.
If Λ_Q has negative eigenvalues,
OK, they can pull this
closer to the origin, but
only by a modest amount, because
it's multiplied by delta squared.
So if Λ_Q
is negative but not extremely large,
then,
OK, for example for
delta of order ten to the minus two, it's
multiplied by ten to the minus four.
So in general,
right,
this gives you only a very
modest perturbation of this lower bound.
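As an illustrative sketch (my paraphrase of the argument above; the precise statement and constants are in the paper), the effective spectral interval after deflation looks like:

```latex
% A Hermitian, AQ = Q\Lambda_Q (annoying eigenvalues near the origin),
% AY = Y\Lambda_Y (nice eigenvalues), recycle space C with
% \|(I - CC^*)Q\|_2 = \delta < 1.
% After projecting out C, the iteration effectively sees eigenvalues in roughly
\left[\,(1-\delta^2)\,\lambda_{\min}(\Lambda_Y)
      \;+\; \delta^2\,\lambda_{\min}(\Lambda_Q)\,,
      \;\; \lambda_{\max}(\Lambda_Y)\,\right].
% For \delta = 10^{-2} we have \delta^2 = 10^{-4}, so even negative
% eigenvalues in \Lambda_Q shift the lower end only very slightly.
```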
I won't go into the
more technical stuff, but
using this bound on the eigenvalues, for
a Hermitian problem we can
immediately write down a bound for
the convergence rate, which is very similar
to what you saw above, except that for
a Hermitian matrix the condition
number of the eigenvector matrix is one.
For non-Hermitian problems
we have similar bounds.
The non-Hermitian case is always harder,
so it's a little more complicated, and
I won't go into the details.
Let me show you a few results, just for
analysis purposes. So
this is the Laplacian on a two hundred by two
hundred grid, so about forty thousand
unknowns, using a fairly
standard preconditioner.
What we want to test here is just
how quickly the method learns.
So we solve the same
system multiple times.
That has no practical value, but
it does tell us how quickly
the method improves
in terms of speeding up convergence.
So CG
and MINRES are optimal methods, but
they don't recycle anything;
this is what you would get for
the first system.
If you do this recycling,
you solve it once and then keep a few
vectors; what you see here is only fifteen vectors.
The forty is
the size of the space over which
you compute that
approximation at each step.
So you save only fifteen vectors.
You can already see you do much better.
And then, as you can see, if you
do multiple steps, it gets better and
better and better.
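The "learning" effect rests on deflation: once the recycle space captures the annoying eigenvectors, the solver behaves as if those eigenvalues were gone. Here is a minimal sketch of that mechanism, using a hand-rolled CG and exact eigenvectors as an idealized recycle space (the matrix, sizes, and spectrum are invented for the demo):

```python
import numpy as np

def cg(A, b, tol=1e-8, maxit=10000):
    """Plain conjugate gradients; returns the solution and the iteration count."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    for it in range(1, maxit + 1):
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol * np.linalg.norm(b):
            return x, it
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, maxit

rng = np.random.default_rng(1)
n, k = 500, 5
# SPD test matrix with k annoying eigenvalues near the origin.
eigs = np.concatenate([np.linspace(1e-3, 1e-2, k), np.linspace(1.0, 2.0, n - k)])
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (V * eigs) @ V.T
Q = V[:, :k]                      # idealized recycle space: the bad eigenvectors
b = rng.standard_normal(n)

x_plain, it_plain = cg(A, b)

# Deflated solve: handle the Q-components directly, run CG on the rest.
b_defl = b - Q @ (Q.T @ b)             # project the recycled directions out of b
x_defl, it_defl = cg(A, b_defl)
x_defl += Q @ ((Q.T @ b) / eigs[:k])   # exact solve on the recycled subspace

print(it_plain, it_defl)               # deflation needs far fewer iterations
```

With the bad directions handled separately, CG only sees the well-conditioned part of the spectrum, which is exactly what the recycled solver achieves implicitly.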
And this line is the bound that
the convergence result
you saw on the previous slide gives you.
And this curve, these red
triangles, is the convergence
if you exactly projected out the fifteen
eigenvectors corresponding to the smallest eigenvalues.
So you can see this bound is pretty good
in this case, and in practice the method
actually does better. We can explain
why that is: it turns out that
just finding the exact invariant subspace
is not the smartest thing to do. But anyway,
if somebody is interested in the details
of that, come talk to me.
But if we have an indefinite problem,
typically things are much worse.
So if we do, say, two steps,
then you get this curve, and
again it converges very rapidly
after the first solve.
It turns out, and this is just to
show how this works, that the first
four eigenvectors are very accurate but
the fifth is not, and so
this delta squared is not really small.
So you see the bound is not that good, but
it essentially mimics the initial
convergence rate of your method.
The fact that the method speeds up,
the bound cannot capture.
So it misses that:
these methods typically do
better than the bounds suggest.
Right.
So a few more general remarks and my
conclusions. First of all,
this idea gives you,
with modest effort,
typically much faster
convergence, and so a lot of people in
many applications have
started to use this.
And of course there are harder problems.
Suppose you solve over a large
range of parameter values, you solve
some complicated nonlinear problem, or,
for example, consider people who do reduced-order models.
You have a large
dynamical system,
and it's too expensive to
solve it accurately, but
you actually know that for some set of
inputs and outputs that you're interested in,
you can represent it by a much smaller
system, which is the reduced model.
Now people want to do this for
highly nonlinear problems or for
parametrized problems, so
they have to compute many of these
reduced models, one at each step.
The way you compute reduced
models actually involves solving
linear systems, so at the bottom level
you can use this technique, and people
are doing that; but obviously you could use
the same idea at this higher level,
and I think that's a very interesting
approach to take.
You could say: if I have a reduced
model for some dynamical system,
and the dynamical
system is parametrized, so
I'm iterating in some sense
in parameter space,
what is really the best way of updating
those reduced models over multiple
parameters?
And it might be that you can
do that much better, much
more efficiently, directly
at that level, rather than
saying, well, we have to solve some set of
linear systems again that are not too
different from the previous ones, and
use recycling at the bottom.
So some new things we want to do:
we want to think, very generally, at a high level,
about what computed results can be exchanged
between problem instances,
especially as we do more and
more complicated problems where
we do everything at once, both
design optimization and
uncertainty quantification.
For example, we want a design
that is optimal in the sense that it minimizes
the risk of failure; you're doing a lot
of these things all at the same time,
so you want to take that
into account as you solve,
as you do your simulations.
Right.
So, some conclusions and future work.
OK, so I think what I hope to
have shown is that recycling
the search space is very effective for
a wide range of applications.
OK.
And those techniques are fairly cheap.
They're not free;
there definitely are some problems out
there for which it doesn't really work,
because, say, the matrix-vector product
is so cheap that as soon as you have to
project out a few extra vectors, you
have to reduce the iterations by
a huge amount for it to be effective. But
for many problems it works really well.
OK, many problems have
localized changes in the problem;
small changes like those mean small
changes in the invariant subspaces.
But for example in tomography,
the changes in
the matrix from one step
to the next are actually not small.
You cannot use standard approaches to show that
the invariant subspace doesn't change,
but a more detailed analysis shows
that it will be nearly the same, and so
these methods work.
That is to say:
based on the background
of your problem,
you can analyze whether these
techniques will be effective or not.
OK, well, so one thing we've recently
done: at least for
some types of recycling
we now have an effective
convergence theory, a theory
that tells us, even for
modest approximations, how
much things will improve.
And of course that shows that
the recycle
space does not need to be accurate.
Now, from experiments we already more or
less knew that, but
now we really know it,
because we proved it.
If you have problems that change a lot:
in uncertainty quantification,
in some sense all your matrices are
approximations to one average matrix.
But if you do the evolution of a crack,
your matrices really become more
and more different, and in that case you
need to update fairly regularly.
This is not something I showed here, but
if you're interested I can
show some other slides.
So you need to track the problem
to keep fast convergence.
We saw that some of those
bounds, this is for
the non-Hermitian case, depend more on the field
of values than on the eigenvalues.
So one thing we could do,
of course, is base our algorithm
directly on that; we have not done that,
so that's an interesting
new step we could take.
OK, I think we have a very nice bound for
the non-Hermitian case, but
the proof is not so nice, so
that's another thing. And
of course, proving a bound is
one thing; it tells you something.
But really what you get out of the proof
is understanding what is going on, and so
a clean proof has a purpose
all in itself.
Another proof of the same result
is mathematically no
different, but for
understanding it can be very useful. And
we have software available:
if these methods might be of use to you,
you can ask me. Mike Parks and
other people at Sandia are actually
building them into Trilinos, so
if you're using that, you're more or less
getting access to these methods for free.
And that's it;
I have one more slide with a couple of
papers you may or may not want to read.
OK OK.
Yeah.
For example.
Well.
So in this case this was for
MINRES.
You saw that in one of the plots, but
the idea of the forty actually
reflects the following.
So for methods like GMRES:
if forty iterations were enough, you could
just do forty and you're done.
But suppose you have to do a few hundred.
If your system is very large,
that would mean
keeping, say, four hundred
vectors.
Typically you can't do that,
so you have to restart, and
we restart in a smart way.
But now, for MINRES you don't have
to restart, because it's a short-recurrence
method; you can just continue,
you don't need to store those vectors.
But of course, if we want to
approximate an invariant subspace, we do
have to store a few of them.
So what that forty reflects is that
we do forty MINRES steps, then
reduce to an approximation of size fifteen;
then we do another forty steps of MINRES,
take those, now fifty-five, vectors, go back
to fifteen, and continue all the way. And
even across multiple systems
we just keep doing this.
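Schematically, the cycle structure just described (forty steps, compress to fifteen, repeat) can be sketched as follows. This is only a structural illustration using a Lanczos basis and a Rayleigh-Ritz compression, not the actual recycling MINRES code, and the matrix and sizes are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k = 300, 40, 15   # problem size, steps per cycle, vectors kept

# Symmetric test matrix: 15 "annoying" eigenvalues 1..15, the rest in [50, 300].
eigs = np.concatenate([np.arange(1.0, k + 1.0), np.linspace(50.0, 300.0, n - k)])
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (V * eigs) @ V.T

def krylov_basis(A, v0, m):
    """m steps of Lanczos with full reorthogonalization; returns an n-by-m basis."""
    Vm = np.zeros((A.shape[0], m))
    v = v0 / np.linalg.norm(v0)
    for j in range(m):
        Vm[:, j] = v
        w = A @ v
        w -= Vm[:, :j + 1] @ (Vm[:, :j + 1].T @ w)   # full reorthogonalization
        v = w / np.linalg.norm(w)
    return Vm

W = np.zeros((n, 0))                 # recycle space, empty before the first cycle
for cycle in range(5):
    Vm = krylov_basis(A, rng.standard_normal(n), m)   # "forty" new iteration vectors
    S, _ = np.linalg.qr(np.hstack([W, Vm]))           # 15 kept + 40 new = 55 vectors
    H = S.T @ (A @ S)                                 # Rayleigh-Ritz projection
    vals, vecs = np.linalg.eigh(H)
    W = S @ vecs[:, :k]              # compress back to 15 approximate eigenvectors

ritz = np.linalg.eigvalsh(W.T @ (A @ W))
print(ritz[:3])   # close to the smallest eigenvalues 1, 2, 3 of A
```

The point of the design is that the storage never grows beyond k + m vectors, while the kept subspace keeps improving across cycles and even across systems.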
Yes.
So, you can do smarter than that.
But, well, first of all, you don't know what
the future brings, in some sense,
so you always have to make
some assumptions. But
it turns out that although this is nice
for analysis, because we understand
invariant subspaces very well,
some other methods are better, and they
adapt themselves to how the system evolves.
Instead of looking
at the canonical angles,
actually what you'll see is that as you
solve the system, it approximates different
subspaces all the time.
And that's better:
for a single system that
gives you better convergence.
So this is linked: if you do
some of these methods for
GMRES, where you have to do some
type of restarting anyway,
then it has similarities.
Right.
But the point, of course, the nice thing
that we do, or what we try to do, is that
we do it all on the side. It has
links to, in this case for example,
approximating a subspace when you restart. Now
the point, of course, is that when you do a thick
restart in, say, Arnoldi or whatever, you're
only trying to find an invariant subspace,
so you're iterating closer and
closer to an invariant subspace. For us
that would make no sense, because once I
find an approximate
invariant subspace, the components along those
eigenvectors are more or less projected
out of my residual. So if I did something
like a pure restarted eigensolver,
I would find a very accurate invariant subspace,
but my solver would just stall. So
the point here is that you have
to solve your linear system, and
you do the subspace computation
on the side, very cheaply.
That's also why we don't
compute them very accurately.
But it's good enough, and
at the level of the algorithm there's
obviously some similarity.
Yeah sure sure.
In fact,
one person on that long list
of collaborators is just starting,
and he's planning to use it exactly for
that.
So.
So for example, if you use a matrix
polynomial to, say, solve differential
equations...
Well, functions of matrices have
certainly also become a very hot
topic, right? So
there too.
Yeah.
People are using, or
at least some people are
trying to use, these methods.
I have not done it myself, so
I cannot tell you too much about it, but
yes, I think so.
I think so. For example,
depending on what you want to do,
you may want to apply some rational
function of the matrix,
and that involves a number of solves, so
obviously this could be very useful.
I would say that
applying a function of a matrix times
a vector is, all by itself, a very active
research area, and a lot of people
work on it. Even though people
started doing it a long time ago,
there's a very old
paper by van der Vorst
about this, it actually only
recently became a very hot topic.
So.
It's
a good question.
Well,
one thing that says a lot:
when I started doing this,
I asked people with interesting
applications to do this, and
nobody was interested.
"We just solve one system fast and
we're happy." Then, luckily,
at Illinois we ran into the crack
propagation software, and
once we did it there, suddenly
everybody else wanted to use it too.
So, have I thought about
it differently? Well,
the reason it
started is that we were trying
to do smart methods for single systems.
For example, GMRES
has to do this full orthogonalization of
a search space; if you had to do a
thousand steps until you converge,
you'd have to store a thousand vectors. So
you don't do that: you restart, you do
twenty steps, compute a new solution, use
it as your new initial guess, and do it again. In
many cases your convergence is then horrible.
Right.
So even though
full GMRES might converge in,
say, eight hundred iterations,
if you restart every twenty
you may never converge.
So what we worked on then,
and the idea was somewhat similar, was finding
something in that search space such that if
you restart, you keep a convergence
rate close to full GMRES.
After we did that, the idea to do this
for a sequence of matrices came up: well, if the matrix changes
a little bit... I was working with
people in lattice quantum chromodynamics,
who also need to do it for
different systems, in some
Monte Carlo setting.
So we thought, well, let's just try it;
it worked pretty well, so we kept going.
And of course, analyzing
why it works has been one step
that moved things along; then doing it for
very different problems.
Right.
And then there are people,
for example in tomography, who not
only do it for a sequence of systems but
also have to solve for
many frequencies, so
the matrices all look like
A plus a constant times the identity,
and you have twenty of those constants, and
you want to do all of them at the same time.
So there's been a whole range
of variations. And then the next step,
preconditioning,
was another thing:
if you're computing a preconditioner
for each of these systems anyway, let's see if we
can recycle that too, and some other
people have already started doing that.
There's a reference:
people like Tůma and
others from the Czech Academy of
Sciences already have a few papers
along these lines, saying, well, de Sturler is doing
this, we can do something similar for
preconditioning. Those are very special
problems, but it worked pretty well.
So that's maybe the evolution, and
now we want
to say: well, if you solve these sequences
of linear systems inside model reduction,
we should do it for
the reduced model directly, rather
than at the bottom.
And actually, there too: at a SIAM
annual meeting I met a student of a colleague,
and they are looking at
some of these things as well.
So, OK.