[Host introduction, largely inaudible: the speaker recently joined Georgia Tech, with appointments spanning Earth and Planetary Sciences and Computational Science and Engineering, and has very diverse interests, working with lots of different technologies, machine learning among them.]

Thank you, thank you. Can everybody hear me? Always good to check. OK — well, thanks for giving me the opportunity to talk. This is not a very polished talk — there are results in here that are only a couple of weeks old — but I want to try to give you the gist of what we do. There is not an enormous amount of math, but there are certainly a lot of connections to applications, and I will once in a while make a remark to connect things. There's a lot of material, so we'll see how far I get. The key idea is that there have been — driven in part, I guess, by machine learning — a lot of recent developments in matrix factorizations that allow us to do things we could never hope to do before, in particular in the way we acquire data. In seismic exploration we use man-made sources to find out what the Earth looks like, and these techniques can have a major impact on how you conduct that business. Then, towards the end — if I get there — I will talk a little bit about a rather esoteric type of approach where we can now do things you could never hope to do on any computer on earth, because the problem is too big: it's an enormously lifted problem, but by being very clever with linear algebra you can actually do something. So hopefully by the end of this talk you will have an idea that randomized linear algebra, compressed sensing, and machine learning have a lot of tools to offer to make fundamental breakthroughs in applied fields such as seismic exploration. This work is not by myself; it's by a whole bunch of graduate students — some of them are in the audience — and some postdocs.

Just to give you an idea: we aspire to these scales, and certainly in the first part of the talk we will hit this order of scale; as I progress towards the end things are still at an exploratory stage, but we develop our algorithms with this sort of scale of problem in mind. In the field of seismic imaging, as I mentioned a little bit earlier, we have a source that sends energy into the earth, it reflects, and it's detected by a bunch of microphones, and from the echoes we get back from the earth we try to make an image of the subsurface. It's a very diverse field: it involves geophysics, but also signal processing and a lot of mathematics, and what makes it a particular challenge is that a typical image is a thousand cubed — a billion unknowns. That is a lot, so that already makes a big difference: it's not only big data, it's also big models. We typically collect two dimensions more data than the image — that's about ten to the power fifteen data points — and, for those of you who know about PDEs,
we propagate waves over long distances, which has major numerical issues of its own. So there are a lot of problems right there.

What we are really doing is using a wave equation. Think of m as the speed with which waves propagate in the earth; we want to obtain it as a gridded function of space from the data we collect at the surface, and that velocity determines what wavefields we get. We know the source, and we can relate the source to a wavefield through the wave equation: a discrete system of equations that discretizes the PDE describing how waves propagate. That system is determined by m — the sound speed, varying spatially, determines how the system behaves — and it's a big system: a billion by a billion for one frequency, if m has a billion grid points. A large system of equations right there. Then we observe the data at the surface, so we are basically trying to find an m that explains the observed data — because of course we do the experiments in the field to measure the response. To solve this sort of problem we solve a large minimization problem: we minimize an objective, parameterized by m, that measures the misfit between the observed data and the forward modelling F we do in a computer, which is parameterized by m. And to remind you: these things are expensive. Collecting a seismic survey sets you back somewhere between, say, thirty and two hundred million dollars; these are big exercises that take months and months with large crews — not something you do twice if you can avoid it.

Now, there are direct links: if you look at this objective, it looks very much like an objective you would minimize in machine learning — you just label things differently. The data d would be the training data, m the parameters of a network, F the network itself; and solving a PDE you can think of, conceptually, as a multilayer network — except we have ten thousand layers, and we don't do only local operations between the layers. That tells you how big these systems actually are: really large systems of equations we have to solve just to link the physical parameters of the earth to the observed data. And there are lots of other associations: the forward modelling you can think of as a generative convolutional network; we do something called the adjoint-state method — that's backpropagation; we do randomized source subsampling — that's sketching and stochastic optimization. So terms that are well known in the machine learning community also apply to our community; we just call them differently. The structure is the same, except that our variables are quite large — although you have to say that the hidden variables in a deep convolutional network are also large. But if you want to think of the scale of the problem: we're really trying to learn from 5K-plus video, so to speak — we're not working on small little images, we work on really, really large data volumes. So if you wanted to use deep learning in this field tomorrow, at scale, you would have a problem on your hands. That's the challenge, and that's why I'm thrilled to talk to this audience and to start working on this sort of problem.
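In symbols, the misfit objective described above can be written as follows. This is a reconstruction from the talk's description, not the speaker's slide; the symbols H, P, and q_i are assumed notation:

```latex
\min_{m}\ \sum_{i=1}^{n_s} \tfrac{1}{2}\,\bigl\| d_i - F(m;\,q_i) \bigr\|_2^2 ,
\qquad
F(m;\,q_i) = P\, H(m)^{-1} q_i ,
```

where H(m) is the discretized wave equation (on the order of 10^9 × 10^9 per frequency for a billion-point model m), q_i is the i-th source, P restricts the computed wavefield to the receiver locations, and d_i is the corresponding observed shot record. Reading d_i as training data, m as network weights, and F as a very deep network gives the machine-learning analogy drawn above.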
OK, so we run a large research program — I did a lot of work on optimization and compressed sensing. This slide is a bit outdated — it doesn't have machine learning on it yet, and it says Matlab; we moved to Julia in the meantime — but it gives you a little mind map of the different things we do in my group. We have to work with lots of different things to solve this problem; it's not a nicely isolated thing you can just work on with a small team — you need a large team of people touching all the boxes in that mind map. Just to give you an idea what the output looks like: this is a conceptual model of the earth, where the velocity changes as a function of space. This is just 2D, but of course we want to go to 3D, and what we typically produce are either somewhat blurred images or sharp, high-frequency images that basically delineate where the earth changes. The process that generates this from the data we call migration — I'll come back to that later. The idea is to get very high-resolution images, and for that you have to understand the physics of how waves interact with the earth, collect data, and try to figure out from the data what the earth looks like. So it's a large inverse problem, with a lot of HPC components, because these things are so big.

OK, so now let's go to the research topic for today. I will speak specifically about how we can exploit the low-rank structure that underlies the data. Keep in mind: we collect data in five dimensions — two source coordinates, two receiver coordinates, and time — whereas the image of the earth has three dimensions, so there must be some sort of inherent redundancy, and we're going to try to find ways to exploit it. And then, if I can, I want to talk about what we call extended image volumes, which are liftings of the problem — so if a billion variables are not enough for you: in that case the unknown is a billion-by-billion matrix, which you can never hope to form, but you can maybe work with actions on it. That's where I hope to get.

So why do we care? Well, because we would like to reduce acquisition time. Acquisition is expensive, and it has an environmental impact that you want to reduce — can we be more clever in how we acquire data and reduce cost? That is very important these days, because with the low oil price the industry is not in the best shape. But it holds for everything: data collection everywhere is expensive, at least if you do it with physical sensors. Then there's a massive amount of data, and that also makes it difficult to compute. So if you can find representations of your data that are enormously compressed and allow you to extract the portions you need without having to form the full volumes, that could have an enormous impact on how you process the data, because you take away a lot of the burdens related to I/O. And then: can you do things that you could never dream of doing unless you're clever with your linear algebra? That's where I'll get.
OK, so just to show you again a little bit of what's going on: this is a gridded velocity model of the subsurface — that's the property we're interested in — and, as I said, it typically has about a billion points. From that, with the wave equation, we can generate shot records, which are basically single-source experiments: this is the data collected by a two-dimensional array of receivers, so the vertical axis is time — you see all these wavefronts coming in — and these are the x and y coordinates of the receivers for one source. And then you do hundreds of thousands of these, so you collect a lot of data, and of course the question is: can you map from here back to here? So it's a large imaging problem — the earth you are after resides in the data you collect — and you have to do a mapping from a five-dimensional data volume to something with, let's say, a billion or so unknowns. To make things even worse, we are interested in doing a lifting where every grid point in the model generates a volume like this, so the unknown actually becomes a billion by a billion — that's getting towards the size of the big Google matrix — and it's dense. So we can never form that explicitly.

So let's start talking a little bit about acquisition, because again we need to collect data in order to create these images. As I mentioned, we are sampling a five-dimensional wavefield: one long axis for time, and then two receiver coordinates and two source coordinates, all at the surface — a five-dimensional data volume. We worked a lot on compressed sensing in the beginning, using transform-domain techniques, and these are great if you are in three dimensions, but in five dimensions the curse of dimensionality basically hits you, so those things don't scale. All the fancy stuff we knew — transform-domain sparsity with things like curvelets and wavelets — unfortunately doesn't really work: it blows up in your face because the data is so large. So that prohibits scaling up to 5D. Can we exploit another type of structure to handle this sort of problem?

To do that, we treat a 3D seismic survey as follows: we get rid of one coordinate by taking a Fourier transform along time, and then we look at monochromatic frequency slices — that is, you keep the four dimensions, two source and two receiver, at one frequency — and the different frequencies can be treated independently because the Fourier transform is orthogonal. So we call these things frequency slices. This is an example of a frequency slice — a small one, but still a dense matrix of twenty-seven thousand by twenty-seven thousand, and it represents a small patch; this is a postage-stamp 3D seismic example, but it's already pretty tough if you have to work with it stored explicitly. We would like to work with this full matrix, but that may not be possible, so one of the things we can do — and I'll talk more extensively about it as we go on — is to recognize that the data, organized in a particular way (which I will revisit in a minute), has redundancy we can exploit.
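As a concrete picture of the frequency-slice step, here is a minimal sketch with made-up toy dimensions. It is in Julia (which the group says it moved to) and uses the external FFTW.jl package; the array layout is my assumption for illustration:

```julia
using FFTW  # external package; the de-facto FFT library in Julia

# toy 5-D data volume: time × source-x × source-y × receiver-x × receiver-y
nt, nsx, nsy, nrx, nry = 256, 10, 10, 10, 10
D = randn(Float32, nt, nsx, nsy, nrx, nry)   # stand-in for recorded data

# Fourier transform along time only; the frequencies then decouple and can
# be treated independently
Df = fft(D, 1)

# one monochromatic frequency slice: a 4-D tensor over source/receiver coords
f = 20                      # an arbitrary frequency index for the toy
slice = Df[f, :, :, :, :]   # size nsx × nsy × nrx × nry
```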
We can actually work in factored form. Instead of working with this dense matrix, we recognize that if we matricize this four-dimensional tensor into a matrix, and we do it cleverly, there actually is redundancy — and it's not so crazy that there's redundancy, because these sources and receivers look at the same earth; there fundamentally is redundancy, certainly if you don't go to too-high temporal frequencies. So we can think in terms of factored form, work only with the factors, and avoid forming the full volume, extracting only the sub-volumes we need for the inversion — that is, the imaging.

So what we're doing is basically the Netflix matrix-completion problem on steroids — it's a very large matrix. There's a whole literature on this; in my group we worked with Ben Recht, who works a lot on machine-learning problems, and there's a whole bunch of papers you can find online that deal with it.

OK, so what is key if you do matrix completion? I will go through the ingredients and then make the connection to how you have to rethink your seismic data acquisition to make it fit this framework — because if you just do things naively, it won't work; you have to be clever about it. First of all, you have to find a representation of your data — a matricization of your 4-way tensor — such that the singular values decay fast. How can you do that? That's the first question.

So we have to think about how to unfold a 4-way tensor into a matrix, and there are different choices you can make. With four dimensions — two source, two receiver — you can either put together source-x and source-y (that would be the canonical representation, and the same for the receivers), or you can do a permutation, because there are many different ways to matricize your data. This is an example of a frequency slice where I put the source-x and source-y coordinates along the horizontal axis and the receiver-x and receiver-y coordinates on the vertical axis, and in here you see all these little sub-blocks — each is basically one small experiment of a three-dimensional survey. And this is what they look like: oscillatory functions that decay away from the diagonal, because waves propagate away from the diagonal. This organization does not have fast-decaying singular values. If you permute — putting source-x with receiver-x, and source-y with receiver-y — the singular values decay way faster. So how you organize your data, how you represent it as a matrix, makes all the difference for whether your matrix completion works or doesn't work, and this is where you, as a person in the applied field, have to understand why that's important; nobody told us to do this, it's something we came up with. So that's good: we have a representation where the singular values decay fast, and that's a reflection of the fact that seismic data is redundant — because, again, we collect five-dimensional data, and in the end we're interested only in a three-dimensional earth, and this data is generated by a three-dimensional earth, so the low-rank structure is not necessarily a big surprise.
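A minimal sketch of the two unfoldings (again with toy data; with real field data the permuted unfolding is the one whose singular values decay fast — random numbers will of course not show the difference):

```julia
using LinearAlgebra

# toy 4-D frequency slice over (source-x, source-y, receiver-x, receiver-y)
nsx, nsy, nrx, nry = 10, 10, 10, 10
slice = randn(ComplexF32, nsx, nsy, nrx, nry)   # stand-in for real data

# canonical unfolding: (source-x, source-y) × (receiver-x, receiver-y)
Xcanon = reshape(slice, nsx*nsy, nrx*nry)

# permuted unfolding: (source-x, receiver-x) × (source-y, receiver-y)
Xperm = reshape(permutedims(slice, (1, 3, 2, 4)), nsx*nrx, nsy*nry)

# compare the singular-value decay of the two organizations
@show svdvals(Xcanon)[1:3]
@show svdvals(Xperm)[1:3]
```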
OK, but that's not the whole story: we also need to think about sampling, and sampling we do not have a lot of control over, because we are dealing with a physical system. The simple kind of sampling I'll talk about for now is where we miss certain sources or receivers — we just didn't collect them: we didn't fire a source somewhere, or we didn't have a receiver somewhere, either because my boss told me I couldn't (it's expensive) or because there was a physical obstacle or something. What you want from the sampling — and this is an idea from compressed sensing — is for the sampling to break the low-rank structure; then a rank-minimization algorithm can potentially give you the full data back. That's the compressed-sensing idea we try to exploit. Now, in the canonical representation — source-x, source-y against receiver-x, receiver-y — if we miss a source, we miss a column; if we miss a receiver, we miss a row. And missing whole rows or columns of the matrix does the opposite of increasing the rank: it decreases the rank. So this is the worst possible sampling you could ever dream up if you are interested in a matrix-completion problem, and nobody ever tells you that, because the theoretician says "take uniform random samples" — but I can never do that in the field: in the field, if I miss a source, I miss it for all receivers. I can't control that; it's the physical system I'm working with. But if you go to the permuted organization, then it looks like we lose random blocks, and now the decay of the singular values slows down, and therefore rank minimization stands a chance of giving your full matrix back. So there's a simple trick where there actually is a free lunch: we have a representation in which the complete data has fast-decaying singular values, while the sampling is — in loose terms — incoherent: it slows down the decay of the singular values. That's a good setting to be in if you want to fill in the matrix as if you had taken samples everywhere.

OK, so let's use techniques from convex optimization to solve these problems. X in this case is the full data matrix — as if you had samples everywhere — and the operator A is a mask that takes out...

[Audience question — inaudible.] Yes — correct. Yes. What we will show is that in the recovery we actually want to use the information that X carries across the whole survey, so we consider the full data rather than local pieces. Industry tends to think in terms of very little windows — ten by ten by ten — whereas we think of the full thing, and then we exploit the fundamental redundancy that resides in the data, because we're looking at the same earth. So you may be able to fine-tune things, and I will give you a little bit more of a pointer to where we are theoretically.

[Further exchange — inaudible.] In the end we don't need to know that — it would be way too expensive. So anyway: A models how you acquire the data — it knocks out the entries where you don't have anything — b is the observed data, and you solve an optimization problem that finds, among all matrices X,
the one that has the smallest sum of singular values — the nuclear norm — subject to the constraint that applying the sampling operator to it gives you your data. OK, so now we make an additional step, and this turns a nice convex problem into a non-convex problem, but we are forced to do it, because we can never hope to work with X explicitly — we can only work in factored form. So we work with a left–right decomposition of the matrix, X = LRᵀ, where the factors have a rank much smaller than the size of X. We developed solvers for this — there are papers on it — but in the interest of time I will be quick: we never get to see the singular values; we work with a proxy in terms of the Frobenius norms of the factors, because we can't even afford to compute the singular values. (A minimal sketch of this factored approach follows below.)

OK, now that's nice, but who cares — you would also like to know beforehand what you should and should not do. What I showed so far is very qualitative; can we go to something slightly more quantitative? Having said that, it's very difficult to come up with strong mathematical conditions that tell you exactly what you need to do in the field — that's still an open problem. The same is true for compressed sensing, which has beautiful theoretical results, but almost nothing translates directly to practitioners: these settings are not idealized, and lots of things just don't necessarily map. But there is one thing that I think gives us a bit of a handle: the spectral gap. You look at the connectivity of your sampling in the matrix, and that is a predictor of how well you can expect to recover.

So remember, our goal is to find an approximation to X from noisily observed entries that live in a set Ω, and the set Ω is much smaller than the full set of entries of X. So we have a restriction operator — a mask, if you wish — that acts on X and gives us the samples, and we solve this optimization problem to get there. OK, so there are a couple of things you need. First of all, a lot of the mathematics makes assumptions on how Ω is distributed, and typically people assume it to be uniformly distributed — but, as I said already, that is not necessarily something we can do in practice. The other thing you need is that the singular vectors be incoherent: there are conditions on what the singular vectors look like — how the energy is spread; the more spread, the better — and that is something we can compute, at least for a large chunk of X. So if we have uniform random sampling, everything is fine: the entries we do have — shown in white — are uniformly distributed over the matrix, and that's good. The other thing we need is that the data looks spread out: it should not look spiky like this, it should look more like that — and in the seismic data I showed you already, things are enormously spread; these are waves, they go everywhere, so things are very incoherent, and you're in a good setting where this will work. I know it's a bit hand-waving, but I'm sure if you compute some of these things you'll see we're in good shape.
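Here is a minimal sketch of the factored formulation: plain gradient descent on the factors, with the Frobenius norms of L and R standing in as the proxy for the nuclear norm (½(‖L‖² + ‖R‖²) upper-bounds ‖LRᵀ‖_*). This is a toy implementation with untuned hyper-parameters, not the group's actual solver:

```julia
using LinearAlgebra, Random

# complete a matrix from entries observed on mask M (true = observed) by
# gradient descent on  ½‖M∘(L*R' − B)‖² + (λ/2)(‖L‖² + ‖R‖²)
function lr_complete(B, M; k=5, λ=1e-2, η=1e-3, iters=5000)
    m, n = size(B)
    L, R = 0.1 .* randn(m, k), 0.1 .* randn(n, k)
    for _ in 1:iters
        E  = M .* (L * R' .- B)    # residual on observed entries only
        gL = E * R .+ λ .* L
        gR = E' * L .+ λ .* R
        L .-= η .* gL
        R .-= η .* gR
    end
    return L, R                    # X ≈ L*R', never formed unless needed
end

# toy usage: rank-5 ground truth, 20% of entries observed
Random.seed!(0)
X = randn(200, 5) * randn(5, 200)
M = rand(200, 200) .< 0.2
L, R = lr_complete(M .* X, M)
@show norm(L * R' - X) / norm(X)   # relative recovery error
```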
The only thing we don't have is the uniform random sampling — we only have access to a non-uniform subset of entries of the matrix; we just don't have control over that in that way. So what determines a good mask? We want a metric that tells us beforehand how well matrix completion will perform given a certain sampling mask — that's the aim here. And that has to do with connectivity: you basically want your sampling to be connected in a graph sense, and that certainly happens when you sample uniformly. You can express this in terms of the spectral gap of the sampling mask: take the mask — zero where you have no data, one where you have data — consider it as a matrix, and look at its two largest singular values; graph theory tells you that their ratio measures how connected the sample points are, and it's easy to compute: you can get the two largest singular values of a very large matrix without solving the whole problem. (A small sketch of this diagnostic follows below.) So the ratio is either close to one or much smaller than one, and what the theory tells you is that it's good if it's much smaller than one — you want spectral-gap ratios much smaller than one. We're working on making this more robust — this is work with a student who is still at UBC — but I want to give you a flavor of how this works and what the effect is: if the spectral-gap ratio is much smaller than one, you are nicely sampled, and we can then expect better results from matrix completion.

OK, so let's start by looking at the ideal case. This is an example with everything in the right organization, uniformly sampled — which we can never hope to do in the field — but it has a very good spectral gap, very small, meaning a large separation between the first and second singular values, and then we can actually recover at remarkably low sampling rates: we miss ninety-five percent — we have only five percent of the samples — and we get a very good result back, something like twenty-four dB signal-to-noise. Well, that's idealized — we can never hope to do that in the field — but it's good, in a computer, to at least check what these methods can do. And where do you make errors? This is the residual — you always lose something. Just to give you an idea of what you're looking at: this is basically a cube unfolded; in front you see a time slice, the horizontal axis is receiver-x, the vertical axis is receiver-y, and from the two sides you look through the volume. You can see that we miss a massive amount of data, and this is what you get after the interpolation — doing the interpolation for all the different frequencies and then taking the inverse Fourier transform — and this is what the data looks like. It's not perfect — you can see there's noise — but we recover a lot of the spectrum of the data from a very, very low sampling rate, much lower than what people typically attain with compressed sensing when you work with transform-domain methods. And this is the true data — so yes, certainly not perfect, but at least we're getting there.
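The spectral-gap diagnostic itself is cheap; here is a sketch of the idea (dense `svdvals` for toy sizes — at field scale you would compute just the two leading singular values iteratively). The disconnected-blocks mask is my own illustration of a badly connected sampling graph:

```julia
using LinearAlgebra, Random

# spectral gap of a 0/1 sampling mask: the ratio of its two largest singular
# values; a ratio well below one signals a well-connected sampling graph
spectral_gap(M) = (σ = svdvals(float(M)); σ[2] / σ[1])

Random.seed!(1)
n = 200

# uniform random entries: well connected
M_uniform = rand(n, n) .< 0.2

# same budget, but two disconnected clusters: the sampling graph splits into
# two components and the two leading singular values nearly coincide
M_blocks = falses(n, n)
M_blocks[1:n÷2, 1:n÷2]     .= rand(n÷2, n÷2) .< 0.4
M_blocks[n÷2+1:n, n÷2+1:n] .= rand(n÷2, n÷2) .< 0.4

@show spectral_gap(M_uniform)   # well below one: favorable for completion
@show spectral_gap(M_blocks)    # close to one: unfavorable
```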
OK, so let's now look at what happens if we have a small versus a large spectral gap. This again is the original data, and then we do the recovery where we have a large gap — the ratio close to one — or a small gap, close to zero, and you can see there's a remarkable difference in the recovery. It's the same number of samples; it's just how you select the samples so that the spectral gap is favorable — the difference between being clever and not so clever in how you put your sources and your receivers out in the field. And we can make a lot of that: you can see that as the spectral-gap ratio goes up, the recovery SNRs go down, whatever the subsampling rate is, and that confirms the relationship. Of course we would like to have that made much more precise, in equations, and that is typically not so easy, but I think we have a chance here to do more than we have ever been able to do.

OK, so that's all still idealized; let's look at what the people in industry do and see how that works. The most expensive and sophisticated way people acquire data in the field, in marine settings, is with so-called coil acquisition — a very clever idea. What you normally have is a boat that tows an array twenty kilometers long, with a huge number of microphones on it — and it actually pulls something like twelve of those arrays — and then it traverses the acquisition area: traditionally it goes this way, then it has to turn — because the boat has to turn — and it comes back. So there is really only good sampling in this direction and poor sampling in that direction: it's poor cross-line sampling, because the boat has a preferred direction in how it goes over the area. So what people have proposed to do instead is random coil sampling: the boat goes in circles, and the centers of these coils are randomly distributed. So they are doing random sampling in a very clever way that you can actually do with a boat that pulls a twenty-kilometer-long array: it goes in very large circles whose centers get perturbed as it traverses the area. This is where the sources are — you can see, from simply plotting the trajectories, where the sources fire and where the receivers are. So let me show you a zoom: this is what these coils look like if you look at the sources, and it's not the nice periodic grid people would like to have. So the question is: can we go from a mask like this and in-fill, such that we have equally spaced sources? That is basically the goal of this whole matrix completion: can you fill in what you missed — because you missed stuff here; wherever it's white, you didn't put a source. And then of course you have to make choices: how dense a fictitious source grid do you interpolate to, how dense a fictitious receiver grid — and we can again use the spectral gap to play around with that: take the mask, look at how the choices translate into how you sample your sources and receivers, and how that translates into the mask you have to invert if you want to complete this data as if you had infinite resources and spent light-years collecting all of it.
So we played around with this — these are the grid sizes for the receivers and the sources — and we looked for the sweet spot, to see which one would lead to the best recovery. Now, unfortunately, I haven't verified whether these choices really lead to better or worse results, because the computational effort to recover a volume like this is massive — you don't do this overnight, and you can't run too many of them; you could do it on smaller problems if you wish, but in this case we did it at the full size.

OK, so this is what the mask looks like in the end if you organize the data in the non-canonical organization, and the task now is: can you interpolate wherever things are white? It misses a lot of data, and the spectral gap is not particularly good — much better than doing other things, but not particularly good — so we can't expect fantastic results, certainly not compared to uniform random sampling, where the spectral-gap ratio was significantly lower. This is what the data looks like: if you zoom in, this is the trajectory area of the boat, and you can see these stripey things — they are the arrays of receivers; every boat carries about twelve receiver arrays — and it traces these crazy trajectories. But mind you, this is in the weirdly organized data set: it's the permuted mask. So the task now is to fill this in, and this is what you get if you fill it in. What is remarkable about this is that we use information over the whole matrix to fill it in; that's different from what industry does — industry works on little cubes in parallel and has no idea about the correlations that exist over a whole survey area. And that, I think, is the reason we can do this: we were crazy enough to think of this problem at its full size — the full monty — rather than immediately chopping things up into little windows and playing with those. That, I think, is one of the key messages. Now, you can see the visual result — it's not perfect — but look at the data: this is the ideal data, this is the data we collected — you can see you miss whole parts of the data — and this is after the interpolation. It's less good than the previous, idealized one, but it's still remarkable that you can recover the data. And it actually got attention from industry — Schlumberger, the company that does this coil sampling — and we came to give a presentation on this, because they didn't know you could do this.

OK — how are we doing on time?
— You're a few minutes over. — OK. So what's cool about this is that, aside from the fact that you can recover these data volumes, you recover them in enormously compressed form, because you basically only work with the factors. At the low frequencies that can lead to enormous compression — we work with only 0.5 percent of this full data volume. So from a compression perspective, instead of carrying a truckload of hard drives you can put your thing on a thumb drive: it's a massive compression of what you need, and that may have a big impact on how we subsequently work with this data to create images. It will be a game-changer, because you can distribute the data over a cluster much more easily than what you have to do now, carrying along the full data. I will skip over this a little bit, but there is enormous compression; the only thing you need to remember is that these things work great at low frequencies — at high frequencies the data compresses less, and that is understood theoretically: there is just much more complexity at high frequencies. So anyway: we can form data for the inversion on the fly without ever forming the full data volume. That means I can give every node in the cluster access to the full data, whereas now they have to talk to a central, huge database and extract the data, which leads to an enormous amount of traffic. Having it all local will have an enormous impact on how seismic data is processed, and I think maybe we can learn from this how to scale to larger problems in machine learning.

OK, so that was the most down-to-earth topic. Now, in the next fifteen minutes, I'm going to take you to something that is maybe much more esoteric, but I'm going to try to give you the gist of why we care and why we can do certain things that you could never hope to do if you approach them brute force. Industry really does brute force — I believe Total has a six-hundred-thousand-core cluster to themselves; these guys like to do things brute force, they throw a lot of money at it — but some things you just cannot do even with the biggest computer on earth, and this is one of them. OK, so there is some lingo here that you may hate, but I will try to connect it to terminology you may be familiar with.

OK, so let's first look at the physics of imaging — what's really going on; I didn't talk a lot about that, and it would be a topic for another talk. What we really do to create an image is propagate a wavefield, in a computer, from a source into a velocity model — think of it as simulating waves. And then we also back-propagate — here you can sort of see the adjoint-state method, which you also know, as backpropagation, from neural networks: you go down and you go back up, the same structure; the back-propagation takes whatever it is that the receivers measured — and then we cross-correlate the two fields. So you have a wavefield in space and time for the forward and the adjoint, you cross-correlate them, and you look at the zero lag: that's your image. That's what this tries to represent: you have a source wavefield and a receiver wavefield, you correlate them, and taking the zero-lag term is basically looking at zero offset — you look at the same point. But you don't have to: you could correlate the two wavefields and look at two different points.
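Written out — a reconstruction with assumed symbols, where u_s is the forward wavefield for source s and v_s the back-propagated receiver wavefield — the conventional image is the zero-lag correlation, and keeping all pairs of points gives the extended image volume:

```latex
I(x) \;=\; \sum_{s}\sum_{\omega} \overline{u_s(x,\omega)}\; v_s(x,\omega)
\qquad\text{(conventional image: zero offset)},
```

```latex
E(x,x') \;=\; \sum_{s}\sum_{\omega} v_s(x,\omega)\; \overline{u_s(x',\omega)}
\qquad\text{(extended image: all offsets } x - x'\text{)},
```

so that in matrix notation E = VU^* with the per-source wavefields in the columns of U and V, and the conventional image is just the diagonal, I = diag(E).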
Now, that's a lifting: if you look at all pairs of points, the object you image becomes as big as the image by the image — quadratically large — but it has a lot of very interesting information that people want, and that's why they compute some of it, brute force; and we're basically saying that we can now use clever sketching techniques to do these things without the brute force. We call these extended images, because you map the data to a six-dimensional object consisting of the three space dimensions times the three offset dimensions. So it's a very large object, and you can think of it as a lifting — for those of you who have seen, say, phase retrieval problems, people sometimes lift the problem there; this is a lifting: you make the unknowns way larger, and that has particular mathematical advantages.

OK. So what people do today is compute this for a subset of offsets, brute force — massively expensive, because you have to store all these massive wavefields, which have a billion variables each, cross-correlate them, do the different cross-correlations, and store the result, which is too much. What we want to do instead is think in terms of these image volumes as a function of all possible subsurface offsets: every point in the subsurface generates a full three-dimensional volume in the vertical and horizontal offsets — the offset being the distance between two points in the subsurface — so it's a massive six-dimensional object we're dealing with. So what we do now is use clever techniques, and this is the observation: we never form this object, but we can create actions of this object on vectors, and we can do that cheaply. It's very much related — for those of you who were at Joel Tropp's talk a couple of weeks ago — to having access to randomized actions of this matrix, and then we can play games.

I'll speed up a little; there's going to be a bit more mathematics here, but just a bit. So what, mathematically, do we have here? We have a source and we have the received data; U is the forward wavefield — the wave equation inverted and applied to the source — and V is the adjoint wavefield, basically the receiver data propagated with the adjoint wave equation. So U and V are wavefields in space, as a function of frequency, and we work frequency by frequency. OK, so an image volume E is the outer product of V and U — so it's quadratic in the size: U has a billion variables, so E is a billion by a billion — and the different columns of V and U represent the different source experiments, because we have one for every source. So what conventional imaging does is say: you know what, we're not interested in this whole object, we're interested in the diagonal — and that just means you take the pointwise (Hadamard) product of these different wavefields U
and V and sum them — that's what you would normally do, and it gives you the zero-lag correlation. But you could do something much more: think of it as a cube; people only look at the diagonal to get an image, and I'm saying I want to look at off-diagonals — I want to look at the whole volume, whichever way I want — and I'm going to be able to do that because this whole volume permits a low-rank approximation. That's the trick we're going to play: we're going to work only with probings — we apply this volume to probing vectors and work from that.

OK, so let's look at what this looks like. Say this is a very small earth model — one hundred by one hundred — so the image volume is ten thousand by ten thousand, because it's quadratic in the size of the model. If you look at the singular values, though, they decay enormously fast, so there is an underlying low-rank structure here that we may be able to use. And we do that by sketching — and what do we sketch with? PDE solves. This is the trick here — if there's one equation in the second part you need to remember, it's this one — and if you just stare at the linear algebra, playing with the matrices, it's right there: to compute the action of E on a sketching vector, say a random vector, we invert the wave equation, restrict to where the sources are, convolve (or correlate) with the source, inject the result back at the receivers, and propagate again. All of these things you can do consecutively in the computer; you never have to form E. So that's cool, because now we can try to find the range space of E from probings: we probe it with a limited number of random vectors on the right-hand side, and then we do a QR factorization. (I have to apologize for the notation — we just ran out of symbols: Q means a source in one context, and in the other context it means the Q of the QR factorization; but changing the notation midway makes everybody mistake one thing for another, so it's hopeless — as I said, this is not a polished presentation yet.) Anyway, we can work with actions of E on probing vectors, and the resulting sketch contains all the information you have to have in order to work with a large object like this. (A minimal sketch of the recipe follows below.) And I should really speed up, so let me just show you quickly what we can do with this randomized SVD: when you have these actions, you can compute the QR factorization and then use this Q to form a low-rank, SVD-based approximation of that large matrix. And from that we can form left–right factorizations by simply absorbing the square roots of the singular values on the left and on the right. So now on the right there is the conventional formula for imaging, and on the left we have something that is based on this matrix factorization of this ridiculously large object, and you can ask yourself the question: which one is going to give us the better image if we only allow ourselves a very limited number of probings — basically, wave-equation solves? And I think I'm losing everybody, so let me show you an image and then we're done.
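A minimal sketch of the probing recipe — a standard randomized low-rank factorization (in the style of Halko, Martinsson, and Tropp), written matrix-free. Here `Emul` and `Etmul` are assumed handles for the actions of E and its adjoint; in the seismic setting each action costs a couple of wave-equation solves, while in this toy they just multiply an explicit matrix:

```julia
using LinearAlgebra, Random

# randomized low-rank factorization of an operator known only through its
# actions Emul(W) = E*W and Etmul(W) = E'*W
function probe_factorize(Emul, Etmul, n, k)
    W = randn(n, k)           # k Gaussian probing vectors
    Y = Emul(W)               # sketch of the range of E
    Q = Matrix(qr(Y).Q)       # orthonormal basis for that range, n × k
    B = Etmul(Q)'             # B = Q'*E, a small k × n matrix
    F = svd(B)                # so E ≈ (Q*F.U) * Diagonal(F.S) * F.Vt
    # left-right factors, absorbing square roots of the singular values
    L = (Q * F.U) .* sqrt.(F.S)'
    R = F.V .* sqrt.(F.S)'
    return L, R               # E ≈ L * R'
end

# toy usage: an explicit low-rank-plus-noise matrix stands in for E
Random.seed!(0)
n = 1000
E = randn(n, 20) * randn(20, n) .+ 0.01 .* randn(n, n)
L, R = probe_factorize(W -> E * W, W -> E' * W, n, 25)
@show norm(L * R' - E) / norm(E)   # small relative error
```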
So in a way we're comparing two things: either you apply a sketching to the data — which basically means you sandwich the matrix between W's as fence posts, and that introduces crosstalk, because these are Gaussian matrices — or you probe on the right and work with the factors. And with the same number of PDE solves we compare the two images: the amount of computational work is the same, but we are being clever with fancy linear-algebra tricks instead of doing things brute force. That's basically what we try to show, and what motivates it is the decay of the singular values: if you look at the decay for the data versus the image volume, the singular values of the image volume decay much faster than those of the data, so a low-rank approximation of the image volume is much better than a low-rank approximation of the data. That's the key idea here.

These are the algorithms — I will skip them — and let me show you some results, including the thing people actually care about. You may think this is all very subtle, but this is the difference between finding an oil reservoir or not. This is with only a small number of probings, done the conventional way versus the new way. You need to look carefully: you see reflectors popping up that do not exist in the conventional result — particularly the steep events here, which is typically where the oil reservoir is; that's what people spend a lot of money on — and you can do this with the same amount of work, the same number of PDE solves, just by being clever with the linear algebra. And this is when we increase the number of probings to one hundred: there is still an enormous difference — you can see certain reflectors coming up that didn't exist in the other approach, just from these smart randomized tricks.

OK, so I think that's it — I will leave it here. [Applause.]

[Audience question about the first line of the second part — partly inaudible.] The first...? Not the first one... OK, sure, this one. Well, these problems are all linear in the source: we write it deliberately like this because we are leaning on the linearity in the source, and since it's linear in the source, a whole class of inverse problems fits this form, not only waves. And this shape looks very much like a CNN: you have a parameterized network, you have training data — D is the training data — and the unknown m plays the role of the parameters of your network; people use stochastic optimization techniques to solve it, and they use backpropagation — we use that too, we call it the adjoint-state method. So there are a lot of connections, and we've looked a lot at solving this problem using stochastic optimization techniques, but that's a separate talk. Since this is a machine learning center, I thought: let's make the connections everywhere. Except our problems are so large that where you can do tens of thousands of iterations, we can maybe do a hundred.

[Audience question — inaudible.] Well, loosely, yes: in a way we are also in the prediction business, but we are fundamentally different from where you guys are. If I recover the data correctly, I can predict data for that area — so that's a predictor — but what we really care about is not predicting the data; we care about m.
So our parameters — say, the analogue of the parameters of your network — are the objects of interest, so we need at least part of our parameters to be interpretable, whereas in many machine-learning techniques there are hidden variables that nobody interprets. That's exactly what we're looking at in my group now: there, you don't care whether you interpret those variables, but in principle we really do care — m is the object of interest.

[Audience question — inaudible.] Sure. Basically — I can't just learn the data; that's not the problem I want. I want something that gives me an image; the image is what I'm interested in. So, in a way, you can think of it as: we parameterize the model by the wave equation, and the wave equation is what turns the data into the information you're after. You could ask a different question: I don't want an image of the physical parameters of the earth, I want to know whether at this depth there is a reservoir or not. Then you might do a black-box model, but then you have to train it with a heck of a lot of examples where you have data and you know there was a reservoir there — then maybe you can do that.

[Audience question — inaudible; about repeated experiments over time.] I think you mean time-lapse — the same survey at different times, different experiments. Yes — we call that 4D, or time-lapse, seismic, and we do look at that, also from a machine-learning perspective, but that is yet another scale of big. Where I see the opportunities for machine learning is this: everything here is based on the presumption of a physicist who thinks he owns the physics, understands the physics — who basically believes the wave equation is the truth. I don't believe that's entirely true. So where can you use machine learning to make up for the fact that the wave equation doesn't describe all the physics — where you have a bunch of parameters you don't necessarily care about, but some of them you do, because otherwise you can't interpret your results? And that, I think, is also the difference between machine learning, perhaps, and an inverse problem: in an inverse problem we really need something we can interpret. The learning will work, in a way, but sometimes... [inaudible].

[Audience question — inaudible.] Yes — well, I'm a big believer in physics-constrained networks, so you can think of networks that have both. I don't believe the whole black box — that CNNs are going to do everything — I don't think so; but I don't believe that the wave equation is going to do it all either. Somewhere in between is where the sweet spot sits.

— Very well. Thank you so much. [Applause.]