I'm going to present research that we have been doing in the lab for the last two years, but I'm also going to show you a little bit of what I did before I joined Georgia Tech. So robotics is hard, and control in particular is very hard. It is hard because we have to work with systems for which we often don't know the dynamics very well; there is very often uncertainty about the task, and we don't have good models of the interactions with the rest of the world. That is one of the enemies we have to fight. Then, of course, there is a lot of complexity in these robots: you have nonlinear dynamics, you have many degrees of freedom, you have contact, you have underactuation, and in many cases you have many, many parameters with respect to which you want to optimize. And that is not to mention the perception side, which I think is an extremely difficult problem. So maybe one of the ways to deal with this complexity is the idea of learning a control policy. Now, I have been thinking quite a lot about one slide, and I have been asking my students and some other people about it, because I think that if you don't remember anything else from this presentation but you keep this slide in your mind, I will be very happy. Let's say we have two axes. On one axis we put how much knowledge we have about our system, from low to high; a very accurate model is a model that requires a lot of knowledge. And on the x-axis we have, essentially, how many interactions we need, if any, with the real physical system in order to perform a task.
Learning the dynamics is just a subcase of performing the task. In the corner where you have a very good model, you don't really have to interact: you can solve the problem offline. So if you were to place, for example, traditional optimal control theory on this diagram, you would most likely put it in the region where you have very high knowledge about your dynamics; if you know the model and the environment very well, then you can solve everything offline. But as you start incorporating uncertainty into the model or the dynamics, you have to use other tools, such as model predictive control. And at the other end you can use reinforcement learning. When I refer to reinforcement learning here, I mean essentially the classical setting: you parameterize your policy and you have to learn this policy through interactions with the actual system. So the x-axis is the axis of interactions, but essentially, if you need a lot of interactions with the actual dynamics, then your time scale of learning is slow, because you have to keep interacting with the system. If you don't have to interact with the system and you have perfect knowledge, then you solve everything offline and hope that the resulting policy is going to be able to perform the task on the real system. And as you add a little bit of uncertainty to the problem, you can do model predictive control, where you don't have to commit to the whole trajectory: you interact for a few time steps and then you re-optimize your policy. OK, so let me tell you what I'm going to present today.
It is essentially a trajectory that starts from classical optimal control theory and goes toward model predictive control, and I'm going to present one framework that captures all of these cases. And of course, what is the ultimate goal? The ultimate goal is to push these frameworks as close as possible to low model knowledge and very fast optimization: you want to learn policies with as little information as possible, with as few interactions with the real, actual dynamics as possible, and as fast as possible. So typically in optimal control there are these two pillars of classical optimal control theory, and there has been a lot of work, motivated pretty much by the race in aerospace and the need to explore space; optimal control was a very hot area in the forties and fifties, and we have these two gentlemen, Pontryagin and Bellman, who developed the tools to solve difficult optimal control problems. Now, what I'm going to present today is essentially a non-classical view. Non-classical does not mean that this view is not old, because it does rely on principles that go many years back; it is non-classical because it brings different flavors together. In particular, it brings in concepts from statistical physics. We also have to rely on machine learning, because machine learning will do the job for us of learning the dynamics. And of course control theory. Parallelization will be important too, because, as we will see, we can solve hard control problems with sampling, and sampling is something you can parallelize very nicely, which is a big deal. So the way I'm going to go through my presentation is, essentially, to go through different areas of application, from robotics to aerospace systems and, if we have time, computational neuroscience, and to show how these ideas apply in practice.
I will show how a few research thrusts in stochastic optimal control apply to specific tasks and applications in these three domains. Another characteristic of my presentation is that it is going to alternate between theory and demos and videos: a demo to get everybody excited, then back to theory, and then again back to demos. OK, so there is no free lunch; I mean, there is a free lunch, but we have to assess it, we have to understand it, and I'll try to do my best to explain what the underlying principles of what I'm presenting today are. So when you open a textbook on optimal control theory, you see equations that consist of a cost function you want to optimize. This cost function has a running cost and a terminal cost over a time horizon, and you minimize it subject to the constraint of a stochastic dynamical system: a system that has state x, control u, and some noise. Now, in one of the previous slides I showed the two pillars of optimal control theory, one of which was Bellman, so let's go with the Bellman principle. What the Bellman principle says is this: if I want to find how costly a state is, say on my way to the exit, then I should find the cost to go to each possible next state, and to that I add the cost-to-go from that next state to the target state. I have many possibilities; I pick the one with the minimal total cost, and that gives me the optimal control for the state I am in right now. This is an equation-in-words that computer scientists love, so let's go to a very simple example where you have to go from a start state to a goal state. What are you going to do?
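In symbols, the textbook problem and the Bellman recursion described above read roughly as follows (the notation phi, q, ell is my own; the slide may use different symbols):

```latex
% Finite-horizon stochastic optimal control problem:
\min_{u(\cdot)} \; \mathbb{E}\left[\, \phi\big(x(T)\big)
  + \int_{0}^{T} \Big( q\big(x(t)\big) + \tfrac{1}{2}\,u(t)^{\top} R\,u(t) \Big)\, dt \right]
\quad \text{s.t.} \quad dx = f(x,u)\,dt + B(x)\,dw .

% Bellman principle, in its discrete form:
V(x) = \min_{u}\Big\{ \ell(x,u) + \mathbb{E}\big[\,V(x')\;\big|\;x,u\,\big] \Big\} .
```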
You are going to solve this problem in stages. The task is to go from the start state to the goal state; the dynamics here are trivial, and the control task is simply choosing which state to go to next. So you start from the goal, and at the stage just before the goal you find the value function: how costly it is, essentially, to go from state eight or state nine to the goal. Then you repeat this process for state four: now you know the immediate cost to go from state four to states six, seven, and eight, and you already know the cost-to-go from states six, seven, and eight to the goal state, so you can find the optimal action simply by comparing these costs. You realize that, well, if I go to state seven, that action has the minimum total cost, and at the same time you have found how costly state four is. You keep repeating this process until you reach the start state, and then you have a path that takes you from the start state to the goal state. There are a few characteristics of dynamic programming visible in this very toy example. Essentially, it is a backward process: we started from the goal and we back-propagated the value function. And if we do that for every possible state, then we know the optimal control everywhere. Of course, here we have assumed that we know the dynamics very well and that nothing changes. Now imagine doing this for a humanoid robot, where the state space is huge; that is really bad. This is one of the main issues with dynamic programming: the curse of dimensionality, and there has been a lot of work on how to deal with that curse. Now, this is a nice graph for people who work in discrete settings.
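The staged backward sweep just described fits in a few lines of Python; note that the graph, its state numbering, and the edge costs below are invented for illustration, not the ones on the slide:

```python
# Toy backward dynamic-programming pass on a small staged graph.
# edges[s] = {next_state: transition cost}; state 9 is the goal.
edges = {
    1: {2: 2.0, 3: 4.0},
    2: {4: 2.0, 5: 5.0},
    3: {4: 1.0, 5: 2.0},
    4: {6: 4.0, 7: 1.0, 8: 3.0},
    5: {7: 2.0, 8: 2.0},
    6: {9: 4.0},
    7: {9: 1.0},
    8: {9: 2.0},
}

# Backward pass: value of a state = min over actions of
# (step cost + value of the next state). We sweep states in reverse
# order, which works here because the graph is staged.
V = {9: 0.0}
policy = {}
for s in sorted(edges, reverse=True):
    nxt, c = min(edges[s].items(), key=lambda kv: kv[1] + V[kv[0]])
    policy[s] = nxt
    V[s] = c + V[nxt]

# Forward pass: read the optimal path off the stored policy.
path, s = [1], 1
while s != 9:
    s = policy[s]
    path.append(s)
print(V[1], path)
```

Running the backward pass once gives the cost-to-go of every state, so the optimal path from any start state can then be read off without re-solving.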
People who have discrete states and discrete actions can understand this graph very well, but in continuous state spaces we like to work with PDEs. So we have to compromise here between the computer science point of view and the aerospace point of view, and I think both of them are exceedingly important. If we go to the world of aerospace engineering, or to an electrical engineering or optimal control theory audience, this operation of propagating the value function backward is described by a partial differential equation. This equation is like a wave equation: the value function is propagated backward, from the goal state toward the start state. This is the Hamilton-Jacobi-Bellman equation, and there has been a lot of work in control on the main issue: how do we solve this partial differential equation? So let us assume we know how to solve it. Then your optimal control is essentially in the negative direction of the gradient of the value function. What does this really mean? It means that the controller pushes the dynamics toward areas of the state space where the value function is small; if you end up on the target, the value function there is very small, so you move in the negative direction of the gradient of the value function, if you knew that value function. Now, what has happened over roughly the last ten years is that there has been work by Kappen on simplifying this PDE; that was around 2005. Bert Kappen wrote a very nice paper in the Journal of Statistical Mechanics explaining how these partial differential equations can actually be simplified, and if you can simplify them, then you can solve them with sampling. Since sampling is something we can do very nicely, that is a nice way to solve these optimal control problems.
So what he did, essentially, is say: let me exponentiate this value function. Now we have this Psi, which is the desirability function. And let us make an assumption relating the control cost and the noise: R is essentially the weight on your control cost; if R is very low, your control is very cheap, and if R is very high, your control is very expensive. And B B-transpose expresses, essentially, how strong your stochastic uncertainty is. OK, so the assumption is that if you have very strong stochastic uncertainty, then you should be allowed a lot of control authority, which means your R has to be low. It then turns out that when you do this exponential transformation and you make this assumption, you end up with a linear partial differential equation. In physics this is called the backward Chapman-Kolmogorov equation, but the name doesn't matter; the very important thing is that you can solve this partial differential equation efficiently. In that paper, Kappen actually had a way to find the solution of this linear PDE, which means finding Psi, and this Psi has the form of an integral. That was essentially the key paper. So then, when I was a PhD student like you, in my second year (I was actually doing my masters before), I was very much interested in control, so I read all of these papers. I had moved to USC, and as I worked there I realized that there is in fact a theorem that allows you to solve this class of partial differential equations. This theorem is actually very general, very broad; it can be used for many classes of partial differential equations, and it carries the names of two very important people. Most of you know the first one, right? Feynman, the physicist. But there was this other gentleman, Mark Kac, who was a brilliant mathematician; he was at USC.
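The exponential transformation mentioned above, written out in my own notation (a sketch following the standard minimization convention; the slides may use different symbols):

```latex
% Dynamics and cost rate:
dx = \big(f(x) + G(x)\,u\big)\,dt + B(x)\,dw, \qquad
q(x) + \tfrac{1}{2}\,u^{\top} R\, u .

% HJB after minimizing over u (nonlinear in V):
-\partial_t V = q + f^{\top}\nabla V
  - \tfrac{1}{2}(\nabla V)^{\top} G R^{-1} G^{\top} \nabla V
  + \tfrac{1}{2}\operatorname{tr}\!\big(B B^{\top} \nabla^{2} V\big) .

% Transform V = -\lambda \log \Psi and assume
% \lambda\, G R^{-1} G^{\top} = B B^{\top}:
% the quadratic term cancels and the PDE becomes linear in \Psi:
\partial_t \Psi = \frac{q}{\lambda}\,\Psi - f^{\top}\nabla\Psi
  - \tfrac{1}{2}\operatorname{tr}\!\big(B B^{\top} \nabla^{2} \Psi\big) .
```

Cheap control (small R) is tied to strong noise (large B B-transpose); that is exactly the assumption between control authority and stochastic uncertainty discussed on the slide.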
And what this so-called Feynman-Kac theorem tells you is: give me this partial differential equation, and I can express Psi as an expectation over a cost, and I can evaluate this expectation by sampling from the corresponding stochastic differential equation. So now the whole problem of solving the partial differential equation has become a sampling problem: if I have enough computational power to sample, then I can solve the PDE, then I know the value function, and from that I can get my control. If I express the optimal control as a function of the gradient of Psi, what you see happening is that the control is in the positive direction of this gradient, which means that Psi essentially plays the role of a desirability: if a state has a very low cost, it should be more desirable, so the controller should push me in the direction of more desirable states. That is what this mathematical expression means. So then I started working on this topic, and my work was to develop scalable algorithms, because there are many problems that actually emerge from these sampling techniques. The intuition is this: if you want to go from a start state to a goal state, you sample a lot of trajectories out of this particular state and you evaluate the cost for each one of those trajectories. Since you have sampled these trajectories, you have injected noise, and you keep a record of the noise profile for the first time instant. The optimal control then turns out to be a weighted average over all of these noise profiles, with the weighting given by this probability P.
What that probability tells you is that the noise profiles that result in trajectories with low cost should play a more important role in my control policy. Well, that is nice in theory, but when you implement it there is some bad news. The bad news is that you have to sample from the dynamics, and the theory says you have to sample from the uncontrolled dynamics. Why is this a bad idea? It is bad because uncontrolled dynamics are typically not stable. It is also bad because it is not safe: when you want to apply this to real systems, you do not want to work with the uncontrolled dynamics. It is not scalable, because many times you get trajectories in parts of the state space that are not interesting for the task; you have to be able to steer the trajectories toward parts of the state space that are relevant to your task. And it is not numerically efficient, because of the exponentiation: when you get trajectories that are very far from your target, and you exponentiate the negative of a very high cost, you essentially get a bunch of zeros. Exponentiation shrinks the values that Psi can take from the range zero to plus infinity down to the range between zero and one, so it is a very harsh operation. So what do we do? We do importance sampling, which has been used quite a lot, particularly in statistics. Using this idea, we steer the trajectories in an iterative way toward parts of the state space that are relevant to our task, and then we optimize these trajectories. How do we do importance sampling? Well, the expectation is taken under the first measure, which means you sample from the uncontrolled dynamics; but now I want to be able to sample from the controlled dynamics. So I have these two equations and I want to make them equal.
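The numerical point about exponentiation is easy to demonstrate; the helper names below are mine, and the fix (subtracting the minimum cost before exponentiating) is the standard log-sum-exp trick:

```python
import math

def weights_naive(costs, lam):
    """Naive path-integral weights w_i proportional to exp(-J_i / lam).
    Every term underflows to 0.0 when the costs are large."""
    w = [math.exp(-c / lam) for c in costs]
    s = sum(w)
    return [x / s for x in w] if s > 0 else w  # all zeros on underflow

def weights_stable(costs, lam):
    """Subtract the minimum cost first: mathematically the same weights,
    but the exponents stay in a safe numeric range."""
    base = min(costs)
    w = [math.exp(-(c - base) / lam) for c in costs]
    s = sum(w)
    return [x / s for x in w]

costs = [5000.0, 5001.0, 5010.0]       # rollouts far from the target: huge costs
print(weights_naive(costs, lam=1.0))   # exp(-5000) underflows: useless zeros
print(weights_stable(costs, lam=1.0))  # the lowest-cost rollout dominates
```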
How do I make them equal? That is the idea of importance sampling: essentially, I incorporate this weight here. If I open a book on stochastic calculus (if you ever have the opportunity, take a class on stochastic calculus) and I look up importance sampling, I will see the names Radon-Nikodym derivative, or Girsanov theorem, or likelihood ratio. This is a very nice factor which I can compute and evaluate very cleanly. This new term results in an iterative scheme in which I can update my controls: I can generate trajectories, start with some control policy, and improve that control policy. The result of this importance sampling is that I end up with an extra correction term in the cost function. So that was pretty much where things stood in the first year of my postdoc. Let me put everything into this diagram to show you what we have done so far: we start with the dynamic programming principle; from that we get the Bellman equation and, in continuous time, the HJB equation; we do the exponential transformation and get this very nice linear partial differential equation; we apply Feynman-Kac and we have a solution to our problem; and then we have to do importance sampling. This is the flow diagram of our reasoning. If you are not so much interested in the theory, maybe this diagram is what you want to keep in your mind. But it turns out that, as typically happens when you study and when you do research, there is a whole world out there that you may not be familiar with; so you keep learning, right? Learning is a lifelong experience.
There was an alternative view of this whole framework, which actually resulted in some very impressive results that we have right now in our projects, so I'm going to go very quickly through it. It involves a little bit of physics, but essentially what is going to happen is that I'm not going to talk about PDEs anymore, and I'm not going to talk about optimality principles. I'm going to start with two very important concepts and a very important relationship between them. The first is free energy, which typically has this form. The second is relative entropy: you can think of relative entropy as the Kullback-Leibler divergence between probability measures Q and P. There is a very interesting relationship between these two quantities, described by the third equation on my slide, which, if I put it into words, says that free energy equals work minus temperature times entropy. This is a relationship you find in thermodynamics. What you can also get from this inequality is that you can minimize the right-hand side with respect to Q: you can ask which probability distribution achieves the minimum, and it turns out to be given by this expression, the well-known exponential (Gibbs) distribution, which we see a lot in applications in statistics and machine learning. So now let's go back to dynamical systems. I will take exactly the same inequality, and I'm going to assign to the probability measures P and Q the measures over sample trajectories generated by the uncontrolled dynamics and by the controlled dynamics, respectively. A sample in this probability space is a trajectory. And now, again, I'm going to open my favorite book.
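The inequality just described can be checked numerically on a toy discrete example; the base measure P, the costs J, and the temperature lam below are invented for illustration:

```python
import math

lam = 0.5
# Toy discrete "trajectory space": four outcomes with base measure P
# (think: uncontrolled dynamics) and a cost J for each outcome.
P = [0.4, 0.3, 0.2, 0.1]
J = [3.0, 1.0, 2.0, 0.5]

def free_energy():
    """Left-hand side: -lam * log E_P[exp(-J / lam)]."""
    return -lam * math.log(sum(p * math.exp(-j / lam) for p, j in zip(P, J)))

def bound(Q):
    """Right-hand side: E_Q[J] + lam * KL(Q || P)."""
    eJ = sum(q * j for q, j in zip(Q, J))
    kl = sum(q * math.log(q / p) for q, p in zip(Q, P) if q > 0)
    return eJ + lam * kl

# The Gibbs measure Q* proportional to P * exp(-J / lam) attains equality.
w = [p * math.exp(-j / lam) for p, j in zip(P, J)]
Qstar = [x / sum(w) for x in w]

print(free_energy())       # the free-energy lower bound
print(bound(Qstar))        # equal to it, up to rounding
print(bound([0.25] * 4))   # any other Q gives something strictly larger
```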
And I can show that if I take this inequality, essentially what happens is that I end up with the value function: you see that you have an expectation over the state cost, and you have a control cost evaluated over time. That is the cost we typically optimize; it is the cost of stochastic optimal control theory. So it turns out that the left-hand side actually becomes the value function, and in fact it satisfies the well-known Hamilton-Jacobi-Bellman equation. That really blew my mind, because I did not have to appeal to any optimality argument, no dynamic programming, no Pontryagin maximum principle; all I had to do was apply a very simple Jensen's inequality and use this Radon-Nikodym derivative, and I ended up with an expression that looks like classical optimal control. So there is this other view of stochastic optimal control theory, and it is very provocative, because essentially the destination of this framework is to derive the Hamilton-Jacobi-Bellman equation, whereas in the classical case you start from the dynamic programming principle. So when I came here and I met a couple of students and we put a lab together, we started thinking a little more provocatively about stochastic optimal control, and what we said (this is primarily the work of Grady Williams) is: since we have this optimal probability measure, what does it really mean? It means that if I had the optimal controller, plugged it into my dynamics, and sampled, the probability distribution over the trajectories should be given by this expression. OK, so now we know what the trajectories should look like if the dynamics are optimally controlled. So then, instead of working with PDEs, instead of going through the Hamilton-Jacobi-Bellman equation, how about I parameterize my controller in some way and minimize a KL divergence?
That is, we minimize the divergence between the optimal measure, which we have, and the measure Q that is induced by an actual, parameterized controller. You can do that, and after some analysis you can show that you still get the path integral controller, but this time you don't have to work with PDEs anymore. Of course, if you want to make a connection with traditional stochastic optimal control, namely with dynamic programming and in particular the Hamilton-Jacobi-Bellman equation, then you do have to work with the PDEs; but only if you want to make that connection with the classical principles. The second benefit is that there is no assumption anymore tying the control authority to the noise; you don't have to make that assumption. And the third point is that it generalizes to other classes of stochastic disturbances: you don't need purely Gaussian noise, you can have Poisson (jump) noise, and you can also have stochastic dynamics that are non-affine in the controls, meaning the control does not appear linearly in your dynamics. So it is a very elegant framework: it is very general, it comes from a simple postulate, it relates to traditional optimality principles, but you can go beyond them. And this is what we have been doing. Of course, at the end of the day we still have to use importance sampling, and there are many important steps in the importance sampling which I'm skipping right now, but I'm happy to discuss them offline. Essentially, we apply this framework in a receding-horizon fashion on an actual system, and that system is the AutoRally platform. This was one day in September, I believe the first week, before the ICRA deadline, and this is the third experiment of that day. So let me show you the performance.
This is the performance you can get. The task here is to go as fast as possible around the track, and you are optimizing, essentially, by sampling trajectories at sixty hertz. You have a model, a model fit offline: you drive the car, you collect data, and you fit a differential equation. I have to say that this is collaborative work, coming from Grady, Paul, and Brian; I believe everybody knows Paul and Brian, and they have been working very hard on this project. So this run is with offline-learned dynamics. Now, this is another video showing what is really happening as the system samples trajectories and finds the locally optimal control in receding-horizon fashion. And in this other view you see what happens with the actual system alongside its sampled trajectories. OK, so that was the system in the configuration where you learn the dynamics offline, and we used simple regression, one of the simplest things you can do to learn dynamics. But then Paul and Brian were curious and thought about really pushing the performance of the vehicle: instead of just doing offline learning, how about learning the dynamics online? So we start with a model that may not be very accurate, but now we learn the dynamics in an online fashion by doing regularized least squares, the thing everybody learns in a machine learning class, with Gaussian basis functions. If you do that, it turns out you can go a little bit faster. This is the view from the onboard camera. And here is another video where you can see some aggressive maneuvers. Now, there are successes and failures in robotics, and we report both; these are some cases where the system manages to recover.
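The receding-horizon loop just shown can be sketched in miniature. This is my own toy reconstruction on a one-dimensional double integrator, not the actual vehicle code; all constants and names are illustrative:

```python
import math, random

# Minimal MPPI-style receding-horizon sketch. The real system samples
# thousands of rollouts through learned car dynamics at 60 Hz; this toy
# keeps only the structure of the update.

DT, H, K, LAM, SIGMA = 0.05, 30, 200, 1.0, 2.0  # step, horizon, samples, temperature, noise

def step(x, v, u):                 # toy dynamics: position, velocity, acceleration input
    return x + v * DT, v + u * DT

def cost(x, v, u):                 # drive position to 1.0; penalize speed and effort
    return (x - 1.0) ** 2 + 0.1 * v ** 2 + 0.01 * u ** 2

def mppi_step(x, v, u_plan):
    """One update: sample K noisy rollouts around the current plan, weight
    each rollout by exp(-cost / LAM), and average the injected noise back
    into the plan."""
    noises, costs = [], []
    for _ in range(K):
        eps = [random.gauss(0.0, SIGMA) for _ in range(H)]
        cx, cv, c = x, v, 0.0
        for t in range(H):
            u = u_plan[t] + eps[t]
            c += cost(cx, cv, u)
            cx, cv = step(cx, cv, u)
        noises.append(eps)
        costs.append(c)
    base = min(costs)                                # stable exponentiation
    w = [math.exp(-(c - base) / LAM) for c in costs]
    tot = sum(w)
    return [u_plan[t] + sum(w[k] * noises[k][t] for k in range(K)) / tot
            for t in range(H)]

random.seed(0)
x, v, plan = 0.0, 0.0, [0.0] * H
for _ in range(60):                # receding horizon: optimize, act, shift
    plan = mppi_step(x, v, plan)
    x, v = step(x, v, plan[0])
    plan = plan[1:] + [0.0]
print(round(x, 2))                 # should settle near the target at x = 1.0
```

Note that only the first action of each optimized plan is executed before re-optimizing, which is exactly the receding-horizon idea from the talk.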
But of course there are cases in which the system cannot recover, and there are a few different reasons why that may happen. First of all, there is sensing involved: we use a GPS sensor, and sometimes you can lose the GPS signal. The second reason is that sometimes the car becomes overconfident: it believes the model is very accurate when it is actually not. And sometimes things simply do not work; of course you want to be able to improve your algorithm so that you get consistency in your experiments, and we believe the key to improving this work is to have better ways of learning the dynamics. This is what we are actually doing right now. It also turns out that you can take the same framework, and it has some nice properties that allow you to work with multiple vehicles. In this particular case the task is to go through the forest: you have obstacles that are moving, and the vehicles have to go through the forest while staying in close proximity to each other. Here we have nine vehicles with sixteen states each, so the total is one hundred forty-four states. Now, there are of course traditional methods for trajectory optimization, such as differential dynamic programming. How many of you know differential dynamic programming? Everybody should know it, because there are many papers in robotics and control on differential dynamic programming, going back fifty years. If you compare this sampling method with differential dynamic programming, what happens is that you can go faster, for the task of going through the forest with just a single vehicle, and you can be more risk-seeking, meaning you can go closer
to obstacles. And that is because when you evaluate your cost you don't take any derivatives of it, so it is OK for your cost to sometimes be discontinuous; with sampling you are really exploiting that capability. And this is another plot of what happens when you have nine vehicles. So far the framework operates in a model predictive control fashion: you sample, you act, you sample, you act. But it turns out you can also use the framework if you want to do reinforcement learning. This is work I did before I joined Georgia Tech; essentially you can do something like a policy gradient, using the exact same framework. I have a few videos here, but I think I'm going to skip them so that we can go to the second part of my talk, which is essentially the work we have been doing together with my collaborator in aerospace engineering. Together we have a joint grant, and the task there is to come up with new algorithms for trajectory optimization that actually take into account the uncertainty you have in your sensors and the uncertainty you have in your dynamics. This will become more and more clear, but the questions we are asking are: how do we propagate uncertainty if we have stochastic dynamics, and how do we push it to belief states? And one of the interesting questions is: if you want to do belief-space optimization, what is really the proper representation? You can start with mean and variance; you can go to the case where you don't have a dynamics model and you have to work with samples; or you can take a completely different point of view where you just sample whole controlled trajectories. So, since you guys may not know DDP,
let me give you a short overview of differential dynamic programming. In differential dynamic programming, again, you want to go from a start state to a goal state, and you begin with an initial policy that gives you a trajectory. Now, everybody likes to work with linear systems, because we have nice results for them, and everybody likes to work with quadratic functions, because we know how to optimize them. So what do we do? We approximate our nonlinear dynamics with linear approximations along that trajectory, and we take quadratic expansions of the cost function along the same trajectory. Then we back-propagate the value function, which is really just applying the dynamic programming principle to this approximation. Once we do that, we have a way to find how to update our policy so that we reduce the overall cost; we update the policy, get a new trajectory, and repeat the same process until we essentially converge, meaning we see no further improvement in the cost. This is a method that requires linear approximations of the dynamics and quadratic approximations of the cost function, and it is the method we are going to use here. So, how do we propagate, and how do we even linearize, the dynamics? We have been doing some work on integrators: essentially a way to manage your dynamics properly, so that when you linearize the dynamics, the result is not sensitive to your discretization step. You are all familiar with the Euler scheme, so look at the first two graphs: on the left is the case of a double cart-pole; we pick one of the states, and we want to go from a starting state to the zero state. You have three lines in different colors corresponding to different discretization steps, and you see in the first row how the solution actually changes as you change your discretization step.
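I can give the flavor of this step-size sensitivity with ordinary integrators. The toy below compares explicit Euler with a symplectic Euler variant on an undamped pendulum; this is only a stand-in I chose for illustration, not the stochastic variational integrator from the talk:

```python
import math

# Explicit Euler on an undamped pendulum gains energy as it runs, and the
# effect grows with the step size; a structure-preserving (symplectic)
# variant stays bounded at the same step size.

def energy(th, om):
    return 0.5 * om * om + (1.0 - math.cos(th))

def simulate(step_fn, dt, T=20.0):
    th, om = 1.0, 0.0                   # start at 1 rad, at rest
    for _ in range(int(T / dt)):
        th, om = step_fn(th, om, dt)
    return energy(th, om)

def euler(th, om, dt):                  # explicit Euler
    return th + dt * om, om - dt * math.sin(th)

def symplectic(th, om, dt):             # update velocity first, then position
    om2 = om - dt * math.sin(th)
    return th + dt * om2, om2

E0 = energy(1.0, 0.0)
print(simulate(euler, 0.1) / E0)        # drifts well above 1: spurious energy
print(simulate(symplectic, 0.1) / E0)   # stays near 1
```

The same sensitivity, pushed through a linearization, is what corrupts the trajectory optimization; a better integrator removes the artifact rather than masking it with tiny steps.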
Now see what happens when you use the integration scheme that we have developed: you see that there is actually much less sensitivity in your actual solution. So this scheme is less sensitive, and that allows you to perform trajectory optimization over a longer horizon. There are also implications of the choice of integrator in state estimation, because where else do we have dynamics? When we want to do state estimation and filtering. So what happens there? What we have here is a variance that grows, and whenever we get a measurement the variance has to drop; the variance dropping means we have an update of the filter. On the left side is the Euler scheme, and on the right side is the scheme that uses our integrator. The blue and red colors correspond to the larger discretization steps, and you see that with the Euler scheme the filter eventually breaks; you have this red line here. So the choice of integrator affects the quality of both trajectory optimization and state estimation, and we have a couple of papers on this that have been accepted, and we are going to continue this work because we believe it is extremely important. Now we go to the second question: how do you propagate the belief dynamics? So now we go from propagating just a point to the question of the belief dynamics: why do we need them if we want to include the state uncertainty, and also, what happens if we don't know the dynamics?
The idea is that we are going to use a Gaussian process to represent the dynamics and to propagate the dynamics. So let's say you have some data from an experiment: you can use a Gaussian process to learn the dynamics, and then, based on this Gaussian process, you can propagate the mean and the variance. So now, instead of the deterministic dynamics, I have these two equations for the predictive mean and the predictive variance, and I can take the traditional tools I have from optimal control theory, such as differential dynamic programming, and apply them to this representation. So the goal is to steer the mean and the variance. This framework also allows you to see when you need more data, since you have the variance on your state: you can see in which cases the variance is high. That means the optimizer has pushed the Gaussian process representation far from the initial data, so then you can go back to the system, query it exactly where you need the data, update your Gaussian process representation, and continue your optimization. So you want to minimize interactions with the real hardware as much as possible. It turns out there is a state-of-the-art method called PILCO, and I think many people are familiar with PILCO: it was the first method that really reduced the number of sample trajectories you have to play on the actual system in order to learn a policy. That was a very big contribution, and it is still one of the state-of-the-art methods. So we compared our method, which is to use a Gaussian process representation of the dynamics and perform trajectory optimization using DDP, with PILCO on two tasks. You see here that in terms of the number of interactions with the actual dynamics we are comparable to PILCO, but in terms of computation time there is a big difference.
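A minimal sketch of a Gaussian-process dynamics model with its predictive mean and variance. The key point from the talk is that the predictive variance grows far from the training data, which is the signal to go back and query the real system there. The RBF kernel, lengthscale, and the toy 1-D system are my own illustrative assumptions:

```python
import numpy as np

def rbf(A, B, ell=0.5, sf=1.0):
    """Squared-exponential kernel between row-wise input sets A and B."""
    d = A[:, None, :] - B[None, :, :]
    return sf**2 * np.exp(-0.5 * np.sum(d**2, axis=2) / ell**2)

class GPDynamics:
    """GP model of one output dimension of x_{t+1} = f(x_t, u_t).
    Returns predictive mean AND variance; high variance means the model
    is being queried far from its training data."""
    def __init__(self, Z, y, noise=1e-3):
        self.Z = Z
        K = rbf(Z, Z) + noise * np.eye(len(Z))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
    def predict(self, Zs):
        Ks = rbf(Zs, self.Z)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = rbf(Zs, Zs).diagonal() - np.sum(v**2, axis=0)
        return mean, var

# Train on (state, control) pairs from a noise-free toy 1-D linear system.
rng = np.random.default_rng(0)
Z = rng.uniform(-1, 1, size=(30, 2))        # columns: [x, u]
y = 0.9 * Z[:, 0] + 0.1 * Z[:, 1]           # next state
gp = GPDynamics(Z, y)
m_in, v_in = gp.predict(np.array([[0.0, 0.0]]))    # inside the data
m_out, v_out = gp.predict(np.array([[5.0, 5.0]]))  # far from the data
```

`v_out` is near the prior variance while `v_in` is small, so a belief-space optimizer that wanders toward `[5, 5]` knows its model is no longer trustworthy and should request real rollouts there.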
For the case of the cart-pole, you need up to twenty-two or twenty-three hours to find a good policy with PILCO, while with our method you need twenty-five minutes. So we have sample efficiency comparable to PILCO; we still cannot beat it, but we can actually find policies much faster than PILCO. That was one paper, and then what my student did was to take this Gaussian process representation and use it in a model predictive control framework, and it turns out that we can get very good results with this approach as well; that resulted in another paper. And of course you want to be able to push the frontiers of theory and computation, which means you want to be able to work with a whole ensemble of trajectories, not just the mean. It turns out this is something you can do, but it is very expensive: it boils down to, instead of controlling a single ordinary differential equation, controlling a partial differential equation that tells me how the probability distribution changes over time. This is work that one of my students is actually doing, and we have a few papers that have been submitted; since this is recorded, I'm not going to give more details, but I think there is a lot of future work in really pushing the envelope, in terms of the actual mathematical theory, the sampling, and the parallelization that can result from these frameworks. Beyond that, we also like neuroscience, we also like biomimetic robotics; this is work that I did during my postdoc. Here at Georgia Tech we have been collaborating with another university, and since we have all of these algorithms for trajectory optimization and sampling-based control, how about trying to actually control a neural system? We have some very, very preliminary results. So, what is next?
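The ensemble point can be sketched with particles. For a simple Ornstein-Uhlenbeck SDE the distribution of an ensemble of trajectories evolves according to the corresponding Fokker-Planck PDE, and a Monte Carlo ensemble (Euler-Maruyama integration) recovers its statistics; this toy, uncontrolled example is just a stand-in for the PDE-control setting mentioned above:

```python
import numpy as np

# Ensemble propagation for the OU process  dx = -a*x dt + sigma dW.
# The particle histogram evolves per the Fokker-Planck PDE of this process,
# whose stationary density is Gaussian with variance sigma^2 / (2*a).
rng = np.random.default_rng(1)
a, sigma, dt, steps, n = 1.0, 0.5, 0.01, 2000, 20000
x = np.zeros(n)  # all particles at the origin: a delta distribution
for _ in range(steps):
    x += -a * x * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
stationary_var = sigma**2 / (2 * a)   # analytic value: 0.125
ensemble_var = x.var()                # Monte Carlo estimate after 20 s
```

Controlling the whole distribution means shaping how this density flows over time, which is why the problem becomes a PDE rather than an ODE, and why sampling and parallel hardware matter so much.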
So we have these two universes (that was my world, these two universes): one is optimal control theory, and the other is statistical physics. And in the middle of all of this, after I think forty or fifty years of research, I believe we now have a good understanding of what the overlap is. I say that because there are many communities involved, and people many times use different terminology and different concepts, and sometimes it is very difficult to bring all of these people together into a consensus on what we know, and then on what the next step is; because if you don't know what has been done, how do you know what progress we should make, and in which direction? So we have to bring all of these people together to talk about these things. And I think we agree that at the center of this connection is a very fundamental lemma from two very important people, Feynman and Kac: the Feynman-Kac lemma. But guess what: there are extensions of it that can go to the fully nonlinear case, meaning that I don't have to actually perform any transformations; maybe I pay a little bit of a price on the sampling side. And in fact you can go beyond that: you can actually talk about partially observable systems. So this is where the future is, from my point of view: a lot of the algorithms that will be developed are sampling-based, so there is a lot of future work in terms of how we bring this sampling onto actual hardware. And this is the last slide: the people that have influenced my way of thinking through reading their work. I would like to thank my students very much; I have very serious students, and I sometimes have to be funny. Starting from the left, each of them works on a different piece of this:
there is David, who works on the neural system and stochastic optimization; Grady, whose work is definitely different (he is in the bottom row); George; a student who graduated and whom I advised together with Eric Johnson; and Yunpeng, among others in our lab. And of course our collaborators here at Georgia Tech and our sponsors. Thank you so much. Time for questions. [Question from the audience about the dynamics model.] Yes. So, I think you can get a lot of information about this from my students, who happen to be sitting at your table, but I can tell you essentially that there was a model with certain features, and then we parameterized that model in a way that the parameters appear linearly. So the model is nonlinear in the state but linear in the parameters, so that we can perform regression. But now there is a lot of ongoing work on how to learn the dynamics, my students are also involved in this, and we have many ideas and many ongoing experiments. So there was an initial model, and then we picked features from that model; we didn't fit the whole thing blindly, we fit its parameters. [Question about hyperparameters.] Hyperparameters: we have not investigated that, meaning that we don't have a paper on it, but we have been thinking about it, in terms of what happens if, for example, the dynamics change because you have contacts. I guess in that case you would have to be able to predict where the transition is going to happen between, say, dynamics A and dynamics B, and then handle that in a principled way. So yes, this is something that is in the pipeline. Any other question? You can ask me any question. [Question from the audience, roughly: how receptive are the different communities?] Yes.
I think they are very receptive. You know, it's just that everybody wants to be smart, right, and wants to publish papers. So I would answer your question by saying that I would classify people into two main classes: people who want to know what has been done and then move forward, and people who just don't care. And maybe sometimes, you know, that is also very productive, because you end up having a different interpretation, maybe approaching things from a different point of view. I don't mean that in a negative way, but I believe it is important to know what has been done, so that we know how much progress we have made in the last fifty years or so. But I believe the job of the communities, all of the communities, is to discuss. And especially the robotics community: I think the robotics community has many advantages over other communities. I just remember there is a video where Richard Feynman says that it doesn't matter how beautiful your theory is, it doesn't matter how smart you are: if it doesn't agree with experiment, it's wrong. So robotics really brings this very nice combination of the ability to work with theory but also to test it on a real, actual system, and in that sense I think it is very scientific; it has become very scientific. So I believe that robotics is a good place to be, even if you want to be, like, a hardcore scientist. If you want to make an impact, at the end of the day you want to be able to apply your work, and you want other people to use your work. So I think robotics is really placed in a very, very good position, and you can make a lot of progress in either direction. OK, thank you so much.