So thank you for hosting me; it is a great pleasure to give this talk in my city. I like the program and the people who are involved in it, I am very happy to be part of it, and I think it is going to be one of the strongest programs in the country. So I am very happy to give this lecture.

The title of my talk is the science of autonomy, happening at the intersection of control, learning, and physics. Autonomy is a very active field, and there are many fields involved in it, so when somebody talks about the science of autonomy, some logical questions arise: which science, exactly? Can we think of autonomy as a science appearing by itself? Since I like words, I am always looking for a good word to put in my title, and the word here is "happening": control, learning, and physics living together, and something happening between them. As it turns out, each of these disciplines, control, learning, and physics, has its own tradition, and so sometimes there are interesting interactions between people who work only on the control side of things, people who work only on the learning side of things, and people who work on the physics side of things. What I will try to argue in this presentation, through the projects I have been doing since I came to Georgia Tech, is that as far as I am concerned it is all one body of mathematics, and whether you name it control or physics or learning is mostly a matter of convention: we like to put labels on different areas of science and engineering.

I would like to acknowledge, first of all, my students. I have been delighted to work with very smart students, and I have failed many times together with my students on the projects we have been doing; if you want to do interesting research, I feel that failing is a requirement. So I am really delighted to have the opportunity to work with all of them. I have also had some great collaborators here, and great students working together with other professors. I would like to acknowledge Jean-Marie and Byron for the extensive collaborations that we had, and then Panos and Magnus, and my colleague from Emory University. And I would like to acknowledge the funding agencies.

My interest is in the area of decision making, and if I had to describe some of the important tradeoffs in this area, I would draw a graph with two axes. On the y-axis, let us put the uncertainty in the representation of the system and the environment: we start with the highest level of uncertainty down here, and as we go up we have lower levels of uncertainty. On the x-axis I will place the timescale of optimization, or the timescale of interaction of the controller, the decision maker, with the actual system. This will be a little clearer if I give you some examples. Reinforcement learning is part of decision making; it has model-based and model-free sides, but let us pick the more extreme side of reinforcement learning, which is model-free. Where would you place model-free reinforcement learning?
I would place it here, because essentially, if I have no knowledge of the environment and of the dynamics of the system, I have to interact with the system and the environment by applying trajectories, by executing controls on the system. Whether that is feasible is something we can discuss, but the learning here happens on a long timescale, because you have to estimate everything from trajectories.

Here is one example: a robot that has to learn how to rotate a piece of tofu. There is no model of the physics of the interaction of the robot with the tofu, or of the tofu with the environment, so the robot has to learn how to rotate the tofu by trial and error. A reinforcement learning algorithm essentially collects rewards from these interactions and learns a policy. Obviously, learning by interacting with the actual system and making decisions is slow here, because we do not have knowledge of the dynamics and the physics of the robot, the tofu, and the environment. And then there is the choice of trajectories, and obviously you have to be smart about initialization; learning from demonstration, that is another keyword: you have to be able to start with some initial policy and then optimize it.

Then, as you move more toward model-based methods, once you have some understanding of the environment and of the way the system you are considering interacts with the environment, optimization can become fast; you can make decisions on a much smaller timescale. Obviously you have to worry here about whether your optimizer can meet the real-time requirements, but given that many of the algorithms we have right now can be parallelized (we can use parallel computing in order to make these decisions and apply model predictive control), we can be very predictive, and we can be reactive too. So there is this interesting tradeoff between the timescale of optimization and the uncertainty.

There are three points that I would like to make. The first is that there is a tradeoff between uncertainty and the timescale of optimization. The second is that uncertainty plays a very important role, and the way we represent it is also important; machine learning can play a very important role in how you represent uncertainty, either in the environment or in the dynamics of the system you are considering. And the third, because we are looking for elegant solutions (I will explain what I mean by elegant), is that we would like a unified approach that can actually do both: go from reinforcement learning all the way to MPC.

Now, typically you have only these two axes, but there is one more axis that I recently added, and this is the axis of spatial scale. This slice here is the slice of robotic systems that can be described by stochastic differential equations or ordinary differential equations, but the ideas I am going to present are actually scalable to systems that
can be represented by stochastic partial differential equations. For systems that operate at smaller spatial scales, if you want, for example, to represent nanoparticles, you will have a different representation than ordinary or stochastic differential equations. So we are interested in decision-making principles that can actually scale, that can carry over to systems and representations that go outside the typical regime of ordinary or stochastic differential equations.

I have one more slide and then I will dive into the algorithms; this one is more about the philosophy. I would like to be up front, from the second slide, about the two people who have really inspired the work in my lab, and explain why; I believe they have inspired many other people too.

The first is Feynman, the physicist. In one of his physics lectures, back at Cornell (I do not remember the exact date), he asks: how do we go about finding a new law? He starts by saying, first we will guess it, and everybody laughs in the room, because it seems impossible to just guess a law, but that is what you may have to do. Then you compute the consequences of this law, and then you compare it with experiment. And then he says: if it disagrees with experiment, it is wrong. In that simple statement is the key to science. It does not make a difference how beautiful your guess is, it does not make a difference how smart the person who made the guess is, or what his name is; if it disagrees with experiment, it is wrong. That is all there is to it. That is a very strong statement coming from a physicist, and I can aim that statement a little bit at the people who have been doing control theory but have not really taken their theories all the way to combining them with an experiment; not all of them, but a significant part.

Then there is this other fellow, Gell-Mann, again a Nobel laureate, who positions himself in a different way. He says: what is especially striking and remarkable is that in fundamental physics, a beautiful and elegant theory is more likely to be right than a theory that is inelegant. And he speaks from experience. He says: in 1957 some of us put forward a complete theory of the weak force in disagreement with the results of seven experiments. Not one, not two: seven experiments. It was beautiful, and so we dared to publish it, believing that all of those experiments must be wrong; and in fact, they were all wrong.

So, naively, you would put these two opinions at the two opposite ends of a spectrum. But I think they are both right, and the only difference, at least what characterizes us, is that on the short timescale we are looking for the most elegant theory, the most beautiful mathematics, but at the end of the day we always want to take this mathematics all the way down to an experiment. They are both right, and we should keep both in mind. I just wanted to share this with you.
So the outline of my talk is as follows. I am going to start by explaining some interesting connections between information theory and stochastic optimal control, and then you will see how one can actually derive sampling-based algorithms for stochastic systems. Then we will take this idea to different cases: we will apply it to the case where we have multi-agent systems, and then we will consider the case of partial observability, perceptual control. I could give an entire lecture on how you can represent uncertainty, but here I just want to give you a tour of the projects that we have, and then some ideas on what is coming next in my lab.

OK, so let us start with the following two quantities. I apologize to the computer scientists in the room, because I am going to be writing expectations using measure-theoretic notation. This integral here is an expectation of whatever you have inside the exponential, and you can think of the quantity inside as a cost; there is nothing in these two equations that is about control, this is just an expectation with respect to a general probability measure. We are going to call this first quantity the free energy, because if you open a book on statistical physics and look at the form of the free energy there, it has the same form. Then we are going to work with another quantity, which is a generalized entropy. If you drop this minus sign and make the cancellation, you see that this is nothing else than the Kullback-Leibler divergence between two probability measures. The reason I am writing it in this form is that if you pick P to be a constant (uniform) measure, you go back to the Shannon entropy; that is why this is called a generalized entropy.

You can prove, in very few steps, that there is a relationship between the free energy and the relative entropy, and this relationship is given by an inequality: the free energy is bounded by the expectation of J(x), the same J(x) but now computed under the other probability measure, plus the relative-entropy term. There is an interpretation of this inequality in statistical physics: the free energy is less than or equal to the average energy minus temperature times the generalized entropy, where you can think of this 1/ρ as a temperature. And then you can ask: what is the optimal measure Q that, if I substitute it back into this expression, turns the inequality into an equality? This optimal measure is the Gibbs measure of thermodynamics, which is the case where the entropy is maximized (the second law of thermodynamics says that entropy is maximized). In that case the right-hand side of the inequality is minimized and becomes equal to the free energy. There is nothing about control up to this point, and this is not something that I have discovered; it is very simple and classical.

Now, if we want to talk about stochastic control, you can start taking the measures P and Q and associating them with paths generated by stochastic differential equations.
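In symbols (a reconstruction in the notation standard in this literature, since the slides are not reproduced in the transcript; λ plays the role of the temperature the speaker calls 1/ρ):

```latex
% Free energy of a cost J under a measure P:
\mathcal{F}(J) \;=\; -\lambda \,\log\, \mathbb{E}_{P}\!\left[\exp\!\left(-\tfrac{1}{\lambda} J(x)\right)\right]

% Duality with the relative entropy (KL divergence), for any Q absolutely continuous w.r.t. P:
\mathcal{F}(J) \;\le\; \mathbb{E}_{Q}\!\left[J(x)\right] \;+\; \lambda\, \mathrm{KL}\!\left(Q \,\|\, P\right)

% The bound is tight at the Gibbs measure of statistical physics:
\frac{dQ^{*}}{dP} \;=\; \frac{\exp\!\left(-\tfrac{1}{\lambda} J(x)\right)}{\mathbb{E}_{P}\!\left[\exp\!\left(-\tfrac{1}{\lambda} J(x)\right)\right]}
```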
In particular, let us go back to the same fundamental relationship. P is now going to be a path measure, a measure over the trajectories generated by a stochastic differential equation. And then we take the other measure, Q, and associate it with another stochastic differential equation. The difference between the two is that here there is a control u, and here there is no control; this is the uncontrolled diffusion. So now these probability measures are measures over path space: an entire path is an event that you sample.

The question is: if I associate these probability measures with these stochastic differential equations, what is the meaning of the Kullback-Leibler divergence, and what is the meaning of the free energy? Because now I have made the connection between the measures and the stochastic differential equations. One more step before I answer: we associate this J with a trajectory, with a cost that is evaluated along the trajectory. The cost has a terminal state cost, and it has a running cost that is also state-dependent: you have a state, you plug it in, and you get the value. Then it turns out that the relative-entropy term is nothing else than a quadratic control cost, integrated over the time horizon along the sampled trajectories. This is the kind of function we typically have in stochastic control: we like to optimize quadratic control costs.

And then the question is: OK, now this looks like a stochastic optimal control problem, so what is this bound? We know in stochastic optimal control that dynamic programming gives you the globally optimal solution, so what kind of bound is this, and how is it different from dynamic programming? You would agree with me that if I take this J(x), I have a cost that I can evaluate by sampling the stochastic differential equation. So what you can do is take only this part, this expectation, and associate it with the letter Φ. Here there is an interesting lemma in stochastic calculus called the Feynman-Kac lemma. What it does is create connections between stochastic differential equations, expectations, and partial differential equations. In fact, for any expectation of this form, evaluated with an SDE, you can find a PDE that represents this expectation, and vice versa: for a PDE, especially a backward PDE, you can always find an expectation and an SDE such that, by sampling forward, you solve the PDE. The statement of Feynman-Kac is an if-and-only-if; it goes both ways.

So what you can show is that this Φ satisfies the backward Kolmogorov equation. Remember that I am sampling trajectories, plugging the states into J, and evaluating Φ based on the uncontrolled dynamics, because the measure P is associated with paths generated by the uncontrolled dynamics. So what we get is a backward Kolmogorov equation whose drift is the drift of the uncontrolled dynamics, and whose diffusion term is where the noise enters. So this is a specific way to arrive at this PDE.
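Again as a reconstruction, assuming the control-affine diffusion setting this line of work usually treats (the normalization with √λ is one common convention, not confirmed by the transcript):

```latex
% Uncontrolled (measure P) and controlled (measure Q) dynamics,
% with control and noise entering through the same channel:
dx = f(x,t)\,dt + B(x,t)\,\sqrt{\lambda}\;dw
\qquad\text{vs.}\qquad
dx = f(x,t)\,dt + B(x,t)\,\big(u(x,t)\,dt + \sqrt{\lambda}\,dw\big)

% Cost accumulated along a trajectory: terminal cost plus running state cost:
J\big(x(\cdot)\big) = \phi\big(x(T)\big) + \int_{t_0}^{T} q\big(x(t),t\big)\,dt

% Girsanov's theorem gives the relative entropy between the path measures
% as a quadratic control cost integrated over the horizon:
\mathrm{KL}\!\left(Q \,\|\, P\right)
= \frac{1}{2\lambda}\,\mathbb{E}_{Q}\!\left[\int_{t_0}^{T} \|u(x,t)\|^{2}\,dt\right]
```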
It is not an arbitrary PDE. But I am not actually interested in Φ alone; I am interested in the entire free energy. The free energy is the lower bound, and I know that the globally optimal solution in control is given by dynamic programming, so I am asking again: what is the connection between this expression and dynamic programming? All you have to do now is differentiate: I have the PDE for Φ, and I know there is a logarithmic transformation between Φ and the value function. If you do that, you end up with something very well known in stochastic control: the Hamilton-Jacobi-Bellman partial differential equation. So this quantity satisfies the Hamilton-Jacobi-Bellman equation; it becomes the value function. We derived dynamic programming, we showed that this quantity satisfies the dynamic programming principle, without using dynamic programming arguments: just by using measure theory and some properties of the Feynman-Kac lemma.

That is very interesting, because now you have a way to approximate the value function just by sampling, and sampling from the uncontrolled dynamics. That is what the math says; unfortunately, if you go and try to approximate this naively, it is a nightmare. It is a free energy, and people have been doing research on how to evaluate free energies; it is a hard quantity to compute. Nevertheless, there is beauty here, because this is the outcome of a different point of view on stochastic control. So what we did, essentially, is: start with the relative entropy and free energy duality inequality; apply the Feynman-Kac lemma to go from expectations to PDEs; derive the backward Kolmogorov PDE; take the logarithmic transformation to get the quantity we are really interested in; and then show that it satisfies the Hamilton-Jacobi-Bellman equation and is therefore a value function.

Alternatively, there is another way of looking into this problem, starting entirely from control theory. You can start with the standard optimal control formulation, where you have a cost function and you want to minimize it with respect to the controls; there is no measure theory here. What you do is derive the Hamilton-Jacobi-Bellman equation, and then take the exponential transformation, going in the opposite direction: you get the backward Kolmogorov equation, you apply Feynman-Kac in the opposite direction, from PDEs to expectations, and you can show that the same bound appears, in the form of a free energy.

So there are two different ways of looking into the same problem, and the question now is: where do these two views overlap? We have shown that this overlap holds for the class of diffusions that are affine in control and noise. And obviously the big question is: is this connection general? Can I generalize it, can I take it all the way to systems driven by different stochastic processes?
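For reference, the transformation referred to above is, in the usual path-integral-control notation (again a reconstruction, not the speaker's exact slides), the exponential change of variables that linearizes the HJB equation:

```latex
% Exponential (desirability) transformation of the value function V:
\Phi(x,t) = \exp\!\left(-\tfrac{1}{\lambda}\, V(x,t)\right)
\quad\Longleftrightarrow\quad
V(x,t) = -\lambda \,\log \Phi(x,t)

% Under the control-affine assumptions above, the nonlinear HJB equation for V
% becomes the linear backward Kolmogorov PDE for Phi, which Feynman-Kac
% represents as an expectation over the uncontrolled dynamics:
\Phi(x,t) = \mathbb{E}_{P}\!\left[\exp\!\left(-\tfrac{1}{\lambda} J\big(x(\cdot)\big)\right) \,\Big|\, x(t) = x\right]
```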
That is a very valid question once you understand this picture. But there is also an algorithmic benefit that comes out of this, and the algorithmic benefit is that the information-theoretic representation gives you the optimal control in pieces. It does not give you the optimal control as a function of the gradient of the value function; it gives you the optimal control implicitly. What I mean by that: this optimal measure is what the trajectories would look like if you had the optimal control, applied it to the stochastic differential equation, and sampled trajectories; those trajectories would be distributed according to this measure.

And this optimal measure is actually very general. The measure-theoretic representation holds for general classes of stochastic systems: it can be applied to jump-diffusion processes, it can be applied to doubly stochastic processes, and to much more; it can go all the way to infinite-dimensional stochastic processes, because it is all measure-theoretic.

So now we are going to use this to do inference. This is where the machine learning comes in, and obviously it will be important for the representation of the actual stochastic process. We are going to use this optimal measure, and the way we do it is as follows. I start again from the fundamental relationship, and I have these two SDEs; I am going to stick, for now, with stochastic processes driven by Wiener noise, and I will show you at the end of my lecture that you can do the same thing for a much bigger class of stochastic processes. The caveat is that in the general case you lose the connection with dynamic programming: what you are doing is no longer stochastic optimal control in the sense of dynamic programming, but you still get a control.

So we have the uncontrolled dynamics and the controlled dynamics, and since we have this optimal measure, what I am going to do is parametrize my policy: I take the control and parametrize it, so now I am looking at parametrized policies, and I am going to try to force the measure induced by the parametrized controller to be as close as possible to the optimal one, the one provided by this inequality. There is a fundamental difference between looking at the stochastic problem like this and looking at it through dynamic programming. In dynamic programming, the form of the control emerges from the optimization, but you need to find the value function; here, the form of the optimal control is pre-specified a priori, because you have to parametrize it. But then I am pushing this parametrized control to be as close as possible to the optimal measure, which relates back to dynamic programming for specific classes of stochastic systems.

Here is a very simple parametrization that we have been using: essentially taking a controlled trajectory and optimizing with respect to the values of the control along it.
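Schematically (the transcript supplies this only in words; the argument ordering below follows the published information-theoretic MPC derivations, which this talk appears to describe):

```latex
% Policy search as measure matching: choose the controller so that the
% controlled path measure Q_u matches the optimal Gibbs path measure Q*:
u^{*} \;=\; \arg\min_{u}\; \mathrm{KL}\!\left(Q^{*} \,\big\|\, Q_{u}\right)

% For the control-affine diffusion above, the minimizer is the expectation,
% under Q*, of the noise that generated each path; importance sampling turns
% this into the exponentially weighted average used in the algorithm below.
```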
And this is where a lot of importance sampling comes in, because sampling in high-dimensional state spaces from these SDEs is hard. In particular, to evaluate the KL divergence I need the ratio between dQ* and dQ; that is how the Kullback-Leibler divergence is defined. So I have to plug this ratio in and take the gradient of the expression with respect to u. I do not have it explicitly, but I do have the optimal measure, which is given by the minimization, and I also have the Radon-Nikodym derivative between the controlled and the uncontrolled dynamics, which is something you can open a book on stochastic calculus and get the expression for, and it is such that I can evaluate it on sampled trajectories. So this is something I can actually compute. If you multiply the two, you get back dQ*/dQ; this is the chain rule for measures. You plug this into the divergence, take the gradient with respect to u, and you end up with an expression for u that is nothing other than an averaging of the noise that you used to generate the sampled trajectories, weighted by an exponential of the cost, and this cost is the J.

There is one more step of importance sampling that we have to do, which I am not showing here, but schematically this is what happens: you sample trajectories, you store all the noise profiles of those trajectories, and you average them to get the optimal control sequence, where the weights in the averaging are the exponential of the cost. First you find the optimal control sequence over the whole time horizon; you apply only the first part; then you sample again, but you carry over the control sequence that you found before. You use it as the importance sampler (call it an importance sampler or a likelihood ratio, depending on which community you come from), and then you essentially repeat this process. So the process becomes an MPC process: a sampling-based trajectory optimization method. A minimal sketch of this loop is given below.

This process can also be used in an episodic setting: if I do not move the system physically, but always sit at the same state and always sample from the same state, I can be model-free. I do not have to know the dynamics of the system, because the dynamics do not appear in the final control update equation.

And here is what we have done with this on an actual system, with these stochastic policies. We collaborated with James Rehg's group on this project, and it was a very good platform for us to demonstrate the capabilities of our algorithms. All the computation is onboard (there is a GPU), and the localization of the vehicle is done with GPS, so we have a fully observable setting. The cost that we use for this task is essentially for the vehicle to drive as fast as it can, and of course we have a cost for staying within the track boundaries, but we do not have an explicit state trajectory that we have to track.
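Before the driving videos, here is the minimal sketch promised above of the receding-horizon sampling loop. It paraphrases the published MPPI update rule rather than the lab's actual code; `dynamics`, `cost`, the temperature `lam`, and the noise scale `sigma` are placeholders you would supply.

```python
import numpy as np

def mppi_step(x0, U, dynamics, cost, K=1024, lam=1.0, sigma=0.3):
    """One MPPI update: perturb the nominal control sequence U with noise,
    roll out K trajectories, and re-weight the noise by exp(-cost/lam)."""
    T = len(U)
    noise = sigma * np.random.randn(K, T, U.shape[1])   # stored noise profiles
    costs = np.zeros(K)
    for k in range(K):
        x = x0
        for t in range(T):
            x = dynamics(x, U[t] + noise[k, t])          # forward sampling
            costs[k] += cost(x)
    costs -= costs.min()                                 # numerical stability
    w = np.exp(-costs / lam)
    w /= w.sum()                                         # Gibbs (softmax) weights
    # Optimal control sequence = noise averaged under the Gibbs weights
    return U + np.einsum('k,ktu->tu', w, noise)

def mppi_mpc(x0, dynamics, cost, T=30, udim=2, steps=100):
    """Receding-horizon loop: apply the first control, shift the sequence,
    and re-sample, carrying the remainder over as the importance sampler."""
    U = np.zeros((T, udim))
    x = x0
    for _ in range(steps):
        U = mppi_step(x, U, dynamics, cost)
        x = dynamics(x, U[0])                            # execute first control
        U = np.vstack([U[1:], np.zeros((1, udim))])      # shift / warm start
    return x
```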
So the vehicle goes around the track and performs very aggressive drifting maneuvers. Of course we have done some system identification: we identified the dynamics of the vehicle in order to be able to generate sampled trajectories. And I think we were able to go very far in terms of how much we can push the performance of the vehicle with these stochastic sampling-based optimization techniques.

When you put something out in the scientific community, people sometimes get upset, or they try to find ways to criticize your work, and one critique of our work was that we overfit to the small truck. But because we are at Georgia Tech and we have good colleagues, like Magnus, we were able to get a new, bigger truck, and this was a very educational experience for us, because now we can test our algorithms for a long time and see behaviors that we were not able to see on the smaller truck. So we moved to the new truck, and here I have two videos where I want to show you the impact of the model: what can really happen with this method if you have a bad model. You need to have a good model; you need to do your homework and some system identification, but obviously a model cannot capture everything, so some level of robustness has to come from the fact that we are sampling trajectories. So let us see what happens. Here is a situation in which we have a good model of the vehicle, and here is one where the model is not that great, and you will see some interesting behaviors; it is actually a funny video. There are two disturbances, and the vehicle is getting pushed around by them. Do we model all of these disturbances? No. But again, the fact that we can sample allows us to be a little bit robust, where other approaches might go in the opposite direction. So it is important to do your homework in terms of system identification, but at some point the model error is going to grow.

OK, so we still have a long way to go here in terms of robustness, and since I am in an aerospace engineering department, where we have many safety-critical systems, it is important to be able to robustify the performance of this algorithm. There are three things we have been doing. One is working with robust sampling-based techniques; I will talk about this on the next slide. The second is more work on the learning side, essentially incorporating online adaptation; the first bullet will give us robustness, the second bullet will give us more performance (right now we do not really do any online adaptation). And the third bullet is going outside the paradigm of white, Gaussian noise, and working with representations that allow us to capture stochasticity beyond the classical diffusion case: jump processes and related processes that we are interested in identifying from data and using within the existing framework.

Here are some preliminary results on tube-based MPC; this is a paper we have submitted, under review at one of the biggest robotics conferences. The idea is as follows. You are here, you sample trajectories, and you have found a control sequence; now suddenly there is a disturbance, the vehicle gets pushed, and the state is such that all the sampled trajectories go outside
of the track. So you have to decide which importance sampler to use. One importance sampler would be the nominal one, because you carry that control sequence over to the next resampling step. So the question is which control sequence you want to use: the nominal one, or the one computed from the disturbed, actual state? We have two versions of the MPPI algorithm running, one from the nominal state and one from the actual state, and we accept the control solution from the actual state for as long as its cost is good. But obviously we need one more controller to push the vehicle back to its nominal trajectory, so there are two levels of MPC optimization: two MPPI controllers running in parallel, for the nominal and the actual state, and then another trajectory-tracking optimization that tries to push the vehicle back to the nominal trajectory. This is called tube-based MPC; here we have to combine sampling techniques with more traditional methods in robotics.

Here is one example: you have a system and you want to go around a circle without deviating. Here is what happens if you apply plain MPPI, the old version; and now you see what happens with the tube-based MPPI controller: the actual vehicle always stays within the bounds. We have tried this on the actual vehicle too, in some very recent experiments, and on this track, which again is a long track, we were able to do, I think, twelve laps.

This research is also funded by the Vertical Lift Research Center of Excellence. Here are two cases in which we want to land this helicopter: the case where there is no noise, with the old version of our algorithm; the old version with noise; and again with noise, but with the tube-based, robust sampling technique. What you will see is that as you go in for a landing, the vehicle gets pushed a lot; here you are going to crash, while here you are actually able to land. This is ongoing research, and I think a lot of interesting research questions will come from it. I have done a lot of work on the reinforcement learning side before I joined Georgia Tech, but I am not going to talk about that right now.

Moving on to the multi-vehicle case. In the multi-vehicle case, you want to be able to control vehicles in a cooperative or in a non-cooperative, competitive fashion, but you want to minimize the exchange of information between the vehicles. One way to do that is by using the idea of best-response dynamics, or iterated best response, which works as follows; it is very simple. If I have to cooperate with another vehicle, I create a copy of the other vehicle, and I assume that the way the other vehicle makes decisions is by using the same optimality principle. So I essentially double the state representation of every vehicle, and in this way I have the opportunity to predict how the other vehicle is about to move. The only thing I need to know is the other vehicle's position.
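A minimal sketch of the iterated best-response idea just described, assuming each vehicle reuses the same trajectory optimizer (here a placeholder `plan()` function, which could be the MPPI solver above); this is an illustration under those assumptions, not the lab's implementation:

```python
import numpy as np

def best_response_plan(my_state, other_pos, plan, horizon=30, rounds=3):
    """Iterated best response with a mental copy of the other vehicle.

    `plan(state, other_traj)` stands in for any trajectory optimizer that
    returns a predicted trajectory given a guess of the other agent's
    trajectory. Only the other vehicle's position is observed; its
    objective is assumed identical to ours (the same optimality principle)."""
    other_state = np.asarray(other_pos)
    # Initialize the copy's trajectory as "stays where it was observed".
    other_traj = np.repeat(other_state[None, :], horizon, axis=0)
    for _ in range(rounds):
        my_traj = plan(my_state, other_traj)      # my best response
        other_traj = plan(other_state, my_traj)   # the copy's assumed response
    return my_traj
```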
I do not have to exchange objectives. So we have done some work here too, and this is all very recent; this work will appear in 2018. Here is a video in which the vehicles have to go around this track at high speeds, and the only information they have to share is the position of the other vehicle; all the computation is decentralized. OK, for the purposes of time I will have to skip this video.

Let me show you something that is more fun: the case where we want to do the task in a non-cooperative fashion and actually race against a human. We apply the same principle here; it is just that one vehicle, the first one, is controlled by the human, and the other one is controlled by our controller. These are preliminary results, in simulation, not experimental results, and we are very much looking forward to improving what we have here in terms of algorithms, theory, and experiments. So the human passes, and obviously there is a lot of crashing; this is actual robotics, so you will see that we crash quite a lot, especially with two vehicles on the same track. For example, in this case this is our vehicle and this is the human, and here they crash. It is very hard to design cost functions to optimize here, because there are a lot of criteria you have to meet, and we believe these ideas of tube-based MPC will fit very nicely into these racing scenarios, because they give us ways to abstract objectives and handle constraints.

Then there is some more work on the simulation side, which I am going to skip: people ask us whether the framework scales, and we have a video here with one hundred forty-four states, where we use the same sampling-based methods and control all the vehicles; this is all centralized, and we think of all the vehicles as one big system. There is some other work here too, but unfortunately I do not have time to talk about it.

Let us talk about the perceptual control case. We do a lot of MPC in my lab, and this is the classical autonomy stack that we have: you have some sort of state estimation, and you do MPC on the system. But since the task is actually repetitive (in some sense you have to repeatedly solve the same problem), and you have generated all of these data, you can use these data to learn a policy. That is one observation. The other is that we want to be able to do these tasks in a GPS-denied environment, where there is no GPS. We have a lot of data from all of these experiments: not only the state estimates but also the observations, cameras and wheel speeds, and we have what the controller spits out in terms of throttle and steering. So what you can do, and this is in collaboration with Byron, is use this data to learn a policy, and because we are dealing with visual input, we use a convolutional neural network, with a couple of million parameters, to learn to map raw images and wheel speeds to throttle and steering. The benefit is that if you can successfully learn this policy, you do not need access to GPS any more, and you also do not need to use the GPU for optimization; you can essentially use the GPU for other purposes.
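As an illustration of the kind of end-to-end policy being described (the actual network in this work is a larger convolutional architecture; the layer sizes and the dummy training batch below are placeholders), an imitation learner mapping images plus wheel speeds to throttle and steering might look like:

```python
import torch
import torch.nn as nn

class DrivingPolicy(nn.Module):
    """Toy end-to-end policy: raw image + wheel speeds -> throttle, steering."""
    def __init__(self):
        super().__init__()
        self.vision = nn.Sequential(            # convolutional feature extractor
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(              # fuse image features + 4 wheel speeds
            nn.Linear(64 + 4, 64), nn.ReLU(),
            nn.Linear(64, 2),                   # outputs: throttle, steering
        )

    def forward(self, image, wheel_speeds):
        return self.head(torch.cat([self.vision(image), wheel_speeds], dim=1))

# Imitation learning: regress the MPC controller's logged actions.
policy = DrivingPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
img = torch.randn(8, 3, 64, 64)                 # dummy batch stands in for logs
speeds, expert_u = torch.randn(8, 4), torch.randn(8, 2)
loss = nn.functional.mse_loss(policy(img, speeds), expert_u)
opt.zero_grad(); loss.backward(); opt.step()
```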
So this replaces state estimation and mapping. What these videos show is that, yes, you can do this, and the vehicle can go around the track. This is essentially the setup we have been using; the experiments to learn this policy take about a day of driving on the track, so it is not easy to get these experiments done on an actual vehicle, training the neural network outdoors.

But people ask the question: does it generalize, or has it just memorized the scene? Here are some plots in an effort to address this question. We have a way to map all the high-dimensional data from the cameras down to a low-dimensional space, and we look at the training and test distributions; this is the case of imitation learning, and this is another version of imitation learning. What you can see is that for wheel speed, the test and training sets lie on top of each other; that is no surprise, because that signal is very low-dimensional. But when it comes to the actual visual input, you see that the training set is not the same as the test set. There is also a way to see this in demonstrations: the vehicle can actually go around the track even if you change the lighting conditions, even close to night. So there is some level of generalization.

The other thing that we have been doing, instead of going completely end to end, is putting some structure into the neural network architectures, motivated by the fact that there is structure in decision making: any decision-making strategy has a cost function, some optimizer, and dynamics. This is another project in which we essentially learn the cost map, just from raw image data, and we were able to perform the same task. So now one of the questions we are trying to address is which architecture is better, or more optimal: what does it mean to compare these two neural network architectures? We think the answer is in terms of three things. One is task performance, how well each of these methods works. The second is robustness and generalization. And the third, which is the most important for us, is the efficiency of the underlying training algorithm; we believe there are a lot of tools, in thermodynamics and non-equilibrium thermodynamics, that you can use to actually compare the efficiency of training on these two architectures. So that is ongoing research.

I think I am almost out of time, so I just want to show this plot. It is motivated by the fact that my background is actually in CS and ECE, but I am in an aerospace engineering department, and people here talk a lot about uncertainty. There are different ways to represent uncertainty, and they come from different communities.
In the aerospace engineering community, robust control was, and still is, very popular, and it is a way to represent uncertainty, especially as sets: set-based, worst-case representations. Then on the machine learning side, more on the computer science side of things, you have the data-driven methods: nonparametric methods, semi-parametric methods, and neural networks. In pretty much my first two years at Georgia Tech we have done work in each one of these blocks. I just want to show you one example, related to the network that I showed you before, the end-to-end policy. In aerospace systems you want to have backup systems; you want backup systems that keep the overall system safe. So the question is: how can we detect whether these policies will be robust, how can we detect when not to follow these policies? That is a very fundamental question for us.

Let me show you a video. This is a video, in simulation, of the vehicle going around the track, and this is the variance that comes out as an output from the neural network; so this is the control that the network spits out, and the variance of that control estimate. What happens is that if you present the network with new data that it has not seen before, you get a spike in the variance of the control policy. That gives us a way to say: the network has seen something it has not seen before, so I should fall back to an expert, and this expert, for us, is an MPC controller that is fully observable. The question then is: how big should this variance be before we switch, how do you find that threshold? (A minimal sketch of this fallback logic is given below.) There is a control-theoretic question here and there is a reinforcement learning question here. You can attack this question through reinforcement learning, by learning the threshold; there is also the control-theoretic point of view, though I doubt anyone is going to come up with a general theory that solves this problem for any system and any task. We have done something along these lines, and this is a paper that has gone out for review. So let me skip toward the end; there is a lot of work that we have done on uncertainty estimation.

I think I have another five minutes or so. So, what is next? I know some people in the audience are really eager to see this slide of what is next. Once you understand this connection, you can ask many questions. One question is: are there any information-theoretic problems that have no dynamic programming representation, and vice versa, is there any dynamic programming representation in stochastic control for which there is no information-theoretic equivalent? Those are two different questions. Another question is: can I take this area of overlap between the two disciplines and see how far I can push it, are there any other systems for which this connection is true? And the answer is yes: there are cases in which you can go beyond SDEs, beyond just stochastic differential equations, and still show that classical optimal control theory and information theory collapse to the same solution, for a more general class of stochastic systems.
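Here is the minimal sketch promised above of the variance-triggered fallback. It is purely illustrative: the threshold, the uncertainty-aware policy interface, and the MPC expert are all placeholders, not the interfaces used in the paper under review.

```python
def safe_control(obs, state, policy, mpc_expert, var_threshold=0.05):
    """Run the learned policy, but fall back to the MPC expert when the
    policy's own predictive variance spikes on unfamiliar inputs."""
    u, var = policy(obs)              # network outputs control and its variance
    if (var > var_threshold).any():   # novelty detected: don't trust the network
        return mpc_expert(state)      # fully observable expert takes over
    return u
```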
Coming back to the generalization question, just as an example: here you have a jump diffusion, and you want to control the jump diffusion. Now I do not have only the Wiener noise, I also have Poisson noise, and the noise can be very general, in the sense that there is not only a spike: the amplitude of that spike can itself be a random variable, so how big the spike will be is random too. These are called doubly stochastic processes. You can actually do the same thing; the only thing you have to worry about is how the Radon-Nikodym derivative is defined between the uncontrolled and the controlled dynamics, but once you have that expression, you can plug it back into the KL minimization and recover the control. This is the case where we treat the jumps as something bad: you just want to reject the disturbances. But there is another point of view. What if you want to control the jumps, say the rate of the process is one of your control parameters? Say you are interested in neuromorphic forms of stochastic control: in these cases you talk about spikes, and the thing you can control is the firing rate. The same principle applies: you can use the same information-theoretic representation and derive controls that you can actually apply on a real system.

Then, more on the dynamic programming side: in the last few years we have done some work with Panos on the idea of the nonlinear Feynman-Kac lemma. This is an idea that goes in the direction of dynamic programming for cases where the exponential transformation does not work. There are interesting problems in this area that cannot easily be captured by the information-theoretic formulation, for example minimum-fuel problems, which are typical in aerospace applications of optimal control, simply because you do not want to use a lot of thrust. And with that I will stop. Thank you; I am happy to take any questions. Yes?

[Audience question about uncertainty in neural networks.]

So, there are two forms of uncertainty in neural networks, as far as I know: one is aleatoric, and the other is epistemic; one is about inherent noise, the other about lack of knowledge of the environment. In theory, I think you can get both, but the question is whether the computation can be performed in real time. That is something we are still working on; I have no answer to that yet, since the work we have been doing on these ideas is only the last six or seven months. It would be great if you could actually get both, and you may be able to decouple them if you have a loss function that looks like a maximum-likelihood loss function, in which you also have the sigma that you try to estimate, a prediction of the variance. But I do not have something where I can say: here is an algorithm, it scales, and it works for many, many cases. People ask me about all of the above; we are new to this business.

[Another audience question.]

That is a hard question. Great, thank you.