You may want to put it on a loop. How do we make the slide show appear on the screen without the share card on there? Oh, yeah. So on the touch panel, you'll need to turn on the projectors. They're on. But we see the Zoom meeting. Oh, well, the slide show ended. You'll just have to restart the slide show. You probably need to put it on loop, so it just keeps going. Yep. Okay. Is that under timing? Maybe under set up slide show. Oh, here it is. Yeah. Loop. Okay. Hold on. All right. Then when you start it again, it should just keep going. Then I can minimize this, right? Yeah, I see the slides. Looks good. Do you see the box over the slides, or no? No, I just see full-screen slides. Okay, perfect. And they're gonna see it. Thanks, Katie. No problem. Okay, all right. Hello everyone. Welcome to the Irene Robotics Seminar. It's my great pleasure, and I'm very excited and delighted, to introduce our speaker, Professor Ufuk Topcu. Dr. Topcu is currently a professor in the Department of Aerospace Engineering and Engineering Mechanics at UT Austin. He also holds the Tex Moncrief Professorship in Computational Engineering and Sciences. He received his PhD from UC Berkeley, and he is currently the director of the Center for Autonomy at UT Austin, which has several excellent groups doing a lot of cool research on autonomy, learning, control, and robotics. Ufuk's research contributions have been well recognized by a series of very competitive awards, including the NSF CAREER Award and the Air Force Young Investigator Award.
He is also an IEEE Fellow, and he leads several very large grants, including a MURI project, an NSF CPS Frontier project, and the NASA ULI project. Research-wise, Ufuk has a really long list of fantastic research directions, with a large focus on theoretical and algorithmic aspects of design and verification for autonomous systems, and his group has focused a lot on the intersection of formal methods, reinforcement learning, and control theory. And from a very personal angle: I still remember, when I was a PhD student at UT Austin, I took Ufuk's course; I think it was called formal methods for verification and synthesis. I personally learned a lot, and that course was really the gateway for me to enter the formal methods area. So I really appreciate the many exciting directions his lab has been exploring. With that, I'll hand over the stage to Ufuk. Thanks so much for visiting us; it's really great to have you here today. Let's welcome our speaker again. Yeah, thank you for having me. And yes, I was told the introduction would be short. I'm glad it was short; I can't imagine what the long version would be like. And at the end, you made my day. Thank you. All right, how are you? Was lunch good? Ready to sleep? All right, let's get to it. Isn't this too loud? I can hear myself. Is the sound fine? Okay, good. So, yeah, he said everything; I'm not going to introduce myself much more. Let's get to it. So, I have a hypothesis in this talk, but you cannot see that I have a hypothesis, because the screen is covered. Well, let's see how I'm going to get out of that. Hide... hide... hide floating meeting controls. Yes. Thank you. Okay, so there's a hypothesis in this talk. Learning is everywhere. Even if you're allergic to learning, you cannot avoid it these days.
So the point of this talk is that, if we are to use data-driven learning methods, we should couple them with the knowledge that we already have. There's no point in being oblivious to what we already know; rediscovering everything from scratch, from data, every single time we try to do something is probably not the right way to go. And I think the payoff is increased data efficiency and better generalization, in the sense that you may train for certain purposes in certain configurations and be able to use the result in other places. And I think we can make a case for adding tons of other adjectives to that list. But the point is that using data and knowledge together is going to create something nicer than each individual piece would be able to achieve by itself. I'm going to try to show you a couple of examples where I think this way of thinking has done quite a bit of good for us. The left one is going to be a story about how, if you couple it with proper knowledge, you can learn autonomous drifting on a high-performance car from only 3 minutes of a professional driver's driving data. On the right-hand side, the story is going to be that if you again take advantage of structural knowledge that you may have about the system, you may be able to get your autonomous robot to adapt to changes in the environment very quickly, and be able to redeploy your robot relatively rapidly after you realize that the working environment has deviated from what you had trained for. These are the things I will be trying to drive through. All right. What is going on? We have technical difficulties. Did it go? I feel like it skipped a slide. I clicked it so many times. Okay. So the way that we approach this line of thinking is... all right, I don't know what I'm going to do. There's quite a bit of delay here. Okay, let me just open all these things.
So, I believe that the interesting problems we need to solve for autonomous systems are not within the conventional disciplinary boundaries. They are not just controls problems. They are not just learning problems. They are not just formal methods problems. And I will have this trio throughout the talk. It is not that I believe everything can be solved by the combination of these three fields; they are just close to my heart, and we have spent quite a bit of time on them in my group. The point is that the problems are no longer only controls problems or only learning problems; the interesting problems are at the interfaces. And therefore, we need to develop hybrid algorithms to tackle the challenges that we face in autonomy. And I'm going to show you examples of how such hybrid thinking gives evidence for the validity of my hypothesis that data and knowledge should work together. Before going any further, I also want to clarify that, of course, this hypothesis is not only ours. It is not new. It is relatively widely investigated, under different names, some of which I pulled together here. This is a relatively long and old list: you may see papers or methods on physics-informed machine learning, inductive biases, contextual inference, or neurosymbolic methods for learning. And the list can go on; it has been investigated in many different ways under many different titles, and I'm going to go through my version of it. And this slide shows the flow of the talk. I'll probably not be able to get into all four pieces. But the point is that data and knowledge working together can help in many different ways. Depending on the application, the underlying system, the requirements that we may have, the way that we can access, process, and use data for our purposes, and the way that knowledge is represented for certain purposes, they may vary, right?
It is not just one way of representing the underlying problem. Therefore, I believe that we also need to look into a spectrum of models, knowledge representations, specifications, data, et cetera. And as we go through these things, they will evolve. The dynamical systems aspects will be more dominant early on; as we go on, contextual knowledge will come in; and further on, structural knowledge and interactions between robots, or knowledge arising from the underlying workspace, will be more prominent. And I'm going to try to go through these things. Four is usually too many for one talk; I may skip one of them at some point. But let me get into the first one. Here, the motivation, quite a number of years ago now, was: if you have, let's say, a fixed-wing unmanned aircraft, and it somehow goes through structural damage or loses some core functionality that is critical for completing its mission, how can it adapt to this change and survive the mission, or maybe be able to deliver most of the mission, but not all? And going a little bit further: if such a dramatic change happens, you may not be able to complete that mission anymore. You may have to make the decision that, okay, I cannot do it anymore, I have to abort, I'm going to crash. When do I stop trying to survive, and go and crash in a way that is going to cause the least amount of damage? In any case, also note that those are decisions that need to be made in one run of the system. This aircraft is going to fly, and everything has to be done there. It is a very data-scarce setting. And the underlying dynamics of the aircraft have also changed; it is very likely not anything that anyone had modeled for. So we need to figure these things out on the fly. We see this as a learning problem, but it is very different from the more modern settings of learning problems. Anyway, I'm going to start with that. Let me get to it.
It is going to mainly build on these papers that I have here. One of them is "learning to reach, swim, walk, and fly in one trial." It's not that one robot is going to do all these things in one trial; they were different cases. And there is another follow-up paper on that. So the setting is relatively straightforward. We have these dynamics, let's say after the change, represented as a differential equation. And f and g here are not known. It's not that critical, but there's some dependence, of course, on the input that we have there. It can be a polynomial dependence; it doesn't change too much. So our goal is to figure out what f and g are after this change. There are a couple of questions that we will try to answer. We want to figure out whether anything bad can happen, and we will do that by computing an approximation of the reachable set of the system: from where we are, figure out a bound on where the system might end up, and then reason about whether your best prediction of where the system might end up intersects with anything bad or not. Is there a likelihood that something bad might happen to the system? And the next question is, of course, how can we choose our control inputs to avoid such bad things from happening? What is at our disposal in this problem? It is the data that streams in, the data that we see along the flight of this aircraft. And if we were to do this only with the couple of data points that we will see during the flight, it is very likely that we cannot do much, right? Therefore, it becomes very critical that we try to utilize some side information. And the side information may be in the form of Lipschitz bounds, some bounds on the local gradients, et cetera. We may still know certain parts of the underlying dynamics, or we may still know that this is an object in the air, and it is still subject to the laws of physics. We may not know the mass. We may not know the center of gravity.
We may not know the thrust capabilities of this aircraft anymore, but it is still going to be subject to the laws of physics. Can we take advantage of that? And here, let me just emphasize: it is not that we know the underlying dynamics; we just know these bits and pieces. And we will see examples where actually even the dumbest form of knowledge that you might try to incorporate into data-driven learning helps tremendously in improving the results. I already discussed this: I want to have an over-approximation of the reachable set and on-the-fly control inputs. Why do we like this? It is constructed in such a way that if we know a little bit more, if we have access to a little bit more knowledge, the results will improve. This is a representation of the reachable set, let's say, and we will only be able to get upper bounds on it. And this shows a couple of different upper bounds for these unicycle dynamics. If we have only a loose Lipschitz bound, we would get this, whatever this color is; I'm going to call it pink. But if we can actually incorporate a little bit more information, if we know the loose Lipschitz bound and we also know that there is some decoupling in the underlying dynamics, we may be able to get this tighter, smaller set of future possibilities. And of course, shrinking that set reduces the possibility that we call a safe maneuver unsafe. And this story goes a little bit further: if we have even more knowledge, maybe tighter Lipschitz bounds, we can improve the results further. The second reason is that this is going to be an almost trivial algorithm, but it is going to run fast, and we will be able to give bounds on its suboptimality in some specific sense. And we will be able to get practically useful bounds on the necessary computation. And this last bit, I think, is something we do not commonly have for learning algorithms: such guarantees.
And I think, at least in this case, it is very critical that we know how quickly this algorithm is going to terminate. We can bound the number of steps that we need to take; we can bound the number of operations that we have to execute. Okay, so how is this going to work? The inspiration comes from some examples of survival in real life. This was an aircraft, more than two decades ago now, that goes through the damage shown in that image. And the pilots, after the fact, after they had landed the aircraft, described it as: essentially, for 10 minutes, they did some small maneuvers, got a feel for how this new object could be flown, and based on that, they were able to land it. So what are these small maneuvers? That is what we try to figure out. And this is not only an aircraft example. Later on, it occurred to me: if you know how to ride a bike and you now get a wobbly bike, it is not that you learn entirely from scratch, right? You would just get a feel for it a little bit, and then you can probably adjust. It is like that. A collaborator of mine described these little maneuvers as wiggles. So the system is going to wiggle a little bit and then try to figure something out, and whatever it figures out is going to be represented as so-called differential inclusions. Without getting into any mathematical representation of it: the vector field of a dynamical system is just a vector in some tangent space, right? A differential inclusion is a cone around that unknown vector. So we will represent them as cones around the unknown vector fields. And initially, at every sampling time perhaps, the system applies, for simplicity, coordinate-aligned control inputs, and out of that we will not get the exact vector field, but we will try to build something around it. As for notation, if it becomes relevant moving forward: the unknown vector fields are in a normal typeface.
And the boldface G and F will correspond to the differential inclusion that is supposed to include the unknown vector fields. So this is the sequence of things that will happen to build these differential inclusions, keep track of them, and refine them. Let's say we have some data points so far and over-approximations of the unknown f and g so far. Based on that, we can construct a differential inclusion. Then, as a new data point arrives, it provides a little bit more information, and we can shrink the inclusion. You can see that a data point just comes in and puts a cut, or a few cuts, in the space where we keep track of these sets. And then, based on that differential inclusion, we can compute, and there is a closed-form expression for it up to some error bound, the over-approximation of the reachable sets. I have these expressions here; I'm not going to get into them. Essentially, they keep track of the underlying sets that we need: the sets in the tangent space for the differential inclusion, and then the reachable sets, which are over the state space of the system. The point I want to make is: what are all these expressions? They are set-based operations in some almost primitive representation of sets. So they will be easy to keep track of; we can do the computations relatively easily, in most cases actually in closed form; and it will also be possible to incorporate a rich set of side knowledge into these set-based operations. Those are the messages I want you to keep from this, rather than how these things evolve in detail. So let's remember that. Okay, so this gives us a way, at an instant where we have some data that we have seen and some side knowledge, to compute an over-approximation of the reachable set. Now, let's get into how to make decisions. So we are here.
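To make the set-based bookkeeping concrete: in one dimension, each sample of an unknown field f, together with a Lipschitz bound L, confines f at a query point to an interval, and every new sample adds a cut that can only shrink the intersection. This is a minimal illustrative sketch, not the algorithm from the papers; the function name and the sine test field are made up.

```python
import math

def inclusion_interval(x_query, samples, L):
    """Interval guaranteed to contain f(x_query) for any L-Lipschitz f
    passing through the given (x_i, f(x_i)) samples.

    Each sample confines the value to [f_i - L|x - x_i|, f_i + L|x - x_i|];
    we intersect all of these cuts.
    """
    lo = max(f - L * abs(x_query - x) for x, f in samples)
    hi = min(f + L * abs(x_query - x) for x, f in samples)
    return lo, hi

# Stand-in unknown field: f = sin, which is 1-Lipschitz.
samples = [(0.0, math.sin(0.0)), (2.0, math.sin(2.0))]
lo, hi = inclusion_interval(1.0, samples, L=1.0)    # loose inclusion

samples.append((0.5, math.sin(0.5)))                # a new data point arrives
lo2, hi2 = inclusion_interval(1.0, samples, L=1.0)  # the cut shrinks the set
```

The true value sin(1.0) always stays inside the interval, and the interval after the third sample is no wider than before: the same monotone-refinement behavior as the differential inclusions above, just without the tangent-space and reachable-set machinery.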
If we take this over-approximation of the reachable set and put it into a decision-making loop, we will need to solve some optimization problems for reasoning over some steps into the future. And those problems will not be anything nice: they will be non-convex optimization problems that we cannot solve easily. There are a couple of steps of relaxation here. First, we will try to linearize the underlying representation with respect to the control inputs. Even then, the resulting problem is still not convex, so we will get into a sequential convexification, solve the resulting convex problems, and get into a loop, et cetera. That's going to issue the control commands, and we will come back; we will go in a receding-horizon fashion, get a little bit more data, and go around the loop. Of course, the first part we can mostly keep track of in closed-form expressions; this part gets into runtime optimization. Nevertheless, all of this can actually be implemented almost in real time, or in real time for the speed of the dynamics that we have tried these on, and I'm going to show some examples. Before those examples: as I said, can we compute these fast enough? And yes, we have theoretical bounds on the computation time, and it grows linearly with the number of data points we have seen and quadratically with the number of inputs and states in the underlying representation. And this figure, for these unicycle dynamics, shows a couple of things. The cyan one is the implementation of what I just described. The blue at the bottom is: if I had known the underlying dynamics, what is the best I could do? The axes here: the unicycle starts from somewhere and starts making decisions; the goal is to reach the star at the center.
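As a caricature of the receding-horizon loop just described (plan over a short horizon, apply only the first input, replan with fresh information), here is a toy single-integrator version; the sequential convexification is replaced by a brute-force grid over constant inputs, purely for illustration, and none of the names come from the papers.

```python
import numpy as np

def receding_horizon(x0, goal, horizon=5, steps=30, dt=0.1, u_max=1.0):
    """Toy receding-horizon regulation of dx/dt = u toward a goal.

    Each iteration scores a small grid of constant inputs by the
    predicted terminal distance to the goal, applies only the first
    input of the winner, then replans from the new state.
    """
    x = float(x0)
    candidates = np.linspace(-u_max, u_max, 21)
    for _ in range(steps):
        costs = [abs(x + u * dt * horizon - goal) for u in candidates]
        x += candidates[int(np.argmin(costs))] * dt  # apply first input only
    return x

x_final = receding_horizon(0.0, 1.0)
```

Even this crude version shows the essential pattern: no global plan is ever committed to, yet the repeated short-horizon decisions steer the state close to the target.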
This one shows how many steps, how much time passes, until we reach the star, and this one is how much time is spent on computations. For computations, lower is nicer. And if we had known the dynamics, of course, we could solve this problem extremely quickly. And the others are some of the competing methods for this. As you can see, this one is actually much lower in computation time, and it reaches the star quicker than the other data-driven techniques that do not necessarily take advantage of the underlying dynamics. So this is just a proof of concept, and I think we can now go into a couple of other examples. This one is for quadrotor dynamics. Again, we pretend that we don't have a model for this quadrotor. The side knowledge here is that we have bounds on the Lipschitz constants for the underlying vector field, and we also know that speed is the derivative of position. You might say that's not knowledge. Nevertheless, trying to encode it into the learning algorithm explicitly helps quite a bit. I don't know exactly how it does; it probably provides some regularization for the search over the underlying representations. This one is trivial, but we have also seen a similar effect in many other forms of learning dynamics and trying to make decisions from them: if you can encode such structural information into your representations, it actually helps quite a bit. And here what we see is that, again, red is what we would have obtained using optimal control if we had known the dynamics, and green is the implementation of this algorithm, relatively closely mimicking the optimal control inputs and the behavior that we would get out of them. And there is a bound on the suboptimality that we can obtain, which depends on the Lipschitz constants that we have and the width of the over-approximation of the reachable set.
And that is where knowing more actually helps, because if we can incorporate more knowledge into building the over-approximation of the reachable set, then that width gets smaller and the bound on the suboptimality gets smaller. [Audience question, partially inaudible, about what the red, optimal trajectory is and how it was obtained.] The optimal is red, and you are the only one who has asked me this after eight years, and I had not noticed that. I'm going to guess one thing, which I would need to check. I say optimal control, but I'm not specifying optimal with respect to what: optimizing fuel usage, or time, or what? We may not be directly comparing apples with apples in that plot. The red one may be regulating altitude. You know, I don't know, so let me not make it up. Do you have a question there? Very fair question. The thing that makes me hesitate is that, of course, there is going to be a very clear meaning of optimization in the optimal control if we go and look into it. What green is doing is a little bit unclear, because it doesn't have access to the dynamics; it looks into the future and... yeah. This would be very suspicious, Panos, if the objective for the optimal control was reaching the altitude quickly, right? Let me just look into that. Okay, a couple more examples here. These are a set of examples from MuJoCo, and here we pretend that we know the inertia matrix. The thinking is that that might be a structural property that is easier to obtain, while how the robots interact with the world around them might be harder to model. [Audience: But now you have a different system; could that system, just by chance, be better than the original system?] It's a possibility. There is one more thing that I cannot answer right now: what is the underlying model on which these two sets of decisions are implemented and against which the performance is checked? I don't have that knowledge. Yes, I understand what you're saying.
For example, if we had a nominal model, and the system is different, and these methods fed decisions into it, that could be possible. If the nominal model was treated as the ground truth here, then it would be suspicious, right? Okay, so the side information is the inertia matrix, some Lipschitz bounds, and that these systems are subject to the laws of friction. And here are a couple of other comparisons. A little bit of a disclaimer: these figures do not even compare apples with apples. Here is what is happening. The red one is just implementing what I told you, pretending that we don't know the dynamics. The green one shows the execution of a strategy that comes from reinforcement learning: tons of interactions with this environment, until it converges or you stop at some point, and you have a policy to implement. So you take that policy, implement it, and keep track of how much reward you actually accumulate in this environment. The red one, in this example, actually accumulates quite a bit more; this probably shows that the reinforcement learning algorithm could not figure out how to act in this case. For this example, they are roughly comparable, and for this example, the reinforcement learning algorithm actually does better than what we have. But keep in mind that what I call ours is, again, pretending that it doesn't know a lot of things, and as it goes, during this one execution, it figures out whatever it can figure out and accumulates this reward. So even if we had been beaten by the reinforcement learning algorithm's outcome in every case by some margin, I would be happy with it. But this is not enough evidence to take these things and start flying aircraft with them. It is a sign that it is not insane, okay? There is some sanity check. And even this aircraft example would not be enough evidence to fly an aircraft with.
But here's what's happening. It starts head down, with a decent speed, head down. And we pretend that we don't know the dynamics. Again, Lipschitz bounds and some rigid-body dynamics are known; we encode them, and the rest we try to figure out. The point is that this actually indeed figures out how to just not crash and have level flight. Not great flight, but level flight, at least. And this comes in a packaged simulator; we just implemented the LQR baseline that they have in that simulator. The point of this bottom line is: are we doing anything? And the point is that, yes, it seems like we are doing something, because if we did not, at least with this other baseline controller, the aircraft actually crashes. It cannot survive. Okay, so this is the end of this first part, and I'm going to now transition to the second part. The first part is: within one run, what can we extract out of the system? In the second piece, we will have a chance to actually do more than one run. It is not as severe as the first one, but we still want to reduce the amount of training time we spend and be able to do something with that. So we will now sprinkle some neural networks into this. I'm going to start from the claim that one way of using learning is building a one-step predictor of the states and using a neural network to represent that one-step predictor. If that's not your favorite representation, that's fine; but I'm going to try to make the case that, if we are to use neural networks to represent the evolution of a dynamical system, we may be able to do better than just this. Already, I can imagine some problems with this kind of representation, because whatever you learn as a one-step predictor is going to be obsolete as soon as your sampling time changes, right? Because you try to predict the next state with some particular sampling time in mind.
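To make the sampling-time issue concrete: a one-step predictor bakes a particular dt into its weights, whereas a learned vector field can be integrated out to any horizon. A minimal sketch, with an untrained toy network standing in for the learned field; the names and sizes are illustrative, not from the papers.

```python
import numpy as np

rng = np.random.default_rng(0)
# Tiny MLP standing in for a learned vector field f_theta(x) on R^2.
W1, b1 = 0.5 * rng.normal(size=(8, 2)), np.zeros(8)
W2, b2 = 0.5 * rng.normal(size=(2, 8)), np.zeros(2)

def f_theta(x):
    return W2 @ np.tanh(W1 @ x + b1) + b2

def predict(x0, t, dt=1e-3):
    """State at time t under dx/dt = f_theta(x), via explicit Euler.

    Unlike a one-step predictor trained at a fixed sampling time,
    the same model answers queries at any t.
    """
    x = np.array(x0, dtype=float)
    for _ in range(int(round(t / dt))):
        x = x + dt * f_theta(x)
    return x

a = predict([1.0, -1.0], 0.2)                    # one query to t = 0.2
b = predict(predict([1.0, -1.0], 0.1), 0.1)      # two queries of t = 0.1
```

Because the integrated flow composes, predicting to 0.2 in one call matches predicting to 0.1 twice; a practical implementation would use an adaptive solver and backpropagate through it, as the neural-ODE literature does.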
These so-called neural ordinary differential equations emerged almost a decade ago now; time flies. Instead of representing the one-step predictor as a neural network, they represent the underlying vector field as a neural network. And out of that, you can of course integrate the differential equation, and you can get your state prediction at any time that you want. All of this is amenable to backpropagation, and you can train these things relatively efficiently, at least in terms of the computation that you have to spend. So what we try to do is: if we know a little bit more about the underlying dynamics, can we take advantage of that? There are two things that will come in: how physics would inform the structure of the underlying representation, and how it might introduce additional constraints that we can take advantage of. The setting here is that we know the structure of the underlying dynamics, but there are parts of it that we don't know. These g's, as functions, we don't know, but we may know how to combine them into the overall dynamics, right? And then the structure is: instead of having one big neural network here, can we have smaller ones and combine them properly to create the vector field? The next piece is the physics-informed constraints. We may not know how the system is going to interact with the outside world, as an example. But if there is contact, there are some, even trivial, things that you will know about that interaction. You will know that whenever there is contact, the magnitude of the contact force should be nonzero; because of the underlying friction, you may know that the components must satisfy certain relations. So can we combine those pieces of information into constraints? The g's, the unknown things, appear in these constraints, so they act as constraints on the unknowns. Yes, please.
Okay, the first one: I think the first one does not matter much; you will need to pick some states, as you would do in dynamical modeling, and then you can use them directly. [Audience: The reason I ask is that there has been a lot of work recently on choosing the right kind of representation for rotations for learning.] Yes, I would agree that that could make a huge difference. In this work, there is no such deep thinking going into it; we just use whatever you would come up with after thinking about these problems for 5 minutes. The second question is actually extremely interesting. Here is the issue: you now have an object that you need to integrate, right? And I think that is your second question: stiffness. These are weird objects that existing ODE solvers had probably not been built with in mind. These students, whose work I am presenting, had follow-up work on how to integrate the resulting representations more efficiently. The first implementation was slow, and they realized that it was not good enough for runtime implementation, so they had to do follow-up work on accelerating the integration of these weird vector fields. Okay, so these constraints will come in. Again, these constraints are on things that we don't know, and we need to incorporate them into the learning process. And this picture tries to explain that. The blue points are real data points that we collected, for example while this robot was working, that you would use in learning. The red ones are made-up data points. The blue ones you can treat as labeled data points: you ran the system, and you labeled them. The red ones are just sampled from the state space. We don't know how the robot is going to behave at those configurations, at those data points. But we know that they will satisfy these constraints, all right? So we cannot label those points, but we know something about them.
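One way to fold what we know at the red points into training is a penalty term: fit the labeled blue points, and charge only for constraint violations at the sampled red ones. This is a minimal sketch under my own naming; the toy nonnegativity constraint and all identifiers are illustrative, not from the papers.

```python
import numpy as np

def physics_informed_loss(pred, x_lab, y_lab, x_unlab, constraint, weight=10.0):
    """Data-fit at labeled points plus a penalty for constraint violations.

    `constraint(x, y) <= 0` encodes the physics knowledge; negative values
    cost nothing, only positive values (violations) are penalized.
    """
    fit = np.mean((pred(x_lab) - y_lab) ** 2)   # blue, labeled points
    g = constraint(x_unlab, pred(x_unlab))      # red, unlabeled points
    penalty = np.mean(np.maximum(g, 0.0) ** 2)  # one-sided hinge penalty
    return fit + weight * penalty

# Toy check: a model that fits the labels exactly but violates a
# nonnegativity constraint y >= 0 at one unlabeled point.
pred = lambda x: 2.0 * x
loss = physics_informed_loss(
    pred,
    x_lab=np.array([1.0, 2.0]), y_lab=np.array([2.0, 4.0]),
    x_unlab=np.array([1.0, -1.0]),
    constraint=lambda x, y: -y,   # y >= 0  rewritten as  -y <= 0
)
```

Both terms are differentiable almost everywhere, which is what makes this penalized formulation compatible with off-the-shelf backpropagation.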
And essentially the training is going to be: behave like these blue points at the blue points, and satisfy the constraints at the red points, through some penalization. As an optimization person, I find this an extremely unsatisfying way of doing it, but we are at the mercy of backpropagation, and penalization is what lets us use backpropagation as it is implemented in many packages. That was the reason we used penalization. It would have been so much nicer if we had a way to directly encode such constraints into the training of these neural networks; if we could do it, life would be so much better for many other reasons as well. [Audience: At this point, how negative do you let the constraint get? Because it's free on the negative side, right? So do you just penalize when it crosses?] Yeah, yeah, we penalize when the value is on the wrong side of zero. You penalize the violation of the constraint. How negative it is doesn't matter; it just cannot be larger than zero, right? Okay. So, in one way, this can probably be interpreted as a semi-supervised version of data-driven modeling. And again, backpropagation is there. So here is one experiment, an implementation of this. These two students, one of them is at RPI now, and the other one is going to British Columbia. This guy is pretty good with anything that moves. He is flying the drone, and he is going to do a couple of runs like this, with the landing; he claims that he did that in 3 minutes. And of course, a lot of people here are roboticists; you would, like me, not trust that. It is probably 3 minutes of data gathered after trying for 30 days. But anyway, at least the next 15 minutes are going to use the data that comes only from these 3 minutes of flight, and then they will go and run their algorithm.
A couple of points to be made while they are running their algorithm: the training data has some bounds on the speed of the drone and some bounds on certain angles of the drone. That's going to become important, because we will see in a bit that things still work if you go beyond the training-data regime, with no theoretical or analytical understanding of why, or of how far it can go, but we will see it in the data. Then they use the model in an MPC-like fashion. I'm not talking about that paper here, but the actual implementation required thinking more carefully about the underlying uncertainties, noise, et cetera, and they had to extend this work to stochastic differential equations instead of just ODEs. But after 15 minutes of training, they just create this environment. And the aircraft, by the way, knows this environment; it's not that it figures things out. That piece is not there; dynamically reacting to changes in the world is not there. And indeed, these maneuvers actually required speeds almost twice the maximum speed that the aircraft saw during training. So this is 3 minutes of data, and then they implement these things. The student who is good with moving objects went to Toyota Research Institute, and they introduced him to this problem: they want to do autonomous drifting, but tire modeling is a bottleneck, because drifting requires maneuvers at the limits of stability, right? It is hard to model, and it may require quite a bit of data. So he thought combining data with physics-based knowledge might be a useful way to do it, and he adapted his algorithm. Here, this is a professional driver collecting, again, 3 minutes of data. It's not that these algorithms work only with 3 minutes of data; it's just that three is a good number to talk about. So he drives that.
Again, he's going to take that and relatively quickly run it on site. And then they were able to put this on the vehicle, and the vehicle was able to deliver some relatively interesting motions. This is that vehicle they are inside, and this drifting, again, is done autonomously. And it doesn't require a ton of training data, because they were able to build in whatever they could rely on as existing knowledge. Okay, so this is the end of the second part, and the end of the 46th minute of my talk. What is the convention here? Do you guys go the full 50 minutes? Wait a second. 30 minutes? 1:30? Oh, 1:30. Okay. I was going to say maybe I should add a few more slides. Nevertheless, I am going to skip this third part; recently I have spoken about it quite a bit. Instead, I'm going to go to the fourth part. Let me tell you what I am skipping. In this next one we go a little higher in the abstraction. We want an autonomous vehicle to navigate in an urban-like environment with quite a bit of structure. Of course, we don't have too much reason to reason about the underlying physics here; not that it is unimportant, but we can abstract it out a little and start thinking about how to get an autonomous vehicle to follow, for example, the rules of a driver's handbook, so high-level reasoning. And I'm going to call the rules in a driver's handbook the contextual knowledge that we have. If we are trying to learn to drive in an urban environment, it would be silly to try to learn everything, all the rules that the car should be following, just from looking at driving examples, right? You can just go in and encode those things into the driving and use them as a guide. Then we can focus on things that are harder to model: driver preferences, the goals of the driver, and so on. So that is that piece, and it's going to require, of course, a different way of representing knowledge.
And that is where more high-level representations, finite-state automata or temporal-logic-like representations, become relevant, and the question is how we can incorporate them into a reinforcement learning algorithm: to reduce, again, the learning effort, to be able to generalize, to comply with the requirements that we have already encoded into the learning, and to reduce the possibility of violating the rules. Okay? That's what I'm skipping. If anybody is interested in that, we can talk about it later. I'm going to go to the fourth one. I guess the fourth one is here. The fourth one: when multiple robots work together, of course we have to think about all of them together, right? But in many cases the interactions between these robots actually happen only for special purposes, probably sparsely, every once in a while. It is not that ten robots working together always have to hold hands and move in synchrony, right? For a good amount of time they can do things separately, as long as they know how to synchronize. That is one problem we have been looking into, but I'm going to talk about another problem, where the structure of the underlying workspace gives us a chance to decompose the learning task, and we can take advantage of that structure. Okay? So I'm going to talk about compositionality, and the underlying learning is going to be in the form of reinforcement learning. I already gave you why I want to talk about it, so let me just move into the setting. Hopefully soon I'm going to move to something that is not grid-world-like, but let's use this as a running example. So what is unknown here? I show you this grid-world picture, but if I put a dot as a robot on it, it is going to be moving in the underlying continuous state space. Okay? And how it should do that is not known.
But in addition to that, we have constraints on it: if the robot starts from this point, this S_I at the top, and the goal is reaching this target region here while avoiding these orange regions, they are bad places, I want that to be accomplished with probability one minus delta, for some given delta. And the probability is induced by the underlying randomness of the motion in this continuous state space. It's not clear from the picture alone, so I said it out loud. Okay, good. But then, even in this case, I can think about this maneuver, this behavior, not just as a robot starting from here and trying to come here; I can start decomposing it. You know, let's get this robot out of this room. If I ever enter this room through this point, I want to be able to come to this other point with some probability. We can think about patching together behaviors in different rooms. This actually allows us to decompose the learning task: instead of learning over the entire state space, we can start learning in localized regions of the state space. So I'm going to define a sub-task, or sub-task specification: whenever I come to an entry condition, I want to be able to reach an exit condition within some time, with probability p. This p, we don't know what it is. Okay? I know that overall I want probability one minus delta, but these p's I don't know. Then there are a bunch of questions we can ask. How can we decompose a task specification into sub-task specifications? How can each subsystem learn with these probabilities in mind, et cetera, et cetera? So a couple of years ago, we said let's look at this in a hierarchical way. The high level is going to keep track of these rooms, which room I am going to. It is also going to keep track of the probability with which I want this robot, whenever it comes to a room, to actually leave that room safely. Okay?
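A very reduced way to see how the per-room probabilities relate to the one-minus-delta global requirement: if, purely for illustration, the rooms were traversed in a fixed sequence and each room were left safely independently with probability p_i, the chained plan would succeed with the product of the p_i. The actual formulation in the talk uses a parametric MDP, not this simplification; the function name below is made up.

```python
from math import prod

# Toy consistency check (illustrative only): for a fixed sequence of rooms
# with independent per-room success probabilities p[i], the chained plan
# succeeds with probability prod(p), which must meet the 1 - delta target.

def globally_consistent(p, delta):
    """Do the sub-task probabilities certify the 1 - delta global spec?"""
    return prod(p) >= 1.0 - delta
```

For example, three rooms at 0.99, 0.98, and 0.99 give about 0.96 overall, which certifies a delta of 0.05 but not a delta of 0.01. This is exactly the kind of slack that the negotiation step described next can redistribute between rooms.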
But that probability is a design variable here. The overall requirement I have is just: accomplish this task with probability one minus delta. What I do in each room, how stringent a requirement I put in that room, is still a design choice. Those design choices, the probabilities for the sub-tasks, are left to us, and I want to make them in such a way that when we implement all these pieces, we satisfy the global requirement. Okay? So that is captured as this so-called, what is it, parametric Markov decision process. The states and transitions are rooms and transitions between rooms, and a parametric Markov decision process is, essentially, instead of having specific transition probabilities, you have parameters for the transition probabilities. And the design question is: find me a strategy and a valuation of these parameters. That is, pick the probabilities with which I want to do things inside each room, and the strategy for how to patch these sub-tasks together, such that the overall task specification is satisfied. And so it gets into an iterative loop. Let me just go to the pictures, and the story goes along the following lines. We solve a strategy synthesis problem in this made-up parametric Markov decision process. It spits out these sub-task probabilities. It essentially says: you know what, I figured out that if you ever come to this room, you should be able to go from this door to that door, while not killing yourself, with this probability. For each of them. Then in each room you do some learning, and you may realize: you know what, I cannot do what you told me. But somebody else figures out that, in that other room, I can accomplish what we have in mind not only with that probability but probably with a higher one. Now there's a little bit of negotiation to be made, right? I can go and make that person's task harder and relax my own task.
For example, stay safe with a slightly lower probability, which is probably a more manageable learning task, and that gets into this negotiation and adjustment. The outcome is, again, subsystem policies, how to do things in each room, and with what probability each can be done. And then this piece keeps track of global consistency, so that at the end, if every single subsystem does what it is supposed to do, we have a certificate that the global specification will be satisfied by these local tasks. Whatever, this example I have spoken about quite a bit. Anyway, I'm going to go to this implementation. Oh, okay. Is this deeper than the options framework? Great, yes. These subsystem policies are essentially options. On top of the options, a couple of things we can say: we put these specifications, the probabilities, and so on. And on top of that, the previous piece gives a means to adjust these subsystem specifications and keep global consistency. Otherwise, yes, we can think of those things as options. Is a parametric MDP something beyond that? Is it randomizing? No, I'm not talking about randomized policies in the context of an MDP. An MDP has finite state transitions; you attach a transition probability to each transition, a fixed number, right? 0.8. In a parametric MDP you remove some of those, or all of them, and put in a parameter. Now you also have constraints on the choices of those parameters. Synthesis in a parametric MDP is: pick the transition probabilities along with a strategy. So it is an MDP with a policy? No, it is not. If you choose the parameters, then it becomes an MDP. Yeah, there's still, at a higher level of abstraction, a different space; you can just redefine the notion of policy. Let's take it offline, because there is actually one more degree of freedom that I think both of these comments are missing. In addition to the parameter choices, you still have the strategy choice.
It has not been chosen. Okay. So here's the example that I wanted to look into. Again, after six months of working on it, on one day they took the two levels of simulators for this testing ground of ARL in Maryland. This one takes care of the low-level physics; this one brings in the perception, and of course you can take it out to the field. So they applied what I just described. You can partition this workspace into pieces, and then what the robot is trying to do is just follow whatever it had learned for each region. That's fine. But then they went and actually changed the workspace. Where is it? It is here. So now this passage is blocked. The robot was programmed at the beginning to have this task, going from here to here while avoiding obstacles, but that is not possible anymore. It had been trained to follow this path; for this task, it had avoided learning what it should do in the rest of the workspace. Of course, there is nothing hard-coded in here now, but it did not have to relearn everything from scratch: the high-level model suggested that it run further learning steps inside a controller, a network, that had not been trained yet, and it changed, not the high-level mission, but how the high-level mission should be executed. It essentially helped localize where further training has to take place. There's a little bit of a buzz going around these days because of some DARPA programs, et cetera, like same-day autonomy. This is not the only way of doing it, but if we want same-day, quick adaptation to changing workspaces, et cetera, we should probably be thinking about how to build compositionality, in this way, into how we train and execute controllers. I am like 2 seconds away from my time. This is the end of it, and I'm going to stop, because I don't see the point of giving a summary; I have been hammering on the same thing. Thank you.
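The localization of retraining just described can be caricatured in a few lines: replan a route over the room graph after a passage is blocked, and retrain only the controllers for rooms on the new route that were never trained. This is purely an illustrative sketch under those assumptions; the room names, the graph encoding, and the function names are all made up, and the real system works over a parametric MDP, not a plain shortest path.

```python
from collections import deque

# Caricature (not the actual system) of localizing further training after a
# workspace change: replan over the room adjacency graph, then retrain only
# the rooms on the new route whose controllers have not been trained yet.

def shortest_route(adj, start, goal):
    # plain breadth-first search over the room graph
    q, seen = deque([[start]]), {start}
    while q:
        path = q.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                q.append(path + [nxt])
    return None  # goal unreachable in the changed workspace

def rooms_to_retrain(adj, start, goal, trained):
    route = shortest_route(adj, start, goal)
    if route is None:
        return []
    return [room for room in route if room not in trained]
```

For instance, if the robot was trained along rooms A, B, D and the B-to-D passage is blocked, replanning through C flags only C for further training, rather than restarting learning everywhere.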
I think, let's limit it to maybe two questions. Yeah. Okay, about the last part: you mentioned that you are deciding the parameters, essentially the individual transition probabilities, and you're doing this optimization to figure out whether the overall probability gets satisfied. And then you are essentially requiring the RL policy to learn strategies that satisfy these requirements. Yeah, locally, yeah. So you mentioned something about if that's not happening. Let's say, because of the underlying dynamics of the individual rooms, for whatever reason the RL policy is unable to satisfy the requirement. Do you then have a combinatorial problem of adjusting your original transitions, or how do you go about it? Do you essentially re-run it, saying that's not possible? I think we would have a combinatorial problem if we did not already specify what the sub-tasks should look like. In that case, we would have a horrible one. We don't have that problem here. The trick is that what is chosen is this probability of transitioning between rooms, right, and also the policy that would be implemented. Even though it might sound like a combinatorial search problem, because of the structure of the Markov decision process and the parameterization of the policies, it can actually be written as a bilinear optimization problem; we then use some ADMM variant on that, and it can be solved relatively quickly. Let me ask one question. For the neural ODE part, I saw that you have some structure on the vector field, and you learn multiple smaller networks that fit into that structure. The other way, which you mentioned briefly, would be to just learn one big network to handle everything. I guess the quick answer is no, that would be very, very difficult.
But I'm trying to see, in the problem you set up, how much structure, how much model information you have, versus how much is the component you learn. Where is the boundary? Can you really push this further one way or the other? Mmm, I think that's an excellent question. The results I showed did not shed any light on that, and I cannot give you numbers on performance or constraint satisfaction or runtime; I am positive the paper should have such comparisons. If you ask my personal view on whether to use one big representation instead of these smaller ones combined in some way: my personal bias is, if you know that the structure is there, go and use it. I have not seen an example, in our work or anybody else's, where staying blind to structural knowledge or physics-based knowledge helps you. And even if you get similar performance, still use your knowledge, because you will very likely reduce the computational requirements or the data requirements needed for that performance. Exactly, exactly. And actually, the third slide or so listed different names for this; one of them was inductive bias. Bias is usually a negative word, but in this case you are biasing yourself toward the knowledge that you have, which is probably not the biggest currency these days. But anyway, let's stick with knowledge here. All right. Thank you.