Okay, all right, let's get started. It's great to see a lot of people here today, although it's just the first week of the semester. Let me give a brief introduction of the speaker. Today we have Professor Glen Chou, really our own Georgia Tech faculty, to kick off the semester's IRIM seminar. Glen is currently an assistant professor with joint appointments across two schools: he's affiliated with the School of Cybersecurity and Privacy in the College of Computing, and also Aerospace Engineering in the College of Engineering, so it's a very interesting setup. Before that, Glen received his bachelor's degree from UC Berkeley, then moved to Michigan for his PhD, and after that he moved to MIT for his postdoc at MIT CSAIL, where I believe he worked with Russ Tedrake's group. At Georgia Tech, he directs the Trustworthy Robotics Lab. The focus is really robot autonomy, as you can tell from the title. He has a lot of excellent work on how to design optimization, perception, and planning algorithms for general-purpose robots and autonomous systems, thinking a lot about safety, robustness, resilience, and how to deal with real-world uncertainties. So it's very exciting to have him here today. Let's see what he's going to talk about. Thanks, Glen.

Thanks for the introduction. Can everyone hear me? All right. Yes. Cool. So thanks again for having me. I'm really excited to speak with you all today on our work towards holistic guarantees for robotic autonomy. I'm relatively new to Georgia Tech.
I started last November as an assistant professor, and just as a shameless plug for students out there in the audience: if any part of today's talk sounds interesting to you, feel free to reach out. We're always looking to involve new folks in the lab. So I direct the Trustworthy Robotics Lab, and we work on robotic autonomy algorithms that can enable a variety of robots, like ground vehicles, drones, and robotic manipulators, to operate reliably, safely, and securely in the real world. The goal with all of this is that, in the long term, we can design appropriate technologies that can transform society through the automation of complex, tedious, and demanding tasks. But as we all know, robotic autonomy is a challenging problem. Robots are responsible for taking in raw sensor data, as well as some form of a human-commanded task description, and turning this into actual actions in the real world, into autonomous task completion by moving a real robot around real humans. To actually achieve this level of autonomy, robots of course process all this information with a complex software stack. Typically, or at least more classically, this consists of a perception module, which is responsible for processing sensor data; some motion planning algorithm, which is responsible for taking that data and a description of the task to plan a trajectory that is actually going to complete the task; and then some downstream feedback control, which typically counteracts uncertainty in the model and in the sensing in order to stabilize the robot around the trajectory we've planned. And a critical part of all this planning and control is a dynamics model, which governs basically how the robot is going to move. And of course, as we're all aware, in recent years machine learning has unlocked the potential for end-to-end decision making, directly from perception and human inputs all the way down to low-level motor torques.
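As a rough sketch of the classical modular stack just described, here is a minimal toy pipeline. Every name, number, and the 1-D state here is illustrative, not from any actual autonomy codebase:

```python
# Toy sketch of the classical perception -> planning -> control pipeline.
# Everything here (names, gains, 1-D state) is purely illustrative.

def perception(sensor_data):
    # Extract an estimate of the robot state from raw sensor data.
    return {"position": sensor_data["gps"]}

def plan(state, goal, n_steps=10):
    # Plan a trajectory (here: evenly spaced 1-D waypoints) toward the goal.
    start = state["position"]
    return [start + (t / n_steps) * (goal - start) for t in range(n_steps + 1)]

def feedback_control(state, reference, gain=1.0):
    # Proportional feedback that counteracts model/sensing error by
    # steering the robot back toward the planned reference point.
    return gain * (reference - state["position"])

state = perception({"gps": 0.0})
trajectory = plan(state, goal=1.0)
u = feedback_control(state, trajectory[1])  # track the next waypoint
```

In an end-to-end learned system, some or all of these hand-designed modules are replaced by a single learned mapping from sensor data to actions.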
And undeniably, this has enabled a revolution in robotics. Thanks to learning, we now have end-to-end robots operating in the real world, whether it's vision-based planning for autonomous driving or warehouse automation. And it's hard to imagine such levels of general-purpose autonomy without machine learning somewhere within this loop. But with every success of robot learning, we also tend to see many failures, and I'll argue that this largely arises from the unreliability of machine learning, especially when deployed in scenarios that are unseen during training. And outside of these out-of-distribution issues, we have other challenges as well. In particular, we know that machine learning models can, for instance, be very fragile to adversarial attacks, where even minor perturbations in the input can lead to arbitrarily inaccurate predictions, which of course also causes challenges for robot safety. And overall, this unreliability can be fatal in safety-critical applications like autonomous driving, as in this example where a Tesla on Autopilot plowed straight into a semi-truck and sheared off its own roof. If we dig a little more deeply into what happened here, we see that this is, at a technical level, an interesting failure, and one that occurred at the end-to-end level of integration. So what do I mean by this? This Tesla was using a learned perception module which, through an unintentional fault, failed to detect the truck, possibly due to adverse lighting conditions or some misclassification of the object. The result was that this failure propagated, leading to an intrinsically unsafe motion plan, one which would plow straight through the truck, and this was such a deeply unsafe plan that it could not be saved through the local corrective action of feedback control.
And as a result, a single point of failure in one component of this software stack was able to propagate through the entire system, leading to fatal consequences in the real world. So the reality, I want to argue, is that if we want to deploy end-to-end robot learning systems in the real world, then we're going to need similarly end-to-end tools for certifying their safety and reliability, in the presence of both natural errors and more adversarial interventions, from inputs all the way down to low-level motor torques. Getting to these levels of end-to-end guarantees has been an essential goal of our research. Of course, we're not the first to think about safe robot autonomy. There's a large body of work, largely built upon rigorous tools from control theory and from formal methods, which shows that under some assumptions, the robot, or whatever autonomous system you're trying to control, is going to remain safe. So this begs the question: why isn't Tesla, or these other companies, just wholesale applying these techniques to control their cars? I'll argue that this is because a lot of this work makes three key assumptions that make it difficult and impractical to directly apply to real-world robotics as is. The first is the assumption of a perfect task specification. I'll argue this is one of the most important inputs to any task planner: if a robot doesn't understand what its task is, then how could it possibly complete that task in a reliable and safe fashion? It turns out that specifications for tasks tend to be difficult to write, especially when it comes to complicated tasks like cleaning a kitchen or driving through some urban environment. A second assumption is that our models are uniformly accurate everywhere.
And I'll argue that this is an assumption that is fundamentally untrue for learned models, which typically tend to be accurate near their training data but can become arbitrarily inaccurate when evaluated elsewhere, which can lead to failure when they're integrated into the loop of an autonomy stack. For instance, as a toy example, a classifier that's been trained on cat and dog images is not going to be able to make heads or tails of an image of a car. And the third assumption is that of perfect sensing and perception: that is, the robot knows exactly where it is and where everything in the environment is. We've already seen an example showing that this is simply not true in many cases in the real world. So due to these assumptions, I'll argue that we're currently forced to choose between rigorous guarantees on safety and reliability, and real-world practical implementation, which to a large extent is driven by modern machine-learning-based methods. But I strongly believe that this choice between safety and practicality is not one that we intrinsically have to make, and that by weakening or removing these crippling assumptions, we can show that principled control methods can actually make real-world robot learning systems both more safe and more reliable. To work towards this goal, in our work we are leveraging learned models for end-to-end model-based control, and to actually enable the robust behavior of these machine learning components, our methods explicitly build and exploit knowledge of where these learned components are actually going to be trustworthy. So under this umbrella of trustworthy end-to-end autonomy, we've focused on four key areas. The first is safe specification learning from human demonstrations and from natural language.
I'll argue that these are more natural ways for humans to specify tasks, and as a result less prone to error compared to manually specifying all your constraint functions, which makes task specification a bit easier and safer to actually implement. Then, given the task specification, model-based control is a powerful tool that you can use to construct a control policy that completes that task. But these tools typically rely on having some known model, and to overcome, or at least weaken, this common assumption, we've developed methods for safely leveraging learned dynamics and learned perception modules for the purpose of reliable downstream control, bringing us closer to end-to-end guarantees on safety. Beyond making it safe to simply interface learned models with existing tools in model-based control, we've also focused on how we can make these model-based control tools intrinsically more scalable. Finally, we've focused on how we can better stress-test our robots before deployment, using model-based reachability analysis to uncover adversarial test environments, even when the robot is complex and potentially uses learned components somewhere in the loop. But today, for the sake of time, I'm really just going to focus on two areas: in particular, our work on constraint specification learning from human demonstrations, as well as our work on trustworthy end-to-end control from perception. And then I'll close with some future directions that we're currently pursuing in the lab, as well as our initial work on those fronts. Okay, so let's get started with the meat of the talk, in particular safe task specification learning. In this talk, just as visual lingo, I'll be highlighting the parts of the autonomy stack that we're currently considering in green.
Okay, so before we can plan to complete a task, we're going to need some kind of task specification to formally define what's actually safe and what actually has to be completed. Typically, this comes in the form of constraints. At the end of the day, these constraints are going to have to come from a human, and I'm going to make an argument that we're not always so great at writing down these kinds of constraints. Let's just take a very simple example: adaptive cruise control. Here's one candidate specification that encodes this task: if there is no car in front of me, go at a user-specified speed; if there is a car in front of me, then follow it. So, what do you guys think? Does this encode the task, or are we missing something? Any ideas? Speed limit, yeah. Anything else, yeah? That's true, yep, that's not specified. Anything else, yeah? Yeah, exactly, lots of corner cases. One more. Exactly, what does "follow" mean, right? So there are lots of different subtleties that you might have to worry about when you're trying to write down a concrete task specification, because robots have to get these unambiguous specifications, right? And after a lot of thought, you can get rid of all these edge cases and write down a specification that looks something like this. But as you can see, this already makes my eyes water, right? It's very complicated, and you can imagine that if you're asking a human to write this down, it's very easy to make a single mistake that compromises the safety of the entire system. So the question really is: is there a way that we can more reliably obtain these task specifications that are really going to fundamentally drive the system's behavior? Is there a way that we can make sure that we don't make mistakes here, so that our overall behavior is also going to be safe?
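To see how quickly those corner cases pile up, here is a hedged toy sketch of such a specification in code. Every threshold, branch, and number is made up for illustration, and a real specification would need far more cases than this:

```python
# Illustrative adaptive-cruise-control specification. Every branch below is
# an edge case that the naive spec "follow the car in front, else go at the
# set speed" silently misses. All thresholds are made up.

def acc_command(ego_speed, set_speed, lead_dist, lead_speed, speed_limit):
    """Return a target speed given the current situation."""
    target = min(set_speed, speed_limit)       # never exceed the speed limit
    if lead_dist is None:                      # no car detected ahead
        return target
    safe_gap = 2.0 * ego_speed                 # rough two-second gap rule
    if lead_dist < 0.5 * safe_gap:             # dangerously close: brake hard
        return 0.0
    if lead_dist < safe_gap:                   # "follow" = match lead's speed
        return min(target, lead_speed)
    return target                              # far enough: resume set speed
```

Even this version leaves questions open (cut-ins, sensor dropouts, stopped traffic), which is exactly the point of the example.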
Okay, so the way that we went about tackling this problem is to take inspiration, to some extent, from how humans naturally communicate tasks to each other. Let's consider a very simple manipulation task, where this robot has to carry a heavy weight from S to G without getting too close to some fragile glassware. Naturally, if I were going to show you how to do this task, one thing I could do is just demonstrate it, right? I could give an example trajectory, in blue, that describes what I should be doing in order to satisfy this constraint that I'm trying to communicate. So what I want to do is convert this human-intuitive representation of the task into some formal specification that the robot is going to understand. In particular, this could be in the form of a constrained optimal control problem that looks like this. Here, there are going to be safety constraints, which define the set of valid or safe trajectories, as well as some cost function, which places preferences upon those trajectories. And this conversion between intuitive and formal is also the goal of imitation learning, where, essentially, given demonstration data, I wish to learn to act, or to imitate, in similar scenarios, like, for instance, if I change my initial condition to S-prime instead. Imitation learning has a long history, and of course it's been becoming even more popular in recent years. I'll argue that prior work primarily falls into two major categories. The first is behavior cloning, where a control policy is directly learned from the demonstrations. While these methods have enabled amazing results on real systems, from a long time ago even until now, they typically still fail when you move far from the training data. In contrast, by aiming to directly recover the parameters of the demonstrator's optimal control problem, inverse optimal control tends to generalize a bit better.
In particular, IOC works by assuming that the demonstrations are essentially solving an unconstrained optimal control problem of this form. Then, given these demonstrations, inverse optimal control, or IOC, aims to recover this cost function, and it does so, at least classically, by choosing some basis functions of this form to parameterize the unknown cost function, and then learning the unknown weights, theta-cost. Then, to plan from some new initial condition, S-prime, all we have to do is optimize this learned cost function while accounting for the change in initial conditions. Okay, but a core challenge with IOC in the context of safety is that IOC typically assumes that these demonstrations are unconstrained. So objectives that relate to safety are lifted into the cost function. The result of this is that the overall learned behavior can still potentially lead to safety violations. For instance, if we just penalize going into this red unsafe set over here in the cost, you can imagine that we could lower our path-length cost a little by violating that safety constraint. So there's an introduced trade-off between the natural cost function and the constraints that you're trying to satisfy with guarantees. And if it's already difficult to satisfy just one constraint this way, it becomes even more difficult to represent tasks that involve the satisfaction of many constraints, potentially over time, potentially multiple constraints active at the same time. How do you relax all of that reliably into just one penalty? It's quite challenging. So to overcome this difficulty, in our work we propose to directly learn the hard constraints from the data, and then to use those explicit constraint representations for downstream motion planning.
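As a toy illustration of both the IOC setup and this trade-off (the features, trajectories, and weight grid below are all made up): only a large enough obstacle-penalty weight makes the detouring demonstration look optimal, and with a small weight the learned cost prefers cutting straight through the unsafe region:

```python
# Toy IOC sketch: the cost is a weighted sum of basis features, and we
# search for weights theta under which the demonstration is optimal.
import math

def features(traj):
    # phi_1: path length; phi_2: penalty for passing near an obstacle
    # at the origin (this is safety "lifted into the cost").
    length = sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))
    proximity = sum(max(0.0, 1.0 - math.dist(p, (0.0, 0.0))) for p in traj)
    return (length, proximity)

def cost(traj, theta):
    return sum(w * f for w, f in zip(theta, features(traj)))

# The demonstration detours around the obstacle; the straight path is
# shorter but cuts right through it.
demo = [(-2.0, 0.0), (0.0, 1.5), (2.0, 0.0)]       # length 5.0, proximity 0.0
straight = [(-2.0, 0.0), (0.0, 0.0), (2.0, 0.0)]   # length 4.0, proximity 1.0

# Keep only weight vectors under which the demo beats the straight path.
grid = [(1.0, w) for w in [0.0, 1.0, 2.0, 5.0]]
valid = [theta for theta in grid if cost(demo, theta) <= cost(straight, theta)]
```

With the penalty weight at 0.0, the "optimal" behavior violates safety, which is exactly the failure mode described above.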
To formulate this problem, I'm assuming that the demonstrator is solving a constrained problem of this form, where there is a safety constraint S of theta, essentially some safe set parameterized by an unknown parameter theta, and that's what we want to learn. Just to make this extra concrete: theta could, for instance, in this example, be the principal axes of the unsafe ellipse that we're trying to learn. In this work, we're going to assume that the cost function is known, although there are variants of our method that can handle cost function uncertainty as well, and that it's just some simple function like path length, so that basically all the complexity of the task is encoded in the constraints. There might also be some other constraints that we know. Our overall problem is the following: given a small number of demonstrations, basically a small number of solutions to this problem, our goal is to find a constraint parameter theta that makes these demonstrations optimal, which is basically this optimization problem over here. Then our secondary goal is, once we've solved this problem and found a constraint, to use that learned constraint in a robust fashion in order to enable safe downstream planning. But this problem has two key challenges, and the first really arises from our assumption that we only have safe demonstrations. Why do we make this assumption? Well, we don't want to force the human to crash the car or crash the robot just for the sake of providing unsafe demonstration data, especially on a real system. And that causes a challenge: typically, you might want to formulate this as a binary classification problem, where you have safe data and unsafe data. But if you only have safe data, you can't simply transcribe it into such a classification problem.
So we need to rely on other forms of information to tell safe from unsafe. The way that we do this is by exploiting demonstration optimality, which instead implicitly tells us what is unsafe; I'll go more into what that means later. The second challenge is that we might simply not have enough demonstration data for the learner's problem over here to have a unique solution, and this constraint uncertainty ends up creating ambiguity for downstream planning. As an example, let's say we only had the bottom demonstration. Now there's a whole host of possible constraints that could explain the data that was provided, right? And that's going to cause a challenge for downstream planning, because which one of these constraints should you choose to satisfy? To be robust to this, we design a robust motion planner that explicitly tries to find plans that satisfy as many of these constraints as possible. Okay, great. So let me go into our method in some depth. To overcome each of these challenges, our method has two main components. The first is in how we actually encode demonstrator optimality into our problem. To do so, we use the KKT, or Karush-Kuhn-Tucker, conditions from optimization theory, which are necessary conditions for any locally optimal solution to an optimization problem: all such locally optimal solutions have to satisfy these KKT conditions. Our technical contribution here was to develop a way to compactly enforce these KKT conditions within the learner's problem. And our second component is how we actually plan in a way that is robust to the constraint uncertainty.
The way that we go about doing this is by slightly modifying the KKT conditions I just talked about to serve instead as a safety check, a constraint checker, or a collision checker, if you will, for candidate motion plans. Okay, so let me now discuss both parts of the method in a bit more detail, starting with how we actually learn these constraints. Like I mentioned, we leverage the KKT conditions, which for our purposes involve two main conditions that influence learning. The first is something called primal feasibility, which essentially enforces that the demonstrations are safe with respect to the learned constraint. So that makes this candidate constraint over here invalid, because it marks some points on the demonstrations as unsafe, which violates this assumption. Okay, so that's the first one. The second condition that we care about is something called stationarity, which essentially enforces that the demonstrations cannot be locally improved without violating some constraint. So this candidate over here fails stationarity, because this bottom trajectory could be shortened without violating the candidate constraint. Okay, so let's take a step back: what does KKT really buy us here? Well, it helps us overcome the lack of unsafe demonstrations through an implicit encoding of what is unsafe. Optimality basically implies that shortening any of these trajectories is unsafe, so these red regions over here must somehow be unsafe; otherwise, the demonstrations would have been shortened to lower the overall cost, right?
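This implicit encoding can be illustrated with a toy version of the learner. A full KKT-based implementation is more involved, so this sketch (entirely made-up boxes and demo) replaces stationarity with its geometric intuition: a cheaper straight-line shortcut the demonstrator avoided must be blocked by the unsafe set:

```python
# Toy consistency check for a candidate unsafe box, standing in for the
# two KKT conditions: (a) primal feasibility -- no demonstration point is
# inside the box; (b) implied unsafeness -- the cheaper shortcut the
# demonstrator avoided must pass through the box. Purely illustrative.

def inside(p, box):
    (xmin, xmax, ymin, ymax) = box
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax

def consistent(box, demo):
    if any(inside(p, demo_box := box) for p in demo):   # (a) demo must be safe
        return False
    start, goal = demo[0], demo[-1]
    # Sample the straight-line shortcut the demonstrator did not take.
    shortcut = [(start[0] + t / 10 * (goal[0] - start[0]),
                 start[1] + t / 10 * (goal[1] - start[1])) for t in range(11)]
    return any(inside(p, box) for p in shortcut)        # (b) shortcut blocked

demo = [(-2.0, 0.0), (0.0, 1.5), (2.0, 0.0)]   # detours above the origin
candidates = [(-0.5, 0.5, -0.5, 0.5),          # box around the origin
              (-0.5, 0.5, 1.0, 2.0),           # box sitting on the demo
              (3.0, 4.0, 0.0, 1.0)]            # box far away from everything
feasible = [b for b in candidates if consistent(b, demo)]
```

Only the box that both leaves the demonstration safe and blocks the shortcut survives, mirroring how optimality implicitly labels the "red regions" unsafe.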
Okay, so the next step is that we replace the optimality constraint in our learner's problem with these corresponding KKT conditions. You might also wonder what happens when the demonstrations are slightly suboptimal. Well, instead of enforcing the KKT conditions as hard constraints, we can simply penalize the violation of these conditions, minimizing the KKT residuals instead. The question still remains, though: how do you actually go about solving this problem? That was the crux of our technical contribution, which was to find that for some simple but useful constraint representations, like ellipsoids, unions of boxes, and more generally unions of zonotopes, we can efficiently solve this problem to global optimality by posing it as a small mixed-integer linear program. One of the benefits of this approach is that it differs from standard methods like gradient descent, which are common in similar imitation learning methods, and which can fail to find a good solution even if one exists. I'll also argue that by increasing the number of zonotopes within our parameterization, we can build up to represent some fairly complicated constraints. Let's start off with a simple example, though. Here's a simple unsafe set, in orange, which is a box that we're learning on a highway driving task. We collected demonstrations using this driving simulator, and we're able to recover the box. Okay, that's great as a first pass. But what happens when you have more complex constraints? Well, the first thing that might happen is that it might just be really difficult to collect enough demonstration data to define a unique solution for this problem; there might be a whole bunch of possible constraints that explain what we've seen.
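When several constraints remain consistent with the demonstrations, one robust option is to accept a state only if it is safe under all of them. Here is a minimal illustrative sketch of such a checker (the candidate boxes and sample states are made up):

```python
# Minimal sketch of a robust constraint check: a query state passes only
# if it is safe under EVERY constraint still consistent with the demos.
# The two candidate unsafe boxes below are purely illustrative.

def inside(p, box):
    (xmin, xmax, ymin, ymax) = box
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax

# Suppose both of these unsafe boxes are still consistent with the demos:
consistent_thetas = [(-1.0, 0.0, -0.5, 0.5),
                     ( 0.0, 1.0, -0.5, 0.5)]

def robustly_safe(p):
    return all(not inside(p, box) for box in consistent_thetas)

# A sampling-based planner would call robustly_safe() as its collision check:
samples = [(-0.5, 0.0), (0.5, 0.0), (0.0, 1.0)]
safe_samples = [p for p in samples if robustly_safe(p)]
```

Only the sample that avoids every candidate unsafe set survives; the others are rejected because some consistent constraint could mark them unsafe.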
To be robust to this uncertainty, one thing we can do is try to satisfy every one of these valid constraints. That is, to stay in this green area, where there does not exist any theta in the feasible set, any theta that also explains the demonstration data, which would mark the state unsafe. And the question is, how do we actually stay within this green set? It turns out that with just a slight modification of this KKT problem, we can turn it into a state-based constraint checker, or a collision checker, if you will, and this checker can be used for downstream sampling-based motion planning. So if we're able to generate a trajectory like this pink one over here using this constraint checker, then we have the following result, which is essentially that plans generated with our approach are actually safe as long as, one, the true constraint can be accurately described by the chosen constraint parameterization of theta, and two, the demonstrations that we got are actually locally optimal. Okay, and so to motivate this more complicated setup, we used VR to collect five demonstrations of a more complicated task, now with many constraints, including collision avoidance as well as these human-proximity constraints: we have to stay inside this green box to maintain a safe distance from the human. We also have to maintain a pose constraint on the end effector of our arm, because we're manipulating some liquid and we don't want to spill it. There are a lot of constraints over here and not that many demonstrations, but because we were able to develop this robust planning approach, we're able to generate a novel plan that remains robust to the constraint uncertainty in the learning problem. Okay, so I've just talked about how we can learn low-level trajectory constraints, but in multi-stage tasks, like this robot bartender example over here, we might now also need to consider high-level logical constraints.
As an example, logically speaking, I have to first grab the cup before I have anything to deliver to the person over there, right? So this leads to a mixture of logical constraints that we might want to learn together with the low-level continuous constraints, defined in the state space, that I just talked about. The question is: is there a way that we can learn both these low-level constraints and the higher-level logical constraints that stitch them together to define some overall task? The first step in answering this question, I'll argue, is coming up with a modeling language for such high-level, temporally extended constraints. In our work, we're going to use linear temporal logic, or LTL, which is a tool from formal methods. In a nutshell, LTL extends standard propositional logic, like and, or, not, to operate on time-series data, introducing new operators like "always, some event must happen," "eventually, some event must happen," or "some event should not happen until some other event has happened." To make this a little more concrete, let's run through this robot bartender task in LTL as an example. We have: fill the cup, grasp it, and then bring it to the human without spilling. As a first stab, we can convert this into a more structured format, like so, which we can see encodes the desired behavior: for instance, that we never spill at any time step, that at some time step we need to visit the human, that we shouldn't grab the cup until we've already filled it, and that we should fill the cup first. Overall, this can be formally encoded using LTL operators in this modeling framework. So now we have a modeling language for these kinds of high-level constraints. The question is: how do we learn these constraints, these LTL formulas, from the demonstration data?
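Before moving on, the operators just mentioned can be made concrete with a toy finite-trace evaluation. Real LTL semantics are defined over infinite traces, so this is only an illustration, and the bartender propositions below are invented for the example:

```python
# Toy finite-trace evaluation of LTL-style operators on boolean traces.

def always(trace):            # the property holds at every time step
    return all(trace)

def eventually(trace):        # the property holds at some time step
    return any(trace)

def until(p_trace, q_trace):  # p holds until q becomes true (and q occurs)
    for i, q in enumerate(q_trace):
        if q:
            return all(p_trace[:i])
    return False

# Bartender task: never spill; eventually reach the human;
# do not grasp the cup until it has been filled.
spill    = [False, False, False, False]
human    = [False, False, False, True]
no_grasp = [True,  True,  False, False]   # "not grasped" proposition
filled   = [False, False, True,  True]

ok = (always([not s for s in spill])
      and eventually(human)
      and until(no_grasp, filled))
```

Here the trace satisfies the full formula: the cup is filled before it is grasped, the human is eventually reached, and nothing is ever spilled.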
To do so, we're going to again parameterize the search space over possible constraints, but now with two parts. The first is the same as before: we still have these continuous low-level constraints, theta-c, in green, just like in the first part of the talk; they define low-level regions of the state space that are safe or unsafe. What's new are the logical constraints, here in blue, which are the LTL operators that compose these low-level constraints; we'll call these parameters theta-d. With this new notation, we can write the demonstrator's problem just like before, except we now have LTL constraints instead of just low-level constraints. The learner's problem also remains similar, except we're now going to search for both the low-level and high-level constraints jointly. Okay, but there's a new challenge when we're trying to learn LTL formulas from demonstrations, which is that the KKT conditions, which we were relying on previously, only enforce local optimality, not some discrete form of optimality, or logical optimality, if you will. What do I mean by this? For instance, while this pink demonstration over here satisfies the KKT conditions for the far weaker formula of just "eventually go to the human," it's not logically optimal at the discrete level, in the sense that if we were just trying to minimize path length, we should have gone directly to the human instead of satisfying all these extraneous high-level constraints. So KKT alone is insufficient to explain all the nuances of this pink demonstration. The question is: how do we encode logical optimality, some level of discrete optimality? We do this by enforcing that our demonstrations are approximately globally optimal within some cost bound.
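That cost-bound idea can be sketched as follows; the plans, costs, satisfaction oracle, and slack value are all hypothetical, standing in for the actual machinery:

```python
# Sketch of the discrete-optimality check: a candidate formula is rejected
# if some plan satisfying it is much cheaper than the demonstration, since
# the demo would then be logically suboptimal. All numbers are made up.

def demo_is_near_optimal(demo_cost, satisfying_plans, slack=0.1):
    """Accept the candidate formula only if no satisfying plan beats the
    demonstration by more than `slack` (an approximate-optimality bound)."""
    best = min(cost for cost, _ in satisfying_plans)
    return demo_cost <= best + slack

# Under the weak formula "eventually visit the human", a direct path is
# cheap, so the multi-step demonstration looks far from optimal -> reject.
weak_plans = [(4.0, "go straight to human"), (9.0, "fill, grasp, deliver")]
# Under the full formula, every satisfying plan must perform all the steps.
full_plans = [(9.0, "fill, grasp, deliver"), (9.5, "fill, grasp, detour, deliver")]

demo_cost = 9.0
reject_weak = not demo_is_near_optimal(demo_cost, weak_plans)
accept_full = demo_is_near_optimal(demo_cost, full_plans)
```

The weak formula fails the check because the demonstrator "wasted" cost it did not need to, which is exactly the evidence that more structure is present in the true formula.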
At a high level, we go about this in an iterative fashion, leveraging a counterexample-guided learning framework together with the KKT conditions I discussed previously. Okay, so using this method, we're able to learn an LTL formula from a demonstration of a multi-step manipulation task, and this enables us to plan in a different real-world environment where the low-level constraints have changed but the high-level constraints remain the same: we generalize to this new environment and can plan a novel trajectory despite the changes in the low-level constraints. Let me summarize this work. We showed that we can represent safety for complex tasks in robotics using constraints, and that we can learn these constraints from only safe demonstrations by leveraging optimality assumptions on the demonstrations. This enables us to safely plan beyond the demonstration data, in spite of constraint uncertainty, by leveraging ideas from robust motion planning. I'll argue that the overall impact of this work has been to make it a bit easier and safer for humans to specify constraints for correct-by-construction planners. Okay, so before I move on to the second part, do we have any questions? Yeah, sure. Sure. So, okay, I think the best use case of this method is actually the situation where you have a small number of relatively curated demonstrations, rather than the setting where you have an enormous amount of very uncurated or low-quality demonstrations. That being said, I will say one thing, which is that the KKT conditions technically work for any locally optimal solution. So you can have an arbitrarily bad locally optimal solution and it will still learn something, but it will affect the quality of what is learned if your demonstrations are too poor, basically. Yeah. There was another question? Yeah. Oh, thank you.
So, there we go, cool. All right, anything else? Yeah. Will it work seamlessly with otherwise-defined constraints? Yeah, exactly. So you can basically think of the KKT-based constraint checker as adding an additional constraint on top of any constraints you're already handling in your regular RRT. Yep. Yeah, yes, exactly. So I didn't go through the details, but yes, you basically end up solving an even larger mixed-integer problem, where there are now additional binary variables that encode the relationships between the operators in the LTL formula. So essentially, it's similar to what we had in the low-level constraint case, except there are additional constraints and variables that account for the LTL part of the learning as well. OK, cool. So for the sake of time, maybe let me move on, but this is great. All righty. In the first part of the talk, I discussed how we can learn safety constraints from human demonstrations. This is important because for robots to actually act safely, they need to know what safety even means. That work was a first step in overcoming the common assumption that we have perfect knowledge of what safety means. But if we take a step back, we'll remember that this is just one part of the stack, and one of the three assumptions that we're trying to weaken. In the second part of the talk, I'm going to turn attention to the rest of the stack and aim to weaken these other assumptions. In particular, the problem I'm studying here is: given a task specification, how do we actually go about reliably executing that task in the real world when we are faced with imperfections in our dynamics models and in our perception? Just to remind you and make sure we're all on the same page, the perception module for me is basically the following.
So we're going to take in raw sensor data, for instance images, call these y, and we're going to extract out a subset of the robot state. Let's call that p, and this could be, for instance, the position and orientation of this drone over here. The dynamics f are responsible for governing how the robot state x is going to change as a function of the control input u that is applied to the system. Of course, in the real world, it turns out that it's very difficult to model the dynamics and perception of realistic robots precisely by hand. So the community has turned to machine learning in order to model the robot perception as well as dynamics using observed transitions and observed data. But data, while powerful, is not a cure-all. Learned models tend to be accurate near their training data, but can oftentimes fail to generalize beyond that. These models can become quite inaccurate far from data, and using models in those regions can lead to catastrophic failure. So I wondered, is there a way that we can actually use learned models more reliably within our downstream autonomy stack? The answer is yes, and one way to go about doing this is to explicitly keep the robot close to data, where these learned models can actually be trusted and where they're going to have relatively low error. There's, again, a lot of work in this area. Previous work, I'll argue, falls into two key buckets. The first are mostly empirical methods, which, while effective at times, do not assure safety, which is the goal of this line of work. On the other hand, we have safety-critical control, which does provide rigorous guarantees, but oftentimes makes unrealistic assumptions when it comes to learned models or perception. So to close this gap, in our work we aim to strike a balance: to provide probabilistic guarantees on safety from perception down to control by using learned models where they're going to be accurate.
Okay, so specifically in this line of work, I want to control a system that has unknown dynamics and observations, given a dataset of the dynamics and perception, which I'm going to use to train approximate models, f_learn and h_learn, which approximately model the dynamics and the observations. I then want to take these learned models, f_learn and h_learn, and use them to safely get the robot from point A to point B. And for tractability, I'm going to make model assumptions on the system and on the sensors, and I'll show later on that they still hold for a variety of useful systems. Okay, so I'm now going to discuss in a bit more detail what we actually did, first focusing on the case of perfect perception for the sake of clarity, and then I'll relax this assumption later. At a very high level, this work aims to weaponize the phrase "all models are wrong, but some are useful." As written, it's a cute motto, but not one that's particularly actionable. How does this catchphrase turn into something practical that tells me where, and to what extent, I can actually trust my learned model for control? We answer this "where" question in this work by restricting the use of our models for perception, planning, and control to remain near the training data, where the models are accurate. We call this domain the trusted domain D, which we represent explicitly as a union of balls of radius r around our training data. Okay, but even in this domain D, the model's not necessarily going to be perfect, as typically when we're training these models we can't get zero training error. As a result, small model errors can accumulate over time when we execute on the real system, and this can lead to the system falling outside of the domain D.
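The trusted-domain membership test is simple enough to sketch directly (a minimal illustration with made-up names, assuming Euclidean balls as described in the talk):

```python
import numpy as np

def in_trusted_domain(x, train_X, r):
    """Membership test for the trusted domain D, represented as a union of
    Euclidean balls of radius r centered on the training points.
    train_X: (n, d) array of training states; x: query point of dimension d."""
    dists = np.linalg.norm(np.asarray(train_X) - np.asarray(x), axis=1)
    return bool(np.any(dists <= r))
```

So a state within r of any training point is "trusted," and everything else is out of bounds for the learned model.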
So to avoid this from happening, what we really need to do is stay a safety margin away from the boundary of D, where the model is going to remain accurate, to account for this accumulation of error. In this work, we quantify the size of this required margin by upper-bounding the model error within the domain D. The key tool we use to do this is an estimate of the Lipschitz constant of the model error within this domain. What this does is give us a way to bound the open-loop reachable set: basically, the set of states that the robot could potentially end up in, in the worst case, when following this pink trajectory under model error. But as you can see, there can still be some pretty substantial drift from our nominal trajectory, which in the worst case can lead to crashes. So we can do better by fighting against the accumulation of this model error using trajectory-tracking feedback control. And to instead estimate the closed-loop reachable set, which is the set of states we could end up in under this tracking control, we need to propagate the model error bound that we've estimated here in blue through the downstream tracking controller, to actually bound how much the control is going to shrink the size of this set. The way we go about doing this is by leveraging contraction analysis. But still, this might not be enough to prevent crashes. So the final step is to actually use these closed-loop reachable sets to inform downstream planning, using those reachable sets as a safety margin that constrains planning to return trajectories that, by construction, safely reach the goal. As a byproduct, this also encourages plans that stay in areas where the model is good and low-error. Okay, so that's the method at a very high level. I'll now describe it in a bit more detail, starting with how we actually bound the model error.
Our model error bound really relies on the estimation of one number, the Lipschitz constant L. What is this? Intuitively, the Lipschitz constant is an upper bound on the slope, or the maximum slope, of the model error. Okay, so you might be asking, why do I need this thing? Let me motivate it on a toy example. Let's suppose that I am fitting a line to a sinusoid, and I am trying to get an upper bound on my fit error everywhere within this domain D. You can already see that we might be running into an issue, in the sense that away from data I have no idea what the model error is going to be. On the training data, I know exactly what it is, but elsewhere there's going to be this ambiguity. And what L does is help mitigate this uncertainty. So how does this work? Let's say that I want to come up with an upper bound on my model error at some point not in my training set. What I could do is find the closest data point, over here for instance, and just assume the worst: that the model error increases at rate L moving away from that data point. That gives me an upper bound on the model error at that point. You can repeat this for all points within the domain and get a tight, spatially varying bound on the model error: tight near the data, loose far away. This is a simplified setting, but this same idea is what we actually use to come up with the upper bound on our model error within our trusted domain D. So the last question is, how do we actually estimate this constant L? Well, we leverage statistical techniques from extreme value theory to get probabilistically valid estimates of these constants. Okay. So the next question is, now that we've estimated this model error bound, how do we actually counteract this model error through control?
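The toy construction above can be written down directly. This sketch uses a naive max-slope estimate of L rather than the extreme-value-theory estimator the talk actually relies on, so treat it purely as an illustration of the geometry:

```python
import numpy as np

def estimate_L(train_X, train_err):
    """Naive empirical Lipschitz estimate: the maximum pairwise slope of
    the observed errors. (The talk uses extreme value theory to get a
    probabilistically valid estimate; this is just a stand-in.)"""
    n = len(train_X)
    slopes = [abs(train_err[i] - train_err[j]) /
              np.linalg.norm(train_X[i] - train_X[j])
              for i in range(n) for j in range(i + 1, n)]
    return max(slopes)

def error_bound(x, train_X, train_err, L):
    """Worst-case model error at a query x: each data point's observed
    error plus L times its distance to x, taking the tightest such bound."""
    dists = np.linalg.norm(np.asarray(train_X) - np.asarray(x), axis=1)
    return float(np.min(np.asarray(train_err) + L * dists))
```

As described in the talk, the resulting bound is small near training points and grows at rate L as you move away from them.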
So in our framework, we use control and analysis tools based on control contraction metrics, or CCMs. And the technical contribution we had here was, given the Lipschitz-based model error bound that I just described, to figure out how to propagate this error bound through the contraction-based control law to upper-bound the closed-loop trajectory-tracking error under that CCM-based control law. The reason this closed-loop tracking error bound is important is that it exactly defines the size of the pink closed-loop reachable set that we wanted to compute in the first place. And there are two reasons why CCMs are really important for this framework. The first is that in the CCM framework it is actually very efficient to compute reachable sets: it only requires the forward integration of a low-dimensional ordinary differential inclusion, which can be leveraged efficiently within a motion planner. The second reason is that because these reachable sets are informed by a great number of relevant factors, including controllability as well as the local model error that we're seeing around the trajectory, they're relatively tight. These tubes are relatively tight and not that conservative. Okay, so because we can compute these reachable sets in an efficient fashion, we can use them downstream in a sampling-based motion planner. So just to make sure we're all on the same page, a sampling-based motion planner aims to connect a start to a goal by constructing a tree of candidate transitions within the space until we get to the goal.
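To give a feel for that "forward integration of a low-dimensional differential inclusion," here is a hedged one-dimensional stand-in (not the actual CCM machinery): the tube radius shrinks at the contraction rate while the local model error bound inflates it.

```python
def tube_radius(t_grid, err_of_t, lam, r0=0.0):
    """Euler-integrate a scalar tube bound r' = -lam * r + e(t):
    contraction at rate lam shrinks the tube, while the local model
    error bound e(t) inflates it. A toy 1-D stand-in for the
    low-dimensional differential inclusion mentioned in the talk."""
    radii = [r0]
    for k in range(len(t_grid) - 1):
        dt = t_grid[k + 1] - t_grid[k]
        r = radii[-1]
        radii.append(r + dt * (-lam * r + err_of_t(t_grid[k])))
    return radii
```

With a constant error bound e and contraction rate lam, the radius rises monotonically toward the steady state e / lam, which is why low-error regions and strong contraction both yield tighter tubes.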
The key difference here, compared to regular RRT, is that now each of our transitions is surrounded by the corresponding closed-loop reachable set that we've just computed, and because these reachable sets bound where the robot can end up in the worst case, we can now reject transitions that intersect with obstacles or leave the domain D, because in the worst case, that is something that could happen. What this does is constrain the planner to return trajectories that stay in relatively low-error areas, preventing transitions like this one and instead encouraging transitions that stay inside low-error regions to robustly reach the target. And overall, the guarantee we have is: if we're able to find a trajectory that reaches the goal under these expanded constraints, and the Lipschitz constant that we've estimated is valid, then we have a hard guarantee of safety and goal reachability on the real system when planning with the learned dynamics. Okay, so we implemented our approach for planning and control on a learned ground vehicle model. First, to motivate our method, let's see what happens when we don't use it. If we naively plan with the learned dynamics and leave the domain D, then our planner is overly optimistic: it goes over this rough terrain that it was not trained on, it destabilizes, and it fails to reach the blue target. Okay, what happens if we're a little bit smarter and we stay within the trusted domain, but we don't account for the closed-loop effects of tracking error and reachability? What happens is we get too close to the boundary of the domain, we fall out, we hit the corner of this rough patch, we destabilize, and we crash. It's only when we consider both the effects of staying within the trusted domain as well as the effects of closed-loop reachability that we can actually safely reach the target.
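The edge-rejection step can be sketched in a couple of lines (a toy 2-D illustration with hypothetical names, not the actual planner): each candidate transition is checked against obstacles after being inflated by the closed-loop tube radius.

```python
import numpy as np

def edge_is_safe(p, q, tube_r, obstacles, n_samples=25):
    """Collision check for one planner edge: sample along segment p -> q,
    inflate the robot by the closed-loop tube radius tube_r, and reject
    the edge if any sample comes within tube_r of a circular obstacle.
    obstacles: list of (center, radius) pairs."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    for t in np.linspace(0.0, 1.0, n_samples):
        x = (1.0 - t) * p + t * q
        for center, rad in obstacles:
            if np.linalg.norm(x - np.asarray(center)) <= rad + tube_r:
                return False
    return True
```

The same inflated check against the complement of the trusted domain D is what keeps the planner from drifting toward the boundary where the model error blows up.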
And here are some similar examples on a quadcopter. Without staying within our trusted domain, our system crashes spectacularly, whereas using the safeguards provided by our method, we're able to reliably reach the goal. To extend this framework to also handle imperfect perception, the core structure of the method remains largely the same. We define trusted domains for both the dynamics and the perception, based on the training data for each module, and then we require that the robot remain in the intersection of the two. The main difference now is actually in the controller structure. Previously, we were able to observe the full robot state for state-feedback control. But now we have to do output feedback. What this means is we now need to control the system from observation outputs like RGB images. And this necessitates a more complex control strategy, which looks something like this. We first process the observation with some perception module; it's a learned component based on whatever architecture you want, trained from data. That's responsible for extracting the observable subset of the robot state, p, from the image y. This p then gets fed into a state estimator. This could be something like an extended Kalman filter, or in our case, a contraction-based estimator. It generates a full-dimensional state estimate, x-hat. This state estimate x-hat then gets fed back into our state-feedback controller, the contraction-based controller we were talking about previously. So the question is, in this more complicated controller, what are the new sources of error that we have to bound in order to compute our output-feedback reachable sets?
We're still going to have the same sources of error as before, namely dynamics error, but now we have two new sources of error to contend with. The first is error that comes from training the perception module, which is not necessarily going to be perfect. We might be making errors when we're regressing our y's onto our p's, just like when we were learning our dynamics. So we can make a similar argument and upper-bound the perception error, the training error, again using an estimate of the Lipschitz constant. The second error, however, is more subtle. If you think about it, the control that we're taking now depends on the state estimate x-hat instead of the true state x, and that could potentially lead to the application of destabilizing control inputs if x-hat, for instance, is inaccurate. As an example, if you want to go over here and you think you're over here, but you're actually over here, trying to stabilize using that x-hat could actually destabilize the true system, right? Okay, long story short, the way we handle this is also using contraction analysis. And when we put all these things together, we eventually get a larger bound on the model error within this jointly defined trusted domain, and this again allows us to compute these closed-loop reachable sets, now in the case when we are trying to control from perception. We're again able to use these model error bounds to inform planning, just like before, and ultimately the benefit is that it provides us with integrated guarantees on safety from sensors down to the low-level control. So let me just highlight a few examples of our method in action in simulation. Here we are using onboard camera images from a quadcopter, flying through a forest, to safely get to the goal using our approach.
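The perception-estimator-controller loop can be illustrated with a deliberately tiny 1-D stand-in (nothing here is the actual CCM-based estimator or controller; the plant, gains, and observer are all made up for illustration): the controller acts on the estimate x-hat, not the true state, which is exactly where the second error source enters.

```python
def run_output_feedback(x0, steps, dt=0.05, l_obs=2.0, k=1.5):
    """Toy 1-D output-feedback loop: true plant x' = u, 'perception'
    returns p = x, a Luenberger-style observer corrects the estimate
    x_hat with gain l_obs, and the controller u = -k * x_hat regulates
    the system to zero. Returns (true state, estimate)."""
    x, x_hat = float(x0), 0.0
    for _ in range(steps):
        p = x                                     # perception: observable state
        u = -k * x_hat                            # feedback on the *estimate*
        x += dt * u                               # true dynamics
        x_hat += dt * (u + l_obs * (p - x_hat))   # observer correction
    return x, x_hat
```

When the observer gain is well chosen, both the true state and the estimate converge together; a bad x-hat, by contrast, can drive u in the wrong direction, which is the destabilization risk the talk is bounding.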
Another interesting application is actually in active perception. Here our approach determines that we must avoid these visual occlusions in the scene, where the perception cannot be trusted, and this information-gathering behavior, where we try to stay in regions where we can actually observe the system, is a benefit of this explicit coupling of perception, planning, and control. Really, the key kicker in this work is that we show we can do this coupling with safety assurances. Okay, so let me just summarize this section. We showed that we can use learned models reliably by using them near the training data, where they are accurate, and that we can ensure the robot actually stays in these regions at runtime by integrating planning with reachability analysis. The overall impact of this work is that we can provide stronger end-to-end guarantees on robot safety and robustness in the real world when we are using these learned dynamics and perception modules. Okay, so let me just briefly reflect on everything that I've said. Today I discussed our approach to making task specification easier by learning constraints from demonstrations, as well as our approach to safe control with learned models by staying near data. In both, we tried to take a holistic approach to safety by quantifying and propagating uncertainty through various parts of the stack in order to inform safer end-to-end decision making. And the goal here was that by weakening the crippling assumptions of perfect task specification, models, and sensing, we were able to push the frontier at least a bit on end-to-end safety for robots in the real world. While I believe that this work is promising as a first step toward more practically applicable guarantees, this is still just the beginning. I think there's a lot of work to do, and I'd be excited to work on it with all of you.
In particular, when it works, machine learning has undeniably enabled some incredible results on complex tasks in robotics that we are still unable to certify. For instance, machine learning can process high-resolution tactile data, voice commands, and natural language from humans in the loop to accomplish complicated tasks involving contact, deformation, and many robots. And underpinning a lot of this work, of course, is now the focus on large-scale foundation models and the generalizable features that they can provide. So moving forward, what we're trying to investigate in the lab is the design of trustworthy autonomy algorithms that can actually solve such complex end-to-end tasks securely and reliably through this unification of machine learning with safety-critical control. So in the final minutes, I'll just propose three key directions that I think are fundamental towards unlocking these capabilities. The first is secure perception-based control for robots with hybrid dynamics and observations; then, how we can scale this up to more complicated real-world tasks using trustworthy machine learning; and how we can use learning and control to enable safe and resilient human-robot collaboration. Going into each of these briefly: why hybrid control? Well, hybrid control is important for many tasks in robotics that intrinsically require both discrete and continuous decision making. As an example, in robotic manipulation, we typically have hybrid dynamics that arise from making and breaking contact with foreign objects, and hybrid observations may arise from visual occlusions. So we're going to be building upon our prior work and trying to solve several new challenges, for instance the scalable synthesis of controllers that are secure against malicious sensor spoofing, as well as the safe integration of perception and planning when you're thinking about adversarial attacks.
We're also interested in exploring many application domains where these algorithms can potentially be used. And we've made some first steps towards solving these problems. Here we've developed a framework for robust hybrid output feedback from vision to solve manipulation tasks. To actually achieve this, we're using ideas from system-level synthesis to do error propagation in one shot, compared to the separately propagated estimation and control uncertainty that we were doing in the contraction-based framework. This reduces conservativeness, and we're using an optimization-based planning approach called graphs of convex sets, instead of randomized planning, to do the planning. Just to make sure that what we're doing here is clear: we're trying to push the sugar box to this green target, using images collected from these yellow cameras, without colliding. And our method explicitly reasons that we need to avoid this black occluded region, where we can't reliably control the system. Again, what we see here is this information gathering. And we show that we can also do this in the context of hybrid dynamics and hybrid sensing arising from occlusions. To then scale these control tools up, I plan to use learned models and learned model reductions. This is critical for generalizable control strategies when you're trying to do planning and control with, for instance, deformable objects, which involve complicated infinite-dimensional dynamics. This again generalizes the work that we've done so far and brings out new challenges in trustworthy control with learned models: how we can do safe control when we are trying to actively reduce the complexity of those models, as well as coming up with ways to stress-test controllers that leverage these kinds of models.
Again, we're very interested in many applications of this work, and as some initial work in this direction, we've been studying how we can leverage generative motion planners for planning from rich input data like natural language and images, as well as how to robustify their outputs, both in terms of safety and dynamic feasibility. The high-level idea here was to design a safe control policy by stabilizing potentially infeasible samples from a generative planner, like a diffusion model, and composing this with a downstream certified neural tracking controller, which provides hard reach-avoid guarantees when we are starting from continuous subsets of the state space. And finally, I'm very excited about human-robot collaboration and how we can achieve guarantees for the full stack, even when we have humans and perception in the loop. This brings about new questions, like: how do we actively and safely infer human intent in interactive situations using multiple modalities like language and demonstrations? How can we perform safe perception-based control when we have multiple agents in the scene? And again, I see multiple possible applications that we're interested in pursuing. As a first step in this direction, we've been exploring ways to extend our constraint-learning work to the multi-agent case. In particular, what we're doing here is leveraging local generalized Nash equilibrium conditions, which are similar to the KKT conditions I was talking about previously, to now learn constraints on interactions between multiple agents. And here we have multiple highlighted constraints that we are able to learn with our approach. Okay, so before I close, I just want to acknowledge my mentors, my incredibly talented students, and my collaborators who made this work possible.
So to wrap things up, the goal of our lab is really to make rigorous guarantees on the safety, reliability, and security of real-world robots around humans. To achieve this, our work unifies machine learning with model-based control by using machine learning where it's reliable. I discussed two of our prior research thrusts towards this goal: the first being how we can safely learn task specifications from human data, and the second being, given these task specifications, how we actually use them to drive safe end-to-end model-based planning. Moving forward, our group is going to be building upon this work to enable robust hybrid perception-based control that scales using learning and can enable, in the long term, safe human-robot collaboration. And I'm optimistic that with the groundwork we've laid, we'll be able to make more strides towards end-to-end reliable robot autonomy. So thanks, yeah. Thanks, thanks for a good talk. Yeah, I guess, well, I think what actually probably happened was that there was some ground effect from the obstacle it was flying underneath. But yeah, basically, the model error bound was not valid outside of the domain there, and was therefore an underestimate. No, this was for, I think, if I remember correctly, waypoint tracking. But basically, the planner that was generating the waypoints was not accounting for the fact that when it passes above that obstacle, that would lead to large model errors, and that was causing it. If that makes sense. Okay. Sure. Yeah. I'm hoping so. So I'm hoping that there's a way that we can learn constraints from more realistic forms of data, beyond just state and control-action trajectories.
If there is a way that we can learn constraints from observations, then there might be a way to get reasonable constraints in that setting. And then the hope is also to blend those learned constraints into the downstream planning and control, to solve these more complicated problems where you need, I guess, both learned constraints as well as learned models to account for all the deformations that are happening in those tasks. Yeah, sure. So right now, we are able to handle time-varying constraints in the sense that these temporal logic constraints essentially impose different constraints at every individual time step. If the low-level constraints are changing themselves, we aren't explicitly handling that, but if you know the dynamics of those objects, or the dynamics of how the constraints are going to change over time, that's something you could potentially integrate into the optimization problem, and then you could invert those using the KKT conditions. That's not something we've explored, but it could be interesting. Yeah. This might be something that you already talked about: are you operating under the assumption that you can see the entire system at all times? Or are there ever cases where it has to go from point A to point B, and it can't see everything? Yeah, so then how does it decide? Yeah, so that's actually the teaser that I talked about at the very end, where we discuss how we can reason about occlusions during planning. Essentially, what we're doing there is trying to model explicitly what parts of the environment the robot cannot see, and then quantify how that is going to influence the growth of the reachable sets. So there is a way that we can integrate this lack of perception into the planning and control.
And that's actually this information gathering that I was talking about, where we couple the limitations of the perception with how we should move the robot to be robust.