Okay. Hello, everyone. Welcome. It is my great pleasure today to introduce today's speaker, Professor Todd Murphey. He is a Professor of Mechanical Engineering and of Physical Therapy and Human Movement Sciences at Northwestern. He received his PhD from Caltech in Control and Dynamical Systems, and he works in the broad areas of robotics, control, human-machine interaction, and emergent behavior in dynamical systems. He has received the NSF CAREER Award, has been a member of the DARPA Defense Science Study Group, and is a current member of the United States Air Force Scientific Advisory Board. He is going to talk to us today about control principles for robot learning. Without much further ado, I'm going to give it up to Todd.

All right, thank you. First of all, it's a pleasure to be here, and it's awesome to have a crowd. I primarily wanted to give a talk that had "robot learning" in the title, and then I reverse engineered all the slides that follow from that. That's partially because, as all of you know, robots and learning have become an incredibly hot intersection over the past decade, and you can barely go to a conference without seeing a thousand papers on how learning, in various ways, is influencing how we do robotics. One purpose of this talk is to ask: how should robots, the fact that they are physical things that exist in the continuous world and have to operate in continuous time, influence the way learning happens?

It's worth saying something about what my group does. We do in fact have a lot of work in human-robot interaction, where robots are supposed to assist people in actions, to make them better, with either therapeutic benefit or training benefit. We have also worked with physicists looking at emergent behavior in macroscopic models of things we might expect to happen at the nanoscale. And we work with companies on social navigation; the top image is a robot that is supposed to navigate through dense crowds. That is work with Honda Research Institute, and they are interested in robots delivering the lunches you are all eating. All of these have in common that the robots operate in situations where there are not just small unknowns, not just parametric uncertainties, but big, dramatic unknowns: things that are not merely unknown but may actually be unknowable ahead of time. There might not be data available ahead of time that models what they are going to encounter. So how should the robots practically deal with that, when the primary thing at their disposal is the ability to move in order to extract information?

So why does learning need control? Largely because there are lots of ways in which robots are impoverished. ImageNet, for instance, is made up largely of beautiful pictures that were taken and curated to make the labeling and the associated learning as easy as possible. Robots have to operate in crappy environments with poor imagery or other types of sensors, and they need to be able to use that to do things like SLAM.
They need to be able to learn their dynamical models in an environment. They need to learn about interaction. The only thing helping them is that they know something, and typically we have to figure out how to incorporate that something, whatever it is. They do not need to learn passively; they do not need to just observe the world and then infer from what they observe. They can interact, poke, probe, and provoke data from the world. That is the last point: robots get to shape their perception in support of learning. They can do it passively, by just sitting or maybe randomizing inputs and seeing what happens, or they can do it purposefully, where they look at their learning needs and take purposeful action to extract information from the world. What I'm going to talk about today are examples of both: cartoonish situations where I think we can agree this is actually a need, and then some very concrete approaches to creating control architectures that support this need.

The concrete cartoon, and I'll come back to the same simulation at the end, is this. Imagine that you're in an airplane and, let's say, a door blows out the side of it. Something unanticipated happens, something you were not directly dedicating sensors to in anticipation of that event, and all the plane knows is that there has been a dramatic change to the aerodynamics. Let's say it's dramatic enough that you think you're going to crash. Now you're up at 20,000 feet or whatever, and you have to decide how to allocate your time between now and the ground. Should you immediately start trying to arrest your descent? That would likely be my first approach, but that's because I don't know what I'm doing. Instead, you might want to use some of that time to learn about the new dynamics you're operating with, to figure out what the affordances are that allow you to arrest your descent. You might want to use some of the time for other purposes: once you think you've learned something about the dynamics, you might want to spend time certifying your knowledge of those dynamics so that you're confident in the predictions you're making. And all the while, the ground is getting closer. Whatever the case, this is a single-shot learning problem. You do not get lots of parallel executions of lots of parallel commercial airliners falling out of the sky. You get to do it once, and you hope you do it the best way at your disposal.

Here you're seeing an example; I'll talk a little about the details of this. In real time, we have parametrically removed one of the rotors from the quadrotor, but the model does not know that. It has to completely relearn a model, and it spends the first second using its control architecture to figure out what affordances it has at its disposal; then it reasserts its control over the vehicle. On the left, the vehicle that keeps falling is what happens when you passively just record data and try to infer the dynamics from that data in order to assert control. In this case, I certainly want to be in the orange vehicle instead of the purple vehicle.
Okay, this really comes down to a picture. For people already at the intersection of robotics and learning, there's a version of that intersection that looks something like this: you have a plant and you have a controller, and historically we have not been great at controlling that plant. You want to use all these amazing tools that the learning community has produced over the past decade to improve its performance, and you do that by adding some learning mechanism to the controller itself. I'm going to call that learning for control: you are, hopefully, improving our classical approaches to control by adding a learning element to the way the controller works.

Control for learning is different. You still have a plant, and there's going to be a physical component to it, but now the thing you are regulating is the learning process itself. You have a neural network and some physical thing, and you're trying to regulate how both evolve over time. Possibly your controller will also be a learning element, but it doesn't have to be. The control of the learning process might be an explicit controller that you can write down a formula for, and that would be totally fine. There's no reason to believe that control for learning has to be learning-based. It could be explicit formulas, or maybe we'll eventually find that you really do want a learning approach to improving learning. In fact, if you think of this as reinforcement learning in the standard way it's implemented, you're probably going to have neural networks in both places: a neural network that is the thing you care about, and another neural network reasoning about all possible executions of that first network. Great. If you can figure out how to solve that problem, I am absolutely all for it. It sounds hard. Nothing I'm going to talk about in this talk takes that perspective; I'm not layering neural networks on top of learning how to make neural networks evolve in a coherent way.

So you know how I'm using language: my assessment of the current state of the field is that machine learning is the taking and processing of data for prediction, typically related to sensory systems and perception, and reinforcement learning is the processing of models for decision making, often dependent on simulation as well as data. We're all now in a world where modern ML is enabled by big data, and that typically means the data came from people. For those of us at the later stages of our careers in the room, we remember when we were in graduate school: vision was not a particularly interesting sensor. There were lots of special-purpose sensors for whatever it was you were trying to build. It's really the data that was taken by people, curated by people, and labeled by people that enabled vision to become such a profound sensing modality for robotics. And partly because all that stuff exists in the cloud, we've become accustomed to the idea that compute happens in the cloud.

I really like this quote from LeCun. I don't agree with everything LeCun says, but I like this point: we have to reach cat-level and dog-level AI
before we talk about human-level AI. I think this is a very coherent way of understanding what animals are good at. One of the things animals are extremely good at is dealing with novelty. When you give your pet something new, dramatically new, something it knows it has never seen before, it deals with that in a very fluent way: it figures out things about it, figures out how to represent it, and can identify it in the future. These are also coherent tasks we might want a robot to be able to do, and they do not necessarily make sense in a pure big-data view; this is not just interpolation between things we've seen before.

Lastly, physical learning is different from general machine learning. One way it's different is that offline processing doesn't make much sense when you have a mechanism that is in the business of learning; whatever algorithms you use, they have to make sense in a runtime scenario. It might be true that some components of your modular design can take advantage of offline processing, but most of what happens for a robot learning about a new thing has to make sense as a runtime process. The other difference is where there's an advantage to motion. Think about generative adversarial networks and the example of seeing a kitten's ear sticking out from behind a piece of furniture. One approach is to apply a GAN and figure out that the ear probably belongs to a kitten. The other approach is to just move: you move around the desk and you go, there's a cat. The real question is which of those is energetically more reasonable. Robots are actually pretty good at transferring their energy into mobility. It's not clear that advanced computation, when you have motion at your disposal, is the right way to go.

And this comes around. There's a great MIT Technology Review article from April 2023 talking about OpenAI, where the data is coming from, and how the inadequacy of data is one of the big hurdles we're currently facing. One of the quotes is that the state of the art around data collection is very immature; we don't really know how to collect the data that will populate these big models we want to create. One of my claims is that robots are data collectors. They are not just machine learning users. They are in the business of collecting data, if you are willing to put them in that business.

Okay. So there's a spectrum here, from no data up to big data. A lot of what we've been seeing is technology that really exists because of big data, and we've figured out good computational ways to make that technology slightly leaner in terms of how much data it needs. I think there's a limit to how much we can hope for there. The bigger question is why we see so little work in the machine learning world on operating in regimes where there is no data: no notion of offline data, no notion of pre-regressed models, where the robot has to wake up naive and then figure out how to operate. There are some cynical views; let me at least pose the most cynical one I can think of, which is that on the right-hand side of the spectrum, everyone is super clear about what they're selling.
If I am a big company with all the servers that store and process that data, I am in the business of convincing you that big data matters. If I'm a company at the other end of the spectrum, it's totally unclear what I'm selling. That's not to say it's unimportant, but it is at least unclear what product I'm selling you. And I don't blame any of the big companies for caring about the right-hand side. But for what we do, this left-hand side has a lot to be said for it in terms of important research problems.

One of those is that really important applications don't have datasets, and I'm going to claim that not only do they not have datasets, they won't have datasets. Consider nontraditional sensors. MRI might have datasets: you have doctors who are highly incentivized to sit there and label things for you. But for things like AFM, electrosense, and tactile data, even if you can collect the data, who's going to label it? How would you render tactile data to millions of people to get them to apply labels, so that you can then learn classifiers based on those labels? I'm not saying this is utterly impossible; I'm saying the idea of foundation models for these non-visual systems is at minimum challenging, and in some cases I think it's going to be impossible. This matters for things like manufacturing, when you're at physical scales you can't simulate. It matters for systems where the data is going to be scarce and sparse, and for austere environments where you have no idea what the environment will look like before you show up; deep-water exploration is probably a good example. In all of these cases, we're at that left-hand role, where the robot has to wake up and start sense-making through its own actions and through its ability to regulate neural networks or whatever learning model you have, and hopefully improve its ability to identify things. Even something as simple as loop closure, right? How would you do loop closure in SLAM in deep water, based on things the robot has never seen before, and do it reliably?

I want to give an example of this. This is a simulated robot; I'll show you a hardware robot in a moment. It knows how to move itself, which is an important type of self-knowledge: it does know its own kinematics. It knows nothing about the sensor. The sensor happens to be a camera, but it doesn't matter that it's a camera. It knows nothing about what's on the table. It has to build a neural representation so that it can identify the thing on the table in the future. On the left, you see the predicted entropy of the neural network as a function of where the robot is. In the middle, you see the robot as it moves. On the right, you see its field of view. There are a couple of things to notice. First, the entropy is constantly updating in real time, and the control is responding to those updates. Second, if you track the duck, the duck spends most of its time at the edge of the field of view. That's because the edge is where the duck is hardest to predict. Keeping the duck in the middle of the field of view is essentially just memorizing what the duck looks like; at the edge, you only get access to a few of the features, and you have to make predictions in that much harder scenario. This simulation was part of a paper in 2022 in Nature Communications.
The key point is that you can close the loop and run in real time with the updates to the neural network. This is a completely synchronous system: there's no offline batch processing of any type, there's no initial data, and nothing is known when the robot wakes up.

The principle underlying that simulation has to do with what we mean by coverage in a robotic system. One version of coverage is raster scanning some volume. The problem with raster scanning is that it's really inefficient. Anyone who was tracking the search for flight MH370 knows that we raster scanned the bottom of the ocean, and part of what happened was that ocean currents drifted a lot of the debris away, which is why we couldn't find very much. Not that this is all about bio-inspired design, but we also know that whatever an animal does, it's not raster scanning when it searches for something. Biologists spend a lot of time thinking about visual saccades and other types of exploratory behavior, and we've worked with some of them trying to understand that as well. The key thing is that the way animals generate coverage of an environment takes into account the urgency of short time horizons, and we want algorithms that allow controllers to do that too.

Beyond the inefficiency, there's another thing deeply wrong with raster scanning, and it has to do with statistical learning properties. You can't make any continuous system truly i.i.d., but raster scanning is the worst-case scenario: it makes everything incredibly correlated as you collect data. The right-hand picture is actually pretty good. Part of what it's doing is making sure that sequential images, or whatever your sensor modality is, that sequential measurements are closer to i.i.d. than they would have been in, say, the left-hand picture. This gets closer to a principle of what the control architecture is intended to achieve. It's not intended to achieve a particular reward, as in reinforcement learning, and it's not intended to achieve a particular outcome in the neural network. It's intended to create the motor movements that feed the neural network data with the right properties in a reasonably time-efficient manner. The top of that list would be i.i.d. data; not actually i.i.d., just as close to i.i.d. as you can get with a continuous system.

Okay, I want to take a step back and ask how we should think about these information measures. I'm going to use them in a very cartoonish way, by the way, because I don't want to get stuck on debates about the right way to talk about them. Fisher information, entropy, and ergodicity, though ergodicity is probably less common in this crowd, are all things that in various ways represent the uncertainty associated with a distribution. Those distributions are the things we want to make independent of each other whenever possible, or as much as possible. Fisher information is how we talk about parametric uncertainties; entropy is how we talk about bits of information encoded in a distribution. The key point is that either one is reasonable for things like next best view.
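To make the next-best-view idea concrete, here is a minimal sketch in Python. It is my illustration, not code from the talk; `predict_fn`, the candidate views, and the discrete predictive distribution are all assumptions.

```python
import numpy as np

def predictive_entropy(p):
    """Shannon entropy (nats) of a discrete predictive distribution p."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def next_best_view(candidate_views, predict_fn):
    """Greedy one-step selection: pick the candidate view whose predicted
    observation distribution has the highest entropy, i.e., the view the
    current model is least certain about. predict_fn is a hypothetical
    interface mapping a view to the model's predictive distribution."""
    scores = [predictive_entropy(predict_fn(v)) for v in candidate_views]
    return candidate_views[int(np.argmax(scores))]
```

This is exactly the one-step-myopic strategy discussed next: fine for a single step, but with nothing that guarantees coverage over longer horizons.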
If you're only going to look one step into the future, you probably cannot do better than acknowledging that one of those steps will give you either the best Fisher information improvement or the best entropy improvement. But as soon as you're willing to look over a longer time horizon, it's no longer clear that those are the right concepts, because you can easily get stuck when you maximize information. Take the classic example: I've lost my keys and I'm going to look underneath the streetlight. One of the advantages of the streetlight is exactly that it illuminates features you don't otherwise have access to. If you naively implement Fisher information or entropy maximization, the streetlight creates a situation where you only ever look under the streetlight. You need some way to guarantee coverage characteristics in addition to prioritizing high-information regions of your domain.

This is where ergodicity comes in. Ergodicity relates the time-averaged behavior of a trajectory to the spatial statistics of the information distribution, or any distribution, but for this talk it will always be the information distribution. It's fine to think of it as the Kullback-Leibler divergence between the distribution and the trajectory, but keep in mind that this does not literally make sense: you cannot take the Kullback-Leibler divergence between things of different dimensions. You can think of ergodicity as a rigorous way of handling that somewhat obnoxious aspect of the mathematics, and that's a perfectly fine intuition. If these level sets represent the level sets of my information density, an ergodic trajectory is going to visit all the neighborhoods eventually, asymptotically, but it visits high-information-density regions earlier, and it's willing to commit energy to covering other high-information areas even if that means passing through low-information regimes. Information maximization, something like infotaxis, can very easily fixate on a particular high-information state. To be clear, there are lots of algorithms in robotics that have historically introduced local fixes to this problem in various ways; non-myopic methods do part of this. But ergodicity provides global-level guarantees that you're not fixating.

When we first started doing this, I think it took us a week to optimize the first ten-second trajectory. That does not sound like a practical robotic algorithm. But over the course of a decade we got to the point where we can optimize these things in milliseconds, and it became practical for runtime systems. The standard way of thinking about this is based largely on work by Igor Mezić's group out at UCSB; he has really been the statesperson of this entire worldview. The spectral representation basically says that the commonality between a distribution and a trajectory is that I can take the Fourier transform of both of them. As long as I'm willing to take Fourier transforms, I can compare Fourier coefficients, and if I can compare Fourier coefficients, I can define a norm in that space of coefficients.
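As a sketch of that construction (my own minimal illustration on the unit square, not the talk's code), compare cosine-basis coefficients of the target distribution with the time-averaged coefficients of the trajectory, with a weight that emphasizes low spatial frequencies; the basis and weight follow the standard spectral ergodicity construction, and all states are assumed to live in [0, 1]^2.

```python
import numpy as np

def _hk(k):
    # L2 norm of cos(k * pi * x) on [0, 1]
    return 1.0 if k == 0 else np.sqrt(0.5)

def fourier_coeffs(points, K):
    """Average cosine-basis coefficients of a point set on [0, 1]^2.
    For a trajectory, pass the visited states; for the target
    distribution, pass samples drawn from it."""
    c = np.zeros((K, K))
    for k1 in range(K):
        for k2 in range(K):
            fk = (np.cos(np.pi * k1 * points[:, 0])
                  * np.cos(np.pi * k2 * points[:, 1])
                  / (_hk(k1) * _hk(k2)))
            c[k1, k2] = fk.mean()
    return c

def ergodic_metric(traj, target_samples, K=8):
    """Weighted distance between the trajectory's time-average statistics
    and the target's spatial statistics. Smaller means more ergodic."""
    ck, phik = fourier_coeffs(traj, K), fourier_coeffs(target_samples, K)
    k1, k2 = np.meshgrid(np.arange(K), np.arange(K), indexing="ij")
    lam = (1.0 + k1**2 + k2**2) ** (-1.5)  # s = (n + 1) / 2 with n = 2
    return float(np.sum(lam * (ck - phik) ** 2))
```

A trajectory optimizer can then descend this metric directly, which is essentially the objective described next.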
That norm gives me an objective function I can optimize. What we figured out in a paper from 2016 is that you can optimize it using tools from optimal control. As an interesting aside, this is not a Bolza problem, so you cannot frame it as a traditional optimal control problem, but you can frame it as a traditional trajectory optimization problem. The fact that those two things are not equivalent was news to us, and certainly news to reviewers. But it was all worthwhile, because we were then able to work with biologist colleagues and understand behavior like this. In the upper right is an electric fish. It emits electromagnetic waves, and it has voltage recordings across its body that allow it to detect perturbations to those waves. The question is: if it's trying to localize a conductor so it can eat it, how should it move? What you're looking at here is the movement of a robot equipped with an electromagnetic field emitter and voltage recordings, updating in real time how it moves in order to localize the conductor. On the right, you see the evolution of its belief space as a function of its movement. The reason I like this is that on the right-hand side, the uncertainty seems to evolve in a very plausible way, and at the end you really know where the conductor is. On the left-hand side, would you have been able to guess that trajectory? I think I, at least, would not.

I get the question: would random exploration of the environment do the trick? It turns out the answer is no, for pretty good reasons. Movement on its own is not enough to enable electrolocation. You have to be pretty purposeful about how you coordinate; essentially, you can think of it as local triangulation as you go, because otherwise you lose too much of the value of your earlier measurements. I'll also say there have been recent updates to this. In particular, the Shetty, Silvério, and Calinon paper in the Transactions on Robotics from 2021 has an astonishingly good algorithm implementing exactly this approach, but in a way that takes advantage of tensor-train decompositions, so it's super fast. Sylvain Calinon has these beautiful drawings, done in museums, that use this technique. All this stuff is connected; we've done a bunch of drawing work as well, where we use it primarily as a way of explaining what ergodic control is. But they've taken this particular branch to a whole new real-time level, really turning it into a coherent control strategy.

What I want to talk about now is: well, that's not learning. That's just taking into account environmental uncertainty while searching for something you already know about. In electrosense, one of the challenges is that you're literally in the spherical-cow world: you can only model spheres. If you present any other crazy shape, you can't predict what the electromagnetic disturbances will look like, so you can't estimate orientation or other aspects of the object's relationship to the sensor. So we started thinking: what if the robot had to learn? What if I just told the robot there is an object in the water, and now I want you to move and populate a neural network that represents that object, so that you can hunt for it later?
In the case of electrosense, I imagine this as something like a little fish, something irregularly shaped and submerged in the water. The fish knows something is there; now it needs to learn a model of it so it can search for it in the future. We chose conditional variational autoencoders (CVAEs) because they're about the simplest form of unsupervised learning, and they have two key components. One is the latent representation, so that you're not solely memorizing the relationship between your position and what you saw the last time you were at that position; you have to have a latent representation, and an implicit latent representation would be fine too. The other is that a CVAE makes predictions about entropy, which is what we need. As we search, we need to know which regions the learning model has captured well and which regions still need more data. And as we collect data throughout those regions, we want to keep it all as independent as possible, so that the neural network has the mathematical properties it's supposed to have when you apply backpropagation. That's what we used.

Here's the result for that rubber duck example. On the left you see the seed result; this is what establishes the structure of the latent space in the CVAE. In the middle is random sampling, what you would get if you randomly sampled across the domain, and the associated prediction. Then there's the ergodic sampling. Something cool to note here is that the ergodic sampling samples around the duck, but it never gets right on top of the duck, because as soon as the duck fills the field of view, you stop getting good feature information. The active sensing actually makes sensible decisions. It's sort of like if I asked you to learn how to read, you would not put the piece of paper on top of your eye, even though that's obvious once you say it. When we first saw the system doing this, we were not totally sure how to interpret the doughnut hole in the middle of its exploration, until it turned out that the duck was filling up too much of the field of view. And indeed, when you use this, you get much better predictions and way more unit activity in the latent space. Each vector in the latent space is called a unit, and how many of those units are used in your generative model is one measure of the quality of the learning; you get way more of those units being used in the learning process. In other words, it's not just memorizing the field of view.

This works in simple academic laboratory settings. It turns out that making it work with real lighting, a real robot, and a real camera is challenging, even on a tabletop. Here we give it objects. I had program managers in my lab the other day, and the thing we let them do is select objects and put them anywhere they want on the table. The robot learns those objects, learns that they're different, categorizes them into different classifiers, and does that in five minutes or so. I think the key thing is that our mental model of how learning works is often an overnight model: you go out and collect a bunch of data open loop, and then you learn from it because you've got some enormous server cloud at your disposal. Instead, here the robot is quite sensibly just looking at stuff and regressing as it goes.
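A minimal sketch of that synchronous loop, under stated assumptions: `robot`, `cvae`, and `controller` are hypothetical interfaces named here only for illustration (not the paper's API), and the ergodic controller is assumed to track whatever spatial distribution you hand it.

```python
import numpy as np

def entropy_target(entropy_map, temperature=1.0):
    """Turn a grid of per-location predictive entropies (e.g., from the
    CVAE decoder) into a normalized spatial distribution for an ergodic
    controller. Temperature flattens the target so that low-entropy
    regions still get visited and coverage is never abandoned."""
    w = np.exp(entropy_map / temperature)
    return w / w.sum()

def run_episode(robot, cvae, controller, n_steps=300):
    """Fully synchronous: sense, take one gradient step, re-plan.
    No initial data, no offline batch job anywhere."""
    data = []
    for _ in range(n_steps):
        x, y = robot.state, robot.sense()            # pose and raw measurement
        data.append((x, y))
        cvae.gradient_step(data[-32:])               # online update as we go
        target = entropy_target(cvae.entropy_map())  # where the model is unsure
        robot.apply(controller.step(x, target))      # move toward uncertainty
    return cvae
```

The point of the sketch is the ordering: the network update and the control update happen inside the same loop, at every step.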
When the robot notices something in its regression that doesn't look right, it goes and looks again. It's very sensible; in fact, watching it actually do this is often the thing that's most convincing. I will also say that even though these are only tabletop demonstrations, once we had it working, it has never not worked, which I can say about no other hardware experiment in my life as a robotics person. There's something very, very reliable about this way of thinking about learning, and my claim would be that batch-processed computation does not have this property.

It also works for any sensor. I haven't emphasized this as much as I would like, but you can hot-swap the camera for a tactile sensor or an ultrasound, restart the entire program, and it learns a completely new sensor modality. It will learn to touch the rubber duck and to touch the houseplant, because it doesn't get any information in the rest of the 3D space. From the robot's perspective, there is no difference between visual data, aural data, and touch data. The reason that's important is that we overwhelmingly rely on anthropomorphic sensors because they're interpretable, because we can label them. As robots become increasingly deployed, partly just out of a need to make them cheaper, they are going to have to use sensors that are not anthropomorphic: materials-based sensors where all you know is that they have some sensitivity to the environment, but you can't say anything else about them. They might be touch-like, but not mechanoreception as we understand it. This approach allows you to build models directly with a sensor of that type.

I'm going to carefully watch my time here, but normally at about this point, someone says this is just reinforcement learning. So I'm going to say it isn't reinforcement learning, but there are interesting connections, and it's kind of a nice story. After a couple of years of people saying it was reinforcement learning, I finally decided, okay, I have to become a reinforcement learning person, and there really are interesting connections between the perception story up to this point and the way we typically implement reinforcement learning. I'm not going to talk about the philosophy of reinforcement learning, but in implementation we typically distinguish between model-based techniques, where you infer something about the model in addition to the reward, and model-free techniques, which infer something about the policy directly. For model-based learning, the state transition model has to come from somewhere; your dynamics, the way you interact with the environment, have to somehow provoke data about the model. For model-free learning, we have to ask where the policy comes from, and it typically comes from the ability to evaluate sparse rewards. These things are never actually just functions in the way we would think of them as undergraduates. They are always distributions, which means they always have entropy, entropy that is implied over the state space. And if you can figure out how to compute that, you can explore with respect to it.
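As a small illustration of that last point (my sketch, with made-up numbers): for a diagonal-Gaussian policy, the entropy implied at each state has a closed form, and you can normalize it over the state space into a target distribution to explore against.

```python
import numpy as np

def gaussian_policy_entropy(std):
    """Differential entropy of a diagonal-Gaussian policy pi(a|s), given
    per-action standard deviations 'std' (shape: [..., action_dim])."""
    d = std.shape[-1]
    return 0.5 * d * np.log(2.0 * np.pi * np.e) + np.sum(np.log(std), axis=-1)

# Hypothetical example: a 1-D state grid with state-dependent action noise.
states = np.linspace(-1.0, 1.0, 50)
std = 0.1 + 0.5 * np.abs(states)             # assumed noise profile
h = gaussian_policy_entropy(std[:, None])    # policy entropy at each state
target = np.exp(h) / np.exp(h).sum()         # exploration target over states
```

Whether reverse-engineering the entropy in the reward back into the state domain is tractable is exactly the difficulty raised next.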
We were thinking about this, and we thought: boy, somehow reverse-engineering entropy in the reward all the way back into my state domain sounds really hard. We couldn't figure out how to do it at first. So we said, okay, both of these things are pretty good; we're just going to ask how to synthesize controllers that use both of them simultaneously. Combining model-based and model-free control is sort of a standard thing, except that typically it's not done aggressively in time as the reinforcement learning is progressing. But you can do that. I'm going to move ahead here.

Here is a model-based approach to a standard HalfCheetah benchmark; it does okay. Here's the model-free approach, which I would say is characteristic of outcomes when you try model-free reinforcement learning in simulation. One of the points I want to make, at least in terms of convincing people that this is a real area where you could invest time and expect payoff: this is just about the most naive way of scheduling between model-based and model-free control that we could come up with, and it works spectacularly well.

I don't want to get stuck on the control aspects, but we thought about it as a mode scheduling problem using hybrid optimal control. We assume there's a model-free policy you already have at your disposal, and a model-based controller that you're willing to intermittently use when you think you have an opportunistic reason to do so. My group happened to have some background in optimal scheduling for problems of this type. The thing you're scheduling is when to switch to the other controller and how long to stick with it once you've switched. Interestingly, it turned out that LQG techniques did not work, and I don't know why. It may just be that we're bad at implementing LQG; that's a real possibility. But there might be deeper reasons why you want your control architecture to actually switch between the two instead of trying to smoothly blend them into one thing. The nice thing about this is that it's an explicit controller. This comes back to what I said before: even if your reinforcement learning has networks representing the dynamics, the policy, and the reward, this controller is a formula you evaluate at every time step, and that formula requires almost no computation compared to everything else you've already been willing to compute.

I won't spend a lot of time on the mode insertion gradient, except to say that it's a tool that computes the sensitivity of what you want relative to what you expect if you don't do anything. For instance, when you're walking out of the room, you could imagine that as you get to the back of the room, you're increasingly aware that if you do not turn left at some point, you're going to be in trouble if you just keep going straight. The decision not to turn has a cost, and there's both a space and a time element to it. And once you start turning left, you don't do it for just a millisecond; there's a relatively long duration of engaging in that new behavior before you see the payoff you're looking for.
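A minimal sketch of the mode insertion gradient and a switching rule built on it. This is my illustration of the standard construction, not the paper's code; it assumes the adjoint (costate) rho has already been integrated backward along the nominal, model-free trajectory.

```python
import numpy as np

def mode_insertion_gradient(rho, f_mf, f_mb):
    """First-order change in trajectory cost from inserting the
    model-based control for an infinitesimal duration at time t:
        dJ/dlambda = rho(t)^T (f(x, u_mb) - f(x, u_mf))
    rho:  adjoint/costate at time t (assumed precomputed)
    f_mf: dynamics evaluated under the model-free policy's action
    f_mb: dynamics evaluated under the model-based action
    """
    return float(rho @ (f_mb - f_mf))

def choose_mode(rho, f_mf, f_mb, threshold=-1e-2):
    """Default to the model-free policy; switch to the model-based
    controller only when the insertion gradient predicts a sufficiently
    negative (cost-reducing) effect. Choosing how long to stay in the
    inserted mode is a separate step, omitted here."""
    d = mode_insertion_gradient(rho, f_mf, f_mb)
    return ("model_based" if d < threshold else "model_free", d)
```

This is the sense in which the controller is an explicit formula: the switching decision is one inner product per time step.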
So what we do is look at an optimal control problem with two terms. One term says: how much am I expecting to benefit from a model-based approach, based on my current neural network representing the model? The other says: how bad is my current policy? I'm balancing those two things. We add them because we didn't know what else to do; if you have a different operation for combining them, please let me know. It's very simple. As is always the case, the goal in control is to find something with a closed-form solution. Replacing one hard optimization problem with another hard optimization problem doesn't help, but if you can find a relevant optimization problem that has a closed-form formula you can implement, then you've bought yourself something in terms of implementation. So what really matters here is that we have a formula. And if we start with a model-free policy that's optimal, we're also guaranteed that this control will not interfere with the performance of the system. People sometimes ask about the stochastic case; the stochastic case is not really that different. All of this is in an IJRR paper from 2022.

Okay, this is probably my favorite video from my entire lab. This is what it looks like when you compare the three approaches, and again, the only thing happening in the middle approach is that we're scheduling which controller to use, dynamically, as the system progresses. What I like about this is that we have a relationship with Intel where they were letting us borrow big server stacks, and we didn't use the server stacks for any of the actual calculations, but we certainly used them to make this video. First of all, ours also has some stupid behavior, which I want to call out: when you look at the distributions of how different seeds respond, some of them still perform badly. But the middle one does pretty well, while a lot of the others behave in nonsense ways, right? This is the standard situation: it's the difference between looking at what the benchmark actually does versus just looking at the plots of the reward curves, under which all of these look kind of good.

Now, looking at this, and taking into account that there are many hours of computation involved in even getting this for a very simple benchmark, I'd like to circle back to the original thing I showed you: you're in a plane, something dramatic has happened, and you're falling out of the sky. My claim is that this is not what I want fixing the situation. It's incredibly dependent on simulating something we might not be able to simulate. It's incredibly dependent on massively parallel deployments that we are not going to be able to make parallel. And even the good solutions are clearly bad solutions that I do not want when I'm hurtling toward the ground. This is part of what motivates the diversity of techniques in robot learning settings. My group has done quite a bit with Koopman operators, and I'll say something brief about them in a moment, but the motivation for Koopman operators is partly that if I want a runtime machine learning algorithm that can adapt in some crazy scenario where I have a safety-critical need to adapt, it is just not going to look like a massive neural network architecture that I'm regressing over. We're going to have to be willing to use specialized tools that are intended for that scenario.
So, Koopman operators. Again, this is somehow Igor Mezić; I feel like I've just followed him around intellectually for the last decade. Mezić is really the one who has promoted Koopman operators in recent times, and one of the points he makes most aggressively is that Koopman operators deal with time correctly. What he means is that in an operator representation, composing operators corresponds to something exact happening in your sample time. You can think of the linear-systems analogue: exponentiating your linear dynamics means something exact about the duration and what has happened to the state.

Koopman operators have other advantages too. They are linear operators, so you get to take advantage of techniques for linear control. And because they are linear operators, you can reason about their spectrum. You can ask: what part of my spectrum is currently uncertain, and how could I change my control to improve the information about that part of the spectrum, say the unstable spectrum? You can say: I'm specifically interested in the instability that is hurtling me toward the ground, and I would like to learn something about that. What data could I collect that would tell me about this clearly safety-critical aspect of my dynamics, rather than just learning generic features of the dynamics, like the fact that I'm falling? Those are reasons to use it, and the ease of control synthesis is another.

But there are other things, and this is part of what I'll end with: dynamical structure. Adding dynamical structure to a generic neural network is something people are doing, and doing interesting work on, but with Koopman operators, often that same dynamical structure is completely trivial. Conserved quantities are just zero eigenvalues, or identity eigenvalues in discrete time, right? Insisting on conserved quantities is trivial in this setting in a way that is at minimum not trivial in a neural network. Passivity constraints, stability, manifold structure: these are all things you can potentially build into a Koopman operator ahead of time. They all form a type of domain knowledge, but not domain knowledge in the sense of specifying an ordinary differential equation; rather, a different type of domain knowledge relevant to machine learning.

Basically, every Koopman paper has some version of this cartoon, and I think there are all sorts of things that are unsettlingly confusing about it, but the basic idea is this: ordinary differential equations take finite-dimensional states and allow nonlinear operators on those states. Koopman operators say, no, no, no: everything is ultimately going to be an infinite-dimensional state, but the dynamics will always be linear. There's been a ton of great work. My favorite recent work comes from Harry Asada's group. First of all, he's done super cool stuff regarding impact; secondly, his is not an obvious group to be doing Koopman operator work, which shows how far this has spread into robotics, away from its original mathematical origins. The basic idea is that if you have a Koopman-based controller, you can be constantly driving your robot with it, updating as you go, because updates to the Koopman operator are basically trivial. You can close that loop, and then you can also close the loop around information maximization.
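To ground this, here is a minimal sketch (my illustration; the dictionary of observables, the least-squares fit, and the stability projection are simplified stand-ins rather than any particular paper's construction) of fitting a discrete-time Koopman approximation from snapshot data, plus the stable-spectrum projection that comes up again below.

```python
import numpy as np

def lift(x):
    """Dictionary of observables. A real system needs a carefully chosen
    basis; here, illustratively: the state, quadratic terms, a constant."""
    quad = np.outer(x, x)[np.triu_indices(len(x))]
    return np.concatenate([x, quad, [1.0]])

def fit_koopman(X, Xnext):
    """EDMD-style least squares: find K with lift(x_next) ~ K @ lift(x),
    from snapshot pairs (X[i], Xnext[i])."""
    Phi = np.stack([lift(x) for x in X])
    Psi = np.stack([lift(x) for x in Xnext])
    A, *_ = np.linalg.lstsq(Phi, Psi, rcond=None)  # solves Phi @ A ~ Psi
    return A.T                                      # so psi_next = K @ psi

def project_stable(K):
    """Clamp eigenvalue magnitudes to the unit circle (discrete time),
    encoding the domain knowledge that the unforced system should not
    blow up. One simple construction; published ones may differ."""
    w, V = np.linalg.eig(K)
    w = w / np.maximum(np.abs(w), 1.0)
    return (V @ np.diag(w) @ np.linalg.inv(V)).real
```

Because the fit is linear least squares, updating K as new data arrives is cheap, which is what makes the closed-loop, high-rate use described next plausible.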
In particular, if you're willing to take as your information measure the Fisher information with respect to the spectrum of your Koopman operator, it turns out to be trivial to compute, and nothing about the neural network version of that would be trivial. This means you can run this active learning controller at 100 hertz, or a kilohertz with enough compute, without any trouble at all, for lots of systems, and that initially gives pretty good results. Here's the first version of active versus passive learning; this is what we published in the 2019 paper. That paper is titled "Active Learning of Dynamics for Data-Driven Control Using Koopman Operators," and I think it's one of life's ironies that, in my mind, the word everyone should fixate on there is "active," and what actually happened is that everyone fixated on "Koopman." Nevertheless, what we got was a bunch of hardware examples, and also simulated examples where there's clear benefit to being willing to invest in early information acquisition, even when the safety guarantee is something you care about later on.

But that didn't work all that well; the system still fell quite a ways. When we investigated what was wrong with the Koopman operator and its estimate, it turned out that even the unactuated dynamics were unstable, and unstable not just in the falling direction but in multiple directions. We looked at the geometry, and there was just no possible way that was right. We found some techniques from linear algebra that let you take a general linear operator and project it onto its stable spectrum. It turns out that if you do that, if you say, nope, I'm going to add a little bit of domain knowledge to my model: it's a Koopman operator, an unknown operator with unknown nonlinear dynamics, but I'm going to insist that when I'm not forcing it, it has a stable spectrum, then suddenly you get extremely reliable learning every time. The combination of the active learning with that very small amount of added structure on the Koopman operator solves this problem. We've done this with manipulation tasks, and with series elastic actuators and things like that. We have not done it with a drone, because the reality is we'd break a bunch of drones before it worked, but it's something I'm certainly very interested in doing.

Okay, just to wrap up and make sure we have time for questions. Robot learning as a field right now: there's so much interest in it, and so much of it is really about how machine learning should help decision making and control. I'm completely in favor of that; there are a lot of really amazing things we can do in that area. But control for learning is also an opportunity. We're currently in a space where there's a lot of tuning of networks going on to eke out the last little bit of performance, but a robot's ability to act and change its data experience is one of the knobs that is still largely unexploited. If we can turn that into a coherent framework that includes both things, I think there's a lot of opportunity here. It also enables using new sensor technologies, things we would not normally think of as sensors, because if you think something is a sensor, you just plug it into the robot and find out: can it build up a representation? It doesn't need you to confirm.
All you need are meaningful metrics of whether it can find the object again; if it can, then it must know how to use the sensor. It's very arm's-length. It's like the opposite of explainable AI, right? I think of this as explicitly unexplainable robots that somehow still do exactly what they're supposed to do. That's my goal, and I don't need the robot's internal life to be explainable to me if it is reliably doing what it's supposed to do.

Hopefully I've given you at least a way of thinking about Fisher information versus entropy versus ergodicity. I think ergodicity, as a formal and reliable way of characterizing coverage guarantees and asymptotic guarantees, is important. But it's also sort of the simplest version of active learning; it's proportional control for learning. There will be lots of things beyond it. In particular, once you start to model the dynamics of the neural network, so that you're making forward predictions about how it might change if you did something, that would be like PD control. You can imagine getting better and better at this, and this is really just the proportional version. And lastly, this is all about how to deal with novelty, but if you have to deal with novelty on really fast time frames, you're going to have to be willing to use specialized tools. It's not just going to be throwing huge neural networks at things.

Okay. I want to thank the amazing students who have worked on this with me, and the awesome collaboration and support from both federal sources and companies. I would like to field any questions that you have.

How is control for learning different from the traditional dual control that Feldbaum did in the '60s, the dual control paradigm? Are there any differences? What have we learned in the last 50 years that Feldbaum didn't know, that we can now do?

Yeah, okay. There are two answers to this. One is that the adaptive control community, in some sense, solved a bunch of these problems historically. Then the question is: why didn't we have all these technologies actually working? I think the answer is not so much that we lacked mathematical characterization; it's that we didn't know what the objectives should be. Duality does not in any way tell you what the goal of your learning system is; there wasn't even really the language yet to talk about that coherently. The second thing is that there was not enough acceptance of the idea that, for the things you want to do, there's going to be an art of implementation. As much as the machine learning perspective, where every network is its own artistic endeavor, frustrates me, I've started to embrace the idea that when you pair really good analytical techniques with people who have lots of experience implementing things in hardware, you get different results than if you tried to do it purely formally. Because formally, what you're primarily going to discover, or predict, is that nothing will work. I think this is even more true of the history of robust control: most of what we learned in robust control is that nothing will ever work, and we needed to somehow culturally get past that. We use and rely on safety-critical systems all the time that do not satisfy the requirements for robustness.
Part of it was figuring out how to use some of that artistry more aggressively. That said, for the adaptive control results from 60 years ago, as you say, many of the things we're currently rediscovering probably have a translation in that space. A lot of it is also just a disconnect of nomenclature, words, and variables. We certainly spend a lot of time reading very old papers, in addition to the constant tsunami of papers coming out every day. That may not be a satisfying answer. Any other questions? Yeah.

I love this idea of plugging in a random sensor and just letting the robot figure it out. Silly question: have you tried two sensors at the same time, maybe two arms with different sensors?

The place where that's most interesting to me is two different tactile sensors, because the tactile sensors from different companies have extremely different mechanical properties. It's really interesting to have two robot arms that, from their perspective, are touching the same thing, but whose perceptual understanding of what they're touching is completely different, and they need to somehow reconcile that. The coordination problem remains even with perfect dead reckoning: because you have encoders and everything, the arms absolutely know they're touching the same space, but how do they label that, and make sure that when they discretize the world into different objects, that object now contains both signatures, so that it can search with both fingers as it reaches through the world? That's also a set of pretty hard problems.

A related one: what counts as an object, maybe? Could you talk a little more about how you tell the robot, generically, that there is an object to figure out?

All we say is that the domain has stuff you should find interesting. If there's one object, there's eventually going to be one really high-entropy region. If there are multiple objects, there will be multiple regions with lots of entropy associated with them. But if you give it one object that's a sphere and another that's some crazy cactus thing, one of those is going to be so much more feature-rich than the other that the sphere will recede a little in the robot's representation. It'll probably still cluster them correctly, but at some point that becomes hard, not because the sphere isn't an object, but because it's boring.

Thank you so much for the interesting talk. I have a question about the example you showed where the robot is differentiating the two objects. What has the robot learned? Does it know they're different objects, or has it learned some physical property of the objects?

It has not learned any physical property; we are working on that now, where the robot actually pushes on things and generates mechanical properties. Right now, it learns that the objects are different because their latent representations are different. It's clustering over latent representations as a function of where in the domain the robot is.

So if next time the robot sees an object in the same category but slightly different, does it know they're the same category?

It does know it's the same category. My student has a big bag of rubber ducks from Amazon.
When you pour that big bag of rubber ducks into a pile on the table, it gets pretty confused about what it's looking at, because what it's seeing is lots of latent features that look a lot like rubber ducks showing up, but a really important part of being a rubber duck does not show up: suddenly those latent features are not grounded to a particular location in space. When it learns the rubber duck and the plant, interestingly, when it sees the plant, it sees it most likely as a plant, but it also sees it as a rubber duck, which totally confused us for a while. It turns out the reason is that the latent representation is encoding that these things have a location in the field of view, that they are similar, and that they have a center of geometry that has to be predicted. As soon as you create a pile of rubber ducks, that's no longer true; there's no notion of a center, so part of the latent representation gets broken.

Yeah, I think that's very interesting, that the robot is able to differentiate different objects. But I'm thinking, for instance, about representation learning: how do you model the same property among different objects, and how do you get the robot to learn that those are the same thing, that they have the same geometry or the same property, without any human input? How does the robot generate this knowledge of things, instead of just that different objects are different?

I don't have a good answer for why it works. I will say that the duck is a little confusing because it really is incredibly reliable. The plant is more representative of the problem you're talking about, because sometimes when it makes generative predictions, it'll add something that looks like a leaf the plant didn't have, and we don't really know why. There's no explaining why one prediction suddenly involves a new leaf, or why some other prediction is missing some leaves, but all of those predictions look pretty plant-ish. My first cut at answering this, which is completely a guess, is that this process is approximating from below: it's building up a representation of the rubber duck from a completely uninitialized neural network, and one of the advantages of that is that it's constantly, massively overestimating its uncertainties and acting accordingly. When we talk about building in a representation, oftentimes what we're looking for are pristine predictions, and this system will never produce a pristine prediction, even if we let it train for weeks. In five minutes it produces a useful prediction, but it will never produce something as pristine as what the representation-based techniques allow. Certainly, if we really built in a representation of geometry, for instance, so that the robot had a completely unambiguous representation of SE(3), that would improve lots of things. But then you'd have to be really confident that SE(3) is real for your sensory system, which we for sure would not be. With a camera, we probably would be, but for anything else, it's not.

All right, we have time for one more quick question, if there is any. Seth is going to crush me.

You're so cavalier: "and it always works." And then later, you were cavalier with "and I don't care why, because it always works." It makes me wonder: is it right to be that confident?
And if it's not, might you not want some explainability at hand?

That's a great question. I want explainability in the long run. I think explainability as a continuous-time requirement during research endeavors is misleading; it means we can't ask questions that are super useful to ask. I'm very interested in active learning as a certification process, for instance: once you've gotten to a point where you think you've learned something really well, you turn your active learner into an active adversary, because all the tools you need are sitting there waiting for you to do it. That still doesn't tell me what the robot is thinking. But might I believe that certification more because it actually happened on the hardware the robot is using, instead of in some in-principle mathematical representation? Maybe. And if you add to that mathematics that is intended for certification but is also intended to work with data that came from the hardware system, I would believe it even more.

My cavalier attitude comes from a year of seeing it work every time. Early on, my students and I didn't believe it either. Maybe no one else does, but I do, and that's a nice feeling. It's an example of not having something that only worked once, for the conference submission, right? Every time someone comes in, we confidently hand them objects and say, go for it. But in the longer run, I think there's also a question about what safety should mean if I'm falling out of the sky. We're no longer certifying safety, right? We're certifying something about the reliability of the system, in its damaged state, that gives me the greatest hope that things will be okay. It's a different question, so I'm not so much being cavalier as non-rigid.

All right. Okay. Let's thank the speaker one more time. Thank you, everyone.