Right, thanks, Seth. That's probably the most touching, impressive introduction I've ever gotten — I really appreciate it. I would like to talk about my research over the past couple of years, and also my research agenda for the next few years. Actually, when I was asked to present at this seminar, I underestimated the workload: last night I realized that I wanted to change the story a little bit and reorganize my existing research slides, so I rewrote the entire introduction yesterday, which took much longer than I expected. But I would like to share the plot and the story in my mind, so that you know where we are and where we want to go.

All right, let's start by checking the state of the art. Where are we? I'm not an expert in everything — I mostly do legged locomotion research, so I'm not too familiar with other types of robots — but I think these are generally perceived as the state of the art. For manipulation, one candidate is in-hand manipulation: solving a Rubik's cube with one hand, done by OpenAI. Solving a Rubik's cube with one hand is still pretty impressive. It's not at the level of human records, which are around three or four seconds — it's much slower than that — but it's very impressive.

Navigation: there are tons of navigation works, so I just picked one from my favorite colleague, Aleksandra Faust at Google — this is her team's work. This is a navigation robot navigating among moving obstacles. This is actually a micro-kitchen on the Google campus — really, people drink coffee here; it's one of the most popular places — and this wheeled robot can navigate while avoiding obstacles and moving humans. I view this as one of the most impressive works: you can see that the scenario keeps changing, and the robot still avoids the obstacles nicely. This is good, right?

And what's next? Let's not skip over drones. This is a paper called Deep Drone Racing. I'm not too familiar with drone racing, but the drone flies through the gates based on its vision sensors. I think this is one of the state-of-the-art results: pretty nice, stable flight with pretty good intelligence.

What about legged robots? This is a recent paper from a collaboration between Facebook and CMU, called RMA — Rapid Motor Adaptation. Now the robot can walk over very bumpy terrain: that small quadrupedal robot can walk on uneven terrain, step down stairs, and walk on a hiking path. This is one of the most impressive works, and it was published at this year's RSS. And when I say state of the art, I bet many of you think of this video of Atlas from Boston Dynamics. It can walk over obstacles, jump over gaps, step down stairs — do a lot of parkour-like stuff. By the way, I'm not sure you are familiar with this, but they have a really bad tradition of posting these videos right before paper deadlines, just to frustrate the other researchers. Really, they're bad guys. But my point is: yes, this is super impressive, right? And it's a humanoid — we know that this kind of agile humanoid locomotion is a super, super hard problem. I know that.
But is this enough? Is this the direction we want to go? I would like to show the state of the art in nature. Many of you have seen the agility of these dogs. I believe this one is much more agile than any existing robot: it can overcome many different obstacles, and it is very robust to failure — maybe it fails, but the body can recover, and it stays super agile. We're definitely not there yet. Okay, let's wait a couple more seconds until it's done with the course. Okay — this record is super impressive.

But let me make my point here. I'm not saying that this agility alone is what we want. Dogs are more than fast racers. Maybe we focus on the agility, but that agility is combined with many other features: intelligent decisions, careful motor control. They are different from fast race cars. And I believe what we want is not just fast racers. We want to see more impressive stuff, like this one: this dog is playing the piano. To do this, what do we need? We need more varied motions with nice eye-paw coordination, right? And it must understand the context — it should not hit the piano too hard. So this is a very good example of more intelligent behavior.

I have a friend at Google named Jie, and our shared dream is to train a robot dog to fetch beer from the refrigerator. This is different from just fast racing, right? This dog fetches the beer — it goes to the refrigerator, grabs it, and brings it back to us — which means it knows how to manipulate things, it can calibrate itself using just its vision sensors, it must understand the context, and maybe it must understand some language. I'm not sure what the trigger is — it could be verbal language, it could be body language, I don't know — but definitely we want something more intelligent than what we have now.

And that's the research goal of our lab. For the next five years — my tenure clock — I think this will be the research agenda for our lab: to develop intelligent robots that can make human life better. I don't want them to just keep walking on the same treadmill. Walking is fine — walking robots, locomotion, that's fine — but does it really make human life better? Maybe yes, maybe no. I really want them to do more meaningful things, and I believe we need more intelligence for that.

By more meaningful things, I mean a few things. Like in that last video: a robotic dog delivering some item from here to there. For instance, I'm kind of a lazy person, so I hope my robot dog can deliver food from Tech Square to my office in Klaus — I don't want to walk twenty or thirty minutes to pick up a burrito from Moe's. Or they can be lifelong companions for humans: they can do closer human-robot interaction, they can support humans, they can give emotional support. Seeing this, I think the robotic dog has great potential. One thing I really want to do is develop a guide dog. I think the robotic guide dog has pretty high potential. Of course, real guide dogs are expensive to train, right? We cannot do mass production of guide dogs, but we can mass-produce robotic dogs. They can see the world with their vision sensors, and they can be combined with other AI technologies.
They can also explain the world to us. Usually dogs don't talk in human language, but a robotic dog could explain things to us and communicate with us in verbal language, while providing the same service as a regular guide dog. These kinds of things can make our life better, and these are some of the research agendas of our lab.

Tools-wise, we are very interested in deep reinforcement learning, which is a pretty effective, automated tool for developing highly intelligent agents. We have all seen the big achievements, like the game of Go — AlphaGo versus Lee Sedol. I watched all the games live, and I was so sad that Lee Sedol lost. That was kind of the big showcase of deep reinforcement learning. There's also autonomous driving: sometimes those systems are developed based on manually scripted rules, but many of them still leverage deep reinforcement learning as well. So deep reinforcement learning will be a good tool for us for many years.

So today I would like to talk about a few things. First, I would like to discuss RL for locomotion. I want to do many things, but I cannot just start on all of them right away: for legged robots, we first need to learn some useful locomotion skills. In particular, I'm interested in overcoming the sim-to-real gap by leveraging real-world data — I've actually conducted a whole series of works in this direction. That's the first thing I would like to discuss. Then I would like to discuss how to learn more behaviors, and finally, navigation while understanding real-world environments. Before I begin: please interrupt me at any time — I would love to hear your questions.

All right, let's start with the first line of research, which is learning to walk via deep reinforcement learning. The basic idea is simple: we want to train locomotion policies by deploying reinforcement learning. But there is a reason why this is hard. The biggest hurdle for deep reinforcement learning is that simulation-optimized controllers often do not work on hardware, which is the so-called sim-to-real gap. This is the policy in simulation, but if we deploy it directly, the robot falls immediately, and there's a high chance it burns out the motors. And some real-world terrains are actually even tougher: we cannot easily simulate them. For instance, this is a memory foam mattress, three inches thick, which is a deformable body — there is "memory" in its dynamics. Or this doormat: it has a complicated texture and is also deformable. We cannot simulate everything in the virtual world. So one of the big philosophies at Google is to train directly in the real world: learning in the real world is intrinsically free from the sim-to-real gap. For instance, my colleagues at Google trained TossingBot — it learns how to toss objects by using real-world experience. And I had a small project training policies in the real world for these small reconfigurable robots. This is the automated training environment — a very simple setup — and you can train stable locomotion in the real world: no simulation data; it learns directly from the real world. But actually, deploying deep reinforcement learning on a real robot is not straightforward.
On top of the existing challenges, we must consider two agendas. The first is a sample-efficient learning algorithm: in simulation we can use millions or billions of samples, but in the real world the numbers are much smaller. The second is a safe, automated learning system; otherwise it requires a lot of manual intervention, or you will break the robot or its surroundings.

If successful, this is one illustration of the learning process. This robot is called Minitaur, a very small quadrupedal robot with a planar structure and eight degrees of freedom, two for each leg. Within — well, I would say within about 1.5 hours — it collects experience and learns how to move forward.

I believe most of you are familiar with reinforcement learning, but just in case: a reinforcement learning agent tries to maximize reward by interacting with the environment. The agent applies an action to the environment, the state changes, and the environment evaluates how good the motion is by giving feedback through a reward. That's a very simple summary of reinforcement learning. There is a framework called maximum entropy reinforcement learning, which optimizes both the expected return and the entropy of the policy. This formulation says: you should achieve the task, but at the same time you should maintain a certain minimal level of policy entropy, to explore a wider variety of situations. This promises better sample efficiency and more robust policies.

Here is our algorithm: soft actor-critic with a learned temperature. Basically, we modified the existing soft actor-critic algorithm: we reformulate the entropy term as a constraint, and then we solve the constrained maximization using a Lagrangian formulation. In a nutshell, we do not care about the entropy term if the current exploration is sufficient, but we start to care about it more if our exploration is not enough. So this temperature term, alpha, is adjusted automatically. It turns out that our algorithm outperforms the existing baselines — this was around 2019, and I think many of them are still close to the state of the art. It outperforms baselines like DDPG, TD3, PPO, and SAC with a fixed temperature by regulating the policy entropy. This is one of the benchmark tests: you can see that it achieves a higher reward, and on the right-hand side you can see that it maintains the entropy at the right level. It turns out this algorithm works surprisingly well in the real world. It can learn to walk on flat ground, which is the same as the training environment, but because the entropy term encourages wider exploration, the policy is pretty robust: it can walk on a slope, on a very unstable platform, and it can walk down stairs — which it never saw during training.
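To make the temperature adjustment concrete, here is a minimal sketch of the dual (Lagrangian) update, written in PyTorch style. The variable names, learning rate, and optimizer choice here are illustrative assumptions, not our exact implementation.

```python
# Minimal sketch of the automatic temperature (alpha) update in SAC,
# assuming log_pi holds log-probabilities of freshly sampled actions and
# target_entropy is the minimum entropy H-bar the policy should keep.
import torch

log_alpha = torch.zeros(1, requires_grad=True)        # optimize log(alpha) so alpha stays positive
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def update_temperature(log_pi: torch.Tensor, target_entropy: float) -> float:
    """One dual gradient step on the temperature.

    If the policy entropy (the mean of -log_pi) is above target_entropy,
    this loss pushes alpha down (exploration is already sufficient);
    if entropy drops below the target, alpha grows and the entropy
    bonus is weighted more heavily.
    """
    alpha = log_alpha.exp()
    alpha_loss = -(alpha * (log_pi.detach() + target_entropy)).mean()
    alpha_optimizer.zero_grad()
    alpha_loss.backward()
    alpha_optimizer.step()
    return float(log_alpha.exp())
```

The sign of the loss is what implements the constraint: the gradient only flows into alpha (the policy's log-probabilities are detached), so the temperature rises and falls to keep exploration at the target level.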
All right, that was the first challenge, about the algorithm. But unfortunately, at that moment we didn't have a scalable and safe learning environment. It actually required human intervention every two to three hours, and sometimes we needed cooling time. It was painful to do experiments without a scalable learning environment. If you imagine a similar setup for manipulation: the arm is fixed, so you can reset it to any position at any time, and there is no big safety issue. It is much more challenging for an underactuated legged robot — we have to deal with many challenges. Let me show you some examples. (Oh, I didn't know the sound was on.) The robot can dive forward, rolling left and right very quickly. Or it can flip to the side — I'm not saying it jumps to flip; it leans left and right and very gently starts to flip over. And it's a legged robot, right? So it can definitely step out of bounds and start to hit the fence, and it can get tangled with its cable. That last one really hurts the learning process, because the cable tension contaminates the collected samples.

So we fixed all of these, sometimes using gadgets, sometimes by developing policies, sometimes by developing algorithms. For example, we added an additional safety constraint to our formulation: we want to keep the failure rate at zero, and this can be solved with the same approach, Lagrangian relaxation. It turns out this gives us a much lower failure rate compared to a fixed penalty weight, while achieving reasonable performance. We also automate the resets using a multi-task learner. The idea is simple: we interleave forward and backward learning depending on the robot's position. If the center of the workspace is in front of the robot, we train the forward locomotion task; if the center is behind it, we train the backward locomotion task. This effectively minimizes the number of failures — the number of cases where the robot steps out of bounds. This is a heat map of the robot's positions in 2D, and you can see they are attracted toward the origin. We also developed a recovery policy that can automatically reset from a wide range of states. We developed this one manually — it took some time — but it almost never fails: it can successfully make the robot get up from a fallen position. With all these inventions, we reduced the number of human resets from 89 to 0. That means you don't need to lift this several-kilogram robot 89 times. Instead, you just unbox the robot, put it on the ground, wait a couple of hours, and you get a locomotion policy.
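As a concrete illustration of the multi-task scheduler I just described, here is a minimal sketch. The function and task names are hypothetical; a real implementation would read the robot pose from the state estimator or motion capture.

```python
# Minimal sketch of the multi-task scheduler that removes manual resets,
# assuming a planar workspace centered at the origin. Names are illustrative.
import numpy as np

TASKS = ("walk_forward", "walk_backward")

def select_task(robot_xy: np.ndarray, heading: float) -> str:
    """Pick the task whose rollout drives the robot back toward the center.

    heading is the robot's yaw angle. If the workspace center lies in front
    of the robot we train forward walking, otherwise backward walking, so
    successful rollouts of either task keep the robot inside the bounds.
    """
    to_center = -robot_xy                                  # vector from robot to the origin
    facing = np.array([np.cos(heading), np.sin(heading)])  # robot's forward direction
    return "walk_forward" if np.dot(to_center, facing) > 0 else "walk_backward"
```

The design choice is that the scheduler never needs a reset mechanism of its own: learning either task is useful, and the selection rule turns the workspace boundary into an attractor toward the origin, which is exactly what the heat map shows.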
If we fast-forward a year later, you can see the robots travel a much longer distance. Let me share our learned policies: this is the forward gait, and this is the backward one. I like that they come up with slightly different locomotion styles depending on the foot structure. Now, if we recall the initial motivation, we wanted to learn on the real robot because of challenging terrains that are very hard to simulate, right? This is the initial exploration on the memory foam mattress, one of those challenging terrains, and this is the state after a few hours: it gives us a pretty nice forward policy. The forward one does something like a pronking gait — but not quite the same — and the backward policy is more like an energetic pacing gait.

We can repeat the same experiment on the doormat: here is the initial exploration, and after a few hours we see some improvement; then it gives us a nice forward gait. I actually like this movement, where the leg gets stuck between crevices and the robot tries to pull it out. I believe this kind of detailed behavior would not be obtainable from a policy learned in simulation. The backward policy here is somehow a very energetic pacing gait — it almost flips over, but it doesn't.

Another benefit of learning directly from the real world is that we can learn turning gaits as well. Here the problem is that the Minitaur robot has a planar leg structure, so it is really hard to hand-design nice, effective turning gaits. In simulation everything is perfectly symmetric, so it cannot learn a nice turning motion there; but in the real world it can exploit the slight differences between the left and right sides, like the mass distribution. Then we combined everything together, so we can control the robot with a joystick. Once again: you just unbox the robot, define some workspace, and after a few hours — when you come back from your work — you will be able to drive the robot with the joystick.

All right. So until now, we learned everything in the real world — no simulation data at all. But is this the best we can do? Doesn't the simulation contain some useful information? So in our recent work, we pre-train a policy in simulation and fine-tune it in the real world. The setup is the same — we jointly learn the forward and backward policies — but now we use pre-training. Of course, we made some changes for fine-tuning. The first is that instead of learning simple locomotion from scratch, we take a motion imitation framework. This is the very nice motion imitation work of Jason Peng: they track a reference motion — the right one is the reference motion, and the left-hand side is the simulated one — and we use a similar framework. The second change is that we pre-train in simulation using SAC, but we fine-tune with REDQ, randomized ensemble double Q-learning. The key idea of this technique is to learn a randomized ensemble of Q-functions. It is model-free, but it turns out to be actually faster than model-based RL methods like MBPO. Honestly, we still don't 100% understand why REDQ is so fast; our intuition is that it nicely prevents the overestimation and underestimation problems of the Q-function, and that's probably one of the reasons. But anyway, it is super sample-efficient.
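For reference, here is a minimal sketch of the REDQ target computation I just described, assuming an ensemble of target Q-networks. The names and default values are illustrative rather than taken from the paper's code.

```python
# Minimal sketch of the REDQ Bellman target, assuming q_targets is an
# iterable of N target Q-networks and a SAC-style entropy term.
import random
import torch

def redq_target(q_targets, next_obs, next_act, next_log_pi,
                reward, done, gamma=0.99, alpha=0.2, m=2):
    """Bellman target using the min over a random subset of the ensemble.

    Instead of taking the min over all N target networks (too pessimistic)
    or using a single one (overestimates), REDQ draws a small random subset
    of size m, which keeps the Q-estimate bias in a healthy range and allows
    many gradient updates per environment step.
    """
    subset = random.sample(list(q_targets), m)
    q_min = torch.min(torch.stack([q(next_obs, next_act) for q in subset]),
                      dim=0).values
    return reward + gamma * (1.0 - done) * (q_min - alpha * next_log_pi)
```

The high update-to-data ratio this enables (many critic updates per real-world step) is what makes it attractive for fine-tuning on hardware, where every sample is expensive.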
And the third change: we learned an agile and robust reset controller in simulation. Instead of using a manually designed reset controller, we trained the reset controller with deep reinforcement learning, and it turns out it resets in a very agile way — roughly five to ten times faster than the company-provided reset controller — and it is also very robust in the real world. With these changes, with our new version of the framework, we can go outside: we go to the backyard and try to fine-tune the pre-trained policy. This is a slightly scaled version of the existing walking gait, and after a few minutes we start to see some improvement.

Here is the comparison. The pre-trained policy does not move forward much, but the fine-tuned one can go forward very nicely. Backward, same story: the pre-trained one tends to fall, probably due to the shape of the legs, but at the end, the fine-tuned one can go without falling. I think this fine-tuning direction has a lot of potential. There are many existing ways of doing fine-tuning in the real world — for instance, using meta-learning, or exposing a latent variable and fine-tuning those knobs in a low-dimensional space. We compared our method with SAC, the existing RMA, and latent-space learning; we believe ours is the best at the moment, but we will see — we will keep working on this research direction. All right, so this is our progress so far on RL for locomotion.

Any questions? [Question about the safety constraint.] That's a good question. For the safety constraint, we measure certain dangerous scenarios — for example, the center of mass going below a certain threshold — and then we try to avoid those situations within the reinforcement learning formulation. [Question about MPC.] Speaking of MPC, thanks for bringing that up, because from now on I'll talk about model-based control and also a hybrid approach combining deep reinforcement learning and model-based control.

So now I would like to talk about how to learn more useful and expressive behaviors. As I showed in the earlier slide, these are robotic movements, and natural movements are much more than that, even setting aside the outer appearance. If we just show, say, skeletons of the same motions, we can easily tell which one is robotic and which one is from the real world. So what do we need for more natural motions in the wild? That's something I would like to talk about for the next five to ten minutes.

I would like to share some of our recent work. This is a collaborative work with Facebook, and the first author will join our lab pretty soon. The title of the work is model-based motion imitation for agile, diverse, and generalizable quadrupedal locomotion. In short, we want to develop model-based control for imitating natural movements; that's the main topic of this paper. Deep reinforcement learning is a convenient way of imitating a given motion — the DeepMimic paper of Jason Peng made a lot of impact in the computer graphics and robotics communities. It's good, but it often does not transfer to the real robot: the robot falls down if we don't take care of the sim-to-real gap very carefully. The other downside is that it is super data-hungry: it takes around 100 million steps for a single motion. I'm not talking about imitating an entire behavior repertoire — learning a single motion takes around 100 million steps and a significant amount of time. So in this paper, we ask the question: can we directly develop a model-based controller that can imitate a given motion? That means we don't need any learning: we develop one controller, and the same controller can track all the different motions — turning, trotting, pacing, side-stepping. A single controller is enough. I don't want to claim we solved the problem, but we were able to achieve pretty good preliminary results.
The big core part of this work is the model-based control: how to control the robot to follow the given motion. The second important component is trajectory optimization, because motion capture data sometimes has skating and other artifacts, and we want to fix those artifacts in a pre-processing stage using trajectory optimization. I don't want to dive too deep into the mathematical equations; I'll just give you the high-level intuition. Rule number one: we compute the right contact forces to track the given center-of-mass trajectory — a contact force for each foot. When a foot is in contact, we can apply some force through it; when it is in the air, we cannot apply force from that foot. That's rule number one. Rule number two: we adjust the foot positions if the robot moves too fast or too slow. This is the core idea of the very traditional Raibert controller. By combining rule number one and rule number two, we were able to achieve good tracking performance on real robots. And second, we clean up the reference motion using trajectory optimization: we fit the given motion with dynamic movement primitives, DMPs. As you can see in this plot, the blue curves are the DMP fits, and they yield much better tracking performance against the ground truth, which is the gray curve.

The results are pretty nice. Let me show the results in simulation: the same model-based controller can track all the different types of motions, and it is actually even better than the reinforcement learning baselines. And not only in simulation — it can be deployed to the real world. We didn't get a good chance to run a side-by-side hardware comparison against deep reinforcement learning, but our model-based controller is quite robust, so it can do many things in the real world as well. Once again, there is no training: just a single controller for all the different types of motions. So I think this points to an interesting research topic: data-driven model-based control.
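To give a flavor of those two rules, here is a minimal sketch. The PD form of rule one, the gains, and all the names are illustrative assumptions — the actual controller solves for contact forces more carefully, for example with a QP under friction-cone constraints.

```python
# Minimal sketch of the two rules described above, in the spirit of a
# Raibert-style quadruped controller. All gains and names are illustrative.
import numpy as np

def stance_foot_force(com_pos_err, com_vel_err, weight_share,
                      kp=800.0, kd=40.0):
    """Rule 1: a PD-style desired contact force for one stance foot that
    pushes the center of mass toward the reference trajectory, plus this
    foot's share of gravity compensation. Only stance feet get a force;
    swing feet contribute nothing."""
    return kp * com_pos_err + kd * com_vel_err + weight_share

def swing_foot_target(hip_pos, body_vel, body_vel_des,
                      stance_duration, k_raibert=0.03):
    """Rule 2: the Raibert heuristic. Place the landing foot under the hip,
    shifted by half a stance period of travel, plus a correction
    proportional to the velocity error: moving too fast means stepping
    further out; too slow means stepping shorter."""
    return (hip_pos
            + 0.5 * stance_duration * body_vel
            + k_raibert * (body_vel - body_vel_des))
```

The appeal of this structure is exactly what the talk emphasizes: nothing here is trained, so one controller with fixed rules can track many different reference motions.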
Actually, I would like to highlight another collaboration, this time with Google. This one is about visual locomotion. Basically, to do more useful things, we need hand-eye coordination — otherwise we cannot do much. In the context of locomotion, the question is: how can we negotiate challenging terrains using visual inputs? This robot has two vision cameras, here and here: one facing forward and one facing down, so that it can overcome challenging terrains. The key idea is to combine deep reinforcement learning with model-based control for sample efficiency: deep reinforcement learning for the eye-foot coordination, and model-based control for the low-level control. Combining with model-based control gives us much better sample efficiency — better learning curves.

Oh, by the way, there was another really important topic in this project: sim-to-real transfer. As I said, applying deep reinforcement learning to a real robot has the drawback that we have to overcome the sim-to-real gap. Here, we identified that a lot of this gap comes from the visual inputs. On the left-hand side you can see the simulated and real images — they are different. For instance, there is this band, which is an occlusion caused by the robot's body. So we add noise and do some inpainting in those regions, and we downsample the images and apply some additional filtering. Then we get these images, which have a much smaller gap.

We then evaluated our approach on a variety of complex terrains; I'll just show you a couple. The first one is stepping stones with small gaps: we cannot take arbitrary footsteps, but we can walk over them. The second course is quite hard to walk on — the heights are randomized — but by replacing the low-level controller, we can overcome it with a slightly different style. We can also handle more challenging cases, like moving stepping stones, or walking across posts, which requires pretty precise eye-foot coordination. And we can deploy our method in the real world as well: it can walk on the stepping stones and the platforms. But, as probably you all know, sometimes they fail in the real world — we don't want to hide our failure cases. Sometimes they fail due to the sim-to-real gap, and we need those safety gadgets to prevent further disasters.
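As a rough sketch of the image processing just described for closing the visual gap, the pipeline might look like the following, assuming OpenCV. The mask source, noise level, and output resolution are illustrative assumptions, not the project's exact parameters.

```python
# Rough sketch of the visual sim-to-real processing described above,
# assuming 8-bit single-channel camera images and a binary mask marking
# the band occluded by the robot's body. Parameters are illustrative.
import cv2
import numpy as np

def process_camera_image(img: np.ndarray, body_mask: np.ndarray) -> np.ndarray:
    """Bring simulated and real images closer: inpaint the body occlusion,
    inject noise, downsample, and blur away high-frequency detail."""
    # Fill the occluded band using surrounding pixels.
    img = cv2.inpaint(img, body_mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
    # Add pixel noise so the policy cannot overfit to clean simulated pixels.
    noise = np.random.normal(0.0, 5.0, img.shape)
    img = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    # Downsample and smooth; low resolution hides many sim/real differences.
    img = cv2.resize(img, (32, 24), interpolation=cv2.INTER_AREA)
    return cv2.GaussianBlur(img, (3, 3), 0)
```

The same function is applied in simulation during training and on the robot at deployment, so the policy only ever sees the common, degraded representation.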
All right, let me jump into the last topic, which is navigation. Basically, I wonder if these robots can understand their surroundings to do more meaningful tasks. I said navigation, but navigation in the narrow sense is only part of it — it's really about understanding the surroundings so that the robot can do meaningful tasks, for instance doing something in a kitchen. I believe many companies — Toyota, Nvidia — are all working on kitchen environments, because the kitchen is so crowded and requires a lot of manpower for house chores; it's a very challenging workplace for robots.

I've been working on the navigation of legged robots for a few years now, and I have tried different approaches. The first was imitation learning. Data collection on a legged robot is cumbersome, so I tried to imitate the navigation of humans instead. What I did was ask my intern to mount multiple cameras on his body and randomly walk around the room many times. This gives us these multi-perspective demonstrations, and then we try to extract meaningful information from this much richer data; basically, we have to translate it to the robot's perspective. For more details, please refer to the original paper. It turns out that we can learn a successful navigation policy out of these multi-perspective human demonstrations. The navigation distance is short — this is kind of a proof of concept — but the robot can find the desired goal image on its own, from unstructured human demonstration data, without any prior knowledge.

We also tried deep reinforcement learning: we approached the iGibson challenge by combining two approaches. The first is training with deep reinforcement learning in the Habitat simulator and doing sim-to-sim transfer to the iGibson simulator; the second is data augmentation — basically, we add a lot more pedestrians to the simulation. This gives us pretty nice performance. The baseline without the augmentation is unsuccessful: it runs into a human or large furniture. With our method, where we augment the environments with a lot more people, it can successfully navigate between humans and get to the final destination.

But I would like to highlight our recent project, which is learning to navigate sidewalks in outdoor environments. This challenge directly connects to, you know, last-mile delivery. At the same time, it was surprisingly challenging — can you guess the challenges? I had been doing mostly indoor navigation, and I thought: okay, what can be the additional challenges outdoors? We simply change the environment from indoor to outdoor. It turns out we face a lot of new issues: we have to deal with much more sensor noise; the context is unstructured; the lighting changes constantly — the sun keeps moving — and there are different weathers. Also, in this case we don't have a specific scan of the target environment: we need to operate in new scenes. And there are no guidelines — it's nothing like autonomous driving, where there are lane markings and cars move in a certain organized way. There is no guideline; humans just walk wherever they want. And many more. So this was a very challenging problem, and we had to come up with a better way to train a more intelligent agent.

Our key approach can be summarized in two parts. The first one is learning by cheating. This framework was developed by Chen et al. a couple of years ago. The key idea is to first train a teacher agent with privileged information that cannot be accessed by a typical robot — that's the cheating — like knowing the positions of all nearby cars perfectly. Then we do behavior cloning into a student agent — called the sensorimotor agent in the original paper — which only has realistic sensors. We apply the same technique in our framework, but our teacher agent is now trained by deep reinforcement learning: we first train a capable teacher with privileged information such as a bird's-eye view, something like this, so it basically understands the surrounding context perfectly. And then we clone this agent: our student only has egocentric visual inputs, and we clone the learned behavior into it. That's one key technical component of the project.

The second component is sim-to-real: basically, we had to find the right sensing modalities. Previously, we all kind of assumed that including more sensors is helpful when we do deep reinforcement learning. But in this case, that's not true: sometimes adding a sensor introduces more noise, and we know that deep reinforcement learning is vulnerable to extrapolation errors. So we investigated different sensing modalities, including RGB, lidar, semantics, and semantic features — the layer right before the inferred semantics. Eventually, we found that the combination of GPS, lidar, and RGB semantics is the best combination.
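Going back to the learning-by-cheating step, a minimal sketch of that distillation might look like the following; the network interfaces and batch keys are hypothetical placeholders. Note that the student consumes exactly the realistic modalities we ended up choosing, while the teacher's bird's-eye view is only needed in simulation.

```python
# Minimal sketch of the learning-by-cheating distillation described above:
# clone a privileged teacher (trained with RL on a bird's-eye view) into a
# student that only sees realistic sensors. Names are illustrative.
import torch
import torch.nn.functional as F

def distill_step(teacher, student, optimizer, batch):
    """One behavior-cloning step: the student imitates the teacher's action
    from its own egocentric observations, so the privileged input is needed
    only at training time, never on the real robot."""
    with torch.no_grad():
        target_action = teacher(batch["birdseye_view"])   # privileged input
    pred_action = student(batch["rgb_semantics"],
                          batch["lidar"],
                          batch["gps"])
    loss = F.mse_loss(pred_action, target_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The split matters because RL with privileged state is easy and supervised cloning with realistic sensors is stable — together they avoid running RL directly on noisy real-world observations.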
How did we find this? We really did a lot of experiments. For instance, we tested the agents outdoors in the real world. On the left-hand side is the semantic-feature agent, and this is ours. Our success rate is really nice, while the other agents step out of bounds or bump into objects — see the numbers in the corner — and the pure-semantics agent is really vulnerable to gaps in the inferred semantics.

Combining these techniques, we were able to walk many trails in the city of Atlanta — more than three kilometers combined. Here is a fast-forward of a few successful trials. They still need a certain amount of human intervention, around five-ish per trial, but they work surprisingly well in the real world: they can avoid oncoming humans, and they can really navigate totally unseen environments without much of a problem. But as always in robotics, the really fun part is the failures. They occasionally fail for many reasons, including failing semantics, drifting GPS, or driveways — sometimes the agent is confused by a driveway and tries to follow it instead of the sidewalk. So there are a lot of failure cases. Still, overall we're doing pretty well; sometimes they fail, but our robot does a great job.

In summary, our lab is working on more intelligent robots. We aim for better visuomotor control, including navigation, and we aim for more expressive locomotion — not just robotic locomotion. We want more meaningful interaction with the environment and with humans.

[Question about imitating dog motions.] So your question is that legged robots and real dogs have different body structures. Still, even so, I think the semantics of the motions are quite transferable. In that sense, imitating motions — being a dog lover, inspired by dog motions — and trying to achieve that expressiveness in locomotion behavior is a very meaningful problem. At the same time, I wish the next version of legged robots becomes more similar to dogs, with spines and tails. That's what I want to see, but I'm not really an expert on designing new robots, so I'm just requesting my colleagues to please give me more natural-looking robots.

[Question about reproducibility.] Thank you — reproducibility is actually a hot issue in academia, because some papers publish numbers, and when we try to replicate the work, we don't get the same performance. It's a really hard problem. First of all, I would like to give credit to the researchers here: there's a nice paper called Implementation Matters, which basically shows that the small details matter in reinforcement learning, and the same goes for robotics. So we try to share all the details — actually, people and the entire community put a lot of effort into sharing details, like open-source hardware or publishing the code. Sometimes it's really hard. I would also like to highlight another effort: some researchers are investigating offline reinforcement learning — basically, training only on existing offline data, just like, you know, a classification problem. Honestly, I understand their point, but I'm a big fan of real robots: I think whatever we can demonstrate on the real robot is what really counts.
So I think this is a big topic we will need to face for many years, but I'm not sure what the real answer to that question is.

Sorry, can you speak up a little more? [Question about policy architecture.] I'm not sure exactly what you're asking — whether we do single-policy control or multi-policy control, I guess. It really depends on the project. Sometimes we train one control policy for everything. Sometimes — for example, in the learning-on-the-real-robot project — we train different policies for the forward and backward gaits, and then we have a gating policy to select which one to use. Or sometimes we can fuse everything into one big policy using behavior cloning. So there can be different architectures depending on the project.

[Question: if you run the training multiple times, do you get the same gaits?] No, not really. It really depends on the reward function design. Sometimes one gait has a large region of attraction, so every time we rerun the learning it converges to that specific gait; but sometimes it converges to quite different behaviors, depending on the definition of the task.

[Question about the action space.] Typically we just generate target joint positions. Actually, let me correct myself: sometimes target joint positions, but sometimes high-level commands to a low-level MPC — both cases. And I need to make one more correction: for the Minitaur robot, the action is a really low-level motor control signal — the current to the motors — which can be viewed as torque control. So you can do any of these.

By the way, I would like to share one thing; this is a little bit off from my research itself, but I'm seeing an interesting trend in the community. My friends at the IT companies, whose expertise is deep reinforcement learning, are starting to learn about model-based control, because they are somewhat frustrated by the performance of deep reinforcement learning, and they think that by incorporating more prior knowledge they can make their learning better. That's one trend. On the other hand, my friends in the model-based control community are intimidated by the recent progress in the deep learning community, and they are trying to learn deep reinforcement learning to make their controllers better. So these two different communities are trying to learn from each other, and in the next few years, I think there will be a lot of interdisciplinary research topics in the community. I think that trend is very interesting.

[Final question.] All right — I think we can do both learning and model-based control; I don't want to limit the approach to one of them. Actually, I skipped some parts, but I have some preliminary work using meta-learning for the case where one motor becomes very weak. How can you react to those situations? With reinforcement learning, you can handle this with meta-learning.
With model-based control, you can try to estimate what's going on and then, with MPC, you will probably be able to react to the given situation. Thank you.