All right, thank you. Okay, thanks everyone for coming. I'm today's host, Lu Gan, from the School of Aerospace Engineering. It's my great pleasure to introduce Maani Ghaffari, who is also my former PhD co-advisor; we had a great time working together through the Michigan winters. Maani is currently an assistant professor at the University of Michigan in the Department of Naval Architecture and Marine Engineering and the Department of Robotics, where he leads the Computational Autonomy and Robotics Lab. He is a recipient of an Amazon Research Award and a Best Paper Award finalist at RSS. Please welcome him to the stage.

Thank you, everybody; it's a great pleasure to be here. I apologize for my voice, it's a little rough, but I'll do my best to be loud and clear. Today I'm going to talk about computational symmetry and learning. My lab works on a full range of robotics problems, from perception to control, but I'll focus on a line of work that fits into a narrative I personally like and have invested a lot of my time in. One particular direction I like to work on in robotics is robotics for unstructured environments, where we don't have a lot of prior information. Traditionally, that makes robotics very interesting: you want to throw the robot somewhere and have it do something for you, which could be exploration or manipulating objects, without telling the robot anything. Obviously, that's very difficult, and the truth is somewhere in between: we work in the lab, do field tests, and a range of things in between. But it would be good if one day robots could actually help us with fires, floods, and all the natural disasters we see. They are very common; we hear about them every year. Although in Michigan we don't have them as often, it's very hard to watch, every year, fires and other kinds of natural disasters.
Another reason is the aging population. It has become a bit of a cliché, but it's very important: if you want to do agriculture, for example, to provide food, we need robots to work on it. That's an important area. And generally, the workforce in factories and manufacturing, a lot of jobs that people no longer want to do, we need robots for those. The hype around humanoids basically confirms there's a push toward eventually having such robots, although it might not happen in the next few years. But that's basically what everybody is after: how would you build a robot that can deliver what humans can deliver, and possibly more, in certain missions? There could be principles that apply to childcare, education, taking care of the elderly, exploring space, or interacting with people at a museum, because people can do this whole range of tasks, and surely there are some principles behind intelligence that we could work on. So in my work, if I had to run a PCA and find three directions (there are other dimensions, but this is what I come up with): one important direction is learning and optimization, which broadly speaking covers all the classical systems theory, estimation and control, and the modern versions of it with deep learning and optimization. That's obviously very important; that's how we solve problems. Another direction I think is very important is computational symmetry, and that seems to be independent of the other methods I talked about. And the difficult direction pointing up is new principles for how you put together a robot as an integrated autonomous system. We tend to stack modules in a hierarchy and then build a system; operating it takes a team of people, and a lot of the time, after a project is over, we have to do it all over again for the next project. It's not easy, and I think there's certainly room to work in that area as well.
But today I'm mostly talking in this xy-plane, not much in the third direction. So why did I choose computational symmetry? The reason is that symmetry is very important in mathematical physics; it's the language we use to model the universe and understand physical properties. In physics, people study the symmetry of a system because then we can know all possible states of the system under certain transformations, and that gives a lot of information about how the system evolves. So mathematical physics is a good common ground where we can tie robotics to physics, math, and engineering, and also draw inspiration. And the computational approach is obviously indispensable. I'm pretty sure that when you solve problems, you use a computer and code it up; unless you're forced to do it a couple of times in some courses just to find the solution by hand, we no longer solve equations by hand. It's all done by computers, and that includes ODEs, PDEs, linear and nonlinear systems, and the modern versions of it on GPUs with parallel processing. But the reason behind symmetry is generalization. If we want the robot to operate in all sorts of environments, we need generalization, and the principle that I claim enables generalization, at least one of the principles, is symmetry. If you understand the symmetries of the system, then you are prepared for a range of similar problems that will possibly be thrown at the robot, and the robot behaves as if it had already seen them in the training set or in the examples we showed it, although they are new examples. The way to imagine that is equivalence classes. What I want to do is build equivalence classes for different data and tasks and teach the robot at this abstracted level of equivalence classes.
Then, by virtue of doing that, every time the robot observes something that fits within the same class, it is very familiar, as if it had been in the training set. That's a good way to use less data and get more generalization. In theory, it removes the need for data augmentation; in practice, obviously, you could still do both. And what is a good language to incorporate symmetry? For continuous symmetries, Lie groups are the best mathematical tool we have to model such problems. Lie groups are basically groups of nice transformations that can be applied to vector spaces and surfaces to transform them. A familiar example is rotation: you can grab a vector and rotate it, you can grab a surface and rotate it, and that transformation is a Lie group. For the purposes of this talk, you can think of a group as a set of invertible matrices: we have matrices that are invertible, they're all nice, matrix multiplication is the group operation, and that's my Lie group object. There are discrete symmetries as well; Lu has done some good work on graph symmetries and morphological symmetries that you can talk to her about. I will talk about continuous symmetries. So, the first problem. I'm going to go through a range of problems and tell you how symmetry helps us formulate and solve them in a different way. For the robot state estimation problem, the setup is that the robot is moving with respect to some world frame, we have a body frame, and we want to know how fast we're going and what the 3D rotation and position are, in real time and as fast as possible. We have a library for this called DRIFT. It uses something called the invariant extended Kalman filter: a state estimator that uses a symmetry-preserving method to track the robot.
We can run it on a variety of platforms. What makes the symmetry idea tangible here is that these groups of matrices geometrically form something called symmetric spaces; they're homogeneous spaces. So we ask: what is the class of problems that gives me error variables I can track that do not depend on my current operating point? The intuition is that if the space is homogeneous and symmetric, it looks the same everywhere, mathematically and topologically. If that's the case, then it doesn't matter where you are; the variations you make locally are what matter, because wherever you are, it looks the same. The class of functions that gives you this property is called group affine. If you have a process model that is group affine and you model the error using an invariant error, which could be left or right, then you can build a filter for a very complicated system, for example a legged robot: you have an IMU, contact sensors, and encoder data, and you want to track the body velocity and pose in real time. You follow the principle of symmetry (I'm jumping ahead just to give you a flavor of it): if everywhere is the same, then I can always do all my calculations at the identity, because all groups have the identity element, and my Jacobians are constant. Although the systems are nonlinear, when I linearize them, the direction I need to move to propagate the error is always the same, because everywhere is the same. That's very surprising, because if you have ever worked on nonlinear filtering, you need to deal with a lot of nonlinearity and coordinates, derive the Jacobians and evaluate them, and that causes a lot of problems in state estimation.
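To make the "everywhere looks the same" idea concrete, here is a minimal toy sketch (my own illustration, not the DRIFT implementation) on SO(2), the simplest matrix Lie group: the left-invariant error between an estimate and the true state depends only on their relative rotation, not on the operating point.

```python
import math

def rot(theta):
    # SO(2): 2x2 rotation matrices under matrix multiplication
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def left_invariant_error(X_est, X_true):
    # eta = X_est^{-1} X_true; for rotation matrices the inverse is the transpose
    return matmul(transpose(X_est), X_true)

# The error only sees the relative rotation (0.2 rad here), not where we are
# on the group: shifting both states by 1.0 rad changes nothing.
e1 = left_invariant_error(rot(0.3), rot(0.5))
e2 = left_invariant_error(rot(1.3), rot(1.5))
```

Both errors equal rot(0.2); this is the homogeneity that lets the filter evaluate everything at the identity.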
So that lesson from symmetry in state estimation, within this classical framework, was very important. What was more interesting is that it made a big difference on the robots. We could put it on a Cheetah robot, a full-size army vehicle, a Husky, indoor robots, a marine robot, and consistently we could get very reliable behavior. The robot is blind: you just have an inertial measurement unit, an IMU, and some type of kinematic information, such as body velocity or wheel encoder data, or joint encoders for a legged robot, and we do proprioceptive tracking. We get accuracy in trajectory and velocity comparable to, say, RTK GPS, which is very surprising over a very long distance, something like a few kilometers. I can speed this up. Here the Cheetah is moving outside the robotics building on different terrains in Michigan. The state estimator assumes the contact foot is fixed on the ground, although in reality it is not, and tracks the position. We don't have ground truth, so we overlaid the trajectory on a satellite map, and you can see it's correct. And we just use the IMU and joint encoders, right? No vision or exteroceptive sensors. In this case, we used an IMU and a single shaft encoder on, I think, the front wheels of the vehicle. This is in Pittsburgh on a full-size army vehicle; we had a project with the Ground Vehicle Systems Center. Obviously they are interested in the filter, because if it runs in real time and there's no signal and you want to track and go fast for a long time, this is perfect. You can go for several miles and still know where you are; you're not lost. It could be completely dark. On an army vehicle, active sensors like lidar are not allowed because they reveal where you are. You could use a camera, but if it's foggy, dusty, or chaotic, images might not be very helpful. This one is an indoor robot; we collaborated with Amazon Lab126.
They were interested in this filter because of the Astro robot, and it turned out to be a very good, low-cost state estimator for that type of consumer robot. We drove the Husky on campus to see how far we could go, about 3 kilometers, and this is where the battery died; I think we could do more. That's the limit of the hardware, not the filter. This one is simulation, but recently we also used GPS and IMU to estimate the position of a surface vehicle, and it works equally well. So this was very encouraging work, because we could take some principles and then see real-world impact, and that encouraged us to look further into this line of work. The natural question was: can we do control with it? Because if you can do estimation, the dual of that is control, and if it's possible to build a Kalman filter, we should be able to build something like an LQR, for example. The filter was built on ideas from Barrau and Bonnabel in France, so we were inspired by them, but then with Sangli and Will in Michigan we thought we should be able to build a controller using the same lesson. The goal is to build a tracking controller that is invariant to the current operating point: no matter how far we are from the desired trajectory, we will still converge. It does not matter where we are; it matters how we move. We want to use that principle. This was slightly more complicated, because for control you need to bring in the equations of motion. We typically write down the kinematic equations of motion on a matrix Lie group using the reconstruction equation. You might be familiar with the rotation version of it, where the time derivative of the rotation equals the rotation matrix times the angular velocity; usually robotics students are familiar with this rotation kinematics. That works for any matrix Lie group. Here it's the twist: a twist is basically linear velocity and angular velocity, a six-dimensional vector, although written in a matrix basis.
So the time derivative of the pose, the rigid body transformation that includes rotation and position, equals the pose times the twist. We have the desired kinematic equation and the actual kinematic equation of the robot. Define the error using group matrix multiplication, which ideally would be the identity, take the time derivative, and we get the error dynamics. The error dynamics, as you see, do not depend on the state X at all; they only depend on the error itself. And the delta between velocities has an adjoint map; the adjoint map is basically a similarity transformation, because we're at this point in space, the desired trajectory is over here, and the velocities are not in the same frame, so we need to do a change of basis before taking the difference of the two velocity vectors. That's all that's going on. For the dynamics, you typically see Euler-Lagrange equations in generalized coordinates, but that's not good here because it doesn't work with the intrinsic structure of these groups. We want the dynamics to evolve on these matrix groups. In symmetry and mechanics, this is a solved problem; it's called the Euler-Poincaré equation. Poincaré studied equations of motion in the Lie algebra, the linearization of the group at the identity, which gives you a vector space, so you can locally think of this matrix group, which is nonlinear, as a vector space. You get this nice equation whose 3D version is the Euler equation of a rigid body: I Omega-dot plus Omega cross I Omega equals the torque. This is the twist. So I have equations of motion that also only depend on the twist; they don't depend on X. Now I linearize. You don't need heavy calculus; you just linearize by formulating the error on the group as the exponential of an error in the Lie algebra. You can do that because, from a geometric perspective, you have this local exponential coordinate as a geodesic on these matrix Lie groups.
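As an illustration of the exponential-coordinate idea (a toy sketch, not the paper's code), Rodrigues' formula gives the closed-form matrix exponential on SO(3): the geodesic through the identity in the direction of a given rotation vector.

```python
import math

def hat(w):
    # 3-vector -> 3x3 skew-symmetric matrix, an element of the Lie algebra so(3)
    wx, wy, wz = w
    return [[0.0, -wz, wy],
            [wz, 0.0, -wx],
            [-wy, wx, 0.0]]

def expm_so3(w):
    # Rodrigues' formula: exp(hat(w)) = I + sin(th)*K + (1-cos(th))*K^2,
    # where th = |w| and K = hat(w/th). This is the "straight line" on SO(3).
    th = math.sqrt(sum(c * c for c in w))
    if th < 1e-12:
        return [[float(i == j) for j in range(3)] for i in range(3)]
    K = hat([c / th for c in w])
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    I = [[float(i == j) for j in range(3)] for i in range(3)]
    a, b = math.sin(th), 1.0 - math.cos(th)
    return [[I[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)]
            for i in range(3)]

# rotating about the z-axis by 0.5 rad reproduces the familiar planar rotation
R = expm_so3([0.0, 0.0, 0.5])
```

Linearizing on the group then amounts to a first-order expansion of this map around the identity.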
Basically, the exponential map is your straight line if you want to move along matrix groups. I can always parameterize a point as the exponential of something else and then do a first-order approximation. After all of that linearization, we get a set of affine constraints. That's nice, because if I now formulate a quadratic cost function, putting together the error and everything for MPC, and add the affine constraints as the constraints of the optimization, all I need to do is solve a quadratic program. It's convex, and you can solve it using off-the-shelf solvers like OSQP. You can build a robot controller for rigid body systems using a convex QP. That's very attractive; before that, people would do nonlinear MPC, and many still do. We tested this on a surface robot. We needed to add hydrodynamic forces as well, but we can add those as a linear term, and we have waves and current. This is a mathematical simulation, not physics-based, so the nonlinear MPC has privileged information because it is using the same model to simulate. But what's important is this: the challenge with nonlinear MPC is that you cannot make the horizon too long, because it's expensive. For self-driving cars, airplanes, and all sorts of real-world platforms, we want to run MPC with a long horizon, but it's expensive. This one is ten times faster, so you can make the horizon much longer. The performance is a little lower, but all within reason: 10 centimeters of error for a surface robot is almost negligible. So this was, again, a very interesting result of symmetry and computation, of looking at the problem differently from the beginning and writing down the formulation differently. From there, I want to talk about a different type of symmetry that is not about the error but about energy.
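The key structure here is a quadratic cost with affine constraints. As a hedged toy example of that structure (unrelated to the actual robot model, and solved in closed form from the KKT conditions rather than with a numerical QP solver), the smallest such QP minimizes a quadratic subject to a single affine constraint:

```python
def solve_eq_qp(a, b):
    # minimize 0.5*||x||^2  subject to  a.x = b
    # Stationarity: x + lam*a = 0; feasibility: a.x = b
    # => lam = -b/(a.a), so x = b*a/(a.a)
    aa = sum(c * c for c in a)
    return [b * c / aa for c in a]

# a toy "linearized dynamics" constraint: x1 + 2*x2 = 5
x = solve_eq_qp([1.0, 2.0], 5.0)
```

In the real controller the constraints come from the linearized error dynamics at each step of the horizon, but the convexity argument is exactly this: quadratic objective, affine feasible set.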
Motion planning in robotics, the trajectory optimization problem, is this: you have a robot with some equations of motion and some obstacles in the environment, and you want to find the trajectory that takes the robot from start to goal. This trajectory needs to be kinodynamically feasible, that is, it has to satisfy the equations of motion of the robot. The problem is typically nonconvex, very difficult to solve, very expensive, and very challenging. Boston Dynamics, for example, is a hallmark of model-based optimization; they can do all sorts of maneuvers with these methods, but those are all local solutions, with a lot of work and careful initialization. The equations of motion of the robot, if you think about it, are complex for two reasons. One reason is that they're just nonlinear, and the solvers we use are local; that's the question of whether we use a local solver or a global optimization method, and of course a global method will be slower and not real time. The other issue is how we model them: the coordinates you choose can make your model more or less nonlinear. In particular, what is the most natural way to model the equations of motion if you're modeling your robot as a single or multiple rigid bodies? People tend to use Lagrangian dynamics and write down generalized equations of motion, which gives you a set of ODEs. Typically these matrices depend on q and its derivatives; it's very nonlinear, and for a multi-body system you get pages and pages of equations. It's not something you can easily put as a constraint in your optimization. That's the classical way of doing calculus of variations to get equations of motion: first derive the continuous-time dynamics, then discretize to integrate.
This does not preserve the energy, even though, by the principle of least action, for a closed, conservative system the total energy should be preserved; that's the principle we derive the equations of motion from, which you typically see in robotics books as these sets of equations. The alternative, studied a while ago by Marsden and West in a very nice paper you can find, is discrete mechanics. In discrete mechanics and variational integration, the idea is that instead of deriving the continuous-time equations of motion and then discretizing them, we first discretize the Lagrangian and then derive the integration rule that preserves that Lagrangian across discrete time steps as we move forward. This is a smarter way to do it, because the integration rule derived this way is guaranteed to preserve the structure, so we get a nicer set of equations to move forward in time. This is called a variational integrator. You get this constraint that the derivative of your discrete Lagrangian with respect to the first variable plus the derivative with respect to the second should be zero, and that gives you a type of equation of motion. But if you do it on a Lie group, something interesting happens, because a robot moves in 3D space with rotation and translation simultaneously. On SO(3), the 3D rotation group, we can write it directly as matrices; you don't need to open it up and look at all those sines and cosines, just keep it as a matrix. Of course, this is a three-by-three matrix with all nine variables in it, and you get a kinematic equation for integration and an implicit equation for the dynamics. This is called a Lie group variational integrator. It's very different from writing down the continuous-time model and then integrating, and it's very compact compared to generalized coordinates. You might have noticed: if all of these are variables, what is the highest order of the terms here? Two.
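The energy point can be seen on the simplest possible example (a pendulum in ordinary coordinates rather than on a Lie group; my own sketch, not the talk's integrator): symplectic Euler, which is a variational integrator, keeps the energy error bounded, while explicit Euler drifts.

```python
import math

def explicit_euler(th, om, dt):
    # discretize the continuous-time ODE directly: both states use old values
    return th + dt * om, om - dt * math.sin(th)

def symplectic_euler(th, om, dt):
    # a variational integrator: update momentum first, then position with it
    om2 = om - dt * math.sin(th)
    return th + dt * om2, om2

def energy(th, om):
    # total energy of the pendulum (unit mass, length, gravity)
    return 0.5 * om * om + (1.0 - math.cos(th))

def simulate(step, th0=1.0, om0=0.0, dt=0.01, n=10000):
    th, om = th0, om0
    for _ in range(n):
        th, om = step(th, om, dt)
    return th, om

E0 = energy(1.0, 0.0)
drift_sym = abs(energy(*simulate(symplectic_euler)) - E0)
drift_eul = abs(energy(*simulate(explicit_euler)) - E0)
```

After 100 simulated seconds the variational scheme's energy error stays small and bounded, while the explicit scheme's grows steadily; the same qualitative picture holds for the Lie group version.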
So we get quadratic polynomials. My equations of motion, using the Lie group variational integrator, are now exactly quadratic polynomials. This is perfect for a global optimization method known as sums of squares: if everything is SOS, then you can solve the problem globally optimally. Plus, this formulation, as I'll talk about, helps with a new local solver that is also very fast. If you formulate it like this, when you integrate forward in time and plot the phase space, the system doesn't drift; as expected, it preserves the total volume in phase space, which is what you would want to see. If you do Euler integration, it will drift. And on top of that, it's quadratic. This was something we did at RSS 2023. There are different pieces: if you model your robot dynamics using the Lie group variational integrator, whether for single or multiple rigid bodies, which we later extended in the journal version, you get exactly quadratic polynomials. Then you can use that to formulate a polynomial optimization. There are advances in polynomial optimization known as the Lasserre hierarchy: you can apply a method called moment relaxation and then solve a sequence of semidefinite programs to find a globally optimal solution. And there is a way to verify it: when you solve the SDP, you check the rank of your solution, and if it's rank one, you know you found it. That's why they call it a certifiable method: you find a globally optimal solution and you know you have found it. If you run a local solver, you might find the global solution, but you don't know that it is one. So this solves the trajectory optimization problem globally optimally, which was sort of a milestone in the literature. The math can get very complicated; you put together the moment matrix and have to write down a lot of variables, which I don't intend to go through in detail.
But then you solve a standard optimization problem, and you can actually pass it to a package that is available; I think that's important. It's not real time, and it takes some time to solve, but at least we can solve it. One other important property is temporal sparsity. Previous methods that attempted to use polynomial optimization always had to deal with a dense problem formulation if they wanted to optimize over a time horizon. They overlooked the fact that dynamics and equations of motion are often Markov processes: they only correlate two adjacent time steps; there is no correlation across multiple time steps. That means if you put these variables in a matrix, you get a band-diagonal correlation structure, because variables that are more than one step apart in time are not related; they have no correlation. So you get sparsity, and exploiting this sparsity improves the complexity to grow linearly with time instead of exponentially. Previous methods would use mixed-integer programming, which grows exponentially with time, so this was another important property. As a result, inverse kinematics, for example, you can easily solve and verify, using a second-order relaxation, to give you more detail. If you want to do drone landing with obstacles, you can solve it with this method and find a globally optimal trajectory. You can imagine that for safety-critical missions this would be important: you have the setup and you want to make sure you have the right trajectory to follow later, and we can solve it globally, offline. The cart-pole problem too, and a drone with a suspended load: there's a drone, a cable, and a load, and it goes through obstacles. We can still solve this problem; in this case, it takes about 5.5 seconds.
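A small sketch of the temporal-sparsity argument (a toy illustration of the Markov structure, not the actual moment-matrix construction): with one-step dynamics, only adjacent time steps are coupled, so the number of interacting variable blocks grows linearly in the horizon T rather than filling a dense T-by-T pattern.

```python
def coupled_pairs(T):
    # Markov (one-step) dynamics couple each time step only with its
    # neighbors, giving a band-diagonal pattern of interacting blocks
    return {(i, j) for i in range(T) for j in range(T) if abs(i - j) <= 1}

n10 = len(coupled_pairs(10))   # 3*10 - 2 = 28 blocks, versus 100 if dense
n20 = len(coupled_pairs(20))   # 3*20 - 2 = 58 blocks, versus 400 if dense
```

Doubling the horizon roughly doubles the number of coupled blocks, which is what lets the relaxation scale linearly with time.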
Another surprising thing that came up: instead of matrices on SO(3), if you use quaternions, it gets much faster. You can do a first-order relaxation; there are some details about the size of the moment matrix you encode in the optimization. So use quaternions; that will make it faster. Okay, so global optimization is good, but SDPs are slow; making them faster is a whole line of research in optimization, and everybody wants to, but it's not very easy. A lot of the time, though, they give a nontrivial initial guess for local solvers; you might still need to refine the solution because of numerical accuracy. So it's nice to have a local solver that is very reliable and quick, to take that warm start and produce a finer solution. This line of work is very recent; I just thought it might be interesting to tell you about it. The idea of optimization on manifolds is that a lot of these robotics problems are geometric problems, because of the nature of the space and how the robot is modeled. Historically, people found that a lot of the constraints we have in optimization form a surface, and then we can do something called Riemannian optimization on that surface to make the problem unconstrained. That means if, for example, you want to find a solution on the sphere, and the sphere is a constraint on your variables, then if you can just move on the sphere intrinsically, that makes your problem unconstrained, of course. That was a very good realization, and with the library Manopt you can solve such problems. But sometimes you want to do optimization on a manifold and still add constraints.
This is important for a lot of control problems where we have hard inequality constraints. And GTSAM, for example, developed here at Georgia Tech, which we also use a lot in Michigan, follows this type of paradigm. For a general constrained problem, you would need an interior-point method, and you can do a Riemannian interior-point method to respect the geometry of the problem. The rough idea is that instead of assuming the space is flat, you follow the geodesics, the straight lines that respect the curvature of the space. For that, you need to satisfy the KKT conditions, which makes it harder, but it has been done; there is some work. Then we thought: people are trying to do this in general, but rigid body motion is a very special problem. The rigid body group, again, is a matrix Lie group; it's a symmetric space, it's homogeneous. Can we make a symmetric version of this? It turned out that we can, using the same symmetry ideas. Computing the gradient is maybe fine, but then you have to compute second-order derivatives on manifolds, and those are not very fun to compute; you need to be trained to do it by hand, and if you use automatic differentiation, it will be slow. If you use symmetry, there is a good way, which we have in a paper that will appear, to get second-order derivatives on groups; it's like doing algebra. That was very nice. The idea, again, is that if the space looks the same everywhere, every time I need to iterate in the optimization, I shift everything to the identity. I precondition all my variables to be at the identity, and that helps me get the gradient and Hessian in the Lie algebra of the group. So you can have an interior-point method that operates on the group, and it seems to work really well: it has a really high success rate and, for the most part, doesn't violate the KKT conditions.
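A minimal sketch of the Riemannian-optimization idea on the sphere example just mentioned (my own toy code, not the Riemannian interior-point method from the paper): project the Euclidean gradient onto the tangent space, take a step, and retract back onto the sphere, so the sphere constraint never appears explicitly.

```python
import math

def normalize(x):
    n = math.sqrt(sum(c * c for c in x))
    return [c / n for c in x]

def riemannian_gd(a, x0, step=0.2, iters=200):
    # minimize f(x) = <a, x> over the unit sphere:
    # the Riemannian gradient is the Euclidean gradient a projected onto
    # the tangent space at x; re-normalizing is the retraction.
    x = normalize(x0)
    for _ in range(iters):
        ax = sum(ai * xi for ai, xi in zip(a, x))
        g = [ai - ax * xi for ai, xi in zip(a, x)]   # tangent-space gradient
        x = normalize([xi - step * gi for xi, gi in zip(x, g)])
    return x

# the minimizer of <a, x> on the sphere is -a/||a||
a = [3.0, 0.0, 4.0]
x_star = riemannian_gd(a, [1.0, 1.0, 1.0])
```

The same pattern, gradient in the tangent space plus a retraction, is what a Lie group version does with the Lie algebra playing the role of the tangent space at the identity.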
So I think this will be a very good local solver that uses symmetry; there are more details if you're interested in this area. Right. Now I want to move on to perception, which is arguably more visual and maybe more interesting. For 3D perception, how would you think about symmetry? We need to understand the 3D environment. The big umbrella problem that consumes people in robotics, CVPR, and related conferences is scene understanding: you want to look at the scene and understand what's going on, whether things are static or dynamic, what the objects are, what the relationships between the objects are, and so on, in order to interact with the 3D world. That's the goal. Some problems are safety-critical, such as driving or space missions. And robotic systems are often resource-constrained; we're not going to have a lot of GPUs on a single robot. The safety-critical part means we need algorithms that are reliable, and the resource constraints mean they have to be efficient. This is okay in 2D, because we can collect and label a lot of data, but it's not that easy to collect large-scale 3D, and these days 4D, datasets. It takes a lot of time to collect ImageNet-scale datasets in higher dimensions, because it's exponentially more expensive. And as you see in the literature, the slope of improvement is much lower for larger models and datasets. I'm not saying it's not possible; I'm saying that making it mainstream, something you can run on consumer-sized GPUs, has not been easy. So it's hard to scale up, and scaling up these models doesn't really address efficiency. The idea of symmetry here is that you can reduce your sampling space using symmetry, through equivalence classes.
Ideally, if you have seen a chair, then you have seen the chair from every possible angle, and that should help you identify the chair or perform other tasks. This is a form of symmetry, in this case rotation symmetry. So, to model a lot of entities, we need large models; but to handle this variability in observations, we can resort to symmetry, which in this setting is called equivariant learning. Intuitively, equivariance is the property that the transformation commutes with your map. Say you have a feature map, for example a network that extracts edges or features from an image. If you translate the image and then extract the edges or segment it, you get the same result as if you segment the image and then translate it. The order of the transformation doesn't matter; the transformation commutes with your map, and the map could be a neural network. In 2D, if the network is equivariant, as you rotate the image the features are the same; they just rotate, which makes it very predictable. You know exactly what's going on in the latent space: your features are stable, they're just rotating. Or, if they're invariant, which is great for detection or classification tasks, they should not change at all. Typically you build equivariance with respect to some transformation group, and the typical ones are 2D rotation, 2D rotation and translation, 3D rotation, 3D rotation and translation, and so on. Classical convolutional neural networks are translation-equivariant: convolution commutes with shift. To generalize this, the idea in the literature is pretty straightforward in terms of math, but it gets complicated quickly: you do group convolution.
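The commuting property is easy to check numerically. A toy sketch (hypothetical signal and kernel, standing in for an image and a learned filter) with a circular 1D convolution: shifting the input and then convolving gives the same result as convolving and then shifting.

```python
def shift(x, s):
    # circular translation of a 1D signal by s samples
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]

def conv(x, k):
    # circular cross-correlation: the translation-equivariant linear map
    n, m = len(x), len(k)
    return [sum(k[j] * x[(i + j) % n] for j in range(m)) for i in range(n)]

x = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0]   # toy "signal"
k = [1.0, -1.0]                      # toy "edge" kernel

# equivariance: transform-then-map equals map-then-transform
lhs = conv(shift(x, 2), k)
rhs = shift(conv(x, 2 * 0 or k), 2) if False else shift(conv(x, k), 2)
```

Group convolution generalizes exactly this identity from the translation group to rotations and beyond, at the price of integrating over the group at every layer.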
So instead of classical convolution over just translations in 2D, you have to integrate over all possible transformations at every layer. That can be very expensive, because groups like the rotation group have infinitely many elements, right? But you can do something smarter, and that's going to the quotient space. You can divide your original group by some subgroup, which gives you a smaller space to work with on which your group still acts. For example, if you want to handle 3D rotation, you can remove one axis of rotation and work with rotations of the sphere using two axes. So you reduce dimensionality and gain efficiency, which is the example here: the quotient of SO(3) by SO(2) gives you the sphere, and you save one dimension. So instead of a six-DoF convolution, you can do five-DoF. SO(3) is the symmetry group of the sphere: if you rotate a sphere while I close my eyes, when I open them I can't tell whether the sphere has changed. So if you discretize this in some way to implement it as a convolution, you would have 60 operations at each step. But if you take the quotient and go to the sphere, you basically remove the roll rotation about the normal, and then you enumerate over 12 vertices instead of 60 rotations. And we're going to do many of these in a neural network, so that's a lot of saving. So this is what we did. And the question is: if you go to the quotient space, can you recover the original dimension that you lost? It turns out that you can. In this case, we use a symmetric discretization, meaning we use the icosahedron, which is the finest symmetric discretization of the sphere.
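The 60-versus-12 counting can be made concrete. A hedged numpy sketch, not the talk's implementation: the icosahedron has 12 vertices, and a 72-degree rotation about an axis through a vertex maps the vertex set onto itself; each vertex has a stabilizer of 5 such rotations, so the discrete rotation group has 12 × 5 = 60 elements while the quotient (the vertex set) has only 12.

```python
import numpy as np

phi = (1 + np.sqrt(5)) / 2
# The 12 icosahedron vertices: cyclic permutations of (0, ±1, ±phi).
base = [(0, s1, s2 * phi) for s1 in (1, -1) for s2 in (1, -1)]
verts = np.array(base
                 + [(z, x, y) for x, y, z in base]
                 + [(y, z, x) for x, y, z in base])
verts /= np.linalg.norm(verts, axis=1, keepdims=True)

def axis_angle(axis, angle):
    # Rodrigues formula: rotation matrix about a unit axis.
    axis = np.asarray(axis, dtype=float)
    axis /= np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * K @ K

# A 72-degree rotation about an axis through a vertex is a symmetry of
# the icosahedron: it permutes the 12 vertices.
R = axis_angle(verts[0], 2 * np.pi / 5)
rotated = verts @ R.T
# Distance from each rotated vertex to the nearest original vertex.
matches = [np.min(np.linalg.norm(verts - v, axis=1)) for v in rotated]
```

Since every rotated vertex lands exactly on an original one, a spherical feature map only needs to be sampled at the 12 vertices rather than at all 60 group elements.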
If you label every vertex with a unique color, any configuration of it under these discrete rotations is a permutation of the nodes. You get 60 possible rotations, and you can supervise the network to tell which rotation it belongs to. So, up to discretization, you can recover the lost degree of freedom. And what you gain is roughly a four- to five-times improvement in memory and a few times improvement in speed. That's very promising, because you can process point clouds in close to real time if you optimize this network; this network is not optimized, and you could use TensorRT and other tools to make it much faster. And with Cynthia, we built a place recognition system: given this encoder, learn SE(3)-invariant descriptors and then do place recognition. What you wouldn't see in previous work is that if my features are invariant, no matter how I move in space, I should see the same features; they should be stable. So in this case, we rotate and translate the same point cloud using 3D rotations and translations, and the detection rate is basically 100%. In the ideal case, you make no mistakes because the features are invariant. In practice, obviously, you have sparsity, noise, partial overlap, and a lot of other challenges. But here is a challenging test: train on the Oxford dataset, test on the KITTI dataset, and take the state of the art from CVPR at the time. This is sequence zero, this is sequence eight. Basically, during training you go one way along the road, and during test the car comes from the other side. A typical network breaks down here: the performance drops to 3%. But the equivariant network, because of the equivalence classes, maintains at least 60% accuracy.
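To see why invariant descriptors give perfect recall in the ideal case, here is a toy hand-crafted stand-in for the learned descriptor, an assumption of mine, not the paper's network: a histogram of pairwise distances is unchanged by any rigid-body motion of the scan.

```python
import numpy as np

def descriptor(cloud, bins=32, rmax=8.0):
    # Pairwise distances are unchanged by any rigid-body motion, so a
    # histogram of them is an SE(3)-invariant place descriptor (a toy
    # stand-in for the learned invariant features in the talk).
    d = np.linalg.norm(cloud[:, None, :] - cloud[None, :, :], axis=-1)
    iu = np.triu_indices(len(cloud), k=1)
    h, _ = np.histogram(d[iu], bins=bins, range=(0.0, rmax), density=True)
    return h

rng = np.random.default_rng(1)
scan = rng.standard_normal((200, 3))

# Revisit the same place from the other side of the road: an arbitrary
# rotation plus translation of the same scan.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
Rm = Q * np.sign(np.linalg.det(Q))
moved = scan @ Rm.T + np.array([5.0, -2.0, 0.1])

d1, d2 = descriptor(scan), descriptor(moved)
```

In practice the learned descriptor has to cope with partial overlap and noise, which is where the 60% versus 3% gap in the cross-dataset test comes from.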
Because there are a lot of similarities: even if you rotate the scene, the observations still belong to the same place. This is just using point clouds; I think combining with images should make it better. I think I have maybe ten more minutes. So the last work I want to talk about is a neural network that we built from scratch. One dream is that we're doing all this classical work with Lie groups, and it's really promising, but I'd really like to implement these ideas as a network, using libraries like PyTorch and running on GPUs. It won't be a standard neural network, but a geometric network built on the same principles, and bringing in what makes deep learning interesting, differentiability, parallel processing, and representation learning, will add a lot of power to these methods. Some background. When I talk about group representations here, I mean linear actions of matrix groups on vector spaces. For, say, 3D space, the representation of a 3D rotation is a 3-by-3 matrix. If you pick a different vector space, you have to come up with a different matrix to model the action. A particular representation that comes up is the adjoint representation. If you work with the tangent space of the group at the identity, the Lie algebra, that's a vector space; it's where you model velocities, like twists and angular velocity. Every group acts naturally on that vector space by conjugation, and for matrices this conjugation is matrix similarity: a change of basis. So you can track velocity always at the identity by doing a change of basis wherever you are. Those of you who work on manipulation are familiar with this: you have a velocity at the end effector, you want it in the base frame, you do the change of basis using the adjoint map. And then we're going to build a network that is algebraic and equivariant to that adjoint action.
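The change of basis just described can be written down directly. A minimal sketch, assuming the twist is ordered (linear, angular): the 6x6 adjoint of an SE(3) element acts on a twist exactly as the matrix conjugation T ξ^ T⁻¹.

```python
import numpy as np

def hat(xi):
    # se(3) twist (v, w) -> 4x4 matrix representation.
    v, w = xi[:3], xi[3:]
    W = np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])
    out = np.zeros((4, 4))
    out[:3, :3], out[:3, 3] = W, v
    return out

def adjoint(T):
    # 6x6 adjoint of an SE(3) element: the change of basis that maps a
    # twist expressed in one frame (e.g. the end effector) to another
    # (e.g. the base), matching the conjugation T xi^ T^{-1}.
    R, p = T[:3, :3], T[:3, 3]
    P = np.array([[0, -p[2], p[1]], [p[2], 0, -p[0]], [-p[1], p[0], 0]])
    Ad = np.zeros((6, 6))
    Ad[:3, :3], Ad[3:, 3:], Ad[:3, 3:] = R, R, P @ R
    return Ad

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R = Q * np.sign(np.linalg.det(Q))
T = np.eye(4)
T[:3, :3], T[:3, 3] = R, rng.standard_normal(3)

xi = rng.standard_normal(6)               # twist (v, w)
lhs = hat(adjoint(T) @ xi)                # adjoint as a 6x6 matrix
rhs = T @ hat(xi) @ np.linalg.inv(T)      # adjoint as conjugation
```

This conjugation is the action the network below is built to commute with.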
In a traditional MLP, each neuron is a scalar. A different way to think about it is that each neuron is now a copy of a Lie algebra. So we have a different type of network where every neuron is a copy of the algebra. The multilayer perceptron looks something like this: you have neurons, connections, and input and output. We were actually inspired by a very nice earlier work, Vector Neurons. What they did is think of a neuron as a vector, and when it's a vector, you can make it rotation equivariant: the dot product of two vectors does not change if they rotate together, which gives you an invariant function, and you can build an activation out of that. So if you rotate the input, the features stay equivariant to that rotation. What we're doing is generalizing this to be equivariant to conjugation, and by doing that, it generalizes to a very broad class of Lie groups; the most general one is the special linear group SL(n). That's the class of semisimple Lie algebras. And how do you build an activation out of that? The key is that every semisimple algebra, which is why we're limited to that class, comes with a trace form. These are matrices; you can find the form for each algebra, and the trace form gives you a conjugation-invariant, adjoint-invariant function. Out of that, we build the equivariant activation function, which is invariant to conjugation. So we have a neural network that commutes with conjugation. We call these Lie Neurons. From there, it's easy to build a linear layer; you can't have a bias, because a bias would break the symmetry, so it's purely linear. And then the activation looks something like this; if you have questions later, I'm happy to explain.
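The identity behind this construction is elementary to verify. A minimal sketch, my own illustration rather than the paper's code: the trace form B(X, Y) = tr(XY) on sl(3) is unchanged when both arguments undergo the same conjugation, by cyclicity of the trace, and that invariant is what an adjoint-equivariant activation can be built from.

```python
import numpy as np

rng = np.random.default_rng(3)

def sl3(rng):
    # A random sl(3) element: a traceless 3x3 matrix.
    X = rng.standard_normal((3, 3))
    return X - np.trace(X) / 3 * np.eye(3)

X, Y = sl3(rng), sl3(rng)

# An invertible group element g acting by conjugation (change of basis);
# the diagonal shift keeps it comfortably invertible.
g = rng.standard_normal((3, 3)) + 3 * np.eye(3)
gi = np.linalg.inv(g)

# tr((g X g^-1)(g Y g^-1)) = tr(g X Y g^-1) = tr(X Y) by cyclicity:
# the trace form is invariant under the simultaneous adjoint action.
b_before = np.trace(X @ Y)
b_after = np.trace((g @ X @ gi) @ (g @ Y @ gi))
```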
That mimics the ReLU of Vector Neurons. Note that this form is not a metric; it's important that it doesn't measure distance, but it still works. There's another nonlinearity: the Lie algebra has a structure called the Lie bracket. You can think of it as a type of derivative, the derivative of one vector field with respect to another. With this type of layer, you get something like a residual layer: the signal comes in, you learn some directions, and you add a delta to it. So we have this ResNet-style layer. Pooling is easy, you can just do max pooling, and you can build invariant layers using the Killing form. So you can build the whole neural network out of this. As for things you can do with it: there's this classical problem called the BCH formula, and it shows up in a lot of the problems I talked about earlier, including SLAM and pose graphs. You have the product of the exponentials of two matrices, and you want to know what Z is, because it's not X plus Y when you're dealing with matrices, unless the matrices commute. There's an infinite series, the Baker-Campbell-Hausdorff series, that tells you what Z is: if X and Y are Lie algebra elements, their exponentials are group elements, and multiplying them is the group operation. The series means you can stay in the Lie algebra, using only brackets, and still compute the equivalent of the group operation; that's its importance. But anyway, the question is: what is Z? So we train a neural network to tell us. An MLP works; so do Vector Neurons and Lie Neurons. Now the interesting part: this is training. During test, we rotate the test set, which means the network has not seen it. Because Lie Neurons are agnostic to that, the performance remains exactly the same. Because of equivalence classes, it makes no difference if you rotate the input; it's in the same class as the training set.
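The BCH task can be checked numerically on small so(3) elements. A hedged sketch with series-based matrix exp and log (adequate only for small-norm inputs, which I assume here): the truncated BCH series recovers Z = log(exp(X) exp(Y)) far more accurately than the naive guess X + Y.

```python
import numpy as np

def expm(A, terms=25):
    # Matrix exponential by Taylor series (fine for small-norm inputs).
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def logm_near_identity(M, terms=40):
    # Matrix log via the Mercator series log(I + A); needs ||A|| < 1.
    A = M - np.eye(len(M))
    out, power = np.zeros_like(A), np.eye(len(M))
    for k in range(1, terms):
        power = power @ A
        out = out + ((-1) ** (k + 1)) * power / k
    return out

def bracket(A, B):
    return A @ B - B @ A

def skew(w):
    # so(3) element: skew-symmetric matrix from a 3-vector.
    return np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])

X, Y = skew([0.05, -0.02, 0.03]), skew([-0.01, 0.04, 0.02])

# Ground truth: Z = log(exp(X) exp(Y)), which is NOT X + Y for matrices.
Z = logm_near_identity(expm(X) @ expm(Y))

# BCH series truncated at third order: only brackets are needed.
Z_bch = (X + Y + 0.5 * bracket(X, Y)
         + (bracket(X, bracket(X, Y)) + bracket(Y, bracket(Y, X))) / 12)
```

This is the regression target the networks above are trained on; the equivariant network's test error is unchanged under rotations of the inputs precisely because conjugating X and Y conjugates Z.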
But the MLP's performance degrades. And this also shows why people just want to scale and add data, because it actually works if you build a large model and add more and more data; it just doesn't seem to be the most efficient way to solve the problem. Next, we want to learn rigid-body dynamics, something like this, although this example is highly non-rigid; let's assume it's a rigid body rotating. And we're going to use a neural ODE. A neural ODE learns the vector field as a neural network and then integrates it using an off-the-shelf ODE solver instead of an analytic model. We replace the MLP with Lie Neurons, which gives us an equivariant neural ODE. These equations of motion are equivariant to rotation, because if you rotate a rigid body, the inertia matrix undergoes conjugation, as those of you with a mechanics background know, and then the whole vector field is equivariant to rotation: you can factor the rotation out. If you do that, you can learn the dynamics essentially perfectly, and interestingly, again, the error on the test set and the training set are exactly the same; it does not change. This is on multiple trajectories. If you train on only one trajectory, theoretically you should still learn the dynamics, because every other trajectory is just a change of frame of that same trajectory in space. That's the theory; with more data, it does get better, of course. And it works: we train on a trajectory, continue forward in time, and still get the same results. It's very stable at test time. Then there's a homography test: we have an object and we change the perspective. That's a special linear transformation, a homography in computer vision, and we can detect it with high accuracy even when we rotate it, whereas with an MLP, the performance degrades.
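The equivariance claim for the rigid-body vector field can be verified directly. A minimal sketch, my own illustration: for Euler's equation in the body frame, rotating the setup conjugates the inertia (J → R J Rᵀ) and rotates the angular velocity, and the vector field commutes with the rotation, f(R J Rᵀ, R ω) = R f(J, ω).

```python
import numpy as np

def euler_rhs(J, w):
    # Euler's rigid-body equation in the body frame: J dw/dt = (J w) x w.
    return np.linalg.solve(J, np.cross(J @ w, w))

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
J = A @ A.T + 3 * np.eye(3)          # a positive-definite inertia matrix
w = rng.standard_normal(3)           # angular velocity

Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R = Q * np.sign(np.linalg.det(Q))    # rotate the whole setup

# Rotating the body conjugates the inertia and rotates the velocity;
# the vector field then commutes with R, which is exactly the symmetry
# the equivariant neural ODE encodes.
lhs = euler_rhs(R @ J @ R.T, R @ w)
rhs = R @ euler_rhs(J, w)
```

This is why, in the experiments above, one trajectory in principle determines the dynamics: every rotated copy of it lies in the same equivalence class.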
Another application: if you have an IMU, you have angular velocity and linear acceleration, which are measurements in the body frame. If you think of the motion of the robot, you would want the network to be equivariant to rotation. That means if you train your network using your IMU and then later place the IMU at a different orientation on the robot, your network should not fail. But it will fail, because the gravity direction, combined with the other directions in the data, will be different at a different orientation of the sensor, and the network has not seen that. But if you build an SO(3)-equivariant network using Lie Neurons, you can make it agnostic to the direction of gravity. So you learn this network, one way or another, to do inertial odometry, and it is agnostic to how you place the sensor on the robot at test time. This would be great for any manufacturer who wants to sell that sort of network embedded in the IMU, right? They already sell EKFs inside IMUs, so nothing stops them from adding a tiny network inside the sensor in the future. And we can also use filtering: we can learn the velocity from that neural inertial odometry and then filter it. This figure is a little complicated; I couldn't convince my student to just give me two plots. He thought it was important for all of you to see all of them; I disagree, obviously. So the previous work, TLIO, which is an important work in this area, doesn't really work in the body frame, because to learn inertial odometry in the body frame you truly need to learn the motion pattern. If you learn it in the world frame, you can overfit to the setup, the motion capture system, whatever you have; that makes it easier. With equivariance, it works in the body frame, and when you rotate to a different configuration, you get better results. Okay?
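A toy version of the mounting-invariance argument, my own sketch rather than the talk's network: re-mounting the IMU rotates both the gyro and accelerometer channels by the same rotation, so any SO(3)-invariant function of the pair of readings, norms and their inner product here, cannot tell how the sensor was mounted.

```python
import numpy as np

rng = np.random.default_rng(5)
g_world = np.array([0.0, 0.0, -9.81])

# Fake body-frame IMU readings: angular velocity and specific force
# (gravity shows up in the accelerometer channel).
omega = rng.standard_normal(3)
accel = rng.standard_normal(3) - g_world

# Re-mount the sensor at a different orientation R on the same robot:
# both channels are rotated by the same R.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R = Q * np.sign(np.linalg.det(Q))
omega2, accel2 = R @ omega, R @ accel

def invariant_features(w, a):
    # SO(3)-invariant functions of the pair of readings: squared norms
    # and the inner product. A network consuming these (or, more
    # generally, equivariant features) is agnostic to the mounting.
    return np.array([w @ w, a @ a, w @ a])

f1 = invariant_features(omega, accel)
f2 = invariant_features(omega2, accel2)
```

The actual network is equivariant rather than just consuming scalar invariants, but the mechanism by which the gravity direction stops mattering is the same.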
So that was all symmetry, and I'm going to end with this video, which is not about symmetry, to wake you up. Increasingly, a lot of interesting work is hybrid, right? Students in my lab, Hang Liu and Sangli Teng, trained this robot to do skateboarding, sim-to-real, zero shot. They call the method discrete hybrid automata learning: we formulate a discrete hybrid automaton and then learn the hybrid system. The LED lights are the modes. We learn how many modes there are and which mode the robot is in; each mode has a different network as its dynamics, and the robot automatically chooses whatever action is needed to reach the goal. Now, they're cheating a little, because it's not fully multi-body: they fix one leg on the skateboard. But for the purpose of learning the hybrid system, it's really good. This shows up in a lot of problems. For mechanical systems, contact obviously makes the system hybrid. And if you want to manipulate objects, grab this, move things around, you're still dealing with this hybrid notion of objects and how you're going to place them, so you have this mixture of continuous and discrete variables. I think this work is very important. We haven't done symmetry on it, so the next step is looking into the symmetry structure of hybrid problems, which I think is very important for robot learning. And with that, I want to finish. The final thought is: if you have a problem with inherent symmetries, it's always better to use them; it cannot hurt to exploit them, and ignoring them often hurts. What you get in practice are efficient, generalizable algorithms. If you do it smartly, you gain efficiency, although initially it might look more complicated. And typically these models generalize better, and equivariant representation learning makes them very explainable: you know exactly how your features are changing, so you know what will come.
And one of the lessons is that some of these ideas are very old, from variational calculus and optimal control. What's exciting, as in that hybrid system, is having this notion of memory and combining it with this type of optimal control for the skateboarding task, which also uses symmetry. These problems are not flashy, but if we do solve them, I think we can make a very big jump in robotics toward doing real-world things. Thank you, and let's talk. Any questions?

So the first question is whether we need to break down the environment into convex sets for that obstacle avoidance. No, but we know the environment, and the constraints are quadratic, some type of distance constraint, so it fits nicely into the SOS framework. If it's more complicated, I think that would be a hard challenge. To really run it on a robot, I think we would need something like breaking the environment into pieces.

Next question, on the perception part: we train on one direction and test on the other direction, so if you add one more modality, say images, where the front and the back of a car give different features, does the symmetry still hold? That's a good question; you're asking whether it holds if we use different modalities, combined or camera-only. The problem is that right now, and this is ongoing work, we don't have a good network to model the symmetry of projective spaces like images. We have it for point clouds, and it works very nicely; for cameras, we're working on it. That was the last work of my PhD student; he graduated, and he says he's going to do it, but I already signed the papers, so we'll see.
And for combining them, another student of mine looked into fusing them, because a lot of people want to use foundation models, and I want to use them as well; they're good prior information. We want to use vision; vision is very important. And we figured out there are ways you can do sensor fusion while preserving equivariance. We got some interesting preliminary results, but that student also graduated, so she's also saying she's going to finish it; I'm not sure. But these are very interesting open problems. Sensor fusion is an open and very good topic.

Then there was a question about the invariant filtering. The idea is that you don't need to relinearize anymore because the Jacobians are fixed, constant. But normally, every time you drift, you relinearize: you use the latest linearization point and update to a new point, and that smooths over some of the problems of approximating in exponential coordinates. For rotations you can get lucky with the Rodrigues formula, but for SE(2) or SE(3) that doesn't apply, so how do you deal with that kind of approximation error? So if I understood the question correctly: the state is on the group, we track the error in the Lie algebra by linearization, and even though it's state independent, when we integrate back, what's the guarantee that we're not losing anything, since we just linearized? That's a valid question. The surprising part is that there's a TAC paper by Barrau and Bonnabel showing that the linearization is exact: although you drop higher-order terms, you can prove it is exact. You get exactly log-linear error dynamics: take the log to go to the tangent space, and the error dynamics are linear. That is very surprising, and it holds for that class of group-affine functions.
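The simplest special case of this "no linearization loss" property can be shown in a few lines. A hedged sketch, my own illustration of the idea rather than the general group-affine result: for noise-free odometry on SE(2), X_{k+1} = X_k exp(u_k), the right-invariant error X X̂⁻¹ between the true state and an estimate propagated with the same body-frame inputs is exactly constant, no matter how large the initial error is.

```python
import numpy as np

def se2(theta, x, y):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0, 0, 1]])

def expm(A, terms=30):
    # Matrix exponential by Taylor series (fine for small-norm inputs).
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

# True state and a (deliberately wrong) estimate on SE(2).
X = se2(0.7, 1.0, -2.0)
Xhat = se2(0.1, 0.5, 0.3)

# Both are propagated by the same body-frame increment exp(dt * xi^),
# as in odometry: X_{k+1} = X_k exp(u_k).
xi_hat = np.array([[0, -0.3, 1.0],
                   [0.3, 0, 0.2],
                   [0, 0, 0]])          # an se(2) twist
U = expm(0.1 * xi_hat)

err0 = X @ np.linalg.inv(Xhat)          # right-invariant error
for _ in range(50):
    X, Xhat = X @ U, Xhat @ U
err50 = X @ np.linalg.inv(Xhat)
# The error never changed, with no linearization needed: a toy instance
# of the exact log-linear error dynamics of group-affine systems.
```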
So you actually don't lose anything. In general, you're right, you would think you would lose something, but for that class of problems, group-affine problems, you won't. And a lot of the equations of motion we want to use, like IMU models and robot kinematics, are group affine. That's the serendipity people found, and it works very nicely.

The next question: is it possible to have part of the model, say visual odometry measurements, that is not group affine, and still get the benefit? Yes. This is a sort of strict generalization of previous Kalman filtering. If part of your problem evolves on a Lie group, you want to use this; you can still apply it to the other parts. You get nonlinearities, and the Jacobians won't be constant, but you still have a much cleaner formulation, so it will still work better.

Then a question about the hybrid automata: how do we identify new modes, and how does the number of modes scale? That's a great question; that's the crazy part, because you wouldn't know how many modes there are. What we do is set a maximum number of modes, and that's fine: you say, ten or twenty. The way it works in the framework is that there's a one-hot encoded vector; you backpropagate and eventually find which mode you are in, and each mode activates a different MLP. So it's all part of the learning: you find out how many modes there are by setting a maximum budget.
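A minimal sketch of the mode-budget mechanism, in numpy and with my own simplifications (linear per-mode dynamics instead of MLPs, a plain softmax gate instead of whatever relaxation the actual method uses): a gating vector over K candidate modes selects which dynamics network produces the next state, and at inference the gate is hardened to one-hot.

```python
import numpy as np

rng = np.random.default_rng(6)
K, xdim = 8, 4                       # mode budget and state dimension

# One tiny linear "dynamics network" per candidate mode (a stand-in
# for the per-mode MLPs described above).
As = rng.standard_normal((K, xdim, xdim)) * 0.1

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def step(x, gate_logits, hard=False):
    w = softmax(gate_logits)
    if hard:                          # inference: commit to a single mode
        w = np.eye(K)[np.argmax(w)]
    # Gated mixture of the per-mode dynamics; training would backprop
    # through the soft weights, so unused modes can die out.
    return sum(wk * (A @ x) for wk, A in zip(w, As))

x = rng.standard_normal(xdim)
logits = rng.standard_normal(K)

soft_next = step(x, logits, hard=False)
hard_next = step(x, logits, hard=True)
# With a hard one-hot gate, exactly one mode's network does the update.
chosen = As[np.argmax(softmax(logits))] @ x
```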
Do you end up with spurious modes that are very similar, just to use that budget? That's a good question; I don't think I have a definite answer. I need to check with my student. If that happens, my guess is that the optimization aims for fewer modes, not more, because that's optimal: you can think of it as classification and regression simultaneously, and the loss is lower with fewer modes. Right, with the right loss function, it should go to fewer modes. And then you can do the same task with one mode, two modes, three modes, and run an ablation study; at some point you see diminishing returns. That's the interesting part: a lot of these manipulation tasks have so many contact points and modes, but maybe a few are enough to do the task. I'm learning a lot about it. I think it's very interesting that it just works, zero shot, with no trajectory segmentation: you just collect data, train it, and you can learn the discrete structure. Because it's discrete, the jumps in the hybrid system don't matter anymore; otherwise, for the continuous part, you would need to segment the training data, which is very difficult. Okay. Thank you for coming. Thank you.