Okay, so first of all, thanks again. This is really an amazing opportunity for my group to interact with you all, and for my students and my postdoc Daniel, who speaks next week, to have the opportunity to tell you what they're doing. We really appreciate that.

Very briefly, the idea we've been pursuing is to take advantage of the fact that the dynamics of dissipative PDEs live on finite-dimensional manifolds: we should be able to learn those manifolds from data, and to learn dynamical systems on those manifolds from data. That's really what Alec talked about last week. The picture to keep in mind is still this one: you have a full state space, but at long times the dynamics live in some reduced space, and we want to go back and forth between the two. We're interested in lower-dimensional models, ideally minimal-dimensional models, and we want to maintain the Markovian nature of the dynamics. One reason, from an engineer's point of view, to do that is to exploit those low-dimensional representations for applications such as control.

We already heard from Alec last week, and the plan is to hear from Kevin and Carlos today. I'll also put in an advertisement for Daniel Floryan, who will speak next week on the idea that a minimal representation in general can't be monolithic: you have to subdivide state space into smaller pieces and put them together, because you can't come up with a global, non-singular coordinate representation for an arbitrary manifold in an arbitrary number of dimensions. That's just a prelude to his work. For what Alec talked about last week and what Carlos and Kevin will talk about today, we're still looking at global representations, one coordinate system representing the manifold globally; next week we'll hear about how to get past that limitation. So without further ado, I'll turn things over to Kevin.

Nice to meet all of you. My name is Kevin, and today I'll be sharing work we've titled "Deep reinforcement learning using data-driven reduced-order models discovers and stabilizes low-dissipation equilibria."

As the title suggests, this talk is motivated by deep reinforcement learning (RL), a data-driven, model-free method that in recent years has garnered a lot of attention for its ability to discover very complex control strategies for very high-dimensional systems. Some of the accolades it has acquired over the years are superhuman performances in very complex, high-dimensional games such as Go, StarCraft II, and Dota 2. Since we're a fluids group, we're mostly interested in its promise for flow control applications.

The general gist of reinforcement learning, for those who are unfamiliar, is that you have a control agent which, given a state observation from the system you're trying to control, maps it to an action that, when applied to the environment, attempts to maximize some reward, or control goal. The agent interacts iteratively with the environment and learns a little bit each time it interacts. This occurs millions of times, until someone decides it's good enough.
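In code, the interaction loop described here might look like the following minimal sketch, where `env`, `agent`, and their methods are hypothetical stand-ins rather than anything specific from the talk:

```python
# Generic RL interaction loop: observe, act, receive reward, learn, repeat.
# `env` and `agent` are hypothetical placeholder objects.

def run_episode(env, agent, n_steps=1000):
    state = env.reset()
    for _ in range(n_steps):
        action = agent.act(state)               # policy maps state observation -> action
        next_state, reward = env.step(action)   # environment evolves and returns a reward
        agent.update(state, action, reward, next_state)  # learn a little each interaction
        state = next_state
```

This loop is repeated over many episodes until the policy is judged good enough.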
Despite all these cool achievements over the years, something that's often overshadowed is that RL has a bit of a sample complexity problem. What that boils down to is that, in general, you need a tremendous amount of training data. What isn't often talked about with these achievements of the past few years is that they take hundreds, if not thousands, of years' worth of experience to reach those performances. So if you're trying to apply this to simulations or experiments that are very expensive to run, RL becomes very expensive very quickly, or even infeasible.

In this work we introduce a method that combines the data-driven nonlinear modeling Alec shared last week, which we'll revisit, with reinforcement learning control, and applies it to high-dimensional systems, in particular systems with chaotic dynamics. As I mentioned, this is a known problem, and people are working on various ways of addressing it. Our method falls in line with what the literature calls model-based RL. In the typical reinforcement learning problem you have your environment and an agent that's constantly interacting with it; if the environment is expensive to simulate or to run experimentally, all those iterations can cost a lot of computation or money. In model-based RL, or at least in our reduced-order-model-based version of it, we first learn a model from data alone for the purposes of control. Then we apply reinforcement learning to that model to approximate a control strategy, or control policy. Once we have this approximate control policy, we extrapolate: we apply it to the original system we were modeling and see how well it works. That's the general gist of our method today.

Okay, so here's a method overview. There's a lot on this slide, and it's not really important to understand what's going on in every single box; I just want to walk through the five main steps, and I'll go into more depth when I get to each step later in the talk. The basic idea is: in step 1, we generate training data for the reduced-order model we're trying to learn. Using that data, in steps 2 and 3 we do something similar to what Alec talked about last week: we learn an embedding coordinate system for the high-dimensional system, and then we model the dynamics in that low-dimensional space, which completes our model; we do this with neural ODEs. Once we've completed steps 2 and 3, which is basically building our model, in step 4 we apply the deep reinforcement learning problem not to the true environment, which could be expensive to evaluate, but to the reduced-order model we've learned. Finally, in step 5, once we have our control policy, we apply it back to the full system, the original system we modeled, and see how well it works.
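As a rough sketch, and with all function names as placeholders for the components described above, the five-step pipeline might be organized like this:

```python
# Hypothetical outline of the five-step reduced-order-model-based RL pipeline.

def reduced_order_model_rl(full_system):
    data = collect_random_actuation_data(full_system)   # step 1: (s_t, a_t, s_{t+1}) tuples
    encoder, decoder = train_autoencoder(data)           # step 2: learn latent coordinates h
    latent_model = train_neural_ode(data, encoder)       # step 3: model dh/dt = f(h, a)
    policy = train_rl_agent(latent_model, decoder)       # step 4: RL on the cheap surrogate
    return deploy_policy(full_system, encoder, policy)   # step 5: apply to the full system
```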
Sorry, Kevin, I'm just going to ask you a question before going further, because I want to make sure I understand. I would have imagined that you would just take Alec's results from last week and skip step 2, because you already have h and g, right? I'm trying to understand: is it necessary to generate new data, where you're actually perturbing the system, in order to learn the embedding coordinates and so on? Or can you just take the learned h and g from the natural behavior of the system and skip right to doing control? Can you say a bit about that?

Yeah, that's a really good question. The reason we have to start over a bit, while still using the same framework, is that Alec's model was trained purely on the natural dynamics of the KSE. There's no control input, and the model has no way for you to feed in control actions, so there's no way for the agent to interact with that model, no way for the agent's choice of actions to influence the model.

So the system you're actually looking at is the natural system plus the actuation, and that's the system you're learning the manifold and everything for. Okay.

Exactly.

All right, so moving forward, let's talk about the system we're going to apply this to. Similar to last week, it's the Kuramoto-Sivashinsky equation, which I'll just call the KSE for short. For those who are unfamiliar, it's a system that exhibits spatiotemporal chaotic dynamics, and in our group we use it as a 1D proxy for turbulent flows. The system we're looking at has a domain size of 22 with periodic boundary conditions; we use this system because it's well characterized dynamically and there's a lot out there on it. Since this is a control problem, we take the state observable of the system to just be the velocity profile, so, for example, this slice right here of the KSE looks like a wave in the velocity. Then we need a way for the control agent to interact with the system, and we allow that to happen through four fixed, equidistant Gaussian jets; these are the actions, and they can act independently. The Gaussian jets are located here, in the yellow bands. Finally, because this is a control problem, we establish the control objective, which is to reduce the drag, or an analog to it: in other words, we look to minimize the dissipation and the power input cost of the system.

All right, so with this system, we apply our method. The first step is to collect training data for a reduced-order model. Because right now we don't have any control strategy, the best we can do is just see what happens when we apply random actuations to our system. So the first step is generating random actuations: we run trajectories, apply random jet actuations to them, and see what happens. The data we collect looks like this: for a given state we apply some random jet actuation, record it, and then see what happens when we apply it, which comes out as s_{t+1}, the resulting state of that state-action pair. Okay?
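A minimal sketch of this data-collection step might look like the following, where `step_kse(u, a, dt)` is a stand-in for the actual KSE solver with the four Gaussian jet forcings, not code from the talk:

```python
import numpy as np

# Step 1 sketch: roll out the forced KSE under random jet actuations and store
# (state, action, next state) tuples for model training.

def collect_training_data(u0, n_steps, dt, n_jets=4, rng=None):
    rng = rng or np.random.default_rng()
    u = u0.copy()
    dataset = []
    for _ in range(n_steps):
        a = rng.uniform(-1.0, 1.0, size=n_jets)   # random jet amplitudes in [-1, 1]
        u_next = step_kse(u, a, dt)                # hypothetical solver: advance one interval
        dataset.append((u.copy(), a, u_next.copy()))
        u = u_next
    return dataset
```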
Dynamically, what we're doing is this: in the previous case, Alec was collecting state trajectories and training off those; here we're collecting states and actions. Essentially, we're sampling in the vicinity of the attractor of the unforced system, but collecting data about what happens when we apply actuations, for our model. All right, so that's step one, collecting training data for the reduced-order model.

The second step, in parallel with Alec's work from last week, is to use that data to learn a useful reduced-order coordinate transformation. The way we do that, again, is with an undercomplete autoencoder, shown over here. This is an hourglass-shaped neural network: given a state input, it tries to map it to a reduced-order representation we'll call h, where the number of degrees of freedom in h, d_h, is less than however many you have in your state representation. The autoencoder then has to take that compressed representation of the state, expand it back out, and reconstruct the original state input. The reason we want to do this is that we know the dynamics of many high-dimensional systems live in a low-dimensional space, and we want to take advantage of that in our model.

Before I get to the results for the data set we collected, I want to deviate a little and talk about this first figure, figure (a). This figure was generated with data collected without any actuation, so it's a direct analog to what Alec did. Shown on the vertical axis is the reconstruction error, where lower is better, and on the horizontal axis is how many degrees of freedom we allow in the latent representation. What we see is that as we gradually increase the number of degrees of freedom, up to eight degrees of freedom, that's when we achieve a very good reconstruction. What we conclude is that this is how many degrees of freedom we need to represent the space the data lives in for the unactuated KSE, or in other words, how many degrees of freedom we need to represent the manifold the dynamics live on; that's the little red line here.

When we do the same analysis with the actuated data we collected in step 1, we see a delay in the onset of this drop in reconstruction error, up until about 11 or 12 degrees of freedom; that's when we see comparable performance between the two. The way we interpret this is that when you apply actuations to trajectories of the KSE, you're pushing them off the natural attractor they typically move along, so what that looks like is filling in the space around the original attractor, which is shaded in pink here. A more physical interpretation can be seen if we look at the power spectral densities of the two data sets, the unactuated snapshots from the unactuated case and the data we collected under actuation. When you compare the two, the actuated data set has a longer tail in the high-wavenumber region, which tells us that the data we collected and are training on has higher-frequency features that the autoencoder needs to capture, which likely requires more degrees of freedom to represent accurately.
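A minimal PyTorch sketch of the undercomplete autoencoder in step 2 is below; the layer sizes and activations are illustrative assumptions, not the architecture from the study:

```python
import torch.nn as nn

# Hourglass autoencoder: compress the velocity profile u (d grid points) to a
# latent vector h with d_h << d degrees of freedom, then reconstruct u.

class Autoencoder(nn.Module):
    def __init__(self, d=64, d_h=12):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(d, 128), nn.GELU(),
            nn.Linear(128, d_h),          # latent coordinates h
        )
        self.decoder = nn.Sequential(
            nn.Linear(d_h, 128), nn.GELU(),
            nn.Linear(128, d),            # reconstructed velocity profile
        )

    def forward(self, u):
        return self.decoder(self.encoder(u))

# Training minimizes the reconstruction error, e.g.
#   loss = nn.MSELoss()(model(u_batch), u_batch)
```

Sweeping d_h and plotting the converged reconstruction error against it gives curves like figure (a): the error drops sharply once d_h reaches the dimension needed to represent the data.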
All right, so now we've collected our training data and learned our embedding coordinate system. The third step, to wrap up the model we're learning from data, is to learn the dynamics in this reduced-order space. Again, very similar to what Alec did last week, we're going to use a neural ODE to learn the dynamics in the latent space. The neural ODE's job is, given a latent state h(t) and some actuation, to forecast how that latent state evolves in time. The way we train it is with the same data we collected in step 1, the state observables, actions, and resulting states, converted into latent-state representations using the encoder we learned in step 2, so we already have our training data. Of course, we can also use the decoder, the other half of the autoencoder from step 2, to map back to the ambient space whenever we want, so we can always decode our forecasted trajectories from the latent space.
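A sketch of the step-3 latent dynamics model might look like the following; the network size and the fixed-step RK4 integrator are illustrative choices rather than the exact setup described in the talk:

```python
import torch
import torch.nn as nn

# Neural ODE for the latent dynamics dh/dt = f(h, a), with the jet amplitudes a
# supplied as an extra input so the model can be actuated.

class LatentODE(nn.Module):
    def __init__(self, d_h=12, n_jets=4, width=128):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(d_h + n_jets, width), nn.GELU(),
            nn.Linear(width, d_h),
        )

    def rhs(self, h, a):
        return self.f(torch.cat([h, a], dim=-1))

    def step(self, h, a, dt):
        # One RK4 step with the actuation held constant over the interval.
        k1 = self.rhs(h, a)
        k2 = self.rhs(h + 0.5 * dt * k1, a)
        k3 = self.rhs(h + 0.5 * dt * k2, a)
        k4 = self.rhs(h + dt * k3, a)
        return h + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Training: encode each (s_t, a_t, s_{t+1}) tuple with the step-2 encoder and
# minimize || h_{t+1} - step(h_t, a_t, dt) ||^2 over the data set.
```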
So now that we have our embedding coordinate system and we've learned the right-hand side of the embedded dynamics, we can make actuated forecasts in this latent space and look at what our model's forecasts look like. The top two plots here are two example trajectories of the KSE, from two different initial conditions, evolving under some random actuation sequence; the Gaussian jets are just randomly actuating, and this is what those two trajectories look like. Below each plot is the reduced-order model we learned from the data, starting from the same initial condition as the respective upper plot and following the same random actuation sequence. From a qualitative perspective, the forecasts match up for about 20 to 30 time units, which corresponds to about one to 1.5 Lyapunov times, so qualitatively it's making pretty good predictions. To get a little more quantitative, the bottom two figures show ensemble statistics: the spatial autocorrelation on the left and the temporal autocorrelation on the right. For the spatial autocorrelation, the reduced-order model matches the ground-truth KSE under random actuations very well, while the temporal autocorrelation also does pretty well, although there's a little bit of deviation. But all in all, the model seems very true to the actuated KSE.

One natural question that came up was how well the model actually captures the underlying natural KSE. As a reminder, the reduced-order models we learned in the previous two steps were trained entirely on snapshots of the KSE being perturbed by random jet actuations, so for the most part there's essentially no zero-actuation data in the training set. We wanted to know how well our reduced-order model captures the underlying natural dynamics; in other words, how well does it predict if you just turn the jets off? That's essentially what's shown in the upper part here: the top two plots are two example KSE trajectories with the actuations removed, and below them are the corresponding forecasts made by the same reduced-order model shown previously. All we've done is set the actuations to zero and see how well it predicts. Again, we get pretty similar qualitative predictions, for about a Lyapunov time or so. In terms of the same ensemble statistics, the spatial autocorrelation again matches very well between the two cases with actuations removed, and the temporal autocorrelation matches well until somewhere around 20 time units, where it starts to deviate, or dilate, a little. But all in all, I think we were pretty satisfied with how well this turned out, considering the model wasn't trained on any of the data we're trying to get it to reproduce.
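For reference, a generic way to compute the two validation statistics from an array of snapshots `u[t, x]` on the periodic domain is sketched below; this is a standard estimator, not necessarily the exact post-processing used for the figures:

```python
import numpy as np

def spatial_autocorrelation(u):
    u = u - u.mean(axis=1, keepdims=True)
    spectra = np.abs(np.fft.rfft(u, axis=1)) ** 2        # Wiener-Khinchin
    corr = np.fft.irfft(spectra, n=u.shape[1], axis=1)   # circular autocorrelation in x
    corr = corr.mean(axis=0)                             # ensemble average over snapshots
    return corr / corr[0]                                # normalize so C(0) = 1

def temporal_autocorrelation(u, max_lag):
    u = u - u.mean(axis=0, keepdims=True)
    c = [np.mean(u[: u.shape[0] - lag] * u[lag:]) for lag in range(max_lag)]
    return np.array(c) / c[0]
```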
I had a quick clarification question. When you're doing the reinforcement learning, are you still feeding it examples where the jets are turned off? So that when you're learning the optimized control, it has the option to turn the jets off and knows what that effect would be?

Yes. In the next couple of slides we'll talk about applying the reinforcement learning problem to the model we have, and this is definitely something we care about, because in the reinforcement learning problem the control agent should have all the control freedom it would have in the real system. In the real system, if you were doing this traditionally with reinforcement learning and the agent wanted to turn the actuations off for some reason, because maybe that's a good time to turn them off, it should be able to. So in our model we wanted that to also be true.

Maybe you could just say a little more about what the training data look like in terms of the actuations.

The training data look most similar to these two upper plots here.

But were there other cases? Do you have the pattern of the actuations as a function of time? Are they on-off?

I don't have them in this presentation. The actuations are actually sampled throughout the control range allowed in the original system, so they can take any value between negative one and one, where negative one is, say, full maximum suction and plus one is maximum jet input. There is a chance for all four jets to be sampled at zero, but that's unlikely. So each jet gets a random number between negative one and one, and you maintain that for some time interval.

A very brief interval?

Yeah. Okay, so to continue: now that we have a good reduced-order model of the dynamics with actuation, the final step, coming full circle, is to return to the reinforcement learning problem I introduced at the beginning, and now train the agent with the reduced-order model. As a recap, in the typical RL process the agent makes a state observation of the system and, from that observation, makes an action choice; when the action is applied to the environment and the system evolves, the impact that choice had on the control goal is quantified by a scalar reward. The agent takes all these pieces of information and uses them to update itself, and it repeats this over and over, millions of times. As you can imagine, if every iteration means running your simulation or your experiment, that gets tremendously expensive very quickly. In our method, instead of doing this cyclic learning process with the environment, we do it with the reduced-order model we learned, which can handle actuation.

Two things to note. First, you're learning with this reduced-order model instead of the full-state environment. Second, instead of learning a map from the state observable to the action that controls the system, we're now learning a mapping between the latent-state representation and the action that affects the latent-state dynamics; in other words, we're essentially training the agent in the latent-state representation rather than the full-state representation. For anyone familiar with reinforcement learning, you might be curious how you calculate the reward when your model lives in the latent-space dynamics. You can do that simply by using the decoder learned in step 2: you can always recover the full-state representation, and from there you can estimate the reward. So since we care about dissipation, we can always decode to the full-state representation and estimate what the dissipation was at that moment. But all in all, we don't need to run many expensive experiments once we have our model.

All right, the final step: once we've trained the reinforcement learning agent with the reduced-order model, the final test is to see whether it actually works. This is a very simple modification of the typical deployment. From the environment we get some state observation, and we plug in the encoder learned in step 2, which maps the state observable to the latent-state representation; we have to do this because the agent we trained only recognizes latent-state inputs. Once we do that, we can apply the control agent in the same closed-loop fashion as you normally would with vanilla reinforcement learning.

So when we apply the learned control agent to the full KSE system, the original ground truth, we see that it quickly drives the system to an equilibrium-like state with low dissipation and low power input cost. For reference, this dotted black line is the system mean of the KSE, and we're able to drive the system to a steady, low-dissipation state. One thing we wanted to make sure of was that our reduced-order-model-based control wasn't just getting lucky, and that this wasn't a coincidence. So we took the same initial condition we applied to the original KSE and tested the control agent on the reduced-order model with that same initial condition, and we see the control agent driving it to the same low-dissipation equilibrium with the same statistics. What this told us was that our model appears to capture the existence of this forced equilibrium, as well as the dynamics that lead up to it, well enough for the reinforcement learning agent to both discover and exploit it.
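A rough sketch of how the learned model can stand in for the environment during training, and how the trained policy is then deployed, is below. The `agent`, `estimate_dissipation`, and `power_input` pieces are placeholders; the actual reward definition and RL algorithm aren't spelled out here:

```python
# Step 4 sketch: the reduced-order model acts as the RL environment, with the
# reward estimated by decoding the latent state back to the full velocity profile.

class ROMEnvironment:
    def __init__(self, latent_model, decoder, dt):
        self.latent_model, self.decoder, self.dt = latent_model, decoder, dt

    def step(self, h, a):
        h_next = self.latent_model.step(h, a, self.dt)   # evolve the latent dynamics
        u = self.decoder(h_next)                         # recover the full state
        reward = -(estimate_dissipation(u) + power_input(u, a))  # drag analog to minimize
        return h_next, reward

# Step 5 sketch (deployment on the real system):
#   h = encoder(u_observed)   # the agent only understands latent inputs
#   a = agent.act(h)          # apply a to the actual KSE in closed loop
```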
All right, as a last thing, we were very curious about the dynamical-systems significance of this reduced-order-model-based strategy that we learned. To answer that question, we turned to continuation in the forcing profile. The equilibrium our control agent targets is sustained by an essentially constant forcing profile. So we took this equilibrium, sustained by that forcing profile, solved for equilibrium solutions of the KSE, and gradually decreased the amplitude of the forcing profile down to zero, so that we come back to the original KSE. What we find is that this forced equilibrium connects back up to a known solution of the natural KSE, known as E1, which is very cool. All in all, this is a good control strategy because it takes less control effort to stabilize, compared to constantly trying to drive the system to some arbitrary point in state space where the dynamics generally just don't want to be.

Okay, so to wrap up my talk: today we've shown a method for approximating control policies in a completely end-to-end, data-driven fashion from just a limited data set, which is different from your typical RL method. We use autoencoders to learn the low-dimensional space in which the dynamics of many of these high-dimensional systems live. Then we use neural ODEs to model the actuated dynamics in the reduced-order space. Finally, we use deep reinforcement learning to extract a control policy from the learned model. We applied this to the KSE to demonstrate that the model is able to capture these underlying low-dissipation equilibrium states and that our deep reinforcement learning agent is able to find them. We also showed that although our control policies were extracted from reduced-order models, they still transfer well to the original high-fidelity simulations we were modeling. And with that, I'd be open to taking any questions anyone might have. Thanks.
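A minimal sketch of that continuation procedure, assuming a generic nonlinear solver and a placeholder `kse_rhs(u, forcing)` for the steady KSE residual with the added jet forcing, is:

```python
import numpy as np
from scipy.optimize import fsolve

# Continuation in the forcing amplitude: start from the forced equilibrium and
# step the amplitude alpha from 1 down to 0, re-converging an equilibrium each
# time with the previous solution as the initial guess.

def continue_to_natural_equilibrium(u_forced, forcing_profile, n_steps=50):
    u = u_forced.copy()
    for alpha in np.linspace(1.0, 0.0, n_steps):
        u = fsolve(lambda v: kse_rhs(v, alpha * forcing_profile), u)
    return u   # at alpha = 0 this should be an equilibrium of the natural KSE
```

In the talk's case, the branch traced out this way connects to the known equilibrium E1 of the natural KSE.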
Well, I want to jump in and ask questions. Just on the last point you were making: this dissipation-minimizing equilibrium, as you're calling it, as I understand it, is not an equilibrium of the natural Kuramoto-Sivashinsky equation. You have to force it; there's some non-zero forcing profile when you get to it. Do I understand correctly that it actually reduces the dissipation relative to some related equilibrium, which would be the zero-forcing case? Is it actually lower dissipation than that? That's the piece I'm not quite clear on.

In this case, the one I'm pulling up here, they're quite comparable. I think they're very similar, although this one might be a little lower. But yes, I think that was definitely the case in our previous work, where we did this without a model, just reinforcement learning applied directly to the KSE.

I see. But in principle you could also change your control objective. Right now you're just minimizing dissipation, but in principle your target could be, say, an equilibrium of the natural KSE system, right?

Yes, for sure; you'd have some cost associated with that.

All right. Just to jump in: you could do that, but only if you know it's there. The nice thing, and actually the pleasant surprise, about this is that it's all from data, and in this case even just from a model of the data. We didn't do any Newton iterations beforehand. We just took raw data, found a model, applied control to try to minimize the dissipation, and lo and behold, we land on one of these equilibrium states, or the forced version of one of these bright green squares.

Got it, got it. But there's nothing to say you couldn't use advance knowledge; say you used Newton in advance and then you had a target, in principle. Let me just ask a follow-on, then. What about accounting for continuous symmetry, say if you were to target a relative periodic orbit? I don't know whether you'd have to have translational invariance built into your actuation system, or whether your actuators would break that continuous symmetry and you wouldn't be able to target RPOs. That's my question.

I see. I definitely think you could modify your control scheme so that your jets could move around, and I would venture to guess that, provided enough data that your model is good enough, the RL could probably stabilize a periodic orbit. I think the difficulty mostly comes in how you formulate the reward, or cost function, for the learning agent to find it. You could either know exactly where the orbit is and have the agent minimize the distance to it, or you could be very general and not even specify which orbit you want; maybe you just want an orbit with some period, and you reward the agent for bringing the system back to the same state every so many time units.

Just one further comment on that point: the actuators in Kevin's case do break the continuous symmetry, but there are still discrete symmetries, and he has a paper on building those symmetries into the RL framework.

Discrete and continuous, or both?

Well, in that case there are no more continuous symmetries once you have the actuation, right? Just the discrete ones. I guess I was thinking you could have traveling actuators.

Yes, that's what I would say, if you had that.

Okay. Well, why don't you build us some of those, as the experimentalist?

Yeah, yeah. Hey, I have a question for you, Kevin. Mike was asking about the natural unforced equilibrium and whether you could conceptually stabilize that. I think I remember that you looked into this, and if I recall, the unforced equilibrium is not stabilizable at all. Is that correct?

Yeah, the trivial solution is not stabilizable with our actuation scheme. I didn't check whether the other equilibria are stabilizable; I only checked the zero solution.

Okay, gotcha. So it is possible that E1, the natural E1, is or is not stabilizable. I thought in your older paper you checked that.

That was for the zero solution.
So we were answering the question of why it doesn't target the zero solution, okay. Well, as a practical matter, we kind of know that E1, at least its forced version, is stabilizable, because your algorithm does it.

Another thing I wanted to bring up: I noticed that for the forced data you need 12 degrees of freedom to capture it. The interesting thing is that when you add your jets, you introduce four additional degrees of freedom, and 12 is exactly four more than the eight degrees of freedom you need for the unforced system. So this might be a quick and dirty way to estimate how many degrees of freedom you need for your actuated dynamics, provided the actuation doesn't qualitatively change the state-space structure.

That's a very cool idea. Excellent. Any other questions?

I have kind of a question for anyone who wants to answer it. Is there any sense, in fluid systems, among equilibria versus periodic orbits versus relative equilibria, of one of those classes generally having lower dissipation? Can you say that periodic orbits are pretty much always lower dissipation, or anything like that?

I think from our experience with channel flow that would be hard to generalize, because, for example, there are low-drag equilibria and high-drag equilibria, or traveling waves, equilibria in a traveling frame, and then you have RPOs that end up bouncing between them. So there are probably lower-drag equilibria than any of the RPOs, but I'm not really sure of that, because you could have RPOs that are on the edge, the basin boundary. That's a hard question to answer.