[00:00:05] >> Thank you everyone. What I'm going to talk about today is some work that we have recently started looking at in our group: how do you develop a learning algorithm that successfully couples knowledge of physics and data together? It's a very interesting and important problem, and the application domains range from science — I think we heard David talk a little about that today — all the way to autonomous systems. Depending on whom you are talking to, there are different types of applications where we would like to couple model-driven and data-driven learning, and what we're trying to understand is whether it's possible for us to build, [00:00:54] or find, a computational framework which includes both a computational model and an equivalent computer architecture that would allow us to do this. Most of the work I'll be presenting here — probably all of my work — is done by the students.

[00:01:15] If I look at the whole spectrum of learning: at one end we know the physics pretty well and the physics is simple to solve, and we can call that model-based or physics-based learning, solving the physics equations directly. At the other end of the spectrum everything is data-driven: we think we know nothing about the physics, there is no physics inside, it's just data, and we're trying to make predictions. Weather prediction is a great example of that, where you have spatio-temporally varying data and you fit a deep learning model to it to understand how the weather is going to behave next Friday.

[00:02:01] What we're interested in is somewhere in the middle, where you have some understanding of the physical principles of the system, but you also have data, and our goal is to understand whether it's possible to couple these two together. What's the application? At the minimum, it could be running this forecasting in a more computationally efficient way; at the maximum, we might be looking at understanding hidden dynamics in the data that are not modeled by the physics, or understanding invariances that exist in the physical system.

[00:02:44] The way we're approaching this problem is a little different from using physics to constrain learning. What we're trying to get at is this: is it possible to create a, quote unquote, neural network that includes data-driven learning as well as models of the physics — that explicitly integrates the dynamics within the overall learning model?

[00:03:13] At a high level, this is what we want to do: you have a dynamical system model, defined by a set of coupled partial differential equations, and you have a data-driven model with the ability to learn from the data, and you want to couple them. We know a lot of ways to do data-driven learning; the goal is to create a neural-network-like model that can be very easily integrated with a CNN but can also map the physics directly. [00:03:46] So, a little bit of history.
I'm sure all of us have heard of the memristor. The person who first proposed it, Leon Chua, made a very interesting proposition around the 1990s for designing neural networks in a very different way: a class called cellular nonlinear networks, or coupled nonlinear networks. The idea is that you have a system with multiple nodes arranged in a grid, each node follows an ordinary differential equation, and the nodes are connected to each other through something called a template, which essentially defines how the states of different nodes influence each other. The rate of change of the state of a particular node depends on the state itself, on the influence from the states of neighboring nodes, and on the influence from input variables. At a high level it seems like these template entries are all constants, but if you go deep into the theory there is no reason they have to be constant linear terms; they can be highly nonlinear. Initially the whole concept of the cellular neural network was developed primarily to solve image processing problems, but then very interesting work came out of Chua's group and others [00:05:23] showing that once you start adding nonlinearity into these terms, you are in a position to model very interesting dynamical systems using this common, general platform: systems that map different types of differential equations, that map interesting and difficult optimization problems and solve them, that map chaotic systems. Essentially, different types of dynamics can be mapped very easily onto this generic environment.

For our purposes, we were interested in using this for solving systems of differential equations. The way to think of it is this: if you have a coupled system with multiple variables, for each variable you create a grid, and that grid is the spatial discretization of the whole system; if you have multiple variables, their coupling is determined by a template that couples the different variables together. To give a very simple example: if you are trying to map a reaction-diffusion system, you take the equations and go through a very simple process to derive the weight factors that connect the different nodes together within a plane, and the ones that connect nodes across the planes. [00:06:50] I'm not going into a lot of the details of the math, but there are a few constraints that you have to satisfy, to keep in mind when you set up these problems: you can make the network a stable system, where you compute the Lyapunov energy function and make sure the steady state minimizes that function, or you can make the network behave more like an oscillatory network. Depending on the kind of dynamics that you are modeling, it's possible to apply [00:07:15] different classes of constraints on these weight factors to mimic the dynamics. So why is this interesting for hybrid learning, for mapping data and model together? The interesting part is this.
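To make the reaction-diffusion mapping concrete, here is a minimal sketch of the idea — my own illustration, not the speaker's code. Each variable gets its own grid plane, the in-plane template is a discrete Laplacian derived from the diffusion term, and a nonlinear reaction term couples the planes; all function names, the FitzHugh-Nagumo-style reaction terms, and the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

# In-plane template: 5-point discrete Laplacian, derived from the
# diffusion term D * (d^2/dx^2 + d^2/dy^2) with grid spacing h.
def laplacian_template(D, h):
    return (D / h**2) * np.array([[0.0,  1.0, 0.0],
                                  [1.0, -4.0, 1.0],
                                  [0.0,  1.0, 0.0]])

def cenn_step(u, v, dt, A_u, A_v, reaction):
    """One forward-Euler update of a two-variable CeNN-style grid.

    u, v     : 2-D state planes (one grid per variable)
    A_u, A_v : in-plane templates (here: Laplacians from the diffusion terms)
    reaction : nonlinear cross-plane coupling, returns (f(u, v), g(u, v))
    """
    f, g = reaction(u, v)
    du = convolve2d(u, A_u, mode="same", boundary="symm") + f
    dv = convolve2d(v, A_v, mode="same", boundary="symm") + g
    return u + dt * du, v + dt * dv

# Illustrative reaction terms (FitzHugh-Nagumo-like, chosen for the example).
def reaction(u, v):
    return u - u**3 - v, 0.1 * (u - 0.5 * v)

u, v = np.random.rand(64, 64), np.random.rand(64, 64)
A_u, A_v = laplacian_template(1.0, 1.0), laplacian_template(0.1, 1.0)
for _ in range(100):
    u, v = cenn_step(u, v, dt=0.01, A_u=A_u, A_v=A_v, reaction=reaction)
```

Note that the in-plane update is literally a small convolution applied to the grid, which is exactly the observation the talk builds on next.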
If you look at these equations, they very closely mimic the convolution operation — apparently the way to solve all problems in the world, the convolutional neural network. What does that mean? It means that if you can map a dynamical system onto this model, you have a fully differentiable model. Why is that important? Now I can train this model. In other words, I don't have to uniquely determine the template from the equations; since it's a fully differentiable model, I can actually train the system to come up with at least parts of the template, in a way that keeps the whole system stable, using concepts like stochastic gradient descent. So, in summary: it is a fully differentiable model, all of its operations can be represented as convolution operations, and I can train it, quote unquote, using the conventional training algorithms the DNN world likes to use, such as stochastic gradient descent. There are a lot of discussions we can have about the right way to train it — the answer depends on the problem you are solving — but since we can create a trainable solver, what we're going after is this: can we use this property to first learn parameters of a physical system that you do not know, at minimum, but more importantly, can you start discovering dynamics that are hidden within the data but not obviously modeled by the differential equations you are using for the numerical solver?

[00:09:16] Our goal is not, given the data, to come up with the physics. The goal is to understand whether there is a way we can couple hidden dynamics present in the data with known physics, to improve analysis and prediction of a system. We started off with a very, very simple example of this, and we are currently in the process of building much more complex problems, which I'll get to later. The first thing we wanted to do: consider a 2-D fluid system governed by Navier-Stokes, and in this fluid system you have multiple objects which are moving with certain velocities, and they can bounce off the boundaries of the box. That creates a very high degree of nonlinearity in the object motion. Our goal is to understand: if I give you the behavior of the system over the past few frames, can you tell me what's going to happen — how the fluid velocity field is going to change at frame T plus 10, or any number of frames ahead? Why do you care about it? It's a classic forecasting problem: there are certain things happening that we do not know but are observing, and we want to make predictions. You also care about it in the context of inverse problems — you might want to know what is disturbing the velocity field, which is crucial for a lot of naval applications — and if you change the system from a fluid system to another system, it might become important for driving applications and so on. So how do you do that? A simple way to think about it: you learn to encode the object behavior through a data-driven learning model and couple that with the physics-based model of learning that comes from this multi-layered cellular neural network, and since this part is now differentiable, you can train the entire network as a single system.
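Here is a minimal sketch of the "trainable solver" idea in PyTorch — again my own illustration under assumptions, not the speaker's code. The unknown physical parameter (a diffusion coefficient here, standing in for something like the fluid density) is exposed as a trainable weight, and gradients flow through the unrolled solver steps; the synthetic-data setup and all names are hypothetical.

```python
import torch

# Unknown physical parameter exposed as a trainable weight in the solver.
log_D = torch.zeros(1, requires_grad=True)   # D = exp(log_D) keeps D > 0

lap = torch.tensor([[[[0., 1., 0.],
                      [1., -4., 1.],
                      [0., 1., 0.]]]])       # Laplacian template as a conv kernel

def solver(u0, steps, dt=0.01):
    """Differentiable forward-Euler rollout; gradients flow through every step."""
    u, D = u0, torch.exp(log_D)
    for _ in range(steps):
        u = u + dt * D * torch.nn.functional.conv2d(u, lap, padding=1)
    return u

# Synthetic "observations" generated with a hidden true D, for illustration.
u0 = torch.rand(1, 1, 32, 32)
with torch.no_grad():
    log_D.fill_(torch.log(torch.tensor(0.5)).item())
    u_obs = solver(u0, steps=20)
    log_D.zero_()                            # reset to the wrong guess D = 1

opt = torch.optim.Adam([log_D], lr=0.05)
for it in range(200):
    opt.zero_grad()
    loss = torch.mean((solver(u0, steps=20) - u_obs) ** 2)
    loss.backward()
    opt.step()                               # D converges toward the hidden 0.5
```

The same machinery is what lets the talk's hybrid network retrain a parameter online whenever predictions drift from observations.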
[00:11:17] So you can start getting at problems like this: for example, you are trying to do this prediction when you do not know the density of the fluid. You can do this continuous, integrated training, where you start with any random value and train the network to predict the value of the density, and if the density changes, you retrain by observing whether the prediction matches the observation or not. So in a real-time fashion you can keep updating these parameters and make sure that your prediction matches the data very closely. What does that mean? For example, here is a simulation of the multiple objects moving in the plane; this is coming out of the numerical solver, and this is what the hybrid network is learning, and you can see that it does a pretty decent job of prediction. What do we mean by pretty decent? If you compare the accuracy of what this hybrid learning is predicting against a very classical video prediction network — a purely data-driven baseline — you can see that the accuracy is much higher, and the uncertainty in these predictions is much lower. The more interesting part is that when you are making these predictions, you can integrate Bayesian learning into the process, so along with making predictions we can predict the uncertainty as part of the whole network.

[00:12:57] That was a very simple example. If you look at more complex cases, you are probably looking at a network more like this: some of these networks model what you know, some parts are treated as general dynamics where the templates are going to be trained, and a whole bunch of data-driven networks extract features from the data and try to compensate for what the physical model cannot model directly. As an output you generate predictions, you can run a whole bunch of different analyses on them, you can quantify uncertainty — and all of this is part of one integrated network model. The more exciting, more practical problem we're solving in this domain recently is work we started as a joint collaboration between three of us: myself on the machine learning side, Justin Romberg from Georgia Tech ECE, and faculty in oceanography. What we're trying to do is to understand: if you have an ocean model where someone is running what we call a mesoscale simulation — that is, a simulation on a coarse spatial grid — is it possible for us to predict things happening at the submesoscale, a much, much finer-scale process? If you can do that, you will be able to understand the ocean behavior much faster, and also much more accurately; the goal is to understand the invariances, estimate the state, and potentially understand whether there is a better way to assimilate data from different numbers of sources. I'll be happy to share more details on this if any of you are interested. It's still work in progress, but we have some interesting initial directions on how to approach this problem. So that talks about the model.
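As a rough sketch of the kind of integrated architecture being described — a trainable physics step plus a data-driven branch that compensates for unmodeled dynamics — the following is one plausible reading, not the actual model; the class name, layer sizes, and four-frame history window are all my assumptions.

```python
import torch
import torch.nn as nn

class HybridForecaster(nn.Module):
    """Physics step (trainable CeNN-style template) plus a data-driven
    residual branch for dynamics the physics does not capture."""
    def __init__(self):
        super().__init__()
        # Physics branch: one trainable 3x3 template applied as a convolution.
        self.template = nn.Conv2d(1, 1, 3, padding=1, bias=False)
        # Data branch: small CNN that learns a correction from recent frames.
        self.residual = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, history, dt=0.01):
        # history: (batch, 4, H, W) = the last four observed frames.
        u = history[:, -1:]                     # most recent state
        physics = u + dt * self.template(u)     # differentiable physics step
        correction = self.residual(history)     # hidden-dynamics compensation
        return physics + correction

model = HybridForecaster()
frames = torch.rand(8, 4, 64, 64)
pred = model(frames)   # next-frame forecast; trained end to end, e.g. with MSE
```

Because both branches are differentiable, the whole thing trains as a single system, which is the point the talk keeps returning to.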
[00:14:59] The next part we're interested in understanding is whether this model can be easily translated into a hardware computing architecture, and the answer is yes. I can come up with different ways of drawing this figure, but the bottom line is that you will have some hardware acceleration for the deep networks and some hardware acceleration for this coupled network, working together to make this possible. I'm not going to say much about how you actually accelerate DNNs — I'm sure we heard talks in the morning, and there is a lot of work going on in this domain — so let me skip some of the slides here. But the part I would like to point out is that when you are looking into this kind of acceleration, it's important to realize that you have to couple innovations in the architecture, circuits, and algorithms together. One of the things we have been looking at — something that Peter was talking about — is how to do this kind of computation inside the memory system, and we observed very interesting results in terms of how fast you can train and infer when you go inside the memory system. In doing these explorations, one interesting thing we observed while trying to accelerate this network is that sometimes you have to go beyond doing the operation in the spatial domain: convolution in space is multiplication in frequency, so bringing known signal processing algorithms into this context can help. In some cases we might actually have to resort to more advanced devices or circuit techniques that go beyond conventional [00:16:36] digital architecture; for example, we have been looking at how to use resistive RAM or ferroelectric devices to make these things faster.

But the interesting part comes when you get to the acceleration of the coupled network system, because architecturally, from a hardware standpoint, it's a challenging problem: although [00:16:59] it looks like a convolution operation, it has weights which are nonlinear and continuously time-varying. So you cannot simply load a fixed set of weights and operate on them; you have to continuously update the weights of the network. When we were doing this kind of acceleration, in a paper a couple of years back, what we observed is that a simple way to solve this problem is a very simple Taylor expansion: [00:17:33] you expand the nonlinear function as a Taylor series and store the coefficients in a lookup table, instead of storing the values, and whenever you need to evaluate the function you use this lookup table. Now, your functions are highly nonlinear, so you would need a very large table, so you probably have to borrow ideas from conventional caching to create a sort of cache hierarchy for these lookup tables. One thing that becomes interesting here is that your performance depends on the nature of the problem you are solving: if you are solving a problem that varies slowly over time, you will get much better behavior from your caching, but if you are solving problems which are highly nonlinear and highly time-varying, you might get a much higher miss rate. So these types of properties need to be analyzed.
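To illustrate the Taylor-coefficient lookup-table idea, here is a small software sketch under stated assumptions: the nonlinearity (tanh), the interval width, and the use of an LRU cache as a stand-in for the hardware cache hierarchy are all my choices, not details from the paper mentioned.

```python
import math
from functools import lru_cache

# Store Taylor coefficients per interval instead of raw function values:
# within each interval, f(x) ~ c0 + c1*(x - x0) + c2*(x - x0)^2.
def f(x):  return math.tanh(x)           # illustrative nonlinearity
def df(x): return 1.0 - math.tanh(x)**2  # its derivative

STEP = 0.25  # interval width; smaller step = bigger table, better accuracy

@lru_cache(maxsize=64)            # stand-in for the hardware LUT cache:
def coeffs(idx):                  # slowly varying inputs -> high hit rate
    x0 = idx * STEP
    c0, c1 = f(x0), df(x0)
    c2 = (df(x0 + 1e-4) - df(x0 - 1e-4)) / 4e-4   # f''(x0) / 2, numerically
    return x0, c0, c1, c2

def f_approx(x):
    x0, c0, c1, c2 = coeffs(round(x / STEP))
    d = x - x0
    return c0 + c1 * d + c2 * d * d

err = max(abs(f_approx(0.01 * i) - f(0.01 * i)) for i in range(-300, 300))
print(f"max abs error: {err:.2e}; cache stats: {coeffs.cache_info()}")
```

The cache statistics make the talk's point observable: inputs that drift slowly keep hitting the same intervals, while fast, highly time-varying inputs scatter across intervals and drive the miss rate up.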
[00:18:23] When you are done with all these things, you can conceptualize a hardware architecture where you have a whole bunch of processing engines which are basically doing the MAC operations, and you have these lookup tables doing your nonlinear function computation. More recently we started realizing that instead of a lookup table, you might want to create an approximation of the whole function and use that as an accelerator. At the end of the day, what you expect is that, since you are designing an ASIC, it's going to be much lower power compared to doing this on a GPU; but what we also observed is that this gives us much faster operation for solving these kinds of differential equations compared to solving them on a GPU. So why do we care about doing it faster? If you put this back into the prediction we were doing, running this hybrid network on a GPU versus running it on the ASIC — connecting the DNN accelerator and the CeNN accelerator — you are talking about, as expected, a gain on the order of 100x, roughly 10x to 200x, in terms of performance. Now, the GPU is good enough if you are trying to solve problems at the HPC scale; but if you are trying to take these models and solve them at the edge — and robotics is a great application — that is where being able to run these things on a specialized architecture makes more sense.

[00:20:02] I would like to summarize by saying that what we have been observing is that being able to couple model and data together is not only important, it's very powerful. What we have been trying to understand is the right way of doing it. There are very different ways you can do this coupling between model and physics; what we believe in, and what we are focusing on, is this concept where we bring the dynamics in as part of the network, rather than using the physics only to constrain the learning in a deep network. We also believe that if you have these kinds of models and you really want to run them in a faster and more energy-efficient way, you have to accompany them with the proper computer architecture. The part I didn't talk about: we also need to look at how to develop programming models, so that you can take an application and map it onto this architecture. And finally, [00:21:02] we are looking at the space of applications of these models. Science is an obvious application that we looked at, but we are also realizing that the ability of the network to understand dynamics, or hidden dynamics, from the data is very crucial for applications in robotics and autonomy, where you [00:21:24] do not want to depend only on passive vision; you actually want to learn how the different things happening around you in the environment are connected to each other, and whether they adhere to physical laws, and being able to predict and quantify uncertainty is very crucial in all these cases. So with that I'll stop, and I'll be happy to answer any questions. I guess I spoke too quickly and finished too early.
[00:22:08] Yes, so that's actually a very good question. I tried to hide away a lot of the details, but essentially what happens is that when I formulate the coupled neural network, you can write down its energy behavior — you can write down a Lyapunov energy function describing how the network behaves. So when you are doing this coupled training, you can incorporate loss functions which are tied to the energy and stability properties of the network, just to make sure that while you are doing this parameter fitting, you are not letting the network wander off, as you are saying, into situations where things are completely unstable or nonphysical. That's one option. The second option is this ability, when you are making predictions, to think of the whole concept of model-driven control and borrow ideas from there: we make forecasts, we constantly compare how good our forecasts were, and that error needs to be fed back into the learning process — so there is a part of reinforcement learning that would need to be added to this network. These are the two primary ways we are currently thinking of it. For the first part, thanks to the lot of research that happened on these coupled cellular neural networks over the last 10-15 years, there are very well established mathematical theories to formulate these energy functions and operate on them. The challenge is really that those theories were developed in isolation from the traditional, you know, L2 kind of loss function, where you are trying to measure how well you are learning from the data. So how do you merge these two losses together and still create a fully differentiable loss function? Each of them is different. [00:23:56] The other factor that comes in here is that these networks are not exactly the same cellular neural network the way Chua proposed it — there was a specific nonlinearity in that system, and here we are changing the nonlinearity and making the weights state-controlled — so the math that we know is not going to be directly applicable. But we feel the changes are not going to be drastic; the hard work has been done by others, so we can focus on coupling the two instead of redoing the theory of each one.

[00:24:37] Yes, so there are a couple of things that we are learning from this whole experiment. One is very much this idea of: if you have dynamics inside the system, can you capture them? Now, for the example you are describing — the natural language processing example — at least at this point we are not sure there are dynamics of that kind there. The other thing we are learning through this experiment is, when you have very different types of neural networks or models and you try to put them together, how do you train and understand the whole thing as a single system? That learning of how to train different types of networks together is going to be useful even if we extend it elsewhere. One example I can give you: we are trying to understand whether we can take spiking networks, which have very different dynamics, and this DNN
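To sketch the first option — folding an energy/stability term into the training objective — here is one hedged reading in PyTorch. The quadratic energy surrogate below is a hypothetical stand-in (the talk's actual Lyapunov function is not shown), and the penalty weight and function names are my assumptions.

```python
import torch

def energy(u, template_weight):
    """Lyapunov-style quadratic energy surrogate for the coupled network.
    A hypothetical stand-in, not the speaker's actual energy function."""
    coupling = torch.nn.functional.conv2d(u, template_weight, padding=1)
    return 0.5 * (u * u).sum() - 0.5 * (u * coupling).sum()

def hybrid_loss(pred, target, states, template_weight, lam=0.1):
    # Data term: the standard L2 fit to observations.
    data = torch.mean((pred - target) ** 2)
    # Stability term: penalize energy *increases* along the trajectory,
    # nudging training toward templates with decaying (stable) dynamics.
    e = torch.stack([energy(u, template_weight) for u in states])
    stability = torch.relu(e[1:] - e[:-1]).mean()
    return data + lam * stability
```

Both terms are differentiable, which is exactly the requirement raised in the answer: the merged objective still has to admit end-to-end gradient training.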
type of approach, and combine them together to solve a class of problems where you want to bring unsupervised and supervised learning together. But some of these ideas about the loss functions need to change — the loss function has to be reworked so that the whole thing stays fully differentiable. So some of this learning is going to carry over; I don't think this network in particular is going to be directly applicable, [00:26:00] but some of the training and learning principles are applicable — essentially usable for anyone trying to merge networks with very different types of dynamics.