So when you get an invitation to come to a department that's good, you don't say no, and then you think: oh, wait a minute, I'm not even in a chemical engineering department anymore. In my case, I know that when I used to be in a chemical engineering department and people in my area would come and talk, all they would do is this talking maths, and it was really, really annoying. So, apologies to Radiohead, but all I do is buzz like a fridge, and so do my collaborators. You have your option now: you can walk out and I won't get my feelings hurt. What I'm going to try to do is mostly talk in pictures, because I know that when you're looking at a mathematical area that's slightly different from something you've seen before, it's a little bit challenging to take in the symbols really fast.

This is work that is primarily done by Simon Olofsson, who is getting his PhD in computer science, and my collaborator on the machine learning side is Dr Marc Deisenroth; Marc is also at PROWLER.io, that's where he is right now. And then hopefully, depending on how many questions I get asked and how quickly I'm going, I hope to get to mention the PhD work of Johannes Wiebe, who is working in collaboration with Inês Cecílio, who's at Schlumberger research. I really like this kind of research, where basically all the students I work with also work with industrialists, and we try to make sure that our problems are quite relevant.

In general, you all were really very kind to still see me as a chemical engineer. I started in chemical engineering, I am trained in chemical engineering, and there are applications that my lab has in manufacturing systems, biomedical systems, et cetera. We also make limited contributions in mathematical optimization, but mostly we're just in between, and we're happy to be whatever anybody's going to call us. One of the things about working with computer scientists, and about training computer scientists, is that we had better make everything that we do available. So what I'm going to be presenting today is three different papers, and already on GitHub is all of the material that is necessary to reproduce those papers. Hopefully, from a chemical engineering perspective, this is useful for specific applications; hopefully, from a computer science perspective, the computer scientists just want to beat us, right? They honestly don't have enough problems, they don't have enough new and interesting things to look at, and so they'd like to possibly use our test sets.

So what I'd like to do, just to get us all on the same footing, is start with a very brief introduction to Gaussian processes. Anybody who works with Fani Boukouvala or with AJ can check their phones for the next ten minutes, because I know you all already know this, but just as a brief introduction, here it goes. In Gaussian process regression — this is also called kriging in other literature; I call it Gaussian process regression basically because I work with this machine learner, and Marc doesn't mind — what you're doing is you have a set of observations that is corrupted by noise in some way. So what you assume is that you have these inputs x, which can be multi-dimensional, and then you have an output prediction which is f of x — this could be our mechanistic models or something like this — plus a sort of unexpected error, and we are assuming that this error is going to be normally distributed, in this picture with zero mean and some variance.

Okay, but what you might be able to see from this picture, first off, is that when you're close to data points where you have already taken a measurement, you're more certain of what your function is like, and when you're farther away, you're less so. This just makes some amount of intuitive sense, and indeed what Gaussian process regression does is say: we have some sort of predictions, we have the data points that we have measured, and then in grey what I'm showing, for every point x, is two standard deviations away from the mean at that x. So what this basically tells us is that the closer we are to the data points, the more we trust what we've seen before.

Just as a quick illustration of how this might happen in practice: I have my prior belief about the function. This is one of the ways that I think chemical engineers can really contribute — a computer scientist will often start with a prior mean of zero, or normalise their functions so that the prior mean is zero; what we might like to do is use some information that we already have about a function as our prior belief. Then we also have this special covariance matrix K. When we're building the model, training the model, learning how the function looks, we start off with the functions that are possible within our set of prior beliefs: these are three functions pulled at random from the prior, and as expected they're mostly within the grey lines — but I told you it was only two standard deviations away from the bold line, so, as expected, some of the functions sometimes go outside of the grey lines. This makes a lot of sense. Now we're going to take measurements, and when we have a measurement — a realisation of what happens at that input — we end up with our prior conditioned on the data that's just come in, and this is our posterior belief about the function. So as we go along, we can develop a better and better approximation of the actual function. Everything's going great. Now, there are a couple of flies in the ointment, because this is a computational problem, so if I don't talk about trade-offs then I'm probably lying about something or other. In particular, the thing that limits these Gaussian processes is hidden in this (K plus the noise variance times the identity), inverse, term: taking the inverse of a matrix is the thing that limits this particular method. If we take lots and lots of data points, then we are inverting larger and larger matrices, and so, as you might imagine, many of my colleagues in this area are mostly studying matrix inversion — which is a little bit funny to be studying in 2018, but there you go. Okay, and then I can also pull representative functions from the posterior distribution.
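As a minimal sketch of what that looks like in code — a plain squared-exponential kernel and the textbook zero-mean GP regression equations, not the specific implementation behind these figures — something like the following is enough to reproduce the grey two-standard-deviation band and to see where the expensive matrix inverse appears:

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = A[:, None] - B[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X, y, X_star, noise_var=0.1):
    """Posterior mean and variance of a zero-mean GP at test points X_star."""
    K = sq_exp_kernel(X, X)
    K_s = sq_exp_kernel(X, X_star)
    K_ss = sq_exp_kernel(X_star, X_star)
    # The (K + sigma_n^2 I)^{-1} term is the expensive part: O(n^3) in the
    # number of observations, which is exactly the scaling issue mentioned above.
    L = np.linalg.cholesky(K + noise_var * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    return mean, np.diag(cov)

# Noisy observations y = f(x) + eps, eps ~ N(0, noise_var)
X = np.array([0.1, 0.4, 0.9, 1.3])
y = np.sin(3 * X) + 0.1 * np.random.randn(len(X))
X_star = np.linspace(0, 1.5, 100)
mu, var = gp_posterior(X, y, X_star)
upper, lower = mu + 2 * np.sqrt(var), mu - 2 * np.sqrt(var)  # the grey band
```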
So why do I care about this particular class of machine learning method? Well, it has actually been shown several times over the last ten years or so that this is extremely valuable for chemical engineering. About ten years ago now, Professor Grossmann at Carnegie Mellon was showing how to use this kind of stuff, and honestly, Fani Boukouvala, who's working here in this department, has made a lot of really big contributions in this area — I mean, there's a reason she got to come here, right? She's a star in this area.
So what happens is that all of these authors have shown how to somehow hybridise the functions that we know and love as chemical engineers — things that we can explain — with data-driven models. This is the first set of contributions that my group makes in this area, and we're really excited about it. The only other thing I should mention, just in passing, that a lot of people work with is Bayesian optimisation: in addition to predicting a function, I can also optimise it while I'm predicting it. This becomes important later.

So what is it that we've done, and what are we excited about? The first thing that I want to mention is design of experiments for model discrimination. We all work with biologists or chemists sometimes, and I think when you get ten chemists in a room and you ask them what the mechanism for a reaction is, you get at least fifteen answers. So I might have people in pharmacokinetics who disagree with one another, I might have people developing metabolic pathways or reaction mechanisms, et cetera.

The mathematical setting of this problem is that we have an expensive-to-evaluate system. We have the design space — that's the input — and then we have the output space that we can measure, and there may be many latent variables in between that we don't really know. We assume that we have some collected data, and the most dangerous assumption that we're going to make is that the data we have collected, y, is normally distributed — that's the dangerous bit — with respect to some underlying function, some underlying mechanism, plus this covariance term. The danger, of course, is that we might not have a normal distribution here; the machine learners do have ways of rescaling these sorts of data. So we have competing models — these are competing mechanisms — and in this particular setting we assume that we do not have enough information to distinguish between the models. What happens is that there is uncertainty in the model parameters. These are all parametric models, because we're assuming that we're working with chemists or biologists who want to write down reaction rates and these sorts of things. So we have these parameters, but we don't know what the parameters are, and we cannot distinguish which of the models is true. What we're saying is that if you have M models, each has a prior probability of being true of about 1 over M.

So what are we going to do? In particular, we want to know the next experiment that we're going to take. This is an extremely well-studied area, and what people do is develop these so-called design utility functions: they take in all of the previous information that you have about the system, and then, let's say that I've done five experiments — I have five data points — where am I going to take my next experiment? Well, I would hope it would be where the red and the blue are the most different, and I would hope that my design utility function would tell me to take an experiment on the far right, I guess. So all we have to do, if we have a well-designed design utility function, is maximise over that function, and that's where we take our next experiment. So that might be useful.
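Just to make the pattern concrete: the particular utility below is an illustrative stand-in (a crude disagreement-between-predictions score, not one of the specific criteria from the literature), and all of the names are hypothetical, but the recipe is simply to score every candidate design and take the argmax.

```python
import numpy as np

def pick_next_experiment(candidate_designs, model_predictions, pred_variances):
    """Pick the design where the competing models disagree the most.

    candidate_designs : (n_candidates, n_inputs) array of possible experiments
    model_predictions : list of (n_candidates,) arrays, one per competing model
    pred_variances    : list of (n_candidates,) arrays of predictive variances
    """
    n_models = len(model_predictions)
    utility = np.zeros(len(candidate_designs))
    # Sum of pairwise squared prediction gaps, down-weighted where we are
    # uncertain: a crude stand-in for a proper design utility function.
    for i in range(n_models):
        for j in range(i + 1, n_models):
            gap = (model_predictions[i] - model_predictions[j]) ** 2
            spread = pred_variances[i] + pred_variances[j]
            utility += gap / (spread + 1e-12)
    return candidate_designs[np.argmax(utility)], utility
```

With the red and blue model predictions evaluated over a grid of candidate x values, this would point at the far right of that plot, where the two curves disagree the most.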
These have been developed over the years — mostly, for the storytelling, I won't be citing people constantly; just know that this is a well-established area of inquiry. But what I would argue is that, even though there's been so much work over the years, there are basically only two approaches: the approach that is analytical in nature, and the approach that is data-driven in nature.

The analytical approach says: let's linearise around the current parameters that we know about. This assumes a sort of linear approximation — there are some people who do different approximations or make no approximations, but it's basically the same type of approach, according to me. Then what we do is assume that the parameters are normally distributed with respect to our general belief about the parameters, and we develop a closed-form expression for the utility function, basically following the classical literature. This particular approach stretches back to Box and Hill, so it is decades old, and it is a really nice approach: when you can use it, it's basically the best thing you can do. You get closed-form expressions, and we can evaluate those really quickly. The downside is that I said we needed linearised models, and that means we need derivatives. If this is an analytical function, we're doing well; if this is, say, a PDE that we have to evaluate many times and that is expensive for whatever reason, that might not be so.

So we need an alternative approach — or rather, the competing approach in this area is the one followed more by the systems biologists. What the systems biologists say is: well, we don't know what these models look like, so we're going to use Monte Carlo methods to estimate the competing mechanistic models, and then, using sampling, we'll determine the expected value of the utility function. This is great, because maybe one person wants to work with a model based on a PDE for whatever reason, and one person wants to develop a model in Excel, I don't know what — but it's very computationally costly: in particular, if you know what k-nearest neighbours is, it's an expensive technique in computer science, and you have to do that at every step, so it's not going to be practical.

So what Simon thought about, and the contribution that he made, was to say: well, why don't we hybridise these two approaches? We want the best of both worlds. We'd like to be able to work with black-box models, like the Monte Carlo approach can, and we'd like to be able to use all of the really nice, easy-to-compute functions from the analytical approach. What he does is evaluate the models a number of times — hopefully fewer times than you would need with the Monte Carlo method, because one of the advantages of these GPs is that they do tend to approximate a function well early on — he trains the GPs, and then, once he's trained the GPs, he can just apply the analytical approach. It's kind of a neat way of saying: we would like to use all of the techniques from the analytical approach, but with black-box functions. The downside, of course — we get all sorts of nice things, but now we're dealing with GP scaling issues, and I showed you right at the very beginning that inverting that matrix is not going to be the easiest thing in the world.
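Here is a sketch of that hybrid idea under two simplifying assumptions: there are just two competing black-box models, and a generic off-the-shelf GP (scikit-learn here, not the group's own code) is an adequate surrogate. Step one evaluates each expensive model a modest number of times; step two applies a cheap, closed-form criterion to the surrogates instead of to the models themselves.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Two competing "expensive" mechanistic models (stand-ins for a PDE, etc.).
def model_1(x):
    return np.exp(-2.0 * x)

def model_2(x):
    return 1.0 / (1.0 + 3.0 * x)

# Step 1: evaluate each expensive model a modest number of times and fit a GP surrogate.
X_train = np.linspace(0.0, 2.0, 15).reshape(-1, 1)
surrogates = []
for model in (model_1, model_2):
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X_train, model(X_train).ravel())
    surrogates.append(gp)

# Step 2: the surrogates are cheap and Gaussian, so closed-form criteria
# from the "analytical approach" family can be applied to them directly.
X_cand = np.linspace(0.0, 2.0, 200).reshape(-1, 1)
means, stds = zip(*(gp.predict(X_cand, return_std=True) for gp in surrogates))
disagreement = (means[0] - means[1]) ** 2 / (stds[0] ** 2 + stds[1] ** 2 + 1e-12)
x_next = X_cand[np.argmax(disagreement)]   # next experiment to run
```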
Okay, so what Simon is comparing to is a number of analytical case studies — basically, how much worse are we than the analytical approach, because that's the approach we liked — and then, can we also look at a non-analytical case? He considers a number of these utility functions; if you go into the design-of-experiments-for-model-discrimination literature, there is a fun, decade-plus debate about what the best design utility function is, and of course I don't want to get involved in that. So these are several of them, and what's different between them is this: you want to maximise the design utility function, and each one of the three design utility functions will suggest different places for your next experiment. We have four problems with competing models and three discrimination procedures, so we have twelve test cases.

What Simon does is take a hundred random trials for each of the twelve case studies and compare the number of extra experiments needed for the analytical approach versus the number of extra experiments needed for the hybrid approach. Across his experiments it looks like this — this is the mean and the standard error of the extra experiments he needs to do — and basically, at least for the problems we're testing, he is able to do roughly as well as the analytical approach. Okay, that's great.

Now we need to compare against the non-analytical case study; this is where the systems biologists are making their contributions. We are looking at four competing models and twenty initial observations. What happened is that we emailed this guy and said: you didn't even put your case study online, can you send us your case study? So we have it, we ran it, and we required 20 to 40 extra experiments to be able to distinguish the models, which was a little bit concerning for us, and our success rate was terrible — we were almost never predicting the correct model. Now, a low success rate isn't horrible, but what is horrible is a high failure rate, because that means we predicted the wrong model. In particular, this method of Buzzi-Ferraris, which is based on a chi-squared distribution — if you know it, then you know it's a little more safe: it's not failing, but it's not succeeding either. And then we have all of these times where the method just doesn't work. So we were a little bit discouraged, and then what Simon was able to do is take a look at this particular model that shows up in the literature, and it turns out that the models are indiscriminable. What happened is that, in the systems biology literature, they say they're doing design of experiments for model discrimination, but the Monte Carlo method takes them so long that they are not doing multiple runs of their own work. They do a few things and then say: oh look, we think we know which model it is. They are not actually proving which model is correct, and they were positing models where their f1 and their f2 are actually indiscriminable with respect to the error in their parameters. So actually, this was just us not lying: at least we're not claiming that we can discriminate models when they are indiscriminable. Just to check ourselves, what we did is we threw away one of the two models that we felt was quite similar, and we were all of a sudden able to predict at a much higher level.
Okay, cool. So that's one thing that we can do: we can explain what the best model is and why. When you're in these sorts of low-dimensional spaces with the Gaussian processes, explaining why is fairly straightforward, and that's something we like about this particular method — and we're much, much faster than any sort of Monte Carlo method. We have made our code available online, it's been forked a few times, and I guess we'll see what happens next with respect to that. There are also some case studies that Simon has been doing with our industrial partner Bayer that seem to be working well, but I don't think I have their permission at this point to present what they're doing.

Okay, the next thing I want to look at is multi-objective optimisation. I work with tissue engineers, and when I work with tissue engineers, they have to tell me exactly what to do, because I am not a tissue engineer. What they have is a bioreactor where there are only three degrees of freedom. They can pump material through the bioreactor at some flow rate; their medium is going to get eaten up after a small amount of time, so they're going to change a percentage of the medium every so many hours. So there are three things I can change: the flow rate, the percentage of the medium that I'm going to switch out, and how often I switch out the medium. What they're doing is looking at flow past the scaffold — they have this neat setup where they're growing bone neotissue — and they have a partial differential equation model that they think models well what is happening in the scaffold: a combination of creeping flow in the tightly constricted areas and just normal low-Reynolds-number flow.

Now, my collaborators want everything, right? They want to put as much material into that scaffold as possible — they want to grow as much as possible — and they also don't want to pay for it, so they want this to be as cheap as possible. The interesting problem that we have here, and I think this turns up frequently in other places, is that we have one thing that's extremely expensive to evaluate — in our case, doing an experiment means solving a PDE — and then we have something that's really cheap to evaluate, which is cost: all I have to do is count up how much medium I used, and that's how expensive this thing was. So I have an input space — these x1 and x2 would be my three variables that I can change — I put values in, and I get out the result with respect to my objective functions. In my particular case I want to be minimising both objective functions: I want to not be paying anything, and I want to have as little void space as possible by the end. So I've tried this combination of x1 and x2, I've tried that combination, I try a lot of combinations, and what I want in multi-objective optimisation is called the Pareto frontier. What's happening along that black line is that to get better in f2 you would have to get worse in f1, and to get better in f1 you would have to get worse in f2. So that is the approximated efficient frontier.
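Computationally, "the Pareto frontier of the points I have tried" is just the non-dominated subset of those points; a minimal sketch, with made-up numbers and both objectives minimised:

```python
import numpy as np

def non_dominated(F):
    """Boolean mask of the Pareto-optimal rows of F.

    F : (n_points, n_objectives) array, all objectives to be minimised.
    A point is dropped if some other point is at least as good in every
    objective and strictly better in at least one.
    """
    n = F.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        dominated_by = (F <= F[i]).all(axis=1) & (F < F[i]).any(axis=1)
        if dominated_by.any():
            keep[i] = False
    return keep

# e.g. columns: [cost of medium used, void space remaining]
F = np.array([[1.0, 9.0], [2.0, 5.0], [3.0, 6.0], [4.0, 2.0], [5.0, 2.5]])
front = F[non_dominated(F)]   # [[1, 9], [2, 5], [4, 2]]
```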
So I want both. I'm not going to get both, but when we asked the tissue engineers which one they prefer, they said: one hundred percent, we want to know the whole thing — we want to know what the trade-offs are. Now, there was one work — seriously, Fani's — that came out before we got this out the door, but we still did something new. Basically, what Fani's contribution does is look at weights on these objectives: she takes all of the objective functions f and combines them with a weighting function, and then you can try different values of the weights and get some convex combination. That's a really good idea, and it works really well in some cases; there exist functions for which it doesn't work as well, but there are also going to exist functions for which the idea I'm going to present isn't going to work as well either. So they are different from one another and have different advantages. What we're going to do, instead of looking at the scalarisation method, is look at the Pareto method.

Good — so this is a lot of maths, and I promised not to buzz too much, so I will not go through it all. But basically, what you want to do in this Pareto method is evaluate your next combination of x1 and x2 so that you are driving both objective functions, in our case, in the good direction somehow. You could do that either by improving a hypervolume or by improving this maximin thing. With improving the hypervolume, we want to find this new point — the one with the cross in the upper right-hand corner, labelled y — such that the shaded region has as big an area as possible. We want to find an x1 and an x2 that push into that region as much as possible. The way this works — and so far this is a well-known thing — is that you tile the space with rectangles, you count up the number of rectangles you're now going to shade, and the reason this looks so ugly is that there's this probability-of-y-given-x term, which is my probabilistic understanding of whether or not there's actually going to be an improvement here. The contribution that Simon made here is this: it's well known how to do the expected hypervolume improvement if you have two black-box functions, or if you have two functions that you can write down; he's doing it now with one function that you can write down and one that you can't — the black box. That is his new contribution here. The other of the two methods I don't want to talk about quite as much, basically because it's a little harder to explain, but essentially, if you're looking at that point labelled f(x1), you take the smaller of the two distances marked with the dotted lines, and you say that's how much my improvement is. It's just a different way of measuring improvement. There exist a lot of different metrics in this particular area — the multi-objective optimisation people have developed these over the years — but in our case they're not so very different at all.

So what we're working with right now is a reduced-order model of what our experimental collaborators have developed; it's basically a low-dimensional ODE that's quick to evaluate.
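Simon derives this in closed form; the sketch below is only a brute-force Monte Carlo stand-in for the same quantity, assuming two objectives to be minimised, a fixed reference point, and a Gaussian GP prediction for the black-box objective while the cheap objective is known exactly.

```python
import numpy as np

def hypervolume_2d(front, ref):
    """Area dominated by `front` (both objectives minimised), bounded by ref."""
    P = np.asarray(front, dtype=float)
    P = P[(P[:, 0] < ref[0]) & (P[:, 1] < ref[1])]   # ignore points beyond the reference
    if len(P) == 0:
        return 0.0
    P = P[np.argsort(P[:, 0])]
    kept, best_f2 = [], np.inf                        # keep only non-dominated points
    for p in P:
        if p[1] < best_f2:
            kept.append(p)
            best_f2 = p[1]
    kept = np.array(kept)
    f1_edges = np.append(kept[:, 0], ref[0])
    return float(np.sum((f1_edges[1:] - f1_edges[:-1]) * (ref[1] - kept[:, 1])))

def expected_hvi(f1_value, f2_mean, f2_std, front, ref, n_samples=2000, seed=None):
    """Monte Carlo estimate of the expected hypervolume improvement of a candidate
    whose cheap objective f1 is exact and whose expensive objective f2 is a GP
    prediction N(f2_mean, f2_std**2)."""
    rng = np.random.default_rng(seed)
    base = hypervolume_2d(front, ref)
    gains = [hypervolume_2d(np.vstack([front, [f1_value, f2]]), ref) - base
             for f2 in rng.normal(f2_mean, f2_std, size=n_samples)]
    return float(np.mean(gains))

# current approximate front and one candidate experiment
front = [[1.0, 9.0], [2.0, 5.0], [4.0, 2.0]]
ehvi = expected_hvi(f1_value=1.5, f2_mean=6.0, f2_std=1.5, front=front, ref=(10.0, 10.0))
```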
And what you can see is that the plots here are more or less indistinguishable. In our particular case it's very nice that Simon has developed both this expected hypervolume improvement and the expected maximin improvement, but they're not so different. What he's done is start with these ten initial samples, and then his methods find 25 extra samples, predicting what to do next. Now, if you really squint, you can see that the expected hypervolume improvement happened to sample more points on the top, and the expected maximin happened to sample more points on the side, between that 60 and 70 — not a big deal.

However — and this is what I think is exciting — he is doing significantly better than genetic algorithms, which are commonly used in this sort of area. And in machine learning, what you often have to ask is: am I doing better than random? Because you go around and you develop all these fancy things, but then, am I better than just a monkey throwing darts? He is better than a monkey throwing darts. These are three metrics of whether or not your multi-objective optimisation method is working well. I don't know which one I believe, except that I believe whichever one is saying that my method is the best. In the first one I want to be as low as possible, in the second one I want to be as low as possible, and in the third one I want to be as high as possible, and there's no consensus in the literature about which is the best metric, so I just use all three — what else am I going to do? At least we can show the trade-offs that way.

So what's happening is that Simon starts with ten function evaluations — this is for the random method, for his specialised methods, which are the triangles in red and blue, and for the genetic algorithms as well — and over time you can see that his methods are performing fairly well. That's nice, but it's still only one problem, so probably not interesting enough yet. So he took basically every test function we know of in the literature, and basically the plots look the same. What is of advantage here is that he's taking advantage of the fact that he has one black-box function and one function that he can get derivatives out of really quickly.

Okay, now, one of the things that I mentioned earlier is that Gaussian processes have an annoying tendency to degrade when we go to many samples; they also have an annoying tendency to degrade when we go to many dimensions, and so we did try to test this a bit. There's a lot of stuff that's normalised here, but basically the GD, the MPFE and the VR are the same as before, except that we are normalising the results with respect to random. So when you are looking at an increasing number of dimensions — and here what I mean by dimension is the input dimension, because in the problem that my collaborators have, the input dimension is three — what if I have more design variables that I'm able to put into my optimisation problem? What am I going to do then? I still only have two objectives, I still only have a black-box objective and an objective that I can evaluate, but I am increasing the input dimension. As I increase the input dimension, I absolutely expect that my performance is going to degrade: as the input space gets bigger, my predictive power is going to get smaller. So what we are doing is taking the performance that we get out of our methods and dividing it by the performance that we get out of random.
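Assuming GD here is the usual generational distance — the average distance from the approximated front to a densely sampled true front — the "divide by random" comparison is nothing more than the following; the function names are illustrative, not the ones used in the paper:

```python
import numpy as np

def generational_distance(approx_front, true_front):
    """Average Euclidean distance from each approximate point to the nearest
    point on a (densely sampled) true Pareto front; lower is better."""
    A = np.asarray(approx_front, dtype=float)
    T = np.asarray(true_front, dtype=float)
    dists = np.linalg.norm(A[:, None, :] - T[None, :, :], axis=-1).min(axis=1)
    return float(dists.mean())

def relative_to_random(method_score, random_score):
    """Normalise a method's score by random sampling's score on the same problem;
    for a lower-is-better metric, a ratio below 1 means better than random."""
    return method_score / random_score
```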
So basically, what's happening is that random is also degrading very fast. That's why, in this first plot, as you increase the dimension, the two methods that Simon has developed stay up there — they perform relatively well compared to random — whereas the genetic algorithms degrade and are actually not very different from random. The second measure we're showing basically because we mentioned that there are trade-offs: clearly, in the second measure, we are not improving over random. And then in the third measure — again, you can see for yourselves.

So what Simon has here is a method where he is able to predict the next point that I ought to take as my experimental measurement, which is a really, really valuable thing for him to be able to do. And not only is Simon able to say where I should do my next experiment to try to figure out where this entire Pareto curve is, but — remember, early on with these Gaussian processes — we also have a probabilistic understanding of how good our approximation is. In red — this is the dotted line — we are showing the true Pareto frontier; this is what we want to get. You can see that the probabilistic approximation with only ten data points is not great, and yet the red mostly falls within the uncertainty regions, as we would expect. So the model isn't very accurate, but the model also knows that it's not very accurate — which is kind of fun. We increase the number of data points that we're measuring, the noise goes significantly down, and we get a better and better approximation. So that's great.

Okay, so then we can start thinking about looking at the expensive-to-evaluate problem. I think one of the really interesting things that all of us have to deal with in this area is legacy code. We tried and we tried and we tried to get this PDE model to run on any cluster but the one that they wrote it on, and I basically roped in every person on the computing support team in the Department of Computing until they all hated me — and nothing. So we can only run it on this one cluster in Belgium, and we were only given time to run an extra ten points or something like that. What Simon's result says, with fairly high confidence, is that they actually already had a wonderful point: they already have that red point in the upper left-hand corner. Anyway, you can try out this work; it's been accepted to IEEE Transactions on Biomedical Engineering. It's pretty cool.

Okay, I want to be respectful of your time. Of course, you know, when I get an invitation like this, how am I not going to tell you everything that my group is doing, right? So what I will only mention rather briefly is work that Johannes Wiebe, who is another PhD student in the group, has been doing with collaborators at Schlumberger. If you've seen other process systems engineering talks, this is much more traditional process systems engineering, a little less machine learning, but basically: degradation matters. What we have is the famous state-task network, where we are trying to produce assorted products using these sorts of unit operations, and we have all sorts of choices to make: we can schedule processes on the machines, we can operate them at slow, normal or fast operating speeds, and we had better choose when the maintenance is going to be.
Now, here what I mean is preventative maintenance, because if you choose not to do maintenance, well, then you're going to have to do maintenance later, it's going to be much more expensive, and it's going to be corrective maintenance. So we don't actually care just about making preventative maintenance as cheap as possible; we care about the trade-off between preventative and corrective.

Okay, so this is as mathematical as we'll get in this particular section, but basically, we have process variables — these are basically balance equations, et cetera — we have maintenance models, and then — and this is fairly similar to work that was happening even in the late 90s — we're going to have a health model. The health model is going to tell us how our particular process is doing. Just to mention that previous people have thought about this much, at least, since the late nineties. This is great, but what we want to do now is combine this process-level scheduling and planning with more sophisticated degradation modelling. I don't know if you know this, but in addition to having an amazing chemical engineering department, you all also have an extraordinary industrial and systems engineering department just across campus, and the people there have been doing a lot of really great work in degradation modelling for quite a while now — it's amazing stuff — but for whatever reason these fields don't really come together all that often. You don't often get the entire-systems view of saying: let's look at scheduling, planning and degradation modelling all together.

Okay, so what is degradation modelling? This was less clear to me, because I come from the operations community, so I'm less knowledgeable about it. But basically, what you assume is that you have this degradation signal, and over time, as you're using the machine, the machine will degrade until you decide to do maintenance. If you pass over this S_max, well, you're in trouble — and in various amounts of trouble: either the unit is down, or something worse happens, or whatever. What these degradation modelling people have done over the years is develop this really nice technology that uses stochastic processes to predict when you're going to pass over this S_max, with some probability distribution. They have a lot of really neat mathematics that they've developed: if you can posit a stochastic process for them, they are able to tell you when you are going to pass over this S_max.

So that's really cool, but I've got a bit of a problem here. I've made this look awfully simple, with this flat line, but this is a scheduling problem, and I might be using my unit in different ways over time. Sometimes the unit is off, sometimes it's operating in a way where maybe the unit won't degrade very quickly, and sometimes it's operating in a way where the unit will degrade very quickly indeed. So although, if I already know the history of the machine, I will know — maybe — what the degradation signal is, probabilistically, or at least that's the assumption in the degradation modelling community, I might not know it from a process operations point of view. So we are only using what is the simplest thing in the process operations literature, which is so-called robust optimisation.
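Before getting to the tuning parameter, here is the underlying picture made concrete under one common assumption in this literature: a Brownian-motion-with-drift degradation signal whose drift depends on the operating mode. The modes, drift values and threshold below are made up, but the exercise is exactly the one described — simulate many signals for a given schedule and count how many cross S_max.

```python
import numpy as np

def simulate_degradation(schedule, drift_per_mode, sigma=0.05, dt=1.0, rng=None):
    """One degradation path for a schedule of operating modes.

    schedule       : sequence of operating modes, e.g. ["off", "normal", "fast", ...]
    drift_per_mode : dict mapping each mode to its mean degradation rate
    """
    rng = np.random.default_rng(rng)
    s, path = 0.0, [0.0]
    for mode in schedule:
        s += drift_per_mode[mode] * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        s = max(s, 0.0)               # the degradation signal cannot go negative
        path.append(s)
    return np.array(path)

def failure_probability(schedule, drift_per_mode, s_max,
                        n_paths=5000, sigma=0.05, dt=1.0, seed=None):
    """Monte Carlo estimate of P(signal crosses s_max during the schedule)."""
    rng = np.random.default_rng(seed)
    crossings = sum(
        simulate_degradation(schedule, drift_per_mode, sigma, dt, rng).max() > s_max
        for _ in range(n_paths))
    return crossings / n_paths

drift = {"off": 0.0, "slow": 0.01, "normal": 0.03, "fast": 0.08}   # made-up numbers
schedule = ["normal"] * 20 + ["fast"] * 10 + ["off"] * 5 + ["normal"] * 15
p_fail = failure_probability(schedule, drift, s_max=1.5)
```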
Here we're saying that we give ourselves some probability of failure, and the way we do that is by introducing this new tuning parameter alpha: if alpha is, say, one half, then we are only looking at one very particular point of the distribution, and changing alpha changes how conservative my estimate is. So we're looking at the state-task network, and what has been done already is that people have thought about how to integrate unit health and maintenance scheduling, and then how to look at multiple operating units. What we're going to do is use robust optimisation so that we can guarantee, probabilistically, whether or not everything is going to be okay; we're also going to quantify just how expensive it is to robustify ourselves.

So what's happening here: I mentioned that if I already know the sequence of operations, then — at least in the degradation modelling community — they assume I can predict what's going to happen over time. However, what we're going to end up with is a bunch of different jobs in a row, and then what we do is evaluate, over and over again, lots of degradation signals and count how many of these signals end up over the limit. This we can do many, many times, either with historical data or by just generating lots of possible signals. Then what Johannes does is ask: what is the price of being conservative? The more robust I am — the more conservative I am — the more I say: let's just do maintenance ASAP, because maybe something will fail and I'm afraid of that, or something like that. So as the alpha parameter increases, we get closer and closer to the deterministic answer, and the deterministic answer is going to be extremely brave — it thinks we can just push things to the very limit — so the probability of failure shoots up; that's that peak. So as I increase alpha, I'm modulating my probability of failure, and then I can do what's considered, in this field, to be quantifying how expensive it was for me to robustify my process. As my probability of failure increases, the cost of doing the preventative maintenance goes down. This makes a lot of sense: if I'm going to have an extremely expensive process where I'm doing maintenance constantly, I'm probably not going to fail. That's really useful.

So what Johannes actually ends up doing is this: we don't know what this value of alpha should be — how we should tune this parameter — except that we know what the cost of preventative maintenance is, and we know what the cost of corrective maintenance is. So we can just solve an optimisation problem that balances the cost of preventative maintenance against the cost of corrective maintenance. Now, this is honestly a very long-winded way of saying that we get right back to the same sort of Bayesian optimisation, the same Gaussian processes, as we were using before, because what we have on the right-hand side is the cost of failure, and on the left-hand side we have the cost of doing the preventative maintenance as we're planning to do it. These are two expensive-to-evaluate things, and basically what we do is go right back to using Bayesian optimisation again: Bayesian optimisation is doing the Gaussian processes, and at the same time it's predicting the next best place to sample from an optimisation point of view.
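A sketch of what that loop could look like, with a generic off-the-shelf GP and an expected-improvement rule; the cost function here is a made-up stand-in for the expensive evaluation that sums the preventative-maintenance cost and the expected failure cost for a given alpha.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def total_cost(alpha):
    """Stand-in for the expensive evaluation: schedule with robustness level
    alpha, then add preventative-maintenance cost and expected failure cost."""
    preventative = 8.0 * (1.0 - alpha) ** 2 + 1.0     # more conservative => pricier upkeep
    corrective = 20.0 * np.exp(4.0 * (alpha - 1.0))   # braver => more expected failures
    return preventative + corrective

def expected_improvement(mu, sigma, best):
    """Expected improvement for minimisation."""
    z = (best - mu) / np.maximum(sigma, 1e-9)
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
alphas = list(rng.uniform(0.0, 1.0, size=3))          # a few initial evaluations
costs = [total_cost(a) for a in alphas]
grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)

for _ in range(15):                                   # Bayesian optimisation loop
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(np.array(alphas).reshape(-1, 1), np.array(costs))
    mu, std = gp.predict(grid, return_std=True)
    a_next = float(grid[np.argmax(expected_improvement(mu, std, min(costs))), 0])
    alphas.append(a_next)
    costs.append(total_cost(a_next))

alpha_best = alphas[int(np.argmin(costs))]            # the tuned trade-off
```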
And what we're showing here in blue is that we do many, many runs, and consistently the Bayesian optimisation method knows how to balance between the corrective and the preventative maintenance. This is also available — I'm brave enough to put the name of the journal up before it's accepted, which I don't normally do — and his code is also online. So basically, what he's doing — and I flew through his project, and I apologise for that — is taking historical data, developing a fairly typical process operations model, also developing a fairly typical degradation model, and then melding them together using robust optimisation, which is quite cool.

So thanks so much for having me. I want to end now in case there are questions — I hope there are questions — and you can try anything that we've done. Everything that we've done should be reproducible: if you can't hit go on that code and get exactly what we get in our paper, we have something wrong, and please let us know. [Applause] Thank you.

Yes — right, there's nothing bad about f2. They are models, and this is the classic indiscriminability problem: f1 and f2, within the noise that is within the system, are identical. From an engineering point of view, what would I do? Well, either I would find new input design variables that I could tweak, or I would find new measurements that I could take, or hopefully I would try to lower the noise in some way or another, or I would develop the model in different domains. But I am in a computer science department, and so all of those conversations are not things — I mean, what my students are experts at is developing methodologies. The reason that we cheat and just drop f2 is to check that we are not going insane and that it's not that our method is just performing like wacko. So yes, if I have a system that is indiscriminable, I'd better do some really clever engineering right about then; we are only trying to design the computational system that's going to tell the models apart. Yep.

How is it at all reasonable? Yeah, so — it's a good catch — there are two answers to that. The first is that we're not actually doing all that great; we're just doing better than random. What I am doing here, the one under the GD, the one under the MPFE — I'm sweeping a lot under the table, and I'm comparing to randomly deciding where the next point that I want to evaluate is. So even though I scale really badly in terms of dimensions, I don't scale as badly as random scales. That's one answer. The other answer is that there are two types of dimensionality here: one is the input dimension and one is the output dimension. What we are using here is a very particular test-case function that's well known in the multi-objective optimisation literature, and the reason we're using it is that it's the only function we know of that scales really nicely with respect to dimension — the dimension is just a tunable parameter, and then the problem gets bigger. So there's probably some sort of special structure in there: as big as you make this function, there are always going to be two outputs, but there might be more and more inputs. So is the GP finding some sort of pattern in the data? Possibly, right?
So I like this, but I do not claim that this happens all of the time. Please, yep.

Okay, that's a really excellent question, because right at the beginning I said that with the GP we are going to make this normal assumption. With heteroscedastic data they can do some amount of normalisation, but it's very, very hard. Whether it's a reasonable assumption or not is an excellent question from an engineering point of view. The results show that, for at least some of the test cases we have tried, it's a reasonable assumption, but that's a dangerous thing, because machine learners regularly get themselves in trouble this way — you know, "oh, this seems to mostly work", and then I'm predicting really weird stuff. So it would have to be very problem-specific. It's a good point; it's something that we have to be looking at. Yeah.

Aha — okay, that's a really good question: could I use it to find new models? At least at this point, the best I can think of, with respect to that, is the kind of work that happens often in chemical engineering: we have one model that seems to work, and then we want to move to a very slightly different scenario, where we want to change a few of the inputs, and we want to see what is different. The best I can think of, where Gaussian processes might be able to contribute something new, is if we use that previous model as our prior knowledge, and then, with each new sample, we figure out how we deviate from the old model. Once we know how we're deviating from the old model, using our now-posterior distribution, we would have some knowledge about the new function. Perhaps — probably, because GPs are nice and you can do all sorts of nice analytical things with them — you could understand that in some sort of sense. But I agree that there's an awful lot of big questions there, and a lot of work to do.