[inaudible] Our speaker is a professor of operations research and the co-director of the Operations Research Center, and he has been on the MIT faculty since [inaudible], after receiving his master's and doctorate [inaudible]. He has broad research interests in optimization, applied probability, and their applications in health care and operations management. He has written more than 150 papers and three graduate-level textbooks; I suppose he has the fourth book there, and we're actually going to [inaudible] this spring. He has supervised thirty-three doctoral students and is currently supervising nineteen others. He has been a member of the National Academy of Engineering since 2005 and has received numerous research awards, including [inaudible] and a Presidential Young Investigator award. He has also co-founded several companies in the areas of financial services, health care, and aviation. In addition to his seminal research, he has also been a great educator and a great inspiration for many young people, myself included, so we're very happy to have him here. [inaudible]
Thank you. One thing: if you expect to learn [inaudible], I think you will be disappointed. I'm very glad to be here, particularly because I see that the average age in the audience is exactly what I would like, and this includes George.
It's great to be here. Today I would like to tell you about some work that I have been pursuing in the last couple of years in the area of statistics and machine learning, but from a different perspective than you might have expected. It consists of three bodies of work: best subset selection, which is work with my student King, who is graduating, and with Mazumder, my postdoc at the time and soon to be a faculty member at MIT; least median of squares regression, with Rahul; and an algorithmic approach to regression.
Typically what I try to do in talks is to make sure that the message is clear. There is a book in the Dover publishing series, by now a classic, with the title Mathematical Programming in Statistics. I'm just quoting one; there are many others. Continuous optimization has undeniably had an impact in statistics, and in addition, in the last two decades, convex optimization methods have been of increasing importance; topics like compressed sensing, matrix completion, and many others are hot subjects at the moment.
However, many problems in statistics, and I hope to convince you of that soon, can naturally be expressed as mixed integer optimization problems. In other words, it's not that I force the problem into the framework; it's a natural way to think about it. However, at least in the statistics community that I know, mixed integer optimization is considered impractical and the corresponding problems not solvable. That's the current belief. What is the evidence I have of that? I have some empirical evidence: people that are educated at Stanford, and Stanford is one of the top departments of statistics, have not taken or been exposed to mixed integer optimization. Of course they don't know what it is. That's a serious matter in my book. I don't know what's happening in this department. [inaudible]
So maybe Stanford is not an outlier. So heuristic methods are used. For example, as some of you might know, Lasso is a method developed in 1996 for best subset regression, or CART for classification and regression. These are heuristic methods: here is a problem, and we have a method which is heuristic.
I would like to educate you, many of you know but many of you don't, about the progress of mixed integer optimization in the last twenty-five years. I can say this with pride because my contribution to that is very minimal; this was done by the community, not by one person. So these are some facts. CPLEX, the company, got started in 1991, and the first version was CPLEX 1.2. The company was sold to ILOG and then IBM. The speedup between CPLEX 1.2 and CPLEX 11, in 2007, is about 29,000 times. These are Bob Bixby's calculations. Gurobi 1.0 (people from CPLEX started a new company called Gurobi; in fact [inaudible] came out of this department, George, I believe) in 2009 is roughly comparable to CPLEX 11. The speedup between Gurobi 1.0 and roughly the current version, maybe 6.0 it is now, is about 20 times. You multiply: it's about 580,000 times. This is just software. This is really the collective intelligence of people who use optimization, who propose algorithms, heuristics, lower bounds, cutting planes, and so forth.
So, just to illustrate what this means: a problem that would have taken seven years to solve twenty years ago can now be solved on the same twenty-year-old computer in less than one second. That's what it means, and this is not considering hardware, just software. Hardware has had a speedup of about 320,000 times. These are approximate. Now if you multiply the two you have two hundred billion times. There are not too many fields that I know where the progress has been in the billions. I hope you realize what two hundred billion means: it means that something that would have taken centuries takes seconds. OK.
So what is the motivation of this work? Given the dramatically increased power of mixed integer optimization, is mixed integer optimization able to solve key multivariate statistics problems considered intractable a decade ago? That's the question. And in particular, how do these kinds of solutions compete with state-of-the-art solutions?
Here's an example. The most widely used, and I mean this in all sciences, not only ours, the most widely used scientific method today is regression. Scientists, people in history, medicine, engineering, everybody uses regression. A smaller percentage use optimization. Regression is widely used, yet the way we teach it, we teach it as an art. I teach regression at MIT; we say, well, we have these requirements, trial and error, and so forth. That's not science, that's an art. So the question is: can we algorithmize this process? And most importantly, in my view, given the young people in the audience: what are the implications for the teaching of statistics? If there are positive answers to these questions, should we change the way we teach these things?
So what are the problems I address in this talk? The work is broader, but I picked some examples that are simple to state. The first classical problem is so-called subset selection regression. That's the usual Gaussian problem where you have some observations y and X, the corresponding data. The problem that Gauss solved centuries ago is to minimize over beta the least squares problem, and now we have a small complication: we want the number of non-zero variables to be small, less than k. Let's say, for example, we have ten thousand observations and a thousand variables: what is the best regression with fifty non-zero components?
Second problem: least median of squares regression. It's very similar, but instead of the sum we want the median of the residuals. Why is this relevant? Because if you want to regress income as a function of factors that affect income, and you have Bill Gates there, or Warren Buffett, one observation being included might actually make things very, very unstable. But if you have the median, it doesn't matter; it's much more robust to errors or outliers.
And the final part of the talk: suppose you want a regression that has nice properties, sparsity, multicollinearity, categorical variables, many nice properties. How can we achieve that in a way that we have a guarantee that we achieve it? By the way, I recognise it's a big audience; feel free to ask me questions. When you have a question, ask me; I'll even try to answer, in other words, if I know.
So let's start with our first friend, subset selection. This is a problem that in 1974 was solved by two statisticians, Furnival and Wilson, by implicit enumeration. In fact there is a routine in R (for those who do not know it, R is a free package that implements many algorithms in machine learning and statistics), and it's implemented there and it's called leaps. So if you ask leaps, it solves that problem in a sense, but it doesn't scale beyond the number of variables being about thirty. It doesn't solve beyond that; I mean, check me out on this.
So Lasso was proposed by Robert Tibshirani in 1996, he is a Stanford professor, and in parallel by another Stanford group in 1998. Just to illustrate the impact, citations being a reasonable measure of impact: how many citations do you think George Dantzig's book has? Dantzig is sort of the father of optimization, linear programming, so one would expect to know. How many? Any guesses? Come on, someone guess. Six thousand. This was a couple of months ago when I searched; I haven't searched for an update. The Lasso has a citation count of ten thousand and growing. Ten thousand citations. So you could say more people use Lasso, which is about twenty years old, than Dantzig, who is over sixty years.
And what is the method? The method is to take the original problem and add a penalty, which is the absolute value of the betas. That's a quadratic optimization model, because this is a nice convex function, so it can be solved reasonably well with convex optimization codes. And there is some theoretical evidence that it produces sparse solutions, namely solutions that have a small number of non-zero betas. This is what is being used today; this is what departments around the world teach about sparsity, that I know of.
I would first like to make an observation that is technically easy, but I believe it's not a widely known fact. I would like to ask the students in the audience whether they know what I'm about to say [inaudible]. So I claim that Lasso has absolutely nothing to do with sparsity. I claim that Lasso has only to do with robustness. And here's the statement: suppose you have the original problem, but you have made errors in X, right.
So what you would like to do is minimize over beta the worst case over some perturbations Delta that are bounded in a norm. Which exact norm is not so important; I would like to allow errors, but the errors are not unbounded, there are some bounds on them, right. So this is a mathematical statement: the minimum over beta of the robust regression is equal to (this is not an empirical statement, it's a mathematical statement) the minimum of the usual objective plus lambda times the norm of beta. So if you use q equals one and p equals two, that's the Lasso. So in other words, what does Lasso mean, if you look under this light? It protects you against errors in the data. Sparsity is nowhere to be found, that I can see anyway, right. So is this known to people? Who knows it? So we don't teach it. By the way, this is not a difficult statement. If you know something about duality this is not a deep statement, but I believe it belongs to the classical things we teach, right. So this is an aside; it has nothing to do with the talk, but I thought to mention it.
So I claimed that problems of the nature that I suggested can naturally be expressed as discrete optimization problems. This is not a deep observation. So how would you model that? It ought to be an easy undergraduate exercise, maybe the first lecture of George's class, maybe the second. You have a binary variable z_i, zero or one; you have the betas bounded in a big-M formulation, M times z_i; and the sum of the z_i's is less than k. So if z_i is zero the beta is zero; if z_i equals one, it doesn't constrain. By picking them appropriately, the z's tell you how to select the variables. So this is an exact formulation of the problem in question, and I consider it a natural one. Not a great formulation, but a natural formulation.
So the title of the talk is statistics under a modern optimization lens. So what is modern? I claim that modern optimization, to many people, is nothing more than first order methods. Convex methods are popular; continuous optimization is well known. The view I have is that you should use both: first order methods to find good feasible solutions, and mixed integer optimization methods to find lower bounds and prove optimality. That's the game. That is, modern means two things: first order methods, continuous methods, and discrete ideas, where we have seen that huge, huge improvements have been made. There are some constants here; I don't want to go into it, but this M can be computed from the data. This is not a magical thing. OK, so I'll skip it.
Another way to formulate the problem, if you don't like big-Ms, for those of you that are a bit more knowledgeable in optimization, is a way to say what I really want: I want, if z_i is zero, beta_i to be also zero. That's the implication, and this is a way to express it with special ordered sets. That's a way to express it if you don't want to have big-M methods; it is another way of formulating the problem that the major state-of-the-art optimization codes can accommodate.
So let me illustrate, before entering the theory, with an example of what, at the end of the day, we have achieved. This is a problem about diabetes, a real-world problem: about three hundred fifty observations, patients, sixty-four variables, and the objective is to explain diabetes, what are the factors that affect diabetes. That's my health care contribution for today. With only six variables to explain it. OK. So here's what is happening in these methods. Very quickly, using heuristic methods of the first order kind (I'll tell you which method), within like two seconds you find a solution. You don't know it's optimal, but you find a very good solution. And then it takes integer programming about four hundred, maybe five hundred seconds to prove optimality, right.
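The big-M formulation described in words above can be written out as follows. This is a sketch reconstructed from the spoken description, assuming a least-squares objective; the symbols $z_i$, $M$ and $k$ follow the talk, while the factor $\tfrac{1}{2}$ and the two-sided form of the bound are notational assumptions.

```latex
\begin{aligned}
\min_{\beta,\, z}\quad & \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2 \\
\text{s.t.}\quad & -M z_i \le \beta_i \le M z_i, && i = 1,\dots,p, \\
& \sum_{i=1}^{p} z_i \le k, \\
& z_i \in \{0,1\}, && i = 1,\dots,p.
\end{aligned}
```

With $z_i = 0$ the bounds force $\beta_i = 0$; with $z_i = 1$ they are inactive, provided $M$ is large enough. In the special-ordered-set variant, the big-M bounds would be replaced by the requirement that, for each $i$, at most one of $\beta_i$ and $1 - z_i$ is non-zero.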
Lasso, a competitor method, also takes a second to find a solution and stops; it doesn't know whether it is optimal. These methods, because of the integer programming business, can certify it, right. To the best of my knowledge (this is not a big problem, by the way, but just to illustrate) I do not know any software package that can solve these problems to provable optimality. So the question is: is this an accident, or are we lucky? What I would like to emphasize is that if you don't care about guarantees, if you just want high-quality solutions to the problem you aim to solve, you can solve this in the same amount of time as the other methods we currently teach in our courses. If you want more, if you want to be sure, then you have to work harder, and I will illustrate that. George, patience, my friend.
OK, so let me tell you a bit about the methods, right. So far, how are we doing? Everybody following? OK, so now we go a little bit higher, but not too much higher. I would like to illustrate the key methods we use, not the heavy mathematics [inaudible]. So what is it? That's the problem we want to solve, and the function g of beta is a convex function, clearly convex, a quadratic function, and you can bound the gradient. What this basically says is that the gradient of the function is a Lipschitz continuous function with a parameter lambda. This implies, by Taylor series, elementary things, that g of beta is upper bounded by the first-order expansion plus a quadratic term; it is an exact inequality. So I bound what I want to optimize with an upper bound that is also quadratic. That's a constant, that's a linear function in beta, and that's a quadratic function, but a quadratic function in which the variable does not appear with an X: not X beta, just beta. That's actually material.
So instead of solving the original problem I solve this one. This is not equivalent; I'm using a heuristic. In other words, I solve this problem subject to this condition, and this problem I can solve in closed form. So this is by way of a first order type method: I find the expansion, I optimize the expansion, and it turns out, if you do a little bit of mathematics, simple things, that this is equivalent to minimizing beta minus a vector u, squared, subject to at most k coordinates being non-zero. But this problem, beta minus u squared subject to that, is solvable in closed form in the following way. You take the u's (the u's are constants), you sort them from highest to lowest, and the solution is that the betas are equal to the k largest and the rest are zero. It's called the thresholding methodology. These are not difficult observations. These are not research-level, in my opinion; they are sophomore-level observations. I mean, we don't need a PhD to do this.
OK. How are we doing? OK, so here's an algorithm. We initialize with a solution; maybe we take the solution of least squares, forget about Lasso, or even initialize with Lasso. Then we define the iteration. I call this, given the u, H sub k of u; it's just this sorting business. What is u? u is basically this vector. So I basically repeat this iteration, and if I want to be fancier I could also do a line search, to be faster in practice, and so forth. And I continue until there is no more improvement in the solution. OK. This is a method that is very fast. All these kinds of methods are very fast.
Now what can we say about how good this method is? First, that it converges to something, and the something is a fixed point of the iteration. Second, that for the method to find an epsilon solution, such that successive iterates are within epsilon, in other words it doesn't improve any further by more than epsilon, it takes order of one over epsilon iterations. In other words, we can say something about how fast it converges. And at some point the method has the property that it finds the support, the discrete set, and then in one iteration we find the right answer [inaudible]. So the bottom line of all this, and I won't bother you too much, is that this is a method that is provably convergent and it's extremely fast. You cannot yet say that it is good, but it is extremely fast.
Now let's see some experiments for the example I gave you. Now that we have this method, one possibility is the following. People that know a little about integer programming know that if you give it a warm start, a good solution, it accelerates integer programming a lot. So here's an experiment. In one case we start mixed integer optimization cold: we just give Gurobi the formulation I gave you. Then we give it this warm start. The maximum time is five hundred seconds, and you observe, for various k's, [inaudible]. If you do a cold start, in five hundred seconds you are very close but not optimal, but with the warm start integer programming closes the gap. So this suggests (of course this is not the only evidence of this) that this method is fast and is useful.
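The iteration described above (form u from a gradient step, keep its k largest entries, repeat until no improvement) can be sketched as follows. This is a minimal illustration, not the speaker's code. It assumes g(beta) = 0.5 * ||y - X beta||^2, sorts by absolute value (an assumption about what "highest to lowest" means), starts from zero rather than from a least squares or Lasso solution, and uses the sum of squared entries of X as a crude upper bound on the Lipschitz parameter, called `L` here. All function names are illustrative.

```python
def hard_threshold(u, k):
    """H_k(u): keep the k entries of u largest in absolute value, zero the rest."""
    order = sorted(range(len(u)), key=lambda i: abs(u[i]), reverse=True)
    keep = set(order[:k])
    return [u[i] if i in keep else 0.0 for i in range(len(u))]


def objective(X, y, beta):
    """g(beta) = 0.5 * ||y - X beta||^2."""
    return 0.5 * sum(
        (yr - sum(x * b for x, b in zip(row, beta))) ** 2
        for row, yr in zip(X, y)
    )


def gradient(X, y, beta):
    """Gradient of g: -X^T (y - X beta)."""
    resid = [yr - sum(x * b for x, b in zip(row, beta)) for row, yr in zip(X, y)]
    p = len(beta)
    return [-sum(X[r][j] * resid[r] for r in range(len(X))) for j in range(p)]


def lipschitz_upper_bound(X):
    """trace(X^T X) >= largest eigenvalue of X^T X, so this is a valid (loose) bound."""
    return sum(x * x for row in X for x in row)


def discrete_first_order(X, y, k, L, max_iter=1000, tol=1e-12):
    """Repeat beta <- H_k(beta - grad g(beta) / L) until the objective stops improving."""
    p = len(X[0])
    beta = [0.0] * p  # assumption: cold start at zero
    prev = objective(X, y, beta)
    for _ in range(max_iter):
        g = gradient(X, y, beta)
        u = [beta[j] - g[j] / L for j in range(p)]
        beta = hard_threshold(u, k)
        cur = objective(X, y, beta)
        if prev - cur <= tol:
            break
        prev = cur
    return beta
```

A typical call would be `discrete_first_order(X, y, k=2, L=lipschitz_upper_bound(X))`; the result is a feasible k-sparse solution that could then be passed to a mixed integer solver as a warm start. A tighter value of `L` (the largest eigenvalue of X^T X) would give larger steps and faster convergence.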
It closes the gaps; it helps closing the gaps. This is also true for the case n bigger than p, which is the usual regression case, where you have more data and fewer factors. This next one is a harder example, where sparsity is not a luxury anymore. Let's say this is genomic data for fifty patients, again a health care example; most of my examples are health care anyway. So you have fifty patients and two thousand genomic factors, and you would like to understand a certain disease as a function of these genomic factors. You don't know if any of them is relevant, right. So you need a sparse solution, because of course if you have fifty and two thousand, the error will be zero, in sample I mean, clearly. So you need something that has four factors, five factors, if you're going to have anything useful. And indeed the same thing holds true, namely if you use this method for warm starts, boy, it helps. OK.
Back to George's question: how good is the method? So now I give you some empirical evidence. Sparsity: Lasso was designed, supposedly, to be sparse; the community believes this. If you ask [inaudible], who is a top-notch statistician, this is what is being taught: Lasso, sparsity. But look at that. This is a collection of many problems, not just one. The yellow is mixed integer programming, this is Lasso, the blue is stepwise regression (stepwise is something actually very interesting), and this is a state-of-the-art non-convex approach. This is the number of non-zeros. We created problems where the right sparsity is ten; this is a beautiful thing, we know the answer. Then we start decreasing the noise. The harder problems are on the left, and they become easier. You observe that in all cases Lasso found a much higher number of non-zeros, while mixed integer found the right sparsity, found the right answer, and the others did not, right.
Now what is interesting is that they found their answers in about the same amount of time, but the method took longer to prove that it is correct. If you just want a good answer, the amount of time is very comparable, because both of them are solving, in essence, convex problems and simple things. So that was the sparsity part.
Now the prediction business. Because we know the answer, this is the error: the algorithm's answer minus the true answer, over the truth. And again you see that mixed integer is lower, and sizably better; in other words, an eight percent error versus a sixteen percent error. And this continues even as the noise becomes less, because the lower the noise the easier the problem. This was the case for n bigger than p, the usual case. In the unusual case the results are even more dramatic, because in this case, on the sparsity side, we designed the problem with [inaudible] correct variables, and Lasso is not even close. On the accuracy also. So in other words, solving problems to optimality using these methods, not surprisingly but definitely, gets you a sparse solution, provably, and this is more accurate.
So what do we learn? Here's a small summary of what I have told you so far. For the case n bigger than p, the usual case, mixed integer optimization with warm starts finds provably optimal solutions for n in the thousands and p in the hundreds in minutes. In minutes. And if you want good solutions, but not provably optimal, in seconds. For the case n smaller than p, this approach finds solutions with better prediction accuracy than Lasso for n in the hundreds and p in the thousands, in minutes; proving optimality takes hours, not days. I'm using here this two hundred billion business, and I'm using very powerful [inaudible]. We're not alone here; we have the strength of a community behind us to be able to make these statements. It's not like we've found something fantastic.
Another thing that we have found is that mixed integer solutions have a significant edge in detecting sparsity; I would say that's a very clear advantage. They also have some advantage in predictive accuracy; they outperform, but this is not so decisive. [inaudible] the paper has, you know, many, many examples. I mean, this is the message: modern optimization methods, namely mixed integer and first order methods, are capable of solving large-scale instances, namely problems that we care to solve practically. It is not an academic exercise. Questions, comments on that? I'm about to change gears, that's why.
OK. So now we have a scenario that gives us promise that these methods could be of some practical use. Let's think about the process of regression. Let me give you an example in practice that I cared to solve. I don't know if you have seen this work of this fellow from Freakonomics, a Chicago economist. There's a fellow who made some claims about the correlation of crime and abortion. By the way, he won the [inaudible] award for his work, so it's very visible work. He made claims, so I set out to check, because these claims did not sound correct to me. They did not sound correct, and I can tell you what the claim is. There was a very significant decision in 1973 in the United States, Roe versus Wade, that legalized abortion in this country, in '73. And you observe crimes, violent crimes, in 1991, and there was indeed in the data a very significant drop around then. This is a fact. His explanation, statistically motivated, was that the reason this happens is because of the people who commit the crimes (I don't believe that statement, by the way, but that's the claim): these crimes would have been committed by the unborn children seventeen, eighteen years later. That is the argument. There are many people who were not born because of the legalisation of abortion; this is true. It's also true that crime dropped. Is this correct or not? There are social implications of this mathematical claim; I mean, think about what you are saying.
So I checked. This is all based on regression, all the conclusions, and the data, I mean, I was able to find the data he used. But in doing so you have to have regressions that make sense, namely they have some characteristics. This is not a controlled experiment; we have data and we want to explain. And in doing these regressions you need to understand: is there multicollinearity? You know, the prime suspect here may be that there is another factor that affects abortion and that factor also affects crime, not that they have a direct correlation. So understanding these correlations matters. It matters also from a policy perspective, not only a mathematical perspective. So you have to understand that if you run regressions with these variables and you don't control for multicollinearity, you can conclude many things. You might find, for example, that the reason crime decreased is because Nixon left the White House that year.
I mean, if we're not careful. So what do we do in building regression models today? What do we teach our students? I teach a course at MIT called Data, Models, and Decisions, and this is what we teach the students, first-year graduate students, and I don't think we are alone at MIT. What the practice is: we say, start with a model; if this variable is not significant, take it out, put another one in; if there is multicollinearity, change it, and so forth. We don't tell them the method. We tell them what the desirable things are, but we never tell them how to achieve these objectives simultaneously. And to the best of my knowledge there is no such method, no algorithm, to achieve this, because it's not clear that if you take one variable out you won't create a problem elsewhere. It's not clear to me. In fact this was a prime issue in this example when we tried.
So the purpose of this work: we have seen that this approach, integer programming, works, at least in this particular case. Can we now generalize? In other words, can we write all the desired properties of a regression model within an integer programming problem? This is very natural for someone educated in this way, like myself; it's not natural for people who have not seen integer programming in their lives. So in other words I view this as an exercise [inaudible]. That's actually a very interesting characteristic of research: if it's consistent with your experience it's not that difficult, but it's very difficult if it's not.
So here's a formulation of this problem; I'll try to walk you through it to give you a glimpse. That's the previous business, but now I put robustness in the story. I put in the Lasso term, not because of sparsity, but because of robustness. That's a good property, because the data are estimated; they're not exact data. So having it there is good. That's the robustness contribution.
So this is the part of the business we have seen, right. Here's another example: pairwise correlation. For example, let's say we take the variables and compute the correlations from data; we know what the correlation is. What does multicollinearity mean? It means, say, that two variables have correlation point nine. So we don't want both variables to be included; only one of them will be included. But I won't impose my human intuition of which one; I let the problem decide. So I write z_i plus z_j, these being the correlated variables, to be less than or equal to one.
Now here's another idea. Suppose you have nonlinear transformations. People say transformations: which one? There are many, millions of transformations. Well, take fifty of them: take the log of a variable, the square root of a variable, the power to the fourth, blah blah blah. Take these fifty new variables and say, out of these fifty I only want one, but I let the problem find which one it is, right. So, nonlinear transformations, which basically says that out of these transformed variables only one of them should be selected. And so forth; you can put more things there. So this is an integer programming problem.
Now how about significance? What do we say? I'm very curious; this is for the statistics part of the department, we have covered the health care part. What do we say? Very important is to find which variables are significant, what the statistics of these guys are; this is what we tell our students, right. Now this is hard to compute, but we can do what's called, a great idea of Brad Efron from Stanford, the so-called bootstrap. So you resample the data, you compute the confidence levels, and if you find that the levels are not good, you say: this combination, variables one, two, five and seven, I want to exclude, only that combination. How do we do it? We say, out of this selection of variables z_j, there are four, I want at most three. So this excludes this vector and only this vector. You put it in and you resolve; you basically use a simple cutting plane idea.
So in other words, integer programming allows you to basically accomplish something that I needed in that particular application, and in many others: it allows you to find models that satisfy all the constraints you can think of simultaneously. Honestly, humans cannot do that. A human cannot do this; we are not engineered for that. For example, we gave data to our students to do it, and they got tired soon enough. But integer programming doesn't tire; it only requires [inaudible].
So how good is this idea? Is it feasible computationally? I would say it is very feasible computationally. What we did is basically the following. There is data at UCI, the University of California, Irvine; they have a nice repository, about two hundred problems, regression type, small, large, very large, and we solved these problems using this method, imposing all the constraints. I don't want to show all two hundred; here are some examples, and I didn't pick the best, I picked a random collection of the problems. In this example we found, for example, that the R squared is comparable to other methods, except we were able to find sparser solutions, more sparse than the other methods. But most importantly, we imposed everything that we wanted together, whereas all these other solutions did not have the properties we wanted. For example, here's an example with airlines [inaudible]. Here's an elevator example; it is data for predicting elevator usage in big buildings in New York.
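The three kinds of side constraints described above (at most one of two highly correlated variables, at most one transformation per original variable, and a cut that excludes a rejected combination) can be sketched as plain data that a mixed integer model would consume. This is an illustration under assumptions: the function names, the correlation threshold, and the tuple encoding of constraints are all hypothetical, and no particular solver API is used.

```python
from itertools import combinations
from math import sqrt


def correlation(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def pairwise_constraints(columns, threshold):
    """Pairs (i, j) whose correlation is too high: impose z_i + z_j <= 1."""
    return [
        (i, j)
        for i, j in combinations(range(len(columns)), 2)
        if abs(correlation(columns[i], columns[j])) > threshold
    ]


def exclusion_cut(bad_subset):
    """Cut sum_{j in S} z_j <= |S| - 1, returned as (indices, right-hand side)."""
    idx = sorted(bad_subset)
    return idx, len(idx) - 1


def is_feasible(z, pairs, groups, cuts):
    """Check a 0/1 selection z against all three families of constraints."""
    ok_pairs = all(z[i] + z[j] <= 1 for i, j in pairs)
    # each group lists the transformations of one variable: pick at most one
    ok_groups = all(sum(z[j] for j in g) <= 1 for g in groups)
    ok_cuts = all(sum(z[j] for j in idx) <= rhs for idx, rhs in cuts)
    return ok_pairs and ok_groups and ok_cuts
```

For the combination mentioned in the talk, `exclusion_cut({1, 2, 5, 7})` yields the indices `[1, 2, 5, 7]` with right-hand side 3, that is, "there are four, I want at most three". Note that a cut of this form also rules out any larger selection that contains all four variables.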
So the R squared was the same, but looking at the correlations, it had 0.99. I could not find a good solution from a human perspective; of course not, with eighteen variables and nine thousand observations. But we start putting the correlation bound at 0.6, 0.7, 0.8, and it found the solution. You start restricting the correlation more and more; you may lose a bit of fit, but it is able to do it. Humans cannot do that. So in other words, it is possible with integer programming to find interpretable solutions: sparse, without multicollinearity, with significant variables, all of them simultaneously. I think this is a useful tool, and I think this is how it should be treated, not as a program that you just put in as a black box and run. OK.
Last but not least, the idea of outliers. Least squares is a problem in which, when you fit it, you have issues with outliers. This is not a minor issue. Let's say you do a clinical trial, and in this clinical trial there is one outlier, just one. In a clinical trial you don't have thousands of patients; you have a hundred patients. For one patient things did not work, but because you included that patient in the data, you might alter your conclusions. That's actually a real problem; it is not an imaginary one.
So another idea has been proposed in which, instead of solving least squares or least absolute values, you minimize the median of the residuals. Now the median has a property; think about it. It is the median of the residuals, so you order the residuals. The ordering depends on the choice of beta, of course; as you change beta the ordering changes. And the only thing that matters is the median residual, so the large outliers do not matter. That's a good idea in my opinion, and it was proposed in the eighties by the gentleman by the name of Rousseeuw.
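To make the robustness point concrete, here is a minimal sketch comparing the least squares objective with the least median of squares objective for a fixed beta. The data are made up for illustration (six points on the line y = 2x plus one gross outlier), and the function names are assumptions rather than anything from the talk.

```python
from statistics import median

def squared_residuals(beta, xs, ys):
    """Squared residuals (y_i - x_i . beta)^2 for each observation."""
    return [(y - sum(b * x for b, x in zip(beta, row))) ** 2
            for row, y in zip(xs, ys)]

def least_squares_objective(beta, xs, ys):
    """Sum of squared residuals: every observation contributes, outliers included."""
    return sum(squared_residuals(beta, xs, ys))

def median_of_squares_objective(beta, xs, ys):
    """Median squared residual: only the middle order statistic matters."""
    return median(squared_residuals(beta, xs, ys))

# Hypothetical data: y = 2x exactly, except the last observation is an outlier.
xs = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 100.0]

beta_true = [2.0]
print(least_squares_objective(beta_true, xs, ys))      # dominated by the outlier
print(median_of_squares_objective(beta_true, xs, ys))  # unaffected by the outlier
```

The difficulty raised next in the talk is visible here too: the median depends on the ordering of the residuals, and that ordering changes with beta, which is what makes the minimization a discrete problem.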
But unfortunately this is a very discrete problem, and nobody knows how to solve it exactly; there are only heuristics. So what is the state of the art? The problem is NP-hard, but all these problems are NP-hard, and I think this is not an informative statement. So enumeration is used: roughly, we can solve problems with fifty observations and five independent variables with enumeration, and after that it doesn't work. And there are heuristics.
I won't bother you now with all the details; this is now more elevated, perhaps a senior class on modeling, not the junior one, maybe first year grad. So I won't cover it, but it is the same business: first-order methods for heuristic solutions, integer programming for lower bounds, and using them as warm starts, not only at the root but inside the enumeration. As you will see, it really does well on the problem.
I won't cover the details, but this is the exact formulation of this problem; the only thing I would like you to remember is that it can be done. And if you want to follow up, there's a paper, published just a few months ago in the Annals of Statistics, that does this. By the way, I have to admit, to be honest, that this is the first paper I have ever submitted to the Annals of Statistics; this is completely not my home court. It was accepted, and I would say enthusiastically. [inaudible] So it might be one of the first papers on integer programming in the Annals of Statistics, if not the first.
So we see the same behavior, namely that for some data you find a feasible solution fast, and the lower bound takes longer.
That's the same picture, in other words. What time are we at? How much time do we have, five minutes, ten minutes? OK. So I can tell you about the method, and the method is, I think, a nice one. So let's look at this. Take the numbers: this is the y variable minus x beta, the residuals, but sorted. In other words, fix a beta and sort these guys, and this is the q-th largest; the parenthesis means after sorting, which depends on beta. So clearly one value is the sum of the first q plus one minus the sum of the first q values. Now the nice thing is that this function is convex in beta. Let me give you a small, easy proof of that; otherwise it is an exercise. I would like to prove to you that if, as a function of beta, you take these sorted values and sum up the q largest, the sum is convex. So now what have we done? We expressed the median as a difference of two convex functions.
Now let me convince you of the convexity. Fix beta, and maximize the sum of w_i times these values, subject to the sum of the w_i being equal to m and zero less than or equal to w_i less than or equal to one. What is this problem? It is a linear programming problem as far as the w's are concerned. What is the problem going to do? It is going to sort the numbers and sum up the largest, so this actually gives you the sum of the m largest. And the w's are nonnegative numbers. So this shows, using linear programming, that this is in fact a convex function.
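The argument above can be written out as a short derivation. This is a reconstruction under assumed notation, not a copy of the speaker's slide: r_i(β) denotes the i-th residual, the parenthesized index denotes sorting by absolute value in decreasing order, and S_q is a name introduced here for the sum of the q largest absolute residuals.

```latex
% Assumed notation: r_i(\beta) = y_i - x_i^\top \beta, with sorted absolute residuals
% |r_{(1)}(\beta)| \ge |r_{(2)}(\beta)| \ge \dots \ge |r_{(n)}(\beta)|.

% Sum of the q largest absolute residuals, written as a linear program in w:
\[
S_q(\beta) \;=\; \sum_{i=1}^{q} \bigl|r_{(i)}(\beta)\bigr|
\;=\; \max_{w}\Bigl\{ \sum_{i=1}^{n} w_i \,\bigl|r_i(\beta)\bigr| \;:\;
\sum_{i=1}^{n} w_i = q,\;\; 0 \le w_i \le 1 \Bigr\}.
\]

% For each fixed w \ge 0 the objective is a nonnegative combination of the convex
% functions |r_i(\beta)|, and a pointwise maximum of convex functions is convex,
% so S_q is convex in \beta.

% A single order statistic is then a difference of two convex functions
% (with S_0(\beta) = 0):
\[
\bigl|r_{(q)}(\beta)\bigr| \;=\; S_q(\beta) \;-\; S_{q-1}(\beta).
\]
```

Choosing q as the median position gives the least median objective as a difference of convex functions. Whether the index is written as q, q plus one, or m is a matter of convention; the talk uses more than one.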
So the details are not so important, but if you express something as a difference of convex functions you can do nice things with it. The first part is already convex. The second part is in fact a linear programming problem; minimizing over beta is linear programming by duality. And of course minimizing a concave function is tougher, but you can do things about it with first-order methods. I won't go into more of the details.
So again the impact of warm starts is very, very substantial, in fact more important than before, because this is a tougher problem. [inaudible] But this is the same story.
So what have we learned about this problem? That mixed integer optimization with warm starts solves to provable optimality problems of least median of squares regression with n about five hundred in under two hours. You have to realize that two hours on today's computers and software is two hundred billion times faster than in the early 1990s. So there was a reason people believed it was completely impractical at the time. But the problem is that people don't update their beliefs. That is true in life: we form a belief and then rarely revisit it.
If you have a much larger problem, with n at ten thousand, we find solutions in under two hours that significantly outperform the state of the art, but we cannot prove optimality in a reasonable amount of time. We still have some more work to do.
So in the very few minutes I have, I would like to give you some remarks, at least my view, on complexity. What is complexity? Here's what we teach our students.
I'm also guilty as charged; I used to, not anymore. What I believe is the following: a theory has to satisfy the following requirement, that it ought to be positively correlated with empirical evidence. If we have a theory that is negatively correlated with it, it is not a good theory, would you say? It's an obvious statement, but we don't apply it to ourselves. An example is the simplex method. The simplex method, in the complexity world, is not a good method; it's an exponential method, because indeed there are some examples on which the method does not do well. But empirically the method is extremely good. So something is wrong, either with complexity theory or with linear programming. My vote is that complexity theory is the problem.
What I believe is important is that the two hundred billion speedup forces us to reconsider what is tractable. Remember what we used to believe in the seventies, when Karp proposed NP-hardness and so forth: we could not solve problems with ten variables, let alone thousands of variables. So to develop a theory characterizing what is difficult and what is not made great sense. But we use the same theory today. Somehow the speedup of integer programming of two hundred billion did not enter the complexity world, which I find incorrect. In other words: these problems are called intractable, we don't even teach the methods, and yet empirically we can solve them. This does not make sense, to me at least. [inaudible]
So what is my definition of complexity? What is tractable? In my view, a problem is tractable if it can be solved for sizes and in times that are appropriate for the application. That's what I think.
To say that I want a system in which the input size increases, and so forth, has nothing to do with reality in my experience. To give you some examples: online trading problems. I have some experience with this; I'm not talking only hearsay. Online trading problems need to be solved in milliseconds; [inaudible] make billions of dollars by the need to solve problems like this, right? So who cares if you have a polynomial method or a linear method? It doesn't matter. The only thing that matters is that it has to be solved in milliseconds; it could even be an exponential method. In fact [inaudible] there is a formulation for how to optimally schedule the trading, and the method used is an exponential algorithm, but it runs in milliseconds in the application in question. That is number one. Number two: problems used for planning need to be solved in minutes or even hours; you don't need to solve them in milliseconds. If you work on a bigger problem, to understand a policy question, you can do it in half a year. So you have a lot of flexibility [inaudible].
So in my view asymptotic polynomial solvability or NP-hardness is not relevant. In other words, we have a theory that in my view has no resemblance to the practical realities of the world I see. I used to believe in it; I was educated in the mathematics department at MIT, where we teach NP-hardness, complexity one, complexity two, advanced classes. I took all the classes, and I used to believe in them. When I went to the real world I said, wait, it doesn't add up. So that's my definition. And in fact recently [inaudible] was visiting, and he [inaudible] practically tractable. All right.
And then, yes? Your name? OK, I made a new friend. Yes. Of course.
I don't have [inaudible]; I have my heart in solving real problems as they appear in the real world, and for that I will use whatever I have. If I need hardware, hardware is extremely cheap now. A computer I bought just a few months ago is of the order of a thousand dollars, and it has the capabilities of a computer [inaudible] that a great company had; at that level you could spend fifty million on it. So if this takes more hardware, so be it; let's use it. We are an engineering department after all, and engineers [inaudible]. OK, that's my view, but I agree with the statement you made. [inaudible] Your name, sir? Spiro, you're my friend. [inaudible] OK. Yes, we have an hour now.
But unfortunately I believe [inaudible]. I find this mind-boggling. In other words, the interpretation is important because it has implications for what we do and what we teach, so I think that's relevant.
Final comments. I believe that problems in statistics considered intractable a decade ago are now tractable, and I showed you examples; [inaudible]. The lasso, which as I mentioned is not a detail, is a major thing in statistics: it has a robustness property, not a sparsity property. In comparison to the lasso, which is the main incumbent, in my opinion mixed integer optimization provides a significant edge in detecting sparsity, as well as in prediction accuracy, though not as much. So in my view the time to include mixed integer optimization in statistics is now.
It would be a mistake not to do it. This is a department that has both: twenty percent of the faculty in optimization, in rough numbers, and twenty percent in statistics. It's an opportunity. There are not too many departments that have strength in both.
And so that you don't think I only preach and do not do: next year [inaudible] we'll be teaching a first-year graduate class at MIT, statistics under a modern optimization lens. These three topics would be three different lectures, and there are other examples [inaudible].
For the young people in the audience, I know for a fact that this is an area to expand. What evidence do I have of this? This year I have four first-year graduate students. I miscalculated; [inaudible]. But these four people are, I believe, talented people. Each of them had a specific problem: one was [inaudible], another was discrete SVM. It doesn't matter, but these are problems; these are actually four chapters. There's a main book in statistical learning, by Friedman, Tibshirani and Hastie; I believe the book is called [inaudible]. So I took four chapters of this book, all of which have heuristics, and I assigned each problem to one of the students. In all four we made it by the end of the year. These students are not extremely knowledgeable; they're very talented, but they're not knowledgeable; they just arrived. So I think we made progress. The young people made progress; I didn't solve the problems, they did. I posed them and they solved them. So I think it's doable, and I think many people can contribute. On that note, I thank you.
Could you also mention your name? Yes. [inaudible] That's right. Yeah.
So this is not a completely automated process; judgment is needed, and I believe that's a good thing. A human modeler has intuition about the problem that I wouldn't have. So when we solve this problem [inaudible] cross-validation; in fact we have automated this process: we use cross-validation to select the right value. But at some point some judgment is needed. I don't advocate that we take the data, give it to this black box, and an hour later look at the answer. Some participation is needed, for example with the transformations [inaudible]; some amount of human interaction is needed, I think. Yes, your name? Yes.
So there are two levels. The first level is to use them as a heuristic, like the lasso. This can scale to hundreds of thousands; in fact we have solved a problem in which the number of observations was two hundred thousand and the number of variables was five or six thousand. So as far as heuristic solutions are concerned, it is very scalable, as scalable as the lasso, because we use continuous methods. If you want provable guarantees, I would say the scalability for provable optimality is in the thousands, maybe ten thousand. If you want to go further than that, we need more work. So I'm not claiming we are done; what I claim is that we should teach these methods, and they do scale to sizable problems. Yes? OK, another friend. [inaudible]
[inaudible] What we have done also, I mean, the paper has many things; I just wanted to give you a glimpse. There are multiple ways to have multicollinearity: it is possible that two variables have low pairwise correlation, but a combination of variables is actually very collinear. So what we do is, again, take the solution. Say variables one, two, three, four and five have been selected. We calculate the correlation matrix of the five variables and the condition number of this matrix, from its eigenvalues. Then we use an eigenvalue method to say that, for example, the set one, two and three should be excluded, so you put in z_1 + z_2 + z_3 ≤ 2. We use an iterative method to do that. That's how we do what you just said. Yes, right.
Another question, please.
I'm wondering whether there might be something going on where the problems that actually arise have some nice structure that might help [inaudible].
Yeah. So you're saying it is possible that the problems arising in the world are much easier compared to what complexity theory [inaudible]. I doubt that. I've been in this business for about thirty years and I've seen many problems; I don't believe that real-world problems are particularly easier or harder. This is speculation; I don't have evidence of this. But if I want to say why it is that what we could not solve we now can solve, it is really the collective intelligence of a community, and I think that's what changed. And I also believe that, scientifically, science doesn't really change; in fact
the mechanics of Newton, even though an approximation, we still teach, and correctly so. By the way, what I'm trying to point out, especially to the young people in the audience, is that if you broaden your horizon a little bit, you might actually have access to things that you have not thought about. And it's our obligation as educators to teach the young people what is possible. Even if I'm wrong, say I made some miscalculation, and I don't think I am, but let's say I am, I think it is worth knowing that this is a set of methods that are worth teaching in our environments. And I think this would lead to progress, not by me but by all of us. [inaudible] And I don't believe everybody agrees on this. [inaudible]