For the last talk today, David will tell us about maximum cut problems on sparse graphs. — Thank you, and thank you to the organizers for the invitation and for the opportunity to give a longer talk. Maybe a talk like this towards the end of the day is not a good idea, but I actually like the sound of his version of the title better than my own. I also want to apologize to those of you who have heard this story before; on the other hand, looking at the program, I realize that some of you who have heard it before repeated your own talks as well, so maybe I should not feel too bad.

Before I get to the one concrete model and the concrete question that I want to explore today, I want to set it in a broader context. The big-picture question is the infamous gap that appears between optimal values of optimization problems, which we can often compute by nonconstructive means in random graph settings, and what is achievable by polynomial-time algorithms. There is a growing list of such problems, including the largest clique, k-SAT, max-cut, and so on; it is a really long list, and not only in the context of random graphs but also in the context of machine learning models and high-dimensional statistics. Perhaps the most infamous problem in this setting is finding the largest clique in the dense Erdős–Rényi graph. There, the largest clique has size roughly twice the logarithm of the number of nodes, but the best algorithm we know of that finds a large clique in polynomial time only achieves half of that, and since Karp posed this challenge in 1976 it has not been improved.

When you experience this repeated failure to bridge the algorithmic gap, you search for answers: you want to understand why these problems are apparently hard algorithmically. A lot of insight was offered by statistical physicists, people working on spin glass models, who essentially for the first time suggested that maybe we should be looking at some kind of phase transition that might explain, though not prove, that these problems are hard in some sense. They suggested that the geometry of near-optimal solutions in these problems becomes complicated enough to prohibit at least certain classes of algorithms, and maybe even all polynomial-time algorithms; we are quite far from establishing that, but it is a potentially useful insight. I am not going to discuss any heavy-duty statistical physics methods here, but one particular insight that has been incredibly useful is a certain overlap gap property of near-optimal solutions, which I will discuss today in the context of one problem and one problem only: finding the maximum cut in a random sparse hypergraph, which let me now introduce.

Here is the problem. We have a random hypergraph on n nodes with dn hyperedges, which I will just refer to as edges from this point on, all K-uniform: each edge is a collection of K nodes. The edges are chosen uniformly at random, and we think of d as a constant. Each edge is an unordered set of K nodes.
OK, whether you generate the edges with or without repetition is usually beside the point; it does not change the big questions at the scale we are interested in. Let me now introduce the maximum cut problem. It is defined as follows: we want to partition the set of nodes into two groups, encoded by minus one and plus one. The value of the cut associated with this partition is written in a way that you might object to, and I would not mind if you did, because I might object to it as well, but it is a convenient way to introduce max-cut problems. Here is the value of the cut: the negative of a sum over all edges, where on each edge we take the product of the values of its nodes, so really we look at the parity of this product. Our goal is to find a cut that maximizes this value. In other words, we want to find the partition that makes the number of edges where this product is negative as large as possible, and the number of edges where the product is positive as small as possible. This is not the canonical max-cut problem you are familiar with; in a way it is a centered version of max-cut, and it turns out that max-cut in the case K equal to two can be read off from this version by just adding and subtracting the number of edges inside the two parts. What is being measured here for K equal to two is not exactly the cut, but the cut minus the number of internal edges in the two parts, so essentially, if we know how to maximize this, we know how to maximize the original cut.

For K equal to two we have a good approximation of the optimal value, and I will also talk about that in the case of K at least four, the hypergraphs. But as far as algorithms are concerned, we do not know whether we can get as close to optimality as we want for graphs; for hypergraphs we will show some negative results, and that is one of the goals of this talk. The gap factor between the algorithmic value and optimality is not a constant: for growing degree you can make the gap smaller and smaller. I will get to that; in fact, let me now give some background, which will perhaps answer some of the questions.

[Question from the audience.] OK, good — let me answer two questions at once. This is the goal, and we want to say something about the objective function: how does it scale with the number of nodes and with the degree? And we also want to say something about whether we can get near optimality algorithmically or not, which is what was just asked for the case of graphs. Jumping ahead, the reason we are looking at hypergraphs is that we can say something about hypergraphs that we cannot say about graphs; that is a limitation of the methods. Now, when K is odd, as was observed, this problem is trivial, because setting every node to minus one makes the product on every edge negative, so you quote-unquote cut every edge and the objective function is in that sense not interesting. The smallest hypergraph setting beyond graphs is therefore K even, which means K at least four, and that is the setting we consider.
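To make the setup concrete, here is a minimal sketch of the model and of the objective just described, with the value of a cut σ ∈ {−1,+1}^n written as H(σ) = −Σ_e Π_{i∈e} σ_i. The sampling convention (no repetition inside an edge, possible repetition across edges) and all names are my own illustrative choices, not from the talk.

```python
import random

def random_hypergraph(n, d, k, seed=0):
    """dn edges, each an unordered set of k distinct nodes chosen uniformly at random."""
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(range(n), k))) for _ in range(int(d * n))]

def cut_value(edges, sigma):
    """H(sigma) = - sum over edges of the product of the +/-1 values on the edge."""
    total = 0
    for e in edges:
        prod = 1
        for v in e:
            prod *= sigma[v]
        total -= prod
    return total

# Example: K = 4, n = 1000 nodes, dn = 5000 edges, evaluated at a uniformly random cut.
edges = random_hypergraph(n=1000, d=5, k=4)
sigma = [random.choice((-1, 1)) for _ in range(1000)]
print(cut_value(edges, sigma))
```

For K = 2 this value equals (number of cut edges) minus (number of uncut edges), which is exactly the "cut minus internal edges" reading mentioned above.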
It is not hard to show the scaling of the largest cut as n goes to infinity and then d goes to infinity — in that order: first you take n to infinity, then you make d larger and larger. The order of magnitude is linear in n and of order the square root of d in the degree. That is not hard to establish. The question then is: what can one say about the constant in front of this square root of d? For now this is an existential question.

We looked at this question in the context of graphs in 2004 and obtained some bound. That bound is algorithmic: there is a fairly straightforward greedy-type algorithm, which I will get to, that achieves the right scaling — I am not sure it achieves exactly this constant, but it gets the right scaling. The square root of d appears because if you do a random partition, the number of edges cut across is roughly Poisson, approximately Gaussian, with mean of order dn; by playing with the fluctuations, whose standard deviation is the square root of that, you can create a cut that beats the mean by roughly the square root of d per node, and it turns out that by the first moment method you cannot do much better. So that is the right scaling, and it prevails even for hypergraphs, when K is not two. But what is the constant? One can get some bounds, and we improved those bounds somewhat in a more recent paper by existential means. As far as existential results are concerned, for K equal to two, the graph setting, a pretty much complete answer was given in the paper by Dembo, Montanari and Sen, who obtained the constant in front of the square root of d times n; that constant turns out to be numerically about 1.07, and I will comment on its origin in a moment.

That is the existential answer; algorithmically, for graphs, we can certainly achieve constants smaller than this. Going back to the square-root scaling: as d increases, the gap between the algorithmic and the existential constants drops at the rate of one over the square root of d, so the game here is slightly different from a constant-factor gap. I am also pretty sure that the algorithmic result has since been improved, although I was not able to quite nail that down in the literature. The algorithm is actually quite simple: you go node by node; at each step you already have some partial partition, and you simply put the current node into the group that maximizes the cut implied by this assignment. It is a greedy-type algorithm, and it turns out that at every step you are essentially picking the maximum of two Gaussians, and the maximum of two Gaussians exceeds the mean by a constant times the standard deviation; that is how you get the square root of d. Any other questions?
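A minimal sketch of that greedy step, in the same toy conventions as above. The processing order, the tie-breaking, and the "score an edge once its last node is assigned" bookkeeping are my own simplifications, not the exact algorithm referred to in the talk.

```python
def greedy_cut(edges, n):
    """Sequential greedy: assign each node the sign that maximizes the contribution
    of the edges that become fully assigned at that step (objective: minus the product)."""
    sigma = [0] * n
    closing = [[] for _ in range(n)]            # edges indexed by their largest node
    for e in edges:
        closing[max(e)].append(e)
    for v in range(n):
        score = {}
        for s in (+1, -1):
            total = 0
            for e in closing[v]:
                prod = s
                for u in e:
                    if u != v:
                        prod *= sigma[u]        # nodes u < v are already assigned
                total -= prod
            score[s] = total
        sigma[v] = +1 if score[+1] >= score[-1] else -1
    return sigma
```

On the K = 2 graph case this is the familiar "place each node on the better side" rule; the per-node improvement over a random partition is of the order of the square root of d, matching the scaling discussed above.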
OK, so where is this constant coming from? It comes from spin glass theory, and since the work of Dembo, Montanari and Sen this connection has been extended to quite a few other models in several papers. The idea, in the end, is that you relate the model on the sparse Erdős–Rényi hypergraph to a model called the Sherrington–Kirkpatrick model, or in the general-K case a spin glass model. The analysis of these models is involved, but the model itself is very simple, so let me tell you what it is, or remind you if you have seen it before. We are now looking essentially at the complete K-uniform hypergraph, where on every edge, which is a set of K nodes, you put a weight which is just a standard normal Gaussian. So it is a complete K-uniform hypergraph, and every edge is associated with a standard normal Gaussian weight. What you look at is essentially the largest "cut" in this model: you partition the nodes into two groups, but for every product of plus and minus ones on an edge you multiply by the associated weight. It is even less of a familiar max-cut problem, because the Gaussians can be positive or negative while the products are plus or minus one; in any event, you want to arrange the plus and minus ones so that the total sum is as large as possible. That is a canonical object in spin glass theory, and it has long been of interest.

Thanks to a long series of papers — starting from really groundbreaking work, and then Talagrand, Panchenko and others — the value of this optimization problem has been nailed down, as well as the scaling: you need to scale by an appropriate power of n, which turns out to be the right normalization, and the limiting constant exists and can in principle be computed numerically. That limit is precisely the limit that appears in the max-cut for the case K equal to two, and, as it turns out, for general K as well. So in summary, we more or less know the value of the max-cut problem on hypergraphs for degree sufficiently large. But our focus is on the algorithms.

[Question: how does the constant grow with K?] I am not sure that has been analyzed, because even computing it involves solving a certain variational problem numerically; that is a good question, but I am not sure anybody has looked at it. I do not want to get into the formula itself: one way of writing it is as an optimization over a countable sequence of measures over measures, and you need to do some work with it; it is complex, and frankly I do not have a very good intuition or understanding of it. I had a few slides about the method of connecting the models — how you actually get from this Sherrington–Kirkpatrick / spin glass model to the sparse random graphs; this is the interpolation method — but I am not going to spend a lot of time on it. The idea is to build a mixed model which, on the same set of nodes, contains partially the original sparse random graph model and at the same time is populated with the mean-field model with Gaussian hyperedge weights, introducing an interpolation parameter that continuously interpolates between the random graph model and the spin glass model, and then controlling what happens with the ground state along the way. In the interest of time let me not do that; I am happy to talk about it offline.
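For reference, here is the mean-field object just described, written in one common normalization — a pure K-spin Hamiltonian with i.i.d. standard Gaussian weights. This is my transcription of the textbook version, so the talk's normalization may differ by a constant factor:

$$
H_n(\sigma)\;=\;\frac{1}{n^{(K-1)/2}}\sum_{1\le i_1,\dots,i_K\le n} g_{i_1\cdots i_K}\,\sigma_{i_1}\cdots\sigma_{i_K},
\qquad g_{i_1\cdots i_K}\ \text{i.i.d. } N(0,1),\ \ \sigma\in\{-1,+1\}^n,
$$

$$
\lim_{n\to\infty}\ \frac{1}{n}\ \max_{\sigma\in\{-1,+1\}^n} H_n(\sigma)\;=\;E_K,
$$

where E_K is the ground-state constant given by a Parisi-type variational formula. The claim in the talk is that, after the square-root-of-d scaling, the same constant (up to a simple factor) governs the max-cut value on the sparse hypergraph as the degree grows.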
[Question about the slide.] Yes — so this is the interpolation; it is the sum of two Hamiltonians: this one is simply max-cut on the graph, and this one is the Hamiltonian associated with the spin glass model. Any other questions about this slide, which I was not planning to show? That is part of the interpolation.

Now I want to stay focused on the algorithms, and the first idea I want to introduce goes under the name of chaos, which is a quite fascinating phenomenon, again discovered and studied primarily in the context of spin glass models, but it is equally easy to describe in the context of random graphs. So what is chaos, and how is it described in the context of random graphs? Consider two versions of our random hypergraph, G1 and G2, on the same set of nodes, sharing only a certain fraction of edges. Specifically, suppose these two random graphs share a (1 − δ) fraction of edges, but in such a way that each of them marginally looks like the usual random graph. To generate such a pair, you start with one copy of the graph, and then you simply resample each edge with probability δ, independently. Some of the edges are then from your old graph and some are brand new, and a (1 − δ) fraction of them is common. So these are two graphs that are kind of alike — sisters. When δ equals zero, they are just two copies of an identical graph; when δ equals one, they are two independent copies, both on the same collection of nodes — but of course it is the edges that make a graph interesting, not the nodes.

Here is the theorem, which again came as a culmination of a series of works, but in this final form was proven by Chen and Panchenko in 2017; it is called chaos. Because the formal statement might be hard to parse, let me first say informally what it says and then connect it to the formal statement. Basically it is the following: fix any δ, as small as you want — 0.0000001. Consider the coupled versions: the two graphs share 99.99 percent of the edges, but each has a little bit of idiosyncratic edges. Consider a nearly maximum cut on the first graph and a nearly maximum cut on the second graph; each of them is a partition of the nodes into two groups. Look at their inner product — basically, how much they overlap, how much they agree. Naively, one would expect that because you perturbed the graph just a little bit — only 0.01 percent of the edges have been changed — there should be a nearly optimal cut in the second graph that agrees almost everywhere with one in the first graph. That turns out to be extremely far from true: as soon as you perturb any fraction of the edges, every nearly optimal solution in the second graph is nearly orthogonal to every nearly optimal solution in the first graph.
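A minimal sketch of the δ-coupling just described, in the same toy conventions as the earlier snippets (names mine). Each copy is marginally the usual model, and the two copies share roughly a (1 − δ) fraction of edges.

```python
import random

def coupled_hypergraphs(n, d, k, delta, seed=0):
    """Two d*n-edge K-uniform hypergraphs on the same nodes sharing ~(1 - delta) of their edges:
    start from one copy and independently resample each edge with probability delta."""
    rng = random.Random(seed)
    draw = lambda: tuple(sorted(rng.sample(range(n), k)))
    g1 = [draw() for _ in range(int(d * n))]
    g2 = [e if rng.random() >= delta else draw() for e in g1]
    return g1, g2

# delta = 0 gives identical copies; delta = 1 gives independent copies.
g1, g2 = coupled_hypergraphs(n=1000, d=5, k=4, delta=0.001)
```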
So, formally: for every ε and for every δ... [Question about the order of quantifiers.] I may have been a little careless on my slide, but I do not believe so. What depends on both of the parameters is how close to optimality you want to be. There is another complication, a certain complexity in the statement, which is that you need to make the degree large enough: for every ε and δ there exist — I am going to use γ for the proximity to optimality — and a large enough d₀, such that as long as you look at graphs with degree larger than d₀, any two solutions in the two graphs, one nearly optimal in the first graph and one nearly optimal in the second, will be ε-orthogonal. OK, so that is the result, and I have already said in human terms what it means. Any questions?

Just a quick word about the proof: the idea is to show that something like this happens in the spin glass model itself, and then interpolate from that model to the random hypergraphs.

The second property I want to share with you, which is also geared towards establishing a negative result, is the overlap gap property that I alluded to earlier. For now let us fix just one copy of the graph; G is just one ordinary random hypergraph. It turns out that the following is the case — and for this we really need the graph to be a hypergraph, so the number of nodes per edge should be at least four. There exist two numbers τ1 and τ2 between zero and one, and a certain parameter γ, which is typically strictly smaller than the optimal value, such that the following holds for all sufficiently large degree d. Take any two nearly optimal cuts, where "nearly optimal" means that σ, with the objective normalized appropriately, achieves at least the value γ, which is still below optimality but may be close to it, and look at their overlap, their inner product. Then the overlap is either at most τ1 or at least τ2. In human words it means the following: take two nearly optimal cuts in this graph and see on what percentage of nodes they agree. What this says is that the percentage of nodes on which they agree is either large or small. If, say, τ2 is 0.8 and τ1 is 0.1, it means that either they agree on at least eighty percent of the nodes or they agree on at most ten percent of the nodes, but nothing in between. You cannot find two nearly optimal cuts that agree on thirty percent of the nodes. The values τ1 and τ2 depend on how close you get to optimality, but importantly, for the algorithmic implications, they do not depend on the degree: the overlap gap created by τ1 and τ2 survives no matter how large a degree you take.

[Question.] Yes — by making γ closer and closer to γ*. [Question.] Yes — otherwise none of this would be interesting, right. And none of this is vacuous: first of all, I can start with a cut and flip just a small percentage of nodes so that I still stay nearly optimal, which gives me the upper region, and there are sufficiently many nearly optimal cuts that are nearly orthogonal, which gives me the lower region. So it is non-vacuous.
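Schematically, and as my own paraphrase of the statement just described (cut values normalized by n√d, overlaps by n; the precise constants and quantifiers are as in the talk):

$$
\exists\; 0<\tau_1<\tau_2<1,\ \ \exists\;\gamma<\gamma^{*},\ \ \exists\; d_0\ \ \text{s.t. for all } d\ge d_0,\ \text{w.h.p. as } n\to\infty:
$$

$$
\frac{H(\sigma_1)}{n\sqrt d}\ \ge\ \gamma \quad\text{and}\quad \frac{H(\sigma_2)}{n\sqrt d}\ \ge\ \gamma
\qquad\Longrightarrow\qquad
\frac{|\langle \sigma_1,\sigma_2\rangle|}{n}\ \in\ [0,\tau_1]\cup[\tau_2,1].
$$

The coupled version discussed next is the same implication with σ1 a γ-optimal cut of G1 and σ2 a γ-optimal cut of G2, holding uniformly over the coupling parameter δ ∈ [0, 1].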
[Question: with high probability with respect to what?] The high probability is with respect to the randomness of the graph; there is no other randomness here. Yes: with high probability with respect to just the randomness of the graph, every γ-optimal solution and every other γ-optimal solution satisfy the property that their overlap has this disconnected support.

[Question: why does this happen?] I have tried various ways, but I do not have an intuitive explanation of that; I do have some references to the literature, and the phenomenon is not unique to max-cut — it has shown up in some other models as well, where it is sometimes more straightforward to show. As far as why it happens, I do not have an explanation — maybe others do — but the phenomenon helps establish negative results for algorithms, and that is what I want to discuss next.

It turns out that you can put the two results together into a sort of combined version. Basically it says that even if you take the two coupled random graph models, coupled at any fraction of edges — it could be zero percent or a hundred percent, completely independent or completely identical — the overlap gap property survives. So even for the coupled versions: take a nearly optimal solution in the first graph and a nearly optimal solution in the second graph; their overlap is either at most τ1 or at least τ2. The proof is by combining the overlap gap property in one copy of the graph with the chaos property: chaos combined with the single-copy statement gives us this, and that turns out — maybe for reasons that are not clear right now — to be exactly what we need for the algorithmic implications. Yes?

[Question.] The OGP theorem here is not chaos: chaos usually refers to perturbing the graph itself, while this is about one graph; but by combining it with chaos we essentially get the OGP in the coupled model. And I highlighted the word "all" here, because for us it was important that this literally means all δ, starting from zero all the way to one, including both endpoints: the graphs could be identical or completely independent, and the property holds. Any other questions? It is getting late in the day — did you get the chaos part? Ninety percent happy? OK, so let me continue.

While I do not have a good explanation or proof technique to offer for why something like this should happen, let me say that this phenomenon was suggested, and then proven, by statistical physicists earlier, in different contexts, different models, and different versions of the statement; usually it was described in the form of clustering or shattering for the so-called random constraint satisfaction, random K-SAT, models, but the idea is very similar. Why is it the case? Maybe the physicists have a good explanation; I do not. But it is perhaps not surprising that when you have something like this, the complexity of the geometry of near-optimal solutions should indicate some kind of hardness: when your optimal solutions are extremely sensitive to even a tiny percentage of the edges of your graph, perhaps it is not surprising that it is hard to find nearly optimal cuts. That is a guess. Now I want to introduce a class of algorithms for which the emergence of the overlap gap property implies negative results.
This class of algorithms, loosely speaking, we will refer to as local algorithms; formally they are called factors of IID — factor-of-IID-type algorithms.

[Question from the audience.] In a sense, what the shattering, the overlap gap property, suggests is kind of the opposite: you might be pretty close to the solution in value, but to get even closer, let alone to the optimal solution, you almost have to go down quite a bit before you get to better-quality solutions. It is an extremely complicated landscape of solutions: there are lots of local optima of various qualities, some closer and closer to optimal, but every one of them is still quite significantly separated from the higher-quality solutions and from the optimal solution. With this picture it almost immediately becomes plausible that something like Markov chain Monte Carlo would not be a good method to solve this problem. So let me introduce the class of algorithms; it is not Markov chain Monte Carlo, it is something else.

The idea of the algorithms called factors of IID, or local algorithms, is that they make decisions entirely based on the local neighborhoods of the graph. It is an attempt to construct, for example, cuts where you decide whether a node is plus one or minus one just by looking at a small neighborhood around the node: just from observing this neighborhood, maybe with some additional randomization, you decide that this node is plus one and that node is minus one.

To define it more formally, we fix a certain radius r, and first we consider the set of all rooted trees of depth r. Suppose on this set of trees we have a function that maps the following things: it takes the interval from zero to one raised to the power of the number of nodes — that is, an assignment of a number in [0, 1] to every node of the tree — and maps each such element to minus one or plus one. What does this function stand for? The meaning is as follows: if the nodes of my tree are decorated with some uniformly-at-random generated weights, then this function tells me, based on those weights, whether the decision should be minus one or plus one. If I generate these node weights uniformly at random, you can think of them as the random seeds in a randomized algorithm that, upon observing the radius-r neighborhood of a fixed node, makes a determination whether the node is plus one or minus one. You could have a function that completely ignores the random seeds and makes the decision just by looking at the graph structure around the node; what makes the class of algorithms more powerful is letting it have this randomization.

Now we use this function to make a cut decision, as follows. We associate uniform random weights with every node of the entire graph: generate n variables, i.i.d. uniform at random. Then for every node we look at the depth-r neighborhood around this node; typically we observe a tree, because this is a sparse random graph. We observe the tree, whose nodes are decorated with the random weights generated earlier, we apply this function, and the function tells us whether the node is plus one or minus one.
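A minimal sketch of that pipeline, in the same toy conventions as before. The interface of `rule` is a simplification: formally the rule is a fixed function of the decorated depth-r rooted tree, while here it simply receives the root, its depth-r ball, and the i.i.d. seeds; `toy_rule` is purely illustrative and is certainly not a good cut rule.

```python
import random
from collections import defaultdict

def local_cut(edges, n, r, rule, seed=0):
    """Factor-of-IID sketch: decorate nodes with i.i.d. Uniform[0,1] seeds, then let each
    node's sign be a function ('rule') of its depth-r neighborhood and the seeds on it."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n)]
    nbrs = defaultdict(set)
    for e in edges:
        for u in e:
            nbrs[u].update(v for v in e if v != u)
    sigma = [0] * n
    for v in range(n):
        ball, frontier = {v}, {v}
        for _ in range(r):                       # breadth-first exploration to depth r
            frontier = {u for w in frontier for u in nbrs[w]} - ball
            ball |= frontier
        sigma[v] = rule(v, ball, weights)        # rule returns +1 or -1
    return sigma

# A toy rule, purely for illustration: threshold the average seed in the ball.
toy_rule = lambda v, ball, w: 1 if sum(w[u] for u in ball) / len(ball) > 0.5 else -1
# sigma = local_cut(edges, n=1000, r=2, rule=toy_rule)
```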
This class of algorithms is called factors of IID, or local algorithms for short. Any questions or confusions about the algorithm? [Question: can the function depend on the tree?] Yes, the function can depend on the tree: if I see one tree I can use one function, and if I see a different tree I can use a different function. [Question.] Not in this setting, although it is essentially equivalent, as long as the only thing that matters is the topology of the tree itself and the random decoration of the nodes. [Question: so f is fixed?] Yes, f is fixed. So suppose we have this function, which maps the decorated tree — the tree weighted with values — into minus one or plus one; it is just some fixed function. Let me describe again how we use this function to make a cut in the random hypergraph: for every node we need to decide whether it is minus one or plus one, so for every node we look at the depth-r neighborhood; on this node we observe a tree, decorated with random weights; we apply the function, and it gives minus one or plus one; and we do that for all nodes of the graph simultaneously. That creates a sort of correlated set of decisions on the entire graph.

[To the session chair:] When did I start, actually? — Five minutes to five, and you have about five minutes. — Five minutes, OK. I have some historical bullet points about where this class of algorithms comes from; interestingly, it originated not in algorithms but in ergodic theory, and then, through a line of work on sparse graphs and graph limits, it percolated into the field of random graphs. Given the time limitations, let me jump to the conclusions.

The conclusion, put somewhat informally, says that local algorithms defined like that cannot construct a nearly largest cut in random hypergraphs with uniformity at least four. To be more precise: there exists a value, call it γ-algorithmic, which is strictly smaller than the optimal γ*, and there exists a large enough d₀, such that — naturally, as the number of nodes goes to infinity — as long as the degree of the graph is at least d₀, every cut producible by a local algorithm, normalized appropriately, achieves a value of at most γ-algorithmic, and that constant is strictly smaller than γ*. Putting this formally involves several suprema. The first is just the limits — the number of nodes going to infinity, the degree going to infinity — which is natural; perhaps the least natural object here is taking the supremum over the radius r of the local algorithm as well as over any function f used on the class of trees of depth r. So the statement says that even if you decide to use the best rule out there for trees of depth, say, a thousand, and among those you somehow find the best decision rule, you will still fall short of optimality, even approximately, under the proper normalization. I do have a proof, but it is not a good idea to go through the proof sketch given that I have, what, three minutes — two and a half minutes — left.
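Written out schematically, with γ* the optimal constant and σ^{r,f} the cut produced by the depth-r rule f on the n-node graph. This is my transcription of the quantifiers as described above; whether one takes an expectation or a with-high-probability bound is a detail I am glossing over:

$$
\exists\,\gamma_{\mathrm{alg}}<\gamma^{*},\ \ \exists\, d_0:\ \forall\, d\ge d_0,\qquad
\sup_{r\ge 1}\ \sup_{f}\ \limsup_{n\to\infty}\ \frac{\mathbb{E}\,H\big(\sigma^{r,f}\big)}{n\sqrt d}\ \le\ \gamma_{\mathrm{alg}}\ <\ \gamma^{*}.
$$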
So instead, let me just share with you some of the key facts that are used. One thing we use is that optimal and nearly optimal cuts in this model turn out to be nearly balanced, in the sense that the number of plus ones versus minus ones is about the same; one can quantify that by giving a more precise bound on the imbalance, which, upon normalization by the number of nodes, decays at a certain rate as the degree grows. That turns out to be important.

The idea of the proof is to use interpolation, starting from completely independent copies. We fix a particular incarnation of the local algorithm: a particular radius r and a particular function f. We start with two independent sources of randomness, X and Y, as if we were running this algorithm twice on completely independent copies of two graphs, and we are going to run and analyze this algorithm on coupled versions of the two graphs. At the beginning, suppose we have two independent versions of the graph and two independent runs of the algorithm. Because everything is completely independent, the two cuts that are produced are themselves completely independent, and both of them are nearly balanced; so you essentially create two nearly orthogonal cuts that way — each of them fairly balanced, and independent, in other words nearly orthogonal to each other.

Then what we do is this — remember the coupled versions of the graphs, which share a certain fraction of edges — we start moving these two independent copies toward each other and merge them, forcing them to share more and more of their edges by resampling, say, one edge at a time; and while we do that, we also start resampling the sources of random seeds that we used to generate the cuts. At the beginning X and Y are independent; for one of the copies we keep X, and for the second one we use a certain fraction of seeds from the first source and a certain fraction from the second, so that in the end we have a single run of the algorithm on one copy of the graph. At the beginning the two cuts are nearly orthogonal to each other; at the end they are identical, by construction. It turns out that this interpolation is continuous enough in the interpolation parameter δ: when you look at how the overlap between the two solutions evolves as a function of the commonality of the two graphs, it is essentially a continuous function, starting from nearly orthogonal, independent solutions and ending at completely identical ones. As a result, it has to go through all the intermediate stages of the overlap, from zero all the way to one, and therefore at some point it must hit the forbidden region — the overlap gap region, where we know pairs of nearly optimal cuts do not exist. That gives the contradiction.

And this is it; I have some open questions and work in progress, but those are perhaps better taken offline.
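The continuity step, in the notation of this sketch; the constant C(d, K, r) is something I am introducing here, standing for the expected number of nodes whose depth-r neighborhood is affected by resampling a single edge or a single seed:

$$
\left|\ \frac{\langle\sigma_1^{(t+1)},\sigma_2^{(t+1)}\rangle}{n}\ -\ \frac{\langle\sigma_1^{(t)},\sigma_2^{(t)}\rangle}{n}\ \right|
\ \le\ \frac{C(d,K,r)}{n}\ =\ o(1),
$$

so the overlap path moves in o(1) increments from roughly zero (independent, balanced cuts) to one (identical cuts) and must therefore pass through the forbidden interval (τ1, τ2), contradicting the coupled overlap gap property if the local algorithm produced γ-optimal cuts.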
[Question about K equal to two.] Yes — it turns out the chaos theorem also holds for K equal to two, but the overlap gap property does not hold for K equal to two, or at least it is conjectured not to hold: if you look at the overlaps, they turn out to have a connected support, and that means this line of argument cannot be used for max-cut on graphs, as opposed to hypergraphs. And in fact this is not the first argument to show that the overlap gap implies nonexistence of good local algorithms; we first used this in the context of largest independent sets.

[Question about XOR-SAT.] I see — right, you are bringing up XOR-SAT because it is solvable. Yes, XOR-SAT is solvable, and it would still exhibit the overlap gap property in a certain sense; that just means that you cannot use local algorithms to solve it, while the Gaussian elimination you would use to solve the problem is non-local. That is fine — it depends on what you mean by a nearly optimal solution, on what you have in mind.

Thank you. Thanks.