Thank you for the invitation. I'll be talking about a joint work with Riddhipratim Basu and Yuval Peres, and then some other works with Yuval. The general theme is going to be results that relate mixing times and the cutoff phenomenon to hitting times of sets which are worst in some sense. These results are kind of abstract by themselves, but one can use them to prove further theoretical results, or to construct counterexamples in which one controls the behavior of hitting times, which is pretty easy, and then deduces the corresponding behavior for the mixing times. In some cases one can actually analyze mixing and cutoff using hitting times, and I will mention the case of Markov chains on trees. Another class of examples is simple random walk on Ramanujan graphs, but I won't get into that today; one can prove cutoff for them just by considering hitting times.

OK. So P would be the transition matrix, and P_t would be the transition probabilities of the continuous-time chain at time t. I'll be working mostly in continuous time, but everything works in discrete time. Pi is the stationary distribution on the state space, which is going to be finite throughout, and we consider only reversible Markov chains, that is, chains satisfying the detailed balance condition.

Then for a function from the state space to R we define the L_p norm in the usual fashion, where the corresponding measure is the stationary distribution, and for a signed measure sigma we define the L_p norm of the signed measure just as the L_p norm of its density function with respect to the stationary distribution. And then we can define the epsilon L_p mixing time as the first time t at which the L_p distance from the stationary distribution of the distribution of the chain at time t is at most epsilon, when we maximize over the initial state x,
and when epsilon equals one half we just omit it from the notation. Another important notion of distance is the total variation distance, which we can write with this maximum formula, but it's really just one half of the L1 distance, and the epsilon total variation mixing time we denote by t_mix(epsilon).

Generally the L_p mixing times are non-decreasing in p, but under reversibility we have that for every p strictly larger than one, the L_p and L2 mixing times are the same up to some constant depending on p. So I'll be switching between L2 and L-infinity all the time, but they're actually just the same up to some constant. This is actually not true for L1 versus L2: the total variation mixing time might be little-o of the L2 mixing time.

OK. So one of the things I want to talk about is the cutoff phenomenon. This is a property of a sequence of Markov chains, and it's the condition that the ratio of the epsilon and the one-minus-epsilon mixing times tends to one as n goes to infinity, for any fixed epsilon. So when we have cutoff, the graph of the worst-case distance from stationarity as a function of time looks asymptotically like a step function that drops from one to zero around the mixing time.

Because cutoff is not the only topic I want to talk about, I'm not going to go through all these examples. Really the point of this very partial list is to demonstrate that cutoff is very common; it's actually more the rule than the exception. So maybe it's more appropriate to give one example for which there isn't cutoff, and that is simple random walk on the n-cycle, or more generally on a fixed-dimensional torus of side length n.

Although cutoff is very common, verifying it rigorously is usually hard and requires some very detailed information, sometimes on eigenvalues, or a real understanding of what the bottleneck for mixing is.
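For reference, the definitions described verbally above can be written out as follows. This is only a sketch in standard notation: the symbols $\Omega$, $d_p$, $t_p(\varepsilon)$ and the superscript $(n)$ indexing the sequence of chains are notational assumptions and are not taken from the speaker's slides.

```latex
\begin{align*}
\|f\|_p &= \Big(\sum_{x\in\Omega}\pi(x)\,|f(x)|^p\Big)^{1/p},
\qquad
\|\sigma\|_{p,\pi} = \Big\|\frac{\sigma}{\pi}\Big\|_p, \\
d_p(t) &= \max_{x\in\Omega}\,\big\|P_t(x,\cdot)-\pi\big\|_{p,\pi},
\qquad
t_p(\varepsilon) = \inf\{t : d_p(t)\le\varepsilon\}, \\
\|\mu-\pi\|_{\mathrm{TV}} &= \max_{A\subseteq\Omega}|\mu(A)-\pi(A)|
 = \tfrac12\,\|\mu-\pi\|_{1,\pi}, \\
\text{cutoff:}\quad
&\lim_{n\to\infty}
\frac{t^{(n)}_{\mathrm{mix}}(\varepsilon)}{t^{(n)}_{\mathrm{mix}}(1-\varepsilon)} = 1
\quad\text{for every fixed } \varepsilon\in(0,1).
\end{align*}
```

For $p=\infty$ the norm is read as a maximum, so $d_\infty(t)=\max_{x,y}|P_t(x,y)/\pi(y)-1|$.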
So Aldous and Diaconis, in their seminal paper from '86, posed the problem of finding abstract conditions which imply cutoff. And though the cutoff phenomenon received much attention, most of the progress was made through understanding examples rather than developing general theory. One step of progress was made by Peres, who proposed the product condition, which is generally a necessary condition for cutoff: it's the condition that the product of the spectral gap and the mixing time diverges. Unfortunately this is not a sufficient condition, and relevant examples were constructed by Aldous and by [inaudible]. And so with Basu and Peres we showed that you can characterize cutoff in terms of concentration of hitting times; the precise statement I'll give later on.

As a side note, it turns out that the product condition captures cutoff when you look at the L2 distance. You can define the notion of cutoff with respect to every notion of distance, and when we do it with respect to the L2 distance, it turns out that cutoff is equivalent to having cutoff in L_p for every p larger than one, and this is equivalent to the variant of the product condition in which we replace the mixing time by the L2 mixing time. This is under reversibility, and it is due to Chen and Saloff-Coste. So this is really why we look at cutoff with respect to the total variation distance: it is in a sense not interesting when we look at the L_p distance for p larger than one.

As I'm going to describe further, it's classical that the total variation mixing time can be understood, up to constant factors, using natural probabilistic concepts such as hitting times, and such as stopping times for which the distribution of the chain at the stopping time is the stationary distribution. The total variation mixing time is also closely related to coupling, which is another natural probabilistic concept. Conversely, we don't really have any natural probabilistic interpretation for the L2 mixing time, and moreover we don't have any bound on the L2 mixing time which is always sharp up to a constant factor. Similarly, we don't really have a probabilistic interpretation for hypercontractivity either. So with Peres we showed that both the L2 mixing time and the log-Sobolev constant can be characterized using hitting times, up to constant factors.

OK. So we say that a parameter is robust if changing the edge weights of the chain by a constant factor can change the value of the parameter only by a constant factor. Similarly, for random walks on graphs we can talk about robustness under rough isometries. The point is that the methods we have to control the L-infinity mixing time use robust geometric quantities like the Cheeger constant, the isoperimetric profile, the spectral profile, or the log-Sobolev constant; these are all analytic quantities which are robust. So it's very natural to ask whether or not the L-infinity mixing time itself is robust. This was asked by Kozma, but also by various other authors, such as Diaconis and Saloff-Coste.

Once you understand that the L-infinity mixing time can be understood in terms of hitting times, you're in a very good position to construct a counterexample, because you only have to show that hitting times are non-robust, which is easier. This shows that it's impossible to obtain a sharp bound on this quantity, which is non-robust, using these analytic quantities, which are robust.

As I said, it's classical that hitting times are related to L1 mixing under reversibility. Early work of Aldous says that the total variation mixing time is, up to universal constants, the same as the maximal expected hitting time of a set times the size of the set, meaning its stationary measure. And this was recently refined by Peres and Sousi, and independently by Oliveira,
who showed that actually it's enough to consider sets of size at least one half, right, so if the size is one half we don't need this factor here.

[Audience question.] Yeah. Yeah, and we maximize over initial states and over all target sets. No, no, I'm taking the maximum. Yeah. So this means we're starting at the worst starting state; it's deterministic, we take the worst. Yeah, thank you.

OK. So the following example is very useful. First of all, it shows where the threshold one half comes from, and I mean the threshold here on the size of the set. If we consider only sets that are larger than one half, then we would hit them in a constant number of steps, and this would not capture the order of the mixing time. The more important point is that here the relation between hitting and mixing is much finer than just determining the order of the two objects: here, for every time t, up to smaller order terms, the worst-case distance from stationarity at time t is just half the probability of [inaudible].

[Audience question.] You know, if it's of size a half plus epsilon, with n vertices, then it would have at least two epsilon n vertices on each clique here, so in expectation it would be of order one over epsilon. Yes, so the point is that epsilon is not allowed to tend to zero; we fix it. So that was just a side comment that in this example we see that we cannot improve this: we can replace one half by a quarter, but not by three quarters.

[Inaudible audience exchange.]

OK. So the point I'm making here, and this is a really simple calculation you can do by hand, is that it suggests that instead of these expectations maybe we should look at the tails of hitting times, and maybe get a more refined connection with the mixing times. This motivates defining a mixing parameter using tails of hitting times. So hit(epsilon) is obtained by maximizing over initial states, and for an initial state x,
the hit-epsilon parameter is the first time t at which every large set is hit with probability at least one minus epsilon, and when I say large I mean that the size is at least one half. It turns out that the epsilon total variation mixing time is the same as hit(epsilon), if we are willing to change epsilon by a little bit. So delta is smaller than epsilon and than one minus epsilon, and this is up to an additive error of order the inverse of the spectral gap.

[Audience question.] Yeah, yeah, it's different.

OK. So I'll get to that again soon, but the result I want to talk about today is actually an extension to the L-infinity case. The total variation result maybe refines earlier results, and in that sense is maybe less surprising. This one is a characterization of L-infinity mixing times using hitting times, and the claim is that for reversible Markov chains the L-infinity mixing time is, up to universal constants, just the first time by which every small set is escaped from with probability at least one minus three halves the size of the set. OK, and here it is again using math instead of words.

The point is that one direction is really easy, namely the fact that this time is a lower bound on the mixing time. If you assume that this condition fails for some initial point and some set A, then clearly the probability of being in A at time t is at least three halves pi of A. But then, by the pigeonhole principle, there has to be a point inside A such that the probability of being at this point is at least three halves its stationary probability. And from that, just using the definition of the L-infinity distance, we see that we're not mixed. So this is in a sense a very naive way of trying to bound the mixing time, but it turns out that it actually gives you also the reverse direction.
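The easy direction described above can be written as a short chain of implications. This is a sketch under assumed notation: $T_{A^c}$ denotes the first time the chain is outside $A$, and the L-infinity distance is taken to be $\max_{x,y}|P_t(x,y)/\pi(y)-1|$ with mixing threshold one half.

```latex
\begin{align*}
\mathbb{P}_x[T_{A^c}>t] > \tfrac32\,\pi(A)
&\implies \mathbb{P}_x[X_t\in A]\ \ge\ \mathbb{P}_x[T_{A^c}>t]\ >\ \tfrac32\,\pi(A) \\
&\implies \sum_{y\in A}\pi(y)\,\frac{P_t(x,y)}{\pi(y)}\ >\ \tfrac32\sum_{y\in A}\pi(y) \\
&\implies \exists\, y\in A:\ \frac{P_t(x,y)}{\pi(y)} > \tfrac32 \\
&\implies \max_{x,y}\Big|\frac{P_t(x,y)}{\pi(y)}-1\Big| > \tfrac12 .
\end{align*}
```

The first implication uses only that a chain which has not yet left $A$ by time $t$ is in $A$ at time $t$; the third is the pigeonhole (averaging) step.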
Another result we have is that you can also characterize the mixing time in the relative entropy distance, but then you need to change this three halves to a constant over log of one over pi of A. As far as we know, these are the first sharp bounds on these mixing times.

OK, so here is a very simple corollary which I'm not sure how you would prove otherwise. It's that the mixing times, with respect to total variation, relative entropy and L-infinity, of the discrete-time lazy chain are, up to constants, the same as those of the chain that at state x stays put with probability p_x, where the p_x are uniformly bounded away from zero and one. If you believe that all that matters is hitting times rather than mixing times, then of course changing the holding probabilities shouldn't change the hitting time distributions by more than a constant, right? So this sounds like a very simple fact, and maybe you'd think you don't need all this L2 mixing machinery to prove it, but it turns out that it may fail without reversibility. A counterexample is due to Boczkowski, Peres and Sousi.

[Audience question.] So I actually wrote it in a way that allows laziness, but yeah, you can assume P is not lazy. This is why I wrote it like that: even if P is lazy, then the holding probability wouldn't be p_x, it would be p_x plus [inaudible] times the original one. Yeah.

[Audience question.] Yeah. So here we are only allowed to add loops, or weighted loops, and in Kozma's conjecture we are allowed to change the edge weights, or to apply rough isometries. That allows us to replace an edge, say, by two edges, and this can really help us direct the walk to go in a direction it wouldn't go otherwise, and then we can kind of set up traps for it. But just adding loops only makes the walk spend more time at each vertex; it follows the same trajectory, so it doesn't affect hitting times by much.
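As a small illustration of the remark that holding probabilities bounded away from zero and one change hitting times only by a constant factor, here is a minimal Python sketch for the simplest case, a walk on a path. The function name `expected_hitting_time` and the parameter `hold` are hypothetical names chosen for this example, and the path is only an assumed toy model, not an example from the talk.

```python
from fractions import Fraction


def expected_hitting_time(n, hold):
    """Expected time for a walk on the path 0..n, started at 0, to hit n.

    hold[k] is the probability of staying put at state k (illustrative
    parameter); otherwise the walk moves to a uniformly chosen neighbour.
    """
    total = Fraction(0)
    h_prev = Fraction(0)  # expected time to go from k-1 to k
    for k in range(n):
        move = 1 - Fraction(hold[k])
        up = move if k == 0 else move / 2      # probability of stepping k -> k+1
        down = 0 if k == 0 else move / 2       # probability of stepping k -> k-1
        # First-step analysis gives: up * h_k = 1 + down * h_{k-1}.
        h_k = (1 + down * h_prev) / up
        total += h_k
        h_prev = h_k
    return total


def alternating_holds(n):
    """Holding probabilities alternating between 1/4 and 3/4 (toy choice)."""
    return [Fraction(1, 4) if k % 2 == 0 else Fraction(3, 4) for k in range(n)]
```

With no holding the expected hitting time of the far end is n squared; with state-dependent holding probabilities between 1/4 and 3/4 it stays within the corresponding constant factors of n squared, since each crossing time is monotone in the holding probabilities.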
OK. So again, this is a more precise version of the total variation inequality I had. The epsilon mixing time is the same as hit(epsilon plus or minus delta), plus this term, which is of the order of the inverse of the spectral gap. One comment I wanted to make is that this term is generally smaller than the two other terms for reversible Markov chains. But when you consider the case that the product condition holds, and recall that this is the condition that t_rel is little-o of the mixing time, which is a necessary condition for cutoff, then you can just forget about these terms, and then it really tells you that t_mix(epsilon) is like hit(epsilon). Under the product condition this is a much stronger inequality.

A conjecture I have is that you can actually go beyond reversibility, and characterize the order of the mixing time, up to constants, as the maximum of hit(one quarter) and this parameter here that I call t-star, which is just an extension of the notion of the relaxation time: it's the first time t such that the variance of P_t f is at most one over e squared times the variance of f, for any function f. If P is reversible, then this is precisely the relaxation time. So yeah, I think that this would be pretty interesting.

[Audience question.] The minimal time you need in order to hit every set of size at least one half with probability at least three quarters. And you can't take the previous one, which is the notion of doing it in expectation. OK.
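In symbols, the statements above have roughly the following shape. This is a hedged sketch: the subscript $\alpha$ on $\mathrm{hit}$, the constant $c_\delta$ (depending only on $\delta$) and the exact form of the additive term are assumptions, since the talk only says the error is of the order of the inverse spectral gap.

```latex
\begin{align*}
\mathrm{hit}_{\alpha}(\varepsilon)
 &= \min\big\{t : \mathbb{P}_x[T_A>t]\le\varepsilon
   \ \text{for all } x \text{ and all } A \text{ with } \pi(A)\ge\alpha\big\}, \\
\mathrm{hit}_{1/2}(\varepsilon+\delta) - c_\delta\, t_{\mathrm{rel}}
 &\le\ t_{\mathrm{mix}}(\varepsilon)\ \le\
 \mathrm{hit}_{1/2}(\varepsilon-\delta) + c_\delta\, t_{\mathrm{rel}}, \\
t_* &= \min\big\{t : \mathrm{Var}_\pi(P_t f)\le e^{-2}\,\mathrm{Var}_\pi(f)
   \ \text{for all } f\big\}.
\end{align*}
```

In the reversible continuous-time case $\mathrm{Var}_\pi(P_tf)\le e^{-2t/t_{\mathrm{rel}}}\mathrm{Var}_\pi(f)$, with equality for an eigenfunction of the second eigenvalue, which is why $t_*$ coincides with the relaxation time there.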
We can define the notion of hit-cutoff by just imitating the definition of total variation cutoff, requiring instead that this ratio tends to one in the limit. The result we have is that total variation cutoff under reversibility is the same as hit-cutoff, and this is almost immediate from the two-sided inequality once you realize that hit-cutoff implies the product condition. We showed that; for total variation cutoff this is known. So you can assume the product condition holds, because otherwise neither cutoff occurs. And then under the product condition you can forget about these terms, and it becomes definition chasing.

So this was kind of a distance from stationarity where the only test we're looking at is: is there a set that, with large probability, we still didn't hit?

OK. So there is a more neat version of this, in a sense, when you fix your starting distribution and you only want to understand whether or not you have cutoff for this distribution, or abrupt convergence. For each initial distribution you can look at the set of size at least one half for which the expected hitting time is maximized, and then the condition becomes whether or not the hitting time of this set is concentrated. But this doesn't work when you don't fix your initial distribution, so the general statement is a bit softer. It's really about saying that the hitting times of the worst sets are concentrated if and only if there is cutoff.

OK. So the next result I want to mention is about characterizing the log-Sobolev constant. First I need to define the spectral gap: it's just the smallest nonzero eigenvalue of minus the generator, given by this variational formula involving the Dirichlet form, taking the infimum over all functions with variance one. And by the Poincaré inequality, when we scale time according to the spectral gap, L2 distances from stationarity decay exponentially, and also variances decay
exponentially when we apply P_t to functions. The log-Sobolev constant is defined using a similar extremal characterization, but now we normalize so that the entropy of f squared is one; this is the formula for the entropy. It is important for two reasons. The first one is that it characterizes hypercontractivity: it's the largest beta such that P_t is a contraction from the two-norm to the one-plus-e-to-the-beta-t norm for every t. The other reason is that it captures the L-infinity mixing time up to this log log factor, where I guess the lower bound here uses reversibility.

One relaxation you can make is, instead of having hypercontractivity for all functions, to ask for it only for indicators. And the following is an even further relaxation: it's a hitting-time version of hypercontractivity. All I want is that the probability of not escaping a set A by this time, starting from stationarity, is at most some constant times the size of A raised to some exponent strictly larger than one. The reason I call this hypercontractivity is that I'm raising the exponent to be larger than one. A pretty easy calculation, which I don't want to do, tells you that this is actually weaker than the usual hypercontractivity, even for indicators.

This idea of hitting-time hypercontractivity becomes clearer when looking at the spectral gap of sets. Lambda of A is just the smallest eigenvalue of the restriction of minus the generator to A, so it's the generator of the chain killed upon escaping A. And the point is that these capture the spectral gap when we take a minimum over small sets.
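The quantities named in this part of the talk can be written out as follows. This is a sketch under assumed normalizations (rate-one continuous-time chain with generator $P-I$); the constants $C$ and $\theta$ in the last line are placeholders, not values from the talk.

```latex
\begin{align*}
\mathcal{E}(f,f) &= \langle (I-P)f,\,f\rangle_\pi,
\qquad
\lambda = \inf\{\mathcal{E}(f,f) : \mathrm{Var}_\pi(f)=1\}, \\
\mathrm{Var}_\pi(P_tf) &\le e^{-2\lambda t}\,\mathrm{Var}_\pi(f)
\qquad\text{(Poincar\'e inequality)}, \\
\mathrm{Ent}_\pi(f^2) &= \mathbb{E}_\pi\Big[f^2\log\frac{f^2}{\mathbb{E}_\pi[f^2]}\Big],
\qquad
c_{\mathrm{LS}} = \inf\{\mathcal{E}(f,f) : \mathrm{Ent}_\pi(f^2)=1\}, \\
\mathbb{P}_\pi[T_{A^c}>t] &\le C\,\pi(A)^{1+\theta},\quad \theta>0
\qquad\text{(hitting-time hypercontractivity at time } t\text{)}.
\end{align*}
```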
So if we denote pi conditioned on A by pi_A, then by reversibility the law of the escape time from A, starting from pi conditioned on A, is a completely monotone law, which is a fancy way of saying a mixture of exponentials, and the smallest parameter in the mixture is precisely the spectral gap of A. This is a general fact, and it gives us a tail estimate by just plugging in the tail of the exponential distribution.

What we had here for pi conditioned on A immediately translates into an estimate when starting from pi, right, because in order not to escape A we first of all have to start in the set A at time zero, which happens with probability pi of A, and then we can use the previous estimate.

OK. So now we consider a weighted version of the spectral gap. We weight things according to the size of A, so that we punish smaller sets more: we divide by log of one over the size of A, and then take the infimum over all small sets. This is exactly what we need for the hitting-time version of hypercontractivity. Now it's very easy: look at the probability of not escaping A by time M over kappa, starting from stationarity. We plug in this estimate, kappa cancels with the lambda of A here, and we're left with this log, and we get some exponent larger than one, which is what we wanted. This is just a part of the proof, but I wanted to explain it for intuition's sake. Together with Peres we showed that this parameter is the same as the log-Sobolev constant up to a constant factor.

Then we had this general lower bound under reversibility, whose proof is very analytic, so you'd want to have some probabilistic explanation of this inequality. The previous proof used Stein's interpolation theorem for a family of analytic operators.
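The cancellation described above can be spelled out in a few lines. This is a sketch: the symbol $\kappa$ for the weighted spectral gap and the threshold $\pi(A)\le 1/2$ for "small" sets are assumptions about the notation on the slides.

```latex
\begin{align*}
\mathbb{P}_{\pi_A}[T_{A^c}>t] &\le e^{-\lambda(A)\,t}, \\
\mathbb{P}_{\pi}[T_{A^c}>t] &= \pi(A)\,\mathbb{P}_{\pi_A}[T_{A^c}>t]
 \ \le\ \pi(A)\,e^{-\lambda(A)\,t}, \\
\kappa &= \min_{A:\ \pi(A)\le 1/2}\ \frac{\lambda(A)}{\log\big(1/\pi(A)\big)}, \\
\mathbb{P}_{\pi}\big[T_{A^c}>M/\kappa\big]
 &\le \pi(A)\exp\Big(-\frac{\lambda(A)}{\kappa}\,M\Big)
 \ \le\ \pi(A)\,e^{-M\log(1/\pi(A))}
 \ =\ \pi(A)^{1+M}.
\end{align*}
```

The last line uses $\lambda(A)/\kappa\ge\log(1/\pi(A))$, which holds for every small set by the definition of $\kappa$.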
So now we can understand it using hitting times. For every set A there is a distribution on A, called the quasi-stationary distribution, such that starting from this distribution the escape time is exponential, with parameter the spectral gap of A. Then what we had before as an inequality becomes an equality. And we already saw that in order to be mixed in L-infinity we need the tail of not escaping a set to be proportional to its size. So we already know that the time we need in order to get that is a lower bound, and because of that we see this inequality just from hitting-time considerations. I'm referring to this: we saw that this was trivial, a one-line calculation to show that this is a bound on the L-infinity mixing time, and using that we see that one over this parameter is a lower bound on the L-infinity mixing time.

OK. So remember we had this product condition, which is necessary but not sufficient for cutoff. Generally, a problem is to find large families of Markov chains for which we do have the implication that the product condition implies cutoff. I want to talk about Markov chains on trees; these are just Markov chains such that the graph supporting the transitions is a tree. With Basu and Peres we showed that for such a Markov chain, as long as epsilon doesn't tend to zero too rapidly, the difference between the epsilon and the one-minus-epsilon mixing times is at most some constant times the geometric mean of t_mix and t_rel times the absolute value of log epsilon.

In particular, it tells us that inside the cutoff window, if we scale things by the geometric mean of t_mix and t_rel, we have sub-Gaussian convergence to stationarity. Generally you can't do better, even for birth and death chains. Another comment I have is that the condition that the chain is on a tree can be relaxed: we can allow it to take jumps of bounded distance on a tree.
OK. And then another result we have for trees is that the L-infinity mixing time is, up to universal constants, just the maximum of the L1, that is, the total variation mixing time, and the inverse of the log-Sobolev constant. Recall that this is always a lower bound on the L-infinity mixing time, but here we have the converse inequality. Earlier it was shown that for trees the total variation mixing time is robust under bounded perturbations of the edge weights, and because this is always true for the log-Sobolev constant, we see that the L-infinity mixing time for trees is robust. I'm hoping to have time to explain why trees are very good candidates for applying these relations between hitting times and mixing times: for trees we can actually say what the worst sets are.

OK. So the main tool we used in the proof of the theoretical results is Starr's maximal inequality. To state it: for any function f from the state space to R we can define the maximal function f-star by taking the supremum over all times t of the absolute value of P_t f(x), which we can also write in terms of expectations. The result is that under reversibility, the L_p norm of the maximal function is at most the L_p norm of the original function times the conjugate exponent of p. There is also a version that bounds the one-norm of the maximal function using an L log L norm of f. These are analogous to the inequalities you have in Doob's maximal inequality. I won't give the proof of this result.

[Audience question.] Yeah, you need reversibility. You can define it even in this kind of Hilbert space setup, and then the operator has to be self-adjoint. If you consider a walk on the cycle with a fixed bias, you see that it can't be true when you take f to be an indicator of a point; it's just not true.
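In symbols, the $L_p$ form of the maximal inequality stated above reads as follows (a sketch for the continuous-time reversible semigroup, with $p^*$ denoting the conjugate exponent):

```latex
\[
f^*(x) = \sup_{t\ge 0}\,\big|P_tf(x)\big|,
\qquad
\|f^*\|_p \ \le\ p^*\,\|f\|_p,
\qquad
p^* = \frac{p}{p-1},\quad 1<p<\infty .
\]
```

The constant $p^*$ is the same one that appears in Doob's $L_p$ maximal inequality for martingales, which is the analogy mentioned in the talk.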
OK. So recall that we had a parameter, rho, which was the first time by which every small set is escaped from with probability at least one minus three halves the size of the set. We showed that this is a lower bound; actually we showed this even for L-infinity instead of L2. Now I want to prove the other direction.

So fix t to be nine times this parameter. Then for every small set B, by definition this holds, and using the Markov property (this should be nine instead of six in the exponent) we can try to escape from B nine separate times, conditioning on where we were, so we get this. The point is that because pi of B is smaller than one half, I want to get rid of the three halves, and you can verify that this is true. So even though in the definition of rho the size of the set appeared to the power one, we can assume that we have this with an arbitrary power, and three will be good enough for us.

OK. Generally, we can write the second moment of a non-negative random variable by integrating twice s times the probability that the random variable is more than s.

We want to control the L2 distance. First of all, we can write it, up to this constant term, as just the two-norm squared of the distribution. By definition this is just the second moment of the density function. And then using this identity we can write it as the integral from zero to infinity of twice s times pi of the set of points, call it A_s, at which the density function is at least s.

The point is that I don't have to show that this integral is very small. It's enough to show that it's bounded by some constant, because by the Poincaré inequality, shortly after it's at most some constant it becomes very small. In particular, it would be enough to show that the integrand is at most something we can integrate, say one over s to the power three halves.

OK. So let's consider the set B_s, which is all the points y such that the supremum, over all times k, of the probability of being in A_s at time k
is larger than s over two times the size of A_s. Right, so this is a set that is defined in terms of a level set of a maximal function.

OK. So A_s is the thing whose size, pi of A_s, we want to control. By the definition of A_s, if we sum this over all y in A_s, and this is in a sense Markov's inequality, we see that s times pi of A_s is at most the probability of being in A_s at time t. Then we can write this probability as a sum of two terms: the first one is the contribution coming from the case that we didn't escape B_s, and the other one is the contribution coming from the case that we did escape B_s.

For the first term, that's precisely the way we defined t, so we can say that it is at most pi of B_s to the power three, well, provided that B_s is small, but let's leave that for now. For the other term, we defined B_s such that if we escape from it, then we know that the probability of ever being in A_s again at a later time is at most this quantity here, from the definition. That's precisely the reason we defined B_s like that. So then we can subtract it, and all we have to show is that pi of B_s is small. We might hope to be able to show this, because B_s is basically the set of y such that some maximal function is large at those points.

OK. But unfortunately this doesn't work, so you have to do things slightly differently. Here it was very convenient to define B_s like that, because it immediately gave us control over this term, but if we define B_s slightly differently then we can actually control its size. So we define it to be the set of points y at which the supremum over k of P_k(y, A_s) is larger than a threshold involving the square root of log of one over the size of A_s [inaudible], and we call this threshold delta_s. Exactly as before we get this inequality, and we want to show that the right-hand side is small. OK.
But then for this maximal function, its one-norm is at most, roughly, the first moment of this times the log of this, so the maximal inequality tells you that it is at most e times pi of A_s times log of one over pi of A_s. Then Markov's inequality tells you that pi of B_s is at most one over the square root of [inaudible], because B_s is such that the value of the maximal function at y is larger than this square-root factor times the one-norm of the maximal function. So this is good for us, right: if we substitute it there, then it's what we want, provided that delta_s is small. If delta_s is small, then we're good.

When you work out the details, you see that if we don't have this, then we're already in a very good situation. If delta_s is not much smaller than s times pi of A_s, it actually means that the quantity we want to bound is stretched-exponentially small to begin with; it's some algebra. So either you are in very good shape to begin with, or you can forget about the delta_s. There is more algebra, but the idea is like the cheating proof I did before.

[Audience question.] So it's the first time t such that every small set is escaped from with probability at least one minus three halves the size of A. So the probability of not escaping it is that, and then instead of taking rho I took nine times rho, and I said that I'm getting this estimate raised to some exponent, so I can forget about this three halves and I can write three instead of one here. Yes.

[Audience question.] No, that was for entropy. It's the same expression we had before, but before we had three halves here, and here it turns out that this is the relation you need between the probability of not escaping and the size of A, or, you know, the right punishment to take for the size. And then it's the same result.
It turns out that not only do the same results translate to the discrete-time case, you can also translate them to the chain that makes only one lazy step, which is called the averaged chain. Because a reversible chain, if it is periodic, has period two, it makes sense that making one lazy step is enough to overcome periodicity issues; this was a conjecture by Aldous and Fill. Because hitting times are the same for this chain and for the continuous-time chain, in a sense it's very reasonable to expect this, and this actually gives us a way to prove it for the total variation distance. We show that, up to changing the time by a term of order square root of t, this is true for any initial state. And the error term we have on both sides tends to zero as n goes to infinity.

Then, for this averaged chain, the L-infinity and the relative entropy mixing times are smaller than those of the continuous-time chain. There isn't a converse inequality here, because, for example, for simple random walk on the n-clique they are of strictly smaller order: after making two steps the averaged chain is already completely mixed, but the continuous-time chain, because with some small probability it doesn't make any step, has to spend more time until it mixes.

OK, so I don't have any time to discuss counterexamples, though some of them are very nice. So let me briefly explain why trees are very good candidates.
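For concreteness, the averaged chain mentioned above is usually written as follows. This is a sketch; the symbol $A_t$ and the name $t_{\mathrm{ave}}$ are notational assumptions rather than the speaker's.

```latex
\[
A_t = \tfrac12\big(P^{t} + P^{t+1}\big),
\qquad
t_{\mathrm{ave}}(\varepsilon)
 = \min\Big\{t\in\mathbb{N} : \max_{x}\,\big\|A_t(x,\cdot)-\pi\big\|_{\mathrm{TV}}\le\varepsilon\Big\}.
\]
```

Averaging two consecutive times corresponds to running the discrete-time chain and making a single fair coin flip to decide whether to take one extra step, which removes the parity obstruction of a period-two chain.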
It's because we can use the tree structure to roughly identify the worst sets. First of all, it's easier to think about birth and death chains. If I ask you what is the worst set of size one half to hit, then it's like hitting the center of mass: it's like hitting the point such that disconnecting it leaves us with two intervals of size less than one half. It's precisely this point. So we reduced hitting times of worst sets to the hitting time of one point, which is the center of mass.

It turns out that we can generalize this to trees. For any tree there is a point such that if we delete it from the graph, then every connected component has stationary mass at most one half. There can be at most two such points; we fix one, call it the central vertex, and denote it by o.

I don't have time to explain it, but there is a fairly simple reduction that tells you that concentration of hitting times of sets of size one half is like showing that the hitting time of the center of mass is concentrated. One direction is easy; for the other one you have to do something.

So instead of finding the worst initial state and proving concentration starting from the worst initial state, we can actually prove concentration for every initial state such that the expected hitting time of the center of mass is large. So we don't need to worry about what exactly the worst initial state is. And then we can use the tree structure again to decompose this hitting time. The hitting time from x to the center of mass can be written as a sum of crossing times of edges: the time it takes us to get from here to here, plus the time it takes us to get from here to here, et cetera. And the point is that these crossing times of edges are independent random variables. OK.
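The central vertex described above is easy to compute. The following Python sketch finds every vertex of a weighted tree whose removal leaves only components of stationary mass at most one half; the function name `central_vertices` and the inputs `adj` (adjacency lists) and `pi` (vertex masses, not necessarily normalized) are hypothetical names chosen for this illustration.

```python
def central_vertices(adj, pi):
    """Return the vertices of a tree whose removal leaves components of mass <= 1/2.

    adj: dict mapping each vertex to a list of its neighbours (assumed to be a tree).
    pi:  dict mapping each vertex to its (possibly unnormalized) stationary mass.
    """
    root = next(iter(adj))
    parent = {root: None}
    order = [root]
    for v in order:  # breadth-first traversal; the list grows while we iterate
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                order.append(w)

    # subtree[v] = total mass of the subtree rooted at v.
    subtree = dict(pi)
    for v in reversed(order):
        if parent[v] is not None:
            subtree[parent[v]] += subtree[v]
    total = subtree[root]

    centres = []
    for v in adj:
        # Components after deleting v: one per child, plus the part above v.
        pieces = [subtree[w] for w in adj[v] if parent[w] == v]
        if parent[v] is not None:
            pieces.append(total - subtree[v])
        if all(2 * p <= total for p in pieces):
            centres.append(v)
    return centres
```

For simple random walk the mass of a vertex is proportional to its degree, so one can pass `{v: len(adj[v]) for v in adj}` as `pi`. On a path this recovers the midpoint, and a path with uniform weights and an even number of vertices shows that two central vertices can occur.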
So to get concentration of the hitting time, it's enough to show that the variance is of smaller order than the square of the expectation, when we take x such that the expectation is much larger than t_rel.

As I said before, we can decompose the hitting time into a sum of these crossing times of edges. If v_i is the i-th vertex on this path, then the difference of the hitting times is the time it took us to cross this edge after we got here. As I said, they're independent, and thus for the variance it's enough to get control over the individual variances.

Then, by general considerations, the law of each of these crossing times is completely monotone, it's a mixture of exponentials, and each parameter is at least one over t_rel. This is enough to get the control we want over the variances, because the variance is at most the second moment of each of these crossing times. For a mixture of exponentials the second moment looks like that, and then we can pull out a factor: each of these guys is at most t_rel over two, sorry, twice t_rel, so we pull that out and we're left with the first moment. When we substitute this into the sum of the variances, we get a sum of twice t_rel times the first moments, and pulling this outside we get just the estimate we want.

Yeah, so similar considerations apply for L2, but I think I'll stop here. Thank you.

[Audience question.] You mean the constant? Yeah, well, actually we don't really have that even for the relaxation time, right. The relaxation time you can approximate using spectral gaps of P restricted to sets, and this you can understand in terms of hitting times when you start from the quasi-stationary distribution. But it's actually a conjecture that instead of that you can start from pi conditioned on A; generally you only have a one-sided inequality. So even for the relaxation time there isn't a hitting-time characterization involving pi. Yeah. OK.
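The variance bound for the hitting time of the central vertex sketched in the talk can be written out as follows. This is a hedged reconstruction: the symbols $\tau_i$, $a_{i,j}$, $\lambda_{i,j}$ are introduced here for illustration, and the lower bound $\lambda_{i,j}\ge 1/t_{\mathrm{rel}}$ on the rates is taken as an assumption.

```latex
\begin{align*}
T_o &= \sum_i \tau_i
 \qquad(\tau_i \text{ independent edge-crossing times under } \mathbb{P}_x), \\
\mathbb{P}[\tau_i>t] &= \sum_j a_{i,j}\,e^{-\lambda_{i,j}t},
 \qquad a_{i,j}\ge 0,\ \ \sum_j a_{i,j}=1,\ \ \lambda_{i,j}\ge \frac{1}{t_{\mathrm{rel}}}, \\
\mathrm{Var}(\tau_i)\ \le\ \mathbb{E}[\tau_i^2]
 &= \sum_j a_{i,j}\,\frac{2}{\lambda_{i,j}^2}
 \ \le\ 2\,t_{\mathrm{rel}}\sum_j \frac{a_{i,j}}{\lambda_{i,j}}
 \ =\ 2\,t_{\mathrm{rel}}\,\mathbb{E}[\tau_i], \\
\mathrm{Var}_x(T_o) &= \sum_i \mathrm{Var}(\tau_i)
 \ \le\ 2\,t_{\mathrm{rel}}\,\mathbb{E}_x[T_o], \\
\mathbb{P}_x\big[\,|T_o-\mathbb{E}_xT_o|\ge s\,\big]
 &\le \frac{2\,t_{\mathrm{rel}}\,\mathbb{E}_x[T_o]}{s^2}
 \qquad\text{(Chebyshev)}.
\end{align*}
```

Taking $s$ to be a small multiple of $\mathbb{E}_x[T_o]$ shows that the hitting time is concentrated around its mean as soon as $\mathbb{E}_x[T_o]$ is much larger than $t_{\mathrm{rel}}$.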