This is joint work with Tsz Chiu Kwok, Yin Tat Lee, and Akshay Ramachandran. Thanks for the invitation. I want to talk to you about the Paulsen problem. I will introduce the problem and state some motivations, and then the proof has two parts. The first part is about continuous operator scaling, a continuous version of the alternating scaling algorithm; I will explain what the algorithm is and give a more detailed analysis of that part. The second part is the smoothed analysis, where I will just give you some high-level ideas of the proof. Then I will conclude with some discussions.

So let me first introduce what a frame is. A frame is just a collection of vectors u_1, ..., u_n in d-dimensional space that span R^d. It is called an equal norm frame if all the vectors have the same length, and it is called a Parseval frame if the sum of the outer products is equal to the identity, sum_i u_i u_i^T = I. You should think of this condition as a generalization of an orthonormal basis: when n equals d, sum_i u_i u_i^T = I means it is exactly an orthonormal basis. In this talk you should think of n as much bigger than d, so it is an overcomplete basis. People use frames in communication theory: you still have the property that if you have a vector, you can project it onto those basis vectors and recover the vector from the coefficients. So people send those coefficients to the other party, and because the basis is overcomplete you can lose some coefficients and still be able to recover the original signal. That was one motivation for studying frames; they have applications in communication theory and quantum information theory, although I don't know much about those.

So what is the motivation for the Paulsen problem? Suppose we want to construct a frame which satisfies both the equal norm condition and the Parseval condition. It is not so easy to construct them; there are only a few algebraic constructions of such frames, and for a given n and d there are not many known constructions. I think it is harder for some ranges of n. Paulsen asked this problem; at that time he even wanted to construct so-called Grassmannian frames, which are not only equal norm Parseval frames but also satisfy an extra constraint: for every pair of vectors the inner product should be small, that is, an equal norm Parseval frame which minimizes the maximum inner product. Those frames are even more difficult to construct. On the other hand, it is easy to construct a frame which approximately satisfies both conditions, approximately equal norm and approximately Parseval. For example, we can generate random unit vectors; by matrix concentration you can show that the frame is almost Parseval, and you also get good angles between the vectors if you generate a random frame. So he asked the question: if we have an approximately equal norm, approximately Parseval frame, can we move the vectors just a little bit so that the frame becomes exactly equal norm and Parseval, hopefully without changing the angle of any pair of vectors by much? Then he would be able to construct good frames, and those angles correspond to the efficiency of the signal processing applications they have in mind. Okay, so that was the motivation of the Paulsen problem.

Formally, in the Paulsen problem we are given a frame which approximately satisfies the equal norm constraint, each vector has squared norm between (1 - epsilon) d/n and (1 + epsilon) d/n, and approximately satisfies the Parseval constraint, the sum of u_i u_i^T is between (1 - epsilon) I and (1 + epsilon) I.
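To make the claim that random frames are almost Parseval concrete, here is a minimal numpy sketch (my own illustration, not from the paper): it samples random vectors rescaled to squared norm d/n, so the equal norm condition holds exactly, and measures how far the frame operator is from the identity.

```python
import numpy as np

def random_frame(d, n, seed=0):
    """Sample n random vectors in R^d, each rescaled to squared norm d/n."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n, d))
    U *= np.sqrt(d / n) / np.linalg.norm(U, axis=1, keepdims=True)
    return U

def parseval_error(U):
    """Smallest eps such that (1-eps) I <= sum_i u_i u_i^T <= (1+eps) I."""
    T = U.T @ U                       # frame operator, d x d
    eigs = np.linalg.eigvalsh(T)
    return np.max(np.abs(eigs - 1.0))

U = random_frame(d=20, n=400)
print("equal norm holds exactly; Parseval error eps =", parseval_error(U))
```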
He asked: what is the best function of n, d, and epsilon such that you can always move such a U to a frame V which satisfies the two constraints exactly, with a bound on the movement? The movement is defined as the sum over i of the squared norm of u_i minus v_i. So that is his formulation: given such a U, find a V satisfying the two constraints exactly, and bound the distance by the best function you can prove.

Okay, so what are some previous results? There were two previous works before our result. The first one has an interesting assumption and also an interesting conclusion. You need to assume that d and n are relatively prime, but then they prove a bound which is polynomial in d and n, and the interesting conclusion is that it has a good dependence on epsilon, on the order of epsilon squared. The idea was to define a dynamical system which always keeps the equal norm constraint while improving the Parseval constraint, getting closer and closer to Parseval but always staying equal norm. Then there is another result which doesn't need the assumption that d and n are relatively prime; it also has a polynomial dependence on d and n, but the dependence on epsilon is much worse than in the first result, something like the whole bound raised to a small power such as one over seven. They use a gradient descent which, oh, I think I mixed up the two: one keeps the Parseval constraint and improves the equal norm constraint, and the other keeps the equal norm constraint and improves the Parseval constraint. There are also some simple examples showing that the movement you need is at least on the order of d times epsilon. So the open question was: can the function be independent of n? Since d times epsilon is the worst known example, you may even hope that d times epsilon is the right bound, but the open question was just whether the bound can be independent of n.

We answer this question positively: we show that the total movement is always at most d to the 6.5 times epsilon. Our proof has two parts. In the first part, I will talk about the operator scaling algorithm, the alternating algorithm, and we define a continuous version of it; with it we can show an upper bound of roughly d squared times n times epsilon, so a frame that approximately satisfies the two conditions is close to an equal norm Parseval frame, but with a bound that still depends on n. In the second part we do a smoothed analysis: we take the input, randomly perturb it a little bit before applying the continuous operator scaling algorithm, and then use the dynamical system to move the vectors; we show that the convergence is much better after the perturbation. Note that we do not analyze the discrete algorithm; I think the analysis is specific to our continuous algorithm, it is not a generic analysis.

I should mention a recent result: Linus Hamilton and Ankur Moitra improved our bound. They prove d squared times epsilon, and their proof is much simpler and much shorter than ours. So you can relax, you don't need to take this talk too seriously. I will still show our proof anyway, and at the end I will discuss their proof and compare the two approaches; there are still some advantages of our proof that I want to highlight.

Okay, so let's see the continuous operator scaling algorithm.
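As a compact restatement of the question (my own notation, following the conditions above), the Paulsen problem asks for the best function f(d, n, epsilon) in:

```latex
\[
\text{Given } u_1,\dots,u_n \in \mathbb{R}^d \ \text{ with }\
(1-\varepsilon)\tfrac{d}{n} \le \|u_i\|^2 \le (1+\varepsilon)\tfrac{d}{n}
\ \text{ and }\
(1-\varepsilon) I \preceq \sum_{i=1}^n u_i u_i^\top \preceq (1+\varepsilon) I,
\]
\[
\text{bound}\quad
\min_{V}\ \sum_{i=1}^n \|u_i - v_i\|^2 \ \le\ f(d,n,\varepsilon),
\quad\text{over } v_1,\dots,v_n \ \text{with}\ \|v_i\|^2 = \tfrac{d}{n},\ \
\sum_{i=1}^n v_i v_i^\top = I.
\]
```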
So given this question, how do you move an approximately equal norm Parseval frame to satisfy the two conditions exactly? This problem is only difficult because you have to satisfy both conditions simultaneously. If I just want to satisfy one of them, it is very easy: for the equal norm condition I can simply rescale the vectors, and for the Parseval condition I just apply the linear transformation given by the inverse square root of the frame operator to the vectors, and then you can check that the sum of the outer products becomes the identity; this is a standard linear transformation. So a natural algorithm is just to alternate between these two steps: given some input, first rescale so that the vectors satisfy the equal norm condition, then apply the linear transformation to satisfy the Parseval condition, after which the equal norm constraint is violated again, so you fix it, and you keep fixing the two constraints alternately and want to analyze when this converges.

Our first observation is that this is actually a special case of the alternating scaling algorithm for operator scaling, a more general problem, and that algorithm has been analyzed, so we can hope to apply those results to understand the Paulsen problem. How do we apply them? The first idea was: I keep track of my input as I apply this alternating algorithm, and it moves. Suppose the vectors can be rescaled to be equal norm and Parseval; I want to measure the distance between the input and the output, but instead I can bound the total movement. I keep track of how much the vectors move during the algorithm, sum it up, and use it as an upper bound on the distance between the input and the output. That was the idea. But before I explain further, let me introduce the operator scaling problem, which is a generalization of the frame scaling problem.

The operator scaling problem is the following. An operator is just a collection of matrices U_1 up to U_k, possibly many matrices, each of them m by n. This problem was introduced by Gurvits. Given those U_i as input, I want to find a left scaling matrix, m by m, and a right scaling matrix, n by n, such that if I multiply each matrix U_i by the same scaling matrices on the left and on the right, the resulting matrices V_i satisfy this doubly stochastic condition: the sum of V_i V_i^T is a scaled version of the identity and the sum of V_i^T V_i is a scaled version of the identity, for some constants. This is the operator scaling problem. Sometimes such scaling matrices don't exist; if they do exist, you want an algorithm to find them. This may look abstract, so I will explain how to reduce frame scaling to operator scaling, and I will say an operator satisfying these two conditions is doubly balanced.

It is also good to motivate this problem by showing you some applications. A simple special case of operator scaling is matrix scaling. Matrix scaling is useful in preconditioning for linear solvers and in solving the optimal transportation problem. You can also use matrix scaling to find a perfect matching in a bipartite graph, although it doesn't give an algorithm faster than the best known. It is also useful in designing deterministic approximation algorithms for the permanent; it is an exponential approximation, but it is deterministic. Frame scaling is another special case, the case that we are interested in: finding an equal norm Parseval frame is a frame scaling problem.
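To make the alternating algorithm above concrete, here is a minimal sketch (my own illustration with a fixed iteration count, not the analyzed version from the papers): it alternately rescales each vector to squared norm d/n and applies the inverse square root of the frame operator.

```python
import numpy as np

def alternating_frame_scaling(U, iters=100):
    """U: n x d array of row vectors that span R^d.
    Alternately enforce the equal norm and Parseval conditions."""
    n, d = U.shape
    V = U.copy()
    for _ in range(iters):
        # Equal norm step: rescale every vector to squared norm d/n.
        V *= np.sqrt(d / n) / np.linalg.norm(V, axis=1, keepdims=True)
        # Parseval step: apply T^{-1/2}, where T = sum_i v_i v_i^T.
        T = V.T @ V
        w, Q = np.linalg.eigh(T)
        V = V @ (Q * w**-0.5) @ Q.T
    return V
```

After the Parseval step the frame operator is exactly the identity, and the next equal norm step perturbs it again; the whole question is how fast this process converges.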
Frame scaling was used earlier in proving a communication complexity lower bound, giving a lower bound on the sign rank of the Hadamard matrix, and it was used in machine learning for robust subspace recovery; now we use it for the Paulsen problem. There is a common generalization of matrix scaling and frame scaling; it doesn't have a standard name, but I just call it PSD scaling, and it is useful in approximating the mixed discriminant of matrices. The most general form is operator scaling. Operator scaling was studied again in 2016; their motivation was computing the non-commutative rank of a symbolic matrix. Before this operator scaling algorithm we only had an exponential time algorithm for that problem, but now we have a polynomial time algorithm for computing the non-commutative rank. It is also used for computing Brascamp-Lieb constants, and recently it was used in an orbit intersection problem in invariant theory; actually, in that paper they use our result on the more general version of the Paulsen problem, since we prove a version of the Paulsen problem in the operator setting. This is just a brief introduction: the scaling problem is a simple problem, but it turns out to have many applications.

If we want to design an algorithm for operator scaling, we can generalize the simple alternating algorithm by repeating two steps. If we want to satisfy the first condition, that the sum of U_i U_i^T is the identity, we rescale on the left as in the frame case, and you can easily check that after this rescaling the first condition holds. If we want to satisfy the second condition, we rescale on the right, and you can check that the second constraint holds. A natural algorithm for operator scaling is to alternate between these two steps and hope that it converges. In the recent papers it is shown, and this is not obvious, that the operator scaling problem has a formulation which is geodesically convex, not convex in the standard sense, so this alternating algorithm is quite natural.

Let me show you why the frame scaling problem that we are interested in is a special case of operator scaling. It is a very simple reduction. I am given a bunch of vectors u_i in d dimensions; for each vector u_i I create a d by n matrix U_i whose i-th column is u_i and whose other columns are zero. Then in the first condition, the sum of U_i U_i^T, the zero columns don't matter, and it just becomes the Parseval condition on the sum of u_i u_i^T. In the second condition, the sum of U_i^T U_i, all the entries are zero except the squared norm of u_i in the i-th diagonal entry, so summing these matrices puts the squared norms on the diagonal, and requiring this to be a multiple of the identity is just requiring the frame to be equal norm. So the frame scaling problem reduces to the operator scaling problem.

In the first part of the talk we actually solve the problem in this more general setting: we move an input operator which is close to satisfying the two conditions to one exactly satisfying the two conditions. We consider this operator Paulsen problem, which was also used in recent work. We are given the input U_1 up to U_k, and they almost satisfy the two conditions, the two sums are almost the identity and almost a scaled identity, depending on the size, and we want to move the input to V_1 up to V_k satisfying the two conditions exactly, and we want to bound the total movement.
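To spell out the reduction in symbols (my own notation for the column embedding just described, with e_i the i-th standard basis vector):

```latex
\[
U_i \;=\; u_i e_i^\top \in \mathbb{R}^{d \times n}
\quad\Longrightarrow\quad
\sum_{i=1}^n U_i U_i^\top \;=\; \sum_{i=1}^n u_i u_i^\top,
\qquad
\sum_{i=1}^n U_i^\top U_i \;=\; \mathrm{diag}\bigl(\|u_1\|^2,\dots,\|u_n\|^2\bigr),
\]
```

so the Parseval condition corresponds to the first sum being the identity, and the equal norm condition corresponds to the second sum being a multiple of the identity.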
The total movement is defined as the sum over i of the squared Frobenius norm of U_i minus V_i, instead of the squared 2-norm; it is the natural generalization. The question is again the best function of m, n, k, and epsilon that upper bounds it, and in the first part of the talk we prove that it is always at most m squared times n times epsilon for this more general problem.

Okay, so let's come back to the idea of bounding the distance between the input and the output by the total movement of the algorithm. This idea doesn't work directly, because there are example inputs on which the alternating algorithm does not converge: when you rescale for the equal norm condition you get one configuration, and when you rescale back for the Parseval condition you return to the original one, so you just oscillate and never converge to a solution. Those examples are rare, but even if you ignore them, you can easily imagine an input that is scalable but for which, if I measure the total movement, the path zig-zags a lot, giving a very bad upper bound on the final distance. So this idea doesn't work, because the path can zig-zag a lot and we would not get any meaningful upper bound.

To fix this problem, our idea was to define a continuous version: we don't take a full step for one constraint and then the other, instead we do both steps simultaneously, and we also do both continuously instead of taking large steps. Let me first state the algorithm and then explain how we came up with it. The algorithm is defined by differential equations: the rate of change of the i-th matrix is given by an equation with two terms, one coming from the left scaling and one from the right scaling of U_i, applied at the same time, we just add them up. There is a new parameter s here: s is defined as the size of the operator, which is just the sum of the squared Frobenius norms of the matrices at the current time. To come up with this, we do the following. In one step of the alternating scaling algorithm we apply the inverse square root of the sum of U_j U_j^T; in the continuous version we only want to move a little bit, so we first multiply this matrix by the factor m over s so that it is close to the identity, and once it is close to the identity we can use the Taylor expansion of the inverse square root, so instead of writing it as an inverse we can write it in a more convenient linear form. We do the same for the right scaling, and then we add the two terms and get this dynamical system. That is how we define the continuous version of operator scaling.

[In response to a question:] yeah, that is a very good question. We didn't realize this connection at the time, but now we understand that it is actually a gradient flow of a natural potential function that I will introduce in a couple of slides. If the operator is not scalable, and I will also mention this in a couple of slides, the size just shrinks to zero.

So the high-level idea is that instead of bounding the total movement of the discrete algorithm, we bound the total movement of this continuous algorithm: we keep track of the path length and use it as an upper bound on the final distance. This is how we define the distance: we have a system in which the matrices change over time, the input is at time 0 and the output is at time infinity, and we want to measure this distance.
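Here is a minimal numerical sketch of the frame-case flow as derived above, an Euler discretization that adds the linearized Parseval correction and the linearized equal norm correction at every step. The constants, normalization, and step size here are my own illustrative choices and may differ from the paper's exact definition.

```python
import numpy as np

def continuous_frame_scaling(U, step=1e-3, iters=20000):
    """Euler discretization of a continuous scaling flow on a frame.
    U: n x d array of row vectors."""
    n, d = U.shape
    V = U.copy()
    for _ in range(iters):
        s = np.sum(V * V)                      # size: sum of squared norms
        T = V.T @ V                            # frame operator
        left = V @ (np.eye(d) - (d / s) * T)   # rows: (I - (d/s) T) v_i
        right = (1.0 - (n / s) * np.sum(V * V, axis=1, keepdims=True)) * V
        V += step * (left + right)             # both corrections simultaneously
    return V
```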
We can write this distance in an integral form and bound it by the triangle inequality; this is just the path length. If you don't want to look at the equations, I am just saying what the picture says: we can bound the final distance by the total movement at each time, integrated over time. I call the quantity at each time the local movement: at any time, how much do the matrices move.

Now comes an important definition, which also answers the earlier question about gradient flow: how do we keep track of the progress of the algorithm? It is by a potential function, which is the error measure of the current solution. The first term measures the error of the first condition: we want it to be zero exactly when the sum of U_i U_i^T is a multiple of the identity. The second term measures the error of the second condition. We take the squared Frobenius norms of these two error matrices and add them up; this quantity is called delta. Before, for the Paulsen problem, we measured the error by epsilon; to analyze the algorithm we use delta instead. To give some intuition: in the Paulsen problem we say the input is close to the identity by the measure epsilon, and you can think of it as an L-infinity bound on the eigenvalues, we want the eigenvalues of the frame operator to be between 1 minus epsilon and 1 plus epsilon. Delta you should think of as an L2 bound on the eigenvalues: the sum of (eigenvalue minus 1) squared is at most delta. Instead of working with the L-infinity bound, which is more difficult to deal with, we work with the delta bound, the L2 error. By the way, this is a definition of Gurvits, and it was also used in previous analyses of the operator scaling algorithm. So from now on we won't talk about epsilon too much; we will just focus on delta.

There are some simple properties. Delta is 0 if and only if the operator is doubly balanced, that is, the two constraints are satisfied exactly. And you can easily show, by a standard L2 versus L-infinity argument, that delta is at most m squared times epsilon squared: you can upper bound delta by epsilon, losing a factor of m. From now on we focus on bounding the total movement in terms of delta, which is much easier to work with. We will prove that the total movement is at most mn times the square root of delta, and by the bound on delta this is at most m squared times n times epsilon, which is what we want in the first part. In hindsight, we can understand the continuous operator scaling algorithm as doing gradient flow on delta: at any time I look at delta and move in the direction that decreases delta fastest, and that is exactly our algorithm. We didn't know that at the time, but now in hindsight we do, and knowing this connection also simplifies some proofs a little bit; it is a natural gradient descent algorithm minimizing delta.

Now we want to analyze this continuous algorithm, and we have found some nice identities. The first one: the change of the size is equal to minus delta. So if the error is bigger, the size decreases faster, and the size is always non-increasing. The second one is about how delta changes, which is the quantity we care about the most, since we want the error to decrease a lot: the change of delta is equal to minus the local movement.
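Written out in my own notation (the paper's normalization of the two terms and of the flow may differ by constants), the potential and the two identities look like:

```latex
\[
\Delta \;=\; \Bigl\|\sum_i U_i U_i^\top - \tfrac{s}{m} I_m\Bigr\|_F^2
\;+\; \Bigl\|\sum_i U_i^\top U_i - \tfrac{s}{n} I_n\Bigr\|_F^2,
\qquad s \;=\; \sum_i \|U_i\|_F^2,
\]
\[
\frac{ds}{dt} \;=\; -\,\Delta,
\qquad
\frac{d\Delta}{dt} \;=\; -\,\sum_i \Bigl\|\frac{dU_i}{dt}\Bigr\|_F^2 .
\]
```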
So if at some time we move a lot, we also decrease delta a lot. With the connection to gradient flow, this lemma follows directly from the gradient flow interpretation and we don't need a separate proof.

First I want to claim that the dynamical system always converges to a doubly balanced operator. The proof is simple: by Lemma 1 the size is decreasing, and by Lemma 2 the second derivative of the size is non-negative, so the size is a convex function of time; hence it eventually converges to a point where delta, which equals minus the change of the size, becomes zero, so delta at that time is also zero. In the case where the input is not scalable, we show that the size goes to zero, and then we can argue that for those inputs delta has to be very large, so you can do whatever you want, you can just output any equal norm Parseval frame, and the bound is still within m squared times n times epsilon.

So this is the important lemma, because it ties the change of the error to the movement that we make. Remember, we want to bound this term, the distance; you don't need to look at the formula, it is just the distance, and we bound it by the triangle inequality using the local movement, and by the identity above, instead of bounding the movement we just need to bound the square root of the change of delta. Now I take squares of both sides and I want to bound this term. The next idea is that instead of integrating the square root of the change of delta from time 0 to infinity, we define the half time: the half time is the first time at which delta becomes delta over two. Then a geometric sum argument says that four times the movement up to the half time bounds the total movement from time zero to infinity; it is a simple geometric sum argument. So we can focus on bounding the movement up to the half time, which is easier. To bound the term on the right-hand side we just use a simple Cauchy-Schwarz inequality, and the movement up to the half time is bounded by the half time T times delta. So we have a pretty simple plan: first reduce to the half time, then use Cauchy-Schwarz, and upper bound the total movement by T times delta. Now what is important is to bound this half time: we want to show that the time for delta to become delta over two is small. If we can show that, we can bound the total movement of the algorithm.

So how do we bound it? It is not obvious, and we have to use another potential function, the capacity, defined by Gurvits. Gurvits came up with the right definitions, and I think this one is the most non-trivial one: he defines the capacity of the current solution as a minimization problem. I don't have an easy intuition to explain for this definition; it is related to definitions from quantum information theory based on divergences, and it is an interesting question whether it corresponds to some KL divergence or quantum relative entropy or something like that. For their purposes this function is nice: it satisfies some nice properties, we know exactly how it changes if we scale by left and right matrices, so you can keep track of the change. For the purpose of this talk, we just need that the capacity is unchanged over time.
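Schematically (my own writing of the chain just described, suppressing constants), up to the half time T the argument is the Cauchy-Schwarz inequality for integrals:

```latex
\[
\int_0^{T} \sqrt{-\tfrac{d\Delta}{dt}}\; dt
\;\le\; \sqrt{\,T \int_0^{T} \Bigl(-\tfrac{d\Delta}{dt}\Bigr)\, dt\,}
\;\le\; \sqrt{\,T\,\Delta(0)\,},
\]
```

and summing over the successive halving phases gives a geometric series, so the total movement is within a constant factor of the contribution of the first phase.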
This potential function is unchanged over time as our input evolves. [In response to a question:] yes, our input keeps changing, but the capacity is unchanged. In their algorithm they use the capacity as a potential function differently: they prove an upper bound on the capacity, they prove a lower bound on the capacity, and they show that in each step the capacity increases, and they use this to bound the number of iterations of the algorithm. Let me skip the details; the key point is that the two matrices driving our flow have trace zero, and then you can show that the capacity is unchanged. We can also prove an upper bound and a lower bound on the capacity: the upper bound is the size, and the lower bound is the size minus mn times the square root of delta. Basically, this mn square root delta term corresponds to the total movement bound in our problem. This proof is also not new; we just adapt the proof in their paper to our setting. In the second part we have some new methods to prove improved bounds on the capacity.

One implication, for example: as time goes to infinity, delta goes to zero, so the size at infinity equals the capacity at infinity, and since the capacity doesn't change, the capacity of the original operator is just equal to the size of the output. We will come back to this fact and talk more about capacity later, but for now we just use these facts: the capacity doesn't change, and we have an upper bound and a lower bound on it. Note that the continuous algorithm behaves a little differently from the discrete one: in the discrete algorithm the size doesn't change and the capacity increases, while in the continuous algorithm the size can decrease but the capacity is unchanged; it is not exactly the same algorithm.

Now we can use the capacity to bound the half time. The capacity is unchanged over time, and we have a lower bound and an upper bound on it. Applying the upper bound at time T, the size at time T upper bounds the capacity at time T, which equals the capacity at time zero since the capacity doesn't change, and by the lower bound this is at least the size at time zero minus mn times the square root of delta. What does this mean? Combining the two, the size cannot decrease too much: the decrease is upper bounded by mn times the square root of delta. If the initial error is not too large, I know that the size of the operator won't decrease too much. Now I can use this to bound the half time, because on the other hand the change of the size is equal to minus delta, so if for a long time delta has not dropped below delta over two, then the total decrease of the size is at least T times delta over two. Combining the upper and lower bounds on the decrease, we conclude that by time 2mn over the square root of delta, delta must have dropped to delta over two, otherwise we would have a contradiction. So we have an upper bound on the half time, and the total movement is just T times delta, which is mn times the square root of delta, ignoring some constants. That is the proof of the first part.

Let me summarize the proof. We want to bound the final distance; we use the triangle inequality to bound it by the path length; to bound the path length we have a nice identity relating it to the change of delta; then we use the half time, a geometric sum, and Cauchy-Schwarz to say that this is at most T times delta; and then to bound the half time we use the argument on capacity.
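In symbols (my notation, constants suppressed), the half-time argument just described is:

```latex
\[
s(T) \;\ge\; \mathrm{cap}(T) \;=\; \mathrm{cap}(0) \;\ge\; s(0) - mn\sqrt{\Delta(0)},
\qquad
s(0)-s(T) \;=\; \int_0^{T}\!\Delta(t)\,dt \;\ge\; T\cdot\tfrac{\Delta(0)}{2}
\ \text{ while } \Delta > \tfrac{\Delta(0)}{2},
\]
\[
\Longrightarrow\quad
T \;\le\; \frac{2mn}{\sqrt{\Delta(0)}},
\qquad
\text{total movement} \;\lesssim\; T\cdot\Delta(0) \;=\; 2mn\sqrt{\Delta(0)}.
\]
```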
We used the result on capacity to bound the half time, and then the final step is just the L2 versus L-infinity comparison of the two error measures. So we have this result for operator scaling. That is the first part; maybe it was too technical, so the second part will be at a higher level.

First I want to say that the smoothed analysis only works in the frame setting; we would also like to make it work in the general operator scaling setting, but we don't know how to do that yet. If I want to summarize the result of the first part, I can say: if you have a good lower bound on the capacity, of the form the size minus some function of d, n, and delta, then the proof in the first part shows that this function is an upper bound on the total movement. So if you can improve this term, you improve the total movement. In the second part we focus on improving this term: before, we proved it was dn times the square root of delta, and now we want to prove that if we slightly perturb the input, then we can prove a much better bound on the capacity. The proof is a little complicated, but let me try to explain the high-level ideas. Our intuition was that it is very difficult to find examples with small capacity, so the idea was that if we randomly perturb the input, the worst cases are gone and the dynamical system converges much faster. This is what we are going to do: someone gives us an input frame, we perturb it a little bit, then we apply the continuous algorithm from the first part, and we bound the final distance by the sum of the two parts of the movement: the movement from adding the noise, and after that the movement to an equal norm Parseval frame.

For this analysis to work we need to prove three things. First, after I perturb the input, by the first part the movement of the dynamical system is bounded by the function appearing in the capacity lower bound, in terms of d, n, and the new delta; so I need to show that I don't add too much noise, because if I add too much noise I have already moved too much in the first step; that is, I want to upper bound the perturbation movement, and this is easy to bound. The second part is a little tricky: we want to show that when we add the noise, delta does not increase by much, because if delta increased a lot, the second step would move a lot. The third part is to show that after the perturbation, while delta has not increased too much, the capacity has increased a lot, so that by the analysis of the first part the capacity has a good lower bound and the total movement is much smaller. That is the plan. The first part is easy; for the second part, let me quickly explain the perturbation process and how we deal with the delta issue.

Basically, the way we add noise is by generating random Gaussian vectors. I have input u_i and I want to add some noise to it, so I generate a vector g_i in R^d where each entry is independent Gaussian noise with variance sigma squared. To deal with the delta issue, and this is a technical part, we don't add the raw noise: we first generate the random noise, then project it onto the subspace where the inner product with u_i is zero and the other linear conditions are satisfied, and then we set the new vector to be u_i plus the Gaussian vector projected onto the subspace satisfying these constraints. Why do we need this projection? Because we want delta to stay roughly the same after adding the noise.
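Here is a minimal sketch of a perturbation of this flavor. It is my own simplification: I only impose the orthogonality constraint that g_i is orthogonal to u_i, whereas the actual construction projects onto the full set of linear constraints mentioned above.

```python
import numpy as np

def perturb_frame(U, sigma, seed=0):
    """Add Gaussian noise with variance sigma^2 per entry to each row of U,
    after removing the component of each noise vector along the corresponding u_i.
    Simplified: the full construction imposes additional linear constraints
    so that delta barely changes."""
    rng = np.random.default_rng(seed)
    G = sigma * rng.standard_normal(U.shape)
    # Enforce <g_i, u_i> = 0 for every i.
    coeff = np.sum(G * U, axis=1, keepdims=True) / np.sum(U * U, axis=1, keepdims=True)
    G -= coeff * U
    return U + G
```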
If you expand the new delta, there are some cross terms and some quadratic terms; the quadratic terms are small, and these constraints are exactly there to make the cross terms zero. That is what we are trying to do. We have d times n random variables and the constraints are only n plus d squared dimensional, so we still have enough freedom; we just project onto that subspace, and after that, let's say we pretend the noise is independent Gaussian. So that is how we deal with the second step.

The third step is the most important one: how do we prove an improved capacity lower bound? How do we work with the capacity? There is a reduction from the capacity of a frame to the capacity in the simple matrix scaling case: if you look at the eigenvalue decomposition of the minimizer X in the capacity definition, you can construct a matrix whose capacity is equal to the capacity of the frame. When we were working on this problem, a lot of the time we got intuition from the matrix scaling problem. So now I look at the matrix scaling problem, and I want to say that after I add noise to the matrix, the capacity improves a lot; and we want to use the techniques from our dynamical system to bound the matrix capacity. In the first part, you can think of it as: we have a capacity lower bound, and the capacity lower bound indirectly argues about the convergence of delta, it bounds the half time, the time for delta to become delta over two. In the second part, we directly argue about the convergence of delta, and then use it to get a good capacity lower bound.

How do we prove that? We want to say that if the instance has some pseudorandom property, then we can directly argue that delta converges linearly, that is, it is geometrically decreasing: the decrease of delta is at least some fraction of delta; it won't be a constant fraction in the end, but some fraction. If we can do that, we can use it to argue about the capacity, because, as I said earlier, the capacity equals the size at time infinity, and we can show that if delta decreases this quickly, then the size doesn't decrease too much, which gives a capacity lower bound. The capacity doesn't change, the capacity equals the final size, which I mentioned before, and I can directly bound the change of the size: the change of the size at each time t is just minus delta at time t, and if delta is geometrically decreasing, then delta at time t is at most delta at time 0 times e to the minus mu t, so the total decrease is at most delta over mu. So if we can show that our system converges linearly, we can use this to bound the initial capacity and get an improved capacity lower bound.

What is the condition that guarantees this? The condition we use in the matrix case is a combinatorial pseudorandom property: we need most entries of the matrix to be at least sigma squared, that is, most entries are nonzero and not too small. If we can show that, then we can show that matrix scaling actually converges faster: the decrease of delta is at least sigma squared times n times delta.
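The step from linear convergence to a capacity lower bound, written out in my notation:

```latex
\[
\frac{d\Delta}{dt} \;\le\; -\,\mu\,\Delta
\;\Longrightarrow\;
\Delta(t) \;\le\; \Delta(0)\,e^{-\mu t}
\;\Longrightarrow\;
\mathrm{cap} \;=\; s(\infty) \;=\; s(0) - \int_0^{\infty}\!\Delta(t)\,dt
\;\ge\; s(0) - \frac{\Delta(0)}{\mu}.
\]
```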
Roughly speaking, this factor of n is where we gain, and it is what lets us remove the dependence on n from the total movement. This part is based on a combinatorial argument: if the matrix satisfies the pseudorandom property, then under the matrix scaling dynamics delta decreases quickly; delta decreasing quickly implies that the capacity of this matrix is high, and then the frame corresponding to that matrix also has large capacity.

Let me summarize the proof. I know that if I have a capacity lower bound for the frame, then by the previous part I have a total movement bound. But I don't know how to directly argue about a capacity lower bound for a frame, so what we do is the reduction from the frame to a matrix: if I can prove a capacity lower bound for the matrix, the same lower bound applies to the frame. So we take the input frame, follow the reduction from the frame capacity to the matrix capacity, and show that after this reduction the resulting matrix satisfies the pseudorandom property, most of the entries are at least sigma squared, provided we added variance sigma squared noise to the frame. Once we have the pseudorandom property we can directly argue about the convergence of delta; the convergence of delta gives the capacity lower bound for the matrix, which gives the capacity lower bound for the frame, which implies that the total movement is small. It is a little messy, mainly because we don't know how to directly argue about the capacity of the frame; we only know how to argue about the capacity of the matrix.

If we put in more concrete bounds, this is what we can prove: after we add variance sigma squared noise to the frame, we get the pseudorandom property for the matrix; we get this convergence of delta in the matrix case; it implies the matrix capacity lower bound, the same lower bound for the frame, and then the distance is about delta over sigma squared. Now we just need to choose sigma squared, and we choose it to balance the two terms, so that the movement from adding the noise in the first step is not the significant part and the second part dominates. If you plug in this bound you get roughly d to the 1.5 times a square root factor, but for the analysis to work we have to make some additional assumptions, that the error is small enough and n is large enough, and putting everything together we get d to the 6.5 times epsilon. That is the proof of the second part.

Now I just want to wrap up the talk. Our analysis of the continuous algorithm is not only useful for solving the Paulsen problem; it can actually be used to analyze some mathematical quantities associated with scaling problems. What are some quantities that people care about? One example is bounding the condition number of the scaling solutions: if an instance is scalable, you want to know the condition number of the left and right scaling matrices, and in some cases you need this to analyze other algorithms; for example, there is a new algorithm based on geodesically convex optimization for this problem, and its running time depends on the condition number. So we have some techniques for bounding the condition number of scaling solutions. Also, as you saw in the second part, our result gives bounds on the operator capacity and the capacity of a frame.
The capacity of a frame is closely related to the Brascamp-Lieb constant of the frame, so our techniques also give ways to bound these mathematical quantities. In our case we showed that if the input satisfies the pseudorandom property then we get better bounds, so if we can identify nicer properties under which we get better bounds, there would also be implications for those problems. Recently we have been thinking about replacing the pseudorandom conditions by spectral conditions: for example, if I am given a matrix with a spectral gap, then we can show that matrix scaling converges faster, and there would be implications for those problems, it would imply that the condition number is small and that there is a fast algorithm to find the scaling solutions.

Some open problems, and I also want to talk about the recent result. One open problem is proving the optimal bound of d times epsilon; we are already very close, since we now have d squared times epsilon. Another question was whether there is an efficient algorithm to solve the Paulsen problem, which is also answered by the recent work: along with the d squared epsilon bound, they give an efficient algorithm to solve the problem. What is their proof? Instead of tracking the total movement, they directly bound the total distance between the input and the output, and they have a very clever idea of using a nice distance function that makes everything work very smoothly. More generally, we would also like to do a smoothed analysis of operator scaling; our approach works in the more general operator setting, and I think it would be interesting to see whether the short proof works in the operator setting as well. So thank you for your time.