This is joint work with Tsz Chiu Kwok, Yin Tat Lee, and Akshay Ramachandran. Thanks for the invitation. I want to talk to you about the Paulsen problem. I will introduce the problem and state some motivations, and then the proof has two parts. The first part is about continuous operator scaling, a continuous version of the alternating scaling algorithm; I will explain what the algorithm is and give a more detailed analysis of that part. The second part is the smoothed analysis, where I will just give you some high-level ideas of the proof. Then I will conclude with some discussions.

So let me first introduce what a frame is. A frame is just a collection of vectors u_1, ..., u_n in d-dimensional space that span R^d. It is called an equal norm frame if all the vectors have the same length, and it is called a Parseval frame if the sum of the outer products is equal to the identity, sum_i u_i u_i^T = I. You should think of this condition as a generalization of an orthonormal basis: when n equals d, sum_i u_i u_i^T = I means it is exactly an orthonormal basis. In this talk you should think of n as much bigger than d, so it is an overcomplete basis. People use frames in communication theory: you still have the property that if you have a vector, you can project it onto those basis vectors and recover the vector from the coefficients. So people send those coefficients to the other party, and because the basis is overcomplete you can lose some coefficients and still be able to recover the original signal. That was one motivation for studying frames; they have applications in communication theory and quantum information theory, although I don't know much about those.

So what is the motivation for the Paulsen problem? Suppose we want to construct a frame which satisfies both the equal norm condition and the Parseval condition. It is not so easy to construct them; there are only a few algebraic constructions of such frames, and for a given n and d there are not many known constructions. I think it is harder for some ranges of n. Paulsen asked this problem; at that time he even wanted to construct so-called Grassmannian frames, which are not only equal norm Parseval frames but also satisfy an extra constraint: for every pair of vectors the inner product should be small, that is, an equal norm Parseval frame which minimizes the maximum inner product. Those frames are even more difficult to construct. On the other hand, it is easy to construct a frame which approximately satisfies both conditions, approximately equal norm and approximately Parseval. For example, we can generate random unit vectors; by matrix concentration you can show that the frame is almost Parseval, and you also get good angles between the vectors if you generate a random frame. So he asked the question: if we have an approximately equal norm, approximately Parseval frame, can we move the vectors just a little bit so that the frame becomes exactly equal norm and Parseval, hopefully without changing the angle of any pair of vectors by much? Then he would be able to construct good frames, and those angles correspond to the efficiency of the signal processing applications they have in mind. Okay, so that was the motivation of the Paulsen problem.

Formally, in the Paulsen problem we are given a frame which approximately satisfies the equal norm constraint, each vector has squared norm between (1 - epsilon) d/n and (1 + epsilon) d/n, and approximately satisfies the Parseval constraint, the sum of u_i u_i^T is between (1 - epsilon) I and (1 + epsilon) I.
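To make the claim that random frames are almost Parseval concrete, here is a minimal numpy sketch (my own illustration, not from the paper): it samples random vectors rescaled to squared norm d/n, so the equal norm condition holds exactly, and measures how far the frame operator is from the identity.

```python
import numpy as np

def random_frame(d, n, seed=0):
    """Sample n random vectors in R^d, each rescaled to squared norm d/n."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n, d))
    U *= np.sqrt(d / n) / np.linalg.norm(U, axis=1, keepdims=True)
    return U

def parseval_error(U):
    """Smallest eps such that (1-eps) I <= sum_i u_i u_i^T <= (1+eps) I."""
    T = U.T @ U                       # frame operator, d x d
    eigs = np.linalg.eigvalsh(T)
    return np.max(np.abs(eigs - 1.0))

U = random_frame(d=20, n=400)
print("equal norm holds exactly; Parseval error eps =", parseval_error(U))
```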
He asked: what is the best function of n, d, and epsilon such that you can always move such a U to a frame V which satisfies the two constraints exactly, with a bound on the movement? The movement is defined as the sum over i of the squared norm of u_i minus v_i. So that is his formulation: given such a U, find a V satisfying the two constraints exactly, and bound the distance by the best function you can prove.

Okay, so what are some previous results? There were two previous works before our result. The first one has an interesting assumption and also an interesting conclusion. You need to assume that d and n are relatively prime, but then they prove a bound which is polynomial in d and n, and the interesting conclusion is that it has a good dependence on epsilon, on the order of epsilon squared. The idea was to define a dynamical system which always keeps the equal norm constraint while improving the Parseval constraint, getting closer and closer to Parseval but always staying equal norm. Then there is another result which doesn't need the assumption that d and n are relatively prime; it also has a polynomial dependence on d and n, but the dependence on epsilon is much worse than in the first result, something like the whole bound raised to a small power such as one over seven. They use a gradient descent which, oh, I think I mixed up the two: one keeps the Parseval constraint and improves the equal norm constraint, and the other keeps the equal norm constraint and improves the Parseval constraint. There are also some simple examples showing that the movement you need is at least on the order of d times epsilon. So the open question was: can the function be independent of n? Since d times epsilon is the worst known example, you may even hope that d times epsilon is the right bound, but the open question was just whether the bound can be independent of n.

We answer this question positively: we show that the total movement is always at most d to the 6.5 times epsilon. Our proof has two parts. In the first part, I will talk about the operator scaling algorithm, the alternating algorithm, and we define a continuous version of it; with it we can show an upper bound of roughly d squared times n times epsilon, so a frame that approximately satisfies the two conditions is close to an equal norm Parseval frame, but with a bound that still depends on n. In the second part we do a smoothed analysis: we take the input, randomly perturb it a little bit before applying the continuous operator scaling algorithm, and then use the dynamical system to move the vectors; we show that the convergence is much better after the perturbation. Note that we do not analyze the discrete algorithm; I think the analysis is specific to our continuous algorithm, it is not a generic analysis.

I should mention a recent result: Linus Hamilton and Ankur Moitra improved our bound. They prove d squared times epsilon, and their proof is much simpler and much shorter than ours. So you can relax, you don't need to take this talk too seriously. I will still show our proof anyway, and at the end I will discuss their proof and compare the two approaches; there are still some advantages of our proof that I want to highlight.

Okay, so let's see the continuous operator scaling algorithm.
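As a compact restatement of the question (my own notation, following the conditions above), the Paulsen problem asks for the best function f(d, n, epsilon) in:

```latex
\[
\text{Given } u_1,\dots,u_n \in \mathbb{R}^d \ \text{ with }\
(1-\varepsilon)\tfrac{d}{n} \le \|u_i\|^2 \le (1+\varepsilon)\tfrac{d}{n}
\ \text{ and }\
(1-\varepsilon) I \preceq \sum_{i=1}^n u_i u_i^\top \preceq (1+\varepsilon) I,
\]
\[
\text{bound}\quad
\min_{V}\ \sum_{i=1}^n \|u_i - v_i\|^2 \ \le\ f(d,n,\varepsilon),
\quad\text{over } v_1,\dots,v_n \ \text{with}\ \|v_i\|^2 = \tfrac{d}{n},\ \
\sum_{i=1}^n v_i v_i^\top = I.
\]
```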
So given this question, how do you move an approximately equal norm Parseval frame to satisfy the two conditions exactly? This problem is only difficult because you have to satisfy both conditions simultaneously. If I just want to satisfy one of them, it is very easy: for the equal norm condition I can simply rescale the vectors, and for the Parseval condition I just apply the linear transformation given by the inverse square root of the frame operator to the vectors, and then you can check that the sum of the outer products becomes the identity; this is a standard linear transformation. So a natural algorithm is just to alternate between these two steps: given some input, first rescale so that the vectors satisfy the equal norm condition, then apply the linear transformation to satisfy the Parseval condition, after which the equal norm constraint is violated again, so you fix it, and you keep fixing the two constraints alternately and want to analyze when this converges.

Our first observation is that this is actually a special case of the alternating scaling algorithm for operator scaling, a more general problem, and that algorithm has been analyzed, so we can hope to apply those results to understand the Paulsen problem. How do we apply them? The first idea was: I keep track of my input as I apply this alternating algorithm, and it moves. Suppose the vectors can be rescaled to be equal norm and Parseval; I want to measure the distance between the input and the output, but instead I can bound the total movement. I keep track of how much the vectors move during the algorithm, sum it up, and use it as an upper bound on the distance between the input and the output. That was the idea. But before I explain further, let me introduce the operator scaling problem, which is a generalization of the frame scaling problem.

The operator scaling problem is the following. An operator is just a collection of matrices U_1 up to U_k, possibly many matrices, each of them m by n. This problem was introduced by Gurvits. Given those U_i as input, I want to find a left scaling matrix, m by m, and a right scaling matrix, n by n, such that if I multiply each matrix U_i by the same scaling matrices on the left and on the right, the resulting matrices V_i satisfy this doubly stochastic condition: the sum of V_i V_i^T is a scaled version of the identity and the sum of V_i^T V_i is a scaled version of the identity, for some constants. This is the operator scaling problem. Sometimes such scaling matrices don't exist; if they do exist, you want an algorithm to find them. This may look abstract, so I will explain how to reduce frame scaling to operator scaling, and I will say an operator satisfying these two conditions is doubly balanced.

It is also good to motivate this problem by showing you some applications. A simple special case of operator scaling is matrix scaling. Matrix scaling is useful in preconditioning for linear solvers and in solving the optimal transportation problem. You can also use matrix scaling to find a perfect matching in a bipartite graph, although it doesn't give an algorithm faster than the best known. It is also useful in designing deterministic approximation algorithms for the permanent; it is an exponential approximation, but it is deterministic. Frame scaling is another special case, the case that we are interested in: finding an equal norm Parseval frame is a frame scaling problem.
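To make the alternating algorithm above concrete, here is a minimal sketch (my own illustration with a fixed iteration count, not the analyzed version from the papers): it alternately rescales each vector to squared norm d/n and applies the inverse square root of the frame operator.

```python
import numpy as np

def alternating_frame_scaling(U, iters=100):
    """U: n x d array of row vectors that span R^d.
    Alternately enforce the equal norm and Parseval conditions."""
    n, d = U.shape
    V = U.copy()
    for _ in range(iters):
        # Equal norm step: rescale every vector to squared norm d/n.
        V *= np.sqrt(d / n) / np.linalg.norm(V, axis=1, keepdims=True)
        # Parseval step: apply T^{-1/2}, where T = sum_i v_i v_i^T.
        T = V.T @ V
        w, Q = np.linalg.eigh(T)
        V = V @ (Q * w**-0.5) @ Q.T
    return V
```

After the Parseval step the frame operator is exactly the identity, and the next equal norm step perturbs it again; the whole question is how fast this process converges.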
Frame scaling was used earlier in proving a communication complexity lower bound, giving a lower bound on the sign rank of the Hadamard matrix, and it was used in machine learning for robust subspace recovery; now we use it for the Paulsen problem. There is a common generalization of matrix scaling and frame scaling; it doesn't have a standard name, but I just call it PSD scaling, and it is useful in approximating the mixed discriminant of matrices. The most general form is operator scaling. Operator scaling was studied again in 2016; their motivation was computing the non-commutative rank of a symbolic matrix. Before this operator scaling algorithm we only had an exponential time algorithm for that problem, but now we have a polynomial time algorithm for computing the non-commutative rank. It is also used for computing Brascamp-Lieb constants, and recently it was used in an orbit intersection problem in invariant theory; actually, in that paper they use our result on the more general version of the Paulsen problem, since we prove a version of the Paulsen problem in the operator setting. This is just a brief introduction: the scaling problem is a simple problem, but it turns out to have many applications.

If we want to design an algorithm for operator scaling, we can generalize the simple alternating algorithm by repeating two steps. If we want to satisfy the first condition, that the sum of U_i U_i^T is the identity, we rescale on the left as in the frame case, and you can easily check that after this rescaling the first condition holds. If we want to satisfy the second condition, we rescale on the right, and you can check that the second constraint holds. A natural algorithm for operator scaling is to alternate between these two steps and hope that it converges. In the recent papers it is shown, and this is not obvious, that the operator scaling problem has a formulation which is geodesically convex, not convex in the standard sense, so this alternating algorithm is quite natural.

Let me show you why the frame scaling problem that we are interested in is a special case of operator scaling. It is a very simple reduction. I am given a bunch of vectors u_i in d dimensions; for each vector u_i I create a d by n matrix U_i whose i-th column is u_i and whose other columns are zero. Then in the first condition, the sum of U_i U_i^T, the zero columns don't matter, and it just becomes the Parseval condition on the sum of u_i u_i^T. In the second condition, the sum of U_i^T U_i, all the entries are zero except the squared norm of u_i in the i-th diagonal entry, so summing these matrices puts the squared norms on the diagonal, and requiring this to be a multiple of the identity is just requiring the frame to be equal norm. So the frame scaling problem reduces to the operator scaling problem.

In the first part of the talk we actually solve the problem in this more general setting: we move an input operator which is close to satisfying the two conditions to one exactly satisfying the two conditions. We consider this operator Paulsen problem, which was also used in recent work. We are given the input U_1 up to U_k, and they almost satisfy the two conditions, the two sums are almost the identity and almost a scaled identity, depending on the size, and we want to move the input to V_1 up to V_k satisfying the two conditions exactly, and we want to bound the total movement.
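To spell out the reduction in symbols (my own notation for the column embedding just described, with e_i the i-th standard basis vector):

```latex
\[
U_i \;=\; u_i e_i^\top \in \mathbb{R}^{d \times n}
\quad\Longrightarrow\quad
\sum_{i=1}^n U_i U_i^\top \;=\; \sum_{i=1}^n u_i u_i^\top,
\qquad
\sum_{i=1}^n U_i^\top U_i \;=\; \mathrm{diag}\bigl(\|u_1\|^2,\dots,\|u_n\|^2\bigr),
\]
```

so the Parseval condition corresponds to the first sum being the identity, and the equal norm condition corresponds to the second sum being a multiple of the identity.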
The total movement is defined as the sum over i of the squared Frobenius norm of U_i minus V_i, instead of the squared 2-norm; it is the natural generalization. The question is again the best function of m, n, k, and epsilon that upper bounds it, and in the first part of the talk we prove that it is always at most m squared times n times epsilon for this more general problem.

Okay, so let's come back to the idea of bounding the distance between the input and the output by the total movement of the algorithm. This idea doesn't work directly, because there are example inputs on which the alternating algorithm does not converge: when you rescale for the equal norm condition you get one configuration, and when you rescale back for the Parseval condition you return to the original one, so you just oscillate and never converge to a solution. Those examples are rare, but even if you ignore them, you can easily imagine an input that is scalable but for which, if I measure the total movement, the path zig-zags a lot, giving a very bad upper bound on the final distance. So this idea doesn't work, because the path can zig-zag a lot and we would not get any meaningful upper bound.

To fix this problem, our idea was to define a continuous version: we don't take a full step for one constraint and then the other, instead we do both steps simultaneously, and we also do both continuously instead of taking large steps. Let me first state the algorithm and then explain how we came up with it. The algorithm is defined by differential equations: the rate of change of the i-th matrix is given by an equation with two terms, one coming from the left scaling and one from the right scaling of U_i, applied at the same time, we just add them up. There is a new parameter s here: s is defined as the size of the operator, which is just the sum of the squared Frobenius norms of the matrices at the current time. To come up with this, we do the following. In one step of the alternating scaling algorithm we apply the inverse square root of the sum of U_j U_j^T; in the continuous version we only want to move a little bit, so we first multiply this matrix by the factor m over s so that it is close to the identity, and once it is close to the identity we can use the Taylor expansion of the inverse square root, so instead of writing it as an inverse we can write it in a more convenient linear form. We do the same for the right scaling, and then we add the two terms and get this dynamical system. That is how we define the continuous version of operator scaling.

[In response to a question:] yeah, that is a very good question. We didn't realize this connection at the time, but now we understand that it is actually a gradient flow of a natural potential function that I will introduce in a couple of slides. If the operator is not scalable, and I will also mention this in a couple of slides, the size just shrinks to zero.

So the high-level idea is that instead of bounding the total movement of the discrete algorithm, we bound the total movement of this continuous algorithm: we keep track of the path length and use it as an upper bound on the final distance. This is how we define the distance: we have a system in which the matrices change over time, the input is at time 0 and the output is at time infinity, and we want to measure this distance.
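Here is a minimal numerical sketch of the frame-case flow as derived above, an Euler discretization that adds the linearized Parseval correction and the linearized equal norm correction at every step. The constants, normalization, and step size here are my own illustrative choices and may differ from the paper's exact definition.

```python
import numpy as np

def continuous_frame_scaling(U, step=1e-3, iters=20000):
    """Euler discretization of a continuous scaling flow on a frame.
    U: n x d array of row vectors."""
    n, d = U.shape
    V = U.copy()
    for _ in range(iters):
        s = np.sum(V * V)                      # size: sum of squared norms
        T = V.T @ V                            # frame operator
        left = V @ (np.eye(d) - (d / s) * T)   # rows: (I - (d/s) T) v_i
        right = (1.0 - (n / s) * np.sum(V * V, axis=1, keepdims=True)) * V
        V += step * (left + right)             # both corrections simultaneously
    return V
```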
We can write this distance in an integral form and bound it by the triangle inequality; this is just the path length. If you don't want to look at the equations, I am just saying what the picture says: we can bound the final distance by the total movement at each time, integrated over time. I call the quantity at each time the local movement: at any time, how much do the matrices move.

Now comes an important definition, which also answers the earlier question about gradient flow: how do we keep track of the progress of the algorithm? It is by a potential function, which is the error measure of the current solution. The first term measures the error of the first condition: we want it to be zero exactly when the sum of U_i U_i^T is a multiple of the identity. The second term measures the error of the second condition. We take the squared Frobenius norms of these two error matrices and add them up; this quantity is called delta. Before, for the Paulsen problem, we measured the error by epsilon; to analyze the algorithm we use delta instead. To give some intuition: in the Paulsen problem we say the input is close to the identity by the measure epsilon, and you can think of it as an L-infinity bound on the eigenvalues, we want the eigenvalues of the frame operator to be between 1 minus epsilon and 1 plus epsilon. Delta you should think of as an L2 bound on the eigenvalues: the sum of (eigenvalue minus 1) squared is at most delta. Instead of working with the L-infinity bound, which is more difficult to deal with, we work with the delta bound, the L2 error. By the way, this is a definition of Gurvits, and it was also used in previous analyses of the operator scaling algorithm. So from now on we won't talk about epsilon too much; we will just focus on delta.

There are some simple properties. Delta is 0 if and only if the operator is doubly balanced, that is, the two constraints are satisfied exactly. And you can easily show, by a standard L2 versus L-infinity argument, that delta is at most m squared times epsilon squared: you can upper bound delta by epsilon, losing a factor of m. From now on we focus on bounding the total movement in terms of delta, which is much easier to work with. We will prove that the total movement is at most mn times the square root of delta, and by the bound on delta this is at most m squared times n times epsilon, which is what we want in the first part. In hindsight, we can understand the continuous operator scaling algorithm as doing gradient flow on delta: at any time I look at delta and move in the direction that decreases delta fastest, and that is exactly our algorithm. We didn't know that at the time, but now in hindsight we do, and knowing this connection also simplifies some proofs a little bit; it is a natural gradient descent algorithm minimizing delta.

Now we want to analyze this continuous algorithm, and we have found some nice identities. The first one: the change of the size is equal to minus delta. So if the error is bigger, the size decreases faster, and the size is always non-increasing. The second one is about how delta changes, which is the quantity we care about the most, since we want the error to decrease a lot: the change of delta is equal to minus the local movement.
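Written out in my own notation (the paper's normalization of the two terms and of the flow may differ by constants), the potential and the two identities look like:

```latex
\[
\Delta \;=\; \Bigl\|\sum_i U_i U_i^\top - \tfrac{s}{m} I_m\Bigr\|_F^2
\;+\; \Bigl\|\sum_i U_i^\top U_i - \tfrac{s}{n} I_n\Bigr\|_F^2,
\qquad s \;=\; \sum_i \|U_i\|_F^2,
\]
\[
\frac{ds}{dt} \;=\; -\,\Delta,
\qquad
\frac{d\Delta}{dt} \;=\; -\,\sum_i \Bigl\|\frac{dU_i}{dt}\Bigr\|_F^2 .
\]
```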
So if at some time we move a lot, we also decrease delta a lot. With the connection to gradient flow, this lemma follows directly from the gradient flow interpretation and we don't need a separate proof.

First I want to claim that the dynamical system always converges to a doubly balanced operator. The proof is simple: by Lemma 1 the size is decreasing, and by Lemma 2 the second derivative of the size is non-negative, so the size is a convex function of time; hence it eventually converges to a point where delta, which equals minus the change of the size, becomes zero, so delta at that time is also zero. In the case where the input is not scalable, we show that the size goes to zero, and then we can argue that for those inputs delta has to be very large, so you can do whatever you want, you can just output any equal norm Parseval frame, and the bound is still within m squared times n times epsilon.

So this is the important lemma, because it ties the change of the error to the movement that we make. Remember, we want to bound this term, the distance; you don't need to look at the formula, it is just the distance, and we bound it by the triangle inequality using the local movement, and by the identity above, instead of bounding the movement we just need to bound the square root of the change of delta. Now I take squares of both sides and I want to bound this term. The next idea is that instead of integrating the square root of the change of delta from time 0 to infinity, we define the half time: the half time is the first time at which delta becomes delta over two. Then a geometric sum argument says that four times the movement up to the half time bounds the total movement from time zero to infinity; it is a simple geometric sum argument. So we can focus on bounding the movement up to the half time, which is easier. To bound the term on the right-hand side we just use a simple Cauchy-Schwarz inequality, and the movement up to the half time is bounded by the half time T times delta. So we have a pretty simple plan: first reduce to the half time, then use Cauchy-Schwarz, and upper bound the total movement by T times delta. Now what is important is to bound this half time: we want to show that the time for delta to become delta over two is small. If we can show that, we can bound the total movement of the algorithm.

So how do we bound it? It is not obvious, and we have to use another potential function, the capacity, defined by Gurvits. Gurvits came up with the right definitions, and I think this one is the most non-trivial one: he defines the capacity of the current solution as a minimization problem. I don't have an easy intuition to explain for this definition; it is related to definitions from quantum information theory based on divergences, and it is an interesting question whether it corresponds to some KL divergence or quantum relative entropy or something like that. For their purposes this function is nice: it satisfies some nice properties, we know exactly how it changes if we scale by left and right matrices, so you can keep track of the change. For the purpose of this talk, we just need that the capacity is unchanged over time.
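Schematically (my own writing of the chain just described, suppressing constants), up to the half time T the argument is the Cauchy-Schwarz inequality for integrals:

```latex
\[
\int_0^{T} \sqrt{-\tfrac{d\Delta}{dt}}\; dt
\;\le\; \sqrt{\,T \int_0^{T} \Bigl(-\tfrac{d\Delta}{dt}\Bigr)\, dt\,}
\;\le\; \sqrt{\,T\,\Delta(0)\,},
\]
```

and summing over the successive halving phases gives a geometric series, so the total movement is within a constant factor of the contribution of the first phase.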
This potential function is unchanged over time as our input evolves. [In response to a question:] yes, our input keeps changing, but the capacity is unchanged. In their algorithm they use the capacity as a potential function differently: they prove an upper bound on the capacity, they prove a lower bound on the capacity, and they show that in each step the capacity increases, and they use this to bound the number of iterations of the algorithm. Let me skip the details; the key point is that the two matrices driving our flow have trace zero, and then you can show that the capacity is unchanged. We can also prove an upper bound and a lower bound on the capacity: the upper bound is the size, and the lower bound is the size minus mn times the square root of delta. Basically, this mn square root delta term corresponds to the total movement bound in our problem. This proof is also not new; we just adapt the proof in their paper to our setting. In the second part we have some new methods to prove improved bounds on the capacity.

One implication, for example: as time goes to infinity, delta goes to zero, so the size at infinity equals the capacity at infinity, and since the capacity doesn't change, the capacity of the original operator is just equal to the size of the output. We will come back to this fact and talk more about capacity later, but for now we just use these facts: the capacity doesn't change, and we have an upper bound and a lower bound on it. Note that the continuous algorithm behaves a little differently from the discrete one: in the discrete algorithm the size doesn't change and the capacity increases, while in the continuous algorithm the size can decrease but the capacity is unchanged; it is not exactly the same algorithm.

Now we can use the capacity to bound the half time. The capacity is unchanged over time, and we have a lower bound and an upper bound on it. Applying the upper bound at time T, the size at time T upper bounds the capacity at time T, which equals the capacity at time zero since the capacity doesn't change, and by the lower bound this is at least the size at time zero minus mn times the square root of delta. What does this mean? Combining the two, the size cannot decrease too much: the decrease is upper bounded by mn times the square root of delta. If the initial error is not too large, I know that the size of the operator won't decrease too much. Now I can use this to bound the half time, because on the other hand the change of the size is equal to minus delta, so if for a long time delta has not dropped below delta over two, then the total decrease of the size is at least T times delta over two. Combining the upper and lower bounds on the decrease, we conclude that by time 2mn over the square root of delta, delta must have dropped to delta over two, otherwise we would have a contradiction. So we have an upper bound on the half time, and the total movement is just T times delta, which is mn times the square root of delta, ignoring some constants. That is the proof of the first part.

Let me summarize the proof. We want to bound the final distance; we use the triangle inequality to bound it by the path length; to bound the path length we have a nice identity relating it to the change of delta; then we use the half time, a geometric sum, and Cauchy-Schwarz to say that this is at most T times delta; and then to bound the half time we use the argument on capacity.
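In symbols (my notation, constants suppressed), the half-time argument just described is:

```latex
\[
s(T) \;\ge\; \mathrm{cap}(T) \;=\; \mathrm{cap}(0) \;\ge\; s(0) - mn\sqrt{\Delta(0)},
\qquad
s(0)-s(T) \;=\; \int_0^{T}\!\Delta(t)\,dt \;\ge\; T\cdot\tfrac{\Delta(0)}{2}
\ \text{ while } \Delta > \tfrac{\Delta(0)}{2},
\]
\[
\Longrightarrow\quad
T \;\le\; \frac{2mn}{\sqrt{\Delta(0)}},
\qquad
\text{total movement} \;\lesssim\; T\cdot\Delta(0) \;=\; 2mn\sqrt{\Delta(0)}.
\]
```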
We used the result on capacity to bound the half time, and then the final step is just the L2 versus L-infinity comparison of the two error measures. So we have this result for operator scaling. That is the first part; maybe it was too technical, so the second part will be at a higher level.

First I want to say that the smoothed analysis only works in the frame setting; we would also like to make it work in the general operator scaling setting, but we don't know how to do that yet. If I want to summarize the result of the first part, I can say: if you have a good lower bound on the capacity, of the form the size minus some function of d, n, and delta, then the proof in the first part shows that this function is an upper bound on the total movement. So if you can improve this term, you improve the total movement. In the second part we focus on improving this term: before, we proved it was dn times the square root of delta, and now we want to prove that if we slightly perturb the input, then we can prove a much better bound on the capacity. The proof is a little complicated, but let me try to explain the high-level ideas. Our intuition was that it is very difficult to find examples with small capacity, so the idea was that if we randomly perturb the input, the worst cases are gone and the dynamical system converges much faster. This is what we are going to do: someone gives us an input frame, we perturb it a little bit, then we apply the continuous algorithm from the first part, and we bound the final distance by the sum of the two parts of the movement: the movement from adding the noise, and after that the movement to an equal norm Parseval frame.

For this analysis to work we need to prove three things. First, after I perturb the input, by the first part the movement of the dynamical system is bounded by the function appearing in the capacity lower bound, in terms of d, n, and the new delta; so I need to show that I don't add too much noise, because if I add too much noise I have already moved too much in the first step; that is, I want to upper bound the perturbation movement, and this is easy to bound. The second part is a little tricky: we want to show that when we add the noise, delta does not increase by much, because if delta increased a lot, the second step would move a lot. The third part is to show that after the perturbation, while delta has not increased too much, the capacity has increased a lot, so that by the analysis of the first part the capacity has a good lower bound and the total movement is much smaller. That is the plan. The first part is easy; for the second part, let me quickly explain the perturbation process and how we deal with the delta issue.

Basically, the way we add noise is by generating random Gaussian vectors. I have input u_i and I want to add some noise to it, so I generate a vector g_i in R^d where each entry is independent Gaussian noise with variance sigma squared. To deal with the delta issue, and this is a technical part, we don't add the raw noise: we first generate the random noise, then project it onto the subspace where the inner product with u_i is zero and the other linear conditions are satisfied, and then we set the new vector to be u_i plus the Gaussian vector projected onto the subspace satisfying these constraints. Why do we need this projection? Because we want delta to stay roughly the same after adding the noise.
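Here is a minimal sketch of a perturbation of this flavor. It is my own simplification: I only impose the orthogonality constraint that g_i is orthogonal to u_i, whereas the actual construction projects onto the full set of linear constraints mentioned above.

```python
import numpy as np

def perturb_frame(U, sigma, seed=0):
    """Add Gaussian noise with variance sigma^2 per entry to each row of U,
    after removing the component of each noise vector along the corresponding u_i.
    Simplified: the full construction imposes additional linear constraints
    so that delta barely changes."""
    rng = np.random.default_rng(seed)
    G = sigma * rng.standard_normal(U.shape)
    # Enforce <g_i, u_i> = 0 for every i.
    coeff = np.sum(G * U, axis=1, keepdims=True) / np.sum(U * U, axis=1, keepdims=True)
    G -= coeff * U
    return U + G
```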
If you expand the new delta, there are some cross terms and some quadratic terms; the quadratic terms are small, and these constraints are exactly there to make the cross terms zero. That is what we are trying to do. We have d times n random variables and the constraints are only n plus d squared dimensional, so we still have enough freedom; we just project onto that subspace, and after that, let's say we pretend the noise is independent Gaussian. So that is how we deal with the second step.

The third step is the most important one: how do we prove an improved capacity lower bound? How do we work with the capacity? There is a reduction from the capacity of a frame to the capacity in the simple matrix scaling case: if you look at the eigenvalue decomposition of the minimizer X in the capacity definition, you can construct a matrix whose capacity is equal to the capacity of the frame. When we were working on this problem, a lot of the time we got intuition from the matrix scaling problem. So now I look at the matrix scaling problem, and I want to say that after I add noise to the matrix, the capacity improves a lot; and we want to use the techniques from our dynamical system to bound the matrix capacity. In the first part, you can think of it as: we have a capacity lower bound, and the capacity lower bound indirectly argues about the convergence of delta, it bounds the half time, the time for delta to become delta over two. In the second part, we directly argue about the convergence of delta, and then use it to get a good capacity lower bound.

How do we prove that? We want to say that if the instance has some pseudorandom property, then we can directly argue that delta converges linearly, that is, it is geometrically decreasing: the decrease of delta is at least some fraction of delta; it won't be a constant fraction in the end, but some fraction. If we can do that, we can use it to argue about the capacity, because, as I said earlier, the capacity equals the size at time infinity, and we can show that if delta decreases this quickly, then the size doesn't decrease too much, which gives a capacity lower bound. The capacity doesn't change, the capacity equals the final size, which I mentioned before, and I can directly bound the change of the size: the change of the size at each time t is just minus delta at time t, and if delta is geometrically decreasing, then delta at time t is at most delta at time 0 times e to the minus mu t, so the total decrease is at most delta over mu. So if we can show that our system converges linearly, we can use this to bound the initial capacity and get an improved capacity lower bound.

What is the condition that guarantees this? The condition we use in the matrix case is a combinatorial pseudorandom property: we need most entries of the matrix to be at least sigma squared, that is, most entries are nonzero and not too small. If we can show that, then we can show that matrix scaling actually converges faster: the decrease of delta is at least sigma squared times n times delta.
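The step from linear convergence to a capacity lower bound, written out in my notation:

```latex
\[
\frac{d\Delta}{dt} \;\le\; -\,\mu\,\Delta
\;\Longrightarrow\;
\Delta(t) \;\le\; \Delta(0)\,e^{-\mu t}
\;\Longrightarrow\;
\mathrm{cap} \;=\; s(\infty) \;=\; s(0) - \int_0^{\infty}\!\Delta(t)\,dt
\;\ge\; s(0) - \frac{\Delta(0)}{\mu}.
\]
```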
Roughly speaking, this factor of n is where we gain, and it is what lets us remove the dependence on n from the total movement. This part is based on a combinatorial argument: if the matrix satisfies the pseudorandom property, then under the matrix scaling dynamics delta decreases quickly; delta decreasing quickly implies that the capacity of this matrix is high, and then the frame corresponding to that matrix also has large capacity.

Let me summarize the proof. I know that if I have a capacity lower bound for the frame, then by the previous part I have a total movement bound. But I don't know how to directly argue about a capacity lower bound for a frame, so what we do is the reduction from the frame to a matrix: if I can prove a capacity lower bound for the matrix, the same lower bound applies to the frame. So we take the input frame, follow the reduction from the frame capacity to the matrix capacity, and show that after this reduction the resulting matrix satisfies the pseudorandom property, most of the entries are at least sigma squared, provided we added variance sigma squared noise to the frame. Once we have the pseudorandom property we can directly argue about the convergence of delta; the convergence of delta gives the capacity lower bound for the matrix, which gives the capacity lower bound for the frame, which implies that the total movement is small. It is a little messy, mainly because we don't know how to directly argue about the capacity of the frame; we only know how to argue about the capacity of the matrix.

If we put in more concrete bounds, this is what we can prove: after we add variance sigma squared noise to the frame, we get the pseudorandom property for the matrix; we get this convergence of delta in the matrix case; it implies the matrix capacity lower bound, the same lower bound for the frame, and then the distance is about delta over sigma squared. Now we just need to choose sigma squared, and we choose it to balance the two terms, so that the movement from adding the noise in the first step is not the significant part and the second part dominates. If you plug in this bound you get roughly d to the 1.5 times a square root factor, but for the analysis to work we have to make some additional assumptions, that the error is small enough and n is large enough, and putting everything together we get d to the 6.5 times epsilon. That is the proof of the second part.

Now I just want to wrap up the talk. Our analysis of the continuous algorithm is not only useful for solving the Paulsen problem; it can actually be used to analyze some mathematical quantities associated with scaling problems. What are some quantities that people care about? One example is bounding the condition number of the scaling solutions: if an instance is scalable, you want to know the condition number of the left and right scaling matrices, and in some cases you need this to analyze other algorithms; for example, there is a new algorithm based on geodesically convex optimization for this problem, and its running time depends on the condition number. So we have some techniques for bounding the condition number of scaling solutions. Also, as you saw in the second part, our result gives bounds on the operator capacity and the capacity of a frame.
The capacity of a frame is closely related to the Brascamp-Lieb constant of the frame, so our techniques also give ways to bound these mathematical quantities. In our case we showed that if the input satisfies the pseudorandom property then we get better bounds, so if we can identify nicer properties under which we get better bounds, there would also be implications for those problems. Recently we have been thinking about replacing the pseudorandom conditions by spectral conditions: for example, if I am given a matrix with a spectral gap, then we can show that matrix scaling converges faster, and there would be implications for those problems, it would imply that the condition number is small and that there is a fast algorithm to find the scaling solutions.

Some open problems, and I also want to talk about the recent result. One open problem is proving the optimal bound of d times epsilon; we are already very close, since we now have d squared times epsilon. Another question was whether there is an efficient algorithm to solve the Paulsen problem, which is also answered by the recent work: along with the d squared epsilon bound, they give an efficient algorithm to solve the problem. What is their proof? Instead of tracking the total movement, they directly bound the total distance between the input and the output, and they have a very clever idea of using a nice distance function that makes everything work very smoothly. More generally, we would also like to do a smoothed analysis of operator scaling; our approach works in the more general operator setting, and I think it would be interesting to see whether the short proof works in the operator setting as well. So thank you for your time.