But the unknown parameter is still the vector of regression coefficients. So the number of parameters is p, and it's going to be large, so you have to do something to deal with the high-dimensional nature of the problem. OK, then the Lasso — it's like this: you take the minimizer of the following. You take the sum of squares, and I normalize by n — that's my habit, you don't have to — and then there's a penalty which is proportional to the L1 norm of the coefficients. And I put a two here because otherwise I have to write a two somewhere else. OK, so I guess this procedure is known, and maybe you've worked on its theory or applied it. OK, so now — I talked about sparsity on Friday, but maybe you were not there — anyway, I want to go a little bit into depth on this topic. You need a certain condition to get good theoretical properties of this estimator, and that condition I call compatibility; it's related to restricted eigenvalues. And I really want to show you that this condition is quite natural, so that's why I'm going to write it here, really big. To make life simple, let's forget about the noise for a moment — I'm going to cancel the noise, just for the development of the theory; later on I'll show you that it's quite useful to think about that case too. So if you have no noise, Y is just X β⁰, so you get ||X(β − β⁰)||², normalized by n. OK, so let's consider this problem first, where there's no noise. Now I'm going to write down the derivative and put it to zero — these are called the KKT conditions. Taking the derivative of the first term, that's (2/n) Xᵀ(Xβ* − Xβ⁰).
There's a two from this term and a two in front of the penalty, so I'm just cancelling the twos. The derivative of the penalty term is a subdifferential, call it ẑ — so you put the derivative to zero and solve; maybe I should put a star here, β*. So ẑ is in the subdifferential of the absolute-value function, the L1 norm. This says that the components of ẑ are the sign of β* wherever β* is nonzero, and if a component of β* is zero I don't care what ẑ is there — it's just something between minus one and one. So the largest absolute value in ẑ is at most one: ||ẑ||_∞ ≤ 1. So the KKT conditions read Xᵀ(Xβ* − Xβ⁰)/n + λ ẑ = 0. OK, now I'm going to multiply by (β* − β⁰)ᵀ. If I multiply here by (β* − β⁰)ᵀ, you get again this quantity ||X(β* − β⁰)||²_n, and I multiply by (β* − β⁰) on this side too, so far so good. So we have ||X(β* − β⁰)||²_n = λ ẑᵀβ⁰ − λ ẑᵀβ*. Now I'm going to use that the sup-norm of ẑ is at most one, so I can bound the first term by λ ||β⁰||₁; and the second term, ẑᵀβ*, is exactly ||β*||₁, because ẑ is the sign of β* on its active set. Now we're almost done, and then we get to a result. Let me call S₀ the active set of β⁰ — the set where the coefficients are nonzero; you can be more precise here, but let's not do that, we just look at strong sparsity. And s₀ is the cardinality of that set, the sparsity. And I'm going to write β_{S₀} for the vector with entries β_j for j in S₀ and zero elsewhere, and β_{−S₀} for the rest — I'm just splitting the vector according to being in the active set or not. So we have ||X(β* − β⁰)||²_n ≤ λ ||β⁰||₁ − λ ||β*||₁, and here I can split both norms over S₀ and its complement.
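Before carrying the derivation further, the KKT conditions themselves can be sanity-checked numerically. This is my own minimal sketch, not the lecturer's code: it solves the noiseless Lasso by proximal gradient (ISTA) and then recovers the subgradient vector ẑ from the stationarity condition; the problem sizes and λ are arbitrary.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=5000):
    """Minimize ||X b - y||^2 / n + 2 * lam * ||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    L = 2 * np.linalg.norm(X, 2) ** 2 / n   # Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ b - y) / n
        b = soft_threshold(b - grad / L, 2 * lam / L)
    return b

rng = np.random.default_rng(0)
n, p, s0 = 100, 20, 3
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:s0] = 1.0
y = X @ beta0                                # noiseless: y = X beta^0
lam = 0.1
b_star = lasso_ista(X, y, lam)

# KKT: X^T (X b* - y) / n + lam * z = 0 with ||z||_inf <= 1,
# and z_j = sign(b*_j) on the active set of b*.
z = -X.T @ (X @ b_star - y) / (n * lam)
active = np.abs(b_star) > 1e-6
print(np.max(np.abs(z)))                     # at most 1, up to numerical tolerance
print(np.allclose(z[active], np.sign(b_star[active]), atol=1e-3))
```

At a fixed point of the proximal iteration the KKT conditions hold exactly, so after enough iterations the recovered ẑ has sup-norm essentially one and matches the signs on the active set.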
So I just split it up, inside and outside the active set, and here I use the triangle inequality; we're almost done: since β⁰ is zero outside S₀, we get ||X(β* − β⁰)||²_n ≤ λ ( ||β*_{S₀} − β⁰_{S₀}||₁ − ||β*_{−S₀}||₁ ). And now comes the trick; let me put it here. I'm going to define the following quantity — I put a hat here, don't worry about that for now:

φ̂² = min { s₀ ||Xβ||²_n : ||β_{S₀}||₁ − ||β_{−S₀}||₁ = 1 }.

So this is the crucial quantity. This is called the compatibility constant, and it's introduced just to make the following step work. OK: if the right-hand side above is negative, I'm done — then I've just proved that the prediction error of the Lasso is at most zero, and that's not an interesting case. What if it's positive? That means ||β*_{S₀} − β⁰_{S₀}||₁ is larger than ||β*_{−S₀}||₁, and then, with this normalization — rescale β* − β⁰ so that the constraint equals one; note there are no squares on the L1 side of the definition, while the design term carries the square — you can bound

||β*_{S₀} − β⁰_{S₀}||₁ − ||β*_{−S₀}||₁ ≤ √s₀ ||X(β* − β⁰)||_n / φ̂.

So if you're OK with defining this funny quantity, you get a bound for the prediction error of the Lasso: ||X(β* − β⁰)||²_n ≤ λ √s₀ ||X(β* − β⁰)||_n / φ̂; divide both sides by ||X(β* − β⁰)||_n and square, so you put it to the other side and get

||X(β* − β⁰)||²_n ≤ λ² s₀ / φ̂².

So that's a bound for the prediction error. And what I want to show you later is that this is not only an upper bound — essentially the same quantity is also a lower bound. OK, but it takes a bit more to make that last step.
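For an orthogonal design the bound just derived becomes completely explicit: if XᵀX/n is the identity, the compatibility constant equals one and the noiseless Lasso is coordinatewise soft thresholding. A small sanity check of the bound — the numbers are made up, and this is my own illustration:

```python
import numpy as np

# Orthogonal design: X^T X / n = identity, so the compatibility constant is
# phi^2 = 1 and the bound reads ||X(b* - beta0)||_n^2 <= lam^2 * s0.
rng = np.random.default_rng(1)
n, p = 64, 8
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X = np.sqrt(n) * Q[:, :p]               # columns scaled so that X^T X / n = I
beta0 = np.zeros(p)
beta0[:3] = [2.0, 0.5, 0.05]            # s0 = 3 active coefficients
s0, lam = 3, 0.1

# Under orthogonality the noiseless Lasso is coordinatewise soft thresholding.
b_star = np.sign(beta0) * np.maximum(np.abs(beta0) - lam, 0.0)
pred_err = np.sum((X @ (b_star - beta0)) ** 2) / n
print(pred_err, lam ** 2 * s0)          # 0.0225 <= 0.03
```

Each active coefficient above the threshold contributes λ² to the prediction error, and a coefficient below the threshold contributes its own square, so the total indeed sits below λ² s₀.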
And so this quantity φ̂ depends only on the design — it's a property of the design — and I'll explain it a little bit further, because that's the main thing: this is the compatibility constant. The idea is: if you look in the literature, similar quantities appear, but they're called restricted eigenvalues. This one is a little bit different, a little bit less stringent, and that allows me also to get the lower bound — because what appears in the upper bound also appears in the lower bound, so apparently that's the main quantity that rules everything. So let me now go to the slides. Sometimes it gets a bit technical, and then you can just zoom out. [Question from the audience.] Yes — you have to think a little bit, because if you define it in this way, it's almost by definition like that; it sort of is the definition. I won't write it all out — it's not difficult. [Another question.] Yes, subject to — subject to, very good question, thanks, it's here. OK, so we start again. Here's the linear model again, unknown regression function, and I'm going to look at lower bounds and upper bounds — but mainly I want to show something new, so lower bounds. Just for the lower bounds, let's assume Gaussian noise with unit variance; that's the starting point, p parameters and n observations, and it's high-dimensional. I guess most of you are familiar with this, and also with the motivation, so I'm not giving that — you can ask me afterwards for the motivation to do it this way, if you think of sparsity. And the tuning parameter — that's something to remember — is usually of order √(log p / n). This will occur from time to time. So, as I've written on the board, the size of the active set is s₀, the sparsity, which is assumed to be small, whatever that means.
OK, and like I said on Friday, typically the prediction error behaves like the number of active parameters divided by the number of observations, times the price you pay for not knowing which parameters are active. So typically it behaves like s₀ log p / n, and since log p divided by n is more or less the tuning parameter squared, that's s₀ times λ² — you see it appear here, s₀ times λ², but there's a constant in the denominator due to some kind of identifiability condition. OK, so I want to show that this compatibility constant, which actually appears there, is really needed — I want to establish tight bounds for the Lasso prediction error. And you can ask why. Well, the main reason to look at tight bounds is to show that these kinds of compatibility constants really are necessary. And then on Thursday I'll talk a little bit about the bias of the Lasso: there you'll have the situation where you do the Lasso when the true parameter is actually not sparse, and then you may want to know what happens — that's largely still an open problem. I'll talk about it later; it's nice mathematics, I think, some geometry. OK, now let me put down the main result. There are some conditions, which are of the form: if a coefficient of the true underlying parameter is not zero, then it's not going to be below the noise level — that's called a beta-min condition. So a beta-min condition is of the form: the noise level is of order √(log p / n), and if you go below this level you cannot really tell whether a parameter is zero or small but nonzero. So the condition says you stay away from that: the nonzero coefficients are larger in order than √(log p / n) — with some constant here.
And I'll need such conditions. Now, as a statistician you don't want to assume that if something is nonzero then it's large enough that you will notice it — but that's exactly what this assumes. So this is a very bad condition from a statistical point of view; usually I don't assume it, but for lower bounds I think it's OK. In computer science they do often assume this, because they say: if you don't assume it, then I can't recover the active set with few observations. We can discuss this, but if you're honest, you don't assume it for inference; for lower bounds, though, I think it's fine — in contrast to assuming it for upper bounds, where it's just a condition to make life easy. OK. Then I come to this compatibility constant. I'm now going to look at random design — that's why I put the hat there earlier — and I take the expectation of the Gram matrix, so it's now a deterministic p × p matrix, and I consider the quantity I wrote on the board, but with the Gram matrix replaced by its expectation. So for a fixed design it would be a random quantity, but now I just have a deterministic quantity, and I call that the compatibility constant. It's a funny thing: you have a quadratic function which you want to minimize subject to a constraint, and the constraint is non-convex, so you end up with a non-convex optimization problem. Actually — I'll give you an example later on, I hope there's time, where I can explicitly calculate that thing, but in general it's not so easy. OK, so this is the beta-min condition that I'm going to assume; it's on the slide, let's not worry about its exact form — I've just shown you that there's some condition there. And then comes the following result.
Say your design is random, and say the rows are i.i.d. copies of some vector with covariance matrix Σ₀ — think of it like that. And to normalize, I just assume, say, that the diagonal of Σ₀ is all ones, or at least bounded — I have to normalize it somehow, say all entries bounded by one. And then — this is important, and it is not a normalization — I'm going to assume that the largest eigenvalue of Σ₀ does not grow too fast. This kind of condition also occurs very often in the literature. So now comes the result; this is a tight bound. Assume the beta-min condition; Gaussian design with covariance matrix Σ₀ whose largest eigenvalue is of small order √(log p) — typically people assume it bounded, but let's allow it to grow a little bit; the usual value of the tuning parameter, λ of order √(log p / n); and the usual assumption on the sparsity. That's the usual one because you want λ² s₀ — that is, s₀ log p / n, possibly with extra log factors — to go to zero; you want some kind of convergence. So that's this condition, and there are some little technical conditions here, some convergence also if s₀ is a little bit larger. And then: the prediction error of the Lasso is λ² s₀ / φ², just like on the board — an equality here, well, of course not exactly, but up to a small-order term. So this says you cannot get away from this compatibility constant if you expect tight bounds. [Question.] Yeah, I'll talk about that one too, but it's more complicated there. Yes — so I have to explain a few things about the conditions; maybe I won't go too much into all the technicalities, so let's first just explain this compatibility a little bit further. This is my notation; I've written it on the board, but I'll do it again.
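The rate in this theorem is easy to probe in simulation. The sketch below is my own, not from the lecture: Gaussian design, unit-variance noise, the usual λ of order √(log p / n), a handmade proximal-gradient Lasso solver, and a generous constant in the final check — the sizes are arbitrary.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=3000):
    """Minimize ||X b - y||^2 / n + 2 * lam * ||b||_1 by proximal gradient."""
    n, p = X.shape
    L = 2 * np.linalg.norm(X, 2) ** 2 / n
    b = np.zeros(p)
    for _ in range(n_iter):
        b = b - (2 * X.T @ (X @ b - y) / n) / L
        b = np.sign(b) * np.maximum(np.abs(b) - 2 * lam / L, 0.0)
    return b

rng = np.random.default_rng(2)
n, p, s0 = 200, 500, 5
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:s0] = 1.0                         # well above the noise level (beta-min)
y = X @ beta0 + rng.standard_normal(n)   # unit-variance Gaussian noise
lam = 2 * np.sqrt(np.log(p) / n)         # the usual tuning parameter
b_hat = lasso_ista(X, y, lam)
pred_err = np.sum((X @ (b_hat - beta0)) ** 2) / n
print(pred_err, s0 * lam ** 2)           # prediction error vs the s0 * lam^2 rate
```

With i.i.d. standard Gaussian rows the compatibility constant is bounded away from zero, so the printed prediction error should sit within a modest constant of s₀ λ² = s₀ log p / n (times the constant in λ).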
So: if you have an arbitrary subset S, you put everything outside the subset to zero in the vector and leave the rest untouched — you can do that for an arbitrary vector — and β_{−S} is the complementary operation. OK, now we do some linear algebra. If you have a symmetric positive semi-definite matrix, you can define its minimal eigenvalue, and if the minimal eigenvalue is positive, you can identify a vector β from the quadratic form — so positivity gives you some identifiability. Now we're in a situation where we have an active set and a non-active set, and then you can define, for the two subspaces — linear combinations of the variables in S, and linear combinations of the variables not in S — their canonical correlation. [Audience: there's a β missing.] Yeah, here's a β missing, thanks — yes, there's a β. So the canonical correlation is defined like this. It's the angle; and if you have an angle between two vectors you can also look at their distance, and the distance is essentially one minus that angle — if they're normalized to length one. [Question about a factor.] Well, maybe I got the normalization wrong, that could be, but I'm just writing this for two fixed vectors x and x̃: if they have length one, then ||x − x̃||² is two minus twice their inner product. Maybe the one-half should be here. And you recover the vector from the difference of those two pieces: β = β_S + β_{−S}, or with a minus, whatever. So now we go from the L2 world to the L1 world.
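The identity the question was about — that for unit vectors the squared distance and the inner product carry the same information — is just ||x − x̃||² = 2 − 2⟨x, x̃⟩. A two-line check (the vectors are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(10)
x /= np.linalg.norm(x)              # normalize to length one
xt = rng.standard_normal(10)
xt /= np.linalg.norm(xt)

lhs = np.sum((x - xt) ** 2)
rhs = 2 - 2 * (x @ xt)              # ||x - xt||^2 = 2 - 2 <x, xt> for unit vectors
print(abs(lhs - rhs) < 1e-12)       # True
```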
In the L1 world you can define similar things. You just look at the minimal eigenvalue — I think I've changed the notation here — but instead of the restriction that the L2 norm equals one, you restrict the L1 norm to equal one; and maybe the one-half is in the wrong place again. You can also look at some kind of L1 canonical correlation — again I forgot a β here — where I replace the L2 norms by L1 norms. And then there is this normalization with √s. Because if you go from the L2 norm to the L1 norm: if the active set S has size s, then the L1 norm of β_S is bounded by √s times its L2 norm. So the L1 norm of something is typically much larger than the L2 norm, and you need some normalization by the size of the set if you want to compare. So, the bigger picture: the compatibility constant you find in the literature is something like a canonical correlation in which you replace the L2 norm by the L1 norm — with an inequality sign in the constraint, and not a one but maybe some other constant. It's something like an L1-type canonical correlation: you compare linear combinations in one space with linear combinations in the other space, and you look at how far they are from each other. And this is the new one, which I've written here, and which gets even more complicated: it's like that one, except that instead of this restriction you have this restriction. And it turns out that's the real thing that plays a role in the lower bounds. And if you look at it — if you change the definition a little bit here and there, you get the restricted eigenvalue, which you might also know; a smaller constant makes the bound bigger.
Very good — so you want it to be close to one, because you're dividing by it: if it is large, that helps; if it is one, you're happy. That's what I'm doing here, but you have to make it a little bit smaller than one if you have to deal with noise, depending on the situation. OK, so I can give you a geometric interpretation. This restriction means the following: you take the signed convex hull of the variables in S, and you take the signed convex hull of the variables not in S — maybe blown up a little bit — and you look at how far they are from each other. That's what this is, and then you get this picture: the active set gives one polytope, all the other variables give the other signed convex hull, and you look at the distance between them — that's the compatibility constant. But this new one is a bit more complicated; it has a more difficult interpretation. So the new one is bigger than the old one, which means you may have gained something, because you're dividing by it; the bound is decreasing in it, so you want it to be large, and if you can take it equal to one, you're happy — here we take one. So now we understand compatibility. And so: we know that the upper bound depends on the compatibility constant, and we can prove that, under the conditions of the theorem I started with, you have this upper bound — and for that you do not need any beta-min condition; you don't need the nonzero coefficients to stay away from zero.
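Since the compatibility constant is a non-convex minimization, there is no off-the-shelf solver, but one can at least probe it by random search. The sketch below is my own construction, not from the lecture: it does this for an orthogonal design, where the exact value is φ² = 1 — every feasible β gives s₀ ||Xβ||²_n ≥ 1, so the random-search value is an upper estimate that can only approach one from above.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, s0 = 64, 8, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X = np.sqrt(n) * Q[:, :p]           # orthogonal design: X^T X / n = identity
S = np.arange(s0)                   # active set (here simply the first s0 indices)

best = np.inf
for _ in range(20000):
    g = rng.standard_normal(p)
    c = np.sum(np.abs(g[S])) - np.sum(np.abs(g[s0:]))
    if c <= 0:
        continue                    # direction violates the sign of the constraint
    b = g / c                       # now ||b_S||_1 - ||b_{-S}||_1 = 1 exactly
    best = min(best, s0 * np.sum((X @ b) ** 2) / n)

print(best)                         # always >= 1; the true minimum phi^2 is 1
```

For a general design the same loop gives only a rough upper estimate of φ̂², which is exactly why the exactly computable examples later in the lecture are valuable.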
This is actually a new result, and this is the corresponding result in the literature: here you have nice remainder terms, and there it's just orders of magnitude, with extra assumptions and so on — so it's a little bit of cleaning up of the situation in the literature. OK. I think I'll just skip this part, except for this bit: if you look at this non-convex optimization problem, what is important is to show that the solution will have nonzeros inside the set S. Let me just give you the geometry of that; it's kind of nice, I think. So here's the L1 ball, and here's β⁰ — maybe you know this picture; it's the picture in the book of Hastie and Tibshirani and so on. If you want to minimize the L2 distance subject to the L1 norm being equal to some value, you hit a corner and you get a sparse solution — that's the idea of the Lasso. But we're in a different situation here. Again the L1 ball — and now, forget about the β_{−S} part, just put it to zero for a moment — I want to minimize the quadratic form subject to the L1 norm being equal to one. So what I'm actually doing is — let me see, I forgot how it goes — the idea is that you're sort of trying to blow up a balloon: if you blow up a balloon in a box, it will never go into the corners, no matter how hard you try — well, not my balloons. So the solution will not be sparse: you blow up the balloon inside the L1 ball, and it will not end up in the corners. You have to do a little bit of an argument here, because in high dimensions you never know whether it really turns out like that — but you do get a solution with nonzeros, which is good for the theory. OK, let me skip this. I don't know about time; let's see.
I'm assuming this quantity goes to zero — that's a little bit more of an assumption, meaning that you have convergence in the first place; and you should have convergence for slightly larger sets as well. And then, for the noiseless Lasso — like on the board, but now with the Gram matrix replaced by its expectation — you get the following result. So I'm comparing the noisy Lasso with the noiseless Lasso. Take your tuning parameter of order √(log p / n); here's a quantity which is going to zero — that's why I put it in green. And then, with large probability, the difference between the prediction error of the Lasso and the prediction error of the noiseless Lasso is of smaller order. So this result says: you might as well look at the noiseless Lasso, because the bias dominates the variance — and you can think of the noiseless prediction error as the bias. What remains is of smaller order. OK, so this difference is of smaller order than the quantities themselves, and I'm just going to forget about it. So the prediction error of the noisy Lasso is like that of the noiseless Lasso, which is like the bias. And the theorem which I just showed you says that this term is of larger order than that term, so you might as well look at the noiseless Lasso to get tight bounds, because that's the main term — and then the two are clearly equivalent. This is a little bit the kind of theory used to prove this kind of result — concentration-type arguments, nothing very new. And then, once you know that, you only have to study the noiseless Lasso: for the prediction error of the noiseless Lasso you can find the expected result, but there you need these beta-min conditions — so that's where you get the tight bounds. [Question.] Yes.
So to prove this last theorem you need the non-convex optimization part, and there I really need very much that the solution has nonzeros — the balloon you blow up in a box never gets into the corners. If you combine the two theorems, you get the lower bound for the noisy Lasso. OK. So this was all about random design; now let's go to fixed design. We considered Gaussian design — what about a fixed design? Again, let's start with the noiseless case. The noiseless Lasso is the one that was on the board, but now with the actual Gram matrix and not its expectation; this is now a fixed matrix, so everything is deterministic here. And then you have to change nothing: you can apply the previous theorem — nothing changes, it's just a different Gram matrix, so you get the same result, but now the definition of the compatibility constant is with this matrix. So for the noiseless case we are done; let's consider an example, and the example is the total variation penalty. So consider a function — well, a function evaluated at n points, so it's really an n-dimensional vector — and the total variation is just the sum of the absolute values of the differences: each jump contributes its size, and so on, for a function evaluated only at n points. And we do this with a total variation penalty, still noiseless. I mean, I have a fixed function which I don't know — it's a bit artificial, the noiseless situation — and I want to recover this function, so I take the residual sum of squares, the whole L2 distance, plus the total variation penalty. So that's like a noiseless Lasso, and it fits: you can always write a function defined on those n points as X times β — that's kind of obvious; let me try to write it here. So a function f: f_i = Σ_{k ≤ i} β_k, say with f_0 = 0, so the β's are the increments.
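The rewriting f = Xβ can be made concrete: X is the lower-triangular matrix of ones, the β's are the increments of f, and the total variation of f is exactly the L1 norm of those increments (excluding the starting level). The jump positions below are arbitrary, my own illustration:

```python
import numpy as np

n = 200
X = np.tril(np.ones((n, n)))         # X[i, k] = 1{k <= i}, so f = X b
b = np.zeros(n)
b[0] = 1.0                           # starting level of the function
b[50] = -2.0                         # a jump of size -2
b[120] = 0.5                         # a jump of size 0.5
f = X @ b                            # piecewise constant function on n points

# total variation of f = l1 norm of the increments (the level b[0] is free)
tv = np.sum(np.abs(np.diff(f)))
print(tv == np.sum(np.abs(b[1:])))   # True

# neighbouring columns of this design are almost perfectly correlated
c = (X[:, 100] @ X[:, 101]) / np.sqrt((X[:, 100] @ X[:, 100]) * (X[:, 101] @ X[:, 101]))
print(round(c, 3))
```

So the total variation estimator is literally a Lasso in the increments, with a design whose neighbouring columns have correlation close to one.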
So these are my vectors: this is my β, and I sum from 1 to i, so I can write f_i = Σ_k β_k 1{k ≤ i}. So that's my X matrix: you get this X matrix, these are the β's, and you have a Lasso problem — it fits in the framework we've considered so far. OK, now there's a bit of folklore going around that the Lasso won't work if the columns of your design are highly correlated. Well, look at this design: terribly high correlations between the columns — correlation meaning inner product; if you take the inner product of two neighbouring columns and divide by n, it's almost one. So huge correlations, and yet this simple example shows the Lasso still works. So I'm going to calculate the compatibility constant. Take a function that is piecewise constant — well, it's only defined on n points, so it's like this, maybe like this, and then like this; this is the point where the first jump occurs, here is the second jump, and so on. And call the distance between consecutive jumps d_j. Then the compatibility constant is equal to this — there's an exact expression for that quantity here, which depends on the distances between the jumps: it's a minimizer over certain vectors, and the minimum involves, more or less, the inverses of the distances between the jumps. So this is an example where you can really calculate the thing. And if you then impose a beta-min condition — the beta-min condition now says that the jumps have to have a certain size. If the distances between jumps are of the same order as n, the jump sizes have to be more or less the noise level; otherwise they have to be larger. And then, for this particular case, you get the bound
λ² times s₀ divided by the compatibility constant — it looks like this. And now look at the scaling: suppose the distances between jumps are all of the same order, n/(s₀ + 1) — there are s₀ jumps, and the value where the function starts is a free parameter, so in that sense it's s₀ plus one free parameters. With equal spacing, the maximum of one over the distances is (s₀ + 1)/n, and if you do the calculation you see that the prediction error is of order the number of jumps squared times log n over n — where log n over n is just λ², since p equals n here. But anyway: if you have only one or two jumps, you get a parametric rate — parametric in the sense of 1/n, with a log of course, because you don't know where the jumps are. And you can extend this to trees: if you have a total variation penalty on a tree, you just, for instance, cut the tree up into paths, do it for each path, and get similar results — so you can do this for more general graphs. OK, now, to finish up a little bit quickly: you can do the noisy case like before, but it gets a bit technical; I'll state it, but maybe it's good not to go into all the technical details. You get something like this: because of the noise, you get some weights inside the compatibility constant.
So here's how it goes. It's a result from the paper of Dalalyan, Hebiri and Lederer, and it's very clever. It says: why do you use a penalty in the noisy case at all? Because there's no choice — the penalty is there to get rid of the noise, to kill the noise. And the amount of noise you have to kill is smaller if you have high correlations. Because here's what you do: you have your active variables and your non-active variables, you project the non-active variables onto the active ones, and what's left over — that's the part of the noise that you have to kill: the inner product of that residual part with the noise. So if there's a lot of correlation, and you project and look at what's left over, there's hardly anything left over, so there's hardly any noise to kill. So here this is all defined: this is the projection — just the usual thing if you do least squares — the projection of the non-active variables onto the active variables, and this is what's left over, the antiprojection. And this is then the covariance matrix of what's left over, and you look at the diagonal elements of that matrix: more or less, to kill the noise you need your tuning parameter to be larger where this quantity is larger, so you put this quantity, without the square, into the weights here. So if there's a lot of correlation, this will be very small, and you put one minus that quantity into it — or something like that, it should be in there somewhere. What I'm saying is: you only need to kill the part of the noise that is left over after projecting on the active set, so if you have high correlations, there's not much to kill. OK, and then the weights stay close to one, and you get almost a constant one here.
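This projection argument can be sketched for the total-variation design: project every column on the span of the active columns and measure the energy of what is left over. The active set below is arbitrary, and this is only my illustration of the Dalalyan-Hebiri-Lederer mechanism, not their actual weights:

```python
import numpy as np

n = 200
X = np.tril(np.ones((n, n)))                 # total-variation design matrix
S = [0, 50, 120]                             # active columns (jump locations), illustrative
XS = X[:, S]

# antiprojection: residual of every column after least-squares projection on X_S
H = XS @ np.linalg.solve(XS.T @ XS, XS.T)    # hat matrix of the active columns
R = X - H @ X                                # what is "left over" of each column
leftover = np.sum(R ** 2, axis=0) / n        # per-column leftover energy
col_norm = np.sum(X ** 2, axis=0) / n        # for comparison: ||X_j||_n^2

print(leftover[S].max())                     # ~ 0: active columns project onto themselves
print(leftover.max(), col_norm.max())        # leftover is a small fraction of the energy
```

Only this small leftover part of each column can pick up noise, which is why the effective weights stay close to one despite the huge correlations.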
So let me just give the result. The upper bound is from this paper — it's also in other papers, but I think the real result is in this one, Dalalyan, Hebiri and Lederer; it's a very nice idea. You see the compatibility constant, and instead of a one they actually have one minus something small, depending on those projections. So you get λ² s₀ divided by the compatibility constant, like before, but now for the noisy case. And the lower bound is similar — I'll skip all the conditions — it looks like this: up to small-order terms, λ² times the sparsity divided by the compatibility constant, but with a plus now instead of a minus. And then you just have to hope that the two are close to each other; that depends on the design. OK. Just to finish: apply this to the total variation penalty in the noisy case — now it gets really interesting, because that's what we actually do: we have a noisy observation of a function and we apply the total variation penalty. You calculate those weights — this is really hard work, I must say; it's done in the paper by Dalalyan et al. — and then you get the following result. The upper bound was known, and for the lower bound you apply the previous results; for this special case you get this. So you get a lower bound and an upper bound that differ only in this term, so you get really quite tight bounds for the total variation penalty — and this term is in general not the dominating one. So what do you see: if you have only finitely many jumps, say three, then this will be the dominating term, and you get a parametric rate — let me put it down again. So this is the one over square root of n, the parametric rate; then there's a log p term, which is a log n here, since p is now n.
So here's a log n term, and then there's an additional log log term: you get the parametric rate times a log log n — not a square root of log n, but a log log, which is a lot better. And if the number of jumps is growing, then this term can become the dominating one. I don't know whether this is the tightest possible result, but at least with the two log terms you can see how tight it is. Of course, this can also be extended to more general graphs — that's joint work that has just appeared on arXiv. So let me give a summary. For random design you need these eigenvalue conditions, but then the bias dominates the variance, and for the bias you have exact expressions for the prediction error under these beta-min conditions. For fixed design there are upper and lower bounds, but you may get a gap because of the constants, and it's not a priori clear how big the gap is — it really depends on the design. OK, let me now acknowledge some people I talked to, from convex optimization and non-convex optimization, who helped me out. Thank you very much for your attention. [Question: what about noise with infinite variance?] Let me think — I don't think you get the same; you get similar bounds, but not with exponential probabilities. That would be my guess, at least if the design is bounded: with bounded X's I could do similar things, but then the probabilities just don't go to zero exponentially fast. [Discussion.] Exactly — you have your noise, which is random, and a factor which is fixed; you just need high-probability bounds for quantities like this, and if there are second moments you can deal with many of them. If there are second moments you can use some kind of — what is it called again —
— I forgot what it's called — Nemirovski-type bounds, also if you only have second moments. So it should work. Thank you.