[00:00:05] >> Well, OK, thank you. First I want to mention that these slides will always be on my web page, so you can copy the link and find the slides there in case you are interested. The second thing is a small disclaimer: starting with this lecture it will be a bit more mathematical. Last time was a sort of overview and introduction, and today we will dive into the theory.

[00:00:49] We will mainly do theory from the nineties. I think this is very important because it lays the foundation for what deep learning did later, and starting next week we will come to deep networks. Today is mainly about shallow networks. So again, what is a shallow network? Let me put it on the board so you can always see it, because everything is related to it. These are just functions which are linear combinations of

[00:01:21] these units, and we have some interesting parameters here. M is the number of units in the hidden layer; those are the parameters that we fit, and this is the activation function that we pick before we do the learning.

[00:02:00] So what people were then studying is: how large is this function space? Which functions can you generate by writing them in this form? Once you have figured that out, you can try to get something on the rates of convergence, and rates of convergence are always in terms of the number of parameters, which is essentially the number of units. So you want to see how fast you can converge to a smooth function if you let M grow.

[00:02:39] And if you would like an exercise: a function that is not so easy to approximate is multiplication. You just take two inputs and multiply them, so you just multiply two numbers, and you can wonder: can I somehow approximate multiplication by functions of this form? If you think about it, you see that inside σ this is a linear function in x, so here you can build something, and then you have the possibility for the nonlinearity to do something, but it seems very far from multiplying two numbers. So you can wonder whether this function can be approximated at all; try it yourself.

[00:03:29] OK, so in total: these w_i are vectors in R^d, and then we have real numbers here, so we have d + 2 parameters in each of these terms, and M(d + 2) real parameters in total that you can choose. Moreover, you can see that these function spaces are nested: if I make M larger, say M', then we get additional terms, and this just increases the function space.

[00:04:07] And the thing people started with, as a purely approximation-theoretic idea, is to answer the question whether it is true that any continuous function on, say, [0,1]^d can be approximated with respect to the uniform norm by a shallow network if we make M large enough. That means that for any continuous function and any ε we can find an M such that the shallow network approximates this function up to a uniform error of order ε. And now you can wonder: is that true or not?

[00:04:56] The thing is, it depends, and it depends on the activation function σ.
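To make the board formula explicit, here is a minimal LaTeX reconstruction of the network class and the approximation question as described above; the symbols f_M, w_i, b_i, c_i are my own naming, reconstructed from the spoken description rather than copied from the board.

```latex
% Shallow network: a linear combination of M hidden units with activation \sigma.
f_M(x) \;=\; \sum_{i=1}^{M} c_i\, \sigma\!\bigl(\langle w_i, x\rangle + b_i\bigr),
\qquad x \in \mathbb{R}^d,\quad w_i \in \mathbb{R}^d,\quad b_i, c_i \in \mathbb{R},
% i.e. d + 2 parameters per unit, hence M(d + 2) parameters in total.

% Universal approximation question: does it hold that for every
% f \in C([0,1]^d) and every \varepsilon > 0 there exist M and parameters with
\sup_{x \in [0,1]^d} \bigl| f(x) - f_M(x) \bigr| \;\le\; \varepsilon \,?
```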
First of all, what you should see is that the one-dimensional case is much easier. If you think about shallow networks in one dimension, then the w_i become just numbers as well, and we simply

[00:05:22] compose σ with affine functions. And if you take, for instance, σ to be the ReLU function, max(0, x), then you see that in one dimension, if you use the ReLU, the functions you can build are precisely the continuous piecewise linear functions.

[00:05:45] So all you have to study is whether you can approximate any continuous function by piecewise linear functions, which is obviously true. So the one-dimensional case is much easier. In two or higher dimensions it is much harder, and this reflects the point that with two inputs you can do many things, like multiplication, which are not so easy to approximate with functions of this form.

[00:06:20] So from the early nineties until the mid-nineties there were many different proofs of this universal approximation theorem, and they used different ideas, mainly based on the Fourier transform, the Radon transform, and also on the Hahn–Banach theorem. These different proof techniques tell us a bit about how a shallow network approximates functions, and therefore I want to discuss several of them. The only one that I will not discuss is the Hahn–Banach one, because it is a completely abstract proof; the other ones are more or less constructive, and therefore I want to discuss them a bit.

[00:07:05] [Question from the audience.] Yeah, no, I just give you this as a problem to think about. So let's consider this function, and now I give you the task: approximate this function by a function of this type, where M is, say, fixed and σ is, say, the ReLU activation function. How do you do this approximation? Even then, in two dimensions, it is not so easy to approximate multiplication with such a network,

[00:07:46] because inside you have something linear, just a linear combination of x_1 and x_2, and the nonlinearity only comes in through the σ here. So you can try it; a small numerical sketch follows at the end of this section.

[00:08:12] [Question from the audience.] Ah, OK, good point. So universal approximation means doing this for a fixed σ. For a large class of σ's this is true, but there are also some σ's for which it is not true; I will come to that. Since there is no theorem yet, this is just a statement, so that's a definition: we say the universal approximation property holds for shallow networks with activation function σ if this is true. The conditions on σ under which it holds will come later.

[00:08:39] OK. All right, so how do these proofs work? Many of the proofs that exist, they...
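As an aside to the multiplication exercise posed above, here is a small numerical sketch; this is my own illustration, not the lecture's method. It approximates f(x1, x2) = x1 * x2 on [0,1]^2 with a shallow ReLU network of exactly the stated form, taking the shortcut of drawing the inner weights at random and fitting only the outer coefficients by least squares.

```python
# Sketch (assumption: a random-features shortcut, not the lecturer's approach):
# approximate f(x1, x2) = x1 * x2 on [0, 1]^2 by a shallow ReLU network
#   f_M(x) = sum_i c_i * relu(<w_i, x> + b_i).
import numpy as np

rng = np.random.default_rng(0)
M, d = 200, 2                  # number of hidden units, input dimension

# Training points on [0, 1]^2 and target values x1 * x2.
xs = rng.uniform(0.0, 1.0, size=(5000, d))
ys = xs[:, 0] * xs[:, 1]

# Random inner parameters: w_i in R^2, b_i in R (kept fixed, not trained).
W = rng.normal(size=(M, d))
b = rng.uniform(-1.0, 1.0, size=M)

def features(x):
    """Hidden-layer activations relu(<w_i, x> + b_i), one column per unit."""
    return np.maximum(x @ W.T + b, 0.0)

# Fit only the outer coefficients c_i by linear least squares.
c, *_ = np.linalg.lstsq(features(xs), ys, rcond=None)

# Estimate the uniform error on a fine test grid.
g = np.linspace(0.0, 1.0, 101)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, d)
err = np.abs(features(grid) @ c - grid[:, 0] * grid[:, 1]).max()
print(f"M = {M}, sup-norm error on the grid: {err:.4f}")
```

With a few hundred units the sup-norm error on the grid comes out small, consistent with the universal approximation property, yet the lecture's point stands: nothing inside σ is multiplicative, so the approximation arises entirely from combining many piecewise linear ridge functions.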