[00:00:05] >> So Dr. Sundaramoorthi is currently a principal research scientist at UTRC, conducting research in computer vision and machine learning and building products in robotic inspection from this research. Prior to this he was assistant professor of electrical engineering and jointly assistant professor of applied mathematics and computational science at KAUST. He directed a group at KAUST which developed novel mathematics and algorithms, as well as software, for video and image understanding. His fundamental optimization advances have led to advances in [00:00:51] motion-based video segmentation and detection. His group also developed technology for seismic image analysis, electron microscopy images, and medical images. Prior to KAUST he was a postdoctoral associate with Professor Stefano Soatto in the vision lab at UCLA from 2008 to 2010. There he made fundamental contributions to the viewpoint invariance problem in object recognition and developed new algorithms for video tracking. His PhD is in electrical and computer engineering from Georgia Tech, so he is actually our alumnus. His 2008 PhD developed fundamental shape optimization methods for computer vision that aided in technology for video tracking and medical image analysis; he was advised by Professor Anthony Yezzi. His bachelor's degree was in computer engineering and applied mathematics, which he also earned from Georgia Tech. He is currently an area chair for leading computer vision conferences, including CVPR and others, and is [00:02:06] an internationally known speaker, so please welcome him, and thank you for giving this seminar today.

>> So thanks for having me. Today I'm going to talk about new stuff that's going on in deep learning, and I'm going to give you my perspective on it. My title is "Solving the flickering problem in modern CNNs." What do I mean by flickering? Suppose you take a state-of-the-art object detector and apply it to video. You get results that look like this: sometimes the object is there, sometimes it's not; if it detects it, sometimes it goes out of view and comes back into view. And that, of course, is not good.

[00:02:57] This looks like a small, annoying problem that we can sweep under the rug: apply some conventional techniques and we should be able to correct it. But it's actually a bit more fundamental than that, and I want to give you the issues behind it. What I want to say first, though, is why I'm working on this. I'm currently at United Technologies, which basically builds [00:03:28] most of the components for a commercial aircraft: things like the landing systems, cargo systems, and engine systems. A lot of that is requiring automation. For example, in the landing systems you want to be able to detect the runway accurately, and it has to be stable for safety reasons. In cargo systems, if you have automatic cargo systems that pick up crates and position them, everything has to be stable. For automatic inspection systems for engines, if you miss a detection of a defect, that could be disastrous for an airplane when it flies.

[00:04:10] So is this automation just something nice to have, or is it actually a need? It's actually a need. These are the trends in the industry that we're looking at.
So there's a growing urban population. [00:04:30] Most of that growing urban population is the global middle class, and that growing middle class increasingly wants to fly between cities. Most of the population is concentrated in these urban areas, so there's a need for more flights, there's more demand for flights, and so on. And there are a lot of new types of aircraft coming in, [00:04:56] small aircraft, that also have a lot of needs. So there's really a shortage of pilots: currently we need roughly 30,000 pilots a year, the cost of pilots is greater than $30 billion, and if you just look at very simple cases, like entering onto the runway, there are a lot of accidents that happen there — that already accounts for $10 billion per year. So that's a big thing, and what we're working on at UTC is trying to develop cyber pilots: automatic autonomy for flight.

[00:05:43] One of the needs for that is validated and verified perception, and perception is quite a big part of this. What do I mean by validated and verified? Every single component that you put on an aircraft has to go through a very rigorous process: for every input that comes in, you have to provably say that you're going to get the correct output. [00:06:11] So what we need is real mathematical proofs to ensure the reliability of these systems; without that kind of thing, nobody is going to put it on an aircraft.

Alright, so back to this flickering problem. One of the questions I come across — I probably don't need to convince many of you, but one of the issues I come across in, say, the aerospace industry is: why a deep learning solution, can't you use something else? Well, as you know, before deep learning it was kind of painful in the sense that we had to manually engineer features. Deep learning automated that whole feature learning process, which is more convenient, and it really outperformed the existing computer vision technology. So after 2012, when deep learning was demonstrated on the ImageNet benchmark, many people switched to deep learning. Basically, deep learning is the best right now, and what we would like to do is make it more stable.

[00:07:27] If you try the usual tricks to correct this flickering problem — data augmentation, jittering the proposals around a bit — all of those work a little, but they don't solve the main problem. You can try to smooth across time; that doesn't help either, because sometimes it makes things a little more stable, but it also gets rid of true positives, which is not a good thing. So there are a lot of issues here. When I first started this, I asked: well, what about the car industry? They should have this whole thing working.

[00:08:09] This is the output of Tesla's autopilot. You can see that it's pretty good, but you can also see all this flickering: sometimes a pedestrian is there, sometimes he isn't, which is not a good thing when you have an aircraft in the air.
[00:08:29] So what about the academics who work on self-driving cars? It takes a little while for technology from academia to come to industry, so what about the academics working on self-driving cars? They must be on top of this, they must know how to solve this — but you can see... [00:08:50] [video clips shown; audio mostly inaudible] Alright, so the flickering problem seems to be quite pervasive.

So what's the problem? I'm going to claim it's a lack of invariance. This is a study from Professor DiCarlo's lab, published in the Journal of Neuroscience, where they basically [00:09:40] looked at a comparison between human and machine performance. The idea is that you take random background images, you have 3D models of objects, and you render them into the image at various different viewpoints, under various differing illumination conditions, and so on. Then you ask a human to classify [00:10:05] the object, you ask a deep learning system to classify the object, and you measure the performance — there are also primates in the study. You can see that deep learning is good on a number of tasks, but for these kinds of tasks, which are quite representative of what happens when you're dealing with aircraft — you're coming in at an oblique angle, and so forth — [00:10:32] the performance of the latest deep learning neural networks is not very good compared to humans.

So deep learning systems are not invariant, unlike humans, to viewpoint, illumination, partial occlusions, quantization, and so on. You can even fine-tune on these synthetic datasets, but that still doesn't work. [00:10:57] Of course, if you have enough data — whatever that is, maybe trillions of labeled images — and you have very huge clusters and all that, perhaps you're going to increase the performance, but in industry we don't really have time to wait for all that, so we need to move ahead anyway.

[00:11:20] So that was synthetic images, but there's a recent paper from 2019 which just looked at video: you apply your detector or classifier to one image, you go a few frames ahead, and all of a sudden the classification changes — it goes from domestic cat to monkey just a few frames ahead, and the changes you see are almost imperceptible.

[00:11:47] This is the conclusion of that paper: their analysis demonstrates that perturbations occurring naturally in videos pose a substantial challenge to deploying convolutional neural networks in environments that require both reliability and low-latency predictions.

And this is an even more surprising case. This was a paper by Azulay [00:12:13] and Weiss, where they took state-of-the-art convolutional neural networks trained on ImageNet and noticed that if you shift the image, the predictions can vary drastically. It was a systematic study, and they showed that for up to around 30 percent of images a modern CNN's prediction [00:12:39] can change.
So that's kind of surprising, because what we learned with convolutional neural networks is that convolution is supposed to be well behaved with respect to translation, and then there are these pooling operations which are supposed to give you some invariance to translation, but clearly this shows that that's not the case.

[00:13:00] This whole idea of invariance has always been a big problem in computer vision; it's nothing specific to deep learning, and there's a long history in this area. I want to briefly mention some of that history — it's been kind of a roller coaster. In the eighties and nineties it was the big thing in computer vision: everyone was in search of these invariants. Then the literature in the nineties suggested that what people were searching for did not exist. There was a paper by Riseman and colleagues which showed that general viewpoint invariants don't exist, and a later paper showed that illumination invariants don't exist either. So this looks like bad news, but if you consider the problem in a little more [00:13:52] detail, under certain constrained models of illumination you can show that invariants do exist. And the nonexistence of general viewpoint invariants led to the surge of local invariant features, where you don't search for a fully general invariance but only for a very small, restricted set of viewpoint transformations. There's some work I did along with Stefano Soatto where we showed that if you look at viewpoint invariance of the photometry, rather than the geometry of the 3D structure, you can show that [00:14:34] these invariants do in fact exist. At the time we didn't really have a very practical system, so that kind of died down. Right now there are of course people working on this invariance problem, but the large majority are not.

So what is going to happen in 2020? Well, I hope that we revisit invariance in terms of deep learning, and today is going to be a very, very small step in that direction. We're going to look at solving the most basic invariance: [00:15:17] the lack of translation invariance in CNNs. I should say that a lot of the methods I'm going to show are quite simple, but I think putting them together in a framework is quite interesting and shows something.

I want to make clear what I mean by invariance, because people have different versions of what invariance is. What I mean here is that if I move the camera, [00:15:49] an in-plane translation of the camera corresponds to a shift of the image. What I want in an invariant descriptor is some representation of the image which does not change as you translate the camera. [00:16:11] That's different from something called covariance, where a translation of the image corresponds to a translation of the feature representation of the image. That's a different notion; it's called covariance, and some people call it equivariance. [00:16:37] I'm going to mainly talk about invariance, but I'll mention covariance at some point.
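As a compact restatement of the two notions above (the shift-operator notation T_v is introduced here only for illustration; it is not taken from the slides):

```latex
(T_v I)(x) = I(x - v) \quad \text{(image shifted by } v\text{)}

\text{invariance:}\qquad \Phi(T_v I) = \Phi(I) \qquad \forall v

\text{covariance / equivariance:}\qquad \Phi(T_v I) = T_v\,\Phi(I) \qquad \forall v
```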
Okay, so before we get to how we get this translation invariance in deep networks, I'm going to go back and ask: how do we get invariance at all? How do you construct an invariant descriptor or feature anyway? [00:16:59] Here's the process — a very, very simple idea, but this is how you do it. We're going to treat the image as a function from R^2 to R, and assume there's an infinite field of view of the camera, so you get these images from R^2 to R. In one dimension I'm going to depict the image like this: for illustration, the image is one little impulse function, one little dot in the image. If I translate the image — I'll use this notation here for the translated image — I essentially shift the image, say by 2, [00:17:43] shift to the left by 2. Now what you can do is something that looks a little bit stupid, but bear with me. You translate the image by pixels, and you consider all of those translations of the image: here's the original image, you translate by one pixel, two pixels, and so on, and also to the left. Now I'm going to sum all of these translates, and that of course is the invariant. Why? Well, if I sum all of these, what am I going to get? I'm just going to get this constant — all of these ones. What's the proof? It's a very simple change of variables — I probably have to cut this short and not go through the details, but it's a very simple change of variables — and the sum of these is just this constant. You can see that if I shift this thing I'm going to get the same thing, so it is invariant to shift. In practice I don't need to physically translate the image by all possible translations: [00:19:01] I can do it equivalently, because this value here is just the sum of all the pixels. That's the proof, and again it's a change of variables: instead of considering all possible translated images and summing them, I can equivalently just sum all the pixels in the image, and then at each location put the average or the sum value there. So this says that there's an equivalence between pooling over the translation group and spatial averaging.

[00:19:38] So what's the problem here? Can we just use this average value as a feature? Well, if you know anything about computer vision you'll say that's a bad idea. Why? Because I can have these two images with two different average values, but if I normalize the images so that they have the same average, I get something that looks exactly the same here. So if we were to use the average value as a discriminator, in this case we would say that these two different images — two different objects — are the same object, and that's not a good idea. [00:20:21] So the average value is invariant to translation, but it's a poor classifier. Why did I tell you about this? Well, we can build on that fact a little bit.
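As a quick illustration of the group-averaging argument above, here is a minimal NumPy sketch (mine, not from the talk), using circular shifts to stand in for the infinite-field-of-view assumption: pooling over the whole translation group gives the same constant feature for an image and its shift, and that constant is just the sum of the pixels.

```python
import numpy as np

rng = np.random.default_rng(0)
I = rng.random(16)                      # a 1-D "image"

def shift(x, v):
    # circular shift stands in for the infinite field of view
    return np.roll(x, v)

def pool_over_translations(x):
    # sum the image over every possible shift (pooling over the translation group)
    return sum(shift(x, v) for v in range(len(x)))

J = shift(I, 2)                         # the translated image

print(np.allclose(pool_over_translations(I), pool_over_translations(J)))  # True: invariant to shift
print(np.allclose(pool_over_translations(I), I.sum()))                    # True: equals the pixel sum
```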
[00:20:42] Instead of averaging over all possible translations, we can average over a small, limited set of translations, so I just sum like this. What does that do? Suppose I have an image that looks like this, and this is the image translated by, say, one pixel. If I compute this feature, it looks something like this, where the height is just 1 over N. [00:21:07] Now if I look at the shifted image and compute the same feature, I get something that looks like that. If I take the absolute difference between the two raw images and sum it, I get 2, whereas for the two features I get 2 over N. So, as you can see, this feature is a little less sensitive to translation than the raw image is; we can still get some insensitivity to translation [00:21:41] by doing this limited average. What I mean by insensitivity, mathematically, is that the difference between the feature and the feature of the translated image can be bounded by a constant — one that doesn't depend on the image — times the size of the translation. [00:22:04] The point here is that in practice we don't need a fully invariant descriptor, so long as the classification doesn't change. And this idea is not new; these ideas have been around throughout the whole generation of local invariant descriptors. In fact, there's a paper from around 2000 on geometric blur which already had these kinds of ideas.

[00:22:37] So back to deep learning. The question is: can we just blur the image and then input it into our convolutional neural network? The answer is no, because after the first layer you may no longer have any guarantee of that insensitivity. [00:23:04] We have to be a little bit smarter, so what we're going to do is enforce the translation insensitivity throughout the network by enforcing that the convolutional kernels are smooth kernels. The question then is: if we just hand-code these filters, is that going to compete with these learned approaches? That's where the Hermite approximation theorem comes in: you can take whatever kernel you have and approximate it by a linear combination of derivatives of a Gaussian. So rather than learning the pixel values of the kernel, we're going to learn those coefficients. [00:23:50] There's also some literature on human vision which theorizes that the human visual system is doing something similar to Gaussian derivatives, and that it doesn't use too many derivatives — between 2 and 4 is where the majority of the information is contained. [00:24:16] So each of these Gaussians and its derivatives leads to translation-insensitive kernels.
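A rough 1-D sketch of this kernel parameterization (an illustrative approximation, not the speaker's implementation): the kernel is a learned linear combination of a sampled Gaussian and a few of its derivatives, so smoothness — and hence insensitivity to small shifts — is built in regardless of the coefficient values.

```python
import numpy as np

def gaussian_derivative_basis(sigma, max_order, half_width):
    """Sampled Gaussian and its first `max_order` derivatives on an integer grid."""
    x = np.arange(-half_width, half_width + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    basis = [g]
    for _ in range(max_order):
        basis.append(np.gradient(basis[-1], x))      # numerical derivative of the previous entry
    return np.stack(basis)                            # shape: (max_order + 1, 2*half_width + 1)

sigma = 2.0
B = gaussian_derivative_basis(sigma, max_order=4, half_width=8)
w = np.array([0.5, -1.0, 0.3, 0.0, 0.1])              # stand-ins for learned coefficients
kernel = w @ B                                         # the smooth kernel used for convolution

# Smoothness means a one-pixel shift barely changes the kernel (this drives the insensitivity bound).
print(np.abs(np.diff(kernel)).max())
```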
So, as I said, rather than learning the values of the kernels, we're going to use this Gauss-Hermite approximation. This is how a layer of a convolutional neural network looks: you have the input from the previous layer, there's a convolution with a kernel — the trend in more recent work is to use 3x3 kernels most of the time — then there's a nonlinearity, a rectification that takes the max between the output and 0, and then there's a way to reduce the image down to something [00:24:58] smaller in size, which could be subsampling, pooling, whatever you want to call it. You can think of this 3x3 kernel as a linear combination of basis functions, where the basis functions are just indicator functions. All we're doing is changing that basis to the Gaussian and its derivatives, [00:25:28] and that's really the only change — and with this we can actually show some insensitivity.

I want to mention some related work — I'll be quick because I started late, but there are ideas already in the literature that motivated this. There's the work on scattering transforms by Stéphane Mallat, which designs wavelet-based hierarchical feature transforms to get insensitivity to translation and [00:26:07] deformation, but none of it is learned. There's some other recent work from 2016 which actually uses these Hermite functions, but their motivation is to reduce the number of parameters in the network, and they don't analyze sensitivity to translation. And there are several other works using this basis for various other motivations.

So this is our result. This is a [00:26:36] layer of the network, exactly what I showed you before, and what we can prove is a bound like this: if I have a stacked version of these layers, and then do average pooling at the end, [00:27:06] I can bound the difference between the feature and the translated feature by a term of the form one over sigma, times a Lipschitz constant less than one raised to the nth power, times the product of the weight norms, times the size of the feature, times the translation amount. What's happening here is that as sigma gets large, layer by layer this gets more and more insensitive to translation. [00:27:32] I have to point out that this also depends on the training process, since these weights are learned, so we have to add some weight regularization so that this doesn't go out of control.

We can also generalize this to deformations. Deformations look something like that, and the reason they're interesting is that viewpoint changes of the camera correspond to deformations, or diffeomorphisms, [00:28:06] of the domain, and also 3D articulations and things like that correspond to transformations of the domain.
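Pulling the pieces above together — the fixed Gaussian-derivative basis plus learned coefficients inside a convolution layer — here is a hedged PyTorch sketch. The class name, the separable second-order basis, the kernel size, and the direct (non-FFT) convolution are my assumptions and simplifications for illustration, not the GaussNet implementation described in the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_derivative_basis_2d(sigma, size, max_order=2):
    """Fixed 2-D basis: separable products of a sampled Gaussian and its derivatives."""
    x = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-x**2 / (2 * sigma**2))
    g = g / g.sum()
    derivs = [g]
    for _ in range(max_order):
        derivs.append(torch.gradient(derivs[-1])[0])       # numerical derivative of the previous entry
    basis = [torch.outer(dy, dx) for dy in derivs for dx in derivs]
    return torch.stack(basis)                                # (n_basis, size, size)

class GaussBasisConv2d(nn.Module):
    """Convolution whose kernels are learned linear combinations of smooth basis filters."""
    def __init__(self, in_ch, out_ch, sigma=2.0, size=9, max_order=2):
        super().__init__()
        basis = gaussian_derivative_basis_2d(sigma, size, max_order)
        self.register_buffer("basis", basis)                 # fixed basis, not learned
        self.coeffs = nn.Parameter(0.1 * torch.randn(out_ch, in_ch, basis.shape[0]))  # learned
        self.padding = size // 2

    def forward(self, x):
        # Assemble the smooth kernels from the basis, then convolve as usual.
        weight = torch.einsum("oib,bhw->oihw", self.coeffs, self.basis)
        return F.conv2d(x, weight, padding=self.padding)

layer = GaussBasisConv2d(3, 16)
print(layer(torch.randn(1, 3, 32, 32)).shape)                # torch.Size([1, 16, 32, 32])
```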
Alright, so let me get into some empirical demonstration of this idea. What we did was take, say, ResNet — one of the most popular convolutional neural networks for image classification — take the 3x3 kernels in that deep network, keep everything else the same, [00:28:34] and change those kernels to this Gauss-Hermite approximation. We call this GaussNet. In terms of implementation, it's not very hard: all of this is supported by the current backpropagation tools in PyTorch and [00:28:58] TensorFlow and so on. The main point is that we have to implement these convolutions with FFTs, but that is supported in these packages for the most part.

We tested things on the dataset CIFAR-10. This is a typical dataset: 60,000 images, 50,000 training and 10,000 test, and the task is classification. This is how we evaluate: we take the test images, we shift them by one pixel in each of these particular directions, and we measure the probability of change [00:29:37] in the classification. We also measure the probability of change for at least one of these shifts — so two different metrics, and both should be low.

There are some different shifting conventions, because when you shift the image there's always the question of what you do at the border. In one of our protocols, we fill in these values with the corresponding pixel values from the original image. In the other, we take the image, downsample it to 30x30, and then zero-pad it so that it's a 32x32 image; we train on these images, and when we shift, we just shift in the regular way and then pad. So those are two different ways of doing this. [00:30:31] There was also something similar in a paper that was already published.

This is what the graphs look like during training. These are different epochs during training, and these are the sensitivity scores. For now just pay attention to this red line and this blue line; you can see the sensitivity decreases — I'll explain what anti-aliasing is in a few minutes. [00:31:00] You see a similar pattern with the second measure, and you get similar patterns on this other, zero-padded version of the CIFAR-10 dataset. In terms of accuracy, pretty much all of the models get very similar test accuracy, [00:31:26] but the GaussNet is much less sensitive. Here are the final results: you can see that the test errors are all very similar. We tested various incarnations of the ResNet — an 18-layer version versus a 50-layer version — and you can see that the sensitivity is reduced. This "no subsampling" entry is [00:31:50] performing no subsampling; if you perform no subsampling, theoretically it should be fully translation invariant, so you would think this should be zero, but there are all these edge effects that come into play, so it's not perfectly zero.

In terms of architecture sizes, pay attention to these two: the GaussNet is actually smaller in terms of memory cost, because there are fewer parameters — we have 6 parameters instead of the 9 parameters of the typical 3x3 convolutional kernel. [00:32:31] In terms of speed, training speed is roughly the same, maybe 2x more or something like that for the same number of epochs, but inference is a bit more, about 7x. We haven't really optimized anything; all we did was take a ResNet, whose design was chosen for different reasons, and replace exactly that one thing, to demonstrate the idea.
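A minimal sketch of the two sensitivity metrics just described, assuming a PyTorch classifier and NCHW image tensors. The wrap-around shifting via torch.roll is my simplification; the talk's two protocols handle the border differently (filling from the original image, or zero-padding a downsampled image).

```python
import torch

def shift_sensitivity(model, images, shifts=((0, 1), (0, -1), (1, 0), (-1, 0))):
    """Per-shift flip probability, and the probability that at least one 1-pixel shift
    changes the predicted class (both should be low for an insensitive network)."""
    model.eval()
    with torch.no_grad():
        base = model(images).argmax(dim=1)
        changed_any = torch.zeros_like(base, dtype=torch.bool)
        per_shift = []
        for dy, dx in shifts:
            shifted = torch.roll(images, shifts=(dy, dx), dims=(2, 3))  # wrap-around 1-pixel shift
            flips = model(shifted).argmax(dim=1) != base
            per_shift.append(flips.float().mean().item())
            changed_any |= flips
    return per_shift, changed_any.float().mean().item()
```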
[00:32:59] How much time do I have? 15 minutes? Okay. So I should say that this whole issue is not completely resolved, in the sense that the bounds we have are based on sigma: the higher you choose sigma, the more insensitivity you get, but how do you really choose that parameter? If you choose sigma large, it also reduces the test accuracy, because you have very coarse-scale features that blur out everything. [00:33:37] So that's not resolved.

I'm going to skip the proof due to time, but I should say that the proof of insensitivity is not difficult; it's based on very simple calculus estimates. The main idea is that when you represent the kernel as a weighted combination of these Gaussians, the Gaussians are smooth, and when you calculate the estimate you can show that it depends on the difference between the Gaussian and the shifted Gaussian, [00:34:07] which you can bound by this kind of term, and then when you layer it, the bound propagates through as well. So it's not that difficult.

[00:34:27] What remains here is: why are conventional CNNs not insensitive? Well, that's because of this: if you look at the estimates, you need the difference between the kernel and the shifted kernel to be small, and in practice nothing enforces that. [00:34:52] When you train, you don't typically enforce smoothness of the kernel. You can get a feel for this: if you look at the kernel and the shifted kernel and take the difference, it's going to be high because there's no smoothness, and then over here you're going to have edge effects.

[00:35:11] I won't go into too much detail, but you can try to correct this by doing a pooling at the end of each layer; however, if you subsample, that breaks it. What you can show is that the difference between the convolution output and the shifted convolution output looks something like this, and because there's no smoothness — typically no guarantee of smoothness of the incoming input, nor of the kernel — this can actually be quite large. So that's the reason.

[00:35:59] There's another approach in the literature, a so-called anti-aliasing approach, where the idea comes from signal processing principles. When you take a signal processing course, the slogan is that you should not subsample without [00:36:24] anti-aliasing, that is, without low-pass filtering first. That's oftentimes ignored in convolutional neural networks. Low-pass filtering preserves an approximate version of translation covariance — that is a very old result — and [00:36:53] within our framework we can also show that this is translation insensitive, so it works out in terms of being insensitive. The idea there is that it keeps the sensitivity of the original kernel and then does anti-aliasing to undo the sensitivity at the end.
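To make the anti-aliasing idea concrete, here is a small PyTorch sketch of my own; the 3x3 binomial filter is an assumption, and this is not the specific published method referred to in the talk. The point is simply: low-pass filter each channel before subsampling instead of subsampling directly.

```python
import torch
import torch.nn.functional as F

def naive_downsample(x, stride=2):
    """Plain strided subsampling, with no low-pass filtering first (what the slogan warns against)."""
    return x[:, :, ::stride, ::stride]

def blur_downsample(x, stride=2):
    """Anti-aliased downsampling: blur each channel with a small binomial kernel, then subsample."""
    k1d = torch.tensor([1.0, 2.0, 1.0])
    k2d = torch.outer(k1d, k1d)
    k2d = (k2d / k2d.sum()).view(1, 1, 3, 3).repeat(x.shape[1], 1, 1, 1)   # one blur kernel per channel
    return F.conv2d(x, k2d, stride=stride, padding=1, groups=x.shape[1])

x = torch.randn(1, 8, 32, 32)
print(naive_downsample(x).shape, blur_downsample(x).shape)   # both (1, 8, 16, 16)
```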
But typically, when we're designing convolutional neural networks, it's not the case that we're trying to reconstruct the signal. What we're really trying to do is classification: we have this very big image, millions of pixels, and we're trying to reduce it down to something very small, so the idea is not really to reconstruct things. [00:37:32] So might there be more direct ways to get translation insensitivity than doing this kind of anti-aliasing?

In our work we're not so much trying to prove or disprove that one approach is better than the other; we're trying to derive principles and explain things. But we did come across a case where the anti-aliasing approach is much worse, and that's the case of this shifting-and-zero-padding protocol. I can't say I fully understand why that's happening, but in this case the new information you're introducing and removing is [00:38:21] a more drastic change than in the other case, where the information you're removing and adding is essentially redundant, and these 3x3 kernels are very sensitive to those kinds of small corruptions. This may look like a very idealized case — why do we care about it? Well, in practice, when you translate the camera there are always issues of visibility: something comes into view and something goes out of view. [00:38:54] So we have to be not only invariant to translation but also robust to these kinds of occlusion phenomena, and that has to be thought about in the design as well.

Alright, I'm doing well on time. So I would like to say that we took the whole backbone, put it into a state-of-the-art object detector, and solved this whole flickering issue, but [00:39:22] this is ongoing work, so maybe in a few months we'll have all of that done.

To summarize the talk: current CNNs lack many human-intuitive invariances; current CNNs lack invariance to translation even though convolution and pooling are used, [00:39:50] and this is attributed to the lack of smoothness enforced in the kernels. CNNs can be made translation insensitive by enforcing smoothness via a Lipschitz-smooth basis. And issues remain — this is not a completely solved problem. As I said, there's the dependence on scale: [00:40:16] the larger you choose it, the more insensitive, but then there are also issues of accuracy. Also, something we didn't really explore is that we can enforce some of this during training, since the sensitivity depends on the weights that are learned, so you can try to enforce some of that during training as well.

[00:40:34] The big step to move this forward is to figure out invariance to viewpoint in networks — that's a big step. For the typical things like in-plane rotations and scaling, there are ideas out there; it's not that big of an issue.
[00:40:55] I'm not going to say it's trivial, but there are ideas out there. For viewpoint, there really aren't that many ideas, and that's where we're going — we think this is a big step. There's a whole literature on adversarial robustness, where you can prove robustness of very small networks to small additive perturbations, which is a good step, but when you're talking about vision applications you also have to consider domain disturbances. We now have something that can handle these domain disturbances, so we can actually use some of these verification methods to prove that you have robustness — provable robustness. That's our next step going forward.

[00:41:47] So that's all I have. This is just getting going — I'm just getting started on this — and we're always looking to hire, so if you're interested, let me know. Thank you.

>> [inaudible audience question]

>> Yes, that's what I was talking about here. That's something to be tested empirically, but if you just shift by a small amount it's kind of similar to this case. [00:42:38] Those are small corruptions, and since we have these large kernels they're pretty robust to those small corruptions, so it shouldn't change too drastically. But if you have a very large occlusion, [00:42:57] we can't say anything about that.

>> [inaudible audience question about projecting a pretrained network onto the basis and using it without retraining]

>> So we did something similar, in the sense that we took a regularly trained ResNet, took the coefficients, projected them onto the basis, and then used that to see what the outcome was. It wasn't too good; you have to retrain things. In terms of just training the last layer, we didn't try that, and just retraining the last layer probably won't work. [00:44:00] These things seem to be adapted to the basis. So the only difference between what you're saying and what we did seems to be retraining just the last layer — [00:44:23] we didn't try that, we retrained the whole thing. Okay, I can try that.

[00:44:55] >> [inaudible audience discussion about training regimes and convergence properties]

[00:46:01] >> Okay, so let me make sure I understand — you're asking why they use a 3x3 kernel rather than something bigger?
>> Well, the main reason — they're not doing this for no reason — is that the 3x3 is partially for efficiency. [00:46:29] No, I mean parameter efficiency: with 3x3 there are only 9 parameters. Initially, with, I think, AlexNet or whatever, it was 11x11 in the first layer, and then progressively it got more painful to train, so people used smaller and smaller supports. [00:46:52] The difference here is that we do have these larger-support kernels, but we only have a few parameters, because we're just taking a linear combination of the basis. Yes, the support has to be bigger — these are not 3x3. [00:47:17] It's not a direct convolution; we have to implement things with FFTs, which slows things down a little bit, and surprisingly FFT has bugs in PyTorch distributed training. We did all the training, but once you start using AWS and distributed PyTorch training, there are some bugs that are still being worked out between AWS and Facebook.

[00:48:01] >> What's the size of the feature map?

>> For each feature map, the support of the kernel is the size of the feature map.

>> [inaudible question about textured backgrounds]

>> No, I mean — as long as you train on those kinds of textured backgrounds, as long as you train with various different textures in the background, [00:48:42] there shouldn't be any type of problem there.
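The Q&A above mentions implementing the large-support convolutions with FFTs. As a generic illustration of that trick (my sketch, not the speaker's code), here is a circular FFT-based convolution in PyTorch; it assumes periodic boundary handling and a single 2-D kernel broadcast across all channels.

```python
import torch

def fft_conv2d(x, k):
    """Circular 2-D convolution of an NCHW batch with one 2-D kernel, via the FFT.

    Useful when the kernel support is large: the cost no longer grows with the kernel size.
    Assumes periodic (wrap-around) boundaries, unlike standard zero-padded convolution.
    """
    H, W = x.shape[-2:]
    X = torch.fft.rfft2(x)
    K = torch.fft.rfft2(k, s=(H, W))          # zero-pad the kernel up to the image size
    return torch.fft.irfft2(X * K, s=(H, W))  # pointwise product in frequency = circular convolution

x = torch.randn(2, 3, 32, 32)
k = torch.randn(17, 17)                        # a large-support kernel
print(fft_conv2d(x, k).shape)                  # torch.Size([2, 3, 32, 32])
```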