Thanks for having me; it's been fun to visit. Thank you. So: understanding complex, noisy data streams is obviously a really important part of vision, and of cognition in general. Parsing the "blooming, buzzing confusion" of the world is a really challenging piece of figuring out how to deal with the world. Before I go forward I should say that I definitely would like questions in the middle of the presentation, so stop me any time. I would much prefer not to get through my slides but to actually communicate, rather than just talk at you for an hour or so. There are slides and some prepared text, but please just interrupt whenever it's relevant. So, as I was saying, parsing these really noisy data streams is very important. You look at that image and you can tell pretty quickly that there are two cars, one in front of the other, on a field in front of mountains, and it's much faster for you to figure that out than for me to say it. And you can do this despite a huge amount of variation: position, pose, size, illumination, distortion, noise, background variation, scene variation. Presumably nobody here has seen this image before, but you can still tell it's a car; even if you don't know what an Alfa is, you can still tell it's a car. So there's obviously a lot of variation that makes this problem challenging, and this is true in other domains too, not just vision. You look at that and it's hard for you to tell what it says (most of you weren't here earlier when I was testing it). It says "Hannah is good at compromising," and you could tell that even though you probably haven't heard that particular sentence before, with that speaker identity, and so forth. What's really going on is that, in some sense, an explicit representation is being constructed, where somehow, under some objective function, the categories of interest are being pulled apart.

You really need two things to make that happen. You need selectivity for different objects: you need to tell when it's one car versus, you know, a boat or something. And you need tolerance for changes to the input. It's computationally easy to have either one of these things alone (templates or invariants get you either of them), but together it's really hard. One way to visualize this: you have a population representation of an object along some kind of manifold, say an object-identity manifold, a face transforming through space. A good representation has the property that two different faces will be separated across all those different transformations. A bad one would look like this, and the real one, on the pixels, looks like this. In some sense the problem is that they're all jumbled up together; there's this tremendous tangling of the factors of interest. Another way to think about it is that the natural physics axes of the world, like retinal photoreceptor voltages or hair cell deflections, the sensor axes of the different devices: things that are straight in those axes are really not straight in the axes of natural behavioral events, like a face deforming or moving through a complex environment. The two coordinate systems are totally misaligned. So that's one reason why problems of this kind are computationally hard: there's a nonlinear misalignment. But it also needs to be done fast, which is another reason the computation is hard. You look at those images and figure out what's going on, and you know what the stuff is WAY faster than you can say what it is, and you can do it while you're listening to me talk. Nonetheless, that processing has to happen somehow pretty quickly, in parallel. In fact, if you think of the core object recognition regime as roughly one hundred to two hundred milliseconds of looking at an image, humans are significantly above chance, practically maxing out their performance, even at that very short duration.

So the core idea is that cortical brain tissue is effectively doing some kind of computation to go from this representation to this representation, and in particular that it somehow untangles those very tangled-up structures. And it's well known from a lot of neurophysiology and anatomy that those processes are likely sensory cascades: a series of fairly simple operations that nonetheless, when strung in series, do this very complicated nonlinear untangling. Now, putting a bunch of neuroanatomic, architectonic, and latency evidence together, what you see in the visual system (this is the ventral visual stream) is the idea that a lot of object recognition and other higher-level visual behaviors may be read out roughly at the top of a sequence of processing stages, where maybe each stage is roughly ten milliseconds after the previous one once you get into cortex. This sequence of brain areas passing data along is what a sensory cascade looks like. Another way to think about it is that information comes in on the retina and then gets re-represented through these different layers, to the point that by the time it gets to the top of the ventral visual stream, in inferior temporal cortex, it's possible to read out lots of interesting things easily from that neural representation. OK.
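To make the tangling point concrete, here is a minimal toy sketch (my own illustration, not from the talk's experiments): two "object manifolds" that are tangled in their raw sensor axes become linearly separable after a simple nonlinear re-representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def ring(radius, n=200):
    """Points along an 'identity manifold': one object under a
    continuous transformation (here parameterized by angle)."""
    theta = rng.uniform(0, 2 * np.pi, n)
    pts = np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)
    return pts + rng.normal(0, 0.05, pts.shape)

# Two objects whose manifolds are tangled in the raw ("pixel") axes.
A, B = ring(1.0), ring(2.0)
X = np.vstack([A, B])
y = np.concatenate([-np.ones(len(A)), np.ones(len(B))])

def linear_readout_acc(X, y):
    """Training accuracy of a least-squares linear classifier (with bias)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return np.mean(np.sign(Xb @ w) == y)

acc_raw = linear_readout_acc(X, y)  # tangled manifolds: near chance
# One nonlinear feature (squared radius) untangles them completely.
acc_untangled = linear_readout_acc((X**2).sum(1, keepdims=True), y)
```

The point of the toy is only the contrast: no linear readout separates the two rings in raw coordinates, while a single nonlinear feature makes the same linear readout nearly perfect.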
So if you want to think about this as a sort of fast movie, it's something like this: you have that sequence of things going on in your head as you look at the images, and the neurons are firing in a sequential way, roughly doing something different for each image, but somehow in such a way that by the time activity gets to this area at the top, lots of things have become explicit about what's going on in the scene. Question? ... It's probably not smooth, in the sense that there may be clear ways in which small changes in the pixel representation lead to big changes downstream. So you can think of untangling as a process that takes things that are actually fairly non-smooth and smooths them out a bit. ... Are you adding information? Well, in a sense, yes: if you learn the parameters of that process, you're adding your prior information about what the parameters should be. Of course, the information is all in the image; you can see what the thing is by looking at it. So it's not that the amount of information in the Shannon-theoretic sense increases; that's not possible. But that doesn't preclude the information being laid out in a smoother way, so that simple decoders can get at it. And I think that's the right way to think about it: it's not that you're adding information about the image; you're adding the same information for every image, namely your priors about how, given a pattern, things should be smoothed out. OK, does that make sense? Yeah, OK.
So underlying this is the idea that there are three things: stimuli, neurons, and behavior. Stimuli are what comes in, neurons are the stuff in the middle, and behavior is what you read out. What I just said, basically in response to the question, was that there's this kind of information-transformation process that actually loses a bunch of information (that's why I've drawn it as a bottleneck, an hourglass kind of structure) and then pushes it into a representation that can be read out to do many different things: figure out the category, location, size, pose, etc., in a way that would be really hard to do with linear readouts from the input representation. Basically, that kind of untangling occurs. Now, to make that concrete, I'm going to tell you a little bit about a multi-year electrophysiology experiment in the macaque. This was done in collaboration with Ha Hong and Jim DiCarlo, in Jim's group at MIT. Multi-electrode arrays were implanted in V4 and IT, which are toward the top of the ventral visual stream, and a couple of hundred sites were collected. With that setup, responses to about six thousand images were recorded.

In this case the images were constructed by taking sixty-four three-dimensional objects, in eight categories, and putting them on uncorrelated photographic backgrounds, at three levels of variation: low variation, where the objects are at a fixed position, pose, and size; a medium level of variation; and high variation, where the objects can be all over the place at different poses and sizes. The object categories are a sort of random selection of natural categories: animals, boats, cars, chairs, etc. That gives you a sense of what's in there. So you take the images recorded with the setup I mentioned a moment ago and bin the spike count of each neuron. The neurons don't start spiking until a bit after the images are actually presented, because you wait for the signal to get through the visual system. If you bin at 70 to 170 milliseconds post-stimulus presentation and average over a bunch of repetitions, you get one scalar per neuron per image, and that's what's in this matrix, effectively.

First I want to talk about why this data is interesting, what about it makes it useful. Imagine you have that data and you try to do some decoding from it, meaning you want to build a linear combination across neurons to do something like detect whether it's an animal or not, or a different linear combination for cars, whatever. You're linearly trying to read out, from the neural responses, what's present in the image. If you do that with the V4 data (V4 is kind of an intermediate visual area), what you see is that it's quite good at low variation. If you try to detect animals versus boats versus cars, etc., the basic categorization task, you get something like sixty-something, almost seventy percent at low variation; chance here is 12.5 percent, since it's an eight-way task. But V4 is much worse at high variation, even though the neurons are being driven by the images; the stuff is in the receptive fields of the neurons, for those who think about such things. Nonetheless, the population is not able to easily decode the content of the images. You can also collect human data on a whole bunch of these tasks, like the basic animals-versus-boats-versus-cars categorization, or within-category tasks: a within-category cars task, or a faces task of faces versus each other. If you measure human behavior, what you see on this basic categorization task (that's the black bar) is that humans are way better than V4 at high variation; they're more comparable at low variation, but humans are way better at high. But if you look at the IT population, the area after V4 in the ventral-stream sequence, IT is much better.

So you can decode pretty well out of IT: with the neural features treated like machine-learning features and a linear decoder, you can decode what the object is pretty well. Now, this was back in 2012, and when we first looked at this, a lot of machine-learning algorithms at the time were also getting smashed by that high-variation part of the dataset. So there was something interesting about the computations being done from V4 to IT. And if you look at many different tasks (each dot here is a different visual task), comparing the neural decode to human performance, you can predict human performance pretty well out of IT. The same things humans are bad at, IT is bad at; things humans are good at, IT is good at. You can predict pretty well out of this top-level visual representation, but less well out of V4, and much less well out of earlier layers in the sequence. So it's not just that the performance is good; it's that these neural features are making an interesting pattern, and that pattern predicts the human pattern. In some detailed way, behavior follows these neurons. What that basically says is that those neurons are worth explaining. I'll just note briefly that monkeys can do this behavior as well, and you can record exactly what behavioral patterns they make. If you record their errors, the errors monkeys make on this task are very similar to the errors humans make. So the neurons in the monkeys are predicting the human behavior, and the monkey behavior is also predicting the human behavior.
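The binning and linear-decoding pipeline just described can be sketched end to end. This is a hedged toy version with simulated Poisson spike counts standing in for the recordings (all sizes and rates are made up for illustration; the actual dataset and decoder details differ):

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_cat, per_cat, n_reps = 80, 8, 30, 5
n_images = n_cat * per_cat

# Hypothetical stand-in for the recordings: Poisson spike counts in a
# 100 ms window, with a weak category-dependent firing rate per neuron.
cats = np.repeat(np.arange(n_cat), per_cat)
base = rng.uniform(5, 20, (n_neurons, 1))        # baseline, spikes/sec
tuning = rng.uniform(0, 15, (n_neurons, n_cat))  # category tuning
rates = base + tuning[:, cats]                   # (neurons, images)
counts = rng.poisson(rates[..., None] * 0.1,     # 70-170 ms window
                     size=(n_neurons, n_images, n_reps))

# Average over repetitions: one scalar per neuron per image.
R = counts.mean(axis=2).T                        # (images, neurons)

# Linear readout of category: one-vs-rest least squares, held-out test.
idx = rng.permutation(n_images)
tr, te = idx[:n_images // 2], idx[n_images // 2:]
Xtr = np.hstack([R[tr], np.ones((len(tr), 1))])
Xte = np.hstack([R[te], np.ones((len(te), 1))])
W, *_ = np.linalg.lstsq(Xtr, np.eye(n_cat)[cats[tr]], rcond=None)
acc = np.mean(np.argmax(Xte @ W, axis=1) == cats[te])  # chance: 1/8
```

The matrix `R` is the "one scalar per neuron per image" object from the talk, and `acc` is the kind of eight-way decode accuracy being compared against the 12.5 percent chance level.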
So what this basically says is that by the time you get up to IT, the visual representation has done this untangling in such a way that a linear readout is able to predict category behavior. All of this is a long way of saying that a predictive model of neural responses in this pathway is of real interest: you want it because you want a quantitative hypothesis for what generated those neural responses. So the obvious solution is to use something like convolutional neural networks, and this was obvious even five years ago, because they build in the basic neuroanatomy of the ventral stream: they're hierarchical and retinotopic, that is to say, spatially tiled. These are the things that were known from the neuroanatomy, so it made obvious sense to use a system that had them, and convolutional neural networks were designed with these in mind. Just briefly, since I'm sure many people here are familiar with them: these are multilayer neural networks whose individual layers are made up of neurally plausible basic operations like filtering, thresholding, pooling, normalization, etc. Each of these has a kind of neuroscience interpretation and a data-science interpretation, so you can think of them as being useful for various reasons. I could go into the details of all of this, but it boils down to a tremendous amount of work that people did over decades, and I'm going to just posit that we use this.
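As a concrete sketch of the layer operations just listed, here is one model "layer" in plain numpy: convolve with a Gabor-like filter, threshold, and max-pool. This is my own minimal illustration of the generic operations, not the talk's actual model code.

```python
import numpy as np

rng = np.random.default_rng(1)

def gabor(size, wavelength, theta, sigma):
    """Gabor wavelet: a sinusoidal carrier under a Gaussian envelope,
    the classic model of a V1 simple-cell receptive field."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate into filter axes
    g = (np.exp(-(x**2 + y**2) / (2 * sigma**2))
         * np.cos(2 * np.pi * xr / wavelength))
    return g - g.mean()                          # zero response to uniform input

def conv2d_valid(img, f):
    """Plain 'valid' 2-D filtering by sliding the filter over the image."""
    fh, fw = f.shape
    out = np.empty((img.shape[0] - fh + 1, img.shape[1] - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + fh, j:j + fw] * f)
    return out

def layer(img, f, pool=2):
    """One model layer: filter -> threshold (ReLU) -> spatial max-pool."""
    r = np.maximum(conv2d_valid(img, f), 0)      # filtering + threshold
    h, w = (r.shape[0] // pool) * pool, (r.shape[1] // pool) * pool
    r = r[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return r.max(axis=(1, 3))                    # pooling

img = rng.normal(0, 1, (32, 32))                 # stand-in "image"
out = layer(img, gabor(7, wavelength=4, theta=0, sigma=2.0))
```

A real network would apply a whole bank of such filters at each layer (one output "slice" per filter type) and share the same weights across all spatial locations; that weight sharing is what "convolutional" means in the next paragraph.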
The key thing is that these structures are applied convolutionally, meaning basically the same at all locations. In theory you don't have to have weight sharing, but that's the typical choice, both to make things tractable and to deal with the fact that natural image statistics are pretty similar at different locations. So if you have an input which is image-like, you get an output which is image-like too, and in this particular picture there's one gray slice per filter type. OK. Now, on natural images, averaged over many trials, models like this, with the types of nonlinearities I mentioned a moment ago and filters of this kind, predict roughly fifty percent of the response variance to natural stimuli in V1, that is to say, early in the ventral stream. And a key insight is that one of the reasons they're able to do this is that the filters have this nice shape, roughly in the form of Gabor wavelets, with different wavelet elements for different orientations and frequencies. So this boils down a tremendous amount, decades, of work in visual systems neuroscience into the idea that a one-layer convolutional network does a reasonable job representing neural responses in early visual cortex. It's interesting to think about where this comes from. Of course, you can think of it as coming from the intuitions of the discoverers of V1 and its properties: there's just this fixed basis that makes sense if you're smart enough, and maybe you can figure out in general what good properties such a basis would have. In this case, people looking at the experiments theorized something like these basis elements.

There are other good ways to get to this as well, for example the idea that you want neurons to represent their environment and to do so as efficiently as possible. This is the sparse coding idea that Olshausen and Field and others came up with, which is basically: if you have a network that takes an image and has to reproduce the same image through a hidden layer, and you impose a sparsity prior on the hidden layer, then it turns out that if you train the first-layer filters to do this, you get filters that look roughly like Gabors. That's a really nice idea, because it gives you an underlying reason to understand, in a principled way, why the filters are the way they are. So it's natural to take some of those ideas and push them up the ventral stream, if you're interested in neural responses in higher cortical areas. After all, the input is image-like and the output is image-like, so you can just keep stacking. Of course, there's a huge number of parameters consistent with that idea: architectural parameters (how many layers, how many filters, etc., the network structure itself) and the continuous values of all the filter templates at each layer. That's a huge number of parameters, and the obvious big question is how you find the right ones. I'm going to just say (there's a lot of discussion about why this is the case) that it turns out to be really hard to generalize these basic approaches to many-layered networks. There's no obvious way to do it that works.
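To show the mechanics of the sparse coding idea, here is a toy alternating-minimization sketch of the Olshausen-Field-style objective, minimize ||x - Da||^2 + lam*||a||_1 over codes a and dictionary D. This is my own illustrative version with made-up sizes and random data rather than natural image patches, so the learned filters here will not be Gabors; with whitened natural patches and longer training they come out Gabor-like.

```python
import numpy as np

rng = np.random.default_rng(3)

patch_dim, n_basis, n_patches = 64, 32, 200
X = rng.normal(0, 1, (patch_dim, n_patches))     # stand-in "patches"
D = rng.normal(0, 1, (patch_dim, n_basis))       # dictionary (filters)
D /= np.linalg.norm(D, axis=0)
lam, n_iters = 0.5, 30

def ista_codes(D, X, lam, steps=50):
    """Sparse inference: iterative shrinkage-thresholding for the codes."""
    L = np.linalg.norm(D, 2) ** 2                # Lipschitz const. of gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(steps):
        A = A - (D.T @ (D @ A - X)) / L          # gradient step on recon error
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # L1 shrinkage
    return A

for _ in range(n_iters):
    A = ista_codes(D, X, lam)                    # infer sparse codes
    D -= 0.01 * (D @ A - X) @ A.T                # gradient step on dictionary
    D /= np.linalg.norm(D, axis=0)               # keep basis elements unit norm

A = ista_codes(D, X, lam)
sparsity = np.mean(A == 0)                       # fraction of exactly-zero codes
recon_err = np.linalg.norm(X - D @ A) / np.linalg.norm(X)
```

The sparsity prior is what does the work: the L1 shrinkage drives most codes to exactly zero, and the dictionary adapts so that a few active elements per patch suffice for reconstruction.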
There's another strategy, which is direct neural fitting: you record data and then fit the parameters to reproduce the neural responses, on a per-unit basis. But it turns out that's really hard to do too, because there's not enough neural data to constrain this very large model class. It's a very nice idea, but in practice it leads to overfitting. So the basic state of affairs a few years ago, when we started looking at this, was that direct neural fitting was less successful in higher visual areas: V4, IT, etc. So we came up with a strategy (and there's nothing non-trivial about these ideas) of using the other constraint you have, which is that the system has to do a task. I showed you earlier that the system actually does a really interesting behavioral task. So: optimize for categorization, fix the parameters that way, and then compare to neural data. There are really two underlying metrics here. One is a performance metric: accuracy on some challenging high-variation visual object categorization task. The other is neural predictivity: the ability of the model to predict each individual site's responses. And the idea is that if you do this for a challenging task (it's a challenge for the network engineer, just as it is for the animals doing the task), the hypothesis was that by optimizing performance you would get better neural predictivity too. Of course, this begs the question of what it means to map a neural network to the brain. One way of thinking about this: imagine two brains, a source brain and a target brain. A very natural way to map them is to just ask that one be a linear transform of the other. It's plainly not the case that for every neuron in monkey one there's an exactly matching neuron in monkey two.

But it makes sense to give yourself a little slop: neurons in monkey one are a linear combination of neurons in monkey two. That's the slop we give ourselves in comparing a model to the brain. In other words, you basically treat each neural site as a linear combination of model units, fit the mapping with linear regression, and then measure accuracy as goodness of fit on held-out testing images. So the question was whether we could do that. Before we got into trying to deeply optimize neural networks to solve the task, we wanted to do some simple high-throughput experiments to check whether we had a chance. First, we did something like random selection of model parameters (random selection of those architectural parameters I mentioned a moment ago) and then measured both performance and neural predictivity; each dot here is a different model. What you see is that there's a modest, reasonable correlation between performance on the one hand and neural predictivity on the other, about 0.55, with quite a bit of variability as well. So this wasn't terrible. But it led us to think: what if we did some hyperparameter optimization on the architectural parameters and then measured neural predictivity? That's what the blue dots are; each blue dot here is again a different model. You see that the optimization works pretty well, in that you're able to get out to the right, but also that the correlation increases a lot: it's now roughly 0.8. So the message is: if you want better neural predictivity, maybe you should performance-optimize.
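The linear mapping metric just described can be sketched as follows. This is a minimal illustration with synthetic data, assuming a simple ridge-regularized fit; the actual regression and cross-validation details in the work may differ.

```python
import numpy as np

rng = np.random.default_rng(4)
n_images, n_units = 300, 50

# Hypothetical model features, and a synthetic "neural site" that is a
# noisy linear combination of them (the mapping assumption in the talk).
F = rng.normal(0, 1, (n_images, n_units))
w_true = rng.normal(0, 1, n_units)
site = F @ w_true + rng.normal(0, 2.0, n_images)

train, test = slice(0, 200), slice(200, None)

def ridge_fit(X, y, alpha=1.0):
    """Regularized linear mapping from model units to one neural site."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n), X.T @ y)

w = ridge_fit(F[train], site[train])
pred = F[test] @ w

# Goodness of fit (R^2) on held-out testing images.
ss_res = np.sum((site[test] - pred) ** 2)
ss_tot = np.sum((site[test] - site[test].mean()) ** 2)
r2 = 1 - ss_res / ss_tot
```

The key design point is the held-out split: the regression weights are the "slop" allowed by the mapping, so the fit must be scored on images the mapping never saw, otherwise the slop can absorb noise.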
Of course, the opposite might make sense as well: optimize for neural predictivity and then measure performance. That's what these red dots are, and this was interesting because it sort of reinforced the overfitting problem: although you could optimize for neural predictivity, you couldn't actually get much better than you would by just optimizing for performance alone, and you didn't get the performance back. So blue made a lot more sense than red. But really, this was where we were at the time relative to a bunch of other approaches, and the problem was that all of these were sort of bad everywhere; I really wanted to be up and to the right. That was basically the problem, so we had to do much better at optimizing for performance. We did a bunch of stuff, we basically threw the kitchen sink at it, not with the idea that the optimization method itself would be anything biological, but just to see if we could push out to the right. So we did a bunch of things to improve the architectural parameters: the automated meta-parameter optimization I mentioned, ensembles of models chosen by boosting, and also filter parameter optimization using stochastic gradient descent on whatever categorization task we had available. At the time we didn't quite use softmax; we actually used slightly different loss functions than are typical now, but it's a very similar idea. With those ideas together we produced a model, and we did it with a bunch of different training sets. It turns out that the most effective one was the ImageNet training set, which has by now become a standard in the field: thousands of images in thousands of categories. And the idea was training on real photos. The first question was: having done that, does it generalize to the types of stimuli we had here? To make that test as strong as we could, we removed categories of photographs that appeared in the test set: no animals, no boats, no cars, etc. We got a model that way and tested it, and it turned out that as you get better on the training task, you get better on the testing task as well, so generalization works pretty well. In other words, the features were learning something that was useful for general object recognition, even though the model hadn't seen those particular categories before. I should say, what's happening at this point is that we're retraining a linear classifier on the top hidden layer of this thing. So, just to put it in perspective: in the picture I showed you before, the red bar is the model that came out of that. It was doing pretty well on the high-variation condition, unlike V4 and some of the earlier models.
So our core question was: does it predict neurons better thereby? Here's one representation of the data that's useful to look at for some neurons. What we're seeing is neuron 53 in IT, and its response, not over time, but laid out on a per-category basis; each dot is one of the thousands of testing images. What you see is that this is a face neuron, because it really likes to respond to faces, way above the baseline level. So this is an easy neuron to look at in this type of representation. And what you can see is that if you try to predict this neuron from the top hidden layer of the model, using the linear regression metric I mentioned a moment ago, it does a pretty good job. Not perfect (the R-squared is about 0.55), but it's matching a lot of the ups and downs of the neuron. And this is true not just for neurons that are really easy to interpret, like the face neurons, but also for others, like site 42, which, I don't know exactly what it is, but it has a bunch of peaks, and those neurons are able to be fit pretty well too, the red line being the prediction. So we were able to go quite a bit up and to the right; that's the basic short story. If the blue dots were what you saw in the lower left, we were able to get up here: more than a hundred percent improvement in neural fitting, and much better performance as well.
So that was really gratifying, and it suggested we could have at least reasonably predictive models of higher cortical areas, which had been really hard to figure out; these are kind of mysterious areas where we don't know exactly what they're responding to. Given that, we then wanted to investigate what was going on with them. One way to do that is to compare the different layers of the model to the data. So what we're trying to do here is fit that same neuron I showed you before, but with layer one, the first layer. What you see is that it's basically able to capture the low-variation part. In this layout, organized by category and then within each block by variability, the images where the object is head-on at a fixed position, pose, and size are in the left part of each block. So layer one is able to be selective, but when you look at the higher-variation conditions, it's just not able to be robust and tolerant. If you go up through the layers, though, that's what becomes possible: being both selective and tolerant at the same time. So this is a pretty natural way of thinking about how tolerance is built up through the layers of the network. And if you look at the median over all the units, and ask at which layer the fit is best, you see that as you go up through the layers you get better; that's what those red bars are.
Now, if it were the case that the neurons were totally driven by their categorical value, this wouldn't be that interesting, right? Because if you're good at categorization and the neurons are totally driven by category alone, just categorical in their responses, then it's sort of obvious that this should work. So to check that, we built ideal observer models: models that try to predict neurons by knowing the category perfectly, or by knowing all of the variables that generated the image, the position, the pose, the size, etc. That's what you see here. Those are certainly better than chance in IT, but much less good than the network model. Question? ... We'll get there; it's a good question. ... No, these data were collected with the animal doing RSVP: it's just sitting there staring at the images, and the animal's task is to make sure that its fixation doesn't leave the center of the image. Now, it's a good question what happens differently in the neurons if the animal is doing a task. It's a great question that I'm going to mostly not talk about today.
But just notice one thing: remember those relationships I showed you between neural data and human performance? What's funny about what's going on there is that it's monkeys doing nothing predicting humans doing the task. So that suggests you shouldn't worry about that issue too much for the purposes of what I'm going to tell you today. If you're worried about dynamics and all that, it might change things. But if you're interested in just that sort of 70 to 170 millisecond temporal average bin, then maybe you're OK, because even that average bin in the monkey doing nothing (well, it's not asleep; it's an awake, behaving monkey whose behavior is to fixate) is informative. What that says is that as long as the animal is awake and motivated to do at least something, namely watching the image, then from those neural responses you can decode behavior quite accurately at a per-task level. So it's worth explaining these data, even if there would be more detail if the animal were doing something. But what I was saying here was basically that the fact that these categorical or all-variable ideal observers didn't predict the data nearly as well as those models suggests that you need both things. You need a performance constraint: it's got to actually do the task, and the ideal observers do the task, as do the models, the neural networks. But you also need to actually be a neural network. It's those two things together, a performance constraint within an architectural constraint, that lead to the better neural predictivity. You can't just read the category off the image and say that's what the neuron is doing. OK.
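The ideal observer comparison can be sketched like this (a synthetic illustration of the logic, with made-up numbers): a "neuron" with both categorical structure and image-by-image structure within each category is predicted better by feature-based regression than by a category-only ideal observer.

```python
import numpy as np

rng = np.random.default_rng(6)
n_cat, per_cat, n_feat = 8, 50, 30
cats = np.repeat(np.arange(n_cat), per_cat)

# Model features carry both a category signal and within-category detail.
F = (np.eye(n_cat)[cats] @ rng.normal(0, 1, (n_cat, n_feat))
     + rng.normal(0, 1, (n_cat * per_cat, n_feat)))
w = rng.normal(0, 1, n_feat)
site = F @ w + rng.normal(0, 1.0, n_cat * per_cat)   # synthetic "neuron"

idx = rng.permutation(len(cats))
train, test = idx[:300], idx[300:]

def heldout_r2(X, y):
    """Least-squares fit (with bias) scored as R^2 on held-out images."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w_fit, *_ = np.linalg.lstsq(Xb[train], y[train], rcond=None)
    pred = Xb[test] @ w_fit
    return 1 - (np.sum((y[test] - pred) ** 2)
                / np.sum((y[test] - y[test].mean()) ** 2))

onehot = np.eye(n_cat)[cats]              # "category ideal observer" predictor
r2_category = heldout_r2(onehot, site)    # above chance, but limited
r2_features = heldout_r2(F, site)         # captures within-category structure too
```

The category-only predictor is better than chance because the neuron does have a categorical component, but it saturates there; the feature-based predictor also captures the within-category ups and downs, which is the gap the talk is pointing at.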
Now, to your question earlier about intermediate areas: we actually also had data in other visual areas, like V4, so we wanted to compare to that. Just to give you a sense of what V4 data looks like: if you put it in that same representation I showed for the face neuron, it looks like this, which is basically a mess. And this is heavily subsampled; otherwise it would just be a black blob. This is why V4 is hard to understand. Now, it turns out that if you look at early layers or late layers of the model, you don't do a particularly good job of explaining this data; the regression is not that great. But if you look at intermediate layers, especially this layer three, you do a not-bad job, certainly better than the early or late layers. So I can't tell you why this neuron is doing that stuff, why the peaks are the way they are, but what this is saying is that this intermediate layer of the network, optimized to do the downstream task, has the right nonlinear basis to predict that neuron. If you summarize this across all the V4 neurons in the population, what you see is that predictivity basically peaks in the middle layers. And again, you're way better than other existing models were. And just to drive the point home: these categorical ideal observers are terrible in V4; they're basically not above chance. Question? ... Totally, yeah, there's nothing special about this network having this particular depth; I don't mean to give that impression. Deeper models that we can build now are better, and they parse out the intermediate layers better. Though it's not monotonic: as you go to twenty or a hundred layers it gets worse, in the sense that of course you do better on the task, eventually better than humans, but it doesn't track with the neurons after a certain point.
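The "intermediate layer fits the intermediate area best" logic can be sketched with a toy example (entirely synthetic, my own illustration): build a small random deep net, define a "V4-like" site as a noisy readout of the middle layer's basis, and check that held-out regression from each layer peaks in the middle.

```python
import numpy as np

rng = np.random.default_rng(5)
n_images, dim, n_layers = 400, 40, 5

# A toy random deep net: each layer is relu(prev @ W).
X = rng.normal(0, 1, (n_images, dim))
layers, h = [], X
for _ in range(n_layers):
    W = rng.normal(0, 1 / np.sqrt(dim), (dim, dim))
    h = np.maximum(h @ W, 0)
    layers.append(h)

# A synthetic "V4-like" site: noisy readout of the MIDDLE layer's basis.
site = layers[2] @ rng.normal(0, 1, dim) + rng.normal(0, 1.0, n_images)

train, test = slice(0, 300), slice(300, None)

def heldout_r2(F, y, alpha=1.0):
    """Ridge regression (with bias) from layer features to the site,
    scored as R^2 on held-out images."""
    Fb = np.hstack([F, np.ones((len(F), 1))])
    n = Fb.shape[1]
    w = np.linalg.solve(Fb[train].T @ Fb[train] + alpha * np.eye(n),
                        Fb[train].T @ y[train])
    pred = Fb[test] @ w
    return 1 - (np.sum((y[test] - pred) ** 2)
                / np.sum((y[test] - y[test].mean()) ** 2))

r2_per_layer = [heldout_r2(F, site) for F in layers]  # expected mid-layer peak
```

In this toy the middle layer wins by construction, because it contains the exact nonlinear basis the site is built from; earlier and later layers can only reach it through extra nonlinearities. That is the shape of the argument being made for V4, not a model of V4 itself.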
So that doesn't work so well, but if you have ten layers or eight layers, probably closer to the number of stages in the ventral stream, and this is later work, you can get a better picture and maybe parcel out those areas. Unfortunately we don't have quite enough data in the IT subdivisions, pIT, cIT, and aIT, right now, separately, to be statistically sure about that. But in practice I totally agree: it's not like layer three just is V4, full stop. I don't mean to suggest that. I mean to say something like: the intermediate-style computations, you have to have them, and when you have them they give you a better picture of what the intermediate neural response areas are like. So just to contrast: IT predictivity goes up through the layers, V4 predictivity peaks in the middle, and, as we and many others have observed, early layers also get what look like Gabor wavelets and a variety of other interesting phenomenology, in layer one and other early layers. Others, probably better than us, have also shown that among the layers of the model, the early layers give the best prediction of early visual cortex. That's not to say these models of V1 are better than other models of V1 that may be more biologically detailed; those comparisons are really interesting to do and I think open at this point. But anyway, what you can see from this, and I think I'll skip ahead in the interest of time, is basically the suggestion that you can complement the from-below approach, where you try to figure out what each layer is doing, think about it, and then build new filters on top of those, with this kind of top-down approach where you impose a behavioral constraint.
Together with the architecture, what you actually get is a pretty good picture of constraints at multiple different layers along the pathway. We don't have V2 data ourselves, so we really can't make that comparison very carefully, but to some extent it's been made. OK, so this was a really useful intermediate modeling endpoint, because it suggested that although we didn't exactly know what all those intermediate or early layers were doing, we could put this architectural piece of knowledge down and this functional piece of knowledge down, and although each was broad strokes by itself, neither one alone telling you exactly what the model needs to be, putting them together gives you a much more quantitatively exact picture. So that was good. But at the same time, you'd really like to be able to learn something qualitative from such a model. It gets criticized as a black box; people say, and this gets on my nerves a little bit, that it's a black-box model of a black-box system, so what are you learning? Of course, it's a totally transparent box in the sense that the cost of measuring everything about it is practically zero, or whatever it costs to run your GPU. The brain is a black box because it's extraordinarily expensive to figure out what's inside it; you can't measure everything, and the activity you can measure is very expensive to get, and so on. So in some sense this box is extremely transparent; what it isn't is obviously understandable. It's not simple. So what you really want to do with such a thing is figure out how to generate qualitative predictions that are useful or interesting. And the basic approach, the only one I really believe in at this point, is that you do a bunch of experiments in the model, and when something comes out of them that
does not match your intuition, you check it against the real data. So it's a strategy for coming up with qualitative predictions, and I'll tell you now about one of the more interesting ones that we saw. So obviously you can do categorization tasks: you look at that image, you see the plane, and maybe you know it's an F-16. But you can not only tell what it is; you know where it is, how big it is, its aspect ratio, things like that. In fact, you can quite quickly assess the scene as a whole. So the obvious question is: where are all these properties coded neurally? And the sort of obvious hypothesis, which was basically our working model, especially given what I showed you a minute ago, would be that it's not at the top of the ventral stream that these properties are encoded. Because you imagine that the identity-preserving transformations are aggregated over at each layer; that's how invariance is managed, or at least it's natural to imagine invariance being built that way. As receptive field size increases, you might imagine category tolerance increases but position sensitivity goes down. That's a natural thing to think, and we certainly thought roughly a version of it. So maybe these properties are encoded earlier, in early visual areas, or somewhere else in the brain. Just to put the various hypotheses down: previous studies had told us that, for categorical properties, information goes up through the layers. For the category-orthogonal properties, what did we know? Well, hypothesis one would be that there is a tolerance-sensitivity trade-off; that's very natural. Hypothesis two is a softer version of the trade-off, pegged to human performance: V1 is better than humans at some of these judgments, so maybe you fall to human performance by the time you get to IT.
Then there's a third hypothesis, which is what Jim DiCarlo, my postdoc advisor, thought was probably going to happen: that for the category-orthogonal properties, the information would be, quote, preserved. You wouldn't totally lose it; it wouldn't make sense to throw it out if you could use it, so why would you? So that information would somehow be preserved. We wanted to figure out which of these it would be, and in doing so we ran the various tasks in the model and found something we weren't totally expecting, which was that as the network got better at the categorization task, performance read out of the top hidden layer on a position-estimation task was also getting better. Even though the goal was to become invariant to position, the representation was getting much better at reporting it. Maybe not impossible, but certainly a little surprising. So then we thought, well, maybe this is hypothesis two, not hypothesis one. This was true for all the tasks we looked at: position estimation, scale, rotation, and so on. So then we asked: at which layer does position estimation peak? Is it early, or in the middle? I had thought it would be in the middle, because there you have boundary detection, which would tell you roughly where the objects are. But as it turns out, you get better as you go up through the layers for these as well: performance on the position-estimation task increases at each model layer, and this was true for every task we could check. And you know, this was one of twelve experiments I ran that week, and it was the only one that did something I was not expecting.
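Concretely, "performance on a position-estimation task from a layer" means a cross-validated linear readout of a continuous property. A pure-Python sketch with a small multivariate ordinary-least-squares fit; the features and numbers are toy stand-ins (the real analyses regress from thousands of model units, with regularization):

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small system A w = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[col][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def heldout_r2(features, target):
    """OLS with intercept on the first half of the stimuli; R^2 on the rest."""
    X = [[1.0] + list(f) for f in features]
    half, d = len(X) // 2, len(X[0])
    # Normal equations X^T X w = X^T y on the training half.
    A = [[sum(X[i][j] * X[i][k] for i in range(half)) for k in range(d)]
         for j in range(d)]
    b = [sum(X[i][j] * target[i] for i in range(half)) for j in range(d)]
    w = solve(A, b)
    preds = [sum(wj * xj for wj, xj in zip(w, x)) for x in X[half:]]
    actual = target[half:]
    mean = sum(actual) / len(actual)
    ss_res = sum((p - a) ** 2 for p, a in zip(preds, actual))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

The same function scores any category-orthogonal property (scale, rotation) by swapping in a different target vector, which is how the "gets better layer by layer" curves would be computed.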
So eventually we convinced ourselves that we should collect more data to figure out whether the neural data was in line with this, because it was a little out of line with what we expected. We reproduced our original result that categorization decoding is better in IT than in V4 (that's a given by this setup) and better in V4 than in V1. But it actually turns out that IT is also better than V4 at position estimation, and in fact, if you look across all the various tasks, we found that in general, always, IT was better than V4, and usually V4 was better than V1, across all these tasks. We also did an experiment with more classic, standard receptive-field-mapping stimuli, which have an x and y position and an orientation property, because we expected to find a different result there, given what people have shown, and indeed that is true; the classical results stand, we didn't contradict them or anything. In other words, for things like x and y position with those types of stimuli, V1 is better than the higher areas. Go ahead. [Audience question.] This particular plot is the V1 model, but the V1 data also has this property, even more so, and we later checked that. So, you know, V1 is really good at bar-code-reading-type tasks; that's not surprising; it's good at figuring out exactly where these things are, better than humans, so far as we can tell. So we weren't contradicting the known knowledge. To put this in the perspective of human psychophysical behavior: you measure behavior for all these different tasks, and you plot neural performance as a fraction of human behavior, as a function of the number of units that you draw from.
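The unit-pooling idea behind those curves can be sketched in a few lines: decode a binary variable from the average of N noisy units and watch accuracy grow with N. Signal strength and the deterministic pseudo-noise here are hypothetical, just to make the demo reproducible.

```python
import math

def noise(trial, unit):
    # Deterministic pseudo-noise in roughly [-1, 1], so the demo is reproducible.
    return math.sin(12.9898 * trial + 78.233 * unit)

def accuracy(n_units, n_trials=200, signal=0.2):
    """Decode a binary label from the mean of n_units weakly tuned units."""
    correct = 0
    for t in range(n_trials):
        label = 1 if t % 2 == 0 else -1
        pooled = sum(label * signal + noise(t, u) for u in range(n_units)) / n_units
        correct += (pooled > 0) == (label > 0)
    return correct / n_trials
```

Plotting `accuracy(n)` against `log(n)` gives curves of the kind described next: per-unit gains that are roughly constant on a semi-log axis until performance saturates.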
This is on a semi-log plot, and what you see is that for each additional unit you gain a certain amount of performance on categorization tasks, like basic categorization (animals versus boats, and so on) or subordinate identification (cars versus each other), at roughly the same rate, and this is also true across many different tasks, effectively all of them, and it's much more constant in IT than it is for the other areas. So basically it's saying that for every bit of inferential power you get from an additional unit in IT on one of these tasks, you get roughly equal amounts of power on all the other tasks. Not exactly, but reasonably so. So really what's going on is this, at least for these types of stimuli, and this is really the crucial thing: it's not about the task, as it turns out; it's more about the nature of the stimuli. So what you should take away from this is, one, IT is definitely not invariant. If you think of a sort of specific-to-general picture in which each area is aggregating over the variation dimensions, that's definitely not what's happening in this ventral stream process. Perhaps some kind of generic aggregation is happening, but you're not aggregating over the invariance dimensions, the category-identity-preserving
transformations. Two, the lower-level properties are not that low-level. We thought of things like position as low-level properties, but at least with complex objects and complex backgrounds, they're not, and they're not intermediate either, which is what I was betting on, based on border detection in particular. Really, it depends on the nature of the stimuli. For simple stimuli there will be properties that are low-level, but for these complex stimuli, all the properties travel together, the categorical and non-categorical properties alike. And it's not just that not all position information is lost; the information isn't reduced, like we talked about earlier, it's made more explicit, for every one of these types of properties. So maybe this suggests that IT is doing some kind of generic scene parsing, for the foveal or central image area. Just to summarize why this was useful: of course you could have discovered this if you had had the data; you didn't need a neural network model to do this, it's all data. But we would never have done the experiments allowing us to make these plots if we hadn't suspected that this could possibly be true, and the only reason we suspected it is that we ran a bunch of experiments within a model that had some potential to be right, and got an answer we didn't expect. That's the sense in which I think predictive models are useful, even if they're not totally understandable in all their details. So, around the time I was doing this, I thought it might be interesting to look into some things in auditory cortex as well, because it also seemed to have a bunch of properties, invariance-like things, of the kind happening in vision. Not exactly the same, of course, but roughly related. And I ran into these folks, also at MIT, Alex Kell, Sam Norman-Haignere, and Josh McDermott, who are really interested in audition.
And so together we went down the road of trying to think about how some of these ideas apply in auditory cortex. Just as a little background: auditory cortex presupposes, and uses, a tremendous amount of really interesting subcortical auditory processing, so the inferior colliculus and various other areas are doing a lot of really interesting things after the cochlea. But then there's this area of cortex, shown in green here, roughly anterior of early visual cortex, and the question is: what are those units doing? What was known was that there's roughly a core / belt / parabelt structure, not perfectly, and this is a monkey brain, we actually measured in humans, so this is just schematic. In the primary area of auditory cortex it was thought that basically some kind of spectrotemporal filtering was occurring, but in the non-primary auditory areas, the belt and the parabelt, it was not totally clear exactly what was happening. So our question was: could we figure out better models of non-primary, that is, higher, auditory cortex? We tried a bunch of different ways of setting this up. One-dimensional convolutions in time were the most obvious thing, since time is evidently local, but it also turned out to be useful, in fact more efficient, to treat frequency as local as well, and that is to say, to use a coarse model of the cochlea as the input: to transform the sound amplitude waveform into this type of diagram, this sort of cochleagram representation, and to stick those into a convnet. That was the strategy: basically, optimize for performance on a challenging auditory task and then compare to neural data.
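A crude sketch of what a cochleagram computation amounts to: short-time energy in log-spaced frequency bands, yielding the time-by-frequency "image" that gets fed to the convnet. A real cochlear model would use gammatone filters and envelope compression; this naive DFT version (all parameters hypothetical) just illustrates the representation.

```python
import math

def cochleagram(signal, sr, n_bands=16, win=256, hop=128, fmin=50.0, fmax=None):
    """Short-time energy in log-spaced frequency bands of a 1-D signal."""
    fmax = fmax or sr / 2.0
    # Log-spaced band centre frequencies, roughly like cochlear spacing.
    centres = [fmin * (fmax / fmin) ** (i / (n_bands - 1)) for i in range(n_bands)]
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        chunk = signal[start:start + win]
        row = []
        for f in centres:
            # Magnitude of the DFT coefficient at frequency f (Goertzel-style).
            re = sum(x * math.cos(2 * math.pi * f * n / sr) for n, x in enumerate(chunk))
            im = sum(x * math.sin(2 * math.pi * f * n / sr) for n, x in enumerate(chunk))
            row.append(math.sqrt(re * re + im * im) / win)
        frames.append(row)
    return frames, centres
```

A pure tone should light up the band whose centre frequency is closest to the tone, which is the sanity check on any such front end.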
The task that we decided to use was a 600-way word recognition task: take a bunch of words from various speech corpora, TIMIT, Wall Street Journal, and so on, and combine them with significant background noise. So the samples sound like "she had your dark suit in greasy wash water all year" mixed with auditory scenes, speech babble, and music clips in the background; it was really non-trivial. And the task was to pick out the middle word in each one-second clip, so for "she had your", you would pick out "had". Humans are definitely above chance, chance being well below one percent here, and humans are around seventy-something percent on this task, so they're doing pretty well but are far from ceiling. By optimizing a network we can also do pretty well. This plot shows performance up through the layers of such a network: close to, not quite at, but close to, human performance at the top layer, measured on held-out data with different speakers, different auditory backgrounds, and so on. So the question is how to compare this to neural data. Before we did that, we wanted to understand whether the behavioral comparison was useful. Again, there's a lot of interesting behavioral work you can do in audition to look at the patterns. In particular, you can take various speech clips and put them on either dry backgrounds or noise backgrounds of various kinds, auditory scenes, speech babble, music, speech-shaped noise, at different levels, different signal-to-noise-ratio levels.
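The stimulus construction just described boils down to mixing speech and noise at a target signal-to-noise ratio. A minimal sketch (the real pipeline also normalizes levels and clips to one second):

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech/noise power ratio equals snr_db, then mix."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    # Desired noise power implied by the target SNR (in dB).
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]
```

Sweeping `snr_db` and the noise type is exactly how the grid of listening conditions in the next part is generated.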
Again this is a 600-way task, and if you measure human performance in each of the different conditions you get a pattern: with different levels and types of noise, you get different proportions correct for the different noise backgrounds. And it turns out that if you compare the model we built to the human pattern, it predicts it pretty well. Not perfectly, but it's really hard to distinguish from different humans; different humans don't predict each other perfectly here either. So the model's prediction of this behavioral pattern is pretty good, and note that the network was not optimized to get this pattern correct; it just happens to fall out behaviorally. This is a little like the scatter plot of dots that I showed you before for the visual system. You can also ask whether distances in the model's feature space are a good predictor of the perceptual effect of distorting the signal. To make a long story short, distances in the early layers' responses don't predict the human pattern that well, but as you go up through the layers you get much better behavioral prediction. So task optimization does give you a reasonable space in which distances between stimuli predict the performance pattern, the proportions-correct behavior. OK, so we were in a situation where we had something that was performing reasonably well and matching the behavioral pattern reasonably well, so we wanted to ask to what extent it predicted neural responses. In this case we were using humans, for a couple of reasons. One, my collaborators work with humans, so I had no choice. The other is that humans do this stuff naturally, whereas it's hard to get macaques to do a lot of these tasks; it's not clear that they really have
good responses to a lot of these things. So Sam Norman-Haignere, Nancy Kanwisher, and Josh McDermott measured responses to one hundred sixty-five commonly heard natural sounds, and while humans were listening to them, their brains were measured with fMRI. So a baby crying, running water, a car horn, speech, a variety of stuff. And for each voxel you measure an average response to each sound. Nothing like the temporal or spatial resolution that we have with electrophysiology, but it produces a data matrix of roughly the same form for our purposes: sounds by voxels. Now, this is the sort of ugly situation, if you're doing encoding models, where the number of predictors is way bigger than the number of stimuli, so normally you have to do all sorts of things to prevent overfitting. Actually, for us it's not a big deal, because we separately predict each voxel, so for us it's just more checks. We were interested in using a similar regression metric to do this, and I should say that although we use the same types of regressions here, if you look at how many units were needed, I should have had a slide on this, you need many more model units on average to predict each voxel than you do in the electrophysiology case. So there are titrations of the resolution of the mapping that should be explored more, but we really haven't done that yet. Just to describe what came out of this: first of all, the reliability was not bad. This is just the neural data; sorry, this is a reliability plot, and what it's saying is that basically the reliably responsive stuff is in auditory cortex, and outside of it there's some reliability, but significantly less.
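The reliability measure alluded to here is typically a split-half computation: correlate a voxel's sound responses across independent halves of the scanning runs, then Spearman-Brown correct. A minimal sketch with hypothetical numbers:

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def split_half_reliability(even_runs, odd_runs):
    """Correlate per-sound responses across halves; Spearman-Brown correct
    to estimate the reliability of the full dataset."""
    r = pearson(even_runs, odd_runs)
    return 2 * r / (1 + r)
```

Voxels whose corrected reliability is near zero put a low ceiling on any model's predictivity, which is why the analyses that follow are restricted to the reliable regions.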
If you look at median predictivity as a function of model layer, you see it peaking in the slightly later regions of the model. But in particular, if you look at the difference in predictivity between high and low model layers, the low layers are much better at predicting the primary area, that's what's outlined there, using a separate primary-area-finding localizer, while in the non-primary areas the later layers are significantly better; that's what the blue means. This was interesting and encouraging, because it suggested we could confirm, and deepen our understanding of, where hierarchical structure in auditory cortex is in humans, which is otherwise hard: early layers are a better explanation of primary cortex, while higher layers are a better explanation of non-primary auditory cortex. There was a spectrotemporal filtering model around that was sort of the best model for primary cortex, so we compared to that in the high-reliability areas, and what we saw was about the same; this is a bad color plot, but what you're seeing is that this
kind of dark yellow is where the net was about the same as the spectrotemporal model, and the brighter yellow is where the deep net was better. What you see is basically that primary auditory cortex was about as well predicted by the early layers of this model as by the spectrotemporal models that came before, although actually a bit better, so there's stuff going on in primary auditory cortex that's captured by early layers of the network and not captured by the spectrotemporal model. But there are significant improvements basically everywhere, and especially in the non-primary areas, where it's really hard to do much with the spectrotemporal model at all. You can also look on a region-of-interest basis. In the tonotopic area, variance is best explained by intermediate layers, and that's interesting, actually: it's not the earliest layers that best explain primary auditory cortex, it's intermediate ones, suggesting, and this is something others have seen in a sort of anecdotal way, that relative to visual cortex, so-called primary auditory cortex is actually more stages in. There's more going on in the subcortical auditory system, relative to the whole system, than there is for the corresponding thing in vision. So perhaps if we had good subcortical data we'd see the early layers mapping to, say, the inferior colliculus, or something like that. We don't have that data, so it's hard to say, but it's an interesting question. If you look in speech-selective cortex, and this is speech-selective cortex delineated in a different way, with an independent
localizer, what you see is that predictivity goes up through the layers, maybe not totally surprisingly, since we optimized for a speech task. But if you look in a bunch of different ROIs, tonotopic, but also pitch-selective, music-selective, speech-selective, you get significantly better predictions out of the task-optimized CNN, especially in the non-tonotopic, non-primary areas, the music-selective area for instance. And now we're really trying to bore into that, to see if, with models that have multiple different task optimizations, for example not just word recognition but something to do with music, it's a little hard to formalize that task in a good way, but we think we have an idea for how to do it, perhaps we can get at finer structure in auditory cortex that would otherwise be hard to come up with hypotheses for. We think we may be able to identify several streams, and in particular finer structure within word recognition. So, just to compare this to the earlier vision work: if you look at word recognition performance on the task I mentioned, against auditory cortex predictivity on that 165-stimulus set, there's actually quite a strong correlation; each dot here is a different model again. I tend to be a little embarrassed about this plot because the number is very high, but nonetheless, we've reproduced it a couple of different ways, and there's this quite strong correlation in this regime. And this is really just to say that although the model architectures are different, the tasks are different, and the data are different, and if you take a vision model and try to use it to explain auditory cortex data it's significantly less good, nonetheless there's this sort of principle that
binds them together: basically, if you optimize for a task with a reasonable architecture, then you get a pretty good predictor of intermediate neural computations. Maybe it's a little grandiose to call that a principle; it's more of a heuristic: if you want a better model than you now have of some cortical area, and your current models are terrible at the tasks that you think the area performs, make the models better at the tasks. And then, if the architectures are also somewhat neurally reasonable, by combining those constraints you get a better prediction at the more intermediate, more detailed levels. Of course, this is not beautiful in the way that the classic receptive-field picture was; we don't get a beautiful description of what the receptive-field structure is. Despite a lot of trying, there aren't a lot of great results visualizing what the filters are. We did a lot of that and it wasn't that successful, and other people have done it better than us, but again, I don't feel it's that successful, in the sense that although you can sometimes see what some filters are doing, some of the time, in some ways, it's very messy: some units seem to be understandable, and most not. So I'm not saying that this type of understanding is impossible for higher visual areas; maybe one day we will figure it out. What I'm saying is that in the meantime there is a sense in which something deeply principled is going on here. Basically, when you build a network, you first have to formulate a model class, in this case the CNN
class. I should say, in this picture, this visualization, each point is a model, and as you go outward along the radius you get bigger and deeper networks; it's just a way of visualizing the architecture class space. But then, within the model class, only some sub-manifolds of models are actually able to perform any given ethologically valid task; most models are bad at any given task, and so a small subset is able to do each one. Tasks pick out these bits of the space. And of course, to get there from an arbitrary starting point, you need to implement some kind of generic learning rule. In this case, and mostly going forward, that's gradient descent, with hyperparameter optimization on top for the metaparameter selection. And then, once you've done that, you can map to the brain data: you can ask to what extent it's the case that you get a strong mapping by doing this process, which allows you to say something about the internal structure. That's principled, even if it's not quite as structurally elegant as being able to interpret the receptive fields. You can think of these three things in a fairly formal way, and we do, as a recipe for how to make models better: an architecture class; a task, which is a loss function on a dataset; and a learning rule, which here is a mixture of hyperparameter optimization and gradient descent. Of course it's really not a principle, in the sense that if you push it too far it doesn't work: if you push performance on any given task beyond human performance, you're probably not going to get better at predicting neurons. In fact, we see that ResNet isn't better at predicting neural responses than VGG.
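The three ingredients named here can be shown in miniature: an architecture class (here, a single logistic unit), a task (a cross-entropy loss on a toy dataset), and a learning rule (plain gradient descent). Everything is a hypothetical toy, not the actual training setup.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, lr=0.5, steps=2000):
    """Learning rule: full-batch gradient descent on cross-entropy loss."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(steps):
        gw, gb = [0.0, 0.0], 0.0
        for (x1, x2), y in data:
            p = sigmoid(w[0] * x1 + w[1] * x2 + b)
            err = p - y                 # d(cross-entropy)/d(logit)
            gw[0] += err * x1
            gw[1] += err * x2
            gb += err
        n = len(data)
        w[0] -= lr * gw[0] / n
        w[1] -= lr * gw[1] / n
        b -= lr * gb / n
    return w, b

# Task: classify points by whether x1 + x2 > 4 (linearly separable).
data = [((x1 / 4, x2 / 4), 1 if x1 + x2 > 4 else 0)
        for x1 in range(5) for x2 in range(5)]
w, b = train(data)
accuracy = sum((sigmoid(w[0] * x1 + w[1] * x2 + b) > 0.5) == (y == 1)
               for (x1, x2), y in data) / len(data)
```

Swapping any one ingredient, a different architecture class, a different loss/dataset, or a different optimizer, changes which models the procedure finds, which is exactly the knob-turning described in the talk.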
And VGG is definitely better than what we had back in 2012, 2013, but it's not enormously better, somewhat better, and it also gets worse as you eventually start to push too far. So I'm not saying there's some deep principle by which categorization optimization is the answer to the visual system. It's not; it's just one proxy task that gets you closer than you otherwise would have gotten. But of course, I'm really interested in things beyond this domain. So instead of CNNs, I'm interested in looking at recurrent networks; instead of categorization, looking at task switching. Ideally somebody would hand me a better, more biologically realistic learning rule than strict backpropagation; I'm going to leave that problem aside, since it's hard. But then I'm really interested in mapping to other brain regions as well, parietal cortex, frontal cortex. So, just in terms of what's going on in the lab now, we're just getting set up, but one thing of real interest is recurrent architectures for dynamic tasks: instead of just the feedforward structure, have local recurrences, have long-range recurrences and long-range feedbacks, and think about tasks like object and action recognition, which are inherently dynamic, as well as time discounting, where you've got to do the task, but you've got to do it fast and accurately, or situations where there's heavy occlusion. Basically, take the same strategy: optimize for these tasks and check against both the static data, where maybe we'll do better than before, and also dynamic data, looking at the neural response patterns themselves, the dynamics over time. Can we predict non-trivial structure in the way the neurons change over time, which they certainly do? Like I said, we were averaging from seventy to one hundred
seventy milliseconds, but there's reliable dynamics at the ten-or-twenty-millisecond scale, and what are those patterns doing? And to the question that came up earlier about the animals doing a task: that's a place where we think it's clear there are some differences. That is to say, the dynamics are different when the animals are doing a task, and that probably cashes out in some fraction of the images, the harder ones, becoming doable for those animals, but a small fraction. So maybe the absolute performance numbers aren't that different, but the subset that is different is really interesting, and tells you something about how recurrent dynamics are doing processing. So this is one direction I think is really interesting. I'm also interested in pushing into other cortical areas. I showed you vision and audition, visual cortex and auditory cortex, but I really want to go into a non-human, or rather non-primate, animal, because all sorts of interesting neuroscience techniques can be done in these other systems. And one of the most interesting cortical areas to me is rodent somatosensory cortex, in particular the whisker trigeminal system, because it's very advanced; the animal is doing something very interesting with its whiskers, and it's arguably better at using its whiskers than it is at seeing. And so we came up with the idea, this is work with Mitra Hartmann at Northwestern, who is a whisker expert, of building neural networks that take a three-dimensional sensor model of the whisker array as input and then try to solve reasonably ecologically relevant
Shape the Texan tasks with that type of input OK And those are all and of having to be recurrent models because it's a very temporal signal OK and that's work that's underway we have some pretty good models there and so now we're in the process of comparing two cortical data from that system of course the biggest hole probably in the thing that I've said so far is that although maybe it's the case that the adult state is reasonably predictive. Of by the models by you know optimizing for categorization. The learning rules the learning task is terribly like on a biological right in the sense that no no my cat no human presumably but certainly no mechanic has access to thousands of labels in each of thousands of categories that's just crazy right so but none of the system does a good job. So you really we need to if you know people back to the picture I mean it's a couple slides ago of these three things. Even if we know what architecture class is OK in a visual system and even if we are going to assume OK that we are excuse me even if we are soon that we're going to be using sort of gradient the center based methods. Excuse me. That you know obviously was something wrong with our task with our last function on our dataset OK So you know a big. A big region of interest in the lab is trying to figure out better task for self supervised learning brain and trying to be as creative as possible here because we're feeling like we there's a lot of been a lot of negative results coming out of auto and coders and similar things you know so one of these things is most interesting to us is future prediction underage and controlled action so interactive data set construction but there's a hard task and I don't have a lot to report on that yet although it's a really interesting direction. 
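To make the future-prediction idea concrete, here is a minimal self-supervised sketch. This is entirely a toy construction of my own, not the lab's actual models: a linear next-step predictor is trained on a synthetic sensory stream (a noisily rotating 2-D point standing in for a moving stimulus), using only the prediction error as the training signal, with no labels anywhere.

```python
import numpy as np

# Hypothetical toy stream: a 2-D point rotating at fixed angular
# velocity stands in for a structured sensory input. Nothing here
# comes from the actual models described in the talk.
rng = np.random.default_rng(0)
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = np.zeros((200, 2))
x[0] = [1.0, 0.0]
for t in range(199):
    x[t + 1] = R @ x[t] + 0.01 * rng.standard_normal(2)

def loss(W):
    """Mean squared error of the next-frame prediction."""
    pred = x[:-1] @ W.T
    return float(np.mean((pred - x[1:]) ** 2))

# Self-supervised objective: predict the next frame from the current
# one; the prediction error itself is the training signal, no labels.
W = np.zeros((2, 2))
lr = 0.1
initial = loss(W)
for _ in range(500):
    pred = x[:-1] @ W.T
    err = pred - x[1:]
    grad = 2 * err.T @ x[:-1] / len(err)   # gradient of squared error
    W -= lr * grad
final = loss(W)
print(f"prediction MSE: {initial:.4f} -> {final:.6f}")
```

The same principle is meant to scale up: swap the linear map for a recurrent network and the rotating point for video frames under the agent's own actions, and the objective stays the same, predict what comes next.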
And finally: although you might have a system with a really great representation that can do all these different tasks, like I showed you earlier, categorization, localization, size, pose and so on, on any given task in the world you're actually only ever doing one of them. You have to figure out which one it is, learn it fast, choose it, and learn how to switch between them. So we're trying to understand what kind of controller architecture allows an agent like this, embodied in a world, to do continual learning in a reasonable way. That is to say: figure out what task is happening now, learn readouts that are good for it, learn how to build readouts in a way that uses the information you already have efficiently, and then switch flexibly between them. So those are some ongoing projects in the group that build out from deep, rich sensory models and try to deepen our understanding of them, but also use them to understand other cognitive domains as well.

Anyway, I'll just leave you by saying that this kind of task-driven modeling can significantly improve quantitative models of intermediate and higher-level cortical areas, there can be qualitative insight if you work at it, and the concepts are useful across multiple sensory modalities. So that's what I have to say, and please, let's have some more questions. Great, so thank you.

Yes. Yeah, so you want an understanding, roughly, of what the view is, what the features are doing, and what they could be.
Well, if you know how to do it, let me know. I've been very unsatisfied by our own and other people's efforts to make a kind of word-level story of what the features are computing. I can tell you that at some level they're computing what they need to compute so the next level can compute what it needs to compute, so the next level can compute what it needs to compute, so at the end you solve the task. And that's a story of a kind: look, you've got this big nonlinear system, it was optimized by evolution and development to solve this big nonlinear task, there you are. That's an explanation of some kind, but I think you want more than that. I have a high criterion for what it would take for a visualization or an explanation of the units to be useful: you have to be able to reduce the training time by virtue of whatever insight you have. If you could tell me, look, knowing this thing reduces the training time a lot, you could just stick it in, if you have an intuition for it, if you have formulas for it. Certainly unsupervised data, the sparse autoencoding approach, is much cheaper than what we have to do, where labels are comparatively expensive. So if there were some insight that allowed you to get out of having that bad supervision task as the driver, that would be a way of reducing the training data in some sense, and I think that would be really great. And you know, for all that, those gray bars from earlier, those sucked even harder.

Now, you're asking something interesting; there are a number of different questions embedded in there. I think one of them is the easy version, so I'll answer that one first, which is: how much did neuroscience help, how much does neuroanatomy tell you? And I think it's a very long, slow road, in the sense that, and I was talking to somebody about this earlier today, the first convnets were developed by a Japanese researcher back in the late seventies, and he told me at lunch one day that the reason he knew to use the convnet structure was that his neighbor down the hall was Tanaka, a neuroscientist, who told him: look, the visual system solves this with a hierarchical, retinotopic structure. And to a mathematician, hierarchical and retinotopic sounds like repeated convolutions. I don't know how true the story is, but it's what he said, and that's the kind of thing that happens once in a lifetime, slowly, in an unpredictable way, where some qualitative insight from neuroscience may help you make a better model, and it took a long time for the models to actually be engineered to the point where they were better. The same could be said for a bunch of those intermediate steps I showed you: filtering, that's the convolution part, but also the different nonlinearities that were in there. Neuroscientists figured out things about divisive normalization and other types of normalization steps before they were useful from a convnet-operation point of view, but it was very uncertain, there were lots of things that could go wrong in implementing them, and so it was really not a direct translation. So in my own work I'm certainly not counting on neuroscience insight to tell me what to do next in a detailed way. I would say half of my group does AI and machine learning, on the hope that by solving more interesting tasks we'll have models for neural processes that we currently can't even begin to approach.
Then there's another version of the question, which is harder, and I'm going to rephrase it and then punt on it, which is basically: is it the case that there are other model classes that are not convolutional, in some sense, whatever you mean by not convolutional, that predict the neural data as well and solve the task as well? And the current answer is no. Well, there's one problem there, which is: what do you mean by not convolutional? Of course something might not formally look convolutional but actually compute the same functions internally. So, modulo that definitional issue: the answer at this point is that there are no other classes that even solve the task well, so we don't have a positive control there. I wish we did, and it would be a great question to answer in that case, so I have to punt on it, basically, because we don't have it. I don't know, maybe we'll never have it, maybe basically the only types of networks that solve the task are of that form, or maybe there's something out there that's really different in some way, that could solve the categorization task well but then doesn't look like neural data, and that would be a great positive control. But let me put it this way: you can do the following thing, which is take a shallow neural network and approximate the deep neural network with it to some extent. Those shallow networks can to some extent solve the task, but they're terrible at explaining the neural response patterns. So in a way that's something like a jury-rigged positive control; it's basically the universal approximation theorem failing you, in the sense that the approximation is real but it applies only at the input-output level. A shallow network that approximates the deep network's input-output behavior isn't necessarily going to have intermediate steps, neurons, that look like the deep network's intermediate steps. So in a way that's a bit of a positive control, but I think you'd want something better than that, and I don't know how to give it to you at this point. Yeah.

OK, thanks, thanks for having me.
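Going back to the continual-learning controller idea from near the end of the talk, here is a minimal sketch in which everything (the features, the two tasks, the probe scheme) is my own hypothetical toy choice, not the actual architecture under study: one frozen shared representation, one cheap linear readout per task, and a controller that infers which task is currently running from a few labeled probe trials and switches to the matching readout.

```python
import numpy as np

# Hypothetical setup: a frozen shared representation plus per-task
# linear readouts, with task identity inferred from a few probes.
rng = np.random.default_rng(1)
A = rng.standard_normal((32, 2))

def phi(x):
    """Frozen shared representation: random tanh features."""
    return np.tanh(x @ A.T)

tasks = {0: lambda x: np.sign(x[:, 0]),   # toy task 0: left vs right
         1: lambda x: np.sign(x[:, 1])}   # toy task 1: up vs down

# Learn a cheap readout per task by least squares on shared features.
X = rng.standard_normal((500, 2))
F = phi(X)
readouts = {k: np.linalg.lstsq(F, f(X), rcond=None)[0]
            for k, f in tasks.items()}

def accuracy(w, x, y):
    return float(np.mean(np.sign(phi(x) @ w) == y))

def infer_task(x_probe, y_probe):
    """Controller: pick the readout that works on the probe trials."""
    return max(readouts, key=lambda k: accuracy(readouts[k], x_probe, y_probe))

# The environment secretly runs task 1; 20 probes suffice to switch.
Xp = rng.standard_normal((20, 2))
current = 1
guess = infer_task(Xp, tasks[current](Xp))
print("inferred task:", guess)
```

The design point this is meant to illustrate is the division of labor: the expensive representation is learned once and shared, while per-task readouts are cheap enough to learn fast and to swap flexibly as the controller's inferred task changes.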