[00:00:06] >> From Sharif University of Technology in Iran, and then he came to the US to do his PhD in physics at Brandeis. He is a physicist by training, but he was specializing in computational neuroscience, and since that time he did a couple of postdocs, at Caltech with Christof Koch. [00:00:29] He then continued that line of work in another postdoc as well, and he has also worked with several other groups, and as I wrote to him, it has been impressive that he has worked so much and on so many different things, mostly on decision making and learning: how we really, [00:00:50] you know, perceive and navigate the environment and how we learn from it. He develops theoretical models and modeling papers of his own, but he also collaborates with people who do human and animal experiments. So we should hear many very interesting things, and we are super excited to have him here today. [00:01:17] >> Thank you for inviting me and for the nice introduction, and yes, it does seem like I have been doing a lot of stuff. I am very happy to be here to share with you some of the ideas that I have been very interested in, increasingly in the last few years, with a focus on flexibility in learning and decision making. [00:01:42] So, just from the title of my talk, one may say: yes, of course decision making needs to be flexible in order to be successful. But learning is really about adjusting behavior, so why should learning itself be flexible? That is the main point of my talk: I am going to argue that in the real world, reward-based learning and decision making themselves need to be adjusted, and hopefully I will show some evidence for different types of adjustments and their underlying mechanisms. [00:02:04] So today I will be talking mostly about modeling work, plus some behavioral work in different species, to make the point that this is true across many different species.
[00:02:32] And what we can learn from that. So why should this be true? Why do we need to be flexible in terms of learning and decision making? What are the characteristics of the real world that make flexibility an important requirement for learning? The first one, as I said, is that choice options have many features and attributes, so it is not clear how you should combine these features and attributes to make decisions. The other is that [00:02:54] reward feedback is scarce and nonspecific: you take many actions, and then all that follows is a very simple response that is not specific about why your actions were or were not successful, or why the options you chose were rewarded. These two alone already make learning and decision making very difficult in the real world and in naturalistic settings, but on top of that, the real world is uncertain and volatile. What I want to argue is that because of these characteristics, and these are just the most important ones in my view, [00:03:43] we need to use information in the environment to determine how to combine and integrate information for making decisions, and we need to decide what to learn from each feedback and how much to learn from it. So my talk is really divided into three parts, trying to answer these three questions, and I will go through them one by one. The first part is about how much to learn from each feedback. [00:04:15] I mean, if you think about learning, even the simplest forms of life can learn from feedback; even Aplysia can learn from a shock.
Now, that makes it sound simple, and it is kind of obvious that you need to avoid things that hurt you and approach things that are rewarding, but the real question is how much to learn from each feedback. What I want to show is some evidence for how the amount of learning should be adjusted according to volatility in the environment, where volatility is loosely defined as how quickly the reward environment changes. So why should learning be sensitive to volatility? To give you an example, a cartoon: imagine you are choosing an option and you are given reward feedback, and here I am showing happy and sad faces to indicate the outcomes. In order to figure out whether the option you are choosing is good or not, you could try different things. For example, you could integrate the [00:05:18] feedback you get over time; here I am showing a flat kernel, basically just averaging all of your experience, and if you do that, you will see that, OK, this option you are choosing is 50 percent good and 50 percent bad. But this is not really the wisest thing you could do: you could instead put more weight on more recent outcomes. [00:05:39] The question is how much weight you should put on more recent outcomes; basically, how you should integrate reward over time. Now imagine your environment is like this one, with gold and red circles (gold means good, red means bad, but that really doesn't matter), and as you can see, this reward environment is very volatile: [00:06:03] things become good and bad very quickly. In this environment, if you try to integrate past outcomes using this kernel, which basically tells you how much weight you put on outcomes at different times in the past, that is not a bad way of integrating, but maybe you should integrate over a shorter timescale and weight recent outcomes more strongly. [00:06:28] But this comes at a cost. The reason you would do that is that you can be more adaptive.
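This integration idea can be sketched with a simple leaky integrator. (This is an illustration I am adding, not code from the talk; the function name, the initial estimate of 0.5, and the two learning rates are my own choices.)

```python
import random

def track_value(outcomes, alpha):
    """Estimate reward value by leaky integration with learning rate alpha.

    Each update v += alpha * (r - v) is equivalent to weighting past outcomes
    with an exponentially decaying kernel: a large alpha gives a short kernel
    (adaptable but noisy), a small alpha a long kernel (precise but slow).
    """
    v = 0.5                      # initial estimate
    history = []
    for r in outcomes:
        v += alpha * (r - v)     # delta rule
        history.append(v)
    return history

random.seed(0)
# A volatile schedule: the option is good (p = 0.8) or bad (p = 0.2),
# flipping every 20 trials.
schedule = ([0.8] * 20 + [0.2] * 20) * 3
outcomes = [1 if random.random() < p else 0 for p in schedule]

fast = track_value(outcomes, alpha=0.5)    # short integration window
slow = track_value(outcomes, alpha=0.05)   # long integration window
```

Plotting `fast` against `slow` shows the trade-off discussed here: the fast tracker catches each reversal within a few trials but fluctuates from outcome to outcome, while the slow tracker is smooth but lags every reversal.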
You can change your behavior faster when the environment is changing, but your estimate of the reward value or probability is less precise, because you are relying on very few [00:06:50] pieces of feedback. Now imagine you are in a different environment, where the reward is stable over a longer period of time. In that case you want to integrate over a longer timescale, which gives you more precision, because you are relying on more feedback, but it is less adaptable, because if the environment changes, you are not going to detect it very quickly, [00:07:17] as you are looking too far back in time. So there is a trade-off, and I will be talking about this kind of trade-off throughout my talk: a trade-off between being adaptive and being precise, which I refer to as the adaptability-precision trade-off. I didn't invent this, and there are many different kinds of trade-offs with similar notions, but I use this term to refer to this behavior. [00:07:47] What it tells you is that you cannot be as precise as possible in estimating the reward values and at the same time be adaptable and flexible in changing your estimates: there is a trade-off between them. How can we study this? People have been using a task like this, called probabilistic reversal learning, for many, many years, though not necessarily looking at the effect of volatility, which you can do very easily with it; mostly it was used for studying learning. The task is very simple: the animal or human is choosing between two options, here shown in green and red, [00:08:30] chooses one of them, and gets feedback. What is special about these tasks is that the reward probabilities are fixed for some amount of time; let's say for the first 20 trials the reward probability on the red option is 80 percent and on the green option it is 20 percent,
[00:08:55] and after 20 trials it reverses: now the green option is the better one. So every 20 trials the good and bad options switch, and you can control the volatility of the environment through this block length. The idea that people have been playing with, thinking about how you learn in this environment, is that you can capture it with simple reinforcement learning: you assign a value to the green option, V(g), and then update it every time you get reward feedback using a simple reward prediction error, which is 1 minus V(g) if you get rewarded, or minus V(g) if you don't, scaled by the learning rate, shown with alpha, [00:09:43] and you can use a different learning rate for unrewarded trials. Again, when people think about these things, they say: OK, this is basically how you look at this question; we are going to fit the behavior with this (this is one possible model) and extract a learning rate. [00:10:01] And the intuition, which is very similar to the cartoon I showed at the beginning, is that when you are in a more volatile environment, you would imagine the learning rate should go up, because you want to learn faster, and in a more stable environment it should decrease. [00:10:21] Is there any evidence for that? There is an influential paper by Behrens and colleagues in 2007 that many people refer to as, you know, the ultimate answer to this question, in which they showed that the learning rate is higher in a volatile environment and lower in a stable one, and this was in humans. [00:10:45]
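The reinforcement-learning account just described can be simulated on the reversal task. (A sketch I am adding: the delta-rule update is as described in the talk, while the softmax choice rule, the inverse temperature beta, and all parameter values are my own assumptions for illustration.)

```python
import math
import random

def run_agent(n_trials, block_len, alpha, beta, seed=1):
    """A delta-rule learner with softmax choice on the reversal task.

    One option is rewarded with probability 0.8 and the other with 0.2,
    and the assignment reverses every block_len trials. Only the chosen
    option's value is updated: v_c += alpha * (r - v_c).
    """
    rng = random.Random(seed)
    v = [0.5, 0.5]               # learned values of the two options
    best = 0                     # index of the currently better option
    n_correct = 0
    for t in range(n_trials):
        if t > 0 and t % block_len == 0:
            best = 1 - best      # reversal
        p0 = 1.0 / (1.0 + math.exp(-beta * (v[0] - v[1])))  # softmax
        choice = 0 if rng.random() < p0 else 1
        r = 1 if rng.random() < (0.8 if choice == best else 0.2) else 0
        v[choice] += alpha * (r - v[choice])                 # delta rule
        n_correct += int(choice == best)
    return n_correct / n_trials

acc_learner = run_agent(4000, block_len=20, alpha=0.3, beta=5.0)
acc_frozen = run_agent(4000, block_len=20, alpha=0.0, beta=5.0)
```

With a nonzero learning rate the agent re-acquires the better option after each reversal and performs well above chance, while an agent that never updates its values stays at chance; fitting `alpha` to real choices is the standard way the learning rate is extracted.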
This experiment was replicated, actually, by Daeyeol Lee's lab in monkeys, and they found basically no difference between the learning rates in volatile and stable environments. And we did basically Behrens' experiment in our lab, and again you don't see any difference, in humans or in monkeys. So even though it looks very intuitive that if the environment is more volatile you should learn faster, because that helps you catch up with what is going on, this does not seem to be true all the time; there is no conclusive evidence for it. [00:11:26] So we asked the question of how the brain could adjust learning, and whether adjusting learning is all about the learning rate. There are many, many models out there in terms of how you could adjust learning and what might be the best way to learn. [00:11:50] I mean, these are normative models trying to say how we should do it, what the optimal way to adjust learning is, and if you are interested, we reviewed the different types in this paper. But the main issue with all these models is that they are trying to account for changes in the learning rate, for which there really is no conclusive evidence, and also that for some of these models, especially the Bayesian normative models, it is not clear how the high-level computations they assume could be implemented by neural mechanisms in the brain.
[00:12:30] Again, most of these models are hierarchical: there is a level on top of the learning process, let's say, that informs how much learning should be done, and those computations would be very difficult to implement in the brain. So we asked, first of all, seeing these results where it is not clear whether the learning rate goes up or down or does not change at all (you can look at the results and say, yes, for some subjects it goes up and for some it goes down, and we don't know when that happens, or you can say there is no real consistent change), whether you actually need something very high-level to figure out and adjust learning in this situation. [00:13:17] So we constructed a mechanistic model for the adjustment of learning to uncertainty and volatility, from the bottom up. The model is rather simple and straightforward, let's say: you have sensory neurons representing the options that are available, and they project to what we call value-encoding neurons, which are going to estimate, or learn, the values of the red and green options. These in turn project to a decision-making circuit; in this case we use a winner-take-all competitive process to make the decision. And every time you get feedback after making a decision, you update the values, and the values are assumed to be stored in the synapses projecting onto the value-encoding neurons. The idea is very simple: if those synapses are stronger, [00:14:10] you basically cause more postsynaptic activity and a higher firing rate, and as a result you represent a higher value and influence choice more strongly. So what we did is assume that in this model there is reward-dependent metaplasticity: the synapses that are learning, or estimating, the values of the different options are metaplastic.
[00:14:39] That is, they are more than just plastic. I will tell you what they are exactly, but there is a big literature out there on metaplasticity, and it has been used to answer different questions; we tried to use the same ideas here to see what they can explain. What we have is a metaplastic model of synapses with two levels of synaptic efficacy, which we call weak and strong, shown with different colors, and each level has multiple meta-states. [00:15:13] So all of these, let's say the green meta-states, which we call weak, have basically the same efficacy, but they have different levels of stability in terms of how likely they are to transition to another state; the same is true for the strong states. [00:15:32] How it works is that we assume that if there is a potentiating event, the weak meta-states become less stable, by moving upward through these states, and some of them actually become strong, while the strong states transition to more stable states. So there is a movement toward the weak states becoming more unstable and the strong states becoming more stable. Pretty much the same thing in the opposite direction happens when there is a [00:16:06] depressing event: the strong states become more unstable, some of them transition to the most unstable weak state, and the weak states become more stable. So this is basically the entire architecture, in terms of how the learning is happening. But even though the architecture is simple, it is difficult to trace what happens, because there are these meta-states that are [00:16:34] making things complicated, and there are many transition probabilities, shown with these p and q
values. The assumption here is that the more stable states have smaller transition probabilities, and you can use different equations to quantify that, but that is not the main point. The main idea is that you have these meta-states: states that have the same efficacy, so to an external observer they are doing exactly the same thing, but depending on their stability they change differently when potentiation or depression events happen. OK, so let's look at the behavior of this model in this probabilistic reversal task. [00:17:19] The task, as I mentioned, is very simple: let's say for 80 trials the green target is the better option, and after that the red target becomes the better option, and the better option is rewarded 80 percent of the time. What happens is that if you look at where the synapses go, when you are in the environment where green is more rewarding, the synapses that code for the [00:17:45] value of the green target basically become more stable: they transition from the weak states to the strong states. This is the fraction of synapses in the different strong states, and you can see that as more time passes, they move more toward S4, which is the most stable strong state. And you can see what happens to the weak states: they become empty, because there is a transition toward strong, and further down, [00:18:22] the weak states W2, W3 and W4 empty out, and most of the synapses end up, again, in the strongest states.
Now look at what happens when you are learning in an environment that is more volatile, where every 20 trials the reward probabilities switch. Again, what I call volatile and what I call stable is quite arbitrary; this is just for referring to these environments. Now that the environment changes more frequently, you can see that the synapses do not actually find enough time to go to the deepest strong state, so you basically get a different distribution of synapses across the different states, and that determines how the model responds when reversals happen. Because this model has many transition probabilities and you have to track the different meta-states, how the synapses are distributed, you can make your life easier by coming up with an effective learning rate, which also helps you compare it with [00:19:27] reinforcement learning. So what you do is basically ask what the behavior of this model would be: at the top I show the fraction of synapses in the different meta-states, and at the bottom the effective learning rate, which is what the learning rate would be if you only had transitions between weak and strong. Now, here assume that the reward is assigned to the better option: what you can see is that the effective learning rate goes up after a reversal. And you can do the same thing to compute the effective learning rate when the reward is assigned to the worse option; remember, the better option is rewarded 80 percent of the time and the worse option is rewarded 20 percent of the time. [00:20:12] If you put both of them together, what happens is that the effective learning rate for the worse option goes down. So they track each other, depending on the outcome, in this simple task, because the reward is anticorrelated: it is either on the good or the bad option.
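A minimal simulation of this metaplastic scheme is sketched below. (My own sketch for illustration: the number of meta-states m = 4, the geometric decay of transition probabilities with depth, and all numerical values are assumptions, not the model's fitted parameters.)

```python
import random

def apply_event(synapses, event, m, probs, rng):
    """Apply one potentiating ('pot') or depressing ('dep') event.

    A synapse is a (side, depth) pair: side 'W' (weak) or 'S' (strong),
    depth 1..m, with depth 1 the least stable meta-state. probs[d - 1] is
    the assumed transition probability out of depth d; deeper states are
    more stable, so probs decays with depth.
    """
    out = []
    for side, depth in synapses:
        moved = rng.random() < probs[depth - 1]
        if event == 'pot':
            if side == 'W':
                # weak synapses destabilize; the shallowest become strong
                new = ('S', 1) if depth == 1 else ('W', depth - 1)
            else:
                new = ('S', min(depth + 1, m))   # strong synapses stabilize
        else:  # 'dep' is the mirror image
            if side == 'S':
                new = ('W', 1) if depth == 1 else ('S', depth - 1)
            else:
                new = ('W', min(depth + 1, m))
        out.append(new if moved else (side, depth))
    return out

m = 4
probs = [0.4 * 0.5 ** d for d in range(m)]   # 0.4, 0.2, 0.1, 0.05 (assumed)
rng = random.Random(7)
syn = [('W', 1)] * 1000                       # start all weak and unstable

frac_strong = []                              # read-out: value of the option
for _ in range(60):                           # a long run of rewarded trials
    syn = apply_event(syn, 'pot', m, probs, rng)
    frac_strong.append(sum(s == 'S' for s, _ in syn) / len(syn))
frac_deepest = sum(1 for s, d in syn if s == 'S' and d == m) / len(syn)

for _ in range(20):                           # reversal: unrewarded trials
    syn = apply_event(syn, 'dep', m, probs, rng)
frac_after_reversal = sum(s == 'S' for s, _ in syn) / len(syn)
```

After a long stable block most synapses settle into the deepest strong state, so a reversal at first moves the read-out only slowly: that is the effective-learning-rate behavior described here, emerging from transitions alone with no explicit estimate of volatility.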
[00:20:30] One of them goes up, for the better option, and one goes down, for the worse option, which basically means that the system becomes more sensitive to outcomes that support what the past history has been telling it, and less sensitive to events that go in the opposite direction. Now you can look at the more volatile environment, and you see the same thing happening, but the effective learning rates do not go to the same values that we saw before: they stay closer to each other, they cross over, [00:21:08] and they are not so different from each other. This is a very simple prediction of this model, and you can get it with almost any parameters you put into the model. But the important thing is that this model does not predict an overall increase or decrease in the learning rates; it says that there is an increase or a decrease in the effective learning rate depending on which option you are looking at: is it the better option or the worse option? And again, this task has only two options; if you think about tasks where you have multiple options, you can see that learning becomes option-specific and history-dependent. [00:21:53] I think that could explain what we saw before, the inconsistent results on changes in the learning rate, because what this model really says is that, because of the crossover, if you average over the different possible outcomes you will see a very small difference, or no difference, in the learning rate. It actually tells you that you cannot be blind to which option you are looking at; you need to consider that, and then you will see this specific change over time. Is there any evidence for this? The data come from Daeyeol Lee's lab at Johns Hopkins.
[00:22:36] They basically ran this probabilistic reversal learning task in monkeys, and so we have thousands of trials that we could use to estimate this effective learning rate. As you see in the bottom panels, this is the estimated learning rate from the monkeys' choice behavior, and it is quite close to what we see in the model's prediction. Again, it does not need to match exactly, because that would require a specific set of parameters; the point is that the effective learning rate changes over time in a specific way, [00:23:14] and that is confirmed by the experiment. And we could only do this because there is so much data; you cannot really do this with human data. OK. Another thing is that this model predicts choice behavior better than competing models, and we tested many models, first because we are a bit obsessive, and also because of nice reviewers who did not believe us. In particular, among the Bayesian models we tested the ideal observer and the reduced Bayesian model, and they do not do better than ours. [00:23:53] In both the stable and the volatile environment our model predicts choice behavior better, and this is cross-validated, so it is not about the number of parameters; it takes the number of parameters of each model into account. One thing that I want to mention that is interesting is that, because of the way the model adjusts its behavior (again, it does not know anything about the environment; it just receives the reward feedback and makes these transitions),
[00:24:25] the point is that it predicts that the model becomes slower in responding to reversals, compared to RL models or normative models, and we can actually see the reflection of that in the choice behavior of the monkeys. So if you look at the quality of fit, the log likelihood right after reversals, you can see that the model I refer to as RDMP, the reward-dependent metaplasticity model, captures the behavior better right after reversals, which shows that the monkeys are doing the same thing: they become slower in responding to reversals. So in a way, the model is trying to discriminate better between the good and the bad option while becoming more insensitive [00:25:14] to changes in the reward environment. And there is a very interesting connection here to some of the work that Garrett Stanley has done at Georgia Tech, looking at the effects of neural adaptation and the idea that adaptation enhances discrimination at the expense of detection. I hope it is clear that there is a very similar process here: this metaplastic system is adjusting, or adapting, to the environment, and as a result it can better discriminate between the good and the bad option, but it is worse at detecting reversals; you cannot do both at the same time, so there is a trade-off there as well. I thought this link was very interesting. So, just to conclude the first part of my talk: I hope I have shown that reward-dependent metaplasticity provides a mechanism for continuous adjustment of learning without any underlying optimization or knowledge of the environment, and it predicts that the learning rates are time-dependent, history-dependent, and option- or action-specific. So there is no single learning rate that applies to all sorts of learning, options, and so on, and this is a very different view of these models
[00:26:32] and of why we learn differently about different options, let's say: again, because there is no single learning rate. It also predicts, or explains, why there is no significant change in the overall learning rate according to the volatility of the environment, which could explain the inconsistent results on the learning rate [00:26:58] (sorry, the effective learning rate). So, I am doing so-so on time, not so great. I could stop here and answer some questions, or keep going before moving to the next question. OK, let's continue. [00:27:32] OK, so going back to the main point of my talk. Now let's look at the question of, as I said, real-world options having many attributes or features; the question is how we should combine these different pieces of information for making decisions, and specifically whether the strategy for combining reward information depends on uncertainty. Let me give you a quick introduction here, just to make sure we are all on the same page; this is simple stuff you may already know. When people think about the valuation of risky options, let's say you have a gamble that gives you magnitude m1 with probability p1, magnitude m2 with probability p2, and so on, how would you evaluate it? The simple mathematical answer is that you compute the expected value, which is just the sum of the products of the magnitudes and probabilities.
[00:28:22] That is what you should do, but nobody does that, and this has been known for the past 200 years: expected value is not what drives behavior. So people came up with expected utility theory, in which the magnitude is replaced with a function of the magnitude. It basically tells you that reward outcomes have a utility, and that utility is subject-specific, so you replace the magnitudes with a function of them; this goes back to Bernoulli roughly 250 years ago. It turns out that does not work either, and so prospect theory came along and said that actually everything is subjective: you multiply a function of the magnitude by a function of the probability, called the probability weighting function, and that is how you compute subjective value. Again, the functions depend on the subject, and so on. But no matter which of these models you pick, [00:29:19] you always multiply a function of magnitude by a function of probability. An alternative is that you could combine these two pieces by summing them with different weights: you could take the utility of the certain outcome and add the weighted probability of that outcome to come up with a subjective value. And you may not even need to come up with a subjective value at all; you can use this way of weighting them just to compare options. I will get to that quickly in a second. So instead of multiplying the two, which is the mathematically correct thing to do, [00:30:02] you sum them. Summing may seem very stupid, because you are comparing apples and oranges: magnitude and probability have different units, and they can be on very different scales, so how would you actually add them? But if you think in terms of the brain, that is what the brain does all the time:
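To make the multiplicative versus additive contrast concrete, here is a toy comparison. (My own illustration: the normalization of magnitude to the [0, 1] range, the two example gambles, and the specific weights are assumptions, not values from the talk.)

```python
def multiplicative_value(m, p):
    """Expected-value-style combination: magnitude times probability."""
    return m * p

def additive_value(m, p, w_m=0.5, w_p=0.5):
    """Additive combination: a weighted sum of (normalized) magnitude and
    probability. The weights can be re-tuned when one attribute becomes
    unreliable, e.g. down-weight probability when it is volatile.
    """
    return w_m * m + w_p * p

# Two gambles with magnitudes normalized to [0, 1]: a likely small reward
# versus an unlikely large one.
safe_bet = (0.3, 0.8)    # (magnitude, probability)
long_shot = (0.9, 0.3)

mult_prefers_long_shot = (multiplicative_value(*long_shot)
                          > multiplicative_value(*safe_bet))
# Weight on probability (a stable world): the safe bet wins.
stable_pref = (additive_value(*safe_bet, w_m=0.2, w_p=0.8)
               > additive_value(*long_shot, w_m=0.2, w_p=0.8))
# Weight on magnitude (a volatile world): the long shot wins.
volatile_pref = (additive_value(*long_shot, w_m=0.8, w_p=0.2)
                 > additive_value(*safe_bet, w_m=0.8, w_p=0.2))
```

The multiplicative value is fixed once the gamble is fixed, but the additive preference flips when the weights shift, which is exactly the flexibility argued for here.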
[00:30:24] it combines things that have no common units. So what would an additive model give you? It is very funny: if you look at the hundreds of papers, and, you know, the fights over what the functions should be that are multiplied to give you subjective value, very few papers actually test whether you might instead take these two functions and add them. [00:30:55] Anyway, what would the additive model give you? It is very simple, and related to what I was saying: when you have to combine probability with magnitude, an additive model allows you to put less weight on the probability when it is uncertain, because it changes all the time, and put more weight on the magnitude; and if the environment is stable, you can put more weight on the probability than the magnitude. The additive model basically gives you flexibility to deal with uncertainty in the environment. [00:31:31] So we asked whether moving, or transitioning, between an additive and a multiplicative model of value actually depends on uncertainty. We assumed that under risk, when the reward information is given, so there is no uncertainty about the probabilities, [00:31:53] (even though there is some expected uncertainty here, because the outcomes are probabilistic) people would use a multiplicative model, and that under uncertainty, when the reward probabilities are not given and have to be learned, they would use an additive one. That is basically what we wanted to test. [00:32:17] So we looked at two different tasks. This is a cross-species study, with data from Ben Hayden's lab at the University of Minnesota. The monkeys are presented with two gambles; the gambles are shown with two bars that have differently colored parts, and the color tells them how much water they get. So if, let's say, they choose the left option, which has blue, green, and red parts:
[00:32:44] if they get the green part, they get a medium reward, a certain amount of juice, and if they get the red part, they get nothing. The monkeys know what each color means; they have been trained. And the size of each part tells them the probability of getting each outcome. I can assure you we have done all the controls to make sure they completely understand and are sensitive to both pieces of information. We basically replicated the same experiment in humans; we just replaced the drops of juice with points that would turn into money. Otherwise it is very similar, because we wanted to be able to compare them as [00:33:26] closely as possible. So this is a gambling task, where we look at choice under risk. And then we used the probabilistic reversal learning task, the same task I explained earlier, though I didn't explain all of its details, in which the animals choose between two options, a red and a green option, and there are dots on each option that tell them how much reward they may get, with a probability that they have to learn. So if, let's say, the red option is 80 percent rewarding, which again has to be learned by the animal, it may give three drops of juice with that probability, and if they choose the green option, they may get one drop of juice with a probability that, again, has to be learned. So basically this task is a gambling task in which the reward probability has to be learned, and you can have different versions of it where the reward probabilities change frequently, or infrequently, or not at all; so you can have a completely stable environment where the reward probabilities are fixed, but they are not given, and the animal has to learn them. We did the same experiment again with humans,
[00:34:40] where again we replaced the number of dots with numbers; I will jump over the details because they are not important. What we found, now comparing these two tasks, is this. When you look at the gambling task, where you look at choice under risk (this is in monkeys), in most sessions the animals' choices are explained by a hybrid model, which is a mixture of the multiplicative and additive models, or by a purely multiplicative model; when I say the hybrid model, I mean that it is the hybrid model together with the multiplicative model that can explain most of the sessions. [00:35:19] This parameter is the weight of the multiplicative component of the model on the choice behavior, and as you can see, most of the time the multiplicative component is dominating the behavior. Now the same monkeys (well, actually different monkeys) are doing this [00:35:41] mixed learning task, the probabilistic reversal learning task, and now you see the pattern completely switches to a hybrid model which is dominated by the additive component, because the weight of the multiplicative component is very small. So you can see the shift clearly in the monkey data, and this is the human data, where you get basically the same shift, slightly weaker in humans than in monkeys. In monkeys this comparison is within subjects; in humans it is across subjects. But you can see that [00:36:12] the multiplicative model explains the choice behavior under risk, it has a stronger influence on choice behavior under risk, while the additive model provides a better fit for choice under uncertainty. This result is interesting because, and this dimension is even more important than the overall fit, an additive model does not require construction of value at all. What it really means is that the animal could just compare the probabilities of the two options, and compare the magnitudes of the two options, and then combine those comparisons to make a choice; there is no need to actually come up with a subjective value in this case, which is completely
[00:36:56] undermining the most important concept in neuroeconomics. And it happens; there is other evidence that this happens in most cases, so a truly multiplicative model, with full construction of subjective value, is rare. Now, if you remember, the premise of this additive model was that it is more flexible. To see whether that is true, you can ask whether the weight of magnitude relative to probability differs between a stable and a volatile environment, or a less and a more volatile environment, and as you can see, it does. When you increase volatility, the weight of magnitude relative to probability increases, because the probability is not reliable, and so the animals, both monkeys and humans, rely more on the magnitude, which is given and is certain, rather than the probability. So what I claimed earlier, that the additive model allows for flexibility because you can adjust how much weight you put on different pieces of information, is actually true behaviorally. And you can also see the reflection of that in how neurons in the prefrontal cortex encode the reward information. [00:38:17] Here we are looking at neurons that encode the difference in the magnitudes of the two options on the screen, and how strongly they respond to the difference in magnitude is correlated with how strongly the monkey weights magnitude over probability in each session of the experiment. So there is a correlation here, in this panel, which is not true for neurons encoding the sum of the reward magnitudes, and it shouldn't be, because we think it is the difference-encoding neurons that are [00:38:48] helping decision making. So there is a reflection of what we see in the behavior in the neural data as well. OK, let me check how I am doing on time.
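The hybrid model discussed in this part can be sketched as a weighted mixture of the two combination rules, with a softmax choice rule on top. (A sketch under my own parameterization: the mixing weight omega, the attribute weights, and the softmax with inverse temperature beta are assumptions; the fitted model in the study may be specified differently.)

```python
import math

def hybrid_value(m, p, omega, w_m=0.5, w_p=0.5):
    """Hybrid subjective value: omega weights the multiplicative component
    and (1 - omega) the additive one, for a gamble with normalized
    magnitude m and probability p.
    """
    return omega * (m * p) + (1.0 - omega) * (w_m * m + w_p * p)

def p_choose_first(g1, g2, omega, beta=10.0, **kw):
    """Softmax probability of choosing gamble g1 = (magnitude, probability)
    over g2, given the mixing weight omega.
    """
    dv = hybrid_value(*g1, omega, **kw) - hybrid_value(*g2, omega, **kw)
    return 1.0 / (1.0 + math.exp(-beta * dv))
```

Fitting omega to choices is what the session-by-session comparison amounts to: omega near 1 corresponds to the multiplicative regime seen under risk, omega near 0 to the additive regime seen when probabilities must be learned, and in the additive regime shifting w_m upward captures the increased reliance on magnitude under volatility.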
I'm not doing well, Ok. So here you can also compare the learning rates on rewarded and unrewarded trials and other strategies, and there is no difference between the more and less volatile conditions. [00:39:15] The volatility does not have any effect; again, this is in the monkeys. You could ask whether the same happens in other species, so here is data from a colleague's lab at Johns Hopkins, looking at the choice behavior of mice licking left and right and getting reward, where we estimated their learning rates in more and less volatile environments. The definition of less and more volatile is not the same as before, so forgive me for that, but the idea is that if you fit two learning rates in this task, you can actually see that volatility slightly increases the learning rate on rewarded trials but actually decreases the learning rate on unrewarded trials. That is an interesting observation that again adds nuance to what types of adjustment one would see with volatility, and a similar thing happens when you think about a predictable versus an unpredictable environment, which I'm going to skip to get to the [00:40:26] more important things. So I hope this second part of my talk showed that under risk, humans and monkeys use a multiplicative model for combining the reward information, which could correspond to integration of reward attributes into a subjective value. But under uncertainty that goes away, and you get an additive model instead, which I think means that the animals are comparing attributes rather than fusing information to come up with one single subjective value.
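A minimal sketch of that idea, with separate learning rates for rewarded and unrewarded trials (the values 0.3 and 0.1 are arbitrary placeholders, not fitted parameters from these studies). Lumping the two into one rate would hide exactly the opposite adjustments just described.

```python
def update_value(value, reward, lr_rewarded=0.3, lr_unrewarded=0.1):
    # Delta-rule update; reward is 1 on a rewarded trial, 0 otherwise.
    # Volatility could raise one rate while lowering the other, which a
    # single lumped learning rate cannot express.
    lr = lr_rewarded if reward else lr_unrewarded
    return value + lr * (reward - value)
```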
[00:41:03] And that helps them to be more flexible. So from here you can also see that there is a tradeoff between a flexible and a precise model for evaluation. Again, here I'm using flexibility in a loose sense; precision, in the sense of mathematical precision, is what would favor [00:41:29] the model based on multiplication of probability and magnitude. So, I have 5 minutes, yes, and I can just mention what I wanted to say in the last part of my talk, which is the question of what to learn. The idea is this: if you are eating an apple, and it's crispy and it's red and it's big, and you have a good experience, what do you want to learn? You could say that you are learning about the texture or the color of this apple, or that you are actually learning about this specific type of apple. [00:42:10] Based on work in vision, I refer to the latter as object-based learning, where you learn something about the object itself; the alternative is that you learn about its features. The main argument, which we will come back to, is that if you want to learn about objects, there are many things to learn about, so it is a huge dimensionality reduction if you instead learn about the features of an object when you get feedback. It also allows faster learning, because if you update the value of an object, you can only update it when you see that object, whereas you can update many features every time you receive feedback on an object. [00:42:53] That gives you much faster learning. But if the world is not structured such that the features of an object predict its value with enough consistency, if not all the red,
crispy things are good, yes, then that basically pushes you back toward object-based learning. So what I want to say is that there is a tradeoff between learning about an object versus learning about the features of an object, and we tested this tradeoff in a few experiments that I don't have time to go through. But what we show, if I can just shamelessly skip ahead, [00:43:41] is that all of these factors actually drive the behavior one way or the other; you can read about this in a published paper. Volatility of the environment will push you toward feature-based learning: when the environment changes more frequently, you adopt feature-based learning. But if the environment is not generalizable, you go back to object-based learning, and if the environment is high-dimensional, you go to feature-based learning, because again it allows you to learn faster. So again you can see there is a tradeoff between a flexible model versus a precise model for learning from reward feedback. [00:44:22] You can read about that work in my paper published in 2017, but the more interesting things are what happens between object-based learning and feature-based learning. What I mean by feature-based here is that you learn about individual features, the color or the shape or something; object-based learning is where you learn about each object itself. So you can imagine something between the two. [00:44:51] What could be between the two is learning about the conjunctions of the features. This may sound like I'm using terms from vision.
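The dimensionality reduction and the generalization that feature-based learning buys can be counted out in a small sketch (the feature names here are illustrative, not the stimuli actually used in the experiments):

```python
import itertools

colors = ["red", "green", "yellow"]
patterns = ["striped", "dotted", "plain"]
shapes = ["square", "circle"]

# Object-based learning: one value per object (3 * 3 * 2 = 18 values),
# and each value is updated only when that exact object is chosen.
object_values = {obj: 0.0 for obj in itertools.product(colors, patterns, shapes)}

# Feature-based learning: one value per feature (3 + 3 + 2 = 8 values);
# every feedback updates all three features of the chosen object.
feature_values = {f: 0.0 for f in colors + patterns + shapes}

def feature_update(obj, reward, lr=0.2):
    for f in obj:
        feature_values[f] += lr * (reward - feature_values[f])

def feature_value(obj):
    # Additive read-out over the object's features; note that it assigns
    # a value even to objects never seen before (generalization).
    return sum(feature_values[f] for f in obj)
```

After one rewarded trial with a red striped square, an unseen red striped circle already inherits value from two shared features, whereas the object-based learner would know nothing about it.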
I borrow from the field of vision because I have some background there from my postdoc, but this should make sense to other people as well: we could learn something between the two, and I think that is very important, because what we know about the representation of reward is mostly limited to very one-dimensional settings, where the animal presses the green or the red button and you see how reward values are represented in the brain. [00:45:30] You can now build more complex tasks where you have, in this case, three features (shape, pattern, and color), the subject chooses between options and gets reward feedback, and with specific reward schedules (this is the reward probability assigned to the different options) you can actually study how humans learn about the different features. What we find, to go quickly to this part, is that you can look at the choice behavior, by fitting the choice behavior and tracking the goodness of fit over time, or you can look at the estimates of the subjects over time. What you observe is that they start learning about the informative individual features very quickly, as you can see from the weights in the fit and from their estimates in the right panel, [00:46:24] and then they slowly start learning about the informative conjunctions. Learning the informative conjunctions is much slower, for perhaps two reasons: one is that they see those conjunctions less often than the individual features, but there could also be other mechanisms involved, such as attention, that make the learning faster or slower. So, to spend just maybe one more minute on this, you can actually capture this behavior with a recurrent neural network. [00:47:00] We do different kinds of modeling, but it seems that this approach is suitable here.
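The two-timescale pattern just described, fast learning of informative features and slower learning of informative conjunctions, can be sketched with a simple update rule; the learning rates are arbitrary stand-ins, and the actual analysis fit richer models.

```python
import itertools

def update_estimates(chosen, reward, fvals, cvals, lr_f=0.3, lr_c=0.05):
    # chosen: the features of the chosen option, e.g. ("red", "striped", "square").
    # fvals holds one estimate per feature, cvals one per feature pair;
    # the smaller rate for pairs mimics slower conjunction learning.
    for f in chosen:
        old = fvals.get(f, 0.0)
        fvals[f] = old + lr_f * (reward - old)
    for pair in itertools.combinations(chosen, 2):
        old = cvals.get(pair, 0.0)
        cvals[pair] = old + lr_c * (reward - old)
```

A second reason for slow conjunction learning, which this sketch captures only partly, is that any particular pair of features is encountered less often than either feature alone.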
It is preferable to models that you build from scratch, because it's not clear how different representations are actually formed over time. Again, as you are experiencing the task, choosing something and getting reward feedback, it's not clear whether we actually have a representation for certain features in our value system or not, or whether we have a representation for conjunctions of the features or not. We know that we have all those representations when it comes to visual processing, but [00:47:40] the main difference there is that for learning all that input information about the visual world, you just need to open your eyes and you get feedback constantly; you can understand the statistics of all sorts of conjunctions in the world just by observing it. When it comes to the reward environment, you don't get enough feedback, and that is really the main difference. So, to go toward my conclusion: you can actually capture these things, and you can look at the representations in the recurrent units and try to relate them to something that is probably interesting for some people here. [00:48:21] If you look at the recurrent units (excuse the rough schematic here), you could have different types of units, with plasticity in the recurrent excitatory and inhibitory connections and in the inputs to these units, and these units actually develop different representations in terms of these learning strategies. That is something very exciting that we are exploring now, or I should say my former student Shiva is [00:48:52] exploring, and it seems to give us very nice predictions in terms of how different subtypes of neurons could contribute to the representation of value in the brain. So, with that,
I hope I have convinced you that certain characteristics of the real world require flexibility and continuous adjustments of learning and decision making: in terms of how to combine or integrate reward information, what to learn from reward feedback, and how much to learn from each feedback. [00:49:30] And there is always a tradeoff between being flexible and being precise. In terms of the strategy for combining reward information, the strategy I mentioned is one where you fuse the information, reward probability and magnitude, into subjective value, versus one where you compare attributes. In terms of the learning, I didn't get to talk about it too much, but the question is whether, upon choosing an option and getting feedback, you learn something about the object itself, about its features, or about a conjunction of the features. [00:49:54] And in terms of the amount of update in the estimation of reward value, as I mentioned earlier, we may have different learning rates for updating. In all cases it seems that flexibility is more important, because if you are precise, which means you are trying to optimize the amount of reward you get, and the environment changes drastically, you'll be dead. It's not as important to get one percent more juice or fruit or whatever versus being alive, and that is really what is driving my thinking about these tradeoffs between flexibility and precision. So, before I forget, let me acknowledge all my collaborators. [00:50:59] Actually, I should first acknowledge Shiva, my graduate student, who is now on the job market; she is doing a postdoc, you probably saw her name on all the studies I mentioned, and she is amazing. And my collaborators, Daeyeol Lee and Ben Hayden among them, without whom we could not do so much, especially the human experiments. Thank you for [00:51:25] your attention, with five minutes or less to spare. I'm open to some questions.

I have a question, yes.
Well, first of all, I really enjoyed that; thanks very much for that provocative presentation, and I think I'll have a lot to talk about with you later. If I understood correctly, [00:52:04] one of the things you were saying in the first part of your talk was that when you introduce volatility into the rewards, it seems to be the case that the learning rate, which we usually lump together and refer to as one learning rate, is actually something much more complicated than that. Maybe [00:52:28] there are multiple learning rates, and they are very dependent on the task and so on. Did I get that right, that volatility can enhance the learning rate in one sense but diminish it in another sense, and one would then interpret the lumped rate as showing no change at all?

Exactly, that's correct, yes. And the other thing is that they don't have to be the same: if you have these adaptive systems, the rates are different for different options, probably for different domains, and so on, depending on what you clump together, because probably there is some clumping in how the brain connects things. But yes, they are different for different options, and they are different with respect to the reward feedback, for example whether the feedback is confirming what has been going on versus not confirming it. Clumping all of that into one learning rate means you lose all the important information about how this system is adjusting [00:53:30] by doing that.

Yeah, Ok, thank you, and maybe just a quick follow-up to that.
I'm always wondering, in these kinds of behavioral experiments where you start playing around with the reward probabilities and so on, to what extent the subject is aware. I'm imagining myself playing this game and at times thinking, wait, I know I was correct on that one, and yet I didn't get a reward, versus other times when I might blame myself for getting it wrong. Perhaps this is something we cannot determine from an animal study, but I'm wondering if you have any insights about what the perspective of the individual is and what impact that would have. [00:54:11]

I wish I knew the answer to that question, so I don't know. Even for the human experiments we do in my lab, I strongly believe that human subjects are able to produce a lot of **** when you ask them what they did. That's why I simply don't ask them about their strategy, about what they do, because, again, we grew up doing that: from childhood we have had to come up with explanations for our choices, so we are very good at it, and I don't think it is really related to what we do; I don't think we have access to that at the conscious level. So I don't know the answer, whether the subjects know what they are doing or not. Some people have done experiments on probabilistic reversal learning looking at how people feel and things like that; I can tell you about it later.

[00:55:10] It sounds like estimating confidence might be a way to get insight, more than just asking about conscious choice.
Yes, hi, that was a great talk. I have a question related to your thoughts on what the benefits might be of having a simpler additive model in terms of implementation at the neural level, because, to a first approximation, in attractor networks, implementing these kinds of more complicated multiplicative models can be problematic in a simple sort of network model. So I was wondering what your insight would be on how this simplifies the problem. [00:56:06]

Great question. So there are two issues with the multiplicative model. Again, the multiplicative model is a very simple version of the idea that we combine all the information to come up with one number, and it looks very elegant: subjective value. That's why people love it. You come up with one number for all the garbage you have to deal with, you come up with a number for the other option, you compare the numbers, and that's beautiful; you can make a decision over anything you want. But first of all, that requires a lot of wiring that needs to be tuned, so that all these areas carrying this piece of information and that piece of information send it to the place where the subject evaluates and computes; it requires a lot of wiring, and it also requires that the weights are not off, because if they are off, it's over. You cannot just suddenly change your connections to make them weaker or stronger. But if you are doing comparisons, which is the point of the additive model, and, as you know better than anybody, inhibition is really the main mechanism there, you just compare. You can do it within an area: you compare and you know which option is better in terms of one specific attribute; in another area you can do it for another attribute. All you need to do now is decide which of these comparisons to listen to, and which one you should listen to need not be determined [00:57:25] by any
normative model, but by your state. If you are hungry, you ignore everything else and just look at the magnitude; if you are in a different state, you listen to another system more strongly. That's why, if you look at the evaluation system, you see all these connections between areas, which I think is telling you that the comparison is done within an attribute: you compare things that are relevant to probability, to magnitude, to other attributes, and then you combine the comparisons. So in those terms there is more flexibility in how the animal, or the brain, can choose what information to listen to, without trying to come up with a way of converting everything in one place and then using that; again, that is a story which to me seems wrong, but it is very difficult to tease these alternatives apart. Here I showed some examples because we have so much data that we could really dig into it, and [00:58:33] we could actually tease apart what they are doing. But yes, there are a lot of interesting things, especially since you are working at the circuit level, in terms of what this computation would imply for inhibition and competition between circuits under these two types of computation, and I think there is a lot we can talk about. [00:58:57]

Thanks. I didn't leave much time for questions, by going over time.

I want to ask about optimality, but I wonder whether we should save it for our conversation. Is that Ok?

I mean, sure, you can ask the question, but maybe it's more polite to take it offline. [00:59:35]

You're exactly right. Ok, then thank you again for this wonderful presentation. I think a few of us are going to continue the conversation with you.
Thank you. Thank you very much again for the kind invitation, and I'm looking forward to my conversations with you and a few other people. [01:00:15] I hope, above all, that I made the point I wanted to make about flexibility. Yes, thank you very much.
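The attribute-comparison scheme from the exchange above, where each area compares one attribute locally and a state signal decides which comparison to listen to, can be sketched as follows; the weights and option values are hypothetical, and no fused subjective value is ever computed.

```python
def sign(x):
    return (x > 0) - (x < 0)

def attribute_comparison_choice(opt_a, opt_b, w_prob=0.5, w_mag=0.5):
    # Each option is (reward probability, reward magnitude). One "area"
    # compares probabilities, another compares magnitudes; the state-
    # dependent weights (e.g. hunger upweighting magnitude) decide which
    # local comparison dominates the final choice.
    vote = w_prob * sign(opt_a[0] - opt_b[0]) + w_mag * sign(opt_a[1] - opt_b[1])
    return opt_a if vote >= 0 else opt_b
```

Setting `w_mag` high and `w_prob` low plays the role of a hungry state: the magnitude comparison wins regardless of the probability comparison.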