All right. Thank you, everybody. Thank you for coming. Can everybody hear me okay in the back? Is Zoom okay? Yeah. All right. Then we can get started. As I said, my name is Anya, and I'm an assistant professor in psychology. My background is mainly in cognitive neuroscience, with a recent foray into AI. As it turns out, there are lots of similar questions that have been asked in both fields and lots of similar techniques that we can use to study them.

Imagine a powerful device that can let us seamlessly take a thought in one person's mind and transmit it into another person's mind. Well, we have this device. It's called language. So I can go outside, see the beautiful blue sky in Atlanta today, and say, "The sky is blue," and you will have a picture of a blue sky in Atlanta in your mind. This is something we do all the time. Of course, I'm transmitting information directly into your brain right now, and we often don't appreciate just how powerful this tool is and how much it has done for human culture and human civilization. But there is one more thing happening in this process. Not only does the listener, the recipient of the message, get the message, but they are now also able to reason about the speaker's mind. You can ask: well, I guess Anya believes that the sky is blue. How did she know? Is she lying? What was her source of information? What else does she know? Language is also a way for us to probe the contents of other people's minds.

In fact, that same idea also has deep roots in artificial intelligence with the Turing test. The Turing test, from 1950, is a setup that allows a human observer to interact either with another human or with a computer through a chat interface. The original intent is to figure out whether they're speaking to a human or a computer, but ultimately, the way they're doing it is by probing somebody else's cognition, knowledge, and reasoning abilities through language. So we're using language to interpret the underlying thought.

And this link has been made even more prominent with the development of large language models. These models are neural networks that are trained on very large amounts of text, pretty much the entire Internet these days. And they're trained with a very basic objective, the next-word prediction task. So, "The fox chased the ___": maybe a model would predict that "rabbit" is coming next as a plausible continuation. Then there is additional fine-tuning that allows a chatbot to actually answer questions and provide explanations. But ultimately, it's a big prediction engine (a minimal sketch of this objective follows below). Trained on language, these models now turn out to exhibit a lot of behaviors that in some way resemble reasoning, and they exhibit a lot of knowledge that they've extracted from the underlying linguistic training data.

And so as a result, we now witness the rapid growth of a new discipline that you might call AI psychology, where cognitive scientists study artificial models as if they were human subjects to probe what kinds of cognitive skills and abilities they have. So you might have studies that look at theory of mind, that is, social cognition in these models, working memory capacity, personality traits and political biases, and some go as far as saying, well, maybe we can actually replace human participants with these language models because they can simulate human responses so well. They can't quite, but people have pretty far-reaching ideas in that space.
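To make the next-word prediction objective mentioned above concrete, here is a minimal sketch of querying a model for its distribution over the next word. It assumes the Hugging Face transformers library and the small public gpt2 checkpoint; these are illustrative choices, not the specific models discussed in this talk.

    # Minimal sketch: next-word prediction with a small public model (gpt2).
    # Illustrative only; not the models discussed in the talk.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    inputs = tokenizer("The fox chased the", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # [batch, seq_len, vocab_size]

    # The last position holds the model's distribution over the next word.
    probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(probs, k=5)
    for p, idx in zip(top.values, top.indices):
        print(f"{tokenizer.decode(int(idx))!r}: {p.item():.3f}")

Everything a chat assistant does is built on top of exactly this kind of distribution, plus the fine-tuning mentioned above.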
And so that's cool. But there is a danger here, right? There is potential to start conflating language and cognition. Just because these capacities are tightly interconnected and we use language to reason about cognition doesn't mean that they're one and the same. And so you might fall into the fallacy that someone or something is good at language, therefore they're good at thinking. In fact, it happens all the time already. If you have an eloquent speaker, maybe a politician who really draws you in with their words, speaks very persuasively, very convincingly, you are more likely to buy their argument, even if, when you look at the content more closely, it kind of falls apart. The opposite kind of fallacy is observing someone or something being bad at thought, or at a particular cognitive capacity, and then dismissing all of their abilities, including language. And that has happened a lot with language models, especially a little earlier on. People would say, oh, look, they hallucinate, or they fail at this complicated math problem that I gave them; therefore, these are not good models of intelligence, and dismiss all the things that they can do, in particular their quite remarkable linguistic abilities.

And so the main claim that I'm trying to make here is that when we evaluate cognitive systems, we really should try to dissociate how good they are at language from other traits that we might call cognition, intelligence, thought, and whatever else you want to put under that umbrella. In my work, I look at this language-thought relationship in humans and human brains and in artificial models, and I think there is a lot of potential for these two areas to talk to each other and mutually inform one another. In the talk today, I will first give you background on why we think we should separate language from other kinds of cognition in humans by telling you about the language network. Then I will provide a framework that allows us to talk about the language-cognition relationship in a little more depth, and that's the notion of formal and functional linguistic competence. And then we're going to look at three case studies, computer code comprehension, legal reasoning, and generalized world knowledge, to see how some of these ideas play out, with a heavy emphasis on human neuroscience, since it's a neuroscience seminar series, but also with some language model material toward the end if we get to it.

Okay, so let's get started. Decades of research have shown that language processing in the human brain takes place within its own dedicated network. This network responds to language at different levels of granularity: words, phrases, sentences. It responds during both listening and reading, so different kinds of language comprehension, and also during speaking and writing, so different kinds of language production. The exact location of that network will vary from person to person, so this is just a rough location of where it is. It's typically left-lateralized. And in addition to being responsive to all different kinds of language, this network is selective for language. So if we look at how strongly this network gets engaged in response to a linguistic stimulus, a written sentence, a spoken sentence, you will see high activity, high engagement.
But if we measure its responses to all kinds of cognitive tasks that are not linguistic, then its response is low, and that applies to math and logical reasoning and problem solving, different kinds of conceptual knowledge, et cetera. This evidence from brain imaging is also corroborated by studies of patients with damage to the language network. People who have really extensive damage to their left hemisphere that essentially took out the vast majority of the language network, they, of course, have severe problems with all different kinds of language comprehension and production. This phenomenon is known as global aphasia. But if the damage is primarily limited to the language areas, patients exhibit unimpaired performance in all these other kinds of cognition, such as arithmetic and problem solving and reasoning about cause and effect and about other people's states of mind. So evidence from both brain imaging, fMRI, and from global aphasia suggests that the language network is selective for linguistic inputs and outputs.

But that's not the end of the story. Of course, the language network doesn't operate in isolation. First of all, you need to get the information into the language network somehow, right? That's kind of a given. So audition for speech, vision for written language or sign language. But then even after that, we have this language network that's responsible for core linguistic knowledge, and we need to make sense of the words that have been spoken to us. We need to integrate them with the rest of the information that we know and maybe reason over it. So then a host of other networks gets involved: regions responsible for social knowledge and reasoning; general modeling of the situation, keeping track of different agents in the scene; a whole host of other regions responsible for domain-specific world knowledge, including knowledge we might acquire through our senses; a network responsible for general cognitive tasks, so problem solving and executive control; and putatively some regions that may be responsible for semantic tasks specifically. What's happening is that there is a constant flow of information between the language regions and all of these other networks that together enable us to use language to do things in the world. And so we can ask: we know these are all different capacities, so how are they interrelated? What are the mechanisms supporting them in humans and in artificial systems that use language? Are those systems using language in a human-like way?

And so what we can do is group these capacities into two categories. One is formal linguistic competence. These are language-specific computations, taking place in the language network in humans, that allow us to interpret language according to the rules of its grammar and its vocabulary. But in addition, we also have functional competence, which is a whole host of non-language-specific abilities. These systems can operate over non-linguistic inputs; if you show people a movie with the relevant content, the systems will come online. But in order to use language to do things in the world, to interpret the meaning of what's being said to you, you need all these other systems as well. To give you an example, an example of formal competence is a sentence like "The keys to the cabinet are on the table": knowing that the verb here is "are" and not "is" is an example of formal linguistic competence that requires knowing the rules of English syntax (a minimal sketch of probing this in a language model follows below).
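As an illustration of how one might probe this kind of grammatical knowledge in a model, here is a minimal sketch comparing a model's next-word probabilities for the two verb forms. Again this assumes the Hugging Face transformers library and the gpt2 checkpoint as illustrative stand-ins, not the specific models evaluated in the work discussed here.

    # Minimal sketch: does a model prefer "are" over "is" after a plural
    # subject with an intervening singular noun? (gpt2, illustrative only.)
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    inputs = tokenizer("The keys to the cabinet", return_tensors="pt")
    with torch.no_grad():
        next_logits = model(**inputs).logits[0, -1]
    probs = torch.softmax(next_logits, dim=-1)

    # GPT-2 tokenizes words with a leading space, hence " are" / " is".
    p_are = probs[tokenizer.encode(" are")[0]].item()
    p_is = probs[tokenizer.encode(" is")[0]].item()
    print(f"P(are)={p_are:.4f}, P(is)={p_is:.4f}, agreement: {p_are > p_is}")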
And surprisingly, this actually has been quite easy for language models, starting pretty much as early as GPT-2 and definitely by GPT-3. And it's a remarkable scientific and engineering breakthrough. People these days are so focused on AGI that the linguistic abilities of these models are taken for granted. But it's really non-trivial, and it's not something that linguists were expecting: being able to extract the richness of grammatical rules for English and many other languages just from pure text data is a really non-trivial achievement that tells us a lot about language as a cognitive phenomenon.

On the functional competence side, we might have an example like this: six birds were sitting on a tree, three flew away, but then one came back; there are now four birds. So "four" is a word, and if you're predicting this paragraph one word at a time, you will need to find the appropriate word form. But in addition to that, you also need to perform some basic calculation. This basic math is an example of functional linguistic competence, and this is where it can be challenging for language models. If you give them this example, they will be okay, in part because they've seen it many times. But functional competence domains like mathematical reasoning and social cognition are where you see their performance start to break down. And even when they do succeed, it might be for the wrong reasons: their success might rely on memorizing the answer, or figuring out how to solve a narrow class of problems without generalizing to a broader class, just picking up some heuristics from their training data. And so one thing the AI community has realized is that just scaling these models up and leveraging next-word prediction alone is no longer enough to improve functional competence. These models are now being fine-tuned on specific tasks and sometimes augmented with additional modules beyond just language, kind of like what we have in the human brain. That shift is already apparent in AI, even though many people still try to have a model that does it all in one go.

One example that might be helpful: this is the language network, and this is another network that we call the multiple demand network, or the frontoparietal network, which is responsible for general reasoning and problem solving. In general, the harder the problem you're trying to solve, the more your brain hurts, so to speak, the more activity we will see in that network. So in response to, say, the bird problem, what's going to happen is that you're going to read the description. It's going to flow through your visual cortices into the language network. The language network will extract the meaning behind the problem, but it cannot do math. So it will pass the relevant information to the multiple demand network, which will then perform the calculation and return the answer, which you can then verbalize. So we see an interaction between these networks in this simple example, and we think that happens for the vast majority of linguistic behaviors. Any questions at this point?

Yeah. Does it matter what the source of the data is? Because, for example, a video is way more data compared to text. How does the brain deal with that?
So the language network is selective for linguistic information, and we'll see going forward how much we can push that selectivity, whether we have exceptions to it. But in general, information from videos flows primarily through visual cortices, from low level to higher level, and if you think that it's the amount of information that matters, then you would expect the opposite: you would expect the brain region to respond more to videos than to language, because language is more bare-bones. But in the language network, we see the flipped result, where it prefers language over non-linguistic kinds of input.

Yeah. I'm not so familiar with this, but you mentioned that the upstream logic processing area can send information to the language processing area in humans. Do you think large language models should be combined with other reasoning models that work together? Maybe the reasoning model produces some output and inserts the answer into the language module?

Yeah, that's great, because I'm actually not coming back to this point, but in the paper where we discuss formal and functional competence, we actually say: look, if the human brain is compartmentalized, where you have language processing and then cognitive processing in other parts of the brain, then if we want a model that's human-like, we probably want that same kind of specialization. And there are two ways of doing it. One is to build in those modules in advance; we call it architectural modularity. So you have a language module and a reasoning module, and you combine them together. Or you might have an architecture that's end-to-end, trained holistically, but something in that architecture permits it to develop specialized modules within it over the course of training. And that's a hard problem: how do we do that? What kinds of training objectives would we need? But in general, I think it's possible to have a system that is modularized in the end but doesn't start out with modules predefined by human engineers. Great.

Okay, so with that, let's dive in. In this first study, we asked: how specialized is the language network, really? Is it really only responsive to language, or maybe there is something about language that is shared with other inputs? Here we looked at computer code comprehension as a compositional symbolic system that resembles language in many ways. In fact, they're called programming languages, so clearly there is a parallel there. If you think about what happens when you're processing a little problem in, say, words versus code: you have to start with interpreting individual words or tokens; you then have to combine them into sentences or statements using syntactic rules; you then have to link individual sentences or statements into a text or a program; and you then can extract the underlying meaning and perform reasoning over it to get at the answer. So there are lots of similarities. We also know that computer code is far too recent for humans to have evolved a specialized brain module for reasoning about it, so it would make sense for the language network to take over. That makes it a really strong test of selectivity for language versus other things. And we can also apply our formal/functional distinction here. The process of getting from the input to an overall meaning of a program we can call formal programming competence. But then we also have to do reasoning over it, right?
We have to interpret it, predict the output, and that we can call functional programming competence. We are focusing mainly on formal programming competence in this study. So what we did is we showed people little snippets of code problems in Python and matched sentence problems and asked them to predict the output: what is the answer going to be here? We also had a whole other programming language we looked at that was as different from Python as we could think of, and that's ScratchJr. It's a graphical programming language for kids. It doesn't have any text in it; it's a series of blocks that you combine to give instructions to a cartoon character. And this was also matched with corresponding sentence problems.

So we look first at responses within the language network to Python code. Here we have a basic sentence reading condition contrasted with a non-word reading condition, so sentences like "Nobody could have predicted..." versus strings that look like language but aren't linguistic. That's our general localizer that we use to identify the language regions and make sure they respond to meaningful sentences more than to this control condition. And then these were our two target conditions: sentence problems and code problems. First, we see the expected response: this network responds to sentences but not to matched control non-words. It also responds to sentence problems, which makes sense, because again, these are sentences. And so the critical condition here, of course, is our code problems. Is the response going to be as strong as to sentences? And it's not. It is a little more than to non-word reading, but it is significantly lower than the response to our sentence problems, suggesting that there is a little bit of a response in the language network to Python code, but it's not nearly at the level of English sentences.

Yeah. How is the language delivered? Is it auditory or visual?

It's visual presentation. We see the same thing for auditory language, so we would expect similar results, but here it's matched, so it's visual for both language and code.

Okay. Well, visual code seems very different from auditory code, at least for me.

Well, yeah, auditory code delivery does not work; it's a known problem. Which is very interesting, because there are a lot of insights that can be gleaned from psycholinguistics in order to design programming languages that would be easier to comprehend auditorily, but that's a whole other tangent.

Yeah. Is this a network trained on language sentences, or on code?

Ah, so this is the brain. This is the language network. Most of the stuff I'm talking about for the next 20 minutes is brain responses.

Yeah. But the same question: how familiar are the subjects with coding, relative to how familiar they are with English, with language?

Yeah, there's diversity, even among native speakers. This was one of the first studies on computer code in the brain, and so we picked a pretty lenient threshold. People needed to be proficient with Python, but we didn't impose strict criteria in terms of how long they needed to have known it. I think our lower bar was one year, so that's the minimum, but of course, it's usually more. We've collected that information and tried to look at individual differences, but the sample size is simply too small, so I wouldn't make anything of it.
Um, I will say that for foreign languages, you would see engagement of the language network. Even if you are learning a new natural language, you would see much more activity than you would see for Python. It's true that the level of experience is different, but oftentimes it might actually go the other way, where if you're proficient and fluent, you activate the language regions less; it's easy for you, essentially.

Okay. And now we can also look at ScratchJr, our second programming language. So again, responses to sentences are high, responses to non-words are low, and we're looking at our critical code condition. And here we see very low responses. So if the programming language doesn't look anything like English, the way Python does, then the language network really doesn't care. We can conclude that it responds a little bit to Python and not at all to ScratchJr. And then we can also look at the multiple demand network, which we know is responsible for general problem solving. In contrast, it has really strong responses to code problems. First of all, it responds to sentence problems as well, which makes sense, because people are performing reasoning: it's math problems in half of them, string manipulation in the other half. But then we see this additional response to code problems, suggesting that code interpretation, this formal programming competence, also loads on the multiple demand network. And we see this for both Python and ScratchJr, where for ScratchJr the gulf is even larger numerically.

Yeah. Did you look at co-activation of those two networks?

Not in this study. So, the question is how we look at the co-activation. Let me wrap up this part, and then we'll come back to this. To sum up these results: we see a little bit of response in the language network to Python code and none at all for ScratchJr. In contrast, the multiple demand network exhibits robust responses to code comprehension specifically, for both of these languages, suggesting that that's the main network responsible for extracting meaning from computer code. And then in some follow-up work, we actually did some decoding using code embeddings from a code neural network model, showing that both of these brain networks carry information about the identity of the computer program that's being read; the information is there to some extent. But then another research group did a follow-up analysis suggesting that what's probably happening is a kind of false-recognition situation for the language network, where we see an initial activation and then a drop. So it's like: oh, is it language? Oh, no, it's not. That's our best hypothesis of what's happening. And of course, with ScratchJr, you don't see this false activation, because it looks nothing like language, so the false alarm didn't happen to begin with. This doesn't fully answer your question, but it goes in the same direction of what is actually going on there.

All right, so we can conclude that formal programming competence relies primarily on the multiple demand network. So we have still maintained this selectivity profile for the language network. Okay, the next study is a fun collaboration that we had with a legal reasoning researcher, where we can see some of these ideas play out in a more complex setting, for a pretty complex cognitive ability. In general, we live in a society that requires a lot of governance of social relations between individuals.
Laws exist in order to give people an idea of how they should behave in this complex environment. And so the key principle here is taking a law and applying it to the world around you to know what you should be doing and what you should not be doing. The process of interpreting the law in relation to your environment is called legal reasoning, and that's the subject of our study here. The lead author on this study is Eric Martinez, who has a law degree, which wasn't enough for him, so then he got a PhD in cognitive science. He's really interested in legal reasoning and interpreting legal texts from a cognitive science perspective. The legal literature makes a lot of claims about how things should be or how things are, but we are still in the early stages of having empirical evidence for what is actually going on in the human mind as we interpret legal documents.

And so we can use the formal/functional framework here as well. We can talk about formal legal competence, and that's the process of extracting meaning from legal texts, which are often written in a very particular way we might call legalese, such that you might require specialized training in order to actually understand these documents. But then there is functional legal competence, and that's the legal reasoning itself: applying legal knowledge to specific cases to figure out how a law or a contract might bear on the situation at hand. On the functional competence side, there have been a lot of debates in the legal literature about how things are and how things should be. One side of that debate, called formalism, says that applying legal rules is a formal, logical process. And actually, one inspiration for this study was the feedback we got on our computer code study from people on the Internet: oh, well, it looks like reading computer code is just like reading a contract, right? You're doing this very formal, structured process of interpreting very formally written language, so I bet the responses to contracts in the brain would be kind of the same as to code. Many people have that kind of intuition. But on the other side is the realist school, which says: look, applying legal rules is not just about formal reasoning. In fact, it depends on the context and the social factors around it. What were the consequences? What was the intent? What were the harms involved? So it's not just about pure logic; it's about contextualized reasoning about the social implications of the rule violation.

And so we can take this framework and actually start making neural predictions from it. On the formal legal competence side, we have our two potential players, the language network and the multiple demand network, and the question is which of these networks takes the extra toll of interpreting legal documents. On the functional competence side, under formalism, we would expect the multiple demand network to be involved in actually reasoning about the text, and under realism, we would also expect involvement from social cognition regions, such as the theory of mind network. So we have some interesting brain predictions to work with here on both the formal and functional side. And so we had people read two versions of contracts, legalese versus plain English. The content is exactly the same, but legalese has center-embedded clauses, passive-voice sentences, text in all caps, which is supposed to make things easier to read but actually kind of doesn't help,
and a lot of low-frequency words and phrases from Latin, et cetera. The plain-English version rewrites all of that in plain English while preserving the meaning. And so we looked at two groups of participants, lawyers and non-lawyers with a comparable degree of graduate education, and we looked at responses in the multiple demand network to legalese and plain English. We saw that there is indeed increased activation in both groups to legalese versus plain English, suggesting that it takes an extra cognitive toll to process legalese contracts relative to plain ones, even if you have a lot of experience. So even lawyers show this. Second, we can also look at responses in the language network, and here we actually see a difference in the lawyer group but not in the non-lawyer group, essentially suggesting that whatever specialization for reading legal text does emerge manifests in the language network. So we see the multiple demand network engaged across both groups of people, and the language network engaged for lawyers specifically.

Then, after they read the contract, we had people read scenarios about a real situation where that contract was in play. Say there are two roommates with a rental agreement, and now there is a conflict, and our participants had to answer different questions about the resulting situation and the applicability of the contract. We had the questions grouped into a logic category, a moral category, and different kinds of legal questions. We first look at responses in the multiple demand network. First of all, we see strong responses to the logic question, which was essentially a math-related problem, and nothing in response to the moral question. That's good news for us: it means our positive control and negative control are working as expected. The responses to the legal questions were somewhere in between. First, we see that there is heterogeneity between different kinds of legal questions, suggesting, as is maybe expected, that legal reasoning is not one monolithic thing; there are many sub-processes going on. And we see some engagement, but it's not nearly as strong as for the logic questions. And it's matched on difficulty, so difficulty is not the explaining factor here. We then see the same pattern of responses in our non-lawyer group. In the theory of mind network, the zero baseline here is shifted, but essentially we see much more engagement in response to moral questions and legal questions relative to the logic questions in both groups. In non-lawyers, we see a little bit more of a response to moral questions; some of our lawyers reported treating the moral questions as legal ones, so maybe that's why. But essentially, in neither case do we see the legal reasoning questions patterning clearly with pure logic. So both networks are involved during legal reasoning for lawyers and non-lawyers, which suggests that formal legal competence recruits the multiple demand network; in lawyers, there's also additional recruitment of the language network.
And functional legal competence recruits both the multiple demand network and the theory of mind network in both groups, which essentially provides support for legal realism, suggesting that it's not just pure logic, that there is social reasoning involved here. We get similar results from looking at the correlational patterns in the brain across these different kinds of questions and tasks. I will say, however, that this is work in preparation, so take it with a little bit of a grain of salt; we'll be finalizing these results in the coming weeks.

All right. And so now we are moving on to generalized world knowledge, which is perhaps the domain I'm most excited about. It's also the messiest domain, where we really start to see difficulties trying to separate language from knowledge and language from reasoning, just because they're intertwined so much. I think this is a very exciting area to be working in, so we'll talk about whatever we have time for now, but I hope that a lot of my future work will be in this domain as well. Language carries a lot of information about the world, so models trained on language will learn a lot just by memorizing different words or tracking co-occurrence patterns. Factual information like "Paris is the capital of France" or "birds lay eggs" can be learned directly from the text. What's more, by tracking co-occurrences between words, we can learn a lot of distributional information. A model or a human would encounter sentences like "the sky is blue today," "the sky was pitch black," "the sky is pink," and keeping track of that information provides a distribution over possible colors of the sky: which colors are possible, and which colors are more frequent versus less frequent. And it's mainly this distributional knowledge that we're interested in here today.

One specific kind of this knowledge is generalized event knowledge: a store of templates of common events observed in the world. "The fox chased the rabbit" is a plausible event. "The rabbit chased the fox," less plausible. "The fox chased the teacher," also not very plausible, but possible, right? These don't violate any fundamental rules of the world around us. And because all of this information could, in principle, be available in your linguistic input, and because we learn lots of information like this from language rather than from direct observation, we ask: does generalized event knowledge rely on the language network? Here, we're first referring to the language network in the brain. So we show people sentences like "the fox is chasing the rabbit" versus "the rabbit is chasing the fox." We also show them line drawings of the same kinds of events and have them do one of two tasks in different blocks. One is a semantic task: is the sentence or picture plausible or implausible? The other is a perceptual control task: the stimulus is just moving on the screen, left or right, very slowly, and you have to report which way it's moving. Our hypotheses were: the language network is not engaged in pictorial event semantics, it cares about language, not meaning, so it wouldn't care about pictures and would only care about sentences. Alternatively, the language network cares about meaning, and therefore it would be engaged in pictorial event semantics, maybe just as much as in sentence semantics. And so we look at responses in the language network.
We see high responses to sentences during the semantic task. That makes sense: people are reading sentences and extracting meaning. We see low responses to both stimulus types during the perceptual task. It's a hard task; they're tracking the stimulus on the screen and probably not even thinking about the meaning, so we see low responses. The critical condition here is pictures during the semantic task. Is the response going to be high, as for sentences during the semantic task, or low? And we see something in between. So we didn't get a conclusive resolution to our hypotheses here; we have to say the language network is somewhat engaged in pictorial event semantics. There is some response, but it's not as strong as for sentences. And we replicated this result across multiple experiments. We see a similar picture emerge in naturalistic stimulus comprehension, listening to stories versus watching movies: a similar kind of drop-off, but no engagement for music stimuli. We don't see the same kind of response for single-object categorization, though, asking, does this animal live in water, or can this object be found in the kitchen? The response, in green here, is very low. So maybe it's something special about events that is causing this response in the language network.

But we also have a really valuable piece of causal evidence from two participants with global aphasia, who have really severe language deficits, alongside 12 age-matched controls. On a sentence-picture matching task, asking which of the line drawings goes with a sentence, the two participants with aphasia perform at or near chance, versus the controls. That makes sense: they cannot process language properly, so they're bad at this. In contrast, on picture plausibility judgments, they perform at or close to the level of healthy controls, suggesting a dissociation between their linguistic skills and their event knowledge. And so this causal evidence helps us draw a novel conclusion: the language network is recruited but not required for pictorial event semantics. But that doesn't answer the question of why it is recruited.

Actually, before I get to that, one other question you might ask: okay, the language network does have a preference, but are there other brain regions that respond equally strongly to verbal and pictorial semantics? And the answer is yes. There is a set of brain regions that we're finding that actually seem to care about semantic tasks regardless of whether the input comes in the form of sentences or pictures, and they are separate from both the language regions and the multiple demand regions. Happy to talk about that more; this is work that we are also preparing to put out currently. So stay tuned for semantic processing in the brain.

Okay, and the second question that lingers from that conclusion is: why do we see this response in the language network to pictures? What is it doing here? One explanation: it's just task-irrelevant activation. You see a picture, fox chasing rabbit; you have activity for fox, for rabbit, for chasing, but it's not helping you do the task. An alternative explanation is that the information you've extracted from language actually is helping you do the task: by tracking those distributional properties of the input, you can reason about event plausibility. It's not the only route, which is why people with aphasia can still do it, but it's a possible route for solving the task.
And so we don't have a definitive answer as to which of these explanations is correct. But what we can do is follow up on the second explanation, asking: what about language models, trained on purely distributional properties of the input? Can they do this? That would be an existence proof that it is possible to solve this task with distributional language information alone. So we looked at language models, together with my colleague here, Carina Kauf. What we did is we had minimal sentence pairs: we gave models sentences like "the fox chased the rabbit" versus "the rabbit chased the fox," and we looked at how likely a model judged a particular sentence to be. You can just ask what the probability is of a model outputting sentence one versus sentence two, and we expect that probability to be higher for the plausible sentence than for the implausible one. If a model got a particular pair correct, we gave it an accuracy score of one, if not, zero, and then averaged over a bunch of minimal pairs (a minimal sketch of this scoring follows below).

We did this for two kinds of sentences. One is animate-inanimate interactions: "the teacher bought the laptop" versus "the laptop bought the teacher," where the second sentence is essentially impossible; it violates the animacy restriction on the verb. And here we see that the language models, in green, actually perform close to ceiling, and close to humans, who are at ceiling. Even our very basic control language models from before the neural network era perform fairly well, although not as well. We also looked at animate-animate interactions, "the fox chased the rabbit" versus "the rabbit chased the fox," and there we actually see a performance gap, where language models of the recent generation are not as good as humans, and our control language models, which are much more basic, are at chance or only a little bit above chance. So there is a meaningful difference being captured here, and we call that the gap between the impossible and the unlikely. There might be many explanations for this gap. If we go back to the formal/functional distinction, we might say: maybe the selectional restrictions on the verb, knowing that "bought" requires an animate subject, belong to formal linguistic competence, and that's why models learn them alongside general grammatical rules. Whereas knowing that "the rabbit chased the fox" doesn't violate any linguistic rules per se; that is more about the graded event knowledge that we have about the world, so maybe we can put that under functional linguistic competence. So we can start using this framework to tease apart different processes, different aspects of generalized event knowledge.

And I should start wrapping up here. But to finish the event section: does generalized event knowledge rely on the language network in the brain? The language network is recruited but not required. And does generalized event knowledge naturally arise in large language models? Yes for impossible events, relying on selectional restrictions, and less so for graded event knowledge, where performance is definitely above chance but not as good. We have recently expanded our event knowledge framework to a whole host of different domains. We call it Elements of World Knowledge, or EWoK, and we've looked at a lot of them. Specifically, we leverage contextual knowledge here. With "the fox chased the rabbit," you can solve the problem just by memorizing.
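To make the minimal-pair scoring described above concrete, here is a minimal sketch that sums each sentence's token log probabilities under a model and counts a pair as correct when the plausible sentence scores higher. As before, the Hugging Face transformers library and the gpt2 checkpoint are illustrative assumptions; the actual studies evaluated other models.

    # Minimal sketch: minimal-pair plausibility scoring via sentence log
    # probabilities (gpt2, illustrative only; not the models from the study).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def sentence_logprob(sentence: str) -> float:
        ids = tokenizer(sentence, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        # Sum log P(token_t | preceding tokens) over the sentence.
        logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = ids[0, 1:]
        return logprobs.gather(1, targets.unsqueeze(1)).sum().item()

    pairs = [  # (plausible, implausible)
        ("The teacher bought the laptop.", "The laptop bought the teacher."),
        ("The fox chased the rabbit.", "The rabbit chased the fox."),
    ]
    correct = [sentence_logprob(p) > sentence_logprob(q) for p, q in pairs]
    print(f"accuracy: {sum(correct) / len(correct):.2f}")

The same scoring would extend to the context-dependent items described next by prepending a context sentence and comparing the target sentence's log probability under each context.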
That is, if you just have those sentences in your training corpus, the model will be able to tease them apart. But here, we have sentences like "Ben and Alok are friends" versus "Ben and Alok are enemies," and the sentence "Ben and Alok are friends" is plausible in the context of "Ben likes Alok" but implausible in the context of "Ben hits Alok." And the enemies one is the flip, so the context determines the plausibility of the target sentence. We tested a lot of different language models from Hugging Face, the open language model repository, on a lot of these domains. In our best-performing domain, social interactions, the later-generation language models actually achieve good accuracy; they are close to one. Whereas on spatial relations, we see models performing much worse overall. This bar here is the human ceiling, so humans here are not quite perfect, for some interesting reasons, actually, but there is a huge gap between humans and models, which is the main thing we care about. And we can do the same thing for a bunch of different domains. The main takeaway, really, is that social interactions and social properties are easiest for models, and knowledge about the physical world, physical relations and spatial relations, are the domains models struggle with the most. Overall, it's interesting that all models kind of cluster together. Of course, there are differences between them, but domains that are hard are hard for all models, and domains that are easier are easier for all models. So we show here that basic world knowledge in language models varies drastically by domain, with social knowledge being easier than physical and spatial knowledge. And the goal here is really to create a framework: we have ways of generating new sentences with those same templates, and there are ways to expand it to potentially new domains. So if there's any interest in this kind of world knowledge framework, we're happy to talk about expanding and building upon it.

But to conclude the talk: formal linguistic competence is knowledge of linguistic rules and patterns. Functional linguistic competence comprises the non-language-specific skills required for real-life language use. This distinction is grounded in human neuroscience. It helps clarify the discourse around language and thought, and it can be applied to domains beyond language, like programming and legal reasoning. And finally, it's applicable to both humans and large language models, and AI systems in general. With that, there are two review papers that I want to highlight: one about the language network, with my PhD advisor Ev Fedorenko and with Tamar Regev, and our paper about dissociating language and thought in large language models, with my colleague Kyle Mahowald, with Ev as the senior author, and a few other amazing authors as well. With that, I'd like to thank my amazing collaborators and all of you for your attention.

Yeah. For the world knowledge question, I wonder if you're able to differentiate between sentences that are plausible and likely to be expressed versus sentences that are plausible but that you would never see on the Internet.

Yeah, and there is a little bit of work on that. So the question was about sentences that describe events that are impossible, versus possible but rare, versus possible and frequent.
And so language models, at least as of a year ago but probably still, are good at distinguishing common and uncommon events and bad at distinguishing uncommon versus impossible events, which is kind of what you would expect, because these are statistical models that extract regularities from the input. There is another paper that came out recently that tries to take that distinction one step further and says there are also impossible events versus inconceivable events. An impossible event is, say, an elephant flying up in the sky, whereas inconceivable is something like "colorless green ideas sleep furiously," where you can't even imagine what it would look like. So that gradient might actually be even more fine-grained than what we proposed originally. But it is a very useful distinction, because that's exactly where we can break past the statistical regularities that govern language models and, to some extent, govern humans too.

Yeah. I'm really interested in the fact that the large language models were most deficient in the physical and spatial domains. Have you looked at activation in the language network in humans in terms of the physicality of the sentences or their relation to movement?

Yeah. In keeping with the overall formal/functional competence framework, when we look at how the brain responds to, say, complex narratives, we actually see that words loading heavily on social content, words like family, relationship, love, engage social cognition regions, whereas words that have to do with colors or different textures or weight engage a different set of regions, which, we would expect, is responsible for processing physical knowledge about the world. So it's not the language network where we see those differences; it's outside. These words seem to recruit the relevant domain-specific regions.

Yeah. I'm going to go back to the EWoK slides, the ones with the results, the neural network performances. What is material dynamics? Because that one seems to do pretty well, right next to the social properties.

Yeah, material dynamics is like: I have some cloth, I watched it rip, versus, I have some milk, I watched it rip. These are the kinds of properties we're looking at here. And with this example, you can of course see that a lot of that information can be recovered from simple word co-occurrences. Just seeing "cloth" and "rip" together, versus "milk" and "rip" together, can give away a lot of that information, and that's probably why it's easier. The sentences in this domain are also very short. So here there are a bunch of factors other than the domain itself that will influence LLM performance; I used to have a little disclaimer down here. Some of them may have to do with formal competence. They explain some of the variance, but not all of it; the domain still contributes on top of those differences in sentence length and basic word co-occurrences.

Yeah. So Python experts might have activated that network less. And then, looking at the legal data, lawyers increased the activation versus non-legal grad students. Within legal, do you think there was a scaling based on level of expertise, from non-expert to expert lawyers, where within lawyers there's variability based on how proficient they are with the legal system, et cetera,
leading to less activation for the higher experts, similar to the Python experts? Mm-hmm. Let me give you an example from a different domain that I think will help. There is work from Ev's lab by Saima Malik-Moraleda, who is defending in a week or so. She's done a lot of work on bilingualism and multilingualism, and what she's finding in polyglots, people who speak five or more languages, is that activation in the language network essentially goes like this: the better you know a language, the more activity you see in the language network, except for your native language, where it drops. So there is a meaningful difference between the native language and the other ones, but in general, the better you understand a language, the more activity you will see in the language network. And in the multiple demand network, essentially, the better you know the language, the less you need to activate it. If you've had the experience of speaking a foreign language you don't know very well, your brain gets tired; that's the multiple demand network getting engaged. But if you're fluent in a language, you will not experience that, and the multiple demand network is no longer as active. So we see these kinds of trade-offs, and they're not necessarily linear. From the lawyer data alone, it's a little hard to say what exactly is going on, but we might expect some potentially nonlinear curve, where we first see the language network getting specialized, so we see this increased activity, but maybe later on it actually drops again, because it becomes easy and you don't need to engage it as much. So, hard to say, but I think something like that is possible.

Yeah. For the first example, does it matter which language is used, whether it's more interpretable or not? For example, Python is relatively readable compared to C, or to something that's more machine-native than human-native. Is there any work that you know of along those lines?

Yeah, so the work I was referring to is from Marina Bedny's group at Johns Hopkins, and they are the ones who put forth the hypothesis that the language network has this false alarm response to Python, which leads it to activate a little bit and then drop off. So if that's true, then we would expect some kind of gradient, where the more English-like the language is, the more activity we will see in the language network. How you quantify English-likeness is where it might get a little bit tricky, but that's roughly the prediction. So I would expect essentially all programming languages to fall somewhere in between Python and ScratchJr, the most English-like and the least English-like.

Yeah. Regarding the "fox chased the rabbit" example, what kind of prompts are you using? Because I feel like with the right kind of prompt, you can get the right kind of answers. Can the prompting account for this kind of result?

So for that initial study, we are not using prompting; these are pretrained models before they were ever fine-tuned. What we're looking at is the log probabilities of the output. So we're literally just using the model for its original purpose, which is text generation, and looking at what the probability score is for generating one sentence versus the other. That tends to give you more information than prompting, because with prompting, you don't know if a model is failing because it didn't understand the prompt or because it doesn't have the relevant knowledge.
And there are studies showing that with log probabilities, a model might succeed even where a prompt would fail. We actually just had a follow-up paper accepted to a workshop a day or two ago where we compared log probabilities versus prompting on fine-tuned and non-fine-tuned models, and we show that log probabilities are still better. So instruction tuning doesn't necessarily help with prompt-based responses, and this is actually potentially a more sensitive way of getting at this underlying knowledge.

It's 12:18. Oh, so maybe that should be the last question. Question from Benjamin Fanelli: I just wanted to ask, how much of human intelligence is encoded in language? Can you think of any other media that might capture intelligence?

So the question is how much intelligence is captured in language. As I said, lots of information is captured in language. The vast majority of the information we have about the world is learned indirectly, through language, from other people, right? All of physics and chemistry and astronomy: it's not us observing the phenomena, it's us learning about them from textbooks and from one another. Same with social knowledge, gossiping about one another, learning who to watch out for. But that's different from the core question I'm trying to target, which is: are the mechanisms responsible for language processing, in the brain or in an artificial system, the same as the mechanisms used for reasoning? Is general world knowledge stored in the same circuits as linguistic knowledge? And that's where we see a lot of dissociation. Even if information comes to us linguistically, which a lot of it does, it might then proceed to other brain networks and systems for more targeted, domain-specific processing. And that was a good last question to end on. Thank you.