Thank you so much for the kind introduction. OK, so here's the general idea: today we're going to talk about language in the brain, in particular how you might learn a new language and what that looks like once you've learned it. Language is easy once you've figured it out. Kids have to work really hard to learn language, but if I'm doing a good job today as I talk, language will just fall away; you'll forget that I'm even using language to talk to you. When you're really deep in a conversation, it's like language isn't even part of what you're thinking about; you're thinking about the concepts and the thoughts. That's native language, that's fluency, and that's what a lot of the prior work has focused on. Today, instead, we're going to talk about what it means to learn a language. Learning a language is of course much more difficult: it's effortful, it takes a lot of practice, and it's something you have to work at. Even once you've learned a language, speaking in a language that's not your native language can be hard to do. So we're going to talk about the neural processes behind learning a new language and what that means in the brain. A lot of the previous work has focused on native language understanding: we put people in a scanner, take pictures of their brain while they read words in their native language, and try to figure out what the brain is doing. This is a slight departure (and all of my work up until this point was on that) in that we're going to ask people to learn a new mapping of symbol to word, and we're going to ask whether we can still pick up that word representation in their brain when they see the symbol.
So the questions we're going to answer today are: can we detect learning, that is, can we tell that the person is actually learning the mapping we're asking them to learn, and can we detect the contents of that learning? If I'm showing you a symbol instead of the word "cow," can I tell that you're thinking of the word "cow"? You've learned the mapping from symbol to cow; can I see that new mapping in your brain? We're also interested in how the representation induced by the newly learned mapping differs from the representation induced by the original word form: how is reading the symbol that you've learned means "cow" different from reading the word itself? This is in collaboration with a few of my collaborators at the University of Victoria, which I just left; I moved in July. This is my student Chris Foster, who is going to forever regret putting on the EEG cap and having his picture taken, because I use it in all my talks; I'm sure he doesn't appreciate it. As well as Chad Williams, who is actually the one who collected the data, along with his supervisor. You may not have heard of the University of Victoria, so I thought I would just show you: here is Victoria. It's actually really close to Seattle, and it's actually south of the American mainland border, so it's one of the mildest places in Canada. In addition, it gets less rain than Vancouver, and it gets no snow; it very rarely gets below freezing. That's where I used to live; I lived there for three years, and now I live in Edmonton, which is not south of the mainland border and does get snow. You may wonder why I moved; if you have a question about that, we can talk during the break. But Victoria is beautiful, and my collaborators there are fantastic.
OK, so this is generally the flow of what we'll talk about today. First we'll talk about how we find language in the brain; these are the typical studies I've done up until this point, looking at native language representations in the brain. Then we'll talk about the paradigm we used to get people to learn this new mapping between symbols and their native language, we'll go through the results, and we'll wrap up at the end. At the beginning we need to talk about how computers represent meaning: what would it mean for a computer to understand a word? But first, let's talk about what it means for you to understand a word. Can you tell me a word that's similar to "orange"? Tangerine is a good one. Clementine. You're all really close to that orange point; you're orbiting the orange point in space; you're all naming citrus fruits. If I got you to go longer you might say apple, maybe banana, so things in this realm of things that are sweet and grow on trees, like citrus fruits. Nobody said "car," and nobody said "sandwich," even though you eat sandwiches. So you have an understanding of what semantics is, of what words mean, and I can ask you to sample around a particular point in that space. You have in your mind a representation of the world, and that's what we're asking you to draw from when we talk about word meaning. What we need to do is get computers to also build a similar representation. So here is a representation of the world: we have one axis that is sweetness and one axis that is grows-on-a-tree. All of our fruit is pretty sweet and all grows on trees; sandwiches don't grow on trees, but maybe they're slightly sweeter than cars, so they're a little closer to the fruit group than a car is. This is a very simple vector space model of semantics, a VSM, and this is the computer model we'll be talking about today that represents word meaning.
So in a vector space model of semantics, every word is a point, meaning every word is assigned a list of numbers. Here in the two-dimensional space the list of numbers has two elements: sandwich might be (0.3, 0), because it's a little bit sweet and doesn't grow on a tree. Every word gets a list of numbers, and that's how we tell which words are similar: similar words have similar lists of numbers. You and I know many, many words and we're able to differentiate between all of them, and that's because we have a very high-dimensional semantic space; we're able to tell the difference between many thousands of words. And we define the dimensions of language as we learn the language. My daughter knows what a horse is; if I wanted to tell her what a zebra was, I would just say it's a horse with stripes, and she would understand intuitively that there is a dimension that is stripedness: horses usually don't have stripes, and zebras always do. We learn these dimensions as we go through life, so we need to get a computer to understand these dimensions in the same sort of way. We could create a computer model that has the same dimensions as humans, but there are multiple problems with that. First, it would be tedious, because how many dimensions do you need to split up all the words in your head? Maybe an infinite number. It's also error-prone, in that we have to get somebody to write down all the dimensions, and it's subjective: if you don't own a dog, you might feel differently about dogs than I do, because I own a dog; maybe you have a fear of snakes and feel differently about snakes than other people do. Everybody's own representation of semantics is a little different, so it doesn't really make sense to ask people to write it down.
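The two-axis example above can be sketched in a few lines of code. The coordinates here are illustrative guesses for the sweetness and grows-on-a-tree axes, not values from an actual model:

```python
import numpy as np

# Toy 2-D semantic space with axes [sweetness, grows-on-a-tree].
# The coordinates are illustrative guesses, not the talk's actual values.
words = {
    "orange":    np.array([0.90, 1.0]),
    "tangerine": np.array([0.95, 1.0]),
    "banana":    np.array([0.80, 1.0]),
    "sandwich":  np.array([0.30, 0.0]),
    "car":       np.array([0.05, 0.0]),
}

def cosine(u, v):
    """Cosine similarity: close to 1 when two vectors point the same way."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(word):
    """The other word whose vector is most similar: 'sampling around
    a point' in the semantic space, as in the audience exercise."""
    return max((w for w in words if w != word),
               key=lambda w: cosine(words[word], words[w]))
```

Asking `nearest("orange")` picks out the other citrus fruit, while car and sandwich sit far away, mirroring the audience exercise.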
It would also take too long, and I'm a computer scientist, so I have no patience; I want computers to do everything quickly. There's also no ground truth: like I said, you and I have different experiences of the world, so it wouldn't really make sense to try to write down the one true vector space model of semantics for all people. What we do instead is get a computer to learn the dimensions automatically, and it assigns each one of the words a point in that space. We're going to process a large text corpus (essentially download all of the web pages we can get our hands on) and then build a model from all those web pages together, so you can sort of think of it as the average semantic space, because many, many people wrote those web pages and we're sort of averaging together their experience. Now, instead of each one of our words having two numbers in its vector, we're going to make these vectors much longer, on the order of hundreds of dimensions, and that allows us to tell the difference between tens of thousands of English words. To create these vectors for each of the words, we're going to look through the corpus for words that are associated with a word of interest. If our word of interest is "banana," we're going to look at all of the times we see "banana" in the corpus and see what words appear nearby. As you can imagine, "banana" often appears with the verb "eat," probably less with the verb "drive"; that makes it similar to the noun "apple" but different from the noun "car."
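A minimal sketch of the nearby-word counting just described, using a toy corpus rather than the web-scale one from the talk:

```python
from collections import Counter, defaultdict

def cooccurrence_counts(tokens, window=2):
    """Count, for each word, which words appear within `window`
    tokens of it: the raw signal distributional models build on."""
    counts = defaultdict(Counter)
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[w][tokens[j]] += 1
    return counts

# Tiny illustrative corpus (not the real web-scale corpus).
corpus = "we eat the banana we eat the apple we drive the car".split()
counts = cooccurrence_counts(corpus, window=2)
```

Even in this tiny corpus, "banana" co-occurs with "eat" and never with "drive," while "car" shows the opposite pattern, which is exactly the signal being leveraged.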
And you can see how the way that we use words implies something about their meaning, and that's what we're leveraging in these computational vector space models of semantics. This has been going on for a long time: 1997 saw one of the first really popular models of this kind, latent semantic analysis, and the one I'll be using today is called skip-gram; it was released in 2013 and it's been a pretty effective model, so I'll tell you a little bit about it. Skip-gram works like this: it is a neural network model, and the task it is trained to do is to predict context words given a central word. You tell the neural network the central word is "banana," and it needs to predict with high probability the words that will occur nearby, so it's going to predict with high probability maybe "yellow," and its ability to predict those nearby words tells us something about the word's meaning. You can use any large body of text; the one we're using today was trained on Google News, a whole bunch of news stories. The hidden layer of this neural network (it has just one hidden layer) becomes the word vectors: the hidden representation that the network learns forms the space in which we place words. These models are kind of remarkable; they can do a lot of different tasks. They can approximate human judgments of word similarity: if you ask a person to judge "midday" and "noon," how similar are they, they might give it a very high score, and the distances between those words in the vector space will correlate with these human judgments of word similarity.
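A toy forward pass showing the skip-gram structure just described: one hidden layer whose input-weight rows become the word vectors. The tiny vocabulary, dimensions, and random weights here are stand-ins; a real model would be trained on something like Google News:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "banana", "is", "yellow", "car"]
V, H = len(vocab), 3          # vocabulary size, hidden dimension (toy sizes)

# The two weight matrices of the one-hidden-layer network.
# After training, each row of W_in is a word's vector.
W_in = rng.normal(size=(V, H))
W_out = rng.normal(size=(H, V))

def predict_context(word):
    """Forward pass: central word -> probability over possible context words."""
    h = W_in[vocab.index(word)]        # hidden layer = the word's vector
    scores = h @ W_out
    e = np.exp(scores - scores.max())  # softmax over the vocabulary
    return e / e.sum()

p = predict_context("banana")          # a proper distribution over vocab
```

Training nudges `W_in` and `W_out` so that actual context words (like "yellow" for "banana") get high probability; the rows of `W_in` are then kept as the word vectors.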
They can also answer TOEFL questions about synonyms: what's a synonym of "levied"? "Imposed" is the synonym, and those two words will be close together in the vector space whereas the other choices will not be. They can actually perform about as well on these sorts of tasks as students taking the test, which you might be impressed by, or not. And they can also predict plausibility for role fillers: given the verb "fire" and the noun "employer," how likely is it that the employer is the agent? Do employers do the firing? Yes, they typically do. And they can do something really neat, which is track the meaning of words over time: if we split our corpus up into decades, we can watch how the meaning of a word changes over decades. For example, the word "broadcast" used to mean to scatter seed, and now of course it means something to do with the transmission of radio. So these vectors are an interesting, surprising tool that matches human behavior pretty well. There are people who will argue that they're not perfect, and of course they're not, but they are a tool that we have. Now I'm going to talk about how these models, trained on corpus data, relate to what the brain is doing. What does the brain do when you think of a word? Does your brain's representation have anything in common with the vector space representation I just told you about? Here we're going to take our vector space model of semantics and try to figure out: does it have anything to do with EEG or MEG collected while a person reads the same word, or fMRI of a person reading the same word? It's been employed in all three of these modalities, and today we'll just talk about EEG. In order to do machine learning, we need to turn our brain imaging data into something a machine learning algorithm understands: we're going to take it and try to predict the vector for that word.
With EEG, we do the simplest thing you could possibly do, which is just to concatenate all the time series for the time period we're interested in, and that becomes a feature vector. This is the simplest thing you could do, and there's lots of room for improvement here. Another thing we do is average all the trials for people reading a particular word: when you read the word "banana," we average all of the trials where you read "banana," plus we average across all the participants. This is a little different from what people had done previously. In previous work, they would build a model for each subject, measure the accuracy for each subject, and then average those accuracies together and report the average accuracy of a subject-specific model. Here we're averaging across subjects, which might be a little surprising for people who work with brain imaging data, because of course people have different sized brains; I think it's because of the smoothness of EEG that this works. So now we have a brain data average that tells us what the brain usually looks like when you read a word like "banana." We're going to train a regression model for each one of the dimensions of our vector space model (that's what these betas represent), and we're going to use the brain imaging data multiplied by that beta vector to predict one of the dimensions. Each one of the beta vectors is independent, which means we have an independent model for every dimension of the vector space. That's another room for improvement, another simple assumption we made, but it actually ends up working really well. Because the brain data is high-dimensional, we use a regularizer, ridge regression, which roughly means we ask the model to rely only on the most important features when making its prediction.
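The per-dimension regression can be sketched with the closed-form ridge solution. The sizes and the synthetic data here are stand-ins for the real averaged EEG feature vectors and word vectors:

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: one weight column per semantic
    dimension, i.e. the independent per-dimension models from the talk."""
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_feat), X.T @ Y)

rng = np.random.default_rng(0)
n_words, n_feat, n_dim = 60, 40, 10      # illustrative sizes, not the real ones
X = rng.normal(size=(n_words, n_feat))   # averaged EEG features per word
B = rng.normal(size=(n_feat, n_dim))
Y = X @ B + 0.1 * rng.normal(size=(n_words, n_dim))  # synthetic "word vectors"

W = ridge_fit(X, Y, alpha=1.0)
pred = X @ W    # predicted points in the semantic space
```

Because each output column of `W` is fit separately from the others (ridge with a shared penalty decouples across outputs), this matches the talk's assumption of one independent model per vector dimension.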
So what this model does is take brain imaging data as input and produce as output a point in the high-dimensional space. If this were sweetness and grows-on-a-tree, it would predict a point in that space, and then we can tell what word it is by finding the words that are nearby. We're going to test our model on held-out data. In my example here I have seven words; I'm going to hold out the two words "moose" and "house" and try to predict what words I think they are. I take my model, give it the EEG data multiplied by the beta matrix, and predict a point in space for both this brain image and this second brain image. Then I look at them in space: here's prediction one and here's prediction two. I'm asking the model to choose between the two held-out words, telling the model: here are two brain images, the words they correspond to are "moose" and "house," you tell me which is which. So I want to know: is the assignment of prediction one to "moose" and prediction two to "house" better than the vice versa assignment, prediction one to "house" and prediction two to "moose"? Here it's really obvious: the green lines are much longer than the red lines, so we're going to choose the assignment that says prediction one is "moose" and prediction two is "house." This is the 2 vs. 2 test: essentially, we hold out two brain images and ask the computer to decide which one is which.
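A minimal sketch of a single 2 vs. 2 test as just described; the prediction and word vectors here are made up for illustration:

```python
import numpy as np

def two_vs_two(pred1, pred2, true1, true2):
    """2 vs. 2 test: is the correct assignment of the two predictions
    to the two true word vectors closer than the swapped assignment?"""
    d = lambda a, b: np.linalg.norm(a - b)
    correct = d(pred1, true1) + d(pred2, true2)
    swapped = d(pred1, true2) + d(pred2, true1)
    return correct < swapped

# Illustrative 2-D vectors: predictions near their own words pass the test.
moose = np.array([1.0, 0.0])
house = np.array([0.0, 1.0])
pred_moose = np.array([0.9, 0.2])
pred_house = np.array([0.1, 0.8])
```

With these values the correct assignment wins (the summed red distances are shorter than the summed green ones), and swapping the predictions makes the test fail, as expected.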
So 2 vs. 2 accuracy is the percent of those tests that are correct, and we run all possible pairs of words: with sixty words, instead of sixty tests there are about seventeen hundred word pairs you can test. The reason you do this is that you're using two predictions to make one assignment, so even if one of your predictions is quite bad, the other prediction might be able to compensate for it. You have better signal-to-noise, and it's easier to tell whether your model is better than chance using these 2 vs. 2 tests. Chance is about fifty percent, because you're choosing from one of two assignments; if there were no signal at all, if the brain had nothing to do with the vectors, we would see about fifty percent. So overall, this is the flow: we start with a corpus, we run it through the word2vec algorithm to get our vector space model, then we take that vector space model along with the average of all our EEG data and run them through the classification framework (our ridge regression) to predict points in space; we run the 2 vs. 2 tests and get our 2 vs. 2 accuracy. What we'll be talking about today is mostly this 2 vs. 2 accuracy; it just tells us whether there is a relationship between the word vector space and what the brain is doing. This approach has been around for a while; I think the first example was the 2008 Mitchell paper, so it's been used a lot. One thing my group has done is assemble a bunch of freely available brain imaging datasets of people reading words and make them available online, so that people who build vector space models can test against the brain. This is BrainBench: we have a couple of different imaging modalities, as well as English and Italian datasets, and abstract and concrete words, although right now it's nouns only. And it's growing; we're always adding datasets when we can. And here is Chris again with the same headset.
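Extending the single test to the full accuracy measure over all word pairs; the pair count and chance level follow the talk, while the test vectors here are synthetic:

```python
from itertools import combinations
from math import comb
import numpy as np

def two_vs_two_accuracy(preds, trues):
    """Fraction of all held-out word pairs where the correct assignment
    of predictions to true vectors beats the swapped one (chance ~50%)."""
    d = lambda a, b: np.linalg.norm(a - b)
    hits = total = 0
    for i, j in combinations(range(len(preds)), 2):
        correct = d(preds[i], trues[i]) + d(preds[j], trues[j])
        swapped = d(preds[i], trues[j]) + d(preds[j], trues[i])
        hits += correct < swapped
        total += 1
    return hits / total

# 60 words give C(60, 2) = 1770 pairs to test, as in the talk.
n_pairs = comb(60, 2)

# Perfect predictions score 1.0; random ones would hover near 0.5.
rng = np.random.default_rng(0)
trues = rng.normal(size=(20, 5))
perfect = two_vs_two_accuracy(trues, trues)
```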
My student is never going to live down having his picture taken with this headset. And here's skip-gram: it performs quite well, in this case about as well as these other two models, and because it's quite popular in computational linguistics, that's what we'll use today. It's on the order of seventy-seven percent 2 vs. 2 accuracy, and this is across all of those datasets. Any questions before we continue? [Audience question] So far today we've just been talking about single words, not words in context. There are actually pronouns and verbs as well as nouns in what we'll talk about today, so we do have multiple parts of speech. And there's some new work coming out on people reading sentences, and on how a recurrent neural network (a neural network trained to predict the next word in a sequence) has hidden representations that actually relate to the brain. So people are thinking beyond nouns, but for now we're just talking about single words. Any other questions? [Audience question] In BrainBench there are a bunch of different paradigms: sometimes participants answer a question like "is it bigger than a microwave oven?", and sometimes they're just told to visualize the item. In one case the semantic space they used was behavioral, so the vectors are actually made up of the answers to behavioral questions. But often it's just a task to make sure that they're reading and understanding the words; it can even be a one-back task. [Audience question] The question is: what if, instead of the 2 vs. 2 test, you did something more like rank accuracy? So if you had a list of sixty words, you would rank them by their distance to the predicted point. It actually performs on par with the 2 vs. 2 test; the 2 vs. 2 test is essentially rank accuracy with only two elements, so you can think of it that way.
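The rank-accuracy alternative mentioned in that answer might be sketched like this; the candidate vectors are toy values:

```python
import numpy as np

def rank_accuracy(pred, trues, target_idx):
    """Rank every candidate word by distance to the predicted point;
    score 1.0 if the correct word ranks first, 0.0 if it ranks last."""
    dists = np.linalg.norm(trues - pred, axis=1)
    rank = list(np.argsort(dists)).index(target_idx)   # 0 = closest
    return 1.0 - rank / (len(trues) - 1)

# Three candidate "word vectors"; the prediction lands near word 1.
trues = np.eye(3)
pred = np.array([0.1, 0.9, 0.0])
score = rank_accuracy(pred, trues, target_idx=1)
```

With only two candidate words this reduces to the 2 vs. 2 decision: the prediction either ranks the right word first (1.0) or it doesn't (0.0).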
But that would be another interesting thing to do, and you could also use a much bigger set of words. [Audience question] We'll talk a little about the EEG preprocessing when I get to that part, since we're just getting started, but in general, keeping with the theme for today, we do the simplest thing you can, which is just to take the time series. We're not doing any Fourier analysis; we do a little bit of ICA to take out artifacts, but it's pretty low-key. OK, so the way we're going to ask people to learn this new language is using a reinforcement learning paradigm. Twenty-four subjects come into the scanner; they're all fluent in English, with a very high score on an English fluency test, and they have to learn this new language while wearing an EEG cap. Now, at this point I need to be upfront that this "language" is actually an incredibly simple mapping of symbol to word. It's as if you're learning a new vocabulary: you're learning which symbols map to which words. We're not talking about grammar right now, not talking about syntax, none of that; we're just talking about mappings, learned via reinforcement learning, which I'll make clear in a second. The cap is an active cap with sixty-four sensors, sixty-one of which are signal sensors, and we record at five hundred hertz and downsample to two hundred fifty. Here's what it looks like in the scanner: you come in and you see a symbol that looks like this (the symbols are taken from one of two Indian languages), and then you're presented with four choices. The first time you see these four choices, you don't know the answer; you don't know which of these words the symbol corresponds to. So you choose one, say "dog," and you're probably wrong; twenty-five percent of the time you'd be right. The next time you see the symbol, maybe you choose a different word, maybe you choose "run," and you get it right. So this process of guessing and getting feedback is how you learn the language in the scanner.
It goes like this: for five hundred milliseconds you see this funny symbol, then the choices appear on the screen, and you have up to two seconds to make your choice. Once you make your choice it turns white for five hundred milliseconds, there's some inter-trial interval, and then we give you feedback about whether you were correct or not; you see the feedback for a second. We're interested in answering a few questions. The first is: is there a difference between receiving correct and incorrect feedback? Does your brain respond differently to those two feedbacks? We're also interested in whether we can see the semantics of the word you're learning: if this symbol maps to "you" and this is the correct choice, can we see the semantics of the word "you" at the point in time when you're reading the symbol? Our dataset is made of sixty words, randomly assigned to sixty unique symbols, so there's no rhyme or reason to which symbol goes with which word; we didn't choose a particularly cow-looking symbol for "cow." The sixty words are made up of fifty-four nouns, three verbs, and three pronouns, and they're presented in random order, which means that every subject sees the words in a different order. In addition, each word is presented a different random number of times for each participant: some participants see the word "cow" twice, some participants see it twelve times. We're going to go through a few different results. One of them is one of the more traditional analyses, an ERP analysis of the signal, to see if we can detect a reward positivity. We're interested here in asking: what is the difference between receiving positive and negative feedback? Does your brain do something different when you receive positive versus negative feedback? So here is a graph. On the y-axis is the voltage, which tells us roughly how much the neurons are firing, and on the x-axis is time; zero here is the onset of the feedback, the point at which they see either a check or an X. The lines look like this: if the feedback is correct you see a large deflection, and if it's incorrect you see more of a bump here, and if we take the difference of those two waves, we see a big difference between correct and incorrect feedback at about two hundred eighty milliseconds. This is a typical reward feedback response, so it's good to see it here. Another question you could ask is: as you receive more and more feedback (you got it right, "cow" is the symbol's word, and the next time you see that symbol you guess "cow"), is there a change in that difference response? You're expecting the correct feedback, you expect to get it correct, so does your brain's response look different? Again, here's the onset of the feedback, and here reward one is red, then mustard, green, blue, pink. You can see that the curve changes as you learn. This can be interpreted in more than one way: it could be a difference in reward, so it's less rewarding to get it right the second time, or it could be a difference in learning; you've already done a lot of the learning the first time you got it right. What we can do is average across all six of these peaks, find the grand average peak, and ask what the difference is between the conditions for that grand average. It looks like this: the very first time you get correct feedback you have a large response, and then it falls off with diminishing returns. As you continue to get the words correct, your brain is less excited about it.
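The difference-wave analysis described above can be sketched as follows. The single-channel trials here are synthetic, with an effect injected near 280 ms purely for illustration:

```python
import numpy as np

def reward_positivity(correct_trials, incorrect_trials, sfreq=250.0):
    """Grand-average ERPs for correct vs. incorrect feedback, their
    difference wave, and the latency (ms) of its largest deflection."""
    erp_correct = correct_trials.mean(axis=0)     # average over trials
    erp_incorrect = incorrect_trials.mean(axis=0)
    diff = erp_correct - erp_incorrect            # the difference wave
    peak = int(np.abs(diff).argmax())             # sample of largest deflection
    return diff, 1000.0 * peak / sfreq

# Synthetic single-channel trials with a bump injected at ~280 ms.
rng = np.random.default_rng(0)
n_t = 250                                 # 1 s at the talk's 250 Hz rate
t = np.arange(n_t) / 250.0
bump = np.exp(-((t - 0.28) ** 2) / (2 * 0.02 ** 2))   # peaks at 280 ms
correct = rng.normal(0, 0.05, size=(40, n_t)) + bump
incorrect = rng.normal(0, 0.05, size=(40, n_t))
diff, peak_ms = reward_positivity(correct, incorrect)
```

Averaging over trials suppresses the noise, so the recovered peak latency lands near the injected 280 ms effect, the same latency range reported for the reward positivity in the talk.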
Now, we've shown in the behavioral data that our subjects learned the language: on average they get around eighty-three percent accuracy, so they've learned the language by the end. Can we tell from your brain that you actually learned the mapping? Can we see the words the symbols map to using EEG? Here we take all the words that are presented six or more times, because we need a good signal for the actual representation of the word, and we average all of the trials beyond the second repetition, so repetitions three, four, five, and up, as many times as you saw that word. We take the EEG signal from all the sensors (again, we do nothing special here, we just put them all together), from zero to seven hundred milliseconds after the symbol onset. Just like we talked about before, we take the brain data, pass it through the model, and predict a point in the vector space. It turns out that with this setup we get almost eighty percent 2 vs. 2 accuracy, so we're able to tell what word a person is thinking of while they're doing this symbol learning task. This was surprising to me, and exciting, but there are lots of follow-up questions. When do you learn the representation? At what point can we actually detect that you've done this learning? Not on the first trial; remember, on the first trial you have no idea what that symbol means and you're guessing. So maybe by the third trial, or maybe by the sixth? The way we answered this was to average trials together, and this includes correct and incorrect trials: whether or not you got it correct, we average those trials together. The red is averaging trials one to three, then two to four, three to five, and four to six, and we're going to see, as a function of which trials we averaged together, whether we can see the word you should be mapping the symbol to.
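The trial-window averaging just described (repetitions 1-3, 2-4, 3-5, 4-6) might look like this; the trial data here is synthetic:

```python
import numpy as np

def windowed_averages(trials, width=3):
    """Average consecutive repetitions (trials 1-3, 2-4, 3-5, ...),
    the scheme used to ask *when* the mapping is learned."""
    return np.stack([trials[i:i + width].mean(axis=0)
                     for i in range(len(trials) - width + 1)])

# Six repetitions of one symbol, 4 EEG features each (synthetic values).
rng = np.random.default_rng(0)
trials = rng.normal(size=(6, 4))
avgs = windowed_averages(trials)   # four windows: 1-3, 2-4, 3-5, 4-6
```

Each averaged window then gets fed through the same decoding pipeline, so the x-axis of the resulting plot is "which repetitions went into the average."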
The graph looks like this: on the y-axis is 2 vs. 2 accuracy, and what we're seeing is that if we average together the fourth, fifth, and sixth trials, we get almost the same accuracy that we get using all of the data, which means that at this point in time we can already tell that you've learned the mapping. This is exciting: we're able to see quite early on, by trials four, five, and six, that you're already learning this mapping. Another question you can ask involves the behavioral data. We know that some of our participants don't learn as quickly as others, so their behavioral accuracy is just lower; maybe you're having a bad day, maybe you didn't get to have coffee, there are a lot of reasons why this could be. So we want to know: is there a correlation between the 2 vs. 2 accuracy we can get from the data and their behavioral accuracy? If they are behaviorally worse at this task, do they also have worse 2 vs. 2 accuracy? We had twenty-four subjects, and there was a pretty clear division between the seven worst and seven best performers, so we didn't cherry-pick this number. We took all of those with task accuracy below eighty percent, and they get a 2 vs. 2 accuracy of 59.7, so almost sixty percent; this would probably not pass a statistical significance test.
But if you take those with accuracy above eighty-five percent, we get sixty-five percent, so there is a correlation between your behavioral accuracy (how well you perform the task) and our ability to tell from your brain whether you're making the mapping. You might wonder why this sixty-five number is so much lower than the seventy-nine I told you about earlier; that's because we're only averaging together seven subjects here instead of the full twenty-four, so there's a signal-to-noise problem. But we're able to see that those who perform well on this task are actually doing a better job of creating a consistent representation in their minds. And finally: where in the brain, and when, is this new language? Again we have 2 vs. 2 accuracy on the y-axis, and on the x-axis is time. The onset of the symbol is at zero (this is the symbol they're learning the mapping for), and at five hundred milliseconds is the onset of the choices. The graph looks like this. We're using windows of fifty milliseconds and trying to predict what word they're thinking of during each window. During this early part they're seeing only the symbol; at this later point they're seeing the four words, including the word that is their preferred choice, if they're doing the task correctly. We actually get above-chance accuracy very early in time, so we're able to tell what word the symbol maps to starting at about one hundred forty milliseconds (the window spanning one hundred forty to one hundred ninety milliseconds), and then much later in time, after the onset of the choices, we see another peak. So it's as if they have a representation when they see the symbol, and then once they see the word choices they have another sort of refreshed representation of the same word, once the correct choice is on the screen. Any questions? [Audience question]
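Slicing an epoch into the fifty-millisecond decoding windows described above might look like this, assuming the talk's 250 Hz sampling rate and sixty-one signal sensors:

```python
import numpy as np

def sliding_windows(eeg, sfreq=250.0, width_ms=50.0):
    """Split a (sensors x samples) EEG epoch into consecutive 50 ms
    windows; each window gets its own decoding model in the analysis."""
    step = int(sfreq * width_ms / 1000.0)   # 50 ms at 250 Hz -> 12 samples (rounded down)
    return [eeg[:, i:i + step] for i in range(0, eeg.shape[1] - step + 1, step)]

rng = np.random.default_rng(0)
epoch = rng.normal(size=(61, 250))   # 61 sensors, 1 s at 250 Hz (synthetic)
wins = sliding_windows(epoch)
```

Training a separate model per window is what lets you say the decodable signal starts around the 140-190 ms window and peaks again after the choices appear.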
People are remarkably similar in their understanding of words, even if they have different opinions about them. I don't know if anyone has done it with EEG data, but with fMRI people have tried training on one person and testing on another, and it works for some people, though not all. We do agree about a lot of things in the world, especially nouns, the kind of nouns we're talking about: you understand what a chair is, and there's not much disagreement. [Audience question] Yes, so this is everything from trial three on, and we didn't actually select only correct trials, but it mostly contains correct trials at that point. [Audience question] We'll get to that in a second: one of the questions we're also interested in asking is where in the brain this representation shows up. I should mention the caveats here, of course: because we're using EEG, it's a very smooth representation, but that's one of the things we'll talk about. Maybe just one more question. [Audience question] Here, every point in this graph represents a new set of betas. But one good question would be: what if you took the betas you learned during that early peak and applied them everywhere else? What would that look like? That would answer the question of whether the representation of the word when you're reading the symbol is the same as the representation of the word once you see that word on the screen. [Audience question about when subjects first learned the words in English] None of that work was done with EEG, and I guess it kind of gets at the inter-subject question: could we use this across subjects, and sometimes, yes. OK, so we're interested in where in the brain this signal is showing up: at which sensors do we see this representation? Here, each of these points represents an EEG sensor, and we're using not just that sensor but all the sensors in its immediate neighborhood to train the model.
So this is from zero to four hundred milliseconds, the time they're viewing the symbol; this is mostly the time they're viewing the choices; and here's an average over the full one second. I don't know if you can see it, but a few of the points are white, and white points are significantly above chance. During the time they're reading the symbol, none of the sensors by themselves can predict what word the symbol maps to, which means this early representation is pretty weak; it's not as strong as what we see later. Here, during the time they're reading the choices, including the word they know is correct, we see a much more robust, strong representation, and individual sensors over left temporal cortex can by themselves tell us what word they're reading. And if we take the full average, you see an increase in the number of statistically significant points, meaning there's some complementary signal across the two time periods: when we put them together we get a better model. So it's not like everything before four hundred milliseconds is noise; there's some additional benefit to including it in the model.

So I'm going to do a little bit of wrap-up and talk about what we covered today and why it's interesting, because we went through a lot of results and you might wonder why you should care. We can detect the reward positivity as people learn this new vocabulary; that is, we can tell that they have a different brain response when they get a word right versus when they get it wrong. We can also see that the reward positivity diminishes over correct trials: as you go through the paradigm, your brain's response to the correct trials diminishes. This could be either because there's less learning on the subsequent trials, or because the reward is less great because you knew it was coming; it's an anticipated reward.
We can also detect the semantics of the word during this learning trial. This is important because some of the past work doing this word-vector analysis has received criticism, and one of those criticisms is: do these vectors actually have anything to do with semantics? They could just be visual features, and actually there's a correlation between frequency and word length: words that are frequent are shorter. That's just a property of language; it's not something we can get around. So something that's a little bit semantic, like word frequency, has an effect on the word form, and that means there will be fewer white pixels on the screen when you're reading words that are more frequent. So how can we tell that what we're decoding is actually a semantic signal? The work I talked about today, because we're not showing the word but arbitrary symbols, is a good piece of evidence that what we're decoding is semantics, because these symbols are arbitrarily chosen and the mapping doesn't have anything to do with the word form. We're also not coaching them to visualize the word. A lot of previous work asked participants, just beforehand, to imagine all the words, and then in the scanner they were supposed to imagine each word for three seconds; that's not very natural, not what people do when they read. Here we have a very engaging paradigm, because you're doing this learning task, and it's enough to get people to create a semantic representation that we can still pull out with this technique. We can also detect this learned representation as early as trials four, five, and six: even though we're including correct and incorrect trials, trials four through six are enough to tell us what word you were mapping to that symbol.
We also found that behavioral accuracy correlates with 2 vs 2 accuracy, so we can tell how well you're performing on the task just by looking at your 2 vs 2 accuracy. This could be useful for things like determining the difference between a correct guess and somebody who has actually learned. We also see two peaks in the 2 vs 2 accuracy, which I think is one of the more interesting results here. At the very beginning, when they're reading the symbol, we see a short peak, and then once they see that the correct word is on the screen we see a much more robust representation of the word. So the question is: is there a mapping between these two states? Is there a similarity between the representation at the time they're reading the symbol and the representation they pull up when they read the actual words? One way we could test that, and we're working on it right now, is to train a model just on that early peak, during the symbol-reading part, and test it later in time, during the time they're reading the actual word, and see if we can still predict with 2 vs 2 accuracy. This is, you may have heard of it, the temporal generalization technique, where we train and test across different time windows to see how the representation changes over time. Even with what we have, though, we can tell there's a difference between the semantic representation for the symbol and the one they pull up when they're reading the choices. First, the symbol representation appears later, by about forty milliseconds, and it's sustained for much less time: it's just a little peak and it's gone, whereas when they're reading the choices it's a much longer peak, although at a similar accuracy, which was surprising to me. And the peak that happens after you're reading the choices is earlier.
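The temporal generalization analysis mentioned here can be sketched as follows. This is a simplified toy: a plain least-squares decoder with no cross-validation, and the per-window feature matrices and scoring function are placeholders, not the real pipeline:

```python
import numpy as np

def time_generalization(windows_X, y, score_fn):
    """windows_X: list of (trials x features) matrices, one per time
    window; y: trials x dims word vectors (the same across windows).
    Train a least-squares decoder on each window and score it on every
    window, giving a train-time x test-time matrix of scores."""
    n = len(windows_X)
    scores = np.zeros((n, n))
    for i in range(n):
        # decoder trained on window i
        W, *_ = np.linalg.lstsq(windows_X[i], y, rcond=None)
        for j in range(n):
            # evaluated on window j
            scores[i, j] = score_fn(windows_X[j] @ W, y)
    return scores
```

A high off-diagonal score would suggest the symbol-reading and choice-reading representations are similar; scores confined to the diagonal would suggest they are distinct.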
We also saw that the distribution over sensor space told us that the representation during the time you're reading the symbol is more distributed and weaker: none of the sensors by themselves could tell us what the word was; we needed all of them together. That's different from when you're reading the choices, when individual sensors were enough. So the representation during symbol reading is not as robust; it's a weaker signal. In the future there are a few ways we could go with this. One I've already talked about: if we train a classifier during the time you're reading the symbol, does it perform well if we test it during the time you're reading the actual choices? Is there a similarity between those two representations? We could also extend this to other learning scenarios. It would be interesting to see what representations look like when people are learning to generalize to new stimuli: if you'd never heard of a zebra and I told you a zebra is a horse with stripes, what does your learned representation of a zebra look like? We could also look at curriculum presentation: is there a way to present trials in a particular order that helps people learn? And with that I'll stop and take questions, and you can email me if you have any afterwards. And you'll never pronounce my name wrong ever again.

Do I have any predictions? That would be interesting: we would play them the word, and then we'd either play the English word or show them the English word. That would be interesting; I haven't thought about how that would work. It's an interesting idea.
So I guess the question is: what if we encouraged people not to translate? Because what we're doing is actually getting them to translate: we're showing them the word, so we're encouraging them to do this mapping to words. It doesn't work for all of the words, but we could instead show them pictures: for the symbol for cow, a picture of a cow, and maybe that would show a different response. That would be interesting.

Yeah, so there's a whole bunch of work showing that reading is very automatic; the representation you pull up when you read a word comes up quickly and isn't something you have a lot of conscious control over. So I think that's what's happening: once they see that correct word, their brain just does it. They're also seeing three other words, but they're able to focus on the word they know is correct.

So this is a great question. The way we test significance for these models is called a permutation test. We take the assignments of which brain image corresponded to which word and we shuffle them. If we were just learning some correlate of vision, if the word vectors really had nothing to do with semantics and we could sort of hijack them to predict something visual, then when we shuffled, we could still do it. That's what we're testing against when we say it's better than chance: a model trained in the situation where the words were shuffled and the semantics no longer lined up with the actual brain images. And if you're interested, that gives fifty point zero one percent accuracy, so very, very close to fifty percent.
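The shuffling procedure described above can be sketched as a generic label-permutation test; the score function, permutation count, and add-one smoothing are my choices, not details from the talk:

```python
import numpy as np

def permutation_p_value(score_fn, X, y, n_perm=1000, seed=0):
    """Permutation test: shuffle which word vector (row of y) goes
    with which brain image (row of X), re-score, and estimate how
    often a shuffled pairing matches or beats the observed score."""
    observed = score_fn(X, y)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = y[rng.permutation(len(y))]
        if score_fn(X, shuffled) >= observed:
            hits += 1
    # add-one smoothing so the estimated p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)
```

If the decoder were only exploiting low-level visual confounds that survive shuffling, the shuffled scores would match the observed one and the p-value would be large.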
That's interesting. If instead of showing the words, what came up was a set of four symbols and you had to choose which one is right, you'd be learning an association from symbol to symbol, and I think the science question would be: can you see some of the visual features of the second symbol during the time you're reading the first symbol? That would be interesting. I'm not sure; you might be able to do that in other modalities, but EEG is very smooth.

So, because you're a speaker of an Indian language, you know these letters; if we'd only taken people who speak an Indian language... I forget the two scripts we chose. Yeah. So we both know one of them, that's good.

That's interesting: would the response be different if this were your native language? That's interesting because then you could build sort of mnemonic devices. The thing I would find hard about this task is that you're seeing these symbols, you know, a curvy thing with a stick on top, and you're trying to build these associations; but if you actually knew what the symbol meant in the language, you could build a better representation. That would be interesting.

Yeah. So I think the question is something like: if you asked people, after they see the symbol but before you show them the choices, how confident they are that they'll get it right, would that correlate with the strength of the representation when they read the symbol? I think that's a really interesting question: whether this kind of introspection correlates with how strong your semantic representation is in the EEG. And there's a bit of personality in there too, right, because it depends what kind of person you are; some people just aren't as sure about their answers.
It's a good question: why do you see that one little peak and then it goes away, even though you have to remember that word? It's a good question; I don't have a good answer for it. But you'd think that if we could detect it at one point, and it was still there, we could still detect it, maybe just not as strongly; that might be true. I'm a cautious person, and when I imagine what I would be like doing this task, it's like the person is deliberating internally, and then they see the choices.

So we train them separately. I do have some really preliminary results from training during that very first time window and testing later, and it actually doesn't look like there's a very good match, but we haven't had a chance to dig into that deeper yet.

Great. So our model is, as I said, the simplest model you could possibly train: it doesn't know anything about which sensors are close to each other, and it doesn't do any sort of correlation analysis or try to project into a lower-dimensional space; there's nothing on top of it. Of course, all of that could help here. Ridge regression does have the property that if there is a set of correlated variables, it will spread the weight across that group, but nothing beyond that.

So what would perceptual priming be in this case? I see. And I'm interested in what kind of features. Yes, like a frequency analysis; that would be interesting. There's lots to do here.

So do you mean if we ran the reinforcement learning paradigm for longer and used the data from later? Yeah, I think you're right that if we went for longer, and maybe added an additional task,
I bet it would become more robust, and that would be good, because then maybe we could train at that time and test everywhere else, and that would work.
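As a footnote to the ridge regression point above: the weight-spreading property across correlated variables is easy to demonstrate with a duplicated feature. This is a standalone illustration with made-up data, not the decoder from the talk:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X'X + lam I)^-1 X'y.
    The L2 penalty is what spreads weight across correlated features."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Two identical (perfectly correlated) copies of one feature.
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
X = np.column_stack([x, x])   # duplicated feature
y = 2.0 * x                   # target depends on the shared signal
w = ridge_fit(X, y, lam=1.0)
# Ridge splits the total weight evenly between the two copies,
# where ordinary least squares would be free to pick any split.
```

With a total effect of 2.0 on the shared signal, each copy ends up with a coefficient near 1.0 rather than one copy taking everything.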