Hi, good afternoon everyone. Looks like we're going to get going. My name is [unintelligible], and I'm with Samsung Research America; my role at Samsung is that I work on and manage our intern program. So, to give you a quick overview of Samsung and who we are: Samsung Research America sits under Samsung as a whole, which is a huge umbrella company; below Samsung there's consumer electronics, IT and mobile communications, and device solutions. I'm sure you've seen some of these products, and maybe some of you own them; how many of you have some Samsung products? Can somebody in the room name five products that you know of, that are part of your daily life or that you know are the latest and greatest? Any brave souls out there? OK, maybe not today.

So let's move on. Really quickly, some quick facts about Samsung: we have five hundred twenty-five thousand employees worldwide, we're located in eighty-seven countries, and our net sales are three hundred sixty-nine billion. We have been ranked the sixth most valuable brand, which is a pretty good standing for us. On making it meaningful: we have six design centers worldwide, and we have won various awards for our design and overall localization approach. On our commitment to R&D: we have thirty-six R&D centers; as you see, there are five in North America, and we are one of those R&D centers. We invest heavily in our R&D, which is really big. We are considered pretty much a cost center, but the company sees a lot of value in investing in research and development.

We have two thousand employees in Silicon Valley, and we have grown rapidly, over twenty percent in the past two years. We are Samsung Research America; there is Samsung Semiconductor, which is located in San Jose, and the other entities, the Samsung Strategy and Innovation Center, Samsung NEXT, and related groups, are all located in Mountain View as well. Here's our campus; like I say, we're located in Mountain View, California. We take a lot of pride in our center, and we play a pivotal role in developing innovative products and solutions. Our research mission is huge: we focus on user experience, innovative products and concepts, key business units and product labs, and advanced research.

Samsung Research America consists of ten labs. We have the mobile payment solutions lab; if any of you have a Samsung phone, the Knox services, our pay system, and so forth come from Mobile Payments and Security, and that's located in Mountain View. In our office we have a mobile experience team; there's the next-generation materials research team, which is actually located in Boston; we have Standards and 5G Mobility, which is located in Texas; we have the Think Tank Team, which is one of our innovative teams, our mad scientists if you want to call them that; and we have the advanced applied research lab, which Mason is here today to represent. Really quickly, just one last slide: some of our product research highlights. If any of you know the Samsung watch, that came directly from our Mountain View location, and there's Project Beyond, which was also innovative.
The whole concept, the whole realization of the product, came from our Think Tank Team; the mobile Knox team, again, is located in Mountain View. So some of the cutting-edge technology that we have comes from Mountain View. Again, we have Mason here today to talk to you a bit more about the AI lab and the great things that he's doing. Thank you so much for your time.

So I have two presentations for you. In the first one I'm going to talk about the AI center in general, which was actually just renamed in the past week to the Artificial Intelligence Center; we were applied research, but now we're sort of our own entity within SRA. After that I'm going to give a talk about what I've been doing and how it connects to my dissertation research when I was here at Georgia Tech.

So, the Artificial Intelligence Center: we have three main campuses in North America where we're actively doing development and research in AI and machine learning. I'm in Mountain View, and that's where most of the team is, but we also have groups in Toronto and Montreal, and we're growing very quickly. The head of the lab is Larry Heck; he's the senior vice president of the group, and through him we have all of these new initiatives that we're trying to address, both the underlying technology, sort of basic research for AI and machine learning, but also how we integrate these technologies with Samsung products. Larry just joined SRA last November, coming from Google, where he was working on dialogue systems for the Google Assistant, and actually he was my supervisor when I was at Google. Now he's the head of the group. I'll tell you a little bit more about my research in a bit, but our entire group has about forty people right now, and we're growing; we have a lot of headcount to grow with. In Mountain View we can hire another twenty to thirty people, and in Toronto and Montreal we also have room to hire. We recruit scientists and engineers, master's and PhD level and some undergrads (we have a few undergrads right now), for both full-time positions and internships.

Some of the things that we're looking into include the big things you might think about when you think of an AI lab, things like computer vision and natural language processing, but a big thing that we're trying to address right now is context: how do we leverage all of the devices that might be in someone's home, especially because that's where Samsung has a huge advantage? Someone might outfit their entire kitchen with only Samsung appliances, so how do we leverage all of this information and use it to make predictions or assumptions about the user, so that we can make the experience of using all of these devices much better? A lot of the emphasis in our research is on improving the user experience. We also have some more basic research; in particular, we have some very new initiatives in robotics, which I'm a part of, and then also things like autonomy and music and art generation, which I'm also a part of.

Yeah, so I guess the theme of the group is always to think about the user first: what can we get from the user, and what knowledge can we get from the environment that can be useful for the user? Starting from the user,
we want to create some sort of model of how the user might live or how they might use our current devices, and we want to model the context in which the user is situated among the appliances they're using, all the types of Samsung appliances they might be using. Then we're trying to make the experience better, so we want natural understanding of someone's intent and how they convey desires, not just through verbal commands but also multi-modally, using things like gesture to indicate things. In language we use the word deixis for words that can be ambiguous and that you need context to fully understand; for "look at this here," you have to understand my gesture and where I'm pointing to understand what I'm looking at, what I'm addressing. So we use all of these types of things to get a better model of the user and their intent, and then apply that to all of the devices within Samsung and try to create a better user experience for all of the users through services that integrate them.

Finally, we have all of these smaller areas that we've been looking at within each of these broader categories: everything from speech recognition, to gesture recognition with computer vision, to doing predictive analysis of mobile behavior and understanding someone's behaviors over time by tracking them, and then understanding how we interface between all of these different technologies, how we can interface between someone's phone and their TV and all of their devices. So, some of the specific groups within AI: we have this broader group, and then we have individual groups doing research within that.

One of the main groups' focus is computer vision, so a lot of it is based on general perception, things that you might consider when you're doing robotics tasks or even trying to do general understanding of a scene: semantic modeling, understanding what the really important thing is when I'm taking a picture, what is the most salient or relevant piece of information in that scene. Compression is also a big thing; we want these things to run quickly, and on mobile devices, so how can we compress these huge neural networks into something that is capable of performing inference in real time? Then another group is doing intent understanding, trying to leverage everything from all of those devices and put it together to understand someone's intents and behaviors over time, so we're doing temporal modeling and also instantaneous modeling of individuals, so that we can better understand the context in which people are making decisions. We're trying to leverage all of these things to make the experience better. Can we use multi-modal interactions to enhance the interaction between someone and their device? Taking that further, if you think about something like Siri, where it's completely verbal commands, we're trying to augment that experience with multi-modal interactions, using the whole body, the same way we might communicate with people. And then also using AI to explore what we might think of as typically creative domains, such as music generation; I'll talk about this more specifically.
And finally, we are hiring, and we're trying to recruit faculty and students at the undergrad, master's, and PhD levels. There are lots of ways to interact and interface with us, and one of the easiest ways for students is to contact us and try to get internships and establish that connection. If you're doing a research internship, a lot of times what we try to do is have your research at Samsung align with what you're doing at Georgia Tech, so that your thesis work here is informed by what you're doing at Samsung and vice versa.

All right, so I'm going to move on to the stuff that I've been working on at Samsung more specifically. I graduated from music technology here last summer; for the first six months I was actually working at another company doing social robotics, and since January I've been at SRA on the AI team. Within that, I've been trying to integrate a lot of the stuff I was doing here at Georgia Tech in my thesis and build off of that with the research I'm doing now.

The crux of my thesis here was trying to understand how physical embodiment and the constraints of living in the real world, something that we have and a robot has but a piece of software doesn't necessarily have, affect music generation, or musicianship in general: how we understand music, how we play music, and all of these different things. I was exploring musicianship, but you can see how this problem might be applicable or relevant to many other spaces that involve embodied interaction; for anything, especially in the domain of robots, that has to interface or interact with the real world, we have to understand what it is about its physicality that's going to affect its interaction.

One of the things from my thesis work was developing models that were capable of generating music for any physical platform: the model had some idea about music, but then it was capable of saying, OK, I know this about music and I have this embodiment, this set of physical constraints, so how can I generate something that fulfills some creative idea? What you see here is a zoomed-out representation of music; you can think of this as frequency and time. This is a human approximation of the generative model: I created a physically embodied system that replicates the constraints of a human, and then I can do the same thing with something much more capable, something like a robot, that a human would not be able to match. You can see that the zoomed-out, bird's-eye view looks relatively the same, but when you actually generate these things, even though the model understands the same things about music and is trying to create similar ideas, different music emerges because of the set of physical constraints on each of these different simulations.

So here's an example of the first one, the human approximation. That's something a capable vibraphone player would be able to do; the part that was generated by the computer is the improvisation, and the background is still human musicians. Then I can change the embodiment, the set of restrictions, and you get a very different behavior.
So you can see that the general shape of the music was very similar to the first one, but the actual notes and rhythms that it played were enhanced by this extra ability that is not human.

Now that I'm at SRA: when I was first recruited by Larry, one of the things we talked about was what I could do to continue the research I was doing here at Georgia Tech and still contribute to the aims and objectives of the Artificial Intelligence Center at Samsung. So I wanted to create a group that focused on creative AI. Part of that is the creative application space, things we typically think of as creative domains, like music and art generation, but creative AI also encapsulates the idea of creative thinking: things that are designed to think creatively, solve problems, and adapt to unforeseen scenarios, and that's also an aspect of creativity we're trying to address.

On the creative thinking side, I like to think of creativity as the ability to connect two seemingly different ideas in a way that works, in a way that is capable of fulfilling some function. This is a MacGyver clip where he uses a chocolate bar to block a sulfuric acid leak: two completely different things, but it works. And then obviously things like path planning, understanding positioning in order to achieve some goal: all of these, I think, are signs of creativity, and they are things that computers are quite capable of doing. In my previous research I was exploring both of these things: can we take two separate ideas and blend them, and can we do it within a creative application space?

You've probably seen some of the style transfer technology that's been happening in computer vision, where you can take the style of one painting and put it onto the content of another. One of the things I did is the same idea but with music, where it takes a Mozart melody and applies it to the chord changes of John Coltrane's Giant Steps. So it was able to adapt the melody for this new context; it's connecting two things that are different, but it works, and that's where I see creative thinking and creative applications in computers; that's sort of the definition.

This brings me to how I integrate things. There was the theme of creative AI, specific to my research, and then there is a broader theme in the group, where Larry has been pushing this concept of learning by example. Exactly what that means differs in different contexts. Within the Bixby assistant it can be teaching a skill: how do you do that quickly, and how can it adapt to the user so it can then apply that skill in different contexts? In robotics it might be teaching a physical skill: how do you pick up this teapot and pour a cup of tea? Being able to do those things without requiring huge amounts of data is one of the big themes in our group this year. So for me, I was trying to figure out how I could take this theme and apply it to my own research, and which specific aspects of this problem are
more relevant to a musical task. My goal is not just to learn by example but actually from a few examples, like I was saying, so things like few-shot and one-shot learning come into play. In music we're also not just thinking about instantaneous things, or things that only require one decision; we're actually thinking about sequences. We want to learn, for example, how to create the best sequence of notes, but it's not just the sequence, it's how the sequence falls along in time; these things are defined by, or dependent on, the temporality of the problem. And unlike domains where things happen over large amounts of time, in music we're thinking about things that happen on the millisecond level: perceptually, the just noticeable difference we can hear is about ten milliseconds for the average person. Understanding and addressing that threshold of just noticeable differences is something we need to do when we're designing generative music systems. Additionally, with music, the metric we use for measuring performance is subjective; it's hard to quantify some of these things. OK, it generates music, but is it good or not? There's no good/bad distance metric we can use for music; we have to consider human perception and human subjectivity. Finally, I want to wrap all of that into embodiment: I want a robot to be able to learn to play music or to improvise. If you take all of that, you get this long string of words, which doesn't mean a lot, I think, but to be more concise about what I do: basically my goal is to teach embodied agents to improvise.

Specifically, I want to teach robotic limbs or manipulators to play piano; that's what I've been working on. This is a physics simulation I've been working with, and this is a design I've been working with, based on the fact that we will probably build something; right now we don't have the hardware implemented yet, but we are thinking about how we can build something in the future, and this is a joint configuration that's quite common for manipulators in robotics, which is why it was designed in this particular way.

Some of the things I've been thinking about, some of the big characteristics of this problem: first, understanding music. Remember that music is subjective, but we have to create some representation of this subjective space that we can use to measure and evaluate the perceptual and subjective aspects of music. The other big thing is that all of the decision processes, like the notes it plays and how it plays them, have to be planned according to its physical constraints. I can teach this one robot how to play piano, but I want to be able to generalize to different types of physicalities; if we have a new robot, I want to be able to use the same technology, the same underlying teaching process, to get it to improvise. And then finally, there's the part about learning quickly: what sort of baseline model does it need in order to establish a grounding from which it can learn quickly? It needs to be situated with some sort of baseline
where it's initialized, one that says, OK, I've got this already and I want to build off of it; how can we do that very quickly? Today I'll talk about the top two; I've been working on all of them, but I'll focus on these.

In my thesis, this problem was addressed with a sort of Markov decision process method, where you have a set of discrete states and you're trying to find the optimal sequence of states that allows you to play a desired sequence of notes. With Shimon the robot, if you think of each configuration of its arms as being a discrete state, then each of these is a configuration and it's trying to find the sequence through these configurations that allows it to play some desired set of notes. To get it to improvise, maybe we don't give it the desired set of notes explicitly but give it some higher-level ideas: say, OK, I need you to play within this chord progression, within this note density, within this sort of tonal harmony, and then it figures out both how to play and what notes to choose; it's a joint optimization of embodied decision making and musical planning. That's a really key point that I wanted to keep working with: we have this joint decision-making process.

Now, with Shimon the set of physical constraints is relatively simple; it's possible to break it up into states and find the pathway, or break it up into discrete actions and apply more typical reinforcement learning methods. But when you start having manipulators with many degrees of freedom, the number of possible discrete states becomes massive, so searching over all of them takes a lot of time, and so does searching over the possible actions. It's very difficult to learn a model that's capable of playing piano that way, so ideally you would want a model capable of generating the joint positions directly. The joint positions are in a continuous space if you just think of them as an angle for each joint; this is a seven-degree-of-freedom robot arm. So I can think, OK, why can't I just generate a sequence of joint angles? There are seven degrees of freedom, so that's a vector of seven, and I want to generate those over time, so you're creating an animation that hopefully leads to piano playing. The idea is to take a set of desired notes to play and learn a model that generates the sequence of joint angles. One of the things that's nice in reinforcement learning is that you get some sense of reward based off of an action, whereas here you're generating the actual joint angles and there's no sense of reward; this isn't a function that's differentiable with respect to the parameters of the network. So how do we propagate that loss back, how do we get the gradient, if this is the thing we want to generate? That's something I've been exploring. Here's a bigger, zoomed-out representation: if you think of this axis as time and this one as pitch, here's an instance where we want to play this, so there are some joint positions that allow it to happen, and we're trying to find a sequence of positions that allows all of this to make sense.
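As a concrete illustration of the discrete-state planning idea mentioned above, here is a minimal sketch, not the actual thesis implementation: a Viterbi-style dynamic program that finds the cheapest sequence of arm configurations able to play a desired note sequence. The configurations, reachable notes, and movement costs are all made-up placeholders.

```python
# Minimal sketch of discrete-state planning over arm configurations.
# CONFIGS and move_cost are illustrative placeholders, not real robot data.

CONFIGS = {
    "c0": {"notes": {60, 62}},   # MIDI notes reachable from configuration c0
    "c1": {"notes": {62, 64}},
    "c2": {"notes": {64, 67}},
}

def move_cost(a, b):
    # Placeholder transition cost; a real system would derive this from the
    # physical distance and time needed to move between configurations.
    return 0.0 if a == b else 1.0

def plan(notes):
    """Viterbi-style search: cheapest configuration sequence that plays `notes`."""
    best = {c: (0.0, []) for c in CONFIGS}           # config -> (cost, path so far)
    for note in notes:
        nxt = {}
        for c, info in CONFIGS.items():
            if note not in info["notes"]:
                continue                             # this configuration cannot play the note
            cost, path = min(
                ((pcost + move_cost(p, c), ppath + [c]) for p, (pcost, ppath) in best.items()),
                key=lambda item: item[0],
            )
            nxt[c] = (cost, path)
        if not nxt:
            return None                              # note unplayable with these configurations
        best = nxt
    return min(best.values(), key=lambda item: item[0])

print(plan([60, 62, 64, 67]))   # -> (total movement cost, configuration sequence)
```

A real system would replace the toy cost function and configuration table with the robot's actual kinematics and timing constraints, which is exactly what becomes intractable as the number of degrees of freedom grows.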
So what I've been working on is something I've been thinking about for a while, and I actually saw hints of promise at the end of my thesis last year. Maybe I don't have data to model this directly; I don't have a training set that just says here are the notes to play and here are the joint angles that make that happen. If you think about this in robotics terms, it's sort of an inverse kinematics problem: you're trying to find a set of joint angles over time that allows you to achieve some task or move to some position. I was thinking, OK, we don't have a loss function that's differentiable with respect to the parameters; is it possible to have multiple networks, evaluate which one of them is best, and use that information? So if we have three networks that are initialized randomly but differently, can we evaluate the outputs, basically sort them first, second, and third, and then use that information to update the network parameters? We're creating a model based off of that information.

Initially, during my thesis, I used this as a fine-tuning process. I had two networks, and I would say, OK, after we've trained this model, I want to be able to fine-tune it so that the robot is capable of playing the output. I would have two networks, and if one was better than the other, I'd update the second-place one in the direction of the first: OK, you should look more like the one that looks better. That's basically the idea here, but if you want to get this to work from the ground up, without having to initialize the networks with a huge amount of training, and not just use it as a fine-tuning step but actually use it to learn the model directly, then just using that process makes you fall very quickly into a local optimum that's nowhere near the global optimum, because the two networks converge on each other very quickly. You're saying, OK, this one was first, and the optimal vector that I want may be close to it, so I rotate the second one's output in that direction and use that rotated output to propagate back through the network as the loss, and that makes it differentiable. But just doing that, it falls into a local optimum that's quite suboptimal.

So I've been playing around a lot with strategies for how to do this successfully, and I was able to do it using three networks. The idea is that I don't want to fall into some bad place in terms of network weights, where they all converge on each other, but I still want to leverage all of this information. The methodology is that we first rank the outputs: first, second, third. In terms of this problem, even though the model is generating joint positions, the output we're really concerned about is the notes that are played by those joint positions, the end result. So we can look at the notes that are played, compare them to the input notes that we want, and use that to rank each individual network, so that we get a sense of first, second, and third. The process for updating is then to update the third-place network in the direction of the second-place one and the second in the direction of the first place, which gives us a differentiable function. And then the first one
I push away from the third-place one, so it's moving in the opposite direction; I just know that the first-place vector shouldn't be that, so I push it away, and doing that creates some variance, which keeps the networks from converging too quickly into some poor local optimum.

Using this method, the first experiments I did were just, OK, generate the output of one period of a sine wave. Each of these is a value, and I basically want to find f(x) for each of them; the inputs are five random, uncorrelated values, and I want to generate the corresponding sine-wave output for each of the five. This is something I can do with supervised learning, since it's very easy to generate supervised data for it, so I can compare my method directly to the supervised method. All I'm doing is saying which network's output is closer to the desired sine wave and then updating that way, and the end result is very, very similar to what you get by doing it purely supervised; actually, there's no significant difference here. So I was able to do it with a sine wave, and I moved on to the forward kinematics problem: given the set of joint angles, I want to predict where the end effectors are in world coordinates. There are seven joint angles, so seven inputs, and there are two end effectors, so six outputs, x, y, z for each end effector. I do the same thing; it's very easy to generate the dataset for supervised training, but then I try to do it the collaborative way, and you get something that's not significantly different; it converges on the same thing.

To me this was evidence that, OK, I could train something from the ground up without having to do some pretraining process. In those cases I was able to do experiments with supervised information, but in the case where I really want to use this, playing piano, I'm generating joint angles while the thing I really want to optimize for is the notes that are played. I want the notes that are played to be equivalent to the notes that I want to play, so I can use the cosine similarity between each network's resulting notes and the desired notes as the evaluation function for the three collaborating networks and use that to rank the outputs. That's how I get my first, second, and third. For this, the architecture is a bit different: the other ones were straight fully connected networks, but this one, because we have something over time, has custom convolution layers, with deconvolution layers at the end as well, so it's a little more sophisticated, but the general idea is the same.

I said I couldn't generate supervised training data, but actually I created a simulation where I can; it just takes a lot of time, so it's not like I could do this for any robotic platform, but for experimentation I can evaluate my method directly against it.
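To make the collaborative update described above concrete, here is a small PyTorch sketch of my reading of the idea, not the speaker's actual code: three randomly (and differently) initialized networks produce outputs for the same input, they are ranked by cosine similarity to the desired result, the third-place network is pulled toward the second, the second toward the first, and the first is pushed away from the third to preserve some variance. A sine-wave target stands in for "the notes that end up being played"; the network sizes, learning rate, and push-away weight are arbitrary choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_net():
    # Small fully connected network; the real piano model is more sophisticated.
    return nn.Sequential(nn.Linear(5, 32), nn.Tanh(), nn.Linear(32, 5))

nets = [make_net() for _ in range(3)]                 # three differently initialized networks
opts = [torch.optim.Adam(n.parameters(), lr=1e-3) for n in nets]

for step in range(2000):
    x = torch.rand(1, 5) * 6.28
    desired = torch.sin(x)                            # stand-in for the desired notes
    outs = [n(x) for n in nets]

    # Rank networks: highest cosine similarity to the desired output is first place.
    sims = [F.cosine_similarity(o, desired).item() for o in outs]
    first, second, third = sorted(range(3), key=lambda i: sims[i], reverse=True)

    # Pull 3rd toward 2nd and 2nd toward 1st; push 1st away from 3rd.
    # The small negative weight keeps the push-away term from dominating.
    losses = {
        third:  F.mse_loss(outs[third], outs[second].detach()),
        second: F.mse_loss(outs[second], outs[first].detach()),
        first: -0.1 * F.mse_loss(outs[first], outs[third].detach()),
    }
    for i, loss in losses.items():
        opts[i].zero_grad()
        loss.backward()
        opts[i].step()
```

On the piano task the ranking signal would instead come from comparing the notes produced by the simulated arm against the desired notes, which is what makes the approach usable without a differentiable loss.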
This is my manually designed pathway system for this arm: I say go to a note, and it's capable of going to the note, so I just have it go to random notes and use that as training data, essentially. I have it go to all these random notes, record the joint positions over time, and say, OK, here's my dataset; I can get an infinite amount of training data from this. So that's one method of training, and the other method is the collaborative generative network method.

Here are the results. With the supervised method, which optimizes those joint angles directly, it's trying to create a model capable of replicating the joint angles it was given in the training set, and you can see that it creates a pretty good generalization of that. If you go back, you can see here that there's this left-right movement with the arm, and the network is trying to find the values that allow it to do that, and it gets some approximation; it has an idea of where the note is, but there's nothing in the loss function that says you actually need to play the note. It just says, OK, this is the joint angle you need to hit, and that gets smoothed out in the training process. In this sequence of notes, which is like eight or nine seconds long, it only hits the keyboard once, and it's not even the right note, so even when I have training data this is a very hard problem. I would say it doesn't work, and that's sort of why reinforcement learning exists. But with the collaborative method, where the evaluation is effectively about actually playing the notes, it improves much more; it's actually hitting about half the notes that I want. It's not perfect yet, so I'm working on it, but there's a big improvement with the collaborative method: completely without any labeled data, just using the evaluation metric, we were able to design a method that's capable of getting it to play piano.

So this is the kinematics part, or the planning part, of the problem, and it's what I've been working on for the last few months, mostly this kinematics. But remember, with music there's also the music understanding part. In this example I was explicitly saying, OK, I want to play these notes, but what if the robot is physically incapable of playing the notes that I give it? Then I want it to play the notes that best represent what I want to play, so it has to understand something about music to make those decisions: what is the best representation, and how can it be played given the physical constraints?

The second part of this is more about understanding music. What I mean by that is: if you think of each of these yellow dots as a measure of music, can we create some space where things that are perceptually similar end up clustered closely together, in this sort of learned space? This is after the network was trained, and this is a visualization of the representation using TensorFlow. Here's a seed, and that's one cluster, so it obviously learned something about density.
It also learned something about rhythm, something about note density and whether things are descending or ascending, and something about tonality; in the last example, even though some of the clips were just chords being played and one was playing lots of notes, everything was still in the same key. So it's learned good things, I think. That was actually the end result of what I'm going to describe now. We had this data, which consisted mostly of classical music and jazz improvisations, and I used it to train these networks; I had many different ways of training them, and then I'll talk about how to evaluate them.

OK, so basically the idea is to be able to take some piece of music and embed it into a nice vector that captures good musical features: project this time-frequency representation into a space that is perceptually and musically relevant. To do that, one of the ways it's done in things like image processing is with a denoising autoencoder, so I experimented with different ways of adding noise. One way is to drop notes randomly and then try to reconstruct all of the notes that were there, so you have some context and you're trying to reconstruct what's missing; another way is to drop beats entirely and then reconstruct the missing beats; and another is dropping different octaves, which is sort of like trying to replicate a left hand and right hand on piano, where you generate the other side. This one is inspired by language processing, by word2vec, where you're embedding the context of the surrounding words: you take the input, say one measure, and you try to reconstruct the previous measure and the following measure, or some representation of that. This is a summary representation, and this is just the forward prediction.

Another way is through a model called the deep structured semantic model, which was also designed for language, originally for ranking utterances. Basically, instead of optimizing the output, trying to predict the previous or next measure, you're directly optimizing the embedding, trying to make the embeddings of two adjacent measures look similar; it makes the assumption that two things that are contiguous in music are semantically related. Another way of learning the embedding is to just get the network to classify the composer and use that as a task for creating some sort of feature space, and use that as the representation of the music; the idea is that if it's able to identify the composer, it's probably capturing something meaningful about the music in order to make that classification. And then something that was published this year: the contribution of that paper is that you do the typical task, like an autoencoding task or predicting the previous and next measure, but then there's an adjacent task the network is also trying to do, which is the classification. So you have a network that's trying to do multiple things, and the idea is that the embedding is capable of doing both context reconstruction and classification.
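As a small illustration of the corruption schemes mentioned for the denoising autoencoder, here is a sketch assuming a simple binary piano-roll representation (pitches by time steps): dropping random notes, dropping whole beats, and dropping an octave range ("left hand"). The actual data pipeline in the work may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def drop_notes(roll, p=0.3):
    """Zero out individual note cells at random. roll: (pitches, timesteps) binary array."""
    mask = rng.random(roll.shape) > p
    return roll * mask

def drop_beats(roll, steps_per_beat=4, p=0.25):
    """Zero out whole beats (blocks of time steps)."""
    out = roll.copy()
    n_beats = roll.shape[1] // steps_per_beat
    for b in range(n_beats):
        if rng.random() < p:
            out[:, b * steps_per_beat:(b + 1) * steps_per_beat] = 0
    return out

def drop_octaves(roll, low=True):
    """Zero out the lower (or upper) half of the pitch range, e.g. the 'left hand'."""
    out = roll.copy()
    half = roll.shape[0] // 2
    if low:
        out[:half, :] = 0
    else:
        out[half:, :] = 0
    return out

measure = (rng.random((88, 16)) > 0.95).astype(np.float32)   # toy one-measure piano roll
corrupted = drop_beats(drop_notes(measure))
# training pair for the denoising autoencoder: (corrupted -> measure)
```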
Then I wanted to evaluate these things. In each of these methods we learn a vector representation of the music, and one experiment is to predict forward: can we feed in a sequence of embeddings and predict the next one, so we can actually generate with it? In this case we have a sequence to analyze, we generate the next embedding, trying to predict forward, and then we use that generated embedding to do unit selection over all of the possible units in the dataset, based on cosine similarity. The other experiment is whether it's capable of doing composer classification. At the end I had all these results, which basically show that predicting context is actually a very good way of getting a good abstract musical space, and also that using the regularized method, where you're doing both context prediction and classification, you get improved results overall, so that it's capable of both composer discrimination and context reconstruction; it sort of found the sweet spot that was good for both.

So I'll play you an example of that. This was generated by predicting forward for improvisations, and the thing that is generated is determined by the input, so you'll hear that when I give it one input you get one style of improvising, and if I give it a different input you get a different style. We can use it to generate, but we can also do some fun things within that abstract space. We had our own name for this, so it's sort of ingrained, but basically you have these representations of music, these different input seeds, so this is one and this is the other input, and we can blend them and try to get something that's a combination of them; that's what you're seeing here. As you go up, it starts getting more and more closely related to this one, and these examples aren't directly generated; they're still using unit selection from the library.

I can play them, but actually I'll show you an interactive application that we built using that idea of merging. If I play something, the idea is that it can be combined with what the computer last played to generate the next output that the computer plays, so that the computer's response is always relevant to the past and relevant to what I played. This was presented at AAAI a couple of years ago, in 2017, and it actually won an award for best demo. So that's one method for using that learned space. Finally, this is something from last summer; you may have seen this video, but it was the output of this applied to Shimon, the robot here. That method was using the approaches I described for generating music, but the planning was based on the discrete-state planning method. My future work is, instead of using that method, to use the network based on the collaborative generative network method, and instead of directly applying the input, to use this learned embedding as the input and try to get the system to play the learned representation instead of directly playing what I tell it to.
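For reference, the unit-selection step described earlier, picking the library measure whose embedding is most cosine-similar to a predicted embedding, can be sketched roughly as follows; the embeddings here are random stand-ins for the learned ones, and blending two seeds would amount to interpolating between their embedding vectors before doing the selection.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def select_unit(predicted_embedding, library):
    """library: list of (measure_id, embedding) pairs from the dataset.
    Returns the id of the measure whose embedding best matches the prediction."""
    return max(library, key=lambda item: cosine(predicted_embedding, item[1]))[0]

# Example with random embeddings standing in for the learned musical space.
rng = np.random.default_rng(1)
library = [(i, rng.standard_normal(128)) for i in range(1000)]
predicted = rng.standard_normal(128)
print(select_unit(predicted, library))
```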
Yeah, so: Gil Weinberg was my advisor here, Larry is my boss, and some of that work was done at Google when I was working with him; and this is a colleague from Google who is now in faculty and whom I still work with quite a bit. Thank you very much, and I'm happy to take any questions about SRA or the work in general. Yeah, so thank you.

Yeah, so actually there is that other way of doing it, and I presented this one because those were the results of these experiments, but basically you're just trying to add some noise or variance to the model during training so that it doesn't fall into a suboptimal area. Adding noise is another method, and I've been doing experiments to see which one is more robust; sometimes that has to do with how long things take to train. It seems like they produce similar results, but one or the other might be better in terms of training time. One thing that I didn't mention is that with this method I haven't been able to do it with batches, so it's been one sample at a time, and that takes even longer because you have to update the network parameters every single time, and there are three networks to update, so it takes a lot of training. But the nice thing is that as long as you have the metric, you don't need labeled data.

Right, so one of the things is trying to give it tasks to solve and see what was learned: is it actually capable of predicting the right thing next, or is it capable of classifying the composer, things like that. But usually you're always going to need some sort of subjective user study. Actually, almost always, whenever I submit a paper without one I get a response like, where is the subjective user study, so now I just do it by default. With that, usually I have baseline models that I'm comparing against, so users listen to melodies that were generated by the system I was developing, and then I'll have another system from another paper that we're comparing against and use that as the baseline. The users are usually asked to rank things, basically by preference; sometimes it's just preference, and sometimes there are specific things we ask them to listen for. A lot of this work is inspired by text-to-speech: not just the quality, but are you capable of understanding it, does the concatenation make sense. Those are the kinds of user studies I apply to this research.

So there's general research, which is sort of a shared space, and most of the companies, especially the ones that publish, feed into each other; I'm still going to reference work that happens at Google or Facebook, and that's going to inform my own research. I think what is different is how they're addressing the problems on the product side, how they're actually integrating these things. One of the big things, for example, is the Samsung assistant compared to the Google Assistant: the Google methodology is to design functions that will
help the greatest number of people; maybe that function is something most people do every day, like going to the store, so they can reach the greatest number of people with it. At Samsung, what we're trying to do right now is ask, OK, what is unique about this person, what might they need to do? Larry likes to give the example that sometimes he checks his son's grades at his local high school; that's something very unique to him and to that process, but if we can teach the system to do that, so it gains the ability to retrieve the grades through a more user-friendly pipeline, then people will start adopting it more and more. You only go to the store like once a week, but we want the system to be used every day and leveraged for all aspects of people's lives, not just the things that most of the population does but everything a particular person might be doing.

Partly because it is challenging, and you want to show what the system is capable of doing. In that example, because the chord changes are so interesting, there were going to be lots of melodic changes to the Turkish March melody that would be audibly evident, so it's something I thought would be perceivably easy to hear.

Yeah, it's a good question. It depends a little on the group you're in within the company. Sometimes things are very academic, where you're just doing research and publishing, and I guess I sort of fall into that category right now; right now it's very, very similar to my graduate school experience. Before coming to Samsung, at the previous company, we were working on a particular product, and that experience was very different: it was, I don't care about research, I don't care about publishing, this is a deliverable, this is what we expect. Sometimes you're doing research in order to achieve that deliverable, but it's not there in the same way; here we're trying to develop a technology that allows us to do these things, whereas the other way around would be, OK, let's develop the product and then, based on what the product needs, develop the technology, and there it was the other way around.

Yeah, so I have immediate goals that are relevant to this research. One of them is just getting the thing to play the piano, and then maybe adding additional arms. But if we can do that, then it becomes more feasible to use the same technology on a robot that might be capable of, say, pushing an elevator button or grabbing a cup for you, things like that, enabling manipulation. That's, I guess, the overall goal: how can we do these things efficiently, training them very quickly, getting them to learn autonomously without having to give them explicit labeled data? Trying to work within that space is the goal for some of this research, and then I have my own research aspirations in music generation as well. Well, thank you very much. Thank you.