Hi, good afternoon everyone. Looks like we're going to get going. My name is [unintelligible], and I'm with Samsung Research America; my role at Samsung is that I work on and manage our intern program. So, to give you a quick overview of Samsung and who we are: Samsung Research America sits under Samsung as a whole, which is a huge umbrella company; below Samsung there's consumer electronics, IT and mobile communications, and device solutions. I'm sure you've seen some of these products, and maybe some of you own them; how many of you have some Samsung products? Can somebody in the room name five products that you know of, that are part of your daily life or that you know are the latest and greatest? Any brave souls out there? OK, maybe not today.

So let's move on. Really quickly, some quick facts about Samsung: we have five hundred twenty-five thousand employees worldwide, we're located in eighty-seven countries, and our net sales are three hundred sixty-nine billion. We have been ranked the sixth most valuable brand, which is a pretty good standing for us. On making it meaningful: we have six design centers worldwide, and we have won various awards for our design and overall localization approach. On our commitment to R&D: we have thirty-six R&D centers; as you see, there are five in North America, and we are one of those R&D centers. We invest heavily in our R&D, which is really big. We are considered pretty much a cost center, but the company sees a lot of value in investing in research and development.

We have two thousand employees in Silicon Valley, and we have grown rapidly, over twenty percent in the past two years. We are Samsung Research America; there is Samsung Semiconductor, which is located in San Jose, and the other entities, the Samsung Strategy and Innovation Center, Samsung NEXT, and related groups, are all located in Mountain View as well. Here's our campus; like I say, we're located in Mountain View, California. We take a lot of pride in our center, and we play a pivotal role in developing innovative products and solutions. Our research mission is huge: we focus on user experience, innovative products and concepts, key business units and product labs, and advanced research.

Samsung Research America consists of ten labs. We have the mobile payment solutions lab; if any of you have a Samsung phone, the Knox services, our pay system, and so forth come from Mobile Payments and Security, and that's located in Mountain View. In our office we have a mobile experience team; there's the next-generation materials research team, which is actually located in Boston; we have Standards and 5G Mobility, which is located in Texas; we have the Think Tank Team, which is one of our innovative teams, our mad scientists if you want to call them that; and we have the advanced applied research lab, which Mason is here today to represent. Really quickly, just one last slide: some of our product research highlights. If any of you know the Samsung watch, that came directly from our Mountain View location, and there's Project Beyond, which was also innovative.
The whole concept, the whole realization of the product, came from our Think Tank Team; the mobile Knox team, again, is located in Mountain View. So some of the cutting-edge technology that we have comes from Mountain View. Again, we have Mason here today to talk to you a bit more about the AI lab and the great things that he's doing. Thank you so much for your time.

So I have two presentations for you. In the first one I'm going to talk about the AI center in general, which was actually just renamed in the past week to the Artificial Intelligence Center; we were applied research, but now we're sort of our own entity within SRA. After that I'm going to give a talk about what I've been doing and how it connects to my dissertation research when I was here at Georgia Tech.

So, the Artificial Intelligence Center: we have three main campuses in North America where we're actively doing development and research in AI and machine learning. I'm in Mountain View, and that's where most of the team is, but we also have groups in Toronto and Montreal, and we're growing very quickly. The head of the lab is Larry Heck; he's the senior vice president of the group, and through him we have all of these new initiatives that we're trying to address, both the underlying technology, sort of basic research for AI and machine learning, but also how we integrate these technologies with Samsung products. Larry just joined SRA last November, coming from Google, where he was working on dialogue systems for the Google Assistant, and actually he was my supervisor when I was at Google. Now he's the head of the group. I'll tell you a little bit more about my research in a bit, but our entire group has about forty people right now, and we're growing; we have a lot of headcount to grow with. In Mountain View we can hire another twenty to thirty people, and in Toronto and Montreal we also have room to hire. We recruit scientists and engineers, master's and PhD level and some undergrads (we have a few undergrads right now), for both full-time positions and internships.

Some of the things that we're looking into include the big things you might think about when you think of an AI lab, things like computer vision and natural language processing, but a big thing that we're trying to address right now is context: how do we leverage all of the devices that might be in someone's home, especially because that's where Samsung has a huge advantage? Someone might outfit their entire kitchen with only Samsung appliances, so how do we leverage all of this information and use it to make predictions or assumptions about the user, so that we can make the experience of using all of these devices much better? A lot of the emphasis in our research is on improving the user experience. We also have some more basic research; in particular, we have some very new initiatives in robotics, which I'm a part of, and then also things like autonomy and music and art generation, which I'm also a part of.

Yeah, so I guess the theme of the group is always to think about the user first: what can we get from the user, and what knowledge can we get from the environment that can be useful for the user? Starting from the user,
we want to create some sort of model of how the user might live or how they might use our current devices, and we want to model the context in which the user is situated among the appliances they're using, all the types of Samsung appliances they might be using. Then we're trying to make the experience better, so we want natural understanding of someone's intent and how they convey desires, not just through verbal commands but also multi-modally, using things like gesture to indicate things. In language we use the word deixis for words that can be ambiguous and that you need context to fully understand; for "look at this here," you have to understand my gesture and where I'm pointing to understand what I'm looking at, what I'm addressing. So we use all of these types of things to get a better model of the user and their intent, and then apply that to all of the devices within Samsung and try to create a better user experience for all of the users through services that integrate them.

Finally, we have all of these smaller areas that we've been looking at within each of these broader categories: everything from speech recognition, to gesture recognition with computer vision, to doing predictive analysis of mobile behavior and understanding someone's behaviors over time by tracking them, and then understanding how we interface between all of these different technologies, how we can interface between someone's phone and their TV and all of their devices. So, some of the specific groups within AI: we have this broader group, and then we have individual groups doing research within that.

One of the main groups' focus is computer vision, so a lot of it is based on general perception, things that you might consider when you're doing robotics tasks or even trying to do general understanding of a scene: semantic modeling, understanding what the really important thing is when I'm taking a picture, what is the most salient or relevant piece of information in that scene. Compression is also a big thing; we want these things to run quickly, and on mobile devices, so how can we compress these huge neural networks into something that is capable of performing inference in real time? Then another group is doing intent understanding, trying to leverage everything from all of those devices and put it together to understand someone's intents and behaviors over time, so we're doing temporal modeling and also instantaneous modeling of individuals, so that we can better understand the context in which people are making decisions. We're trying to leverage all of these things to make the experience better. Can we use multi-modal interactions to enhance the interaction between someone and their device? Taking that further, if you think about something like Siri, where it's completely verbal commands, we're trying to augment that experience with multi-modal interactions, using the whole body, the same way we might communicate with people. And then also using AI to explore what we might think of as typically creative domains, such as music generation; I'll talk about this more specifically.
And finally, we are hiring, and we're trying to recruit faculty and students at the undergrad, master's, and PhD levels. There are lots of ways to interact and interface with us, and one of the easiest ways for students is to contact us and try to get internships and establish that connection. If you're doing a research internship, a lot of times what we try to do is have your research at Samsung align with what you're doing at Georgia Tech, so that your thesis work here is informed by what you're doing at Samsung and vice versa.

All right, so I'm going to move on to the stuff that I've been working on at Samsung more specifically. I graduated from music technology here last summer; for the first six months I was actually working at another company doing social robotics, and since January I've been at SRA on the AI team. Within that, I've been trying to integrate a lot of the stuff I was doing here at Georgia Tech in my thesis and build off of that with the research I'm doing now.

The crux of my thesis here was trying to understand how physical embodiment and the constraints of living in the real world, something that we have and a robot has but a piece of software doesn't necessarily have, affect music generation, or musicianship in general: how we understand music, how we play music, and all of these different things. I was exploring musicianship, but you can see how this problem might be applicable or relevant to many other spaces that involve embodied interaction; for anything, especially in the domain of robots, that has to interface or interact with the real world, we have to understand what it is about its physicality that's going to affect its interaction.

One of the things from my thesis work was developing models that were capable of generating music for any physical platform: the model had some idea about music, but then it was capable of saying, OK, I know this about music and I have this embodiment, this set of physical constraints, so how can I generate something that fulfills some creative idea? What you see here is a zoomed-out representation of music; you can think of this as frequency and time. This is a human approximation of the generative model: I created a physically embodied system that replicates the constraints of a human, and then I can do the same thing with something much more capable, something like a robot, that a human would not be able to match. You can see that the zoomed-out, bird's-eye view looks relatively the same, but when you actually generate these things, even though the model understands the same things about music and is trying to create similar ideas, different music emerges because of the set of physical constraints on each of these different simulations.

So here's an example of the first one, the human approximation. That's something a capable vibraphone player would be able to do; the part that was generated by the computer is the improvisation, and the background is still human musicians. Then I can change the embodiment, the set of restrictions, and you get a very different behavior.
So you can see that the general shape of the music was very similar to the first one, but the actual notes and rhythms that it played were enhanced by this extra ability that is not human.

Now that I'm at SRA: when I was first recruited by Larry, one of the things we talked about was what I could do to continue the research I was doing here at Georgia Tech and still contribute to the aims and objectives of the Artificial Intelligence Center at Samsung. So I wanted to create a group that focused on creative AI. Part of that is the creative application space, things we typically think of as creative domains, like music and art generation, but creative AI also encapsulates the idea of creative thinking: things that are designed to think creatively, solve problems, and adapt to unforeseen scenarios, and that's also an aspect of creativity we're trying to address.

On the creative thinking side, I like to think of creativity as the ability to connect two seemingly different ideas in a way that works, in a way that is capable of fulfilling some function. This is a MacGyver clip where he uses a chocolate bar to block a sulfuric acid leak: two completely different things, but it works. And then obviously things like path planning, understanding positioning in order to achieve some goal: all of these, I think, are signs of creativity, and they are things that computers are quite capable of doing. In my previous research I was exploring both of these things: can we take two separate ideas and blend them, and can we do it within a creative application space?

You've probably seen some of the style transfer technology that's been happening in computer vision, where you can take the style of one painting and put it onto the content of another. One of the things I did is the same idea but with music, where it takes a Mozart melody and applies it to the chord changes of John Coltrane's Giant Steps. So it was able to adapt the melody for this new context; it's connecting two things that are different, but it works, and that's where I see creative thinking and creative applications in computers; that's sort of the definition.

This brings me to how I integrate things. There was the theme of creative AI, specific to my research, and then there is a broader theme in the group, where Larry has been pushing this concept of learning by example. Exactly what that means differs in different contexts. Within the Bixby assistant it can be teaching a skill: how do you do that quickly, and how can it adapt to the user so it can then apply that skill in different contexts? In robotics it might be teaching a physical skill: how do you pick up this teapot and pour a cup of tea? Being able to do those things without requiring huge amounts of data is one of the big themes in our group this year. So for me, I was trying to figure out how I could take this theme and apply it to my own research, and which specific aspects of this problem are
more relevant to a musical task. My goal is not just to learn by example but actually from a few examples, like I was saying, so things like few-shot and one-shot learning come into play. In music we're also not just thinking about instantaneous things, or things that only require one decision; we're actually thinking about sequences. We want to learn, for example, how to create the best sequence of notes, but it's not just the sequence, it's how the sequence falls along in time; these things are defined by, or dependent on, the temporality of the problem. And unlike domains where things happen over large amounts of time, in music we're thinking about things that happen on the millisecond level: perceptually, the just noticeable difference we can hear is about ten milliseconds for the average person. Understanding and addressing that threshold of just noticeable differences is something we need to do when we're designing generative music systems. Additionally, with music, the metric we use for measuring performance is subjective; it's hard to quantify some of these things. OK, it generates music, but is it good or not? There's no good/bad distance metric we can use for music; we have to consider human perception and human subjectivity. Finally, I want to wrap all of that into embodiment: I want a robot to be able to learn to play music or to improvise. If you take all of that, you get this long string of words, which doesn't mean a lot, I think, but to be more concise about what I do: basically my goal is to teach embodied agents to improvise.

Specifically, I want to teach robotic limbs or manipulators to play piano; that's what I've been working on. This is a physics simulation I've been working with, and this is a design I've been working with, based on the fact that we will probably build something; right now we don't have the hardware implemented yet, but we are thinking about how we can build something in the future, and this is a joint configuration that's quite common for manipulators in robotics, which is why it was designed in this particular way.

Some of the things I've been thinking about, some of the big characteristics of this problem: first, understanding music. Remember that music is subjective, but we have to create some representation of this subjective space that we can use to measure and evaluate the perceptual and subjective aspects of music. The other big thing is that all of the decision processes, like the notes it plays and how it plays them, have to be planned according to its physical constraints. I can teach this one robot how to play piano, but I want to be able to generalize to different types of physicalities; if we have a new robot, I want to be able to use the same technology, the same underlying teaching process, to get it to improvise. And then finally, there's the part about learning quickly: what sort of baseline model does it need in order to establish a grounding from which it can learn quickly? It needs to be situated with some sort of baseline
where it's initialized, one that says, OK, I've got this already and I want to build off of it; how can we do that very quickly? Today I'll talk about the top two; I've been working on all of them, but I'll focus on these.

In my thesis, this problem was addressed with a sort of Markov decision process method, where you have a set of discrete states and you're trying to find the optimal sequence of states that allows you to play a desired sequence of notes. With Shimon the robot, if you think of each configuration of its arms as being a discrete state, then each of these is a configuration and it's trying to find the sequence through these configurations that allows it to play some desired set of notes. To get it to improvise, maybe we don't give it the desired set of notes explicitly but give it some higher-level ideas: say, OK, I need you to play within this chord progression, within this note density, within this sort of tonal harmony, and then it figures out both how to play and what notes to choose; it's a joint optimization of embodied decision making and musical planning. That's a really key point that I wanted to keep working with: we have this joint decision-making process.

Now, with Shimon the set of physical constraints is relatively simple; it's possible to break it up into states and find the pathway, or break it up into discrete actions and apply more typical reinforcement learning methods. But when you start having manipulators with many degrees of freedom, the number of possible discrete states becomes massive, so searching over all of them takes a lot of time, and so does searching over the possible actions. It's very difficult to learn a model that's capable of playing piano that way, so ideally you would want a model capable of generating the joint positions directly. The joint positions are in a continuous space if you just think of them as an angle for each joint; this is a seven-degree-of-freedom robot arm. So I can think, OK, why can't I just generate a sequence of joint angles? There are seven degrees of freedom, so that's a vector of seven, and I want to generate those over time, so you're creating an animation that hopefully leads to piano playing. The idea is to take a set of desired notes to play and learn a model that generates the sequence of joint angles. One of the things that's nice in reinforcement learning is that you get some sense of reward based off of an action, whereas here you're generating the actual joint angles and there's no sense of reward; this isn't a function that's differentiable with respect to the parameters of the network. So how do we propagate that loss back, how do we get the gradient, if this is the thing we want to generate? That's something I've been exploring. Here's a bigger, zoomed-out representation: if you think of this axis as time and this one as pitch, here's an instance where we want to play this, so there are some joint positions that allow it to happen, and we're trying to find a sequence of positions that allows all of this to make sense.
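As a concrete illustration of the discrete-state planning idea mentioned above, here is a minimal sketch, not the actual thesis implementation: a Viterbi-style dynamic program that finds the cheapest sequence of arm configurations able to play a desired note sequence. The configurations, reachable notes, and movement costs are all made-up placeholders.

```python
# Minimal sketch of discrete-state planning over arm configurations.
# CONFIGS and move_cost are illustrative placeholders, not real robot data.

CONFIGS = {
    "c0": {"notes": {60, 62}},   # MIDI notes reachable from configuration c0
    "c1": {"notes": {62, 64}},
    "c2": {"notes": {64, 67}},
}

def move_cost(a, b):
    # Placeholder transition cost; a real system would derive this from the
    # physical distance and time needed to move between configurations.
    return 0.0 if a == b else 1.0

def plan(notes):
    """Viterbi-style search: cheapest configuration sequence that plays `notes`."""
    best = {c: (0.0, []) for c in CONFIGS}           # config -> (cost, path so far)
    for note in notes:
        nxt = {}
        for c, info in CONFIGS.items():
            if note not in info["notes"]:
                continue                             # this configuration cannot play the note
            cost, path = min(
                ((pcost + move_cost(p, c), ppath + [c]) for p, (pcost, ppath) in best.items()),
                key=lambda item: item[0],
            )
            nxt[c] = (cost, path)
        if not nxt:
            return None                              # note unplayable with these configurations
        best = nxt
    return min(best.values(), key=lambda item: item[0])

print(plan([60, 62, 64, 67]))   # -> (total movement cost, configuration sequence)
```

A real system would replace the toy cost function and configuration table with the robot's actual kinematics and timing constraints, which is exactly what becomes intractable as the number of degrees of freedom grows.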
So what I've been working on is something I've been thinking about for a while, and I actually saw hints of promise at the end of my thesis last year. Maybe I don't have data to model this directly; I don't have a training set that just says here are the notes to play and here are the joint angles that make that happen. If you think about this in robotics terms, it's sort of an inverse kinematics problem: you're trying to find a set of joint angles over time that allows you to achieve some task or move to some position. I was thinking, OK, we don't have a loss function that's differentiable with respect to the parameters; is it possible to have multiple networks, evaluate which one of them is best, and use that information? So if we have three networks that are initialized randomly but differently, can we evaluate the outputs, basically sort them first, second, and third, and then use that information to update the network parameters? We're creating a model based off of that information.

Initially, during my thesis, I used this as a fine-tuning process. I had two networks, and I would say, OK, after we've trained this model, I want to be able to fine-tune it so that the robot is capable of playing the output. I would have two networks, and if one was better than the other, I'd update the second-place one in the direction of the first: OK, you should look more like the one that looks better. That's basically the idea here, but if you want to get this to work from the ground up, without having to initialize the networks with a huge amount of training, and not just use it as a fine-tuning step but actually use it to learn the model directly, then just using that process makes you fall very quickly into a local optimum that's nowhere near the global optimum, because the two networks converge on each other very quickly. You're saying, OK, this one was first, and the optimal vector that I want may be close to it, so I rotate the second one's output in that direction and use that rotated output to propagate back through the network as the loss, and that makes it differentiable. But just doing that, it falls into a local optimum that's quite suboptimal.

So I've been playing around a lot with strategies for how to do this successfully, and I was able to do it using three networks. The idea is that I don't want to fall into some bad place in terms of network weights, where they all converge on each other, but I still want to leverage all of this information. The methodology is that we first rank the outputs: first, second, third. In terms of this problem, even though the model is generating joint positions, the output we're really concerned about is the notes that are played by those joint positions, the end result. So we can look at the notes that are played, compare them to the input notes that we want, and use that to rank each individual network, so that we get a sense of first, second, and third. The process for updating is then to update the third-place network in the direction of the second-place one and the second in the direction of the first place, which gives us a differentiable function. And then the first one
I push away from the third-place one, so it's moving in the opposite direction; I just know that the first-place vector shouldn't be that, so I push it away, and doing that creates some variance, which keeps the networks from converging too quickly into some poor local optimum.

Using this method, the first experiments I did were just, OK, generate the output of one period of a sine wave. Each of these is a value, and I basically want to find f(x) for each of them; the inputs are five random, uncorrelated values, and I want to generate the corresponding sine-wave output for each of the five. This is something I can do with supervised learning, since it's very easy to generate supervised data for it, so I can compare my method directly to the supervised method. All I'm doing is saying which network's output is closer to the desired sine wave and then updating that way, and the end result is very, very similar to what you get by doing it purely supervised; actually, there's no significant difference here. So I was able to do it with a sine wave, and I moved on to the forward kinematics problem: given the set of joint angles, I want to predict where the end effectors are in world coordinates. There are seven joint angles, so seven inputs, and there are two end effectors, so six outputs, x, y, z for each end effector. I do the same thing; it's very easy to generate the dataset for supervised training, but then I try to do it the collaborative way, and you get something that's not significantly different; it converges on the same thing.

To me this was evidence that, OK, I could train something from the ground up without having to do some pretraining process. In those cases I was able to do experiments with supervised information, but in the case where I really want to use this, playing piano, I'm generating joint angles while the thing I really want to optimize for is the notes that are played. I want the notes that are played to be equivalent to the notes that I want to play, so I can use the cosine similarity between each network's resulting notes and the desired notes as the evaluation function for the three collaborating networks and use that to rank the outputs. That's how I get my first, second, and third. For this, the architecture is a bit different: the other ones were straight fully connected networks, but this one, because we have something over time, has custom convolution layers, with deconvolution layers at the end as well, so it's a little more sophisticated, but the general idea is the same.

I said I couldn't generate supervised training data, but actually I created a simulation where I can; it just takes a lot of time, so it's not like I could do this for any robotic platform, but for experimentation I can evaluate my method directly against it.
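To make the collaborative update described above concrete, here is a small PyTorch sketch of my reading of the idea, not the speaker's actual code: three randomly (and differently) initialized networks produce outputs for the same input, they are ranked by cosine similarity to the desired result, the third-place network is pulled toward the second, the second toward the first, and the first is pushed away from the third to preserve some variance. A sine-wave target stands in for "the notes that end up being played"; the network sizes, learning rate, and push-away weight are arbitrary choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_net():
    # Small fully connected network; the real piano model is more sophisticated.
    return nn.Sequential(nn.Linear(5, 32), nn.Tanh(), nn.Linear(32, 5))

nets = [make_net() for _ in range(3)]                 # three differently initialized networks
opts = [torch.optim.Adam(n.parameters(), lr=1e-3) for n in nets]

for step in range(2000):
    x = torch.rand(1, 5) * 6.28
    desired = torch.sin(x)                            # stand-in for the desired notes
    outs = [n(x) for n in nets]

    # Rank networks: highest cosine similarity to the desired output is first place.
    sims = [F.cosine_similarity(o, desired).item() for o in outs]
    first, second, third = sorted(range(3), key=lambda i: sims[i], reverse=True)

    # Pull 3rd toward 2nd and 2nd toward 1st; push 1st away from 3rd.
    # The small negative weight keeps the push-away term from dominating.
    losses = {
        third:  F.mse_loss(outs[third], outs[second].detach()),
        second: F.mse_loss(outs[second], outs[first].detach()),
        first: -0.1 * F.mse_loss(outs[first], outs[third].detach()),
    }
    for i, loss in losses.items():
        opts[i].zero_grad()
        loss.backward()
        opts[i].step()
```

On the piano task the ranking signal would instead come from comparing the notes produced by the simulated arm against the desired notes, which is what makes the approach usable without a differentiable loss.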
This is my manually designed pathway system for this arm: I say go to a note, and it's capable of going to the note, so I just have it go to random notes and use that as training data, essentially. I have it go to all these random notes, record the joint positions over time, and say, OK, here's my dataset; I can get an infinite amount of training data from this. So that's one method of training, and the other method is the collaborative generative network method.

Here are the results. With the supervised method, which optimizes those joint angles directly, it's trying to create a model capable of replicating the joint angles it was given in the training set, and you can see that it creates a pretty good generalization of that. If you go back, you can see here that there's this left-right movement with the arm, and the network is trying to find the values that allow it to do that, and it gets some approximation; it has an idea of where the note is, but there's nothing in the loss function that says you actually need to play the note. It just says, OK, this is the joint angle you need to hit, and that gets smoothed out in the training process. In this sequence of notes, which is like eight or nine seconds long, it only hits the keyboard once, and it's not even the right note, so even when I have training data this is a very hard problem. I would say it doesn't work, and that's sort of why reinforcement learning exists. But with the collaborative method, where the evaluation is effectively about actually playing the notes, it improves much more; it's actually hitting about half the notes that I want. It's not perfect yet, so I'm working on it, but there's a big improvement with the collaborative method: completely without any labeled data, just using the evaluation metric, we were able to design a method that's capable of getting it to play piano.

So this is the kinematics part, or the planning part, of the problem, and it's what I've been working on for the last few months, mostly this kinematics. But remember, with music there's also the music understanding part. In this example I was explicitly saying, OK, I want to play these notes, but what if the robot is physically incapable of playing the notes that I give it? Then I want it to play the notes that best represent what I want to play, so it has to understand something about music to make those decisions: what is the best representation, and how can it be played given the physical constraints?

The second part of this is more about understanding music. What I mean by that is: if you think of each of these yellow dots as a measure of music, can we create some space where things that are perceptually similar end up clustered closely together, in this sort of learned space? This is after the network was trained, and this is a visualization of the representation using TensorFlow. Here's a seed, and that's one cluster, so it obviously learned something about density.
It also learned something about rhythm, something about note density and whether things are descending or ascending, and something about tonality; in the last example, even though some of the clips were just chords being played and one was playing lots of notes, everything was still in the same key. So it's learned good things, I think. That was actually the end result of what I'm going to describe now. We had this data, which consisted mostly of classical music and jazz improvisations, and I used it to train these networks; I had many different ways of training them, and then I'll talk about how to evaluate them.

OK, so basically the idea is to be able to take some piece of music and embed it into a nice vector that captures good musical features: project this time-frequency representation into a space that is perceptually and musically relevant. To do that, one of the ways it's done in things like image processing is with a denoising autoencoder, so I experimented with different ways of adding noise. One way is to drop notes randomly and then try to reconstruct all of the notes that were there, so you have some context and you're trying to reconstruct what's missing; another way is to drop beats entirely and then reconstruct the missing beats; and another is dropping different octaves, which is sort of like trying to replicate a left hand and right hand on piano, where you generate the other side. This one is inspired by language processing, by word2vec, where you're embedding the context of the surrounding words: you take the input, say one measure, and you try to reconstruct the previous measure and the following measure, or some representation of that. This is a summary representation, and this is just the forward prediction.

Another way is through a model called the deep structured semantic model, which was also designed for language, originally for ranking utterances. Basically, instead of optimizing the output, trying to predict the previous or next measure, you're directly optimizing the embedding, trying to make the embeddings of two adjacent measures look similar; it makes the assumption that two things that are contiguous in music are semantically related. Another way of learning the embedding is to just get the network to classify the composer and use that as a task for creating some sort of feature space, and use that as the representation of the music; the idea is that if it's able to identify the composer, it's probably capturing something meaningful about the music in order to make that classification. And then something that was published this year: the contribution of that paper is that you do the typical task, like an autoencoding task or predicting the previous and next measure, but then there's an adjacent task the network is also trying to do, which is the classification. So you have a network that's trying to do multiple things, and the idea is that the embedding is capable of doing both context reconstruction and classification.
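As a small illustration of the corruption schemes mentioned for the denoising autoencoder, here is a sketch assuming a simple binary piano-roll representation (pitches by time steps): dropping random notes, dropping whole beats, and dropping an octave range ("left hand"). The actual data pipeline in the work may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def drop_notes(roll, p=0.3):
    """Zero out individual note cells at random. roll: (pitches, timesteps) binary array."""
    mask = rng.random(roll.shape) > p
    return roll * mask

def drop_beats(roll, steps_per_beat=4, p=0.25):
    """Zero out whole beats (blocks of time steps)."""
    out = roll.copy()
    n_beats = roll.shape[1] // steps_per_beat
    for b in range(n_beats):
        if rng.random() < p:
            out[:, b * steps_per_beat:(b + 1) * steps_per_beat] = 0
    return out

def drop_octaves(roll, low=True):
    """Zero out the lower (or upper) half of the pitch range, e.g. the 'left hand'."""
    out = roll.copy()
    half = roll.shape[0] // 2
    if low:
        out[:half, :] = 0
    else:
        out[half:, :] = 0
    return out

measure = (rng.random((88, 16)) > 0.95).astype(np.float32)   # toy one-measure piano roll
corrupted = drop_beats(drop_notes(measure))
# training pair for the denoising autoencoder: (corrupted -> measure)
```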
Then I wanted to evaluate these things. In each of these methods we learn a vector representation of the music, and one experiment is to predict forward: can we feed in a sequence of embeddings and predict the next one, so we can actually generate with it? In this case we have a sequence to analyze, we generate the next embedding, trying to predict forward, and then we use that generated embedding to do unit selection over all of the possible units in the dataset, based on cosine similarity. The other experiment is whether it's capable of doing composer classification. At the end I had all these results, which basically show that predicting context is actually a very good way of getting a good abstract musical space, and also that using the regularized method, where you're doing both context prediction and classification, you get improved results overall, so that it's capable of both composer discrimination and context reconstruction; it sort of found the sweet spot that was good for both.

So I'll play you an example of that. This was generated by predicting forward for improvisations, and the thing that is generated is determined by the input, so you'll hear that when I give it one input you get one style of improvising, and if I give it a different input you get a different style. We can use it to generate, but we can also do some fun things within that abstract space. We had our own name for this, so it's sort of ingrained, but basically you have these representations of music, these different input seeds, so this is one and this is the other input, and we can blend them and try to get something that's a combination of them; that's what you're seeing here. As you go up, it starts getting more and more closely related to this one, and these examples aren't directly generated; they're still using unit selection from the library.

I can play them, but actually I'll show you an interactive application that we built using that idea of merging. If I play something, the idea is that it can be combined with what the computer last played to generate the next output that the computer plays, so that the computer's response is always relevant to the past and relevant to what I played. This was presented at AAAI a couple of years ago, in 2017, and it actually won an award for best demo. So that's one method for using that learned space. Finally, this is something from last summer; you may have seen this video, but it was the output of this applied to Shimon, the robot here. That method was using the approaches I described for generating music, but the planning was based on the discrete-state planning method. My future work is, instead of using that method, to use the network based on the collaborative generative network method, and instead of directly applying the input, to use this learned embedding as the input and try to get the system to play the learned representation instead of directly playing what I tell it to.
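For reference, the unit-selection step described earlier, picking the library measure whose embedding is most cosine-similar to a predicted embedding, can be sketched roughly as follows; the embeddings here are random stand-ins for the learned ones, and blending two seeds would amount to interpolating between their embedding vectors before doing the selection.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def select_unit(predicted_embedding, library):
    """library: list of (measure_id, embedding) pairs from the dataset.
    Returns the id of the measure whose embedding best matches the prediction."""
    return max(library, key=lambda item: cosine(predicted_embedding, item[1]))[0]

# Example with random embeddings standing in for the learned musical space.
rng = np.random.default_rng(1)
library = [(i, rng.standard_normal(128)) for i in range(1000)]
predicted = rng.standard_normal(128)
print(select_unit(predicted, library))
```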
Yeah, so: Gil Weinberg was my advisor here, Larry is my boss, and some of that work was done at Google when I was working with him; and this is a colleague from Google who is now in faculty and whom I still work with quite a bit. Thank you very much, and I'm happy to take any questions about SRA or the work in general. Yeah, so thank you.

Yeah, so actually there is that other way of doing it, and I presented this one because those were the results of these experiments, but basically you're just trying to add some noise or variance to the model during training so that it doesn't fall into a suboptimal area. Adding noise is another method, and I've been doing experiments to see which one is more robust; sometimes that has to do with how long things take to train. It seems like they produce similar results, but one or the other might be better in terms of training time. One thing that I didn't mention is that with this method I haven't been able to do it with batches, so it's been one sample at a time, and that takes even longer because you have to update the network parameters every single time, and there are three networks to update, so it takes a lot of training. But the nice thing is that as long as you have the metric, you don't need labeled data.

Right, so one of the things is trying to give it tasks to solve and see what was learned: is it actually capable of predicting the right thing next, or is it capable of classifying the composer, things like that. But usually you're always going to need some sort of subjective user study. Actually, almost always, whenever I submit a paper without one I get a response like, where is the subjective user study, so now I just do it by default. With that, usually I have baseline models that I'm comparing against, so users listen to melodies that were generated by the system I was developing, and then I'll have another system from another paper that we're comparing against and use that as the baseline. The users are usually asked to rank things, basically by preference; sometimes it's just preference, and sometimes there are specific things we ask them to listen for. A lot of this work is inspired by text-to-speech: not just the quality, but are you capable of understanding it, does the concatenation make sense. Those are the kinds of user studies I apply to this research.

So there's general research, which is sort of a shared space, and most of the companies, especially the ones that publish, feed into each other; I'm still going to reference work that happens at Google or Facebook, and that's going to inform my own research. I think what is different is how they're addressing the problems on the product side, how they're actually integrating these things. One of the big things, for example, is the Samsung assistant compared to the Google Assistant: the Google methodology is to design functions that will
help the greatest number of people; maybe that function is something most people do every day, like going to the store, so they can reach the greatest number of people with it. At Samsung, what we're trying to do right now is ask, OK, what is unique about this person, what might they need to do? Larry likes to give the example that sometimes he checks his son's grades at his local high school; that's something very unique to him and to that process, but if we can teach the system to do that, so it gains the ability to retrieve the grades through a more user-friendly pipeline, then people will start adopting it more and more. You only go to the store like once a week, but we want the system to be used every day and leveraged for all aspects of people's lives, not just the things that most of the population does but everything a particular person might be doing.

Partly because it is challenging, and you want to show what the system is capable of doing. In that example, because the chord changes are so interesting, there were going to be lots of melodic changes to the Turkish March melody that would be audibly evident, so it's something I thought would be perceivably easy to hear.

Yeah, it's a good question. It depends a little on the group you're in within the company. Sometimes things are very academic, where you're just doing research and publishing, and I guess I sort of fall into that category right now; right now it's very, very similar to my graduate school experience. Before coming to Samsung, at the previous company, we were working on a particular product, and that experience was very different: it was, I don't care about research, I don't care about publishing, this is a deliverable, this is what we expect. Sometimes you're doing research in order to achieve that deliverable, but it's not there in the same way; here we're trying to develop a technology that allows us to do these things, whereas the other way around would be, OK, let's develop the product and then, based on what the product needs, develop the technology, and there it was the other way around.

Yeah, so I have immediate goals that are relevant to this research. One of them is just getting the thing to play the piano, and then maybe adding additional arms. But if we can do that, then it becomes more feasible to use the same technology on a robot that might be capable of, say, pushing an elevator button or grabbing a cup for you, things like that, enabling manipulation. That's, I guess, the overall goal: how can we do these things efficiently, training them very quickly, getting them to learn autonomously without having to give them explicit labeled data? Trying to work within that space is the goal for some of this research, and then I have my own research aspirations in music generation as well. Well, thank you very much. Thank you.