[00:00:05] >> Welcome back, and welcome to all the students in this class, 6242. Today's guest lecture is also branded as a machine learning seminar talk, so we also have people from around campus who are not students in this class, but welcome, all of you. Today we're really happy to have our colleague from a local company based in Ponce City Market; they have three floors there, and even if you haven't been to the market you've probably already seen their logo, so we're really happy to have them come and give us a guest lecture. [00:00:39] Our speaker is a data scientist who will be talking to us about the natural language processing approaches they're using for campaign classification. The company is the world's largest marketing automation platform, and our speaker graduated from the University of Georgia, [00:01:02] and we'll forgive him for that, with a degree in computer science and a certificate in applied data science. The good news is that he has recognized that that was not quite right, so he will be starting as a Master's student here, coming over to the bright side. He specializes in natural language processing and has implemented several deep transformer models for text classification; his models are used for spam detection, business vertical prediction, and support ticket classification. So let's welcome him. [00:01:42] Thanks for the awesome introduction. I'm Mohammed, a junior data scientist. Today we're talking about approaches to campaign classification, so first we'll start with an introduction, then we'll move on to a broad overview of word embeddings, then we'll look at a specific example where we use BERT for campaign classification, and then we'll take a look at future directions. [00:02:13] So first I want to introduce what a campaign even is. We are the world's largest marketing automation platform; we send about a billion emails a day, and we're primarily known for that capability. The email that you send is an email campaign, and the work described here is on classifying the text content in that email. [00:02:34] As for what this talk is about: the goal is to provide a broad overview of text classification methods, focusing on the ones that are most successful as of today, and also to provide a summary of recent advances in text classification such that by the end of the presentation everyone has an understanding of the state-of-the-art approaches. Lastly, I want to offer my recommendations for building a high-performance campaign classification system, though generally these apply to text classification as a whole, so if you're doing any of this in your own research it might be helpful to you. What this talk is not is comprehensive, as it would be impossible to cover all related papers in one talk. [00:03:23] I also want to practice the Bender rule, which is to explicitly state the language in which we're working, which is English. We want to avoid treating English as the default, unmarked, normative language, as that plays into all sorts of unfair narratives about who counts as intelligent, so to be explicit, all of this work is in English.
[00:03:44] So now I can provide the motivation for why we might want to classify email campaigns. The first is "listen hard," which is one of the company mottos. Like I said, we send about a billion emails a day, so we should at least make an attempt to understand what our users are sending. [00:04:02] The second is customer business vertical prediction: given an email campaign, predict what kind of business the user has. This can be used to power features like campaign performance comparison, otherwise known as benchmarking, or to surface smarter recommendations. It's also used in-house for spam detection. Spam detection is probably something you're used to hearing about on the receiving side of email, but it's really helpful on the sending side as well, because we can protect our IP reputation and prevent users with malicious intent from using our platform. [00:04:38] I first want to give a broad overview of what campaign classification might look like. This is a specific example of customer business vertical prediction: you have a campaign and you feed it into an embedding technique, in this example BERT, which we'll talk about in a bit, then feed that to a classifier, and finally the classifier outputs some prediction. So now we'll start with a broad overview of embedding techniques and end with how we arrived at BERT. Word embeddings are just numerical representations of text, which could be a word, a sentence, or an entire sequence like a whole document. [00:05:19] Machine learning models can't take in raw text by itself, so you have to convert it into these representations. Throughout this presentation I'll probably switch between saying word embeddings, vector representations, and word vectors; these all mean the same thing, so don't let my terminology confuse you. Bag of words is the off-the-shelf, simple way to create these representations: you basically create a histogram of counts of word occurrences, and you can apply different weighting schemes, for example to account for term frequency across documents. The problem is that these mappings are a bit arbitrary and provide no useful information about the relationships between words. We can look at this example where we have a vocabulary and an input sentence, "I like dogs," and we can see the vector representation of each of these words. The issue is that the similarity between "dogs" and "cats" would be the same as the similarity between "dogs" and "I," which isn't very helpful to a classifier. [00:06:19] So it turns out that rather than being arbitrary, these representations can be learned through a process called language modeling, and the whole idea behind language modeling is to utilize the distributional hypothesis.
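To make that limitation concrete, here is a minimal sketch of the one-hot bag-of-words representation from the slide; the toy vocabulary is assumed, and plain counts are used rather than any particular weighting scheme.

```python
import numpy as np

# Toy vocabulary and an arbitrary one-hot "bag of words" style encoding.
vocab = ["i", "like", "dogs", "cats", "play"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Arbitrary mapping: a vector with a 1 at the word's vocabulary index."""
    vec = np.zeros(len(vocab))
    vec[word_to_idx[word]] = 1.0
    return vec

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The problem: every pair of distinct words is equally dissimilar, so the
# representation says nothing about how words relate to each other.
print(cosine(one_hot("dogs"), one_hot("cats")))  # 0.0
print(cosine(one_hot("dogs"), one_hot("i")))     # 0.0, same as dogs vs. cats
```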
That is, you shall know a word by the company it keeps. Language modeling, informally, is estimating the probability of some text given some other text. You can think about this as a fill-in-the-blank example: if the sentence is "we have a ___ and 3 dogs" and I give you the task of filling in that blank, you can formalize it into a masked language modeling task of filling in the blank, or into predicting the next word, which is an autoregressive task. The advantages of language modeling are that we're able to utilize unlabeled data through this self-supervision process, and most languages contain enough text to learn a high-capacity language model because it's really easy to gather large corpora from, say, Wikipedia, web crawls, or social media. It's also very versatile in that you can learn both sentence and word representations, and in our case campaign representations, using a variety of loss functions. [00:07:23] Language modeling works really well because it's inherently a difficult task, even for humans; for the model to have any chance at solving this problem it is forced to learn semantics and syntax and encode facts about the world, and empirically it works better than machine translation, which was the previous approach to creating these representations. Given enough data and compute, the model can do a reasonable job at this. Word2vec is a shallow neural network approach that generates high-quality, non-contextual word vectors that are able to capture semantic similarity. It performs a center-word prediction language modeling task, and if you plot the word vectors in Euclidean space you can see it's able to capture semantic relationships, for instance man is to woman as king is to queen, but also syntactic relationships, like going from the gerund form of a verb to the past-tense form. [00:08:12] The problem with these word vectors is that there exists only one word vector per word. This is problematic because we know that in English a word can have multiple meanings depending on the context in which it appears. For example, the word vector for "play" in the sentence "the kids play a game in the park" should not be equal to the word vector for "play" in "the Broadway play premiered yesterday." [00:08:35] Similarly, what do we do for pronouns like "it" that refer to other words in the sentence? The sentence here is "the animal didn't cross the street because it was too tired." We realize that "it" refers to the animal, not the street, because streets can't be tired, but there's no way for a low-capacity model like this to understand that relationship. That's where contextual embeddings come in: contextual embeddings take in the entire sequence in which a word appears before assigning it a word vector. [00:09:03] And ELMo was one of the first approaches to creating these contextual vectors.
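As a quick sketch of the semantic relationships word2vec captures, here is the king/queen analogy with gensim; the pretrained Google News vectors used below are just a publicly downloadable stand-in, not the vectors from the talk.

```python
# pip install gensim
import gensim.downloader as api

# Load pretrained, non-contextual word vectors (a large one-time download).
wv = api.load("word2vec-google-news-300")

# "man is to woman as king is to ?" via vector arithmetic: king - man + woman.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# "queen" is expected to rank at or near the top.

# Note there is exactly one vector per word, so "play" the verb and "play" the
# Broadway noun share a representation, which is what motivates contextual embeddings.
print(wv.similarity("dog", "cat"))
```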
By the way, I shamelessly stole this from a blog post, a really great one called The Illustrated Transformer, but this is kind of a playful skit. It goes: you ask what the word embedding for the word "stick" is, and ELMo says "use it in a sentence." You provide the sentence "Let's stick to improvisation in this skit," and then ELMo outputs a contextual word vector. It does this by training a bi-LSTM, first in the forward direction to predict what the next word is, and then in the backward direction to predict the next word in reverse. So we put in "let's stick to" and the task is to predict "improvisation," and in the backward direction we put in "to stick let's" and the task is to predict the start-of-sentence token. You have these two LSTMs, you concatenate them layer-wise, then you compute a weighted sum, and the result of that weighted sum is your contextual word vector for the word "stick." [00:10:05] This achieved state of the art on six diverse data sets back in June of 2018. In a lot of this presentation I'll show results tables, and I've highlighted the ones that are most relevant to us. SST is a classification data set from a natural language understanding benchmark, and the number at the end always stands for the number of classes in the problem, so this is a five-class classification problem. And you can see the output is these contextual vectors that depend on the context in which the words appear. [00:10:33] So now we move on to transfer learning, which is a slightly orthogonal point. Transfer learning is storing knowledge gained from one problem and reusing it in a related but different problem, such that the learned weights from the previous problem are used as the starting point for the new problem. [00:10:57] You can think about this with a movie review classifier example: if you want to take in movie reviews and output positive or negative sentiment, one thing you might do is train a language model on a very large corpus, learning that masked-language or next-word prediction task on Wikipedia, and then fine-tune that language model on your downstream data set, which here would be IMDb movie reviews. The intuition is that when you language-model on Wikipedia you learn general semantics and syntax about English as a whole, but when you fine-tune on your downstream data set you learn more domain-specific things, in our example maybe different actors' names and how they relate to other things. Finally, you perform network surgery, where you chop off the head, attach a classifier, freeze the rest of the weights of the network, and train only that classifier head. [00:11:47] The motivation for transfer learning in natural language processing is that it captures common knowledge, for example linguistic representations and structural similarities, and it also provides really good sample efficiency and performance. ULMFiT, which is the paper we'll talk about next, performed an experiment where they used three different levels of transfer learning. This is a validation error plot, by the way, so ideally you want it to be as low as possible: the blue line uses no transfer learning, the next line uses one-stage transfer learning, and the green line uses two-stage transfer learning, where we language-model on Wikipedia, then on our downstream data set, and then finally train the classifier. You can see that with fewer examples it reaches better accuracy.
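Here is a minimal PyTorch sketch of that network surgery step: freeze a pretrained encoder, attach a fresh classifier head, and train only the head. The encoder below is a stand-in module rather than an actual pretrained language model.

```python
import torch
import torch.nn as nn

class PretrainedEncoder(nn.Module):
    """Stand-in for a language model pretrained on a large corpus."""
    def __init__(self, vocab_size=10000, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)

    def forward(self, token_ids):
        out, _ = self.lstm(self.embed(token_ids))
        return out[:, -1]                     # final hidden state as the document vector

encoder = PretrainedEncoder()                 # in practice, load pretrained weights here
for p in encoder.parameters():                # freeze the body of the network
    p.requires_grad = False

classifier_head = nn.Linear(256, 2)           # new head, e.g. positive/negative sentiment
optimizer = torch.optim.Adam(classifier_head.parameters(), lr=1e-3)  # only the head trains
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 10000, (8, 50))     # dummy batch: 8 reviews, 50 tokens each
labels = torch.randint(0, 2, (8,))
loss = loss_fn(classifier_head(encoder(tokens)), labels)
loss.backward()
optimizer.step()
```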
[00:12:35] So ULMFiT achieved state of the art on six classification data sets back in May of 2018. They used two really clever techniques in the paper, called discriminative fine-tuning and layer unfreezing. Discriminative fine-tuning is applying different learning rates at different layers of the network; specifically, you want exponentially decaying learning rates, so the last layers learn the most. Layer unfreezing means you freeze the rest of the network so you don't train any of those weights and only train the top layer, then you slowly unfreeze and train a little bit more, and keep unfreezing until you're training the entire network. The intuition for both of these approaches is that we want to avoid a problem called catastrophic forgetting during transfer learning, because we don't want to forget all the valuable information that we learned from, say, the Wikipedia language model when moving to the new task; it's kind of like easing the classifier into the new task. [00:13:30] This whole talk could probably be about transfer learning, but instead I'll just give you a few major themes and takeaways from the papers: generally, more data results in better word vectors, bigger models result in better results, and pre-training for a longer time also generally results in better results. [00:13:50] Now we'll talk about attention mechanisms, but before we do, we should address the weaknesses of recurrent networks, which were the previous architecture used to create these representations. We can return to the sentence "the animal didn't cross the street because it was too tired": we need "it" to attend to the representation for "animal." [00:14:11] In a recurrent network you take in one word at a time, and the output of the hidden state for the current word feeds into the input for the next word, so the hidden representation for "animal" has to travel all the way across the sentence to reach "it." What actually happens in practice is that by the time the representation gets to "it," it's a bit noisy, if it even exists at all. [00:14:39] Then there are the parallelization concerns that come with this being an autoregressive computation, where the input to the current node is the output from the previous node, so you can't really take advantage of high-performance parallel hardware. This plot speaks more to the long-term dependency issue: the BLEU score, used in the language translation space, measures how well your translation is working, and you can see that generally as sentence length increases the BLEU score decreases. So what is attention? We can start with the way it's motivated, which is the way we pay visual attention. If I give you the task of identifying what's in the yellow box, you might use some of the things in the red boxes to inform that decision; what you might not use are the things in the gray boxes, because those are less dog-like features. It's the idea that there are some things you can ignore and some things you attend to.
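Before moving on, here is a hedged PyTorch sketch of the two ULMFiT techniques described above, discriminative fine-tuning with exponentially decaying per-layer learning rates and gradual unfreezing; the network is a placeholder stack of layers, and the 2.6 decay factor is the one used in the ULMFiT paper.

```python
import torch
import torch.nn as nn

# Placeholder network: a stack of "layers" ending in a classifier head.
layers = nn.ModuleList([nn.Linear(128, 128) for _ in range(4)] + [nn.Linear(128, 5)])
top_down = list(layers)[::-1]                      # top (output) layer first

# Discriminative fine-tuning: the top layer gets the base learning rate,
# and each layer below it gets that rate divided by 2.6.
base_lr = 1e-3
param_groups = [
    {"params": layer.parameters(), "lr": base_lr / (2.6 ** depth)}
    for depth, layer in enumerate(top_down)
]
optimizer = torch.optim.Adam(param_groups)

# Gradual unfreezing: start with everything frozen, then unfreeze one more
# layer (from the top down) each epoch until the whole network is training.
for p in layers.parameters():
    p.requires_grad = False

for epoch in range(len(layers)):
    for layer in top_down[: epoch + 1]:
        for p in layer.parameters():
            p.requires_grad = True
    # ... run one epoch of fine-tuning here ...
```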
[00:15:36] We can take this back to words and look at our example sentence, "the animal didn't cross the street because it was too tired." If we visualize the attention weights, we can see that "it" attends to "animal," and if we change one word so that it is now "the animal didn't cross the street because it was too wide," we see the attention weights update such that "it" attends to "street." [00:15:55] The transformer architecture was the first to fully get rid of recurrent networks and CNNs and just use attention. The only really important part you need to know about the transformer architecture is that there's a stack of encoders and decoders, and within each are two different types of attention: the encoder has attention that's able to see the entire sequence you input, [00:16:21] while the attention in the decoder is only able to see up to the word that you're about to predict. The intuition is that during training you don't want to feed in the word you're trying to predict while trying to predict it; that would be cheating. OpenAI pulled out components of the transformer: it took the decoder with the masked self-attention, stacked 12 of them on top of each other, and got state of the art on tasks back in June of 2018. [00:16:49] You can see it's performing this next-word prediction task: you put in your sequence, pass it through the transformer architecture, then finally through a classifier, and you output some result. The problem with this is that sometimes future context is important. We can look at these two sentences and see that "mark" can be used as a verb or a noun, and that is only decided by later words in the sentence, so if I'm coming up with a representation for "mark" before I've read the rest of the sentence, [00:17:19] I can't tell which it is. It's just the idea that future context is really important, and only having this masked, left-to-right attention can be really harmful to creating these representations. And that's finally where BERT comes into place. BERT takes all of the ideas I just described and puts them together: contextual embeddings from ELMo, transfer learning from the ULMFiT paper, and attention mechanisms from the transformer paper. Similar to OpenAI, it pulls out components of the transformer, but instead of pulling out the decoder it pulls out the encoder, which is able to see the entire sequence. [00:18:02] BERT comes in two sizes, base and large, and within those there is a stack of 12 or 24 of these transformer encoders. BERT has two pre-training tasks. The first is masked language modeling: we input a sentence, mask out 15 percent of the tokens, and then predict what each masked token is. The second is next-sentence prediction, where we take two sentences and predict whether sentence B comes after sentence A. Reading these two examples, sentence A is "the man [MASK] to the store" and sentence B is "penguin [MASK] are flightless birds"; the sentences don't really sound related, so the output is "not next."
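Here is a minimal sketch of the attention computation behind those visualizations: scaled dot-product self-attention, with an optional causal mask to show the encoder versus decoder difference just described. The dimensions are arbitrary illustration values, and queries, keys, and values are left unprojected for brevity.

```python
import math
import torch

def self_attention(x, causal=False):
    """x: (sequence_length, dim). Q, K, and V are all x itself for simplicity."""
    scores = x @ x.T / math.sqrt(x.size(-1))   # how much each word attends to each other word
    if causal:
        # Decoder-style mask: each position may only attend to itself and earlier
        # positions, so the model can't peek at the word it is trying to predict.
        mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)    # attention weights, each row sums to 1
    return weights @ x, weights

x = torch.randn(6, 16)                               # e.g. 6 tokens with 16-dim embeddings
_, encoder_weights = self_attention(x)               # full visibility (encoder-style)
_, decoder_weights = self_attention(x, causal=True)  # left-to-right only (decoder-style)
print(encoder_weights[0])   # the first token attends over the whole sequence
print(decoder_weights[0])   # the first token can only attend to itself
```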
[00:18:43] This pre-training was done on a very, very large corpus of books as well as English Wikipedia. BERT base was trained on 4 TPUs in a pod configuration and BERT large was trained on 16, and each pre-training run took 4 days to complete. Luckily, you don't have to train BERT yourself; these are all available for you to use. These are the different variations of BERT: like we said, there's base and large, and within those there are cased and uncased variants, so if your problem relies on the distinction between uppercase and lowercase then maybe you want to use cased. There's also multilingual cased and uncased BERT, as well as Chinese in simplified and traditional script. [00:19:25] And this is just to speak to the successes of BERT: as of June 2019, BERT-based models were state of the art over all existing systems on, I should say, natural language understanding tasks by a large margin. These include text classification, language translation, question answering, summarization, and natural language inference. So now we can move on to a specific example of campaign classification; let me take a drink of water. [00:20:00] We'll return to our example of predicting a customer's business vertical given their campaign, and we'll be using BERT. In this example, once again, you feed in the campaign, pass it to BERT to get a numerical representation of the campaign, and then pass that to a classifier. The classifier actually doesn't matter much, since BERT is doing all the heavy lifting, so the classifier here is just a single layer, and then you output some prediction. Let me also talk about the data a little bit. We have about 10 million users who are unlabeled in terms of their business vertical. Our labeled users come from onboarding: when you were creating your account, it used to be part of onboarding to select your industry. That's no longer part of the app, so there has been an initiative on the data science team to label some of these unlabeled users. This is a 46-class problem; you can see some of the examples here, like architecture and construction, personal care, and so on, and among these there is a very high class imbalance. This was all trained only on publicly available campaigns, to respect our users' privacy. [00:21:11] Then there were data preprocessing steps. One was to strip out some of the campaign structure: this included signature lines, "All Rights Reserved" text, copyright symbols, and other stuff like that that appears in pretty much every campaign and isn't representative. Then we remove the non-English campaigns via language-detection software, and then use WordPiece tokenization. You have your entire campaign as a string, but you want to turn it into a list of tokens, and tokenization is how you do that using a predefined vocabulary. A common problem in NLP is determining what this predefined vocabulary should be, because when you open your model to the real world you don't know what words you're going to come across, but one thing you can do is use a clever technique like WordPiece tokenization: when a word isn't available in the vocabulary, [00:22:04] it splits the word along morphological boundaries, so you can build representations for unseen words out of subword pieces like root words. Then there's label smoothing: there is a fair amount of class entanglement, and label smoothing is a regularization technique you can use to make sure your model isn't too certain of any one class. And then there's a train, test, and validation split.
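A hedged sketch of that pipeline using the Hugging Face transformers library (a recent version is assumed): WordPiece tokenization plus BERT with a single classification layer on top. The 46 labels mirror the talk, but the example text is made up and this is not the production setup.

```python
# pip install transformers torch
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=46)

campaign_text = "Spring sale! 20% off all handmade ceramics this weekend only."

# WordPiece splits out-of-vocabulary words along subword boundaries
# (continuation pieces are prefixed with "##").
print(tokenizer.tokenize(campaign_text))

inputs = tokenizer(campaign_text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits       # shape (1, 46): one score per business vertical
probs = torch.softmax(logits, dim=-1)
print(int(probs.argmax(dim=-1)))          # index of the most confident class
```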
[00:22:26] There were a few network adaptations: fine-tune the language model using gradual unfreezing and discriminative fine-tuning, like we saw in the ULMFiT paper, and then apply class weighting to the classifier's loss function to account for the class imbalance. [00:22:43] This is two-stage transfer learning. In the first stage we fine-tune the language model, predicting, say, the next word, using email campaigns, which are our downstream data set; that took 60 hours on two NVIDIA GPUs with the following hyperparameters. [00:23:05] The second fine-tuning step is fine-tuning the classifier, which took 10 hours on the same two NVIDIA GPUs. And this is a table of results. You'll notice it's split down the middle, because we can think of the output of this model in two different ways. The output is a list of size 46, where each index corresponds to the classifier's certainty that the campaign belongs to that class, so you can read it two ways: either you take a single prediction, which would be the one with the highest confidence, or you set a confidence threshold, say 0.7 in this example, and take only the subset of very confident predictions. Looking at this top to bottom, it shows the advancements that happened over time, so we can observe the F1 score when I used vanilla BERT and then fine-tuned it; [00:23:53] it didn't perform so well. Then I fine-tuned it with class weighting and it got much better; then I added more data, because that always helps; and then I improved the language detection. I explained that there was a pre-processing step to strip out any non-English campaigns. A really interesting thing I found is that a lot of these language-detection packages, if you feed them punctuation along with the text, say German text plus punctuation and numbers, will think it's English for some reason; there must be some bias going on in the model. I found that out afterwards, and since then I strip out all of the non-letters, which improved the language detection, which in turn improved the downstream classifier. Finally I added even more data, and that gave the best results, and you can see the same pattern for the multi-label prediction as well. [00:24:44] When I was making this presentation I thought I would do a demo; it would be really cool to show you campaigns and what the classifier said. I started with correct predictions, but it turns out that's not very interesting, so we're going to look at incorrect predictions instead. [00:25:01] There is a common problem with image campaigns being especially troubling for the classifier, because obviously if the entire campaign is an image there's no relevant text for the classifier to take in. This entire thing is pretty much just an image, so the classifier got really confused: the self-reported business vertical, which is the ground-truth label, was restaurant and venue, but the classifier said things like e-commerce, hobbies, and retail. Here is another example of an image-only campaign; I think this entire green thing is just one image, and their self-selected business vertical was e-commerce. You can see the classifier still got pretty confused, but e-commerce actually did end up in here somehow, and I think that's because the "shop now" button is actually text.
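To make the two ways of reading the output concrete, here is a small sketch of single-label argmax versus the 0.7 confidence threshold mentioned above; the class names and probability values are made-up illustrations over a handful of classes rather than all 46.

```python
import numpy as np

classes = ["e-commerce", "restaurant and venue", "nonprofit", "retail", "hobbies"]
probs = np.array([0.08, 0.72, 0.05, 0.10, 0.05])   # softmax output for one campaign (made up)

# Option 1: single prediction, the class with the highest confidence.
print(classes[int(probs.argmax())])                 # "restaurant and venue"

# Option 2: thresholded prediction, keeping only predictions the model is confident
# about and abstaining otherwise. Precision rises on the subset kept; coverage drops.
threshold = 0.7
prediction = classes[int(probs.argmax())] if probs.max() >= threshold else None
print(prediction)
```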
[00:25:50] Here is a much rarer case: non-representative text. This was where none of the text in the campaign actually contributed, or it didn't look like anything representative of the business vertical. The self-reported business vertical was games; this is an organization called the [00:26:10] International Game Developers Association, but they never said anything about games in the document and they always used their abbreviation, so the classifier said they were a nonprofit with really high confidence. I was really curious why the confidence was so high, and then I looked up the IGDA and it actually is a nonprofit, so my takeaway was that maybe their language was simply that of a nonprofit; it got it right and wrong at the same time. [00:26:41] Then there are examples where users self-selected their business vertical incorrectly, so the label itself was false. This is from an organization that identifies themselves as the best food importers, but they self-selected their business vertical as architecture and construction, which is not true, so the classifier output agriculture and food services with really high confidence. There are also examples where the classifier is wrong but still reasonable: Memory Cross is a company that sells Christian mugs and stickers with proverbs on them for kids. They self-selected their business vertical as manufacturing, but the multi-label predictor output religion and e-commerce, which I would actually say is probably more representative of what they do. [00:27:31] Here is another example, a church: they self-selected their business vertical as nonprofit, which is true because most churches are nonprofits, but the multi-label predictor output religion, which is also true because they're religious. And I think this is the last example: this is Old Garden, a DIY plant campaign where they basically teach you how to tend to your plants. They self-selected their business vertical as education and training, while the multi-label predictor output home and garden, agriculture and food services, and hobbies. [00:28:07] The best way to get your feet wet with BERT would probably be to try it yourself, so if anyone is interested, these are the four implementations I would recommend, especially the PyTorch implementation; it's exceptionally good, and the community is really bullish on PyTorch. I think [00:28:28] in 2019 about 80 percent of all new NLP papers used PyTorch, so if this is something of interest to you then maybe you want to learn PyTorch. Now I'll provide some overall guidelines for building a high-performance campaign classification system; most of these techniques apply to broader text classification tasks as well. Fine-tune your language model, as it produces better results in a fewer number of epochs, and use discriminative fine-tuning and layer unfreezing like we saw in ULMFiT. [00:29:03] Then, when using BERT, you want to take the final hidden state, use class weighting if you have a class imbalance problem, and strip out the campaign structure, meaning the sectional dividers, the "All Rights Reserved" text, and anything else that's not really representative or that's generic across all of your examples. For BERT specifically, you want to keep things like punctuation.
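Here is a rough sketch of that preprocessing guidance; the boilerplate patterns and the langdetect package are illustrative choices rather than the production pipeline, and the non-letter stripping reflects the language-detection fix described earlier.

```python
# pip install langdetect
import re
from langdetect import detect

BOILERPLATE = re.compile(
    r"(all rights reserved|unsubscribe|copyright \d{4}|view this email in your browser)",
    flags=re.IGNORECASE,
)

def clean_campaign(text):
    """Strip generic campaign structure that appears in nearly every email."""
    lines = [ln for ln in text.splitlines() if not BOILERPLATE.search(ln)]
    return "\n".join(lines).strip()

def is_english(text):
    # Language detectors are easily fooled by punctuation and numbers, so
    # detect on letters and spaces only before deciding to keep a campaign.
    letters_only = re.sub(r"[^A-Za-z\s]", " ", text)
    return detect(letters_only) == "en"

campaign = "Big summer sale this weekend!\nCopyright 2019 Example Co. All Rights Reserved"
cleaned = clean_campaign(campaign)
print(cleaned, is_english(cleaned))
```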
[00:29:29] Punctuation also turned out to be very predictive. Specifically, if you want to fine-tune the language model you're going to need periods, because of that next-sentence prediction task; it's the only way the model knows where sentences are. Then remove any non-letters if you plan on limiting to one language and doing language detection. [00:29:50] And then, if at all possible, more data: we saw in my example that every time I added more data it seemed to boost performance. So now we can look at some future directions. The first three are just variations of BERT. BERT came out earlier this year, [00:30:10] probably about 9 months ago, so a lot of different variations of it have come out since, and then the other two are different classes of deep learning architectures that we might want to apply to campaigns. The first is RoBERTa, which stands for the robustly optimized BERT approach. It came out of Facebook, and they basically just trained it longer, on bigger batches, with more data; they removed the next-sentence prediction objective, as it turned out not to be that helpful; and they trained on longer sequences as well. It achieved state of the art just last month, actually. [00:30:48] Then, looking over time, the size of these deep learning models keeps getting larger and larger, and that points to the question: when do these models get back down to a reasonable size to train and use in practice? DistilBERT answers that question. It uses student-teacher training, which trains a student network to mimic the full output distribution of a teacher network, where the teacher is BERT and the student is DistilBERT. It's able to retain 95 percent of BERT's performance on the GLUE natural language understanding tasks while using only half the total number of parameters, and it does this by minimizing the KL divergence between the two output distributions. Then we have ALBERT. This is the current state of the art on the natural language understanding tasks; it came out two weeks ago, and it's pretty much the same architecture for the most part, a stack of transformer encoders with non-linearities between them, using the additional data and larger batch sizes from RoBERTa. Its three main contributions were that it provided parameter efficiency through parameter sharing and embedding factorization; that, whereas RoBERTa removed the next-sentence prediction task, ALBERT instead replaces it with sentence-order prediction, which turned out to be a lot more helpful; and that they removed dropout altogether, with an ablation study in the paper showing that all levels of dropout seemed to perform worse than just getting rid of it. [00:32:22] So now we can look at different classes of machine learning architectures that we could use. This is multi-modal campaign classification: we saw that image-only campaigns were a common problem, so what if we embed the image as well, where the text goes through something like BERT and we use a ResNet or maybe a VGG to embed the image?
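Looping back to DistilBERT's student-teacher objective for a moment, here is a minimal sketch of a distillation loss, the KL divergence between the teacher's and student's softened output distributions; the temperature value and the 46-class logits are illustrative, not the exact DistilBERT recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened output distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # "batchmean" averages the KL divergence over the batch; the temperature**2
    # factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

teacher_logits = torch.randn(8, 46)   # e.g. the teacher model's outputs for a batch of campaigns
student_logits = torch.randn(8, 46)   # the smaller student network's outputs for the same batch
print(distillation_loss(student_logits, teacher_logits))
```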
Then we can combine them, apply multi-modal self-attention, and output a result; this is something I'm really excited to try out. And I think this is my last slide: multi-task learning. This is something that's really popular in research right now. All the models we looked at today have only one task, either language modeling, where you want to predict, say, the next word, or classification, where you just want to predict the class, but what if we did the two at the same time? It turns out to be a really empirically successful method, so long as the two tasks are related and will co-inform each other. The way you could define the loss function would be the sum of your primary-task loss and your auxiliary-task loss multiplied by an annealing ratio. [00:33:36] And that's all I have today, so thanks for listening to the talk, and we'll take questions. Ok, thank you very much. We have plenty of time for your questions, and we also have a lot of swag here on the table, which you can have some of at the end, not now. [00:34:03] So we have a lot of time for Q&A and any questions you have for Mohammed. I also want to say that the company is hiring a lot of people; they plan to grow by 50 percent this year, and they already have three floors in Ponce City Market, so I don't know how many hundreds they're hiring, and they have internship positions as well as full-time positions. Ok, questions from the audience? Ok, don't be shy; maybe I'll start with one question first. [00:34:43] Looking at those numbers, obviously the state of the art seems to look great. One thing this class emphasizes is the human side, things like labeling data and visualization. So what's your impression looking at these models: are they already doing great, so we don't really need a human anymore and can just do everything automatically? What has your experience been? That's a really good question. My impression is that there's an entire spectrum when it comes to classifying things when you work on a product team: one side of the spectrum is to just fully predict for your user, and the other side is to put all the labeling tasks onto your user, but what you can do is bring the two together. Similar to how Facebook will put a bounding box around your face and ask "is this you?", rather than labeling it outright or asking you to find all the pictures and draw the bounding boxes yourself, I think a good approach would be to have that kind of symbiotic relationship, where you provide maybe your top five predictions and then your user self-selects. Yes, question over there? [00:36:02] Yes, go ahead; I think I can hear you. So the question is whether we use these big models in production. Yeah, most of the ones that you've seen here today, especially BERT and RoBERTa, which came out of Facebook: we're currently productionizing some of this business vertical prediction work, so yes, we're currently using them. And the follow-up question is, once in production, how do you debug these models? Luckily we work on products that are user-facing, so there are a few UI
components we can create for that. Whenever we predict a customer's business vertical, we explicitly state that we predicted it, and there's a section where you can give feedback on whether we nailed the prediction or could have done better, plus a little text box where you can add any additional comments. [00:37:37] By "debug," what exactly do you mean? Yeah, there's not much you can do at the individual-prediction level that's specific to the model, but luckily not all problems have to be solved by machine learning. When you work on a product you usually take what's known as the MVP, the minimum viable approach, so in this case that would probably just be letting users indicate when your prediction is wrong. And what we're working on isn't something like tumor classification; we're just predicting customer verticals, so there's not much risk there. But thank you. [00:38:29] [Inaudible audience question about ALBERT removing dropout and other regularizers.] Yeah, it was really interesting; they only had a few sentences about it, and I thought the same thing. It turns out the justification they gave was that in previous work, using CNNs and RNNs with batch normalization and dropout together seemed empirically to be lower performing; that's about the only backup they gave for it. [00:39:11] BERT itself uses layer norm, which is very similar to batch norm. So, the next question: I have a question not specific to NLP. You're working in a data scientist role, and I know a lot of students in the class are interested in positions similar to that. [00:39:40] You were recruited through an internship, yes, and then joined the company. So what experience can you share with students who might not have worked at a company yet, and what skills do you think are essential or helpful to prepare? [00:40:03] So I was a summer intern as well as a fall intern while still in school. It's really awesome that you're able to have all the responsibility of a full-time employee: you're given real tasks, real tickets, where your team depends on you, and you're immediately trusted in your competency, that you'll do good work. That was really awesome. For preparedness, I would say that just coming in with an eagerness to learn is probably the best piece of advice I can give. [00:40:38] That's great. How about technical skills: do you need to know certain things already, or is there an expectation that you continue to learn on the job? Are there certain things you really need to have before applying? There's such a broad range of problems that you might work on that I can't say there's one thing you should specialize in directly; I would say just make sure that your foundation is as sound as possible. [00:41:14] And during your interview, don't try to pretend that you know something if you don't. I did this during my interview: if you don't know something, just say so. It's much more comfortable for you and the interviewer than them listening to you pretend you know exactly [00:41:31] what you're talking about. That's a good point, and I want to make that comment as well: yes, sometimes the interviewer
you're talking to may try to trap you a little bit; they may ask, "do you know about this?", and if you pretend that you know when you actually don't, that can be an automatic reject. So yeah, don't pretend. [00:41:51] Any other questions from the audience? Very, very shy. Ok. And are you guys accepting resumes? Ok, so they do have positions coming up. Ok, so without further questions, let's thank the speaker again. Thanks.