Hi, I'm [inaudible], and this is my [inaudible] Samuel [inaudible]. Our research is on visually classifying network traffic using a piecewise polynomial representation of a set of data. [inaudible]

Now, what exactly is the problem? As the Internet continues to grow and is being used in new ways, it becomes more and more ubiquitous, and network traffic becomes harder to classify. The current methods of classification are memory intensive and storage space intensive, and they involve complex computations. To address this problem, we propose using a continuous approximation of a set of data.

Now, what is network traffic classification? Network traffic classification is dividing network links based on data gathered from network traffic, such as the amount of data being sent and received by the network and the arrival times between packets, in order to classify the traffic as either e-mail or instant messaging, abnormal or normal, or safe or malicious.

This raises the question: why is network traffic classification important? Well, in the area of network forensics, investigators classify network traffic to categorize it and to select and archive known traffic. When it comes to network monitoring, many companies actually perform statistics to measure performance, for example to measure bandwidth consumption, and another use is to identify network-related problems.

Currently there are three main methods to analyze network traffic. The first is packet payload analysis, also known as deep packet inspection. This is where one actually goes inside of a packet and investigates the contents of, say, an e-mail. If someone were to look at the contents of an e-mail, they could actually see the message that's being sent. The second is port number analysis. Essentially, many applications, such as HTTP, e-mail,
or File Transfer Protocol, use well-known port numbers by which people can identify network traffic. This has its own weakness, because many people use TCP tunneling in order to transfer malicious traffic over well-known port numbers. The third is analyzing flow statistics, such as arrival times and data length, to categorize network traffic. On another note, back to packet payload analysis: this is extremely accurate, but it is slow and it invades privacy.

We propose a piecewise polynomial approximation. This approximation is based on the least squares approximation, which utilizes a built-in function of MATLAB called polyfit to create polynomials of arbitrary order. These polynomials represent data in bytes, arrival time in milliseconds, and arrival order, of which arrival order is the independent variable and the other two are dependent.

Now, there are many benefits to using a piecewise polynomial approximation over a discrete representation of the data. One is that a piecewise polynomial representation uses significantly less memory than the discrete one. For example, rather than storing ten thousand points, you would store a polynomial that represents these ten thousand points. Furthermore, the piecewise polynomial approximation yields more descriptive statistics than the discrete representation of the data. As you can see in these two plots, if you use traditional discrete statistics such as mean and standard deviation to describe the two data sets, you would find them to be exactly the same and virtually indistinguishable. However, if you use the polynomial representation, you would clearly see that one is y = x and the other is y = 100 - x.
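The mean and standard deviation comparison described here can be checked with a short sketch. It is written in Python rather than the MATLAB the speakers used, and `fit_line` is an illustrative first-order least-squares helper standing in for a call to polyfit; the 101 sample points are made up to match the y = x and y = 100 - x example.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """First-order least-squares fit; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

xs = list(range(101))              # arrival order 0..100 (made-up data)
rising = list(xs)                  # samples of y = x
falling = [100 - x for x in xs]    # samples of y = 100 - x

# The discrete statistics cannot tell the two data sets apart ...
print(mean(rising), mean(falling))
print(pstdev(rising) == pstdev(falling))

# ... but the fitted coefficients can: the slopes have opposite signs.
print(fit_line(xs, rising))
print(fit_line(xs, falling))
```

Both data sets contain the same values in a different order, so any statistic that ignores order must agree; only a representation that keeps arrival order as the independent variable separates them.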
Of those two lines, one has a positive slope while the other has a negative slope, which makes them very easily distinguishable.

This is a screen capture of Wireshark in action. Wireshark is a popular network analyzer tool which many network administrators use to identify network-related problems. Wireshark can actually look inside packet headers to identify the protocol, such as TCP, and to identify where a packet came from and where it is going by its source address and destination address, as well as the time of arrival and the order in which it arrives. While Wireshark is useful, it does have its weaknesses. For one, it takes time to analyze each and every packet in such great detail.

We propose using piecewise polynomial approximation. We created custom parsing algorithms to extract certain parameters, such as source and destination address, the protocol, and the data length, in order to represent this traffic in an automated process.

This is a diagram of a TCP header, which contains the information we extracted using our parsing algorithm. As you can see, it has a source port and a destination port, which basically tell you what application is in use at any given time. Then you have the sequence number, which ensures that packets are delivered in order. And lastly you have the acknowledgment number, which basically serves as a kind of receipt, where it basically says, "Yes, I received this."

Now, TCP connections flow to both ends [inaudible]. TCP connections are generally the most reliable connections. The 4-tuple principle basically states that every TCP connection must have a destination port, a source port, a destination address, and a source address. If multiple TCP connections share the same destination and source addresses, then they form an IP pair. Now, according to our definition, this means that an IP pair can consist of multiple TCP connections.
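The 4-tuple and IP pair definitions above can be sketched as a grouping step. This is a minimal Python illustration, not the speakers' parsing algorithm: the packet records, the addresses (taken from documentation ranges), and the function name `group_flows` are all assumptions, and the sketch treats each direction of a connection separately for simplicity.

```python
from collections import defaultdict

# Hypothetical parsed headers: (src_addr, src_port, dst_addr, dst_port)
packets = [
    ("192.0.2.10", 50001, "198.51.100.20", 80),
    ("192.0.2.10", 50002, "198.51.100.20", 80),
    ("192.0.2.10", 50001, "198.51.100.20", 80),   # same connection again
    ("192.0.2.10", 50010, "203.0.113.30", 443),
]

def group_flows(packets):
    """Map each IP pair to the set of TCP connections (4-tuples) in it."""
    ip_pairs = defaultdict(set)
    for src_addr, src_port, dst_addr, dst_port in packets:
        connection = (src_addr, src_port, dst_addr, dst_port)  # the 4-tuple
        ip_pairs[(src_addr, dst_addr)].add(connection)         # the IP pair
    return ip_pairs

ip_pairs = group_flows(packets)
for pair, connections in ip_pairs.items():
    print(pair, "->", len(connections), "TCP connection(s)")
```

Here the first IP pair holds two distinct TCP connections because the source ports differ, which is the case the definition is meant to cover.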
Now, as we outlined before, there is a procedure to generate and model traffic from the various kinds of sites out there. For example, you have a lot of sites like [inaudible], and you have your instant messaging, e-mail, and video. Some of the more popular sites out there are Gmail, Yahoo, Facebook, etc. Of the four types of network traffic [inaudible], we're only going to demonstrate two in this presentation: the first being web page traffic, which was generated by normal folks dot com, and the second being instant messaging traffic, which was generated by Facebook.

We expect web page traffic to have many TCP flows established around the start of the connection. This is because when one is browsing the web and requests a certain website, the browser immediately downloads many different objects to represent the whole site; for example, it could be multiple pictures or text boxes. Instant messaging traffic is expected to have many small bursts of TCP flows that occur throughout a session. This is because when one is actually chatting online, there may be some wait before another message is sent or received.

This is a 2-D representation of normal folks dot com. The plot on the left has arrival order on the x-axis and arrival time on the y-axis. Please note the monotonic non-decreasing nature of the graph, and that it is not linear. This is due to the nature of the Internet.
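The arrival order versus arrival time plot described above can be illustrated with a small sketch. The capture values below are invented for illustration, and `is_non_decreasing` is a hypothetical helper; the point is only that arrival time never goes down as arrival order goes up, while the gaps between packets are far from uniform.

```python
# Hypothetical capture: (arrival_time_ms, data_length_bytes) in capture order
capture = [(0, 1460), (0, 1460), (3, 512), (3, 1460), (3, 80), (9000, 300)]

arrival_order = list(range(1, len(capture) + 1))   # independent variable
arrival_time = [t for t, _ in capture]             # dependent variable
data_length = [n for _, n in capture]              # dependent variable

def is_non_decreasing(values):
    """True when no value is smaller than the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))

# Inter-arrival gaps: several packets in one burst, then a long wait.
gaps = [b - a for a, b in zip(arrival_time, arrival_time[1:])]

print(is_non_decreasing(arrival_time))   # arrival time only moves forward
print(is_non_decreasing(data_length))    # data length has no such constraint
print(gaps)
```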
This is because the Internet does not actually receive the first packet in the first second and the second packet in the second second. It behaves in bursts: it may receive five packets in one second, and it may take ten seconds to receive ten more.

The plot on the right represents data in bytes on the y-axis; it increases sharply and then has a sinusoidal nature. Now, this is unique to normal folks dot com. It is not very significant, but it is important to note that it is less than 1,500 and greater than zero. This is due to the limits of Ethernet technology.

This is a 3-D representation of normal folks dot com. In red are the individual TCP flows, and in blue is the IP pair, which is the sum of these individual TCP flows. Please note that each TCP connection is unique and that they start around the same time. This is due to the reasons that I stated earlier, and it is consistent with our expectations.

Upon first inspection, the Facebook instant messenger traffic looks incredibly sporadic and not actually connected. But if you look at it in two dimensions, with arrival order on the x-axis and arrival time on the y-axis, you can see that there are actually a total of forty-three individual TCP connections, for every message that is sent and received. Also note right here that there is not much significant overlap of the individual TCP connections, which is also consistent with our expectations.

Now, considering this is a preliminary proof-of-concept study, this is a success: the preliminary results indicate that the graphs of different types of network traffic can be distinguished visually. While this distinguishing process is not yet automated, and thus is not yet scalable, this method, if implemented, would use significantly less memory than the traditional methods out there, which gives it a huge advantage.
To address the problem of scalability, we intend to use clustering techniques in order to automate the process, such as the K-means algorithm, Gaussian mixture models, [inaudible], and even custom algorithms that may actually suit our approximation best. We also need to perform an error analysis, because every approximation has some type of error, and that was not investigated in this presentation. This concludes our presentation. Are there any questions?

[Audience question, partly inaudible, about applications.]

As for the application, for one, we could identify different types of network traffic. We can also identify attacks. For example, as a technique for intrusion detection, you could identify malicious traffic versus known safe traffic. This technique actually helps a lot, especially when it comes to storage space, because you can archive more known traffic as opposed to other methods. And basically this could be used any time you'd like to identify something that's harmful to you. It could be a network administrator, [inaudible], or even government.

I'm curious how [inaudible].

Wireshark is great. It does work, basically. It extracts the information we need [inaudible]. Wireshark also does a lot that's cool, but if you leave a Wireshark capture running for about, say, thirty minutes, you may have like a gig of information, and to identify everything that is going on you have to look through every single packet, which could be in the millions, which takes a lot of time. For instance, Wireshark is very weak with encrypted stuff, which is becoming ever more popular, especially for [inaudible] transactions. So if things are encrypted, Wireshark really doesn't help you.
[inaudible] By just looking at those two, do you sort of know where the malicious [inaudible] are, just by looking at these two or three here? You sort of look and see where the highest is, and then look back through the Wireshark capture?

[inaudible] These two plots are essentially nothing if you're just looking at them for the first time. But if we archive it, and we know this is normal folks dot com and this is its behavior by arrival order [inaudible], then when we see the same signature again we can classify it. And say this was an attack: this attack may have a unique signature. So if we see that signature again, we can immediately identify that attack and throw up a red flag: "Hey, somebody is breaking into our network. We need to do something about it." Does that answer your question?

Also, with polyfit, do you get to choose your own order, like quadratic or whatever, or is it automatic?

Well, the thing about polyfit is that, basically, the highest degree you could fit would be the number of points you had minus one, and deciding the order that way is really faulty, because if you had a thousand points, the best fit for that would be a polynomial of degree nine hundred ninety-nine [inaudible]. So you actually have to modify that a little. And in polyfit there's a built-in flag that basically, if it goes [inaudible], measures that itself [inaudible] and basically says, "Hey, this is the best approximation," and so on. We want the best fit possible without [inaudible]. And we also don't want to under-approximate. So it's a delicate balance, because we could simply say, "OK, we'll approximate everything with fourth order." Well, that wouldn't be feasible.
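The balance described in this answer can be sketched as a small order-selection loop. This is an assumption-laden Python illustration, not the speakers' MATLAB procedure: `polyfit` here is a plain normal-equations least-squares fit written for the example, and the rule "take the lowest degree whose RMS error is within `tol`, never exceeding the number of points minus one or `max_degree`" is a hypothetical stand-in for however they actually gauged the order.

```python
from math import sqrt

def polyfit(xs, ys, degree):
    """Least-squares fit via normal equations; highest power first."""
    n = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):                       # Gaussian elimination
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n                         # back substitution
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs[::-1]

def rms_error(coeffs, xs, ys):
    """Root-mean-square residual of the fitted polynomial."""
    total = 0.0
    for x, y in zip(xs, ys):
        value = 0.0
        for c in coeffs:                       # Horner evaluation
            value = value * x + c
        total += (value - y) ** 2
    return sqrt(total / len(xs))

def choose_degree(xs, ys, max_degree=4, tol=1e-6):
    """Lowest degree meeting tol, capped at len(xs) - 1 and max_degree."""
    cap = min(max_degree, len(xs) - 1)
    for degree in range(1, cap + 1):
        if rms_error(polyfit(xs, ys, degree), xs, ys) <= tol:
            return degree
    return cap

xs = [0, 1, 2, 3, 4, 5]
ys = [x * x for x in xs]            # made-up quadratic data
print(choose_degree(xs, ys))
print(choose_degree([0, 1], [3, 5]))
```

With two points the cap is degree one, so the loop never attempts a parabola. Normal equations become ill-conditioned at higher degrees, so a production fit would likely need a more stable solver.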
Let's say we only had two points: you can't even make a parabola off two points. So it has to basically gauge itself against the data that is given, and that's basically how we found the order.

[Audience question, largely inaudible, about how many points are used.]

[inaudible] I don't really understand your question.

[inaudible] That's basically because that's what we're actually experimenting with. Because we're converting a discrete representation to a polynomial representation, you're losing some of the data, and that's a very drastic event, so you're trying to be as good as possible about that. That's why we're doing all this experimentation [inaudible].

Also, say we have two hundred individual data points. If we pick a polynomial of second order, for example, you reduce them to just a few numbers, so that saves a lot of storage. Of course there's a loss [inaudible].

[inaudible] Each part of this polynomial is represented by a vector of coefficients. So, for example, this might be 4x^2 + 5x + 1, which is three numbers, whereas in the graphs you see, this is like around six hundred data points that it represents. [inaudible] So you can't really predict traffic using this method, because it wouldn't be valid. We're just taking a fingerprint.

I'd like to thank our two presenters for sharing their research on the
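The coefficient-vector storage mentioned in the last answer can be shown with a short Python sketch. The polynomial 4x^2 + 5x + 1 and the figure of roughly six hundred points come from the answer; the helper name `evaluate` and the choice of x values 0 to 599 are assumptions for illustration.

```python
def evaluate(coeffs, x):
    """Horner's rule; coeffs are highest power first, e.g. [4, 5, 1]."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

coeffs = [4, 5, 1]                                   # 4x^2 + 5x + 1
samples = [evaluate(coeffs, x) for x in range(600)]  # discrete representation

# Three stored numbers stand in for six hundred stored points.
print(len(coeffs), "coefficients versus", len(samples), "points")
print(evaluate(coeffs, 2))
```

Real captures will not lie exactly on a polynomial, so the stored vector is a lossy fingerprint of the segment rather than a way to reproduce or predict the traffic.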