We are very lucky to have our speaker today. He is an associate professor of robotics and perception at the Department of Informatics of the University of Zurich, where he leads the Robotics and Perception Group working on vision-based navigation of flying robots. He has received several awards, among them the IEEE Robotics and Automation Society Early Career Award, a Google Research Award, and a European young researcher award, and a number of others that I will not list here. In his spare time he even co-founded a startup. So with that, I will leave it to him.

Thank you very much for the introduction, and thank you and the Department for inviting me here. It is a real pleasure; it is also my first time in Atlanta and at Georgia Tech, so I am honored to be here. Before I start I want to ask: who here comes from computer vision, can you raise your hand? OK. And whose background is in control? OK. Robotics in general? OK, good, a good audience for this talk.

So the title of my talk is autonomous, agile, vision-controlled drones: from frames to event-based vision. The second part of the title will become clear as we go. This is a picture of my group, which is based at both the University of Zurich and ETH Zurich. Most of our funding comes from third-party organizations like DARPA and foundations, but we also have funding from companies like Qualcomm, Intel, Nikon, and Huawei, which, as you can see, are companies that are not actually dealing with robots but, for example, with mobile devices, so you can be sure that some of the algorithms we develop will also find their way into those devices.

What do we do in my lab? We focus on four different areas of research. The first one is what we call visual-inertial state estimation, which some of you may know as SLAM, simultaneous localization and mapping: basically, using a combination of cameras and inertial sensors to infer the position and orientation in space, as well as to build a three-dimensional map of the environment. If you have a visual-inertial system, you can use these tools to let a robot localize in space and then do things like path planning or navigating from A to B. We also work on control and on demonstrating these algorithms for autonomous navigation of drones, and we try to do everything on board, with both sensing and computing running on the vehicle. Since last year we have also started working on deep learning; here you can see a video of a drone that recognizes a forest trail and follows it, and this is done using imitation learning.
That is also running on board. Then, about three years ago, we also started working on a new type of camera called the event camera, which is basically a new camera that is almost one million times faster than standard cameras, because it does not output frames but events, every time a single pixel detects a change of intensity. A big part of my talk will be dedicated to it.

So what is driving my research? I like to have a passion and a motivation behind my research, and the biggest motivation behind my research is to build flying robots that can be sent to the rescue, that could one day potentially be used in search-and-rescue operations. In fact we collaborate a lot with rescue organizations that work in search and rescue, for example after an earthquake, a tsunami, and so on. The idea is to develop technologies that allow a drone to autonomously explore an unknown building in order to search for survivors. These are operations that really need autonomy: you cannot just send in a drone with a remote control, because after fifty meters you lose the connection with the drone. One of the biggest problems when you do autonomous navigation, as you know, is how you localize in three-dimensional space. Outdoors you can use GPS, but what about indoors? Indoors, over the past few years, there have been several demonstrations by groups like, in this case, the group of Raffaello D'Andrea at ETH Zurich, or the group of Vijay Kumar at the University of Pennsylvania, where they show quadrotors, in this case micro quadrotors, that can do amazing things: fly over the heads of the audience in a Broadway show, assemble a bridge that people can walk across, even throw and catch balls or balance other objects, like in this case. But as you know, the magic comes from the fact that all these drones are using an external motion-capture system, a Vicon in this case, to localize themselves in space. So these robots are completely blind, and they also do not have a brain: all computation and sensing is actually happening off board. Now, you know very well that in order to do autonomous navigation you have to have onboard sensing and computing, and that is what we work on in my lab. Our quadrotors actually have just a single camera and an inertial measurement unit on board, but we also use smartphone processors to allow the drones to navigate all by themselves. Here you can see one of our drones autonomously following a given trajectory; we evaluated the accuracy within the main hall of our university. And here you can see the camera view: you can recognize a few corners that we are tracking, and the system has to be robust to, for example, reflections coming from the sun and other disturbances.

How does it work? Let me go straight to the point of the algorithm. We rely on algorithms that have been known for about thirty years, for structure from motion, and today these algorithms are usually classified under the name of visual odometry. So how can you use as little as a single camera to localize yourself? First of all you need a reference system; typically the reference frame is the frame where you start, so the first frame is the origin. How do these algorithms work? You move away from the first image and you triangulate a point cloud; for this there are many good algorithms, like the five-point algorithm. So you build the initial point cloud, and then every subsequent frame is localized with respect to that same point cloud, using pose-estimation algorithms like PnP, perspective-n-point.
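A minimal sketch of this bootstrap-then-localize scheme, using OpenCV purely as an illustration (this is not the speaker's actual pipeline; the feature type, matcher, and thresholds are arbitrary choices, and data association for the PnP step is omitted):

```python
# Two-view bootstrap + PnP localization for a calibrated monocular camera K.
import cv2
import numpy as np

def bootstrap(img0, img1, K):
    """Relative pose of img1 w.r.t. img0 and an initial (scale-free) point cloud."""
    orb = cv2.ORB_create(2000)
    kp0, des0 = orb.detectAndCompute(img0, None)
    kp1, des1 = orb.detectAndCompute(img1, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des0, des1)
    p0 = np.float32([kp0[m.queryIdx].pt for m in matches])
    p1 = np.float32([kp1[m.trainIdx].pt for m in matches])
    # Five-point-style relative pose estimation with RANSAC outlier rejection.
    E, inl = cv2.findEssentialMat(p0, p1, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, inl = cv2.recoverPose(E, p0, p1, K, mask=inl)
    # Triangulate the inlier correspondences to get the initial point cloud.
    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P1 = K @ np.hstack([R, t])
    good = inl.ravel().astype(bool)
    pts4d = cv2.triangulatePoints(P0, P1, p0[good].T, p1[good].T)
    pts3d = (pts4d[:3] / pts4d[3]).T
    return R, t, pts3d, p1[good]

def localize(pts3d, pts2d, K):
    """Localize a new frame against the existing map with PnP + RANSAC.
    pts2d must be the observations of the same map points pts3d in the new frame."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    return cv2.Rodrigues(rvec)[0], tvec
```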
These algorithms are very useful, for example, to localize within the same space, but what happens when you want to expand the map? In that case you triangulate new points and you localize as before, just with respect to the existing and growing map. Now, this scheme of keyframe-based visual odometry was first demonstrated successfully in PTAM, parallel tracking and mapping, by Georg Klein, who is now at Microsoft Research and one of the people behind HoloLens, and it has since been used in several other open-source visual-odometry pipelines that I am sure you know. These are all open source, and they all use more or less the same keyframe-based visual-odometry scheme.

Using PTAM, which was published in 2007, we managed to do the first autonomous navigation of a drone with onboard vision, so a drone guided by visual odometry, in 2009, and here I want to show a video that touches me particularly, because it was recorded during our participation in the European Micro Aerial Vehicle competition. The task consisted of taking off from the spot where the laptop is placed and entering an apartment, all autonomously. We were the only team performing the task autonomously, and the only team using onboard vision. Because of the competition regulations we were not allowed to let the vehicle fly fully freely, so we had to secure it with a fishing rod, but the fishing rod was not constraining the movements of the vehicle. Now, you notice that there are some banners and some stickers on the floor: it turned out that the place where we ran the competition had no features on the ground, and our camera was looking down, so we had to add features ourselves. Since we only had about twenty minutes of time, we came up with the idea of rolling out bands of fabric with some stickers attached, and the vehicle then built the map in real time and used this map to localize, so the map was not known a priori. However, all the computation at the time was happening off board, and because of issues with WiFi in 2009 we used a twenty-meter-long USB cable.
That was enough to run the computation on the laptop that is placed here. What happened two years later, in 2011: we did the first public demonstration, but this time with onboard sensing and computing as well. This time we used an onboard Intel Atom computer, and this was at the beginning of a European project, coordinated by my former PhD advisor, which was dedicated to autonomous navigation of multiple micro helicopters using vision alone. Here you see the vehicle flying a rectangular path, about four hundred meters long, using a downward-looking camera plus an IMU, and it was using the map it built to both localize and stabilize itself. We also tried to extend these algorithms to multiple robots, actually up to three robots, and here you see the performance of our ground station, which was concurrently collecting information from the three drones and stitching their maps together; in the end we also ran a dense three-dimensional reconstruction, which at that time was done offline. One of the outcomes of this project was also the PX4 autopilot, which is now used in many commercial drones.

And what is next? That was the past, so what are we doing next? Let me give you my dream: my dream is what you are going to see in this video. This is a video from a commercial by Lexus where you see fifty drones that move very agilely: they are able to perform fast, aggressive maneuvers very gracefully, and they look like they are actually alive. This is something that is very difficult to implement today, because there are many challenges, not just in perception but also in control and planning. Actually, here there are fifty drones, but forty of them were computer generated, and the remaining ten were controlled with an external motion-capture system. So this is just a vision; how do we get there? This talk is organized so that I basically try to tell you a story about how, in my opinion, we can actually get there.

So what are the challenges? Let us start with perception, since I am a perception guy. I think that perception algorithms today are mature but not robust. For example, differently from a motion-capture system, the accuracy of the localization that you get from onboard perception depends on the distance to the scene: the farther you are, the bigger the error. Another problem is that it depends on the texture: if there is no texture, you cannot do much, so you have to move in ways that actually maximize the texture in the scene. Another problem is that if you want to fly faster, you need faster sensors, but you also need to react quickly to changes in the environment, so you want low-latency algorithms and low-latency sensors. Another big problem is that so far control and perception have mostly been considered separately, but as you can understand, we need planning algorithms that take perception into account. For example, if you want to fly from A to B, the straight, obstacle-free path is not always the most reasonable path; often the better path is the one that maximizes the texture in the scene. If you have a plain white floor, but along a longer path there is a lot more texture, you should actually prefer the more textured path.
Other problems for perception, besides low texture, are high-dynamic-range scenes, for example scenes that are partly overexposed or underexposed, like in this case, which can actually cause serious accidents, like the Tesla accident two years ago, which caused the death of the driver because a white truck appeared in front of the Mobileye camera against a bright background and was not visible at all. Another problem is motion blur: when you start moving too fast in a low-light environment, the image gets corrupted by motion blur. So this raises the question: can we actually control the drones in a way that minimizes motion blur?

These are the interesting research questions, and based on these topics I have organized the talk as follows: first I am going to talk about how we do visual-inertial state estimation; then I am going to talk about active vision, where we couple control with the perception pipeline; then I am going to talk a little bit, just a little bit, about deep-learning-based navigation; and finally I am going to talk about event cameras.

So let me start with visual-inertial state estimation. Who here knows about SLAM? Just one, two people, OK. Basically, there are two ways to estimate the motion of the camera with respect to the previous frame: either you use features, like keypoints, or you use direct, photometric methods, which in principle can track every single pixel in the image. Each of these approaches has advantages and disadvantages. If you rely on features, the camera can move a lot between frames, so you cope well with large frame-to-frame motion, but these methods are slower because you need to extract features, remove outliers, and so on. The second type of method can in principle exploit all the pixels in the image, and therefore can achieve a higher accuracy in position, and interestingly, the higher the frame rate, the faster they converge; but the problem is that they only cope well with small frame-to-frame motion. So in 2014 we came up with an algorithm called SVO, which stands for semi-direct visual odometry, and which takes the best of both the feature-based and the direct, photometric approaches, combining sparse features with direct alignment of many pixels. You end up with a pipeline that has very low latency: it only takes 2.5 milliseconds per frame, while PTAM, for example, takes around fifty milliseconds per frame. By the way, we also open-sourced this algorithm. Here you see the performance of SVO in different scenarios. It is also probabilistic in the way it estimates depth, so you have an uncertainty on the depth, which allows the drone to know whether it can actually fly over certain areas or should prefer other areas, so we can also use it for active vision. Now, this is how you can do it with just a single camera; the problem with a single camera is that you do not get the scale factor. In order to get the absolute scale you need to fuse it with other sensors: either with a second camera, which you may want to avoid because you want the drone to be very small, or with the inertial measurement unit, the IMU, which gives you accelerations and angular velocities.

How does the fusion work? Let me try to explain it with this diagram: the grey triangles denote the camera frames and the red dots denote the IMU measurements, and as you can see there are many more IMU measurements than camera frames, since the IMU typically runs at 200 Hz up to 1 kHz while the camera runs at maybe 50 Hz; the camera makes observations of landmarks, and for simplicity I only drew a single three-dimensional landmark here. So how do we fuse the information coming from the camera with the information coming from the IMU? What we do is represent this problem as a graph, and then we solve the graph by minimizing two error terms: on one side you want the motion predicted by the IMU to be as close as possible to the motion estimated by the camera, which is what we call the IMU residual; on the other side you also want all the back-projected rays corresponding to the observations of the same landmark to intersect in a single point, and we call this the reprojection error, or bundle-adjustment term. If you minimize the sum of these two terms, you end up fusing the best of the two sources of information, and you also recover the absolute scale.
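In equation form, the two-term objective just described can be written roughly as follows; the notation is generic and illustrative rather than taken from a specific paper. The unknowns are the poses x_k and the landmarks l_j, r_IMU is the mismatch between the IMU-predicted and the estimated relative motion between consecutive frames, z_ij is the observation of landmark j in frame i, and pi(.) projects a landmark into the image:

```latex
\min_{\{\mathbf{x}_k\},\,\{\mathbf{l}_j\}}\;
  \underbrace{\sum_{k} \big\| \mathbf{r}_{\mathrm{IMU}}(\mathbf{x}_k,\mathbf{x}_{k+1}) \big\|^{2}_{\Sigma_{\mathrm{IMU}}}}_{\text{IMU residual}}
  \;+\;
  \underbrace{\sum_{i,j} \big\| \mathbf{z}_{ij} - \pi(\mathbf{x}_i,\mathbf{l}_j) \big\|^{2}_{\Sigma_{\mathrm{cam}}}}_{\text{reprojection / bundle-adjustment term}}
```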
Here you can see the performance of the system, and the key message is that if you just use the camera, the drift of the visual odometry is around one percent of the traveled distance, while when you add the IMU you get it down to about 0.1 percent, so ten times better. Without the IMU that means one meter of drift every hundred meters of trajectory; with the IMU it is one meter every kilometer, which is an impressive improvement. We also compared against Google Tango: here you can see the translation error over distance traveled, and of course, as expected, the error increases with the distance traveled, because you always accumulate drift by integration, but you can also see that our approach does better than Tango. Why? Google Tango uses a filter that only optimizes the last state, the last camera pose; it does not optimize the last N frames or the whole history. What we do instead is take all the information into account, from the beginning until now. In order to solve this graph efficiently you have to be very clever, and we actually use a solver that was made here, by the group of Frank Dellaert, some years ago, called iSAM, incremental smoothing and mapping. It is fast because it only updates the poses that are affected by a new measurement, so it is a very smart way to solve a very complex graph.

Enough theory; let me show you a few videos. This is one of our drones, which we developed between roughly 2012 and 2016. It is very lightweight, less than a pound, and here you can see our main sensor, which is a global-shutter camera; then we have the PX4FMU to read the IMU and control the motors; and then we use a smartphone processor, specifically an Odroid XU4, which is basically a smartphone-class computer, to run all the computation and control on board.
On it we also run Linux, Ubuntu, and ROS, so everything is running on board. What you are going to see in the next videos was running on this platform, and we showed that we could do several things. First of all, one of the first things you should do when you control a robot is to evaluate the robustness of the approach, and so, by pulling and pushing the vehicle while the visual-inertial state estimation is running, you can see that the position controller is able to bring it back to the initial position. Here is the video from before of trajectory following; here we had an accuracy of about a centimeter at these speeds, but if you fly faster the error grows. This one is also very interesting: here we show how the system reacts to dynamic objects. I asked my student to lie down underneath the vehicle, because the camera is looking down, in order to try to disturb it while it was flying a figure eight, but the vehicle actually does not care. Why is it so robust? Because we have this probabilistic depth estimation that makes sure that the points that are used and added to the map belong to the static part of the world.

Then, in 2015, we showed that, thanks to the low latency of the algorithm, we can also recover and stabilize very quickly after tossing the vehicle into the air, and here we are not using any GPS but just onboard vision. This is something that also inspired a company from Berkeley called Lily Camera, which was also showing a drone being thrown, for example from a bridge, but differently from us they were actually using GPS instead of onboard vision; with onboard vision you can also throw it from your window, indoors. It is very robust; we actually toss it almost every week. That was the state of things until about 2016.

From this year on we changed to a different type of quadrotor, which is much lighter: it uses a carbon-fiber frame and it also uses the new Qualcomm Snapdragon Flight board, which basically replaces the camera we had before, the PX4FMU autopilot, and the smartphone computer. So now you have, in about fifty grams, on this single board, several cameras, a forward-looking camera and a downward-looking VGA camera, all global shutter, at a cost of three hundred dollars. This is thanks to the progress of smartphone technology. And this is the first quadrotor we built around it, which we call Snappy; it is very fast to assemble and fairly safe, so it can also be used by undergraduate students. Here we are following trajectories that are dynamically feasible and minimize the snap, not the energy. Another application is within the DARPA FLA program, which stands for fast lightweight autonomy, and which focuses specifically on autonomous, agile navigation using only onboard sensing. Here you can see a demonstration from last year where we fly straight ahead at up to twenty meters per second, so seventy-two kilometers per hour, using a forward-looking stereo camera running SVO in this case, plus a downward-looking camera.
The program will run until next year, and among the next challenges are dodging multiple obstacles and flying in more cluttered environments, for example among trees. Other things we are working on regard dense 3D reconstruction, because often you may want the robot to interact with the environment, and in that case it is very important to be able to build dense maps of it. Differently from before, the maps are now dense, meaning that we have basically triangulated every single pixel in the image. You can see two examples in this video: here you see the quadrotor building a dense three-dimensional map of the environment by following a given trajectory, and in the next video you can see that we used this dense mapping strategy to also find landing spots for the vehicle, so that whenever an emergency landing is triggered, the vehicle can actually land safely; blue stands for safe and red means at risk in this case. This is open source too, if you are interested.

OK, now let me change topic. So far I basically talked about motion estimation and how to follow given trajectories, but those trajectories were actually defined by us. What if you want your drone to plan the trajectory itself in order to perform a given task? Here we are working on different problems; one of them was, for example, to plan paths that maximize the texture in the scene, so that, at the cost of a longer path, you make sure you arrive at the goal with the smallest possible error. In the last year, though, we have been trying to focus on tasks that are even more challenging. Remember my vision of sending flying robots to the rescue: one of the challenges that drones will have to face at some point is how to enter a collapsed building, for example through a narrow gap. These are problems that have been investigated before, for example by the group of Vijay Kumar at the University of Pennsylvania, where I spent time as a postdoc; this work is from 2011, and this one is from 2017, this year. In the 2011 work they passed through a narrow gap using off-board computing and off-board sensing, a Vicon motion-capture system at the time, together with iterative learning. More recently, this year, they applied a similar strategy, but now without the Vicon: they use onboard visual-inertial odometry from a downward-looking camera. What does a downward-looking camera mean? It means that the camera, and therefore the robot, is not aware that there is a gap to pass through. What they did to solve the task is that they put a waypoint in space, they generated the trajectory, and only afterwards did they place the window there. This is very important to remember: the robot was not aware of the gap; it does not know the position of the gap.
The robot passes through the gap just because it is told to pass through there, so everything is left to the accuracy of the visual-inertial state estimation. Now, this is not the way we humans and animals work: if you want to pass through a door, you do not look away from the door, you look at the door, right? So first of all, what we asked ourselves was: can we pass through a window by using a window detector plus an IMU, because we also have a vestibular system, so basically a gap detector plus an IMU? It turned out to be a very difficult challenge: first of all a research challenge, because we came up with a very difficult problem to solve, but also a technical challenge. One of the first things we did before approaching the problem was to ask how difficult this task is for a professional human pilot. So we invited one of the best pilots to our lab and I challenged him to pass through a narrow gap. We built a simple gap, easy also for a human to detect, black and white, and there is a drone with a camera that streams the video to the human pilot, who is standing behind here, so the pilot is controlling the drone manually. Let us see how he performed. More difficult than it looks. And after half an hour of training, he managed to pass through. This is what deep learning might be able to do one day, if you have the power of a brain and, for example, just half an hour of training, so these are things to think about as well. But two things I would like you to notice about the strategy of the pilot: he goes straight, so he uses a very long run-up, like thirty meters, and then at the very last second he flips the quadrotor through the gap. That is the strategy: go straight and then flip, very quickly. Now, sometimes you do not have that space, so let us investigate the complexity of the problem and what the challenges are.

As I told you, we do not know where the waypoint is; instead we have the quadrotor detect the window and try to pass through the center of the window without hitting it. So, first of all, how do you localize with respect to the window? It turns out, as I told you before, that the uncertainty increases with the distance from the window: at three meters the estimate has an uncertainty of up to about thirty centimeters. That is a lot: it means that if you plan a trajectory that aims at the center of the window while being uncertain by about thirty centimeters, you will only pass through roughly twenty percent of the time, so that does not work. One of the first things you have to do is replan the trajectory continuously, because as you approach the gap you become more and more certain about your position with respect to it; is that clear? Now, the problem is that, since you are using a gap detector to know your position with respect to the gap, you have to make sure that while you are approaching the gap the camera is always looking at it. Why is that a problem? Because, as you probably know very well, quadrotors are underactuated systems, so not every configuration is reachable: the trajectory has to satisfy certain dynamic constraints, and at the same time you need to ensure that at every instant the camera keeps looking at the gap. Most of the planning algorithms that have been developed for quadrotors only plan trajectories in x, y, and z, leaving the yaw angle free. So what did we do? Since our camera is rigidly attached to the quadrotor, we also plan the yaw angle, to make sure that the quadrotor is always looking at the gap in a way that keeps the gap as close as possible to the center of the image; why the center? Because we also want to minimize the motion blur. So it is an interesting active-vision problem.
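As a toy illustration of the yaw-coupling idea only, not of the actual planner (the real method plans full, dynamically feasible trajectories with perception constraints), one can picture the yaw reference along a planned position trajectory as simply the heading that keeps the gap center in front of a forward-mounted camera:

```python
import numpy as np

def yaw_to_face_gap(positions, gap_center):
    """positions: (N, 3) planned drone positions; gap_center: (3,) gap position.
    Returns the yaw angle (rad) at each point that points the camera at the gap."""
    deltas = gap_center[None, :2] - positions[:, :2]   # horizontal direction to the gap
    return np.arctan2(deltas[:, 1], deltas[:, 0])      # yaw = heading toward the gap

# Example: a straight 3 m approach toward a gap placed at (4.0, 1.0, 1.5).
traj = np.stack([np.linspace(0, 3, 50), np.zeros(50), np.full(50, 1.5)], axis=1)
yaw_ref = yaw_to_face_gap(traj, np.array([4.0, 1.0, 1.5]))
```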
Later, in the question session, I can tell you more about how it works; in the end we plan trajectories that satisfy perception constraints and geometric constraints as well as dynamic constraints. And this is the final demonstration. The vehicle is now fully autonomous; it does not use deep learning, and it plans the closest feasible trajectory it can, from as little as three meters away, not thirty meters away. However, it generates very strong angular accelerations: the angular velocity goes up to seven hundred degrees per second. It can also cope with different orientations of the gap, tilted and so on, and it worked about eighty percent of the time. When did it fail? It failed when it did not perceive the window for more than two hundred milliseconds, because in that case the prediction from the IMU had already drifted too far from the true trajectory.

Now let me show you something more recent. These days it is impossible to ignore deep learning; who here is working on deep learning? One, two, three people, OK. So my home message is: it is impossible to ignore deep learning. Why? Because in computer vision it has already solved many, many problems, and I think it will be solving many more. If you go to a computer vision conference, ninety-nine percent of the papers are on deep learning and only one percent on geometric methods, while in robotics at the moment it is the opposite, but I think that situation will soon change. Why? Because deep networks can actually exploit context, context from the images, which is very difficult to hard-code with standard programming. Now, the problem when you work with learning is how you generate data for training your algorithms. When you work with drones, especially when you want a drone that passes through a narrow gap, you break a lot of drones; we usually break about twenty a year. So we try to find ways to generate training data without crashing and without spending money. One way we are following is to use photorealistic simulators, and we are lucky that Microsoft released AirSim about six months ago, which provides a physics simulator as well as compatibility with many autopilots. What we did here is that we generated data for outdoor and indoor spaces, and then we set a task: the task was to have the deep network compute the depth of each pixel from a single image, without any additional information, so just using one eye to estimate distances. This is a task that we humans do very well thanks to experience, and that is exactly what we wanted the deep network to learn here.
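A rough sketch of the kind of network that regresses per-pixel depth from a single image, trained against simulated depth; the layer sizes, loss, and training loop are assumptions for illustration only, not the speaker's actual model:

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                    # downsample and extract features
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(                    # upsample back to image resolution
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),  # one depth value per pixel
        )

    def forward(self, img):
        return self.decoder(self.encoder(img))

# Supervised training against depth rendered in a simulator (placeholder tensors here).
model = TinyDepthNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
img = torch.rand(8, 3, 128, 128)                 # batch of RGB images
gt_depth = torch.rand(8, 1, 128, 128) * 40.0     # depth up to ~40 m
loss = nn.functional.l1_loss(model(img), gt_depth)
loss.backward()
opt.step()
```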
By training on the simulated data, the network was indeed able to infer depth maps from single images: on the left you see the input image, and the output is this color-coded depth map, where red means close and blue stands for about forty meters away. We got an accuracy of roughly five meters at forty meters, still not as good as the ground truth from the Velodyne lidar, which is shown here and is very detailed, and you can also see that the network loses a lot of resolution. This is something the networks still have to improve, and of course you need more training data to get there.

Now, that was one way to train, using simulation tools. Another way is to transfer from different domains that were not specifically meant for drones, for example cars. This is a paper that we just submitted to ICRA, it is not yet online, about learning to fly by driving. As I told you, training data is difficult to collect for a drone, and if you want, one day, a drone that flies in the streets of a city, you need to collect a lot of data from the streets of a city, which is dangerous for humans and for cars and pedestrians. So what we did here is that we used data from datasets available online, like the Udacity car dataset, which provides several hours of video recordings with ground truth for the steering wheel, and we also used our own bicycle dataset. What we did here was imitation learning: we let the network regress the steering angle of the car as well as the probability of a collision with an obstacle in front. Let me jump to the results. What we ended up with was a drone that learned to follow street rules and behaves like a car: it follows the curve of the road and it stops in the presence of pedestrians and cyclists. Here is another example of it following the curve of the street. It never crosses into the oncoming lane, so it behaves very well, always keeping to the right-hand side of the road, and it also flies at the height of a car. We then raised the altitude to see whether it generalizes: up to a point it does, and then it does not generalize any more. Here it also stops in front of construction works. All of this was learned only from the Udacity data and from our bicycle dataset recorded in Zurich. Now, does it generalize to unseen environments, like indoors? This is the parking garage in our building, and the drone was actually able to follow the right-hand lane without ever having seen it. We could hardly believe it; I was surprised, and so was my student.
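To picture the two-output architecture described above (a steering regression head plus a collision-probability head), here is a rough sketch; the backbone, layer sizes, and losses are assumptions for illustration and not the actual network from the paper:

```python
import torch
import torch.nn as nn

class SteerCollisionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.steer_head = nn.Linear(64, 1)        # regression: steering angle
        self.collision_head = nn.Linear(64, 1)    # classification: collision probability

    def forward(self, img):
        f = self.backbone(img)
        return self.steer_head(f), torch.sigmoid(self.collision_head(f))

# Imitation-learning style losses: MSE on steering, cross-entropy on collision.
net = SteerCollisionNet()
img = torch.rand(4, 1, 200, 200)                          # placeholder grayscale frames
steer_gt = torch.rand(4, 1)                               # steering labels from car data
coll_gt = torch.randint(0, 2, (4, 1)).float()             # collision labels from bike data
steer, coll = net(img)
loss = nn.functional.mse_loss(steer, steer_gt) \
     + nn.functional.binary_cross_entropy(coll, coll_gt)
```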
Here is another example, this time flying in a corridor, and it behaves here too as if it were flying in the street: it treats the corridor as a lane, so it keeps to the right, and then it stops in the presence of a person. We then tried to see what the network is actually basing its decisions on, and we found that the network is reacting, not surprisingly, to edges, and of course to lines: not only the lines of the lane, but also the lines of the cars in front of the vehicle. This is what I was telling you before: deep neural networks are able to contextualize, to extract information that would be very difficult to hard-code. You would never think that, to make a robot autonomous on the streets, you would also have to look at the behavior of the car in front of you, which is actually what we humans do all the time when we drive. So this is what the network is inferring, and I think it is something very interesting to think about.

Now let me switch to a completely different topic. We have talked about active vision and deep learning; towards my vision of going faster and more agile, what I have not talked about yet is low latency and agility. Another of my dreams is to get as fast as a bird, or even faster than a bird. Look at this video; it was shot in real time by National Geographic, and here you see a sparrowhawk that slaloms among the trees, passes through a narrow gap, and then, poof, it makes the prey disappear, all in only four seconds. Humans and animals have a very low-latency perception-action loop, which is difficult to replicate, and it turns out that agility and latency are strongly coupled: the agility of a robot is limited by the latency and the temporal discretization of its sensing pipeline. What does that mean? Take a standard camera, where you get frames at constant time intervals, say at thirty or fifty hertz. The problem is that every time you get a frame you need some processing time until the frame is processed, and then you need to wait another thirty milliseconds or so until the next frame arrives. Now, it turns out that the latency of standard robot-vision pipelines is in the order of fifty to two hundred milliseconds; Google Tango, the smartphone device, has a latency of around fifty milliseconds, for example, and this puts a hard bound on the maximum achievable agility of the platform. Why? Think of a drone: the time constants of the actuators of a drone are around thirty milliseconds, so stepping from zero to a given speed takes on the order of thirty milliseconds, but a small increment of speed actually takes less than one millisecond. So ideally you would like your perception to run below a millisecond. Standard cameras do not allow you to do that, unless you use a high-speed camera, but those are still very heavy and they need a lot of external light in order to work, because as the frame rate increases, the exposure time goes down. So there are now new cameras, called event cameras, that instead of transmitting frames only transmit what is changing in the scene, the information that changes between consecutive frames, and they have a latency of microseconds, more than what we need. The output is not frames but events.
Every time a single pixel detects a change of intensity in the scene, it triggers an event, so the output is not frames but an asynchronous stream of events in space and time. These are also called bio-inspired, or neuromorphic, sensors, because they are inspired by the human eye. If you think of the human eye, we have about one hundred and thirty million photoreceptors but only around two million axons going to the brain. How is it possible that we have many more pixels than effective wires? It is possible because our retinas only send spikes when something changes: as photons hit a photoreceptor, the receptor only signals towards the brain when the brightness changes. The sensor itself was invented by Tobi Delbruck, a professor in my department: in 2008 he built the first dynamic vision sensor, which looks like a standard camera and can now even be integrated into smartphones, so it is very, very small. Differently from a standard camera, the output is not frames but brightness changes; the camera essentially sees edges, so you can think of these cameras as motion-activated edge detectors. The cool thing about this camera is that, because it only transmits changes and does not integrate light over a fixed exposure time, it has a very high dynamic range: standard cameras have a dynamic range of about sixty dB, while the dynamic vision sensor has a dynamic range of one hundred and forty dB, which means it can see both in low light and in very bright light, even in the presence of a solar eclipse like the one in Europe in 2015. This picture was taken during that solar eclipse, and you can see both the moon's crescent and the fingers, without any dark filter; this is to show the dynamic range the camera can handle. These sensors also have a very high measurement rate, up to one megahertz, so in principle you could do visual-inertial state estimation at one megahertz, and they consume very little power, about twenty milliwatts instead of one and a half watts, so almost one hundred times less than a standard camera. There are many application scenarios; Samsung just announced that they are going to mass-produce the sensor for just a few dollars, and they have invested more than a billion in this technology, so this is something that I believe is going to revolutionize perception in robotics, especially because of the latency of one microsecond, which basically breaks the millisecond latency barrier of standard cameras.

Let me give you an illustration of how event cameras work. I told you that they react to changes in the scene, so you only see something if something moves relative to the camera. If you have a disk rotating with a black dot on it, a standard camera will output frames at constant time intervals, while a DVS will output a spiral of events in space and time, and these events correspond only to what is changing in the scene; does this make sense? The axes are x, y, and time, and if the disk stops, no events are generated, because the sensor is only sensitive to dynamic changes, so you get a drastic reduction in bandwidth. The other cool thing is that there is no motion blur, which typically affects standard cameras, because the event camera works in differential mode: there is no photon accumulation at all; it basically computes the sign of the temporal derivative of the brightness.
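As a toy illustration of this per-pixel, sign-of-change behavior, here is a minimal frame-to-frame simulation of event generation; the contrast threshold value and the frame-difference approximation are assumptions, and the real sensor is asynchronous rather than frame-based:

```python
import numpy as np

def events_between_frames(log_prev, log_curr, t, C=0.15):
    """Return a list of (x, y, t, polarity) for pixels whose log intensity
    changed by more than the contrast threshold C between two (log-)images."""
    diff = log_curr - log_prev
    ys, xs = np.nonzero(np.abs(diff) >= C)
    return [(x, y, t, int(np.sign(diff[y, x]))) for y, x in zip(ys, xs)]

# Example with synthetic data: a bright dot moving one pixel to the right.
frame0 = np.full((4, 4), 0.1); frame0[2, 1] = 1.0
frame1 = np.full((4, 4), 0.1); frame1[2, 2] = 1.0
evts = events_between_frames(np.log(frame0), np.log(frame1), t=0.001)
# -> one negative event at (x=1, y=2) and one positive event at (x=2, y=2)
```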
Another way to visualize this information is to accumulate the events that happened over a certain time window and display them as what we call event frames. These did not exist; we created them to make it easier to illustrate how we visualize the events. What is red and what is blue? The camera reacts to brightness changes that can be positive or negative, so red is a positive change and blue a negative one, and you will notice that there is also some background noise when nothing moves; this background noise is present and can be modeled, for instance, as a Poisson process. But there are many open questions about this technology that are still unanswered, and that we try to address in our research.

Three years or so ago, we collected a dataset that we recorded with the quadrotor performing aggressive maneuvers: we mounted the DVS and a standard camera, and we let the quadrotor flip in front of a known scene, to understand exactly how many events are generated per second and what the complexity of the problem is, for example for motion estimation. At this rotation speed you can see that the standard camera is affected by strong motion blur, while the DVS output does not show any motion blur, it is always very sharp, but it is also, unfortunately, fundamentally different from the output of standard cameras. And this is actually the main problem: the output is so different, it is asynchronous and there is no absolute intensity information, only binary positive or negative changes, that you cannot apply standard computer-vision algorithms. So the fifty years of past research in computer vision cannot be directly applied to event cameras. The question, then, is how you process the information coming from an event camera. There are two ways: either you build event frames by accumulating the events that happened in the last, say, thirty milliseconds, and you can see the effect here: with a window of thirty milliseconds you get a lot of motion blur, while if you go all the way down to one microsecond you do not see anything, and already at 0.1 milliseconds you do not see much. So the problem is: if you accumulate for too long, you increase the latency; if you accumulate too little, you do not see much. What should we do? We have been working for three years on a new class of algorithms that we call event-based: they update the state of the estimator every time a single event is generated. They are called event-based because they do not accumulate events but process the events as they arrive, and they can have latencies of microseconds. Do we really need microseconds? It depends on the application: I would say that for a quadrotor a millisecond is enough, but there may be applications, for example I heard this morning about target tracking of very fast objects, where you need latencies well below a millisecond.

So how do we use this sensor? First you need to model it: how does the sensor work, and how can we describe it mathematically? We came up with a formula which basically says that the probability that an event is generated is proportional to the scalar product between the gradient of the patch that the pixel is observing and the motion of that pixel on the image plane.
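Written out, the model being described is roughly the following; the notation is generic rather than taken from a specific paper. A pixel fires an event whenever its log intensity changes by a contrast threshold C, and for small time steps the brightness change is linked to the image gradient and the apparent motion, so the expected event rate follows the scalar product the speaker mentions:

```latex
% Event generation condition at pixel u and time t, with contrast threshold C:
|\log I(\mathbf{u}, t) - \log I(\mathbf{u}, t - \Delta t)| \;\ge\; C
% Linearizing brightness constancy for small \Delta t, with \mathbf{v} the pixel's
% apparent motion (optic flow) on the image plane:
\Delta \log I \;\approx\; -\,\nabla \log I \cdot \mathbf{v}\,\Delta t
% Hence the expected event rate is roughly proportional to |\nabla \log I \cdot \mathbf{v}|:
% zero when moving along an edge, maximal when moving across it.
```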
Let me justify this. Imagine that the DVS is looking at a simple patch; of course it is looking at the whole scene, but let us focus on a single pixel, and assume that this pixel sees a single gradient, an edge pointing in one direction. You would expect that if you move the DVS parallel to the edge, no events are generated, agreed? If you move it in any other direction, and especially perpendicularly to the edge, then you get a lot of events, with the maximum event probability when you move exactly perpendicularly. That is where this model comes from. Building on this model and other algorithms, which would take me an hour to go through, we are now able, for example, to do localization and mapping with this camera: the input is the stream of events, asynchronous in time, and the output is a map that can be updated at the resolution of the single events. Here we show images from a standard camera only for illustration purposes. In this video we compare the performance with a standard camera running a standard visual-odometry algorithm, and you will see that when you start shaking the camera too fast, the standard camera fails because of the motion blur, you lose track of the features, while the pipeline running on the DVS does not break, because for the DVS this motion is still slow.

People usually ask me: how fast can you go with this camera? Well, it depends on the distance from the scene: if you are infinitely far from the scene, you can go at infinite speed. Just to give you an idea: if you mounted the camera under a train, at one meter distance from the track, the train could go at roughly seven hundred and forty kilometers per hour. That gives you an idea of how fast you can go and what the speed sensitivity of the sensor is. Other things we managed to do: because of the higher dynamic range, the camera copes very well with scenes characterized by strong dynamic range. This is my iPhone camera, where you see zones that are overexposed and underexposed because of the sun shining from behind that road sign, while the output of the event camera is always very clear and sharp, even when the sun is shining; if you zoom in a little you can still see features and texture on the road sign. In addition to mapping and tracking, we also managed to reconstruct the intensity signal from the events alone, not using frames, just events, and this shows you that these cameras carry all the information that you need. This is a recent paper that we published at BMVC, the British Machine Vision Conference, a few weeks ago, which is about fusing an event camera with an inertial sensor to do event-based visual-inertial odometry. Here we attached the DVS to a string and spun it around, and look what happens: we can still do motion estimation and mapping at this speed, while the sensor is spinning. The next step was to put it on a quadrotor, in flight, and we managed to fly with it for the first time two weeks ago. This is the first autonomous flight with an event camera; actually, more than that, we are also using a standard camera, and I will tell you in a moment why.
So here the quadrotor is flying, which by itself is not a surprise, using the event camera looking down and running our event-based visual-inertial pipeline. Now, to show you the potential of this technology, we are going to switch off the lights. If you tried this with any commercial drone running a state-of-the-art visual SLAM pipeline for standard cameras, it would crash; that does not happen here, because of the high dynamic range of the sensor and because the sensor is also not affected by motion blur. When you are in low light, in principle you could increase the exposure time of the camera, but then the motion blur becomes even worse, and you can see that happening here. So what happens in the background while it flies: the pipeline extracts features from the event camera, and as you can see, the standard camera does not see anything when you are in low light. Actually, this pipeline uses a hybrid approach: we also extract features from the standard camera in addition to the event camera, because when you move slowly, especially when you are hovering, there are not many events, so in that case the standard camera provides the features to track, while when you start moving faster the event camera becomes more and more dominant. Other possible applications have to do with the low latency of the sensor: here you see obstacle dodging; this first clip was done by another group using an infrared camera, and this is our drone equipped with the event camera, which, because of the low latency, can dodge the obstacle; this was a very simple optical-flow algorithm running on the event stream.

So, in conclusion: the topic of this talk was agile flight, and where do we stand? I think it will take ten more years to get there, and there are several challenges ahead. First of all, perception and control, I hope you will remember this, need to be coupled and considered jointly. A lot of research has been done in vision for robotics, like SLAM, which is vision for robotics, but very little on control algorithms that improve the quality of perception. I also think that the perception side, visual-inertial SLAM, is a field that is by now very well established; the current challenges are in achieving robustness, for example to high-speed motion, low texture, and dynamic environments. Remember also that machine learning can exploit context and provide robustness in situations that are otherwise hard to model.
And also remember that event cameras are very valuable because they provide robustness to high-speed motion and high dynamic range, and I also think they are very interesting because they offer a lot of intellectual challenges: standard cameras have been studied for fifty years, and I think it is also rewarding to work on a different sensing modality. Just to show you, in one slide, the state of the art of the past thirty years or so of visual and visual-inertial odometry: we can say that the research progressed along three different axes, accuracy, efficiency, and robustness. From roughly 1980 to 2000, computer vision strove to improve accuracy, and using feature-based methods it reached an accuracy of about one percent drift, but it did not go much further. Then came the direct, photometric methods: in the last ten years we did not improve accuracy very much, but mostly efficiency, because they are faster, and robustness, because they are more robust to motion blur and defocus. Adding an IMU, you achieve a higher robustness, because even if there is motion blur you can still estimate the motion over short amounts of time, and it also provides up to ten times better accuracy. Now, with event cameras, you can improve along all three axes: in accuracy, in efficiency, and in robustness. And with that, I thank you for your attention.

Thank you. Thank you very much. That is a very good question; so the question is whether the features are extracted from the event stream directly. Let me open a different slide just to show you. I do not know if you are familiar with corner detectors; are you familiar with the FAST corner detector? We built something like FAST, except that now there is no intensity, no grayscale information: for us, the intensity information is replaced by the timestamps of the events. So we look for corners much as in the FAST detector, where you look for a segment of nine contiguous pixels, on a circle of sixteen pixels, that are all darker or brighter than the center pixel; here, what we do is look for concentric segments of adjacent pixels whose events all arrived at about the same time and more recently than the surrounding pixels. Now, the problem we encountered with corner tracking is that the appearance of the corner depends on the motion of the camera. Take that window there: if you move the camera parallel to any of the four edges of the window, you will not be able to detect a single corner; you only detect them under other motions. That is the problem.

No, this one is actually purely event-based; no frames are used in this case. That is another story.

Yes, over there. Yes, so you have seen that in some of the videos we use these very strongly textured rugs from IKEA. That is because these cameras still need a lot of contrast: both standard cameras and event cameras need a lot of contrast in order to extract features; they are not as good as the human eye, and that is their main problem. Outdoors we do not actually have this problem; we only have it indoors, in environments where, for example, the carpet has very low contrast. There we usually lay down a textured carpet; this one here would actually be perfect, for example, but if the floor already has texture, that is fine. Yes?
But if you project a pattern, then you are moving the pattern with you. This kind of structured light can of course also be done with standard cameras, like the most recent iPhone X or the Kinect; yes, you can do that, but at that point the pattern moves with you, and that is different from what we are doing here. Here you take actual points that are fixed in the world, and then, by analyzing the changes that these features induce on the image plane, you infer your own motion. If instead you project a pattern, you move the pattern with you and you cannot do that; the principle is completely different: you would have to first reconstruct the structure, then move, reconstruct the map again, and recover the motion through the registration between point clouds. That is a different approach.

Yes. The question was: what is causing the noise? The noise is caused by different things. First of all, there is a threshold that needs to be exceeded before events are generated. Let me show you here: if this is the intensity signal at the photoreceptor, the event camera spikes an event every time the brightness change goes above or below a certain user-defined threshold, which is relative; it is usually around fifteen percent. However, this threshold is not perfectly fixed, so we model it as a stochastic process: we model the positive and negative thresholds as random, with a spread of a few percent. But there are also other noise effects that we are not yet able to model. There are, for example, blooming effects: if you have a very bright light source like the sun, some pixels around the pixel of interest may also fire events. There are other effects as well; for example, around a very strong gradient there can be some shadowing or ringing artifacts in the events. These are very difficult to model, and we do not model them at the moment; the only noise we model is through this threshold process, so the other effects are still present but not accounted for. So yes, there is plenty of research left to do.

Yes, so the question was: what if all the pixels spike because you are moving too fast or there is too much texture in the environment? Yes, in that case you may saturate the sensor bus. The current DVS, and these are all the DVS sensors available commercially, has a maximum event rate of fifty million events per second, which is already a lot: you would only saturate it if you put the camera under a car, looking at the asphalt of the street, moving at, I would say, fifty kilometers per hour. The more recent Samsung DVS, the one that is being commercialized, can go up to three hundred million events per second, which means that basically even if every pixel of a VGA sensor were spiking, you could handle it; you could actually sustain that over a USB 3 cable. We are currently using the DVS and the DAVIS 346, which has a USB cable. Other questions? Thank you. Thank you very much.