Thank you very much for the introduction. The topic I'm going to talk about today is in a sense very abstract, so I decided to start with the hundred-thousand-foot view, come down to the thirty-thousand-foot view, and eventually end with some concrete, specific examples that show how these theories apply to specific problems. The reason I'm doing that is I want to give you a little bit of an overview of the mentality with which the researchers in my group and I try to approach problem solving in terms of research. Hopefully that's going to give you some ideas, maybe inspire you on some topics, and it will certainly tell you pretty much what we do to create recognition models for infrastructure sensing. So here's the hundred-thousand-foot view. It starts from a report from the National Academy of Engineering back in 2008 that identified one of the grand challenges for our world today to be restoring and improving urban infrastructure. Part of that challenge is that we need to map and label existing infrastructure. Let me give one specific example from under that topic.
What stands out of that challenge is that two thirds of the effort needed to model even simple infrastructure is spent on manually converting the sensed surface data into object models. The result is that we build surfaces and point clouds, not models; actual object models are generally not produced for new construction. This also leads to rework and design changes that can cost up to ten percent of the installed cost, as identified by that report. What is missing is a combination of a lot of things: a lot of things are not standardized, a lot of things are not automated, and that creates a lot of waste.

So the long-term goal of what we try to do is build on top of two very solid pillars, one pillar being the area of remote sensing, the other being the area of pattern recognition, and devise methods that will create these models of our infrastructure. The first step, the way we see it, is to create the models that are needed to automatically recognize the infrastructure elements that we want to manipulate for one reason or another. Let me show this to you in a graph. This graph shows a plan, a path of how we view this problem and its solution. You see here the two pillars, pattern recognition and remote sensing, and what we see as a very significant first step: the ability to use the wealth of knowledge in those areas and combine it to recognize infrastructure elements. After that you see a wealth of things that we can do if this works. For example, if we can recognize infrastructure elements, then we can measure their size; if we can measure their size, we can measure excavation volume, same thing. If we can recognize infrastructure elements, then we can count how many there are, so we can automatically count how many bricks, how many doors, how many workers, how many trucks. You can see how many applications this can affect. Continuing on: if we can recognize them, we can track them.
We can track them, and then we can track materials for asset management, personnel for productivity, equipment for the same reason. Last but not least, if we can recognize elements, we can detect defects on top of those elements, such as cracks and air pockets and spalling, which we can then use again for all of these application areas. But you don't see this done today, because it's just so time consuming. The difference with this approach, we believe, is that if it's successful it will allow us to automate many of these tasks.

So let's move on to the background, where we start from; let's just follow this map step by step. Modeling: at least many of us know what this is, the process of capturing infrastructure spatial data and transforming it into a structured representation for getting information from it and solving complex problems. It basically involves two stages. Stage one is: I need to go out there and get the data, create my 3D surface, get my pictures, those things. The second stage is to identify what's on that surface, extract it, and place it in my model. So let's take each one of these stages separately. Let's start with the first one, spatial sensing, which we can do with time-of-flight sensors such as laser scanners and 3D cameras. These can offer a wealth of information about the environment in terms of spatial information, and this of course comes in the form of a point cloud. The measurements from that point cloud are already very useful without even going as far as building a model: just by using that point cloud we can do quality control, interference and constructability studies. Just to give you an example, I have a picture up there.
This is from a fellow professor from the University of Illinois, who used those point clouds to compare with the design specifications and automatically detect all the spatial deviations of what was expected versus what was built, together with their specific locations. The red areas that you see are areas where objects were too far from where they should be, so they failed the specifications; these areas were detected automatically.

Another way to tackle this problem is to use visual data triangulation, with photogrammetry and videogrammetry, which are also quite popular for different applications, each for its own reasons. These techniques give coordinates of object points, much like a point cloud would, although they don't really produce a point cloud; they basically produce a wireframe. This has been used to model critical infrastructure and monuments, and for highway projects, building projects, and tracking the progress of tasks, especially excavation, over time. Now let's put those two together. Both of these techniques give you a 3D surface. That's really not a 3D model: the surface has no knowledge of the elements it contains, cannot tell which surface parts belong to which object, and cannot provide any information besides the purely spatial information that you have in there. For example, it cannot tell you what the material is, it cannot tell you what the condition of the element is, and a lot of these other aspects. This is really the reason that we have a second stage. This is the reason why we start from that type of data and try to convert that surface into an information-rich, object-oriented model.
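To make the as-built versus as-designed comparison concrete, here is a minimal Python/NumPy sketch of the idea: flag every scanned point whose distance from the designed surface exceeds a tolerance. This is only an illustration under invented inputs (a planar design surface, a made-up tolerance and point cloud), not the actual system from the talk.

```python
import numpy as np

def flag_deviations(points, plane_normal, plane_point, tol):
    """Flag as-built points farther than `tol` from the as-designed plane.

    points:       (N, 3) array of as-built point-cloud coordinates
    plane_normal: (3,) normal of the designed surface
    plane_point:  (3,) any point on the designed surface
    tol:          allowed deviation (same units as the points)
    Returns a boolean mask of the failing ("red") points.
    """
    n = plane_normal / np.linalg.norm(plane_normal)
    dist = np.abs((points - plane_point) @ n)   # point-to-plane distances
    return dist > tol

# Toy example: a wall designed at x = 0, scanned with one bulging point.
cloud = np.array([[0.002, 1.0, 2.0],
                  [0.001, 1.5, 2.0],
                  [0.050, 1.0, 2.5]])   # 5 cm bulge -> out of spec
mask = flag_deviations(cloud, np.array([1.0, 0.0, 0.0]),
                       np.zeros(3), tol=0.01)
```

In a real pipeline the design surface would come from the CAD/BIM model rather than a single plane, but the per-point distance test is the same.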
The processes typically involved in this second stage are: recognize which points of the surface belong to each object, add to it spatially related attributes (basic element information, cost information, procurement information, maintenance information), and then of course place that in the object model. These processes right now are, to a great extent, manual, quite time consuming and costly. This is what we were discussing at the beginning of this presentation, and we contend that further automation can be achieved by using visual pattern recognition concepts to make those steps more automated than they are today.

So let's talk a little bit about visual pattern recognition and what it means; let me give a very brief overview. The picture I like to show is this one. This is the simple sequence of events that we follow almost every time we have any sort of image analysis project to recognize a pattern. We start when we use our cameras and take the pictures. Of course those pictures are not perfect, so first we have to remove all the aberrations of the lenses and rectify the images, to bring them all to a condition from which we can extract information. This is the preprocessing stage: enhancement, restoration, and so on. We can then go inside those pictures and try to find regions that are meaningful, regions that seem to belong to a single entity; it is sort of a clustering approach to segment the image into areas that make sense. Then we go one step further and start looking at features of objects within those areas, for example corners, angles, textures, these kinds of features. As soon as we recognize those, we can put them together using statistical and syntactic tools to recognize objects from the visual data. The slide mentions several other tools that could be used for that purpose.
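The four stages just described (preprocess, segment, extract features, classify) can be sketched as a tiny end-to-end pipeline. Every function here is a deliberately trivial stand-in, invented for illustration: real preprocessing would correct lens aberrations, real segmentation would cluster, and real classification would use the statistical tools mentioned in the talk.

```python
import numpy as np

def preprocess(img):
    """Preprocessing stage: normalize intensity to [0, 1] (a stand-in
    for lens correction, rectification, enhancement)."""
    img = img.astype(float)
    return (img - img.min()) / (img.max() - img.min() + 1e-9)

def segment(img, thresh=0.5):
    """Segmentation stage: split the image into bright/dark regions."""
    return img > thresh

def extract_features(img, mask):
    """Feature stage: mean intensity and area of the segmented region."""
    return {"mean": float(img[mask].mean()), "area": int(mask.sum())}

def classify(feats):
    """Recognition stage: a trivial rule standing in for the
    statistical/syntactic tools mentioned above."""
    return "object" if feats["area"] > 4 else "noise"

# Toy 4x4 "image" with a bright 3x3 patch.
img = np.zeros((4, 4)); img[:3, :3] = 10
feats = extract_features(preprocess(img), segment(preprocess(img)))
label = classify(feats)
```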
This model-based recognition essentially uses models that simply combine the recognition of multiple features with the spatial correlation of those features, in order to recognize any such element type anywhere. The reason we call it model-based is that we basically have to build a computer model that tells the computer what something looks like. Let's say, for example, we want to describe a face. If we want to describe a face, then we have to give the computer a model of how the nose looks, how the mouth looks, the ears, the eyes, and so on, and on top of that tell the computer what the spatial correlation of these features is: where is the nose relative to the eyes, and so on. All of this information can then be used by the computer to look at millions of pictures and recognize faces of people. This is really a very rudimentary way of explaining the concept, but you can see the idea.

Now of course the result of the recognition is not a 3D model. The result is simply a region in the picture with a tag, a region that says this area is a face, this area is a column, this area is a compressor, or something. Beyond that, there are recognition models for several types of elements; I think this was mentioned in the previous slide. For example faces, cars, planes: these are very popular objects for recognition models. But there are no such models, at least not to my knowledge, for buildings; there are no such models for infrastructure, and this is a big gap in knowledge that we are trying to fill. On top of that, we need to find a way to combine those visual features with the 3D surfaces to get a very complete description of the object. I mean, just look at it: if I know the size and the shape and the location and the material of an element, plus some additional features of that element, then I have a very good description, and my ability to recognize it is much, much higher.
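The face example can be made concrete with a toy part-based model: each part stores its expected offset from a reference part (the nose), and a candidate is accepted only if the detected parts sit where the model's spatial correlations say they should. The part names, offsets, and tolerance below are all invented for illustration.

```python
import numpy as np

# A hypothetical part-based "face" model: expected (dx, dy) pixel
# offsets of each part relative to the nose.
MODEL = {"left_eye": (-15, -20), "right_eye": (15, -20), "mouth": (0, 25)}

def matches_model(detections, model, tol=5.0):
    """Accept a candidate only if every part is present and lies within
    `tol` pixels of its modelled offset from the nose."""
    if "nose" not in detections:
        return False
    nose = np.array(detections["nose"], float)
    for part, offset in model.items():
        if part not in detections:
            return False
        found = np.array(detections[part], float) - nose
        if np.linalg.norm(found - np.array(offset, float)) > tol:
            return False
    return True

good = {"nose": (100, 100), "left_eye": (85, 80),
        "right_eye": (115, 80), "mouth": (100, 125)}
bad = dict(good, mouth=(160, 100))   # mouth in the wrong place: rejected
```

The same skeleton applies to a column model, with "parts" such as boundary edges and the concrete region between them.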
So here are the first steps that we followed in order to solve this problem, starting several years back. Our first attempt was basically a material recognition approach that tried to follow this same process. It can take any material sample that you give it, and using that sample it analyzes its texture into features that are bundled into a statistical vector. It then looks for those features inside a picture. The way it does that is that it first crops the potential infrastructure elements away from the background, such as the sky, and the foreground, the trees and the birds and so on, isolating the parts that are interesting for processing, and then compares the samples that you provided with the statistical vectors of the various areas to find those materials in the surfaces. Here is just an example of the software we created back then. This is a snapshot that shows that in this picture, what was identified was formwork, concrete, rebar, and wood, and this was done automatically.

Now of course this worked, but it was a rudimentary model-based recognition approach; it was very basic. If you look at that picture on the right, we only used a little bit of color, texture, and a few metric measurements. We were ignoring a lot of things, such as shape information, such as structure: the object's geometric properties, symmetry, convexity, and many other metric measurements. If they were considered, then instead of just being able to find concrete we would be able to find the concrete column; instead of just finding wood, we would be able to find the door. On top of that, what was also missing was the correlation of those features: where are the corners in relation to the center of the object, where is each feature in relation to the others. That would help getting the recognition completed.
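The sample-versus-region comparison just described can be sketched as follows: bundle each patch's texture into a statistical vector (here just a normalized intensity histogram, much simpler than the real system's texture features, which are not specified here) and assign the material whose sample vector is closest. All patches and material names are synthetic.

```python
import numpy as np

def texture_vector(patch, bins=8):
    """Bundle a patch's appearance into a statistical vector: a
    normalized intensity histogram (a stand-in for richer texture
    statistics)."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    return hist / hist.sum()

def best_material(sample_vectors, patch):
    """Compare a patch against the provided material samples and return
    the closest one (Euclidean distance between statistical vectors)."""
    v = texture_vector(patch)
    dists = {name: float(np.linalg.norm(v - sv))
             for name, sv in sample_vectors.items()}
    return min(dists, key=dists.get)

rng = np.random.default_rng(0)
concrete = rng.normal(128, 10, (32, 32)).clip(0, 255)   # mid-grey, smooth
wood = rng.normal(60, 25, (32, 32)).clip(0, 255)        # darker, rougher
samples = {"concrete": texture_vector(concrete),
           "wood": texture_vector(wood)}
label = best_material(samples, rng.normal(128, 10, (32, 32)).clip(0, 255))
```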
So what we're trying to do, what we need to do, is to formalize that representation strategy for patterns of infrastructure elements, automate their recognition using that representation, and then of course match those detected elements onto the 3D surface. This way we will not just have the visual recognition; we will also have the spatial recognition of that object. And how did we decide to do that? Our solution is at the top there: by creating visual pattern recognition models for every category of elements (not just every element, every category of elements), then devising image analysis approaches that will help us recognize them in any picture, and of course combining the recognized visual information with the 3D surfaces to recognize those elements. Those three steps are the three arrows up there that connect to the pattern recognition models and to the matching of elements on the 3D surface.

So how do we make those models, what's inside them, and what do they really mean? That's what this slide shows. First we have to identify the visually distinctive characteristics. If I want to differentiate, let's say, a human from a table, then I can recognize that the nose is a distinctive characteristic of the human that the table doesn't have. However, if I'm using, let's say, color, and the person is wearing a white jacket and the table is white, then this is not a distinctive feature. So first of all we have to find the distinctive characteristics of every object type that we're interested in. The second thing is, even if we do find them, we need to find how they can be represented. Then comes the second step, to identify the characteristics' representation features: we use image analysis to try to find every algorithm possible that could help us find each feature. And then, last but not least, once we can identify those features separately...
...we put them all together, together with their spatial correlation, to create the element model. So here are some of the tools that we use for the image analysis part: you see here image intensity, color channels, Fourier transformation, and so on, but I will not go into that direction; there is no point in discussing it at this stage, you'll see it in the examples. The other issue we have to look at is that we cannot just make one model to do everything. Every object category is at least relatively unique, and we are going to need different features to recognize, say, projectors versus tables; we cannot use the same features. So for every category of objects we have to create an individual approach, and each one is a lot of experimental work, a lot of work from the perspective of the person who has to get it done.

What we're proposing, to make this work faster, is to use hierarchical probabilistic models that are specifically targeted at these kinds of applications. Basically, you create a set of training images that highlight the features, the characteristics, that we want to recognize, and what those models can give us is the optimal parameters that we need to use in order to identify those features. This is what this slide is all about. Having done that, we can assemble our models using concepts for recognizing object classes from arbitrary poses. It's important that we can recognize, let's take again the example of the face, not just from this angle but also from this angle, from this angle, and from this angle, from any potential viewpoint, so that we have the same recognition capability everywhere. So we're using compact models of object categories that link together diagnostic parts, also known as the different unique views of an object from different viewing points, which creates a much more stable result.
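The idea of letting training images pick a model's optimal parameters can be shown with a deliberately tiny stand-in: search over candidate values of a single parameter (an intensity threshold) for the one that best reproduces hand-labelled masks. The real hierarchical probabilistic models do far more than this grid search; the images, masks, and candidates below are invented.

```python
import numpy as np

def learn_threshold(images, masks, candidates):
    """Pick the intensity threshold that best reproduces the labelled
    training masks -- a toy stand-in for learning optimal model
    parameters from training images."""
    def score(t):
        # fraction of pixels where (pixel < t) agrees with the label
        return sum(((img < t) == m).mean() for img, m in zip(images, masks))
    return max(candidates, key=score)

# Two tiny training "images" where the dark feature is labelled True.
img1 = np.array([[200, 40], [210, 35]])
m1 = np.array([[False, True], [False, True]])
img2 = np.array([[190, 50], [220, 30]])
m2 = np.array([[False, True], [False, True]])
best = learn_threshold([img1, img2], [m1, m2], candidates=[20, 100, 250])
```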
It allows you to circle around any complex object and still get the same recognition capability. Eventually, each of these models that we create is expected to recognize objects under the following constraints: from arbitrary poses, under arbitrary illumination conditions (very important), in highly cluttered environments such as, for example, a construction site, and of course with a lot of occlusion from different object parts.

So, that was enough of the high-level stuff; let's bring the plane down and discuss with concrete examples. Meaning concrete; it happens in this case that it actually means concrete, but anyway, the purpose is to show specific examples of how this theory can be applied for recognition. This is an example that we've done, and it works; let me show you. So let's take concrete columns. How can we make a pattern recognition model for a concrete column, so we can recognize it in any arbitrary picture? Well, first we have to look at the visual characteristics; now I'm following the slide from three or four slides before, where I was talking about how to make those models. The characteristics we identified as very distinctive were, first, the long vertical boundary edges of columns, and second, the uniform textured color pattern of the concrete, or any plaster that you put on it, between the boundaries of that object. So how can we identify those distinctive patterns; what image analysis tools can we use? Well, we can go into the image intensity to extract edge information; from the color channels we can normalize the colors against the effects of brightness and get the color information; and also from the image intensity we can apply certain filter banks to get the texture information. Putting together the line detection and the texture, we created the solution that you see on the right; on the left is the result, which you will see in a video. So we start with the video frames.
We detect the lines, and then we find the line pairs, because we need a pair of lines to define a column. Within those line pairs we look in between them to retrieve the texture that's inside, so that we can differentiate, let's say, a tree trunk from a column, because the texture is different. Having done that, we can recognize the material, and then we have, let's say, a complete representation of the object.

I have an example to show here, so let me just stop this. Let's go to the next example; this is something we can do live here. What we have is a laptop and a webcam, nothing fancy, and here we have a fake building. The way we try to fool the algorithm is that we've actually painted this plaster building with real concrete, so that when it looks at the texture, it finds that the texture is correct. Now the camera should be live. Let's see, there it is: you see over there that red rectangle, that is really the recognition. It happens in real time with a simple webcam, and of course, as you can see, it doesn't work all the time; this is why this student hasn't graduated yet. But it's getting there. We are trying to optimize those parameters to make the algorithm invariant to illumination and the potential issues it might create. Of course, we don't just need it to work in the lab; the target is to make it work in real life. In fact, you have a column here, shot with the light hitting the camera, flaring the camera, and we can still show it to you; this was actually shot like this. You will recognize where this video comes from: it's actually right outside of here, and it was shot just a couple of days ago. So the camera is handheld and the video is pretty much working quite well; again, it's not perfect.
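The line-pair-plus-texture recipe above can be sketched in a few lines: find strong vertical edges, pair them up, and accept a pair only if the region between them has the uniform texture concrete would have. This is a toy version on a synthetic image, with invented thresholds, not the real-time detector from the demo.

```python
import numpy as np

def find_column(img, edge_thresh=50, smooth_thresh=10):
    """Toy column detector: find a pair of strong vertical edges and
    accept the span between them only if its texture is uniform (low
    intensity spread), as concrete would be."""
    # mean horizontal gradient per column of pixels -> vertical edges
    grad = np.abs(np.diff(img.astype(float), axis=1)).mean(axis=0)
    edges = [i + 1 for i, g in enumerate(grad) if g > edge_thresh]
    for a in edges:
        for b in edges:
            if b - a >= 2 and img[:, a:b].std() < smooth_thresh:
                return (a, b)          # column spans these x-coordinates
    return None

# 6x8 scene: a uniform bright "column" between x=2 and x=6, dark elsewhere.
img = np.full((6, 8), 30.0)
img[:, 2:6] = 200.0
span = find_column(img)
```

A tree trunk between two edges would fail the uniformity test, which is exactly the texture check described above.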
There is still some flickering; we're trying to solve the flickering problem together with a professor from electrical engineering, but I think eventually we'll be able to get it to work. Let's go back to the presentation. The same concept doesn't just apply to big solid objects; you can also apply it to defects, to damage. It's really the same thing. So we've done, for example, some work on crack recognition, and I can show you the same approach through the example of cracks. So what are the visual characteristics? One is that we have a darker area inside the crack compared to the neighboring pixels, and the other is that cracks tend to have some linearity. That's how I would define a crack: if there was just a big hole, I wouldn't define it as a crack. So we use those two characteristics, and how do we get them? Well, from the image intensity we can get those dark points, and then we can apply a regional linearity measurement, a formula that tells us how linear a candidate region actually is, to differentiate it from regions that are not actual lines but rather squares or circles or other shapes. The result is what you see here. We have our original picture, we can get the edge pixels, and then using those we can get a crack map, using crack detection; this is something that people have done already. Beyond that, what we're doing is everything after that: with image thinning we get the skeleton of the crack, and then from that we get the bounding boxes that split the crack into five or six pieces and tell you here is one piece of the crack, there is another, and there is another, and so on.
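The two crack cues (dark pixels plus linearity) can be sketched with a simple linearity measure: the spread of the dark pixels' coordinates along their principal axes. A thin dark line is strongly elongated; a dark hole is not. This is an illustrative stand-in for the talk's regional linearity formula, which is not specified; the thresholds are invented.

```python
import numpy as np

def linearity(mask):
    """How line-like a set of dark pixels is: compare the principal
    axes of the pixel-coordinate spread (near 1.0 = elongated line)."""
    ys, xs = np.nonzero(mask)
    cov = np.cov(np.vstack([ys, xs]).astype(float))
    evals = np.sort(np.linalg.eigvalsh(cov))
    return 1.0 - evals[0] / evals[1]

def is_crack(img, dark_thresh=60, lin_thresh=0.9):
    """Dark region + high linearity = crack (a big dark blob is not)."""
    mask = img < dark_thresh
    return mask.any() and linearity(mask) > lin_thresh

crack_img = np.full((9, 9), 200.0)
for i in range(9):
    crack_img[i, i] = 10.0            # thin dark diagonal: a crack
hole_img = np.full((9, 9), 200.0)
hole_img[3:7, 3:7] = 10.0             # dark square: a hole, not a crack
```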
This can also tell us the average width, the maximum width, the length, and of course, if you put all of that information together with some structural loading information, then maybe you would be able to say, well, here is the condition of the element, here is how bad it is, automatically. This is our eventual target.

Another example, same philosophy: air pockets. This time we're looking at the little circles on the surface of concrete, as defects that for some reason might make, let's say, an architectural surface fail the specifications if the air pockets are too large. So instead of having a person going there and saying "visually it looks OK", we can find those things and measure exactly how bad the phenomenon is. We have here two characteristics: a darker circle compared to the surrounding area, and circles of varying size. So we're using three tools here: image intensity to get the dark points; blob filtering to find the circular regions, so that we can separate the air pockets from cracks; and image pyramids to find the actual size of each air pocket. And we can create distributions of how many air pockets you have of this size, that size, and so on, because of course if the air pockets are merely a millimeter you cannot see them, the visual effect is minimal, but if you have air pockets that are one or two centimeters large then you have an issue, then it might be something worth looking at. Here is just the result of that: you see the larger air pockets marked with larger circles, the smaller ones with smaller circles, indicating that not only were they found, but their sizes were found as well. This is just how that software works. You can see it was made by a Chinese student, since there is Chinese up there on the interface, so he is the only one who can use it. And I have one last slide after those examples, to talk about next steps.
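The air-pocket detection just described can be sketched as: find connected dark blobs, keep only the roughly round ones (rejecting crack-like shapes), and bin their sizes into a distribution. The roundness test and size measure below are simple stand-ins, invented for illustration, for the talk's blob filtering and image pyramids.

```python
import numpy as np
from collections import Counter

def blobs(mask):
    """Label connected dark regions with a simple flood fill."""
    mask = mask.copy()
    out = []
    for y, x in zip(*np.nonzero(mask)):
        if not mask[y, x]:
            continue                     # already absorbed into a blob
        stack, pix = [(y, x)], []
        mask[y, x] = False
        while stack:
            cy, cx = stack.pop()
            pix.append((cy, cx))
            for ny, nx in ((cy+1, cx), (cy-1, cx), (cy, cx+1), (cy, cx-1)):
                if 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1] \
                        and mask[ny, nx]:
                    mask[ny, nx] = False
                    stack.append((ny, nx))
        out.append(pix)
    return out

def pocket_sizes(img, dark_thresh=60, min_round=0.5):
    """Keep roughly round dark blobs (air pockets, not cracks) and
    report their equivalent diameters in pixels."""
    sizes = []
    for pix in blobs(img < dark_thresh):
        ys = [p[0] for p in pix]; xs = [p[1] for p in pix]
        h = max(ys) - min(ys) + 1; w = max(xs) - min(xs) + 1
        roundness = min(h, w) / max(h, w)   # 1.0 for a square-ish blob
        if roundness >= min_round:
            sizes.append(round(2 * (len(pix) / np.pi) ** 0.5))
    return Counter(sizes)

img = np.full((12, 12), 200.0)
img[2:4, 2:4] = 10.0      # small pocket (2x2)
img[6:10, 6:10] = 10.0    # large pocket (4x4)
img[0, 5:12] = 10.0       # thin crack-like line: rejected as not round
dist = pocket_sizes(img)
```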
This philosophy has been working well so far, and we plan to expand it to other objects, for example other structural members: beams, slabs, steel members. Initially that might not look like such a big problem; I often joke that this way we can do columns, and if we turn the camera sideways we can do beams. It's actually not that simple, because when you view a beam directly, perpendicular to its surface, you see something that's horizontal, but tilt your viewing point and it is no longer horizontal, so immediately you start having problems. When it comes to slabs, a slab is not an individual object in the view, and that creates more issues. So there are a lot of issues that we need to look at in order to be able to find these anyway. There are also other things: temporary structures such as formwork, mechanical equipment, construction materials (bricks, insulation panels, and so on), which are much better targets than what we can currently do; construction equipment; and also personnel, personnel in the sense of construction workers as opposed to pedestrians. So instead of doing recognition of people, we would try to do recognition of what distinguishes construction workers, for example the hard hat, or the uniform they're wearing, which makes them construction workers versus a pedestrian who might accidentally be in the camera's view. And with that I'm done, and I welcome any questions.

[Audience question, partially inaudible, about false positives such as air-conditioning units being confused with the target objects.] That's a very good question.
Well, the more we learn, the more scared we get. For example, people from computer vision are very pessimistic that we can do any of this, and the reason is that the deeper you look into the problem, the more you start identifying exactly all those problems. Related to what you said, a similar problem that we face is between cracks, air pockets, discoloration, and so on: many things existing on the same surface. How can we separate those things? How can we have one be recognized in the presence of the other, and have one not obstruct the other? Eventually what we realized is that there is no magic solution; the only solution is that you have to go after every target independently. So, for example, trees might have a distinctive pattern, so you can come up with an algorithm that identifies trees and takes them out of the picture. For example, with air pockets and cracks: we identify the air pockets first, and then we replace their surface with concrete. We inpaint that surface with regular concrete, and it's as if the air pockets never existed; what's left behind is the cracks, so it makes our crack recognition algorithms work better. So one approach would be: first erase the things that you want to get out of your scene, then look for the things that you want to find, and hopefully that's going to improve your recognition capability.

Now, about occlusion: occlusion you can sometimes resolve if you have a partial view that's distinctive enough; in that case you might use active contours or other methods that look at partial features. If you lose the object completely, for example with tracking (we face this problem with tracking, where a person might go behind a column, you don't see him, and then he appears again), the only solution is that you have to re-recognize the person.
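The erase-then-detect trick just described (remove detected air pockets so the crack detector no longer sees them) can be sketched as a tiny inpainting step: fill each pocket pixel with the median of the plain concrete around it. This is only a minimal illustration of the idea under invented data, not the actual inpainting method used.

```python
import numpy as np

def erase_pockets(img, pocket_mask, radius=2):
    """Replace detected air-pocket pixels with the median of the
    surrounding non-pocket pixels, so a later crack detector no longer
    sees the pockets."""
    out = img.astype(float).copy()
    ys, xs = np.nonzero(pocket_mask)
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - radius), min(img.shape[0], y + radius + 1)
        x0, x1 = max(0, x - radius), min(img.shape[1], x + radius + 1)
        patch = img[y0:y1, x0:x1][~pocket_mask[y0:y1, x0:x1]]
        out[y, x] = np.median(patch)    # fill with surrounding concrete
    return out

img = np.full((7, 7), 180.0)
img[3, 3] = 20.0                        # one dark air pocket
mask = img < 60
clean = erase_pockets(img, mask)        # pocket erased, surface uniform
```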
It's very hard to estimate that he has this trajectory and so I expect him to come out from that side at this time; it's actually easier to just go back and re-recognize the person from scratch. So it's very problem-dependent how you would approach this.

[Question, partially inaudible: could you get a similar result using neural networks?] We have done this work, and we have done it for patterns that we cannot easily describe ourselves, such as, for example, material recognition: when you give a sample to the algorithm for the first time, if there has been no concrete sample before, there is no human intuition in there that you can use to describe that sample perfectly. In those cases, yes, we need neural networks if we want to really increase our detection capability, and we've seen that it works. However, the reason we use the hierarchical probabilistic models is that these models take context into account. They don't just take into account the object itself; they look at other objects nearby that might be relevant to it. So, for example, it is common that a column is going to appear next to a beam, so if you see the one or the other, it is probable that next to a beam you're going to find a column. These kinds of features are where the probabilistic models help us improve our recognition capability; neural networks are better for more focused problems, where you just want to consider your own parameters and, based on those, get your specific result. Other questions? Yes. Very good question, and I have something to say on that. Initially we could not do that: glass is very transparent, and it's almost impossible to identify some way to represent it, especially with textures; you cannot do it. This is actually how we happened to come up with the idea of how to do this: we were looking at windows. Is there any way to recognize windows? What makes a window? A window at first is just an open gap. The way we figured it out is that we cannot do it with cameras.
We have to use some other sensor, and the sensor we found is basically the 3D camera, because when you point a 3D camera at a window, the signal becomes one hundred percent noise; it gets lost completely, while with any object next to the window the signal noise is very small. So if I were to open the shades here, use a 3D camera, and look at this surface, you would see on the computer that there is a smooth surface here; as soon as it hit the window, the signal would be jumping up and down like crazy. This craziness can help us identify that this is a window. An open window is not a problem either, because the signal passes through and comes back from another object, so we see again a static surface. This is something very fresh that we haven't really looked into in more detail; hopefully within a year I'll have more results.

Yes? Well, of course, we have considered infrared, which can be very cheap, but once you go into other technologies (we considered even radar technologies; these are all just signals in the end), the solutions we try to achieve, at least from my point of view, target low cost. Solutions like this: this is less than twenty dollars, you connect it to a laptop, you get column recognition. If I were to do the same with radar, how much would I be spending? Especially since on most of our construction sites one sensor is not enough: you need maybe one hundred cameras, two hundred cameras. You need those sensors to be really cheap to make it something that seems viable, something people would actually like to use. So a sensor might be great, but it might be too expensive; this is the kind of thing that we're looking at.

Yes? Really, this is something that has been explored in computer vision: basically, scene recognition algorithms.
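Going back for a moment to the window idea above: the "signal turns to noise on glass" cue can be sketched as a per-region variance test on a depth image, with stable depths marking solid surfaces and wildly jumping depths marking glass. The depth values, noise levels, and threshold below are all synthetic, invented only to illustrate the principle.

```python
import numpy as np

def find_window_columns(depth, noise_thresh=0.5):
    """Flag vertical strips of a depth image whose readings jump
    wildly: high per-strip variance marks glass, where the 3D camera's
    signal turns to noise, while solid surfaces return stable depths."""
    return depth.std(axis=0) > noise_thresh

rng = np.random.default_rng(1)
depth = np.full((20, 6), 3.0)                    # wall ~3 m away
depth += rng.normal(0, 0.01, depth.shape)        # small sensor noise
depth[:, 2:4] = rng.uniform(0.5, 10.0, (20, 2))  # glass: pure noise
mask = find_window_columns(depth)                # True where the glass is
```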
I'm not sure if this would be an approach that could be applied there; to my understanding the scene recognition algorithms are already doing a good job, so I wouldn't know how this would actually help them. It could be possible, but I've never tried it, so I can't tell you whether it would work. Yes. For example, with algorithms of that type you can even find online some demos that show a person driving a car, and as he drives through the city the camera tells the driver what kind of scene he's looking at: is it an urban scene, a suburban scene, a rural scene, again looking at the surroundings as a whole, the entire scene. I'm assuming the same thing could be used for recognizing spaces, for the same problem. A problem that looks even more difficult to solve is action recognition: how can we recognize what a person is doing, or what a piece of equipment is doing? This can be very tricky, and we've seen some simplistic solutions. I reviewed a paper recently (I shouldn't be talking about it, but just to give you an example) where someone was proposing an algorithm that would recognize when a person bends down, as a way to measure the productivity of iron workers who have to bend down to tie the rebar. And my comment to that person was: what if that person is bending down to tie his shoe laces? Is that the same thing? How do you differentiate the two? You have to find something more distinctive than simply bending up and down as a way to measure the productivity of the worker.

[Audience question, largely inaudible.]
I think you're bringing up an interesting point, and the way that I view it (something I should have a slide on somewhere, and I didn't) is that the objects we're looking for are what I would classify as frequent objects: those where it would make sense to go through the trouble of building a recognition model, because you would need it so often as to justify the effort. What we've seen, for example, in buildings is that basically twenty percent of the object types account for eighty percent of the instances, because they are the most frequent ones; everything else is a specialty item. So our target is that twenty percent, because if you can do that twenty percent automatically, eighty percent of the work is done automatically, and then people can do the remaining specialty items manually, because we know that there is no chance we can do everything. Objects that are very frequent and very easy to find, like the example of concrete columns, are the things we should be looking at first. Sure, maybe later we can move on to specialty items, where our work would give you some recognition, but it would not be perfect. If you have more samples, you increase the variety of potential appearances and of course your capability, because you widen the spectrum of possibilities. That is true because, if you want the computer to find something for you, you have to tell it how it looks. That's really the bottom line of this.