[00:00:05] >> It's a pleasure to have with us Professor Ayse Coskun from Boston University. Professor Coskun was invited to be a part of this by Professor [unclear] here at Georgia Tech, so thanks to him for that connection. Just a little bit about her background: she has a bachelor's degree in microelectronics engineering from Sabanci University in Turkey, if I pronounced that correctly, and a master's and PhD in computer science and engineering from UC San Diego, during which time she also interned at Sun Microsystems. She moved to Boston University in 2009 and is currently an associate professor in the Department of Electrical and Computer Engineering. She's an associate editor of IEEE Transactions on Computer-Aided Design as well as Transactions on Computers and, among many awards, won an NSF CAREER award in 2012. I'm going to hand it over to her to begin her presentation. Thank you for joining us today.
>> Thank you for the introduction, and thanks for inviting me. As was said, I'm an associate professor at Boston University. The work I'm going to present has been a collaboration with my colleague Professor Ajay Joshi, who is also at BU, and also our PhD students, of course. We have been collaborating with CEA-Leti in France, and some of the group has been collaborating with folks at [unclear]. So today I'm going to talk about how to use 2.5D-integrated systems with silicon-photonic networks as a way to build energy-efficient systems, and the key approach, or vision, we have for this is to do cross-layer optimization, and I'm going to specify what that means. [00:02:00] Obviously the trend has been that we keep integrating more cores into our processors; this has been going on for several decades now. And while it's fun, it's not easy to build
large chips with hundreds of cores, potentially. We've seen this trend especially in the last decade or so, going from a few cores to tens of cores, and now I'm talking about potentially hundreds of cores. Now, one way to build [00:02:34] many-core systems in a more practical way, a more sustainable way in some ways, is to go with a chiplet-based approach: basically, as opposed to putting all the cores in one chip, one could design a bunch of chiplets and then integrate all the chiplets together. People might be familiar with the DARPA CHIPS program that started a few years ago, and there are other programs, of course, along similar lines. That research program, for instance, was about designing a whole bunch of standard interfaces for making this chiplet-based integration easier. Now, this is basically what's happening, and I believe this is one way to [00:03:19] continue the efficiency trend and reclaim some of the dark silicon that has been an ongoing problem. Basically, we can put a lot of silicon out there, we can put in heterogeneous accelerators and so on, but keeping them all on and maintaining the efficiency targets that we want has been a primary challenge. Now, what can 2.5D integration bring to the table? 2.5D integration can mean a whole bunch of things, of course, but broadly it consists of having some sort of planar interposer and a whole bunch of chiplets integrated on top of it. [00:03:56] This gives higher yield because, as opposed to building one big chip, you can make smaller chiplets and have a more robust integration process, and cost-wise it can be advantageous in some cases. It also provides an easier way to put heterogeneous technologies together: logic, memory, silicon photonics and others. The picture is from IBM,
[00:04:21] from some time ago, actually, but the vision is still there: you can put together different technologies, memory, processors, all together in a system like that. Another major advantage of 2.5D is regarding thermals. 3D integration increases the amount of heat generated in the same footprint, obviously; 2.5D doesn't have the problem of stacking many layers on top of each other, so thermal-wise it's more manageable, and there are some products, of course, as of today. Now then, is this the obvious path, that you switch to 2.5D chiplet-based design and get lower-cost and energy-efficient chips? We hope so, but we believe there are some important challenges to be resolved to realize this vision. The questions we've been asking in the last few years in our project: one of them is, is there a need for a new interposer network technology, and how do we build this with silicon photonics in particular; and then, how do things like thermal sensitivity impact the overall system. For instance, [00:05:34] the interposer can have a silicon-photonic network that can connect a bunch of chiplets at high speed, which is an advantage, but the microring resonators are sensitive to temperature, so if we go and solve this problem, that adds to the power, of course, because the resonances need to be maintained. So overall, [00:05:57] the end goal should be having an energy-efficient system, not just a 2.5D one. We believe these challenges need to be solved to be able to realize that.
This is some background, because, well, many of the attendees might be experts in device-level technology. I'm not a device person, so that's the disclaimer, but if you have device-level questions feel free to shoot them and I will make sure to answer them in the best way possible, by getting back to you offline as well if needed. But in case some people are not working on silicon-photonic technology: [00:06:36] for a very basic silicon-photonic link, we are talking about a laser source that could be on or off the chip; the data is put in through the driver and modulator; the light travels along the waveguide and propagates to the receiver side, where another microring acts as a filter, essentially; it goes to the photodetector and is amplified; and of course the electrical-optical conversion circuitry has to be there to turn this back into an electrical signal. Now, at a high level, what are the advantages of using silicon-photonic technology? It provides a lot higher bandwidth density, and it can provide lower long-distance communication latency. Especially when thinking about building a big chip using 2.5D technology, with a whole bunch of chiplets connected through an interposer, lower long-distance communication latency becomes an advantage. And the data-dependent energy consumption is [00:07:38] much lower, or negligible, when compared to electrical links. Now, there are some disadvantages, or challenges, that come along with silicon photonics: the devices are sensitive to thermal variations as well as process variations. What this means is, well, we live with variations in electrical circuits too, but here it means that it's going to require higher thermal tuning power if you have a lot of variations,
[00:08:08] because at the end of the day the communicating rings need to resonate at the same frequency, so some of these devices need to be heated up. It's the same thing with process variations: one has to take care of process variations so that communication is robust in the system, and as a result thermal tuning power can grow, especially for large chips, if not taken care of. Other challenges include high optical losses, especially if there are longer waveguides or lots of turns and crossovers, et cetera, and the laser source efficiency remains low. [00:08:48] We're talking about wall-plug efficiencies of around 15 percent for the lasers, I believe. If efficiency is a problem then you have to increase the laser source power, but that adds to the overall power consumption of the system. So if the end goal is getting the best performance per watt out of the system as a whole, we need to consider all of these things together. One could design silicon-photonic NoCs (NoC here stands for network-on-chip) in several different ways: monolithic integration, 2.5D, or 3D, and there are pros and cons with each of these technologies. Like I mentioned, in this project we have been focusing mostly on 2.5D, [00:09:37] which we see as a more practical way of integrating things at a reasonable cost, and we believe this is quite impactful as of now. But if you have thoughts on monolithic or 3D having certain advantages I would love to hear them, particularly with silicon-photonics experts here. We are focusing on 2.5D, but many of the things we're doing could apply to monolithic and 3D as well. [00:10:06] And as I mentioned at the beginning, our key
vision is to look at the system as a whole and do system-level optimization to help achieve these energy-efficiency improvements. And what does system level mean? Basically, a system has many layers. It has the devices, [00:10:28] which are basically what today's technology provides; people build circuits with them; people build architectures on top of these circuits; there's system software operating on top of these; and then there are applications running, user applications that are meant to leverage these resources. Now, much of the earlier work, when you look into silicon-photonic networks or chips or 2.5D or 3D integration, looks into one or maybe two of these layers to optimize something there. Our work is advocating basically modeling all of this together, so that you can see what happens to, for instance, the thermal sensitivity if you run a particular application, or what happens to your overall energy efficiency if you spend so much power on thermal tuning. We believe that by looking into the system as a whole one can understand things better, but also potentially improve on some aspects without leaving optimization opportunities on the table. So we want to actually achieve more by having this type of system-level view. Specifically, we have done several different things. We looked into design optimization for photonic NoCs; the first of these efforts was on a chip, optimizing the photonic NoC layout on the chip with a system-level view, basically looking all the way from the application layer down to the device and figuring out what kind of power profiles you have and how to optimize accordingly. [00:12:06] Similarly, we looked into cross-layer optimization of NoCs in 2.5D systems, and we have done work on system management aspects, basically how you would manage your runtime system, for instance how to place an application's threads on different cores to help with the thermal sensitivity issues. And lastly,
we've done a bunch of work on power optimization, basically how to manage the power consumption associated with the network so that the overall energy-efficiency gain is higher. In this talk I'll mostly talk about the power optimization, and then I'll come back and touch upon the other aspects that I just mentioned. [00:12:54] Well, a key prerequisite for system-level optimization is to have a system-level, or cross-stack, model that can work with a variety of systems, architectures and technology assumptions. At its core, we are interested in looking at the impact on the microrings, in this instance. As I mentioned briefly before, they are sensitive to thermal and process variations, so we need to model these impacts, and we need to model how much heating power is required to thermally tune these microrings, and the overall energy-efficiency impact. So we basically need to have the models for this, but we need to look at these models while we are actually simulating some application running on the system. We need to look into the actual impact: one could have different designs resulting in different data rates, varying waveguide lengths, different numbers of wavelengths, different numbers of microrings in the topologies, so this has to be taken into account in the simulation framework. And like I mentioned, we need to have an idea of what kind of applications are running on the system, because knowing the architecture is not sufficient to have a good idea of the power profile. You might take a guess, say, that for high-end servers there's an average power consumption, but even in a single server to begin with, and more so as the system gets larger, the workload space is diverse in terms of power consumption; it varies a lot. So we need to have an idea of the applications and their behavior.
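As a first-order sketch of the kind of microring model described above (an illustration under stated assumptions, not the speaker's actual model), the resonance wavelength of ring $i$ can be written as a nominal value plus a process-variation offset plus a term linear in temperature, and the heater power as the remaining gap to a common target divided by a tuning efficiency. The symbols $\lambda_0$, $\Delta\lambda_{\mathrm{PV},i}$, $\partial\lambda/\partial T$, $T_{\mathrm{ref}}$, $\lambda_{\mathrm{target}}$ and $\eta_{\mathrm{tune}}$ are hypothetical names introduced here, and the linear dependence on temperature is an assumption.

```latex
\begin{align}
\lambda_i &= \lambda_0 + \Delta\lambda_{\mathrm{PV},i}
            + \frac{\partial \lambda}{\partial T}\,\bigl(T_i - T_{\mathrm{ref}}\bigr) \\
P_{\mathrm{tune},i} &= \frac{\lambda_{\mathrm{target}} - \lambda_i}{\eta_{\mathrm{tune}}},
  \qquad \lambda_{\mathrm{target}} \ge \max_i \lambda_i \\
P_{\mathrm{tune}} &= \sum_i P_{\mathrm{tune},i}
\end{align}
```

The constraint on $\lambda_{\mathrm{target}}$ reflects that a heater can only warm a ring, so under this sketch every ring is tuned toward the most red-shifted one; the total grows with both the spread of process offsets and the spread of temperatures across the rings.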
[00:14:30] Now, at a high level again, the simulation framework will have some models regarding devices, architectures and systems, as you see in the middle. The thermal sensitivity models and the control circuitry need to be plugged in, and similarly the architectural assumptions and the like. The inputs would be the [unclear] profiles, the architectures that we are tweaking, the applications that we want to run, and other system settings. The outputs we're interested in getting out of such a modeling framework are system performance along with overall system power, but also specifically the power and thermal properties. As you know, power and thermal profiles are interrelated, so depending on the thermal profile of the system there's going to be different power consumption because of the thermal tuning power. And like I mentioned, power and thermal are generally interrelated anyway; even if you look into a regular electronic system you need to look at both together because of leakage power, which is temperature dependent. So this is the overall simulation framework at a high level; I'll revisit it with some more details later on. [00:15:41] So I keep mentioning thermal tuning; what happens with thermal tuning? The microrings are designed with a particular intended resonance. There is a shift due to process variations, and similarly there's a shift from this frequency due to temperature. PV here shows process variation; TV here shows temperature variation. [00:16:09] So then the idea is that, OK, these variations will happen, but then
we will move this resonance frequency to a particular level where the system can still communicate reliably. Now, this is not new; people have been designing this: basically you put in some sort of a heater and then, using Joule heating effects, you provide some current and heat up the ring. Our colleagues at Leti actually designed a nice analog control circuit that has a feedback loop: it compares the photocurrent with a reference current and adjusts the heater accordingly. They demonstrated this at ISSCC a couple of years ago; you'll find the details in that paper, and I can share details if there's interest as well. So this obviously has some power overhead, like I mentioned, because of the additional current that needs to be put in to [00:17:19] heat up this device. Now, maybe one thing that we can discuss: in our discussions with many colleagues designing the devices, it's also possible to design devices that are athermal, or less sensitive to temperature, but then there are other issues. [00:17:40] Actually, having the devices react to temperature is not a bad thing, because process variations will happen one way or the other, so thermal tuning is also a mechanism to alleviate those. [unclear] So maybe it's good to consider that. In our work we leverage this specific analog control circuit, but there are other examples out there. [00:18:14] On the architecture: the overall system we have been working with is called POPSTAR. It's a photonic-interposer-based many-core system originally developed at CEA-Leti in France, and one of my PhD students contributed to
the design and optimization of this particular system during a six-month stay in France. [00:18:44] In this architecture there are compute chiplets, a whole bunch of them, and there are [unclear] as well. Underneath the compute chiplets there's a silicon-photonic interposer, and there's a main waveguide going across the system, and the microrings are organized in groups to enable communication across the chiplets. The laser source is off the chip in this case. [00:19:14] Again, there are more details available in our recent publications on this POPSTAR architecture. So here is a more detailed version of the simulation framework, and we are happy to share parts of this if people want to do similar simulations. We intend to release the part that computes the heating, the thermal tuning power; that's probably the easiest. We can share some of the other pieces if there's interest; just let me know by email what your project is and what you would want to do with it. On the left side you see the architectural simulator, which we modified for the purposes of this work; if you have a different architectural simulator you can plug it in. Basically what you need is to simulate a system and run an application on it; you can try out different bandwidth combinations, different core frequency settings, and some other parameters. [00:20:21] The simulator should also know something about the wavelengths, or bandwidth, you have, because that's going to determine how much bandwidth you can get from the system, and that's going to impact your performance overall. And then there's a network power model
which looks into both the electronics and the laser power, and the logic power, that is, the cores, caches, et cetera; these come from tools such as McPAT for the types of cores and caches that we are using. And then the top box, the red box, [00:20:57] has HotSpot 3D, which is a 3D extension of the commonly used thermal simulator HotSpot that my group did some time ago. This is used to model the floorplans and layouts, with all the material properties and such, so that it can [00:21:16] provide the resulting temperatures, and you can choose the granularity if you want to play with it. In this case we want to know, for instance, what the temperature is at a microring. Once you know the ring temperature there's a lot more you can do: you can look into, OK, this is how much thermal tuning power I need to apply to get the rings to the same temperature; you can say, if my allocation changes, this is how the temperature changes and this is how much heater power I need, and so on and so forth. This model should also be aware of laser efficiency, process variation and other parameters that might impact your system, and overall you then get the heating power. [00:22:01] So you see, on the left and in the middle, electronics and laser power, heater power, and all of these are combined to get the overall system power. And now, what do we do with it? We have a simulator that we can play with; we can run a many-core system that's built with this chiplet approach. [00:22:21] In this case, by the way, we experimented initially with six chiplets, each of which had 16 cores, I believe, though the original design at Leti was looking into variants, and in some papers we varied the number of cores and chiplets and the like.
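The bookkeeping described above (electronics, laser and heater power combined into one system total) can be sketched roughly as follows. This is a toy illustration, not the speaker's framework: the class name, the per-kelvin heater cost, the example temperatures and the power numbers are all hypothetical, the heater term simply brings every ring up to the hottest ring's temperature, and process variation is ignored.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PhotonicPowerSketch:
    """Toy power bookkeeping; every parameter here is a hypothetical placeholder."""

    heater_mw_per_kelvin: float        # assumed heater cost to warm one ring by 1 K
    laser_wall_plug_efficiency: float  # assumed electrical-to-optical efficiency, in (0, 1]

    def heater_power_mw(self, ring_temps_c: Sequence[float]) -> float:
        # Heaters can only warm a ring, so bring every ring up to the hottest one.
        target = max(ring_temps_c)
        return sum((target - t) * self.heater_mw_per_kelvin for t in ring_temps_c)

    def laser_power_mw(self, optical_mw_per_wavelength: float, n_wavelengths: int) -> float:
        # Electrical laser power = required optical power / wall-plug efficiency.
        return optical_mw_per_wavelength * n_wavelengths / self.laser_wall_plug_efficiency

    def total_power_mw(
        self,
        electronics_mw: float,
        ring_temps_c: Sequence[float],
        optical_mw_per_wavelength: float,
        n_wavelengths: int,
    ) -> float:
        # Overall system power = electronics + heater (thermal tuning) + laser.
        return (
            electronics_mw
            + self.heater_power_mw(ring_temps_c)
            + self.laser_power_mw(optical_mw_per_wavelength, n_wavelengths)
        )


if __name__ == "__main__":
    model = PhotonicPowerSketch(heater_mw_per_kelvin=0.5, laser_wall_plug_efficiency=0.15)
    temps = [60.0, 62.0, 65.0]  # made-up ring temperatures, as a thermal simulator might report
    print("heater power (mW):", model.heater_power_mw(temps))
    print("total power (mW):", model.total_power_mw(1000.0, temps, 1.5, 4))
```

In a setup like the one described, the ring temperatures would come from the thermal simulation and the electronics power from the architectural power models; here they are plain inputs so that the dependence of heater power on the temperature spread, and of laser power on the number of active wavelengths, is easy to see.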
[00:22:47] So one key observation that we had, once we had the simulator, was that when you're running different types of applications on your many-core system, you get different benefits out of having more wavelengths. Basically, having more lambdas means that you can have more bandwidth, [00:23:04] and some applications benefit from this more. If you look at this plot on the left, you see that for some applications execution time drops substantially as you increase the number of active lambdas, and some other applications, like the green line and a few others, see a milder effect. The reason for this is the application properties: some applications basically use the network more and some applications use it less, and an application may actually have some variations within itself, which I'll talk about. Now, one would think, well, design a system for the applications that actually need the network, but overall we design systems that are not going to run just one application, and there will be some amount of diversity. The observation here is that if you provide more bandwidth in the system, some applications will benefit from it; for others it will be wasted power. And if you provide more bandwidth then the power increases, so what one can do is exploit this observation.
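One simple way to act on the observation above is to pick, per application, the smallest number of active wavelengths whose execution time stays within a tolerated loss of the best case. The sketch below illustrates only that selection rule; it is not the speaker's actual policy, it ignores thermal and process variation, and the function name and the profile numbers are made up for illustration.

```python
from typing import Dict


def select_wavelengths(exec_time_by_lambdas: Dict[int, float], max_perf_loss: float) -> int:
    """Return the fewest active wavelengths whose execution time is within
    `max_perf_loss` (e.g. 0.05 for 5 percent) of the best profiled time."""
    if not exec_time_by_lambdas:
        raise ValueError("need at least one profiled configuration")
    best_time = min(exec_time_by_lambdas.values())
    limit = best_time * (1.0 + max_perf_loss)
    # Scan from the fewest wavelengths upward and stop at the first acceptable one.
    for n_lambdas in sorted(exec_time_by_lambdas):
        if exec_time_by_lambdas[n_lambdas] <= limit:
            return n_lambdas
    # Unreachable: the configuration that achieves best_time always satisfies the limit.
    raise AssertionError("no configuration met the limit")


if __name__ == "__main__":
    # Hypothetical normalized execution times, keyed by number of active wavelengths.
    comm_heavy = {1: 2.0, 2: 1.4, 4: 1.1, 8: 1.0}
    comm_light = {1: 1.03, 2: 1.005, 4: 1.0, 8: 1.0}
    for name, profile in (("comm_heavy", comm_heavy), ("comm_light", comm_light)):
        for loss in (0.01, 0.05, 0.15):
            print(name, loss, select_wavelengths(profile, loss))
```

With these made-up profiles, the communication-light application can drop to one or two wavelengths even under a tight budget, while the communication-heavy one needs most of them, which mirrors the qualitative behavior described in the talk.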
[00:24:13] That is, provide just enough bandwidth to each application to cut down some of the network power. On the right side you see the power increase as the active lambdas increase from one to six for the system. Electrical power in some cases increases with higher performance and in some cases is more stable, but thermal tuning power increases in all of the cases, as you see, as the temperature increases. So to exploit this, we designed a policy called WAVES, coming from wavelength selection, and the purpose here is to identify the minimum number of wavelengths, or laser channels, that satisfies the bandwidth requirements given a performance threshold. For instance, I can say, ideally this is the highest performance one could get, and I want at most one percent lower than that; or you can say, I'm less strict about the performance, I can accept maybe 10 percent lower than the ideal performance, and analyze what that means in terms of the overall energy efficiency. [00:25:16] And this wavelength selection has to account for thermal and process variations as well, because of the reasons we discussed before: basically, as the temperature changes and as the process variations change, they will impact the resonant frequency of the rings, and the selection will need to cope with the heating power and the like. Let me check the time, so I'm just going to quickly... [00:25:48] I have about another 20 minutes or so? OK, perfect. So the idea here is going to be running WAVES to select the
appropriate number of wavelengths while accounting for this overall system-level view and the associated thermal variations and process variations. How we did that initially was, OK, we can profile the applications and see which number of wavelengths works best for them from this energy-efficiency perspective. Here the power is normalized to the case where you activate all the wavelengths, and you see several different choices. For instance, one of them allows just one percent loss in performance, which is the blue and light blue bars; another allows 5 percent degradation; and the red and orange bars are a little bit more flexible on the performance budget, with like 10 percent performance loss allowed. [00:26:54] The applications are run with 40 threads, and, I forgot to say before, these applications are from the SPLASH and PARSEC benchmark suites that are popular in the architecture and design automation communities. What we see here is that [00:27:16] we have high power savings for applications that are not really communication intensive, and the communication here is basically going to be between cores and caches. So the more parallel an application is, the less communication it's going to require; blackscholes is an example of that, the finance application that some of you may know of. If the application doesn't require a lot of communication you can save more, because intuitively you can turn off more lambdas and use less of the capabilities that you have in the silicon-photonic network. That's kind of obvious, but I think it's important to see. [00:27:56] Also, obviously, if you have a higher allowance for performance loss you can actually save more power and get more efficiency.
The two bars for each case show two options: one is finding the best option, and the other is basically when, during thermal tuning, the rings are pushed to have the same [unclear]; we can discuss more about that if needed. So basically, if you have an optimal wavelength selection that considers the PV and TV, you can save more, all while meeting this [00:28:38] performance threshold. But keep in mind that this approach requires knowing your applications and profiling them in advance to decide what would be the best number of wavelengths to provide for each application. Now, one other thing is that while PARSEC and SPLASH types of applications are popular in the community, because they are open-source benchmarks and many people use them, they're not necessarily very network intensive, [00:29:08] and they weren't originally designed for the purpose of stressing a communication network either. So we looked at other applications, particularly graph applications, which are emerging in a number of different fields today. [00:29:29] We basically also used a benchmark suite of high-performance computing applications, and we increased the amount of bandwidth available in the system, and we saw that the fraction of the time spent on memory accesses actually dropped substantially, because these applications are communication intensive. So for these types of applications, if you were to apply a technique like WAVES, basically selecting only the number of [00:29:53] lambdas that is needed, the savings could potentially be higher. That was the observation, and that's why we looked into graph applications in our paper earlier this year. For instance, if you look at the graph applications, the figure here is showing normalized power,
[00:30:14] compared, again, to having all the wavelengths activated. There's the laser power, the electrical-optical conversion power, which we also compute, and the thermal tuning power. There are three different applications, for instance breadth-first search, et cetera; these are graph applications. And then there are different datasets, because the results change depending on the dataset. One observation here is that the thermal tuning power, compared to the plot I was showing with the PARSEC and SPLASH benchmarks, is a much larger fraction of the overall power. [00:30:49] And then for smaller graphs you can actually get by with a lower number of lambdas, and as your system goes through different graph sizes you can adjust this. Overall we reached over 35 percent savings in power for these applications. This is important because whenever we look into a many-core system, with something like hundreds of cores, integrated with a new technology, it's also important to think about what applications are going to benefit from this type of system, or what applications real people need to run on these systems in the near future. And graph applications are really important in many domains today, as I was saying, from search to some of the AI applications and such. So it was promising to see these results. Now, one other thing is that while