[00:00:10] >> OK, great. So as Mary so kindly introduced, I am a research scientist at GTRI. My research interest as a PhD student involves software transformation, and specifically how we can apply it to cybersecurity. So today I'm going to talk about a topic that I've been researching over about the last six months: how compiler-based performance optimizations impact security. [00:00:35] So, a bit of an introduction. When modern compilers are designed, they're really focused on two goals. They want to make sure they accurately and correctly translate the source code into machine code in a way that preserves the functional semantics, and they want the code to be performant: they want to produce the most optimal version of the binary they can within a given compilation time, so that every time this program is executed afterwards it runs as fast as possible. One thing is notably absent from this list, and that's cybersecurity. As a result, over the last few years there's been some research highlighting a number of security weaknesses that are actually being introduced by compilers. [00:01:14] So in this talk I'm going to explore some recent research on this topic and then present the results of a study that we recently did on how compiler optimizations impact code reuse attacks. First, we'll go through a little background for anybody who's not terribly familiar with how compilers work. Compilers work in multiple stages, and the first stage, at the very top, is commonly referred to as the front end. Let's see if I can get this thing to work here. The front end, which is roughly the top five blocks, is where the compiler takes the source code, performs parsing and semantic checks on things like types, and prepares the data structures necessary to lower the source code to what's called intermediate representation, or IR. [00:02:01] When the code is in this IR
form, it looks pretty similar to assembly code, if you've ever seen that. The reason it's lowered to an IR, specifically an IR that's machine independent, is that this code is not designed to actually run on an ARM or an x86 processor. It's intentionally left machine independent because the compiler's middle end, the middle three boxes, will perform a series of software analyses and transformations to try to improve the performance of that code. [00:02:29] Finally, in the back end of the compiler, which is the bottom two boxes, the intermediate representation, which is already fairly close to machine code but not quite, is fed to a code generator that translates it for a specific platform and then produces the binary executable. So let's talk about what some of these transformations might look like, with a relatively contrived but still useful example. [00:02:54] Let's take a look at this function foo in the leftmost box. We'll notice that it defines a few variables a, b, c, and d, performs a check on variable c, performs an action on variable d based on that conditional check, and then returns the result. A common compiler optimization called constant propagation can significantly simplify and reduce the size of this code. What it will do is notice that variables a, b, and d are statically provided, meaning they're known at compile time, and it can track the definition of each variable to all of its uses and identify opportunities to replace the use of the variable with the constant itself. [00:03:37] So as we can see here in the second block, rather than set c equal to one plus b, it's rewritten that to be one plus two: it's replaced the variable b with its value at that time, which is two. It's also made a couple of replacements in the other code as well, but I'm trying not to get too deep into this.
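The transformation being walked through here can be sketched in executable form. This is a reconstruction in Python, not the talk's actual slide: the concrete values of a, b, and d are assumptions for illustration. The key property is that the original function and the fully constant-propagated (and dead-code-eliminated) version agree on every input:

```python
# Hypothetical reconstruction of foo -- the slide's actual constants
# aren't reproduced in the transcript, so these values are assumptions.
def foo_original(x):
    a = 5          # known at compile time
    b = 2          # known at compile time
    c = 1 + b      # constant propagation rewrites this as 1 + 2
    d = a
    if c > 0:      # 3 > 0 is statically true, so the branch can go away
        d = d + x
    return d

def foo_optimized(x):
    # After constant propagation (and the dead code elimination it feeds),
    # the whole body collapses to a single expression.
    return 5 + x

# The optimizer must preserve functional semantics:
for x in (-10, 0, 7):
    assert foo_original(x) == foo_optimized(x)
```

The assertions are the point: an optimization is only legal if no input can distinguish the two versions.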
[00:03:57] In the third block we'll now see that in function foo we've replaced the variable c with a static evaluation of the expression one plus two. There is no reason to pass this computation on to the actual binary to be computed every single time the program is run, because we can do it now, while we're compiling the code. Similarly, now that c has become a constant, we're able to propagate that constant forward, so the conditional check becomes static and can be computed at compile time: three is greater than zero, which ultimately allows us to eliminate the conditional branch [00:04:29] and rewrite this function as a single statement that reflects the fact that these individual constants have been propagated through. The actual removal of code is another type of software optimization called dead code elimination, and it commonly feeds off information provided to it by constant propagation passes. So there's a pretty obvious and clear benefit in this contrived example to using these optimizations: we reduce the amount of code that we need to represent in the actual binary, and we also significantly speed up the code. We don't have to perform any memory operations to put values in memory; we're now simply performing a single calculation on the input parameter. [00:05:14] Modern compilers implement dozens of these types of optimizations at varying levels of complexity. So to simplify the process of selecting which optimizations to execute, compilers expose optimization levels: single flags you can select that correspond to groups of these optimizations. [00:05:33] For example, GCC defines its optimization levels as follows. At optimization level 0 it performs no optimizations. This is actually the default, which recognizes the fact that
the people who use compilers the most are the people actually writing the code, and they're going to build the code many, many times in order to test it and make sure it actually works. So rather than spend a bunch of time, and increase the compilation time, optimizing the code, it just emits code that is semantically correct and immediately testable. [00:06:03] When this code is actually moved to production, typically one of the optimization levels O1, O2, or O3 is selected. Optimization level 1 enables optimizations that take relatively little incremental compilation time, things like constant propagation and dead code elimination. Optimization level 2 enables a significant number of additional optimizations on top of level 1's. [00:06:27] The optimizations it adds are known to increase compilation time by quite a bit, by a significant amount, something you would notice, but it doesn't enable optimizations that perform what we call space-for-speed tradeoffs. That's really a fancy way of saying there are a number of ways you can improve code by duplicating it and by making specialized versions of it, but they trade increased code size for speed, as opposed to improving both like we saw in the previous example. And optimization level 3 is simply optimization level 2 with those tradeoff optimizations added in. [00:07:02] So as I mentioned before, a significant amount of research goes into compilers and into proving that they're correct; typically this is done very formally. The problem is, research has shown that a number of security weaknesses get added into programs as a result of the compiler, because the compiler is focused on the program model, not the actual machine model. [00:07:27] There's been recent work by
D'Silva and co-authors who have studied this situation, and they call this the correctness-security gap. Ultimately, compilers are going to have to get a lot smarter about how they model the machine, rather than just how they model the actual program being compiled, to be able to understand and detect when these guarantees have been violated. [00:07:47] So let's take a look at some examples of these security weaknesses. They typically fall into one of three categories: the elimination of security-critical code, the leakage of information, and the introduction of side channels. Let's first take a look at the elimination of security-critical code, with this security-sensitive function cryptically named crypt. [00:08:09] We'll see that it receives the key, we can assume through some secure channel, and it does some work with this key. But at the very end of the function, it doesn't want to keep this key in memory, so it overwrites it with zeros and wipes this particular portion of memory. Well, the problem is there's a compiler optimization called dead store elimination, which specifically looks for memory writes like these that are unnecessary. Programmers will sometimes declare variables, or fill portions of memory with values, that they end up never accessing. So dead store elimination looks for writes to memory that don't have a corresponding read of that value later, and because the rest of the program doesn't need that value, it eliminates the write in the first place. Well, the problem is that we were trying to sanitize that memory, and we've now actually removed that sanitization. This is actually also an example of the second category, information leakage, because in addition to removing this code, we've now left ourselves open to the potential for this key to be disclosed and used to circumvent the security control via another vulnerability, such as a memory disclosure vulnerability.
[00:09:18] So now let's talk about side channel introduction, with another security-sensitive function. Take a look at the left-hand version first: it is specifically designed with a number of redundant computations, to ensure that regardless of which branch we take at the conditional, we perform the same amount of computation. The reason is that we don't want an attacker who can monitor the execution of this program to know that we've taken one branch or the other of this cryptographic function because it performs more computation than the other. So we've intentionally obscured this side channel with a number of redundant computations in the else block. Well, a very common compiler optimization called common subexpression elimination specifically looks for repeated computations that it can consolidate into a single calculation, store in a temporary value, and then [00:10:09] replace redundant instances of that computation with that temporary value. So ultimately our attempt to obscure this side channel is undone by the compiler: it performs an operation that changes the computation time required for the if and the else blocks, ultimately resulting in the reintroduction of the side channel we tried to prevent. [00:10:31] These particular weaknesses are fairly insidious, because they typically involve things that we've done in source code specifically to be secure that have now actually resulted in security weaknesses; they've been undone, or new ones have been introduced. So with that, let me turn to some prior work that I did back in 2017.
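As a brief aside: both hazards just described come from two classic passes, dead store elimination and common subexpression elimination. Here is a deliberately tiny sketch of each, operating on a toy IR of (target, …) assignment tuples rather than real machine code. The IR shape and the crypt-like example are illustrative assumptions, not how GCC or Clang actually implement these passes:

```python
def dead_store_elimination(stmts, live_out):
    """stmts: list of (target, names_used) assignments, in program order.
    Walk backwards tracking which names are still needed; a write to a
    name that nobody reads afterwards is dropped -- exactly how the
    key-zeroing write in the crypt example disappears."""
    live = set(live_out)
    kept = []
    for target, uses in reversed(stmts):
        if target in live:
            kept.append((target, uses))
            live.discard(target)
            live.update(uses)
        # else: dead store -- eliminated
    kept.reverse()
    return kept

def common_subexpression_elimination(stmts):
    """stmts: list of (target, expr) with hashable expr tuples. A repeated
    expression is computed once and reused -- exactly what collapses the
    'balanced branches' padding above."""
    seen, out = {}, []
    for target, expr in stmts:
        if expr in seen:
            out.append((target, ("copy", seen[expr])))  # reuse, no recompute
        else:
            seen[expr] = target
            out.append((target, expr))
    return out

# The final zeroing write to `key` has no later read, so DSE removes it:
crypt = [("key", ("channel",)), ("out", ("key", "msg")), ("key", ())]
assert dead_store_elimination(crypt, live_out={"out"}) == crypt[:2]
```

A real dead store eliminator must also reason about pointers and escaping memory, which is exactly why `memset`-style scrubbing of a buffer that is never read again looks dead to it.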
[00:10:53] That work dealt specifically with the impact of software transformations on code reuse attacks, and it led me to wonder: is there a fourth category of security weaknesses being introduced by compilers? Specifically, are they making us more susceptible to code reuse attacks? Before we get too far into the study that I conducted on that very topic, I'm going to provide a very brief overview of code reuse attacks and what they look like, so we can all have some common terminology. [00:11:18] The first paper that proposed a gadget-based code reuse attack, called return-oriented programming, was published in 2007, and the way these particular attacks work is that they circumvent defenses that are designed to detect when an attacker has inserted their own shellcode and tries to redirect execution to that shellcode. [00:11:43] Rather than injecting their own shellcode, the attacker will inject a number of addresses that correspond to a chain of snippets of the program that's been compromised. Each of these snippets ends in a return instruction, which allows the attacker to hijack control flow and direct it from gadget to gadget. In each gadget, [00:12:02] a number of instructions precede the return instruction, and they are used by the attacker to accomplish some sort of computational task; for example, a gadget might add two numbers together or load a value from memory. Execution then reaches the end of these instructions.
[00:12:15] After the gadget is executed and it reaches the return instruction, control is redirected back to the stack, which has been overwritten to now point to the next gadget. So the attacker essentially uses snippets of existing code within the program to reconstruct their malicious exploit using only code that's already been provided and already been marked executable, which makes this particularly difficult to defend against and detect. As a consequence, there's been significant research on this topic; in fact, this original paper has been cited over 1300 times, largely by authors who are developing increasingly complex versions of this attack, or increasingly complex defenses against these new types of attacks. [00:12:57] So this type of attack is the subject of a significant amount of research, and one particularly interesting piece of research is the advent of jump-oriented programming. A number of the defenses that were originally designed to stop ROP attacks focused on protecting the stack, [00:13:13] so attackers started developing methods to avoid using the stack for control flow, which led to this technique, jump-oriented programming. Rather than use gadgets that end in return instructions, we use gadgets that end in indirect jump or indirect call instructions. Now, because we can't rely on the stack to perform the malicious control flow for us, we have to rely on what's called a special-purpose gadget, in this case the dispatcher block on the left. The dispatcher is a very special kind of gadget, still found within existing program snippets, that essentially can be used to maintain a virtual instruction pointer into our list of gadgets and then sequence them for execution. The purpose of the terminating instructions, the indirect jumps and indirect calls, is to [00:13:59] redirect control flow back to the dispatcher to execute the next gadget in the chain.
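The return-oriented mechanism just described can be modeled in a few lines of Python rather than machine code. The addresses and gadget behaviors below are made up for illustration: the corrupted stack is just a list of gadget addresses, and each gadget's terminating `ret` is what pops the next one.

```python
def run_chain(gadgets, stack, regs):
    """Toy ROP interpreter. gadgets: address -> effect of the instructions
    preceding that gadget's `ret`. stack: attacker-controlled return
    addresses, first gadget at the front. regs: a dict standing in for
    machine registers."""
    while stack:
        addr = stack.pop(0)   # the `ret` consumes the next address
        gadgets[addr](regs)   # run the gadget's few instructions
    return regs

# Hypothetical snippets "found" in an existing, already-executable binary:
gadgets = {
    0x401000: lambda r: r.update(rax=7),                    # e.g. "pop rax; ret"
    0x402000: lambda r: r.update(rbx=5),                    # e.g. "pop rbx; ret"
    0x403000: lambda r: r.update(rax=r["rax"] + r["rbx"]),  # "add rax, rbx; ret"
}
regs = run_chain(gadgets, [0x401000, 0x402000, 0x403000], {})
assert regs["rax"] == 12   # a computation built purely from reused code
```

Nothing here is injected code; the attacker only supplies addresses, which is why code-injection defenses don't see it.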
This has been further revised into another derivative called call-oriented programming, which essentially is JOP but only uses indirect calls. It's a little more complicated and significantly more difficult to use, but it also circumvents a number of defenses that were originally designed for jump-oriented programming. [00:14:24] So, gadgets come in a number of different flavors, and since I'm going to reference some of these later in the talk, I want to cover them very briefly. There are two broad purposes of gadgets: they are either functional, which means they perform a computational task like adding numbers together or loading a value from memory, or they are special-purpose gadgets. The criteria for a gadget to qualify as special-purpose are very high, but these gadgets perform specialized functions that are useful in actually constructing or scaffolding these attacks. For example, the dispatcher is a very specific type of gadget; most programs only have one or two of these types of gadgets available, but they're also very important, because without a dispatcher you can't create a JOP attack. [00:15:08] Attackers find these gadgets by performing static analysis on the binary, looking for these terminating instructions: they look for returns, indirect calls, and indirect jumps. More importantly, they don't even have to find instructions placed by the compiler; they can use any sequence of bytes in the program that will be decoded as these instructions. Gadgets that can be created from unintended, non-compiler-placed instructions are called unintended gadgets, and it turns out there's a significant amount of computational power in these unintended gadgets.
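The search attackers perform can be sketched as a simple byte scan. This is a deliberately simplified illustration: real gadget finders (tools in the spirit of ROPgadget) also disassemble each candidate window, while here we only look for the x86 `ret` opcode, 0xC3, and enumerate the byte windows ending at it, whether or not the compiler ever intended an instruction boundary there.

```python
def find_ret_gadget_windows(code, max_len=8):
    """Every offset i with code[i] == 0xC3 (`ret`) yields candidate
    gadgets: the byte windows ending at i. Compiler intent is irrelevant,
    which is exactly how unintended gadgets are found."""
    windows = []
    for i, b in enumerate(code):
        if b == 0xC3:
            for start in range(max(0, i - max_len + 1), i + 1):
                windows.append((start, bytes(code[start:i + 1])))
    return windows

# A small byte string: the 0xC3 here sits inside a longer instruction's
# encoding, never emitted as a `ret`, but the scan surfaces it anyway.
blob = bytes([0x48, 0x89, 0xC3, 0x90, 0x5D])
found = find_ret_gadget_windows(blob)
assert all(w.endswith(b"\xc3") for _, w in found)
assert (2, b"\xc3") in found
```

Each window would then be decoded backwards to see whether it forms a usable instruction sequence; the density of 0xC3 bytes alone already hints at how large the unintended population can be.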
[00:15:43] So for example, when a direct jump to a specific address or a specific offset in the binary is encoded, that offset, based on the layout of that program, may just happen to have the hex byte C3 in it. That C3 corresponds to a return instruction, and all the bytes preceding it can then potentially be interpreted as a gadget. Take a look at a simple example of this. The top block in this diagram represents an actual gadget that was placed by the compiler. If, instead of interpreting the sequence of bytes at the starting byte 4A, we start interpreting at the byte A3, we get a subtly different gadget that performs the operation on the RSP register versus the RSI register, not terribly different. But if we chop two more bytes off and start interpreting at the byte with the value 18, we actually end up with a completely different gadget that performs a subtraction on a memory location. [00:16:38] So unintended gadgets are actually a source of a significant amount of security weakness when it comes to ROP, JOP, and COP attacks, because they are in place unintentionally, which means we can't easily target them with defenses. So now that we have a little understanding of code reuse attacks and compiler optimizations, let me talk a little about our motivation for this work. Back in 2017 I started a research project into software debloating, which is another type of software transformation where we try to find unnecessary parts of programs and remove them. In the process of conducting this research, we found that even very small changes, removing relatively small amounts of a program that are considered bloat, can result in huge, drastic changes to the population of code reuse gadgets in the resulting binary. Even worse, we found that in many cases,
by certain measures, the quality of the gadgets in the debloated code is actually better than in the code we started with, and this is largely because new gadgets can actually be introduced as a consequence of software transformation. So this led us to ask the question: [00:17:50] compilers are doing software transformation on virtually every piece of software that runs through them, so what effect do they have on code reuse gadget populations? Is it as bad as software debloating? Is it less bad? Ultimately, we set out to answer three primary questions. One: to what degree do these compiler optimizations introduce new gadgets? If they're introducing large numbers of new gadgets, then there's a pretty high potential that they're going to introduce gadgets that are more useful than the ones they replace, or gadgets introduced as a consequence of code duplication that provide new capabilities to the attacker. [00:18:26] Two: we want to know to what degree these compiler optimizations negatively impact the quality and counts of gadgets within these binaries. And three: just as we saw with the previous security weaknesses, we want to know whether there are specific optimizations that are problematic, and whether we can fix them. [00:18:45] So we conducted a study, in a relatively top-down manner, of two different production compilers, GCC and Clang. The way we conducted the study is we built a number of unoptimized and optimized variants of 21 different programs, and we analyzed them to see how the gadgets present in optimized code differ from the gadgets present in unoptimized code.
[00:19:10] We also drilled pretty deep to look at the effects of individual optimizations; we actually generated over 900 different program variants that isolate the effects of single, individual optimizations, to find out, once again, whether there are any problematic or specific troublemaker optimizations that we can fix. So what metrics did we use to measure these variants? Well, in our previous work on software debloating we built a tool that we call GSA, which performs static analysis on a binary, looks at the gadget population, and calculates a number of useful metrics. The first is the gadget introduction rate, which tells us, in the optimized binary, what percentage of gadgets are newly introduced, meaning they are not present in the unoptimized binary. [00:19:56] Second, we also calculate functional gadget set expressivity. The set of functional gadgets we have dictates the computational power that an attacker can use to construct an exploit. If the set of gadgets only lets you add, multiply, divide, and subtract, that is significantly less computational power than a set of gadgets that also lets you perform conditional branching, make function calls, et cetera. [00:20:23] So we use a measure that looks at the different types of gadgets available, determines what kinds of operations they can perform, and then determines whether or not the collective set of gadgets meets a bar of expressivity, a bar of computational power. And third, we looked at special-purpose gadget availability. There are 10 different types of special-purpose gadgets that we look for, and the way we calculate this metric is we simply look within the
set of gadgets within the binary, and if it has at least one gadget that can be used for a given purpose, we consider that category satisfied. So what we're measuring here is how many different categories of special-purpose gadgets are available out of that total of 10, and what this lets us understand about the set of gadgets is what kinds of attacks can be launched. If a binary doesn't have the gadgets necessary for a JOP attack, then I don't really have to worry about the JOP gadgets, because somebody can't connect them together; but if it does have all the gadgets necessary to perform a call-oriented programming attack, well, then maybe I need to look at the call-oriented gadgets more closely. [00:21:23] Right, so let's take a look at some of our results. Our prior results in software debloating indicated that very small transformations can have drastic effects on gadget populations. Specifically, we found that when aggressively debloating software, up to about 95 percent of the gadgets in the resulting debloated variant can be newly introduced, and the minimum was about 65 percent. So we want to know how compiler optimizations compare. [00:21:52] It's difficult to predict what we should expect to see here: a good number of compiler optimizations will actually remove code, as we saw in our first example, where we went from roughly eight or nine instructions down to a single instruction in function foo, but there are also specific optimizations, such as space-for-speed optimizations, that will intentionally duplicate code, potentially introducing more gadgets.
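The metrics described a moment ago can be made concrete with a small sketch. The real GSA tool works on disassembled binaries; here gadgets are just strings and the numbers are invented, so treat this purely as an illustration of the definitions:

```python
def gadget_introduction_rate(optimized, unoptimized):
    """Percentage of the optimized binary's gadgets that are newly
    introduced, i.e. absent from the unoptimized binary."""
    return 100.0 * len(optimized - unoptimized) / len(optimized)

def special_purpose_coverage(satisfied_categories, total=10):
    """A category counts as satisfied if at least one usable gadget
    exists; the metric is how many of the 10 special-purpose categories
    are satisfied."""
    return f"{len(satisfied_categories)}/{total}"

# Invented gadget sets, identified by their instruction strings:
unopt = {"pop rax; ret", "mov rax, rbx; ret", "leave; ret"}
opt   = {"pop rax; ret", "pop rdi; ret", "add rax, rbx; ret"}
assert round(gadget_introduction_rate(opt, unopt), 1) == 66.7
assert special_purpose_coverage({"syscall", "dispatcher"}) == "2/10"
```

Comparing gadgets by exact byte or instruction identity is itself a modeling choice; two byte-different gadgets can be functionally equivalent, which is why the expressivity metric exists alongside the raw counts.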
[00:22:17] Right, so what does this table tell us? First, this table tells us that I'm bad at data visualization and I need to work on that. But the data right here tells us the rate at which gadgets are introduced. On the left-hand side are the 21 different benchmarks. There are two sets of columns, one for each compiler. The leftmost column in each set is the unoptimized code; that's the count of gadgets available within that particular variant. Then there are three columns corresponding to the O1, O2, and O3 optimization levels; the data in those cells indicates the number of gadgets, and also the percentage of those gadgets that are newly introduced. [00:22:54] So before everybody strains their eyes and tries to read this very difficult table, we'll move on and I'll just summarize it. Ultimately, what we found is that compiler optimizations almost completely change the set of gadgets available versus the unoptimized variant. The smallest observed gadget introduction rate was 68.5 percent, and the vast majority of these variants have gadget introduction rates greater than 85 percent. This indicates to us that we really do need to look at the qualitative metrics; we need to understand how these changes are impacting the gadget set, because a huge number of gadgets are actually being changed. [00:23:36] Some key observations we found with respect to the specific counts, if you're interested in how one compiler did relative to the other: Clang typically produces unoptimized binaries with far fewer gadgets, but when it optimizes the code it typically ends up with code that has more gadgets as a result. Interestingly, the reverse is actually true for GCC.
GCC's unoptimized code typically has more gadgets, but as it optimizes the code it typically gets rid of a good number of them. If you're looking specifically at gadget counts, Clang still generally wins out, but it's interesting to note how these compilers compare on unoptimized versus optimized code; it also tells us that there are significant differences in the optimization passes and the code generation schemes being used in each compiler. [00:24:25] So let's move on now to the results of our coarse-grained analysis. What we mean by coarse-grained analysis is that we looked at the qualitative metrics we described earlier at the different optimization levels, rather than trying to isolate individual optimizations: we're looking at the changes that occur simply by selecting a different flag, O1 through O3. [00:24:46] Once again, difficult-to-read tables, but I made this one slightly easier by highlighting in green the instances where the data indicates that optimizing the code actually improved the quality of the gadgets. By improved, I mean improved for the person writing the code who hopes not to be exploited, not improved in terms of the exploiter looking for more expressive or more special-purpose gadgets. What we found is that in only one of our benchmarks did optimizing the code reduce the expressive power, meaning it got rid of some gadgets that allowed more expressive actions. In the other 20, optimizing the code generally increased the expressivity; it produced code that was more expressive, more useful for producing sophisticated types of exploits with just existing snippets of code. [00:25:35] The special-purpose gadgets were slightly different. As you can see, almost half of the time
optimization removed certain categories of special-purpose gadgets, and the other half of the time it generally didn't change the population. So optimization, at least for GCC in this particular case, doesn't impact the availability of special-purpose gadgets nearly as much as it impacts the expressivity of the gadget set. [00:26:02] We found pretty similar results for Clang, but if you recall, Clang generally introduces new gadgets as it optimizes, so its results for functional set expressivity were significantly worse. Because it introduces more gadgets, it tends to show higher increases in expressive power after optimization versus before; however, the presence of special-purpose gadgets is about the same. [00:26:27] A couple of other key observations to make here. In the vast majority of cases, across both compilers, optimized code is more expressive than unoptimized code. The increases in expressivity are also nonlinear: this means we aren't simply getting steadily worse results as we increase the optimization level. There are a number of cases where we optimize at O1 and actually reduce the expressive power, but then we optimize at O3, which includes all of O1's optimizations, and we get worse results, and vice versa; there are times where performing fewer optimizations causes negative results and performing more optimizations ends up causing positive results, [00:27:05] which is somewhat unexpected. Ultimately, what these observations mean in aggregate is that we can't really make assessments about compiler optimizations and security at this level; we're going to need to drill down to the individual optimizations to see which ones are actually causing issues and which ones aren't. [00:27:25] So this brings us to our fine-grained qualitative analysis. This is where we generated a number of variants that isolate individual optimizations.
We then observed how they actually impacted the gadget sets. Ultimately, the unpredictability that we experienced in the coarse-grained analysis is largely due to the presence of unintended gadgets, so we're hoping that with this fine-grained analysis we'll be able to eliminate a lot of these bystander effects and focus on the specific effects caused by individual optimizations. [00:27:56] You know what, how about we just talk about this instead of looking at the table; once again, I'm bad at data visualization and in dire need of help. So, interestingly, for the vast majority of our isolated optimizations, the results actually looked really similar to the coarse-grained analysis. This was observed across both compilers, across all different types of optimizations, and across all benchmarks. Ultimately, what this indicates is that there is a significant amount of noise associated with software transformation, and this is also consistent with our results in software debloating. It turns out that by churning the code, whether to optimize it, to debloat it, really whatever we're doing, we cause a significant number of new gadgets to be introduced, and these new gadgets have some latent probability of either increasing or decreasing the expressive power or the availability of special-purpose gadgets. [00:28:53] Typically this probability leans toward the negative side: it's far more likely that the introduction of new gadgets will cause negative effects, either increasing expressive power or introducing new categories of special-purpose gadgets, than that it will go in the other direction. So ultimately we've answered one of our key motivating questions here, and it's that compiler optimizations, when they optimize the code, generally make it more conducive for attackers who are trying to construct ROP, JOP, or COP attacks.
So this begs the question: is there signal in this noise? Can we find certain optimizations where we can make an impact? And also, is there a solution for the noise: can we find some way to make the introduction of new gadgets more predictable? [00:29:37] To answer these questions, we tried to identify some of the root causes of these transformations' effects. We did this by using a common software disassembly tool, IDA, and a common plugin called BinDiff, to compare individual variants where we observed interesting results. Specifically, we looked at all of our results across the different variants, and we noted a few outliers, [00:29:59] which we have termed the "dirty half dozen" transformations. Specifically, within GCC we found that frame pointer omission, interprocedural function cloning, shrink wrapping, peephole optimizations, and common-subexpression-elimination-based jump following cause a greater than normal share of the introduction of special-purpose gadgets, or of increases in the expressive power of the gadget set overall; with Clang we noted one additional optimization, called tail call elimination. [00:30:31] Taking a look at what's causing the noise, we noted that in general the churn that occurs when we perform software transformation introduces a lot of gadgets, but specifically it introduces a lot of small gadgets, gadgets that are typically less than 4 bytes. The reason that 4-byte boundary is important is that it's the size, in x86, of an offset within the binary that is the target of jump or call instructions. So the more the program layout is changed or modified, the more likely it is that we're going to accidentally encode a piece of data in the program that will correspond to a useful gadget, i.e., a gadget that ends in a return instruction if it contains C3, or a special-purpose gadget such as the system call gadgets which we see here in the bottom left-hand corner.
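To make the 4-byte-offset point concrete: any small constant or displacement laid down little-endian can drop opcode bytes into executable memory. A contrived sketch (the particular values are invented for illustration):

```python
import struct

# The x86-64 `syscall` instruction is the two bytes 0F 05, and `ret` is C3.
# Store the harmless-looking constant 0x050F in a 4-byte field and those
# opcode bytes appear verbatim in the program image:
field = struct.pack("<I", 0x050F)
assert field.startswith(b"\x0f\x05")      # seed of an unintended syscall gadget

# Likewise, a jump whose rel32 displacement happens to be 0xC3 bytes
# embeds a `ret` right inside the jump's own encoding:
jmp = b"\xe9" + struct.pack("<i", 0xC3)   # jmp rel32, target +0xC3
assert b"\xc3" in jmp
```

Since relayout changes every displacement in the binary, each transformation is effectively a fresh roll of the dice on which opcode bytes get accidentally encoded.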
[00:31:19] Additionally, we see other types of data that encode these kinds of instructions and ultimately result in the introduction of new gadgets: constants and code header information can be jumped to and executed, and this stuff is particularly insidious. So we'll get to some interesting solutions for these a bit later, but a second root cause that we identified was the duplication of gadget-producing instructions. [00:31:47] So I know this is going to be difficult to read, but this is actually an example, left and right, of what the frame pointer omission optimization does to code. A good number of small functions don't actually need to use the frame pointer, so the operations that the compiler inserts to preserve the frame pointer at the beginning of the function and then restore the frame pointer at the end are really unnecessary. So what this optimization does is look for opportunities to get rid of these extra instructions and increase the execution speed of the program. But what we noticed is that in some cases, specifically when the operation that undoes the work done at the beginning of the function has multiple predecessors, the return instruction actually gets duplicated: it gets moved up to the end of each of these individual basic blocks, and that's to avoid having to encode and execute the direct jump that we see in the original code. So from a performance perspective we've done great work: we've eliminated two instructions in the function prologue, they're no longer present here, and we've eliminated the pop instruction in the function cleanup process. But from a gadget perspective, we've now introduced another return instruction, which now has its own set of gadgets associated with it, both intended gadgets and unintended gadgets. And like I said before, when we introduce gadgets, the latent probability is on the side of making gadgets
more expressive, or introducing new types of special-purpose gadgets, as opposed to getting rid of them. [00:33:18] So in this particular case we can actually observe that by being a little too greedy with the optimization, by trying to get rid of that one extra jump instruction, we've actually increased the likelihood that we'll introduce gadgets that are useful to an attacker. And finally, one of our last root causes we identified is that sometimes some of these special-purpose gadgets are actually optimizations in and of themselves: specifically, tail call elimination, which attempts to replace a return instruction at the end of a function with a direct jump to the next function that's going to be called. This occurs when the last instruction in a particular function is a call to another function. [00:34:04] Essentially, what this optimization does is try to preserve the stack frame to avoid the overhead of having to build a new one. Rather than executing the last basic block that performs cleanup and then performs a return, which would then allow the stack to perform the next function call, it rewrites the instruction to perform a jump instead of a return. Ultimately this gets rid of some of the return-oriented programming gadgets, but it will introduce new jump-oriented programming gadgets, and specifically one that it introduces, which is an indirect jump followed by a number of pop instructions, is exactly the criteria necessary to use as the JOP data loader gadget. So this puts us in a bit of a bind: we're specifically using a special-purpose gadget to achieve a performance speedup, so there's really not a great way to get around this. There's nothing we can do to eliminate this particular gadget introduction other than disabling the optimization altogether.
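To make the transformation concrete, here is a minimal C sketch of a tail call (the function names and values are mine, not from the talk). With optimization enabled, a compiler such as GCC or Clang may compile `caller` as a bare `jmp finish` instead of `call finish; ret`, which is exactly the rewrite described above.

```c
/* finish() doubles its argument; it is the target of a tail call below. */
int finish(int x) {
    return x * 2;
}

/* Because the call to finish() is caller()'s final operation, a compiler
 * with tail-call elimination (e.g. -O2 / -foptimize-sibling-calls) may
 * reuse caller()'s stack frame and emit "jmp finish" rather than
 * "call finish; ret" -- removing a RET gadget endpoint but creating the
 * jump-based pattern discussed in the talk. */
int caller(int x) {
    return finish(x + 1);   /* eligible for tail-call elimination */
}
```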
[00:35:06] So now that I've hit you with the doom and gloom, let's go to some potential solutions for these problems. So with respect to the noise, the types of gadgets that we're introducing just by simply churning the code around, by transforming it: it turns out the majority of these gadgets that are introduced come from unintended gadgets. So if we have a good solution for disabling unintended gadgets, we can actually get rid of a lot of this noise, but that's something we'd have to implement in the compiler, which would take a significant amount of compile time to do. Now fortunately, a number of static transformation techniques have already been proposed in other work, in the framing of a ROP defense. So previous work, proposed in reference number 11, is a tool called G-Free, which takes a binary and then performs binary transformations to try and get rid of some of these unintended gadgets. It looks for a number of different situations, and this is a particularly simple one here: in this example, the compiler inserts an instruction that adds the constant value 0xC3 to the EAX register, while 0xC3 as a constant also encodes a return instruction. So by having this constant in our code, it's going to produce gadgets. So we can perform a transformation that preserves the original instruction, but we decrement that constant value by one so that it no longer statically encodes a return instruction, and then we perform an additional step to increment the value in that register. So what we've done is we've eliminated the presence of this 0xC3 byte in our program, our static binary, which has ultimately gotten rid of a number of gadgets. There are a lot of other transformation techniques that were proposed in this paper.
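The slide's example can be mimicked at the source level. This is only a sketch, under the assumption that the immediate 0xC3 would otherwise appear verbatim in the instruction encoding; the real G-Free rewrite happens on machine instructions, since at source level a compiler could simply constant-fold the split back together (the `volatile` below blocks that folding for illustration purposes only).

```c
#include <stdint.h>

/* Naive form: the immediate 0xC3 likely appears as a literal byte in the
 * generated instruction, planting a RET (0xC3) in executable memory. */
uint32_t add_c3(uint32_t x) {
    return x + 0xC3;
}

/* G-Free-style rewrite: add the decremented immediate 0xC2, so no single
 * byte of the encoding is 0xC3, then add back the difference of one.
 * The result is identical; only the instruction bytes differ. */
uint32_t add_c3_gfree(uint32_t x) {
    volatile uint32_t step = 0xC2;  /* volatile prevents constant folding in this sketch */
    uint32_t r = x + step;
    return r + 1;
}
```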
[00:36:51] And they are much more complex, and ultimately a significant number of them trade code size, meaning they have to insert other instructions to get rid of these unintended gadgets, as we see here. A lot of the techniques they propose are overeager, because they're working with the binary, not actually working in the compiler. So one possible solution we have for the noise is performing transformation passes on the code before it is written to the binary, when it's machine specific. If you recall, the way most compilers are designed is that they avoid having to deal with the intricacies of specific ISAs by performing all these optimizations on machine-independent IR. So if we take that same kind of transformation pass and move it down to where we're working with the machine-dependent code, we can actually get rid of a lot of these gadgets. So that's an interesting area of future research that we'll be getting into following the study. [00:37:50] So some of the other solutions are actually relatively straightforward. Specifically, with respect to our case where we duplicated a number of gadget-producing instructions, we're going to patch this particular optimization to make it a little less aggressive. So the point of this optimization is to get rid of these instructions that I've marked in red; these are the ones performing unnecessary operations. [00:38:12] As a result of over-aggressive optimization, we also try to get rid of this jump instruction by copying this return. So what if we just leave the jump instruction in? We still get the vast majority of the benefit for execution time and code size: we eliminated three instructions, but we did leave that fourth instruction in, and we avoid duplicating that return. [00:38:30] And in fact, it's actually interesting to note that Clang
uses a different cleanup process in the final basic block of most of its functions that isn't susceptible to this type of issue; it occurs mostly in GCC. Clang also provides a specific optimization called merge returns, which does exactly this: it will find instances where functions have multiple exit points and use direct jumps to consolidate them down to a single return instruction. It really does that in order to make the exit point of a function more predictable, which enables other types of defenses and other types of optimizations, but it turns out it can also be used to reduce the number of returns in a program, and ultimately the number of gadget-producing instructions. [00:39:14] When we talk about special-purpose gadgets that actually end up being optimizations in and of themselves, this is a little bit more difficult to discuss. As we saw with some of the other security weaknesses, there's really not much you can do there either. In fact, the prevailing advice with respect to dead store elimination, which is the compiler optimization that eliminates our sanitization code, is generally: if you're working with cryptographic functions, make sure you turn that optimization off. You forgo the performance benefits that you'd get by running it, but you don't have to worry about it getting rid of some of your sanitization code by the time it makes the binary. We have the same option here: depending on what type of program we're compiling, and depending on how often tail calls appear within that program, it might be more beneficial just to leave this optimization turned off and not have to worry about potentially introducing data loader gadgets into the code. This is specifically useful if tail calls are only
[00:40:12] being performed on cold code that isn't executed very frequently, but it's also fairly likely that tail calls will be used quite frequently within the code, and potentially on hot code, so it's really a program-by-program decision whether or not the benefits outweigh the costs. So before we close and take questions, I want to provide a few notes on the performance implications of these solutions. [00:40:37] There's no such thing as a free lunch: just as you have had to sit through this presentation in order to receive your slices of pizza, there is a downside that comes with all of these solutions. So, as I mentioned before, we can perform static transformations to try and get rid of some of these unintended gadgets. [00:40:54] In their original paper on G-Free, they noted an increase on the order of about 30 percent in code size. So if you have, you know, a 3 gigabyte executable, it's going to be 4 gigabytes by the time you get done trying to get rid of these unintended gadgets. That's admittedly a contrived example, but on embedded platforms where space is limited, increasing the code size by that much might be the difference between being able to employ this defense versus not being able to do so. [00:41:17] So one of the interesting things to note here is that they performed this type of defense outside of the compiler: they did this on binaries that had already been written, which limits their techniques, because then they have to worry about the problem of making sure that the binary layout is still valid. So if we move some of these techniques into the actual compiler, into the back end of the compiler where it's still working with [00:41:45] machine code that it can re-lay out using existing functions of the compiler, we might actually be able to get this code size increase down a significant amount.
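The dead store elimination hazard mentioned a moment ago looks like this in C. This is a standard textbook sketch rather than an example from the talk: the final memset writes to memory that is never read again, so an optimizing compiler is permitted to delete it, leaving the secret resident in memory.

```c
#include <string.h>

/* Sums the bytes of a temporary "secret", then scrubs it. The scrub is a
 * dead store from the compiler's point of view: key is never read after
 * the memset, so dead store elimination may silently remove the scrub
 * and the secret stays behind in memory. */
int use_secret(void) {
    char key[16] = "hunter2";            /* stand-in secret */
    int sum = 0;
    for (size_t i = 0; i < sizeof key; i++)
        sum += key[i];
    memset(key, 0, sizeof key);          /* may be eliminated by DSE */
    return sum;
}
```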
So using other transformation techniques to merge gadget-producing instructions, like we saw with the frame pointer omission, still results in small increases to code size and execution time. Like I mentioned, if we don't get rid of the jump instruction, we have to execute it, and the jump instruction takes more space to encode than the return [00:42:12] instruction that would replace it. So we will have to sacrifice some of our benefits, but in this case we can kind of view this as a sliding scale: on one end we have code that is less likely to have security weaknesses or strong gadgets in it, unoptimized code, and on the other end we have the optimized code that does introduce some of these new gadgets. Maybe we just move where we land on that spectrum back a little bit: we're a little bit less aggressive with certain types of optimizations that introduce gadgets, and then we get the best of both worlds. [00:42:43] And when we talk about the performance implications of disabling specific optimizations, they really are program dependent, and it's really difficult to decide statically, for the entire class of programs, which choice is going to be more beneficial. [00:43:00] So with that, here are the things I really hope you take away from this talk. Despite being formally proven correct with respect to functional semantics, there still exists a serious gap between how compilers function and the security properties of the programs they produce.
So if you're implementing security-sensitive code or security controls in your program, it's really important for you to consider how these compiler optimizations might unintentionally introduce weaknesses. The reason why this is important is that it's actually up for significant debate within the compiler community whether the things we've shown today actually constitute bugs. There's a good number of people within the compiler community who will say that because the compiler was never designed to satisfy security criteria like these, it's not actually a bug when it violates them; it's not an error. So it's very reasonable to say that this problem is not likely to be fixed by the standard compiler community anytime soon, because a good number of them are resistant to the idea that this is even a bug or a problem in the first place. The practical solution for this is what I propose here: it's important for the programmer to know what the compiler is going to do to their code before it reaches the binary. [00:44:10] So, third takeaway: compiler optimizations, in addition to introducing the security weaknesses I showed you before that have to do with actual functional semantics, can also potentially increase our susceptibility to code reuse attacks, by introducing new gadgets that increase the functional expressivity and the availability of special-purpose gadgets, ultimately making it potentially easier for an attacker to build and construct exploits. So developing solutions to these problems remains an open area of research, but there's a good foothold for us to start from to actually implement compiler-based defenses that can mitigate some of these problems. [00:44:48] Right, with that we've got about 15 minutes for any questions you guys might have, or a last-minute run on the pizza. So right now there are very limited options available to the programmer. When you're writing code in C,
it's actually possible to specify inline assembly language, and generally the compilers don't touch inline assembly when they're optimizing, because they recognize that the programmer was incredibly deliberate about putting that code there in the first place. So unless you want to start writing inline assembly, there aren't really great ways to do that, but the potential exists to do a much better job of these software transformations in the compiler. So for example, when we have system call gadgets that accidentally get encoded in branch targets, the way G-Free goes about correcting this is to introduce no-ops into the code that try to push that offset off of [00:46:06] encoding that individual byte. Now, the problem is, if it's a high-order byte, you could potentially have to put up to a megabyte's worth of no-op instructions into the code, which is why they see such huge increases in code size. But doing this in the compiler, we're still working with a malleable layout: we can just swap the location of that basic block with any other basic block that's of the same size, preferably within the same page of memory. So what we've done now is we've replaced these offsets, and we could potentially adjust them by one or two if we look for something that's off in size by just a little bit, maybe add a little byte padding; and with a much lower [00:46:42] code size cost we can do a more efficient job, which is why there's a strong argument to move some of these techniques into the compiler, as opposed to post-compiler passes. Yes. Yes.
Yes, for languages that support that, you can do that as well. Other options are to perform a dummy read of the data and then perform some dummy operation on it, to fool the compiler into thinking that value is live after sanitization: read from memory and then used in some other useful computation. But sometimes that can be difficult, because if the compiler is really smart, it might figure out that all the stuff you've added at the end is actually useless and get rid of that as well. So yes, the volatile keyword is useful for languages that support it, but there do exist some compilers that don't support that. It's not as much of a problem these days, but it still remains a problem specifically for embedded C, where device-specific compilers perform optimizations but don't support a lot of the standard C and C++ features that have come along in the last few years. Yes. [00:48:09] So for the low, low cost of a system call you can now zero it out; I know, have at it, though it's interesting to see the things that have actually been added to account for some of these problems. Yes. It's possible. It is possible, but it would be more specific to the actual code interpreter, so we probably would see things that are generally applicable between Python and Java, but yes, it is possible that [00:49:14] some of these situations could exist. There are code reuse style attacks against the JVM that attempt to use existing pieces of bytecode and reinterpret them in different ways, but those are not as much the subject of current research, because the memory model of C is hopelessly broken and it's much easier fodder for attacks and defenses; and interesting code is typically written in C. Code that we want to exploit, stuff that needs to run blazing fast like server stuff, is typically written in C.
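The volatile-based workaround just described can be sketched like this. It's a common idiom rather than code from the talk; where C11's optional Annex K is available, `memset_s` serves the same purpose.

```c
#include <stddef.h>

/* Scrub a buffer through a volatile pointer: each store is an observable
 * side effect, so the optimizer must keep every write even though the
 * buffer is never read again afterward. */
void secure_zero(void *buf, size_t len) {
    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len--)
        *p++ = 0;
}
```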
So yes, there's definitely research to be done in that area, and it's definitely important to make that jump, but we haven't quite gotten there yet; you have to solve this problem first. [00:49:56] Any other questions? All right, thanks everybody for coming to the talk.