Welcome to everyone in the room, and welcome to everyone watching remotely. It is my absolute pleasure to introduce our speaker for today. We have Zane Ma, a postdoc researcher here at Georgia Tech in the Astrolavos Lab, just joining us from the University of Illinois. I'm going to let him take it away; this should be a really interesting talk. Thank you so much.

Thanks, Brandon. Good afternoon, everyone, and thanks for coming to the talk today. As Brandon said, my name is Zane Ma. I'm a postdoc researcher in the Astrolavos Lab, and today I'm going to be presenting some of my doctoral research on understanding the trust relationships of the web PKI.

So the web public key infrastructure, also referred to as the web PKI, is an authentication system that enables the TLS cryptographic protocol, and the TLS cryptographic protocol in turn secures most of the web services we use today, including online banking, social media, and email. As an authentication system, the web PKI provides a scalable way for TLS clients, such as web browsers, to securely know which server they are communicating with. For example, when you visit bankofamerica.com, the web PKI provides confidence that you're actually communicating with bankofamerica.com and not an adversary who's impersonating bankofamerica.com.

Let's go through a brief overview of how the web PKI works for TLS server authentication. In a nutshell, the web PKI is a delegated authentication system with three parties: a client, a server, and a certificate authority, also known as a CA. The general idea is that CAs are trusted third parties who verify server identity on behalf of clients, and servers receive digital identity certificates from CAs and then present them to clients. To be a little more specific, servers that wish to participate in the web PKI will first go through an identity verification process with the CA. This is typically a request-response protocol to prove two things: first, that the server is who it claims to be, for example bankofamerica.com, and second, that the server controls a particular set of cryptographic keys. Once the certificate authority has verified the server's identity and cryptographic material, it issues a leaf certificate that links the server's identity, typically a domain name, with the cryptographic key. The CA then signs that certificate with the CA's own cryptographic keys, which are publicly represented by a CA certificate. This process so far is known as certificate issuance.

Asynchronous to certificate issuance, CAs will try to establish trust with TLS clients. The TLS clients here will evaluate CA trustworthiness based on things such as the CA's certificate issuance practices and general security behavior. Once a TLS client has decided to trust the CA, it will include that CA certificate in its store of trusted CAs, which is also called a root store or a trust anchor store. This two-step process of trusting CAs and deploying their trust anchors is called trust management, which, as we will see, is the topic of today's talk.

Finally, when a web browser accesses a web server, the server will forward its certificate to the client in order to authenticate. The client will then check that the leaf certificate is signed by a trusted CA in its root store, and if so, the client uses the cryptographic keys in the certificate to set up an encrypted communication channel.
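As an aside, here is a minimal sketch of just that signature check, using Python's cryptography package. The file names are hypothetical, it only handles the RSA signature case, and a real client does far more: full chain building, validity periods, revocation, name constraints, and so on.

```python
from cryptography import x509
from cryptography.hazmat.primitives.asymmetric import padding

def load_cert(path):
    with open(path, "rb") as f:
        return x509.load_pem_x509_certificate(f.read())

# Hypothetical files: the client's trust anchors and the certificate the server presented.
root_store = [load_cert(p) for p in ["root1.pem", "root2.pem"]]
leaf = load_cert("server_leaf.pem")

def signed_by_trusted_root(leaf, roots):
    """Return True if some trusted root's key verifies the leaf's signature (RSA case)."""
    for root in roots:
        if leaf.issuer != root.subject:  # issuer name must match a trust anchor's subject
            continue
        try:
            root.public_key().verify(
                leaf.signature,
                leaf.tbs_certificate_bytes,
                padding.PKCS1v15(),
                leaf.signature_hash_algorithm,
            )
            return True
        except Exception:
            continue
    return False

print(signed_by_trusted_root(leaf, root_store))
```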
This certificate exchange, validation, and subsequent data encryption is the TLS protocol. So to summarize, the web PKI is composed of three components: trust management, certificate issuance, and TLS.

Over the last decade, security researchers have helped fortify the web PKI by answering some fundamental questions in each area. For TLS, the primary questions are: first, how secure is the TLS protocol, and second, how well does software implement it? Prior work has found TLS protocol vulnerabilities and used formal methods to prove the security guarantees of TLS. In parallel, software testing and analysis techniques have uncovered and patched lots of issues arising from a wide range of TLS implementations. Certificate issuance processes are a little less algorithmic and less precisely specified than TLS. The core questions here take a more inductive approach to understand who is issuing certificates to whom and how well this issuance works overall. Initial research in this area identified key ecosystem participants and evaluated their behavior. Subsequent research has then investigated vulnerable aspects of issuance practices and ultimately found ways to move towards more precise and automated certificate issuance.

Now finally, looking at this last piece, trust management is the least well-specified part of the web PKI. There's no standard that defines how a TLS client makes trust decisions; instead, each root store is basically free to trust whomever it wants for TLS server authentication. So, similar to certificate issuance, trust management requires an inductive research approach that first understands the composition and behavior of today's trust management. Unfortunately, existing research has only skimmed the surface in this regard, with a couple of studies in 2014 looking at a sliver of the ecosystem, and a recent study in 2020 that looks at the alternative certificate chains that arise from cross-signed intermediate CAs. While these existing studies contribute to our understanding of trust management, no one has satisfactorily evaluated which CAs are trusted by which TLS clients. It's kind of a basic question, but remember, this is important because ultimately a single compromised root CA can compromise the entire system and impersonate any domain on the Internet.

So today's talk bridges this gap in knowledge by answering the following questions: who is trusted for TLS authentication, who decides who is trusted, and how well placed, or how secure, is this trust? To explain how we went about answering these questions, we first need to further dissect the trust management process. The first additional detail in our simplified overview here is that root stores are often loosely coupled to TLS clients, and as we will see, many of these clients rely on root stores provided by other systems or other software. The second wrinkle is that not every root store provider negotiates with CAs to determine which CAs are trusted. Instead, root store programs establish a set of trusted CA certificates that are subsequently distributed to one or many root stores. Now that I've introduced a slightly more detailed version of trust management, I can outline today's talk, which seeks to understand the trust relationships at each step of this trust management process.
I'll first demonstrate that there's an unreliable relationship between the names found in CA certificates and the actual CAs that operate them. This mapping is opaque, and we need to develop new techniques to overcome that challenge. Next, I'll trace the provenance of root stores, starting with popular TLS clients, identifying their chosen root store providers, and then determining their origin root store programs. And then finally, I'm going to look at deviations from this default happy path of root trust and utilize new techniques to identify TLS interceptors. These interceptors shortcut root store trust by injecting their own root certificates into root stores. By combining these three perspectives, we can provide a holistic view of the trust relationships in the web PKI. As a sneak peek, or spoiler, today's talk will find a misbehaving root CA that is ultimately distrusted by major browsers, poorly managed root stores, and hundreds of millions of man-in-the-middle TLS connections.

Here's a quick outline of today's talk. So far I've provided an overview and a little bit of motivation for why we need to understand trust management. The first step to this understanding is identifying which CAs are represented by which CA certificates, which might be something you all assumed was easy or obvious, but we'll show you why it's not. The next step traces the provenance of root store trust to investigate who's making these CA trust decisions for whom. The final work for today's discussion will identify the TLS interceptors that circumvent this whole process by injecting their own trust anchors into root stores. And then finally, I'll conclude with a discussion of broader insights and opportunities for future work.

All right, so let's start at the top and begin by determining the mapping between CA certificates and the CAs that operate them. To give you an example of why the attribution of CA certificate operators is important, let's look at the distrust of the CA Symantec. Symantec was formerly the largest CA, and in 2018 it was distrusted by all major root stores as a result of numerous certificate issuance problems. However, the process of actually distrusting Symantec was surprisingly difficult because, believe it or not, it was not clear which trusted roots were actually operated by Symantec. As an example, here are two real root certificate subject names, only one of which is operated by Symantec. As you can probably see, these root certificate names are nearly identical, but neither of them provides a clear link to Symantec. And as it turns out, despite their nearly identical naming, the first root is operated by Comodo and the second root is operated by Symantec.

So what do we learn from this little example? Here are two takeaways. First, TLS authentication occurs at the level of CAs, but it is enforced at the level of CA certificates, so there's a layer of indirection here. And second, there's no guarantee that the subject names in CA certificates actually reflect the operator of the certificate. Unfortunately, there's no prior work addressing this opacity between CAs and CA certificates, so we had to devise our own methods. Our approach is to measure different operational aspects of the CA and then group CA certificates based on shared operational features. The first set of features is derived directly from the certificates themselves.
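As a quick aside, before getting into the specific features, here's a minimal sketch of the grouping step itself: treat each CA certificate as a node and merge certificates that share any operational feature. The certificate names and features below are hypothetical, and this connected-components grouping is a simplification, not the exact pipeline from the paper.

```python
from collections import defaultdict

# Hypothetical per-certificate features of the kinds described next
# (issuance fingerprint, operational URLs, audit reports).
ca_cert_features = {
    "cert_A": {"fp:profile-1", "url:crl.example-ca.com", "audit:2020-report-7"},
    "cert_B": {"fp:profile-1", "url:ocsp.example-ca.com"},
    "cert_C": {"url:crl.other-ca.net", "audit:2020-report-9"},
}

def cluster_by_shared_features(features):
    """Connected components over the bipartite certificate-feature graph."""
    feature_to_certs = defaultdict(set)
    for cert, feats in features.items():
        for f in feats:
            feature_to_certs[f].add(cert)

    clusters, seen = [], set()
    for cert in features:
        if cert in seen:
            continue
        stack, cluster = [cert], set()
        while stack:
            c = stack.pop()
            if c in cluster:
                continue
            cluster.add(c)
            for f in features[c]:
                stack.extend(feature_to_certs[f] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters

# cert_A and cert_B share an issuance fingerprint, so they land in one cluster.
print(cluster_by_shared_features(ca_cert_features))
```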
We first build a novel certificate fingerprinting technique that captures the issuance infrastructure of the CA. We also extract URLs from certificates that represent the associated network infrastructure that a CA runs. Our third operational perspective comes from CA audit documents; these are reports written by auditors who visit CA data centers and verify the certificates under a CA's control. We parse these reports and automatically extract the CA certificates in scope for each audit. Then finally, we combine these features to generate clusters of CA certificates under shared operational control.

The next step is, okay, we have these clusters and we want to make some sense of them, so we need to label them. For this step we utilize the Common CA Database, also known as CCADB, which is organized by Mozilla as a central place for CAs to provide their audit and policy documents. CCADB has an owner field for each CA certificate, but this is more of an administrative owner and less of an operational owner, and it has an unreliable relationship with operational control. So we can only use CCADB as a guide: we carefully apply CCADB labels to these clusters, do some label correction and expansion, and finally generate a best-effort dataset of CA certificates and their operators. Here's a full overview of the pipeline I just described. In the interest of time today, I won't discuss the details of each step or how we validated its accuracy; I'll instead skip ahead to the results, but feel free to read the paper or ask me later if you have more questions.

So what are some of the things we found in this new dataset? First, we found a CA certificate belonging to MULTICERT that was improperly reported as being a Camerfirma CA certificate. This contributed to the growing evidence of Camerfirma's inability to properly run a CA, and as a result, in early 2021, Mozilla and Google completely removed Camerfirma from their root stores, effectively ending their CA business. Second, our data pipeline and manual investigation identified 241 CA certificates with operators that did not match the CCADB label. Again, this highlights the administrative role of CCADB rather than its usefulness for determining operational control. We also added new CA operator labels for 651 previously unlabeled CA certificates, which greatly expands our view of the existing CA ecosystem. As a result of our findings, CCADB has actually proposed improvements to its processes to automate sub-CA consistency checking and to add new CA certificate ownership tracking. Finally, taking a step back, this work improves CA transparency more broadly. This new transparency has already provided benefits to root store operators who are deciding which CAs to trust, and it can lead to more accurate research and attribution when CA issues do arise.

So now that I've given you a synopsis of how we determined the operators of CA certificates, we will move down the trust management chain to shed light on the relationships between CA certificates, root programs, root store providers, and the TLS software that relies on the web PKI. Today's web ecosystem consists of a diversity of TLS clients that are implemented by different TLS user agent software, and it is often unknown which root stores they rely on and who manages those root stores.
For example, let's look at an imaginary user named Alice. Let's say Alice is a little edgy and uses Firefox as their default web browser, but for online shopping Alice uses Safari because it supports Apple Pay. Furthermore, let's say Alice uses Slack for work communication, and because Alice works in tech, they often use curl in the terminal. So in total, Alice is using at least four different TLS user agents here, and it turns out that each of these TLS user agents has a different root store and differing trusted CAs, so they're exposed to potential attacks in many different ways.

As mentioned at the beginning of this talk, very little work has actually looked at the trust decisions that TLS user agents make, and these decisions are important because trusting the wrong CA can have big consequences. For our research, we first begin by determining the root store providers that different user agents rely on. Then we examine the relationships between root store providers to discover how root store providers even choose which CAs to trust. As a spoiler, they actually rely on a surprisingly small number of root programs, which we then characterize by comparing their observed security behavior. And then finally, given that root store providers often outsource CA trust to root programs, we evaluate the trust-copying properties of root store providers to understand the consequences of this copying behavior.

Now, the first challenge we encounter when determining TLS user agent trust is the sheer number of TLS user agents. People normally think of browsers, but there are also API clients and all sorts of other software using TLS. Since we don't have a scalable mechanism to enumerate TLS user agent root stores, we instead take a two-pronged approach. First, we manually collect the root stores of the most popular user agents seen at a global CDN, and then we supplement these with a best-effort compilation of different TLS libraries and popular clients. This approach will, at the very least, approximate the most popular user agent root stores while trying to capture some of the longer tail as well. The downside is that it's not complete, and we do miss the very long tail of TLS user agents, which contains less visible root stores that are potentially poorly scrutinized and poorly secured. That's an area of future work.

Anyway, to be more concrete: we looked at the top 200 user agents at a CDN, and we were able to manually collect root stores for about 83 percent of the top 200. We also expanded this to a couple dozen additional TLS libraries and clients, some of which are shown here. In the next step, we manually traced each of these user agents and libraries to their default root store, and we find that these dozens of user agents rely on a condensed set of root store providers. A lot of them just use the base operating system, but there are some exceptions, including Mozilla's NSS, Java, and Node.js, which provide their own root store for TLS applications to use. This initial finding indicates that most TLS user agent developers don't make their own trust decisions for TLS; instead, they trust a root store that's provided by other software. So the question is, how far does this trust delegation go? Do root store providers even make their own decisions about which CAs to trust?
In order to answer this question, we cluster the historical root stores for each provider to see how they are related. Here we apply unsupervised metric MDS, multidimensional scaling, to the Jaccard set distance between each root store snapshot. I know that's a mouthful; in other words, we take the pairwise Jaccard distance between each historical snapshot of each root store, which lives somewhere in high-dimensional space, and then we display a two-dimensional representation that attempts to best preserve the distances between each point.
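To make that concrete, here's a minimal sketch of the embedding step with scikit-learn, using a few hypothetical root store snapshots in place of the real historical data.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical snapshots: each is the set of trusted root certificates in a dated root store.
snapshots = {
    "nss_2015": {"r1", "r2", "r3"},
    "nss_2020": {"r2", "r3", "r4"},
    "microsoft_2020": {"r2", "r3", "r4", "r5", "r6"},
}
names = list(snapshots)

def jaccard_distance(a, b):
    return 1.0 - len(a & b) / len(a | b)

# Pairwise Jaccard distance matrix between all root store snapshots.
dist = np.array([[jaccard_distance(snapshots[x], snapshots[y]) for y in names] for x in names])

# Embed into 2D while preserving the pairwise distances as well as possible.
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(dist)
for name, (x, y) in zip(names, coords):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```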
From this figure, I'd like to highlight a few observations. First, we see four clusters, which align with the four root programs we have: Microsoft, NSS, Apple, and Java. The yellower points here represent newer root stores, and we see that even over the last ten years these four root programs haven't converged; they operate independently. The only cluster containing multiple root store providers is the NSS cluster, second from the left here, which contains Linux distributions, Node.js, and Android. I'll come back to this in a couple of slides. Now, looking back at the ecosystem diagram, our second discovery is that root store trust condenses even further into this inverted pyramid structure that rests on four independent root programs. The diversity of TLS user agents actually hides the centralization of TLS authentication trust: hundreds of TLS user agents trust essentially three root store programs. I say three here because Java is rarely used by any of these TLS user agents.

So given the centralized, critical importance of these root programs as the basis for almost all TLS authentication trust on the web, we want to understand how these programs compare with regard to their security properties. We first examine a few major CA distrust events over the last decade and look at when the distrusted roots were actually removed from the different root programs. I won't go through the details of each case, but I do want to highlight the fact that Microsoft and Apple still continue to trust roots that all other programs have decided to remove. The reasons for this decision are opaque, but it could be the result of management carelessness or an intentional contradiction of what the broader community has decided are untrustworthy root certificates. In addition to evaluating the removal of clearly objectionable CAs, we also look at the security hygiene of these different root stores by comparing when they removed weak cryptography. We look at two cases, 1024-bit RSA and MD5 signatures, and in both cases Microsoft trails Apple and NSS. Java is the slowest to remove weak crypto, but again, it isn't really used by anyone. Overall, we find that Mozilla is typically the most responsive root store program. Not only does Mozilla promptly remove unhygienic or untrusted CAs, it actively maintains hygiene by pruning things that should be removed or that are going to become insecure in the future.

An additional comparative analysis... yes? So the question is, why is Mozilla behaving better, from our perspective, than the others? I think it's a complex answer. Part of it is that we don't have good visibility into why they're making these decisions; a lot of it happens under the hood. But if we're to hypothesize as to why: there are all sorts of business relationships between software providers and CAs, and there are considerations of, hey, if we remove this CA, maybe some services will break, and we're willing to risk having a slightly insecure root CA compared to having a lot of these underlying services not work. So there are trade-offs being made by some of these other programs. But if we just look at it through a security lens rather than a usability lens, we can say that Mozilla is responding better to these events than some of these other root programs.

So, additional comparative analysis finds that Microsoft actually has the largest and most permissive root store program, and this often includes nation-state super CAs that act as a sort of trust gateway for unknown other CAs. Basically, once you trust the super CA, you're trusting who knows what else. All of these super CAs have actually been rejected by NSS and Apple, which is a sign that they run more restrictive root programs. But again, it comes back to this point: there are other external considerations around business agreements and that kind of thing. Lastly, we find that Mozilla's NSS also runs the most transparent root program: they publicly detail every CA inclusion and removal, and they require community discourse for any changes to the root store. On the other hand, Apple and Microsoft don't have this sort of process, so we don't have visibility into what they're doing.

Given this characterization of independent root programs, it's perhaps unsurprising that Mozilla's program is the basis for all of the root store providers that copy their trust from a root program. In our dataset, we find that Linux variants, Android, Electron, and Node.js all copy from Mozilla. Since a majority of the root store providers here opt not to run their own root program and instead copy Mozilla, the last thing we want to look at is the implications of this trust copying. The first aspect of NSS-derivative trust is to understand how frequently root store providers copy NSS and how up to date they are. On the y-axis here we have NSS versions that introduce a root CA change, which can be a new inclusion, a removal, or a partial distrust of some CA, and on the x-axis we have the date of that change. As you can see, NSS has made pretty regular changes to its trusted CAs over time; it's maintaining its root store. As an example of update delay, we have Node.js's copying of NSS. Each tick mark here represents a major Node.js version update, and two observations emerge from this figure. First, Node.js does not always update its root store when making major version changes, as we can see in some of these flat-lined tick marks: Node.js gets updated, but the root store does not. Second, even when it does update its root store, Node.js doesn't always update to the most recent NSS version, as you can see from some of the gaps between the NSS and Node.js lines. To generate a metric that encapsulates the average state of root store copying, we integrate the area between the NSS update line and the Node.js line. This yields the number of version-days that Node.js lags behind NSS, and we normalize this by the number of days. We find that, on average, Node.js is about 2.5 versions behind NSS.
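Here's a minimal sketch of that version-days metric, using made-up release dates rather than the real NSS and Node.js timelines.

```python
from datetime import date

# (date, NSS root store version): each entry is a hypothetical NSS release that changed roots.
nss_releases = [(date(2020, 1, 1), 1), (date(2020, 4, 1), 2), (date(2020, 7, 1), 3), (date(2021, 1, 1), 4)]
# (date, NSS version shipped): hypothetical derivative (Node.js-style) root store updates.
derivative_updates = [(date(2020, 2, 1), 1), (date(2020, 9, 1), 2)]

def version_at(timeline, day):
    """Step function: most recent version adopted on or before `day` (0 if none)."""
    current = 0
    for d, v in timeline:
        if d <= day:
            current = v
    return current

def average_version_lag(nss, deriv, start, end):
    """Integrate the gap between the two step functions (version-days), then normalize by days."""
    total_version_days = 0
    days = (end - start).days
    for i in range(days):
        day = date.fromordinal(start.toordinal() + i)
        total_version_days += version_at(nss, day) - version_at(deriv, day)
    return total_version_days / days  # average number of versions the derivative lags behind

print(average_version_lag(nss_releases, derivative_updates, date(2020, 1, 1), date(2021, 6, 1)))
```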
Now, when we look at all NSS-derivative root stores, we find that Ubuntu and Debian actually update their root stores more frequently, but they're still about two versions behind. In the worst case, Amazon Linux, a popular OS for cloud services, is nearly five versions, or almost a year, behind NSS. In the simplest explanation, these findings indicate a lax manual update process. But in the worst case, these findings indicate an intentional reluctance to update due to differences between NSS use cases and NSS-derivative use cases.

Now, in addition to understanding how frequently derivatives copy from NSS, we also want to know how faithfully these derivatives copy. We find that all NSS derivatives actually alter the set of trusted CAs found in Mozilla's root program, and we wanted to understand why there is this alteration of trust. Let's look at a few cases; in the interest of time, I'll just discuss a couple of these. One cause of this customized trust is the inability of derivative root stores to express partial distrust, which is exemplified by the Symantec distrust. NSS has the ability to partially distrust a CA certificate, and this partial distrust was applied to Symantec: basically, they said all certificates issued by Symantec after this date are distrusted, and all certificates issued before are trusted. Because NSS derivatives can only fully trust or fully distrust a root, this partial distrust of Symantec introduced a dilemma for them. Do they fully distrust Symantec roots and break some web services, or do they continue to fully trust Symantec roots and risk their security? Most derivatives chose the latter option, so they actually still trusted Symantec roots even after NSS removed them in 2018. And as a side effect, they also missed the other root changes, because they basically skipped the update that removed Symantec trust.

Our investigation of deviant trust also revealed application developer confusion. NSS only trusts these certificates for two purposes, TLS authentication and email signing, but the CA certificates are included in root stores that are unaware of what these roots are trusted for; they're unaware of the trust purpose. This opens the door for CA certificates to be trusted by applications for other purposes. For example, when Debian and Ubuntu distrusted Symantec roots, Microsoft's .NET package manager, NuGet, stopped working because it relied on Symantec roots not for server authentication, but for code signing and timestamping. This is weird because NSS has never trusted any of these roots for those purposes. Not only does this reflect root misuse by applications, but to make things worse, Debian and Ubuntu actually reversed their distrust of Symantec in order to support this code-signing use case, which is kind of a chaotic outcome. In other words, Debian and Ubuntu re-introduced a distrusted root in order to satisfy a use case that should never have existed in the first place. There's clearly some progress that needs to be made in terms of holding some of these providers accountable and tracking what's actually being trusted in these different root stores.
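Before summarizing, here's a minimal sketch of that partial-distrust dilemma: an NSS-style trust anchor can carry a distrust-after date, while a flat derivative store can only include or exclude the root outright. The record format and names below are hypothetical, not NSS's actual data format.

```python
from datetime import date

# Hypothetical trust anchor records: None means no issuance cutoff.
trust_anchors = {
    "ExampleTrustworthyRoot": {"distrust_after": None},
    "ExamplePartiallyDistrustedRoot": {"distrust_after": date(2016, 6, 1)},  # Symantec-style cutoff
}

def nss_style_accepts(root_name, leaf_issuance_date):
    """Partial distrust: accept only leaves issued before the root's cutoff date."""
    anchor = trust_anchors.get(root_name)
    if anchor is None:
        return False  # root not trusted at all
    cutoff = anchor["distrust_after"]
    return cutoff is None or leaf_issuance_date < cutoff

def flat_store_accepts(root_name, leaf_issuance_date):
    """A derivative store that can only include or exclude a root entirely."""
    return root_name in trust_anchors  # the issuance date cannot be taken into account

print(nss_style_accepts("ExamplePartiallyDistrustedRoot", date(2015, 1, 1)))  # True
print(nss_style_accepts("ExamplePartiallyDistrustedRoot", date(2019, 1, 1)))  # False
print(flat_store_accepts("ExamplePartiallyDistrustedRoot", date(2019, 1, 1)))  # True: over-trust
```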
So to summarize, we performed the first ecosystem analysis of TLS authentication trust. We found that it is highly centralized into essentially three root programs: Apple, Microsoft, and NSS. The majority of this trust is actually copied from NSS, but it suffers from delayed updates and questionable trust modifications. By exploring the reasons behind this poor NSS copying, we've shown that they arise from implicit manual copying and incompatible root store designs, and we posit that explicit provenance transparency will improve the state of TLS authentication trust.

So, bubbling back up: now that we have mapped the provenance of root store trust and examined the standard path for root trust establishment, our final research study quickly looks at the underbelly of root store trust, which is TLS interceptors that inject their own root certificates into root stores. Let's go through a quick reminder of how TLS 1.2 works; this is the version of TLS that was used when we did the study. Now we have TLS 1.3, but it's similar. Here we have a TLS client and a TLS server. The TLS protocol begins with the exchange of a client hello message that specifies the cryptographic capabilities of the TLS client, which includes the cipher suites that the client supports as well as other TLS extensions. The server then responds with the server hello, which, amongst other things, includes a trusted certificate chain that can be validated by the client, and the certificate contains the public key that's used for setting up encryption for the TLS session. The remainder of the TLS handshake occurs, after which the client sends an encrypted HTTP request that contains the familiar User-Agent header, highlighted here in blue. This User-Agent header identifies the TLS client software. So that's what normal TLS looks like.

For TLS interception, which we have here, the interceptor first injects its own root into the client's root store. During active interception, two distinct TLS sessions are established: one between the client and the interceptor, and one between the interceptor and the server. The TLS session between the client and the interceptor validates because the interceptor can use its injected root to dynamically forge a trusted certificate for any domain that the client is trying to access.

But how do we fingerprint this process? Well, it turns out that the client hello contains a list of cipher suites and a list of TLS extensions, and the elements and order of these lists are highly configurable, so this yields a fingerprint for each client. We create a database of these client hello fingerprints for popular browsers, and we can use these fingerprints to identify specific browsers and even specific versions of specific browsers. Now, when interception occurs, the server will receive the interceptor's client hello rather than the client's client hello, and these two client hellos differ because the interceptor has a different TLS stack than the client. So in order to detect interception, we corroborate the HTTP User-Agent with the TLS client hello fingerprint and see if this information matches. Based on the User-Agent here, we would expect this client hello. But during interception, the server still receives the original User-Agent while the client hello changes. So from our measurement vantage point at the server, there's a mismatch between the client hello fingerprint and the HTTP User-Agent, which provides evidence that interception is happening. Now, I know that might be a little confusing.
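To make it concrete, here's a minimal sketch of the fingerprint-versus-User-Agent corroboration. The hashing resembles JA3-style fingerprinting rather than necessarily the exact fingerprint we used, and the field values and databases are hypothetical.

```python
import hashlib

def client_hello_fingerprint(version, cipher_suites, extensions):
    """Hash the ordered, implementation-specific ClientHello fields."""
    fields = [str(version),
              "-".join(map(str, cipher_suites)),
              "-".join(map(str, extensions))]
    return hashlib.md5(",".join(fields).encode()).hexdigest()

# Hypothetical databases built offline from known browsers and known interception products.
BROWSER_FPS = {"Firefox/89.0": client_hello_fingerprint(771, [4865, 4866, 49195], [0, 23, 65281, 10])}
INTERCEPTOR_FPS = {client_hello_fingerprint(771, [49199, 156], [0, 10]): "ExampleAV TLS proxy"}

def classify(user_agent, observed_fp):
    """Server-side check: does the observed ClientHello match the claimed browser?"""
    expected = {fp for ua, fp in BROWSER_FPS.items() if ua in user_agent}
    if not expected:
        return "unknown user agent"
    if observed_fp in expected:
        return "no interception detected"
    return f"likely intercepted ({INTERCEPTOR_FPS.get(observed_fp, 'unattributed interceptor')})"

# The interceptor's TLS stack produces a different ClientHello than Firefox would.
print(classify("Mozilla/5.0 ... Firefox/89.0", client_hello_fingerprint(771, [49199, 156], [0, 10])))
```

The interceptor attribution database here stands in for the second fingerprint database that I'll describe in a moment.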
So here's an alternate explanation in plain terms. HTTPS clients send two pieces of identifying information to the server: one is the client hello, and the other is the User-Agent. During interception, one of those pieces of identifying information changes, and that alerts us to the interception.

We applied this interception detection technique to three large real-world datasets that we were lucky to get: all daily update traffic for Firefox browsers, a sample of HTTPS connections to a major e-commerce site, and a 5 percent sample of all Cloudflare CDN traffic. This gives us a pretty representative view of what's going on broadly on the web. Based on these perspectives, we found between 4 and 10 percent of all HTTPS connections to be intercepted. This is way more than we expected, and more than an order of magnitude greater than had been shown in previous work. This raises a few obvious next questions: who is performing this interception, and why is this happening? To answer these questions, we compiled a second fingerprint database based on known interception products such as network middleboxes, antivirus software, parental control software, and general TLS libraries. By applying these additional fingerprints to our three datasets, we can actually identify who these interceptors are.

We found that we were able to attribute many of the top interception sources, and in the interest of time, I'll just point out two things. The first is that antivirus comprises nearly half of the top fingerprints across all three datasets, with an especially high concentration for e-commerce. This doesn't necessarily indicate insecurity on its own, but we also find that most AV software does a horrible job of securing the TLS connection between itself and the server. For example, sometimes they don't even check certificates, so anyone could man-in-the-middle this second connection that the interceptor is making. Second, looking at the Cloudflare data, we noticed a suspiciously high rate of Android interception that we could only attribute to the Bouncy Castle TLS library, and unfortunately, we were unable to determine a specific app or set of apps that was responsible for this. One hypothesis: there's prior work that has found cellular providers injecting their own root certificates, so maybe there's some large-scale TLS interception happening for Android devices.

Taking a step back, we've shown that TLS interception accounts for a notable portion of all TLS connections. And, although I didn't really present it due to time constraints, we also performed a security analysis of the TLS session between the interceptor and the server. We found that nearly all interception decreases the security of the connection compared to the original, unintercepted connection, and this actually exposes users to additional interception: by doing interception, interceptors make it easier for other people to do interception, and it snowballs from there. Finally, these interceptors are also tainting root stores with questionable root certificates that could be compromised, and our broader mapping of TLS trust needs to account for the specific interception products and libraries that we detected in our real-world datasets.

Okay, we're almost there, almost done: now for conclusions and future work. To quickly recap, this talk focused on several areas of unintended or imprudent trust by shedding first light on trust management in the web PKI.
We developed new fingerprinting techniques to examine previously opaque aspects of the ecosystem. Using these perspectives, we were able to identify the operators of CA certificates and found instances of unintended trust, which ultimately led to a root CA removal. We demonstrated the inverted pyramid structure of TLS authentication trust, and we found problematic trust practices due to implicit copying and incompatible root store designs. And then finally, we used TLS fingerprinting to detect 5 to 10 percent global TLS interception due to antivirus and middlebox products that degrade TLS security.

Taken together, these findings lead to a broader set of insights. First, protocols that support extensibility, such as TLS and X.509, which is used for certificates, have several side effects. Extensibility enables fingerprinting of different implementations and configurations, which provides opportunities for interesting measurement research. Simultaneously, this flexibility can result in misconfiguration and in ecosystem fracturing, as entities support different features at different time intervals. Second, this research is a reminder that digital identity is often a proxy for the real-world identity that we care about. Not only is it challenging to link digital identity with real-world identity, but we must constantly reassess when digital identity is transferable, such as in the case of CA certificates. Third, we find that making good trust decisions is hard, as evidenced by major CA distrust events and Microsoft's questionable trust in super CAs; even these major root programs are doing some questionable stuff. Because of this difficulty, many providers try to copy rather than implement their own trust, but even copying is hard, and so we need to build systems that make this process easier and more transparent. And lastly, we note that TLS authentication trust exists in many places, not just where you might normally think of it, between CAs and root stores. It also exists between TLS user agents and root store providers, between root store providers and root programs, and between users and TLS interception software. Increasing the transparency of all these avenues of implicit trust can improve the overall state of web PKI security.

Now, the last thing I want to do is quickly go through some possibilities for future work. Looking beyond today's presentation, I'm excited about work that further extends the transparency of trust management in the PKI. While today's presentation compared different root stores to figure out who is trusted, future work could figure out why CAs are trusted. This would involve codifying root program policies and moving towards data-driven and automated trust management through empirical CA measurement. I'd also like to research new measurement techniques to fill in the long tail of TLS authentication trust. These new techniques must go beyond manual collection of root stores and start to scalably measure the root stores of the growing number of devices using TLS, which includes the whole world of IoT; if we look hard enough, I'm pretty sure we'll find something bad there. And taking an even bigger step back, the web PKI is only one of many PKI systems deployed today; there's also DNSSEC, RPKI, and others. But the web PKI is by far the most widely deployed PKI and a good starting point for studying broader PKI system design principles.
I hope to first look at the history of the web PKI and, from this history, derive a framework of design principles that can be used to evaluate today's web PKI and also indicate promising directions for future development. I think it would also be insightful to perform a comparative assessment of existing PKIs to better understand the challenges and opportunities presented by each PKI, and ultimately come up with a global set of guidelines for developing new PKIs, because they're going to pop up in different areas in the future. Then finally, we need to design any new PKI for explicit transparency, all the way from which CAs are trusted down to the TLS user agent. One thing that we found is that today's X.509 certificate, which is the de facto standard for PKI, is actually an archaic behemoth that really needs to be overhauled to facilitate future trust systems. More broadly, PKI systems aren't going anywhere, since they are essential for scaling network authentication, and my long-term goal is to systematize our understanding of trust management and PKI and ultimately to help design the PKI systems of the future.

That's the end of my talk for today. But before I finish, I just want to thank my coauthors and advisors, without whom this work would not have been possible. And as a new postdoc researcher, I'd like to take this opportunity to share some of the other research that I've done; I'd be happy to discuss any of these topics after the talk. With that, I'm happy to take questions.

Any questions from the room? Yeah. You said that when the derivative root stores copy their trust from the source root programs, they can't support the partial distrust. Is that just because of the way it's implemented, where on Linux there's a directory that has the certificates, and anything in that directory is automatically fully trusted?

Yeah, and with NSS it's actually a somewhat more complex file that, in addition to listing the CA certificates, indicates the time periods for which those CA certificates are trusted. So it's just that the interface on Linux doesn't support partial distrust, whereas if you're using the NSS root store more natively, you're actually looking at these other fields that restrict trust. Thank you.

Nice. You said you could detect interception; how can you do that when the HTTPS traffic is encrypted?

Yeah, so if we look at this slide, this is what's happening during interception, and we're measuring things from the server. The server can actually see the HTTP request because it's decrypting all of that. Normally, if there's no interception, the server would see the first client hello and this User-Agent and say, okay, these match; we fingerprinted the client hellos up front, and they both look like, say, Firefox. But when interception happens, the interceptor uses a different client hello, because it has a different TLS networking stack, and so it sends a different client hello. Now the server will look at the User-Agent, see this client hello, and say, that's not the client hello I was expecting for this user agent, so something is going on; it's been intercepted.

Wouldn't the interceptor also change the User-Agent so the two are consistent? Yeah, so that's a good question. It turns out they don't.
Interceptors typically don't change the User-Agent, because the User-Agent affects the content that is returned. If you advertise your user agent to be a mobile device, the server is going to return a mobile page, and then the client, which gets a mobile page even though it browsed from a desktop, is going to see different things, and that will mess up the user experience.

Also, you said interception makes further interception easier; how does that work? So imagine the interceptor here is actually some antivirus software, where the client and the interceptor are on the same machine, or at least on the same network. The part of the connection that goes over the public Internet is not this first TLS connection; the first TLS connection is occurring on your local machine. The second TLS connection is the one that goes over the public Internet, and these interceptors often don't check certificates, they don't look at dates, they just do a bad job of performing TLS. So now this public part of your TLS session is really weak, and anyone on the public Internet can further intercept that connection.

We had a question online along those lines: to what degree can the interceptors mimic the client to make your approach harder, to make an interceptor harder to detect? Yeah, so I think this goes back to the earlier question. They can start to mimic the client, but there's a cost to mimicking the client. You can't just copy everything that the client does in its client hello, because then if the server says, okay, I think you can do, you know, ECDHE-RSA, elliptic curve cryptography with AES-128, then you actually have to do that when you're performing the rest of the TLS connection. You can't just advertise that you can do it; you have to be able to back it up. So there's a cost there. They could theoretically copy the entire TLS networking stack of these clients, but typically interceptors are intercepting for a variety of clients, so the interceptors would basically have to implement every possible extension and every feature of TLS in the same manner that each of these different clients does. Thank you.

Any other questions? Yeah, one more. So using antivirus software is actually making things worse, then? Yeah, I mean, antivirus software does a lot of stuff, and it may be helping in other ways, but if we're looking strictly at TLS interception, they are making things worse. Okay. All good. Let's thank our speaker. Thank you so much, my pleasure. Excellent talk. Thanks everyone for attending. I'll see you next week. Bye.