Today's talk will be on the implications of privacy-aware choice. Thank you so much for that warm introduction. The talk I'm going to give today is probably not your typical cybersecurity talk, because I'm not really going to talk about cybersecurity; instead I'm going to talk about how people interact with technology, and in particular with privacy technologies. I do have a mild speech impairment, so if I pause briefly during my talk, I promise I'm not just daydreaming.

Every day, consumers make millions and billions of observable choices: where to shop, which websites to browse, what news stories to read, what hotel rooms to book. These choices are recorded en masse, and these choice data are incredibly valuable. They're valuable for personalization to the individual, for things like targeted advertising, personalized pricing, and personalized healthcare, to give a few examples of where we can tailor things based upon the data a particular person creates through their behavior. They're also helpful at large, for making aggregate decisions: for doing machine learning, for A/B testing, and for building recommender systems. These can all be built on personal data.

These data can help us with a few important tasks. First, they can help us understand a person's preferences: what do they prefer, what choices do they make? They can also help explain why people behave the way they do: if they're performing some optimization of their own, what is it? They can be used to predict how people will behave in the future, and perhaps they move us toward the much loftier goal of really understanding something about why humans behave the way we do. For this technical audience, I can rephrase all of this by saying that personal data can be used for inference, learning, and optimization. These are tasks we want to do, we can do them better if we have data, and those data come from people.

But of course, with great data come great privacy violations. In the privacy community everyone has their favorite tale of privacy gone wrong, and mine is the story of how Target knew that a high school girl was pregnant before her parents did. The tale tells of a father who stormed into his local Target branch because his high-school-aged daughter had been receiving ads in the mail for cribs and baby clothes. He said: what are you doing, she's in high school, are you trying to encourage her to get pregnant? A week later he had to go back and apologize to the manager, because she was in fact pregnant.

How did this happen? Target was doing a very reasonable thing with data they had legally obtained. They were doing machine learning, and they realized they weren't capturing a portion of the market; to increase profits they had to expand, and the market they weren't capturing was parents. They found that if pregnant women develop brand loyalty to Target while they are pregnant, they continue shopping there once the child is born. So Target did some machine
learning, and they found that really good predictors of pregnancy were purchases of things like blankets, vitamins, and body lotion. This high school girl bought those things, and it turned out it was because she was pregnant, and Target used that information to send her targeted advertisements. Target wasn't breaking any laws; they were doing a very reasonable thing with data they had legally obtained, but it still feels kind of creepy. In fact, that was the essence of their public response: they said, we are very conservative about compliance with all privacy laws. But even if you follow the law, you can do things that make people queasy, and that gets at the essence of what's going on here: oftentimes compliance isn't enough.

The wrong response here is to not use data. That is the wrong answer; in fact I began my talk by describing how valuable personal data are and all the amazing things we can do with them. Unfortunately, that is kind of what's happening in practice: companies are getting scared. A Scientific American article described this by saying that as awareness of these privacy concerns has grown, many organizations have clamped down on their sensitive data, unsure about what, if anything, they can release. If you talk to friends working in industry, you see the same thing happening: even at large organizations, data really aren't shared across departments. For example, YouTube doesn't talk to Gmail at all. That's good for privacy, but it's really bad for the ability to make use of the data.

Maybe you feel that's okay: I don't want Skynet, so it's fine that Google doesn't share data. But there are cases where it seems intuitively natural that we should share data, because there's a large gain from it. For example, part of the HIPAA laws in the States, which govern privacy protections for people's medical data, in effect prevents doctors from going back to past patient records when trying to find a treatment for a new patient. So if a doctor says, I saw a patient last year with the exact same symptoms and found a treatment that worked, but I forgot what it was, that doctor is not legally allowed to go back and check those past patient records. This seems like a case where the fact that we don't share data is really harming people, and it may even cost lives.

So you might think: okay, fine, I'm going to use data, but I'm going to anonymize it. That seems like it should work. The answer is no: it never, ever works. Again, back to our favorite news stories. Back in 2006, AOL tried to provide a service to the research community and released anonymized versions of the search logs of about 20% of their users, and back in 2006 they actually had a lot of users, so this was a lot of people. They anonymized the logs by removing people's names, locations, and IP addresses. But think about the kinds of things you search for: Georgia Tech, your ISP, Atlanta weather; maybe you search for yourself, or your co-authors, or your own papers. Anonymizing that data is not going to
be enough to ensure its privacy. In fact, in this story some New York Times journalists did exactly this: they found one particular person, just as a way to show how it works. This is AOL user No. 4417749, which sounds perfectly anonymous, right? Except she searched for landscapers in her city, for several people with the same last name, and for homes sold in the Shadow Lake subdivision, Gwinnett County, Georgia. That narrows it down a whole lot, and in fact they found her and talked to her for the article. But she also searched for some things she probably wishes weren't made public: some health conditions, some dating sites for older adults, and "dog that pees on things." I bet she feels her privacy was not protected in this process. They did find her, she did live in that area, she apparently searched for some of those things on behalf of friends, and she has three dogs. So really, there is no such thing as anonymized data.

This quote comes from a famous privacy theorist, Cynthia Dwork, who is now faculty at Harvard and who wrote the book on the kind of privacy I'm going to talk about today. She famously says: anonymized data isn't. Meaning either it's not properly anonymized, as in the case we see here, where there's still content in the data that can leak information about individuals; or, if it is properly anonymized, we've taken out all the content of the data and it's no longer data, it's just noise. So if we want to do things in a formal way, still making use of our data while still having privacy guarantees, we need a more formal definition of what privacy means and of how we plan to achieve it.

Enter differential privacy, which was defined back in 2006 by Dwork, McSherry, Nissim, and Smith. Informally, it bounds the maximum amount that any one person's data can affect a computation over a very large database. If I guarantee that I'm not even going to learn things from you, because I'm constrained away from doing so, maybe that feels more like privacy. The definition formally says that an algorithm M, which maps from an n-tuple of types — think of this as the data coming from n people — to an arbitrary output range R, is ε-differentially private if, for all neighboring databases that are the same except for one person's data, and for every possible set of outputs of the analysis, the output distributions on D and on D′ are close. I'm guaranteeing that I will learn approximately the same thing whether or not I have your data. The closeness guarantee is a multiplicative one: it guarantees that the ratio between the probabilities of outputs under D and under D′ is bounded by the term e^ε. This ε is the privacy parameter: if ε equals infinity, the guarantee has no bite, and I'm saying you can do anything you want; if ε equals zero, this is complete privacy, and I have to output the same thing independent of your data.
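For reference, here is the standard statement of the guarantee I just described in words; the notation D, D′ for neighboring databases and S for a set of outcomes is the usual one.

```latex
% \varepsilon-differential privacy (Dwork, McSherry, Nissim, Smith 2006):
% M : T^n \to R is \varepsilon-differentially private if, for all neighboring
% databases D, D' \in T^n differing in one person's entry and all S \subseteq R,
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S].
\]
% \varepsilon = 0 forces identical output distributions (complete privacy);
% \varepsilon \to \infty imposes no constraint at all.
```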
In pictures, it looks like this. Imagine I have some blue database; I plug it into my algorithm, and it gives me this blue curve. Then person i changes her data to some new tᵢ′, which moves me between the neighboring blue and red databases; I plug the red database in, and it gives this red curve. These two curves are close, in a pointwise sense. So really this is saying I'm going to output S with approximately the same probability whether I have your data or not, whether you're truthful or you lie; it has essentially no effect on the kinds of things I do as I analyze this very large database D.

Think of S as the collection of all bad outcomes that may arise from this analysis of your data. Maybe I buy things online and I'm concerned about my dad learning that I'm pregnant; maybe I'm sharing my medical records with a doctor and I'm concerned about my insurance prices rising as a result. I'm promising that the bad thing happens with approximately the same chance whether I saw your data or never saw it at all. This ensures I'm learning properties of the population, not properties of you. In fact, it's a very strong worst-case guarantee, because it holds for all possible databases, for all people who may want to change their data, for all things they might change their data to, and for all possible collections of bad outcomes.

Differential privacy also has a whole bunch of nice properties. It isn't just a single standalone definition; it's been studied for the past twelve years, and we've seen that it behaves very well. For example, it allows us to move smoothly between the two informational extremes: I no longer have to talk about sharing data versus not sharing it; instead I can parametrize how much of your data is being leaked by the analysis I'm doing. It also has some very nice algorithmic properties. It is robust to post-processing, which means that if I publish a differentially private output, it is private forever: no adversary can go off in a corner, think really hard, and learn more about you as an individual than the ε promise allows. It's an information-theoretic guarantee, with no assumptions on the computational power or the outside information held by the adversary, so it's very strong. The guarantee also composes adaptively: as I perform more and more differentially private computations on my dataset, the privacy guarantee degrades gracefully and fairly slowly. If I do k different ε-differentially private computations, I get roughly √k·ε privacy for the entire process. This makes algorithm design very nice, because it's modular: I only need a few very simple tools, I can stack them together like tiny building blocks, and I can then analyze the privacy guarantee of an overall, very complicated algorithm just by counting how many differentially private building blocks I used. It's been studied a great deal over these twelve years, so we now have a huge collection of tools for problems in statistics, economics, optimization, and machine learning.
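As a rough illustration of what one of these building blocks looks like, here is a generic Laplace-mechanism sketch with basic composition accounting. This is a standard textbook tool rather than an algorithm from the talk, and the dataset and parameters below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """Release a noisy answer satisfying epsilon-differential privacy by adding
    Laplace noise with scale sensitivity / epsilon -- the classic DP building block."""
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Hypothetical database: a 0/1 attribute for n = 1000 people.
data = rng.binomial(1, 0.3, size=1000)
n = len(data)
eps = 0.5  # privacy parameter spent on each query

# Changing one person's bit moves the mean by at most 1/n, so sensitivity = 1/n.
noisy_mean = laplace_mechanism(data.mean(), sensitivity=1.0 / n, epsilon=eps)

# Composition: k such eps-DP queries are at worst k*eps-DP (basic composition);
# advanced composition sharpens this to roughly sqrt(k)*eps, as mentioned in the talk.
k = 10
print(noisy_mean, k * eps)
```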
Any problem you want to solve involving data can probably also be solved privately, and there probably already exists an algorithm for it. These tools are being deployed in practice by a growing collection of firms, including Microsoft, Google, Apple, Uber, Snapchat, and now the US Census Bureau, which is going to implement differential privacy in the 2020 Census. So basically, this has been a wildly successful line of work.

But there is one drawback I'm going to talk about today, which is that these tools all ignore the strategic and human aspects of privacy. They assume the world begins with a nicely curated, true database; then you pick your ε to balance the trade-off between the accuracy rights of the analyst and the privacy rights of the individuals, you do this once, and the day ends. But actually, data come from people, and the assumption of a database you begin the world with may itself depend on your choice of privacy parameter. In fact, privacy concerns affect behavior. It's no surprise to anyone here that most adult internet users have taken steps to avoid surveillance online. Think about how you might browse the internet differently if you were alone versus being watched by a boss or a loved one: you might not click certain links. These changes in behavior correspond to different data being created — if people don't click links, there is no data point saying that person X clicked link Y. So the main question is: if people can modify their behavior to ensure that different, or perhaps more favorable, things are learned about them, are their data still useful? Can we still do all the inference, learning, and optimization I talked about at the beginning?

I'm going to make three points today, about three challenges of interpreting privacy-aware choice data. First, if you just plug privacy-aware choice data into standard machine learning algorithms, you're going to learn the wrong thing — which is unsurprising, because people have specifically taken steps to make sure you learn the wrong thing. Here we might ask: how can we change the way we make inferences so that we can still say something interesting based on the data? Second, you might think that with stronger privacy policies — if I promise people I'll use their data in some differentially private way — they won't have these privacy concerns anymore, and I can go back to applying my standard machine learning algorithms. We'll see that things are kind of wonky here and don't always behave the way you'd expect when you do that. Finally, even if you have the best theoretical model in the world, it's not very helpful if it doesn't match human behavior, and people are notoriously irrational; we'll talk about how to incorporate that into our privacy models. Any questions? I'm going to go through these things pretty fast, so don't be alarmed. [Audience question] Yes — I'm actually glad I'm not the only one. That's an excellent point, and it's actually a nice transition into how to interpret these data.
For one thing, people don't just blatantly lie constantly, nor do they make completely wild, random choices. I'd bet that when you decide to lie, you reason about the costs and benefits of that lie. For example, you might lie to an Uber driver about your life story, but you probably don't lie about your location, because they have to drive you there. So people really are doing some optimization inside their heads: how much can I lie, how much can I get away with, what does lying cost me, what do I gain? Incorporating this, how can we still make inferences about individuals?

Let's begin with a story about buying phones. We have a friend, Alice, and she's kind of old-fashioned. She goes to the phone store, has the option to buy A or B, picks the old-fashioned plain phone A, and is very happy about it. A year later her phone breaks and she goes to buy a new one, but by now a new, fancy smartphone has been invented. She still really likes A, but she knows her friends will tease her if she buys A when this new smartphone is available, so she kind of splits the difference and says, okay, fine, I'll buy B.

In the classical economic view, Alice has preferences over phones, and it's not clear how to interpret these behaviors: does she like phone A better, or phone B? The privacy-aware view is that she's actually reasoning both about the phones and about the inferences made about her based upon the phone she buys. We're going to assume there is some known observer, Big Brother — in this case Alice's friends, great friends, I will say — who observes Alice's choices. We define that setting as a choice instance, which consists of three things: first, the collection of all possible phones, here A, B, and C; second, the collection of menus Alice faces in this process — here is the first menu and here is menu two; and third, Alice's choice function, which maps each available menu to her choice — here she picked A, here she picked B. I would like to know when Big Brother can reverse-engineer Alice's preferences over phones based upon her observed choices.

Let's think about how this game goes. Alice believes Big Brother is a classical economist who will assume she chose the phone she liked the most — that her choice from some menu is preferred over all other phones that were available but not chosen. Given this, Alice actually chooses a phone based upon pairs: the phone and the inferences made about her based on that phone. That is, she chose her phone because she likes that phone, paired with the collection of public inferences she believes will be made about her, more than she likes any other phone y paired with the counterfactual inferences that would have been made about her had she chosen y instead.
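A compact way to write the setup just described — this is my paraphrase of the model from the talk, and I write I(y, A) for the inference set Alice believes the observer draws if she chooses y from menu A.

```latex
% Choice instance: a universe of alternatives X (the phones), a collection of
% menus A_1, ..., A_m \subseteq X, and a choice function c with c(A_k) \in A_k.
%
% Classical (privacy-oblivious) rationalization: there is an ordering \succ on X with
\[
  c(A_k) \succ y \quad \text{for all } y \in A_k \setminus \{c(A_k)\}.
\]
% Privacy-aware rationalization: Alice ranks (phone, inference-set) pairs and chooses so that
\[
  \bigl(c(A_k),\, I(c(A_k), A_k)\bigr) \;\succsim\; \bigl(y,\, I(y, A_k)\bigr)
  \quad \text{for all } y \in A_k ,
\]
% where I(y, A_k) is the set of inferences she believes the observer would make
% had she chosen y from menu A_k.
```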
A note on this assumption about the observer's reasoning: it doesn't necessarily have to be the way Big Brother actually makes his inferences — he can be as smart as we are and know that she is reasoning this way. This is really just her belief about how the inferences will be made. In the paper we extend this to higher-order beliefs — he thinks that I think that he thinks, and so on — but in this talk we'll stick with this version.

A privacy preference, then, is an ordering over pairs of a phone and the inferences Alice believes will be made about her based on that phone. What properties should a privacy preference satisfy? First, intuitively, it should be monotone, which says that Alice prefers more privacy to less: if we keep the phone she gets fixed, she is happier when a smaller collection of things is inferred about her. Second, because we're talking about the observer trying to infer her preferences over phones, it makes sense for those preferences to be a well-defined mathematical object that can be inferred. This says that if she prefers phone x over phone y under some inference set, then she also prefers x over y when both are paired with some other inference set — in other words, her choices are consistent with her preferences over phones.

In pictures, here is how the game works. Our friend Alice has her internal privacy-aware preference; based on it, she makes some choices; these are publicly observed by our very cute adversary, and he wants to know two things: what are the possible privacy-aware preferences that may have caused this behavior, and what are the underlying preferences over phones that she must have?

Here is part one of our main theorem: for any choice instance, there always exists a privacy preference that explains those choices. All choice behavior can be rationalized — I can always find some way to explain Alice's behavior with some privacy-aware preference. In fact, the result is much stronger than that: for any choice instance and for any hypothesized ordering over phones, there always exists a privacy preference that respects the two desired properties and is also consistent with the hypothesis. So for any conjectured preference over phones — imagine I want to test whether your choices are consistent with some candidate ordering — that hypothesis will always be consistent with all possible choice data. Basically, nothing can be inferred, ever, because any hypothesis I want to test is consistent with all possible data. This gives a very strong negative answer to the question of how we can interpret privacy-aware choice data. That's great news for the consumer, but very bad news for a firm trying to understand her preferences over its products — so that it can, for example, show her more targeted ads. The firm basically has no ability to make these inferences once we introduce the possibility that she may have privacy concerns.
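To restate the theorem from a moment ago a bit more compactly (again, my paraphrase of the result as described in the talk):

```latex
% Main theorem, as described in the talk (paraphrased):
\begin{itemize}
  \item[(1)] For every choice instance $(X, \{A_k\}_k, c)$ there exists a privacy
        preference $\succsim$ over (phone, inference-set) pairs, satisfying
        monotonicity and consistency, that rationalizes the observed choices.
  \item[(2)] For every choice instance and every hypothesized ordering $\succ$ over
        the phones $X$, there exists such a privacy preference that is, in addition,
        consistent with $\succ$.
\end{itemize}
% Consequence: any conjectured preference ordering over phones is consistent with
% any possible choice data, so the observer can infer nothing about $\succ$.
```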
One possibility is to say: maybe I can introduce a privacy policy so that my customer doesn't have any privacy concerns anymore. Can privacy concerns be fixed via privacy-enhancing technology? For example, if a company promises to implement differential privacy, will consumers go back to behaving in the traditional, non-privacy-aware sense? If so, how does my choice of privacy policy affect behavior, and how should I choose ε in my differential privacy guarantee?

Let's revisit our friend Alice. She's going to play a two-stage game. First, she decides whether to buy a magazine at some price p; then, tomorrow, she goes to apply for a loan. She has two pieces of private information that matter here: her value for the magazine, and a type that describes which of two loans is the better fit for her. Here her type is creditworthy, which means she probably deserves to be matched to this very nice-looking loan, and not to this very sketchy payday loan. Here's how it works. She first decides to buy the magazine, because she's a responsible person and she likes smart-investing magazines. You'd imagine these banks would love to know that she buys smart-investing magazines, because that's probably a good sign about her underlying ability to pay off a loan. But instead we introduce a privacy technology: the bank can't observe her purchase exactly; it only sees some noisy information about it. So I toss a coin and — oops — her "buy" choice got flipped into a "not buy," and she's probably going to get the very sketchy payday loan. Our privacy policy here is the chance that we flip her bit: the bit is correct with probability q and flipped with probability 1 − q. The main question I want to address is how the game's equilibrium changes as we vary the privacy policy — as we change q. The punchline is that it is not at all obvious; otherwise this wouldn't be a very interesting paper.

Let me give away a spoiler about this game before defining things more precisely. The outcomes are these: as we decrease the accuracy of the signal, which means more privacy for our friend Alice, we can actually see more information being revealed about the consumer. Strengthening privacy can make the consumer less happy and the bank more happy. In general, player utilities behave in surprising ways: they are neither monotone nor continuous in the privacy parameter. There can be multiple equilibria, and we don't even always have an equilibrium in this game. So basically, things are very strange here. And if you don't like this bank example, there are plenty of other settings where you make some decision today and the information is used downstream by a different firm that is going to take a risk on you as an individual — a risk that would be diminished if the firm had more information about you. For example, if you buy certain products, maybe I should give you lower insurance prices.
If you have high grades, I should give you better job offers; I may want to use your shopping behavior to target ads. But in this talk I'll keep using the example of past purchase behavior and loans.

Here is the formal definition of the game. The same consumer interacts sequentially with two different firms. On the first day she has a value v drawn from some prior F, and there is a posted price p chosen by the seller. Alice decides to buy or not, which creates a bit b that is 1 if she buys and 0 otherwise. The privacy intervention flips her bit with probability 1 − q, so b̂ is a noisy version of her purchase choice, and that is what gets passed downstream to the period-two firm. The next day, when she applies for a loan, there are two possible types, t1 and t2. Think of t1 as the high type — having a high credit score — and the probability of being the high type, given your value, is an increasing function of your value: if you have a higher value on day one, you're more likely to be the high type on day two. There are two possible loans, a nice one and a sketchy one. Consumers all prefer the nice loan — of course, who wouldn't — but the bank prefers to screen: it wants to give the high-type loan to the high-type buyer and the low-type loan to the low-type buyer. That's a lot, but here are the important details: the bank sees a noisy bit that is correct with probability q; the high type is t1; there's a good loan A; and if you have a higher value in the first period, you're more likely to be the high type, so the lender is more likely to want to give you the better loan.

We can find this game's equilibria by backwards induction. I'll need the key fact that all equilibria follow a cutoff strategy in the first period: there is some value v* such that players above that value buy and players below it don't. Given that, in period two the bank sees the realization of the noisy purchase bit, knows the noise and knows v*, and computes a posterior over types. The bank gives loan A if and only if the expected utility of doing so, given the realization of the bit, is higher than the expected utility of giving loan B. If you do the math, this boils down to a threshold condition on the posterior belief that the consumer is the high type; the threshold itself is something I won't dwell on in this talk, but it is a function of a bunch of the other quantities from the previous slide. So really, there are just two possibilities: either the bank treats everyone the same and gives them the same loan, or it gives some people the good loan and others the bad loan — pooling or separating equilibria.
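To keep the notation straight before going through those two cases, here is a compact summary of the structure just described. The notation is mine; the threshold I call η below is the quantity the talk alludes to but does not spell out.

```latex
% Period 1: value v ~ F; seller posts price p; purchase bit b = 1{buy}.
% Privacy intervention: the bank observes \hat{b}, where
\[
  \Pr[\hat{b} = b] = q, \qquad \Pr[\hat{b} = 1 - b] = 1 - q .
\]
% Period 2: type t \in \{t_1, t_2\}, with \Pr[t = t_1 \mid v] increasing in v.
% In any equilibrium, period-1 play follows a cutoff v^*: buy iff v \ge v^*.
% The bank, knowing q and v^*, forms a posterior from \hat{b} and grants the
% good loan A exactly when that posterior clears a threshold:
\[
  \text{give loan } A \iff \Pr[t = t_1 \mid \hat{b},\, v^*, q] \;\ge\; \eta ,
\]
% where \eta depends on the loan payoffs (a model parameter not detailed in the talk).
```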
If we have a pooling equilibrium, that's kind of the boring case: everyone gets the same loan. In that case, period-one consumers have no incentive to misbehave, because what you do today doesn't affect your payoff tomorrow, so you might as well behave myopically. The consumer buys exactly when her value is above the price, and the seller simply sets the monopoly price given the prior. The theorem says all of that in a rather cumbersome way: a pooling equilibrium exists exactly when this behavior induces posterior beliefs that make the bank want to give everyone the same loan.

The more complicated version is the separating equilibrium, where the bank wants to condition the terms of the loan on past purchases. In this case, a period-one consumer knows that her behavior today may affect her future loan, so she is strategic when deciding whether to buy the magazine. She buys if and only if her value from buying plus her continuation payoff is greater than her payoff from not buying. If she buys in period one, she gets the difference between her value and the price, and then with probability q her bit is not flipped and she gets the good loan; if she doesn't buy, she gets zero today, and with probability 1 − q her bit gets flipped and she is shown the better loan tomorrow. These two must hold with equality for the marginal buyer — the marginal buyer must be indifferent between buying and not buying. I can solve that, and it tells me v*; the seller then posts the price that maximizes profits given this purchase behavior. Again, we have a somewhat ugly theorem that says a separating equilibrium exists exactly when this behavior induces a posterior that makes the bank want to treat the two realizations of b̂ differently.

That's all for one single, fixed q, but really we'd like to know how things change as we increase q. If I increase q, the signal is correct with higher probability, and that should correspond to less privacy, because I'm sending a more accurate signal. On the other hand, we had this condition for when buyers choose to buy, and increasing q pushes v* down. So what's really going on is that as I increase q, I'm sending a more accurate signal that simply has different content. Without knowing more about the parameters of the game, it's unclear whether it's better to have a less accurate signal based on the old cutoff or a more accurate signal based on the new one; it depends on how many people live in the space between the two cutoffs, and on the payoffs. So I can't say uniformly that more privacy is better or worse. One thing that is really happening here is that consumers are giving themselves privacy through their own actions — they're essentially buying privacy. The reason the cutoff moves is that people who were not buying before, because their value was too low, now begin to buy anyway in order to send a more favorable signal about themselves. They pay for it by buying the good at a price above their value, in exchange for more privacy in this game.
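For reference, the marginal-buyer condition just described can be written out explicitly. Here Δ is my shorthand for the consumer's payoff gain from receiving the good loan rather than the bad one (the talk doesn't name this quantity), and I assume, as in the separating case described, that the bank grants the good loan exactly when it sees b̂ = 1.

```latex
% Continuation payoffs of the period-1 choice:
%   Buy:      (v - p) + q\,\Delta         (bit transmitted correctly w.p. q)
%   Not buy:   0      + (1 - q)\,\Delta   (bit flipped w.p. 1 - q)
% The marginal buyer v^* is indifferent:
\[
  (v^* - p) + q\,\Delta \;=\; (1 - q)\,\Delta
  \qquad\Longrightarrow\qquad
  v^* \;=\; p + (1 - 2q)\,\Delta .
\]
% For q > 1/2, raising q (a more accurate signal, i.e. less privacy) pushes v^* down,
% matching the comparative static described in the talk.
```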
I'm going to skip a few of these things for time and just show this one. Here I've instantiated the game with one particular parametrization: values are distributed uniformly on [0, 1], and there is a linear correlation between your value and your type. On the x-axis I plot q, the probability of the bit being correct. If q equals one, there is no privacy; if q equals one half, I flip everyone's bit with probability one half, which is complete privacy. On the y-axis I plot the mutual information between the buyer's type and the posterior belief about the type — a way to capture how much information is being leaked about the buyer through the noisy b̂ that we send. The most surprising thing is that this curve is non-monotone. You might think: over here customers have no privacy, that's bad, I want to give them more privacy, so I'll lower q. But doing that might actually cause even more information to be learned about these individuals. So in strategic environments, implementing privacy policies without thinking about the incentives may lead to exactly the opposite of the intended effect.

I'll go through these quickly. Here, for the same game, we plot the players' payoffs: the black line is the bank, the red line is the seller, and the blue line is the consumer. A few weird things are happening. Notably, there are dramatic jumps between these two points, and there is a region over here in which there are no equilibria at all. So again, if I were to tweak q just a little bit — say I want to help consumers, so I increase privacy — I might actually make them much worse off, going from here to here. And of course, the different players in this game have different preferences over the possible values of q. The main takeaways are that if you just naively add privacy to games, it may have exactly the opposite of the intended effect, and that privacy policy makers have to decide whom they are really trying to help with these tools.
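As an aside on how a mutual-information curve like the one just described can be traced out, here is a rough simulation sketch under the stated parametrization (values uniform on [0, 1], P(high type | v) = v). The price p and loan-payoff gap delta are made-up numbers, and the cutoff formula is the simplified marginal-buyer condition from above rather than the talk's full equilibrium computation, so treat this as an illustration, not a reproduction of the talk's figure.

```python
import numpy as np

def mutual_information(joint):
    """I(T; Bhat) in bits for a 2x2 joint distribution over (type, observed bit)."""
    pt = joint.sum(axis=1, keepdims=True)   # marginal over type
    pb = joint.sum(axis=0, keepdims=True)   # marginal over observed bit
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pt @ pb)[mask])))

def leaked_info(q, p=0.3, delta=0.4):
    """Information leaked about the type via the noisy purchase bit at accuracy q.
    Values v ~ U[0,1], P(high type | v) = v; the cutoff uses the illustrative
    simplification v* = p + (1 - 2q) * delta."""
    v_star = np.clip(p + (1 - 2 * q) * delta, 0.0, 1.0)
    # Joint of (type, true bit): P(hi, buy) = integral_{v*}^{1} v dv, etc.
    p_hi_buy = (1 - v_star**2) / 2
    p_hi_not = v_star**2 / 2
    p_lo_buy = (1 - v_star) - p_hi_buy
    p_lo_not = v_star - p_hi_not
    true_joint = np.array([[p_hi_buy, p_hi_not],
                           [p_lo_buy, p_lo_not]])
    # Flipping channel: bit transmitted correctly w.p. q, flipped w.p. 1 - q.
    channel = np.array([[q, 1 - q],
                        [1 - q, q]])  # rows: true bit (buy, not); cols: observed bit
    noisy_joint = true_joint @ channel
    return mutual_information(noisy_joint)

for q in np.linspace(0.5, 1.0, 11):
    print(f"q = {q:.2f}  leaked information = {leaked_info(q):.4f} bits")
```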
Finally, I'm going to talk very briefly about some ongoing experiments — briefly because they're still in progress, so I have much less to say. This work tries to test how people actually reason about privacy. In theory, we assume people are rational utility maximizers doing some kind of smart optimization; in practice, people give away their personal data for a cookie. There was a performance art piece in which an artist asked people to give away their birth date, their mother's maiden name, a photo of themselves, and their social security number in exchange for a cookie — and people largely did it. There was a follow-up experiment at a school where people were asked to give away five of their friends' email addresses in exchange for a slice of pizza, and they did that too. This does not sound like a rational utility maximizer. The main questions we're trying to address are, first, do people even care about privacy? These experiments suggest maybe they don't, yet we all hear everyone complaining about their privacy being violated, so they must care at least a little bit. And second, does the value of privacy have the same properties that are assumed in our theoretical work?

One big challenge is deciding what embarrassing data to use in the lab. Maybe I should use real data — your transcripts, your medical records — or blackmail you into revealing deep, dark personal secrets. That would be great, and you'd definitely care about its privacy, but for a variety of moral and also legal reasons we shouldn't and can't do that. Maybe I could just make up data for you to care about: I'll randomly generate it and announce that your deep dark secret is "three," and ask you to protect it. But you have no reason to value the privacy of that beyond whatever I tell you to and compensate you for during the experiment, so that doesn't capture things either. Instead, we had to create somewhat embarrassing data through participants' own behavior in the lab.

The core of the experiment is that we had four hundred and fifty people come into a lab, in groups ranging in size from two to twenty, and play a dictator game in which they split twenty dollars pairwise. One person decided the split, and the other person had no say and simply had to take it. Afterwards, we announced to the room a noisy version of the split: the true split with probability p, and a random split of the twenty dollars with probability 1 − p. We varied this privacy parameter across treatments, with controls of complete information versus no information. Here's a slide I'm going to mostly skip, but I'll just say: these are the seven cases we used — the chances of announcing the true behavior — and we wanted to test whether people behave consistently with what theory predicts. For time, I'm going to skip ahead and show you the punchline; I'm happy to come back and talk more about the experiment, but I also want to respect your time.

So here are the outcomes. On the x-axis is the probability of a random allocation: at 0 there is no noise and no privacy; at 1 it is complete noise and complete privacy. We see the results are mostly monotone — mostly decreasing, in the sense that as people get more privacy, they have a diminished ability to publicly announce "I'm a nice person," so they decide to keep more of the money. There are some strange things we don't yet understand, such as this spike here: we expected a dramatic dip at one extreme, but we did not expect giving to go up as people got more privacy, and we really don't understand that spike. What we're doing now is running similar experiments across a variety of games, to see whether the spike persists and whether we can match it with existing or new theoretical models.
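For concreteness, here is a tiny sketch of the announcement mechanism as described. The function and parameter names are mine, and the assumption that the random announcement is uniform over whole-dollar splits is an illustrative guess rather than a detail given in the talk.

```python
import random

def announce_split(dictator_keeps, p_true, total=20, step=1):
    """Announce the dictator's split to the room: the true amount kept with
    probability p_true, otherwise a (here: uniformly) random feasible split."""
    if random.random() < p_true:
        return dictator_keeps
    return random.randrange(0, total + step, step)

# Example: a dictator keeps $15 of $20 in a treatment where the true split
# is announced with probability 0.8.
print(announce_split(15, p_true=0.8))
```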
So, why does privacy research matter? In this room I don't have to proselytize too much, but here are a few reasons. First, privacy concerns affect behavior, and you may learn the wrong things if you don't incorporate this into your inferences about people from their data. Second, privacy policy is not a quick fix, and there may be surprising effects driven by incentives. Finally, humans are not rational, and their behavior may not match your theoretical models. Really, all of this is because privacy is not a one-size-fits-all problem: I can't slap on one single tool and declare that everything is private now. It is very hard and very much ongoing, and there is a lot in the privacy space that we don't yet understand, in particular as it pertains to incentives and to humans. That's all — thank you.

[Audience question] Yes, thanks for the question. Exactly — in fact, that was a point of the experiment: to use the power of differential privacy, which lets us explore the space between the two extremes. It's no longer just about sharing the data versus not; I can say there is a parameter that quantifies how much of your information is being leaked. Ideally we'd like to extend this to ask: if I gave you ε versus ε′, how much would you pay for that difference in privacy guarantee? [Follow-up] Yes, of course — that's sort of what this slide hoped to do: to show how people behave in the space between these extremes, and then try to map that back onto expected utility theory.

[Audience question] Yeah — it's been observed across a whole bunch of different settings that people behave differently when they're being watched versus when they're not. This was actually very important in the last election: in the polls, everyone said they were going to vote for Hillary or for some third party, and then of course, in the privacy of the voting booth, we had a whole lot of Trump voters. They were even named the "shy Trump voters," because they were too embarrassed to say it out loud. Maybe if we had some way to map people's observed, privacy-aware announcements of how they're going to vote back to their true preferences over candidates, we might have seen that coming. Being frank, I think we're far away from having that kind of analysis, and people who do machine learning are probably much better at making those kinds of inferences.

[Audience question] That actually raises a couple of important points. The first is that we never used the word "privacy" in the experiment, because we didn't want to prime people; we talked about the randomness of the announcements to the others in the room, and described it that way. In fact, we asked participants in an exit survey what they thought the experiment was about, and a few said privacy, but not very many. One real challenge was explaining the game to people. You'd think it's obvious — I can describe it in one slide — but people did not get it, and we think that may be a reason for the spike. We gave them a comprehension check on whether they
understood the rules of the game, and there was about a sixty percent pass rate. So even this very simplified version of the game, people didn't fully get; hopefully the next experiments will make things a little clearer. Also, we never talked about differential privacy or about an ε. In terms of the privacy-accuracy trade-off, the technique we used here — randomizing their data — is far from optimal. There are much better algorithms for achieving privacy, but they're really hard to explain to people: they involve drawing noise from distributions that are not Gaussian and perturbing outputs with somewhat wonky distributions, and if participants didn't understand this mechanism, there's no way they'd understand that one.

[Audience question] Yeah — and that's why I say we're pretty far from being able to make these kinds of meaningful inferences for important things like, say, presidential elections, when we still don't understand how people split twenty dollars. You're right, there's a long way to go. On the hypotheses slide that I skipped today, we really wanted to find qualitative rather than quantitative guarantees: it's not very valuable to say that people value privacy at three dollars in this game. I really want to know whether their privacy preferences are monotone in the amount of privacy they get, and things like that, which will hopefully extend to much more general and much more useful settings. Thank you.

[Audience question] Yeah, absolutely. Let me go back to one of the earliest slides — here, actually, I'll just go here. I said this is an amazing definition that really gives us the privacy guarantees we want, that it's used in practice, and that it's helpful for a large database of size n. But the work I presented today is about the case where we have a consumer who is, say, logged in to some site, so the firm knows exactly who she is. There, it isn't really meaningful to say I'll output the same thing if one person changes her data, because that one person really is the entire database. There is a whole other branch of work on the case where you have a very large database; that's honestly the more traditional differential privacy setting, where one firm wants to, for example, share the database or publish analyses of it. That's what the Census Bureau will be doing: they'll collect everyone's data and then publish a few thousand tables and statistics about it. It's also helpful if, for example, you have a database and want to publish a privatized version of it — as in the AOL case, only done correctly. Differential privacy can in fact help you share a large database publicly while still giving formal privacy guarantees.

[Audience question] That is a very fair question, and we specifically controlled for it. Two of my collaborators on the experiment are Israeli, and we thought about running it there, but we decided there's a very different cultural norm in terms of sharing, in terms
of how much you give to others, and in terms of norms of generosity, so we controlled for that by running it in the States. I think it would be very cool to run the same thing in both places and see how the outcomes differ. But that's an excellent point: this is all based on the assumption that we want privacy and that privacy is good, and in countries like China that's not really the case.

[Audience question] Yeah — so you're really asking about a case where people want information about themselves to be shared, for better or for worse; they actually want things to be known about them. I don't think differential privacy captures that, because it is purposely trying to prevent things from being leaked about the individual. I suppose you could reverse it and say: actually, I want to make sure I leak at most a certain amount about you. Part of my work that is related to this, but in a different way, thinks about reversing the direction of the privacy guarantee for the purposes of obfuscation: saying, I want to ensure that I add noise that will hide at least this much of my information, and then asking how large that noise must be. That's different from the case where you actually want information leaked, although it may be a way to combat it: if those parties are going to add noise to their announcements, then perhaps a defense agency can say, I'm going to add at least this much noise so that no one can see the announcements of these bad actors. All right — thank you.