Privacy-preserving Statistical Tools: Differential Privacy and Beyond

dc.contributor.advisor Cummings, Rachel
dc.contributor.advisor Mei, Yajun
dc.contributor.author Zhang, Wanrong
dc.contributor.committeeMember Krehbiel, Sara
dc.contributor.committeeMember Davenport, Mark
dc.contributor.committeeMember Wu, Jeff
dc.contributor.department Industrial and Systems Engineering
dc.date.accessioned 2021-06-10T16:57:57Z
dc.date.available 2021-06-10T16:57:57Z
dc.date.created 2021-05
dc.date.issued 2021-05-01
dc.date.submitted May 2021
dc.date.updated 2021-06-10T16:57:57Z
dc.description.abstract Differential privacy has emerged as the de facto gold standard for protecting the privacy of individuals when processing sensitive data, owing to its powerful formal guarantees. Several companies, including Google, Apple, and Microsoft, have deployed differentially private tools, but barriers remain between such systems and full-featured privacy-preserving data analytics. This thesis focuses on two main challenges: private online decision-making and privacy of dataset-level properties. Most existing differentially private tools are designed for static databases with non-adaptive analysis, yet modern data analysts interact with online datasets adaptively. The first part of this thesis studies private algorithms for two classical statistical online decision-making problems. In Chapter 2, we study the statistical problem of change-point detection through the lens of differential privacy. This problem arises in many important practical settings involving personal data, such as identifying disease outbreaks based on hospital records, or IoT devices detecting activity within a home. We give the first private algorithms for both online and offline change-point detection, and we prove differential privacy and accuracy guarantees for them. We also give the first finite-sample accuracy guarantees for the standard non-private MLE. Additionally, we provide empirical validation showing that our algorithms perform well in practical use. In Chapter 3, we study False Discovery Rate (FDR) control in multiple hypothesis testing under the constraint of differential privacy. Unlike previous work in this direction, we focus on the online setting, meaning that a decision about each hypothesis must be made immediately after the test is performed, rather than after the output of all tests as in the offline setting. We provide new private algorithms based on state-of-the-art results in non-private online FDR control.
Our algorithms have strong provable guarantees for privacy and statistical performance as measured by FDR and power. We also provide experimental results demonstrating the efficacy of our algorithms in a variety of data environments. Second, classical differential privacy does not protect sensitive global properties of a dataset, such as the distribution of race and gender among users in the training set, or proprietary information. In the second part of this thesis, we demonstrate a new dataset-level privacy vulnerability and introduce new privacy notions beyond individual privacy. In Chapter 4, we study the leakage of dataset properties at the population level. Our primary focus in this work is on attacks that infer dataset properties in the centralized multi-party machine learning setting, where the model is securely trained on several parties' data and parties have only black-box access to the final model. We propose an effective attack strategy that requires only a few hundred queries to the model and relies on a simple attack architecture that even a computationally bounded attacker can use. Our attack succeeds on different types of datasets, including tabular, text, and graph data, and leakage occurs even if the sensitive attribute is not included in the training data and has a low correlation with other attributes and the target variable. In Chapter 5, we depart from individual privacy to initiate the study of attribute privacy, where a data owner is concerned about revealing sensitive properties of a whole dataset during analysis. We propose definitions to capture attribute privacy in two relevant cases where global attributes may need to be protected: (1) properties of a specific dataset, and (2) parameters of the underlying distribution from which the dataset is sampled. We also provide two efficient mechanisms and one inefficient mechanism that satisfy attribute privacy for these settings.
dc.description.degree Ph.D.
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/64789
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject Differential privacy
dc.subject Privacy-preserving statistical tools
dc.title Privacy-preserving Statistical Tools: Differential Privacy and Beyond
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Mei, Yajun
local.contributor.corporatename H. Milton Stewart School of Industrial and Systems Engineering
local.contributor.corporatename College of Engineering
relation.isAdvisorOfPublication 278b2355-ca85-4111-b664-4d7e39f71482
relation.isOrgUnitOfPublication 29ad75f0-242d-49a7-9b3d-0ac88893323c
relation.isOrgUnitOfPublication 7c022d60-21d5-497c-b552-95e489a06569
thesis.degree.level Doctoral