Organizational Unit:
College of Computing

Publication Search Results

Now showing 1 - 10 of 2990
  • Item
    Empirical Measurements of the Security, Privacy, and Usability of Website Password Authentication Workflows
    (Georgia Institute of Technology, 2024-07-31) Alroomi, Suood
    In an era where digital interactions are integral to daily life, the security and privacy of online authentication mechanisms are crucial for protecting user data and maintaining trust in web services. Passwords, though decades old, remain the most common form of authentication and are likely to stay ubiquitous. Therefore, the web ecosystem’s security depends on how users and websites handle passwords and manage authentication. Researchers have extensively explored user behavior with passwords, offering insights into how websites should handle authentication and leading to significant updates in modern guidelines. A major gap remains, however, in understanding how websites handle authentication and whether they adhere to best practices. This dissertation aims to bridge that gap through large-scale empirical measurements of website authentication practices. I develop measurement techniques to systematically evaluate websites’ authentication policies and implementation decisions and apply them at scale to assess their authentication workflows. I reveal the disparity between modern recommendations and real-world implementations. My studies show that while guidelines inform policy decisions, barriers prevent the adoption of recent recommendations, highlighting the need for education and outreach efforts. Further, I find that poor policy decisions often align with the default configurations of web software, which can compromise security, privacy, or usability. Updating these defaults to match modern guidelines could significantly reduce vulnerabilities and promote best practices. Moreover, incorporating security features such as blocking common passwords and rate limiting could significantly enhance the security of websites, as many lack these defenses. I also identify concerning practices in authentication workflows, such as insecure communication, misconfigured HTTPS deployments, and mixed content vulnerabilities. While TLS deployment has improved, work remains to migrate all sensitive resources to HTTPS. Standardized authentication workflows with centralized security controls and outreach efforts can further mitigate inconsistencies and improve authentication security.
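    The measurement techniques themselves are not reproduced here, but the two defenses the abstract notes many sites lack, common-password blocking and rate limiting, can be illustrated with a minimal, hypothetical sketch. This is not code from the dissertation; the names and thresholds (COMMON_PASSWORDS, MAX_ATTEMPTS, WINDOW_SECONDS) are illustrative assumptions loosely in the spirit of modern guidelines.

    ```python
    # Hypothetical sketch of two defenses discussed above: blocking common
    # passwords at signup and rate limiting failed logins. Not code from the
    # thesis; names and thresholds are illustrative assumptions.
    import time
    from collections import defaultdict

    COMMON_PASSWORDS = {"123456", "password", "qwerty", "letmein"}  # stand-in for a large blocklist
    MIN_LENGTH = 8

    failed_attempts: dict[str, list[float]] = defaultdict(list)
    MAX_ATTEMPTS = 10          # allowed failures per window
    WINDOW_SECONDS = 15 * 60   # sliding-window length

    def acceptable_password(password: str) -> bool:
        """Length check plus blocklist lookup, in the spirit of modern guidelines."""
        return len(password) >= MIN_LENGTH and password.lower() not in COMMON_PASSWORDS

    def login_allowed(username: str) -> bool:
        """Sliding-window rate limit on failed login attempts for one account."""
        now = time.time()
        recent = [t for t in failed_attempts[username] if now - t < WINDOW_SECONDS]
        failed_attempts[username] = recent
        return len(recent) < MAX_ATTEMPTS

    def record_failure(username: str) -> None:
        failed_attempts[username].append(time.time())
    ```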
  • Item
    Exploring How Visualization Design Affects Perceived Message Credibility
    (Georgia Institute of Technology, 2024-07-29) Song, Hayeong
    Well-designed visualizations can leverage the strength of our perceptual capabilities and augment viewers’ cognition to help find insights about data, facilitate content comprehension, and enable informed decision-making. However, poor visualizations can obstruct understanding of the content and can even bias viewers’ data interpretation and analysis. In this context, visualization serves as a medium between viewers and the information being conveyed, shaping the message's credibility. This dissertation focuses on developing a better understanding of how combinations of different design choices affect viewers' perceived message credibility in visualization. It comprises three high-level goals:
    • Identify embellishment factors that influence perceived credibility: We identify factors that contribute to shaping viewers' perceived credibility for communicated messages (embellishment: color, imagery, or stylized fonts and shapes) in visualizations.
    • Quantify effects of embellishment factors on perceived credibility: We conduct crowdsourced studies to quantify the effects of design choices that shape people's perception of message credibility.
    • Provide design guidelines: We provide design guidelines that suggest ways that visualizations might leverage embellishment to effectively communicate engaging messages without degrading perceived message credibility.
  • Item
    Augmenting Visualizations with Statistical and User-Defined Data Facts
    (Georgia Institute of Technology, 2024-07-28) Guo, Grace
    When designing visualizations and visualization systems, we often augment charts and graphs with visual elements in order to convey richer and more nuanced information about relationships in the data. However, we do not fully understand user considerations when creating these augmentations, nor do we have toolkits to support augmentation authoring. This thesis first outlines a design space of user-created augmentations, then introduces Auteur, a front-end JavaScript toolkit designed to help developers add augmentations to web-based D3 visualizations and systems to convey statistical and custom data relationships. The library is then customized and extended for the domains of online learning and causal inference, where users may be interested in domain-specific data relationships or work with unique chart types and data sets. Collectively, these contributions aim to help us better incorporate user-defined augmentations into visualizations for analysis and storytelling, thus conveying human context, user preferences, and domain knowledge through our charts and graphs.
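    The Auteur toolkit's own API is not shown in the abstract, so the sketch below only illustrates the underlying idea of a statistical "data fact" layered onto an existing chart: a mean line plus a short annotation. It uses matplotlib rather than Auteur's D3/JavaScript setting purely for brevity, and all names and values are illustrative assumptions.

    ```python
    # Illustrative sketch of augmenting a chart with a statistical data fact
    # (here, a mean line plus a text annotation). This is NOT the Auteur API,
    # which targets web-based D3 visualizations; it only shows the general idea.
    import matplotlib.pyplot as plt
    import numpy as np

    values = np.array([4, 7, 3, 8, 6, 9, 5])
    labels = [f"item {i}" for i in range(len(values))]

    fig, ax = plt.subplots()
    ax.bar(labels, values)

    # Augmentation: overlay the mean as a reference line with a label.
    mean = values.mean()
    ax.axhline(mean, linestyle="--", color="gray")
    ax.annotate(f"mean = {mean:.1f}", xy=(labels[-1], mean),
                xytext=(0, 5), textcoords="offset points", ha="right")

    plt.show()
    ```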
  • Item
    Can a Neural ODE learn a Chaotic System?
    (Georgia Institute of Technology, 2024-07-28) Park, Jeongjin
    Learning chaotic dynamical systems from data presents significant challenges due to their inherent unpredictability and high sensitivity to initial conditions. Conventional metrics of model performance, such as generalization error, often fail to capture a neural network's ability to reproduce the invariant statistics of these complex systems. In this thesis, we demonstrate through a comprehensive set of examples that pointwise accuracy (measured through generalization error) does not necessarily translate into statistical fidelity. Our evaluation leverages concepts from ergodic theory to provide a more nuanced assessment of model performance. We then propose and implement modifications to the training scheme that incorporate Jacobian information. We show that this approach enables the reproduction of correct physical measures for chaotic systems, which we term "statistically-accurate learning." We also report a failure mode of our proposed training scheme and give a theoretical explanation using the shadowing lemma. Our work offers valuable insights into the limitations of traditional machine learning theory when applied to complex systems. These findings have implications for improving the reliability and interpretability of machine learning models in complex dynamical contexts, with potential applications in fields such as climate modeling, fluid dynamics, and nonlinear control systems.
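    The core idea above, adding Jacobian information to training so that the learned vector field matches the true system's local linearization, can be sketched schematically. The snippet below is not the thesis's training scheme; the Lorenz system, the finite-difference Jacobians, and the weight `lam` are illustrative assumptions.

    ```python
    # Schematic sketch of a Jacobian-matching penalty for learning a chaotic
    # vector field. Not the thesis's method; Lorenz system, finite differences,
    # and the weight `lam` are illustrative assumptions.
    import numpy as np

    def lorenz(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        return np.array([sigma * (x[1] - x[0]),
                         x[0] * (rho - x[2]) - x[1],
                         x[0] * x[1] - beta * x[2]])

    def numerical_jacobian(f, x, eps=1e-5):
        """Central-difference Jacobian of a vector field f at point x."""
        n = len(x)
        J = np.zeros((n, n))
        for j in range(n):
            e = np.zeros(n); e[j] = eps
            J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
        return J

    def jacobian_regularized_loss(model, points, lam=0.1):
        """Pointwise vector-field error plus a Frobenius-norm Jacobian mismatch."""
        loss = 0.0
        for x in points:
            loss += np.sum((model(x) - lorenz(x)) ** 2)
            loss += lam * np.sum((numerical_jacobian(model, x) - numerical_jacobian(lorenz, x)) ** 2)
        return loss / len(points)

    # Example: evaluate the penalty for an intentionally crude surrogate model.
    points = [np.array([1.0, 1.0, 1.0]), np.array([5.0, 5.0, 25.0])]
    print(jacobian_regularized_loss(lambda x: 0.9 * lorenz(x), points))
    ```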
  • Item
    Human-Centeredness in Understanding and Detecting Online Harassment
    (Georgia Institute of Technology, 2024-07-27) Kim, Seunghyun
    Online harassment presents a pervasive and concerning issue, particularly impacting vulnerable groups like youth. However, prevailing detection systems often prioritize technical aspects, neglecting the perspectives and experiences of those affected. Despite extensive research, these systems remain limited, necessitating more human-centered solutions. Given online harassment's salient causes and its profound effects, such as psychological distress, reduced social support, and increased risk of mental health concerns, it is crucial to prioritize victims' well-being through human-centered methods for understanding and detecting online harassment. This dissertation focuses on key inquiries concerning human involvement in online harassment detection. These encompass the role of humans in automated detection, ground truth establishment, platform-based variations in detection, impact on victims, and vulnerable demographics. Leveraging unique datasets, especially from affected youth, including public peer-support interaction data, voluntarily shared communications in private channels, and clinical records, enriches the understanding of individual attributes affecting interactions and harassment on social media. The dissertation offers many novel and critical insights. Having successfully assessed human-centric aspects in automated harassment detection algorithms, it has investigated existing harassment detection approaches, sources and semantics of ground truth annotations, dataset reliance, and connections between individual traits and harassment. Empirical analysis has compared harassment classifiers using varied victim-contributed and victim-distanced annotations, stressing the need to integrate stakeholders' experiences. The research has explored dataset biases across public and private networked spaces, and has examined harassment’s broader mental health impact through a causal inference framework. Importantly, the dissertation further unveils the relationship between mental health and harassment, focusing on contexts that increase or offset vulnerability to harassment. Focusing on linguistic and behavioral features from youth-contributed social media data and clinical records, the work has examined nuanced life circumstances influencing online experiences. Emphasizing contextual insights, it promises to guide tailored mental health support for affected individuals. In summary, this thesis introduces a human-centered machine learning approach, enhancing harassment detection in ground truth establishment and dataset curation. It explores individual characteristics and online harassment experiences, offering insights into human, machine learning, and mental health dynamics. Advocating for a comprehensive approach considering diverse online experiences, it contributes to improving detection efficacy towards fostering a safer digital environment for all.
  • Item
    Block Iterative Methods with Applications to Density Functional Theory
    (Georgia Institute of Technology, 2024-07-27) Shah, Shikhar
    A novel, cubic-scaling algorithm for computing the electronic correlation energy in density functional theory via the random phase approximation was proposed. The key computational kernel involves solving a family of large, sparse, and complex block linear systems. A short-term recurrence block Krylov subspace method was proposed to solve this family of linear systems, yielding both a short time-to-solution and good parallel efficiency. Efficiency losses arising from an emergent load imbalance vanished when a shifted Laplacian preconditioner was introduced. A second novel algorithm was also proposed by leveraging block Krylov subspace methods to perform a functional trace approximation. This alternative algorithm is also cubic-scaling and is most viable when higher levels of parallelism are available or required. Additionally, two adjacent topics were investigated. First, a method for choosing a robust low-degree polynomial preconditioner was proposed. In situations where a random right-hand side vector produces a poor preconditioner, such as for highly non-normal matrices, this novel method is preferable. Second, a method for avoiding exact diagonalization in nonlinear polynomial-filtered subspace iteration was proposed. This approximate diagonalization was significantly faster than exact diagonalization while not adversely impacting the convergence of the nonlinear subspace iteration procedure.
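    For reference, the snippet below is a textbook-style, unpreconditioned block conjugate gradient iteration for a symmetric positive definite matrix with several right-hand sides. It only sketches the general block Krylov idea; the short-term recurrence method, complex shifted systems, and shifted Laplacian preconditioner developed in the thesis are not reproduced here, and the test matrix is an illustrative assumption.

    ```python
    # Minimal block conjugate gradient sketch (SPD A, multiple right-hand sides).
    # Textbook-style illustration only; the thesis targets complex shifted block
    # systems with a short-term recurrence and preconditioning, not shown here.
    import numpy as np

    def block_cg(A, B, tol=1e-8, max_iter=500):
        """Solve A X = B for an SPD matrix A and a block of right-hand sides B."""
        X = np.zeros_like(B)
        R = B - A @ X          # block residual
        P = R.copy()           # block search directions
        for _ in range(max_iter):
            AP = A @ P
            alpha = np.linalg.solve(P.T @ AP, P.T @ R)   # small s-by-s solve
            X += P @ alpha
            R -= AP @ alpha
            if np.linalg.norm(R) <= tol * np.linalg.norm(B):
                break
            beta = np.linalg.solve(P.T @ AP, -AP.T @ R)  # keep directions A-conjugate
            P = R + P @ beta
        return X

    # Usage on a small random SPD test problem.
    rng = np.random.default_rng(0)
    M = rng.standard_normal((50, 50))
    A = M @ M.T + 50 * np.eye(50)
    B = rng.standard_normal((50, 4))
    X = block_cg(A, B)
    print(np.linalg.norm(A @ X - B))
    ```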
  • Item
    Large-Scale Offline Pre-Training Bootstraps Embodied Intelligence
    (Georgia Institute of Technology, 2024-07-27) Majumdar, Arjun
    A central goal in Artificial Intelligence (AI) is to develop embodied intelligence -- i.e., embodied agents such as mobile robots that can accomplish a wide variety of tasks in real-world, physical environments. In this dissertation, we will argue that offline pre-training of foundation models on web-scale data can bootstrap embodied intelligence. In part 1, we present VC-1, a visual foundation model pre-trained (primarily) on video data collected from an egocentric perspective. We systematically demonstrate that such models substantially benefit from pre-training dataset diversity by introducing CortexBench, an embodied AI (EAI) benchmark curated from a diverse collection of existing EAI tasks spanning locomotion, navigation, and dexterous or mobile manipulation. In part 2, we first demonstrate that visual grounding learned from internet data (i.e., image-caption pairs from the web) can be transferred to an instruction-following visual navigation agent (VLN-BERT). Then, we present ZSON, a highly scalable approach for learning to visually navigate to objects specified in open-vocabulary, natural language instructions such as “find the kitchen sink.” In part 3, we study spatial understanding in real-world indoor environments. First, we introduce an evaluation benchmark (OpenEQA) to measure progress on answering open-ended questions about 3D scenes. Then, we present a modular agent that leverages pre-trained components such as vision-language models (VLMs) to address the question-answering task.
  • Item
    Data-Centric Bias Mitigation in Machine Learning Life Cycle
    (Georgia Institute of Technology, 2024-07-26) Zhang, Hantian
    As Machine Learning (ML) becomes increasingly central to decision-making processes in our society, it is crucial to acknowledge the potential of these ML models to inadvertently perpetuate biases, disproportionately impacting certain demographic groups and individuals. For instance, some ML models used in judicial systems have shown biases against African Americans when predicting recidivism rates. Therefore, addressing the inherent biases and ensuring fairness in ML models is imperative. While enhancements in fairness can be implemented by changing the ML models directly, we argue that a more foundational solution lies in correcting the data, as biased data is often the root cause of unfairness. In this dissertation, we aim to systematically understand and mitigate biases in ML models across the full ML life cycle, from data preparation (pre-processing) to model training (in-processing) and model validation (post-processing). First, we develop a pioneering system, iFlipper, that optimizes for individual fairness in ML. iFlipper enhances training data during data preparation by adjusting the labels, thus mitigating inconsistencies that arise when similar individuals receive varying outcomes. Experiments on real datasets show that iFlipper significantly outperforms other pre-processing baselines in terms of individual fairness and accuracy on unseen test sets. Subsequently, we introduce a declarative system, OmniFair, that aims to bolster group fairness in ML. OmniFair allows users to define specific group fairness constraints and adjusts the weight of each training sample during the training process to satisfy those constraints. We show that OmniFair is more versatile than existing algorithmic fairness approaches in terms of both supported fairness constraints and downstream ML models. OmniFair reduces the accuracy loss by up to 94.8% compared with the second-best method. Finally, we present a method to discover and explain semantically coherent subsets (slices) of unstructured data where trained ML models underperform. Specifically, we introduce a new perspective for quantifying explainability in unstructured data slices by borrowing the concept of separability from the machine learning literature. We find that separability, which captures how well a slice can be differentiated from the rest of the dataset, complements the coherence measure that focuses on the commonalities of all instances within a slice. Preliminary results demonstrate that a separability-based slice discovery algorithm can identify complementary data slices to existing, coherence-based approaches. The three works in this dissertation can be integrated into a comprehensive system that reduces bias in data across the full machine learning life cycle, covering different fairness metrics and different types of data. Specifically, iFlipper addresses individual fairness for structured data in the data preparation step, OmniFair addresses group fairness for structured data in the model training step, and slice discovery addresses unstructured data in the model validation step.
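    OmniFair's constraint-driven optimization is not reproduced here; the sketch below only illustrates the general idea of reweighting training samples toward a group fairness goal, using the classic reweighing heuristic of Kamiran and Calders with demographic parity in mind. The synthetic data, column roles, and use of scikit-learn are illustrative assumptions.

    ```python
    # Illustrative sample-reweighting sketch for a group fairness goal
    # (demographic parity). This is the classic reweighing heuristic, not
    # OmniFair's constraint-driven algorithm; data and names are assumptions.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def reweighing_weights(groups, labels):
        """Weight each (group, label) cell so group membership and label look independent."""
        groups = np.asarray(groups); labels = np.asarray(labels)
        w = np.ones(len(labels))
        for g in np.unique(groups):
            for y in np.unique(labels):
                mask = (groups == g) & (labels == y)
                expected = (groups == g).mean() * (labels == y).mean()   # if independent
                observed = mask.mean()
                if observed > 0:
                    w[mask] = expected / observed
        return w

    # Usage with synthetic data: X features, s protected group, y biased labels.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 5))
    s = rng.integers(0, 2, 1000)
    y = (X[:, 0] + 0.8 * s + rng.standard_normal(1000) > 0).astype(int)

    weights = reweighing_weights(s, y)
    clf = LogisticRegression().fit(X, y, sample_weight=weights)
    ```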
  • Item
    Fine-grained Modeling for Clinical Decisions via Machine Learning
    (Georgia Institute of Technology, 2024-07-26) Cui, Jiaming
    Public health challenges threaten people’s lives and place a high burden on our healthcare system. For example, among infectious diseases, COVID-19 has led to 775 million cases and 7 million deaths worldwide as of July 2024, making it one of the largest public health crises in human history. Healthcare-associated infections (HAIs), such as Methicillin-resistant Staphylococcus aureus (MRSA) and Clostridioides difficile (C. diff), infect approximately 3% of hospitalized patients in the United States every year, resulting in more than 2.8 million cases and 35,000 deaths annually. These challenges force us to make critical clinical decisions in the context of public health, such as discharging less severely ill patients to admit those in greater danger. Such decisions may in turn lead to more infections in the community, which then feed back into the hospital. However, it is challenging to use existing epidemiological models to guide such decision-making. The spread pathways of infectious diseases are much more complicated in hospital settings, where they interact with healthcare systems, and existing models cannot capture all these pathways effectively to support informed decisions. These models also cannot digest the rich clinical datasets that provide large amounts of patient-level data, preventing them from making accurate, fine-grained decisions. To tackle this, in this dissertation, we propose novel machine learning algorithms that can utilize these datasets to help design more detailed, fine-grained epidemiological models for more accurate infectious disease surveillance and control practices. Specifically, in Part I, we show how such environmental factor-mediated models can better reconstruct the spread pathways in hospitals for infectious disease surveillance and achieve more cost-efficient contact precaution policies for clinical control. In Part II, we propose new ML algorithms to better calibrate epidemiological models and learn more accurate model parameters. Moreover, we design new frameworks that integrate neural networks and epidemiological models, allowing us to incorporate electronic health record (EHR) data to make patient-level predictions. Experimental results on real-world clinical datasets from large hospital systems demonstrate that our models and frameworks lead to more effective and efficient decision-making, thereby better bridging public health with clinical decisions.
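    The dissertation's models are far richer (environmental pathways, EHR-driven neural components), but the basic step of calibrating an epidemiological model to observed data can be sketched in a few lines. The example below fits the transmission and recovery rates of a plain SIR model to synthetic data by least squares; it is an illustrative assumption, not a model from the thesis.

    ```python
    # Minimal sketch of calibrating a compartmental (SIR) model to observed data
    # by least squares. Illustration only; the dissertation's models add
    # environmental pathways, patient-level EHR data, and neural components.
    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.optimize import minimize

    def sir_rhs(t, y, beta, gamma):
        s, i, r = y
        return [-beta * s * i, beta * s * i - gamma * i, gamma * i]

    def simulate_infected(params, t_obs, y0=(0.99, 0.01, 0.0)):
        beta, gamma = params
        sol = solve_ivp(sir_rhs, (t_obs[0], t_obs[-1]), y0,
                        t_eval=t_obs, args=(beta, gamma))
        return sol.y[1]  # infected fraction over time

    def calibration_loss(params, t_obs, observed):
        return np.sum((simulate_infected(params, t_obs) - observed) ** 2)

    # Synthetic "observed" infected fractions generated with known parameters.
    t_obs = np.linspace(0, 60, 61)
    observed = simulate_infected((0.35, 0.1), t_obs) + np.random.default_rng(2).normal(0, 0.002, len(t_obs))

    fit = minimize(calibration_loss, x0=(0.2, 0.2), args=(t_obs, observed),
                   bounds=[(0.01, 2.0), (0.01, 1.0)], method="L-BFGS-B")
    print("estimated beta, gamma:", fit.x)
    ```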
  • Item
    The Algorithm Keeps The Score: Identity, Marginalization, and Power in the Technology-Mediated Search for Care
    (Georgia Institute of Technology, 2024-07-26) Pendse, Sachin R.
    Severe psychological distress and mental illness are widespread. Globally, one in every two people will experience a mental health disorder at some point over the course of their lifetime. Identity has long played a core role in how each of those individuals understands their distress, expresses it to others, and searches for care. Directly tied to identity, societal marginalization and power similarly play a core role in whether individuals in distress can successfully access the resources and care that could deliver them relief. Alongside identity, power, and marginalization, technologies increasingly play a role in how people engage with care --- people in distress may turn to mental health helplines, online support communities, large language model chatbots, and other accessible technologies as they make meaning from their distress and search for care. In turn, the design of those technologies also has an influence on people's illness experiences, just as identity, power, and marginalization do. I understand these support technologies to be technology-mediated mental health support (TMMHS) systems, in which technology mediates support provided to people in distress. This dissertation engages in the study of how identity, power, and marginalization intersect with the design of TMMHS systems to influence people's experiences of distress, as well as their subsequent engagements with care. Marginalized populations often have diverse and unmet mental health needs---I thus investigate how the design of TMMHS systems may be helpful in validating and meeting these marginalized needs, as well as how the design of TMMHS systems may further compound offline inequities and make it more difficult for people to access acceptable and effective care. Through use of both quantitative and qualitative methods, I highlight the limitations associated with the psychiatric and computational approach that often decontextualizes and quantifies experiences of distress. I propose that one means to mitigate these limitations is to incorporate considerations of identity, power, and marginalization in the design of TMMHS systems, and argue that doing so could ensure that diverse people are able to leverage technology for their mental health needs. I describe what these considerations may look like, including a focus on designing technologies that strengthen support relationships, an awareness of differences across people of diverse identities, and a constant eye to patterns of historical marginalization. My dissertation begins by outlining the history of mental health support in both traditional and technology-mediated contexts, and discusses the role of colonial power relations in how both identity and mental illness have been understood, including for my areas of study in the United States and India. This provides important context for my empirical investigation of people's experiences with TMMHS systems. I then examine four areas in which identity, marginalization, and power have a direct and salient impact on how people engage with diverse TMMHS systems. First, I investigate the use of Indian mental health helplines, analyzing how volunteers provide care to individuals in distress, and where (and for whom) technical and structural gaps together prevent that care from actually being accessed by callers.
    I next shift to resource-constrained areas of the United States, investigating how individuals in mental health professional shortage areas use TMMHS systems to fill structural gaps and create new identities from their experiences of distress, and the role of technical design and marginalization in what I find to often be deeply polarized environments within TMMHS systems. Building on this finding, I then examine a dimension of social identity that is particularly polarized in the U.S. today, or partisan identity. I quantitatively examine differentiated engagements among partisan users of online support communities, and investigate where there may be differences in potential avenues to care through analyzing personalized search engine results for U.S. Republican and Democrat partisan groups. I end by investigating identity-based biases within a new and emergent form of TMMHS support, or LLM-based chatbots, including a quantitative analysis of biases and a qualitative analysis of lived experiences with this new tool. Across all, I use the language people use around their distress as a tool to analyze how identity, power, and marginalization interact with technical (and algorithmic) design to influence people's lived experiences with their mental health. My research contributes a deeper understanding of the harms that are created when technology-mediated support is not considerate of histories of marginalization, and of where support technologies can be sensitive to identity-based marginalization. In summary, with this dissertation, I ask the question --- what do we gain and lose when technology (and the algorithms that underlie it) keep the score around our experiences with mental health?