Organizational Unit:
School of Computer Science

Description
School established in 2007

Publication Search Results

Now showing 1 - 10 of 794
  • Item
    Language-Driven Robotics: Dynamic Closed-Loop Control for Complex Manipulation
    (Georgia Institute of Technology, 2024-12-16) Barroso, Pierre
    This thesis presents a novel framework for integrating language-driven control with dynamic closed-loop manipulation, enabling robotic systems to adapt effectively to complex and changing environments. By leveraging advanced 3D scene representations, vision-language models, and innovative methodologies, the research addresses the challenges of grasping and manipulating novel objects in real-time. The proposed approach combines Gaussian Splatting, a fast and efficient 3D representation technique, with CLIP embeddings to provide semantic understanding of the scene. Using Grounded-SAM2 for segmentation and Gaussian-based clustering for dynamic object detection, the system achieves high segmentation accuracy. Object tracking is handled with Co-Tracker 3, ensuring robust updates to object positions and transformations in dynamic scenes. These capabilities culminate in a grasp pose generation mechanism, allowing reliable execution of grasps on unseen objects. Experimental results demonstrate the framework's effectiveness in rapid scene reconstruction, accurate segmentation, robust tracking, and high success rates in grasp execution. Despite successes, challenges such as training instabilities, hardware dependencies, and tracking drift highlight opportunities for further improvement. This work advances language-driven robotics by integrating semantic understanding with dynamic manipulation. It lays the foundation for adaptive and intelligent robotic systems, with future directions including enhanced encoder/decoder models, improved dynamic scene representations, and expanded applications in fast-paced and complex tasks. The contributions open new avenues for real-world robotic applications requiring adaptability and precision.
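    A minimal sketch of the semantic lookup step described above: per-Gaussian CLIP features are compared against the CLIP embedding of a language query to pick out the Gaussians belonging to the requested object. The feature arrays, threshold, and function names below are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def select_target_gaussians(gaussian_feats, text_feat, threshold=0.3):
    """Score each 3D Gaussian against a language query by cosine similarity.

    gaussian_feats : (N, D) array of per-Gaussian CLIP features (assumed precomputed)
    text_feat      : (D,) CLIP embedding of the instruction, e.g. "pick up the red mug"
    Returns the indices of Gaussians whose similarity exceeds the threshold.
    """
    g = gaussian_feats / np.linalg.norm(gaussian_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    sims = g @ t                                   # cosine similarity per Gaussian
    return np.nonzero(sims > threshold)[0], sims

# Toy usage with random stand-in features; real features would come from CLIP.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 512))
query = rng.normal(size=512)
matched, _ = select_target_gaussians(feats, query, threshold=0.1)
print(f"{len(matched)} Gaussians matched the query")
```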
  • Item
    ChatHF: Collecting Rich Human Feedback from Real-time Conversations
    (Georgia Institute of Technology, 2024-12-16) Li, Andrew Larry
    We introduce ChatHF, an interactive annotation framework for chatbot evaluation that integrates configurable annotation within a chat interface. ChatHF can be flexibly configured to accommodate various chatbot evaluation tasks, for example detecting offensive content, identifying incorrect or misleading information in chatbot responses, and flagging responses that might compromise privacy. It supports post-editing of chatbot outputs and visual inputs, in addition to an optional voice interface. ChatHF is suitable for the collection and annotation of NLP datasets and for Human-Computer Interaction studies, as demonstrated in case studies on image geolocation and assisting older adults with daily activities.
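    The abstract notes that ChatHF can be flexibly configured for different evaluation tasks. The sketch below shows a hypothetical configuration for one such task; every field name is invented for illustration and is not taken from ChatHF's actual API.

```python
# Hypothetical configuration for a ChatHF-style annotation session.
# Field names are illustrative; they are not taken from the ChatHF codebase.
chat_eval_config = {
    "task": "offensive_content_detection",
    "annotation_schema": {
        "labels": ["offensive", "misleading", "privacy_risk", "ok"],
        "allow_post_editing": True,       # annotators may rewrite the chatbot reply
        "free_text_rationale": True,
    },
    "interface": {
        "visual_inputs": True,            # show images alongside the conversation
        "voice_interface": False,         # optional speech I/O, off in this example
    },
    "export": {"format": "jsonl", "include_conversation_history": True},
}

def validate(config):
    """Minimal sanity check before launching an annotation session."""
    assert config["annotation_schema"]["labels"], "at least one label is required"
    return config

validate(chat_eval_config)
```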
  • Item
    Empowering Guardians of the Digital Realm: An Analysis of the Current State of Trust & Safety and Opportunities for Advancing the Industry
    (Georgia Institute of Technology, 2024-12-11) Swenson, Michael Ray
    This work analyzes the challenges faced by Trust & Safety professionals in managing online content moderation and transparency practices. Through 16 semi-structured interviews and participant observation, the authors examined how these professionals navigate complex policy areas, such as harassment, hate speech, misinformation, and legal requests. The study reveals that Trust & Safety workers encounter significant obstacles in moderating non-English content, addressing the needs of children and teens, and adapting to increasing governmental regulations worldwide. Participants emphasized the need for stronger knowledge-sharing programs, open-source tools, and cross-platform collaborations to better tackle online harm. Additionally, participants advocate for enhanced transparency reporting and algorithmic accountability to increase public trust. The study concludes by suggesting that Trust & Safety professionals should play a more active role in shaping regulations that govern online platforms. This work offers both theoretical insights into industry challenges and practical recommendations for advancing the Trust & Safety field through collaboration and knowledge sharing.
  • Item
    Capability-Aware Shared Hypernetworks for Heterogeneous Multi-Agent Coordination
    (Georgia Institute of Technology, 2024-12-09) Fu, Kevin
    Cooperative heterogeneous multi-agent tasks require agents to behave in a flexible and complementary manner that best leverages their diverse capabilities. Learning-based approaches to this challenge span a spectrum between two endpoints: i) shared-parameter methods, which assign an ID to each agent to encode diverse behaviors within a single architecture for sample efficiency, but are limited in their ability to learn diverse behaviors; ii) independent methods, which learn a separate policy for each agent, enabling greater diversity at the cost of sample and parameter efficiency. Prior work on learning for heterogeneous multi-agent teams has already explored the middle ground of this spectrum by learning shared-parameter or independent policies for classes of agents, allowing for a compromise between diversity and efficiency. However, these approaches still do not reason over the impact of agent capabilities on behavior, and thus cannot generalize to unseen agents or team compositions. In this work, we aim to enable flexible and heterogeneous coordination without sacrificing diversity, sample efficiency, or generalization to unseen agents and teams. First, inspired by work on trait-based heterogeneous task allocation, we explore how capability awareness enables generalization to unseen agents and teams. We thoroughly evaluate our GNN-based capability-aware policy architecture, showing that it generalizes more effectively than existing approaches. Then, inspired by recent work in transfer learning and meta-RL, we propose Capability-Aware Shared Hypernetworks (CASH), a new soft weight-sharing architecture for heterogeneous coordination that uses hypernetworks to explicitly reason about continuous agent capabilities in addition to local observations. Intuitively, CASH allows the team to learn shared decision-making strategies (captured by a shared encoder) that are readily adapted according to the team’s individual and collective capabilities (by a shared hypernetwork). Our design is agnostic to the underlying learning paradigm. We conducted detailed experiments across two heterogeneous coordination tasks and three standard learning paradigms (imitation learning, value-based reinforcement learning, and policy-gradient reinforcement learning). Results reveal that CASH generates appropriately diverse behaviors that consistently outperform baseline architectures in terms of task performance and sample efficiency during both training and zero-shot generalization. Notably, CASH provides these improvements with only 20% to 40% of the learnable parameters used by the baselines.
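    A minimal PyTorch sketch of the soft weight-sharing idea described above, assuming a shared observation encoder and a shared hypernetwork that maps an agent's continuous capability vector to the weights of a small action head. Layer sizes, names, and the form of the action head are assumptions for illustration, not the CASH implementation.

```python
import torch
import torch.nn as nn

class CapabilityHyperPolicy(nn.Module):
    """Sketch of capability-conditioned soft weight sharing (CASH-style).

    A shared encoder processes local observations; a shared hypernetwork
    maps the agent's capability vector to the weights of a small linear
    head, so behavior adapts to capabilities without per-agent parameters.
    """
    def __init__(self, obs_dim, cap_dim, hidden=64, act_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # The hypernetwork outputs the flattened weights and bias of the head.
        self.hyper = nn.Sequential(
            nn.Linear(cap_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden * act_dim + act_dim),
        )
        self.hidden, self.act_dim = hidden, act_dim

    def forward(self, obs, caps):
        z = self.encoder(obs)                               # (B, hidden)
        params = self.hyper(caps)                           # (B, hidden*act + act)
        W = params[:, : self.hidden * self.act_dim].view(-1, self.act_dim, self.hidden)
        b = params[:, self.hidden * self.act_dim :]         # (B, act_dim)
        return torch.bmm(W, z.unsqueeze(-1)).squeeze(-1) + b

policy = CapabilityHyperPolicy(obs_dim=10, cap_dim=3)
logits = policy(torch.randn(5, 10), torch.randn(5, 3))      # 5 agents, 3 capabilities each
print(logits.shape)                                          # torch.Size([5, 4])
```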
  • Item
    Understanding Malware Analysts' Workflows to Narrow the Gap Between Research and Practice
    (Georgia Institute of Technology, 2024-12-08) Yong Wong, Miuyin M.
    Malicious software, or malware, presents a serious cybersecurity challenge, threatening individuals, organizations, and nation-states. To combat and prevent attacks launched with malware, it is essential to understand the malware’s intent and its impact on targeted systems. This process is usually referred to as malware analysis. Over the years, there have been significant research advances in automating the process of malware analysis. Despite these advances, human analysts still play an indispensable role in keeping defenses against malware current and effective. Unfortunately, the manual analysis process used by analysts in practice remains unexplored. To help address this gap, this thesis explores a human-centric approach to malware analysis. In this thesis, I begin by presenting the findings from a user study with malware analysts in practice. This study allowed us to define a taxonomy of malware analysts' objectives, identify five common analysis workflows, and highlight common challenges faced by these analysts. Next, I present the results of a comparative analysis that contrasts the findings from a systematic mapping of malware evasion countermeasures with insights gained from a user study on malware evasion. This comparison reveals several gaps between the real challenges faced by malware experts dealing with evasive malware and the focus of research solutions. Moreover, it highlights future research directions that can help analysts overcome challenging evasion techniques. Lastly, I demonstrate the potential of Large Language Models (LLMs), used with a human-in-the-loop approach, to help analysts overcome some of the identified challenges that arise from evasion tactics. Malware analysis remains a serious challenge despite decades of research and tool development. It is hoped that the insights offered by this thesis will help researchers develop tools and techniques that reduce analyst burden and enable more timely defenses against malware.
  • Item
    Deep Reinforcement Learning Framework for Autonomous Surface Vehicles in Environmental Cleanup
    (Georgia Institute of Technology, 2024-12-08) Ro, Junghwan
    Water pollution from floating plastics poses significant environmental threats that require efficient solutions. Autonomous surface vehicles (ASVs) present a promising means of addressing this challenge. However, deploying deep reinforcement learning (DRL) for ASV control in real-world environmental missions remains underexplored due to simulation limitations and the sim-to-real gap. This thesis presents a DRL framework for ASVs focused on environmental missions, explicitly targeting the autonomous collection of floating waste. An open-source, highly parallelized hydrodynamics and buoyancy simulation environment is developed to facilitate large-scale training. By integrating system identification with domain randomization, we reduce the sim-to-real gap, enhancing the robustness and energy efficiency of the trained agents. The proposed approach is validated through simulation and real-world experiments, demonstrating improved task completion times and reduced energy consumption. Task experiments show that our approach reduces energy consumption by 13.1% while reducing task completion time by 7.4%. These findings, together with our open-source implementation, have the potential to improve the efficiency and versatility of ASVs, contributing to environmental preservation efforts. This thesis incorporates and expands on work previously published in a paper presented at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) in 2024. Significant portions of that content have been reused and adapted to fit the comprehensive format and depth required for the thesis.
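    A minimal sketch of the domain-randomization step mentioned above: hydrodynamic and buoyancy parameters are resampled around nominal values (which system identification would supply) at the start of each training episode. Parameter names, units, and ranges are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Nominal dynamics parameters, e.g. obtained from system identification.
# Names and values are illustrative, not taken from the thesis implementation.
NOMINAL = {
    "linear_drag": 6.0,       # N*s/m
    "quadratic_drag": 4.5,    # N*s^2/m^2
    "added_mass": 12.0,       # kg
    "buoyancy_offset": 0.0,   # m, vertical offset of the center of buoyancy
}

def randomize_dynamics(rng, spread=0.2):
    """Sample per-episode dynamics parameters within +/- spread of their nominal values."""
    sampled = {}
    for name, value in NOMINAL.items():
        low, high = sorted((value * (1 - spread), value * (1 + spread)))
        sampled[name] = value if low == high else rng.uniform(low, high)
    return sampled

rng = np.random.default_rng(42)
for episode in range(3):
    params = randomize_dynamics(rng)
    # reset_simulation(params)  # a simulator would consume these values here
    print(episode, {k: round(v, 2) for k, v in params.items()})
```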
  • Item
    Human-centered Explainable AI
    (Georgia Institute of Technology, 2024-12-08) Ehsan, Upol
    If AI systems are going to inform consequential decisions such as deciding whether you should get a loan or receive an organ transplant, they must be explainable to everyone, not just software engineers. Despite commendable technical progress in “opening” the black-box of AI, the prevailing algorithm-centered Explainable AI (XAI) view overlooks a vital insight: who opens the black-box matters just as much as opening it. As a result of this blind spot, many popular XAI interventions have been ineffective and even harmful in real-world settings. To address the blind spot, this dissertation introduces and operationalizes Human- centered XAI (HCXAI), a human-centered and sociotechnically-informed XAI paradigm. Focusing on non-AI experts, this dissertation demonstrates how Human-centered XAI: • expands the design space of XAI by broadening the domain of non-algorithmic factors that augment AI explainability and illustrating how to incorporate them • enriches our knowledge of the importance of “who” the humans are in XAI design • enables resourceful ways to do Responsible AI by providing proactive mitigation strategies through participatory methods It contributes 1) conceptually: new concepts such as such as Social Transparency that showcase how to encode socio-organizational context to augment explainability without changing the internal model; 2) methodologically: human-centered evaluation of XAI, actionable frameworks, and participatory methods to co-design XAI systems; 3) technically: computational techniques and design artifacts; 4) empirically: findings such as how one’s AI background impacts one’s interpretation of AI explanations, user perceptions of real AI users, and how AI explanations can negatively impact users despite our best intentions. The impact of this dissertation spans research, practice, and policy. Beyond pioneering the HCXAI research domain, it has influenced society– informed AI policies at interna- tional organizations like the UN and being incorporated into NIST’s AI Risk Management Framework, a global standard for Responsible AI. The work been adopted by industry– seven Fortune 500 companies adopted its techniques, positively impacting over 3 million users by addressing AI trust calibration and resulting in savings of US $4.2 million. It has also nurtured a vibrant research community–over 400 researchers from 19+ countries have participated in four HCXAI workshops at ACM CHI (the leading venue for Human- Computer Interaction research) since 2021, culminating in the first ACM HCXAI journal issue, where I led the editorial efforts. The dissertation transforms the XAI discourse from an algorithm-centered perspective to a human-centered one. It takes a foundational step towards creating a future where anyone, regardless of their background, can interact with AI systems in an explainable, accountable, and dignified manner so that people who are not at the table do not end up on the menu.
  • Item
    Traffic Sign Localization Using SfM and Deep Learning
    (Georgia Institute of Technology, 2024-12-08) Ho, Hoang Nhu
    This study addresses the challenge of traffic sign inventory management faced by the U.S. Department of Transportation in complying with Manual on Uniform Traffic Control Devices (MUTCD) standards. The study proposes a cost-effective methodology for geo-localizing traffic signs using smartphone-recorded video and GPS data. The approach employs several techniques, including deep learning models for depth estimation and Structure-from-Motion (SfM), to accurately determine the geographic coordinates of roadside traffic signs. The methodology was tested in diverse environments, including challenging mountain roads with curves and urban settings. SfM is shown to be the most effective approach, demonstrating high accuracy: 90.91% of tested signs (40 out of 44) in Pima County, Arizona, and 88.24% of tested signs (90 out of 102) on Peyton Road, Atlanta, Georgia, achieved a distance error below 4.9 meters. The remaining discrepancies were caused mainly by GPS inaccuracies rather than limitations of the method itself. These results establish SfM as a promising solution for efficient and accurate traffic sign geo-localization.
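    A minimal sketch of the geometric core of SfM-based sign localization: the sign's position is recovered as the least-squares point closest to viewing rays cast from camera poses estimated along the vehicle track, after which local coordinates would be mapped to GPS. The camera poses and ray directions below are made up for illustration.

```python
import numpy as np

def triangulate_point(origins, directions):
    """Least-squares point closest to a set of 3D viewing rays.

    origins    : (K, 3) camera centers estimated by SfM
    directions : (K, 3) bearing vectors toward the detected sign
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, np.asarray(directions, float)):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projects onto the plane orthogonal to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)

# Two illustrative camera centers a few meters apart along the road,
# both looking toward a sign placed at roughly (10, 3, 2) meters.
cams = np.array([[0.0, 0.0, 1.5], [4.0, 0.0, 1.5]])
sign_true = np.array([10.0, 3.0, 2.0])
rays = sign_true - cams
print(triangulate_point(cams, rays))   # approximately [10, 3, 2]
```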
  • Item
    Navigation to Objects: The Significance of Scene Realism and Moving towards Universal Navigation
    (Georgia Institute of Technology, 2024-12-08) Khanna, Mukul
    Recent years have brought considerable progress in embodied AI agents that navigate in realistic scenes, follow language instructions, find and rearrange objects, and perform other tasks involving embodied sensing, planning, and acting. This progress is bolstered by simulation platforms that enable systematic, safe, and scalable training and evaluation of embodied AI agents before deployment to the real world. However, despite the ubiquitous use of synthetic 3D scene datasets in embodied AI experiments, there has been no systematic analysis of the tradeoffs between dataset scale (number of scenes and total scene physical size) and dataset realism (visual fidelity and correlation to real-world statistics). Furthermore, the community has primarily been focused on episodic navigation: agents navigate to a single goal object in each episode, specified through a single input modality (e.g., an object category label, a natural language description, or an image). In this thesis, we focus on tackling these two shortcomings in prior work by 1) contributing a new dataset of high-quality, human-authored synthetic 3D scenes and a systematic analysis of scene dataset scale and realism towards improved ObjectNav agent generalization, and 2) building novel universal navigation systems capable of handling various goal types, enabling more effective user interaction with robots. In Chapter 2, we contribute the Habitat Synthetic Scenes Dataset (HSSD-200), a dataset of 211 high-quality realistic 3D scenes and 18,656 models of real-world objects, and use it to test navigation agent generalization to realistic 3D environments. Specifically, we investigate the impact of synthetic 3D scene dataset scale and realism on the task of training embodied agents to find and navigate to objects (ObjectGoal navigation). We find that scale helps in generalization, but the benefits quickly saturate, making visual fidelity and correlation to real-world scenes more important. In Chapter 3 and Chapter 4, we move beyond single-goal episodic evaluation setups focusing on only one goal specification modality and explore universal navigation agents that are multi-modal and lifelong. Specifically, we introduce the GO to Any Thing (GOAT) task, a state-of-the-art modular system for universal navigation in the real world, and GOAT-Bench, a benchmark providing a comprehensive analysis of modular and end-to-end trained methods with and without memory representations.
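    A minimal sketch of the goal-matching step a universal navigation agent needs: goals given as a category label or a language description (or, in a full system, an image) are encoded into a shared embedding space and compared against embeddings of object instances in the agent's memory. The encoder here is a stub and all names are illustrative; a real system would use vision-language models.

```python
import numpy as np

EMB_DIM = 256

def encode_goal(goal_text):
    """Stub goal encoder: hash the goal to a deterministic pseudo-random vector.

    A real universal-navigation system would use a vision-language model here
    and would also accept image goals; this stub only illustrates the interface.
    """
    seed = abs(hash(goal_text)) % (2**32)
    return np.random.default_rng(seed).normal(size=EMB_DIM)

def best_matching_instance(goal_text, instance_embeddings):
    """Return the index of the stored object instance most similar to the goal."""
    g = encode_goal(goal_text)
    g = g / np.linalg.norm(g)
    M = instance_embeddings / np.linalg.norm(instance_embeddings, axis=1, keepdims=True)
    return int(np.argmax(M @ g))

# Toy memory of 50 previously observed object instances (random stand-ins).
memory = np.random.default_rng(7).normal(size=(50, EMB_DIM))
for goal in ["potted plant", "the blue chair next to the window"]:
    print(goal, "->", best_matching_instance(goal, memory))
```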
  • Item
    Witness Functions in Program Analysis and Complexity Theory
    (Georgia Institute of Technology, 2024-12-08) Ding, Shuo
    Proving impossibility results is one of the main themes of program analysis theory and computability/complexity theory. For example, we can prove a program analysis problem is undecidable, meaning that there does not exist an algorithm to precisely solve the problem. As another example, we can prove a problem does not belong to a complexity class, meaning that every correct algorithm for the problem must exceed the given resource restriction. In general, given a class C of computational problems and a specific computational problem P not in C, a witness function maps every candidate Q in C to an input on which P and Q are different. We investigate the computational properties of such witness functions and discuss their implications. In program analysis theory, we prove that a large class of undecidable program analysis problems have computable witness functions, including every semantic property described in Rice's theorem. This implies the existence of computable functions mapping every program analyzer to a more precise program analyzer. Through two real program analysis tasks (1) CFL-reachability based program analysis for Java and LLVM-IR and (2) template constraint analysis for C++, we demonstrate that computable witness functions provide guarantees on the progress of developing more and more precise program analysis techniques. In complexity theory, we prove that witness functions for major complexity classes are closely related to reductions, and discuss the implications in complexity class separation proofs.