Series
Master of Science in Computer Science

Series Type
Degree Series
Associated Organization(s)
Organizational Unit
School of Computer Science
School established in 2007
Organizational Unit
School of Interactive Computing
School established in 2007
Organizational Unit
School of Computational Science and Engineering
School established in May 2010

Publication Search Results

Now showing 1 - 10 of 139
  • Item
    Language-Driven Robotics: Dynamic Closed-Loop Control for Complex Manipulation
    (Georgia Institute of Technology, 2024-12-16) Barroso, Pierre
    This thesis presents a novel framework for integrating language-driven control with dynamic closed-loop manipulation, enabling robotic systems to adapt to complex and changing environments. By combining 3D scene representations with vision-language models, the research addresses the challenge of grasping and manipulating novel objects in real time. The proposed approach pairs Gaussian Splatting, a fast and efficient 3D representation technique, with CLIP embeddings to provide semantic understanding of the scene. Using Grounded-SAM2 for segmentation and Gaussian-based clustering for dynamic object detection, the system achieves high segmentation accuracy. Object tracking is handled by Co-Tracker 3, which keeps object positions and transformations up to date in dynamic scenes. These capabilities feed a grasp pose generation mechanism that reliably executes grasps on unseen objects. Experimental results demonstrate the framework's effectiveness in rapid scene reconstruction, accurate segmentation, robust tracking, and high grasp success rates. Despite these successes, challenges such as training instabilities, hardware dependencies, and tracking drift highlight opportunities for further improvement. This work advances language-driven robotics by integrating semantic understanding with dynamic manipulation. It lays the foundation for adaptive and intelligent robotic systems, with future directions including enhanced encoder/decoder models, improved dynamic scene representations, and expanded applications in fast-paced and complex tasks. The contributions open new avenues for real-world robotic applications requiring adaptability and precision.
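    To illustrate how CLIP embeddings can give a 3D scene semantic meaning, the sketch below scores each Gaussian's semantic feature against a text-query embedding by cosine similarity and returns the centroid of the matching Gaussians as a rough grasp target. It assumes per-Gaussian feature vectors and a query embedding already exist in the same embedding space; the function names, the similarity threshold, and the centroid heuristic are hypothetical simplifications, not the thesis's pipeline (which also relies on Grounded-SAM2 and Gaussian-based clustering).

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def query_target(
    positions: List[Tuple[float, float, float]],  # Gaussian centers in the scene
    features: List[Vec],                           # per-Gaussian semantic features (assumed CLIP-aligned)
    text_embedding: Vec,                           # embedding of the language query
    threshold: float = 0.8,                        # hypothetical similarity cutoff
) -> Optional[Tuple[float, float, float]]:
    """Return the centroid of Gaussians whose features match the query, or None."""
    selected = [
        p for p, f in zip(positions, features) if cosine(f, text_embedding) >= threshold
    ]
    if not selected:
        return None
    n = len(selected)
    return (
        sum(p[0] for p in selected) / n,
        sum(p[1] for p in selected) / n,
        sum(p[2] for p in selected) / n,
    )


if __name__ == "__main__":
    positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
    features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(query_target(positions, features, [1.0, 0.0]))  # (0.5, 0.0, 0.0)
```

    In a real system the target would feed a grasp pose generator and be refreshed as the tracker updates object positions; the centroid here only stands in for that step.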
  • Item
    ChatHF: Collecting Rich Human Feedback from Real-time Conversations
    (Georgia Institute of Technology, 2024-12-16) Li, Andrew Larry
    We introduce ChatHF, an interactive annotation framework for chatbot evaluation that integrates configurable annotation within a chat interface. ChatHF can be flexibly configured for various chatbot evaluation tasks, for example detecting offensive content, identifying incorrect or misleading information in chatbot responses, and flagging responses that might compromise privacy. It supports post-editing of chatbot outputs, visual inputs, and an optional voice interface. ChatHF is suitable both for collecting and annotating NLP datasets and for Human-Computer Interaction studies, as demonstrated in case studies on image geolocation and on assisting older adults with daily activities.
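    A configurable annotation framework of this kind is typically driven by a task specification. The sketch below shows one hypothetical way to express such a configuration in Python: a task declares its allowed labels and whether post-editing is enabled, and each annotation on a chatbot turn is validated against it. The class names, fields, and example labels are illustrative assumptions, not ChatHF's actual configuration format.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class AnnotationTask:
    """Hypothetical description of one chatbot-evaluation task."""
    name: str
    labels: FrozenSet[str]          # labels an annotator may attach to a response
    allow_post_edit: bool = False   # whether annotators may rewrite the response


@dataclass
class Annotation:
    """One annotator judgment on a single chatbot turn."""
    turn_index: int
    label: str
    note: str = ""
    post_edit: Optional[str] = None  # corrected response text, if any


@dataclass
class AnnotatedConversation:
    task: AnnotationTask
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, annotation: Annotation) -> None:
        """Validate an annotation against the task configuration before storing it."""
        if annotation.label not in self.task.labels:
            raise ValueError(f"label {annotation.label!r} not allowed for task {self.task.name!r}")
        if annotation.post_edit is not None and not self.task.allow_post_edit:
            raise ValueError("post-editing is disabled for this task")
        self.annotations.append(annotation)


if __name__ == "__main__":
    task = AnnotationTask(
        name="misinformation",
        labels=frozenset({"incorrect", "misleading", "ok"}),
        allow_post_edit=True,
    )
    convo = AnnotatedConversation(task)
    convo.add(Annotation(turn_index=1, label="incorrect", post_edit="Corrected answer."))
    print(len(convo.annotations))  # 1
```

    Keeping the task definition separate from the stored annotations is what lets the same chat interface serve different evaluation tasks by swapping configurations.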
  • Item
    Empowering Guardians of the Digital Realm: An Analysis of the Current State of Trust & Safety and Opportunities for Advancing the Industry
    (Georgia Institute of Technology, 2024-12-11) Swenson, Michael Ray
    This work analyzes the challenges faced by Trust & Safety professionals in managing online content moderation and transparency practices. Through 16 semi-structured interviews and participant observation, the authors examined how these professionals navigate complex policy areas, such as harassment, hate speech, misinformation, and legal requests. The study reveals that Trust & Safety workers encounter significant obstacles in moderating non-English content, addressing the needs of children and teens, and adapting to increasing governmental regulations worldwide. Participants emphasized the need for stronger knowledge-sharing programs, open-source tools, and cross-platform collaborations to better tackle online harm. Participants also advocated for enhanced transparency reporting and algorithmic accountability to increase public trust. The study concludes by suggesting that Trust & Safety professionals should play a more active role in shaping regulations that govern online platforms. This work offers both theoretical insights into industry challenges and practical recommendations for advancing the Trust & Safety field through collaboration and knowledge sharing.
  • Item
    Capability-Aware Shared Hypernetworks for Heterogeneous Multi-Agent Coordination
    (Georgia Institute of Technology, 2024-12-09) Fu, Kevin
    Cooperative heterogeneous multi-agent tasks require agents to behave in a flexible and complementary manner that best leverages their diverse capabilities. Learning-based approaches to this challenge span a spectrum between two endpoints: i) shared-parameter methods, which assign an ID to each agent to encode diverse behaviors within a single architecture for sample efficiency, but are limited in their ability to learn diverse behaviors; ii) independent methods, which learn a separate policy for each agent, enabling greater diversity at the cost of sample and parameter efficiency. Prior work on learning for heterogeneous multi-agent teams has already explored the middle ground of this spectrum by learning shared-parameter or independent policies for classes of agents, allowing for a compromise between diversity and efficiency. However, these approaches still do not reason over the impact of agent capabilities on behavior, and thus cannot generalize to unseen agents or team compositions. In this work, we aim to enable flexible and heterogeneous coordination without sacrificing diversity, sample efficiency, or generalization to unseen agents and teams. First, inspired by work on trait-based heterogeneous task allocation, we explore how capability awareness enables generalization to unseen agents and teams. We thoroughly evaluate our GNN-based capability-aware policy architecture, showing that it generalizes more effectively than existing work. Then, inspired by recent work in transfer learning and meta-RL, we propose Capability-Aware Shared Hypernetworks (CASH), a new soft weight-sharing architecture for heterogeneous coordination that uses hypernetworks to explicitly reason about continuous agent capabilities in addition to local observations.
Intuitively, CASH allows the team to learn shared decision-making strategies (captured by a shared encoder) that are readily adapted according to the team’s individual and collective capabilities (by a shared hypernetwork). Our design is agnostic to the underlying learning paradigm. We conducted detailed experiments across two heterogeneous coordination tasks and three standard learning paradigms (imitation learning, value-based reinforcement learning, and policy-gradient reinforcement learning). Results reveal that CASH generates appropriately diverse behaviors and consistently outperforms baseline architectures in task performance and sample efficiency during both training and zero-shot generalization. Notably, CASH achieves these improvements with only 20% to 40% of the learnable parameters used by baselines.
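    The core idea of a capability-aware hypernetwork can be sketched in a few lines: a shared encoder embeds each agent's local observation, and a shared hypernetwork maps the agent's continuous capability vector to the weights of a small decoder that turns that embedding into action scores. The single linear layers, dimensions, random initialization, and class name below are simplifying assumptions for illustration; CASH's actual encoder, hypernetwork, and GNN-based team reasoning are not reproduced here.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class CapabilityAwarePolicy:
    """Shared encoder plus a shared hypernetwork that generates per-agent decoder weights."""

    def __init__(self, obs_dim: int, cap_dim: int, hidden: int, n_actions: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden, self.n_actions = hidden, n_actions
        # Shared encoder: observation -> hidden embedding (one linear layer + ReLU).
        self.encoder = rand_matrix(hidden, obs_dim, rng)
        # Shared hypernetwork: capability vector -> flattened decoder weights (n_actions x hidden).
        self.hyper = rand_matrix(n_actions * hidden, cap_dim, rng)

    def decoder_weights(self, capabilities: Vector) -> Matrix:
        flat = matvec(self.hyper, capabilities)
        h = self.hidden
        return [flat[i * h:(i + 1) * h] for i in range(self.n_actions)]

    def action_scores(self, observation: Vector, capabilities: Vector) -> Vector:
        z = [max(0.0, v) for v in matvec(self.encoder, observation)]  # shared decision features
        return matvec(self.decoder_weights(capabilities), z)         # capability-adapted head


if __name__ == "__main__":
    policy = CapabilityAwarePolicy(obs_dim=4, cap_dim=2, hidden=8, n_actions=3)
    obs = [0.1, -0.2, 0.3, 0.0]
    fast_scout = [1.0, 0.1]    # hypothetical capabilities, e.g. speed and payload
    heavy_lifter = [0.2, 1.0]
    print(policy.action_scores(obs, fast_scout))
    print(policy.action_scores(obs, heavy_lifter))
```

    Because all parameters live in the shared encoder and hypernetwork, a new agent needs only its capability vector to obtain a policy head, which is what makes zero-shot use with unseen agents possible in principle.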
  • Item
    Deep Reinforcement Learning Framework for Autonomous Surface Vehicles in Environmental Cleanup
    (Georgia Institute of Technology, 2024-12-08) Ro, Junghwan
    Water pollution from floating plastics poses significant environmental threats that require efficient solutions. Autonomous surface vehicles (ASVs) present a promising solution to this challenge. However, deploying deep reinforcement learning (DRL) for ASV control in real-world environmental missions remains underexplored due to simulation limitations and the sim-to-real gap. This thesis presents a DRL framework for ASVs focused on environmental missions, explicitly targeting the autonomous collection of floating waste. An open-source, highly parallelized hydrodynamics and buoyancy simulation environment is developed to facilitate large-scale training. By integrating system identification with domain randomization, we reduce the sim-to-real gap, enhancing the robustness and energy efficiency of the trained agents. The proposed approach is validated through simulation and real-world experiments: task experiments show that it reduces energy consumption by 13.1% and task completion time by 7.4%. These findings, together with our open-source implementation, have the potential to improve the efficiency and versatility of ASVs and contribute to environmental preservation efforts. This thesis incorporates and expands on work previously published in a paper presented at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) in 2024. Significant portions of that content have been reused and adapted to fit the scope and depth required for the thesis.
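    Combining system identification with domain randomization usually means centering the simulator's physical parameters on identified values and perturbing them for each training environment. The sketch below samples one parameter set per parallel environment; the parameter names, nominal values, and ±20% spread are hypothetical placeholders, not the identified values or ranges used in the thesis.

```python
import random
from typing import Dict, List

# Hypothetical nominal values, standing in for the output of system identification.
NOMINAL_PARAMS: Dict[str, float] = {
    "mass_kg": 15.0,
    "linear_drag": 4.0,
    "thrust_gain": 1.2,
}


def randomize(nominal: Dict[str, float], spread: float, rng: random.Random) -> Dict[str, float]:
    """Scale each nominal parameter by a factor drawn uniformly from [1 - spread, 1 + spread]."""
    return {k: v * rng.uniform(1.0 - spread, 1.0 + spread) for k, v in nominal.items()}


def make_env_params(n_envs: int, spread: float = 0.2, seed: int = 0) -> List[Dict[str, float]]:
    """One independently randomized parameter set per parallel simulation environment."""
    rng = random.Random(seed)
    return [randomize(NOMINAL_PARAMS, spread, rng) for _ in range(n_envs)]


if __name__ == "__main__":
    for params in make_env_params(n_envs=3):
        print({k: round(v, 3) for k, v in params.items()})
```

    Anchoring the randomization on identified values keeps the training distribution close to the real vehicle while still exposing the policy to enough variation to tolerate modeling error.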
  • Item
    Traffic Sign Localization Using SfM and Deep Learning
    (Georgia Institute of Technology, 2024-12-08) Ho, Hoang Nhu
    This study addresses the challenge of traffic sign inventory management faced by the U.S. Department of Transportation in complying with Manual on Uniform Traffic Control Devices (MUTCD) standards. The study proposes a cost-effective methodology for geo-localizing traffic signs using smartphone-recorded video and GPS data. The approach employs several techniques, including deep-learning depth estimation models and Structure-from-Motion (SfM), to determine the geographic coordinates of roadside traffic signs. The methodology was tested in diverse environments, including challenging curved mountain roads and urban settings. SfM proved the most effective approach, localizing 90.91% of tested signs (40 of 44) in Pima County, Arizona, and 88.24% (90 of 102) on Peyton Road, Atlanta, Georgia, with a distance error below 4.9 meters. The remaining discrepancies were mainly caused by GPS inaccuracies rather than limitations of the method itself. These results show SfM to be a promising solution for efficient and accurate traffic sign geo-localization.
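    Reporting a distance error below 4.9 meters requires comparing each estimated sign position with its surveyed ground-truth coordinates. A common way to do this for latitude/longitude pairs is the haversine great-circle distance, sketched below together with the fraction of signs within a tolerance. The mean Earth radius value and the helper names are assumptions, and the thesis may compute errors differently (for example, in a projected coordinate system).

```python
import math
from typing import List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an approximation


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def fraction_within(estimates: List[LatLon], truths: List[LatLon], tol_m: float = 4.9) -> float:
    """Share of signs whose estimated position lies within tol_m of ground truth."""
    hits = sum(haversine_m(e, t) < tol_m for e, t in zip(estimates, truths))
    return hits / len(truths)


if __name__ == "__main__":
    # Hypothetical coordinates: the second estimate is roughly 11 m north of its true position.
    truths = [(33.7490, -84.3880), (33.7500, -84.3900)]
    estimates = [(33.74902, -84.38801), (33.75010, -84.3900)]
    print(fraction_within(estimates, truths))  # 0.5
```

    For reference, 40 / 44 ≈ 0.9091 and 90 / 102 ≈ 0.8824, which match the percentages reported in the abstract.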
  • Item
    Navigation to Objects: The Significance of Scene Realism and Moving towards Universal Navigation
    (Georgia Institute of Technology, 2024-12-08) Khanna, Mukul
    Recent years have brought considerable progress in embodied AI agents that navigate in realistic scenes, follow language instructions, find and rearrange objects, and perform other tasks involving embodied sensing, planning, and acting. This progress is bolstered by simulation platforms that enable systematic, safe, and scalable training and evaluation of embodied AI agents before deployment to the real world. However, despite the ubiquitous use of synthetic 3D scene datasets in embodied AI experiments, there has been no systematic analysis of the tradeoffs between dataset scale (number of scenes and total scene physical size) and dataset realism (visual fidelity and correlation to real-world statistics). Furthermore, the community has primarily focused on episodic navigation, in which an agent navigates to a single goal object per episode, specified through a single input modality (e.g., an object category label, a natural language description, or an image). In this thesis, we tackle these two shortcomings of prior work by 1) contributing a new dataset of high-quality, human-authored synthetic 3D scenes and a systematic analysis of scene dataset scale and realism for improving ObjectNav agent generalization, and 2) building novel universal navigation systems capable of handling various goal types, enabling more effective user interaction with robots. In Chapter 2, we contribute the Habitat Synthetic Scenes Dataset (HSSD-200), a dataset of 211 high-quality realistic 3D scenes and 18,656 models of real-world objects, and use it to test navigation agent generalization to realistic 3D environments. Specifically, we investigate the impact of synthetic 3D scene dataset scale and realism on the task of training embodied agents to find and navigate to objects (ObjectGoal navigation). We find that scale helps generalization, but the benefits quickly saturate, making visual fidelity and correlation to real-world scenes more important.
In Chapter 3 and Chapter 4, we move beyond single-goal episodic evaluation setups focusing on only one goal specification modality and explore universal navigation agents that are multi-modal and lifelong. Specifically, we introduce the GO to Any Thing (GOAT) task, a state-of-the-art modular system for universal navigation in the real world, and a benchmark named GOAT-Bench with a comprehensive analysis of modular and end-to-end trained methods with and without memory representations.
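    A universal navigation agent must accept goals in several modalities. One simple way to organize this, assuming an instance memory that stores each observed object's map position, category label, and an embedding from some shared vision-language encoder, is to match category goals by label and language or image goals by embedding similarity. The goal classes, memory layout, similarity threshold, and `locate` function below are hypothetical, and the encoder that would produce the embeddings is left out; this is a sketch of the idea, not the GOAT system.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple, Union


@dataclass
class CategoryGoal:
    category: str                      # e.g. "chair"


@dataclass
class LanguageGoal:
    embedding: Sequence[float]         # assumed text embedding of a description


@dataclass
class ImageGoal:
    embedding: Sequence[float]         # assumed image embedding of the goal photo


Goal = Union[CategoryGoal, LanguageGoal, ImageGoal]


@dataclass
class MemoryEntry:
    position: Tuple[float, float]      # 2D map coordinates of the observed instance
    category: str
    embedding: Sequence[float]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def locate(goal: Goal, memory: List[MemoryEntry], min_sim: float = 0.7) -> Optional[Tuple[float, float]]:
    """Return a remembered position for the goal, or None if the agent must explore."""
    if isinstance(goal, CategoryGoal):
        for entry in memory:
            if entry.category == goal.category:
                return entry.position
        return None
    # Language and image goals: pick the most similar remembered instance.
    best = max(memory, key=lambda e: _cosine(e.embedding, goal.embedding), default=None)
    if best is not None and _cosine(best.embedding, goal.embedding) >= min_sim:
        return best.position
    return None


if __name__ == "__main__":
    memory = [
        MemoryEntry((1.0, 2.0), "chair", [1.0, 0.0]),
        MemoryEntry((4.0, 0.5), "sofa", [0.0, 1.0]),
    ]
    print(locate(CategoryGoal("sofa"), memory))       # (4.0, 0.5)
    print(locate(LanguageGoal([0.9, 0.1]), memory))   # (1.0, 2.0)
    print(locate(ImageGoal([0.1, 0.95]), memory))     # (4.0, 0.5)
    print(locate(CategoryGoal("bed"), memory))        # None
```

    Because the memory persists across goals, a lifelong agent can answer later goals from earlier observations instead of re-exploring, which is the motivation for comparing methods with and without memory representations.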
  • Item
    CRAFT: Curriculum Rank Adversarial Fine-Tuning for Robust Vision Language Models
    (Georgia Institute of Technology, 2024-12-07) Chopra, Shivang
    Existing Vision-Language Models (VLMs) have demonstrated remarkable zero-shot performance across various visual domains and tasks. However, recent studies have shown that fine-tuning VLMs on downstream tasks results in a loss of generalization and decreased robustness against distribution shifts. To address this issue, we propose Curriculum Rank Adversarial Fine-Tuning (CRAFT), a unified low-rank fine-tuning framework designed to enhance both out-of-distribution (OOD) and adversarial robustness by integrating adaptive adversarial weight perturbations into a curriculum-driven Low-Rank Adaptation (LoRA) framework. CRAFT is grounded in three key insights: (1) constrained parameter updates preserve OOD generalization, (2) promoting a flat weight-loss landscape enhances OOD robustness, and (3) adversarial training with adaptive perturbation budgets mitigates catastrophic forgetting. By progressively increasing the rank of weight updates and perturbations over the course of fine-tuning, CRAFT balances task-specific adaptation with robustness, yielding flatter minima and enhanced OOD robustness. Through comprehensive empirical experiments, we demonstrate that CRAFT preserves the VLM's zero-shot abilities while adapting to specific tasks, outperforming state-of-the-art adversarial and robust fine-tuning approaches under both natural and adversarial distribution shifts. When fine-tuned on the DomainNet and ImageNet datasets, CRAFT shows state-of-the-art ID performance while improving average OOD performance by 12% and 10%, respectively, relative to the vanilla fine-tuning baseline.
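    Two ingredients CRAFT builds on, a low-rank weight update and a rank that grows over fine-tuning, can be sketched without any deep-learning framework. Below, a frozen weight matrix W is adapted as W + (alpha / r) · B A, where only the first r columns of B and rows of A are active, and r follows a linear curriculum from r_min to r_max. The linear schedule, the alpha / r scaling (the convention of the original LoRA formulation), and all function names are assumptions; the adaptive adversarial weight perturbations that distinguish CRAFT are not shown.

```python
from typing import List

Matrix = List[List[float]]


def rank_schedule(step: int, total_steps: int, r_min: int, r_max: int) -> int:
    """Linearly grow the active LoRA rank from r_min to r_max over training."""
    if total_steps <= 1:
        return r_max
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return r_min + round(frac * (r_max - r_min))


def lora_weight(W: Matrix, B: Matrix, A: Matrix, rank: int, alpha: float) -> Matrix:
    """Return W + (alpha / rank) * B[:, :rank] @ A[:rank, :].

    W is d x k (frozen), B is d x r_max, A is r_max x k (trainable).
    Only the first `rank` components of the low-rank factors are used.
    """
    d, k = len(W), len(W[0])
    scale = alpha / rank
    return [
        [W[i][j] + scale * sum(B[i][r] * A[r][j] for r in range(rank)) for j in range(k)]
        for i in range(d)
    ]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    B = [[1.0, 0.0], [0.0, 1.0]]   # d x r_max with r_max = 2
    A = [[0.5, 0.5], [0.0, 0.5]]   # r_max x k
    for step in (0, 5, 9):
        r = rank_schedule(step, total_steps=10, r_min=1, r_max=2)
        print(step, r, lora_weight(W, B, A, rank=r, alpha=1.0))
```

    Starting at a low rank keeps early updates tightly constrained, in line with the insight that constrained parameter updates preserve OOD generalization, while later steps gain capacity for task-specific adaptation.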
  • Item
    Automated Root Tracing Using Deep Learning
    (Georgia Institute of Technology, 2024-12-02) Lu, Cen
    Roots play a crucial role in plant development by anchoring plants, absorbing nutrients, and maintaining soil structure. Understanding root structures and dynamics is vital for ecological research and for assessing soil health. However, tracing roots in minirhizotron images is time-consuming, and deep learning techniques can speed up this process. This thesis applies the DeepLabV3+ model with a confidence-weighted approach to segment root structures in soil images. The methodology involves classifying images by root visibility, cropping images to focus on root regions, and training the DeepLabV3+ model, which employs atrous convolutions and an Atrous Spatial Pyramid Pooling (ASPP) module to capture multi-scale contextual information. The confidence method modulates the loss function with per-pixel confidence scores to handle ambiguous boundaries and low-resolution images. The confidence function decreases with distance from root boundaries and adapts to varying scales. This method was tested on multiple datasets from natural environments with varying soil types, including Mepibdeath, Ban Harol, Champenoux, and Hesse, which allowed an assessment of the robustness and generalization ability of the tested models. Evaluated with metrics such as Cohen’s kappa and R² for root surface and length, the results show that the confidence-weighted approach improves segmentation quality by reducing false positives but may miss weakly expressed roots. Future work should focus on enhancing model robustness and improving training data quality to better handle complex root structures and environmental noise.
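    A per-pixel confidence weighting of the loss can be sketched as a weighted binary cross-entropy. The abstract states only that the confidence decreases with distance from root boundaries and adapts to scale; the exponential form exp(-d / s), the scale parameter s, the normalization by the sum of weights, and the function names below are assumptions chosen to match that description, not the thesis's exact formula.

```python
import math
from typing import List

Grid = List[List[float]]


def confidence_map(boundary_dist: Grid, scale: float) -> Grid:
    """Hypothetical confidence: exp(-d / scale), decreasing with distance d from a root boundary."""
    return [[math.exp(-d / scale) for d in row] for row in boundary_dist]


def weighted_bce(pred: Grid, target: Grid, weight: Grid, eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged with per-pixel weights: sum(w * bce) / sum(w)."""
    num, den = 0.0, 0.0
    for p_row, t_row, w_row in zip(pred, target, weight):
        for p, t, w in zip(p_row, t_row, w_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            bce = -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            num += w * bce
            den += w
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    pred = [[0.9, 0.2], [0.6, 0.1]]     # predicted root probabilities
    target = [[1.0, 0.0], [1.0, 0.0]]   # annotated root mask
    dist = [[0.0, 3.0], [1.0, 6.0]]     # pixel distance to the nearest root boundary
    w = confidence_map(dist, scale=2.0)
    print(weighted_bce(pred, target, w))
```

    With uniform weights this reduces to ordinary mean binary cross-entropy; the scale parameter is the knob that would let the weighting adapt to roots of different widths or image resolutions.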
  • Item
    Automated Goal-mining of Information Security Requirements: Empirical Evaluation of Regulatory Harmony with Industry Standards
    (Georgia Institute of Technology, 2024-12-02) Moore, McKay
    This thesis investigates the challenges that disharmony among cybersecurity regulations and industry standards imposes on regulated entities. Disharmony, defined as the redundancies, gaps, and conflicts between requirements, creates burdens for compliance engineers and reduces the overall efficacy of information security measures. To address this challenge, the thesis develops and evaluates a method for automating the extraction and analysis of regulatory requirements using large language models (LLMs) with few-shot learning. This methodology enables the automated identification of regulatory expectations in Federal Trade Commission (FTC) enforcement orders and systematically compares them against the National Institute of Standards and Technology (NIST) Cybersecurity Framework (CSF). By automating goal-mining techniques, the thesis introduces a scalable approach for identifying regulatory disharmony. An empirical analysis of FTC orders illustrates the methodology, uncovering areas where FTC requirements deviate from or fail to align with NIST standards. This analysis reveals specific redundancies, gaps, and inconsistencies that complicate compliance efforts and hinder the effectiveness of security practices. The findings demonstrate that automating the mapping of regulatory expectations to industry standards reduces manual labor and improves clarity, offering a significant improvement over traditional methods.
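    Once requirements have been mined from FTC orders and mapped to NIST CSF categories, two of the three forms of disharmony named here, gaps and redundancies, reduce to simple set comparisons. The sketch below assumes the mapping has already been produced (in the thesis, with LLM assistance); the requirement IDs and the mapping are hypothetical examples, treating multiple requirements on one category as a redundancy signal is a simplification, and conflicts are omitted because detecting them requires comparing what requirements say, not just where they map.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_disharmony(
    mapping: Dict[str, Set[str]],   # requirement ID -> CSF categories it maps to
    csf_categories: Set[str],        # CSF categories under consideration
) -> Dict[str, List[str]]:
    """Derive simple gap and redundancy signals from a requirement-to-CSF mapping."""
    covered: Dict[str, List[str]] = defaultdict(list)
    unmapped: List[str] = []
    for req, cats in sorted(mapping.items()):
        if not cats:
            unmapped.append(req)           # requirement with no counterpart in the standard
        for cat in sorted(cats):
            covered[cat].append(req)
    return {
        "unmapped_requirements": unmapped,                                   # gap: regulation beyond the standard
        "uncovered_categories": sorted(csf_categories - set(covered)),      # gap: standard not reflected in orders
        "redundant_categories": sorted(c for c, reqs in covered.items() if len(reqs) > 1),
    }


if __name__ == "__main__":
    mapping = {  # hypothetical requirements mined from enforcement orders
        "REQ-1": {"PR.AC"},
        "REQ-2": {"PR.AC", "DE.CM"},
        "REQ-3": set(),
    }
    print(find_disharmony(mapping, {"ID.AM", "PR.AC", "DE.CM", "RS.CO"}))
```

    The expensive, judgment-heavy step is producing the mapping itself; once it exists, these comparisons are cheap, which is why automating the extraction and mapping is where a scalable approach pays off.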