Organizational Unit:

School of Interactive Computing

Permanent Link

https://hdl.handle.net/1853/70783

Parent Organization

Organizational Unit

College of Computing

ArchiveSpace Name Record

https://finding-aids.library.gatech.edu/agents/corporate_entities/1113

Publication Search Results

Realistic Mobile Manipulation Tasks for Evaluating Home-Assistant Robots

(Georgia Institute of Technology, 2023-12-14) Yenamandra, Sriram Venkata

By assisting in household chores, robotic home assistants hold the potential to significantly enhance the quality of human lives. Mobile manipulation tasks can serve as test beds for evaluating the capabilities essential to the development of robotic home assistants: perception, language understanding, navigation, manipulation, and common-sense reasoning. However, it is imperative to use settings that closely resemble real-world deployment to ensure that progress made on these tasks is practically relevant. The thesis introduces three tasks, namely HomeRobot: Open Vocabulary Mobile Manipulation (OVMM), "GO To Any Thing" (GOAT) and Housekeep, to realize the different dimensions of realism critical for evaluating embodied agents: 1) autonomy, the ability to operate without very specific instructions (e.g., the precise locations of goal objects), 2) exposure to realistic novel multi-room environments, 3) working with previously unseen objects, and 4) extended durations of deployment. Further, the thesis proposes baselines for each task, which succeed in solving each task to a varying degree. The shortcomings of these baselines underscore the open challenges of open-vocabulary object detection and common-sense reasoning. By using test scenarios closer to real-world deployment, this work attempts to advance research in the development of robotic assistants.
Information Extraction on Scientific Literature under Limited Supervision

(Georgia Institute of Technology, 2023-12-12) Bai, Fan

The exponential growth of scientific literature presents both challenges and opportunities for researchers across various disciplines. Effectively extracting pertinent information from this extensive corpus is crucial for advancing knowledge, enhancing collaboration, and driving innovation. However, manual extraction is a laborious and time-consuming process, underscoring the demand for automated solutions. Information extraction (IE), a sub-field of natural language processing (NLP) focused on automatically extracting structured information from unstructured data sources, plays a crucial role in addressing this challenge. Despite their success, many IE methods require substantial human-annotated data, which might not be easily accessible, particularly in specialized scientific domains. This highlights the need for adaptable and robust techniques capable of functioning with limited supervision. In this thesis, we study the task of information extraction on scientific literature, particularly addressing the challenge of limited (human) supervision. Specifically, our work has delved into four key dimensions of this problem. First, we explore the potential of harnessing easily accessible resources, like knowledge bases, to develop IE systems without direct human supervision. Second, we examine the use of pre-trained language models to create effective and efficient scientific IE systems, experimenting with various fine-tuning architectures and learning strategies. Third, we investigate the balance between the labor expenditure of human annotation and the computational cost of domain-specific pre-training, to achieve optimal performance under budget constraints. Lastly, we capitalize on the emerging capabilities of large pre-trained language models by showcasing how information extraction can be achieved solely based on a human-crafted data schema.
Through these explorations, this thesis aims to lay a solid foundation for the continued advancement of scientific IE under limited supervision.
Lifelong Machine Learning without Lifelong Data Retention

(Georgia Institute of Technology, 2023-12-10) Smith, James Seale

Machine learning models suffer from a phenomenon known as catastrophic forgetting when learning novel concepts from continuously shifting training data. Typical solutions for this continual learning problem require extensive replay of previously seen data, which increases memory costs and may violate data privacy. To address these challenges, we first explore replacing this replay data with alternatives: (i) unlabeled data “from the wild” and (ii) synthetic data generated via model inversion. Our work using this alternative replay data boasts strong performance on replay-free continual learning for image classification. Next, we consider an alternative solution to entirely replace replay data: pre-training. Specifically, we leverage strongly pre-trained models and continuously edit them with prompts and low-rank adapters for both (i) image classification and (ii) natural-language visual reasoning. Finally, we extend the idea of continual learning using pre-trained models to the proposed setting of continual customization of text-to-image diffusion models. We hope that our work on enabling models to learn from evolving data distributions and adapt to new tasks will help unlock the full potential of machine learning in addressing emerging real-world challenges.
Machine Learning for Agile Robotic Control

(Georgia Institute of Technology, 2023-12-06) Wagener, Nolan C.

Roboticists typically exploit structure in a problem, such as by modeling the mechanics of a system, to generate solutions for a given task. However, this structure can limit flexibility and require practitioners to reason about challenging phenomena, such as contacts in mechanics. Data, conversely, provides much more flexibility and, when combined with deep neural networks, has given rise to powerful models in vision and language, all with little hand-engineered structure. While it is tempting to fully forgo structure in favor of learning-based methods for robotics, we show how data and learning can be gracefully incorporated in a structured way. In particular, we focus on the control setting, and we demonstrate that robotic control offers a variety of ways in which data can be utilized. First, we show that data can be used in a model-based fashion to train a neural network that approximates complex dynamics and that can be used within a model predictive controller (MPC). Then, we show that the MPC process is itself an instance of online learning and demonstrate how to synthesize MPC algorithms from a common online learning algorithm. We apply both of these approaches to a real-world aggressive driving task and show that they can accomplish the task. Next, we consider the safe reinforcement learning problem and show that safety interventions can be used as a learning signal so that an agent learns to become safe without needing to execute unsafe actions in the environment. Finally, we consider the simulated humanoid domain and show that pre-collected human motions can act as a strong inductive bias to ground motions learned by the humanoid agent.
Leveraging Low-Dimensional Geometry for Search and Ranking

(Georgia Institute of Technology, 2023-12-06) Fenu, Stefano

There is a substantial body of work on search and ranking in computer science, but less attention has been paid to the question of how to learn geometric data representations that are amenable to search and ranking tasks. Index-based data structures for search are commonplace, but these discard structural features of the data, often have large memory profiles, and scale poorly with data dimension. Geometric search techniques do exist, but few analogous search data structures or preprocessing algorithms exist that leverage spatial structure in data to increase search performance. The aim of the research detailed here is to show that leveraging low-dimensional geometry can improve the performance of search and ranking over index-only methods, and that there are dimensionality reduction techniques that can make spatial search algorithms more effective without any additional memory overhead. This work accomplishes these aims by developing methods for learning low-dimensional coordinate embeddings explicitly for the purpose of search and ranking, and for actively querying and constructing searchable embeddings to minimize user-labeling costs. This dissertation will further provide scalable versions of these algorithms and demonstrate their effectiveness across a broad range of problem domains including visual, text, and educational data. These performance improvements will allow human-in-the-loop search of larger datasets and enable new applications in preference search and ranking.
Controllability and Uncertainty in Generative Models

(Georgia Institute of Technology, 2023-12-06) Ham, Cusuh

This dissertation describes methods for enhancing generative models with either added controllability or expressiveness of uncertainty, demonstrating how a strong prior enables both features. One general approach is to introduce new architectures or training objectives. However, current trends towards massive upscaling of model size, training data, and computational resources can make retraining or fine-tuning difficult and expensive. Thus, another approach is to build upon existing pre-trained models. We consider both types of approaches with an emphasis on the latter. We first tackle the tasks of controllable image synthesis and uncertainty estimation through training-based methods and then switch focus towards computationally efficient methods that do not require direct updates to the base model's parameters. We conclude by discussing future directions based on the insights from our findings.
Robotics in the Era of Vision-Language Foundation Models

(Georgia Institute of Technology, 2023-11-29) Kira, Zsolt
Foundation Models for Robotics

(Georgia Institute of Technology, 2023-11-29) Garg, Animesh
Processes and outcomes of systems thinking in an interactive modeling environment

(Georgia Institute of Technology, 2023-09-06) An, Sungeun

Modern society is full of natural, social, technological, and socio-technical systems, and thus systems thinking is an essential skill for prospering in modern society. Developing interactive environments for supporting learning about complex systems requires a robust understanding of how learners engage in systems thinking in various learning contexts. In this interdisciplinary work, I use theories and techniques from cognitive science, learning science, and artificial intelligence to develop an understanding of processes and outcomes of systems thinking for college students in pedagogical learning contexts and unspecified learners in self-directed learning contexts in the domain of ecology. To achieve this goal, I present the Virtual Experimentation Research Assistant (VERA; vera.cc.gatech.edu), an interactive modeling environment to promote understanding and reasoning about ecological systems. VERA enables learners to access large-scale biological knowledge from the Encyclopedia of Life (EOL), construct conceptual models of ecological systems, run agent-based simulations of these models, and revise the models and simulations as needed to explain ecological phenomena. I have used VERA to complete four studies. The first two field studies were conducted in pedagogical contexts. The first study explored the effects of modeling in acquiring domain knowledge. I found that engaging in ecological modeling using VERA helped college students acquire biological knowledge. I also found that access to large-scale domain knowledge helped them construct more complex models and develop a larger number of hypotheses for a given problem. The second study investigated college students’ behaviors in estimating the parameters for agent-based simulations. I discovered that college students use multiple cognitive strategies for parameter estimation such as systematic search, problem reduction/decomposition, and global/local search.
VERA is now accessible through the Smithsonian Institution’s EOL website (eol.org), and it is used by thousands of self-directed learners around the world. The third study conducted a fine-grained analysis of self-directed learners' behaviors and models outside pedagogical contexts. I used a variety of learning analytics methods to analyze these behaviors, including sequential data mining, hierarchical clustering, and Markov chain models. I found that self-directed learners engage in three types of behaviors: observation, construction, and exploration. The fourth study explored the effects of guided learning and self-exploration on modeling behaviors, model quality, and transfer of learning in a pedagogical context. Using in situ A/B experiments, I found that self-exploration in systems thinking leads to more complex and varied models, whereas guidance in systems thinking does not have significant benefits in efficiency and accuracy for transfer of learning. Together, these four studies lead to a robust understanding of how adult students learn about systems thinking and how to design interactive modeling environments to support self-directed systems thinking in open and ill-defined problems.
Building and Evaluating Controllable Models for Text Simplification

(Georgia Institute of Technology, 2023-08-17) Maddela, Mounica

Automatic Text Simplification (ATS) aims to improve the readability of texts with simpler grammar and word choices while preserving meaning. ATS is generally treated as a monolingual translation task where the input is a piece of text and the output is a simplified version of the input. One major drawback of the existing methods for ATS is the lack of controllability. ATS is an audience-dependent task, and what constitutes simplified text for one group of users may not be acceptable for other groups. An ideal ATS system should be able to control various attributes of the generated text such as syntactic structures, length, readability levels, and word choices. Meanwhile, evaluating ATS systems is as important as building them because efficient automatic evaluation frameworks can accelerate the process of improving existing systems. However, the current automatic evaluation metrics for ATS focus on the semantic content of the simplified text but not the writing style. These metrics tend to favor conservative systems that make minimal changes to the input and inaccurately penalize simplifications that paraphrase the input. An ideal evaluation metric for ATS should capture not only simplification quality but also the different styles of simplification. In this dissertation, I develop controllable simplification systems and diverse automatic metrics for ATS. I propose two controllable approaches for ATS: a sentence simplification system that combines linguistic rules with Transformer models to generate simplified sentences at different readability levels and a lexical simplification system that leverages human judgments of word complexity to replace complex words with simpler phrases. Finally, I propose the first supervised automatic evaluation metric for ATS, LENS, which can capture multiple simplification styles and outperforms the existing metrics in evaluating diverse simplification systems.
To train and evaluate LENS, I create SIMPEval, a new training and evaluation dataset for metrics that incorporates different types of simplification operations.