Organizational Unit:
School of Interactive Computing

Publication Search Results

Now showing 1 - 10 of 161
  • Item
    An analysis of supports and barriers to offering computer science in Georgia public high schools
    (Georgia Institute of Technology, 2019-11-08) Parker, Miranda C.
    There is a growing international movement to provide every child access to high-quality computing education. Despite the widespread effort, most children in the US do not take any computing classes in primary or secondary schools. There are many factors that principals and districts must consider when determining whether to offer CS courses. The process through which school officials make these decisions, and the supports and barriers they face in the process, is not well understood. Once we understand these supports and barriers, we can better design and implement policy to provide CS for all. In my thesis, I study public high schools in the state of Georgia and the supports and barriers that affect offerings of CS courses. I quantitatively model school- and county-level factors and the impact these factors have on CS enrollment and offerings. The best regression models include prior CS enrollment or offerings, implying that CS is likely sustainable once a class is offered. However, large unexplained variances persist in the regression models. To help explain this variance, I selected four high schools and interviewed principals, counselors, and teachers about what helps or hurts their decisions to offer a CS course. I built case studies around each school to explore the structural and people-oriented themes the participants discussed. Difficulty in hiring and retaining qualified teachers in CS was one major theme. I frame the case studies using diffusion of innovations theory, providing additional insight into what attributes support a school's decision to offer a CS course. The qualitative themes gathered from the case studies and the quantitative factors used in the regression models inform a theory of supports and barriers to CS course offerings in high schools in Georgia. This understanding can influence future educational policy decisions around CS education and provide a foundation for future work on schools and CS access.
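A minimal illustration of the kind of school-level regression modeling the abstract describes, using synthetic data and hypothetical predictor names (prior CS offering, enrollment, percent free lunch); it is not the dissertation's actual model or dataset, only a sketch of how prior offerings can dominate such a fit.

```python
# Hypothetical sketch: logistic regression predicting whether a school offers CS
# from prior offerings and school-level factors. Columns and data are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_schools = 200

prior_cs_offering = rng.integers(0, 2, n_schools)   # offered CS last year? (assumed predictor)
enrollment = rng.normal(1200, 400, n_schools)        # school size (assumed predictor)
pct_free_lunch = rng.uniform(0, 100, n_schools)      # socioeconomic proxy (assumed predictor)

X = np.column_stack([prior_cs_offering, enrollment, pct_free_lunch])
# Synthetic outcome that leans heavily on prior offerings, mirroring the finding
# that prior CS offerings are the strongest predictor in the best models.
logits = 2.5 * prior_cs_offering + 0.0005 * enrollment - 0.01 * pct_free_lunch - 1.0
y = rng.random(n_schools) < 1 / (1 + np.exp(-logits))

model = LogisticRegression().fit(X, y)
print(dict(zip(["prior_cs", "enrollment", "pct_free_lunch"], model.coef_[0])))
```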
  • Item
    Towards natural human-AI interactions in vision and language
    (Georgia Institute of Technology, 2019-11-07) Chandrasekaran, Arjun
    Inter-human interaction is a rich form of communication. Human interactions typically leverage a good theory of mind and involve pragmatics, story-telling, humor, sarcasm, empathy, sympathy, etc. Recently, we have seen a tremendous increase in the frequency of, and the modalities through which, humans interact with AI. Despite this, current human-AI interactions lack many of the features that characterize inter-human interactions. Towards the goal of developing AI that can interact with humans naturally (as other humans do), I take a two-pronged approach that involves investigating the ways in which both the AI and the human can adapt to each other's characteristics and capabilities. In my research, I study aspects of human interactions, such as humor, story-telling, and humans' ability to understand and collaborate with an AI. Specifically, in the vision and language modalities: (1) in an effort to improve the AI's ability to adapt its interactions to a human, we build computational models for (i) humor manifested in static images, (ii) contextual, multi-modal humor, and (iii) temporal understanding of the elements of a story; (2) in an effort to improve the capabilities of a collaborative human-AI team, we study (i) a lay person's predictions regarding the behavior of an AI in a situation, and (ii) the extent to which interpretable explanations from an AI can improve the performance of a human-AI team. Through this work, I demonstrate that aspects of human interactions (such as certain forms of humor and story-telling) can be modeled with reasonable success using computational models that utilize neural networks. On the other hand, I also show that a lay person can successfully predict the outputs and failures of a deep neural network. Finally, I present evidence suggesting that a lay person who has access to interpretable explanations from the model can collaborate more effectively with a neural network on a goal-driven task.
  • Item
    Map-centric visual data association across seasons in a natural environment
    (Georgia Institute of Technology, 2019-11-01) Griffith, Shane David
    Vision is one of the primary sensory modalities of animals and robots, yet among robots it still has limited power in natural environments. Dynamic processes of Nature continuously change how an environment looks, which work against appearance-based methods for visual data association. As a robot is deployed again and again, the possibility of finding correspondences diminishes between surveys increasingly separated in time. This is a major limitation for intelligent systems targeted at precision agriculture, search and rescue, and environment monitoring. New approaches to data association may be necessary to overcome the variation in appearance of natural environments. This dissertation presents success with a map-centric approach, which builds on 3D vision to achieve visual data association across seasons. It first presents the new Symphony Lake Dataset, which consists of fortnightly visual surveys of a 1.3 km lakeshore captured from an autonomous surface vehicle over three years. It then establishes dense correspondence as a technique both to provide robust visual data association and to eliminate the variation in viewpoint between surveys. Given a consistent map and localized poses, visual data association across seasons is achieved by integrating map point priors and geometric constraints into the dense correspondence image alignment optimization. This algorithm is called Reprojection Flow. This dissertation presents the first work to see through the variation in appearance across seasons in a natural environment using map point priors and localized poses. The variation in appearance had a minimal effect on dense correspondence when anchored by accurate map points. Up to 37 surveys were transformed into year-long time-lapses at the scenes where their maps were consistent. This indicates that, at a time when frequent advancements are being made towards robust visual data association, the spatial information in a map may be able to close the gap in the hard cases that have persisted between observations.
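A minimal sketch, under assumed poses and intrinsics, of the map-centric idea behind Reprojection Flow: reprojecting shared 3D map points into two localized survey images yields cross-season correspondence priors that can anchor dense alignment. All numbers and names below are illustrative, not the dissertation's implementation.

```python
# Illustrative only: project shared map points into two localized survey views
# to obtain viewpoint-invariant pixel-to-pixel correspondence priors.
import numpy as np

def project(points_w, R, t, K):
    """Pinhole projection of Nx3 world points into an image with pose (R, t)."""
    cam = (R @ points_w.T + t[:, None]).T     # world -> camera frame
    uv = (K @ cam.T).T                         # camera -> homogeneous pixels
    return uv[:, :2] / uv[:, 2:3]

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])   # assumed intrinsics
map_points = np.random.rand(50, 3) * [10, 2, 1] + [0, 0, 5]    # shared 3D map (toy)

# Localized poses of the same scene in a summer survey and a winter survey.
R_summer, t_summer = np.eye(3), np.zeros(3)
R_winter, t_winter = np.eye(3), np.array([0.3, 0.0, 0.0])

# Each map point gives a correspondence prior across seasons, which can then
# anchor the dense correspondence image alignment optimization.
priors = list(zip(project(map_points, R_summer, t_summer, K),
                  project(map_points, R_winter, t_winter, K)))
print(priors[0])
```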
  • Item
    Managing learning interactions for collaborative robot learning
    (Georgia Institute of Technology, 2019-09-11) Bullard, Kalesha
    Robotic assistants should be able to actively engage their human partner(s) to generalize knowledge about relevant tasks within their shared environment. Yet a key challenge is that not all human partners will be proficient at teaching; furthermore, humans should not be held accountable for tracking a robot’s knowledge over time in a dynamically changing environment, across multiple tasks. Thus, it is important to enable these interactive robots to characterize their own uncertainty and to equip them with an information-gathering policy for asking the appropriate questions of their human partners to resolve that uncertainty. In this way, the robot shares the responsibility for guiding its own learning process and is a collaborator in the learning. Additionally, given that the robot requires some tutelage from its partner, awareness of constraints on the teacher’s time and the cognitive resources available to devote to the interaction could help the agent use the allotted time more wisely. This thesis examines the problem of enabling a robotic agent to leverage structured interaction with a human partner for acquiring concepts relevant to a task it must later perform. To equip the agent with the desired concept knowledge, we first explore the paradigm of Learning from Demonstration for the acquisition of (1) training instances as examples of task-relevant concepts and (2) informative features for appropriately representing and discriminating between task-relevant concepts. Given empirical evidence that a human partner can be helpful to the agent in solving the concept learning problem, we subsequently investigate the design of algorithms that enable the robot learner to autonomously manage interaction with its human partner, using a questioning policy to actively gather both instance and feature information. This thesis seeks to investigate the following hypothesis: In the context of robot learning from human demonstrations in changeable and resource-constrained environments, enabling the robot to actively elicit multiple types of information through questions, and to reason about what question to ask and when, leads to improved learning performance.
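A small, hypothetical sketch of an uncertainty-driven questioning policy of the sort the abstract motivates: the learner asks about the instance it is least certain of, subject to a budget that stands in for the teacher's limited time. The entropy criterion and budget mechanism here are assumptions, not the thesis's algorithm.

```python
# Illustrative questioning policy: ask about the most uncertain unlabeled
# instance, or stay silent once the interaction budget is exhausted.
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def choose_question(class_probs, questions_asked, budget):
    """class_probs[i] = learner's current belief over concept labels for instance i."""
    if questions_asked >= budget:
        return None                              # respect the teacher's limited time
    return int(np.argmax(entropy(class_probs)))  # most uncertain instance to ask about

class_probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]])
print(choose_question(class_probs, questions_asked=0, budget=5))  # -> 1 (highest entropy)
```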
  • Item
    Visual question answering and beyond
    (Georgia Institute of Technology, 2019-09-03) Agrawal, Aishwarya
    In this dissertation, I propose and study a multi-modal Artificial Intelligence (AI) task called Visual Question Answering (VQA) -- given an image and a natural language question about the image (e.g., "What kind of store is this?", "Is it safe to cross the street?"), the machine's task is to automatically produce an accurate natural language answer ("bakery", "yes"). Applications of VQA include -- aiding visually impaired users in understanding their surroundings, aiding analysts in examining large quantities of surveillance data, teaching children through interactive demos, interacting with personal AI assistants, and making visual social media content more accessible. Specifically, I study the following -- 1) how to create a large-scale dataset and define evaluation metrics for free-form and open-ended VQA, 2) how to develop techniques for characterizing the behavior of VQA models, and 3) how to build VQA models that are less driven by language biases in training data and are more visually grounded, by proposing -- a) a new evaluation protocol, b) a new model architecture, and c) a novel objective function. Most of my past work has been towards building agents that can "see" and "talk". However, for a lot of practical applications (e.g., physical agents navigating inside our houses executing natural language commands) we need agents that can not only "see" and "talk" but can also take actions. In chapter 6, I present future directions towards generalizing vision and language agents to be able to take actions.
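A minimal PyTorch sketch of the VQA task setup: fuse image features with a question encoding and score a fixed answer vocabulary. This is a generic illustration of the problem formulation, not the dissertation's dataset, evaluation metrics, or proposed architecture.

```python
# Toy VQA-style model: image features + bag-of-words question -> answer scores.
import torch
import torch.nn as nn

class TinyVQA(nn.Module):
    def __init__(self, img_dim=2048, vocab=1000, q_dim=256, n_answers=3000):
        super().__init__()
        self.q_embed = nn.EmbeddingBag(vocab, q_dim)      # simple question encoder
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + q_dim, 512), nn.ReLU(),
            nn.Linear(512, n_answers),                     # scores over answer vocabulary
        )

    def forward(self, img_feat, question_tokens):
        q = self.q_embed(question_tokens)
        return self.fuse(torch.cat([img_feat, q], dim=1))

model = TinyVQA()
img_feat = torch.randn(1, 2048)              # e.g. pooled CNN features (placeholder)
question = torch.tensor([[12, 57, 903]])     # token ids for a question (placeholder)
answer_scores = model(img_feat, question)
print(answer_scores.argmax(dim=1))           # index of the predicted answer
```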
  • Item
    Visual dense three-dimensional motion estimation in the wild
    (Georgia Institute of Technology, 2019-08-19) Lv, Zhaoyang
    One of the most fundamental abilities of the human perception system is to seamlessly sense the changing 3D world from our ego-centric visual observations. Driven by modern applications in robotics, autonomous driving, and mixed reality, machine perception requires a precise dense representation of 3D motion with low latency. In this thesis, we focus on the task of estimating absolute 3D motion in world coordinates in unconstrained environments, observed from ego-centric visual information only. The goal is a fast algorithm that produces an accurate, dense representation of the rich 3D motion. To achieve this goal, I investigate the problem from four perspectives with the following contributions. 1) Present a fast and accurate continuous optimization approach that solves for scene motion over planar segments fixed a priori. 2) Present a learning-based approach that recovers dense scene flow from egocentric motion and optical flow, decomposed by a novel data-driven rigidity prediction. 3) Present a modern synthesis of the classic inverse compositional method for 3D rigid motion estimation using dense image alignment. 4) Present a two-view monocular scene flow approach that recovers the depth, camera motion, and 3D scene motion of rigidly moving scenes.
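A hedged sketch of the decomposition behind contribution 2: given depth, optical flow, and ego-motion for one pixel, the residual left after removing the camera-induced motion is that point's 3D scene-flow vector. All quantities below are illustrative placeholders, not values from the thesis.

```python
# Toy per-pixel scene-flow recovery from depth, optical flow, and ego-motion.
import numpy as np

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # assumed intrinsics
K_inv = np.linalg.inv(K)

def backproject(u, v, depth):
    """Lift a pixel with known depth to a 3D point in the camera frame."""
    return depth * (K_inv @ np.array([u, v, 1.0]))

# Pixel (u, v) at time t, its depth, its optical-flow match at t+1, and that depth.
u, v, depth_t = 400.0, 250.0, 8.0
flow = np.array([6.0, -2.0])
depth_t1 = 7.9

# Ego-motion from frame t to t+1 (rotation R_ego, translation t_ego).
R_ego, t_ego = np.eye(3), np.array([0.0, 0.0, 0.5])

p_t = backproject(u, v, depth_t)                        # point at time t
p_t1 = backproject(u + flow[0], v + flow[1], depth_t1)  # matched point at t+1
p_t_in_t1 = R_ego @ p_t + t_ego                         # where a static point would land

scene_flow = p_t1 - p_t_in_t1   # ~zero for rigid background, non-zero for moving objects
print(scene_flow)
```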
  • Item
    Improvisational artificial intelligence for embodied co-creativity
    (Georgia Institute of Technology, 2019-08-14) Jacob, Mikhail
    This dissertation explores embodied agents that can improvise with people in an object-based gestural proto-narrative domain. I study the improvisational action selection problem (the challenge of performing action selection in an open-ended, ill-defined problem space in near real-time, based on the agent’s knowledge and the improvisational context, in order to avoid incoherent behavior, decision paralysis, and unexpressive responses) and how to address it within the Robot Improv Circus interactive virtual reality (VR) installation and the CARNIVAL agent architecture. The CARNIVAL architecture uses affordance-based action variant generation, improvisational response strategies, and computational evaluation of the creativity of perceived or generated actions to perform creative arc negotiation, addressing the improvisational action selection problem. Creative arc negotiation is the process of selecting actions over time to follow a given creative arc, i.e., a continuous target trajectory for generated responses through an agent’s creative space (consisting of novelty, surprise, and value). My thesis statement is that “embodied agents that address the improvisational action selection problem using ‘creative arc negotiation’ increase perceptions of enjoyment, agent creativity, and coherence in both observers and participants while performing movement improv with non-experts.” My research found that embodied agents addressing the improvisational action selection problem using creative arc negotiation can perform movement improv with non-experts such that perceptions of agent creativity and coherence conclusively increase for both VR participants and audience members. However, perceptions of enjoyment conclusively increase only for observers. More study is required to show a conclusive increase in enjoyment for VR participants of the installation.
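A toy sketch of the core selection step in creative arc negotiation as the abstract defines it: choose the candidate action whose estimated (novelty, surprise, value) lies closest to the arc's current target. The candidate actions and their scores are invented for illustration; CARNIVAL's actual generation and evaluation pipeline is not reproduced here.

```python
# Illustrative creative-arc selection: pick the action nearest to the target
# point in (novelty, surprise, value) space at this time step.
import numpy as np

def negotiate_step(candidates, target):
    """candidates: action -> (novelty, surprise, value) estimates; target: arc point."""
    target = np.asarray(target)
    return min(candidates,
               key=lambda a: np.linalg.norm(np.asarray(candidates[a]) - target))

candidates = {                                   # hypothetical gesture variants
    "mirror_partner_gesture": (0.2, 0.1, 0.8),
    "exaggerate_gesture":     (0.5, 0.6, 0.7),
    "introduce_new_prop":     (0.9, 0.8, 0.4),
}
# Early in an arc the target may favor low novelty; later, high novelty.
print(negotiate_step(candidates, target=(0.3, 0.2, 0.8)))   # -> mirror_partner_gesture
print(negotiate_step(candidates, target=(0.9, 0.9, 0.5)))   # -> introduce_new_prop
```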
  • Item
    Autonomous rally racing with AutoRally and model predictive control
    (Georgia Institute of Technology, 2019-07-30) Goldfain, Brian
    The ability to conduct experiments in the real world is a critical step for roboticists working to create autonomous systems that achieve human-level task performance. Self-driving vehicles are a domain that has received significant attention in recent years, in part because of their potential societal benefit. However, there is still a significant performance gap between human drivers and self-driving vehicles. Off-road rally racing is an especially difficult driving task in which many of the unsolved challenges occur frequently. This thesis opens the domain of autonomous rally racing to researchers and conducts the first rally race between autonomous and human drivers. We created the AutoRally platform, a robust, scaled self-driving vehicle, and demonstrated AutoRally driving at high speed on a dirt track under the model predictive path integral controller. The controller optimizes control plans on-the-fly onboard the robot using a dynamics model learned from data and a hand-coded task description, also called a cost function. To enable rally racing, an additional layer of cost function optimization, which operates on the time scale of lap times, was created to replace the hand-coded cost function with one adapted through interactions with the system. We explore representations and optimization methods for the racing cost function, and then compare the driving performance of human and autonomous drivers using the AutoRally platform at the Georgia Tech Autonomous Racing Facility.
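A compact sketch of a model predictive path integral (MPPI) update of the general kind used to drive AutoRally: sample perturbed control sequences, roll them out through a dynamics model, and re-weight the nominal plan by exponentiated trajectory cost. The toy dynamics and cost below are stand-ins, not the learned AutoRally models or the racing cost function.

```python
# Illustrative MPPI update over a toy point-mass system.
import numpy as np

def mppi_step(u_nominal, dynamics, cost, x0, n_samples=256, sigma=0.3, lam=1.0):
    horizon, u_dim = u_nominal.shape
    noise = np.random.randn(n_samples, horizon, u_dim) * sigma   # control perturbations
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        x = x0.copy()
        for t in range(horizon):
            x = dynamics(x, u_nominal[t] + noise[k, t])           # roll out the model
            costs[k] += cost(x)
    weights = np.exp(-(costs - costs.min()) / lam)                # exponentiated costs
    weights /= weights.sum()
    return u_nominal + np.einsum("k,kti->ti", weights, noise)     # weighted plan update

dynamics = lambda x, u: x + 0.1 * u          # toy dynamics (stand-in for learned model)
cost = lambda x: float(x @ x)                 # toy cost pulling the state to the origin
u = mppi_step(np.zeros((20, 2)), dynamics, cost, x0=np.array([1.0, -1.0]))
print(u[0])   # first control of the updated plan, applied before re-planning
```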
  • Item
    Combinational machine learning creativity
    (Georgia Institute of Technology, 2019-07-24) Guzdial, Matthew James
    Computational creativity is a field focused on the study and development of behaviors in computers that an observer would deem creative. Traditionally, it has relied upon rules-based and search-based artificial intelligence. However, these types of artificial intelligence rely on human-authored knowledge that can obfuscate whether creative behavior arose due to actions from an AI agent or its developer. In this dissertation I instead apply machine learning to a subset of computational creativity problems. This particular area of research is called combinational creativity. Combinational creativity is the type of creativity people employ when they create new knowledge by recombining elements of existing knowledge. This dissertation examines the problem of combining combinational creativity and machine learning in two primary domains: video game design and image classification. Towards the goal of creating novel video game designs, I describe a machine-learning approach that learns a model of video game level design and rules from gameplay video, validating the accuracy of these models with a human subject study and an automated gameplaying agent, respectively. I then introduce a novel combinational creativity approach I call conceptual expansion, designed to work with machine-learned knowledge and models by default. I demonstrate conceptual expansion’s utility and limitations across both domains, through the creation of novel video games and its application in a transfer learning framework for image classification. This dissertation seeks to validate the following hypothesis: for creativity problems that require the combination of aspects of distinct examples, conceptual expansion of generative or evaluative models can create a greater range of artifacts or behaviors, with greater measures of value, surprise, and novelty, than standard combinational approaches or approaches that do not explicitly model combination.
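A toy sketch of combinational creation over machine-learned knowledge in the spirit of conceptual expansion: a candidate new concept is a coefficient-weighted combination of learned component representations, with a search procedure (not shown) tuning the coefficients against value, surprise, and novelty. The feature vectors, names, and coefficients below are illustrative assumptions, not the dissertation's learned models.

```python
# Illustrative combination of learned concept representations into a new concept.
import numpy as np

def conceptual_expansion(components, alphas):
    """Combine learned concept representations using per-component coefficients."""
    return sum(a * c for a, c in zip(alphas, components))

# Pretend these were learned from gameplay video of two existing game entities.
goomba_features = np.array([0.9, 0.1, 0.3])   # e.g. movement, flying, hazard (toy)
koopa_features  = np.array([0.6, 0.7, 0.5])

# A candidate new enemy as an equal-weight combination of the two.
new_enemy = conceptual_expansion([goomba_features, koopa_features], [0.5, 0.5])
print(new_enemy)
```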
  • Item
    Human-centered algorithms and ethical practices to understand deviant mental health behaviors in online communities
    (Georgia Institute of Technology, 2019-07-19) Chancellor, Stevie
    Social media has changed how individuals cope with health challenges in complex ways. In some mental health communities, individuals promote deliberate self-injury, disordered eating habits, and suicidal ideas as acceptable choices rather than dangerous actions. In particular, I study focuses on the pro-eating disorder (pro-ED) community, a clandestine group that advocates for eating disorders as lifestyle choices rather than a dangerous and potentially life-threatening mental illnesses. This thesis develops human-centered algorithmic approaches to understand these deviant and dangerous behaviors on social media. Using large-scale social media datasets and techniques like machine learning, computational linguistics, and statistical modeling, I analyze and understand patterns of behavior in pro-ED communities, how they interact with others on the platform, and these latent impacts. Through eight empirical examinations and an analytical essay, I demonstrate that computational approaches can identify pro-ED and related behaviors on social media as well as documenting larger-scale community and platform changes and interactions with dangerous content. I also consider the impacts that methods, ethics, and practices of conducting this work have on these communities. In sum, this thesis represents the beginnings of an interdisciplinary approach to problem-solving for complex, vulnerable communities on social media.