Person:
Kira, Zsolt

Publication Search Results

Now showing 1 - 10 of 16
  • Item
    Robotics in the Era of Vision-Language Foundation Models
    (Georgia Institute of Technology, 2023-11-29) Kira, Zsolt
  • Item
    Lifelong Robot Learning in an Open World
    (Georgia Institute of Technology, 2022-08-24) Kira, Zsolt
    Each semester, IRIM hosts a symposium featuring presentations from faculty and of research funded by our IRIM seed grant program over the past year. The symposium is a chance for faculty to meet new PhD students on campus, as well as to get a better idea of what IRIM colleagues are up to these days. The goal of the symposium is to spark new ideas, new collaborations, and even new friendships!
  • Item
    Adaptive and Data-Efficient Robot Learning
    (Georgia Institute of Technology, 2021-08-25) Kira, Zsolt
  • Item
    ML@GT Lab presents LAB LIGHTNING TALKS 2020
    (2020-12-04) AlRegib, Ghassan ; Chau, Duen Horng ; Chava, Sudheer ; Cohen, Morris B. ; Davenport, Mark A. ; Desai, Deven ; Dovrolis, Constantine ; Essa, Irfan ; Gupta, Swati ; Huo, Xiaoming ; Kira, Zsolt ; Li, Jing ; Maguluri, Siva Theja ; Pananjady, Ashwin ; Prakash, B. Aditya ; Riedl, Mark O. ; Romberg, Justin ; Xie, Yao ; Zhang, Xiuwei
    Labs affiliated with the Machine Learning Center at Georgia Tech (ML@GT) will have the opportunity to share their research interests, work, and unique aspects of their lab in three minutes or less to interested graduate students, Georgia Tech faculty, and members of the public. Participating labs include: Yao’s Group - Yao Xie, H. Milton Stewart School of Industrial and Systems Engineering (ISyE); Huo Lab - Xiaoming Huo, ISyE; LF Radio Lab – Morris Cohen, School of Electrical and Computer Engineering (ECE); Polo Club of Data Science – Polo Chau, CSE; Network Science – Constantine Dovrolis, School of Computer Science; CLAWS – Srijan Kumar, CSE; Control, Optimization, Algorithms, and Randomness (COAR) Lab – Siva Theja Maguluri, ISyE; Entertainment Intelligence Lab and Human Centered AI Lab – Mark Riedl, IC; Social and Language Technologies (SALT) Lab – Diyi Yang, IC; FATHOM Research Group – Swati Gupta, ISyE; Zhang's CompBio Lab – Xiuwei Zhang, CSE; Statistical Machine Learning - Ashwin Pananjady, ISyE and ECE; AdityaLab - B. Aditya Prakash, CSE; OLIVES - Ghassan AlRegib, ECE; Robotics Perception and Learning (RIPL) – Zsolt Kira, IC; Eye-Team - Irfan Essa, IC; and Mark Davenport, ECE.
  • Item
    Mining Structure Fragments for Smart Bundle Adjustment
    (Georgia Institute of Technology, 2014-09) Carlone, Luca ; Alcantarilla, Pablo Fernandez ; Chiu, Han-Pang ; Kira, Zsolt ; Dellaert, Frank
    Bundle Adjustment (BA) can be seen as an inference process over a factor graph. From this perspective, the Schur complement trick can be interpreted as an ordering choice for elimination. The elimination of a single point in the BA graph induces a factor over the set of cameras observing that point. This factor has a very low information content (a point observation enforces a low-rank constraint on the cameras). In this work we show that, when using conjugate gradient solvers, there is a computational advantage in “grouping” factors corresponding to sets of points (fragments) that are co-visible by the same set of cameras. Intuitively, we collapse many factors with low information content into a single factor that imposes a high-rank constraint among the cameras. We provide a grounded way to group factors: the selection of points that are co-observed by the same camera patterns is a data mining problem, and standard tools for frequent pattern mining can be applied to reveal the structure of BA graphs. We demonstrate the computational advantage of grouping in large BA problems and we show that it enables a consistent reduction of BA time with respect to state-of-the-art solvers (Ceres [1]).
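    The core grouping idea in the abstract above can be illustrated with a minimal Python sketch. The observation data and function name are hypothetical, and real frequent-pattern mining would also discover partially overlapping camera patterns; this sketch only handles the simplest case of points sharing an identical camera set:

    ```python
    from collections import defaultdict

    # Hypothetical observations: point id -> set of camera ids observing it.
    observations = {
        0: {0, 1, 2},
        1: {0, 1, 2},   # same camera pattern as point 0 -> same fragment
        2: {1, 2, 3},
        3: {0, 1, 2},
        4: {1, 2, 3},
    }

    def group_into_fragments(observations):
        """Group points co-observed by the same set of cameras.

        Each group ("fragment") can then be eliminated jointly, producing
        one higher-rank factor over its cameras instead of many factors
        with low information content.
        """
        fragments = defaultdict(list)
        for point, cams in observations.items():
            fragments[frozenset(cams)].append(point)
        return dict(fragments)

    fragments = group_into_fragments(observations)
    # Two fragments: cameras {0,1,2} -> points [0, 1, 3];
    #                cameras {1,2,3} -> points [2, 4]
    ```
    
    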
  • Item
    Deep Segments: Comparisons between Scenes and their Constituent Fragments using Deep Learning
    (Georgia Institute of Technology, 2014-09) Doshi, Jigar ; Mason, Celeste ; Wagner, Alan ; Kira, Zsolt
    We examine the problem of visual scene understanding and abstraction from first person video. This is an important problem and successful approaches would enable complex scene characterization tasks that go beyond classification, for example characterization of novel scenes in terms of previously encountered visual experiences. Our approach utilizes the final layer of a convolutional neural network as a high-level, scene specific, representation which is sufficiently robust to noise to be used with wearable cameras. Researchers have demonstrated the use of convolutional neural networks for object recognition. Inspired by results from cognitive and neuroscience, we use output maps created by a convolutional neural network as a sparse, abstract representation of visual images. Our approach abstracts scenes into constituent segments that can be characterized by the spatial and temporal distribution of objects. We demonstrate the viability of the system on video taken from Google Glass. Experiments examining the ability of the system to determine scene similarity indicate a correlation of ρ(384) = 0.498 with human evaluations and 90% accuracy on a category match problem. Finally, we demonstrate high-level scene prediction by showing that the system matches two scenes using only a few initial segments and predicts objects that will appear in subsequent segments.
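    A common way to compare scenes represented by final-layer network activations, as in the abstract above, is cosine similarity between feature vectors. The following is a minimal sketch (the paper does not specify its exact similarity measure, so this is an illustrative assumption):

    ```python
    import numpy as np

    def scene_similarity(feat_a, feat_b):
        """Cosine similarity between two scene feature vectors
        (e.g. final-layer CNN activations): 1.0 for identical
        directions, 0.0 for orthogonal ones."""
        a = np.asarray(feat_a, dtype=float)
        b = np.asarray(feat_b, dtype=float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ```

    In practice the feature vectors would come from the penultimate or final layer of a pretrained network; any off-the-shelf CNN feature extractor could stand in here.
    
    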
  • Item
    Eliminating Conditionally Independent Sets in Factor Graphs: A Unifying Perspective based on Smart Factors
    (Georgia Institute of Technology, 2014) Carlone, Luca ; Kira, Zsolt ; Beall, Chris ; Indelman, Vadim ; Dellaert, Frank
    Factor graphs are a general estimation framework that has been widely used in computer vision and robotics. In several classes of problems a natural partition arises among variables involved in the estimation. A subset of the variables are actually of interest to the user: we call those target variables. The remaining variables are essential for the formulation of the optimization problem underlying maximum a posteriori (MAP) estimation; however, these variables, which we call support variables, are not strictly required as output of the estimation problem. In this paper, we propose a systematic way to abstract support variables, defining optimization problems that are only defined over the set of target variables. This abstraction naturally leads to the definition of smart factors, which correspond to constraints among target variables. We show that this perspective unifies the treatment of heterogeneous problems, ranging from structureless bundle adjustment to robust estimation in SLAM. Moreover, it enables exploitation of the underlying structure of the optimization problem and the treatment of degenerate instances, enhancing both computational efficiency and robustness.
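    The elimination of support variables described above is, at the linear-algebra level, a Schur complement on the normal equations. A minimal NumPy sketch (variable names and ordering are illustrative assumptions, not the paper's notation):

    ```python
    import numpy as np

    def eliminate_support(H, g, n_t):
        """Marginalize support variables out of the normal equations
        H x = g, with variables ordered [target; support] and n_t
        target variables. The reduced system acts like a "smart factor"
        defined over the targets only."""
        Htt, Hts = H[:n_t, :n_t], H[:n_t, n_t:]
        Hst, Hss = H[n_t:, :n_t], H[n_t:, n_t:]
        Hss_inv = np.linalg.inv(Hss)          # fine for small dense blocks
        H_red = Htt - Hts @ Hss_inv @ Hst     # Schur complement of Hss
        g_red = g[:n_t] - Hts @ Hss_inv @ g[n_t:]
        return H_red, g_red
    ```

    Solving the reduced system yields exactly the target block of the full solution, which is why the abstraction loses nothing for the variables the user cares about.
    
    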
  • Item
    Communication and alignment of grounded symbolic knowledge among heterogeneous robots
    (Georgia Institute of Technology, 2010-04-05) Kira, Zsolt
    Experience forms the basis of learning. It is crucial in the development of human intelligence, and more broadly allows an agent to discover and learn about the world around it. Although experience is fundamental to learning, it is costly and time-consuming to obtain. In order to speed this process up, humans in particular have developed communication abilities so that ideas and knowledge can be shared without requiring first-hand experience. Consider the same need for knowledge sharing among robots. Based on the recent growth of the field, it is reasonable to assume that in the near future there will be a collection of robots learning to perform tasks and gaining their own experiences in the world. In order to speed this learning up, it would be beneficial for the various robots to share their knowledge with each other. In most cases, however, the communication of knowledge among humans relies on the existence of similar sensory and motor capabilities. Robots, on the other hand, widely vary in perceptual and motor apparatus, ranging from simple light sensors to sophisticated laser and vision sensing. This dissertation defines the problem of how heterogeneous robots with widely different capabilities can share experiences gained in the world in order to speed up learning. The work focuses specifically on differences in sensing and perception, which can be used both for perceptual categorization tasks and for determining actions based on environmental features. Motivating the problem, experiments first demonstrate that heterogeneity does indeed pose a problem during the transfer of object models from one robot to another. This is true even when using state-of-the-art object recognition algorithms that use SIFT features, designed to be unique and reproducible.
It is then shown that the abstraction of raw sensory data into intermediate categories for multiple object features (such as color, texture, shape, etc.), represented as Gaussian Mixture Models, can alleviate some of these issues and facilitate effective knowledge transfer. Object representation, heterogeneity, and knowledge transfer are framed within Gärdenfors' conceptual spaces, or geometric spaces that utilize similarity measures as the basis of categorization. This representation is used to model object properties (e.g. color or texture) and concepts (object categories and specific objects). A framework is then proposed to allow heterogeneous robots to build models of their differences with respect to the intermediate representation using joint interaction in the environment. Confusion matrices are used to map property pairs between two heterogeneous robots, and an information-theoretic metric is proposed to model information loss when going from one robot's representation to another. We demonstrate that these metrics allow for cognizant failure, where the robots can ascertain if concepts can or cannot be shared, given their respective capabilities. After this period of joint interaction, the learned models are used to facilitate communication and knowledge transfer in a manner that is sensitive to the robots' differences. It is shown that heterogeneous robots are able to learn accurate models of their similarities and differences, and to use these models to transfer learned concepts from one robot to another in order to bootstrap the learning of the receiving robot. In addition, several types of communication tasks are used in the experiments. For example, how can a robot communicate a distinguishing property of an object to help another robot differentiate it from its surroundings? Throughout the dissertation, the claims will be validated through both simulation and real-robot experiments.
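    The confusion-matrix mapping and information-loss idea in the abstract above can be sketched concretely. One natural information-theoretic loss measure is the conditional entropy H(A|B): how many bits about robot A's property labels cannot be recovered from robot B's. The dissertation's exact metric may differ; the data and function names here are illustrative:

    ```python
    import numpy as np

    def confusion_matrix(labels_a, labels_b, n_a, n_b):
        """Joint counts of robot A's property label vs robot B's label
        for the same jointly observed objects."""
        C = np.zeros((n_a, n_b))
        for a, b in zip(labels_a, labels_b):
            C[a, b] += 1
        return C

    def information_loss(C):
        """Conditional entropy H(A|B) in bits, computed from the joint
        counts: 0 when B's labels fully determine A's (no loss), larger
        when A's property cannot be recovered from B's representation."""
        P = C / C.sum()                    # joint distribution p(a, b)
        pb = P.sum(axis=0)                 # marginal p(b)
        h = 0.0
        for j in range(P.shape[1]):
            if pb[j] == 0:
                continue
            for i in range(P.shape[0]):
                if P[i, j] > 0:
                    h -= P[i, j] * np.log2(P[i, j] / pb[j])
        return h
    ```

    A diagonal confusion matrix (the robots always agree) gives zero loss, while a uniform matrix (B's labels say nothing about A's) gives the full entropy of A, supporting the "cognizant failure" decision of whether a concept is worth transferring.
    
    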
  • Item
    A Design Process for Robot Capabilities and Missions Applied to Microautonomous Platforms
    (Georgia Institute of Technology, 2010) Kira, Zsolt ; Arkin, Ronald C. ; Collins, Thomas R.
    As part of our research for the ARL MAST CTA (Collaborative Technology Alliance) [1], we present an integrated architecture that facilitates the design of microautonomous robot platforms and missions, starting from initial design conception to actual deployment. The framework consists of four major components: design tools, mission-specification system (MissionLab), case-based reasoning system (CBR Expert), and a simulation environment (USARSim). The designer begins by using design tools to generate a space of missions, taking broad mission-specific objectives into account. For example, in a multi-robot reconnaissance task, the parameters varied include the number of robots used, mobility capabilities (e.g. maximum speeds), and sensor capabilities. The design tools are used to intelligently carve out the space of all possible parameter combinations to produce a smaller set of mission configurations. Quantitative assessment of this design space is then performed in simulation to determine which particular configuration would yield an effective team before actual deployment. MissionLab, a mission-specification platform, is used to incorporate the input parameters, generate the underlying robot missions, and control the robots in simulation. It also provides logging mechanisms to measure a range of quantitative performance metrics, such as mission completion rates, resource utilization, and time to completion, which are then used to determine the best configuration for a particular mission. These metrics can also provide guidance for the refinement of the entire design process. Finally, a case-based reasoning system allows users to maximize successful deployment of the robots by retrieving proven configurations and determining the robot capabilities necessary for success in a particular mission.
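    The "carve out the space of all possible parameter combinations" step described above amounts to enumerating a Cartesian product of parameter values and pruning it with designer constraints. A minimal sketch with entirely hypothetical parameters and constraint (the actual design tools are far richer):

    ```python
    from itertools import product

    # Hypothetical mission parameters for a multi-robot reconnaissance task.
    param_space = {
        "num_robots": [2, 4, 8],
        "max_speed":  [0.5, 1.0, 2.0],      # m/s
        "sensor":     ["camera", "lidar", "both"],
    }

    def generate_configurations(space, constraint):
        """Enumerate all parameter combinations, then carve the space
        down to configurations satisfying a designer-supplied constraint."""
        keys = list(space)
        for values in product(*(space[k] for k in keys)):
            cfg = dict(zip(keys, values))
            if constraint(cfg):
                yield cfg

    # Example constraint: only field large teams if the robots are fast.
    configs = list(generate_configurations(
        param_space,
        lambda c: not (c["num_robots"] == 8 and c["max_speed"] < 1.0)))
    # 27 raw combinations, 3 pruned -> 24 candidate configurations
    ```

    Each surviving configuration would then be handed to the simulation environment for quantitative assessment, as the abstract describes.
    
    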
  • Item
    Mission Specification and Control for Unmanned Aerial and Ground Vehicles for Indoor Target Discovery and Tracking
    (Georgia Institute of Technology, 2010) Ulam, Patrick D. ; Kira, Zsolt ; Arkin, Ronald C. ; Collins, Thomas R.
    This paper describes ongoing research by Georgia Tech into the challenges of tasking and controlling heterogeneous teams of unmanned vehicles in mixed indoor/outdoor reconnaissance scenarios. We outline the tools and techniques necessary for an operator to specify, execute, and monitor such missions. The mission specification framework used for the purposes of intelligence gathering during mission execution is first demonstrated in simulations involving a team of a single autonomous rotorcraft and three ground-based robotic platforms. Preliminary results including robotic hardware in the loop are also provided.