Detection and incremental object learning in videos

Thumbnail Image
Humayun, Ahmad
Rehg, James M.
Associated Organization(s)
Organizational Unit
Organizational Unit
Supplementary to
Unlike state-of-the-art batch machine learning methods, children have a remarkable facility for learning visual representations of objects through a combination of self-directed visual exploration and access to a sparse supervisory signal in the form of spoken object names. Studies of infant development have shown that children are able to locate, track, and differentiate novel object instances from a continuous sequence of visual inputs without requiring dense object labels. This thesis develops methods for on-line visual learning in video which are inspired by infant object learning and are enabled by recent advances in deep learning architectures. We introduce two methods and a dataset to support this thesis. First, we demonstrate a convolutional neural network for detecting and tracking objects in continuous video. These detections are generated by harnessing the temporal continuity of the visual world, and can be used as space-time trajectories for objects in the scene. This method is capable of generating space-time proposals from streaming video, which presents a starting point for on-line weakly-supervised learning. We show that a network can be trained to detect objects more reliably when given a sequence of frames, while being 2.5 times faster when compared to traditional single frame detectors. The second part of this thesis studies the incremental learning paradigm in a setting similar to an infant's play environment. To mimic an environment where children pick up, examine, and put down different objects, we develop a novel data generation pipeline which can produce an arbitrary number of learning exposures composed of videos of rotating objects. Enabled by this data generator, we introduce a novel object learning problem, known as self-directed incremental learning, where an agent needs to decide whether a learning exposure corresponds to a previously-seen object or a new object. We present a simple solution to this problem, which has the ability to work with 100 unique objects shown repeatedly to the learner. From our extensive experiments we conclude that the effect of catastrophic forgetting, the main obstacle in adapting batch learning algorithms to an incremental learning setting, is diminished when learners are repeatedly exposed to different views of the same object.
Date Issued
Resource Type
Resource Subtype
Rights Statement
Rights URI