Title: Bridging distributional discrepancy with temporal dynamics for video understanding

dc.contributor.advisor AlRegib, Ghassan
dc.contributor.author Chen, Min-Hung
dc.contributor.committeeMember Kira, Zsolt
dc.contributor.committeeMember Vela, Patricio
dc.contributor.committeeMember Tsai, Yi-Chang
dc.contributor.committeeMember Dyer, Eva
dc.contributor.department Electrical and Computer Engineering
dc.date.accessioned 2020-09-08T12:43:38Z
dc.date.available 2020-09-08T12:43:38Z
dc.date.created 2020-08
dc.date.issued 2020-05-21
dc.date.submitted August 2020
dc.date.updated 2020-09-08T12:43:38Z
dc.description.abstract Video has become one of the major media in our society, spurring considerable interest in video analysis techniques for various applications. Temporal dynamics, which characterize how information changes over time, are the key component of videos. However, it is still unclear how temporal dynamics benefit video tasks, especially in the cross-domain case, which is closer to real-world scenarios. The objective of this thesis is therefore to effectively exploit temporal dynamics from videos to tackle distributional discrepancy problems in video understanding. To achieve this objective, I first identified the benefits of exploiting temporal dynamics in videos: I proposed Temporal Segment LSTM (TS-LSTM) and an Inception-style Temporal-ConvNet (Temporal-Inception) for general video understanding, and demonstrated that temporal dynamics help reduce temporal variations in cross-domain video understanding. Since most previous work evaluates performance only on small-scale datasets with little domain discrepancy, I collected two large-scale datasets for video domain adaptation, UCF-HMDB_full and Kinetics-Gameplay, to facilitate cross-domain video research, and proposed the Temporal Attentive Adversarial Adaptation Network (TA3N) to simultaneously attend to, align, and learn temporal dynamics across domains. Finally, to utilize temporal dynamics from unlabeled videos for action segmentation, I proposed Self-Supervised Temporal Domain Adaptation (SSTDA), which jointly aligns cross-domain feature spaces embedded with local and global temporal dynamics through two self-supervised auxiliary tasks, binary and sequential domain prediction, and demonstrated the usefulness of adapting to unlabeled videos across such variations.
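To make the adversarial alignment idea in the abstract concrete, the minimal PyTorch-style sketch below shows a gradient reversal layer feeding a binary domain classifier, the mechanism that underlies both TA3N's adversarial adaptation and SSTDA's binary domain prediction task. This is an illustrative sketch only, not the thesis code: the class names (GradReverse, DomainClassifier), the feature dimension, and the beta weighting are assumptions chosen for the example.

# Illustrative sketch (not the thesis implementation): adversarial domain
# alignment via a gradient reversal layer (GRL) and a binary domain classifier.
# All names and sizes here are hypothetical choices for the example.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies the gradient by -beta on the
    backward pass, so the feature extractor learns to fool the domain
    classifier while the classifier learns to tell the domains apart."""
    @staticmethod
    def forward(ctx, x, beta):
        ctx.beta = beta
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient for x; no gradient for beta.
        return -ctx.beta * grad_output, None

class DomainClassifier(nn.Module):
    """Binary domain prediction head: source (label 0) vs. target (label 1)."""
    def __init__(self, feat_dim=256, beta=1.0):
        super().__init__()
        self.beta = beta
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 2),
        )

    def forward(self, feat):
        return self.net(GradReverse.apply(feat, self.beta))

if __name__ == "__main__":
    src_feat = torch.randn(8, 256)  # hypothetical source-domain clip features
    tgt_feat = torch.randn(8, 256)  # hypothetical target-domain clip features
    head = DomainClassifier()
    logits = head(torch.cat([src_feat, tgt_feat]))
    labels = torch.cat([torch.zeros(8, dtype=torch.long),
                        torch.ones(8, dtype=torch.long)])
    # Minimizing this loss end-to-end aligns the two feature distributions,
    # because the reversed gradient pushes features toward being
    # domain-indistinguishable.
    loss = nn.CrossEntropyLoss()(logits, labels)
    loss.backward()

Under this reading, SSTDA's sequential domain prediction extends the same binary mechanism from individual clips to ordered sequences of video segments, predicting the domain arrangement of the whole sequence, which is how global temporal dynamics enter the alignment alongside the local, clip-level signal.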
dc.description.degree Ph.D.
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/63572
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject Domain adaptation
dc.subject Action recognition
dc.subject Action segmentation
dc.subject Self-supervised learning
dc.subject Video understanding
dc.subject Transfer learning
dc.subject Unsupervised learning
dc.subject Temporal dynamics
dc.subject Domain discrepancy
dc.subject Temporal variations
dc.subject Multi-scale
dc.title Bridging distributional discrepancy with temporal dynamics for video understanding
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor AlRegib, Ghassan
local.contributor.corporatename School of Electrical and Computer Engineering
local.contributor.corporatename College of Engineering
relation.isAdvisorOfPublication 7942fed2-1bb6-41b8-80fd-4134f6c15d8f
relation.isOrgUnitOfPublication 5b7adef2-447c-4270-b9fc-846bd76f80f2
relation.isOrgUnitOfPublication 7c022d60-21d5-497c-b552-95e489a06569
thesis.degree.level Doctoral
Files
Original bundle
Name: CHEN-DISSERTATION-2020.pdf
Size: 37.42 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 3.87 KB
Format: Plain Text