eprintid: 17157
rev_number: 15
eprint_status: archive
userid: 1285
dir: disk0/00/01/71/57
datestamp: 2014-07-29 07:39:14
lastmod: 2014-08-04 12:18:59
status_changed: 2014-07-29 07:39:14
type: doctoralThesis
metadata_visibility: show
creators_name: Antić, Borislav
title: Latent Structured Models for Video Understanding
subjects: ddc-004
divisions: i-110300
divisions: i-708000
divisions: i-708070
adv_faculty: af-11
abstract: The proliferation of videos in recent years has spurred a surge of interest in efficient techniques for automatic video interpretation. The thesis improves the understanding of videos by building structured models that use latent information to detect and recognize instances of actions or abnormalities in videos. It also proposes efficient inference and learning algorithms for these latent structured models that are suited to learning with weak supervision.

An important class of latent variable models is multiple instance learning (MIL), in which training labels are provided only for bags of instances, not for the instances themselves. Because the latent instance labels are inferred jointly with training a classifier on the same data, multiple instance learning is very susceptible to overfitting. To increase the robustness of popular multiple instance learning methods, the thesis introduces the novel concept of superbags (an ensemble of bags of bags), which decouples the classifier training step from the latent label inference step.
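
As a rough illustration of the decoupling idea, the following Python sketch assumes a cross-validation-style scheme: the bags are split at random into k groups, and the latent witness instance of each positive bag in one group is chosen by a classifier trained only on the other groups. The toy centroid classifier and all names (`superbag_mil`, `train_centroid_scorer`, `score`) are hypothetical, and the superbag construction used in the thesis may differ.

```python
import random
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Bag = Tuple[List[Vector], int]  # (instances, bag label: 1 = positive, 0 = negative)


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train_centroid_scorer(pos: List[Vector], neg: List[Vector]) -> Vector:
    """Toy linear classifier (an assumption): weights = mean(positives) - mean(negatives)."""
    return [p - n for p, n in zip(mean(pos), mean(neg))]


def score(w: Vector, x: Vector) -> float:
    return sum(a * b for a, b in zip(w, x))


def superbag_mil(bags: List[Bag], k: int = 3, iters: int = 5, seed: int = 0) -> List[Optional[int]]:
    """Infer the witness instance index of every positive bag (in bag order).

    Hypothetical sketch: bags are split at random into k groups, and the
    witnesses in one group are chosen by a classifier trained only on the
    other groups, so no classifier labels the bags it was trained on.
    """
    rng = random.Random(seed)
    order = list(range(len(bags)))
    rng.shuffle(order)
    group = {b: i % k for i, b in enumerate(order)}
    # None = witness not inferred yet: all instances of the bag count as positive.
    witness: Dict[int, Optional[int]] = {b: None for b, (_, y) in enumerate(bags) if y == 1}
    for _ in range(iters):
        new_witness = dict(witness)
        for g in range(k):
            pos: List[Vector] = []
            neg: List[Vector] = []
            for b, (insts, y) in enumerate(bags):
                if group[b] == g:
                    continue  # held-out group: labels are inferred here, not trained on
                if y == 1:
                    pos.extend(insts if witness[b] is None else [insts[witness[b]]])
                else:
                    neg.extend(insts)
            if not pos or not neg:
                continue  # too little training data outside this group
            w = train_centroid_scorer(pos, neg)
            for b, (insts, y) in enumerate(bags):
                if group[b] == g and y == 1:
                    new_witness[b] = max(range(len(insts)), key=lambda i: score(w, insts[i]))
        witness = new_witness
    return [witness[b] for b in sorted(witness)]
```

Within each pass, a classifier only ever labels bags that it did not see during training, so a wrong witness choice is not immediately reinforced by a model fitted to that same choice, which is the overfitting risk described above.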

In the thesis, a novel latent structured representation is proposed to discover instances of action classes in videos and to jointly train an action classifier on them. Action class instances typically occupy only part of a video, and this part is not annotated in weakly labeled training videos. Therefore, multiple instance learning is used to find these latent action instances in the training videos while jointly training the action classifier. To increase the robustness of this training, the thesis proposes a sequential approach to multiple instance learning.

For the interpretation of crowded scenes, it is important to detect all irregular objects or actions in a video. However, abnormality detection is hindered by the fact that the training set contains no abnormal samples, so abnormalities must be found in a test video without knowing in advance what they look like. To address this problem, the thesis proposes a probabilistic graphical model for video parsing that searches for latent object hypotheses which jointly explain all foreground pixels and, at the same time, are well matched to the normal training samples. By inferring all latent normal hypotheses in a video, the model indirectly finds abnormalities as those hypotheses that are not supported by normal samples but are still needed to explain the foreground. Video parsing is first applied sequentially to individual video frames, where hypotheses are jointly inferred by a local search in a graphical model. The thesis then proposes a spatio-temporal extension of video parsing, for which an efficient inference method based on convex optimization is developed to find abnormal and normal spatio-temporal hypotheses in the video.
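
The selection principle can be pictured with a much simpler stand-in. The Python sketch below replaces the graphical-model local search and the convex optimization described above with greedy weighted set cover: each candidate hypothesis explains a set of foreground pixels at a normality cost (assumed to be precomputed, e.g. as the distance to the best-matching normal training sample), and hypotheses that must be selected to explain the foreground despite a cost above a threshold are flagged as abnormal. The function `parse_frame`, the hypothesis names, the costs and the threshold are illustrative assumptions, not the thesis's model.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical candidate: (foreground pixels it explains, normality cost).
# A high cost means the hypothesis is poorly supported by normal training samples.
Hypothesis = Tuple[FrozenSet[int], float]


def parse_frame(foreground: Set[int], hyps: Dict[str, Hypothesis],
                abnormal_cost: float) -> Tuple[List[str], List[str]]:
    """Greedily select hypotheses until all foreground pixels are explained.

    Returns (selected hypotheses, the subset flagged as abnormal).
    """
    uncovered = set(foreground)
    selected: List[str] = []
    while uncovered:
        # Weighted set-cover rule: lowest cost per newly explained pixel.
        best, best_ratio = None, float("inf")
        for name, (pixels, cost) in hyps.items():
            gain = len(uncovered & pixels)
            if gain == 0:
                continue  # explains nothing new (also skips already selected ones)
            ratio = cost / gain
            if ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break  # remaining pixels cannot be explained by any candidate
        selected.append(best)
        uncovered -= hyps[best][0]
    # Needed to explain the foreground, yet poorly matched to normal samples.
    abnormal = [h for h in selected if hyps[h][1] > abnormal_cost]
    return selected, abnormal


if __name__ == "__main__":
    candidates = {
        "person_a": (frozenset({1, 2, 3}), 1.0),
        "person_b": (frozenset({3, 4}), 1.0),
        "cart": (frozenset({4, 5, 6}), 9.0),
        "noise": (frozenset({1}), 5.0),
    }
    print(parse_frame(set(range(1, 7)), candidates, abnormal_cost=5.0))
```

In this toy frame the poorly matched `cart` hypothesis is still chosen because no cheaper candidate covers pixels 5 and 6, which mirrors how the model exposes abnormalities indirectly rather than by matching them against abnormal examples.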
date: 2014
id_scheme: DOI
id_number: 10.11588/heidok.00017157
ppn_swb: 1658750837
own_urn: urn:nbn:de:bsz:16-heidok-171579
date_accepted: 2014-07-22
language: eng
bibsort: ANTICBORISLATENTSTRU2014
full_text_status: public
citation: Antić, Borislav (2014) Latent Structured Models for Video Understanding. [Dissertation]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/17157/1/phd-thesis.pdf