Riassunto analitico
Anomaly detection in surveillance videos strives for the recognition of abnormal events that significantly deviate from the normal or expected behavior. Although anomalies are generally local, as they happen in a limited portion of the frame, none of the previous works on the subject has ever studied the contribution of locality. In this work, we explore the impact of considering spatio-temporal tubes instead of whole-frame segments. For this purpose, we enrich some existing surveillance videos with both spatial and temporal annotations: it is the first dataset for anomaly detection with bounding box supervision in both train and test set. Our experiments show that a network trained with spatio-temporal tubes performs better than its corresponding network trained with whole-frame videos. Moreover, we discover that the locality is robust to different kinds of errors in the tube extraction phase at test time. Finally, we demonstrate that our network is able to provide spatio-temporal proposals for unseen, unlabeled surveillance videos. In this way, we show that it is possible to enlarge our existing dataset without the need of further human effort.
|