MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In comparison, limited effort has been made to assess the fitness of these …

Generic Object Tracking

Object Tracking in Images and Point Cloud

Sports Video Understanding

Holistic Understanding of Broadcast Videos