CSE Colloquium: Compositional and Robust Action Understanding
Zoom Information: Join from PC, Mac, Linux, iOS or Android: https://psu.zoom.us/j/96676545814?pwd=ejEwSVQxbUFiM29zcUkrWFVrVzBXZz09 Password: 324720
Or iPhone one-tap (US Toll): +16468769923,96676545814# or +13017158592,96676545814#
Or Telephone (US Toll): +1 646 876 9923, +1 301 715 8592, +1 312 626 6799, +1 669 900 6833, +1 253 215 8782, or +1 346 248 7799
Meeting ID: 966 7654 5814
Password: 324720
International numbers available: https://psu.zoom.us/u/acmMiqAzb1
ABSTRACT: In an era when massive amounts of video data are becoming available from a wide range of applications (e.g., smart home devices, medical instruments, intelligent transportation networks), designing algorithms that understand actions can enable machines to interact meaningfully with human partners. In practice, continuous video streams require temporal localization of actions before a trimmed-video action recognition method can be applied, yet such annotation is expensive and suffers from consistency issues. Moreover, early video understanding technologies mostly rely on holistic frame modeling and lack reasoning capabilities. In this talk, I will discuss how to detect actions in continuous video streams efficiently. Specifically, I will present several temporal action detection models with different levels of supervision. Next, I will describe how to understand actions compositionally, using localized foreground subjects or objects to reduce the effect of confounding variables and to connect with common knowledge about the objects involved. Additionally, natural language provides an efficient and intuitive way to convey the details of an action to a human. I will conclude the talk with some perspectives on how compositional and efficient modeling opens the door to real-world action understanding with high complexity and fine granularity.
BIOGRAPHY: Huijuan Xu is a postdoctoral scholar in the EECS Department at UC Berkeley, advised by Prof. Trevor Darrell. Her research focuses on deep learning, computer vision, and natural language processing, particularly in the area of action understanding in videos. Specifically, she has investigated efficient action detection, compositional action understanding, and action description using language. Her R-C3D work received the Most Innovative Award in the ActivityNet Challenge. Prior to UC Berkeley, she received her PhD from the Computer Science Department at Boston University in 2018 and interned at Disney Research, Pittsburgh. She was selected as a 2020 Rising Star in EECS.
Event Contact: Wang-Chien Lee