Seminar @ Cornell Tech: Carl Vondrick
Multimodal Learning from Pixels to People
Abstract: People experience the world through sight, sound, words, touch, and other modalities. By leveraging the natural relationships among these modalities and developing multimodal learning methods, my research creates artificial perception systems with diverse skills, including spatial, physical, logical, and cognitive abilities, for flexibly analyzing visual data. This multimodal approach provides versatile representations for tasks like 3D reconstruction, visual question answering, and object recognition, while offering inherent explainability and excellent zero-shot generalization across tasks. By closely integrating diverse modalities, we can overcome key challenges in machine learning and enable new capabilities for computer vision, especially for upcoming applications where trust is required.
Speaker Bio
Carl Vondrick is the YM Associate Professor of Computer Science at Columbia University. Previously, he was a Research Scientist at Google, and he received his PhD from MIT. His research interests are in computer vision, machine learning, and their applications. He is the recipient of the NSF CAREER award, and his research is supported by the NSF, DARPA, Amazon, Google, and Toyota. For more information, please visit his website at https://www.cs.columbia.edu/~vondrick/.