Lessons from Computer Vision: We are (still!) not giving Data enough credit

What:
Talk
When:
1:30 PM, Wednesday 12 Jun 2024 EDT (1 hour 30 minutes)
Theme:
Large Language Models & Multimodal Grounding
For most of Computer Vision’s existence, the focus has been solidly on algorithms and models, with data treated largely as an afterthought. Only recently has our discipline begun to appreciate the singularly crucial role played by data. In this talk, I will begin with some historical examples illustrating the importance of large visual data in both computer vision and human visual perception. I will then share some of our recent work demonstrating the power of very simple algorithms when used with the right data. I will present recent results in visual in-context learning, large vision models, and visual data attribution.

 

References

Bai, Y., Geng, X., Mangalam, K., Bar, A., Yuille, A., Darrell, T., Malik, J., & Efros, A. A. (2023). Sequential modeling enables scalable learning for large vision models. arXiv preprint arXiv:2312.00785.

Bar, A., Gandelsman, Y., Darrell, T., Globerson, A., & Efros, A. A. (2022). Visual prompting via image inpainting. Advances in Neural Information Processing Systems, 35, 25005-25017.

Pathak, D., Agrawal, P., Efros, A. A., & Darrell, T. (2017). Curiosity-driven exploration by self-supervised prediction. In International conference on machine learning (pp. 2778-2787). PMLR.
