Lessons from Computer Vision: We are (still!) not giving Data enough credit
What:
Talk
Part of:
When:
1:30 PM, Wednesday, June 12, 2024 EDT
(1 hour 30 minutes)
Theme:
Large Language Models & Multimodal Grounding
For most of Computer Vision’s existence, the focus has been squarely on algorithms and models, with data treated largely as an afterthought. Only recently has our discipline begun to appreciate the singularly crucial role that data plays. In this talk, I will begin with some historical examples illustrating the importance of large-scale visual data in both computer vision and human visual perception. I will then share some of our recent work demonstrating the power of very simple algorithms when used with the right data, presenting recent results in visual in-context learning, large vision models, and visual data attribution.
References
Bai, Y., Geng, X., Mangalam, K., Bar, A., Yuille, A., Darrell, T., Malik, J., & Efros, A. A. (2023). Sequential modeling enables scalable learning for large vision models. arXiv preprint arXiv:2312.00785.
Bar, A., Gandelsman, Y., Darrell, T., Globerson, A., & Efros, A. (2022). Visual prompting via image inpainting. Advances in Neural Information Processing Systems, 35, 25005-25017.
Pathak, D., Agrawal, P., Efros, A. A., & Darrell, T. (2017). Curiosity-driven exploration by self-supervised prediction. In International conference on machine learning (pp. 2778-2787). PMLR.