
Emergent Behaviors in Foundational Models


What:
Talk
Part of:
When:
1:30 PM, Thursday, June 13, 2024 EDT (1 hour 30 minutes)
Theme:
Large Language Models & Learning

The field of AI is advancing at unprecedented speed due to the rise of foundation models – large-scale, self-supervised pre-trained models whose impressive capabilities improve dramatically as training data, model size, and computational power are scaled up. Empirical neural scaling laws aim to predict the scaling behavior of foundation models, thus serving as an "investment tool" for choosing the methods that scale best with increased computation, are likely to stand the test of time, and escape "the bitter lesson". Predicting AI behaviors at scale, especially "phase transitions" and emergence, is highly important from the perspective of AI Safety and Alignment with human intent. I will present our efforts toward accurate forecasting of AI behaviors using both an open-box approach, in which the model's internal learning dynamics are accessible, and a closed-box approach that infers neural scaling laws based solely on external observations of AI behavior at scale. I will provide an overview of the open-source foundation models our lab has built over the past year thanks to a large INCITE compute grant on the Summit and Frontier supercomputers at OLCF, including multiple 9.6B LLMs trained continually, the first Hindi model Hi-NOLIN, the multimodal vision-text model suite Robin, as well as time-series foundation models. I will highlight the continual pre-training paradigm, which allows training models on potentially infinite datasets, as well as approaches to AI ethics and multimodal alignment.
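The closed-box setting mentioned above can be illustrated with a minimal sketch: fitting a power-law scaling curve to a handful of externally observed losses and extrapolating it to a larger scale. The functional form, coefficients, and data below are hypothetical, chosen only to show the general fitting procedure, not the methods the talk describes.

```python
import numpy as np

# Hypothetical "observed" losses at increasing model sizes (synthetic data,
# generated from a made-up power law L(N) = a * N^(-b)).
n_params = np.array([1e7, 1e8, 1e9, 1e10])   # parameter counts
loss = 400.0 * n_params ** -0.25             # synthetic loss measurements

# A pure power law is linear in log-log space, so a least-squares line fit
# recovers the exponent b and prefactor a from external observations alone.
slope, intercept = np.polyfit(np.log(n_params), np.log(loss), 1)
b_hat = -slope
a_hat = float(np.exp(intercept))

# Extrapolate the fitted law to forecast loss at a much larger scale.
forecast = a_hat * 1e12 ** -b_hat
print(f"b = {b_hat:.3f}, a = {a_hat:.1f}, forecast at 1e12 params: {forecast:.3f}")
```

Real scaling-law fits typically add an irreducible-loss offset and fit jointly over data, parameters, and compute; this sketch keeps only the core log-log regression idea.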

See our CERC-AAI project page for more details: https://www.irina-lab.ai/projects.


References

Ibrahim, A., Thérien, B., Gupta, K., Richter, M. L., Anthony, Q., Lesort, T., … & Rish, I. (2024). Simple and Scalable Strategies to Continually Pre-train Large Language Models. arXiv preprint arXiv:2403.08763.

Arefin, M. R., Zhang, Y., Baratin, A., Locatello, F., Rish, I., Liu, D., & Kawaguchi, K. (2024). Unsupervised Concept Discovery Mitigates Spurious Correlations. arXiv preprint arXiv:2402.

Jain, A. K., Lehnert, L., Rish, I., & Berseth, G. (2024). Maximum State Entropy Exploration using Predecessor and Successor Representations. Advances in Neural Information Processing Systems, 36.

Irina Rish

Speaker
