Emergent Behaviors in Foundational Models
What: Talk
Part of:
When: 1:30 PM, Thursday 13 Jun 2024 EDT (1 hour 30 minutes)
Theme: Large Language Models & Learning
The field of AI is advancing at unprecedented speed due to the rise of foundation models – large-scale, self-supervised pre-trained models whose impressive capabilities greatly increase as the amount of training data, model size, and computational power are scaled up. Empirical neural scaling laws aim to predict the scaling behaviors of foundation models, thus serving as an “investment tool” for choosing the best-scaling methods as computation increases, i.e., methods likely to stand the test of time and escape “the bitter lesson”. Predicting AI behaviors at scale, especially “phase transitions” and emergence, is highly important from the perspective of AI Safety and Alignment with human intent. I will present our efforts towards accurate forecasting of AI behaviors using both an open-box approach, when the model’s internal learning dynamics are accessible, and a closed-box approach of inferring neural scaling laws based solely on external observations of AI behavior at scale. I will also provide an overview of the open-source foundation models our lab has built over the past year thanks to the large INCITE compute grant on the Summit and Frontier supercomputers at OLCF, including multiple 9.6B LLMs trained continually, the first Hindi model Hi-NOLIN, the multimodal vision-text model suite Robin, as well as time-series foundation models. I will highlight the continual pre-training paradigm, which allows training models on potentially infinite datasets, as well as approaches to AI ethics and multimodal alignment.
See our CERC-AAI project page for more details: https://www.irina-lab.ai/projects.
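As a rough illustration of the closed-box idea of inferring a neural scaling law from external observations only, the sketch below fits a saturating power law L(C) = a·C^(-b) + c to loss measured at several compute budgets and extrapolates it to a larger budget. The functional form, the PF-days unit, and all numbers are illustrative assumptions, not the actual data or method used in the talk.

```python
# Minimal sketch (assumptions noted above): fit a saturating power law
# L(C) = a * C**(-b) + c to loss-vs-compute observations, then extrapolate.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(compute, a, b, c):
    """Loss decreases as a power of compute, with an irreducible floor c."""
    return a * compute ** (-b) + c

# Hypothetical measurements: training compute (PF-days) and evaluation loss.
compute = np.array([1.0, 3.0, 10.0, 30.0, 100.0])
loss = np.array([3.9, 3.4, 3.0, 2.7, 2.5])

# Fit the three free parameters; p0 is an initial guess for the optimizer.
(a, b, c), _ = curve_fit(scaling_law, compute, loss, p0=[3.0, 0.3, 2.0])

# Forecast behavior at a larger compute budget from the fitted law.
print(f"fit: L(C) = {a:.2f} * C^(-{b:.2f}) + {c:.2f}")
print(f"predicted loss at C = 1000 PF-days: {scaling_law(1000.0, a, b, c):.2f}")
```

In practice such fits are only as good as the chosen functional form and the range of scales observed, which is precisely why forecasting “phase transitions” and emergent behaviors from external observations alone is difficult.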
References
Ibrahim, A., Thérien, B., Gupta, K., Richter, M. L., Anthony, Q., Lesort, T., … & Rish, I. (2024). Simple and Scalable Strategies to Continually Pre-train Large Language Models. arXiv preprint arXiv:2403.08763.
Arefin, M. R., Zhang, Y., Baratin, A., Locatello, F., Rish, I., Liu, D., & Kawaguchi, K. (2024). Unsupervised Concept Discovery Mitigates Spurious Correlations. arXiv e-prints, arXiv-2402.
Jain, A. K., Lehnert, L., Rish, I., & Berseth, G. (2024). Maximum State Entropy Exploration using Predecessor and Successor Representations. Advances in Neural Information Processing Systems, 36.