Learning to reason is hard
What: Talk
When: 9:00 AM, Tuesday 11 Jun 2024 EDT (1 hour 30 minutes)
Theme: Large Language Models & Learning
Reasoning is the act of drawing conclusions efficiently by composing learned concepts. In this presentation I'll give a few examples illustrating why it is hard to learn to reason with current machine learning approaches. I will describe a general framework, generalization on the unseen, that characterizes most reasoning problems and, more broadly, out-of-distribution generalization, and offer insights into the intrinsic biases of current models. I will then present the specific problem of length generalization and discuss why some instances can be solved by models such as Transformers while others cannot.
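To make the generalization-on-the-unseen setting concrete, here is a minimal sketch in the spirit of Abbe et al. (2023). The toy task, the monomial features, and the minimum-norm least-squares fit are illustrative assumptions standing in for a trained network and its low-degree bias, not the paper's actual setup.

```python
import numpy as np

# Toy "generalization on the unseen" (GOTU) setup:
# target f(x1, x2) = x1 * x2 over {-1, +1}^2, but the training data
# only contains points with x1 = +1; the slice x1 = -1 is unseen.
rng = np.random.default_rng(0)

def features(X):
    # Monomial basis {1, x1, x2, x1*x2} for inputs in {-1, +1}^2.
    return np.column_stack(
        [np.ones(len(X)), X[:, 0], X[:, 1], X[:, 0] * X[:, 1]]
    )

# Training set: x1 fixed to +1, x2 drawn uniformly from {-1, +1}.
n = 200
x2 = rng.choice([-1.0, 1.0], size=n)
X_train = np.column_stack([np.ones(n), x2])
y_train = X_train[:, 0] * X_train[:, 1]  # collapses to x2 on this slice

# Minimum-norm least-squares fit: a stand-in for the low-degree bias
# that the GOTU paper identifies in trained networks.
coef, *_ = np.linalg.lstsq(features(X_train), y_train, rcond=None)

# Unseen region: x1 = -1.
X_test = np.column_stack([-np.ones(n), rng.choice([-1.0, 1.0], size=n)])
y_test = X_test[:, 0] * X_test[:, 1]
pred = features(X_test) @ coef

print("seen-region MSE:  ", np.mean((features(X_train) @ coef - y_train) ** 2))
print("unseen-region MSE:", np.mean((pred - y_test) ** 2))
# On the seen slice the target equals the degree-1 function x2, so the
# min-norm fit splits weight between x2 and x1*x2 and predicts 0 on the
# unseen half, where the true label is -x2: near-zero train error,
# large error on the unseen region.
```

The sketch shows the core difficulty: on the seen part of the domain the target is indistinguishable from a lower-degree function, a bias toward low degree picks that function, and the unseen half of the domain exposes the mismatch.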
References
Boix-Adsera, E., Saremi, O., Abbe, E., Bengio, S., Littwin, E., & Susskind, J. (2023). When can transformers reason with abstract symbols? arXiv preprint arXiv:2310.09753. ICLR 2024.
Zhou, H., Razin, N., Saremi, O., Susskind, J., … & Nakkiran, P. (2023). What algorithms can transformers learn? A study in length generalization. arXiv preprint arXiv:2310.16028. ICLR 2024.
Abbe, E., Bengio, S., Lotfi, A., & Rizk, K. (2023). Generalization on the unseen, logic reasoning and degree curriculum. In International Conference on Machine Learning (pp. 31-60). PMLR. ICML 2023.