Schedule Dec 08, 2023
Tutorial on Neural Scaling Laws
Alex Atanasov, Harvard

I'll cover the scaling laws observed in language models, the Chinchilla paper, and proposed explanations for the parameter-limited and data-limited regimes that can occur. Much of the talk will be about presenting empirical observations and discussing them. Depending on audience interest, we can go deeper into further observations or cover proposed theoretical models of these scaling phenomena. I'll aim to keep things informal, aimed at non-experts, and I encourage any/all questions!
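As a minimal illustration of the kind of empirical fit these papers perform, here is a hedged sketch (not from the talk itself) of extracting a power-law exponent from loss-versus-model-size data. The functional form L(N) = a·N^(−α) + c and all numbers below are assumptions for demonstration only:

```python
# Hypothetical illustration: fitting a power-law scaling curve
# L(N) = a * N**(-alpha) + c to synthetic loss-vs-parameters data.
# The constants and the fitting recipe are assumptions, not values
# from any of the referenced papers.
import numpy as np

true_a, true_alpha, true_c = 5.0, 0.5, 1.0   # assumed ground truth
N = np.logspace(6, 9, 20)                    # model sizes: 1e6 .. 1e9 params
loss = true_a * N ** (-true_alpha) + true_c  # noiseless synthetic "observations"

# Recover the exponent by linear regression in log-log space,
# after subtracting the irreducible-loss offset c.
slope, intercept = np.polyfit(np.log(N), np.log(loss - true_c), 1)
alpha_hat = -slope
print(f"fitted exponent: {alpha_hat:.3f}")   # recovers alpha = 0.5
```

In practice the offset c is unknown and is usually fit jointly with a and α by nonlinear least squares; the log-log trick above is the simplest special case.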

Some references if you are interested in learning more. "Early" scaling laws papers:

The Chinchilla paper, "Training Compute-Optimal Large Language Models":

Yasaman and colleagues' nice paper on explaining neural scaling laws:

Equations for scaling laws of kernel regression:

Spigler, Geiger, and Wyart's nice paper:

Blake and Abdul's work:

and follow up:

The limiting effects of finite width:

In addition to Yasaman's paper, there's our Onset of Variance paper with Blake:

Relatedly, a random feature analysis by Bruno and collaborators has many of the key insights necessary to derive the limits of finite width:

and a further one on the role played by ensembling:

A related model is also explored by Maloney et al:

Also, some recent work by our group on the consistency of training dynamics across widths, as well as further aspects of finite width causing limiting effects:
