By now, you've likely heard about different "regimes" of learning in wide neural nets --- in particular, the contrast between the "(neural tangent) kernel" regime, which is said not to learn features, and the "feature learning" or "mu-parameterized" regime, in which features are learned but analysis is much harder. This is a new foundational idea in deep learning theory (as evidenced by its importance to Sho + Guillaume's excellent talks), and it's becoming clear that it's often important to understand for both theory + practice.
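For concreteness, here's a toy numerical sketch (my own illustration, not from any of the references below) of the parameterization difference for a one-hidden-layer net of width n. The kernel/NTK convention scales the readout by 1/sqrt(n), giving an O(1) output at init but vanishing feature movement as n grows; the mu-parameterization scales it by 1/n instead, so the output starts near zero but hidden features can move at O(1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096                               # hidden width
d = 16                                 # input dimension

x = rng.standard_normal(d) / np.sqrt(d)
W = rng.standard_normal((n, d))        # hidden weights, O(1) entries
a = rng.standard_normal(n)             # readout weights, O(1) entries

h = np.tanh(W @ x)                     # hidden features

# NTK scaling: readout divided by sqrt(n); output is O(1) at init,
# but per-neuron feature updates shrink to zero as n grows.
f_ntk = a @ h / np.sqrt(n)

# muP scaling: readout divided by n; output is O(1/sqrt(n)) at init
# (effectively zero for large n), while features move at O(1).
f_mup = a @ h / n

print(f_ntk, f_mup)
```

The only difference is that one factor of sqrt(n) in the readout, which is the whole dichotomy in miniature.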

For that reason, I figured it'd be useful to give a tutorial-style talk explaining what this is all about. It'll be very informal and aimed at non-experts. I'm not going to try to prove much, just explain what things are and give an intuitive flavor for why this all matters, how we should think about it, etc. Let's meet at 3pm tomorrow in the auditorium.
----------------------------------------------------------------------

Some central references if you're curious:

- The original NTK paper
- Google's NTK paper, which I found more physicist-accessible (shoutout to Yasaman)
- On Lazy Training, which points out this dichotomy
- Tensor Programs IV, which describes the "mu-parameterization" infinite-width limit now gaining popularity in practice (note: this paper has important ideas but is hard to parse -- I suggest watching Greg Yang's talk version)

Some more references:

- Here are some lecture notes by Cengiz + Blake Bordelon that give an introduction to hyperparameter scaling.
- Here is a paper by Sho that could also serve as a good intro!
- Finally, in a shameless plug, here's a recent paper with Greg Yang that I'm party to, the purpose of which is to give the simplest possible derivation of the feature-learning limit.

