nanoGPT

Train a transformer from scratch in ~10 minutes.

Demystified

Interactive

A ~10-million-parameter transformer, trained from scratch on free cloud hardware. Watch the loss fall from random noise to recognizable prose.

Published

May 23, 2026

Open In Colab

Needs the free T4 GPU runtime. Set Runtime → Change runtime type → T4 GPU → Save before running.

A ~10-million-parameter character-level transformer, trained from scratch on free cloud hardware. Watch the loss fall from random noise to recognizable prose. It’s the same procedure that built GPT-4 — 200,000× smaller, four orders of magnitude less data, identical idea.

There’s a dropdown at the top: pick Shakespeare or Mark Twain. Run it, then come back, switch authors, and run it again. Every other line of code is identical — only the data changes, and a completely different writer comes out. That’s the whole point: the architecture is general; the data is what gives a model its voice.

First presented in Demystifying AI: Separating Architecture from the Hype (SIM, May 2026).