Karpathy's microgpt.py Dissected: Understanding GPT's Essence in 150 Lines
A line-by-line dissection of microgpt.py -- a pure Python GPT implementation with zero dependencies. Training, inference, and autograd in 150 lines.

Andrej Karpathy has released new code, and this time it is even more extreme than nanoGPT: a 150-line script that trains and runs inference on a GPT in pure Python, with no external libraries.
No PyTorch. No NumPy. Just three imports: os, math, random.
The comment at the top of the code says it all:
"This file is the complete algorithm. Everything else is just efficiency."
In this post, we dissect microgpt.py line by line. Follow along with the code, and you will see that the algorithm behind GPT is a surprisingly simple composition of mathematical operations.
Overall Structure
microgpt.py breaks down into six parts:
| Part | Lines | Role |
|---|---|---|
| Data & Tokenizer | ~10 | Load name dataset, character-level tokenization |
| Value Class (Autograd) | ~35 | Scalar automatic differentiation engine |
| Parameter Initialization | ~15 | Weight matrix creation (4,192 parameters) |
| Model Architecture | ~40 | Embedding + Attention + MLP + RMSNorm |
| Training Loop | ~20 | Cross-entropy loss + Adam optimizer |
| Inference | ~15 | Name generation via temperature sampling |
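The first row of the table, the character-level tokenizer, is the simplest part to see in isolation. The sketch below is my own illustration of the idea, not the actual microgpt.py code: every distinct character in the dataset becomes one token id, and encoding/decoding are just dictionary lookups.

```python
# A sketch of character-level tokenization (variable names are my own,
# not necessarily those used in microgpt.py).
names = ["emma", "olivia", "ava"]  # stand-in for the names dataset

chars = sorted(set("".join(names)))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> token id
itos = {i: ch for ch, i in stoi.items()}      # token id -> char

def encode(name):
    return [stoi[ch] for ch in name]

def decode(ids):
    return "".join(itos[i] for i in ids)

print(encode("ava"))           # same id appears at positions 0 and 2
print(decode(encode("emma")))  # round-trips back to "emma"
```

With a character-level vocabulary there is no BPE training step at all, which is one reason the whole script stays so small.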
Total parameters: 4,192. Compared to GPT-2 Small's 124M, that is roughly 30,000x smaller. But the algorithm is identical.
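The Value class in the table is the engine everything else rests on: a scalar that remembers how it was computed, so gradients can flow backward through the graph. Below is a micrograd-style sketch of the idea under my own naming, deliberately reduced to three operations; it is not the exact microgpt.py implementation.

```python
import math

class Value:
    """A scalar that records its computation graph for backprop
    (a micrograd-style sketch, not the exact microgpt.py code)."""
    def __init__(self, data, _children=(), _local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = _children
        self._local_grads = _local_grads  # d(self)/d(child) for each child

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # topological sort, then apply the chain rule from the output back
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            for child, local in zip(v._children, v._local_grads):
                child.grad += local * v.grad

a, b = Value(2.0), Value(-3.0)
loss = (a * b + a).tanh()   # tanh(2*(-3) + 2) = tanh(-4)
loss.backward()
print(a.grad, b.grad)       # d(loss)/da and d(loss)/db
```

Everything heavier in real frameworks (tensors, kernels, vectorization) is, as the opening comment says, just efficiency layered on top of this chain-rule bookkeeping.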
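The inference step in the table generates names one character at a time via temperature sampling. A sketch of that idea in pure Python (the function name and signature are my own, not microgpt.py's): divide the logits by a temperature, softmax, then draw one token id from the resulting distribution.

```python
import math, random

def sample(logits, temperature=1.0, rng=random):
    """Softmax with temperature, then draw one token id.
    (An illustrative sketch; names are my own.)"""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                        # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1  # guard against float rounding

random.seed(42)
print(sample([2.0, 0.5, -1.0], temperature=0.8))  # an index, biased toward the largest logit
```

Lower temperatures sharpen the distribution toward the argmax (more repetitive names); higher temperatures flatten it (more varied, occasionally garbled names).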
Related Posts

Build Your Own LLM Knowledge Base – A Karpathy-Style Knowledge System
Complete guide to building a permanent personal knowledge system with Obsidian + Claude Code. Wiki + Memory dual-axis architecture.

Why Karpathy's CLAUDE.md Got 48K Stars – And How to Write Your Own
One markdown file raised AI coding accuracy from 65% to 94%. Analyzing Karpathy's 4 rules and practical writing guide.

Why AI Forgets Everything – 3 Open-Source Solutions to the Memory Crisis
karpathy-skills, claude-mem, Cognee – comparing 3 approaches to solving the AI memory problem.