A course in eleven chapters
You built a Pokémon AI.
Here's how it works.
This repository contains a system that taught itself to play competitive Pokémon doubles by playing millions of turns against itself — and became a draft-and-play recommender. You steered every step of it. This course explains, from zero, what each piece actually does and why it exists.
No machine-learning background is assumed. Every term is defined the first time it appears, every idea is grounded in something that actually happened in this project, and where an analogy helps, it comes from data engineering — pipelines, caches, schemas — because that's the world this course's reader lives in.
The whole project in one picture
The story in one paragraph
Pokémon Champions Regulation M-B is a doubles format: bring six Pokémon, pick four, lead with two. The lucky break that made this project feasible is that the format already exists inside Pokémon Showdown, an open-source battle simulator — so instead of building a game engine, we wrapped one (Chapter 2). On top of it we trained a small neural network by reinforcement learning: it starts clicking random buttons, and every win or loss nudges it toward better clicking (Chapters 3–7). It grew eyes (features for HP, speed, "does this move KO?"), a memory for identities (embeddings for species, items, moves), and eventually took over drafting the team itself. A search procedure that thinks one turn ahead made it stronger still — and then quietly destroyed its ability to play Trick Room, teaching us the project's biggest lesson: a win-rate number can hide a strategy lobotomy (Chapter 10). The final system recommends your pick-four, your leads, and your first moves against any opponent (Chapter 11).
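To make "every win or loss nudges it toward better clicking" concrete before Chapter 3, here is a minimal sketch of that nudge on a toy problem. It is not the project's training code: the three "buttons", their win chances, the learning rate, and the function names are all hypothetical, and the update is plain REINFORCE rather than the PPO used later in the course.

```python
import math
import random


def softmax(logits):
    """Turn raw preference scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_toy_policy(win_chance, episodes=5000, lr=0.1, seed=0):
    """Learn which button to click from nothing but +1 (win) / -1 (loss).

    win_chance[i] is the (hypothetical) probability that button i wins.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(win_chance)  # equal preferences = random clicking
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < win_chance[action] else -1.0
        # REINFORCE: the gradient of log pi(action) with respect to logit i
        # is (1 if i == action else 0) - probs[i].
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)


final_probs = train_toy_policy(win_chance=[0.2, 0.5, 0.8])
print([round(p, 3) for p in final_probs])
```

The policy starts out uniform and is never told which button is best. Wins raise the probability of whatever was just clicked, losses lower it, and over many episodes the preference drifts toward the button that wins most often.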
The course
Part I — The setup
- 1. The game and the goal. What Reg M-B doubles is, why it's genuinely hard for a computer, and what "success" was defined as.
- 2. Borrowing a world. Showdown as the game engine, the Node⇄Python bridge, the damage oracle, and the 599-team corpus.
Part II — The learner
- 3. Learning by playing. Reinforcement learning from zero: states, actions, rewards, and why hand-written rules lost to random.
- 4. What the AI sees. Turning a battle into 114 numbers, and embeddings: how the net learned who Incineroar is.
- 5. The brain. The pointer network: score every legal action, never an illegal one, and estimate who's winning.
- 6. Training day. PPO, self-play against a league of past selves, and the collapse that taught us about brittle opponents.
- 7. The credit problem. One ±1 reward per 20 decisions: how reward shaping and GAE figure out which move deserved it.
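As a preview of the credit problem in Chapter 7, the sketch below shows how GAE (Generalized Advantage Estimation) spreads a single end-of-game reward back over the decisions that led to it. It is an illustration under stated assumptions, not the repo's implementation: the function name, the `gamma` and `lam` defaults, and the all-zero value estimates are hypothetical, and reward shaping is left out.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one finished episode.

    rewards[t] is the reward received after decision t.
    values[t] is the value estimate before decision t.
    The value after the final decision is taken to be 0 (game over).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        # One-step surprise: what happened versus what was predicted.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of this and all later surprises.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Twenty decisions, one +1 reward at the very end (a win).
rewards = [0.0] * 19 + [1.0]
values = [0.0] * 20  # an untrained value head that predicts nothing
adv = gae_advantages(rewards, values)
print([round(a, 3) for a in adv])
```

With a value head that predicts nothing, each decision's credit is the final reward shrunk by a factor of `gamma * lam` for every step it sits before the end, so late moves get most of the credit. As the value estimates improve, credit shifts toward the decisions where the predicted outcome actually changed.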
Part III — The craft
- 8. Measuring strength. Why vs-random lies, what Elo can and can't see, and the paired benchmark that resolves 1% edges.
- 9. Thinking ahead. Depth-1 search as a game of simultaneous moves, and expert iteration: baking the planner into the weights.
- 10. When winning lies. The Trick Room collapse: how win-rate hid a strategy lobotomy, and the regression gate built to catch it.
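For readers who have not met Elo before Chapter 8, the standard Elo formulas fit in a few lines. This is a generic sketch, not the project's evaluation code: the function names and the K-factor of 32 are assumptions chosen for illustration.

```python
import math


def elo_expected_score(rating_a, rating_b):
    """Predicted score (roughly, win probability) of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return both new ratings after one game.

    score_a is 1.0 for an A win, 0.0 for a loss, 0.5 for a draw.
    """
    expected = elo_expected_score(rating_a, rating_b)
    delta = k * (score_a - expected)
    return rating_a + delta, rating_b - delta


def elo_gap_for_win_rate(win_rate):
    """Rating gap that corresponds to a given head-to-head win rate."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


print(elo_expected_score(1500, 1500))
print(elo_update(1500, 1500, score_a=1.0))
print(round(elo_gap_for_win_rate(0.51), 2))
```

A 51% head-to-head win rate corresponds to a gap of only about 7 rating points, which is why small edges are hard to see in Elo alone and why the course leans on a paired benchmark for them.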
Part IV — The payoff
- 11. The product. The recommender, battle-log generation, the foe-blind draft problem, and what a new metagame would cost.
- Glossary. Every term in the project, one card each, from action space to value head, mc5 to wider.
How to read this course
Chapters build on each other, so first time through, go in order — Part II especially is a staircase. Along the way you'll meet recurring signposts:
- In plain terms (analogy): The concept restated with no math, usually in data-engineering language.
- War story (pitfall): Something that actually went wrong in this project, and what it taught us. These are the best parts.
- Key point (takeaway): The one sentence to remember if you remember nothing else from the section.
- Check yourself (quiz): Two or three questions at the end of each chapter. Click to reveal the answer.
Numbers in this course are the project's real numbers — win rates, Elo ratings, feature dimensions — taken from the repo's own docs (docs/roadmap.md, docs/evaluate.md, docs/regression.md, docs/architecture.md and the archive). Where a chapter refers to code, it names the file, so you can go read the real thing.
Nothing in this system was designed by a genius in one sitting. It's a loop — play, learn, measure — plus two years' worth of lessons compressed into a few months of asking "why is it doing that?" and fixing what the answer revealed. This course is the map of those fixes.