A course in eleven chapters

You built a Pokémon AI.
Here's how it works.

This repository contains a system that taught itself to play competitive Pokémon doubles by playing millions of turns against itself — and became a draft-and-play recommender. You steered every step of it. This course explains, from zero, what each piece actually does and why it exists.

No machine-learning background is assumed. Every term is defined the first time it appears, every idea is grounded in something that actually happened in this project, and where an analogy helps, it comes from data engineering — pipelines, caches, schemas — because that's the world this course's reader lives in.

The whole project in one picture

[Diagram: a three-stage loop.
PLAY: 48 games at once inside Pokémon Showdown (bridge.mjs · vecbridge.py).
LEARN: a neural network nudged toward winning choices (trainer.py · PPO).
MEASURE: is it stronger? is it still sane? (elo.py · the gate).
Arrow labels: "outcomes", "candidate", "the improved player becomes its own next opponent", "the recommender".]
The loop the whole repo exists to run. The AI plays batches of games against itself, learns from the outcomes, gets measured, and — if it passes — becomes both the new opponent and the product. Every chapter in this course explains one part of this picture.
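The shape of that loop can be sketched in a few lines of Python. This is a toy under loud assumptions, not the repo's code: the "player" is a single skill number rather than a neural network, a "battle" is a coin flip weighted by the skill gap, and `learn` is a noisy nudge standing in for PPO. The names `play_batch`, `learn`, `passes_gate`, `EVAL_GAMES` and `GATE_WIN_RATE` are hypothetical; only the batch size of 48 comes from the diagram. The real gate also checks that the player is "still sane", which this sketch leaves out.

```python
import math
import random

N_PARALLEL_GAMES = 48   # from the diagram: 48 games played at once
EVAL_GAMES = 400        # hypothetical: games used to measure a candidate
GATE_WIN_RATE = 0.52    # hypothetical: win rate a candidate needs to pass


def win_probability(skill_a, skill_b):
    """Toy stand-in for a battle: logistic curve on the skill gap."""
    return 1.0 / (1.0 + math.exp(-(skill_a - skill_b)))


def play_batch(player, opponent, rng, n_games=N_PARALLEL_GAMES):
    """PLAY: one outcome per game, +1 for a win and -1 for a loss."""
    p = win_probability(player, opponent)
    return [1 if rng.random() < p else -1 for _ in range(n_games)]


def learn(player, outcomes, rng):
    """LEARN: return a candidate near the current player.

    A real learner would compute an update from the outcomes themselves.
    This toy only uses the batch size: more games, less noisy update.
    """
    noise_scale = 1.0 / math.sqrt(len(outcomes))
    return player + rng.gauss(0.05, noise_scale)


def passes_gate(candidate, current, rng):
    """MEASURE: does the candidate beat the current player often enough?"""
    results = play_batch(candidate, current, rng, n_games=EVAL_GAMES)
    win_rate = results.count(1) / len(results)
    return win_rate >= GATE_WIN_RATE


def run_loop(iterations, seed=0):
    """Run play -> learn -> measure; promote candidates that pass the gate."""
    rng = random.Random(seed)
    current = 0.0
    promotions = 0
    for _ in range(iterations):
        outcomes = play_batch(current, current, rng)  # self-play
        candidate = learn(current, outcomes, rng)
        if passes_gate(candidate, current, rng):
            current = candidate  # the candidate becomes the next opponent
            promotions += 1
    return current, promotions


if __name__ == "__main__":
    skill, promoted = run_loop(iterations=50)
    print(f"final skill {skill:.2f} after {promoted} promotions")
```

The design point the toy preserves is that a candidate never replaces the current player just because training produced it. It has to pass a measurement first.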

The story in one paragraph

Pokémon Champions Regulation M-B is a doubles format: bring six Pokémon, pick four, lead with two. The lucky break that made this project feasible is that the format already exists inside Pokémon Showdown, an open-source battle simulator — so instead of building a game engine, we wrapped one (Chapter 2). On top of it we trained a small neural network by reinforcement learning: it starts clicking random buttons, and every win or loss nudges it toward better clicking (Chapters 3–7). It grew eyes (features for HP, speed, "does this move KO?"), a memory for identities (embeddings for species, items, moves), and eventually took over drafting the team itself. A search procedure that thinks one turn ahead made it stronger still — and then quietly destroyed its ability to play Trick Room, teaching us the project's biggest lesson: a win-rate number can hide a strategy lobotomy (Chapter 10). The final system recommends your pick-four, your leads, and your first moves against any opponent (Chapter 11).
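To get a feel for the size of the "bring six, pick four, lead with two" decision, here is a small counting sketch. It assumes that order within the lead pair and within the back pair does not matter, and it uses placeholder team names. The function `draft_options` is hypothetical and is not taken from the repo.

```python
from itertools import combinations
from math import comb

# Placeholder names standing in for a six-Pokémon team.
TEAM = ["A", "B", "C", "D", "E", "F"]


def draft_options(team):
    """Enumerate every (leads, back) split: pick four of six, lead with two."""
    options = []
    for picked in combinations(team, 4):
        for leads in combinations(picked, 2):
            back = tuple(member for member in picked if member not in leads)
            options.append((leads, back))
    return options


options = draft_options(TEAM)

# 15 ways to pick four of six, times 6 ways to choose two leads from the four.
assert len(options) == comb(6, 4) * comb(4, 2)
print(len(options))       # 90 options for one side
print(len(options) ** 2)  # 8100 matchups once both sides have drafted
```

Ninety options per side is small enough to enumerate. The hard part is that the value of each option depends on what the opponent drafts, and on every turn played afterwards.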

The course

Part I — The setup

  1. The game and the goal
     What Reg M-B doubles is, why it's genuinely hard for a computer, and what "success" was defined as.
  2. Borrowing a world
     Showdown as the game engine, the Node⇄Python bridge, the damage oracle, and the 599-team corpus.

Part II — The learner

  3. Learning by playing
     Reinforcement learning from zero: states, actions, rewards, and why hand-written rules lost to random.
  4. What the AI sees
     Turning a battle into 114 numbers, and embeddings: how the net learned who Incineroar is.
  5. The brain
     The pointer network: score every legal action, never an illegal one, and estimate who's winning.
  6. Training day
     PPO, self-play against a league of past selves, and the collapse that taught us about brittle opponents.
  7. The credit problem
     One ±1 reward per 20 decisions: how reward shaping and GAE figure out which move deserved it.

Part III — The craft

  8. Measuring strength
     Why vs-random lies, what Elo can and can't see, and the paired benchmark that resolves 1% edges.
  9. Thinking ahead
     Depth-1 search as a game of simultaneous moves, and expert iteration: baking the planner into the weights.
  10. When winning lies
     The Trick Room collapse: how win-rate hid a strategy lobotomy, and the regression gate built to catch it.

Part IV — The payoff

  11. The product
     The recommender, battle-log generation, the foe-blind draft problem, and what a new metagame would cost.
  · Glossary
     Every term in the project, one card each, from action space to value head, mc5 to wider.

How to read this course

Chapters build on each other, so the first time through, go in order; Part II especially is a staircase. Along the way you'll meet recurring signposts:

In plain terms (analogy)

The concept restated with no math, usually in data-engineering language.

War story (pitfall)

Something that actually went wrong in this project, and what it taught us. These are the best parts.

Key point (takeaway)

The one sentence to remember if you remember nothing else from the section.

Check yourself (quiz)

Two or three questions at the end of each chapter. Click to reveal the answer.

Numbers in this course are the project's real numbers — win rates, Elo ratings, feature dimensions — taken from the repo's own docs (docs/roadmap.md, docs/evaluate.md, docs/regression.md, docs/architecture.md and the archive). Where a chapter refers to code, it names the file, so you can go read the real thing.

Key point

Nothing in this system was designed by a genius in one sitting. It's a loop — play, learn, measure — plus two years' worth of lessons compressed into a few months of asking "why is it doing that?" and fixing what the answer revealed. This course is the map of those fixes.