Chapter 4 · Part II — The learner

What the AI sees

A neural network can't watch a battle. It eats fixed-length lists of numbers, and nothing else. This chapter is about the eyes the project built: the 114 numbers that describe the battlefield, the 30 numbers that describe each possible click, a damage calculator wired in as a sense organ — and embeddings, the trick that let the net learn who Incineroar actually is.

After this chapter you can explain

What featurization is, and what's inside the 114-number state vector and the 30-number per-action row
Why the KO oracle is a feature the net may weigh, not a rule it must obey — and what it lifted (95% → 99%)
What an embedding is, in dimension-table terms, and the measured staircase: blind → species → full rosters → items and abilities
The allyHit bug: how a wrong feature quietly poisons a policy without ever crashing anything

Featurization: the ETL job in front of the brain

Chapter 3 kept saying "the agent receives an observation." Time to be exact about what that is. A neural net's input layer has a fixed width; whatever you feed it must be a list of numbers of exactly that length, in exactly the same order, every single time. The battle, meanwhile, is a sprawling live object — species, HP bars, status conditions, weather, whose turn it is. Featurization is the translation: turning the live battle into that fixed-length number list, identically on every decision.

In plain terms

The feature vector is a fact row in a star schema, and src/selfplay/featurize.py is the ETL job that populates it. Every turn produces one row with a rigid, versioned schema: column 0 always means the same thing, column 87 always means the same thing. The net is a downstream consumer that binds to columns by position. And exactly like a real pipeline, if the ETL silently writes garbage into a column, nothing errors — the consumer just learns from wrong data. Hold that thought for the war story.

The bridge's state tracker (in src/selfplay/bridge.mjs) watches Showdown's battle stream and maintains a structured snapshot; featurize.py flattens that snapshot into the vector. Two kinds of input come out of this stage: one vector describing the situation, and one row per candidate action describing each possible click. Situation first.

The state vector: 114 numbers

The current state vector is 114 numbers long (it started at 77; the upgrade story comes later in the chapter). Its contents, in readable groups, straight from docs/architecture.md:

First, the field. Weather and terrain are each stored as a one-hot — a group of columns, one per possible value, where exactly one column is 1 and the rest are 0. "Sun" isn't encoded as the number 3, because 3 would imply sun is "more than" rain in some ordered sense; one-hot encoding gives each weather its own yes/no flag instead. Alongside them: a Trick Room flag and the turn counter.
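To make the one-hot idea concrete, here is a minimal sketch. The weather list and its order are hypothetical, chosen only for illustration; the real column order is whatever featurize.py defines.

```python
# Hypothetical weather vocabulary (an assumption for illustration only).
WEATHERS = ["none", "sun", "rain", "sand", "snow"]


def one_hot(value, vocabulary):
    """Return one column per possible value: 1.0 for `value`, 0.0 elsewhere."""
    if value not in vocabulary:
        raise ValueError(f"unknown value: {value!r}")
    return [1.0 if candidate == value else 0.0 for candidate in vocabulary]


sun = one_hot("sun", WEATHERS)    # one flag set, in the "sun" column
rain = one_hot("rain", WEATHERS)  # same width, a different flag set
```

Both vectors have the same width and exactly one flag set, so neither weather is numerically "larger" than the other.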

Next, 4 speed bits: for each of my two active Pokémon against each of the foe's two, does mine move first? Crucially, these bits are Trick-Room-adjusted — when Trick Room is up and slow moves first, the bits flip. The net doesn't have to derive speed order from raw stats; the sense is precomputed.
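The four bits can be sketched as below. The function name, the argument shapes, and the choice to treat a speed tie as "not first" are assumptions made for this sketch, not the project's actual code.

```python
def speed_bits(my_speeds, foe_speeds, trick_room):
    """Four bits: for each of my two actives against each foe active,
    1.0 if mine moves first, else 0.0.

    Assumption for this sketch: a speed tie counts as "not first".
    """
    bits = []
    for mine in my_speeds:
        for foe in foe_speeds:
            if trick_room:
                first = mine < foe  # Trick Room: the slower mon moves first
            else:
                first = mine > foe
            bits.append(1.0 if first else 0.0)
    return bits


normal = speed_bits([100, 50], [80, 60], trick_room=False)
reversed_order = speed_bits([100, 50], [80, 60], trick_room=True)
```

With no ties on the field, the Trick Room result is the exact bitwise flip of the normal one, which is the behavior the paragraph describes.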

Then the heart of it: 4 active Pokémon × 16 numbers each — HP fraction, status condition, stat boosts, and the rest of each active mon's condition. That's 64 of the 114 right there.

Then both benches: all 6 Pokémon on each roster, × {HP fraction, fainted flag}. Seeing the opponent's full roster isn't cheating: Regulation M-B uses Open Team Sheets, meaning both players see both full teams from the start. The feature vector simply encodes knowledge every human player at the table legally has.

Finally: 4 protect-last bits (did each active mon Protect last turn? — consecutive Protects usually fail, so this matters) and side conditions (screens, tailwind).

The fact row. Segment widths are proportional to their share of the 114 columns; the two biggest tenants are the four active Pokémon (64 numbers) and the two full rosters (24 numbers, legal knowledge under Open Team Sheets). Identity — which Pokémon these numbers describe — travels separately, as integer ids feeding embeddings.
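The segment arithmetic can be checked with a short sketch. Only the widths stated in the text are used; the segment names and the check_width helper are hypothetical, and the widths of the weather, terrain, Trick Room, turn, and side-condition columns are left as an unsplit remainder because the chapter does not give them individually.

```python
STATE_WIDTH = 114

# Widths stated in the text; the dictionary keys are illustrative names.
KNOWN_SEGMENTS = {
    "speed_bits": 4,
    "active_mons": 4 * 16,   # 4 active Pokemon x 16 numbers each
    "benches": 2 * 6 * 2,    # 2 sides x 6 mons x {HP fraction, fainted}
    "protect_last": 4,
}

known = sum(KNOWN_SEGMENTS.values())
# Whatever is left covers weather, terrain, Trick Room, turn counter,
# and side conditions; the per-group split is not given in this chapter.
remainder = STATE_WIDTH - known


def check_width(vector):
    """Reject any vector that does not match the fixed schema width."""
    if len(vector) != STATE_WIDTH:
        raise ValueError(f"expected {STATE_WIDTH} numbers, got {len(vector)}")
    return vector
```

A width check of this kind catches a dropped or added column, but not a column that holds the wrong value, which is the failure mode the war story below is about.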

Per-action features: what would this click be?

The state vector describes the board. But the net's job is to score each candidate in the action space, so each candidate action carries its own description: a 30-number row — 2 slots × 15 features, one set per active slot's half of the joint choice. Per slot: is it a move / a switch / a status move; base power; type effectiveness against the target; is it Protect; priority; remaining PP; the switch target's bulk; koFrac and koBit (next section); allyHit (would this hurt my partner); flinch chance; isMega (does this click Mega-evolve); and charge (a two-turn move). The state row answers "where am I?"; the action row answers "what would this click be?"

What would this click be? Every legal joint action gets this 30-number row (built in bridge.mjs): one 15-feature strip per active slot. Highlighted: the calc-oracle KO features, the isMega flag, and allyHit — the feature that shipped broken.
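As a rough sketch of the row layout: the real rows are built in bridge.mjs, so this Python version is illustrative only. The names koFrac, koBit, allyHit, and isMega come from the text; the other feature names, the feature order, and the helper functions are assumptions.

```python
# 15 per-slot features. Order and most names are hypothetical.
SLOT_FEATURES = [
    "isMove", "isSwitch", "isStatus", "basePower", "typeEffectiveness",
    "isProtect", "priority", "ppLeft", "switchBulk", "koFrac", "koBit",
    "allyHit", "flinchChance", "isMega", "charge",
]


def slot_strip(features):
    """Flatten one slot's feature dict into a fixed-order strip of 15 numbers."""
    return [float(features.get(name, 0.0)) for name in SLOT_FEATURES]


def action_row(slot_a, slot_b):
    """One joint action = slot A's strip followed by slot B's strip (30 numbers)."""
    return slot_strip(slot_a) + slot_strip(slot_b)


row = action_row(
    {"isMove": 1, "basePower": 80, "typeEffectiveness": 2.0, "koFrac": 0.5},
    {"isMove": 1, "isStatus": 1, "isProtect": 1, "priority": 4},
)
```

Missing keys default to 0.0 in this sketch, so every row has the same width no matter which kind of click it describes.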

A designed sense: the KO oracle

Two of those columns deserve their own section. For every damaging move-action, the bridge calls the damage calculator — the oracle in engine/calc.ts, originally built to validate Showdown's math — and asks: across the game's 16 possible damage rolls, what fraction knock out the target? That fraction is koFrac; koBit is the stronger claim "all 16 rolls KO — guaranteed." The results are memoized (cached by species, status, boosts, move, weather), because the same question recurs constantly.
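The two features reduce to simple counting once the 16 rolls exist. In the sketch below, cached_rolls is a hypothetical stand-in for the real calculator in engine/calc.ts: it only applies the 85% to 100% random factor to a given maximum damage, and it is cached on that single argument rather than on the species, status, boosts, move, and weather key the text describes.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_rolls(max_damage):
    """Hypothetical stand-in for the damage calc: 16 rolls at 85%..100%."""
    return tuple(max_damage * pct // 100 for pct in range(85, 101))


def ko_features(damage_rolls, target_hp):
    """koFrac = share of rolls that KO; koBit = 1.0 only if every roll KOs."""
    if not damage_rolls:
        return 0.0, 0.0
    kos = sum(1 for roll in damage_rolls if roll >= target_hp)
    ko_frac = kos / len(damage_rolls)
    ko_bit = 1.0 if kos == len(damage_rolls) else 0.0
    return ko_frac, ko_bit


possible = ko_features(cached_rolls(100), target_hp=90)    # some rolls KO
guaranteed = ko_features(cached_rolls(100), target_hp=85)  # every roll KOs
```

Keeping koFrac and koBit as separate columns lets the net distinguish "likely" from "certain" without having to learn a threshold itself.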

This is worth pausing on as a design pattern. The project didn't hope the net would rediscover damage arithmetic from win/loss statistics; it wired a trusted calculator in as a sense organ. And critically, koFrac is a feature the net may weigh, not a rule it must obey. Nothing forces the policy to take a guaranteed KO — sometimes Protect is better — but the information is on the table every turn. On the fixed mirror matchup, adding it lifted the policy from 95% to 99% vs-random.

For the curious

The KO features are honest signals, not ground truth — docs/caveats.md keeps the list. A Mega-evolved Pokémon is still scored with its base forme's stats (a slight underestimate); spread moves report the best single target's KO rather than the 0.75× spread-reduced value; a handful of moves missing from the calc's data (e.g. Swift) degrade gracefully to [0,0]; terrain isn't passed to the calc at all. These were deliberate v1 trade-offs — the +4-point lift proved the signal useful long before it was perfect. Features are evidence, and the net learns how much to trust each one.

Embeddings: teaching the net who Incineroar is

Now the centerpiece. Everything so far describes condition — HP, speed, power. None of it says who. And identity matters enormously: knowing the opposing Incineroar exists means expecting Fake Out and Intimidate, before either has happened. But a net can't read the string "Incineroar." How do you feed it a name?

You already know the answer from data engineering; it just has a different name here. Step one: assign every species an integer — a surrogate key. featurize.py builds this lookup from the team corpus: 165 species, ids 1–165 (0 is reserved for "unknown," the padding row). Step two is the new idea: for each id, the network holds a learned row of numbers — for species, 16 numbers. That row is called an embedding. Wherever a species id appears in the input, the net looks up its row and reads those 16 numbers as the description of that species.

Here's the part that takes a minute to absorb: it's a dimension table whose attribute columns nobody defines. The 16 values per species start as random noise. During training, they get nudged by the same win/loss gradient as everything else — and they drift until they're useful. If distinguishing "fast frail attacker" from "slow bulky support" helps win games, some direction in those 16 numbers comes to encode it. Something like an "is a Fake Out pivot" attribute can emerge — unnamed, unrequested — simply because teams that respected it won more. You define the key and the row width; training invents the columns.

A dimension table with invented attributes. Name → surrogate key → learned 16-number row (values illustrative — they're weights, retuned every update). One table, shared everywhere a species appears: the active mons, both benches, and each switch action's target.
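A minimal, framework-free sketch of the lookup follows. The helper names, the small random initialization, and the all-zero row for id 0 are assumptions for illustration; in a real training setup the table would be a trainable layer in the network rather than a plain list.

```python
import random

EMBED_DIM = 16   # species row width from the text
UNKNOWN_ID = 0   # reserved padding / "unknown" key


def build_vocab(names):
    """Surrogate keys starting at 1; id 0 stays reserved for unknown."""
    return {name: i for i, name in enumerate(sorted(set(names)), start=1)}


def init_table(vocab_size, dim, seed=0):
    """One row per id. Rows start as small random noise (assumed init);
    the row for id 0 is kept at zero in this sketch."""
    rng = random.Random(seed)
    table = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
             for _ in range(vocab_size + 1)]
    table[UNKNOWN_ID] = [0.0] * dim
    return table


def lookup(table, vocab, name):
    """Name -> surrogate key -> learned row."""
    return table[vocab.get(name, UNKNOWN_ID)]


vocab = build_vocab(["Incineroar", "Amoonguss", "Rillaboom"])
table = init_table(len(vocab), EMBED_DIM)
incineroar_row = lookup(table, vocab, "Incineroar")
```

Training would then adjust the numbers inside each row; the lookup itself never changes.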

The same trick is applied three more times, with sizes matched to how much there is to know: items and abilities each get 8-number embeddings (for the active mons), and moves get a 332-entry vocabulary at 12 numbers each. The move embedding fixes a real blindness: in the scalar features, Trick Room and Tailwind look almost identical — both status moves, zero base power, zero KO chance — yet they do opposite things to speed. With a learned identity per move, the net can tell them apart and learn what each is for.

For the curious

The vocab tables are also the system's Achilles' heel for format changes. featurize.py rebuilds them alphabetically from the corpus at import time, so a new metagame inserts entries, shifts every id after the insertion point, and reshapes the embedding matrices — old checkpoints won't even load. docs/new-rules.md documents the fix (persistent, append-only id tables — freeze the surrogate keys, exactly as you would in a warehouse) that would turn "retrain from scratch" into "one short fine-tune."
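The difference between the two id schemes is easy to see in a small sketch. Both functions are hypothetical illustrations of the idea, not the code in featurize.py or the design in docs/new-rules.md.

```python
def alphabetical_ids(names):
    """Rebuild ids from scratch in sorted order (ids start at 1)."""
    return {name: i for i, name in enumerate(sorted(set(names)), start=1)}


def append_only_ids(existing, names):
    """Keep every existing id; give unseen names the next free id."""
    ids = dict(existing)
    next_id = max(ids.values(), default=0) + 1
    for name in sorted(set(names)):
        if name not in ids:
            ids[name] = next_id
            next_id += 1
    return ids


old_corpus = ["Amoonguss", "Incineroar", "Rillaboom"]
new_corpus = old_corpus + ["Garchomp"]

old_ids = alphabetical_ids(old_corpus)
rebuilt = alphabetical_ids(new_corpus)             # Incineroar's id shifts
frozen = append_only_ids(old_ids, new_corpus)      # Incineroar's id is stable
```

In the rebuilt table, the embedding row an old checkpoint learned for Incineroar would now be read as Garchomp's; in the append-only table every old row still means what it meant, and only the new row needs training.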

The staircase: what better senses were worth

None of this was speculative — each perception upgrade was measured, mostly head-to-head against the policy that lacked it (an A/B test where the only variable is the new sense, since architecture and training recipe stay fixed). The staircase, with the repo's real numbers:

Upgrade	What was added	Measured
Species embedding	Identity for active mons + switch targets (was HP/type/KO only)	~80% → ~93% h2h vs blind
Perception v2	Both full 6-mon rosters embedded at preview (Open Team Sheets)	~97% h2h vs prior
Perception v3	Terrain, speed order, bench HP, PP, protect-last, item + ability embeddings; state 77 → 114	~2× faster learning

Read the last row carefully, because it's the subtle one. Perception v3's main payoff wasn't a higher final ceiling; it was speed: the policy reached 86% vs-random by iteration 12, roughly twice the previous pace. When the net no longer has to infer speed order from stats or remember Protect usage across turns, it spends its learning capacity on strategy instead of bookkeeping. Richer senses don't only raise the ceiling; they also speed the climb.

Key point

Perception was the cheapest strength lever in the whole project. No algorithm changed across these rows — same net shape, same PPO — only the inputs got richer, and each step was verified as a measured A/B win before it stayed.

War story: the feature that lied on every action

War story

The allyHit feature was designed to answer one question per action: "what fraction of my partner's HP would this move cost?" — because spread moves like Earthquake hit your own ally, and the net should know before clicking. But allyDamageFrac in bridge.mjs computed that damage for every damaging move, without checking the move's target type. Single-target moves like Knock Off and foe-only spreads like Blizzard — moves that cannot physically hit the partner — all reported a phantom ally cost. The policy was fed this lie on every action row of every decision of every game. Nothing crashed. Nothing warned. Training converged. Win rates looked normal.

It surfaced only when the regression gate (docs/regression.md) added an ally-KO probe ("how much probability does the policy put on moves that would KO a healthy, exposed ally?") and the probe fired 302 times in 150 games, absurdly often. Investigating why the probe fired, not any crash, exposed the feature bug. The fix was one guard: compute allyHit only when target === 'allAdjacent' (the Earthquake/Surf/Discharge class). After the fix, firings dropped to 16, every one a real Earthquake-into-ally decision.
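The guard can be sketched in a few lines. The real function is allyDamageFrac in bridge.mjs; this Python version, its signature, and the clamp to 1.0 are assumptions for illustration. The target strings follow Showdown's move data, where 'allAdjacent' marks moves that hit the partner too.

```python
def ally_hit(move_target, damage_to_ally, ally_hp):
    """Fraction of the partner's HP this move would cost.

    Returns 0.0 unless the move's target type can actually hit the ally.
    """
    if move_target != "allAdjacent":
        return 0.0  # single-target and foe-only spread moves never hit the partner
    if ally_hp <= 0:
        return 0.0  # no partner on the field to damage
    return min(1.0, damage_to_ally / ally_hp)


knock_off = ally_hit("normal", damage_to_ally=50, ally_hp=100)
blizzard = ally_hit("allAdjacentFoes", damage_to_ally=50, ally_hp=100)
earthquake = ally_hit("allAdjacent", damage_to_ally=50, ally_hp=100)
```

Without the first check, all three calls would report the same phantom cost, which is the lie the policy was trained on.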

The lesson generalizes to every ML system you'll ever touch: feature bugs don't crash — the model quietly learns the lie. A broken ETL column doesn't throw; it just makes every downstream consumer confidently wrong. Garbage in, policy out. The only defense is behavioral probes on the output, which is what Chapter 10's regression gate is about.
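A behavioral probe of this kind can be sketched as follows. The function names and the 0.5 threshold are hypothetical; the actual probe and its firing rule are defined in docs/regression.md and are not reproduced here.

```python
def ally_ko_mass(action_probs, would_ko_ally):
    """Total policy probability on actions flagged as KOing the partner."""
    return sum(p for p, flagged in zip(action_probs, would_ko_ally) if flagged)


def probe_fires(action_probs, would_ko_ally, threshold=0.5):
    """Fire when the policy leans toward KOing its own ally.

    The threshold is an assumed value for this sketch.
    """
    return ally_ko_mass(action_probs, would_ko_ally) >= threshold


# One decision with three legal actions; only the first would KO the ally.
probs = [0.7, 0.2, 0.1]
flags = [True, False, False]
mass = ally_ko_mass(probs, flags)
fired = probe_fires(probs, flags)
```

The probe looks only at the policy's output distribution, so it can flag bad behavior even when every input column passes its schema checks.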

What the AI still can't see

Senses are chosen, budgeted engineering, and docs/tech-debt.md keeps the honest backlog of blind spots. The speed bits ignore ability-conditional speed — a Chlorophyll user in sun actually doubles its speed, but the feature doesn't know that. A Mega-evolved Pokémon is still perceived (and KO-calculated) as its base forme, a deliberate trade-off to keep its embedding identity intact. And the item/ability embeddings cover only the four active mons — the bench's held items and abilities are invisible until they switch in.

Each gap is a known cost/benefit call, not an oversight: every new sense costs bridge code, featurizer schema changes, and a retrain, and gets weighed against measured payoff — the same discipline as deciding which columns earn a place in a fact table. What happens once these numbers reach the network — how 114 state numbers plus a variable-length stack of 30-number action rows become one probability per legal action — is Chapter 5.

Check yourself

Why does the state vector include the opponent's full 6-mon roster — isn't that cheating?

No. Regulation M-B uses Open Team Sheets: both players legally see both full teams from team preview onward. The roster block (6 mons × {HP fraction, fainted} per side, plus the roster species ids feeding the shared embedding) just encodes knowledge every human at the table already has. Hiding it would handicap the net, not make it fairer.

What exactly is a species embedding, in data-engineering terms — and which part does nobody design?

It's a dimension-table lookup: the species name gets a surrogate key (an integer id, 165 in the vocab), and the key selects a row of 16 numbers that flows into the net. The undesigned part is the row's contents: the 16 "attribute columns" start random and are molded by the win/loss gradient until they encode whatever distinctions help win — nobody names or defines them. You choose the key and the width; training invents the schema.

The allyHit bug never threw an error and win rates looked fine. What actually caught it, and what's the general lesson?

A behavioral probe caught it: the regression gate's ally-KO check fired 302 times in 150 games, far too often, and chasing down why revealed that allyHit was being computed for every damaging move instead of only allAdjacent spread moves, feeding phantom ally damage on single-target moves like Knock Off. After guarding on the target type, firings fell to 16, all of them genuine. Lesson: feature bugs don't crash; the model just learns the lie. So you must probe the model's behavior, not just its win rate or its error logs.