3,467 Moves Under a Microscope
Every LLM move from 68 games, evaluated by Stockfish — what the data reveals about where language models reason well and where they fall apart.

Qg8#
89 moves, zero illegal attempts, and a checkmate — the story of the first time a language model pipeline beat Stockfish, playing at 1320 Elo.

The 3x3 Matrix
Mapping chess cognitive demands to real-world task combinations — which tasks overload language models for the same reasons chess does?

The $100 Chess Game and What Came After
How cost constraints, rate limits, and a creative use of Claude Code transformed a failing experiment into a human-AI partnership against Stockfish.

36 Games, 799 Moves, and the Shape of Failure
How the system broke across 36 games, how failures evolved, and what the data reveals about LLM chess.

Twelve Tools, Four Agents, and One Pipeline
The architecture of a system that gives a language model eyes, critics, and a structured workflow for playing chess against Stockfish.

When a Prompt Isn't Enough
The origin story of a project that started with broken LaTeX, led through cognitive load theory, and ended with a language model playing chess.

The best way to understand something is to break it here and there.
Cross-domain experiments — cognitive load theory meets chess engines, Bloom's taxonomy meets Q/A generation, GPU kernels meet ML intuition.