The best way to understand something is to break it here and there.

3,467 Moves Under a Microscope

Every LLM move from 68 games, evaluated by Stockfish — what the data reveals about where language models reason well and where they fall apart.

Read more →

Qg8#

89 moves, zero illegal attempts, and a checkmate — the story of the first time a language model pipeline beat Stockfish playing at 1320 Elo.

Read more →

The 3x3 Matrix

Mapping the cognitive demands of chess onto combinations of real-world tasks — which tasks overload language models for the same reasons chess does?

Read more →

The $100 Chess Game and What Came After

How cost constraints, rate limits, and a creative use of Claude Code transformed a failing experiment into a human-AI partnership against Stockfish.

Read more →

36 Games, 799 Moves, and the Shape of Failure

How the system broke across 36 games, how failures evolved, and what the data reveals about LLM chess.

Read more →

Twelve Tools, Four Agents, and One Pipeline

The architecture of a system that gives a language model eyes, critics, and a structured workflow for playing chess against Stockfish.

Read more →

When a Prompt Isn't Enough

The origin story of a project that started with broken LaTeX, led through cognitive load theory, and ended with a language model playing chess.

Read more →

Cross-domain experiments — cognitive load theory meets chess engines, Bloom's taxonomy meets Q/A generation, GPU kernels meet ML intuition.

Read more →