#reinforcement-learning — zkdefi

2025-01-07

Replicating R1-Zero on a countdown task

This is a reinforcement-learning replication of the R1-Zero training recipe on a small countdown task, using the TinyZero pipeline as the substrate. The point isn't the result; it's the discipline of reproducing a published recipe from scratch on consumer hardware.
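
For readers unfamiliar with the task: a countdown problem gives a handful of integers and a target, and the model must write an arithmetic expression that uses each integer exactly once and evaluates to the target. That makes the reward checkable by a rule rather than a learned model. The sketch below is one way such a check could look. It is not TinyZero's code; the function name `countdown_reward`, its signature, and the binary 0/1 scoring are all assumptions made for illustration.

```python
import ast
import operator
import re
from collections import Counter

# Binary operators this sketch allows in a candidate expression (an assumption).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node):
    """Evaluate a parsed arithmetic expression, rejecting anything else."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("unsupported expression")


def countdown_reward(expression, numbers, target):
    """Hypothetical rule-based reward: 1.0 if the expression uses each
    given number exactly once and evaluates to the target, else 0.0."""
    used = [int(tok) for tok in re.findall(r"\d+", expression)]
    if Counter(used) != Counter(numbers):
        return 0.0
    try:
        value = _eval(ast.parse(expression, mode="eval"))
    except (ValueError, SyntaxError, ZeroDivisionError):
        return 0.0
    return 1.0 if abs(value - target) < 1e-9 else 0.0


if __name__ == "__main__":
    print(countdown_reward("(25 - 5) * 4", [4, 5, 25], 80))
    print(countdown_reward("25 + 5 + 4", [4, 5, 25], 80))
```

The expression is parsed with `ast` and walked by hand instead of being passed to `eval`, so model output is never executed as code. A real training setup would likely also score the output format; that part is left out here.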