zkdefi · notes

#reinforcement-learning

Replicating R1-Zero on a countdown task

A replication of the R1-Zero reinforcement-learning recipe on a small countdown task, built on the TinyZero pipeline. The point isn't the result; it's the discipline of reproducing a published recipe from scratch on consumer hardware.