zkdefi · notes

#r1-zero

Replicating R1-Zero on a countdown task

A reinforcement-learning replication of the R1-Zero training recipe on a small countdown task, using the TinyZero pipeline as the substrate. The point isn't the result; it's the discipline of reproducing a published recipe end to end on consumer hardware.
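
For context, a countdown problem gives a list of numbers and a target, and asks for an arithmetic expression that uses each number exactly once and evaluates to the target. Because an answer can be checked mechanically, the task suits the rule-based rewards that R1-Zero-style training relies on. Below is a minimal sketch of such a check. The names `countdown_reward` and `_evaluate` are hypothetical and written for illustration; this is not TinyZero's actual reward code, and the all-or-nothing 0/1 score is an assumption (a real pipeline may also score output format).

```python
import ast
from collections import Counter
from fractions import Fraction

# Hypothetical sketch, not taken from TinyZero: scores one candidate answer.
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def _evaluate(node, used):
    """Evaluate an arithmetic AST exactly, appending each integer literal to `used`."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    # Anything else (names, calls, unary minus, ...) is rejected.
    raise ValueError("unsupported expression")


def countdown_reward(expression, numbers, target):
    """Return 1.0 if `expression` uses exactly `numbers` and equals `target`, else 0.0."""
    try:
        tree = ast.parse(expression, mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    # Each provided number must appear exactly once (compared as multisets).
    if Counter(used) != Counter(numbers):
        return 0.0
    return 1.0 if value == target else 0.0
```

Parsing with `ast` instead of calling `eval` keeps model output from executing arbitrary code, and `Fraction` avoids floating-point error when a division appears mid-expression. With numbers `[2, 6, 6]` and target `24`, the expression `(6 - 2) * 6` earns the reward, while `6 * 4` does not because it uses numbers outside the given list.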