zkdefi · notes

A scratch notebook for thinking about PQC acceleration

The post-quantum standards are settled enough that the conversation has moved on from "which scheme?" to "how do we make them fast on the silicon we already have?" That's a less glamorous question, and most of its answers are unsatisfying. The Kyber and Dilithium reference implementations are written in clean, portable C; the versions that are actually fast use AVX2 and AVX-512 intrinsics on x86, NEON on ARM, and a long tail of clever Barrett-reduction tricks for the modular arithmetic. Everyone benchmarks differently. Nobody benchmarks against the workload that I care about: long-lived TLS handshakes on a banking platform, with batched key generation and signature verification under steady load.
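To make "Barrett-reduction tricks" concrete, here is a minimal Python sketch of a centered Barrett reduction for Kyber's modulus q = 3329. It mirrors the shape of the portable 16-bit technique (a multiply, a shift, and one correction) rather than any particular optimized implementation. The names `Q`, `V` and `barrett_reduce` are illustrative choices for this note. Python integers are unbounded, so the sketch shows the arithmetic only and says nothing about cycle counts.

```python
Q = 3329                         # Kyber's prime modulus
SHIFT = 26                       # precision of the fixed-point reciprocal
V = ((1 << SHIFT) + Q // 2) // Q # round(2**26 / Q), precomputed once


def barrett_reduce(a: int) -> int:
    """Return the centered representative of a mod Q for 16-bit signed a.

    The quotient a / Q is approximated by (V * a) >> 26 with rounding,
    so no division is needed at run time.
    """
    # Python's >> on negative ints floors, like an arithmetic shift.
    t = (V * a + (1 << (SHIFT - 1))) >> SHIFT
    return a - t * Q


# Exhaustive check over the int16 range this sketch is meant for.
for a in range(-(1 << 15), 1 << 15):
    r = barrett_reduce(a)
    assert (r - a) % Q == 0        # congruent to the input
    assert -1664 <= r <= 1664      # centered: |r| <= (Q - 1) / 2
```

The vectorized versions apply the same multiply-and-shift to many coefficients per instruction, which is why the choice of shift width and reciprocal constant matters so much for the cycle counts.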

pqc-accelerate is the notebook where I work through this for myself. It's deliberately a scratchpad, not a product. The point is the exercise of staring at the cycle counts long enough to know what's worth optimizing.

The questions the notebook walks through:

Nothing in this notebook is novel. The constructions it benchmarks all exist in the reference and optimized implementations. The notebook's purpose is internalization — I want to be the kind of engineer who, when someone asks "can we afford to put Kyber on this load balancer," answers from numbers I derived myself, not from a paper's abstract.

The era for these constructions is just beginning. Once they ship into ordinary cryptographic libraries (OpenSSL 3.5 already includes ML-KEM and ML-DSA, the standardized forms of Kyber and Dilithium), every operations team is going to face the question "how much do they cost on my fleet?" The answer they get from a vendor benchmark is wrong by a factor of two in either direction, depending on what the vendor wanted to sell them. The answer they get from running it themselves is the actual answer.

This is the scratchpad for getting good at deriving that answer.
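A minimal sketch of what "running it themselves" can look like, assuming nothing beyond the Python standard library. The harness times an operation in batches and reports per-call nanoseconds as a minimum, a median and a 99th percentile. The names `bench` and `stand_in_op` are hypothetical, and the SHA-256 call is only a placeholder workload: in practice it would be swapped for the key generation, encapsulation or verification call of whatever PQC binding is installed. Wall-clock time from Python also includes interpreter overhead, so this approximates cost under load rather than raw cycle counts.

```python
import hashlib
import statistics
import time


def bench(op, batch=64, rounds=200, warmup=20):
    """Time op() in batches and return per-call latency in nanoseconds."""
    for _ in range(warmup):          # warm caches before measuring
        op()
    samples = []
    for _ in range(rounds):
        start = time.perf_counter_ns()
        for _ in range(batch):
            op()
        elapsed = time.perf_counter_ns() - start
        samples.append(elapsed / batch)
    samples.sort()
    p99_index = min(len(samples) - 1, int(0.99 * len(samples)))
    return {
        "min": samples[0],
        "median": statistics.median(samples),
        "p99": samples[p99_index],
    }


def stand_in_op():
    # Placeholder workload only; replace with the real PQC operation.
    hashlib.sha256(b"\x00" * 1024).digest()


result = bench(stand_in_op)
for name, value in result.items():
    print(f"{name:>6}: {value:10.1f} ns/op")
```

Reporting the tail alongside the median is a deliberate choice: under steady load, the p99 is usually what decides whether a handshake budget holds.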

What becomes possible: I know what to look for when I read a new PQC benchmark paper, and I know what to actually measure when I deploy the constructions to production.

#post-quantum #kyber #dilithium #hardware-acceleration #jupyter