% ── Supplementary: KEM-level end-to-end speedup ───────────────────────────────
\section{End-to-End KEM Speedup}
\label{sec:supp:kem}

Figure~\ref{fig:kemlevel} shows the hand-written SIMD speedup for the
top-level KEM operations: key generation (\op{kyber\_keypair}),
encapsulation (\op{kyber\_encaps}), and decapsulation (\op{kyber\_decaps}).
These composite operations aggregate the speedups of their constituent
primitives, weighted by relative cycle counts.
Decapsulation achieves the highest speedup (\speedup{6.9}--\speedup{7.1})
because it involves the largest share of arithmetic operations (two
additional NTT and INVNTT calls for re-encryption verification).
Key generation achieves the lowest (\speedup{5.3}--\speedup{5.9}) because it
involves one fewer polynomial multiplication step relative to encapsulation.

\begin{figure}[h]
  \centering
  \input{figures/fig_kem_level}
  \caption{End-to-end KEM speedup (\varref{} $\to$ \varavx{}) for
    \op{kyber\_keypair}, \op{kyber\_encaps}, and \op{kyber\_decaps}.
    Intel Xeon Platinum 8268; 95\% bootstrap CI.}
  \label{fig:kemlevel}
\end{figure}

\section{Full Operation Set}
\label{sec:supp:fullops}

\todo[inline]{Full operation speedup table for all 20 benchmarked operations,
  including \op{poly\_compress}, \op{poly\_decompress},
  \op{polyvec\_compress}, \op{poly\_tomsg}, and the \texttt{*\_derand} KEM
  variants.}