% ── Supplementary: KEM-level end-to-end speedup ───────────────────────────────
\section{End-to-End KEM Speedup}
\label{sec:supp:kem}

Figure~\ref{fig:kemlevel} shows the hand-written SIMD speedup for the
top-level KEM operations: key generation (\op{kyber\_keypair}), encapsulation
(\op{kyber\_encaps}), and decapsulation (\op{kyber\_decaps}). The speedup of
each composite operation aggregates the speedups of its constituent primitives,
weighted by each primitive's share of the operation's \varref{} cycle count.
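
Under the simplifying assumption that an operation's \varref{} cycles split
additively over its primitives (with any unattributed glue code counted as one
more primitive), this weighting is Amdahl's law, i.e.\ a cycle-weighted
harmonic mean. In notation introduced here only for illustration, let $c_i$ be
the \varref{} cycles spent in primitive $i$ and $s_i$ that primitive's
speedup; then
\[
  S_{\mathrm{op}}
  = \frac{\sum_i c_i}{\sum_i c_i / s_i}
  = \Bigl( \sum_i \frac{f_i}{s_i} \Bigr)^{-1},
  \qquad
  f_i = \frac{c_i}{\sum_j c_j}.
\]
The end-to-end speedup $S_{\mathrm{op}}$ therefore always lies between the
smallest and the largest $s_i$, and is pulled towards the speedups of the
primitives with the largest cycle shares $f_i$. For instance, with
hypothetical shares $f = (0.5, 0.5)$ and speedups $s = (10, 2)$, not measured
values, $S_{\mathrm{op}} = 1/(0.05 + 0.25) \approx 3.3$, well below the
arithmetic mean of $6$.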

Decapsulation achieves the highest speedup (\speedup{6.9}--\speedup{7.1})
because it contains the largest share of arithmetic: before re-encrypting to
verify the ciphertext, which repeats the work of encapsulation, it runs the
IND-CPA decryption, adding a forward NTT of the ciphertext vector and an
INVNTT. Key generation achieves the lowest speedup
(\speedup{5.3}--\speedup{5.9}) because it performs one fewer polynomial
multiplication step than encapsulation.
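
As a rough illustration of these arithmetic shares, the counts below are what
we read off the IND-CPA routines of the pq-crystals Kyber reference
implementation. They assume that \varavx{} keeps the same call structure,
count only per-polynomial transforms and NTT-domain polynomial base
multiplications, and omit hashing, sampling, compression, and encoding; $k \in
\{2, 3, 4\}$ is the module rank. Decapsulation runs the IND-CPA decryption
($k$ NTTs, one INVNTT, $k$ base multiplications) followed by a re-encryption
with the same cost as encapsulation, and key generation needs no INVNTT
because the public key is kept in the NTT domain.

\begin{center}
  \begin{tabular}{lccc}
    \hline
    Operation & NTT & INVNTT & Base multiplications \\
    \hline
    \op{kyber\_keypair} & $2k$ & $0$     & $k^2$    \\
    \op{kyber\_encaps}  & $k$  & $k + 1$ & $k(k+1)$ \\
    \op{kyber\_decaps}  & $2k$ & $k + 2$ & $k(k+2)$ \\
    \hline
  \end{tabular}
\end{center}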

\begin{figure}[h]
  \centering
  \input{figures/fig_kem_level}
  \caption{End-to-end KEM speedup (\varref{} $\to$ \varavx{}) for
           \op{kyber\_keypair}, \op{kyber\_encaps}, and \op{kyber\_decaps}.
           Intel Xeon Platinum 8268; 95\% bootstrap CI.}
  \label{fig:kemlevel}
\end{figure}

\section{Full Operation Set}
\label{sec:supp:fullops}

\todo[inline]{Full operation speedup table for all 20 benchmarked operations,
including \op{poly\_compress}, \op{poly\_decompress}, \op{polyvec\_compress},
\op{poly\_tomsg}, and the \texttt{*\_derand} KEM variants.}