% ── 6. Related Work ───────────────────────────────────────────────────────────
\section{Related Work}
\label{sec:related}

\paragraph{ML-KEM / Kyber implementations.}
The AVX2 implementation studied here was developed by Schwabe and
Seiler~\cite{kyber-avx2} and serves as the optimized path in both the
\texttt{pq-crystals/kyber} reference repository and PQClean~\cite{pqclean}.
Bos et al.~\cite{kyber2018} describe the original Kyber scheme, and
FIPS~203~\cite{fips203} standardizes it as ML-KEM. Optimized ARM Cortex-M4
implementations are collected in pqm4~\cite{pqm4}; a cross-ISA comparison
covering ARM NEON and Cortex-M4 targets is planned for Phase~3.

\paragraph{PQC benchmarking.}
eBACS/SUPERCOP~\cite{supercop} provides a cross-platform benchmark suite that
reports median cycle counts for many cryptographic primitives, including Kyber.
Our contribution complements these point estimates with a statistically rigorous
decomposition of the measured speedups, using nonparametric effect sizes and
bootstrapped confidence intervals. Kannwischer et al.~\cite{pqm4} benchmark
post-quantum schemes systematically on the ARM Cortex-M4 (pqm4); their focus is
performance on constrained devices rather than SIMD analysis.
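
To illustrate the kind of analysis meant here, the C sketch below computes
Cliff's delta (one common nonparametric effect size) between two samples of
cycle counts, and a percentile-bootstrap confidence interval for the speedup,
taken as the ratio of sample medians. It is a minimal sketch under our own
assumptions rather than the study's analysis code: the function names
\texttt{cliffs\_delta} and \texttt{bootstrap\_speedup\_ci}, the median-ratio
statistic, and the xorshift64 resampling generator are hypothetical choices made
for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the study): Cliff's delta,
 * P(x > y) - P(x < y) over all n*m pairs. With x = reference cycles and
 * y = optimized cycles, values near +1 mean the reference is almost always
 * slower. O(n*m) time, which is fine for a few thousand runs per build. */
double cliffs_delta(const uint64_t *x, size_t n, const uint64_t *y, size_t m)
{
    long long greater = 0, less = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            if (x[i] > y[j])
                greater++;
            else if (x[i] < y[j])
                less++;
        }
    }
    return (double)(greater - less) / ((double)n * (double)m);
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t ua = *(const uint64_t *)a, ub = *(const uint64_t *)b;
    return (ua > ub) - (ua < ub);
}

static int cmp_double(const void *a, const void *b)
{
    double da = *(const double *)a, db = *(const double *)b;
    return (da > db) - (da < db);
}

/* Median of v[0..n-1], computed on a sorted scratch copy (v is untouched). */
static double median_u64(const uint64_t *v, size_t n, uint64_t *scratch)
{
    memcpy(scratch, v, n * sizeof *scratch);
    qsort(scratch, n, sizeof *scratch, cmp_u64);
    if (n % 2)
        return (double)scratch[n / 2];
    return 0.5 * ((double)scratch[n / 2 - 1] + (double)scratch[n / 2]);
}

/* xorshift64: small deterministic PRNG, assumed adequate for drawing
 * resampling indices (modulo bias is negligible for realistic sample
 * sizes). The state must be nonzero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Hypothetical helper: percentile-bootstrap CI for the speedup
 * median(ref) / median(opt). Each group is resampled independently with
 * replacement `reps` times. Writes the alpha/2 and 1 - alpha/2 percentiles
 * to *lo and *hi; returns 0 on success, -1 on bad input or allocation failure. */
int bootstrap_speedup_ci(const uint64_t *ref, size_t n,
                         const uint64_t *opt, size_t m,
                         size_t reps, double alpha, uint64_t seed,
                         double *lo, double *hi)
{
    if (n == 0 || m == 0 || reps == 0 || alpha <= 0.0 || alpha >= 1.0)
        return -1;
    uint64_t *rs = malloc(n * sizeof *rs);
    uint64_t *os = malloc(m * sizeof *os);
    uint64_t *scratch = malloc((n > m ? n : m) * sizeof *scratch);
    double *stats = malloc(reps * sizeof *stats);
    int rc = -1;
    if (rs && os && scratch && stats) {
        uint64_t state = seed ? seed : 0x9E3779B97F4A7C15ULL;
        for (size_t r = 0; r < reps; r++) {
            for (size_t i = 0; i < n; i++)
                rs[i] = ref[xorshift64(&state) % n];
            for (size_t j = 0; j < m; j++)
                os[j] = opt[xorshift64(&state) % m];
            double med_ref = median_u64(rs, n, scratch);
            double med_opt = median_u64(os, m, scratch);
            stats[r] = med_ref / med_opt;
        }
        qsort(stats, reps, sizeof *stats, cmp_double);
        *lo = stats[(size_t)(alpha / 2.0 * (double)(reps - 1))];
        *hi = stats[(size_t)((1.0 - alpha / 2.0) * (double)(reps - 1))];
        rc = 0;
    }
    free(rs);
    free(os);
    free(scratch);
    free(stats);
    return rc;
}
```

Given per-run cycle counts for the reference and AVX2 builds, a call such as
\texttt{bootstrap\_speedup\_ci(ref, n, avx2, m, 10000, 0.05, 42, \&lo, \&hi)}
yields a 95\% interval for the median speedup. Resampling each build
independently keeps their run-to-run noise separate, and the median ratio is
robust to the occasional interrupt-inflated measurement.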

\paragraph{SIMD in cryptography.}
Gueron and Krasnov demonstrated AVX2 speedups for AES-GCM~\cite{gueron2014};
similar vectorization techniques underpin the Kyber AVX2 implementation.
Bernstein's Curve25519 software~\cite{bernstein2006} established the template of
hand-tuned, limb-based field arithmetic for cryptography on which later
vectorized implementations build.

\paragraph{NTT optimization.}
Longa and Naehrig~\cite{ntt-survey} study fast NTT algorithms for ideal
lattice-based cryptography and analyze instruction counts for vectorized
implementations. To our knowledge, our measurements provide the first empirical
cycle-count decomposition that isolates the compiler's contribution from that of
hand-written SIMD for the ML-KEM NTT specifically.
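
One way to make such a decomposition concrete, assuming three builds of the NTT
(the reference C code at a baseline optimization level, the same code at full
compiler optimization, and the hand-written AVX2 path) with median cycle counts
$c_{\mathrm{base}}$, $c_{\mathrm{cc}}$, and $c_{\mathrm{avx2}}$, is a
multiplicative split of the end-to-end speedup. The notation below is ours and
only sketches the idea; it is not necessarily the study's exact definition.
\[
S_{\mathrm{total}} = \frac{c_{\mathrm{base}}}{c_{\mathrm{avx2}}}
  = \underbrace{\frac{c_{\mathrm{base}}}{c_{\mathrm{cc}}}}_{S_{\mathrm{cc}}}
    \cdot
    \underbrace{\frac{c_{\mathrm{cc}}}{c_{\mathrm{avx2}}}}_{S_{\mathrm{simd}}},
\qquad
\phi_{\mathrm{cc}} = \frac{\ln S_{\mathrm{cc}}}{\ln S_{\mathrm{total}}},
\quad
\phi_{\mathrm{simd}} = 1 - \phi_{\mathrm{cc}}.
\]
Taking logarithms turns the product into a sum, so the two shares add to one
whenever $S_{\mathrm{total}} \neq 1$. With purely illustrative values
$S_{\mathrm{cc}} = 2$ and $S_{\mathrm{simd}} = 4$, we get $S_{\mathrm{total}} = 8$
and $\phi_{\mathrm{cc}} = \ln 2 / \ln 8 = 1/3$.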

\paragraph{Hardware counter profiling.}
Bernstein and Schwabe~\cite{cachetime} discuss the relationship between cache
behavior and the timing of cryptographic code. PAPI~\cite{papi} provides a
portable interface to hardware performance counters and has been used in related
profiling work. Phase~2 of this study will add PAPI counter collection to provide
a mechanistic, hardware-level explanation of the speedups observed here.