Abstract: The ML-KEM post-quantum cryptography (PQC) scheme requires matrix-vector polynomial multiplication and polynomial arithmetic operations in the number theoretic transform (NTT) domain. Prior ...
CUDA-L2 is a system that combines large language models (LLMs) and reinforcement learning (RL) to automatically optimize Half-precision General Matrix Multiply (HGEMM) CUDA kernels. CUDA-L2 ...
Abstract: In this paper, we propose three modular multiplication algorithms that use only the IEEE 754 binary floating-point operations. Several previous studies have used floating-point operations to ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results