Abstract:
Public-key cryptography came into light in 1976 through the work of Diffie and Hellman which introduced the first one-way function giving birth to the famous Diffie-Hellman (DH) key agreement protocol. While the original protocol was defined over multiplicative groups, it can also be instantiated over elliptic curves which is known as the Elliptic Curve Diffie-Hellman (ECDH) protocol. Security and efficiency are two crucial issues that needs to be taken care of while an elliptic curve is proposed and implemented so that it can be used in the real world applications. At the design phase the curve parameters need to be chosen such that they satisfy some pre-defined security requirements. At the implementation level, the cryptographic software should be efficient enough so that it executes in high-speed and satisfy certain security requirements so that it is capable to resist certain side-channel attacks. Our focus in this work is on secure and efficient computation of ECDH which has two phases, key generation and shared secret computation. Both the phases have a critical component known as the scalar multiplication and this has different computational challenges depending on the target platform of the implementation. In the context of ECDH there are two kinds of scalar multiplication involved, fixed-base scalar multiplication and variable-base scalar multiplication. Key generation uses fixed-base scalar multiplication and shared secret computation uses variable-base scalar multiplication which is relatively an expensive operation and has more computational interest. In this thesis we have addressed the problem of efficient computation of ECDH on Montgomery curves at various security levels from a software perspective targeting the modern Intel architectures. Montgomery curves are usually defined over fields based on primes of special shape, like Mersenne primes, pseudo-Mersenne primes and the Solinas trinomial primes which allow high-speed field arithmetic. A simple and efficient method to compute the scalar multiplication in constant time is known as the Montgomery ladder, which performs a fixed sequence of field operations. Efficiently computing the ladder targeting various platform and architectures has been heavily researched by the cryptographic community over the last two decades. The performances of the associated field and curve arithmetic is a major target of optimization. While implementing the cryptographic software for ECDH, the instruction set of the target architecture is carefully exploited to achieve the best results. Developing the code for scalar multiplication is often done by writing the corresponding programs using assembly instructions of the target architecture. The Intel architecture is one of the most heavily used architectures all over the world and many ECDH software have been developed on the top of it. The architecture provides a rich set of sequential and vector instructions which have been exploited to the best possible extent in the software implementations of this thesis. The thesis has multiple research contributions which are summarized below.
First, we provide a variety of efficient 64-bit assembly implementations of field multiplication, squaring and FLT-based inversion for several cryptographically relevant primes covering a wide range of security levels which are suitable for the Haswell-onward Intel architectures. The implementations are done on the basis of various algorithms which have been formalized and rigorously proven for their correctness.
Second, we provide efficient assembly implementations of 4-way field multiplication and squaring using the 4-way vector instructions available in the Haswell-onward Intel architectures. This again has been done for several primes covering various security levels.
Third, we provide new 4-way vectorized algorithms to efficiently compute the Montgomery ladder.
Next, we provide state-of-art implementations of ECDH for the standard Montgomery curves Curve25519 and Curve448.
Finally, we propose new curves over prime order fields targeting various security levels which are fairly competitive to the existing standards in terms of the shared-secret computation phase of ECDH. On the basis of the proposed curves and the existing standards, we provide security and efficiency trade-offs for different standard security levels.