crypto/bn/asm/rsaz-avx2.pl +4 −0

@@ -61,8 +61,12 @@
 #
 # rsa2048 sign/sec      OpenSSL 1.0.1   scalar(*)       this
 # 2.3GHz Haswell        621             765/+23%        1113/+79%
+# 2.3GHz Broadwell(**)  688             1200(***)/+74%  1120/+63%
 #
 # (*)   if system doesn't support AVX2, for reference purposes;
+# (**)  scaled to 2.3GHz to simplify comparison;
+# (***) scalar AD*X code is faster than AVX2 and is preferred code
+#       path for Broadwell;

 $flavour = shift;
 $output = shift;
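The percentage gains in the rsaz-avx2.pl table can be re-derived from the raw sign/sec figures. The following is a minimal Python sketch, not part of the commit. It assumes each gain is measured against the OpenSSL 1.0.1 column of the same row and rounded to a whole percent, which is a reading of the table rather than something the comment states. The helper name `gain_percent` is hypothetical.

```python
def gain_percent(baseline: float, measured: float) -> int:
    """Relative gain of `measured` over `baseline`, rounded to a whole percent."""
    return round((measured / baseline - 1.0) * 100.0)


# rsa2048 sign/sec figures copied from the rsaz-avx2.pl comment table.
# Column order: OpenSSL 1.0.1 (baseline), scalar, this (AVX2).
SIGN_PER_SEC = {
    "Haswell": (621, 765, 1113),
    "Broadwell": (688, 1200, 1120),
}

# Map each CPU to its (scalar gain, AVX2 gain) in percent.
# Assumption: gains are relative to the OpenSSL 1.0.1 column.
gains = {
    cpu: (gain_percent(base, scalar), gain_percent(base, avx2))
    for cpu, (base, scalar, avx2) in SIGN_PER_SEC.items()
}

for cpu, (scalar_gain, avx2_gain) in gains.items():
    print(f"{cpu}: scalar +{scalar_gain}%, AVX2 +{avx2_gain}%")
```

Under that assumption the computed gains match the table entries. On the Broadwell row the scalar figure (1200) exceeds the AVX2 figure (1120), which is consistent with footnote (***).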
crypto/modes/asm/aesni-gcm-x86_64.pl +4 −1

@@ -22,7 +22,10 @@
 # [1] and [2], with MOVBE twist suggested by Ilya Albrekht and Max
 # Locktyukhin of Intel Corp. who verified that it reduces shuffles
 # pressure with notable relative improvement, achieving 1.0 cycle per
-# byte processed with 128-bit key on Haswell processor.
+# byte processed with 128-bit key on Haswell processor, and 0.74 -
+# on Broadwell. [Mentioned results are raw profiled measurements for
+# favourable packet size, one divisible by 96. Applications using the
+# EVP interface will observe a few percent worse performance.]
 #
 # [1] http://rt.openssl.org/Ticket/Display.html?id=2900&user=guest&pass=guest
 # [2] http://www.intel.com/content/dam/www/public/us/en/documents/software-support/enabling-high-performance-gcm.pdf
crypto/modes/asm/ghash-x86_64.pl +3 −1

@@ -63,6 +63,7 @@
 # Sandy Bridge  1.80(+8%)
 # Ivy Bridge    1.80(+7%)
 # Haswell       0.55(+93%) (if system doesn't support AVX)
+# Broadwell     0.45(+110%)(if system doesn't support AVX)
 # Bulldozer     1.49(+27%)
 # Silvermont    2.88(+13%)

@@ -73,7 +74,8 @@
 # CPUs such as Sandy and Ivy Bridge can execute it, the code performs
 # sub-optimally in comparison to above mentioned version. But thanks
 # to Ilya Albrekht and Max Locktyukhin of Intel Corp. we knew that
-# it performs in 0.41 cycles per byte on Haswell processor.
+# it performs in 0.41 cycles per byte on Haswell processor, and in
+# 0.29 on Broadwell.
 #
 # [1] http://rt.openssl.org/Ticket/Display.html?id=2900&user=guest&pass=guest
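The cycles-per-byte figures quoted in the aesni-gcm-x86_64.pl and ghash-x86_64.pl comments are easier to compare as throughput and as Haswell-to-Broadwell speedups. The sketch below is illustrative only. The 2.3 GHz clock is an assumption borrowed from the rsaz-avx2.pl table, since the GCM and GHASH comments do not state the clock they were measured at. The function names and the `CYCLES_PER_BYTE` dictionary are hypothetical.

```python
# Assumption: the GCM/GHASH comments give no clock speed; 2.3 GHz is
# borrowed from the rsaz-avx2.pl table purely for illustration.
ASSUMED_CLOCK_GHZ = 2.3


def throughput_gb_per_s(cycles_per_byte: float,
                        clock_ghz: float = ASSUMED_CLOCK_GHZ) -> float:
    """Convert cycles per byte to GB/s (1 GB = 10**9 bytes) at a given clock."""
    if cycles_per_byte <= 0:
        raise ValueError("cycles_per_byte must be positive")
    return clock_ghz / cycles_per_byte


def speedup(old_cpb: float, new_cpb: float) -> float:
    """How many times faster the new figure is than the old one."""
    return old_cpb / new_cpb


# Cycles per byte as quoted in the two comment blocks, keyed by file.
CYCLES_PER_BYTE = {
    "aesni-gcm-x86_64.pl": {"Haswell": 1.0, "Broadwell": 0.74},
    "ghash-x86_64.pl": {"Haswell": 0.41, "Broadwell": 0.29},
}

for module, cpb in CYCLES_PER_BYTE.items():
    ratio = speedup(cpb["Haswell"], cpb["Broadwell"])
    print(f"{module}: Broadwell is {ratio:.2f}x Haswell per clock")
    for cpu, value in cpb.items():
        print(f"  {cpu}: {throughput_gb_per_s(value):.2f} GB/s "
              f"at an assumed {ASSUMED_CLOCK_GHZ} GHz")
```

These are raw per-core estimates derived from the quoted figures. As the aesni-gcm comment notes, applications going through the EVP interface will see somewhat lower numbers.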