md5: add assembly implementation for aarch64 This change improves md5 performance significantly by using a hand-optimized assembly implementation of the inner loop of md5 calculation. Th
md5: add assembly implementation for aarch64 This change improves md5 performance significantly by using a hand-optimized assembly implementation of the inner loop of md5 calculation. The instructions are carefully ordered to separate data dependencies as much as possible. Test with: $ openssl speed md5 AWS Graviton 2 type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes md5 46990.60k 132778.65k 270376.96k 364718.08k 405962.75k 409201.32k md5-modified 51725.23k 152236.22k 323469.14k 453869.57k 514102.61k 519056.04k +10% +15% +20% +24% +27% +27% Apple M1 type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes md5 74634.39k 195561.25k 375434.45k 491004.23k 532361.40k 536636.48k md5-modified 84637.11k 229017.09k 444609.62k 588069.50k 655114.24k 660850.56k +13% +17% +18% +20% +23% +23% Reviewed-by: Matt Caswell <matt@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/16928)
show more ...
|