md5: add assembly implementation for loongarch64 This change can improve md5 performance by using a hand-optimized assembly implementation of the inner loop of md5 calculation. This
md5: add assembly implementation for loongarch64 This change can improve md5 performance by using a hand-optimized assembly implementation of the inner loop of md5 calculation. This implementation refered to md5-x86_64.pl and made more effort to reorder instructions for separating data dependencies as much as possible. Test with: $ openssl speed md5 3A5000 type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes md5 45061.04k 130440.75k 291105.28k 421101.23k 484639.27k 488320.43k md5-modified 47179.95k 139015.57k 308836.69k 445963.26k 512540.67k 518215.00k +5% +7% +6% +6% +6% +6% 3A6000 type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes md5 60070.06k 161822.76k 325817.60k 438017.02k 486864.21k 492243.31k md5-modified 62827.74k 170294.04k 343795.03k 463324.50k 515831.13k 520060.93k +5% +5% +6% +6% +6% +6% Signed-off-by: Min Zhou <zhoumin@loongson.cn> Co-authored-by: Xi Ruoyao <xry111@xry111.site> Reviewed-by: Shane Lontis <shane.lontis@oracle.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/21704)
show more ...
|