armcap.c - OpenGrok history log for /openssl/crypto/armcap.c

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
# 01f4b44e	01-Sep-2024	Brad Smith	Add support for elf_aux_info() on OpenBSD CLA: trivial Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com> Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Tomas Add support for elf_aux_info() on OpenBSD CLA: trivial Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com> Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/25346) show more ...
# b6461792	20-Mar-2024	Richard Levitte	Copyright year updates Reviewed-by: Neil Horman <nhorman@openssl.org> Release: yes (cherry picked from commit 0ce7d1f355c1240653e320a3f6f8109c1f05f8c0) Reviewed-by: Hugo Lan Copyright year updates Reviewed-by: Neil Horman <nhorman@openssl.org> Release: yes (cherry picked from commit 0ce7d1f355c1240653e320a3f6f8109c1f05f8c0) Reviewed-by: Hugo Landau <hlandau@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/24034) show more ...
# e7f1afe4	21-Mar-2024	Jiangning Liu	Enable SHA3 unrolling and EOR3 optimization for Ampere Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/op Enable SHA3 unrolling and EOR3 optimization for Ampere Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/23929) show more ...
# 11adf9a7	21-Feb-2024	Tom Cosgrove	Apply the AES-GCM unroll8 optimisation to Microsoft Azure Cobalt 100 Performance improvements range from 18% to 32%. Change-Id: Ifb89eeac3c0625a582a25ff07cf7f9c9ec8f5ba6 Re Apply the AES-GCM unroll8 optimisation to Microsoft Azure Cobalt 100 Performance improvements range from 18% to 32%. Change-Id: Ifb89eeac3c0625a582a25ff07cf7f9c9ec8f5ba6 Reviewed-by: Hugo Landau <hlandau@openssl.org> Reviewed-by: Neil Horman <nhorman@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/23651) show more ...
# cc82b09c	17-Oct-2023	fisher.yu	Optimize AES-CTR for ARM Neoverse V1 and V2. Unroll AES-CTR loops to a maximum 12 blocks for ARM Neoverse V1 and V2, to fully utilize their AES pipeline resources. I Optimize AES-CTR for ARM Neoverse V1 and V2. Unroll AES-CTR loops to a maximum 12 blocks for ARM Neoverse V1 and V2, to fully utilize their AES pipeline resources. Improvement on ARM Neoverse V1. Package Size(Bytes) 16 32 64 128 256 1024 Improvement(%) 3.93 -0.45 11.30 4.31 12.48 37.66 Package Size(Bytes) 1500 8192 16384 61440 65536 Improvement(%) 37.16 38.90 39.89 40.55 40.41 Change-Id: Ifb8fad9af22476259b9ba75132bc3d8010a7fdbd Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/22733) show more ...
# 7602bf87	08-Nov-2023	Tom Cosgrove	Enable AES and SHA3 optimisations on Apple Silicon M3-based macOS systems AES gets a performance enhancement of 19-36%, similar to the M1 and M2. SHA3 gets an improvement of 4-7% on Enable AES and SHA3 optimisations on Apple Silicon M3-based macOS systems AES gets a performance enhancement of 19-36%, similar to the M1 and M2. SHA3 gets an improvement of 4-7% on buffers 256 bytes or larger. Tested on an M3 Pro, but the CPU cores are the same on M3 and M3 Max. Change-Id: I2bf40bbde824823bb8cf2efd1bd945da9f23a703 Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/22685) show more ...
# ba9472c1	15-Jul-2023	sdlyyxy	Update with `ARMV8_HAVE_SHA3_AND_WORTH_USING` Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/open Update with `ARMV8_HAVE_SHA3_AND_WORTH_USING` Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/21398) show more ...
# 08e6eb21	14-Jul-2023	sdlyyxy	Move CPU detection to armcap.c Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/21398)
# 7b508cd1	27-Mar-2023	Tom Cosgrove	Ensure there's only one copy of OPENSSL_armcap_P in libcrypto.a Change-Id: Ia94e528a2d55934435de6a2949784c52eb38d82f Reviewed-by: Matt Caswell <matt@openssl.org> Reviewed-by: To Ensure there's only one copy of OPENSSL_armcap_P in libcrypto.a Change-Id: Ia94e528a2d55934435de6a2949784c52eb38d82f Reviewed-by: Matt Caswell <matt@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20621) show more ...
# 93370db1	21-Mar-2023	Tomas Mraz	Avoid duplication of OPENSSL_armcap_P on 32bit ARM Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl Avoid duplication of OPENSSL_armcap_P on 32bit ARM Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20558) show more ...
# 52a38144	12-Feb-2023	Tom Cosgrove	Tidy up aarch64 feature detection code in armcap.c Make the SIGILL-based code easier to read, and don't use it on Apple Silicon. Also fix "error: 'HWCAP(2)_' macro redefined" warni Tidy up aarch64 feature detection code in armcap.c Make the SIGILL-based code easier to read, and don't use it on Apple Silicon. Also fix "error: 'HWCAP(2)_' macro redefined" warnings on FreeBSD. Fixes #20188 Change-Id: I5618bbe9444cc40cb5705c6ccbdc331c16bab794 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20305) show more ...
# 513e103f	29-Jan-2023	Xiaokang Qian	Apply aes-gcm unroll8+eor3 optimization patch to Neoverse V2 Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/op Apply aes-gcm unroll8+eor3 optimization patch to Neoverse V2 Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20184) show more ...
# d79bb531	25-Jan-2023	Tom Cosgrove	Enable AES optimisation on Apple Silicon M2-based systems Gives a performance enhancement of 16-38%, similar to the M1. Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Enable AES optimisation on Apple Silicon M2-based systems Gives a performance enhancement of 16-38%, similar to the M1. Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Hugo Landau <hlandau@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20141) show more ...
# f97ddfc3	03-Dec-2022	Tom Cosgrove	Fix the code used to detect aarch64 capabilities when we don't have getauxval() In addition to a missing prototype there was also a missing closing brace '}'. Fixes #19825. Fix the code used to detect aarch64 capabilities when we don't have getauxval() In addition to a missing prototype there was also a missing closing brace '}'. Fixes #19825. Reviewed-by: Matt Caswell <matt@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/19833) show more ...
# b863e1e4	27-Oct-2022	Everton Constantino	Add two new build targets to enable the possibility of using clang-cl as an assembler for Windows on Arm builds and also clang-cl as the compiler as well. Make appropriate changes to armcap s Add two new build targets to enable the possibility of using clang-cl as an assembler for Windows on Arm builds and also clang-cl as the compiler as well. Make appropriate changes to armcap source and peralsm scripts. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Hugo Landau <hlandau@openssl.org> (Merged from https://github.com/openssl/openssl/pull/19523) show more ...
# f2ec24c9	23-Jul-2022	Cameron Gutman	armcap: skip probing _armv7_tick() Detection of this feature is unreliable so only use it if requested. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Dmitry Belyavskiy armcap: skip probing _armv7_tick() Detection of this feature is unreliable so only use it if requested. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com> Reviewed-by: Hugo Landau <hlandau@openssl.org> (Merged from https://github.com/openssl/openssl/pull/18852) show more ...
# 9224a407	18-May-2022	XiaokangQian	Apply the AES-GCM unroll8 optimization patch to Neoverse N2 The loop unrolling and use of EOR3 can improve N2 performance by up to 32% Signed-off-by: XiaokangQian <xiaokang.qian Apply the AES-GCM unroll8 optimization patch to Neoverse N2 The loop unrolling and use of EOR3 can improve N2 performance by up to 32% Signed-off-by: XiaokangQian <xiaokang.qian@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/18350) show more ...
# fecb3aae	03-May-2022	Matt Caswell	Update copyright year Reviewed-by: Tomas Mraz <tomas@openssl.org> Release: yes
# b1b2146d	07-Feb-2022	Daniel Hu	Acceleration of chacha20 on aarch64 by SVE This patch accelerates chacha20 on aarch64 when Scalable Vector Extension (SVE) is supported by CPU. Tested on modern micro-architecture with Acceleration of chacha20 on aarch64 by SVE This patch accelerates chacha20 on aarch64 when Scalable Vector Extension (SVE) is supported by CPU. Tested on modern micro-architecture with 256-bit SVE, it has the potential to improve performance up to 20% The solution takes a hybrid approach. SVE will handle multi-blocks that fit the SVE vector length, with Neon/Scalar to process any tail data Test result: With SVE type 1024 bytes 8192 bytes 16384 bytes ChaCha20 1596208.13k 1650010.79k 1653151.06k Without SVE (by Neon/Scalar) type 1024 bytes 8192 bytes 16384 bytes chacha20 1355487.91k 1372678.83k 1372662.44k The assembly code has been reviewed internally by ARM engineer Fangming.Fang@arm.com Signed-off-by: Daniel Hu <Daniel.Hu@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/17916) show more ...
# 954f45ba	09-Jun-2021	XiaokangQian	Optimize AES-GCM for uarchs with unroll and new instructions Increase the block numbers to 8 for every iteration. Increase the hash table capacity. Make use of EOR3 instruction to impr Optimize AES-GCM for uarchs with unroll and new instructions Increase the block numbers to 8 for every iteration. Increase the hash table capacity. Make use of EOR3 instruction to improve the performance. This can improve performance 25-40% on out-of-order microarchitectures with a large number of fast execution units, such as Neoverse V1. We also see 20-30% performance improvements on other architectures such as the M1. Assembly code reviewd by Tom Cosgrove (ARM). Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/15916) show more ...
# 15b7175f	19-Oct-2021	Daniel Hu	SM4 optimization for ARM by HW instruction This patch implements the SM4 optimization for ARM processor, using SM4 HW instruction, which is an optional feature of crypto extension fo SM4 optimization for ARM by HW instruction This patch implements the SM4 optimization for ARM processor, using SM4 HW instruction, which is an optional feature of crypto extension for aarch64 V8. Tested on some modern ARM micro-architectures with SM4 support, the performance uplift can be observed around 8X~40X over existing C implementation in openssl. Algorithms that can be parallelized (like CTR, ECB, CBC decryption) are on higher end, with algorithm like CBC encryption on lower end (due to inter-block dependency) Perf data on Yitian-710 2.75GHz hardware, before and after optimization: Before: type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes SM4-CTR 105787.80k 107837.87k 108380.84k 108462.08k 108549.46k 108554.92k SM4-ECB 111924.58k 118173.76k 119776.00k 120093.70k 120264.02k 120274.94k SM4-CBC 106428.09k 109190.98k 109674.33k 109774.51k 109827.41k 109827.41k After (7.4x - 36.6x faster): type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes SM4-CTR 781979.02k 2432994.28k 3437753.86k 3834177.88k 3963715.58k 3974556.33k SM4-ECB 937590.69k 2941689.02k 3945751.81k 4328655.87k 4459181.40k 4468692.31k SM4-CBC 890639.88k 1027746.58k 1050621.78k 1056696.66k 1058613.93k 1058701.31k Signed-off-by: Daniel Hu <Daniel.Hu@arm.com> Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/17455) show more ...
# 71396cd0	24-Dec-2021	fangming.fang	SM3 acceleration with SM3 hardware instruction on aarch64 SM3 hardware instruction is optional feature of crypto extension for aarch64. This implementation accelerates SM3 via SM3 instru SM3 acceleration with SM3 hardware instruction on aarch64 SM3 hardware instruction is optional feature of crypto extension for aarch64. This implementation accelerates SM3 via SM3 instructions. For the platform not supporting SM3 instruction, the original C implementation still works. Thanks to AliBaba for testing and reporting the following perf numbers for Yitian710: Benchmark on T-Head Yitian-710 2.75GHz: Before: type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes sm3 49297.82k 121062.63k 223106.05k 283371.52k 307574.10k 309400.92k After (33% - 74% faster): type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes sm3 65640.01k 179121.79k 359854.59k 481448.96k 534055.59k 538274.47k Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/17454) show more ...
# abc4345a	28-Dec-2021	fangming.fang	fix building failure when using -Wconditional-uninitialized Use clang -Wconditional-uninitialized to build, the error "initialize the variable 'buffer_size' to silence this warning" will fix building failure when using -Wconditional-uninitialized Use clang -Wconditional-uninitialized to build, the error "initialize the variable 'buffer_size' to silence this warning" will be reported. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/17375) show more ...
Revision tags: openssl-3.0.0-alpha17
# efa1f224	19-May-2021	Orr Toledano	Add Arm Assembly (aarch64) support for RNG Include aarch64 asm instructions for random number generation using the RNDR and RNDRRS instructions. Provide detection functions for RNDR and Add Arm Assembly (aarch64) support for RNG Include aarch64 asm instructions for random number generation using the RNDR and RNDRRS instructions. Provide detection functions for RNDR and RNDRRS getauxval. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/15361) show more ...
# c1dabe26	19-Nov-2021	Allan Jude	Fix detection of ARMv7 and ARM64 CPU features on FreeBSD OpenSSL assumes AT_HWCAP = 16 (as on Linux), but on FreeBSD AT_HWCAP = 25 Switch to using AT_HWCAP, and setting it to 16 if it is Fix detection of ARMv7 and ARM64 CPU features on FreeBSD OpenSSL assumes AT_HWCAP = 16 (as on Linux), but on FreeBSD AT_HWCAP = 25 Switch to using AT_HWCAP, and setting it to 16 if it is not defined. OpenSSL calls elf_auxv_info() with AT_CANARY which returns ENOENT resulting in all ARM acceleration features being disabled. CLA: trivial Reviewed-by: Ben Kaduk <kaduk@mit.edu> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/17082) show more ...
12 3