#
01f4b44e |
| 01-Sep-2024 |
Brad Smith |
Add support for elf_aux_info() on OpenBSD CLA: trivial Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com> Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/25346)
|
#
b6461792 |
| 20-Mar-2024 |
Richard Levitte |
Copyright year updates Reviewed-by: Neil Horman <nhorman@openssl.org> Release: yes (cherry picked from commit 0ce7d1f355c1240653e320a3f6f8109c1f05f8c0) Reviewed-by: Hugo Landau <hlandau@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/24034)
|
#
e7f1afe4 |
| 21-Mar-2024 |
Jiangning Liu |
Enable SHA3 unrolling and EOR3 optimization for Ampere Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/23929)
|
#
11adf9a7 |
| 21-Feb-2024 |
Tom Cosgrove |
Apply the AES-GCM unroll8 optimisation to Microsoft Azure Cobalt 100 Performance improvements range from 18% to 32%. Change-Id: Ifb89eeac3c0625a582a25ff07cf7f9c9ec8f5ba6 Reviewed-by: Hugo Landau <hlandau@openssl.org> Reviewed-by: Neil Horman <nhorman@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/23651)
|
#
cc82b09c |
| 17-Oct-2023 |
fisher.yu |
Optimize AES-CTR for ARM Neoverse V1 and V2.

Unroll AES-CTR loops to a maximum 12 blocks for ARM Neoverse V1 and V2, to fully utilize their AES pipeline resources.

Improvement on ARM Neoverse V1.

Package Size(Bytes)   16      32      64      128     256     1024
Improvement(%)        3.93    -0.45   11.30   4.31    12.48   37.66

Package Size(Bytes)   1500    8192    16384   61440   65536
Improvement(%)        37.16   38.90   39.89   40.55   40.41

Change-Id: Ifb8fad9af22476259b9ba75132bc3d8010a7fdbd
Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/22733)
|
#
7602bf87 |
| 08-Nov-2023 |
Tom Cosgrove |
Enable AES and SHA3 optimisations on Apple Silicon M3-based macOS systems AES gets a performance enhancement of 19-36%, similar to the M1 and M2. SHA3 gets an improvement of 4-7% on buffers 256 bytes or larger. Tested on an M3 Pro, but the CPU cores are the same on M3 and M3 Max. Change-Id: I2bf40bbde824823bb8cf2efd1bd945da9f23a703 Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/22685)
|
#
ba9472c1 |
| 15-Jul-2023 |
sdlyyxy |
Update with `ARMV8_HAVE_SHA3_AND_WORTH_USING` Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/21398)
|
#
08e6eb21 |
| 14-Jul-2023 |
sdlyyxy |
Move CPU detection to armcap.c Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/21398)
|
#
7b508cd1 |
| 27-Mar-2023 |
Tom Cosgrove |
Ensure there's only one copy of OPENSSL_armcap_P in libcrypto.a Change-Id: Ia94e528a2d55934435de6a2949784c52eb38d82f Reviewed-by: Matt Caswell <matt@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20621)
|
#
93370db1 |
| 21-Mar-2023 |
Tomas Mraz |
Avoid duplication of OPENSSL_armcap_P on 32bit ARM Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20558)
|
#
52a38144 |
| 12-Feb-2023 |
Tom Cosgrove |
Tidy up aarch64 feature detection code in armcap.c Make the SIGILL-based code easier to read, and don't use it on Apple Silicon. Also fix "error: 'HWCAP(2)_*' macro redefined" warnings on FreeBSD. Fixes #20188 Change-Id: I5618bbe9444cc40cb5705c6ccbdc331c16bab794 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20305)
|
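The SIGILL-based detection mentioned in the entry above can be sketched roughly as follows. This is a minimal illustration of the general technique (install a temporary SIGILL handler, execute a candidate instruction, and see whether the handler fires), not the code in armcap.c. The names `probe_instruction`, `probe_nop` and `probe_always_faults` are hypothetical; the last one simulates an unsupported instruction with `raise(SIGILL)` so the sketch runs on any POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Jump target used to escape from the SIGILL handler. */
static sigjmp_buf ill_jmp;

static void ill_handler(int sig)
{
    (void)sig;
    siglongjmp(ill_jmp, 1);
}

/*
 * Hypothetical helper: returns 1 if probe() completed without raising
 * SIGILL, 0 if the handler was triggered or could not be installed.
 */
static int probe_instruction(void (*probe)(void))
{
    struct sigaction act, old;
    volatile int ok = 0;

    memset(&act, 0, sizeof(act));
    act.sa_handler = ill_handler;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGILL, &act, &old) != 0)
        return 0;

    /* Save the signal mask so siglongjmp() restores it afterwards. */
    if (sigsetjmp(ill_jmp, 1) == 0) {
        probe();
        ok = 1;
    }

    /* Always restore the previous SIGILL disposition. */
    sigaction(SIGILL, &old, NULL);
    return ok;
}

/* Stand-in for an instruction the CPU supports. */
static void probe_nop(void)
{
}

/* Stand-in for an unsupported instruction: simulate the trap. */
static void probe_always_faults(void)
{
    raise(SIGILL);
}
```

In a real probe the callback would execute a single instruction from the extension being tested. Probing this way is a fallback for platforms that cannot report capabilities through the auxiliary vector; as the entry notes, it is not used on Apple Silicon.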
#
513e103f |
| 29-Jan-2023 |
Xiaokang Qian |
Apply aes-gcm unroll8+eor3 optimization patch to Neoverse V2 Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20184)
|
#
d79bb531 |
| 25-Jan-2023 |
Tom Cosgrove |
Enable AES optimisation on Apple Silicon M2-based systems Gives a performance enhancement of 16-38%, similar to the M1. Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Hugo Landau <hlandau@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20141)
|
#
f97ddfc3 |
| 03-Dec-2022 |
Tom Cosgrove |
Fix the code used to detect aarch64 capabilities when we don't have getauxval() In addition to a missing prototype there was also a missing closing brace '}'. Fixes #19825. Reviewed-by: Matt Caswell <matt@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/19833)
|
#
b863e1e4 |
| 27-Oct-2022 |
Everton Constantino |
Add two new build targets to enable the possibility of using clang-cl as an assembler for Windows on Arm builds and also clang-cl as the compiler. Make appropriate changes to armcap source and perlasm scripts. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Hugo Landau <hlandau@openssl.org> (Merged from https://github.com/openssl/openssl/pull/19523)
|
#
f2ec24c9 |
| 23-Jul-2022 |
Cameron Gutman |
armcap: skip probing _armv7_tick() Detection of this feature is unreliable so only use it if requested. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com> Reviewed-by: Hugo Landau <hlandau@openssl.org> (Merged from https://github.com/openssl/openssl/pull/18852)
|
#
9224a407 |
| 18-May-2022 |
XiaokangQian |
Apply the AES-GCM unroll8 optimization patch to Neoverse N2 The loop unrolling and use of EOR3 can improve N2 performance by up to 32% Signed-off-by: XiaokangQian <xiaokang.qian@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/18350)
|
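Several entries in this history enable an optimisation only on specific cores (Neoverse N2, V1, V2 and others). One common way to make that decision on aarch64 is to decode the implementer and part-number fields of the MIDR_EL1 register value. The sketch below shows only the field decoding; it takes the register value as a parameter, and how armcap.c obtains and caches that value is not shown here. The `EXAMPLE_*` names and the helper functions are hypothetical; the field layout (implementer in bits 31:24, part number in bits 15:4) and the values 0x41 and 0xD49 are the architectural ones for Arm Ltd. and Neoverse N2.

```c
#include <stdint.h>

/* Illustrative constants; the names are assumptions made for this sketch. */
#define EXAMPLE_IMPLEMENTER_ARM   0x41u   /* Arm Ltd. */
#define EXAMPLE_PART_NEOVERSE_N2  0xD49u

/* Implementer code lives in MIDR_EL1 bits [31:24]. */
static unsigned int midr_implementer(uint64_t midr)
{
    return (unsigned int)((midr >> 24) & 0xFFu);
}

/* Primary part number lives in MIDR_EL1 bits [15:4]. */
static unsigned int midr_part(uint64_t midr)
{
    return (unsigned int)((midr >> 4) & 0xFFFu);
}

/* Returns 1 when both implementer and part number match, else 0. */
static int midr_is_cpu(uint64_t midr, unsigned int implementer,
                       unsigned int part)
{
    return midr_implementer(midr) == implementer
           && midr_part(midr) == part;
}
```

A caller would typically combine such a check with a capability bit, so that an unrolled code path is selected only when the instructions exist and the core is known to benefit from them.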
#
fecb3aae |
| 03-May-2022 |
Matt Caswell |
Update copyright year Reviewed-by: Tomas Mraz <tomas@openssl.org> Release: yes
|
#
b1b2146d |
| 07-Feb-2022 |
Daniel Hu |
Acceleration of chacha20 on aarch64 by SVE

This patch accelerates chacha20 on aarch64 when Scalable Vector Extension (SVE) is supported by CPU. Tested on modern micro-architecture with 256-bit SVE, it has the potential to improve performance up to 20%

The solution takes a hybrid approach. SVE will handle multi-blocks that fit the SVE vector length, with Neon/Scalar to process any tail data

Test result:

With SVE
type       1024 bytes    8192 bytes    16384 bytes
ChaCha20   1596208.13k   1650010.79k   1653151.06k

Without SVE (by Neon/Scalar)
type       1024 bytes    8192 bytes    16384 bytes
chacha20   1355487.91k   1372678.83k   1372662.44k

The assembly code has been reviewed internally by ARM engineer Fangming.Fang@arm.com

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17916)
|
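The hybrid approach described in the entry above amounts to splitting the input into a part that fills whole vector passes and a remaining tail. The following is a rough sketch of that split only, under the assumption that one vector pass consumes a fixed number of 64-byte ChaCha20 blocks. The `blocks_per_pass` parameter, the struct and the function name are hypothetical and do not come from the patch.

```c
#include <stddef.h>

#define CHACHA_BLOCK_BYTES 64u   /* ChaCha20 produces 64-byte blocks */

/* Hypothetical result type: bytes for the vector path and for the tail. */
struct chacha_split {
    size_t vector_bytes;   /* handled by the wide (e.g. SVE) path */
    size_t tail_bytes;     /* left for the Neon/scalar path */
};

/*
 * Split len bytes so that the vector path only sees whole passes of
 * blocks_per_pass blocks; everything else becomes tail data.
 */
static struct chacha_split chacha_split_len(size_t len, size_t blocks_per_pass)
{
    struct chacha_split s = { 0, len };
    size_t chunk;

    if (blocks_per_pass == 0)
        return s;              /* no vector path available */

    chunk = blocks_per_pass * CHACHA_BLOCK_BYTES;
    s.vector_bytes = (len / chunk) * chunk;
    s.tail_bytes = len - s.vector_bytes;
    return s;
}
```

For example, with 8 blocks per pass a 1000-byte input yields 512 bytes for the vector path and 488 bytes of tail, while a 100-byte input is handled entirely by the tail path.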
#
954f45ba |
| 09-Jun-2021 |
XiaokangQian |
Optimize AES-GCM for uarchs with unroll and new instructions

Increase the block numbers to 8 for every iteration. Increase the hash table capacity. Make use of EOR3 instruction to improve the performance.

This can improve performance by 25-40% on out-of-order microarchitectures with a large number of fast execution units, such as Neoverse V1. We also see 20-30% performance improvements on other architectures such as the M1.

Assembly code reviewed by Tom Cosgrove (ARM).

Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15916)
|
#
15b7175f |
| 19-Oct-2021 |
Daniel Hu |
SM4 optimization for ARM by HW instruction

This patch implements the SM4 optimization for ARM processor, using SM4 HW instruction, which is an optional feature of crypto extension for aarch64 V8.

Tested on some modern ARM micro-architectures with SM4 support, the performance uplift can be observed around 8X~40X over existing C implementation in openssl. Algorithms that can be parallelized (like CTR, ECB, CBC decryption) are on higher end, with algorithm like CBC encryption on lower end (due to inter-block dependency)

Perf data on Yitian-710 2.75GHz hardware, before and after optimization:

Before:
type      16 bytes     64 bytes      256 bytes     1024 bytes    8192 bytes    16384 bytes
SM4-CTR   105787.80k   107837.87k    108380.84k    108462.08k    108549.46k    108554.92k
SM4-ECB   111924.58k   118173.76k    119776.00k    120093.70k    120264.02k    120274.94k
SM4-CBC   106428.09k   109190.98k    109674.33k    109774.51k    109827.41k    109827.41k

After (7.4x - 36.6x faster):
type      16 bytes     64 bytes      256 bytes     1024 bytes    8192 bytes    16384 bytes
SM4-CTR   781979.02k   2432994.28k   3437753.86k   3834177.88k   3963715.58k   3974556.33k
SM4-ECB   937590.69k   2941689.02k   3945751.81k   4328655.87k   4459181.40k   4468692.31k
SM4-CBC   890639.88k   1027746.58k   1050621.78k   1056696.66k   1058613.93k   1058701.31k

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17455)
|
#
71396cd0 |
| 24-Dec-2021 |
fangming.fang |
SM3 acceleration with SM3 hardware instruction on aarch64

SM3 hardware instruction is optional feature of crypto extension for aarch64. This implementation accelerates SM3 via SM3 instructions. For the platform not supporting SM3 instruction, the original C implementation still works.

Thanks to AliBaba for testing and reporting the following perf numbers for Yitian710:

Benchmark on T-Head Yitian-710 2.75GHz:

Before:
type   16 bytes    64 bytes     256 bytes    1024 bytes   8192 bytes   16384 bytes
sm3    49297.82k   121062.63k   223106.05k   283371.52k   307574.10k   309400.92k

After (33% - 74% faster):
type   16 bytes    64 bytes     256 bytes    1024 bytes   8192 bytes   16384 bytes
sm3    65640.01k   179121.79k   359854.59k   481448.96k   534055.59k   538274.47k

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17454)
|
#
abc4345a |
| 28-Dec-2021 |
fangming.fang |
Fix build failure when using -Wconditional-uninitialized When building with clang -Wconditional-uninitialized, the error "initialize the variable 'buffer_size' to silence this warning" is reported. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/17375)
|
Revision tags: openssl-3.0.0-alpha17 |
|
#
efa1f224 |
| 19-May-2021 |
Orr Toledano |
Add Arm Assembly (aarch64) support for RNG Include aarch64 asm instructions for random number generation using the RNDR and RNDRRS instructions. Provide detection functions for RNDR and RNDRRS getauxval. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/15361)
|
#
c1dabe26 |
| 19-Nov-2021 |
Allan Jude |
Fix detection of ARMv7 and ARM64 CPU features on FreeBSD

OpenSSL assumes AT_HWCAP = 16 (as on Linux), but on FreeBSD AT_HWCAP = 25. As a result, OpenSSL calls elf_aux_info() with AT_CANARY, which returns ENOENT, so all ARM acceleration features are disabled.

Switch to using AT_HWCAP, and set it to 16 only if it is not defined.

CLA: trivial
Reviewed-by: Ben Kaduk <kaduk@mit.edu>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17082)
|
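The AT_HWCAP fix in the entry above can be illustrated with a small sketch: use the platform's own `AT_HWCAP` constant, fall back to 16 only when no header defines it, and read the value through `getauxval()` on Linux or `elf_aux_info()` on FreeBSD and OpenBSD. This is an illustration of the idea rather than the armcap.c code. The helper names and `EXAMPLE_HWCAP_AES` are hypothetical; bit 3 mirrors the Linux aarch64 AES capability bit and is included only to show how a bit test would look.

```c
#include <stddef.h>

#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__)
# include <sys/auxv.h>
#endif

/* Prefer the system's definition; 16 is the Linux value, FreeBSD uses 25. */
#ifndef AT_HWCAP
# define AT_HWCAP 16
#endif

/* Illustrative capability bit (assumption: Linux aarch64 AES bit). */
#define EXAMPLE_HWCAP_AES (1UL << 3)

/* Returns the hardware capability word, or 0 if it cannot be read. */
static unsigned long read_hwcap(void)
{
#if defined(__linux__)
    return getauxval(AT_HWCAP);
#elif defined(__FreeBSD__) || defined(__OpenBSD__)
    unsigned long val = 0;

    /* elf_aux_info() returns 0 on success and an error number otherwise. */
    if (elf_aux_info(AT_HWCAP, &val, sizeof(val)) != 0)
        return 0;
    return val;
#else
    return 0;   /* no auxiliary-vector interface known on this platform */
#endif
}

/* Returns 1 if every bit in mask is set in hwcap, else 0. */
static int hwcap_has(unsigned long hwcap, unsigned long mask)
{
    return (hwcap & mask) == mask;
}
```

A caller would write something like `hwcap_has(read_hwcap(), EXAMPLE_HWCAP_AES)`. Treating a failed lookup as "no capabilities" keeps the generic code paths in use instead of risking an unsupported instruction.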