History log of /openssl/crypto/armcap.c (Results 1 – 25 of 53)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 01f4b44e 01-Sep-2024 Brad Smith

Add support for elf_aux_info() on OpenBSD

CLA: trivial

Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com>
Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Tomas

Add support for elf_aux_info() on OpenBSD

CLA: trivial

Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com>
Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/25346)

show more ...


# b6461792 20-Mar-2024 Richard Levitte

Copyright year updates

Reviewed-by: Neil Horman <nhorman@openssl.org>
Release: yes
(cherry picked from commit 0ce7d1f355c1240653e320a3f6f8109c1f05f8c0)

Reviewed-by: Hugo Lan

Copyright year updates

Reviewed-by: Neil Horman <nhorman@openssl.org>
Release: yes
(cherry picked from commit 0ce7d1f355c1240653e320a3f6f8109c1f05f8c0)

Reviewed-by: Hugo Landau <hlandau@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/24034)

show more ...


# e7f1afe4 21-Mar-2024 Jiangning Liu

Enable SHA3 unrolling and EOR3 optimization for Ampere

Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/op

Enable SHA3 unrolling and EOR3 optimization for Ampere

Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/23929)

show more ...


# 11adf9a7 21-Feb-2024 Tom Cosgrove

Apply the AES-GCM unroll8 optimisation to Microsoft Azure Cobalt 100

Performance improvements range from 18% to 32%.

Change-Id: Ifb89eeac3c0625a582a25ff07cf7f9c9ec8f5ba6

Re

Apply the AES-GCM unroll8 optimisation to Microsoft Azure Cobalt 100

Performance improvements range from 18% to 32%.

Change-Id: Ifb89eeac3c0625a582a25ff07cf7f9c9ec8f5ba6

Reviewed-by: Hugo Landau <hlandau@openssl.org>
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/23651)

show more ...


# cc82b09c 17-Oct-2023 fisher.yu

Optimize AES-CTR for ARM Neoverse V1 and V2.

Unroll AES-CTR loops to a maximum 12 blocks for ARM Neoverse V1 and
V2, to fully utilize their AES pipeline resources.

I

Optimize AES-CTR for ARM Neoverse V1 and V2.

Unroll AES-CTR loops to a maximum 12 blocks for ARM Neoverse V1 and
V2, to fully utilize their AES pipeline resources.

Improvement on ARM Neoverse V1.

Package Size(Bytes) 16 32 64 128 256 1024
Improvement(%) 3.93 -0.45 11.30 4.31 12.48 37.66
Package Size(Bytes) 1500 8192 16384 61440 65536
Improvement(%) 37.16 38.90 39.89 40.55 40.41

Change-Id: Ifb8fad9af22476259b9ba75132bc3d8010a7fdbd

Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/22733)

show more ...


# 7602bf87 08-Nov-2023 Tom Cosgrove

Enable AES and SHA3 optimisations on Apple Silicon M3-based macOS systems

AES gets a performance enhancement of 19-36%, similar to the M1 and M2.

SHA3 gets an improvement of 4-7% on

Enable AES and SHA3 optimisations on Apple Silicon M3-based macOS systems

AES gets a performance enhancement of 19-36%, similar to the M1 and M2.

SHA3 gets an improvement of 4-7% on buffers 256 bytes or larger.

Tested on an M3 Pro, but the CPU cores are the same on M3 and M3 Max.

Change-Id: I2bf40bbde824823bb8cf2efd1bd945da9f23a703

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/22685)

show more ...


# ba9472c1 15-Jul-2023 sdlyyxy

Update with `ARMV8_HAVE_SHA3_AND_WORTH_USING`

Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/open

Update with `ARMV8_HAVE_SHA3_AND_WORTH_USING`

Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21398)

show more ...


# 08e6eb21 14-Jul-2023 sdlyyxy

Move CPU detection to armcap.c

Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21398)


# 7b508cd1 27-Mar-2023 Tom Cosgrove

Ensure there's only one copy of OPENSSL_armcap_P in libcrypto.a

Change-Id: Ia94e528a2d55934435de6a2949784c52eb38d82f

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: To

Ensure there's only one copy of OPENSSL_armcap_P in libcrypto.a

Change-Id: Ia94e528a2d55934435de6a2949784c52eb38d82f

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20621)

show more ...


# 93370db1 21-Mar-2023 Tomas Mraz

Avoid duplication of OPENSSL_armcap_P on 32bit ARM

Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl

Avoid duplication of OPENSSL_armcap_P on 32bit ARM

Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20558)

show more ...


# 52a38144 12-Feb-2023 Tom Cosgrove

Tidy up aarch64 feature detection code in armcap.c

Make the SIGILL-based code easier to read, and don't use it on Apple Silicon.

Also fix "error: 'HWCAP(2)_*' macro redefined" warni

Tidy up aarch64 feature detection code in armcap.c

Make the SIGILL-based code easier to read, and don't use it on Apple Silicon.

Also fix "error: 'HWCAP(2)_*' macro redefined" warnings on FreeBSD.

Fixes #20188

Change-Id: I5618bbe9444cc40cb5705c6ccbdc331c16bab794

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20305)

show more ...


# 513e103f 29-Jan-2023 Xiaokang Qian

Apply aes-gcm unroll8+eor3 optimization patch to Neoverse V2

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/op

Apply aes-gcm unroll8+eor3 optimization patch to Neoverse V2

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20184)

show more ...


# d79bb531 25-Jan-2023 Tom Cosgrove

Enable AES optimisation on Apple Silicon M2-based systems

Gives a performance enhancement of 16-38%, similar to the M1.

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by:

Enable AES optimisation on Apple Silicon M2-based systems

Gives a performance enhancement of 16-38%, similar to the M1.

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Hugo Landau <hlandau@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20141)

show more ...


# f97ddfc3 03-Dec-2022 Tom Cosgrove

Fix the code used to detect aarch64 capabilities when we don't have getauxval()

In addition to a missing prototype there was also a missing closing brace '}'.

Fixes #19825.

Fix the code used to detect aarch64 capabilities when we don't have getauxval()

In addition to a missing prototype there was also a missing closing brace '}'.

Fixes #19825.

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19833)

show more ...


# b863e1e4 27-Oct-2022 Everton Constantino

Add two new build targets to enable the possibility of using clang-cl as
an assembler for Windows on Arm builds and also clang-cl as the compiler
as well. Make appropriate changes to armcap s

Add two new build targets to enable the possibility of using clang-cl as
an assembler for Windows on Arm builds and also clang-cl as the compiler
as well. Make appropriate changes to armcap source and peralsm scripts.

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Hugo Landau <hlandau@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19523)

show more ...


# f2ec24c9 23-Jul-2022 Cameron Gutman

armcap: skip probing _armv7_tick()

Detection of this feature is unreliable so only use it if requested.

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Dmitry Belyavskiy

armcap: skip probing _armv7_tick()

Detection of this feature is unreliable so only use it if requested.

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com>
Reviewed-by: Hugo Landau <hlandau@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18852)

show more ...


# 9224a407 18-May-2022 XiaokangQian

Apply the AES-GCM unroll8 optimization patch to Neoverse N2

The loop unrolling and use of EOR3 can improve N2 performance
by up to 32%

Signed-off-by: XiaokangQian <xiaokang.qian

Apply the AES-GCM unroll8 optimization patch to Neoverse N2

The loop unrolling and use of EOR3 can improve N2 performance
by up to 32%

Signed-off-by: XiaokangQian <xiaokang.qian@arm.com>

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18350)

show more ...


# fecb3aae 03-May-2022 Matt Caswell

Update copyright year

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Release: yes


# b1b2146d 07-Feb-2022 Daniel Hu

Acceleration of chacha20 on aarch64 by SVE

This patch accelerates chacha20 on aarch64 when Scalable Vector Extension
(SVE) is supported by CPU. Tested on modern micro-architecture with

Acceleration of chacha20 on aarch64 by SVE

This patch accelerates chacha20 on aarch64 when Scalable Vector Extension
(SVE) is supported by CPU. Tested on modern micro-architecture with
256-bit SVE, it has the potential to improve performance up to 20%

The solution takes a hybrid approach. SVE will handle multi-blocks that fit
the SVE vector length, with Neon/Scalar to process any tail data

Test result:
With SVE
type 1024 bytes 8192 bytes 16384 bytes
ChaCha20 1596208.13k 1650010.79k 1653151.06k

Without SVE (by Neon/Scalar)
type 1024 bytes 8192 bytes 16384 bytes
chacha20 1355487.91k 1372678.83k 1372662.44k

The assembly code has been reviewed internally by
ARM engineer Fangming.Fang@arm.com

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17916)

show more ...


# 954f45ba 09-Jun-2021 XiaokangQian

Optimize AES-GCM for uarchs with unroll and new instructions

Increase the block numbers to 8 for every iteration. Increase the hash
table capacity. Make use of EOR3 instruction to impr

Optimize AES-GCM for uarchs with unroll and new instructions

Increase the block numbers to 8 for every iteration. Increase the hash
table capacity. Make use of EOR3 instruction to improve the performance.

This can improve performance 25-40% on out-of-order microarchitectures
with a large number of fast execution units, such as Neoverse V1. We also
see 20-30% performance improvements on other architectures such as the M1.

Assembly code reviewd by Tom Cosgrove (ARM).

Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15916)

show more ...


# 15b7175f 19-Oct-2021 Daniel Hu

SM4 optimization for ARM by HW instruction

This patch implements the SM4 optimization for ARM processor,
using SM4 HW instruction, which is an optional feature of
crypto extension fo

SM4 optimization for ARM by HW instruction

This patch implements the SM4 optimization for ARM processor,
using SM4 HW instruction, which is an optional feature of
crypto extension for aarch64 V8.

Tested on some modern ARM micro-architectures with SM4 support, the
performance uplift can be observed around 8X~40X over existing
C implementation in openssl. Algorithms that can be parallelized
(like CTR, ECB, CBC decryption) are on higher end, with algorithm
like CBC encryption on lower end (due to inter-block dependency)

Perf data on Yitian-710 2.75GHz hardware, before and after optimization:

Before:
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
SM4-CTR 105787.80k 107837.87k 108380.84k 108462.08k 108549.46k 108554.92k
SM4-ECB 111924.58k 118173.76k 119776.00k 120093.70k 120264.02k 120274.94k
SM4-CBC 106428.09k 109190.98k 109674.33k 109774.51k 109827.41k 109827.41k

After (7.4x - 36.6x faster):
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
SM4-CTR 781979.02k 2432994.28k 3437753.86k 3834177.88k 3963715.58k 3974556.33k
SM4-ECB 937590.69k 2941689.02k 3945751.81k 4328655.87k 4459181.40k 4468692.31k
SM4-CBC 890639.88k 1027746.58k 1050621.78k 1056696.66k 1058613.93k 1058701.31k

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17455)

show more ...


# 71396cd0 24-Dec-2021 fangming.fang

SM3 acceleration with SM3 hardware instruction on aarch64

SM3 hardware instruction is optional feature of crypto extension for
aarch64. This implementation accelerates SM3 via SM3 instru

SM3 acceleration with SM3 hardware instruction on aarch64

SM3 hardware instruction is optional feature of crypto extension for
aarch64. This implementation accelerates SM3 via SM3 instructions. For
the platform not supporting SM3 instruction, the original C
implementation still works. Thanks to AliBaba for testing and reporting
the following perf numbers for Yitian710:

Benchmark on T-Head Yitian-710 2.75GHz:

Before:
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
sm3 49297.82k 121062.63k 223106.05k 283371.52k 307574.10k 309400.92k

After (33% - 74% faster):
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
sm3 65640.01k 179121.79k 359854.59k 481448.96k 534055.59k 538274.47k

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17454)

show more ...


# abc4345a 28-Dec-2021 fangming.fang

fix building failure when using -Wconditional-uninitialized

Use clang -Wconditional-uninitialized to build, the error "initialize
the variable 'buffer_size' to silence this warning" will

fix building failure when using -Wconditional-uninitialized

Use clang -Wconditional-uninitialized to build, the error "initialize
the variable 'buffer_size' to silence this warning" will be reported.

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17375)

show more ...


Revision tags: openssl-3.0.0-alpha17
# efa1f224 19-May-2021 Orr Toledano

Add Arm Assembly (aarch64) support for RNG

Include aarch64 asm instructions for random number generation using the
RNDR and RNDRRS instructions. Provide detection functions for RNDR and

Add Arm Assembly (aarch64) support for RNG

Include aarch64 asm instructions for random number generation using the
RNDR and RNDRRS instructions. Provide detection functions for RNDR and
RNDRRS getauxval.

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15361)

show more ...


# c1dabe26 19-Nov-2021 Allan Jude

Fix detection of ARMv7 and ARM64 CPU features on FreeBSD

OpenSSL assumes AT_HWCAP = 16 (as on Linux), but on FreeBSD AT_HWCAP = 25
Switch to using AT_HWCAP, and setting it to 16 if it is

Fix detection of ARMv7 and ARM64 CPU features on FreeBSD

OpenSSL assumes AT_HWCAP = 16 (as on Linux), but on FreeBSD AT_HWCAP = 25
Switch to using AT_HWCAP, and setting it to 16 if it is not defined.

OpenSSL calls elf_auxv_info() with AT_CANARY which returns ENOENT
resulting in all ARM acceleration features being disabled.

CLA: trivial

Reviewed-by: Ben Kaduk <kaduk@mit.edu>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17082)

show more ...


123