History log of /openssl/crypto/chacha/asm/chacha-loongarch64.pl (Results 1 – 5 of 5)
Revision Date Author Comments
# b6461792 20-Mar-2024 Richard Levitte

Copyright year updates

Reviewed-by: Neil Horman <nhorman@openssl.org>
Release: yes
(cherry picked from commit 0ce7d1f355c1240653e320a3f6f8109c1f05f8c0)

Reviewed-by: Hugo Lan

Copyright year updates

Reviewed-by: Neil Horman <nhorman@openssl.org>
Release: yes
(cherry picked from commit 0ce7d1f355c1240653e320a3f6f8109c1f05f8c0)

Reviewed-by: Hugo Landau <hlandau@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/24034)

show more ...


# e5313f20 13-Feb-2024 Shakti Shah

OpenSSL License is applied for some source files, change to Apache 2

The following files

include/openssl/hpke.h
crypto/hpke/hpke.c
crypto/ec/asm/ecp_sm2p256-armv8.pl
cry

OpenSSL License is applied for some source files, change to Apache 2

The following files

include/openssl/hpke.h
crypto/hpke/hpke.c
crypto/ec/asm/ecp_sm2p256-armv8.pl
crypto/chacha/asm/chacha-loongarch64.pl
still seem to be released under the OpenSSL License instead of the Apache 2 license.

Fixes #23570

Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/23576)

show more ...


# 97102853 14-Jan-2024 Lin Runze

Fix performance regression of ChaCha20 on LoongArch64

The regression was introduced in PR #22817.

In that pull request, the input length check was moved forward,
but the related

Fix performance regression of ChaCha20 on LoongArch64

The regression was introduced in PR #22817.

In that pull request, the input length check was moved forward,
but the related ori instruction was missing, and it will cause
input of any length down to the much slower scalar implementation.

Fixes #23300

CLA: trivial

Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/23301)

show more ...


# b46de72c 25-Nov-2023 Xi Ruoyao

LoongArch64 assembly pack: Fix ChaCha20 ABI breakage

The [LP64D ABI][1] requires the floating-point registers f24-f31
(aka fs0-fs7) callee-saved. The low 64 bits of a LSX/LASX vector

LoongArch64 assembly pack: Fix ChaCha20 ABI breakage

The [LP64D ABI][1] requires the floating-point registers f24-f31
(aka fs0-fs7) callee-saved. The low 64 bits of a LSX/LASX vector
register aliases with the corresponding FPR, so we must save and restore
the callee-saved FPR when we writes into the corresponding vector
register.

This ABI breakage can be easily demonstrated by injecting the use of a
saved FPR into the test in bio_enc_test.c:

static int test_bio_enc_chacha20(int idx)
{
register double fs7 asm("f31") = 114.514;
asm("#optimize barrier":"+f"(fs7));
return do_test_bio_cipher(EVP_chacha20(), idx) && fs7 == 114.514;
}

So fix it. To make the logic simpler, jump into the scalar
implementation earlier when LSX and LASX are not enumerated in AT_HWCAP,
or the input is too short.

[1]: https://github.com/loongson/la-abi-specs/blob/v2.20/lapcs.adoc#floating-point-registers

Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/22817)

show more ...


# 9a41a3c6 07-Sep-2023 Min Zhou

LoongArch64 assembly pack: add ChaCha20 modules

This assembly implementation for ChaCha20 includes three code paths:
scalar path, 128-bit LSX path and 256-bit LASX path. We prefer the

LoongArch64 assembly pack: add ChaCha20 modules

This assembly implementation for ChaCha20 includes three code paths:
scalar path, 128-bit LSX path and 256-bit LASX path. We prefer the
LASX path or LSX path if the hardware and system support these
extensions.

There are 32 vector registers avaialable in the LSX and LASX
extensions. So, we can load the 16 initial states and the 16
intermediate states of ChaCha into the 32 vector registers for
calculating in the implementation. The test results on the 3A5000
and 3A6000 show that this assembly implementation significantly
improves the performance of ChaCha20 on LoongArch based machines.
The detailed test results are as following.

Test with:
$ openssl speed -evp chacha20

3A5000
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
C code 178484.53k 282789.93k 311793.70k 322234.99k 324405.93k 324659.88k
assembly code 223152.28k 407863.65k 989520.55k 2049192.96k 2127248.70k 2131749.55k
+25% +44% +217% +536% +556% +557%

3A6000
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
C code 214945.33k 310041.75k 340724.22k 349949.27k 352925.01k 353140.74k
assembly code 299151.34k 492766.34k 2070166.02k 4300909.91k 4473978.88k 4499084.63k
+39% +59% +508% +1129% +1168% +1174%

Signed-off-by: Min Zhou <zhoumin@loongson.cn>

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21998)

show more ...