History log of /openssl/crypto/poly1305/asm/poly1305-x86_64.pl (Results 1 – 25 of 43)
Revision Date Author Comments
# 7ed6de99 05-Sep-2024 Tomas Mraz

Copyright year updates


Reviewed-by: Neil Horman <nhorman@openssl.org>
Release: yes


# 25391acc 01-Mar-2024 Theo Buehler

Unable to run asm code on OpenBSD (amd64)

In order to get asm code running on OpenBSD we must place
all constants into .rodata sections.

davidben@ also pointed out we need to adjust the `x86_64-xlate.pl` perlasm
script to handle read-only sections for various flavors (OSes). Those
changes were cherry-picked from boringssl.

closes #23312

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/23997)


# da1c088f 07-Sep-2023 Matt Caswell

Copyright year updates


Reviewed-by: Richard Levitte <levitte@openssl.org>
Release: yes


# 7b8e27bc 22-Aug-2023 Bernd Edlinger

Avoid clobbering non-volatile XMM registers

This affects some Poly1305 assembler functions
which are only used for certain CPU types.

Remove those functions for Windows targets,
as a simple interim solution.

Fixes #21522

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21808)


Revision tags: openssl-3.0.0-alpha17, openssl-3.0.0-alpha16, openssl-3.0.0-alpha15, openssl-3.0.0-alpha14, OpenSSL_1_1_1k, openssl-3.0.0-alpha13, openssl-3.0.0-alpha12, OpenSSL_1_1_1j, openssl-3.0.0-alpha11, openssl-3.0.0-alpha10, OpenSSL_1_1_1i, openssl-3.0.0-alpha9, openssl-3.0.0-alpha8, openssl-3.0.0-alpha7, OpenSSL_1_1_1h
# cd84d883 26-Aug-2020 Jung-uk Kim

Ignore vendor name in Clang version number.

For example, FreeBSD prepends "FreeBSD" to version string, e.g.,

FreeBSD clang version 11.0.0 (git@github.com:llvm/llvm-project.git llvmorg-11.0.0-rc2-0-g414f32a9e86)
Target: x86_64-unknown-freebsd13.0
Thread model: posix
InstalledDir: /usr/bin

This prevented us from properly detecting AVX support, etc.

CLA: trivial

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Ben Kaduk <kaduk@mit.edu>
(Merged from https://github.com/openssl/openssl/pull/12725)
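
The vendor-prefix problem described in this commit can be illustrated with a short Python sketch. This is not the Perl used in the perlasm scripts; the pattern and the `clang_version` helper are hypothetical and only show the idea of matching "clang version" anywhere in the compiler banner rather than at its start.

```python
import re

# Hypothetical pattern: look for "clang version X.Y" anywhere in the
# banner, so a vendor prefix such as "FreeBSD " does not hide it.
CLANG_VERSION = re.compile(r"clang version (\d+)\.(\d+)")


def clang_version(cc_output):
    """Return (major, minor) parsed from compiler banner text, or None."""
    match = CLANG_VERSION.search(cc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


freebsd = ("FreeBSD clang version 11.0.0 "
           "(git@github.com:llvm/llvm-project.git llvmorg-11.0.0-rc2-0-g414f32a9e86)")
plain = "clang version 9.0.1"

print(clang_version(freebsd))  # (11, 0)
print(clang_version(plain))    # (9, 0)
```

Returning the version as a tuple of integers means that later comparisons order 11.0 after 9.0, which a plain string comparison would not.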


Revision tags: openssl-3.0.0-alpha6, openssl-3.0.0-alpha5, openssl-3.0.0-alpha4, openssl-3.0.0-alpha3, openssl-3.0.0-alpha2, openssl-3.0.0-alpha1
# 33388b44 23-Apr-2020 Matt Caswell

Update copyright year

Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/11616)


Revision tags: OpenSSL_1_1_1g, OpenSSL_1_1_1f, OpenSSL_1_1_1e
# a21314db 17-Feb-2020 David Benjamin

Also check for errors in x86_64-xlate.pl.

In https://github.com/openssl/openssl/pull/10883, I'd meant to exclude
the perlasm drivers since they aren't opening pipes and do not
particularly need it, but I only noticed x86_64-xlate.pl, so
arm-xlate.pl and ppc-xlate.pl got the change.

That seems to have been fine, so be consistent and also apply the change
to x86_64-xlate.pl. Checking for errors is generally a good idea.

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: David Benjamin <davidben@google.com>
(Merged from https://github.com/openssl/openssl/pull/10930)


# 98ad3fe8 31-Jan-2020 H.J. Lu

x86_64: Add endbranch at function entries for Intel CET

To support Intel CET, all indirect branch targets must start with
endbranch. Here is a patch to add endbranch to function entries
in x86_64 assembly codes which are indirect branch targets as
discovered by running openssl testsuite on Intel CET machine and
visual inspection.

Verified with

$ CC="gcc -Wl,-z,cet-report=error" ./Configure shared linux-x86_64 -fcf-protection
$ make
$ make test

and

$ CC="gcc -mx32 -Wl,-z,cet-report=error" ./Configure shared linux-x32 -fcf-protection
$ make
$ make test # <<< passed with https://github.com/openssl/openssl/pull/10988

Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/10982)


# 32be631c 17-Jan-2020 David Benjamin

Do not silently truncate files on perlasm errors

If one of the perlasm xlate drivers crashes, OpenSSL's build will
currently swallow the error and silently truncate the output to however
far the driver got. This will hopefully fail to build, but better to
check such things.

Handle this by checking for errors when closing STDOUT (which is a pipe
to the xlate driver).

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
(Merged from https://github.com/openssl/openssl/pull/10883)
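
The idea in this commit, checking the exit status of the process at the other end of an output pipe, can be sketched in Python. This is an analogy and not the Perl change itself; `run_through_filter` and the stand-in `crashing` command are hypothetical names used only for illustration.

```python
import subprocess
import sys


def run_through_filter(filter_cmd, text):
    """Pipe `text` into `filter_cmd`; raise if the filter exits non-zero."""
    proc = subprocess.Popen(
        filter_cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.DEVNULL,
        text=True,
    )
    # communicate() writes the input, closes the pipe and waits for exit.
    proc.communicate(text)
    if proc.returncode != 0:
        # Without this check a crashed filter would go unnoticed and its
        # partial output would be treated as complete.
        raise RuntimeError(f"filter exited with status {proc.returncode}")


# Stand-in for a crashing translator: reads its input, then exits with 1.
crashing = [sys.executable, "-c", "import sys; sys.stdin.read(); sys.exit(1)"]

try:
    run_through_filter(crashing, "some generated assembly\n")
except RuntimeError as err:
    print("build step failed:", err)
```

The design point is the same as in the commit message: the writer only learns that the consumer failed if it inspects the result of closing the pipe.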


# 9bb3e5fd 15-Jan-2020 Richard Levitte

For all assembler scripts where it matters, recognise clang > 9.x

Fixes #10853

Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/10855)


# 048fa13e 22-Dec-2019 Bernd Edlinger

Add some missing cfi frame info in poly1305-x86_64.pl

Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
(Merged from https://github.com/openssl/openssl/pull/10678)


Revision tags: OpenSSL_1_0_2u
# 1aa89a7a 12-Sep-2019 Richard Levitte

Unify all assembler file generators

They now generally conform to the following argument sequence:

script.pl "$(PERLASM_SCHEME)" [ C preprocessor arguments ... ] \
$(PROCESSOR) <output file>

However, in the spirit of being able to use these scripts manually,
they also allow for no argument, or for only the flavour, or for only
the output file. This is done by only using the last argument as
output file if it's a file (it has an extension), and only using the
first argument as flavour if it isn't a file (it doesn't have an
extension).

While we're at it, we make all $xlate calls the same, i.e. the $output
argument is always quoted, and we always die on error when trying to
start $xlate.

There's a perl lesson in this, regarding operator priority...

This will always succeed, even when it fails:

open FOO, "something" || die "ERR: $!";

The reason is that '||' has higher priority than list operators (a
function is essentially a list operator and gobbles up everything
following it that isn't lower priority), and since a non-empty string
is always true, that ends up being exactly the same as:

open FOO, "something";

This, however, will fail if "something" can't be opened:

open FOO, "something" or die "ERR: $!";

The reason is that 'or' has lower priority than list operators,
i.e. it's performed after the 'open' call.

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/9884)
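
The argument convention described in this commit (last argument is the output file only if it has an extension, first argument is the flavour only if it does not) can be sketched in Python. This is an illustration and not the Perl in the scripts; the `split_args` helper, the "no dot means no extension" test, and the example values `elf` and `-DFOO` are assumptions.

```python
import re


def split_args(argv):
    """Split perlasm-style arguments into (flavour, middle, output)."""
    args = list(argv)
    flavour = None
    output = None
    # The last argument is the output file only if it has an extension.
    if args and re.search(r"\.\w+$", args[-1]):
        output = args.pop()
    # The first argument is the flavour only if it has no extension
    # (approximated here as "contains no dot").
    if args and "." not in args[0]:
        flavour = args.pop(0)
    return flavour, args, output


print(split_args([]))                       # (None, [], None)
print(split_args(["elf"]))                  # ('elf', [], None)
print(split_args(["poly1305-x86_64.s"]))    # (None, [], 'poly1305-x86_64.s')
print(split_args(["elf", "-DFOO", "poly1305-x86_64.s"]))
```

Checking the output file first matters for the single-argument case: a lone file name is consumed as the output and is never mistaken for a flavour.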


Revision tags: OpenSSL_1_0_2t, OpenSSL_1_1_0l, OpenSSL_1_1_1d, OpenSSL_1_1_1c, OpenSSL_1_1_0k, OpenSSL_1_0_2s, OpenSSL_1_0_2r, OpenSSL_1_1_1b
# 49d3b641 06-Dec-2018 Richard Levitte

Following the license change, modify the boilerplates in crypto/poly1305/

[skip ci]

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7810)


Revision tags: OpenSSL_1_0_2q, OpenSSL_1_1_0j, OpenSSL_1_1_1a, OpenSSL_1_1_1
# 1212818e 11-Sep-2018 Matt Caswell

Update copyright year

Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7176)


Revision tags: OpenSSL_1_1_1-pre9, OpenSSL_1_0_2p, OpenSSL_1_1_0i
# 89778806 09-Jul-2018 Andy Polyakov

poly1305/asm/poly1305-x86_64.pl: fix solaris64-x86_64-cc build.

Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6676)


# 0edb109f 03-Jul-2018 Andy Polyakov

evp/e_chacha20_poly1305.c: further improve small-fragment TLS performance.

Improvement coefficients vary with TLS fragment length and platform, on
most Intel processors maximum improvement is ~50%, while on Ryzen - 80%.
The "secret" is new dedicated ChaCha20_128 code path and vectorized xor
helpers.

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6638)


Revision tags: OpenSSL_1_1_1-pre8, OpenSSL_1_1_1-pre7, OpenSSL_1_1_1-pre6, OpenSSL_1_1_1-pre5, OpenSSL_1_1_1-pre4, OpenSSL_1_0_2o, OpenSSL_1_1_0h, OpenSSL_1_1_1-pre3, OpenSSL_1_1_1-pre2, OpenSSL_1_1_1-pre1, OpenSSL_1_0_2n
# 4dfe4310 06-Dec-2017 Andy Polyakov

poly1305/asm/poly1305-x86_64.pl: add Knights Landing AVX512 result.

Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4855)


# a8f302e5 20-Nov-2017 Andy Polyakov

poly1305/asm/poly1305-x86_64.pl: switch to pure AVX512F.

Convert AVX512F+VL+BW code path to pure AVX512F, so that it can be
executed even on Knights Landing. Trigger for modification was
observation that AVX512 code paths can negatively affect overall
Skylake-X system performance. Since we are likely to suppress
AVX512F capability flag [at least on Skylake-X], conversion serves
as kind of "investment protection".

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4758)


# 46f4e1be 12-Nov-2017 Josh Soref

Many spelling fixes/typo's corrected.

Around 138 distinct errors found and fixed; thanks!

Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3459)


Revision tags: OpenSSL_1_0_2m, OpenSSL_1_1_0g
# 64d92d74 20-Jul-2017 Andy Polyakov

x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results.

"Optimize" is in quotes because it's rather a "salvage operation"
for now. Idea is to identify processor capability flags that
drive Knights Landing to suboptimal code paths and mask them.
Two flags were identified, XSAVE and ADCX/ADOX. Former affects
choice of AES-NI code path specific for Silvermont (Knights Landing
is of Silvermont "ancestry"). And 64-bit ADCX/ADOX instructions are
effectively mishandled at decode time. In both cases we are looking
at ~2x improvement.

AVX-512 results cover even Skylake-X :-)

Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!

Reviewed-by: Rich Salz <rsalz@openssl.org>


# 54f8f9a1 30-Jun-2017 Andy Polyakov

x86_64 assembly pack: fill some blanks in Ryzen results.

Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>


Revision tags: OpenSSL_1_0_2l, OpenSSL_1_1_0f, OpenSSL-fips-2_0_16
# 0a5d1a38 18-Mar-2017 Andy Polyakov

poly1305/asm/poly1305-x86_64.pl: add poly1305_blocks_vpmadd52_8x.

As hinted by its name new subroutine processes 8 input blocks in
parallel by loading data to 512-bit registers. It still needs more
work, as it needs to handle some specific input lengths better.
In this sense it's yet another intermediate step...

Reviewed-by: Rich Salz <rsalz@openssl.org>


# 6cbfd94d 18-Mar-2017 Andy Polyakov

x86_64 assembly pack: add some Ryzen performance results.

Reviewed-by: Tim Hudson <tjh@openssl.org>


# c2b93590 12-Mar-2017 Andy Polyakov

poly1305/asm/poly1305-x86_64.pl: add poly1305_blocks_vpmadd52_4x.

As hinted by its name new subroutine processes 4 input blocks in
parallel. It still operates on 256-bit registers and is just
another step toward full-blown AVX512IFMA procedure.

Reviewed-by: Rich Salz <rsalz@openssl.org>


# e052083c 25-Feb-2017 Andy Polyakov

poly1305/asm/poly1305-x86_64.pl: minor AVX512 optimization.

Reviewed-by: Rich Salz <rsalz@openssl.org>

