poly1305-x86_64.pl - OpenGrok history log for /openssl/crypto/poly1305/asm/poly1305-x86

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
# 7ed6de99	05-Sep-2024	Tomas Mraz	Copyright year updates Reviewed-by: Neil Horman <nhorman@openssl.org> Release: yes
# 25391acc	01-Mar-2024	Theo Buehler	Unable to run asm code on OpenBSD (amd64) In order to get asm code running on OpenBSD we must place all constants into .rodata sections. davidben@ also pointed out we need to ad Unable to run asm code on OpenBSD (amd64) In order to get asm code running on OpenBSD we must place all constants into .rodata sections. davidben@ also pointed out we need to adjust `x86_64-xlate.pl` perlasm script to adjust read-olny sections for various flavors (OSes). Those changes were cherry-picked from boringssl. closes #23312 Reviewed-by: Richard Levitte <levitte@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/23997) show more ...
# da1c088f	07-Sep-2023	Matt Caswell	Copyright year updates Reviewed-by: Richard Levitte <levitte@openssl.org> Release: yes
# 7b8e27bc	22-Aug-2023	Bernd Edlinger	Avoid clobbering non-volatile XMM registers This affects some Poly1305 assembler functions which are only used for certain CPU types. Remove those functions for Windows targets, Avoid clobbering non-volatile XMM registers This affects some Poly1305 assembler functions which are only used for certain CPU types. Remove those functions for Windows targets, as a simple interim solution. Fixes #21522 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/21808) show more ...
Revision tags: openssl-3.0.0-alpha17, openssl-3.0.0-alpha16, openssl-3.0.0-alpha15, openssl-3.0.0-alpha14, OpenSSL_1_1_1k, openssl-3.0.0-alpha13, openssl-3.0.0-alpha12, OpenSSL_1_1_1j, openssl-3.0.0-alpha11, openssl-3.0.0-alpha10, OpenSSL_1_1_1i, openssl-3.0.0-alpha9, openssl-3.0.0-alpha8, openssl-3.0.0-alpha7, OpenSSL_1_1_1h
# cd84d883	26-Aug-2020	Jung-uk Kim	Ignore vendor name in Clang version number. For example, FreeBSD prepends "FreeBSD" to version string, e.g., FreeBSD clang version 11.0.0 (git@github.com:llvm/llvm-project.git llvmo Ignore vendor name in Clang version number. For example, FreeBSD prepends "FreeBSD" to version string, e.g., FreeBSD clang version 11.0.0 (git@github.com:llvm/llvm-project.git llvmorg-11.0.0-rc2-0-g414f32a9e86) Target: x86_64-unknown-freebsd13.0 Thread model: posix InstalledDir: /usr/bin This prevented us from properly detecting AVX support, etc. CLA: trivial Reviewed-by: Richard Levitte <levitte@openssl.org> Reviewed-by: Paul Dale <paul.dale@oracle.com> Reviewed-by: Ben Kaduk <kaduk@mit.edu> (Merged from https://github.com/openssl/openssl/pull/12725) show more ...
Revision tags: openssl-3.0.0-alpha6, openssl-3.0.0-alpha5, openssl-3.0.0-alpha4, openssl-3.0.0-alpha3, openssl-3.0.0-alpha2, openssl-3.0.0-alpha1
# 33388b44	23-Apr-2020	Matt Caswell	Update copyright year Reviewed-by: Richard Levitte <levitte@openssl.org> (Merged from https://github.com/openssl/openssl/pull/11616)
Revision tags: OpenSSL_1_1_1g, OpenSSL_1_1_1f, OpenSSL_1_1_1e
# a21314db	17-Feb-2020	David Benjamin	Also check for errors in x86_64-xlate.pl. In https://github.com/openssl/openssl/pull/10883, I'd meant to exclude the perlasm drivers since they aren't opening pipes and do not partic Also check for errors in x86_64-xlate.pl. In https://github.com/openssl/openssl/pull/10883, I'd meant to exclude the perlasm drivers since they aren't opening pipes and do not particularly need it, but I only noticed x86_64-xlate.pl, so arm-xlate.pl and ppc-xlate.pl got the change. That seems to have been fine, so be consistent and also apply the change to x86_64-xlate.pl. Checking for errors is generally a good idea. Reviewed-by: Richard Levitte <levitte@openssl.org> Reviewed-by: David Benjamin <davidben@google.com> (Merged from https://github.com/openssl/openssl/pull/10930) show more ...
# 98ad3fe8	31-Jan-2020	H.J. Lu	x86_64: Add endbranch at function entries for Intel CET To support Intel CET, all indirect branch targets must start with endbranch. Here is a patch to add endbranch to function entries x86_64: Add endbranch at function entries for Intel CET To support Intel CET, all indirect branch targets must start with endbranch. Here is a patch to add endbranch to function entries in x86_64 assembly codes which are indirect branch targets as discovered by running openssl testsuite on Intel CET machine and visual inspection. Verified with $ CC="gcc -Wl,-z,cet-report=error" ./Configure shared linux-x86_64 -fcf-protection $ make $ make test and $ CC="gcc -mx32 -Wl,-z,cet-report=error" ./Configure shared linux-x32 -fcf-protection $ make $ make test # <<< passed with https://github.com/openssl/openssl/pull/10988 Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org> Reviewed-by: Richard Levitte <levitte@openssl.org> (Merged from https://github.com/openssl/openssl/pull/10982) show more ...
# 32be631c	17-Jan-2020	David Benjamin	Do not silently truncate files on perlasm errors If one of the perlasm xlate drivers crashes, OpenSSL's build will currently swallow the error and silently truncate the output to however Do not silently truncate files on perlasm errors If one of the perlasm xlate drivers crashes, OpenSSL's build will currently swallow the error and silently truncate the output to however far the driver got. This will hopefully fail to build, but better to check such things. Handle this by checking for errors when closing STDOUT (which is a pipe to the xlate driver). Reviewed-by: Richard Levitte <levitte@openssl.org> Reviewed-by: Tim Hudson <tjh@openssl.org> Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org> (Merged from https://github.com/openssl/openssl/pull/10883) show more ...
# 9bb3e5fd	15-Jan-2020	Richard Levitte	For all assembler scripts where it matters, recognise clang > 9.x Fixes #10853 Reviewed-by: Paul Dale <paul.dale@oracle.com> (Merged from https://github.com/openssl/openssl/pull For all assembler scripts where it matters, recognise clang > 9.x Fixes #10853 Reviewed-by: Paul Dale <paul.dale@oracle.com> (Merged from https://github.com/openssl/openssl/pull/10855) show more ...
# 048fa13e	22-Dec-2019	Bernd Edlinger	Add some missing cfi frame info in poly1305-x86_64.pl Reviewed-by: Kurt Roeckx <kurt@roeckx.be> (Merged from https://github.com/openssl/openssl/pull/10678)
Revision tags: OpenSSL_1_0_2u
# 1aa89a7a	12-Sep-2019	Richard Levitte	Unify all assembler file generators They now generally conform to the following argument sequence: script.pl "$(PERLASM_SCHEME)" [ C preprocessor arguments ... ] \ Unify all assembler file generators They now generally conform to the following argument sequence: script.pl "$(PERLASM_SCHEME)" [ C preprocessor arguments ... ] \ $(PROCESSOR) <output file> However, in the spirit of being able to use these scripts manually, they also allow for no argument, or for only the flavour, or for only the output file. This is done by only using the last argument as output file if it's a file (it has an extension), and only using the first argument as flavour if it isn't a file (it doesn't have an extension). While we're at it, we make all $xlate calls the same, i.e. the $output argument is always quoted, and we always die on error when trying to start $xlate. There's a perl lesson in this, regarding operator priority... This will always succeed, even when it fails: open FOO, "something" \|\| die "ERR: $!"; The reason is that '\|\|' has higher priority than list operators (a function is essentially a list operator and gobbles up everything following it that isn't lower priority), and since a non-empty string is always true, so that ends up being exactly the same as: open FOO, "something"; This, however, will fail if "something" can't be opened: open FOO, "something" or die "ERR: $!"; The reason is that 'or' has lower priority that list operators, i.e. it's performed after the 'open' call. Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from https://github.com/openssl/openssl/pull/9884) show more ...
Revision tags: OpenSSL_1_0_2t, OpenSSL_1_1_0l, OpenSSL_1_1_1d, OpenSSL_1_1_1c, OpenSSL_1_1_0k, OpenSSL_1_0_2s, OpenSSL_1_0_2r, OpenSSL_1_1_1b
# 49d3b641	06-Dec-2018	Richard Levitte	Following the license change, modify the boilerplates in crypto/poly1305/ [skip ci] Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from https://github.com/openssl/openssl/ Following the license change, modify the boilerplates in crypto/poly1305/ [skip ci] Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from https://github.com/openssl/openssl/pull/7810) show more ...
Revision tags: OpenSSL_1_0_2q, OpenSSL_1_1_0j, OpenSSL_1_1_1a, OpenSSL_1_1_1
# 1212818e	11-Sep-2018	Matt Caswell	Update copyright year Reviewed-by: Richard Levitte <levitte@openssl.org> (Merged from https://github.com/openssl/openssl/pull/7176)
Revision tags: OpenSSL_1_1_1-pre9, OpenSSL_1_0_2p, OpenSSL_1_1_0i
# 89778806	09-Jul-2018	Andy Polyakov	poly1305/asm/poly1305-x86_64.pl: fix solaris64-x86_64-cc build. Reviewed-by: Paul Dale <paul.dale@oracle.com> Reviewed-by: Rich Salz <rsalz@openssl.org> (Merged from https://github.c poly1305/asm/poly1305-x86_64.pl: fix solaris64-x86_64-cc build. Reviewed-by: Paul Dale <paul.dale@oracle.com> Reviewed-by: Rich Salz <rsalz@openssl.org> (Merged from https://github.com/openssl/openssl/pull/6676) show more ...
# 0edb109f	03-Jul-2018	Andy Polyakov	evp/e_chacha20_poly1305.c: further improve small-fragment TLS performance. Improvement coefficients vary with TLS fragment length and platform, on most Intel processors maximum improveme evp/e_chacha20_poly1305.c: further improve small-fragment TLS performance. Improvement coefficients vary with TLS fragment length and platform, on most Intel processors maximum improvement is ~50%, while on Ryzen - 80%. The "secret" is new dedicated ChaCha20_128 code path and vectorized xor helpers. Reviewed-by: Rich Salz <rsalz@openssl.org> (Merged from https://github.com/openssl/openssl/pull/6638) show more ...
Revision tags: OpenSSL_1_1_1-pre8, OpenSSL_1_1_1-pre7, OpenSSL_1_1_1-pre6, OpenSSL_1_1_1-pre5, OpenSSL_1_1_1-pre4, OpenSSL_1_0_2o, OpenSSL_1_1_0h, OpenSSL_1_1_1-pre3, OpenSSL_1_1_1-pre2, OpenSSL_1_1_1-pre1, OpenSSL_1_0_2n
# 4dfe4310	06-Dec-2017	Andy Polyakov	poly1305/asm/poly1305-x86_64.pl: add Knights Landing AVX512 result. Hardware used for benchmarking courtesy of Atos, experiments run by Romain Dolbeau <romain.dolbeau@atos.net>. Kudos! poly1305/asm/poly1305-x86_64.pl: add Knights Landing AVX512 result. Hardware used for benchmarking courtesy of Atos, experiments run by Romain Dolbeau <romain.dolbeau@atos.net>. Kudos! Reviewed-by: Rich Salz <rsalz@openssl.org> (Merged from https://github.com/openssl/openssl/pull/4855) show more ...
# a8f302e5	20-Nov-2017	Andy Polyakov	poly1305/asm/poly1305-x86_64.pl: switch to pure AVX512F. Convert AVX512F+VL+BW code path to pure AVX512F, so that it can be executed even on Knights Landing. Trigger for modification was poly1305/asm/poly1305-x86_64.pl: switch to pure AVX512F. Convert AVX512F+VL+BW code path to pure AVX512F, so that it can be executed even on Knights Landing. Trigger for modification was observation that AVX512 code paths can negatively affect overall Skylake-X system performance. Since we are likely to suppress AVX512F capability flag [at least on Skylake-X], conversion serves as kind of "investment protection". Reviewed-by: Rich Salz <rsalz@openssl.org> (Merged from https://github.com/openssl/openssl/pull/4758) show more ...
# 46f4e1be	12-Nov-2017	Josh Soref	Many spelling fixes/typo's corrected. Around 138 distinct errors found and fixed; thanks! Reviewed-by: Kurt Roeckx <kurt@roeckx.be> Reviewed-by: Tim Hudson <tjh@openssl.org> Many spelling fixes/typo's corrected. Around 138 distinct errors found and fixed; thanks! Reviewed-by: Kurt Roeckx <kurt@roeckx.be> Reviewed-by: Tim Hudson <tjh@openssl.org> Reviewed-by: Rich Salz <rsalz@openssl.org> (Merged from https://github.com/openssl/openssl/pull/3459) show more ...
Revision tags: OpenSSL_1_0_2m, OpenSSL_1_1_0g
# 64d92d74	20-Jul-2017	Andy Polyakov	x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results. "Optimize" is in quotes because it's rather a "salvage operation" for now. Idea is to identify processor capabi x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results. "Optimize" is in quotes because it's rather a "salvage operation" for now. Idea is to identify processor capability flags that drive Knights Landing to suboptimial code paths and mask them. Two flags were identified, XSAVE and ADCX/ADOX. Former affects choice of AES-NI code path specific for Silvermont (Knights Landing is of Silvermont "ancestry"). And 64-bit ADCX/ADOX instructions are effectively mishandled at decode time. In both cases we are looking at ~2x improvement. AVX-512 results cover even Skylake-X :-) Hardware used for benchmarking courtesy of Atos, experiments run by Romain Dolbeau <romain.dolbeau@atos.net>. Kudos! Reviewed-by: Rich Salz <rsalz@openssl.org> show more ...
# 54f8f9a1	30-Jun-2017	Andy Polyakov	x86_64 assembly pack: fill some blanks in Ryzen results. Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Revision tags: OpenSSL_1_0_2l, OpenSSL_1_1_0f, OpenSSL-fips-2_0_16
# 0a5d1a38	18-Mar-2017	Andy Polyakov	poly1305/asm/poly1305-x86_64.pl: add poly1305_blocks_vpmadd52_8x. As hinted by its name new subroutine processes 8 input blocks in parallel by loading data to 512-bit registers. It still poly1305/asm/poly1305-x86_64.pl: add poly1305_blocks_vpmadd52_8x. As hinted by its name new subroutine processes 8 input blocks in parallel by loading data to 512-bit registers. It still needs more work, as it needs to handle some specific input lengths better. In this sense it's yet another intermediate step... Reviewed-by: Rich Salz <rsalz@openssl.org> show more ...
# 6cbfd94d	18-Mar-2017	Andy Polyakov	x86_64 assembly pack: add some Ryzen performance results. Reviewed-by: Tim Hudson <tjh@openssl.org>
# c2b93590	12-Mar-2017	Andy Polyakov	poly1305/asm/poly1305-x86_64.pl: add poly1305_blocks_vpmadd52_4x. As hinted by its name new subroutine processes 4 input blocks in parallel. It still operates on 256-bit registers and is poly1305/asm/poly1305-x86_64.pl: add poly1305_blocks_vpmadd52_4x. As hinted by its name new subroutine processes 4 input blocks in parallel. It still operates on 256-bit registers and is just another step toward full-blown AVX512IFMA procedure. Reviewed-by: Rich Salz <rsalz@openssl.org> show more ...
# e052083c	25-Feb-2017	Andy Polyakov	poly1305/asm/poly1305-x86_64.pl: minor AVX512 optimization. Reviewed-by: Rich Salz <rsalz@openssl.org>
12