Commit Graph

140 Commits

Author SHA1 Message Date
Jung-uk Kim
cd84d8832d Ignore vendor name in Clang version number.
For example, FreeBSD prepends "FreeBSD" to version string, e.g.,

FreeBSD clang version 11.0.0 (git@github.com:llvm/llvm-project.git llvmorg-11.0.0-rc2-0-g414f32a9e86)
Target: x86_64-unknown-freebsd13.0
Thread model: posix
InstalledDir: /usr/bin

This prevented us from properly detecting AVX support, etc.

CLA: trivial

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Ben Kaduk <kaduk@mit.edu>
(Merged from https://github.com/openssl/openssl/pull/12725)
2020-08-27 20:27:26 -07:00
Matt Caswell
33388b44b6 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/11616)
2020-04-23 13:55:52 +01:00
David Benjamin
a21314dbbc Also check for errors in x86_64-xlate.pl.
In https://github.com/openssl/openssl/pull/10883, I'd meant to exclude
the perlasm drivers since they aren't opening pipes and do not
particularly need it, but I only noticed x86_64-xlate.pl, so
arm-xlate.pl and ppc-xlate.pl got the change.

That seems to have been fine, so be consistent and also apply the change
to x86_64-xlate.pl. Checking for errors is generally a good idea.

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: David Benjamin <davidben@google.com>
(Merged from https://github.com/openssl/openssl/pull/10930)
2020-02-17 12:17:53 +10:00
H.J. Lu
98ad3fe82b x86_64: Add endbranch at function entries for Intel CET
To support Intel CET, all indirect branch targets must start with
endbranch.  Here is a patch to add endbranch to function entries
in x86_64 assembly codes which are indirect branch targets as
discovered by running openssl testsuite on Intel CET machine and
visual inspection.

Verified with

$ CC="gcc -Wl,-z,cet-report=error" ./Configure shared linux-x86_64 -fcf-protection
$ make
$ make test

and

$ CC="gcc -mx32 -Wl,-z,cet-report=error" ./Configure shared linux-x32 -fcf-protection
$ make
$ make test # <<< passed with https://github.com/openssl/openssl/pull/10988

Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/10982)
2020-02-15 22:15:03 +01:00
David Benjamin
32be631ca1 Do not silently truncate files on perlasm errors
If one of the perlasm xlate drivers crashes, OpenSSL's build will
currently swallow the error and silently truncate the output to however
far the driver got. This will hopefully fail to build, but better to
check such things.

Handle this by checking for errors when closing STDOUT (which is a pipe
to the xlate driver).

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
(Merged from https://github.com/openssl/openssl/pull/10883)
2020-01-22 18:11:30 +01:00
Richard Levitte
9bb3e5fd87 For all assembler scripts where it matters, recognise clang > 9.x
Fixes #10853

Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/10855)
2020-01-17 08:55:45 +01:00
Bernd Edlinger
275a048ffc Add some missing cfi frame info in aesni-gcm-x86_64.pl
Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
(Merged from https://github.com/openssl/openssl/pull/10677)
2019-12-23 20:23:27 +01:00
Fangming.Fang
31b59078c8 Optimize AES-GCM implementation on aarch64
Comparing to current implementation, this change can get more
performance improved by tunning the loop-unrolling factor in
interleave implementation as well as by enabling high level parallelism.

Performance(A72)

new
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes   16384 bytes
aes-128-gcm     113065.51k   375743.00k   848359.51k  1517865.98k  1964040.19k  1986663.77k
aes-192-gcm     110679.32k   364470.63k   799322.88k  1428084.05k  1826917.03k  1848967.17k
aes-256-gcm     104919.86k   352939.29k   759477.76k  1330683.56k  1663175.34k  1670430.72k

old
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes   16384 bytes
aes-128-gcm     115595.32k   382348.65k   855891.29k  1236452.35k  1425670.14k  1429793.45k
aes-192-gcm     112227.02k   369543.47k   810046.55k  1147948.37k  1286288.73k  1296941.06k
aes-256-gcm     111543.90k   361902.36k   769543.59k  1070693.03k  1208576.68k  1207511.72k

Change-Id: I28a2dca85c001a63a2a942e80c7c64f7a4fdfcf7

Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/9818)
2019-12-19 12:36:07 +10:00
Richard Levitte
1aa89a7a3a Unify all assembler file generators
They now generally conform to the following argument sequence:

    script.pl "$(PERLASM_SCHEME)" [ C preprocessor arguments ... ] \
              $(PROCESSOR) <output file>

However, in the spirit of being able to use these scripts manually,
they also allow for no argument, or for only the flavour, or for only
the output file.  This is done by only using the last argument as
output file if it's a file (it has an extension), and only using the
first argument as flavour if it isn't a file (it doesn't have an
extension).

While we're at it, we make all $xlate calls the same, i.e. the $output
argument is always quoted, and we always die on error when trying to
start $xlate.

There's a perl lesson in this, regarding operator priority...

This will always succeed, even when it fails:

    open FOO, "something" || die "ERR: $!";

The reason is that '||' has higher priority than list operators (a
function is essentially a list operator and gobbles up everything
following it that isn't lower priority), and since a non-empty string
is always true, so that ends up being exactly the same as:

    open FOO, "something";

This, however, will fail if "something" can't be opened:

    open FOO, "something" or die "ERR: $!";

The reason is that 'or' has lower priority that list operators,
i.e. it's performed after the 'open' call.

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/9884)
2019-09-16 16:29:57 +02:00
Andy Polyakov
6465321e40 ARM64 assembly pack: add ThunderX2 results.
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8776)
2019-04-17 21:08:13 +02:00
Shane Lontis
54d00677f3 cfi build fixes in x86-64 ghash assembly
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/8281)
2019-02-21 07:39:14 +10:00
David Benjamin
c0e8e5007b Fix some CFI issues in x86_64 assembly
The add/double shortcut in ecp_nistz256-x86_64.pl left one instruction
point that did not unwind, and the "slow" path in AES_cbc_encrypt was
not annotated correctly. For the latter, add
.cfi_{remember,restore}_state support to perlasm.

Next, fill in a bunch of functions that are missing no-op .cfi_startproc
and .cfi_endproc blocks. libunwind cannot unwind those stack frames
otherwise.

Finally, work around a bug in libunwind by not encoding rflags. (rflags
isn't a callee-saved register, so there's not much need to annotate it
anyway.)

These were found as part of ABI testing work in BoringSSL.

Reviewed-by: Richard Levitte <levitte@openssl.org>
GH: #8109
2019-02-17 23:39:51 +01:00
Andy Polyakov
3405db97e5 ARM assembly pack: make it Windows-friendly.
"Windows friendliness" means a) flipping .thumb and .text directives,
b) always generate Thumb-2 code when asked(*); c) Windows-specific
references to external OPENSSL_armcap_P.

(*) so far *some* modules were compiled as .code 32 even if Thumb-2
was targeted. It works at hardware level because processor can alternate
between the modes with no overhead. But clang --target=arm-windows's
builtin assembler just refuses to compile .code 32...

Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/8252)
2019-02-16 16:59:23 +01:00
Richard Levitte
81cae8ce09 Following the license change, modify the boilerplates in crypto/modes/
[skip ci]

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7803)
2018-12-06 15:06:37 +01:00
Matt Caswell
1212818eb0 Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/7176)
2018-09-11 13:45:17 +01:00
Andy Polyakov
ce5eb5e814 modes/asm/ghash-armv4.pl: address "infixes are deprecated" warnings.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6615)
2018-07-01 11:51:44 +02:00
Andy Polyakov
1753d12374 PA-RISC assembly pack: make it work with GNU assembler for HP-UX.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6583)
2018-06-25 16:45:48 +02:00
Andy Polyakov
41013cd63c PPC assembly pack: correct POWER9 results.
As it turns out originally published results were skewed by "turbo"
mode. VM apparently remains oblivious to dynamic frequency scaling,
and reports that processor operates at "base" frequency at all times.
While actual frequency gets increased under load.

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6406)
2018-06-03 21:20:06 +02:00
Matt Caswell
83cf7abf8e Update copyright year
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6371)
2018-05-29 13:16:04 +01:00
Andy Polyakov
13f6857db1 PPC assembly pack: add POWER9 results.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2018-05-10 11:44:21 +02:00
Matt Caswell
6ec5fce25e Update copyright year
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/6145)
2018-05-01 13:34:30 +01:00
Andy Polyakov
198a2ed791 ARM assembly pack: make it work with older assembler.
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/6043)
2018-04-23 17:29:59 +02:00
Andy Polyakov
603ebe0352 modes/asm/ghashv8-armx.pl: handle lengths not divisible by 4x.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4830)
2017-12-04 17:21:23 +01:00
Andy Polyakov
aa7bf31698 modes/asm/ghashv8-armx.pl: optimize modulo-scheduled loop.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4830)
2017-12-04 17:21:20 +01:00
Andy Polyakov
9ee020f8dc modes/asm/ghashv8-armx.pl: modulo-schedule loop.
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4830)
2017-12-04 17:21:15 +01:00
Andy Polyakov
7ff2fa4b92 modes/asm/ghashv8-armx.pl: implement 4x aggregate factor.
This initial commit is unoptimized reference version that handles
input lengths divisible by 4 blocks.

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4830)
2017-12-04 17:20:25 +01:00
Andy Polyakov
7533162322 ARMv8 assembly pack: add Qualcomm Kryo results.
[skip ci]

Reviewed-by: Tim Hudson <tjh@openssl.org>
2017-11-13 11:13:00 +01:00
Josh Soref
46f4e1bec5 Many spelling fixes/typo's corrected.
Around 138 distinct errors found and fixed; thanks!

Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3459)
2017-11-11 19:03:10 -05:00
Patrick Steuer
bc4e831ccd s390x assembly pack: extend s390x capability vector.
Extend the s390x capability vector to store the longer facility list
available from z13 onwards. The bits indicating the vector extensions
are set to zero, if the kernel does not enable the vector facility.

Also add capability bits returned by the crypto instructions' query
functions.

Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>

Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4542)
2017-10-30 14:31:32 +01:00
Patrick Steuer
af1d638730 s390x assembly pack: remove capability double-checking.
An instruction's QUERY function is executed at initialization, iff the required
MSA level is installed. Therefore, it is sufficient to check the bits returned
by the QUERY functions. The MSA level does not have to be checked at every
function call.
crypto/aes/asm/aes-s390x.pl: The AES key schedule must be computed if the
required KM or KMC function codes are not available. Formally, the availability
of a KMC function code does not imply the availability of the corresponding KM
function code.

Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>

Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/4501)
2017-10-17 21:55:33 +02:00
Rich Salz
e3713c365c Remove email addresses from source code.
Names were not removed.
Some comments were updated.
Replace Andy's address with openssl.org

Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from https://github.com/openssl/openssl/pull/4516)
2017-10-13 10:06:59 -04:00
Andy Polyakov
64d92d7498 x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results.
"Optimize" is in quotes because it's rather a "salvage operation"
for now. Idea is to identify processor capability flags that
drive Knights Landing to suboptimial code paths and mask them.
Two flags were identified, XSAVE and ADCX/ADOX. Former affects
choice of AES-NI code path specific for Silvermont (Knights Landing
is of Silvermont "ancestry"). And 64-bit ADCX/ADOX instructions are
effectively mishandled at decode time. In both cases we are looking
at ~2x improvement.

AVX-512 results cover even Skylake-X :-)

Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-07-21 14:07:32 +02:00
Rich Salz
28f298e70a Undo commit cd359b2
Original text:
    Clarify use of |$end0| in stitched x86-64 AES-GCM code.

    There was some uncertainty about what the code is doing with |$end0|
    and whether it was necessary for |$len| to be a multiple of 16 or 96.
    Hopefully these added comments make it clear that the code is correct
    except for the caveat regarding low memory addresses.

    Change-Id: Iea546a59dc7aeb400f50ac5d2d7b9cb88ace9027
    Reviewed-on: https://boringssl-review.googlesource.com/7194
    Reviewed-by: Adam Langley <agl@google.com>

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3700)
2017-07-05 17:06:57 -04:00
David Benjamin
e195c8a256 Remove filename argument to x86 asm_init.
The assembler already knows the actual path to the generated file and,
in other perlasm architectures, is left to manage debug symbols itself.
Notably, in OpenSSL 1.1.x's new build system, which allows a separate
build directory, converting .pl to .s as the scripts currently do result
in the wrong paths.

This also avoids inconsistencies from some of the files using $0 and
some passing in the filename.

Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Andy Polyakov <appro@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/3431)
2017-05-11 17:00:23 -04:00
Andy Polyakov
c93f06c12f ARMv4 assembly pack: harmonize Thumb-ification of iOS build.
Three modules were left behind in a285992763.

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/2617)
2017-02-15 23:16:01 +01:00
Andy Polyakov
5c72e5ea7a modes/asm/*-x86_64.pl: add CFI annotations.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-02-13 14:14:24 +01:00
Andy Polyakov
384e6de4c7 x86_64 assembly pack: Win64 SEH face-lift.
- harmonize handlers with guidelines and themselves;
- fix some bugs in handlers;
- add missing handlers in chacha and ecp_nistz256 modules;

Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-02-06 08:21:42 +01:00
Andy Polyakov
ace05265d2 x86_64 assembly pack: add Goldmont performance results.
Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-10-24 13:01:13 +02:00
David Benjamin
609b0852e4 Remove trailing whitespace from some files.
The prevailing style seems to not have trailing whitespace, but a few
lines do. This is mostly in the perlasm files, but a few C files got
them after the reformat. This is the result of:

  find . -name '*.pl' | xargs sed -E -i '' -e 's/( |'$'\t'')*$//'
  find . -name '*.c' | xargs sed -E -i '' -e 's/( |'$'\t'')*$//'
  find . -name '*.h' | xargs sed -E -i '' -e 's/( |'$'\t'')*$//'

Then bn_prime.h was excluded since this is a generated file.

Note mkerr.pl has some changes in a heredoc for some help output, but
other lines there lack trailing whitespace too.

Reviewed-by: Kurt Roeckx <kurt@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
2016-10-10 23:36:21 +01:00
Andy Polyakov
6cf412c473 modes/asm/ghash-armv4.pl: improve interoperability with Android NDK.
Reviewed-by: Tim Hudson <tjh@openssl.org>
2016-09-03 10:41:52 +02:00
Andy Polyakov
05ef4d1980 ARMv8 assembly pack: add Samsung Mongoose results.
Reviewed-by: Tim Hudson <tjh@openssl.org>
2016-08-16 12:47:49 +02:00
klemens
6025001707 spelling fixes, just comments and readme.
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/1413)
2016-08-05 19:07:30 -04:00
Andy Polyakov
f198cc43a0 SPARC assembly pack: enforce V8+ ABI constraints.
Even though it's hard to imagine, it turned out that upper half of
arguments passed to V8+ subroutine can be non-zero.

["n" pseudo-instructions, such as srln being srl in 32-bit case and
srlx in 64-bit one, were implemented in binutils 2.10. It's assumed
that Solaris assembler implemented it around same time, i.e. 2000.]

Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-07-01 14:25:08 +02:00
Brian Smith
cd359b2564 Clarify use of |$end0| in stitched x86-64 AES-GCM code.
There was some uncertainty about what the code is doing with |$end0|
and whether it was necessary for |$len| to be a multiple of 16 or 96.
Hopefully these added comments make it clear that the code is correct
except for the caveat regarding low memory addresses.

Change-Id: Iea546a59dc7aeb400f50ac5d2d7b9cb88ace9027
Reviewed-on: https://boringssl-review.googlesource.com/7194
Reviewed-by: Adam Langley <agl@google.com>

Signed-off-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Rich Salz <rsalz@openssl.org>
2016-06-27 10:15:05 +02:00
Andy Polyakov
cc77d0d84a modes/asm/ghashp8-ppc.pl: improve performance by 2.7x.
Reviewed-by: Rich Salz <rsalz@openssl.org>
2016-06-14 23:28:39 +02:00
Andy Polyakov
cfe1d9929e x86_64 assembly pack: tolerate spaces in source directory name.
[as it is now quoting $output is not required, but done just in case]

Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-05-29 14:12:51 +02:00
Rich Salz
6aa36e8e5a Add OpenSSL copyright to .pl files
Reviewed-by: Richard Levitte <levitte@openssl.org>
2016-05-21 08:23:39 -04:00
Andy Polyakov
670ad0fbf6 s390x assembly pack: cache capability query results.
IBM argues that in certain scenarios capability query is really
expensive. At the same time it's asserted that query results can
be safely cached, because disabling CPACF is incompatible with
reboot-free operation.

Reviewed-by: Tim Hudson <tjh@openssl.org>
2016-04-25 11:53:45 +02:00
Richard Levitte
a5aa63a456 Fix some assembler generating scripts for better unification
Some of these scripts would recognise an output parameter if it looks
like a file path.  That works both in both the classic and new build
schemes.  Some fo these scripts would only recognise it if it's a
basename (i.e. no directory component).  Those need to be corrected,
as the output parameter in the new build scheme is more likely to
contain a directory component than not.

Reviewed-by: Andy Polyakov <appro@openssl.org>
2016-03-11 00:54:31 +01:00
Richard Levitte
4f0d5f1849 Unified - adapt the generation of modes assembler to use GENERATE
This gets rid of the BEGINRAW..ENDRAW sections in crypto/modes/build.info.

This also moves the assembler generating perl scripts to take the
output file name as last command line argument, where necessary.

Reviewed-by: Andy Polyakov <appro@openssl.org>
2016-03-09 11:09:26 +01:00