Like in r12-7519-g027e30414492d50feb2854aff38227b14300dc4b, I've done
git grep -v 'long long\|optab optab\|template template\|double double' | grep ' \([a-zA-Z]\+\) \1 '
This is just part of the changes, mostly for non-gcc directories.
I'll try to get to the rest soon. Obviously, the above command also
finds cases which are correct as is and shouldn't be changed, so one
needs to manually inspect everything.
I'd hope most of it is pretty obvious, but the config/ and libstdc++-v3/
hunks include a tweak in a license wording, though other copies of the
similar license have the wording right.
2024-04-02 Jakub Jelinek <jakub@redhat.com>
* Makefile.tpl: Fix duplicated words; returns returns ->
returns.
config/
* lcmessage.m4: Fix duplicated words; can can -> can,
package package -> package.
libdecnumber/
* decCommon.c (decFinalize): Fix duplicated words in
comment; the the -> the.
libgcc/
* unwind-dw2-fde.c (struct fde_accumulator): Fix duplicated
words in comment; is is -> is.
libgfortran/
* configure.host: Fix duplicated words; the the -> the.
libgm2/
* configure.host: Fix duplicated words; the the -> the.
libgomp/
* libgomp.texi (OpenMP 5.2): Fix duplicated words; with with ->
with.
(omp_target_associate_ptr): Fix duplicated words; either either ->
either.
(omp_init_allocator): Fix duplicated words; be be -> be.
(omp_realloc): Fix duplicated words; is is -> is.
(OMP_ALLOCATOR): Fix duplicated words; other other -> other.
* priority_queue.h (priority_queue_multi_p): Fix duplicated words;
to to -> to.
libiberty/
* regex.c (byte_re_match_2_internal): Fix duplicated words in comment;
next next -> next.
* dyn-string.c (dyn_string_init): Fix duplicated words in comment;
of of -> of.
libitm/
* beginend.cc (GTM::gtm_thread::begin_transaction): Fix duplicated
words in comment; not not -> not to.
libobjc/
* init.c (duplicate_classes): Fix duplicated words in comment; in in
-> in.
* sendmsg.c (__objc_prepare_dtable_for_class): Fix duplicated words
in comment; the the -> the.
* encoding.c (objc_layout_structure): Likewise.
libstdc++-v3/
* acinclude.m4: Fix duplicated words; file file -> file can.
* configure.host: Fix duplicated words; the the -> the.
libvtv/
* vtv_rts.cc (vtv_fail): Fix duplicated words; to to -> to.
* vtv_fail.cc (vtv_fail): Likewise.
Original bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111731
The unwinding mechanism registers both the code range and the unwind
table itself within a b-tree lookup structure. That data structure
assumes that it consists of non-overlapping intervals. This
becomes a problem if the unwinding table is embedded within the
code itself, as now the intervals do overlap.
To fix this problem we now keep the unwind tables in a separate
b-tree, which prevents the overlap.
libgcc/ChangeLog:
PR libgcc/111731
* unwind-dw2-fde.c: Split unwind ranges if they contain the
unwind table.
Knuth's division algorithm relies on the number of dividend limbs
being greater than or equal to the number of divisor limbs, which is why
I've added a special case for un < vn at the start of __divmodbitint4.
Unfortunately, my assumption that it then implies abs(v) > abs(u) and
so quotient must be 0 and remainder same as dividend is incorrect.
This is because this check is done before negation of the operands.
While bitint_reduce_prec reduces precision from clearly useless limbs,
the problematic case is when the dividend is unsigned or non-negative
and divisor is negative. We can have limbs (from MS to LS):
dividend: 0 M ?...
divisor: -1 -N ?...
where M has the most significant bit set and M >= N (if M == N then
the following limbs matter as well) and the most significant limbs can
be even partial. In this case, the quotient should be -1 rather than
0. bitint_reduce_prec will reduce the precision of the dividend so
that M is the most significant limb, but can't reduce precision of the
divisor to more than having the -1 as most significant limb, because
-N doesn't have the most significant bit set.
The following patch fixes it by detecting this problematic case in the
un < vn handling; instead of assuming q is 0 and r is u, it
decreases vn by 1, because it knows the later code will negate the divisor
and that the negated divisor can then be expressed in one fewer limb.
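The failure mode is easiest to see with small numbers. The sketch below is an illustration of the reasoning only, not the code in __divmodbitint4: it assumes 8-bit limbs, and the helpers limbs_unsigned and limbs_signed are hypothetical names that count how many limbs a value needs before any negation.

```c
#include <stdint.h>

/* Hypothetical helper: number of 8-bit limbs needed for a non-negative
   value once leading zero limbs are dropped.  */
static int
limbs_unsigned (uint32_t x)
{
  int n = 1;
  while (x > 0xff)
    {
      x >>= 8;
      n++;
    }
  return n;
}

/* Hypothetical helper: number of 8-bit limbs needed for a signed value in
   two's complement, i.e. the smallest n with -2^(8n-1) <= x < 2^(8n-1).  */
static int
limbs_signed (int32_t x)
{
  int n = 1;
  while (n < 4)
    {
      int64_t half = (int64_t) 1 << (8 * n - 1);
      if (x >= -half && x < half)
        break;
      n++;
    }
  return n;
}
```

With u = 192 (single limb 0xc0, so un is 1) and v = -160 (limbs 0xff 0x60, so vn is 2), un < vn holds, yet 192 / -160 is -1 with remainder 32 in C's truncating division, so returning q = 0 and r = u would be wrong.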
2024-03-21 Jakub Jelinek <jakub@redhat.com>
PR libgcc/114397
* libgcc2.c (__divmodbitint4): Don't assume un < vn always means
abs(v) > abs(u), check for a special case of un + 1 == vn where
u is non-negative and v negative and after v's negation vn could
be reduced by 1.
* gcc.dg/torture/bitint-65.c: New test.
Tested with some simple toy examples where an exception is thrown in the
signal handler.
libgcc/ChangeLog:
* config/i386/gnu-unwind.h: Support unwinding x86_64 signal frames.
Signed-off-by: Flavio Cruz <flaviocruz@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
While for __mulbitint3 we actually don't negate anything and perform the
multiplication in unsigned style always, for __divmodbitint4 if the operands
aren't unsigned and are negative, we negate them first and then try to
negate them as needed at the end.
The quotient is negated if exactly one of the operands was negated,
and the remainder is negated if the first operand was
negated.
The case which doesn't work correctly is if due to limited range of the
operands we perform the division/modulo in some smaller number of limbs
and then extend it to the desired precision of the quotient and/or
remainder results. If they aren't negated, the extension is done with
memset to 0, if they are negated, the extension was done with memset
to -1. The problem is that if the quotient or remainder is zero,
then bitint_negate negates it again to zero (that is ok), but we should
then extend with memset to 0, not memset to -1.
The following patch achieves that by letting bitint_negate also check if
the negated operand is zero and changes the memset argument based on that.
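A minimal sketch of that idea follows. It is not the libgcc2.c code: limb_t, negate_limbs and negate_and_extend are hypothetical names, limbs are assumed to be 32 bits wide and stored least significant first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t limb_t; /* Hypothetical limb type for this sketch.  */

/* Negate the N least significant limbs of D in place and return the
   bitwise OR of the limbs before negation, so that the caller can tell
   whether the value was zero.  */
static limb_t
negate_limbs (limb_t *d, size_t n)
{
  limb_t carry = 1, any = 0;
  for (size_t i = 0; i < n; i++)
    {
      limb_t s = d[i];
      any |= s;
      d[i] = (limb_t) (~s + carry);
      /* The +1 of two's complement negation keeps propagating only
         through limbs that were zero.  */
      carry = carry && s == 0;
    }
  return any;
}

/* Negate the N-limb result and extend it to TOTAL limbs.  */
static void
negate_and_extend (limb_t *d, size_t n, size_t total)
{
  limb_t any = negate_limbs (d, n);
  /* -0 is 0, so a zero result has to be extended with zeros, not ones.  */
  memset (d + n, any ? 0xff : 0, (total - n) * sizeof (limb_t));
}
```

Returning the OR of the input limbs costs one extra operation per limb in a loop that already visits every limb, which is presumably why the check was folded into the negation rather than done in a separate pass.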
2024-03-15 Jakub Jelinek <jakub@redhat.com>
PR libgcc/114327
* libgcc2.c (bitint_negate): Return UWtype bitwise or of all the limbs
before negation rather than void.
(__divmodbitint4): Determine whether to fill in the upper limbs after
negation based on whether bitint_negate returned 0 or non-zero, rather
than always filling with -1.
* gcc.dg/torture/bitint-63.c: New test.
This arranges that the byte order of the instruction sequences is
independent of the byte order of memory.
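As an illustration of the technique (not the actual heap-trampoline.c code), AArch64 instruction words are stored least significant byte first even when data is big-endian, so emitting them byte by byte gives the same image on either kind of target. The helper name put_insn_le is hypothetical.

```c
#include <stdint.h>

/* Store one 32-bit instruction word as bytes, least significant byte
   first, regardless of the byte order used for data.  */
static void
put_insn_le (uint8_t *dst, uint32_t insn)
{
  dst[0] = (uint8_t) (insn & 0xff);
  dst[1] = (uint8_t) ((insn >> 8) & 0xff);
  dst[2] = (uint8_t) ((insn >> 16) & 0xff);
  dst[3] = (uint8_t) ((insn >> 24) & 0xff);
}
```

For example, the NOP encoding 0xd503201f becomes the byte sequence 1f 20 03 d5. Storing the same word through a uint32_t pointer would instead follow the data byte order.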
libgcc/ChangeLog:
* config/aarch64/heap-trampoline.c
(aarch64_trampoline_insns): Arrange to encode instructions as a
byte array so that the order is independent of memory byte order.
(struct aarch64_trampoline): Likewise.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
This allows the same trampoline pattern to be used on all linux variants
rather than restricting it to linux gnu.
PR target/113971
libgcc/ChangeLog:
* config/aarch64/heap-trampoline.c: Allow all linux variants.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Fix a typo in __gthr_win32_abs_to_rel_time that caused it to return a
relative time in seconds instead of milliseconds. As a consequence,
__gthr_win32_cond_timedwait called SleepConditionVariableCS with a
1000x shorter timeout; this caused ~1000x more spurious wakeups in
CV timed waits such as std::condition_variable::wait_for or wait_until,
resulting generally in much higher CPU usage.
This can be demonstrated by this sample program:
```
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

int main() {
    std::condition_variable cv;
    std::mutex mx;
    bool pass = false;
    auto thread_fn = [&](bool timed) {
        int wakeups = 0;
        using sc = std::chrono::system_clock;
        auto before = sc::now();
        std::unique_lock<std::mutex> ml(mx);
        if (timed) {
            cv.wait_for(ml, std::chrono::seconds(2), [&]{
                ++wakeups;
                return pass;
            });
        } else {
            cv.wait(ml, [&]{
                ++wakeups;
                return pass;
            });
        }
        printf("pass: %d; wakeups: %d; elapsed: %d ms\n", pass, wakeups,
               int((sc::now() - before) / std::chrono::milliseconds(1)));
        pass = false;
    };
    {
        // timed wait, let expire
        std::thread t(thread_fn, true);
        t.join();
    }
    {
        // timed wait, wake up explicitly after 1 second
        std::thread t(thread_fn, true);
        std::this_thread::sleep_for(std::chrono::seconds(1));
        {
            std::unique_lock<std::mutex> ml(mx);
            pass = true;
        }
        cv.notify_all();
        t.join();
    }
    {
        // non-timed wait, wake up explicitly after 1 second
        std::thread t(thread_fn, false);
        std::this_thread::sleep_for(std::chrono::seconds(1));
        {
            std::unique_lock<std::mutex> ml(mx);
            pass = true;
        }
        cv.notify_all();
        t.join();
    }
    return 0;
}
```
On builds based on non-affected threading models (e.g. POSIX on Linux,
or winpthreads or MCF on Win32) the output is something like
```
pass: 0; wakeups: 2; elapsed: 2000 ms
pass: 1; wakeups: 2; elapsed: 991 ms
pass: 1; wakeups: 2; elapsed: 996 ms
```
while with the Win32 threading model we get
```
pass: 0; wakeups: 1418; elapsed: 2000 ms
pass: 1; wakeups: 479; elapsed: 988 ms
pass: 1; wakeups: 2; elapsed: 992 ms
```
(notice the huge number of wakeups in the timed wait cases only).
This commit fixes the conversion, adjusting the final division by
NSEC100_PER_SEC to use NSEC100_PER_MSEC instead (already defined in the
file and not used in any other place, so probably just a typo).
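The effect of the wrong divisor can be sketched as follows. The constant names come from the text above; their values are simply what 100 ns ticks imply, and the two helper functions are hypothetical simplifications (the real function may differ, for example in how it rounds).

```c
#include <stdint.h>

/* Both constants count 100 ns ticks.  */
#define NSEC100_PER_SEC  10000000LL
#define NSEC100_PER_MSEC 10000LL

/* Hypothetical helper showing the typo: the result is in seconds.  */
static int64_t
rel_time_buggy (int64_t abs_ticks, int64_t now_ticks)
{
  if (abs_ticks <= now_ticks)
    return 0;
  return (abs_ticks - now_ticks) / NSEC100_PER_SEC;
}

/* Hypothetical helper with the intended divisor: milliseconds.  */
static int64_t
rel_time_fixed (int64_t abs_ticks, int64_t now_ticks)
{
  if (abs_ticks <= now_ticks)
    return 0;
  return (abs_ticks - now_ticks) / NSEC100_PER_MSEC;
}
```

For a deadline two seconds ahead, the buggy variant yields 2 and the fixed one 2000. Passed as a millisecond timeout, 2 makes the wait return almost immediately, so the caller loops until the real deadline is reached.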
libgcc/ChangeLog:
PR libgcc/113850
* config/i386/gthr-win32-cond.c (__gthr_win32_abs_to_rel_time):
fix absolute timespec to relative milliseconds count
conversion (it incorrectly returned seconds instead of
milliseconds); this avoids spurious wakeups in
__gthr_win32_cond_timedwait
Add x32 and IBT support to x86 heap trampoline implementation with a
testcase.
2024-02-13 Jakub Jelinek <jakub@redhat.com>
H.J. Lu <hjl.tools@gmail.com>
libgcc/
PR target/113855
* config/i386/heap-trampoline.c (trampoline_insns): Add IBT
support and pad to the multiple of 4 bytes. Use movabsq
instead of movabs in comments. Add -mx32 variant.
gcc/testsuite/
PR target/113855
* gcc.dg/heap-trampoline-1.c: New test.
* lib/target-supports.exp (check_effective_target_heap_trampoline):
New.
As I wrote earlier, I was seeing
FAIL: gcc.dg/torture/bitint-24.c -O0 execution test
FAIL: gcc.dg/torture/bitint-24.c -O2 execution test
with the ia32 _BitInt enablement patch on i686-linux. I thought
floatbitintxf.c was miscompiled with -O2 -march=i686 -mtune=generic, but it
turned out to be UB in it.
If a signed _BitInt to be converted to binary floating point has
(after sign extension from possible partial limb to full limb) one or
more most significant limbs equal to all ones and then in the limb below
(the most significant non-~(UBILtype)0 limb) has the most significant bit
cleared, like for 32-bit limbs
0x81582c05U, 0x0a8b01e4U, 0xc1b8b18fU, 0x2aac2a08U, -1U, -1U
then bitint_reduce_prec can't reduce it to that 0x2aac2a08U limb, so
msb is all ones and precision is negative (so it reduced precision from
161 to 192 bits down to 160 bits, in theory could go as low as 129 bits
but that wouldn't change anything on the following behavior).
But still iprec is negative, -160 here.
For that case (i.e. where we are dealing with a negative input), the
code was using 65 - __builtin_clzll (~msb) to compute how many relevant
bits we have from the msb. Unfortunately that invokes UB for msb all ones.
The right number of relevant bits in that case is 1 though (like for
-2 it is 2 and -4 or -3 3 as already computed) - all we care about from that
is that the most significant bit is set (i.e. the number is negative) and
the bits below that should be supplied from the limbs below.
So, the following patch fixes it by special casing it not to invoke UB.
For msb 0 we already have a special case from before (but that is also
different because msb 0 implies the whole number is 0 given the way
bitint_reduce_prec works - even if we have limbs like ..., 0x80000000U, 0U
the reduction can skip the most significant limb and msb then would be
the one below it), so if iprec > 0, we already don't call __builtin_clzll
on 0.
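The special case can be sketched like this. It is not the FP_FROM_BITINT macro itself: negative_msb_bits is a hypothetical helper, and it assumes a 64-bit unsigned long long and the GCC/Clang __builtin_clzll builtin.

```c
/* Hypothetical helper: number of relevant bits in the most significant
   limb MSB of a negative value.  Assumes unsigned long long is 64 bits.  */
static int
negative_msb_bits (unsigned long long msb)
{
  /* ~msb would be 0 here and __builtin_clzll (0) is undefined; only the
     sign bit is relevant in that case.  */
  if (msb == ~0ULL)
    return 1;
  return 65 - __builtin_clzll (~msb);
}
```

This matches the values quoted above: an msb of -2 gives 2, and -3 or -4 give 3.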
2024-02-13 Jakub Jelinek <jakub@redhat.com>
* soft-fp/bitint.h (FP_FROM_BITINT): If iprec < 0 and msb is all ones,
just set n to 1 instead of using __builtin_clzll (~msb).
The initial heap trampoline implementation was targeting 64b
platforms. As the PR demonstrates this creates an issue where it
is expected that the same symbols are exported for 32 and 64b.
Rather than conditionalize the exports and code-gen on x86_64,
this patch provides a basic implementation of the IA32 trampoline.
This also avoids potential user confusion, when a 32b target has
64b multilibs, and vice versa; which is the case for Darwin.
PR target/113855
gcc/ChangeLog:
* config/i386/darwin.h (DARWIN_HEAP_T_LIB): Moved to be
available to all sub-targets.
* config/i386/darwin32-biarch.h (DARWIN_HEAP_T_LIB): Delete.
* config/i386/darwin64-biarch.h (DARWIN_HEAP_T_LIB): Delete.
libgcc/ChangeLog:
* config.host: Add trampoline support to x?86-linux.
* config/i386/heap-trampoline.c (trampoline_insns): Provide
a variant for IA32.
(union ix86_trampoline): Likewise.
(__gcc_nested_func_ptr_created): Implement a basic trampoline
for IA32.
The ia32 _BitInt support revealed a bug in floatbitint?d.c.
As can be even guessed from how the code is written in the loop,
the intention was to set inexact to non-zero whenever the remainder
after division wasn't zero, but I've ended up just checking whether
the 2 least significant limbs of the remainder were non-zero.
Now, in the dfp/bitint-4.c test in one case the remainder happens
to have least significant 64 bits zero and then the higher limbs are
non-zero; with 32-bit limbs that means 2 least significant limbs are zero
and so the code acted as if it was exactly divisible.
Fixed thusly.
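The difference between the two checks can be sketched as below, assuming 32-bit limbs stored least significant first. The type and function names are hypothetical and this is not the code in the floatbitint?d.c files.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t; /* Hypothetical 32-bit limb.  */

/* Old behaviour: only the two least significant limbs are inspected.  */
static limb_t
inexact_first_two (const limb_t *rem, size_t n)
{
  return rem[0] | (n > 1 ? rem[1] : 0);
}

/* Intended behaviour: any non-zero remainder limb makes the result
   inexact.  */
static limb_t
inexact_all (const limb_t *rem, size_t n)
{
  limb_t inexact = 0;
  for (size_t i = 0; i < n; i++)
    inexact |= rem[i];
  return inexact;
}
```

For a remainder of { 0, 0, 1 } the first check reports an exact division while the second does not, which is the dfp/bitint-4.c situation described above.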
2024-02-10 Jakub Jelinek <jakub@redhat.com>
* soft-fp/floatbitintdd.c (__bid_floatbitintdd): Or in all remainder
limbs into inexact rather than just first two.
* soft-fp/floatbitintsd.c (__bid_floatbitintsd): Likewise.
* soft-fp/floatbitinttd.c (__bid_floatbitinttd): Likewise.
I've tried last night to enable _BitInt support for i?86-linux, and
a few spots in libgcc emitted -Wshift-count-overflow warnings and clearly
didn't do what it was supposed to do.
Fixed thusly.
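For readers unfamiliar with this class of warning, here is a generic illustration. It is not the actual change made to the soft-fp files, and join_limbs is a hypothetical helper.

```c
#include <stdint.h>

/* Hypothetical helper: combine two 32-bit limbs into one 64-bit value.  */
static uint64_t
join_limbs (uint32_t hi, uint32_t lo)
{
  /* Writing hi << 32 here would shift a 32-bit operand by its full
     width, which is undefined behaviour and is what
     -Wshift-count-overflow reports.  Widening first makes the shift
     well defined.  */
  return ((uint64_t) hi << 32) | lo;
}
```

Code written with 64-bit limbs in mind tends to contain shift counts that are fine for a 64-bit operand but out of range once the limb type is only 32 bits wide.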
2024-02-10 Jakub Jelinek <jakub@redhat.com>
* soft-fp/fixddbitint.c (__bid_fixddbitint): Fix up
BIL_TYPE_SIZE == 32 shifts.
* soft-fp/fixsdbitint.c (__bid_fixsdbitint): Likewise.
* soft-fp/fixtdbitint.c (__bid_fixtdbitint): Likewise.
* soft-fp/floatbitintdd.c (__bid_floatbitintdd): Likewise.
* soft-fp/floatbitinttd.c (__bid_floatbitinttd): Likewise.
Some exports were missed in the GCC-13 cycle; these are added here
along with the bitint-related ones added in GCC-14.
libgcc/ChangeLog:
* config/i386/libgcc-darwin.ver: Export bf and bitint-related
symbols.
As reported in the PR, all libgcc x86 symbol versions added after
GCC_7.0.0 were only added to i386/libgcc-glibc.ver, missing all of
libgcc-sol2.ver, libgcc-bsd.ver, and libgcc-darwin.ver.
This patch fixes this for Solaris/x86, adding all of them
(GCC_1[234].0.0) as GCC_14.0.0 to not retroactively change history.
Since this isn't the first time this happens, I've added a note to the
end of libgcc-glibc.ver to request notifying other maintainers in case
of additions.
Tested on i386-pc-solaris2.11.
2024-02-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
libgcc:
PR target/113700
* config/i386/libgcc-sol2.ver (GCC_14.0.0): Added all symbols from
i386/libgcc-glibc.ver (GCC_12.0.0, GCC_13.0.0, GCC_14.0.0).
* config/i386/libgcc-glibc.ver: Request notifications on updates.
SEH _Unwind_Resume_or_Rethrow invokes abort directly if
_Unwind_RaiseException doesn't manage to find a handler for the rethrown
exception; this is incorrect, as in this case std::terminate should be
invoked, allowing an application-provided terminate handler to handle
the situation instead of straight crashing the application through
abort.
The bug can be demonstrated with this simple test case:
===
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <exception>

static void custom_terminate_handler() {
    fprintf(stderr, "custom_terminate_handler invoked\n");
    std::exit(1);
}
int main(int argc, char *argv[]) {
    std::set_terminate(&custom_terminate_handler);
    if (argc < 2) return 1;
    const char *mode = argv[1];
    fprintf(stderr, "%s\n", mode);
    if (strcmp(mode, "throw") == 0) {
        throw std::exception();
    } else if (strcmp(mode, "rethrow") == 0) {
        try {
            throw std::exception();
        } catch (...) {
            throw;
        }
    } else {
        return 1;
    }
    return 0;
}
===
On all gcc builds with non-SEH exceptions, this will print
"custom_terminate_handler invoked" both if launched as ./a.out throw or
as ./a.out rethrow; on SEH builds it will instead work as expected only
with ./a.exe throw, but will crash with the "built-in" abort message
with ./a.exe rethrow.
This patch fixes the problem, forwarding back the error code to the
caller (__cxa_rethrow), that calls std::terminate if
_Unwind_Resume_or_Rethrow returns.
The change makes the code path coherent with SEH _Unwind_RaiseException,
and with the generic _Unwind_Resume_or_Rethrow from libgcc/unwind.inc
(used for SjLj and Dw2 exception backend).
libgcc/ChangeLog:
PR libgcc/113337
* unwind-seh.c (_Unwind_Resume_or_Rethrow): forward
_Unwind_RaiseException return code back to caller instead of
calling abort, allowing __cxa_rethrow to invoke std::terminate
in case of uncaught rethrown exception
The following testcase ends up with SIGFPE in __divmodbitint4.
The problem is a thinko in my attempt to implement Knuth's algorithm.
The algorithm does (where b is 65536, i.e. one larger than what
fits in their unsigned short word):
    // Compute estimate qhat of q[j].
    qhat = (un[j+n]*b + un[j+n-1])/vn[n-1];
    rhat = (un[j+n]*b + un[j+n-1]) - qhat*vn[n-1];
  again:
    if (qhat >= b || qhat*vn[n-2] > b*rhat + un[j+n-2])
      { qhat = qhat - 1;
        rhat = rhat + vn[n-1];
        if (rhat < b) goto again;
      }
The problem is that it uses a double-word / word -> double-word
division (and modulo), while all we have is udiv_qrnnd unless
we'd want to do further library calls, and udiv_qrnnd is a
double-word / word -> word division and modulo.
Now, as the algorithm description says, it can produce at most
word bits + 1 bit quotient. And I believe that actually the
highest qhat the original algorithm can produce is
(1 << word_bits) + 1. The algorithm performs earlier canonicalization
where both the divisor and dividend are shifted left such that divisor
has msb set. If it has msb set already before, no shifting occurs but
we start with added 0 limb, so in the first uv1:uv0 double-word uv1
is 0 and so we can't get too high qhat, if shifting occurs, the first
limb of dividend is shifted right by UWtype bits - shift count into
a new limb, so again in the first iteration in the uv1:uv0 double-word
uv1 doesn't have msb set while vv1 does and qhat has to fit into word.
In the following iterations, previous iteration should guarantee that
the previous quotient digit is correct. Even if the divisor was the
maximal possible vv1:all_ones_in_all_lower_limbs, if the old uv0:lower_limbs
would be larger or equal to the divisor, the previous quotient digit
would increase and another divisor would be subtracted, which I think
implies that in the next iteration in uv1:uv0 double-word uv1 <= vv1,
but uv0 could be up to all ones, e.g. in case of all lower limbs
of divisor being all ones and at least one dividend limb below uv0
being not all ones. So, we can e.g. for 64-bit UWtype see
uv1:uv0 / vv1 0x8000000000000000UL:0xffffffffffffffffUL / 0x8000000000000000UL
or 0xffffffffffffffffUL:0xffffffffffffffffUL / 0xffffffffffffffffUL
In all these cases (when uv1 == vv1 && uv0 >= uv1), qhat is
0x10000000000000001UL, i.e. 2 more than fits into UWtype result,
if uv1 == vv1 && uv0 < uv1 it would be 0x10000000000000000UL, i.e.
1 more than fits into UWtype result.
Because we only have udiv_qrnnd which can't deal with those too large
cases (SIGFPEs or otherwise invokes undefined behavior on those), I've
tried to handle the uv1 >= vv1 case separately, but for one thing
I thought it would be at most 1 larger than what fits, and for two
have actually subtracted vv1:vv1 from uv1:uv0 instead of subtracting
0:vv1 from uv1:uv0.
For the uv1 < vv1 case, the implementation already performs roughly
what the algorithm does.
Now, let's see what happens with the two possible extra cases in
the original algorithm.
If uv1 == vv1 && uv0 < uv1, qhat above would be b, so we take
if (qhat >= b, decrement qhat by 1 (it becomes b - 1), add
vn[n-1] aka vv1 to rhat and goto again if rhat < b (but because
qhat already fits we can goto to the again label in the uv1 < vv1
code). rhat in this case is uv0 and rhat + vv1 can but doesn't
have to overflow, say for uv0 42UL and vv1 0x8000000000000000UL
it will not (and so we should goto again), while for uv0
0x8000000000000000UL and vv1 0x8000000000000001UL it will (and
we shouldn't goto again).
If uv1 == vv1 && uv0 >= uv1, qhat above would be b + 1, so we
take if (qhat >= b, decrement qhat by 1 (it becomes b), add
vn[n-1] aka vv1 to rhat. But because vv1 has msb set and
rhat in this case is uv0 - vv1, the rhat + vv1 addition
certainly doesn't overflow, because (uv0 - vv1) + vv1 is uv0,
so in the algorithm we goto again, again take if (qhat >= b and
decrement qhat so it finally becomes b - 1, and add vn[n-1]
aka vv1 to rhat again. But this time I believe it must always
overflow, simply because we added (uv0 - vv1) + vv1 + vv1 and
vv1 has msb set, so already vv1 + vv1 must overflow. And
because it overflowed, it will not goto again.
So, I believe the following patch implements this correctly, by
subtracting vv1 from uv1:uv0 double-word once, then comparing
again if uv1 >= vv1. If that is true, subtract vv1 from uv1:uv0
again and add 2 * vv1 to rhat, no __builtin_add_overflow is needed
as we know it always overflowed and so won't goto again.
If after the first subtraction uv1 < vv1, use __builtin_add_overflow
when adding vv1 to rhat, because it can but doesn't have to overflow.
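The uv1 == vv1 reasoning above can be checked on a smaller word size. The sketch below assumes 32-bit words so that a 64-bit type can play the double-word role; both function names are hypothetical and neither is the libgcc2.c code. The first function performs the original step literally (restricted to the qhat >= b test), the second is the shortcut that needs no double-word quotient and relies on the GCC/Clang __builtin_add_overflow builtin.

```c
#include <stdint.h>

/* Literal form of the original step, restricted to the qhat >= b test,
   using a double-word quotient.  Only used to check the shortcut.  */
static void
qhat_reference (uint32_t uv1, uint32_t uv0, uint32_t vv1,
                uint64_t *qhat, uint64_t *rhat)
{
  const uint64_t b = (uint64_t) 1 << 32;
  uint64_t num = ((uint64_t) uv1 << 32) | uv0;
  *qhat = num / vv1;
  *rhat = num % vv1;
  while (*qhat >= b)
    {
      *qhat -= 1;
      *rhat += vv1;
      if (*rhat >= b)
        break;
    }
}

/* Shortcut for uv1 == vv1 where vv1 has its msb set: qhat always ends
   up as b - 1 and rhat as uv0 + vv1.  Returns non-zero if rhat still
   fits into a word, i.e. if the caller should continue at "again".  */
static int
qhat_uv1_eq_vv1 (uint32_t uv0, uint32_t vv1, uint32_t *qhat, uint32_t *rhat)
{
  *qhat = UINT32_MAX;
  return !__builtin_add_overflow (uv0, vv1, rhat);
}
```

For uv0 = 42 and vv1 = 0x80000000 the addition does not overflow, so further refinement is needed; for uv0 = 0x80000000 and vv1 = 0x80000001 it overflows and refinement stops. These mirror the 64-bit examples given earlier.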
I've added an extra testcase which tests the behavior of all the changed
cases, so it has a case where uv1:uv0 / vv1 is 1:1, where it is
1:0 and rhat + vv1 overflows and where it is 1:0 and rhat + vv1 does not
overflow, and includes tests also from Zdenek's other failing tests.
2024-02-02 Jakub Jelinek <jakub@redhat.com>
PR libgcc/113604
* libgcc2.c (__divmodbitint4): If uv1 >= vv1, subtract
vv1 from uv1:uv0 once or twice as needed, rather than
subtracting vv1:vv1.
* gcc.dg/torture/bitint-53.c: New test.
* gcc.dg/torture/bitint-55.c: New test.
Rainer pointed out that __PFX__ and __FIXPTPFX__ prefix replacement is done
solely for libgcc-std.ver.in and not for the *.ver files in config.
I've used the __PFX__ prefix even in config/i386/libgcc-glibc.ver because it
was used for similar symbols in libgcc-std.ver.in, and that results in those
symbols being STB_LOCAL in libgcc_s.so.1. Tests still work because gcc by
default uses -static-libgcc when linking (unlike g++ etc.), but would
have failed when using -shared-libgcc (but I see nothing in the testsuite
actually testing with -shared-libgcc, so am not adding tests).
With the patch, libgcc_s.so.1 now exports
__fixtfbitint@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__fixxfbitint@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitintbf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitinthf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitinttf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitintxf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
on x86_64-linux which it wasn't before.
2024-02-02 Jakub Jelinek <jakub@redhat.com>
PR target/113700
* config/i386/libgcc-glibc.ver (GCC_14.0.0): Remove __PFX__ prefixes
from symbol names.
I'm seeing hundreds of
In file included from ../../../libgcc/libgcc2.c:56:
../../../libgcc/libgcc2.h:32:13: warning: conflicting types for built-in function ‘__gcc_nested_func_ptr_created’; expected ‘void(void *, void *, void *)’
+[-Wbuiltin-declaration-mismatch]
32 | extern void __gcc_nested_func_ptr_created (void *, void *, void **);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warnings.
Either we need to add like in r14-6218
#pragma GCC diagnostic ignored "-Wbuiltin-declaration-mismatch"
(but in that case because of the libgcc2.h prototype (why is it there?)
it would need to be also with #pragma GCC diagnostic push/pop around),
or we could go with just following how the builtins are prototyped on the
compiler side and only cast to void ** when dereferencing (which is in
a single spot in each TU).
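The second option can be sketched as follows. The function below is a hypothetical stand-in with a placeholder body, not the real __gcc_nested_func_ptr_created; it only shows the shape of the prototype and the cast.

```c
#include <stddef.h>

/* Hypothetical stand-in: the last parameter is declared void * so that
   the prototype matches the compiler-side builtin...  */
static void
example_nested_func_ptr_created (void *chain, void *func, void *dst)
{
  /* Placeholder storage, not a real trampoline.  */
  static void *fake_trampoline[2];
  fake_trampoline[0] = chain;
  fake_trampoline[1] = func;
  /* ... and is cast to void ** only at the single point of dereference.  */
  *(void **) dst = fake_trampoline;
}
```

Callers are unaffected, since a void ** argument converts to void * implicitly.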
2024-02-01 Jakub Jelinek <jakub@redhat.com>
PR libgcc/113402
* libgcc2.h (__gcc_nested_func_ptr_created): Change type of last
argument from void ** to void *.
* config/i386/heap-trampoline.c (__gcc_nested_func_ptr_created):
Change type of dst from void ** to void * and cast dst to void **
before dereferencing it.
* config/aarch64/heap-trampoline.c (__gcc_nested_func_ptr_created):
Likewise.
I'm seeing
../../../libgcc/shared-object.mk:14: warning: overriding recipe for target 'heap-trampoline.o'
../../../libgcc/shared-object.mk:14: warning: ignoring old recipe for target 'heap-trampoline.o'
../../../libgcc/shared-object.mk:17: warning: overriding recipe for target 'heap-trampoline_s.o'
../../../libgcc/shared-object.mk:17: warning: ignoring old recipe for target 'heap-trampoline_s.o'
This patch fixes that.
2024-02-01 Jakub Jelinek <jakub@redhat.com>
PR libgcc/113403
* config/i386/t-heap-trampoline: Add to LIB2ADDEHSHARED
i386/heap-trampoline.c rather than aarch64/heap-trampoline.c.
Use aarch64-asm.h in asm code consistently, this was started in
commit c608ada288
Author: Zac Walker <zacwalker@microsoft.com>
CommitDate: 2024-01-23 15:32:30 +0000
Ifdef `.hidden`, `.type`, and `.size` pseudo-ops for `aarch64-w64-mingw32` target
But that commit failed to remove some existing markings from asm files,
which means some objects got double marked with gnu property notes.
libgcc/ChangeLog:
* config/aarch64/crti.S: Remove stack marking.
* config/aarch64/crtn.S: Remove stack marking, include aarch64-asm.h
* config/aarch64/lse.S: Remove stack and GNU property markings.
System security constraints apply during GCC build and test, and most
platform versions cannot link to libgcc_eh because the unwinder there is
incompatible with the system one. To handle both:
1. We make the support functions weak definitions.
2. We include them as a CRT for platform conditions that do not
allow libgcc_eh.
3. We ensure that the weak symbols are exported from DSOs (which
includes exes on Darwin) so that the dynamic linker will
pick one instance (which avoids duplication of trampoline
caches).
PR libgcc/113403
gcc/ChangeLog:
* config/darwin.h (DARWIN_SHARED_WEAK_ADDS, DARWIN_WEAK_CRTS): New.
(REAL_LIBGCC_SPEC): Move weak CRT handling to separate spec.
* config/i386/darwin.h (DARWIN_HEAP_T_LIB): New.
* config/i386/darwin32-biarch.h (DARWIN_HEAP_T_LIB): New.
* config/i386/darwin64-biarch.h (DARWIN_HEAP_T_LIB): New.
* config/rs6000/darwin.h (DARWIN_HEAP_T_LIB): New.
libgcc/ChangeLog:
* config.host: Build libheap_t.a for i686/x86_64 Darwin.
* config/aarch64/heap-trampoline.c (HEAP_T_ATTR): New.
(allocate_tramp_ctrl): Allow a target to build this as a weak def.
(__gcc_nested_func_ptr_created): Likewise.
* config/i386/heap-trampoline.c (HEAP_T_ATTR): New.
(allocate_tramp_ctrl): Allow a target to build this as a weak def.
(__gcc_nested_func_ptr_created): Likewise.
* config/t-darwin: Build libheap_t.a (a CRT with heap trampoline
support).
This removes the heap trampoline support functions from libgcc.a and
adds them to libgcc_eh.a. They are also present in libgcc_s.
PR libgcc/113403
libgcc/ChangeLog:
* config/aarch64/t-heap-trampoline: Move the heap trampoline
support functions from libgcc.a to libgcc_eh.a.
* config/i386/t-heap-trampoline: Likewise.
The symbols for the functions supporting heap-based trampolines were
exported at an incorrect symbol version, the following patch fixes that.
As requested in the PR, this also renames __builtin_nested_func_ptr* to
__gcc_nested_func_ptr*. In carrying out the rename, we move the builtins
to use DEF_EXT_LIB_BUILTIN.
PR libgcc/113402
gcc/ChangeLog:
* builtins.cc (expand_builtin): Handle BUILT_IN_GCC_NESTED_PTR_CREATED
and BUILT_IN_GCC_NESTED_PTR_DELETED.
* builtins.def (BUILT_IN_GCC_NESTED_PTR_CREATED,
BUILT_IN_GCC_NESTED_PTR_DELETED): Make these builtins LIB-EXT and
rename the library fallbacks to __gcc_nested_func_ptr_created and
__gcc_nested_func_ptr_deleted.
* doc/invoke.texi: Rename these to __gcc_nested_func_ptr_created
and __gcc_nested_func_ptr_deleted.
* tree-nested.cc (finalize_nesting_tree_1): Use builtin_explicit for
BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED.
* tree.cc (build_common_builtin_nodes): Build the
BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED local
builtins only for non-explicit.
libgcc/ChangeLog:
* config/aarch64/heap-trampoline.c: Rename
__builtin_nested_func_ptr_created to __gcc_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted to __gcc_nested_func_ptr_deleted.
* config/i386/heap-trampoline.c: Likewise.
* libgcc2.h: Likewise.
* libgcc-std.ver.in (GCC_7.0.0): Likewise and then move
__gcc_nested_func_ptr_created and
__gcc_nested_func_ptr_deleted from this symbol version to ...
(GCC_14.0.0): ... this one.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Co-authored-by: Jakub Jelinek <jakub@redhat.com>
This is enough to get gfx1030 and gfx1100 working; there are still some test
failures to investigate, and probably some tuning to do.
gcc/ChangeLog:
* config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3.
* config/gcn/gcn-valu.md (all_convert): New iterator.
(<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New
define_expand, and rename the old one to ...
(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this.
(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ...
(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this.
(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New.
* config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly.
(gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100.
* config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3.
(<u>mulqihi3_scalar): Likewise.
libgcc/ChangeLog:
* config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3.
libgomp/ChangeLog:
* config/gcn/time.c (RTC_TICKS): Configure RDNA3.
(omp_get_wtime): Add RDNA3-compatible variant.
* plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100.
Signed-off-by: Andrew Stubbs <ams@baylibre.com>
A recent
change (https://gcc.gnu.org/pipermail/gcc-cvs/2023-December/394915.html)
added generic SME support using `.hidden`, `.type`, and `.size`
pseudo-ops in the assembly sources; `aarch64-w64-mingw32` does not
support these pseudo-ops, though. This patch wraps usage of those
pseudo-ops in macros and ifdefs them on the `__ELF__` define.
libgcc/
* config/aarch64/aarch64-asm.h (HIDDEN, SYMBOL_SIZE, SYMBOL_TYPE)
(ENTRY_ALIGN, GNU_PROPERTY): New macros.
* config/aarch64/__arm_sme_state.S: Use them.
* config/aarch64/__arm_tpidr2_save.S: Likewise.
* config/aarch64/__arm_za_disable.S: Likewise.
* config/aarch64/crti.S: Likewise.
* config/aarch64/lse.S: Likewise.
As discussed on IRC, the following patch uses may_alias attribute, so that
on targets like aarch64 where abi_limb_mode != limb_mode the library
accesses the limbs (half limbs of the ABI) in the arrays with conservative
alias set.
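A minimal sketch of what the attribute buys, using GCC's may_alias type attribute. The typedef and helper names are hypothetical and only illustrate the pattern, not the libgcc2.h definition.

```c
#include <stdint.h>

/* A 32-bit limb type that may alias any other object, like char does.  */
typedef uint32_t half_limb_t __attribute__ ((__may_alias__));

/* Hypothetical helper: sum the two 32-bit halves of a 64-bit object.
   Without may_alias, reading a uint64_t through a plain uint32_t lvalue
   would violate the aliasing rules.  */
static uint64_t
sum_halves (const uint64_t *p)
{
  const half_limb_t *h = (const half_limb_t *) p;
  return (uint64_t) h[0] + h[1];
}
```

The sum is used here because it does not depend on which half comes first in memory; for 0x0000000100000002 it is 3 on either byte order.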
2024-01-12 Jakub Jelinek <jakub@redhat.com>
* libgcc2.h (UBILtype): New typedef with may_alias attribute.
(__mulbitint3, __divmodbitint4): Use UBILtype * instead of
UWtype * and const UBILtype * instead of const UWtype *.
* libgcc2.c (bitint_reduce_prec, bitint_mul_1, bitint_addmul_1,
__mulbitint3, bitint_negate, bitint_submul_1, __divmodbitint4):
Likewise.
* soft-fp/bitint.h (UBILtype): Change define into a typedef with
may_alias attribute.
Exception handling on nios2-linux-gnu with -fpic has been broken since
revision 790854ea76, "Use _dl_find_object
in _Unwind_Find_FDE". For whatever reason, this doesn't work on nios2.
Nios2 uses the GOT address as the base for DW_EH_PE_datarel
relocations in PIC; see my previous fix to make this work, revision
2d33dcfe9f, "Support for GOT-relative
DW_EH_PE_datarel encoding". So this may be a horrible bug in the ABI
or in my interpretation of it or just glibc's implementation of
_dl_find_object for this target, but there's existing code out there
that does things this way; and realistically, nobody is going to
re-engineer this now that the vendor has EOL'ed the nios2
architecture. So, just skip over the code trying to use
_dl_find_object on this target and fall back to the way that works.
I plan to backport this patch to the GCC 12 and GCC 13 branches as well.
libgcc/ChangeLog
* unwind-dw2-fde-dip.c (_Unwind_Find_FDE): Do not try to use
_dl_find_object on nios2; it doesn't work.
For now, for single-threaded GCN, nvptx target use only; extension for
multi-threaded offloading use is to follow later. Eventually switch to
libstdc++-v3/libsupc++ proper.
libgcc/
* c++-minimal/README: New.
* c++-minimal/guard.c: New.
* config/gcn/t-amdgcn (LIB2ADD): Add it.
* config/nvptx/t-nvptx (LIB2ADD): Likewise.
If we allow __strub_leave to allocate a frame on sparc, it will
overlap with a lot of the stack range we're supposed to scrub, because
of the large fixed-size outgoing args and register save area.
Unfortunately, setting up the PIC register seems to prevent the frame
pointer from being omitted.
Since the strub runtime doesn't issue calls or use global variables,
at least on sparc, disabling PIC to compile strub.c seems to do the
right thing.
for libgcc/ChangeLog
PR middle-end/112917
* config.host (sparc, sparc64): Enable...
* config/sparc/t-sparc: ... this new fragment.
The strub builtins are not suited for cross-unit inlining, they should
only be inlined by the builtin expanders, if at all. While testing on
sparc64, it occurred to me that, if libgcc was built with LTO enabled,
lto1 might inline them, and that would likely break things. So, make
sure they're clearly marked as not inlinable.
for libgcc/ChangeLog
* strub.c (ATTRIBUTE_NOINLINE): New.
(ATTRIBUTE_STRUB_CALLABLE): Add it.
(__strub_dummy_force_no_leaf): Drop it.
This adds initial support for function multiversioning on aarch64 using
the target_version and target_clones attributes. This loosely follows
the Beta specification in the ACLE [1], although with some differences
that still need to be resolved (possibly as follow-up patches).
Existing function multiversioning implementations are broken in various
ways when used across translation units. This includes placing
resolvers in the wrong translation units, and using symbol mangling that
causes callers to unintentionally bypass the resolver in some circumstances.
Fixing these issues for aarch64 will require modifications to our ACLE
specification. It will also require further adjustments to existing
middle end code, to facilitate different mangling and resolver
placement while preserving existing target behaviours.
The list of function multiversioning features specified in the ACLE is
also inconsistent with the list of features supported in target option
extensions. I intend to resolve some or all of these inconsistencies at
a later stage.
The target_version attribute is currently only supported in C++, since
this is the only frontend with existing support for multiversioning
using the target attribute. On the other hand, this patch happens to
enable multiversioning with the target_clones attribute in Ada and D, as
well as the entire C family, using their existing frontend support.
This patch also does not support the following aspects of the Beta
specification:
- The target_clones attribute should allow an implicit unlisted
"default" version.
- There should be an option to disable function multiversioning at
compile time.
- Unrecognised target names in a target_clones attribute should be
ignored (with an optional warning). This current patch raises an
error instead.
[1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
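As a rough illustration of the target_clones usage described above, here
is a sketch in C. The function dot, the CLONES macro and the choice of the
"sve" feature name are assumptions for illustration; the guard is a
conservative guess at where the attribute is usable, and on any other
configuration a single plain version is built.

```c
#include <assert.h>

/* Only request clones where this patch is expected to apply;
   elsewhere CLONES expands to nothing.  The guard is an assumption.  */
#if defined (__aarch64__) && defined (__GNUC__) && !defined (__clang__) \
    && __GNUC__ >= 14 && defined (__GLIBC__)
#define CLONES __attribute__ ((target_clones ("default", "sve")))
#else
#define CLONES
#endif

/* "default" is listed explicitly, because the implicit unlisted
   default version is not supported by this patch.  */
CLONES int
dot (const int *a, const int *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}
```

Callers simply call dot; where clones are generated, the resolver picks a
version at load time.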
gcc/ChangeLog:
* config/aarch64/aarch64-feature-deps.h (fmv_deps_<FEAT_NAME>):
Define aarch64_feature_flags mask for each FMV feature.
* config/aarch64/aarch64-option-extensions.def: Use new macros
to define FMV feature extensions.
* config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
Check for target_version attribute after processing target
attribute.
(aarch64_fmv_feature_data): New.
(aarch64_parse_fmv_features): New.
(aarch64_process_target_version_attr): New.
(aarch64_option_valid_version_attribute_p): New.
(get_feature_mask_for_version): New.
(compare_feature_masks): New.
(aarch64_compare_version_priority): New.
(build_ifunc_arg_type): New.
(make_resolver_func): New.
(add_condition_to_bb): New.
(dispatch_function_versions): New.
(aarch64_generate_version_dispatcher_body): New.
(aarch64_get_function_versions_dispatcher): New.
(aarch64_common_function_versions): New.
(aarch64_mangle_decl_assembler_name): New.
(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
(TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
(TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
(TARGET_COMPARE_VERSION_PRIORITY): New implementation.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
* config/aarch64/aarch64.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE):
Set target macro.
* config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
new value to report duplicate FMV feature.
* common/config/aarch64/cpuinfo.h: New file.
libgcc/ChangeLog:
* config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared
copy in gcc/common
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/options_set_17.c: Reorder expected flags.
* gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_21.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.
This is added to enable function multiversioning, but can also be used
directly. The interface is chosen to match that used in LLVM's
compiler-rt, to facilitate cross-compiler compatibility.
The content of the patch is derived almost entirely from Pavel's prior
contributions to compiler-rt/lib/builtins/cpu_model.c. I have made minor
changes to align more closely with GCC coding style, and to exclude any code
from other LLVM contributors, and am adding this to GCC with Pavel's approval.
libgcc/ChangeLog:
* config/aarch64/t-aarch64: Include cpuinfo.c
* config/aarch64/cpuinfo.c: New file
(__init_cpu_features_constructor) New.
(__init_cpu_features_resolver) New.
(__init_cpu_features) New.
Co-authored-by: Pavel Iliin <Pavel.Iliin@arm.com>
This patch introduces an rwlock and uses it instead of the mutex to
split reads and writes of the unit_root tree and unit_cache, in order
to increase CPU efficiency. In the get_gfc_unit function, only around
30% of calls step into the insert_unit function; in most instances
the unit can be found while reading the unit_cache or the unit_root
tree. Splitting the read and write phases with an rwlock is therefore
an approach to make the lookup more parallel.
BTW, the IPC metrics can gain around 9x in our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT
libgcc/ChangeLog:
* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
(__gthrw): New function.
(__gthread_rwlock_rdlock): New function.
(__gthread_rwlock_tryrdlock): New function.
(__gthread_rwlock_wrlock): New function.
(__gthread_rwlock_trywrlock): New function.
(__gthread_rwlock_unlock): New function.
libgfortran/ChangeLog:
* io/async.c (DEBUG_LINE): New macro.
* io/async.h (RWLOCK_DEBUG_ADD): New macro.
(CHECK_RDLOCK): New macro.
(CHECK_WRLOCK): New macro.
(TAIL_RWLOCK_DEBUG_QUEUE): New macro.
(IN_RWLOCK_DEBUG_QUEUE): New macro.
(RDLOCK): New macro.
(WRLOCK): New macro.
(RWUNLOCK): New macro.
(RD_TO_WRLOCK): New macro.
(INTERN_RDLOCK): New macro.
(INTERN_WRLOCK): New macro.
(INTERN_RWUNLOCK): New macro.
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
* io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
unit_rwlock instead of LOCK and UNLOCK on unit_lock.
(st_write_done_worker): Likewise.
* io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
comment. Use unit_rwlock variable instead of unit_lock variable.
(get_gfc_unit_from_unit_root): New function.
(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
LOCK and UNLOCK on unit_lock.
(close_units): Likewise.
(newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
unit_lock.
* io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
of LOCK and UNLOCK on unit_lock.
Some targets do not provide a prototype for fork, and compilation now
fails with an implicit-function-declaration error.
libgcc/
* libgcov-interface.c (__gcov_fork): Use __builtin_fork instead
of fork.
It was updated incorrectly in
commit dbbfb52b0e
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
CommitDate: 2023-12-08 11:29:06 +0000
libgcc: aarch64: Configure check for __getauxval
so regenerate it.
libgcc/ChangeLog:
* config.in: Regenerate.
To support the ZA lazy save scheme, the PCS requires the unwinder to
reset the SME state to PSTATE.SM=0, PSTATE.ZA=0, TPIDR2_EL0=0 on entry
to an exception handler. We use the __arm_za_disable SME runtime call
unconditionally to achieve this.
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#exceptions
The hidden alias is used to avoid a PLT and avoid inconsistent VPCS
marking (we don't rely on special PCS at the call site). In case of
static linking the SME runtime init code is linked in code that raises
exceptions.
libgcc/ChangeLog:
* config/aarch64/__arm_za_disable.S: Add hidden alias.
* config/aarch64/aarch64-unwind.h: Reset the SME state before
EH return via the _Unwind_Frames_Extra hook.
The call ABI for SME (Scalable Matrix Extension) requires a number of
helper routines which are added to libgcc so they are tied to the
compiler version instead of the libc version. See
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#sme-support-routines
The routines are in shared libgcc and static libgcc eh, even though
they are not related to exception handling. This is to avoid linking
a copy of the routines into dynamic linked binaries, because TPIDR2_EL0
block can be extended in the future which is better to handle in a
single place per process.
The support routines have to decide if SME is accessible or not. Linux
tells userspace if SME is accessible via AT_HWCAP2, otherwise a new
__aarch64_sme_accessible symbol was introduced that a libc can define.
Due to libgcc and libc build order, the symbol availability cannot be
checked so for __aarch64_sme_accessible an unistd.h feature test macro
is used while such detection mechanism is not available for __getauxval
so we rely on configure checks based on the target triplet.
Asm helper code is added to make writing the routines easier.
libgcc/ChangeLog:
* config/aarch64/t-aarch64: Add sources to the build.
* config/aarch64/__aarch64_have_sme.c: New file.
* config/aarch64/__arm_sme_state.S: New file.
* config/aarch64/__arm_tpidr2_restore.S: New file.
* config/aarch64/__arm_tpidr2_save.S: New file.
* config/aarch64/__arm_za_disable.S: New file.
* config/aarch64/aarch64-asm.h: New file.
* config/aarch64/libgcc-sme.ver: New file.
Add configure check for the __getauxval ABI symbol, which is always
available on aarch64 glibc, and may be available on other linux C
runtimes. For now only enabled on glibc, others have to override it
target_configargs=libgcc_cv_have___getauxval=yes
This is deliberately obscure as it should be auto detected, ideally
via a feature test macro in unistd.h (link time detection is not
possible since the libc may not be installed at libgcc build time),
but currently there is no such feature test mechanism.
Without __getauxval, libgcc cannot do runtime CPU feature detection
and has to assume only the build time known features are available.
libgcc/ChangeLog:
* config.in: Undef HAVE___GETAUXVAL.
* configure: Regenerate.
* configure.ac: Check for __getauxval.
Ideally SME support routines in libgcc are marked as variant PCS symbols
so check if as supports the directive.
libgcc/ChangeLog:
* config.in: Undef HAVE_AS_VARIANT_PCS.
* configure: Regenerate.
* configure.ac: Check for .variant_pcs.
When libgcc is being built in --disable-tls configuration or on
a target without native TLS support, one gets annoying warnings:
../../../../libgcc/emutls.c:61:7: warning: conflicting types for built-in function ‘__emutls_get_address’; expected ‘void *(void *)’ [-Wbuiltin-declaration-mismatch]
61 | void *__emutls_get_address (struct __emutls_object *);
| ^~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:63:6: warning: conflicting types for built-in function ‘__emutls_register_common’; expected ‘void(void *, unsigned int, unsigned int, void *)’ [-Wbuiltin-declaration-mismatch]
63 | void __emutls_register_common (struct __emutls_object *, word, word, void *);
| ^~~~~~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:140:1: warning: conflicting types for built-in function ‘__emutls_get_address’; expected ‘void *(void *)’ [-Wbuiltin-declaration-mismatch]
140 | __emutls_get_address (struct __emutls_object *obj)
| ^~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:204:1: warning: conflicting types for built-in function ‘__emutls_register_common’; expected ‘void(void *, unsigned int, unsigned int, void *)’ [-Wbuiltin-declaration-mismatch]
204 | __emutls_register_common (struct __emutls_object *obj,
| ^~~~~~~~~~~~~~~~~~~~~~~~
The thing is that in that case __emutls_get_address and
__emutls_register_common are builtins, and are declared with void *
arguments rather than struct __emutls_object *.
Now, struct __emutls_object is a type private to libgcc/emutls.c and the
middle-end creates on demand when calling the builtins a similar structure
(with small differences, like not having the union in there).
We have a precedent for this e.g. for fprintf or strftime builtins where
the builtins are created with magic fileptr_type_node or const_tm_ptr_type_node
types and then match it with user definition of pointers to some structure,
but I think for this case users should never define these functions
themselves nor call them and having special types for them in the compiler
would mean extra compile time spent during compiler initialization and more
GC data, so I think it is better to keep the compiler as is.
On the library side, there is an option to just follow what the
compiler is doing and do
EMUTLS_ATTR void
-__emutls_register_common (struct __emutls_object *obj,
+__emutls_register_common (void *xobj,
word size, word align, void *templ)
{
+ struct __emutls_object *obj = (struct __emutls_object *) xobj;
but that will make e.g. libabigail complain about ABI change in libgcc.
So, the patch just turns the warning off.
2023-12-06 Thomas Schwinge <thomas@codesourcery.com>
Jakub Jelinek <jakub@redhat.com>
PR libgcc/109289
* emutls.c: Add GCC diagnostic ignored "-Wbuiltin-declaration-mismatch"
pragma.
This patch adds the strub attribute for function and variable types,
command-line options, passes and adjustments to implement it,
documentation, and tests.
Stack scrubbing is implemented in a machine-independent way: functions
with strub enabled are modified so that they take an extra stack
watermark argument, that they update with their stack use, and the
caller can then zero it out once it regains control, whether by return
or exception. There are two ways to go about it: at-calls, that
modifies the visible interface (signature) of the function, and
internal, in which the body is moved to a clone, the clone undergoes
the interface change, and the function becomes a wrapper, preserving
its original interface, that calls the clone and then clears the stack
used by it.
Variables can also be annotated with the strub attribute, so that
functions that read from them get stack scrubbing enabled implicitly,
whether at-calls, for functions only usable within a translation unit,
or internal, for functions whose interfaces must not be modified.
There is a strict mode, in which functions that have their stack
scrubbed can only call other functions with stack-scrubbing
interfaces, or those explicitly marked as callable from strub
contexts, so that an entire call chain gets scrubbing, at once or
piecemeal depending on optimization levels. In the default mode,
relaxed, this requirement is not enforced by the compiler.
The implementation adds two IPA passes, one that assigns strub modes
early on, another that modifies interfaces and adds calls to the
builtins that jointly implement stack scrubbing. Another builtin,
that obtains the stack pointer, is added for use in the implementation
of the builtins, whether expanded inline or called in libgcc.
There are new command-line options to change operation modes and to
force the feature disabled; it is enabled by default, but it has no
effect and is implicitly disabled if the strub attribute is never
used. There are also options meant to be used for testing the feature,
enabling different strubbing modes for all (viable) functions.
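A minimal usage sketch of the attribute in its internal mode. The
function checksum_secret and the STRUB_INTERNAL macro are hypothetical;
the version guard is an assumption about which compilers accept the
attribute, and elsewhere the function is built without scrubbing.

```c
#include <assert.h>

/* Guarded because compilers without this patch would only warn about
   an unknown attribute.  The version check is an assumption.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 14
#define STRUB_INTERNAL __attribute__ ((strub ("internal")))
#else
#define STRUB_INTERNAL
#endif

/* Hypothetical function handling sensitive data.  In internal mode
   the visible interface is unchanged: the body moves to a clone and
   this function becomes a wrapper that scrubs the clone's stack.  */
int STRUB_INTERNAL
checksum_secret (const unsigned char *key, int len)
{
  unsigned char scratch[32];	/* Stack copy that scrubbing clears.  */
  int sum = 0;
  for (int i = 0; i < len && i < 32; i++)
    {
      scratch[i] = key[i] ^ 0x5a;
      sum += scratch[i];
    }
  return sum;
}
```

Callers need no changes in internal mode; with at-calls the caller would
have to see the attribute too, since the signature changes.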
for gcc/ChangeLog
* Makefile.in (OBJS): Add ipa-strub.o.
(GTFILES): Add ipa-strub.cc.
* builtins.def (BUILT_IN_STACK_ADDRESS): New.
(BUILT_IN___STRUB_ENTER): New.
(BUILT_IN___STRUB_UPDATE): New.
(BUILT_IN___STRUB_LEAVE): New.
* builtins.cc: Include ipa-strub.h.
(STACK_STOPS, STACK_UNSIGNED): Define.
(expand_builtin_stack_address): New.
(expand_builtin_strub_enter): New.
(expand_builtin_strub_update): New.
(expand_builtin_strub_leave): New.
(expand_builtin): Call them.
* common.opt (fstrub=*): New options.
* doc/extend.texi (strub): New type attribute.
(__builtin_stack_address): New function.
(Stack Scrubbing): New section.
* doc/invoke.texi (-fstrub=*): New options.
(-fdump-ipa-*): New passes.
* gengtype-lex.l: Ignore multi-line pp-directives.
* ipa-inline.cc: Include ipa-strub.h.
(can_inline_edge_p): Test strub_inlinable_to_p.
* ipa-split.cc: Include ipa-strub.h.
(execute_split_functions): Test strub_splittable_p.
* ipa-strub.cc, ipa-strub.h: New.
* passes.def: Add strub_mode and strub passes.
* tree-cfg.cc (gimple_verify_flow_info): Note on debug stmts.
* tree-pass.h (make_pass_ipa_strub_mode): Declare.
(make_pass_ipa_strub): Declare.
(make_pass_ipa_function_and_variable_visibility): Fix
formatting.
* tree-ssa-ccp.cc (optimize_stack_restore): Keep restores
before strub leave.
* attribs.cc: Include ipa-strub.h.
(decl_attributes): Support applying attributes to function
type, rather than pointer type, at handler's request.
(comp_type_attributes): Combine strub_comptypes and target
comp_type results.
* doc/tm.texi.in (TARGET_STRUB_USE_DYNAMIC_ARRAY): New.
(TARGET_STRUB_MAY_USE_MEMSET): New.
* doc/tm.texi: Rebuilt.
* cgraph.h (symtab_node::reset): Add preserve_comdat_group
param, with a default.
* cgraphunit.cc (symtab_node::reset): Use it.
for gcc/c-family/ChangeLog
* c-attribs.cc: Include ipa-strub.h.
(handle_strub_attribute): New.
(c_common_attribute_table): Add strub.
for gcc/ada/ChangeLog
* gcc-interface/trans.cc: Include ipa-strub.h.
(gigi): Make internal decls for targets of compiler-generated
calls strub-callable too.
(build_raise_check): Likewise.
* gcc-interface/utils.cc: Include ipa-strub.h.
(handle_strub_attribute): New.
(gnat_internal_attribute_table): Add strub.
for gcc/testsuite/ChangeLog
* c-c++-common/strub-O0.c: New.
* c-c++-common/strub-O1.c: New.
* c-c++-common/strub-O2.c: New.
* c-c++-common/strub-O2fni.c: New.
* c-c++-common/strub-O3.c: New.
* c-c++-common/strub-O3fni.c: New.
* c-c++-common/strub-Og.c: New.
* c-c++-common/strub-Os.c: New.
* c-c++-common/strub-all1.c: New.
* c-c++-common/strub-all2.c: New.
* c-c++-common/strub-apply1.c: New.
* c-c++-common/strub-apply2.c: New.
* c-c++-common/strub-apply3.c: New.
* c-c++-common/strub-apply4.c: New.
* c-c++-common/strub-at-calls1.c: New.
* c-c++-common/strub-at-calls2.c: New.
* c-c++-common/strub-defer-O1.c: New.
* c-c++-common/strub-defer-O2.c: New.
* c-c++-common/strub-defer-O3.c: New.
* c-c++-common/strub-defer-Os.c: New.
* c-c++-common/strub-internal1.c: New.
* c-c++-common/strub-internal2.c: New.
* c-c++-common/strub-parms1.c: New.
* c-c++-common/strub-parms2.c: New.
* c-c++-common/strub-parms3.c: New.
* c-c++-common/strub-relaxed1.c: New.
* c-c++-common/strub-relaxed2.c: New.
* c-c++-common/strub-short-O0-exc.c: New.
* c-c++-common/strub-short-O0.c: New.
* c-c++-common/strub-short-O1.c: New.
* c-c++-common/strub-short-O2.c: New.
* c-c++-common/strub-short-O3.c: New.
* c-c++-common/strub-short-Os.c: New.
* c-c++-common/strub-strict1.c: New.
* c-c++-common/strub-strict2.c: New.
* c-c++-common/strub-tail-O1.c: New.
* c-c++-common/strub-tail-O2.c: New.
* c-c++-common/torture/strub-callable1.c: New.
* c-c++-common/torture/strub-callable2.c: New.
* c-c++-common/torture/strub-const1.c: New.
* c-c++-common/torture/strub-const2.c: New.
* c-c++-common/torture/strub-const3.c: New.
* c-c++-common/torture/strub-const4.c: New.
* c-c++-common/torture/strub-data1.c: New.
* c-c++-common/torture/strub-data2.c: New.
* c-c++-common/torture/strub-data3.c: New.
* c-c++-common/torture/strub-data4.c: New.
* c-c++-common/torture/strub-data5.c: New.
* c-c++-common/torture/strub-indcall1.c: New.
* c-c++-common/torture/strub-indcall2.c: New.
* c-c++-common/torture/strub-indcall3.c: New.
* c-c++-common/torture/strub-inlinable1.c: New.
* c-c++-common/torture/strub-inlinable2.c: New.
* c-c++-common/torture/strub-ptrfn1.c: New.
* c-c++-common/torture/strub-ptrfn2.c: New.
* c-c++-common/torture/strub-ptrfn3.c: New.
* c-c++-common/torture/strub-ptrfn4.c: New.
* c-c++-common/torture/strub-pure1.c: New.
* c-c++-common/torture/strub-pure2.c: New.
* c-c++-common/torture/strub-pure3.c: New.
* c-c++-common/torture/strub-pure4.c: New.
* c-c++-common/torture/strub-run1.c: New.
* c-c++-common/torture/strub-run2.c: New.
* c-c++-common/torture/strub-run3.c: New.
* c-c++-common/torture/strub-run4.c: New.
* c-c++-common/torture/strub-run4c.c: New.
* c-c++-common/torture/strub-run4d.c: New.
* c-c++-common/torture/strub-run4i.c: New.
* g++.dg/strub-run1.C: New.
* g++.dg/torture/strub-init1.C: New.
* g++.dg/torture/strub-init2.C: New.
* g++.dg/torture/strub-init3.C: New.
* gnat.dg/strub_attr.adb, gnat.dg/strub_attr.ads: New.
* gnat.dg/strub_ind.adb, gnat.dg/strub_ind.ads: New.
for libgcc/ChangeLog
* Makefile.in (LIB2ADD): Add strub.c.
* libgcc2.h (__strub_enter, __strub_update, __strub_leave):
Declare.
* strub.c: New.
* libgcc-std.ver.in (__strub_enter): Add to GCC_14.0.0.
(__strub_update, __strub_leave): Likewise.
read_encoded_value_with_base has an ifdef'd code path conditional on __FDPIC__
which was calling _Unwind_gnu_Find_got without a prototype. This naturally
caused various build failures.
This adds a suitable prototype.
Pushed to the trunk.
libgcc
* unwind-pe.h (_Unwind_gnu_Find_got): Add prototype.
The rx port has a bunch of what I presume are ABI compatibility functions in
libgcc. Those compatibility functions call routines such as __eqdf2 from libgcc,
but without a prototype. This patch adds the missing prototypes.
libgcc/
* config/rx/rx-abi-functions.c (__ltdf2, __gtdf2): Add prototype.
(__ledf2, __gedf2, __eqdf2, __nedf2): Likewise.
(__ltsf2, __gtsf2, __lesf2, __gesf2, __eqsf2, __nesf2): Likewise.
Two issues prevent the frv-elf port from building after the C99 changes. First
the trampoline code emitted into libgcc has calls to exit, but no prototype.
Adding a trivial prototype for exit() into the macro fixes that little goof.
Second, frvbegin.c has a call to atexit, so a quick prototype is added into
frvbegin.c to fix that problem.
That's enough to get the compiler building again.
gcc/
* config/frv/frv.h (TRANSFER_FROM_TRAMPOLINE): Add prototype for exit.
libgcc/
* config/frv/frvbegin.c (atexit): Add prototype.
The libgcc-exported runtime component of control flow redundancy
hardening was missing symbol versioning information. Add it.
for libgcc/ChangeLog
* libgcc-std.ver.in (__hardcfr_check): Add to GCC_14.0.0.
__sync_val_compare_and_swap may be used on 128-bit types and either calls the
outline atomic code or uses an inline loop. On AArch64 LDXP is only atomic if
the value is stored successfully using STXP, but the current implementations
do not perform the store if the comparison fails. In this case the value
returned is not read atomically.
gcc/ChangeLog:
PR target/111404
* config/aarch64/aarch64.cc (aarch64_split_compare_and_swap):
For 128-bit store the loaded value and loop if needed.
libgcc/ChangeLog:
PR target/111404
* config/aarch64/lse.S (__aarch64_cas16_acq_rel): Execute STLXP using
either new value or loaded value.
My previous patch to add an implementation of __sync_synchronize with
a warning trips a testsuite failure in fortran (and possibly other
languages as well) as the framework expects no blank lines in the
output, but this warning was generating one. So remove the newline
from the end of the message and rely on the one added by the linker
instead.
Since we're there, remove the trailing period from the message as
well, since the convention seems to be not to have one.
libgcc/
* config/arm/lib1funcs.S (__sync_synchronize): Adjust warning message.
Prior to Armv6 there was no architected method to synchronize data
across processors. Armv6 saw the first introduction of
multi-processor support, using a CP15 operation; but this was
deprecated in Armv7 and is not supported on m-profile devices of any
form. Armv7 (and armv6-m) and later support data synchronization via
the DMB instruction.
This all leads to difficulties when linking programs as the user
generally needs to know which synchronization method is needed, but
there seems no easy way around this, when there are no OS-related
primitives available.
I've addressed this by adding multiple variants of __sync_synchronize
to libgcc, one for each of the above use cases. I've named these
__sync_synchronize_none, __sync_synchronize_cp15dmb and
__sync_synchronize_dmb. I've also added three specs files that can be
used to direct the linker to pick the appropriate implementation.
Using specs fragments for this is preferable to directing the user to
directly use --defsym as the latter has to be placed at the correct
position on the command line to be effective and the spec rule ensures
this automatically.
I've also added a default implementation of __sync_synchronize. The
default implementation will use DMB if that is available in the target
ISA, or fall back to a null implementation if it isn't. In the latter
case it will cause the linker (GNU LD) to emit a warning that
specifies how to pick a specific implementation. I've chosen not to
permit this default to use the CP15 solution as that has been
deprecated.
libgcc:
* config.host (arm*-*-eabi* | arm*-*-rtems*):
Add arm/t-sync to the makefile rules.
* config/arm/lib1funcs.S (__sync_synchronize_none)
(__sync_synchronize_cp15dmb, __sync_synchronize_dmb)
(__sync_synchronize): New functions.
* config/arm/t-sync: New file.
* config/arm/sync-none.specs: Likewise.
* config/arm/sync-dmb.specs: Likewise.
* config/arm/sync-cp15dmb.specs: Likewise.
The function __hardcfr_check_fail in hardcfr.c is internal and static
inline. It receives many arguments, which require more than five
registers to be passed in bpf-none-unknown targets. BPF is limited to
that number of registers to pass arguments, and therefore libgcc fails
to build in that target. This patch marks the function with the
always_inline attribute, fixing the bpf build.
Tested in bpf-unknown-none target and x86_64-linux-gnu host.
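The effect of the attribute can be sketched as follows. The names
report_mismatch and check_block are hypothetical and unrelated to the
real hardcfr.c code; the point is only that an always_inline helper never
needs an out-of-line call, so its argument count does not run into a
register-passing limit.

```c
#include <assert.h>

/* Hypothetical six-argument helper.  With always_inline the body is
   expanded at every call site, even when not optimizing, so no call
   with six arguments is ever emitted.  */
static inline __attribute__ ((__always_inline__)) int
report_mismatch (int a, int b, int c, int d, int e, int f)
{
  return a + b + c + d + e + f;
}

/* The externally visible function takes few arguments and absorbs
   the helper's body.  */
int
check_block (int base)
{
  return report_mismatch (base, 1, 2, 3, 4, 5);
}
```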
libgcc/ChangeLog:
* hardcfr.c (__hardcfr_check_fail): Mark as always_inline.
The code coverage support uses counters to determine which edges in the control
flow graph were executed. If a counter overflows, then the code coverage
information is invalid. Therefore the counter type should be a 64-bit integer.
In multi-threaded applications, it is important that the counter increments are
atomic. This is not the case by default. The user can enable atomic counter
increments through the -fprofile-update=atomic and
-fprofile-update=prefer-atomic options.
If the target supports 64-bit atomic operations, then everything is fine. If
not and -fprofile-update=prefer-atomic was chosen by the user, then non-atomic
counter increments will be used. However, if the target does not support the
required atomic operations and -fprofile-update=atomic was chosen by the user,
then a warning was issued and a forced fallback to non-atomic operations was
done. This is probably not what a user wants. There is still hardware on the
market which does not have atomic operations and is used for multi-threaded
applications. A user who selects -fprofile-update=atomic wants consistent
code coverage data and not random data.
This patch removes the fallback to non-atomic operations for
-fprofile-update=atomic if the target platform supports libatomic. To
mitigate potential performance issues an optimization for systems which
only support 32-bit atomic operations is provided. Here, the edge
counter increments are done like this:
low = __atomic_add_fetch_4 (&counter.low, 1, MEMMODEL_RELAXED);
high_inc = low == 0 ? 1 : 0;
__atomic_add_fetch_4 (&counter.high, high_inc, MEMMODEL_RELAXED);
In gimple_gen_time_profiler() this split operation cannot be used, since the
updated counter value is also required. Here, a library call is emitted. This
is not a performance issue since the update is only done if counters[0] == 0.
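The split increment can be written out as a compilable sketch using the
generic __atomic builtins. The struct layout and the function names are
hypothetical; the compiler emits the equivalent operations directly
rather than calling functions like these.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of a 64-bit counter as two 32-bit halves.  */
struct split_counter
{
  uint32_t low;
  uint32_t high;
};

/* Increment using only 32-bit atomic operations.  */
static void
counter_increment (struct split_counter *c)
{
  uint32_t low = __atomic_add_fetch (&c->low, 1, __ATOMIC_RELAXED);
  /* The low half wrapped to zero exactly when a carry is due.  */
  uint32_t high_inc = low == 0 ? 1 : 0;
  __atomic_add_fetch (&c->high, high_inc, __ATOMIC_RELAXED);
}

/* Combine the halves.  This read is not atomic as a pair, so it is
   only meaningful once all updating threads have finished.  */
static uint64_t
counter_value (const struct split_counter *c)
{
  return ((uint64_t) c->high << 32) | c->low;
}
```

This also shows why the split form cannot return the updated 64-bit
value: the two halves are updated in separate steps.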
gcc/c-family/ChangeLog:
* c-cppbuiltin.cc (c_cpp_builtins): Define
__LIBGCC_HAVE_LIBATOMIC for libgcov.
gcc/ChangeLog:
* doc/invoke.texi (-fprofile-update): Clarify default method. Document
the atomic method behaviour.
* tree-profile.cc (enum counter_update_method): New.
(counter_update): Likewise.
(gen_counter_update): Use counter_update_method. Split the
atomic counter update in two 32-bit atomic operations if
necessary.
(tree_profiling): Select counter_update_method.
libgcc/ChangeLog:
* libgcov.h (GCOV_SUPPORTS_ATOMIC): Always define it.
Set it also to 1, if __LIBGCC_HAVE_LIBATOMIC is defined.
libgcc/config/avr/libf7/
* libf7-const.def [F7MOD_sinh_]: Add MiniMax polynomial.
* libf7.c (f7_sinh): Use it instead of (exp(x) - exp(-x)) / 2
when |x| < 0.5 to avoid loss of precision due to cancellation.
Check for non-zero denorm in __adddf3. Need to check both the upper and
lower 32-bit chunks of a 64-bit float for a non-zero value when
checking to see if the value is -0.
Fix __addsf3 when the sum exponent is exactly 0xff to ensure that it
produces infinity and not NaN.
Handle converting NaN/inf values between formats.
Handle underflow and overflow when truncating.
Write a replacement for __fixxfsi so that it does not raise extra
exceptions during an extra conversion from long double to double.
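Two of the checks above can be illustrated in C, assuming IEEE 754
encodings. The real fixes are in m68k assembly; is_negative_zero and
pack_float are hypothetical helpers that only mirror the logic.

```c
#include <assert.h>
#include <stdint.h>

/* A double is -0 only if the sign bit is set and every other bit in
   both 32-bit halves is clear.  Testing the upper half alone would
   misclassify a negative denormal whose fraction bits all live in
   the lower half.  */
static int
is_negative_zero (uint32_t hi, uint32_t lo)
{
  return hi == 0x80000000u && lo == 0;
}

/* Hypothetical single-precision packer.  An exponent of 0xff or more
   must yield infinity, so the fraction is cleared; keeping a non-zero
   fraction with exponent 0xff would encode a NaN instead.  */
static uint32_t
pack_float (uint32_t sign, uint32_t exp, uint32_t frac)
{
  if (exp >= 0xff)
    return (sign << 31) | 0x7f800000u;
  return (sign << 31) | (exp << 23) | (frac & 0x7fffffu);
}
```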
libgcc/
* config/m68k/lb1sf68.S (__adddf3): Properly check for non-zero denorm.
(__divdf3): Restore sign bit properly.
(__addsf3): Correct exponent check.
* config/m68k/fpgnulib.c (EXPMASK): Define.
(__extendsfdf2): Handle Inf and NaN properly.
(__truncdfsf2): Handle underflow and overflow correctly.
(__extenddfxf2): Handle underflow, denorms, Inf and NaN correctly.
(__truncxfdf2): Handle underflow and denorms correctly.
(__fixxfsi): Reimplement.
The following patch adds the missing
{unsigned ,}__int128 <-> _Decimal{32,64,128}
conversion support into libgcc.a on top of the _BitInt support
(doing it without that would be a larger amount of code and I hope all
the targets which support __int128 will eventually support _BitInt,
after all it is a required part of C23) and because it is in libgcc.a
only, it doesn't hurt that much if it is added for some architectures
only in GCC 15.
Initially I thought about doing this on the compiler side, but doing
it on the library side seems to be easier and more -Os friendly.
The tests currently require the bitint effective target, which can be
removed when all the int128 targets support bitint.
2023-11-09 Jakub Jelinek <jakub@redhat.com>
PR libgcc/65833
libgcc/
* config/t-softfp (softfp_bid_list): Add
{U,}TItype <-> _Decimal{32,64,128} conversions.
* soft-fp/floattisd.c: New file.
* soft-fp/floattidd.c: New file.
* soft-fp/floattitd.c: New file.
* soft-fp/floatuntisd.c: New file.
* soft-fp/floatuntidd.c: New file.
* soft-fp/floatuntitd.c: New file.
* soft-fp/fixsdti.c: New file.
* soft-fp/fixddti.c: New file.
* soft-fp/fixtdti.c: New file.
* soft-fp/fixunssdti.c: New file.
* soft-fp/fixunsddti.c: New file.
* soft-fp/fixunstdti.c: New file.
gcc/testsuite/
* gcc.dg/dfp/int128-1.c: New test.
* gcc.dg/dfp/int128-2.c: New test.
* gcc.dg/dfp/int128-3.c: New test.
* gcc.dg/dfp/int128-4.c: New test.
This adds support for the 'indirect' clause in the 'declare target'
directive. Functions declared as indirect may be called via function
pointers passed from the host in offloaded code.
Virtual calls to member functions via the object pointer in C++ are
currently not supported in target regions.
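A small usage sketch of the clause, modelled on the form used in the new tests and assuming a compiler that implements it (without -fopenmp the pragmas are ignored and the call simply runs on the host). The function names twice, square and call_in_target are made up for the example.

```c
#include <assert.h>

/* Functions declared in this region may be called through function
   pointers inside target regions.  */
#pragma omp begin declare target indirect
int twice (int x) { return 2 * x; }
int square (int x) { return x * x; }
#pragma omp end declare target

/* Call FN with ARG inside a target region.  FN holds a host address;
   with the indirect clause the runtime is expected to translate it to
   the device copy of the function.  */
int
call_in_target (int (*fn) (int), int arg)
{
  int result = 0;
  assert (fn != 0);
#pragma omp target map (to: fn, arg) map (from: result)
  result = fn (arg);
  return result;
}
```

A caller would write, for example, call_in_target (square, 7) and get the same value whether the region runs on the host or is offloaded.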
2023-11-07 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/c-family/
* c-attribs.cc (c_common_attribute_table): Add attribute for
indirect functions.
* c-pragma.h (enum pragma_omp_clause): Add entry for indirect clause.
gcc/c/
* c-decl.cc (c_decl_attributes): Add attribute for indirect
functions.
* c-lang.h (c_omp_declare_target_attr): Add indirect field.
* c-parser.cc (c_parser_omp_clause_name): Handle indirect clause.
(c_parser_omp_clause_indirect): New.
(c_parser_omp_all_clauses): Handle indirect clause.
(OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(c_parser_omp_declare_target): Handle indirect clause. Emit error
message if device_type or indirect clauses used alone. Emit error
if indirect clause used with device_type that is not 'any'.
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(c_parser_omp_begin): Handle indirect clause.
* c-typeck.cc (c_finish_omp_clauses): Handle indirect clause.
gcc/cp/
* cp-tree.h (cp_omp_declare_target_attr): Add indirect field.
* decl2.cc (cplus_decl_attributes): Add attribute for indirect
functions.
* parser.cc (cp_parser_omp_clause_name): Handle indirect clause.
(cp_parser_omp_clause_indirect): New.
(cp_parser_omp_all_clauses): Handle indirect clause.
(handle_omp_declare_target_clause): Add extra parameter. Add
indirect attribute for indirect functions.
(OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(cp_parser_omp_declare_target): Handle indirect clause. Emit error
message if device_type or indirect clauses used alone. Emit error
if indirect clause used with device_type that is not 'any'.
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(cp_parser_omp_begin): Handle indirect clause.
* semantics.cc (finish_omp_clauses): Handle indirect clause.
gcc/
* lto-cgraph.cc (enum LTO_symtab_tags): Add tag for indirect
functions.
(output_offload_tables): Write indirect functions.
(input_offload_tables): Read indirect functions.
* lto-section-names.h (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New.
* omp-builtins.def (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR): New.
* omp-offload.cc (offload_ind_funcs): New.
(omp_discover_implicit_declare_target): Add functions marked with
'omp declare target indirect' to indirect functions list.
(omp_finish_file): Add indirect functions to section for offload
indirect functions.
(execute_omp_device_lower): Redirect indirect calls on target by
passing function pointer to BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR.
(pass_omp_device_lower::gate): Run pass_omp_device_lower if
indirect functions are present on an accelerator device.
* omp-offload.h (offload_ind_funcs): New.
* tree-core.h (omp_clause_code): Add OMP_CLAUSE_INDIRECT.
* tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_INDIRECT.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE_INDIRECT_EXPR): New.
* config/gcn/mkoffload.cc (process_asm): Process offload_ind_funcs
section. Count number of indirect functions.
(process_obj): Emit number of indirect functions.
* config/nvptx/mkoffload.cc (ind_func_ids, ind_funcs_tail): New.
(process): Emit offload_ind_func_table in PTX code. Emit indirect
function names and count in image.
* config/nvptx/nvptx.cc (nvptx_record_offload_symbol): Mark
indirect functions in PTX code with IND_FUNC_MAP.
gcc/testsuite/
* c-c++-common/gomp/declare-target-7.c: Update expected error message.
* c-c++-common/gomp/declare-target-indirect-1.c: New.
* c-c++-common/gomp/declare-target-indirect-2.c: New.
* g++.dg/gomp/attrs-21.C (v12): Update expected error message.
* g++.dg/gomp/declare-target-indirect-1.C: New.
* gcc.dg/gomp/attrs-21.c (v12): Update expected error message.
include/
* gomp-constants.h (GOMP_VERSION): Increment to 3.
(GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS): New.
libgcc/
* offloadstuff.c (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New.
(__offload_ind_func_table): New.
(__offload_ind_funcs_end): New.
(__OFFLOAD_TABLE__): Add entries for indirect functions.
libgomp/
* Makefile.am (libgomp_la_SOURCES): Add target-indirect.c.
* Makefile.in: Regenerate.
* libgomp-plugin.h (GOMP_INDIRECT_ADDR_MAP): New define.
(GOMP_OFFLOAD_load_image): Add extra argument.
* libgomp.h (struct indirect_splay_tree_key_s): New.
(indirect_splay_tree_node, indirect_splay_tree,
indirect_splay_tree_key): New.
(indirect_splay_compare): New.
* libgomp.map (GOMP_5.1.1): Add GOMP_target_map_indirect_ptr.
* libgomp.texi (OpenMP 5.1): Update documentation on indirect
calls in target region and on indirect clause.
(Other new OpenMP 5.2 features): Add entry for virtual function calls.
* libgomp_g.h (GOMP_target_map_indirect_ptr): Add prototype.
* oacc-host.c (host_load_image): Add extra argument.
* target.c (gomp_load_image_to_device): If the GOMP_VERSION is high
enough, read host indirect functions table and pass to
load_image_func.
* config/accel/target-indirect.c: New.
* config/linux/target-indirect.c: New.
* config/gcn/team.c (build_indirect_map): Add prototype.
(gomp_gcn_enter_kernel): Initialize support for indirect
function calls on GCN target.
* config/nvptx/team.c (build_indirect_map): Add prototype.
(gomp_nvptx_main): Initialize support for indirect function
calls on NVPTX target.
* plugin/plugin-gcn.c (struct gcn_image_desc): Add field for
indirect functions count.
(GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION
is high enough, build address translation table and copy it to target
memory.
* plugin/plugin-nvptx.c (nvptx_tdata): Add field for indirect
functions count.
(GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION
is high enough, build address translation table and copy it to target
memory.
* testsuite/libgomp.c-c++-common/declare-target-indirect-1.c: New.
* testsuite/libgomp.c-c++-common/declare-target-indirect-2.c: New.
* testsuite/libgomp.c++/declare-target-indirect-1.C: New.
For 'libgcc/config/gcn/gthr-gcn.h' used in libstdc++ context (WIP), we have:
[...]/build-gcc-offload-amdgcn-amdhsa/amdgcn-amdhsa/libstdc++-v3/include/amdgcn-amdhsa/bits/gthr-default.h: In function ‘void* __gthread_getspecific(__gthread_key_t)’:
[...]/build-gcc-offload-amdgcn-amdhsa/amdgcn-amdhsa/libstdc++-v3/include/amdgcn-amdhsa/bits/gthr-default.h:90:10: error: ‘NULL’ was not declared in this scope
90 | return NULL;
| ^~~~
Resolve this with 's%NULL%0', as is used in
'libgcc/gthr-single.h:__gthread_getspecific', for example.
Follow-up to commit 76d4633107
"Create GCN-specific gthreads".
libgcc/
* config/gcn/gthr-gcn.h (__gthread_getspecific): 's%NULL%0'.
Control flow redundancy may choose abnormal edges for early checking,
but that breaks because we can't insert checks on such edges.
Introduce conditional checking on the dest block of abnormal edges,
and leave it for the optimizer to drop the conditional.
for gcc/ChangeLog
PR tree-optimization/111943
* gimple-harden-control-flow.cc: Adjust copyright year.
(rt_bb_visited): Add vfalse and vtrue data members.
Zero-initialize them in the ctor.
(rt_bb_visited::insert_exit_check_on_edge): Upon encountering
abnormal edges, insert initializers for vfalse and vtrue on
entry, and insert the check sequence guarded by a conditional
in the dest block.
for libgcc/ChangeLog
* hardcfr.c: Adjust copyright year.
for gcc/testsuite/ChangeLog
PR tree-optimization/111943
* gcc.dg/harden-cfr-pr111943.c: New.
Recent Darwin versions place constraints on the use of run paths
specified in environment variables. This breaks some assumptions
in the GCC build.
This change allows the user to configure a Darwin build to use
'@rpath/libraryname.dylib' in library names and then to add an
embedded runpath to executables (and libraries with dependents).
The embedded runpath is added by default unless the user adds
'-nodefaultrpaths' to the link line.
For an installed compiler, it means that any executable built with
that compiler will reference the runtimes installed with the
compiler (equivalent to hard-coding the library path into the name
of the library).
During build-time configurations any "-B" entries will be added to
the runpath, so the newly-built libraries will be found by executables.
Since the install name is set in libtool, that decision needs to be
available here (but might also cause dependent ones in Makefiles,
so we need to export a conditional).
This facility is not available for Darwin 8 or earlier, however the
existing environment variable runpath does work there.
We default this on for systems where the external DYLD_LIBRARY_PATH
does not work and off for Darwin 8 or earlier. For systems that can
use either method, if the value is unset, we use the default (which
is currently DYLD_LIBRARY_PATH).
ChangeLog:
* configure: Regenerate.
* configure.ac: Do not add default runpaths to GCC exes
when we are building -static-libstdc++/-static-libgcc (the
default).
* libtool.m4: Add 'enable-darwin-at-runpath'. Act on the
enable flag to alter Darwin libraries to use @rpath names.
gcc/ChangeLog:
* aclocal.m4: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
* config/darwin.h: Handle Darwin rpaths.
* config/darwin.opt: Handle Darwin rpaths.
* Makefile.in: Handle Darwin rpaths.
gcc/ada/ChangeLog:
* gcc-interface/Makefile.in: Handle Darwin rpaths.
gcc/jit/ChangeLog:
* Make-lang.in: Handle Darwin rpaths.
libatomic/ChangeLog:
* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
libbacktrace/ChangeLog:
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
libcc1/ChangeLog:
* configure: Regenerate.
libffi/ChangeLog:
* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
libgcc/ChangeLog:
* config/t-slibgcc-darwin: Generate libgcc_s
with an @rpath name.
* config.host: Handle Darwin rpaths.
libgfortran/ChangeLog:
* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
libgm2/ChangeLog:
* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
* libm2cor/Makefile.am: Handle Darwin rpaths.
* libm2cor/Makefile.in: Regenerate.
* libm2iso/Makefile.am: Handle Darwin rpaths.
* libm2iso/Makefile.in: Regenerate.
* libm2log/Makefile.am: Handle Darwin rpaths.
* libm2log/Makefile.in: Regenerate.
* libm2min/Makefile.am: Handle Darwin rpaths.
* libm2min/Makefile.in: Regenerate.
* libm2pim/Makefile.am: Handle Darwin rpaths.
* libm2pim/Makefile.in: Regenerate.
libgomp/ChangeLog:
* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
libitm/ChangeLog:
* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
libobjc/ChangeLog:
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
libphobos/ChangeLog:
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
* libdruntime/Makefile.am: Handle Darwin rpaths.
* libdruntime/Makefile.in: Regenerate.
* src/Makefile.am: Handle Darwin rpaths.
* src/Makefile.in: Regenerate.
libquadmath/ChangeLog:
* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
libsanitizer/ChangeLog:
* asan/Makefile.am: Handle Darwin rpaths.
* asan/Makefile.in: Regenerate.
* configure: Regenerate.
* hwasan/Makefile.am: Handle Darwin rpaths.
* hwasan/Makefile.in: Regenerate.
* lsan/Makefile.am: Handle Darwin rpaths.
* lsan/Makefile.in: Regenerate.
* tsan/Makefile.am: Handle Darwin rpaths.
* tsan/Makefile.in: Regenerate.
* ubsan/Makefile.am: Handle Darwin rpaths.
* ubsan/Makefile.in: Regenerate.
libssp/ChangeLog:
* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
libstdc++-v3/ChangeLog:
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
* src/Makefile.am: Handle Darwin rpaths.
* src/Makefile.in: Regenerate.
libvtv/ChangeLog:
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
lto-plugin/ChangeLog:
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
zlib/ChangeLog:
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.
Accept the architecture configure option and resolve build failures. This is
enough to build binaries, but I've not got a device to test it on, so there
are probably runtime issues to fix. The cache control instructions might be
unsafe (or too conservative), and the kernel metadata might be off. Vector
reductions will need to be reworked for RDNA2. In principle, it would be
better to use wavefrontsize32 for this architecture, but that would mean
switching everything to allow SImode masks, so wavefrontsize64 it is.
The multilib is not included in the default configuration so either configure
--with-arch=gfx1030 or include it in --with-multilib-list=gfx1030,....
The majority of this patch has no effect on other devices, but changing from
using scalar writes for the exit value to vector writes means we don't need
the scalar cache write-back instruction anywhere (which doesn't exist in RDNA2).
gcc/ChangeLog:
* config.gcc: Allow --with-arch=gfx1030.
* config/gcn/gcn-hsa.h (NO_XNACK): gfx1030 does not support xnack.
(ASM_SPEC): gfx1030 needs -mattr=+wavefrontsize64 set.
* config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX1030.
(TARGET_GFX1030): New.
(TARGET_RDNA2): New.
* config/gcn/gcn-valu.md (@dpp_move<mode>): Disable for RDNA2.
(addc<mode>3<exec_vcc>): Add RDNA2 syntax variant.
(subc<mode>3<exec_vcc>): Likewise.
(<convop><mode><vndi>2_exec): Add RDNA2 alternatives.
(vec_cmp<mode>di): Likewise.
(vec_cmp<u><mode>di): Likewise.
(vec_cmp<mode>di_exec): Likewise.
(vec_cmp<u><mode>di_exec): Likewise.
(vec_cmp<mode>di_dup): Likewise.
(vec_cmp<mode>di_dup_exec): Likewise.
(reduc_<reduc_op>_scal_<mode>): Disable for RDNA2.
(*<reduc_op>_dpp_shr_<mode>): Likewise.
(*plus_carry_dpp_shr_<mode>): Likewise.
(*plus_carry_in_dpp_shr_<mode>): Likewise.
* config/gcn/gcn.cc (gcn_option_override): Recognise gfx1030.
(gcn_global_address_p): RDNA2 only allows smaller offsets.
(gcn_addr_space_legitimate_address_p): Likewise.
(gcn_omp_device_kind_arch_isa): Recognise gfx1030.
(gcn_expand_epilogue): Use VGPRs instead of SGPRs.
(output_file_start): Configure gfx1030.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __RDNA2__.
(ASSEMBLER_DIALECT): New.
* config/gcn/gcn.md (rdna): New define_attr.
(enabled): Use "rdna" attribute.
(gcn_return): Remove s_dcache_wb.
(addcsi3_scalar): Add RDNA2 syntax variant.
(addcsi3_scalar_zero): Likewise.
(addptrdi3): Likewise.
(mulsi3): v_mul_lo_i32 should be v_mul_lo_u32 on all ISA.
(*memory_barrier): Add RDNA2 syntax variant.
(atomic_load<mode>): Add RDNA2 cache control variants, and disable
scalar atomics for RDNA2.
(atomic_store<mode>): Likewise.
(atomic_exchange<mode>): Likewise.
* config/gcn/gcn.opt (gpu_type): Add gfx1030.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1030): New.
(main): Recognise -march=gfx1030.
* config/gcn/t-omp-device: Add gfx1030 isa.
libgcc/ChangeLog:
* config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Set false for __RDNA2__.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (EF_AMDGPU_MACH_AMDGCN_GFX1030): New.
(isa_hsa_name): Recognise gfx1030.
(isa_code): Likewise.
* team.c (defined): Remove s_endpgm.
This patch introduces an optional hardening pass to catch unexpected
execution flows. Functions are transformed so that basic blocks set a
bit in an automatic array, and (non-exceptional) function exit edges
check that the bits in the array represent an expected execution path
in the CFG.
Functions with multiple exit edges, or with too many blocks, call an
out-of-line checker builtin implemented in libgcc. For simpler
functions, the verification is performed in-line.
-fharden-control-flow-redundancy enables the pass for eligible
functions, --param hardcfr-max-blocks sets a block count limit for
functions to be eligible, and --param hardcfr-max-inline-blocks
tunes the "too many blocks" limit for in-line verification.
-fhardcfr-skip-leaf makes leaf functions non-eligible.
Additional -fhardcfr-check-* options are added to enable checking at
exception escape points, before potential sibcalls, hereby dubbed
returning calls, and before noreturn calls and exception raises. A
notable case is the distinction between noreturn calls expected to
throw and those expected to terminate or loop forever: the default
setting for -fhardcfr-check-noreturn-calls, no-xthrow, performs
checking before the latter, but the former only gets checking in the
exception handler. GCC can only tell between them by explicitly marking
noreturn functions expected to raise with the newly-introduced
expected_throw attribute, and corresponding ECF_XTHROW flag.
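The idea of the pass can be pictured with a hand-instrumented function. This is a simplified model under stated assumptions, not the code the compiler generates or the libgcc checker: the CFG, the names path_is_plausible and checked_abs, and the exact consistency rule (every visited block needs a visited predecessor and a visited successor) are chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { NBLOCKS = 4 };		/* Block 0 is the entry, block 3 the exit.  */

/* Hypothetical CFG: 0 -> {1, 2}, 1 -> 3, 2 -> 3, as bit masks.  */
static const uint32_t preds[NBLOCKS] = { 0x0, 0x1, 0x1, 0x6 };
static const uint32_t succs[NBLOCKS] = { 0x6, 0x8, 0x8, 0x0 };

/* Return 1 if the VISITED bits are consistent with some path from
   entry to exit: both ends were visited, and every visited block has
   a visited predecessor (except the entry) and a visited successor
   (except the exit).  */
static int
path_is_plausible (uint32_t visited)
{
  assert ((visited >> NBLOCKS) == 0);
  if (!(visited & 1u) || !(visited & (1u << (NBLOCKS - 1))))
    return 0;
  for (int b = 0; b < NBLOCKS; b++)
    {
      if (!(visited & (1u << b)))
	continue;
      if (b != 0 && !(visited & preds[b]))
	return 0;
      if (b != NBLOCKS - 1 && !(visited & succs[b]))
	return 0;
    }
  return 1;
}

/* Hand-instrumented absolute value: each block sets its bit, and the
   recorded bits are handed back so the caller can verify them.  */
static int
checked_abs (int x, uint32_t *visited_out)
{
  uint32_t visited = 0;
  int r;
  visited |= 1u << 0;		/* Entry block.  */
  if (x < 0)
    {
      visited |= 1u << 1;	/* Negative branch.  */
      r = -x;
    }
  else
    {
      visited |= 1u << 2;	/* Non-negative branch.  */
      r = x;
    }
  visited |= 1u << 3;		/* Exit block.  */
  *visited_out = visited;
  return r;
}
```

In this model, a jump straight from the entry to the exit would leave only bits 0 and 3 set, and the check would reject it because block 0 has no visited successor.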
for gcc/ChangeLog
* tree-core.h (ECF_XTHROW): New macro.
* tree.cc (set_call_expr): Add expected_throw attribute when
ECF_XTHROW is set.
(build_common_builtin_node): Add ECF_XTHROW to
__cxa_end_cleanup and _Unwind_Resume or _Unwind_SjLj_Resume.
* calls.cc (flags_from_decl_or_type): Check for expected_throw
attribute to set ECF_XTHROW.
* gimple.cc (gimple_build_call_from_tree): Propagate
ECF_XTHROW from decl flags to gimple call...
(gimple_call_flags): ... and back.
* gimple.h (GF_CALL_XTHROW): New gf_mask flag.
(gimple_call_set_expected_throw): New.
(gimple_call_expected_throw_p): New.
* Makefile.in (OBJS): Add gimple-harden-control-flow.o.
* builtins.def (BUILT_IN___HARDCFR_CHECK): New.
* common.opt (fharden-control-flow-redundancy): New.
(-fhardcfr-check-returning-calls): New.
(-fhardcfr-check-exceptions): New.
(-fhardcfr-check-noreturn-calls=*): New.
(Enum hardcfr_check_noreturn_calls): New.
(fhardcfr-skip-leaf): New.
* doc/invoke.texi: Document them.
(hardcfr-max-blocks, hardcfr-max-inline-blocks): New params.
* flag-types.h (enum hardcfr_noret): New.
* gimple-harden-control-flow.cc: New.
* params.opt (-param=hardcfr-max-blocks=): New.
(-param=hardcfr-max-inline-blocks=): New.
* passes.def (pass_harden_control_flow_redundancy): Add.
* tree-pass.h (make_pass_harden_control_flow_redundancy):
Declare.
* doc/extend.texi: Document expected_throw attribute.
for gcc/ada/ChangeLog
* gcc-interface/trans.cc (gigi): Mark __gnat_reraise_zcx with
ECF_XTHROW.
(build_raise_check): Likewise for all rcheck subprograms.
for gcc/c-family/ChangeLog
* c-attribs.cc (handle_expected_throw_attribute): New.
(c_common_attribute_table): Add expected_throw.
for gcc/cp/ChangeLog
* decl.cc (push_throw_library_fn): Mark with ECF_XTHROW.
* except.cc (build_throw): Likewise __cxa_throw,
_ITM_cxa_throw, __cxa_rethrow.
for gcc/testsuite/ChangeLog
* c-c++-common/torture/harden-cfr.c: New.
* c-c++-common/harden-cfr-noret-never-O0.c: New.
* c-c++-common/torture/harden-cfr-noret-never.c: New.
* c-c++-common/torture/harden-cfr-noret-noexcept.c: New.
* c-c++-common/torture/harden-cfr-noret-nothrow.c: New.
* c-c++-common/torture/harden-cfr-noret.c: New.
* c-c++-common/torture/harden-cfr-notail.c: New.
* c-c++-common/torture/harden-cfr-returning.c: New.
* c-c++-common/torture/harden-cfr-tail.c: New.
* c-c++-common/torture/harden-cfr-abrt-always.c: New.
* c-c++-common/torture/harden-cfr-abrt-never.c: New.
* c-c++-common/torture/harden-cfr-abrt-no-xthrow.c: New.
* c-c++-common/torture/harden-cfr-abrt-nothrow.c: New.
* c-c++-common/torture/harden-cfr-abrt.c: New.
* c-c++-common/torture/harden-cfr-always.c: New.
* c-c++-common/torture/harden-cfr-never.c: New.
* c-c++-common/torture/harden-cfr-no-xthrow.c: New.
* c-c++-common/torture/harden-cfr-nothrow.c: New.
* c-c++-common/torture/harden-cfr-bret-always.c: New.
* c-c++-common/torture/harden-cfr-bret-never.c: New.
* c-c++-common/torture/harden-cfr-bret-noopt.c: New.
* c-c++-common/torture/harden-cfr-bret-noret.c: New.
* c-c++-common/torture/harden-cfr-bret-no-xthrow.c: New.
* c-c++-common/torture/harden-cfr-bret-nothrow.c: New.
* c-c++-common/torture/harden-cfr-bret-retcl.c: New.
* c-c++-common/torture/harden-cfr-bret.c: New.
* g++.dg/harden-cfr-throw-always-O0.C: New.
* g++.dg/harden-cfr-throw-returning-O0.C: New.
* g++.dg/torture/harden-cfr-noret-always-no-nothrow.C: New.
* g++.dg/torture/harden-cfr-noret-never-no-nothrow.C: New.
* g++.dg/torture/harden-cfr-noret-no-nothrow.C: New.
* g++.dg/torture/harden-cfr-throw-always.C: New.
* g++.dg/torture/harden-cfr-throw-never.C: New.
* g++.dg/torture/harden-cfr-throw-no-xthrow.C: New.
* g++.dg/torture/harden-cfr-throw-no-xthrow-expected.C: New.
* g++.dg/torture/harden-cfr-throw-nothrow.C: New.
* g++.dg/torture/harden-cfr-throw-nocleanup.C: New.
* g++.dg/torture/harden-cfr-throw-returning.C: New.
* g++.dg/torture/harden-cfr-throw.C: New.
* gcc.dg/torture/harden-cfr-noret-no-nothrow.c: New.
* gcc.dg/torture/harden-cfr-tail-ub.c: New.
* gnat.dg/hardcfr.adb: New.
for libgcc/ChangeLog
* Makefile.in (LIB2ADD): Add hardcfr.c.
* hardcfr.c: New.
libgcc/config/avr/libf7/
* libf7.h (F7_SIZEOF): New macro.
* libf7-asm.sx: Use F7_SIZEOF instead of magic number "10".
(F7MOD_D_fma_, __fma): New module and function.
(fma) [-mdouble=64]: Define as alias for __fma.
(fmal) [-mlong-double=64]: Define as alias for __fma.
* libf7-common.mk (F7_ASM_PARTS): Add D_fma.
libgcc/config/avr/libf7/
* libf7.h (F7_FLAGNO_plusx, F7_FLAG_plusx): New macros.
* libf7.c (f7_horner): Handle F7_FLAG_plusx in highest coefficient.
* libf7-const.def [F7MOD_atan_]: Denominator: Set F7_FLAG_plusx
and omit highest term.
[F7MOD_asinacos_]: Use rational function with normalized denominator.
The outline atomic functions have hidden visibility and can only be called
directly. Therefore we can remove the BTI at function entry. This improves
security by reducing the number of indirect entry points in a binary.
The BTI markings on the objects are still emitted.
libgcc/ChangeLog:
* config/aarch64/lse.S (BTI_C): Remove define.
Be const and sign correct by using a matching CIE augmentation type.
Use a builtin instead of relying on <string.h> being included.
libgcc/ChangeLog:
* config/aarch64/aarch64-unwind.h (aarch64_cie_signed_with_b_key):
Use const unsigned type and a builtin.
Signed-off-by: Pekka Seppänen <pexu@gcc.mail.kapsi.fi>
On Mon, Aug 21, 2023 at 05:32:04PM +0000, Joseph Myers wrote:
> I think the libgcc functions (i.e. those exported by libgcc, to which
> references are generated by the compiler) need documenting in libgcc.texi.
> Internal functions or macros in the libgcc patch need appropriate comments
> specifying their semantics; especially FP_TO_BITINT and FP_FROM_BITINT
> which have a lot of arguments and no comments saying what the semantics of
> the macros and their arguments are supposed to me.
Here is an incremental patch which does that.
2023-09-06 Jakub Jelinek <jakub@redhat.com>
PR c/102989
gcc/
* doc/libgcc.texi (Bit-precise integer arithmetic functions):
Document general rules for _BitInt support library functions
and document __mulbitint3 and __divmodbitint4.
(Conversion functions): Document __fix{s,d,x,t}fbitint,
__floatbitint{s,d,x,t,h,b}f, __bid_fix{s,d,t}dbitint and
__bid_floatbitint{s,d,t}d.
libgcc/
* libgcc2.c (bitint_negate): Add function comment.
* soft-fp/bitint.h (bitint_negate): Add function comment.
(FP_TO_BITINT, FP_FROM_BITINT): Add comment explaining the macros.
This patch adds the library helpers for multiplication, division + modulo
and casts from and to floating point (both binary and decimal).
As described in the intro, the first step is to try to further reduce the
passed-in precision by skipping over most significant limbs with just zeros
or sign bit copies. For multiplication and division I've implemented
a simple algorithm, using something smarter like Karatsuba or Toom N-Way
might be faster for very large _BitInts (which we don't support right now
anyway), but could mean more code in libgcc, which maybe isn't what people
are willing to accept.
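For readers unfamiliar with the "simple algorithm", a schoolbook limb multiplication with a precision-reduction step might look roughly like the sketch below. It is not the libgcc2.c code: limb_t, reduce_limbs and mul_schoolbook are hypothetical names, limbs are assumed to be 32 bits wide and stored least significant first, and only unsigned values are handled (so only zero limbs are skipped, not sign bit copies).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;	/* Hypothetical limb type.  */

/* Skip most significant limbs that are zero, keeping at least one.  */
static size_t
reduce_limbs (const limb_t *x, size_t n)
{
  while (n > 1 && x[n - 1] == 0)
    n--;
  return n;
}

/* Store A * B into RES, which must have room for AN + BN limbs.
   Every limb of A is multiplied by every limb of B, so the cost is
   proportional to the product of the reduced lengths.  */
static void
mul_schoolbook (limb_t *res, const limb_t *a, size_t an,
		const limb_t *b, size_t bn)
{
  assert (an > 0 && bn > 0);
  for (size_t i = 0; i < an + bn; i++)
    res[i] = 0;
  an = reduce_limbs (a, an);
  bn = reduce_limbs (b, bn);
  for (size_t i = 0; i < an; i++)
    {
      uint64_t carry = 0;
      for (size_t j = 0; j < bn; j++)
	{
	  /* Cannot overflow: (2^32-1)^2 + 2 * (2^32-1) == 2^64 - 1.  */
	  uint64_t t = (uint64_t) a[i] * b[j] + res[i + j] + carry;
	  res[i + j] = (limb_t) t;
	  carry = t >> 32;
	}
      res[i + bn] = (limb_t) carry;
    }
}
```

Reducing the operand lengths first means that a wide operand holding a small value costs no more than a narrow one, which is the point of the precision-reduction step.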
For the to/from floating point conversions the patch uses soft-fp, because
it already has tons of handy macros which can be used for that. In theory
it could be implemented using {,unsigned} long long or {,unsigned} __int128
to/from floating point conversions with some frexp before/after, but at that
point we already need to force it into integer registers and analyze it
anyway. Plus, for 32-bit arches there is no __int128 that could be used
for XF/TF mode stuff.
I know that soft-fp is owned by glibc and I think the op-common.h change
should be propagated there, but the bitint stuff is really GCC specific
and IMHO doesn't belong into the glibc copy.
2023-09-06 Jakub Jelinek <jakub@redhat.com>
PR c/102989
libgcc/
* config/aarch64/t-softfp (softfp_extras): Use += rather than :=.
* config/i386/64/t-softfp (softfp_extras): Likewise.
* config/i386/libgcc-glibc.ver (GCC_14.0.0): Export _BitInt support
routines.
* config/i386/t-softfp (softfp_extras): Add fixxfbitint and
bf, hf and xf mode floatbitint.
(CFLAGS-floatbitintbf.c, CFLAGS-floatbitinthf.c): Add -msse2.
* config/riscv/t-softfp32 (softfp_extras): Use += rather than :=.
* config/rs6000/t-e500v1-fp (softfp_extras): Likewise.
* config/rs6000/t-e500v2-fp (softfp_extras): Likewise.
* config/t-softfp (softfp_floatbitint_funcs): New.
(softfp_bid_list): New.
(softfp_func_list): Add sf and df mode from and to _BitInt libcalls.
(softfp_bid_file_list): New.
(LIB2ADD_ST): Add $(softfp_bid_file_list).
* config/t-softfp-sfdftf (softfp_extras): Add fixtfbitint and
floatbitinttf.
* config/t-softfp-tf (softfp_extras): Likewise.
* libgcc2.c (bitint_reduce_prec): New inline function.
(BITINT_INC, BITINT_END): Define.
(bitint_mul_1, bitint_addmul_1): New helper functions.
(__mulbitint3): New function.
(bitint_negate, bitint_submul_1): New helper functions.
(__divmodbitint4): New function.
* libgcc2.h (LIBGCC2_UNITS_PER_WORD): When building _BitInt support
libcalls, redefine depending on __LIBGCC_BITINT_LIMB_WIDTH__.
(__mulbitint3, __divmodbitint4): Declare.
* libgcc-std.ver.in (GCC_14.0.0): Export _BitInt support routines.
* Makefile.in (lib2funcs): Add _mulbitint3.
(LIB2_DIVMOD_FUNCS): Add _divmodbitint4.
* soft-fp/bitint.h: New file.
* soft-fp/fixdfbitint.c: New file.
* soft-fp/fixsfbitint.c: New file.
* soft-fp/fixtfbitint.c: New file.
* soft-fp/fixxfbitint.c: New file.
* soft-fp/floatbitintbf.c: New file.
* soft-fp/floatbitintdf.c: New file.
* soft-fp/floatbitinthf.c: New file.
* soft-fp/floatbitintsf.c: New file.
* soft-fp/floatbitinttf.c: New file.
* soft-fp/floatbitintxf.c: New file.
* soft-fp/op-common.h (_FP_FROM_INT): Add support for rsize up to
4 * _FP_W_TYPE_SIZE rather than just 2 * _FP_W_TYPE_SIZE.
* soft-fp/bitintpow10.c: New file.
* soft-fp/fixsdbitint.c: New file.
* soft-fp/fixddbitint.c: New file.
* soft-fp/fixtdbitint.c: New file.
* soft-fp/floatbitintsd.c: New file.
* soft-fp/floatbitintdd.c: New file.
* soft-fp/floatbitinttd.c: New file.
The following patch adds a header with generated helper tables to support
computation of powers of 10 from 10^0 to 10^6111 inclusive into a
sufficiently large array of _BitInt limbs. This is split from the rest
of the libgcc _BitInt support because it is quite large and together it
would run into gcc-patches mail length limits.
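To make "powers of 10 in an array of limbs" concrete, the sketch below computes 10^exp at run time by repeated multiplication by 10. The header in this patch uses generated tables instead, so this is only an illustration of the data layout; pow10_limbs is a hypothetical name, and 32-bit limbs stored least significant first are an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store 10^EXP into LIMBS[0..N-1], least significant limb first.
   Return the number of limbs used, or 0 if the value does not fit
   into N limbs.  */
static size_t
pow10_limbs (uint32_t *limbs, size_t n, unsigned exp)
{
  assert (n > 0);
  for (size_t i = 0; i < n; i++)
    limbs[i] = 0;
  limbs[0] = 1;
  size_t used = 1;
  for (unsigned e = 0; e < exp; e++)
    {
      uint64_t carry = 0;
      for (size_t i = 0; i < used; i++)
	{
	  uint64_t t = (uint64_t) limbs[i] * 10 + carry;
	  limbs[i] = (uint32_t) t;
	  carry = t >> 32;
	}
      if (carry)
	{
	  if (used == n)
	    return 0;
	  limbs[used++] = (uint32_t) carry;
	}
    }
  return used;
}
```

Computing each power this way takes time proportional to the exponent times the number of limbs, which is one reason precomputed tables are attractive for exponents in the thousands.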
2023-09-06 Jakub Jelinek <jakub@redhat.com>
PR c/102989
libgcc/
* soft-fp/bitintpow10.h: New file.