mirror of
https://gcc.gnu.org/git/gcc.git
synced 2024-11-24 03:14:08 +08:00
30486fab71
206153 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Andrew Stubbs
|
30486fab71 |
libgomp, nvptx: low-latency memory allocator
This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe and the size of the low-latency heap can be configured using the GOMP_NVPTX_LOWLAT_POOL environment variable. The use of the PTX dynamic_smem_size feature means that low-latency allocator will not work with the PTX 3.1 multilib. For now, the omp_low_lat_mem_alloc allocator also works, but that will change when I implement the access traits. libgomp/ChangeLog: * allocator.c (MEMSPACE_ALLOC): New macro. (MEMSPACE_CALLOC): New macro. (MEMSPACE_REALLOC): New macro. (MEMSPACE_FREE): New macro. (predefined_alloc_mapping): New array. Add _Static_assert to match. (ARRAY_SIZE): New macro. (omp_aligned_alloc): Use MEMSPACE_ALLOC. Implement fall-backs for predefined allocators. Simplify existing fall-backs. (omp_free): Use MEMSPACE_FREE. (omp_calloc): Use MEMSPACE_CALLOC. Implement fall-backs for predefined allocators. Simplify existing fall-backs. (omp_realloc): Use MEMSPACE_REALLOC, MEMSPACE_ALLOC, and MEMSPACE_FREE. Implement fall-backs for predefined allocators. Simplify existing fall-backs. * config/nvptx/team.c (__nvptx_lowlat_pool): New asm variable. (__nvptx_lowlat_init): New prototype. (gomp_nvptx_main): Call __nvptx_lowlat_init. * libgomp.texi: Update memory space table. * plugin/plugin-nvptx.c (lowlat_pool_size): New variable. (GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar. (GOMP_OFFLOAD_run): Apply lowlat_pool_size. * basic-allocator.c: New file. * config/nvptx/allocator.c: New file. * testsuite/libgomp.c/omp_alloc-1.c: New test. * testsuite/libgomp.c/omp_alloc-2.c: New test. * testsuite/libgomp.c/omp_alloc-3.c: New test. * testsuite/libgomp.c/omp_alloc-4.c: New test. * testsuite/libgomp.c/omp_alloc-5.c: New test. * testsuite/libgomp.c/omp_alloc-6.c: New test. Co-authored-by: Kwok Cheung Yeung <kcy@codesourcery.com> Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com> |
||
John David Anglin
|
458e7c9379 |
Fix c-c++-common/fhardened-[12].c test fails on hppa
The -fstack-protector and -fstack-protector-strong options are not supported on hppa since the stack grows up. 2023-12-06 John David Anglin <danglin@gcc.gnu.org> gcc/testsuite/ChangeLog: * c-c++-common/fhardened-1.c: Ignore __SSP_STRONG__ define if __hppa__ is defined. * c-c++-common/fhardened-2.c: Ignore __SSP__ define if __hppa__ is defined. |
||
Juzhe-Zhong
|
c9d5b46a25 |
RISC-V: Fix VSETVL PASS bug
As PR112855 mentioned, the VSETVL PASS insert vsetvli in unexpected location. Due to 2 reasons: 1. incorrect transparant computation LCM data. We need to check VL operand defs and uses. 2. incorrect fusion of unrelated edge which is the edge never reach the vsetvl expression. PR target/112855 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_lcm_local_properties): Fix transparant LCM data. (pre_vsetvl::earliest_fuse_vsetvl_info): Disable earliest fusion for unrelated edge. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr112855.c: New test. |
||
Jason Merrill
|
c1e54c82a9 |
c++: partial ordering of object parameter [PR53499]
Looks like we implemented option 1 (skip the object parameter) for CWG532 before the issue was resolved, and never updated to the final resolution of option 2 (model it as a reference). More recently CWG2445 extended this handling to static member functions; I think that's wrong, and have opened CWG2834 to address that and how explicit object member functions interact with it. The FIXME comments are to guide how the explicit object member function support should change the uses of DECL_NONSTATIC_MEMBER_FUNCTION_P. The library testsuite changes are to make partial ordering work again between the generic operator- in the testcase and _Pointer_adapter::operator-. DR 532 PR c++/53499 gcc/cp/ChangeLog: * pt.cc (more_specialized_fn): Fix object parameter handling. gcc/testsuite/ChangeLog: * g++.dg/template/partial-order4.C: New test. * g++.dg/template/spec26.C: Adjust for CWG532. libstdc++-v3/ChangeLog: * testsuite/23_containers/vector/ext_pointer/types/1.cc * testsuite/23_containers/vector/ext_pointer/types/2.cc (N::operator-): Make less specialized. |
||
Marek Polacek
|
e0eca4a55b |
build: unbreak bootstrap on uclinux targets [PR112762]
Currently, cross-compiling with --target=c6x-uclinux (and several other) fails due to: ../../src/gcc/config/linux.h:221:45: error: 'linux_fortify_source_default_level' was not declared in this scope #define TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL linux_fortify_source_default_level In the PR Andrew mentions that another fix would be in config.gcc, but really, here I meant to use the target hook for glibc only, not uclibc. This trivial patch fixes the build problem. It means that -fhardened with uclibc will use -D_FORTIFY_SOURCE=2 and not =3. PR target/112762 gcc/ChangeLog: * config/linux.h: Redefine TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL for glibc only. |
||
Thomas Schwinge
|
fbacdeff97 |
Modula-2: Support '-isysroot [...]'
In GCC cross configurations (tested '--target=amdgcn-amdhsa' and '--target=nvptx-none') with a sysroot configured, the 'gm2' driver invocations are passed '--sysroot=[...]', which is translated into '-isysroot [...]' for the 'cc1gm2' compiler invocation. The latter, however gets complained about: cc1gm2: warning: command-line option ‘-isysroot [...]’ is valid for C/C++/D/Fortran/ObjC/ObjC++ but not for Modula-2 ..., and therefore a ton of FAILs. Reproducer (also for non-cross, native configurations): $ build-gcc/gcc/gm2 -Bbuild-gcc/gcc -v --sysroot=/tmp -x modula-2 /dev/null [...] build-gcc/gcc/cc1gm2 [...] -isysroot [...]/tmp [...] cc1gm2: warning: command-line option ‘-isysroot /tmp’ is valid for C/C++/D/Fortran/ObjC/ObjC++ but not for Modula-2 [...] gcc/m2/ * lang.opt (-isysroot): New. |
||
Jakub Jelinek
|
d7ceffab96 |
libgcc: Avoid -Wbuiltin-declaration-mismatch warnings in emutls.c
When libgcc is being built in --disable-tls configuration or on a target without native TLS support, one gets annoying warnings: ../../../../libgcc/emutls.c:61:7: warning: conflicting types for built-in function ‘__emutls_get_address’; expected ‘void *(void *)’ [-Wbuiltin-declaration-mismatch] 61 | void *__emutls_get_address (struct __emutls_object *); | ^~~~~~~~~~~~~~~~~~~~ ../../../../libgcc/emutls.c:63:6: warning: conflicting types for built-in function ‘__emutls_register_common’; expected ‘void(void *, unsigned int, unsigned int, void *)’ +[-Wbuiltin-declaration-mismatch] 63 | void __emutls_register_common (struct __emutls_object *, word, word, void *); | ^~~~~~~~~~~~~~~~~~~~~~~~ ../../../../libgcc/emutls.c:140:1: warning: conflicting types for built-in function ‘__emutls_get_address’; expected ‘void *(void *)’ [-Wbuiltin-declaration-mismatch] 140 | __emutls_get_address (struct __emutls_object *obj) | ^~~~~~~~~~~~~~~~~~~~ ../../../../libgcc/emutls.c:204:1: warning: conflicting types for built-in function ‘__emutls_register_common’; expected ‘void(void *, unsigned int, unsigned int, void *)’ +[-Wbuiltin-declaration-mismatch] 204 | __emutls_register_common (struct __emutls_object *obj, | ^~~~~~~~~~~~~~~~~~~~~~~~ The thing is that in that case __emutls_get_address and __emutls_register_common are builtins, and are declared with void * arguments rather than struct __emutls_object *. Now, struct __emutls_object is a type private to libgcc/emutls.c and the middle-end creates on demand when calling the builtins a similar structure (with small differences, like not having the union in there). We have a precedent for this e.g. for fprintf or strftime builtins where the builtins are created with magic fileptr_type_node or const_tm_ptr_type_node types and then match it with user definition of pointers to some structure, but I think for this case users should never define these functions themselves nor call them and having special types for them in the compiler would mean extra compile time spent during compiler initialization and more GC data, so I think it is better to keep the compiler as is. On the library side, there is an option to just follow what the compiler is doing and do EMUTLS_ATTR void -__emutls_register_common (struct __emutls_object *obj, +__emutls_register_common (void *xobj, word size, word align, void *templ) { + struct __emutls_object *obj = (struct __emutls_object *) xobj; but that will make e.g. libabigail complain about ABI change in libgcc. So, the patch just turns the warning off. 2023-12-06 Thomas Schwinge <thomas@codesourcery.com> Jakub Jelinek <jakub@redhat.com> PR libgcc/109289 * emutls.c: Add GCC diagnostic ignored "-Wbuiltin-declaration-mismatch" pragma. |
||
Victor Do Nascimento
|
9a8fdade94 |
aarch64: Add system register duplication check selftest
Add a build-time test to check whether system register data, as imported from `aarch64-sys-reg.def' has any duplicate entries. Duplicate entries are defined as any two SYSREG entries in the .def file which share the same encoding values (as specified by its `CPENC' field) and where the relationship amongst the two does not fit into one of the following categories: * Simple aliasing: In some cases, it is observed that one register name serves as an alias to another. One example of this is where TRCEXTINSELR aliases TRCEXTINSELR0. * Expressing intent: It is possible that when a given register serves two distinct functions depending on how it is used, it is given two distinct names whose use should match the context under which it is being used. Example: Debug Data Transfer Register. When used to receive data, it should be accessed as DBGDTRRX_EL0 while when transmitting data it should be accessed via DBGDTRTX_EL0. * Register depreciation: Some register names have been deprecated and should no longer be used, but backwards- compatibility requires that such names continue to be recognized, as is the case for the SPSR_EL1 register, whose access via the SPSR_SVC name is now deprecated. * Same encoding different target: Some encodings are given different meaning depending on the target architecture and, as such, are given different names in each of theses contexts. We see an example of this for CPENC(3,4,2,0,0), which corresponds to TTBR0_EL2 for Armv8-A targets and VSCTLR_EL2 in Armv8-R targets. A consequence of these observations is that `CPENC' duplication is acceptable iff at least one of the `properties' or `arch_reqs' fields of the `sysreg_t' structs associated with the two registers in question differ and it's this condition that is checked by the new `aarch64_test_sysreg_encoding_clashes' function. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_test_sysreg_encoding_clashes): New. (aarch64_run_selftests): add call to aarch64_test_sysreg_encoding_clashes selftest. |
||
Victor Do Nascimento
|
5af697d72d |
aarch64: Add front-end argument type checking for target builtins
In implementing the ACLE read/write system register builtins it was observed that leaving argument type checking to be done at expand-time meant that poorly-formed function calls were being "fixed" by certain optimization passes, meaning bad code wasn't being properly picked up in checking. Example: const char *regname = "amcgcr_el0"; long long a = __builtin_aarch64_rsr64 (regname); is reduced by the ccp1 pass to long long a = __builtin_aarch64_rsr64 ("amcgcr_el0"); As these functions require an argument of STRING_CST type, there needs to be a check carried out by the front-end capable of picking this up. The introduced `check_general_builtin_call' function will be called by the TARGET_CHECK_BUILTIN_CALL hook whenever a call to a builtin belonging to the AARCH64_BUILTIN_GENERAL category is encountered, carrying out any appropriate checks associated with a particular builtin function code. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (aarch64_general_check_builtin_call): New. * config/aarch64/aarch64-c.cc (aarch64_check_builtin_call): Add `aarch64_general_check_builtin_call' call. * config/aarch64/aarch64-protos.h (aarch64_general_check_builtin_call): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/acle/rwsr-3.c: New. |
||
Victor Do Nascimento
|
fc42900d21 |
aarch64: Implement system register r/w arm ACLE intrinsic functions
Implement the aarch64 intrinsics for reading and writing system registers with the following signatures: uint32_t __arm_rsr(const char *special_register); uint64_t __arm_rsr64(const char *special_register); void* __arm_rsrp(const char *special_register); float __arm_rsrf(const char *special_register); double __arm_rsrf64(const char *special_register); void __arm_wsr(const char *special_register, uint32_t value); void __arm_wsr64(const char *special_register, uint64_t value); void __arm_wsrp(const char *special_register, const void *value); void __arm_wsrf(const char *special_register, float value); void __arm_wsrf64(const char *special_register, double value); gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (enum aarch64_builtins): Add enums for new builtins. (aarch64_init_rwsr_builtins): New. (aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins. (aarch64_expand_rwsr_builtin): New. (aarch64_general_expand_builtin): Call aarch64_general_expand_builtin. * config/aarch64/aarch64.md (read_sysregdi): New insn_and_split. (write_sysregdi): Likewise. * config/aarch64/arm_acle.h (__arm_rsr): New. (__arm_rsrp): Likewise. (__arm_rsr64): Likewise. (__arm_rsrf): Likewise. (__arm_rsrf64): Likewise. (__arm_wsr): Likewise. (__arm_wsrp): Likewise. (__arm_wsr64): Likewise. (__arm_wsrf): Likewise. (__arm_wsrf64): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/acle/rwsr.c: New. * gcc.target/aarch64/acle/rwsr-1.c: Likewise. * gcc.target/aarch64/acle/rwsr-2.c: Likewise. * gcc.dg/pch/rwsr-pch.c: Likewise. * gcc.dg/pch/rwsr-pch.hs: Likewise. |
||
Victor Do Nascimento
|
7d36ea7057 |
aarch64: Implement system register validation tools
Given the implementation of a mechanism of encoding system registers into GCC, this patch provides the mechanism of validating their use by the compiler. In particular, this involves: 1. Ensuring a supplied string corresponds to a known system register name. System registers can be accessed either via their name (e.g. `SPSR_EL1') or their encoding (e.g. `S3_0_C4_C0_0'). Register names are validated using a hash map, mapping known system register names to its corresponding `sysreg_t' struct, which is populated from the `aarch64_system_regs.def' file. Register name validation is done via `lookup_sysreg_map', while the encoding naming convention is validated via a parser implemented in this patch - `is_implem_def_reg'. 2. Once a given register name is deemed to be valid, it is checked against a further 2 criteria: a. Is the referenced register implemented in the target architecture? This is achieved by comparing the ARCH field in the relevant SYSREG entry from `aarch64_system_regs.def' against `aarch64_feature_flags' flags set at compile-time. b. Is the register being used correctly? Check the requested operation against the FLAGS specified in SYSREG. This prevents operations like writing to a read-only system register. gcc/ChangeLog: * config/aarch64/aarch64-protos.h (aarch64_valid_sysreg_name_p): New. (aarch64_retrieve_sysreg): Likewise. * config/aarch64/aarch64.cc (is_implem_def_reg): Likewise. (aarch64_valid_sysreg_name_p): Likewise. (aarch64_retrieve_sysreg): Likewise. (aarch64_register_sysreg): Likewise. (aarch64_init_sysregs): Likewise. (aarch64_lookup_sysreg_map): Likewise. * config/aarch64/predicates.md (aarch64_sysreg_string): New. |
||
Victor Do Nascimento
|
76d3114b04 |
aarch64: Add support for aarch64-sys-regs.def
This patch defines the structure of a new .def file used for representing the aarch64 system registers, what information it should hold and the basic framework in GCC to process this file. Entries in the aarch64-system-regs.def file should be as follows: SYSREG (NAME, CPENC (sn,op1,cn,cm,op2), FLAG1 | ... | FLAGn, ARCH) Where the arguments to SYSREG correspond to: - NAME: The system register name, as used in the assembly language. - CPENC: The system register encoding, mapping to: s<sn>_<op1>_c<cn>_c<cm>_<op2> - FLAG: The entries in the FLAGS field are bitwise-OR'd together to encode extra information required to ensure proper use of the system register. For example, a read-only system register will have the flag F_REG_READ, while write-only registers will be labeled F_REG_WRITE. Such flags are tested against at compile-time. - ARCH: The architectural features the system register is associated with. This is encoded via one of three possible macros: 1. When a system register is universally implemented, we say it has no feature requirements, so we tag it with the AARCH64_NO_FEATURES macro. 2. When a register is only implemented for a single architectural extension EXT, the AARCH64_FEATURE (EXT), is used. 3. When a given system register is made available by any of N possible architectural extensions, the AARCH64_FEATURES(N, ...) macro is used to combine them accordingly. In order to enable proper interpretation of the SYSREG entries by the compiler, flags defining system register behavior such as `F_REG_READ' and `F_REG_WRITE' are also defined here, so they can later be used for the validation of system register properties. Finally, any architectural feature flags from Binutils missing from GCC have appropriate aliases defined here so as to ensure cross-compatibility of SYSREG entries across the toolchain. gcc/ChangeLog: * config/aarch64/aarch64.cc (sysreg_t): New. (aarch64_sysregs): Likewise. (AARCH64_FEATURE): Likewise. (AARCH64_FEATURES): Likewise. (AARCH64_NO_FEATURES): Likewise. * config/aarch64/aarch64.h (AARCH64_ISA_V8A): Add missing ISA flag. (AARCH64_ISA_V8_1A): Likewise. (AARCH64_ISA_V8_7A): Likewise. (AARCH64_ISA_V8_8A): Likewise. (AARCH64_NO_FEATURES): Likewise. (AARCH64_FL_RAS): New ISA flag alias. (AARCH64_FL_LOR): Likewise. (AARCH64_FL_PAN): Likewise. (AARCH64_FL_AMU): Likewise. (AARCH64_FL_SCXTNUM): Likewise. (AARCH64_FL_ID_PFR2): Likewise. (F_DEPRECATED): New. (F_REG_READ): Likewise. (F_REG_WRITE): Likewise. (F_ARCHEXT): Likewise. (F_REG_ALIAS): Likewise. |
||
Victor Do Nascimento
|
b3df847026 |
aarch64: Sync system register information with Binutils
This patch adds the `aarch64-sys-regs.def' file, originally written for Binutils, to GCC. In so doing, it provides GCC with the necessary information for teaching the compiler about system registers known to the assembler and how these can be used. By aligning the representation of data common to different parts of the toolchain we can greatly reduce the duplication of work, facilitating the maintenance of the aarch64 back-end across different parts of the toolchain; By keeping both copies of the file in sync, any `SYSREG (...)' that is added in one project is automatically added to its counterpart. This being the case, no change should be made in the GCC copy of the file. Any modifications should first be made in Binutils and the resulting file copied over to GCC. GCC does not implement the full range of ISA flags present in Binutils. Where this is the case, aliases must be added to aarch64.h with the unknown architectural extension being mapped to its associated base architecture, such that any flag present in Binutils and used in system register definitions is understood in GCC. Again, this is done such that flags can be used interchangeably between projects making use of the aarch64-system-regs.def file. This is done in the next patch in the series. `.arch' directives missing from the emitted assembly files as a consequence of this aliasing are accounted for by the compiler using the S<op0>_<op1>_<Cn>_<Cm>_<op2> encoding of system registers when issuing mrs/msr instructions. This design choice ensures the assembler will accept anything that was deemed acceptable by the compiler. gcc/ChangeLog: * config/aarch64/aarch64-sys-regs.def: New. |
||
Robin Dapp
|
056cce4128 |
RISC-V: Add vec_init expander for masks [PR112854].
PR112854 shows a problem on rv32 with zvl1024b. During the course of expand_constructor we try to overlay/subreg a 64-element mask by a scalar (Pmode) register. This works for zvl512b and its maximum of 32 elements but fails for rv32 and 64 elements. To circumvent this this patch adds a vec_init expander for vector masks by initializing a QImode vector and comparing that against 0. gcc/ChangeLog: PR target/112854 PR target/112872 * config/riscv/autovec.md (vec_init<mode>qi): New expander. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr112854.c: New test. * gcc.target/riscv/rvv/autovec/pr112872.c: New test. |
||
Jakub Jelinek
|
e44ed92dbb |
i386: Move vzeroupper pass from after reload pass to after postreload_cse [PR112760]
Regardless of the outcome of the REG_UNUSED discussions, I think it is a good idea to move the vzeroupper pass one pass later. As can be seen in the multiple PRs and as postreload.cc documents, reload/LRA is known to create dead statements quite often, which is the reason why we have postreload_cse pass at all. Doing vzeroupper pass before such cleanup means the pass including df_analyze for it needs to process more instructions than needed and because mode switching adds note problem, also higher chance of having stale REG_UNUSED notes. And, I really don't see why vzeroupper can't wait until those cleanups are done. 2023-12-06 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/112760 * config/i386/i386-passes.def (pass_insert_vzeroupper): Insert after pass_postreload_cse rather than pass_reload. * config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper): Adjust comment for it. * gcc.dg/pr112760.c: New test. |
||
Jakub Jelinek
|
0ca64f846e |
lower-bitint: Fix arithmetics followed by extension by many bits [PR112809]
A zero or sign extension from result of some upwards_2limb operation is implemented in lower_mergeable_stmt as an extra loop which fills in the extra bits with 0s or 1s. If the delta of extended vs. unextended bit count is small, the code doesn't use a loop and emits up to a couple of stores to constant indexes, but if the delta is large, it uses cnt = (bo_bit != 0) + 1 + (rem != 0); statements. bo_bit is non-zero for bit-field loads and is done in that case as straight line, the unconditional 1 in there is for a loop which handles most of the limbs in the delta and finally (rem != 0) is for the case when the extended precision is not a multiple of limb_prec and is again done in straight line code (after the loop). The testcase ICEs because the decision what idx to use was incorrect for kind == bitint_prec_huge (i.e. when the precision delta is very large) and rem == 0 (i.e. the extended precision is multiple of limb_prec). In that case cnt is either 1 (if bo_bit == 0) or 2, and idx should be either first size_int (start) and then result of create_loop (for bo_bit != 0) or just result of create_loop, but by mistake the last case was size_int (end), which means when precision is multiple of limb_prec storing above the precision (which ICEs; but also not emitting the loop which is needed). 2023-12-06 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/112809 * gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt): For separate_ext in kind == bitint_prec_huge mode if rem == 0, create for i == cnt - 1 the loop rather than using size_int (end). * gcc.dg/bitint-48.c: New test. |
||
Jakub Jelinek
|
895a70f012 |
driver: Fix bootstrap with --enable-default-pie
On IRC Iain mentioned bootstrap is broken for him presumably since r14-5791 -fhardened addition. I think it is only a problem with --enable-default-pie when the case OPT_pie: wants to fall through into case OPT_r: and warns. Before the patch validated = true; was set up if ENABLE_DEFAULT_PIE for OPT_pie, and for -fhardened as documented I think we want to set any_link_options_p = true; for it too: /* True if -r, -shared, -pie, or -no-pie were specified on the command line. */ static bool any_link_options_p; 2023-12-06 Jakub Jelinek <jakub@redhat.com> * gcc.cc (driver_handle_option): Add /* FALLTHROUGH */ comment between OPT_pie and OPT_r cases. |
||
Tobias Burnus
|
6e84dafcc7 |
tsystem.h: Declare calloc/realloc #ifdef inhibit_libc
Declare calloc and realloc #ifndef and inhibit_libc is defined. Those are used by libgcc/emutls.c. gcc/ChangeLog: * tsystem.h (calloc, realloc): Declare when inhibit_libc. |
||
Richard Biener
|
52f8092f54 |
tree-optimization/112843 - update_stmt doing wrong things
The following removes range_query::update_stmt and its single invocation from update_stmt_operands. That function is not supposed to look beyond the raw stmt contents of the passed stmt since there's no guarantee about the rest of the IL. PR tree-optimization/112843 * tree-ssa-operands.cc (update_stmt_operands): Do not call update_stmt from ranger. * value-query.h (range_query::update_stmt): Remove. * gimple-range.h (gimple_ranger::update_stmt): Likewise. * gimple-range.cc (gimple_ranger::update_stmt): Likewise. |
||
xuli
|
8a5ef148bb |
RISC-V: Remove useless modes
gcc/ChangeLog: * config/riscv/riscv.md: Remove. |
||
Alexandre Oliva
|
953a9302d1 |
Revert "libsupc++: try cxa_thread_atexit_impl at runtime"
This reverts commit
|
||
Hans-Peter Nilsson
|
0d51e17791 |
gcc.dg/Wnonnull-4.c: Handle new overflow warning for 32-bit targets [PR112419]
PR testsuite/112419 * gcc.dg/Wnonnull-4.c (test_fda_n_5): Expect warning for exceeding maximum object size for 32-bit targets. |
||
GCC Administrator
|
3dd09cd9e1 | Daily bump. | ||
Alexandre Oliva
|
f0a90c7d73 |
Introduce strub: machine-independent stack scrubbing
This patch adds the strub attribute for function and variable types, command-line options, passes and adjustments to implement it, documentation, and tests. Stack scrubbing is implemented in a machine-independent way: functions with strub enabled are modified so that they take an extra stack watermark argument, that they update with their stack use, and the caller can then zero it out once it regains control, whether by return or exception. There are two ways to go about it: at-calls, that modifies the visible interface (signature) of the function, and internal, in which the body is moved to a clone, the clone undergoes the interface change, and the function becomes a wrapper, preserving its original interface, that calls the clone and then clears the stack used by it. Variables can also be annotated with the strub attribute, so that functions that read from them get stack scrubbing enabled implicitly, whether at-calls, for functions only usable within a translation unit, or internal, for functions whose interfaces must not be modified. There is a strict mode, in which functions that have their stack scrubbed can only call other functions with stack-scrubbing interfaces, or those explicitly marked as callable from strub contexts, so that an entire call chain gets scrubbing, at once or piecemeal depending on optimization levels. In the default mode, relaxed, this requirement is not enforced by the compiler. The implementation adds two IPA passes, one that assigns strub modes early on, another that modifies interfaces and adds calls to the builtins that jointly implement stack scrubbing. Another builtin, that obtains the stack pointer, is added for use in the implementation of the builtins, whether expanded inline or called in libgcc. There are new command-line options to change operation modes and to force the feature disabled; it is enabled by default, but it has no effect and is implicitly disabled if the strub attribute is never used. There are also options meant to use for testing the feature, enabling different strubbing modes for all (viable) functions. for gcc/ChangeLog * Makefile.in (OBJS): Add ipa-strub.o. (GTFILES): Add ipa-strub.cc. * builtins.def (BUILT_IN_STACK_ADDRESS): New. (BUILT_IN___STRUB_ENTER): New. (BUILT_IN___STRUB_UPDATE): New. (BUILT_IN___STRUB_LEAVE): New. * builtins.cc: Include ipa-strub.h. (STACK_STOPS, STACK_UNSIGNED): Define. (expand_builtin_stack_address): New. (expand_builtin_strub_enter): New. (expand_builtin_strub_update): New. (expand_builtin_strub_leave): New. (expand_builtin): Call them. * common.opt (fstrub=*): New options. * doc/extend.texi (strub): New type attribute. (__builtin_stack_address): New function. (Stack Scrubbing): New section. * doc/invoke.texi (-fstrub=*): New options. (-fdump-ipa-*): New passes. * gengtype-lex.l: Ignore multi-line pp-directives. * ipa-inline.cc: Include ipa-strub.h. (can_inline_edge_p): Test strub_inlinable_to_p. * ipa-split.cc: Include ipa-strub.h. (execute_split_functions): Test strub_splittable_p. * ipa-strub.cc, ipa-strub.h: New. * passes.def: Add strub_mode and strub passes. * tree-cfg.cc (gimple_verify_flow_info): Note on debug stmts. * tree-pass.h (make_pass_ipa_strub_mode): Declare. (make_pass_ipa_strub): Declare. (make_pass_ipa_function_and_variable_visibility): Fix formatting. * tree-ssa-ccp.cc (optimize_stack_restore): Keep restores before strub leave. * attribs.cc: Include ipa-strub.h. (decl_attributes): Support applying attributes to function type, rather than pointer type, at handler's request. (comp_type_attributes): Combine strub_comptypes and target comp_type results. * doc/tm.texi.in (TARGET_STRUB_USE_DYNAMIC_ARRAY): New. (TARGET_STRUB_MAY_USE_MEMSET): New. * doc/tm.texi: Rebuilt. * cgraph.h (symtab_node::reset): Add preserve_comdat_group param, with a default. * cgraphunit.cc (symtab_node::reset): Use it. for gcc/c-family/ChangeLog * c-attribs.cc: Include ipa-strub.h. (handle_strub_attribute): New. (c_common_attribute_table): Add strub. for gcc/ada/ChangeLog * gcc-interface/trans.cc: Include ipa-strub.h. (gigi): Make internal decls for targets of compiler-generated calls strub-callable too. (build_raise_check): Likewise. * gcc-interface/utils.cc: Include ipa-strub.h. (handle_strub_attribute): New. (gnat_internal_attribute_table): Add strub. for gcc/testsuite/ChangeLog * c-c++-common/strub-O0.c: New. * c-c++-common/strub-O1.c: New. * c-c++-common/strub-O2.c: New. * c-c++-common/strub-O2fni.c: New. * c-c++-common/strub-O3.c: New. * c-c++-common/strub-O3fni.c: New. * c-c++-common/strub-Og.c: New. * c-c++-common/strub-Os.c: New. * c-c++-common/strub-all1.c: New. * c-c++-common/strub-all2.c: New. * c-c++-common/strub-apply1.c: New. * c-c++-common/strub-apply2.c: New. * c-c++-common/strub-apply3.c: New. * c-c++-common/strub-apply4.c: New. * c-c++-common/strub-at-calls1.c: New. * c-c++-common/strub-at-calls2.c: New. * c-c++-common/strub-defer-O1.c: New. * c-c++-common/strub-defer-O2.c: New. * c-c++-common/strub-defer-O3.c: New. * c-c++-common/strub-defer-Os.c: New. * c-c++-common/strub-internal1.c: New. * c-c++-common/strub-internal2.c: New. * c-c++-common/strub-parms1.c: New. * c-c++-common/strub-parms2.c: New. * c-c++-common/strub-parms3.c: New. * c-c++-common/strub-relaxed1.c: New. * c-c++-common/strub-relaxed2.c: New. * c-c++-common/strub-short-O0-exc.c: New. * c-c++-common/strub-short-O0.c: New. * c-c++-common/strub-short-O1.c: New. * c-c++-common/strub-short-O2.c: New. * c-c++-common/strub-short-O3.c: New. * c-c++-common/strub-short-Os.c: New. * c-c++-common/strub-strict1.c: New. * c-c++-common/strub-strict2.c: New. * c-c++-common/strub-tail-O1.c: New. * c-c++-common/strub-tail-O2.c: New. * c-c++-common/torture/strub-callable1.c: New. * c-c++-common/torture/strub-callable2.c: New. * c-c++-common/torture/strub-const1.c: New. * c-c++-common/torture/strub-const2.c: New. * c-c++-common/torture/strub-const3.c: New. * c-c++-common/torture/strub-const4.c: New. * c-c++-common/torture/strub-data1.c: New. * c-c++-common/torture/strub-data2.c: New. * c-c++-common/torture/strub-data3.c: New. * c-c++-common/torture/strub-data4.c: New. * c-c++-common/torture/strub-data5.c: New. * c-c++-common/torture/strub-indcall1.c: New. * c-c++-common/torture/strub-indcall2.c: New. * c-c++-common/torture/strub-indcall3.c: New. * c-c++-common/torture/strub-inlinable1.c: New. * c-c++-common/torture/strub-inlinable2.c: New. * c-c++-common/torture/strub-ptrfn1.c: New. * c-c++-common/torture/strub-ptrfn2.c: New. * c-c++-common/torture/strub-ptrfn3.c: New. * c-c++-common/torture/strub-ptrfn4.c: New. * c-c++-common/torture/strub-pure1.c: New. * c-c++-common/torture/strub-pure2.c: New. * c-c++-common/torture/strub-pure3.c: New. * c-c++-common/torture/strub-pure4.c: New. * c-c++-common/torture/strub-run1.c: New. * c-c++-common/torture/strub-run2.c: New. * c-c++-common/torture/strub-run3.c: New. * c-c++-common/torture/strub-run4.c: New. * c-c++-common/torture/strub-run4c.c: New. * c-c++-common/torture/strub-run4d.c: New. * c-c++-common/torture/strub-run4i.c: New. * g++.dg/strub-run1.C: New. * g++.dg/torture/strub-init1.C: New. * g++.dg/torture/strub-init2.C: New. * g++.dg/torture/strub-init3.C: New. * gnat.dg/strub_attr.adb, gnat.dg/strub_attr.ads: New. * gnat.dg/strub_ind.adb, gnat.dg/strub_ind.ads: New. for libgcc/ChangeLog * Makefile.in (LIB2ADD): Add strub.c. * libgcc2.h (__strub_enter, __strub_update, __strub_leave): Declare. * strub.c: New. * libgcc-std.ver.in (__strub_enter): Add to GCC_14.0.0. (__strub_update, __strub_leave): Likewise. |
||
Jonathan Wakely
|
08448dc146 |
libstdc++: Add workaround to std::ranges::subrange [PR111948]
libstdc++-v3/ChangeLog: PR libstdc++/111948 * include/bits/ranges_util.h (subrange): Add constructor to _Size to aoid setting member in constructor. * testsuite/std/ranges/subrange/111948.cc: New test. |
||
Jonathan Wakely
|
45630fbcf7 |
libstdc++: Implement LWG 4016 for std::ranges::to
This implements the proposed resolution of LWG 4016, so that std::ranges::to does not use std::back_inserter and std::inserter. Instead it inserts at the back of the container directly, using the first supported one of emplace_back, push_back, emplace, and insert. Using emplace avoids creating a temporary that has to be moved into the container, for cases where the source range and the destination container do not have the same value type. libstdc++-v3/ChangeLog: * include/std/ranges (__detail::__container_insertable): Remove. (__detail::__container_inserter): Remove. (ranges::to): Use emplace_back or emplace, as per LWG 4016. * testsuite/std/ranges/conv/1.cc (Cont4, test_2_1_4): Check for use of emplace_back and emplace. |
||
Jonathan Wakely
|
5e8a30d8b8 |
libstdc++: Redefine __glibcxx_assert to work in C++23 constexpr
The changes in r14-5979 to support unknown references in constant expressions caused some test regressions. The way that __glibcxx_assert is defined for constant evaluation no longer works when _GLIBCXX_ASSERTIONS is defined. This change simplifies __glibcxx_assert so that there is only one check, rather than a constexpr one and a conditionally-enabled runtime one. The constexpr one does not need to use __builtin_unreachable to cause a compilation failure, because __glibcxx_assert_fail is not usable in constant expressions, so that will cause a failure too. As well as fixing the regressions, this makes the code for the assertions shorter and simpler, so should be quicker to compile, and might inline better too. libstdc++-v3/ChangeLog: * include/bits/c++config (__glibcxx_assert_fail): Declare even when assertions are not enabled. (__glibcxx_constexpr_assert): Remove macro. (__glibcxx_assert_impl): Remove macro. (_GLIBCXX_ASSERT_FAIL): New macro. (_GLIBCXX_DO_ASSERT): New macro. (__glibcxx_assert): Simplify to a single definition that works at runtime and during constant evaluation. * testsuite/21_strings/basic_string_view/element_access/char/back_constexpr_neg.cc: Adjust expected errors. * testsuite/21_strings/basic_string_view/element_access/char/constexpr_neg.cc: Likewise. * testsuite/21_strings/basic_string_view/element_access/char/front_constexpr_neg.cc: Likewise. * testsuite/21_strings/basic_string_view/element_access/wchar_t/back_constexpr_neg.cc: Likewise. * testsuite/21_strings/basic_string_view/element_access/wchar_t/constexpr_neg.cc: Likewise. * testsuite/21_strings/basic_string_view/element_access/wchar_t/front_constexpr_neg.cc: Likewise. * testsuite/21_strings/basic_string_view/modifiers/remove_prefix/debug.cc: Likewise. * testsuite/21_strings/basic_string_view/modifiers/remove_suffix/debug.cc: Likewise. * testsuite/23_containers/span/back_neg.cc: Likewise. * testsuite/23_containers/span/front_neg.cc: Likewise. * testsuite/23_containers/span/index_op_neg.cc: Likewise. * testsuite/26_numerics/lcm/105844.cc: Likewise. |
||
Juzhe-Zhong
|
2e7abd0962 |
RISC-V: Block VLSmodes according to TARGET_MAX_LMUL and BITS_PER_RISCV_VECTOR
This patch fixes ICE mentioned on PR112851 and PR112852. Actually these ICEs happens many times in full coverage testing. The ICE happens on: bug.c:84:1: internal compiler error: in partial_subreg_p, at rtl.h:3187 84 | } | ^ 0x11a7271 partial_subreg_p(machine_mode, machine_mode) ../../../../gcc/gcc/rtl.h:3187 gcc_checking_assert (ordered_p (outer_prec, inner_prec)); outer_prec is the PRECISION of RVVM1SImode inner_prec is the PRECISION of V64SImode when it is zvl512b. outer_prec is VLA mode with size (512, 512) inner_prec is VLS mode with size (2048, 0) Their precision/size relationship is not certain. So block VLSmodes according to TARGET_MAX_LMUL and BITS_PER_RISCV_VECTOR, then we never reaches the situation that comparing the precision/size between VLA size and VLS size that size > coeffs[0] of VLA mode. Note this patch cause following regression: FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize scan-assembler-not vset FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize scan-assembler-times li\\s+[a-x0-9]+,0\\s+ret 2 FAIL: gcc.target/riscv/rvv/base/cpymem-1.c check-function-bodies f3 FAIL: gcc.target/riscv/rvv/base/cpymem-2.c check-function-bodies f2 FAIL: gcc.target/riscv/rvv/base/cpymem-2.c check-function-bodies f3 1. cpymem check FAIL should be fixed on the testcase since the test is fragile which should be robostified. 2. pr111751.c is Vector cost model issue, and I will fix it in the following patch. For now, we should land this patch first (highest-priority) since it is fixing ICE. PR target/112851 PR target/112852 gcc/ChangeLog: * config/riscv/riscv-v.cc (vls_mode_valid_p): Block VLSmodes according TARGET_MAX_LMUL and BITS_PER_RISCV_VECTOR. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/consecutive-1.c: Add LMUL = 8 option. * gcc.target/riscv/rvv/autovec/vls/consecutive-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mod-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-10.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-11.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-12.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-13.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-14.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-15.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-16.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-17.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-5.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-7.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-8.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-9.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto. * gcc.target/riscv/rvv/autovec/zve32f-1.c: Adapt test. * gcc.target/riscv/rvv/autovec/pr112851.c: New test. * gcc.target/riscv/rvv/autovec/pr112852.c: New test. |
||
Jakub Jelinek
|
c73cc6fe62 |
libiberty: Fix build with GCC < 7
Tobias reported on IRC that the linker fails to build with GCC 4.8.5. In configure I've tried to use everything actually used in the sha1.c x86 hw implementation, but unfortunately I forgot about implicit function declarations. GCC before 7 did have <cpuid.h> header and bit_SHA define and __get_cpuid function defined inline, but it didn't define __get_cpuid_count, which compiled fine (and the configure test is intentionally compile time only) due to implicit function declaration, but then failed to link when linking the linker, because __get_cpuid_count wasn't defined anywhere. The following patch fixes that by using what autoconf uses in AC_CHECK_DECL to make sure the functions are declared. 2023-12-05 Jakub Jelinek <jakub@redhat.com> * configure.ac (HAVE_X86_SHA1_HW_SUPPORT): Verify __get_cpuid and __get_cpuid_count are not implicitly declared. * configure: Regenerated. |
||
David Faust
|
b8cf266f4c |
btf: avoid wrong DATASEC entries for extern vars [PR112849]
The process of creating BTF_KIND_DATASEC records involves iterating through variable declarations, determining which section they will be placed in, and creating an entry in the appropriate DATASEC record accordingly. For variables without e.g. an explicit __attribute__((section)), we use categorize_decl_for_section () to identify the appropriate named section and corresponding BTF_KIND_DATASEC record. This was incorrectly being done for 'extern' variable declarations as well as non-extern ones, which meant that extern variable declarations could result in BTF_KIND_DATASEC entries claiming the variable is allocated in some section such as '.bss' without any knowledge whether that is actually true. That resulted in errors building the Linux kernel BPF selftests. This patch corrects btf_collect_datasec () to avoid assuming a section for extern variables, and only emit BTF_KIND_DATASEC entries for them if they have a known section. gcc/ PR debug/112849 * btfout.cc (btf_collect_datasec): Avoid incorrectly creating an entry in a BTF_KIND_DATASEC record for extern variable decls without a known section. gcc/testsuite/ PR debug/112849 * gcc.dg/debug/btf/btf-datasec-3.c: New test. |
||
Jakub Jelinek
|
9610ba7b6f |
libgfortran: Fix -Wincompatible-pointer-types errors
As reported, libgfortran fails to build on targets where int32_t and int are different types, because it uses int vs. GFC_INTEGER_4 (under hood int32_t) interchangeably. The following patch fixes that. 2023-12-05 Florian Weimer <fweimer@redhat.com> Jakub Jelinek <jakub@redhat.com> * io/list_read.c (list_formatted_read_scalar) <case BT_CLASS>: Change types of unit and noiostat to GFC_INTEGER_4 from int, change type of child_iostat from to GFC_INTEGER_4 * from int *, formatting fixes. (nml_read_obj): Likewise. * io/write.c (list_formatted_write_scalar) <case BT_CLASS>: Likewise. (nml_write_obj): Likewise. * io/transfer.c (unformatted_read, unformatted_write): Likewise. |
||
Jakub Jelinek
|
59be79fd59 |
c++: Further #pragma GCC unroll C++ fix [PR112795]
When committing the #pragma GCC unroll patch, I found I forgot one spot for diagnosting the invalid unrolls - if #pragma GCC unroll argument is dependent and the pragma is before a range for loop, the unroll tree (now, before one converted form ushort) is saved into RANGE_FOR_UNROLL and tsubst_stmt was RECURing on it, but didn't diagnose if it was invalid and so we ICEd later in the middle-end when ANNOTATE_EXPR had unexpected argument. The following patch fixes that. So that the diagnostics isn't done in 3 different places, the patch introduces a new function that both cp_parser_pragma_unroll and instantiation of ANNOTATE_EXPR and RANGE_FOR_STMT can use. 2023-12-05 Jakub Jelinek <jakub@redhat.com> PR c++/112795 * cp-tree.h (cp_check_pragma_unroll): Declare. * semantics.cc (cp_check_pragma_unroll): New function. * parser.cc (cp_parser_pragma_unroll): Use cp_check_pragma_unroll. * pt.cc (tsubst_expr) <case ANNOTATE_EXPR>: Likewise. (tsubst_stmt) <case RANGE_FOR_STMT>: Likwsie. * g++.dg/ext/unroll-2.C: Use { target c++11 } instead of dg-skip-if for -std=gnu++98. * g++.dg/ext/unroll-3.C: Likewise. * g++.dg/ext/unroll-7.C: New test. * g++.dg/ext/unroll-8.C: New test. |
||
Jakub Jelinek
|
58d5546af9 |
rs6000: Canonicalize copysign (x, -1) back to -abs (x) in the backend [PR112606]
The middle-end has been changed quite recently to canonicalize -abs (x) to copysign (x, -1) rather than the other way around. While I agree with that at GIMPLE level, since it matches the GIMPLE goal of as few operations as possible for a canonical form (-abs (x) is 2 GIMPLE statements, copysign (x, -1) is just one), I must say I don't really like that being done on RTL as well (or at least not canonicalizing (COPYSIGN x, negative) back to (NEG (ABS x))), because on most targets most of floating point constants need to be loaded from memory, there are a few exceptions but -1 is often not one of them. Anyway, the following patch fixes the rs6000 regression caused by the change in GIMPLE canonicalization (i.e. the desirable one). As rs6000 clearly prefers -abs (x) form because it has a single instruction to do that while it also has copysign instruction, but that requires loading the -1 from memory, the following patch just ensures the copysign expander can actually see the floating point constant and in that case emits the -abs (x) code (or in the hypothetical case of copysign with non-negative constant abs (x) - but there copysign (x, 1) in GIMPLE is canonicalized to abs (x)), otherwise forces the operand to be the expected gpc_reg_operand and does what it did before. 2023-12-05 Jakub Jelinek <jakub@redhat.com> PR target/112606 * config/rs6000/rs6000.md (copysign<mode>3): Change predicate of the last argument from gpc_reg_operand to any_operand. If operands[2] is CONST_DOUBLE, emit abs or neg abs depending on its sign, otherwise if it doesn't satisfy gpc_reg_operand, force it to REG using copy_to_mode_reg. |
||
Harald Anlauf
|
9c3a880fee |
Fortran: allow RESTRICT qualifier also for optional arguments [PR100988]
gcc/fortran/ChangeLog: PR fortran/100988 * gfortran.h (IS_PROC_POINTER): New macro. * trans-types.cc (gfc_sym_type): Use macro in determination if the restrict qualifier can be used for a dummy variable. Fix logic to allow the restrict qualifier also for optional arguments, and to not apply it to pointer or proc_pointer arguments. gcc/testsuite/ChangeLog: PR fortran/100988 * gfortran.dg/coarray_poly_6.f90: Adjust pattern. * gfortran.dg/coarray_poly_7.f90: Likewise. * gfortran.dg/coarray_poly_8.f90: Likewise. * gfortran.dg/missing_optional_dummy_6a.f90: Likewise. * gfortran.dg/pr100988.f90: New test. Co-authored-by: Tobias Burnus <tobias@codesourcery.com> |
||
Richard Sandiford
|
1dad3df1e7 |
Restore build with GCC 4.8 to GCC 5
GCC 5 and earlier applied array-to-pointer decay too early,
which affected the new attribute namespace code. A reduced
example of the construct that the attribute code uses is:
struct S { template<__SIZE_TYPE__ N> S(int (&)[N]); };
struct T { int a; S b; };
int a[] = { 1 };
T t = { 1, a };
This was fixed by
|
||
Jonathan Wakely
|
3cd73543a1 |
libstdc++: Disable std::formatter::set_debug_format [PR112832]
All set_debug_format member functions should be guarded by the __cpp_lib_formatting_ranges macro (which is not defined yet). libstdc++-v3/ChangeLog: PR libstdc++/112832 * include/std/format (formatter::set_debug_format): Ensure this member is defined conditionally for all specializations. * testsuite/std/format/formatter/112832.cc: New test. |
||
Will Hawkins
|
9fff752695 |
libstdc++: Add test for LWG Issue 3897
Add a test to verify that the implementation of inout_ptr is not vulnerable to LWG Issue 3897. libstdc++-v3/ChangeLog: * testsuite/20_util/smartptr.adapt/inout_ptr/2.cc: Add check for LWG Issue 3897. Co-authored-by: Jonathan Wakely <jwakely@redhat.com> |
||
Jakub Jelinek
|
e5153e7d63 |
c++: Implement C++ DR 2262 - Attributes for asm-definition [PR110734]
Seems in 2017 attribute-specifier-seq[opt] was added to asm-declaration and the change was voted in as a DR. The following patch implements it by parsing the attributes and warning about them. I found one attribute parsing bug I'll send a fix for momentarily. And there is another thing I wonder about: with -Wno-attributes= we are supposed to ignore the attributes altogether, but we are actually still warning about them when we emit these generic warnings about ignoring all attributes which appertain to this and that (perhaps with some exceptions we first remove from the attribute chain), like: void foo () { [[foo::bar]]; } with -Wattributes -Wno-attributes=foo::bar Shouldn't we call some helper function in cases like this and warn not when std_attrs (or how the attribute chain var is called) is non-NULL, but if it is non-NULL and contains at least one non-attribute_ignored_p attribute? cp_parser_declaration at least tries: if (std_attrs != NULL_TREE && !attribute_ignored_p (std_attrs)) warning_at (make_location (attrs_loc, attrs_loc, parser->lexer), OPT_Wattributes, "attribute ignored"); but attribute_ignored_p here checks the first attribute rather than the whole chain. So it will incorrectly not warn if there is an ignored attribute followed by non-ignored. 2023-12-05 Jakub Jelinek <jakub@redhat.com> PR c++/110734 * parser.cc (cp_parser_block_declaration): Implement C++ DR 2262 - Attributes for asm-definition. Call cp_parser_asm_definition even if RID_ASM token is only seen after sequence of standard attributes. (cp_parser_asm_definition): Parse standard attributes before RID_ASM token and warn for them with -Wattributes. * g++.dg/DRs/dr2262.C: New test. * g++.dg/cpp0x/gen-attrs-76.C (foo, bar): Don't expect errors on attributes on asm definitions. * g++.dg/gomp/attrs-11.C: Remove 2 expected errors. |
||
Richard Biener
|
d9403153f9 |
middle-end/112860 - -fgimple can skip ISEL
The following makes sure we don't skip ISEL. PR middle-end/112860 * passes.cc (should_skip_pass_p): Do not skip ISEL. |
||
Gaius Mulley
|
805be8fbea |
PR modula2/112865 IM and RE fails to skip type equivalences
This patch skip type equivalences when checking IM and RE ISO M2 standard functions for complex data type operands. gcc/m2/ChangeLog: PR modula2/112865 * gm2-compiler/M2Quads.mod (BuildReFunction): Use GetDType to retrieve the type of the operand when converting the complex type to its scalar equivalent. (BuildImFunction): Use GetDType to retrieve the type of the operand when converting the complex type to its scalar equivalent. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com> |
||
Richard Biener
|
7e40497805 |
sanitizer/111736 - skip ASAN for globals in alternate address-space
PR sanitizer/111736 * asan.cc (asan_protect_global): Do not protect globals in non-generic address-space. |
||
Richard Biener
|
1e6c4aa479 |
ipa/92606 - IPA ICF merging variables in different address-space
The following aovids merging variables that are put in different address-spaces. PR ipa/92606 * ipa-icf.cc (sem_variable::equals_wpa): Compare address-spaces. |
||
Richard Biener
|
68d32d0203 |
middle-end/112830 - avoid gimplifying non-default addr-space assign to memcpy
The following avoids turning aggregate copy involving non-default address-spaces to memcpy since that is not prepared for that. GIMPLE verification no longer accepts WITH_SIZE_EXPR in aggregate copies, the following re-allows that for the RHS. I also needed to adjust one assert in DCE. get_memory_address is used for string builtin expansion, so instead of fixing that up for non-generic address-spaces I've put an assert there. I'll note that the same issue exists for initialization from an empty CTOR which we gimplify to a memset call but since we are not prepared to handle RTL expansion of the original VLA init and I failed to provide test coverage (without extending the GNU C extension for VLA structs) and the Ada frontend (or other frontends) to not have address-space support the patch instead asserts we only see generic address-spaces there. PR middle-end/112830 * gimplify.cc (gimplify_modify_expr): Avoid turning aggregate copy of non-generic address-spaces to memcpy. (gimplify_modify_expr_to_memcpy): Assert we are dealing with a copy inside the generic address-space. (gimplify_modify_expr_to_memset): Likewise. * tree-cfg.cc (verify_gimple_assign_single): Allow WITH_SIZE_EXPR as part of the RHS of an assignment. * builtins.cc (get_memory_address): Assert we are dealing with the generic address-space. * tree-ssa-dce.cc (ref_may_be_aliased): Handle WITH_SIZE_EXPR. * gcc.target/avr/pr112830.c: New testcase. * gcc.target/i386/pr112830.c: Likewise. |
||
Richard Biener
|
8ff02df629 |
tree-optimization/112856 - fix LC SSA after loop header copying
When loop header copying unloops loops we have to possibly fixup LC SSA. I've take the opportunity to streamline the unloop_loops API, removing the use of a ivcanon local global variable. PR tree-optimization/109689 PR tree-optimization/112856 * cfgloopmanip.h (unloop_loops): Adjust API. * tree-ssa-loop-ivcanon.cc (unloop_loops): Take edges_to_remove as parameter. (canonicalize_induction_variables): Adjust. (tree_unroll_loops_completely): Likewise. * tree-ssa-loop-ch.cc (ch_base::copy_headers): Rewrite into LC SSA if we unlooped some loops and we are in LC SSA. * gcc.dg/torture/pr109689.c: New testcase. * gcc.dg/torture/pr112856.c: Likewise. |
||
Jakub Jelinek
|
e0786ca9a1 |
i386: Fix -fcf-protection -Os ICE due to movabsq peephole2 [PR112845]
The following testcase ICEs in the movabsq $(i32 << shift), r64 peephole2 I've added a while back to use smaller code than movabsq if possible. If i32 is 0xfa1e0ff3 and shift is not divisible by 8, then it creates an invalid insn (as 0xfa1e0ff3 CONST_INT is not allowed as x86_64_immediate_operand nor x86_64_zext_immediate_operand), the peephole2 even triggers on it again and again (this time with shift 0) until it gives up. The following patch fixes that. As ix86_endbr_immediate_operand needs a CONST_INT and it is hopefully rare, I chose to use FAIL rather than handling it in the condition (where I'd probably need to call ctz_hwi again etc.). 2023-12-05 Jakub Jelinek <jakub@redhat.com> PR target/112845 * config/i386/i386.md (movabsq $(i32 << shift), r64 peephole2): FAIL if the new immediate is ix86_endbr_immediate_operand. |
||
Richard Sandiford
|
c1c267dfcd |
aarch64: Add support for SME2 intrinsics
This patch adds support for the SME2 <arm_sme.h> intrinsics. The convention I've used is to put stuff in aarch64-sve-builtins-sme.* if it relates to ZA, ZT0, the streaming vector length, or other such SME state. Things that operate purely on predicates and vectors go in aarch64-sve-builtins-sve2.* instead. Some of these will later be picked up for SVE2p1. We previously used Uph internally as a constraint for 16-bit immediates to atomic instructions. However, we need a user-facing constraint for the upper predicate registers (already available as PR_HI_REGS), and Uph makes a natural pair with the existing Upl. gcc/ * config/aarch64/aarch64.h (TARGET_STREAMING_SME2): New macro. (P_ALIASES): Likewise. (REGISTER_NAMES): Add pn aliases of the predicate registers. (W8_W11_REGNUM_P): New macro. (W8_W11_REGS): New register class. (REG_CLASS_NAMES, REG_CLASS_CONTENTS): Update accordingly. * config/aarch64/aarch64.cc (aarch64_print_operand): Add support for %K, which prints a predicate as a counter. Handle tuples of predicates. (aarch64_regno_regclass): Handle W8_W11_REGS. (aarch64_class_max_nregs): Likewise. * config/aarch64/constraints.md (Uci, Uw2, Uw4): New constraints. (x, y): Move further up file. (Uph): Redefine as the high predicate registers, renaming the old constraint to... (Uih): ...this. * config/aarch64/predicates.md (const_0_to_7_operand): New predicate. (const_0_to_4_step_4_operand, const_0_to_6_step_2_operand): Likewise. (const_0_to_12_step_4_operand, const_0_to_14_step_2_operand): Likewise. (aarch64_simd_shift_imm_qi): Use const_0_to_7_operand. * config/aarch64/iterators.md (VNx16SI_ONLY, VNx8SI_ONLY) (VNx8DI_ONLY, SVE_FULL_BHSIx2, SVE_FULL_HF, SVE_FULL_SIx2_SDIx4) (SVE_FULL_BHS, SVE_FULLx24, SVE_DIx24, SVE_BHSx24, SVE_Ix24) (SVE_Fx24, SVE_SFx24, SME_ZA_BIx24, SME_ZA_BHIx124, SME_ZA_BHIx24) (SME_ZA_HFx124, SME_ZA_HFx24, SME_ZA_HIx124, SME_ZA_HIx24) (SME_ZA_SDIx24, SME_ZA_SDFx24): New mode iterators. (UNSPEC_REVD, UNSPEC_CNTP_C, UNSPEC_PEXT, UNSPEC_PEXTx2): New unspecs. (UNSPEC_PSEL, UNSPEC_PTRUE_C, UNSPEC_SQRSHR, UNSPEC_SQRSHRN) (UNSPEC_SQRSHRU, UNSPEC_SQRSHRUN, UNSPEC_UQRSHR, UNSPEC_UQRSHRN) (UNSPEC_UZP, UNSPEC_UZPQ, UNSPEC_ZIP, UNSPEC_ZIPQ, UNSPEC_BFMLSLB) (UNSPEC_BFMLSLT, UNSPEC_FCVTN, UNSPEC_FDOT, UNSPEC_SQCVT): Likewise. (UNSPEC_SQCVTN, UNSPEC_SQCVTU, UNSPEC_SQCVTUN, UNSPEC_UQCVT): Likewise. (UNSPEC_SME_ADD, UNSPEC_SME_ADD_WRITE, UNSPEC_SME_BMOPA): Likewise. (UNSPEC_SME_BMOPS, UNSPEC_SME_FADD, UNSPEC_SME_FDOT, UNSPEC_SME_FVDOT) (UNSPEC_SME_FMLA, UNSPEC_SME_FMLS, UNSPEC_SME_FSUB, UNSPEC_SME_READ) (UNSPEC_SME_SDOT, UNSPEC_SME_SVDOT, UNSPEC_SME_SMLA, UNSPEC_SME_SMLS) (UNSPEC_SME_SUB, UNSPEC_SME_SUB_WRITE, UNSPEC_SME_SUDOT): Likewise. (UNSPEC_SME_SUVDOT, UNSPEC_SME_UDOT, UNSPEC_SME_UVDOT): Likewise. (UNSPEC_SME_UMLA, UNSPEC_SME_UMLS, UNSPEC_SME_USDOT): Likewise. (UNSPEC_SME_USVDOT, UNSPEC_SME_WRITE): Likewise. (Vetype, VNARROW, V2XWIDE, Ventype, V_INT_EQUIV, v_int_equiv) (VSINGLE, vsingle, b): Add tuple modes. (v2xwide, za32_offset_range, za64_offset_range, za32_long) (za32_last_offset, vg_modifier, z_suffix, aligned_operand) (aligned_fpr): New mode attributes. (SVE_INT_BINARY_MULTI, SVE_INT_BINARY_SINGLE, SVE_INT_BINARY_MULTI) (SVE_FP_BINARY_MULTI): New int iterators. (SVE_BFLOAT_TERNARY_LONG): Add UNSPEC_BFMLSLB and UNSPEC_BFMLSLT. (SVE_BFLOAT_TERNARY_LONG_LANE): Likewise. (SVE_WHILE_ORDER, SVE2_INT_SHIFT_IMM_NARROWxN, SVE_QCVTxN) (SVE2_SFx24_UNARY, SVE2_x24_PERMUTE, SVE2_x24_PERMUTEQ) (UNSPEC_REVD_ONLY, SME2_INT_MOP, SME2_BMOP, SME_BINARY_SLICE_SDI) (SME_BINARY_SLICE_SDF, SME_BINARY_WRITE_SLICE_SDI, SME_INT_DOTPROD) (SME_INT_DOTPROD_LANE, SME_FP_DOTPROD, SME_FP_DOTPROD_LANE) (SME_INT_TERNARY_SLICE, SME_FP_TERNARY_SLICE, BHSD_BITS) (LUTI_BITS): New int iterators. (optab, sve_int_op): Handle the new unspecs. (sme_int_op, has_16bit_form): New int attributes. (bits_etype): Handle 64. * config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): New unspec. (UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise. (UNSPEC_STNT1_SVE_COUNT): Likewise. * config/aarch64/atomics.md (cas_short_expected_imm): Use Uhi rather than Uph for HImode immediates. * config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>) (@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>) (@aarch64_stnt1<SVE_FULLx24:mode>): New patterns. (@aarch64_<sur>dot_prod_lane<vsi2qi>): Extend to... (@aarch64_<sur>dot_prod_lane<SVE_FULL_SDI:mode><SVE_FULL_BHI:mode>) (@aarch64_<sur>dot_prod_lane<VNx4SI_ONLY:mode><VNx16QI_ONLY:mode>): ...these new patterns. (SVE_WHILE_B, SVE_WHILE_B_X2, SVE_WHILE_C): New constants. Add SVE_WHILE_B to existing while patterns. * config/aarch64/aarch64-sve2.md (@aarch64_sve_ptrue_c<BHSD_BITS>) (@aarch64_sve_pext<BHSD_BITS>, @aarch64_sve_pext<BHSD_BITS>x2) (@aarch64_sve_psel<BHSD_BITS>, *aarch64_sve_psel<BHSD_BITS>_plus) (@aarch64_sve_cntp_c<BHSD_BITS>, <frint_pattern><mode>2) (<optab><mode>3, *<optab><mode>3, @aarch64_sve_single_<optab><mode>) (@aarch64_sve_<sve_int_op><mode>): New patterns. (@aarch64_sve_single_<sve_int_op><mode>, @aarch64_sve_<su>clamp<mode>) (*aarch64_sve_<su>clamp<mode>_x, @aarch64_sve_<su>clamp_single<mode>) (@aarch64_sve_fclamp<mode>, *aarch64_sve_fclamp<mode>_x) (@aarch64_sve_fclamp_single<mode>, <optab><mode><v2xwide>2) (@aarch64_sve_<sur>dotvnx4sivnx8hi): New patterns. (@aarch64_sve_<maxmin_uns_op><mode>): Likewise. (*aarch64_sve_<maxmin_uns_op><mode>): Likewise. (@aarch64_sve_single_<maxmin_uns_op><mode>): Likewise. (aarch64_sve_fdotvnx4sfvnx8hf): Likewise. (aarch64_fdot_prod_lanevnx4sfvnx8hf): Likewise. (@aarch64_sve_<optab><VNx16QI_ONLY:mode><VNx16SI_ONLY:mode>): Likewise. (@aarch64_sve_<optab><VNx8HI_ONLY:mode><VNx8SI_ONLY:mode>): Likewise. (@aarch64_sve_<optab><VNx8HI_ONLY:mode><VNx8DI_ONLY:mode>): Likewise. (truncvnx8sf<mode>2, @aarch64_sve_cvtn<mode>): Likewise. (<optab><v_int_equiv><mode>2, <optab><mode><v_int_equiv>2): Likewise. (@aarch64_sve_sel<mode>): Likewise. (@aarch64_sve_while<while_optab_cmp>_b<BHSD_BITS>_x2): Likewise. (@aarch64_sve_while<while_optab_cmp>_c<BHSD_BITS>): Likewise. (@aarch64_pred_<optab><mode>, @cond_<optab><mode>): Likewise. (@aarch64_sve_<optab><mode>): Likewise. * config/aarch64/aarch64-sme.md (@aarch64_sme_<optab><mode><mode>) (*aarch64_sme_<optab><mode><mode>_plus, @aarch64_sme_read<mode>) (*aarch64_sme_read<mode>_plus, @aarch64_sme_write<mode>): New patterns. (*aarch64_sme_write<mode>_plus aarch64_sme_zero_zt0): Likewise. (@aarch64_sme_<optab><mode>, *aarch64_sme_<optab><mode>_plus) (@aarch64_sme_single_<optab><mode>): Likewise. (*aarch64_sme_single_<optab><mode>_plus): Likewise. (@aarch64_sme_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>) (*aarch64_sme_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus) (@aarch64_sme_single_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>) (*aarch64_sme_single_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus) (@aarch64_sme_single_sudot<VNx4SI_ONLY:mode><SME_ZA_BIx24:mode>) (*aarch64_sme_single_sudot<VNx4SI_ONLY:mode><SME_ZA_BIx24:mode>_plus) (@aarch64_sme_lane_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>) (*aarch64_sme_lane_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus) (@aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_BHI:mode>) (*aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_BHI:mode>_plus) (@aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>) (*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>_plus) (@aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>) (*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>_plus) (@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx124:mode>) (*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx124:mode>) (@aarch64_sme_<optab><VNx2DI_ONLY:mode><VNx8HI_ONLY:mode>) (*aarch64_sme_<optab><VNx2DI_ONLY:mode><VNx8HI_ONLY:mode>_plus) (@aarch64_sme_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>) (*aarch64_sme_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>_plus) (@aarch64_sme_single_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>) (*aarch64_sme_single_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>_plus) (@aarch64_sme_lane_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx124:mode>) (*aarch64_sme_lane_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx124:mode>) (@aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx8HI_ONLY:mode>) (@aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx4SI_ONLY:mode>) (@aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>) (*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus) (@aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>) (*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus) (@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>) (*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus) (@aarch64_sme_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>) (*aarch64_sme_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>_plus) (@aarch64_sme_single_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>) (*aarch64_sme_single_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>_plus) (@aarch64_sme_lane_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>) (*aarch64_sme_lane_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>) (@aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_HF:mode>) (*aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_HF:mode>_plus) (@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx124:mode>) (*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx124:mode>) (@aarch64_sme_lut<LUTI_BITS><mode>): Likewise. (UNSPEC_SME_LUTI): New unspec. * config/aarch64/aarch64-sve-builtins.def (single): New mode suffix. (c8, c16, c32, c64): New type suffixes. (vg1x2, vg1x4, vg2, vg2x1, vg2x2, vg2x4, vg4, vg4x1, vg4x2) (vg4x4): New group suffixes. * config/aarch64/aarch64-sve-builtins.h (CP_READ_ZT0) (CP_WRITE_ZT0): New constants. (get_svbool_t): Delete. (function_resolver::report_mismatched_num_vectors): New member function. (function_resolver::resolve_conversion): Likewise. (function_resolver::infer_predicate_type): Likewise. (function_resolver::infer_64bit_scalar_integer_pair): Likewise. (function_resolver::require_matching_predicate_type): Likewise. (function_resolver::require_nonscalar_type): Likewise. (function_resolver::finish_opt_single_resolution): Likewise. (function_resolver::require_derived_vector_type): Add an expected_num_vectors parameter. (function_expander::map_to_rtx_codes): Add an extra parameter for unconditional FP unspecs. (function_instance::gp_type_index): New member function. (function_instance::gp_type): Likewise. (function_instance::gp_mode): Handle multi-vector operations. * config/aarch64/aarch64-sve-builtins.cc (TYPES_all_count) (TYPES_all_pred_count, TYPES_c, TYPES_bhs_data, TYPES_bhs_widen) (TYPES_hs_data, TYPES_cvt_h_s_float, TYPES_cvt_s_s, TYPES_qcvt_x2) (TYPES_qcvt_x4, TYPES_qrshr_x2, TYPES_qrshru_x2, TYPES_qrshr_x4) (TYPES_qrshru_x4, TYPES_while_x, TYPES_while_x_c, TYPES_s_narrow_fsu) (TYPES_za_s_b_signed, TYPES_za_s_b_unsigned, TYPES_za_s_b_integer) (TYPES_za_s_h_integer, TYPES_za_s_h_data, TYPES_za_s_unsigned) (TYPES_za_s_float, TYPES_za_s_data, TYPES_za_d_h_integer): New type macros. (groups_x2, groups_x12, groups_x4, groups_x24, groups_x124) (groups_vg1x2, groups_vg1x4, groups_vg1x24, groups_vg2, groups_vg4) (groups_vg24): New group arrays. (function_instance::reads_global_state_p): Handle CP_READ_ZT0. (function_instance::modifies_global_state_p): Handle CP_WRITE_ZT0. (add_shared_state_attribute): Handle zt0 state. (function_builder::add_overloaded_functions): Skip MODE_single for non-tuple groups. (function_resolver::report_mismatched_num_vectors): New function. (function_resolver::resolve_to): Add a fallback error message for the general two-type case. (function_resolver::resolve_conversion): New function. (function_resolver::infer_predicate_type): Likewise. (function_resolver::infer_64bit_scalar_integer_pair): Likewise. (function_resolver::require_matching_predicate_type): Likewise. (function_resolver::require_matching_vector_type): Specifically diagnose mismatched vector counts. (function_resolver::require_derived_vector_type): Add an expected_num_vectors parameter. Extend to handle cases where tuples are expected. (function_resolver::require_nonscalar_type): New function. (function_resolver::check_gp_argument): Use gp_type_index rather than hard-coding VECTOR_TYPE_svbool_t. (function_resolver::finish_opt_single_resolution): New function. (function_checker::require_immediate_either_or): Remove hard-coded constants. (function_expander::direct_optab_handler): New function. (function_expander::use_pred_x_insn): Only add a strictness flag is the insn has an operand for it. (function_expander::map_to_rtx_codes): Take an unconditional FP unspec as an extra parameter. Handle tuples and MODE_single. (function_expander::map_to_unspecs): Handle tuples and MODE_single. * config/aarch64/aarch64-sve-builtins-functions.h (read_zt0) (write_zt0): New typedefs. (full_width_access::memory_vector): Use the function's vectors_per_tuple. (rtx_code_function_base): Add an optional unconditional FP unspec. (rtx_code_function::expand): Update accordingly. (rtx_code_function_rotated::expand): Likewise. (unspec_based_function_exact_insn::expand): Use tuple_mode instead of vector_mode. (unspec_based_uncond_function): New typedef. (cond_or_uncond_unspec_function): New class. (sme_1mode_function::expand): Handle single forms. (sme_2mode_function_t): Likewise, adding a template parameter for them. (sme_2mode_function): Update accordingly. (sme_2mode_lane_function): New typedef. (multireg_permute): New class. (class integer_conversion): Likewise. (while_comparison::expand): Handle svcount_t and svboolx2_t results. * config/aarch64/aarch64-sve-builtins-shapes.h (binary_int_opt_single_n, binary_opt_single_n, binary_single) (binary_za_slice_lane, binary_za_slice_int_opt_single) (binary_za_slice_opt_single, binary_za_slice_uint_opt_single) (binaryx, clamp, compare_scalar_count, count_pred_c) (dot_za_slice_int_lane, dot_za_slice_lane, dot_za_slice_uint_lane) (extract_pred, inherent_zt, ldr_zt, read_za, read_za_slice) (select_pred, shift_right_imm_narrowxn, storexn, str_zt) (unary_convertxn, unary_za_slice, unaryxn, write_za) (write_za_slice): Declare. * config/aarch64/aarch64-sve-builtins-shapes.cc (za_group_is_pure_overload): New function. (apply_predication): Use the function's gp_type for the predicate, instead of hard-coding the use of svbool_t. (parse_element_type): Add support for "c" (svcount_t). (parse_type): Add support for "c0" and "c1" (conversion destination and source types). (binary_za_slice_lane_base): New class. (binary_za_slice_opt_single_base): Likewise. (load_contiguous_base::resolve): Pass the group suffix to r.resolve. (luti_lane_zt_base): New class. (binary_int_opt_single_n, binary_opt_single_n, binary_single) (binary_za_slice_lane, binary_za_slice_int_opt_single) (binary_za_slice_opt_single, binary_za_slice_uint_opt_single) (binaryx, clamp): New shapes. (compare_scalar_def::build): Allow the return type to be a tuple. (compare_scalar_def::expand): Pass the group suffix to r.resolve. (compare_scalar_count, count_pred_c, dot_za_slice_int_lane) (dot_za_slice_lane, dot_za_slice_uint_lane, extract_pred, inherent_zt) (ldr_zt, read_za, read_za_slice, select_pred, shift_right_imm_narrowxn) (storexn, str_zt): New shapes. (ternary_qq_lane_def, ternary_qq_opt_n_def): Replace with... (ternary_qq_or_011_lane_def, ternary_qq_opt_n_or_011_def): ...these new classes. Allow a second suffix that specifies the type of the second vector argument, and that is used to derive the third. (unary_def::build): Extend to handle tuple types. (unary_convert_def::build): Use the new c0 and c1 format specifiers. (unary_convertxn, unary_za_slice, unaryxn, write_za): New shapes. (write_za_slice): Likewise. * config/aarch64/aarch64-sve-builtins-base.cc (svbic_impl::expand) (svext_bhw_impl::expand): Update call to map_to_rtx_costs. (svcntp_impl::expand): Handle svcount_t variants. (svcvt_impl::expand): Handle unpredicated conversions separately, dealing with tuples. (svdot_impl::expand): Handle 2-way dot products. (svdotprod_lane_impl::expand): Likewise. (svld1_impl::fold): Punt on tuple loads. (svld1_impl::expand): Handle tuple loads. (svldnt1_impl::expand): Likewise. (svpfalse_impl::fold): Punt on svcount_t forms. (svptrue_impl::fold): Likewise. (svptrue_impl::expand): Handle svcount_t forms. (svrint_impl): New class. (svsel_impl::fold): Punt on tuple forms. (svsel_impl::expand): Handle tuple forms. (svst1_impl::fold): Punt on tuple loads. (svst1_impl::expand): Handle tuple loads. (svstnt1_impl::expand): Likewise. (svwhilelx_impl::fold): Punt on tuple forms. (svdot_lane): Use UNSPEC_FDOT. (svmax, svmaxnm, svmin, svminmm): Add unconditional FP unspecs. (rinta, rinti, rintm, rintn, rintp, rintx, rintz): Use svrint_impl. * config/aarch64/aarch64-sve-builtins-base.def (svcreate2, svget2) (svset2, svundef2): Add _b variants. (svcvt): Use unary_convertxn. (svdot): Use ternary_qq_opt_n_or_011. (svdot_lane): Use ternary_qq_or_011_lane. (svmax, svmaxnm, svmin, svminnm): Use binary_opt_single_n. (svpfalse): Add a form that returns svcount_t results. (svrinta, svrintm, svrintn, svrintp): Use unaryxn. (svsel): Use binaryxn. (svst1, svstnt1): Use storexn. * config/aarch64/aarch64-sve-builtins-sme.h (svadd_za, svadd_write_za, svbmopa_za, svbmops_za, svdot_za) (svdot_lane_za, svldr_zt, svluti2_lane_zt, svluti4_lane_zt) (svmla_za, svmla_lane_za, svmls_za, svmls_lane_za, svread_za) (svstr_zt, svsub_za, svsub_write_za, svsudot_za, svsudot_lane_za) (svsuvdot_lane_za, svusdot_za, svusdot_lane_za, svusvdot_lane_za) (svvdot_lane_za, svwrite_za, svzero_zt): Declare. * config/aarch64/aarch64-sve-builtins-sme.cc (load_store_za_base): Rename to... (load_store_za_zt0_base): ...this and extend to tuples. (load_za_base, store_za_base): Update accordingly. (expand_ldr_str_zt0): New function. (svldr_zt_impl, svluti_lane_zt_impl, svread_za_impl, svstr_zt_impl) (svsudot_za_impl, svwrite_za_impl, svzero_zt_impl): New classes. (svadd_za, svadd_write_za, svbmopa_za, svbmops_za, svdot_za) (svdot_lane_za, svldr_zt, svluti2_lane_zt, svluti4_lane_zt) (svmla_za, svmla_lane_za, svmls_za, svmls_lane_za, svread_za) (svstr_zt, svsub_za, svsub_write_za, svsudot_za, svsudot_lane_za) (svsuvdot_lane_za, svusdot_za, svusdot_lane_za, svusvdot_lane_za) (svvdot_lane_za, svwrite_za, svzero_zt): New functions. * config/aarch64/aarch64-sve-builtins-sme.def: Add SME2 intrinsics. * config/aarch64/aarch64-sve-builtins-sve2.h (svbfmlslb, svbfmlslb_lane, svbfmlslt, svbfmlslt_lane, svclamp) (svcvtn, svpext, svpsel, svqcvt, svqcvtn, svqrshr, svqrshrn) (svqrshru, svqrshrun, svrevd, svunpk, svuzp, svuzpq, svzip) (svzipq): Declare. * config/aarch64/aarch64-sve-builtins-sve2.cc (svclamp_impl) (svcvtn_impl, svpext_impl, svpsel_impl): New classes. (svqrshl_impl::fold): Update for change to svrshl shape. (svrshl_impl::fold): Punt on tuple forms. (svsqadd_impl::expand): Update call to map_to_rtx_codes. (svunpk_impl): New class. (svbfmlslb, svbfmlslb_lane, svbfmlslt, svbfmlslt_lane, svclamp) (svcvtn, svpext, svpsel, svqcvt, svqcvtn, svqrshr, svqrshrn) (svqrshru, svqrshrun, svrevd, svunpk, svuzp, svuzpq, svzip) (svzipq): New functions. * config/aarch64/aarch64-sve-builtins-sve2.def: Add SME2 intrinsics. * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define or undefine __ARM_FEATURE_SME2. gcc/testsuite/ * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Provide a way for test functions to share ZT0. (ATTR): Update accordingly. (TEST_LOAD_COUNT, TEST_STORE_COUNT, TEST_PN, TEST_COUNT_PN) (TEST_EXTRACT_PN, TEST_SELECT_P, TEST_COMPARE_S_X2, TEST_COMPARE_S_C) (TEST_CREATE_B, TEST_GET_B, TEST_SET_B, TEST_XN, TEST_XN_SINGLE) (TEST_XN_SINGLE_Z15, TEST_XN_SINGLE_AWKWARD, TEST_X2_NARROW) (TEST_X4_NARROW): New macros. * gcc.target/aarch64/sve/acle/asm/create2_1.c: Add _b tests. * gcc.target/aarch64/sve/acle/general-c/binary_za_m_1.c: Remove test for svmopa that becomes valid with SME2. * gcc.target/aarch64/sve/acle/general-c/create_1.c: Adjust for existence of svboolx2_t version of svcreate2. * gcc.target/aarch64/sve/acle/general-c/store_1.c: Adjust error messages to account for svcount_t predication. * gcc.target/aarch64/sve/acle/general-c/store_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/ternary_qq_lane_1.c: Adjust error messages to account for new SME2 variants. * gcc.target/aarch64/sve/acle/general-c/ternary_qq_opt_n_2.c: Likewise. |
||
Richard Sandiford
|
8d29b7aca1 |
aarch64: Add ZT0
SME2 adds a 512-bit lookup table called ZT0. It is enabled and disabled by PSTATE.ZA, just like ZA itself. This patch adds support for the register, including saving and restoring contents. The code reuses the V8DI that was added for LS64, including the associated memory classification rules. (The ZT0 range is more restricted than the LS64 range, but that's enforced by predicates and constraints.) gcc/ * config/aarch64/aarch64.md (ZT0_REGNUM): New constant. (LAST_FAKE_REGNUM): Bump to include it. * config/aarch64/aarch64.h (FIXED_REGISTERS): Add an entry for ZT0. (CALL_REALLY_USED_REGISTERS, REGISTER_NAMES): Likewise. (REG_CLASS_CONTENTS): Likewise. (machine_function): Add zt0_save_buffer. (CUMULATIVE_ARGS): Add shared_zt0_flags; * config/aarch64/aarch64.cc (aarch64_check_state_string): Handle zt0. (aarch64_fntype_pstate_za, aarch64_fndecl_pstate_za): Likewise. (aarch64_function_arg): Add the shared ZT0 flags as an extra limb of the parallel. (aarch64_init_cumulative_args): Initialize shared_zt0_flags. (aarch64_extra_live_on_entry): Handle ZT0_REGNUM. (aarch64_epilogue_uses): Likewise. (aarch64_get_zt0_save_buffer, aarch64_save_zt0): New functions. (aarch64_restore_zt0): Likewise. (aarch64_start_call_args): Reject calls to functions that share ZT0 from functions that have no ZT0 state. Save ZT0 around shared-ZA calls that do not share ZT0. (aarch64_expand_call): Handle ZT0. Reject calls to functions that share ZT0 but not ZA from functions with ZA state. (aarch64_end_call_args): Restore ZT0 after calls to shared-ZA functions that do not share ZT0. (aarch64_set_current_function): Require +sme2 for functions that have ZT0 state. (aarch64_function_attribute_inlinable_p): Don't allow functions to be inlined if they have local zt0 state. (AARCH64_IPA_CLOBBERS_ZT0): New constant. (aarch64_update_ipa_fn_target_info): Record asms that clobber ZT0. (aarch64_can_inline_p): Don't inline callees that clobber ZT0 into functions that have ZT0 state. (aarch64_comp_type_attributes): Check for compatible ZT0 sharing. (aarch64_optimize_mode_switching): Use mode switching if the function has ZT0 state. (aarch64_mode_emit_local_sme_state): Save and restore ZT0 around calls to private-ZA functions. (aarch64_mode_needed_local_sme_state): Require ZA to be active for instructions that access ZT0. (aarch64_mode_entry): Mark ZA as dead on entry if the function only shares state other than "za" itself. (aarch64_mode_exit): Likewise mark ZA as dead on return. (aarch64_md_asm_adjust): Extend handling of ZA clobbers to ZT0. * config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros): Define __ARM_STATE_ZT0. * config/aarch64/aarch64-sme.md (UNSPECV_ASM_UPDATE_ZT0): New unspecv. (aarch64_asm_update_zt0): New insn. (UNSPEC_RESTORE_ZT0): New unspec. (aarch64_sme_ldr_zt0, aarch64_restore_zt0): New insns. (aarch64_sme_str_zt0): Likewise. gcc/testsuite/ * gcc.target/aarch64/sme/zt0_state_1.c: New test. * gcc.target/aarch64/sme/zt0_state_2.c: Likewise. * gcc.target/aarch64/sme/zt0_state_3.c: Likewise. * gcc.target/aarch64/sme/zt0_state_4.c: Likewise. * gcc.target/aarch64/sme/zt0_state_5.c: Likewise. * gcc.target/aarch64/sme/zt0_state_6.c: Likewise. |
||
Richard Sandiford
|
724a873b14 |
aarch64: Add svboolx2_t
SME2 has some instructions that operate on pairs of predicates. The SME2 ACLE defines an svboolx2_t type for the associated intrinsics. The patch uses a double-width predicate mode, VNx32BI, to represent the contents, similarly to how data vector tuples work. At present there doesn't seem to be any need to define pairs for VNx2BI, VNx4BI and VNx8BI. We already supported pairs of svbool_ts at the PCS level, as part of a more general framework. All that changes on the PCS side is that we now have an associated mode. gcc/ * config/aarch64/aarch64-modes.def (VNx32BI): New mode. * config/aarch64/aarch64-protos.h (aarch64_split_double_move): Declare. * config/aarch64/aarch64-sve-builtins.cc (register_tuple_type): Handle tuples of predicates. (handle_arm_sve_h): Define svboolx2_t as a pair of two svbool_ts. * config/aarch64/aarch64-sve.md (movvnx32bi): New insn. * config/aarch64/aarch64.cc (pure_scalable_type_info::piece::get_rtx): Use VNx32BI for pairs of predicates. (pure_scalable_type_info::add_piece): Don't try to form pairs of predicates. (VEC_STRUCT): Generalize comment. (aarch64_classify_vector_mode): Handle VNx32BI. (aarch64_array_mode): Likewise. Return BLKmode for arrays of predicates that have no associated mode, rather than allowing an integer mode to be chosen. (aarch64_hard_regno_nregs): Handle VNx32BI. (aarch64_hard_regno_mode_ok): Likewise. (aarch64_split_double_move): New function, split out from... (aarch64_split_128bit_move): ...here. (aarch64_ptrue_reg): Tighten assert to aarch64_sve_pred_mode_p. (aarch64_pfalse_reg): Likewise. (aarch64_sve_same_pred_for_ptest_p): Likewise. (aarch64_sme_mode_switch_regs::add_reg): Handle VNx32BI. (aarch64_expand_mov_immediate): Restrict handling of boolean vector constants to single-predicate modes. (aarch64_classify_address): Handle VNx32BI, ensuring that both halves can be addressed. (aarch64_class_max_nregs): Handle VNx32BI. (aarch64_member_type_forces_blk): Don't for BLKmode for svboolx2_t. (aarch64_simd_valid_immediate): Allow all-zeros and all-ones for VNx32BI. (aarch64_mov_operand_p): Restrict predicate constant canonicalization to single-predicate modes. (aarch64_evpc_ext): Generalize exclusion to all predicate modes. (aarch64_evpc_rev_local, aarch64_evpc_dup): Likewise. * config/aarch64/constraints.md (PR_REGS): New predicate. gcc/testsuite/ * gcc.target/aarch64/sve/pcs/struct_3_128.c (test_nonpst3): Adjust stack offsets. (ret_nonpst3): Remove XFAIL. * gcc.target/aarch64/sve/acle/general-c/svboolx2_1.c: New test. |
||
Richard Sandiford
|
37be343727 |
aarch64: Add svcount_t
Some SME2 instructions interpret predicates as counters, rather than as bit-per-byte masks. The SME2 ACLE defines an svcount_t type for this interpretation. I don't think we have a better way of representing counters than the VNx16BI that we use for masks. The patch therefore doesn't add a new mode for this representation. It's just something that is interpreted in context, a bit like signed vs. unsigned integers. gcc/ * config/aarch64/aarch64-sve-builtins-base.cc (svreinterpret_impl::fold): Handle reinterprets between svbool_t and svcount_t. (svreinterpret_impl::expand): Likewise. * config/aarch64/aarch64-sve-builtins-base.def (svreinterpret): Add b<->c forms. * config/aarch64/aarch64-sve-builtins.cc (TYPES_reinterpret_b): New type suffix list. (wrap_type_in_struct, register_type_decl): New functions, split out from... (register_tuple_type): ...here. (register_builtin_types): Handle svcount_t. (handle_arm_sve_h): Don't create tuples of svcount_t. * config/aarch64/aarch64-sve-builtins.def (svcount_t): New type. (c): New type suffix. * config/aarch64/aarch64-sve-builtins.h (TYPE_count): New type class. gcc/testsuite/ * g++.target/aarch64/sve/acle/general-c++/mangle_1.C: Add test for svcount_t. * g++.target/aarch64/sve/acle/general-c++/mangle_2.C: Likewise. * g++.target/aarch64/sve/acle/general-c++/svcount_1.C: New test. * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_DUAL_P) (TEST_DUAL_P_REV): New macros. * gcc.target/aarch64/sve/acle/asm/reinterpret_b.c: New test. * gcc.target/aarch64/sve/acle/general-c/load_1.c: Test passing an svcount_t. * gcc.target/aarch64/sve/acle/general-c/svcount_1.c: New test. * gcc.target/aarch64/sve/acle/general-c/unary_convert_1.c: Test reinterprets involving svcount_t. * gcc.target/aarch64/sve/acle/general/attributes_7.c: Test svcount_t. * gcc.target/aarch64/sve/pcs/annotate_1.c: Likewise. * gcc.target/aarch64/sve/pcs/annotate_2.c: Likewise. * gcc.target/aarch64/sve/pcs/args_12.c: New test. |
||
Richard Sandiford
|
3b58b2205f |
aarch64: Add +sme2
gcc/ * doc/invoke.texi: Document +sme2. * doc/sourcebuild.texi: Document aarch64_sme2. * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): Add sme2. * config/aarch64/aarch64.h (AARCH64_ISA_SME2, TARGET_SME2): New macros. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_aarch64_sme2): New target test. (check_effective_target_aarch64_asm_sme2_ok): Likewise. |