mirrors/gcc

mirror of https://gcc.gnu.org/git/gcc.git synced 2024-11-24 19:33:59 +08:00

Author	SHA1	Message	Date
Jakub Jelinek	699e9a0f67	openmp: Fix up include of the generic allocator.c As reported by Richard Sandiford, #include "../../../allocator.c" has one too many ../s, dunno why it worked for me when using ../configure (VPATH = ../../../libgomp) 2022-06-09 Jakub Jelinek <jakub@redhat.com> * config/linux/allocator.c: Fix up #include directive.	2022-06-09 19:44:50 +02:00
Jakub Jelinek	4c334e0e4f	c++: Fix up ICE on __builtin_shufflevector constexpr evaluation [PR105871] As the following testcase shows, BIT_FIELD_REF result doesn't have to have just integral type, it can also have vector type. And in that case cxx_eval_bit_field_ref just ICEs on it because it is unprepared for that case, creates the initial value with build_int_cst (sure, that one could be easily replaced with build_zero_cst) and then expects it can through shifts, ands and ors come up with the final value, but that doesn't work for vectors. We already call fold_ternary if whole is a VECTOR_CST, this patch does the same if the result doesn't have integral type. And, there is no guarantee fold_ternary will succeed and the callers certainly don't expect NULL being returned, so it also diagnoses those as non-constant and returns original t in that case. 2022-06-09 Jakub Jelinek <jakub@redhat.com> PR c++/105871 * constexpr.cc (cxx_eval_bit_field_ref): For BIT_FIELD_REF with non-integral result type use fold_ternary too like for BIT_FIELD_REFs from VECTOR_CST. If fold_ternary returns NULL, diagnose non-constant expression, set non_constant_p and return t, instead of returning NULL. g++.dg/pr105871.C: New test.	2022-06-09 17:42:31 +02:00
Maciej W. Rozycki	702a11ade2	RISC-V: Use a tab rather than space with FSFLAGS Consistently use a tab rather than a space as the separator between the assembly instruction mnemonic and its operand with FSFLAGS instructions produced with the unordered FP comparison RTL insns. gcc/ * config/riscv/riscv.md (f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default) (f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan): Emit a tab rather than space with FSFLAGS.	2022-06-09 14:34:34 +01:00
Nathan Sidwell	97b81fb036	c++: Better module initializer code Every module interface needs to emit a global initializer, but it might have nothing to init. In those cases, there's no need for any idempotency boolean to be emitted. gcc/cp * cp-tree.h (module_initializer_kind): Replace with ... (module_global_init_needed, module_has_import_inits): ... these. * decl2.cc (start_objects): Add has_body parm. Reorganize module initializer creation. (generate_ctor_or_dtor_function): Adjust. (c_parse_final_cleanups): Adjust. (vtv_start_verification_constructor_init_function): Adjust. * module.cc (module_initializer_kind): Replace with ... (module_global_init_needed, module_has_import_inits): ... these. gcc/testsuite/ * g++.dg/modules/init-2_a.C: Check no idempotency. * g++.dg/modules/init-2_b.C: Check idempotency.	2022-06-09 06:22:15 -07:00
Tobias Burnus	209de00fdb	OpenMP: Handle ancestor:1 with discover_declare_target gcc/ * omp-offload.cc (omp_discover_declare_target_tgt_fn_r, omp_discover_declare_target_fn_r): Don't walk reverse-offload target regions. gcc/testsuite/ * c-c++-common/gomp/reverse-offload-1.c: New.	2022-06-09 14:48:24 +02:00
Jakub Jelinek	2dc19a1b59	doc: Fix up -Waddress documentation When looking up the -Waddress documentation due to some PR that mentioned it, I've noticed some typos and thus I'm fixing them. 2022-06-09 Jakub Jelinek <jakub@redhat.com> * doc/invoke.texi (-Waddress): Fix a typo in small example. Fix typos inptr_t -> intptr_t and uinptr_t -> uintptr_t.	2022-06-09 10:19:53 +02:00
Jakub Jelinek	17f52a1c72	openmp: Add support for HBW or large capacity or interleaved memory through the libmemkind.so library This patch adds support for dlopening libmemkind.so on Linux and uses it for some kinds of allocations (but not yet e.g. pinned memory). 2022-06-09 Jakub Jelinek <jakub@redhat.com> * allocator.c: Include dlfcn.h if LIBGOMP_USE_MEMKIND is defined. (enum gomp_memkind_kind): New type. (struct omp_allocator_data): Add memkind field if LIBGOMP_USE_MEMKIND is defined. (struct gomp_memkind_data): New type. (memkind_data, memkind_data_once): New variables. (gomp_init_memkind, gomp_get_memkind): New functions. (omp_init_allocator): Initialize data.memkind, don't fail for omp_high_bw_mem_space if libmemkind supports it. (omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add memkind support if LIBGOMP_USE_MEMKIND is defined. * config/linux/allocator.c: New file.	2022-06-09 10:14:42 +02:00
Cui,Lili	269edf4e5e	Update {skylake,icelake,alderlake}_cost to add a bit preference to vector store. Since the integer vector construction cost has changed, we need to adjust the load and store costs for Intel processors. With the patch applied, 538.imagick_r gets ~6% improvement on ADL for multicopy and 525.x264_r gets ~2% improvement on ADL and ICX for multicopy, with no measurable changes for other benchmarks. gcc/ChangeLog PR target/105493 * config/i386/x86-tune-costs.h (skylake_cost): Raise the gpr load cost from 4 to 6 and gpr store cost from 6 to 8. Change SSE loads and unaligned loads cost from {6, 6, 6, 10, 20} to {8, 8, 8, 8, 16}. (icelake_cost): Ditto. (alderlake_cost): Raise the gpr store cost from 6 to 8 and SSE loads, stores and unaligned stores cost from {6, 6, 6, 10, 15} to {8, 8, 8, 10, 15}. gcc/testsuite/ PR target/105493 * gcc.target/i386/pr91446.c: Adjust to expect vectorization * gcc.target/i386/pr99881.c: XFAIL. * gcc.target/i386/pr105493.c: New. * g++.target/i386/pr105638.C: Use other sequence checks instead of vpxor, because code generation changed.	2022-06-09 14:59:44 +08:00
Haochen Gui	2fc6e3d55f	This patch replaces shift and ior insns with one rotate and mask insn for the split patterns which are for DI byte swap on Power6. gcc/ * config/rs6000/rs6000.md (define_split for bswapdi load): Merge shift and ior insns to one rotate and mask insn. (define_split for bswapdi register): Likewise. gcc/testsuite/ * gcc.target/powerpc/pr93453-1.c: New.	2022-06-09 13:31:09 +08:00
GCC Administrator	02b4e2de32	Daily bump.	2022-06-09 00:16:26 +00:00
Jason Merrill	e8ed26c2ac	c++: non-templated friends [PR105852] The previous patch for 105852 avoids copying DECL_TEMPLATE_INFO from a non-templated friend, but it really shouldn't have it in the first place. PR c++/105852 gcc/cp/ChangeLog: * decl.cc (duplicate_decls): Change non-templated friend check to an assert. * pt.cc (tsubst_function_decl): Don't set DECL_TEMPLATE_INFO on non-templated friends. (tsubst_friend_function): Adjust.	2022-06-08 16:38:25 -04:00
Jason Merrill	7d87790a87	c++: redeclared hidden friend take 2 [PR105852] My previous patch for 105761 avoided copying DECL_TEMPLATE_INFO from a friend to a later definition, but in this testcase we have first a non-friend declaration and then a definition, and we need to avoid copying in that case as well. But we do still want to set new_template_info to avoid GC trouble. With this change, the modules dump correctly identifies ::foo as a non-template function in tpl-friend-2_a.C. Along the way I noticed that the duplicate_decls handling of DECL_UNIQUE_FRIEND_P was backwards for templates, where we don't clobber DECL_LANG_SPECIFIC (olddecl) with DECL_LANG_SPECIFIC (newdecl) like we do for non-templates. PR c++/105852 PR c++/105761 gcc/cp/ChangeLog: * decl.cc (duplicate_decls): Avoid copying template info from non-templated friend even if newdecl isn't a definition. Correct handling of DECL_UNIQUE_FRIEND_P on templates. * pt.cc (non_templated_friend_p): New. * cp-tree.h (non_templated_friend_p): Declare it. gcc/testsuite/ChangeLog: * g++.dg/modules/tpl-friend-2_a.C: Adjust expected dump. * g++.dg/template/friend74.C: New test.	2022-06-08 16:37:50 -04:00
Roger Sayle	b6e1373bd3	PR middle-end/105874: Use EXPAND_MEMORY to fix ada bootstrap. Many thanks to Tamar Christina for filing PR middle-end/105874 indicating that SPECcpu 2017's Leela is failing on x86_64 due to a miscompilation of FastBoard::is_eye. This function is much smaller and easier to work with than my previous hunt for the cause of the Ada bootstrap failures due to miscompilation somewhere in GCC (or one of the 131 places that the problematic form of optimization triggers during an ada bootstrap). It turns out the source of the miscompilation introduced by my recent patch is the distinction (during RTL expansion) of l-values and r-values. According to the documentation above expand_modifier, EXPAND_MEMORY should be used for lvalues (when a memory is required), and EXPAND_NORMAL for rvalues when a constant is permissible. In what I'd like to consider a latent bug, the recursive call to expand_expr_real on line 11188 of expr.cc, in the case handling ARRAY_REF, COMPONENT_REF, BIT_FIELD_REF and ARRAY_RANGE_REF was passing EXPAND_NORMAL when it really required (the semantics of) EXPAND_MEMORY. All the time that VAR_DECLs were being returned as memory this was fine, but as soon as we're able to optimize short arrays into immediate constants, bad things happen. In the test case from Leela, we notice that the array s_eyemask always has DImode constant value { 4, 64 }, which is useful as an rvalue, but not when we need to index it as an lvalue, as in s_eyemask[color]. This also explains why everything being accepted by immediate_const_ctor_p (during an ada bootstrap) looks reasonable, what's incorrect is that we don't know how these structs/arrays are to be used. The fix is to ensure that we call expand_expr with EXPAND_MEMORY when processing the VAR_DECL's returned by get_inner_reference. 
2022-06-08 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR middle-end/105874 * expr.cc (expand_expr_real_1) <normal_inner_ref>: New local variable tem_modifier for calculating the expand_modifier enum to use for expanding tem. If tem is a VAR_DECL, use EXPAND_MEMORY. gcc/testsuite/ChangeLog PR middle-end/105874 * g++.dg/opt/pr105874.C: New test case.	2022-06-08 20:43:03 +01:00
Max Filippov	e94c6dbfb5	gcc: xtensa: fix PR target/105879 split_double operates with the 'word that comes first in memory in the target' terminology, while gen_lowpart operates with the 'value representing some low-order bits of X' terminology. They are not equivalent and must be dealt with differently on little- and big-endian targets. gcc/ PR target/105879 * config/xtensa/xtensa.md (movdi): Rename 'first' and 'second' to 'lowpart' and 'highpart' so that they match 'gen_lowpart' and 'gen_highpart' bitwise semantics and fix order of highpart and lowpart depending on target endianness.	2022-06-08 08:47:40 -07:00
Nathan Sidwell	90a6c3b6d6	c++: Reimplement static init/fini generation Currently we generate static init/fini code by generating a set of functions taking an 'initp' bool and an unsigned priority. (There can be more than one, as we repeat the end-of-compile loop.) We then generate a set of real init or fini functions for each needed priority, calling the previous set of functions. This is of course very tangled, but excitingly the value-range-propagator is clever enough to unentangle it. However, the current arrangement makes generation awkward, particularly as to how to optimize the module-global-init generation. This reimplements the generation to generate a set of separate init/fini functions for each needed priority, and then call them from the real inits previously mentioned. This replaces a splay tree, recording which priority/init combos we needed, with a pair of hash tables, mapping priority to init functions. Much simpler. While there, rename several of the functions as they are only dealing with part of the init/fini generation, not the whole set. gcc/cp/ * decl2.cc (struct priority_info_s, priority_info): Delete. (priority_map_traits, priority_map_t): New. (static_init_fini_fns): New. (INITIALIZE_P_IDENTIFIER, PRIORITY_IDENTIFIER): Delete. (initialize_p_decl, priority_decl): Delete. (ssdf_decls, priority_info_map): Delete. (start_static_storage_duration_function): Rename to ... (start_partial_init_fini_fn): ... here. Create a void arg fn. Add it to the slot in the appropriate static_init_fini_fns hash table. (finish_static_storage_duration_function): Rename to ... (finish_partial_init_fini_fn): ... here. (get_priority_info): Delete. (one_static_initialization_or_destruction): Assert not trivial dtor. (do_static_initialization_or_destruction): Rename to ... (emit_partial_init_fini_fn) ... here. Start & finish the fn. Simply init/fini each var. (partition_vars_for_init_fini): Partition vars according to priority and add to init and/or fini list. 
(generate_ctor_or_dtor_function): Start and finish the function. Do sanitizer calls here. (generate_ctor_and_dtor_functions_for_priority): Delete. (c_parse_final_cleanups): Reimplement global init/fini processing. gcc/testsuite/ * g++.dg/init/static-cdtor1.C: New.	2022-06-08 07:44:20 -07:00
Roger Sayle	d8c2580941	[Committed] Add -mno-avx2 to recent gcc.target/i386/xop-vpcmov3.c Adding -march=cascadelake to the command line options of the recently added xop-vpcmov3.c test case causes problems as GCC then prefers to use AVX512's vpternlogd instruction, instead of the XOP vpcmov that the test is checking for. This is easily solved by adding an explicit -mno-avx512vl to the command line options. Committed to mainline as obvious (in hindsight). 2022-06-08 Roger Sayle <roger@nextmovesoftware.com> gcc/testsuite/ChangeLog * gcc.target/i386/xop-pcmov3.c: Add -mno-avx512vl to dg-options.	2022-06-08 10:06:23 +01:00
Tobias Burnus	5e5deac508	OpenMP: Fortran - fix ancestor's requires reverse_offload check gcc/fortran/ * openmp.cc (gfc_match_omp_clauses): Check also parent namespace for 'requires reverse_offload'. gcc/testsuite/ * gfortran.dg/gomp/target-device-ancestor-5.f90: New test.	2022-06-08 10:06:57 +02:00
Chung-Ju Wu	ef5cc6bbb6	arm: Add star-mc1 cpu The star-mc1 is an embedded processor with armv8m architecture. It is designed mainly to meet the requirements of AIoT application performance, power consumption and security. This patch adds support for the star-mc1 cpu. Signed-off-by: Chung-Ju Wu <jasonwucj@gmail.com> gcc/ChangeLog: * config/arm/arm-cpus.in (star-mc1): New cpu. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Regenerate. * doc/invoke.texi: Update docs.	2022-06-08 07:17:19 +00:00
Yang Yujie	75df1594ae	libgccjit: allow common objects in $(EXTRA_GCC_OBJS) and $(EXTRA_OBJS) This patch fixes libgccjit build failure on loongarch* targets, and could probably be useful for future ports. For now, libgccjit is linked with objects from $(EXTRA_GCC_OBJS) and libbackend.a, which contains object files from $(EXTRA_OBJS). This effectively forbids any overlap between those two lists, i.e. all target-specific shared code between the gcc driver and compiler executables must go into gcc/common/config/<arch>/<arch>-common.cc, which feels a bit inconvenient when there is a lot of "common" stuff that we want to put into separate source files. By linking libgccjit with $(EXTRA_GCC_OBJS_EXCLUSIVE), which contains all elements from $(EXTRA_GCC_OBJS) but not $(EXTRA_OBJS), this problem can be alleviated. This patch does not affect any other target architecture than loongarch, and has been bootstrapped and regression-tested on loongarch64-linux-gnuf64 and x86_64-pc-linux-gnu. * gcc/jit/ChangeLog: * Make-lang.in: only link objects from $(EXTRA_GCC_OBJS) that are not in $(EXTRA_OBJS) into libgccjit.	2022-06-08 14:45:02 +08:00
liuhongt	5e005393d4	Disparages SSE_REGS alternatives slightly with ?v instead of v in mov{si,di}_internal. So alternative v won't be ignored in record_reg_classes. Similar for r alternatives in some vector patterns. It helps testcase in the PR, also RA now makes better decisions for gcc.target/i386/extract-insert-combining.c movd %esi, %xmm0 movd %edi, %xmm1 - movl %esi, -12(%rsp) paddd %xmm0, %xmm1 pinsrd $0, %esi, %xmm0 paddd %xmm1, %xmm0 The patch has no big impact on SPEC2017 for both O2 and Ofast march=native run. And I noticed there's some changes in SPEC2017 from code like mov mem, %eax vmovd %eax, %xmm0 .. mov %eax, 64(%rsp) to vmovd mem, %xmm0 .. vmovd %xmm0, 64(%rsp) Which should be exactly what we want? gcc/ChangeLog: PR target/105513 PR target/105504 config/i386/i386.md (movsi_internal): Change alternative from v to ?v. (movdi_internal): Ditto. config/i386/sse.md (vec_set<mode>_0): Change alternative r to ?r. (vec_extractv4sf_mem): Ditto. (vec_extracthf): Ditto. gcc/testsuite/ChangeLog: gcc.target/i386/pr105513-1.c: New test. * gcc.target/i386/extract-insert-combining.c: Add new scan-assembler-not for spill.	2022-06-08 11:23:49 +08:00
liuhongt	e4bdeaba6e	Adjust testcase to avoid compile failure under -m32. gcc/testsuite/ChangeLog: PR target/105854 * gcc.target/i386/pr105854.c: Add target int128 and dfp.	2022-06-08 10:59:18 +08:00
GCC Administrator	445ba599cb	Daily bump.	2022-06-08 00:16:28 +00:00
Richard Earnshaw	2005b9b888	arm: Improve code generation for BFI and BFC [PR105090] This patch, in response to PR105090, makes some general improvements to the code generation when BFI and BFC instructions are available. Firstly we handle more cases where the RTL does not generate an INSV operation due to a lack of a tie between the input and output, but we nevertheless need to emit BFI later on; we handle this by requiring the register allocator to tie the operands. Secondly we handle some cases where we were previously emitting BFC, but AND with an immediate would be better; we do this by converting all BFC patterns into AND using a split pattern. And finally, we handle some cases where previously we would emit multiple BIC operations to clear a value, but could instead use a single BFC instruction. BFC and BFI express the mask as a pair of values, one for the number of bits to clear and another for the location of the least significant bit. We handle these with a single new output modifier letter that causes both values to be printed; we use an 'inverted' value so that it can be used directly with the constant used in an AND rtl construct. We've run out of 'new' letters, so to do this we re-use one of the long-obsoleted Maverick output modifiers. gcc/ChangeLog: PR target/105090 * config/arm/arm.cc (arm_bfi_1_p): New function. (arm_bfi_p): New function. (arm_rtx_costs_internal): Add costs for BFI idioms. (arm_print_operand [case 'V']): Format output for BFI/BFC masks. * config/arm/constraints.md (Dj): New constraint. * config/arm/arm.md (arm_andsi3_insn): Add alternative to use BFC. (insv_zero): Convert to an insn with a split. (bfi, bfi_alt1, bfi_alt2, bfi_alt3): New patterns.	2022-06-07 12:12:20 +01:00
liuhongt	cd22395457	Fix insn does not satisfy its constraints: sse2_lshrv1ti3 (define_insn_and_split "ssse3_palignrdi" [(set (match_operand:DI 0 "register_operand" "=y,x,Yv") (unspec:DI [(match_operand:DI 1 "register_operand" "0,0,Yv") (match_operand:DI 2 "register_mmxmem_operand" "ym,x,Yv") (match_operand:SI 3 "const_0_to_255_mul_8_operand")] UNSPEC_PALIGNR))] "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3" Alternative 2 requires Yw instead of Yv since it is split to vpsrldq, which requires AVX512VL & AVX512BW for the EVEX version. gcc/ChangeLog: PR target/105854 * config/i386/sse.md (ssse3_palignrdi): Change alternative 2 from Yv to Yw. gcc/testsuite/ChangeLog: * gcc.target/i386/pr105854.c: New test.	2022-06-07 17:32:21 +08:00
Roger Sayle	c00e1e3aa5	PR middle-end/105853: Call store_constructor directly from calls.cc. This patch fixes both ICE regressions PR middle-end/105853 and PR target/105856 caused by my recent patch to expand small const structs as immediate constants. That patch updated code generation in three places: two in expr.cc that call store_constructor directly, and the third in calls.cc's load_register_parameters that expands its CONSTRUCTOR via expand_expr, as store_constructor is local/static to expr.cc, and the "public" API, should usually simply forward the constructor to the appropriate store_constructor function. Alas, despite the clean regression testing on multiple targets, the above ICEs show that expand_expr isn't a suitable proxy for store_constructor, and things that (I'd assumed) shouldn't affect how/whether a struct is placed in a register [such as whether the struct is considered packed/ aligned or not] actually interfere with the optimization that is being attempted. The (proposed) solution is to export store_constructor (and its helper function int_expr_size) from expr.cc, by removing their static qualifier and prototyping both functions in expr.h, so they can be called directly from load_register_parameters in calls.cc. This cures both ICEs, but almost as importantly improves code generation over GCC 12. 
For PR 105853, GCC 12 generates: compose_nd_na_ipv6_src: movzx eax, WORD PTR eth_addr_zero[rip+2] movzx edx, WORD PTR eth_addr_zero[rip] movzx edi, WORD PTR eth_addr_zero[rip+4] sal rax, 16 or rax, rdx sal rdi, 32 or rdi, rax xor eax, eax jmp packet_set_nd eth_addr_zero: .zero 6 where now (with this fix) GCC 13 generates: compose_nd_na_ipv6_src: xorl %edi, %edi xorl %eax, %eax jmp packet_set_nd Likewise, for PR 105856 on ARM, we'd previously generate: g_329_3: movw r3, #:lower16:.LANCHOR0 movt r3, #:upper16:.LANCHOR0 ldr r0, [r3] b func_19 but with this optimization we now generate: g_329_3: mov r0, #6 b func_19 2022-06-07 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR middle-end/105853 PR target/105856 * calls.cc (load_register_parameters): Call store_constructor and int_expr_size directly instead of expanding via expand_expr. * expr.cc (static void store_constructor): Don't prototype here. (static HOST_WIDE_INT int_expr_size): Likewise. (store_constructor): No longer static. (int_expr_size): Likewise, no longer static. * expr.h (store_constructor): Prototype here. (int_expr_size): Prototype here. gcc/testsuite/ChangeLog PR middle-end/105853 PR target/105856 * gcc.dg/pr105853.c: New test case. * gcc.dg/pr105856.c: New test case.	2022-06-07 10:09:49 +01:00
Jan Beulich	cef3f69c2f	Revert "configure: arrange to use appropriate objcopy" This reverts commit `6124f42488`. It lacks pieces to work with system binutils.	2022-06-07 10:24:53 +02:00
Jakub Jelinek	03b7140632	openmp: Add support for OpenMP 5.2 linear clause syntax for C/C++ The syntax for linear clause changed in 5.2, the original syntax which is still valid is: linear (var1, var2) linear (var3, var4 : step1) The 4.5 syntax with modifiers like: linear (val (var5, var6)) linear (val (var7, var8) : step2) is still supported in 5.2, but is deprecated there. Instead, one can use a new syntax: linear (var9, var10 : val) linear (var11, var12 : step (step3), val) As val, ref, uval or step (someexpr) can be valid expressions (and especially in C++ can be const / constexpr / consteval), the spec says that when the whole step expression is val (or ref or uval) or step ( ... ) then it is the new modifier syntax, one can use + 0 or 0 + or 1 * or * 1 or ()s to say it is the old step expression. Also, 5.2 now allows val modifier to be specified even outside of declare simd (but not the other modifiers). I've implemented this for the new modifier syntax only, the old one keeps the old restriction (which is why OMP_CLAUSE_LINEAR_OLD_LINEAR_MODIFIER flag has been introduced). 2022-06-07 Jakub Jelinek <jakub@redhat.com> gcc/ * tree.h (OMP_CLAUSE_LINEAR_OLD_LINEAR_MODIFIER): Define. * tree-pretty-print.cc (dump_omp_clause) <case OMP_CLAUSE_LINEAR>: Adjust clause printing style depending on OMP_CLAUSE_LINEAR_OLD_LINEAR_MODIFIER. gcc/c/ * c-parser.cc (c_parser_omp_clause_linear): Parse OpenMP 5.2 style linear clause modifiers. Set OMP_CLAUSE_LINEAR_OLD_LINEAR_MODIFIER flag on the clauses when old style modifiers are used. * c-typeck.cc (c_finish_omp_clauses): Only reject linear clause with val modifier on simd or for if the old style modifiers are used. gcc/cp/ * parser.cc (cp_parser_omp_clause_linear): Parse OpenMP 5.2 style linear clause modifiers. Set OMP_CLAUSE_LINEAR_OLD_LINEAR_MODIFIER flag on the clauses when old style modifiers are used. 
* semantics.cc (finish_omp_clauses): Only reject linear clause with val modifier on simd or for if the old style modifiers are used. gcc/fortran/ * trans-openmp.cc (gfc_trans_omp_clauses): Set OMP_CLAUSE_LINEAR_OLD_LINEAR_MODIFIER on OMP_CLAUSE_LINEAR clauses unconditionally for now. gcc/testsuite/ * c-c++-common/gomp/linear-2.c: New test. * c-c++-common/gomp/linear-3.c: New test. * g++.dg/gomp/linear-3.C: New test. * g++.dg/gomp/linear-4.C: New test. * g++.dg/gomp/linear-5.C: New test.	2022-06-07 10:05:08 +02:00
Jan Beulich	6bb0776e10	x86: harmonize __builtin_ia32_psadbw() types The 64-bit, 128-bit, and 512-bit variants have V<n>DI return type, in line with instruction behavior. Make the 256-bit builtin match, thus also making it match the insn it expands to (using VI8_AVX2_AVX512BW). gcc/ config/i386/i386-builtin.def (__builtin_ia32_psadbw256): Change type. * config/i386/i386-builtin-types.def: New function type (V4DI, V32QI, V32QI). * config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle V4DI_FTYPE_V32QI_V32QI.	2022-06-07 09:18:28 +02:00
Jan Beulich	76e3d60c16	x86-64: make "length_vex" also account for VEX.B use by register operand The length attribute ought to be "the (bounding maximum) length of an instruction" according to the comment next to its definition. A register operand encoded using the ModR/M.rm field will additionally use VEX.B for encoding the highest bit of the register number. Hence for the high 8 GPR registers as well as the [xy]mm{8..15} ones 3-byte VEX encoding may be needed. Since it isn't known to the function calculating the length which register goes where in the insn encoding, be conservative and assume a 3-byte VEX prefix whenever any such register operand is present and there's no memory operand. gcc/ * config/i386/i386.cc (ix86_attr_length_vex_default): Take REX.B into account for reg-only insns.	2022-06-07 09:17:25 +02:00
Roger Sayle	6dd194e2ce	PR c++/96442: Improved error recovery in enumerations. This patch is a revised fix for PR c++/96442 providing a cleaner solution, setting ENUM_UNDERLYING_TYPE to integer_type_node when issuing an error, so that this invariant holds during the parser's error recovery. 2022-06-07 Roger Sayle <roger@nextmovesoftware.com> gcc/cp/ChangeLog PR c++/96442 * decl.cc (start_enum): When emitting a "must be integral" error, set ENUM_UNDERLYING_TYPE to integer_type_node, to avoid an ICE downstream in build_enumeration. gcc/testsuite/ChangeLog PR c++/96442 * g++.dg/parse/pr96442.C: New test case.	2022-06-07 07:54:13 +01:00
Roger Sayle	c4320bde42	Recognize vpcmov in combine with -mxop on x86. By way of an apology for causing PR target/105791, where I'd overlooked the need to support V1TImode in TARGET_XOP's vpcmov instruction, this patch further improves support for TARGET_XOP's vpcmov instruction, by recognizing it in combine. Currently, the test case: typedef int v4si __attribute__ ((vector_size (16))); v4si foo(v4si c, v4si t, v4si f) { return (c&t)|(~c&f); } on x86_64 with -O2 -mxop generates: vpxor %xmm2, %xmm1, %xmm1 vpand %xmm0, %xmm1, %xmm1 vpxor %xmm2, %xmm1, %xmm0 ret but with this patch now generates: vpcmov %xmm0, %xmm2, %xmm1, %xmm0 ret On its own, the new combine splitter works fine on TARGET_64BIT, but alas with -m32 combine incorrectly thinks the replacement instruction is more expensive, as IF_THEN_ELSE isn't currently/correctly handled in ix86_rtx_costs. So to avoid the need for a target selector in the new testcase, I've updated ix86_rtx_costs to report that AMD's vpcmov has a latency of two cycles [it's now an obsolete instruction set extension and there's unlikely to ever be a processor where this instruction has a different timing], and while there I also added rtx_costs for x86_64's integer conditional move instructions (which have single cycle latency). 2022-06-07 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386.cc (ix86_rtx_costs): Add a new case for IF_THEN_ELSE, and provide costs for TARGET_XOP's vpcmov and TARGET_CMOVE's (scalar integer) conditional moves. * config/i386/sse.md (define_split): Recognize XOP's vpcmov from its equivalent (canonical) pxor;pand;pxor sequence. gcc/testsuite/ChangeLog * gcc.target/i386/xop-pcmov3.c: New test case.	2022-06-07 07:49:40 +01:00
Kewen Lin	63eab5d577	Update document for VECTOR_MODES_WITH_PREFIX r10-3912 updated the format of VECTOR_MODES_WITH_PREFIX by adding one more parameter ORDER, the related document is out of date. So update the document for ORDER. gcc/ChangeLog: * machmode.def (VECTOR_MODES_WITH_PREFIX): Update document for parameter ORDER.	2022-06-06 22:08:23 -05:00
GCC Administrator	70e2ffbcb4	Daily bump.	2022-06-07 00:16:20 +00:00
Patrick Palka	733a792a2b	c++: function NTTP argument considered unused [PR53164, PR105848] Here at parse time the template argument f (an OVERLOAD) in A<f> gets resolved ahead of time to the FUNCTION_DECL f<int>, and we defer marking f<int> as used until instantiation (of g) as usual. Later when instantiating g the type A<f> (where f has already been resolved) is non-dependent, so tsubst_aggr_type avoids re-processing its template arguments, and we end up never actually marking f<int> as used (which means we never instantiate it) even though A<f>::h() later calls it, leading to a link error. This patch works around this issue by looking through ADDR_EXPR when calling mark_used on the substituted callee of a CALL_EXPR. PR c++/53164 PR c++/105848 gcc/cp/ChangeLog: * pt.cc (tsubst_copy_and_build) <case CALL_EXPR>: Look through an ADDR_EXPR callee when calling mark_used. gcc/testsuite/ChangeLog: * g++.dg/template/fn-ptr3.C: New test.	2022-06-06 14:29:12 -04:00
Andrew Stubbs	36bd6eafb6	arm: reinstate HAVE_GAS_ARM_EXTENDED_ARCH The check was removed by accident. gcc/ChangeLog: * config.in: Regenerate. * configure: Regenerate. * configure.ac: Reinstate HAVE_GAS_ARM_EXTENDED_ARCH test.	2022-06-06 15:35:49 +01:00
GCC Administrator	df68ed4a3c	Daily bump.	2022-06-06 00:16:21 +00:00
GCC Administrator	ad6919374b	Daily bump.	2022-06-05 00:16:27 +00:00
Marek Polacek	aec868578d	c++: Allow mixing GNU/std-style attributes [PR69585] cp_parser_attributes_opt doesn't accept GNU attributes followed by [[]] attributes and vice versa; only a sequence of attributes of the same kind. That causes grief for code like: struct __attribute__ ((may_alias)) alignas (2) struct S { }; or #define EXPORT __attribute__((visibility("default"))) struct [[nodiscard]] EXPORT F { }; It doesn't seem to be a documented restriction, so this patch fixes the problem. However, the patch does not touch the C FE. The C FE doesn't have a counterpart to C++'s cp_parser_attributes_opt -- it only has c_parser_transaction_attributes (which parses both GNU and [[]] attributes), but that's TM-specific. The C FE seems to use either c_parser_gnu_attributes or c_parser_std_attribute_specifier_sequence. As a consequence, this works: [[maybe_unused]] __attribute__((deprecated)) void f2 (); but this doesn't: __attribute__((deprecated)) [[maybe_unused]] void f1 (); I'm not sure what, if anything, should be done about this. PR c++/102399 PR c++/69585 gcc/cp/ChangeLog: * parser.cc (cp_parser_attributes_opt): Accept GNU attributes followed by [[]] attributes and vice versa. gcc/testsuite/ChangeLog: * g++.dg/ext/attrib65.C: New test. * g++.dg/ext/attrib66.C: New test. * g++.dg/ext/attrib67.C: New test.	2022-06-04 09:57:28 -04:00
Roger Sayle	ed6fd2aed5	PR middle-end/95126: Expand small const structs as immediate constants. This patch resolves PR middle-end/95126 which is a code quality regression, by teaching the RTL expander to emit small const structs/unions as integer immediate constants. The motivating example from the bugzilla PR is: struct small{ short a,b; signed char c; }; extern int func(struct small X); void call_func(void) { static struct small const s = { 1, 2, 0 }; func(s); } which on x86_64 is currently compiled to: call_func: movzwl s.0+2(%rip), %eax movzwl s.0(%rip), %edx movzwl s.0+4(%rip), %edi salq $16, %rax orq %rdx, %rax salq $32, %rdi orq %rax, %rdi jmp func but with this patch is now optimized to: call_func: movl $131073, %edi jmp func 2022-06-04 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR middle-end/95126 * calls.cc (load_register_parameters): When loading a suitable immediate_const_ctor_p VAR_DECL into a single word_mode register, construct it directly in a pseudo rather than read it (by parts) from memory. * expr.cc (int_expr_size): Make tree argument a const_tree. (immediate_const_ctor_p): Helper predicate. Return true for simple constructors that may be materialized in a register. (expand_expr_real_1) [VAR_DECL]: When expanding a constant VAR_DECL with a suitable immediate_const_ctor_p constructor use store_constructor to materialize it directly in a pseudo. * expr.h (immediate_const_ctor_p): Prototype here. * varasm.cc (initializer_constant_valid_for_bitfield_p): Change VALUE argument from tree to const_tree. * varasm.h (initializer_constant_valid_for_bitfield_p): Update prototype. gcc/testsuite/ChangeLog PR middle-end/95126 * gcc.target/i386/pr95126-m32-1.c: New test case. * gcc.target/i386/pr95126-m32-2.c: New test case. * gcc.target/i386/pr95126-m32-3.c: New test case. * gcc.target/i386/pr95126-m32-4.c: New test case. * gcc.target/i386/pr95126-m64-1.c: New test case. * gcc.target/i386/pr95126-m64-2.c: New test case. 
* gcc.target/i386/pr95126-m64-3.c: New test case. * gcc.target/i386/pr95126-m64-4.c: New test case.	2022-06-04 12:21:51 +01:00
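The immediate 131073 in the optimized x86_64 output above can be reproduced by hand. The following is a minimal sketch, assuming a little-endian layout with `a` at byte offset 0, `b` at offset 2 and `c` at offset 4, as the `s.0`, `s.0+2` and `s.0+4` loads in the unoptimized assembly suggest. `pack_small` and `check_pack_small` are hypothetical helpers written for this illustration and are not part of GCC.

```cpp
#include <cassert>
#include <cstdint>

// Same member layout as the struct in the commit message.
struct small { short a, b; signed char c; };

// Hypothetical helper (not GCC code): combine the fields the way the
// unoptimized sequence does, i.e. zero-extend each field, shift it to
// its byte offset, and OR the pieces together.
constexpr std::uint64_t pack_small(small s) {
    return  static_cast<std::uint64_t>(static_cast<std::uint16_t>(s.a))
         | (static_cast<std::uint64_t>(static_cast<std::uint16_t>(s.b)) << 16)
         | (static_cast<std::uint64_t>(static_cast<std::uint8_t>(s.c)) << 32);
}

// { 1, 2, 0 } packs to 1 + (2 << 16) = 0x20001 = 131073.
static_assert(pack_small({1, 2, 0}) == 131073,
              "matches the movl $131073 immediate");

// Runtime check that a negative field is zero-extended, not sign-extended.
inline void check_pack_small() {
    assert(pack_small({-1, 0, 0}) == 0xFFFFu);
}
```

Because the initializer is a compile-time constant, the whole shift-and-OR sequence folds to a single integer, which is what lets the expander emit one `movl` instead of three loads.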
Jakub Jelinek	53718316af	i386: Fix up _doubleword_mask [PR105825] My PR105778 patch apparently broke the following testcase. If the mask has the top relevant bit clear (i.e. we know we are shifting by 0 to wordsize bits - 1) but doesn't have all the bits below it set, we emit andsi3 before the shift sequence. When the pattern had :SI for that operand, that was just fine, but now that it can be also HImode or for -m64 DImode, we either can use a lowpart or paradoxical subreg to SImode as the following patch, or we use a HImode or DImode AND. This patch does the latter. 2022-06-04 Jakub Jelinek <jakub@redhat.com> PR target/105825 config/i386/i386.md (ashl<dwi>3_doubleword_mask, <insn><dwi>3_doubleword_mask): If top bit of mask is clear, but lower bits of mask aren't all set, use operands[2] mode for the AND operation instead of always SImode. * gcc.dg/pr105825.c: New test.	2022-06-04 10:36:24 +02:00
GCC Administrator	58b67140de	Daily bump.	2022-06-04 00:16:27 +00:00
Jason Merrill	891d647216	c++: more-specialized test I noticed the need for this testcase while working on PR102629; since there is no information about the target type, we don't want to choose the most specialized overload. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/auto56.C: New test.	2022-06-03 17:04:30 -04:00
Patrick Palka	0ecb6b906f	c++: value-dep but not type-dep decltype expr [PR105756] Here during ahead of time instantiation of the value-dependent but not type-dependent decltype expression (5 % N) == 0, cp_build_binary_op folds the operands of the == via cp_fully_fold, which performs speculative constexpr evaluation, and from which we crash for (5 % N) due to the value-dependence. Since the operand folding performed by cp_build_binary_op appears to be solely for sake of diagnosing overflow, and since these diagnostics are suppressed when in an unevaluated context, this patch avoids this crash by suppressing cp_build_binary_op's operand folding accordingly. PR c++/105756 gcc/cp/ChangeLog: * typeck.cc (cp_build_binary_op): Don't fold operands when c_inhibit_evaluation_warnings. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/decltype82.C: New test.	2022-06-03 14:58:22 -04:00
Jason Merrill	284ae8b46f	c++: redeclared hidden friend [PR105761] Here, when we see the second declaration of f we match it with the first one, copy over DECL_TEMPLATE_INFO, and then try to use it when parsing the definition, leading to confusion. PR c++/105761 gcc/cp/ChangeLog: * decl.cc (duplicate_decls): Don't copy DECL_TEMPLATE_INFO from a hidden friend. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/auto-fn64.C: New test.	2022-06-03 13:44:44 -04:00
Patrick Palka	44a5bd6d93	c++: cv-quals of dummy obj for non-dep memfn call [PR105637] In non-dependent23.C below we expect the Base::foo calls to resolve to the second, third and fourth overloads respectively in light of the cv-qualifiers of 'this' in each case. But ever since r12-6075-g2decd2cabe5a4f, the calls incorrectly resolve to the first overload at instantiation time. This happens because the calls to Base::foo are all deemed non-dependent (ever since r7-755-g23cb72663051cd made us ignore 'this' dependence when considering the dependence of a non-static memfn call), hence we end up checking the call ahead of time, using as the object argument a dummy object of type Base. Since this object argument is cv-unqualified, the calls in turn resolve to the unqualified overload of baseDevice. Before r12-6075 this incorrect result would just get silently discarded and we'd end up redoing OR at instantiation time using 'this' as the object argument. But after r12-6075 we now reuse this incorrect result at instantiation time. This patch fixes this by making maybe_dummy_object respect the cv-quals of (the non-lambda) 'this' when returning a dummy object. Thus, ahead of time OR using a dummy object will give us the right answer that's consistent with the instantiation time answer. An earlier version of this patch didn't handle 'this'-capturing lambdas correctly, which broke lambda-this22.C below. PR c++/105637 gcc/cp/ChangeLog: * tree.cc (maybe_dummy_object): When returning a dummy object, respect the cv-quals of 'this' if available. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/lambda/lambda-this22.C: New test. * g++.dg/template/non-dependent23.C: New test.	2022-06-03 12:06:59 -04:00
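The overload selection that this commit restores can be illustrated without templates. The sketch below only shows the expected language behaviour, namely that the cv-qualifiers of `this` pick the matching `Base::foo` overload. It deliberately uses a non-template `Derived`, so it does not exercise the ahead-of-time resolution path that the patch fixes. All names except `Base::foo` are made up for this illustration.

```cpp
#include <cassert>

// Four overloads that differ only in the cv-qualification of 'this'.
struct Base {
    int foo()                { return 1; }
    int foo() const          { return 2; }
    int foo() volatile       { return 3; }
    int foo() const volatile { return 4; }
};

// Each call() forwards to Base::foo with its own cv-qualified 'this',
// so each one should reach the overload with the same qualifiers.
struct Derived : Base {
    int call()                { return Base::foo(); }
    int call() const          { return Base::foo(); }
    int call() volatile       { return Base::foo(); }
    int call() const volatile { return Base::foo(); }
};

// Hypothetical self-check helper for this sketch.
inline void check_cv_dispatch() {
    Derived d{};
    const Derived cd{};
    volatile Derived vd{};
    const volatile Derived cvd{};
    assert(d.call() == 1);
    assert(cd.call() == 2);
    assert(vd.call() == 3);
    assert(cvd.call() == 4);
}
```

In the template case described above, a cv-unqualified dummy object would make every one of these calls look like the first one, which is the behaviour the patch removes.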
Tobias Burnus	6a098f4e16	gfortran.dg/gomp/scope-6.f90: Add \\ to scan-tree-dump Missed git add for the hot fix before committing r13-982-gff35a75473d28205e52ecbcf9e6b5107b8b5ab90 gcc/testsuite/ * gfortran.dg/gomp/scope-6.f90: Fix dg-final scan-tree-dump.	2022-06-03 15:57:03 +02:00
Tobias Burnus	ff35a75473	OpenMP/Fortran: Add support for firstprivate and allocate clauses on scope construct Fortran commit to C/C++/backend commit r13-862-gf38b20d68fade5a922b9f68c4c3841e653d1b83c gcc/fortran/ChangeLog: * openmp.cc (OMP_SCOPE_CLAUSES): Add firstprivate and allocate. libgomp/ChangeLog: * libgomp.texi (OpenMP 5.2): Mark scope w/ firstprivate/allocate as Y. * testsuite/libgomp.fortran/scope-2.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/scope-5.f90: New test. * gfortran.dg/gomp/scope-6.f90: New test.	2022-06-03 15:54:02 +02:00
Patrick Palka	43c013df02	c++: don't substitute TEMPLATE_PARM_CONSTRAINTS [PR100374] This patch makes us avoid substituting into the TEMPLATE_PARM_CONSTRAINTS of each template parameter except as necessary for declaration matching, like we already do for the other constituent constraints of a declaration. This patch also improves the CA104 implementation of explicit specialization matching of a constrained function template inside a class template, by considering the function's combined constraints instead of just its trailing constraints. This allows us to correctly handle the first three explicit specializations in concepts-spec2.C below, but because we compare the constraints as a whole, it means we incorrectly accept the fourth explicit specialization which writes #3's constraints in a different way. For complete correctness here, determine_specialization should use tsubst_each_template_parm_constraints and template_parameter_heads_equivalent_p. PR c++/100374 gcc/cp/ChangeLog: * pt.cc (determine_specialization): Compare overall constraints not just the trailing constraints. (tsubst_each_template_parm_constraints): Define. (tsubst_friend_function): Use it. (tsubst_friend_class): Use it. (tsubst_template_parm): Don't substitute TEMPLATE_PARM_CONSTRAINTS. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-spec2.C: New test. * g++.dg/cpp2a/concepts-template-parm11.C: New test.	2022-06-03 09:29:12 -04:00
Patrick Palka	df4f95dbd4	c++: find_template_parameters and PARM_DECLs [PR105797] As explained in r11-4959-gde6f64f9556ae3, the atom cache assumes two equivalent expressions (according to cp_tree_equal) must use the same template parameters (according to find_template_parameters). This assumption turned out to not hold for TARGET_EXPR, which was addressed by that commit. But this assumption apparently doesn't hold for PARM_DECL either: find_template_parameters walks its DECL_CONTEXT but cp_tree_equal by default doesn't consider DECL_CONTEXT unless comparing_specializations is set. Thus in the first testcase below, the atomic constraints of #1 and #2 are equivalent according to cp_tree_equal, but according to find_template_parameters the former uses T and the latter uses both T and U (surprisingly). We could fix this assumption violation by setting comparing_specializations in the atom_hasher, which would make cp_tree_equal return false for the two atoms, but that seems overly pessimistic here. Ideally the atoms should continue being considered equivalent and we instead fix find_template_parameters to return just T for #2's atom. To that end this patch makes for_each_template_parm_r stop walking the DECL_CONTEXT of a PARM_DECL. This should be safe to do because tsubst_copy / tsubst_decl only substitutes the TREE_TYPE of a PARM_DECL and doesn't bother substituting the DECL_CONTEXT, thus the only relevant template parameters are those used in its type. any_template_parm_r is currently responsible for walking its TREE_TYPE, but I suppose it now makes sense for for_each_template_parm_r to do so instead. In passing this patch also makes for_each_template_parm_r stop walking the DECL_CONTEXT of a VAR_/FUNCTION_DECL since doing so after walking DECL_TI_ARGS is redundant, I think. I experimented with not walking DECL_CONTEXT for CONST_DECL, but the second testcase below demonstrates it's necessary to walk it. 
PR c++/105797 gcc/cp/ChangeLog: * pt.cc (for_each_template_parm_r) <case FUNCTION_DECL, VAR_DECL>: Don't walk DECL_CONTEXT. <case PARM_DECL>: Likewise. Walk TREE_TYPE. <case CONST_DECL>: Simplify. (any_template_parm_r) <case PARM_DECL>: Don't walk TREE_TYPE. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-decltype4.C: New test. * g++.dg/cpp2a/concepts-memfun3.C: New test.	2022-06-03 09:08:41 -04:00
Jakub Jelinek	1982fe2692	match.pd: Optimize __builtin_mul_overflow_p (x, cst, (stype)0) [PR105777] The following patch is an incremental change to the PR30314 enhancement, this one handles signed types. For signed types (but still, the same for 1st and result element type and non-zero constant that fits into that type), we actually need to watch for overflow in direction to positive and negative infinity and it also depends on whether the cst operand is positive or negative. For __builtin_mul_overflow_p (x, cst, (stype) 0): For cst > 0, we can simplify it to: x > INT_MAX / cst || x < INT_MIN / cst aka: x + (unsigned) (INT_MIN / cst) > (unsigned) (INT_MAX / cst) - (unsigned) (INT_MIN / cst) and for cst < 0 to: x < INT_MAX / cst || x > INT_MIN / cst aka: x + (unsigned) (INT_MAX / cst) > (unsigned) (INT_MIN / cst) - (unsigned) (INT_MAX / cst) Additionally, I've added executable testcases, so we don't just check for the optimization to be performed, but also that it is correct (done that even for the other PR's testcase). 2022-06-03 Jakub Jelinek <jakub@redhat.com> PR middle-end/30314 PR middle-end/105777 * match.pd (__builtin_mul_overflow_p (x, cst, (stype) 0) -> x > stype_max / cst || x < stype_min / cst): New simplification. * gcc.dg/tree-ssa/pr30314.c: Add noipa attribute to all functions. * gcc.dg/tree-ssa/pr105777.c: New test. * gcc.c-torture/execute/pr30314.c: New test. * gcc.c-torture/execute/pr105777.c: New test.	2022-06-03 11:42:35 +02:00
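The two-comparison forms in the commit message above can be checked exhaustively for a small signed type. This is a sketch under stated assumptions: it uses `signed char` as `stype`, and it replaces `__builtin_mul_overflow_p` with a plain widening multiplication in `int` as the reference, so that it does not depend on a GCC-only builtin. The function names are hypothetical and do not come from match.pd.

```cpp
#include <cassert>
#include <limits>

using stype = signed char;
constexpr int kMin = std::numeric_limits<stype>::min();
constexpr int kMax = std::numeric_limits<stype>::max();

// Reference answer: multiply in int, which is wide enough for any
// product of two signed chars, then range-check the result.
constexpr bool overflows_reference(stype x, stype cst) {
    return int(x) * int(cst) < kMin || int(x) * int(cst) > kMax;
}

// Comparison form from the commit message; cst must be non-zero.
// The divisions are done in int, so kMin / -1 does not overflow here.
constexpr bool overflows_by_comparison(stype x, stype cst) {
    return cst > 0 ? (x > kMax / cst || x < kMin / cst)
                   : (x < kMax / cst || x > kMin / cst);
}

// Compare both forms for every x and every non-zero cst.
inline bool forms_agree() {
    for (int cst = kMin; cst <= kMax; ++cst) {
        if (cst == 0) continue;
        for (int x = kMin; x <= kMax; ++x) {
            if (overflows_reference(stype(x), stype(cst)) !=
                overflows_by_comparison(stype(x), stype(cst)))
                return false;
        }
    }
    return true;
}
```

The sign split works because C++ integer division truncates toward zero: for `cst > 0` the quotient `kMax / cst` is the largest safe `x` and `kMin / cst` the smallest, while for `cst < 0` the two bounds swap roles.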

193750 Commits