Commit Graph

195242 Commits

Author SHA1 Message Date
Richard Biener
c77b1c833e Fixup unaligned load/store cost for znver5
Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue.  It looks like the unaligned costs
were simply copied from the bogus znver4 costs.  The following makes
the unaligned costs equal to the aligned costs like in the fixed znver4
version.

	* config/i386/x86-tune-costs.h (znver5_cost): Update unaligned
	load and store cost from the aligned costs.

(cherry picked from commit 896393791e)
2024-09-29 01:55:50 +02:00
Jan Hubicka
54806268b4 Add AMD znver5 processor enablement with scheduler model
2024-02-14  Jan Hubicka  <jh@suse.cz>
	    Karthiban Anbazhagan  <Karthiban.Anbazhagan@amd.com>

gcc/ChangeLog:
	* common/config/i386/cpuinfo.h (get_amd_cpu): Recognize znver5.
	* common/config/i386/i386-common.cc (processor_names): Add znver5.
	(processor_alias_table): Likewise.
	* common/config/i386/i386-cpuinfo.h (processor_types): Add new zen
	family.
	(processor_subtypes): Add znver5.
	* config.gcc (x86_64-*-* |...): Likewise.
	* config/i386/driver-i386.cc (host_detect_local_cpu): Let
	march=native detect znver5 CPUs.
	* config/i386/i386-c.cc (ix86_target_macros_internal): Add
	znver5.
	* config/i386/i386-options.cc (m_ZNVER5): New definition
	(processor_cost_table): Add znver5.
	* config/i386/i386.cc (ix86_reassociation_width): Likewise.
	* config/i386/i386.h (processor_type): Add PROCESSOR_ZNVER5
	(PTA_ZNVER5): New definition.
	* config/i386/i386.md (define_attr "cpu"): Add znver5.
	(Scheduling descriptions): Add znver5.md.
	* config/i386/x86-tune-costs.h (znver5_cost): New definition.
	* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add znver5.
	(ix86_adjust_cost): Likewise.
	* config/i386/x86-tune.def (avx512_move_by_pieces): Add m_ZNVER5.
	(avx512_store_by_pieces): Add m_ZNVER5.
	* doc/extend.texi: Add znver5.
	* doc/invoke.texi: Likewise.
	* config/i386/znver4.md: Rename to zn4zn5.md; combine znver4 and znver5 Scheduler.

gcc/testsuite/ChangeLog:
	* g++.target/i386/mv29.C: Handle znver5 arch.
	* gcc.target/i386/funcspec-56.inc: Likewise.

(cherry picked from commit d0aa0af9a9)
2024-09-29 01:55:19 +02:00
H.J. Lu
2e66eb7e7e x86: Don't use address override with segment register
Address override only applies to the (reg32) part in the thread address
fs:(reg32).  Don't rewrite thread address like

(set (reg:CCZ 17 flags)
    (compare:CCZ (reg:SI 98 [ __gmpfr_emax.0_1 ])
        (mem/c:SI (plus:SI (plus:SI (unspec:SI [
                            (const_int 0 [0])
                        ] UNSPEC_TP)
                    (reg:SI 107))
                (const:SI (unspec:SI [
                            (symbol_ref:SI ("previous_emax") [flags 0x1a] <var_decl 0x7fffe9a11cf0 previous_emax>)
                        ] UNSPEC_DTPOFF))) [1 previous_emax+0 S4 A32])))

if address override is used to avoid the invalid memory operand like

	cmpl	%fs:previous_emax@dtpoff(%eax), %r12d

gcc/

	PR target/116839
	* config/i386/i386.cc (ix86_rewrite_tls_address_1): Make it
	static.  Return if TLS address is thread register plus an integer
	register.

gcc/testsuite/

	PR target/116839
	* gcc.target/i386/pr116839.c: New file.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit c79cc30862)
2024-09-28 18:59:24 +08:00
GCC Administrator
e282606b6c Daily bump. 2024-09-28 00:20:43 +00:00
Stefan Schulze Frielinghaus
7051fa5fa4 s390: Fix TF to FPRX2 conversion [PR115860]
Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
survive register allocation.  This in turn leads to wrong register
renaming.  Keeping the current approach would mean we need two insns for
*tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along the
lines

(define_insn "*tf_to_fprx2_0"
  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
        (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
                   UNSPEC_TF_TO_FPRX2_0))]
  "TARGET_VXE"
  "#")

(define_insn "*tf_to_fprx2_0"
  [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
        (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
                   UNSPEC_TF_TO_FPRX2_0))]
  "TARGET_VXE"
  "vpdi\t%v0,%v1,%v0,1"
  [(set_attr "op_type" "VRR")])

and similar for *tf_to_fprx2_1.  Note, pre register allocation operand 0
has mode FPRX2 and afterwards DF once subregs have been eliminated.

Since we always copy a whole vector register into a floating-point
register pair, another way to fix this is to merge *tf_to_fprx2_0 and
*tf_to_fprx2_1 into a single insn which means we don't have to use
subregs at all.  The downside of this is that the assembler template
contains two instructions, now.  The upside is that we don't have to
come up with some artificial insn before RA which might be more
readable/maintainable.  That is implemented by this patch.

In commit r11-4872-ge627cda5686592, the output operand specifier %V was
introduced which is used in tf_to_fprx2 only, now.  Instead of coming up
with its counterpart %F for floating-point registers, which would also
only be used in tf_to_fprx2, I print the operands directly.  This
renders %V unused which is why it is removed by this patch.

gcc/ChangeLog:

	PR target/115860
	* config/s390/s390.cc (print_operand): Remove operand specifier
	%V.
	* config/s390/s390.md (UNSPEC_TF_TO_FPRX2): New.
	* config/s390/vector.md (*tf_to_fprx2_0): Remove.
	(*tf_to_fprx2_1): Remove.
	(tf_to_fprx2): New.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/long-double-asm-abi.c: Adapt
	scan-assembler directive.
	* gcc.target/s390/vector/long-double-to-i64.c: Adapt
	scan-assembler directive.
	* gcc.target/s390/pr115860-1.c: New test.

(cherry picked from commit 46c2538435)
2024-09-27 12:45:42 +02:00
Stefan Schulze Frielinghaus
8d29e1c4ce s390: Fix AQ and AR constraints
Ensure for AQ and AR constraints that the resulting displacement after
adding any positive offset less than the size of the object being
referenced is still valid.
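
The check described above can be sketched in C. This is a minimal illustration, not the code in s390_mem_constraint: the helper names are hypothetical, and the 0..4095 range assumes the short (12-bit unsigned) displacement form. Because the valid range is contiguous, testing the first and last byte of the object covers every offset in between.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds of a short (12-bit unsigned) displacement.  */
#define HYP_DISP_MIN 0
#define HYP_DISP_MAX 4095

/* Hypothetical helper: true if D fits the assumed displacement range.  */
static bool
hyp_disp_in_range (int64_t d)
{
  return d >= HYP_DISP_MIN && d <= HYP_DISP_MAX;
}

/* Hypothetical helper: true if DISP stays valid after adding any
   offset in [0, SIZE - 1], i.e. for every byte of the object.  */
static bool
hyp_disp_valid_for_object (int64_t disp, int64_t size)
{
  if (size <= 0)
    return hyp_disp_in_range (disp);
  return hyp_disp_in_range (disp) && hyp_disp_in_range (disp + size - 1);
}
```

Under these assumptions, a 4-byte object at displacement 4092 is accepted, while the same object at displacement 4093 is rejected because its last byte would need displacement 4096.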

gcc/ChangeLog:

	* config/s390/s390.cc (s390_mem_constraint): Check displacement
	for AQ and AR constraints.

(cherry picked from commit 1a71ff3b89)
2024-09-27 12:45:42 +02:00
GCC Administrator
63a5a1fbb7 Daily bump. 2024-09-27 00:20:45 +00:00
GCC Administrator
596d857e68 Daily bump. 2024-09-26 00:21:00 +00:00
GCC Administrator
50c8048de9 Daily bump. 2024-09-25 00:20:31 +00:00
GCC Administrator
52bb3a257d Daily bump. 2024-09-24 00:20:06 +00:00
GCC Administrator
917b6c6a89 Daily bump. 2024-09-23 00:19:53 +00:00
GCC Administrator
2a6e9bfcd3 Daily bump. 2024-09-22 00:20:46 +00:00
GCC Administrator
a761f1007f Daily bump. 2024-09-21 00:19:46 +00:00
Harald Anlauf
cb25c5dd6b Fortran: fix ICE in gfc_create_module_variable [PR100273]
gcc/fortran/ChangeLog:

	PR fortran/100273
	* trans-decl.cc (gfc_create_module_variable): Handle module
	variable also when it is needed for the result specification
	of a contained function.

gcc/testsuite/ChangeLog:

	PR fortran/100273
	* gfortran.dg/pr100273.f90: New test.

(cherry picked from commit 1f462b5072)
2024-09-20 21:29:21 +02:00
GCC Administrator
645a11f70e Daily bump. 2024-09-20 17:37:53 +00:00
Eric Botcazou
0f32c31250 Fix small thinko in IPA mod/ref pass
When a memory copy operation is analyzed by analyze_ssa_name, if both the
load and store are made through the same SSA name, the store is overlooked.

gcc/
	* ipa-modref.cc (modref_eaf_analysis::analyze_ssa_name): Always
	process both the load and the store of a memory copy operation.

gcc/testsuite/
	* gcc.dg/ipa/modref-4.c: New test.
2024-09-20 17:31:22 +02:00
Stefan Schulze Frielinghaus
4fe0b88159 s390: Fix strict_low_part generation
In s390_expand_insv(), if generating code for ICM et al. src is a MEM
and gen_lowpart might force src into a register such that we end up with
patterns which do not match anymore.  Use adjust_address() instead in
order to preserve a MEM.

Furthermore, it is not straightforward to enforce a subreg.  For
example, in case of a paradoxical subreg, gen_lowpart() may return a
register.  To compensate for this, s390_gen_lowpart_subreg() emits
a reference to a pseudo which does not coincide with its definition,
which is wrong.  Additionally, if dest is a paradoxical subreg, then do
not try to emit a strict_low_part since it could mean that dest was not
initialized even though this might be fixed up later by init-regs.

Splitters for insns *get_tp_64, *zero_extendhisi2_31,
*zero_extendqisi2_31, *zero_extendqihi2_31 are applied after reload.
Thus, operands[0] is a hard register and gen_lowpart (m, operands[0])
just returns the hard register for mode m which is fine to use as an
argument for strict_low_part, i.e., we do not need to enforce subregs
here since after reload subregs are supposed to be eliminated anyway.

This fixes gcc.dg/torture/pr111821.c.

gcc/ChangeLog:

	* config/s390/s390-protos.h (s390_gen_lowpart_subreg): Remove.
	* config/s390/s390.cc (s390_gen_lowpart_subreg): Remove.
	(s390_expand_insv): Use adjust_address() and emit a
	strict_low_part only in case of a natural subreg.
	* config/s390/s390.md: Use gen_lowpart() instead of
	s390_gen_lowpart_subreg().

(cherry picked from commit 9ebc9fbddd)
2024-09-20 14:08:32 +02:00
Haochen Jiang
8483527158 doc: Add more alias option and reorder Intel CPU -march documentation
This patch is backported from GCC15 with some tweaks.

Since r15-3539, there are requests coming in to add other alias option
documentation. This patch will add all of them, including corei7, corei7-avx,
core-avx-i, core-avx2, atom and slm.

Also in the patch, I reordered that part of the documentation; currently
the CPUs/products are all over the place. I regrouped them into
date-to-now products (from the very first CPU to the latest Panther Lake),
P-core (since the clients became hybrid cores, starting from Sapphire Rapids)
and E-core (since Bonnell). In GCC14 and earlier, Xeon Phi CPUs are still
present; I put them after the E-core CPUs.

And in the patch, I refined the product names in documentation.

gcc/ChangeLog:

	* doc/invoke.texi: Add corei7, corei7-avx, core-avx-i,
	core-avx2, atom, and slm. Reorder the -march documentation by
	splitting them into date-to-now products, P-core, E-core and
	Xeon Phi. Refine the product names in documentation.
2024-09-19 14:42:05 +08:00
GCC Administrator
f467bbb06d Daily bump. 2024-09-19 00:20:51 +00:00
GCC Administrator
0ab2379e3a Daily bump. 2024-09-18 00:19:22 +00:00
Marek Polacek
9046f9aeae c++: crash with anon VAR_DECL [PR116676]
r12-3495 added maybe_warn_about_constant_value which will crash if
it gets a nameless VAR_DECL, which is what happens in this PR.

We created this VAR_DECL in cp_parser_decomposition_declaration.

	PR c++/116676

gcc/cp/ChangeLog:

	* constexpr.cc (maybe_warn_about_constant_value): Check DECL_NAME.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1z/constexpr-116676.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit dfe0d4389a)
2024-09-17 12:37:05 -04:00
GCC Administrator
46bf97c534 Daily bump. 2024-09-17 00:18:27 +00:00
GCC Administrator
772393c20b Daily bump. 2024-09-16 00:18:36 +00:00
H.J. Lu
ebdc85b6ce x86-64: Don't use temp for argument in a TImode register
Don't use temp for a PARALLEL BLKmode argument of an EXPR_LIST expression
in a TImode register.  Otherwise, the TImode variable will be put in
the GPR save area which guarantees only 8-byte alignment.

gcc/

	PR target/116621
	* config/i386/i386.cc (ix86_gimplify_va_arg): Don't use temp for
	a PARALLEL BLKmode container of an EXPR_LIST expression in a
	TImode register.

gcc/testsuite/

	PR target/116621
	* gcc.target/i386/pr116621.c: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit fa7bbb065c)
2024-09-16 04:38:46 +08:00
GCC Administrator
5c8f84c5dd Daily bump. 2024-09-15 00:18:06 +00:00
GCC Administrator
6aceb85821 Daily bump. 2024-09-14 00:18:08 +00:00
GCC Administrator
0344276a00 Daily bump. 2024-09-13 00:19:01 +00:00
GCC Administrator
682cc3f90d Daily bump. 2024-09-12 00:18:11 +00:00
GCC Administrator
b64a99840e Daily bump. 2024-09-11 00:19:52 +00:00
GCC Administrator
b48e7c28b6 Daily bump. 2024-09-10 00:25:59 +00:00
GCC Administrator
0dba9570a4 Daily bump. 2024-09-09 00:18:31 +00:00
GCC Administrator
0f053a8519 Daily bump. 2024-09-08 00:20:33 +00:00
GCC Administrator
fc14ff0c9e Daily bump. 2024-09-07 00:18:42 +00:00
GCC Administrator
71f9ca6c69 Daily bump. 2024-09-06 00:19:57 +00:00
H.J. Lu
42d4aa02c6 ipa: Don't disable function parameter analysis for fat LTO
Update analyze_parms not to disable function parameter analysis for
-ffat-lto-objects.  Tested on x86-64, there are no differences in zstd
with "-O2 -flto=auto -g" vs "-O2 -flto=auto -g -ffat-lto-objects".

	PR ipa/116410
	* ipa-modref.cc (analyze_parms): Always analyze function parameter
	for LTO.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 2f1689ea8e)
2024-09-05 07:49:09 -07:00
GCC Administrator
87a5641b65 Daily bump. 2024-09-05 00:20:00 +00:00
GCC Administrator
93e66cab19 Daily bump. 2024-09-04 00:25:58 +00:00
Haochen Jiang
6e59b188c4 i386: Fix vfpclassph non-optimized intrin
The non-optimized intrin had a typo in its mask type, which causes
the high bits of __mmask32 to be unexpectedly zeroed.

The test does not fail under O0 with current 1b since the testcase is
wrong. We need to include avx512-mask-type.h after SIZE is defined, or
it will always be __mmask8. That problem also happened in AVX10.2 testcases.
I will write a separate patch to fix that.
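
The effect of such a typo can be sketched in plain C. This is an illustration only, assuming the typo amounted to casting the mask through a narrower type; mask8_t, mask32_t and both helpers are hypothetical stand-ins, not the definitions in avx512fp16intrin.h.

```c
#include <stdint.h>

/* Hypothetical stand-ins for __mmask8 and __mmask32.  */
typedef uint8_t mask8_t;
typedef uint32_t mask32_t;

/* Models a macro that casts its mask operand to the wrong, narrower
   type: everything above bit 7 is discarded.  */
static mask32_t
hyp_narrow_mask (mask32_t m)
{
  return (mask32_t) (mask8_t) m;
}

/* Models the corrected cast: all 32 mask bits survive.  */
static mask32_t
hyp_keep_mask (mask32_t m)
{
  return (mask32_t) m;
}
```

With the input 0xFFFF00FF, hyp_narrow_mask yields 0x000000FF while hyp_keep_mask returns the value unchanged, which is the kind of difference a test sized for a 32-bit mask would need to observe.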

gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h
	(_mm512_mask_fpclass_ph_mask): Correct mask type to __mmask32.
	(_mm512_fpclass_ph_mask): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vfpclassph-1c.c: New test.
2024-09-03 16:48:07 +08:00
GCC Administrator
911eadd490 Daily bump. 2024-09-03 00:23:54 +00:00
liuhongt
6585b06303 Check avx upper register for parallel.
For function arguments/return, when it's BLK mode, it's put in a
parallel with an expr_list, and the expr_list contains the real mode
and registers.
Currently ix86_check_avx_upper_register only checks for SSE_REG_P and
fails to handle that. The patch extends the handling to each subrtx.

gcc/ChangeLog:

	PR target/116512
	* config/i386/i386.cc (ix86_check_avx_upper_register): Iterate
	subrtx to scan for avx upper register.
	(ix86_check_avx_upper_stores): Inline old
	ix86_check_avx_upper_register.
	(ix86_avx_u128_mode_needed): Ditto, and replace
	FOR_EACH_SUBRTX with call to new
	ix86_check_avx_upper_register.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr116512.c: New test.

(cherry picked from commit ab214ef734)
2024-09-02 09:37:41 +08:00
GCC Administrator
4dc921bcf2 Daily bump. 2024-09-02 00:20:58 +00:00
GCC Administrator
bb95e77900 Daily bump. 2024-09-01 00:29:35 +00:00
GCC Administrator
a9284c5d4e Daily bump. 2024-08-31 00:20:04 +00:00
GCC Administrator
2875f9fd29 Daily bump. 2024-08-30 00:24:32 +00:00
GCC Administrator
9742dbd709 Daily bump. 2024-08-29 00:21:13 +00:00
GCC Administrator
c2305c8285 Daily bump. 2024-08-28 00:21:21 +00:00
GCC Administrator
84fc228288 Daily bump. 2024-08-26 00:20:27 +00:00
GCC Administrator
19fedf7aa7 Daily bump. 2024-08-25 00:20:22 +00:00
GCC Administrator
15176abb93 Daily bump. 2024-08-24 00:19:20 +00:00
GCC Administrator
61d63da66e Daily bump. 2024-08-23 00:18:43 +00:00