Christophe Lyon df0e57c2c0 arm: Fix vcond_mask expander for MVE (PR target/100757)
The problem in this PR is that we call VPSEL with a mask of vector
type instead of HImode. This happens because operand 3 in vcond_mask
is the pre-computed vector comparison and has vector type.

This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
returning the appropriate VxBI mode when targeting MVE.  In turn, this
implies implementing vec_cmp<mode><MVE_vpred>,
vec_cmpu<mode><MVE_vpred> and vcond_mask_<mode><MVE_vpred>, and we can
move vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode> and
vcond_mask_<mode><v_cmp_result> back to neon.md since they are not
used by MVE anymore.  The new *<MVE_vpred> patterns listed above are
implemented in mve.md since they are only valid for MVE.  However, this
may make maintenance and comparison more painful than keeping all of
them in vec-common.md.

In the process, we can get rid of the recently added vcond_mve
parameter of arm_expand_vector_compare.

Compared to neon.md's vcond_mask_<mode><v_cmp_result> before my "arm:
Auto-vectorization for MVE: vcmp" patch (r12-834), the restored pattern
keeps the VDQWH iterator added in r12-835 (to have V4HF/V8HF support),
as well as the (!<Is_float_mode> || flag_unsafe_math_optimizations)
condition.  That condition was not present before r12-834 even though
SF modes were enabled by VDQW (I think this was a bug).

Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
longer need to generate vpsel with vectors of 0 and 1: the masks are
now merged via scalar 'ands' instructions operating on 16-bit masks
after converting the boolean vectors.
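
As a rough illustration of the mask merging described above, the C
sketch below models an MVE predicate as a 16-bit integer in which each
32-bit lane owns four bits (one bit per vector byte).  The helper names
pred_eq_f32x4 and pred_and are made up for this example; they are not
GCC or ACLE functions, and the real work is done by vcmp, vmrs and ands.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "vcmp.f32 eq" followed by "vmrs rN, P0":
   compare 4 float lanes against K and return a 16-bit predicate.
   Each 32-bit lane contributes 4 identical bits (one per byte).  */
uint16_t
pred_eq_f32x4 (const float *v, float k)
{
  uint16_t mask = 0;

  assert (v != 0);
  for (int lane = 0; lane < 4; lane++)
    if (v[lane] == k)
      mask |= (uint16_t) (0xFu << (4 * lane));
  return mask;
}

/* Merging two comparison results is a single scalar AND on the
   16-bit masks; no vector of 0s and 1s is materialized.  */
uint16_t
pred_and (uint16_t m0, uint16_t m1)
{
  return (uint16_t) (m0 & m1);
}
```

A lane is set in the merged mask only if it compared equal in both
inputs, which is the property the vectorizer relies on when it combines
the conditions of several vector iterations.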

In addition, this patch fixes a problem in arm_expand_vcond() where
the result would be a vector of 0 or 1 instead of operand 1 or 2.

Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new
arm_mve effective target.
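
For readers unfamiliar with DejaGnu selectors, an effective-target
keyword such as arm_mve can be used in a dg-skip-if directive roughly
as sketched below.  The reason string and the function body are
placeholders and do not reproduce the actual contents of
gcc.dg/signbit-2.c.

```c
/* { dg-do compile } */
/* { dg-skip-if "placeholder reason: not applicable to MVE" { arm_mve } } */

/* Placeholder body; a real test would exercise the feature under test.  */
int
placeholder (int x)
{
  return x + 1;
}
```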

Reducing the number of iterations in pr100757-3.c from 32 to 8, we
generate the code below:

float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
    if (a[b] != 2.0f)
      c = 5.0f;
  return c;
}

fn1:
	ldr     r3, .L3+48
	vldr.64 d4, .L3              // q2=(2.0,2.0,2.0,2.0)
	vldr.64 d5, .L3+8
	vldrw.32        q0, [r3]     // q0=a(0..3)
	adds    r3, r3, #16
	vcmp.f32        eq, q0, q2   // cmp a(0..3) == (2.0,2.0,2.0,2.0)
	vldrw.32        q1, [r3]     // q1=a(4..7)
	vmrs     r3, P0
	vcmp.f32        eq, q1, q2   // cmp a(4..7) == (2.0,2.0,2.0,2.0)
	vmrs    r2, P0  @ movhi
	ands    r3, r3, r2           // r3=select(a(0..3)) & select(a(4..7))
	vldr.64 d4, .L3+16           // q2=(5.0,5.0,5.0,5.0)
	vldr.64 d5, .L3+24
	vmsr     P0, r3
	vldr.64 d6, .L3+32           // q3=(4.0,4.0,4.0,4.0)
	vldr.64 d7, .L3+40
	vpsel q3, q3, q2             // q3=vcond_mask(4.0,5.0)
	vmov.32 r2, q3[1]            // keep the scalar max
	vmov.32 r0, q3[3]
	vmov.32 r3, q3[2]
	vmov.f32        s11, s12
	vmov    s15, r2
	vmov    s14, r3
	vmaxnm.f32      s15, s11, s15
	vmaxnm.f32      s15, s15, s14
	vmov    s14, r0
	vmaxnm.f32      s15, s15, s14
	vmov    r0, s15
	bx      lr
	.L4:
	.align  3
	.L3:
	.word   1073741824	// 2.0f
	.word   1073741824
	.word   1073741824
	.word   1073741824
	.word   1084227584	// 5.0f
	.word   1084227584
	.word   1084227584
	.word   1084227584
	.word   1082130432	// 4.0f
	.word   1082130432
	.word   1082130432
	.word   1082130432
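
To double-check a reading of the sequence above, here is a plain C
model of it next to the scalar loop.  This is only a sketch: the
function names are hypothetical, the array is passed as a parameter
instead of using the global a, and vmaxnm is modelled with a plain
comparison, which is adequate here only because the selected values are
always 4.0 or 5.0 (no NaNs).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference: the loop from the testcase, 8 iterations.  */
float
fn1_scalar (const float *a)
{
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
    if (a[b] != 2.0f)
      c = 5.0f;
  return c;
}

/* Model of "vcmp.f32 eq" + "vmrs": 4 predicate bits per 32-bit lane.  */
uint16_t
model_vcmp_eq (const float *v, float k)
{
  uint16_t p = 0;
  for (int lane = 0; lane < 4; lane++)
    if (v[lane] == k)
      p |= (uint16_t) (0xFu << (4 * lane));
  return p;
}

/* Model of "vpsel": take the lane from T where the predicate is set,
   otherwise from F.  */
void
model_vpsel (uint16_t p0, const float *t, const float *f, float *out)
{
  for (int lane = 0; lane < 4; lane++)
    out[lane] = ((p0 >> (4 * lane)) & 1u) ? t[lane] : f[lane];
}

/* Model of the whole generated sequence for 8 elements.  */
float
fn1_model (const float *a)
{
  const float fours[4] = { 4.0f, 4.0f, 4.0f, 4.0f };
  const float fives[4] = { 5.0f, 5.0f, 5.0f, 5.0f };
  float sel[4];

  assert (a != 0);
  /* Two compares, merged with one scalar AND ("ands").  */
  uint16_t p0 = (uint16_t) (model_vcmp_eq (a, 2.0f)
			    & model_vcmp_eq (a + 4, 2.0f));
  model_vpsel (p0, fours, fives, sel);

  /* Final reduction: keep the maximum of the 4 lanes.  */
  float m = sel[0];
  for (int lane = 1; lane < 4; lane++)
    if (sel[lane] > m)
      m = sel[lane];
  return m;
}
```

For any input without NaNs, fn1_model should return the same value as
fn1_scalar: 4.0 when all eight elements equal 2.0, and 5.0 otherwise.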

This patch adds tests that trigger an ICE without this fix.

The pr100757*.c testcases are derived from
gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
various types and return values different from 0 and 1 to avoid
commonalization with boolean masks.  In addition, since we should not
need these masks, the tests make sure they are not present.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  <christophe.lyon@arm.com>

	PR target/100757
	gcc/
	* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
	(arm_expand_vector_compare): Update prototype.
	* config/arm/arm.cc (TARGET_VECTORIZE_GET_MASK_MODE): New.
	(arm_vector_mode_supported_p): Add support for VxBI modes.
	(arm_expand_vector_compare): Remove useless generation of vpsel.
	(arm_expand_vcond): Fix select operands.
	(arm_get_mask_mode): New.
	* config/arm/mve.md (vec_cmp<mode><MVE_vpred>): New.
	(vec_cmpu<mode><MVE_vpred>): New.
	(vcond_mask_<mode><MVE_vpred>): New.
	* config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>)
	(vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): Move to ...
	* config/arm/neon.md (vec_cmp<mode><v_cmp_result>)
	(vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): ... here
	and disable for MVE.
	* doc/sourcebuild.texi (arm_mve): Document new effective-target.

	gcc/testsuite/
	PR target/100757
	* gcc.target/arm/simd/pr100757-2.c: New.
	* gcc.target/arm/simd/pr100757-3.c: New.
	* gcc.target/arm/simd/pr100757-4.c: New.
	* gcc.target/arm/simd/pr100757.c: New.
	* gcc.dg/signbit-2.c: Skip when targeting ARM/MVE.
	* lib/target-supports.exp (check_effective_target_arm_mve): New.
2022-02-22 15:55:07 +00:00
c++tools Update copyright years. 2022-01-03 10:42:10 +01:00
config Daily bump. 2021-12-22 00:16:30 +00:00
contrib Daily bump. 2022-02-01 00:16:29 +00:00
fixincludes Daily bump. 2022-02-05 00:16:31 +00:00
gcc arm: Fix vcond_mask expander for MVE (PR target/100757) 2022-02-22 15:55:07 +00:00
gnattools Daily bump. 2021-10-23 00:16:26 +00:00
gotools Daily bump. 2022-02-14 00:16:23 +00:00
include Update copyright years. 2022-01-03 10:42:10 +01:00
INSTALL
intl Daily bump. 2021-11-30 00:16:44 +00:00
libada Update copyright years. 2022-01-03 10:42:10 +01:00
libatomic Daily bump. 2022-02-04 00:16:24 +00:00
libbacktrace Daily bump. 2022-02-18 00:16:39 +00:00
libcc1 Update copyright years. 2022-01-03 10:42:10 +01:00
libcody Update Copyright in ChangeLog files 2022-01-03 10:31:39 +01:00
libcpp Daily bump. 2022-02-12 00:16:23 +00:00
libdecnumber Update copyright years. 2022-01-03 10:42:10 +01:00
libffi Daily bump. 2021-11-16 00:16:31 +00:00
libgcc Daily bump. 2022-01-26 00:16:38 +00:00
libgfortran Daily bump. 2022-01-27 00:16:29 +00:00
libgo runtime/internal/syscall: build dummy package if not Linux 2022-02-21 13:24:38 -08:00
libgomp [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end 2022-02-22 15:48:03 +01:00
libiberty libiberty: Fix up debug.temp.o creation if *.o has 64K+ sections [PR104617] 2022-02-22 11:33:45 +01:00
libitm Daily bump. 2022-02-04 00:16:24 +00:00
libobjc Update copyright years. 2022-01-03 10:42:10 +01:00
liboffloadmic Daily bump. 2021-10-20 00:16:43 +00:00
libphobos Daily bump. 2022-02-21 00:16:24 +00:00
libquadmath Daily bump. 2022-01-12 00:16:39 +00:00
libsanitizer Daily bump. 2022-02-16 00:16:26 +00:00
libssp Update copyright years. 2022-01-03 10:42:10 +01:00
libstdc++-v3 libstdc++: Implement P2415R2 changes to viewable_range / views::all 2022-02-22 09:37:58 -05:00
libvtv Update copyright years. 2022-01-03 10:42:10 +01:00
lto-plugin Update copyright years. 2022-01-03 10:42:10 +01:00
maintainer-scripts Daily bump. 2021-05-15 00:16:27 +00:00
zlib Daily bump. 2021-12-17 00:16:20 +00:00
.dir-locals.el dir-locals: Use https for bug references 2021-07-20 11:40:34 +01:00
.gitattributes
.gitignore Add cscope.out to git ignore. 2021-06-24 16:51:40 +05:30
ABOUT-NLS
ar-lib
ChangeLog Daily bump. 2022-02-09 00:16:24 +00:00
ChangeLog.jit
ChangeLog.tree-ssa
compile
config-ml.in
config.guess config.sub, config.guess : Import upstream 2021-01-25. 2021-02-23 17:21:10 +08:00
config.rpath
config.sub config.sub: change mode to 755. 2021-12-21 09:10:57 +01:00
configure config: Add check whether D compiler works (PR103528) 2021-12-21 21:29:35 +01:00
configure.ac Revert "Sync with binutils: GCC: Pass --plugin to AR and RANLIB" 2021-12-15 20:45:58 -08:00
COPYING
COPYING3
COPYING3.LIB
COPYING.LIB
COPYING.RUNTIME
depcomp
install-sh
libtool-ldflags
libtool.m4 Revert "Sync with binutils: GCC: Pass --plugin to AR and RANLIB" 2021-12-15 20:45:58 -08:00
lt~obsolete.m4
ltgcc.m4
ltmain.sh
ltoptions.m4
ltsugar.m4
ltversion.m4
MAINTAINERS MAINTAINERS: Update my email address. 2022-02-22 15:55:05 +00:00
Makefile.def Revert "Fix PR 67102: Add libstdc++ dependancy to libffi" [PR67102] 2022-01-25 18:46:21 +01:00
Makefile.in Revert "Fix PR 67102: Add libstdc++ dependancy to libffi" [PR67102] 2022-01-25 18:46:21 +01:00
Makefile.tpl Revert "Sync with binutils: GCC: Pass --plugin to AR and RANLIB" 2021-12-15 20:45:58 -08:00
missing
mkdep
mkinstalldirs
move-if-change
multilib.am
README
symlink-tree
test-driver
ylwrap

This directory contains the GNU Compiler Collection (GCC).

The GNU Compiler Collection is free software.  See the files whose
names start with COPYING for copying permission.  The manuals, and
some of the runtime libraries, are under different terms; see the
individual source files for details.

The directory INSTALL contains copies of the installation information
as HTML and plain text.  The source of this information is
gcc/doc/install.texi.  The installation information includes details
of what is included in the GCC sources and what files GCC installs.

See the file gcc/doc/gcc.texi (together with other files that it
includes) for usage and porting information.  An online readable
version of the manual is in the files gcc/doc/gcc.info*.

See http://gcc.gnu.org/bugs/ for how to report bugs usefully.

Copyright years on GCC source files may be listed using range
notation, e.g., 1987-2012, indicating that every year in the range,
inclusive, is a copyrightable year that could otherwise be listed
individually.