Commit Graph

31 Commits

Author SHA1 Message Date
Jan Beulich
740a1e7911 x86: reduce AVX512 FP set of insns decoded through vex_w_table[]
Like for AVX512-FP16, there's not that many FP insns where going through
this table is easier / cheaper than using suitable macros. Utilize %XS
and %XD more to eliminate a fair number of table entries.

While doing this I noticed a few anomalies. Where lines get touched /
moved anyway, these are being addressed right here:
- vmovshdup used EXx for its 2nd operand, thus displaying seemingly
  valid broadcast when EVEX.b is set with a memory operand; use
  EXEvexXNoBcst instead just like vmovsldup already does
- vmovlhps used EXx for its 3rd operand, when all sibling entries use
  EXq; switch to EXq there for consistency (the two differ only for
  memory operands)
2022-01-14 10:54:55 +01:00
Jan Beulich
2235ecb8af x86: reduce AVX512-FP16 set of insns decoded through vex_w_table[]
Like already indicated during review of the original submission, there's
really only very few insns where going through this table is easier /
cheaper than using suitable macros. Utilize %XH more and introduce
similar %XS and %XD (which subsequently can be used for further table
size reduction).

While there also switch to using oappend() in 'XH' macro processing.
2022-01-14 10:54:21 +01:00
Cui,Lili
0cc7872125 [PATCH 1/2] Enable Intel AVX512_FP16 instructions
Intel AVX512 FP16 instructions use maps 3, 5 and 6. Maps 5 and 6 use 3 bits
in the EVEX.mmm field (0b101, 0b110). Map 5 is for instructions that were FP32
in map 1 (0Fxx). Map 6 is for instructions that were FP32 in map 2 (0F38xx).
There are some exceptions to this rule. Some things in map 1 (0Fxx) with imm8
operands predated our current conventions; those instructions moved to map 3.
FP32 things in map 3 (0F3Axx) found new opcodes in map3 for FP16 because map3
is very sparsely populated. Most of the FP16 instructions share opcodes and
prefix (EVEX.pp) bits with the related FP32 operations.

Intel AVX512 FP16 instructions has new displacements scaling rules, please refer
to the public software developer manual for detail information.

gas/

2021-08-05  Igor Tsimbalist  <igor.v.tsimbalist@intel.com>
            H.J. Lu  <hongjiu.lu@intel.com>
            Wei Xiao <wei3.xiao@intel.com>
            Lili Cui  <lili.cui@intel.com>

	* config/tc-i386.c (struct Broadcast_Operation): Adjust comment.
	(cpu_arch): Add .avx512_fp16.
	(cpu_noarch): Add noavx512_fp16.
	(pte): Add evexmap5 and evexmap6.
	(build_evex_prefix): Handle EVEXMAP5 and EVEXMAP6.
	(check_VecOperations): Handle {1to32}.
	(check_VecOperands): Handle CheckRegNumb.
	(check_word_reg): Handle Toqword.
	(i386_error): Add invalid_dest_and_src_register_set.
	(match_template): Handle invalid_dest_and_src_register_set.
	* doc/c-i386.texi: Document avx512_fp16, noavx512_fp16.

opcodes/

2021-08-05  Igor Tsimbalist  <igor.v.tsimbalist@intel.com>
            H.J. Lu  <hongjiu.lu@intel.com>
            Wei Xiao <wei3.xiao@intel.com>
            Lili Cui  <lili.cui@intel.com>

	* i386-dis.c (EXwScalarS): New.
	(EXxh): Ditto.
	(EXxhc): Ditto.
	(EXxmmqh): Ditto.
	(EXxmmqdh): Ditto.
	(EXEvexXwb): Ditto.
	(DistinctDest_Fixup): Ditto.
	(enum): Add xh_mode, evex_half_bcst_xmmqh_mode, evex_half_bcst_xmmqdh_mode
	and w_swap_mode.
	(enum): Add PREFIX_EVEX_0F3A08_W_0, PREFIX_EVEX_0F3A0A_W_0,
	PREFIX_EVEX_0F3A26, PREFIX_EVEX_0F3A27, PREFIX_EVEX_0F3A56,
	PREFIX_EVEX_0F3A57, PREFIX_EVEX_0F3A66, PREFIX_EVEX_0F3A67,
	PREFIX_EVEX_0F3AC2, PREFIX_EVEX_MAP5_10, PREFIX_EVEX_MAP5_11,
	PREFIX_EVEX_MAP5_1D, PREFIX_EVEX_MAP5_2A, PREFIX_EVEX_MAP5_2C,
	PREFIX_EVEX_MAP5_2D, PREFIX_EVEX_MAP5_2E, PREFIX_EVEX_MAP5_2F,
	PREFIX_EVEX_MAP5_51, PREFIX_EVEX_MAP5_58, PREFIX_EVEX_MAP5_59,
	PREFIX_EVEX_MAP5_5A_W_0, PREFIX_EVEX_MAP5_5A_W_1,
	PREFIX_EVEX_MAP5_5B_W_0, PREFIX_EVEX_MAP5_5B_W_1,
	PREFIX_EVEX_MAP5_5C, PREFIX_EVEX_MAP5_5D, PREFIX_EVEX_MAP5_5E,
	PREFIX_EVEX_MAP5_5F, PREFIX_EVEX_MAP5_78, PREFIX_EVEX_MAP5_79,
	PREFIX_EVEX_MAP5_7A, PREFIX_EVEX_MAP5_7B, PREFIX_EVEX_MAP5_7C,
	PREFIX_EVEX_MAP5_7D_W_0, PREFIX_EVEX_MAP6_13, PREFIX_EVEX_MAP6_56,
	PREFIX_EVEX_MAP6_57, PREFIX_EVEX_MAP6_D6, PREFIX_EVEX_MAP6_D7
	(enum): Add EVEX_MAP5 and EVEX_MAP6.
	(enum): Add EVEX_W_MAP5_5A, EVEX_W_MAP5_5B,
	EVEX_W_MAP5_78_P_0, EVEX_W_MAP5_78_P_2, EVEX_W_MAP5_79_P_0,
	EVEX_W_MAP5_79_P_2, EVEX_W_MAP5_7A_P_2, EVEX_W_MAP5_7A_P_3,
	EVEX_W_MAP5_7B_P_2, EVEX_W_MAP5_7C_P_0, EVEX_W_MAP5_7C_P_2,
	EVEX_W_MAP5_7D, EVEX_W_MAP6_13_P_0, EVEX_W_MAP6_13_P_2,
	(get_valid_dis386): Properly handle new instructions.
	(intel_operand_size): Handle new modes.
	(OP_E_memory): Ditto.
	(OP_EX): Ditto.
	* i386-dis-evex.h: Updated for AVX512_FP16.
	* i386-dis-evex-mod.h: Updated for AVX512_FP16.
	* i386-dis-evex-prefix.h: Updated for AVX512_FP16.
	* i386-dis-evex-reg.h : Updated for AVX512_FP16.
	* i386-dis-evex-w.h : Updated for AVX512_FP16.
	* i386-gen.c (cpu_flag_init): Add CPU_AVX512_FP16_FLAGS,
	and CPU_ANY_AVX512_FP16_FLAGS. Update CPU_ANY_AVX512F_FLAGS
	and CPU_ANY_AVX512BW_FLAGS.
	(cpu_flags): Add CpuAVX512_FP16.
	(opcode_modifiers): Add DistinctDest.
	* i386-opc.h (enum): (AVX512_FP16): New.
	(i386_opcode_modifier): Add reqdistinctreg.
	(i386_cpu_flags): Add cpuavx512_fp16.
	(EVEXMAP5): Defined as a macro.
	(EVEXMAP6): Ditto.
	* i386-opc.tbl: Add Intel AVX512_FP16 instructions.
	* i386-init.h: Regenerated.
	* i386-tbl.h: Ditto.
2021-08-05 21:03:41 +08:00
Jan Beulich
c1d66d5f24 x86: drop xmm_m{b,w,d,q}_mode
They're effectively redundant with {b,w,d,q}_mode.
2021-07-22 13:08:39 +02:00
Jan Beulich
be2f8fcd9d x86: correct VCVT{,U}SI2SD rounding mode handling
With EVEX.W clear the instruction doesn't ignore the rounding mode, but
(like for other insns without rounding semantics) EVEX.b set causes #UD.
Hence the handling of EVEX.W needs to be done when processing
evex_rounding_64_mode, not at the decode stages.

Derive a new 64-bit testcase from the 32-bit one to cover the different
EVEX.W treatment in both cases.
2021-07-22 13:02:08 +02:00
Jan Beulich
d0579d4d1c x86: drop OP_Mask()
By moving its vex.r check there it becomes fully redundant with OP_G().
2021-07-22 13:01:09 +02:00
Jan Beulich
b763d508db x86/Intel: correct AVX512 S/G disassembly
Commit 6ff00b5e12 ("x86/Intel: correct permitted operand sizes for
AVX512 scatter/gather") brought the assembler side of AVX512 S/G insn
handling in line with AVX2's, but the disassembler side was forgotten.
This has the benefit of
- allowing to fold a number of table entries,
- rendering a few #define-s and enumerators unused.
2021-03-10 08:20:29 +01:00
Jan Beulich
85ba7507f6 x86: reuse further VEX entries for EVEX
When the VEX.L=1 decode matches that of both EVEX.L'L=1 and EVEX.L'L=2
(typically when all three are invalid) the (smaller) VEX table entry can
be reused by EVEX, instead of duplicating data. (Note that XM and XMM as
well as EXxmm_md and EXd are equivalent at least for the purposes here.)
2021-03-10 08:19:11 +01:00
Jan Beulich
066f82b96a x86: reuse VEX entries for EVEX vperm{q,pd}
By matching VEX decode order (L before W), some EVEX entries can refer
back to VEX ones instead of carrying duplicates.
2021-03-10 08:18:24 +01:00
Jan Beulich
fc681dd6a1 x86: re-arrange order of decode for various EVEX opcodes
The order of decodes influences the overall number of table entries.
Reduce table size quite a bit by first decoding few-alternatives
attributes common to all valid leaves.

This also adds a PREFIX_DATA 7531c61332 ("x86: simplify decode of
opcodes valid with (embedded) 66 prefix only") missed to apply to
vbroadcastf64x4.
2021-03-10 08:16:54 +01:00
Jan Beulich
464d2b6568 x86: drop Rdq, Rd, and MaskR
Rdq, Rd, and MaskR can be replaced by Edq, Ed / Rm, and MaskE
respectively, as OP_R() doesn't enforce ModRM.mod == 3, and hence where
MOD matters but hasn't been decoded yet it needs to be anyway. (The case
of converting to Rm is temporary until a subsequent change.)
2020-07-14 10:42:33 +02:00
Jan Beulich
7531c61332 x86: simplify decode of opcodes valid with (embedded) 66 prefix only
The only valid (embedded or explicit) prefix being the data size one
(which is a fairly common pattern), avoid going through prefix_table[].
Instead extend the "required prefix" logic to also handle PREFIX_DATA
alone in a table entry, now used to identify this case. This requires
moving the (adjusted) ->prefix_requirement logic ahead of the printing
of stray prefixes, as the latter needs to observe the new setting of
PREFIX_DATA in used_prefixes.

Also add PREFIX_OPCODE on related entries when previously there was
mistakenly no decode step through prefix_table[].
2020-07-14 10:33:40 +02:00
Jan Beulich
41f5efc685 x86: drop need_vex_reg
It was quite odd for the prior operand handling to have to clear this
flag for the actual operand handling to print nothing. Have the actual
operand handling determine whether the operand is actually present.
With this {d,q}_scalar_swap_mode become unused and hence also get dropped.
2020-07-14 10:32:19 +02:00
Jan Beulich
4726e9a479 x86: extend %BW use to VP{COMPRESS,EXPAND}{B,W}
Unlike the earlier ones these also need their operands adjusted. Replace
the (mis-described: there's nothing "scalar" here) {b,w}_scalar_mode by
a single new mode, with the actual unit width controlled by EVEX.W.
2020-07-14 10:29:25 +02:00
Jan Beulich
b24d668c07 x86-64: fix {,V}PCMPESTR{I,M} disassembly in Intel mode
The operands don't allow disambiguating the insn in 64-bit mode, and
hence suffixes need to be emitted not just in AT&T mode. Achieve this
by re-using %LQ while dropping PCMPESTR_Fixup().
2020-07-14 10:28:12 +02:00
Jan Beulich
c4de76066e x86: fold VCMP_Fixup() into CMP_Fixup()
There's no reason to have two functions and two tables, when the AVX
functionality here is a proper superset of the SSE one.
2020-07-14 10:27:32 +02:00
Jan Beulich
931452b644 x86: introduce %BW to avoid going through vex_w_table[]
This parallels %LW and %XW.
2020-07-07 08:08:09 +02:00
Jan Beulich
21a3faebba x86: use %LW / %XW instead of going through vex_w_table[]
Since we have these macros, there's no point having unnecessary table
depth.

VFPCLASSP{S,D} are now the first instance of using two %-prefixed
macros, which has pointed out a problem with the implementation. Instead
of using custom code in various case blocks, do the macro accumulation
centralized at the top of the main loop of putop(), and zap the
accumulated macros at the bottom of that loop once it has been
processed.
2020-07-06 13:44:03 +02:00
Jan Beulich
bc152a17ff x86: most VBROADCAST{F,I}{32,64}x* only accept memory operands
VBROADCAST{F,I}32x2 are the only exceptions here.
2020-07-06 13:43:34 +02:00
Jan Beulich
fedfb81e60 x86: drop EVEX table entries that can be made served by VEX ones
By doing the EVEX.W decode first, in various cases VEX table entries can
be re-used.
2020-07-06 13:42:33 +02:00
Jan Beulich
3a57774c7b x86: AVX512 VPERM{D,Q,PS,PD} insns need to honor EVEX.L'L
Just like (where they exist) their AVX counterparts do for VEX.L. For
all of them the 128-bit forms are invalid.
2020-07-06 13:41:58 +02:00
Jan Beulich
e74d9fa9cf x86: AVX512 extract/insert insns need to honor EVEX.L'L
Just like their AVX counterparts do for VEX.L.

At this occasion also make EVEX.W have the same effect as VEX.W on the
printing of VPINSR{B,W}'s operands, bringing them also in sync with
VPEXTR{B,W}.
2020-07-06 13:41:27 +02:00
Jan Beulich
6431c8015b x86: honor VEX.W for VCVT{PH2PS,PS2PH}
Unlike for the EVEX-encoded versions, the VEX ones failed to decode
VEX.W. Once the necessary adjustments are done, it becomes obvious that
the EVEX and VEX table entries for VCVTPS2PH are identical and can hence
be folded.
2020-07-06 13:40:40 +02:00
Jan Beulich
6df22cf64c x86: drop EVEX table entries that can be served by VEX ones
The duplication is not only space inefficient, but also risks entries
going out of sync (some of which that I became aware of while doing this
work will get addressed subsequently). Right here note that for
VGF2P8MULB this also addresses the prior lack of EVEX.W decoding (i.e. a
first example of out of sync entries).

This introduces EXxEVexR to some VEX templates, on the basis that this
operand is benign there and only relevant when EVEX encoding ends up
reaching these entries.
2020-07-06 13:40:13 +02:00
Jan Beulich
39e0f45682 x86: replace EXqScalarS by EXqVexScalarS
There's only a single user, that that one can do fine with the
alternative, as the "Vex" aspect of the other operand kind is meaningful
only on 3-operand insns.

While doing this I noticed that I didn't need to do the same adjustment
in the EVEX tables, and voilà - there was a bug, which gets fixed at the
same time (see the testsuite changes).
2020-07-06 13:35:38 +02:00
Jan Beulich
5b872f7df7 x86: replace EX{d,q}Scalar by EXxmm_m{d,q}
Along the lines of 4102be5cf9 ("x86: replace EXxmm_mdq by
EXVexWdqScalar"), but in the opposite direction, replace EXdScalar/
EXqScalar by EXxmm_md/EXxmm_mq respectively, rendering d_scalar_mode and
q_scalar_mode unused. The change is done this way to improve telling
apart operands affected here from ones using EXbScalar/EXwScalar, which
work sufficiently differently. Additionally this increases similarity
between several VEX-encoded insns and their EVEX-encoded counterparts.
2020-07-06 13:34:36 +02:00
Jan Beulich
97e6786a6e x86: utilize X macro in EVEX decoding
For major opcodes allowing only packed FP kinds of operands, i.e. the
ones where legacy and AVX decoding uses the X macro, we can do so for
AVX512 as well, by attaching to the checking logic the "EVEX.W must
match presence of embedded 66 prefix" rule. (Encodings not following
this general pattern simply may not gain the PREFIX_OPCODE attribute.)

Note that testing of the thus altered decoding has already been put in
place by "x86: correct decoding of packed-FP-only AVX encodings".

This can also be at least partly applied to scalar-FP-only insns (i.e.
V{,U}COMIS{S,D}) as well as the vector-FP forms of insns also allowing
scalar encodings (e.g. VADDP{S,D}).

Take the opportunity and also fix EVEX-encoded VMOVNTP{S,D} as well as
to-memory forms of VMOV{L,H}PS and both forms of VMOV{L,H}PD to wrongly
disassemble with only register operands.
2020-06-09 08:57:22 +02:00
Jan Beulich
36cc073ef4 x86: remove ModRM.mod decoding layer from AVX512F VMOVS{S,D}
Just like their AVX counterparts they can utilize XMVexScalar /
EXdVexScalarS / EXqVexScalarS taking care of dropping the middle operand
for their memory forms.
2019-07-01 08:23:41 +02:00
H.J. Lu
e395f487b3 i386: Check vector length for scatter/gather prefetch instructions
Since not all vector lengths are supported by scatter/gather prefetch
instructions, decode them only with supported vector lengths.

gas/

	PR binutils/24719
	* testsuite/gas/i386/disassem.s: Add test for vgatherpf0dps
	with invalid vector length.
	* testsuite/gas/i386/x86-64-disassem.s: Likewise.
	* testsuite/gas/i386/disassem.d: Updated.
	* testsuite/gas/i386/x86-64-disassem.d: Likewise.

opcodes/

	PR binutils/24719
	* i386-dis-evex-len.h: Add EVEX_LEN_0F38C6_REG_1_PREFIX_2,
	EVEX_LEN_0F38C6_REG_2_PREFIX_2, EVEX_LEN_0F38C6_REG_5_PREFIX_2,
	EVEX_LEN_0F38C6_REG_6_PREFIX_2, EVEX_LEN_0F38C7_R_1_P_2_W_0,
	EVEX_LEN_0F38C7_R_1_P_2_W_1, EVEX_LEN_0F38C7_R_2_P_2_W_0,
	EVEX_LEN_0F38C7_R_2_P_2_W_1, EVEX_LEN_0F38C7_R_5_P_2_W_0,
	EVEX_LEN_0F38C7_R_5_P_2_W_1, EVEX_LEN_0F38C7_R_6_P_2_W_0 and
	EVEX_LEN_0F38C7_R_6_P_2_W_1.
	* i386-dis-evex-prefix.h: Update PREFIX_EVEX_0F38C6_REG_1,
	PREFIX_EVEX_0F38C6_REG_2, PREFIX_EVEX_0F38C6_REG_5 and
	PREFIX_EVEX_0F38C6_REG_6 entries.
	* i386-dis-evex-w.h: Update EVEX_W_0F38C7_R_1_P_2,
	EVEX_W_0F38C7_R_2_P_2, EVEX_W_0F38C7_R_5_P_2 and
	EVEX_W_0F38C7_R_6_P_2 entries.
	* i386-dis.c: Add EVEX_LEN_0F38C6_REG_1_PREFIX_2,
	EVEX_LEN_0F38C6_REG_2_PREFIX_2, EVEX_LEN_0F38C6_REG_5_PREFIX_2,
	EVEX_LEN_0F38C6_REG_6_PREFIX_2, EVEX_LEN_0F38C7_R_1_P_2_W_0,
	EVEX_LEN_0F38C7_R_1_P_2_W_1, EVEX_LEN_0F38C7_R_2_P_2_W_0,
	EVEX_LEN_0F38C7_R_2_P_2_W_1, EVEX_LEN_0F38C7_R_5_P_2_W_0,
	EVEX_LEN_0F38C7_R_5_P_2_W_1, EVEX_LEN_0F38C7_R_6_P_2_W_0 and
	EVEX_LEN_0F38C7_R_6_P_2_W_1 enums.
2019-06-27 13:39:32 -07:00
Jan Beulich
54fbadc0c3 x86: drop dqa_mode
I assume this mode was needed when EVEX.W handling wasn't really correct
yet for other than 64-bit mode. It's clearly not needed anymore. Its
elimination also allows dropping the EVEX.W split of VCVT{,U}SI2SS. (For
the record, the dropped mode would have been wrong if used in any table
entry not already guaranteeing EVEX.W=1.)
2019-06-25 09:35:17 +02:00
H.J. Lu
ad692897c1 i386: Break i386-dis-evex.h into small files
Break i386-dis-evex.h into small files such that each file is included
just once.

	* i386-dis-evex.h: Break into ...
	* i386-dis-evex-len.h: New file.
	* i386-dis-evex-mod.h: Likewise.
	* i386-dis-evex-prefix.h: Likewise.
	* i386-dis-evex-reg.h: Likewise.
	* i386-dis-evex-w.h: Likewise.
	* i386-dis.c: Include i386-dis-evex-reg.h, i386-dis-evex-prefix.h,
	i386-dis-evex.h, i386-dis-evex-len.h, i386-dis-evex-w.h and
	i386-dis-evex-mod.h.
2019-06-21 13:18:41 -07:00