Much like AVX512-{4FMAPS,4VNNIW} have a constraint on their register
source, there's a constraint (need to be even) on the destination
register here.
Adjust "good" test cases accordingly, and add a new test case to check
the warning.
Without this APX support isn't really complete.
For Intel syntax displacement form is needed, such that symbolic
operands won't need prefixing by "offset". (The other form is actually
not used at all in Intel syntax.)
For the record: To restrict displacement form to Intel syntax is not
something I actually agree with.
As soon as I committed Zhaoxin's patch, I realized that I did not
include the regen file. Regenerate them and commit as obvious.
opcodes/ChangeLog:
* i386-tbl.h: Regenerated.
* i386-mnem.h: Ditto.
* i386-init.h: Ditto.
In this patch, we will support AVX10.2 convert instructions. All
of them are new instruction forms.
Among all the instructions, vcvtbiasph2[b,h]f8[,s] needs extra care.
Since Operand 2 could indicate memory size, we do not need suffix
under ATTmode. However, we could not fold all three templates but only
XMM/YMM since the dst operand size are the same for them. Also, a new
iterator <cvt8> is added to reduce redundancy.
gas/
* testsuite/gas/i386/i386.exp: Add AVX10.2 tests.
* testsuite/gas/i386/x86-64.exp: Ditto.
* testsuite/gas/i386/avx10_2-256-cvt-intel.d: New.
* testsuite/gas/i386/avx10_2-256-cvt.d: Ditto.
* testsuite/gas/i386/avx10_2-256-cvt.s: Ditto.
* testsuite/gas/i386/avx10_2-512-cvt-intel.d: Ditto.
* testsuite/gas/i386/avx10_2-512-cvt.d: Ditto.
* testsuite/gas/i386/avx10_2-512-cvt.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-cvt-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-cvt.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-cvt.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-cvt-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-cvt.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-cvt.s: Ditto.
opcodes/
* i386-dis-evex-prefix.h: Add PREFIX_EVEX_0F3874,
PREFIX_EVEX_MAP5_18, PREFIX_EVEX_MAP5_1B,
PREFIX_EVEX_MAP5_1E and PREFIX_EVEX_MAP5_74.
* i386-dis-evex.h: Add table pass for AVX10.2
instructions.
* i386-dis.c (MOD_EVEX_0F38B1): New.
(PREFIX_EVEX_0F3874): Ditto.
(PREFIX_EVEX_MAP5_18): Ditto.
(PREFIX_EVEX_MAP5_1B): Ditto.
(PREFIX_EVEX_MAP5_1E): Ditto.
(PREFIX_EVEX_MAP5_74): Ditto.
* i386-opc.tbl: Add AVX10.2 instructions.
* i386-mnem.h: Regenerated.
* i386-tbl.h: Ditto.
Co-authored-by: Kong Lingling <lingling.kong@intel.com>
Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
In disassembler part, for vnni instructions, we extended previous
VEX part using %XE in disassembler to promote them to EVEX by reusing
the original VEX table. For vmpsadbw, we will also use %XE. However,
it is hard to reuse the VEX table, so we are using new ones.
In assmbler part, we put the vnni table entries with previous vnni
instructions since they are just promotion from AVX-VNNI-INT{8,16}.
Since we will prefer VEX encoding, we need to use the different table
order in template <vnni>, which prefers EVEX due to earlier introduction
for AVX512_VNNI than AVX_VNNI. This means a new <vnni>. For vdpphps
and vmpsadbw, we put them at the end of the table, with future AVX10.2
instructions.
Nit: I will remove the arch requirement for avx_vnni_int{8,16} in
evex-promote testcases after AVX10.2 implies AVX-VNNI-INT{8,16}.
gas/Changelog:
* testsuite/gas/i386/i386.exp: Add AVX10.2 tests.
* testsuite/gas/i386/x86-64.exp: Ditto.
* testsuite/gas/i386/avx10_2-256-1-intel.d: New.
* testsuite/gas/i386/avx10_2-256-1.d: Ditto.
* testsuite/gas/i386/avx10_2-256-1.s: Ditto.
* testsuite/gas/i386/avx10_2-512-1-intel.d: Ditto.
* testsuite/gas/i386/avx10_2-512-1.d: Ditto.
* testsuite/gas/i386/avx10_2-512-1.s: Ditto.
* testsuite/gas/i386/avx10_2-promote.d: Ditto.
* testsuite/gas/i386/avx10_2-promote.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-1-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-1.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-256-1.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-1-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-1.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-512-1.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-promote.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-promote.s: Ditto.
opcodes/Changelog:
* i386-dis-evex-prefix.h: Adjust PREFIX_EVEX_0F3852.
Add PREFIX_EVEX_0F3A42_W_0.
* i386-dis-evex-w.h: Adjust EVEX_W_0F3A42.
* i386-dis-evex.h: Add table pass for AVX10.2
instructions.
* i386-dis.c: Adjust PREFIX_VEX_0F3850_W_0, PREFIX_VEX_0F3851_W_0,
PREFIX_VEX_0F38D2_W_0 and PREFIX_VEX_0F38D3_W_0.
* i386-opc.tbl: Add AVX10.2 instructions.
* i386-mnem.h: Regenerated.
* i386-tbl.h: Ditto.
Co-authored-by: Lili Cui <lili.cui@intel.com>
Reduce redundancy, in preparation of the addition of further counterparts
for AVX10.2. Provide the "ne" parameter needed there right away, even if
unused for now.
V{BROADCAST,EXTRACT,INSERT}{F,I}128 and VROUND{P,S}{S,D} aren't promoted
to support EGPR in APX spec. Don't promote them out of APX spec. This
commit effectively reverted:
ec3babb8c1 x86/APX: V{BROADCAST,EXTRACT,INSERT}{F,I}128 can also be expressed
5a635f1f59 x86/APX: VROUND{P,S}{S,D} encodings require AVX512{F,VL}
eea4357967 x86/APX: VROUND{P,S}{S,D} can generally be encoded
gas/
PR gas/32171
* testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s: Add
V{BROADCAST,EXTRACT,INSERT}{F,I}128 tests with EGPR.
* testsuite/gas/i386/x86-64-apx-evex-promoted.s: Remove
V{BROADCAST,EXTRACT,INSERT}{F,I}128 and VROUND{P,S}{S,D} tests
with EGPR.
* testsuite/gas/i386/x86-64-apx-egpr-inval.l: Updated.
* testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l: Likewise.
* testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d: Likewise.
* testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d: Likewise.
* testsuite/gas/i386/x86-64-apx-evex-promoted.d: Likewise.
opcodes/
PR gas/32171
* i386-opc.tbl: Remove V{BROADCAST,EXTRACT,INSERT}{F,I}128 and
VROUND{P,S}{S,D} entries with EGPR.
* i386-tbl.h: Regenerated.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Along the lines of 2513312930 ("x86/APX: apply NDD-to-legacy
transformation to further CMOVcc forms") these can similarly be
converted to the shorter legacy-encoded CMOVcc.
In the patch, in order to support ymm rounding for AVX10.2, we derive
evex attribute for all cases instead of only for rc_none to encode U bit.
Also changed some bad_opcode return due to the share of U bit with APX_F.
gas/ChangeLog:
* config/tc-i386.c
(cpu_flags_match): Handle AVX10_2.
(build_evex_prefix): Handle U bit. Derive evex attribute
for all cases.
(check_VecOperands): Handle AVX10.2 and ymm roundings.
* doc/c-i386.texi: Document .avx10.2.
* testsuite/gas/i386/i386.exp: Run AVX10.2 tests.
* testsuite/gas/i386/x86-64.exp: Ditto.
* testsuite/gas/i386/avx10_2-rounding-intel.d: New test.
* testsuite/gas/i386/avx10_2-rounding-inval.l: Ditto.
* testsuite/gas/i386/avx10_2-rounding-inval.s: Ditto.
* testsuite/gas/i386/avx10_2-rounding.d: Ditto.
* testsuite/gas/i386/avx10_2-rounding.s: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-rounding-intel.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-rounding.d: Ditto.
* testsuite/gas/i386/x86-64-avx10_2-rounding.s: Ditto.
opcodes/ChangeLog:
* i386-dis.c (struct instr_info): Add U bit.
(get_valid_dis386): Handle U bit.
* i386-gen.c (isa_dependencies): Add AVX10.2.
(cpu_flags): Ditto.
* i386-init.h: Regenerated.
* i386-opc.h (CpuAVX10_2): New.
(i386_cpu_flags): Add cpuavx10_2.
* i386-opc.tbl: Add rounding to old entries which do not
permit rounding previously. Also eliminate the redundant
RegXMM for vcvtps2uqq.
* i386-tbl.h: Regenerated.
The special property really only applies to the "extended" byte regs
having legacy word/dword counterparts.
While touching involved code also drop redundant byte checks from a
conditional in establish_rex(): The other remaining RegRex64 uses only
exist on registers which can't be used as register operands anyway.
Hence RegRex64 as an attribute of a (valid) register operand implies
that it's a byte reg.
The CMOVcc instruction proposed by EVEX has four different forms,
corresponding to the four possible combinations of EVEX.ND and EVEX.NF
values.
In the encoder part, when the CFCMOV template supports EVEX_NF, it means that
it requires EVEX.NF to be 1.
In the decoder part, CFCMOV_Fixup is used to reverse source and destination
operands in the 2-operand case.
gas/ChangeLog:
* config/tc-i386.c (build_apx_evex_prefix): Set NF bit for cfcmov
when the insn template supports EVEX_NF.
* testsuite/gas/i386/x86-64-apx-inval.l: Add invalid tests for cfcmov.
* testsuite/gas/i386/x86-64-apx-inval.s: Ditto.
* testsuite/gas/i386/x86-64.exp: Add tests for cfcmov and cmov.
* testsuite/gas/i386/x86-64-apx-cfcmov-intel.d: Ditto.
* testsuite/gas/i386/x86-64-apx-cfcmov.d: Ditto.
* testsuite/gas/i386/x86-64-apx-cfcmov.s: Ditto.
opcodes/ChangeLog:
* i386-dis-evex-prefix.h: Add cfcmov instructions.
* i386-dis.c (CFCMOV_Fixup): Special handling of cfcmov.
(putop): Print 'cf' for cfcmov instructions.
* i386-opc.h (EVEX_NF): New.
* i386-opc.tbl: Add cfcmov instructions.
* i386-mnem.h: Regerated.
* i386-tbl.h: Regerated.
..., for differing only in the resulting EFLAGS, which are left
untouched anyway. That's a shorter encoding, available as long as
certain constraints on operands are met; see code comments. (SHL-by-1
forms may then be subject to further optimization that was introduced
earlier.)
Note that kind of as a side effect this also converts multiplication by
1 to shift by 0, which is a plain move or even no-op anyway. That could
be further shrunk (as could be presence of shifts/rotates by 0 in the
original code as well as a fair set of other {nf}-form insns), yet the
expectation (for now) is that people won't write such code in the first
place.
..., as that leaves EFLAGS untouched anyway. That's a shorter encoding,
available as long as certain constraints on operand size and registers
are met; see code comments.
Note that this requires deferring to derive encoding_evex from {nf}
presence, as in optimize_encoding() we want to avoid touching the insns
when {evex} was also used.
Note further that this requires want_disp32() to now also consider the
opcode: We don't want to replace i.tm.mnem_off, for diagnostics to still
report the original mnemonic (or else things can get confusing). While
there, correct adjacent mis-indentation.
Unlike for the legacy forms, where there's a difference in the resulting
EFLAGS.CF, for the NF variants the immediate can be got rid of in that
case by switching to a 1-bit rotate in the opposite direction.
Unlike for the legacy forms, where there's a difference in the resulting
EFLAGS, for the NF variants we can safely replace ones using 0x80 by the
respectively other insn while negating the immediate, saving 3 immediate
bytes (just 1 though for 16-bit operand size). Similarly we can replace
ones using 1 / -1 by INC/DEC (eliminating the immediate).
These can be replaced by adds when acting on a register operand.
While for the scalar forms there's no gain in encoding size, ADD
generally has higher throughput than SHL. EFLAGS set by ADD are a
superset of those set by SHL (AF in particular is undefined there).
For the SIMD cases the transformation also reduced code size, by
eliminating the 1-byte immediate from the resulting encoding. Note
that this transformation is not applied by gcc13 (according to my
observations), so would - as of now - even improve compiler generated
code.
There are two encodings for each opcode F6/F7 in ctest, but the second one
is never used, so remove it to reduce the size of opcode_tbl.h.
opcodes/ChangeLog:
* i386-opc.tbl: Removed the secondary insn template for ctest.
* i386-tbl.h: Regenerated.
CCMP and CTEST are two new sets of instructions for conditional CMP
and TEST, SCC and OSZC flags are given as suffixes of CCMP or CTEST
in the instruction mnemonic, e.g.:
ccmp<cc> { dfv=sf , cf , of } %eax, %ecx
also add
{evex} cmp/test %eax, %ecx
as an alias for ccmpt.
For the encoder part, add function check_Scc_OszcOperation to parse
'{ dfv=of , sf, sf, cf}', store scc in the lower 4 bits of base_opcode,
and adjust base_opcode to its normal meaning in install_template.
For the decoder part, add 'SC' and 'DF' macros to add scc and oszc flags
suffixes.
gas/ChangeLog:
* config/tc-i386.c (OSZC_CF): New.
(OSZC_ZF): Ditto.
(OSZC_SF): Ditto.
(OSZC_OF): Ditto.
(set_oszc_flags): Set oszc flags and report error for using the same oszc flags twice.
(check_Scc_OszcOperations): Handle SCC OSZC flags.
(install_template): Add scc and oszc_flags.
(build_apx_evex_prefix): Encode SCC and oszc flags bits.
(parse_insn): Handle check_Scc_OszcOperations.
* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Add ivalid test case.
* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto.
* testsuite/gas/i386/x86-64.exp: Add test for ccmp and ctest.
* testsuite/gas/i386/x86-64-apx-ccmp-ctest-intel.d: New test.
* testsuite/gas/i386/x86-64-apx-ccmp-ctest-inval.l: Ditto.
* testsuite/gas/i386/x86-64-apx-ccmp-ctest-inval.s: Ditto.
* testsuite/gas/i386/x86-64-apx-ccmp-ctest.d: Ditto.
* testsuite/gas/i386/x86-64-apx-ccmp-ctest.s: Ditto.
opcodes/ChangeLog:
* i386-dis-evex-reg.h: Add ccmp and ctest.
* i386-dis-evex.h: Ditto.
* i386-dis.c (struct instr_info): add scc.
(struct dis386): Add new micro 'NE','SC' and'DF'.
(get_valid_dis386): Get scc value and move MAP4 invalid check to print_insn.
(putop): Handle %NE, %SC and %DF.
* i386-opc.h (SCC): New.
* i386-opc.tbl: Add ccmp/ctest and evex format for cmp/test.
* i386-mnem.h: Regenerated.
* i386-tbl.h: Ditto.
Extremely rarely used attributes are inefficient when represented by a
separate attribute. Convert it to an operand constraint, as already
suggested during review. The collision with RegKludge is pretty simple
to resolve.
As indicated during review, spelling/readability-wise
setz %eax
is easier than
setzuz %al
_and_ properly specifies the full register that's being modified. Permit
that form to be used, even if the spec writers are unwilling to formally
mention it.
While there also correct the non-ZU EVEX form: That ought to also permit
memory operands.
Except for very few insns mnemonic suffixes aren't permitted in Intel
syntax. Warn about such for now, indicating that they will be outright
refused down the road.
While fiddling with testcases to address fallout, drop a few things
which should never have been tested as valid Intel syntax.
Also add a previously missing line to simd-suffix.d.
Properly reject inappropriate suffixes (No_lSuf / No_qSuf mistakenly
omitted by cf665fee1d ["x86: re-work AVX512 embedded rounding / SAE"]),
to avoid emitting bad or arbitrarily guessed instructions. Interestingly
check_{long,qword}_suffix() don't help here, which perhaps is another
indication that the way they work right now isn't quite appropriate.
Sadly correcting just the templates breaks operand ambiguity detection,
since so far that worked from a single template permitting more than one
suffix. Here we have ambiguity though which can now be noticed only when
taking all (matching) templates together. Therefore we need to determine
further matching templates (see code comments for constraints), to then
accumulate permitted suffixes across all of them.
This patch is to enable ZU for IMUL (opcodes 0x69 and 0x6B) and SETcc.
Since the spec only recommends one form of setzu, I won't be adding
set<cc>reg32/reg64 support in this patch.
gas/ChangeLog:
* config/tc-i386.c (build_apx_evex_prefix): Handle ZU.
* testsuite/gas/i386/x86-64.exp: Added new tests for ZU.
* testsuite/gas/i386/x86-64.exp: Added new tests for ZU.
* testsuite/gas/i386/x86-64-apx-zu-intel.d: New test.
* testsuite/gas/i386/x86-64-apx-zu-inval.l: Ditto.
* testsuite/gas/i386/x86-64-apx-zu-inval.s: Ditto.
* testsuite/gas/i386/x86-64-apx-zu.d: Ditto.
* testsuite/gas/i386/x86-64-apx-zu.s: Ditto.
opcodes/ChangeLog:
* i386-dis-evex-prefix.h: Handle PREFIX_EVEX_MAP4_40 ~
PREFIX_EVEX_MAP4_4F.
* i386-dis-evex.h: Ditto.
* i386-dis.c (struct dis386): Add new micro 'ZU'.
(putop): Handle %ZU.
* i386-gen.c: Added ZU.
* i386-opc.h: Ditto.
* i386-opc.tbl: Added new templates to support ZU.
gas/ChangeLog:
* config/tc-i386.c (build_modrm_byte): Dropped the use of
extension_opcode to encode the vvvv register.
* testsuite/gas/i386/x86-64-sse2avx.d: Added new testcases.
* testsuite/gas/i386/x86-64-sse2avx.s: Diito.
opcodes/ChangeLog:
* i386-opc.tbl: Added DstVVVV to some extension_opcode instructions.
* i386-tbl.h: Regenerated.
gas/ChangeLog:
* config/tc-i386.c (build_modrm_byte): Dropped the use of
SWAP_SOURCES to encode the vvvv register.
opcodes/ChangeLog:
* i386-opc.h (SWAP_SOURCES): Dropped.
(NO_DEFAULT_MASK): Adjusted the value.
(ADDR_PREFIX_OP_REG): Ditto.
(DISTINCT_DEST): Ditto.
(IMPLICIT_STACK_OP): Ditto.
(VexVVVV_SRC2): New.
* i386-opc.tbl: Dropped SwapSources and replaced its VexVVVV
with Src1VVVV.
* i386-tbl.h: Regenerated.
Use vexvvvv as the switch state, and replace VexVVVV with Src1VVVV.
Src1VVVV means using VEX.vvvv encodes the first source register
operand. The old logic did not check vexvvvv first, which made the
logic here very complicated.
gas/ChangeLog:
* config/tc-i386.c (optimize_encoding): Replaced 1 with Src1VVVV.
(build_modrm_byte): Used vexvvvv to encode the vvvv register.
(s_insn): Replaced 1 with Src1VVVV.
opcodes/ChangeLog:
* i386-opc.h (VexVVVV_DST): Adjusted the value.
(Src1VVVV): New.
* i386-opc.tbl: Replaced part VexVVVV with Src1VVVV.
* i386-tbl.h: Regenerated.
Since {vex}/{vex3} are respected on legacy mnemonics when -msse2avx is
in use, {evex} should be respected, too. So far this is the case only
for insns where eGPR-s can come into play. Extend coverage to insns with
only %xmm register and possibly immediate operands.
Legacy encoded SIMD insns are converted to AVX ones in that mode. When
eGPR-s are in use, i.e. with APX, convert to AVX10 insns (where
available; there are quite a few which can't be converted).
Note that LDDQU is represented as VMOVDQU32 (and the prior use of the
sse3 template there needs dropping, to get the order right).
Note further that in a few cases, due to the use of templates, AVX512VL
is used when AVX512F would suffice. Since AVX10 is the main reference,
this shouldn't be too much of a problem.
For the case when NDD and NF are both 0 in evex-promoted format,
we will fully support and test it in another patch.
gas/ChangeLog:
* NEWS: Support Intel APX NF.
* config/tc-i386.c (enum i386_error): Add unsupported_nf.
(struct _i386_insn): Add has_nf.
(is_apx_evex_encoding): Ditto.
(build_apx_evex_prefix): Encode the NF bit.
(md_assemble): Handle unsupported_nf.
(parse_insn): Handle Prefix_NF and report bad for illegal combination.
(can_convert_NDD_to_legacy): Replace i.tm.opcode_modifier.nf with i.has_nf.
(match_template): Support D for APX_F insns and check NF support.
* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Add bad test for NF bit.
* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto.
* testsuite/gas/i386/x86-64-apx-inval.l: Ditto.
* testsuite/gas/i386/x86-64-apx-inval.s: Ditto.
* testsuite/gas/i386/x86-64.exp: Add apx nf tests.
* testsuite/gas/i386/x86-64-apx-nf-intel.d: New test.
* testsuite/gas/i386/x86-64-apx-nf.d: Ditto.
* testsuite/gas/i386/x86-64-apx-nf.s: Ditto.
opcodes/ChangeLog:
* i386-dis-evex.h: Add %NF to the instructions that support APX NF and
add new instruction imul, popcnt, tzcnt and lzcnt to EVEX table.
* i386-dis-evex-reg.h: Ditto.
* i386-dis.c (struct instr_info): Add nf.
(struct dis386): Add "NF" for EVEX.NF.
(get_valid_dis386): Set ins->vex.nf and report bad-nf for illegal case.
(print_insn): Handle ins.vex.nf.
(putop): Handle "%NF".
* i386-opc.h (Prefix_NF): New.
* i386-opc.tbl: Added new entries to support full APX NF instructions.
* i386-mnem.h: Regenerated.
* i386-tbl.h: Regenerated.
Like for WRUSSQ it's not needed here. The legacy insn had gained it in
the course of zapping Rex64, but that attribute wasn't needed here
either. The APX insn then simply gained it by copy-and-paste, I suppose.
With the multitude of new APX templates, it finally becomes desirable to
further remove redundancy by also templatizing basic arithmetic insns.
Continue with the shift-double ones.
While there also drop the APX form with ShiftCount omitted. Other shift
and rotate insns were deliberately left without this form as well. Note
that there's also no testsuite adjustment needed for this, indicating
that the form wasn't tested either.