600 Commits

Author SHA1 Message Date
Jan Beulich
497ee27a74 x86: VP2INTERSECT{D,Q} have mask register destination group
Much like AVX512-{4FMAPS,4VNNIW} have a constraint on their register
source, there's a constraint (need to be even) on the destination
register here.

Adjust "good" test cases accordingly, and add a new test case to check
the warning.
2024-11-18 11:45:50 +01:00
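
Illustration for 497ee27a74 above: a minimal sketch of the "destination group" idea, assuming the instruction writes a pair of adjacent mask registers whose first member is the named register with its lowest bit cleared. The function names are hypothetical and are not taken from gas.

```python
NUM_MASK_REGS = 8  # k0..k7


def dest_pair(k):
    """Return the (even, odd) mask register pair written when k is named.

    Assumption: the low bit of the register number is ignored, so naming
    k5 really addresses the pair (k4, k5).
    """
    if not 0 <= k < NUM_MASK_REGS:
        raise ValueError("no such mask register: k%d" % k)
    base = k & ~1
    return base, base + 1


def wants_warning(k):
    """An odd destination is the case the commit message says gets a warning."""
    return k % 2 != 0
```

With this model, `dest_pair(5)` and `dest_pair(4)` address the same pair, which is why an odd destination is worth flagging.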
Jan Beulich
a3db0f57df x86/APX: support JMPABS also in assembler
Without this APX support isn't really complete.

For Intel syntax the displacement form is needed, so that symbolic
operands won't need prefixing by "offset". (The other form is actually
not used at all in Intel syntax.)

For the record: To restrict displacement form to Intel syntax is not
something I actually agree with.
2024-10-30 12:12:54 +01:00
Jan Beulich
5168ed9912 x86: use <xyz> for VFPCLASSP{S,D}
Just like VFPCLASSPH does. While the order of generated table entries
changes this way, the individual entries don't change.
2024-10-29 08:08:50 +01:00
MayShao-oc
b2841da4f2 x86: Regenerate missing table files
As soon as I committed Zhaoxin's patch, I realized that I did not
include the regen file. Regenerate them and commit as obvious.

opcodes/ChangeLog:

	* i386-tbl.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-init.h: Ditto.
2024-10-18 15:57:22 +08:00
Liwei Xu
3bac89e65f Support Intel AVX10.2 convert instructions
In this patch, we will support AVX10.2 convert instructions. All
of them are new instruction forms.

Among all the instructions, vcvtbiasph2[b,h]f8[,s] needs extra care.
Since operand 2 can indicate the memory size, no suffix is needed in
AT&T mode. However, we cannot fold all three templates, only the
XMM/YMM ones, since the dst operand size is the same for them. Also, a new
iterator <cvt8> is added to reduce redundancy.

gas/
	* testsuite/gas/i386/i386.exp: Add AVX10.2 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/avx10_2-256-cvt-intel.d: New.
	* testsuite/gas/i386/avx10_2-256-cvt.d: Ditto.
	* testsuite/gas/i386/avx10_2-256-cvt.s: Ditto.
	* testsuite/gas/i386/avx10_2-512-cvt-intel.d: Ditto.
	* testsuite/gas/i386/avx10_2-512-cvt.d: Ditto.
	* testsuite/gas/i386/avx10_2-512-cvt.s: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-256-cvt-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-256-cvt.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-256-cvt.s: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-512-cvt-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-512-cvt.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-512-cvt.s: Ditto.

opcodes/
	* i386-dis-evex-prefix.h: Add PREFIX_EVEX_0F3874,
	PREFIX_EVEX_MAP5_18, PREFIX_EVEX_MAP5_1B,
	PREFIX_EVEX_MAP5_1E and PREFIX_EVEX_MAP5_74.
	* i386-dis-evex.h: Add table pass for AVX10.2
	instructions.
	* i386-dis.c (MOD_EVEX_0F38B1): New.
	(PREFIX_EVEX_0F3874): Ditto.
	(PREFIX_EVEX_MAP5_18): Ditto.
	(PREFIX_EVEX_MAP5_1B): Ditto.
	(PREFIX_EVEX_MAP5_1E): Ditto.
	(PREFIX_EVEX_MAP5_74): Ditto.
	* i386-opc.tbl: Add AVX10.2 instructions.
	* i386-mnem.h: Regenerated.
	* i386-tbl.h: Ditto.

Co-authored-by: Kong Lingling <lingling.kong@intel.com>
Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
2024-10-16 10:25:35 +08:00
Haochen Jiang
873e7b6cf6 Support Intel AVX10.2 media instructions
In the disassembler part, for the vnni instructions, we extended the
previous VEX part using %XE in the disassembler to promote them to EVEX by
reusing the original VEX table. For vmpsadbw, we also use %XE. However,
it is hard to reuse the VEX table, so we use new ones.

In the assembler part, we put the vnni table entries with the previous vnni
instructions since they are just promotions from AVX-VNNI-INT{8,16}.
Since we prefer VEX encoding, we need a different table order than
template <vnni>, which prefers EVEX because AVX512_VNNI was introduced
earlier than AVX_VNNI. This means a new <vnni>. For vdpphps
and vmpsadbw, we put them at the end of the table, with future AVX10.2
instructions.

Nit: I will remove the arch requirement for avx_vnni_int{8,16} in
evex-promote testcases after AVX10.2 implies AVX-VNNI-INT{8,16}.

gas/Changelog:

	* testsuite/gas/i386/i386.exp: Add AVX10.2 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/avx10_2-256-1-intel.d: New.
	* testsuite/gas/i386/avx10_2-256-1.d: Ditto.
	* testsuite/gas/i386/avx10_2-256-1.s: Ditto.
	* testsuite/gas/i386/avx10_2-512-1-intel.d: Ditto.
	* testsuite/gas/i386/avx10_2-512-1.d: Ditto.
	* testsuite/gas/i386/avx10_2-512-1.s: Ditto.
	* testsuite/gas/i386/avx10_2-promote.d: Ditto.
	* testsuite/gas/i386/avx10_2-promote.s: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-256-1-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-256-1.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-256-1.s: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-512-1-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-512-1.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-512-1.s: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-promote.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-promote.s: Ditto.

opcodes/Changelog:

	* i386-dis-evex-prefix.h: Adjust PREFIX_EVEX_0F3852.
	Add PREFIX_EVEX_0F3A42_W_0.
	* i386-dis-evex-w.h: Adjust EVEX_W_0F3A42.
	* i386-dis-evex.h: Add table pass for AVX10.2
	instructions.
	* i386-dis.c: Adjust PREFIX_VEX_0F3850_W_0, PREFIX_VEX_0F3851_W_0,
	PREFIX_VEX_0F38D2_W_0 and PREFIX_VEX_0F38D3_W_0.
	* i386-opc.tbl: Add AVX10.2 instructions.
	* i386-mnem.h: Regenerated.
	* i386-tbl.h: Ditto.

Co-authored-by: Lili Cui <lili.cui@intel.com>
2024-10-11 10:38:27 +08:00
Jan Beulich
ca6b6f9d6e x86: optimize {,V}INSERTPS with certain immediates
They are equivalent to simple moves or xors, which are up to 3 bytes
shorter to encode (and maybe/likely also cheaper to execute).
2024-09-27 11:23:12 +02:00
Jan Beulich
f079b0c4b2 x86: optimize {,V}EXTRACT{F,I}{128,32x{4,8},64x{2,4}} with immediate 0
They, too, are equivalent to simple moves, which are up to 3 bytes
shorter to encode (and maybe also cheaper to execute).
2024-09-27 11:22:34 +02:00
Jan Beulich
afd5b33bc7 x86: optimize {,V}EXTRACTPS with immediate 0
They are equivalent to simple moves, which are up to 2 bytes shorter to
encode (and maybe also cheaper to execute).
2024-09-27 11:21:51 +02:00
Jan Beulich
174e5e38b9 x86: templatize SIMD narrowing-move templates
Once again to reduce redundancy.
2024-09-26 12:27:14 +02:00
Jan Beulich
2bb43416f9 x86: templatize SIMD sign-/zero-extension templates
Yet again to reduce redundancy.
2024-09-26 12:27:01 +02:00
Jan Beulich
0c27c22320 x86: templatize SIMD FP binary-logic templates
Once more to reduce redundancy.
2024-09-26 12:26:34 +02:00
Jan Beulich
5d285de425 x86: further templatize FMA templates
Further reduce redundancy, in preparation of the addition of
counterparts for AVX10.2.
2024-09-26 12:26:15 +02:00
Jan Beulich
fc91e3cec5 x86: templatize SIMD FP arithmetic templates
Reduce redundancy, in preparation of the addition of further counterparts
for AVX10.2. Provide the "ne" parameter needed there right away, even if
unused for now.
2024-09-26 12:25:45 +02:00
H.J. Lu
2963d7d80d x86/APX: Don't promote AVX/AVX2 instructions out of APX spec
V{BROADCAST,EXTRACT,INSERT}{F,I}128 and VROUND{P,S}{S,D} aren't promoted
to support EGPR in APX spec.  Don't promote them out of APX spec.  This
commit effectively reverted:

ec3babb8c1 x86/APX: V{BROADCAST,EXTRACT,INSERT}{F,I}128 can also be expressed
5a635f1f59 x86/APX: VROUND{P,S}{S,D} encodings require AVX512{F,VL}
eea4357967 x86/APX: VROUND{P,S}{S,D} can generally be encoded

gas/

	PR gas/32171
	* testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s: Add
	V{BROADCAST,EXTRACT,INSERT}{F,I}128 tests with EGPR.
	* testsuite/gas/i386/x86-64-apx-evex-promoted.s: Remove
	V{BROADCAST,EXTRACT,INSERT}{F,I}128 and VROUND{P,S}{S,D} tests
	with EGPR.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.l: Updated.
	* testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l: Likewise.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d: Likewise.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d: Likewise.
	* testsuite/gas/i386/x86-64-apx-evex-promoted.d: Likewise.

opcodes/

	PR gas/32171
	* i386-opc.tbl: Remove V{BROADCAST,EXTRACT,INSERT}{F,I}128 and
	VROUND{P,S}{S,D} entries with EGPR.
	* i386-tbl.h: Regenerated.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-09-18 10:11:02 +08:00
Jan Beulich
4eb59a5243 x86/APX: use D for 2-operand CFCMOVcc
There's no need to have 30 redundant templates when we can easily take
care of the operand swapping like we do for various other insns.
2024-09-06 08:35:42 +02:00
Jan Beulich
6b8ed67d6e x86/APX: optimize certain reg-only CFCMOVcc forms
Along the lines of 2513312930 ("x86/APX: apply NDD-to-legacy
transformation to further CMOVcc forms") these can similarly be
converted to the shorter legacy-encoded CMOVcc.
2024-09-06 08:35:07 +02:00
Jan Beulich
f12eb19e17 x86: templatize VNNI templates
Reduce redundancy, in preparation of the addition of further counterparts
for AVX10.2.
2024-09-06 08:33:47 +02:00
Haochen Jiang
85e370a3d6 Support ymm rounding control for Intel AVX10.2
In this patch, in order to support ymm rounding for AVX10.2, we derive the
evex attribute for all cases instead of only for rc_none to encode the U bit.
Also changed some bad_opcode returns, since the U bit is shared with APX_F.

gas/ChangeLog:

	* config/tc-i386.c
	(cpu_flags_match): Handle AVX10_2.
	(build_evex_prefix): Handle U bit. Derive evex attribute
	for all cases.
	(check_VecOperands): Handle AVX10.2 and ymm roundings.
	* doc/c-i386.texi: Document .avx10.2.
	* testsuite/gas/i386/i386.exp: Run AVX10.2 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/avx10_2-rounding-intel.d: New test.
	* testsuite/gas/i386/avx10_2-rounding-inval.l: Ditto.
	* testsuite/gas/i386/avx10_2-rounding-inval.s: Ditto.
	* testsuite/gas/i386/avx10_2-rounding.d: Ditto.
	* testsuite/gas/i386/avx10_2-rounding.s: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-rounding-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-rounding.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_2-rounding.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (struct instr_info): Add U bit.
	(get_valid_dis386): Handle U bit.
	* i386-gen.c (isa_dependencies): Add AVX10.2.
	(cpu_flags): Ditto.
	* i386-init.h: Regenerated.
	* i386-opc.h (CpuAVX10_2): New.
	(i386_cpu_flags): Add cpuavx10_2.
	* i386-opc.tbl: Add rounding to old entries which do not
	permit rounding previously. Also eliminate the redundant
	RegXMM for vcvtps2uqq.
	* i386-tbl.h: Regenerated.
2024-09-02 10:53:59 +08:00
Jan Beulich
4eb19fde73 x86: limit RegRex64 use
The special property really only applies to the "extended" byte regs
having legacy word/dword counterparts.

While touching involved code also drop redundant byte checks from a
conditional in establish_rex(): The other remaining RegRex64 uses only
exist on registers which can't be used as register operands anyway.
Hence RegRex64 as an attribute of a (valid) register operand implies
that it's a byte reg.
2024-08-30 11:23:16 +02:00
Jan Beulich
1cd36be7c9 x86/APX: optimize certain {nf}-form insns to BMI2 ones
..., as those leave EFLAGS untouched anyway. That's a shorter encoding,
available as long as no eGPR is in use anywhere.
2024-07-26 07:59:04 +02:00
Cui, Lili
b0dd832fa4 Support APX CFCMOV
The CMOVcc instruction proposed by EVEX has four different forms,
corresponding to the four possible combinations of EVEX.ND and EVEX.NF
values.

In the encoder part, when the CFCMOV template supports EVEX_NF, it means that
it requires EVEX.NF to be 1.

In the decoder part, CFCMOV_Fixup is used to reverse source and destination
operands in the 2-operand case.

gas/ChangeLog:

        * config/tc-i386.c (build_apx_evex_prefix): Set NF bit for cfcmov
        when the insn template supports EVEX_NF.
        * testsuite/gas/i386/x86-64-apx-inval.l: Add invalid tests for cfcmov.
        * testsuite/gas/i386/x86-64-apx-inval.s: Ditto.
        * testsuite/gas/i386/x86-64.exp: Add tests for cfcmov and cmov.
        * testsuite/gas/i386/x86-64-apx-cfcmov-intel.d: Ditto.
        * testsuite/gas/i386/x86-64-apx-cfcmov.d: Ditto.
        * testsuite/gas/i386/x86-64-apx-cfcmov.s: Ditto.

opcodes/ChangeLog:

        * i386-dis-evex-prefix.h: Add cfcmov instructions.
        * i386-dis.c (CFCMOV_Fixup): Special handling of cfcmov.
        (putop): Print 'cf' for cfcmov instructions.
        * i386-opc.h (EVEX_NF): New.
        * i386-opc.tbl: Add cfcmov instructions.
        * i386-mnem.h: Regenerated.
        * i386-tbl.h: Regenerated.
2024-07-04 15:55:00 +08:00
Jan Beulich
2513312930 x86/APX: apply NDD-to-legacy transformation to further CMOVcc forms
With both sources being registers, these insns are almost commutative;
the only extra adjustment needed is inversion of the encoded condition.
2024-06-28 08:24:45 +02:00
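
Illustration for 2513312930 above: a small model of why a register-only three-operand conditional move can be expressed as a two-operand one once the condition is inverted. The function names are hypothetical; the only encoding fact relied on is that x86 condition codes come in pairs differing in the lowest bit (for example E = 0x4, NE = 0x5).

```python
def cmov_ndd(cond, src_if_true, src_if_false):
    """Three-operand form: the new destination gets one of two sources."""
    return src_if_true if cond else src_if_false


def cmov_legacy(cond, src, dest):
    """Two-operand form: the destination is overwritten only if cond holds."""
    return src if cond else dest


def invert_cc(cc):
    """Invert a 4-bit x86 condition code by flipping its lowest bit."""
    return cc ^ 1


def ndd_via_legacy(cond, src_if_true, src_if_false):
    """Case where the destination aliases src_if_true: swap and invert."""
    return cmov_legacy(not cond, src_if_false, src_if_true)
```

When the destination aliases the other source instead, no inversion is needed and the two-operand form applies directly.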
Jan Beulich
7add993917 x86/APX: extend TEST-by-imm7 optimization to CTESTcc
The same properties apply there.
2024-06-28 08:24:12 +02:00
Jan Beulich
82e06fa803 x86/APX: optimize {nf}-form IMUL-by-power-of-2 to SHL
..., for differing only in the resulting EFLAGS, which are left
untouched anyway. That's a shorter encoding, available as long as
certain constraints on operands are met; see code comments. (SHL-by-1
forms may then be subject to further optimization that was introduced
earlier.)

Note that kind of as a side effect this also converts multiplication by
1 to shift by 0, which is a plain move or even no-op anyway. That could
be further shrunk (as could be presence of shifts/rotates by 0 in the
original code as well as a fair set of other {nf}-form insns), yet the
expectation (for now) is that people won't write such code in the first
place.
2024-06-28 08:22:39 +02:00
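
Illustration for 82e06fa803 above: a sketch of the arithmetic identity behind the transformation, with registers modelled as unsigned integers of a given width. The helper names are hypothetical; the operand constraints mentioned in the commit message are not modelled.

```python
def shift_count_for(imm):
    """Return log2(imm) if imm is a positive power of two, else None."""
    if imm > 0 and imm & (imm - 1) == 0:
        return imm.bit_length() - 1
    return None


def imul_trunc(x, imm, bits):
    """Low `bits` bits of x * imm, which is all IMUL keeps in the register."""
    return (x * imm) & ((1 << bits) - 1)


def shl(x, count, bits):
    """Left shift truncated to the register width."""
    return (x << count) & ((1 << bits) - 1)
```

Multiplication by 1 maps to a shift count of 0, matching the side effect noted in the commit message.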
Jan Beulich
27ef4876f7 x86/APX: optimize certain {nf}-form insns to LEA
..., as that leaves EFLAGS untouched anyway. That's a shorter encoding,
available as long as certain constraints on operand size and registers
are met; see code comments.

Note that this requires deferring to derive encoding_evex from {nf}
presence, as in optimize_encoding() we want to avoid touching the insns
when {evex} was also used.

Note further that this requires want_disp32() to now also consider the
opcode: We don't want to replace i.tm.mnem_off, for diagnostics to still
report the original mnemonic (or else things can get confusing). While
there, correct adjacent mis-indentation.
2024-06-28 08:19:59 +02:00
Jan Beulich
c7eae03eab x86/APX: optimize {nf}-form rotate-by-width-less-1
Unlike for the legacy forms, where there's a difference in the resulting
EFLAGS.CF, for the NF variants the immediate can be got rid of in that
case by switching to a 1-bit rotate in the opposite direction.
2024-06-28 08:19:32 +02:00
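
Illustration for c7eae03eab above: rotating by (width - 1) in one direction yields the same register value as rotating by 1 in the other direction. The sketch below models only the result value, which is what matters for the NF variants since they leave EFLAGS alone; the function names are hypothetical.

```python
def rol(x, n, bits):
    """Rotate the `bits`-wide value x left by n."""
    n %= bits
    mask = (1 << bits) - 1
    return ((x << n) | (x >> (bits - n))) & mask


def ror(x, n, bits):
    """Rotate the `bits`-wide value x right by n."""
    n %= bits
    mask = (1 << bits) - 1
    return ((x >> n) | (x << (bits - n))) & mask
```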
Jan Beulich
0868b8999b x86/APX: optimize {nf} forms of ADD/SUB with specific immediates
Unlike for the legacy forms, where there's a difference in the resulting
EFLAGS, for the NF variants we can safely replace ones using 0x80 by the
respectively other insn while negating the immediate, saving 3 immediate
bytes (just 1 though for 16-bit operand size). Similarly we can replace
ones using 1 / -1 by INC/DEC (eliminating the immediate).
2024-06-28 08:18:40 +02:00
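
Illustration for 0868b8999b above: a sketch of the immediate rewriting and of where the size saving comes from, assuming the usual x86 rule that ADD/SUB can take a sign-extended 8-bit immediate when the value fits in -128..127. The helper names are hypothetical.

```python
def imm_bytes(imm, opsize_bits):
    """Immediate bytes ADD/SUB need for this value and operand size."""
    if opsize_bits == 8 or -128 <= imm <= 127:
        return 1
    # 64-bit operands still use a sign-extended 32-bit immediate.
    return min(opsize_bits, 32) // 8


def rewrite_nf(op, imm):
    """Pick a shorter equivalent for {nf} ADD/SUB where one exists."""
    if imm == 0x80:
        # +128 does not fit in imm8, but -128 does: use the other insn.
        return ("sub" if op == "add" else "add"), -0x80
    if imm == 1:
        return ("inc" if op == "add" else "dec"), None
    if imm == -1:
        return ("dec" if op == "add" else "inc"), None
    return op, imm
```

For example, `rewrite_nf("add", 0x80)` gives `("sub", -0x80)`, and `imm_bytes` drops from 4 to 1 at 32-bit operand size but only from 2 to 1 at 16-bit.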
Jan Beulich
f4a966a91d x86: optimize {,V}PEXTR{D,Q} with immediate of 0
Such are equivalent to simple moves, which are up to 3 bytes shorter to
encode (and perhaps also cheaper to execute).
2024-06-21 14:40:44 +02:00
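
Illustration for f4a966a91d above: a value-level model of why extracting element 0 is the same as a plain move of the low element. The 128-bit register is modelled as a Python integer and the function names are hypothetical.

```python
def pextrd(xmm, imm):
    """Extract the 32-bit element selected by the low two bits of imm."""
    return (xmm >> (32 * (imm & 3))) & 0xFFFFFFFF


def pextrq(xmm, imm):
    """Extract the 64-bit element selected by the low bit of imm."""
    return (xmm >> (64 * (imm & 1))) & 0xFFFFFFFFFFFFFFFF


def movd(xmm):
    """Move the low 32 bits out of the vector register."""
    return xmm & 0xFFFFFFFF


def movq(xmm):
    """Move the low 64 bits out of the vector register."""
    return xmm & 0xFFFFFFFFFFFFFFFF
```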
Jan Beulich
fa2c4239f1 x86: optimize left-shift-by-1
These can be replaced by adds when acting on a register operand.

While for the scalar forms there's no gain in encoding size, ADD
generally has higher throughput than SHL. EFLAGS set by ADD are a
superset of those set by SHL (AF in particular is undefined there).

For the SIMD cases the transformation also reduced code size, by
eliminating the 1-byte immediate from the resulting encoding. Note
that this transformation is not applied by gcc13 (according to my
observations), so would - as of now - even improve compiler generated
code.
2024-06-21 14:39:52 +02:00
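
Illustration for fa2c4239f1 above: shifting a register left by one and adding it to itself give the same result and the same carry-out. This sketch models only the result and CF; the other flags discussed in the commit message are not modelled, and the function names are hypothetical.

```python
def shl1(x, bits):
    """Return (result, carry) of a left shift by 1 on a `bits`-wide value."""
    mask = (1 << bits) - 1
    return (x << 1) & mask, (x >> (bits - 1)) & 1


def add_self(x, bits):
    """Return (result, carry) of adding a `bits`-wide value to itself."""
    mask = (1 << bits) - 1
    total = x + x
    return total & mask, (total >> bits) & 1
```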
Cui, Lili
5445d7819b x86: Remove the secondary encoding for ctest.
There are two encodings for each opcode F6/F7 in ctest, but the second one
is never used, so remove it to reduce the size of opcode_tbl.h.

opcodes/ChangeLog:

        * i386-opc.tbl: Removed the secondary insn template for ctest.
        * i386-tbl.h: Regenerated.
2024-06-19 16:23:26 +08:00
Cui, Lili
d8ba1c4037 Support APX CCMP and CTEST
CCMP and CTEST are two new sets of instructions for conditional CMP
and TEST. SCC and OSZC flags are given as suffixes of CCMP or CTEST
in the instruction mnemonic, e.g.:

ccmp<cc> { dfv=sf , cf , of } %eax, %ecx

also add

{evex} cmp/test %eax, %ecx

as an alias for ccmpt.

For the encoder part, add function check_Scc_OszcOperations to parse
'{ dfv=of , sf, sf, cf}', store scc in the lower 4 bits of base_opcode,
and adjust base_opcode to its normal meaning in install_template.

For the decoder part, add 'SC' and 'DF' macros to add scc and oszc flags
suffixes.

gas/ChangeLog:

        * config/tc-i386.c (OSZC_CF): New.
        (OSZC_ZF): Ditto.
        (OSZC_SF): Ditto.
        (OSZC_OF): Ditto.
        (set_oszc_flags): Set oszc flags and report error for using the same oszc flags twice.
        (check_Scc_OszcOperations): Handle SCC OSZC flags.
        (install_template): Add scc and oszc_flags.
        (build_apx_evex_prefix): Encode SCC and oszc flags bits.
        (parse_insn): Handle check_Scc_OszcOperations.
        * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Add invalid test case.
        * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto.
        * testsuite/gas/i386/x86-64.exp: Add test for ccmp and ctest.
        * testsuite/gas/i386/x86-64-apx-ccmp-ctest-intel.d: New test.
        * testsuite/gas/i386/x86-64-apx-ccmp-ctest-inval.l: Ditto.
        * testsuite/gas/i386/x86-64-apx-ccmp-ctest-inval.s: Ditto.
        * testsuite/gas/i386/x86-64-apx-ccmp-ctest.d: Ditto.
        * testsuite/gas/i386/x86-64-apx-ccmp-ctest.s: Ditto.

opcodes/ChangeLog:

        * i386-dis-evex-reg.h: Add ccmp and ctest.
        * i386-dis-evex.h: Ditto.
        * i386-dis.c (struct instr_info): Add scc.
        (struct dis386): Add new macros 'NE', 'SC' and 'DF'.
        (get_valid_dis386): Get scc value and move MAP4 invalid check to print_insn.
        (putop): Handle %NE, %SC and %DF.
        * i386-opc.h (SCC): New.
        * i386-opc.tbl: Add ccmp/ctest and evex format for cmp/test.
        * i386-mnem.h: Regenerated.
        * i386-tbl.h: Ditto.
2024-06-18 10:52:40 +08:00
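
Illustration for d8ba1c4037 above: a minimal sketch of parsing a `{dfv=...}` operand into a 4-bit flag mask and rejecting a flag that is named twice, which is the error the ChangeLog entry for set_oszc_flags describes. This is not the gas implementation; the bit values, the acceptance of an empty list, and the function name `parse_dfv` are assumptions made for the example.

```python
# Assumed bit assignment for the example: CF lowest, OF highest.
OSZC_BITS = {"cf": 1, "zf": 2, "sf": 4, "of": 8}


def parse_dfv(text):
    """Parse '{ dfv=sf , cf , of }' into a mask; reject duplicate flags."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("expected '{dfv=...}'")
    inner = text[1:-1].strip()
    if not inner.startswith("dfv="):
        raise ValueError("expected 'dfv='")
    mask = 0
    for name in inner[len("dfv="):].split(","):
        name = name.strip().lower()
        if not name:
            continue  # assumption: an empty flag list is accepted
        if name not in OSZC_BITS:
            raise ValueError("unknown flag: " + name)
        if mask & OSZC_BITS[name]:
            raise ValueError("flag used twice: " + name)
        mask |= OSZC_BITS[name]
    return mask
```

Under these assumptions `parse_dfv("{ dfv=sf , cf , of }")` yields a mask with the SF, CF and OF bits set, while a list naming `sf` twice raises `ValueError`.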
Jan Beulich
d1c2dd6f4d x86/APX: convert ZU to operand constraint
Extremely rarely used attributes are inefficient when represented by a
separate attribute. Convert it to an operand constraint, as already
suggested during review. The collision with RegKludge is pretty simple
to resolve.
2024-06-10 10:46:21 +02:00
Jan Beulich
f3f71a5ca0 x86/APX: support extended SETcc form
As indicated during review, spelling/readability-wise

	setz	%eax

is easier than

	setzuz	%al

_and_ properly specifies the full register that's being modified. Permit
that form to be used, even if the spec writers are unwilling to formally
mention it.

While there also correct the non-ZU EVEX form: That ought to also permit
memory operands.
2024-06-10 10:45:16 +02:00
Jan Beulich
d967140f8c x86/APX: add missing CPU requirement to imm+rm forms of <alu2> insns
This was overlooked when the form was added by dd74a60337 ("Support
APX NF").
2024-06-10 09:05:23 +02:00
Jan Beulich
b83021de7a x86/Intel: warn about undue mnemonic suffixes
Except for very few insns mnemonic suffixes aren't permitted in Intel
syntax. Warn about such for now, indicating that they will be outright
refused down the road.

While fiddling with testcases to address fallout, drop a few things
which should never have been tested as valid Intel syntax.

Also add a previously missing line to simd-suffix.d.
2024-05-29 10:03:00 +02:00
Jan Beulich
acd86c81f0 x86: correct VCVT{,U}SI2SD
Properly reject inappropriate suffixes (No_lSuf / No_qSuf mistakenly
omitted by cf665fee1d ["x86: re-work AVX512 embedded rounding / SAE"]),
to avoid emitting bad or arbitrarily guessed instructions. Interestingly
check_{long,qword}_suffix() don't help here, which perhaps is another
indication that the way they work right now isn't quite appropriate.

Sadly correcting just the templates breaks operand ambiguity detection,
since so far that worked from a single template permitting more than one
suffix. Here we have ambiguity though which can now be noticed only when
taking all (matching) templates together. Therefore we need to determine
further matching templates (see code comments for constraints), to then
accumulate permitted suffixes across all of them.
2024-05-24 11:50:38 +02:00
Cui, Lili
bbe8d019ed Support APX zero-upper
This patch is to enable ZU for IMUL (opcodes 0x69 and 0x6B) and SETcc.
Since the spec only recommends one form of setzu, I won't be adding
set<cc>reg32/reg64 support in this patch.

gas/ChangeLog:

        * config/tc-i386.c (build_apx_evex_prefix): Handle ZU.
        * testsuite/gas/i386/x86-64.exp: Added new tests for ZU.
        * testsuite/gas/i386/x86-64-apx-zu-intel.d: New test.
        * testsuite/gas/i386/x86-64-apx-zu-inval.l: Ditto.
        * testsuite/gas/i386/x86-64-apx-zu-inval.s: Ditto.
        * testsuite/gas/i386/x86-64-apx-zu.d: Ditto.
        * testsuite/gas/i386/x86-64-apx-zu.s: Ditto.

opcodes/ChangeLog:

        * i386-dis-evex-prefix.h: Handle PREFIX_EVEX_MAP4_40 ~
        PREFIX_EVEX_MAP4_4F.
        * i386-dis-evex.h: Ditto.
        * i386-dis.c (struct dis386): Add new macro 'ZU'.
        (putop): Handle %ZU.
        * i386-gen.c: Added ZU.
        * i386-opc.h: Ditto.
        * i386-opc.tbl: Added new templates to support ZU.
2024-05-22 16:15:47 +08:00
Cui, Lili
c8866e3ec5 x86: Drop using extension_opcode to encode vvvv register
gas/ChangeLog:

        * config/tc-i386.c (build_modrm_byte): Dropped the use of
        extension_opcode to encode the vvvv register.
        * testsuite/gas/i386/x86-64-sse2avx.d: Added new testcases.
        * testsuite/gas/i386/x86-64-sse2avx.s: Ditto.

opcodes/ChangeLog:

        * i386-opc.tbl: Added DstVVVV to some extension_opcode instructions.
	* i386-tbl.h: Regenerated.
2024-05-06 18:33:45 +08:00
Cui, Lili
0820c9f5fc x86: Drop SwapSources
gas/ChangeLog:

        * config/tc-i386.c (build_modrm_byte): Dropped the use of
	SWAP_SOURCES to encode the vvvv register.

opcodes/ChangeLog:

        * i386-opc.h (SWAP_SOURCES): Dropped.
        (NO_DEFAULT_MASK): Adjusted the value.
        (ADDR_PREFIX_OP_REG): Ditto.
        (DISTINCT_DEST): Ditto.
        (IMPLICIT_STACK_OP): Ditto.
        (VexVVVV_SRC2): New.
        * i386-opc.tbl: Dropped SwapSources and replaced its VexVVVV
	with Src1VVVV.
	* i386-tbl.h: Regenerated.
2024-05-06 18:21:28 +08:00
Cui, Lili
f2a3a8814d x86: Use vexvvvv as the switch state to encode the vvvv register
Use vexvvvv as the switch state, and replace VexVVVV with Src1VVVV.
Src1VVVV means using VEX.vvvv encodes the first source register
operand. The old logic did not check vexvvvv first, which made the
logic here very complicated.

gas/ChangeLog:

        * config/tc-i386.c (optimize_encoding): Replaced 1 with Src1VVVV.
        (build_modrm_byte): Used vexvvvv to encode the vvvv register.
        (s_insn): Replaced 1 with Src1VVVV.

opcodes/ChangeLog:

        * i386-opc.h (VexVVVV_DST): Adjusted the value.
        (Src1VVVV): New.
        * i386-opc.tbl: Replaced part VexVVVV with Src1VVVV.
	* i386-tbl.h: Regenerated.
2024-05-06 18:16:42 +08:00
Jan Beulich
1d026d6b19 x86/APX: further extend SSE2AVX coverage
Since {vex}/{vex3} are respected on legacy mnemonics when -msse2avx is
in use, {evex} should be respected, too. So far this is the case only
for insns where eGPR-s can come into play. Extend coverage to insns with
only %xmm register and possibly immediate operands.
2024-05-03 09:27:00 +02:00
Jan Beulich
24187fb9c0 x86/APX: extend SSE2AVX coverage
Legacy encoded SIMD insns are converted to AVX ones in that mode. When
eGPR-s are in use, i.e. with APX, convert to AVX10 insns (where
available; there are quite a few which can't be converted).

Note that LDDQU is represented as VMOVDQU32 (and the prior use of the
sse3 template there needs dropping, to get the order right).

Note further that in a few cases, due to the use of templates, AVX512VL
is used when AVX512F would suffice. Since AVX10 is the main reference,
this shouldn't be too much of a problem.
2024-05-03 09:26:25 +02:00
Cui, Lili
dd74a60337 Support APX NF
For the case when NDD and NF are both 0 in evex-promoted format,
we will fully support and test it in another patch.

gas/ChangeLog:

       * NEWS: Support Intel APX NF.
       * config/tc-i386.c (enum i386_error): Add unsupported_nf.
       (struct _i386_insn): Add has_nf.
       (is_apx_evex_encoding): Ditto.
       (build_apx_evex_prefix): Encode the NF bit.
       (md_assemble): Handle unsupported_nf.
       (parse_insn): Handle Prefix_NF and report bad for illegal combination.
       (can_convert_NDD_to_legacy): Replace i.tm.opcode_modifier.nf with i.has_nf.
       (match_template): Support D for APX_F insns and check NF support.
       * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Add bad test for NF bit.
       * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto.
       * testsuite/gas/i386/x86-64-apx-inval.l: Ditto.
       * testsuite/gas/i386/x86-64-apx-inval.s: Ditto.
       * testsuite/gas/i386/x86-64.exp: Add apx nf tests.
       * testsuite/gas/i386/x86-64-apx-nf-intel.d: New test.
       * testsuite/gas/i386/x86-64-apx-nf.d: Ditto.
       * testsuite/gas/i386/x86-64-apx-nf.s: Ditto.

opcodes/ChangeLog:

       * i386-dis-evex.h: Add %NF to the instructions that support APX NF and
       add new instruction imul, popcnt, tzcnt and lzcnt to EVEX table.
       * i386-dis-evex-reg.h: Ditto.
       * i386-dis.c (struct instr_info): Add nf.
       (struct dis386): Add "NF" for EVEX.NF.
       (get_valid_dis386): Set ins->vex.nf and report bad-nf for illegal case.
       (print_insn): Handle ins.vex.nf.
       (putop): Handle "%NF".
       * i386-opc.h (Prefix_NF): New.
       * i386-opc.tbl: Added new entries to support full APX NF instructions.
       * i386-mnem.h: Regenerated.
       * i386-tbl.h: Regenerated.
2024-04-07 17:28:25 +08:00
H.J. Lu
cca46dea4d Revert "x86: Restore APX shift-double instructions with omitted shift count"
This reverts commit c2d698fe03.

GCC 14 has been changed to use explicit shift count in shift-double
instructions by the commit:

06a7e7514af x86: Use explicit shift count in double-precision shifts

gas/

	PR gas/31606
	* testsuite/gas/i386/x86-64-apx-ndd-wig.d: Updated.
	* testsuite/gas/i386/x86-64-apx-ndd.d: Likewise.
	* testsuite/gas/i386/x86-64-apx-ndd.s: Remove tests for APX
	shift-double instructions with omitted shift count.

opcodes/

	PR gas/31606
	* i386-opc.tbl: Remove APX shift-double instructions with
	omitted shift count.
	* i386-tbl.h: Regenerated.
2024-04-06 05:07:18 -07:00
H.J. Lu
c2d698fe03 x86: Restore APX shift-double instructions with omitted shift count
Restore APX shift-double instructions with omitted shift count since
they are generated by GCC as shown in:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114590

gas/

	PR gas/31606
	* testsuite/gas/i386/x86-64-apx-ndd-wig.d: Updated.
	* testsuite/gas/i386/x86-64-apx-ndd.d: Likewise.
	* testsuite/gas/i386/x86-64-apx-ndd.s: Add tests for APX
	shift-double instructions with omitted shift count.

opcodes/

	PR gas/31606
	* i386-opc.tbl: Restore APX shift-double instructions with
	omitted shift count.
	* i386-tbl.h: Regenerated.
2024-04-04 13:16:20 -07:00
Jan Beulich
ef9a6314d8 x86: add missing No_qSuf to non-64-bit PTWRITE
While largely benign, it still should have been put there when the
original single template was split (commit a04973848d).
2024-04-03 10:41:30 +02:00
Jan Beulich
0006623c18 x86: drop stray Size64 from WRSSQ
Like for WRUSSQ it's not needed here. The legacy insn had gained it in
the course of zapping Rex64, but that attribute wasn't needed here
either. The APX insn then simply gained it by copy-and-paste, I suppose.
2024-04-03 10:40:57 +02:00
Cui, Lili
8963a60d7b x86/APX: Remove KEYLOCKER and SHA promotions from EVEX MAP4
APX spec removed KEYLOCKER and SHA promotions from EVEX MAP4.
https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

gas/ChangeLog:

        * NEWS: Mention the removal of KEYLOCKER and SHA promotions from EVEX
        MAP4.
        * config/tc-i386.c (process_operands): Removed special handling of
        KEYLOCKER and SHA.
        * testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l: Removed KEYLOCKER
        and SHA instructions.
        * testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s: Ditto.
        * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Ditto.
        * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto.
        * testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d: Ditto.
        * testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d: Ditto.
        * testsuite/gas/i386/x86-64-apx-evex-promoted.d: Ditto.
        * testsuite/gas/i386/x86-64-apx-evex-promoted.s: Ditto.

opcodes/ChangeLog:

        * i386-dis-evex-prefix.h: Removed KEYLOCKER and SHA instructions.
        * i386-dis-evex.h: Ditto.
        * i386-opc.tbl: Ditto.
        * i386-dis.c (print_vector_reg): Removed special handling of KEYLOCKER
        and SHA.
2024-04-03 09:50:00 +08:00
Jan Beulich
ffa2571063 x86: templatize shift-double insns
With the multitude of new APX templates, it finally becomes desirable to
further remove redundancy by also templatizing basic arithmetic insns.
Continue with the shift-double ones.

While there also drop the APX form with ShiftCount omitted. Other shift
and rotate insns were deliberately left without this form as well. Note
that there's also no testsuite adjustment needed for this, indicating
that the form wasn't tested either.
2024-03-28 11:49:48 +01:00