First of all, rename the by now misleading Opcode_SIMD_FloatD, as it
has also been used for KMOV* and BNDMOV. Then simplify the condition
selecting which form of "reversing" to use - except for the MOV to/from
control/debug/test registers, all extended opcode space insns use bit 0
(rather than bit 1) to indicate the direction (from/to memory) of an
operation. With that, D can simply be set on the first of the two
templates, while the other can be dropped.
This saves quite a number of shift instructions: the "operands" field
can now be retrieved by just masking (no shift), and extracting the
"extension_opcode" field now only requires a (signed) right shift, with
no preceding left shift. (Of course there may be architectures where,
in a cross build, there is no difference at all, e.g. when suitable
bitfield extraction insns exist.)
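
To illustrate (purely a sketch - widths and positions here are made up,
not the actual i386-opc.h layout): a field starting at bit 0 is
extracted with a mere mask, and a signed field ending at the top bit
with a single arithmetic right shift:

  struct layout_sketch
  {
    unsigned int operands:3;        /* at bit 0: a mask suffices       */
    unsigned int other:25;
    int extension_opcode:4;         /* at the top: one arithmetic      */
  };                                /* right shift suffices            */

  static unsigned int
  get_operands (struct layout_sketch t)
  {
    return t.operands;              /* compiles to "raw & 7"           */
  }

  static int
  get_extension_opcode (struct layout_sketch t)
  {
    return t.extension_opcode;      /* compiles to "(int) raw >> 28"   */
  }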
Folding templates this way once again allows reducing redundancy in
(and the size of) the opcode table.
Don't go as far as also making D work on the two 5-operand XOP insns:
This would significantly complicate the code, as there the first
(immediate) operand would need special treatment in several places.
Note that the .s suffix isn't enabled to have any effect here, since it
is deprecated. Neither the {load} nor the {store} pseudo prefix makes
sense here either, as the respective operands are inputs (loads) only
anyway, regardless of order. Hence there is (as before) no way for the
programmer to request the alternative encoding to be used for register-
only insns.
Note further that it is always the first original template which is
retained (and altered), to make sure the same encoding as before is
used for register-only insns. This has the slightly odd (but pre-
existing) effect of XOP register-only insns having XOP.W clear, but FMA4
ones having VEX.W set.
The only case where 64-bit code uses non-sign-extended (can also be
considered zero-extended) displacements is when an address size override
is in place for a memory operand (i.e. particularly excluding
displacements of direct branches, which - if at all - are controlled by
operand size, and then are still sign-extended, just from 16 bits).
Hence the distinction in templates is unnecessary, allowing code to be
simplified in a number of places. The only place where logic becomes
more complicated is when the signedness of relocations is determined in
output_disp().
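
To make the rule concrete, here is a minimal sketch (not gas code) of
how a recorded 32-bit displacement gets widened in 64-bit code:

  #include <stdint.h>
  #include <stdbool.h>

  static int64_t
  widen_disp32 (uint32_t disp, bool addr_size_override)
  {
    /* Only with a 67h address size override on a memory operand is the
       displacement zero-extended; everywhere else it is sign-extended.  */
    return addr_size_override
           ? (int64_t) disp                  /* zero-extend  */
           : (int64_t) (int32_t) disp;       /* sign-extend  */
  }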
The other caveat is that Disp64 can no longer be specified in an insn
template at the same time as Disp32. Unlike for non-64-bit mode,
templates don't specify displacements for both possible addressing
modes; the necessary adjustment to the expected ones has already been
done in match_template() anyway (but of course the logic there needs
tweaking now). Hence the single template which so far specified both is
split.
Setting this field risks cpu_flags_all_zero() mistakenly returning
"false" when the object passed in was e.g. the result of ANDing together
two objects which had the bit set, or ANDNing together an object with
the field set and one with the field clear.
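
A minimal sketch of the failure mode (flag and helper names made up for
illustration):

  #include <stdbool.h>

  struct flags_sketch
  {
    unsigned int meta:1;      /* the field in question - not a feature  */
    unsigned int feat_a:1;
    unsigned int feat_b:1;
  };

  static bool
  all_zero (struct flags_sketch f)
  {
    /* The real check looks at the whole object, so "meta" participates.  */
    return !f.meta && !f.feat_a && !f.feat_b;
  }

  /* ANDing two objects which both had "meta" set leaves "meta" set even
     though no feature bit survives - all_zero() then wrongly says false.  */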
While there, also avoid setting CpuNo64: like Cpu64 this is driven
differently anyway and hence shouldn't be set anywhere by default.
Note that the moving of the two items in i386-gen.c's cpu_flags[] is
only for documentation purposes (and a slight reduction of overhead),
as the fields are sorted anyway upon program start.
To avoid issues like that addressed by 6e3e5c9e41 ("x86: extend SSE
check to PCLMULQDQ, AES, and GFNI insns"), base the check on opcode
attributes and operand types.
The result of running etc/update-copyright.py --this-year, fixing all
the files whose mode is changed by the script, plus a build with
--enable-maintainer-mode --enable-cgen-maint=yes, then checking
out */po/*.pot which we don't update frequently.
The copy of cgen used had commit d1dd5fcc38ead reverted, as that commit
breaks building of the bpf opcodes files.
Intel AVX512 FP16 instructions use maps 3, 5 and 6. Maps 5 and 6 use 3 bits
in the EVEX.mmm field (0b101, 0b110). Map 5 is for instructions that were FP32
in map 1 (0Fxx). Map 6 is for instructions that were FP32 in map 2 (0F38xx).
There are some exceptions to this rule. Some things in map 1 (0Fxx) with imm8
operands predated our current conventions; those instructions moved to map 3.
FP32 things in map 3 (0F3Axx) found new opcodes in map 3 for FP16 because map 3
is very sparsely populated. Most of the FP16 instructions share opcodes and
prefix (EVEX.pp) bits with the related FP32 operations.
Intel AVX512 FP16 instructions have new displacement scaling rules; please refer
to the public Software Developer's Manual for detailed information.
gas/
2021-08-05 Igor Tsimbalist <igor.v.tsimbalist@intel.com>
H.J. Lu <hongjiu.lu@intel.com>
Wei Xiao <wei3.xiao@intel.com>
Lili Cui <lili.cui@intel.com>
* config/tc-i386.c (struct Broadcast_Operation): Adjust comment.
(cpu_arch): Add .avx512_fp16.
(cpu_noarch): Add noavx512_fp16.
(pte): Add evexmap5 and evexmap6.
(build_evex_prefix): Handle EVEXMAP5 and EVEXMAP6.
(check_VecOperations): Handle {1to32}.
(check_VecOperands): Handle CheckRegNumb.
(check_word_reg): Handle Toqword.
(i386_error): Add invalid_dest_and_src_register_set.
(match_template): Handle invalid_dest_and_src_register_set.
* doc/c-i386.texi: Document avx512_fp16, noavx512_fp16.
opcodes/
2021-08-05 Igor Tsimbalist <igor.v.tsimbalist@intel.com>
H.J. Lu <hongjiu.lu@intel.com>
Wei Xiao <wei3.xiao@intel.com>
Lili Cui <lili.cui@intel.com>
* i386-dis.c (EXwScalarS): New.
(EXxh): Ditto.
(EXxhc): Ditto.
(EXxmmqh): Ditto.
(EXxmmqdh): Ditto.
(EXEvexXwb): Ditto.
(DistinctDest_Fixup): Ditto.
(enum): Add xh_mode, evex_half_bcst_xmmqh_mode, evex_half_bcst_xmmqdh_mode
and w_swap_mode.
(enum): Add PREFIX_EVEX_0F3A08_W_0, PREFIX_EVEX_0F3A0A_W_0,
PREFIX_EVEX_0F3A26, PREFIX_EVEX_0F3A27, PREFIX_EVEX_0F3A56,
PREFIX_EVEX_0F3A57, PREFIX_EVEX_0F3A66, PREFIX_EVEX_0F3A67,
PREFIX_EVEX_0F3AC2, PREFIX_EVEX_MAP5_10, PREFIX_EVEX_MAP5_11,
PREFIX_EVEX_MAP5_1D, PREFIX_EVEX_MAP5_2A, PREFIX_EVEX_MAP5_2C,
PREFIX_EVEX_MAP5_2D, PREFIX_EVEX_MAP5_2E, PREFIX_EVEX_MAP5_2F,
PREFIX_EVEX_MAP5_51, PREFIX_EVEX_MAP5_58, PREFIX_EVEX_MAP5_59,
PREFIX_EVEX_MAP5_5A_W_0, PREFIX_EVEX_MAP5_5A_W_1,
PREFIX_EVEX_MAP5_5B_W_0, PREFIX_EVEX_MAP5_5B_W_1,
PREFIX_EVEX_MAP5_5C, PREFIX_EVEX_MAP5_5D, PREFIX_EVEX_MAP5_5E,
PREFIX_EVEX_MAP5_5F, PREFIX_EVEX_MAP5_78, PREFIX_EVEX_MAP5_79,
PREFIX_EVEX_MAP5_7A, PREFIX_EVEX_MAP5_7B, PREFIX_EVEX_MAP5_7C,
PREFIX_EVEX_MAP5_7D_W_0, PREFIX_EVEX_MAP6_13, PREFIX_EVEX_MAP6_56,
PREFIX_EVEX_MAP6_57, PREFIX_EVEX_MAP6_D6, PREFIX_EVEX_MAP6_D7.
(enum): Add EVEX_MAP5 and EVEX_MAP6.
(enum): Add EVEX_W_MAP5_5A, EVEX_W_MAP5_5B,
EVEX_W_MAP5_78_P_0, EVEX_W_MAP5_78_P_2, EVEX_W_MAP5_79_P_0,
EVEX_W_MAP5_79_P_2, EVEX_W_MAP5_7A_P_2, EVEX_W_MAP5_7A_P_3,
EVEX_W_MAP5_7B_P_2, EVEX_W_MAP5_7C_P_0, EVEX_W_MAP5_7C_P_2,
EVEX_W_MAP5_7D, EVEX_W_MAP6_13_P_0, EVEX_W_MAP6_13_P_2,
(get_valid_dis386): Properly handle new instructions.
(intel_operand_size): Handle new modes.
(OP_E_memory): Ditto.
(OP_EX): Ditto.
* i386-dis-evex.h: Updated for AVX512_FP16.
* i386-dis-evex-mod.h: Updated for AVX512_FP16.
* i386-dis-evex-prefix.h: Updated for AVX512_FP16.
* i386-dis-evex-reg.h: Updated for AVX512_FP16.
* i386-dis-evex-w.h: Updated for AVX512_FP16.
* i386-gen.c (cpu_flag_init): Add CPU_AVX512_FP16_FLAGS,
and CPU_ANY_AVX512_FP16_FLAGS. Update CPU_ANY_AVX512F_FLAGS
and CPU_ANY_AVX512BW_FLAGS.
(cpu_flags): Add CpuAVX512_FP16.
(opcode_modifiers): Add DistinctDest.
* i386-opc.h (enum): Add AVX512_FP16.
(i386_opcode_modifier): Add reqdistinctreg.
(i386_cpu_flags): Add cpuavx512_fp16.
(EVEXMAP5): Defined as a macro.
(EVEXMAP6): Ditto.
* i386-opc.tbl: Add Intel AVX512_FP16 instructions.
* i386-init.h: Regenerated.
* i386-tbl.h: Ditto.
The former two are unused anyway. And having such constants isn't very
helpful either, when they live in a place where updating the register
table wouldn't even allow noticing the need to adjust these constants.
st(1) ... st(7) will never be looked up in the hash table, so there's no
point inserting the entries. It's also not really necessary to do a 2nd
hash lookup after parsing the register number, nor is there a real
reason for having both st and st(0) entries. Plus we can easily do away
with the need for st to be first in the table.
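
A sketch of the idea (simplified; the real parsing sits in
parse_real_register()): once "st" has been found, the "(N)" part can be
parsed directly rather than looked up a second time:

  #include <ctype.h>

  /* Return the st() register number, advancing *pp past "(N)" if
     present; plain "st" means st(0).  Simplified: no error handling.  */
  static int
  parse_st_suffix (const char **pp)
  {
    const char *p = *pp;

    if (p[0] == '(' && isdigit ((unsigned char) p[1])
        && p[1] <= '7' && p[2] == ')')
      {
        *pp = p + 3;
        return p[1] - '0';
      }
    return 0;
  }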
Now that all base opcodes are at most 2 bytes in size, shrink the
corresponding template field to just as much. By also shrinking
extension_opcode and operands to just what they really need, we can
free up an entire 32-bit slot (plus 4 bits left over past the bitfields
themselves).
At present this alters sizeof(struct insn_template) only for 32-bit
builds. In 64-bit builds it instead leaves a padding hole that will
allow buffering future growth of other fields (opcode_modifier,
cpu_flags, operand_types[]).
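
Roughly, with illustrative (not the actually chosen) widths, the
affected part of the template now fits a single 32-bit slot:

  /* Sketch only - the real struct insn_template lives in i386-opc.h.  */
  struct template_slot_sketch
  {
    unsigned int base_opcode:16;       /* at most 2 opcode bytes now     */
    unsigned int extension_opcode:9;   /* needs room for a "None" value  */
    unsigned int operands:3;           /* 0 ... MAX_OPERANDS             */
    /* 4 bits left past the bitfields themselves.  */
  };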
In the majority of cases we can easily determine the length from the
encoding, irrespective of whether a prefix is specified there as well.
We further don't even need to record the value in the table entries, as
it's easy enough to determine it (without any guesswork, unless an insn
with major opcode 00 appeared that requires a 2nd opcode byte to be
specified explicitly) when installing the chosen template for further
processing.
Should an encoding appear which
- has a major opcode byte of 66, F3, or F2,
- requires a 2nd opcode byte to be specified explicitly,
- doesn't have a mandatory prefix
we'd need to convert all templates presently encoding a mandatory prefix
this way to the Prefix_0X<nn> model to eliminate the respective guessing
i386-gen does.
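
The guessing i386-gen does could be pictured roughly like this (a
sketch of the idea, not the actual generator code):

  /* If a multi-byte base opcode starts with 66, F3, or F2, treat that
     leading byte as the mandatory prefix (simplified sketch).  */
  static unsigned int
  guessed_mandatory_prefix (unsigned long long opcode, unsigned int length)
  {
    if (length > 1)
      {
        unsigned int lead = (opcode >> (8 * (length - 1))) & 0xff;

        if (lead == 0x66 || lead == 0xf3 || lead == 0xf2)
          return lead;
      }
    return 0;
  }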
This is in preparation for opcode_length going away as a field in the
templates. Identify pseudo prefixes by a base opcode of zero instead:
No real prefix has an opcode of zero. This at the same time allows
dropping a curious special case from i386-gen.
Since most attributes are identical for all pseudo prefixes, take the
opportunity and also template them.
In preparation for using PREFIX_0X<nn> attributes also in VEX/XOP/EVEX
encoding templates, renumber the pseudo-enumerators such that their
values can then also be used directly in the respective prefix bit
fields.
Commit 8b65b8953a ("x86: Remove the prefix byte from non-VEX/EVEX
base_opcode") used the opcodeprefix field for two distinct purposes. In
preparation of having VEX/XOP/EVEX and non-VEX templates become similar
in the representatioon of both encoding space and opcode prefixes, split
the field to have a separate one holding an insn's opcode space.
RepPrefixOk, HLEPrefixOk, and NoTrackPrefixOk can't be specified
together, so they can share an enum-like field. IsLockable can be
inferred from the HLE setting and hence only needs specifying when
neither of them is present.
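
A sketch of the resulting representation (the names here are made up):

  enum prefix_ok_sketch
  {
    PREFIX_OK_NONE = 0,
    PREFIX_OK_REP,       /* was RepPrefixOk      */
    PREFIX_OK_HLE,       /* was HLEPrefixOk      */
    PREFIX_OK_NOTRACK    /* was NoTrackPrefixOk  */
  };

  struct modifier_sketch
  {
    unsigned int prefix_ok:2;   /* one of the values above              */
    unsigned int islockable:1;  /* only needed when HLE isn't present;  */
  };                            /* HLE already implies lockability      */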
Rename VexOpcode to OpcodePrefix so that OpcodePrefix can be used for
regular encoding prefix.
gas/
* config/tc-i386.c (build_vex_prefix): Replace vexopcode with
opcodeprefix.
(build_evex_prefix): Likewise.
(is_any_vex_encoding): Don't check vexopcode.
(output_insn): Handle opcodeprefix.
opcodes/
* i386-gen.c (opcode_modifiers): Replace VexOpcode with
OpcodePrefix.
* i386-opc.h (VexOpcode): Renamed to ...
(OpcodePrefix): This.
(PREFIX_NONE): New.
(PREFIX_0X66): Likewise.
(PREFIX_0XF2): Likewise.
(PREFIX_0XF3): Likewise.
* i386-opc.tbl (Prefix_0X66): New.
(Prefix_0XF2): Likewise.
(Prefix_0XF3): Likewise.
Replace VexOpcode= with OpcodePrefix=. Use Prefix_0X66 on xorpd.
Use Prefix_0XF3 on cvtdq2pd. Use Prefix_0XF2 on cvtpd2dq.
* i386-tbl.h: Regenerated.
We check for a register-only source operand to decide whether the two
source operands of VEX-encoded instructions should be swapped. But in
AMX instructions whose two source operands are swapped, all source
operands are register-only. Add VexSwapSources to indicate that the two
source operands should be swapped.
gas/
* config/tc-i386.c (build_modrm_byte): Check vexswapsources to
swap two source operands.
opcodes/
* i386-gen.c (opcode_modifiers): Add VexSwapSources.
* i386-opc.h (VexSwapSources): New.
(i386_opcode_modifier): Add vexswapsources.
* i386-opc.tbl: Add VexSwapSources to BMI2 and BMI instructions
with two source operands swapped.
* i386-tbl.h: Regenerated.
Rename VecSIB to SIB to support Intel Advanced Matrix Extensions, which
introduce instructions with a mandatory SIB byte that isn't a vector
SIB (VSIB).
gas/
* config/tc-i386.c (check_VecOperands): Replace vecsib with sib.
Replace VecSIB128, VecSIB256 and VecSIB512 with VECSIB128,
VECSIB256 and VECSIB512, respectively.
(build_modrm_byte): Replace vecsib with sib.
opcodes/
* i386-gen.c (opcode_modifiers): Replace VecSIB with SIB.
(VecSIB128): Renamed to ...
(VECSIB128): This.
(VecSIB256): Renamed to ...
(VECSIB256): This.
(VecSIB512): Renamed to ...
(VECSIB512): This.
(VecSIB): Renamed to ...
(SIB): This.
(i386_opcode_modifier): Replace vecsib with sib.
* i386-opc.tbl (VexSIB128): New.
(VecSIB256): Likewise.
(VecSIB512): Likewise.
Replace VecSIB=1, VecSIB=2 and VecSIB=3 with VexSIB128, VecSIB256
and VecSIB512, respectively.
Register aliases (created e.g. via .set) check their target register at
the time of creation of the alias. While this makes sense, it's not
enough: The underlying register must also be "visible" at the time of
use. Wrong use of such aliases would lead to internal errors in e.g.
add_prefix() or build_modrm_byte().
Split the checking part of parse_real_register() into a new helper
function and use it also from the latter part of parse_register() (at
the same time replacing a minor open-coded part of it).
Since parse_register() returning NULL already has a meaning, a fake new
"bad register" indicator gets added, which all callers need to check
for.
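
A compile-able sketch of the "bad register" indicator (types and names
simplified to the bare minimum; not the actual gas implementation):

  #include <stddef.h>

  typedef struct { const char *reg_name; } reg_entry_sketch;

  /* Distinct from NULL ("no register here at all"): a register-looking
     token which must not be used in the current context.  */
  static const reg_entry_sketch bad_reg = { "<bad>" };

  static const reg_entry_sketch *
  parse_register_sketch (const char *str)
  {
    (void) str;
    return NULL;   /* stub - the real lookup and visibility check live in gas */
  }

  static int
  caller_sketch (const char *str)
  {
    const reg_entry_sketch *r = parse_register_sketch (str);

    if (r == &bad_reg)
      return -1;   /* error already diagnosed - don't process further  */
    if (r == NULL)
      return 0;    /* not a register - existing meaning kept           */
    return 1;      /* a usable register                                */
  }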
It is almost entirely redundant with Size64, and the sole case (CRC32)
where direct replacement isn't possible can easily be taken care of in
another way.
AMD ABM has 2 instructions: popcnt and lzcnt. The ABM CPUID feature bit
has been reused for lzcnt, and a POPCNT CPUID feature bit has been added
for popcnt, which used to be part of SSE4.2. This patch removes CpuABM
and adds CpuPOPCNT. It changes ABM to enable both lzcnt and popcnt, and
changes SSE4.2 to also enable popcnt.
gas/
* config/tc-i386.c (cpu_arch): Add .popcnt.
* doc/c-i386.texi: Remove abm and .abm. Add popcnt and .popcnt.
Add a tab before @samp{.sse4a}.
opcodes/
* i386-gen.c (cpu_flag_init): Replace CpuABM with
CpuLZCNT|CpuPOPCNT. Add CpuPOPCNT to CPU_SSE4_2_FLAGS. Add
CPU_POPCNT_FLAGS.
(cpu_flags): Remove CpuABM. Add CpuPOPCNT.
* i386-opc.h (CpuABM): Removed.
(CpuPOPCNT): New.
(i386_cpu_flags): Remove cpuabm. Add cpupopcnt.
* i386-opc.tbl: Replace CpuABM|CpuSSE4_2 with CpuPOPCNT on
popcnt. Remove CpuABM from lzcnt.
* i386-init.h: Regenerated.
* i386-tbl.h: Likewise.
Commit d835a58baa disabled sysenter/sysexit in 64-bit mode by
default. By default, the assembler should accept common, Intel64-only
and AMD64 ISAs since there are no conflicts.
gas/
PR gas/25516
* config/tc-i386.c (intel64): Renamed to ...
(isa64): This.
(match_template): Accept Intel64 only instruction by default.
(i386_displacement): Updated.
(md_parse_option): Updated.
* c-i386.texi: Update -mamd64/-mintel64 documentation.
* testsuite/gas/i386/i386.exp: Run x86-64-sysenter. Pass
-mamd64 to x86-64-sysenter-amd.
* testsuite/gas/i386/x86-64-sysenter.d: New file.
opcodes/
PR gas/25516
* i386-gen.c (opcode_modifiers): Replace AMD64 and Intel64
with ISA64.
* i386-opc.h (AMD64): Removed.
(Intel64): Likewise.
(AMD64): New.
(INTEL64): Likewise.
(INTEL64ONLY): Likewise.
(i386_opcode_modifier): Replace amd64 and intel64 with isa64.
* i386-opc.tbl (Amd64): New.
(Intel64): Likewise.
(Intel64Only): Likewise.
Replace AMD64 with Amd64. Update sysenter/sysexit with
Cpu64 and Intel64Only. Remove AMD64 from sysenter/sysexit.
* i386-tbl.h: Regenerated.
... instead of an operand one. Which operand it applies to can be
determined from other operand properties, but as it turns out the only
place where it is actually used doesn't even need further qualification.
EsSeg (a per-operand bit) is used with IsString (a per-insn attribute)
only. Extend the attribute to 2 bits, thus making it possible to encode
- not a string insn,
- a string insn with neither operand requiring use of %es:,
- a string insn with the 1st operand requiring use of %es:,
- a string insn with the 2nd operand requiring use of %es:,
which covers all possible cases, allowing EsSeg to be dropped (see the
sketch below).
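
The sketch referenced above (value names made up; the actual encoding
is whatever i386-opc.h ends up defining):

  /* The four states a 2-bit IsString attribute can encode, replacing
     the separate per-operand EsSeg bit.  */
  enum is_string_sketch
  {
    IS_STRING_NONE = 0,   /* not a string insn                     */
    IS_STRING_PLAIN,      /* string insn, no %es: requirement      */
    IS_STRING_ES_OP0,     /* string insn, 1st operand needs %es:   */
    IS_STRING_ES_OP1      /* string insn, 2nd operand needs %es:   */
  };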
The (transient) need to comment out the OTUnused #define did uncover an
oversight in the earlier OTMax -> OTNum conversion, which is being taken
care of here.
Drop the remaining instances left in place by commit c3949f432f ("x86:
limit ImmExt abuse"), now that we have a way to specify specific GPRs.
Take the opportunity and also introduce proper 16-bit forms of
applicable SVME insns as well as 1-operand forms of CLZERO.
Special register "class" instances can't be combined with one another
(neither in templates nor in register entries), and hence it is not a
good use of resources (memory as well as execution time) to represent
them as individual bits of a bit field.
Furthermore, the generalization this makes possible will allow
improvements to the handling of insns accepting only individual
registers as their operands.
registers as their operands.