Add support for the SME2 ADD, SUB, FADD and FSUB instructions.
SUB and FSUB have the same form as ADD and FADD, except that
ADD also has a 2-operand accumulating form.
The 64-bit ADD/SUB instructions require FEAT_SME_I16I64 and the
64-bit FADD/FSUB instructions require FEAT_SME_F64F64.
These are the first instructions to have tied register list
operands, as opposed to tied single registers.
The parse_operands change prevents unsuffixed Z registers (width==-1)
from being treated as though they had an Advanced SIMD-style suffix
(.4s etc.). It means that:
Error: expected element type rather than vector type at operand 2 -- `add za.s[w8,0],{z0-z1}'
becomes:
Error: missing type suffix at operand 2 -- `add za.s[w8,0],{z0-z1}'
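For reference, a correctly suffixed form of the same instruction would
be something like:
    add za.s[w8, 0], {z0.s - z1.s}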
SME2 adds lookup table instructions for quantisation. They use
a new lookup table register called ZT0.
LUTI2 takes an unsuffixed SVE vector index of the form Zn[<imm>],
which is the first time that this syntax has been used.
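An illustrative (hypothetical) use of the new index syntax, with the
registers and immediate chosen arbitrarily:
    luti2 z0.b, zt0, z1[0]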
Implementation-wise, the main things to note here are:
- the WHILE* instructions have forms that return a pair of predicate
registers. This is the first time that we've had lists of predicate
registers, and they wrap around after register 15 rather than after
register 31.
- the predicate-as-counter WHILE* instructions have a fourth operand
that specifies the vector length. We can treat this as an enumeration,
except that immediate values aren't allowed.
- PEXT takes an unsuffixed predicate index of the form PN<n>[<imm>].
This is the first instance of a vector/predicate index having
no suffix.
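For illustration, plausible forms of the instructions described above
(registers chosen arbitrarily):
    whilelo {p0.s, p1.s}, x0, x1    // predicate pair result
    whilelo pn8.s, x0, x1, vlx2     // predicate-as-counter, vector length
    pext    p0.s, pn8[1]            // unsuffixed predicate index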
SME2 adds LD1 and ST1 variants for lists of 2 and 4 registers.
The registers can be consecutive or strided. In the strided case,
2-register lists have a stride of 8, starting at register x0xxx.
4-register lists have a stride of 4, starting at register x00xx.
The instructions are predicated on a predicate-as-counter register in
the range pn8-pn15. Although we already had register fields with upper
bounds of 7 and 15, this is the first plain register operand to have a
nonzero lower bound. The patch uses the operand-specific data field
to record the minimum value, rather than having separate inserters
and extractors for each lower bound. This in turn required adding
an extra bit to the field.
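A sketch of plausible strided forms (z0 and pn8 chosen arbitrarily):
    ld1w {z0.s, z8.s}, pn8/z, [x0]               // 2 registers, stride 8
    ld1w {z0.s, z4.s, z8.s, z12.s}, pn8/z, [x0]  // 4 registers, stride 4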
SME2 defines new MOVA instructions for moving multiple registers
to and from ZA. As with SME, the instructions are also available
through MOV aliases.
One notable feature of these instructions (and many other SME2
instructions) is that some register lists must start at a multiple
of the list's size. The patch uses the general error "start register
out of range" when this constraint isn't met, rather than an error
specifically about multiples. This ensures that the error is
consistent between these simple consecutive lists and later
strided lists, for which the requirements aren't a simple multiple.
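For example (illustrative):
    mova {z0.d - z3.d}, za.d[w8, 0, vgx4]   // z0 is a multiple of 4
    mova {z1.d - z4.d}, za.d[w8, 0, vgx4]   // rejected: start register out of range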
SME2 adds a new format for the existing SVE predicate registers:
predicates as counters rather than predicates as masks. In assembly
code, operands that interpret predicates as counters are written
pn<N> rather than p<N>.
This patch adds support for these registers and extends some
existing instructions to support them. Since the new forms
are just a programmer convenience, there's no need to make them
more restrictive than the earlier predicate-as-mask forms.
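For example, where an existing instruction previously took p<N>, the
pn<N> spelling is now also accepted as a convenience:
    ldr pn8, [x0]    // equivalent to ldr p8, [x0]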
Some SME2 instructions operate on a range of consecutive ZA vectors.
This is indicated by syntax such as:
za[<Wv>, <imml>:<immh>]
As with the earlier vgx2 and vgx4 support, we get better error
messages if the parser allows all ZA indices to have a range.
We can then reject invalid cases during constraint checking.
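A hypothetical example of the range syntax (registers chosen arbitrarily):
    fmlal za.s[w8, 0:1], z0.h, z1.h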
Many SME2 instructions operate on groups of 2 or 4 ZA vectors.
This is indicated by adding a "vgx2" or "vgx4" group size to the
ZA index. The group size is optional in assembly but preferred
for disassembly.
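For example (illustrative):
    add za.s[w8, 0, vgx2], {z0.s, z1.s}   // explicit group size
    add za.s[w8, 0], {z0.s, z1.s}         // same instruction, size implied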
There is not a binary distinction between mnemonics that have
group sizes and mnemonics that don't, nor between mnemonics that
take vgx2 and mnemonics that take vgx4. We therefore get better
error messages if we allow any ZA index to have a group size
during parsing, and wait until constraint checking to reject
invalid sizes.
A quirk of the way errors are reported means that if an instruction
is wrong both in its qualifiers and its use of a group size, we'll
print suggested alternative instructions that also have an incorrect
group size. But that's a general property that also applies to
things like out-of-range immediates. It's also not obviously the
wrong thing to do. We need to be relatively confident that we're
looking at the right opcode before reporting detailed operand-specific
errors, so doing qualifier checking first seems reasonable.
SME2 adds various new fields that are similar to
AARCH64_OPND_SME_ZA_array, but are distinguished by the size of
their offset fields. This patch adds _off4 to the name of the
field that we already have.
Until now, binutils has supported register ranges such
as { v0.4s - v3.4s } as an unofficial shorthand for
{ v0.4s, v1.4s, v2.4s, v3.4s }. The SME2 ISA embraces this form
and makes it the preferred disassembly. It also embraces wrapped
lists such as { z31.s - z2.s }, which is something that binutils
didn't previously allow.
The range form was already binutils's preferred disassembly for 3- and
4-register lists. This patch prefers it for 2-register lists too.
The patch also adds support for wrap-around.
SME2 has instructions that accept strided register lists,
such as { z0.s, z4.s, z8.s, z12.s }. The purpose of this
patch is to extend binutils to support such lists.
The parsing code already had (unused) support for strides of 2.
The idea here is instead to accept all strides during parsing
and reject invalid strides during constraint checking.
The SME2 instructions that accept strided operands also have
non-strided forms. The errors about invalid strides therefore
take a bitmask of acceptable strides, which allows multiple
possibilities to be summed up in a single message.
I've tried to update all code that handles register lists.
Some FLD_imm* suffixes used a counting scheme such as FLD_immN,
FLD_immN_2, FLD_immN_3, etc., while others used the lsb as the
suffix. The latter seems more mnemonic, and was a big help
in doing the SME2 work.
Similarly, the _10 suffix on FLD_SME_size_10 was nonobvious.
Presumably it indicated a 2-bit field, but the field actually starts
at bit 22.
Quite a lot of SME2 instructions have an opcode bit that selects
between 32-bit and 64-bit forms of an instruction, with the 32-bit
forms being part of base SME2 and with the 64-bit forms being part
of an optional extension. It's nevertheless useful to have a single
opcode entry for both forms since (a) that matches the ISA definition
and (b) it tends to improve error reporting.
This patch therefore adds a libopcodes function called
aarch64_cpu_supports_inst_p that tests whether the target
supports a particular instruction. In future it will depend
on internal libopcodes routines.
This patch renames the OP_SME_* macros in aarch64-tbl.h so that
they follow the same scheme as the OP_SVE_* ones. It also uses
OP_SVE_ as the prefix, since there is no real distinction between
the SVE and SME uses of qualifiers: a macro defined for one can
be useful for the other too.
If an instruction has invalid qualifiers, GAS would report the
error against the final opcode entry that got to the qualifier-
checking stage. It seems better to report the error against
the opcode entry that had the closest match, just like we
pick the closest match within an opcode entry for the
"did you mean this?" message.
This patch adds the number of invalid operands as an
argument to AARCH64_OPDE_INVALID_VARIANT and then picks the
AARCH64_OPDE_INVALID_VARIANT with the lowest argument.
AARCH64_OPDE_REG_LIST took a single operand that specified the
expected number of registers. However, there are quite a few
SME2 instructions that have both 2-register forms and (separate)
4-register forms. If the user tries to use a 3-register list,
it isn't obvious which opcode entry they meant. Saying that we
expect 2 registers and saying that we expect 4 registers would
both be wrong.
This patch therefore switches the operand to a bitfield. If an
AARCH64_OPDE_REG_LIST is reported against multiple opcode entries,
the patch ORs together the expected lengths.
This has no user-visible effect yet. A later patch adds more error
strings, alongside tests that use them.
SVE register lists were classified as SVE_REG, since there had been
no particular reason to separate them out. However, some SME2
instructions have tied register list operands, and so we need to
distinguish registers and register lists when checking whether two
operands match.
Also, the register list operands used a general error message,
even though we already have a dedicated error code for register
lists that are the wrong length.
libopcodes currently reports out-of-range registers as a general
AARCH64_OPDE_OTHER_ERROR. However, this means that each register
range needs its own hard-coded string, which is a bit cumbersome
if the range is determined programmatically. This patch therefore
adds a dedicated error type for out-of-range errors.
In SME, the vector select register had to be in the range
w12-w15, so it made sense to enforce that during parsing.
However, SME2 adds instructions for which the range is
w8-w11 instead.
This patch therefore moves the range check from the parsing
stage to the constraint-checking stage.
Also, the previous error used a capitalised range W12-W15,
whereas other register range errors used lowercase ranges
like p0-p7. A quick internal poll showed a preference for
the lowercase form, so the patch uses that.
The patch uses "selection register" rather than "vector
select register" so that the terminology extends more
naturally to PSEL.
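For example (illustrative, registers chosen arbitrarily):
    mova z0.s, p0/m, za0h.s[w12, 0]              // SME: selection register w12-w15
    fmla za.s[w8, 0, vgx2], {z0.s, z1.s}, z2.s   // SME2: w8-w11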
This patch moves the range checks on ZA vector select offsets from
gas to libopcodes. Doing the checks there means that the error
messages contain the expected range. It also fits in better
with the error severity scheme, which becomes important later.
(This is because out-of-range indices are treated as more severe than
syntax errors, on the basis that parsing must have succeeded if we get
to the point of checking the completed opcode.)
The patch also adds a new check_za_access function for checking
ZA accesses. That's a bit over the top for one offset check, but the
function becomes more complex with later patches.
sme-9-illegal.s checked for an invalid .q suffix using:
psel p1, p15, p3.q[w15]
but this is doubly invalid because it misses the immediate part
of the index. The patch keeps that test but adds another with
a zero index, so that .q is the only thing wrong.
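That is, roughly:
    psel p1, p15, p3.q[w15]       // invalid suffix and missing immediate
    psel p1, p15, p3.q[w15, 0]    // invalid only because of the .q suffix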
The aarch64-tbl.h change includes neatening up the backslash
positions.
A later patch moves the range checking for ZA vector select
offsets from gas to libopcodes. That in turn requires the
immediate field to be big enough to support all parsed values.
This shouldn't be a particularly size-sensitive structure,
so there should be no memory problems with doing this.
za_tile_vector is also used for indexing ZA as a whole, rather than
just for indexing tiles. The former is more common than the latter
in SME2, so this patch generalises the name to "indexed_za".
The patch also names the associated structure, so that later patches
can reuse it during parsing.
We already treat the ZA tiles ZA0-ZA15 as registers. This patch
does the same for ZA itself. parse_sme_zero_mask can then parse
ZA tiles and ZA in the same way, through parsed_type_reg.
One important effect of going through parsed_type_reg (in general)
is that it allows ZA to take qualifiers. This is necessary for many
SME2 instructions.
However, to support existing unqualified uses of ZA, parse_reg_with_qual
needs to treat the qualifier as optional. Hopefully the net effect is
to give better error messages, since now that SME2 makes "za.<T>"
valid in some contexts, it might be natural to use it (incorrectly)
in ZERO too.
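For example (illustrative):
    zero {za}      // existing unqualified use, still valid
    zero {za.s}    // now parsed, then rejected with a more specific error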
While there, the patch also tweaks the error messages for invalid
ZA tiles, to try to make some cases more specific.
For now, parse_sme_za_array just uses parse_reg, rather than
parse_typed_reg/parse_reg_with_qual. A later patch consolidates
the parsing further.
This patch makes all SME instructions use F_STRICT, so that qualifiers
have to be provided explicitly rather than being inferred from other
operands. The main change is to move the qualifier setting from the
operand-level decoders to the opcode level.
This is one step towards consolidating the ZA parsing code and
extending it to handle SME2.
In the register-index forms of PRFM, the unallocated prefetch opcodes
24-31 have been reused for the encoding of the new RPRFM instruction.
The PRFM opcode space is now capped at 23 for these forms. The other
forms of PRFM are unaffected.
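For example (illustrative):
    prfm  pldl1keep, [x0, x1]    // prefetch opcodes 0-23, still PRFM
    rprfm pldkeep, x1, [x0]      // reuses the former opcode space 24-31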
Most extension flags are named after the associated architectural
FEAT_* flags, but sme-i64 and sme-f64 were exceptions. This patch
adds sme-i16i64 and sme-f64f64 aliases, but keeps the old names too
for compatibility.
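For example, both spellings are now accepted:
    .arch_extension sme-i64       // old name, kept for compatibility
    .arch_extension sme-i16i64    // new FEAT_-style alias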
With VexVVVV only being boolean, the SSE shift-by-immediate instructions
don't need special casing anymore for SSE2AVX handling. Simplify the two
respective templates. (No change to generated tables.)
With the SDM long having dropped the NDS/NDD/DDS concept of identifying
encoding variants, we can finally do away with this concept as well. Of
the few consumers of the attribute, only an assertion was still checking
for a particular value, which we don't really need to retain.
When touching lines anyway, modernize other aspects as well. This often
improves similarity to adjacent lines.
The function has accumulated a number of special cases for no real
reason. Some were necessary because insn attributes (SwapSources in
particular) weren't suitably utilized instead. Note that the addition of
SwapSources actually increases consistency among the templates: Like
others which already have the attribute, these are all insns where the
VEX.VVVV-encoded register comes first (or last when looking at the SDM).
Note that the vexvvvv attribute now carries merely boolean meaning, in
line with the SDM long having dropped the NDS/NDD/DDS concept of
identifying encoding variants. The fallout will be taken care of
subsequently, though, so as not to further clutter the change here.
As to the TILEZERO special case: If more instructions like this
appeared, a new attribute would likely be the way to go. But as long as
it's only a single insn, going from the mnemonic is cheaper.
This reverts commit 92d450c79a.
Accessing these local var structs using a volatile qualified pointer
may indeed read the object, but I don't think changed values are
guaranteed to be written back to the object unless the actual object
is declared volatile. That would probably slow down i386 disassembly
unacceptably.
The following commit added libopcodes styling for m68k:
commit c22ff44927
Date: Tue Feb 14 18:07:19 2023 +0100
opcodes: style m68k disassembler output
but didn't set disassemble_info::created_styled_output in
disassemble.c, which is needed in order for GDB to start using the
libopcodes based styling.
This commit fixes this small oversight. GDB now styles correctly.
The feature isn't universally available on 64-bit CPUs.
Note that in i386-gen.c:isa_dependencies[] I'm only adding it to models
where I'm certain the functionality exists. I'm uncertain about Nocona
and Core in particular.
While MOV to/from segment register as well as selector storing insns
already permit 32- and 64-bit GPR operands, selector loading insns and
ARPL do not. Split templates accordingly.
For shifts (but not ordinary rotates) and other cases where an immediate
describes e.g. a bit count or position, allowing negative operands is at
best confusing. An extreme example would be the two rotate-through-carry
insns, where a negative value would _not_ mean rotating the
corresponding number of bits in the other direction. To refuse such,
give meaning to the combination of Imm8 and Imm8S in templates (so far
these weren't used together anywhere). The issue was with
smallest_imm_type() blindly setting .imm8 for signed numbers determined
to fit in a byte.
VPROT{B,W,D,Q} is a little special: The rotate count there is a signed
quantity, so Imm8 is replaced by Imm8S. Adjust affected testcases
accordingly as well.
Another small adjustment to the testsuite is necessary: AAM and AAD were
never sensible to use with 0xffffff90 operands. This should have been an
error.
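For example (AT&T syntax):
    shl $2, %eax     # still fine
    shl $-2, %eax    # now rejected rather than silently accepted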
Just like we suppress emitting REX.W for e.g. MOV from/to segment
register, there's also no need for it for LAR and LSL - these can only
ever return 32-bit values and hence always zero-extend their results
anyway.
While there also drop the redundant Word from the first operand of
the second template each - this is already implied by Reg16.
In 64-bit mode BT can have REX.W or a data size prefix dropped in
certain cases. Outside of 64-bit mode all 4 insns can have the data
size prefix dropped in certain cases.
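A hypothetical illustration (assuming optimization is enabled, e.g. via -O):
    btq $1, (%rax)   # may be encoded without REX.W, like btl $1, (%rax)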
Before commit:
commit 2438b771ee
Date: Wed Nov 2 15:53:43 2022 +0000
opcodes/mips: use .word/.short for undefined instructions
unknown 32-bit microMIPS instructions were disassembled as a raw
32-bit number with no '.word' directive. The above commit changed
this and added a '.word' directive before the 32-bit number.
It was pointed out on the mailing list that for microMIPS it would be
better to display such 32-bit instructions using a '.short' directive
followed by two 16-bit values.
This commit updates the mips disassembler to do this, and adds a new
test that validates this output.
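For instance, a hypothetical unknown encoding 0xabcd1234 that was
previously shown as:
    .word 0xabcd1234
would now be shown as:
    .short 0xabcd, 0x1234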
The attribute really specifies that the sum of register and memory
operands is 4. Express it like that in most places, while using the 2nd
(apart from XOP) CPU feature flags (FMA4) in reversed operand matching
logic.
With the use in build_modrm_byte() gone, part of an assertion there
also becomes meaningless - simplify that at the same time.
With all uses of the opcode modifier field gone, also drop that.
The few XOP insns which used it wrongly didn't have VexVVVV specified.
With that added, the only further missing piece to use more generic code
elsewhere is SwapSources - see e.g. the BMI2 insns for similar operand
patterns.
With the only users gone, drop the #define as well as the special case
code.
The VPROT* forms with an immediate operand are entirely standard in the
way their ModR/M bytes are built. There's no reason to invoke special
case code. With that the handling of an immediate there can also be
dropped; it was partially bogus anyway, as in its "no memory operands"
portion it ignores the possibility of an immediate operand (which was
okay only because that case was already handled by more generic code).
This really isn't a "modifier" and rather ought to live next to the base
opcode anyway. Use the bits we presently have available to fit in the
field, renaming it to opcode_space. As an intended side effect this
helps readability at the use sites, by shortening the references quite a
bit.
In generated code arrange for human readable output, by using the
SPACE_* constants there rather than raw numbers. This may aid debugging
down the road.
Regenerating BPF target using the maintainer mode emits:
.../opcodes/bpf-opc.c:57:11: error: conversion from ‘long unsigned int’ to ‘unsigned int’ changes value from ‘18446744073709486335’ to ‘4294902015’ [-Werror=overflow]
57 | 64, 64, 0xffffffffffff00ff, { { F (F_IMM32) }, { F (F_OFFSET16) }, { F (F_SRCLE) }, { F (F_OP_CODE) }, { F (F_DSTLE) }, { F (F_OP_SRC) }, { F (F_OP_CLASS) }, { 0 } }
The use of a narrow type to handle the mask in the CGEN instruction
format is causing this error. Additionally, eBPF `call' instructions
constructed by expressions using symbols (BPF_PSEUDO_CALL) emit
annotations in the `src' field of the instruction, which is used to
identify the BPF target endianness.
cpu/
	* bpf.cpu (define-call-insn): Remove `src' field from
	instruction mask.
include/
	* opcode/cgen.h (CGEN_IFMT): Adjust mask bit width.
opcodes/
	* bpf-opc.c: Regenerate.
Insn width granularity being 16 bits, producing byte granular output
isn't very useful. With there being a way to specify otherwise
unknown insns to the assembler, use that same representation (to be
precise: its <length>,<encoding> flavor) for disassembly.
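For example, a hypothetical unknown 4-byte encoding might now be shown as:
    .insn 4, 0x0000107f
rather than as raw bytes.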
Along with the normal JAL alias, the C-extension one should have been
moved as well by 839189bc93 ("RISC-V: re-arrange opcode table for
consistent alias handling"), for the assembler to actually be able to
use it where/when possible.
Since neither this nor any other compressed branch insn was being tested
so far, take the opportunity and introduce a new testcase covering those.
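For example (illustrative, RV32 with the C extension):
    jal ra, target   # can now be assembled to c.jal when in range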
While one of the two actually having been instrumented (i386-gen.c) now
has that instrumentation dropped, there's still no point in honoring
such instrumentation in general (i.e. now for ia64-gen.c only), as that
only leads to a waste of resources.
With CFILES then being merely an alias of LIBOPCODES_CFILES, drop the
former variable altogether.
All glibc malloc() implementations I've checked have a smallest
allocation size worth of 3 pointers, with an increment worth of 2
pointers. Hence mnemonics with multiple templates can be stored more
efficiently when maintaining the shared "name" field only in the actual
hash entry. (To express the shared nature, also convert "name" to a
pointer-to-const.)
While doing the conversion also pull out common code from the involved
if/else construct in expand_templates().
Register names are (including their nul terminators) on average almost 4
bytes long. Otoh no register name is longer than 8 bytes. Hence even for
32-bit builds using a pointer is only slightly more space efficient than
embedding the strings. A level of indirection can be also avoided by
embedding the names as an array of 8 characters directly in the arrays,
and the number of base relocations in libopcodes.so (or PIE builds of
statically linked executables) goes down as well.
To amortize for the otherwise reduced folding of string literals by the
linker, use att_names_seg[] in place of string literals in append_seg()
and OP_ESreg().
Register names are (including their nul terminators) on average almost 4
bytes long. Otoh no register name is longer than 7 bytes. Hence even for
32-bit builds using a pointer is only slightly more space efficient than
embedding the strings. A level of indirection can be also avoided by
embedding the names as an array of 8 characters directly in the struct,
and the number of base relocations in PIE builds of gas goes down as
well.
When generating the mnemonic string table we already set up an
identifier for the following entry in a number of cases. Re-use that on
the next loop iteration rather than re-doing allocation and conversion.
Compact the mnemonic string table such that the tails of longer
mnemonics are re-used for shorter ones, going beyond what compilers
would typically do, but matching what ELF linkers may do when processing
SHF_MERGE|SHF_STRINGS sections. This reduces table size by about 12.5%.
Using full pointers to reference the insn mnemonic strings is not very
efficient. With overall string size presently just slightly over 20k,
even a 16-bit value would suffice. Use "unsigned int" for now, as
there's no good use we could presently make of the otherwise saved 16
bits.
For 64-bit builds this reduces table size by 6.25% (prior to the recent
ISA extension additions it would have been 12.5%), with a similar effect
on cache occupation of table entries accessed. For PIE builds of gas
this also reduces the number of base relocations quite a bit (obviously
independent of bitness).
The newer update-copyright.py fixes file encoding too, removing cr/lf
on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and
embedded cr in binutils/testsuite/binutils-all/ar.exp string match.
TSXLDTRK takes RTM as a prereq. Additionally introduce an umbrella "tsx"
extension option covering both RTM and HLE, paralleling the "abm" one we
already have.
SEV-ES is an extension to SVME. SNP in turn is an extension to SEV-ES,
and yet in turn RMPQUERY is a SNP extension.
Note that cpu_arch[] has no SNP entry, so CPU_ANY_SNP_FLAGS remains
unused (just like CPU_SNP_FLAGS already is).
Like various other features AMX-TILE takes XSAVE as a prereq.
XSAVES, unconditionally using compacted format, in turn effectively
takes XSAVEC as a prereq (an SDM clarification to this effect is in the
works).
Like AVX512-FP16, several other extensions require wider than 16-bit
mask registers. As a result they take AVX512BW as a prereq, not (just)
AVX512F. Which in turn points out wrong expectations in the noavx512-1
testcase.
SSE itself takes FXSR as a prereq. Like AES, PCLMUL, and SHA, both GFNI
and KL take SSE2 as a prereq, for operating on packed integers. And
while correcting KL, also record it as a prereq to WIDEKL.
Getting both forward and reverse ISA dependencies right / consistent has
been a permanent source of mistakes. Reduce what needs specifying
manually to just the direct forward dependencies. Transitive forward
dependencies as well as reverse ones are now derived and hence cannot go
out of sync anymore (at least in the vast majority of cases; there are a
few special cases to still take care of manually). In the course of this
several CPU_ANY_*_FLAGS disappear, requiring adjustment to the
assembler's cpu_arch[].
Note that to retain the correct reverse dependency of AVX512F wrt
AVX512-VP2INTERSECT, the latter has the previously missing AVX512F
prereq added.
Note further that to avoid adding the following undue prereqs:
* ATHLON, K8, and AMDFAM10 gain CMOV and FXSR,
* IAMCU gains 387,
auxiliary table entries (including a colon-separated modifier) are
introduced in addition to the ones resulting from converting the old
table.
To maintain forward-only dependencies between AVX (XOP) and SSE* (SSE4a)
(i.e. "nosse" not disabling AVX), reverse dependency tracking is
artificially suppressed.
As a side effect disabling of SSE or SSE2 will now also disable AES,
PCLMUL, and SHA (respective elements were missing from
CPU_ANY_SSE2_FLAGS).
While originally indeed used for register size checking only, the
attribute has been used for memory operand size checking as well already
for quite a while, with more such uses recently having been added.
Having a "None" field in the vast majority of entries is needlessly
cluttering the overall table. Instead of this being a separate field,
use a representation matching that of Intel SDM and AMD PM for the main
use of the field: Append the value after a / as the separator.
PR gas/29524
Having templates with a suffix explicitly present has always been
quirky. After prior adjustment all that's left to also eliminate the
anomaly from move-with-sign-extend is to consolidate the insn templates
and to make may_need_pass2() cope (plus extend testsuite coverage).
The need for them on the operand-less string insns has gone away with
the removal of maybe_adjust_templates() and associated logic. Since
i386_index_check() needs adjustment then anyway, take the opportunity
and also simplify it, possible again as a result of said removal (plus
the opcode template adjustments done here).
Having templates with a suffix explicitly present has always been
quirky. Introduce a 2nd matching pass in case the 1st one couldn't find
a suitable template _and_ didn't itself already need to trim off a
suffix to find a match at all. This requires error reporting adjustments
(albeit luckily fewer than I was afraid might be necessary), as errors
previously reported during matching now need deferring until after the
2nd pass (because, obviously, we must not emit any error if the 2nd pass
succeeds). While also related to PR gas/29524, it was requested that
move-with-sign-extend be left as broken as it always was.
PR gas/29525
Note that with the dropped CMPSD and MOVSD Intel Syntax string insn
templates taking operands, mixed IsString/non-IsString template groups
(with memory operands) cannot occur anymore. With that
maybe_adjust_templates() becomes unnecessary (and is hence being
removed).
PR gas/29526
Note further that while the additions to the intel16 testcase aren't
really proper Intel syntax, we've been permitting all of those except
for the MOVD variant. The test therefore is to avoid re-introducing such
an inconsistency.
This reverts the disassembler parts of 859aa2c86d ("x86: Allow 16-bit
register source for LAR and LSL"), adjusting testcases as necessary.
That change was itself a partial revert of c9f5b96bda ("x86: correct
handling of LAR and LSL"), without actually saying so. While the earlier
commit was properly agreed upon, the partial revert was not, and hence
should not have been committed. This is even more so that the revert
part of that change wasn't even necessary to address PR gas/29844.
Speed up gas startup by avoiding runtime allocation of the instances of
type "templates". At the same time cut the memory requirement to just
very little over half (not even accounting for any overhead
notes_alloc() may incur) by reusing the "end" slot of a preceding entry
for the "start" slot of the subsequent one.
Now that the table is local to gas, ARRAY_SIZE() can be used to
determine the end of the table. Re-arrange the processing loop in
md_begin() accordingly, at the same time folding the two calls to
notes_alloc() into just one.
Remove the now empty i386-opc.c. To compensate, tie table generation in
opcodes/ to the building of i386-dis.o, despite the file not really
depending on the generated data.
Unlike many other architectures, x86 does not share an opcode table
between assembly and disassembly. Any consumer of libopcodes would only
ever access one of the two. Since gas is the only consumer of the
assembly data, move it there. While doing so mark respective entities
"static" in i386-gen (we may want to do away with i386_regtab_size
altogether).
This also shrinks the number of relocations to be processed for
libopcodes.so by about 30%.
All the xh_mode uses in the table go through %XH, which prints "{bad}"
when EVEX.W=1. This makes the vex.w check unnecessary.
opcodes/ChangeLog:
	* i386-dis.c (OP_E_memory): Remove vex.w check for xh_mode.
This commit adds disassembler styling support for MIPS. After this
commit objdump and GDB will style disassembler output.
This is a pretty straightforward change: we switch to using the
disassemble_info::fprintf_styled_func callback and pass an
appropriate style through as needed. No additional tricks were
needed (compared to, say, i386 or ARM).
Tested by running all of the objdump commands used by the gas
testsuite and manually inspecting the styled output; everything looks
reasonable, though I'm not a MIPS expert, so it is possible that I've
missed some corner cases. Worst case, though, is that something will be
styled incorrectly; the actual content should be unchanged.
All the gas, ld, and binutils tests still pass for me.