All other cgen ports keep their generated desc & opc files under
opcodes/, so move the cris files over too. The cris-opc.c file,
while not generated, is already here to complement.
The initial problem I wanted to fix here is that GAS was rejecting MVE
instructions such as:
vmov q3[2], q3[0], r2, r2
with:
Error: General purpose registers may not be the same -- `vmov q3[2],q3[0],r2,r2'
which is incorrect; such instructions are valid. Note that for moves in
the other direction, e.g.:
vmov r2, r2, q3[2], q3[0]
GAS is correct in rejecting this as it does not make sense to move both
lanes into the same register (the Arm ARM says this is CONSTRAINED
UNPREDICTABLE).
After fixing this issue, I added assembly/disassembly tests for these
vmovs. This revealed several disassembly issues, including incorrectly
marking the moves into vector lanes as UNPREDICTABLE, and disassembling
many of the vmovs as vector loads. These are now fixed.
gas/ChangeLog:
* config/tc-arm.c (do_mve_mov): Only reject vmov if we're moving
into the same GPR twice.
* testsuite/gas/arm/mve-vmov-bad-2.l: Tweak error message.
* testsuite/gas/arm/mve-vmov-3.d: New test.
* testsuite/gas/arm/mve-vmov-3.s: New test.
opcodes/ChangeLog:
* arm-dis.c (mve_opcodes): Fix disassembly of
MVE_VMOV2_GP_TO_VEC_LANE when idx == 1.
(is_mve_encoding_conflict): MVE vector loads should not match
when P = W = 0.
(is_mve_unpredictable): It's not unpredictable to use the same
source register twice (for MVE_VMOV2_GP_TO_VEC_LANE).
The gotha() relocation mnemonic will be outputted by OpenRISC GCC when
using the -mcmodel=large option. This relocation is used along with
got() to generate 32-bit GOT offsets. This increases the previous GOT
offset limit from the previous 16-bit (64K) limit.
This is needed on large binaries where the GOT grows larger than 64k.
bfd/ChangeLog:
PR 21464
* bfd-in2.h: Add BFD_RELOC_OR1K_GOT_AHI16 relocation.
* elf32-or1k.c (or1k_elf_howto_table, or1k_reloc_map): Likewise.
(or1k_final_link_relocate, or1k_elf_relocate_section,
or1k_elf_check_relocs): Likewise.
* libbfd.h (bfd_reloc_code_real_names): Likewise.
* reloc.c: Likewise.
cpu/ChangeLog:
PR 21464
* or1k.opc (or1k_imm16_relocs, parse_reloc): Define parse logic
for gotha() relocation.
include/ChangeLog:
PR 21464
* elf/or1k.h (elf_or1k_reloc_type): Define R_OR1K_GOT_AHI16 number.
opcodes/ChangeLog:
PR 21464
* or1k-asm.c: Regenerate.
gas/ChangeLog:
PR 21464
* testsuite/gas/or1k/reloc-1.s: Add test for new relocation.
* testsuite/gas/or1k/reloc-1.d: Add test result for new
relocation.
Cc: Giulio Benetti <giulio.benetti@benettiengineering.com>
fixup reloc, add tests
Over the years I've seen a number of instances where people used
lea (%reg1), %reg2
or
lea symbol, %reg
despite the same thing being expressable via MOV. Since additionally
LEA often has restrictions towards the ports it can be issued to, while
MOV typically gets dealt with simply by register renaming, transform to
MOV when possible (without growing opcode size and without altering
involved relocation types).
Note that for Mach-O the new 64-bit testcases would fail (for
BFD_RELOC_X86_64_32S not having a representation), and hence get skipped
there.
Display literal value loaded with l32r opcode as a part of disassembly.
This significantly simplifies reading of disassembly output.
2020-04-23 Max Filippov <jcmvbkbc@gmail.com>
opcodes/
* xtensa-dis.c (print_xtensa_operand): For PC-relative operand
of l32r fetch and display referenced literal value.
Output literals as 4-byte words, not as separate bytes.
2021-04-23 Max Filippov <jcmvbkbc@gmail.com>
opcodes/
* xtensa-dis.c (print_insn_xtensa): Set info->bytes_per_chunk
to 4 for literal disassembly.
This patch adds support to four new system registers (RPAOS, RPALOS, PAALLOS,
PAALL) in conjunction with TLBI instruction. This change is part of RME (Realm
Management Extension).
gas/ChangeLog:
2021-04-19 Przemyslaw Wirkus <przemyslaw.wirkus@arm.com>
* NEWS: Update news.
* testsuite/gas/aarch64/rme.d: Update test.
* testsuite/gas/aarch64/rme.s: Update test.
opcodes/ChangeLog:
2021-04-19 Przemyslaw Wirkus <przemyslaw.wirkus@arm.com>
* aarch64-opc.c: Add new registers (RPAOS, RPALOS, PAALLOS, PAALL) support for
TLBI instruction.
This patch adds support to two new system registers (CIPAPA, CIGDPAPA) in
conjunction with DC instruction. This change is part of RME (Realm Management
Extension).
gas/ChangeLog:
2021-04-19 Przemyslaw Wirkus <przemyslaw.wirkus@arm.com>
* testsuite/gas/aarch64/rme.d: Update test.
* testsuite/gas/aarch64/rme.s: Update test.
opcodes/ChangeLog:
2021-04-19 Przemyslaw Wirkus <przemyslaw.wirkus@arm.com>
* aarch64-opc.c: Add new register (CIPAPA, CIGDPAPA) support for
DC instruction.
Patch 1: Fix diagnostics for exclusive load/stores and reclassify
Armv8.7-A ST/LD64 Atomics.
Following upstream pointing out some inconsistencies in diagnostics,
https://sourceware.org/pipermail/binutils/2021-February/115356.html
attached is a patch set that fixes the issues. I believe a combination
of two patches mainly contributed to these bugs:
https://sourceware.org/pipermail/binutils/2020-November/113961.htmlhttps://sourceware.org/pipermail/binutils/2018-June/103322.html
A summary of what this patch set fixes:
For instructions
STXR w0,x2,[x0]
STLXR w0,x2,[x0]
The warning we emit currently is misleading:
Warning: unpredictable: identical transfer and status registers --`stlxr w0,x2,[x0]'
Warning: unpredictable: identical transfer and status registers --`stxr w0,x2,[x0]'
it ought to be:
Warning: unpredictable: identical base and status registers --`stlxr w0,x2,[x0]'
Warning: unpredictable: identical base and status registers --`stxr w0,x2,[x0]'
For instructions:
ldaxp x0,x0,[x0]
ldxp x0,x0,[x0]
The warning we emit is incorrect
Warning: unpredictable: identical transfer and status registers --`ldaxp x0,x0,[x0]'
Warning: unpredictable: identical transfer and status registers --`ldxp x0,x0,[x0]'
it ought to be:
Warning: unpredictable load of register pair -- `ldaxp x0,x0,[x0]'
Warning: unpredictable load of register pair -- `ldxp x0,x0,[x0]'
For instructions
stlxp w0, x2, x2, [x0]
stxp w0, x2, x2, [x0]
We don't emit any warning when it ought to be:
Warning: unpredictable: identical base and status registers --`stlxp w0,x2,x2,[x0]'
Warning: unpredictable: identical base and status registers --`stxp w0,x2,x2,[x0]'
For instructions:
st64bv x0, x2, [x0]
st64bv x2, x0, [x0]
We incorrectly warn when its not necessary. This is because we classify them
incorrectly as ldstexcl when it should be lse_atomics in the opcode table.
The incorrect classification makes it pick up the warnings from warning on
exclusive load/stores.
Patch 2: Reclassify Armv8.7-A ST/LD64 Atomics.
This patch reclassifies ST64B{V,V0}, LD64B as lse_atomics rather than ldstexcl
according to their encoding class as specified in the architecture. This also
has the fortunate side-effect of spurious unpredictable warnings getting
eliminated.
For eg. For instruction:
st64bv x0, x2, [x0]
We incorrectly warn when its not necessary:
Warning: unpredictable: identical transfer and status registers --`st64bv x0,x2,[x0]'
This is because we classify them incorrectly as ldstexcl when it should be
lse_atomics in the opcode table. The incorrect classification makes it pick
up the warnings from warning on exclusive load/stores. This patch fixes it
by reclassifying it and no warnings are issued for this instruction.
opcodes/ChangeLog:
2021-04-09 Tejas Belagod <tejas.belagod@arm.com>
* aarch64-tbl.h (struct aarch64_opcode aarch64_opcode_table): Reclassify
LD64/ST64 instructions to lse_atomic instead of ldstexcl.
This adds some annotation to Power10 pcrel instructions, displaying
the target address (ie. pc + D34 field) plus a symbol if there is one
at exactly that target address. pld from the .got or .plt will also
look up the entry and display it, symbolically if there is a dynamic
relocation on the entry.
include/
* dis-asm.h (struct disassemble_info): Add dynrelbuf and dynrelcount.
binutils/
* objdump.c (struct objdump_disasm_info): Delete dynrelbuf and
dynrelcount.
(find_symbol_for_address): Adjust for dynrelbuf and dynrelcount move.
(disassemble_section, disassemble_data): Likewise.
opcodes/
* ppc-dis.c (struct dis_private): Add "special".
(POWERPC_DIALECT): Delete. Replace uses with..
(private_data): ..this. New inline function.
(disassemble_init_powerpc): Init "special" names.
(skip_optional_operands): Add is_pcrel arg, set when detecting R
field of prefix instructions.
(bsearch_reloc, print_got_plt): New functions.
(print_insn_powerpc): For pcrel instructions, print target address
and symbol if known, and decode plt and got loads too.
gas/
* testsuite/gas/ppc/prefix-pcrel.d: Update expected output.
* testsuite/gas/ppc/prefix-reloc.d: Likewise.
* gas/testsuite/gas/ppc/vsx_32byte.d: Likewise.
ld/
* testsuite/ld-powerpc/inlinepcrel-1.d: Update expected output.
* testsuite/ld-powerpc/inlinepcrel-2.d: Likewise.
* testsuite/ld-powerpc/notoc2.d: Likewise.
* testsuite/ld-powerpc/notoc3.d: Likewise.
* testsuite/ld-powerpc/pcrelopt.d: Likewise.
* testsuite/ld-powerpc/startstop.d: Likewise.
* testsuite/ld-powerpc/tlsget.d: Likewise.
* testsuite/ld-powerpc/tlsget2.d: Likewise.
* testsuite/ld-powerpc/tlsld.d: Likewise.
* testsuite/ld-powerpc/weak1.d: Likewise.
* testsuite/ld-powerpc/weak1so.d: Likewise.
Note that this doesn't implement the ISA to the letter regarding
dcbtds (and dcbtstds), which says that the TH field may be zero. That
doesn't make sense because allowing TH=0 would mean you no long have a
dcbtds but rather a dcbtct instruction. I'm interpreting the ISA
wording about allowing TH=0 to mean that the TH field of dcbtds is
optional (in which case the TH value is 0b1000).
opcodes/
PR 27676
* ppc-opc.c (DCBT_EO): Move earlier.
(insert_thct, extract_thct, insert_thds, extract_thds): New functions.
(powerpc_operands): Add THCT and THDS entries.
(powerpc_opcodes): Add dcbtstct, dcbtstds, dcbna, dcbtct, dcbtds.
gas/
* testsuite/gas/ppc/pr27676.d,
* testsuite/gas/ppc/pr27676.s: New test.
* testsuite/gas/ppc/ppc.exp: Run it.
* testsuite/gas/ppc/dcbt.d: Update.
* testsuite/gas/ppc/power4_32.d: Update.
The former two are unused anyway. And having such constants isn't very
helpful either, when they live in a place where updating the register
table wouldn't even allow noticing the need to adjust these constants.
st(1) ... st(7) will never be looked up in the hash table, so there's no
point inserting the entries. It's also not really necessary to do a 2nd
hash lookup after parsing the register number, nor is there a real
reason for having both st and st(0) entries. Plus we can easily do away
with the need for st to be first in the table.
For a long time there hasn't been a need anymore to keep together all
templates with identical mnemonics. Move the MOVQ and MOVABS ones next
to their MOV counterparts. Move the string forms of CMPSD and MOVSD next
to their CMPS / MOVS counterparts. Re-arrange what so fgar was the SSE3
section.
This makes reasonably obvious that MONITOR/MWAIT aren't suitable to
cover by CpuSSE3, but adjusting this is left for another time.
In commit 79dec6b7ba ("x86-64: optimize certain commutative
VEX-encoded insns") I missed the fact that there being subtraction
involved here doesn't matter, as absolute differences get summed up.
This way not only the overall (source) table size shrinks by quite a
bit and the risk of related templates going out of sync with one another
gets lowered, but also (dis)similarities between neighboring templates
become easier to spot.
Note that for certain SSE2AVX templates this results in benign attribute
changes:
- LDMXCSR and STMXCSR: NoAVX gets set,
- MOVMSKPS, PMOVMSKB, PEXTR{B,W} (register destination), and PINSR{B,W}
(register source): IgnoreSize and NoRex64 get set,
- CVT{DQ,PS}2PD, CVTSD2SS, MOVMSKPD, MOVDDUP, PMOV{S,Z}X{BW,WD,DQ}, and
ROUNDSD: NoRex64 gets set,
- CVTSS2SD, INSERTPS, PEXTRW (memory destination), PINSRW (memory
source), and PMOV{S,Z}X{BD,WQ,BQ}: IgnoreSize gets set.
Similarly the "normal" (non-SSE2AVX)
- non-64-bit CVTS{I,S}2SD forms get NoRex64 set,
- CMP{EQ,ORD,NEQ,UNORD}{P,S}{S,D} forms get C set,
all again in a benign way.
The remaining differences in the generated table are due to re-ordering
of entries in the course of being folded into templates.
The table entries are more natural to read (and slightly shorter) when
the prefixes, like is the case for VEX/XOP/EVEX-encoded entries, are
specified as part of the opcode. This is particularly noticable for
side-by-side legacy and SSE2AVX entries.
An implication is that we now need to use "unsigned long long" for the
initially parsed opcode in i386-gen. I don't expect this to be an issue.
Now that all base opcodes are only at most 2 bytes in size, shrink its
template field to just as much. By also shrinking extension_opcode and
operands to just what they really need, we can free up an entire 32-bit
slot (plus 4 left bits past the bitfields themselves).
At present this alters sizeof(struct insn_template) only for 32-bit
builds. In 64-bit builds it instead leaves a padding hole that will
allow to buffer future growth of other fields (opcode_modifier,
cpu_flags, operand_types[]).
Just like is already done for VEX/XOP/EVEX encoded insns, record the
encoding space information in the respective opcode modifier field. Do
this again without changing the source table, but rather by deriving the
values from their existing source representation.
While all of MMX, SSE, and SSE2 are included in "generic64", they can be
individually disabled. There are two MOVQ forms lacking respective
attributes. While the MMX one would get refused anyway (due to MMX
registers not recognized with .nommx), the assembler did happily accept
the SSE2 form. Add respective CPU settings to both, paralleling what the
MOVD counterparts have.
The code was checking wrong bit for sign extension. It caused it
to zero-extend instead of sign-extend the immediate value.
2021-03-25 Abid Qadeer <abidh@codesourcery.com>
opcodes/
* nios2-dis.c (nios2_print_insn_arg): Fix sign extension of
immediate in br.n instruction.
gas/
* testsuite/gas/nios2/brn.s: New.
* testsuite/gas/nios2/brn.d: New.
For VEX-encoded ones, all three involved vector registers have to be
distinct. For EVEX-encoded ones an actual mask register has to be in use
and zeroing-masking cannot be used (violation of either will #UD).
Additionally both involved vector registers have to be distinct for
EVEX-encoded gathers.
For INVLPGB the operand count was wrong (besides %edx there's also %ecx
which is an input to the insn). In this case I see little sense in
retaining the bogus 2-operand template. Plus swapping of the operands
wasn't properly suppressed for Intel syntax.
For PVALIDATE, RMPADJUST, and RMPUPDATE bogus single operand templates
were specified. These get retained, as the address operand is the only
one really needed to expressed non-default address size, but only for
compatibility reasons. Proper multi-operand insn get introduced and the
testcases get adjusted / extended accordingly.
While at it also drop the redundant definition of __amd64__ - we already
have x86_64 defined (or not) to distinguish 64-bit and non-64-bit cases.
In the majority of cases we can easily determine the length from the
encoding, irrespective of whether a prefix is specified there as well.
We further don't even need to record the value in the table entries, as
it's easy enough to determine it (without any guesswork, unless an insn
with major opcode 00 appeared that requires a 2nd opcode byte to be
specified explicitly) when installing the chosen template for further
processing.
Should an encoding appear which
- has a major opcode byte of 66, F3, or F2,
- requires a 2nd opcode byte to be specified explicitly,
- doesn't have a mandatory prefix
we'd need to convert all templates presently encoding a mandatory prefix
this way to the Prefix_0X<nn> model to eliminate the respective guessing
i386-gen does.
Just like is already done for legacy encoded insns, record the mandatory
prefix information in the respective opcode modifier field. Do this
without changing the source table, but rather by deriving the values from
their existing source representation.
This is in preparation of opcode_length going away as a field in the
templates. Identify pseudo prefixes by a base opcode of zero instead:
No real prefix has an opcode of zero. This at the same time allows
dropping a curious special case from i386-gen.
Since most attributes are identical for all pseudo prefixes, take the
opportunity and also template them.
In preparation to use PREFIX_0X<nn> attributes also in VEX/XOP/EVEX
encoding templates, renumber the pseudo-enumerators such that their
values can then also be used directly in the respective prefix bit
fields.
Commit 8b65b8953a ("x86: Remove the prefix byte from non-VEX/EVEX
base_opcode") used the opcodeprefix field for two distinct purposes. In
preparation of having VEX/XOP/EVEX and non-VEX templates become similar
in the representatioon of both encoding space and opcode prefixes, split
the field to have a separate one holding an insn's opcode space.