Implement the MVE VHCADD insn, which is similar to VCADD
but performs a halving step. This one overlaps with VADC.
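As a rough sketch of the per-lane arithmetic for the #90 rotation
(illustrative C only, not the helper code added by this patch; the
function name and lane layout are made up for the example):

    #include <stdint.h>

    /* Halving complex add, rotate 90, 32-bit lanes.  The halving is
     * done in 64-bit arithmetic so the intermediate sum/difference
     * cannot overflow before the >> 1.
     */
    static void vhcadd90_s32(int32_t *d, const int32_t *n,
                             const int32_t *m, int elems)
    {
        for (int i = 0; i < elems; i += 2) {
            int32_t re = ((int64_t)n[i] - m[i + 1]) >> 1;     /* real */
            int32_t im = ((int64_t)n[i + 1] + m[i]) >> 1;     /* imag */
            d[i] = re;
            d[i + 1] = im;
        }
    }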
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-43-peter.maydell@linaro.org
Implement the MVE VCADD insn, which performs a complex add with
rotate. Note that the size=0b11 encoding is VSBC.
The architecture grants some leeway for the "destination and Vm
source overlap" case when the size is MO_32, but we choose not to
make use of it, instead always calculating all 16 bytes worth of
results before setting the destination register.
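A sketch of the two rotations, computing every result into a
temporary before writeback as described above (illustrative C only;
the names are made up for the example, and unsigned arithmetic is
used so wraparound is well defined -- the operation is the same for
signed lanes):

    #include <stdint.h>
    #include <string.h>

    /* Complex add with rotate, 32-bit lanes; elems is at most 4
     * (16 bytes worth).  rot is 90 or 270.
     */
    static void vcadd_w(uint32_t *d, const uint32_t *n,
                        const uint32_t *m, int rot, int elems)
    {
        uint32_t tmp[4];
        for (int i = 0; i < elems; i += 2) {
            if (rot == 90) {
                tmp[i] = n[i] - m[i + 1];
                tmp[i + 1] = n[i + 1] + m[i];
            } else { /* rot == 270 */
                tmp[i] = n[i] + m[i + 1];
                tmp[i + 1] = n[i + 1] - m[i];
            }
        }
        memcpy(d, tmp, elems * sizeof(uint32_t));
    }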
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-42-peter.maydell@linaro.org
Implement the MVE VADC and VSBC insns. These perform an
add-with-carry or subtract-with-carry of the 32-bit elements in each
lane of the input vectors, where the carry-out of each add is the
carry-in of the next. The initial carry input is either 1 or is from
FPSCR.C; the carry out at the end is written back to FPSCR.C.
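The lane-to-lane carry chain is the unusual part; a minimal sketch
of the semantics, ignoring predication (illustrative C only):

    #include <stdint.h>
    #include <stdbool.h>

    /* Add-with-carry across the four 32-bit lanes of a 128-bit
     * vector.  carry_in is FPSCR.C, or the fixed initial value for
     * the VADCI/VSBCI forms; the return value is the new FPSCR.C.
     */
    static bool vadc_u32(uint32_t *d, const uint32_t *n,
                         const uint32_t *m, bool carry_in)
    {
        bool carry = carry_in;
        for (int i = 0; i < 4; i++) {
            uint64_t r = (uint64_t)n[i] + m[i] + carry;
            d[i] = (uint32_t)r;
            carry = r >> 32;
        }
        return carry;
    }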
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-41-peter.maydell@linaro.org
Implement the MVE VRHADD insn, which performs a rounded halving
addition.
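Per lane this is (a + b + 1) >> 1, computed so the intermediate sum
cannot overflow; a sketch for one signed byte lane (illustrative C
only):

    #include <stdint.h>

    static int8_t do_vrhadd_s8(int8_t a, int8_t b)
    {
        /* Widen so a + b + 1 cannot overflow before the halving */
        return ((int16_t)a + b + 1) >> 1;
    }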
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-40-peter.maydell@linaro.org
Implement the vector form of the MVE VQDMULL insn.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-39-peter.maydell@linaro.org
Implement the MVE VQDMLSDH and VQRDMLSDH insns, which are
like VQDMLADH and VQRDMLADH except that products are subtracted
rather than added.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-38-peter.maydell@linaro.org
Implement the MVE VQDMLADH and VQRDMLADH insns. These multiply
elements, and then add pairs of products, double, possibly round,
saturate and return the high half of the result.
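A sketch of one 16-bit result lane, doing the whole computation at
64-bit precision with a single saturation at the end (illustrative
C only; the actual helpers follow the architectural pseudocode more
closely):

    #include <stdint.h>
    #include <stdbool.h>

    static int16_t do_vqdmladh_h(int16_t a, int16_t b, int16_t c,
                                 int16_t d, bool round, bool *sat)
    {
        /* multiply, add the pair of products, double, maybe round */
        int64_t r = ((int64_t)a * b + (int64_t)c * d) * 2;
        if (round) {
            r += 1 << 15;
        }
        /* take the high half, saturating and noting QC on overflow */
        r >>= 16;
        if (r > INT16_MAX) {
            *sat = true;
            r = INT16_MAX;
        } else if (r < INT16_MIN) {
            *sat = true;
            r = INT16_MIN;
        }
        return r;
    }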
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-37-peter.maydell@linaro.org
Implement the MVE VQRSHL (vector) insn. Again, the code to perform
the actual shifts is borrowed from neon_helper.c.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-34-peter.maydell@linaro.org
Implement the MVE VQSHL insn (encoding T4, which is the
vector-shift-by-vector version).
The DO_SQSHL_OP and DO_UQSHL_OP macros here are derived from
the neon_helper.c code for qshl_u{8,16,32} and qshl_s{8,16,32}.
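The shape of those operations, for one signed 32-bit lane with a
shift count that may be negative (illustrative C only, simplified
from the neon_helper.c originals):

    #include <stdint.h>
    #include <stdbool.h>

    static int32_t do_sqshl_w(int32_t src, int8_t shift, bool *sat)
    {
        if (shift <= -32) {
            return src >> 31;          /* everything shifted out */
        } else if (shift < 0) {
            return src >> -shift;      /* right shifts cannot saturate */
        } else if (shift >= 32) {
            if (src == 0) {
                return 0;
            }
            *sat = true;
            return src > 0 ? INT32_MAX : INT32_MIN;
        } else {
            int64_t r = (int64_t)src << shift;
            if (r != (int32_t)r) {     /* no longer fits: saturate */
                *sat = true;
                return src > 0 ? INT32_MAX : INT32_MIN;
            }
            return (int32_t)r;
        }
    }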
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-33-peter.maydell@linaro.org
Implement the vector forms of the MVE VQADD and VQSUB insns.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-32-peter.maydell@linaro.org
Implement the vector forms of the MVE VQDMULH and VQRDMULH insns.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-31-peter.maydell@linaro.org
Implement the MVE VQDMULL scalar insn. This multiplies the top or
bottom half of each element by the scalar, doubles and saturates
to a double-width result.
Note that this encoding overlaps with VQADD and VQSUB; it uses
what in VQADD and VQSUB would be the 'size=0b11' encoding.
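For one 16-bit lane the arithmetic looks like this (illustrative C
only); the only possible overflow is INT16_MIN * INT16_MIN, whose
doubled product is exactly 2^31:

    #include <stdint.h>
    #include <stdbool.h>

    static int32_t do_vqdmull_h(int16_t n, int16_t m, bool *sat)
    {
        int64_t r = (int64_t)n * m * 2;   /* multiply and double */
        if (r > INT32_MAX) {              /* only INT16_MIN * INT16_MIN */
            *sat = true;
            return INT32_MAX;
        }
        return (int32_t)r;
    }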
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-30-peter.maydell@linaro.org
Implement the MVE VQDMULH and VQRDMULH scalar insns, which multiply
elements by the scalar, double, possibly round, take the high half
and saturate.
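A sketch of one 16-bit lane of the rounding form (illustrative C
only); as with VQDMULL, only INT16_MIN * INT16_MIN can saturate:

    #include <stdint.h>
    #include <stdbool.h>

    static int16_t do_vqrdmulh_h(int16_t n, int16_t m, bool *sat)
    {
        /* multiply, double, round, then take the high half */
        int32_t r = (int32_t)(((int64_t)n * m * 2 + (1 << 15)) >> 16);
        if (r > INT16_MAX) {
            *sat = true;
            return INT16_MAX;
        }
        return (int16_t)r;
    }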
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-29-peter.maydell@linaro.org
Implement the MVE VQADD and VQSUB insns, which perform saturating
addition of a scalar to each element. Note that individual bytes of
each result element are used or discarded according to the predicate
mask, but FPSCR.QC is only set if the predicate mask for the lowest
byte of the element is set.
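A sketch of that rule for one unsigned 32-bit lane (illustrative C
only; the byte-by-byte merging of the result under the mask is not
shown here):

    #include <stdint.h>
    #include <stdbool.h>

    /* 'mask' is the 4-bit slice of the predicate for this lane */
    static uint32_t do_vqadd_u32(uint32_t n, uint32_t scalar,
                                 uint8_t mask, bool *qc)
    {
        uint64_t r = (uint64_t)n + scalar;
        if (r > UINT32_MAX) {
            r = UINT32_MAX;
            if (mask & 1) {
                *qc = true;  /* QC only if the lowest byte is predicated */
            }
        }
        return (uint32_t)r;
    }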
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-28-peter.maydell@linaro.org
Implement the MVE VPST insn, which sets the predicate mask
fields in the VPR to the immediate value encoded in the insn.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-27-peter.maydell@linaro.org
Implement the MVE VBRSR insn, which reverses a specified
number of bits in each element, setting the rest to zero.
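A sketch for one 32-bit lane (illustrative C only; counts larger
than the element size are simply clamped here, which glosses over
the architectural treatment of out-of-range counts):

    #include <stdint.h>

    static uint32_t do_vbrsr_w(uint32_t n, unsigned bits)
    {
        uint32_t r = 0;
        if (bits > 32) {
            bits = 32;
        }
        /* reverse the bottom 'bits' bits; the rest stay zero */
        for (unsigned i = 0; i < bits; i++) {
            if (n & (1u << i)) {
                r |= 1u << (bits - 1 - i);
            }
        }
        return r;
    }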
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-26-peter.maydell@linaro.org
Implement the scalar variants of the MVE VHADD and VHSUB insns.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-25-peter.maydell@linaro.org
Implement the scalar forms of the MVE VSUB and VMUL insns.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-24-peter.maydell@linaro.org
Implement the scalar form of the MVE VADD insn. This takes the
scalar operand from a general purpose register.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-23-peter.maydell@linaro.org
Implement the MVE VRMLALDAVH and VRMLSLDAVH insns, which accumulate
the results of a rounded multiply of pairs of elements into a 72-bit
accumulator, returning the top 64 bits in a pair of general purpose
registers.
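A sketch of the arithmetic, ignoring predication and using the
__int128 GCC/Clang extension where the helpers use QEMU's Int128
type (illustrative C only):

    #include <stdint.h>

    static int64_t do_vrmlaldavh_s32(const int32_t *n,
                                     const int32_t *m, int64_t acc)
    {
        /* The incoming 64-bit accumulator holds bits [71:8] of the
         * 72-bit value, so shift it back up before accumulating.
         */
        __int128 a = (__int128)acc << 8;
        for (int e = 0; e < 4; e++) {
            a += (int64_t)n[e] * m[e];   /* full 64-bit product */
            a += 1 << 7;                 /* rounding */
        }
        return (int64_t)(a >> 8);        /* top 64 bits */
    }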
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-22-peter.maydell@linaro.org
Implement the MVE insn VMLSLDAV, which multiplies pairs of source
elements, alternately adding and subtracting the products, and
accumulates into a 64-bit result in a pair of general purpose
registers.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-21-peter.maydell@linaro.org
Implement the MVE VMLALDAV insn, which multiplies pairs of integer
elements, accumulating them into a 64-bit result in a pair of
general-purpose registers.
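The per-lane arithmetic is straightforward; a sketch for signed
16-bit elements, ignoring predication (illustrative C only):

    #include <stdint.h>

    static int64_t do_vmlaldav_s16(const int16_t *n, const int16_t *m,
                                   int64_t acc)
    {
        for (int e = 0; e < 8; e++) {
            acc += (int64_t)n[e] * m[e];  /* widening multiply-accumulate */
        }
        return acc;
    }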
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-20-peter.maydell@linaro.org
Implement the MVE VMULL insn, which multiplies two single
width integer elements to produce a double width result.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-19-peter.maydell@linaro.org
Implement MVE VHADD and VHSUB insns, which perform an addition
or subtraction and then halve the result.
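The halving can be done without a wider intermediate type; a sketch
for one signed 32-bit lane (illustrative C only):

    #include <stdint.h>

    static int32_t do_vhadd_s32(int32_t a, int32_t b)
    {
        /* (a + b) >> 1 without overflow: halve each operand and add
         * back the carry out of the two discarded low bits.
         */
        return (a >> 1) + (b >> 1) + (a & b & 1);
    }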
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-18-peter.maydell@linaro.org
Implement the MVE VABD insn.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-17-peter.maydell@linaro.org
Implement the MVE VMAX and VMIN insns.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-16-peter.maydell@linaro.org
Implement the MVE VRMULH insn, which performs a rounding multiply
and then returns the high half.
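Per lane this is a widened multiply with a rounding bit added just
below the half that is kept; a sketch for signed 16-bit elements
(illustrative C only):

    #include <stdint.h>

    static int16_t do_vrmulh_s16(int16_t a, int16_t b)
    {
        /* 32-bit product plus rounding bit, then keep the top half */
        return ((int32_t)a * b + (1 << 15)) >> 16;
    }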
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-15-peter.maydell@linaro.org
Implement the MVE VMULH insn, which performs a vector
multiply and returns the high half of the result.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-14-peter.maydell@linaro.org
Implement the MVE VADD, VSUB and VMUL insns.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-13-peter.maydell@linaro.org
Implement the MVE vector logical operations that operate
on two registers.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-12-peter.maydell@linaro.org
Implement the MVE VDUP insn, which duplicates a value from
a general-purpose register into every lane of a vector
register (subject to predication).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-11-peter.maydell@linaro.org
The Arm MVE VDUP implementation would like to be able to emit code to
duplicate a byte or halfword value into an i32. We have code to do
this already in tcg-op-gvec.c, so all we need to do is make the
functions global.
For consistency with other functions made available to the frontends:
* we rename to tcg_gen_dup_*
* we expose both the _i32 and _i64 forms
* we provide the #define for a _tl form
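The replicated value itself is just the input multiplied by a
suitable constant; a standalone sketch of the computation the
generated code performs (illustrative C only):

    #include <stdint.h>

    static uint32_t dup_byte_i32(uint8_t b)
    {
        return (uint32_t)b * 0x01010101u;   /* b replicated 4 times */
    }

    static uint32_t dup_half_i32(uint16_t h)
    {
        return (uint32_t)h * 0x00010001u;   /* h replicated twice */
    }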
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210617121628.20116-10-peter.maydell@linaro.org
Implement the MVE VNEG insn (both integer and floating point forms).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-9-peter.maydell@linaro.org
Implement the MVE VABS functions (both integer and floating point).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-8-peter.maydell@linaro.org
Implement the MVE VMVN (register) operation. Note that for
predication this operation is byte-by-byte.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-7-peter.maydell@linaro.org
Implement the MVE instructions VREV16, VREV32 and VREV64.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-6-peter.maydell@linaro.org
Implement the MVE VCLS insn.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-5-peter.maydell@linaro.org
Implement the MVE VCLZ insn (and the necessary machinery
for MVE 1-input vector ops).
Note that for non-load instructions predication is always performed
at byte-level granularity regardless of element size (R_ZLSJ),
and so the masking logic here differs from that used in the VLDR
and VSTR helpers.
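The shape of that masking, for one 64-bit chunk of the vector with
one predicate bit per byte (illustrative C only, not the macros the
patch adds):

    #include <stdint.h>

    static uint64_t mergemask64(uint64_t old, uint64_t result,
                                uint8_t mask)
    {
        uint64_t m = 0;
        for (int i = 0; i < 8; i++) {
            if (mask & (1u << i)) {
                m |= 0xffull << (i * 8);   /* this byte is predicated */
            }
        }
        /* predicated bytes take the result, others keep the old value */
        return (old & ~m) | (result & m);
    }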
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-4-peter.maydell@linaro.org
Implement the variants of MVE VLDR (encodings T1, T2) which perform
"widening" loads where bytes or halfwords are loaded from memory and
zero or sign-extended into halfword or word length vector elements,
and the narrowing MVE VSTR (encodings T1, T2) where bytes or
halfwords are stored from halfword or word elements.
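Per lane the widening and narrowing are just extensions and
truncations; a sketch of the signed byte-to-halfword pair
(illustrative C only):

    #include <stdint.h>

    static int16_t do_ldrb_sh(const uint8_t *mem)
    {
        return (int8_t)mem[0];     /* sign-extend byte to halfword */
    }

    static void do_strb_h(uint8_t *mem, int16_t elem)
    {
        mem[0] = (uint8_t)elem;    /* narrowing store keeps the low byte */
    }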
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-3-peter.maydell@linaro.org
Implement the forms of the MVE VLDR and VSTR insns which perform
non-widening loads of bytes, halfwords or words from memory into
vector elements of the same width (encodings T5, T6, T7).
(At the moment we know for MVE and M-profile in general that
vfp_access_check() can never return false, but we include the
conventional return-true-on-failure check for consistency
with non-M-profile translation code.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-2-peter.maydell@linaro.org
Instead of open-coding the "take NOCP exception if FPU disabled,
otherwise call gen_preserve_fp_state()" code in the accessors for
FPCXT_NS, add an argument to vfp_access_check_m() which tells it to
skip the gen_update_fp_context() call, so we can use it for the
FPCXT_NS case.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-8-peter.maydell@linaro.org
vfp_access_check() and its helper routine full_vfp_access_check()
have gradually grown and are now an awkward mix of A-profile only
and M-profile only pieces. Refactor it into an A-profile only and an
M-profile only version, taking advantage of the fact that now the
only direct call to full_vfp_access_check() is in A-profile-only
code.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-7-peter.maydell@linaro.org
Factor the code in full_vfp_access_check() which updates the
ownership of the FP context and creates a new FP context
out into its own function.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-6-peter.maydell@linaro.org
A few subcases of VLDR/VSTR sysreg succeed but do not perform a
memory access:
* VSTR of VPR when unprivileged
* VLDR to VPR when unprivileged
* VLDR to FPCXT_NS when fpInactive
In these cases, even though we don't do the memory access we should
still update the base register and perform the stack limit check if
the insn's addressing mode specifies writeback. Our implementation
failed to do this, because we handle these side-effects inside the
memory_to_fp_sysreg() and fp_sysreg_to_memory() callback functions,
which are only called if there's something to load or store.
Fix this by adding an extra argument to the callbacks which is set to
true to actually perform the access and false to only do side effects
like writeback, and calling the callback with do_access = false
for the three cases listed above.
This produces slightly suboptimal code for the case of a write
to FPCXT_NS when the FPU is inactive and the insn didn't have
side effects (i.e. no writeback, or via VMSR), in which case we'll
generate a conditional branch over an unconditional branch.
But this doesn't seem to be important enough to merit requiring
the callback to report back whether it generated any code or not.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-5-peter.maydell@linaro.org
The M-profile architecture requires that accesses to FPCXT_NS when
there is no active FP state must not take a NOCP fault even if the
FPU is disabled. We were not implementing this correctly, because
in our decode we catch the NOCP faults early in m-nocp.decode.
Fix this bug by moving all the handling of M-profile FP system
register accesses from vfp.decode into m-nocp.decode and putting
it above the NOCP blocks. This provides the correct behaviour:
* for accesses other than FPCXT_NS the trans functions call
  vfp_access_check(), which will check for FPU disabled and
  raise a NOCP exception if necessary
* for FPCXT_NS we have the special case code that doesn't
  call vfp_access_check()
* when these trans functions want to raise an UNDEF they return
  false, so the decoder will fall through into the NOCP blocks.
  This means that NOCP correctly takes precedence over UNDEF
  for these insns. (This is a difference from the other insns
  handled by m-nocp.decode, where UNDEF takes precedence and
  which we implement by having those trans functions call
  unallocated_encoding() in the appropriate places.)
[Note for backport to stable: this commit has a semantic dependency
on commit 9a486856e9, which was not marked as cc-stable because
we didn't know we'd need it for a for-stable bugfix.]
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-4-peter.maydell@linaro.org
If the guest makes an FPCXT_NS access when the FPU is disabled,
one of two things happens:
* if there is no active FP context, then the insn behaves the
  same way as if the FPU was enabled: writes ignored, reads
  same value as FPDSCR_NS
* if there is an active FP context, then we take a NOCP
  exception
Add code to the sysreg read/write functions which emits
code to take the NOCP exception in the latter case.
At the moment this will never be used, because the NOCP checks in
m-nocp.decode happen first, and so the trans functions are never
called when the FPU is disabled. The code will be needed when we
move the sysreg access insns to before the NOCP patterns in the
following commit.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-3-peter.maydell@linaro.org
In the code for handling VFP system register accesses there is some
stray whitespace after a unary '-' operator, and also some incorrect
indent in a couple of function prototypes. We're about to move this
code to another file, so fix the code style issues first so
checkpatch doesn't complain about the code-movement patch.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-2-peter.maydell@linaro.org
These days the Arm architecture has a wide range of fine-grained
optional extra architectural features. We implement quite a lot
of these but by no means all of them. Document what we do implement,
so that users can find out without having to dig through back-issues
of our Changelog on the wiki.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20210617140328.28622-1-peter.maydell@linaro.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>