We had tables for these in the disassembler already, but I'd like to use
them in brw_print.cpp as well. Just wrap the tables in convenience
functions we can use there.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
This is a new unified set of opcodes for memory access loosely patterned
after the new LSC-style data port messages introduced on Alchemist GPUs.
Rather than creating an opcode for every type of memory access, it has
only three opcodes: load, store, and atomic. It has various sources to
indicate the rest:
- Binding type (raw pointer, pointer to surface state, or BT index)
- Address size (A64, A32, A16)
- Data size (bit size, number of components)
- Opcode (atomic opcode, or LOAD/STORE vs. LOAD_CMASK/STORE_CMASK)
- Mode (typed vs. untyped vs. shared-local vs. scratch)
- Address (and its dimensionality)
- Data (0 for loads, 1 for stores, 2 for atomics)
- Whether we want block access
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
This is going to handle more than atomics shortly.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
The intention of inst->is_partial_write() is that it should return true
when any REG_SIZE (32B) chunk of inst's destination is written but not
fully overwritten. This can be used to tell whether inst combines new
data with existing data, or screens off any previous writes, so the old
values are no longer required.
The existing (exec_size * brw_type_size_bytes(this->dst.type) < 32)
check doesn't work in a number of cases. For example, LSC block loads
have exec_size == 1 and force_writemask_all set, but may write multiple
full registers of data. (Currently, we only see them with exec_size 1
after logical-send-lowering, so our SHADER_OPCODE_SEND special case
was covering those.) We had also special cased UNDEF.
Instead, we can simply check:
1. Predication
2. !inst->dst.contiguous()
3. inst->dst.offset % REG_SIZE != 0
4. inst->size_written % REG_SIZE != 0
We had the first three already, but #4 is new. If either #3 or #4
are true, then that implies there is a REG_SIZE chunk of the destination
which is written, but not entirely written, so it's a partial write.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
The intention here is to detect ALU hardware instructions, but not
virtual instructions that haven't been explicitly whitelisted.
For some reason we had arbitrarily hardcoded 128 here, but our virtual
opcodes don't start at 128. They start at NUM_BRW_OPCODES. So, use
that instead.
This prevents regressions later when we delete some opcodes, shifting
some virtual opcodes into the 72-128 range.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
Near/far depth clip was implemented by setting the low/high_depth_clamp
to -/+INFINITY, which is invalid on Mali.
This commit removes the modification of the depth clamp values and
enables depth clipping by setting the depth_cull_enable state in the ZS
descriptor for v9+ (the equivalent pre-v9 RSD state is already set
correctly) and setting the primitive cull flags correctly for both.
Finally, it disables PIPE_CAP_DEPTH_CLIP_DISABLE_SEPARATE for v9+, as
both plane clips are controlled by a single value.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11506
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31041>
The global thread storage descriptor can be allocated in the dispatch
path, which means the test on tsd != NULL might be true when the first
RUN_IDVS() happens, and we never set r24 to the TSD address.
Fix that by keeping a tsd field in panvk_cmd_graphics_state that's only
set in the draw path.
Reported-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Tested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Tested-by: Alexandre ARNOUD <aarnoud@me.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31147>
Instead of unpacking the x86_64_build container and its billion build
dependencies every time, switch to using only what's in the minimal
pyutils container, and the Python scripts we get as an artifact from the
python-test job. Pulling the artifacts from S3 rather than using GitLab
is also much more efficient.
This should substantially reduce the runtime required to get to testing.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31151>
Currently, our jobs which only want to run a little bit of python -
python-test and the LAVA jobs - pull the entire x86_64-build image,
which is both massive, and massively unnecessary.
Create a separate image which only carries what we need to run our
Python tests and utilities, and switch python-test to using that.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31151>
It has proven to be useful.
Due to the .rusticl-rules reference, job was already running in pre-merge,
so let's make it official.
Reviewed-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Signed-off-by: David Heidelberg <david@ixit.cz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31144>
Note that D3D12_VIDEO_ENCODER_SUPPORT_FLAG_READABLE_RECONSTRUCTED_PICTURE_LAYOUT_AVAILABLE
was added in DirectX-Headers v1.614.1
Reviewed-By: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31092>
As sampler view can be used multiple times, do not attempt to rebind if
it was already bound.
This fixes a crash when replaying half-life-2-v2.trace.
Backport-to: 24.2
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31049>
This reverts commit 0b85476d86.
When mapping a BO in v3d, the map keeps forever until freeing the BO. If
later the map is required again, we reuse the map instead of doing the
map from scratch.
This saves calling map/unmap continuously, as well as a mechanism to
keep control of the map usage, like a reference count.
Thus, when reallocating a BO, if it is mapped it just means the map was
used in the past, but not necessarily it is in use right now.
The reverted commit was causing performance regressions in multiple
applications, reducing from 60fps to 5fps.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11783
Backport-to: 24.2
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31049>
We currently have 3 runners with 7950X3D CPUs and 7900XT GPUs which
can rip through VKCTS in ~16 minutes.
Since this is over the 15 minutes threshold, we parallelize the job to
get under 15 minutes, which nets up ~10 minutes total runtime.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31085>
Coverity alerts that the uint32_t pointer I was passing into
isl_color_value_pack() could possibly be used as an array. The value is
being used as such, but only the first element of that array should be
accessed. That's because the depth buffer formats I'm also passing into
the function only have a single channel, R. Nonetheless, let's update
the code to avoid the warning.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31123>
this is tricky since the ownership model here works the other
way around from normal: the resolve surface's lifetime is
determined by the resource's lifetime, meaning that
the surface should be destroyed by the resource instead
of the resource being destroyed by the surface
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31124>
panfrost_invert_swizzle produces a (one-sided) inverse, that is a
function g such that g(f(x)) = x. However, it is actually used in
some cases where we want a two-sided inverse, where we also have
f(g(x)) = x. An example is its use to pre-swizzle border color
values so that a later pass will un-swizzle them. If the swizzle
is not one-to-one this two sided inverse does not exist. However,
we can do better than the original code, e.g. for an RRR1 swizzle
inverse produced was originally B000, which when applied on the wrong
side results in BBB1 as output, whereas R000 would produce the
desired RRR1 output. Using the first valid component we see, rather
than the last one, is thus usually better.
The "correct" solution is to re-write all the code that uses
an inverse to handle non-unique inverses. But frankly these uses only
crop up in fairly niche cases like tests, and it's probably not worth
spending a lot of effort to deal with these edge cases when this
patch fixes most of them.
Fixes some failing piglit ext_framebuffer_multisample tests.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31060>
- More robust.
- Handles properly UBO cases, needed for proper OpenCL support (rusticl).
- Resolved KHR-GL46.gpu_shader_fp64.fp64.max_uniform_components failure.
Fixes: f5ce806ed7 ("freedreno/ir3: Add wide load/store lowering")
Reviewed-by: Rob Clark <robdclark@freedesktop.org>
Co-authored-by: Rob Clark <robclark@freedesktop.org>
Signed-off-by: David Heidelberg <david@ixit.cz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30961>
Patch panvk_macros.h to add a case for v10 and accept v10 devices.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30969>
With its sync object primitive, CSF allows native event support.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30969>
With those two components implemented, we can now compile all common
per-arch source files.
Co-developed-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30969>
Events cross the device boundary and can be read/written by the host.
We could add the necessary cache flush and made those GPU-cached, but
it's way simpler to allocate non-cached memory.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30969>
Panthor allows us to control the GPU virtual address space,
which, among other things, will allow us to support sparse
bindings.
Let's tweak the device initialization and BO mapping to
takes this explicit VA management mode into account.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30969>