Those are i915_drm.h specific types and should not be in code paths
shared by i915 and Xe KMD.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21887>
Skip heavyweight crashing tests that have the potential to take down not just
themselves but also other Piglit tests running concurrently via piglit-runner
(which would otherwise become piglit-runner level flakes).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
These are (have always been) quite broken. Given that the whole section is
already in the flakes.txt, and there's no plan for improving this (I've tried
and fails), I'd rather just skip the section and reduce the noise in
the #panfrost-ci channel.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
This is really dumb.
But this fixes arb_shader_language_420pack-active-sampler-conflict on v7 which
otherwise dereferences a null pointer trying to access the nonexistant texture
arrays, or DATA_INVALID_FAULTs if you give it a texture array filled with
zeroes. But it seems happy if you bind in null textures. This is dumb but less
faults in Piglit is good for reducing flakes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
mesa/st doesn't like to use 24-bit textures, preferring RGBX over true RGB even
for texture views where this isn't valid. Given how silly true RGB is in
practice, I'd rather drop support and fix texture views than go against the
grain and risk more issues down the line since nobody else in tree is testing
these paths and apps really shouldn't be caring.
Fixes page faults in arb_texture_view-rendering-formats_gles3 which tries to
sample an R8G8B8_UINT texture with a R8G8B8X8_UNORM view in one subcase. That
test is now passing reliably.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
This is signed, not unsigned. We were already passing negatives and silently
relying on 2's complement and C to do the right thing. But that's silly. We
should just, actually do the right thing.
Found while struggling to debug primitive-restart-draw-mode.
v2: Update the other architectures too, including a decode_csf.c change for the
v10 incarnation of this v4-era field.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net> [v1]
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
We just want a bit-exact transfer for integers. Using .auto32 accomplishes this
without any clamping shenanigans. Fixes gl-3.0-vertexattribipointer.
Note we can't use .auto32 unconditionally, since reading a uint vertex as float
is supposed to convert (or something like that, gl-2.0-vertexattribpointer tests
the bad case at any rate).
Fixes: 482cc273af ("pan/bi: Implement load attribute with the builder")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
draw->index_bounds_valid tells drivers that the values of min_index/max_index
are set correctly and can be used e.g. to allocate memory for varyings. If set
incorrectly, the GL promises badness.
But, with primconvert, we go mucking with index buffers and then never update
the bounds. So it doesn't matter if the original index bounds were valid, we
can't promise the original bounds are *still* valid. If we were trying to
optimize CPU overhead, we could try to preserve the new min/max index but seeing
as only older Mali cares about this flag, and if you're using primconvert you're
already screwed, I'm not too inclined to go rework primconvert.
Fixes* page faults in primitive-restart-draw-mode on Mali-G52 for GL_QUAD_STRIPS
and GL_POLYGON, which hit the primconvert path. The full dmesg splat looks like:
[ 5438.811727] panfrost ffe40000.gpu: Unhandled Page fault in AS0 at VA 0x000000100A16BAC0
Reason: TODO
raw fault status: 0x25002C1
decoded fault status: SLAVE FAULT
exception type 0xC1: TRANSLATION_FAULT_1
access type 0x2: READ
source id 0x250
Notice that a high bit is randomly set in the address, this is trying to read
a varying from the actual varying buffer in the vicinity of 0xa16bac0. What's
actually happening is that we're trying to read index #0 despite promising the
driver a minimum index of 2, causing an integer underflow as we try to read
index -2, or as the hardware sees, 4294967294.
As long as we stop lying to panfrost about the bounds being correct, panfrost is
able to calculate the real (post-primconverted) bounds on its own, fixing the
test.
* Alternatively, maybe Panfrost should just ignore this bit, in which I don't
know why we have it in Gallium, since it's probably not conformant to fault on
out-of-range glDrawRangeElements.
Fixes: 72ff53098c ("gallium: add pipe_draw_info::index_bounds_valid")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
depth_stencil_attachment has been changed from a pointer to the attachment idx
to just the attachment idx, as this avoids the driver having to check for NULL
when comparing attachments indexes with depth_stencil_attachment.
Anyplace that relies on depth_stencil_attachment being a valid index must
already check that depth_stencil_attachment is not VK_ATTACHMENT_UNUSED, so
this change avoids having to check both the pointer and the index for the same
information.
Noticed when running dEQP-VK.api.smoke.triangle
Signed-off-by: Jarred Davies <jarred.davies@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21690>
This also makes vn_QueueSignalReleaseImageANDROID async since it makes
use of a queue submit followed by an external fence export internally.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21716>
vn_instance_roundtrip does 2 things:
1. vn_instance_submit_roundtrip
- before: encode a cmd to write vq seqno to ring extra field
- after: encode a cmd to update vq seqno against a ring
- submit the encoded cmd via vq
2. vn_instance_wait_roundtrip
- before: wait until ring extra field has the vq seqno
- after: let renderer ring thread wait for the vq seqno
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21716>
Up until now, we have been handrolling part of the init-stage2.sh in
the b2c command line. Let's stop doing that and instead use the same
script as every other HW farms.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21872>
First make sure shadow mask change sets dirty state.
Second move shadow mask bit removal to unbind_samplerview which
is cleaner and correctly clears the shadow bit when binding buffer texture.
Fixes: 5193f4f712 ("zink: add a fs shader key member to indicate depth texturing mode")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21571>
usually zink_get_cmdbuf() is enough for reordering operations, but
with new technology, it becomes possible to promote even the most stubborn
buffers to the unordered cmdbuf
first, check the src buffer to ensure that there's no pending writes in
the main cmdbuf that would prohibit reordering
second, apply a TRANSFER_DST to the dst buffer using the util function
to determine whether it can be reordered
if both the src and dst can be reordered for their respective regions
and read/write usage, then the entire op can be promoted regardless of
the unordered_read/unordered_write flags
this optimizes out patterns like
upload index buffer (offset=0)
draw
upload index buffer (offset=128)
draw
upload index buffer (offset=256)
draw
...
so that the uploads and draws can be separated and batched
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21802>
I wanted to find slow pieces of code in our Anv driver using our
drm-shim stub.
The last bit of code still talking to the compositor was the WSI
swapchain code and failing because none of the submissions are taking
place (because of the stub).
This change introduces a new variable MESA_VK_WSI_HEADLESS_SWAPCHAIN
which when set turns every swapchain creation into a headless
swapchain. This swapchain does not present anything, allowing the
application to spin as many frames as possible. Thus helping to
identify slow spots in command buffer building path.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6156>
nir->info.subgroup_size can be set to an enum :
SUBGROUP_SIZE_VARYING = 0
SUBGROUP_SIZE_UNIFORM = 1
SUBGROUP_SIZE_API_CONSTANT = 2
SUBGROUP_SIZE_FULL_SUBGROUPS = 3
So compute the API subgroup size value and compare it to the dispatch
size to determine whether we need some bound checking.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9ac192d79d ("intel/fs: bound subgroup invocation read to dispatch size")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21856>
I'm not planning to stand mesa-swrast back up until we get Kata set up, so
turn the testing back on at a reduced fraction on so that
venus/llvmpipe/etc. dev can still get some coverage.
I haven't turned lavapipe back on, because it is now unstable in memory
model / atomics tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21880>
For instance, to load uniform data with the LSC we usually rely on
tranpose messages which have to execute in SIMD1. Those end up being
considered as partial writes so within loops their life span spread to
the whole loop, increasing register pressure.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21867>
on pipeline bind with dynamic state, depth_clip_near needs to either be set by
* applying the dynamic state
* using the pipeline state
the previous code always used the pipeline state
fixes:
dEQP-VK.pipeline.*.extended_dynamic_state.between_pipelines.depth_clamp_enable
Fixes: 650880105e ("vulkan,lavapipe: Use a tri-state enum for depth clip enable")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21814>
if it's known that a renderpass is active and the driver wants to do
renderpass optimizing, help out by not forcing a sync and instead doing
what the driver would do: create a staging buffer and copy it to the
image
this requires that the driver already handles buffer -> image copies
with resource_copy_region
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21801>
It is legal to pass in nullptr as an instance into
vkGetInstanceProcAddr when resolving any global addresses, this
wasn't handled correctly and an illegal access to a member of
a null struct was made.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21827>
This commit rewrites the KGSL backend to utilize vk common wherever
possible to bring the codebase in line with DRM while implicitly
fixing minor API bugs that may have occurred as a result of manually
implementing VK functions.
As a part of moving to vk common, KGSL sync is now implemented
atop vk common sync and vastly expanded in terms of functionality
such as:
* Import/Export of sync FDs - A required capability for properly
supporting the Android WSI and as these functions were stubbed
when a presentation operation used semaphores, it would cause a
leak of FDs that were imported due to the expectation that the
driver would close them. As well as causing UB around due to
ignoring the imported FD or not exporting a valid FD.
* Supporting pre-signalled fences - Vulkan allows fences to be
created in a signalled state which was stubbed prior and can
lead to UB.
* Timeline semaphore support - As a result of utilizing vk common
as the backbone for synchronization, its timeline semaphore
emulation has been utilized to provide support for them without
needing kernel support. (Note: On newer versions of KGSL,
timeline semaphores can be implemented natively rather than
using emulation as they support wait-before-signal)
Fixes freezes due to semaphore usage with presentation on:
* Genshin Impact
* Skyline Emulator
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21651>
Enough that we can use OUT_PKT() to emit it, which will be needed when
we use it to write regs that are different btwn a6xx and a7xx.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
Originally if we had an anonymous field (ie. field declared as part of
the register definition itself) the name in the generated field struct
would include the gen prefix (ie. .a6xx_rb_stencil_buffer_pitch), but
this doesn't work for variants because the variant regs would have
different gen prefixes. Fix this by using reg name instead of the
full_name.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
For regs with multiple variants, generate a template'ized function to
pack the reg value. If the template param is known at compile time
(which is the expected usage) this will optimize to the same thing as
the "traditional" reg packing.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
Track varset and assert that variants refer to a valid varset enum
value. This adds a bit of extra sanity checking, but becomes more
useful in the next commit.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
They have more similarities than differences, so merge them and use
"variant" attribute as needed to manage differences.
Note initially using "variant" conservatively when it comes to regs
known on a7xx but not a6xx. It could be that they exist also on later
versions of a6xx as well, for example. For ex, LPAC related regs/bits
likely existed on later a6xx (eg. a660 family) but BV stuff is not.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
To merge a7xx and a6xx regs, using variant property to manage the
differences, we'll want regs/etc to be named according to the first
generation it is use rather than the domain name. Add a new prefix
type to accomplish this. By default, if no variant property, things
will still be named based on domain (ie. REG_A6XX_...), and things
that have variant="A6XX" will also end up as they currently are
(since the chip enum matches domain name), but things that have
variant="A7XX" will end up as REG_A7XX_...
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
Clang seems more relaxed about this, allowing C99 style initializers
without requiring ordering. But unfortunately g++ is more picky :-/
TODO this doesn't completely fix everything with g++, namely sparse
array initialization.. for ir3 driver-params, I think we can convert
these to structs. But there are still one or two others to deal with.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
C++ is more picky about a goto jumping over variable initialization,
even if unused after the goto label (presumably because of destructors
that can be called after a variable goes out of scope). Since there is
only a single fallback path, get rid of the goto.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
If both, any-hit and intersection shader, use scratch vars,
it could happen that they end up in the same location and
overwrite each other.
Found by inspection.
Fixes: c3d82a9622 ('radv: Add pass to lower anyhit shader into an intersection shader.')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21863>
This should allow us to remove a big memset when compiling a
graphics pipeline. This is mostly for imported NIR stages which
don't go through radv_pipeline_stage_init().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20947>
This should allow us to remove a big memset when compiling a
graphics pipeline. This is mostly for imported NIR stages which
don't go through radv_pipeline_stage_init().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20947>
In file included from ../src/tool/pps/pps_device.cc:10:
../src/tool/pps/pps_device.h:23:11: error: ‘uint32_t’ does not name a type
23 | static uint32_t device_count();
| ^~~~~~~~
In file included from ../src/tool/pps/pps_counter.cc:10:
../src/tool/pps/pps_counter.h:22:4: error: ‘uint32_t’ does not name a type
22 | uint32_t id;
| ^~~~~~~~
Fixes: 1cc72b2aef ("pps: Gfx-pps v0.3.0")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8186
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21714>
We have 3 new/changed functions with this commit:
1. _mesa_alloc_dispatch_tables creates all dispatch tables that are not
created on demand and sets them to nop. This operates on gl_dispatch,
so it's reusable (e.g. glthread will want to use it)
2. _mesa_free_dispatch_tables frees everything
3. _mesa_initialize_dispatch_tables initializes gl_dispatch for GL
(not glthread)
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21777>
I like this more. The name self-documents itself. It's always equal
to the dispatch set in glapi.
GLAPI is a definition, so can't use that.
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21777>
There is a new struct gl_dispatch, which I'd like to reuse in glthread.
This allows building code around gl_dispatch that can be shared between
mesa and glthread. This is only refactoring.
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21777>
Prepare for Clover removal; don't waste resources on Clover anymore.
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21865>
Upstream moved Host.h from Support to TargetParser in LLVM 17.
This shouldn't lead to a FTBFS, since there is a forwarding include left
behind. Sadly the added deprecation warning #pragma is invalid and thus
causes a build failure right away. But since we would have to follow the
move anyway in the future, just do it right away.
Reference: d768bf994f
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Closes: #8275
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21263>
After enabling the Wayland platform for x86_64,
multiple new tests were triggered, some of which timed out.
Also wayland-dEQP-EGL.functional.negative_api.create_pixmap_surface now pass.
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21786>
Currently, the generated tests consist of some boilerplate, generated
test cases, and at the very end the actual test. This is bad for readability,
because the actual code is all the way at the bottom. It's also bad for
clang-format linting: even though the test cases are /* clang-format off */,
they still take an exceptionally long time to parse when linting. I suspect this
is a clang-format bug, but it's easy enough to workaround.
To solve these issues, restructure so that the test cases are in separate files
(containing the actual data), but the manually written test functions are
consolidated into a new family of generated layout tests. This is probably
cleaner.
Parallel clang-format linting is now 10x faster on the M1, which means it's
now practical to lint in my "publish branch" hook.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21854>
Not using it yet, that will be done in the next patch.
Xe only supports submission using VM.
For i915 the backend functions are just a noop.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21698>
As noted on [1], lowering patterns of the form
floatS -> floatB -> floatS ==> floatS
cannot require precision since this may cause flush denorming.
[1] 3f779013 ("nir: Add an algebraic optimization for float->double->float")
Fixes: b86305bb ("nir/algebraic: collapse conversion opcodes (many patterns)")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>
Conversions on smaller bit sizes should also be collapsed when composed.
This also adds more patterns on the
intS -> intB -> floatB ==> intS -> floatB
lowering so as to deal with any int size C > B instead of a fixed intB.
Closes: #7776
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>
Some patterns were outside the list of optimizations.
Fixes: b86305bb ("nir/algebraic: collapse conversion opcodes (many patterns)")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>
Rather than ingesting separate control and memory barriers, ingest only the
combined and optimized scoped_barrier intrinsic. For barriers originating from
GLSL, this makes it easier to ensure correctness. For barriers originating from
SPIR-V, this is required for translation at all, as spirv_to_nir knows only
scoped barriers. So this gets us closer to Vulkan and OpenCL.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21752>
New 10 devices - asus-CM1400CXA-dalboz hosted on Collabora farm.
1x Move VA-API tests to the dalboz (more resources). One timeout dropped.
9x Run VKCTS on dalboz.
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21702>
Lowers away 64-bit loads, which we'll create in the sysval lowering for
dynamically indexed UBOs/VBOs. The lowering generates pack_64_2x32 instructions,
so lower those too.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
Instead of lowering to bitwise ops. Yet another way of subdividing in NIR.
Probably insignificant but makes it easy to check that the pass ordering from the
previous pass is right. It does let us get much better codegen for
unpacksnorm2x16, whatever that's worth.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
As intended. We can't CSE with partial null destinations in the way, so we
shouldn't eliminate dead destinations until after CSE has run. But we should
still eliminate dead instructions to ensure CSE doesn't move things around
needlessly, hurting register pressure.
Noticed while debugging live range splitting.
No GLES3.0 shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
Our dead code elimination pass does two things:
1. delete instructions that are entirely unnecessary
2. delete unnecessary destinations of necessary instructions
To deal with pass ordering issues, we sometimes want to do #1 without #2.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
We should handle nir_op_unpack_32_2x16_split_* natively, since we can generate
better code with agx_subdivide (coalescing the ops away) than the bitshift
lowering.
That said, we do need some extra instructions for the floating point
conversions.
No shader-db changes (which makes sense because we're targetting the GLES3.0
shader-db, which doesn't have the packing GLSL functions).
The real motivation of this change isn't optimizing some GLSL pack functions,
though, it's avoiding a code regression from using NIR's memory bit size
lowering in a future MR. That lowering will turn things like "load i16vec4" into
"load i32vec2 + unpack_32_2x16", so we need to be able to coalesce that unpack.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
At some point in a refactoring long ago, our 'Piglit' runs on arm64
started actually being dEQP-GLES2 runs. Oh dear.
Surprisingly, there are a number of expectation changes; added every
fail I saw from a long overnight stress test.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21851>
Minimum/maximum LOD and LOD bias are unsigned and signed fixed point formats
respectively. They are not unsigned integers. Introduce fixed-point types into
our GenXML and use them in the XML, rather than packing in sidebands. This makes
the XML more correct and fixes pretty-printing of texture and sampler
descriptors.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20445>
Connor Abbott wrote a nice explanation of how instance divisors work on Mali.
Let's add it to the driver docs instead of letting it languish in a forgotten
header file.
This is mostly pasted from the existing header in tree, with a few local changes
applied.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20445>
Nowadays, formats are defined with GenXML, not the old panfrost-job.h, so most
of the format #defines in panfrost-job.h are unused. That said, a few are still
in use as a backdoor for compressed format queries to avoid a GenXML dependency.
That's not great but cleaning that up isn't the subject of this MR.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20445>
XE architecture enables many more metrics, perhaps too many for
the average user. Reduce reported metrics to smaller subset,
known as non-extended metrics, by default. Can re-enable extended
metrics with env var INTEL_EXTENDED_METRICS=1
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21841>
Implement buffer textures in full generality. There are a few issues here:
* OpenGL requires buffer textures support a minimum size of 65536 elements,
however 1D textures in AGX are (at most) 8192 elements.
* OpenGL 4.0 (and OpenGL ES) require buffer textures to support the "RGB32"
texture formats. These are 3 packed channels of 32-bits each. In general,
non-power-of-two texel sizes are problematic. AGX does not support any such
formats and we rely on the GL frontend to lower to a padded format (RGBX) if
necessary. Such a lowering cannot work for buffer textures, however, so we
need to find a way to implement RGB32 buffer textures.
We solve these issues in the follow way:
* Use 2D texture descriptors for buffer textures, with a large fixed
power-of-two size along one axis. Then large texel indices may be accessed at
a small vec2 texel coordinate, and since the fixed dimension is a
power-of-two, that vector may be recovered by simply shifting and masking.
This effectively avoids size restriction. We do need to clamp texel indices to
the buffer size to avoid faulting on OOB reads, since we may read past the end
of the buffer (if the app binds a non-page-aligned offset into the buffer).
* Use a general purpose memory load for RGB32 buffer textures. Lower the texture
load instruction to a memory load from the buffer and some address arithmetic.
There's no format conversion needed for RGB32, other than maybe filling in a
format-appropriate alpha, so this is straightforward. Again, we need to clamp
the texel index for robustness with OOB reads.
Each of these solutions brings its own problem.
* Using 2D textures instead of 1D requires physically rounding up the buffer
size when packing the descriptor, so we can no longer implement textureSize()
by reading off the texture descriptor like normal.
* We don't know at compile-time whether a given texture load will read from an
RGB32 buffer texture or not, so we need to emit code for both. In Vulkan, we
can't key the shader to this property, either, since it's descriptor set state
and not pipeline state.
And each of these problems in turn brings its own solution:
* The texture descriptor is linear, so the "compression buffer address" field is
ignored by the hardware. We stash the real buffer size there so that
textureSize becomes a load from the texture descriptor like usual, without
requiring a sideband (which would complicate bindless textures).
* If we determine a texture descriptor contains RGB32 data, then it will never
be interpreted by the hardware and hence does not need to be a valid texture
descriptor. So, we extend the hardware's format enum to contain a
software-defined RGB32 format enum. Then, when lowering texture buffer loads,
we either read it as a typed RGB32 memory load or as a texture load depending
on the value of the format field in the texture descriptor.
All of this is accomplished with a big NIR pass generating a pile of strange
looking code. But it should be good enough in practice for this silly feature.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21672>
In NIR, texelFetch (txf) does not use a sampler, but in AGX, it does -- even
though the contents of the sampler are semantically irrelevant. Rather than
requiring the state tracker to bind a sampler anyway (indicated for texture
buffers with PIPE_CAP_TEXTURE_BUFFER_SAMPLER), just add a dummy sampler
ourselves if txf is used and there are otherwise no samplers. This is helpful
because PIPE_CAP_TEXTURE_BUFFER_SAMPLER isn't honoured by Rusticl or seemingly
mesa/st's PBO code, and after implementing this dummy sampler workaround in
Panfrost for Rusticl, I realized this CAP is silly and shouldn't exist in the
first place. (And I regret pushing for its reinclusion.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21672>
This commit ensures that we are using mesa release builds in performance
jobs.
To achieve that, some modifications were made on top of
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21492.
- Append the `BUILDTYPE` variable into the S3 artifact name
(MINIO_ARTIFACT_NAME environment variable) to allow for better
artifact management.
- The ./artifacts directory has been added to the list of artifact
directories for build-common. This ensures that the debian-release and
debian-arm64-release jobs are the only ones necessary for running
performance jobs. These jobs only produce artifacts via
prepare-artifacts.sh when we are under performance workflow.
- Make lava-submit.sh behave similar to baremetal jobs regarding
MINIO_ARTIFACT_NAME variable. For example, users can now easily
differentiate between mesa-arm64.tar.zstd and
mesa-arm64-release.tar.zstd by looking inside the `Downloading
artifacts from s3` Gitlab section.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21804>
The check in alloc_bo_from_cache() was skiping any try to get a bo
from cache but after use a protected bo was still being put in some
cache bucket and could be used for cases that don't require a
protected bo.
Using a protected bo in cases that don't require it can have
performance implications.
So here returning NULL when trying to get a cache bucket for a
protected bo, this will cause bo->real.reusable to be set to false
avoiding the bo to be reused.
Fixes: 9402ac8023 ("iris: handle protected BO creation")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21824>
We don't have a way to tell the ZLS hardware to use linear buffers, so if a
buffer could be used for depth/stencil, we have to twiddle. This isn't a problem
in practice, since depth/stencil buffers can't be shared across processes or
mapped directly as linear.
Fixes faults in depthstencil-render-miplevels, which was picking linear for one
buffer because of a STAGING bind flag. But that won't work :-)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21753>
On Intel platforms, the uclz lowering if ufind_msb is either one
instruction better (Gfx7 and newer) or two instructions better (all
older platforms) than the ifind_msb implementations.
On platforms that use lower_find_msb_to_reverse, there should be no
difference.
All Haswell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19938662 -> 19938634 (<.01%)
instructions in affected programs: 850 -> 822 (-3.29%)
helped: 2 / HURT: 0
total cycles in shared programs: 858467067 -> 858465538 (<.01%)
cycles in affected programs: 10080 -> 8551 (-15.17%)
helped: 2 / HURT: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
4d802df3aa loosened the type restrictions
on these opcodes to enable support for 64-bit ballot operations. In
doing so, it enabled 8-bit and 16-bit sizes as well.
It's impossible to get these sizes through GLSL or SPIR-V. None of the
lowering in nir_opt_algebraic can handle non-32-bit sizes. Almost no
drivers can handle non-32-bit sizes.
It doesn't seem possible to enforce anything other than "one bit size"
or "all bit sizes" in nir_opcodes.py. The only way it seems possible to
enforce this is in nir_validate. This is not ideal, but it be what it
be.
v2: Remove restriction on find_lsb. It is acutally possible to get this
via GLSL by doing findLSB() on a lowp value. findMSB declares its
parameter as highp, so that path is still impossible.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Cycles in all programs: 9098346105 -> 9098333765 (-0.0%)
Cycles helped: 6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
The 31-ufind_msb_rev(x) lowering only produces the correct result for
32-bit sources. ufind_msb_rev can also have 64-bit sources, and most
platforms are expected to lower this to 32-bit instructions with extra
logic operations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
Unlike ufind_msb, ifind_msb is only defined in NIR for 32-bit values, so
no @32 annotation is required.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
Stipple lines now appear correctly when they are oblique.
Previously the number of steps of the stipple counter between two vertices
was calculated as the euclidian distance between them in screen space, however
the length occupied by pixel along a line is only `1` for lines that are either
vertical or horizontal and will be anywhere between `1` and `sqrt(2)`
for other cases.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21290>
Get the texture/sampler index from the texture/sampler_offset source (which
is an offset from 0 thanks to the lower_index_to_offset lowering) and feed it in
as corresponding 16-bit texture instruction sources.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21704>
For indirect indexing into the binding table. Note this does not handle packing
the bindless forms, since that's a bit more involved.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21704>
This is an inline function with a compile-constant switch, so I expect
the compiler wouldn't produce any better code like this, but for humans
it's easier to read when function calls are not embedded into other
function calls.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21835>
When enabled, on gfx12 plus, we will add the sync nop instruction after
each instruction to make sure that current instruction depends on the
previous instruction explicitly.
This option will help us to get a hint if something is missing or broken
in software scoreboard pass.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21797>
Now that we aren't using them on Gfx8+ we can drop a lot of cruft.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
This mainly lets the software scoreboarding pass correctly mark the
instructions, without needing to resort to fragile manual handling in
the generator.
We can also make small improvements. On Gfx 8LP-12.0, we no longer have
the restrictions about DWord alignment, so we can simply write each half
into its intended location, rather than writing it to the low DWord and
then shifting it in place.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
I originally thought that we were intentionally emitting the legacy
opcodes here to make them opaque to the optimizer, so that it wouldn't
eliminate the explicit type conversions, as they're actually required
to do the quantization. But...we don't actually optimize those away
currently anyway. So...go ahead and use the helpers for consistency.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
With the previous patch, we no longer need to special case this, as we
emit a MOV with an HF source, rather than F16TO32 with an UW source,
on all platforms that need scoreboarding.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
This gets us a MOV at the IR level on Gfx8+ which should be more
optimizable than F16TO32. It also removes confusion about which
pipe which the instruction will run on.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
We can just use the new builder helpers to get the optimization
advantages of a MOV on Gfx8+ while also getting the necessary F32TO16
on Gfx7.x and yet not worry too hard about it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
These take care of emitting the F32TO16/F16TO32 instructions on Gfx7.x
but otherwise just emit a type converting MOV on Gfx8+.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
For converting half-float to float, we currently emit BRW_OPCODE_F16TO32
with a UW source, to match legacy Gfx7 behavior. In the generator, this
becomes a MOV with a HF source on Gfx8+. Unfortunately, this UW source
confuses the scoreboarding pass into thinking it's an integer source,
leading to incorrect SWSB annotations on Alchemist.
We should ultimately fix the IR to stop being so...legacy...here, but
this is the simplest fix for stable branches.
Fixes misrendering in Elden Ring and likely Sekiro: Shadows Die Twice.
Cc: mesa-stable
Tested-by: Chuansheng Liu <chuansheng.liu@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
References: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8018
References: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8375
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
Wa_1409600907 was enabled for gen12+. It should not be applied for
platforms after gen12.0. Use generated helpers to ensure application
to all relevant platforms.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21743>
We used to set RADEON_FLAG_GTT_WC when wsi_info is set. This changes it
to set the flag for any external mem on vram, extending the logic for
apps using external memory directly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21803>
CTS runs on stoney are currently taking ~20min to complete, which seems
to have begun with the upgrade to CTS 1.3.5.0. This is a bit too long in
and of itself, but it means that - assuming zero contention - a job that
has to be retried because the machine hung can take 40 minutes.
Aim to drop this to 15min turnaround by lowering the overall fraction
from 1/8th of the CTS to 1/11th.
As the jobs we run have been reshuffled, this adds a lot more expected
fails. As most of them categorise easily into patterns, group the
failures together in the file. Non-strict wide lines has passed since we
last ran it; the other failures all group into existing classes seen
for a long time.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21791>
Quake II uses GL_POINT_SMOOTH to render particles.
Zink currently requires `zink_emulate_point_smooth` to support that feature.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21731>
The HW can technically execute 16-bit operations, but the restrictions on
16-bit ALU ops are so great that it ends up not being a win for
GLES-on-Vulkan to lower mediump to 16-bit operations, at least with the
current state of the Intel compiler. This brings zink-on-anv in line with
iris and angle-on-anv for mediump behavior (ANGLE uses RelaxedPrecision,
which we ignore).
Perf on some angle traces on my brya (ADL) and i9-9900K (CFL):
ADL zink pubg_mobile_battle_royale: +13.4574% +/- 5.2046% (n=5)
CFL zink pubg_mobile_battle_royale: +29.5332% +/- 0.646585% (n=6)
ADL zink aztec_ruins_high: +5.78027% +/- 4.80645% (n=4)
CFL zink aztec_ruins_high: -1.10641% +/- 0.140562% (n=12)
ADL zink trex_200: +5.86956% +/- 2.09633% (n=10)
CFL zink trex_200: +9.72136% +/- 0.749261% (n=10)
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21775>
Our TGL machines are currently slightly oversubscribed (max. 17 jobs in
a pipeline on 15 DUTs). They're also currently suffering from
thermally-induced GPU throttling (being investigated), and a
thundering-herd network load effect: as all 15 jobs start at once, we
end up saturating one of our network links.
The combination of all three of these things means that TGL is often our
long pole in CI runs. Until we can ameliorate the two issues
constraining throughput (and a third where an unreliable hardware UART
sometimes kills jobs when it shouldn't), halve the workload so we at
least have some breathing room to absorb them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21790>
Seems to fix a hang in the following titles :
- Age of Empire 4
- Monster Hunter Rise
where the HW is hung on a PIPE_CONTROL after a GPGPU_WALKER but no
MEDIA_INTERFACE_DESCRIPTOR_LOAD was emitted since the switch from 3D
to GPGPU.
This would happen in the following case :
vkCmdBindPipeline(COMPUTE, cs_pipeline);
vkCmdDispatch(...);
vkCmdBindPipeline(GRAPHICS, gfx_pipeline);
vkCmdDraw(...);
vkCmdDispatch(...);
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17247>
The previous solution breaks potential loop header phis.
Move the dummy-break to the bottom of the loop.
Fixes: dEQP-VK.reconvergence.subgroup_uniform_control_flow_ballot.*
Fixes: a9c4a31d8d ('aco: handle NIR loops without breaks')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21736>
the recording rp_info may be a pointer to a member of the array being
reallocated, so test for this and re-set it to avoid invalid memory
access
found with this caselist:
KHR-GL46.texture_gather.offset-gather-unorm-2darray
KHR-GL46.texture_view.view_sampling
cc: mesa-stable
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21729>
PA_SC_VRS_OVERRIDE_CNTL is emitted when a framebuffer is bound because
it controls the VRS surface enable bit. Though, if a pipeline is bound
after the framebuffer is emitted, it can override the state. Remove it
completely since VRS for flat shading and RADV_FORCE_VRS are disabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20333>
On GFX11, VRS rate images can't use linear tiling and the swizzle mode
must be either SW_Z or SW_R.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20333>
NGG streamout performance is limited by the workgroup size, so make it as
large as possible.
Since this uses si_get_max_workgroup_size() to set the NGG workgroup size,
the side effect is that all GS is also getting an increase to 256, which
is OK.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21403>
This should help with NGG streamout performance, which is limited by
the workgroup size (it should be as large as possible).
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21403>
This moves the code to a separate branch to make it less intertwined
with the rest to allow sampler descriptor lowering later.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21403>
The hardware uses the register to premultiply GS vertex indices
in input VGPRs.
This changes the behavior as follows:
- VGT_ESGS_RING_ITEMSIZE is always 1 on gfx9-11, set in the preamble.
- The value is passed to the shader via current_gs_state (vs_state_bits).
- The shader does the multiplication.
The reason is that VGT_ESGS_RING_ITEMSIZE will be removed in the future.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21403>
The naming of aco_*_key didn't make sense because they
were never actually used as cache keys, only radv_*_key
are used as cache keys.
Rename the aco structs to aco_*_info instead.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21696>
v2 by Timur Kristóf:
- Rebase this patch on latest main.
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21696>
v2 by Timur Kristóf:
- Rebase this patch.
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21696>
Handle arguments that need a waitcnt without relying on
RADV specific VS input information.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21696>
This is to indicate when an argument was loaded from VMEM
and needs a waitcnt before it can be used.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21696>
Sometimes we can end up with a sequence where we need to flush a batch
with no clears and no draws (for ex, to get a fence). Promote these to
sysmem.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21747>
This patch will add support for dpb resize when low to high resolution
change/ svc use-cases.
With DPB tier1 type,vp9 svc decoder use cases are failed. This
Change will fix this[VCN1/VCN2].
Signed-off-by: SureshGuttula <suresh.guttula@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21548>