External(imported or exported) objects needs to have vm_id set to 0.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21885>
Xe KMD does a special caching handling for buffers that will be
scanout to display, so that is why it needs a flag set during
allocation.
Checking if VK_STRUCTURE_TYPE_WSI_MEMORY_ALLOCATE_INFO_MESA
is available in AllocateMemory() and marking the buffer as scanout.
All WSI code paths but one sets
VK_STRUCTURE_TYPE_WSI_MEMORY_ALLOCATE_INFO_MESA.
The only one that doesn't requires that WSI is initialize with
wsi_device_options.sw_device = true to be executed, what is not the
case for ANV.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21885>
The logic that updates `ctx->gfx_pipeline_state.final_hash` assumed that
the program is replaced. It is supposed to xor `final_hash` with the
hash first and then with the new hash however when the program is
updated it end up xor-ing the new hash twice so it does nothing.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: 15450d2c2e ("zink: incrementally hash all pipeline component hashes")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21925>
Having driver name in the renderer will be useful to differentiate
between open source and proprietary drivers as they can have different
feature sets/quirks.
Vulkan API version is also added to the name to match up with ANGLE.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21922>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16805>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16805>
This new intrinsic maps to the MTBUF instruction format on AMD GPUs
and represents a typed buffer load in NIR.
Also add an unsigned upper bound for the new intrinsic.
Code for that ported from aco_instruction_selection_setup.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16805>
We haven't measured any noteworthy perf improvement
from these, and they are difficult to port to NIR,
so remove them before the NIR based VS input lowering
in order to make it easier to bisect and analyze stats.
Fossil DB stats on Rembrandt (GFX10.3):
Totals from 21750 (16.12% of 134913) affected shaders:
VGPRs: 868512 -> 868664 (+0.02%); split: -0.00%, +0.02%
CodeSize: 64406804 -> 64397572 (-0.01%); split: -0.08%, +0.07%
MaxWaves: 567904 -> 567888 (-0.00%); split: +0.00%, -0.00%
Instrs: 12327212 -> 12324851 (-0.02%); split: -0.10%, +0.08%
Latency: 61367324 -> 61371204 (+0.01%); split: -0.04%, +0.05%
InvThroughput: 9687734 -> 9686000 (-0.02%); split: -0.03%, +0.01%
VClause: 248207 -> 303449 (+22.26%); split: -0.02%, +22.28%
SClause: 314942 -> 315564 (+0.20%); split: -0.09%, +0.29%
Copies: 921581 -> 921820 (+0.03%); split: -0.16%, +0.19%
Branches: 341964 -> 341967 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16805>
It's possible to get a display list with no vertex buffers if the linker
eliminates all VS inputs or if the list was built with glArrayElement with
no enabled attribs.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21860>
New options depending on what you want to print:
- initnir = initial NIR of shader CSOs
- nir = final NIR of variants after all lowering
- initllvm = LLVM IR before optimizations
- llvm = final LLVM IR
- asm = asm
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21860>
In vkCreateImage() if block comrpessed format and VK_IMAGE_TILING_LINEAR is
used, then the app crashes in vega gpu.
This is because addrlib does not support BC + linear as from function
ValidateSwModeParams(). From Marek Olšák the addrlib behaviour is correct.
In pal driver, flags.texture is not set in DetermineSurfaceFlags() function
if BC + linear. pal driver does it because it is expected that the
BC + linear image is only used as transfer resource.
This patch sets RADEON_SURF_NO_TEXTURE flag if usage is not
VK_IMAGE_USAGE_SAMPLED_BIT and and VK_IMAGE_USAGE_STORAGE_BIT.
flags.texture flag is not set if RADEON_SURF_NO_TEXTURE and this fixes
the crash.
v1: set NO_TEXTURE if not SAMPLED or STORAGE (Marek Olšák)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21422>
Block compressed + linear format is not supported in addrlib. But these
surface can be used as transfer resource. RADEON_SURF_NO_TEXTURE flag
indicates not to set flags.texture flag in gfx9_compute_surface().
This will help to fix the vkCreateImage() crash where block
compressed + linear format image is requested.
v2: combine RADEON_SURF_NO_TEXTURE to below line (Marek Olšák)
v1: add RADEON_SURF_NO_TEXTURE flag (Marek Olšák)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21422>
With the CTS coverage and tu featureset extending, these jobs have been
reliably timing out for a while. I've updated the xfails based on a
single run, we'll see how that goes in the upcoming nightlies.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21879>
Indeed, ureg_get_tokens() returns an allocated string that should be
freed using ureg_free_tokens().
For instance, with "piglit/bin/arb_shader_image_load_store-invalid -auto -fbo"
Direct leak of 768 byte(s) in 2 object(s) allocated from:
#0 0x7fa819a78b48 in __interceptor_realloc (/usr/lib64/libasan.so.6+0xb1b48)
#1 0x7fa80e189e04 in tokens_expand ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:239
#2 0x7fa80e189e04 in get_tokens ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:262
#3 0x7fa80e191f6e in copy_instructions ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:2079
#4 0x7fa80e191f6e in ureg_finalize ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:2129
#5 0x7fa80e19447b in ureg_get_tokens ../src/gallium/auxiliary/tgsi/tgsi_ureg.c:2206
#6 0x7fa80ec68b91 in si_create_fmask_expand_cs ../src/gallium/drivers/radeonsi/si_shaderlib_tgsi.c:564
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21871>
Now that we don't do GLSL IR constant propagation or constant folding, we
can leave constant variable detection and handling to NIR. This also
avoids some OOB array access in GLSL IR in a piglit test!
Freedreno stats again look like noise:
total instructions in shared programs: 2718412 -> 2718746 (0.01%)
instructions in affected programs: 80497 -> 80831 (0.41%)
total last-baryf in shared programs: 110015 -> 110510 (0.45%)
last-baryf in affected programs: 35263 -> 35758 (1.40%)
total full in shared programs: 189486 -> 189480 (<.01%)
full in affected programs: 52 -> 46 (-11.54%)
total constlen in shared programs: 494540 -> 494496 (<.01%)
constlen in affected programs: 452 -> 408 (-9.73%)
total sstall in shared programs: 198297 -> 197928 (-0.19%)
sstall in affected programs: 3691 -> 3322 (-10.00%)
total systall in shared programs: 432150 -> 431799 (-0.08%)
systall in affected programs: 6070 -> 5719 (-5.78%)
total waves in shared programs: 435098 -> 435110 (<.01%)
waves in affected programs: 92 -> 104 (13.04%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21751>
NIR is happy to take care of constant folding for us, and it's easier to
do in SSA.
This required adjusting of lower_precision unit tests to have un-folded
constants.
freedreno results look like noise. Some excerpts:
total instructions in shared programs: 2718343 -> 2718412 (<.01%)
instructions in affected programs: 6847 -> 6916 (1.01%)
total last-baryf in shared programs: 109992 -> 110015 (0.02%)
last-baryf in affected programs: 117 -> 140 (19.66%)
total sstall in shared programs: 198312 -> 198297 (<.01%)
sstall in affected programs: 148 -> 133 (-10.14%)
total systall in shared programs: 432163 -> 432150 (<.01%)
systall in affected programs: 1016 -> 1003 (-1.28%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21751>
glsl has been constant-propagating out references to ir->constant_value
(the value of a variable declared as const), but we can get rid of that
whole pass if we just have glsl-to-nir hand the constant propagating
problem off to NIR.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21751>
These passes were missing the macros to handle debug output and extra
validation. But also, for working on GLSL, it's nice to see the raw
output of glsl-to-nir before you move on.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21751>
The whole usage of this flag is to call iris_use_pinned_bo() with
writable argument, for that we don't need any i915_drm.h specific type.
IRIS_BLORP_RELOC_FLAGS_EXEC_OBJECT_WRITE could have any other value but
keeping the same as i915_drm.h.
With this we can drop 2 i915_drm.h imports from generic Iris code.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21887>
Those are i915_drm.h specific types and should not be in code paths
shared by i915 and Xe KMD.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21887>
Skip heavyweight crashing tests that have the potential to take down not just
themselves but also other Piglit tests running concurrently via piglit-runner
(which would otherwise become piglit-runner level flakes).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
These are (have always been) quite broken. Given that the whole section is
already in the flakes.txt, and there's no plan for improving this (I've tried
and fails), I'd rather just skip the section and reduce the noise in
the #panfrost-ci channel.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
This is really dumb.
But this fixes arb_shader_language_420pack-active-sampler-conflict on v7 which
otherwise dereferences a null pointer trying to access the nonexistant texture
arrays, or DATA_INVALID_FAULTs if you give it a texture array filled with
zeroes. But it seems happy if you bind in null textures. This is dumb but less
faults in Piglit is good for reducing flakes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
mesa/st doesn't like to use 24-bit textures, preferring RGBX over true RGB even
for texture views where this isn't valid. Given how silly true RGB is in
practice, I'd rather drop support and fix texture views than go against the
grain and risk more issues down the line since nobody else in tree is testing
these paths and apps really shouldn't be caring.
Fixes page faults in arb_texture_view-rendering-formats_gles3 which tries to
sample an R8G8B8_UINT texture with a R8G8B8X8_UNORM view in one subcase. That
test is now passing reliably.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
This is signed, not unsigned. We were already passing negatives and silently
relying on 2's complement and C to do the right thing. But that's silly. We
should just, actually do the right thing.
Found while struggling to debug primitive-restart-draw-mode.
v2: Update the other architectures too, including a decode_csf.c change for the
v10 incarnation of this v4-era field.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net> [v1]
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
We just want a bit-exact transfer for integers. Using .auto32 accomplishes this
without any clamping shenanigans. Fixes gl-3.0-vertexattribipointer.
Note we can't use .auto32 unconditionally, since reading a uint vertex as float
is supposed to convert (or something like that, gl-2.0-vertexattribpointer tests
the bad case at any rate).
Fixes: 482cc273af ("pan/bi: Implement load attribute with the builder")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
draw->index_bounds_valid tells drivers that the values of min_index/max_index
are set correctly and can be used e.g. to allocate memory for varyings. If set
incorrectly, the GL promises badness.
But, with primconvert, we go mucking with index buffers and then never update
the bounds. So it doesn't matter if the original index bounds were valid, we
can't promise the original bounds are *still* valid. If we were trying to
optimize CPU overhead, we could try to preserve the new min/max index but seeing
as only older Mali cares about this flag, and if you're using primconvert you're
already screwed, I'm not too inclined to go rework primconvert.
Fixes* page faults in primitive-restart-draw-mode on Mali-G52 for GL_QUAD_STRIPS
and GL_POLYGON, which hit the primconvert path. The full dmesg splat looks like:
[ 5438.811727] panfrost ffe40000.gpu: Unhandled Page fault in AS0 at VA 0x000000100A16BAC0
Reason: TODO
raw fault status: 0x25002C1
decoded fault status: SLAVE FAULT
exception type 0xC1: TRANSLATION_FAULT_1
access type 0x2: READ
source id 0x250
Notice that a high bit is randomly set in the address, this is trying to read
a varying from the actual varying buffer in the vicinity of 0xa16bac0. What's
actually happening is that we're trying to read index #0 despite promising the
driver a minimum index of 2, causing an integer underflow as we try to read
index -2, or as the hardware sees, 4294967294.
As long as we stop lying to panfrost about the bounds being correct, panfrost is
able to calculate the real (post-primconverted) bounds on its own, fixing the
test.
* Alternatively, maybe Panfrost should just ignore this bit, in which I don't
know why we have it in Gallium, since it's probably not conformant to fault on
out-of-range glDrawRangeElements.
Fixes: 72ff53098c ("gallium: add pipe_draw_info::index_bounds_valid")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
depth_stencil_attachment has been changed from a pointer to the attachment idx
to just the attachment idx, as this avoids the driver having to check for NULL
when comparing attachments indexes with depth_stencil_attachment.
Anyplace that relies on depth_stencil_attachment being a valid index must
already check that depth_stencil_attachment is not VK_ATTACHMENT_UNUSED, so
this change avoids having to check both the pointer and the index for the same
information.
Noticed when running dEQP-VK.api.smoke.triangle
Signed-off-by: Jarred Davies <jarred.davies@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21690>
This also makes vn_QueueSignalReleaseImageANDROID async since it makes
use of a queue submit followed by an external fence export internally.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21716>
vn_instance_roundtrip does 2 things:
1. vn_instance_submit_roundtrip
- before: encode a cmd to write vq seqno to ring extra field
- after: encode a cmd to update vq seqno against a ring
- submit the encoded cmd via vq
2. vn_instance_wait_roundtrip
- before: wait until ring extra field has the vq seqno
- after: let renderer ring thread wait for the vq seqno
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21716>
Up until now, we have been handrolling part of the init-stage2.sh in
the b2c command line. Let's stop doing that and instead use the same
script as every other HW farms.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21872>
First make sure shadow mask change sets dirty state.
Second move shadow mask bit removal to unbind_samplerview which
is cleaner and correctly clears the shadow bit when binding buffer texture.
Fixes: 5193f4f712 ("zink: add a fs shader key member to indicate depth texturing mode")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21571>
usually zink_get_cmdbuf() is enough for reordering operations, but
with new technology, it becomes possible to promote even the most stubborn
buffers to the unordered cmdbuf
first, check the src buffer to ensure that there's no pending writes in
the main cmdbuf that would prohibit reordering
second, apply a TRANSFER_DST to the dst buffer using the util function
to determine whether it can be reordered
if both the src and dst can be reordered for their respective regions
and read/write usage, then the entire op can be promoted regardless of
the unordered_read/unordered_write flags
this optimizes out patterns like
upload index buffer (offset=0)
draw
upload index buffer (offset=128)
draw
upload index buffer (offset=256)
draw
...
so that the uploads and draws can be separated and batched
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21802>
I wanted to find slow pieces of code in our Anv driver using our
drm-shim stub.
The last bit of code still talking to the compositor was the WSI
swapchain code and failing because none of the submissions are taking
place (because of the stub).
This change introduces a new variable MESA_VK_WSI_HEADLESS_SWAPCHAIN
which when set turns every swapchain creation into a headless
swapchain. This swapchain does not present anything, allowing the
application to spin as many frames as possible. Thus helping to
identify slow spots in command buffer building path.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6156>
nir->info.subgroup_size can be set to an enum :
SUBGROUP_SIZE_VARYING = 0
SUBGROUP_SIZE_UNIFORM = 1
SUBGROUP_SIZE_API_CONSTANT = 2
SUBGROUP_SIZE_FULL_SUBGROUPS = 3
So compute the API subgroup size value and compare it to the dispatch
size to determine whether we need some bound checking.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9ac192d79d ("intel/fs: bound subgroup invocation read to dispatch size")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21856>
I'm not planning to stand mesa-swrast back up until we get Kata set up, so
turn the testing back on at a reduced fraction on so that
venus/llvmpipe/etc. dev can still get some coverage.
I haven't turned lavapipe back on, because it is now unstable in memory
model / atomics tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21880>
For instance, to load uniform data with the LSC we usually rely on
tranpose messages which have to execute in SIMD1. Those end up being
considered as partial writes so within loops their life span spread to
the whole loop, increasing register pressure.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21867>
on pipeline bind with dynamic state, depth_clip_near needs to either be set by
* applying the dynamic state
* using the pipeline state
the previous code always used the pipeline state
fixes:
dEQP-VK.pipeline.*.extended_dynamic_state.between_pipelines.depth_clamp_enable
Fixes: 650880105e ("vulkan,lavapipe: Use a tri-state enum for depth clip enable")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21814>
if it's known that a renderpass is active and the driver wants to do
renderpass optimizing, help out by not forcing a sync and instead doing
what the driver would do: create a staging buffer and copy it to the
image
this requires that the driver already handles buffer -> image copies
with resource_copy_region
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21801>
It is legal to pass in nullptr as an instance into
vkGetInstanceProcAddr when resolving any global addresses, this
wasn't handled correctly and an illegal access to a member of
a null struct was made.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21827>
This commit rewrites the KGSL backend to utilize vk common wherever
possible to bring the codebase in line with DRM while implicitly
fixing minor API bugs that may have occurred as a result of manually
implementing VK functions.
As a part of moving to vk common, KGSL sync is now implemented
atop vk common sync and vastly expanded in terms of functionality
such as:
* Import/Export of sync FDs - A required capability for properly
supporting the Android WSI and as these functions were stubbed
when a presentation operation used semaphores, it would cause a
leak of FDs that were imported due to the expectation that the
driver would close them. As well as causing UB around due to
ignoring the imported FD or not exporting a valid FD.
* Supporting pre-signalled fences - Vulkan allows fences to be
created in a signalled state which was stubbed prior and can
lead to UB.
* Timeline semaphore support - As a result of utilizing vk common
as the backbone for synchronization, its timeline semaphore
emulation has been utilized to provide support for them without
needing kernel support. (Note: On newer versions of KGSL,
timeline semaphores can be implemented natively rather than
using emulation as they support wait-before-signal)
Fixes freezes due to semaphore usage with presentation on:
* Genshin Impact
* Skyline Emulator
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21651>
Enough that we can use OUT_PKT() to emit it, which will be needed when
we use it to write regs that are different btwn a6xx and a7xx.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
Originally if we had an anonymous field (ie. field declared as part of
the register definition itself) the name in the generated field struct
would include the gen prefix (ie. .a6xx_rb_stencil_buffer_pitch), but
this doesn't work for variants because the variant regs would have
different gen prefixes. Fix this by using reg name instead of the
full_name.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
For regs with multiple variants, generate a template'ized function to
pack the reg value. If the template param is known at compile time
(which is the expected usage) this will optimize to the same thing as
the "traditional" reg packing.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
Track varset and assert that variants refer to a valid varset enum
value. This adds a bit of extra sanity checking, but becomes more
useful in the next commit.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
They have more similarities than differences, so merge them and use
"variant" attribute as needed to manage differences.
Note initially using "variant" conservatively when it comes to regs
known on a7xx but not a6xx. It could be that they exist also on later
versions of a6xx as well, for example. For ex, LPAC related regs/bits
likely existed on later a6xx (eg. a660 family) but BV stuff is not.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
To merge a7xx and a6xx regs, using variant property to manage the
differences, we'll want regs/etc to be named according to the first
generation it is use rather than the domain name. Add a new prefix
type to accomplish this. By default, if no variant property, things
will still be named based on domain (ie. REG_A6XX_...), and things
that have variant="A6XX" will also end up as they currently are
(since the chip enum matches domain name), but things that have
variant="A7XX" will end up as REG_A7XX_...
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
Clang seems more relaxed about this, allowing C99 style initializers
without requiring ordering. But unfortunately g++ is more picky :-/
TODO this doesn't completely fix everything with g++, namely sparse
array initialization.. for ir3 driver-params, I think we can convert
these to structs. But there are still one or two others to deal with.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
C++ is more picky about a goto jumping over variable initialization,
even if unused after the goto label (presumably because of destructors
that can be called after a variable goes out of scope). Since there is
only a single fallback path, get rid of the goto.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21846>
If both, any-hit and intersection shader, use scratch vars,
it could happen that they end up in the same location and
overwrite each other.
Found by inspection.
Fixes: c3d82a9622 ('radv: Add pass to lower anyhit shader into an intersection shader.')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21863>
This should allow us to remove a big memset when compiling a
graphics pipeline. This is mostly for imported NIR stages which
don't go through radv_pipeline_stage_init().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20947>
This should allow us to remove a big memset when compiling a
graphics pipeline. This is mostly for imported NIR stages which
don't go through radv_pipeline_stage_init().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20947>
In file included from ../src/tool/pps/pps_device.cc:10:
../src/tool/pps/pps_device.h:23:11: error: ‘uint32_t’ does not name a type
23 | static uint32_t device_count();
| ^~~~~~~~
In file included from ../src/tool/pps/pps_counter.cc:10:
../src/tool/pps/pps_counter.h:22:4: error: ‘uint32_t’ does not name a type
22 | uint32_t id;
| ^~~~~~~~
Fixes: 1cc72b2aef ("pps: Gfx-pps v0.3.0")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8186
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21714>
We have 3 new/changed functions with this commit:
1. _mesa_alloc_dispatch_tables creates all dispatch tables that are not
created on demand and sets them to nop. This operates on gl_dispatch,
so it's reusable (e.g. glthread will want to use it)
2. _mesa_free_dispatch_tables frees everything
3. _mesa_initialize_dispatch_tables initializes gl_dispatch for GL
(not glthread)
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21777>
I like this more. The name self-documents itself. It's always equal
to the dispatch set in glapi.
GLAPI is a definition, so can't use that.
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21777>
There is a new struct gl_dispatch, which I'd like to reuse in glthread.
This allows building code around gl_dispatch that can be shared between
mesa and glthread. This is only refactoring.
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21777>
Prepare for Clover removal; don't waste resources on Clover anymore.
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21865>
Upstream moved Host.h from Support to TargetParser in LLVM 17.
This shouldn't lead to a FTBFS, since there is a forwarding include left
behind. Sadly the added deprecation warning #pragma is invalid and thus
causes a build failure right away. But since we would have to follow the
move anyway in the future, just do it right away.
Reference: d768bf994f
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Closes: #8275
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21263>
After enabling the Wayland platform for x86_64,
multiple new tests were triggered, some of which timed out.
Also wayland-dEQP-EGL.functional.negative_api.create_pixmap_surface now pass.
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21786>
Currently, the generated tests consist of some boilerplate, generated
test cases, and at the very end the actual test. This is bad for readability,
because the actual code is all the way at the bottom. It's also bad for
clang-format linting: even though the test cases are /* clang-format off */,
they still take an exceptionally long time to parse when linting. I suspect this
is a clang-format bug, but it's easy enough to workaround.
To solve these issues, restructure so that the test cases are in separate files
(containing the actual data), but the manually written test functions are
consolidated into a new family of generated layout tests. This is probably
cleaner.
Parallel clang-format linting is now 10x faster on the M1, which means it's
now practical to lint in my "publish branch" hook.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21854>
Not using it yet, that will be done in the next patch.
Xe only supports submission using VM.
For i915 the backend functions are just a noop.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21698>
As noted on [1], lowering patterns of the form
floatS -> floatB -> floatS ==> floatS
cannot require precision since this may cause flush denorming.
[1] 3f779013 ("nir: Add an algebraic optimization for float->double->float")
Fixes: b86305bb ("nir/algebraic: collapse conversion opcodes (many patterns)")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>
Conversions on smaller bit sizes should also be collapsed when composed.
This also adds more patterns on the
intS -> intB -> floatB ==> intS -> floatB
lowering so as to deal with any int size C > B instead of a fixed intB.
Closes: #7776
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>
Some patterns were outside the list of optimizations.
Fixes: b86305bb ("nir/algebraic: collapse conversion opcodes (many patterns)")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Isabella Basso <isabellabdoamaral@usp.br>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20965>
Rather than ingesting separate control and memory barriers, ingest only the
combined and optimized scoped_barrier intrinsic. For barriers originating from
GLSL, this makes it easier to ensure correctness. For barriers originating from
SPIR-V, this is required for translation at all, as spirv_to_nir knows only
scoped barriers. So this gets us closer to Vulkan and OpenCL.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21752>
New 10 devices - asus-CM1400CXA-dalboz hosted on Collabora farm.
1x Move VA-API tests to the dalboz (more resources). One timeout dropped.
9x Run VKCTS on dalboz.
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21702>
Lowers away 64-bit loads, which we'll create in the sysval lowering for
dynamically indexed UBOs/VBOs. The lowering generates pack_64_2x32 instructions,
so lower those too.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
Instead of lowering to bitwise ops. Yet another way of subdividing in NIR.
Probably insignificant but makes it easy to check that the pass ordering from the
previous pass is right. It does let us get much better codegen for
unpacksnorm2x16, whatever that's worth.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
As intended. We can't CSE with partial null destinations in the way, so we
shouldn't eliminate dead destinations until after CSE has run. But we should
still eliminate dead instructions to ensure CSE doesn't move things around
needlessly, hurting register pressure.
Noticed while debugging live range splitting.
No GLES3.0 shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
Our dead code elimination pass does two things:
1. delete instructions that are entirely unnecessary
2. delete unnecessary destinations of necessary instructions
To deal with pass ordering issues, we sometimes want to do #1 without #2.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
We should handle nir_op_unpack_32_2x16_split_* natively, since we can generate
better code with agx_subdivide (coalescing the ops away) than the bitshift
lowering.
That said, we do need some extra instructions for the floating point
conversions.
No shader-db changes (which makes sense because we're targetting the GLES3.0
shader-db, which doesn't have the packing GLSL functions).
The real motivation of this change isn't optimizing some GLSL pack functions,
though, it's avoiding a code regression from using NIR's memory bit size
lowering in a future MR. That lowering will turn things like "load i16vec4" into
"load i32vec2 + unpack_32_2x16", so we need to be able to coalesce that unpack.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
At some point in a refactoring long ago, our 'Piglit' runs on arm64
started actually being dEQP-GLES2 runs. Oh dear.
Surprisingly, there are a number of expectation changes; added every
fail I saw from a long overnight stress test.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21851>
Minimum/maximum LOD and LOD bias are unsigned and signed fixed point formats
respectively. They are not unsigned integers. Introduce fixed-point types into
our GenXML and use them in the XML, rather than packing in sidebands. This makes
the XML more correct and fixes pretty-printing of texture and sampler
descriptors.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20445>
Connor Abbott wrote a nice explanation of how instance divisors work on Mali.
Let's add it to the driver docs instead of letting it languish in a forgotten
header file.
This is mostly pasted from the existing header in tree, with a few local changes
applied.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20445>
Nowadays, formats are defined with GenXML, not the old panfrost-job.h, so most
of the format #defines in panfrost-job.h are unused. That said, a few are still
in use as a backdoor for compressed format queries to avoid a GenXML dependency.
That's not great but cleaning that up isn't the subject of this MR.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20445>
XE architecture enables many more metrics, perhaps too many for
the average user. Reduce reported metrics to smaller subset,
known as non-extended metrics, by default. Can re-enable extended
metrics with env var INTEL_EXTENDED_METRICS=1
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21841>
Implement buffer textures in full generality. There are a few issues here:
* OpenGL requires buffer textures support a minimum size of 65536 elements,
however 1D textures in AGX are (at most) 8192 elements.
* OpenGL 4.0 (and OpenGL ES) require buffer textures to support the "RGB32"
texture formats. These are 3 packed channels of 32-bits each. In general,
non-power-of-two texel sizes are problematic. AGX does not support any such
formats and we rely on the GL frontend to lower to a padded format (RGBX) if
necessary. Such a lowering cannot work for buffer textures, however, so we
need to find a way to implement RGB32 buffer textures.
We solve these issues in the follow way:
* Use 2D texture descriptors for buffer textures, with a large fixed
power-of-two size along one axis. Then large texel indices may be accessed at
a small vec2 texel coordinate, and since the fixed dimension is a
power-of-two, that vector may be recovered by simply shifting and masking.
This effectively avoids size restriction. We do need to clamp texel indices to
the buffer size to avoid faulting on OOB reads, since we may read past the end
of the buffer (if the app binds a non-page-aligned offset into the buffer).
* Use a general purpose memory load for RGB32 buffer textures. Lower the texture
load instruction to a memory load from the buffer and some address arithmetic.
There's no format conversion needed for RGB32, other than maybe filling in a
format-appropriate alpha, so this is straightforward. Again, we need to clamp
the texel index for robustness with OOB reads.
Each of these solutions brings its own problem.
* Using 2D textures instead of 1D requires physically rounding up the buffer
size when packing the descriptor, so we can no longer implement textureSize()
by reading off the texture descriptor like normal.
* We don't know at compile-time whether a given texture load will read from an
RGB32 buffer texture or not, so we need to emit code for both. In Vulkan, we
can't key the shader to this property, either, since it's descriptor set state
and not pipeline state.
And each of these problems in turn brings its own solution:
* The texture descriptor is linear, so the "compression buffer address" field is
ignored by the hardware. We stash the real buffer size there so that
textureSize becomes a load from the texture descriptor like usual, without
requiring a sideband (which would complicate bindless textures).
* If we determine a texture descriptor contains RGB32 data, then it will never
be interpreted by the hardware and hence does not need to be a valid texture
descriptor. So, we extend the hardware's format enum to contain a
software-defined RGB32 format enum. Then, when lowering texture buffer loads,
we either read it as a typed RGB32 memory load or as a texture load depending
on the value of the format field in the texture descriptor.
All of this is accomplished with a big NIR pass generating a pile of strange
looking code. But it should be good enough in practice for this silly feature.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21672>
In NIR, texelFetch (txf) does not use a sampler, but in AGX, it does -- even
though the contents of the sampler are semantically irrelevant. Rather than
requiring the state tracker to bind a sampler anyway (indicated for texture
buffers with PIPE_CAP_TEXTURE_BUFFER_SAMPLER), just add a dummy sampler
ourselves if txf is used and there are otherwise no samplers. This is helpful
because PIPE_CAP_TEXTURE_BUFFER_SAMPLER isn't honoured by Rusticl or seemingly
mesa/st's PBO code, and after implementing this dummy sampler workaround in
Panfrost for Rusticl, I realized this CAP is silly and shouldn't exist in the
first place. (And I regret pushing for its reinclusion.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21672>
This commit ensures that we are using mesa release builds in performance
jobs.
To achieve that, some modifications were made on top of
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21492.
- Append the `BUILDTYPE` variable into the S3 artifact name
(MINIO_ARTIFACT_NAME environment variable) to allow for better
artifact management.
- The ./artifacts directory has been added to the list of artifact
directories for build-common. This ensures that the debian-release and
debian-arm64-release jobs are the only ones necessary for running
performance jobs. These jobs only produce artifacts via
prepare-artifacts.sh when we are under performance workflow.
- Make lava-submit.sh behave similar to baremetal jobs regarding
MINIO_ARTIFACT_NAME variable. For example, users can now easily
differentiate between mesa-arm64.tar.zstd and
mesa-arm64-release.tar.zstd by looking inside the `Downloading
artifacts from s3` Gitlab section.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21804>
The check in alloc_bo_from_cache() was skiping any try to get a bo
from cache but after use a protected bo was still being put in some
cache bucket and could be used for cases that don't require a
protected bo.
Using a protected bo in cases that don't require it can have
performance implications.
So here returning NULL when trying to get a cache bucket for a
protected bo, this will cause bo->real.reusable to be set to false
avoiding the bo to be reused.
Fixes: 9402ac8023 ("iris: handle protected BO creation")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21824>
We don't have a way to tell the ZLS hardware to use linear buffers, so if a
buffer could be used for depth/stencil, we have to twiddle. This isn't a problem
in practice, since depth/stencil buffers can't be shared across processes or
mapped directly as linear.
Fixes faults in depthstencil-render-miplevels, which was picking linear for one
buffer because of a STAGING bind flag. But that won't work :-)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21753>
On Intel platforms, the uclz lowering if ufind_msb is either one
instruction better (Gfx7 and newer) or two instructions better (all
older platforms) than the ifind_msb implementations.
On platforms that use lower_find_msb_to_reverse, there should be no
difference.
All Haswell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19938662 -> 19938634 (<.01%)
instructions in affected programs: 850 -> 822 (-3.29%)
helped: 2 / HURT: 0
total cycles in shared programs: 858467067 -> 858465538 (<.01%)
cycles in affected programs: 10080 -> 8551 (-15.17%)
helped: 2 / HURT: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
4d802df3aa loosened the type restrictions
on these opcodes to enable support for 64-bit ballot operations. In
doing so, it enabled 8-bit and 16-bit sizes as well.
It's impossible to get these sizes through GLSL or SPIR-V. None of the
lowering in nir_opt_algebraic can handle non-32-bit sizes. Almost no
drivers can handle non-32-bit sizes.
It doesn't seem possible to enforce anything other than "one bit size"
or "all bit sizes" in nir_opcodes.py. The only way it seems possible to
enforce this is in nir_validate. This is not ideal, but it be what it
be.
v2: Remove restriction on find_lsb. It is acutally possible to get this
via GLSL by doing findLSB() on a lowp value. findMSB declares its
parameter as highp, so that path is still impossible.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Cycles in all programs: 9098346105 -> 9098333765 (-0.0%)
Cycles helped: 6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
The 31-ufind_msb_rev(x) lowering only produces the correct result for
32-bit sources. ufind_msb_rev can also have 64-bit sources, and most
platforms are expected to lower this to 32-bit instructions with extra
logic operations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
Unlike ufind_msb, ifind_msb is only defined in NIR for 32-bit values, so
no @32 annotation is required.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
Stipple lines now appear correctly when they are oblique.
Previously the number of steps of the stipple counter between two vertices
was calculated as the euclidian distance between them in screen space, however
the length occupied by pixel along a line is only `1` for lines that are either
vertical or horizontal and will be anywhere between `1` and `sqrt(2)`
for other cases.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21290>
Get the texture/sampler index from the texture/sampler_offset source (which
is an offset from 0 thanks to the lower_index_to_offset lowering) and feed it in
as corresponding 16-bit texture instruction sources.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21704>
For indirect indexing into the binding table. Note this does not handle packing
the bindless forms, since that's a bit more involved.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21704>
This is an inline function with a compile-constant switch, so I expect
the compiler wouldn't produce any better code like this, but for humans
it's easier to read when function calls are not embedded into other
function calls.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21835>
When enabled, on gfx12 plus, we will add the sync nop instruction after
each instruction to make sure that current instruction depends on the
previous instruction explicitly.
This option will help us to get a hint if something is missing or broken
in software scoreboard pass.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21797>
Now that we aren't using them on Gfx8+ we can drop a lot of cruft.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
This mainly lets the software scoreboarding pass correctly mark the
instructions, without needing to resort to fragile manual handling in
the generator.
We can also make small improvements. On Gfx 8LP-12.0, we no longer have
the restrictions about DWord alignment, so we can simply write each half
into its intended location, rather than writing it to the low DWord and
then shifting it in place.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
I originally thought that we were intentionally emitting the legacy
opcodes here to make them opaque to the optimizer, so that it wouldn't
eliminate the explicit type conversions, as they're actually required
to do the quantization. But...we don't actually optimize those away
currently anyway. So...go ahead and use the helpers for consistency.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
With the previous patch, we no longer need to special case this, as we
emit a MOV with an HF source, rather than F16TO32 with an UW source,
on all platforms that need scoreboarding.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
This gets us a MOV at the IR level on Gfx8+ which should be more
optimizable than F16TO32. It also removes confusion about which
pipe which the instruction will run on.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
We can just use the new builder helpers to get the optimization
advantages of a MOV on Gfx8+ while also getting the necessary F32TO16
on Gfx7.x and yet not worry too hard about it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
These take care of emitting the F32TO16/F16TO32 instructions on Gfx7.x
but otherwise just emit a type converting MOV on Gfx8+.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
For converting half-float to float, we currently emit BRW_OPCODE_F16TO32
with a UW source, to match legacy Gfx7 behavior. In the generator, this
becomes a MOV with a HF source on Gfx8+. Unfortunately, this UW source
confuses the scoreboarding pass into thinking it's an integer source,
leading to incorrect SWSB annotations on Alchemist.
We should ultimately fix the IR to stop being so...legacy...here, but
this is the simplest fix for stable branches.
Fixes misrendering in Elden Ring and likely Sekiro: Shadows Die Twice.
Cc: mesa-stable
Tested-by: Chuansheng Liu <chuansheng.liu@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
References: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8018
References: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8375
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
Wa_1409600907 was enabled for gen12+. It should not be applied for
platforms after gen12.0. Use generated helpers to ensure application
to all relevant platforms.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21743>