The unvanquished flaked time to time from beginning, minetest-v2 has
occasional 1 tiny change in the pixel.
Acked-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21990>
Can you spot the problem?
exp param0 v6, v5, v5, v5
exp param1 v7, off, off, off
exp param1 v7, off, off, off
radeonsi uses ac_nir_optimize_outputs to eliminate output stores with
identical SSA defs (i.e. duplicated), which then causes 2 outputs to
map to the same parameter export.
This is a regression. The old LLVM code was correctly emitting each
export only once. vs_output_param_mask was supposed to be used for
this instead of vs_output_param_offset.
Fixes: 80506be31b - ac/nir/ngg,radv,radeonsi: nogs use ac_nir_export_(position|parameter)
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21920>
Some helper functions in iris_bufmgr were also moved because the only
caller is in iris_i915_batch.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21965>
This way when replacing a broken context we don't need to ask to
kernel what is the priority of the context being replaced.
Also this will be necessary for Xe kmd as it don't have any uapi to
query engine priority.
While doing that also taking the oportunity to move more code from
iris_bufmgr.c/h that only has one caller.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21965>
Intel Gen9 GPUs have hardware ASTC support, but have a bug where they
don't handle denormalized values in void extent blocks correctly. This
isn't that hard to work around - on upload, we can detect such blocks,
and flush any denorms to zero. Because we're altering the data behind
the application's back, and applications can theoretically ask to
download the original unaltered image data, we unfortunately need to
maintain shadow copies of the data.
To make sure that we don't accidentally skip the void-extent flushing
via any fast-upload paths, and support download correctly, we plug this
into the st/mesa compressed texture format fallback paths, which store
a CPU copy of the original image data, and upload altered data.
This is unfortunately common code for what's likely to be a single
driver's issue (on a single generation), but it beats replicating an
entire framework we already have inside the driver.
Fixes dEQP-GLES3.functional.texture.compressed.astc.void_extent_ldr.*
using iris on Intel Gen9 GPUs.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4167
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21943>
As Xe KMD requires VM, iris_bufmgr_init_global_vm() now is returing
a boolean telling if bufmgr creationg should continue or not.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21937>
Pointed out by GCC with LTO:
../src/gallium/drivers/iris/iris_context.c: In function 'iris_create_context':
../src/gallium/drivers/iris/iris_context.c:304:7: error: 'free' called on pointer 'block_180' with nonzero offset 48 [-Werror=free-nonheap-object]
304 | free(ctx);
| ^
[...]
../src/gallium/drivers/iris/iris_context.c:313:7: error: 'free' called on pointer 'block_180' with nonzero offset 48 [-Werror=free-nonheap-object]
313 | free(ctx);
| ^
v2:
* Use ice pointer instead of ctx. (Karol Herbst)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
Pointed out by GCC with LTO:
../src/gallium/drivers/crocus/crocus_context.c: In function 'crocus_create_context':
../src/gallium/drivers/crocus/crocus_context.c:261:7: error: 'free' called on pointer 'block_174' with nonzero offset 48 [-Werror=free-nonheap-object]
261 | free(ctx);
| ^
v2:
* Use ice pointer instead of ctx. (Karol Herbst)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
Fixes strict aliasing violation:
In function 'r600_init_resource_fields',
inlined from 'r600_buffer_create' at ../src/gallium/drivers/r600/r600_buffer_common.c:578:2:
../src/gallium/drivers/r600/r600_buffer_common.c:139:48: warning: array subscript 'struct r600_texture[0]' is partly outside array bounds of 'unsigned char[264]' [-Warray-bounds]
139 | if ((res->b.b.target != PIPE_BUFFER && !rtex->surface.is_linear) ||
| ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../src/util/os_memory.h:37,
from ../src/util/u_memory.h:38,
from ../src/gallium/include/pipe/p_state.h:47,
from ../src/gallium/auxiliary/util/u_inlines.h:34,
from ../src/gallium/auxiliary/pipebuffer/pb_buffer.h:49,
from ../src/gallium/include/winsys/radeon_winsys.h:46,
from ../src/gallium/drivers/r600/r600_pipe_common.h:37,
from ../src/gallium/drivers/r600/r600_cs.h:33,
from ../src/gallium/drivers/r600/r600_buffer_common.c:27:
In function 'r600_alloc_buffer_struct',
inlined from 'r600_buffer_create' at ../src/gallium/drivers/r600/r600_buffer_common.c:576:34:
../src/util/os_memory_stdc.h:41:27: note: object of size 264 allocated by 'malloc'
41 | #define os_malloc(_size) malloc(_size)
| ^~~~~~~~~~~~~
../src/util/u_memory.h:46:24: note: in expansion of macro 'os_malloc'
46 | #define MALLOC(_size) os_malloc(_size)
| ^~~~~~~~~
../src/util/u_memory.h:54:41: note: in expansion of macro 'MALLOC'
54 | #define MALLOC_STRUCT(T) (struct T *) MALLOC(sizeof(struct T))
| ^~~~~~
../src/gallium/drivers/r600/r600_buffer_common.c:554:19: note: in expansion of macro 'MALLOC_STRUCT'
554 | rbuffer = MALLOC_STRUCT(r600_resource);
| ^~~~~~~~~~~~~
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
This matches the type of the underlying size member, and is consistent
with other getSize methods.
Avoids compiler warning with LTO enabled:
In member function '__ct ',
inlined from 'convertToSSA' at ../src/nouveau/codegen/nv50_ir_ssa.cpp:401:26,
inlined from 'convertToSSA' at ../src/nouveau/codegen/nv50_ir_ssa.cpp:310:28,
inlined from 'nv50_ir_generate_code' at ../src/nouveau/codegen/nv50_ir.cpp:1331:22:
../src/nouveau/codegen/nv50_ir_ssa.cpp:407:48: error: argument 1 value '18446744073709551615' exceeds maximum object size 9223372036854775807 [-Werror=alloc-size-larger-than=]
407 | stack = new Stack[func->allLValues.getSize()];
| ^
/usr/include/c++/12/new: In function 'nv50_ir_generate_code':
/usr/include/c++/12/new:128:26: note: in a call to allocation function 'operator new []' declared here
128 | _GLIBCXX_NODISCARD void* operator new[](std::size_t) _GLIBCXX_THROW (std::bad_alloc)
| ^
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
This can have two main uses:
* If we suspect a problem with TFU copies, we can disable it and
check if other codepaths gets a test/app working.
* To test other codepaths, as in general, TFU is the preferred
option for copies.
Note that for now this is only for v3dv, as for v3d, mipmap generation
uses TFU without an alternative codepath.
With this option we also adds an assert if we try to submit a TFU job,
just in case we keep adding other methods that use TFU, and forget to
include the debug option there.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21952>
the tricky part of tracking renderpass info in tc is handling batch
flushing. there are a number of places that can trigger it, but
there are only two types of flushes:
* full flushes (all commands execute)
* partial flushes (more commands will execute after)
the latter case is the important one, as it effectively means that
the current renderpass info needs to "roll over" into the next one,
and it's really the next info that the driver will want to look at.
this is made trickier by there being no way (for the driver) to distinguish
when a rollover happens in order to delay beginning a renderpass for
further parsing
to solve this, add a member to renderpass info to chain the rolled-over info,
which tc can then process when the driver tries to wait. this works "most"
of the time, except when an app/test blows out the tc batch count, in which
case this pointer will be non-null, and it can be directly signaled as a less
optimized renderpass to avoid deadlocking
also sometimes a flush will trigger sub-flushes for buffer lists, so
add an assert to ensure nobody tries using this with driver_calls_flush_notify=true
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
this enables some more under-the-hood changes without touching the header
that will force all of gallium to be recompiled
also update/clarify rules for using rp tracking; these haven't changed,
but the documentation was less clear before
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
this value is -1 by default, which means the initial allocation yields
9 info structs instead of 10 (though this has no bearing on functionality)
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
drivers should be maintaining a local copy of the rp info, and this
provides a consistent way to reset that info if a renderpass is ended
early
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
it's expected that a driver won't immediately trigger a deferred flush
if a fence is present, so don't split the renderpass in this case since
that breaks everything
Fixes: 07017aa137 ("util/tc: implement renderpass tracking")
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
not sure if this actually affects anything, but if a renderpass has
to be split for some reason, ensure subsequent renderpasses don't lose
data
also ensure that rp data isn't lost when triggering primgen clears and
delete a now-invalid assert
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
This fixes the behavior of register coalesce in cases where the source
of a copy is written elsewhere in the program by a force_writemask_all
instruction, which could cause the overwrite to be executed for an
inactive channel under non-uniform control flow, causing
can_coalesce_vars() to give incorrect results. This has been reported
in cases like:
> while (true) {
> x = imageSize(img);
> if (non_uniform_condition()) {
> y = x;
> break;
> }
> }
> use(y);
Currently the register coalesce pass would coalesce x and y in the
example above, which is invalid since in the example above imageSize()
is implemented as a force_writemask_all SEND message, whose result is
broadcast to all channels, so when a given channel executes 'y = x'
and breaks out of the loop, another divergent channel can execute a
subsequent iteration of the loop overwriting 'x' with a different
value, hence coalescing y and x into the same register changes the
behavior of the program.
Note that this is a regression introduced by commit a4b36cd3dd. In
order to avoid the problem without reverting that patch, we prevent
register coalesce if there is an overwrite of the source with
force_writemask_all behavior inconsistent with the copy and this
occurs anywhere in the intersection of the live ranges of source and
destination, even if it occurs lexically before the copy, since it
might be physically executed after the copy under divergent loop
control flow.
Fixes: a4b36cd3dd ("intel/fs: Coalesce when the src live range is contained in the dst")
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
This fixes the behavior of copy propagation in cases where either the
source or destination of an ACP is overwritten elsewhere in the
program by a force_writemask_all instruction, which could cause the
overwrite to be executed for an inactive channel under non-uniform
control flow, causing the current per-channel dataflow propagation to
give incorrect results. This has been reported in cases like:
> while (true) {
> x = imageSize(img);
> if (non_uniform_condition()) {
> y = x;
> break;
> }
> }
> use(y);
Currently the copy propagation pass would propagate copy 'y = x' into
'use(y)', which is invalid since in the example above imageSize() is
implemented as a force_writemask_all SEND message, whose result is
broadcast to all channels, so when a given channel executes 'y = x'
and breaks out of the loop, another divergent channel can execute a
subsequent iteration of the loop overwriting 'x' with a different
value, hence replacing 'y' with 'x' at 'use(y)' changes the behavior
of the program.
This patch extends the global dataflow analysis algorithm to determine
whether there is any control flow path from a given copy to an
overwrite of its source or destination which has force_writemask_all
behavior inconsistent with the copy, and in such case prevents copy
propagation for that ACP entry at any point of the program which can
be reached from the overwrite, even if the copy is statically
re-executed along all such control flow paths (as in the example
above), since the execution of the overwrite for a given channel i may
corrupt other channels j!=i inactive for the subsequently re-executed
copy.
Note that a simpler solution has been attempted which fully shuts down
copy propagation if such a force_writemask_all ACP overwrite is
present /anywhere/ in the program regardless of its location in the
control flow graph, however that led to large shader-db regressions in
some programs from shader-db (like a CS from Car Chase which would
emit 53% more instructions). With this solution the only handful of
shaders that suffer instruction count regressions seem to be getting
misoptimized right now (e.g. some compute shaders from Deus Ex
Mankind). This solution doesn't seem to affect the run-time of
shader-db significantly, it's less than 1% higher with the fix
applied.
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
force_writemask_all determines whether all channels of the copy are
actually valid, and may be required to be set for it to be propagated
safely in cases where the destination of the copy is used by another
force_writemask_all instruction, or when the copy occurs in a
divergent control flow block different from its use.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
Commit 28311f9d02 moved ufind_msb lowering to NIR and started emitting
uclz. Unfortunately, the vec4 backend never actually implemented uclz.
It's trivial to do. Now it does.
Fixes: 28311f9d02 ("nir: intel/compiler: Move ufind_msb lowering to NIR")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21974>
Ahem, "add builder helpers that work on Gfx7"...now might actually work.
Too much copy and paste...
Fixes: 966995d911 ("intel/fs: Add builder helpers for F32TO16/F16TO32 that work on Gfx7.x")
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21974>
generate_tex() asserts that sampler_index.type == UD, but commit
83fd7a5ed1 removed the uint temporary, which caused us to see D at
some points. Really, either should be fine, but let's just put the
UD retype back. This fixes a ton of things in crocus.
Fixes: 83fd7a5ed1 ("intel: Use nir_lower_tex_options::lower_index_to_offset")
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21974>
tc fence lifetimes can exceed the lifetimes of their parent contexts,
which means they can be destroyed after mfence->fence has been destroyed
to avoid invalid memory access on a destroyed fence, store all the assigned
tc fences into an array on the real fence and then use that to unset fence
pointers on any outstanding tc fences
fixes flakiness in dEQP-EGL.functional.sharing.gles2.multithread.random_egl_sync.images.texsubimage2d.12
in caselist:
dEQP-EGL.functional.query_context.get_current_surface.rgba4444_pbuffer
dEQP-EGL.functional.create_surface.platform_window.rgba5551_depth_no_stencil
dEQP-EGL.functional.query_surface.simple.pbuffer.rgb888_depth_no_stencil
dEQP-EGL.functional.color_clears.multi_context.gles2.rgb888_pixmap
dEQP-EGL.functional.color_clears.multi_context.gles1_gles2.rgba8888_window
dEQP-EGL.functional.color_clears.multi_context.gles1_gles2_gles3.rgb888_window
dEQP-EGL.functional.render.multi_thread.gles2_gles3.rgba5551_pbuffer
dEQP-EGL.functional.sharing.gles2.multithread.random_egl_sync.buffers.buffersubdata.3
dEQP-EGL.functional.sharing.gles2.multithread.random_egl_sync.programs.link.6
dEQP-EGL.functional.sharing.gles2.multithread.random_egl_sync.images.texsubimage2d.12
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21843>
This helps pass the mesh shader I/O tests.
Swizzled buffer addressing seems to be broken on GFX11
when the idxen bit is 0.
No Fossil DB changes on Rembrandt (GFX10.3).
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21930>
Now that we have stub sync support in the submission API, we can
implement the batch tracking changes required to support an explicit
sync world. This excludes the UAPI-specific bits (command decoding and
status parsing).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21620>
When a BO is exported, implicit sync convention requires that writers
signal a fence on the object when complete. We already do this for BOs
that are *already* exported, but it is possible for a BO to be written
to, then exported for the first time.
Add a field to agx_bo to keep track of the current writer syncobj
handle. On first export, we use this to import it into the DMA-BUF.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21620>
We don't want a writer hash table with persistent pointers to resources, because
the resources could be freed without the hash table being updated (even though
the underlying BO will not be freed until it's ready). To avoid the reference
count hell, do away with the pointer hash table and instead use a flat dynarray
for mapping BO (handles) to writer (batch indices).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21620>
This introduces tracking of the required semaphore values in pipelines,
which is then propagated to cmd_buffers on bind. Each queue also keeps
track the maximum count it has waited for, so that we can avoid the waiting
overhead once all the shaders are loaded and referenced.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16271>
Following PAL's implementation, this patch avoids allocating shader code
buffers in BAR and use SDMA to upload them to invisible VRAM
directly.
For some games like HZD, shaders can take as much as 400MB, which exceeds
the non-resizable BAR size (256MB) and cause inconsistent spilling
behavior. The kernel will normally move these to invisible VRAM on its own,
but there are a few cases that it does not reliably happen. This patch does
the moving explicitly in the driver to ensure predictable results.
In this patch, we upload the shaders synchronously; so the shader will be
ready as soon as vkCreate*Pipeline returns. A following patch will make
this asynchronous and don't block until we see a use of the pipeline.
As a side effect, when SQTT is used we now store the shaders on a cacheable
buffer which would speed up writing the trace to the disk.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16271>
For consistency with the sdma_copy_buffer helper that will be added next.
As a general justification, SDMA commands require little state tracking and
using radeon_cmdbuf makes it more suitable for driver internal use.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16271>
Set TEDMODE_RR_STRICT when TEEnable is set.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21899>
Set TEDMODE_RR_STRICT when TEEnable is set.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21899>
Avoid all the issues of having to keep them in sync, and few-enough
people (read: probably no-one ever) will be working on the asahi driver
from a Windows machine, so symlinks can be relied upon, especially for
something optional like automatic code formatting.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21951>