pipeline_flags was 64-bit yet only the first 4 bytes were hashed.
Luckily, the mask included no flag above the 32nd bit, so this was
technically working fine. Still, it's better to use explicit sizeof
constructs to be more resilient to accidental type changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26145>
spec@arb_shader_image_load_store@coherency will write to coherent
image in tess shader and read it in fragmant shader. There is a
geometry shader in between.
When lower ngg for the geometry shader, it will wait memory writes
before pos0 export if there's no param output to prevent fragment
shader start early and read any previous memory writes.
We need to update the memory writes info of GS with ES ones because
ES and GS is merged into one shader but when nir they are separated.
LLVM does not have this problem because it will add memory write
wait at the beginning of GS automatically.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26122>
The allocator passed to VkDevice won't be available once it is destroyed
and thefore it cannot be used to allocate `object_name` for instance
level objects such as `VkInstance` or `VkPhysicalDevice` or else there
would be no way of deallocating it when those objects are destroyed.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26085>
The current name doesn't cover all the tex related instructions and
in all usages, we already have a switch statement to dispatch
per instruction type, so is more natural to list the instructions we
care there.
In fs::is_send_from_grf() we can simply ignore them since the
instructions are either lowered directly to SEND (Gfx7+) or use
MRF (Gfx6-).
With this change, the fs_inst::size_read() generated code gets
simplified (the "tex" entries get added to the switch jump table
in gcc) and the default case loses the conditional handling tex.
This reduces shader compilation time, as illustrated by replaying
fossils (tested on my TGL laptop):
```
// Rise of the Tomb Raider (N=13)
Difference at 95.0% confidence
-1.32231 +/- 0.0170138
-4.37605% +/- 0.0563054%
(Student's t, pooled s = 0.0210159)
// Cyberpunk 2077 (N=7)
Difference at 95.0% confidence
-3.64 +/- 0.114993
-2.95188% +/- 0.0932544%
(Student's t, pooled s = 0.09873)
```
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25721>
KHR_present_id and KHR_present_wait were set in get_device_extensions() but uninitialized
in tu_get_features(). This causes presentId and presentWait to be false at all times. Fix it.
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-Authored-By: Xilin Wu <wuxilin123@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26097>
v3d driver will implicitly clear the buffer's content on the first write
operation. This clearing operation is helpful for allocated buffers,
initializing them with zeros instead of having memory garbage.
Also, this avoids reading the buffer from the RAM to the GPU cache
before rendering, making the first write operation slightly faster.
The clearing operation should not happen for imported buffers where
the buffer may already contain valuable data and the user may want
to render into the buffer only partially.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26136>
Android.mk rules for radeonsi are updated according to commit
0a56417 "meson: be able to build radeonsi without llvm"
cflag -DFORCE_BUILD_AMDGPU is required when building radeonsi with llvm support
based on android-x86 downstream LLVM fork that follows the AOSP llvm build rules.
Reviewed-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26049>
We have an optimization to try to swap regular live intervals with
killed sources when evicting them fails in order to make a contiguous
space for the destination to fit in, but this doesn't work when the
destination is early-clobber.
Fixes
dEQP-GLES31.functional.synchronization.inter_invocation.image_atomic_read_write
on a650+.
Fixes: d4b5d2a ("ir3/ra: Use killed sources in register eviction")
Closes: #8886
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26004>
Not supported according to the docs and will trigger an assert
isl_get_render_compression_format().
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26112>
The Talos Principle VR shares the same engine quirk as its non-VR counterpart.
Backport-to: 23.2
Backport-to: 23.3
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26047>
I was playing around with possible improvements to STACK_ARRAY(), and
one of my experiments made gcc point us that we were not freeing
'stages'.
Fixes: 514c10344e ("vulkan/meta: Add a concept of rect pipelines")
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26041>
There is close to zero work needed to execute this job.
Should speed up the initial process of entering into pipeline tree
and also provide an opportunity for `aarch64` runners to engage sooner,
even when x86_64 machines are loaded.
Acked-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26125>
As a follow-up to
8cfc17bdda ("kmsro: Add the rest of the current set of tinydrm drivers.")
and
0a42d5b98b ("kmsro: add _dri.so to two of the kmsro drivers.")
add even more TinyDRM drivers that have been added to the kernel but not
to gallium.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26129>
We previously used the 2d path for single-sampled copies which seems to
canonicalize NaNs when the source format is a 16-bit floating point
format, likely because it implicitly converts to 32 bits. The current 3d
path also implicitly converts and has the same problem. Add a new shader
variant for half-float copies and switch to using it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26042>
With the removal of DRM_IOCTL_XE_MMIO xe_gem_read_render_timestamp()
was always returning false but with DRM_XE_DEVICE_QUERY_ENGINE_CYCLES
it can be re implemented making use of
xe_gem_read_correlate_cpu_gpu_timestamp().
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24591>
The Intel Xe driver added the ability to do cpu/gpu timestamp
correlation giving a much better alignment of timestamps (we use to
have ~20us delta between the 2 samples, just because of the ioctl
barrier potentially sneaking in some work).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24591>
Add anv_get_default_cpu_clock_id() to return the default cpu clock
id to be used in the begin and end time captures of
anv_GetCalibratedTimestampsEXT().
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24591>
This function will make use of Xe DRM_XE_DEVICE_QUERY_ENGINE_CYCLES by
returning correlate CPU ang GPU timestamp to be used by Intel drives.
This correlate timestamps gives us more accuracy.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24591>
This is lowered to 32-bit integer execution type by the regioning
lowering pass now, so the existing special casing is redudant for
Gfx12 and buggy for Xe2+, since SEL_EXEC is now emitted without
lowering for 64-bit integers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>
Extended math instructions are now synchronized as in-order
instructions like other ALU operations, which is more efficient than
the out-of-order tracking we had to do in previous generations, and
avoids false dependencies introduced due to SBID aliasing.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>