Move the NIR control flow out of the cull_small_primitive_triangle
function to make it more readable and consistent with the other functions.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31973>
Change the workgroup scan to be inclusive and adjust
the scalar operations after it.
This trades 1 VALU instruction for 2 SALU instructions, which is a win.
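The relationship between the two scan flavors can be sketched on the CPU in plain C. This is only an illustration, not the NIR code touched by this change; both function names are hypothetical. It shows one common way to adjust later arithmetic after switching to an inclusive scan: the exclusive value is the inclusive value minus the element's own input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: inclusive prefix sum,
 * out[i] = in[0] + ... + in[i]. */
static void
inclusive_scan_u32(const uint32_t *in, uint32_t *out, size_t count)
{
   uint32_t sum = 0;
   for (size_t i = 0; i < count; i++) {
      sum += in[i];
      out[i] = sum;
   }
}

/* Hypothetical helper: recover the exclusive result for one element
 * by subtracting its own input from the inclusive result. */
static uint32_t
exclusive_from_inclusive_u32(uint32_t inclusive, uint32_t own_value)
{
   return inclusive - own_value;
}
```

With an inclusive scan, the last element already holds the total of all inputs, so no extra addition is needed to obtain it.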
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31973>
dot_op would be dead code when v_dot instructions are unavailable.
It was originally added there because ACO didn't have an ILP
scheduler yet, but now it does, so let's trust it to do its job.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31973>
It should be enough to do this at the end of each submit instead.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31695>
Create a CS that contains just a cache flush and
can be used as a postamble in command submissions.
According to RadeonSI code, the kernel flushes L2
before shaders are finished on GFX6.
Previously, RADV always added a flush at the end of
each command buffer. The flush postamble should be
a less wasteful alternative to that.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31695>
Add the missing fs_user_dirty and PANVK_CMD_GRAPHICS_DIRTY_RENDER_STATE
checks.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32267>
Sort the dirty states and make it clear that we use
panvk_rendering_state.
Constify color_attachment_samples for panvk_per_arch(blend_emit_descs).
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32267>
There is no point in using UBWC for the last, small mip levels:
it adds memory overhead and is likely less performant.
Additionally, this change fixes multi-planar formats with `noubwc`.
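A per-level decision of this kind could look roughly like the sketch below. It is not the driver's code: `level_wants_ubwc` and its `min_dim` threshold are hypothetical, and the actual cutoff used by the driver is not stated in this message. Only the mip size arithmetic (halve per level, clamp to 1) is standard.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size of one dimension at a given mip level, clamped to 1.
 * The caller must pass level < 32. */
static uint32_t
minify(uint32_t size, uint32_t level)
{
   uint32_t v = size >> level;
   return v ? v : 1;
}

/* HYPOTHETICAL policy: keep UBWC only while both dimensions of the
 * level are at least min_dim texels. min_dim is an assumed parameter,
 * not a value taken from the driver. */
static bool
level_wants_ubwc(uint32_t width, uint32_t height, uint32_t level,
                 uint32_t min_dim)
{
   return minify(width, level) >= min_dim &&
          minify(height, level) >= min_dim;
}
```

For a 256x256 image and an assumed threshold of 16, levels 0 through 4 would keep UBWC under this policy and levels 5 and up would not.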
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31631>
Many D3D11 games write to 3D images from compute shaders.
Most such 3D images don't use mipmaps, in which case enabling
UBWC is trivial.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31631>
OpenCL has 64-bit global IDs, but for driver-internal OpenCL we only need
32-bit. Might as well lower in nir_lower_system_values instead of bringing up a
whole new pass just for this.
This will be used for asahi precomp.
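The difference between the two widths can be illustrated in plain C. This is a sketch of the arithmetic only, not of the NIR lowering itself; the function names are hypothetical, and the formula assumes no global offset.

```c
#include <stdint.h>

/* Hypothetical: global ID computed with 64-bit arithmetic,
 * as OpenCL defines it. */
static uint64_t
global_id_64(uint64_t workgroup_id, uint64_t workgroup_size,
             uint64_t local_id)
{
   return workgroup_id * workgroup_size + local_id;
}

/* Hypothetical: the same computation in 32-bit arithmetic.
 * It matches the 64-bit result whenever that result fits in
 * 32 bits, and wraps modulo 2^32 otherwise. */
static uint32_t
global_id_32(uint32_t workgroup_id, uint32_t workgroup_size,
             uint32_t local_id)
{
   return workgroup_id * workgroup_size + local_id;
}
```

The 32-bit form is only safe when the dispatch is known to be small enough, which is the case the message describes for driver-internal kernels.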
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32210>
@fullnopstart
some assembly instructions
@fullnopend
Similar to fullnop and fullsync IR3 dbg options, but useful for
bisecting the assembly via shader override to find the problematic
place.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32256>
- A650+ - should be able to support pipelineFragmentShadingRate,
  but in a different way than A7XX. Not implemented here.
- A7XX - supports pipelineFragmentShadingRate and attachmentFragmentShadingRate
- A740+ - supports primitiveFragmentShadingRate
layeredShadingRateAttachments is unsupported at the moment due to test
failures, although the proprietary driver supports it.
Passes:
 dEQP-VK.fragment_shading_rate.*
on A750/A740.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30905>
When the gfx version is >= 12, EFC should be enabled for
VCN5 and later; in that case DCC is transparent to the
VCN engine.
The previous condition for DCC no longer applies in
that case.
Reviewed-by: David Rosca <david.rosca@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32263>