NVIDIA hardware can process tall or wide videos, but not both at the
same time (for some gens). This limit is provided in units of
macroblocks.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10677>
The hardware has no support for 3d image loads/stores. So present the
image as a larger 2d image and fudge the coordinates. Note that a 2d
image (in the shader) may be backed by a slice of a 3d image, so we
always have to do the coordinate adjustments for 2d as well.
This is largely copied from the nv50 support, which has the same
restriction, with extra care taken to differentiate loads (which
specifies the X coordinate in bytes) and stores, which specifies it in
(formatted) pixels.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10820>
Prior to an earlier commit, xfb queries were not being marked as 64-bit.
The end result of this is that they would never appear to be "ready",
which in turn led to there always being a wait happening.
Once these got marked as 64-bit, we started checking the attached fence
for being signalled. However the screen fence does not seem to be enough
to wait for the streamout query data to actually be written out. So
instead we add a bit of extra "data" which emulates the 32-bit query way
of doing things (with the payload in front) which is emitted from the
same "unit" as the other streamout data. This seems to be sufficient.
Note that it does not seem to be required to actually emit the final
32-bit query from the streamout unit, but that seems logical and perhaps
there are edge cases where it is required.
While at it, also make the sequence management/initialization more
similar to the nvc0 driver.
Fixes dEQP-GLES3.functional.transform_feedback.*
Fixes: 58d47ca324 ("nv50: add compute invocations counter")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10867>
Fix defect reported by Coverity Scan.
Side effect in assertion (ASSERT_SIDE_EFFECT)
assignment_where_comparison_intended: Assignment deviceMask = 1U
has a side effect. This code will work differently in a non-debug
build.
Fixes: 234e1b7356 ("v3dv: implement VK_KHR_device_group")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11197>
On AGX, the special register for front facing is inverted from its meaning in
APIs. We need to lower load_front_face to inot(load_back_face). Doing this in
the backend is trivial, but then we would miss out on algebraic optimizations
for the inot.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11199>
In order to simplify main DSI host database, split away phy register
definitions used on DSI v2 hosts to the separate database file.
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11075>
Somehow fairly recently the traces CI job started hitting timeouts, not
all the time but enough to be inconvenient for CI. I tracked it down to
getting into a situation where `ctx->batch->flush == true`, which causes
an infinite loop in the draw_vbo and clear paths (because
fd_batch_lock_submit() checks for flushed batch but fd_context_batch()
does not). I'm not entirely sure how we get into that state, or what
triggered this (seems possibly triggered by !10937). But it is easy
enough to recover.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11196>
DRI3 needs version 4 of __DRI2_FLUSH.
Straight up port of i965 commit 313f2bc32b ("intel: Add
support for the new flush_with_flags extension.").
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9734>
DRI3 needs __DRI_IMAGE_ATTRIB_OFFSET so implement it.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9734>
The format enum space isn't necessarily contiguous so we can't assume
that if it's in the table it's valid. We need to check something.
Fixes: ed6e586562 "intel: properly constify isl_format_layouts"
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11191>
Now that we have an idea of how many regs the conflicting allocation uses,
we can just skip to the next one and save repeated tests to find the same
conflicting neighbor again.
shadowrun-returns shader-db time on skl -1.62821% +/- 1.58079% (n=679),
now there's no statistically significant change from the start of the series
(n=420)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
By using the new class type, we don't need to make 1928 different
registers to represent each contigous reg size starting from the actual
128 HW register, or have a mapping between RA regs and HW base regs. With
the number of regs reduced, and the fast q computation when using the new
classes, we no longer need to compute our own q.
This drops the FS RA initialization time on my CFL system from about 1ms to
50us.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
In the fully general case of register classes, to expose an allocation
class of unaligned 2-contiguous-regs allocations, for example, you'd have
your base individual regs (128 on intel), and another set of 127 regs that
each conflicted with the corresponding pair of the base regs. Single-reg
nodes would allocate in the 128, and double-reg nodes would allocate in
the 127 and the user would remap from the 127 down to the base regs with
some irritating table.
If you need many different contiguous allocation sizes (16 is a pretty
common number across drivers), your number of regs explodes, wasting
memory and making the q computation expensive at startup.
If all the user has is contiguous-reg classes, we can easily compute the q
value up front (as found in the intel driver and nouveau, for example),
and we only have to change a couple of places in the conflict-checking
logic so the contiguous-reg classes can use the base registers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
Putting a const char * in the struct means it's a pointer that has to be
resolved at rtld time, which means it can be in .data.rel.ro but not
.rodata like you'd hope. Fix this with the usual string table trick.
Cuts about 20k (-80k read-write +60k read-only) and ~280 relocations
from the gallium driver.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11168>
Nothing in llvmpipe uses util_fast_log2().
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11173>
This was bumped in 7e584a70c4 ("gallium: increase table size for fast
log/pow functions") presumably to fix conformance of tgsi_exec, but we
don't need that much accuracy in the only place it's used in the tree any
more: softpipe texture sampling.
softpipe glmark2 -b texture:texture-filter=linear FPS +0.335748% +/-
0.220111% (n=20)
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11173>
It's disabled due to non-conformance with no configuration knob to turn it
on, and if you care about swrast performance you're on llvmpipe anyway.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11173>
This improves viewperf performance and it shouldn't break synchronization
with external clients when it's indirectly implied by glFlush.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10937>
this is allowed for fb attachments, so we can use it to avoid needing to
change layouts for zs textures if we know that it isn't going to be written
to during a given subpass
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11130>
This will be used by SPM and also for configuring the trap handler.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11128>