mirror of
https://gitlab.freedesktop.org/mesa/mesa.git
synced 2024-11-23 18:24:13 +08:00
docs: Add some documentation of game GL buffer object mapping behavior.
There are a variety of paths that apps take (this is by no means a complete enumeration; I tried to keep going until I saw repeats but eventually ran out of steam), and it should be useful to driver developers writing their pipe_transfer_map() and invalidate_resource() calls to see a bunch of the patterns without having to do performance debugging on each app.

Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9231>
parent 71e8141503
commit a2a8c6a36c

docs/gallium/buffermapping.rst (new file, 414 lines)

@@ -0,0 +1,414 @@
Buffer mapping patterns
-----------------------

A driver has two main strategies for providing CPU access to GL buffer
objects. The first is for the GL calls to allocate temporary storage and blit
it to the GPU at
``glBufferSubData()``/``glBufferData()``/``glFlushMappedBufferRange()``/``glUnmapBuffer()``
time. Because the blit executes in sequence with the surrounding draws, it is
easy to make the behavior match what GL requires. However, this may be more
costly than direct mapping of the GL BO on some platforms, and is essentially
not available to tiling GPUs (since tiling involves running through the command
stream multiple times). The second strategy is to map the GL BO directly, and
GL has additional interfaces that let apps access the memory directly while
avoiding implicit blocking on GPU rendering from those BOs.

Rendering engines have a variety of knobs to set on those GL interfaces for data
upload, and as a whole they seem to take just about every path available. Let's
look at some examples to see how they might constrain GL driver buffer upload
behavior.

Portal 2
========

.. code-block:: console

   1030842 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)
   1030876 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 65536, data = NULL, usage = GL_DYNAMIC_DRAW)
   1030877 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, size = 576, data = blob(576))
   1030896 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 526, count = 252, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
   1030915 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 19657, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x1f8, basevertex = 0)
   1030917 glBufferDataARB(target = GL_ARRAY_BUFFER, size = 1572864, data = NULL, usage = GL_DYNAMIC_DRAW)
   1030918 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 128, data = blob(128))
   1030919 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 576, size = 12, data = blob(12))
   1030936 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x240, basevertex = 0)
   1030937 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 128, size = 128, data = blob(128))
   1030938 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 588, size = 12, data = blob(12))
   1030940 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 4, end = 7, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x24c, basevertex = 0)
   [... repeated draws at increasing offsets]
   1033097 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)

From this sequence, we can see that it is important that the driver either
implements ``glBufferSubData()`` as a blit from a streaming uploader in sequence
with the ``glDraw*()`` calls (a common behavior for non-tiled GPUs, particularly
those with dedicated memory), or that you do both of the following:

1) Track the valid range of the buffer so that you don't have to flush the draws
   and synchronize on each following ``glBufferSubData()``.

2) Reallocate the buffer storage on ``glBufferData()`` so that your first
   ``glBufferSubData()`` of the frame doesn't stall on the last frame's
   rendering completing.

You can't just empty your valid range on ``glBufferData()`` unless you know that
the GPU access from the previous frame has completed. This pattern of
incrementing ``glBufferSubData()`` offsets interleaved with draws from that data
is common among newer Valve games.

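The two requirements above can be sketched as follows. This is a minimal model, assuming a hypothetical ``struct toy_buffer`` that keeps a single valid range and a busy flag; every name here is illustrative and none of it is Mesa API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver-side buffer state; not a Mesa structure. */
struct toy_buffer {
    size_t valid_start, valid_end; /* bytes written so far: [start, end) */
    bool gpu_busy;                 /* queued or in-flight GPU work reads it */
    unsigned storage_id;           /* stands in for the backing allocation */
};

/* glBufferData(NULL): swap in fresh storage rather than wait for the GPU. */
static void toy_buffer_data_null(struct toy_buffer *buf)
{
    if (buf->gpu_busy) {
        buf->storage_id++; /* old storage is released once the GPU is done */
        buf->gpu_busy = false;
    }
    buf->valid_start = buf->valid_end = 0;
}

/* glBufferSubData(): returns true if the write has to flush and wait. */
static bool toy_buffer_sub_data(struct toy_buffer *buf, size_t offset, size_t size)
{
    assert(size > 0);
    bool overlaps = offset < buf->valid_end && offset + size > buf->valid_start;
    bool must_sync = buf->gpu_busy && overlaps;

    if (must_sync)
        buf->gpu_busy = false; /* after the wait, nothing is in flight */

    /* Grow the single valid range so that it covers the new data. */
    if (buf->valid_start == buf->valid_end) {
        buf->valid_start = offset;
        buf->valid_end = offset + size;
    } else {
        if (offset < buf->valid_start)
            buf->valid_start = offset;
        if (offset + size > buf->valid_end)
            buf->valid_end = offset + size;
    }
    return must_sync;
}

/* A draw that sources the buffer makes it busy again. */
static void toy_draw(struct toy_buffer *buf)
{
    buf->gpu_busy = true;
}
```

Replaying the first array-buffer uploads from the trace against this model, the writes at offsets 0 and 128 never wait, while a write back to offset 0 after a draw would.
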
.. code-block:: console

   [ during setup ]

   679259 glGenBuffersARB(n = 1, buffers = &1314)
   679260 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
   679261 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 3072, data = NULL, usage = GL_STATIC_DRAW)
   679264 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
   679269 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072)
   679270 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE

   [... setup of other buffers on this binding point]

   679343 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
   679344 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
   679346 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
   679347 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
   679348 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 768, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384300
   679350 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
   679351 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
   679352 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 1536, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384600
   679354 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
   679355 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
   679356 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 2304, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384900
   679358 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
   679359 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE

   [... setup completes and we start drawing later]

   761845 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
   761846 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 323, count = 384, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)

This suggests that non-blitting drivers could save a bunch of additional GPU
stalls during setup by resetting their "might be used on the GPU" range after a
stall, since the wait has already left the buffer idle.

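A small model of that suggestion, assuming a hypothetical ``struct toy_busy_range`` and assuming that a stall waits for all outstanding GPU work on the buffer. It counts stalls for a run of synchronized maps with and without the reset; the names are illustrative, not Mesa API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tracking of the bytes the GPU might still be using. */
struct toy_busy_range {
    size_t start, end; /* [start, end); empty when start == end */
};

/* Synchronized map of [offset, offset + length): returns true if it stalls.
 * With reset_after_stall set, the range is emptied once the wait is over,
 * on the assumption that the wait covered all GPU work on the buffer. */
static bool toy_map_sync(struct toy_busy_range *busy, size_t offset,
                         size_t length, bool reset_after_stall)
{
    assert(length > 0);
    bool stall = offset < busy->end && offset + length > busy->start;
    if (stall && reset_after_stall)
        busy->start = busy->end = 0;
    return stall;
}

/* Counts stalls for `count` back-to-back maps of `length` bytes each,
 * starting with bytes [0, busy_bytes) marked as possibly in use. */
static unsigned toy_count_stalls(size_t busy_bytes, size_t length,
                                 unsigned count, bool reset_after_stall)
{
    struct toy_busy_range busy = { 0, busy_bytes };
    unsigned stalls = 0;
    for (unsigned i = 0; i < count; i++)
        stalls += toy_map_sync(&busy, i * length, length, reset_after_stall);
    return stalls;
}
```

With the sizes from the setup trace (four 768-byte maps into a 3072-byte buffer that is marked busy), the model stalls on every map without the reset and only on the first map with it.
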
Terraria
========

.. code-block:: console

   167581 glXSwapBuffers(dpy = 0x3004630, drawable = 25165844)

   167585 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
   167586 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 1728, data = blob(1728))
   167588 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 71, count = 108, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
   167589 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
   167590 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 27456, data = blob(27456))
   167592 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 7, count = 12, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
   167594 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 8)
   167596 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 12)
   [...]

In this game, ``glBufferData()`` is called on the same array buffer before each
upload to get new storage, so that the following ``glBufferSubData()`` doesn't
cause synchronization.

Don't Starve
============

.. code-block:: console

   7251917 glGenBuffers(n = 1, buffers = &115052)
   7251918 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
   7251919 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
   7251921 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
   7251928 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
   7251930 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 114872)
   7251936 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 18)
   7251938 glGenBuffers(n = 1, buffers = &115053)
   7251939 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
   7251940 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
   7251942 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
   7251949 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
   7251973 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)
   [... drawing next frame]
   7252388 glDeleteBuffers(n = 1, buffers = &115052)
   7252389 glDeleteBuffers(n = 1, buffers = &115053)
   7252390 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)

In this game we have a lot of tiny ``glBufferData()`` calls (144 bytes each in
this trace), suggesting that we could see working set wins and possibly CPU
overhead reduction by packing small GL buffers in the same BO. Interestingly,
the deletes of the temporary buffers always happen at the end of the next frame.

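One way to pack small GL buffers into a shared BO is a bump sub-allocator. The following is only a sketch under assumed parameters: ``TOY_SLAB_SIZE``, ``TOY_SLAB_ALIGN`` and ``struct toy_slab`` are hypothetical, and a real driver would also have to keep the slab alive until the GPU has finished with every sub-allocation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLAB_SIZE  4096u /* size of the shared BO; arbitrary for the sketch */
#define TOY_SLAB_ALIGN 64u   /* sub-allocation alignment; hardware dependent */

/* Hypothetical shared BO that small GL buffers are carved out of. */
struct toy_slab {
    size_t next_free; /* bump pointer into the slab */
    unsigned live;    /* number of sub-allocations still in use */
};

/* Returns the offset of the new sub-allocation, or (size_t)-1 if the request
 * does not fit and the caller should fall back to a dedicated BO. */
static size_t toy_slab_alloc(struct toy_slab *slab, size_t size)
{
    assert(size > 0);
    size_t offset = (slab->next_free + TOY_SLAB_ALIGN - 1) &
                    ~(size_t)(TOY_SLAB_ALIGN - 1);
    if (size > TOY_SLAB_SIZE || offset > TOY_SLAB_SIZE - size)
        return (size_t)-1;
    slab->next_free = offset + size;
    slab->live++;
    return offset;
}

/* glDeleteBuffers(): the slab is recycled once its last user is gone. */
static void toy_slab_free(struct toy_slab *slab)
{
    assert(slab->live > 0);
    if (--slab->live == 0)
        slab->next_free = 0;
}
```

With the 144-byte uploads from the trace, two consecutive buffers would land at offsets 0 and 192 of the same slab instead of in two separate BOs.
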
Euro Truck Simulator
====================

.. code-block:: console

   [usage of VBO 14,15]
   [...]
   885199 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
   885203 glInvalidateBufferData(buffer = 14)
   885204 glInvalidateBufferData(buffer = 15)
   [...]
   889330 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
   889334 glInvalidateBufferData(buffer = 12)
   889335 glInvalidateBufferData(buffer = 16)
   [...]
   893461 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
   893462 glClientWaitSync(sync = 0x77eee10, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
   893463 glDeleteSync(sync = 0x780a630)
   893464 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x78ec730
   893465 glInvalidateBufferData(buffer = 13)
   893466 glInvalidateBufferData(buffer = 17)
   893505 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
   893506 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1000
   893508 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
   893509 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 15)
   893510 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 32, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034e5df000
   893512 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
   893532 glBindVertexBuffers(first = 0, count = 2, buffers = {10, 15}, offsets = {0, 0}, strides = {52, 16})
   893552 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
   893609 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
   893732 glBindVertexBuffers(first = 0, count = 1, buffers = &14, offsets = &0, strides = &48)
   893733 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 14)
   893744 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0xf0, basevertex = 0)
   893759 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x2e0, basevertex = 6)
   893786 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 600, type = GL_UNSIGNED_SHORT, indices = 0xe87b0, basevertex = 21515)
   893822 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
   893845 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
   893846 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 788, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1314
   893848 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
   893886 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
   893943 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)

At the start of this frame, buffers 14 and 15 haven't been used in the previous
2 frames, and the ``GL_ARB_sync`` fence has ensured that the GPU has at least
started frame n-1 as the CPU starts the current frame. The first map is
``offset = 0, INVALIDATE_BUFFER | UNSYNCHRONIZED``. The invalidate flag suggests
that the driver should reallocate storage for the mapping even in the
``UNSYNCHRONIZED`` case. Here, however, the buffer is definitely going to be
idle, which makes reallocation unnecessary (you may still need to empty your
valid range, though, to prevent unnecessary batch flushes).

Also note the use of a totally unrelated binding point for the mapping of the
vertex array -- you can't effectively use it as a hint for any buffer placement
in memory. The game does also use ``glCopyBufferSubData()``, but only on a
different buffer.

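The reasoning about the first map can be written down as a small decision function. This is a simplified sketch: the ``TOY_*`` flags are hypothetical stand-ins for the GL access bits in the trace (not the real GL or Gallium constants), and ``gpu_busy`` stands for whatever the driver's own tracking says about outstanding GPU use of the buffer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for the sketch, mirroring the GL access bits. */
enum {
    TOY_MAP_WRITE             = 1 << 0,
    TOY_MAP_INVALIDATE_RANGE  = 1 << 1,
    TOY_MAP_INVALIDATE_BUFFER = 1 << 2,
    TOY_MAP_UNSYNCHRONIZED    = 1 << 3,
};

enum toy_map_action {
    TOY_ACTION_MAP_DIRECT, /* hand out a pointer to the current storage */
    TOY_ACTION_REALLOCATE, /* swap in fresh storage, then map it */
    TOY_ACTION_STALL,      /* wait for the GPU, then map */
};

struct toy_map_result {
    enum toy_map_action action;
    bool empty_valid_range; /* the old contents were discarded */
};

static struct toy_map_result toy_choose_map(unsigned access, bool gpu_busy)
{
    assert(access & TOY_MAP_WRITE);
    struct toy_map_result r = { TOY_ACTION_MAP_DIRECT, false };

    if (access & TOY_MAP_INVALIDATE_BUFFER) {
        /* Whole-buffer discard: never wait, and forget the old valid range
         * so later writes are not treated as touching data in use. */
        r.empty_valid_range = true;
        if (gpu_busy)
            r.action = TOY_ACTION_REALLOCATE;
    } else if (gpu_busy && !(access & TOY_MAP_UNSYNCHRONIZED)) {
        /* Simplification: a real driver would first check whether the
         * mapped range overlaps the valid range at all. */
        r.action = TOY_ACTION_STALL;
    }
    return r;
}
```

For the idle buffers in this trace, the function maps the existing storage directly and only empties the valid range; the reallocation path is taken only when the buffer is still considered busy.
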
Plague Inc
==========

.. code-block:: console

   1640732 glXSwapBuffers(dpy = 0xb218f20, drawable = 23068674)
   1640733 glClientWaitSync(sync = 0xb4141430, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
   1640734 glDeleteSync(sync = 0xb4141430)
   1640735 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0xb4141430

   1640780 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 78)
   1640787 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 79)
   1640788 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
   1640795 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
   1640813 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
   1640814 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4000
   1640815 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
   1640816 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998000
   1640817 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
   1640819 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
   1640820 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
   1640821 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
   1640823 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
   1640824 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
   1640825 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 1096)
   1640831 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1091)
   1640832 glDrawElements(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL)

   1640847 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
   1640848 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 352, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4160
   1640849 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
   1640850 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 88, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998058
   1640851 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
   1640853 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
   1640854 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
   1640855 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
   1640857 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
   1640858 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
   1640863 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x58, basevertex = 4)

At the start of this frame, the VBOs haven't been used in about 6 frames, and
the ``GL_ARB_sync`` fence has ensured that the GPU has started frame n-1.

Note the use of ``glFlushMappedBufferRange()`` on a small fraction of the size
of the VBO (352 of the 67584 mapped bytes for buffer 1096) -- it is important
that a blitting driver make use of the flush ranges when in explicit mode.

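To illustrate the cost difference for a blitting driver, here is a minimal sketch that only counts the bytes it would copy. ``struct toy_transfer`` and its helpers are hypothetical; the one GL fact relied on is that the offset passed to ``glFlushMappedBufferRange()`` is relative to the start of the mapped range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical staging transfer for a blitting driver. */
struct toy_transfer {
    size_t map_offset, map_length; /* range passed to glMapBufferRange() */
    bool flush_explicit;           /* GL_MAP_FLUSH_EXPLICIT_BIT was set */
    size_t bytes_blitted;          /* running total copied to the GPU buffer */
    size_t last_dst_offset;        /* buffer offset of the most recent blit */
};

/* Stand-in for the actual staging-to-buffer copy. */
static void toy_blit(struct toy_transfer *t, size_t dst_offset, size_t size)
{
    t->last_dst_offset = dst_offset;
    t->bytes_blitted += size;
}

/* glFlushMappedBufferRange(): offset is relative to the start of the map. */
static void toy_flush_region(struct toy_transfer *t, size_t offset, size_t length)
{
    assert(t->flush_explicit);
    assert(offset <= t->map_length && length <= t->map_length - offset);
    toy_blit(t, t->map_offset + offset, length);
}

/* glUnmapBuffer(): without explicit flushing, the whole mapped range is
 * copied; with it, the earlier flushes are the only data the app promised. */
static void toy_unmap(struct toy_transfer *t)
{
    if (!t->flush_explicit)
        toy_blit(t, t->map_offset, t->map_length);
}
```

For the second map of buffer 1096 in the trace (offset 352, length 67584, flush of 352 bytes), the explicit path copies 352 bytes to buffer offset 352, whereas blitting the mapped range would copy all 67584 bytes.
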
Darkest Dungeon
===============

.. code-block:: console

   938384 glXSwapBuffers(dpy = 0x377fcd0, drawable = 23068692)

   938385 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
   938386 glBufferData(target = GL_ARRAY_BUFFER, size = 1048576, data = NULL, usage = GL_STREAM_DRAW)
   938511 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
   938512 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
   938514 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 512)
   938515 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
   938523 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
   938524 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
   938525 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = NULL)
   938527 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
   938528 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
   938530 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 512, length = 512)
   938531 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
   938539 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
   938540 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
   938541 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x30)
   [... more maps and draws at increasing offsets]

An interesting note for this game: after the initial ``glBufferData()`` in the
frame to reallocate the storage, it maps the whole buffer unsynchronized each
time and just changes which region it flushes. The same GL buffer name is used
in every frame.

Tabletop Simulator
==================

.. code-block:: console

   1287594 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)
   1287595 glClientWaitSync(sync = 0x7abf554e37b0, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
   1287596 glDeleteSync(sync = 0x7abf554e37b0)
   1287597 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7abf56647490

   1287614 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
   1287615 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7abf2e79a000
   1287642 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 614)
   1287650 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 5)
   1287651 glBufferSubData(target = GL_COPY_WRITE_BUFFER, offset = 0, size = 1088, data = blob(1088))
   1287652 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 615)
   1287653 glDrawElements(mode = GL_TRIANGLES, count = 1788, type = GL_UNSIGNED_SHORT, indices = NULL)
   [... more draw calls]
   1289055 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
   1289057 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384)
   1289058 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
   1289059 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 480)
   1289066 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 12, count = 4)
   1289068 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 8, count = 4)
   1289553 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)

In this app, buffer 480 gets used like this every other frame. The
``GL_ARB_sync`` fence ensures that frame n-1 has started on the GPU before CPU
work starts on the current frame. Since the buffer was last used in frame n-2,
the unsynchronized access to it is safe.

Hollow Knight
=============

.. code-block:: console

   1873034 glXSwapBuffers(dpy = 0x28609d0, drawable = 23068692)
   1873035 glClientWaitSync(sync = 0x7b1a5ca6e130, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
   1873036 glDeleteSync(sync = 0x7b1a5ca6e130)
   1873037 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7b1a5ca6e130
   1873038 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
   1873039 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c7e000
   1873040 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
   1873041 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a07430000
   1873065 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
   1873067 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640)
   1873068 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
   1873069 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
   1873071 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720)
   1873072 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
   1873073 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
   1873074 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 8640, length = 576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c801c0
   1873075 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
   1873076 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 720, length = 72, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a074302d0
   1873077 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
   1873079 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 576)
   1873080 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
   1873081 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
   1873083 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 72)
   1873084 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
   1873085 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 29)
   1873096 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 30)
   1873097 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x2d0, basevertex = 240)

In this app, buffers 29 and 30 get used like this starting from offset 0 every
other frame. The ``GL_ARB_sync`` fence is used to make sure that the GPU has
reached the start of the previous frame before we go unsynchronized writing over
the n-2 frame's buffer.

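The frame-pacing argument can be modeled in a few lines. This is an illustrative sketch only: ``struct toy_pacing`` is hypothetical, frames are numbered from 1, and it assumes the app really does wait until the fence inserted at the start of frame n-1 has signaled before recording frame n.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the app's frame pacing. */
struct toy_pacing {
    unsigned cpu_frame;      /* frame the CPU is currently recording */
    unsigned gpu_done_frame; /* highest frame known complete on the GPU (0 = none) */
};

/* Start of frame n: wait on the fence inserted at the start of frame n-1.
 * That fence signals once everything submitted before it, which is all of
 * frame n-2 and earlier, has completed. */
static void toy_begin_frame(struct toy_pacing *p)
{
    p->cpu_frame++;
    if (p->cpu_frame >= 2 && p->gpu_done_frame < p->cpu_frame - 2)
        p->gpu_done_frame = p->cpu_frame - 2; /* models glClientWaitSync() */
}

/* An unsynchronized write is safe if the GPU has finished the last frame
 * that read the buffer. */
static bool toy_unsync_write_is_safe(const struct toy_pacing *p,
                                     unsigned last_used_frame)
{
    assert(last_used_frame >= 1);
    return last_used_frame <= p->gpu_done_frame;
}
```

In this model, a buffer last read in frame n-2 can be overwritten without synchronization during frame n, while a buffer read in frame n-1 cannot, which matches the every-other-frame reuse seen in the trace.
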
Borderlands 2
=============

.. code-block:: console

   3561998 glFlush()
   3562004 glXSwapBuffers(dpy = 0xbaf0f90, drawable = 23068705)
   3562006 glClientWaitSync(sync = 0x231c2ab0, flags = GL_SYNC_FLUSH_COMMANDS_BIT, timeout = 10000000000) = GL_ALREADY_SIGNALED
   3562007 glDeleteSync(sync = 0x231c2ab0)
   3562008 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x231aadc0

   3562050 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
   3562051 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1792, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xde056000
   3562053 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
   3562054 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1194)
   3562055 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1280, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xd9426000
   3562057 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
   [... unrelated draws]
   3563051 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
   3563064 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 875)
   3563065 glDrawElementsInstancedARB(mode = GL_TRIANGLES, count = 72, type = GL_UNSIGNED_SHORT, indices = NULL, instancecount = 28)

The ``GL_ARB_sync`` fence ensures that the GPU has started frame n-1 before the CPU
starts on the current frame.

This sequence of buffer uploads appears in each frame with the same buffer
names, so you do need to handle ``GL_MAP_INVALIDATE_BUFFER_BIT`` as a
reallocation if the buffer is GPU-busy (it wasn't in this trace capture) to
avoid stalls on the n-1 frame completing.

Note that this is just one small buffer. Most of the vertex data goes through a
``glBufferSubData()``/``glDraw*()`` path with the VBO used across multiple
frames, with a ``glBufferData()`` when it needs to wrap around.

Buffer mapping conclusions
--------------------------

* Non-blitting drivers must track the valid range of a freshly allocated buffer
  as it gets uploaded in ``pipe_transfer_map()`` and avoid stalling on the GPU
  when mapping an undefined portion of the buffer when ``glBufferSubData()`` is
  interleaved with drawing.

* Non-blitting drivers must reallocate storage on ``glBufferData(NULL)`` so that
  the following ``glBufferSubData()`` won't stall. That ``glBufferData(NULL)``
  call will appear in the driver as an ``invalidate_resource()`` call if
  ``PIPE_CAP_INVALIDATE_BUFFER`` is available. (If that capability is not set,
  then mesa/st will create a new ``pipe_resource`` for you.) Storage
  reallocation may be skipped if you for some reason know that the buffer is
  idle, in which case you can just empty the valid region.

* Blitting drivers must use the ``transfer_flush_region()`` region
  instead of the mapped range when ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid
  blitting too much data. (When that bit is unset, you just blit the whole
  mapped range at unmap time.)

* Buffer valid range tracking in non-blitting drivers must use the
  ``transfer_flush_region()`` region instead of the mapped range when
  ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid excess stalls.

* Buffer valid range tracking doesn't need to be fancy: "number of bytes
  valid starting from 0" is sufficient for all examples found.

* Use the ``pipe_debug_callback`` to report stalls on buffer mapping to ease
  debugging.

* Buffer binding points are not useful for tuning buffer placement (see all the
  ``GL_COPY_WRITE_BUFFER`` instances in the traces above); you have to track the
  actual usage history of a GL BO name. mesa/st does this for optimizing its
  state updates on reallocation in the ``!PIPE_CAP_INVALIDATE_BUFFER`` case, and
  if you set ``PIPE_CAP_INVALIDATE_BUFFER`` then you have to flag your own
  internal state updates (VBO addresses, XFB addresses, texture buffer
  addresses, etc.) on reallocation based on usage history.

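As a sketch of how little state the "number of bytes valid starting from 0" scheme needs, and of why the flushed region rather than the mapped range should feed it in explicit-flush mode, consider the following. ``struct toy_valid`` and its helpers are hypothetical names, not Mesa's range-tracking utilities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-buffer state: bytes [0, valid_bytes) hold data that the
 * GPU may be reading. */
struct toy_valid {
    size_t valid_bytes;
};

/* Would a write to [offset, offset + size) touch data the GPU may read?
 * With a range anchored at 0, only the start of the write matters. */
static bool toy_write_overlaps_valid(const struct toy_valid *v,
                                     size_t offset, size_t size)
{
    assert(size > 0);
    return offset < v->valid_bytes;
}

/* Record a completed write; call this with the flushed region when the map
 * used explicit flushing, and with the mapped range otherwise. */
static void toy_mark_valid(struct toy_valid *v, size_t offset, size_t size)
{
    assert(size > 0);
    if (offset + size > v->valid_bytes)
        v->valid_bytes = offset + size;
}
```

Using the Darkest Dungeon numbers, the second flush at offset 512 does not overlap the 512 valid bytes from the first, so nothing needs to wait. Testing the mapped range (offset 0, length 1048576) instead would report an overlap on every map after the first.
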
@@ -14,6 +14,7 @@ Contents:
format
context
cso
buffermapping
distro
postprocess
glossary