mirror of
https://gitlab.freedesktop.org/mesa/mesa.git
synced 2024-12-05 08:14:57 +08:00
8cb8a3c03d
just remove () and * and the space Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19741>
210 lines
8.9 KiB
ReStructuredText
210 lines
8.9 KiB
ReStructuredText
GL Dispatch
|
|
===========
|
|
|
|
Several factors combine to make efficient dispatch of OpenGL functions
|
|
fairly complicated. This document attempts to explain some of the issues
|
|
and introduce the reader to Mesa's implementation. Readers already
|
|
familiar with the issues around GL dispatch can safely skip ahead to the
|
|
:ref:`overview of Mesa's implementation <overview>`.
|
|
|
|
1. Complexity of GL Dispatch
|
|
----------------------------
|
|
|
|
Every GL application has at least one object called a GL *context*. This
|
|
object, which is an implicit parameter to every GL function, stores all
|
|
of the GL related state for the application. Every texture, every buffer
|
|
object, every enable, and much, much more is stored in the context.
|
|
Since an application can have more than one context, the context to be
|
|
used is selected by a window-system dependent function such as
|
|
``glXMakeContextCurrent``.
|
|
|
|
In environments that implement OpenGL with X-Windows using GLX, every GL
|
|
function, including the pointers returned by ``glXGetProcAddress``, are
|
|
*context independent*. This means that no matter what context is
|
|
currently active, the same ``glVertex3fv`` function is used.
|
|
|
|
This creates the first bit of dispatch complexity. An application can
|
|
have two GL contexts. One context is a direct rendering context where
|
|
function calls are routed directly to a driver loaded within the
|
|
application's address space. The other context is an indirect rendering
|
|
context where function calls are converted to GLX protocol and sent to a
|
|
server. The same ``glVertex3fv`` has to do the right thing depending on
|
|
which context is current.
|
|
|
|
Highly optimized drivers or GLX protocol implementations may want to
|
|
change the behavior of GL functions depending on current state. For
|
|
example, ``glFogCoordf`` may operate differently depending on whether or
|
|
not fog is enabled.
|
|
|
|
In multi-threaded environments, it is possible for each thread to have a
|
|
different GL context current. This means that poor old ``glVertex3fv``
|
|
has to know which GL context is current in the thread where it is being
|
|
called.
|
|
|
|
.. _overview:
|
|
|
|
2. Overview of Mesa's Implementation
|
|
------------------------------------
|
|
|
|
Mesa uses two per-thread pointers. The first pointer stores the address
|
|
of the context current in the thread, and the second pointer stores the
|
|
address of the *dispatch table* associated with that context. The
|
|
dispatch table stores pointers to functions that actually implement
|
|
specific GL functions. Each time a new context is made current in a
|
|
thread, these pointers are updated.
|
|
|
|
The implementation of functions such as ``glVertex3fv`` becomes
|
|
conceptually simple:
|
|
|
|
- Fetch the current dispatch table pointer.
|
|
- Fetch the pointer to the real ``glVertex3fv`` function from the
|
|
table.
|
|
- Call the real function.
|
|
|
|
This can be implemented in just a few lines of C code. The file
|
|
``src/mesa/glapi/glapitemp.h`` contains code very similar to this.
|
|
|
|
.. code-block:: c
|
|
:caption: Sample dispatch function
|
|
|
|
void glVertex3f(GLfloat x, GLfloat y, GLfloat z)
|
|
{
|
|
const struct _glapi_table * const dispatch = GET_DISPATCH();
|
|
|
|
dispatch->Vertex3f(x, y, z);
|
|
}
|
|
|
|
The problem with this simple implementation is the large amount of
|
|
overhead that it adds to every GL function call.
|
|
|
|
In a multithreaded environment, a naive implementation of
|
|
``GET_DISPATCH()`` involves a call to ``_glapi_get_dispatch()`` or
|
|
``_glapi_tls_Dispatch``.
|
|
|
|
3. Optimizations
|
|
----------------
|
|
|
|
A number of optimizations have been made over the years to diminish the
|
|
performance hit imposed by GL dispatch. This section describes these
|
|
optimizations. The benefits of each optimization and the situations
|
|
where each can or cannot be used are listed.
|
|
|
|
3.1. ELF TLS
|
|
~~~~~~~~~~~~
|
|
|
|
Starting with the 2.4.20 Linux kernel, each thread is allocated an area
|
|
of per-thread, global storage. Variables can be put in this area using
|
|
some extensions to GCC that called ``ELF TLS``. By storing the dispatch table
|
|
pointer in this area, the expensive call to ``pthread_getspecific`` and
|
|
the test of ``_glapi_Dispatch`` can be avoided. As we don't support for
|
|
Linux kernel earlier than 2.4.20, so we can always using ``ELF TLS``.
|
|
|
|
The dispatch table pointer is stored in a new variable called
|
|
``_glapi_tls_Dispatch``. A new variable name is used so that a single
|
|
libGL can implement both interfaces. This allows the libGL to operate
|
|
with direct rendering drivers that use either interface. Once the
|
|
pointer is properly declared, ``GET_DISPACH`` becomes a simple variable
|
|
reference.
|
|
|
|
.. code-block:: c
|
|
:caption: TLS ``GET_DISPATCH`` Implementation
|
|
|
|
extern __THREAD_INITIAL_EXEC struct _glapi_table *_glapi_tls_Dispatch;
|
|
|
|
#define GET_DISPATCH() _glapi_tls_Dispatch
|
|
|
|
3.2. Assembly Language Dispatch Stubs
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Many platforms have difficulty properly optimizing the tail-call in the
|
|
dispatch stubs. Platforms like x86 that pass parameters on the stack
|
|
seem to have even more difficulty optimizing these routines. All of the
|
|
dispatch routines are very short, and it is trivial to create optimal
|
|
assembly language versions. The amount of optimization provided by using
|
|
assembly stubs varies from platform to platform and application to
|
|
application. However, by using the assembly stubs, many platforms can
|
|
use an additional space optimization (see :ref:`below <fixedsize>`).
|
|
|
|
The biggest hurdle to creating assembly stubs is handling the various
|
|
ways that the dispatch table pointer can be accessed. There are four
|
|
different methods that can be used:
|
|
|
|
#. Using ``_glapi_Dispatch`` directly in builds for non-multithreaded
|
|
environments.
|
|
#. Using ``_glapi_Dispatch`` and ``_glapi_get_dispatch`` in
|
|
multithreaded environments.
|
|
#. Using ``_glapi_tls_Dispatch`` directly in TLS enabled multithreaded
|
|
environments.
|
|
|
|
People wishing to implement assembly stubs for new platforms should
|
|
focus on #3 if the new platform supports TLS. Otherwise implement #2.
|
|
Environments that do not support multithreading are
|
|
uncommon and not terribly relevant.
|
|
|
|
Selection of the dispatch table pointer access method is controlled by a
|
|
few preprocessor defines.
|
|
|
|
- If ``HAVE_PTHREAD`` is defined, method #2 is used.
|
|
- If none of the preceding are defined, method #1 is used.
|
|
|
|
Two different techniques are used to handle the various different cases.
|
|
On x86 and SPARC, a macro called ``GL_STUB`` is used. In the preamble of
|
|
the assembly source file different implementations of the macro are
|
|
selected based on the defined preprocessor variables. The assembly code
|
|
then consists of a series of invocations of the macros such as:
|
|
|
|
.. code-block:: c
|
|
:caption: SPARC Assembly Implementation of ``glColor3fv``
|
|
|
|
GL_STUB(Color3fv, _gloffset_Color3fv)
|
|
|
|
The benefit of this technique is that changes to the calling pattern
|
|
(i.e., addition of a new dispatch table pointer access method) require
|
|
fewer changed lines in the assembly code.
|
|
|
|
However, this technique can only be used on platforms where the function
|
|
implementation does not change based on the parameters passed to the
|
|
function. For example, since x86 passes all parameters on the stack, no
|
|
additional code is needed to save and restore function parameters around
|
|
a call to ``pthread_getspecific``. Since x86-64 passes parameters in
|
|
registers, varying amounts of code needs to be inserted around the call
|
|
to ``pthread_getspecific`` to save and restore the GL function's
|
|
parameters.
|
|
|
|
The other technique, used by platforms like x86-64 that cannot use the
|
|
first technique, is to insert ``#ifdef`` within the assembly
|
|
implementation of each function. This makes the assembly file
|
|
considerably larger (e.g., 29,332 lines for ``glapi_x86-64.S`` versus
|
|
1,155 lines for ``glapi_x86.S``) and causes simple changes to the
|
|
function implementation to generate many lines of diffs. Since the
|
|
assembly files are typically generated by scripts, this isn't a
|
|
significant problem.
|
|
|
|
Once a new assembly file is created, it must be inserted in the build
|
|
system. There are two steps to this. The file must first be added to
|
|
``src/mesa/sources``. That gets the file built and linked. The second
|
|
step is to add the correct ``#ifdef`` magic to
|
|
``src/mesa/glapi/glapi_dispatch.c`` to prevent the C version of the
|
|
dispatch functions from being built.
|
|
|
|
.. _fixedsize:
|
|
|
|
3.3. Fixed-Length Dispatch Stubs
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
To implement ``glXGetProcAddress``, Mesa stores a table that associates
|
|
function names with pointers to those functions. This table is stored in
|
|
``src/mesa/glapi/glprocs.h``. For different reasons on different
|
|
platforms, storing all of those pointers is inefficient. On most
|
|
platforms, including all known platforms that support TLS, we can avoid
|
|
this added overhead.
|
|
|
|
If the assembly stubs are all the same size, the pointer need not be
|
|
stored for every function. The location of the function can instead be
|
|
calculated by multiplying the size of the dispatch stub by the offset of
|
|
the function in the table. This value is then added to the address of
|
|
the first dispatch stub.
|
|
|
|
This path is activated by adding the correct ``#ifdef`` magic to
|
|
``src/mesa/glapi/glapi.c`` just before ``glprocs.h`` is included.
|