mirror of
https://github.com/php/php-src.git
synced 2024-11-23 01:44:06 +08:00
Create book for docs
Closes GH-13338
This commit is contained in:
parent
68f10504de
commit
19d2b84788
29
.github/workflows/docs.yml
vendored
Normal file
29
.github/workflows/docs.yml
vendored
Normal file
@ -0,0 +1,29 @@
|
||||
name: Docs
|
||||
on:
|
||||
push:
|
||||
branches:
|
||||
- master
|
||||
paths:
|
||||
- docs/*
|
||||
pull_request:
|
||||
paths:
|
||||
- docs/*
|
||||
jobs:
|
||||
pages:
|
||||
runs-on: ubuntu-22.04
|
||||
permissions:
|
||||
pages: write
|
||||
id-token: write
|
||||
steps:
|
||||
- name: git checkout
|
||||
uses: actions/checkout@v4
|
||||
- name: Install dependencies
|
||||
run: pip install sphinx-design sphinxawesome-theme rstfmt
|
||||
- name: Check formatting
|
||||
run: rstfmt --check -w 100 docs/source
|
||||
- name: Publish
|
||||
if: github.event_name == 'push'
|
||||
uses: sphinx-notes/pages@v3
|
||||
with:
|
||||
checkout: false
|
||||
documentation_path: docs/source
|
1
docs/.gitignore
vendored
Normal file
1
docs/.gitignore
vendored
Normal file
@ -0,0 +1 @@
|
||||
build/
|
31
docs/README.md
Normal file
31
docs/README.md
Normal file
@ -0,0 +1,31 @@
|
||||
# php-src docs
|
||||
|
||||
This is the home of the php-src internal documentation. It is in very early stages, but is intended
|
||||
to become the primary place where new information about php-src is documented. Over time, it is
|
||||
expected to replace various mediums like:
|
||||
|
||||
* https://www.phpinternalsbook.com/
|
||||
* https://wiki.php.net/internals
|
||||
* Blogs from contributors
|
||||
|
||||
## How to build
|
||||
|
||||
`python` 3 and `pip` are required.
|
||||
|
||||
```bash
|
||||
pip install sphinx sphinx-design sphinxawesome-theme
|
||||
make html
|
||||
```
|
||||
|
||||
That's it! You can view the documentation under `./build/html/index.html` in your browser.
|
||||
|
||||
## Formatting
|
||||
|
||||
The files in this documentation are formatted using the [``rstfmt``](https://github.com/dzhu/rstfmt) tool.
|
||||
|
||||
```bash
|
||||
rstfmt -w 100 source
|
||||
```
|
||||
|
||||
This tool is not perfect. It breaks on custom directives, so we might switch to either a fork or
|
||||
something else in the future.
|
35
docs/make.bat
Normal file
35
docs/make.bat
Normal file
@ -0,0 +1,35 @@
|
||||
@ECHO OFF
|
||||
|
||||
pushd %~dp0
|
||||
|
||||
REM Command file for Sphinx documentation
|
||||
|
||||
if "%SPHINXBUILD%" == "" (
|
||||
set SPHINXBUILD=sphinx-build
|
||||
)
|
||||
set SOURCEDIR=source
|
||||
set BUILDDIR=build
|
||||
|
||||
%SPHINXBUILD% >NUL 2>NUL
|
||||
if errorlevel 9009 (
|
||||
echo.
|
||||
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
|
||||
echo.installed, then set the SPHINXBUILD environment variable to point
|
||||
echo.to the full path of the 'sphinx-build' executable. Alternatively you
|
||||
echo.may add the Sphinx directory to PATH.
|
||||
echo.
|
||||
echo.If you don't have Sphinx installed, grab it from
|
||||
echo.https://www.sphinx-doc.org/
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
if "%1" == "" goto help
|
||||
|
||||
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
|
||||
goto end
|
||||
|
||||
:help
|
||||
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
|
||||
|
||||
:end
|
||||
popd
|
0
docs/source/_static/.gitkeep
Normal file
0
docs/source/_static/.gitkeep
Normal file
0
docs/source/_templates/.gitkeep
Normal file
0
docs/source/_templates/.gitkeep
Normal file
59
docs/source/conf.py
Normal file
59
docs/source/conf.py
Normal file
@ -0,0 +1,59 @@
|
||||
# Configuration file for the Sphinx documentation builder.
|
||||
#
|
||||
# For the full list of built-in configuration values, see the documentation:
|
||||
# https://www.sphinx-doc.org/en/master/usage/configuration.html
|
||||
|
||||
from dataclasses import asdict
|
||||
from sphinxawesome_theme import ThemeOptions
|
||||
from sphinxawesome_theme.postprocess import Icons
|
||||
from sphinx.highlighting import lexers
|
||||
from pygments.lexers.web import PhpLexer
|
||||
|
||||
lexers['php'] = PhpLexer(startinline=True)
|
||||
lexers['php-annotations'] = PhpLexer(startinline=True)
|
||||
|
||||
project = 'php-src docs'
|
||||
author = 'The PHP Group'
|
||||
extensions = [
|
||||
'sphinx_design',
|
||||
'sphinxawesome_theme.highlighting',
|
||||
]
|
||||
templates_path = ['_templates']
|
||||
html_theme = 'sphinxawesome_theme'
|
||||
html_static_path = ['_static']
|
||||
html_title = project
|
||||
html_permalinks_icon = Icons.permalinks_icon
|
||||
theme_options = ThemeOptions(
|
||||
show_prev_next=True,
|
||||
extra_header_link_icons={
|
||||
'repository on GitHub': {
|
||||
'link': 'https://github.com/php/php-src',
|
||||
'icon': (
|
||||
'<svg height="26px" style="margin-top:-2px;display:inline" '
|
||||
'viewBox="0 0 45 44" '
|
||||
'fill="currentColor" xmlns="http://www.w3.org/2000/svg">'
|
||||
'<path fill-rule="evenodd" clip-rule="evenodd" '
|
||||
'd="M22.477.927C10.485.927.76 10.65.76 22.647c0 9.596 6.223 17.736 '
|
||||
'14.853 20.608 1.087.2 1.483-.47 1.483-1.047 '
|
||||
'0-.516-.019-1.881-.03-3.693-6.04 '
|
||||
'1.312-7.315-2.912-7.315-2.912-.988-2.51-2.412-3.178-2.412-3.178-1.972-1.346.149-1.32.149-1.32 ' # noqa
|
||||
'2.18.154 3.327 2.24 3.327 2.24 1.937 3.318 5.084 2.36 6.321 '
|
||||
'1.803.197-1.403.759-2.36 '
|
||||
'1.379-2.903-4.823-.548-9.894-2.412-9.894-10.734 '
|
||||
'0-2.37.847-4.31 2.236-5.828-.224-.55-.969-2.759.214-5.748 0 0 '
|
||||
'1.822-.584 5.972 2.226 '
|
||||
'1.732-.482 3.59-.722 5.437-.732 1.845.01 3.703.25 5.437.732 '
|
||||
'4.147-2.81 5.967-2.226 '
|
||||
'5.967-2.226 1.185 2.99.44 5.198.217 5.748 1.392 1.517 2.232 3.457 '
|
||||
'2.232 5.828 0 '
|
||||
'8.344-5.078 10.18-9.916 10.717.779.67 1.474 1.996 1.474 4.021 0 '
|
||||
'2.904-.027 5.247-.027 '
|
||||
'5.96 0 .58.392 1.256 1.493 1.044C37.981 40.375 44.2 32.24 44.2 '
|
||||
'22.647c0-11.996-9.726-21.72-21.722-21.72" '
|
||||
'fill="currentColor"/></svg>'
|
||||
),
|
||||
},
|
||||
},
|
||||
)
|
||||
html_theme_options = asdict(theme_options)
|
||||
pygments_style = 'sphinx'
|
12
docs/source/core/data-structures/index.rst
Normal file
12
docs/source/core/data-structures/index.rst
Normal file
@ -0,0 +1,12 @@
|
||||
#################
|
||||
Data structures
|
||||
#################
|
||||
|
||||
.. toctree::
|
||||
:hidden:
|
||||
|
||||
zval
|
||||
reference-counting
|
||||
zend_string
|
||||
|
||||
This section provides an overview of the core data structures used in php-src.
|
228
docs/source/core/data-structures/reference-counting.rst
Normal file
228
docs/source/core/data-structures/reference-counting.rst
Normal file
@ -0,0 +1,228 @@
|
||||
####################
|
||||
Reference counting
|
||||
####################
|
||||
|
||||
In languages like C, when you need memory for storing data for an indefinite period of time or in a
|
||||
large amount, you call ``malloc`` and ``free`` to acquire and release blocks of memory of some size.
|
||||
This sounds simple on the surface but turns out to be quite tricky, mainly because the data may not
|
||||
be freed for as long as it is used anywhere in the program. Sometimes this makes it unclear who is
|
||||
responsible for freeing the memory, and when to do so. Failure to handle this correctly may result
|
||||
in a use-after-free, double-free, or memory leak.
|
||||
|
||||
In PHP you usually do not need to think about memory management. The engine takes care of allocating
|
||||
and freeing memory for you by tracking which values are no longer needed. It does this by assigning
|
||||
a reference count to each allocated value, often abbreviated as refcount or RC. Whenever a reference
|
||||
to a value is passed somewhere else, its reference count is increased to indicate the value is now
|
||||
used by another party. When the party no longer needs the value, it is responsible for decreasing
|
||||
the reference count. Once the reference count reaches zero, we know the value is no longer needed
|
||||
anywhere, and that it may be freed.
|
||||
|
||||
.. code:: php
|
||||
|
||||
$a = new stdClass; // RC 1
|
||||
$b = $a; // RC 2
|
||||
unset($a); // RC 1
|
||||
unset($b); // RC 0, free
|
||||
|
||||
Reference counting is needed for types that store auxiliary data, which are the following:
|
||||
|
||||
- Strings
|
||||
- Arrays
|
||||
- Objects
|
||||
- References
|
||||
- Resources
|
||||
|
||||
These are either reference types (objects, references and resources) or they are large types that
|
||||
don't fit in a single ``zend_value`` directly (strings, arrays). Simpler types either don't store a
|
||||
value at all (``null``, ``false``, ``true``) or their value is small enough to fit directly in
|
||||
``zend_value`` (``int``, ``float``).
|
||||
|
||||
All of the reference counted types share a common initial struct sequence.
|
||||
|
||||
.. code:: c
|
||||
|
||||
typedef struct _zend_refcounted_h {
|
||||
uint32_t refcount; /* reference counter 32-bit */
|
||||
union {
|
||||
uint32_t type_info;
|
||||
} u;
|
||||
} zend_refcounted_h;
|
||||
|
||||
struct _zend_string {
|
||||
zend_refcounted_h gc;
|
||||
// ...
|
||||
};
|
||||
|
||||
struct _zend_array {
|
||||
zend_refcounted_h gc;
|
||||
// ...
|
||||
};
|
||||
|
||||
The ``zend_refcounted_h`` struct is simple. It contains the reference count, and a ``type_info``
|
||||
field that repeats some of the type information that is also stored in the ``zval``, for situations
|
||||
where we're not dealing with a ``zval`` directly. It also stores some additional fields, described
|
||||
under `GC flags`_.
|
||||
|
||||
********
|
||||
Macros
|
||||
********
|
||||
|
||||
As with ``zval``, ``zend_refcounted_h`` members should not be accessed directly. Instead, you should
|
||||
use the provided macros. There are macros that work with reference counted types directly, prefixed
|
||||
with ``GC_``, or macros that work on ``zval`` values, usually prefixed with ``Z_``. Unfortunately,
|
||||
naming is not always consistent.
|
||||
|
||||
.. list-table:: ``zval`` macros
|
||||
:header-rows: 1
|
||||
|
||||
- - Macro
|
||||
- Non-RC [#non-rc]_
|
||||
- Description
|
||||
|
||||
- - ``Z_REFCOUNT[_P]``
|
||||
- No
|
||||
- Returns the reference count.
|
||||
|
||||
- - ``Z_ADDREF[_P]``
|
||||
- No
|
||||
- Increases the reference count.
|
||||
|
||||
- - ``Z_TRY_ADDREF[_P]``
|
||||
- Yes
|
||||
- Increases the reference count. May be called on any ``zval``.
|
||||
|
||||
- - ``zval_ptr_dtor``
|
||||
- Yes
|
||||
- Decreases the reference count and frees the value if the reference count reaches zero.
|
||||
|
||||
.. [#non-rc]
|
||||
|
||||
Whether the macro works with non-reference counted types. If it does, the operation is usually a
|
||||
no-op. If it does not, using the macro on these values is undefined behavior.
|
||||
|
||||
.. list-table:: ``zend_refcounted_h`` macros
|
||||
:header-rows: 1
|
||||
|
||||
- - Macro
|
||||
- Immutable [#immutable]_
|
||||
- Description
|
||||
|
||||
- - ``GC_REFCOUNT[_P]``
|
||||
- Yes
|
||||
- Returns the reference count.
|
||||
|
||||
- - ``GC_ADDREF[_P]``
|
||||
- No
|
||||
- Increases the reference count.
|
||||
|
||||
- - ``GC_TRY_ADDREF[_P]``
|
||||
- Yes
|
||||
- Increases the reference count.
|
||||
|
||||
- - ``GC_DTOR[_P]``
|
||||
- Yes
|
||||
- Decreases the reference count and frees the value if the reference count reaches zero.
|
||||
|
||||
.. [#immutable]
|
||||
|
||||
Whether the macro works with immutable types, described under `Immutable reference counted types`_.
|
||||
|
||||
************
|
||||
Separation
|
||||
************
|
||||
|
||||
PHP has value and reference types. Reference types are types that are shared through a reference, a
|
||||
"pointer" to the value, rather than the value itself. Modifying such a value in one place changes it
|
||||
for all of its observers. For example, writing to a property changes the property in every place the
|
||||
object is referenced. Value types, on the other hand, are copied when passed to another party.
|
||||
Modifying the original value does not affect the copy, and vice versa.
|
||||
|
||||
In PHP, arrays and strings are value types. Since they are also reference counted types, this
|
||||
requires some special care when modifying values. In particular, we need to make sure that modifying
|
||||
the value is not observable from other places. Modifying a value with RC 1 is unproblematic, since
|
||||
we are the values sole owner. However, if the value has a reference count of >1, we need to create a
|
||||
fresh copy before modifying it. This process is called separation or CoW (copy on write).
|
||||
|
||||
.. code:: php
|
||||
|
||||
$a = [1, 2, 3]; // RC 1
|
||||
$b = $a; // RC 2
|
||||
$b[] = 4; // Separation, $a RC 1, $b RC 1
|
||||
var_dump($a); // [1, 2, 3]
|
||||
var_dump($b); // [1, 2, 3, 4]
|
||||
|
||||
***********************************
|
||||
Immutable reference counted types
|
||||
***********************************
|
||||
|
||||
Sometimes, even a reference counted type is not reference counted. When PHP runs in a multi-process
|
||||
or multi-threaded environment with opcache enabled, it shares some common values between processes
|
||||
or threads to reduce memory consumption. As you may know, sharing memory between processes or
|
||||
threads can be tricky and requires special care when modifying values. In particular, modification
|
||||
usually requires exclusive access to the memory so that the other processes or threads wait until
|
||||
the value is done being updated. In this case, this synchronization is avoided by making the value
|
||||
immutable and never modifying the reference count. Such values will receive the ``GC_IMMUTABLE``
|
||||
flag in their ``gc->u.type_info`` field.
|
||||
|
||||
Some macros like ``GC_TRY_ADDREF`` will guard against immutable values. You should not use immutable
|
||||
values on some macros, like ``GC_ADDREF``. This will result in undefined behavior, because the macro
|
||||
will not check whether the value is immutable before performing the reference count modifications.
|
||||
You may execute PHP with the ``-d opcache.protect_memory=1`` flag to mark the shared memory as
|
||||
read-only and trigger a hardware exception if the code accidentally attempts to modify it.
|
||||
|
||||
*****************
|
||||
Cycle collector
|
||||
*****************
|
||||
|
||||
Sometimes, reference counting is not enough. Consider the following example:
|
||||
|
||||
.. code:: php
|
||||
|
||||
$a = new stdClass;
|
||||
$b = new stdClass;
|
||||
$a->b = $b;
|
||||
$b->a = $a;
|
||||
unset($a);
|
||||
unset($b);
|
||||
|
||||
When this code finishes, the reference count of both instances of ``stdClass`` will still be 1, as
|
||||
they reference each other. This is called a reference cycle.
|
||||
|
||||
PHP implements a cycle collector that detects such cycles and frees values that are only reachable
|
||||
through their own references. The cycle collector will record values that may be involved in a
|
||||
cycle, and run when this buffer becomes full. It is also possible to invoke it explicitly by calling
|
||||
the ``gc_collect_cycles()`` function. The cycle collectors design is described in the `Cycle
|
||||
collector <todo>`_ chapter.
|
||||
|
||||
**********
|
||||
GC flags
|
||||
**********
|
||||
|
||||
.. code:: c
|
||||
|
||||
/* zval_gc_flags(zval.value->gc.u.type_info) (common flags) */
|
||||
#define GC_NOT_COLLECTABLE (1<<4)
|
||||
#define GC_PROTECTED (1<<5) /* used for recursion detection */
|
||||
#define GC_IMMUTABLE (1<<6) /* can't be changed in place */
|
||||
#define GC_PERSISTENT (1<<7) /* allocated using malloc */
|
||||
#define GC_PERSISTENT_LOCAL (1<<8) /* persistent, but thread-local */
|
||||
|
||||
The ``GC_NOT_COLLECTABLE`` flag indicates that the value may not be involved in a reference cycle.
|
||||
This allows for a fast way to detect values that don't need to be added to the cycle collector
|
||||
buffer. Only arrays and objects may actually be involved in reference cycles.
|
||||
|
||||
The ``GC_PROTECTED`` flag is used to protect against recursion in various internal functions. For
|
||||
example, ``var_dump`` recursively prints the contents of values, and marks visited values with the
|
||||
``GC_PROTECTED`` flag. If the value is recursive, it prevents the same value from being visited
|
||||
again.
|
||||
|
||||
``GC_IMMUTABLE`` has been discussed in `Immutable reference counted types`_.
|
||||
|
||||
The ``GC_PERSISTENT`` flag indicates that the value was allocated using ``malloc``, instead of PHPs
|
||||
own allocator. Usually, such values are alive for the entire lifetime of the process, instead of
|
||||
being freed at the end of the request. See the `Zend allocator <todo>`_ chapter for more
|
||||
information.
|
||||
|
||||
The ``GC_PERSISTENT_LOCAL`` flag indicates that a ``CG_PERSISTENT`` value is only accessibly in one
|
||||
thread, and is thus still safe to modify. This flag is only used in debug builds to satisfy an
|
||||
``assert``.
|
196
docs/source/core/data-structures/zend_string.rst
Normal file
196
docs/source/core/data-structures/zend_string.rst
Normal file
@ -0,0 +1,196 @@
|
||||
#############
|
||||
zend_string
|
||||
#############
|
||||
|
||||
In C, strings are represented as sequential lists of characters, ``char*`` or ``char[]``. The end of
|
||||
the string is usually indicated by the special NUL character, ``'\0'``. This comes with a few
|
||||
significant downsides:
|
||||
|
||||
- Calculating the length of the string is expensive, as it requires walking the entire string to
|
||||
look for the terminating NUL character.
|
||||
- The string may not contain the NUL character itself.
|
||||
- It is easy to run into buffer overflows if the NUL byte is accidentally missing.
|
||||
|
||||
php-src uses the ``zend_string`` struct as an abstraction over ``char*``, which explicitly stores
|
||||
the strings length, along with some other fields. It looks as follows:
|
||||
|
||||
.. code:: c
|
||||
|
||||
struct _zend_string {
|
||||
zend_refcounted_h gc;
|
||||
zend_ulong h; /* hash value */
|
||||
size_t len;
|
||||
char val[1];
|
||||
};
|
||||
|
||||
The ``gc`` field is used for :doc:`./reference-counting`. The ``h`` field contains a hash value,
|
||||
which is used for `hash table <todo>`__ lookups. The ``len`` field stores the length of the string
|
||||
in bytes, and the ``val`` field contains the actual string data.
|
||||
|
||||
You may wonder why the ``val`` field is declared as ``char val[1]``. This is called the `struct
|
||||
hack`_ in C. It is used to create structs with a flexible size, namely by allowing the last element
|
||||
to be expanded arbitrarily. In this case, the size of ``zend_string`` depends on the strings length,
|
||||
which is determined at runtime (see ``_ZSTR_STRUCT_SIZE``). When allocating the string, we append
|
||||
enough bytes to the allocation to hold the strings content.
|
||||
|
||||
.. _struct hack: https://www.geeksforgeeks.org/struct-hack/
|
||||
|
||||
Here's a basic example of how to use ``zend_string``:
|
||||
|
||||
.. code:: c
|
||||
|
||||
// Allocate the string.
|
||||
zend_string *string = ZSTR_INIT_LITERAL("Hello world!", /* persistent */ false);
|
||||
// Write it to the output buffer.
|
||||
zend_write(ZSTR_VAL(string), ZSTR_LEN(string));
|
||||
// Decrease the reference count and free it if necessary.
|
||||
zend_string_release(string);
|
||||
|
||||
``ZSTR_INIT_LITERAL`` creates a ``zend_string`` from a string literal. It is just a wrapper around
|
||||
``zend_string_init(char *string, size_t length, bool persistent)`` that provides the length of the
|
||||
string at compile time. The ``persistent`` parameter indicates whether the string is allocated using
|
||||
``malloc`` (``persistent == true``) or ``emalloc``, `PHPs custom allocator <todo>`__ (``persistent
|
||||
== false``) that is emptied after each request.
|
||||
|
||||
When you're done using the string, you must call ``zend_string_release``, or the memory will leak.
|
||||
``zend_string_release`` will automatically call ``malloc`` or ``emalloc``, depending on how the
|
||||
string was allocated. After releasing the string, you must not access any of its fields anymore, as
|
||||
it may have been freed if you were its last user.
|
||||
|
||||
*****
|
||||
API
|
||||
*****
|
||||
|
||||
The string API is defined in ``Zend/zend_string.h``. It provides a number of functions for creating
|
||||
new strings.
|
||||
|
||||
.. list-table:: ``zend_string`` creation
|
||||
:header-rows: 1
|
||||
|
||||
- - Function/Macro [#persistent]_
|
||||
- Description
|
||||
|
||||
- - ``ZSTR_INIT(s, p)``
|
||||
- Creates a new string from a string literal.
|
||||
|
||||
- - ``zend_string_init(s, l, p)``
|
||||
- Creates a new string from a character buffer.
|
||||
|
||||
- - ``zend_string_alloc(l, p)``
|
||||
- Creates a new string of a given length without initializing its content.
|
||||
|
||||
- - ``zend_string_concat2(s1, l1, s2, l2)``
|
||||
- Creates a non-persistent string by concatenating two character buffers.
|
||||
|
||||
- - ``zend_string_concat3(...)``
|
||||
- Same as ``zend_string_concat2``, but for three character buffers.
|
||||
|
||||
- - ``ZSTR_EMPTY_ALLOC()``
|
||||
- Gets an immutable, empty string. This does not allocate memory.
|
||||
|
||||
- - ``ZSTR_CHAR(char)``
|
||||
- Gets an immutable, single-character string. This does not allocate memory.
|
||||
|
||||
- - ``ZSTR_KNOWN(ZEND_STR_const)``
|
||||
|
||||
- Gets an immutable, predefined string. Used for string common within PHP itself, e.g.
|
||||
``"class"``. See ``ZEND_KNOWN_STRINGS`` in ``Zend/zend_string.h``. This does not allocate
|
||||
memory.
|
||||
|
||||
.. [#persistent]
|
||||
|
||||
``s`` = ``zend_string``, ``l`` = ``length``, ``p`` = ``persistent``.
|
||||
|
||||
As per php-src fashion, you are not supposed to access the ``zend_string`` fields directly. Instead,
|
||||
use the following macros. There are macros for both ``zend_string`` and ``zvals`` known to contain
|
||||
strings.
|
||||
|
||||
.. list-table:: Accessor macros
|
||||
:header-rows: 1
|
||||
|
||||
- - ``zend_string``
|
||||
- ``zval``
|
||||
- Description
|
||||
|
||||
- - ``ZSTR_LEN``
|
||||
- ``Z_STRLEN[_P]``
|
||||
- Returns the length of the string in bytes.
|
||||
|
||||
- - ``ZSTR_VAL``
|
||||
- ``Z_STRVAL[_P]``
|
||||
- Returns the string data as a ``char*``.
|
||||
|
||||
- - ``ZSTR_HASH``
|
||||
- ``Z_STRHASH[_P]``
|
||||
- Computes the string has if it hasn't already been, and returns it.
|
||||
|
||||
- - ``ZSTR_H``
|
||||
- \-
|
||||
- Returns the string hash. This macro assumes that the hash has already been computed.
|
||||
|
||||
.. list-table:: Reference counting macros
|
||||
:header-rows: 1
|
||||
|
||||
- - Macro
|
||||
- Description
|
||||
|
||||
- - ``zend_string_copy(s)``
|
||||
- Increases the reference count and returns the same string. The reference count is not
|
||||
increased if the string is interned.
|
||||
|
||||
- - ``zend_string_release(s)``
|
||||
- Decreases the reference count and frees the string if it goes to 0.
|
||||
|
||||
- - ``zend_string_dup(s, p)``
|
||||
- Creates a true copy of the string in a new allocation, except if the string is interned.
|
||||
|
||||
- - ``zend_string_separate(s)``
|
||||
- Duplicates the string if the reference count is greater than 1. See
|
||||
:doc:`./reference-counting` for details.
|
||||
|
||||
- - ``zend_string_realloc(s, l, p)``
|
||||
|
||||
- Changes the size of the string. If the string has a reference count greater than 1 or if
|
||||
the string is interned, a new string is created. You must always use the return value of
|
||||
this function, as the original array may have been moved to a new location in memory.
|
||||
|
||||
There are various functions to compare strings. The ``zend_string_equals`` function compares two
|
||||
strings in full, while ``zend_string_starts_with`` checks whether the first argument starts with the
|
||||
second. There are variations for ``_ci`` and ``_literal``, i.e. case-insensitive comparison and
|
||||
literal strings, respectively. We won't go over all variations here, as they are straightforward to
|
||||
use.
|
||||
|
||||
******************
|
||||
Interned strings
|
||||
******************
|
||||
|
||||
Programs use some strings many times. For example, if your program declares a class called
|
||||
``MyClass``, it would be wasteful to allocate a new string ``"MyClass"`` every time it is referenced
|
||||
within your program. Instead, when repeated strings are expected, php-src uses a technique called
|
||||
string interning. Essentially, this is just a simple `HashTable <todo>`__ where existing interned
|
||||
strings are stored. When creating a new interned string, php-src first checks the interned string
|
||||
buffer. If it finds it there, it can return a pointer to the existing string. If it doesn't, it
|
||||
allocates a new string and adds it to the buffer.
|
||||
|
||||
.. code:: c
|
||||
|
||||
zend_string *str1 = zend_new_interned_string(
|
||||
ZSTR_INIT_LITERAL("MyClass", /* persistent */ false));
|
||||
|
||||
// In some other place entirely.
|
||||
zend_string *str2 = zend_new_interned_string(
|
||||
ZSTR_INIT_LITERAL("MyClass", /* persistent */ false));
|
||||
|
||||
assert(ZSTR_IS_INTERNED(str1));
|
||||
assert(ZSTR_IS_INTERNED(str2));
|
||||
assert(str1 == str2);
|
||||
|
||||
Interned strings are *not* reference counted, as they are expected to live for the entire request,
|
||||
or longer.
|
||||
|
||||
With opcache, this goes one step further by sharing strings across different processes. For example,
|
||||
if you're using php-fpm with 8 workers, all workers will share the same interned strings buffer. It
|
||||
gets a bit more complicated. During requests, no interned strings are actually created. Instead,
|
||||
this is delayed until the script is persisted to shared memory. This means that
|
||||
``zend_new_interned_string`` may not actually return an interned string if opcache is enabled.
|
||||
Usually you don't have to worry about this.
|
225
docs/source/core/data-structures/zval.rst
Normal file
225
docs/source/core/data-structures/zval.rst
Normal file
@ -0,0 +1,225 @@
|
||||
######
|
||||
zval
|
||||
######
|
||||
|
||||
PHP is a dynamic language. A variable can typically contain a value of any type, and the type of the
|
||||
variable may even change during the execution of the program. Under the hood, this is implemented
|
||||
through the ``zval`` struct. It is one of the most important data structures in php-src. It is
|
||||
implemented as a "tagged union", meaning it stores what type of value it contains, and the value
|
||||
itself. Let's look at the type first.
|
||||
|
||||
************
|
||||
zval types
|
||||
************
|
||||
|
||||
.. code:: c
|
||||
|
||||
#define IS_UNDEF 0 /* A variable that was never written to. */
|
||||
#define IS_NULL 1
|
||||
#define IS_FALSE 2
|
||||
#define IS_TRUE 3
|
||||
#define IS_LONG 4 /* An integer value. */
|
||||
#define IS_DOUBLE 5 /* A floating point value. */
|
||||
#define IS_STRING 6
|
||||
#define IS_ARRAY 7
|
||||
#define IS_OBJECT 8
|
||||
#define IS_RESOURCE 9
|
||||
#define IS_REFERENCE 10
|
||||
|
||||
These simple integer constants determine what value is currently stored in a variable. If you are a
|
||||
PHP developer, these types should sound fairly familiar. They are pretty much an exact reflection of
|
||||
the types you may use in regular PHP code. One small oddity is that ``IS_FALSE`` and ``IS_TRUE`` are
|
||||
implemented as separate types, instead of as a ``IS_BOOL`` type.
|
||||
|
||||
Some of these types are self-contained, they don't store any auxiliary data. This includes
|
||||
``IS_UNDEF``, ``IS_NULL``, ``IS_FALSE`` and ``IS_TRUE``. For the rest of the types, we are going to
|
||||
require some additional memory to store the actual value of the variable.
|
||||
|
||||
************
|
||||
zend_value
|
||||
************
|
||||
|
||||
.. code:: c
|
||||
|
||||
typedef union _zend_value {
|
||||
zend_long lval; /* long value, i.e. int. */
|
||||
double dval; /* double value, i.e. float. */
|
||||
zend_refcounted *counted;
|
||||
zend_string *str;
|
||||
zend_array *arr;
|
||||
zend_object *obj;
|
||||
zend_resource *res;
|
||||
zend_reference *ref;
|
||||
// Less important for now.
|
||||
zend_ast_ref *ast;
|
||||
zval *zv;
|
||||
void *ptr;
|
||||
zend_class_entry *ce;
|
||||
zend_function *func;
|
||||
struct {
|
||||
uint32_t w1;
|
||||
uint32_t w2;
|
||||
} ww;
|
||||
} zend_value;
|
||||
|
||||
A C union is a data type that may store any one of its members at a time, by being (at least) as big
|
||||
as its biggest member. For example, ``zend_value`` may store the ``lval`` member, or the ``dval``
|
||||
member, but never both at the same time. However, it doesn't know which member is being stored.
|
||||
Remembering this is our job, and that's exactly what the ``IS_*`` constants are for.
|
||||
|
||||
The top members of ``zend_value`` mostly mirror the ``IS_*`` constants, with the exception of
|
||||
``counted``. ``counted`` polymorphically refers to any `reference counted <todo>`__ value, including
|
||||
strings, arrays, objects, resources and references. ``null`` and ``bool`` are missing from
|
||||
``zend_value`` because their types are self-contained.
|
||||
|
||||
The rest of the fields aren't important for now.
|
||||
|
||||
******
|
||||
zval
|
||||
******
|
||||
|
||||
Together, the value and the tag make up the ``zval``, along with some other fields. It may look
|
||||
intimidating at first. We'll go over it step by step.
|
||||
|
||||
.. code:: c
|
||||
|
||||
typedef struct _zval_struct zval;
|
||||
|
||||
struct _zval_struct {
|
||||
zend_value value;
|
||||
union {
|
||||
uint32_t type_info;
|
||||
struct {
|
||||
ZEND_ENDIAN_LOHI_3(
|
||||
uint8_t type, /* active type */
|
||||
uint8_t type_flags,
|
||||
union {
|
||||
uint16_t extra; /* not further specified */
|
||||
} u)
|
||||
} v;
|
||||
} u1;
|
||||
union {
|
||||
uint32_t next; /* hash collision chain */
|
||||
uint32_t cache_slot; /* cache slot (for RECV_INIT) */
|
||||
uint32_t opline_num; /* opline number (for FAST_CALL) */
|
||||
uint32_t lineno; /* line number (for ast nodes) */
|
||||
uint32_t num_args; /* arguments number for EX(This) */
|
||||
uint32_t fe_pos; /* foreach position */
|
||||
uint32_t fe_iter_idx; /* foreach iterator index */
|
||||
uint32_t guard; /* recursion and single property guard */
|
||||
uint32_t constant_flags; /* constant flags */
|
||||
uint32_t extra; /* not further specified */
|
||||
} u2;
|
||||
};
|
||||
|
||||
``zval.value`` reserves space for the actual variable data, as discussed above.
|
||||
|
||||
``zval.u1`` stores the variable type, the given ``IS_*`` constant, along with some other flags. It's
|
||||
definition looks a bit complicated. You can think of the entire field as a 4 bit integer, split into
|
||||
3 parts. ``v.type`` stores the actual variable type, ``v.type_flags`` is used for some `reference
|
||||
counting <todo>`__ flags, and ``v.u.extra`` is pretty much unused.
|
||||
|
||||
``zval.u2`` defines some more storage for various contexts that is often unoccupied. It's there
|
||||
because the memory would otherwise be wasted due to padding, so we may as well make use of it. We'll
|
||||
go over the relevant ones in their corresponding chapters.
|
||||
|
||||
********
|
||||
Macros
|
||||
********
|
||||
|
||||
The fields in ``zval`` should never be accessed directly. Instead, there are a plethora of macros to
|
||||
access them, concealing some of the implementation details of the ``zval`` struct. For many macros,
|
||||
there's a ``_P``-suffixed variant that performs the same operation on a pointer to the given
|
||||
``zval``.
|
||||
|
||||
.. list-table:: ``zval`` macros
|
||||
:header-rows: 1
|
||||
|
||||
- - Macro
|
||||
- Description
|
||||
- - ``Z_TYPE[_P]``
|
||||
- Access the ``zval.u1.v.type`` part of the type flags, containing the ``IS_*`` type.
|
||||
- - ``Z_LVAL[_P]``
|
||||
- Access the underlying ``int`` value.
|
||||
- - ``Z_DVAL[_P]``
|
||||
- Access the underlying ``float`` value.
|
||||
- - ``Z_STR[_P]``
|
||||
- Access the underlying ``zend_string`` pointer.
|
||||
- - ``Z_STRVAL[_P]``
|
||||
- Access the strings raw ``char *`` pointer.
|
||||
- - ``Z_STRLEN[_P]``
|
||||
- Access the strings length.
|
||||
- - ``ZVAL_COPY_VALUE(t, s)``
|
||||
- Copy one ``zval`` to another, including type and value.
|
||||
- - ``ZVAL_COPY(t, s)``
|
||||
- Same as ``ZVAL_COPY_VALUE``, but if the value is reference counted, increase the counter.
|
||||
|
||||
..
|
||||
_todo: There are many more.
|
||||
|
||||
******************
|
||||
Other zval types
|
||||
******************
|
||||
|
||||
``zval``\ s are sometimes used internally with types that don't exist in userland.
|
||||
|
||||
.. code:: c
|
||||
|
||||
#define IS_CONSTANT_AST 11
|
||||
#define IS_INDIRECT 12
|
||||
#define IS_PTR 13
|
||||
#define IS_ALIAS_PTR 14
|
||||
#define _IS_ERROR 15
|
||||
|
||||
``IS_CONSTANT_AST`` is used to represent constant values (the right hand side of ``const``,
|
||||
property/parameter initializers, etc.) before they are evaluated. The evaluation of a constant
|
||||
expression is not always possible during compilation, because they may contain references to values
|
||||
only available at runtime. Until that evaluation is possible, the constants contain the AST of the
|
||||
expression rather than the concrete values. Check the `parser <todo>`__ chapter for more information
|
||||
on ASTs. When this flag is set, the ``zval.value.ast`` union member is set accordingly.
|
||||
|
||||
``IS_INDIRECT`` indicates that the ``zval.value.zv`` member is populated. This field stores a
|
||||
pointer to some other ``zval``. This type is mainly used in two situations, namely for intermediate
|
||||
values between ``FETCH`` and ``ASSIGN`` instructions, and for the sharing of variables in the symbol
|
||||
table.
|
||||
|
||||
..
|
||||
_todo: There are many more.
|
||||
|
||||
``IS_PTR`` is used for pointers to arbitrary data. Most commonly, this type is used internally for
|
||||
``HashTable``, as ``HashTable`` may only store ``zval`` values. For example, ``EG(class_table)``
|
||||
represents the class table, which is a hash map of class names to the corresponding
|
||||
``zend_class_entry``, representing the class. The same goes for functions and many other data types.
|
||||
``IS_ALIAS_PTR`` is used for class aliases registered via ``class_alias``. Essentially, it just
|
||||
allows differencing between members in the class table that are aliases, or actual classes.
|
||||
Otherwise, it is essentially the same as ``IS_PTR``. Arbitrary data is accessed through
|
||||
``zval.value.ptr``, and casted to the correct type depending on context. If ``ptr`` stores a class
|
||||
or function, the ``zval.value.ce`` or ``zval.value.func`` fields may be used, respectively.
|
||||
|
||||
``_IS_ERROR`` is used as an error value for some `object handlers <todo>`__. It is described in more
|
||||
detail in its own chapter.
|
||||
|
||||
.. code:: c
|
||||
|
||||
/* Fake types used only for type hinting.
|
||||
* These are allowed to overlap with the types below. */
|
||||
#define IS_CALLABLE 12
|
||||
#define IS_ITERABLE 13
|
||||
#define IS_VOID 14
|
||||
#define IS_STATIC 15
|
||||
#define IS_MIXED 16
|
||||
#define IS_NEVER 17
|
||||
|
||||
/* used for casts */
|
||||
#define _IS_BOOL 18
|
||||
#define _IS_NUMBER 19
|
||||
|
||||
These flags are never actually stored in ``zval.u1``. They are used for type hinting and in the
|
||||
`object handler <todo>`__ API.
|
||||
|
||||
This only leaves the ``zval.value.ww`` field. In short, this field is used on 32-bit platforms when
|
||||
copying data from one ``zval`` to another. Normally, ``zval.value.counted`` is copied as a generic
|
||||
value, no matter what the actual underlying type is. ``zend_value`` always consists of 8 bytes due
|
||||
to the ``double`` field. Pointers, however, consist only of 4. Because we would otherwise miss the
|
||||
other 4 bytes, they are copied manually using ``z->value.ww.w2 = _w2;``. This happens in the
|
||||
``ZVAL_COPY_VALUE_EX`` macro, you won't ever have to care about this.
|
58
docs/source/index.rst
Normal file
58
docs/source/index.rst
Normal file
@ -0,0 +1,58 @@
|
||||
##############
|
||||
php-src docs
|
||||
##############
|
||||
|
||||
.. toctree::
|
||||
:caption: Introduction
|
||||
:hidden:
|
||||
|
||||
introduction/high-level-overview
|
||||
|
||||
.. toctree::
|
||||
:caption: Core
|
||||
:hidden:
|
||||
|
||||
core/data-structures/index
|
||||
|
||||
Welcome to the php-src documentation!
|
||||
|
||||
.. warning::
|
||||
|
||||
This documentation is work in progress.
|
||||
|
||||
At this point in time, there are other guides that provide a more complete picture of the PHP
|
||||
project. Check the `CONTRIBUTING.md
|
||||
<https://github.com/php/php-src/blob/master/CONTRIBUTING.md#technical-resources>`__ file for a
|
||||
list of technical resources.
|
||||
|
||||
php-src is the canonical implementation of the interpreter for the PHP programming language, as well
|
||||
as various extensions that provide common functionality. This documentation is intended to help you
|
||||
understand how the interpreter works, how you can build and test changes, and how you can create
|
||||
extensions yourself.
|
||||
|
||||
This documentation is not intended to be comprehensive, but is meant to explain core concepts that
|
||||
are not easy to grasp by reading code alone. It describes best practices, and will frequently omit
|
||||
APIs that are discouraged for general use.
|
||||
|
||||
******************
|
||||
How to get help?
|
||||
******************
|
||||
|
||||
Getting started with a new and complicated project like php-src can be overwhelming. While there's
|
||||
no way around reading lots and lots of code, asking questions of somebody with experience can save a
|
||||
lot of time. Luckily, many core developers are eager to help. Here are some ways you can get in
|
||||
touch.
|
||||
|
||||
- `Discord <https://phpc.chat>`__ (``#php-internals`` channel)
|
||||
- `R11 on StackOverflow <https://chat.stackoverflow.com/rooms/11/php>`__
|
||||
|
||||
***************
|
||||
Prerequisites
|
||||
***************
|
||||
|
||||
The php-src interpreter is written in C, and so are most of the bundled extensions. While extensions
|
||||
may also be written in C++, ext-intl is currently the only bundled extension to do so. It is
|
||||
advisable that you have *some* knowledge of C before jumping into php-src.
|
||||
|
||||
It is also advisable to get familiar with the semantics of PHP itself, so that you may better
|
||||
differentiate between bugs and expected behavior, and model new language features.
|
194
docs/source/introduction/high-level-overview.rst
Normal file
194
docs/source/introduction/high-level-overview.rst
Normal file
@ -0,0 +1,194 @@
|
||||
#####################
|
||||
High-level overview
|
||||
#####################
|
||||
|
||||
PHP is an interpreted language. Interpreted languages differ from compiled ones in that they aren't
|
||||
compiled into machine-readable code ahead of time. Instead, the source files are read, processed and
|
||||
interpreted when the program is executed. This can be very convenient for developers for rapid
|
||||
prototyping, as it skips a lengthy compilation phase. However, it also poses some unique challenges
|
||||
to performance, which is one of the primary reasons interpreters can be complex. php-src borrows
|
||||
many concepts from other compilers and interpreters.
|
||||
|
||||
**********
|
||||
Pipeline
|
||||
**********
|
||||
|
||||
The goal of the interpreter is to read the users source files, and to simulate the users intent.
|
||||
This process can be split into distinct phases that are easier to understand and implement.
|
||||
|
||||
- Tokenization - splitting whole source files into words, called tokens.
|
||||
- Parsing - building a tree structure from tokens, called AST (abstract syntax tree).
|
||||
- Compilation - traversing the AST and building a list of operations, called opcodes.
|
||||
- Interpretation - reading and executing opcodes.
|
||||
|
||||
php-src as a whole can be seen as a pipeline consisting of these stages, using the input of the
|
||||
previous phase and producing some output for the next.
|
||||
|
||||
.. code:: haskell
|
||||
|
||||
source_code
|
||||
|> tokenizer -- tokens
|
||||
|> parser -- ast
|
||||
|> compiler -- opcodes
|
||||
|> interpreter
|
||||
|
||||
Let's go into each phase in a bit more detail.
|
||||
|
||||
**************
|
||||
Tokenization
|
||||
**************
|
||||
|
||||
Tokenization, often called "lexing" or "scanning", is the process of taking an entire program file
|
||||
and splitting it into a list of words and symbols. Tokens generally consist of a type, a simple
|
||||
integer constant representing the token, and a lexeme, the literal string used in the source code.
|
||||
|
||||
.. code:: php
|
||||
|
||||
if ($cond) {
|
||||
echo "Cond is true\n";
|
||||
}
|
||||
|
||||
.. code:: text
|
||||
|
||||
T_IF "if"
|
||||
T_WHITESPACE " "
|
||||
"("
|
||||
T_VARIABLE "$cond"
|
||||
")"
|
||||
T_WHITESPACE " "
|
||||
"{"
|
||||
T_WHITESPACE "\n "
|
||||
T_ECHO "echo"
|
||||
T_WHITESPACE " "
|
||||
T_CONSTANT_ENCAPSED_STRING '"Cond is true\n"'
|
||||
";"
|
||||
T_WHITESPACE "\n"
|
||||
"}"
|
||||
|
||||
While tokenizers are not difficult to write by hand, PHP uses a tool called ``re2c`` to automate
|
||||
this process. It takes a definition file and generates efficient C code to build these tokens from a
|
||||
stream of characters. The definition for PHP lives in ``Zend/zend_language_scanner.l``. Check the
|
||||
`re2c documentation`_ for details.
|
||||
|
||||
.. _re2c documentation: https://re2c.org/
|
||||
|
||||
*********
|
||||
Parsing
|
||||
*********
|
||||
|
||||
Parsing is the process of reading the tokens generated from the tokenizer and building a tree
|
||||
structure from it. To humans, how source code elements are grouped seems obvious through whitespace
|
||||
and the usage of symbols like ``()`` and ``{}``. However, computers cannot visually glance over the
|
||||
code to determine these boundaries quickly. To make it easier and faster to work with, we build a
|
||||
tree structure from the tokens to more closely reflect the source code the way humans see it.
|
||||
|
||||
Here is a simplified example of what an AST from the tokens above might look like.
|
||||
|
||||
.. code:: text
|
||||
|
||||
ZEND_AST_IF {
|
||||
ZEND_AST_IF_ELEM {
|
||||
ZEND_AST_VAR {
|
||||
ZEND_AST_ZVAL { "cond" },
|
||||
},
|
||||
ZEND_AST_STMT_LIST {
|
||||
ZEND_AST_ECHO {
|
||||
ZEND_AST_ZVAL { "Cond is true\n" },
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
Each AST node has a type and may have children. They also store their original position in the
|
||||
source code, and may define some arbitrary flags. These are omitted for brevity.
|
||||
|
||||
Like with tokenization, we use a tool called ``Bison`` to generate the parser implementation from a
|
||||
grammar specification. The grammar lives in the ``Zend/zend_language_parser.y`` file. Check the
|
||||
`Bison documentation`_ for details. Luckily, the syntax is quite approachable.
|
||||
|
||||
.. _bison documentation: https://www.gnu.org/software/bison/manual/
|
||||
|
||||
Parsing is described in more detail in its `dedicated chapter <todo>`__.
|
||||
|
||||
*************
|
||||
Compilation
|
||||
*************
|
||||
|
||||
Computers don't understand human language, or even programming languages. They only understand
|
||||
machine code, which are sequences of simple, mostly atomic instructions for doing one thing. For
|
||||
example, they may add two numbers, load some memory from RAM, jump to an instruction under a certain
|
||||
condition, etc. It turns out that even the most complex expressions can be reduced to a number of
|
||||
these simple instructions.
|
||||
|
||||
PHP is a bit different, in that it does not execute machine code directly. Instead, instructions run
|
||||
on a "virtual machine", often abbreviated to VM. This is just a fancy way of saying that there is no
|
||||
physical machine you can buy that understands these instructions, but that this machine is
|
||||
implemented in software. This is our interpreter. This also means that we are free to make up
|
||||
instructions ourselves at will. Some of these instructions look very similar to something you'd find
|
||||
in an actual CPU instruction set (e.g. adding two numbers), while others are much more high-level
|
||||
(e.g. load property of object by name).
|
||||
|
||||
With that little detour out of the way, the job of the compiler is to read the AST and translate it
|
||||
into our virtual machine instructions, also called opcodes. The code responsible for this
|
||||
transformation lives in ``Zend/zend_compile.c``. It essentially traverses the AST and generates a
|
||||
number of instructions, before going to the next node.
|
||||
|
||||
Here's what the surprisingly compact opcodes for the AST above might look like:
|
||||
|
||||
.. code:: text
|
||||
|
||||
0000 JMPZ CV0($cond) 0002
|
||||
0001 ECHO string("Cond is true\n")
|
||||
0002 RETURN int(1)
|
||||
|
||||
****************
|
||||
Interpretation
|
||||
****************
|
||||
|
||||
Finally, the opcodes are read and executed by the interpreter. PHPs uses `three-address code`_ for
|
||||
instructions. This essentially means that each instructions may have a result value, and at most two
|
||||
operands. Most modern CPUs also use this format. Both result and operands in PHP are :doc:`zvals
|
||||
<../core/data-structures/zval>`.
|
||||
|
||||
.. _three-address code: https://en.wikipedia.org/wiki/Three-address_code
|
||||
|
||||
How exactly each opcode behaves depends on its purpose. You can find a complete list of opcodes in
|
||||
the generated ``Zend/zend_vm_opcodes.h`` file. The behavior of each instruction is defined in
|
||||
``Zend/zend_vm_def.h``.
|
||||
|
||||
Let's step through the opcodes form the example above:
|
||||
|
||||
- We start at the top, i.e. ``JMPZ``. If its first operand contains a "falsy" value, it will jump
|
||||
to the instruction encoded in its second operand. If it is truthy, it will simply fall-through to
|
||||
the next instruction.
|
||||
|
||||
- The ``ECHO`` instruction prints its first operand.
|
||||
|
||||
- The ``RETURN`` operand terminates the current function.
|
||||
|
||||
With these simple rules, we can see that the interpreter will ``echo`` only when ``$cond`` is
|
||||
truthy, and skip over the ``echo`` otherwise.
|
||||
|
||||
That's it! This is how PHP works, fundamentally. Of course, we skipped over a ton of details. The VM
|
||||
is quite complex, and will be discussed separately in the `virtual machine <todo>`__ chapter.
|
||||
|
||||
*********
|
||||
Opcache
|
||||
*********
|
||||
|
||||
As you may imagine, running this whole pipeline every time PHP serves a request is time consuming.
|
||||
Luckily, it is also not necessary. We can cache the opcodes in memory between requests, to skip over
|
||||
all of the phases, except for the execution phase. This is precisely what the opcache extension
|
||||
does. It lives in the ``ext/opcache`` directory.
|
||||
|
||||
Opcache also performs some optimizations on the opcodes before caching them. As opcaches are
|
||||
expected to be reused many times, it is profitable to spend some additional time simplifying them if
|
||||
possible to improve performance during execution. The optimizer lives in ``Zend/Optimizer``.
|
||||
|
||||
JIT
|
||||
===
|
||||
|
||||
The opcache also implements a JIT compiler, which stands for just-in-time compiler. This compiler
|
||||
takes the virtual PHP opcodes and turns it into actual machine instructions, with additional
|
||||
information gained at runtime. JITs are very complex pieces of software, so this book will likely
|
||||
barely scratch the surface of how it works. It lives in ``ext/opcache/jit``.
|
Loading…
Reference in New Issue
Block a user