* Add behavioural tests for incdec operators
* Add support to ++/-- for objects castable to _IS_NUMBER
* Add str_increment() function
* Add str_decrement() function
RFC: https://wiki.php.net/rfc/saner-inc-dec-operators
Co-authored-by: Ilija Tovilo <ilija.tovilo@me.com>
Co-authored-by: Arnaud Le Blanc <arnaud.lb@gmail.com>
When the code was moved to solve the uaf for memory overflow, this
caused the refcount to be higher than one in some self-concatenation
scenarios. This in turn causes quadratic time performance problems when
these concatenations happen in a loop.
Closes GH-11508.
free_op2_string may be set to false when the operands are not strings, and
`result == op1 == op2`, by re-using the same string for both operands. In that
case, the string should still be copied to result because result is not actually
a string. Also change the op1 branch to stay consistent.
Introduced by GH-10049
The following sequence of actions was happening which caused a null
pointer dereference:
1. debug_backtrace() returns an array
2. The concatenation to $c will transform the array to a string via
`zval_get_string_func` for op2 and output a warning.
Note that zval op1 is of type string due to the first do-while
sequence.
3. The warning of an implicit "array to string conversion" triggers
the ob_start callback to run. This code transform $c (==op1) to a long.
4. The code below the 2 do-while sequences assume that both op1 and op2
are strings, but this is no longer the case. A dereference of the
string will therefore result in a null pointer dereference.
The solution used here is to work with the zend_string directly instead
of with the ops.
For the tests:
Co-authored-by: changochen1@gmail.com
Co-authored-by: cmbecker69@gmx.de
Co-authored-by: yukik@risec.co.jp
Closes GH-10049.
Since lowercasing and uppercasing is a common operation for both
internal purposes and userland purposes, it makes sense to implement a
NEON accelerated version for this.
When using ZEND_NORMALIZE_BOOL(a - b) where a and b are doubles, this
generates the following instruction sequence on x64:
subsd xmm0, xmm1
pxor xmm1, xmm1
comisd xmm0, xmm1
...
whereas if we use ZEND_THREEWAY_COMPARE we get two instructions less:
ucomisd xmm0, xmm1
The only difference is that the threeway compare uses *u*comisd instead
of comisd. The difference is that it will cause a FP signal if a
signaling NAN is used, but as far as I'm aware this doesn't matter for
our use case.
Similarly, the amount of instructions on AArch64 is also quite a bit
lower for this code compared to the old code.
** Results **
Using the benchmark https://gist.github.com/nielsdos/b36517d81a1af74d96baa3576c2b70df
I used hyperfine: hyperfine --runs 25 --warmup 3 './sapi/cli/php sort_double.php'
No extensions such as opcache used during benchmarking.
BEFORE THIS PATCH
-----------------
Time (mean ± σ): 255.5 ms ± 2.2 ms [User: 251.0 ms, System: 2.5 ms]
Range (min … max): 251.5 ms … 260.7 ms 25 runs
AFTER THIS PATCH
----------------
Time (mean ± σ): 236.2 ms ± 2.8 ms [User: 228.9 ms, System: 5.0 ms]
Range (min … max): 231.5 ms … 242.7 ms 25 runs
At least on 32-bit, the address computations overflow in running the
test on CI with UBSAN enabled. Fix it by reordering the arithmetic.
Since a part of the expression is already used in the code above the
computation, this should not negatively affect performance.
Closes GH-10936.
`zend_uchar` suggests that the value is an ASCII character, but here,
it's about very small integers. This is misleading, so let's use a
C99 integer instead.
On all architectures currently supported by PHP, `zend_uchar` and
`uint8_t` are identical. This change is only about code readability.
This abstracts away, and cleans up, the flag handling for properties of
strings that hold when concatenating two strings if they both hold that
property. (These macros also work with simply copies of strings because
a copy of a string can be considered a concatenation with the empty
string.) This gets rid of some branches and some repetitive code, and
leaves room for adding more flags like these in the future.
On short strings, there is no difference in performance. However, for
strings around 10,000 bytes long, the AVX2-accelerated function is
about 55% faster than the SSE2-accelerated one.
The UTF-8 valid flag needs to be copied upon interning,
otherwise strings that are concatenated at compile time lose this information.
However, if previously this string was interned without the flag it is not added
E.g. in the case the string is an existing class name.
Co-authored-by: Niels Dossche <7771979+nielsdos@users.noreply.github.com>
I learned this trick for doing a faster bounds check with both upper
and lower bounds by reading a disassembler listing of optimized code
produced by GCC; instead of doing 2 compares to check the upper and the
lower bound, add an immediate value to shift the range you are testing
for to the far low or high end of the range of possible values for the
type in question, and then a single compare will do. Intstead of
compare + compare + AND, you just do ADD + compare.
From microbenchmarking on my development PC, this makes strtoupper()
about 10% faster on long strings (~10,000 bytes).
* Remove ZEND_DVAL_TO_LVAL_CAST_OK
As far as I can see, this operation should always use the _slow method, and the results seem to be wrong when ZEND_DVAL_TO_LVAL_CAST_OK is enabled.
* update NEWS
Add zend_ini_parse_quantity() and deprecate zend_atol(), zend_atoi()
zend_atol() and zend_atoi() don't just do number parsing.
They also check for a 'K', 'M', or 'G' at the end of the string,
and multiply the parsed value out accordingly.
Unfortunately, they ignore any other non-numerics between the
numeric component and the last character in the string.
This means that numbers such as the following are both valid
and non-intuitive in their final output.
* "123KMG" is interpreted as "123G" -> 132070244352
* "123G " is interpreted as "123 " -> 123
* "123GB" is interpreted as "123B" -> 123
* "123 I like tacos." is also interpreted as "123." -> 123
Currently, in php-src these functions are used only for parsing ini values.
In this change we deprecate zend_atol(), zend_atoi(), and introduce a new
function with the same behavior, but with the ability to report invalid inputs
to the caller. The function's name also makes the behavior less unexpected:
zend_ini_parse_quantity().
Co-authored-by: Sara Golemon <pollita@php.net>
Casting a huge unsigned value to signed is implementation-defined
behavior in C. By introducing the ZEND_THREEWAY_COMPARE() macro, we
can sidestep this integer overflow/underflow/casting problem.
Unfortunately, libedit is locale based and does not accept UTF-8
input when the C locale is used. This patch switches the default
locale to C.UTF-8 instead (if it is available). This makes libedit
work and I believe it shouldn't affect behavior of single-byte
locale-dependent functions that PHP otherwise uses.
Closes GH-7635.
zend_class_implements_interface works fine if the "class" is an
interface, so simply drop this assertion. This avoids the need to
special case this situation.
Add a family of upper case conversion functions to zend_operators.c,
by analogy with the lower case functions.
Move the single-character conversion macros to the header so that they
can be used as a locale-independent replacement for tolower() and
toupper().
Factor out the ugly bits of the SSE2 case conversion so that the four
functions that use it are easy to read and processor-independent.
Use the new ASCII upper case functions in ext/xml, ext/pdo_dblib and as
an optimization for strtoupper() when the locale is "C".
These are thin wrappers ... around the wrong functions. They call
the "_l()" version of the underlying APIs. For clarify, just call
the wrapped API directly.
zend_double_to_str() converts a double to string in the way that
(string) would (using %.*H using precision).
smart_str_append_double() provides some more fine control over
the precision, and whether a zero fraction should be appeneded
for whole numbers.
A caveat here is that raw calls to zend_gcvt and going through
s*printf has slightly different behavior for the degenarate
precision=0 case. zend_gcvt will add a dummy E+0 in that case,
while s*printf convert this to precision=1 and will not. I'm
going with the s*printf behavior here, which is more common,
but does result in a minor change to the precision.phpt test.
Trying to allocate a `zend_string` with a length only slighty smaller
than `SIZE_MAX` causes an integer overflow, so callers may need to
check that explicitly. To make that easy in a portable way, we
introduce `ZSTR_MAX_LEN`.
Closes GH-7294.