For php-ast interning the file name is an effective memory leak,
see php-ast#134.
I don't think there's any reason to do this. At some point this
was needed due to bugs in the interned string mechanism that
caused issues if the string was later interned, e.g. through a
__FILE__ reference. These issues have since been resolved.
In conjunction with the filenames_table removal in c4016ecd44
this means that filenames now need to be refcounted like normal
strings. In particular the filename reference in op_arrays and CEs
are refcounted.
Voidification of Zend API which always succeeded
Use bool argument types instead of int for boolean arguments
Use bool return type for functions which return true/false (1/0)
Use zend_result return type for functions which return SUCCESS/FAILURE as they don't follow normal boolean semantics
Closes GH-6002
The primary issue was already resolved in 7c3e487289,
but the particular example used in this bug report ran into an
additional issue on PHP 8, because I forgot to drop a number of
zend_bailout calls when switch require failure to throw.
Unconditionally strip shebang lines when using the CLI SAPI,
independently of whether they occur in the primary or non-primary
script. It's unlikely that someone intentionally wants to print
that shebang line when including a script, and this regularly
causes issues when scripts are used in multiple contexts, e.g.
for direct invocation and as a phar bootstrap.
Don't include a trailing newline in T_COMMENT tokens, instead leave
it for a following T_WHITESPACE token. The newline does not belong
to the comment logically, and this makes for an ugly special case,
as other tokens do not include trailing newlines.
Whitespace-sensitive tooling will want to either forward or backward
emulate this change.
Closes GH-5182.
One of the weirdest pieces of PHP code I've ever seen. In terms
of tokens, this gets internally translated to
use x as y; echo as my_echo;
On master it crashes because this "echo" does not have attached
identifier metadata. Make sure it is added and then reject the
use of "<?=" as an identifier inside zend_lex_tstring.
Fixes oss-fuzz #23547.
This is a bit tricky: In this cases we have "namespace as", which
means that we will only recognize "namespace" as an identifier when
the lookahead token is already at the "as". This means that
zend_lex_tstring picks up the wrong identifier.
We solve this by actually assigning the identifier as the semantic
value on the parser stack -- as in almost all cases we will not
actually need the identifier, this is just an (offset, size)
reference, not a copy of the string.
Additionally, we need to teach the lexer feedback mechanism used
by tokenizer TOKEN_PARSE mode to apply feedback to something
other than the very last token. To that purpose we pass through
the token text and check the tokens in reverse order to find the
right one.
Closes GH-5668.
Aside from a few very specific syntax errors for which detailed exceptions are
thrown, generally PHP just emits the default error messages generated by bison on syntax
error. These messages are very uninformative; they just say "Unexpected ... at line ...".
This is most problematic with constructs which can span an arbitrary number of lines, such
as blocks of code delimited by { }, 'if' conditions delimited by ( ), and so on. If a closing
delimiter is missed, the block will run for the entire remainder of the source file (which
could be thousands of lines), and then at the end, a parse error will be thrown with the
dreaded words: "Unexpected end of file".
Therefore, track the positions of opening and closing delimiters and ensure that they match
up correctly. If any mismatch or missing delimiter is detected, immediately throw a parse
error which points the user to the offending line. This is best done in the *lexer* and not
in the parser.
Thanks to Nikita Popov and George Peter Banyard for suggesting improvements.
Fixes bug #79368.
Closes GH-5364.
Closes GH-5353. From now on, PHP will have reflection information
about default values of parameters of internal functions.
Co-authored-by: Nikita Popov <nikita.ppv@gmail.com>
Usually it will already fail when opening, but reads can also
fail since PHP 7.4, in which case we still need to place the
file handle in open_files to make sure the destructor will run
on it.
This originally manifested as a leak in oss-fuzz #18000. The following
is a reduced test case:
<?php
[
5 => 1,
"foo" > 1,
" " => "" == 0
];
<<<BAR
$x
BAR;
Because this particular error condition did not return T_ERROR,
EG(exception) was set while performing binary operation constant
evaluation, which checks exceptions for cast failures.
Instead of adding this indirect test case, I'm adding an assertion
that the lexer has to return T_ERROR if EG(exception) is set.
To clean up the mess here a bit, check for invalid octal digits
with an explicit loop instead of mixing this into the string to
number conversion.
Also clean up some type usage.