If a PHP file contains an invalid hex literal such as `0x_10`, the expected error
is `Parse error: syntax error, unexpected 'x_10' (T_STRING) in %s on line %d`.
This already worked correctly on Linux, but on Windows prior to this patch a different
error was produced: `Parse error: Invalid numeric literal in %s on line %d`.
This reverts commit e528762c1c.
Dmitry reports that this has a non-trivial impact on parsing
overhead, especially on 32-bit systems. As we don't have a strong
need for this change right now, I'm reverting it.
See also comments on
e528762c1c.
Locations for AST nodes are now tracked with the help of bison
location tracking. This is more accurate than what we currently do
and easier to extend with more information.
A zend_ast_loc structure is introduced, which is used for the location
stack. Currently it only holds the start lineno, but can be extended
to also hold end lineno and offset/column information in the future.
All AST constructors now accept a zend_ast_loc* as first argument, and
will use it to determine their lineno. Previously this used either the
CG(zend_lineno), or the smallest AST lineno of child nodes.
On the parser side, the location structure for a whole rule can be
obtained using the &@$ character salad.
RFC: https://wiki.php.net/rfc/null_coalesce_equal_operator
$a ??= $b is $a ?? ($a = $b), with the difference that $a is only
evaluated once, to the degree that this is possible. In particular
in $a[foo()] ?? $b function foo() is only ever called once.
However, the variable access themselves will be reevaluated.
The $Id$ keywords were used in Subversion where they can be substituted
with filename, last revision number change, last changed date, and last
user who changed it.
In Git this functionality is different and can be done with Git attribute
ident. These need to be defined manually for each file in the
.gitattributes file and are afterwards replaced with 40-character
hexadecimal blob object name which is based only on the particular file
contents.
This patch simplifies handling of $Id$ keywords by removing them since
they are not used anymore.
zval_dtor() doesn't make a lot of sense in PHP-7.* and it's used incorrectly in some places.
Its occurances should be replaced by zval_ptr_dtor() or zval_ptr_dtor_nogc(), or even more specialized destructors.
RFC: https://wiki.php.net/rfc/flexible_heredoc_nowdoc_syntaxes
* The ending label no longer has to be followed by a semicolon or
newline. Any non-label character is fine.
* The ending label may be indented. The indentation will be stripped
from all lines in the heredoc/nowdoc string.
Lexing of heredoc strings performs a scan-ahead to determine the
indentation of the ending label, so that the correct amount of
indentation can be removed when calculting the semantic values for
use by the parser. This makes the implementation quite a bit more
complicated than we would like :/
In the following piece of code:
```php
function from1234($x) {
return $x;
}
function foo($x) {
yield from1234($x);
}
```
The statement inside foo is taken as `yield from` `1234($x)`
which is neither the intent, nor even legal syntax for an fcall.
Do a lookahead for breaking non-label characters after the
`yield from` and only accept it if they occur.
Introduce an on_event_context passed to the on_event hook. Use this
context to pass along the token array. Previously this was stored
in a non-tls global :/
This slightly improves calls to regular function and method calls in cost of a bit slower generator initialization.
Separate call frame for generators, allocated on heap, now created by ZEND_GENERATOR_CREATE instruction.
This turned out to be rather inconvenient after all. Instead just
return the same output we did on PHP 5. If people want to have an
error, use TOKEN_PARSE.
Added Throwable interface that exceptions must
implement in order to be thrown. BaseException
was removed, EngineException renamed to
Error, and TypeException and ParseException
renamed to TypeError and ParseError. Exception
and Error no longer extend a common base
class, rather they both implement the Throwable
interface.
we basically added a mechanism to store the token stream during parsing
and exposed the entire parser stack on the tokenizer extension through
an opt in flag: token_get_all($src, TOKEN_PARSE).
this change allows easy future language enhancements regarding context
aware parsing & scanning without further maintance on the tokenizer
extension while solves known inconsistencies "parseless" tokenizer
extension has when it handles `__halt_compiler()` presence.
The implementation has no regression risks, has an even smaller footprint
compared to the previous attempt involving a pure lexical approach, is higly
predictable and higly configurable.
To turn a word semi-reserved you only need to edit the "SEMI_RESERVED" parser rule,
it's an inclusive list of all the words that should be matched as T_STRING on specific contexts.
Example:
```
method_modifiers function returns_ref indentifier '(' parameter_list ')' ...
```
instead of:
```
method_modifiers function returns_ref T_STRING '(' parameter_list ')' ...
```
TODO: port ext tokenizer
Introduce helper macro FC(x) for CG(file_context).x.
end_compilation() now handled by file_context_end().
While at it, dropped zval wrapper for ticcks.
Renamed compiler_context to oparray_context. Introduced per-file
file_context. Moved import tables into the file_context.
context_stack no longer exists, instead keeping backups of contexts
on C stack. Same for file contexts.
TODO: Move more things out of CG into file_context. There should be
a number of other things that we should not try to reuse in nested
compilations.
Primarily to avoid getting fatal errors from token_get_all().
Implemented using a magic E_ERROR token, which the lexer emits to
force a parser failure.
As per https://wiki.php.net/rfc/remove_alternative_php_tags.
Removes:
* <% opening tag
* %> closing tag
* <%= short opening tag
* /<script\s+language\s*=\s*(php|"php"|'php')\s*>/i opening tag
* /</script>/i closing tag
* asp_tags ini directive