Introduce an on_event_context passed to the on_event hook. Use this
context to pass along the token array. Previously this was stored
in a non-tls global :/
we basically added a mechanism to store the token stream during parsing
and exposed the entire parser stack on the tokenizer extension through
an opt in flag: token_get_all($src, TOKEN_PARSE).
this change allows easy future language enhancements regarding context
aware parsing & scanning without further maintance on the tokenizer
extension while solves known inconsistencies "parseless" tokenizer
extension has when it handles `__halt_compiler()` presence.
This fixes bug #60097.
Before two global variables CG(heredoc) and CG(heredoc_len) were used to
track the current heredoc label. In order to support nested heredoc
strings the *previous* heredoc label was assigned as the token value of
T_START_HEREDOC and the language_parser.y assigned that to CG(heredoc).
This created a dependency of the lexer on the parser. Thus the
token_get_all() function, which accesses the lexer directly without
also running the parser, was not able to tokenize nested heredoc strings
(and leaked memory). Same applies for the source-code highlighting
functions.
The new approach is to maintain a heredoc_label_stack in the lexer, which
contains all active heredoc labels.
As it is no longer required, T_START_HEREDOC and T_END_HEREDOC now don't
carry a token value anymore.
In order to make the work with zend_ptr_stack in this context more
convenient I added a new function zend_ptr_stack_top(), which retrieves the
top element of the stack (similar to zend_stack_top()).
this enables ZE2 to gracefully parse scripts written in UTF-8 (with BOM),
UTF-16, UTF-32, Shift_JIS, ISO-2022-JP etc... (when configured with
'--enable-zend-multibyte' and '--enable-mbstring')