php-src/UPGRADING.INTERNALS
Christoph M. Becker 3b0f051193 Allow empty $escape to eschew escaping CSV
Albeit CSV is still a widespread data exchange format, it has never been
officially standardized.  There exists, however, the “informational” RFC
4180[1] which has no notion of escape characters, but rather defines
`escaped` as strings enclosed in double-quotes where contained
double-quotes have to be doubled.  While this concept is supported by
PHP's implementation (`$enclosure`), the `$escape` sometimes interferes,
so that `fgetcsv()` is unable to correctly parse externally generated
CSV, and `fputcsv()` is sometimes generating non-compliant CSV.  Since
PHP's `$escape` concept is availble for many years, we cannot drop it
for BC reasons (even though many consider it as bug).  Instead we allow
to pass an empty string as `$escape` parameter to the respective
functions, which results in ignoring/omitting any escaping, and as such
is more inline with RFC 4180.  It is noteworthy that this is almost no
userland BC break, since formerly most functions did not accept an empty
string, and failed in this case.  The only exception was `str_getcsv()`
which did accept an empty string, and used a backslash as escape
character then (which appears to be unintended behavior, anyway).

The changed functions are `fputcsv()`, `fgetcsv()` and `str_getcsv()`,
and also the `::setCsvControl()`, `::getCsvControl()`, `::fputcsv()`,
and `::fgetcsv()` methods of `SplFileObject`.

The implementation also changes the type of the escape parameter of the
PHP_APIs `php_fgetcsv()` and `php_fputcsv()` from `char` to `int`, where
`PHP_CSV_NO_ESCAPE` means to ignore/omit escaping.  The parameter
accepts the same values as `isalpha()` and friends, i.e. “the value of
which shall be representable as an `unsigned char` or shall equal the
value of the macro `EOF`.  If the argument has any other value, the
behavior is undefined.”  This is a subtle BC break, since the character
`chr(128)` has the value `-1` if `char` is signed, and so likely would
be confused with `EOF` when converted to `int`.  We consider this BC
break to be acceptable, since it's rather unlikely that anybody uses
`chr(128)` as escape character, and it easily can be fixed by casting
all `escape` arguments to `unsigned char`.

This patch implements the feature requests 38301[2] and 51496[3].

[1] <https://tools.ietf.org/html/rfc4180>
[2] <https://bugs.php.net/bug.php?id=38301>
[3] <https://bugs.php.net/bug.php?id=51496>
2018-12-15 14:38:15 +01:00

173 lines
6.9 KiB
Plaintext

PHP 7.4 INTERNALS UPGRADE NOTES
1. Internal API changes
a. php_sys_symlink() and php_sys_link()
b. zend_lookup_class_ex() and zend_fetch_class_by_name()
c. Function/property/class flags
d. Removed zend_check_private()
e. php_win32_error_to_msg() memory management
f. get_properties_for() handler / Z_OBJDEBUG_P
g. Required object handlers
h. Immutable classes and op_arrays
i. php_fgetcsv() and php_fputcsv()
2. Build system changes
a. Abstract
b. Unix build system changes
c. Windows build system changes
3. Module changes
a. ext/xml
b. ext/hash
========================
1. Internal API changes
========================
a. php_sys_symlink() and php_sys_link() portability macros have been
added, which behave like POSIX's symlink() and link(), respectively, on
POSIX compliant systems and on Windows.
b. zend_lookup_class_ex() and zend_fetch_class_by_name() prototypes were
changed to accept optional lower-case class name as zend_string*,
instead of zval*.
c. Function/property/class flags changes
- ZEND_ACC_CTOR and ZEND_ACC_DTOR are removed. It's possible to check if
method is a constructor/destructor using the following condition
(func->common.scope->constructor == func).
- ZEND_ACC_IMPLEMENTED_ABSTRACT is removed (it was used only internally
during inheritance).
- ZEND_ACC_IMPLICIT_PUBLIC is removed (it was used only for reflection)
- ZEND_ACC_SHADOW property flag is removed. Instead of creating shadow
clone, now we use the same private property_info, and should also
check property_info->ce (in the same way as with methods).
- ZEND_ACC_ANON_BOUND is replaced with ZEND_ACC_LINKED. This flag is set
not only during anonymous classes declaration, but also during any
run-time or compile-time class declaration.
- ZEND_ACC_NO_RT_ARENA renamed into ZEND_ACC_HEAP_RT_CACHE. Now it's used
not only for closures, but also for pseudo-main op_arrays.
- ZEND_ACC_... flags are re-numbered.
d. zend_check_private() is removed. Use (func->common.scope == scope) instead.
e. Pointers returned by php_win32_error_to_msg() have to be freed using
php_win32_error_msg_free(). Same regarding php_win_err() vs.
php_win_err_free().
f. A new, optional object handler with the signature
HashTable *get_properties_for(zval *obj, zend_prop_purpose purpose)
has been introduced, where zend_prop_purpose (currently) takes one of:
ZEND_PROP_PURPOSE_DEBUG // var_dump etc.
ZEND_PROP_PURPOSE_ARRAY_CAST // (array) $obj
ZEND_PROP_PURPOSE_SERIALIZE // "O"-format serialization (__wakeup)
ZEND_PROP_PURPOSE_VAR_EXPORT // var_export (__set_state)
ZEND_PROP_PURPOSE_JSON // json_encode
The handler returns a non-null HashTable with increased refcounted, and
the return value must be released using zend_release_properties().
This handler serves the same general function as get_properties(), but
provides more control over different property uses, while also making
it possible to return a temporary property table.
get_properties() is still used in cases where none of the above purposes
apply, but overloading get_properties() is generally discouraged. If you
want to provide purposes for general usage rather than just debugging or
serialization, please prefer using properly declared properties.
get_debug_info() is superseded by get_properties_for() with the
ZEND_PROP_PURPOSE_DEBUG purpose, but remains available for backwards-
compatibility reasons. However, while it is fine to define this handler,
it should never be directly called by consuming code.
The Z_OBJDEBUG_P macro has been removed. It should be replaced by calls to
zend_get_properties_for() with the ZEND_PROP_PURPOSE_DEBUG purpose:
// OLD
int is_temp;
HashTable *ht = Z_OBJDEBUG_P(obj, is_temp);
// ...
if (is_temp) {
zend_hash_destroy(ht);
FREE_HASHTABLE(ht);
}
// NEW
HashTable *ht = zend_get_properties_for(obj, ZEND_PROP_PURPOSE_DEBUG);
// ...
zend_release_properties(ht);
g. The following object handlers are now required (must be non-NULL):
* free_obj
* dtor_obj
* read_property
* write_property
* read_dimension
* write_dimension
* get_property_ptr_ptr
* has_property
* unset_property
* has_dimension
* unset_dimension
* get_properties
* get_method
* get_constructor
* get_class_name
* get_gc
It is recommended to initialize object handler structures by copying the
std object handlers and only overwriting those you want to change.
h. Opcache may make classes and op_arrays immutable. Such classes are marked
by ZEND_ACC_IMMUTABLE flag, they are not going to be copied from opcache
shared memory to process memory and must not be modified at all.
Few related data structures were changed to allow addressing mutable data
structures from immutable ones. This access is implemented through
ZEND_MAP_PTR... abstraction macros and, basically, uses additional level of
indirection. op_array->run_time_cache, op_array->static_variables_ptr and
class_entry->static_members_table now have to be accessed through
ZEND_MAP_PTR... macros.
It's also not allowed to change op_array->reserved[] handles of immutable
op_arrays. Instead, now you have to reserve op_array handle using
zend_get_op_array_extension_handle() during MINIT and access its value
using ZEND_OP_ARRAY_EXTENSION(op_array, handle).
i. The type of the escape parameter of php_fgetcsv() and php_fputcsv() has
been changed from char to int. This allows to pass the new constant macro
PHP_CSV_NO_ESCAPE to this parameter, to disable PHP's proprietary escape
mechanism.
========================
2. Build system changes
========================
a. Abstract
- The hash extension is now always available, meaning the --enable-hash
configure argument has been removed.
b. Unix build system changes
- configure --help now also outputs --program-suffix and --program-prefix
information by using the Autoconf AC_ARG_PROGRAM macro.
- Obsolescent macros AC_FUNC_VPRINTF and AC_FUNC_UTIME_NULL have been
removed. Symbols HAVE_VPRINTF and HAVE_UTIME_NULL are no longer defined
since they are not needed on the current systems.
c. Windows build system changes
========================
3. Module changes
========================
a. ext/xml
- The public (internal) API of the ext/xml extension has been removed. All
functions and structures are private to the extension now.
b. ext/hash
- The hash extension is now always available, allowing extensions to rely
on its functionality to be available without compile time checks.