1523 Commits

Author SHA1 Message Date
Christoph M. Becker
8d9f47fb51 Merge branch 'PHP-7.2'
* PHP-7.2:
  Fix #76113: mbstring does not build with Oniguruma 6.8.1
2018-03-20 17:02:52 +01:00
Christoph M. Becker
8f5c34cd39 Merge branch 'PHP-7.1' into PHP-7.2
* PHP-7.1:
  Fix #76113: mbstring does not build with Oniguruma 6.8.1
2018-03-20 16:53:17 +01:00
Christoph M. Becker
4072b27870 Fix #76113: mbstring does not build with Oniguruma 6.8.1
As of Oniguruma 6.8.1, the regex structure has been moved from the
public `oniguruma.h` to the private `regint.h`.  Thus, it is no longer
possible to directly access the struct's members, and actually, there
is no need to, since there are respective accessor functions available
at least as of 2.3.1.
2018-03-20 16:42:28 +01:00
Christoph M. Becker
9004985273 Merge branch 'PHP-7.2'
* PHP-7.2:
  Fix #75944: Wrong cp1251 detection
2018-03-19 14:48:10 +01:00
Christoph M. Becker
cd2912af5e Merge branch 'PHP-7.1' into PHP-7.2
* PHP-7.1:
  Fix #75944: Wrong cp1251 detection
2018-03-19 14:34:09 +01:00
Christoph M. Becker
47461368ca Fix #75944: Wrong cp1251 detection
`\xFF` is a valid character of CP-1251.
2018-03-19 14:24:27 +01:00
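The claim in 47461368ca can be checked outside of mbstring. The snippet below is only an illustration using Python's standard `cp1251` codec, not the detection code the commit changes: byte `0xFF` decodes to U+044F (CYRILLIC SMALL LETTER YA).

```python
# Illustration with Python's built-in codec; this is not mbstring's detector.
ya = b"\xff".decode("cp1251")

# 0xFF is a defined CP-1251 byte, so strict decoding succeeds.
is_cyrillic_ya = (ya == "\u044f")
```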
Christoph M. Becker
ef01ec08f0 Merge branch 'PHP-7.2'
* PHP-7.2:
  Fix #62545: wrong unicode mapping in some charsets
2018-03-11 18:05:08 +01:00
Christoph M. Becker
2b02e6dff3 Merge branch 'PHP-7.1' into PHP-7.2
* PHP-7.1:
  Fix #62545: wrong unicode mapping in some charsets
2018-03-11 17:54:45 +01:00
Christoph M. Becker
01ea314e8c Fix #62545: wrong unicode mapping in some charsets
Undefined characters are best mapped to Unicode REPLACEMENT characters.
2018-03-11 17:38:28 +01:00
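As a rough illustration of the idea in 01ea314e8c, the sketch below decodes a byte that a charset leaves undefined into U+FFFD REPLACEMENT CHARACTER. It assumes Python's `cp1252` codec, where byte `0x81` is unassigned; the charsets actually touched by the commit are not listed here, so this is an analogy rather than a reproduction.

```python
# Assumption: Python's cp1252 codec treats byte 0x81 as undefined.
# errors="replace" maps each undecodable byte to U+FFFD.
decoded = b"A\x81B".decode("cp1252", errors="replace")

has_replacement = "\ufffd" in decoded
```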
Christoph M. Becker
d48b233991 Update to Oniguruma 6.7.1
We also apply the still relevant parts of `oniguruma.patch` and update
the patch accordingly.
2018-03-10 01:07:00 +01:00
Gabriel Caruso
e1cc4863d9 Remove duplicated tests 2018-02-22 13:03:21 +01:00
Gabriel Caruso
ded3d984c6 Use EXPECT instead of EXPECTF when possible
EXPECTF logic in run-tests.php is considerable, so let's avoid it.
2018-02-20 21:53:48 +01:00
Anatol Belski
0bc4cf901c Fix unsigned comparisons 2018-02-17 13:02:50 +01:00
Gabriel Caruso
21e3b0c70c Remove trailing whitespace in inc files 2018-02-10 19:20:23 +01:00
Gabriel Caruso
2d48d734a2 Fix some misspellings 2018-02-06 16:59:00 +01:00
Nikita Popov
d7fe32500e Match strpos() behavior with mbstring.func_overload
mb_strpos() specifically emulates strpos() behavior when function
overloading is enabled. However, the condition was not changed
when strpos() behavior changed in PHP 7.
2018-02-05 20:58:15 +01:00
Gabriel Caruso
fef879a2d6 Use bool instead of boolean while throwing a type error
PHP requires boolean typehints to be written "bool" and disallows
"boolean" as an alias. This changes the error messages to match
the actual type name and avoids confusing messages like "must be
of type boolean, boolean given".

This is a followup to ce1d69a1f6, which
implements the same change for integer->int.
2018-02-04 23:09:40 +01:00
Gabriel Caruso
ce1d69a1f6 Use int instead of integer in type errors
PHP requires integer typehints to be written "int" and does not
allow "integer" as an alias. This changes type error messages to
match the actual type name and avoids confusing messages like
"must be of the type integer, integer given".
2018-02-04 19:08:23 +01:00
Stanislav Malyshev
3616b6b935 Cleanup some tests - remove unnecessary sections
Also unify credits - all are under --CREDITS-- now.
2018-02-04 02:21:40 -08:00
Gabriel Caruso
c6c9e71a5b Add missing SKIPIF sections 2018-02-03 13:54:34 +01:00
Nat Zimmermann
478af26d84 Update mb_preferred_mime_name tests 2018-01-26 22:25:18 +01:00
Nat Zimmermann
6fb78e3017 Add unknown encoding warning test for mb_encoding_aliases 2018-01-26 22:25:18 +01:00
Gabriel Caruso
2238403892 Trailing whitespaces on ext/*
Signed-off-by: Gabriel Caruso <carusogabriel34@gmail.com>
2018-01-04 02:38:32 -02:00
Gabriel Caruso
6400264856 Trailing whitespaces
Signed-off-by: Gabriel Caruso <carusogabriel34@gmail.com>
2018-01-03 14:38:00 +01:00
Xinchen Hui
a76eeea736 Merge branch 'PHP-7.2'
* PHP-7.2:
  Happy new year (Update copyright to 2018)

Conflicts:
	ext/phar/LICENSE
2018-01-03 16:02:15 +08:00
Xinchen Hui
0e62639d28 Merge branch 'PHP-7.1' into PHP-7.2
* PHP-7.1:
  Happy new year (Update copyright to 2018)
2018-01-03 16:00:34 +08:00
Lior Kaplan
fbfdd1e1c4 Happy new year (Update copyright to 2018) 2018-01-02 23:42:29 +02:00
Xinchen Hui
a6519d0514 year++ 2018-01-02 12:57:58 +08:00
Xinchen Hui
7a7ec01a49 year++ 2018-01-02 12:55:14 +08:00
Xinchen Hui
ccd4716ec7 year++ 2018-01-02 12:53:31 +08:00
Dmitry Stogov
b864e6b58c Move constants into read-only data segment 2017-12-15 01:55:00 +03:00
Dmitry Stogov
83e495e0fd Move constants into read-only data segment 2017-12-14 22:14:36 +03:00
Dmitry Stogov
9e709e2fa0 Move constants into read-only data segment 2017-12-14 18:43:44 +03:00
Dmitry Stogov
185478d07e Use cheaper SEPARATE macros 2017-12-07 22:35:17 +03:00
Dmitry Stogov
6a9d2b2190 Cleanup type conversion 2017-12-07 19:24:55 +03:00
Nikita Popov
d21c902841 Fix cp950 pua check
One set of parentheses was missing, causing a legitimate compiler
warning. In the end it doesn't actually matter, because it just
ends up doing an unnecessary check in the w > 0 case.

This fixes the logic and moves it out into a separate function,
to be a bit more readable.
2017-11-22 23:47:18 +01:00
Colin O'Dell
201930106d Add test for negative lengths in mb_strcut() 2017-11-22 22:47:55 +01:00
Colin O'Dell
830d87b86e Add tests for mb_language() 2017-11-22 22:47:55 +01:00
Joe Watkins
21e4ab1977 Merge branch 'PHP-7.2'
* PHP-7.2:
  Fix proto documents for new global functions
2017-11-06 07:24:51 +00:00
Tyson Andre
5cdf37e603 Fix proto documents for new global functions
See NEWS and UPGRADING (or arginfo/implementation) for details.
2017-11-06 07:24:42 +00:00
Dmitry Stogov
3b2e858304 Overload functions once in MINIT (instead of on each request in RINIT) 2017-11-02 14:09:06 +03:00
Dmitry Stogov
ed5b4d5c99 Use Zend MM heap 2017-11-01 02:38:26 +03:00
Nikita Popov
251c1b1a44 Fix invalid read in mb_ord() 2017-10-28 16:44:32 +02:00
Peter Kokot
5c5bd30339 Remove --with-libmbfl configure option
The bundled libmbfl library is no longer API or ABI compatible with
the (currently unmaintained) upstream library. As such, building
against an external libmbfl is no longer possible.
2017-10-28 16:11:30 +02:00
Dmitry Stogov
9cf87aa196 Avoid HashTable allocations for empty arrays (using zend_empty_array). 2017-10-24 17:27:31 +03:00
Peter Kokot
3ed3bc3a0c Update README information for the libmbfl library
The libmbfl library is bundled with PHP and has its own repository for
development and bug fixes. To avoid confusion and speed up development, the
README has been updated to include information about the original library and
to describe the bundled library as a fork of the upstream repository.
2017-10-08 17:51:02 +02:00
Peter Kokot
a57de26c3d Refactor mbstring READMEs 2017-10-08 17:51:02 +02:00
Dmitry Stogov
45ee78e040 mb_convert_variables() refactored to use simple recursion.
Fixed incorrect recursion protection (previous implementation kept protection flag or apply counter in non-zero state).
2017-10-06 12:08:55 +03:00
Dmitry Stogov
cb9d81ef4f Refactored recursion protection 2017-10-06 01:34:50 +03:00
Peter Kokot
39ea632f74 Join untracked files to root .gitignore 2017-10-05 12:36:47 +02:00
Dmitry Stogov
44e0b79ac6 Refactored array creation API. array_init() and array_init_size() are converted into macros calling zend_new_array(). They are not functions anymore and don't return any values. 2017-09-20 02:25:56 +03:00
Joe Watkins
c898349e16 fixes PR #2722, no clue how it broke ... 2017-09-06 11:13:27 +01:00
shinemotec@gmail.com
9b77615608 Fix broken mbstring extension build on Arch Linux 2017-09-06 09:50:08 +01:00
Nikita Popov
fea7957d08 Optimize mb_chr()
By avoiding an unnecessary copy between a string and a zend_string.
2017-08-04 22:38:54 +02:00
Nikita Popov
f24db7686e Optimize mb_ord()
Don't perform a full encoding conversion into UCS4-BE, instead only
perform an input conversion into a wchar device.
2017-08-04 22:22:58 +02:00
Nikita Popov
633a471ba0 Store input and output filters in mbfl encodings
For functions like mb_chr() and mb_ord() just looking up the
input/output filter for the encoding dominates the runtime. This
commit stores the input/output filter for an encoding in the
mbfl encoding structure, so it can be looked up directly, rather
than scanning through filter function lists.
2017-08-04 22:22:58 +02:00
Nikita Popov
e20fbd43ba Separate mbfl filters into three categories
Input filters, output filters and special filters.
2017-08-04 22:22:58 +02:00
Nikita Popov
840b77c02e Merge branch 'PHP-7.2' 2017-08-04 22:20:11 +02:00
Nikita Popov
6b73b2d6eb Check for empty string in mb_ord() 2017-08-04 22:20:05 +02:00
Nikita Popov
4e4ec31e2e Merge branch 'PHP-7.2' 2017-08-04 13:02:44 +02:00
Nikita Popov
353f7bf461 Also check for invalid codepoints in mb_ord()
And return false in that case, instead of returning 0x3f...
2017-08-04 13:01:03 +02:00
Nikita Popov
5caf05f6c5 Merge branch 'PHP-7.2' 2017-08-03 22:41:15 +02:00
Nikita Popov
e53162a32b Return false on invalid codepoint in mb_chr()
Instead of returning the encoding of the current substitution
character. This allows a robust check for the failure case. The
substitution character (especially the default of "?") is also
a valid output of mb_chr() for a valid input (for "?" that would be
0x3f), so it's a bad choice for an error value.
2017-08-03 22:36:42 +02:00
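The reasoning in e53162a32b (a substitution character is a poor error value because it is also a valid result) can be sketched as follows. This is a Python illustration under assumptions: `encode_codepoint` is a hypothetical helper, and `None` stands in for PHP's `false`.

```python
def encode_codepoint(cp, encoding="utf-8"):
    """Encode one codepoint, or return None when it cannot be represented."""
    try:
        return chr(cp).encode(encoding)
    except (ValueError, OverflowError):
        # Covers out-of-range integers, lone surrogates and codepoints
        # that the target encoding cannot map.
        return None


# With substitution, a failure is indistinguishable from a real "?":
ambiguous = "\u20ac".encode("ascii", errors="replace")   # euro sign -> b"?"
legitimate = encode_codepoint(0x3F, "ascii")              # real "?"  -> b"?"
```

Returning a distinct failure value lets callers tell `encode_codepoint(0x20AC, "ascii")` apart from a successful conversion of `0x3F`.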
Nikita Popov
41e9ba6333 Always use Unicode codepoints in mb_ord() and mb_chr()
Previously mb_chr() had two different encoding-dependent behaviors:
 * For "Unicode-encodings" it took a Unicode codepoint and returned
   its encoded representation.
 * Otherwise it returned a big-endian binary encoding of the passed
   integer.

Now the input is always interpreted as a Unicode codepoint. If
a big-endian binary encoding is what you want, you don't need
mbstring to implement that.
2017-08-03 22:14:00 +02:00
Nikita Popov
c98714f19e Merge branch 'PHP-7.2' 2017-08-03 21:57:35 +02:00
Nikita Popov
fb9bf5b64b Revert/fix substitution character fallback
The introduced checks were not correct in two respects:
 * It was checked whether the source encoding of the string matches
   the internal encoding, while the actually relevant encoding is
   the *target* encoding.
 * Even if the correct encoding is used, the checks are still too
   conservative. Just because something is not a "Unicode-encoding"
   does not mean that it does not map any non-ASCII characters.

I've reverted the added checks and instead adjusted mbfl_convert
to first try to use the provided substitution character and if
that fails, perform the fallback to '?' at that point. This means
that any codepoint mapped in the target encoding should now be
correctly supported and anything else should fall back to '?'.
2017-08-03 21:53:59 +02:00
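The fallback order described in fb9bf5b64b (try the character, then the configured substitution character, then '?', all against the *target* encoding) might look roughly like this. It is a minimal Python sketch, not mbfl_convert itself; `convert` and its parameters are hypothetical names.

```python
def convert(text, target, substitute="?"):
    """Encode text into target, falling back per character."""
    out = bytearray()
    for ch in text:
        # Candidates are tried in order against the target encoding.
        for candidate in (ch, substitute, "?"):
            try:
                out += candidate.encode(target)
                break
            except UnicodeEncodeError:
                continue
        # If even "?" is unmappable, the character is dropped in this sketch.
    return bytes(out)
```

For example, `convert("a\u20acb", "latin-1", "\u00a4")` uses the substitution character because U+00A4 exists in Latin-1, while a substitute of U+25A1 is itself unmappable there and falls back to `?`.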
Nikita Popov
3d948d77d1 Merge branch 'PHP-7.2' 2017-08-03 21:17:26 +02:00
Nikita Popov
a8a9e93e9a Revert/fix mb_substitute_character() codepoint checks
The introduced checks did not treat "non-Unicode" encodings correctly,
because they treated the passed integer as encoded in the internal
encoding in that case, while in actuality the substitute character
is always a Unicode codepoint.

Additionally checking the codepoint against the internal encoding
is not correct in any case, because the substitution character must
be mapped in the *target* encoding of the conversion, which does
not necessarily coincide with the internal encoding (the internal
encoding is the default *source* encoding, not *target* encoding).

This reverts the checks back to simple range checks, but in a way
that still resolves #69079: Characters outside the Basic
Multilingual Plane are now accepted and Surrogate Codepoints are
rejected. A distinction between UTF-8 and non-UTF-8 encodings is
not made for surrogate checks (as in the original patch), as
surrogates are always illegal on their own. Specifying a surrogate
as substitution character would only make sense if you could
specify a substitution string with more than one character --
however we do not support that.
2017-08-03 21:12:41 +02:00
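The "simple range checks" that a8a9e93e9a describes could be sketched as below. This is a Python illustration only; `is_valid_substitute` is a hypothetical name, and the lower bound of 0 is an assumption since the commit message does not state it.

```python
def is_valid_substitute(cp):
    """Accept any Unicode codepoint except surrogates (U+D800..U+DFFF)."""
    if 0xD800 <= cp <= 0xDFFF:
        return False          # surrogates are never valid on their own
    # Upper bound is the last Unicode codepoint; lower bound 0 is assumed.
    return 0 <= cp <= 0x10FFFF
```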
Nikita Popov
94fe629992 Merge branch 'PHP-7.2' 2017-08-02 18:11:17 +02:00
Nikita Popov
91240073ea Merge branch 'PHP-7.1' into PHP-7.2 2017-08-02 18:11:12 +02:00
Nikita Popov
63607375f5 Merge branch 'PHP-7.0' into PHP-7.1 2017-08-02 18:09:09 +02:00
Fabien Villepinte
2cc1cbf2f4 Fix Bug #75001: Wrong reflection on mb_eregi_replace 2017-08-02 18:08:42 +02:00
Anatol Belski
f9c3ee9ae8 fix c89 compat 2017-07-28 22:18:51 +02:00
Nikita Popov
f4a1d9c821 Fixed bug #65544 and #71298 2017-07-28 14:57:08 +02:00
Nikita Popov
25b6e68432 Merge branch 'PHP-7.2' 2017-07-28 13:03:35 +02:00
Nikita Popov
5d777e56e2 Merge branch 'PHP-7.1' into PHP-7.2 2017-07-28 13:03:26 +02:00
Nikita Popov
c48c638aeb Merge branch 'PHP-7.0' into PHP-7.1 2017-07-28 13:03:02 +02:00
Nikita Popov
e3d25e78eb Fixed bug #62934 2017-07-28 13:02:25 +02:00
Nikita Popov
582a65b06f Implement full case mapping
Implement full case mapping according to SpecialCasing.txt and
also full case folding according to CaseFolding.txt (F). There
are a number of caveats:

* Only language-agnostic and unconditional full case mapping
  is implemented. The only language-agnostic conditional case
  mapping rule relates to Greek sigma in final position
  (Final_Sigma). Correctly handling this requires both arbitrary
  lookahead and lookbehind, which would require some larger
  changes to how the case mapping is implemented. This is a
  possible future extension.
* The only language-specific handling that is implemented is
  for Turkish dotted/undotted Is, if the ISO-8859-9 encoding
  is used. This matches the previous behavior and makes sure
  that no codepoints not supported by the encoding are
  produced. A future extension would be to also handle the
  Turkish mappings specified by SpecialCasing.txt based on
  the mbfl internal language.
* Full case folding is implemented, but case-insensitive mb_*
  operations continue to use simple case folding. The reason is
  that full case folding of the haystack string may change the
  position at which a match occurred. This would have to be
  mapped back into the position in the original string.
* mb_convert_case() exposes both the full and the simple case
  mapping / folding, where full is the default. The constants
  are:

   * MB_CASE_LOWER (used by mb_strtolower)
   * MB_CASE_UPPER (used by mb_strtoupper)
   * MB_CASE_TITLE
   * MB_CASE_FOLD
   * MB_CASE_LOWER_SIMPLE
   * MB_CASE_UPPER_SIMPLE
   * MB_CASE_TITLE_SIMPLE
   * MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)
2017-07-28 12:32:50 +02:00
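To make the caveats in 582a65b06f concrete: full case mapping can change the length of a string, which is why match positions found in a fully folded haystack no longer line up with the original. The Python sketch below relies on Python's own `str.upper()` and `str.casefold()`, which apply full mappings; `simple_upper` is a hypothetical approximation of simple mapping that keeps only one-to-one results.

```python
def simple_upper(text):
    """Approximate simple case mapping: ignore mappings that expand."""
    out = []
    for ch in text:
        upper = ch.upper()
        # Keep the original character when the full mapping is not 1:1.
        out.append(upper if len(upper) == 1 else ch)
    return "".join(out)


full = "straße".upper()            # full mapping expands the sharp s
simple = simple_upper("straße")    # sharp s is left untouched

# Full folding shifts offsets: "e" moves from index 5 to index 6.
original_pos = "Straße".find("e")
folded_pos = "Straße".casefold().find("e")
```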
Nikita Popov
9ac7c1e71d Use case-folding for case insensitive comparisons
Instead of using lowercasing.
2017-07-28 12:32:50 +02:00
Nikita Popov
80a0601fe5 Use MPH for case maps
Instead of performing a binary search, use a hashtable to store
the case maps. In particular a minimal perfect hash construction
is used, which does not require collision resolution (but does
use an auxiliary table for the hash perturbation).
2017-07-28 12:32:50 +02:00
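For readers unfamiliar with the technique named in 80a0601fe5, here is a minimal sketch of one common minimal perfect hash construction (hash and displace). It is written in Python for illustration and is not the code from the commit; `_mix`, `build_mph` and `lookup` are hypothetical names, the mixer constants are arbitrary, and a non-empty mapping of integer keys is assumed.

```python
def _mix(seed, key):
    # Small 32-bit integer mixer; an arbitrary choice for this sketch.
    h = ((seed * 0x01000193) ^ key) & 0xFFFFFFFF
    h = (h * 0x9E3779B1) & 0xFFFFFFFF
    return h ^ (h >> 15)


def build_mph(mapping, max_seed=100000):
    """Return (seeds, slots) such that every key gets its own slot."""
    n = len(mapping)
    buckets = [[] for _ in range(n)]
    for key in mapping:
        buckets[_mix(0, key) % n].append(key)

    seeds = [0] * n        # auxiliary table: one perturbation seed per bucket
    slots = [None] * n     # final table of (key, value) pairs, no gaps
    # Place the largest buckets first, while most slots are still free.
    for b in sorted(range(n), key=lambda i: -len(buckets[i])):
        if not buckets[b]:
            continue
        for seed in range(1, max_seed):
            targets = [_mix(seed, k) % n for k in buckets[b]]
            distinct = len(set(targets)) == len(targets)
            if distinct and all(slots[t] is None for t in targets):
                break
        else:
            raise ValueError("no seed found; try a larger max_seed")
        seeds[b] = seed
        for k, t in zip(buckets[b], targets):
            slots[t] = (k, mapping[k])
    return seeds, slots


def lookup(seeds, slots, key):
    n = len(slots)
    seed = seeds[_mix(0, key) % n]
    entry = slots[_mix(seed, key) % n]
    # Compare the stored key: keys outside the set also hash to some slot.
    return entry[1] if entry is not None and entry[0] == key else None
```

A lookup costs two hash evaluations and one key comparison, with no probing or chaining, at the price of the extra `seeds` table.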
Nikita Popov
f56b0afe6e Avoid some unnecessary mbfl_strlen() calculations 2017-07-28 12:32:50 +02:00
Nikita Popov
eacd70f762 Don't store titlecase if same as uppercase
The totitle code already has a fallback for that case.
2017-07-28 12:32:50 +02:00
Nikita Popov
cedfc2f426 Drop implementation-specific character properties
No point in keeping around non-standard character properties if
we're not using them and most are not even being populated.
2017-07-28 12:32:50 +02:00
Anatol Belski
98fe82cc05 fix data types 2017-07-25 21:26:25 +02:00
Anatol Belski
13a2629005 size_t fixes 2017-07-25 19:03:33 +02:00
Nikita Popov
8ace7045e9 Handle character ranges in ucgendat generically
In particular, the previous implementation did not account for
Tangut Ideographs and CJK Ideograph extensions C through F.
2017-07-25 18:48:12 +02:00
Nikita Popov
0c0e35fedc Port ucgendat to PHP
Implemented such that the output is identical, including some
quirks that should be fixed subsequently.
2017-07-25 18:48:12 +02:00
Nikita Popov
4bd61ec7ad Fix handling of some special ranges in ucgendat
* Han Ideographs go up to U+9FEA.
* CJK Compatibility Ideographs are no longer specified as a special
  range in remotely recent versions of Unicode.
* Surrogate properties should be assigned to U+D800-U+DFFF, not to
  U+10000-U+1FFFF.
2017-07-25 18:48:12 +02:00
Nikita Popov
445e13b149 Add MBFL_SUBSTR_TO_END mode to mbfl_substr
This takes the substr from the offset to the end of the string.
This avoids pointless searching for the end position and also
saves us a length calculation in the strstr family of functions.
2017-07-23 23:17:12 +02:00
Nikita Popov
bff11c382e Remove more obsolete length checks 2017-07-23 19:09:36 +02:00
Nikita Popov
3c6b2512cb Change layout of case mapping table
Previously the case mapping table was segregated by the type of
the character (upper, lower, title) and always stored the other
two variants (key, other1, other2). Now the table is segregated
by the target type (key, other). As only very few characters have
more than one target this only slightly increases the size of the
table.

The advantage of this layout is that we only need to perform a
single table lookup in the case table. Previously, depending on
the case that was hit, either one lookup in the property table,
or two lookups in the property table and one lookup in the case
table were required.

This changes the layout from libunicode in the OpenLDAP project
-- however, the last commit there was over 10 years ago, so I
don't see value in keeping this in sync.
2017-07-23 18:33:15 +02:00
Anatol Belski
78944bdfc6 remove cast 2017-07-23 17:38:28 +02:00
Anatol Belski
6809be2090 fix warnings and datatype
ident
2017-07-23 17:36:10 +02:00
Anatol Belski
7496bad2ac adjust datatype, used for position handling 2017-07-23 16:37:31 +02:00
Anatol Belski
ea83b69883 Adjust datatypes and reorder which saves 8 bytes on 64-bit 2017-07-23 16:37:30 +02:00
Nikita Popov
fe8384fdfd Merge branch 'PHP-7.2' 2017-07-23 16:06:25 +02:00
Nikita Popov
706f0cf8a0 Update Unicode data for Unicode 10 2017-07-23 16:05:39 +02:00
Nikita Popov
24cfbfd56f Update ucgendat for more bidi properties
Handle them the same way as others -- by classifying as Other
Neutral.
2017-07-23 16:03:11 +02:00
Nikita Popov
7077c719db Merge branch 'PHP-7.2' 2017-07-23 15:36:25 +02:00
Nikita Popov
077e61fad3 Fixed bug #69267 completely
ucgendat.c was assuming that a title-case character is a character
that has both lower and upper-case variants. However, there are
title-case characters that only have a lower-case variant. Use the
Lt general character property to determine where in the case map
the character should be placed instead.
2017-07-23 15:30:17 +02:00
Nikita Popov
c0bcd301d3 Another fix for bug #69267
mb_strtoupper() was converting lowercase characters into
titlecase characters, instead of uppercase characters. Luckily
there are only very few characters with a distinct titlecase
representation, so this mostly worked out okay...
2017-07-23 15:07:02 +02:00
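The distinction behind c0bcd301d3 is easiest to see with one of the few characters that has a separate titlecase form. The Python snippet below uses the standard library's Unicode data as an illustration; it does not exercise mbstring.

```python
import unicodedata

dz_lower = "\u01c6"            # LATIN SMALL LETTER DZ WITH CARON
dz_upper = dz_lower.upper()    # uppercase form, U+01C4
dz_title = dz_lower.title()    # titlecase form, U+01C5

# The titlecase form has general category Lt, distinct from Lu.
title_category = unicodedata.category(dz_title)
```

Uppercasing must yield U+01C4; producing U+01C5 instead is the kind of mix-up the commit message describes.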
Nikita Popov
0e4af9192f Partial fix for bug #69267
This pulls in 60a25c72ba389f53b0621ca250bc99f3b295d43f from the
OpenLDAP project.
2017-07-23 14:47:21 +02:00
Nikita Popov
698132d6f9 Merge branch 'PHP-7.2' 2017-07-23 12:22:09 +02:00
Nikita Popov
88f752a947 Merge branch 'PHP-7.1' into PHP-7.2 2017-07-23 12:21:51 +02:00
Nikita Popov
f116a88592 Merge branch 'PHP-7.0' into PHP-7.1 2017-07-23 12:21:16 +02:00
Christoph M. Becker
418da85f15 Fix #71606: Segmentation fault mb_strcut with HTML-ENTITIES
The HTML decoding filter uses the `opaque` member of mbfl_convert_filter
as buffer, but there was no copy constructor defined, which caused double
frees when the filter is copied (which happens multiple times in mb_strcut(),
for instance).
2017-07-23 12:19:27 +02:00
Nikita Popov
b8ed74ce77 Merge branch 'PHP-7.2' 2017-07-23 11:55:46 +02:00
Nikita Popov
42ff1aa86c Fix overflow checks in mbfl_memory_device
Also prune out some duplicate code and use strlen() and memcpy()
instead of ad-hoc reimplementations. Remove multiplications by
sizeof(unsigned char), which wrongly imply that this can be
anything but 1.
2017-07-23 11:55:43 +02:00
Nikita Popov
bd63c0f5b3 Fix bug #73528 2017-07-23 11:55:43 +02:00
Nikita Popov
80463579ce Remove confusing null checks in mb_send_mail
These are required parameters, they cannot be missing.
2017-07-23 11:55:43 +02:00
Nikita Popov
9af5b7f33d Fix use after free in mb_send_mail 2017-07-23 11:55:26 +02:00
Anatol Belski
4fbd7ccba2 touch yet more places for datatypes 2017-07-23 00:47:24 +02:00
Anatol Belski
0eea41b6c4 add missing header 2017-07-23 00:23:02 +02:00
Anatol Belski
61784bcb71 sync libmbfl allocator with the size_t changes 2017-07-22 23:53:00 +02:00
Anatol Belski
e0825ec60f Mitigation for ssize_t issue in 22a5f554a8
and some more
2017-07-22 22:34:16 +02:00
Nikita Popov
a319063aae Only write single terminating byte
As far as I could determine this is sufficient. It avoids
reallocating the buffer, if it was perfectly allocated beforehand.
2017-07-20 21:41:52 +02:00
Nikita Popov
1388751f10 Use fast zpp in mb_strlen()
For short strings this function is now sufficiently fast for zpp
to be a bottleneck.
2017-07-20 21:41:52 +02:00
Nikita Popov
b3c1d9d111 Directly use encodings instead of no_encoding in libmbfl
In particular strings now store encoding rather than the
no_encoding.

I've also pruned out libmbfl APIs that existed in two forms, one
using no_encoding and the other using encoding. We were not actually
using any of the former.
2017-07-20 21:41:52 +02:00
Nikita Popov
22a5f554a8 Temporary fix for windows build
This API should be changed to stop using negative offsets. For now
I'm replacing ssize_t with long.
2017-07-20 18:29:44 +02:00
Nikita Popov
77cb7bd837 Free last_used_encoding_name in RSHUTDOWN
efree() cannot be used in GSHUTDOWN
2017-07-20 18:12:04 +02:00
Nikita Popov
c098304e17 Reduce number of encoding conversions in case conversion
Don't indirect through UCS4BE, instead directly work on wchars
using a custom filter.

This replaces the pipeline
  utf8 -> wchar -> ucs4be -> wchar -case-> wchar -> ucs4be -> wchar -> utf8
with
  utf8 -> wchar -case-> wchar -> utf8
2017-07-20 15:33:24 +02:00
Nikita Popov
17da862b51 Optimize php_unicode_tolower/upper for ASCII 2017-07-20 13:58:40 +02:00
Nikita Popov
ba383b8239 Add basic mbstring encoding cache
Store the last used encoding and compare against it. It's quite
likely that an application is going to be using the same encoding
again and again.

The actual mbfl_name2encoding() function could also be optimized
to use a hash lookup rather than a linear scan, but we don't have
a hashtable implementation in libmbfl...
2017-07-20 13:58:40 +02:00
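The single-entry cache described in ba383b8239 can be sketched as follows. This is a Python illustration under assumptions: `EncodingCache`, its `table` of `(name, encoding)` pairs and the `scans` counter are hypothetical, and the case-insensitive linear scan merely stands in for the slow name lookup.

```python
class EncodingCache:
    """Remember the last resolved encoding name to skip repeated scans."""

    def __init__(self, table):
        self._table = table          # list of (name, encoding) pairs
        self._last_name = None
        self._last_encoding = None
        self.scans = 0               # counts slow-path lookups

    def get(self, name):
        # Fast path: same name as the previous successful lookup.
        if name == self._last_name:
            return self._last_encoding
        # Slow path: linear scan over all known encodings.
        self.scans += 1
        for candidate, encoding in self._table:
            if candidate.lower() == name.lower():
                self._last_name, self._last_encoding = name, encoding
                return encoding
        return None
```

An application that keeps passing the same encoding name pays for the scan once; alternating between two names would defeat a one-entry cache, which is the trade-off of keeping it this simple.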
Nikita Popov
264387e31e Add php_mb_get_no_encoding() helper function 2017-07-20 13:58:40 +02:00
Nikita Popov
adaea77593 Switch libmbfl to use size_t
Switch mbfl_string and related structures to use size_t lengths.

Quite likely that I broke some things along the way...
2017-07-20 13:58:40 +02:00
Nikita Popov
79c26d597f Optimize php_unicode_is_lower/upper for ASCII 2017-07-20 13:58:40 +02:00
Nikita Popov
9c73be898d Directly accept encoding in php_unicode_convert_case()
As a side-effect mb_strtolower() and mb_strtoupper() now correctly
handle a NULL encoding parameter by using the internal encoding.
This is what caused the two test changes.
2017-07-19 23:59:42 +02:00
Nikita Popov
4128746b94 Add php_mb_get_encoding() convenience function 2017-07-19 23:59:42 +02:00
Nikita Popov
4cf22cbb2d Optimize php_unicode_is_prop()
Do not try to extract the properties from a bitmask. Instead make
the function variadic and pass all properties individually.

Also add a php_unicode_is_prop1() function to check only a single
property.
2017-07-19 23:59:42 +02:00
Nikita Popov
dead4f0b1b Avoid unnecessary encoding lookups in mbstring
Extract part of php_mb_convert_encoding that does the actual work
and use it whenever we already know the encoding.
2017-07-19 23:59:42 +02:00
Anatol Belski
1e2764614b add oniguruma.patch to ease future upgrades 2017-07-13 17:34:14 +02:00
Lior Kaplan
c2c60fcac7 SIZEOF_SIZE_T doesn't exist on AIX and POWER8 (ppc64le), keep using SIZEOF_LONG 2017-07-13 18:05:47 +03:00
Remi Collet
703be4f77e Patch from the upstream git
https://github.com/kkos/oniguruma/issues/60 (CVE-2017-9228)

Thanks to Mamoru TASAKA <mtasaka@fedoraproject.org>
2017-07-05 09:26:06 +02:00
Remi Collet
27a743b82b Patch from the upstream git
https://github.com/kkos/oniguruma/issues/59 (CVE-2017-9229)
b690371bbf97794b4a1d3f295d4fb9a8b05d402d Modified for onig 5.9.6

Thanks to Mamoru TASAKA <mtasaka@fedoraproject.org>
2017-07-05 09:25:57 +02:00
Remi Collet
bdf7393ddb Patch from the upstream git
https://github.com/kkos/oniguruma/issues/58 (CVE-2017-9227)

Thanks to Mamoru TASAKA <mtasaka@fedoraproject.org>
2017-07-05 09:25:49 +02:00
Remi Collet
2693e52113 Patch from the upstream git
https://github.com/kkos/oniguruma/issues/57 (CVE-2017-9224)

Thanks to Mamoru TASAKA <mtasaka@fedoraproject.org>
2017-07-05 09:25:39 +02:00
Remi Collet
4e68b2c52b Patch from the upstream git
https://github.com/kkos/oniguruma/issues/55 (CVE-2017-9226)
b4bf968ad52afe14e60a2dc8a95d3555c543353a Modified for onig 5.9.6
f015fbdd95f76438cd86366467bb2b39870dd7c6 Modified for onig 5.9.6

Thanks to Mamoru TASAKA <mtasaka@fedoraproject.org>
2017-07-05 09:25:27 +02:00
Anatol Belski
b8a334f149 reapply platform related onig patches 2017-05-30 15:47:56 +02:00
Remi Collet
bee52f352f Merge branch 'PHP-7.0' into PHP-7.1
* PHP-7.0:
  NEWS
  Patch from the upstream git https://github.com/kkos/oniguruma/issues/60 (CVE-2017-9228)
  Patch from the upstream git https://github.com/kkos/oniguruma/issues/59 (CVE-2017-9229) b690371bbf97794b4a1d3f295d4fb9a8b05d402d Modified for onig 5.9.6
  Patch from the upstream git https://github.com/kkos/oniguruma/issues/58 (CVE-2017-9227)
  Patch from the upstream git https://github.com/kkos/oniguruma/issues/57 (CVE-2017-9224)
  Patch from the upstream git https://github.com/kkos/oniguruma/issues/55 (CVE-2017-9226) b4bf968ad52afe14e60a2dc8a95d3555c543353a Modified for onig 5.9.6 f015fbdd95f76438cd86366467bb2b39870dd7c6 Modified for onig 5.9.6
2017-05-30 15:45:52 +02:00
Remi Collet
1c845d2950 Patch from the upstream git
https://github.com/kkos/oniguruma/issues/60 (CVE-2017-9228)

Thanks to Mamoru TASAKA <mtasaka@fedoraproject.org>
2017-05-30 15:40:32 +02:00
Remi Collet
5416deec66 Patch from the upstream git
https://github.com/kkos/oniguruma/issues/59 (CVE-2017-9229)
b690371bbf97794b4a1d3f295d4fb9a8b05d402d Modified for onig 5.9.6

Thanks to Mamoru TASAKA <mtasaka@fedoraproject.org>
2017-05-30 15:39:21 +02:00
Remi Collet
6a8ae7cf8d Patch from the upstream git
https://github.com/kkos/oniguruma/issues/58 (CVE-2017-9227)

Thanks to Mamoru TASAKA <mtasaka@fedoraproject.org>
2017-05-30 15:38:17 +02:00
Remi Collet
60b1829e1c Patch from the upstream git
https://github.com/kkos/oniguruma/issues/57 (CVE-2017-9224)

Thanks to Mamoru TASAKA <mtasaka@fedoraproject.org>
2017-05-30 15:37:11 +02:00
Remi Collet
1e0c4386ab Patch from the upstream git
https://github.com/kkos/oniguruma/issues/55 (CVE-2017-9226)
b4bf968ad52afe14e60a2dc8a95d3555c543353a Modified for onig 5.9.6
f015fbdd95f76438cd86366467bb2b39870dd7c6 Modified for onig 5.9.6

Thanks to Mamoru TASAKA <mtasaka@fedoraproject.org>
2017-05-30 15:35:42 +02:00
Remi Collet
0ae2f95b8b Update Oniguruma to latest upstream version 6.3.0
Windows specific changes need to be applied again.
2017-05-30 14:14:57 +02:00
Sara Golemon
9d6b7435e4 Ignore ext/mbstring/oniguruma/oniguruma.h
This is just copied in from ext/mbstring/oniguruma/src/oniguruma.h
and hasn't been kept in Git since Nov 2016.
2017-05-02 21:48:47 -07:00
Thomas Punt
9f08aff3fd Remove superfluous allocation checks around ZMM-based functions 2017-04-02 00:58:19 +02:00
Thomas Punt
932c4b35dc Remove more unnecessary checks on Zend's allocator functions 2017-03-16 12:23:55 +01:00
Nikita Popov
edcabf6d07 Drop unnecessary allocator return value checks 2017-03-13 22:07:15 +01:00