Mirror of https://github.com/php/php-src.git (synced 2025-01-22 11:44:09 +08:00)

upgrade PCRE to version 7.5

parent ccc0d6e32b
commit 4c501a0ab6

NEWS
@@ -29,7 +29,7 @@ PHP NEWS
invoking the date parser. (Scott)

- Removed the experimental RPL (master/slave) functions from mysqli. (Andrey)
- Upgraded PCRE to version 7.4 (Nuno)
- Upgraded PCRE to version 7.5 (Nuno)

- Improved php.ini handling: (Jani)
. Added ".htaccess" style user-defined php.ini files support for CGI/FastCGI
@@ -1,6 +1,143 @@
ChangeLog for PCRE
------------------

Version 7.5 10-Jan-08
---------------------

1. Applied a patch from Craig: "This patch makes it possible to 'ignore'
values in parens when parsing an RE using the C++ wrapper."

2. Negative specials like \S did not work in character classes in UTF-8 mode.
Characters greater than 255 were excluded from the class instead of being
included.

3. The same bug as (2) above applied to negated POSIX classes such as
[:^space:].

4. PCRECPP_STATIC was referenced in pcrecpp_internal.h, but nowhere was it
defined or documented. It seems to have been a typo for PCRE_STATIC, so
I have changed it.

5. The construct (?&) was not diagnosed as a syntax error (it referenced the
first named subpattern) and a construct such as (?&a) would reference the
first named subpattern whose name started with "a" (in other words, the
length check was missing). Both these problems are fixed. "Subpattern name
expected" is now given for (?&) (a zero-length name), and this patch also
makes it give the same error for \k'' (previously it complained that that
was a reference to a non-existent subpattern).

6. The erroneous patterns (?+-a) and (?-+a) give different error messages;
this is right because (?- can be followed by option settings as well as by
digits. I have, however, made the messages clearer.

7. Patterns such as (?(1)a|b) (a pattern that contains fewer subpatterns
than the number used in the conditional) now cause a compile-time error.
This is actually not compatible with Perl, which accepts such patterns, but
treats the conditional as always being FALSE (as PCRE used to), but it
seems to me that giving a diagnostic is better.

8. Change "alphameric" to the more common word "alphanumeric" in comments
and messages.

9. Fix two occurrences of "backslash" in comments that should have been
"backspace".

10. Remove two redundant lines of code that can never be obeyed (their function
was moved elsewhere).

11. The program that makes PCRE's Unicode character property table had a bug
which caused it to generate incorrect table entries for sequences of
characters that have the same character type, but are in different scripts.
It amalgamated them into a single range, with the script of the first of
them. In other words, some characters were in the wrong script. There were
thirteen such cases, affecting characters in the following ranges:

U+002b0 - U+002c1
U+0060c - U+0060d
U+0061e - U+00612
U+0064b - U+0065e
U+0074d - U+0076d
U+01800 - U+01805
U+01d00 - U+01d77
U+01d9b - U+01dbf
U+0200b - U+0200f
U+030fc - U+030fe
U+03260 - U+0327f
U+0fb46 - U+0fbb1
U+10450 - U+1049d

12. The -o option (show only the matching part of a line) for pcregrep was not
compatible with GNU grep in that, if there was more than one match in a
line, it showed only the first of them. It now behaves in the same way as
GNU grep.

13. If the -o and -v options were combined for pcregrep, it printed a blank
line for every non-matching line. GNU grep prints nothing, and pcregrep now
does the same. The return code can be used to tell if there were any
non-matching lines.

14. Added --file-offsets and --line-offsets to pcregrep.

15. The pattern (?=something)(?R) was not being diagnosed as a potentially
infinitely looping recursion. The bug was that positive lookaheads were not
being skipped when checking for a possible empty match (negative lookaheads
and both kinds of lookbehind were skipped).

16. Fixed two typos in the Windows-only code in pcregrep.c, and moved the
inclusion of <windows.h> to before rather than after the definition of
INVALID_FILE_ATTRIBUTES (patch from David Byron).

17. Specifying a possessive quantifier with a specific limit for a Unicode
character property caused pcre_compile() to compile bad code, which led at
runtime to PCRE_ERROR_INTERNAL (-14). Examples of patterns that caused this
are: /\p{Zl}{2,3}+/8 and /\p{Cc}{2}+/8. It was the possessive "+" that
caused the error; without that there was no problem.

18. Added --enable-pcregrep-libz and --enable-pcregrep-libbz2.

19. Added --enable-pcretest-libreadline.

20. In pcrecpp.cc, the variable 'count' was incremented twice in
RE::GlobalReplace(). As a result, the number of replacements returned was
double what it should be. I removed one of the increments, but Craig sent a
later patch that removed the other one (the right fix) and added unit tests
that check the return values (which was not done before).

21. Several CMake things:

(1) Arranged that, when cmake is used on Unix, the libraries end up with
the names libpcre and libpcreposix, not just pcre and pcreposix.

(2) The above change means that pcretest and pcregrep are now correctly
linked with the newly-built libraries, not previously installed ones.

(3) Added PCRE_SUPPORT_LIBREADLINE, PCRE_SUPPORT_LIBZ, PCRE_SUPPORT_LIBBZ2.

22. In UTF-8 mode, with newline set to "any", a pattern such as .*a.*=.b.*
crashed when matching a string such as a\x{2029}b (note that \x{2029} is a
UTF-8 newline character). The key issue is that the pattern starts .*;
this means that the match must be either at the beginning, or after a
newline. The bug was in the code for advancing after a failed match and
checking that the new position followed a newline. It was not taking
account of UTF-8 characters correctly.

23. PCRE was behaving differently from Perl in the way it recognized POSIX
character classes. PCRE was not treating the sequence [:...:] as a
character class unless the ... were all letters. Perl, however, seems to
allow any characters between [: and :], though of course it rejects as
unknown any "names" that contain non-letters, because all the known class
names consist only of letters. Thus, Perl gives an error for [[:1234:]],
for example, whereas PCRE did not - it did not recognize a POSIX character
class. This seemed a bit dangerous, so the code has been changed to be
closer to Perl. The behaviour is not identical to Perl, because PCRE will
diagnose an unknown class for, for example, [[:l\ower:]] where Perl will
treat it as [[:lower:]]. However, PCRE does now give "unknown" errors where
Perl does, and where it didn't before.

24. Rewrite so as to remove the single use of %n from pcregrep because in some
Windows environments %n is disabled by default.


Version 7.4 21-Sep-07
---------------------
@@ -1,6 +1,14 @@
News about PCRE releases
------------------------

Release 7.5 10-Jan-08
---------------------

This is mainly a bug-fix release. However the ability to link pcregrep with
libz or libbz2 and the ability to link pcretest with libreadline have been
added. Also the --line-offsets and --file-offsets options were added to
pcregrep.


Release 7.4 21-Sep-07
---------------------
@@ -84,7 +84,7 @@ The following are generic comments about building the PCRE C library "by hand".
ucptable.h

(5) Also ensure that you have the following file, which is #included as source
when building a debugging version of PCRE and is also used by pcretest.
when building a debugging version of PCRE, and is also used by pcretest.

pcre_printint.src
@@ -258,6 +258,24 @@ library. You can read more about them in the pcrebuild man page.

This automatically implies --enable-rebuild-chartables (see above).

. It is possible to compile pcregrep to use libz and/or libbz2, in order to
read .gz and .bz2 files (respectively), by specifying one or both of

--enable-pcregrep-libz
--enable-pcregrep-libbz2

Of course, the relevant libraries must be installed on your system.

. It is possible to compile pcretest so that it links with the libreadline
library, by specifying

--enable-pcretest-libreadline

If this is done, when pcretest's input is from a terminal, it reads it using
the readline() function. This provides line-editing and history facilities.
Note that libreadline is GPL-licenced, so if you distribute a binary of
pcretest linked in this way, there may be licensing issues.

The "configure" script builds the following files for the basic C library:

. Makefile is the makefile that builds the library
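The build options above surface in C as the `SUPPORT_LIBZ` and `SUPPORT_LIBBZ2` macros that appear in the config.h changes. What follows is a minimal sketch, assuming a program that picks a reader by file-name suffix and considers only the formats whose support was compiled in. The enum and both function names are hypothetical and are not pcregrep's own identifiers.

```c
#include <string.h>

/* Hypothetical reader kinds; not pcregrep's identifiers. */
typedef enum { READER_PLAIN, READER_GZIP, READER_BZIP2 } reader_kind;

/* Nonzero if name is longer than suffix and ends with it. */
int has_suffix(const char *name, const char *suffix)
{
  size_t n = strlen(name), s = strlen(suffix);
  return n > s && strcmp(name + n - s, suffix) == 0;
}

/* Choose a reader, honouring only the support compiled in via config.h. */
reader_kind choose_reader(const char *name)
{
  (void)name;  /* unused when neither library is enabled */
#ifdef SUPPORT_LIBZ
  if (has_suffix(name, ".gz")) return READER_GZIP;
#endif
#ifdef SUPPORT_LIBBZ2
  if (has_suffix(name, ".bz2")) return READER_BZIP2;
#endif
  return READER_PLAIN;
}
```

If neither macro is defined, every file falls through to the plain reader. This matches the README's point that libz and libbz2 are optional extras.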
@@ -725,4 +743,4 @@ The distribution should contain the following files:
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 21 September 2007
Last updated: 18 December 2007
@@ -51,6 +51,11 @@ them both to 0; an emulation function will be used. */
/* Define to 1 if you have the <bits/type_traits.h> header file. */
/* #undef HAVE_BITS_TYPE_TRAITS_H */

/* Define to 1 if you have the <bzlib.h> header file. */
#ifndef HAVE_BZLIB_H
#define HAVE_BZLIB_H 1
#endif

/* Define to 1 if you have the <dirent.h> header file. */
#ifndef HAVE_DIRENT_H
#define HAVE_DIRENT_H 1
@@ -86,6 +91,16 @@ them both to 0; an emulation function will be used. */
#define HAVE_MEMORY_H 1
#endif

/* Define to 1 if you have the <readline/history.h> header file. */
#ifndef HAVE_READLINE_HISTORY_H
#define HAVE_READLINE_HISTORY_H 1
#endif

/* Define to 1 if you have the <readline/readline.h> header file. */
#ifndef HAVE_READLINE_READLINE_H
#define HAVE_READLINE_READLINE_H 1
#endif

/* Define to 1 if you have the <stdint.h> header file. */
#ifndef HAVE_STDINT_H
#define HAVE_STDINT_H 1
@@ -152,6 +167,11 @@ them both to 0; an emulation function will be used. */
/* Define to 1 if you have the <windows.h> header file. */
/* #undef HAVE_WINDOWS_H */

/* Define to 1 if you have the <zlib.h> header file. */
#ifndef HAVE_ZLIB_H
#define HAVE_ZLIB_H 1
#endif

/* Define to 1 if you have the `_strtoi64' function. */
/* #undef HAVE__STRTOI64 */
@@ -231,13 +251,13 @@ them both to 0; an emulation function will be used. */
#define PACKAGE_NAME "PCRE"

/* Define to the full name and version of this package. */
#define PACKAGE_STRING "PCRE 7.4"
#define PACKAGE_STRING "PCRE 7.5"

/* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "pcre"

/* Define to the version of this package. */
#define PACKAGE_VERSION "7.4"
#define PACKAGE_VERSION "7.5"


/* If you are compiling for a system other than a Unix-like system or
@@ -271,6 +291,17 @@ them both to 0; an emulation function will be used. */
#define STDC_HEADERS 1
#endif

/* Define to allow pcregrep to be linked with libbz2, so that it is able to
handle .bz2 files. */
/* #undef SUPPORT_LIBBZ2 */

/* Define to allow pcretest to be linked with libreadline. */
/* #undef SUPPORT_LIBREADLINE */

/* Define to allow pcregrep to be linked with libz, so that it is able to
handle .gz files. */
/* #undef SUPPORT_LIBZ */

/* Define to enable support for Unicode properties */
/* #undef SUPPORT_UCP */
@@ -279,7 +310,7 @@ them both to 0; an emulation function will be used. */

/* Version number of package */
#ifndef VERSION
#define VERSION "7.4"
#define VERSION "7.5"
#endif

/* Define to empty if `const' does not conform to ANSI C. */
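`SUPPORT_LIBREADLINE` only changes how a line of input is obtained. The sketch below uses a hypothetical helper, `read_input_line()`, and a prompt string chosen only for illustration. When the macro is defined and stdin is a terminal, it calls GNU readline's `readline()` and `add_history()`. In every other case it falls back to `fgets()`. It shows the general pattern and is not pcretest's code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifdef SUPPORT_LIBREADLINE
#include <unistd.h>
#include <readline/readline.h>
#include <readline/history.h>
#endif

/* Hypothetical helper: read one line (without its newline) into buf,
   which holds size bytes (size > 0). Returns buf, or NULL at end of input. */
char *read_input_line(FILE *in, char *buf, size_t size)
{
#ifdef SUPPORT_LIBREADLINE
  /* Line editing and history only make sense for an interactive terminal. */
  if (in == stdin && isatty(fileno(stdin)))
    {
    char *s = readline("  re> ");     /* prompt text is an assumption */
    if (s == NULL) return NULL;
    if (*s != 0) add_history(s);
    strncpy(buf, s, size - 1);
    buf[size - 1] = 0;
    free(s);                          /* readline() returns malloc'd memory */
    return buf;
    }
#endif
  if (fgets(buf, (int)size, in) == NULL) return NULL;
  buf[strcspn(buf, "\n")] = 0;        /* drop the newline, as readline() does */
  return buf;
}
```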
@@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
/* The current PCRE version information. */

#define PCRE_MAJOR 7
#define PCRE_MINOR 4
#define PCRE_MINOR 5
#define PCRE_PRERELEASE
#define PCRE_DATE 2007-09-21
#define PCRE_DATE 2008-01-10

/* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE, the appropriate
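Applications that depend on the 7.5 fixes can test these macros at compile time. Here is a small sketch. `PCRE_AT_LEAST` is a hypothetical helper macro, and the fallback definitions exist only so the fragment compiles when pcre.h is not installed. At run time, `pcre_version()` reports the version of the library actually linked, and that can differ from the headers.

```c
/* Stand-ins for the pcre.h macros shown above; real code would
   #include <pcre.h> instead. */
#ifndef PCRE_MAJOR
#define PCRE_MAJOR 7
#define PCRE_MINOR 5
#endif

/* Hypothetical helper: nonzero if the headers are at least major.minor. */
#define PCRE_AT_LEAST(major, minor) \
  (PCRE_MAJOR > (major) || (PCRE_MAJOR == (major) && PCRE_MINOR >= (minor)))

int headers_have_75_fixes(void)
{
#if PCRE_AT_LEAST(7, 5)
  return 1;
#else
  return 0;
#endif
}
```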
@@ -239,7 +239,7 @@ static const char error_texts[] =
/* 10 */
"operand of unlimited repeat could match the empty string\0" /** DEAD **/
"internal error: unexpected repeat\0"
"unrecognized character after (?\0"
"unrecognized character after (? or (?-\0"
"POSIX named classes are supported only within a class\0"
"missing )\0"
/* 15 */
@@ -298,7 +298,9 @@ static const char error_texts[] =
"(*VERB) with an argument is not supported\0"
/* 60 */
"(*VERB) not recognized\0"
"number is too big";
"number is too big\0"
"subpattern name expected\0"
"digit expected after (?+";


/* Table to identify digits and hex digits. This is used when compiling
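`error_texts` is a single string assembled from adjacent literals, so every message except the last needs an explicit `\0`. That is why the old final entry `"number is too big";` became `"number is too big\0"` when two messages were appended after it. Without the `\0`, the compiler would have merged it with `"subpattern name expected"`. Below is a minimal sketch of indexing a table in this style, using a cut-down table and a hypothetical `find_text()` function.

```c
#include <stddef.h>
#include <string.h>

/* Cut-down table in the error_texts style: adjacent literals are joined,
   so each entry but the last carries an explicit "\0"; the last relies on
   the terminator the compiler adds to every string literal. */
const char demo_texts[] =
  "no error\0"
  "number is too big\0"
  "subpattern name expected\0"
  "digit expected after (?+";

/* Hypothetical lookup: return entry n (n >= 0) of a table that is
   total_size bytes long (sizeof the array), or NULL if it has fewer
   than n+1 entries. */
const char *find_text(const char *texts, size_t total_size, int n)
{
  const char *p = texts;
  const char *end = texts + total_size;
  for (; n > 0; n--)
    {
    p += strlen(p) + 1;          /* step over this entry and its NUL */
    if (p >= end) return NULL;
    }
  return p;
}
```

If entry 1 were written without its `\0`, it would read `number is too bigsubpattern name expected`, and every later index would point one message too far.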
@@ -494,16 +496,16 @@ ptr--; /* Set pointer back to the last byte */

if (c == 0) *errorcodeptr = ERR1;

/* Non-alphamerics are literals. For digits or letters, do an initial lookup in
a table. A non-zero result is something that can be returned immediately.
/* Non-alphanumerics are literals. For digits or letters, do an initial lookup
in a table. A non-zero result is something that can be returned immediately.
Otherwise further processing may be required. */

#ifndef EBCDIC /* ASCII coding */
else if (c < '0' || c > 'z') {} /* Not alphameric */
else if (c < '0' || c > 'z') {} /* Not alphanumeric */
else if ((i = escapes[c - '0']) != 0) c = i;

#else /* EBCDIC coding */
else if (c < 'a' || (ebcdic_chartab[c] & 0x0E) == 0) {} /* Not alphameric */
else if (c < 'a' || (ebcdic_chartab[c] & 0x0E) == 0) {} /* Not alphanumeric */
else if ((i = escapes[c - 0x48]) != 0) c = i;
#endif
@@ -720,10 +722,10 @@ else
break;

/* PCRE_EXTRA enables extensions to Perl in the matter of escapes. Any
other alphameric following \ is an error if PCRE_EXTRA was set; otherwise,
for Perl compatibility, it is a literal. This code looks a bit odd, but
there used to be some cases other than the default, and there may be again
in future, so I haven't "optimized" it. */
other alphanumeric following \ is an error if PCRE_EXTRA was set;
otherwise, for Perl compatibility, it is a literal. This code looks a bit
odd, but there used to be some cases other than the default, and there may
be again in future, so I haven't "optimized" it. */

default:
if ((options & PCRE_EXTRA) != 0) switch(c)
@@ -1504,8 +1506,9 @@ for (;;)
can match the empty string or not. It is called from could_be_empty()
below and from compile_branch() when checking for an unlimited repeat of a
group that can match nothing. Note that first_significant_code() skips over
assertions. If we hit an unclosed bracket, we return "empty" - this means we've
struck an inner bracket whose current branch will already have been scanned.
backward and negative forward assertions when its final argument is TRUE. If we
hit an unclosed bracket, we return "empty" - this means we've struck an inner
bracket whose current branch will already have been scanned.

Arguments:
code points to start of search
@@ -1527,6 +1530,16 @@ for (code = first_significant_code(code + _pcre_OP_lengths[*code], NULL, 0, TRUE

c = *code;

/* Skip over forward assertions; the other assertions are skipped by
first_significant_code() with a TRUE final argument. */

if (c == OP_ASSERT)
{
do code += GET(code, 1); while (*code == OP_ALT);
c = *code;
continue;
}

/* Groups with zero repeats can of course be empty; skip them. */

if (c == OP_BRAZERO || c == OP_BRAMINZERO)
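The forward-assertion skip depends on a link after each branch opcode that points to the next `OP_ALT`, or to the closing `OP_KET`. The toy model below shows what the `do ... while (*code == OP_ALT)` loop does. Its opcode values are invented, and it uses a one-byte link where PCRE uses `LINK_SIZE` bytes. The skip matters because a lookahead consumes no characters. In `(?=something)(?R)`, counting "something" as non-empty hid the potentially infinite recursion described in ChangeLog item 15.

```c
#include <stddef.h>

/* Toy opcodes; the values are invented, not PCRE's. */
enum { T_END, T_CHAR, T_ALT, T_KET, T_ASSERT };

/* Toy encoding: T_ASSERT and T_ALT are followed by a one-byte forward link
   to the next T_ALT or T_KET of the same group; T_KET is followed by one
   byte that is ignored here; T_CHAR is followed by its literal byte. */

/* Given the index of a T_ASSERT, return the index of the first opcode
   after the whole assertion, mirroring
   "do code += GET(code, 1); while (*code == OP_ALT);". */
size_t skip_assertion(const unsigned char *code, size_t i)
{
  do i += code[i + 1]; while (code[i] == T_ALT);
  return i + 2;   /* code[i] is the closing T_KET; step over it and its byte */
}
```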
@@ -1722,29 +1735,48 @@ return TRUE;
*************************************************/

/* This function is called when the sequence "[:" or "[." or "[=" is
encountered in a character class. It checks whether this is followed by an
optional ^ and then a sequence of letters, terminated by a matching ":]" or
".]" or "=]".
encountered in a character class. It checks whether this is followed by a
sequence of characters terminated by a matching ":]" or ".]" or "=]". If we
reach an unescaped ']' without the special preceding character, return FALSE.

Argument:
Originally, this function only recognized a sequence of letters between the
terminators, but it seems that Perl recognizes any sequence of characters,
though of course unknown POSIX names are subsequently rejected. Perl gives an
"Unknown POSIX class" error for [:f\oo:] for example, where previously PCRE
didn't consider this to be a POSIX class. Likewise for [:1234:].

The problem in trying to be exactly like Perl is in the handling of escapes. We
have to be sure that [abc[:x\]pqr] is *not* treated as containing a POSIX
class, but [abc[:x\]pqr:]] is (so that an error can be generated). The code
below handles the special case of \], but does not try to do any other escape
processing. This makes it different from Perl for cases such as [:l\ower:]
where Perl recognizes it as the POSIX class "lower" but PCRE does not recognize
"l\ower". This is a lesser evil that not diagnosing bad classes when Perl does,
I think.

Arguments:
ptr pointer to the initial [
endptr where to return the end pointer
cd pointer to compile data

Returns: TRUE or FALSE
*/

static BOOL
check_posix_syntax(const uschar *ptr, const uschar **endptr, compile_data *cd)
check_posix_syntax(const uschar *ptr, const uschar **endptr)
{
int terminator; /* Don't combine these lines; the Solaris cc */
terminator = *(++ptr); /* compiler warns about "non-constant" initializer. */
if (*(++ptr) == '^') ptr++;
while ((cd->ctypes[*ptr] & ctype_letter) != 0) ptr++;
if (*ptr == terminator && ptr[1] == ']')
for (++ptr; *ptr != 0; ptr++)
{
*endptr = ptr;
return TRUE;
if (*ptr == '\\' && ptr[1] == ']') ptr++; else
{
if (*ptr == ']') return FALSE;
if (*ptr == terminator && ptr[1] == ']')
{
*endptr = ptr;
return TRUE;
}
}
}
return FALSE;
}
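The rewritten `check_posix_syntax()` no longer accepts only letters. It scans any characters up to `:]`, `.]` or `=]`, gives up at an unescaped `]`, and treats `\]` as part of the name. Below is a standalone restatement on a NUL-terminated `char` string, so the comment's examples can be checked directly. The name `posix_syntax()` is hypothetical, and the function assumes the caller has already checked that the character after `[` is `:`, `.` or `=`.

```c
#include <stddef.h>

/* Restatement of the new scanning logic. ptr points at the '[' that
   precedes ':', '.' or '='. Returns 1 and sets *endptr to the terminator
   character if a matching ":]" (etc.) is found, otherwise 0. */
int posix_syntax(const char *ptr, const char **endptr)
{
  int terminator = *(++ptr);                   /* ':', '.' or '=' */
  for (++ptr; *ptr != 0; ptr++)
    {
    if (*ptr == '\\' && ptr[1] == ']') ptr++;  /* "\]" is part of the name */
    else
      {
      if (*ptr == ']') return 0;               /* unescaped ] ends the scan */
      if (*ptr == terminator && ptr[1] == ']')
        {
        *endptr = ptr;
        return 1;
        }
      }
    }
  return 0;
}
```

With this logic, `[:1234:]` and `[:f\oo:]` are recognised as POSIX syntax, so the caller can report them as unknown classes, as Perl does. `[:x\]pqr]` is still not treated as a POSIX class.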
@@ -2381,6 +2413,7 @@ req_caseopt = ((options & PCRE_CASELESS) != 0)? REQ_CASELESS : 0;
for (;; ptr++)
{
BOOL negate_class;
BOOL should_flip_negation;
BOOL possessive_quantifier;
BOOL is_quantifier;
BOOL is_recurse;
@@ -2604,7 +2637,7 @@ for (;; ptr++)
they are encountered at the top level, so we'll do that too. */

if ((ptr[1] == ':' || ptr[1] == '.' || ptr[1] == '=') &&
check_posix_syntax(ptr, &tempptr, cd))
check_posix_syntax(ptr, &tempptr))
{
*errorcodeptr = (ptr[1] == ':')? ERR13 : ERR31;
goto FAILED;
@@ -2629,6 +2662,12 @@ for (;; ptr++)
else break;
}

/* If a class contains a negative special such as \S, we need to flip the
negation flag at the end, so that support for characters > 255 works
correctly (they are all included in the class). */

should_flip_negation = FALSE;

/* Keep a count of chars with values < 256 so that we can optimize the case
of just a single character (as long as it's < 256). However, For higher
valued UTF-8 characters, we don't yet do any optimization. */
@@ -2684,7 +2723,7 @@ for (;; ptr++)

if (c == '[' &&
(ptr[1] == ':' || ptr[1] == '.' || ptr[1] == '=') &&
check_posix_syntax(ptr, &tempptr, cd))
check_posix_syntax(ptr, &tempptr))
{
BOOL local_negate = FALSE;
int posix_class, taboffset, tabopt;
@@ -2701,6 +2740,7 @@ for (;; ptr++)
if (*ptr == '^')
{
local_negate = TRUE;
should_flip_negation = TRUE; /* Note negative special */
ptr++;
}
@@ -2775,7 +2815,7 @@ for (;; ptr++)
c = check_escape(&ptr, errorcodeptr, cd->bracount, options, TRUE);
if (*errorcodeptr != 0) goto FAILED;

if (-c == ESC_b) c = '\b'; /* \b is backslash in a class */
if (-c == ESC_b) c = '\b'; /* \b is backspace in a class */
else if (-c == ESC_X) c = 'X'; /* \X is literal X in a class */
else if (-c == ESC_R) c = 'R'; /* \R is literal R in a class */
else if (-c == ESC_Q) /* Handle start of quoted string */
@@ -2803,6 +2843,7 @@ for (;; ptr++)
continue;

case ESC_D:
should_flip_negation = TRUE;
for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_digit];
continue;
@@ -2811,6 +2852,7 @@ for (;; ptr++)
continue;

case ESC_W:
should_flip_negation = TRUE;
for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_word];
continue;
@@ -2820,13 +2862,11 @@ for (;; ptr++)
continue;

case ESC_S:
should_flip_negation = TRUE;
for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_space];
classbits[1] |= 0x08; /* Perl 5.004 onwards omits VT from \s */
continue;

case ESC_E: /* Perl ignores an orphan \E */
continue;

default: /* Not recognized; fall through */
break; /* Need "default" setting to stop compiler warning. */
}
@@ -3061,7 +3101,7 @@ for (;; ptr++)
d = check_escape(&ptr, errorcodeptr, cd->bracount, options, TRUE);
if (*errorcodeptr != 0) goto FAILED;

/* \b is backslash; \X is literal X; \R is literal R; any other
/* \b is backspace; \X is literal X; \R is literal R; any other
special means the '-' was literal */

if (d < 0)
@@ -3325,11 +3365,14 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
zeroreqbyte = reqbyte;

/* If there are characters with values > 255, we have to compile an
extended class, with its own opcode. If there are no characters < 256,
we can omit the bitmap in the actual compiled code. */
extended class, with its own opcode, unless there was a negated special
such as \S in the class, because in that case all characters > 255 are in
the class, so any that were explicitly given as well can be ignored. If
(when there are explicit characters > 255 that must be listed) there are no
characters < 256, we can omit the bitmap in the actual compiled code. */

#ifdef SUPPORT_UTF8
if (class_utf8)
if (class_utf8 && !should_flip_negation)
{
*class_utf8data++ = XCL_END; /* Marks the end of extra data */
*code++ = OP_XCLASS;
@@ -3355,20 +3398,19 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
}
#endif

/* If there are no characters > 255, negate the 32-byte map if necessary,
and copy it into the code vector. If this is the first thing in the branch,
there can be no first char setting, whatever the repeat count. Any reqbyte
setting must remain unchanged after any kind of repeat. */
/* If there are no characters > 255, set the opcode to OP_CLASS or
OP_NCLASS, depending on whether the whole class was negated and whether
there were negative specials such as \S in the class. Then copy the 32-byte
map into the code vector, negating it if necessary. */

*code++ = (negate_class == should_flip_negation) ? OP_CLASS : OP_NCLASS;
if (negate_class)
{
*code++ = OP_NCLASS;
if (lengthptr == NULL) /* Save time in the pre-compile phase */
for (c = 0; c < 32; c++) code[c] = ~classbits[c];
}
else
{
*code++ = OP_CLASS;
memcpy(code, classbits, 32);
}
code += 32;
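The expression `(negate_class == should_flip_negation) ? OP_CLASS : OP_NCLASS` makes sense once you note what each part covers. The 32-byte bitmap handles code points 0 to 255. `OP_CLASS` matches nothing above 255, and `OP_NCLASS` matches everything above 255. The toy model below uses hypothetical types and ignores `OP_XCLASS`. It illustrates why `[\S]` must become `OP_NCLASS` with the bitmap as built, while `[^\S]` must become `OP_CLASS` with the bitmap inverted.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model, not PCRE's data structures. map covers code points 0-255;
   ncls records OP_NCLASS behaviour (every code point above 255 matches)
   versus OP_CLASS behaviour (none does). */
typedef struct {
  bool ncls;
  unsigned char map[32];
} toy_class;

/* Mirror of the new finishing logic: pick the opcode from the two flags,
   then copy the bitmap, inverting it only when the class was negated. */
toy_class finish_class(const unsigned char classbits[32],
                       bool negate_class, bool should_flip_negation)
{
  toy_class k;
  k.ncls = (negate_class != should_flip_negation);   /* OP_NCLASS case */
  if (negate_class)
    for (int i = 0; i < 32; i++) k.map[i] = (unsigned char)~classbits[i];
  else
    memcpy(k.map, classbits, 32);
  return k;
}

bool toy_match(const toy_class *k, unsigned long c)
{
  if (c > 255) return k->ncls;
  return (k->map[c / 8] & (1u << (c % 8))) != 0;
}
```

For example, U+4E00 matches `[\S]` and does not match `[^\S]`. Before this change, characters above 255 were wrongly excluded from `[\S]` (ChangeLog item 2).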
@@ -4004,7 +4046,9 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
int len;
if (*tempcode == OP_EXACT || *tempcode == OP_TYPEEXACT ||
*tempcode == OP_NOTEXACT)
tempcode += _pcre_OP_lengths[*tempcode];
tempcode += _pcre_OP_lengths[*tempcode] +
((*tempcode == OP_TYPEEXACT &&
(tempcode[3] == OP_PROP || tempcode[3] == OP_NOTPROP))? 2:0);
len = code - tempcode;
if (len > 0) switch (*tempcode)
{
@@ -4231,16 +4275,13 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
*errorcodeptr = ERR58;
goto FAILED;
}
if (refsign == '-')
recno = (refsign == '-')?
cd->bracount - recno + 1 : recno +cd->bracount;
if (recno <= 0 || recno > cd->final_bracount)
{
recno = cd->bracount - recno + 1;
if (recno <= 0)
{
*errorcodeptr = ERR15;
goto FAILED;
}
*errorcodeptr = ERR15;
goto FAILED;
}
else recno += cd->bracount;
PUT2(code, 2+LINK_SIZE, recno);
break;
}
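The arithmetic for relative condition references such as `(?(-1)...)` and `(?(+1)...)`, together with the new upper bound, can be isolated as below. `resolve_relative()` is a hypothetical helper. `bracount` is the number of capturing groups opened so far, and `final_bracount` is the total saved after the first pass, as added to `compile_data`. A return value of -1 stands for PCRE's ERR15 (reference to a non-existent subpattern).

```c
/* Hypothetical restatement of the new relative-reference check:
   '-' counts back from the groups opened so far, '+' counts forward.
   Returns the absolute group number, or -1 where PCRE sets ERR15. */
int resolve_relative(char refsign, int n, int bracount, int final_bracount)
{
  int recno = (refsign == '-') ? bracount - n + 1 : n + bracount;
  if (recno <= 0 || recno > final_bracount) return -1;
  return recno;
}
```

Consider `(a)(?(-1)b|c)`. At the condition, `bracount` is 1, so `-1` resolves to group 1. A `(?(+1)...)` in the same position refers to group 2. Only `final_bracount` can show whether group 2 exists, because it may be defined later in the pattern.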
@@ -4312,9 +4353,10 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
skipbytes = 1;
}

/* Check for the "name" actually being a subpattern number. */
/* Check for the "name" actually being a subpattern number. We are
in the second pass here, so final_bracount is set. */

else if (recno > 0)
else if (recno > 0 && recno <= cd->final_bracount)
{
PUT2(code, 2+LINK_SIZE, recno);
}
@@ -4508,7 +4550,9 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */

/* We come here from the Python syntax above that handles both
references (?P=name) and recursion (?P>name), as well as falling
through from the Perl recursion syntax (?&name). */
through from the Perl recursion syntax (?&name). We also come here from
the Perl \k<name> or \k'name' back reference syntax and the \k{name}
.NET syntax. */

NAMED_REF_OR_RECURSE:
name = ++ptr;
@@ -4520,6 +4564,11 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */

if (lengthptr != NULL)
{
if (namelen == 0)
{
*errorcodeptr = ERR62;
goto FAILED;
}
if (*ptr != terminator)
{
*errorcodeptr = ERR42;
@@ -4533,14 +4582,19 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
recno = 0;
}

/* In the real compile, seek the name in the table */
/* In the real compile, seek the name in the table. We check the name
first, and then check that we have reached the end of the name in the
table. That way, if the name that is longer than any in the table,
the comparison will fail without reading beyond the table entry. */

else
{
slot = cd->name_table;
for (i = 0; i < cd->names_found; i++)
{
if (strncmp((char *)name, (char *)slot+2, namelen) == 0) break;
if (strncmp((char *)name, (char *)slot+2, namelen) == 0 &&
slot[2+namelen] == 0)
break;
slot += cd->name_entry_size;
}
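The length check added to the name lookup (ChangeLog item 5) can be demonstrated with a toy table in the layout the code above implies. Each entry holds a 2-byte group number followed by the NUL-terminated name, padded to a fixed entry size. `find_named_group()` and the table contents are illustrative assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Toy lookup in the style of the new loop. name need not be
   NUL-terminated; only namelen bytes of it are compared. Returns the
   group number, or -1 if no entry has exactly this name. */
int find_named_group(const unsigned char *table, int count, int entry_size,
                     const char *name, size_t namelen)
{
  const unsigned char *slot = table;
  for (int i = 0; i < count; i++, slot += entry_size)
    {
    /* strncmp stops at the entry's NUL, so a longer name fails here
       before slot[2 + namelen] is ever read. */
    if (strncmp(name, (const char *)slot + 2, namelen) == 0 &&
        slot[2 + namelen] == 0)
      return (slot[0] << 8) | slot[1];
    }
  return -1;
}
```

Without the `slot[2 + namelen] == 0` test, looking up `a` would return the first group whose name merely starts with "a". That is the old `(?&a)` behaviour.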
@@ -4577,7 +4631,15 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
{
const uschar *called;

if ((refsign = *ptr) == '+') ptr++;
if ((refsign = *ptr) == '+')
{
ptr++;
if ((digitab[*ptr] & ctype_digit) == 0)
{
*errorcodeptr = ERR63;
goto FAILED;
}
}
else if (refsign == '-')
{
if ((digitab[ptr[1]] & ctype_digit) == 0)
@@ -5904,7 +5966,7 @@ to compile parts of the pattern into; the compiled code is discarded when it is
no longer needed, so hopefully this workspace will never overflow, though there
is a test for its doing so. */

cd->bracount = 0;
cd->bracount = cd->final_bracount = 0;
cd->names_found = 0;
cd->name_entry_size = 0;
cd->name_table = NULL;
@@ -5981,6 +6043,7 @@ field. Reset the bracket count and the names_found field. Also reset the hwm
field; this time it's used for remembering forward references to subpatterns.
*/

cd->final_bracount = cd->bracount; /* Save for checking forward references */
cd->bracount = 0;
cd->names_found = 0;
cd->name_table = (uschar *)re + re->name_table_offset;
@@ -4668,10 +4668,10 @@ for(;;)
if (first_byte_caseless)
while (start_match < end_subject &&
md->lcc[*start_match] != first_byte)
start_match++;
{ NEXTCHAR(start_match); }
else
while (start_match < end_subject && *start_match != first_byte)
start_match++;
{ NEXTCHAR(start_match); }
}

/* Or to just after a linebreak for a multiline match if possible */
@@ -4681,7 +4681,7 @@ for(;;)
if (start_match > md->start_subject + start_offset)
{
while (start_match <= end_subject && !WAS_NEWLINE(start_match))
start_match++;
{ NEXTCHAR(start_match); }

/* If we have just passed a CR and the newline option is ANY or ANYCRLF,
and we are now at a LF, advance the match position by one more character.
@@ -4702,7 +4702,9 @@ for(;;)
while (start_match < end_subject)
{
register unsigned int c = *start_match;
if ((start_bits[c/8] & (1 << (c&7))) == 0) start_match++; else break;
if ((start_bits[c/8] & (1 << (c&7))) == 0)
{ NEXTCHAR(start_match); }
else break;
}
}
@@ -363,6 +363,7 @@ never be called in byte mode. To make sure it can never even appear when UTF-8
support is omitted, we don't even define it. */

#ifndef SUPPORT_UTF8
#define NEXTCHAR(p) p++;
#define GETCHAR(c, eptr) c = *eptr;
#define GETCHARTEST(c, eptr) c = *eptr;
#define GETCHARINC(c, eptr) c = *eptr++;
@ -372,6 +373,13 @@ support is omitted, we don't even define it. */
|
||||
|
||||
#else /* SUPPORT_UTF8 */
|
||||
|
||||
/* Advance a character pointer one byte in non-UTF-8 mode and by one character
|
||||
in UTF-8 mode. */
|
||||
|
||||
#define NEXTCHAR(p) \
|
||||
p++; \
|
||||
if (utf8) { while((*p & 0xc0) == 0x80) p++; }
|
||||
|
||||
/* Get the next UTF-8 character, not advancing the pointer. This is called when
|
||||
we know we are in UTF-8 mode. */
|
||||
|
||||
@ -871,7 +879,7 @@ enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9,
|
||||
ERR30, ERR31, ERR32, ERR33, ERR34, ERR35, ERR36, ERR37, ERR38, ERR39,
|
||||
ERR40, ERR41, ERR42, ERR43, ERR44, ERR45, ERR46, ERR47, ERR48, ERR49,
|
||||
ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
|
||||
ERR60, ERR61 };
|
||||
ERR60, ERR61, ERR62, ERR63 };
|
||||
|
||||
/* The real format of the start of the pcre block; the index of names and the
|
||||
code vector run on as long as necessary after the end. We store an explicit
|
||||
@ -934,7 +942,8 @@ typedef struct compile_data {
|
||||
uschar *name_table; /* The name/number table */
|
||||
int names_found; /* Number of entries so far */
|
||||
int name_entry_size; /* Size of each entry */
|
||||
int bracount; /* Count of capturing parens */
|
||||
int bracount; /* Count of capturing parens as we compile */
|
||||
int final_bracount; /* Saved value after first pass */
|
||||
int top_backref; /* Maximum back reference */
|
||||
unsigned int backref_map; /* Bitmap of low back refs */
|
||||
int external_options; /* External (initial) options */
|
||||
@ -1036,7 +1045,7 @@ typedef struct dfa_match_data {
|
||||
#define ctype_letter 0x02
|
||||
#define ctype_digit 0x04
|
||||
#define ctype_xdigit 0x08
|
||||
#define ctype_word 0x10 /* alphameric or '_' */
|
||||
#define ctype_word 0x10 /* alphanumeric or '_' */
|
||||
#define ctype_meta 0x80 /* regexp meta char or zero (end pattern) */
|
||||
|
||||
/* Offsets for the bitmap tables in pcre_cbits. Each table contains a set
|
||||
|
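pcre_internal.h describes each byte value with a set of `ctype_*` bit flags, so a single table lookup can answer a question such as "is this a word character?". The sketch below builds a table of that shape from `<ctype.h>`. The `CT_*` names are hypothetical stand-ins; the value 0x01 for white space is an assumption because that define is not part of this hunk, and the metacharacter list is illustrative.

```c
#include <ctype.h>
#include <string.h>

/* Stand-ins for the ctype_* flags; CT_SPACE = 0x01 is assumed. */
enum {
    CT_SPACE  = 0x01,
    CT_LETTER = 0x02,
    CT_DIGIT  = 0x04,
    CT_XDIGIT = 0x08,
    CT_WORD   = 0x10,   /* alphanumeric or '_' */
    CT_META   = 0x80    /* regex metacharacter or zero (end of pattern) */
};

static unsigned char ctype_table[256];

/* Fill ctype_table using the current locale ("C" unless setlocale ran). */
static void build_ctype_table(void)
{
    static const char meta[] = "\\*+?{^.$|()[";
    for (int c = 0; c < 256; c++) {
        unsigned char f = 0;
        if (isspace(c))  f |= CT_SPACE;
        if (isalpha(c))  f |= CT_LETTER;
        if (isdigit(c))  f |= CT_DIGIT;
        if (isxdigit(c)) f |= CT_XDIGIT;
        if (isalnum(c) || c == '_') f |= CT_WORD;
        if (c == 0 || strchr(meta, c) != NULL) f |= CT_META;
        ctype_table[c] = f;
    }
}
```

Packing the properties as bits keeps the table at one byte per character, and a test such as `ctype_table[c] & CT_WORD` compiles to a load and a mask.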
@@ -60,7 +60,7 @@ an invalid string are then undefined.
Originally, this function checked according to RFC 2279, allowing for values in
the range 0 to 0x7fffffff, up to 6 bytes long, but ensuring that they were in
the canonical format. Once somebody had pointed out RFC 3629 to me (it
obsoletes 2279), additional restrictions were applies. The values are now
obsoletes 2279), additional restrictions were applied. The values are now
limited to be between 0 and 0x0010ffff, no more than 4 bytes long, and the
subrange 0xd000 to 0xdfff is excluded.


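The comment above summarizes the RFC 3629 rules: code points up to 0x10ffff, at most 4 bytes, canonical (shortest) encodings only, and no UTF-16 surrogates. The surrogates occupy U+D800 to U+DFFF, so the comment's lower bound of 0xd000 looks like a typo for 0xd800. Below is a minimal sketch of a checker that enforces these rules. `utf8_check()` is a hypothetical function and does not reproduce the interface of PCRE's validator, which is not shown in this diff.

```c
#include <stddef.h>

/* Hypothetical RFC 3629 checker: returns -1 if s[0..len) is valid UTF-8,
   otherwise the offset of the first byte of the offending character. */
static long utf8_check(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;              /* number of continuation bytes */
        unsigned long cp;          /* decoded code point */

        if (c < 0x80) { i++; continue; }                 /* ASCII */
        if (c >= 0xc2 && c <= 0xdf)      { extra = 1; cp = c & 0x1f; }
        else if (c >= 0xe0 && c <= 0xef) { extra = 2; cp = c & 0x0f; }
        else if (c >= 0xf0 && c <= 0xf4) { extra = 3; cp = c & 0x07; }
        else return (long)i;   /* stray continuation, 0xc0/0xc1, or > 0xf4 */

        if (len - i <= extra) return (long)i;            /* truncated */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xc0) != 0x80) return (long)i;
            cp = (cp << 6) | (s[i + k] & 0x3f);
        }
        if ((extra == 2 && cp < 0x800) ||                /* overlong */
            (extra == 3 && cp < 0x10000) ||              /* overlong */
            cp > 0x10ffff ||                             /* beyond Unicode */
            (cp >= 0xd800 && cp <= 0xdfff))              /* surrogate */
            return (long)i;
        i += extra + 1;
    }
    return -1;
}
```

Rejecting lead bytes 0xc0, 0xc1 and 0xf5 to 0xff up front removes the 2-byte overlong forms and everything longer than 4 bytes, so only the 3- and 4-byte overlong cases need a check after decoding.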
File diff suppressed because it is too large
@@ -122,7 +122,9 @@ static const int eint[] = {
REG_INVARG, /* inconsistent NEWLINE options */
REG_BADPAT, /* \g is not followed followed by an (optionally braced) non-zero number */
REG_BADPAT, /* (?+ or (?- must be followed by a non-zero number */
REG_BADPAT /* number is too big */
REG_BADPAT, /* number is too big */
REG_BADPAT, /* subpattern name expected */
REG_BADPAT /* digit expected after (?+ */
};

/* Table of texts corresponding to POSIX error codes */

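The pcreposix.c `eint[]` table maps each compile error number to a POSIX code, so it has to grow in step with the `ERR` enum in pcre_internal.h, which gains ERR62 and ERR63 in this commit. The old last entry also lacked a trailing comma, which is why that line had to change when entries were appended. A compile-time size check is one way to keep such parallel tables in sync. The sketch uses C11 `_Static_assert` with hypothetical names and stand-in code values; nothing in this diff indicates that PCRE does this.

```c
/* Stand-in codes; these are NOT the real REG_* values. */
enum { X_REG_OK = 0, X_REG_BADPAT = 2, X_REG_INVARG = 16 };

/* Hypothetical error enum; ERRCOUNT is a sentinel that must stay last. */
enum { ERR_NONE, ERR_NEWLINE, ERR_BAD_G, ERR_NAME_EXPECTED, ERRCOUNT };

static const int posix_code[] = {
    X_REG_OK,      /* ERR_NONE */
    X_REG_INVARG,  /* ERR_NEWLINE: inconsistent NEWLINE options */
    X_REG_BADPAT,  /* ERR_BAD_G */
    X_REG_BADPAT,  /* ERR_NAME_EXPECTED; the trailing comma eases appending */
};

/* Compilation fails if an error is added without a matching entry. */
_Static_assert(sizeof posix_code / sizeof posix_code[0] == ERRCOUNT,
               "posix_code[] is out of sync with the error enum");

/* Map an internal error number to a POSIX-style code. */
static int to_posix(int err)
{
    return (err >= 0 && err < ERRCOUNT) ? posix_code[err] : X_REG_BADPAT;
}
```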
15
ext/pcre/pcrelib/testdata/grepoutput
vendored
@@ -358,10 +358,13 @@ after the binary zero
./testdata/grepinput:597:after the binary zero
---------------------------- Test 42 ------------------------------
595:before
595:zero
596:zero
597:after
597:zero
---------------------------- Test 43 ------------------------------
595:before
595:zero
596:zero
597:zero
---------------------------- Test 44 ------------------------------
@@ -387,3 +390,15 @@ PUT NEW DATA ABOVE THIS LINE.
over the lazy dog.
---------------------------- Test 51 ------------------------------
fox [1;31mjumps[00m
---------------------------- Test 52 ------------------------------
36972,6
36990,4
37024,4
37066,5
37083,4
---------------------------- Test 53 ------------------------------
595:15,6
595:33,4
596:28,4
597:15,5
597:32,4

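The new grepoutput Tests 52 and 53 report each match as an `offset,length` pair: first counted from the start of the file, then as `line:offset,length` relative to the start of the line. The two forms are related by counting newlines up to the match, as in this hypothetical helper, which assumes `\n` line endings.

```c
#include <stddef.h>

/* Hypothetical helper: convert an absolute byte offset into a 1-based line
   number and a 0-based byte offset within that line ('\n' ends a line). */
static void offset_to_line(const char *buf, size_t off,
                           unsigned long *line, size_t *col)
{
    unsigned long ln = 1;
    size_t line_start = 0;
    for (size_t i = 0; i < off; i++) {
        if (buf[i] == '\n') {
            ln++;
            line_start = i + 1;
        }
    }
    *line = ln;
    *col = off - line_start;
}
```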
16
ext/pcre/pcrelib/testdata/testinput1
vendored
@@ -3421,11 +3421,6 @@
/((?m)^b)/
a\nb\nc\n

/(?(1)a|b)/

/(?(1)b|a)/
a

/(x)?(?(1)a|b)/
*** Failers
a
@@ -4030,4 +4025,15 @@
/( (?(1)0|)* )/x
abcd

/[[:abcd:xyz]]/
a]
:]

/[abc[:x\]pqr]/
a
[
:
]
p

/ End of testinput1 /

80
ext/pcre/pcrelib/testdata/testinput2
vendored
@@ -398,8 +398,6 @@

/(?(1?)a|b)/

/(?(1)a|b|c)/

/[a[:xyz:/

/(?<=x+)y/
@@ -568,15 +566,15 @@

/ab\d+/I

/a(?(1)b)/I
/a(?(1)b)(.)/I

/a(?(1)bag|big)/I
/a(?(1)bag|big)(.)/I

/a(?(1)bag|big)*/I
/a(?(1)bag|big)*(.)/I

/a(?(1)bag|big)+/I
/a(?(1)bag|big)+(.)/I

/a(?(1)b..|b..)/I
/a(?(1)b..|b..)(.)/I

/ab\d{0}e/I

@@ -977,13 +975,13 @@

/()a/I

/(?(1)ab|ac)/I
/(?(1)ab|ac)(.)/I

/(?(1)abz|acz)/I
/(?(1)abz|acz)(.)/I

/(?(1)abz)/I
/(?(1)abz)(.)/I

/(?(1)abz)123/I
/(?(1)abz)(1)23/I

/(a)+/I

@@ -2190,8 +2188,8 @@ a random value. /Ix

/((?(-2)a))/BZ

/^(?(+1)X|Y)/BZ
Y
/^(?(+1)X|Y)(.)/BZ
Y!

/(foo)\Kbar/
foobar
@@ -2535,4 +2533,60 @@ a random value. /Ix

/(*CRLF)(*BSR_ANYCRLF)(*CR)ab/I

/(?<a>)(?&)/

/(?<abc>)(?&a)/

/(?<a>)(?&aaaaaaaaaaaaaaaaaaaaaaa)/

/(?+-a)/

/(?-+a)/

/(?(-1))/

/(?(+10))/

/(?(10))/

/(?(+2))()()/

/(?(2))()()/

/\k''/

/\k<>/

/\k{}/

/(?P=)/

/(?P>)/

/(?!\w)(?R)/

/(?=\w)(?R)/

/(?<!\w)(?R)/

/(?<=\w)(?R)/

/[[:foo:]]/

/[[:1234:]]/

/[[:f\oo:]]/

/[[: :]]/

/[[:...:]]/

/[[:l\ower:]]/

/[[:abc\:]]/

/[abc[:x\]pqr:]]/

/[[:a\dz:]]/

/ End of testinput2 /

72
ext/pcre/pcrelib/testdata/testinput4
vendored
@@ -535,4 +535,76 @@
/\W{2}/8g
+\x{a3}==

/\S/8g
\x{442}\x{435}\x{441}\x{442}

/[\S]/8g
\x{442}\x{435}\x{441}\x{442}

/\D/8g
\x{442}\x{435}\x{441}\x{442}

/[\D]/8g
\x{442}\x{435}\x{441}\x{442}

/\W/8g
\x{2442}\x{2435}\x{2441}\x{2442}

/[\W]/8g
\x{2442}\x{2435}\x{2441}\x{2442}

/[\S\s]*/8
abc\n\r\x{442}\x{435}\x{441}\x{442}xyz

/[\x{41f}\S]/8g
\x{442}\x{435}\x{441}\x{442}

/.[^\S]./8g
abc def\x{442}\x{443}xyz\npqr

/.[^\S\n]./8g
abc def\x{442}\x{443}xyz\npqr

/[[:^alnum:]]/8g
+\x{2442}

/[[:^alpha:]]/8g
+\x{2442}

/[[:^ascii:]]/8g
A\x{442}

/[[:^blank:]]/8g
A\x{442}

/[[:^cntrl:]]/8g
A\x{442}

/[[:^digit:]]/8g
A\x{442}

/[[:^graph:]]/8g
\x19\x{e01ff}

/[[:^lower:]]/8g
A\x{422}

/[[:^print:]]/8g
\x{19}\x{e01ff}

/[[:^punct:]]/8g
A\x{442}

/[[:^space:]]/8g
A\x{442}

/[[:^upper:]]/8g
a\x{442}

/[[:^word:]]/8g
+\x{2442}

/[[:^xdigit:]]/8g
M\x{442}

/ End of testinput4 /

8
ext/pcre/pcrelib/testdata/testinput5
vendored
@@ -453,4 +453,12 @@ can't tell the difference.) --/
a\x{85}b\<bsr_anycrlf>
a\x0bb\<bsr_anycrlf>

/.*a.*=.b.*/8<ANY>
QQQ\x{2029}ABCaXYZ=!bPQR
** Failers
a\x{2029}b
\x61\xe2\x80\xa9\x62

/[[:a\x{100}b:]]/8

/ End of testinput5 /

75
ext/pcre/pcrelib/testdata/testinput6
vendored
@@ -832,4 +832,79 @@ was broken in all cases./

/(\p{Yi}{0,3}+\277)*/

/^[\p{Arabic}]/8
\x{60e}
\x{656}
\x{657}
\x{658}
\x{659}
\x{65a}
\x{65b}
\x{65c}
\x{65d}
\x{65e}
\x{66a}
\x{6e9}
\x{6ef}
\x{6fa}
** Failers
\x{600}
\x{650}
\x{651}
\x{652}
\x{653}
\x{654}
\x{655}
\x{65f}

/^\p{Cyrillic}/8
\x{1d2b}

/^\p{Common}/8
\x{589}
\x{60c}
\x{61f}
\x{964}
\x{965}
\x{970}

/^\p{Inherited}/8
\x{64b}
\x{654}
\x{655}
\x{200c}
** Failers
\x{64a}
\x{656}

/^\p{Shavian}/8
\x{10450}
\x{1047f}

/^\p{Deseret}/8
\x{10400}
\x{1044f}

/^\p{Osmanya}/8
\x{10480}
\x{1049d}
\x{104a0}
\x{104a9}
** Failers
\x{1049e}
\x{1049f}
\x{104aa}

/\p{Zl}{2,3}+/8BZ
\xe2\x80\xa8\xe2\x80\xa8
\x{2028}\x{2028}\x{2028}

/\p{Zl}/8BZ

/\p{Lu}{3}+/8BZ

/\pL{2}+/8BZ

/\p{Cc}{2}+/8BZ

/ End of testinput6 /

24
ext/pcre/pcrelib/testdata/testoutput1
vendored
@@ -5551,12 +5551,6 @@ No match
0: b
1: b

/(?(1)a|b)/

/(?(1)b|a)/
a
0: a

/(x)?(?(1)a|b)/
*** Failers
No match
@@ -6593,4 +6587,22 @@ No match
0:
1:

/[[:abcd:xyz]]/
a]
0: a]
:]
0: :]

/[abc[:x\]pqr]/
a
0: a
[
0: [
:
0: :
]
0: ]
p
0: p

/ End of testinput1 /

161
ext/pcre/pcrelib/testdata/testoutput2
vendored
@@ -109,7 +109,7 @@ Failed: missing ) at offset 4
Failed: missing ) after comment at offset 7

/(?z)abc/
Failed: unrecognized character after (? at offset 2
Failed: unrecognized character after (? or (?- at offset 2

/.*b/I
Capturing subpattern count = 0
@@ -310,7 +310,7 @@ No match
No match

/ab(?z)cd/
Failed: unrecognized character after (? at offset 4
Failed: unrecognized character after (? or (?- at offset 4

/^abc|def/I
Capturing subpattern count = 0
@@ -946,26 +946,23 @@ Failed: missing ) at offset 4
Failed: unrecognized character after (?< at offset 3

/a(?{)b/
Failed: unrecognized character after (? at offset 3
Failed: unrecognized character after (? or (?- at offset 3

/a(?{{})b/
Failed: unrecognized character after (? at offset 3
Failed: unrecognized character after (? or (?- at offset 3

/a(?{}})b/
Failed: unrecognized character after (? at offset 3
Failed: unrecognized character after (? or (?- at offset 3

/a(?{"{"})b/
Failed: unrecognized character after (? at offset 3
Failed: unrecognized character after (? or (?- at offset 3

/a(?{"{"}})b/
Failed: unrecognized character after (? at offset 3
Failed: unrecognized character after (? or (?- at offset 3

/(?(1?)a|b)/
Failed: malformed number or name after (?( at offset 4

/(?(1)a|b|c)/
Failed: conditional group contains more than two branches at offset 10

/[a[:xyz:/
Failed: missing terminating ] for character class at offset 8

@@ -1599,32 +1596,32 @@ No options
First char = 'a'
Need char = 'b'

/a(?(1)b)/I
Capturing subpattern count = 0
/a(?(1)b)(.)/I
Capturing subpattern count = 1
No options
First char = 'a'
No need char

/a(?(1)bag|big)/I
Capturing subpattern count = 0
/a(?(1)bag|big)(.)/I
Capturing subpattern count = 1
No options
First char = 'a'
Need char = 'g'

/a(?(1)bag|big)*/I
Capturing subpattern count = 0
/a(?(1)bag|big)*(.)/I
Capturing subpattern count = 1
No options
First char = 'a'
No need char

/a(?(1)bag|big)+/I
Capturing subpattern count = 0
/a(?(1)bag|big)+(.)/I
Capturing subpattern count = 1
No options
First char = 'a'
Need char = 'g'

/a(?(1)b..|b..)/I
Capturing subpattern count = 0
/a(?(1)b..|b..)(.)/I
Capturing subpattern count = 1
No options
First char = 'a'
Need char = 'b'
@@ -1905,7 +1902,7 @@ No need char
------------------------------------------------------------------
Bra
^
[\x00-/:-@[-`{-\xff]
[\x00-/:-@[-`{-\xff] (neg)
Ket
End
------------------------------------------------------------------
@@ -1931,7 +1928,7 @@ No need char
------------------------------------------------------------------
Bra
^
[\x00-@[-`{-\xff]
[\x00-@[-`{-\xff] (neg)
Ket
End
------------------------------------------------------------------
@@ -1965,7 +1962,7 @@ No need char
------------------------------------------------------------------
Bra
^
[\x80-\xff]
[\x80-\xff] (neg)
Ket
End
------------------------------------------------------------------
@@ -1991,7 +1988,7 @@ No need char
------------------------------------------------------------------
Bra
^
[\x00-\x08\x0a-\x1f!-\xff]
[\x00-\x08\x0a-\x1f!-\xff] (neg)
Ket
End
------------------------------------------------------------------
@@ -2142,7 +2139,7 @@ No need char
------------------------------------------------------------------
Bra
^
[ -~\x80-\xff]
[ -~\x80-\xff] (neg)
Ket
End
------------------------------------------------------------------
@@ -2155,7 +2152,7 @@ No need char
------------------------------------------------------------------
Bra
^
[\x00-/12:-\xff]
[\x00-/12:-\xff] (neg)
Ket
End
------------------------------------------------------------------
@@ -2168,7 +2165,7 @@ No need char
------------------------------------------------------------------
Bra
^
[\x00-\x08\x0a-\x1f!-\xff]
[\x00-\x08\x0a-\x1f!-\xff] (neg)
Ket
End
------------------------------------------------------------------
@@ -2736,7 +2733,7 @@ No need char
/[\S]/DZ
------------------------------------------------------------------
Bra
[\x00-\x08\x0b\x0e-\x1f!-\xff]
[\x00-\x08\x0b\x0e-\x1f!-\xff] (neg)
Ket
End
------------------------------------------------------------------
@@ -3441,26 +3438,26 @@ No options
No first char
Need char = 'a'

/(?(1)ab|ac)/I
Capturing subpattern count = 0
/(?(1)ab|ac)(.)/I
Capturing subpattern count = 1
No options
First char = 'a'
No need char

/(?(1)abz|acz)/I
Capturing subpattern count = 0
/(?(1)abz|acz)(.)/I
Capturing subpattern count = 1
No options
First char = 'a'
Need char = 'z'

/(?(1)abz)/I
Capturing subpattern count = 0
/(?(1)abz)(.)/I
Capturing subpattern count = 1
No options
No first char
No need char

/(?(1)abz)123/I
Capturing subpattern count = 0
/(?(1)abz)(1)23/I
Capturing subpattern count = 1
No options
No first char
Need char = '3'
@@ -8308,7 +8305,7 @@ Failed: reference to non-existent subpattern at offset 6
/((?(-2)a))/BZ
Failed: reference to non-existent subpattern at offset 7

/^(?(+1)X|Y)/BZ
/^(?(+1)X|Y)(.)/BZ
------------------------------------------------------------------
Bra
^
@@ -8318,11 +8315,15 @@ Failed: reference to non-existent subpattern at offset 7
Alt
Y
Ket
CBra 1
Any
Ket
Ket
End
------------------------------------------------------------------
Y
0: Y
Y!
0: Y!
1: !

/(foo)\Kbar/
foobar
@@ -9302,4 +9303,86 @@ Forced newline sequence: CR
First char = 'a'
Need char = 'b'

/(?<a>)(?&)/
Failed: subpattern name expected at offset 9

/(?<abc>)(?&a)/
Failed: reference to non-existent subpattern at offset 12

/(?<a>)(?&aaaaaaaaaaaaaaaaaaaaaaa)/
Failed: reference to non-existent subpattern at offset 32

/(?+-a)/
Failed: digit expected after (?+ at offset 3

/(?-+a)/
Failed: unrecognized character after (? or (?- at offset 3

/(?(-1))/
Failed: reference to non-existent subpattern at offset 6

/(?(+10))/
Failed: reference to non-existent subpattern at offset 7

/(?(10))/
Failed: reference to non-existent subpattern at offset 6

/(?(+2))()()/

/(?(2))()()/

/\k''/
Failed: subpattern name expected at offset 3

/\k<>/
Failed: subpattern name expected at offset 3

/\k{}/
Failed: subpattern name expected at offset 3

/(?P=)/
Failed: subpattern name expected at offset 4

/(?P>)/
Failed: subpattern name expected at offset 4

/(?!\w)(?R)/
Failed: recursive call could loop indefinitely at offset 9

/(?=\w)(?R)/
Failed: recursive call could loop indefinitely at offset 9

/(?<!\w)(?R)/
Failed: recursive call could loop indefinitely at offset 10

/(?<=\w)(?R)/
Failed: recursive call could loop indefinitely at offset 10

/[[:foo:]]/
Failed: unknown POSIX class name at offset 3

/[[:1234:]]/
Failed: unknown POSIX class name at offset 3

/[[:f\oo:]]/
Failed: unknown POSIX class name at offset 3

/[[: :]]/
Failed: unknown POSIX class name at offset 3

/[[:...:]]/
Failed: unknown POSIX class name at offset 3

/[[:l\ower:]]/
Failed: unknown POSIX class name at offset 3

/[[:abc\:]]/
Failed: unknown POSIX class name at offset 3

/[abc[:x\]pqr:]]/
Failed: unknown POSIX class name at offset 6

/[[:a\dz:]]/
Failed: unknown POSIX class name at offset 3

/ End of testinput2 /

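Several of the new testoutput2 failures come from the fixes to named references. `(?&)`, `\k''`, `\k<>`, `\k{}`, `(?P=)` and `(?P>)` now report "subpattern name expected", and `(?&a)` no longer resolves to a group called `abc` because the name comparison now also checks the length. A hypothetical lookup with both checks might look like the sketch below; the table layout is an assumption and not PCRE's name table format.

```c
#include <string.h>

struct name_entry {
    const char *name;   /* NUL-terminated group name */
    int group;          /* capturing group number */
};

enum { NAME_EMPTY = -2, NAME_NOT_FOUND = -1 };

/* Look up the name occupying name[0..len) in the pattern.
   Comparing only the first len bytes would let "a" match "abc",
   so the stored name must also be exactly len bytes long. */
static int find_named_group(const struct name_entry *tab, size_t n,
                            const char *name, size_t len)
{
    if (len == 0) return NAME_EMPTY;        /* "subpattern name expected" */
    for (size_t i = 0; i < n; i++) {
        if (strlen(tab[i].name) == len && memcmp(tab[i].name, name, len) == 0)
            return tab[i].group;
    }
    return NAME_NOT_FOUND;      /* "reference to non-existent subpattern" */
}
```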
131
ext/pcre/pcrelib/testdata/testoutput4
vendored
@@ -938,4 +938,135 @@ No match
0: +\x{a3}
0: ==

/\S/8g
\x{442}\x{435}\x{441}\x{442}
0: \x{442}
0: \x{435}
0: \x{441}
0: \x{442}

/[\S]/8g
\x{442}\x{435}\x{441}\x{442}
0: \x{442}
0: \x{435}
0: \x{441}
0: \x{442}

/\D/8g
\x{442}\x{435}\x{441}\x{442}
0: \x{442}
0: \x{435}
0: \x{441}
0: \x{442}

/[\D]/8g
\x{442}\x{435}\x{441}\x{442}
0: \x{442}
0: \x{435}
0: \x{441}
0: \x{442}

/\W/8g
\x{2442}\x{2435}\x{2441}\x{2442}
0: \x{2442}
0: \x{2435}
0: \x{2441}
0: \x{2442}

/[\W]/8g
\x{2442}\x{2435}\x{2441}\x{2442}
0: \x{2442}
0: \x{2435}
0: \x{2441}
0: \x{2442}

/[\S\s]*/8
abc\n\r\x{442}\x{435}\x{441}\x{442}xyz
0: abc\x{0a}\x{0d}\x{442}\x{435}\x{441}\x{442}xyz

/[\x{41f}\S]/8g
\x{442}\x{435}\x{441}\x{442}
0: \x{442}
0: \x{435}
0: \x{441}
0: \x{442}

/.[^\S]./8g
abc def\x{442}\x{443}xyz\npqr
0: c d
0: z\x{0a}p

/.[^\S\n]./8g
abc def\x{442}\x{443}xyz\npqr
0: c d

/[[:^alnum:]]/8g
+\x{2442}
0: +
0: \x{2442}

/[[:^alpha:]]/8g
+\x{2442}
0: +
0: \x{2442}

/[[:^ascii:]]/8g
A\x{442}
0: \x{442}

/[[:^blank:]]/8g
A\x{442}
0: A
0: \x{442}

/[[:^cntrl:]]/8g
A\x{442}
0: A
0: \x{442}

/[[:^digit:]]/8g
A\x{442}
0: A
0: \x{442}

/[[:^graph:]]/8g
\x19\x{e01ff}
0: \x{19}
0: \x{e01ff}

/[[:^lower:]]/8g
A\x{422}
0: A
0: \x{422}

/[[:^print:]]/8g
\x{19}\x{e01ff}
0: \x{19}
0: \x{e01ff}

/[[:^punct:]]/8g
A\x{442}
0: A
0: \x{442}

/[[:^space:]]/8g
A\x{442}
0: A
0: \x{442}

/[[:^upper:]]/8g
a\x{442}
0: a
0: \x{442}

/[[:^word:]]/8g
+\x{2442}
0: +
0: \x{2442}

/[[:^xdigit:]]/8g
M\x{442}
0: M
0: \x{442}

/ End of testinput4 /

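The testinput4/testoutput4 additions check the UTF-8 fix for negated types and negated POSIX classes inside character classes: `[\S]`, `[\W]` or `[[:^space:]]` must include characters above 255 (such as `\x{442}`) rather than exclude them. One way to picture this is a class made of a 256-bit map plus a single flag covering every larger code point; negation complements the map and flips the flag. The struct and functions below are a hypothetical sketch of that idea, not PCRE's internal class representation.

```c
#include <stdint.h>

/* Hypothetical class: a bitmap for code points 0-255 plus one flag
   saying whether every code point above 255 is a member. */
struct char_class {
    uint8_t bits[32];
    int above_255;
};

static void class_add(struct char_class *cls, unsigned c)
{
    cls->bits[c / 8] |= (uint8_t)(1u << (c % 8));
}

/* Build a negated set such as \S or [:^space:] from the positive set:
   complement the bitmap AND include everything above 255. */
static struct char_class class_negate(const struct char_class *pos)
{
    struct char_class neg;
    for (int i = 0; i < 32; i++) neg.bits[i] = (uint8_t)~pos->bits[i];
    neg.above_255 = !pos->above_255;
    return neg;
}

static int class_match(const struct char_class *cls, uint32_t c)
{
    if (c > 255) return cls->above_255;
    return (cls->bits[c / 8] >> (c % 8)) & 1;
}
```

Forgetting to flip `above_255` reproduces the old behaviour: the complemented bitmap is correct for bytes, but `\x{442}` is silently excluded.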
13
ext/pcre/pcrelib/testdata/testoutput5
vendored
@@ -1595,4 +1595,17 @@ No match
a\x0bb\<bsr_anycrlf>
No match

/.*a.*=.b.*/8<ANY>
QQQ\x{2029}ABCaXYZ=!bPQR
0: ABCaXYZ=!bPQR
** Failers
No match
a\x{2029}b
No match
\x61\xe2\x80\xa9\x62
No match

/[[:a\x{100}b:]]/8
Failed: unknown POSIX class name at offset 3

/ End of testinput5 /

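The new testinput5 case runs `/.*a.*=.b.*/` with the `<ANY>` newline setting. U+2029 (PARAGRAPH SEPARATOR) then counts as a line ending, so `.` cannot match it: the successful match starts just after it, and `a\x{2029}b` fails. Under this setting PCRE recognizes CR, LF, CRLF, VT, FF, NEL (U+0085), LS (U+2028) and PS (U+2029). Here is a hypothetical helper that measures such a newline in a buffer of already-decoded code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Length (in code points) of an "any Unicode newline" sequence starting
   at s[i], or 0 if s[i] does not start one. CR LF counts as one newline. */
static size_t any_newline_len(const uint32_t *s, size_t n, size_t i)
{
    if (i >= n) return 0;
    switch (s[i]) {
    case 0x0d:                                   /* CR, or CR LF */
        return (i + 1 < n && s[i + 1] == 0x0a) ? 2 : 1;
    case 0x0a: case 0x0b: case 0x0c:             /* LF, VT, FF */
    case 0x85: case 0x2028: case 0x2029:         /* NEL, LS, PS */
        return 1;
    default:
        return 0;
    }
}
```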
157
ext/pcre/pcrelib/testdata/testoutput6
vendored
@@ -1522,4 +1522,161 @@ No match

/(\p{Yi}{0,3}+\277)*/

/^[\p{Arabic}]/8
\x{60e}
0: \x{60e}
\x{656}
0: \x{656}
\x{657}
0: \x{657}
\x{658}
0: \x{658}
\x{659}
0: \x{659}
\x{65a}
0: \x{65a}
\x{65b}
0: \x{65b}
\x{65c}
0: \x{65c}
\x{65d}
0: \x{65d}
\x{65e}
0: \x{65e}
\x{66a}
0: \x{66a}
\x{6e9}
0: \x{6e9}
\x{6ef}
0: \x{6ef}
\x{6fa}
0: \x{6fa}
** Failers
No match
\x{600}
No match
\x{650}
No match
\x{651}
No match
\x{652}
No match
\x{653}
No match
\x{654}
No match
\x{655}
No match
\x{65f}
No match

/^\p{Cyrillic}/8
\x{1d2b}
0: \x{1d2b}

/^\p{Common}/8
\x{589}
0: \x{589}
\x{60c}
0: \x{60c}
\x{61f}
0: \x{61f}
\x{964}
0: \x{964}
\x{965}
0: \x{965}
\x{970}
0: \x{970}

/^\p{Inherited}/8
\x{64b}
0: \x{64b}
\x{654}
0: \x{654}
\x{655}
0: \x{655}
\x{200c}
0: \x{200c}
** Failers
No match
\x{64a}
No match
\x{656}
No match

/^\p{Shavian}/8
\x{10450}
0: \x{10450}
\x{1047f}
0: \x{1047f}

/^\p{Deseret}/8
\x{10400}
0: \x{10400}
\x{1044f}
0: \x{1044f}

/^\p{Osmanya}/8
\x{10480}
0: \x{10480}
\x{1049d}
0: \x{1049d}
\x{104a0}
0: \x{104a0}
\x{104a9}
0: \x{104a9}
** Failers
No match
\x{1049e}
No match
\x{1049f}
No match
\x{104aa}
No match

/\p{Zl}{2,3}+/8BZ
------------------------------------------------------------------
Bra
prop Zl {2}
prop Zl ?+
Ket
End
------------------------------------------------------------------
\xe2\x80\xa8\xe2\x80\xa8
0: \x{2028}\x{2028}
\x{2028}\x{2028}\x{2028}
0: \x{2028}\x{2028}\x{2028}

/\p{Zl}/8BZ
------------------------------------------------------------------
Bra
prop Zl
Ket
End
------------------------------------------------------------------

/\p{Lu}{3}+/8BZ
------------------------------------------------------------------
Bra
prop Lu {3}
Ket
End
------------------------------------------------------------------

/\pL{2}+/8BZ
------------------------------------------------------------------
Bra
prop L {2}
Ket
End
------------------------------------------------------------------

/\p{Cc}{2}+/8BZ
------------------------------------------------------------------
Bra
prop Cc {2}
Ket
End
------------------------------------------------------------------

/ End of testinput6 /

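The `BZ` dumps above show how a possessive repeat of a single property item is compiled. `\p{Zl}{2,3}+` becomes `prop Zl {2}` followed by `prop Zl ?+`, while `\p{Lu}{3}+` needs only the fixed part. A single-character item repeated exactly `min` times leaves nothing to backtrack into, so the possessive range can be split into a fixed count plus a possessive optional tail. The hypothetical function below produces a description in a similar style. Only the two forms seen in the dump come from the source; the `*+` and `{0,n}+` forms are illustrative guesses.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical: describe item{min,max}+ as a fixed repeat followed by a
   possessive tail. max < 0 means "no upper limit". */
static void describe_possessive(const char *item, int min, int max,
                                char *buf, size_t size)
{
    int used = snprintf(buf, size, "%s {%d}", item, min);
    if (used < 0 || (size_t)used >= size) return;   /* output truncated */
    buf += used;
    size -= (size_t)used;
    if (max < 0)
        snprintf(buf, size, "; %s *+", item);
    else if (max - min == 1)
        snprintf(buf, size, "; %s ?+", item);
    else if (max > min)
        snprintf(buf, size, "; %s {0,%d}+", item, max - min);
}
```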
@@ -539,7 +539,8 @@ static const cnode ucp_table[] = {
{ 0x21000293, 0x14000000 },
{ 0x21000294, 0x1c000000 },
{ 0x21800295, 0x1400001a },
{ 0x218002b0, 0x18000011 },
{ 0x218002b0, 0x18000008 },
{ 0x098002b9, 0x18000008 },
{ 0x098002c2, 0x60000003 },
{ 0x098002c6, 0x1800000b },
{ 0x098002d2, 0x6000000d },
@@ -1039,15 +1040,18 @@ static const cnode ucp_table[] = {
{ 0x198005f3, 0x54000001 },
{ 0x09800600, 0x04000003 },
{ 0x0000060b, 0x5c000000 },
{ 0x0980060c, 0x54000001 },
{ 0x0900060c, 0x54000000 },
{ 0x0000060d, 0x54000000 },
{ 0x0080060e, 0x68000001 },
{ 0x00800610, 0x30000005 },
{ 0x0900061b, 0x54000000 },
{ 0x0080061e, 0x54000001 },
{ 0x0000061e, 0x54000000 },
{ 0x0900061f, 0x54000000 },
{ 0x00800621, 0x1c000019 },
{ 0x09000640, 0x18000000 },
{ 0x00800641, 0x1c000009 },
{ 0x1b80064b, 0x30000013 },
{ 0x1b80064b, 0x3000000a },
{ 0x00800656, 0x30000008 },
{ 0x09800660, 0x34000009 },
{ 0x0080066a, 0x54000003 },
{ 0x0080066e, 0x1c000001 },
@@ -1074,7 +1078,8 @@ static const cnode ucp_table[] = {
{ 0x31000711, 0x30000000 },
{ 0x31800712, 0x1c00001d },
{ 0x31800730, 0x3000001a },
{ 0x3180074d, 0x1c000020 },
{ 0x3180074d, 0x1c000002 },
{ 0x00800750, 0x1c00001d },
{ 0x37800780, 0x1c000025 },
{ 0x378007a6, 0x3000000a },
{ 0x370007b1, 0x1c000000 },
@@ -1460,7 +1465,10 @@ static const cnode ucp_table[] = {
{ 0x1f0017dd, 0x30000000 },
{ 0x1f8017e0, 0x34000009 },
{ 0x1f8017f0, 0x3c000009 },
{ 0x25801800, 0x54000005 },
{ 0x25801800, 0x54000001 },
{ 0x09801802, 0x54000001 },
{ 0x25001804, 0x54000000 },
{ 0x09001805, 0x54000000 },
{ 0x25001806, 0x44000000 },
{ 0x25801807, 0x54000003 },
{ 0x2580180b, 0x30000002 },
@@ -1513,14 +1521,20 @@ static const cnode ucp_table[] = {
{ 0x3d801b61, 0x68000009 },
{ 0x3d801b6b, 0x30000008 },
{ 0x3d801b74, 0x68000008 },
{ 0x21801d00, 0x1400002b },
{ 0x21801d2c, 0x18000035 },
{ 0x21801d62, 0x14000015 },
{ 0x21801d00, 0x14000025 },
{ 0x13801d26, 0x14000004 },
{ 0x0c001d2b, 0x14000000 },
{ 0x21801d2c, 0x18000030 },
{ 0x13801d5d, 0x18000004 },
{ 0x21801d62, 0x14000003 },
{ 0x13801d66, 0x14000004 },
{ 0x21801d6b, 0x1400000c },
{ 0x0c001d78, 0x18000000 },
{ 0x21801d79, 0x14000003 },
{ 0x21001d7d, 0x14000ee6 },
{ 0x21801d7e, 0x1400001c },
{ 0x21801d9b, 0x18000024 },
{ 0x21801d9b, 0x18000023 },
{ 0x13001dbf, 0x18000000 },
{ 0x1b801dc0, 0x3000000a },
{ 0x1b801dfe, 0x30000001 },
{ 0x21001e00, 0x24000001 },
@@ -1982,7 +1996,9 @@ static const cnode ucp_table[] = {
{ 0x13001ffc, 0x2000fff7 },
{ 0x13801ffd, 0x60000001 },
{ 0x09802000, 0x7400000a },
{ 0x0980200b, 0x04000004 },
{ 0x0900200b, 0x04000000 },
{ 0x1b80200c, 0x04000001 },
{ 0x0980200e, 0x04000001 },
{ 0x09802010, 0x44000005 },
{ 0x09802016, 0x54000001 },
{ 0x09002018, 0x50000000 },
@@ -2615,7 +2631,8 @@ static const cnode ucp_table[] = {
{ 0x090030a0, 0x44000000 },
{ 0x1d8030a1, 0x1c000059 },
{ 0x090030fb, 0x54000000 },
{ 0x098030fc, 0x18000002 },
{ 0x090030fc, 0x18000000 },
{ 0x1d8030fd, 0x18000001 },
{ 0x1d0030ff, 0x1c000000 },
{ 0x03803105, 0x1c000027 },
{ 0x17803131, 0x1c00005d },
@@ -2630,7 +2647,8 @@ static const cnode ucp_table[] = {
{ 0x0980322a, 0x68000019 },
{ 0x09003250, 0x68000000 },
{ 0x09803251, 0x3c00000e },
{ 0x17803260, 0x6800001f },
{ 0x17803260, 0x6800001d },
{ 0x0980327e, 0x68000001 },
{ 0x09803280, 0x3c000009 },
{ 0x0980328a, 0x68000026 },
{ 0x098032b1, 0x3c00000e },
@@ -2678,7 +2696,8 @@ static const cnode ucp_table[] = {
{ 0x1900fb3e, 0x1c000000 },
{ 0x1980fb40, 0x1c000001 },
{ 0x1980fb43, 0x1c000001 },
{ 0x1980fb46, 0x1c00006b },
{ 0x1980fb46, 0x1c000009 },
{ 0x0080fb50, 0x1c000061 },
{ 0x0080fbd3, 0x1c00016a },
{ 0x0900fd3e, 0x58000000 },
{ 0x0900fd3f, 0x48000000 },
@@ -2944,7 +2963,8 @@ static const cnode ucp_table[] = {
{ 0x0d01044d, 0x1400ffd8 },
{ 0x0d01044e, 0x1400ffd8 },
{ 0x0d01044f, 0x1400ffd8 },
{ 0x2e810450, 0x1c00004d },
{ 0x2e810450, 0x1c00002f },
{ 0x2c810480, 0x1c00001d },
{ 0x2c8104a0, 0x34000009 },
{ 0x0b810800, 0x1c000005 },
{ 0x0b010808, 0x1c000000 },

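Each `ucp_table` entry packs a code point or range, its script, and its character type into two 32-bit words. The layout in the sketch below is inferred from the entries in these hunks and checked against the new testinput6 expectations; it is an assumption, not a copy of PCRE's header definitions. For example, the added entry `{ 0x00800656, 0x30000008 }` decodes to the range U+0656 to U+065E, which agrees with the new tests where `\x{656}` through `\x{65e}` match `^[\p{Arabic}]` and `\x{65f}` does not. Judging by those tests, script numbers 0x00 and 0x0c appear to correspond to Arabic and Cyrillic.

```c
#include <stdint.h>

/* Field layout inferred from the entries in this diff (an assumption). */
struct ucp_entry { uint32_t f0, f1; };

struct ucp_info {
    unsigned script;   /* f0 bits 31-24: script number */
    int is_range;      /* f0 bit 23: entry covers a range */
    uint32_t first;    /* f0 bits 20-0: (first) code point */
    unsigned type;     /* f1 bits 31-26: character type number */
    uint32_t low16;    /* f1 bits 15-0: range length - 1, or case offset */
};

static struct ucp_info ucp_decode(struct ucp_entry e)
{
    struct ucp_info info;
    info.script   = e.f0 >> 24;
    info.is_range = (e.f0 & 0x00800000u) != 0;
    info.first    = e.f0 & 0x001fffffu;
    info.type     = e.f1 >> 26;
    info.low16    = e.f1 & 0xffffu;
    return info;
}

/* Last code point covered by a decoded entry. */
static uint32_t ucp_last(struct ucp_info info)
{
    return info.is_range ? info.first + info.low16 : info.first;
}
```

Under the same reading, the Shavian entry changed from `0x1c00004d` to `0x1c00002f` ends at U+1047F, the last code point tested for `\p{Shavian}`, and the Osmanya entries end at U+1049D and U+104A9.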