upgrade PCRE to version 7.5

Nuno Lopes 2008-01-13 12:44:57 +00:00
parent ccc0d6e32b
commit 4c501a0ab6
25 changed files with 1066 additions and 2254 deletions

NEWS

@ -29,7 +29,7 @@ PHP NEWS
invoking the date parser. (Scott)
- Removed the experimental RPL (master/slave) functions from mysqli. (Andrey)
- Upgraded PCRE to version 7.4 (Nuno)
- Upgraded PCRE to version 7.5 (Nuno)
- Improved php.ini handling: (Jani)
. Added ".htaccess" style user-defined php.ini files support for CGI/FastCGI


@ -1,6 +1,143 @@
ChangeLog for PCRE
------------------
Version 7.5 10-Jan-08
---------------------
1. Applied a patch from Craig: "This patch makes it possible to 'ignore'
values in parens when parsing an RE using the C++ wrapper."
2. Negative specials like \S did not work in character classes in UTF-8 mode.
Characters greater than 255 were excluded from the class instead of being
included.
3. The same bug as (2) above applied to negated POSIX classes such as
[:^space:].
4. PCRECPP_STATIC was referenced in pcrecpp_internal.h, but nowhere was it
defined or documented. It seems to have been a typo for PCRE_STATIC, so
I have changed it.
5. The construct (?&) was not diagnosed as a syntax error (it referenced the
first named subpattern) and a construct such as (?&a) would reference the
first named subpattern whose name started with "a" (in other words, the
length check was missing). Both these problems are fixed. "Subpattern name
expected" is now given for (?&) (a zero-length name), and this patch also
makes it give the same error for \k'' (previously it complained that that
was a reference to a non-existent subpattern).
6. The erroneous patterns (?+-a) and (?-+a) give different error messages;
this is right because (?- can be followed by option settings as well as by
digits. I have, however, made the messages clearer.
7. Patterns such as (?(1)a|b) (a pattern that contains fewer subpatterns
than the number used in the conditional) now cause a compile-time error.
This is actually not compatible with Perl, which accepts such patterns, but
treats the conditional as always being FALSE (as PCRE used to), but it
seems to me that giving a diagnostic is better.
8. Change "alphameric" to the more common word "alphanumeric" in comments
and messages.
9. Fix two occurrences of "backslash" in comments that should have been
"backspace".
10. Remove two redundant lines of code that can never be obeyed (their function
was moved elsewhere).
11. The program that makes PCRE's Unicode character property table had a bug
which caused it to generate incorrect table entries for sequences of
characters that have the same character type, but are in different scripts.
It amalgamated them into a single range, with the script of the first of
them. In other words, some characters were in the wrong script. There were
thirteen such cases, affecting characters in the following ranges:
U+002b0 - U+002c1
U+0060c - U+0060d
U+0061e - U+00612
U+0064b - U+0065e
U+0074d - U+0076d
U+01800 - U+01805
U+01d00 - U+01d77
U+01d9b - U+01dbf
U+0200b - U+0200f
U+030fc - U+030fe
U+03260 - U+0327f
U+0fb46 - U+0fbb1
U+10450 - U+1049d
12. The -o option (show only the matching part of a line) for pcregrep was not
compatible with GNU grep in that, if there was more than one match in a
line, it showed only the first of them. It now behaves in the same way as
GNU grep.
13. If the -o and -v options were combined for pcregrep, it printed a blank
line for every non-matching line. GNU grep prints nothing, and pcregrep now
does the same. The return code can be used to tell if there were any
non-matching lines.
14. Added --file-offsets and --line-offsets to pcregrep.
15. The pattern (?=something)(?R) was not being diagnosed as a potentially
infinitely looping recursion. The bug was that positive lookaheads were not
being skipped when checking for a possible empty match (negative lookaheads
and both kinds of lookbehind were skipped).
16. Fixed two typos in the Windows-only code in pcregrep.c, and moved the
inclusion of <windows.h> to before rather than after the definition of
INVALID_FILE_ATTRIBUTES (patch from David Byron).
17. Specifying a possessive quantifier with a specific limit for a Unicode
character property caused pcre_compile() to compile bad code, which led at
runtime to PCRE_ERROR_INTERNAL (-14). Examples of patterns that caused this
are: /\p{Zl}{2,3}+/8 and /\p{Cc}{2}+/8. It was the possessive "+" that
caused the error; without that there was no problem.
18. Added --enable-pcregrep-libz and --enable-pcregrep-libbz2.
19. Added --enable-pcretest-libreadline.
20. In pcrecpp.cc, the variable 'count' was incremented twice in
RE::GlobalReplace(). As a result, the number of replacements returned was
double what it should be. I removed one of the increments, but Craig sent a
later patch that removed the other one (the right fix) and added unit tests
that check the return values (which was not done before).
21. Several CMake things:
(1) Arranged that, when cmake is used on Unix, the libraries end up with
the names libpcre and libpcreposix, not just pcre and pcreposix.
(2) The above change means that pcretest and pcregrep are now correctly
linked with the newly-built libraries, not previously installed ones.
(3) Added PCRE_SUPPORT_LIBREADLINE, PCRE_SUPPORT_LIBZ, PCRE_SUPPORT_LIBBZ2.
22. In UTF-8 mode, with newline set to "any", a pattern such as .*a.*=.b.*
crashed when matching a string such as a\x{2029}b (note that \x{2029} is a
UTF-8 newline character). The key issue is that the pattern starts .*;
this means that the match must be either at the beginning, or after a
newline. The bug was in the code for advancing after a failed match and
checking that the new position followed a newline. It was not taking
account of UTF-8 characters correctly.
23. PCRE was behaving differently from Perl in the way it recognized POSIX
character classes. PCRE was not treating the sequence [:...:] as a
character class unless the ... were all letters. Perl, however, seems to
allow any characters between [: and :], though of course it rejects as
unknown any "names" that contain non-letters, because all the known class
names consist only of letters. Thus, Perl gives an error for [[:1234:]],
for example, whereas PCRE did not - it did not recognize a POSIX character
class. This seemed a bit dangerous, so the code has been changed to be
closer to Perl. The behaviour is not identical to Perl, because PCRE will
diagnose an unknown class for, for example, [[:l\ower:]] where Perl will
treat it as [[:lower:]]. However, PCRE does now give "unknown" errors where
Perl does, and where it didn't before.
24. Rewrite so as to remove the single use of %n from pcregrep because in some
Windows environments %n is disabled by default.
Version 7.4 21-Sep-07
---------------------
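Item 5 of the 7.5 list above (the missing length check for (?&name)) can be illustrated with a small sketch. It assumes a simplified name table in which every fixed-size entry holds a 2-byte group number followed by a zero-terminated name, loosely modelled on the `slot+2` access in pcre_compile.c. `ENTRY_SIZE`, `sample_table`, `find_name_prefix_only` and `find_name_exact` are hypothetical names used only for this illustration, not PCRE identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified name table: each fixed-size entry is a 2-byte
   group number followed by a zero-terminated name. */
#define ENTRY_SIZE 8

const unsigned char sample_table[2 * ENTRY_SIZE] = {
  0, 1, 'a', 'b', 'c', 0, 0, 0,     /* group 1 is named "abc" */
  0, 2, 'x', 0,   0,   0, 0, 0      /* group 2 is named "x"   */
};

/* Old behaviour as described in item 5: only the first namelen bytes are
   compared, so "a" (and even an empty name) hits the entry "abc". */
int find_name_prefix_only(const unsigned char *table, int entries,
                          const char *name, size_t namelen)
{
  int i;
  assert(table != NULL && name != NULL && namelen < ENTRY_SIZE - 2);
  for (i = 0; i < entries; i++)
  {
    const unsigned char *slot = table + (size_t)i * ENTRY_SIZE;
    if (strncmp(name, (const char *)slot + 2, namelen) == 0) return i;
  }
  return -1;
}

/* Fixed behaviour: the table entry must also end after namelen bytes. */
int find_name_exact(const unsigned char *table, int entries,
                    const char *name, size_t namelen)
{
  int i;
  assert(table != NULL && name != NULL && namelen < ENTRY_SIZE - 2);
  for (i = 0; i < entries; i++)
  {
    const unsigned char *slot = table + (size_t)i * ENTRY_SIZE;
    if (strncmp(name, (const char *)slot + 2, namelen) == 0 &&
        slot[2 + namelen] == 0)
      return i;
  }
  return -1;
}
```

With this table, `find_name_prefix_only(sample_table, 2, "a", 1)` returns entry 0, whereas `find_name_exact` returns -1 for the same arguments and still finds "abc" and "x". The terminator is read only after the prefix comparison succeeds, so a name longer than the entry fails inside `strncmp` without reading past the entry's terminator.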


@ -1,6 +1,14 @@
News about PCRE releases
------------------------
Release 7.5 10-Jan-08
---------------------
This is mainly a bug-fix release. However the ability to link pcregrep with
libz or libbz2 and the ability to link pcretest with libreadline have been
added. Also the --line-offsets and --file-offsets options were added to
pcregrep.
Release 7.4 21-Sep-07
---------------------


@ -84,7 +84,7 @@ The following are generic comments about building the PCRE C library "by hand".
ucptable.h
(5) Also ensure that you have the following file, which is #included as source
when building a debugging version of PCRE and is also used by pcretest.
when building a debugging version of PCRE, and is also used by pcretest.
pcre_printint.src


@ -258,6 +258,24 @@ library. You can read more about them in the pcrebuild man page.
This automatically implies --enable-rebuild-chartables (see above).
. It is possible to compile pcregrep to use libz and/or libbz2, in order to
read .gz and .bz2 files (respectively), by specifying one or both of
--enable-pcregrep-libz
--enable-pcregrep-libbz2
Of course, the relevant libraries must be installed on your system.
. It is possible to compile pcretest so that it links with the libreadline
library, by specifying
--enable-pcretest-libreadline
If this is done, when pcretest's input is from a terminal, it reads it using
the readline() function. This provides line-editing and history facilities.
Note that libreadline is GPL-licenced, so if you distribute a binary of
pcretest linked in this way, there may be licensing issues.
The "configure" script builds the following files for the basic C library:
. Makefile is the makefile that builds the library
@ -725,4 +743,4 @@ The distribution should contain the following files:
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 21 September 2007
Last updated: 18 December 2007


@ -51,6 +51,11 @@ them both to 0; an emulation function will be used. */
/* Define to 1 if you have the <bits/type_traits.h> header file. */
/* #undef HAVE_BITS_TYPE_TRAITS_H */
/* Define to 1 if you have the <bzlib.h> header file. */
#ifndef HAVE_BZLIB_H
#define HAVE_BZLIB_H 1
#endif
/* Define to 1 if you have the <dirent.h> header file. */
#ifndef HAVE_DIRENT_H
#define HAVE_DIRENT_H 1
@ -86,6 +91,16 @@ them both to 0; an emulation function will be used. */
#define HAVE_MEMORY_H 1
#endif
/* Define to 1 if you have the <readline/history.h> header file. */
#ifndef HAVE_READLINE_HISTORY_H
#define HAVE_READLINE_HISTORY_H 1
#endif
/* Define to 1 if you have the <readline/readline.h> header file. */
#ifndef HAVE_READLINE_READLINE_H
#define HAVE_READLINE_READLINE_H 1
#endif
/* Define to 1 if you have the <stdint.h> header file. */
#ifndef HAVE_STDINT_H
#define HAVE_STDINT_H 1
@ -152,6 +167,11 @@ them both to 0; an emulation function will be used. */
/* Define to 1 if you have the <windows.h> header file. */
/* #undef HAVE_WINDOWS_H */
/* Define to 1 if you have the <zlib.h> header file. */
#ifndef HAVE_ZLIB_H
#define HAVE_ZLIB_H 1
#endif
/* Define to 1 if you have the `_strtoi64' function. */
/* #undef HAVE__STRTOI64 */
@ -231,13 +251,13 @@ them both to 0; an emulation function will be used. */
#define PACKAGE_NAME "PCRE"
/* Define to the full name and version of this package. */
#define PACKAGE_STRING "PCRE 7.4"
#define PACKAGE_STRING "PCRE 7.5"
/* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "pcre"
/* Define to the version of this package. */
#define PACKAGE_VERSION "7.4"
#define PACKAGE_VERSION "7.5"
/* If you are compiling for a system other than a Unix-like system or
@ -271,6 +291,17 @@ them both to 0; an emulation function will be used. */
#define STDC_HEADERS 1
#endif
/* Define to allow pcregrep to be linked with libbz2, so that it is able to
handle .bz2 files. */
/* #undef SUPPORT_LIBBZ2 */
/* Define to allow pcretest to be linked with libreadline. */
/* #undef SUPPORT_LIBREADLINE */
/* Define to allow pcregrep to be linked with libz, so that it is able to
handle .gz files. */
/* #undef SUPPORT_LIBZ */
/* Define to enable support for Unicode properties */
/* #undef SUPPORT_UCP */
@ -279,7 +310,7 @@ them both to 0; an emulation function will be used. */
/* Version number of package */
#ifndef VERSION
#define VERSION "7.4"
#define VERSION "7.5"
#endif
/* Define to empty if `const' does not conform to ANSI C. */


@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
/* The current PCRE version information. */
#define PCRE_MAJOR 7
#define PCRE_MINOR 4
#define PCRE_MINOR 5
#define PCRE_PRERELEASE
#define PCRE_DATE 2007-09-21
#define PCRE_DATE 2008-01-10
/* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE, the appropriate


@ -239,7 +239,7 @@ static const char error_texts[] =
/* 10 */
"operand of unlimited repeat could match the empty string\0" /** DEAD **/
"internal error: unexpected repeat\0"
"unrecognized character after (?\0"
"unrecognized character after (? or (?-\0"
"POSIX named classes are supported only within a class\0"
"missing )\0"
/* 15 */
@ -298,7 +298,9 @@ static const char error_texts[] =
"(*VERB) with an argument is not supported\0"
/* 60 */
"(*VERB) not recognized\0"
"number is too big";
"number is too big\0"
"subpattern name expected\0"
"digit expected after (?+";
/* Table to identify digits and hex digits. This is used when compiling
@ -494,16 +496,16 @@ ptr--; /* Set pointer back to the last byte */
if (c == 0) *errorcodeptr = ERR1;
/* Non-alphamerics are literals. For digits or letters, do an initial lookup in
a table. A non-zero result is something that can be returned immediately.
/* Non-alphanumerics are literals. For digits or letters, do an initial lookup
in a table. A non-zero result is something that can be returned immediately.
Otherwise further processing may be required. */
#ifndef EBCDIC /* ASCII coding */
else if (c < '0' || c > 'z') {} /* Not alphameric */
else if (c < '0' || c > 'z') {} /* Not alphanumeric */
else if ((i = escapes[c - '0']) != 0) c = i;
#else /* EBCDIC coding */
else if (c < 'a' || (ebcdic_chartab[c] & 0x0E) == 0) {} /* Not alphameric */
else if (c < 'a' || (ebcdic_chartab[c] & 0x0E) == 0) {} /* Not alphanumeric */
else if ((i = escapes[c - 0x48]) != 0) c = i;
#endif
@ -720,10 +722,10 @@ else
break;
/* PCRE_EXTRA enables extensions to Perl in the matter of escapes. Any
other alphameric following \ is an error if PCRE_EXTRA was set; otherwise,
for Perl compatibility, it is a literal. This code looks a bit odd, but
there used to be some cases other than the default, and there may be again
in future, so I haven't "optimized" it. */
other alphanumeric following \ is an error if PCRE_EXTRA was set;
otherwise, for Perl compatibility, it is a literal. This code looks a bit
odd, but there used to be some cases other than the default, and there may
be again in future, so I haven't "optimized" it. */
default:
if ((options & PCRE_EXTRA) != 0) switch(c)
@ -1504,8 +1506,9 @@ for (;;)
can match the empty string or not. It is called from could_be_empty()
below and from compile_branch() when checking for an unlimited repeat of a
group that can match nothing. Note that first_significant_code() skips over
assertions. If we hit an unclosed bracket, we return "empty" - this means we've
struck an inner bracket whose current branch will already have been scanned.
backward and negative forward assertions when its final argument is TRUE. If we
hit an unclosed bracket, we return "empty" - this means we've struck an inner
bracket whose current branch will already have been scanned.
Arguments:
code points to start of search
@ -1527,6 +1530,16 @@ for (code = first_significant_code(code + _pcre_OP_lengths[*code], NULL, 0, TRUE
c = *code;
/* Skip over forward assertions; the other assertions are skipped by
first_significant_code() with a TRUE final argument. */
if (c == OP_ASSERT)
{
do code += GET(code, 1); while (*code == OP_ALT);
c = *code;
continue;
}
/* Groups with zero repeats can of course be empty; skip them. */
if (c == OP_BRAZERO || c == OP_BRAMINZERO)
@ -1722,29 +1735,48 @@ return TRUE;
*************************************************/
/* This function is called when the sequence "[:" or "[." or "[=" is
encountered in a character class. It checks whether this is followed by an
optional ^ and then a sequence of letters, terminated by a matching ":]" or
".]" or "=]".
encountered in a character class. It checks whether this is followed by a
sequence of characters terminated by a matching ":]" or ".]" or "=]". If we
reach an unescaped ']' without the special preceding character, return FALSE.
Argument:
Originally, this function only recognized a sequence of letters between the
terminators, but it seems that Perl recognizes any sequence of characters,
though of course unknown POSIX names are subsequently rejected. Perl gives an
"Unknown POSIX class" error for [:f\oo:] for example, where previously PCRE
didn't consider this to be a POSIX class. Likewise for [:1234:].
The problem in trying to be exactly like Perl is in the handling of escapes. We
have to be sure that [abc[:x\]pqr] is *not* treated as containing a POSIX
class, but [abc[:x\]pqr:]] is (so that an error can be generated). The code
below handles the special case of \], but does not try to do any other escape
processing. This makes it different from Perl for cases such as [:l\ower:]
where Perl recognizes it as the POSIX class "lower" but PCRE does not recognize
"l\ower". This is a lesser evil than not diagnosing bad classes when Perl does,
I think.
Arguments:
ptr pointer to the initial [
endptr where to return the end pointer
cd pointer to compile data
Returns: TRUE or FALSE
*/
static BOOL
check_posix_syntax(const uschar *ptr, const uschar **endptr, compile_data *cd)
check_posix_syntax(const uschar *ptr, const uschar **endptr)
{
int terminator; /* Don't combine these lines; the Solaris cc */
terminator = *(++ptr); /* compiler warns about "non-constant" initializer. */
if (*(++ptr) == '^') ptr++;
while ((cd->ctypes[*ptr] & ctype_letter) != 0) ptr++;
if (*ptr == terminator && ptr[1] == ']')
for (++ptr; *ptr != 0; ptr++)
{
*endptr = ptr;
return TRUE;
if (*ptr == '\\' && ptr[1] == ']') ptr++; else
{
if (*ptr == ']') return FALSE;
if (*ptr == terminator && ptr[1] == ']')
{
*endptr = ptr;
return TRUE;
}
}
}
return FALSE;
}
@ -2381,6 +2413,7 @@ req_caseopt = ((options & PCRE_CASELESS) != 0)? REQ_CASELESS : 0;
for (;; ptr++)
{
BOOL negate_class;
BOOL should_flip_negation;
BOOL possessive_quantifier;
BOOL is_quantifier;
BOOL is_recurse;
@ -2604,7 +2637,7 @@ for (;; ptr++)
they are encountered at the top level, so we'll do that too. */
if ((ptr[1] == ':' || ptr[1] == '.' || ptr[1] == '=') &&
check_posix_syntax(ptr, &tempptr, cd))
check_posix_syntax(ptr, &tempptr))
{
*errorcodeptr = (ptr[1] == ':')? ERR13 : ERR31;
goto FAILED;
@ -2629,6 +2662,12 @@ for (;; ptr++)
else break;
}
/* If a class contains a negative special such as \S, we need to flip the
negation flag at the end, so that support for characters > 255 works
correctly (they are all included in the class). */
should_flip_negation = FALSE;
/* Keep a count of chars with values < 256 so that we can optimize the case
of just a single character (as long as it's < 256). However, For higher
valued UTF-8 characters, we don't yet do any optimization. */
@ -2684,7 +2723,7 @@ for (;; ptr++)
if (c == '[' &&
(ptr[1] == ':' || ptr[1] == '.' || ptr[1] == '=') &&
check_posix_syntax(ptr, &tempptr, cd))
check_posix_syntax(ptr, &tempptr))
{
BOOL local_negate = FALSE;
int posix_class, taboffset, tabopt;
@ -2701,6 +2740,7 @@ for (;; ptr++)
if (*ptr == '^')
{
local_negate = TRUE;
should_flip_negation = TRUE; /* Note negative special */
ptr++;
}
@ -2775,7 +2815,7 @@ for (;; ptr++)
c = check_escape(&ptr, errorcodeptr, cd->bracount, options, TRUE);
if (*errorcodeptr != 0) goto FAILED;
if (-c == ESC_b) c = '\b'; /* \b is backslash in a class */
if (-c == ESC_b) c = '\b'; /* \b is backspace in a class */
else if (-c == ESC_X) c = 'X'; /* \X is literal X in a class */
else if (-c == ESC_R) c = 'R'; /* \R is literal R in a class */
else if (-c == ESC_Q) /* Handle start of quoted string */
@ -2803,6 +2843,7 @@ for (;; ptr++)
continue;
case ESC_D:
should_flip_negation = TRUE;
for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_digit];
continue;
@ -2811,6 +2852,7 @@ for (;; ptr++)
continue;
case ESC_W:
should_flip_negation = TRUE;
for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_word];
continue;
@ -2820,13 +2862,11 @@ for (;; ptr++)
continue;
case ESC_S:
should_flip_negation = TRUE;
for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_space];
classbits[1] |= 0x08; /* Perl 5.004 onwards omits VT from \s */
continue;
case ESC_E: /* Perl ignores an orphan \E */
continue;
default: /* Not recognized; fall through */
break; /* Need "default" setting to stop compiler warning. */
}
@ -3061,7 +3101,7 @@ for (;; ptr++)
d = check_escape(&ptr, errorcodeptr, cd->bracount, options, TRUE);
if (*errorcodeptr != 0) goto FAILED;
/* \b is backslash; \X is literal X; \R is literal R; any other
/* \b is backspace; \X is literal X; \R is literal R; any other
special means the '-' was literal */
if (d < 0)
@ -3325,11 +3365,14 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
zeroreqbyte = reqbyte;
/* If there are characters with values > 255, we have to compile an
extended class, with its own opcode. If there are no characters < 256,
we can omit the bitmap in the actual compiled code. */
extended class, with its own opcode, unless there was a negated special
such as \S in the class, because in that case all characters > 255 are in
the class, so any that were explicitly given as well can be ignored. If
(when there are explicit characters > 255 that must be listed) there are no
characters < 256, we can omit the bitmap in the actual compiled code. */
#ifdef SUPPORT_UTF8
if (class_utf8)
if (class_utf8 && !should_flip_negation)
{
*class_utf8data++ = XCL_END; /* Marks the end of extra data */
*code++ = OP_XCLASS;
@ -3355,20 +3398,19 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
}
#endif
/* If there are no characters > 255, negate the 32-byte map if necessary,
and copy it into the code vector. If this is the first thing in the branch,
there can be no first char setting, whatever the repeat count. Any reqbyte
setting must remain unchanged after any kind of repeat. */
/* If there are no characters > 255, set the opcode to OP_CLASS or
OP_NCLASS, depending on whether the whole class was negated and whether
there were negative specials such as \S in the class. Then copy the 32-byte
map into the code vector, negating it if necessary. */
*code++ = (negate_class == should_flip_negation) ? OP_CLASS : OP_NCLASS;
if (negate_class)
{
*code++ = OP_NCLASS;
if (lengthptr == NULL) /* Save time in the pre-compile phase */
for (c = 0; c < 32; c++) code[c] = ~classbits[c];
}
else
{
*code++ = OP_CLASS;
memcpy(code, classbits, 32);
}
code += 32;
@ -4004,7 +4046,9 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
int len;
if (*tempcode == OP_EXACT || *tempcode == OP_TYPEEXACT ||
*tempcode == OP_NOTEXACT)
tempcode += _pcre_OP_lengths[*tempcode];
tempcode += _pcre_OP_lengths[*tempcode] +
((*tempcode == OP_TYPEEXACT &&
(tempcode[3] == OP_PROP || tempcode[3] == OP_NOTPROP))? 2:0);
len = code - tempcode;
if (len > 0) switch (*tempcode)
{
@ -4231,16 +4275,13 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
*errorcodeptr = ERR58;
goto FAILED;
}
if (refsign == '-')
recno = (refsign == '-')?
cd->bracount - recno + 1 : recno +cd->bracount;
if (recno <= 0 || recno > cd->final_bracount)
{
recno = cd->bracount - recno + 1;
if (recno <= 0)
{
*errorcodeptr = ERR15;
goto FAILED;
}
*errorcodeptr = ERR15;
goto FAILED;
}
else recno += cd->bracount;
PUT2(code, 2+LINK_SIZE, recno);
break;
}
@ -4312,9 +4353,10 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
skipbytes = 1;
}
/* Check for the "name" actually being a subpattern number. */
/* Check for the "name" actually being a subpattern number. We are
in the second pass here, so final_bracount is set. */
else if (recno > 0)
else if (recno > 0 && recno <= cd->final_bracount)
{
PUT2(code, 2+LINK_SIZE, recno);
}
@ -4508,7 +4550,9 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
/* We come here from the Python syntax above that handles both
references (?P=name) and recursion (?P>name), as well as falling
through from the Perl recursion syntax (?&name). */
through from the Perl recursion syntax (?&name). We also come here from
the Perl \k<name> or \k'name' back reference syntax and the \k{name}
.NET syntax. */
NAMED_REF_OR_RECURSE:
name = ++ptr;
@ -4520,6 +4564,11 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
if (lengthptr != NULL)
{
if (namelen == 0)
{
*errorcodeptr = ERR62;
goto FAILED;
}
if (*ptr != terminator)
{
*errorcodeptr = ERR42;
@ -4533,14 +4582,19 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
recno = 0;
}
/* In the real compile, seek the name in the table */
/* In the real compile, seek the name in the table. We check the name
first, and then check that we have reached the end of the name in the
table. That way, if the name is longer than any in the table,
the comparison will fail without reading beyond the table entry. */
else
{
slot = cd->name_table;
for (i = 0; i < cd->names_found; i++)
{
if (strncmp((char *)name, (char *)slot+2, namelen) == 0) break;
if (strncmp((char *)name, (char *)slot+2, namelen) == 0 &&
slot[2+namelen] == 0)
break;
slot += cd->name_entry_size;
}
@ -4577,7 +4631,15 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
{
const uschar *called;
if ((refsign = *ptr) == '+') ptr++;
if ((refsign = *ptr) == '+')
{
ptr++;
if ((digitab[*ptr] & ctype_digit) == 0)
{
*errorcodeptr = ERR63;
goto FAILED;
}
}
else if (refsign == '-')
{
if ((digitab[ptr[1]] & ctype_digit) == 0)
@ -5904,7 +5966,7 @@ to compile parts of the pattern into; the compiled code is discarded when it is
no longer needed, so hopefully this workspace will never overflow, though there
is a test for its doing so. */
cd->bracount = 0;
cd->bracount = cd->final_bracount = 0;
cd->names_found = 0;
cd->name_entry_size = 0;
cd->name_table = NULL;
@ -5981,6 +6043,7 @@ field. Reset the bracket count and the names_found field. Also reset the hwm
field; this time it's used for remembering forward references to subpatterns.
*/
cd->final_bracount = cd->bracount; /* Save for checking forward references */
cd->bracount = 0;
cd->names_found = 0;
cd->name_table = (uschar *)re + re->name_table_offset;
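The range check that the new final_bracount field makes possible can be sketched in isolation. This is a minimal illustration that assumes the second compile pass, where the total number of capturing groups is already known. `resolve_group_ref` is a hypothetical helper, not a PCRE function; its arithmetic mirrors the `recno` computation in the hunks above.

```c
#include <assert.h>

/* Turn a (possibly relative) group reference into an absolute group number.
   refsign is '+', '-' or 0 for an unsigned reference; bracount is the number
   of capturing groups opened so far; final_bracount is the total found by
   the first pass. Returns -1 when the result is outside 1..final_bracount. */
int resolve_group_ref(int refsign, int recno, int bracount, int final_bracount)
{
  assert(recno >= 0 && bracount >= 0 && final_bracount >= bracount);

  if (refsign == '-')
    recno = bracount - recno + 1;   /* count back from the current group */
  else if (refsign == '+')
    recno = recno + bracount;       /* count forward from the current group */

  if (recno <= 0 || recno > final_bracount) return -1;
  return recno;
}
```

In this sketch a reference such as +2 seen before any group has opened resolves to group 2 when the pattern ends up with two groups, while +10, -1 or a plain 1 in a pattern with no capturing groups all yield -1, which corresponds to the compile-time error path (ERR15) in the diff.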


@ -4668,10 +4668,10 @@ for(;;)
if (first_byte_caseless)
while (start_match < end_subject &&
md->lcc[*start_match] != first_byte)
start_match++;
{ NEXTCHAR(start_match); }
else
while (start_match < end_subject && *start_match != first_byte)
start_match++;
{ NEXTCHAR(start_match); }
}
/* Or to just after a linebreak for a multiline match if possible */
@ -4681,7 +4681,7 @@ for(;;)
if (start_match > md->start_subject + start_offset)
{
while (start_match <= end_subject && !WAS_NEWLINE(start_match))
start_match++;
{ NEXTCHAR(start_match); }
/* If we have just passed a CR and the newline option is ANY or ANYCRLF,
and we are now at a LF, advance the match position by one more character.
@ -4702,7 +4702,9 @@ for(;;)
while (start_match < end_subject)
{
register unsigned int c = *start_match;
if ((start_bits[c/8] & (1 << (c&7))) == 0) start_match++; else break;
if ((start_bits[c/8] & (1 << (c&7))) == 0)
{ NEXTCHAR(start_match); }
else break;
}
}


@ -363,6 +363,7 @@ never be called in byte mode. To make sure it can never even appear when UTF-8
support is omitted, we don't even define it. */
#ifndef SUPPORT_UTF8
#define NEXTCHAR(p) p++;
#define GETCHAR(c, eptr) c = *eptr;
#define GETCHARTEST(c, eptr) c = *eptr;
#define GETCHARINC(c, eptr) c = *eptr++;
@ -372,6 +373,13 @@ support is omitted, we don't even define it. */
#else /* SUPPORT_UTF8 */
/* Advance a character pointer one byte in non-UTF-8 mode and by one character
in UTF-8 mode. */
#define NEXTCHAR(p) \
p++; \
if (utf8) { while((*p & 0xc0) == 0x80) p++; }
/* Get the next UTF-8 character, not advancing the pointer. This is called when
we know we are in UTF-8 mode. */
@ -871,7 +879,7 @@ enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9,
ERR30, ERR31, ERR32, ERR33, ERR34, ERR35, ERR36, ERR37, ERR38, ERR39,
ERR40, ERR41, ERR42, ERR43, ERR44, ERR45, ERR46, ERR47, ERR48, ERR49,
ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
ERR60, ERR61 };
ERR60, ERR61, ERR62, ERR63 };
/* The real format of the start of the pcre block; the index of names and the
code vector run on as long as necessary after the end. We store an explicit
@ -934,7 +942,8 @@ typedef struct compile_data {
uschar *name_table; /* The name/number table */
int names_found; /* Number of entries so far */
int name_entry_size; /* Size of each entry */
int bracount; /* Count of capturing parens */
int bracount; /* Count of capturing parens as we compile */
int final_bracount; /* Saved value after first pass */
int top_backref; /* Maximum back reference */
unsigned int backref_map; /* Bitmap of low back refs */
int external_options; /* External (initial) options */
@ -1036,7 +1045,7 @@ typedef struct dfa_match_data {
#define ctype_letter 0x02
#define ctype_digit 0x04
#define ctype_xdigit 0x08
#define ctype_word 0x10 /* alphameric or '_' */
#define ctype_word 0x10 /* alphanumeric or '_' */
#define ctype_meta 0x80 /* regexp meta char or zero (end pattern) */
/* Offsets for the bitmap tables in pcre_cbits. Each table contains a set
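The NEXTCHAR macro added above can be restated as a plain function to show why a bare `start_match++` is not enough in UTF-8 mode. This is a sketch only: `next_char` and `sample` are hypothetical names, and the subject is assumed to be zero-terminated so that the continuation-byte loop stops at the end.

```c
#include <assert.h>
#include <stddef.h>

/* "a", then U+2029 encoded as the three bytes E2 80 A9, then "b". */
const unsigned char sample[] = "a\xe2\x80\xa9" "b";

/* Advance past one character: a single byte in byte mode; in UTF-8 mode
   also skip any continuation bytes (bit pattern 10xxxxxx) that follow. */
const unsigned char *next_char(const unsigned char *p, int utf8)
{
  assert(p != NULL);
  p++;
  if (utf8) { while ((*p & 0xc0) == 0x80) p++; }
  return p;
}
```

Starting at `sample + 1` (the lead byte of U+2029), the UTF-8 variant lands on `sample + 4`, the "b", whereas the byte-mode variant stops at `sample + 2`, in the middle of the character.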


@ -60,7 +60,7 @@ an invalid string are then undefined.
Originally, this function checked according to RFC 2279, allowing for values in
the range 0 to 0x7fffffff, up to 6 bytes long, but ensuring that they were in
the canonical format. Once somebody had pointed out RFC 3629 to me (it
obsoletes 2279), additional restrictions were applies. The values are now
obsoletes 2279), additional restrictions were applied. The values are now
limited to be between 0 and 0x0010ffff, no more than 4 bytes long, and the
subrange 0xd000 to 0xdfff is excluded.

File diff suppressed because it is too large


@ -122,7 +122,9 @@ static const int eint[] = {
REG_INVARG, /* inconsistent NEWLINE options */
REG_BADPAT, /* \g is not followed by an (optionally braced) non-zero number */
REG_BADPAT, /* (?+ or (?- must be followed by a non-zero number */
REG_BADPAT /* number is too big */
REG_BADPAT, /* number is too big */
REG_BADPAT, /* subpattern name expected */
REG_BADPAT /* digit expected after (?+ */
};
/* Table of texts corresponding to POSIX error codes */


@ -358,10 +358,13 @@ after the binary zero
./testdata/grepinput:597:after the binary zero
---------------------------- Test 42 ------------------------------
595:before
595:zero
596:zero
597:after
597:zero
---------------------------- Test 43 ------------------------------
595:before
595:zero
596:zero
597:zero
---------------------------- Test 44 ------------------------------
@ -387,3 +390,15 @@ PUT NEW DATA ABOVE THIS LINE.
over the lazy dog.
---------------------------- Test 51 ------------------------------
fox jumps
---------------------------- Test 52 ------------------------------
36972,6
36990,4
37024,4
37066,5
37083,4
---------------------------- Test 53 ------------------------------
595:15,6
595:33,4
596:28,4
597:15,5
597:32,4


@ -3421,11 +3421,6 @@
/((?m)^b)/
a\nb\nc\n
/(?(1)a|b)/
/(?(1)b|a)/
a
/(x)?(?(1)a|b)/
*** Failers
a
@ -4030,4 +4025,15 @@
/( (?(1)0|)* )/x
abcd
/[[:abcd:xyz]]/
a]
:]
/[abc[:x\]pqr]/
a
[
:
]
p
/ End of testinput1 /


@ -398,8 +398,6 @@
/(?(1?)a|b)/
/(?(1)a|b|c)/
/[a[:xyz:/
/(?<=x+)y/
@ -568,15 +566,15 @@
/ab\d+/I
/a(?(1)b)/I
/a(?(1)b)(.)/I
/a(?(1)bag|big)/I
/a(?(1)bag|big)(.)/I
/a(?(1)bag|big)*/I
/a(?(1)bag|big)*(.)/I
/a(?(1)bag|big)+/I
/a(?(1)bag|big)+(.)/I
/a(?(1)b..|b..)/I
/a(?(1)b..|b..)(.)/I
/ab\d{0}e/I
@ -977,13 +975,13 @@
/()a/I
/(?(1)ab|ac)/I
/(?(1)ab|ac)(.)/I
/(?(1)abz|acz)/I
/(?(1)abz|acz)(.)/I
/(?(1)abz)/I
/(?(1)abz)(.)/I
/(?(1)abz)123/I
/(?(1)abz)(1)23/I
/(a)+/I
@ -2190,8 +2188,8 @@ a random value. /Ix
/((?(-2)a))/BZ
/^(?(+1)X|Y)/BZ
Y
/^(?(+1)X|Y)(.)/BZ
Y!
/(foo)\Kbar/
foobar
@ -2535,4 +2533,60 @@ a random value. /Ix
/(*CRLF)(*BSR_ANYCRLF)(*CR)ab/I
/(?<a>)(?&)/
/(?<abc>)(?&a)/
/(?<a>)(?&aaaaaaaaaaaaaaaaaaaaaaa)/
/(?+-a)/
/(?-+a)/
/(?(-1))/
/(?(+10))/
/(?(10))/
/(?(+2))()()/
/(?(2))()()/
/\k''/
/\k<>/
/\k{}/
/(?P=)/
/(?P>)/
/(?!\w)(?R)/
/(?=\w)(?R)/
/(?<!\w)(?R)/
/(?<=\w)(?R)/
/[[:foo:]]/
/[[:1234:]]/
/[[:f\oo:]]/
/[[: :]]/
/[[:...:]]/
/[[:l\ower:]]/
/[[:abc\:]]/
/[abc[:x\]pqr:]]/
/[[:a\dz:]]/
/ End of testinput2 /
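The POSIX-class patterns added to this test file exercise the rewritten check_posix_syntax() in pcre_compile.c. A stand-alone approximation of that scan is sketched below so the cases can be tried in isolation. `looks_like_posix_class` and `is_posix_syntax` are hypothetical names, and the sketch works on a plain zero-terminated `char` string rather than PCRE's internal types.

```c
#include <assert.h>
#include <stddef.h>

/* ptr points at the '[' that begins "[:", "[." or "[=". Returns 1 and sets
   *endptr to the terminator character if a matching ":]", ".]" or "=]" is
   found before an unescaped ']'; otherwise returns 0. */
int looks_like_posix_class(const char *ptr, const char **endptr)
{
  char terminator;
  assert(ptr != NULL && endptr != NULL && ptr[0] == '[' && ptr[1] != 0);

  terminator = *(++ptr);            /* ':' or '.' or '=' */
  for (++ptr; *ptr != 0; ptr++)
  {
    if (*ptr == '\\' && ptr[1] == ']')
      ptr++;                        /* an escaped ']' does not end the scan */
    else
    {
      if (*ptr == ']') return 0;    /* unescaped ']' came first */
      if (*ptr == terminator && ptr[1] == ']')
      {
        *endptr = ptr;
        return 1;
      }
    }
  }
  return 0;
}

/* Convenience wrapper for callers that do not need the end pointer. */
int is_posix_syntax(const char *ptr)
{
  const char *end = NULL;
  return looks_like_posix_class(ptr, &end);
}
```

Fed the inner bracket of the test patterns, the sketch reports "[:1234:]]" and "[:x\]pqr:]]" as POSIX class syntax (which the compiler can then reject as unknown names), and "[:abcd:xyz]]" and "[:x\]pqr]" as ordinary class contents.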


@ -535,4 +535,76 @@
/\W{2}/8g
+\x{a3}==
/\S/8g
\x{442}\x{435}\x{441}\x{442}
/[\S]/8g
\x{442}\x{435}\x{441}\x{442}
/\D/8g
\x{442}\x{435}\x{441}\x{442}
/[\D]/8g
\x{442}\x{435}\x{441}\x{442}
/\W/8g
\x{2442}\x{2435}\x{2441}\x{2442}
/[\W]/8g
\x{2442}\x{2435}\x{2441}\x{2442}
/[\S\s]*/8
abc\n\r\x{442}\x{435}\x{441}\x{442}xyz
/[\x{41f}\S]/8g
\x{442}\x{435}\x{441}\x{442}
/.[^\S]./8g
abc def\x{442}\x{443}xyz\npqr
/.[^\S\n]./8g
abc def\x{442}\x{443}xyz\npqr
/[[:^alnum:]]/8g
+\x{2442}
/[[:^alpha:]]/8g
+\x{2442}
/[[:^ascii:]]/8g
A\x{442}
/[[:^blank:]]/8g
A\x{442}
/[[:^cntrl:]]/8g
A\x{442}
/[[:^digit:]]/8g
A\x{442}
/[[:^graph:]]/8g
\x19\x{e01ff}
/[[:^lower:]]/8g
A\x{422}
/[[:^print:]]/8g
\x{19}\x{e01ff}
/[[:^punct:]]/8g
A\x{442}
/[[:^space:]]/8g
A\x{442}
/[[:^upper:]]/8g
a\x{442}
/[[:^word:]]/8g
+\x{2442}
/[[:^xdigit:]]/8g
M\x{442}
/ End of testinput4 /

@ -453,4 +453,12 @@ can't tell the difference.) --/
a\x{85}b\<bsr_anycrlf>
a\x0bb\<bsr_anycrlf>
/.*a.*=.b.*/8<ANY>
QQQ\x{2029}ABCaXYZ=!bPQR
** Failers
a\x{2029}b
\x61\xe2\x80\xa9\x62
/[[:a\x{100}b:]]/8
/ End of testinput5 /

@ -832,4 +832,79 @@ was broken in all cases./
/(\p{Yi}{0,3}+\277)*/
/^[\p{Arabic}]/8
\x{60e}
\x{656}
\x{657}
\x{658}
\x{659}
\x{65a}
\x{65b}
\x{65c}
\x{65d}
\x{65e}
\x{66a}
\x{6e9}
\x{6ef}
\x{6fa}
** Failers
\x{600}
\x{650}
\x{651}
\x{652}
\x{653}
\x{654}
\x{655}
\x{65f}
/^\p{Cyrillic}/8
\x{1d2b}
/^\p{Common}/8
\x{589}
\x{60c}
\x{61f}
\x{964}
\x{965}
\x{970}
/^\p{Inherited}/8
\x{64b}
\x{654}
\x{655}
\x{200c}
** Failers
\x{64a}
\x{656}
/^\p{Shavian}/8
\x{10450}
\x{1047f}
/^\p{Deseret}/8
\x{10400}
\x{1044f}
/^\p{Osmanya}/8
\x{10480}
\x{1049d}
\x{104a0}
\x{104a9}
** Failers
\x{1049e}
\x{1049f}
\x{104aa}
/\p{Zl}{2,3}+/8BZ
\xe2\x80\xa8\xe2\x80\xa8
\x{2028}\x{2028}\x{2028}
/\p{Zl}/8BZ
/\p{Lu}{3}+/8BZ
/\pL{2}+/8BZ
/\p{Cc}{2}+/8BZ
/ End of testinput6 /

@ -5551,12 +5551,6 @@ No match
0: b
1: b
/(?(1)a|b)/
/(?(1)b|a)/
a
0: a
/(x)?(?(1)a|b)/
*** Failers
No match
@ -6593,4 +6587,22 @@ No match
0:
1:
/[[:abcd:xyz]]/
a]
0: a]
:]
0: :]
/[abc[:x\]pqr]/
a
0: a
[
0: [
:
0: :
]
0: ]
p
0: p
/ End of testinput1 /

@ -109,7 +109,7 @@ Failed: missing ) at offset 4
Failed: missing ) after comment at offset 7
/(?z)abc/
Failed: unrecognized character after (? at offset 2
Failed: unrecognized character after (? or (?- at offset 2
/.*b/I
Capturing subpattern count = 0
@ -310,7 +310,7 @@ No match
No match
/ab(?z)cd/
Failed: unrecognized character after (? at offset 4
Failed: unrecognized character after (? or (?- at offset 4
/^abc|def/I
Capturing subpattern count = 0
@ -946,26 +946,23 @@ Failed: missing ) at offset 4
Failed: unrecognized character after (?< at offset 3
/a(?{)b/
Failed: unrecognized character after (? at offset 3
Failed: unrecognized character after (? or (?- at offset 3
/a(?{{})b/
Failed: unrecognized character after (? at offset 3
Failed: unrecognized character after (? or (?- at offset 3
/a(?{}})b/
Failed: unrecognized character after (? at offset 3
Failed: unrecognized character after (? or (?- at offset 3
/a(?{"{"})b/
Failed: unrecognized character after (? at offset 3
Failed: unrecognized character after (? or (?- at offset 3
/a(?{"{"}})b/
Failed: unrecognized character after (? at offset 3
Failed: unrecognized character after (? or (?- at offset 3
/(?(1?)a|b)/
Failed: malformed number or name after (?( at offset 4
/(?(1)a|b|c)/
Failed: conditional group contains more than two branches at offset 10
/[a[:xyz:/
Failed: missing terminating ] for character class at offset 8
@ -1599,32 +1596,32 @@ No options
First char = 'a'
Need char = 'b'
/a(?(1)b)/I
Capturing subpattern count = 0
/a(?(1)b)(.)/I
Capturing subpattern count = 1
No options
First char = 'a'
No need char
/a(?(1)bag|big)/I
Capturing subpattern count = 0
/a(?(1)bag|big)(.)/I
Capturing subpattern count = 1
No options
First char = 'a'
Need char = 'g'
/a(?(1)bag|big)*/I
Capturing subpattern count = 0
/a(?(1)bag|big)*(.)/I
Capturing subpattern count = 1
No options
First char = 'a'
No need char
/a(?(1)bag|big)+/I
Capturing subpattern count = 0
/a(?(1)bag|big)+(.)/I
Capturing subpattern count = 1
No options
First char = 'a'
Need char = 'g'
/a(?(1)b..|b..)/I
Capturing subpattern count = 0
/a(?(1)b..|b..)(.)/I
Capturing subpattern count = 1
No options
First char = 'a'
Need char = 'b'
@ -1905,7 +1902,7 @@ No need char
------------------------------------------------------------------
Bra
^
[\x00-/:-@[-`{-\xff]
[\x00-/:-@[-`{-\xff] (neg)
Ket
End
------------------------------------------------------------------
@ -1931,7 +1928,7 @@ No need char
------------------------------------------------------------------
Bra
^
[\x00-@[-`{-\xff]
[\x00-@[-`{-\xff] (neg)
Ket
End
------------------------------------------------------------------
@ -1965,7 +1962,7 @@ No need char
------------------------------------------------------------------
Bra
^
[\x80-\xff]
[\x80-\xff] (neg)
Ket
End
------------------------------------------------------------------
@ -1991,7 +1988,7 @@ No need char
------------------------------------------------------------------
Bra
^
[\x00-\x08\x0a-\x1f!-\xff]
[\x00-\x08\x0a-\x1f!-\xff] (neg)
Ket
End
------------------------------------------------------------------
@ -2142,7 +2139,7 @@ No need char
------------------------------------------------------------------
Bra
^
[ -~\x80-\xff]
[ -~\x80-\xff] (neg)
Ket
End
------------------------------------------------------------------
@ -2155,7 +2152,7 @@ No need char
------------------------------------------------------------------
Bra
^
[\x00-/12:-\xff]
[\x00-/12:-\xff] (neg)
Ket
End
------------------------------------------------------------------
@ -2168,7 +2165,7 @@ No need char
------------------------------------------------------------------
Bra
^
[\x00-\x08\x0a-\x1f!-\xff]
[\x00-\x08\x0a-\x1f!-\xff] (neg)
Ket
End
------------------------------------------------------------------
@ -2736,7 +2733,7 @@ No need char
/[\S]/DZ
------------------------------------------------------------------
Bra
[\x00-\x08\x0b\x0e-\x1f!-\xff]
[\x00-\x08\x0b\x0e-\x1f!-\xff] (neg)
Ket
End
------------------------------------------------------------------
@ -3441,26 +3438,26 @@ No options
No first char
Need char = 'a'
/(?(1)ab|ac)/I
Capturing subpattern count = 0
/(?(1)ab|ac)(.)/I
Capturing subpattern count = 1
No options
First char = 'a'
No need char
/(?(1)abz|acz)/I
Capturing subpattern count = 0
/(?(1)abz|acz)(.)/I
Capturing subpattern count = 1
No options
First char = 'a'
Need char = 'z'
/(?(1)abz)/I
Capturing subpattern count = 0
/(?(1)abz)(.)/I
Capturing subpattern count = 1
No options
No first char
No need char
/(?(1)abz)123/I
Capturing subpattern count = 0
/(?(1)abz)(1)23/I
Capturing subpattern count = 1
No options
No first char
Need char = '3'
@ -8308,7 +8305,7 @@ Failed: reference to non-existent subpattern at offset 6
/((?(-2)a))/BZ
Failed: reference to non-existent subpattern at offset 7
/^(?(+1)X|Y)/BZ
/^(?(+1)X|Y)(.)/BZ
------------------------------------------------------------------
Bra
^
@ -8318,11 +8315,15 @@ Failed: reference to non-existent subpattern at offset 7
Alt
Y
Ket
CBra 1
Any
Ket
Ket
End
------------------------------------------------------------------
Y
0: Y
Y!
0: Y!
1: !
/(foo)\Kbar/
foobar
@ -9302,4 +9303,86 @@ Forced newline sequence: CR
First char = 'a'
Need char = 'b'
/(?<a>)(?&)/
Failed: subpattern name expected at offset 9
/(?<abc>)(?&a)/
Failed: reference to non-existent subpattern at offset 12
/(?<a>)(?&aaaaaaaaaaaaaaaaaaaaaaa)/
Failed: reference to non-existent subpattern at offset 32
/(?+-a)/
Failed: digit expected after (?+ at offset 3
/(?-+a)/
Failed: unrecognized character after (? or (?- at offset 3
/(?(-1))/
Failed: reference to non-existent subpattern at offset 6
/(?(+10))/
Failed: reference to non-existent subpattern at offset 7
/(?(10))/
Failed: reference to non-existent subpattern at offset 6
/(?(+2))()()/
/(?(2))()()/
/\k''/
Failed: subpattern name expected at offset 3
/\k<>/
Failed: subpattern name expected at offset 3
/\k{}/
Failed: subpattern name expected at offset 3
/(?P=)/
Failed: subpattern name expected at offset 4
/(?P>)/
Failed: subpattern name expected at offset 4
/(?!\w)(?R)/
Failed: recursive call could loop indefinitely at offset 9
/(?=\w)(?R)/
Failed: recursive call could loop indefinitely at offset 9
/(?<!\w)(?R)/
Failed: recursive call could loop indefinitely at offset 10
/(?<=\w)(?R)/
Failed: recursive call could loop indefinitely at offset 10
/[[:foo:]]/
Failed: unknown POSIX class name at offset 3
/[[:1234:]]/
Failed: unknown POSIX class name at offset 3
/[[:f\oo:]]/
Failed: unknown POSIX class name at offset 3
/[[: :]]/
Failed: unknown POSIX class name at offset 3
/[[:...:]]/
Failed: unknown POSIX class name at offset 3
/[[:l\ower:]]/
Failed: unknown POSIX class name at offset 3
/[[:abc\:]]/
Failed: unknown POSIX class name at offset 3
/[abc[:x\]pqr:]]/
Failed: unknown POSIX class name at offset 6
/[[:a\dz:]]/
Failed: unknown POSIX class name at offset 3
/ End of testinput2 /

@ -938,4 +938,135 @@ No match
0: +\x{a3}
0: ==
/\S/8g
\x{442}\x{435}\x{441}\x{442}
0: \x{442}
0: \x{435}
0: \x{441}
0: \x{442}
/[\S]/8g
\x{442}\x{435}\x{441}\x{442}
0: \x{442}
0: \x{435}
0: \x{441}
0: \x{442}
/\D/8g
\x{442}\x{435}\x{441}\x{442}
0: \x{442}
0: \x{435}
0: \x{441}
0: \x{442}
/[\D]/8g
\x{442}\x{435}\x{441}\x{442}
0: \x{442}
0: \x{435}
0: \x{441}
0: \x{442}
/\W/8g
\x{2442}\x{2435}\x{2441}\x{2442}
0: \x{2442}
0: \x{2435}
0: \x{2441}
0: \x{2442}
/[\W]/8g
\x{2442}\x{2435}\x{2441}\x{2442}
0: \x{2442}
0: \x{2435}
0: \x{2441}
0: \x{2442}
/[\S\s]*/8
abc\n\r\x{442}\x{435}\x{441}\x{442}xyz
0: abc\x{0a}\x{0d}\x{442}\x{435}\x{441}\x{442}xyz
/[\x{41f}\S]/8g
\x{442}\x{435}\x{441}\x{442}
0: \x{442}
0: \x{435}
0: \x{441}
0: \x{442}
/.[^\S]./8g
abc def\x{442}\x{443}xyz\npqr
0: c d
0: z\x{0a}p
/.[^\S\n]./8g
abc def\x{442}\x{443}xyz\npqr
0: c d
/[[:^alnum:]]/8g
+\x{2442}
0: +
0: \x{2442}
/[[:^alpha:]]/8g
+\x{2442}
0: +
0: \x{2442}
/[[:^ascii:]]/8g
A\x{442}
0: \x{442}
/[[:^blank:]]/8g
A\x{442}
0: A
0: \x{442}
/[[:^cntrl:]]/8g
A\x{442}
0: A
0: \x{442}
/[[:^digit:]]/8g
A\x{442}
0: A
0: \x{442}
/[[:^graph:]]/8g
\x19\x{e01ff}
0: \x{19}
0: \x{e01ff}
/[[:^lower:]]/8g
A\x{422}
0: A
0: \x{422}
/[[:^print:]]/8g
\x{19}\x{e01ff}
0: \x{19}
0: \x{e01ff}
/[[:^punct:]]/8g
A\x{442}
0: A
0: \x{442}
/[[:^space:]]/8g
A\x{442}
0: A
0: \x{442}
/[[:^upper:]]/8g
a\x{442}
0: a
0: \x{442}
/[[:^word:]]/8g
+\x{2442}
0: +
0: \x{2442}
/[[:^xdigit:]]/8g
M\x{442}
0: M
0: \x{442}
/ End of testinput4 /

@ -1595,4 +1595,17 @@ No match
a\x0bb\<bsr_anycrlf>
No match
/.*a.*=.b.*/8<ANY>
QQQ\x{2029}ABCaXYZ=!bPQR
0: ABCaXYZ=!bPQR
** Failers
No match
a\x{2029}b
No match
\x61\xe2\x80\xa9\x62
No match
/[[:a\x{100}b:]]/8
Failed: unknown POSIX class name at offset 3
/ End of testinput5 /

@ -1522,4 +1522,161 @@ No match
/(\p{Yi}{0,3}+\277)*/
/^[\p{Arabic}]/8
\x{60e}
0: \x{60e}
\x{656}
0: \x{656}
\x{657}
0: \x{657}
\x{658}
0: \x{658}
\x{659}
0: \x{659}
\x{65a}
0: \x{65a}
\x{65b}
0: \x{65b}
\x{65c}
0: \x{65c}
\x{65d}
0: \x{65d}
\x{65e}
0: \x{65e}
\x{66a}
0: \x{66a}
\x{6e9}
0: \x{6e9}
\x{6ef}
0: \x{6ef}
\x{6fa}
0: \x{6fa}
** Failers
No match
\x{600}
No match
\x{650}
No match
\x{651}
No match
\x{652}
No match
\x{653}
No match
\x{654}
No match
\x{655}
No match
\x{65f}
No match
/^\p{Cyrillic}/8
\x{1d2b}
0: \x{1d2b}
/^\p{Common}/8
\x{589}
0: \x{589}
\x{60c}
0: \x{60c}
\x{61f}
0: \x{61f}
\x{964}
0: \x{964}
\x{965}
0: \x{965}
\x{970}
0: \x{970}
/^\p{Inherited}/8
\x{64b}
0: \x{64b}
\x{654}
0: \x{654}
\x{655}
0: \x{655}
\x{200c}
0: \x{200c}
** Failers
No match
\x{64a}
No match
\x{656}
No match
/^\p{Shavian}/8
\x{10450}
0: \x{10450}
\x{1047f}
0: \x{1047f}
/^\p{Deseret}/8
\x{10400}
0: \x{10400}
\x{1044f}
0: \x{1044f}
/^\p{Osmanya}/8
\x{10480}
0: \x{10480}
\x{1049d}
0: \x{1049d}
\x{104a0}
0: \x{104a0}
\x{104a9}
0: \x{104a9}
** Failers
No match
\x{1049e}
No match
\x{1049f}
No match
\x{104aa}
No match
/\p{Zl}{2,3}+/8BZ
------------------------------------------------------------------
Bra
prop Zl {2}
prop Zl ?+
Ket
End
------------------------------------------------------------------
\xe2\x80\xa8\xe2\x80\xa8
0: \x{2028}\x{2028}
\x{2028}\x{2028}\x{2028}
0: \x{2028}\x{2028}\x{2028}
/\p{Zl}/8BZ
------------------------------------------------------------------
Bra
prop Zl
Ket
End
------------------------------------------------------------------
/\p{Lu}{3}+/8BZ
------------------------------------------------------------------
Bra
prop Lu {3}
Ket
End
------------------------------------------------------------------
/\pL{2}+/8BZ
------------------------------------------------------------------
Bra
prop L {2}
Ket
End
------------------------------------------------------------------
/\p{Cc}{2}+/8BZ
------------------------------------------------------------------
Bra
prop Cc {2}
Ket
End
------------------------------------------------------------------
/ End of testinput6 /

@ -539,7 +539,8 @@ static const cnode ucp_table[] = {
{ 0x21000293, 0x14000000 },
{ 0x21000294, 0x1c000000 },
{ 0x21800295, 0x1400001a },
{ 0x218002b0, 0x18000011 },
{ 0x218002b0, 0x18000008 },
{ 0x098002b9, 0x18000008 },
{ 0x098002c2, 0x60000003 },
{ 0x098002c6, 0x1800000b },
{ 0x098002d2, 0x6000000d },
@ -1039,15 +1040,18 @@ static const cnode ucp_table[] = {
{ 0x198005f3, 0x54000001 },
{ 0x09800600, 0x04000003 },
{ 0x0000060b, 0x5c000000 },
{ 0x0980060c, 0x54000001 },
{ 0x0900060c, 0x54000000 },
{ 0x0000060d, 0x54000000 },
{ 0x0080060e, 0x68000001 },
{ 0x00800610, 0x30000005 },
{ 0x0900061b, 0x54000000 },
{ 0x0080061e, 0x54000001 },
{ 0x0000061e, 0x54000000 },
{ 0x0900061f, 0x54000000 },
{ 0x00800621, 0x1c000019 },
{ 0x09000640, 0x18000000 },
{ 0x00800641, 0x1c000009 },
{ 0x1b80064b, 0x30000013 },
{ 0x1b80064b, 0x3000000a },
{ 0x00800656, 0x30000008 },
{ 0x09800660, 0x34000009 },
{ 0x0080066a, 0x54000003 },
{ 0x0080066e, 0x1c000001 },
@ -1074,7 +1078,8 @@ static const cnode ucp_table[] = {
{ 0x31000711, 0x30000000 },
{ 0x31800712, 0x1c00001d },
{ 0x31800730, 0x3000001a },
{ 0x3180074d, 0x1c000020 },
{ 0x3180074d, 0x1c000002 },
{ 0x00800750, 0x1c00001d },
{ 0x37800780, 0x1c000025 },
{ 0x378007a6, 0x3000000a },
{ 0x370007b1, 0x1c000000 },
@ -1460,7 +1465,10 @@ static const cnode ucp_table[] = {
{ 0x1f0017dd, 0x30000000 },
{ 0x1f8017e0, 0x34000009 },
{ 0x1f8017f0, 0x3c000009 },
{ 0x25801800, 0x54000005 },
{ 0x25801800, 0x54000001 },
{ 0x09801802, 0x54000001 },
{ 0x25001804, 0x54000000 },
{ 0x09001805, 0x54000000 },
{ 0x25001806, 0x44000000 },
{ 0x25801807, 0x54000003 },
{ 0x2580180b, 0x30000002 },
@ -1513,14 +1521,20 @@ static const cnode ucp_table[] = {
{ 0x3d801b61, 0x68000009 },
{ 0x3d801b6b, 0x30000008 },
{ 0x3d801b74, 0x68000008 },
{ 0x21801d00, 0x1400002b },
{ 0x21801d2c, 0x18000035 },
{ 0x21801d62, 0x14000015 },
{ 0x21801d00, 0x14000025 },
{ 0x13801d26, 0x14000004 },
{ 0x0c001d2b, 0x14000000 },
{ 0x21801d2c, 0x18000030 },
{ 0x13801d5d, 0x18000004 },
{ 0x21801d62, 0x14000003 },
{ 0x13801d66, 0x14000004 },
{ 0x21801d6b, 0x1400000c },
{ 0x0c001d78, 0x18000000 },
{ 0x21801d79, 0x14000003 },
{ 0x21001d7d, 0x14000ee6 },
{ 0x21801d7e, 0x1400001c },
{ 0x21801d9b, 0x18000024 },
{ 0x21801d9b, 0x18000023 },
{ 0x13001dbf, 0x18000000 },
{ 0x1b801dc0, 0x3000000a },
{ 0x1b801dfe, 0x30000001 },
{ 0x21001e00, 0x24000001 },
@ -1982,7 +1996,9 @@ static const cnode ucp_table[] = {
{ 0x13001ffc, 0x2000fff7 },
{ 0x13801ffd, 0x60000001 },
{ 0x09802000, 0x7400000a },
{ 0x0980200b, 0x04000004 },
{ 0x0900200b, 0x04000000 },
{ 0x1b80200c, 0x04000001 },
{ 0x0980200e, 0x04000001 },
{ 0x09802010, 0x44000005 },
{ 0x09802016, 0x54000001 },
{ 0x09002018, 0x50000000 },
@ -2615,7 +2631,8 @@ static const cnode ucp_table[] = {
{ 0x090030a0, 0x44000000 },
{ 0x1d8030a1, 0x1c000059 },
{ 0x090030fb, 0x54000000 },
{ 0x098030fc, 0x18000002 },
{ 0x090030fc, 0x18000000 },
{ 0x1d8030fd, 0x18000001 },
{ 0x1d0030ff, 0x1c000000 },
{ 0x03803105, 0x1c000027 },
{ 0x17803131, 0x1c00005d },
@ -2630,7 +2647,8 @@ static const cnode ucp_table[] = {
{ 0x0980322a, 0x68000019 },
{ 0x09003250, 0x68000000 },
{ 0x09803251, 0x3c00000e },
{ 0x17803260, 0x6800001f },
{ 0x17803260, 0x6800001d },
{ 0x0980327e, 0x68000001 },
{ 0x09803280, 0x3c000009 },
{ 0x0980328a, 0x68000026 },
{ 0x098032b1, 0x3c00000e },
@ -2678,7 +2696,8 @@ static const cnode ucp_table[] = {
{ 0x1900fb3e, 0x1c000000 },
{ 0x1980fb40, 0x1c000001 },
{ 0x1980fb43, 0x1c000001 },
{ 0x1980fb46, 0x1c00006b },
{ 0x1980fb46, 0x1c000009 },
{ 0x0080fb50, 0x1c000061 },
{ 0x0080fbd3, 0x1c00016a },
{ 0x0900fd3e, 0x58000000 },
{ 0x0900fd3f, 0x48000000 },
@ -2944,7 +2963,8 @@ static const cnode ucp_table[] = {
{ 0x0d01044d, 0x1400ffd8 },
{ 0x0d01044e, 0x1400ffd8 },
{ 0x0d01044f, 0x1400ffd8 },
{ 0x2e810450, 0x1c00004d },
{ 0x2e810450, 0x1c00002f },
{ 0x2c810480, 0x1c00001d },
{ 0x2c8104a0, 0x34000009 },
{ 0x0b810800, 0x1c000005 },
{ 0x0b010808, 0x1c000000 },