mirror of
https://github.com/php/php-src.git
synced 2025-01-20 18:53:37 +08:00
Upgraded bundled PCRE to version 8.10
parent
8584b90199
commit
ef22824315
4
NEWS
@@ -1,8 +1,8 @@
-PHP NEWS
+PHP NEWS
 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
 ?? ??? 201?, PHP 5.3.99
 - Upgraded bundled sqlite to version 3.6.23.1. (Ilia)
-- Upgraded bundled PCRE to version 8.02. (Ilia)
+- Upgraded bundled PCRE to version 8.10. (Ilia)

 - Added caches to eliminate repeatable run-time bindings of functions, classes,
   constants, methods and properties (Dmitry)
@@ -1,6 +1,101 @@
 ChangeLog for PCRE
 ------------------

+Version 8.10 25-Jun-2010
+------------------------
+
+1. Added support for (*MARK:ARG) and for ARG additions to PRUNE, SKIP, and
+    THEN.
+
+2. (*ACCEPT) was not working when inside an atomic group.
+
+3. Inside a character class, \B is treated as a literal by default, but
+    faulted if PCRE_EXTRA is set. This mimics Perl's behaviour (the -w option
+    causes the error). The code is unchanged, but I tidied the documentation.
+
+4. Inside a character class, PCRE always treated \R and \X as literals,
+    whereas Perl faults them if its -w option is set. I have changed PCRE so
+    that it faults them when PCRE_EXTRA is set.
+
+5. Added support for \N, which always matches any character other than
+    newline. (It is the same as "." when PCRE_DOTALL is not set.)
+
+6. When compiling pcregrep with newer versions of gcc which may have
+    FORTIFY_SOURCE set, several warnings "ignoring return value of 'fwrite',
+    declared with attribute warn_unused_result" were given. Just casting the
+    result to (void) does not stop the warnings; a more elaborate fudge is
+    needed. I've used a macro to implement this.
+
+7. Minor change to pcretest.c to avoid a compiler warning.
+
+8. Added four artifical Unicode properties to help with an option to make
+    \s etc use properties (see next item). The new properties are: Xan
+    (alphanumeric), Xsp (Perl space), Xps (POSIX space), and Xwd (word).
+
+9. Added PCRE_UCP to make \b, \d, \s, \w, and certain POSIX character classes
+    use Unicode properties. (*UCP) at the start of a pattern can be used to set
+    this option. Modified pcretest to add /W to test this facility. Added
+    REG_UCP to make it available via the POSIX interface.
+
+10. Added --line-buffered to pcregrep.
+
+11. In UTF-8 mode, if a pattern that was compiled with PCRE_CASELESS was
+    studied, and the match started with a letter with a code point greater than
+    127 whose first byte was different to the first byte of the other case of
+    the letter, the other case of this starting letter was not recognized
+    (#976).
+
+12. If a pattern that was studied started with a repeated Unicode property
+    test, for example, \p{Nd}+, there was the theoretical possibility of
+    setting up an incorrect bitmap of starting bytes, but fortunately it could
+    not have actually happened in practice until change 8 above was made (it
+    added property types that matched character-matching opcodes).
+
+13. pcre_study() now recognizes \h, \v, and \R when constructing a bit map of
+    possible starting bytes for non-anchored patterns.
+
+14. Extended the "auto-possessify" feature of pcre_compile(). It now recognizes
+    \R, and also a number of cases that involve Unicode properties, both
+    explicit and implicit when PCRE_UCP is set.
+
+15. If a repeated Unicode property match (e.g. \p{Lu}*) was used with non-UTF-8
+    input, it could crash or give wrong results if characters with values
+    greater than 0xc0 were present in the subject string. (Detail: it assumed
+    UTF-8 input when processing these items.)
+
+16. Added a lot of (int) casts to avoid compiler warnings in systems where
+    size_t is 64-bit (#991).
+
+17. Added a check for running out of memory when PCRE is compiled with
+    --disable-stack-for-recursion (#990).
+
+18. If the last data line in a file for pcretest does not have a newline on
+    the end, a newline was missing in the output.
+
+19. The default pcre_chartables.c file recognizes only ASCII characters (values
+    less than 128) in its various bitmaps. However, there is a facility for
+    generating tables according to the current locale when PCRE is compiled. It
+    turns out that in some environments, 0x85 and 0xa0, which are Unicode space
+    characters, are recognized by isspace() and therefore were getting set in
+    these tables, and indeed these tables seem to approximate to ISO 8859. This
+    caused a problem in UTF-8 mode when pcre_study() was used to create a list
+    of bytes that can start a match. For \s, it was including 0x85 and 0xa0,
+    which of course cannot start UTF-8 characters. I have changed the code so
+    that only real ASCII characters (less than 128) and the correct starting
+    bytes for UTF-8 encodings are set for characters greater than 127 when in
+    UTF-8 mode. (When PCRE_UCP is set - see 9 above - the code is different
+    altogether.)
+
+20. Added the /T option to pcretest so as to be able to run tests with non-
+    standard character tables, thus making it possible to include the tests
+    used for 19 above in the standard set of tests.
+
+21. A pattern such as (?&t)(?#()(?(DEFINE)(?<t>a)) which has a forward
+    reference to a subpattern the other side of a comment that contains an
+    opening parenthesis caused either an internal compiling error, or a
+    reference to the wrong subpattern.
+
+
 Version 8.02 19-Mar-2010
 ------------------------

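Note on ChangeLog item 19: the fix rests on the fact that 0x85 and 0xa0 can never begin a UTF-8 character. A minimal sketch of that check follows, assuming the RFC 3629 byte ranges. The helper name can_start_utf8 is hypothetical and is not a PCRE function.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of PCRE): can byte b begin a UTF-8 character?
   Assumption: RFC 3629 ranges. 0x00-0x7F are single-byte characters,
   0xC2-0xF4 are lead bytes of multi-byte characters, and 0x80-0xBF are
   continuation bytes that only ever appear in the middle of a character. */
bool can_start_utf8(unsigned char b)
{
    return b < 0x80 || (b >= 0xC2 && b <= 0xF4);
}
```

Under these assumptions 0x85 and 0xa0 are rejected, while 0xC2 (the lead byte of the encodings of U+0085 and U+00A0) is accepted, which is why a starting-byte list for \s in UTF-8 mode needs 0xC2 rather than the raw values.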
@@ -1,6 +1,17 @@
 News about PCRE releases
 ------------------------

+Release 8.10 25-Jun-2010
+------------------------
+
+There are two major additions: support for (*MARK) and friends, and the option
+PCRE_UCP, which changes the behaviour of \b, \d, \s, and \w (and their
+opposites) so that they make use of Unicode properties. There are also a number
+of lesser new features, and several bugs have been fixed. A new option,
+--line-buffered, has been added to pcregrep, for use when it is connected to
+pipes.
+
+
 Release 8.02 19-Mar-2010
 ------------------------

@@ -188,9 +188,9 @@ significantly slower when this is done. There is more about stack usage in the
 LINKING PROGRAMS IN WINDOWS ENVIRONMENTS

 If you want to statically link a program against a PCRE library in the form of
-a non-dll .a file, you must define PCRE_STATIC before including pcre.h,
-otherwise the pcre_malloc() and pcre_free() exported functions will be declared
-__declspec(dllimport), with unwanted results.
+a non-dll .a file, you must define PCRE_STATIC before including pcre.h or
+pcrecpp.h, otherwise the pcre_malloc() and pcre_free() exported functions will
+be declared __declspec(dllimport), with unwanted results.


 CALLING CONVENTIONS IN WINDOWS ENVIRONMENTS
@@ -497,5 +497,5 @@ build.log file in the root of the package also.


 =========================
-Last Updated: 19 January 2010
+Last Updated: 26 May 2010
 ****
@@ -271,13 +271,16 @@ them both to 0; an emulation function will be used. */
 #define PACKAGE_NAME "PCRE"

 /* Define to the full name and version of this package. */
-#define PACKAGE_STRING "PCRE 8.02"
+#define PACKAGE_STRING "PCRE 8.10"

 /* Define to the one symbol short name of this package. */
 #define PACKAGE_TARNAME "pcre"

+/* Define to the home page for this package. */
+#define PACKAGE_URL ""
+
 /* Define to the version of this package. */
-#define PACKAGE_VERSION "8.02"
+#define PACKAGE_VERSION "8.10"


 /* If you are compiling for a system other than a Unix-like system or
@@ -333,7 +336,7 @@ them both to 0; an emulation function will be used. */

 /* Version number of package */
 #ifndef VERSION
-#define VERSION "8.02"
+#define VERSION "8.10"
 #endif

 /* Define to empty if `const' does not conform to ANSI C. */
File diff suppressed because it is too large
@@ -5,7 +5,7 @@
 /* This is the public header file for the PCRE library, to be #included by
 applications that call the PCRE functions.

-Copyright (c) 1997-2009 University of Cambridge
+Copyright (c) 1997-2010 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
 /* The current PCRE version information. */

 #define PCRE_MAJOR 8
-#define PCRE_MINOR 02
+#define PCRE_MINOR 10
 #define PCRE_PRERELEASE
-#define PCRE_DATE 2010-03-19
+#define PCRE_DATE 2010-06-25

 /* When an application links to a PCRE DLL in Windows, the symbols that are
 imported have to be identified as such. When building PCRE, the appropriate
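Note on the version macros in the hunk above: PCRE_MAJOR, PCRE_MINOR and PCRE_DATE are bare tokens rather than string literals, so code that wants a printable version has to stringify them. A minimal sketch, assuming the common two-level stringification idiom. The names STRING, XSTRING and version_string are illustrative, the macro values are copied from the hunk, and PCRE_PRERELEASE handling is deliberately left out.

```c
/* Values copied from the hunk above; in real code they come from pcre.h. */
#define PCRE_MAJOR 8
#define PCRE_MINOR 10
#define PCRE_DATE  2010-06-25

/* Two-level stringification: XSTRING expands its argument first, then
   STRING turns the expanded tokens into a string literal. */
#define STRING(a)  # a
#define XSTRING(s) STRING(s)

/* Illustrative helper: builds "major.minor date" from the bare tokens.
   Adjacent string literals are concatenated by the compiler. */
const char *version_string(void)
{
    return XSTRING(PCRE_MAJOR) "." XSTRING(PCRE_MINOR) " " XSTRING(PCRE_DATE);
}
```

Stringifying, rather than evaluating, is what keeps the date intact: as an expression, 2010-06-25 would be an integer subtraction, and a minor number written as 02 would be an octal literal.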
@@ -131,6 +131,7 @@ both, so we keep them all distinct. */
 #define PCRE_NO_START_OPTIMISE 0x04000000
 #define PCRE_PARTIAL_HARD 0x08000000
 #define PCRE_NOTEMPTY_ATSTART 0x10000000
+#define PCRE_UCP 0x20000000

 /* Exec-time and get/set-time error codes */

@@ -200,6 +201,7 @@ these bits, just add new ones on the end, in order to remain compatible. */
 #define PCRE_EXTRA_CALLOUT_DATA 0x0004
 #define PCRE_EXTRA_TABLES 0x0008
 #define PCRE_EXTRA_MATCH_LIMIT_RECURSION 0x0010
+#define PCRE_EXTRA_MARK 0x0020

 /* Types */

@@ -225,6 +227,7 @@ typedef struct pcre_extra {
   void *callout_data; /* Data passed back in callouts */
   const unsigned char *tables; /* Pointer to character tables */
   unsigned long int match_limit_recursion; /* Max recursive calls to match() */
+  unsigned char **mark; /* For passing back a mark pointer */
 } pcre_extra;

 /* The structure for passing out data via the pcre_callout_function. We use a
@@ -14,7 +14,7 @@ example ISO-8859-1. When dftables is run, it creates these tables in the
 current locale. If PCRE is configured with --enable-rebuild-chartables, this
 happens automatically.

-The following #includes are present because without the gcc 4.x may remove the
+The following #includes are present because without them gcc 4.x may remove the
 array definition from the final binary if PCRE is built into a static library
 and dead code stripping is activated. This leads to link errors. Pulling in the
 header ensures that the array gets flagged as "someone outside this compilation
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -192,9 +192,7 @@ stdint.h is available, include it; it may define INT64_MAX. Systems that do not
 have stdint.h (e.g. Solaris) may have inttypes.h. The macro int64_t may be set
 by "configure". */

-#ifdef PHP_WIN32
-#include "win32/php_stdint.h"
-#elif HAVE_STDINT_H
+#if HAVE_STDINT_H
 #include <stdint.h>
 #elif HAVE_INTTYPES_H
 #include <inttypes.h>
@@ -477,7 +475,8 @@ know we are in UTF-8 mode. */
       } \
     }

-/* Get the next character, testing for UTF-8 mode, and advancing the pointer */
+/* Get the next character, testing for UTF-8 mode, and advancing the pointer.
+This is called when we don't know if we are in UTF-8 mode. */

 #define GETCHARINCTEST(c, eptr) \
   c = *eptr++; \
@@ -514,7 +513,7 @@ if there are extra bytes. This is called when we know we are in UTF-8 mode. */

 /* Get the next UTF-8 character, testing for UTF-8 mode, not advancing the
 pointer, incrementing length if there are extra bytes. This is called when we
-know we are in UTF-8 mode. */
+do not know if we are in UTF-8 mode. */

 #define GETCHARLENTEST(c, eptr, len) \
   c = *eptr; \
@@ -582,7 +581,7 @@ time, run time, or study time, respectively. */
    PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8| \
    PCRE_NO_AUTO_CAPTURE|PCRE_NO_UTF8_CHECK|PCRE_AUTO_CALLOUT|PCRE_FIRSTLINE| \
    PCRE_DUPNAMES|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE| \
-   PCRE_JAVASCRIPT_COMPAT)
+   PCRE_JAVASCRIPT_COMPAT|PCRE_UCP)

 #define PUBLIC_EXEC_OPTIONS \
   (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NOTEMPTY_ATSTART| \
@@ -877,6 +876,7 @@ so that PCRE works on both ASCII and EBCDIC platforms, in non-UTF-mode only. */
 #define STRING_COMMIT0 "COMMIT\0"
 #define STRING_F0 "F\0"
 #define STRING_FAIL0 "FAIL\0"
+#define STRING_MARK0 "MARK\0"
 #define STRING_PRUNE0 "PRUNE\0"
 #define STRING_SKIP0 "SKIP\0"
 #define STRING_THEN "THEN"
@@ -906,6 +906,7 @@ so that PCRE works on both ASCII and EBCDIC platforms, in non-UTF-mode only. */
 #define STRING_BSR_ANYCRLF_RIGHTPAR "BSR_ANYCRLF)"
 #define STRING_BSR_UNICODE_RIGHTPAR "BSR_UNICODE)"
 #define STRING_UTF8_RIGHTPAR "UTF8)"
+#define STRING_UCP_RIGHTPAR "UCP)"

 #else /* SUPPORT_UTF8 */

@@ -1129,6 +1130,7 @@ only. */
 #define STRING_COMMIT0 STR_C STR_O STR_M STR_M STR_I STR_T "\0"
 #define STRING_F0 STR_F "\0"
 #define STRING_FAIL0 STR_F STR_A STR_I STR_L "\0"
+#define STRING_MARK0 STR_M STR_A STR_R STR_K "\0"
 #define STRING_PRUNE0 STR_P STR_R STR_U STR_N STR_E "\0"
 #define STRING_SKIP0 STR_S STR_K STR_I STR_P "\0"
 #define STRING_THEN STR_T STR_H STR_E STR_N
@@ -1158,6 +1160,7 @@ only. */
 #define STRING_BSR_ANYCRLF_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
 #define STRING_BSR_UNICODE_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_U STR_N STR_I STR_C STR_O STR_D STR_E STR_RIGHT_PARENTHESIS
 #define STRING_UTF8_RIGHTPAR STR_U STR_T STR_F STR_8 STR_RIGHT_PARENTHESIS
+#define STRING_UCP_RIGHTPAR STR_U STR_C STR_P STR_RIGHT_PARENTHESIS

 #endif /* SUPPORT_UTF8 */

@@ -1190,9 +1193,13 @@ only. */

 #define PT_ANY 0 /* Any property - matches all chars */
 #define PT_LAMP 1 /* L& - the union of Lu, Ll, Lt */
-#define PT_GC 2 /* General characteristic (e.g. L) */
-#define PT_PC 3 /* Particular characteristic (e.g. Lu) */
+#define PT_GC 2 /* Specified general characteristic (e.g. L) */
+#define PT_PC 3 /* Specified particular characteristic (e.g. Lu) */
 #define PT_SC 4 /* Script (e.g. Han) */
+#define PT_ALNUM 5 /* Alphanumeric - the union of L and N */
+#define PT_SPACE 6 /* Perl space - Z plus 9,10,12,13 */
+#define PT_PXSPACE 7 /* POSIX space - Z plus 9,10,11,12,13 */
+#define PT_WORD 8 /* Word - L plus N plus underscore */

 /* Flag bits and data types for the extended class (OP_XCLASS) for classes that
 contain UTF-8 characters with values greater than 255. */
@@ -1209,9 +1216,15 @@ contain UTF-8 characters with values greater than 255. */
 /* These are escaped items that aren't just an encoding of a particular data
 value such as \n. They must have non-zero values, as check_escape() returns
 their negation. Also, they must appear in the same order as in the opcode
-definitions below, up to ESC_z. There's a dummy for OP_ANY because it
-corresponds to "." rather than an escape sequence, and another for OP_ALLANY
-(which is used for [^] in JavaScript compatibility mode).
+definitions below, up to ESC_z. There's a dummy for OP_ALLANY because it
+corresponds to "." in DOTALL mode rather than an escape sequence. It is also
+used for [^] in JavaScript compatibility mode. In non-DOTALL mode, "." behaves
+like \N.
+
+The special values ESC_DU, ESC_du, etc. are used instead of ESC_D, ESC_d, etc.
+when PCRE_UCP is set, when replacement of \d etc by \p sequences is required.
+They must be contiguous, and remain in order so that the replacements can be
+looked up from a table.

 The final escape must be ESC_REF as subsequent values are used for
 backreferences (\1, \2, \3, etc). There are two tests in the code for an escape
@@ -1221,11 +1234,12 @@ put in between that don't consume a character, that code will have to change.
 */

 enum { ESC_A = 1, ESC_G, ESC_K, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s,
-       ESC_W, ESC_w, ESC_dum1, ESC_dum2, ESC_C, ESC_P, ESC_p, ESC_R, ESC_H,
-       ESC_h, ESC_V, ESC_v, ESC_X, ESC_Z, ESC_z, ESC_E, ESC_Q, ESC_g, ESC_k,
+       ESC_W, ESC_w, ESC_N, ESC_dum, ESC_C, ESC_P, ESC_p, ESC_R, ESC_H,
+       ESC_h, ESC_V, ESC_v, ESC_X, ESC_Z, ESC_z,
+       ESC_E, ESC_Q, ESC_g, ESC_k,
+       ESC_DU, ESC_du, ESC_SU, ESC_su, ESC_WU, ESC_wu,
        ESC_REF };

-
 /* Opcode table: Starting from 1 (i.e. after OP_END), the values up to
 OP_EOD must correspond in order to the list of escapes immediately above.

@@ -1249,8 +1263,8 @@ enum {
   OP_WHITESPACE, /* 9 \s */
   OP_NOT_WORDCHAR, /* 10 \W */
   OP_WORDCHAR, /* 11 \w */
-  OP_ANY, /* 12 Match any character (subject to DOTALL) */
-  OP_ALLANY, /* 13 Match any character (not subject to DOTALL) */
+  OP_ANY, /* 12 Match any character except newline */
+  OP_ALLANY, /* 13 Match any character */
   OP_ANYBYTE, /* 14 Match any byte (\C); different to OP_ANY for UTF-8 */
   OP_NOTPROP, /* 15 \P (not Unicode property) */
   OP_PROP, /* 16 \p (Unicode property) */
@@ -1380,20 +1394,24 @@ enum {

   /* These are backtracking control verbs */

-  OP_PRUNE, /* 107 */
-  OP_SKIP, /* 108 */
-  OP_THEN, /* 109 */
-  OP_COMMIT, /* 110 */
+  OP_MARK, /* 107 always has an argument */
+  OP_PRUNE, /* 108 */
+  OP_PRUNE_ARG, /* 109 same, but with argument */
+  OP_SKIP, /* 110 */
+  OP_SKIP_ARG, /* 111 same, but with argument */
+  OP_THEN, /* 112 */
+  OP_THEN_ARG, /* 113 same, but with argument */
+  OP_COMMIT, /* 114 */

   /* These are forced failure and success verbs */

-  OP_FAIL, /* 111 */
-  OP_ACCEPT, /* 112 */
-  OP_CLOSE, /* 113 Used before OP_ACCEPT to close open captures */
+  OP_FAIL, /* 115 */
+  OP_ACCEPT, /* 116 */
+  OP_CLOSE, /* 117 Used before OP_ACCEPT to close open captures */

   /* This is used to skip a subpattern with a {0} quantifier */

-  OP_SKIPZERO, /* 114 */
+  OP_SKIPZERO, /* 118 */

   /* This is not an opcode, but is used to check that tables indexed by opcode
   are the correct length, in order to catch updating errors - there have been
@@ -1404,7 +1422,7 @@ enum {

 /* *** NOTE NOTE NOTE *** Whenever the list above is updated, the two macro
 definitions that follow must also be updated to match. There are also tables
-called "coptable" cna "poptable" in pcre_dfa_exec.c that must be updated. */
+called "coptable" and "poptable" in pcre_dfa_exec.c that must be updated. */


 /* This macro defines textual names for all the opcodes. These are used only
@@ -1429,7 +1447,8 @@ for debugging. The macro is referenced only in pcre_printint.c. */
   "Once", "Bra", "CBra", "Cond", "SBra", "SCBra", "SCond", \
   "Cond ref", "Cond nref", "Cond rec", "Cond nrec", "Cond def", \
   "Brazero", "Braminzero", \
-  "*PRUNE", "*SKIP", "*THEN", "*COMMIT", "*FAIL", "*ACCEPT", \
+  "*MARK", "*PRUNE", "*PRUNE", "*SKIP", "*SKIP", \
+  "*THEN", "*THEN", "*COMMIT", "*FAIL", "*ACCEPT", \
   "Close", "Skip zero"


@@ -1495,8 +1514,9 @@ in UTF-8 mode. The code that uses this table must know about such things. */
   3, 3, /* RREF, NRREF */ \
   1, /* DEF */ \
   1, 1, /* BRAZERO, BRAMINZERO */ \
-  1, 1, 1, 1, /* PRUNE, SKIP, THEN, COMMIT, */ \
-  1, 1, 3, 1 /* FAIL, ACCEPT, CLOSE, SKIPZERO */
+  3, 1, 3, /* MARK, PRUNE, PRUNE_ARG, */ \
+  1, 3, 1, 3, /* SKIP, SKIP_ARG, THEN, THEN_ARG, */ \
+  1, 1, 1, 3, 1 /* COMMIT, FAIL, ACCEPT, CLOSE, SKIPZERO */


 /* A magic value for OP_RREF and OP_NRREF to indicate the "any recursion"
@@ -1514,7 +1534,7 @@ enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9,
        ERR30, ERR31, ERR32, ERR33, ERR34, ERR35, ERR36, ERR37, ERR38, ERR39,
        ERR40, ERR41, ERR42, ERR43, ERR44, ERR45, ERR46, ERR47, ERR48, ERR49,
        ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
-       ERR60, ERR61, ERR62, ERR63, ERR64, ERR65, ERRCOUNT };
+       ERR60, ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERRCOUNT };

 /* The real format of the start of the pcre block; the index of names and the
 code vector run on as long as necessary after the end. We store an explicit
@@ -1657,6 +1677,7 @@ typedef struct match_data {
   BOOL noteol; /* NOTEOL flag */
   BOOL utf8; /* UTF8 flag */
   BOOL jscript_compat; /* JAVASCRIPT_COMPAT flag */
+  BOOL use_ucp; /* PCRE_UCP flag */
   BOOL endonly; /* Dollar not before final \n */
   BOOL notempty; /* Empty string match not wanted */
   BOOL notempty_atstart; /* Empty string match at start not wanted */
@@ -1676,6 +1697,7 @@ typedef struct match_data {
   int eptrn; /* Next free eptrblock */
   recursion_info *recursive; /* Linked list of recursion data */
   void *callout_data; /* To pass back to callouts */
+  const uschar *mark; /* Mark pointer to pass back */
 } match_data;

 /* A similar structure is used for the same purpose by the DFA matching
@@ -534,6 +534,14 @@ for(;;)
       }
     break;

+    case OP_MARK:
+    case OP_PRUNE_ARG:
+    case OP_SKIP_ARG:
+    case OP_THEN_ARG:
+    fprintf(f, " %s %s", OP_names[*code], code + 2);
+    extra += code[1];
+    break;
+
     /* Anything else is just an item with no data*/

     default:
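Note on the hunk above: it prints the name at code + 2 and adds code[1] to the item length, which suggests a layout of opcode byte, name-length byte, name bytes, terminating zero. That layout is inferred from the diff (the base length 3 for MARK in the lengths table fits it) and is not stated in it. A minimal sketch of walking such items; the opcode values and all names here are hypothetical, not PCRE's.

```c
#include <stddef.h>

/* Hypothetical opcodes, for illustration only. */
enum { X_END = 0, X_MARK = 1, X_PRUNE = 2 };

/* Fixed part of each item's length, indexed by opcode. X_MARK counts the
   opcode byte, the name-length byte and the terminating zero. */
static const unsigned char x_lengths[] = { 1, 3, 1 };

/* Count the items before X_END, adding the variable-length name to the
   fixed length of any X_MARK item. */
size_t count_items(const unsigned char *code)
{
    size_t n = 0;
    while (*code != X_END) {
        size_t len = x_lengths[*code];
        if (*code == X_MARK) len += code[1];   /* add in the name length */
        code += len;
        n++;
    }
    return n;
}

/* The name of an X_MARK item starts two bytes in and is zero-terminated. */
const char *mark_name(const unsigned char *code)
{
    return (const char *)(code + 2);
}
```

For the byte sequence { X_MARK, 3, 'a', 'b', 'c', 0, X_PRUNE, X_END } the walker steps 6 bytes over the mark and 1 byte over the prune, so it reports two items.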
@@ -46,6 +46,7 @@ supporting functions. */

 #include "pcre_internal.h"

+#define SET_BIT(c) start_bits[c/8] |= (1 << (c&7))

 /* Returns from set_start_bits() */

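Note on SET_BIT: the macro treats start_bits as a 256-bit map held in 32 bytes, with byte c/8 and bit c&7 standing for byte value c. A minimal sketch of that addressing; the helper names mark_start_byte and is_start_byte are hypothetical, and the macro arguments are parenthesised here for safety.

```c
typedef unsigned char uschar;

/* Same addressing as the SET_BIT macro in the hunk above. */
#define SET_BIT(c) start_bits[(c)/8] |= (1 << ((c)&7))

/* Hypothetical wrapper: record that byte value c may start a match. */
void mark_start_byte(uschar *start_bits, unsigned int c)
{
    SET_BIT(c);
}

/* Hypothetical query: was byte value c recorded? */
int is_start_byte(const uschar *start_bits, unsigned int c)
{
    return (start_bits[c / 8] & (1 << (c & 7))) != 0;
}
```

With this layout, vertical tab (0x0B) lands in byte 1 as mask 0x08, and 0xC2 lands in byte 24 as mask 0x04.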
@@ -411,6 +412,15 @@ for (;;)
 #endif
     break;

+    /* Skip these, but we need to add in the name length. */
+
+    case OP_MARK:
+    case OP_PRUNE_ARG:
+    case OP_SKIP_ARG:
+    case OP_THEN_ARG:
+    cc += _pcre_OP_lengths[op] + cc[1];
+    break;
+
     /* For the record, these are the opcodes that are matched by "default":
     OP_ACCEPT, OP_CLOSE, OP_COMMIT, OP_FAIL, OP_PRUNE, OP_SET_SOM, OP_SKIP,
     OP_THEN. */
@@ -429,25 +439,121 @@ for (;;)
 * Set a bit and maybe its alternate case *
 *************************************************/

-/* Given a character, set its bit in the table, and also the bit for the other
-version of a letter if we are caseless.
+/* Given a character, set its first byte's bit in the table, and also the
+corresponding bit for the other version of a letter if we are caseless. In
+UTF-8 mode, for characters greater than 127, we can only do the caseless thing
+when Unicode property support is available.

 Arguments:
   start_bits    points to the bit map
-  c             is the character
+  p             points to the character
   caseless      the caseless flag
   cd            the block with char table pointers
+  utf8          TRUE for UTF-8 mode

-Returns:        nothing
+Returns:        pointer after the character
+*/
+
+static const uschar *
+set_table_bit(uschar *start_bits, const uschar *p, BOOL caseless,
+  compile_data *cd, BOOL utf8)
+{
+unsigned int c = *p;
+
+SET_BIT(c);
+
+#ifdef SUPPORT_UTF8
+if (utf8 && c > 127)
+  {
+  GETCHARINC(c, p);
+#ifdef SUPPORT_UCP
+  if (caseless)
+    {
+    uschar buff[8];
+    c = UCD_OTHERCASE(c);
+    (void)_pcre_ord2utf8(c, buff);
+    SET_BIT(buff[0]);
+    }
+#endif
+  return p;
+  }
+#endif
+
+/* Not UTF-8 mode, or character is less than 127. */
+
+if (caseless && (cd->ctypes[c] & ctype_letter) != 0) SET_BIT(cd->fcc[c]);
+return p + 1;
+}
+
+
+
+/*************************************************
+* Set bits for a positive character type *
+*************************************************/
+
+/* This function sets starting bits for a character type. In UTF-8 mode, we can
+only do a direct setting for bytes less than 128, as otherwise there can be
+confusion with bytes in the middle of UTF-8 characters. In a "traditional"
+environment, the tables will only recognize ASCII characters anyway, but in at
+least one Windows environment, some higher bytes bits were set in the tables.
+So we deal with that case by considering the UTF-8 encoding.
+
+Arguments:
+  start_bits    the starting bitmap
+  cbit type     the type of character wanted
+  table_limit   32 for non-UTF-8; 16 for UTF-8
+  cd            the block with char table pointers
+
+Returns:        nothing
 */

 static void
-set_table_bit(uschar *start_bits, unsigned int c, BOOL caseless,
+set_type_bits(uschar *start_bits, int cbit_type, int table_limit,
   compile_data *cd)
 {
-start_bits[c/8] |= (1 << (c&7));
-if (caseless && (cd->ctypes[c] & ctype_letter) != 0)
-  start_bits[cd->fcc[c]/8] |= (1 << (cd->fcc[c]&7));
+register int c;
+for (c = 0; c < table_limit; c++) start_bits[c] |= cd->cbits[c+cbit_type];
+if (table_limit == 32) return;
+for (c = 128; c < 256; c++)
+  {
+  if ((cd->cbits[c/8] & (1 << (c&7))) != 0)
+    {
+    uschar buff[8];
+    (void)_pcre_ord2utf8(c, buff);
+    SET_BIT(buff[0]);
+    }
+  }
 }
+
+
+/*************************************************
+* Set bits for a negative character type *
+*************************************************/
+
+/* This function sets starting bits for a negative character type such as \D.
+In UTF-8 mode, we can only do a direct setting for bytes less than 128, as
+otherwise there can be confusion with bytes in the middle of UTF-8 characters.
+Unlike in the positive case, where we can set appropriate starting bits for
+specific high-valued UTF-8 characters, in this case we have to set the bits for
+all high-valued characters. The lowest is 0xc2, but we overkill by starting at
+0xc0 (192) for simplicity.
+
+Arguments:
+  start_bits    the starting bitmap
+  cbit type     the type of character wanted
+  table_limit   32 for non-UTF-8; 16 for UTF-8
+  cd            the block with char table pointers
+
+Returns:        nothing
+*/
+
+static void
+set_nottype_bits(uschar *start_bits, int cbit_type, int table_limit,
+  compile_data *cd)
+{
+register int c;
+for (c = 0; c < table_limit; c++) start_bits[c] |= ~cd->cbits[c+cbit_type];
+if (table_limit != 32) for (c = 24; c < 32; c++) start_bits[c] = 0xff;
+}


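Note on the hunk above: in UTF-8 mode the starting-byte map records the first byte of a character's encoding, not its code point. A minimal sketch of that step, assuming code points up to U+10FFFF; ord_to_utf8 is a stand-in written for this note and is not the _pcre_ord2utf8 function the diff calls, and first_byte is likewise a hypothetical name.

```c
/* Stand-in encoder (assumption: cp <= 0x10FFFF). Writes the UTF-8 encoding
   of cp into buf and returns the number of bytes written. */
int ord_to_utf8(unsigned int cp, unsigned char *buf)
{
    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    buf[0] = (unsigned char)(0xF0 | (cp >> 18));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* The byte whose bit would be set in the starting-byte map for cp. */
unsigned char first_byte(unsigned int cp)
{
    unsigned char buf[4];
    (void)ord_to_utf8(cp, buf);
    return buf[0];
}
```

This is why a table entry for 0xA0 ends up as a bit for 0xC2 in UTF-8 mode: U+00A0 encodes as C2 A0, and only the lead byte can begin a match.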
@@ -482,6 +588,7 @@ set_start_bits(const uschar *code, uschar *start_bits, BOOL caseless,
 {
 register int c;
 int yield = SSB_DONE;
+int table_limit = utf8? 16:32;

 #if 0
 /* ========================================================================= */
@@ -605,12 +712,7 @@ do
       case OP_QUERY:
       case OP_MINQUERY:
       case OP_POSQUERY:
-      set_table_bit(start_bits, tcode[1], caseless, cd);
-      tcode += 2;
-#ifdef SUPPORT_UTF8
-      if (utf8 && tcode[-1] >= 0xc0)
-        tcode += _pcre_utf8_table4[tcode[-1] & 0x3f];
-#endif
+      tcode = set_table_bit(start_bits, tcode + 1, caseless, cd, utf8);
       break;

       /* Single-char upto sets the bit and tries the next */
@@ -618,12 +720,7 @@ do
       case OP_UPTO:
       case OP_MINUPTO:
       case OP_POSUPTO:
-      set_table_bit(start_bits, tcode[3], caseless, cd);
-      tcode += 4;
-#ifdef SUPPORT_UTF8
-      if (utf8 && tcode[-1] >= 0xc0)
-        tcode += _pcre_utf8_table4[tcode[-1] & 0x3f];
-#endif
+      tcode = set_table_bit(start_bits, tcode + 3, caseless, cd, utf8);
       break;

       /* At least one single char sets the bit and stops */
@ -636,59 +733,86 @@ do
|
||||
case OP_PLUS:
|
||||
case OP_MINPLUS:
|
||||
case OP_POSPLUS:
|
||||
set_table_bit(start_bits, tcode[1], caseless, cd);
|
||||
(void)set_table_bit(start_bits, tcode + 1, caseless, cd, utf8);
|
||||
try_next = FALSE;
|
||||
break;
|
||||
|
||||
/* Single character type sets the bits and stops */
|
||||
/* Special spacing and line-terminating items. These recognize specific
|
||||
lists of characters. The difference between VSPACE and ANYNL is that the
|
||||
latter can match the two-character CRLF sequence, but that is not
|
||||
relevant for finding the first character, so their code here is
|
||||
identical. */
|
||||
|
||||
case OP_HSPACE:
|
||||
SET_BIT(0x09);
|
||||
SET_BIT(0x20);
|
||||
if (utf8)
|
||||
{
|
||||
SET_BIT(0xC2); /* For U+00A0 */
|
||||
SET_BIT(0xE1); /* For U+1680, U+180E */
|
||||
SET_BIT(0xE2); /* For U+2000 - U+200A, U+202F, U+205F */
|
||||
SET_BIT(0xE3); /* For U+3000 */
|
||||
}
|
||||
else SET_BIT(0xA0);
|
||||
try_next = FALSE;
|
||||
break;
|
||||
|
||||
case OP_ANYNL:
|
||||
case OP_VSPACE:
|
||||
SET_BIT(0x0A);
|
||||
SET_BIT(0x0B);
|
||||
SET_BIT(0x0C);
|
||||
SET_BIT(0x0D);
|
||||
if (utf8)
|
||||
{
|
||||
SET_BIT(0xC2); /* For U+0085 */
|
||||
SET_BIT(0xE2); /* For U+2028, U+2029 */
|
||||
}
|
||||
else SET_BIT(0x85);
|
||||
try_next = FALSE;
|
||||
break;
|
||||
|
||||
/* Single character types set the bits and stop. Note that if PCRE_UCP
|
||||
is set, we do not see these op codes because \d etc are converted to
|
||||
properties. Therefore, these apply in the case when only characters less
|
||||
than 256 are recognized to match the types. */
|
||||
|
||||
case OP_NOT_DIGIT:
|
||||
for (c = 0; c < 32; c++)
|
||||
start_bits[c] |= ~cd->cbits[c+cbit_digit];
|
||||
set_nottype_bits(start_bits, cbit_digit, table_limit, cd);
|
||||
try_next = FALSE;
|
||||
break;
|
||||
|
||||
case OP_DIGIT:
|
||||
for (c = 0; c < 32; c++)
|
||||
start_bits[c] |= cd->cbits[c+cbit_digit];
|
||||
set_type_bits(start_bits, cbit_digit, table_limit, cd);
|
||||
try_next = FALSE;
|
||||
break;
|
||||
|
||||
/* The cbit_space table has vertical tab as whitespace; we have to
|
||||
discard it. */
|
||||
ensure it is set as not whitespace. */
|
||||
|
||||
case OP_NOT_WHITESPACE:
|
||||
for (c = 0; c < 32; c++)
|
||||
{
|
||||
int d = cd->cbits[c+cbit_space];
|
||||
if (c == 1) d &= ~0x08;
|
||||
start_bits[c] |= ~d;
|
||||
}
|
||||
set_nottype_bits(start_bits, cbit_space, table_limit, cd);
|
||||
start_bits[1] |= 0x08;
|
||||
try_next = FALSE;
|
||||
break;
|
||||
|
||||
/* The cbit_space table has vertical tab as whitespace; we have to
|
||||
discard it. */
|
||||
not set it from the table. */
|
||||
|
||||
case OP_WHITESPACE:
|
||||
for (c = 0; c < 32; c++)
|
||||
{
|
||||
int d = cd->cbits[c+cbit_space];
|
||||
if (c == 1) d &= ~0x08;
|
||||
start_bits[c] |= d;
|
||||
}
|
||||
c = start_bits[1]; /* Save in case it was already set */
|
||||
set_type_bits(start_bits, cbit_space, table_limit, cd);
|
||||
start_bits[1] = (start_bits[1] & ~0x08) | c;
|
||||
try_next = FALSE;
|
||||
break;
|
||||
|
||||
case OP_NOT_WORDCHAR:
|
||||
for (c = 0; c < 32; c++)
|
||||
start_bits[c] |= ~cd->cbits[c+cbit_word];
|
||||
set_nottype_bits(start_bits, cbit_word, table_limit, cd);
|
||||
try_next = FALSE;
|
||||
break;
|
||||
|
||||
case OP_WORDCHAR:
|
||||
for (c = 0; c < 32; c++)
|
||||
start_bits[c] |= cd->cbits[c+cbit_word];
|
||||
set_type_bits(start_bits, cbit_word, table_limit, cd);
|
||||
try_next = FALSE;
|
||||
break;
|
||||
|
||||
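The UTF-8 branches of the OP_HSPACE case above set bits for lead bytes (0xC2, 0xE1, 0xE2, 0xE3) rather than for the code points themselves. The following is a minimal sketch of why those particular bytes cover the code points named in the comments. The helper names (map_set, map_test, utf8_lead_byte, hspace_lead_bytes_covered) are hypothetical and are not part of PCRE, and the byte = c/8, bit = c&7 map layout is an assumption, chosen to be consistent with the "c == 1 ... 0x08" handling of VT (0x0B) in the whitespace cases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: set the bit for first-byte value b in a 32-byte map.
   Assumes the layout byte = b/8, bit = b&7. */
static void map_set(unsigned char *map, unsigned int b)
{
map[b / 8] |= (unsigned char)(1u << (b & 7));
}

/* Hypothetical helper: return 1 if the bit for byte value b is set. */
static int map_test(const unsigned char *map, unsigned int b)
{
return (map[b / 8] >> (b & 7)) & 1;
}

/* First byte of the UTF-8 encoding of code point cp (cp <= 0x10FFFF). */
static unsigned int utf8_lead_byte(unsigned long cp)
{
if (cp < 0x80) return (unsigned int)cp;
if (cp < 0x800) return 0xC0u | (unsigned int)(cp >> 6);
if (cp < 0x10000) return 0xE0u | (unsigned int)(cp >> 12);
return 0xF0u | (unsigned int)(cp >> 18);
}

/* Returns 1 if the six bits set for OP_HSPACE in UTF-8 mode cover the lead
   byte of every \h code point named in the diff comments, else 0. */
static int hspace_lead_bytes_covered(void)
{
static const unsigned long hspace[] = {
  0x09, 0x20, 0xA0, 0x1680, 0x180E, 0x2000, 0x200A, 0x202F, 0x205F, 0x3000 };
unsigned char map[32];
size_t i;

memset(map, 0, sizeof(map));
map_set(map, 0x09);
map_set(map, 0x20);
map_set(map, 0xC2);   /* U+00A0 */
map_set(map, 0xE1);   /* U+1680, U+180E */
map_set(map, 0xE2);   /* U+2000 - U+200A, U+202F, U+205F */
map_set(map, 0xE3);   /* U+3000 */

for (i = 0; i < sizeof(hspace) / sizeof(hspace[0]); i++)
  if (!map_test(map, utf8_lead_byte(hspace[i]))) return 0;

/* In UTF-8 mode 0xA0 is a continuation byte, so its bit is not set here. */
assert(!map_test(map, 0xA0));
return 1;
}
```

A start-byte map built this way is deliberately coarse: a lead byte such as 0xE2 also begins many characters that are not horizontal space, so the map can only rule out starting positions, not confirm a match.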
@ -697,6 +821,7 @@ do
|
||||
|
||||
case OP_TYPEPLUS:
|
||||
case OP_TYPEMINPLUS:
|
||||
case OP_TYPEPOSPLUS:
|
||||
tcode++;
|
||||
break;
|
||||
|
||||
@ -720,52 +845,69 @@ do
|
||||
case OP_TYPEPOSQUERY:
|
||||
switch(tcode[1])
|
||||
{
|
||||
default:
|
||||
case OP_ANY:
|
||||
case OP_ALLANY:
|
||||
return SSB_FAIL;
|
||||
|
||||
case OP_HSPACE:
|
||||
SET_BIT(0x09);
|
||||
SET_BIT(0x20);
|
||||
if (utf8)
|
||||
{
|
||||
SET_BIT(0xC2); /* For U+00A0 */
|
||||
SET_BIT(0xE1); /* For U+1680, U+180E */
|
||||
SET_BIT(0xE2); /* For U+2000 - U+200A, U+202F, U+205F */
|
||||
SET_BIT(0xE3); /* For U+3000 */
|
||||
}
|
||||
else SET_BIT(0xA0);
|
||||
break;
|
||||
|
||||
case OP_ANYNL:
|
||||
case OP_VSPACE:
|
||||
SET_BIT(0x0A);
|
||||
SET_BIT(0x0B);
|
||||
SET_BIT(0x0C);
|
||||
SET_BIT(0x0D);
|
||||
if (utf8)
|
||||
{
|
||||
SET_BIT(0xC2); /* For U+0085 */
|
||||
SET_BIT(0xE2); /* For U+2028, U+2029 */
|
||||
}
|
||||
else SET_BIT(0x85);
|
||||
break;
|
||||
|
||||
case OP_NOT_DIGIT:
|
||||
for (c = 0; c < 32; c++)
|
||||
start_bits[c] |= ~cd->cbits[c+cbit_digit];
|
||||
set_nottype_bits(start_bits, cbit_digit, table_limit, cd);
|
||||
break;
|
||||
|
||||
case OP_DIGIT:
|
||||
for (c = 0; c < 32; c++)
|
||||
start_bits[c] |= cd->cbits[c+cbit_digit];
|
||||
set_type_bits(start_bits, cbit_digit, table_limit, cd);
|
||||
break;
|
||||
|
||||
/* The cbit_space table has vertical tab as whitespace; we have to
|
||||
discard it. */
|
||||
ensure it gets set as not whitespace. */
|
||||
|
||||
case OP_NOT_WHITESPACE:
|
||||
for (c = 0; c < 32; c++)
|
||||
{
|
||||
int d = cd->cbits[c+cbit_space];
|
||||
if (c == 1) d &= ~0x08;
|
||||
start_bits[c] |= ~d;
|
||||
}
|
||||
set_nottype_bits(start_bits, cbit_space, table_limit, cd);
|
||||
start_bits[1] |= 0x08;
|
||||
break;
|
||||
|
||||
/* The cbit_space table has vertical tab as whitespace; we have to
|
||||
discard it. */
|
||||
avoid setting it. */
|
||||
|
||||
case OP_WHITESPACE:
|
||||
for (c = 0; c < 32; c++)
|
||||
{
|
||||
int d = cd->cbits[c+cbit_space];
|
||||
if (c == 1) d &= ~0x08;
|
||||
start_bits[c] |= d;
|
||||
}
|
||||
c = start_bits[1]; /* Save in case it was already set */
|
||||
set_type_bits(start_bits, cbit_space, table_limit, cd);
|
||||
start_bits[1] = (start_bits[1] & ~0x08) | c;
|
||||
break;
|
||||
|
||||
case OP_NOT_WORDCHAR:
|
||||
for (c = 0; c < 32; c++)
|
||||
start_bits[c] |= ~cd->cbits[c+cbit_word];
|
||||
set_nottype_bits(start_bits, cbit_word, table_limit, cd);
|
||||
break;
|
||||
|
||||
case OP_WORDCHAR:
|
||||
for (c = 0; c < 32; c++)
|
||||
start_bits[c] |= cd->cbits[c+cbit_word];
|
||||
set_type_bits(start_bits, cbit_word, table_limit, cd);
|
||||
break;
|
||||
}
|
||||
|
||||
|
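The OP_WHITESPACE cases above mask one bit out of the class table before merging it, because the table counts vertical tab (0x0B) as whitespace. A minimal sketch of that masking step follows. It assumes the byte = c/8, bit = c&7 layout, under which VT lands in byte 1 with mask 0x08, and it uses isspace() in the default "C" locale as a stand-in for the cd->cbits table; the function names are hypothetical and not part of PCRE.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Stand-in for the cbit_space table: one bit per byte value, filled from
   isspace() in the "C" locale (space, \t, \n, \v, \f, \r). */
static void build_space_bits(unsigned char *bits)
{
int c;
memset(bits, 0, 32);
for (c = 0; c < 256; c++)
  if (isspace(c)) bits[c / 8] |= (unsigned char)(1u << (c & 7));
}

/* OR the table into start, but never take the VT bit from the table.
   A VT bit that was already set in start is left alone. */
static void add_space_without_vt(unsigned char *start,
  const unsigned char *space_bits)
{
int c;
for (c = 0; c < 32; c++)
  {
  int d = space_bits[c];
  if (c == 1) d &= ~0x08;          /* 0x0B -> byte 1, bit 3 */
  start[c] |= (unsigned char)d;
  }
}

/* Returns 1 if byte value ch is a possible first byte after merging the
   masked table into an empty map, else 0. */
static int space_start_has(int ch)
{
unsigned char table[32];
unsigned char start[32];

build_space_bits(table);
assert(table[1] & 0x08);           /* the table itself does include VT */
memset(start, 0, sizeof(start));
add_space_without_vt(start, table);
return (start[ch / 8] >> (ch & 7)) & 1;
}
```

The negated case works the same way in reverse: the VT bit is cleared from the table value before complementing, so VT ends up set as a possible first byte for a non-whitespace match.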
@ -241,6 +241,10 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
|
||||
#define STRING_Tifinagh0 STR_T STR_i STR_f STR_i STR_n STR_a STR_g STR_h "\0"
|
||||
#define STRING_Ugaritic0 STR_U STR_g STR_a STR_r STR_i STR_t STR_i STR_c "\0"
|
||||
#define STRING_Vai0 STR_V STR_a STR_i "\0"
|
||||
#define STRING_Xan0 STR_X STR_a STR_n "\0"
|
||||
#define STRING_Xps0 STR_X STR_p STR_s "\0"
|
||||
#define STRING_Xsp0 STR_X STR_s STR_p "\0"
|
||||
#define STRING_Xwd0 STR_X STR_w STR_d "\0"
|
||||
#define STRING_Yi0 STR_Y STR_i "\0"
|
||||
#define STRING_Z0 STR_Z "\0"
|
||||
#define STRING_Zl0 STR_Z STR_l "\0"
|
||||
@ -374,6 +378,10 @@ const char _pcre_utt_names[] =
|
||||
STRING_Tifinagh0
|
||||
STRING_Ugaritic0
|
||||
STRING_Vai0
|
||||
STRING_Xan0
|
||||
STRING_Xps0
|
||||
STRING_Xsp0
|
||||
STRING_Xwd0
|
||||
STRING_Yi0
|
||||
STRING_Z0
|
||||
STRING_Zl0
|
||||
@ -507,11 +515,15 @@ const ucp_type_table _pcre_utt[] = {
|
||||
{ 891, PT_SC, ucp_Tifinagh },
|
||||
{ 900, PT_SC, ucp_Ugaritic },
|
||||
{ 909, PT_SC, ucp_Vai },
|
||||
{ 913, PT_SC, ucp_Yi },
|
||||
{ 916, PT_GC, ucp_Z },
|
||||
{ 918, PT_PC, ucp_Zl },
|
||||
{ 921, PT_PC, ucp_Zp },
|
||||
{ 924, PT_PC, ucp_Zs }
|
||||
{ 913, PT_ALNUM, 0 },
|
||||
{ 917, PT_PXSPACE, 0 },
|
||||
{ 921, PT_SPACE, 0 },
|
||||
{ 925, PT_WORD, 0 },
|
||||
{ 929, PT_SC, ucp_Yi },
|
||||
{ 932, PT_GC, ucp_Z },
|
||||
{ 934, PT_PC, ucp_Zl },
|
||||
{ 937, PT_PC, ucp_Zp },
|
||||
{ 940, PT_PC, ucp_Zs }
|
||||
};
|
||||
|
||||
const int _pcre_utt_size = sizeof(_pcre_utt)/sizeof(ucp_type_table);
|
||||
|
@ -6,7 +6,7 @@
|
||||
and semantics are as close as possible to those of the Perl 5 language.
|
||||
|
||||
Written by Philip Hazel
|
||||
Copyright (c) 1997-2009 University of Cambridge
|
||||
Copyright (c) 1997-2010 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
@ -110,12 +110,13 @@ while ((t = *data++) != XCL_END)
|
||||
break;
|
||||
|
||||
case PT_LAMP:
|
||||
if ((prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || prop->chartype == ucp_Lt) ==
|
||||
(t == XCL_PROP)) return !negated;
|
||||
if ((prop->chartype == ucp_Lu || prop->chartype == ucp_Ll ||
|
||||
prop->chartype == ucp_Lt) == (t == XCL_PROP)) return !negated;
|
||||
break;
|
||||
|
||||
case PT_GC:
|
||||
if ((data[1] == _pcre_ucp_gentype[prop->chartype]) == (t == XCL_PROP)) return !negated;
|
||||
if ((data[1] == _pcre_ucp_gentype[prop->chartype]) == (t == XCL_PROP))
|
||||
return !negated;
|
||||
break;
|
||||
|
||||
case PT_PC:
|
||||
@ -126,6 +127,33 @@ while ((t = *data++) != XCL_END)
|
||||
if ((data[1] == prop->script) == (t == XCL_PROP)) return !negated;
|
||||
break;
|
||||
|
||||
case PT_ALNUM:
|
||||
if ((_pcre_ucp_gentype[prop->chartype] == ucp_L ||
|
||||
_pcre_ucp_gentype[prop->chartype] == ucp_N) == (t == XCL_PROP))
|
||||
return !negated;
|
||||
break;
|
||||
|
||||
case PT_SPACE: /* Perl space */
|
||||
if ((_pcre_ucp_gentype[prop->chartype] == ucp_Z ||
|
||||
c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR)
|
||||
== (t == XCL_PROP))
|
||||
return !negated;
|
||||
break;
|
||||
|
||||
case PT_PXSPACE: /* POSIX space */
|
||||
if ((_pcre_ucp_gentype[prop->chartype] == ucp_Z ||
|
||||
c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
|
||||
c == CHAR_FF || c == CHAR_CR) == (t == XCL_PROP))
|
||||
return !negated;
|
||||
break;
|
||||
|
||||
case PT_WORD:
|
||||
if ((_pcre_ucp_gentype[prop->chartype] == ucp_L ||
|
||||
_pcre_ucp_gentype[prop->chartype] == ucp_N || c == CHAR_UNDERSCORE)
|
||||
== (t == XCL_PROP))
|
||||
return !negated;
|
||||
break;
|
||||
|
||||
/* This should never occur, but compilers may mutter if there is no
|
||||
default. */
|
||||
|
||||
|
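The PT_SPACE and PT_PXSPACE cases above differ only in whether CHAR_VT is accepted, and every case uses the same "(property holds) == (t == XCL_PROP)" comparison to handle positive and negated items with one test. A minimal ASCII-only sketch of both ideas follows. The function names are hypothetical, and the Unicode general-category check (ucp_Z) is reduced to the single ASCII space character for illustration.

```c
#include <assert.h>

/* ASCII-only stand-in for PT_SPACE (Perl space): no vertical tab.
   The real code also accepts any character in general category Z. */
static int is_perl_space_ascii(int c)
{
return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* ASCII-only stand-in for PT_PXSPACE (POSIX space): Perl space plus VT. */
static int is_posix_space_ascii(int c)
{
return is_perl_space_ascii(c) || c == '\v';
}

/* One comparison covers both item kinds: a positive item matches when the
   property holds, a negated item matches when it does not.
   Both arguments must be 0 or 1. */
static int item_matches(int has_property, int is_positive_item)
{
return has_property == is_positive_item;
}

/* Sanity check of the single difference between the two definitions. */
static void check_vt_difference(void)
{
assert(!is_perl_space_ascii('\v'));
assert(is_posix_space_ascii('\v'));
}
```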
@ -6,7 +6,7 @@
|
||||
and semantics are as close as possible to those of the Perl 5 language.
|
||||
|
||||
Written by Philip Hazel
|
||||
Copyright (c) 1997-2009 University of Cambridge
|
||||
Copyright (c) 1997-2010 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
@ -55,6 +55,11 @@ previously been set. */
|
||||
# define PCREPOSIX_EXP_DEFN __declspec(dllexport)
|
||||
#endif
|
||||
|
||||
/* We include pcre.h before pcre_internal.h so that the PCRE library functions
|
||||
are declared as "import" for Windows by defining PCRE_EXP_DECL as "import".
|
||||
This is needed even though pcre_internal.h itself includes pcre.h, because it
|
||||
does so after it has set PCRE_EXP_DECL to "export" if it is not already set. */
|
||||
|
||||
#include "pcre.h"
|
||||
#include "pcre_internal.h"
|
||||
#include "pcreposix.h"
|
||||
@ -133,7 +138,7 @@ static const int eint[] = {
|
||||
REG_INVARG, /* inconsistent NEWLINE options */
|
||||
  REG_BADPAT,  /* \g is not followed by an (optionally braced) non-zero number */
|
||||
REG_BADPAT, /* a numbered reference must not be zero */
|
||||
REG_BADPAT, /* (*VERB) with an argument is not supported */
|
||||
REG_BADPAT, /* an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT) */
|
||||
/* 60 */
|
||||
REG_BADPAT, /* (*VERB) not recognized */
|
||||
REG_BADPAT, /* number is too big */
|
||||
@ -141,7 +146,9 @@ static const int eint[] = {
|
||||
REG_BADPAT, /* digit expected after (?+ */
|
||||
REG_BADPAT, /* ] is an invalid data character in JavaScript compatibility mode */
|
||||
/* 65 */
|
||||
REG_BADPAT /* different names for subpatterns of the same number are not allowed */
|
||||
REG_BADPAT, /* different names for subpatterns of the same number are not allowed */
|
||||
REG_BADPAT, /* (*MARK) must have an argument */
|
||||
REG_INVARG, /* this version of PCRE is not compiled with PCRE_UCP support */
|
||||
};
|
||||
|
||||
/* Table of texts corresponding to POSIX error codes */
|
||||
@ -245,6 +252,7 @@ if ((cflags & REG_NEWLINE) != 0) options |= PCRE_MULTILINE;
|
||||
if ((cflags & REG_DOTALL) != 0) options |= PCRE_DOTALL;
|
||||
if ((cflags & REG_NOSUB) != 0) options |= PCRE_NO_AUTO_CAPTURE;
|
||||
if ((cflags & REG_UTF8) != 0) options |= PCRE_UTF8;
|
||||
if ((cflags & REG_UCP) != 0) options |= PCRE_UCP;
|
||||
if ((cflags & REG_UNGREEDY) != 0) options |= PCRE_UNGREEDY;
|
||||
|
||||
preg->re_pcre = pcre_compile2(pattern, options, &errorcode, &errorptr,
|
||||
@ -334,13 +342,13 @@ if ((eflags & REG_STARTEND) != 0)
|
||||
else
|
||||
{
|
||||
so = 0;
|
||||
eo = strlen(string);
|
||||
eo = (int)strlen(string);
|
||||
}
|
||||
|
||||
rc = pcre_exec((const pcre *)preg->re_pcre, NULL, string + so, (eo - so),
|
||||
0, options, ovector, nmatch * 3);
|
||||
0, options, ovector, (int)(nmatch * 3));
|
||||
|
||||
if (rc == 0) rc = nmatch; /* All captured slots were filled in */
|
||||
if (rc == 0) rc = (int)nmatch; /* All captured slots were filled in */
|
||||
|
||||
/* Successful match */
|
||||
|
||||
|
@ -62,6 +62,7 @@ extern "C" {
|
||||
#define REG_STARTEND 0x0080 /* BSD feature: pass subject string by so,eo */
|
||||
#define REG_NOTEMPTY 0x0100 /* NOT defined by POSIX; maps to PCRE_NOTEMPTY */
|
||||
#define REG_UNGREEDY 0x0200 /* NOT defined by POSIX; maps to PCRE_UNGREEDY */
|
||||
#define REG_UCP 0x0400 /* NOT defined by POSIX; maps to PCRE_UCP */
|
||||
|
||||
/* This is not used by PCRE, but by defining it we make it easier
|
||||
to slot PCRE into existing programs that make POSIX calls. */
|
||||
|
13
ext/pcre/pcrelib/testdata/testinput10
vendored
@ -1,7 +1,8 @@
|
||||
/-- These are a few representative patterns whose lengths and offsets are to be
|
||||
shown when the link size is 2. This is just a doublecheck test to ensure the
|
||||
sizes don't go horribly wrong when something is changed. The pattern contents
|
||||
are all themselves checked in other tests. --/
|
||||
are all themselves checked in other tests. Unicode, including property support,
|
||||
is required for these tests. --/
|
||||
|
||||
/((?i)b)/BM
|
||||
|
||||
@ -121,4 +122,14 @@ are all themselves checked in other tests. --/
|
||||
|
||||
/[^\xaa]/8BM
|
||||
|
||||
/[^\d]/8WB
|
||||
|
||||
/[[:^alpha:][:^cntrl:]]+/8WB
|
||||
|
||||
/[[:^cntrl:][:^alpha:]]+/8WB
|
||||
|
||||
/[[:alpha:]]+/8WB
|
||||
|
||||
/[[:^alpha:]\S]+/8WB
|
||||
|
||||
/-- End of testinput10 --/
|
||||
|
271
ext/pcre/pcrelib/testdata/testinput2
vendored
@ -2,12 +2,12 @@
|
||||
of PCRE's API, error diagnostics, and the compiled code of some patterns.
|
||||
It also checks the non-Perl syntax the PCRE supports (Python, .NET,
|
||||
Oniguruma). Finally, there are some tests where PCRE and Perl differ,
|
||||
either because PCRE can't be compatible, or there is potential Perl
|
||||
either because PCRE can't be compatible, or there is a possible Perl
|
||||
bug. --/
|
||||
|
||||
/-- Originally, the Perl 5.10 things were in here too, but now I have separated
|
||||
many (most?) of them out into test 11. However, there may still be some
|
||||
that were overlooked. --/
|
||||
/-- Originally, the Perl 5.10 and 5.11 things were in here too, but now I have
|
||||
separated many (most?) of them out into test 11. However, there may still
|
||||
be some that were overlooked. --/
|
||||
|
||||
/(a)b|/I
|
||||
|
||||
@ -51,6 +51,16 @@
|
||||
|
||||
/(?X)[\B]/
|
||||
|
||||
/(?X)[\R]/
|
||||
|
||||
/(?X)[\X]/
|
||||
|
||||
/[\B]/BZ
|
||||
|
||||
/[\R]/BZ
|
||||
|
||||
/[\X]/BZ
|
||||
|
||||
/[z-a]/
|
||||
|
||||
/^*/
|
||||
@ -2279,8 +2289,6 @@ a random value. /Ix
|
||||
/a+b?(*THEN)c+(*FAIL)/C
|
||||
aaabccc
|
||||
|
||||
/a(*PRUNE:XXX)b/
|
||||
|
||||
/a(*MARK)b/
|
||||
|
||||
/(?i:A{1,}\6666666666)/
|
||||
@ -3232,4 +3240,255 @@ a random value. /Ix
|
||||
|
||||
/(?P<L1>(?P<L2>0|)|(?P>L2)(?P>L1))/
|
||||
|
||||
/abc(*MARK:)pqr/
|
||||
|
||||
/abc(*:)pqr/
|
||||
|
||||
/abc(*FAIL:123)xyz/
|
||||
|
||||
/--- This should, and does, fail. In Perl, it does not, which I think is a
|
||||
bug because replacing the B in the pattern by (B|D) does make it fail. ---/
|
||||
|
||||
/A(*COMMIT)B/+K
|
||||
ACABX
|
||||
|
||||
/--- These should be different, but in Perl 5.11 are not, which I think
|
||||
is a bug in Perl. ---/
|
||||
|
||||
/A(*THEN)B|A(*THEN)C/K
|
||||
AC
|
||||
|
||||
/A(*PRUNE)B|A(*PRUNE)C/K
|
||||
AC
|
||||
|
||||
/--- A whole lot of tests of verbs with arguments are here rather than in test
|
||||
11 because Perl doesn't seem to follow its specification entirely
|
||||
correctly. ---/
|
||||
|
||||
/--- Perl 5.11 sets $REGERROR on the AC failure case here; PCRE does not. It is
|
||||
not clear how Perl defines "involved in the failure of the match". ---/
|
||||
|
||||
/^(A(*THEN:A)B|C(*THEN:B)D)/K
|
||||
AB
|
||||
CD
|
||||
** Failers
|
||||
AC
|
||||
CB
|
||||
|
||||
/--- Check the use of names for success and failure. PCRE doesn't show these
|
||||
names for success, though Perl does, contrary to its spec. ---/
|
||||
|
||||
/^(A(*PRUNE:A)B|C(*PRUNE:B)D)/K
|
||||
AB
|
||||
CD
|
||||
** Failers
|
||||
AC
|
||||
CB
|
||||
|
||||
/--- An empty name does not pass back an empty string. It is the same as if no
|
||||
name were given. ---/
|
||||
|
||||
/^(A(*PRUNE:)B|C(*PRUNE:B)D)/K
|
||||
AB
|
||||
CD
|
||||
|
||||
/--- PRUNE goes to next bumpalong; COMMIT does not. ---/
|
||||
|
||||
/A(*PRUNE:A)B/K
|
||||
ACAB
|
||||
|
||||
/(*MARK:A)(*PRUNE:B)(C|X)/K
|
||||
C
|
||||
D
|
||||
|
||||
/(*MARK:A)(*THEN:B)(C|X)/K
|
||||
C
|
||||
D
|
||||
|
||||
/--- This should fail, as the skip causes a bump to offset 3 (the skip) ---/
|
||||
|
||||
/A(*MARK:A)A+(*SKIP)(B|Z) | AC/xK
|
||||
AAAC
|
||||
|
||||
/--- Same --/
|
||||
|
||||
/A(*MARK:A)A+(*MARK:B)(*SKIP:B)(B|Z) | AC/xK
|
||||
AAAC
|
||||
|
||||
/--- This should fail; the SKIP advances by one, but when we get to AC, the
|
||||
PRUNE kills it. ---/
|
||||
|
||||
/A(*PRUNE:A)A+(*SKIP:A)(B|Z) | AC/xK
|
||||
AAAC
|
||||
|
||||
/A(*:A)A+(*SKIP)(B|Z) | AC/xK
|
||||
AAAC
|
||||
|
||||
/--- This should fail, as a null name is the same as no name ---/
|
||||
|
||||
/A(*MARK:A)A+(*SKIP:)(B|Z) | AC/xK
|
||||
AAAC
|
||||
|
||||
/--- This fails in PCRE, and I think that is in accordance with Perl's
|
||||
documentation, though in Perl it succeeds. ---/
|
||||
|
||||
/A(*MARK:A)A+(*SKIP:B)(B|Z) | AAC/xK
|
||||
AAAC
|
||||
|
||||
/--- Mark names can be duplicated ---/
|
||||
|
||||
/A(*:A)B|X(*:A)Y/K
|
||||
AABC
|
||||
XXYZ
|
||||
|
||||
/^A(*:A)B|^X(*:A)Y/K
|
||||
** Failers
|
||||
XAQQ
|
||||
|
||||
/--- A check on what happens after hitting a mark and them bumping along to
|
||||
something that does not even start. Perl reports tags after the failures here,
|
||||
though it does not when the individual letters are made into something
|
||||
more complicated. ---/
|
||||
|
||||
/A(*:A)B|XX(*:B)Y/K
|
||||
AABC
|
||||
XXYZ
|
||||
** Failers
|
||||
XAQQ
|
||||
XAQQXZZ
|
||||
AXQQQ
|
||||
AXXQQQ
|
||||
|
||||
/--- COMMIT at the start of a pattern should be the same as an anchor. Perl
|
||||
optimizations defeat this. So does the PCRE optimization unless we disable it
|
||||
with \Y. ---/
|
||||
|
||||
/(*COMMIT)ABC/
|
||||
ABCDEFG
|
||||
** Failers
|
||||
DEFGABC\Y
|
||||
|
||||
/--- Repeat some tests with added studying. ---/
|
||||
|
||||
/A(*COMMIT)B/+KS
|
||||
ACABX
|
||||
|
||||
/A(*THEN)B|A(*THEN)C/KS
|
||||
AC
|
||||
|
||||
/A(*PRUNE)B|A(*PRUNE)C/KS
|
||||
AC
|
||||
|
||||
/^(A(*THEN:A)B|C(*THEN:B)D)/KS
|
||||
AB
|
||||
CD
|
||||
** Failers
|
||||
AC
|
||||
CB
|
||||
|
||||
/^(A(*PRUNE:A)B|C(*PRUNE:B)D)/KS
|
||||
AB
|
||||
CD
|
||||
** Failers
|
||||
AC
|
||||
CB
|
||||
|
||||
/^(A(*PRUNE:)B|C(*PRUNE:B)D)/KS
|
||||
AB
|
||||
CD
|
||||
|
||||
/A(*PRUNE:A)B/KS
|
||||
ACAB
|
||||
|
||||
/(*MARK:A)(*PRUNE:B)(C|X)/KS
|
||||
C
|
||||
D
|
||||
|
||||
/(*MARK:A)(*THEN:B)(C|X)/KS
|
||||
C
|
||||
D
|
||||
|
||||
/A(*MARK:A)A+(*SKIP)(B|Z) | AC/xKS
|
||||
AAAC
|
||||
|
||||
/A(*MARK:A)A+(*MARK:B)(*SKIP:B)(B|Z) | AC/xKS
|
||||
AAAC
|
||||
|
||||
/A(*PRUNE:A)A+(*SKIP:A)(B|Z) | AC/xKS
|
||||
AAAC
|
||||
|
||||
/A(*:A)A+(*SKIP)(B|Z) | AC/xKS
|
||||
AAAC
|
||||
|
||||
/A(*MARK:A)A+(*SKIP:)(B|Z) | AC/xKS
|
||||
AAAC
|
||||
|
||||
/A(*MARK:A)A+(*SKIP:B)(B|Z) | AAC/xKS
|
||||
AAAC
|
||||
|
||||
/A(*:A)B|XX(*:B)Y/KS
|
||||
AABC
|
||||
XXYZ
|
||||
** Failers
|
||||
XAQQ
|
||||
XAQQXZZ
|
||||
AXQQQ
|
||||
AXXQQQ
|
||||
|
||||
/(*COMMIT)ABC/
|
||||
ABCDEFG
|
||||
** Failers
|
||||
DEFGABC\Y
|
||||
|
||||
/^(ab (c+(*THEN)cd) | xyz)/x
|
||||
abcccd
|
||||
|
||||
/^(ab (c+(*PRUNE)cd) | xyz)/x
|
||||
abcccd
|
||||
|
||||
/^(ab (c+(*FAIL)cd) | xyz)/x
|
||||
abcccd
|
||||
|
||||
/--- Perl 5.11 gets some of these wrong ---/
|
||||
|
||||
/(?>.(*ACCEPT))*?5/
|
||||
abcde
|
||||
|
||||
/(.(*ACCEPT))*?5/
|
||||
abcde
|
||||
|
||||
/(.(*ACCEPT))5/
|
||||
abcde
|
||||
|
||||
/(.(*ACCEPT))*5/
|
||||
abcde
|
||||
|
||||
/A\NB./BZ
|
||||
ACBD
|
||||
** Failers
|
||||
A\nB
|
||||
ACB\n
|
||||
|
||||
/A\NB./sBZ
|
||||
ACBD
|
||||
ACB\n
|
||||
** Failers
|
||||
A\nB
|
||||
|
||||
/A\NB/<crlf>
|
||||
A\nB
|
||||
A\rB
|
||||
** Failers
|
||||
A\r\nB
|
||||
|
||||
/\R+b/BZ
|
||||
|
||||
/\R+\n/BZ
|
||||
|
||||
/\R+\d/BZ
|
||||
|
||||
/\d*\R/BZ
|
||||
|
||||
/\s*\R/BZ
|
||||
|
||||
/-- End of testinput2 --/
|
||||
|
49
ext/pcre/pcrelib/testdata/testinput5
vendored
@ -745,4 +745,53 @@ can't tell the difference.) --/
|
||||
/X\W{3}X/8
|
||||
\PX
|
||||
|
||||
/\h/SI
|
||||
|
||||
/\h/SI8
|
||||
ABC\x{09}
|
||||
ABC\x{20}
|
||||
ABC\x{a0}
|
||||
ABC\x{1680}
|
||||
ABC\x{180e}
|
||||
ABC\x{2000}
|
||||
ABC\x{202f}
|
||||
ABC\x{205f}
|
||||
ABC\x{3000}
|
||||
|
||||
/\v/SI
|
||||
|
||||
/\v/SI8
|
||||
ABC\x{0a}
|
||||
ABC\x{0b}
|
||||
ABC\x{0c}
|
||||
ABC\x{0d}
|
||||
ABC\x{85}
|
||||
ABC\x{2028}
|
||||
|
||||
/\R/SI
|
||||
|
||||
/\R/SI8
|
||||
|
||||
/\h*A/SI8
|
||||
CDBABC
|
||||
|
||||
/\v+A/SI8
|
||||
|
||||
/\s?xxx\s/8SI
|
||||
|
||||
/\sxxx\s/8T1
|
||||
AB\x{85}xxx\x{a0}XYZ
|
||||
AB\x{a0}xxx\x{85}XYZ
|
||||
|
||||
/\sxxx\s/I8ST1
|
||||
AB\x{85}xxx\x{a0}XYZ
|
||||
AB\x{a0}xxx\x{85}XYZ
|
||||
|
||||
/\S \S/8T1
|
||||
\x{a2} \x{84}
|
||||
|
||||
/\S \S/I8ST1
|
||||
\x{a2} \x{84}
|
||||
A Z
|
||||
|
||||
/-- End of testinput5 --/
|
||||
|
50
ext/pcre/pcrelib/testdata/testinput6
vendored
@ -752,4 +752,54 @@
|
||||
/\p{Avestan}\p{Bamum}\p{Egyptian_Hieroglyphs}\p{Imperial_Aramaic}\p{Inscriptional_Pahlavi}\p{Inscriptional_Parthian}\p{Javanese}\p{Kaithi}\p{Lisu}\p{Meetei_Mayek}\p{Old_South_Arabian}\p{Old_Turkic}\p{Samaritan}\p{Tai_Tham}\p{Tai_Viet}/8
|
||||
\x{10b00}\x{a6ef}\x{13007}\x{10857}\x{10b78}\x{10b58}\x{a980}\x{110c1}\x{a4ff}\x{abc0}\x{10a7d}\x{10c48}\x{0800}\x{1aad}\x{aac0}
|
||||
|
||||
/^\w+/8W
|
||||
Az_\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee}
|
||||
|
||||
/^[[:xdigit:]]*/8W
|
||||
1a\x{660}\x{bef}\x{16ee}
|
||||
|
||||
/^\d+/8W
|
||||
1\x{660}\x{bef}\x{16ee}
|
||||
|
||||
/^[[:digit:]]+/8W
|
||||
1\x{660}\x{bef}\x{16ee}
|
||||
|
||||
/^>\s+/8W
|
||||
>\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b}
|
||||
|
||||
/^>\pZ+/8W
|
||||
>\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b}
|
||||
|
||||
/^>[[:space:]]*/8W
|
||||
>\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b}
|
||||
|
||||
/^>[[:blank:]]*/8W
|
||||
>\x{20}\x{a0}\x{1680}\x{180e}\x{2000}\x{202f}\x{9}\x{b}\x{2028}
|
||||
|
||||
/^[[:alpha:]]*/8W
|
||||
Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}
|
||||
|
||||
/^[[:alnum:]]*/8W
|
||||
Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee}
|
||||
|
||||
/^[[:cntrl:]]*/8W
|
||||
\x{0}\x{09}\x{1f}\x{7f}\x{9f}
|
||||
|
||||
/^[[:graph:]]*/8W
|
||||
A\x{a1}\x{a0}
|
||||
|
||||
/^[[:print:]]*/8W
|
||||
A z\x{a0}\x{a1}
|
||||
|
||||
/^[[:punct:]]*/8W
|
||||
.+\x{a1}\x{a0}
|
||||
|
||||
/\p{Zs}*?\R/
|
||||
** Failers
|
||||
a\xFCb
|
||||
|
||||
/\p{Zs}*\R/
|
||||
** Failers
|
||||
a\xFCb
|
||||
|
||||
/-- End of testinput6 --/
|
||||
|
139
ext/pcre/pcrelib/testdata/testinput9
vendored
@ -847,4 +847,143 @@
|
||||
** Failers
|
||||
\x{1d79}\x{a77d}
|
||||
|
||||
/^\p{Xan}/8
|
||||
ABCD
|
||||
1234
|
||||
\x{6ca}
|
||||
\x{a6c}
|
||||
\x{10a7}
|
||||
** Failers
|
||||
_ABC
|
||||
|
||||
/^\p{Xan}+/8
|
||||
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
** Failers
|
||||
_ABC
|
||||
|
||||
/^\p{Xan}*/8
|
||||
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
|
||||
/^\p{Xan}{2,9}/8
|
||||
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
|
||||
/^[\p{Xan}]/8
|
||||
ABCD1234_
|
||||
1234abcd_
|
||||
\x{6ca}
|
||||
\x{a6c}
|
||||
\x{10a7}
|
||||
** Failers
|
||||
_ABC
|
||||
|
||||
/^[\p{Xan}]+/8
|
||||
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
** Failers
|
||||
_ABC
|
||||
|
||||
/^>\p{Xsp}/8
|
||||
>\x{1680}\x{2028}\x{0b}
|
||||
** Failers
|
||||
\x{0b}
|
||||
|
||||
/^>\p{Xsp}+/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
|
||||
/^>\p{Xsp}*/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
|
||||
/^>\p{Xsp}{2,9}/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
|
||||
/^>[\p{Xsp}]/8
|
||||
>\x{2028}\x{0b}
|
||||
|
||||
/^>[\p{Xsp}]+/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
|
||||
/^>\p{Xps}/8
|
||||
>\x{1680}\x{2028}\x{0b}
|
||||
>\x{a0}
|
||||
** Failers
|
||||
\x{0b}
|
||||
|
||||
/^>\p{Xps}+/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
|
||||
/^>\p{Xps}+?/8
|
||||
>\x{1680}\x{2028}\x{0b}
|
||||
|
||||
/^>\p{Xps}*/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
|
||||
/^>\p{Xps}{2,9}/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
|
||||
/^>\p{Xps}{2,9}?/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
|
||||
/^>[\p{Xps}]/8
|
||||
>\x{2028}\x{0b}
|
||||
|
||||
/^>[\p{Xps}]+/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
|
||||
/^\p{Xwd}/8
|
||||
ABCD
|
||||
1234
|
||||
\x{6ca}
|
||||
\x{a6c}
|
||||
\x{10a7}
|
||||
_ABC
|
||||
** Failers
|
||||
[]
|
||||
|
||||
/^\p{Xwd}+/8
|
||||
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
|
||||
/^\p{Xwd}*/8
|
||||
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
|
||||
/^\p{Xwd}{2,9}/8
|
||||
A_12\x{6ca}\x{a6c}\x{10a7}
|
||||
|
||||
/^[\p{Xwd}]/8
|
||||
ABCD1234_
|
||||
1234abcd_
|
||||
\x{6ca}
|
||||
\x{a6c}
|
||||
\x{10a7}
|
||||
_ABC
|
||||
** Failers
|
||||
[]
|
||||
|
||||
/^[\p{Xwd}]+/8
|
||||
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
|
||||
/-- Unicode properties for \b abd \B --/
|
||||
|
||||
/\b...\B/8W
|
||||
abc_
|
||||
\x{37e}abc\x{376}
|
||||
\x{37e}\x{376}\x{371}\x{393}\x{394}
|
||||
!\x{c0}++\x{c1}\x{c2}
|
||||
!\x{c0}+++++
|
||||
|
||||
/-- Without PCRE_UCP, non-ASCII always fail, even if < 256 --/
|
||||
|
||||
/\b...\B/8
|
||||
abc_
|
||||
** Failers
|
||||
\x{37e}abc\x{376}
|
||||
\x{37e}\x{376}\x{371}\x{393}\x{394}
|
||||
!\x{c0}++\x{c1}\x{c2}
|
||||
!\x{c0}+++++
|
||||
|
||||
/-- With PCRE_UCP, non-UTF8 chars that are < 256 still check properties --/
|
||||
|
||||
/\b...\B/W
|
||||
abc_
|
||||
!\x{c0}++\x{c1}\x{c2}
|
||||
!\x{c0}+++++
|
||||
|
||||
/-- End of testinput9 --/
|
||||
|
43
ext/pcre/pcrelib/testdata/testoutput10
vendored
@ -1,7 +1,8 @@
|
||||
/-- These are a few representative patterns whose lengths and offsets are to be
|
||||
shown when the link size is 2. This is just a doublecheck test to ensure the
|
||||
sizes don't go horribly wrong when something is changed. The pattern contents
|
||||
are all themselves checked in other tests. --/
|
||||
are all themselves checked in other tests. Unicode, including property support,
|
||||
is required for these tests. --/
|
||||
|
||||
/((?i)b)/BM
|
||||
Memory allocation (code space): 21
|
||||
@ -666,4 +667,44 @@ Memory allocation (code space): 40
|
||||
39 End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/[^\d]/8WB
|
||||
------------------------------------------------------------------
|
||||
0 11 Bra
|
||||
3 [^\p{Nd}]
|
||||
11 11 Ket
|
||||
14 End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/[[:^alpha:][:^cntrl:]]+/8WB
|
||||
------------------------------------------------------------------
|
||||
0 44 Bra
|
||||
3 [ -~\x80-\xff\P{L}]+
|
||||
44 44 Ket
|
||||
47 End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/[[:^cntrl:][:^alpha:]]+/8WB
|
||||
------------------------------------------------------------------
|
||||
0 44 Bra
|
||||
3 [ -~\x80-\xff\P{L}]+
|
||||
44 44 Ket
|
||||
47 End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/[[:alpha:]]+/8WB
|
||||
------------------------------------------------------------------
|
||||
0 12 Bra
|
||||
3 [\p{L}]+
|
||||
12 12 Ket
|
||||
15 End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/[[:^alpha:]\S]+/8WB
|
||||
------------------------------------------------------------------
|
||||
0 15 Bra
|
||||
3 [\P{L}\P{Xsp}]+
|
||||
15 15 Ket
|
||||
18 End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/-- End of testinput10 --/
|
||||
|
484
ext/pcre/pcrelib/testdata/testoutput2
vendored
@ -2,12 +2,12 @@
|
||||
of PCRE's API, error diagnostics, and the compiled code of some patterns.
|
||||
It also checks the non-Perl syntax the PCRE supports (Python, .NET,
|
||||
Oniguruma). Finally, there are some tests where PCRE and Perl differ,
|
||||
either because PCRE can't be compatible, or there is potential Perl
|
||||
either because PCRE can't be compatible, or there is a possible Perl
|
||||
bug. --/
|
||||
|
||||
/-- Originally, the Perl 5.10 things were in here too, but now I have separated
|
||||
many (most?) of them out into test 11. However, there may still be some
|
||||
that were overlooked. --/
|
||||
/-- Originally, the Perl 5.10 and 5.11 things were in here too, but now I have
|
||||
separated many (most?) of them out into test 11. However, there may still
|
||||
be some that were overlooked. --/
|
||||
|
||||
/(a)b|/I
|
||||
Capturing subpattern count = 1
|
||||
@ -103,6 +103,36 @@ Failed: missing terminating ] for character class at offset 5
|
||||
/(?X)[\B]/
|
||||
Failed: invalid escape sequence in character class at offset 6
|
||||
|
||||
/(?X)[\R]/
|
||||
Failed: invalid escape sequence in character class at offset 6
|
||||
|
||||
/(?X)[\X]/
|
||||
Failed: invalid escape sequence in character class at offset 6
|
||||
|
||||
/[\B]/BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
B
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/[\R]/BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
R
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/[\X]/BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
X
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/[z-a]/
|
||||
Failed: range out of order in character class at offset 3
|
||||
|
||||
@ -3198,19 +3228,19 @@ Failed: POSIX collating elements are not supported at offset 0
|
||||
Failed: POSIX named classes are supported only within a class at offset 0
|
||||
|
||||
/\l/I
|
||||
Failed: PCRE does not support \L, \l, \N, \U, or \u at offset 1
|
||||
Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1
|
||||
|
||||
/\L/I
|
||||
Failed: PCRE does not support \L, \l, \N, \U, or \u at offset 1
|
||||
Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1
|
||||
|
||||
/\N{name}/I
|
||||
Failed: PCRE does not support \L, \l, \N, \U, or \u at offset 1
|
||||
Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1
|
||||
|
||||
/\u/I
|
||||
Failed: PCRE does not support \L, \l, \N, \U, or \u at offset 1
|
||||
Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1
|
||||
|
||||
/\U/I
|
||||
Failed: PCRE does not support \L, \l, \N, \U, or \u at offset 1
|
||||
Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1
|
||||
|
||||
/[/I
|
||||
Failed: missing terminating ] for character class at offset 1
|
||||
@ -8667,11 +8697,8 @@ No match
|
||||
+13 ^ ^ (*FAIL)
|
||||
No match
|
||||
|
||||
/a(*PRUNE:XXX)b/
|
||||
Failed: (*VERB) with an argument is not supported at offset 8
|
||||
|
||||
/a(*MARK)b/
|
||||
Failed: (*VERB) not recognized at offset 7
|
||||
Failed: (*MARK) must have an argument at offset 7
|
||||
|
||||
/(?i:A{1,}\6666666666)/
|
||||
Failed: number is too big at offset 19
|
||||
@ -10668,4 +10695,435 @@ No match
|
||||
/(?P<L1>(?P<L2>0|)|(?P>L2)(?P>L1))/
|
||||
Failed: recursive call could loop indefinitely at offset 31
|
||||
|
||||
/abc(*MARK:)pqr/
|
||||
Failed: (*MARK) must have an argument at offset 10
|
||||
|
||||
/abc(*:)pqr/
|
||||
Failed: (*MARK) must have an argument at offset 6
|
||||
|
||||
/abc(*FAIL:123)xyz/
|
||||
Failed: an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT) at offset 13
|
||||
|
||||
/--- This should, and does, fail. In Perl, it does not, which I think is a
|
||||
bug because replacing the B in the pattern by (B|D) does make it fail. ---/
|
||||
|
||||
/A(*COMMIT)B/+K
|
||||
ACABX
|
||||
No match
|
||||
|
||||
/--- These should be different, but in Perl 5.11 are not, which I think
|
||||
is a bug in Perl. ---/
|
||||
|
||||
/A(*THEN)B|A(*THEN)C/K
|
||||
AC
|
||||
0: AC
|
||||
|
||||
/A(*PRUNE)B|A(*PRUNE)C/K
|
||||
AC
|
||||
No match
|
||||
|
||||
/--- A whole lot of tests of verbs with arguments are here rather than in test
|
||||
11 because Perl doesn't seem to follow its specification entirely
|
||||
correctly. ---/
|
||||
|
||||
/--- Perl 5.11 sets $REGERROR on the AC failure case here; PCRE does not. It is
|
||||
not clear how Perl defines "involved in the failure of the match". ---/
|
||||
|
||||
/^(A(*THEN:A)B|C(*THEN:B)D)/K
|
||||
AB
|
||||
0: AB
|
||||
1: AB
|
||||
CD
|
||||
0: CD
|
||||
1: CD
|
||||
** Failers
|
||||
No match
|
||||
AC
|
||||
No match
|
||||
CB
|
||||
No match, mark = B
|
||||
|
||||
/--- Check the use of names for success and failure. PCRE doesn't show these
|
||||
names for success, though Perl does, contrary to its spec. ---/
|
||||
|
||||
/^(A(*PRUNE:A)B|C(*PRUNE:B)D)/K
|
||||
AB
|
||||
0: AB
|
||||
1: AB
|
||||
CD
|
||||
0: CD
|
||||
1: CD
|
||||
** Failers
|
||||
No match
|
||||
AC
|
||||
No match, mark = A
|
||||
CB
|
||||
No match, mark = B
|
||||
|
||||
/--- An empty name does not pass back an empty string. It is the same as if no
|
||||
name were given. ---/
|
||||
|
||||
/^(A(*PRUNE:)B|C(*PRUNE:B)D)/K
|
||||
AB
|
||||
0: AB
|
||||
1: AB
|
||||
CD
|
||||
0: CD
|
||||
1: CD
|
||||
|
||||
/--- PRUNE goes to next bumpalong; COMMIT does not. ---/
|
||||
|
||||
/A(*PRUNE:A)B/K
|
||||
ACAB
|
||||
0: AB
|
||||
|
||||
/(*MARK:A)(*PRUNE:B)(C|X)/K
|
||||
C
|
||||
0: C
|
||||
1: C
|
||||
MK: A
|
||||
D
|
||||
No match, mark = B
|
||||
|
||||
/(*MARK:A)(*THEN:B)(C|X)/K
|
||||
C
|
||||
0: C
|
||||
1: C
|
||||
MK: A
|
||||
D
|
||||
No match, mark = B
|
||||
|
||||
/--- This should fail, as the skip causes a bump to offset 3 (the skip) ---/
|
||||
|
||||
/A(*MARK:A)A+(*SKIP)(B|Z) | AC/xK
|
||||
AAAC
|
||||
No match
|
||||
|
||||
/--- Same --/
|
||||
|
||||
/A(*MARK:A)A+(*MARK:B)(*SKIP:B)(B|Z) | AC/xK
|
||||
AAAC
|
||||
No match
|
||||
|
||||
/--- This should fail; the SKIP advances by one, but when we get to AC, the
|
||||
PRUNE kills it. ---/
|
||||
|
||||
/A(*PRUNE:A)A+(*SKIP:A)(B|Z) | AC/xK
|
||||
AAAC
|
||||
No match
|
||||
|
||||
/A(*:A)A+(*SKIP)(B|Z) | AC/xK
|
||||
AAAC
|
||||
No match
|
||||
|
||||
/--- This should fail, as a null name is the same as no name ---/
|
||||
|
||||
/A(*MARK:A)A+(*SKIP:)(B|Z) | AC/xK
|
||||
AAAC
|
||||
No match
|
||||
|
||||
/--- This fails in PCRE, and I think that is in accordance with Perl's
|
||||
documentation, though in Perl it succeeds. ---/
|
||||
|
||||
/A(*MARK:A)A+(*SKIP:B)(B|Z) | AAC/xK
|
||||
AAAC
|
||||
No match
|
||||
|
||||
/--- Mark names can be duplicated ---/
|
||||
|
||||
/A(*:A)B|X(*:A)Y/K
|
||||
AABC
|
||||
0: AB
|
||||
MK: A
|
||||
XXYZ
|
||||
0: XY
|
||||
MK: A
|
||||
|
||||
/^A(*:A)B|^X(*:A)Y/K
|
||||
** Failers
|
||||
No match
|
||||
XAQQ
|
||||
No match, mark = A
|
||||
|
||||
/--- A check on what happens after hitting a mark and them bumping along to
|
||||
something that does not even start. Perl reports tags after the failures here,
|
||||
though it does not when the individual letters are made into something
|
||||
more complicated. ---/
|
||||
|
||||
/A(*:A)B|XX(*:B)Y/K
|
||||
AABC
|
||||
0: AB
|
||||
MK: A
|
||||
XXYZ
|
||||
0: XXY
|
||||
MK: B
|
||||
** Failers
|
||||
No match
|
||||
XAQQ
|
||||
No match
|
||||
XAQQXZZ
|
||||
No match
|
||||
AXQQQ
|
||||
No match
|
||||
AXXQQQ
|
||||
No match
|
||||
|
||||
/--- COMMIT at the start of a pattern should be the same as an anchor. Perl
|
||||
optimizations defeat this. So does the PCRE optimization unless we disable it
|
||||
with \Y. ---/
|
||||
|
||||
/(*COMMIT)ABC/
|
||||
ABCDEFG
|
||||
0: ABC
|
||||
** Failers
|
||||
No match
|
||||
DEFGABC\Y
|
||||
No match
|
||||
|
||||
/--- Repeat some tests with added studying. ---/
|
||||
|
||||
/A(*COMMIT)B/+KS
|
||||
ACABX
|
||||
No match
|
||||
|
||||
/A(*THEN)B|A(*THEN)C/KS
|
||||
AC
|
||||
0: AC
|
||||
|
||||
/A(*PRUNE)B|A(*PRUNE)C/KS
|
||||
AC
|
||||
No match
|
||||
|
||||
/^(A(*THEN:A)B|C(*THEN:B)D)/KS
|
||||
AB
|
||||
0: AB
|
||||
1: AB
|
||||
CD
|
||||
0: CD
|
||||
1: CD
|
||||
** Failers
|
||||
No match
|
||||
AC
|
||||
No match
|
||||
CB
|
||||
No match, mark = B
|
||||
|
||||
/^(A(*PRUNE:A)B|C(*PRUNE:B)D)/KS
|
||||
AB
|
||||
0: AB
|
||||
1: AB
|
||||
CD
|
||||
0: CD
|
||||
1: CD
|
||||
** Failers
|
||||
No match
|
||||
AC
|
||||
No match, mark = A
|
||||
CB
|
||||
No match, mark = B
|
||||
|
||||
/^(A(*PRUNE:)B|C(*PRUNE:B)D)/KS
|
||||
AB
|
||||
0: AB
|
||||
1: AB
|
||||
CD
|
||||
0: CD
|
||||
1: CD
|
||||
|
||||
/A(*PRUNE:A)B/KS
|
||||
ACAB
|
||||
0: AB
|
||||
|
||||
/(*MARK:A)(*PRUNE:B)(C|X)/KS
|
||||
C
|
||||
0: C
|
||||
1: C
|
||||
MK: A
|
||||
D
|
||||
No match
|
||||
|
||||
/(*MARK:A)(*THEN:B)(C|X)/KS
|
||||
C
|
||||
0: C
|
||||
1: C
|
||||
MK: A
|
||||
D
|
||||
No match
|
||||
|
||||
/A(*MARK:A)A+(*SKIP)(B|Z) | AC/xKS
|
||||
AAAC
|
||||
No match
|
||||
|
||||
/A(*MARK:A)A+(*MARK:B)(*SKIP:B)(B|Z) | AC/xKS
|
||||
AAAC
|
||||
No match
|
||||
|
||||
/A(*PRUNE:A)A+(*SKIP:A)(B|Z) | AC/xKS
|
||||
AAAC
|
||||
No match
|
||||
|
||||
/A(*:A)A+(*SKIP)(B|Z) | AC/xKS
|
||||
AAAC
|
||||
No match
|
||||
|
||||
/A(*MARK:A)A+(*SKIP:)(B|Z) | AC/xKS
|
||||
AAAC
|
||||
No match
|
||||
|
||||
/A(*MARK:A)A+(*SKIP:B)(B|Z) | AAC/xKS
|
||||
AAAC
|
||||
No match
|
||||
|
||||
/A(*:A)B|XX(*:B)Y/KS
|
||||
AABC
|
||||
0: AB
|
||||
MK: A
|
||||
XXYZ
|
||||
0: XXY
|
||||
MK: B
|
||||
** Failers
|
||||
No match
|
||||
XAQQ
|
||||
No match
|
||||
XAQQXZZ
|
||||
No match
|
||||
AXQQQ
|
||||
No match
|
||||
AXXQQQ
|
||||
No match
|
||||
|
||||
/(*COMMIT)ABC/
|
||||
ABCDEFG
|
||||
0: ABC
|
||||
** Failers
|
||||
No match
|
||||
DEFGABC\Y
|
||||
No match
|
||||
|
||||
/^(ab (c+(*THEN)cd) | xyz)/x
|
||||
abcccd
|
||||
No match
|
||||
|
||||
/^(ab (c+(*PRUNE)cd) | xyz)/x
|
||||
abcccd
|
||||
No match
|
||||
|
||||
/^(ab (c+(*FAIL)cd) | xyz)/x
|
||||
abcccd
|
||||
No match
|
||||
|
||||
/--- Perl 5.11 gets some of these wrong ---/
|
||||
|
||||
/(?>.(*ACCEPT))*?5/
|
||||
abcde
|
||||
0: a
|
||||
|
||||
/(.(*ACCEPT))*?5/
|
||||
abcde
|
||||
0: a
|
||||
1: a
|
||||
|
||||
/(.(*ACCEPT))5/
|
||||
abcde
|
||||
0: a
|
||||
1: a
|
||||
|
||||
/(.(*ACCEPT))*5/
|
||||
abcde
|
||||
0: a
|
||||
1: a
|
||||
|
||||
/A\NB./BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
A
|
||||
Any
|
||||
B
|
||||
Any
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
ACBD
|
||||
0: ACBD
|
||||
** Failers
|
||||
No match
|
||||
A\nB
|
||||
No match
|
||||
ACB\n
|
||||
No match
|
||||
|
||||
/A\NB./sBZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
A
|
||||
Any
|
||||
B
|
||||
AllAny
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
ACBD
|
||||
0: ACBD
|
||||
ACB\n
|
||||
0: ACB\x0a
|
||||
** Failers
|
||||
No match
|
||||
A\nB
|
||||
No match
|
||||
|
||||
/A\NB/<crlf>
|
||||
A\nB
|
||||
0: A\x0aB
|
||||
A\rB
|
||||
0: A\x0dB
|
||||
** Failers
|
||||
No match
|
||||
A\r\nB
|
||||
No match
|
||||
|
||||
/\R+b/BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
\R++
|
||||
b
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/\R+\n/BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
\R+
|
||||
\x0a
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/\R+\d/BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
\R++
|
||||
\d
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/\d*\R/BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
\d*+
|
||||
\R
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/\s*\R/BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
\s*+
|
||||
\R
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/-- End of testinput2 --/
|
||||
|
146
ext/pcre/pcrelib/testdata/testoutput5
vendored
@ -2076,4 +2076,150 @@ Partial match: abcde
|
||||
\PX
|
||||
Partial match: X
|
||||
|
||||
/\h/SI
|
||||
Capturing subpattern count = 0
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
Subject length lower bound = 1
|
||||
Starting byte set: \x09 \x20 \xa0
|
||||
|
||||
/\h/SI8
|
||||
Capturing subpattern count = 0
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
Subject length lower bound = 1
|
||||
Starting byte set: \x09 \x20 \xc2 \xe1 \xe2 \xe3
|
||||
ABC\x{09}
|
||||
0: \x{09}
|
||||
ABC\x{20}
|
||||
0:
|
||||
ABC\x{a0}
|
||||
0: \x{a0}
|
||||
ABC\x{1680}
|
||||
0: \x{1680}
|
||||
ABC\x{180e}
|
||||
0: \x{180e}
|
||||
ABC\x{2000}
|
||||
0: \x{2000}
|
||||
ABC\x{202f}
|
||||
0: \x{202f}
|
||||
ABC\x{205f}
|
||||
0: \x{205f}
|
||||
ABC\x{3000}
|
||||
0: \x{3000}
|
||||
|
||||
/\v/SI
|
||||
Capturing subpattern count = 0
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
Subject length lower bound = 1
|
||||
Starting byte set: \x0a \x0b \x0c \x0d \x85
|
||||
|
||||
/\v/SI8
|
||||
Capturing subpattern count = 0
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
Subject length lower bound = 1
|
||||
Starting byte set: \x0a \x0b \x0c \x0d \xc2 \xe2
|
||||
ABC\x{0a}
|
||||
0: \x{0a}
|
||||
ABC\x{0b}
|
||||
0: \x{0b}
|
||||
ABC\x{0c}
|
||||
0: \x{0c}
|
||||
ABC\x{0d}
|
||||
0: \x{0d}
|
||||
ABC\x{85}
|
||||
0: \x{85}
|
||||
ABC\x{2028}
|
||||
0: \x{2028}
|
||||
|
||||
/\R/SI
|
||||
Capturing subpattern count = 0
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
Subject length lower bound = 2
|
||||
Starting byte set: \x0a \x0b \x0c \x0d \x85
|
||||
|
||||
/\R/SI8
|
||||
Capturing subpattern count = 0
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
Subject length lower bound = 2
|
||||
Starting byte set: \x0a \x0b \x0c \x0d \xc2 \xe2
|
||||
|
||||
/\h*A/SI8
|
||||
Capturing subpattern count = 0
|
||||
Options: utf8
|
||||
No first char
|
||||
Need char = 'A'
|
||||
Subject length lower bound = 1
|
||||
Starting byte set: \x09 \x20 A \xc2 \xe1 \xe2 \xe3
|
||||
CDBABC
|
||||
0: A
|
||||
|
||||
/\v+A/SI8
|
||||
Capturing subpattern count = 0
|
||||
Options: utf8
|
||||
No first char
|
||||
Need char = 'A'
|
||||
Subject length lower bound = 2
|
||||
Starting byte set: \x0a \x0b \x0c \x0d \xc2 \xe2
|
||||
|
||||
/\s?xxx\s/8SI
|
||||
Capturing subpattern count = 0
|
||||
Options: utf8
|
||||
No first char
|
||||
Need char = 'x'
|
||||
Subject length lower bound = 4
|
||||
Starting byte set: \x09 \x0a \x0c \x0d \x20 x
|
||||
|
||||
/\sxxx\s/8T1
|
||||
AB\x{85}xxx\x{a0}XYZ
|
||||
0: \x{85}xxx\x{a0}
|
||||
AB\x{a0}xxx\x{85}XYZ
|
||||
0: \x{a0}xxx\x{85}
|
||||
|
||||
/\sxxx\s/I8ST1
|
||||
Capturing subpattern count = 0
|
||||
Options: utf8
|
||||
No first char
|
||||
Need char = 'x'
|
||||
Subject length lower bound = 5
|
||||
Starting byte set: \x09 \x0a \x0c \x0d \x20 \xc2
|
||||
AB\x{85}xxx\x{a0}XYZ
|
||||
0: \x{85}xxx\x{a0}
|
||||
AB\x{a0}xxx\x{85}XYZ
|
||||
0: \x{a0}xxx\x{85}
|
||||
|
||||
/\S \S/8T1
|
||||
\x{a2} \x{84}
|
||||
0: \x{a2} \x{84}
|
||||
|
||||
/\S \S/I8ST1
|
||||
Capturing subpattern count = 0
|
||||
Options: utf8
|
||||
No first char
|
||||
Need char = ' '
|
||||
Subject length lower bound = 3
|
||||
Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e
|
||||
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
|
||||
\x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
|
||||
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e
|
||||
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3
|
||||
\xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2
|
||||
\xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1
|
||||
\xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0
|
||||
\xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
|
||||
\x{a2} \x{84}
|
||||
0: \x{a2} \x{84}
|
||||
A Z
|
||||
0: A Z
|
||||
|
||||
/-- End of testinput5 --/
|
||||
|
68
ext/pcre/pcrelib/testdata/testoutput6
vendored
@ -1285,4 +1285,72 @@ No match
|
||||
\x{10b00}\x{a6ef}\x{13007}\x{10857}\x{10b78}\x{10b58}\x{a980}\x{110c1}\x{a4ff}\x{abc0}\x{10a7d}\x{10c48}\x{0800}\x{1aad}\x{aac0}
|
||||
0: \x{10b00}\x{a6ef}\x{13007}\x{10857}\x{10b78}\x{10b58}\x{a980}\x{110c1}\x{a4ff}\x{abc0}\x{10a7d}\x{10c48}\x{800}\x{1aad}\x{aac0}
|
||||
|
||||
/^\w+/8W
|
||||
Az_\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee}
|
||||
0: Az_\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee}
|
||||
|
||||
/^[[:xdigit:]]*/8W
|
||||
1a\x{660}\x{bef}\x{16ee}
|
||||
0: 1a
|
||||
|
||||
/^\d+/8W
|
||||
1\x{660}\x{bef}\x{16ee}
|
||||
0: 1\x{660}\x{bef}
|
||||
|
||||
/^[[:digit:]]+/8W
|
||||
1\x{660}\x{bef}\x{16ee}
|
||||
0: 1\x{660}\x{bef}
|
||||
|
||||
/^>\s+/8W
|
||||
>\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b}
|
||||
0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{09}
|
||||
|
||||
/^>\pZ+/8W
|
||||
>\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b}
|
||||
0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f}
|
||||
|
||||
/^>[[:space:]]*/8W
|
||||
>\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b}
|
||||
0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{09}\x{0b}
|
||||
|
||||
/^>[[:blank:]]*/8W
|
||||
>\x{20}\x{a0}\x{1680}\x{180e}\x{2000}\x{202f}\x{9}\x{b}\x{2028}
|
||||
0: > \x{a0}\x{1680}\x{180e}\x{2000}\x{202f}\x{09}
|
||||
|
||||
/^[[:alpha:]]*/8W
|
||||
Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}
|
||||
0: Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}
|
||||
|
||||
/^[[:alnum:]]*/8W
|
||||
Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee}
|
||||
0: Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee}
|
||||
|
||||
/^[[:cntrl:]]*/8W
|
||||
\x{0}\x{09}\x{1f}\x{7f}\x{9f}
|
||||
0: \x{00}\x{09}\x{1f}\x{7f}
|
||||
|
||||
/^[[:graph:]]*/8W
|
||||
A\x{a1}\x{a0}
|
||||
0: A
|
||||
|
||||
/^[[:print:]]*/8W
|
||||
A z\x{a0}\x{a1}
|
||||
0: A z
|
||||
|
||||
/^[[:punct:]]*/8W
|
||||
.+\x{a1}\x{a0}
|
||||
0: .+
|
||||
|
||||
/\p{Zs}*?\R/
|
||||
** Failers
|
||||
No match
|
||||
a\xFCb
|
||||
No match
|
||||
|
||||
/\p{Zs}*\R/
|
||||
** Failers
|
||||
No match
|
||||
a\xFCb
|
||||
No match
|
||||
|
||||
/-- End of testinput6 --/
|
||||
|
360
ext/pcre/pcrelib/testdata/testoutput9
vendored
@ -1674,4 +1674,364 @@ No match
|
||||
\x{1d79}\x{a77d}
|
||||
No match
|
||||
|
||||
/^\p{Xan}/8
|
||||
ABCD
|
||||
0: A
|
||||
1234
|
||||
0: 1
|
||||
\x{6ca}
|
||||
0: \x{6ca}
|
||||
\x{a6c}
|
||||
0: \x{a6c}
|
||||
\x{10a7}
|
||||
0: \x{10a7}
|
||||
** Failers
|
||||
No match
|
||||
_ABC
|
||||
No match
|
||||
|
||||
/^\p{Xan}+/8
|
||||
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
|
||||
1: ABCD1234\x{6ca}\x{a6c}
|
||||
2: ABCD1234\x{6ca}
|
||||
3: ABCD1234
|
||||
4: ABCD123
|
||||
5: ABCD12
|
||||
6: ABCD1
|
||||
7: ABCD
|
||||
8: ABC
|
||||
9: AB
|
||||
10: A
|
||||
** Failers
|
||||
No match
|
||||
_ABC
|
||||
No match
|
||||
|
||||
/^\p{Xan}*/8
|
||||
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
|
||||
1: ABCD1234\x{6ca}\x{a6c}
|
||||
2: ABCD1234\x{6ca}
|
||||
3: ABCD1234
|
||||
4: ABCD123
|
||||
5: ABCD12
|
||||
6: ABCD1
|
||||
7: ABCD
|
||||
8: ABC
|
||||
9: AB
|
||||
10: A
|
||||
11:
|
||||
|
||||
/^\p{Xan}{2,9}/8
|
||||
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
0: ABCD1234\x{6ca}
|
||||
1: ABCD1234
|
||||
2: ABCD123
|
||||
3: ABCD12
|
||||
4: ABCD1
|
||||
5: ABCD
|
||||
6: ABC
|
||||
7: AB
|
||||
|
||||
/^[\p{Xan}]/8
|
||||
ABCD1234_
|
||||
0: A
|
||||
1234abcd_
|
||||
0: 1
|
||||
\x{6ca}
|
||||
0: \x{6ca}
|
||||
\x{a6c}
|
||||
0: \x{a6c}
|
||||
\x{10a7}
|
||||
0: \x{10a7}
|
||||
** Failers
|
||||
No match
|
||||
_ABC
|
||||
No match
|
||||
|
||||
/^[\p{Xan}]+/8
|
||||
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
|
||||
1: ABCD1234\x{6ca}\x{a6c}
|
||||
2: ABCD1234\x{6ca}
|
||||
3: ABCD1234
|
||||
4: ABCD123
|
||||
5: ABCD12
|
||||
6: ABCD1
|
||||
7: ABCD
|
||||
8: ABC
|
||||
9: AB
|
||||
10: A
|
||||
** Failers
|
||||
No match
|
||||
_ABC
|
||||
No match
|
||||
|
||||
/^>\p{Xsp}/8
|
||||
>\x{1680}\x{2028}\x{0b}
|
||||
0: >\x{1680}
|
||||
** Failers
|
||||
No match
|
||||
\x{0b}
|
||||
No match
|
||||
|
||||
/^>\p{Xsp}+/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
|
||||
1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
|
||||
2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
|
||||
3: > \x{09}\x{0a}\x{0c}\x{0d}
|
||||
4: > \x{09}\x{0a}\x{0c}
|
||||
5: > \x{09}\x{0a}
|
||||
6: > \x{09}
|
||||
7: >
|
||||
|
||||
/^>\p{Xsp}*/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
|
||||
1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
|
||||
2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
|
||||
3: > \x{09}\x{0a}\x{0c}\x{0d}
|
||||
4: > \x{09}\x{0a}\x{0c}
|
||||
5: > \x{09}\x{0a}
|
||||
6: > \x{09}
|
||||
7: >
|
||||
8: >
|
||||
|
||||
/^>\p{Xsp}{2,9}/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
|
||||
1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
|
||||
2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
|
||||
3: > \x{09}\x{0a}\x{0c}\x{0d}
|
||||
4: > \x{09}\x{0a}\x{0c}
|
||||
5: > \x{09}\x{0a}
|
||||
6: > \x{09}
|
||||
|
||||
/^>[\p{Xsp}]/8
|
||||
>\x{2028}\x{0b}
|
||||
0: >\x{2028}
|
||||
|
||||
/^>[\p{Xsp}]+/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
|
||||
1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
|
||||
2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
|
||||
3: > \x{09}\x{0a}\x{0c}\x{0d}
|
||||
4: > \x{09}\x{0a}\x{0c}
|
||||
5: > \x{09}\x{0a}
|
||||
6: > \x{09}
|
||||
7: >
|
||||
|
||||
/^>\p{Xps}/8
|
||||
>\x{1680}\x{2028}\x{0b}
|
||||
0: >\x{1680}
|
||||
>\x{a0}
|
||||
0: >\x{a0}
|
||||
** Failers
|
||||
No match
|
||||
\x{0b}
|
||||
No match
|
||||
|
||||
/^>\p{Xps}+/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
|
||||
2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
|
||||
3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
|
||||
4: > \x{09}\x{0a}\x{0c}\x{0d}
|
||||
5: > \x{09}\x{0a}\x{0c}
|
||||
6: > \x{09}\x{0a}
|
||||
7: > \x{09}
|
||||
8: >
|
||||
|
||||
/^>\p{Xps}+?/8
|
||||
>\x{1680}\x{2028}\x{0b}
|
||||
0: >\x{1680}\x{2028}\x{0b}
|
||||
1: >\x{1680}\x{2028}
|
||||
2: >\x{1680}
|
||||
|
||||
/^>\p{Xps}*/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
|
||||
2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
|
||||
3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
|
||||
4: > \x{09}\x{0a}\x{0c}\x{0d}
|
||||
5: > \x{09}\x{0a}\x{0c}
|
||||
6: > \x{09}\x{0a}
|
||||
7: > \x{09}
|
||||
8: >
|
||||
9: >
|
||||
|
||||
/^>\p{Xps}{2,9}/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
|
||||
2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
|
||||
3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
|
||||
4: > \x{09}\x{0a}\x{0c}\x{0d}
|
||||
5: > \x{09}\x{0a}\x{0c}
|
||||
6: > \x{09}\x{0a}
|
||||
7: > \x{09}
|
||||
|
||||
/^>\p{Xps}{2,9}?/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
|
||||
2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
|
||||
3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
|
||||
4: > \x{09}\x{0a}\x{0c}\x{0d}
|
||||
5: > \x{09}\x{0a}\x{0c}
|
||||
6: > \x{09}\x{0a}
|
||||
7: > \x{09}
|
||||
|
||||
/^>[\p{Xps}]/8
|
||||
>\x{2028}\x{0b}
|
||||
0: >\x{2028}
|
||||
|
||||
/^>[\p{Xps}]+/8
|
||||
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
|
||||
1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
|
||||
2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
|
||||
3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
|
||||
4: > \x{09}\x{0a}\x{0c}\x{0d}
|
||||
5: > \x{09}\x{0a}\x{0c}
|
||||
6: > \x{09}\x{0a}
|
||||
7: > \x{09}
|
||||
8: >
|
||||
|
||||
/^\p{Xwd}/8
|
||||
ABCD
|
||||
0: A
|
||||
1234
|
||||
0: 1
|
||||
\x{6ca}
|
||||
0: \x{6ca}
|
||||
\x{a6c}
|
||||
0: \x{a6c}
|
||||
\x{10a7}
|
||||
0: \x{10a7}
|
||||
_ABC
|
||||
0: _
|
||||
** Failers
|
||||
No match
|
||||
[]
|
||||
No match
|
||||
|
||||
/^\p{Xwd}+/8
|
||||
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
1: ABCD1234\x{6ca}\x{a6c}\x{10a7}
|
||||
2: ABCD1234\x{6ca}\x{a6c}
|
||||
3: ABCD1234\x{6ca}
|
||||
4: ABCD1234
|
||||
5: ABCD123
|
||||
6: ABCD12
|
||||
7: ABCD1
|
||||
8: ABCD
|
||||
9: ABC
|
||||
10: AB
|
||||
11: A
|
||||
|
||||
/^\p{Xwd}*/8
|
||||
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
1: ABCD1234\x{6ca}\x{a6c}\x{10a7}
|
||||
2: ABCD1234\x{6ca}\x{a6c}
|
||||
3: ABCD1234\x{6ca}
|
||||
4: ABCD1234
|
||||
5: ABCD123
|
||||
6: ABCD12
|
||||
7: ABCD1
|
||||
8: ABCD
|
||||
9: ABC
|
||||
10: AB
|
||||
11: A
|
||||
12:
|
||||
|
||||
/^\p{Xwd}{2,9}/8
|
||||
A_12\x{6ca}\x{a6c}\x{10a7}
|
||||
0: A_12\x{6ca}\x{a6c}\x{10a7}
|
||||
1: A_12\x{6ca}\x{a6c}
|
||||
2: A_12\x{6ca}
|
||||
3: A_12
|
||||
4: A_1
|
||||
5: A_
|
||||
|
||||
/^[\p{Xwd}]/8
|
||||
ABCD1234_
|
||||
0: A
|
||||
1234abcd_
|
||||
0: 1
|
||||
\x{6ca}
|
||||
0: \x{6ca}
|
||||
\x{a6c}
|
||||
0: \x{a6c}
|
||||
\x{10a7}
|
||||
0: \x{10a7}
|
||||
_ABC
|
||||
0: _
|
||||
** Failers
|
||||
No match
|
||||
[]
|
||||
No match
|
||||
|
||||
/^[\p{Xwd}]+/8
|
||||
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
|
||||
1: ABCD1234\x{6ca}\x{a6c}\x{10a7}
|
||||
2: ABCD1234\x{6ca}\x{a6c}
|
||||
3: ABCD1234\x{6ca}
|
||||
4: ABCD1234
|
||||
5: ABCD123
|
||||
6: ABCD12
|
||||
7: ABCD1
|
||||
8: ABCD
|
||||
9: ABC
|
||||
10: AB
|
||||
11: A
|
||||
|
||||
/-- Unicode properties for \b abd \B --/
|
||||
|
||||
/\b...\B/8W
|
||||
abc_
|
||||
0: abc
|
||||
\x{37e}abc\x{376}
|
||||
0: abc
|
||||
\x{37e}\x{376}\x{371}\x{393}\x{394}
|
||||
0: \x{376}\x{371}\x{393}
|
||||
!\x{c0}++\x{c1}\x{c2}
|
||||
0: ++\x{c1}
|
||||
!\x{c0}+++++
|
||||
0: \x{c0}++
|
||||
|
||||
/-- Without PCRE_UCP, non-ASCII always fail, even if < 256 --/
|
||||
|
||||
/\b...\B/8
|
||||
abc_
|
||||
0: abc
|
||||
** Failers
|
||||
0: Fai
|
||||
\x{37e}abc\x{376}
|
||||
No match
|
||||
\x{37e}\x{376}\x{371}\x{393}\x{394}
|
||||
No match
|
||||
!\x{c0}++\x{c1}\x{c2}
|
||||
No match
|
||||
!\x{c0}+++++
|
||||
No match
|
||||
|
||||
/-- With PCRE_UCP, non-UTF8 chars that are < 256 still check properties --/
|
||||
|
||||
/\b...\B/W
|
||||
abc_
|
||||
0: abc
|
||||
!\x{c0}++\x{c1}\x{c2}
|
||||
0: ++\xc1
|
||||
!\x{c0}+++++
|
||||
0: \xc0++
|
||||
|
||||
/-- End of testinput9 --/
|
||||
|