Update PCRE to 8.00

Scott MacVicar 2009-11-03 12:15:03 +00:00
parent 26e3082abc
commit f03b175f7c
42 changed files with 7020 additions and 3397 deletions


@@ -1,6 +1,170 @@
ChangeLog for PCRE
------------------
Version 8.00 19-Oct-09
----------------------
1. The table for translating pcre_compile() error codes into POSIX error codes
was out-of-date, and there was no check on the pcre_compile() error code
being within the table. This could lead to an OK return being given in
error.
2. Changed the call to open a subject file in pcregrep from fopen(pathname,
"r") to fopen(pathname, "rb"), which fixed a problem with some of the tests
in a Windows environment.
3. The pcregrep --count option prints the count for each file even when it is
zero, as does GNU grep. However, pcregrep was also printing all files when
--files-with-matches was added. Now, when both options are given, it prints
counts only for those files that have at least one match. (GNU grep just
prints the file name in this circumstance, but including the count seems
more useful - otherwise, why use --count?) Also ensured that the
combination -clh just lists non-zero counts, with no names.
4. The long form of the pcregrep -F option was incorrectly implemented as
--fixed_strings instead of --fixed-strings. This is an incompatible change,
but it seems right to fix it, and I didn't think it was worth preserving
the old behaviour.
5. The command line items --regex=pattern and --regexp=pattern were not
recognized by pcregrep, which required --regex pattern or --regexp pattern
(with a space rather than an '='). The man page documented the '=' forms,
which are compatible with GNU grep; these now work.
6. No libpcreposix.pc file was created for pkg-config; there was just
libpcre.pc and libpcrecpp.pc. The omission has been rectified.
7. Added #ifndef SUPPORT_UCP into the pcre_ucd.c module, to reduce its size
when UCP support is not needed, by modifying the Python script that
generates it from Unicode data files. This should not matter if the module
is correctly used as a library, but I received one complaint about 50K of
unwanted data. My guess is that the person linked everything into his
program rather than using a library. Anyway, it does no harm.
8. A pattern such as /\x{123}{2,2}+/8 was incorrectly compiled; the trigger
was a minimum greater than 1 for a wide character in a possessive
repetition. The same bug could also affect patterns like /(\x{ff}{0,2})*/8
which had an unlimited repeat of a nested, fixed maximum repeat of a wide
character. Chaos in the form of incorrect output or a compiling loop could
result.
9. The restrictions on what a pattern can contain when partial matching is
requested for pcre_exec() have been removed. All patterns can now be
partially matched by this function. In addition, if there are at least two
slots in the offset vector, the offset of the earliest inspected character
for the match and the offset of the end of the subject are set in them when
PCRE_ERROR_PARTIAL is returned.
10. Partial matching has been split into two forms: PCRE_PARTIAL_SOFT, which is
synonymous with PCRE_PARTIAL, for backwards compatibility, and
PCRE_PARTIAL_HARD, which causes a partial match to supersede a full match,
and may be more useful for multi-segment matching.
11. Partial matching with pcre_exec() is now more intuitive. A partial match
used to be given if ever the end of the subject was reached; now it is
given only if matching could not proceed because another character was
needed. This makes a difference in some odd cases such as Z(*FAIL) with the
string "Z", which now yields "no match" instead of "partial match". In the
case of pcre_dfa_exec(), "no match" is given if every matching path for the
final character ended with (*FAIL).
12. Restarting a match using pcre_dfa_exec() after a partial match did not work
if the pattern had a "must contain" character that was already found in the
earlier partial match, unless partial matching was again requested. For
example, with the pattern /dog.(body)?/, the "must contain" character is
"g". If the first part-match was for the string "dog", restarting with
"sbody" failed. This bug has been fixed.
13. The string returned by pcre_dfa_exec() after a partial match has been
changed so that it starts at the first inspected character rather than the
first character of the match. This makes a difference only if the pattern
starts with a lookbehind assertion or \b or \B (\K is not supported by
pcre_dfa_exec()). It's an incompatible change, but it makes the two
matching functions compatible, and I think it's the right thing to do.
14. Added a pcredemo man page, created automatically from the pcredemo.c file,
so that the demonstration program is easily available in environments where
PCRE has not been installed from source.
15. Arranged to add -DPCRE_STATIC to cflags in libpcre.pc, libpcreposix.pc,
libpcrecpp.pc and pcre-config when PCRE is not compiled as a shared
library.
16. Added REG_UNGREEDY to the pcreposix interface, at the request of a user.
It maps to PCRE_UNGREEDY. It is not, of course, POSIX-compatible, but it
is not the first non-POSIX option to be added. Clearly some people find
these options useful.
17. If a caller to the POSIX matching function regexec() passes a non-zero
value for nmatch with a NULL value for pmatch, the value of
nmatch is forced to zero.
18. RunGrepTest did not have a test for the availability of the -u option of
the diff command, as RunTest does. It now checks in the same way as
RunTest, and also checks for the -b option.
19. If an odd number of negated classes containing just a single character
interposed, within parentheses, between a forward reference to a named
subpattern and the definition of the subpattern, compilation crashed with
an internal error, complaining that it could not find the referenced
subpattern. An example of a crashing pattern is /(?&A)(([^m])(?<A>))/.
[The bug was that it was starting one character too far in when skipping
over the character class, thus treating the ] as data rather than
terminating the class. This meant it could skip too much.]
20. Added PCRE_NOTEMPTY_ATSTART in order to be able to correctly implement the
/g option in pcretest when the pattern contains \K, which makes it possible
to have an empty string match not at the start, even when the pattern is
anchored. Updated pcretest and pcredemo to use this option.
21. If the maximum number of capturing subpatterns in a recursion was greater
than the maximum at the outer level, the higher number was returned, but
with unset values at the outer level. The correct (outer level) value is
now given.
22. If (*ACCEPT) appeared inside capturing parentheses, previous releases of
PCRE did not set those parentheses (unlike Perl). I have now found a way to
make it do so. The string so far is captured, making this feature
compatible with Perl.
23. The tests have been re-organized, adding tests 11 and 12, to make it
possible to check the Perl 5.10 features against Perl 5.10.
24. Perl 5.10 allows subroutine calls in lookbehinds, as long as the subroutine
pattern matches a fixed length string. PCRE did not allow this; now it
does. Neither allows recursion.
25. I finally figured out how to implement a request to provide the minimum
length of subject string that was needed in order to match a given pattern.
(It was back references and recursion that I had previously got hung up
on.) This code has now been added to pcre_study(); it finds a lower bound
to the length of subject needed. It is not necessarily the greatest lower
bound, but using it to avoid searching strings that are too short does give
some useful speed-ups. The value is available to calling programs via
pcre_fullinfo().
26. While implementing 25, I discovered to my embarrassment that pcretest had
not been passing the result of pcre_study() to pcre_dfa_exec(), so the
study optimizations had never been tested with that matching function.
Oops. What is worse, even when it was passed study data, there was a bug in
pcre_dfa_exec() that meant it never actually used it. Double oops. There
were also very few tests of studied patterns with pcre_dfa_exec().
27. If (?| is used to create subpatterns with duplicate numbers, they are now
allowed to have the same name, even if PCRE_DUPNAMES is not set. However,
on the other side of the coin, they are no longer allowed to have different
names, because these cannot be distinguished in PCRE, and this has caused
confusion. (This is a difference from Perl.)
28. When duplicate subpattern names are present (necessarily with different
numbers, as required by 27 above), and a test is made by name in a
conditional pattern, either for a subpattern having been matched, or for
recursion in such a pattern, all the associated numbered subpatterns are
tested, and the overall condition is true if the condition is true for any
one of them. This is the way Perl works, and is also more like the way
testing by number works.
Version 7.9 11-Apr-09
---------------------


@@ -67,22 +67,22 @@ many tests of the mode that might slow it down. So I re-factored the compiling
functions to work this way. This got rid of about 600 lines of source. It
should make future maintenance and development easier. As this was such a major
change, I never released 6.8, instead upping the number to 7.0 (other quite
major changes are also present in the 7.0 release).
major changes were also present in the 7.0 release).
A side effect of this work is that the previous limit of 200 on the nesting
A side effect of this work was that the previous limit of 200 on the nesting
depth of parentheses was removed. However, there is a downside: pcre_compile()
runs more slowly than before (30% or more, depending on the pattern) because it
is doing a full analysis of the pattern. My hope is that this is not a big
issue.
is doing a full analysis of the pattern. My hope was that this would not be a
big issue, and in the event, nobody has commented on it.
Traditional matching function
-----------------------------
The "traditional", and original, matching function is called pcre_exec(), and
it implements an NFA algorithm, similar to the original Henry Spencer algorithm
and the way that Perl works. Not surprising, since it is intended to be as
compatible with Perl as possible. This is the function most users of PCRE will
use most of the time.
and the way that Perl works. This is not surprising, since it is intended to be
as compatible with Perl as possible. This is the function most users of PCRE
will use most of the time.
Supplementary matching function
-------------------------------
@@ -119,6 +119,7 @@ quantifiers) are always just two bytes long.
A list of the opcodes follows:
Opcodes with no following data
------------------------------
@@ -150,12 +151,12 @@ These items are all just one byte long
OP_EXTUNI match an extended Unicode character
OP_ANYNL match any Unicode newline sequence
OP_ACCEPT )
OP_COMMIT )
OP_FAIL ) These are Perl 5.10's "backtracking
OP_PRUNE ) control verbs".
OP_SKIP )
OP_THEN )
OP_ACCEPT ) These are Perl 5.10's "backtracking
OP_COMMIT ) control verbs". If OP_ACCEPT is inside
OP_FAIL ) capturing parentheses, it may be preceded
OP_PRUNE ) by one or more OP_CLOSE, followed by a 2-byte
OP_SKIP ) number, indicating which parentheses must be
OP_THEN ) closed.
Repeating single characters
@@ -372,12 +373,15 @@ These are like other subpatterns, but they start with the opcode OP_COND, or
OP_SCOND for one that might match an empty string in an unbounded repeat. If
the condition is a back reference, this is stored at the start of the
subpattern using the opcode OP_CREF followed by two bytes containing the
reference number. If the condition is "in recursion" (coded as "(?(R)"), or "in
recursion of group x" (coded as "(?(Rx)"), the group number is stored at the
start of the subpattern using the opcode OP_RREF, and a value of zero for "the
whole pattern". For a DEFINE condition, just the single byte OP_DEF is used (it
has no associated data). Otherwise, a conditional subpattern always starts with
one of the assertions.
reference number. OP_NCREF is used instead if the reference was generated by
name (so that the runtime code knows to check for duplicate names).
If the condition is "in recursion" (coded as "(?(R)"), or "in recursion of
group x" (coded as "(?(Rx)"), the group number is stored at the start of the
subpattern using the opcode OP_RREF or OP_NRREF (cf OP_NCREF), and a value of
zero for "the whole pattern". For a DEFINE condition, just the single byte
OP_DEF is used (it has no associated data). Otherwise, a conditional subpattern
always starts with one of the assertions.
Recursion
@@ -415,4 +419,4 @@ at compile time, and so does not cause anything to be put into the compiled
data.
Philip Hazel
April 2008
October 2009


@@ -4,7 +4,7 @@ PCRE LICENCE
PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Release 7 of PCRE is distributed under the terms of the "BSD" licence, as
Release 8 of PCRE is distributed under the terms of the "BSD" licence, as
specified below. The documentation for PCRE, supplied in the "doc"
directory, is distributed under the same terms as the software itself.


@@ -1,6 +1,21 @@
News about PCRE releases
------------------------
Release 8.00 19-Oct-09
----------------------
Bugs have been fixed in the library and in pcregrep. There are also some
enhancements. Restrictions on patterns used for partial matching have been
removed, extra information is given for partial matches, the partial matching
process has been improved, and an option to make a partial match override a
full match is available. The "study" process has been enhanced by finding a
lower bound matching length. Groups with duplicate numbers may now have
duplicated names without the use of PCRE_DUPNAMES. However, they may not have
different names. The documentation has been revised to reflect these changes.
The version number has been expanded to 3 digits as it is clear that the rate
of change is not slowing down.
Release 7.9 11-Apr-09
---------------------


@@ -12,9 +12,10 @@ This document contains the following sections:
Comments about Win32 builds
Building PCRE on Windows with CMake
Use of relative paths with CMake on Windows
Testing with runtest.bat
Testing with RunTest.bat
Building under Windows with BCC5.5
Building PCRE on OpenVMS
Building PCRE on Stratus OpenVOS
GENERAL
@@ -36,10 +37,10 @@ wrapper functions are a separate issue (see below).
The PCRE distribution includes a "configure" file for use by the Configure/Make
build system, as found in many Unix-like environments. There is also support
support for CMake, which some users prefer, in particular in Windows
environments. There are some instructions for CMake under Windows in the
section entitled "Building PCRE with CMake" below. CMake can also be used to
build PCRE in Unix-like systems.
support for CMake, which some users prefer, especially in Windows environments.
There are some instructions for CMake under Windows in the section entitled
"Building PCRE with CMake" below. CMake can also be used to build PCRE in
Unix-like systems.
GENERIC INSTRUCTIONS FOR THE PCRE C LIBRARY
@@ -278,40 +279,42 @@ things in this area in future.
BUILDING PCRE ON WINDOWS WITH CMAKE
CMake is an alternative build facility that can be used instead of the
traditional Unix "configure". CMake version 2.4.7 supports Borland makefiles,
MinGW makefiles, MSYS makefiles, NMake makefiles, UNIX makefiles, Visual Studio
6, Visual Studio 7, Visual Studio 8, and Watcom W8. The following instructions
CMake is an alternative configuration facility that can be used instead of the
traditional Unix "configure". CMake creates project files (make files, solution
files, etc.) tailored to numerous development environments, including Visual
Studio, Borland, Msys, MinGW, NMake, and Unix. The following instructions
were contributed by a PCRE user.
1. Download CMake 2.4.7 or above from http://www.cmake.org/, install and ensure
that cmake\bin is on your path.
1. Install the latest CMake version available from http://www.cmake.org/, and
ensure that cmake\bin is on your path.
2. Unzip (retaining folder structure) the PCRE source tree into a source
directory such as C:\pcre.
3. Create a new, empty build directory: C:\pcre\build\
3. Create a new, empty build directory, for example C:\pcre\build\
4. Run CMakeSetup from the Shell environment of your build tool, e.g., Msys
for Msys/MinGW or Visual Studio Command Prompt for VC/VC++
4. Run cmake-gui from the Shell environment of your build tool, for example,
Msys for Msys/MinGW or Visual Studio Command Prompt for VC/VC++.
5. Enter C:\pcre\pcre-xx and C:\pcre\build for the source and build
directories, respectively
directories, respectively.
6. Hit the "Configure" button.
7. Select the particular IDE / build tool that you are using (Visual Studio,
MSYS makefiles, MinGW makefiles, etc.)
7. Select the particular IDE / build tool that you are using (Visual
Studio, MSYS makefiles, MinGW makefiles, etc.)
8. The GUI will then list several configuration options. This is where you can
enable UTF-8 support, etc.
8. The GUI will then list several configuration options. This is where
you can enable UTF-8 support or other PCRE optional features.
9. Hit "Configure" again. The adjacent "OK" button should now be active.
9. Hit "Configure" again. The adjacent "Generate" button should now be
active.
10. Hit "OK".
10. Hit "Generate".
11. The build directory should now contain a usable build system, be it a
solution file for Visual Studio, makefiles for MinGW, etc.
solution file for Visual Studio, makefiles for MinGW, etc. Exit from
cmake-gui and use the generated build system with your compiler or IDE.
USE OF RELATIVE PATHS WITH CMAKE ON WINDOWS
@@ -444,5 +447,52 @@ $! Locale could not be set to fr
$!
=========================
Last Updated: 17 March 2009
BUILDING PCRE ON STRATUS OPENVOS
These notes on the port of PCRE to VOS (lightly edited) were supplied by
Ashutosh Warikoo, whose email address has the local part awarikoo and the
domain nse.co.in. The port was for version 7.9 in August 2009.
1. Building PCRE
I built pcre on OpenVOS Release 17.0.1at using GNU Tools 3.4a without any
problems. I used the following packages to build PCRE:
ftp://ftp.stratus.com/pub/vos/posix/ga/posix.save.evf.gz
Please read and follow the instructions that come with these packages. To start
the build of pcre, from the root of the package type:
./build.sh
2. Installing PCRE
Once you have successfully built PCRE, log in to the SysAdmin group, switch to
the root user, and type
[ !create_dir (master_disk)>usr --if needed ]
[ !create_dir (master_disk)>usr>local --if needed ]
!gmake install
This installs PCRE and its man pages into /usr/local. You can add
(master_disk)>usr>local>bin to your command search paths, or if you are in
BASH, add /usr/local/bin to the PATH environment variable.
4. Restrictions
This port can optionally use the readline library. However, during the build I
encountered some errors, not yet investigated, while linking with readline. As
it was an optional component, I chose to disable it.
5. Known Problems
I ran the test suite, but you will have to be your own judge of whether this
command, and this port, suits your purposes. If you find any problems that
appear to be related to the port itself, please let me know. Please see the
build.log file in the root of the package also.
=========================
Last Updated: 05 October 2009
****


@@ -24,6 +24,7 @@ The contents of this README file are:
Shared libraries on Unix-like systems
Cross-compiling on Unix-like systems
Using HP's ANSI C++ compiler (aCC)
Using PCRE from MySQL
Making new tarballs
Testing PCRE
Character tables
@@ -111,8 +112,8 @@ Building PCRE on non-Unix systems
For a non-Unix system, please read the comments in the file NON-UNIX-USE,
though if your system supports the use of "configure" and "make" you may be
able to build PCRE in the same way as for Unix-like systems. PCRE can also be
configured in many platform environments using the GUI facility of CMake's
CMakeSetup. It creates Makefiles, solution files, etc.
configured in many platform environments using the GUI facility provided by
CMake's cmake-gui command. This creates Makefiles, solution files, etc.
PCRE has been compiled on many different operating systems. It should be
straightforward to build PCRE on any system that has a Standard C compiler and
@@ -478,6 +479,26 @@ running the "configure" script:
CXXLDFLAGS="-lstd_v2 -lCsup_v2"
Using Sun's compilers for Solaris
---------------------------------
A user reports that the following configurations work on Solaris 9 sparcv9 and
Solaris 9 x86 (32-bit):
Solaris 9 sparcv9: ./configure --disable-cpp CC=/bin/cc CFLAGS="-m64 -g"
Solaris 9 x86: ./configure --disable-cpp CC=/bin/cc CFLAGS="-g"
Using PCRE from MySQL
---------------------
On systems where both PCRE and MySQL are installed, it is possible to make use
of PCRE from within MySQL, as an alternative to the built-in pattern matching.
There is a web page that tells you how to do this:
http://www.mysqludf.org/lib_mysqludf_preg/index.php
Making new tarballs
-------------------
@@ -553,22 +574,32 @@ document entitled NON-UNIX-USE.]
The fourth test checks the UTF-8 support. It is not run automatically unless
PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
running "configure". This file can also be fed directly to the perltest script,
provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
commented in the script, can be used.)
running "configure". This file can also be fed directly to the perltest.pl
script, provided you are running Perl 5.8 or higher.
The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
features of PCRE that are not relevant to Perl.
The sixth test checks the support for Unicode character properties. It is not
run automatically unless PCRE is built with Unicode property support. To do
this you must set --enable-unicode-properties when running "configure".
The sixth test (which is Perl-5.10 compatible) checks the support for Unicode
character properties. It is not run automatically unless PCRE is built with
Unicode property support. To do this you must set --enable-unicode-properties
when running "configure".
The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
property support, respectively. The eighth and ninth tests are not run
automatically unless PCRE is built with the relevant support.
The tenth test checks some internal offsets and code size features; it is run
only when the default "link size" of 2 is set (in other cases the sizes
change).
The eleventh test checks features that are new in Perl 5.10, and the
twelfth test checks a number of internals and non-Perl features concerned with
Unicode property support. It is not run automatically unless PCRE is built with
Unicode property support. To do this you must set --enable-unicode-properties
when running "configure".
Character tables
----------------
@@ -712,7 +743,7 @@ The distribution should contain the following files:
) "configure" and config.h
depcomp ) script to find program dependencies, generated by
) automake
doc/*.3 man page sources for the PCRE functions
doc/*.3 man page sources for PCRE
doc/*.1 man page sources for pcregrep and pcretest
doc/index.html.src the base HTML page
doc/html/* HTML documentation
@@ -721,6 +752,7 @@ The distribution should contain the following files:
doc/perltest.txt plain text documentation of Perl test program
install-sh a shell script for installing files
libpcre.pc.in template for libpcre.pc for pkg-config
libpcreposix.pc.in template for libpcreposix.pc for pkg-config
libpcrecpp.pc.in template for libpcrecpp.pc for pkg-config
ltmain.sh file used to build a libtool script
missing ) common stub for a few missing GNU programs while
@@ -764,4 +796,4 @@ The distribution should contain the following files:
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 21 March 2009
Last updated: 19 October 2009


@@ -196,6 +196,12 @@ them both to 0; an emulation function will be used. */
#define LINK_SIZE 2
#endif
/* Define to the sub-directory in which libtool stores uninstalled libraries.
*/
#ifndef LT_OBJDIR
#define LT_OBJDIR ".libs/"
#endif
/* The value of MATCH_LIMIT determines the default number of times the
internal match() function can be called during a single execution of
pcre_exec(). There is a runtime interface for setting a different limit.
@@ -262,13 +268,13 @@ them both to 0; an emulation function will be used. */
#define PACKAGE_NAME "PCRE"
/* Define to the full name and version of this package. */
#define PACKAGE_STRING "PCRE 7.9"
#define PACKAGE_STRING "PCRE 8.00"
/* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "pcre"
/* Define to the version of this package. */
#define PACKAGE_VERSION "7.9"
#define PACKAGE_VERSION "8.00"
/* If you are compiling for a system other than a Unix-like system or
@@ -324,7 +330,7 @@ them both to 0; an emulation function will be used. */
/* Version number of package */
#ifndef VERSION
#define VERSION "7.9"
#define VERSION "8.00"
#endif
/* Define to empty if `const' does not conform to ANSI C. */

File diff suppressed because it is too large.


@@ -41,10 +41,10 @@ POSSIBILITY OF SUCH DAMAGE.
/* The current PCRE version information. */
#define PCRE_MAJOR 7
#define PCRE_MINOR 9
#define PCRE_MAJOR 8
#define PCRE_MINOR 00
#define PCRE_PRERELEASE
#define PCRE_DATE 2009-04-11
#define PCRE_DATE 2009-10-19
/* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE, the appropriate
@@ -113,7 +113,8 @@ both, so we keep them all distinct. */
#define PCRE_NO_AUTO_CAPTURE 0x00001000
#define PCRE_NO_UTF8_CHECK 0x00002000
#define PCRE_AUTO_CALLOUT 0x00004000
#define PCRE_PARTIAL 0x00008000
#define PCRE_PARTIAL_SOFT 0x00008000
#define PCRE_PARTIAL 0x00008000 /* Backwards compatible synonym */
#define PCRE_DFA_SHORTEST 0x00010000
#define PCRE_DFA_RESTART 0x00020000
#define PCRE_FIRSTLINE 0x00040000
@@ -128,6 +129,8 @@ both, so we keep them all distinct. */
#define PCRE_JAVASCRIPT_COMPAT 0x02000000
#define PCRE_NO_START_OPTIMIZE 0x04000000
#define PCRE_NO_START_OPTIMISE 0x04000000
#define PCRE_PARTIAL_HARD 0x08000000
#define PCRE_NOTEMPTY_ATSTART 0x10000000
/* Exec-time and get/set-time error codes */
@@ -174,6 +177,7 @@ both, so we keep them all distinct. */
#define PCRE_INFO_OKPARTIAL 12
#define PCRE_INFO_JCHANGED 13
#define PCRE_INFO_HASCRORLF 14
#define PCRE_INFO_MINLENGTH 15
/* Request types for pcre_config(). Do not re-arrange, in order to remain
compatible. */


@@ -339,7 +339,9 @@ static const char error_texts[] =
"number is too big\0"
"subpattern name expected\0"
"digit expected after (?+\0"
"] is an invalid data character in JavaScript compatibility mode";
"] is an invalid data character in JavaScript compatibility mode\0"
/* 65 */
"different names for subpatterns of the same number are not allowed";
/* Table to identify digits and hex digits. This is used when compiling
@@ -1098,6 +1100,7 @@ if (ptr[0] == CHAR_LEFT_PARENTHESIS)
if (name != NULL && lorn == ptr - thisname &&
strncmp((const char *)name, (const char *)thisname, lorn) == 0)
return *count;
term++;
}
}
}
@@ -1132,19 +1135,21 @@ for (; *ptr != 0; ptr++)
BOOL negate_class = FALSE;
for (;;)
{
int c = *(++ptr);
if (c == CHAR_BACKSLASH)
if (ptr[1] == CHAR_BACKSLASH)
{
if (ptr[1] == CHAR_E)
ptr++;
else if (strncmp((const char *)ptr+1,
if (ptr[2] == CHAR_E)
ptr+= 2;
else if (strncmp((const char *)ptr+2,
STR_Q STR_BACKSLASH STR_E, 3) == 0)
ptr += 3;
ptr += 4;
else
break;
}
else if (!negate_class && c == CHAR_CIRCUMFLEX_ACCENT)
else if (!negate_class && ptr[1] == CHAR_CIRCUMFLEX_ACCENT)
{
negate_class = TRUE;
ptr++;
}
else break;
}
@@ -1310,7 +1315,9 @@ for (;;)
case OP_CALLOUT:
case OP_CREF:
case OP_NCREF:
case OP_RREF:
case OP_NRREF:
case OP_DEF:
code += _pcre_OP_lengths[*code];
break;
@@ -1326,23 +1333,34 @@ for (;;)
/*************************************************
* Find the fixed length of a pattern *
* Find the fixed length of a branch *
*************************************************/
/* Scan a pattern and compute the fixed length of subject that will match it,
/* Scan a branch and compute the fixed length of subject that will match it,
if the length is fixed. This is needed for dealing with backward assertions.
In UTF8 mode, the result is in characters rather than bytes.
In UTF8 mode, the result is in characters rather than bytes. The branch is
temporarily terminated with OP_END when this function is called.
This function is called when a backward assertion is encountered, so that if it
fails, the error message can point to the correct place in the pattern.
However, we cannot do this when the assertion contains subroutine calls,
because they can be forward references. We solve this by remembering this case
and doing the check at the end; a flag specifies which mode we are running in.
Arguments:
code points to the start of the pattern (the bracket)
options the compiling options
atend TRUE if called when the pattern is complete
cd the "compile data" structure
Returns: the fixed length, or -1 if there is no fixed length,
Returns: the fixed length,
or -1 if there is no fixed length,
or -2 if \C was encountered
or -3 if an OP_RECURSE item was encountered and atend is FALSE
*/
static int
find_fixedlength(uschar *code, int options)
find_fixedlength(uschar *code, int options, BOOL atend, compile_data *cd)
{
int length = -1;
@@ -1355,6 +1373,7 @@ branch, check the length against that of the other branches. */
for (;;)
{
int d;
uschar *ce, *cs;
register int op = *cc;
switch (op)
{
@@ -1362,7 +1381,7 @@ for (;;)
case OP_BRA:
case OP_ONCE:
case OP_COND:
d = find_fixedlength(cc + ((op == OP_CBRA)? 2:0), options);
d = find_fixedlength(cc + ((op == OP_CBRA)? 2:0), options, atend, cd);
if (d < 0) return d;
branchlength += d;
do cc += GET(cc, 1); while (*cc == OP_ALT);
@@ -1385,6 +1404,21 @@ for (;;)
branchlength = 0;
break;
/* A true recursion implies not fixed length, but a subroutine call may
be OK. If the subroutine is a forward reference, we can't deal with
it until the end of the pattern, so return -3. */
case OP_RECURSE:
if (!atend) return -3;
cs = ce = (uschar *)cd->start_code + GET(cc, 1); /* Start subpattern */
do ce += GET(ce, 1); while (*ce == OP_ALT); /* End subpattern */
if (cc > cs && cc < ce) return -1; /* Recursion */
d = find_fixedlength(cs + 2, options, atend, cd);
if (d < 0) return d;
branchlength += d;
cc += 1 + LINK_SIZE;
break;
/* Skip over assertive subpatterns */
case OP_ASSERT:
@@ -1398,7 +1432,9 @@ for (;;)
case OP_REVERSE:
case OP_CREF:
case OP_NCREF:
case OP_RREF:
case OP_NRREF:
case OP_DEF:
case OP_OPT:
case OP_CALLOUT:
@@ -1421,10 +1457,8 @@ for (;;)
branchlength++;
cc += 2;
#ifdef SUPPORT_UTF8
if ((options & PCRE_UTF8) != 0)
{
while ((*cc & 0xc0) == 0x80) cc++;
}
if ((options & PCRE_UTF8) != 0 && cc[-1] >= 0xc0)
cc += _pcre_utf8_table4[cc[-1] & 0x3f];
#endif
break;
@@ -1435,10 +1469,8 @@ for (;;)
branchlength += GET2(cc,1);
cc += 4;
#ifdef SUPPORT_UTF8
if ((options & PCRE_UTF8) != 0)
{
while((*cc & 0x80) == 0x80) cc++;
}
if ((options & PCRE_UTF8) != 0 && cc[-1] >= 0xc0)
cc += _pcre_utf8_table4[cc[-1] & 0x3f];
#endif
break;
@@ -1517,22 +1549,25 @@ for (;;)
/*************************************************
* Scan compiled regex for numbered bracket *
* Scan compiled regex for specific bracket *
*************************************************/
/* This little function scans through a compiled pattern until it finds a
capturing bracket with the given number.
capturing bracket with the given number, or, if the number is negative, an
instance of OP_REVERSE for a lookbehind. The function is global in the C sense
so that it can be called from pcre_study() when finding the minimum matching
length.
Arguments:
code points to start of expression
utf8 TRUE in UTF-8 mode
number the required bracket number
number the required bracket number or negative to find a lookbehind
Returns: pointer to the opcode for the bracket, or NULL if not found
*/
static const uschar *
find_bracket(const uschar *code, BOOL utf8, int number)
const uschar *
_pcre_find_bracket(const uschar *code, BOOL utf8, int number)
{
for (;;)
{
@ -1545,6 +1580,14 @@ for (;;)
if (c == OP_XCLASS) code += GET(code, 1);
/* Handle lookbehind (OP_REVERSE) */
else if (c == OP_REVERSE)
{
if (number < 0) return (uschar *)code;
code += _pcre_OP_lengths[c];
}
/* Handle capturing bracket */
else if (c == OP_CBRA)
@ -1910,10 +1953,13 @@ for (code = first_significant_code(code + _pcre_OP_lengths[*code], NULL, 0, TRUE
case OP_QUERY:
case OP_MINQUERY:
case OP_POSQUERY:
if (utf8 && code[1] >= 0xc0) code += _pcre_utf8_table4[code[1] & 0x3f];
break;
case OP_UPTO:
case OP_MINUPTO:
case OP_POSUPTO:
if (utf8) while ((code[2] & 0xc0) == 0x80) code++;
if (utf8 && code[3] >= 0xc0) code += _pcre_utf8_table4[code[3] & 0x3f];
break;
#endif
}
@ -3867,10 +3913,15 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
if (repeat_max == 0) goto END_REPEAT;
/*--------------------------------------------------------------------*/
/* This code is obsolete from release 8.00; the restriction was finally
removed: */
/* All real repeats make it impossible to handle partial matching (maybe
one day we will be able to remove this restriction). */
if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL;
/* if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL; */
/*--------------------------------------------------------------------*/
/* Combine the op_type with the repeat_type */
@ -4017,10 +4068,15 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
goto END_REPEAT;
}
/*--------------------------------------------------------------------*/
/* This code is obsolete from release 8.00; the restriction was finally
removed: */
/* All real repeats make it impossible to handle partial matching (maybe
one day we will be able to remove this restriction). */
if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL;
/* if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL; */
/*--------------------------------------------------------------------*/
if (repeat_min == 0 && repeat_max == -1)
*code++ = OP_CRSTAR + repeat_type;
@ -4335,11 +4391,20 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
if (possessive_quantifier)
{
int len;
if (*tempcode == OP_EXACT || *tempcode == OP_TYPEEXACT ||
*tempcode == OP_NOTEXACT)
if (*tempcode == OP_TYPEEXACT)
tempcode += _pcre_OP_lengths[*tempcode] +
((*tempcode == OP_TYPEEXACT &&
(tempcode[3] == OP_PROP || tempcode[3] == OP_NOTPROP))? 2:0);
((tempcode[3] == OP_PROP || tempcode[3] == OP_NOTPROP)? 2 : 0);
else if (*tempcode == OP_EXACT || *tempcode == OP_NOTEXACT)
{
tempcode += _pcre_OP_lengths[*tempcode];
#ifdef SUPPORT_UTF8
if (utf8 && tempcode[-1] >= 0xc0)
tempcode += _pcre_utf8_table4[tempcode[-1] & 0x3f];
#endif
}
len = code - tempcode;
if (len > 0) switch (*tempcode)
{
@ -4417,8 +4482,19 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
if (namelen == verbs[i].len &&
strncmp((char *)name, vn, namelen) == 0)
{
*code = verbs[i].op;
if (*code++ == OP_ACCEPT) cd->had_accept = TRUE;
/* Check for open captures before ACCEPT */
if (verbs[i].op == OP_ACCEPT)
{
open_capitem *oc;
cd->had_accept = TRUE;
for (oc = cd->open_caps; oc != NULL; oc = oc->next)
{
*code++ = OP_CLOSE;
PUT2INC(code, 0, oc->number);
}
}
*code++ = verbs[i].op;
break;
}
vn += verbs[i].len + 1;
@ -4580,7 +4656,10 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
}
/* Otherwise (did not start with "+" or "-"), start by looking for the
name. */
name. If we find a name, add one to the opcode to change OP_CREF or
OP_RREF into OP_NCREF or OP_NRREF. These behave exactly the same,
except they record that the reference was originally to a name. The
information is used to check duplicate names. */
slot = cd->name_table;
for (i = 0; i < cd->names_found; i++)
@ -4595,6 +4674,7 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
{
recno = GET2(slot, 0);
PUT2(code, 2+LINK_SIZE, recno);
code[1+LINK_SIZE]++;
}
/* Search the pattern for a forward reference */
@ -4603,6 +4683,7 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
(options & PCRE_EXTENDED) != 0)) > 0)
{
PUT2(code, 2+LINK_SIZE, i);
code[1+LINK_SIZE]++;
}
/* If terminator == 0 it means that the name followed directly after
@ -4795,11 +4876,24 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
}
}
/* In the real compile, create the entry in the table */
/* In the real compile, create the entry in the table, maintaining
alphabetical order. Duplicate names for different numbers are
permitted only if PCRE_DUPNAMES is set. Duplicate names for the same
number are always OK. (An existing number can be re-used if (?|
appears in the pattern.) In either event, a duplicate name results in
a duplicate entry in the table, even if the number is the same. This
is because the number of names, and hence the table size, is computed
in the pre-compile, and it affects various numbers and pointers which
would all have to be modified, and the compiled code moved down, if
duplicates with the same number were omitted from the table. This
doesn't seem worth the hassle. However, *different* names for the
same number are not permitted. */
else
{
BOOL dupname = FALSE;
slot = cd->name_table;
for (i = 0; i < cd->names_found; i++)
{
int crc = memcmp(name, slot+2, namelen);
@ -4807,33 +4901,66 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
{
if (slot[2+namelen] == 0)
{
if ((options & PCRE_DUPNAMES) == 0)
if (GET2(slot, 0) != cd->bracount + 1 &&
(options & PCRE_DUPNAMES) == 0)
{
*errorcodeptr = ERR43;
goto FAILED;
}
else dupname = TRUE;
}
else crc = -1; /* Current name is substring */
else crc = -1; /* Current name is a substring */
}
/* Make space in the table and break the loop for an earlier
name. For a duplicate or later name, carry on. We do this for
duplicates so that in the simple case (when (?| is not used) they
are in order of their numbers. */
if (crc < 0)
{
memmove(slot + cd->name_entry_size, slot,
(cd->names_found - i) * cd->name_entry_size);
break;
}
/* Continue the loop for a later or duplicate name */
slot += cd->name_entry_size;
}
/* For non-duplicate names, check for a duplicate number before
adding the new name. */
if (!dupname)
{
uschar *cslot = cd->name_table;
for (i = 0; i < cd->names_found; i++)
{
if (cslot != slot)
{
if (GET2(cslot, 0) == cd->bracount + 1)
{
*errorcodeptr = ERR65;
goto FAILED;
}
}
else i--;
cslot += cd->name_entry_size;
}
}
PUT2(slot, 0, cd->bracount + 1);
memcpy(slot + 2, name, namelen);
slot[2+namelen] = 0;
}
}
/* In both cases, count the number of names we've encountered. */
/* In both pre-compile and compile, count the number of names we've
encountered. */
ptr++; /* Move past > or ' */
cd->names_found++;
ptr++; /* Move past > or ' */
goto NUMBERED_GROUP;
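The name-table rules described in the comment above can be exercised on their own. These rules are alphabetical order, a duplicate placed after earlier equal names, ERR43 for a reused name with a different number unless PCRE_DUPNAMES is set, and ERR65 for a second name on the same number. The sketch below is a simplified stand-alone version of the table update. `add_name`, `entry_number`, the DEMO_ERR* codes and the fixed ENTRY_SIZE are hypothetical. The group number is stored big-endian as PUT2 does, and the duplicate-number check runs before the memmove rather than after it.

```c
#include <assert.h>
#include <string.h>

#define MAX_NAME    8
#define ENTRY_SIZE  (2 + MAX_NAME + 1)   /* number, name, terminating zero */
#define DEMO_ERR43  (-43)                /* same name, different number */
#define DEMO_ERR65  (-65)                /* different names, same number */

static int entry_number(const unsigned char *slot)
{
  return (slot[0] << 8) | slot[1];
}

/* Insert name/number into a sorted table of *count entries.
   Returns 0 on success or one of the DEMO_ERR codes. */
static int add_name(unsigned char *table, int *count, const char *name,
                    int number, int dupnames)
{
  size_t namelen = strlen(name);
  unsigned char *slot = table;
  int dupname = 0, i;

  assert(namelen > 0 && namelen <= MAX_NAME);
  for (i = 0; i < *count; i++, slot += ENTRY_SIZE)
  {
    int crc = memcmp(name, slot + 2, namelen);
    if (crc == 0)
    {
      if (slot[2 + namelen] == 0)          /* exact match */
      {
        if (entry_number(slot) != number && !dupnames) return DEMO_ERR43;
        dupname = 1;                       /* keep going: goes after it */
      }
      else crc = -1;                       /* new name is a prefix: goes first */
    }
    if (crc < 0) break;                    /* insertion point found */
  }

  if (!dupname)                            /* a new name must not reuse a number */
  {
    for (int j = 0; j < *count; j++)
      if (entry_number(table + j * ENTRY_SIZE) == number) return DEMO_ERR65;
  }

  memmove(slot + ENTRY_SIZE, slot, (size_t)(*count - i) * ENTRY_SIZE);
  slot[0] = (unsigned char)(number >> 8);
  slot[1] = (unsigned char)(number & 255);
  memcpy(slot + 2, name, namelen);
  slot[2 + namelen] = 0;
  (*count)++;
  return 0;
}
```

Inserting "b" (1), "a" (2) and "ab" (3) gives the order a, ab, b. Adding "c" for group 1 afterwards is then rejected as a second name for the same number.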
@ -5002,7 +5129,8 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
if (lengthptr == NULL)
{
*code = OP_END;
if (recno != 0) called = find_bracket(cd->start_code, utf8, recno);
if (recno != 0)
called = _pcre_find_bracket(cd->start_code, utf8, recno);
/* Forward reference */
@ -5646,6 +5774,8 @@ uschar *code = *codeptr;
uschar *last_branch = code;
uschar *start_bracket = code;
uschar *reverse_count = NULL;
open_capitem capitem;
int capnumber = 0;
int firstbyte, reqbyte;
int branchfirstbyte, branchreqbyte;
int length;
@ -5672,6 +5802,17 @@ the code that abstracts option settings at the start of the pattern and makes
them global. It tests the value of length for (2 + 2*LINK_SIZE) in the
pre-compile phase to find out whether anything has yet been compiled or not. */
/* If this is a capturing subpattern, add to the chain of open capturing items
so that we can detect them if (*ACCEPT) is encountered. */
if (*code == OP_CBRA)
{
capnumber = GET2(code, 1 + LINK_SIZE);
capitem.number = capnumber;
capitem.next = cd->open_caps;
cd->open_caps = &capitem;
}
/* Offset is set zero to mark that this bracket is still open */
PUT(code, 1, 0);
@ -5766,21 +5907,29 @@ for (;;)
/* If lookbehind, check that this branch matches a fixed-length string, and
put the length into the OP_REVERSE item. Temporarily mark the end of the
branch with OP_END. */
branch with OP_END. If the branch contains OP_RECURSE, the result is -3
because there may be forward references that we can't check here. Set a
flag to cause another lookbehind check at the end. Why not do it all at the
end? Because common, erroneous checks are picked up here and the offset of
the problem can be shown. */
if (lookbehind)
{
int fixed_length;
*code = OP_END;
fixed_length = find_fixedlength(last_branch, options);
fixed_length = find_fixedlength(last_branch, options, FALSE, cd);
DPRINTF(("fixed length = %d\n", fixed_length));
if (fixed_length < 0)
if (fixed_length == -3)
{
cd->check_lookbehind = TRUE;
}
else if (fixed_length < 0)
{
*errorcodeptr = (fixed_length == -2)? ERR36 : ERR25;
*ptrptr = ptr;
return FALSE;
}
PUT(reverse_count, 0, fixed_length);
else { PUT(reverse_count, 0, fixed_length); }
}
}
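The lookbehind check now distinguishes three kinds of result from find_fixedlength(). A non-negative value is a length. A hard error (-1 or -2) is reported at once with a useful offset. A result of -3 only sets cd->check_lookbehind, so that the OP_REVERSE items are measured again with atend set once forward references are resolved. The toy below isolates that protocol. Its item/group representation and the `active` array used to detect recursion are hypothetical simplifications. The real function instead checks whether the OP_RECURSE lies inside the group it calls, and, like the code above, it defers every call while atend is false, not only forward ones.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ITEM_LIT, ITEM_STAR, ITEM_CALL } item_kind;

typedef struct {
  item_kind kind;
  int value;              /* ITEM_LIT: characters; ITEM_CALL: group index */
} item;

typedef struct {
  const item *items;
  int nitems;
} group;

/* Fixed length of group g; -1 if it is variable (unbounded repeat or
   recursion); -3 if it contains a subroutine call and atend is false,
   meaning "check again once the whole pattern is compiled". */
static int toy_fixedlength(const group *groups, int g, bool atend, bool *active)
{
  int length = 0;
  assert(g >= 0);
  if (active[g]) return -1;               /* g calls itself: a recursion */
  active[g] = true;
  for (int i = 0; i < groups[g].nitems; i++)
  {
    const item *it = &groups[g].items[i];
    int d;
    switch (it->kind)
    {
      case ITEM_LIT:  d = it->value; break;
      case ITEM_STAR: d = -1; break;
      default:        /* ITEM_CALL */
        d = atend ? toy_fixedlength(groups, it->value, atend, active) : -3;
        break;
    }
    if (d < 0) { active[g] = false; return d; }
    length += d;
  }
  active[g] = false;
  return length;
}
```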
@ -5808,6 +5957,10 @@ for (;;)
while (branch_length > 0);
}
/* If it was a capturing subpattern, remove it from the chain. */
if (capnumber > 0) cd->open_caps = cd->open_caps->next;
/* Fill in the ket */
*code = OP_KET;
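The open-capture chain managed in this function is a stack of records living in compile_regex() stack frames. An item is pushed when a capturing bracket starts, popped before its ket, and walked when (*ACCEPT) is compiled. The sketch below mimics this with a recursive toy. `compile_group`, `emit_accept` and the DEMO_* opcode values (copied from the enum comments in pcre_internal.h) are hypothetical. The OP_CLOSE layout, an opcode followed by a two-byte group number, follows the 3-byte CLOSE entry in the opcode length table and the PUT2INC call used for (*ACCEPT).

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_ACCEPT = 112, DEMO_CLOSE = 113 };  /* values from the enum comments */

typedef struct open_capitem {
  struct open_capitem *next;   /* next outer open group, or NULL */
  unsigned short number;       /* capture number */
} open_capitem;

/* Emit OP_CLOSE <hi> <lo> for every still-open capture, innermost first,
   followed by OP_ACCEPT. Returns the advanced code pointer. */
static unsigned char *emit_accept(unsigned char *code, const open_capitem *chain)
{
  for (const open_capitem *oc = chain; oc != NULL; oc = oc->next)
  {
    *code++ = DEMO_CLOSE;
    *code++ = (unsigned char)(oc->number >> 8);   /* big-endian, like PUT2INC */
    *code++ = (unsigned char)(oc->number & 255);
  }
  *code++ = DEMO_ACCEPT;
  return code;
}

/* Compile capturing groups number..last nested inside each other, with
   (*ACCEPT) in the innermost. Each level pushes a stack-allocated item on
   entry and pops it before its ket. */
static unsigned char *compile_group(unsigned char *code, open_capitem **open_caps,
                                    int number, int last)
{
  open_capitem capitem;
  assert(number >= 1 && number <= last);
  capitem.number = (unsigned short)number;
  capitem.next = *open_caps;
  *open_caps = &capitem;                       /* push */

  if (number == last)
    code = emit_accept(code, *open_caps);
  else
    code = compile_group(code, open_caps, number + 1, last);

  *open_caps = (*open_caps)->next;             /* pop */
  return code;
}
```

Because the head of the chain is always the innermost open group, three nested groups with (*ACCEPT) inside the third produce CLOSE 3, CLOSE 2, CLOSE 1, ACCEPT.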
@ -6010,7 +6163,9 @@ do {
switch (*scode)
{
case OP_CREF:
case OP_NCREF:
case OP_RREF:
case OP_NRREF:
case OP_DEF:
return FALSE;
@ -6179,9 +6334,7 @@ int length = 1; /* For final END opcode */
int firstbyte, reqbyte, newline;
int errorcode = 0;
int skipatstart = 0;
#ifdef SUPPORT_UTF8
BOOL utf8;
#endif
BOOL utf8 = (options & PCRE_UTF8) != 0;
size_t size;
uschar *code;
const uschar *codestart;
@ -6278,7 +6431,6 @@ while (ptr[skipatstart] == CHAR_LEFT_PARENTHESIS &&
/* Can't support UTF8 unless PCRE has been compiled to include the code. */
#ifdef SUPPORT_UTF8
utf8 = (options & PCRE_UTF8) != 0;
if (utf8 && (options & PCRE_NO_UTF8_CHECK) == 0 &&
(*erroroffset = _pcre_valid_utf8((uschar *)pattern, -1)) >= 0)
{
@ -6286,7 +6438,7 @@ if (utf8 && (options & PCRE_NO_UTF8_CHECK) == 0 &&
goto PCRE_EARLY_ERROR_RETURN2;
}
#else
if ((options & PCRE_UTF8) != 0)
if (utf8)
{
errorcode = ERR32;
goto PCRE_EARLY_ERROR_RETURN;
@ -6375,6 +6527,7 @@ cd->end_pattern = (const uschar *)(pattern + strlen(pattern));
cd->req_varyopt = 0;
cd->external_options = options;
cd->external_flags = 0;
cd->open_caps = NULL;
/* Now do the pre-compile. On error, errorcode will be set non-zero, so we
don't need to look at the result of the function here. The initial options have
@ -6449,6 +6602,8 @@ cd->start_code = codestart;
cd->hwm = cworkspace;
cd->req_varyopt = 0;
cd->had_accept = FALSE;
cd->check_lookbehind = FALSE;
cd->open_caps = NULL;
/* Set up a starting, non-extracting bracket, then compile the expression. On
error, errorcode will be set non-zero, so we don't need to look at the result
@ -6487,7 +6642,7 @@ while (errorcode == 0 && cd->hwm > cworkspace)
cd->hwm -= LINK_SIZE;
offset = GET(cd->hwm, 0);
recno = GET(codestart, offset);
groupptr = find_bracket(codestart, (re->options & PCRE_UTF8) != 0, recno);
groupptr = _pcre_find_bracket(codestart, utf8, recno);
if (groupptr == NULL) errorcode = ERR53;
else PUT(((uschar *)codestart), offset, groupptr - codestart);
}
@ -6497,6 +6652,47 @@ subpattern. */
if (errorcode == 0 && re->top_backref > re->top_bracket) errorcode = ERR15;
/* If there were any lookbehind assertions that contained OP_RECURSE
(recursions or subroutine calls), a flag is set for them to be checked here,
because they may contain forward references. Actual recursions can't be fixed
length, but subroutine calls can. It is done like this so that those without
OP_RECURSE that are not fixed length get a diagnostic with a useful offset. The
exceptional ones forgo this. We scan the pattern to check that they are fixed
length, and set their lengths. */
if (cd->check_lookbehind)
{
uschar *cc = (uschar *)codestart;
/* Loop, searching for OP_REVERSE items, and process those that do not have
their length set. (Actually, it will also re-process any that have a length
of zero, but that is a pathological case, and it does no harm.) When we find
one, we temporarily terminate the branch it is in while we scan it. */
for (cc = (uschar *)_pcre_find_bracket(codestart, utf8, -1);
cc != NULL;
cc = (uschar *)_pcre_find_bracket(cc, utf8, -1))
{
if (GET(cc, 1) == 0)
{
int fixed_length;
uschar *be = cc - 1 - LINK_SIZE + GET(cc, -LINK_SIZE);
int end_op = *be;
*be = OP_END;
fixed_length = find_fixedlength(cc, re->options, TRUE, cd);
*be = end_op;
DPRINTF(("fixed length = %d\n", fixed_length));
if (fixed_length < 0)
{
errorcode = (fixed_length == -2)? ERR36 : ERR25;
break;
}
PUT(cc, 1, fixed_length);
}
cc += 1 + LINK_SIZE;
}
}
/* Failed to compile, or error while post-processing */
if (errorcode != 0)

File diff suppressed because it is too large


@ -117,10 +117,16 @@ switch (what)
case PCRE_INFO_FIRSTTABLE:
*((const uschar **)where) =
(study != NULL && (study->options & PCRE_STUDY_MAPPED) != 0)?
(study != NULL && (study->flags & PCRE_STUDY_MAPPED) != 0)?
((const pcre_study_data *)extra_data->study_data)->start_bits : NULL;
break;
case PCRE_INFO_MINLENGTH:
*((int *)where) =
(study != NULL && (study->flags & PCRE_STUDY_MINLEN) != 0)?
study->minlength : -1;
break;
case PCRE_INFO_LASTLITERAL:
*((int *)where) =
((re->flags & PCRE_REQCHSET) != 0)? re->req_byte : -1;
@ -142,6 +148,9 @@ switch (what)
*((const uschar **)where) = (const uschar *)(_pcre_default_tables);
break;
/* From release 8.00 this will always return TRUE because NOPARTIAL is
no longer ever set (the restrictions have been removed). */
case PCRE_INFO_OKPARTIAL:
*((int *)where) = (re->flags & PCRE_NOPARTIAL) == 0;
break;
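pcre_fullinfo() now tests the private `flags` word of the study block instead of `options`. It reports the new minimum length only when PCRE_STUDY_MINLEN is set. A cut-down sketch of those two lookups follows. The `demo_study` struct and the `demo_*` helpers are hypothetical, and the DEMO_STUDY_* values are copied from the PCRE_STUDY_* defines in pcre_internal.h.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_STUDY_MAPPED 0x01   /* same value as PCRE_STUDY_MAPPED */
#define DEMO_STUDY_MINLEN 0x02   /* same value as PCRE_STUDY_MINLEN */

/* Simplified stand-in for pcre_study_data. */
typedef struct {
  unsigned int flags;            /* private flags (formerly "options") */
  unsigned char start_bits[32];  /* starting character bitmap */
  unsigned int minlength;        /* minimum subject length */
} demo_study;

/* As for PCRE_INFO_FIRSTTABLE: the bitmap, or NULL if none was built. */
static const unsigned char *demo_firsttable(const demo_study *study)
{
  return (study != NULL && (study->flags & DEMO_STUDY_MAPPED) != 0) ?
    study->start_bits : NULL;
}

/* As for PCRE_INFO_MINLENGTH: the length, or -1 if it was not computed. */
static int demo_minlength(const demo_study *study)
{
  return (study != NULL && (study->flags & DEMO_STUDY_MINLEN) != 0) ?
    (int)study->minlength : -1;
}
```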


@ -535,7 +535,9 @@ Standard C system should have one. */
/* Private flags containing information about the compiled regex. They used to
live at the top end of the options word, but that got almost full, so now they
are in a 16-bit flags word. */
are in a 16-bit flags word. From release 8.00, PCRE_NOPARTIAL is unused, as
the restrictions on partial matching have been lifted. It remains for backwards
compatibility. */
#define PCRE_NOPARTIAL 0x0001 /* can't use partial with this regex */
#define PCRE_FIRSTSET 0x0002 /* first_byte is set */
@ -547,6 +549,7 @@ are in a 16-bit flags word. */
/* Options for the "extra" block produced by pcre_study(). */
#define PCRE_STUDY_MAPPED 0x01 /* a map of starting chars exists */
#define PCRE_STUDY_MINLEN 0x02 /* a minimum length field exists */
/* Masks for identifying the public options that are permitted at compile
time, run time, or study time, respectively. */
@ -562,14 +565,15 @@ time, run time, or study time, respectively. */
PCRE_JAVASCRIPT_COMPAT)
#define PUBLIC_EXEC_OPTIONS \
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NO_UTF8_CHECK| \
PCRE_PARTIAL|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE| \
PCRE_NO_START_OPTIMIZE)
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NOTEMPTY_ATSTART| \
PCRE_NO_UTF8_CHECK|PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT|PCRE_NEWLINE_BITS| \
PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE|PCRE_NO_START_OPTIMIZE)
#define PUBLIC_DFA_EXEC_OPTIONS \
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NO_UTF8_CHECK| \
PCRE_PARTIAL|PCRE_DFA_SHORTEST|PCRE_DFA_RESTART|PCRE_NEWLINE_BITS| \
PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE|PCRE_NO_START_OPTIMIZE)
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NOTEMPTY_ATSTART| \
PCRE_NO_UTF8_CHECK|PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT|PCRE_DFA_SHORTEST| \
PCRE_DFA_RESTART|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE| \
PCRE_NO_START_OPTIMIZE)
#define PUBLIC_STUDY_OPTIONS 0 /* None defined */
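Masks such as PUBLIC_EXEC_OPTIONS let an entry point reject any unrecognized option bit with a single AND. This is why the exec masks must be extended when run-time options such as PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_SOFT and PCRE_PARTIAL_HARD are introduced. A minimal sketch of such a check follows. The DEMO_* bit values and the error code are hypothetical and do not match pcre.h.

```c
#include <assert.h>

/* Hypothetical bit values for this sketch only. */
#define DEMO_ANCHORED          0x0001u
#define DEMO_NOTEMPTY          0x0002u
#define DEMO_NOTEMPTY_ATSTART  0x0004u
#define DEMO_PARTIAL_SOFT      0x0008u
#define DEMO_PARTIAL_HARD      0x0010u
#define DEMO_CASELESS          0x0020u   /* compile-time only */

#define DEMO_EXEC_OPTIONS \
  (DEMO_ANCHORED|DEMO_NOTEMPTY|DEMO_NOTEMPTY_ATSTART| \
   DEMO_PARTIAL_SOFT|DEMO_PARTIAL_HARD)

#define DEMO_ERROR_BADOPTION (-3)

/* Reject any option bit that is not in the permitted run-time mask. */
static int check_exec_options(unsigned int options)
{
  if ((options & ~DEMO_EXEC_OPTIONS) != 0) return DEMO_ERROR_BADOPTION;
  return 0;
}
```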
@ -1206,8 +1210,8 @@ enum { ESC_A = 1, ESC_G, ESC_K, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s,
OP_EOD must correspond in order to the list of escapes immediately above.
*** NOTE NOTE NOTE *** Whenever this list is updated, the two macro definitions
that follow must also be updated to match. There is also a table called
"coptable" in pcre_dfa_exec.c that must be updated. */
that follow must also be updated to match. There are also tables called
"coptable" and "poptable" in pcre_dfa_exec.c that must be updated. */
enum {
OP_END, /* 0 End of pattern */
@ -1343,30 +1347,39 @@ enum {
OP_SCBRA, /* 98 Start of capturing bracket, check empty */
OP_SCOND, /* 99 Conditional group, check empty */
OP_CREF, /* 100 Used to hold a capture number as condition */
OP_RREF, /* 101 Used to hold a recursion number as condition */
OP_DEF, /* 102 The DEFINE condition */
/* The next two pairs must (respectively) be kept together. */
OP_BRAZERO, /* 103 These two must remain together and in this */
OP_BRAMINZERO, /* 104 order. */
OP_CREF, /* 100 Used to hold a capture number as condition */
OP_NCREF, /* 101 Same, but generated by a name reference */
OP_RREF, /* 102 Used to hold a recursion number as condition */
OP_NRREF, /* 103 Same, but generated by a name reference */
OP_DEF, /* 104 The DEFINE condition */
OP_BRAZERO, /* 105 These two must remain together and in this */
OP_BRAMINZERO, /* 106 order. */
/* These are backtracking control verbs */
OP_PRUNE, /* 105 */
OP_SKIP, /* 106 */
OP_THEN, /* 107 */
OP_COMMIT, /* 108 */
OP_PRUNE, /* 107 */
OP_SKIP, /* 108 */
OP_THEN, /* 109 */
OP_COMMIT, /* 110 */
/* These are forced failure and success verbs */
OP_FAIL, /* 109 */
OP_ACCEPT, /* 110 */
OP_FAIL, /* 111 */
OP_ACCEPT, /* 112 */
OP_CLOSE, /* 113 Used before OP_ACCEPT to close open captures */
/* This is used to skip a subpattern with a {0} quantifier */
OP_SKIPZERO /* 111 */
OP_SKIPZERO /* 114 */
};
/* *** NOTE NOTE NOTE *** Whenever the list above is updated, the two macro
definitions that follow must also be updated to match. There are also tables
called "coptable" and "poptable" in pcre_dfa_exec.c that must be updated. */
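The NOTE comments depend on the opcode enum, the name and length macros, and the DFA tables being kept in step by hand. The name-reference code in pcre_compile.c also now relies on OP_NCREF and OP_NRREF immediately following OP_CREF and OP_RREF, because it converts one into the other with `code[1+LINK_SIZE]++`. A C11 `_Static_assert` can turn both assumptions into build failures. The sketch below uses a cut-down opcode list with the numbers shown above, and OP_END_OF_DEMO is a hypothetical sentinel counting the opcodes.

```c
#include <assert.h>

/* Cut-down opcode list for illustration; values follow the enum above. */
enum {
  OP_CREF_DEMO = 100,
  OP_NCREF_DEMO,          /* must be OP_CREF_DEMO + 1 */
  OP_RREF_DEMO,
  OP_NRREF_DEMO,          /* must be OP_RREF_DEMO + 1 */
  OP_DEF_DEMO,
  OP_END_OF_DEMO          /* hypothetical sentinel */
};

#define DEMO_FIRST OP_CREF_DEMO
#define DEMO_COUNT (OP_END_OF_DEMO - DEMO_FIRST)

static const char *const demo_names[] = {
  "Cond ref", "Cond nref", "Cond rec", "Cond nrec", "Cond def" };

static const unsigned char demo_lengths[] = { 3, 3, 3, 3, 1 };

/* Fail the build if a table falls out of step with the enum. */
_Static_assert(sizeof(demo_names) / sizeof(demo_names[0]) == DEMO_COUNT,
               "name table does not match opcode enum");
_Static_assert(sizeof(demo_lengths) == DEMO_COUNT,
               "length table does not match opcode enum");
_Static_assert(OP_NCREF_DEMO == OP_CREF_DEMO + 1 &&
               OP_NRREF_DEMO == OP_RREF_DEMO + 1,
               "name-reference opcodes must follow their numbered forms");

/* The "add one to the opcode" conversion used for named conditions. */
static unsigned char to_named_condition(unsigned char op)
{
  assert(op == OP_CREF_DEMO || op == OP_RREF_DEMO);
  return (unsigned char)(op + 1);
}
```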
/* This macro defines textual names for all the opcodes. These are used only
for debugging. The macro is referenced only in pcre_printint.c. */
@ -1388,9 +1401,10 @@ for debugging. The macro is referenced only in pcre_printint.c. */
"Alt", "Ket", "KetRmax", "KetRmin", "Assert", "Assert not", \
"AssertB", "AssertB not", "Reverse", \
"Once", "Bra", "CBra", "Cond", "SBra", "SCBra", "SCond", \
"Cond ref", "Cond rec", "Cond def", "Brazero", "Braminzero", \
"Cond ref", "Cond nref", "Cond rec", "Cond nrec", "Cond def", \
"Brazero", "Braminzero", \
"*PRUNE", "*SKIP", "*THEN", "*COMMIT", "*FAIL", "*ACCEPT", \
"Skip zero"
"Close", "Skip zero"
/* This macro defines the length of fixed length operations in the compiled
@ -1450,15 +1464,16 @@ in UTF-8 mode. The code that uses this table must know about such things. */
1+LINK_SIZE, /* SBRA */ \
3+LINK_SIZE, /* SCBRA */ \
1+LINK_SIZE, /* SCOND */ \
3, /* CREF */ \
3, /* RREF */ \
3, 3, /* CREF, NCREF */ \
3, 3, /* RREF, NRREF */ \
1, /* DEF */ \
1, 1, /* BRAZERO, BRAMINZERO */ \
1, 1, 1, 1, /* PRUNE, SKIP, THEN, COMMIT, */ \
1, 1, 1 /* FAIL, ACCEPT, SKIPZERO */
1, 1, 3, 1 /* FAIL, ACCEPT, CLOSE, SKIPZERO */
/* A magic value for OP_RREF to indicate the "any recursion" condition. */
/* A magic value for OP_RREF and OP_NRREF to indicate the "any recursion"
condition. */
#define RREF_ANY 0xffff
@ -1471,7 +1486,7 @@ enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9,
ERR30, ERR31, ERR32, ERR33, ERR34, ERR35, ERR36, ERR37, ERR38, ERR39,
ERR40, ERR41, ERR42, ERR43, ERR44, ERR45, ERR46, ERR47, ERR48, ERR49,
ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
ERR60, ERR61, ERR62, ERR63, ERR64 };
ERR60, ERR61, ERR62, ERR63, ERR64, ERR65 };
/* The real format of the start of the pcre block; the index of names and the
code vector run on as long as necessary after the end. We store an explicit
@ -1487,7 +1502,7 @@ Because people can now save and re-use compiled patterns, any additions to this
structure should be made at the end, and something earlier (e.g. a new
flag in the options or one of the dummy fields) should indicate that the new
fields are present. Currently PCRE always sets the dummy fields to zero.
NOTE NOTE NOTE:
NOTE NOTE NOTE
*/
typedef struct real_pcre {
@ -1514,10 +1529,20 @@ remark (see NOTE above) about extending this structure applies. */
typedef struct pcre_study_data {
pcre_uint32 size; /* Total that was malloced */
pcre_uint32 options;
uschar start_bits[32];
pcre_uint32 flags; /* Private flags */
uschar start_bits[32]; /* Starting char bits */
pcre_uint32 minlength; /* Minimum subject length */
} pcre_study_data;
/* Structure for building a chain of open capturing subpatterns during
compiling, so that instructions to close them can be compiled when (*ACCEPT) is
encountered. */
typedef struct open_capitem {
struct open_capitem *next; /* Chain link */
pcre_uint16 number; /* Capture number */
} open_capitem;
/* Structure for passing "static" information around between the functions
doing the compiling, so that they are thread-safe. */
@ -1530,6 +1555,7 @@ typedef struct compile_data {
const uschar *start_code; /* The start of the compiled code */
const uschar *start_pattern; /* The start of the pattern */
const uschar *end_pattern; /* The end of the pattern */
open_capitem *open_caps; /* Chain of open capture items */
uschar *hwm; /* High watermark of workspace */
uschar *name_table; /* The name/number table */
int names_found; /* Number of entries so far */
@ -1542,6 +1568,7 @@ typedef struct compile_data {
int external_flags; /* External flag bits to be set */
int req_varyopt; /* "After variable item" flag for reqbyte */
BOOL had_accept; /* (*ACCEPT) encountered */
BOOL check_lookbehind; /* Lookbehinds need later checking */
int nltype; /* Newline type */
int nllen; /* Newline string length */
uschar nl[4]; /* Newline string when fixed length */
@ -1565,6 +1592,7 @@ typedef struct recursion_info {
USPTR save_start; /* Old value of mstart */
int *offset_save; /* Pointer to start of saved offsets */
int saved_max; /* Number of saved offsets */
int save_offset_top; /* Current value of offset_top */
} recursion_info;
/* Structure for building a chain of data for holding the values of the subject
@ -1589,6 +1617,9 @@ typedef struct match_data {
int offset_max; /* The maximum usable for return data */
int nltype; /* Newline type */
int nllen; /* Newline string length */
int name_count; /* Number of names in name table */
int name_entry_size; /* Size of entry in names table */
uschar *name_table; /* Table of names */
uschar nl[4]; /* Newline string when fixed */
const uschar *lcc; /* Points to lower casing table */
const uschar *ctypes; /* Points to table of type maps */
@ -1599,7 +1630,7 @@ typedef struct match_data {
BOOL jscript_compat; /* JAVASCRIPT_COMPAT flag */
BOOL endonly; /* Dollar not before final \n */
BOOL notempty; /* Empty string match not wanted */
BOOL partial; /* PARTIAL flag */
BOOL notempty_atstart; /* Empty string match at start not wanted */
BOOL hitend; /* Hit the end of the subject at some point */
BOOL bsr_anycrlf; /* \R is just any CRLF, not full Unicode */
const uschar *start_code; /* For use when recursing */
@ -1607,6 +1638,8 @@ typedef struct match_data {
USPTR end_subject; /* End of the subject string */
USPTR start_match_ptr; /* Start of matched string */
USPTR end_match_ptr; /* Subject position at end match */
USPTR start_used_ptr; /* Earliest consulted character */
int partial; /* PARTIAL options */
int end_offset_top; /* Highwater mark at end of match */
int capture_last; /* Most recent capture number */
int start_offset; /* The start offset value */
@ -1623,7 +1656,9 @@ typedef struct dfa_match_data {
const uschar *start_code; /* Start of the compiled pattern */
const uschar *start_subject; /* Start of the subject string */
const uschar *end_subject; /* End of subject string */
const uschar *start_used_ptr; /* Earliest consulted character */
const uschar *tables; /* Character tables */
int start_offset; /* The start offset value */
int moptions; /* Match options */
int poptions; /* Pattern options */
int nltype; /* Newline type */
@ -1702,15 +1737,16 @@ extern const uschar _pcre_OP_lengths[];
one of the exported public functions. They have to be "external" in the C
sense, but are not part of the PCRE public API. */
extern BOOL _pcre_is_newline(const uschar *, int, const uschar *,
int *, BOOL);
extern int _pcre_ord2utf8(int, uschar *);
extern real_pcre *_pcre_try_flipped(const real_pcre *, real_pcre *,
const pcre_study_data *, pcre_study_data *);
extern int _pcre_valid_utf8(const uschar *, int);
extern BOOL _pcre_was_newline(const uschar *, int, const uschar *,
int *, BOOL);
extern BOOL _pcre_xclass(int, const uschar *);
extern const uschar *_pcre_find_bracket(const uschar *, BOOL, int);
extern BOOL _pcre_is_newline(const uschar *, int, const uschar *,
int *, BOOL);
extern int _pcre_ord2utf8(int, uschar *);
extern real_pcre *_pcre_try_flipped(const real_pcre *, real_pcre *,
const pcre_study_data *, pcre_study_data *);
extern int _pcre_valid_utf8(const uschar *, int);
extern BOOL _pcre_was_newline(const uschar *, int, const uschar *,
int *, BOOL);
extern BOOL _pcre_xclass(int, const uschar *);
/* Unicode character database (UCD) */


@ -246,7 +246,12 @@ for(;;)
fprintf(f, "%s", OP_names[*code]);
break;
case OP_CLOSE:
fprintf(f, " %s %d", OP_names[*code], GET2(code, 1));
break;
case OP_CREF:
case OP_NCREF:
fprintf(f, "%3d %s", GET2(code,1), OP_names[*code]);
break;
@ -258,6 +263,14 @@ for(;;)
fprintf(f, " Cond recurse %d", c);
break;
case OP_NRREF:
c = GET2(code, 1);
if (c == RREF_ANY)
fprintf(f, " Cond nrecurse any");
else
fprintf(f, " Cond nrecurse %d", c);
break;
case OP_DEF:
fprintf(f, " Cond def");
break;
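The new pcre_printint.src cases decode the two-byte operand after OP_CLOSE and OP_NRREF, and print the special RREF_ANY value (0xffff) as "any". A self-contained sketch of the same formatting into a buffer follows. `format_item`, `get2` and the DEMO_* opcode values (copied from the enum comments in pcre_internal.h) are hypothetical, and get2 assumes the big-endian layout that GET2 reads.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { DEMO_NRREF = 103, DEMO_CLOSE = 113 };   /* values from the enum comments */
#define DEMO_RREF_ANY 0xffff

/* Two-byte big-endian operand, as read by GET2(code, 1). */
static int get2(const unsigned char *p) { return (p[0] << 8) | p[1]; }

/* Format one 3-byte item the way the new printint cases do. */
static void format_item(const unsigned char *code, char *buf, size_t size)
{
  int c = get2(code + 1);
  switch (code[0])
  {
    case DEMO_CLOSE:
      snprintf(buf, size, " %s %d", "Close", c);
      break;
    case DEMO_NRREF:
      if (c == DEMO_RREF_ANY) snprintf(buf, size, " Cond nrecurse any");
      else snprintf(buf, size, " Cond nrecurse %d", c);
      break;
    default:
      snprintf(buf, size, " ??");
      break;
  }
}
```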


@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2008 University of Cambridge
Copyright (c) 1997-2009 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -52,6 +52,364 @@ supporting functions. */
enum { SSB_FAIL, SSB_DONE, SSB_CONTINUE };
/*************************************************
* Find the minimum subject length for a group *
*************************************************/
/* Scan a parenthesized group and compute the minimum length of subject that
is needed to match it. This is a lower bound; it does not mean there is a
string of that length that matches. In UTF8 mode, the result is in characters
rather than bytes.
Arguments:
code pointer to start of group (the bracket)
startcode pointer to start of the whole pattern
options the compiling options
Returns: the minimum length
-1 if \C was encountered
-2 internal error (missing capturing bracket)
*/
static int
find_minlength(const uschar *code, const uschar *startcode, int options)
{
int length = -1;
BOOL utf8 = (options & PCRE_UTF8) != 0;
BOOL had_recurse = FALSE;
register int branchlength = 0;
register uschar *cc = (uschar *)code + 1 + LINK_SIZE;
if (*code == OP_CBRA || *code == OP_SCBRA) cc += 2;
/* Scan along the opcodes for this branch. If we get to the end of the
branch, check the length against that of the other branches. */
for (;;)
{
int d, min;
uschar *cs, *ce;
register int op = *cc;
switch (op)
{
case OP_CBRA:
case OP_SCBRA:
case OP_BRA:
case OP_SBRA:
case OP_ONCE:
case OP_COND:
case OP_SCOND:
d = find_minlength(cc, startcode, options);
if (d < 0) return d;
branchlength += d;
do cc += GET(cc, 1); while (*cc == OP_ALT);
cc += 1 + LINK_SIZE;
break;
/* Reached end of a branch; if it's a ket it is the end of a nested
call. If it's ALT it is an alternation in a nested call. If it is
END it's the end of the outer call. All can be handled by the same code. */
case OP_ALT:
case OP_KET:
case OP_KETRMAX:
case OP_KETRMIN:
case OP_END:
if (length < 0 || (!had_recurse && branchlength < length))
length = branchlength;
if (*cc != OP_ALT) return length;
cc += 1 + LINK_SIZE;
branchlength = 0;
had_recurse = FALSE;
break;
/* Skip over assertive subpatterns */
case OP_ASSERT:
case OP_ASSERT_NOT:
case OP_ASSERTBACK:
case OP_ASSERTBACK_NOT:
do cc += GET(cc, 1); while (*cc == OP_ALT);
/* Fall through */
/* Skip over things that don't match chars */
case OP_REVERSE:
case OP_CREF:
case OP_NCREF:
case OP_RREF:
case OP_NRREF:
case OP_DEF:
case OP_OPT:
case OP_CALLOUT:
case OP_SOD:
case OP_SOM:
case OP_EOD:
case OP_EODN:
case OP_CIRC:
case OP_DOLL:
case OP_NOT_WORD_BOUNDARY:
case OP_WORD_BOUNDARY:
cc += _pcre_OP_lengths[*cc];
break;
/* Skip over a subpattern that has a {0} or {0,x} quantifier */
case OP_BRAZERO:
case OP_BRAMINZERO:
case OP_SKIPZERO:
cc += _pcre_OP_lengths[*cc];
do cc += GET(cc, 1); while (*cc == OP_ALT);
cc += 1 + LINK_SIZE;
break;
/* Handle literal characters and + repetitions */
case OP_CHAR:
case OP_CHARNC:
case OP_NOT:
case OP_PLUS:
case OP_MINPLUS:
case OP_POSPLUS:
case OP_NOTPLUS:
case OP_NOTMINPLUS:
case OP_NOTPOSPLUS:
branchlength++;
cc += 2;
#ifdef SUPPORT_UTF8
if (utf8 && cc[-1] >= 0xc0) cc += _pcre_utf8_table4[cc[-1] & 0x3f];
#endif
break;
case OP_TYPEPLUS:
case OP_TYPEMINPLUS:
case OP_TYPEPOSPLUS:
branchlength++;
cc += (cc[1] == OP_PROP || cc[1] == OP_NOTPROP)? 4 : 2;
break;
/* Handle exact repetitions. The count is already in characters, but we
need to skip over a multibyte character in UTF8 mode. */
case OP_EXACT:
case OP_NOTEXACT:
branchlength += GET2(cc,1);
cc += 4;
#ifdef SUPPORT_UTF8
if (utf8 && cc[-1] >= 0xc0) cc += _pcre_utf8_table4[cc[-1] & 0x3f];
#endif
break;
case OP_TYPEEXACT:
branchlength += GET2(cc,1);
cc += (cc[3] == OP_PROP || cc[3] == OP_NOTPROP)? 6 : 4;
break;
/* Handle single-char non-literal matchers */
case OP_PROP:
case OP_NOTPROP:
cc += 2;
/* Fall through */
case OP_NOT_DIGIT:
case OP_DIGIT:
case OP_NOT_WHITESPACE:
case OP_WHITESPACE:
case OP_NOT_WORDCHAR:
case OP_WORDCHAR:
case OP_ANY:
case OP_ALLANY:
case OP_EXTUNI:
case OP_HSPACE:
case OP_NOT_HSPACE:
case OP_VSPACE:
case OP_NOT_VSPACE:
branchlength++;
cc++;
break;
/* "Any newline" might match two characters */
case OP_ANYNL:
branchlength += 2;
cc++;
break;
/* The single-byte matcher means we can't proceed in UTF-8 mode */
case OP_ANYBYTE:
#ifdef SUPPORT_UTF8
if (utf8) return -1;
#endif
branchlength++;
cc++;
break;
/* For repeated character types, we have to test for \p and \P, which have
an extra two bytes of parameters. */
case OP_TYPESTAR:
case OP_TYPEMINSTAR:
case OP_TYPEQUERY:
case OP_TYPEMINQUERY:
case OP_TYPEPOSSTAR:
case OP_TYPEPOSQUERY:
if (cc[1] == OP_PROP || cc[1] == OP_NOTPROP) cc += 2;
cc += _pcre_OP_lengths[op];
break;
case OP_TYPEUPTO:
case OP_TYPEMINUPTO:
case OP_TYPEPOSUPTO:
if (cc[3] == OP_PROP || cc[3] == OP_NOTPROP) cc += 2;
cc += _pcre_OP_lengths[op];
break;
/* Check a class for variable quantification */
#ifdef SUPPORT_UTF8
case OP_XCLASS:
cc += GET(cc, 1) - 33;
/* Fall through */
#endif
case OP_CLASS:
case OP_NCLASS:
cc += 33;
switch (*cc)
{
case OP_CRPLUS:
case OP_CRMINPLUS:
branchlength++;
/* Fall through */
case OP_CRSTAR:
case OP_CRMINSTAR:
case OP_CRQUERY:
case OP_CRMINQUERY:
cc++;
break;
case OP_CRRANGE:
case OP_CRMINRANGE:
branchlength += GET2(cc,1);
cc += 5;
break;
default:
branchlength++;
break;
}
break;
/* Backreferences and subroutine calls are treated in the same way: we find
the minimum length for the subpattern. A recursion, however, sets a flag
that causes the length of this branch to be ignored. The
logic is that a recursion can only make sense if there is another
alternation that stops the recursing. That will provide the minimum length
(when no recursion happens). A backreference within the group that it is
referencing behaves in the same way.
If PCRE_JAVASCRIPT_COMPAT is set, a backreference to an unset bracket
matches an empty string (by default it causes a matching failure), so in
that case we must set the minimum length to zero. */
case OP_REF:
if ((options & PCRE_JAVASCRIPT_COMPAT) == 0)
{
ce = cs = (uschar *)_pcre_find_bracket(startcode, utf8, GET2(cc, 1));
if (cs == NULL) return -2;
do ce += GET(ce, 1); while (*ce == OP_ALT);
if (cc > cs && cc < ce)
{
d = 0;
had_recurse = TRUE;
}
else d = find_minlength(cs, startcode, options);
}
else d = 0;
cc += 3;
/* Handle repeated back references */
switch (*cc)
{
case OP_CRSTAR:
case OP_CRMINSTAR:
case OP_CRQUERY:
case OP_CRMINQUERY:
min = 0;
cc++;
break;
case OP_CRRANGE:
case OP_CRMINRANGE:
min = GET2(cc, 1);
cc += 5;
break;
default:
min = 1;
break;
}
branchlength += min * d;
break;
case OP_RECURSE:
cs = ce = (uschar *)startcode + GET(cc, 1);
if (cs == NULL) return -2;
do ce += GET(ce, 1); while (*ce == OP_ALT);
if (cc > cs && cc < ce)
had_recurse = TRUE;
else
branchlength += find_minlength(cs, startcode, options);
cc += 1 + LINK_SIZE;
break;
/* Anything else does not or need not match a character. We can get the
item's length from the table, but for those that can match zero occurrences
of a character, we must take special action for UTF-8 characters. */
case OP_UPTO:
case OP_NOTUPTO:
case OP_MINUPTO:
case OP_NOTMINUPTO:
case OP_POSUPTO:
case OP_STAR:
case OP_MINSTAR:
case OP_NOTMINSTAR:
case OP_POSSTAR:
case OP_NOTPOSSTAR:
case OP_QUERY:
case OP_MINQUERY:
case OP_NOTMINQUERY:
case OP_POSQUERY:
case OP_NOTPOSQUERY:
cc += _pcre_OP_lengths[op];
#ifdef SUPPORT_UTF8
if (utf8 && cc[-1] >= 0xc0) cc += _pcre_utf8_table4[cc[-1] & 0x3f];
#endif
break;
/* For the record, these are the opcodes that are matched by "default":
OP_ACCEPT, OP_CLOSE, OP_COMMIT, OP_FAIL, OP_PRUNE, OP_SET_SOM, OP_SKIP,
OP_THEN. */
default:
cc += _pcre_OP_lengths[op];
break;
}
}
/* Control never gets here */
}
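The back-reference and class cases above both reduce to one rule: a quantified item contributes its minimum repeat count multiplied by its own minimum length (the `branchlength += min * d` step). A minimal sketch of that rule follows. The quantifier enum and both functions are hypothetical illustrations, loosely modelled on the OP_CR* opcodes; they are not PCRE's internals.

```c
/* Hypothetical quantifier kinds, loosely modelled on OP_CRSTAR, OP_CRPLUS,
   OP_CRQUERY and OP_CRRANGE; these are not PCRE's opcode values. */
typedef enum { Q_NONE, Q_STAR, Q_PLUS, Q_QUERY, Q_RANGE } quant_kind;

/* Minimum number of times a quantified item must match. */
static int min_repeats(quant_kind q, int range_min)
{
  switch (q)
    {
    case Q_STAR:
    case Q_QUERY:
      return 0;              /* may match zero times */
    case Q_PLUS:
      return 1;
    case Q_RANGE:
      return range_min;      /* lower bound of {min,max} */
    default:
      return 1;              /* unquantified: exactly once */
    }
}

/* Minimum subject length contributed by an item whose own minimum length
   is d, i.e. the "min * d" product used for back references. */
static int min_contribution(quant_kind q, int range_min, int d)
{
  return min_repeats(q, range_min) * d;
}
```

For /(a|bc)\1{2,3}/ the group's shortest branch is one character, so the back reference adds min_contribution(Q_RANGE, 2, 1) = 2, giving a lower bound of 3 for the whole pattern.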
/*************************************************
* Set a bit and maybe its alternate case *
*************************************************/
@ -498,13 +856,15 @@ Arguments:
set NULL unless error
Returns: pointer to a pcre_extra block, with study_data filled in and the
appropriate flag set;
appropriate flags set;
NULL on error or if no optimization possible
*/
PCRE_EXP_DEFN pcre_extra * PCRE_CALL_CONVENTION
pcre_study(const pcre *external_re, int options, const char **errorptr)
{
int min;
BOOL bits_set = FALSE;
uschar start_bits[32];
pcre_extra *extra;
pcre_study_data *study;
@ -531,30 +891,39 @@ code = (uschar *)re + re->name_table_offset +
(re->name_count * re->name_entry_size);
/* For an anchored pattern, or an unanchored pattern that has a first char, or
a multiline pattern that matches only at "line starts", no further processing
at present. */
a multiline pattern that matches only at "line starts", there is no point in
seeking a list of starting bytes. */
if ((re->options & PCRE_ANCHORED) != 0 ||
(re->flags & (PCRE_FIRSTSET|PCRE_STARTLINE)) != 0)
return NULL;
if ((re->options & PCRE_ANCHORED) == 0 &&
(re->flags & (PCRE_FIRSTSET|PCRE_STARTLINE)) == 0)
{
/* Set the character tables in the block that is passed around */
/* Set the character tables in the block that is passed around */
tables = re->tables;
if (tables == NULL)
(void)pcre_fullinfo(external_re, NULL, PCRE_INFO_DEFAULT_TABLES,
(void *)(&tables));
tables = re->tables;
if (tables == NULL)
(void)pcre_fullinfo(external_re, NULL, PCRE_INFO_DEFAULT_TABLES,
(void *)(&tables));
compile_block.lcc = tables + lcc_offset;
compile_block.fcc = tables + fcc_offset;
compile_block.cbits = tables + cbits_offset;
compile_block.ctypes = tables + ctypes_offset;
compile_block.lcc = tables + lcc_offset;
compile_block.fcc = tables + fcc_offset;
compile_block.cbits = tables + cbits_offset;
compile_block.ctypes = tables + ctypes_offset;
/* See if we can find a fixed set of initial characters for the pattern. */
/* See if we can find a fixed set of initial characters for the pattern. */
memset(start_bits, 0, 32 * sizeof(uschar));
bits_set = set_start_bits(code, start_bits,
(re->options & PCRE_CASELESS) != 0, (re->options & PCRE_UTF8) != 0,
&compile_block) == SSB_DONE;
}
memset(start_bits, 0, 32 * sizeof(uschar));
if (set_start_bits(code, start_bits, (re->options & PCRE_CASELESS) != 0,
(re->options & PCRE_UTF8) != 0, &compile_block) != SSB_DONE) return NULL;
/* Find the minimum length of subject string. */
min = find_minlength(code, code, re->options);
/* Return NULL if no optimization is possible. */
if (!bits_set && min < 0) return NULL;
/* Get a pcre_extra block and a pcre_study_data block. The study data is put in
the latter, which is pointed to by the former, which may also get additional
@ -577,8 +946,19 @@ extra->flags = PCRE_EXTRA_STUDY_DATA;
extra->study_data = study;
study->size = sizeof(pcre_study_data);
study->options = PCRE_STUDY_MAPPED;
memcpy(study->start_bits, start_bits, sizeof(start_bits));
study->flags = 0;
if (bits_set)
{
study->flags |= PCRE_STUDY_MAPPED;
memcpy(study->start_bits, start_bits, sizeof(start_bits));
}
if (min >= 0)
{
study->flags |= PCRE_STUDY_MINLEN;
study->minlength = min;
}
return extra;
}
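After this change pcre_study() can record two independent results: a 256-bit map of bytes that may start a match (PCRE_STUDY_MAPPED) and a minimum subject length (PCRE_STUDY_MINLEN). The sketch below shows, under simplifying assumptions, how a matcher could use both to skip hopeless start positions. The struct and function names are hypothetical, and toupper()/tolower() stand in for the case-flipping (fcc) table that the real code takes from the character tables.

```c
#include <ctype.h>
#include <string.h>

/* Simplified stand-in for the study results: a 256-bit map of possible
   first bytes (32 bytes, like start_bits[32]) plus a minimum length. */
typedef struct {
  unsigned char start_bits[32];
  int minlength;
} study_sketch;

static void map_clear(study_sketch *sd)
{
  memset(sd->start_bits, 0, sizeof(sd->start_bits));
  sd->minlength = 0;
}

/* Set the bit for byte c (bit c%8 of byte c/8) and, when caseless, the
   bit for its other case. */
static void map_set(study_sketch *sd, unsigned char c, int caseless)
{
  sd->start_bits[c / 8] |= (unsigned char)(1u << (c & 7));
  if (caseless)
    {
    unsigned char other = (unsigned char)(isupper(c) ? tolower(c) : toupper(c));
    sd->start_bits[other / 8] |= (unsigned char)(1u << (other & 7));
    }
}

static int map_test(const study_sketch *sd, unsigned char c)
{
  return (sd->start_bits[c / 8] & (1u << (c & 7))) != 0;
}

/* Return the first offset >= start whose byte is in the map and that
   leaves at least minlength bytes of subject, or len if there is none. */
static size_t first_candidate(const study_sketch *sd,
  const unsigned char *s, size_t len, size_t start)
{
  for (; start < len; start++)
    {
    if (len - start < (size_t)sd->minlength) break;   /* too little left */
    if (map_test(sd, s[start])) return start;
    }
  return len;
}
```

Keeping the two results under separate flags matches the structure of the new code: either one can be absent without invalidating the other.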

@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2008 University of Cambridge
Copyright (c) 1997-2009 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -126,7 +126,9 @@ if (study != NULL)
{
*internal_study = *study; /* To copy other fields */
internal_study->size = byteflip(study->size, sizeof(study->size));
internal_study->options = byteflip(study->options, sizeof(study->options));
internal_study->flags = byteflip(study->flags, sizeof(study->flags));
internal_study->minlength = byteflip(study->minlength,
sizeof(study->minlength));
}
return internal_re;
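The new study fields (flags and minlength) must each be byte-swapped when a pattern saved on a machine of the opposite endianness is loaded. A minimal sketch of a 2- or 4-byte swap follows. byteflip_sketch() is a hypothetical illustration of what the byteflip() helper called above has to do, not a copy of it.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 2- or 4-byte field. */
static uint32_t byteflip_sketch(uint32_t value, int n)
{
  assert(n == 2 || n == 4);
  if (n == 2)
    return ((value & 0x00ffu) << 8) | ((value & 0xff00u) >> 8);
  return ((value & 0x000000ffu) << 24) |
         ((value & 0x0000ff00u) <<  8) |
         ((value & 0x00ff0000u) >>  8) |
         ((value & 0xff000000u) >> 24);
}
```

Passing sizeof(field) as n, as the hunk does, keeps each call correct even if a field's width changes.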

@ -1,9 +1,26 @@
#include "config.h"
#include "pcre_internal.h"
/* Unicode character database. */
/* This file was autogenerated by the MultiStage2.py script. */
/* Total size: 52808 bytes, block size: 128. */
/* The tables herein are needed only when UCP support is built */
/* into PCRE. This module should not be referenced otherwise, so */
/* it should not matter whether it is compiled or not. However */
/* a comment was received about space saving - maybe the guy linked */
/* all the modules rather than using a library - so we include a */
/* condition to cut out the tables when not needed. But don't leave */
/* a totally empty module because some compilers barf at that. */
/* Instead, just supply small dummy tables. */
#ifndef SUPPORT_UCP
const ucd_record _pcre_ucd_records[] = {{0,0,0 }};
const uschar _pcre_ucd_stage1[] = {0};
const pcre_uint16 _pcre_ucd_stage2[] = {0};
#else
/* When recompiling tables with a new Unicode version,
please check types in the structure definition from pcre_internal.h:
typedef struct {
@ -2606,3 +2623,4 @@ const pcre_uint16 _pcre_ucd_stage2[] = { /* 40448 bytes, block = 128 */
#if UCD_BLOCK_SIZE != 128
#error Please correct UCD_BLOCK_SIZE in pcre_internal.h
#endif
#endif /* SUPPORT_UCP */
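The comments above describe a multi-stage table (block size 128) generated by MultiStage2.py. A minimal sketch of how such a two-stage lookup works follows. It uses toy tables with a block size of 4 so the arithmetic is easy to follow. The table contents and names are made up; only the indexing scheme matters here: stage1 selects a block of stage2, and stage2 selects a record.

```c
#include <stdint.h>

/* Toy two-stage table with a block size of 4 (the generated PCRE tables
   use 128). All contents are made up for illustration. */
#define TOY_BLOCK_SIZE 4

typedef struct {
  uint8_t script;
  uint8_t chartype;
} toy_record;

/* Each distinct property combination is stored once. */
static const toy_record toy_records[] = {
  { 0, 0 },    /* record 0 */
  { 1, 1 },    /* record 1 */
  { 1, 2 }     /* record 2 */
};

/* stage1: one entry per block of code points (here 0..15), naming the
   block of stage2 to use, so identical blocks share storage. */
static const uint8_t toy_stage1[] = { 0, 1, 0, 0 };

/* stage2: record indexes, TOY_BLOCK_SIZE entries per distinct block. */
static const uint8_t toy_stage2[] = {
  0, 0, 0, 0,    /* block 0 */
  1, 1, 2, 2     /* block 1 */
};

/* Look up the record for ch; the caller must keep ch below 16. */
static const toy_record *toy_lookup(unsigned int ch)
{
  return toy_records +
    toy_stage2[toy_stage1[ch / TOY_BLOCK_SIZE] * TOY_BLOCK_SIZE +
               ch % TOY_BLOCK_SIZE];
}
```

Three of the four blocks above share stage2's block 0, which is where the space saving comes from. This is also why the block size in the generated file and in pcre_internal.h must agree.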

@ -223,12 +223,12 @@ if (namecount <= 0) printf("No named substrings\n"); else
* *
* If the previous match WAS for an empty string, we can't do that, as it *
* would lead to an infinite loop. Instead, a special call of pcre_exec() *
* is made with the PCRE_NOTEMPTY and PCRE_ANCHORED flags set. The first *
* of these tells PCRE that an empty string is not a valid match; other *
* possibilities must be tried. The second flag restricts PCRE to one *
* match attempt at the initial string position. If this match succeeds, *
* an alternative to the empty string match has been found, and we can *
* proceed round the loop. *
* is made with the PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED flags set. *
* The first of these tells PCRE that an empty string at the start of the *
* subject is not a valid match; other possibilities must be tried. The *
* second flag restricts PCRE to one match attempt at the initial string *
* position. If this match succeeds, an alternative to the empty string *
* match has been found, and we can proceed round the loop. *
*************************************************************************/
if (!find_all)
@ -251,7 +251,7 @@ for (;;)
if (ovector[0] == ovector[1])
{
if (ovector[0] == subject_length) break;
options = PCRE_NOTEMPTY | PCRE_ANCHORED;
options = PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED;
}
/* Run the next matching operation */
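The loop described in the comment above can be sketched without the library. In the sketch below, a toy matcher for "zero or more 'a' characters" stands in for pcre_exec(), and two int parameters play the roles of PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED. All names are hypothetical. After an empty match, the loop retries at the same offset and requires a non-empty anchored match. If that retry fails, it advances one character and searches normally.

```c
#include <stddef.h>

/* Toy matcher for zero or more 'a' characters. Searches from start (only
   at start if anchored); rejects an empty match at start when
   notempty_atstart is set. Returns 1 and sets *ms/*me on success. */
static int toy_match(const char *s, size_t len, size_t start,
  int notempty_atstart, int anchored, size_t *ms, size_t *me)
{
  size_t pos;
  for (pos = start; pos <= len; pos++)
    {
    size_t end = pos;
    while (end < len && s[end] == 'a') end++;    /* greedy run of 'a' */
    if (!(end == pos && pos == start && notempty_atstart))
      {
      *ms = pos;
      *me = end;
      return 1;
      }
    if (anchored) break;                         /* one attempt only */
    }
  return 0;
}

/* Store up to max matches in starts[]/ends[]; return the total found. */
static size_t find_all(const char *s, size_t len,
  size_t starts[], size_t ends[], size_t max)
{
  size_t n = 0, m0 = 0, m1 = 0;

  if (!toy_match(s, len, 0, 0, 0, &m0, &m1)) return 0;
  if (n < max) { starts[n] = m0; ends[n] = m1; }
  n++;

  for (;;)
    {
    int retry = 0;              /* 1 = "not empty at start" + anchored */
    size_t start_offset = m1;   /* resume at end of previous match */
    size_t ms, me;

    if (m0 == m1)
      {
      if (m0 == len) break;     /* empty match at the end: finished */
      retry = 1;
      }

    if (!toy_match(s, len, start_offset, retry, retry, &ms, &me))
      {
      if (!retry) break;
      m1 = start_offset + 1;    /* advance one character, search again */
      continue;
      }

    m0 = ms;
    m1 = me;
    if (n < max) { starts[n] = m0; ends[n] = m1; }
    n++;
    }
  return n;
}
```

With subject "baaab" the sketch finds four matches, at offsets 0-0, 1-4, 4-4 and 5-5. Each empty match is reported once, and the loop always makes progress.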

@ -68,64 +68,80 @@ static const int eint[] = {
REG_EESCAPE, /* \c at end of pattern */
REG_EESCAPE, /* unrecognized character follows \ */
REG_BADBR, /* numbers out of order in {} quantifier */
/* 5 */
REG_BADBR, /* number too big in {} quantifier */
REG_EBRACK, /* missing terminating ] for character class */
REG_ECTYPE, /* invalid escape sequence in character class */
REG_ERANGE, /* range out of order in character class */
REG_BADRPT, /* nothing to repeat */
/* 10 */
REG_BADRPT, /* operand of unlimited repeat could match the empty string */
REG_ASSERT, /* internal error: unexpected repeat */
REG_BADPAT, /* unrecognized character after (? */
REG_BADPAT, /* POSIX named classes are supported only within a class */
REG_EPAREN, /* missing ) */
/* 15 */
REG_ESUBREG, /* reference to non-existent subpattern */
REG_INVARG, /* erroffset passed as NULL */
REG_INVARG, /* unknown option bit(s) set */
REG_EPAREN, /* missing ) after comment */
REG_ESIZE, /* parentheses nested too deeply */
/* 20 */
REG_ESIZE, /* regular expression too large */
REG_ESPACE, /* failed to get memory */
REG_EPAREN, /* unmatched brackets */
REG_EPAREN, /* unmatched parentheses */
REG_ASSERT, /* internal error: code overflow */
REG_BADPAT, /* unrecognized character after (?< */
/* 25 */
REG_BADPAT, /* lookbehind assertion is not fixed length */
REG_BADPAT, /* malformed number or name after (?( */
REG_BADPAT, /* conditional group contains more than two branches */
REG_BADPAT, /* assertion expected after (?( */
REG_BADPAT, /* (?R or (?[+-]digits must be followed by ) */
/* 30 */
REG_ECTYPE, /* unknown POSIX class name */
REG_BADPAT, /* POSIX collating elements are not supported */
REG_INVARG, /* this version of PCRE is not compiled with PCRE_UTF8 support */
REG_BADPAT, /* spare error */
REG_BADPAT, /* character value in \x{...} sequence is too large */
/* 35 */
REG_BADPAT, /* invalid condition (?(0) */
REG_BADPAT, /* \C not allowed in lookbehind assertion */
REG_EESCAPE, /* PCRE does not support \L, \l, \N, \U, or \u */
REG_BADPAT, /* number after (?C is > 255 */
REG_BADPAT, /* closing ) for (?C expected */
/* 40 */
REG_BADPAT, /* recursive call could loop indefinitely */
REG_BADPAT, /* unrecognized character after (?P */
REG_BADPAT, /* syntax error in subpattern name (missing terminator) */
REG_BADPAT, /* two named subpatterns have the same name */
REG_BADPAT, /* invalid UTF-8 string */
/* 45 */
REG_BADPAT, /* support for \P, \p, and \X has not been compiled */
REG_BADPAT, /* malformed \P or \p sequence */
REG_BADPAT, /* unknown property name after \P or \p */
REG_BADPAT, /* subpattern name is too long (maximum 32 characters) */
REG_BADPAT, /* too many named subpatterns (maximum 10,000) */
/* 50 */
REG_BADPAT, /* repeated subpattern is too long */
REG_BADPAT, /* octal value is greater than \377 (not in UTF-8 mode) */
REG_BADPAT, /* internal error: overran compiling workspace */
REG_BADPAT, /* internal error: previously-checked referenced subpattern not found */
REG_BADPAT, /* DEFINE group contains more than one branch */
/* 55 */
REG_BADPAT, /* repeating a DEFINE group is not allowed */
REG_INVARG, /* inconsistent NEWLINE options */
REG_BADPAT, /* \g is not followed by an (optionally braced) non-zero number */
REG_BADPAT, /* (?+ or (?- must be followed by a non-zero number */
REG_BADPAT, /* a numbered reference must not be zero */
REG_BADPAT, /* (*VERB) with an argument is not supported */
/* 60 */
REG_BADPAT, /* (*VERB) not recognized */
REG_BADPAT, /* number is too big */
REG_BADPAT, /* subpattern name expected */
REG_BADPAT, /* digit expected after (?+ */
REG_BADPAT /* ] is an invalid data character in JavaScript compatibility mode */
REG_BADPAT, /* ] is an invalid data character in JavaScript compatibility mode */
/* 65 */
REG_BADPAT /* different names for subpatterns of the same number are not allowed */
};
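The first ChangeLog item says this table had fallen out of date, and that the error code was not range-checked before indexing it. A minimal sketch that guards against both problems follows. A compile-time size check fails the build if the table and the error-code count diverge, and a bounds-checked lookup falls back to a generic "bad pattern" result. All TOY_* names are hypothetical, and the trailing error-count enumerator is an assumption of the sketch.

```c
#include <stddef.h>

/* Hypothetical compile error codes; TOY_ERR_COUNT must stay last. */
enum { TOY_ERR_NONE, TOY_ERR_ESCAPE, TOY_ERR_BRACKET, TOY_ERR_COUNT };

/* Hypothetical POSIX-style results. */
enum { TOY_REG_OK, TOY_REG_EESCAPE, TOY_REG_EBRACK, TOY_REG_BADPAT };

/* One entry per compile error code, in the same order. */
static const int toy_eint[] = {
  TOY_REG_OK,        /* TOY_ERR_NONE */
  TOY_REG_EESCAPE,   /* TOY_ERR_ESCAPE */
  TOY_REG_EBRACK     /* TOY_ERR_BRACKET */
};

/* C89-compatible compile-time check: the array size becomes -1, and the
   build fails, if an error code is added without a table entry. */
typedef char toy_eint_size_check
  [(sizeof(toy_eint) / sizeof(toy_eint[0]) == TOY_ERR_COUNT) ? 1 : -1];

/* Translate with a bounds check, falling back to a generic error. */
static int toy_translate(int errorcode)
{
  if (errorcode < 0 ||
      (size_t)errorcode >= sizeof(toy_eint) / sizeof(toy_eint[0]))
    return TOY_REG_BADPAT;
  return toy_eint[errorcode];
}
```

The explicit `errorcode < 0` test is not strictly needed when comparing against an unsigned size, because a negative int converts to a very large value. It does make the intent obvious to a reader.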
/* Table of texts corresponding to POSIX error codes */
@ -224,17 +240,25 @@ int erroffset;
int errorcode;
int options = 0;
if ((cflags & REG_ICASE) != 0) options |= PCRE_CASELESS;
if ((cflags & REG_NEWLINE) != 0) options |= PCRE_MULTILINE;
if ((cflags & REG_DOTALL) != 0) options |= PCRE_DOTALL;
if ((cflags & REG_NOSUB) != 0) options |= PCRE_NO_AUTO_CAPTURE;
if ((cflags & REG_UTF8) != 0) options |= PCRE_UTF8;
if ((cflags & REG_ICASE) != 0) options |= PCRE_CASELESS;
if ((cflags & REG_NEWLINE) != 0) options |= PCRE_MULTILINE;
if ((cflags & REG_DOTALL) != 0) options |= PCRE_DOTALL;
if ((cflags & REG_NOSUB) != 0) options |= PCRE_NO_AUTO_CAPTURE;
if ((cflags & REG_UTF8) != 0) options |= PCRE_UTF8;
if ((cflags & REG_UNGREEDY) != 0) options |= PCRE_UNGREEDY;
preg->re_pcre = pcre_compile2(pattern, options, &errorcode, &errorptr,
&erroffset, NULL);
preg->re_erroffset = erroffset;
if (preg->re_pcre == NULL) return eint[errorcode];
/* Safety: if the error code is too big for the translation vector (which
should not happen, but we all make mistakes), return REG_BADPAT. */
if (preg->re_pcre == NULL)
{
return (errorcode < sizeof(eint)/sizeof(const int))?
eint[errorcode] : REG_BADPAT;
}
preg->re_nsub = pcre_info((const pcre *)preg->re_pcre, NULL, NULL);
return 0;
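The chain of if statements in regcomp() translates each REG_* compile flag into a PCRE option. A table-driven version of the same translation is sketched below. The X_REG_* values are copied from the pcreposix.h definitions in this commit. The OPT_* values are hypothetical stand-ins for the PCRE_* bits in pcre.h.

```c
#include <stddef.h>

/* REG_* compile flags, with the values defined in pcreposix.h. */
enum {
  X_REG_ICASE = 0x0001, X_REG_NEWLINE = 0x0002, X_REG_DOTALL = 0x0010,
  X_REG_NOSUB = 0x0020, X_REG_UTF8 = 0x0040, X_REG_UNGREEDY = 0x0200
};

/* Hypothetical stand-ins for the PCRE_* option bits. */
enum {
  OPT_CASELESS = 0x01, OPT_MULTILINE = 0x02, OPT_DOTALL = 0x04,
  OPT_NO_AUTO_CAPTURE = 0x08, OPT_UTF8 = 0x10, OPT_UNGREEDY = 0x20
};

/* One row per compile-time flag. */
static const struct { int cflag; int option; } flag_map[] = {
  { X_REG_ICASE,    OPT_CASELESS },
  { X_REG_NEWLINE,  OPT_MULTILINE },
  { X_REG_DOTALL,   OPT_DOTALL },
  { X_REG_NOSUB,    OPT_NO_AUTO_CAPTURE },
  { X_REG_UTF8,     OPT_UTF8 },
  { X_REG_UNGREEDY, OPT_UNGREEDY }
};

/* Translate regcomp()-style cflags into compile options. */
static int cflags_to_options(int cflags)
{
  int options = 0;
  size_t i;
  for (i = 0; i < sizeof(flag_map) / sizeof(flag_map[0]); i++)
    if ((cflags & flag_map[i].cflag) != 0) options |= flag_map[i].option;
  return options;
}
```

With a table, adding a flag such as REG_UNGREEDY is a one-line change. Execution-time flags such as REG_NOTBOL (0x0004) are handled by regexec(), so they are deliberately absent from the table and map to no compile option.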
@ -276,10 +300,11 @@ if ((eflags & REG_NOTEMPTY) != 0) options |= PCRE_NOTEMPTY;
((regex_t *)preg)->re_erroffset = (size_t)(-1); /* Only has meaning after compile */
/* When no string data is being returned, ensure that nmatch is zero.
Otherwise, ensure the vector for holding the return data is large enough. */
/* When no string data is being returned, or no vector has been passed in which
to put it, ensure that nmatch is zero. Otherwise, ensure the vector for holding
the return data is large enough. */
if (nosub) nmatch = 0;
if (nosub || pmatch == NULL) nmatch = 0;
else if (nmatch > 0)
{

@ -50,17 +50,18 @@ POSSIBILITY OF SUCH DAMAGE.
extern "C" {
#endif
/* Options, mostly defined by POSIX, but with a couple of extras. */
/* Options, mostly defined by POSIX, but with some extras. */
#define REG_ICASE 0x0001
#define REG_NEWLINE 0x0002
#define REG_NOTBOL 0x0004
#define REG_NOTEOL 0x0008
#define REG_DOTALL 0x0010 /* NOT defined by POSIX. */
#define REG_NOSUB 0x0020
#define REG_UTF8 0x0040 /* NOT defined by POSIX. */
#define REG_ICASE 0x0001 /* Maps to PCRE_CASELESS */
#define REG_NEWLINE 0x0002 /* Maps to PCRE_MULTILINE */
#define REG_NOTBOL 0x0004 /* Maps to PCRE_NOTBOL */
#define REG_NOTEOL 0x0008 /* Maps to PCRE_NOTEOL */
#define REG_DOTALL 0x0010 /* NOT defined by POSIX; maps to PCRE_DOTALL */
#define REG_NOSUB 0x0020 /* Maps to PCRE_NO_AUTO_CAPTURE */
#define REG_UTF8 0x0040 /* NOT defined by POSIX; maps to PCRE_UTF8 */
#define REG_STARTEND 0x0080 /* BSD feature: pass subject string by so,eo */
#define REG_NOTEMPTY 0x0100 /* NOT defined by POSIX. */
#define REG_NOTEMPTY 0x0100 /* NOT defined by POSIX; maps to PCRE_NOTEMPTY */
#define REG_UNGREEDY 0x0200 /* NOT defined by POSIX; maps to PCRE_UNGREEDY */
/* This is not used by PCRE, but by defining it we make it easier
to slot PCRE into existing programs that make POSIX calls. */

@ -423,3 +423,27 @@ This time it jumps and jumps and jumps.
Here is the pattern again.
That time it was on a line by itself.
This line contains pattern not on a line by itself.
---------------------------- Test 55 -----------------------------
./testdata/grepinput:456
./testdata/grepinput8:0
./testdata/grepinputv:1
./testdata/grepinputx:0
---------------------------- Test 56 -----------------------------
./testdata/grepinput:456
./testdata/grepinputv:1
---------------------------- Test 57 -----------------------------
PATTERN at the start of a line.
In the middle of a line, PATTERN appears.
Check up on PATTERN near the end.
---------------------------- Test 58 -----------------------------
PATTERN at the start of a line.
In the middle of a line, PATTERN appears.
Check up on PATTERN near the end.
---------------------------- Test 59 -----------------------------
PATTERN at the start of a line.
In the middle of a line, PATTERN appears.
Check up on PATTERN near the end.
---------------------------- Test 60 -----------------------------
PATTERN at the start of a line.
In the middle of a line, PATTERN appears.
Check up on PATTERN near the end.
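Tests 55 and 56 appear to exercise the --count change described in ChangeLog item 3. With --count alone, every file is listed, including those with zero matches. Adding --files-with-matches suppresses the zero counts. A minimal sketch of that output rule follows. print_counts() is a hypothetical helper, not pcregrep's code, and it writes into a caller-supplied buffer so the result is easy to check.

```c
#include <stdio.h>

/* Write "name:count" lines into buf (bufsize must be > 0). When
   files_with_matches is set, files with a zero count are skipped.
   Returns the number of characters written. */
static size_t print_counts(char *buf, size_t bufsize,
  const char *const names[], const int counts[], size_t n,
  int files_with_matches)
{
  size_t used = 0, i;
  buf[0] = '\0';
  for (i = 0; i < n; i++)
    {
    int w;
    if (files_with_matches && counts[i] == 0) continue;
    w = snprintf(buf + used, bufsize - used, "%s:%d\n", names[i], counts[i]);
    if (w < 0 || (size_t)w >= bufsize - used) break;   /* out of space */
    used += (size_t)w;
    }
  return used;
}
```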

@ -1,3 +1,6 @@
/-- This set of tests is for features that are compatible with all versions of
Perl 5, in non-UTF-8 mode. --/
/the quick brown fox/
the quick brown fox
The quick brown FOX
@ -4064,4 +4067,4 @@
/^%((?(?=[a])[^%])|b)*%$/
%ab%
/ End of testinput1 /
/-- End of testinput1 --/

@ -121,4 +121,4 @@ are all themselves checked in other tests. --/
/[^\xaa]/8BM
/ End of testinput10 /
/-- End of testinput10 --/

@ -1,3 +1,14 @@
/-- This set of tests is not Perl-compatible. It checks on special features
of PCRE's API, error diagnostics, and the compiled code of some patterns.
It also checks the non-Perl syntax that PCRE supports (Python, .NET,
Oniguruma). Finally, there are some tests where PCRE and Perl differ,
either because PCRE can't be compatible, or there is a potential Perl
bug. --/
/-- Originally, the Perl 5.10 things were in here too, but now I have separated
many (most?) of them out into test 11. However, there may still be some
that were overlooked. --/
/(a)b|/I
/abc/I
@ -123,38 +134,38 @@
defabc
\Zdefabc
/abc/IP
/abc/P
abc
*** Failers
/^abc|def/IP
/^abc|def/P
abcdef
abcdef\B
/.*((abc)$|(def))/IP
/.*((abc)$|(def))/P
defabc
\Zdefabc
/the quick brown fox/IP
/the quick brown fox/P
the quick brown fox
*** Failers
The Quick Brown Fox
/the quick brown fox/IPi
/the quick brown fox/Pi
the quick brown fox
The Quick Brown Fox
/abc.def/IP
/abc.def/P
*** Failers
abc\ndef
/abc$/IP
/abc$/P
abc
abc\n
/(abc)\2/IP
/(abc)\2/P
/(abc\1)/IP
/(abc\1)/P
abc
/)/
@ -593,7 +604,7 @@
*** Failers
\Nabc
/a*(b+)(z)(z)/IP
/a*(b+)(z)(z)/P
aaaabbbbzzzz
aaaabbbbzzzz\O0
aaaabbbbzzzz\O1
@ -1122,14 +1133,6 @@
/(a(?1)+b)/DZ
/^\W*(?:((.)\W*(?1)\W*\2|)|((.)\W*(?3)\W*\4|\W*.\W*))\W*$/Ii
1221
Satan, oscillate my metallic sonatas!
A man, a plan, a canal: Panama!
Able was I ere I saw Elba.
*** Failers
The quick brown fox
/^(\d+|\((?1)([+*-])(?1)\)|-(?1))$/I
12
(((2+2)*-3)-7)
@ -1419,13 +1422,13 @@
** Failers
line one\nthis is a line\nbreak in the second line
/ab.cd/IP
/ab.cd/P
ab-cd
ab=cd
** Failers
ab\ncd
/ab.cd/IPs
/ab.cd/Ps
ab-cd
ab=cd
ab\ncd
@ -1480,10 +1483,10 @@
(this)
((this))
/a(b)c/IPN
/a(b)c/PN
abc
/a(?P<name>b)c/IPN
/a(?P<name>b)c/PN
abc
/\x{100}/I
@ -1915,13 +1918,6 @@ a random value. /Ix
/(?=(?'abc'\w+))\k<abc>:/I
abcd:
/(?'abc'\w+):\k<abc>{2}/
a:aaxyz
ab:ababxyz
** Failers
a:axyz
ab:abxyz
/(?'abc'a|b)(?<abc>d|e)\k<abc>{2}/J
adaa
** Failers
@ -1934,10 +1930,6 @@ a random value. /Ix
** Failers
bddd
/^(?<ab>a)? (?(<ab>)b|c) (?('ab')d|e)/x
abd
ce
/(?(<bc))/
/(?(''))/
@ -1955,16 +1947,6 @@ a random value. /Ix
/(?<1> (?'B' abc (?(R) (?(R&1)1) (?(R&B)2) X | (?1) (?2) (?R) ))) /x
abcabc1Xabc2XabcXabcabc
/^(?(DEFINE) (?<A> a) (?<B> b) ) (?&A) (?&B) /x
abcd
/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
(?(DEFINE)
(?<NAME_PAT>[a-z]+)
(?<ADDRESS_PAT>\d+)
)/x
metcalfe 33
/^(?(DEFINE) abc | xyz ) /x
/(?(DEFINE) abc) xyz/xI
@ -2053,22 +2035,6 @@ a random value. /Ix
/(?1)X(?<abc>P)/I
abcPXP123
/(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))\b(?&byte)(\.(?&byte)){3}/
1.2.3.4
131.111.10.206
10.0.0.0
** Failers
10.6
455.3.4.5
/\b(?&byte)(\.(?&byte)){3}(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))/
1.2.3.4
131.111.10.206
10.0.0.0
** Failers
10.6
455.3.4.5
/(?:a(?&abc)b)*(?<abc>x)/
123axbaxbaxbx456
123axbaxbaxb456
@ -2090,9 +2056,6 @@ a random value. /Ix
defabcabcxyz
DEFabcABCXYZ
/^(a(b))\1\g1\g{1}\g-1\g{-1}\g{-02}Z/
ababababbbabZXXXX
/^(a)\g-2/
/^(a)\g/
@ -2191,26 +2154,12 @@ a random value. /Ix
/^(?(+1)X|Y)(.)/BZ
Y!
/(foo)\Kbar/
foobar
/(foo)(\Kbar|baz)/
foobar
foobaz
/(foo\Kbar)baz/
foobarbaz
/(?<A>tom|bon)-\k{A}/
tom-tom
bon-bon
** Failers
tom-bon
/(?<A>tom|bon)-\g{A}/
tom-tom
bon-bon
/\g{A/
/(?|(abc)|(xyz))/BZ
@ -2225,50 +2174,6 @@ a random value. /Ix
xabcpqrx
xxyzx
/(?|(abc)|(xyz))\1/
abcabc
xyzxyz
** Failers
abcxyz
xyzabc
/(?|(abc)|(xyz))(?1)/
abcabc
xyzabc
** Failers
xyzxyz
/\H\h\V\v/
X X\x0a
X\x09X\x0b
** Failers
\xa0 X\x0a
/\H*\h+\V?\v{3,4}/
\x09\x20\xa0X\x0a\x0b\x0c\x0d\x0a
\x09\x20\xa0\x0a\x0b\x0c\x0d\x0a
\x09\x20\xa0\x0a\x0b\x0c
** Failers
\x09\x20\xa0\x0a\x0b
/\H{3,4}/
XY ABCDE
XY PQR ST
/.\h{3,4}./
XY AB PQRS
/\h*X\h?\H+Y\H?Z/
>XNNNYZ
> X NYQZ
** Failers
>XYZ
> X NY Z
/\v*X\v?Y\v+Z\V*\x0a\V+\x0b\V{2,3}\x0c/
>XY\x0aZ\x0aA\x0bNN\x0c
>\x0a\x0dX\x0aY\x0a\x0bZZZ\x0aAAA\x0bNNN\x0c
/[\h]/BZ
>\x09<
@ -2341,49 +2246,6 @@ a random value. /Ix
/A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/BZ
/^a+(*FAIL)/
aaaaaa
/a+b?c+(*FAIL)/
aaabccc
/a+b?(*PRUNE)c+(*FAIL)/
aaabccc
/a+b?(*COMMIT)c+(*FAIL)/
aaabccc
/a+b?(*SKIP)c+(*FAIL)/
aaabcccaaabccc
/^(?:aaa(*THEN)\w{6}|bbb(*THEN)\w{5}|ccc(*THEN)\w{4}|\w{3})/
aaaxxxxxx
aaa++++++
bbbxxxxx
bbb+++++
cccxxxx
ccc++++
dddddddd
/^(aaa(*THEN)\w{6}|bbb(*THEN)\w{5}|ccc(*THEN)\w{4}|\w{3})/
aaaxxxxxx
aaa++++++
bbbxxxxx
bbb+++++
cccxxxx
ccc++++
dddddddd
/a+b?(*THEN)c+(*FAIL)/
aaabccc
/(A (A|B(*ACCEPT)|C) D)(E)/x
ABX
AADE
ACDE
** Failers
AD
/^a+(*FAIL)/C
aaaaaa
@ -2589,66 +2451,8 @@ a random value. /Ix
/[[:a\dz:]]/
/^(?<name>a|b\g<name>c)/
aaaa
bacxxx
bbaccxxx
bbbacccxx
/^(?<name>a|b\g'name'c)/
aaaa
bacxxx
bbaccxxx
bbbacccxx
/^(a|b\g<1>c)/
aaaa
bacxxx
bbaccxxx
bbbacccxx
/^(a|b\g'1'c)/
aaaa
bacxxx
bbaccxxx
bbbacccxx
/^(a|b\g'-1'c)/
aaaa
bacxxx
bbaccxxx
bbbacccxx
/(^(a|b\g<-1>c))/
aaaa
bacxxx
bbaccxxx
bbbacccxx
/(^(a|b\g<-1'c))/
/(^(a|b\g{-1}))/
bacxxx
/(?-i:\g<name>)(?i:(?<name>a))/
XaaX
XAAX
/(?i:\g<name>)(?-i:(?<name>a))/
XaaX
** Failers
XAAX
/(?-i:\g<+1>)(?i:(a))/
XaaX
XAAX
/(?=(?<regex>(?#simplesyntax)\$(?<name>[a-zA-Z_\x{7f}-\x{ff}][a-zA-Z0-9_\x{7f}-\x{ff}]*)(?:\[(?<index>[a-zA-Z0-9_\x{7f}-\x{ff}]+|\$\g<name>)\]|->\g<name>(\(.*?\))?)?|(?#simple syntax withbraces)\$\{(?:\g<name>(?<indices>\[(?:\g<index>|'(?:\\.|[^'\\])*'|"(?:\g<regex>|\\.|[^"\\])*")\])?|\g<complex>|\$\{\g<complex>\})\}|(?#complexsyntax)\{(?<complex>\$(?<segment>\g<name>(\g<indices>*|\(.*?\))?)(?:->\g<segment>)*|\$\g<complex>|\$\{\g<complex>\})\}))\{/
/(?<n>a|b|c)\g<n>*/
abc
accccbbb
/^(?+1)(?<a>x|y){0}z/
xzxx
yzyy
@ -2755,22 +2559,614 @@ a random value. /Ix
/^"((?(?=[a])[^"])|b)*"$/
"ab"
/^X(?5)(a)(?|(b)|(q))(c)(d)(Y)/
XYabcdY
/^X(?5)(a)(?|(b)|(q))(c)(d)Y/
XYabcdY
/^X(?&N)(a)(?|(b)|(q))(c)(d)(?<N>Y)/
XYabcdY
/Xa{2,4}b/
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/Xa{2,4}?b/
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/Xa{2,4}+b/
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/X\d{2,4}b/
X\P
X3\P
X33\P
X333\P
X3333\P
/X\d{2,4}?b/
X\P
X3\P
X33\P
X333\P
X3333\P
/X\d{2,4}+b/
X\P
X3\P
X33\P
X333\P
X3333\P
/X\D{2,4}b/
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/X\D{2,4}?b/
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/X\D{2,4}+b/
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/X[abc]{2,4}b/
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/X[abc]{2,4}?b/
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/X[abc]{2,4}+b/
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/X[^a]{2,4}b/
X\P
Xz\P
Xzz\P
Xzzz\P
Xzzzz\P
/X[^a]{2,4}?b/
X\P
Xz\P
Xzz\P
Xzzz\P
Xzzzz\P
/X[^a]{2,4}+b/
X\P
Xz\P
Xzz\P
Xzzz\P
Xzzzz\P
/(Y)X\1{2,4}b/
YX\P
YXY\P
YXYY\P
YXYYY\P
YXYYYY\P
/(Y)X\1{2,4}?b/
YX\P
YXY\P
YXYY\P
YXYYY\P
YXYYYY\P
/(Y)X\1{2,4}+b/
YX\P
YXY\P
YXYY\P
YXYYY\P
YXYYYY\P
/\++\KZ|\d+X|9+Y/
++++123999\P
++++123999Y\P
++++Z1234\P
/Z(*F)/
Z\P
ZA\P
/Z(?!)/
Z\P
ZA\P
/dog(sbody)?/
dogs\P
dogs\P\P
/dog(sbody)??/
dogs\P
dogs\P\P
/dog|dogsbody/
dogs\P
dogs\P\P
/dogsbody|dog/
dogs\P
dogs\P\P
/\bthe cat\b/
the cat\P
the cat\P\P
/abc/
abc\P
abc\P\P
/\w+A/P
CDAAAAB
/\w+A/PU
CDAAAAB
/abc\K123/
xyzabc123pqr
xyzabc12\P
xyzabc12\P\P
/(?<=abc)123/
xyzabc123pqr
xyzabc12\P
xyzabc12\P\P
/\babc\b/
+++abc+++
+++ab\P
+++ab\P\P
/(?&word)(?&element)(?(DEFINE)(?<element><[^m][^>]>[^<])(?<word>\w*+))/BZ
/(?&word)(?&element)(?(DEFINE)(?<element><[^\d][^>]>[^<])(?<word>\w*+))/BZ
/(ab)(x(y)z(cd(*ACCEPT)))pq/BZ
/abc\K/+
abcdef
abcdef\N\N
xyzabcdef\N\N
** Failers
abcdef\N
xyzabcdef\N
/^(?:(?=abc)|abc\K)/+
abcdef
abcdef\N\N
** Failers
abcdef\N
/a?b?/+
xyz
xyzabc
xyzabc\N
xyzabc\N\N
xyz\N\N
** Failers
xyz\N
/^a?b?/+
xyz
xyzabc
** Failers
xyzabc\N
xyzabc\N\N
xyz\N\N
xyz\N
/^(?<name>a|b\g<name>c)/
aaaa
bacxxx
bbaccxxx
bbbacccxx
/^(?<name>a|b\g'name'c)/
aaaa
bacxxx
bbaccxxx
bbbacccxx
/^(a|b\g<1>c)/
aaaa
bacxxx
bbaccxxx
bbbacccxx
/^(a|b\g'1'c)/
aaaa
bacxxx
bbaccxxx
bbbacccxx
/^(a|b\g'-1'c)/
aaaa
bacxxx
bbaccxxx
bbbacccxx
/(^(a|b\g<-1>c))/
aaaa
bacxxx
bbaccxxx
bbbacccxx
/(?-i:\g<name>)(?i:(?<name>a))/
XaaX
XAAX
/(?i:\g<name>)(?-i:(?<name>a))/
XaaX
** Failers
XAAX
/(?-i:\g<+1>)(?i:(a))/
XaaX
XAAX
/(?=(?<regex>(?#simplesyntax)\$(?<name>[a-zA-Z_\x{7f}-\x{ff}][a-zA-Z0-9_\x{7f}-\x{ff}]*)(?:\[(?<index>[a-zA-Z0-9_\x{7f}-\x{ff}]+|\$\g<name>)\]|->\g<name>(\(.*?\))?)?|(?#simple syntax withbraces)\$\{(?:\g<name>(?<indices>\[(?:\g<index>|'(?:\\.|[^'\\])*'|"(?:\g<regex>|\\.|[^"\\])*")\])?|\g<complex>|\$\{\g<complex>\})\}|(?#complexsyntax)\{(?<complex>\$(?<segment>\g<name>(\g<indices>*|\(.*?\))?)(?:->\g<segment>)*|\$\g<complex>|\$\{\g<complex>\})\}))\{/
/(?<n>a|b|c)\g<n>*/
abc
accccbbb
/^X(?7)(a)(?|(b)|(q)(r)(s))(c)(d)(Y)/
XYabcdY
/^X(?7)(a)(?|(b|(r)(s))|(q))(c)(d)(Y)/
XYabcdY
/(?<=b(?1)|zzz)(a)/
xbaax
xzzzax
/^X(?7)(a)(?|(b|(?|(r)|(t))(s))|(q))(c)(d)(Y)/
XYabcdY
/(a)(?<=b\1)/
/ End of testinput2 /
/(a)(?<=b+(?1))/
/(a+)(?<=b(?1))/
/(a(?<=b(?1)))/
/(?<=b(?1))xyz/
/(?<=b(?1))xyz(b+)pqrstuvew/
/(a|bc)\1/SI
/(a|bc)\1{2,3}/SI
/(a|bc)(?1)/SI
/(a|b\1)(a|b\1)/SI
/(a|b\1){2}/SI
/(a|bbbb\1)(a|bbbb\1)/SI
/(a|bbbb\1){2}/SI
/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/SI
/ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional leading comment
(?: (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # initial word
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) )* # further okay, if led by a period
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
# address
| # or
(?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # one word, optionally followed by....
(?:
[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037] | # atom and space parts, or...
\(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) | # comments, or...
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
# quoted strings
)*
< (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # leading <
(?: @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* , (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
)* # further okay, if led by comma
: # closing colon
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* )? # optional route
(?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) # initial word
(?: (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
" (?: # opening quote...
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
| # or
\\ [^\x80-\xff] # Escaped something (something != CR)
)* " # closing quote
) )* # further okay, if led by a period
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* @ (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # initial subdomain
(?: #
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* \. # if led by a period...
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* (?:
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
| \[ # [
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
\] # ]
) # ...further okay
)*
# address spec
(?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* > # trailing >
# name and address
) (?: [\040\t] | \(
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional trailing comment
/xSI
/<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/isIS
"(?>.*/)foo"SI
/(?(?=[^a-z]+[a-z]) \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} ) /xSI
/(?:(?:(?:(?:(?:(?:(?:(?:(?:(a|b|c))))))))))/iSI
/(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))/SI
/<a[\s]+href[\s]*=[\s]* # find <a href=
([\"\'])? # find single or double quote
(?(1) (.*?)\1 | ([^\s]+)) # if quote found, match up to next matching
# quote, otherwise match up to next space
/isxSI
/^(?!:) # colon disallowed at start
(?: # start of item
(?: [0-9a-f]{1,4} | # 1-4 hex digits or
(?(1)0 | () ) ) # if null previously matched, fail; else null
: # followed by colon
){1,7} # end item; 1-7 of them required
[0-9a-f]{1,4} $ # final hex number at end of string
(?(1)|.) # check that there was an empty component
/xiIS
/(?|(?<a>A)|(?<a>B))/I
AB\Ca
BA\Ca
/(?|(?<a>A)|(?<b>B))/
/(?:a(?<quote> (?<apostrophe>')|(?<realquote>")) |
b(?<quote> (?<apostrophe>')|(?<realquote>")) )
(?('quote')[a-z]+|[0-9]+)/JIx
a"aaaaa
b"aaaaa
** Failers
b"11111
a"11111
/^(?|(a)(b)(c)(?<D>d)|(?<D>e)) (?('D')X|Y)/JDZx
abcdX
eX
** Failers
abcdY
ey
/(?<A>a) (b)(c) (?<A>d (?(R&A)$ | (?4)) )/JDZx
abcdd
** Failers
abcdde
/abcd*/
xxxxabcd\P
xxxxabcd\P\P
/abcd*/i
xxxxabcd\P
xxxxabcd\P\P
XXXXABCD\P
XXXXABCD\P\P
/abc\d*/
xxxxabc1\P
xxxxabc1\P\P
/(a)bc\1*/
xxxxabca\P
xxxxabca\P\P
/abc[de]*/
xxxxabcde\P
xxxxabcde\P\P
/-- This is not in the Perl 5.10 test because Perl seems currently to be broken
and not behaving as specified in that it *does* bumpalong after hitting
(*COMMIT). --/
/(?1)(A(*COMMIT)|B)D/
ABD
XABD
BAD
ABXABD
** Failers
ABX
BAXBAD
/(\3)(\1)(a)/<JS>
cat
/(\3)(\1)(a)/SI<JS>
cat
/(\3)(\1)(a)/SI
cat
/-- End of testinput2 --/

@@ -1,3 +1,7 @@
/-- This set of tests checks local-specific features, using the fr_FR locale.
It is not Perl-compatible. There is different version called wintestinput3
f or use on Windows, where the locale is called "french". --/
/^[\w]+/
*** Failers
École
@@ -88,4 +92,4 @@
/[[:alpha:]][[:lower:]][[:upper:]]/DZLfr_FR
/ End of testinput3 /
/-- End of testinput3 --/

@@ -1,7 +1,6 @@
/-- Do not use the \x{} construct except with patterns that have the --/
/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
/-- that option is set. However, the latest Perls recognize them always. --/
/-- This set of tests if for UTF-8 support, excluding Unicode properties. It is
compatible with all versions of Perl 5. --/
/a.b/8
acb
a\x7fb
@@ -623,4 +622,22 @@
/(?i)[\xc3\xa9\xc3\xbd]|[\xc3\xa9\xc3\xbdA]/8
/ End of testinput4 /
/^[a\x{c0}]b/8
\x{c0}b
/^([a\x{c0}]*?)aa/8
a\x{c0}aaaa/
/^([a\x{c0}]*?)aa/8
a\x{c0}aaaa/
a\x{c0}a\x{c0}aaa/
/^([a\x{c0}]*)aa/8
a\x{c0}aaaa/
a\x{c0}a\x{c0}aaa/
/^([a\x{c0}]*)a\x{c0}/8
a\x{c0}aaaa/
a\x{c0}a\x{c0}aaa/
/-- End of testinput4 --/

@@ -1,3 +1,6 @@
/-- This set of tests checks the API, internals, and non-Perl stuff for UTF-8
support, excluding Unicode properties. --/
/\x{100}/8DZ
/\x{1000}/8DZ
@@ -53,30 +56,6 @@
/.{3,5}?/DZ8
\x{212ab}\x{212ab}\x{212ab}\x{861}
/-- These tests are here rather than in testinput4 because Perl 5.6 has some
problems with UTF-8 support, in the area of \x{..} where the value is < 255.
It grumbles about invalid UTF-8 strings. --/
/^[a\x{c0}]b/8
\x{c0}b
/^([a\x{c0}]*?)aa/8
a\x{c0}aaaa/
/^([a\x{c0}]*?)aa/8
a\x{c0}aaaa/
a\x{c0}a\x{c0}aaa/
/^([a\x{c0}]*)aa/8
a\x{c0}aaaa/
a\x{c0}a\x{c0}aaa/
/^([a\x{c0}]*)a\x{c0}/8
a\x{c0}aaaa/
a\x{c0}a\x{c0}aaa/
/-- --/
/(?<=\C)X/8
Should produce an error diagnostic
@@ -485,4 +464,282 @@ can't tell the difference.) --/
/(*CRLF)(*UTF8)(*BSR_UNICODE)a\Rb/I
/ End of testinput5 /
/Xa{2,4}b/8
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/Xa{2,4}?b/8
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/Xa{2,4}+b/8
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/X\x{123}{2,4}b/8
X\P
X\x{123}\P
X\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\x{123}\P
/X\x{123}{2,4}?b/8
X\P
X\x{123}\P
X\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\x{123}\P
/X\x{123}{2,4}+b/8
X\P
X\x{123}\P
X\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\x{123}\P
/X\x{123}{2,4}b/8
Xx\P
X\x{123}x\P
X\x{123}\x{123}x\P
X\x{123}\x{123}\x{123}x\P
X\x{123}\x{123}\x{123}\x{123}x\P
/X\x{123}{2,4}?b/8
Xx\P
X\x{123}x\P
X\x{123}\x{123}x\P
X\x{123}\x{123}\x{123}x\P
X\x{123}\x{123}\x{123}\x{123}x\P
/X\x{123}{2,4}+b/8
Xx\P
X\x{123}x\P
X\x{123}\x{123}x\P
X\x{123}\x{123}\x{123}x\P
X\x{123}\x{123}\x{123}\x{123}x\P
/X\d{2,4}b/8
X\P
X3\P
X33\P
X333\P
X3333\P
/X\d{2,4}?b/8
X\P
X3\P
X33\P
X333\P
X3333\P
/X\d{2,4}+b/8
X\P
X3\P
X33\P
X333\P
X3333\P
/X\D{2,4}b/8
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/X\D{2,4}?b/8
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/X\D{2,4}+b/8
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/X\D{2,4}b/8
X\P
X\x{123}\P
X\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\x{123}\P
/X\D{2,4}?b/8
X\P
X\x{123}\P
X\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\x{123}\P
/X\D{2,4}+b/8
X\P
X\x{123}\P
X\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\x{123}\P
/X[abc]{2,4}b/8
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/X[abc]{2,4}?b/8
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/X[abc]{2,4}+b/8
X\P
Xa\P
Xaa\P
Xaaa\P
Xaaaa\P
/X[abc\x{123}]{2,4}b/8
X\P
X\x{123}\P
X\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\x{123}\P
/X[abc\x{123}]{2,4}?b/8
X\P
X\x{123}\P
X\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\x{123}\P
/X[abc\x{123}]{2,4}+b/8
X\P
X\x{123}\P
X\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\x{123}\P
/X[^a]{2,4}b/8
X\P
Xz\P
Xzz\P
Xzzz\P
Xzzzz\P
/X[^a]{2,4}?b/8
X\P
Xz\P
Xzz\P
Xzzz\P
Xzzzz\P
/X[^a]{2,4}+b/8
X\P
Xz\P
Xzz\P
Xzzz\P
Xzzzz\P
/X[^a]{2,4}b/8
X\P
X\x{123}\P
X\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\x{123}\P
/X[^a]{2,4}?b/8
X\P
X\x{123}\P
X\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\x{123}\P
/X[^a]{2,4}+b/8
X\P
X\x{123}\P
X\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\P
X\x{123}\x{123}\x{123}\x{123}\P
/(Y)X\1{2,4}b/8
YX\P
YXY\P
YXYY\P
YXYYY\P
YXYYYY\P
/(Y)X\1{2,4}?b/8
YX\P
YXY\P
YXYY\P
YXYYY\P
YXYYYY\P
/(Y)X\1{2,4}+b/8
YX\P
YXY\P
YXYY\P
YXYYY\P
YXYYYY\P
/(\x{123})X\1{2,4}b/8
\x{123}X\P
\x{123}X\x{123}\P
\x{123}X\x{123}\x{123}\P
\x{123}X\x{123}\x{123}\x{123}\P
\x{123}X\x{123}\x{123}\x{123}\x{123}\P
/(\x{123})X\1{2,4}?b/8
\x{123}X\P
\x{123}X\x{123}\P
\x{123}X\x{123}\x{123}\P
\x{123}X\x{123}\x{123}\x{123}\P
\x{123}X\x{123}\x{123}\x{123}\x{123}\P
/(\x{123})X\1{2,4}+b/8
\x{123}X\P
\x{123}X\x{123}\P
\x{123}X\x{123}\x{123}\P
\x{123}X\x{123}\x{123}\x{123}\P
\x{123}X\x{123}\x{123}\x{123}\x{123}\P
/\bthe cat\b/8
the cat\P
the cat\P\P
/abcd*/8
xxxxabcd\P
xxxxabcd\P\P
/abcd*/i8
xxxxabcd\P
xxxxabcd\P\P
XXXXABCD\P
XXXXABCD\P\P
/abc\d*/8
xxxxabc1\P
xxxxabc1\P\P
/(a)bc\1*/8
xxxxabca\P
xxxxabca\P\P
/abc[de]*/8
xxxxabcde\P
xxxxabcde\P\P
/-- End of testinput5 --/

@@ -1,3 +1,7 @@
/-- This set of tests is for Unicode property support. It is compatible with
Perl 5.10, but not 5.8 because it tests some extra properties that are
not in the earlier release. --/
/^\pC\pL\pM\pN\pP\pS\pZ</8
\x7f\x{c0}\x{30f}\x{660}\x{66c}\x{f01}\x{1680}<
\np\x{300}9!\$ <
@@ -60,11 +64,6 @@
** Failers
\x{09f}
/^\p{Cs}/8
\?\x{dfff}
** Failers
\x{09f}
/^\p{Ll}/8
a
** Failers
@@ -199,13 +198,6 @@
}
\x{f3b}
/^\p{Sc}+/8
$\x{a2}\x{a3}\x{a4}\x{a5}\x{a6}
\x{9f2}
** Failers
X
\x{2c2}
/^\p{Sk}/8
\x{2c2}
** Failers
@@ -237,17 +229,6 @@
X
\x{2028}
/^\p{Zs}/8
\ \
\x{a0}
\x{1680}
\x{180e}
\x{2000}
\x{2001}
** Failers
\x{2028}
\x{200d}
/\p{Nd}+(..)/8
\x{660}\x{661}\x{662}ABC
@@ -291,23 +272,6 @@
** Failers
\x{660}\x{661}\x{662}ABC
/\p{Lu}/8i
A
a\x{10a0}B
** Failers
a
\x{1d00}
/\p{^Lu}/8i
1234
** Failers
ABC
/\P{Lu}/8i
1234
** Failers
ABC
/(?<=A\p{Nd})XYZ/8
A2XYZ
123A5XYZPQR
@@ -323,26 +287,6 @@
** Failers
WXYZ
/[\p{L}]/DZ
/[\p{^L}]/DZ
/[\P{L}]/DZ
/[\P{^L}]/DZ
/[abc\p{L}\x{0660}]/8DZ
/[\p{Nd}]/8DZ
1234
/[\p{Nd}+-]+/8DZ
1234
12-34
12+\x{661}-34
** Failers
abcd
/[\P{Nd}]+/8
abcd
** Failers
@@ -394,20 +338,6 @@
** Failers
ABC
/\p{Ll}/8i
a
Az
** Failers
ABC
/^\x{c0}$/8i
\x{c0}
\x{e0}
/^\x{e0}$/8i
\x{c0}
\x{e0}
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8
A\x{391}\x{10427}\x{ff3a}\x{1fb0}
** Failers
@@ -425,14 +355,6 @@
A\x{391}\x{10427}\x{ff5a}\x{1fb0}
A\x{391}\x{10427}\x{ff3a}\x{1fb8}
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iDZ
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8DZ
/AB\x{1fb0}/8DZ
/AB\x{1fb0}/8DZi
/\x{391}+/8i
\x{391}\x{3b1}\x{3b1}\x{3b1}\x{391}
@@ -448,35 +370,6 @@
\x{3b1}
\x{ff5a}
/[\x{c0}\x{391}]/8i
\x{c0}
\x{e0}
/[\x{105}-\x{109}]/8iDZ
\x{104}
\x{105}
\x{109}
** Failers
\x{100}
\x{10a}
/[z-\x{100}]/8iDZ
Z
z
\x{39c}
\x{178}
|
\x{80}
\x{ff}
\x{100}
\x{101}
** Failers
\x{102}
Y
y
/[z-\x{100}]/8DZi
/^\X/8
A
A\x{300}BC
@@ -747,31 +640,9 @@
/([\pL]=(abc))*X/
L=abcX
/The next two should be Perl-compatible, but it fails to match \x{e0}. PCRE
will match it only with UCP support, because without that it has no notion
of case for anything other than the ASCII letters. /
/((?i)[\x{c0}])/8
\x{c0}
\x{e0}
/(?i:[\x{c0}])/8
\x{c0}
\x{e0}
/^\p{Balinese}\p{Cuneiform}\p{Nko}\p{Phags_Pa}\p{Phoenician}/8
\x{1b00}\x{12000}\x{7c0}\x{a840}\x{10900}
/The next two are special cases where the lengths of the different cases of the
same character differ. The first went wrong with heap frame storage; the 2nd
was broken in all cases./
/^\x{023a}+?(\x{0130}+)/8i
\x{023a}\x{2c65}\x{0130}
/^\x{023a}+([^X])/8i
\x{023a}\x{2c65}X
/Check property support in non-UTF-8 mode/
/\p{L}{4}/
@@ -790,48 +661,6 @@ was broken in all cases./
/[\PPP\x8a]{1,}\x80/
A\x80
/(?:[\PPa*]*){8,}/
/[\P{Any}]/BZ
/[\P{Any}\E]/BZ
/(\P{Yi}+\277)/
/(\P{Yi}+\277)?/
/(?<=\P{Yi}{3}A)X/
/\p{Yi}+(\P{Yi}+)(?1)/
/(\P{Yi}{2}\277)?/
/[\P{Yi}A]/
/[\P{Yi}\P{Yi}\P{Yi}A]/
/[^\P{Yi}A]/
/[^\P{Yi}\P{Yi}\P{Yi}A]/
/(\P{Yi}*\277)*/
/(\P{Yi}*?\277)*/
/(\p{Yi}*+\277)*/
/(\P{Yi}?\277)*/
/(\P{Yi}??\277)*/
/(\p{Yi}?+\277)*/
/(\P{Yi}{0,3}\277)*/
/(\P{Yi}{0,3}?\277)*/
/(\p{Yi}{0,3}+\277)*/
/^[\p{Arabic}]/8
\x{60e}
\x{656}
@@ -895,24 +724,6 @@ was broken in all cases./
\x{1049f}
\x{104aa}
/\p{Zl}{2,3}+/8BZ
\xe2\x80\xa8\xe2\x80\xa8
\x{2028}\x{2028}\x{2028}
/\p{Zl}/8BZ
/\p{Lu}{3}+/8BZ
/\pL{2}+/8BZ
/\p{Cc}{2}+/8BZ
/\x{c0}+\x{116}+/8i
\x{c0}\x{e0}\x{116}\x{117}
/[\x{c0}\x{116}]+/8i
\x{c0}\x{e0}\x{116}\x{117}
/\p{Carian}\p{Cham}\p{Kayah_Li}\p{Lepcha}\p{Lycian}\p{Lydian}\p{Ol_Chiki}\p{Rejang}\p{Saurashtra}\p{Sundanese}\p{Vai}/8
\x{102A4}\x{AA52}\x{A91D}\x{1C46}\x{10283}\x{1092E}\x{1C6B}\x{A93B}\x{A8BF}\x{1BA0}\x{A50A}====
@@ -931,12 +742,6 @@ was broken in all cases./
aa
aA
/(\x{de})\1/8i
\x{de}\x{de}
\x{de}\x{fe}
\x{fe}\x{fe}
\x{fe}\x{de}
/(\x{10a})\1/8i
\x{10a}\x{10a}
\x{10a}\x{10b}
@@ -951,4 +756,4 @@ was broken in all cases./
/[\p{Lu}\x20]+/
\x41\x20\x50\xC2\x54\xC9\x20\x54\x4F\x44\x41\x59
/ End of testinput6 /
/-- End of testinput6 --/

@@ -1,3 +1,6 @@
/-- This set of tests check the DFA matching functionality of pcre_dfa_exec().
The -dfa flag must be used with pcretest when running it. --/
/abc/
abc
@@ -4421,4 +4424,122 @@
"ab"
\C-"ab"
/ End of testinput7 /
/\d+X|9+Y/
++++123999\P
++++123999Y\P
/Z(*F)/
Z\P
ZA\P
/Z(?!)/
Z\P
ZA\P
/dog(sbody)?/
dogs\P
dogs\P\P
/dog(sbody)??/
dogs\P
dogs\P\P
/dog|dogsbody/
dogs\P
dogs\P\P
/dogsbody|dog/
dogs\P
dogs\P\P
/Z(*F)Q|ZXY/
Z\P
ZA\P
X\P
/\bthe cat\b/
the cat\P
the cat\P\P
/dog(sbody)?/
dogs\D\P
body\D\R
/dog(sbody)?/
dogs\D\P\P
body\D\R
/abc/
abc\P
abc\P\P
/abc\K123/
xyzabc123pqr
/(?<=abc)123/
xyzabc123pqr
xyzabc12\P
xyzabc12\P\P
/\babc\b/
+++abc+++
+++ab\P
+++ab\P\P
/(?=C)/g+
ABCDECBA
/(abc|def|xyz)/I
terhjk;abcdaadsfe
the quick xyz brown fox
\Yterhjk;abcdaadsfe
\Ythe quick xyz brown fox
** Failers
thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
\Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
/(abc|def|xyz)/SI
terhjk;abcdaadsfe
the quick xyz brown fox
\Yterhjk;abcdaadsfe
\Ythe quick xyz brown fox
** Failers
thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
\Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
/abcd*/+
xxxxabcd\P
xxxxabcd\P\P
dddxxx\R
xxxxabcd\P\P
xxx\R
/abcd*/i
xxxxabcd\P
xxxxabcd\P\P
XXXXABCD\P
XXXXABCD\P\P
/abc\d*/
xxxxabc1\P
xxxxabc1\P\P
/abc[de]*/
xxxxabcde\P
xxxxabcde\P\P
/(?:(?1)|B)(A(*F)|C)/
ABCD
CCD
** Failers
CAD
/^(?:(?1)|B)(A(*F)|C)/
CCD
BCD
** Failers
ABCD
CAD
BAD
/-- End of testinput7 --/

@@ -1,6 +1,6 @@
/-- Do not use the \x{} construct except with patterns that have the --/
/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
/-- that option is set. However, the latest Perls recognize them always. --/
/-- This set of tests checks UTF-8 support with the DFA matching functionality
of pcre_dfa_exec(). The -dfa flag must be used with pcretest when running
it. --/
/\x{100}ab/8
\x{100}ab
@@ -667,4 +667,22 @@
/X/8f<any>
A\x{1ec5}ABCXYZ
/ End of testinput 8 /
/abcd*/8
xxxxabcd\P
xxxxabcd\P\P
/abcd*/i8
xxxxabcd\P
xxxxabcd\P\P
XXXXABCD\P
XXXXABCD\P\P
/abc\d*/8
xxxxabc1\P
xxxxabc1\P\P
/abc[de]*/8
xxxxabcde\P
xxxxabcde\P\P
/-- End of testinput8 --/

@@ -1,3 +1,7 @@
/-- This set of tests check Unicode property support with the DFA matching
functionality of pcre_dfa_exec(). The -dfa flag must be used with pcretest
when running it. --/
/\pL\P{Nd}/8
AB
*** Failers
@@ -843,4 +847,4 @@
** Failers
\x{1d79}\x{a77d}
/ End /
/-- End of testinput9 --/

@@ -1,3 +1,6 @@
/-- This set of tests is for features that are compatible with all versions of
Perl 5, in non-UTF-8 mode. --/
/the quick brown fox/
the quick brown fox
0: the quick brown fox
@@ -6646,4 +6649,4 @@ No match
0: %ab%
1:
/ End of testinput1 /
/-- End of testinput1 --/

@@ -666,4 +666,4 @@ Memory allocation (code space): 40
39 End
------------------------------------------------------------------
/ End of testinput10 /
/-- End of testinput10 --/

File diff suppressed because it is too large

@@ -1,3 +1,7 @@
/-- This set of tests checks local-specific features, using the fr_FR locale.
It is not Perl-compatible. There is different version called wintestinput3
f or use on Windows, where the locale is called "french". --/
/^[\w]+/
*** Failers
No match
@@ -83,6 +87,7 @@ Capturing subpattern count = 0
No options
No first char
No need char
Subject length lower bound = 1
Starting byte set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
@@ -91,6 +96,7 @@ Capturing subpattern count = 0
No options
No first char
No need char
Subject length lower bound = 1
Starting byte set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
@@ -160,4 +166,4 @@ No options
No first char
No need char
/ End of testinput3 /
/-- End of testinput3 --/

@@ -1,9 +1,6 @@
/-- Do not use the \x{} construct except with patterns that have the --/
/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
No match
/-- that option is set. However, the latest Perls recognize them always. --/
No match
/-- This set of tests if for UTF-8 support, excluding Unicode properties. It is
compatible with all versions of Perl 5. --/
/a.b/8
acb
0: acb
@@ -1089,4 +1086,37 @@ No match
/(?i)[\xc3\xa9\xc3\xbd]|[\xc3\xa9\xc3\xbdA]/8
/ End of testinput4 /
/^[a\x{c0}]b/8
\x{c0}b
0: \x{c0}b
/^([a\x{c0}]*?)aa/8
a\x{c0}aaaa/
0: a\x{c0}aa
1: a\x{c0}
/^([a\x{c0}]*?)aa/8
a\x{c0}aaaa/
0: a\x{c0}aa
1: a\x{c0}
a\x{c0}a\x{c0}aaa/
0: a\x{c0}a\x{c0}aa
1: a\x{c0}a\x{c0}
/^([a\x{c0}]*)aa/8
a\x{c0}aaaa/
0: a\x{c0}aaaa
1: a\x{c0}aa
a\x{c0}a\x{c0}aaa/
0: a\x{c0}a\x{c0}aaa
1: a\x{c0}a\x{c0}a
/^([a\x{c0}]*)a\x{c0}/8
a\x{c0}aaaa/
0: a\x{c0}
1:
a\x{c0}a\x{c0}aaa/
0: a\x{c0}a\x{c0}
1: a\x{c0}
/-- End of testinput4 --/

@@ -1,3 +1,6 @@
/-- This set of tests checks the API, internals, and non-Perl stuff for UTF-8
support, excluding Unicode properties. --/
/\x{100}/8DZ
------------------------------------------------------------------
Bra
@@ -252,7 +255,6 @@ Need char = 171
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
No first char
Need char = 'X'
@@ -269,52 +271,12 @@ Need char = 'X'
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
No first char
No need char
\x{212ab}\x{212ab}\x{212ab}\x{861}
0: \x{212ab}\x{212ab}\x{212ab}
/-- These tests are here rather than in testinput4 because Perl 5.6 has some
problems with UTF-8 support, in the area of \x{..} where the value is < 255.
It grumbles about invalid UTF-8 strings. --/
/^[a\x{c0}]b/8
\x{c0}b
0: \x{c0}b
/^([a\x{c0}]*?)aa/8
a\x{c0}aaaa/
0: a\x{c0}aa
1: a\x{c0}
/^([a\x{c0}]*?)aa/8
a\x{c0}aaaa/
0: a\x{c0}aa
1: a\x{c0}
a\x{c0}a\x{c0}aaa/
0: a\x{c0}a\x{c0}aa
1: a\x{c0}a\x{c0}
/^([a\x{c0}]*)aa/8
a\x{c0}aaaa/
0: a\x{c0}aaaa
1: a\x{c0}aa
a\x{c0}a\x{c0}aaa/
0: a\x{c0}a\x{c0}aaa
1: a\x{c0}a\x{c0}a
/^([a\x{c0}]*)a\x{c0}/8
a\x{c0}aaaa/
0: a\x{c0}
1:
a\x{c0}a\x{c0}aaa/
0: a\x{c0}a\x{c0}
1: a\x{c0}
/-- --/
/(?<=\C)X/8
Failed: \C not allowed in lookbehind assertion at offset 6
@@ -389,6 +351,7 @@ Capturing subpattern count = 0
Options: utf8
No first char
No need char
Subject length lower bound = 1
Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
\x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4
@@ -423,11 +386,11 @@ No match
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
First char = 196
Need char = 128
Study returned NULL
Subject length lower bound = 3
No set of starting bytes
\x{100}\x{100}\x{100}\x{100\x{100}
0: \x{100}\x{100}\x{100}
@@ -443,10 +406,10 @@ Study returned NULL
End
------------------------------------------------------------------
Capturing subpattern count = 1
Partial matching not supported
Options: utf8
No first char
No need char
Subject length lower bound = 1
Starting byte set: x \xc4
/(\x{100}*a|x)/8SDZ
@@ -462,10 +425,10 @@ Starting byte set: x \xc4
End
------------------------------------------------------------------
Capturing subpattern count = 1
Partial matching not supported
Options: utf8
No first char
No need char
Subject length lower bound = 1
Starting byte set: a x \xc4
/(\x{100}{0,2}a|x)/8SDZ
@@ -481,10 +444,10 @@ Starting byte set: a x \xc4
End
------------------------------------------------------------------
Capturing subpattern count = 1
Partial matching not supported
Options: utf8
No first char
No need char
Subject length lower bound = 1
Starting byte set: a x \xc4
/(\x{100}{1,2}a|x)/8SDZ
@@ -501,10 +464,10 @@ Starting byte set: a x \xc4
End
------------------------------------------------------------------
Capturing subpattern count = 1
Partial matching not supported
Options: utf8
No first char
No need char
Subject length lower bound = 1
Starting byte set: x \xc4
/\x{100}*(\d+|"(?1)")/8
@@ -551,7 +514,6 @@ Need char = 128
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
No first char
No need char
@@ -565,7 +527,6 @@ No need char
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
First char = 'a'
No need char
@@ -579,7 +540,6 @@ No need char
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
First char = 'a'
Need char = 'b'
@@ -593,7 +553,6 @@ Need char = 'b'
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
First char = 'a'
Need char = 128
@@ -607,7 +566,6 @@ Need char = 128
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
First char = 'a'
Need char = 129
@@ -621,7 +579,6 @@ Need char = 129
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
No first char
Need char = 'A'
@@ -640,7 +597,6 @@ Need char = 'A'
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
No first char
No need char
@@ -1122,7 +1078,6 @@ Need char = 191
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
No first char
No need char
@@ -1136,7 +1091,6 @@ No need char
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
No first char
No need char
@@ -1150,7 +1104,6 @@ No need char
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
No first char
No need char
@@ -1164,7 +1117,6 @@ No need char
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
No first char
No need char
@@ -1178,7 +1130,6 @@ No need char
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
No first char
No need char
@@ -1192,7 +1143,6 @@ No need char
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
No first char
No need char
@@ -1206,7 +1156,6 @@ No need char
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
First char = 196
Need char = 128
@@ -1220,7 +1169,6 @@ Need char = 128
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
First char = 196
Need char = 'X'
@@ -1234,7 +1182,6 @@ Need char = 'X'
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
First char = 'X'
Need char = 128
@@ -1652,4 +1599,477 @@ Forced newline sequence: CRLF
First char = 'a'
Need char = 'b'
/ End of testinput5 /
/Xa{2,4}b/8
X\P
Partial match: X
Xa\P
Partial match: Xa
Xaa\P
Partial match: Xaa
Xaaa\P
Partial match: Xaaa
Xaaaa\P
Partial match: Xaaaa
/Xa{2,4}?b/8
X\P
Partial match: X
Xa\P
Partial match: Xa
Xaa\P
Partial match: Xaa
Xaaa\P
Partial match: Xaaa
Xaaaa\P
Partial match: Xaaaa
/Xa{2,4}+b/8
X\P
Partial match: X
Xa\P
Partial match: Xa
Xaa\P
Partial match: Xaa
Xaaa\P
Partial match: Xaaa
Xaaaa\P
Partial match: Xaaaa
/X\x{123}{2,4}b/8
X\P
Partial match: X
X\x{123}\P
Partial match: X\x{123}
X\x{123}\x{123}\P
Partial match: X\x{123}\x{123}
X\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}
X\x{123}\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}\x{123}
/X\x{123}{2,4}?b/8
X\P
Partial match: X
X\x{123}\P
Partial match: X\x{123}
X\x{123}\x{123}\P
Partial match: X\x{123}\x{123}
X\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}
X\x{123}\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}\x{123}
/X\x{123}{2,4}+b/8
X\P
Partial match: X
X\x{123}\P
Partial match: X\x{123}
X\x{123}\x{123}\P
Partial match: X\x{123}\x{123}
X\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}
X\x{123}\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}\x{123}
/X\x{123}{2,4}b/8
Xx\P
No match
X\x{123}x\P
No match
X\x{123}\x{123}x\P
No match
X\x{123}\x{123}\x{123}x\P
No match
X\x{123}\x{123}\x{123}\x{123}x\P
No match
/X\x{123}{2,4}?b/8
Xx\P
No match
X\x{123}x\P
No match
X\x{123}\x{123}x\P
No match
X\x{123}\x{123}\x{123}x\P
No match
X\x{123}\x{123}\x{123}\x{123}x\P
No match
/X\x{123}{2,4}+b/8
Xx\P
No match
X\x{123}x\P
No match
X\x{123}\x{123}x\P
No match
X\x{123}\x{123}\x{123}x\P
No match
X\x{123}\x{123}\x{123}\x{123}x\P
No match
/X\d{2,4}b/8
X\P
Partial match: X
X3\P
Partial match: X3
X33\P
Partial match: X33
X333\P
Partial match: X333
X3333\P
Partial match: X3333
/X\d{2,4}?b/8
X\P
Partial match: X
X3\P
Partial match: X3
X33\P
Partial match: X33
X333\P
Partial match: X333
X3333\P
Partial match: X3333
/X\d{2,4}+b/8
X\P
Partial match: X
X3\P
Partial match: X3
X33\P
Partial match: X33
X333\P
Partial match: X333
X3333\P
Partial match: X3333
/X\D{2,4}b/8
X\P
Partial match: X
Xa\P
Partial match: Xa
Xaa\P
Partial match: Xaa
Xaaa\P
Partial match: Xaaa
Xaaaa\P
Partial match: Xaaaa
/X\D{2,4}?b/8
X\P
Partial match: X
Xa\P
Partial match: Xa
Xaa\P
Partial match: Xaa
Xaaa\P
Partial match: Xaaa
Xaaaa\P
Partial match: Xaaaa
/X\D{2,4}+b/8
X\P
Partial match: X
Xa\P
Partial match: Xa
Xaa\P
Partial match: Xaa
Xaaa\P
Partial match: Xaaa
Xaaaa\P
Partial match: Xaaaa
/X\D{2,4}b/8
X\P
Partial match: X
X\x{123}\P
Partial match: X\x{123}
X\x{123}\x{123}\P
Partial match: X\x{123}\x{123}
X\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}
X\x{123}\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}\x{123}
/X\D{2,4}?b/8
X\P
Partial match: X
X\x{123}\P
Partial match: X\x{123}
X\x{123}\x{123}\P
Partial match: X\x{123}\x{123}
X\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}
X\x{123}\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}\x{123}
/X\D{2,4}+b/8
X\P
Partial match: X
X\x{123}\P
Partial match: X\x{123}
X\x{123}\x{123}\P
Partial match: X\x{123}\x{123}
X\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}
X\x{123}\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}\x{123}
/X[abc]{2,4}b/8
X\P
Partial match: X
Xa\P
Partial match: Xa
Xaa\P
Partial match: Xaa
Xaaa\P
Partial match: Xaaa
Xaaaa\P
Partial match: Xaaaa
/X[abc]{2,4}?b/8
X\P
Partial match: X
Xa\P
Partial match: Xa
Xaa\P
Partial match: Xaa
Xaaa\P
Partial match: Xaaa
Xaaaa\P
Partial match: Xaaaa
/X[abc]{2,4}+b/8
X\P
Partial match: X
Xa\P
Partial match: Xa
Xaa\P
Partial match: Xaa
Xaaa\P
Partial match: Xaaa
Xaaaa\P
Partial match: Xaaaa
/X[abc\x{123}]{2,4}b/8
X\P
Partial match: X
X\x{123}\P
Partial match: X\x{123}
X\x{123}\x{123}\P
Partial match: X\x{123}\x{123}
X\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}
X\x{123}\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}\x{123}
/X[abc\x{123}]{2,4}?b/8
X\P
Partial match: X
X\x{123}\P
Partial match: X\x{123}
X\x{123}\x{123}\P
Partial match: X\x{123}\x{123}
X\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}
X\x{123}\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}\x{123}
/X[abc\x{123}]{2,4}+b/8
X\P
Partial match: X
X\x{123}\P
Partial match: X\x{123}
X\x{123}\x{123}\P
Partial match: X\x{123}\x{123}
X\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}
X\x{123}\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}\x{123}
/X[^a]{2,4}b/8
X\P
Partial match: X
Xz\P
Partial match: Xz
Xzz\P
Partial match: Xzz
Xzzz\P
Partial match: Xzzz
Xzzzz\P
Partial match: Xzzzz
/X[^a]{2,4}?b/8
X\P
Partial match: X
Xz\P
Partial match: Xz
Xzz\P
Partial match: Xzz
Xzzz\P
Partial match: Xzzz
Xzzzz\P
Partial match: Xzzzz
/X[^a]{2,4}+b/8
X\P
Partial match: X
Xz\P
Partial match: Xz
Xzz\P
Partial match: Xzz
Xzzz\P
Partial match: Xzzz
Xzzzz\P
Partial match: Xzzzz
/X[^a]{2,4}b/8
X\P
Partial match: X
X\x{123}\P
Partial match: X\x{123}
X\x{123}\x{123}\P
Partial match: X\x{123}\x{123}
X\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}
X\x{123}\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}\x{123}
/X[^a]{2,4}?b/8
X\P
Partial match: X
X\x{123}\P
Partial match: X\x{123}
X\x{123}\x{123}\P
Partial match: X\x{123}\x{123}
X\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}
X\x{123}\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}\x{123}
/X[^a]{2,4}+b/8
X\P
Partial match: X
X\x{123}\P
Partial match: X\x{123}
X\x{123}\x{123}\P
Partial match: X\x{123}\x{123}
X\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}
X\x{123}\x{123}\x{123}\x{123}\P
Partial match: X\x{123}\x{123}\x{123}\x{123}
/(Y)X\1{2,4}b/8
YX\P
Partial match: YX
YXY\P
Partial match: YXY
YXYY\P
Partial match: YXYY
YXYYY\P
Partial match: YXYYY
YXYYYY\P
Partial match: YXYYYY
/(Y)X\1{2,4}?b/8
YX\P
Partial match: YX
YXY\P
Partial match: YXY
YXYY\P
Partial match: YXYY
YXYYY\P
Partial match: YXYYY
YXYYYY\P
Partial match: YXYYYY
/(Y)X\1{2,4}+b/8
YX\P
Partial match: YX
YXY\P
Partial match: YXY
YXYY\P
Partial match: YXYY
YXYYY\P
Partial match: YXYYY
YXYYYY\P
Partial match: YXYYYY
/(\x{123})X\1{2,4}b/8
\x{123}X\P
Partial match: \x{123}X
\x{123}X\x{123}\P
Partial match: \x{123}X\x{123}
\x{123}X\x{123}\x{123}\P
Partial match: \x{123}X\x{123}\x{123}
\x{123}X\x{123}\x{123}\x{123}\P
Partial match: \x{123}X\x{123}\x{123}\x{123}
\x{123}X\x{123}\x{123}\x{123}\x{123}\P
Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123}
/(\x{123})X\1{2,4}?b/8
\x{123}X\P
Partial match: \x{123}X
\x{123}X\x{123}\P
Partial match: \x{123}X\x{123}
\x{123}X\x{123}\x{123}\P
Partial match: \x{123}X\x{123}\x{123}
\x{123}X\x{123}\x{123}\x{123}\P
Partial match: \x{123}X\x{123}\x{123}\x{123}
\x{123}X\x{123}\x{123}\x{123}\x{123}\P
Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123}
/(\x{123})X\1{2,4}+b/8
\x{123}X\P
Partial match: \x{123}X
\x{123}X\x{123}\P
Partial match: \x{123}X\x{123}
\x{123}X\x{123}\x{123}\P
Partial match: \x{123}X\x{123}\x{123}
\x{123}X\x{123}\x{123}\x{123}\P
Partial match: \x{123}X\x{123}\x{123}\x{123}
\x{123}X\x{123}\x{123}\x{123}\x{123}\P
Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123}
/\bthe cat\b/8
the cat\P
0: the cat
the cat\P\P
Partial match: the cat
/abcd*/8
xxxxabcd\P
0: abcd
xxxxabcd\P\P
Partial match: abcd
/abcd*/i8
xxxxabcd\P
0: abcd
xxxxabcd\P\P
Partial match: abcd
XXXXABCD\P
0: ABCD
XXXXABCD\P\P
Partial match: ABCD
/abc\d*/8
xxxxabc1\P
0: abc1
xxxxabc1\P\P
Partial match: abc1
/(a)bc\1*/8
xxxxabca\P
0: abca
1: a
xxxxabca\P\P
Partial match: abca
/abc[de]*/8
xxxxabcde\P
0: abcde
xxxxabcde\P\P
Partial match: abcde
/-- End of testinput5 --/

@@ -1,3 +1,7 @@
/-- This set of tests is for Unicode property support. It is compatible with
Perl 5.10, but not 5.8 because it tests some extra properties that are
not in the earlier release. --/
/^\pC\pL\pM\pN\pP\pS\pZ</8
\x7f\x{c0}\x{30f}\x{660}\x{66c}\x{f01}\x{1680}<
0: \x{7f}\x{c0}\x{30f}\x{660}\x{66c}\x{f01}\x{1680}<
@@ -98,14 +102,6 @@ No match
\x{09f}
No match
/^\p{Cs}/8
\?\x{dfff}
0: \x{dfff}
** Failers
No match
\x{09f}
No match
/^\p{Ll}/8
a
0: a
@@ -338,18 +334,6 @@ No match
\x{f3b}
No match
/^\p{Sc}+/8
$\x{a2}\x{a3}\x{a4}\x{a5}\x{a6}
0: $\x{a2}\x{a3}\x{a4}\x{a5}
\x{9f2}
0: \x{9f2}
** Failers
No match
X
No match
\x{2c2}
No match
/^\p{Sk}/8
\x{2c2}
0: \x{2c2}
@@ -402,26 +386,6 @@ No match
\x{2028}
No match
/^\p{Zs}/8
\ \
0:
\x{a0}
0: \x{a0}
\x{1680}
0: \x{1680}
\x{180e}
0: \x{180e}
\x{2000}
0: \x{2000}
\x{2001}
0: \x{2001}
** Failers
No match
\x{2028}
No match
\x{200d}
No match
/\p{Nd}+(..)/8
\x{660}\x{661}\x{662}ABC
0: \x{660}\x{661}\x{662}AB
@@ -494,34 +458,6 @@ No match
\x{660}\x{661}\x{662}ABC
No match
/\p{Lu}/8i
A
0: A
a\x{10a0}B
0: \x{10a0}
** Failers
0: F
a
No match
\x{1d00}
No match
/\p{^Lu}/8i
1234
0: 1
** Failers
0: *
ABC
No match
/\P{Lu}/8i
1234
0: 1
** Failers
0: *
ABC
No match
/(?<=A\p{Nd})XYZ/8
A2XYZ
0: XYZ
@@ -548,103 +484,6 @@ No match
WXYZ
No match
/[\p{L}]/DZ
------------------------------------------------------------------
Bra
[\p{L}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
No options
No first char
No need char
/[\p{^L}]/DZ
------------------------------------------------------------------
Bra
[\P{L}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
No options
No first char
No need char
/[\P{L}]/DZ
------------------------------------------------------------------
Bra
[\P{L}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
No options
No first char
No need char
/[\P{^L}]/DZ
------------------------------------------------------------------
Bra
[\p{L}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
No options
No first char
No need char
/[abc\p{L}\x{0660}]/8DZ
------------------------------------------------------------------
Bra
[a-c\p{L}\x{660}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
No first char
No need char
/[\p{Nd}]/8DZ
------------------------------------------------------------------
Bra
[\p{Nd}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
No first char
No need char
1234
0: 1
/[\p{Nd}+-]+/8DZ
------------------------------------------------------------------
Bra
[+\-\p{Nd}]+
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Partial matching not supported
Options: utf8
No first char
No need char
1234
0: 1234
12-34
0: 12-34
12+\x{661}-34
0: 12+\x{661}-34
** Failers
No match
abcd
No match
/[\P{Nd}]+/8
abcd
0: abcd
@@ -725,28 +564,6 @@ No match
ABC
No match
/\p{Ll}/8i
a
0: a
Az
0: z
** Failers
0: a
ABC
No match
/^\x{c0}$/8i
\x{c0}
0: \x{c0}
\x{e0}
0: \x{e0}
/^\x{e0}$/8i
\x{c0}
0: \x{c0}
\x{e0}
0: \x{e0}
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8
A\x{391}\x{10427}\x{ff3a}\x{1fb0}
0: A\x{391}\x{10427}\x{ff3a}\x{1fb0}
@@ -777,54 +594,6 @@ No match
A\x{391}\x{10427}\x{ff3a}\x{1fb8}
0: A\x{391}\x{10427}\x{ff3a}\x{1fb8}
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iDZ
------------------------------------------------------------------
Bra
NC A\x{391}\x{10427}\x{ff3a}\x{1fb0}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf8
First char = 'A' (caseless)
No need char
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8DZ
------------------------------------------------------------------
Bra
A\x{391}\x{10427}\x{ff3a}\x{1fb0}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 'A'
Need char = 176
/AB\x{1fb0}/8DZ
------------------------------------------------------------------
Bra
AB\x{1fb0}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf8
First char = 'A'
Need char = 176
/AB\x{1fb0}/8DZi
------------------------------------------------------------------
Bra
NC AB\x{1fb0}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf8
First char = 'A' (caseless)
Need char = 'B' (caseless)
/\x{391}+/8i
\x{391}\x{3b1}\x{3b1}\x{3b1}\x{391}
0: \x{391}\x{3b1}\x{3b1}\x{3b1}\x{391}
@@ -849,86 +618,6 @@ Need char = 'B' (caseless)
\x{ff5a}
0: \x{ff5a}
/[\x{c0}\x{391}]/8i
\x{c0}
0: \x{c0}
\x{e0}
0: \x{e0}
/[\x{105}-\x{109}]/8iDZ
------------------------------------------------------------------
Bra
[\x{104}-\x{109}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf8
No first char
No need char
\x{104}
0: \x{104}
\x{105}
0: \x{105}
\x{109}
0: \x{109}
** Failers
No match
\x{100}
No match
\x{10a}
No match
/[z-\x{100}]/8iDZ
------------------------------------------------------------------
Bra
[Z\x{39c}\x{178}z-\x{101}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf8
No first char
No need char
Z
0: Z
z
0: z
\x{39c}
0: \x{39c}
\x{178}
0: \x{178}
|
0: |
\x{80}
0: \x{80}
\x{ff}
0: \x{ff}
\x{100}
0: \x{100}
\x{101}
0: \x{101}
** Failers
No match
\x{102}
No match
Y
No match
y
No match
/[z-\x{100}]/8DZi
------------------------------------------------------------------
Bra
[Z\x{39c}\x{178}z-\x{101}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf8
No first char
No need char
/^\X/8
A
0: A
@@ -1408,42 +1097,10 @@ No match
1: L=abc
2: abc
/The next two should be Perl-compatible, but it fails to match \x{e0}. PCRE
will match it only with UCP support, because without that it has no notion
of case for anything other than the ASCII letters. /
/((?i)[\x{c0}])/8
\x{c0}
0: \x{c0}
1: \x{c0}
\x{e0}
0: \x{e0}
1: \x{e0}
/(?i:[\x{c0}])/8
\x{c0}
0: \x{c0}
\x{e0}
0: \x{e0}
/^\p{Balinese}\p{Cuneiform}\p{Nko}\p{Phags_Pa}\p{Phoenician}/8
\x{1b00}\x{12000}\x{7c0}\x{a840}\x{10900}
0: \x{1b00}\x{12000}\x{7c0}\x{a840}\x{10900}
/The next two are special cases where the lengths of the different cases of the
same character differ. The first went wrong with heap frame storage; the 2nd
was broken in all cases./
/^\x{023a}+?(\x{0130}+)/8i
\x{023a}\x{2c65}\x{0130}
0: \x{23a}\x{2c65}\x{130}
1: \x{130}
/^\x{023a}+([^X])/8i
\x{023a}\x{2c65}X
0: \x{23a}\x{2c65}
1: \x{2c65}
/Check property support in non-UTF-8 mode/
/\p{L}{4}/
@@ -1468,60 +1125,6 @@ No match
A\x80
0: A\x80
/(?:[\PPa*]*){8,}/
/[\P{Any}]/BZ
------------------------------------------------------------------
Bra
[\P{Any}]
Ket
End
------------------------------------------------------------------
/[\P{Any}\E]/BZ
------------------------------------------------------------------
Bra
[\P{Any}]
Ket
End
------------------------------------------------------------------
/(\P{Yi}+\277)/
/(\P{Yi}+\277)?/
/(?<=\P{Yi}{3}A)X/
/\p{Yi}+(\P{Yi}+)(?1)/
/(\P{Yi}{2}\277)?/
/[\P{Yi}A]/
/[\P{Yi}\P{Yi}\P{Yi}A]/
/[^\P{Yi}A]/
/[^\P{Yi}\P{Yi}\P{Yi}A]/
/(\P{Yi}*\277)*/
/(\P{Yi}*?\277)*/
/(\p{Yi}*+\277)*/
/(\P{Yi}?\277)*/
/(\P{Yi}??\277)*/
/(\p{Yi}?+\277)*/
/(\P{Yi}{0,3}\277)*/
/(\P{Yi}{0,3}?\277)*/
/(\p{Yi}{0,3}+\277)*/
/^[\p{Arabic}]/8
\x{60e}
0: \x{60e}
@@ -1634,59 +1237,6 @@ No match
\x{104aa}
No match
/\p{Zl}{2,3}+/8BZ
------------------------------------------------------------------
Bra
prop Zl {2}
prop Zl ?+
Ket
End
------------------------------------------------------------------
\xe2\x80\xa8\xe2\x80\xa8
0: \x{2028}\x{2028}
\x{2028}\x{2028}\x{2028}
0: \x{2028}\x{2028}\x{2028}
/\p{Zl}/8BZ
------------------------------------------------------------------
Bra
prop Zl
Ket
End
------------------------------------------------------------------
/\p{Lu}{3}+/8BZ
------------------------------------------------------------------
Bra
prop Lu {3}
Ket
End
------------------------------------------------------------------
/\pL{2}+/8BZ
------------------------------------------------------------------
Bra
prop L {2}
Ket
End
------------------------------------------------------------------
/\p{Cc}{2}+/8BZ
------------------------------------------------------------------
Bra
prop Cc {2}
Ket
End
------------------------------------------------------------------
/\x{c0}+\x{116}+/8i
\x{c0}\x{e0}\x{116}\x{117}
0: \x{c0}\x{e0}\x{116}\x{117}
/[\x{c0}\x{116}]+/8i
\x{c0}\x{e0}\x{116}\x{117}
0: \x{c0}\x{e0}\x{116}\x{117}
/\p{Carian}\p{Cham}\p{Kayah_Li}\p{Lepcha}\p{Lycian}\p{Lydian}\p{Ol_Chiki}\p{Rejang}\p{Saurashtra}\p{Sundanese}\p{Vai}/8
\x{102A4}\x{AA52}\x{A91D}\x{1C46}\x{10283}\x{1092E}\x{1C6B}\x{A93B}\x{A8BF}\x{1BA0}\x{A50A}====
0: \x{102a4}\x{aa52}\x{a91d}\x{1c46}\x{10283}\x{1092e}\x{1c6b}\x{a93b}\x{a8bf}\x{1ba0}\x{a50a}
@@ -1719,20 +1269,6 @@ No match
0: aA
1: a
/(\x{de})\1/8i
\x{de}\x{de}
0: \x{de}\x{de}
1: \x{de}
\x{de}\x{fe}
0: \x{de}\x{fe}
1: \x{de}
\x{fe}\x{fe}
0: \x{fe}\x{fe}
1: \x{fe}
\x{fe}\x{de}
0: \x{fe}\x{de}
1: \x{fe}
/(\x{10a})\1/8i
\x{10a}\x{10a}
0: \x{10a}\x{10a}
@@ -1757,4 +1293,4 @@ No match
\x41\x20\x50\xC2\x54\xC9\x20\x54\x4F\x44\x41\x59
0: A P\xc2T\xc9 TODAY
/ End of testinput6 /
/-- End of testinput6 --/

@@ -1,3 +1,6 @@
/-- This set of tests check the DFA matching functionality of pcre_dfa_exec().
The -dfa flag must be used with pcretest when running it. --/
/abc/
abc
0: abc
@@ -981,7 +984,7 @@ Partial match: abc
xyzfo\P
No match
foob\P\>2
Partial match: b
Partial match: foob
foobar...\R\P\>4
0: ar
xyzfo\P
@@ -7168,7 +7171,6 @@ No match
/a\R{2,4}b/I<bsr_anycrlf>
Capturing subpattern count = 0
Partial matching not supported
Options: bsr_anycrlf
First char = 'a'
Need char = 'b'
@@ -7187,7 +7189,6 @@ No match
/a\R{2,4}b/I<bsr_unicode>
Capturing subpattern count = 0
Partial matching not supported
Options: bsr_unicode
First char = 'a'
Need char = 'b'
@@ -7370,4 +7371,217 @@ No match
\C-"ab"
0: "ab"
/ End of testinput7 /
/\d+X|9+Y/
++++123999\P
Partial match: 123999
++++123999Y\P
0: 999Y
/Z(*F)/
Z\P
No match
ZA\P
No match
/Z(?!)/
Z\P
No match
ZA\P
No match
/dog(sbody)?/
dogs\P
0: dog
dogs\P\P
Partial match: dogs
/dog(sbody)??/
dogs\P
0: dog
dogs\P\P
Partial match: dogs
/dog|dogsbody/
dogs\P
0: dog
dogs\P\P
Partial match: dogs
/dogsbody|dog/
dogs\P
0: dog
dogs\P\P
Partial match: dogs
/Z(*F)Q|ZXY/
Z\P
Partial match: Z
ZA\P
No match
X\P
No match
/\bthe cat\b/
the cat\P
0: the cat
the cat\P\P
Partial match: the cat
/dog(sbody)?/
dogs\D\P
0: dog
body\D\R
0: body
/dog(sbody)?/
dogs\D\P\P
Partial match: dogs
body\D\R
0: body
/abc/
abc\P
0: abc
abc\P\P
0: abc
/abc\K123/
xyzabc123pqr
Error -16
/(?<=abc)123/
xyzabc123pqr
0: 123
xyzabc12\P
Partial match: abc12
xyzabc12\P\P
Partial match: abc12
/\babc\b/
+++abc+++
0: abc
+++ab\P
Partial match: +ab
+++ab\P\P
Partial match: +ab
/(?=C)/g+
ABCDECBA
0:
0+ CDECBA
0:
0+ CBA
/(abc|def|xyz)/I
Capturing subpattern count = 1
No options
No first char
No need char
terhjk;abcdaadsfe
0: abc
the quick xyz brown fox
0: xyz
\Yterhjk;abcdaadsfe
0: abc
\Ythe quick xyz brown fox
0: xyz
** Failers
No match
thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
No match
\Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
No match
/(abc|def|xyz)/SI
Capturing subpattern count = 1
No options
No first char
No need char
Subject length lower bound = 3
Starting byte set: a d x
terhjk;abcdaadsfe
0: abc
the quick xyz brown fox
0: xyz
\Yterhjk;abcdaadsfe
0: abc
\Ythe quick xyz brown fox
0: xyz
** Failers
No match
thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
No match
\Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
No match
/abcd*/+
xxxxabcd\P
0: abcd
0+
1: abc
xxxxabcd\P\P
Partial match: abcd
dddxxx\R
0: ddd
0+ xxx
1: dd
2: d
3:
xxxxabcd\P\P
Partial match: abcd
xxx\R
0:
0+ xxx
/abcd*/i
xxxxabcd\P
0: abcd
1: abc
xxxxabcd\P\P
Partial match: abcd
XXXXABCD\P
0: ABCD
1: ABC
XXXXABCD\P\P
Partial match: ABCD
/abc\d*/
xxxxabc1\P
0: abc1
1: abc
xxxxabc1\P\P
Partial match: abc1
/abc[de]*/
xxxxabcde\P
0: abcde
1: abcd
2: abc
xxxxabcde\P\P
Partial match: abcde
/(?:(?1)|B)(A(*F)|C)/
ABCD
0: BC
CCD
0: CC
** Failers
No match
CAD
No match
/^(?:(?1)|B)(A(*F)|C)/
CCD
0: CC
BCD
0: BC
** Failers
No match
ABCD
No match
CAD
No match
BAD
No match
/-- End of testinput7 --/

@@ -1,8 +1,6 @@
/-- Do not use the \x{} construct except with patterns that have the --/
/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
No match
/-- that option is set. However, the latest Perls recognize them always. --/
No match
/-- This set of tests checks UTF-8 support with the DFA matching functionality
of pcre_dfa_exec(). The -dfa flag must be used with pcretest when running
it. --/
/\x{100}ab/8
\x{100}ab
@@ -1288,4 +1286,38 @@ No match
A\x{1ec5}ABCXYZ
0: X
/ End of testinput 8 /
/abcd*/8
xxxxabcd\P
0: abcd
1: abc
xxxxabcd\P\P
Partial match: abcd
/abcd*/i8
xxxxabcd\P
0: abcd
1: abc
xxxxabcd\P\P
Partial match: abcd
XXXXABCD\P
0: ABCD
1: ABC
XXXXABCD\P\P
Partial match: ABCD
/abc\d*/8
xxxxabc1\P
0: abc1
1: abc
xxxxabc1\P\P
Partial match: abc1
/abc[de]*/8
xxxxabcde\P
0: abcde
1: abcd
2: abc
xxxxabcde\P\P
Partial match: abcde
/-- End of testinput8 --/

@@ -1,3 +1,7 @@
/-- This set of tests check Unicode property support with the DFA matching
functionality of pcre_dfa_exec(). The -dfa flag must be used with pcretest
when running it. --/
/\pL\P{Nd}/8
AB
0: AB
@@ -1670,4 +1674,4 @@ No match
\x{1d79}\x{a77d}
No match
/ End /
/-- End of testinput9 --/

@@ -84,7 +84,12 @@ recurse('pcrelib');
$dirorig = scandir('pcrelib/testdata');
$k = array_search('CVS', $dirorig);
unset($dirorig[$k]);
if ($k !== false)
unset($dirorig[$k]);
$k = array_search('.svn', $dirorig);
if ($k !== false)
unset($dirorig[$k]);
$dirnew = scandir("$newpcre/testdata");
$diff = array_diff($dirorig, $dirnew);