Mirror of https://github.com/php/php-src.git, synced 2024-11-28 12:26:37 +08:00
Update PCRE to 8.00
parent 26e3082abc
commit f03b175f7c
@@ -1,6 +1,170 @@
 ChangeLog for PCRE
 ------------------

+Version 8.00 19-Oct-09
+----------------------
+
+1.  The table for translating pcre_compile() error codes into POSIX error codes
+    was out-of-date, and there was no check on the pcre_compile() error code
+    being within the table. This could lead to an OK return being given in
+    error.
+
+2.  Changed the call to open a subject file in pcregrep from fopen(pathname,
+    "r") to fopen(pathname, "rb"), which fixed a problem with some of the tests
+    in a Windows environment.
+
+3.  The pcregrep --count option prints the count for each file even when it is
+    zero, as does GNU grep. However, pcregrep was also printing all files when
+    --files-with-matches was added. Now, when both options are given, it prints
+    counts only for those files that have at least one match. (GNU grep just
+    prints the file name in this circumstance, but including the count seems
+    more useful - otherwise, why use --count?) Also ensured that the
+    combination -clh just lists non-zero counts, with no names.
+
+4.  The long form of the pcregrep -F option was incorrectly implemented as
+    --fixed_strings instead of --fixed-strings. This is an incompatible change,
+    but it seems right to fix it, and I didn't think it was worth preserving
+    the old behaviour.
+
+5.  The command line items --regex=pattern and --regexp=pattern were not
+    recognized by pcregrep, which required --regex pattern or --regexp pattern
+    (with a space rather than an '='). The man page documented the '=' forms,
+    which are compatible with GNU grep; these now work.
+
+6.  No libpcreposix.pc file was created for pkg-config; there was just
+    libpcre.pc and libpcrecpp.pc. The omission has been rectified.
+
+7.  Added #ifndef SUPPORT_UCP into the pcre_ucd.c module, to reduce its size
+    when UCP support is not needed, by modifying the Python script that
+    generates it from Unicode data files. This should not matter if the module
+    is correctly used as a library, but I received one complaint about 50K of
+    unwanted data. My guess is that the person linked everything into his
+    program rather than using a library. Anyway, it does no harm.
+
+8.  A pattern such as /\x{123}{2,2}+/8 was incorrectly compiled; the trigger
+    was a minimum greater than 1 for a wide character in a possessive
+    repetition. The same bug could also affect patterns like /(\x{ff}{0,2})*/8
+    which had an unlimited repeat of a nested, fixed maximum repeat of a wide
+    character. Chaos in the form of incorrect output or a compiling loop could
+    result.
+
+9.  The restrictions on what a pattern can contain when partial matching is
+    requested for pcre_exec() have been removed. All patterns can now be
+    partially matched by this function. In addition, if there are at least two
+    slots in the offset vector, the offset of the earliest inspected character
+    for the match and the offset of the end of the subject are set in them when
+    PCRE_ERROR_PARTIAL is returned.
+
+10. Partial matching has been split into two forms: PCRE_PARTIAL_SOFT, which is
+    synonymous with PCRE_PARTIAL, for backwards compatibility, and
+    PCRE_PARTIAL_HARD, which causes a partial match to supersede a full match,
+    and may be more useful for multi-segment matching.
+
+11. Partial matching with pcre_exec() is now more intuitive. A partial match
+    used to be given if ever the end of the subject was reached; now it is
+    given only if matching could not proceed because another character was
+    needed. This makes a difference in some odd cases such as Z(*FAIL) with the
+    string "Z", which now yields "no match" instead of "partial match". In the
+    case of pcre_dfa_exec(), "no match" is given if every matching path for the
+    final character ended with (*FAIL).
+
+12. Restarting a match using pcre_dfa_exec() after a partial match did not work
+    if the pattern had a "must contain" character that was already found in the
+    earlier partial match, unless partial matching was again requested. For
+    example, with the pattern /dog.(body)?/, the "must contain" character is
+    "g". If the first part-match was for the string "dog", restarting with
+    "sbody" failed. This bug has been fixed.
+
+13. The string returned by pcre_dfa_exec() after a partial match has been
+    changed so that it starts at the first inspected character rather than the
+    first character of the match. This makes a difference only if the pattern
+    starts with a lookbehind assertion or \b or \B (\K is not supported by
+    pcre_dfa_exec()). It's an incompatible change, but it makes the two
+    matching functions compatible, and I think it's the right thing to do.
+
+14. Added a pcredemo man page, created automatically from the pcredemo.c file,
+    so that the demonstration program is easily available in environments where
+    PCRE has not been installed from source.
+
+15. Arranged to add -DPCRE_STATIC to cflags in libpcre.pc, libpcreposix.cp,
+    libpcrecpp.pc and pcre-config when PCRE is not compiled as a shared
+    library.
+
+16. Added REG_UNGREEDY to the pcreposix interface, at the request of a user.
+    It maps to PCRE_UNGREEDY. It is not, of course, POSIX-compatible, but it
+    is not the first non-POSIX option to be added. Clearly some people find
+    these options useful.
+
+17. If a caller to the POSIX matching function regexec() passes a non-zero
+    value for nmatch with a NULL value for pmatch, the value of
+    nmatch is forced to zero.
+
+18. RunGrepTest did not have a test for the availability of the -u option of
+    the diff command, as RunTest does. It now checks in the same way as
+    RunTest, and also checks for the -b option.
+
+19. If an odd number of negated classes containing just a single character
+    interposed, within parentheses, between a forward reference to a named
+    subpattern and the definition of the subpattern, compilation crashed with
+    an internal error, complaining that it could not find the referenced
+    subpattern. An example of a crashing pattern is /(?&A)(([^m])(?<A>))/.
+    [The bug was that it was starting one character too far in when skipping
+    over the character class, thus treating the ] as data rather than
+    terminating the class. This meant it could skip too much.]
+
+20. Added PCRE_NOTEMPTY_ATSTART in order to be able to correctly implement the
+    /g option in pcretest when the pattern contains \K, which makes it possible
+    to have an empty string match not at the start, even when the pattern is
+    anchored. Updated pcretest and pcredemo to use this option.
+
+21. If the maximum number of capturing subpatterns in a recursion was greater
+    than the maximum at the outer level, the higher number was returned, but
+    with unset values at the outer level. The correct (outer level) value is
+    now given.
+
+22. If (*ACCEPT) appeared inside capturing parentheses, previous releases of
+    PCRE did not set those parentheses (unlike Perl). I have now found a way to
+    make it do so. The string so far is captured, making this feature
+    compatible with Perl.
+
+23. The tests have been re-organized, adding tests 11 and 12, to make it
+    possible to check the Perl 5.10 features against Perl 5.10.
+
+24. Perl 5.10 allows subroutine calls in lookbehinds, as long as the subroutine
+    pattern matches a fixed length string. PCRE did not allow this; now it
+    does. Neither allows recursion.
+
+25. I finally figured out how to implement a request to provide the minimum
+    length of subject string that was needed in order to match a given pattern.
+    (It was back references and recursion that I had previously got hung up
+    on.) This code has now been added to pcre_study(); it finds a lower bound
+    to the length of subject needed. It is not necessarily the greatest lower
+    bound, but using it to avoid searching strings that are too short does give
+    some useful speed-ups. The value is available to calling programs via
+    pcre_fullinfo().
+
+26. While implementing 25, I discovered to my embarrassment that pcretest had
+    not been passing the result of pcre_study() to pcre_dfa_exec(), so the
+    study optimizations had never been tested with that matching function.
+    Oops. What is worse, even when it was passed study data, there was a bug in
+    pcre_dfa_exec() that meant it never actually used it. Double oops. There
+    were also very few tests of studied patterns with pcre_dfa_exec().
+
+27. If (?| is used to create subpatterns with duplicate numbers, they are now
+    allowed to have the same name, even if PCRE_DUPNAMES is not set. However,
+    on the other side of the coin, they are no longer allowed to have different
+    names, because these cannot be distinguished in PCRE, and this has caused
+    confusion. (This is a difference from Perl.)
+
+28. When duplicate subpattern names are present (necessarily with different
+    numbers, as required by 27 above), and a test is made by name in a
+    conditional pattern, either for a subpattern having been matched, or for
+    recursion in such a pattern, all the associated numbered subpatterns are
+    tested, and the overall condition is true if the condition is true for any
+    one of them. This is the way Perl works, and is also more like the way
+    testing by number works.
+
+
 Version 7.9 11-Apr-09
 ---------------------

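ChangeLog item 25 describes pcre_study() finding a lower bound on the length of subject needed for a match. The following is a minimal sketch of that idea on a toy pattern tree, not the pcre_study() code. The node type, the four node kinds, and the function names min_subject_length and too_short_to_match are all assumptions made for illustration, and the sketch ignores back references and recursion, which the ChangeLog names as the hard cases.

```c
#include <assert.h>
#include <stddef.h>

/* Toy pattern tree; these names are illustrative and are not PCRE types. */
typedef enum { NODE_LITERAL, NODE_CONCAT, NODE_ALT, NODE_OPTIONAL } node_kind;

typedef struct node {
    node_kind kind;
    size_t literal_len;          /* used by NODE_LITERAL only */
    const struct node *left;     /* first operand, or the optional item */
    const struct node *right;    /* second operand of CONCAT and ALT */
} node;

/* Return a lower bound on the length of any subject the tree can match. */
size_t min_subject_length(const node *n)
{
    size_t a, b;
    assert(n != NULL);
    switch (n->kind) {
    case NODE_LITERAL:
        return n->literal_len;
    case NODE_CONCAT:
        /* Both parts must match, so their bounds add up. */
        return min_subject_length(n->left) + min_subject_length(n->right);
    case NODE_ALT:
        /* Either branch may match, so take the smaller bound. */
        a = min_subject_length(n->left);
        b = min_subject_length(n->right);
        return a < b ? a : b;
    case NODE_OPTIONAL:
        /* The item may be absent, so it contributes nothing. */
        return 0;
    }
    return 0;  /* not reached for valid kinds; 0 is always a safe bound */
}

/* A matcher may skip the search when the subject is shorter than the bound. */
int too_short_to_match(size_t subject_len, const node *pattern)
{
    return subject_len < min_subject_length(pattern);
}
```

For a tree that models dog(s|gie)?, the bound is 3, so a two-character subject can be rejected without running the matcher at all.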
@@ -67,22 +67,22 @@ many tests of the mode that might slow it down. So I re-factored the compiling
 functions to work this way. This got rid of about 600 lines of source. It
 should make future maintenance and development easier. As this was such a major
 change, I never released 6.8, instead upping the number to 7.0 (other quite
-major changes are also present in the 7.0 release).
+major changes were also present in the 7.0 release).

-A side effect of this work is that the previous limit of 200 on the nesting
+A side effect of this work was that the previous limit of 200 on the nesting
 depth of parentheses was removed. However, there is a downside: pcre_compile()
 runs more slowly than before (30% or more, depending on the pattern) because it
-is doing a full analysis of the pattern. My hope is that this is not a big
-issue.
+is doing a full analysis of the pattern. My hope was that this would not be a
+big issue, and in the event, nobody has commented on it.

 Traditional matching function
 -----------------------------

 The "traditional", and original, matching function is called pcre_exec(), and
 it implements an NFA algorithm, similar to the original Henry Spencer algorithm
-and the way that Perl works. Not surprising, since it is intended to be as
-compatible with Perl as possible. This is the function most users of PCRE will
-use most of the time.
+and the way that Perl works. This is not surprising, since it is intended to be
+as compatible with Perl as possible. This is the function most users of PCRE
+will use most of the time.

 Supplementary matching function
 -------------------------------
@@ -119,6 +119,7 @@ quantifiers) are always just two bytes long.

 A list of the opcodes follows:

+
 Opcodes with no following data
 ------------------------------

@@ -150,12 +151,12 @@ These items are all just one byte long
   OP_EXTUNI              match an extended Unicode character
   OP_ANYNL               match any Unicode newline sequence

-  OP_ACCEPT              )
-  OP_COMMIT              )
-  OP_FAIL                ) These are Perl 5.10's "backtracking
-  OP_PRUNE               ) control verbs".
-  OP_SKIP                )
-  OP_THEN                )
+  OP_ACCEPT              ) These are Perl 5.10's "backtracking
+  OP_COMMIT              ) control verbs". If OP_ACCEPT is inside
+  OP_FAIL                ) capturing parentheses, it may be preceded
+  OP_PRUNE               ) by one or more OP_CLOSE, followed by a 2-byte
+  OP_SKIP                ) number, indicating which parentheses must be
+  OP_THEN                ) closed.


 Repeating single characters
@@ -372,12 +373,15 @@ These are like other subpatterns, but they start with the opcode OP_COND, or
 OP_SCOND for one that might match an empty string in an unbounded repeat. If
 the condition is a back reference, this is stored at the start of the
 subpattern using the opcode OP_CREF followed by two bytes containing the
-reference number. If the condition is "in recursion" (coded as "(?(R)"), or "in
-recursion of group x" (coded as "(?(Rx)"), the group number is stored at the
-start of the subpattern using the opcode OP_RREF, and a value of zero for "the
-whole pattern". For a DEFINE condition, just the single byte OP_DEF is used (it
-has no associated data). Otherwise, a conditional subpattern always starts with
-one of the assertions.
+reference number. OP_NCREF is used instead if the reference was generated by
+name (so that the runtime code knows to check for duplicate names).
+
+If the condition is "in recursion" (coded as "(?(R)"), or "in recursion of
+group x" (coded as "(?(Rx)"), the group number is stored at the start of the
+subpattern using the opcode OP_RREF or OP_NRREF (cf OP_NCREF), and a value of
+zero for "the whole pattern". For a DEFINE condition, just the single byte
+OP_DEF is used (it has no associated data). Otherwise, a conditional subpattern
+always starts with one of the assertions.


 Recursion
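The text in the hunk above says that OP_CREF is followed by two bytes containing the reference number. A minimal sketch of reading such an item is given here. The byte order (most significant byte first) is an assumption of this sketch and should be checked against PCRE's internal headers; SKETCH_OP_CREF and read_two_byte_number are hypothetical names, and the opcode value is a placeholder rather than the real one.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder opcode value; the real numbering lives in PCRE's internal
   headers and is not reproduced here. */
enum { SKETCH_OP_CREF = 1 };

/* Read the 2-byte number that follows the opcode stored at code[0].
   Assumption: the most significant byte comes first. */
unsigned int read_two_byte_number(const unsigned char *code)
{
    assert(code != NULL);
    return ((unsigned int)code[1] << 8) | (unsigned int)code[2];
}
```

With this layout, a condition that refers to group 258 would occupy three bytes: the opcode, then 0x01, then 0x02.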
@@ -415,4 +419,4 @@ at compile time, and so does not cause anything to be put into the compiled
 data.

 Philip Hazel
-April 2008
+October 2009
@@ -4,7 +4,7 @@ PCRE LICENCE
 PCRE is a library of functions to support regular expressions whose syntax
 and semantics are as close as possible to those of the Perl 5 language.

-Release 7 of PCRE is distributed under the terms of the "BSD" licence, as
+Release 8 of PCRE is distributed under the terms of the "BSD" licence, as
 specified below. The documentation for PCRE, supplied in the "doc"
 directory, is distributed under the same terms as the software itself.

@@ -1,6 +1,21 @@
 News about PCRE releases
 ------------------------

+Release 8.00 19-Oct-09
+----------------------
+
+Bugs have been fixed in the library and in pcregrep. There are also some
+enhancements. Restrictions on patterns used for partial matching have been
+removed, extra information is given for partial matches, the partial matching
+process has been improved, and an option to make a partial match override a
+full match is available. The "study" process has been enhanced by finding a
+lower bound matching length. Groups with duplicate numbers may now have
+duplicated names without the use of PCRE_DUPNAMES. However, they may not have
+different names. The documentation has been revised to reflect these changes.
+The version number has been expanded to 3 digits as it is clear that the rate
+of change is not slowing down.
+
+
 Release 7.9 11-Apr-09
 ---------------------

@@ -12,9 +12,10 @@ This document contains the following sections:
   Comments about Win32 builds
   Building PCRE on Windows with CMake
   Use of relative paths with CMake on Windows
-  Testing with runtest.bat
+  Testing with RunTest.bat
   Building under Windows with BCC5.5
   Building PCRE on OpenVMS
+  Building PCRE on Stratus OpenVOS


 GENERAL
@@ -36,10 +37,10 @@ wrapper functions are a separate issue (see below).

 The PCRE distribution includes a "configure" file for use by the Configure/Make
 build system, as found in many Unix-like environments. There is also support
-support for CMake, which some users prefer, in particular in Windows
-environments. There are some instructions for CMake under Windows in the
-section entitled "Building PCRE with CMake" below. CMake can also be used to
-build PCRE in Unix-like systems.
+support for CMake, which some users prefer, especially in Windows environments.
+There are some instructions for CMake under Windows in the section entitled
+"Building PCRE with CMake" below. CMake can also be used to build PCRE in
+Unix-like systems.


 GENERIC INSTRUCTIONS FOR THE PCRE C LIBRARY
@@ -278,40 +279,42 @@ things in this area in future.

 BUILDING PCRE ON WINDOWS WITH CMAKE

-CMake is an alternative build facility that can be used instead of the
-traditional Unix "configure". CMake version 2.4.7 supports Borland makefiles,
-MinGW makefiles, MSYS makefiles, NMake makefiles, UNIX makefiles, Visual Studio
-6, Visual Studio 7, Visual Studio 8, and Watcom W8. The following instructions
+CMake is an alternative configuration facility that can be used instead of the
+traditional Unix "configure". CMake creates project files (make files, solution
+files, etc.) tailored to numerous development environments, including Visual
+Studio, Borland, Msys, MinGW, NMake, and Unix. The following instructions
 were contributed by a PCRE user.

-1.  Download CMake 2.4.7 or above from http://www.cmake.org/, install and ensure
-    that cmake\bin is on your path.
+1.  Install the latest CMake version available from http://www.cmake.org/, and
+    ensure that cmake\bin is on your path.

 2.  Unzip (retaining folder structure) the PCRE source tree into a source
     directory such as C:\pcre.

-3.  Create a new, empty build directory: C:\pcre\build\
+3.  Create a new, empty build directory, for example C:\pcre\build\

-4.  Run CMakeSetup from the Shell envirornment of your build tool, e.g., Msys
-    for Msys/MinGW or Visual Studio Command Prompt for VC/VC++
+4.  Run cmake-gui from the Shell envirornment of your build tool, for example,
+    Msys for Msys/MinGW or Visual Studio Command Prompt for VC/VC++.

 5.  Enter C:\pcre\pcre-xx and C:\pcre\build for the source and build
-    directories, respectively
+    directories, respectively.

 6.  Hit the "Configure" button.

-7.  Select the particular IDE / build tool that you are using (Visual Studio,
-    MSYS makefiles, MinGW makefiles, etc.)
+7.  Select the particular IDE / build tool that you are using (Visual
+    Studio, MSYS makefiles, MinGW makefiles, etc.)

-8.  The GUI will then list several configuration options. This is where you can
-    enable UTF-8 support, etc.
+8.  The GUI will then list several configuration options. This is where
+    you can enable UTF-8 support or other PCRE optional features.

-9.  Hit "Configure" again. The adjacent "OK" button should now be active.
+9.  Hit "Configure" again. The adjacent "Generate" button should now be
+    active.

-10. Hit "OK".
+10. Hit "Generate".

 11. The build directory should now contain a usable build system, be it a
-    solution file for Visual Studio, makefiles for MinGW, etc.
+    solution file for Visual Studio, makefiles for MinGW, etc. Exit from
+    cmake-gui and use the generated build system with your compiler or IDE.


 USE OF RELATIVE PATHS WITH CMAKE ON WINDOWS
@@ -444,5 +447,52 @@ $! Locale could not be set to fr
 $!
 =========================

-Last Updated: 17 March 2009
+
+BUILDING PCRE ON STRATUS OPENVOS
+
+These notes on the port of PCRE to VOS (lightly edited) were supplied by
+Ashutosh Warikoo, whose email address has the local part awarikoo and the
+domain nse.co.in. The port was for version 7.9 in August 2009.
+
+1.   Building PCRE
+
+I built pcre on OpenVOS Release 17.0.1at using GNU Tools 3.4a without any
+problems. I used the following packages to build PCRE:
+
+    ftp://ftp.stratus.com/pub/vos/posix/ga/posix.save.evf.gz
+
+Please read and follow the instructions that come with these packages. To start
+the build of pcre, from the root of the package type:
+
+    ./build.sh
+
+2.   Installing PCRE
+
+Once you have successfully built PCRE, login to the SysAdmin group, switch to
+the root user, and type
+
+    [ !create_dir (master_disk)>usr   --if needed ]
+    [ !create_dir (master_disk)>usr>local   --if needed ]
+    !gmake install
+
+This installs PCRE and its man pages into /usr/local. You can add
+(master_disk)>usr>local>bin to your command search paths, or if you are in
+BASH, add /usr/local/bin to the PATH environment variable.
+
+4.   Restrictions
+
+This port requires readline library optionally. However during the build I
+faced some yet unexplored errors while linking with readline. As it was an
+optional component I chose to disable it.
+
+5.   Known Problems
+
+I ran a the test suite, but you will have to be your own judge of whether this
+command, and this port, suits your purposes. If you find any problems that
+appear to be related to the port itself, please let me know. Please see the
+build.log file in the root of the package also.
+
+
+=========================
+Last Updated: 05 October 2009
 ****
@@ -24,6 +24,7 @@ The contents of this README file are:
   Shared libraries on Unix-like systems
   Cross-compiling on Unix-like systems
   Using HP's ANSI C++ compiler (aCC)
+  Using PCRE from MySQL
   Making new tarballs
   Testing PCRE
   Character tables
@@ -111,8 +112,8 @@ Building PCRE on non-Unix systems
 For a non-Unix system, please read the comments in the file NON-UNIX-USE,
 though if your system supports the use of "configure" and "make" you may be
 able to build PCRE in the same way as for Unix-like systems. PCRE can also be
-configured in many platform environments using the GUI facility of CMake's
-CMakeSetup. It creates Makefiles, solution files, etc.
+configured in many platform environments using the GUI facility provided by
+CMake's cmake-gui command. This creates Makefiles, solution files, etc.

 PCRE has been compiled on many different operating systems. It should be
 straightforward to build PCRE on any system that has a Standard C compiler and
@@ -478,6 +479,26 @@ running the "configure" script:
   CXXLDFLAGS="-lstd_v2 -lCsup_v2"


+Using Sun's compilers for Solaris
+---------------------------------
+
+A user reports that the following configurations work on Solaris 9 sparcv9 and
+Solaris 9 x86 (32-bit):
+
+  Solaris 9 sparcv9: ./configure --disable-cpp CC=/bin/cc CFLAGS="-m64 -g"
+  Solaris 9 x86:     ./configure --disable-cpp CC=/bin/cc CFLAGS="-g"
+
+
+Using PCRE from MySQL
+---------------------
+
+On systems where both PCRE and MySQL are installed, it is possible to make use
+of PCRE from within MySQL, as an alternative to the built-in pattern matching.
+There is a web page that tells you how to do this:
+
+  http://www.mysqludf.org/lib_mysqludf_preg/index.php
+
+
 Making new tarballs
 -------------------

@@ -553,22 +574,32 @@ document entitled NON-UNIX-USE.]

 The fourth test checks the UTF-8 support. It is not run automatically unless
 PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
-running "configure". This file can be also fed directly to the perltest script,
-provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
-commented in the script, can be be used.)
+running "configure". This file can be also fed directly to the perltest.pl
+script, provided you are running Perl 5.8 or higher.

 The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
 features of PCRE that are not relevant to Perl.

-The sixth test checks the support for Unicode character properties. It it not
-run automatically unless PCRE is built with Unicode property support. To to
-this you must set --enable-unicode-properties when running "configure".
+The sixth test (which is Perl-5.10 compatible) checks the support for Unicode
+character properties. It it not run automatically unless PCRE is built with
+Unicode property support. To to this you must set --enable-unicode-properties
+when running "configure".

 The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
 matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
 property support, respectively. The eighth and ninth tests are not run
 automatically unless PCRE is build with the relevant support.

+The tenth test checks some internal offsets and code size features; it is run
+only when the default "link size" of 2 is set (in other cases the sizes
+change).
+
+The eleventh test checks out features that are new in Perl 5.10, and the
+twelfth test checks a number internals and non-Perl features concerned with
+Unicode property support. It it not run automatically unless PCRE is built with
+Unicode property support. To to this you must set --enable-unicode-properties
+when running "configure".
+

 Character tables
 ----------------
@@ -712,7 +743,7 @@ The distribution should contain the following files:
                           )   "configure" and config.h
   depcomp                 ) script to find program dependencies, generated by
                           )   automake
-  doc/*.3                 man page sources for the PCRE functions
+  doc/*.3                 man page sources for PCRE
   doc/*.1                 man page sources for pcregrep and pcretest
   doc/index.html.src      the base HTML page
   doc/html/*              HTML documentation
@@ -721,6 +752,7 @@ The distribution should contain the following files:
   doc/perltest.txt        plain text documentation of Perl test program
   install-sh              a shell script for installing files
   libpcre.pc.in           template for libpcre.pc for pkg-config
+  libpcreposix.pc.in      template for libpcreposix.pc for pkg-config
   libpcrecpp.pc.in        template for libpcrecpp.pc for pkg-config
   ltmain.sh               file used to build a libtool script
   missing                 ) common stub for a few missing GNU programs while
@@ -764,4 +796,4 @@ The distribution should contain the following files:
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 21 March 2009
+Last updated: 19 October 2009
@@ -196,6 +196,12 @@ them both to 0; an emulation function will be used. */
 #define LINK_SIZE 2
 #endif

+/* Define to the sub-directory in which libtool stores uninstalled libraries.
+   */
+#ifndef LT_OBJDIR
+#define LT_OBJDIR ".libs/"
+#endif
+
 /* The value of MATCH_LIMIT determines the default number of times the
    internal match() function can be called during a single execution of
    pcre_exec(). There is a runtime interface for setting a different limit.
@@ -262,13 +268,13 @@ them both to 0; an emulation function will be used. */
 #define PACKAGE_NAME "PCRE"

 /* Define to the full name and version of this package. */
-#define PACKAGE_STRING "PCRE 7.9"
+#define PACKAGE_STRING "PCRE 8.00"

 /* Define to the one symbol short name of this package. */
 #define PACKAGE_TARNAME "pcre"

 /* Define to the version of this package. */
-#define PACKAGE_VERSION "7.9"
+#define PACKAGE_VERSION "8.00"


 /* If you are compiling for a system other than a Unix-like system or
@@ -324,7 +330,7 @@ them both to 0; an emulation function will be used. */

 /* Version number of package */
 #ifndef VERSION
-#define VERSION "7.9"
+#define VERSION "8.00"
 #endif

 /* Define to empty if `const' does not conform to ANSI C. */
File diff suppressed because it is too large
@@ -41,10 +41,10 @@ POSSIBILITY OF SUCH DAMAGE.

 /* The current PCRE version information. */

-#define PCRE_MAJOR          7
-#define PCRE_MINOR          9
+#define PCRE_MAJOR          8
+#define PCRE_MINOR          00
 #define PCRE_PRERELEASE
-#define PCRE_DATE           2009-04-11
+#define PCRE_DATE           2009-10-19

 /* When an application links to a PCRE DLL in Windows, the symbols that are
 imported have to be identified as such. When building PCRE, the appropriate
@@ -113,7 +113,8 @@ both, so we keep them all distinct. */
 #define PCRE_NO_AUTO_CAPTURE    0x00001000
 #define PCRE_NO_UTF8_CHECK      0x00002000
 #define PCRE_AUTO_CALLOUT       0x00004000
-#define PCRE_PARTIAL            0x00008000
+#define PCRE_PARTIAL_SOFT       0x00008000
+#define PCRE_PARTIAL            0x00008000  /* Backwards compatible synonym */
 #define PCRE_DFA_SHORTEST       0x00010000
 #define PCRE_DFA_RESTART        0x00020000
 #define PCRE_FIRSTLINE          0x00040000
@@ -128,6 +129,8 @@ both, so we keep them all distinct. */
 #define PCRE_JAVASCRIPT_COMPAT  0x02000000
 #define PCRE_NO_START_OPTIMIZE  0x04000000
 #define PCRE_NO_START_OPTIMISE  0x04000000
+#define PCRE_PARTIAL_HARD       0x08000000
+#define PCRE_NOTEMPTY_ATSTART   0x10000000

 /* Exec-time and get/set-time error codes */

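The two pcre.h hunks above rename the 0x00008000 bit to PCRE_PARTIAL_SOFT, keep PCRE_PARTIAL as a synonym, and add two new bits. The sketch below shows how an options word carrying these bits might be interpreted. It is not taken from pcre_exec(): the SKETCH_ names, the two helper functions, and the rule that the hard bit wins when both partial bits are set are assumptions of this example. Only the numeric values come from the hunks.

```c
#include <assert.h>

/* Values copied from the pcre.h hunks above. The SKETCH_ prefix keeps
   these macros from clashing with a real pcre.h. */
#define SKETCH_PARTIAL_SOFT      0x00008000UL
#define SKETCH_PARTIAL           0x00008000UL  /* synonym for SOFT */
#define SKETCH_PARTIAL_HARD      0x08000000UL
#define SKETCH_NOTEMPTY_ATSTART  0x10000000UL

/* True when exactly one bit is set, so the flag can be OR-ed with others. */
int is_single_bit(unsigned long flag)
{
    return flag != 0 && (flag & (flag - 1)) == 0;
}

/* Classify the partial-matching request in an options word.
   Returns 0 for none, 1 for soft, 2 for hard. Letting HARD take
   precedence when both bits are set is an assumption of this sketch. */
int partial_mode(unsigned long options)
{
    /* The new flags must be usable independently of each other. */
    assert(is_single_bit(SKETCH_PARTIAL_HARD));
    assert(is_single_bit(SKETCH_PARTIAL_SOFT));
    if (options & SKETCH_PARTIAL_HARD) return 2;
    if (options & SKETCH_PARTIAL_SOFT) return 1;
    return 0;
}
```

Because the old name keeps the old value, code written against PCRE_PARTIAL is classified as a soft request without any change.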
@@ -174,6 +177,7 @@ both, so we keep them all distinct. */
 #define PCRE_INFO_OKPARTIAL         12
 #define PCRE_INFO_JCHANGED          13
 #define PCRE_INFO_HASCRORLF         14
+#define PCRE_INFO_MINLENGTH         15

 /* Request types for pcre_config(). Do not re-arrange, in order to remain
 compatible. */
@@ -339,7 +339,9 @@ static const char error_texts[] =
   "number is too big\0"
   "subpattern name expected\0"
   "digit expected after (?+\0"
-  "] is an invalid data character in JavaScript compatibility mode";
+  "] is an invalid data character in JavaScript compatibility mode\0"
+  /* 65 */
+  "different names for subpatterns of the same number are not allowed";


 /* Table to identify digits and hex digits. This is used when compiling
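The hunk above appends a message to error_texts, which is one string literal with "\0" between entries. That is why the previously last entry gains a trailing "\0": without it the two messages would run together as a single string. A minimal sketch of looking up the n-th message in such a table follows. The names sketch_texts and nth_text are hypothetical, the table is a three-entry excerpt, and this is not claimed to be how pcre_compile.c performs the lookup.

```c
#include <assert.h>
#include <string.h>

/* A three-entry excerpt in the same layout as error_texts: one string
   literal with "\0" between messages. The final entry relies on the
   literal's own terminating zero. */
static const char sketch_texts[] =
  "digit expected after (?+\0"
  "] is an invalid data character in JavaScript compatibility mode\0"
  "different names for subpatterns of the same number are not allowed";

/* Return the n-th message (counting from 0) by stepping over n
   zero-terminated strings. The caller must keep n within the table. */
const char *nth_text(const char *table, int n)
{
    assert(table != NULL && n >= 0);
    while (n-- > 0)
        table += strlen(table) + 1;
    return table;
}
```

The lookup cost grows with n, which is acceptable for error reporting, and the single literal avoids a separate array of pointers.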
@@ -1098,6 +1100,7 @@ if (ptr[0] == CHAR_LEFT_PARENTHESIS)
       if (name != NULL && lorn == ptr - thisname &&
           strncmp((const char *)name, (const char *)thisname, lorn) == 0)
         return *count;
+      term++;
       }
     }
   }
@@ -1132,19 +1135,21 @@ for (; *ptr != 0; ptr++)
     BOOL negate_class = FALSE;
     for (;;)
       {
-      int c = *(++ptr);
-      if (c == CHAR_BACKSLASH)
+      if (ptr[1] == CHAR_BACKSLASH)
         {
-        if (ptr[1] == CHAR_E)
-          ptr++;
-        else if (strncmp((const char *)ptr+1,
+        if (ptr[2] == CHAR_E)
+          ptr+= 2;
+        else if (strncmp((const char *)ptr+2,
                  STR_Q STR_BACKSLASH STR_E, 3) == 0)
-          ptr += 3;
+          ptr += 4;
         else
           break;
         }
-      else if (!negate_class && c == CHAR_CIRCUMFLEX_ACCENT)
+      else if (!negate_class && ptr[1] == CHAR_CIRCUMFLEX_ACCENT)
+        {
         negate_class = TRUE;
+        ptr++;
+        }
       else break;
       }

@ -1310,7 +1315,9 @@ for (;;)
|
||||
|
||||
case OP_CALLOUT:
|
||||
case OP_CREF:
|
||||
case OP_NCREF:
|
||||
case OP_RREF:
|
||||
case OP_NRREF:
|
||||
case OP_DEF:
|
||||
code += _pcre_OP_lengths[*code];
|
||||
break;
|
||||
@ -1326,23 +1333,34 @@ for (;;)
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Find the fixed length of a pattern *
|
||||
* Find the fixed length of a branch *
|
||||
*************************************************/
|
||||
|
||||
/* Scan a pattern and compute the fixed length of subject that will match it,
|
||||
/* Scan a branch and compute the fixed length of subject that will match it,
|
||||
if the length is fixed. This is needed for dealing with backward assertions.
|
||||
In UTF8 mode, the result is in characters rather than bytes.
|
||||
In UTF8 mode, the result is in characters rather than bytes. The branch is
|
||||
temporarily terminated with OP_END when this function is called.
|
||||
|
||||
This function is called when a backward assertion is encountered, so that if it
|
||||
fails, the error message can point to the correct place in the pattern.
|
||||
However, we cannot do this when the assertion contains subroutine calls,
|
||||
because they can be forward references. We solve this by remembering this case
|
||||
and doing the check at the end; a flag specifies which mode we are running in.
|
||||
|
||||
Arguments:
|
||||
code points to the start of the pattern (the bracket)
|
||||
options the compiling options
|
||||
atend TRUE if called when the pattern is complete
|
||||
cd the "compile data" structure
|
||||
|
||||
Returns: the fixed length, or -1 if there is no fixed length,
|
||||
Returns: the fixed length,
|
||||
or -1 if there is no fixed length,
|
||||
or -2 if \C was encountered
|
||||
or -3 if an OP_RECURSE item was encountered and atend is FALSE
|
||||
*/
|
||||
|
||||
static int
|
||||
find_fixedlength(uschar *code, int options)
|
||||
find_fixedlength(uschar *code, int options, BOOL atend, compile_data *cd)
|
||||
{
|
||||
int length = -1;
|
||||
|
||||
@ -1355,6 +1373,7 @@ branch, check the length against that of the other branches. */
|
||||
for (;;)
|
||||
{
|
||||
int d;
|
||||
uschar *ce, *cs;
|
||||
register int op = *cc;
|
||||
switch (op)
|
||||
{
|
||||
@ -1362,7 +1381,7 @@ for (;;)
|
||||
case OP_BRA:
|
||||
case OP_ONCE:
|
||||
case OP_COND:
|
||||
d = find_fixedlength(cc + ((op == OP_CBRA)? 2:0), options);
|
||||
d = find_fixedlength(cc + ((op == OP_CBRA)? 2:0), options, atend, cd);
|
||||
if (d < 0) return d;
|
||||
branchlength += d;
|
||||
do cc += GET(cc, 1); while (*cc == OP_ALT);
|
||||
@ -1385,6 +1404,21 @@ for (;;)
|
||||
branchlength = 0;
|
||||
break;
|
||||
|
||||
/* A true recursion implies not fixed length, but a subroutine call may
|
||||
be OK. If the subroutine is a forward reference, we can't deal with
|
||||
it until the end of the pattern, so return -3. */
|
||||
|
||||
case OP_RECURSE:
|
||||
if (!atend) return -3;
|
||||
cs = ce = (uschar *)cd->start_code + GET(cc, 1); /* Start subpattern */
|
||||
do ce += GET(ce, 1); while (*ce == OP_ALT); /* End subpattern */
|
||||
if (cc > cs && cc < ce) return -1; /* Recursion */
|
||||
d = find_fixedlength(cs + 2, options, atend, cd);
|
||||
if (d < 0) return d;
|
||||
branchlength += d;
|
||||
cc += 1 + LINK_SIZE;
|
||||
break;
|
||||
|
||||
/* Skip over assertive subpatterns */
|
||||
|
||||
case OP_ASSERT:
|
||||
@ -1398,7 +1432,9 @@ for (;;)
|
||||
|
||||
case OP_REVERSE:
|
||||
case OP_CREF:
|
||||
case OP_NCREF:
|
||||
case OP_RREF:
|
||||
case OP_NRREF:
|
||||
case OP_DEF:
|
||||
case OP_OPT:
|
||||
case OP_CALLOUT:
|
||||
@ -1421,10 +1457,8 @@ for (;;)
|
||||
branchlength++;
|
||||
cc += 2;
|
||||
#ifdef SUPPORT_UTF8
|
||||
if ((options & PCRE_UTF8) != 0)
|
||||
{
|
||||
while ((*cc & 0xc0) == 0x80) cc++;
|
||||
}
|
||||
if ((options & PCRE_UTF8) != 0 && cc[-1] >= 0xc0)
|
||||
cc += _pcre_utf8_table4[cc[-1] & 0x3f];
|
||||
#endif
|
||||
break;
|
||||
|
||||
@ -1435,10 +1469,8 @@ for (;;)
|
||||
branchlength += GET2(cc,1);
|
||||
cc += 4;
|
||||
#ifdef SUPPORT_UTF8
|
||||
if ((options & PCRE_UTF8) != 0)
|
||||
{
|
||||
while((*cc & 0x80) == 0x80) cc++;
|
||||
}
|
||||
if ((options & PCRE_UTF8) != 0 && cc[-1] >= 0xc0)
|
||||
cc += _pcre_utf8_table4[cc[-1] & 0x3f];
|
||||
#endif
|
||||
break;
|
||||
|
||||
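The two UTF-8 hunks above replace a scan over continuation bytes with a single jump whose size is derived from the lead byte. The sketch below illustrates that idea under stated assumptions: `utf8_extra_bytes` is a hypothetical stand-in that computes what the diff looks up in `_pcre_utf8_table4`, and `skip_char` is an illustrative helper, not a PCRE function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the table lookup: number of bytes that follow
   a UTF-8 lead byte. Bytes below 0xc0 never start a multibyte character. */
int utf8_extra_bytes(unsigned char lead)
{
  if (lead < 0xc0) return 0;   /* ASCII or a continuation byte */
  if (lead < 0xe0) return 1;   /* 110xxxxx */
  if (lead < 0xf0) return 2;   /* 1110xxxx */
  if (lead < 0xf8) return 3;   /* 11110xxx */
  if (lead < 0xfc) return 4;   /* 111110xx, legacy 5-byte form */
  return 5;                    /* 1111110x, legacy 6-byte form */
}

/* Advance past the single character whose first byte is at p. */
const unsigned char *skip_char(const unsigned char *p, int utf8)
{
  assert(p != NULL);
  p++;                                   /* step over the first byte */
  if (utf8 && p[-1] >= 0xc0)             /* same test as in the diff */
    p += utf8_extra_bytes(p[-1]);
  return p;
}
```

Jumping by a length taken from the lead byte consumes exactly one character, whereas a loop over continuation bytes depends on what happens to follow the character in the compiled code.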
@ -1517,22 +1549,25 @@ for (;;)
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Scan compiled regex for numbered bracket *
|
||||
* Scan compiled regex for specific bracket *
|
||||
*************************************************/
|
||||
|
||||
/* This little function scans through a compiled pattern until it finds a
|
||||
capturing bracket with the given number.
|
||||
capturing bracket with the given number, or, if the number is negative, an
|
||||
instance of OP_REVERSE for a lookbehind. The function is global in the C sense
|
||||
so that it can be called from pcre_study() when finding the minimum matching
|
||||
length.
|
||||
|
||||
Arguments:
|
||||
code points to start of expression
|
||||
utf8 TRUE in UTF-8 mode
|
||||
number the required bracket number
|
||||
number the required bracket number or negative to find a lookbehind
|
||||
|
||||
Returns: pointer to the opcode for the bracket, or NULL if not found
|
||||
*/
|
||||
|
||||
static const uschar *
|
||||
find_bracket(const uschar *code, BOOL utf8, int number)
|
||||
const uschar *
|
||||
_pcre_find_bracket(const uschar *code, BOOL utf8, int number)
|
||||
{
|
||||
for (;;)
|
||||
{
|
||||
@ -1545,6 +1580,14 @@ for (;;)
|
||||
|
||||
if (c == OP_XCLASS) code += GET(code, 1);
|
||||
|
||||
/* Handle lookbehind */
|
||||
|
||||
else if (c == OP_REVERSE)
|
||||
{
|
||||
if (number < 0) return (uschar *)code;
|
||||
code += _pcre_OP_lengths[c];
|
||||
}
|
||||
|
||||
/* Handle capturing bracket */
|
||||
|
||||
else if (c == OP_CBRA)
|
||||
@ -1910,10 +1953,13 @@ for (code = first_significant_code(code + _pcre_OP_lengths[*code], NULL, 0, TRUE
|
||||
case OP_QUERY:
|
||||
case OP_MINQUERY:
|
||||
case OP_POSQUERY:
|
||||
if (utf8 && code[1] >= 0xc0) code += _pcre_utf8_table4[code[1] & 0x3f];
|
||||
break;
|
||||
|
||||
case OP_UPTO:
|
||||
case OP_MINUPTO:
|
||||
case OP_POSUPTO:
|
||||
if (utf8) while ((code[2] & 0xc0) == 0x80) code++;
|
||||
if (utf8 && code[3] >= 0xc0) code += _pcre_utf8_table4[code[3] & 0x3f];
|
||||
break;
|
||||
#endif
|
||||
}
|
||||
@ -3867,10 +3913,15 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
|
||||
|
||||
if (repeat_max == 0) goto END_REPEAT;
|
||||
|
||||
/*--------------------------------------------------------------------*/
|
||||
/* This code is obsolete from release 8.00; the restriction was finally
|
||||
removed: */
|
||||
|
||||
/* All real repeats make it impossible to handle partial matching (maybe
|
||||
one day we will be able to remove this restriction). */
|
||||
|
||||
if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL;
|
||||
/* if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL; */
|
||||
/*--------------------------------------------------------------------*/
|
||||
|
||||
/* Combine the op_type with the repeat_type */
|
||||
|
||||
@ -4017,10 +4068,15 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
|
||||
goto END_REPEAT;
|
||||
}
|
||||
|
||||
/*--------------------------------------------------------------------*/
|
||||
/* This code is obsolete from release 8.00; the restriction was finally
|
||||
removed: */
|
||||
|
||||
/* All real repeats make it impossible to handle partial matching (maybe
|
||||
one day we will be able to remove this restriction). */
|
||||
|
||||
if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL;
|
||||
/* if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL; */
|
||||
/*--------------------------------------------------------------------*/
|
||||
|
||||
if (repeat_min == 0 && repeat_max == -1)
|
||||
*code++ = OP_CRSTAR + repeat_type;
|
||||
@ -4335,11 +4391,20 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
|
||||
if (possessive_quantifier)
|
||||
{
|
||||
int len;
|
||||
if (*tempcode == OP_EXACT || *tempcode == OP_TYPEEXACT ||
|
||||
*tempcode == OP_NOTEXACT)
|
||||
|
||||
if (*tempcode == OP_TYPEEXACT)
|
||||
tempcode += _pcre_OP_lengths[*tempcode] +
|
||||
((*tempcode == OP_TYPEEXACT &&
|
||||
(tempcode[3] == OP_PROP || tempcode[3] == OP_NOTPROP))? 2:0);
|
||||
((tempcode[3] == OP_PROP || tempcode[3] == OP_NOTPROP)? 2 : 0);
|
||||
|
||||
else if (*tempcode == OP_EXACT || *tempcode == OP_NOTEXACT)
|
||||
{
|
||||
tempcode += _pcre_OP_lengths[*tempcode];
|
||||
#ifdef SUPPORT_UTF8
|
||||
if (utf8 && tempcode[-1] >= 0xc0)
|
||||
tempcode += _pcre_utf8_table4[tempcode[-1] & 0x3f];
|
||||
#endif
|
||||
}
|
||||
|
||||
len = code - tempcode;
|
||||
if (len > 0) switch (*tempcode)
|
||||
{
|
||||
@ -4417,8 +4482,19 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
|
||||
if (namelen == verbs[i].len &&
|
||||
strncmp((char *)name, vn, namelen) == 0)
|
||||
{
|
||||
*code = verbs[i].op;
|
||||
if (*code++ == OP_ACCEPT) cd->had_accept = TRUE;
|
||||
/* Check for open captures before ACCEPT */
|
||||
|
||||
if (verbs[i].op == OP_ACCEPT)
|
||||
{
|
||||
open_capitem *oc;
|
||||
cd->had_accept = TRUE;
|
||||
for (oc = cd->open_caps; oc != NULL; oc = oc->next)
|
||||
{
|
||||
*code++ = OP_CLOSE;
|
||||
PUT2INC(code, 0, oc->number);
|
||||
}
|
||||
}
|
||||
*code++ = verbs[i].op;
|
||||
break;
|
||||
}
|
||||
vn += verbs[i].len + 1;
|
||||
@ -4580,7 +4656,10 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
|
||||
}
|
||||
|
||||
/* Otherwise (did not start with "+" or "-"), start by looking for the
|
||||
name. */
|
||||
name. If we find a name, add one to the opcode to change OP_CREF or
|
||||
OP_RREF into OP_NCREF or OP_NRREF. These behave exactly the same,
|
||||
except they record that the reference was originally to a name. The
|
||||
information is used to check duplicate names. */
|
||||
|
||||
slot = cd->name_table;
|
||||
for (i = 0; i < cd->names_found; i++)
|
||||
@ -4595,6 +4674,7 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
|
||||
{
|
||||
recno = GET2(slot, 0);
|
||||
PUT2(code, 2+LINK_SIZE, recno);
|
||||
code[1+LINK_SIZE]++;
|
||||
}
|
||||
|
||||
/* Search the pattern for a forward reference */
|
||||
@ -4603,6 +4683,7 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
|
||||
(options & PCRE_EXTENDED) != 0)) > 0)
|
||||
{
|
||||
PUT2(code, 2+LINK_SIZE, i);
|
||||
code[1+LINK_SIZE]++;
|
||||
}
|
||||
|
||||
/* If terminator == 0 it means that the name followed directly after
|
||||
@ -4795,11 +4876,24 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
|
||||
}
|
||||
}
|
||||
|
||||
/* In the real compile, create the entry in the table */
|
||||
/* In the real compile, create the entry in the table, maintaining
|
||||
alphabetical order. Duplicate names for different numbers are
|
||||
permitted only if PCRE_DUPNAMES is set. Duplicate names for the same
|
||||
number are always OK. (An existing number can be re-used if (?|
|
||||
appears in the pattern.) In either event, a duplicate name results in
|
||||
a duplicate entry in the table, even if the number is the same. This
|
||||
is because the number of names, and hence the table size, is computed
|
||||
in the pre-compile, and it affects various numbers and pointers which
|
||||
would all have to be modified, and the compiled code moved down, if
|
||||
duplicates with the same number were omitted from the table. This
|
||||
doesn't seem worth the hassle. However, *different* names for the
|
||||
same number are not permitted. */
|
||||
|
||||
else
|
||||
{
|
||||
BOOL dupname = FALSE;
|
||||
slot = cd->name_table;
|
||||
|
||||
for (i = 0; i < cd->names_found; i++)
|
||||
{
|
||||
int crc = memcmp(name, slot+2, namelen);
|
||||
@ -4807,33 +4901,66 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
|
||||
{
|
||||
if (slot[2+namelen] == 0)
|
||||
{
|
||||
if ((options & PCRE_DUPNAMES) == 0)
|
||||
if (GET2(slot, 0) != cd->bracount + 1 &&
|
||||
(options & PCRE_DUPNAMES) == 0)
|
||||
{
|
||||
*errorcodeptr = ERR43;
|
||||
goto FAILED;
|
||||
}
|
||||
else dupname = TRUE;
|
||||
}
|
||||
else crc = -1; /* Current name is substring */
|
||||
else crc = -1; /* Current name is a substring */
|
||||
}
|
||||
|
||||
/* Make space in the table and break the loop for an earlier
|
||||
name. For a duplicate or later name, carry on. We do this for
|
||||
duplicates so that in the simple case (when (?| is not used) they
|
||||
are in order of their numbers. */
|
||||
|
||||
if (crc < 0)
|
||||
{
|
||||
memmove(slot + cd->name_entry_size, slot,
|
||||
(cd->names_found - i) * cd->name_entry_size);
|
||||
break;
|
||||
}
|
||||
|
||||
/* Continue the loop for a later or duplicate name */
|
||||
|
||||
slot += cd->name_entry_size;
|
||||
}
|
||||
|
||||
/* For non-duplicate names, check for a duplicate number before
|
||||
adding the new name. */
|
||||
|
||||
if (!dupname)
|
||||
{
|
||||
uschar *cslot = cd->name_table;
|
||||
for (i = 0; i < cd->names_found; i++)
|
||||
{
|
||||
if (cslot != slot)
|
||||
{
|
||||
if (GET2(cslot, 0) == cd->bracount + 1)
|
||||
{
|
||||
*errorcodeptr = ERR65;
|
||||
goto FAILED;
|
||||
}
|
||||
}
|
||||
else i--;
|
||||
cslot += cd->name_entry_size;
|
||||
}
|
||||
}
|
||||
|
||||
PUT2(slot, 0, cd->bracount + 1);
|
||||
memcpy(slot + 2, name, namelen);
|
||||
slot[2+namelen] = 0;
|
||||
}
|
||||
}
|
||||
|
||||
/* In both cases, count the number of names we've encountered. */
|
||||
/* In both pre-compile and compile, count the number of names we've
|
||||
encountered. */
|
||||
|
||||
ptr++; /* Move past > or ' */
|
||||
cd->names_found++;
|
||||
ptr++; /* Move past > or ' */
|
||||
goto NUMBERED_GROUP;
|
||||
|
||||
|
||||
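The name-table hunk above keeps fixed-size entries in alphabetical order, opening a gap with memmove() when the new name sorts before an existing one and stepping past later or duplicate names. A minimal sketch of that insertion follows. ENTRY_SIZE, `insert_name`, and the returned index are assumptions made for illustration; the duplicate-name and duplicate-number error checks (ERR43, ERR65) are deliberately left out.

```c
#include <assert.h>
#include <string.h>

#define ENTRY_SIZE 16   /* hypothetical: 2 number bytes + name + NUL */

/* Insert (name, number) into a table holding `count` sorted entries. The
   caller guarantees room for one more entry. Returns the index used. */
int insert_name(unsigned char *table, int count,
                const char *name, int namelen, int number)
{
  unsigned char *slot = table;
  int i;

  assert(namelen > 0 && namelen <= ENTRY_SIZE - 3);

  for (i = 0; i < count; i++)
    {
    int crc = memcmp(name, slot + 2, (size_t)namelen);
    if (crc == 0 && slot[2 + namelen] != 0)
      crc = -1;                       /* new name is a proper prefix */
    if (crc < 0)
      {
      /* Earlier name: shift the tail of the table up by one entry. */
      memmove(slot + ENTRY_SIZE, slot, (size_t)(count - i) * ENTRY_SIZE);
      break;
      }
    slot += ENTRY_SIZE;               /* later or duplicate name: carry on */
    }

  slot[0] = (unsigned char)(number >> 8);    /* 2-byte group number */
  slot[1] = (unsigned char)(number & 255);
  memcpy(slot + 2, name, (size_t)namelen);
  slot[2 + namelen] = 0;
  return i;
}
```

As in the hunk, a duplicate name is placed after the existing entry rather than replacing it, so duplicates stay in the order in which they were added.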
@ -5002,7 +5129,8 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
|
||||
if (lengthptr == NULL)
|
||||
{
|
||||
*code = OP_END;
|
||||
if (recno != 0) called = find_bracket(cd->start_code, utf8, recno);
|
||||
if (recno != 0)
|
||||
called = _pcre_find_bracket(cd->start_code, utf8, recno);
|
||||
|
||||
/* Forward reference */
|
||||
|
||||
@ -5646,6 +5774,8 @@ uschar *code = *codeptr;
|
||||
uschar *last_branch = code;
|
||||
uschar *start_bracket = code;
|
||||
uschar *reverse_count = NULL;
|
||||
open_capitem capitem;
|
||||
int capnumber = 0;
|
||||
int firstbyte, reqbyte;
|
||||
int branchfirstbyte, branchreqbyte;
|
||||
int length;
|
||||
@ -5672,6 +5802,17 @@ the code that abstracts option settings at the start of the pattern and makes
|
||||
them global. It tests the value of length for (2 + 2*LINK_SIZE) in the
|
||||
pre-compile phase to find out whether anything has yet been compiled or not. */
|
||||
|
||||
/* If this is a capturing subpattern, add to the chain of open capturing items
|
||||
so that we can detect them if (*ACCEPT) is encountered. */
|
||||
|
||||
if (*code == OP_CBRA)
|
||||
{
|
||||
capnumber = GET2(code, 1 + LINK_SIZE);
|
||||
capitem.number = capnumber;
|
||||
capitem.next = cd->open_caps;
|
||||
cd->open_caps = &capitem;
|
||||
}
|
||||
|
||||
/* Offset is set zero to mark that this bracket is still open */
|
||||
|
||||
PUT(code, 1, 0);
|
||||
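The hunk above links a stack-allocated item into cd->open_caps on entry to a capturing group, so that (*ACCEPT) can emit one OP_CLOSE per group that is still open. The sketch below models only that chain. `mini_cd`, `emit_closes`, and `compile_group` are hypothetical names, and the struct mirrors open_capitem from this commit with the integer type simplified.

```c
#include <assert.h>
#include <stddef.h>

typedef struct open_capitem {
  struct open_capitem *next;   /* Chain link */
  unsigned short number;       /* Capture number */
} open_capitem;

/* Hypothetical miniature of the compile state. */
typedef struct {
  open_capitem *open_caps;     /* innermost open capture first */
  int closes[8];               /* numbers that would get an OP_CLOSE */
  int nclose;
} mini_cd;

/* What the (*ACCEPT) handling does: walk the chain, innermost first. */
void emit_closes(mini_cd *cd)
{
  open_capitem *oc;
  assert(cd != NULL);
  cd->nclose = 0;
  for (oc = cd->open_caps; oc != NULL && cd->nclose < 8; oc = oc->next)
    cd->closes[cd->nclose++] = oc->number;
}

/* Model `depth` nested capturing groups with (*ACCEPT) in the innermost.
   Each level links its item in on entry and unlinks it on exit. */
void compile_group(mini_cd *cd, int number, int depth)
{
  open_capitem capitem;
  capitem.number = (unsigned short)number;
  capitem.next = cd->open_caps;
  cd->open_caps = &capitem;

  if (depth > 1) compile_group(cd, number + 1, depth - 1);
  else emit_closes(cd);

  cd->open_caps = cd->open_caps->next;   /* remove from the chain */
}
```

Keeping the items on the C stack means the chain needs no allocation, and it is valid exactly while the corresponding group is being compiled.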
@ -5766,21 +5907,29 @@ for (;;)
|
||||
|
||||
/* If lookbehind, check that this branch matches a fixed-length string, and
|
||||
put the length into the OP_REVERSE item. Temporarily mark the end of the
|
||||
branch with OP_END. */
|
||||
branch with OP_END. If the branch contains OP_RECURSE, the result is -3
|
||||
because there may be forward references that we can't check here. Set a
|
||||
flag to cause another lookbehind check at the end. Why not do it all at the
|
||||
end? Because common, erroneous checks are picked up here and the offset of
|
||||
the problem can be shown. */
|
||||
|
||||
if (lookbehind)
|
||||
{
|
||||
int fixed_length;
|
||||
*code = OP_END;
|
||||
fixed_length = find_fixedlength(last_branch, options);
|
||||
fixed_length = find_fixedlength(last_branch, options, FALSE, cd);
|
||||
DPRINTF(("fixed length = %d\n", fixed_length));
|
||||
if (fixed_length < 0)
|
||||
if (fixed_length == -3)
|
||||
{
|
||||
cd->check_lookbehind = TRUE;
|
||||
}
|
||||
else if (fixed_length < 0)
|
||||
{
|
||||
*errorcodeptr = (fixed_length == -2)? ERR36 : ERR25;
|
||||
*ptrptr = ptr;
|
||||
return FALSE;
|
||||
}
|
||||
PUT(reverse_count, 0, fixed_length);
|
||||
else { PUT(reverse_count, 0, fixed_length); }
|
||||
}
|
||||
}
|
||||
|
||||
@ -5808,6 +5957,10 @@ for (;;)
|
||||
while (branch_length > 0);
|
||||
}
|
||||
|
||||
/* If it was a capturing subpattern, remove it from the chain. */
|
||||
|
||||
if (capnumber > 0) cd->open_caps = cd->open_caps->next;
|
||||
|
||||
/* Fill in the ket */
|
||||
|
||||
*code = OP_KET;
|
||||
@ -6010,7 +6163,9 @@ do {
|
||||
switch (*scode)
|
||||
{
|
||||
case OP_CREF:
|
||||
case OP_NCREF:
|
||||
case OP_RREF:
|
||||
case OP_NRREF:
|
||||
case OP_DEF:
|
||||
return FALSE;
|
||||
|
||||
@ -6179,9 +6334,7 @@ int length = 1; /* For final END opcode */
|
||||
int firstbyte, reqbyte, newline;
|
||||
int errorcode = 0;
|
||||
int skipatstart = 0;
|
||||
#ifdef SUPPORT_UTF8
|
||||
BOOL utf8;
|
||||
#endif
|
||||
BOOL utf8 = (options & PCRE_UTF8) != 0;
|
||||
size_t size;
|
||||
uschar *code;
|
||||
const uschar *codestart;
|
||||
@ -6278,7 +6431,6 @@ while (ptr[skipatstart] == CHAR_LEFT_PARENTHESIS &&
|
||||
/* Can't support UTF8 unless PCRE has been compiled to include the code. */
|
||||
|
||||
#ifdef SUPPORT_UTF8
|
||||
utf8 = (options & PCRE_UTF8) != 0;
|
||||
if (utf8 && (options & PCRE_NO_UTF8_CHECK) == 0 &&
|
||||
(*erroroffset = _pcre_valid_utf8((uschar *)pattern, -1)) >= 0)
|
||||
{
|
||||
@ -6286,7 +6438,7 @@ if (utf8 && (options & PCRE_NO_UTF8_CHECK) == 0 &&
|
||||
goto PCRE_EARLY_ERROR_RETURN2;
|
||||
}
|
||||
#else
|
||||
if ((options & PCRE_UTF8) != 0)
|
||||
if (utf8)
|
||||
{
|
||||
errorcode = ERR32;
|
||||
goto PCRE_EARLY_ERROR_RETURN;
|
||||
@ -6375,6 +6527,7 @@ cd->end_pattern = (const uschar *)(pattern + strlen(pattern));
|
||||
cd->req_varyopt = 0;
|
||||
cd->external_options = options;
|
||||
cd->external_flags = 0;
|
||||
cd->open_caps = NULL;
|
||||
|
||||
/* Now do the pre-compile. On error, errorcode will be set non-zero, so we
|
||||
don't need to look at the result of the function here. The initial options have
|
||||
@ -6449,6 +6602,8 @@ cd->start_code = codestart;
|
||||
cd->hwm = cworkspace;
|
||||
cd->req_varyopt = 0;
|
||||
cd->had_accept = FALSE;
|
||||
cd->check_lookbehind = FALSE;
|
||||
cd->open_caps = NULL;
|
||||
|
||||
/* Set up a starting, non-extracting bracket, then compile the expression. On
|
||||
error, errorcode will be set non-zero, so we don't need to look at the result
|
||||
@ -6487,7 +6642,7 @@ while (errorcode == 0 && cd->hwm > cworkspace)
|
||||
cd->hwm -= LINK_SIZE;
|
||||
offset = GET(cd->hwm, 0);
|
||||
recno = GET(codestart, offset);
|
||||
groupptr = find_bracket(codestart, (re->options & PCRE_UTF8) != 0, recno);
|
||||
groupptr = _pcre_find_bracket(codestart, utf8, recno);
|
||||
if (groupptr == NULL) errorcode = ERR53;
|
||||
else PUT(((uschar *)codestart), offset, groupptr - codestart);
|
||||
}
|
||||
@ -6497,6 +6652,47 @@ subpattern. */
|
||||
|
||||
if (errorcode == 0 && re->top_backref > re->top_bracket) errorcode = ERR15;
|
||||
|
||||
/* If there were any lookbehind assertions that contained OP_RECURSE
|
||||
(recursions or subroutine calls), a flag is set for them to be checked here,
|
||||
because they may contain forward references. Actual recursions can't be fixed
|
||||
length, but subroutine calls can. It is done like this so that those without
|
||||
OP_RECURSE that are not fixed length get a diagnostic with a useful offset. The
|
||||
exceptional ones forgo this. We scan the pattern to check that they are fixed
|
||||
length, and set their lengths. */
|
||||
|
||||
if (cd->check_lookbehind)
|
||||
{
|
||||
uschar *cc = (uschar *)codestart;
|
||||
|
||||
/* Loop, searching for OP_REVERSE items, and process those that do not have
|
||||
their length set. (Actually, it will also re-process any that have a length
|
||||
of zero, but that is a pathological case, and it does no harm.) When we find
|
||||
one, we temporarily terminate the branch it is in while we scan it. */
|
||||
|
||||
for (cc = (uschar *)_pcre_find_bracket(codestart, utf8, -1);
|
||||
cc != NULL;
|
||||
cc = (uschar *)_pcre_find_bracket(cc, utf8, -1))
|
||||
{
|
||||
if (GET(cc, 1) == 0)
|
||||
{
|
||||
int fixed_length;
|
||||
uschar *be = cc - 1 - LINK_SIZE + GET(cc, -LINK_SIZE);
|
||||
int end_op = *be;
|
||||
*be = OP_END;
|
||||
fixed_length = find_fixedlength(cc, re->options, TRUE, cd);
|
||||
*be = end_op;
|
||||
DPRINTF(("fixed length = %d\n", fixed_length));
|
||||
if (fixed_length < 0)
|
||||
{
|
||||
errorcode = (fixed_length == -2)? ERR36 : ERR25;
|
||||
break;
|
||||
}
|
||||
PUT(cc, 1, fixed_length);
|
||||
}
|
||||
cc += 1 + LINK_SIZE;
|
||||
}
|
||||
}
|
||||
|
||||
/* Failed to compile, or error while post-processing */
|
||||
|
||||
if (errorcode != 0)
|
||||
|
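The lookbehind changes in this file give find_fixedlength() three negative results and two calling modes. The decision made at each call site can be summarised as a small function. This is a sketch only: `lb_action` and `lookbehind_action` are hypothetical names, and the mapping is read off the two call sites shown in the hunks above.

```c
#include <assert.h>

/* Hypothetical summary of what a caller does with the result. */
enum lb_action {
  LB_STORE,    /* write the length into the OP_REVERSE item */
  LB_DEFER,    /* set check_lookbehind and re-check at the end */
  LB_ERR25,    /* no fixed length */
  LB_ERR36     /* \C was encountered */
};

enum lb_action lookbehind_action(int fixed_length, int atend)
{
  assert(fixed_length >= -3);

  /* -3 is only produced while the pattern is still being compiled. */
  if (fixed_length == -3 && !atend) return LB_DEFER;
  if (fixed_length == -2) return LB_ERR36;
  if (fixed_length < 0) return LB_ERR25;
  return LB_STORE;
}
```

The first call (atend FALSE) reports common errors with a useful pattern offset; only assertions containing OP_RECURSE wait for the second call.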
File diff suppressed because it is too large
@ -117,10 +117,16 @@ switch (what)
|
||||
|
||||
case PCRE_INFO_FIRSTTABLE:
|
||||
*((const uschar **)where) =
|
||||
(study != NULL && (study->options & PCRE_STUDY_MAPPED) != 0)?
|
||||
(study != NULL && (study->flags & PCRE_STUDY_MAPPED) != 0)?
|
||||
((const pcre_study_data *)extra_data->study_data)->start_bits : NULL;
|
||||
break;
|
||||
|
||||
case PCRE_INFO_MINLENGTH:
|
||||
*((int *)where) =
|
||||
(study != NULL && (study->flags & PCRE_STUDY_MINLEN) != 0)?
|
||||
study->minlength : -1;
|
||||
break;
|
||||
|
||||
case PCRE_INFO_LASTLITERAL:
|
||||
*((int *)where) =
|
||||
((re->flags & PCRE_REQCHSET) != 0)? re->req_byte : -1;
|
||||
@ -142,6 +148,9 @@ switch (what)
|
||||
*((const uschar **)where) = (const uschar *)(_pcre_default_tables);
|
||||
break;
|
||||
|
||||
/* From release 8.00 this will always return TRUE because NOPARTIAL is
|
||||
no longer ever set (the restrictions have been removed). */
|
||||
|
||||
case PCRE_INFO_OKPARTIAL:
|
||||
*((int *)where) = (re->flags & PCRE_NOPARTIAL) == 0;
|
||||
break;
|
||||
|
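The PCRE_INFO_MINLENGTH case above returns the studied minimum length only when the study block carries the PCRE_STUDY_MINLEN flag, and -1 otherwise. A minimal sketch of that rule follows, using the flag values defined in pcre_internal.h in this commit. `mini_study` and `info_minlength` are hypothetical stand-ins for pcre_study_data and the switch case.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values copied from the pcre_internal.h hunk in this commit. */
#define PCRE_STUDY_MAPPED 0x01   /* a map of starting chars exists */
#define PCRE_STUDY_MINLEN 0x02   /* a minimum length field exists */

/* Hypothetical cut-down study block. */
typedef struct {
  unsigned int flags;
  unsigned int minlength;
} mini_study;

/* Store the minimum subject length in *where, or -1 if it is unknown. */
void info_minlength(const mini_study *study, int *where)
{
  assert(where != NULL);
  *where = (study != NULL && (study->flags & PCRE_STUDY_MINLEN) != 0)?
    (int)study->minlength : -1;
}
```

A caller therefore has to treat -1 as "no information", which is different from a studied minimum length of zero.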
@ -535,7 +535,9 @@ Standard C system should have one. */
|
||||
|
||||
/* Private flags containing information about the compiled regex. They used to
|
||||
live at the top end of the options word, but that got almost full, so now they
|
||||
are in a 16-bit flags word. */
|
||||
are in a 16-bit flags word. From release 8.00, PCRE_NOPARTIAL is unused, as
|
||||
the restrictions on partial matching have been lifted. It remains for backwards
|
||||
compatibility. */
|
||||
|
||||
#define PCRE_NOPARTIAL 0x0001 /* can't use partial with this regex */
|
||||
#define PCRE_FIRSTSET 0x0002 /* first_byte is set */
|
||||
@ -547,6 +549,7 @@ are in a 16-bit flags word. */
|
||||
/* Options for the "extra" block produced by pcre_study(). */
|
||||
|
||||
#define PCRE_STUDY_MAPPED 0x01 /* a map of starting chars exists */
|
||||
#define PCRE_STUDY_MINLEN 0x02 /* a minimum length field exists */
|
||||
|
||||
/* Masks for identifying the public options that are permitted at compile
|
||||
time, run time, or study time, respectively. */
|
||||
@ -562,14 +565,15 @@ time, run time, or study time, respectively. */
|
||||
PCRE_JAVASCRIPT_COMPAT)
|
||||
|
||||
#define PUBLIC_EXEC_OPTIONS \
|
||||
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NO_UTF8_CHECK| \
|
||||
PCRE_PARTIAL|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE| \
|
||||
PCRE_NO_START_OPTIMIZE)
|
||||
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NOTEMPTY_ATSTART| \
|
||||
PCRE_NO_UTF8_CHECK|PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT|PCRE_NEWLINE_BITS| \
|
||||
PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE|PCRE_NO_START_OPTIMIZE)
|
||||
|
||||
#define PUBLIC_DFA_EXEC_OPTIONS \
|
||||
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NO_UTF8_CHECK| \
|
||||
PCRE_PARTIAL|PCRE_DFA_SHORTEST|PCRE_DFA_RESTART|PCRE_NEWLINE_BITS| \
|
||||
PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE|PCRE_NO_START_OPTIMIZE)
|
||||
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NOTEMPTY_ATSTART| \
|
||||
PCRE_NO_UTF8_CHECK|PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT|PCRE_DFA_SHORTEST| \
|
||||
PCRE_DFA_RESTART|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE| \
|
||||
PCRE_NO_START_OPTIMIZE)
|
||||
|
||||
#define PUBLIC_STUDY_OPTIONS 0 /* None defined */
|
||||
|
||||
@ -1206,8 +1210,8 @@ enum { ESC_A = 1, ESC_G, ESC_K, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s,
|
||||
OP_EOD must correspond in order to the list of escapes immediately above.
|
||||
|
||||
*** NOTE NOTE NOTE *** Whenever this list is updated, the two macro definitions
|
||||
that follow must also be updated to match. There is also a table called
|
||||
"coptable" in pcre_dfa_exec.c that must be updated. */
|
||||
that follow must also be updated to match. There are also tables called
|
||||
"coptable" and "poptable" in pcre_dfa_exec.c that must be updated. */
|
||||
|
||||
enum {
|
||||
OP_END, /* 0 End of pattern */
|
||||
@ -1343,30 +1347,39 @@ enum {
|
||||
OP_SCBRA, /* 98 Start of capturing bracket, check empty */
|
||||
OP_SCOND, /* 99 Conditional group, check empty */
|
||||
|
||||
OP_CREF, /* 100 Used to hold a capture number as condition */
|
||||
OP_RREF, /* 101 Used to hold a recursion number as condition */
|
||||
OP_DEF, /* 102 The DEFINE condition */
|
||||
/* The next two pairs must (respectively) be kept together. */
|
||||
|
||||
OP_BRAZERO, /* 103 These two must remain together and in this */
|
||||
OP_BRAMINZERO, /* 104 order. */
|
||||
OP_CREF, /* 100 Used to hold a capture number as condition */
|
||||
OP_NCREF, /* 101 Same, but generated by a name reference */
|
||||
OP_RREF, /* 102 Used to hold a recursion number as condition */
|
||||
OP_NRREF, /* 103 Same, but generated by a name reference */
|
||||
OP_DEF, /* 104 The DEFINE condition */
|
||||
|
||||
OP_BRAZERO, /* 105 These two must remain together and in this */
|
||||
OP_BRAMINZERO, /* 106 order. */
|
||||
|
||||
/* These are backtracking control verbs */
|
||||
|
||||
OP_PRUNE, /* 105 */
|
||||
OP_SKIP, /* 106 */
|
||||
OP_THEN, /* 107 */
|
||||
OP_COMMIT, /* 108 */
|
||||
OP_PRUNE, /* 107 */
|
||||
OP_SKIP, /* 108 */
|
||||
OP_THEN, /* 109 */
|
||||
OP_COMMIT, /* 110 */
|
||||
|
||||
/* These are forced failure and success verbs */
|
||||
|
||||
OP_FAIL, /* 109 */
|
||||
OP_ACCEPT, /* 110 */
|
||||
OP_FAIL, /* 111 */
|
||||
OP_ACCEPT, /* 112 */
|
||||
OP_CLOSE, /* 113 Used before OP_ACCEPT to close open captures */
|
||||
|
||||
/* This is used to skip a subpattern with a {0} quantifier */
|
||||
|
||||
OP_SKIPZERO /* 111 */
|
||||
OP_SKIPZERO /* 114 */
|
||||
};
|
||||
|
||||
/* *** NOTE NOTE NOTE *** Whenever the list above is updated, the two macro
|
||||
definitions that follow must also be updated to match. There are also tables
|
||||
called "coptable" and "poptable" in pcre_dfa_exec.c that must be updated. */
|
||||
|
||||
|
||||
/* This macro defines textual names for all the opcodes. These are used only
|
||||
for debugging. The macro is referenced only in pcre_printint.c. */
|
||||
@ -1388,9 +1401,10 @@ for debugging. The macro is referenced only in pcre_printint.c. */
|
||||
"Alt", "Ket", "KetRmax", "KetRmin", "Assert", "Assert not", \
|
||||
"AssertB", "AssertB not", "Reverse", \
|
||||
"Once", "Bra", "CBra", "Cond", "SBra", "SCBra", "SCond", \
|
||||
"Cond ref", "Cond rec", "Cond def", "Brazero", "Braminzero", \
|
||||
"Cond ref", "Cond nref", "Cond rec", "Cond nrec", "Cond def", \
|
||||
"Brazero", "Braminzero", \
|
||||
"*PRUNE", "*SKIP", "*THEN", "*COMMIT", "*FAIL", "*ACCEPT", \
|
||||
"Skip zero"
|
||||
"Close", "Skip zero"
|
||||
|
||||
|
||||
/* This macro defines the length of fixed length operations in the compiled
|
||||
@ -1450,15 +1464,16 @@ in UTF-8 mode. The code that uses this table must know about such things. */
|
||||
1+LINK_SIZE, /* SBRA */ \
|
||||
3+LINK_SIZE, /* SCBRA */ \
|
||||
1+LINK_SIZE, /* SCOND */ \
|
||||
3, /* CREF */ \
|
||||
3, /* RREF */ \
|
||||
3, 3, /* CREF, NCREF */ \
|
||||
3, 3, /* RREF, NRREF */ \
|
||||
1, /* DEF */ \
|
||||
1, 1, /* BRAZERO, BRAMINZERO */ \
|
||||
1, 1, 1, 1, /* PRUNE, SKIP, THEN, COMMIT, */ \
|
||||
1, 1, 1 /* FAIL, ACCEPT, SKIPZERO */
|
||||
1, 1, 3, 1 /* FAIL, ACCEPT, CLOSE, SKIPZERO */
|
||||
|
||||
|
||||
/* A magic value for OP_RREF to indicate the "any recursion" condition. */
|
||||
/* A magic value for OP_RREF and OP_NRREF to indicate the "any recursion"
|
||||
condition. */
|
||||
|
||||
#define RREF_ANY 0xffff
|
||||
|
||||
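The new opcodes OP_NCREF and OP_NRREF are placed directly after OP_CREF and OP_RREF so that the compiler can turn a numeric condition into its "by name" variant by adding one to the opcode. The sketch below shows why the ordering matters. The enum values are taken from the list above; `mark_named` is a hypothetical helper, and its guard is an addition (the compile code in this commit increments the stored opcode unconditionally at the two call sites).

```c
#include <assert.h>

/* Values as numbered in the opcode list above. */
enum { OP_CREF = 100, OP_NCREF, OP_RREF, OP_NRREF, OP_DEF };

/* Hypothetical helper: record that a condition was written as a name. */
unsigned char mark_named(unsigned char op)
{
  assert(op >= OP_CREF && op <= OP_DEF);
  if (op == OP_CREF || op == OP_RREF) op++;   /* relies on adjacency */
  return op;
}
```

If an opcode were ever inserted between a pair, the increment would silently produce the wrong opcode, which is presumably why the list requires each pair to stay together.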
@ -1471,7 +1486,7 @@ enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9,
|
||||
ERR30, ERR31, ERR32, ERR33, ERR34, ERR35, ERR36, ERR37, ERR38, ERR39,
|
||||
ERR40, ERR41, ERR42, ERR43, ERR44, ERR45, ERR46, ERR47, ERR48, ERR49,
|
||||
ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
|
||||
ERR60, ERR61, ERR62, ERR63, ERR64 };
|
||||
ERR60, ERR61, ERR62, ERR63, ERR64, ERR65 };
|
||||
|
||||
/* The real format of the start of the pcre block; the index of names and the
|
||||
code vector run on as long as necessary after the end. We store an explicit
|
||||
@ -1487,7 +1502,7 @@ Because people can now save and re-use compiled patterns, any additions to this
|
||||
structure should be made at the end, and something earlier (e.g. a new
|
||||
flag in the options or one of the dummy fields) should indicate that the new
|
||||
fields are present. Currently PCRE always sets the dummy fields to zero.
|
||||
NOTE NOTE NOTE:
|
||||
NOTE NOTE NOTE
|
||||
*/
|
||||
|
||||
typedef struct real_pcre {
|
||||
@ -1514,10 +1529,20 @@ remark (see NOTE above) about extending this structure applies. */
|
||||
|
||||
typedef struct pcre_study_data {
|
||||
pcre_uint32 size; /* Total that was malloced */
|
||||
pcre_uint32 options;
|
||||
uschar start_bits[32];
|
||||
pcre_uint32 flags; /* Private flags */
|
||||
uschar start_bits[32]; /* Starting char bits */
|
||||
pcre_uint32 minlength; /* Minimum subject length */
|
||||
} pcre_study_data;
|
||||
|
||||
/* Structure for building a chain of open capturing subpatterns during
|
||||
compiling, so that instructions to close them can be compiled when (*ACCEPT) is
|
||||
encountered. */
|
||||
|
||||
typedef struct open_capitem {
|
||||
struct open_capitem *next; /* Chain link */
|
||||
pcre_uint16 number; /* Capture number */
|
||||
} open_capitem;
|
||||
|
||||
/* Structure for passing "static" information around between the functions
|
||||
doing the compiling, so that they are thread-safe. */
|
||||
|
||||
@ -1530,6 +1555,7 @@ typedef struct compile_data {
|
||||
const uschar *start_code; /* The start of the compiled code */
|
||||
const uschar *start_pattern; /* The start of the pattern */
|
||||
const uschar *end_pattern; /* The end of the pattern */
|
||||
open_capitem *open_caps; /* Chain of open capture items */
|
||||
uschar *hwm; /* High watermark of workspace */
|
||||
uschar *name_table; /* The name/number table */
|
||||
int names_found; /* Number of entries so far */
|
||||
@ -1542,6 +1568,7 @@ typedef struct compile_data {
|
||||
int external_flags; /* External flag bits to be set */
|
||||
int req_varyopt; /* "After variable item" flag for reqbyte */
|
||||
BOOL had_accept; /* (*ACCEPT) encountered */
|
||||
BOOL check_lookbehind; /* Lookbehinds need later checking */
|
||||
int nltype; /* Newline type */
|
||||
int nllen; /* Newline string length */
|
||||
uschar nl[4]; /* Newline string when fixed length */
|
||||
@ -1565,6 +1592,7 @@ typedef struct recursion_info {
|
||||
USPTR save_start; /* Old value of mstart */
|
||||
int *offset_save; /* Pointer to start of saved offsets */
|
||||
int saved_max; /* Number of saved offsets */
|
||||
int save_offset_top; /* Current value of offset_top */
|
||||
} recursion_info;
|
||||
|
||||
/* Structure for building a chain of data for holding the values of the subject
|
||||
@ -1589,6 +1617,9 @@ typedef struct match_data {
|
||||
int offset_max; /* The maximum usable for return data */
|
||||
int nltype; /* Newline type */
|
||||
int nllen; /* Newline string length */
|
||||
int name_count; /* Number of names in name table */
|
||||
int name_entry_size; /* Size of entry in names table */
|
||||
uschar *name_table; /* Table of names */
|
||||
uschar nl[4]; /* Newline string when fixed */
|
||||
const uschar *lcc; /* Points to lower casing table */
|
||||
const uschar *ctypes; /* Points to table of type maps */
|
||||
@ -1599,7 +1630,7 @@ typedef struct match_data {
|
||||
BOOL jscript_compat; /* JAVASCRIPT_COMPAT flag */
|
||||
BOOL endonly; /* Dollar not before final \n */
|
||||
BOOL notempty; /* Empty string match not wanted */
|
||||
BOOL partial; /* PARTIAL flag */
|
||||
BOOL notempty_atstart; /* Empty string match at start not wanted */
|
||||
BOOL hitend; /* Hit the end of the subject at some point */
|
||||
BOOL bsr_anycrlf; /* \R is just any CRLF, not full Unicode */
|
||||
const uschar *start_code; /* For use when recursing */
|
||||
@ -1607,6 +1638,8 @@ typedef struct match_data {
|
||||
USPTR end_subject; /* End of the subject string */
|
||||
USPTR start_match_ptr; /* Start of matched string */
|
||||
USPTR end_match_ptr; /* Subject position at end match */
|
||||
USPTR start_used_ptr; /* Earliest consulted character */
|
||||
int partial; /* PARTIAL options */
|
||||
int end_offset_top; /* Highwater mark at end of match */
|
||||
int capture_last; /* Most recent capture number */
|
||||
int start_offset; /* The start offset value */
|
||||
@ -1623,7 +1656,9 @@ typedef struct dfa_match_data {
|
||||
const uschar *start_code; /* Start of the compiled pattern */
|
||||
const uschar *start_subject; /* Start of the subject string */
|
||||
const uschar *end_subject; /* End of subject string */
|
||||
const uschar *start_used_ptr; /* Earliest consulted character */
|
||||
const uschar *tables; /* Character tables */
|
||||
int start_offset; /* The start offset value */
|
||||
int moptions; /* Match options */
|
||||
int poptions; /* Pattern options */
|
||||
int nltype; /* Newline type */
|
||||
@ -1702,15 +1737,16 @@ extern const uschar _pcre_OP_lengths[];
|
||||
one of the exported public functions. They have to be "external" in the C
|
||||
sense, but are not part of the PCRE public API. */
|
||||
|
||||
extern BOOL _pcre_is_newline(const uschar *, int, const uschar *,
|
||||
int *, BOOL);
|
||||
extern int _pcre_ord2utf8(int, uschar *);
|
||||
extern real_pcre *_pcre_try_flipped(const real_pcre *, real_pcre *,
|
||||
const pcre_study_data *, pcre_study_data *);
|
||||
extern int _pcre_valid_utf8(const uschar *, int);
|
||||
extern BOOL _pcre_was_newline(const uschar *, int, const uschar *,
|
||||
int *, BOOL);
|
||||
extern BOOL _pcre_xclass(int, const uschar *);
|
||||
extern const uschar *_pcre_find_bracket(const uschar *, BOOL, int);
|
||||
extern BOOL _pcre_is_newline(const uschar *, int, const uschar *,
|
||||
int *, BOOL);
|
||||
extern int _pcre_ord2utf8(int, uschar *);
|
||||
extern real_pcre *_pcre_try_flipped(const real_pcre *, real_pcre *,
|
||||
const pcre_study_data *, pcre_study_data *);
|
||||
extern int _pcre_valid_utf8(const uschar *, int);
|
||||
extern BOOL _pcre_was_newline(const uschar *, int, const uschar *,
|
||||
int *, BOOL);
|
||||
extern BOOL _pcre_xclass(int, const uschar *);
|
||||
|
||||
|
||||
/* Unicode character database (UCD) */
|
||||
|
@ -246,7 +246,12 @@ for(;;)
|
||||
fprintf(f, "%s", OP_names[*code]);
|
||||
break;
|
||||
|
||||
case OP_CLOSE:
|
||||
fprintf(f, " %s %d", OP_names[*code], GET2(code, 1));
|
||||
break;
|
||||
|
||||
case OP_CREF:
|
||||
case OP_NCREF:
|
||||
fprintf(f, "%3d %s", GET2(code,1), OP_names[*code]);
|
||||
break;
|
||||
|
||||
@ -258,6 +263,14 @@ for(;;)
|
||||
fprintf(f, " Cond recurse %d", c);
|
||||
break;
|
||||
|
||||
case OP_NRREF:
|
||||
c = GET2(code, 1);
|
||||
if (c == RREF_ANY)
|
||||
fprintf(f, " Cond nrecurse any");
|
||||
else
|
||||
fprintf(f, " Cond nrecurse %d", c);
|
||||
break;
|
||||
|
||||
case OP_DEF:
|
||||
fprintf(f, " Cond def");
|
||||
break;
|
||||
|
@ -6,7 +6,7 @@
|
||||
and semantics are as close as possible to those of the Perl 5 language.
|
||||
|
||||
Written by Philip Hazel
|
||||
Copyright (c) 1997-2008 University of Cambridge
|
||||
Copyright (c) 1997-2009 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
@ -52,6 +52,364 @@ supporting functions. */
|
||||
enum { SSB_FAIL, SSB_DONE, SSB_CONTINUE };
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Find the minimum subject length for a group *
|
||||
*************************************************/
|
||||
|
||||
/* Scan a parenthesized group and compute the minimum length of subject that
|
||||
is needed to match it. This is a lower bound; it does not mean there is a
|
||||
string of that length that matches. In UTF8 mode, the result is in characters
|
||||
rather than bytes.
|
||||
|
||||
Arguments:
|
||||
code pointer to start of group (the bracket)
|
||||
startcode pointer to start of the whole pattern
|
||||
options the compiling options
|
||||
|
||||
Returns: the minimum length
|
||||
-1 if \C was encountered
|
||||
-2 internal error (missing capturing bracket)
|
||||
*/
|
||||
|
||||
static int
|
||||
find_minlength(const uschar *code, const uschar *startcode, int options)
|
||||
{
|
||||
int length = -1;
|
||||
BOOL utf8 = (options & PCRE_UTF8) != 0;
|
||||
BOOL had_recurse = FALSE;
|
||||
register int branchlength = 0;
|
||||
register uschar *cc = (uschar *)code + 1 + LINK_SIZE;
|
||||
|
||||
if (*code == OP_CBRA || *code == OP_SCBRA) cc += 2;
|
||||
|
||||
/* Scan along the opcodes for this branch. If we get to the end of the
|
||||
branch, check the length against that of the other branches. */
|
||||
|
||||
for (;;)
|
||||
{
|
||||
int d, min;
|
||||
uschar *cs, *ce;
|
||||
register int op = *cc;
|
||||
|
||||
switch (op)
|
||||
{
|
||||
case OP_CBRA:
|
||||
case OP_SCBRA:
|
||||
case OP_BRA:
|
||||
case OP_SBRA:
|
||||
case OP_ONCE:
|
||||
case OP_COND:
|
||||
case OP_SCOND:
|
||||
d = find_minlength(cc, startcode, options);
|
||||
if (d < 0) return d;
|
||||
branchlength += d;
|
||||
do cc += GET(cc, 1); while (*cc == OP_ALT);
|
||||
cc += 1 + LINK_SIZE;
|
||||
break;
|
||||
|
||||
/* Reached end of a branch; if it's a ket it is the end of a nested
|
||||
call. If it's ALT it is an alternation in a nested call. If it is
|
||||
END it's the end of the outer call. All can be handled by the same code. */
|
||||
|
||||
case OP_ALT:
|
||||
case OP_KET:
|
||||
case OP_KETRMAX:
|
||||
case OP_KETRMIN:
|
||||
case OP_END:
|
||||
if (length < 0 || (!had_recurse && branchlength < length))
|
||||
length = branchlength;
|
||||
if (*cc != OP_ALT) return length;
|
||||
cc += 1 + LINK_SIZE;
|
||||
branchlength = 0;
|
||||
had_recurse = FALSE;
|
||||
break;
|
||||
|
||||
/* Skip over assertive subpatterns */
|
||||
|
||||
case OP_ASSERT:
|
||||
case OP_ASSERT_NOT:
|
||||
case OP_ASSERTBACK:
|
||||
case OP_ASSERTBACK_NOT:
|
||||
do cc += GET(cc, 1); while (*cc == OP_ALT);
|
||||
/* Fall through */
|
||||
|
||||
/* Skip over things that don't match chars */
|
||||
|
||||
case OP_REVERSE:
|
||||
case OP_CREF:
|
||||
case OP_NCREF:
|
||||
case OP_RREF:
|
||||
case OP_NRREF:
|
||||
case OP_DEF:
|
||||
case OP_OPT:
|
||||
case OP_CALLOUT:
|
||||
case OP_SOD:
|
||||
case OP_SOM:
|
||||
case OP_EOD:
|
||||
case OP_EODN:
|
||||
case OP_CIRC:
|
||||
case OP_DOLL:
|
||||
case OP_NOT_WORD_BOUNDARY:
|
||||
case OP_WORD_BOUNDARY:
|
||||
cc += _pcre_OP_lengths[*cc];
|
||||
break;
|
||||
|
||||
/* Skip over a subpattern that has a {0} or {0,x} quantifier */
|
||||
|
||||
case OP_BRAZERO:
|
||||
case OP_BRAMINZERO:
|
||||
case OP_SKIPZERO:
|
||||
cc += _pcre_OP_lengths[*cc];
|
||||
do cc += GET(cc, 1); while (*cc == OP_ALT);
|
||||
cc += 1 + LINK_SIZE;
|
||||
break;
|
||||
|
||||
/* Handle literal characters and + repetitions */
|
||||
|
||||
case OP_CHAR:
|
||||
case OP_CHARNC:
|
||||
case OP_NOT:
|
||||
case OP_PLUS:
|
||||
case OP_MINPLUS:
|
||||
case OP_POSPLUS:
|
||||
case OP_NOTPLUS:
|
||||
case OP_NOTMINPLUS:
|
||||
case OP_NOTPOSPLUS:
|
||||
branchlength++;
|
||||
cc += 2;
|
||||
#ifdef SUPPORT_UTF8
|
||||
if (utf8 && cc[-1] >= 0xc0) cc += _pcre_utf8_table4[cc[-1] & 0x3f];
|
||||
#endif
|
||||
break;
|
||||
|
||||
case OP_TYPEPLUS:
|
||||
case OP_TYPEMINPLUS:
|
||||
case OP_TYPEPOSPLUS:
|
||||
branchlength++;
|
||||
cc += (cc[1] == OP_PROP || cc[1] == OP_NOTPROP)? 4 : 2;
|
||||
break;
|
||||
|
||||
/* Handle exact repetitions. The count is already in characters, but we
|
||||
need to skip over a multibyte character in UTF8 mode. */
|
||||
|
||||
case OP_EXACT:
|
||||
case OP_NOTEXACT:
|
||||
branchlength += GET2(cc,1);
|
||||
cc += 4;
|
||||
#ifdef SUPPORT_UTF8
|
||||
if (utf8 && cc[-1] >= 0xc0) cc += _pcre_utf8_table4[cc[-1] & 0x3f];
|
||||
#endif
|
||||
break;
|
||||
|
||||
case OP_TYPEEXACT:
|
||||
branchlength += GET2(cc,1);
|
||||
cc += (cc[3] == OP_PROP || cc[3] == OP_NOTPROP)? 6 : 4;
|
||||
break;
|
||||
|
||||
/* Handle single-char non-literal matchers */
|
||||
|
||||
case OP_PROP:
|
||||
case OP_NOTPROP:
|
||||
cc += 2;
|
||||
/* Fall through */
|
||||
|
||||
case OP_NOT_DIGIT:
|
||||
case OP_DIGIT:
|
||||
case OP_NOT_WHITESPACE:
|
||||
case OP_WHITESPACE:
|
||||
case OP_NOT_WORDCHAR:
|
||||
case OP_WORDCHAR:
|
||||
case OP_ANY:
|
||||
case OP_ALLANY:
|
||||
case OP_EXTUNI:
|
||||
case OP_HSPACE:
|
||||
case OP_NOT_HSPACE:
|
||||
case OP_VSPACE:
|
||||
case OP_NOT_VSPACE:
|
||||
branchlength++;
|
||||
cc++;
|
||||
break;
|
||||
|
||||
/* "Any newline" can match two characters (CRLF), but the minimum is one */
|
||||
|
||||
case OP_ANYNL:
|
||||
branchlength += 1;
|
||||
cc++;
|
||||
break;
|
||||
|
||||
/* The single-byte matcher means we can't proceed in UTF-8 mode */
|
||||
|
||||
case OP_ANYBYTE:
|
||||
#ifdef SUPPORT_UTF8
|
||||
if (utf8) return -1;
|
||||
#endif
|
||||
branchlength++;
|
||||
cc++;
|
||||
break;
|
||||
|
||||
/* For repeated character types, we have to test for \p and \P, which have
|
||||
an extra two bytes of parameters. */
|
||||
|
||||
case OP_TYPESTAR:
|
||||
case OP_TYPEMINSTAR:
|
||||
case OP_TYPEQUERY:
|
||||
case OP_TYPEMINQUERY:
|
||||
case OP_TYPEPOSSTAR:
|
||||
case OP_TYPEPOSQUERY:
|
||||
if (cc[1] == OP_PROP || cc[1] == OP_NOTPROP) cc += 2;
|
||||
cc += _pcre_OP_lengths[op];
|
||||
break;
|
||||
|
||||
case OP_TYPEUPTO:
|
||||
case OP_TYPEMINUPTO:
|
||||
case OP_TYPEPOSUPTO:
|
||||
if (cc[3] == OP_PROP || cc[3] == OP_NOTPROP) cc += 2;
|
||||
cc += _pcre_OP_lengths[op];
|
||||
break;
|
||||
|
||||
/* Check a class for variable quantification */
|
||||
|
||||
#ifdef SUPPORT_UTF8
|
||||
case OP_XCLASS:
|
||||
cc += GET(cc, 1) - 33;
|
||||
/* Fall through */
|
||||
#endif
|
||||
|
||||
case OP_CLASS:
|
||||
case OP_NCLASS:
|
||||
cc += 33;
|
||||
|
||||
switch (*cc)
|
||||
{
|
||||
case OP_CRPLUS:
|
||||
case OP_CRMINPLUS:
|
||||
branchlength++;
|
||||
/* Fall through */
|
||||
|
||||
case OP_CRSTAR:
|
||||
case OP_CRMINSTAR:
|
||||
case OP_CRQUERY:
|
||||
case OP_CRMINQUERY:
|
||||
cc++;
|
||||
break;
|
||||
|
||||
case OP_CRRANGE:
|
||||
case OP_CRMINRANGE:
|
||||
branchlength += GET2(cc,1);
|
||||
cc += 5;
|
||||
break;
|
||||
|
||||
default:
|
||||
branchlength++;
|
||||
break;
|
||||
}
|
||||
break;
|
||||
|
||||
/* Backreferences and subroutine calls are treated in the same way: we find
|
||||
the minimum length for the subpattern. A recursion, however, causes
|
||||
a flag to be set that causes the length of this branch to be ignored. The
|
||||
logic is that a recursion can only make sense if there is another
|
||||
alternation that stops the recursing. That will provide the minimum length
|
||||
(when no recursion happens). A backreference within the group that it is
|
||||
referencing behaves in the same way.
|
||||
|
||||
If PCRE_JAVASCRIPT_COMPAT is set, a backreference to an unset bracket
|
||||
matches an empty string (by default it causes a matching failure), so in
|
||||
that case we must set the minimum length to zero. */
|
||||
|
||||
case OP_REF:
|
||||
if ((options & PCRE_JAVASCRIPT_COMPAT) == 0)
|
||||
{
|
||||
ce = cs = (uschar *)_pcre_find_bracket(startcode, utf8, GET2(cc, 1));
|
||||
if (cs == NULL) return -2;
|
||||
do ce += GET(ce, 1); while (*ce == OP_ALT);
|
||||
if (cc > cs && cc < ce)
|
||||
{
|
||||
d = 0;
|
||||
had_recurse = TRUE;
|
||||
}
|
||||
else d = find_minlength(cs, startcode, options);
|
||||
}
|
||||
else d = 0;
|
||||
cc += 3;
|
||||
|
||||
/* Handle repeated back references */
|
||||
|
||||
switch (*cc)
|
||||
{
|
||||
case OP_CRSTAR:
|
||||
case OP_CRMINSTAR:
|
||||
case OP_CRQUERY:
|
||||
case OP_CRMINQUERY:
|
||||
min = 0;
|
||||
cc++;
|
||||
break;
|
||||
|
||||
case OP_CRRANGE:
|
||||
case OP_CRMINRANGE:
|
||||
min = GET2(cc, 1);
|
||||
cc += 5;
|
||||
break;
|
||||
|
||||
default:
|
||||
min = 1;
|
||||
break;
|
||||
}
|
||||
|
||||
branchlength += min * d;
|
||||
break;
|
||||
|
||||
case OP_RECURSE:
|
||||
cs = ce = (uschar *)startcode + GET(cc, 1);
|
||||
if (cs == NULL) return -2;
|
||||
do ce += GET(ce, 1); while (*ce == OP_ALT);
|
||||
if (cc > cs && cc < ce)
|
||||
had_recurse = TRUE;
|
||||
else
|
||||
branchlength += find_minlength(cs, startcode, options);
|
||||
cc += 1 + LINK_SIZE;
|
||||
break;
|
||||
|
||||
/* Anything else does not or need not match a character. We can get the
|
||||
item's length from the table, but for those that can match zero occurrences
|
||||
of a character, we must take special action for UTF-8 characters. */
|
||||
|
||||
case OP_UPTO:
|
||||
case OP_NOTUPTO:
|
||||
case OP_MINUPTO:
|
||||
case OP_NOTMINUPTO:
|
||||
case OP_POSUPTO:
|
||||
case OP_STAR:
|
||||
case OP_MINSTAR:
|
||||
case OP_NOTMINSTAR:
|
||||
case OP_POSSTAR:
|
||||
case OP_NOTPOSSTAR:
|
||||
case OP_QUERY:
|
||||
case OP_MINQUERY:
|
||||
case OP_NOTMINQUERY:
|
||||
case OP_POSQUERY:
|
||||
case OP_NOTPOSQUERY:
|
||||
cc += _pcre_OP_lengths[op];
|
||||
#ifdef SUPPORT_UTF8
|
||||
if (utf8 && cc[-1] >= 0xc0) cc += _pcre_utf8_table4[cc[-1] & 0x3f];
|
||||
#endif
|
||||
break;
|
||||
|
||||
/* For the record, these are the opcodes that are matched by "default":
|
||||
OP_ACCEPT, OP_CLOSE, OP_COMMIT, OP_FAIL, OP_PRUNE, OP_SET_SOM, OP_SKIP,
|
||||
OP_THEN. */
|
||||
|
||||
default:
|
||||
cc += _pcre_OP_lengths[op];
|
||||
break;
|
||||
}
|
||||
}
|
||||
/* Control never gets here */
|
||||
}
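
The idea behind find_minlength() can be sketched on a much smaller structure. The following is only an illustration under stated assumptions: the node type, the N_* kinds and the toy_* functions are hypothetical and are not PCRE opcodes or PCRE functions. It shows the same arithmetic as the scan above: items in a branch add up, alternatives take the smallest branch, and a quantified item contributes its minimum repeat count times the length of the item.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy node kinds; these are not PCRE opcodes. */
typedef enum { N_CHAR, N_SEQ, N_ALT, N_REPEAT } node_kind;

typedef struct node {
  node_kind kind;
  int min_repeat;                  /* N_REPEAT only: lower bound of quantifier */
  const struct node *const *kids;  /* children of N_SEQ, N_ALT, N_REPEAT */
  size_t nkids;
} node;

/* Lower bound on the number of characters needed to match the node. */
int toy_minlength(const node *n)
{
  size_t i;
  int total, best, d;

  assert(n != NULL);
  switch (n->kind)
    {
    case N_CHAR:                   /* a single character matcher */
    return 1;

    case N_SEQ:                    /* items in a branch add up */
    total = 0;
    for (i = 0; i < n->nkids; i++) total += toy_minlength(n->kids[i]);
    return total;

    case N_ALT:                    /* the shortest alternative wins */
    best = -1;
    for (i = 0; i < n->nkids; i++)
      {
      d = toy_minlength(n->kids[i]);
      if (best < 0 || d < best) best = d;
      }
    return (best < 0)? 0 : best;

    case N_REPEAT:                 /* minimum count times the item length */
    assert(n->nkids == 1);
    return n->min_repeat * toy_minlength(n->kids[0]);
    }
  return 0;
}

/* Builds the toy equivalent of (abc|de)x{2,5} and returns its lower bound. */
int toy_example(void)
{
  static const node ch = { N_CHAR, 0, NULL, 0 };
  static const node *const abc_kids[] = { &ch, &ch, &ch };
  static const node *const de_kids[] = { &ch, &ch };
  static const node abc = { N_SEQ, 0, abc_kids, 3 };
  static const node de = { N_SEQ, 0, de_kids, 2 };
  static const node *const alt_kids[] = { &abc, &de };
  static const node alt = { N_ALT, 0, alt_kids, 2 };
  static const node *const rep_kids[] = { &ch };
  static const node rep = { N_REPEAT, 2, rep_kids, 1 };
  static const node *const top_kids[] = { &alt, &rep };
  static const node top = { N_SEQ, 0, top_kids, 2 };
  return toy_minlength(&top);
}
```

For the toy pattern, the shorter alternative needs 2 characters and the repeat needs at least 2 more, so toy_example() yields 4. The sketch leaves out what the real function has to handle in addition: recursion, back references, and the error returns for \C in UTF-8 mode.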
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Set a bit and maybe its alternate case *
|
||||
*************************************************/
|
||||
@ -498,13 +856,15 @@ Arguments:
|
||||
set NULL unless error
|
||||
|
||||
Returns: pointer to a pcre_extra block, with study_data filled in and the
|
||||
appropriate flag set;
|
||||
appropriate flags set;
|
||||
NULL on error or if no optimization possible
|
||||
*/
|
||||
|
||||
PCRE_EXP_DEFN pcre_extra * PCRE_CALL_CONVENTION
|
||||
pcre_study(const pcre *external_re, int options, const char **errorptr)
|
||||
{
|
||||
int min;
|
||||
BOOL bits_set = FALSE;
|
||||
uschar start_bits[32];
|
||||
pcre_extra *extra;
|
||||
pcre_study_data *study;
|
||||
@ -531,30 +891,39 @@ code = (uschar *)re + re->name_table_offset +
|
||||
(re->name_count * re->name_entry_size);
|
||||
|
||||
/* For an anchored pattern, or an unanchored pattern that has a first char, or
|
||||
a multiline pattern that matches only at "line starts", no further processing
|
||||
at present. */
|
||||
a multiline pattern that matches only at "line starts", there is no point in
|
||||
seeking a list of starting bytes. */
|
||||
|
||||
if ((re->options & PCRE_ANCHORED) != 0 ||
|
||||
(re->flags & (PCRE_FIRSTSET|PCRE_STARTLINE)) != 0)
|
||||
return NULL;
|
||||
if ((re->options & PCRE_ANCHORED) == 0 &&
|
||||
(re->flags & (PCRE_FIRSTSET|PCRE_STARTLINE)) == 0)
|
||||
{
|
||||
/* Set the character tables in the block that is passed around */
|
||||
|
||||
/* Set the character tables in the block that is passed around */
|
||||
tables = re->tables;
|
||||
if (tables == NULL)
|
||||
(void)pcre_fullinfo(external_re, NULL, PCRE_INFO_DEFAULT_TABLES,
|
||||
(void *)(&tables));
|
||||
|
||||
tables = re->tables;
|
||||
if (tables == NULL)
|
||||
(void)pcre_fullinfo(external_re, NULL, PCRE_INFO_DEFAULT_TABLES,
|
||||
(void *)(&tables));
|
||||
compile_block.lcc = tables + lcc_offset;
|
||||
compile_block.fcc = tables + fcc_offset;
|
||||
compile_block.cbits = tables + cbits_offset;
|
||||
compile_block.ctypes = tables + ctypes_offset;
|
||||
|
||||
compile_block.lcc = tables + lcc_offset;
|
||||
compile_block.fcc = tables + fcc_offset;
|
||||
compile_block.cbits = tables + cbits_offset;
|
||||
compile_block.ctypes = tables + ctypes_offset;
|
||||
/* See if we can find a fixed set of initial characters for the pattern. */
|
||||
|
||||
/* See if we can find a fixed set of initial characters for the pattern. */
|
||||
memset(start_bits, 0, 32 * sizeof(uschar));
|
||||
bits_set = set_start_bits(code, start_bits,
|
||||
(re->options & PCRE_CASELESS) != 0, (re->options & PCRE_UTF8) != 0,
|
||||
&compile_block) == SSB_DONE;
|
||||
}
|
||||
|
||||
memset(start_bits, 0, 32 * sizeof(uschar));
|
||||
if (set_start_bits(code, start_bits, (re->options & PCRE_CASELESS) != 0,
|
||||
(re->options & PCRE_UTF8) != 0, &compile_block) != SSB_DONE) return NULL;
|
||||
/* Find the minimum length of subject string. */
|
||||
|
||||
min = find_minlength(code, code, re->options);
|
||||
|
||||
/* Return NULL if no optimization is possible. */
|
||||
|
||||
if (!bits_set && min < 0) return NULL;
|
||||
|
||||
/* Get a pcre_extra block and a pcre_study_data block. The study data is put in
|
||||
the latter, which is pointed to by the former, which may also get additional
|
||||
@ -577,8 +946,19 @@ extra->flags = PCRE_EXTRA_STUDY_DATA;
|
||||
extra->study_data = study;
|
||||
|
||||
study->size = sizeof(pcre_study_data);
|
||||
study->options = PCRE_STUDY_MAPPED;
|
||||
memcpy(study->start_bits, start_bits, sizeof(start_bits));
|
||||
study->flags = 0;
|
||||
|
||||
if (bits_set)
|
||||
{
|
||||
study->flags |= PCRE_STUDY_MAPPED;
|
||||
memcpy(study->start_bits, start_bits, sizeof(start_bits));
|
||||
}
|
||||
|
||||
if (min >= 0)
|
||||
{
|
||||
study->flags |= PCRE_STUDY_MINLEN;
|
||||
study->minlength = min;
|
||||
}
|
||||
|
||||
return extra;
|
||||
}
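
The flag handling at the end of pcre_study() can be illustrated in isolation. This is a minimal sketch, not the PCRE code: the toy_study structure, the TOY_STUDY_* bits and both functions are hypothetical names chosen for the example. It assumes that a consumer tests the flag bit before it reads the field the bit guards, and that a matcher could reject a subject shorter than the recorded minimum without running the match at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits, modelled on the two study results in the diff. */
#define TOY_STUDY_MAPPED 0x01u   /* start_bits is valid */
#define TOY_STUDY_MINLEN 0x02u   /* minlength is valid */

typedef struct toy_study {
  unsigned int flags;
  unsigned char start_bits[32];
  unsigned int minlength;
} toy_study;

/* Fills *out from whichever results are available. bits is NULL when no
starting-byte map was found; min is negative when no minimum is known.
Returns 0, leaving *out untouched, when neither result is available. */
int toy_fill_study(toy_study *out, const unsigned char *bits, int min)
{
  assert(out != NULL);
  if (bits == NULL && min < 0) return 0;

  memset(out, 0, sizeof(*out));
  if (bits != NULL)
    {
    out->flags |= TOY_STUDY_MAPPED;
    memcpy(out->start_bits, bits, sizeof(out->start_bits));
    }
  if (min >= 0)
    {
    out->flags |= TOY_STUDY_MINLEN;
    out->minlength = (unsigned int)min;
    }
  return 1;
}

/* Returns 1 when the subject cannot match because it is too short. */
int toy_too_short(const toy_study *study, size_t subject_length)
{
  return study != NULL &&
         (study->flags & TOY_STUDY_MINLEN) != 0 &&
         subject_length < study->minlength;
}
```

Keeping each result behind its own bit is what lets the function return useful data when only one of the two optimizations is available, instead of giving up as the old code did when no starting-byte map could be built.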
|
||||
|
@ -6,7 +6,7 @@
|
||||
and semantics are as close as possible to those of the Perl 5 language.
|
||||
|
||||
Written by Philip Hazel
|
||||
Copyright (c) 1997-2008 University of Cambridge
|
||||
Copyright (c) 1997-2009 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
@ -126,7 +126,9 @@ if (study != NULL)
|
||||
{
|
||||
*internal_study = *study; /* To copy other fields */
|
||||
internal_study->size = byteflip(study->size, sizeof(study->size));
|
||||
internal_study->options = byteflip(study->options, sizeof(study->options));
|
||||
internal_study->flags = byteflip(study->flags, sizeof(study->flags));
|
||||
internal_study->minlength = byteflip(study->minlength,
|
||||
sizeof(study->minlength));
|
||||
}
|
||||
|
||||
return internal_re;
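
The hunk above calls byteflip(), whose definition is not part of this section. As a rough sketch of what such a helper does, assuming it simply reverses the byte order of a 2-byte or 4-byte value, the following hypothetical toy_byteflip() may help; it is not claimed to be the PCRE implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Reverses the byte order of a 2-byte or 4-byte value held in a uint32_t.
n is the size in bytes of the field being flipped. */
uint32_t toy_byteflip(uint32_t value, int n)
{
  assert(n == 2 || n == 4);
  if (n == 2)
    return ((value & 0x00ffu) << 8) | ((value & 0xff00u) >> 8);
  return ((value & 0x000000ffu) << 24) |
         ((value & 0x0000ff00u) <<  8) |
         ((value & 0x00ff0000u) >>  8) |
         ((value & 0xff000000u) >> 24);
}
```

Flipping twice gives back the original value, which is why the same helper can serve for patterns saved on either kind of host. Every multi-byte field that is added to a saved structure needs its own flip, which is what the two new assignments for flags and minlength do.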
|
||||
|
@ -1,9 +1,26 @@
|
||||
#include "config.h"
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
/* Unicode character database. */
|
||||
/* This file was autogenerated by the MultiStage2.py script. */
|
||||
/* Total size: 52808 bytes, block size: 128. */
|
||||
|
||||
/* The tables herein are needed only when UCP support is built */
|
||||
/* into PCRE. This module should not be referenced otherwise, so */
|
||||
/* it should not matter whether it is compiled or not. However */
|
||||
/* a comment was received about space saving - maybe the guy linked */
|
||||
/* all the modules rather than using a library - so we include a */
|
||||
/* condition to cut out the tables when not needed. But don't leave */
|
||||
/* a totally empty module because some compilers barf at that. */
|
||||
/* Instead, just supply small dummy tables. */
|
||||
|
||||
#ifndef SUPPORT_UCP
|
||||
const ucd_record _pcre_ucd_records[] = {{0,0,0 }};
|
||||
const uschar _pcre_ucd_stage1[] = {0};
|
||||
const pcre_uint16 _pcre_ucd_stage2[] = {0};
|
||||
#else
|
||||
|
||||
/* When recompiling tables with a new Unicode version,
|
||||
please check types in the structure definition from pcre_internal.h:
|
||||
typedef struct {
|
||||
@ -2606,3 +2623,4 @@ const pcre_uint16 _pcre_ucd_stage2[] = { /* 40448 bytes, block = 128 */
|
||||
#if UCD_BLOCK_SIZE != 128
|
||||
#error Please correct UCD_BLOCK_SIZE in pcre_internal.h
|
||||
#endif
|
||||
#endif /* SUPPORT_UCP */
|
||||
|
@ -223,12 +223,12 @@ if (namecount <= 0) printf("No named substrings\n"); else
|
||||
* *
|
||||
* If the previous match WAS for an empty string, we can't do that, as it *
|
||||
* would lead to an infinite loop. Instead, a special call of pcre_exec() *
|
||||
* is made with the PCRE_NOTEMPTY and PCRE_ANCHORED flags set. The first *
|
||||
* of these tells PCRE that an empty string is not a valid match; other *
|
||||
* possibilities must be tried. The second flag restricts PCRE to one *
|
||||
* match attempt at the initial string position. If this match succeeds, *
|
||||
* an alternative to the empty string match has been found, and we can *
|
||||
* proceed round the loop. *
|
||||
* is made with the PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED flags set. *
|
||||
* The first of these tells PCRE that an empty string at the start of the *
|
||||
* subject is not a valid match; other possibilities must be tried. The *
|
||||
* second flag restricts PCRE to one match attempt at the initial string *
|
||||
* position. If this match succeeds, an alternative to the empty string *
|
||||
* match has been found, and we can proceed round the loop. *
|
||||
*************************************************************************/
|
||||
|
||||
if (!find_all)
|
||||
@ -251,7 +251,7 @@ for (;;)
|
||||
if (ovector[0] == ovector[1])
|
||||
{
|
||||
if (ovector[0] == subject_length) break;
|
||||
options = PCRE_NOTEMPTY | PCRE_ANCHORED;
|
||||
options = PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED;
|
||||
}
|
||||
|
||||
/* Run the next matching operation */
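
The loop described in the comment block can be tried out without PCRE. The sketch below is an assumption-laden illustration: toy_exec() is a hypothetical stand-in for pcre_exec() that is hard-wired to the pattern a*, and its two int parameters play the roles of PCRE_ANCHORED and PCRE_NOTEMPTY_ATSTART. Only the control flow around it mirrors the demo program.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for pcre_exec() with the fixed pattern a*. Returns 1 and sets
*mstart and *mend on a match, 0 on no match. */
static int toy_exec(const char *s, size_t len, size_t start,
                    int anchored, int notempty_atstart,
                    size_t *mstart, size_t *mend)
{
  size_t pos;
  for (pos = start; pos <= len; pos++)
    {
    size_t end = pos;
    while (end < len && s[end] == 'a') end++;
    /* An empty match is rejected only at the original starting offset. */
    if (!(notempty_atstart && end == pos && pos == start))
      {
      *mstart = pos;
      *mend = end;
      return 1;
      }
    if (anchored) break;           /* only one attempt when anchored */
    }
  return 0;
}

/* Counts all matches of a* in the subject, in the manner of Perl's /g. */
int toy_count_matches(const char *s, size_t len)
{
  size_t mstart = 0, mend = 0, start;
  int count, special;

  assert(s != NULL);
  if (!toy_exec(s, len, 0, 0, 0, &mstart, &mend)) return 0;
  count = 1;

  for (;;)
    {
    start = mend;                  /* continue where the last match ended */
    special = 0;
    if (mstart == mend)            /* previous match was empty */
      {
      if (mstart == len) break;    /* empty match at the end: all done */
      special = 1;                 /* retry here, anchored and non-empty */
      }

    if (!toy_exec(s, len, start, special, special, &mstart, &mend))
      {
      if (!special) break;         /* ordinary failure: no more matches */
      mend = start + 1;            /* special retry failed: step one on */
      continue;
      }
    count++;
    }
  return count;
}
```

For the subject baaac the loop finds an empty match at offset 0, aaa at offset 1, and empty matches at offsets 4 and 5, four in all. The sketch advances by one byte after a failed retry; handling of CRLF newlines and UTF-8 characters at that step is left out.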
|
||||
|
@ -68,64 +68,80 @@ static const int eint[] = {
|
||||
REG_EESCAPE, /* \c at end of pattern */
|
||||
REG_EESCAPE, /* unrecognized character follows \ */
|
||||
REG_BADBR, /* numbers out of order in {} quantifier */
|
||||
/* 5 */
|
||||
REG_BADBR, /* number too big in {} quantifier */
|
||||
REG_EBRACK, /* missing terminating ] for character class */
|
||||
REG_ECTYPE, /* invalid escape sequence in character class */
|
||||
REG_ERANGE, /* range out of order in character class */
|
||||
REG_BADRPT, /* nothing to repeat */
|
||||
/* 10 */
|
||||
REG_BADRPT, /* operand of unlimited repeat could match the empty string */
|
||||
REG_ASSERT, /* internal error: unexpected repeat */
|
||||
REG_BADPAT, /* unrecognized character after (? */
|
||||
REG_BADPAT, /* POSIX named classes are supported only within a class */
|
||||
REG_EPAREN, /* missing ) */
|
||||
/* 15 */
|
||||
REG_ESUBREG, /* reference to non-existent subpattern */
|
||||
REG_INVARG, /* erroffset passed as NULL */
|
||||
REG_INVARG, /* unknown option bit(s) set */
|
||||
REG_EPAREN, /* missing ) after comment */
|
||||
REG_ESIZE, /* parentheses nested too deeply */
|
||||
/* 20 */
|
||||
REG_ESIZE, /* regular expression too large */
|
||||
REG_ESPACE, /* failed to get memory */
|
||||
REG_EPAREN, /* unmatched brackets */
|
||||
REG_EPAREN, /* unmatched parentheses */
|
||||
REG_ASSERT, /* internal error: code overflow */
|
||||
REG_BADPAT, /* unrecognized character after (?< */
|
||||
/* 25 */
|
||||
REG_BADPAT, /* lookbehind assertion is not fixed length */
|
||||
REG_BADPAT, /* malformed number or name after (?( */
|
||||
REG_BADPAT, /* conditional group contains more than two branches */
|
||||
REG_BADPAT, /* assertion expected after (?( */
|
||||
REG_BADPAT, /* (?R or (?[+-]digits must be followed by ) */
|
||||
/* 30 */
|
||||
REG_ECTYPE, /* unknown POSIX class name */
|
||||
REG_BADPAT, /* POSIX collating elements are not supported */
|
||||
REG_INVARG, /* this version of PCRE is not compiled with PCRE_UTF8 support */
|
||||
REG_BADPAT, /* spare error */
|
||||
REG_BADPAT, /* character value in \x{...} sequence is too large */
|
||||
/* 35 */
|
||||
REG_BADPAT, /* invalid condition (?(0) */
|
||||
REG_BADPAT, /* \C not allowed in lookbehind assertion */
|
||||
REG_EESCAPE, /* PCRE does not support \L, \l, \N, \U, or \u */
|
||||
REG_BADPAT, /* number after (?C is > 255 */
|
||||
REG_BADPAT, /* closing ) for (?C expected */
|
||||
/* 40 */
|
||||
REG_BADPAT, /* recursive call could loop indefinitely */
|
||||
REG_BADPAT, /* unrecognized character after (?P */
|
||||
REG_BADPAT, /* syntax error in subpattern name (missing terminator) */
|
||||
REG_BADPAT, /* two named subpatterns have the same name */
|
||||
REG_BADPAT, /* invalid UTF-8 string */
|
||||
/* 45 */
|
||||
REG_BADPAT, /* support for \P, \p, and \X has not been compiled */
|
||||
REG_BADPAT, /* malformed \P or \p sequence */
|
||||
REG_BADPAT, /* unknown property name after \P or \p */
|
||||
REG_BADPAT, /* subpattern name is too long (maximum 32 characters) */
|
||||
REG_BADPAT, /* too many named subpatterns (maximum 10,000) */
|
||||
/* 50 */
|
||||
REG_BADPAT, /* repeated subpattern is too long */
|
||||
REG_BADPAT, /* octal value is greater than \377 (not in UTF-8 mode) */
|
||||
REG_BADPAT, /* internal error: overran compiling workspace */
|
||||
REG_BADPAT, /* internal error: previously-checked referenced subpattern not found */
|
||||
REG_BADPAT, /* DEFINE group contains more than one branch */
|
||||
/* 55 */
|
||||
REG_BADPAT, /* repeating a DEFINE group is not allowed */
|
||||
REG_INVARG, /* inconsistent NEWLINE options */
|
||||
REG_BADPAT, /* \g is not followed by an (optionally braced) non-zero number */
|
||||
REG_BADPAT, /* (?+ or (?- must be followed by a non-zero number */
|
||||
REG_BADPAT, /* a numbered reference must not be zero */
|
||||
REG_BADPAT, /* (*VERB) with an argument is not supported */
|
||||
/* 60 */
|
||||
REG_BADPAT, /* (*VERB) not recognized */
|
||||
REG_BADPAT, /* number is too big */
|
||||
REG_BADPAT, /* subpattern name expected */
|
||||
REG_BADPAT, /* digit expected after (?+ */
|
||||
REG_BADPAT /* ] is an invalid data character in JavaScript compatibility mode */
|
||||
REG_BADPAT, /* ] is an invalid data character in JavaScript compatibility mode */
|
||||
/* 65 */
|
||||
REG_BADPAT /* different names for subpatterns of the same number are not allowed */
|
||||
};
|
||||
|
||||
/* Table of texts corresponding to POSIX error codes */
|
||||
@ -224,17 +240,25 @@ int erroffset;
|
||||
int errorcode;
|
||||
int options = 0;
|
||||
|
||||
if ((cflags & REG_ICASE) != 0) options |= PCRE_CASELESS;
|
||||
if ((cflags & REG_NEWLINE) != 0) options |= PCRE_MULTILINE;
|
||||
if ((cflags & REG_DOTALL) != 0) options |= PCRE_DOTALL;
|
||||
if ((cflags & REG_NOSUB) != 0) options |= PCRE_NO_AUTO_CAPTURE;
|
||||
if ((cflags & REG_UTF8) != 0) options |= PCRE_UTF8;
|
||||
if ((cflags & REG_ICASE) != 0) options |= PCRE_CASELESS;
|
||||
if ((cflags & REG_NEWLINE) != 0) options |= PCRE_MULTILINE;
|
||||
if ((cflags & REG_DOTALL) != 0) options |= PCRE_DOTALL;
|
||||
if ((cflags & REG_NOSUB) != 0) options |= PCRE_NO_AUTO_CAPTURE;
|
||||
if ((cflags & REG_UTF8) != 0) options |= PCRE_UTF8;
|
||||
if ((cflags & REG_UNGREEDY) != 0) options |= PCRE_UNGREEDY;
|
||||
|
||||
preg->re_pcre = pcre_compile2(pattern, options, &errorcode, &errorptr,
|
||||
&erroffset, NULL);
|
||||
preg->re_erroffset = erroffset;
|
||||
|
||||
if (preg->re_pcre == NULL) return eint[errorcode];
|
||||
/* Safety: if the error code is too big for the translation vector (which
|
||||
should not happen, but we all make mistakes), return REG_BADPAT. */
|
||||
|
||||
if (preg->re_pcre == NULL)
|
||||
{
|
||||
return (errorcode < sizeof(eint)/sizeof(const int))?
|
||||
eint[errorcode] : REG_BADPAT;
|
||||
}
|
||||
|
||||
preg->re_nsub = pcre_info((const pcre *)preg->re_pcre, NULL, NULL);
|
||||
return 0;
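
The safety check added to regcomp() is an instance of a bounds-checked table lookup. A minimal sketch follows, with a hypothetical three-entry table and made-up TOY_* codes in place of the real eint[] vector and REG_* values.

```c
#include <stddef.h>

/* Hypothetical error codes for the example; not the POSIX REG_* values. */
enum { TOY_OK = 0, TOY_BADPAT = 1, TOY_EPAREN = 2, TOY_ESPACE = 3 };

/* Translation vector indexed by the compiler's error number. */
static const int toy_eint[] = { TOY_OK, TOY_EPAREN, TOY_ESPACE };

/* Translates an error number, falling back to TOY_BADPAT when the number
lies outside the table. */
int toy_translate(int errorcode)
{
  size_t n = sizeof(toy_eint) / sizeof(toy_eint[0]);
  if (errorcode < 0 || (size_t)errorcode >= n) return TOY_BADPAT;
  return toy_eint[errorcode];
}
```

The sketch tests for a negative index explicitly. The comparison in the diff relies on the usual arithmetic conversions instead: on typical platforms an int compared with a size_t is converted to unsigned, so a negative value becomes very large and also takes the fallback.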
|
||||
@ -276,10 +300,11 @@ if ((eflags & REG_NOTEMPTY) != 0) options |= PCRE_NOTEMPTY;
|
||||
|
||||
((regex_t *)preg)->re_erroffset = (size_t)(-1); /* Only has meaning after compile */
|
||||
|
||||
/* When no string data is being returned, ensure that nmatch is zero.
|
||||
Otherwise, ensure the vector for holding the return data is large enough. */
|
||||
/* When no string data is being returned, or no vector has been passed in which
|
||||
to put it, ensure that nmatch is zero. Otherwise, ensure the vector for holding
|
||||
the return data is large enough. */
|
||||
|
||||
if (nosub) nmatch = 0;
|
||||
if (nosub || pmatch == NULL) nmatch = 0;
|
||||
|
||||
else if (nmatch > 0)
|
||||
{
|
||||
|
@ -50,17 +50,18 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
/* Options, mostly defined by POSIX, but with a couple of extras. */
|
||||
/* Options, mostly defined by POSIX, but with some extras. */
|
||||
|
||||
#define REG_ICASE 0x0001
|
||||
#define REG_NEWLINE 0x0002
|
||||
#define REG_NOTBOL 0x0004
|
||||
#define REG_NOTEOL 0x0008
|
||||
#define REG_DOTALL 0x0010 /* NOT defined by POSIX. */
|
||||
#define REG_NOSUB 0x0020
|
||||
#define REG_UTF8 0x0040 /* NOT defined by POSIX. */
|
||||
#define REG_ICASE 0x0001 /* Maps to PCRE_CASELESS */
|
||||
#define REG_NEWLINE 0x0002 /* Maps to PCRE_MULTILINE */
|
||||
#define REG_NOTBOL 0x0004 /* Maps to PCRE_NOTBOL */
|
||||
#define REG_NOTEOL 0x0008 /* Maps to PCRE_NOTEOL */
|
||||
#define REG_DOTALL 0x0010 /* NOT defined by POSIX; maps to PCRE_DOTALL */
|
||||
#define REG_NOSUB 0x0020 /* Maps to PCRE_NO_AUTO_CAPTURE */
|
||||
#define REG_UTF8 0x0040 /* NOT defined by POSIX; maps to PCRE_UTF8 */
|
||||
#define REG_STARTEND 0x0080 /* BSD feature: pass subject string by so,eo */
|
||||
#define REG_NOTEMPTY 0x0100 /* NOT defined by POSIX. */
|
||||
#define REG_NOTEMPTY 0x0100 /* NOT defined by POSIX; maps to PCRE_NOTEMPTY */
|
||||
#define REG_UNGREEDY 0x0200 /* NOT defined by POSIX; maps to PCRE_UNGREEDY */
|
||||
|
||||
/* This is not used by PCRE, but by defining it we make it easier
|
||||
to slot PCRE into existing programs that make POSIX calls. */
|
||||
|
24
ext/pcre/pcrelib/testdata/grepoutput
vendored
@ -423,3 +423,27 @@ This time it [1;31mjumps[00m and [1;31mjumps[00m and [1;31mjumps[00m.
|
||||
Here is the [1;31mpattern[00m again.
|
||||
That time it was on a [1;31mline by itself[00m.
|
||||
This line contains [1;31mpattern[00m not on a [1;31mline by itself[00m.
|
||||
---------------------------- Test 55 -----------------------------
|
||||
./testdata/grepinput:456
|
||||
./testdata/grepinput8:0
|
||||
./testdata/grepinputv:1
|
||||
./testdata/grepinputx:0
|
||||
---------------------------- Test 56 -----------------------------
|
||||
./testdata/grepinput:456
|
||||
./testdata/grepinputv:1
|
||||
---------------------------- Test 57 -----------------------------
|
||||
PATTERN at the start of a line.
|
||||
In the middle of a line, PATTERN appears.
|
||||
Check up on PATTERN near the end.
|
||||
---------------------------- Test 58 -----------------------------
|
||||
PATTERN at the start of a line.
|
||||
In the middle of a line, PATTERN appears.
|
||||
Check up on PATTERN near the end.
|
||||
---------------------------- Test 59 -----------------------------
|
||||
PATTERN at the start of a line.
|
||||
In the middle of a line, PATTERN appears.
|
||||
Check up on PATTERN near the end.
|
||||
---------------------------- Test 60 -----------------------------
|
||||
PATTERN at the start of a line.
|
||||
In the middle of a line, PATTERN appears.
|
||||
Check up on PATTERN near the end.
|
||||
|
5
ext/pcre/pcrelib/testdata/testinput1
vendored
@ -1,3 +1,6 @@
|
||||
/-- This set of tests is for features that are compatible with all versions of
|
||||
Perl 5, in non-UTF-8 mode. --/
|
||||
|
||||
/the quick brown fox/
|
||||
the quick brown fox
|
||||
The quick brown FOX
|
||||
@ -4064,4 +4067,4 @@
|
||||
/^%((?(?=[a])[^%])|b)*%$/
|
||||
%ab%
|
||||
|
||||
/ End of testinput1 /
|
||||
/-- End of testinput1 --/
|
||||
|
2
ext/pcre/pcrelib/testdata/testinput10
vendored
@ -121,4 +121,4 @@ are all themselves checked in other tests. --/
|
||||
|
||||
/[^\xaa]/8BM
|
||||
|
||||
/ End of testinput10 /
|
||||
/-- End of testinput10 --/
|
||||
|
854
ext/pcre/pcrelib/testdata/testinput2
vendored
@ -1,3 +1,14 @@
|
||||
/-- This set of tests is not Perl-compatible. It checks on special features
|
||||
of PCRE's API, error diagnostics, and the compiled code of some patterns.
|
||||
It also checks the non-Perl syntax that PCRE supports (Python, .NET,
|
||||
Oniguruma). Finally, there are some tests where PCRE and Perl differ,
|
||||
either because PCRE can't be compatible, or there is a potential Perl
|
||||
bug. --/
|
||||
|
||||
/-- Originally, the Perl 5.10 things were in here too, but now I have separated
|
||||
many (most?) of them out into test 11. However, there may still be some
|
||||
that were overlooked. --/
|
||||
|
||||
/(a)b|/I
|
||||
|
||||
/abc/I
|
||||
@ -123,38 +134,38 @@
|
||||
defabc
|
||||
\Zdefabc
|
||||
|
||||
/abc/IP
|
||||
/abc/P
|
||||
abc
|
||||
*** Failers
|
||||
|
||||
/^abc|def/IP
|
||||
/^abc|def/P
|
||||
abcdef
|
||||
abcdef\B
|
||||
|
||||
/.*((abc)$|(def))/IP
|
||||
/.*((abc)$|(def))/P
|
||||
defabc
|
||||
\Zdefabc
|
||||
|
||||
/the quick brown fox/IP
|
||||
/the quick brown fox/P
|
||||
the quick brown fox
|
||||
*** Failers
|
||||
The Quick Brown Fox
|
||||
|
||||
/the quick brown fox/IPi
|
||||
/the quick brown fox/Pi
|
||||
the quick brown fox
|
||||
The Quick Brown Fox
|
||||
|
||||
/abc.def/IP
|
||||
/abc.def/P
|
||||
*** Failers
|
||||
abc\ndef
|
||||
|
||||
/abc$/IP
|
||||
/abc$/P
|
||||
abc
|
||||
abc\n
|
||||
|
||||
/(abc)\2/IP
|
||||
/(abc)\2/P
|
||||
|
||||
/(abc\1)/IP
|
||||
/(abc\1)/P
|
||||
abc
|
||||
|
||||
/)/
|
||||
@ -593,7 +604,7 @@
|
||||
*** Failers
|
||||
\Nabc
|
||||
|
||||
/a*(b+)(z)(z)/IP
|
||||
/a*(b+)(z)(z)/P
|
||||
aaaabbbbzzzz
|
||||
aaaabbbbzzzz\O0
|
||||
aaaabbbbzzzz\O1
|
||||
@ -1122,14 +1133,6 @@
|
||||
|
||||
/(a(?1)+b)/DZ
|
||||
|
||||
/^\W*(?:((.)\W*(?1)\W*\2|)|((.)\W*(?3)\W*\4|\W*.\W*))\W*$/Ii
|
||||
1221
|
||||
Satan, oscillate my metallic sonatas!
|
||||
A man, a plan, a canal: Panama!
|
||||
Able was I ere I saw Elba.
|
||||
*** Failers
|
||||
The quick brown fox
|
||||
|
||||
/^(\d+|\((?1)([+*-])(?1)\)|-(?1))$/I
|
||||
12
|
||||
(((2+2)*-3)-7)
|
||||
@ -1419,13 +1422,13 @@
|
||||
** Failers
|
||||
line one\nthis is a line\nbreak in the second line
|
||||
|
||||
/ab.cd/IP
|
||||
/ab.cd/P
|
||||
ab-cd
|
||||
ab=cd
|
||||
** Failers
|
||||
ab\ncd
|
||||
|
||||
/ab.cd/IPs
|
||||
/ab.cd/Ps
|
||||
ab-cd
|
||||
ab=cd
|
||||
ab\ncd
|
||||
@ -1480,10 +1483,10 @@
|
||||
(this)
|
||||
((this))
|
||||
|
||||
/a(b)c/IPN
|
||||
/a(b)c/PN
|
||||
abc
|
||||
|
||||
/a(?P<name>b)c/IPN
|
||||
/a(?P<name>b)c/PN
|
||||
abc
|
||||
|
||||
/\x{100}/I
|
||||
@ -1915,13 +1918,6 @@ a random value. /Ix
|
||||
/(?=(?'abc'\w+))\k<abc>:/I
|
||||
abcd:
|
||||
|
||||
/(?'abc'\w+):\k<abc>{2}/
|
||||
a:aaxyz
|
||||
ab:ababxyz
|
||||
** Failers
|
||||
a:axyz
|
||||
ab:abxyz
|
||||
|
||||
/(?'abc'a|b)(?<abc>d|e)\k<abc>{2}/J
|
||||
adaa
|
||||
** Failers
|
||||
@ -1934,10 +1930,6 @@ a random value. /Ix
|
||||
** Failers
|
||||
bddd
|
||||
|
||||
/^(?<ab>a)? (?(<ab>)b|c) (?('ab')d|e)/x
|
||||
abd
|
||||
ce
|
||||
|
||||
/(?(<bc))/
|
||||
|
||||
/(?(''))/
|
||||
@ -1955,16 +1947,6 @@ a random value. /Ix
|
||||
/(?<1> (?'B' abc (?(R) (?(R&1)1) (?(R&B)2) X | (?1) (?2) (?R) ))) /x
|
||||
abcabc1Xabc2XabcXabcabc
|
||||
|
||||
/^(?(DEFINE) (?<A> a) (?<B> b) ) (?&A) (?&B) /x
|
||||
abcd
|
||||
|
||||
/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
|
||||
(?(DEFINE)
|
||||
(?<NAME_PAT>[a-z]+)
|
||||
(?<ADDRESS_PAT>\d+)
|
||||
)/x
|
||||
metcalfe 33
|
||||
|
||||
/^(?(DEFINE) abc | xyz ) /x
|
||||
|
||||
/(?(DEFINE) abc) xyz/xI
|
||||
@ -2053,22 +2035,6 @@ a random value. /Ix
|
||||
/(?1)X(?<abc>P)/I
|
||||
abcPXP123
|
||||
|
||||
/(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))\b(?&byte)(\.(?&byte)){3}/
|
||||
1.2.3.4
|
||||
131.111.10.206
|
||||
10.0.0.0
|
||||
** Failers
|
||||
10.6
|
||||
455.3.4.5
|
||||
|
||||
/\b(?&byte)(\.(?&byte)){3}(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))/
|
||||
1.2.3.4
|
||||
131.111.10.206
|
||||
10.0.0.0
|
||||
** Failers
|
||||
10.6
|
||||
455.3.4.5
|
||||
|
||||
/(?:a(?&abc)b)*(?<abc>x)/
|
||||
123axbaxbaxbx456
|
||||
123axbaxbaxb456
|
||||
@ -2090,9 +2056,6 @@ a random value. /Ix
|
||||
defabcabcxyz
|
||||
DEFabcABCXYZ
|
||||
|
||||
/^(a(b))\1\g1\g{1}\g-1\g{-1}\g{-02}Z/
|
||||
ababababbbabZXXXX
|
||||
|
||||
/^(a)\g-2/
|
||||
|
||||
/^(a)\g/
|
||||
@ -2191,26 +2154,12 @@ a random value. /Ix
|
||||
/^(?(+1)X|Y)(.)/BZ
|
||||
Y!
|
||||
|
||||
/(foo)\Kbar/
|
||||
foobar
|
||||
|
||||
/(foo)(\Kbar|baz)/
|
||||
foobar
|
||||
foobaz
|
||||
|
||||
/(foo\Kbar)baz/
|
||||
foobarbaz
|
||||
|
||||
/(?<A>tom|bon)-\k{A}/
|
||||
tom-tom
|
||||
bon-bon
|
||||
** Failers
|
||||
tom-bon
|
||||
|
||||
/(?<A>tom|bon)-\g{A}/
|
||||
tom-tom
|
||||
bon-bon
|
||||
|
||||
/\g{A/
|
||||
|
||||
/(?|(abc)|(xyz))/BZ
|
||||
@ -2225,50 +2174,6 @@ a random value. /Ix
|
||||
xabcpqrx
|
||||
xxyzx
|
||||
|
||||
/(?|(abc)|(xyz))\1/
|
||||
abcabc
|
||||
xyzxyz
|
||||
** Failers
|
||||
abcxyz
|
||||
xyzabc
|
||||
|
||||
/(?|(abc)|(xyz))(?1)/
|
||||
abcabc
|
||||
xyzabc
|
||||
** Failers
|
||||
xyzxyz
|
||||
|
||||
/\H\h\V\v/
|
||||
X X\x0a
|
||||
X\x09X\x0b
|
||||
** Failers
|
||||
\xa0 X\x0a
|
||||
|
||||
/\H*\h+\V?\v{3,4}/
|
||||
\x09\x20\xa0X\x0a\x0b\x0c\x0d\x0a
|
||||
\x09\x20\xa0\x0a\x0b\x0c\x0d\x0a
|
||||
\x09\x20\xa0\x0a\x0b\x0c
|
||||
** Failers
|
||||
\x09\x20\xa0\x0a\x0b
|
||||
|
||||
/\H{3,4}/
|
||||
XY ABCDE
|
||||
XY PQR ST
|
||||
|
||||
/.\h{3,4}./
|
||||
XY AB PQRS
|
||||
|
||||
/\h*X\h?\H+Y\H?Z/
|
||||
>XNNNYZ
|
||||
> X NYQZ
|
||||
** Failers
|
||||
>XYZ
|
||||
> X NY Z
|
||||
|
||||
/\v*X\v?Y\v+Z\V*\x0a\V+\x0b\V{2,3}\x0c/
|
||||
>XY\x0aZ\x0aA\x0bNN\x0c
|
||||
>\x0a\x0dX\x0aY\x0a\x0bZZZ\x0aAAA\x0bNNN\x0c
|
||||
|
||||
/[\h]/BZ
|
||||
>\x09<
|
||||
|
||||
@ -2341,49 +2246,6 @@ a random value. /Ix
|
||||
|
||||
/A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/BZ
|
||||
|
||||
/^a+(*FAIL)/
|
||||
aaaaaa
|
||||
|
||||
/a+b?c+(*FAIL)/
|
||||
aaabccc
|
||||
|
||||
/a+b?(*PRUNE)c+(*FAIL)/
|
||||
aaabccc
|
||||
|
||||
/a+b?(*COMMIT)c+(*FAIL)/
|
||||
aaabccc
|
||||
|
||||
/a+b?(*SKIP)c+(*FAIL)/
|
||||
aaabcccaaabccc
|
||||
|
||||
/^(?:aaa(*THEN)\w{6}|bbb(*THEN)\w{5}|ccc(*THEN)\w{4}|\w{3})/
|
||||
aaaxxxxxx
|
||||
aaa++++++
|
||||
bbbxxxxx
|
||||
bbb+++++
|
||||
cccxxxx
|
||||
ccc++++
|
||||
dddddddd
|
||||
|
||||
/^(aaa(*THEN)\w{6}|bbb(*THEN)\w{5}|ccc(*THEN)\w{4}|\w{3})/
|
||||
aaaxxxxxx
|
||||
aaa++++++
|
||||
bbbxxxxx
|
||||
bbb+++++
|
||||
cccxxxx
|
||||
ccc++++
|
||||
dddddddd
|
||||
|
||||
/a+b?(*THEN)c+(*FAIL)/
|
||||
aaabccc
|
||||
|
||||
/(A (A|B(*ACCEPT)|C) D)(E)/x
|
||||
ABX
|
||||
AADE
|
||||
ACDE
|
||||
** Failers
|
||||
AD
|
||||
|
||||
/^a+(*FAIL)/C
|
||||
aaaaaa
|
||||
|
||||
@ -2589,66 +2451,8 @@ a random value. /Ix
|
||||
|
||||
/[[:a\dz:]]/
|
||||
|
||||
/^(?<name>a|b\g<name>c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/^(?<name>a|b\g'name'c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/^(a|b\g<1>c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/^(a|b\g'1'c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/^(a|b\g'-1'c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/(^(a|b\g<-1>c))/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/(^(a|b\g<-1'c))/
|
||||
|
||||
/(^(a|b\g{-1}))/
|
||||
bacxxx
|
||||
|
||||
/(?-i:\g<name>)(?i:(?<name>a))/
|
||||
XaaX
|
||||
XAAX
|
||||
|
||||
/(?i:\g<name>)(?-i:(?<name>a))/
|
||||
XaaX
|
||||
** Failers
|
||||
XAAX
|
||||
|
||||
/(?-i:\g<+1>)(?i:(a))/
|
||||
XaaX
|
||||
XAAX
|
||||
|
||||
/(?=(?<regex>(?#simplesyntax)\$(?<name>[a-zA-Z_\x{7f}-\x{ff}][a-zA-Z0-9_\x{7f}-\x{ff}]*)(?:\[(?<index>[a-zA-Z0-9_\x{7f}-\x{ff}]+|\$\g<name>)\]|->\g<name>(\(.*?\))?)?|(?#simple syntax withbraces)\$\{(?:\g<name>(?<indices>\[(?:\g<index>|'(?:\\.|[^'\\])*'|"(?:\g<regex>|\\.|[^"\\])*")\])?|\g<complex>|\$\{\g<complex>\})\}|(?#complexsyntax)\{(?<complex>\$(?<segment>\g<name>(\g<indices>*|\(.*?\))?)(?:->\g<segment>)*|\$\g<complex>|\$\{\g<complex>\})\}))\{/
|
||||
|
||||
/(?<n>a|b|c)\g<n>*/
|
||||
abc
|
||||
accccbbb
|
||||
|
||||
/^(?+1)(?<a>x|y){0}z/
|
||||
xzxx
|
||||
yzyy
|
||||
@ -2755,22 +2559,614 @@ a random value. /Ix
|
||||
/^"((?(?=[a])[^"])|b)*"$/
|
||||
"ab"
|
||||
|
||||
/^X(?5)(a)(?|(b)|(q))(c)(d)(Y)/
|
||||
XYabcdY
|
||||
|
||||
/^X(?5)(a)(?|(b)|(q))(c)(d)Y/
|
||||
XYabcdY
|
||||
|
||||
/^X(?&N)(a)(?|(b)|(q))(c)(d)(?<N>Y)/
|
||||
XYabcdY
|
||||
|
||||
/Xa{2,4}b/
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/Xa{2,4}?b/
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/Xa{2,4}+b/
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/X\d{2,4}b/
|
||||
X\P
|
||||
X3\P
|
||||
X33\P
|
||||
X333\P
|
||||
X3333\P
|
||||
|
||||
/X\d{2,4}?b/
|
||||
X\P
|
||||
X3\P
|
||||
X33\P
|
||||
X333\P
|
||||
X3333\P
|
||||
|
||||
/X\d{2,4}+b/
|
||||
X\P
|
||||
X3\P
|
||||
X33\P
|
||||
X333\P
|
||||
X3333\P
|
||||
|
||||
/X\D{2,4}b/
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/X\D{2,4}?b/
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/X\D{2,4}+b/
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/X[abc]{2,4}b/
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/X[abc]{2,4}?b/
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/X[abc]{2,4}+b/
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/X[^a]{2,4}b/
|
||||
X\P
|
||||
Xz\P
|
||||
Xzz\P
|
||||
Xzzz\P
|
||||
Xzzzz\P
|
||||
|
||||
/X[^a]{2,4}?b/
|
||||
X\P
|
||||
Xz\P
|
||||
Xzz\P
|
||||
Xzzz\P
|
||||
Xzzzz\P
|
||||
|
||||
/X[^a]{2,4}+b/
|
||||
X\P
|
||||
Xz\P
|
||||
Xzz\P
|
||||
Xzzz\P
|
||||
Xzzzz\P
|
||||
|
||||
/(Y)X\1{2,4}b/
|
||||
YX\P
|
||||
YXY\P
|
||||
YXYY\P
|
||||
YXYYY\P
|
||||
YXYYYY\P
|
||||
|
||||
/(Y)X\1{2,4}?b/
|
||||
YX\P
|
||||
YXY\P
|
||||
YXYY\P
|
||||
YXYYY\P
|
||||
YXYYYY\P
|
||||
|
||||
/(Y)X\1{2,4}+b/
|
||||
YX\P
|
||||
YXY\P
|
||||
YXYY\P
|
||||
YXYYY\P
|
||||
YXYYYY\P
|
||||
|
||||
/\++\KZ|\d+X|9+Y/
|
||||
++++123999\P
|
||||
++++123999Y\P
|
||||
++++Z1234\P
|
||||
|
||||
/Z(*F)/
|
||||
Z\P
|
||||
ZA\P
|
||||
|
||||
/Z(?!)/
|
||||
Z\P
|
||||
ZA\P
|
||||
|
||||
/dog(sbody)?/
|
||||
dogs\P
|
||||
dogs\P\P
|
||||
|
||||
/dog(sbody)??/
|
||||
dogs\P
|
||||
dogs\P\P
|
||||
|
||||
/dog|dogsbody/
|
||||
dogs\P
|
||||
dogs\P\P
|
||||
|
||||
/dogsbody|dog/
|
||||
dogs\P
|
||||
dogs\P\P
|
||||
|
||||
/\bthe cat\b/
|
||||
the cat\P
|
||||
the cat\P\P
|
||||
|
||||
/abc/
|
||||
abc\P
|
||||
abc\P\P
|
||||
|
||||
/\w+A/P
|
||||
CDAAAAB
|
||||
|
||||
/\w+A/PU
|
||||
CDAAAAB
|
||||
|
||||
/abc\K123/
|
||||
xyzabc123pqr
|
||||
xyzabc12\P
|
||||
xyzabc12\P\P
|
||||
|
||||
/(?<=abc)123/
|
||||
xyzabc123pqr
|
||||
xyzabc12\P
|
||||
xyzabc12\P\P
|
||||
|
||||
/\babc\b/
|
||||
+++abc+++
|
||||
+++ab\P
|
||||
+++ab\P\P
|
||||
|
||||
/(?&word)(?&element)(?(DEFINE)(?<element><[^m][^>]>[^<])(?<word>\w*+))/BZ
|
||||
|
||||
/(?&word)(?&element)(?(DEFINE)(?<element><[^\d][^>]>[^<])(?<word>\w*+))/BZ
|
||||
|
||||
/(ab)(x(y)z(cd(*ACCEPT)))pq/BZ
|
||||
|
||||
/abc\K/+
|
||||
abcdef
|
||||
abcdef\N\N
|
||||
xyzabcdef\N\N
|
||||
** Failers
|
||||
abcdef\N
|
||||
xyzabcdef\N
|
||||
|
||||
/^(?:(?=abc)|abc\K)/+
|
||||
abcdef
|
||||
abcdef\N\N
|
||||
** Failers
|
||||
abcdef\N
|
||||
|
||||
/a?b?/+
|
||||
xyz
|
||||
xyzabc
|
||||
xyzabc\N
|
||||
xyzabc\N\N
|
||||
xyz\N\N
|
||||
** Failers
|
||||
xyz\N
|
||||
|
||||
/^a?b?/+
|
||||
xyz
|
||||
xyzabc
|
||||
** Failers
|
||||
xyzabc\N
|
||||
xyzabc\N\N
|
||||
xyz\N\N
|
||||
xyz\N
|
||||
|
||||
/^(?<name>a|b\g<name>c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/^(?<name>a|b\g'name'c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/^(a|b\g<1>c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/^(a|b\g'1'c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/^(a|b\g'-1'c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/(^(a|b\g<-1>c))/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/(?-i:\g<name>)(?i:(?<name>a))/
|
||||
XaaX
|
||||
XAAX
|
||||
|
||||
/(?i:\g<name>)(?-i:(?<name>a))/
|
||||
XaaX
|
||||
** Failers
|
||||
XAAX
|
||||
|
||||
/(?-i:\g<+1>)(?i:(a))/
|
||||
XaaX
|
||||
XAAX
|
||||
|
||||
/(?=(?<regex>(?#simplesyntax)\$(?<name>[a-zA-Z_\x{7f}-\x{ff}][a-zA-Z0-9_\x{7f}-\x{ff}]*)(?:\[(?<index>[a-zA-Z0-9_\x{7f}-\x{ff}]+|\$\g<name>)\]|->\g<name>(\(.*?\))?)?|(?#simple syntax withbraces)\$\{(?:\g<name>(?<indices>\[(?:\g<index>|'(?:\\.|[^'\\])*'|"(?:\g<regex>|\\.|[^"\\])*")\])?|\g<complex>|\$\{\g<complex>\})\}|(?#complexsyntax)\{(?<complex>\$(?<segment>\g<name>(\g<indices>*|\(.*?\))?)(?:->\g<segment>)*|\$\g<complex>|\$\{\g<complex>\})\}))\{/
|
||||
|
||||
/(?<n>a|b|c)\g<n>*/
|
||||
abc
|
||||
accccbbb
|
||||
|
||||
/^X(?7)(a)(?|(b)|(q)(r)(s))(c)(d)(Y)/
|
||||
XYabcdY
|
||||
|
||||
/^X(?7)(a)(?|(b|(r)(s))|(q))(c)(d)(Y)/
|
||||
XYabcdY
|
||||
/(?<=b(?1)|zzz)(a)/
|
||||
xbaax
|
||||
xzzzax
|
||||
|
||||
/^X(?7)(a)(?|(b|(?|(r)|(t))(s))|(q))(c)(d)(Y)/
|
||||
XYabcdY
|
||||
/(a)(?<=b\1)/
|
||||
|
||||
/ End of testinput2 /
|
||||
/(a)(?<=b+(?1))/
|
||||
|
||||
/(a+)(?<=b(?1))/
|
||||
|
||||
/(a(?<=b(?1)))/
|
||||
|
||||
/(?<=b(?1))xyz/
|
||||
|
||||
/(?<=b(?1))xyz(b+)pqrstuvew/
|
||||
|
||||
/(a|bc)\1/SI
|
||||
|
||||
/(a|bc)\1{2,3}/SI
|
||||
|
||||
/(a|bc)(?1)/SI
|
||||
|
||||
/(a|b\1)(a|b\1)/SI
|
||||
|
||||
/(a|b\1){2}/SI
|
||||
|
||||
/(a|bbbb\1)(a|bbbb\1)/SI
|
||||
|
||||
/(a|bbbb\1){2}/SI
|
||||
|
||||
/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/SI
|
||||
|
||||
/ (?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* # optional leading comment
|
||||
(?: (?:
|
||||
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
|
||||
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
||||
|
|
||||
" (?: # opening quote...
|
||||
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
|
||||
| # or
|
||||
\\ [^\x80-\xff] # Escaped something (something != CR)
|
||||
)* " # closing quote
|
||||
) # initial word
|
||||
(?: (?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* \. (?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* (?:
|
||||
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
|
||||
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
||||
|
|
||||
" (?: # opening quote...
|
||||
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
|
||||
| # or
|
||||
\\ [^\x80-\xff] # Escaped something (something != CR)
|
||||
)* " # closing quote
|
||||
) )* # further okay, if led by a period
|
||||
(?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* @ (?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* (?:
|
||||
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
|
||||
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
||||
| \[ # [
|
||||
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
|
||||
\] # ]
|
||||
) # initial subdomain
|
||||
(?: #
|
||||
(?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* \. # if led by a period...
|
||||
(?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* (?:
|
||||
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
|
||||
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
||||
| \[ # [
|
||||
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
|
||||
\] # ]
|
||||
) # ...further okay
|
||||
)*
|
||||
# address
|
||||
| # or
|
||||
(?:
|
||||
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
|
||||
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
||||
|
|
||||
" (?: # opening quote...
|
||||
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
|
||||
| # or
|
||||
\\ [^\x80-\xff] # Escaped something (something != CR)
|
||||
)* " # closing quote
|
||||
) # one word, optionally followed by....
|
||||
(?:
|
||||
[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037] | # atom and space parts, or...
|
||||
\(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) | # comments, or...
|
||||
|
||||
" (?: # opening quote...
|
||||
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
|
||||
| # or
|
||||
\\ [^\x80-\xff] # Escaped something (something != CR)
|
||||
)* " # closing quote
|
||||
# quoted strings
|
||||
)*
|
||||
< (?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* # leading <
|
||||
(?: @ (?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* (?:
|
||||
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
|
||||
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
||||
| \[ # [
|
||||
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
|
||||
\] # ]
|
||||
) # initial subdomain
|
||||
(?: #
|
||||
(?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* \. # if led by a period...
|
||||
(?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* (?:
|
||||
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
|
||||
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
||||
| \[ # [
|
||||
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
|
||||
\] # ]
|
||||
) # ...further okay
|
||||
)*
|
||||
|
||||
(?: (?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* , (?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* @ (?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* (?:
|
||||
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
|
||||
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
||||
| \[ # [
|
||||
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
|
||||
\] # ]
|
||||
) # initial subdomain
|
||||
(?: #
|
||||
(?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* \. # if led by a period...
|
||||
(?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* (?:
|
||||
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
|
||||
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
||||
| \[ # [
|
||||
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
|
||||
\] # ]
|
||||
) # ...further okay
|
||||
)*
|
||||
)* # further okay, if led by comma
|
||||
: # closing colon
|
||||
(?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* )? # optional route
|
||||
(?:
|
||||
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
|
||||
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
||||
|
|
||||
" (?: # opening quote...
|
||||
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
|
||||
| # or
|
||||
\\ [^\x80-\xff] # Escaped something (something != CR)
|
||||
)* " # closing quote
|
||||
) # initial word
|
||||
(?: (?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* \. (?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* (?:
|
||||
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
|
||||
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
||||
|
|
||||
" (?: # opening quote...
|
||||
[^\\\x80-\xff\n\015"] # Anything except backslash and quote
|
||||
| # or
|
||||
\\ [^\x80-\xff] # Escaped something (something != CR)
|
||||
)* " # closing quote
|
||||
) )* # further okay, if led by a period
|
||||
(?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* @ (?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* (?:
|
||||
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
|
||||
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
||||
| \[ # [
|
||||
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
|
||||
\] # ]
|
||||
) # initial subdomain
|
||||
(?: #
|
||||
(?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* \. # if led by a period...
|
||||
(?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* (?:
|
||||
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters...
|
||||
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
|
||||
| \[ # [
|
||||
(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff
|
||||
\] # ]
|
||||
) # ...further okay
|
||||
)*
|
||||
# address spec
|
||||
(?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* > # trailing >
|
||||
# name and address
|
||||
) (?: [\040\t] | \(
|
||||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* # optional trailing comment
|
||||
/xSI
|
||||
|
||||
/<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/isIS
|
||||
|
||||
"(?>.*/)foo"SI
|
||||
|
||||
/(?(?=[^a-z]+[a-z]) \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} ) /xSI
|
||||
|
||||
/(?:(?:(?:(?:(?:(?:(?:(?:(?:(a|b|c))))))))))/iSI
|
||||
|
||||
/(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))/SI
|
||||
|
||||
/<a[\s]+href[\s]*=[\s]* # find <a href=
|
||||
([\"\'])? # find single or double quote
|
||||
(?(1) (.*?)\1 | ([^\s]+)) # if quote found, match up to next matching
|
||||
# quote, otherwise match up to next space
|
||||
/isxSI
|
||||
|
||||
/^(?!:) # colon disallowed at start
|
||||
(?: # start of item
|
||||
(?: [0-9a-f]{1,4} | # 1-4 hex digits or
|
||||
(?(1)0 | () ) ) # if null previously matched, fail; else null
|
||||
: # followed by colon
|
||||
){1,7} # end item; 1-7 of them required
|
||||
[0-9a-f]{1,4} $ # final hex number at end of string
|
||||
(?(1)|.) # check that there was an empty component
|
||||
/xiIS
|
||||
|
||||
/(?|(?<a>A)|(?<a>B))/I
|
||||
AB\Ca
|
||||
BA\Ca
|
||||
|
||||
/(?|(?<a>A)|(?<b>B))/
|
||||
|
||||
/(?:a(?<quote> (?<apostrophe>')|(?<realquote>")) |
|
||||
b(?<quote> (?<apostrophe>')|(?<realquote>")) )
|
||||
(?('quote')[a-z]+|[0-9]+)/JIx
|
||||
a"aaaaa
|
||||
b"aaaaa
|
||||
** Failers
|
||||
b"11111
|
||||
a"11111
|
||||
|
||||
/^(?|(a)(b)(c)(?<D>d)|(?<D>e)) (?('D')X|Y)/JDZx
|
||||
abcdX
|
||||
eX
|
||||
** Failers
|
||||
abcdY
|
||||
ey
|
||||
|
||||
/(?<A>a) (b)(c) (?<A>d (?(R&A)$ | (?4)) )/JDZx
|
||||
abcdd
|
||||
** Failers
|
||||
abcdde
|
||||
|
||||
/abcd*/
|
||||
xxxxabcd\P
|
||||
xxxxabcd\P\P
|
||||
|
||||
/abcd*/i
|
||||
xxxxabcd\P
|
||||
xxxxabcd\P\P
|
||||
XXXXABCD\P
|
||||
XXXXABCD\P\P
|
||||
|
||||
/abc\d*/
|
||||
xxxxabc1\P
|
||||
xxxxabc1\P\P
|
||||
|
||||
/(a)bc\1*/
|
||||
xxxxabca\P
|
||||
xxxxabca\P\P
|
||||
|
||||
/abc[de]*/
|
||||
xxxxabcde\P
|
||||
xxxxabcde\P\P
|
||||
|
||||
/-- This is not in the Perl 5.10 test because Perl seems currently to be broken
|
||||
and not behaving as specified in that it *does* bumpalong after hitting
|
||||
(*COMMIT). --/
|
||||
|
||||
/(?1)(A(*COMMIT)|B)D/
|
||||
ABD
|
||||
XABD
|
||||
BAD
|
||||
ABXABD
|
||||
** Failers
|
||||
ABX
|
||||
BAXBAD
|
||||
|
||||
/(\3)(\1)(a)/<JS>
|
||||
cat
|
||||
|
||||
/(\3)(\1)(a)/SI<JS>
|
||||
cat
|
||||
|
||||
/(\3)(\1)(a)/SI
|
||||
cat
|
||||
|
||||
/-- End of testinput2 --/
|
||||
|
6
ext/pcre/pcrelib/testdata/testinput3
vendored
@ -1,3 +1,7 @@
|
||||
/-- This set of tests checks locale-specific features, using the fr_FR locale.
|
||||
It is not Perl-compatible. There is a different version called wintestinput3
|
||||
for use on Windows, where the locale is called "french". --/
|
||||
|
||||
/^[\w]+/
|
||||
*** Failers
|
||||
École
|
||||
@ -88,4 +92,4 @@
|
||||
|
||||
/[[:alpha:]][[:lower:]][[:upper:]]/DZLfr_FR
|
||||
|
||||
/ End of testinput3 /
|
||||
/-- End of testinput3 --/
|
||||
|
27
ext/pcre/pcrelib/testdata/testinput4
vendored
@ -1,7 +1,6 @@
|
||||
/-- Do not use the \x{} construct except with patterns that have the --/
|
||||
/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
|
||||
/-- that option is set. However, the latest Perls recognize them always. --/
|
||||
|
||||
/-- This set of tests is for UTF-8 support, excluding Unicode properties. It is
|
||||
compatible with all versions of Perl 5. --/
|
||||
|
||||
/a.b/8
|
||||
acb
|
||||
a\x7fb
|
||||
@ -623,4 +622,22 @@
|
||||
|
||||
/(?i)[\xc3\xa9\xc3\xbd]|[\xc3\xa9\xc3\xbdA]/8
|
||||
|
||||
/ End of testinput4 /
|
||||
/^[a\x{c0}]b/8
|
||||
\x{c0}b
|
||||
|
||||
/^([a\x{c0}]*?)aa/8
|
||||
a\x{c0}aaaa/
|
||||
|
||||
/^([a\x{c0}]*?)aa/8
|
||||
a\x{c0}aaaa/
|
||||
a\x{c0}a\x{c0}aaa/
|
||||
|
||||
/^([a\x{c0}]*)aa/8
|
||||
a\x{c0}aaaa/
|
||||
a\x{c0}a\x{c0}aaa/
|
||||
|
||||
/^([a\x{c0}]*)a\x{c0}/8
|
||||
a\x{c0}aaaa/
|
||||
a\x{c0}a\x{c0}aaa/
|
||||
|
||||
/-- End of testinput4 --/
|
||||
|
307
ext/pcre/pcrelib/testdata/testinput5
vendored
@ -1,3 +1,6 @@
|
||||
/-- This set of tests checks the API, internals, and non-Perl stuff for UTF-8
|
||||
support, excluding Unicode properties. --/
|
||||
|
||||
/\x{100}/8DZ
|
||||
|
||||
/\x{1000}/8DZ
|
||||
@ -53,30 +56,6 @@
|
||||
/.{3,5}?/DZ8
|
||||
\x{212ab}\x{212ab}\x{212ab}\x{861}
|
||||
|
||||
/-- These tests are here rather than in testinput4 because Perl 5.6 has some
|
||||
problems with UTF-8 support, in the area of \x{..} where the value is < 255.
|
||||
It grumbles about invalid UTF-8 strings. --/
|
||||
|
||||
/^[a\x{c0}]b/8
|
||||
\x{c0}b
|
||||
|
||||
/^([a\x{c0}]*?)aa/8
|
||||
a\x{c0}aaaa/
|
||||
|
||||
/^([a\x{c0}]*?)aa/8
|
||||
a\x{c0}aaaa/
|
||||
a\x{c0}a\x{c0}aaa/
|
||||
|
||||
/^([a\x{c0}]*)aa/8
|
||||
a\x{c0}aaaa/
|
||||
a\x{c0}a\x{c0}aaa/
|
||||
|
||||
/^([a\x{c0}]*)a\x{c0}/8
|
||||
a\x{c0}aaaa/
|
||||
a\x{c0}a\x{c0}aaa/
|
||||
|
||||
/-- --/
|
||||
|
||||
/(?<=\C)X/8
|
||||
Should produce an error diagnostic
|
||||
|
||||
@ -485,4 +464,282 @@ can't tell the difference.) --/
|
||||
|
||||
/(*CRLF)(*UTF8)(*BSR_UNICODE)a\Rb/I
|
||||
|
||||
/ End of testinput5 /
|
||||
/Xa{2,4}b/8
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/Xa{2,4}?b/8
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/Xa{2,4}+b/8
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/X\x{123}{2,4}b/8
|
||||
X\P
|
||||
X\x{123}\P
|
||||
X\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/X\x{123}{2,4}?b/8
|
||||
X\P
|
||||
X\x{123}\P
|
||||
X\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/X\x{123}{2,4}+b/8
|
||||
X\P
|
||||
X\x{123}\P
|
||||
X\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/X\x{123}{2,4}b/8
|
||||
Xx\P
|
||||
X\x{123}x\P
|
||||
X\x{123}\x{123}x\P
|
||||
X\x{123}\x{123}\x{123}x\P
|
||||
X\x{123}\x{123}\x{123}\x{123}x\P
|
||||
|
||||
/X\x{123}{2,4}?b/8
|
||||
Xx\P
|
||||
X\x{123}x\P
|
||||
X\x{123}\x{123}x\P
|
||||
X\x{123}\x{123}\x{123}x\P
|
||||
X\x{123}\x{123}\x{123}\x{123}x\P
|
||||
|
||||
/X\x{123}{2,4}+b/8
|
||||
Xx\P
|
||||
X\x{123}x\P
|
||||
X\x{123}\x{123}x\P
|
||||
X\x{123}\x{123}\x{123}x\P
|
||||
X\x{123}\x{123}\x{123}\x{123}x\P
|
||||
|
||||
/X\d{2,4}b/8
|
||||
X\P
|
||||
X3\P
|
||||
X33\P
|
||||
X333\P
|
||||
X3333\P
|
||||
|
||||
/X\d{2,4}?b/8
|
||||
X\P
|
||||
X3\P
|
||||
X33\P
|
||||
X333\P
|
||||
X3333\P
|
||||
|
||||
/X\d{2,4}+b/8
|
||||
X\P
|
||||
X3\P
|
||||
X33\P
|
||||
X333\P
|
||||
X3333\P
|
||||
|
||||
/X\D{2,4}b/8
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/X\D{2,4}?b/8
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/X\D{2,4}+b/8
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/X\D{2,4}b/8
|
||||
X\P
|
||||
X\x{123}\P
|
||||
X\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/X\D{2,4}?b/8
|
||||
X\P
|
||||
X\x{123}\P
|
||||
X\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/X\D{2,4}+b/8
|
||||
X\P
|
||||
X\x{123}\P
|
||||
X\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/X[abc]{2,4}b/8
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/X[abc]{2,4}?b/8
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/X[abc]{2,4}+b/8
|
||||
X\P
|
||||
Xa\P
|
||||
Xaa\P
|
||||
Xaaa\P
|
||||
Xaaaa\P
|
||||
|
||||
/X[abc\x{123}]{2,4}b/8
|
||||
X\P
|
||||
X\x{123}\P
|
||||
X\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/X[abc\x{123}]{2,4}?b/8
|
||||
X\P
|
||||
X\x{123}\P
|
||||
X\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/X[abc\x{123}]{2,4}+b/8
|
||||
X\P
|
||||
X\x{123}\P
|
||||
X\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/X[^a]{2,4}b/8
|
||||
X\P
|
||||
Xz\P
|
||||
Xzz\P
|
||||
Xzzz\P
|
||||
Xzzzz\P
|
||||
|
||||
/X[^a]{2,4}?b/8
|
||||
X\P
|
||||
Xz\P
|
||||
Xzz\P
|
||||
Xzzz\P
|
||||
Xzzzz\P
|
||||
|
||||
/X[^a]{2,4}+b/8
|
||||
X\P
|
||||
Xz\P
|
||||
Xzz\P
|
||||
Xzzz\P
|
||||
Xzzzz\P
|
||||
|
||||
/X[^a]{2,4}b/8
|
||||
X\P
|
||||
X\x{123}\P
|
||||
X\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/X[^a]{2,4}?b/8
|
||||
X\P
|
||||
X\x{123}\P
|
||||
X\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/X[^a]{2,4}+b/8
|
||||
X\P
|
||||
X\x{123}\P
|
||||
X\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/(Y)X\1{2,4}b/8
|
||||
YX\P
|
||||
YXY\P
|
||||
YXYY\P
|
||||
YXYYY\P
|
||||
YXYYYY\P
|
||||
|
||||
/(Y)X\1{2,4}?b/8
|
||||
YX\P
|
||||
YXY\P
|
||||
YXYY\P
|
||||
YXYYY\P
|
||||
YXYYYY\P
|
||||
|
||||
/(Y)X\1{2,4}+b/8
|
||||
YX\P
|
||||
YXY\P
|
||||
YXYY\P
|
||||
YXYYY\P
|
||||
YXYYYY\P
|
||||
|
||||
/(\x{123})X\1{2,4}b/8
|
||||
\x{123}X\P
|
||||
\x{123}X\x{123}\P
|
||||
\x{123}X\x{123}\x{123}\P
|
||||
\x{123}X\x{123}\x{123}\x{123}\P
|
||||
\x{123}X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/(\x{123})X\1{2,4}?b/8
|
||||
\x{123}X\P
|
||||
\x{123}X\x{123}\P
|
||||
\x{123}X\x{123}\x{123}\P
|
||||
\x{123}X\x{123}\x{123}\x{123}\P
|
||||
\x{123}X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/(\x{123})X\1{2,4}+b/8
|
||||
\x{123}X\P
|
||||
\x{123}X\x{123}\P
|
||||
\x{123}X\x{123}\x{123}\P
|
||||
\x{123}X\x{123}\x{123}\x{123}\P
|
||||
\x{123}X\x{123}\x{123}\x{123}\x{123}\P
|
||||
|
||||
/\bthe cat\b/8
|
||||
the cat\P
|
||||
the cat\P\P
|
||||
|
||||
/abcd*/8
|
||||
xxxxabcd\P
|
||||
xxxxabcd\P\P
|
||||
|
||||
/abcd*/i8
|
||||
xxxxabcd\P
|
||||
xxxxabcd\P\P
|
||||
XXXXABCD\P
|
||||
XXXXABCD\P\P
|
||||
|
||||
/abc\d*/8
|
||||
xxxxabc1\P
|
||||
xxxxabc1\P\P
|
||||
|
||||
/(a)bc\1*/8
|
||||
xxxxabca\P
|
||||
xxxxabca\P\P
|
||||
|
||||
/abc[de]*/8
|
||||
xxxxabcde\P
|
||||
xxxxabcde\P\P
|
||||
|
||||
/-- End of testinput5 --/
|
||||
|
205
ext/pcre/pcrelib/testdata/testinput6
vendored
@ -1,3 +1,7 @@
|
||||
/-- This set of tests is for Unicode property support. It is compatible with
|
||||
Perl 5.10, but not 5.8 because it tests some extra properties that are
|
||||
not in the earlier release. --/
|
||||
|
||||
/^\pC\pL\pM\pN\pP\pS\pZ</8
|
||||
\x7f\x{c0}\x{30f}\x{660}\x{66c}\x{f01}\x{1680}<
|
||||
\np\x{300}9!\$ <
|
||||
@ -60,11 +64,6 @@
|
||||
** Failers
|
||||
\x{09f}
|
||||
|
||||
/^\p{Cs}/8
|
||||
\?\x{dfff}
|
||||
** Failers
|
||||
\x{09f}
|
||||
|
||||
/^\p{Ll}/8
|
||||
a
|
||||
** Failers
|
||||
@ -199,13 +198,6 @@
|
||||
}
|
||||
\x{f3b}
|
||||
|
||||
/^\p{Sc}+/8
|
||||
$\x{a2}\x{a3}\x{a4}\x{a5}\x{a6}
|
||||
\x{9f2}
|
||||
** Failers
|
||||
X
|
||||
\x{2c2}
|
||||
|
||||
/^\p{Sk}/8
|
||||
\x{2c2}
|
||||
** Failers
|
||||
@ -237,17 +229,6 @@
|
||||
X
|
||||
\x{2028}
|
||||
|
||||
/^\p{Zs}/8
|
||||
\ \
|
||||
\x{a0}
|
||||
\x{1680}
|
||||
\x{180e}
|
||||
\x{2000}
|
||||
\x{2001}
|
||||
** Failers
|
||||
\x{2028}
|
||||
\x{200d}
|
||||
|
||||
/\p{Nd}+(..)/8
|
||||
\x{660}\x{661}\x{662}ABC
|
||||
|
||||
@ -291,23 +272,6 @@
|
||||
** Failers
|
||||
\x{660}\x{661}\x{662}ABC
|
||||
|
||||
/\p{Lu}/8i
|
||||
A
|
||||
a\x{10a0}B
|
||||
** Failers
|
||||
a
|
||||
\x{1d00}
|
||||
|
||||
/\p{^Lu}/8i
|
||||
1234
|
||||
** Failers
|
||||
ABC
|
||||
|
||||
/\P{Lu}/8i
|
||||
1234
|
||||
** Failers
|
||||
ABC
|
||||
|
||||
/(?<=A\p{Nd})XYZ/8
|
||||
A2XYZ
|
||||
123A5XYZPQR
|
||||
@ -323,26 +287,6 @@
|
||||
** Failers
|
||||
WXYZ
|
||||
|
||||
/[\p{L}]/DZ
|
||||
|
||||
/[\p{^L}]/DZ
|
||||
|
||||
/[\P{L}]/DZ
|
||||
|
||||
/[\P{^L}]/DZ
|
||||
|
||||
/[abc\p{L}\x{0660}]/8DZ
|
||||
|
||||
/[\p{Nd}]/8DZ
|
||||
1234
|
||||
|
||||
/[\p{Nd}+-]+/8DZ
|
||||
1234
|
||||
12-34
|
||||
12+\x{661}-34
|
||||
** Failers
|
||||
abcd
|
||||
|
||||
/[\P{Nd}]+/8
|
||||
abcd
|
||||
** Failers
|
||||
@ -394,20 +338,6 @@
|
||||
** Failers
|
||||
ABC
|
||||
|
||||
/\p{Ll}/8i
|
||||
a
|
||||
Az
|
||||
** Failers
|
||||
ABC
|
||||
|
||||
/^\x{c0}$/8i
|
||||
\x{c0}
|
||||
\x{e0}
|
||||
|
||||
/^\x{e0}$/8i
|
||||
\x{c0}
|
||||
\x{e0}
|
||||
|
||||
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8
|
||||
A\x{391}\x{10427}\x{ff3a}\x{1fb0}
|
||||
** Failers
|
||||
@ -425,14 +355,6 @@
|
||||
A\x{391}\x{10427}\x{ff5a}\x{1fb0}
|
||||
A\x{391}\x{10427}\x{ff3a}\x{1fb8}
|
||||
|
||||
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iDZ
|
||||
|
||||
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8DZ
|
||||
|
||||
/AB\x{1fb0}/8DZ
|
||||
|
||||
/AB\x{1fb0}/8DZi
|
||||
|
||||
/\x{391}+/8i
|
||||
\x{391}\x{3b1}\x{3b1}\x{3b1}\x{391}
|
||||
|
||||
@ -448,35 +370,6 @@
|
||||
\x{3b1}
|
||||
\x{ff5a}
|
||||
|
||||
/[\x{c0}\x{391}]/8i
|
||||
\x{c0}
|
||||
\x{e0}
|
||||
|
||||
/[\x{105}-\x{109}]/8iDZ
|
||||
\x{104}
|
||||
\x{105}
|
||||
\x{109}
|
||||
** Failers
|
||||
\x{100}
|
||||
\x{10a}
|
||||
|
||||
/[z-\x{100}]/8iDZ
|
||||
Z
|
||||
z
|
||||
\x{39c}
|
||||
\x{178}
|
||||
|
|
||||
\x{80}
|
||||
\x{ff}
|
||||
\x{100}
|
||||
\x{101}
|
||||
** Failers
|
||||
\x{102}
|
||||
Y
|
||||
y
|
||||
|
||||
/[z-\x{100}]/8DZi
|
||||
|
||||
/^\X/8
|
||||
A
|
||||
A\x{300}BC
|
||||
@ -747,31 +640,9 @@
|
||||
/([\pL]=(abc))*X/
|
||||
L=abcX
|
||||
|
||||
/The next two should be Perl-compatible, but it fails to match \x{e0}. PCRE
|
||||
will match it only with UCP support, because without that it has no notion
|
||||
of case for anything other than the ASCII letters. /
|
||||
|
||||
/((?i)[\x{c0}])/8
|
||||
\x{c0}
|
||||
\x{e0}
|
||||
|
||||
/(?i:[\x{c0}])/8
|
||||
\x{c0}
|
||||
\x{e0}
|
||||
|
||||
/^\p{Balinese}\p{Cuneiform}\p{Nko}\p{Phags_Pa}\p{Phoenician}/8
|
||||
\x{1b00}\x{12000}\x{7c0}\x{a840}\x{10900}
|
||||
|
||||
/The next two are special cases where the lengths of the different cases of the
|
||||
same character differ. The first went wrong with heap frame storage; the 2nd
|
||||
was broken in all cases./
|
||||
|
||||
/^\x{023a}+?(\x{0130}+)/8i
|
||||
\x{023a}\x{2c65}\x{0130}
|
||||
|
||||
/^\x{023a}+([^X])/8i
|
||||
\x{023a}\x{2c65}X
|
||||
|
||||
/Check property support in non-UTF-8 mode/
|
||||
|
||||
/\p{L}{4}/
|
||||
@ -790,48 +661,6 @@ was broken in all cases./
|
||||
/[\PPP\x8a]{1,}\x80/
|
||||
A\x80
|
||||
|
||||
/(?:[\PPa*]*){8,}/
|
||||
|
||||
/[\P{Any}]/BZ
|
||||
|
||||
/[\P{Any}\E]/BZ
|
||||
|
||||
/(\P{Yi}+\277)/
|
||||
|
||||
/(\P{Yi}+\277)?/
|
||||
|
||||
/(?<=\P{Yi}{3}A)X/
|
||||
|
||||
/\p{Yi}+(\P{Yi}+)(?1)/
|
||||
|
||||
/(\P{Yi}{2}\277)?/
|
||||
|
||||
/[\P{Yi}A]/
|
||||
|
||||
/[\P{Yi}\P{Yi}\P{Yi}A]/
|
||||
|
||||
/[^\P{Yi}A]/
|
||||
|
||||
/[^\P{Yi}\P{Yi}\P{Yi}A]/
|
||||
|
||||
/(\P{Yi}*\277)*/
|
||||
|
||||
/(\P{Yi}*?\277)*/
|
||||
|
||||
/(\p{Yi}*+\277)*/
|
||||
|
||||
/(\P{Yi}?\277)*/
|
||||
|
||||
/(\P{Yi}??\277)*/
|
||||
|
||||
/(\p{Yi}?+\277)*/
|
||||
|
||||
/(\P{Yi}{0,3}\277)*/
|
||||
|
||||
/(\P{Yi}{0,3}?\277)*/
|
||||
|
||||
/(\p{Yi}{0,3}+\277)*/
|
||||
|
||||
/^[\p{Arabic}]/8
|
||||
\x{60e}
|
||||
\x{656}
|
||||
@ -895,24 +724,6 @@ was broken in all cases./
|
||||
\x{1049f}
|
||||
\x{104aa}
|
||||
|
||||
/\p{Zl}{2,3}+/8BZ
|
||||
\xe2\x80\xa8\xe2\x80\xa8
|
||||
\x{2028}\x{2028}\x{2028}
|
||||
|
||||
/\p{Zl}/8BZ
|
||||
|
||||
/\p{Lu}{3}+/8BZ
|
||||
|
||||
/\pL{2}+/8BZ
|
||||
|
||||
/\p{Cc}{2}+/8BZ
|
||||
|
||||
/\x{c0}+\x{116}+/8i
|
||||
\x{c0}\x{e0}\x{116}\x{117}
|
||||
|
||||
/[\x{c0}\x{116}]+/8i
|
||||
\x{c0}\x{e0}\x{116}\x{117}
|
||||
|
||||
/\p{Carian}\p{Cham}\p{Kayah_Li}\p{Lepcha}\p{Lycian}\p{Lydian}\p{Ol_Chiki}\p{Rejang}\p{Saurashtra}\p{Sundanese}\p{Vai}/8
|
||||
\x{102A4}\x{AA52}\x{A91D}\x{1C46}\x{10283}\x{1092E}\x{1C6B}\x{A93B}\x{A8BF}\x{1BA0}\x{A50A}====
|
||||
|
||||
@ -931,12 +742,6 @@ was broken in all cases./
|
||||
aa
|
||||
aA
|
||||
|
||||
/(\x{de})\1/8i
|
||||
\x{de}\x{de}
|
||||
\x{de}\x{fe}
|
||||
\x{fe}\x{fe}
|
||||
\x{fe}\x{de}
|
||||
|
||||
/(\x{10a})\1/8i
|
||||
\x{10a}\x{10a}
|
||||
\x{10a}\x{10b}
|
||||
@ -951,4 +756,4 @@ was broken in all cases./
|
||||
/[\p{Lu}\x20]+/
|
||||
\x41\x20\x50\xC2\x54\xC9\x20\x54\x4F\x44\x41\x59
|
||||
|
||||
/ End of testinput6 /
|
||||
/-- End of testinput6 --/
|
||||
|
123
ext/pcre/pcrelib/testdata/testinput7
vendored
@ -1,3 +1,6 @@
|
||||
/-- This set of tests check the DFA matching functionality of pcre_dfa_exec().
|
||||
The -dfa flag must be used with pcretest when running it. --/
|
||||
|
||||
/abc/
|
||||
abc
|
||||
|
||||
@ -4421,4 +4424,122 @@
|
||||
"ab"
|
||||
\C-"ab"
|
||||
|
||||
/ End of testinput7 /
|
||||
/\d+X|9+Y/
|
||||
++++123999\P
|
||||
++++123999Y\P
|
||||
|
||||
/Z(*F)/
|
||||
Z\P
|
||||
ZA\P
|
||||
|
||||
/Z(?!)/
|
||||
Z\P
|
||||
ZA\P
|
||||
|
||||
/dog(sbody)?/
|
||||
dogs\P
|
||||
dogs\P\P
|
||||
|
||||
/dog(sbody)??/
|
||||
dogs\P
|
||||
dogs\P\P
|
||||
|
||||
/dog|dogsbody/
|
||||
dogs\P
|
||||
dogs\P\P
|
||||
|
||||
/dogsbody|dog/
|
||||
dogs\P
|
||||
dogs\P\P
|
||||
|
||||
/Z(*F)Q|ZXY/
|
||||
Z\P
|
||||
ZA\P
|
||||
X\P
|
||||
|
||||
/\bthe cat\b/
|
||||
the cat\P
|
||||
the cat\P\P
|
||||
|
||||
/dog(sbody)?/
|
||||
dogs\D\P
|
||||
body\D\R
|
||||
|
||||
/dog(sbody)?/
|
||||
dogs\D\P\P
|
||||
body\D\R
|
||||
|
||||
/abc/
|
||||
abc\P
|
||||
abc\P\P
|
||||
|
||||
/abc\K123/
|
||||
xyzabc123pqr
|
||||
|
||||
/(?<=abc)123/
|
||||
xyzabc123pqr
|
||||
xyzabc12\P
|
||||
xyzabc12\P\P
|
||||
|
||||
/\babc\b/
|
||||
+++abc+++
|
||||
+++ab\P
|
||||
+++ab\P\P
|
||||
|
||||
/(?=C)/g+
|
||||
ABCDECBA
|
||||
|
||||
/(abc|def|xyz)/I
|
||||
terhjk;abcdaadsfe
|
||||
the quick xyz brown fox
|
||||
\Yterhjk;abcdaadsfe
|
||||
\Ythe quick xyz brown fox
|
||||
** Failers
|
||||
thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
|
||||
\Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
|
||||
|
||||
/(abc|def|xyz)/SI
|
||||
terhjk;abcdaadsfe
|
||||
the quick xyz brown fox
|
||||
\Yterhjk;abcdaadsfe
|
||||
\Ythe quick xyz brown fox
|
||||
** Failers
|
||||
thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
|
||||
\Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
|
||||
|
||||
/abcd*/+
|
||||
xxxxabcd\P
|
||||
xxxxabcd\P\P
|
||||
dddxxx\R
|
||||
xxxxabcd\P\P
|
||||
xxx\R
|
||||
|
||||
/abcd*/i
|
||||
xxxxabcd\P
|
||||
xxxxabcd\P\P
|
||||
XXXXABCD\P
|
||||
XXXXABCD\P\P
|
||||
|
||||
/abc\d*/
|
||||
xxxxabc1\P
|
||||
xxxxabc1\P\P
|
||||
|
||||
/abc[de]*/
|
||||
xxxxabcde\P
|
||||
xxxxabcde\P\P
|
||||
|
||||
/(?:(?1)|B)(A(*F)|C)/
|
||||
ABCD
|
||||
CCD
|
||||
** Failers
|
||||
CAD
|
||||
|
||||
/^(?:(?1)|B)(A(*F)|C)/
|
||||
CCD
|
||||
BCD
|
||||
** Failers
|
||||
ABCD
|
||||
CAD
|
||||
BAD
|
||||
|
||||
/-- End of testinput7 --/
|
||||
|
26
ext/pcre/pcrelib/testdata/testinput8
vendored
@ -1,6 +1,6 @@
|
||||
/-- Do not use the \x{} construct except with patterns that have the --/
|
||||
/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
|
||||
/-- that option is set. However, the latest Perls recognize them always. --/
|
||||
/-- This set of tests checks UTF-8 support with the DFA matching functionality
|
||||
of pcre_dfa_exec(). The -dfa flag must be used with pcretest when running
|
||||
it. --/
|
||||
|
||||
/\x{100}ab/8
|
||||
\x{100}ab
|
||||
@ -667,4 +667,22 @@
|
||||
/X/8f<any>
|
||||
A\x{1ec5}ABCXYZ
|
||||
|
||||
/ End of testinput 8 /
|
||||
/abcd*/8
|
||||
xxxxabcd\P
|
||||
xxxxabcd\P\P
|
||||
|
||||
/abcd*/i8
|
||||
xxxxabcd\P
|
||||
xxxxabcd\P\P
|
||||
XXXXABCD\P
|
||||
XXXXABCD\P\P
|
||||
|
||||
/abc\d*/8
|
||||
xxxxabc1\P
|
||||
xxxxabc1\P\P
|
||||
|
||||
/abc[de]*/8
|
||||
xxxxabcde\P
|
||||
xxxxabcde\P\P
|
||||
|
||||
/-- End of testinput8 --/
|
||||
|
6
ext/pcre/pcrelib/testdata/testinput9
vendored
@ -1,3 +1,7 @@
|
||||
/-- This set of tests check Unicode property support with the DFA matching
|
||||
functionality of pcre_dfa_exec(). The -dfa flag must be used with pcretest
|
||||
when running it. --/
|
||||
|
||||
/\pL\P{Nd}/8
|
||||
AB
|
||||
*** Failers
|
||||
@ -843,4 +847,4 @@
|
||||
** Failers
|
||||
\x{1d79}\x{a77d}
|
||||
|
||||
/ End /
|
||||
/-- End of testinput9 --/
|
||||
|
5
ext/pcre/pcrelib/testdata/testoutput1
vendored
@ -1,3 +1,6 @@
|
||||
/-- This set of tests is for features that are compatible with all versions of
|
||||
Perl 5, in non-UTF-8 mode. --/
|
||||
|
||||
/the quick brown fox/
|
||||
the quick brown fox
|
||||
0: the quick brown fox
|
||||
@ -6646,4 +6649,4 @@ No match
|
||||
0: %ab%
|
||||
1:
|
||||
|
||||
/ End of testinput1 /
|
||||
/-- End of testinput1 --/
|
||||
|
2
ext/pcre/pcrelib/testdata/testoutput10
vendored
@ -666,4 +666,4 @@ Memory allocation (code space): 40
|
||||
39 End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/ End of testinput10 /
|
||||
/-- End of testinput10 --/
|
||||
|
1917
ext/pcre/pcrelib/testdata/testoutput2
vendored
File diff suppressed because it is too large
8
ext/pcre/pcrelib/testdata/testoutput3
vendored
@ -1,3 +1,7 @@
|
||||
/-- This set of tests checks local-specific features, using the fr_FR locale.
|
||||
It is not Perl-compatible. There is different version called wintestinput3
|
||||
f or use on Windows, where the locale is called "french". --/
|
||||
|
||||
/^[\w]+/
|
||||
*** Failers
|
||||
No match
|
||||
@ -83,6 +87,7 @@ Capturing subpattern count = 0
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
Subject length lower bound = 1
|
||||
Starting byte set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||
|
||||
@ -91,6 +96,7 @@ Capturing subpattern count = 0
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
Subject length lower bound = 1
|
||||
Starting byte set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
|
||||
@ -160,4 +166,4 @@ No options
|
||||
No first char
|
||||
No need char
|
||||
|
||||
/ End of testinput3 /
|
||||
/-- End of testinput3 --/
|
||||
|
44
ext/pcre/pcrelib/testdata/testoutput4
vendored
@ -1,9 +1,6 @@
|
||||
/-- Do not use the \x{} construct except with patterns that have the --/
|
||||
/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
|
||||
No match
|
||||
/-- that option is set. However, the latest Perls recognize them always. --/
|
||||
No match
|
||||
|
||||
/-- This set of tests if for UTF-8 support, excluding Unicode properties. It is
|
||||
compatible with all versions of Perl 5. --/
|
||||
|
||||
/a.b/8
|
||||
acb
|
||||
0: acb
|
||||
@ -1089,4 +1086,37 @@ No match
|
||||
|
||||
/(?i)[\xc3\xa9\xc3\xbd]|[\xc3\xa9\xc3\xbdA]/8
|
||||
|
||||
/ End of testinput4 /
|
||||
/^[a\x{c0}]b/8
|
||||
\x{c0}b
|
||||
0: \x{c0}b
|
||||
|
||||
/^([a\x{c0}]*?)aa/8
|
||||
a\x{c0}aaaa/
|
||||
0: a\x{c0}aa
|
||||
1: a\x{c0}
|
||||
|
||||
/^([a\x{c0}]*?)aa/8
|
||||
a\x{c0}aaaa/
|
||||
0: a\x{c0}aa
|
||||
1: a\x{c0}
|
||||
a\x{c0}a\x{c0}aaa/
|
||||
0: a\x{c0}a\x{c0}aa
|
||||
1: a\x{c0}a\x{c0}
|
||||
|
||||
/^([a\x{c0}]*)aa/8
|
||||
a\x{c0}aaaa/
|
||||
0: a\x{c0}aaaa
|
||||
1: a\x{c0}aa
|
||||
a\x{c0}a\x{c0}aaa/
|
||||
0: a\x{c0}a\x{c0}aaa
|
||||
1: a\x{c0}a\x{c0}a
|
||||
|
||||
/^([a\x{c0}]*)a\x{c0}/8
|
||||
a\x{c0}aaaa/
|
||||
0: a\x{c0}
|
||||
1:
|
||||
a\x{c0}a\x{c0}aaa/
|
||||
0: a\x{c0}a\x{c0}
|
||||
1: a\x{c0}
|
||||
|
||||
/-- End of testinput4 --/
|
||||
|
548
ext/pcre/pcrelib/testdata/testoutput5
vendored
@ -1,3 +1,6 @@
|
||||
/-- This set of tests checks the API, internals, and non-Perl stuff for UTF-8
|
||||
support, excluding Unicode properties. --/
|
||||
|
||||
/\x{100}/8DZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
@ -252,7 +255,6 @@ Need char = 171
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
Need char = 'X'
|
||||
@ -269,52 +271,12 @@ Need char = 'X'
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
\x{212ab}\x{212ab}\x{212ab}\x{861}
|
||||
0: \x{212ab}\x{212ab}\x{212ab}
|
||||
|
||||
/-- These tests are here rather than in testinput4 because Perl 5.6 has some
|
||||
problems with UTF-8 support, in the area of \x{..} where the value is < 255.
|
||||
It grumbles about invalid UTF-8 strings. --/
|
||||
|
||||
/^[a\x{c0}]b/8
|
||||
\x{c0}b
|
||||
0: \x{c0}b
|
||||
|
||||
/^([a\x{c0}]*?)aa/8
|
||||
a\x{c0}aaaa/
|
||||
0: a\x{c0}aa
|
||||
1: a\x{c0}
|
||||
|
||||
/^([a\x{c0}]*?)aa/8
|
||||
a\x{c0}aaaa/
|
||||
0: a\x{c0}aa
|
||||
1: a\x{c0}
|
||||
a\x{c0}a\x{c0}aaa/
|
||||
0: a\x{c0}a\x{c0}aa
|
||||
1: a\x{c0}a\x{c0}
|
||||
|
||||
/^([a\x{c0}]*)aa/8
|
||||
a\x{c0}aaaa/
|
||||
0: a\x{c0}aaaa
|
||||
1: a\x{c0}aa
|
||||
a\x{c0}a\x{c0}aaa/
|
||||
0: a\x{c0}a\x{c0}aaa
|
||||
1: a\x{c0}a\x{c0}a
|
||||
|
||||
/^([a\x{c0}]*)a\x{c0}/8
|
||||
a\x{c0}aaaa/
|
||||
0: a\x{c0}
|
||||
1:
|
||||
a\x{c0}a\x{c0}aaa/
|
||||
0: a\x{c0}a\x{c0}
|
||||
1: a\x{c0}
|
||||
|
||||
/-- --/
|
||||
|
||||
/(?<=\C)X/8
|
||||
Failed: \C not allowed in lookbehind assertion at offset 6
|
||||
|
||||
@ -389,6 +351,7 @@ Capturing subpattern count = 0
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
Subject length lower bound = 1
|
||||
Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
|
||||
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
|
||||
\x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4
|
||||
@ -423,11 +386,11 @@ No match
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
First char = 196
|
||||
Need char = 128
|
||||
Study returned NULL
|
||||
Subject length lower bound = 3
|
||||
No set of starting bytes
|
||||
\x{100}\x{100}\x{100}\x{100\x{100}
|
||||
0: \x{100}\x{100}\x{100}
|
||||
|
||||
@ -443,10 +406,10 @@ Study returned NULL
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
Subject length lower bound = 1
|
||||
Starting byte set: x \xc4
|
||||
|
||||
/(\x{100}*a|x)/8SDZ
|
||||
@ -462,10 +425,10 @@ Starting byte set: x \xc4
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
Subject length lower bound = 1
|
||||
Starting byte set: a x \xc4
|
||||
|
||||
/(\x{100}{0,2}a|x)/8SDZ
|
||||
@ -481,10 +444,10 @@ Starting byte set: a x \xc4
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
Subject length lower bound = 1
|
||||
Starting byte set: a x \xc4
|
||||
|
||||
/(\x{100}{1,2}a|x)/8SDZ
|
||||
@ -501,10 +464,10 @@ Starting byte set: a x \xc4
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
Subject length lower bound = 1
|
||||
Starting byte set: x \xc4
|
||||
|
||||
/\x{100}*(\d+|"(?1)")/8
|
||||
@ -551,7 +514,6 @@ Need char = 128
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
@ -565,7 +527,6 @@ No need char
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
First char = 'a'
|
||||
No need char
|
||||
@ -579,7 +540,6 @@ No need char
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
@ -593,7 +553,6 @@ Need char = 'b'
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
First char = 'a'
|
||||
Need char = 128
|
||||
@ -607,7 +566,6 @@ Need char = 128
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
First char = 'a'
|
||||
Need char = 129
|
||||
@ -621,7 +579,6 @@ Need char = 129
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
Need char = 'A'
|
||||
@ -640,7 +597,6 @@ Need char = 'A'
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
@ -1122,7 +1078,6 @@ Need char = 191
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
@ -1136,7 +1091,6 @@ No need char
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
@ -1150,7 +1104,6 @@ No need char
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
@ -1164,7 +1117,6 @@ No need char
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
@ -1178,7 +1130,6 @@ No need char
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
@ -1192,7 +1143,6 @@ No need char
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
@ -1206,7 +1156,6 @@ No need char
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
First char = 196
|
||||
Need char = 128
|
||||
@ -1220,7 +1169,6 @@ Need char = 128
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
First char = 196
|
||||
Need char = 'X'
|
||||
@ -1234,7 +1182,6 @@ Need char = 'X'
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
First char = 'X'
|
||||
Need char = 128
|
||||
@ -1652,4 +1599,477 @@ Forced newline sequence: CRLF
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
|
||||
/ End of testinput5 /
|
||||
/Xa{2,4}b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
Xa\P
|
||||
Partial match: Xa
|
||||
Xaa\P
|
||||
Partial match: Xaa
|
||||
Xaaa\P
|
||||
Partial match: Xaaa
|
||||
Xaaaa\P
|
||||
Partial match: Xaaaa
|
||||
|
||||
/Xa{2,4}?b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
Xa\P
|
||||
Partial match: Xa
|
||||
Xaa\P
|
||||
Partial match: Xaa
|
||||
Xaaa\P
|
||||
Partial match: Xaaa
|
||||
Xaaaa\P
|
||||
Partial match: Xaaaa
|
||||
|
||||
/Xa{2,4}+b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
Xa\P
|
||||
Partial match: Xa
|
||||
Xaa\P
|
||||
Partial match: Xaa
|
||||
Xaaa\P
|
||||
Partial match: Xaaa
|
||||
Xaaaa\P
|
||||
Partial match: Xaaaa
|
||||
|
||||
/X\x{123}{2,4}b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X\x{123}\P
|
||||
Partial match: X\x{123}
|
||||
X\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/X\x{123}{2,4}?b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X\x{123}\P
|
||||
Partial match: X\x{123}
|
||||
X\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/X\x{123}{2,4}+b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X\x{123}\P
|
||||
Partial match: X\x{123}
|
||||
X\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/X\x{123}{2,4}b/8
|
||||
Xx\P
|
||||
No match
|
||||
X\x{123}x\P
|
||||
No match
|
||||
X\x{123}\x{123}x\P
|
||||
No match
|
||||
X\x{123}\x{123}\x{123}x\P
|
||||
No match
|
||||
X\x{123}\x{123}\x{123}\x{123}x\P
|
||||
No match
|
||||
|
||||
/X\x{123}{2,4}?b/8
|
||||
Xx\P
|
||||
No match
|
||||
X\x{123}x\P
|
||||
No match
|
||||
X\x{123}\x{123}x\P
|
||||
No match
|
||||
X\x{123}\x{123}\x{123}x\P
|
||||
No match
|
||||
X\x{123}\x{123}\x{123}\x{123}x\P
|
||||
No match
|
||||
|
||||
/X\x{123}{2,4}+b/8
|
||||
Xx\P
|
||||
No match
|
||||
X\x{123}x\P
|
||||
No match
|
||||
X\x{123}\x{123}x\P
|
||||
No match
|
||||
X\x{123}\x{123}\x{123}x\P
|
||||
No match
|
||||
X\x{123}\x{123}\x{123}\x{123}x\P
|
||||
No match
|
||||
|
||||
/X\d{2,4}b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X3\P
|
||||
Partial match: X3
|
||||
X33\P
|
||||
Partial match: X33
|
||||
X333\P
|
||||
Partial match: X333
|
||||
X3333\P
|
||||
Partial match: X3333
|
||||
|
||||
/X\d{2,4}?b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X3\P
|
||||
Partial match: X3
|
||||
X33\P
|
||||
Partial match: X33
|
||||
X333\P
|
||||
Partial match: X333
|
||||
X3333\P
|
||||
Partial match: X3333
|
||||
|
||||
/X\d{2,4}+b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X3\P
|
||||
Partial match: X3
|
||||
X33\P
|
||||
Partial match: X33
|
||||
X333\P
|
||||
Partial match: X333
|
||||
X3333\P
|
||||
Partial match: X3333
|
||||
|
||||
/X\D{2,4}b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
Xa\P
|
||||
Partial match: Xa
|
||||
Xaa\P
|
||||
Partial match: Xaa
|
||||
Xaaa\P
|
||||
Partial match: Xaaa
|
||||
Xaaaa\P
|
||||
Partial match: Xaaaa
|
||||
|
||||
/X\D{2,4}?b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
Xa\P
|
||||
Partial match: Xa
|
||||
Xaa\P
|
||||
Partial match: Xaa
|
||||
Xaaa\P
|
||||
Partial match: Xaaa
|
||||
Xaaaa\P
|
||||
Partial match: Xaaaa
|
||||
|
||||
/X\D{2,4}+b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
Xa\P
|
||||
Partial match: Xa
|
||||
Xaa\P
|
||||
Partial match: Xaa
|
||||
Xaaa\P
|
||||
Partial match: Xaaa
|
||||
Xaaaa\P
|
||||
Partial match: Xaaaa
|
||||
|
||||
/X\D{2,4}b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X\x{123}\P
|
||||
Partial match: X\x{123}
|
||||
X\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/X\D{2,4}?b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X\x{123}\P
|
||||
Partial match: X\x{123}
|
||||
X\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/X\D{2,4}+b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X\x{123}\P
|
||||
Partial match: X\x{123}
|
||||
X\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/X[abc]{2,4}b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
Xa\P
|
||||
Partial match: Xa
|
||||
Xaa\P
|
||||
Partial match: Xaa
|
||||
Xaaa\P
|
||||
Partial match: Xaaa
|
||||
Xaaaa\P
|
||||
Partial match: Xaaaa
|
||||
|
||||
/X[abc]{2,4}?b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
Xa\P
|
||||
Partial match: Xa
|
||||
Xaa\P
|
||||
Partial match: Xaa
|
||||
Xaaa\P
|
||||
Partial match: Xaaa
|
||||
Xaaaa\P
|
||||
Partial match: Xaaaa
|
||||
|
||||
/X[abc]{2,4}+b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
Xa\P
|
||||
Partial match: Xa
|
||||
Xaa\P
|
||||
Partial match: Xaa
|
||||
Xaaa\P
|
||||
Partial match: Xaaa
|
||||
Xaaaa\P
|
||||
Partial match: Xaaaa
|
||||
|
||||
/X[abc\x{123}]{2,4}b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X\x{123}\P
|
||||
Partial match: X\x{123}
|
||||
X\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/X[abc\x{123}]{2,4}?b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X\x{123}\P
|
||||
Partial match: X\x{123}
|
||||
X\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/X[abc\x{123}]{2,4}+b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X\x{123}\P
|
||||
Partial match: X\x{123}
|
||||
X\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/X[^a]{2,4}b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
Xz\P
|
||||
Partial match: Xz
|
||||
Xzz\P
|
||||
Partial match: Xzz
|
||||
Xzzz\P
|
||||
Partial match: Xzzz
|
||||
Xzzzz\P
|
||||
Partial match: Xzzzz
|
||||
|
||||
/X[^a]{2,4}?b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
Xz\P
|
||||
Partial match: Xz
|
||||
Xzz\P
|
||||
Partial match: Xzz
|
||||
Xzzz\P
|
||||
Partial match: Xzzz
|
||||
Xzzzz\P
|
||||
Partial match: Xzzzz
|
||||
|
||||
/X[^a]{2,4}+b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
Xz\P
|
||||
Partial match: Xz
|
||||
Xzz\P
|
||||
Partial match: Xzz
|
||||
Xzzz\P
|
||||
Partial match: Xzzz
|
||||
Xzzzz\P
|
||||
Partial match: Xzzzz
|
||||
|
||||
/X[^a]{2,4}b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X\x{123}\P
|
||||
Partial match: X\x{123}
|
||||
X\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/X[^a]{2,4}?b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X\x{123}\P
|
||||
Partial match: X\x{123}
|
||||
X\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/X[^a]{2,4}+b/8
|
||||
X\P
|
||||
Partial match: X
|
||||
X\x{123}\P
|
||||
Partial match: X\x{123}
|
||||
X\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}
|
||||
X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/(Y)X\1{2,4}b/8
|
||||
YX\P
|
||||
Partial match: YX
|
||||
YXY\P
|
||||
Partial match: YXY
|
||||
YXYY\P
|
||||
Partial match: YXYY
|
||||
YXYYY\P
|
||||
Partial match: YXYYY
|
||||
YXYYYY\P
|
||||
Partial match: YXYYYY
|
||||
|
||||
/(Y)X\1{2,4}?b/8
|
||||
YX\P
|
||||
Partial match: YX
|
||||
YXY\P
|
||||
Partial match: YXY
|
||||
YXYY\P
|
||||
Partial match: YXYY
|
||||
YXYYY\P
|
||||
Partial match: YXYYY
|
||||
YXYYYY\P
|
||||
Partial match: YXYYYY
|
||||
|
||||
/(Y)X\1{2,4}+b/8
|
||||
YX\P
|
||||
Partial match: YX
|
||||
YXY\P
|
||||
Partial match: YXY
|
||||
YXYY\P
|
||||
Partial match: YXYY
|
||||
YXYYY\P
|
||||
Partial match: YXYYY
|
||||
YXYYYY\P
|
||||
Partial match: YXYYYY
|
||||
|
||||
/(\x{123})X\1{2,4}b/8
|
||||
\x{123}X\P
|
||||
Partial match: \x{123}X
|
||||
\x{123}X\x{123}\P
|
||||
Partial match: \x{123}X\x{123}
|
||||
\x{123}X\x{123}\x{123}\P
|
||||
Partial match: \x{123}X\x{123}\x{123}
|
||||
\x{123}X\x{123}\x{123}\x{123}\P
|
||||
Partial match: \x{123}X\x{123}\x{123}\x{123}
|
||||
\x{123}X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/(\x{123})X\1{2,4}?b/8
|
||||
\x{123}X\P
|
||||
Partial match: \x{123}X
|
||||
\x{123}X\x{123}\P
|
||||
Partial match: \x{123}X\x{123}
|
||||
\x{123}X\x{123}\x{123}\P
|
||||
Partial match: \x{123}X\x{123}\x{123}
|
||||
\x{123}X\x{123}\x{123}\x{123}\P
|
||||
Partial match: \x{123}X\x{123}\x{123}\x{123}
|
||||
\x{123}X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/(\x{123})X\1{2,4}+b/8
|
||||
\x{123}X\P
|
||||
Partial match: \x{123}X
|
||||
\x{123}X\x{123}\P
|
||||
Partial match: \x{123}X\x{123}
|
||||
\x{123}X\x{123}\x{123}\P
|
||||
Partial match: \x{123}X\x{123}\x{123}
|
||||
\x{123}X\x{123}\x{123}\x{123}\P
|
||||
Partial match: \x{123}X\x{123}\x{123}\x{123}
|
||||
\x{123}X\x{123}\x{123}\x{123}\x{123}\P
|
||||
Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123}
|
||||
|
||||
/\bthe cat\b/8
|
||||
the cat\P
|
||||
0: the cat
|
||||
the cat\P\P
|
||||
Partial match: the cat
|
||||
|
||||
/abcd*/8
|
||||
xxxxabcd\P
|
||||
0: abcd
|
||||
xxxxabcd\P\P
|
||||
Partial match: abcd
|
||||
|
||||
/abcd*/i8
|
||||
xxxxabcd\P
|
||||
0: abcd
|
||||
xxxxabcd\P\P
|
||||
Partial match: abcd
|
||||
XXXXABCD\P
|
||||
0: ABCD
|
||||
XXXXABCD\P\P
|
||||
Partial match: ABCD
|
||||
|
||||
/abc\d*/8
|
||||
xxxxabc1\P
|
||||
0: abc1
|
||||
xxxxabc1\P\P
|
||||
Partial match: abc1
|
||||
|
||||
/(a)bc\1*/8
|
||||
xxxxabca\P
|
||||
0: abca
|
||||
1: a
|
||||
xxxxabca\P\P
|
||||
Partial match: abca
|
||||
|
||||
/abc[de]*/8
|
||||
xxxxabcde\P
|
||||
0: abcde
|
||||
xxxxabcde\P\P
|
||||
Partial match: abcde
|
||||
|
||||
/-- End of testinput5 --/
|
||||
|
474
ext/pcre/pcrelib/testdata/testoutput6
vendored
@ -1,3 +1,7 @@
|
||||
/-- This set of tests is for Unicode property support. It is compatible with
|
||||
Perl 5.10, but not 5.8 because it tests some extra properties that are
|
||||
not in the earlier release. --/
|
||||
|
||||
/^\pC\pL\pM\pN\pP\pS\pZ</8
|
||||
\x7f\x{c0}\x{30f}\x{660}\x{66c}\x{f01}\x{1680}<
|
||||
0: \x{7f}\x{c0}\x{30f}\x{660}\x{66c}\x{f01}\x{1680}<
|
||||
@ -98,14 +102,6 @@ No match
|
||||
\x{09f}
|
||||
No match
|
||||
|
||||
/^\p{Cs}/8
|
||||
\?\x{dfff}
|
||||
0: \x{dfff}
|
||||
** Failers
|
||||
No match
|
||||
\x{09f}
|
||||
No match
|
||||
|
||||
/^\p{Ll}/8
|
||||
a
|
||||
0: a
|
||||
@ -338,18 +334,6 @@ No match
|
||||
\x{f3b}
|
||||
No match
|
||||
|
||||
/^\p{Sc}+/8
|
||||
$\x{a2}\x{a3}\x{a4}\x{a5}\x{a6}
|
||||
0: $\x{a2}\x{a3}\x{a4}\x{a5}
|
||||
\x{9f2}
|
||||
0: \x{9f2}
|
||||
** Failers
|
||||
No match
|
||||
X
|
||||
No match
|
||||
\x{2c2}
|
||||
No match
|
||||
|
||||
/^\p{Sk}/8
|
||||
\x{2c2}
|
||||
0: \x{2c2}
|
||||
@ -402,26 +386,6 @@ No match
|
||||
\x{2028}
|
||||
No match
|
||||
|
||||
/^\p{Zs}/8
|
||||
\ \
|
||||
0:
|
||||
\x{a0}
|
||||
0: \x{a0}
|
||||
\x{1680}
|
||||
0: \x{1680}
|
||||
\x{180e}
|
||||
0: \x{180e}
|
||||
\x{2000}
|
||||
0: \x{2000}
|
||||
\x{2001}
|
||||
0: \x{2001}
|
||||
** Failers
|
||||
No match
|
||||
\x{2028}
|
||||
No match
|
||||
\x{200d}
|
||||
No match
|
||||
|
||||
/\p{Nd}+(..)/8
|
||||
\x{660}\x{661}\x{662}ABC
|
||||
0: \x{660}\x{661}\x{662}AB
|
||||
@ -494,34 +458,6 @@ No match
|
||||
\x{660}\x{661}\x{662}ABC
|
||||
No match
|
||||
|
||||
/\p{Lu}/8i
|
||||
A
|
||||
0: A
|
||||
a\x{10a0}B
|
||||
0: \x{10a0}
|
||||
** Failers
|
||||
0: F
|
||||
a
|
||||
No match
|
||||
\x{1d00}
|
||||
No match
|
||||
|
||||
/\p{^Lu}/8i
|
||||
1234
|
||||
0: 1
|
||||
** Failers
|
||||
0: *
|
||||
ABC
|
||||
No match
|
||||
|
||||
/\P{Lu}/8i
|
||||
1234
|
||||
0: 1
|
||||
** Failers
|
||||
0: *
|
||||
ABC
|
||||
No match
|
||||
|
||||
/(?<=A\p{Nd})XYZ/8
|
||||
A2XYZ
|
||||
0: XYZ
|
||||
@ -548,103 +484,6 @@ No match
|
||||
WXYZ
|
||||
No match
|
||||
|
||||
/[\p{L}]/DZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
[\p{L}]
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
|
||||
/[\p{^L}]/DZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
[\P{L}]
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
|
||||
/[\P{L}]/DZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
[\P{L}]
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
|
||||
/[\P{^L}]/DZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
[\p{L}]
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
|
||||
/[abc\p{L}\x{0660}]/8DZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
[a-c\p{L}\x{660}]
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
|
||||
/[\p{Nd}]/8DZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
[\p{Nd}]
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
1234
|
||||
0: 1
|
||||
|
||||
/[\p{Nd}+-]+/8DZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
[+\-\p{Nd}]+
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: utf8
|
||||
No first char
|
||||
No need char
|
||||
1234
|
||||
0: 1234
|
||||
12-34
|
||||
0: 12-34
|
||||
12+\x{661}-34
|
||||
0: 12+\x{661}-34
|
||||
** Failers
|
||||
No match
|
||||
abcd
|
||||
No match
|
||||
|
||||
/[\P{Nd}]+/8
|
||||
abcd
|
||||
0: abcd
|
||||
@ -725,28 +564,6 @@ No match
|
||||
ABC
|
||||
No match
|
||||
|
||||
/\p{Ll}/8i
|
||||
a
|
||||
0: a
|
||||
Az
|
||||
0: z
|
||||
** Failers
|
||||
0: a
|
||||
ABC
|
||||
No match
|
||||
|
||||
/^\x{c0}$/8i
|
||||
\x{c0}
|
||||
0: \x{c0}
|
||||
\x{e0}
|
||||
0: \x{e0}
|
||||
|
||||
/^\x{e0}$/8i
|
||||
\x{c0}
|
||||
0: \x{c0}
|
||||
\x{e0}
|
||||
0: \x{e0}
|
||||
|
||||
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8
|
||||
A\x{391}\x{10427}\x{ff3a}\x{1fb0}
|
||||
0: A\x{391}\x{10427}\x{ff3a}\x{1fb0}
|
||||
@ -777,54 +594,6 @@ No match
|
||||
A\x{391}\x{10427}\x{ff3a}\x{1fb8}
|
||||
0: A\x{391}\x{10427}\x{ff3a}\x{1fb8}
|
||||
|
||||
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iDZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
NC A\x{391}\x{10427}\x{ff3a}\x{1fb0}
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Options: caseless utf8
|
||||
First char = 'A' (caseless)
|
||||
No need char
|
||||
|
||||
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8DZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
A\x{391}\x{10427}\x{ff3a}\x{1fb0}
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Options: utf8
|
||||
First char = 'A'
|
||||
Need char = 176
|
||||
|
||||
/AB\x{1fb0}/8DZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
AB\x{1fb0}
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Options: utf8
|
||||
First char = 'A'
|
||||
Need char = 176
|
||||
|
||||
/AB\x{1fb0}/8DZi
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
NC AB\x{1fb0}
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Options: caseless utf8
|
||||
First char = 'A' (caseless)
|
||||
Need char = 'B' (caseless)
|
||||
|
||||
/\x{391}+/8i
|
||||
\x{391}\x{3b1}\x{3b1}\x{3b1}\x{391}
|
||||
0: \x{391}\x{3b1}\x{3b1}\x{3b1}\x{391}
|
||||
@ -849,86 +618,6 @@ Need char = 'B' (caseless)
|
||||
\x{ff5a}
|
||||
0: \x{ff5a}
|
||||
|
||||
/[\x{c0}\x{391}]/8i
|
||||
\x{c0}
|
||||
0: \x{c0}
|
||||
\x{e0}
|
||||
0: \x{e0}
|
||||
|
||||
/[\x{105}-\x{109}]/8iDZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
[\x{104}-\x{109}]
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Options: caseless utf8
|
||||
No first char
|
||||
No need char
|
||||
\x{104}
|
||||
0: \x{104}
|
||||
\x{105}
|
||||
0: \x{105}
|
||||
\x{109}
|
||||
0: \x{109}
|
||||
** Failers
|
||||
No match
|
||||
\x{100}
|
||||
No match
|
||||
\x{10a}
|
||||
No match
|
||||
|
||||
/[z-\x{100}]/8iDZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
[Z\x{39c}\x{178}z-\x{101}]
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Options: caseless utf8
|
||||
No first char
|
||||
No need char
|
||||
Z
|
||||
0: Z
|
||||
z
|
||||
0: z
|
||||
\x{39c}
|
||||
0: \x{39c}
|
||||
\x{178}
|
||||
0: \x{178}
|
||||
|
|
||||
0: |
|
||||
\x{80}
|
||||
0: \x{80}
|
||||
\x{ff}
|
||||
0: \x{ff}
|
||||
\x{100}
|
||||
0: \x{100}
|
||||
\x{101}
|
||||
0: \x{101}
|
||||
** Failers
|
||||
No match
|
||||
\x{102}
|
||||
No match
|
||||
Y
|
||||
No match
|
||||
y
|
||||
No match
|
||||
|
||||
/[z-\x{100}]/8DZi
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
[Z\x{39c}\x{178}z-\x{101}]
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Options: caseless utf8
|
||||
No first char
|
||||
No need char
|
||||
|
||||
/^\X/8
|
||||
A
|
||||
0: A
|
||||
@ -1408,42 +1097,10 @@ No match
|
||||
1: L=abc
|
||||
2: abc
|
||||
|
||||
/The next two should be Perl-compatible, but it fails to match \x{e0}. PCRE
|
||||
will match it only with UCP support, because without that it has no notion
|
||||
of case for anything other than the ASCII letters. /
|
||||
|
||||
/((?i)[\x{c0}])/8
|
||||
\x{c0}
|
||||
0: \x{c0}
|
||||
1: \x{c0}
|
||||
\x{e0}
|
||||
0: \x{e0}
|
||||
1: \x{e0}
|
||||
|
||||
/(?i:[\x{c0}])/8
|
||||
\x{c0}
|
||||
0: \x{c0}
|
||||
\x{e0}
|
||||
0: \x{e0}
|
||||
|
||||
/^\p{Balinese}\p{Cuneiform}\p{Nko}\p{Phags_Pa}\p{Phoenician}/8
|
||||
\x{1b00}\x{12000}\x{7c0}\x{a840}\x{10900}
|
||||
0: \x{1b00}\x{12000}\x{7c0}\x{a840}\x{10900}
|
||||
|
||||
/The next two are special cases where the lengths of the different cases of the
|
||||
same character differ. The first went wrong with heap frame storage; the 2nd
|
||||
was broken in all cases./
|
||||
|
||||
/^\x{023a}+?(\x{0130}+)/8i
|
||||
\x{023a}\x{2c65}\x{0130}
|
||||
0: \x{23a}\x{2c65}\x{130}
|
||||
1: \x{130}
|
||||
|
||||
/^\x{023a}+([^X])/8i
|
||||
\x{023a}\x{2c65}X
|
||||
0: \x{23a}\x{2c65}
|
||||
1: \x{2c65}
|
||||
|
||||
/Check property support in non-UTF-8 mode/
|
||||
|
||||
/\p{L}{4}/
|
||||
@ -1468,60 +1125,6 @@ No match
|
||||
A\x80
|
||||
0: A\x80
|
||||
|
||||
/(?:[\PPa*]*){8,}/
|
||||
|
||||
/[\P{Any}]/BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
[\P{Any}]
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/[\P{Any}\E]/BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
[\P{Any}]
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/(\P{Yi}+\277)/
|
||||
|
||||
/(\P{Yi}+\277)?/
|
||||
|
||||
/(?<=\P{Yi}{3}A)X/
|
||||
|
||||
/\p{Yi}+(\P{Yi}+)(?1)/
|
||||
|
||||
/(\P{Yi}{2}\277)?/
|
||||
|
||||
/[\P{Yi}A]/
|
||||
|
||||
/[\P{Yi}\P{Yi}\P{Yi}A]/
|
||||
|
||||
/[^\P{Yi}A]/
|
||||
|
||||
/[^\P{Yi}\P{Yi}\P{Yi}A]/
|
||||
|
||||
/(\P{Yi}*\277)*/
|
||||
|
||||
/(\P{Yi}*?\277)*/
|
||||
|
||||
/(\p{Yi}*+\277)*/
|
||||
|
||||
/(\P{Yi}?\277)*/
|
||||
|
||||
/(\P{Yi}??\277)*/
|
||||
|
||||
/(\p{Yi}?+\277)*/
|
||||
|
||||
/(\P{Yi}{0,3}\277)*/
|
||||
|
||||
/(\P{Yi}{0,3}?\277)*/
|
||||
|
||||
/(\p{Yi}{0,3}+\277)*/
|
||||
|
||||
/^[\p{Arabic}]/8
|
||||
\x{60e}
|
||||
0: \x{60e}
|
||||
@ -1634,59 +1237,6 @@ No match
|
||||
\x{104aa}
|
||||
No match
|
||||
|
||||
/\p{Zl}{2,3}+/8BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
prop Zl {2}
|
||||
prop Zl ?+
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
\xe2\x80\xa8\xe2\x80\xa8
|
||||
0: \x{2028}\x{2028}
|
||||
\x{2028}\x{2028}\x{2028}
|
||||
0: \x{2028}\x{2028}\x{2028}
|
||||
|
||||
/\p{Zl}/8BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
prop Zl
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/\p{Lu}{3}+/8BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
prop Lu {3}
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/\pL{2}+/8BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
prop L {2}
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/\p{Cc}{2}+/8BZ
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
prop Cc {2}
|
||||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
|
||||
/\x{c0}+\x{116}+/8i
|
||||
\x{c0}\x{e0}\x{116}\x{117}
|
||||
0: \x{c0}\x{e0}\x{116}\x{117}
|
||||
|
||||
/[\x{c0}\x{116}]+/8i
|
||||
\x{c0}\x{e0}\x{116}\x{117}
|
||||
0: \x{c0}\x{e0}\x{116}\x{117}
|
||||
|
||||
/\p{Carian}\p{Cham}\p{Kayah_Li}\p{Lepcha}\p{Lycian}\p{Lydian}\p{Ol_Chiki}\p{Rejang}\p{Saurashtra}\p{Sundanese}\p{Vai}/8
|
||||
\x{102A4}\x{AA52}\x{A91D}\x{1C46}\x{10283}\x{1092E}\x{1C6B}\x{A93B}\x{A8BF}\x{1BA0}\x{A50A}====
|
||||
0: \x{102a4}\x{aa52}\x{a91d}\x{1c46}\x{10283}\x{1092e}\x{1c6b}\x{a93b}\x{a8bf}\x{1ba0}\x{a50a}
|
||||
@ -1719,20 +1269,6 @@ No match
|
||||
0: aA
|
||||
1: a
|
||||
|
||||
/(\x{de})\1/8i
|
||||
\x{de}\x{de}
|
||||
0: \x{de}\x{de}
|
||||
1: \x{de}
|
||||
\x{de}\x{fe}
|
||||
0: \x{de}\x{fe}
|
||||
1: \x{de}
|
||||
\x{fe}\x{fe}
|
||||
0: \x{fe}\x{fe}
|
||||
1: \x{fe}
|
||||
\x{fe}\x{de}
|
||||
0: \x{fe}\x{de}
|
||||
1: \x{fe}
|
||||
|
||||
/(\x{10a})\1/8i
|
||||
\x{10a}\x{10a}
|
||||
0: \x{10a}\x{10a}
|
||||
@ -1757,4 +1293,4 @@ No match
|
||||
\x41\x20\x50\xC2\x54\xC9\x20\x54\x4F\x44\x41\x59
|
||||
0: A P\xc2T\xc9 TODAY
|
||||
|
||||
/ End of testinput6 /
|
||||
/-- End of testinput6 --/
|
||||
|
222
ext/pcre/pcrelib/testdata/testoutput7
vendored
@ -1,3 +1,6 @@
|
||||
/-- This set of tests check the DFA matching functionality of pcre_dfa_exec().
|
||||
The -dfa flag must be used with pcretest when running it. --/
|
||||
|
||||
/abc/
|
||||
abc
|
||||
0: abc
|
||||
@ -981,7 +984,7 @@ Partial match: abc
|
||||
xyzfo\P
|
||||
No match
|
||||
foob\P\>2
|
||||
Partial match: b
|
||||
Partial match: foob
|
||||
foobar...\R\P\>4
|
||||
0: ar
|
||||
xyzfo\P
|
||||
@ -7168,7 +7171,6 @@ No match
|
||||
|
||||
/a\R{2,4}b/I<bsr_anycrlf>
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: bsr_anycrlf
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
@ -7187,7 +7189,6 @@ No match
|
||||
|
||||
/a\R{2,4}b/I<bsr_unicode>
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
Options: bsr_unicode
|
||||
First char = 'a'
|
||||
Need char = 'b'
|
||||
@ -7370,4 +7371,217 @@ No match
|
||||
\C-"ab"
|
||||
0: "ab"
|
||||
|
||||
/ End of testinput7 /
|
||||
/\d+X|9+Y/
|
||||
++++123999\P
|
||||
Partial match: 123999
|
||||
++++123999Y\P
|
||||
0: 999Y
|
||||
|
||||
/Z(*F)/
|
||||
Z\P
|
||||
No match
|
||||
ZA\P
|
||||
No match
|
||||
|
||||
/Z(?!)/
|
||||
Z\P
|
||||
No match
|
||||
ZA\P
|
||||
No match
|
||||
|
||||
/dog(sbody)?/
|
||||
dogs\P
|
||||
0: dog
|
||||
dogs\P\P
|
||||
Partial match: dogs
|
||||
|
||||
/dog(sbody)??/
|
||||
dogs\P
|
||||
0: dog
|
||||
dogs\P\P
|
||||
Partial match: dogs
|
||||
|
||||
/dog|dogsbody/
|
||||
dogs\P
|
||||
0: dog
|
||||
dogs\P\P
|
||||
Partial match: dogs
|
||||
|
||||
/dogsbody|dog/
|
||||
dogs\P
|
||||
0: dog
|
||||
dogs\P\P
|
||||
Partial match: dogs
|
||||
|
||||
/Z(*F)Q|ZXY/
|
||||
Z\P
|
||||
Partial match: Z
|
||||
ZA\P
|
||||
No match
|
||||
X\P
|
||||
No match
|
||||
|
||||
/\bthe cat\b/
|
||||
the cat\P
|
||||
0: the cat
|
||||
the cat\P\P
|
||||
Partial match: the cat
|
||||
|
||||
/dog(sbody)?/
|
||||
dogs\D\P
|
||||
0: dog
|
||||
body\D\R
|
||||
0: body
|
||||
|
||||
/dog(sbody)?/
|
||||
dogs\D\P\P
|
||||
Partial match: dogs
|
||||
body\D\R
|
||||
0: body
|
||||
|
||||
/abc/
|
||||
abc\P
|
||||
0: abc
|
||||
abc\P\P
|
||||
0: abc
|
||||
|
||||
/abc\K123/
|
||||
xyzabc123pqr
|
||||
Error -16
|
||||
|
||||
/(?<=abc)123/
|
||||
xyzabc123pqr
|
||||
0: 123
|
||||
xyzabc12\P
|
||||
Partial match: abc12
|
||||
xyzabc12\P\P
|
||||
Partial match: abc12
|
||||
|
||||
/\babc\b/
|
||||
+++abc+++
|
||||
0: abc
|
||||
+++ab\P
|
||||
Partial match: +ab
|
||||
+++ab\P\P
|
||||
Partial match: +ab
|
||||
|
||||
/(?=C)/g+
|
||||
ABCDECBA
|
||||
0:
|
||||
0+ CDECBA
|
||||
0:
|
||||
0+ CBA
|
||||
|
||||
/(abc|def|xyz)/I
|
||||
Capturing subpattern count = 1
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
terhjk;abcdaadsfe
|
||||
0: abc
|
||||
the quick xyz brown fox
|
||||
0: xyz
|
||||
\Yterhjk;abcdaadsfe
|
||||
0: abc
|
||||
\Ythe quick xyz brown fox
|
||||
0: xyz
|
||||
** Failers
|
||||
No match
|
||||
thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
|
||||
No match
|
||||
\Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
|
||||
No match
|
||||
|
||||
/(abc|def|xyz)/SI
|
||||
Capturing subpattern count = 1
|
||||
No options
|
||||
No first char
|
||||
No need char
|
||||
Subject length lower bound = 3
|
||||
Starting byte set: a d x
|
||||
terhjk;abcdaadsfe
|
||||
0: abc
|
||||
the quick xyz brown fox
|
||||
0: xyz
|
||||
\Yterhjk;abcdaadsfe
|
||||
0: abc
|
||||
\Ythe quick xyz brown fox
|
||||
0: xyz
|
||||
** Failers
|
||||
No match
|
||||
thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
|
||||
No match
|
||||
\Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
|
||||
No match
|
||||
|
||||
/abcd*/+
|
||||
xxxxabcd\P
|
||||
0: abcd
|
||||
0+
|
||||
1: abc
|
||||
xxxxabcd\P\P
|
||||
Partial match: abcd
|
||||
dddxxx\R
|
||||
0: ddd
|
||||
0+ xxx
|
||||
1: dd
|
||||
2: d
|
||||
3:
|
||||
xxxxabcd\P\P
|
||||
Partial match: abcd
|
||||
xxx\R
|
||||
0:
|
||||
0+ xxx
|
||||
|
||||
/abcd*/i
|
||||
xxxxabcd\P
|
||||
0: abcd
|
||||
1: abc
|
||||
xxxxabcd\P\P
|
||||
Partial match: abcd
|
||||
XXXXABCD\P
|
||||
0: ABCD
|
||||
1: ABC
|
||||
XXXXABCD\P\P
|
||||
Partial match: ABCD
|
||||
|
||||
/abc\d*/
|
||||
xxxxabc1\P
|
||||
0: abc1
|
||||
1: abc
|
||||
xxxxabc1\P\P
|
||||
Partial match: abc1
|
||||
|
||||
/abc[de]*/
|
||||
xxxxabcde\P
|
||||
0: abcde
|
||||
1: abcd
|
||||
2: abc
|
||||
xxxxabcde\P\P
|
||||
Partial match: abcde
|
||||
|
||||
/(?:(?1)|B)(A(*F)|C)/
|
||||
ABCD
|
||||
0: BC
|
||||
CCD
|
||||
0: CC
|
||||
** Failers
|
||||
No match
|
||||
CAD
|
||||
No match
|
||||
|
||||
/^(?:(?1)|B)(A(*F)|C)/
|
||||
CCD
|
||||
0: CC
|
||||
BCD
|
||||
0: BC
|
||||
** Failers
|
||||
No match
|
||||
ABCD
|
||||
No match
|
||||
CAD
|
||||
No match
|
||||
BAD
|
||||
No match
|
||||
|
||||
/-- End of testinput7 --/
|
||||
|
44
ext/pcre/pcrelib/testdata/testoutput8
vendored
@ -1,8 +1,6 @@
|
||||
/-- Do not use the \x{} construct except with patterns that have the --/
|
||||
/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
|
||||
No match
|
||||
/-- that option is set. However, the latest Perls recognize them always. --/
|
||||
No match
|
||||
/-- This set of tests checks UTF-8 support with the DFA matching functionality
|
||||
of pcre_dfa_exec(). The -dfa flag must be used with pcretest when running
|
||||
it. --/
|
||||
|
||||
/\x{100}ab/8
|
||||
\x{100}ab
|
||||
@ -1288,4 +1286,38 @@ No match
|
||||
A\x{1ec5}ABCXYZ
|
||||
0: X
|
||||
|
||||
/ End of testinput 8 /
|
||||
/abcd*/8
|
||||
xxxxabcd\P
|
||||
0: abcd
|
||||
1: abc
|
||||
xxxxabcd\P\P
|
||||
Partial match: abcd
|
||||
|
||||
/abcd*/i8
|
||||
xxxxabcd\P
|
||||
0: abcd
|
||||
1: abc
|
||||
xxxxabcd\P\P
|
||||
Partial match: abcd
|
||||
XXXXABCD\P
|
||||
0: ABCD
|
||||
1: ABC
|
||||
XXXXABCD\P\P
|
||||
Partial match: ABCD
|
||||
|
||||
/abc\d*/8
|
||||
xxxxabc1\P
|
||||
0: abc1
|
||||
1: abc
|
||||
xxxxabc1\P\P
|
||||
Partial match: abc1
|
||||
|
||||
/abc[de]*/8
|
||||
xxxxabcde\P
|
||||
0: abcde
|
||||
1: abcd
|
||||
2: abc
|
||||
xxxxabcde\P\P
|
||||
Partial match: abcde
|
||||
|
||||
/-- End of testinput8 --/
|
||||
|
6
ext/pcre/pcrelib/testdata/testoutput9
vendored
@ -1,3 +1,7 @@
|
||||
/-- This set of tests check Unicode property support with the DFA matching
|
||||
functionality of pcre_dfa_exec(). The -dfa flag must be used with pcretest
|
||||
when running it. --/
|
||||
|
||||
/\pL\P{Nd}/8
|
||||
AB
|
||||
0: AB
|
||||
@ -1670,4 +1674,4 @@ No match
|
||||
\x{1d79}\x{a77d}
|
||||
No match
|
||||
|
||||
/ End /
|
||||
/-- End of testinput9 --/
|
||||
|
@ -84,7 +84,12 @@ recurse('pcrelib');
|
||||
|
||||
$dirorig = scandir('pcrelib/testdata');
|
||||
$k = array_search('CVS', $dirorig);
|
||||
unset($dirorig[$k]);
|
||||
if ($k !== false)
|
||||
unset($dirorig[$k]);
|
||||
|
||||
$k = array_search('.svn', $dirorig);
|
||||
if ($k !== false)
|
||||
unset($dirorig[$k]);
|
||||
|
||||
$dirnew = scandir("$newpcre/testdata");
|
||||
$diff = array_diff($dirorig, $dirnew);
|
||||
|