Update PCRE to 8.00

2024-11-28 12:26:37 +08:00 · 2009-11-03 12:15:03 +00:00
commit f03b175f7c
parent 26e3082abc
42 changed files with 7020 additions and 3397 deletions
--- a/ext/pcre/pcrelib/ChangeLog
+++ b/ext/pcre/pcrelib/ChangeLog
@@ -1,6 +1,170 @@
 ChangeLog for PCRE
 ------------------

+Version 8.00 19-Oct-09
+----------------------
+
+1.  The table for translating pcre_compile() error codes into POSIX error codes
+    was out-of-date, and there was no check on the pcre_compile() error code
+    being within the table. This could lead to an OK return being given in
+    error.
+
+2.  Changed the call to open a subject file in pcregrep from fopen(pathname,
+    "r") to fopen(pathname, "rb"), which fixed a problem with some of the tests
+    in a Windows environment.
+
+3.  The pcregrep --count option prints the count for each file even when it is
+    zero, as does GNU grep. However, pcregrep was also printing all files when
+    --files-with-matches was added. Now, when both options are given, it prints
+    counts only for those files that have at least one match. (GNU grep just
+    prints the file name in this circumstance, but including the count seems
+    more useful - otherwise, why use --count?) Also ensured that the
+    combination -clh just lists non-zero counts, with no names.
+
+4.  The long form of the pcregrep -F option was incorrectly implemented as
+    --fixed_strings instead of --fixed-strings. This is an incompatible change,
+    but it seems right to fix it, and I didn't think it was worth preserving
+    the old behaviour.
+
+5.  The command line items --regex=pattern and --regexp=pattern were not
+    recognized by pcregrep, which required --regex pattern or --regexp pattern
+    (with a space rather than an '='). The man page documented the '=' forms,
+    which are compatible with GNU grep; these now work.
+
+6.  No libpcreposix.pc file was created for pkg-config; there was just
+    libpcre.pc and libpcrecpp.pc. The omission has been rectified.
+
+7.  Added #ifndef SUPPORT_UCP into the pcre_ucd.c module, to reduce its size
+    when UCP support is not needed, by modifying the Python script that
+    generates it from Unicode data files. This should not matter if the module
+    is correctly used as a library, but I received one complaint about 50K of
+    unwanted data. My guess is that the person linked everything into his
+    program rather than using a library. Anyway, it does no harm.
+
+8.  A pattern such as /\x{123}{2,2}+/8 was incorrectly compiled; the trigger
+    was a minimum greater than 1 for a wide character in a possessive
+    repetition. The same bug could also affect patterns like /(\x{ff}{0,2})*/8
+    which had an unlimited repeat of a nested, fixed maximum repeat of a wide
+    character. Chaos in the form of incorrect output or a compiling loop could
+    result.
+
+9.  The restrictions on what a pattern can contain when partial matching is
+    requested for pcre_exec() have been removed. All patterns can now be
+    partially matched by this function. In addition, if there are at least two
+    slots in the offset vector, the offset of the earliest inspected character
+    for the match and the offset of the end of the subject are set in them when
+    PCRE_ERROR_PARTIAL is returned.
+
+10. Partial matching has been split into two forms: PCRE_PARTIAL_SOFT, which is
+    synonymous with PCRE_PARTIAL, for backwards compatibility, and
+    PCRE_PARTIAL_HARD, which causes a partial match to supersede a full match,
+    and may be more useful for multi-segment matching.
+
+11. Partial matching with pcre_exec() is now more intuitive. A partial match
+    used to be given if ever the end of the subject was reached; now it is
+    given only if matching could not proceed because another character was
+    needed. This makes a difference in some odd cases such as Z(*FAIL) with the
+    string "Z", which now yields "no match" instead of "partial match". In the
+    case of pcre_dfa_exec(), "no match" is given if every matching path for the
+    final character ended with (*FAIL).
+
+12. Restarting a match using pcre_dfa_exec() after a partial match did not work
+    if the pattern had a "must contain" character that was already found in the
+    earlier partial match, unless partial matching was again requested. For
+    example, with the pattern /dog.(body)?/, the "must contain" character is
+    "g". If the first part-match was for the string "dog", restarting with
+    "sbody" failed. This bug has been fixed.
+
+13. The string returned by pcre_dfa_exec() after a partial match has been
+    changed so that it starts at the first inspected character rather than the
+    first character of the match. This makes a difference only if the pattern
+    starts with a lookbehind assertion or \b or \B (\K is not supported by
+    pcre_dfa_exec()). It's an incompatible change, but it makes the two
+    matching functions compatible, and I think it's the right thing to do.
+
+14. Added a pcredemo man page, created automatically from the pcredemo.c file,
+    so that the demonstration program is easily available in environments where
+    PCRE has not been installed from source.
+
+15. Arranged to add -DPCRE_STATIC to cflags in libpcre.pc, libpcreposix.pc,
+    libpcrecpp.pc and pcre-config when PCRE is not compiled as a shared
+    library.
+
+16. Added REG_UNGREEDY to the pcreposix interface, at the request of a user.
+    It maps to PCRE_UNGREEDY. It is not, of course, POSIX-compatible, but it
+    is not the first non-POSIX option to be added. Clearly some people find
+    these options useful.
+
+17. If a caller to the POSIX matching function regexec() passes a non-zero
+    value for nmatch with a NULL value for pmatch, the value of
+    nmatch is forced to zero.
+
+18. RunGrepTest did not have a test for the availability of the -u option of
+    the diff command, as RunTest does. It now checks in the same way as
+    RunTest, and also checks for the -b option.
+
+19. If an odd number of negated classes containing just a single character
+    interposed, within parentheses, between a forward reference to a named
+    subpattern and the definition of the subpattern, compilation crashed with
+    an internal error, complaining that it could not find the referenced
+    subpattern. An example of a crashing pattern is /(?&A)(([^m])(?<A>))/.
+    [The bug was that it was starting one character too far in when skipping
+    over the character class, thus treating the ] as data rather than
+    terminating the class. This meant it could skip too much.]
+
+20. Added PCRE_NOTEMPTY_ATSTART in order to be able to correctly implement the
+    /g option in pcretest when the pattern contains \K, which makes it possible
+    to have an empty string match not at the start, even when the pattern is
+    anchored. Updated pcretest and pcredemo to use this option.
+
+21. If the maximum number of capturing subpatterns in a recursion was greater
+    than the maximum at the outer level, the higher number was returned, but
+    with unset values at the outer level. The correct (outer level) value is
+    now given.
+
+22. If (*ACCEPT) appeared inside capturing parentheses, previous releases of
+    PCRE did not set those parentheses (unlike Perl). I have now found a way to
+    make it do so. The string so far is captured, making this feature
+    compatible with Perl.
+
+23. The tests have been re-organized, adding tests 11 and 12, to make it
+    possible to check the Perl 5.10 features against Perl 5.10.
+
+24. Perl 5.10 allows subroutine calls in lookbehinds, as long as the subroutine
+    pattern matches a fixed length string. PCRE did not allow this; now it
+    does. Neither allows recursion.
+
+25. I finally figured out how to implement a request to provide the minimum
+    length of subject string that was needed in order to match a given pattern.
+    (It was back references and recursion that I had previously got hung up
+    on.) This code has now been added to pcre_study(); it finds a lower bound
+    to the length of subject needed. It is not necessarily the greatest lower
+    bound, but using it to avoid searching strings that are too short does give
+    some useful speed-ups. The value is available to calling programs via
+    pcre_fullinfo().
+
+26. While implementing 25, I discovered to my embarrassment that pcretest had
+    not been passing the result of pcre_study() to pcre_dfa_exec(), so the
+    study optimizations had never been tested with that matching function.
+    Oops. What is worse, even when it was passed study data, there was a bug in
+    pcre_dfa_exec() that meant it never actually used it. Double oops. There
+    were also very few tests of studied patterns with pcre_dfa_exec().
+
+27. If (?| is used to create subpatterns with duplicate numbers, they are now
+    allowed to have the same name, even if PCRE_DUPNAMES is not set. However,
+    on the other side of the coin, they are no longer allowed to have different
+    names, because these cannot be distinguished in PCRE, and this has caused
+    confusion. (This is a difference from Perl.)
+
+28. When duplicate subpattern names are present (necessarily with different
+    numbers, as required by 27 above), and a test is made by name in a
+    conditional pattern, either for a subpattern having been matched, or for
+    recursion in such a pattern, all the associated numbered subpatterns are
+    tested, and the overall condition is true if the condition is true for any
+    one of them. This is the way Perl works, and is also more like the way
+    testing by number works.
+
+
 Version 7.9 11-Apr-09
 ---------------------
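
Item 17 of the 8.00 ChangeLog entry describes a guard inside PCRE's POSIX wrapper: a non-zero nmatch with a NULL pmatch is treated as nmatch == 0. The same rule can be sketched on the caller side with the standard <regex.h> interface. This is only an illustration, not PCRE's code; regexec_checked() and matches_without_offsets() are hypothetical names introduced here.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical caller-side wrapper (not part of PCRE or POSIX): if no
   pmatch array is supplied, force nmatch to zero before calling regexec(),
   mirroring the rule described in ChangeLog item 17. */
static int regexec_checked(const regex_t *preg, const char *subject,
                           size_t nmatch, regmatch_t pmatch[], int eflags)
{
    assert(preg != NULL && subject != NULL);
    if (pmatch == NULL)
        nmatch = 0;   /* nowhere to store offsets, so ask for none */
    return regexec(preg, subject, nmatch, pmatch, eflags);
}

/* Hypothetical helper: returns 1 if pattern matches subject, 0 if it does
   not, and -1 if the pattern fails to compile. It deliberately passes a
   non-zero nmatch together with a NULL pmatch. */
static int matches_without_offsets(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec_checked(&re, subject, 4, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

A call such as matches_without_offsets("a+b", "xaab") only asks whether a match exists; no offsets are requested, so the NULL pmatch is never dereferenced.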

--- a/ext/pcre/pcrelib/HACKING
+++ b/ext/pcre/pcrelib/HACKING
@@ -67,22 +67,22 @@ many tests of the mode that might slow it down. So I re-factored the compiling
 functions to work this way. This got rid of about 600 lines of source. It
 should make future maintenance and development easier. As this was such a major 
 change, I never released 6.8, instead upping the number to 7.0 (other quite 
-major changes are also present in the 7.0 release).
+major changes were also present in the 7.0 release).

-A side effect of this work is that the previous limit of 200 on the nesting
+A side effect of this work was that the previous limit of 200 on the nesting
 depth of parentheses was removed. However, there is a downside: pcre_compile()
 runs more slowly than before (30% or more, depending on the pattern) because it
-is doing a full analysis of the pattern. My hope is that this is not a big
-issue.
+is doing a full analysis of the pattern. My hope was that this would not be a
+big issue, and in the event, nobody has commented on it.

 Traditional matching function
 -----------------------------

 The "traditional", and original, matching function is called pcre_exec(), and 
 it implements an NFA algorithm, similar to the original Henry Spencer algorithm 
-and the way that Perl works. Not surprising, since it is intended to be as 
-compatible with Perl as possible. This is the function most users of PCRE will 
-use most of the time.
+and the way that Perl works. This is not surprising, since it is intended to be
+as compatible with Perl as possible. This is the function most users of PCRE
+will use most of the time.

 Supplementary matching function
 -------------------------------
@@ -119,6 +119,7 @@ quantifiers) are always just two bytes long.

 A list of the opcodes follows:

+
 Opcodes with no following data
 ------------------------------

@@ -150,12 +151,12 @@ These items are all just one byte long
  OP_EXTUNI              match an extended Unicode character 
  OP_ANYNL               match any Unicode newline sequence 
  
-  OP_ACCEPT              )
-  OP_COMMIT              ) 
-  OP_FAIL                ) These are Perl 5.10's "backtracking     
-  OP_PRUNE               ) control verbs".                         
-  OP_SKIP                )
-  OP_THEN                )
+  OP_ACCEPT              ) These are Perl 5.10's "backtracking    
+  OP_COMMIT              ) control verbs". If OP_ACCEPT is inside
+  OP_FAIL                ) capturing parentheses, it may be preceded 
+  OP_PRUNE               ) by one or more OP_CLOSE, followed by a 2-byte 
+  OP_SKIP                ) number, indicating which parentheses must be
+  OP_THEN                ) closed.
  

 Repeating single characters
@@ -372,12 +373,15 @@ These are like other subpatterns, but they start with the opcode OP_COND, or
 OP_SCOND for one that might match an empty string in an unbounded repeat. If
 the condition is a back reference, this is stored at the start of the
 subpattern using the opcode OP_CREF followed by two bytes containing the
-reference number. If the condition is "in recursion" (coded as "(?(R)"), or "in
-recursion of group x" (coded as "(?(Rx)"), the group number is stored at the
-start of the subpattern using the opcode OP_RREF, and a value of zero for "the
-whole pattern". For a DEFINE condition, just the single byte OP_DEF is used (it
-has no associated data). Otherwise, a conditional subpattern always starts with
-one of the assertions.
+reference number. OP_NCREF is used instead if the reference was generated by 
+name (so that the runtime code knows to check for duplicate names).
+
+If the condition is "in recursion" (coded as "(?(R)"), or "in recursion of
+group x" (coded as "(?(Rx)"), the group number is stored at the start of the
+subpattern using the opcode OP_RREF or OP_NRREF (cf OP_NCREF), and a value of
+zero for "the whole pattern". For a DEFINE condition, just the single byte
+OP_DEF is used (it has no associated data). Otherwise, a conditional subpattern
+always starts with one of the assertions.


 Recursion
@@ -415,4 +419,4 @@ at compile time, and so does not cause anything to be put into the compiled
 data.

 Philip Hazel
-April 2008
+October 2009
--- a/ext/pcre/pcrelib/LICENCE
+++ b/ext/pcre/pcrelib/LICENCE
@@ -4,7 +4,7 @@ PCRE LICENCE
 PCRE is a library of functions to support regular expressions whose syntax
 and semantics are as close as possible to those of the Perl 5 language.

-Release 7 of PCRE is distributed under the terms of the "BSD" licence, as
+Release 8 of PCRE is distributed under the terms of the "BSD" licence, as
 specified below. The documentation for PCRE, supplied in the "doc"
 directory, is distributed under the same terms as the software itself.

--- a/ext/pcre/pcrelib/NEWS
+++ b/ext/pcre/pcrelib/NEWS
@@ -1,6 +1,21 @@
 News about PCRE releases
 ------------------------

+Release 8.00 19-Oct-09
+----------------------
+
+Bugs have been fixed in the library and in pcregrep. There are also some
+enhancements. Restrictions on patterns used for partial matching have been
+removed, extra information is given for partial matches, the partial matching
+process has been improved, and an option to make a partial match override a
+full match is available. The "study" process has been enhanced by finding a
+lower bound matching length. Groups with duplicate numbers may now have
+duplicated names without the use of PCRE_DUPNAMES. However, they may not have
+different names. The documentation has been revised to reflect these changes.
+The version number has been expanded to 3 digits as it is clear that the rate
+of change is not slowing down.
+
+
 Release 7.9 11-Apr-09
 ---------------------

--- a/ext/pcre/pcrelib/NON-UNIX-USE
+++ b/ext/pcre/pcrelib/NON-UNIX-USE
@@ -12,9 +12,10 @@ This document contains the following sections:
  Comments about Win32 builds
  Building PCRE on Windows with CMake
  Use of relative paths with CMake on Windows
-  Testing with runtest.bat
+  Testing with RunTest.bat
  Building under Windows with BCC5.5
  Building PCRE on OpenVMS
+  Building PCRE on Stratus OpenVOS


 GENERAL
@@ -36,10 +37,10 @@ wrapper functions are a separate issue (see below).

 The PCRE distribution includes a "configure" file for use by the Configure/Make
 build system, as found in many Unix-like environments. There is also support
-support for CMake, which some users prefer, in particular in Windows
-environments. There are some instructions for CMake under Windows in the
-section entitled "Building PCRE with CMake" below. CMake can also be used to
-build PCRE in Unix-like systems.
+for CMake, which some users prefer, especially in Windows environments.
+There are some instructions for CMake under Windows in the section entitled
+"Building PCRE with CMake" below. CMake can also be used to build PCRE in
+Unix-like systems.


 GENERIC INSTRUCTIONS FOR THE PCRE C LIBRARY
@@ -278,40 +279,42 @@ things in this area in future.

 BUILDING PCRE ON WINDOWS WITH CMAKE

-CMake is an alternative build facility that can be used instead of the
-traditional Unix "configure". CMake version 2.4.7 supports Borland makefiles,
-MinGW makefiles, MSYS makefiles, NMake makefiles, UNIX makefiles, Visual Studio
-6, Visual Studio 7, Visual Studio 8, and Watcom W8. The following instructions
+CMake is an alternative configuration facility that can be used instead of the
+traditional Unix "configure". CMake creates project files (make files, solution
+files, etc.) tailored to numerous development environments, including Visual
+Studio, Borland, Msys, MinGW, NMake, and Unix. The following instructions
 were contributed by a PCRE user.

-1.  Download CMake 2.4.7 or above from http://www.cmake.org/, install and ensure
-    that cmake\bin is on your path.
+1.  Install the latest CMake version available from http://www.cmake.org/, and
+    ensure that cmake\bin is on your path.

 2.  Unzip (retaining folder structure) the PCRE source tree into a source
    directory such as C:\pcre.

-3.  Create a new, empty build directory: C:\pcre\build\
+3.  Create a new, empty build directory, for example C:\pcre\build\

-4.  Run CMakeSetup from the Shell envirornment of your build tool, e.g., Msys
-    for Msys/MinGW or Visual Studio Command Prompt for VC/VC++
+4.  Run cmake-gui from the Shell environment of your build tool, for example,
+    Msys for Msys/MinGW or Visual Studio Command Prompt for VC/VC++.

 5.  Enter C:\pcre\pcre-xx and C:\pcre\build for the source and build
-    directories, respectively
+    directories, respectively.

 6.  Hit the "Configure" button.

-7.  Select the particular IDE / build tool that you are using (Visual Studio,
-    MSYS makefiles, MinGW makefiles, etc.)
+7.  Select the particular IDE / build tool that you are using (Visual
+    Studio, MSYS makefiles, MinGW makefiles, etc.)

-8.  The GUI will then list several configuration options. This is where you can
-    enable UTF-8 support, etc.
+8.  The GUI will then list several configuration options. This is where
+    you can enable UTF-8 support or other PCRE optional features.

-9.  Hit "Configure" again. The adjacent "OK" button should now be active.
+9.  Hit "Configure" again. The adjacent "Generate" button should now be
+    active.

-10. Hit "OK".
+10. Hit "Generate".

 11. The build directory should now contain a usable build system, be it a
-    solution file for Visual Studio, makefiles for MinGW, etc.
+    solution file for Visual Studio, makefiles for MinGW, etc. Exit from
+    cmake-gui and use the generated build system with your compiler or IDE.


 USE OF RELATIVE PATHS WITH CMAKE ON WINDOWS
@@ -444,5 +447,52 @@ $!   Locale could not be set to fr
 $!
 =========================

-Last Updated: 17 March 2009
+
+BUILDING PCRE ON STRATUS OPENVOS
+
+These notes on the port of PCRE to VOS (lightly edited) were supplied by
+Ashutosh Warikoo, whose email address has the local part awarikoo and the
+domain nse.co.in. The port was for version 7.9 in August 2009.
+
+1.   Building PCRE
+
+I built pcre on OpenVOS Release 17.0.1at using GNU Tools 3.4a without any
+problems. I used the following packages to build PCRE:
+
+  ftp://ftp.stratus.com/pub/vos/posix/ga/posix.save.evf.gz
+
+Please read and follow the instructions that come with these packages. To start
+the build of pcre, from the root of the package type:
+
+  ./build.sh
+
+2. Installing PCRE
+
+Once you have successfully built PCRE, login to the SysAdmin group, switch to
+the root user, and type
+
+  [ !create_dir (master_disk)>usr   --if needed ]
+  [ !create_dir (master_disk)>usr>local   --if needed ]
+    !gmake install
+
+This installs PCRE and its man pages into /usr/local. You can add
+(master_disk)>usr>local>bin to your command search paths, or if you are in
+BASH, add /usr/local/bin to the PATH environment variable.
+
+4. Restrictions
+
+This port requires readline library optionally. However during the build I
+faced some yet unexplored errors while linking with readline. As it was an
+optional component I chose to disable it.
+
+5. Known Problems
+
+I ran the test suite, but you will have to be your own judge of whether this
+command, and this port, suits your purposes. If you find any problems that
+appear to be related to the port itself, please let me know. Please see the
+build.log file in the root of the package also.
+
+
+=========================
+Last Updated: 05 October 2009
 ****
--- a/ext/pcre/pcrelib/README
+++ b/ext/pcre/pcrelib/README
@@ -24,6 +24,7 @@ The contents of this README file are:
  Shared libraries on Unix-like systems
  Cross-compiling on Unix-like systems
  Using HP's ANSI C++ compiler (aCC)
+  Using PCRE from MySQL
  Making new tarballs
  Testing PCRE
  Character tables
@@ -111,8 +112,8 @@ Building PCRE on non-Unix systems
 For a non-Unix system, please read the comments in the file NON-UNIX-USE,
 though if your system supports the use of "configure" and "make" you may be
 able to build PCRE in the same way as for Unix-like systems. PCRE can also be
-configured in many platform environments using the GUI facility of CMake's
-CMakeSetup. It creates Makefiles, solution files, etc.
+configured in many platform environments using the GUI facility provided by
+CMake's cmake-gui command. This creates Makefiles, solution files, etc.

 PCRE has been compiled on many different operating systems. It should be
 straightforward to build PCRE on any system that has a Standard C compiler and
@@ -478,6 +479,26 @@ running the "configure" script:
  CXXLDFLAGS="-lstd_v2 -lCsup_v2"


+Using Sun's compilers for Solaris
+---------------------------------
+
+A user reports that the following configurations work on Solaris 9 sparcv9 and
+Solaris 9 x86 (32-bit):
+
+  Solaris 9 sparcv9: ./configure --disable-cpp CC=/bin/cc CFLAGS="-m64 -g"
+  Solaris 9 x86:     ./configure --disable-cpp CC=/bin/cc CFLAGS="-g"
+
+
+Using PCRE from MySQL
+---------------------
+
+On systems where both PCRE and MySQL are installed, it is possible to make use
+of PCRE from within MySQL, as an alternative to the built-in pattern matching.
+There is a web page that tells you how to do this:
+
+  http://www.mysqludf.org/lib_mysqludf_preg/index.php
+
+
 Making new tarballs
 -------------------

@@ -553,22 +574,32 @@ document entitled NON-UNIX-USE.]

 The fourth test checks the UTF-8 support. It is not run automatically unless
 PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
-running "configure". This file can be also fed directly to the perltest script,
-provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
-commented in the script, can be be used.)
+running "configure". This file can be also fed directly to the perltest.pl
+script, provided you are running Perl 5.8 or higher.

 The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
 features of PCRE that are not relevant to Perl.

-The sixth test checks the support for Unicode character properties. It it not
-run automatically unless PCRE is built with Unicode property support. To to
-this you must set --enable-unicode-properties when running "configure".
+The sixth test (which is Perl-5.10 compatible) checks the support for Unicode
+character properties. It is not run automatically unless PCRE is built with
+Unicode property support. To do this you must set --enable-unicode-properties
+when running "configure".

 The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
 matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
 property support, respectively. The eighth and ninth tests are not run
 automatically unless PCRE is build with the relevant support.

+The tenth test checks some internal offsets and code size features; it is run
+only when the default "link size" of 2 is set (in other cases the sizes
+change).
+
+The eleventh test checks out features that are new in Perl 5.10, and the
+twelfth test checks a number of internals and non-Perl features concerned with
+Unicode property support. It is not run automatically unless PCRE is built with
+Unicode property support. To do this you must set --enable-unicode-properties
+when running "configure".
+

 Character tables
 ----------------
@@ -712,7 +743,7 @@ The distribution should contain the following files:
                          )   "configure" and config.h
  depcomp                 ) script to find program dependencies, generated by
                          )   automake
-  doc/*.3                 man page sources for the PCRE functions
+  doc/*.3                 man page sources for PCRE
  doc/*.1                 man page sources for pcregrep and pcretest
  doc/index.html.src      the base HTML page
  doc/html/*              HTML documentation
@@ -721,6 +752,7 @@ The distribution should contain the following files:
  doc/perltest.txt        plain text documentation of Perl test program
  install-sh              a shell script for installing files
  libpcre.pc.in           template for libpcre.pc for pkg-config
+  libpcreposix.pc.in      template for libpcreposix.pc for pkg-config
  libpcrecpp.pc.in        template for libpcrecpp.pc for pkg-config
  ltmain.sh               file used to build a libtool script
  missing                 ) common stub for a few missing GNU programs while
@@ -764,4 +796,4 @@ The distribution should contain the following files:
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 21 March 2009
+Last updated: 19 October 2009
--- a/ext/pcre/pcrelib/config.h
+++ b/ext/pcre/pcrelib/config.h
@@ -196,6 +196,12 @@ them both to 0; an emulation function will be used. */
 #define LINK_SIZE 2
 #endif

+/* Define to the sub-directory in which libtool stores uninstalled libraries.
+   */
+#ifndef LT_OBJDIR
+#define LT_OBJDIR ".libs/"
+#endif
+
 /* The value of MATCH_LIMIT determines the default number of times the
   internal match() function can be called during a single execution of
   pcre_exec(). There is a runtime interface for setting a different limit.
@@ -262,13 +268,13 @@ them both to 0; an emulation function will be used. */
 #define PACKAGE_NAME "PCRE"

 /* Define to the full name and version of this package. */
-#define PACKAGE_STRING "PCRE 7.9"
+#define PACKAGE_STRING "PCRE 8.00"

 /* Define to the one symbol short name of this package. */
 #define PACKAGE_TARNAME "pcre"

 /* Define to the version of this package. */
-#define PACKAGE_VERSION "7.9"
+#define PACKAGE_VERSION "8.00"


 /* If you are compiling for a system other than a Unix-like system or
@@ -324,7 +330,7 @@ them both to 0; an emulation function will be used. */

 /* Version number of package */
 #ifndef VERSION
-#define VERSION "7.9"
+#define VERSION "8.00"
 #endif

 /* Define to empty if `const' does not conform to ANSI C. */
--- a/ext/pcre/pcrelib/doc/pcre.txt
+++ b/ext/pcre/pcrelib/doc/pcre.txt
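
The config.h strings above read "8.00", while the pcre.h hunk that follows defines PCRE_MINOR as the bare token 00. A small sketch of how those two views can coexist: as a number, 00 is an octal literal equal to zero, but stringifying the token preserves both digits. The macro values are copied from this commit's pcre.h; STRING, XSTRING, version_text() and version_at_least() are names chosen for this illustration and are not claimed to be PCRE's own implementation.

```c
#include <assert.h>
#include <string.h>

/* Values as set by this commit in pcre.h. */
#define PCRE_MAJOR 8
#define PCRE_MINOR 00

/* Two-level stringification so the macro is expanded before '#' applies. */
#define STRING(a)  #a
#define XSTRING(s) STRING(s)

/* Illustrative helper: builds the textual version from the macro tokens.
   Adjacent string literals are concatenated by the compiler. */
static const char *version_text(void)
{
    return XSTRING(PCRE_MAJOR) "." XSTRING(PCRE_MINOR);
}

/* Illustrative helper: numeric comparison, where the token 00 is simply
   the integer zero. */
static int version_at_least(int major, int minor)
{
    assert(major >= 0 && minor >= 0);
    return PCRE_MAJOR > major ||
           (PCRE_MAJOR == major && PCRE_MINOR >= minor);
}
```

With these definitions the text form keeps the two-digit minor, while a numeric check such as version_at_least(7, 9) still behaves as expected for code that compares against the 7.x series.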
--- a/ext/pcre/pcrelib/pcre.h
+++ b/ext/pcre/pcrelib/pcre.h
@@ -41,10 +41,10 @@ POSSIBILITY OF SUCH DAMAGE.

 /* The current PCRE version information. */

-#define PCRE_MAJOR          7
-#define PCRE_MINOR          9
+#define PCRE_MAJOR          8
+#define PCRE_MINOR          00
 #define PCRE_PRERELEASE     
-#define PCRE_DATE           2009-04-11
+#define PCRE_DATE           2009-10-19

 /* When an application links to a PCRE DLL in Windows, the symbols that are
 imported have to be identified as such. When building PCRE, the appropriate
@@ -113,7 +113,8 @@ both, so we keep them all distinct. */
 #define PCRE_NO_AUTO_CAPTURE    0x00001000
 #define PCRE_NO_UTF8_CHECK      0x00002000
 #define PCRE_AUTO_CALLOUT       0x00004000
-#define PCRE_PARTIAL            0x00008000
+#define PCRE_PARTIAL_SOFT       0x00008000
+#define PCRE_PARTIAL            0x00008000  /* Backwards compatible synonym */
 #define PCRE_DFA_SHORTEST       0x00010000
 #define PCRE_DFA_RESTART        0x00020000
 #define PCRE_FIRSTLINE          0x00040000
@@ -128,6 +129,8 @@ both, so we keep them all distinct. */
 #define PCRE_JAVASCRIPT_COMPAT  0x02000000
 #define PCRE_NO_START_OPTIMIZE  0x04000000
 #define PCRE_NO_START_OPTIMISE  0x04000000
+#define PCRE_PARTIAL_HARD       0x08000000
+#define PCRE_NOTEMPTY_ATSTART   0x10000000

 /* Exec-time and get/set-time error codes */

@@ -174,6 +177,7 @@ both, so we keep them all distinct. */
 #define PCRE_INFO_OKPARTIAL         12
 #define PCRE_INFO_JCHANGED          13
 #define PCRE_INFO_HASCRORLF         14
+#define PCRE_INFO_MINLENGTH         15

 /* Request types for pcre_config(). Do not re-arrange, in order to remain
 compatible. */
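
The pcre.h hunks above keep PCRE_PARTIAL as a synonym for the new PCRE_PARTIAL_SOFT and add PCRE_PARTIAL_HARD as a separate bit. A minimal sketch of how a caller might classify an options word follows. The macro values are copied from the diff so that the block stands alone (real code would include pcre.h instead of redefining them); partial_mode() is a hypothetical helper, and letting the hard bit win when both bits are set is an assumption of this sketch.

```c
#include <assert.h>

/* Values copied from the pcre.h hunks above, for a standalone example. */
#define PCRE_PARTIAL_SOFT       0x00008000
#define PCRE_PARTIAL            0x00008000  /* Backwards compatible synonym */
#define PCRE_PARTIAL_HARD       0x08000000
#define PCRE_NOTEMPTY_ATSTART   0x10000000

/* Hypothetical helper: returns 2 for hard partial matching, 1 for soft,
   0 when neither bit is set. Assumption: hard takes precedence if both
   bits are present. */
static int partial_mode(unsigned long options)
{
    /* The old and new names for soft partial matching must be one bit. */
    assert(PCRE_PARTIAL == PCRE_PARTIAL_SOFT);

    if ((options & PCRE_PARTIAL_HARD) != 0)
        return 2;
    if ((options & PCRE_PARTIAL_SOFT) != 0)
        return 1;
    return 0;
}
```

Existing callers that pass PCRE_PARTIAL get the soft behaviour unchanged, because both names expand to the same bit; the two new options occupy bits that were previously unused in this list.
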
--- a/ext/pcre/pcrelib/pcre_compile.c
+++ b/ext/pcre/pcrelib/pcre_compile.c
@@ -339,7 +339,9 @@ static const char error_texts[] =
  "number is too big\0"
  "subpattern name expected\0"
  "digit expected after (?+\0"
-  "] is an invalid data character in JavaScript compatibility mode";
+  "] is an invalid data character in JavaScript compatibility mode\0"
+  /* 65 */
+  "different names for subpatterns of the same number are not allowed";


 /* Table to identify digits and hex digits. This is used when compiling
@@ -1098,6 +1100,7 @@ if (ptr[0] == CHAR_LEFT_PARENTHESIS)
      if (name != NULL && lorn == ptr - thisname &&
          strncmp((const char *)name, (const char *)thisname, lorn) == 0)
        return *count;
+      term++;
      }
    }
  }
@@ -1132,19 +1135,21 @@ for (; *ptr != 0; ptr++)
    BOOL negate_class = FALSE;
    for (;;)
      {
-      int c = *(++ptr);
-      if (c == CHAR_BACKSLASH)
+      if (ptr[1] == CHAR_BACKSLASH)
        {
-        if (ptr[1] == CHAR_E)
-          ptr++;
-        else if (strncmp((const char *)ptr+1,
+        if (ptr[2] == CHAR_E)
+          ptr+= 2;
+        else if (strncmp((const char *)ptr+2,
                 STR_Q STR_BACKSLASH STR_E, 3) == 0)
-          ptr += 3;
+          ptr += 4;
        else
          break;
        }
-      else if (!negate_class && c == CHAR_CIRCUMFLEX_ACCENT)
+      else if (!negate_class && ptr[1] == CHAR_CIRCUMFLEX_ACCENT)
+        {
        negate_class = TRUE;
+        ptr++;
+        }
      else break;
      }

@@ -1310,7 +1315,9 @@ for (;;)

    case OP_CALLOUT:
    case OP_CREF:
+    case OP_NCREF:
    case OP_RREF:
+    case OP_NRREF:
    case OP_DEF:
    code += _pcre_OP_lengths[*code];
    break;
@@ -1326,23 +1333,34 @@ for (;;)


 /*************************************************
-*        Find the fixed length of a pattern      *
+*        Find the fixed length of a branch       *
 *************************************************/

-/* Scan a pattern and compute the fixed length of subject that will match it,
+/* Scan a branch and compute the fixed length of subject that will match it,
 if the length is fixed. This is needed for dealing with backward assertions.
-In UTF8 mode, the result is in characters rather than bytes.
+In UTF8 mode, the result is in characters rather than bytes. The branch is
+temporarily terminated with OP_END when this function is called.
+
+This function is called when a backward assertion is encountered, so that if it
+fails, the error message can point to the correct place in the pattern.
+However, we cannot do this when the assertion contains subroutine calls,
+because they can be forward references. We solve this by remembering this case
+and doing the check at the end; a flag specifies which mode we are running in.

 Arguments:
  code     points to the start of the pattern (the bracket)
  options  the compiling options
+  atend    TRUE if called when the pattern is complete
+  cd       the "compile data" structure

-Returns:   the fixed length, or -1 if there is no fixed length,
+Returns:   the fixed length,
+             or -1 if there is no fixed length,
             or -2 if \C was encountered
+             or -3 if an OP_RECURSE item was encountered and atend is FALSE
 */

 static int
-find_fixedlength(uschar *code, int options)
+find_fixedlength(uschar *code, int options, BOOL atend, compile_data *cd)
 {
 int length = -1;

@ -1355,6 +1373,7 @@ branch, check the length against that of the other branches. */
 for (;;)
  {
  int d;
+  uschar *ce, *cs;
  register int op = *cc;
  switch (op)
    {
@ -1362,7 +1381,7 @@ for (;;)
    case OP_BRA:
    case OP_ONCE:
    case OP_COND:
-    d = find_fixedlength(cc + ((op == OP_CBRA)? 2:0), options);
+    d = find_fixedlength(cc + ((op == OP_CBRA)? 2:0), options, atend, cd);
    if (d < 0) return d;
    branchlength += d;
    do cc += GET(cc, 1); while (*cc == OP_ALT);
@ -1385,6 +1404,21 @@ for (;;)
    branchlength = 0;
    break;

+    /* A true recursion implies not fixed length, but a subroutine call may
+    be OK. If the subroutine is a forward reference, we can't deal with
+    it until the end of the pattern, so return -3. */
+
+    case OP_RECURSE:
+    if (!atend) return -3;
+    cs = ce = (uschar *)cd->start_code + GET(cc, 1);  /* Start subpattern */
+    do ce += GET(ce, 1); while (*ce == OP_ALT);       /* End subpattern */
+    if (cc > cs && cc < ce) return -1;                /* Recursion */
+    d = find_fixedlength(cs + 2, options, atend, cd);
+    if (d < 0) return d;
+    branchlength += d;
+    cc += 1 + LINK_SIZE;
+    break;
+
    /* Skip over assertive subpatterns */

    case OP_ASSERT:
@ -1398,7 +1432,9 @@ for (;;)

    case OP_REVERSE:
    case OP_CREF:
+    case OP_NCREF:
    case OP_RREF:
+    case OP_NRREF:
    case OP_DEF:
    case OP_OPT:
    case OP_CALLOUT:
@ -1421,10 +1457,8 @@ for (;;)
    branchlength++;
    cc += 2;
 #ifdef SUPPORT_UTF8
-    if ((options & PCRE_UTF8) != 0)
-      {
-      while ((*cc & 0xc0) == 0x80) cc++;
-      }
+    if ((options & PCRE_UTF8) != 0 && cc[-1] >= 0xc0)
+      cc += _pcre_utf8_table4[cc[-1] & 0x3f];
 #endif
    break;

@ -1435,10 +1469,8 @@ for (;;)
    branchlength += GET2(cc,1);
    cc += 4;
 #ifdef SUPPORT_UTF8
-    if ((options & PCRE_UTF8) != 0)
-      {
-      while((*cc & 0x80) == 0x80) cc++;
-      }
+    if ((options & PCRE_UTF8) != 0 && cc[-1] >= 0xc0)
+      cc += _pcre_utf8_table4[cc[-1] & 0x3f];
 #endif
    break;
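
Note on the two UTF-8 hunks above: both replace a scan for continuation bytes
with a single lookup keyed on the lead byte (cc[-1]). The following is a
minimal sketch of that idea outside PCRE. utf8_extra_bytes() is a hypothetical
stand-in for _pcre_utf8_table4 that uses comparisons instead of a table, and
count_chars() is illustrative only; neither name exists in the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for _pcre_utf8_table4[lead & 0x3f]: the number of
bytes that follow a UTF-8 lead byte. Bytes below 0xc0 have no followers,
which mirrors the "cc[-1] >= 0xc0" guard in the diff. */

static int
utf8_extra_bytes(unsigned char lead)
{
if (lead < 0xc0) return 0;
if (lead < 0xe0) return 1;
if (lead < 0xf0) return 2;
if (lead < 0xf8) return 3;
if (lead < 0xfc) return 4;
return 5;
}

/* Count the characters in a buffer by stepping from lead byte to lead byte,
never inspecting the bytes in between. */

static size_t
count_chars(const unsigned char *p, size_t len)
{
size_t n = 0, i = 0;
assert(p != NULL);
while (i < len)
  {
  i += 1 + (size_t)utf8_extra_bytes(p[i]);
  n++;
  }
return n;
}
```

For instance, count_chars() on the six bytes 61 c3 a9 e2 82 ac steps 1, 2 and
3 bytes in turn and reports three characters.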

@ -1517,22 +1549,25 @@ for (;;)


 /*************************************************
-*    Scan compiled regex for numbered bracket    *
+*    Scan compiled regex for specific bracket    *
 *************************************************/

 /* This little function scans through a compiled pattern until it finds a
-capturing bracket with the given number.
+capturing bracket with the given number, or, if the number is negative, an
+instance of OP_REVERSE for a lookbehind. The function is global in the C sense
+so that it can be called from pcre_study() when finding the minimum matching
+length.

 Arguments:
  code        points to start of expression
  utf8        TRUE in UTF-8 mode
-  number      the required bracket number
+  number      the required bracket number or negative to find a lookbehind

 Returns:      pointer to the opcode for the bracket, or NULL if not found
 */

-static const uschar *
-find_bracket(const uschar *code, BOOL utf8, int number)
+const uschar *
+_pcre_find_bracket(const uschar *code, BOOL utf8, int number)
 {
 for (;;)
  {
@ -1545,6 +1580,14 @@ for (;;)

  if (c == OP_XCLASS) code += GET(code, 1);

+  /* Handle lookbehind */
+
+  else if (c == OP_REVERSE)
+    {
+    if (number < 0) return (uschar *)code;
+    code += _pcre_OP_lengths[c];
+    }
+
  /* Handle capturing bracket */

  else if (c == OP_CBRA)
@ -1910,10 +1953,13 @@ for (code = first_significant_code(code + _pcre_OP_lengths[*code], NULL, 0, TRUE
    case OP_QUERY:
    case OP_MINQUERY:
    case OP_POSQUERY:
+    if (utf8 && code[1] >= 0xc0) code += _pcre_utf8_table4[code[1] & 0x3f];
+    break;
+
    case OP_UPTO:
    case OP_MINUPTO:
    case OP_POSUPTO:
-    if (utf8) while ((code[2] & 0xc0) == 0x80) code++;
+    if (utf8 && code[3] >= 0xc0) code += _pcre_utf8_table4[code[3] & 0x3f];
    break;
 #endif
    }
@ -3867,10 +3913,15 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */

      if (repeat_max == 0) goto END_REPEAT;

+      /*--------------------------------------------------------------------*/
+      /* This code is obsolete from release 8.00; the restriction was finally
+      removed: */
+
      /* All real repeats make it impossible to handle partial matching (maybe
      one day we will be able to remove this restriction). */

-      if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL;
+      /* if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL; */
+      /*--------------------------------------------------------------------*/

      /* Combine the op_type with the repeat_type */

@ -4017,10 +4068,15 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
        goto END_REPEAT;
        }

+      /*--------------------------------------------------------------------*/
+      /* This code is obsolete from release 8.00; the restriction was finally
+      removed: */
+
      /* All real repeats make it impossible to handle partial matching (maybe
      one day we will be able to remove this restriction). */

-      if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL;
+      /* if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL; */
+      /*--------------------------------------------------------------------*/

      if (repeat_min == 0 && repeat_max == -1)
        *code++ = OP_CRSTAR + repeat_type;
@ -4335,11 +4391,20 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
    if (possessive_quantifier)
      {
      int len;
-      if (*tempcode == OP_EXACT || *tempcode == OP_TYPEEXACT ||
-          *tempcode == OP_NOTEXACT)
+
+      if (*tempcode == OP_TYPEEXACT)
        tempcode += _pcre_OP_lengths[*tempcode] +
-          ((*tempcode == OP_TYPEEXACT &&
-             (tempcode[3] == OP_PROP || tempcode[3] == OP_NOTPROP))? 2:0);
+          ((tempcode[3] == OP_PROP || tempcode[3] == OP_NOTPROP)? 2 : 0);
+
+      else if (*tempcode == OP_EXACT || *tempcode == OP_NOTEXACT)
+        {
+        tempcode += _pcre_OP_lengths[*tempcode];
+#ifdef SUPPORT_UTF8
+        if (utf8 && tempcode[-1] >= 0xc0)
+          tempcode += _pcre_utf8_table4[tempcode[-1] & 0x3f];
+#endif
+        }
+
      len = code - tempcode;
      if (len > 0) switch (*tempcode)
        {
@ -4417,8 +4482,19 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
        if (namelen == verbs[i].len &&
            strncmp((char *)name, vn, namelen) == 0)
          {
-          *code = verbs[i].op;
-          if (*code++ == OP_ACCEPT) cd->had_accept = TRUE;
+          /* Check for open captures before ACCEPT */
+
+          if (verbs[i].op == OP_ACCEPT)
+            {
+            open_capitem *oc;
+            cd->had_accept = TRUE;
+            for (oc = cd->open_caps; oc != NULL; oc = oc->next)
+              {
+              *code++ = OP_CLOSE;
+              PUT2INC(code, 0, oc->number);
+              }
+            }
+          *code++ = verbs[i].op;
          break;
          }
        vn += verbs[i].len + 1;
@ -4580,7 +4656,10 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
          }

        /* Otherwise (did not start with "+" or "-"), start by looking for the
-        name. */
+        name. If we find a name, add one to the opcode to change OP_CREF or
+        OP_RREF into OP_NCREF or OP_NRREF. These behave exactly the same,
+        except they record that the reference was originally to a name. The
+        information is used to check duplicate names. */

        slot = cd->name_table;
        for (i = 0; i < cd->names_found; i++)
@ -4595,6 +4674,7 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
          {
          recno = GET2(slot, 0);
          PUT2(code, 2+LINK_SIZE, recno);
+          code[1+LINK_SIZE]++;
          }

        /* Search the pattern for a forward reference */
@ -4603,6 +4683,7 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
                        (options & PCRE_EXTENDED) != 0)) > 0)
          {
          PUT2(code, 2+LINK_SIZE, i);
+          code[1+LINK_SIZE]++;
          }

        /* If terminator == 0 it means that the name followed directly after
@ -4795,11 +4876,24 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
              }
            }

-          /* In the real compile, create the entry in the table */
+          /* In the real compile, create the entry in the table, maintaining
+          alphabetical order. Duplicate names for different numbers are
+          permitted only if PCRE_DUPNAMES is set. Duplicate names for the same
+          number are always OK. (An existing number can be re-used if (?|
+          appears in the pattern.) In either event, a duplicate name results in
+          a duplicate entry in the table, even if the number is the same. This
+          is because the number of names, and hence the table size, is computed
+          in the pre-compile, and it affects various numbers and pointers which
+          would all have to be modified, and the compiled code moved down, if
+          duplicates with the same number were omitted from the table. This
+          doesn't seem worth the hassle. However, *different* names for the
+          same number are not permitted. */

          else
            {
+            BOOL dupname = FALSE;
            slot = cd->name_table;
+
            for (i = 0; i < cd->names_found; i++)
              {
              int crc = memcmp(name, slot+2, namelen);
@ -4807,33 +4901,66 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
                {
                if (slot[2+namelen] == 0)
                  {
-                  if ((options & PCRE_DUPNAMES) == 0)
+                  if (GET2(slot, 0) != cd->bracount + 1 &&
+                      (options & PCRE_DUPNAMES) == 0)
                    {
                    *errorcodeptr = ERR43;
                    goto FAILED;
                    }
+                  else dupname = TRUE;
                  }
-                else crc = -1;      /* Current name is substring */
+                else crc = -1;      /* Current name is a substring */
                }
+
+              /* Make space in the table and break the loop for an earlier
+              name. For a duplicate or later name, carry on. We do this for
+              duplicates so that in the simple case (when (?| is not used) they
+              are in order of their numbers. */
+
              if (crc < 0)
                {
                memmove(slot + cd->name_entry_size, slot,
                  (cd->names_found - i) * cd->name_entry_size);
                break;
                }
+
+              /* Continue the loop for a later or duplicate name */
+
              slot += cd->name_entry_size;
              }

+            /* For non-duplicate names, check for a duplicate number before
+            adding the new name. */
+
+            if (!dupname)
+              {
+              uschar *cslot = cd->name_table;
+              for (i = 0; i < cd->names_found; i++)
+                {
+                if (cslot != slot)
+                  {
+                  if (GET2(cslot, 0) == cd->bracount + 1)
+                    {
+                    *errorcodeptr = ERR65;
+                    goto FAILED;
+                    }
+                  }
+                else i--;
+                cslot += cd->name_entry_size;
+                }
+              }
+
            PUT2(slot, 0, cd->bracount + 1);
            memcpy(slot + 2, name, namelen);
            slot[2+namelen] = 0;
            }
          }

-        /* In both cases, count the number of names we've encountered. */
+        /* In both pre-compile and compile, count the number of names we've
+        encountered. */

-        ptr++;                    /* Move past > or ' */
        cd->names_found++;
+        ptr++;                    /* Move past > or ' */
        goto NUMBERED_GROUP;
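
Note on the name-table hunk above: the rules in the new comment (alphabetical
order, duplicates kept as separate entries, a duplicate name for a different
number allowed only with PCRE_DUPNAMES, different names for one number always
rejected) can be sketched as below. This is a simplified model, not the PCRE
code: name_table, ENTRY_SIZE, MAX_NAMES, entry_number() and add_name() are
hypothetical, strcmp() replaces the memcmp()-plus-substring test, and the
insert position is found before any entry is moved.

```c
#include <assert.h>
#include <string.h>

#define ENTRY_SIZE 10   /* hypothetical: 2 number bytes + name + NUL */
#define MAX_NAMES  8    /* hypothetical capacity */

typedef struct {
  unsigned char table[MAX_NAMES * ENTRY_SIZE];  /* must start zeroed */
  int names_found;
} name_table;

static int
entry_number(const unsigned char *slot)
{
return (slot[0] << 8) | slot[1];
}

/* Returns 0 on success, -1 for a duplicate name with a different number when
allow_dup is 0 (ERR43 in the diff), -2 for a new name whose number is already
in the table (ERR65 in the diff). */

static int
add_name(name_table *t, const char *name, int number, int allow_dup)
{
size_t namelen = strlen(name);
unsigned char *slot = t->table;
int i, pos, dupname = 0;

assert(namelen < ENTRY_SIZE - 2 && t->names_found < MAX_NAMES);

/* Find the slot: stop at the first entry that sorts later. A duplicate is
passed over, so it lands after the existing entries of the same name. */

for (pos = 0; pos < t->names_found; pos++, slot += ENTRY_SIZE)
  {
  int crc = strcmp(name, (const char *)(slot + 2));
  if (crc == 0)
    {
    if (entry_number(slot) != number && !allow_dup) return -1;
    dupname = 1;
    }
  if (crc < 0) break;
  }

/* A new name must not reuse a number that another name already holds. */

if (!dupname)
  for (i = 0; i < t->names_found; i++)
    if (entry_number(t->table + i * ENTRY_SIZE) == number) return -2;

/* Open a gap, then write number (big-endian), name and terminating NUL. */

memmove(slot + ENTRY_SIZE, slot, (size_t)(t->names_found - pos) * ENTRY_SIZE);
slot[0] = (unsigned char)(number >> 8);
slot[1] = (unsigned char)(number & 255);
memcpy(slot + 2, name, namelen + 1);
t->names_found++;
return 0;
}
```

Adding "m" as 1 and then "a" as 2 leaves the table ordered a, m. A second "a"
with number 3 is refused unless allow_dup is set, in which case it is stored
between the first "a" and "m".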


@ -5002,7 +5129,8 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
          if (lengthptr == NULL)
            {
            *code = OP_END;
-            if (recno != 0) called = find_bracket(cd->start_code, utf8, recno);
+            if (recno != 0)
+              called = _pcre_find_bracket(cd->start_code, utf8, recno);

            /* Forward reference */

@ -5646,6 +5774,8 @@ uschar *code = *codeptr;
 uschar *last_branch = code;
 uschar *start_bracket = code;
 uschar *reverse_count = NULL;
+open_capitem capitem;
+int capnumber = 0;
 int firstbyte, reqbyte;
 int branchfirstbyte, branchreqbyte;
 int length;
@ -5672,6 +5802,17 @@ the code that abstracts option settings at the start of the pattern and makes
 them global. It tests the value of length for (2 + 2*LINK_SIZE) in the
 pre-compile phase to find out whether anything has yet been compiled or not. */

+/* If this is a capturing subpattern, add to the chain of open capturing items
+so that we can detect them if (*ACCEPT) is encountered. */
+
+if (*code == OP_CBRA)
+  {
+  capnumber = GET2(code, 1 + LINK_SIZE);
+  capitem.number = capnumber;
+  capitem.next = cd->open_caps;
+  cd->open_caps = &capitem;
+  }
+
 /* Offset is set zero to mark that this bracket is still open */

 PUT(code, 1, 0);
@ -5766,21 +5907,29 @@ for (;;)

    /* If lookbehind, check that this branch matches a fixed-length string, and
    put the length into the OP_REVERSE item. Temporarily mark the end of the
-    branch with OP_END. */
+    branch with OP_END. If the branch contains OP_RECURSE, the result is -3
+    because there may be forward references that we can't check here. Set a
+    flag to cause another lookbehind check at the end. Why not do it all at the
+    end? Because common, erroneous checks are picked up here and the offset of
+    the problem can be shown. */

    if (lookbehind)
      {
      int fixed_length;
      *code = OP_END;
-      fixed_length = find_fixedlength(last_branch, options);
+      fixed_length = find_fixedlength(last_branch, options, FALSE, cd);
      DPRINTF(("fixed length = %d\n", fixed_length));
-      if (fixed_length < 0)
+      if (fixed_length == -3)
+        {
+        cd->check_lookbehind = TRUE;
+        }
+      else if (fixed_length < 0)
        {
        *errorcodeptr = (fixed_length == -2)? ERR36 : ERR25;
        *ptrptr = ptr;
        return FALSE;
        }
-      PUT(reverse_count, 0, fixed_length);
+      else { PUT(reverse_count, 0, fixed_length); }
      }
    }

@ -5808,6 +5957,10 @@ for (;;)
      while (branch_length > 0);
      }

+    /* If it was a capturing subpattern, remove it from the chain. */
+
+    if (capnumber > 0) cd->open_caps = cd->open_caps->next;
+
    /* Fill in the ket */

    *code = OP_KET;
@ -6010,7 +6163,9 @@ do {
     switch (*scode)
       {
       case OP_CREF:
+       case OP_NCREF:
       case OP_RREF:
+       case OP_NRREF:
       case OP_DEF:
       return FALSE;

@ -6179,9 +6334,7 @@ int length = 1;  /* For final END opcode */
 int firstbyte, reqbyte, newline;
 int errorcode = 0;
 int skipatstart = 0;
-#ifdef SUPPORT_UTF8
-BOOL utf8;
-#endif
+BOOL utf8 = (options & PCRE_UTF8) != 0;
 size_t size;
 uschar *code;
 const uschar *codestart;
@ -6278,7 +6431,6 @@ while (ptr[skipatstart] == CHAR_LEFT_PARENTHESIS &&
 /* Can't support UTF8 unless PCRE has been compiled to include the code. */

 #ifdef SUPPORT_UTF8
-utf8 = (options & PCRE_UTF8) != 0;
 if (utf8 && (options & PCRE_NO_UTF8_CHECK) == 0 &&
     (*erroroffset = _pcre_valid_utf8((uschar *)pattern, -1)) >= 0)
  {
@ -6286,7 +6438,7 @@ if (utf8 && (options & PCRE_NO_UTF8_CHECK) == 0 &&
  goto PCRE_EARLY_ERROR_RETURN2;
  }
 #else
-if ((options & PCRE_UTF8) != 0)
+if (utf8)
  {
  errorcode = ERR32;
  goto PCRE_EARLY_ERROR_RETURN;
@ -6375,6 +6527,7 @@ cd->end_pattern = (const uschar *)(pattern + strlen(pattern));
 cd->req_varyopt = 0;
 cd->external_options = options;
 cd->external_flags = 0;
+cd->open_caps = NULL;

 /* Now do the pre-compile. On error, errorcode will be set non-zero, so we
 don't need to look at the result of the function here. The initial options have
@ -6449,6 +6602,8 @@ cd->start_code = codestart;
 cd->hwm = cworkspace;
 cd->req_varyopt = 0;
 cd->had_accept = FALSE;
+cd->check_lookbehind = FALSE;
+cd->open_caps = NULL;

 /* Set up a starting, non-extracting bracket, then compile the expression. On
 error, errorcode will be set non-zero, so we don't need to look at the result
@ -6487,7 +6642,7 @@ while (errorcode == 0 && cd->hwm > cworkspace)
  cd->hwm -= LINK_SIZE;
  offset = GET(cd->hwm, 0);
  recno = GET(codestart, offset);
-  groupptr = find_bracket(codestart, (re->options & PCRE_UTF8) != 0, recno);
+  groupptr = _pcre_find_bracket(codestart, utf8, recno);
  if (groupptr == NULL) errorcode = ERR53;
    else PUT(((uschar *)codestart), offset, groupptr - codestart);
  }
@ -6497,6 +6652,47 @@ subpattern. */

 if (errorcode == 0 && re->top_backref > re->top_bracket) errorcode = ERR15;

+/* If there were any lookbehind assertions that contained OP_RECURSE
+(recursions or subroutine calls), a flag is set for them to be checked here,
+because they may contain forward references. Actual recursions can't be fixed
+length, but subroutine calls can. It is done like this so that those without
+OP_RECURSE that are not fixed length get a diagnostic with a useful offset. The
+exceptional ones forgo this. We scan the pattern to check that they are fixed
+length, and set their lengths. */
+
+if (cd->check_lookbehind)
+  {
+  uschar *cc = (uschar *)codestart;
+
+  /* Loop, searching for OP_REVERSE items, and process those that do not have
+  their length set. (Actually, it will also re-process any that have a length
+  of zero, but that is a pathological case, and it does no harm.) When we find
+  one, we temporarily terminate the branch it is in while we scan it. */
+
+  for (cc = (uschar *)_pcre_find_bracket(codestart, utf8, -1);
+       cc != NULL;
+       cc = (uschar *)_pcre_find_bracket(cc, utf8, -1))
+    {
+    if (GET(cc, 1) == 0)
+      {
+      int fixed_length;
+      uschar *be = cc - 1 - LINK_SIZE + GET(cc, -LINK_SIZE);
+      int end_op = *be;
+      *be = OP_END;
+      fixed_length = find_fixedlength(cc, re->options, TRUE, cd);
+      *be = end_op;
+      DPRINTF(("fixed length = %d\n", fixed_length));
+      if (fixed_length < 0)
+        {
+        errorcode = (fixed_length == -2)? ERR36 : ERR25;
+        break;
+        }
+      PUT(cc, 1, fixed_length);
+      }
+    cc += 1 + LINK_SIZE;
+    }
+  }
+
 /* Failed to compile, or error while post-processing */

 if (errorcode != 0)
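
Note on the (*ACCEPT) changes in pcre_compile.c above: each capturing group
pushes an item onto cd->open_caps while it is being compiled, and (*ACCEPT)
emits one OP_CLOSE plus a 2-byte group number per item still on the chain
before emitting OP_ACCEPT itself. A minimal sketch of that emission step
follows. The names open_cap, SKETCH_OP_CLOSE, SKETCH_OP_ACCEPT and
emit_accept() are hypothetical, and the opcode values are placeholders rather
than the real ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of open_capitem: one node per open capturing group.
In the diff the node lives on the stack of the function compiling the group,
so popping it is just restoring the previous head of the chain. */

typedef struct open_cap {
  struct open_cap *next;    /* Enclosing open group, or NULL */
  unsigned int number;      /* Capture number */
} open_cap;

enum { SKETCH_OP_CLOSE = 1, SKETCH_OP_ACCEPT = 2 };  /* placeholder values */

/* Write CLOSE + number (high byte first) for every open group, innermost
first, then ACCEPT. Returns the number of bytes written. */

static size_t
emit_accept(const open_cap *open_caps, unsigned char *code, size_t size)
{
size_t n = 0;
const open_cap *oc;

assert(code != NULL);
for (oc = open_caps; oc != NULL; oc = oc->next)
  {
  assert(n + 3 <= size);
  code[n++] = SKETCH_OP_CLOSE;
  code[n++] = (unsigned char)(oc->number >> 8);
  code[n++] = (unsigned char)(oc->number & 255);
  }
assert(n < size);
code[n++] = SKETCH_OP_ACCEPT;
return n;
}
```

With group 2 open inside group 1, the chain head is the item for group 2, so
the output is CLOSE 2, CLOSE 1, ACCEPT: seven bytes in this sketch.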
--- a/ext/pcre/pcrelib/pcre_exec.c
+++ b/ext/pcre/pcrelib/pcre_exec.c
--- a/ext/pcre/pcrelib/pcre_fullinfo.c
+++ b/ext/pcre/pcrelib/pcre_fullinfo.c
@ -117,10 +117,16 @@ switch (what)

  case PCRE_INFO_FIRSTTABLE:
  *((const uschar **)where) =
-    (study != NULL && (study->options & PCRE_STUDY_MAPPED) != 0)?
+    (study != NULL && (study->flags & PCRE_STUDY_MAPPED) != 0)?
      ((const pcre_study_data *)extra_data->study_data)->start_bits : NULL;
  break;

+  case PCRE_INFO_MINLENGTH:
+  *((int *)where) =
+    (study != NULL && (study->flags & PCRE_STUDY_MINLEN) != 0)?
+      study->minlength : -1;
+  break;
+
  case PCRE_INFO_LASTLITERAL:
  *((int *)where) =
    ((re->flags & PCRE_REQCHSET) != 0)? re->req_byte : -1;
@ -142,6 +148,9 @@ switch (what)
  *((const uschar **)where) = (const uschar *)(_pcre_default_tables);
  break;

+  /* From release 8.00 this will always return TRUE because NOPARTIAL is
+  no longer ever set (the restrictions have been removed). */
+
  case PCRE_INFO_OKPARTIAL:
  *((int *)where) = (re->flags & PCRE_NOPARTIAL) == 0;
  break;
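
Note on PCRE_INFO_MINLENGTH above: the new case reports study->minlength only
when study data exists and its PCRE_STUDY_MINLEN flag is set, and -1 otherwise.
The sketch below models that rule with a hypothetical struct rather than the
real pcre_study_data. worth_matching() is an assumption about how a caller
might use the value to skip subjects that are too short; it is not taken from
this diff.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_STUDY_MINLEN 0x02   /* same bit value as PCRE_STUDY_MINLEN */

/* Hypothetical cut-down model of pcre_study_data. */

typedef struct {
  unsigned int flags;       /* Private flags */
  unsigned int minlength;   /* Minimum subject length */
} study_sketch;

/* Mirrors the new case: the minimum length, or -1 if it is not known. */

static int
info_minlength(const study_sketch *study)
{
return (study != NULL && (study->flags & SKETCH_STUDY_MINLEN) != 0)?
  (int)study->minlength : -1;
}

/* Assumed caller-side use: a subject shorter than the minimum cannot match,
so the match attempt can be skipped. An unknown minimum never rules anything
out. */

static int
worth_matching(const study_sketch *study, int subject_length)
{
int min = info_minlength(study);
assert(subject_length >= 0);
return min < 0 || subject_length >= min;
}
```

The -1 return is why a caller has to test the sign before comparing lengths: a
pattern that was never studied gives no lower bound at all.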
--- a/ext/pcre/pcrelib/pcre_internal.h
+++ b/ext/pcre/pcrelib/pcre_internal.h
@ -535,7 +535,9 @@ Standard C system should have one. */

 /* Private flags containing information about the compiled regex. They used to
 live at the top end of the options word, but that got almost full, so now they
-are in a 16-bit flags word. */
+are in a 16-bit flags word. From release 8.00, PCRE_NOPARTIAL is unused, as
+the restrictions on partial matching have been lifted. It remains for backwards
+compatibility. */

 #define PCRE_NOPARTIAL     0x0001  /* can't use partial with this regex */
 #define PCRE_FIRSTSET      0x0002  /* first_byte is set */
@ -547,6 +549,7 @@ are in a 16-bit flags word. */
 /* Options for the "extra" block produced by pcre_study(). */

 #define PCRE_STUDY_MAPPED   0x01     /* a map of starting chars exists */
+#define PCRE_STUDY_MINLEN   0x02     /* a minimum length field exists */

 /* Masks for identifying the public options that are permitted at compile
 time, run time, or study time, respectively. */
@ -562,14 +565,15 @@ time, run time, or study time, respectively. */
   PCRE_JAVASCRIPT_COMPAT)

 #define PUBLIC_EXEC_OPTIONS \
-  (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NO_UTF8_CHECK| \
-   PCRE_PARTIAL|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE| \
-   PCRE_NO_START_OPTIMIZE)
+  (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NOTEMPTY_ATSTART| \
+   PCRE_NO_UTF8_CHECK|PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT|PCRE_NEWLINE_BITS| \
+   PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE|PCRE_NO_START_OPTIMIZE)

 #define PUBLIC_DFA_EXEC_OPTIONS \
-  (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NO_UTF8_CHECK| \
-   PCRE_PARTIAL|PCRE_DFA_SHORTEST|PCRE_DFA_RESTART|PCRE_NEWLINE_BITS| \
-   PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE|PCRE_NO_START_OPTIMIZE)
+  (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NOTEMPTY_ATSTART| \
+   PCRE_NO_UTF8_CHECK|PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT|PCRE_DFA_SHORTEST| \
+   PCRE_DFA_RESTART|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE| \
+   PCRE_NO_START_OPTIMIZE)

 #define PUBLIC_STUDY_OPTIONS 0   /* None defined */

@ -1206,8 +1210,8 @@ enum { ESC_A = 1, ESC_G, ESC_K, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s,
 OP_EOD must correspond in order to the list of escapes immediately above.

 *** NOTE NOTE NOTE *** Whenever this list is updated, the two macro definitions
-that follow must also be updated to match. There is also a table called
-"coptable" in pcre_dfa_exec.c that must be updated. */
+that follow must also be updated to match. There are also tables called
+"coptable" and "poptable" in pcre_dfa_exec.c that must be updated. */

 enum {
  OP_END,            /* 0 End of pattern */
@ -1343,30 +1347,39 @@ enum {
  OP_SCBRA,          /* 98 Start of capturing bracket, check empty */
  OP_SCOND,          /* 99 Conditional group, check empty */

-  OP_CREF,           /* 100 Used to hold a capture number as condition */
-  OP_RREF,           /* 101 Used to hold a recursion number as condition */
-  OP_DEF,            /* 102 The DEFINE condition */
+  /* The next two pairs must (respectively) be kept together. */

-  OP_BRAZERO,        /* 103 These two must remain together and in this */
-  OP_BRAMINZERO,     /* 104 order. */
+  OP_CREF,           /* 100 Used to hold a capture number as condition */
+  OP_NCREF,          /* 101 Same, but generated by a name reference */
+  OP_RREF,           /* 102 Used to hold a recursion number as condition */
+  OP_NRREF,          /* 103 Same, but generated by a name reference */
+  OP_DEF,            /* 104 The DEFINE condition */
+
+  OP_BRAZERO,        /* 105 These two must remain together and in this */
+  OP_BRAMINZERO,     /* 106 order. */

  /* These are backtracking control verbs */

-  OP_PRUNE,          /* 105 */
-  OP_SKIP,           /* 106 */
-  OP_THEN,           /* 107 */
-  OP_COMMIT,         /* 108 */
+  OP_PRUNE,          /* 107 */
+  OP_SKIP,           /* 108 */
+  OP_THEN,           /* 109 */
+  OP_COMMIT,         /* 110 */

  /* These are forced failure and success verbs */

-  OP_FAIL,           /* 109 */
-  OP_ACCEPT,         /* 110 */
+  OP_FAIL,           /* 111 */
+  OP_ACCEPT,         /* 112 */
+  OP_CLOSE,          /* 113 Used before OP_ACCEPT to close open captures */

  /* This is used to skip a subpattern with a {0} quantifier */

-  OP_SKIPZERO        /* 111 */
+  OP_SKIPZERO        /* 114 */
 };

+/* *** NOTE NOTE NOTE *** Whenever the list above is updated, the two macro
+definitions that follow must also be updated to match. There are also tables
+called "coptable" and "poptable" in pcre_dfa_exec.c that must be updated. */
+

 /* This macro defines textual names for all the opcodes. These are used only
 for debugging. The macro is referenced only in pcre_printint.c. */
@ -1388,9 +1401,10 @@ for debugging. The macro is referenced only in pcre_printint.c. */
  "Alt", "Ket", "KetRmax", "KetRmin", "Assert", "Assert not",     \
  "AssertB", "AssertB not", "Reverse",                            \
  "Once", "Bra", "CBra", "Cond", "SBra", "SCBra", "SCond",        \
-  "Cond ref", "Cond rec", "Cond def", "Brazero", "Braminzero",    \
+  "Cond ref", "Cond nref", "Cond rec", "Cond nrec", "Cond def",   \
+  "Brazero", "Braminzero",                                        \
  "*PRUNE", "*SKIP", "*THEN", "*COMMIT", "*FAIL", "*ACCEPT",      \
-  "Skip zero"
+  "Close", "Skip zero"


 /* This macro defines the length of fixed length operations in the compiled
@ -1450,15 +1464,16 @@ in UTF-8 mode. The code that uses this table must know about such things. */
  1+LINK_SIZE,                   /* SBRA                                   */ \
  3+LINK_SIZE,                   /* SCBRA                                  */ \
  1+LINK_SIZE,                   /* SCOND                                  */ \
-  3,                             /* CREF                                   */ \
-  3,                             /* RREF                                   */ \
+  3, 3,                          /* CREF, NCREF                            */ \
+  3, 3,                          /* RREF, NRREF                            */ \
  1,                             /* DEF                                    */ \
  1, 1,                          /* BRAZERO, BRAMINZERO                    */ \
  1, 1, 1, 1,                    /* PRUNE, SKIP, THEN, COMMIT,             */ \
-  1, 1, 1                        /* FAIL, ACCEPT, SKIPZERO                 */
+  1, 1, 3, 1                     /* FAIL, ACCEPT, CLOSE, SKIPZERO          */


-/* A magic value for OP_RREF to indicate the "any recursion" condition. */
+/* A magic value for OP_RREF and OP_NRREF to indicate the "any recursion"
+condition. */

 #define RREF_ANY  0xffff

@ -1471,7 +1486,7 @@ enum { ERR0,  ERR1,  ERR2,  ERR3,  ERR4,  ERR5,  ERR6,  ERR7,  ERR8,  ERR9,
       ERR30, ERR31, ERR32, ERR33, ERR34, ERR35, ERR36, ERR37, ERR38, ERR39,
       ERR40, ERR41, ERR42, ERR43, ERR44, ERR45, ERR46, ERR47, ERR48, ERR49,
       ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
-       ERR60, ERR61, ERR62, ERR63, ERR64 };
+       ERR60, ERR61, ERR62, ERR63, ERR64, ERR65 };

 /* The real format of the start of the pcre block; the index of names and the
 code vector run on as long as necessary after the end. We store an explicit
@ -1487,7 +1502,7 @@ Because people can now save and re-use compiled patterns, any additions to this
 structure should be made at the end, and something earlier (e.g. a new
 flag in the options or one of the dummy fields) should indicate that the new
 fields are present. Currently PCRE always sets the dummy fields to zero.
-NOTE NOTE NOTE:
+NOTE NOTE NOTE
 */

 typedef struct real_pcre {
@ -1514,10 +1529,20 @@ remark (see NOTE above) about extending this structure applies. */

 typedef struct pcre_study_data {
  pcre_uint32 size;               /* Total that was malloced */
-  pcre_uint32 options;
-  uschar start_bits[32];
+  pcre_uint32 flags;              /* Private flags */
+  uschar start_bits[32];          /* Starting char bits */
+  pcre_uint32 minlength;          /* Minimum subject length */
 } pcre_study_data;

+/* Structure for building a chain of open capturing subpatterns during
+compiling, so that instructions to close them can be compiled when (*ACCEPT) is
+encountered. */
+
+typedef struct open_capitem {
+  struct open_capitem *next;    /* Chain link */
+  pcre_uint16 number;           /* Capture number */
+} open_capitem;
+
 /* Structure for passing "static" information around between the functions
 doing the compiling, so that they are thread-safe. */

@ -1530,6 +1555,7 @@ typedef struct compile_data {
  const uschar *start_code;     /* The start of the compiled code */
  const uschar *start_pattern;  /* The start of the pattern */
  const uschar *end_pattern;    /* The end of the pattern */
+  open_capitem *open_caps;      /* Chain of open capture items */
  uschar *hwm;                  /* High watermark of workspace */
  uschar *name_table;           /* The name/number table */
  int  names_found;             /* Number of entries so far */
@ -1542,6 +1568,7 @@ typedef struct compile_data {
  int  external_flags;          /* External flag bits to be set */
  int  req_varyopt;             /* "After variable item" flag for reqbyte */
  BOOL had_accept;              /* (*ACCEPT) encountered */
+  BOOL check_lookbehind;        /* Lookbehinds need later checking */
  int  nltype;                  /* Newline type */
  int  nllen;                   /* Newline string length */
  uschar nl[4];                 /* Newline string when fixed length */
@ -1565,6 +1592,7 @@ typedef struct recursion_info {
  USPTR save_start;             /* Old value of mstart */
  int *offset_save;             /* Pointer to start of saved offsets */
  int saved_max;                /* Number of saved offsets */
+  int save_offset_top;          /* Current value of offset_top */
 } recursion_info;

 /* Structure for building a chain of data for holding the values of the subject
@ -1589,6 +1617,9 @@ typedef struct match_data {
  int    offset_max;            /* The maximum usable for return data */
  int    nltype;                /* Newline type */
  int    nllen;                 /* Newline string length */
+  int    name_count;            /* Number of names in name table */
+  int    name_entry_size;       /* Size of entry in names table */
+  uschar *name_table;           /* Table of names */
  uschar nl[4];                 /* Newline string when fixed */
  const uschar *lcc;            /* Points to lower casing table */
  const uschar *ctypes;         /* Points to table of type maps */
@ -1599,7 +1630,7 @@ typedef struct match_data {
  BOOL   jscript_compat;        /* JAVASCRIPT_COMPAT flag */
  BOOL   endonly;               /* Dollar not before final \n */
  BOOL   notempty;              /* Empty string match not wanted */
-  BOOL   partial;               /* PARTIAL flag */
+  BOOL   notempty_atstart;      /* Empty string match at start not wanted */
  BOOL   hitend;                /* Hit the end of the subject at some point */
  BOOL   bsr_anycrlf;           /* \R is just any CRLF, not full Unicode */
  const uschar *start_code;     /* For use when recursing */
@ -1607,6 +1638,8 @@ typedef struct match_data {
  USPTR  end_subject;           /* End of the subject string */
  USPTR  start_match_ptr;       /* Start of matched string */
  USPTR  end_match_ptr;         /* Subject position at end match */
+  USPTR  start_used_ptr;        /* Earliest consulted character */
+  int    partial;               /* PARTIAL options */
  int    end_offset_top;        /* Highwater mark at end of match */
  int    capture_last;          /* Most recent capture number */
  int    start_offset;          /* The start offset value */
@ -1623,7 +1656,9 @@ typedef struct dfa_match_data {
  const uschar *start_code;     /* Start of the compiled pattern */
  const uschar *start_subject;  /* Start of the subject string */
  const uschar *end_subject;    /* End of subject string */
+  const uschar *start_used_ptr; /* Earliest consulted character */
  const uschar *tables;         /* Character tables */
+  int   start_offset;           /* The start offset value */
  int   moptions;               /* Match options */
  int   poptions;               /* Pattern options */
  int    nltype;                /* Newline type */
@ -1702,15 +1737,16 @@ extern const uschar _pcre_OP_lengths[];
 one of the exported public functions. They have to be "external" in the C
 sense, but are not part of the PCRE public API. */

-extern BOOL         _pcre_is_newline(const uschar *, int, const uschar *,
-                      int *, BOOL);
-extern int          _pcre_ord2utf8(int, uschar *);
-extern real_pcre   *_pcre_try_flipped(const real_pcre *, real_pcre *,
-                      const pcre_study_data *, pcre_study_data *);
-extern int          _pcre_valid_utf8(const uschar *, int);
-extern BOOL         _pcre_was_newline(const uschar *, int, const uschar *,
-                      int *, BOOL);
-extern BOOL         _pcre_xclass(int, const uschar *);
+extern const uschar *_pcre_find_bracket(const uschar *, BOOL, int);
+extern BOOL          _pcre_is_newline(const uschar *, int, const uschar *,
+                       int *, BOOL);
+extern int           _pcre_ord2utf8(int, uschar *);
+extern real_pcre    *_pcre_try_flipped(const real_pcre *, real_pcre *,
+                       const pcre_study_data *, pcre_study_data *);
+extern int           _pcre_valid_utf8(const uschar *, int);
+extern BOOL          _pcre_was_newline(const uschar *, int, const uschar *,
+                       int *, BOOL);
+extern BOOL          _pcre_xclass(int, const uschar *);


 /* Unicode character database (UCD) */
--- a/ext/pcre/pcrelib/pcre_printint.src
+++ b/ext/pcre/pcrelib/pcre_printint.src
@ -246,7 +246,12 @@ for(;;)
    fprintf(f, "%s", OP_names[*code]);
    break;

+    case OP_CLOSE:
+    fprintf(f, "    %s %d", OP_names[*code], GET2(code, 1));
+    break;
+
    case OP_CREF:
+    case OP_NCREF:
    fprintf(f, "%3d %s", GET2(code,1), OP_names[*code]);
    break;

@ -258,6 +263,14 @@ for(;;)
      fprintf(f, "    Cond recurse %d", c);
    break;

+    case OP_NRREF:
+    c = GET2(code, 1);
+    if (c == RREF_ANY)
+      fprintf(f, "    Cond nrecurse any");
+    else
+      fprintf(f, "    Cond nrecurse %d", c);
+    break;
+
    case OP_DEF:
    fprintf(f, "    Cond def");
    break;
--- a/ext/pcre/pcrelib/pcre_study.c
+++ b/ext/pcre/pcrelib/pcre_study.c
@ -6,7 +6,7 @@
 and semantics are as close as possible to those of the Perl 5 language.

                       Written by Philip Hazel
-           Copyright (c) 1997-2008 University of Cambridge
+           Copyright (c) 1997-2009 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@ -52,6 +52,364 @@ supporting functions. */
 enum { SSB_FAIL, SSB_DONE, SSB_CONTINUE };


+
+/*************************************************
+*   Find the minimum subject length for a group  *
+*************************************************/
+
+/* Scan a parenthesized group and compute the minimum length of subject that
+is needed to match it. This is a lower bound; it does not mean there is a
+string of that length that matches. In UTF8 mode, the result is in characters
+rather than bytes.
+
+Arguments:
+  code       pointer to start of group (the bracket)
+  startcode  pointer to start of the whole pattern
+  options    the compiling options
+
+Returns:   the minimum length
+           -1 if \C was encountered
+           -2 internal error (missing capturing bracket)
+*/
+
+static int
+find_minlength(const uschar *code, const uschar *startcode, int options)
+{
+int length = -1;
+BOOL utf8 = (options & PCRE_UTF8) != 0;
+BOOL had_recurse = FALSE;
+register int branchlength = 0;
+register uschar *cc = (uschar *)code + 1 + LINK_SIZE;
+
+if (*code == OP_CBRA || *code == OP_SCBRA) cc += 2;
+
+/* Scan along the opcodes for this branch. If we get to the end of the
+branch, check the length against that of the other branches. */
+
+for (;;)
+  {
+  int d, min;
+  uschar *cs, *ce;
+  register int op = *cc;
+
+  switch (op)
+    {
+    case OP_CBRA:
+    case OP_SCBRA:
+    case OP_BRA:
+    case OP_SBRA:
+    case OP_ONCE:
+    case OP_COND:
+    case OP_SCOND:
+    d = find_minlength(cc, startcode, options);
+    if (d < 0) return d;
+    branchlength += d;
+    do cc += GET(cc, 1); while (*cc == OP_ALT);
+    cc += 1 + LINK_SIZE;
+    break;
+
+    /* Reached end of a branch; if it's a ket it is the end of a nested
+    call. If it's ALT it is an alternation in a nested call. If it is
+    END it's the end of the outer call. All can be handled by the same code. */
+
+    case OP_ALT:
+    case OP_KET:
+    case OP_KETRMAX:
+    case OP_KETRMIN:
+    case OP_END:
+    if (length < 0 || (!had_recurse && branchlength < length))
+      length = branchlength;
+    if (*cc != OP_ALT) return length;
+    cc += 1 + LINK_SIZE;
+    branchlength = 0;
+    had_recurse = FALSE;
+    break;
+
+    /* Skip over assertive subpatterns */
+
+    case OP_ASSERT:
+    case OP_ASSERT_NOT:
+    case OP_ASSERTBACK:
+    case OP_ASSERTBACK_NOT:
+    do cc += GET(cc, 1); while (*cc == OP_ALT);
+    /* Fall through */
+
+    /* Skip over things that don't match chars */
+
+    case OP_REVERSE:
+    case OP_CREF:
+    case OP_NCREF:
+    case OP_RREF:
+    case OP_NRREF:
+    case OP_DEF:
+    case OP_OPT:
+    case OP_CALLOUT:
+    case OP_SOD:
+    case OP_SOM:
+    case OP_EOD:
+    case OP_EODN:
+    case OP_CIRC:
+    case OP_DOLL:
+    case OP_NOT_WORD_BOUNDARY:
+    case OP_WORD_BOUNDARY:
+    cc += _pcre_OP_lengths[*cc];
+    break;
+
+    /* Skip over a subpattern that has a {0} or {0,x} quantifier */
+
+    case OP_BRAZERO:
+    case OP_BRAMINZERO:
+    case OP_SKIPZERO:
+    cc += _pcre_OP_lengths[*cc];
+    do cc += GET(cc, 1); while (*cc == OP_ALT);
+    cc += 1 + LINK_SIZE;
+    break;
+
+    /* Handle literal characters and + repetitions */
+
+    case OP_CHAR:
+    case OP_CHARNC:
+    case OP_NOT:
+    case OP_PLUS:
+    case OP_MINPLUS:
+    case OP_POSPLUS:
+    case OP_NOTPLUS:
+    case OP_NOTMINPLUS:
+    case OP_NOTPOSPLUS:
+    branchlength++;
+    cc += 2;
+#ifdef SUPPORT_UTF8
+    if (utf8 && cc[-1] >= 0xc0) cc += _pcre_utf8_table4[cc[-1] & 0x3f];
+#endif
+    break;
+
+    case OP_TYPEPLUS:
+    case OP_TYPEMINPLUS:
+    case OP_TYPEPOSPLUS:
+    branchlength++;
+    cc += (cc[1] == OP_PROP || cc[1] == OP_NOTPROP)? 4 : 2;
+    break;
+
+    /* Handle exact repetitions. The count is already in characters, but we
+    need to skip over a multibyte character in UTF8 mode.  */
+
+    case OP_EXACT:
+    case OP_NOTEXACT:
+    branchlength += GET2(cc,1);
+    cc += 4;
+#ifdef SUPPORT_UTF8
+    if (utf8 && cc[-1] >= 0xc0) cc += _pcre_utf8_table4[cc[-1] & 0x3f];
+#endif
+    break;
+
+    case OP_TYPEEXACT:
+    branchlength += GET2(cc,1);
+    cc += (cc[3] == OP_PROP || cc[3] == OP_NOTPROP)? 6 : 4;
+    break;
+
+    /* Handle single-char non-literal matchers */
+
+    case OP_PROP:
+    case OP_NOTPROP:
+    cc += 2;
+    /* Fall through */
+
+    case OP_NOT_DIGIT:
+    case OP_DIGIT:
+    case OP_NOT_WHITESPACE:
+    case OP_WHITESPACE:
+    case OP_NOT_WORDCHAR:
+    case OP_WORDCHAR:
+    case OP_ANY:
+    case OP_ALLANY:
+    case OP_EXTUNI:
+    case OP_HSPACE:
+    case OP_NOT_HSPACE:
+    case OP_VSPACE:
+    case OP_NOT_VSPACE:
+    branchlength++;
+    cc++;
+    break;
+
+    /* "Any newline" might match two characters, but it might match just one */
+
+    case OP_ANYNL:
+    branchlength += 1;
+    cc++;
+    break;
+
+    /* The single-byte matcher means we can't proceed in UTF-8 mode */
+
+    case OP_ANYBYTE:
+#ifdef SUPPORT_UTF8
+    if (utf8) return -1;
+#endif
+    branchlength++;
+    cc++;
+    break;
+
+    /* For repeated character types, we have to test for \p and \P, which have
+    an extra two bytes of parameters. */
+
+    case OP_TYPESTAR:
+    case OP_TYPEMINSTAR:
+    case OP_TYPEQUERY:
+    case OP_TYPEMINQUERY:
+    case OP_TYPEPOSSTAR:
+    case OP_TYPEPOSQUERY:
+    if (cc[1] == OP_PROP || cc[1] == OP_NOTPROP) cc += 2;
+    cc += _pcre_OP_lengths[op];
+    break;
+
+    case OP_TYPEUPTO:
+    case OP_TYPEMINUPTO:
+    case OP_TYPEPOSUPTO:
+    if (cc[3] == OP_PROP || cc[3] == OP_NOTPROP) cc += 2;
+    cc += _pcre_OP_lengths[op];
+    break;
+
+    /* Check a class for variable quantification */
+
+#ifdef SUPPORT_UTF8
+    case OP_XCLASS:
+    cc += GET(cc, 1) - 33;
+    /* Fall through */
+#endif
+
+    case OP_CLASS:
+    case OP_NCLASS:
+    cc += 33;
+
+    switch (*cc)
+      {
+      case OP_CRPLUS:
+      case OP_CRMINPLUS:
+      branchlength++;
+      /* Fall through */
+
+      case OP_CRSTAR:
+      case OP_CRMINSTAR:
+      case OP_CRQUERY:
+      case OP_CRMINQUERY:
+      cc++;
+      break;
+
+      case OP_CRRANGE:
+      case OP_CRMINRANGE:
+      branchlength += GET2(cc,1);
+      cc += 5;
+      break;
+
+      default:
+      branchlength++;
+      break;
+      }
+    break;
+
+    /* Backreferences and subroutine calls are treated in the same way: we find
+    the minimum length for the subpattern. A recursion, however, causes
+    a flag to be set that causes the length of this branch to be ignored. The
+    logic is that a recursion can only make sense if there is another
+    alternation that stops the recursing. That will provide the minimum length
+    (when no recursion happens). A backreference within the group that it is
+    referencing behaves in the same way.
+
+    If PCRE_JAVASCRIPT_COMPAT is set, a backreference to an unset bracket
+    matches an empty string (by default it causes a matching failure), so in
+    that case we must set the minimum length to zero. */
+
+    case OP_REF:
+    if ((options & PCRE_JAVASCRIPT_COMPAT) == 0)
+      {
+      ce = cs = (uschar *)_pcre_find_bracket(startcode, utf8, GET2(cc, 1));
+      if (cs == NULL) return -2;
+      do ce += GET(ce, 1); while (*ce == OP_ALT);
+      if (cc > cs && cc < ce)
+        {
+        d = 0;
+        had_recurse = TRUE;
+        }
+      else d = find_minlength(cs, startcode, options);
+      }
+    else d = 0;
+    cc += 3;
+
+    /* Handle repeated back references */
+
+    switch (*cc)
+      {
+      case OP_CRSTAR:
+      case OP_CRMINSTAR:
+      case OP_CRQUERY:
+      case OP_CRMINQUERY:
+      min = 0;
+      cc++;
+      break;
+
+      case OP_CRRANGE:
+      case OP_CRMINRANGE:
+      min = GET2(cc, 1);
+      cc += 5;
+      break;
+
+      default:
+      min = 1;
+      break;
+      }
+
+    branchlength += min * d;
+    break;
+
+    case OP_RECURSE:
+    cs = ce = (uschar *)startcode + GET(cc, 1);
+    if (cs == NULL) return -2;
+    do ce += GET(ce, 1); while (*ce == OP_ALT);
+    if (cc > cs && cc < ce)
+      had_recurse = TRUE;
+    else
+      branchlength += find_minlength(cs, startcode, options);
+    cc += 1 + LINK_SIZE;
+    break;
+
+    /* Anything else does not or need not match a character. We can get the
+    item's length from the table, but for those that can match zero occurrences
+    of a character, we must take special action for UTF-8 characters. */
+
+    case OP_UPTO:
+    case OP_NOTUPTO:
+    case OP_MINUPTO:
+    case OP_NOTMINUPTO:
+    case OP_POSUPTO:
+    case OP_STAR:
+    case OP_MINSTAR:
+    case OP_NOTMINSTAR:
+    case OP_POSSTAR:
+    case OP_NOTPOSSTAR:
+    case OP_QUERY:
+    case OP_MINQUERY:
+    case OP_NOTMINQUERY:
+    case OP_POSQUERY:
+    case OP_NOTPOSQUERY:
+    cc += _pcre_OP_lengths[op];
+#ifdef SUPPORT_UTF8
+    if (utf8 && cc[-1] >= 0xc0) cc += _pcre_utf8_table4[cc[-1] & 0x3f];
+#endif
+    break;
+
+    /* For the record, these are the opcodes that are matched by "default":
+    OP_ACCEPT, OP_CLOSE, OP_COMMIT, OP_FAIL, OP_PRUNE, OP_SET_SOM, OP_SKIP,
+    OP_THEN. */
+
+    default:
+    cc += _pcre_OP_lengths[op];
+    break;
+    }
+  }
+/* Control never gets here */
+}
+
+
+
 /*************************************************
 *      Set a bit and maybe its alternate case    *
 *************************************************/
@ -498,13 +856,15 @@ Arguments:
            set NULL unless error

 Returns:    pointer to a pcre_extra block, with study_data filled in and the
-              appropriate flag set;
+              appropriate flags set;
            NULL on error or if no optimization possible
 */

 PCRE_EXP_DEFN pcre_extra * PCRE_CALL_CONVENTION
 pcre_study(const pcre *external_re, int options, const char **errorptr)
 {
+int min;
+BOOL bits_set = FALSE;
 uschar start_bits[32];
 pcre_extra *extra;
 pcre_study_data *study;
@ -531,30 +891,39 @@ code = (uschar *)re + re->name_table_offset +
  (re->name_count * re->name_entry_size);

 /* For an anchored pattern, or an unanchored pattern that has a first char, or
-a multiline pattern that matches only at "line starts", no further processing
-at present. */
+a multiline pattern that matches only at "line starts", there is no point in
+seeking a list of starting bytes. */

-if ((re->options & PCRE_ANCHORED) != 0 ||
-    (re->flags & (PCRE_FIRSTSET|PCRE_STARTLINE)) != 0)
-  return NULL;
+if ((re->options & PCRE_ANCHORED) == 0 &&
+    (re->flags & (PCRE_FIRSTSET|PCRE_STARTLINE)) == 0)
+  {
+  /* Set the character tables in the block that is passed around */

-/* Set the character tables in the block that is passed around */
+  tables = re->tables;
+  if (tables == NULL)
+    (void)pcre_fullinfo(external_re, NULL, PCRE_INFO_DEFAULT_TABLES,
+    (void *)(&tables));

-tables = re->tables;
-if (tables == NULL)
-  (void)pcre_fullinfo(external_re, NULL, PCRE_INFO_DEFAULT_TABLES,
-  (void *)(&tables));
+  compile_block.lcc = tables + lcc_offset;
+  compile_block.fcc = tables + fcc_offset;
+  compile_block.cbits = tables + cbits_offset;
+  compile_block.ctypes = tables + ctypes_offset;

-compile_block.lcc = tables + lcc_offset;
-compile_block.fcc = tables + fcc_offset;
-compile_block.cbits = tables + cbits_offset;
-compile_block.ctypes = tables + ctypes_offset;
+  /* See if we can find a fixed set of initial characters for the pattern. */

-/* See if we can find a fixed set of initial characters for the pattern. */
+  memset(start_bits, 0, 32 * sizeof(uschar));
+  bits_set = set_start_bits(code, start_bits,
+    (re->options & PCRE_CASELESS) != 0, (re->options & PCRE_UTF8) != 0,
+    &compile_block) == SSB_DONE;
+  }

-memset(start_bits, 0, 32 * sizeof(uschar));
-if (set_start_bits(code, start_bits, (re->options & PCRE_CASELESS) != 0,
-  (re->options & PCRE_UTF8) != 0, &compile_block) != SSB_DONE) return NULL;
+/* Find the minimum length of subject string. */
+
+min = find_minlength(code, code, re->options);
+
+/* Return NULL if no optimization is possible. */
+
+if (!bits_set && min < 0) return NULL;

 /* Get a pcre_extra block and a pcre_study_data block. The study data is put in
 the latter, which is pointed to by the former, which may also get additional
@ -577,8 +946,19 @@ extra->flags = PCRE_EXTRA_STUDY_DATA;
 extra->study_data = study;

 study->size = sizeof(pcre_study_data);
-study->options = PCRE_STUDY_MAPPED;
-memcpy(study->start_bits, start_bits, sizeof(start_bits));
+study->flags = 0;
+
+if (bits_set)
+  {
+  study->flags |= PCRE_STUDY_MAPPED;
+  memcpy(study->start_bits, start_bits, sizeof(start_bits));
+  }
+
+if (min >= 0)
+  {
+  study->flags |= PCRE_STUDY_MINLEN;
+  study->minlength = min;
+  }

 return extra;
 }
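The minimum-length scan added to pcre_study.c above sums lengths along a branch and takes the smallest result across alternatives. A minimal sketch of that idea on a toy pattern tree follows; the `node` type and `toy_minlength()` are hypothetical names for illustration and do not reflect PCRE's compiled-opcode format or its handling of recursion, back references, or UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy pattern tree; NOT PCRE's compiled-opcode format. */
typedef enum { N_CHAR, N_CONCAT, N_ALT, N_STAR, N_PLUS } node_type;

typedef struct node {
  node_type type;
  const struct node *left;   /* operand, or first operand */
  const struct node *right;  /* second operand for N_CONCAT and N_ALT */
} node;

/* Return a lower bound on the number of characters a match must consume. */
static int toy_minlength(const node *n)
{
  int a, b;
  assert(n != NULL);
  switch (n->type)
    {
    case N_CHAR:                       /* a single character */
    return 1;

    case N_CONCAT:                     /* lengths add up along a branch */
    return toy_minlength(n->left) + toy_minlength(n->right);

    case N_ALT:                        /* the shortest alternative wins */
    a = toy_minlength(n->left);
    b = toy_minlength(n->right);
    return (a < b)? a : b;

    case N_STAR:                       /* zero repetitions are allowed */
    return 0;

    case N_PLUS:                       /* at least one repetition */
    return toy_minlength(n->left);
    }
  return 0;
}
```

As in the real function, the result is only a lower bound: a subject shorter than it cannot match, but a subject of that length is not guaranteed to match.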
--- a/ext/pcre/pcrelib/pcre_try_flipped.c
+++ b/ext/pcre/pcrelib/pcre_try_flipped.c
@ -6,7 +6,7 @@
 and semantics are as close as possible to those of the Perl 5 language.

                       Written by Philip Hazel
-           Copyright (c) 1997-2008 University of Cambridge
+           Copyright (c) 1997-2009 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@ -126,7 +126,9 @@ if (study != NULL)
  {
  *internal_study = *study;   /* To copy other fields */
  internal_study->size = byteflip(study->size, sizeof(study->size));
-  internal_study->options = byteflip(study->options, sizeof(study->options));
+  internal_study->flags = byteflip(study->flags, sizeof(study->flags));
+  internal_study->minlength = byteflip(study->minlength,
+    sizeof(study->minlength));
  }

 return internal_re;
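The hunk above byte-flips the new `minlength` field alongside `flags` so that study data saved on a host of the opposite endianness remains usable. A minimal sketch of such a flip, assuming 8-bit bytes, is given here; `flip_bytes()` is a hypothetical stand-in and is not claimed to match the library's own static `byteflip()` helper.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the order of the low n bytes of value (hypothetical helper). */
static unsigned long flip_bytes(unsigned long value, size_t n)
{
  unsigned long result = 0;
  size_t i;
  assert(n > 0 && n <= sizeof(value));
  for (i = 0; i < n; i++)
    {
    result = (result << 8) | (value & 0xffUL);  /* append lowest byte */
    value >>= 8;                                /* move to the next byte */
    }
  return result;
}
```

Flipping twice with the same width returns the original value, which is why the same routine can serve both directions.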
--- a/ext/pcre/pcrelib/pcre_ucd.c
+++ b/ext/pcre/pcrelib/pcre_ucd.c
@ -1,9 +1,26 @@
 #include "config.h"
+
 #include "pcre_internal.h"

 /* Unicode character database. */
 /* This file was autogenerated by the MultiStage2.py script. */
 /* Total size: 52808 bytes, block size: 128. */
+
+/* The tables herein are needed only when UCP support is built */
+/* into PCRE. This module should not be referenced otherwise, so */
+/* it should not matter whether it is compiled or not. However */
+/* a comment was received about space saving - maybe the guy linked */
+/* all the modules rather than using a library - so we include a */
+/* condition to cut out the tables when not needed. But don't leave */
+/* a totally empty module because some compilers barf at that. */
+/* Instead, just supply small dummy tables. */
+
+#ifndef SUPPORT_UCP
+const ucd_record _pcre_ucd_records[] = {{0,0,0 }};
+const uschar _pcre_ucd_stage1[] = {0};
+const pcre_uint16 _pcre_ucd_stage2[] = {0};
+#else
+
 /* When recompiling tables with a new Unicode version,
 please check types in the structure definition from pcre_internal.h:
 typedef struct {
@ -2606,3 +2623,4 @@ const pcre_uint16 _pcre_ucd_stage2[] = { /* 40448 bytes, block = 128 */
 #if UCD_BLOCK_SIZE != 128
 #error Please correct UCD_BLOCK_SIZE in pcre_internal.h
 #endif
+#endif  /* SUPPORT_UCP */
--- a/ext/pcre/pcrelib/pcredemo.c
+++ b/ext/pcre/pcrelib/pcredemo.c
@ -223,12 +223,12 @@ if (namecount <= 0) printf("No named substrings\n"); else
 *                                                                        *
 * If the previous match WAS for an empty string, we can't do that, as it *
 * would lead to an infinite loop. Instead, a special call of pcre_exec() *
-* is made with the PCRE_NOTEMPTY and PCRE_ANCHORED flags set. The first  *
-* of these tells PCRE that an empty string is not a valid match; other   *
-* possibilities must be tried. The second flag restricts PCRE to one     *
-* match attempt at the initial string position. If this match succeeds,  *
-* an alternative to the empty string match has been found, and we can    *
-* proceed round the loop.                                                *
+* is made with the PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED flags set.    *
+* The first of these tells PCRE that an empty string at the start of the *
+* subject is not a valid match; other possibilities must be tried. The   *
+* second flag restricts PCRE to one match attempt at the initial string  *
+* position. If this match succeeds, an alternative to the empty string   *
+* match has been found, and we can proceed round the loop.               *
 *************************************************************************/

 if (!find_all)
@ -251,7 +251,7 @@ for (;;)
  if (ovector[0] == ovector[1])
    {
    if (ovector[0] == subject_length) break;
-    options = PCRE_NOTEMPTY | PCRE_ANCHORED;
+    options = PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED;
    }

  /* Run the next matching operation */
--- a/ext/pcre/pcrelib/pcreposix.c
+++ b/ext/pcre/pcrelib/pcreposix.c
@ -68,64 +68,80 @@ static const int eint[] = {
  REG_EESCAPE, /* \c at end of pattern */
  REG_EESCAPE, /* unrecognized character follows \ */
  REG_BADBR,   /* numbers out of order in {} quantifier */
+  /* 5 */
  REG_BADBR,   /* number too big in {} quantifier */
  REG_EBRACK,  /* missing terminating ] for character class */
  REG_ECTYPE,  /* invalid escape sequence in character class */
  REG_ERANGE,  /* range out of order in character class */
  REG_BADRPT,  /* nothing to repeat */
+  /* 10 */
  REG_BADRPT,  /* operand of unlimited repeat could match the empty string */
  REG_ASSERT,  /* internal error: unexpected repeat */
  REG_BADPAT,  /* unrecognized character after (? */
  REG_BADPAT,  /* POSIX named classes are supported only within a class */
  REG_EPAREN,  /* missing ) */
+  /* 15 */
  REG_ESUBREG, /* reference to non-existent subpattern */
  REG_INVARG,  /* erroffset passed as NULL */
  REG_INVARG,  /* unknown option bit(s) set */
  REG_EPAREN,  /* missing ) after comment */
  REG_ESIZE,   /* parentheses nested too deeply */
+  /* 20 */
  REG_ESIZE,   /* regular expression too large */
  REG_ESPACE,  /* failed to get memory */
-  REG_EPAREN,  /* unmatched brackets */
+  REG_EPAREN,  /* unmatched parentheses */
  REG_ASSERT,  /* internal error: code overflow */
  REG_BADPAT,  /* unrecognized character after (?< */
+  /* 25 */
  REG_BADPAT,  /* lookbehind assertion is not fixed length */
  REG_BADPAT,  /* malformed number or name after (?( */
  REG_BADPAT,  /* conditional group contains more than two branches */
  REG_BADPAT,  /* assertion expected after (?( */
  REG_BADPAT,  /* (?R or (?[+-]digits must be followed by ) */
+  /* 30 */
  REG_ECTYPE,  /* unknown POSIX class name */
  REG_BADPAT,  /* POSIX collating elements are not supported */
  REG_INVARG,  /* this version of PCRE is not compiled with PCRE_UTF8 support */
  REG_BADPAT,  /* spare error */
  REG_BADPAT,  /* character value in \x{...} sequence is too large */
+  /* 35 */
  REG_BADPAT,  /* invalid condition (?(0) */
  REG_BADPAT,  /* \C not allowed in lookbehind assertion */
  REG_EESCAPE, /* PCRE does not support \L, \l, \N, \U, or \u */
  REG_BADPAT,  /* number after (?C is > 255 */
  REG_BADPAT,  /* closing ) for (?C expected */
+  /* 40 */
  REG_BADPAT,  /* recursive call could loop indefinitely */
  REG_BADPAT,  /* unrecognized character after (?P */
  REG_BADPAT,  /* syntax error in subpattern name (missing terminator) */
  REG_BADPAT,  /* two named subpatterns have the same name */
  REG_BADPAT,  /* invalid UTF-8 string */
+  /* 45 */
  REG_BADPAT,  /* support for \P, \p, and \X has not been compiled */
  REG_BADPAT,  /* malformed \P or \p sequence */
  REG_BADPAT,  /* unknown property name after \P or \p */
  REG_BADPAT,  /* subpattern name is too long (maximum 32 characters) */
  REG_BADPAT,  /* too many named subpatterns (maximum 10,000) */
+  /* 50 */
  REG_BADPAT,  /* repeated subpattern is too long */
  REG_BADPAT,  /* octal value is greater than \377 (not in UTF-8 mode) */
  REG_BADPAT,  /* internal error: overran compiling workspace */
  REG_BADPAT,  /* internal error: previously-checked referenced subpattern not found */
  REG_BADPAT,  /* DEFINE group contains more than one branch */
+  /* 55 */
  REG_BADPAT,  /* repeating a DEFINE group is not allowed */
  REG_INVARG,  /* inconsistent NEWLINE options */
  REG_BADPAT,  /* \g is not followed by an (optionally braced) non-zero number */
-  REG_BADPAT,  /* (?+ or (?- must be followed by a non-zero number */
+  REG_BADPAT,  /* a numbered reference must not be zero */
+  REG_BADPAT,  /* (*VERB) with an argument is not supported */
+  /* 60 */
+  REG_BADPAT,  /* (*VERB) not recognized */
  REG_BADPAT,  /* number is too big */
  REG_BADPAT,  /* subpattern name expected */
  REG_BADPAT,  /* digit expected after (?+ */
-  REG_BADPAT   /* ] is an invalid data character in JavaScript compatibility mode */
+  REG_BADPAT,  /* ] is an invalid data character in JavaScript compatibility mode */
+  /* 65 */
+  REG_BADPAT   /* different names for subpatterns of the same number are not allowed */
 };

 /* Table of texts corresponding to POSIX error codes */
@ -224,17 +240,25 @@ int erroffset;
 int errorcode;
 int options = 0;

-if ((cflags & REG_ICASE) != 0)   options |= PCRE_CASELESS;
-if ((cflags & REG_NEWLINE) != 0) options |= PCRE_MULTILINE;
-if ((cflags & REG_DOTALL) != 0)  options |= PCRE_DOTALL;
-if ((cflags & REG_NOSUB) != 0)   options |= PCRE_NO_AUTO_CAPTURE;
-if ((cflags & REG_UTF8) != 0)    options |= PCRE_UTF8;
+if ((cflags & REG_ICASE) != 0)    options |= PCRE_CASELESS;
+if ((cflags & REG_NEWLINE) != 0)  options |= PCRE_MULTILINE;
+if ((cflags & REG_DOTALL) != 0)   options |= PCRE_DOTALL;
+if ((cflags & REG_NOSUB) != 0)    options |= PCRE_NO_AUTO_CAPTURE;
+if ((cflags & REG_UTF8) != 0)     options |= PCRE_UTF8;
+if ((cflags & REG_UNGREEDY) != 0) options |= PCRE_UNGREEDY;

 preg->re_pcre = pcre_compile2(pattern, options, &errorcode, &errorptr,
  &erroffset, NULL);
 preg->re_erroffset = erroffset;

-if (preg->re_pcre == NULL) return eint[errorcode];
+/* Safety: if the error code is too big for the translation vector (which
+should not happen, but we all make mistakes), return REG_BADPAT. */
+
+if (preg->re_pcre == NULL)
+  {
+  return (errorcode < sizeof(eint)/sizeof(const int))?
+    eint[errorcode] : REG_BADPAT;
+  }

 preg->re_nsub = pcre_info((const pcre *)preg->re_pcre, NULL, NULL);
 return 0;
@ -276,10 +300,11 @@ if ((eflags & REG_NOTEMPTY) != 0) options |= PCRE_NOTEMPTY;

 ((regex_t *)preg)->re_erroffset = (size_t)(-1);  /* Only has meaning after compile */

-/* When no string data is being returned, ensure that nmatch is zero.
-Otherwise, ensure the vector for holding the return data is large enough. */
+/* When no string data is being returned, or no vector has been passed in which
+to put it, ensure that nmatch is zero. Otherwise, ensure the vector for holding
+the return data is large enough. */

-if (nosub) nmatch = 0;
+if (nosub || pmatch == NULL) nmatch = 0;

 else if (nmatch > 0)
  {
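The safety check added to regcomp() above guards the translation table against an error code that has no entry. A minimal sketch of the same pattern with an explicit negative-index test follows; the `DEMO_*` codes, `demo_eint`, and `demo_translate()` are hypothetical and far smaller than the real table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical POSIX-style result codes for the sketch. */
enum { DEMO_OK = 0, DEMO_BADPAT = 1, DEMO_EPAREN = 2, DEMO_ESPACE = 3 };

/* Hypothetical translation vector indexed by the compile error code. */
static const int demo_eint[] = { DEMO_OK, DEMO_EPAREN, DEMO_ESPACE };

/* Translate an error code, falling back to DEMO_BADPAT when out of range. */
static int demo_translate(int errorcode)
{
  const size_t count = sizeof(demo_eint) / sizeof(demo_eint[0]);
  assert(count > 0);
  if (errorcode < 0 || (size_t)errorcode >= count) return DEMO_BADPAT;
  return demo_eint[errorcode];
}
```

Dividing by `sizeof(demo_eint[0])` keeps the element count correct even if the table's element type changes later.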
--- a/ext/pcre/pcrelib/pcreposix.h
+++ b/ext/pcre/pcrelib/pcreposix.h
@ -50,17 +50,18 @@ POSSIBILITY OF SUCH DAMAGE.
 extern "C" {
 #endif

-/* Options, mostly defined by POSIX, but with a couple of extras. */
+/* Options, mostly defined by POSIX, but with some extras. */

-#define REG_ICASE     0x0001
-#define REG_NEWLINE   0x0002
-#define REG_NOTBOL    0x0004
-#define REG_NOTEOL    0x0008
-#define REG_DOTALL    0x0010   /* NOT defined by POSIX. */
-#define REG_NOSUB     0x0020
-#define REG_UTF8      0x0040   /* NOT defined by POSIX. */
+#define REG_ICASE     0x0001   /* Maps to PCRE_CASELESS */
+#define REG_NEWLINE   0x0002   /* Maps to PCRE_MULTILINE */
+#define REG_NOTBOL    0x0004   /* Maps to PCRE_NOTBOL */
+#define REG_NOTEOL    0x0008   /* Maps to PCRE_NOTEOL */
+#define REG_DOTALL    0x0010   /* NOT defined by POSIX; maps to PCRE_DOTALL */
+#define REG_NOSUB     0x0020   /* Maps to PCRE_NO_AUTO_CAPTURE */
+#define REG_UTF8      0x0040   /* NOT defined by POSIX; maps to PCRE_UTF8 */
 #define REG_STARTEND  0x0080   /* BSD feature: pass subject string by so,eo */
-#define REG_NOTEMPTY  0x0100   /* NOT defined by POSIX. */
+#define REG_NOTEMPTY  0x0100   /* NOT defined by POSIX; maps to PCRE_NOTEMPTY */
+#define REG_UNGREEDY  0x0200   /* NOT defined by POSIX; maps to PCRE_UNGREEDY */

 /* This is not used by PCRE, but by defining it we make it easier
 to slot PCRE into existing programs that make POSIX calls. */
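The header comments above document which compile-time REG_ flags map to which PCRE options. A table-driven sketch of that translation is shown below, as an alternative to a chain of `if` statements; the `DEMO_REG_*` values are copied from the header, while the `DEMO_OPT_*` bits are placeholders and deliberately not the real PCRE_* values from pcre.h.

```c
#include <assert.h>
#include <stddef.h>

/* POSIX-side compile flags, with the values listed in pcreposix.h. */
#define DEMO_REG_ICASE     0x0001
#define DEMO_REG_NEWLINE   0x0002
#define DEMO_REG_DOTALL    0x0010
#define DEMO_REG_NOSUB     0x0020
#define DEMO_REG_UTF8      0x0040
#define DEMO_REG_UNGREEDY  0x0200

/* Placeholder option bits: NOT the real PCRE_* values. */
#define DEMO_OPT_CASELESS         0x01
#define DEMO_OPT_MULTILINE        0x02
#define DEMO_OPT_DOTALL           0x04
#define DEMO_OPT_NO_AUTO_CAPTURE  0x08
#define DEMO_OPT_UTF8             0x10
#define DEMO_OPT_UNGREEDY         0x20

typedef struct { int cflag; int option; } demo_flag_map;

static const demo_flag_map demo_map[] = {
  { DEMO_REG_ICASE,    DEMO_OPT_CASELESS },
  { DEMO_REG_NEWLINE,  DEMO_OPT_MULTILINE },
  { DEMO_REG_DOTALL,   DEMO_OPT_DOTALL },
  { DEMO_REG_NOSUB,    DEMO_OPT_NO_AUTO_CAPTURE },
  { DEMO_REG_UTF8,     DEMO_OPT_UTF8 },
  { DEMO_REG_UNGREEDY, DEMO_OPT_UNGREEDY }
};

/* Translate a set of compile flags into the corresponding option bits. */
static int demo_translate_cflags(int cflags)
{
  int options = 0;
  size_t i;
  for (i = 0; i < sizeof(demo_map) / sizeof(demo_map[0]); i++)
    if ((cflags & demo_map[i].cflag) != 0) options |= demo_map[i].option;
  return options;
}
```

Flags that apply only at match time, such as REG_NOTBOL (0x0004), are absent from the table and therefore contribute nothing here.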
--- a/ext/pcre/pcrelib/testdata/grepoutput
+++ b/ext/pcre/pcrelib/testdata/grepoutput
@ -423,3 +423,27 @@ This time it [1;31mjumps[00m and [1;31mjumps[00m and [1;31mjumps[00m.
 Here is the [1;31mpattern[00m again.
 That time it was on a [1;31mline by itself[00m.
 This line contains [1;31mpattern[00m not on a [1;31mline by itself[00m.
+---------------------------- Test 55 -----------------------------
+./testdata/grepinput:456
+./testdata/grepinput8:0
+./testdata/grepinputv:1
+./testdata/grepinputx:0
+---------------------------- Test 56 -----------------------------
+./testdata/grepinput:456
+./testdata/grepinputv:1
+---------------------------- Test 57 -----------------------------
+PATTERN at the start of a line.
+In the middle of a line, PATTERN appears.
+Check up on PATTERN near the end.
+---------------------------- Test 58 -----------------------------
+PATTERN at the start of a line.
+In the middle of a line, PATTERN appears.
+Check up on PATTERN near the end.
+---------------------------- Test 59 -----------------------------
+PATTERN at the start of a line.
+In the middle of a line, PATTERN appears.
+Check up on PATTERN near the end.
+---------------------------- Test 60 -----------------------------
+PATTERN at the start of a line.
+In the middle of a line, PATTERN appears.
+Check up on PATTERN near the end.
--- a/ext/pcre/pcrelib/testdata/testinput1
+++ b/ext/pcre/pcrelib/testdata/testinput1
@ -1,3 +1,6 @@
+/-- This set of tests is for features that are compatible with all versions of
+    Perl 5, in non-UTF-8 mode. --/
+
 /the quick brown fox/
    the quick brown fox
    The quick brown FOX
@ -4064,4 +4067,4 @@
 /^%((?(?=[a])[^%])|b)*%$/
    %ab%

-/ End of testinput1 /
+/-- End of testinput1 --/
--- a/ext/pcre/pcrelib/testdata/testinput10
+++ b/ext/pcre/pcrelib/testdata/testinput10
@ -121,4 +121,4 @@ are all themselves checked in other tests. --/

 /[^\xaa]/8BM

-/ End of testinput10 /
+/-- End of testinput10 --/
--- a/ext/pcre/pcrelib/testdata/testinput2
+++ b/ext/pcre/pcrelib/testdata/testinput2
@ -1,3 +1,14 @@
+/-- This set of tests is not Perl-compatible. It checks on special features
+    of PCRE's API, error diagnostics, and the compiled code of some patterns.
+    It also checks the non-Perl syntax the PCRE supports (Python, .NET, 
+    Oniguruma). Finally, there are some tests where PCRE and Perl differ, 
+    either because PCRE can't be compatible, or there is potential Perl 
+    bug. --/  
+  
+/-- Originally, the Perl 5.10 things were in here too, but now I have separated
+    many (most?) of them out into test 11. However, there may still be some
+    that were overlooked. --/   
+
 /(a)b|/I

 /abc/I
@ -123,38 +134,38 @@
    defabc
    \Zdefabc

-/abc/IP
+/abc/P
    abc
    *** Failers

-/^abc|def/IP
+/^abc|def/P
    abcdef
    abcdef\B

-/.*((abc)$|(def))/IP
+/.*((abc)$|(def))/P
    defabc
    \Zdefabc

-/the quick brown fox/IP
+/the quick brown fox/P
    the quick brown fox
    *** Failers
    The Quick Brown Fox

-/the quick brown fox/IPi
+/the quick brown fox/Pi
    the quick brown fox
    The Quick Brown Fox

-/abc.def/IP
+/abc.def/P
    *** Failers
    abc\ndef

-/abc$/IP
+/abc$/P
    abc
    abc\n

-/(abc)\2/IP
+/(abc)\2/P

-/(abc\1)/IP
+/(abc\1)/P
    abc

 /)/
@ -593,7 +604,7 @@
    *** Failers
    \Nabc

-/a*(b+)(z)(z)/IP
+/a*(b+)(z)(z)/P
    aaaabbbbzzzz
    aaaabbbbzzzz\O0
    aaaabbbbzzzz\O1
@ -1122,14 +1133,6 @@

 /(a(?1)+b)/DZ

-/^\W*(?:((.)\W*(?1)\W*\2|)|((.)\W*(?3)\W*\4|\W*.\W*))\W*$/Ii
-    1221
-    Satan, oscillate my metallic sonatas!
-    A man, a plan, a canal: Panama!
-    Able was I ere I saw Elba.
-    *** Failers
-    The quick brown fox
-
 /^(\d+|\((?1)([+*-])(?1)\)|-(?1))$/I
    12
    (((2+2)*-3)-7)
@ -1419,13 +1422,13 @@
    ** Failers
    line one\nthis is a line\nbreak in the second line

-/ab.cd/IP
+/ab.cd/P
    ab-cd
    ab=cd
    ** Failers
    ab\ncd

-/ab.cd/IPs
+/ab.cd/Ps
    ab-cd
    ab=cd
    ab\ncd
@ -1480,10 +1483,10 @@
    (this)
    ((this))

-/a(b)c/IPN
+/a(b)c/PN
    abc

-/a(?P<name>b)c/IPN
+/a(?P<name>b)c/PN
    abc

 /\x{100}/I
@ -1915,13 +1918,6 @@ a random value. /Ix
 /(?=(?'abc'\w+))\k<abc>:/I
    abcd:

-/(?'abc'\w+):\k<abc>{2}/
-    a:aaxyz
-    ab:ababxyz
-    ** Failers
-    a:axyz
-    ab:abxyz
-
 /(?'abc'a|b)(?<abc>d|e)\k<abc>{2}/J
    adaa
    ** Failers
@ -1934,10 +1930,6 @@ a random value. /Ix
    ** Failers
    bddd

-/^(?<ab>a)? (?(<ab>)b|c) (?('ab')d|e)/x
-    abd
-    ce
-
 /(?(<bc))/

 /(?(''))/
@ -1955,16 +1947,6 @@ a random value. /Ix
 /(?<1> (?'B' abc (?(R) (?(R&1)1) (?(R&B)2) X  |  (?1)  (?2)   (?R) ))) /x
    abcabc1Xabc2XabcXabcabc

-/^(?(DEFINE) (?<A> a) (?<B> b) )  (?&A) (?&B) /x
-    abcd
-
-/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
-  (?(DEFINE)
-  (?<NAME_PAT>[a-z]+)
-  (?<ADDRESS_PAT>\d+)
-  )/x
-    metcalfe 33
-
 /^(?(DEFINE) abc | xyz ) /x

 /(?(DEFINE) abc) xyz/xI
@ -2053,22 +2035,6 @@ a random value. /Ix
 /(?1)X(?<abc>P)/I
    abcPXP123

-/(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))\b(?&byte)(\.(?&byte)){3}/
-    1.2.3.4
-    131.111.10.206
-    10.0.0.0
-    ** Failers
-    10.6
-    455.3.4.5
-
-/\b(?&byte)(\.(?&byte)){3}(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))/
-    1.2.3.4
-    131.111.10.206
-    10.0.0.0
-    ** Failers
-    10.6
-    455.3.4.5
-
 /(?:a(?&abc)b)*(?<abc>x)/
    123axbaxbaxbx456
    123axbaxbaxb456
@ -2090,9 +2056,6 @@ a random value. /Ix
   defabcabcxyz
   DEFabcABCXYZ

-/^(a(b))\1\g1\g{1}\g-1\g{-1}\g{-02}Z/
-    ababababbbabZXXXX
-
 /^(a)\g-2/

 /^(a)\g/
@ -2191,26 +2154,12 @@ a random value. /Ix
 /^(?(+1)X|Y)(.)/BZ
    Y!

-/(foo)\Kbar/
-    foobar
-   
-/(foo)(\Kbar|baz)/
-    foobar
-    foobaz 
-
-/(foo\Kbar)baz/
-    foobarbaz
-
 /(?<A>tom|bon)-\k{A}/
    tom-tom
    bon-bon 
    ** Failers
    tom-bon  

-/(?<A>tom|bon)-\g{A}/
-    tom-tom
-    bon-bon 
-    
 /\g{A/ 

 /(?|(abc)|(xyz))/BZ
@ -2225,50 +2174,6 @@ a random value. /Ix
    xabcpqrx
    xxyzx 

-/(?|(abc)|(xyz))\1/
-    abcabc
-    xyzxyz 
-    ** Failers
-    abcxyz
-    xyzabc   
-    
-/(?|(abc)|(xyz))(?1)/
-    abcabc
-    xyzabc 
-    ** Failers 
-    xyzxyz 
- 
-/\H\h\V\v/
-    X X\x0a
-    X\x09X\x0b
-    ** Failers
-    \xa0 X\x0a   
-    
-/\H*\h+\V?\v{3,4}/ 
-    \x09\x20\xa0X\x0a\x0b\x0c\x0d\x0a
-    \x09\x20\xa0\x0a\x0b\x0c\x0d\x0a
-    \x09\x20\xa0\x0a\x0b\x0c
-    ** Failers 
-    \x09\x20\xa0\x0a\x0b
-     
-/\H{3,4}/
-    XY  ABCDE
-    XY  PQR ST 
-    
-/.\h{3,4}./
-    XY  AB    PQRS
-
-/\h*X\h?\H+Y\H?Z/
-    >XNNNYZ
-    >  X NYQZ
-    ** Failers
-    >XYZ   
-    >  X NY Z
-
-/\v*X\v?Y\v+Z\V*\x0a\V+\x0b\V{2,3}\x0c/
-    >XY\x0aZ\x0aA\x0bNN\x0c
-    >\x0a\x0dX\x0aY\x0a\x0bZZZ\x0aAAA\x0bNNN\x0c
-
 /[\h]/BZ
    >\x09<

@ -2341,49 +2246,6 @@ a random value. /Ix

 /A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/BZ

-/^a+(*FAIL)/
-    aaaaaa
-    
-/a+b?c+(*FAIL)/
-    aaabccc
-
-/a+b?(*PRUNE)c+(*FAIL)/
-    aaabccc
-
-/a+b?(*COMMIT)c+(*FAIL)/
-    aaabccc
-    
-/a+b?(*SKIP)c+(*FAIL)/
-    aaabcccaaabccc
-
-/^(?:aaa(*THEN)\w{6}|bbb(*THEN)\w{5}|ccc(*THEN)\w{4}|\w{3})/
-    aaaxxxxxx
-    aaa++++++ 
-    bbbxxxxx
-    bbb+++++ 
-    cccxxxx
-    ccc++++ 
-    dddddddd   
-
-/^(aaa(*THEN)\w{6}|bbb(*THEN)\w{5}|ccc(*THEN)\w{4}|\w{3})/
-    aaaxxxxxx
-    aaa++++++ 
-    bbbxxxxx
-    bbb+++++ 
-    cccxxxx
-    ccc++++ 
-    dddddddd   
-
-/a+b?(*THEN)c+(*FAIL)/
-    aaabccc
-
-/(A (A|B(*ACCEPT)|C) D)(E)/x
-    ABX
-    AADE
-    ACDE
-    ** Failers
-    AD 
-        
 /^a+(*FAIL)/C
    aaaaaa
    
@ -2589,66 +2451,8 @@ a random value. /Ix

 /[[:a\dz:]]/

-/^(?<name>a|b\g<name>c)/
-    aaaa
-    bacxxx
-    bbaccxxx 
-    bbbacccxx
-
-/^(?<name>a|b\g'name'c)/
-    aaaa
-    bacxxx
-    bbaccxxx 
-    bbbacccxx
-
-/^(a|b\g<1>c)/
-    aaaa
-    bacxxx
-    bbaccxxx 
-    bbbacccxx
-
-/^(a|b\g'1'c)/
-    aaaa
-    bacxxx
-    bbaccxxx 
-    bbbacccxx
-
-/^(a|b\g'-1'c)/
-    aaaa
-    bacxxx
-    bbaccxxx 
-    bbbacccxx
-
-/(^(a|b\g<-1>c))/
-    aaaa
-    bacxxx
-    bbaccxxx 
-    bbbacccxx
-
 /(^(a|b\g<-1'c))/

-/(^(a|b\g{-1}))/
-    bacxxx
-
-/(?-i:\g<name>)(?i:(?<name>a))/
-    XaaX
-    XAAX 
-
-/(?i:\g<name>)(?-i:(?<name>a))/
-    XaaX
-    ** Failers 
-    XAAX 
-
-/(?-i:\g<+1>)(?i:(a))/
-    XaaX
-    XAAX 
-
-/(?=(?<regex>(?#simplesyntax)\$(?<name>[a-zA-Z_\x{7f}-\x{ff}][a-zA-Z0-9_\x{7f}-\x{ff}]*)(?:\[(?<index>[a-zA-Z0-9_\x{7f}-\x{ff}]+|\$\g<name>)\]|->\g<name>(\(.*?\))?)?|(?#simple syntax withbraces)\$\{(?:\g<name>(?<indices>\[(?:\g<index>|'(?:\\.|[^'\\])*'|"(?:\g<regex>|\\.|[^"\\])*")\])?|\g<complex>|\$\{\g<complex>\})\}|(?#complexsyntax)\{(?<complex>\$(?<segment>\g<name>(\g<indices>*|\(.*?\))?)(?:->\g<segment>)*|\$\g<complex>|\$\{\g<complex>\})\}))\{/
-
-/(?<n>a|b|c)\g<n>*/
-   abc
-   accccbbb 
-
 /^(?+1)(?<a>x|y){0}z/
    xzxx
    yzyy 
@ -2755,22 +2559,614 @@ a random value. /Ix
 /^"((?(?=[a])[^"])|b)*"$/
    "ab"

-/^X(?5)(a)(?|(b)|(q))(c)(d)(Y)/
-    XYabcdY
-
 /^X(?5)(a)(?|(b)|(q))(c)(d)Y/
    XYabcdY

 /^X(?&N)(a)(?|(b)|(q))(c)(d)(?<N>Y)/
    XYabcdY
 
+/Xa{2,4}b/
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/Xa{2,4}?b/
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/Xa{2,4}+b/
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/X\d{2,4}b/
+    X\P
+    X3\P
+    X33\P 
+    X333\P
+    X3333\P 
+    
+/X\d{2,4}?b/
+    X\P
+    X3\P
+    X33\P 
+    X333\P
+    X3333\P 
+    
+/X\d{2,4}+b/
+    X\P
+    X3\P
+    X33\P 
+    X333\P
+    X3333\P 
+    
+/X\D{2,4}b/
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/X\D{2,4}?b/
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/X\D{2,4}+b/
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/X[abc]{2,4}b/
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/X[abc]{2,4}?b/
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/X[abc]{2,4}+b/
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/X[^a]{2,4}b/
+    X\P
+    Xz\P
+    Xzz\P 
+    Xzzz\P
+    Xzzzz\P 
+    
+/X[^a]{2,4}?b/
+    X\P
+    Xz\P
+    Xzz\P 
+    Xzzz\P
+    Xzzzz\P 
+    
+/X[^a]{2,4}+b/
+    X\P
+    Xz\P
+    Xzz\P 
+    Xzzz\P
+    Xzzzz\P 
+    
+/(Y)X\1{2,4}b/
+    YX\P
+    YXY\P
+    YXYY\P 
+    YXYYY\P
+    YXYYYY\P 
+    
+/(Y)X\1{2,4}?b/
+    YX\P
+    YXY\P
+    YXYY\P 
+    YXYYY\P
+    YXYYYY\P 
+    
+/(Y)X\1{2,4}+b/
+    YX\P
+    YXY\P
+    YXYY\P 
+    YXYYY\P
+    YXYYYY\P 
+    
+/\++\KZ|\d+X|9+Y/
+    ++++123999\P
+    ++++123999Y\P
+    ++++Z1234\P 
+
+/Z(*F)/
+    Z\P
+    ZA\P 
+    
+/Z(?!)/
+    Z\P 
+    ZA\P 
+
+/dog(sbody)?/
+    dogs\P
+    dogs\P\P 
+    
+/dog(sbody)??/
+    dogs\P
+    dogs\P\P 
+
+/dog|dogsbody/
+    dogs\P
+    dogs\P\P 
+ 
+/dogsbody|dog/
+    dogs\P
+    dogs\P\P 
+
+/\bthe cat\b/
+    the cat\P
+    the cat\P\P
+
+/abc/
+   abc\P
+   abc\P\P
+   
+/\w+A/P
+   CDAAAAB 
+
+/\w+A/PU
+   CDAAAAB 
+
+/abc\K123/
+    xyzabc123pqr
+    xyzabc12\P
+    xyzabc12\P\P
+    
+/(?<=abc)123/
+    xyzabc123pqr 
+    xyzabc12\P
+    xyzabc12\P\P
+
+/\babc\b/
+    +++abc+++
+    +++ab\P
+    +++ab\P\P  
+
+/(?&word)(?&element)(?(DEFINE)(?<element><[^m][^>]>[^<])(?<word>\w*+))/BZ
+
+/(?&word)(?&element)(?(DEFINE)(?<element><[^\d][^>]>[^<])(?<word>\w*+))/BZ
+
+/(ab)(x(y)z(cd(*ACCEPT)))pq/BZ
+
+/abc\K/+
+    abcdef
+    abcdef\N\N
+    xyzabcdef\N\N
+    ** Failers
+    abcdef\N 
+    xyzabcdef\N
+    
+/^(?:(?=abc)|abc\K)/+
+    abcdef
+    abcdef\N\N 
+    ** Failers 
+    abcdef\N 
+
+/a?b?/+
+    xyz
+    xyzabc
+    xyzabc\N
+    xyzabc\N\N
+    xyz\N\N    
+    ** Failers 
+    xyz\N 
+
+/^a?b?/+
+    xyz
+    xyzabc
+    ** Failers 
+    xyzabc\N
+    xyzabc\N\N
+    xyz\N\N    
+    xyz\N 
+    
+/^(?<name>a|b\g<name>c)/
+    aaaa
+    bacxxx
+    bbaccxxx 
+    bbbacccxx
+
+/^(?<name>a|b\g'name'c)/
+    aaaa
+    bacxxx
+    bbaccxxx 
+    bbbacccxx
+
+/^(a|b\g<1>c)/
+    aaaa
+    bacxxx
+    bbaccxxx 
+    bbbacccxx
+
+/^(a|b\g'1'c)/
+    aaaa
+    bacxxx
+    bbaccxxx 
+    bbbacccxx
+
+/^(a|b\g'-1'c)/
+    aaaa
+    bacxxx
+    bbaccxxx 
+    bbbacccxx
+
+/(^(a|b\g<-1>c))/
+    aaaa
+    bacxxx
+    bbaccxxx 
+    bbbacccxx
+
+/(?-i:\g<name>)(?i:(?<name>a))/
+    XaaX
+    XAAX 
+
+/(?i:\g<name>)(?-i:(?<name>a))/
+    XaaX
+    ** Failers 
+    XAAX 
+
+/(?-i:\g<+1>)(?i:(a))/
+    XaaX
+    XAAX 
+
+/(?=(?<regex>(?#simplesyntax)\$(?<name>[a-zA-Z_\x{7f}-\x{ff}][a-zA-Z0-9_\x{7f}-\x{ff}]*)(?:\[(?<index>[a-zA-Z0-9_\x{7f}-\x{ff}]+|\$\g<name>)\]|->\g<name>(\(.*?\))?)?|(?#simple syntax withbraces)\$\{(?:\g<name>(?<indices>\[(?:\g<index>|'(?:\\.|[^'\\])*'|"(?:\g<regex>|\\.|[^"\\])*")\])?|\g<complex>|\$\{\g<complex>\})\}|(?#complexsyntax)\{(?<complex>\$(?<segment>\g<name>(\g<indices>*|\(.*?\))?)(?:->\g<segment>)*|\$\g<complex>|\$\{\g<complex>\})\}))\{/
+
+/(?<n>a|b|c)\g<n>*/
+   abc
+   accccbbb 
+
 /^X(?7)(a)(?|(b)|(q)(r)(s))(c)(d)(Y)/
    XYabcdY

-/^X(?7)(a)(?|(b|(r)(s))|(q))(c)(d)(Y)/
-    XYabcdY
+/(?<=b(?1)|zzz)(a)/
+    xbaax
+    xzzzax 

-/^X(?7)(a)(?|(b|(?|(r)|(t))(s))|(q))(c)(d)(Y)/
-    XYabcdY
+/(a)(?<=b\1)/

-/ End of testinput2 /
+/(a)(?<=b+(?1))/
+
+/(a+)(?<=b(?1))/
+
+/(a(?<=b(?1)))/
+
+/(?<=b(?1))xyz/
+
+/(?<=b(?1))xyz(b+)pqrstuvew/
+
+/(a|bc)\1/SI
+
+/(a|bc)\1{2,3}/SI
+
+/(a|bc)(?1)/SI
+
+/(a|b\1)(a|b\1)/SI
+
+/(a|b\1){2}/SI
+
+/(a|bbbb\1)(a|bbbb\1)/SI
+
+/(a|bbbb\1){2}/SI
+
+/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/SI
+
+/  (?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*                          # optional leading comment
+(?:    (?:
+[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+    # some number of atom characters...
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
+|
+" (?:                      # opening quote...
+[^\\\x80-\xff\n\015"]                #   Anything except backslash and quote
+|                     #    or
+\\ [^\x80-\xff]           #   Escaped something (something != CR)
+)* "  # closing quote
+)                    # initial word
+(?:  (?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*  \.  (?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*   (?:
+[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+    # some number of atom characters...
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
+|
+" (?:                      # opening quote...
+[^\\\x80-\xff\n\015"]                #   Anything except backslash and quote
+|                     #    or
+\\ [^\x80-\xff]           #   Escaped something (something != CR)
+)* "  # closing quote
+)  )* # further okay, if led by a period
+(?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*  @  (?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*    (?:
+[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+    # some number of atom characters...
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
+|   \[                         # [
+(?: [^\\\x80-\xff\n\015\[\]] |  \\ [^\x80-\xff]  )*    #    stuff
+\]                        #           ]
+)                           # initial subdomain
+(?:                                  #
+(?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*  \.                        # if led by a period...
+(?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*   (?:
+[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+    # some number of atom characters...
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
+|   \[                         # [
+(?: [^\\\x80-\xff\n\015\[\]] |  \\ [^\x80-\xff]  )*    #    stuff
+\]                        #           ]
+)                     #   ...further okay
+)*
+# address
+|                     #  or
+(?:
+[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+    # some number of atom characters...
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
+|
+" (?:                      # opening quote...
+[^\\\x80-\xff\n\015"]                #   Anything except backslash and quote
+|                     #    or
+\\ [^\x80-\xff]           #   Escaped something (something != CR)
+)* "  # closing quote
+)             # one word, optionally followed by....
+(?:
+[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037]  |  # atom and space parts, or...
+\(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)       |  # comments, or...
+
+" (?:                      # opening quote...
+[^\\\x80-\xff\n\015"]                #   Anything except backslash and quote
+|                     #    or
+\\ [^\x80-\xff]           #   Escaped something (something != CR)
+)* "  # closing quote
+# quoted strings
+)*
+<  (?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*                     # leading <
+(?:  @  (?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*    (?:
+[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+    # some number of atom characters...
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
+|   \[                         # [
+(?: [^\\\x80-\xff\n\015\[\]] |  \\ [^\x80-\xff]  )*    #    stuff
+\]                        #           ]
+)                           # initial subdomain
+(?:                                  #
+(?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*  \.                        # if led by a period...
+(?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*   (?:
+[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+    # some number of atom characters...
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
+|   \[                         # [
+(?: [^\\\x80-\xff\n\015\[\]] |  \\ [^\x80-\xff]  )*    #    stuff
+\]                        #           ]
+)                     #   ...further okay
+)*
+
+(?:  (?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*  ,  (?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*  @  (?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*    (?:
+[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+    # some number of atom characters...
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
+|   \[                         # [
+(?: [^\\\x80-\xff\n\015\[\]] |  \\ [^\x80-\xff]  )*    #    stuff
+\]                        #           ]
+)                           # initial subdomain
+(?:                                  #
+(?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*  \.                        # if led by a period...
+(?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*   (?:
+[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+    # some number of atom characters...
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
+|   \[                         # [
+(?: [^\\\x80-\xff\n\015\[\]] |  \\ [^\x80-\xff]  )*    #    stuff
+\]                        #           ]
+)                     #   ...further okay
+)*
+)* # further okay, if led by comma
+:                                # closing colon
+(?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*  )? #       optional route
+(?:
+[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+    # some number of atom characters...
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
+|
+" (?:                      # opening quote...
+[^\\\x80-\xff\n\015"]                #   Anything except backslash and quote
+|                     #    or
+\\ [^\x80-\xff]           #   Escaped something (something != CR)
+)* "  # closing quote
+)                    # initial word
+(?:  (?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*  \.  (?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*   (?:
+[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+    # some number of atom characters...
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
+|
+" (?:                      # opening quote...
+[^\\\x80-\xff\n\015"]                #   Anything except backslash and quote
+|                     #    or
+\\ [^\x80-\xff]           #   Escaped something (something != CR)
+)* "  # closing quote
+)  )* # further okay, if led by a period
+(?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*  @  (?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*    (?:
+[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+    # some number of atom characters...
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
+|   \[                         # [
+(?: [^\\\x80-\xff\n\015\[\]] |  \\ [^\x80-\xff]  )*    #    stuff
+\]                        #           ]
+)                           # initial subdomain
+(?:                                  #
+(?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*  \.                        # if led by a period...
+(?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*   (?:
+[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+    # some number of atom characters...
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom
+|   \[                         # [
+(?: [^\\\x80-\xff\n\015\[\]] |  \\ [^\x80-\xff]  )*    #    stuff
+\]                        #           ]
+)                     #   ...further okay
+)*
+#       address spec
+(?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*  > #                  trailing >
+# name and address
+)  (?: [\040\t] |  \(
+(?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  |  \( (?:  [^\\\x80-\xff\n\015()]  |  \\ [^\x80-\xff]  )* \)  )*
+\)  )*                       # optional trailing comment
+/xSI
+
+/<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/isIS
+
+"(?>.*/)foo"SI
+
+/(?(?=[^a-z]+[a-z])  \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} ) /xSI
+
+/(?:(?:(?:(?:(?:(?:(?:(?:(?:(a|b|c))))))))))/iSI
+
+/(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))/SI
+
+/<a[\s]+href[\s]*=[\s]*          # find <a href=
+ ([\"\'])?                       # find single or double quote
+ (?(1) (.*?)\1 | ([^\s]+))       # if quote found, match up to next matching
+                                 # quote, otherwise match up to next space
+/isxSI
+
+/^(?!:)                       # colon disallowed at start
+  (?:                         # start of item
+    (?: [0-9a-f]{1,4} |       # 1-4 hex digits or
+    (?(1)0 | () ) )           # if null previously matched, fail; else null
+    :                         # followed by colon
+  ){1,7}                      # end item; 1-7 of them required               
+  [0-9a-f]{1,4} $             # final hex number at end of string
+  (?(1)|.)                    # check that there was an empty component
+  /xiIS
+
+/(?|(?<a>A)|(?<a>B))/I
+    AB\Ca
+    BA\Ca
+
+/(?|(?<a>A)|(?<b>B))/ 
+
+/(?:a(?<quote> (?<apostrophe>')|(?<realquote>")) |
+    b(?<quote> (?<apostrophe>')|(?<realquote>")) ) 
+    (?('quote')[a-z]+|[0-9]+)/JIx
+    a"aaaaa
+    b"aaaaa 
+    ** Failers 
+    b"11111
+    a"11111 
+    
+/^(?|(a)(b)(c)(?<D>d)|(?<D>e)) (?('D')X|Y)/JDZx
+    abcdX
+    eX
+    ** Failers
+    abcdY
+    ey     
+    
+/(?<A>a) (b)(c)  (?<A>d  (?(R&A)$ | (?4)) )/JDZx
+    abcdd
+    ** Failers
+    abcdde  
+
+/abcd*/
+    xxxxabcd\P
+    xxxxabcd\P\P
+
+/abcd*/i
+    xxxxabcd\P
+    xxxxabcd\P\P
+    XXXXABCD\P
+    XXXXABCD\P\P
+
+/abc\d*/
+    xxxxabc1\P
+    xxxxabc1\P\P
+
+/(a)bc\1*/
+    xxxxabca\P
+    xxxxabca\P\P
+
+/abc[de]*/
+    xxxxabcde\P
+    xxxxabcde\P\P
+
+/-- This is not in the Perl 5.10 test because Perl seems currently to be broken
+    and not behaving as specified in that it *does* bumpalong after hitting
+    (*COMMIT). --/ 
+
+/(?1)(A(*COMMIT)|B)D/
+    ABD
+    XABD
+    BAD
+    ABXABD  
+    ** Failers 
+    ABX 
+    BAXBAD  
+
+/(\3)(\1)(a)/<JS>
+    cat
+
+/(\3)(\1)(a)/SI<JS>
+    cat
+
+/(\3)(\1)(a)/SI
+    cat
+
+/-- End of testinput2 --/
--- a/ext/pcre/pcrelib/testdata/testinput3
+++ b/ext/pcre/pcrelib/testdata/testinput3
@ -1,3 +1,7 @@
+/-- This set of tests checks local-specific features, using the fr_FR locale. 
+    It is not Perl-compatible. There is different version called wintestinput3
+    for use on Windows, where the locale is called "french". --/
+
 /^[\w]+/
    *** Failers
    École
@ -88,4 +92,4 @@
    
 /[[:alpha:]][[:lower:]][[:upper:]]/DZLfr_FR 

-/ End of testinput3 /
+/-- End of testinput3 --/
--- a/ext/pcre/pcrelib/testdata/testinput4
+++ b/ext/pcre/pcrelib/testdata/testinput4
@ -1,7 +1,6 @@
-/-- Do not use the \x{} construct except with patterns that have the --/
-/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
-/-- that option is set. However, the latest Perls recognize them always. --/
-
+/-- This set of tests if for UTF-8 support, excluding Unicode properties. It is
+    compatible with all versions of Perl 5. --/
+   
 /a.b/8
    acb
    a\x7fb
@ -623,4 +622,22 @@

 /(?i)[\xc3\xa9\xc3\xbd]|[\xc3\xa9\xc3\xbdA]/8

-/ End of testinput4 /
+/^[a\x{c0}]b/8
+    \x{c0}b
+    
+/^([a\x{c0}]*?)aa/8
+    a\x{c0}aaaa/ 
+
+/^([a\x{c0}]*?)aa/8
+    a\x{c0}aaaa/ 
+    a\x{c0}a\x{c0}aaa/ 
+
+/^([a\x{c0}]*)aa/8
+    a\x{c0}aaaa/ 
+    a\x{c0}a\x{c0}aaa/ 
+
+/^([a\x{c0}]*)a\x{c0}/8
+    a\x{c0}aaaa/ 
+    a\x{c0}a\x{c0}aaa/ 
+
+/-- End of testinput4 --/
--- a/ext/pcre/pcrelib/testdata/testinput5
+++ b/ext/pcre/pcrelib/testdata/testinput5
@ -1,3 +1,6 @@
+/-- This set of tests checks the API, internals, and non-Perl stuff for UTF-8
+    support, excluding Unicode properties. --/
+
 /\x{100}/8DZ

 /\x{1000}/8DZ
@ -53,30 +56,6 @@
 /.{3,5}?/DZ8
    \x{212ab}\x{212ab}\x{212ab}\x{861}

-/-- These tests are here rather than in testinput4 because Perl 5.6 has some
-problems with UTF-8 support, in the area of \x{..} where the value is < 255. 
-It grumbles about invalid UTF-8 strings. --/
-
-/^[a\x{c0}]b/8
-    \x{c0}b
-    
-/^([a\x{c0}]*?)aa/8
-    a\x{c0}aaaa/ 
-
-/^([a\x{c0}]*?)aa/8
-    a\x{c0}aaaa/ 
-    a\x{c0}a\x{c0}aaa/ 
-
-/^([a\x{c0}]*)aa/8
-    a\x{c0}aaaa/ 
-    a\x{c0}a\x{c0}aaa/ 
-
-/^([a\x{c0}]*)a\x{c0}/8
-    a\x{c0}aaaa/ 
-    a\x{c0}a\x{c0}aaa/ 
-    
-/-- --/ 
-    
 /(?<=\C)X/8
    Should produce an error diagnostic
    
@ -485,4 +464,282 @@ can't tell the difference.) --/

 /(*CRLF)(*UTF8)(*BSR_UNICODE)a\Rb/I

-/ End of testinput5 /
+/Xa{2,4}b/8
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/Xa{2,4}?b/8
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/Xa{2,4}+b/8
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/X\x{123}{2,4}b/8
+    X\P
+    X\x{123}\P
+    X\x{123}\x{123}\P 
+    X\x{123}\x{123}\x{123}\P
+    X\x{123}\x{123}\x{123}\x{123}\P 
+    
+/X\x{123}{2,4}?b/8
+    X\P
+    X\x{123}\P
+    X\x{123}\x{123}\P 
+    X\x{123}\x{123}\x{123}\P
+    X\x{123}\x{123}\x{123}\x{123}\P 
+    
+/X\x{123}{2,4}+b/8
+    X\P
+    X\x{123}\P
+    X\x{123}\x{123}\P 
+    X\x{123}\x{123}\x{123}\P
+    X\x{123}\x{123}\x{123}\x{123}\P 
+    
+/X\x{123}{2,4}b/8
+    Xx\P
+    X\x{123}x\P
+    X\x{123}\x{123}x\P 
+    X\x{123}\x{123}\x{123}x\P
+    X\x{123}\x{123}\x{123}\x{123}x\P 
+    
+/X\x{123}{2,4}?b/8
+    Xx\P
+    X\x{123}x\P
+    X\x{123}\x{123}x\P 
+    X\x{123}\x{123}\x{123}x\P
+    X\x{123}\x{123}\x{123}\x{123}x\P 
+    
+/X\x{123}{2,4}+b/8
+    Xx\P
+    X\x{123}x\P
+    X\x{123}\x{123}x\P 
+    X\x{123}\x{123}\x{123}x\P
+    X\x{123}\x{123}\x{123}\x{123}x\P 
+    
+/X\d{2,4}b/8
+    X\P
+    X3\P
+    X33\P 
+    X333\P
+    X3333\P 
+    
+/X\d{2,4}?b/8
+    X\P
+    X3\P
+    X33\P 
+    X333\P
+    X3333\P 
+    
+/X\d{2,4}+b/8
+    X\P
+    X3\P
+    X33\P 
+    X333\P
+    X3333\P 
+
+/X\D{2,4}b/8
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/X\D{2,4}?b/8
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/X\D{2,4}+b/8
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+
+/X\D{2,4}b/8
+    X\P
+    X\x{123}\P
+    X\x{123}\x{123}\P 
+    X\x{123}\x{123}\x{123}\P
+    X\x{123}\x{123}\x{123}\x{123}\P 
+    
+/X\D{2,4}?b/8
+    X\P
+    X\x{123}\P
+    X\x{123}\x{123}\P 
+    X\x{123}\x{123}\x{123}\P
+    X\x{123}\x{123}\x{123}\x{123}\P 
+    
+/X\D{2,4}+b/8
+    X\P
+    X\x{123}\P
+    X\x{123}\x{123}\P 
+    X\x{123}\x{123}\x{123}\P
+    X\x{123}\x{123}\x{123}\x{123}\P 
+
+/X[abc]{2,4}b/8
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/X[abc]{2,4}?b/8
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+    
+/X[abc]{2,4}+b/8
+    X\P
+    Xa\P
+    Xaa\P 
+    Xaaa\P
+    Xaaaa\P 
+
+/X[abc\x{123}]{2,4}b/8
+    X\P
+    X\x{123}\P
+    X\x{123}\x{123}\P 
+    X\x{123}\x{123}\x{123}\P
+    X\x{123}\x{123}\x{123}\x{123}\P 
+    
+/X[abc\x{123}]{2,4}?b/8
+    X\P
+    X\x{123}\P
+    X\x{123}\x{123}\P 
+    X\x{123}\x{123}\x{123}\P
+    X\x{123}\x{123}\x{123}\x{123}\P 
+    
+/X[abc\x{123}]{2,4}+b/8
+    X\P
+    X\x{123}\P
+    X\x{123}\x{123}\P 
+    X\x{123}\x{123}\x{123}\P
+    X\x{123}\x{123}\x{123}\x{123}\P 
+
+/X[^a]{2,4}b/8
+    X\P
+    Xz\P
+    Xzz\P 
+    Xzzz\P
+    Xzzzz\P 
+    
+/X[^a]{2,4}?b/8
+    X\P
+    Xz\P
+    Xzz\P 
+    Xzzz\P
+    Xzzzz\P 
+    
+/X[^a]{2,4}+b/8
+    X\P
+    Xz\P
+    Xzz\P 
+    Xzzz\P
+    Xzzzz\P 
+
+/X[^a]{2,4}b/8
+    X\P
+    X\x{123}\P
+    X\x{123}\x{123}\P 
+    X\x{123}\x{123}\x{123}\P
+    X\x{123}\x{123}\x{123}\x{123}\P 
+    
+/X[^a]{2,4}?b/8
+    X\P
+    X\x{123}\P
+    X\x{123}\x{123}\P 
+    X\x{123}\x{123}\x{123}\P
+    X\x{123}\x{123}\x{123}\x{123}\P 
+    
+/X[^a]{2,4}+b/8
+    X\P
+    X\x{123}\P
+    X\x{123}\x{123}\P 
+    X\x{123}\x{123}\x{123}\P
+    X\x{123}\x{123}\x{123}\x{123}\P 
+
+/(Y)X\1{2,4}b/8
+    YX\P
+    YXY\P
+    YXYY\P 
+    YXYYY\P
+    YXYYYY\P 
+    
+/(Y)X\1{2,4}?b/8
+    YX\P
+    YXY\P
+    YXYY\P 
+    YXYYY\P
+    YXYYYY\P 
+    
+/(Y)X\1{2,4}+b/8
+    YX\P
+    YXY\P
+    YXYY\P 
+    YXYYY\P
+    YXYYYY\P 
+
+/(\x{123})X\1{2,4}b/8
+    \x{123}X\P
+    \x{123}X\x{123}\P
+    \x{123}X\x{123}\x{123}\P 
+    \x{123}X\x{123}\x{123}\x{123}\P
+    \x{123}X\x{123}\x{123}\x{123}\x{123}\P 
+    
+/(\x{123})X\1{2,4}?b/8
+    \x{123}X\P
+    \x{123}X\x{123}\P
+    \x{123}X\x{123}\x{123}\P 
+    \x{123}X\x{123}\x{123}\x{123}\P
+    \x{123}X\x{123}\x{123}\x{123}\x{123}\P 
+    
+/(\x{123})X\1{2,4}+b/8
+    \x{123}X\P
+    \x{123}X\x{123}\P
+    \x{123}X\x{123}\x{123}\P 
+    \x{123}X\x{123}\x{123}\x{123}\P
+    \x{123}X\x{123}\x{123}\x{123}\x{123}\P 
+
+/\bthe cat\b/8
+    the cat\P
+    the cat\P\P
+
+/abcd*/8
+    xxxxabcd\P
+    xxxxabcd\P\P
+
+/abcd*/i8
+    xxxxabcd\P
+    xxxxabcd\P\P
+    XXXXABCD\P
+    XXXXABCD\P\P
+
+/abc\d*/8
+    xxxxabc1\P
+    xxxxabc1\P\P
+
+/(a)bc\1*/8
+    xxxxabca\P
+    xxxxabca\P\P
+
+/abc[de]*/8
+    xxxxabcde\P
+    xxxxabcde\P\P
+
+/-- End of testinput5 --/
--- a/ext/pcre/pcrelib/testdata/testinput6
+++ b/ext/pcre/pcrelib/testdata/testinput6
@ -1,3 +1,7 @@
+/-- This set of tests is for Unicode property support. It is compatible with
+    Perl 5.10, but not 5.8 because it tests some extra properties that are
+    not in the earlier release. --/ 
+
 /^\pC\pL\pM\pN\pP\pS\pZ</8
    \x7f\x{c0}\x{30f}\x{660}\x{66c}\x{f01}\x{1680}<
    \np\x{300}9!\$ < 
@ -60,11 +64,6 @@
    ** Failers
    \x{09f} 
  
-/^\p{Cs}/8
-    \?\x{dfff}
-    ** Failers
-    \x{09f} 
-  
 /^\p{Ll}/8
    a
    ** Failers 
@ -199,13 +198,6 @@
    }
    \x{f3b}
  
-/^\p{Sc}+/8
-    $\x{a2}\x{a3}\x{a4}\x{a5}\x{a6}
-    \x{9f2}
-    ** Failers
-    X
-    \x{2c2}
-  
 /^\p{Sk}/8
    \x{2c2}
    ** Failers
@ -237,17 +229,6 @@
    X
    \x{2028}
  
-/^\p{Zs}/8
-    \ \
-    \x{a0}
-    \x{1680}
-    \x{180e}
-    \x{2000}
-    \x{2001}     
-    ** Failers
-    \x{2028}
-    \x{200d} 
-  
 /\p{Nd}+(..)/8
      \x{660}\x{661}\x{662}ABC
  
@ -291,23 +272,6 @@
      ** Failers
      \x{660}\x{661}\x{662}ABC
  
-/\p{Lu}/8i
-    A
-    a\x{10a0}B 
-    ** Failers 
-    a
-    \x{1d00}  
-
-/\p{^Lu}/8i
-    1234
-    ** Failers
-    ABC 
-
-/\P{Lu}/8i
-    1234
-    ** Failers
-    ABC 
-
 /(?<=A\p{Nd})XYZ/8
    A2XYZ
    123A5XYZPQR
@ -323,26 +287,6 @@
    ** Failers
    WXYZ 

-/[\p{L}]/DZ
-
-/[\p{^L}]/DZ
-
-/[\P{L}]/DZ
-
-/[\P{^L}]/DZ
-
-/[abc\p{L}\x{0660}]/8DZ
-
-/[\p{Nd}]/8DZ
-    1234
-
-/[\p{Nd}+-]+/8DZ
-    1234
-    12-34
-    12+\x{661}-34  
-    ** Failers
-    abcd  
-
 /[\P{Nd}]+/8
    abcd
    ** Failers
@ -394,20 +338,6 @@
    ** Failers
    ABC   

-/\p{Ll}/8i 
-    a
-    Az
-    ** Failers
-    ABC   
-
-/^\x{c0}$/8i
-    \x{c0}
-    \x{e0} 
-
-/^\x{e0}$/8i
-    \x{c0}
-    \x{e0} 
-
 /A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8
    A\x{391}\x{10427}\x{ff3a}\x{1fb0}
    ** Failers
@ -425,14 +355,6 @@
    A\x{391}\x{10427}\x{ff5a}\x{1fb0}
    A\x{391}\x{10427}\x{ff3a}\x{1fb8}

-/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iDZ
-
-/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8DZ
-
-/AB\x{1fb0}/8DZ
-
-/AB\x{1fb0}/8DZi
-
 /\x{391}+/8i
    \x{391}\x{3b1}\x{3b1}\x{3b1}\x{391}

@ -448,35 +370,6 @@
    \x{3b1}
    \x{ff5a}   
    
-/[\x{c0}\x{391}]/8i
-    \x{c0}
-    \x{e0} 
-
-/[\x{105}-\x{109}]/8iDZ
-    \x{104}
-    \x{105}
-    \x{109}  
-    ** Failers
-    \x{100}
-    \x{10a} 
-    
-/[z-\x{100}]/8iDZ 
-    Z
-    z
-    \x{39c}
-    \x{178}
-    |
-    \x{80}
-    \x{ff}
-    \x{100}
-    \x{101} 
-    ** Failers
-    \x{102}
-    Y
-    y           
-
-/[z-\x{100}]/8DZi
-
 /^\X/8
    A
    A\x{300}BC 
@ -747,31 +640,9 @@
 /([\pL]=(abc))*X/
    L=abcX

-/The next two should be Perl-compatible, but it fails to match \x{e0}. PCRE
-will match it only with UCP support, because without that it has no notion
-of case for anything other than the ASCII letters. / 
-
-/((?i)[\x{c0}])/8
-    \x{c0}
-    \x{e0} 
-
-/(?i:[\x{c0}])/8
-    \x{c0}
-    \x{e0} 
-    
 /^\p{Balinese}\p{Cuneiform}\p{Nko}\p{Phags_Pa}\p{Phoenician}/8
    \x{1b00}\x{12000}\x{7c0}\x{a840}\x{10900}

-/The next two are special cases where the lengths of the different cases of the 
-same character differ. The first went wrong with heap frame storage; the 2nd
-was broken in all cases./
-
-/^\x{023a}+?(\x{0130}+)/8i
-  \x{023a}\x{2c65}\x{0130}
-  
-/^\x{023a}+([^X])/8i
-  \x{023a}\x{2c65}X
-
 /Check property support in non-UTF-8 mode/
 
 /\p{L}{4}/
@ -790,48 +661,6 @@ was broken in all cases./
 /[\PPP\x8a]{1,}\x80/
    A\x80

-/(?:[\PPa*]*){8,}/
-
-/[\P{Any}]/BZ
-
-/[\P{Any}\E]/BZ
-
-/(\P{Yi}+\277)/
-
-/(\P{Yi}+\277)?/
-
-/(?<=\P{Yi}{3}A)X/
-
-/\p{Yi}+(\P{Yi}+)(?1)/
-
-/(\P{Yi}{2}\277)?/
-
-/[\P{Yi}A]/
-
-/[\P{Yi}\P{Yi}\P{Yi}A]/
-
-/[^\P{Yi}A]/
-
-/[^\P{Yi}\P{Yi}\P{Yi}A]/
-
-/(\P{Yi}*\277)*/
-
-/(\P{Yi}*?\277)*/
-
-/(\p{Yi}*+\277)*/
-
-/(\P{Yi}?\277)*/
-
-/(\P{Yi}??\277)*/
-
-/(\p{Yi}?+\277)*/
-
-/(\P{Yi}{0,3}\277)*/
-
-/(\P{Yi}{0,3}?\277)*/
-
-/(\p{Yi}{0,3}+\277)*/
-
 /^[\p{Arabic}]/8
    \x{60e} 
    \x{656} 
@ -895,24 +724,6 @@ was broken in all cases./
    \x{1049f}
    \x{104aa}           

-/\p{Zl}{2,3}+/8BZ
-    \xe2\x80\xa8\xe2\x80\xa8
-    \x{2028}\x{2028}\x{2028}
-    
-/\p{Zl}/8BZ
-
-/\p{Lu}{3}+/8BZ
-
-/\pL{2}+/8BZ
-
-/\p{Cc}{2}+/8BZ
-
-/\x{c0}+\x{116}+/8i
-    \x{c0}\x{e0}\x{116}\x{117}
-
-/[\x{c0}\x{116}]+/8i
-    \x{c0}\x{e0}\x{116}\x{117}
-
 /\p{Carian}\p{Cham}\p{Kayah_Li}\p{Lepcha}\p{Lycian}\p{Lydian}\p{Ol_Chiki}\p{Rejang}\p{Saurashtra}\p{Sundanese}\p{Vai}/8
    \x{102A4}\x{AA52}\x{A91D}\x{1C46}\x{10283}\x{1092E}\x{1C6B}\x{A93B}\x{A8BF}\x{1BA0}\x{A50A}====

@ -931,12 +742,6 @@ was broken in all cases./
    aa
    aA

-/(\x{de})\1/8i
-    \x{de}\x{de}
-    \x{de}\x{fe}
-    \x{fe}\x{fe}
-    \x{fe}\x{de}
-
 /(\x{10a})\1/8i
    \x{10a}\x{10a}
    \x{10a}\x{10b}
@ -951,4 +756,4 @@ was broken in all cases./
 /[\p{Lu}\x20]+/
    \x41\x20\x50\xC2\x54\xC9\x20\x54\x4F\x44\x41\x59

-/ End of testinput6 /
+/-- End of testinput6 --/
--- a/ext/pcre/pcrelib/testdata/testinput7
+++ b/ext/pcre/pcrelib/testdata/testinput7
@ -1,3 +1,6 @@
+/-- This set of tests check the DFA matching functionality of pcre_dfa_exec().
+    The -dfa flag must be used with pcretest when running it. --/
+     
 /abc/
    abc
    
@ -4421,4 +4424,122 @@
    "ab"
    \C-"ab"

-/ End of testinput7 /
+/\d+X|9+Y/
+    ++++123999\P
+    ++++123999Y\P
+
+/Z(*F)/
+    Z\P
+    ZA\P 
+    
+/Z(?!)/
+    Z\P 
+    ZA\P 
+
+/dog(sbody)?/
+    dogs\P
+    dogs\P\P 
+    
+/dog(sbody)??/
+    dogs\P
+    dogs\P\P 
+
+/dog|dogsbody/
+    dogs\P
+    dogs\P\P 
+ 
+/dogsbody|dog/
+    dogs\P
+    dogs\P\P 
+
+/Z(*F)Q|ZXY/
+    Z\P
+    ZA\P 
+    X\P 
+
+/\bthe cat\b/
+    the cat\P
+    the cat\P\P
+
+/dog(sbody)?/
+    dogs\D\P
+    body\D\R
+
+/dog(sbody)?/
+    dogs\D\P\P
+    body\D\R
+
+/abc/
+   abc\P
+   abc\P\P
+
+/abc\K123/
+    xyzabc123pqr
+    
+/(?<=abc)123/
+    xyzabc123pqr 
+    xyzabc12\P
+    xyzabc12\P\P
+
+/\babc\b/
+    +++abc+++
+    +++ab\P
+    +++ab\P\P  
+
+/(?=C)/g+
+    ABCDECBA
+
+/(abc|def|xyz)/I
+    terhjk;abcdaadsfe
+    the quick xyz brown fox 
+    \Yterhjk;abcdaadsfe
+    \Ythe quick xyz brown fox 
+    ** Failers
+    thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
+    \Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
+
+/(abc|def|xyz)/SI
+    terhjk;abcdaadsfe
+    the quick xyz brown fox 
+    \Yterhjk;abcdaadsfe
+    \Ythe quick xyz brown fox 
+    ** Failers
+    thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
+    \Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
+
+/abcd*/+
+    xxxxabcd\P
+    xxxxabcd\P\P
+    dddxxx\R 
+    xxxxabcd\P\P
+    xxx\R 
+
+/abcd*/i
+    xxxxabcd\P
+    xxxxabcd\P\P
+    XXXXABCD\P
+    XXXXABCD\P\P
+
+/abc\d*/
+    xxxxabc1\P
+    xxxxabc1\P\P
+
+/abc[de]*/
+    xxxxabcde\P
+    xxxxabcde\P\P
+
+/(?:(?1)|B)(A(*F)|C)/
+    ABCD
+    CCD
+    ** Failers
+    CAD   
+
+/^(?:(?1)|B)(A(*F)|C)/
+    CCD
+    BCD 
+    ** Failers
+    ABCD
+    CAD
+    BAD    
+
+/-- End of testinput7 --/
--- a/ext/pcre/pcrelib/testdata/testinput8
+++ b/ext/pcre/pcrelib/testdata/testinput8
@ -1,6 +1,6 @@
-/-- Do not use the \x{} construct except with patterns that have the --/
-/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
-/-- that option is set. However, the latest Perls recognize them always. --/
+/-- This set of tests checks UTF-8 support with the DFA matching functionality
+    of pcre_dfa_exec(). The -dfa flag must be used with pcretest when running 
+    it. --/

 /\x{100}ab/8
  \x{100}ab
@ -667,4 +667,22 @@
 /X/8f<any> 
    A\x{1ec5}ABCXYZ

-/ End of testinput 8 / 
+/abcd*/8
+    xxxxabcd\P
+    xxxxabcd\P\P
+
+/abcd*/i8
+    xxxxabcd\P
+    xxxxabcd\P\P
+    XXXXABCD\P
+    XXXXABCD\P\P
+
+/abc\d*/8
+    xxxxabc1\P
+    xxxxabc1\P\P
+
+/abc[de]*/8
+    xxxxabcde\P
+    xxxxabcde\P\P
+
+/-- End of testinput8 --/ 
--- a/ext/pcre/pcrelib/testdata/testinput9
+++ b/ext/pcre/pcrelib/testdata/testinput9
@ -1,3 +1,7 @@
+/-- This set of tests check Unicode property support with the DFA matching 
+    functionality of pcre_dfa_exec(). The -dfa flag must be used with pcretest
+    when running it. --/
+
 /\pL\P{Nd}/8
    AB
    *** Failers
@ -843,4 +847,4 @@
    ** Failers 
    \x{1d79}\x{a77d} 

-/ End / 
+/-- End of testinput9 --/ 
--- a/ext/pcre/pcrelib/testdata/testoutput1
+++ b/ext/pcre/pcrelib/testdata/testoutput1
@ -1,3 +1,6 @@
+/-- This set of tests is for features that are compatible with all versions of
+    Perl 5, in non-UTF-8 mode. --/
+
 /the quick brown fox/
    the quick brown fox
 0: the quick brown fox
@ -6646,4 +6649,4 @@ No match
 0: %ab%
 1: 

-/ End of testinput1 /
+/-- End of testinput1 --/
--- a/ext/pcre/pcrelib/testdata/testoutput10
+++ b/ext/pcre/pcrelib/testdata/testoutput10
@ -666,4 +666,4 @@ Memory allocation (code space): 40
 39     End
 ------------------------------------------------------------------

-/ End of testinput10 /
+/-- End of testinput10 --/
--- a/ext/pcre/pcrelib/testdata/testoutput2
+++ b/ext/pcre/pcrelib/testdata/testoutput2
--- a/ext/pcre/pcrelib/testdata/testoutput3
+++ b/ext/pcre/pcrelib/testdata/testoutput3
@ -1,3 +1,7 @@
+/-- This set of tests checks local-specific features, using the fr_FR locale. 
+    It is not Perl-compatible. There is different version called wintestinput3
+  f  or use on Windows, where the locale is called "french". --/
+
 /^[\w]+/
    *** Failers
 No match
@ -83,6 +87,7 @@ Capturing subpattern count = 0
 No options
 No first char
 No need char
+Subject length lower bound = 1
 Starting byte set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P 
  Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z 

@ -91,6 +96,7 @@ Capturing subpattern count = 0
 No options
 No first char
 No need char
+Subject length lower bound = 1
 Starting byte set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P 
  Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z 
  ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â 
@ -160,4 +166,4 @@ No options
 No first char
 No need char

-/ End of testinput3 /
+/-- End of testinput3 --/
--- a/ext/pcre/pcrelib/testdata/testoutput4
+++ b/ext/pcre/pcrelib/testdata/testoutput4
@ -1,9 +1,6 @@
-/-- Do not use the \x{} construct except with patterns that have the --/
-/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
-No match
-/-- that option is set. However, the latest Perls recognize them always. --/
-No match
-
+/-- This set of tests if for UTF-8 support, excluding Unicode properties. It is
+    compatible with all versions of Perl 5. --/
+   
 /a.b/8
    acb
 0: acb
@ -1089,4 +1086,37 @@ No match

 /(?i)[\xc3\xa9\xc3\xbd]|[\xc3\xa9\xc3\xbdA]/8

-/ End of testinput4 /
+/^[a\x{c0}]b/8
+    \x{c0}b
+ 0: \x{c0}b
+    
+/^([a\x{c0}]*?)aa/8
+    a\x{c0}aaaa/ 
+ 0: a\x{c0}aa
+ 1: a\x{c0}
+
+/^([a\x{c0}]*?)aa/8
+    a\x{c0}aaaa/ 
+ 0: a\x{c0}aa
+ 1: a\x{c0}
+    a\x{c0}a\x{c0}aaa/ 
+ 0: a\x{c0}a\x{c0}aa
+ 1: a\x{c0}a\x{c0}
+
+/^([a\x{c0}]*)aa/8
+    a\x{c0}aaaa/ 
+ 0: a\x{c0}aaaa
+ 1: a\x{c0}aa
+    a\x{c0}a\x{c0}aaa/ 
+ 0: a\x{c0}a\x{c0}aaa
+ 1: a\x{c0}a\x{c0}a
+
+/^([a\x{c0}]*)a\x{c0}/8
+    a\x{c0}aaaa/ 
+ 0: a\x{c0}
+ 1: 
+    a\x{c0}a\x{c0}aaa/ 
+ 0: a\x{c0}a\x{c0}
+ 1: a\x{c0}
+
+/-- End of testinput4 --/
--- a/ext/pcre/pcrelib/testdata/testoutput5
+++ b/ext/pcre/pcrelib/testdata/testoutput5
@ -1,3 +1,6 @@
+/-- This set of tests checks the API, internals, and non-Perl stuff for UTF-8
+    support, excluding Unicode properties. --/
+
 /\x{100}/8DZ
 ------------------------------------------------------------------
        Bra
@ -252,7 +255,6 @@ Need char = 171
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 No first char
 Need char = 'X'
@ -269,52 +271,12 @@ Need char = 'X'
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 No first char
 No need char
    \x{212ab}\x{212ab}\x{212ab}\x{861}
 0: \x{212ab}\x{212ab}\x{212ab}

-/-- These tests are here rather than in testinput4 because Perl 5.6 has some
-problems with UTF-8 support, in the area of \x{..} where the value is < 255. 
-It grumbles about invalid UTF-8 strings. --/
-
-/^[a\x{c0}]b/8
-    \x{c0}b
- 0: \x{c0}b
-    
-/^([a\x{c0}]*?)aa/8
-    a\x{c0}aaaa/ 
- 0: a\x{c0}aa
- 1: a\x{c0}
-
-/^([a\x{c0}]*?)aa/8
-    a\x{c0}aaaa/ 
- 0: a\x{c0}aa
- 1: a\x{c0}
-    a\x{c0}a\x{c0}aaa/ 
- 0: a\x{c0}a\x{c0}aa
- 1: a\x{c0}a\x{c0}
-
-/^([a\x{c0}]*)aa/8
-    a\x{c0}aaaa/ 
- 0: a\x{c0}aaaa
- 1: a\x{c0}aa
-    a\x{c0}a\x{c0}aaa/ 
- 0: a\x{c0}a\x{c0}aaa
- 1: a\x{c0}a\x{c0}a
-
-/^([a\x{c0}]*)a\x{c0}/8
-    a\x{c0}aaaa/ 
- 0: a\x{c0}
- 1: 
-    a\x{c0}a\x{c0}aaa/ 
- 0: a\x{c0}a\x{c0}
- 1: a\x{c0}
-    
-/-- --/ 
-    
 /(?<=\C)X/8
 Failed: \C not allowed in lookbehind assertion at offset 6

@ -389,6 +351,7 @@ Capturing subpattern count = 0
 Options: utf8
 No first char
 No need char
+Subject length lower bound = 1
 Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a 
  \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 
  \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 
@ -423,11 +386,11 @@ No match
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 First char = 196
 Need char = 128
-Study returned NULL
+Subject length lower bound = 3
+No set of starting bytes
  \x{100}\x{100}\x{100}\x{100\x{100}
 0: \x{100}\x{100}\x{100}

@ -443,10 +406,10 @@ Study returned NULL
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 1
-Partial matching not supported
 Options: utf8
 No first char
 No need char
+Subject length lower bound = 1
 Starting byte set: x \xc4 

 /(\x{100}*a|x)/8SDZ
@ -462,10 +425,10 @@ Starting byte set: x \xc4
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 1
-Partial matching not supported
 Options: utf8
 No first char
 No need char
+Subject length lower bound = 1
 Starting byte set: a x \xc4 

 /(\x{100}{0,2}a|x)/8SDZ
@ -481,10 +444,10 @@ Starting byte set: a x \xc4
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 1
-Partial matching not supported
 Options: utf8
 No first char
 No need char
+Subject length lower bound = 1
 Starting byte set: a x \xc4 

 /(\x{100}{1,2}a|x)/8SDZ
@ -501,10 +464,10 @@ Starting byte set: a x \xc4
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 1
-Partial matching not supported
 Options: utf8
 No first char
 No need char
+Subject length lower bound = 1
 Starting byte set: x \xc4 

 /\x{100}*(\d+|"(?1)")/8
@ -551,7 +514,6 @@ Need char = 128
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 No first char
 No need char
@ -565,7 +527,6 @@ No need char
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 First char = 'a'
 No need char
@ -579,7 +540,6 @@ No need char
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 First char = 'a'
 Need char = 'b'
@ -593,7 +553,6 @@ Need char = 'b'
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 First char = 'a'
 Need char = 128
@ -607,7 +566,6 @@ Need char = 128
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 First char = 'a'
 Need char = 129
@ -621,7 +579,6 @@ Need char = 129
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 No first char
 Need char = 'A'
@ -640,7 +597,6 @@ Need char = 'A'
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 No first char
 No need char
@ -1122,7 +1078,6 @@ Need char = 191
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 No first char
 No need char
@ -1136,7 +1091,6 @@ No need char
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 No first char
 No need char
@ -1150,7 +1104,6 @@ No need char
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 No first char
 No need char
@ -1164,7 +1117,6 @@ No need char
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 No first char
 No need char
@ -1178,7 +1130,6 @@ No need char
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 No first char
 No need char
@ -1192,7 +1143,6 @@ No need char
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 No first char
 No need char
@ -1206,7 +1156,6 @@ No need char
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 First char = 196
 Need char = 128
@ -1220,7 +1169,6 @@ Need char = 128
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 First char = 196
 Need char = 'X'
@ -1234,7 +1182,6 @@ Need char = 'X'
        End
 ------------------------------------------------------------------
 Capturing subpattern count = 0
-Partial matching not supported
 Options: utf8
 First char = 'X'
 Need char = 128
@ -1652,4 +1599,477 @@ Forced newline sequence: CRLF
 First char = 'a'
 Need char = 'b'

-/ End of testinput5 /
+/Xa{2,4}b/8
+    X\P
+Partial match: X
+    Xa\P
+Partial match: Xa
+    Xaa\P 
+Partial match: Xaa
+    Xaaa\P
+Partial match: Xaaa
+    Xaaaa\P 
+Partial match: Xaaaa
+    
+/Xa{2,4}?b/8
+    X\P
+Partial match: X
+    Xa\P
+Partial match: Xa
+    Xaa\P 
+Partial match: Xaa
+    Xaaa\P
+Partial match: Xaaa
+    Xaaaa\P 
+Partial match: Xaaaa
+    
+/Xa{2,4}+b/8
+    X\P
+Partial match: X
+    Xa\P
+Partial match: Xa
+    Xaa\P 
+Partial match: Xaa
+    Xaaa\P
+Partial match: Xaaa
+    Xaaaa\P 
+Partial match: Xaaaa
+    
+/X\x{123}{2,4}b/8
+    X\P
+Partial match: X
+    X\x{123}\P
+Partial match: X\x{123}
+    X\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\P
+Partial match: X\x{123}\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}\x{123}\x{123}
+    
+/X\x{123}{2,4}?b/8
+    X\P
+Partial match: X
+    X\x{123}\P
+Partial match: X\x{123}
+    X\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\P
+Partial match: X\x{123}\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}\x{123}\x{123}
+    
+/X\x{123}{2,4}+b/8
+    X\P
+Partial match: X
+    X\x{123}\P
+Partial match: X\x{123}
+    X\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\P
+Partial match: X\x{123}\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}\x{123}\x{123}
+    
+/X\x{123}{2,4}b/8
+    Xx\P
+No match
+    X\x{123}x\P
+No match
+    X\x{123}\x{123}x\P 
+No match
+    X\x{123}\x{123}\x{123}x\P
+No match
+    X\x{123}\x{123}\x{123}\x{123}x\P 
+No match
+    
+/X\x{123}{2,4}?b/8
+    Xx\P
+No match
+    X\x{123}x\P
+No match
+    X\x{123}\x{123}x\P 
+No match
+    X\x{123}\x{123}\x{123}x\P
+No match
+    X\x{123}\x{123}\x{123}\x{123}x\P 
+No match
+    
+/X\x{123}{2,4}+b/8
+    Xx\P
+No match
+    X\x{123}x\P
+No match
+    X\x{123}\x{123}x\P 
+No match
+    X\x{123}\x{123}\x{123}x\P
+No match
+    X\x{123}\x{123}\x{123}\x{123}x\P 
+No match
+    
+/X\d{2,4}b/8
+    X\P
+Partial match: X
+    X3\P
+Partial match: X3
+    X33\P 
+Partial match: X33
+    X333\P
+Partial match: X333
+    X3333\P 
+Partial match: X3333
+    
+/X\d{2,4}?b/8
+    X\P
+Partial match: X
+    X3\P
+Partial match: X3
+    X33\P 
+Partial match: X33
+    X333\P
+Partial match: X333
+    X3333\P 
+Partial match: X3333
+    
+/X\d{2,4}+b/8
+    X\P
+Partial match: X
+    X3\P
+Partial match: X3
+    X33\P 
+Partial match: X33
+    X333\P
+Partial match: X333
+    X3333\P 
+Partial match: X3333
+
+/X\D{2,4}b/8
+    X\P
+Partial match: X
+    Xa\P
+Partial match: Xa
+    Xaa\P 
+Partial match: Xaa
+    Xaaa\P
+Partial match: Xaaa
+    Xaaaa\P 
+Partial match: Xaaaa
+    
+/X\D{2,4}?b/8
+    X\P
+Partial match: X
+    Xa\P
+Partial match: Xa
+    Xaa\P 
+Partial match: Xaa
+    Xaaa\P
+Partial match: Xaaa
+    Xaaaa\P 
+Partial match: Xaaaa
+    
+/X\D{2,4}+b/8
+    X\P
+Partial match: X
+    Xa\P
+Partial match: Xa
+    Xaa\P 
+Partial match: Xaa
+    Xaaa\P
+Partial match: Xaaa
+    Xaaaa\P 
+Partial match: Xaaaa
+
+/X\D{2,4}b/8
+    X\P
+Partial match: X
+    X\x{123}\P
+Partial match: X\x{123}
+    X\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\P
+Partial match: X\x{123}\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}\x{123}\x{123}
+    
+/X\D{2,4}?b/8
+    X\P
+Partial match: X
+    X\x{123}\P
+Partial match: X\x{123}
+    X\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\P
+Partial match: X\x{123}\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}\x{123}\x{123}
+    
+/X\D{2,4}+b/8
+    X\P
+Partial match: X
+    X\x{123}\P
+Partial match: X\x{123}
+    X\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\P
+Partial match: X\x{123}\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}\x{123}\x{123}
+
+/X[abc]{2,4}b/8
+    X\P
+Partial match: X
+    Xa\P
+Partial match: Xa
+    Xaa\P 
+Partial match: Xaa
+    Xaaa\P
+Partial match: Xaaa
+    Xaaaa\P 
+Partial match: Xaaaa
+    
+/X[abc]{2,4}?b/8
+    X\P
+Partial match: X
+    Xa\P
+Partial match: Xa
+    Xaa\P 
+Partial match: Xaa
+    Xaaa\P
+Partial match: Xaaa
+    Xaaaa\P 
+Partial match: Xaaaa
+    
+/X[abc]{2,4}+b/8
+    X\P
+Partial match: X
+    Xa\P
+Partial match: Xa
+    Xaa\P 
+Partial match: Xaa
+    Xaaa\P
+Partial match: Xaaa
+    Xaaaa\P 
+Partial match: Xaaaa
+
+/X[abc\x{123}]{2,4}b/8
+    X\P
+Partial match: X
+    X\x{123}\P
+Partial match: X\x{123}
+    X\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\P
+Partial match: X\x{123}\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}\x{123}\x{123}
+    
+/X[abc\x{123}]{2,4}?b/8
+    X\P
+Partial match: X
+    X\x{123}\P
+Partial match: X\x{123}
+    X\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\P
+Partial match: X\x{123}\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}\x{123}\x{123}
+    
+/X[abc\x{123}]{2,4}+b/8
+    X\P
+Partial match: X
+    X\x{123}\P
+Partial match: X\x{123}
+    X\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\P
+Partial match: X\x{123}\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}\x{123}\x{123}
+
+/X[^a]{2,4}b/8
+    X\P
+Partial match: X
+    Xz\P
+Partial match: Xz
+    Xzz\P 
+Partial match: Xzz
+    Xzzz\P
+Partial match: Xzzz
+    Xzzzz\P 
+Partial match: Xzzzz
+    
+/X[^a]{2,4}?b/8
+    X\P
+Partial match: X
+    Xz\P
+Partial match: Xz
+    Xzz\P 
+Partial match: Xzz
+    Xzzz\P
+Partial match: Xzzz
+    Xzzzz\P 
+Partial match: Xzzzz
+    
+/X[^a]{2,4}+b/8
+    X\P
+Partial match: X
+    Xz\P
+Partial match: Xz
+    Xzz\P 
+Partial match: Xzz
+    Xzzz\P
+Partial match: Xzzz
+    Xzzzz\P 
+Partial match: Xzzzz
+
+/X[^a]{2,4}b/8
+    X\P
+Partial match: X
+    X\x{123}\P
+Partial match: X\x{123}
+    X\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\P
+Partial match: X\x{123}\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}\x{123}\x{123}
+    
+/X[^a]{2,4}?b/8
+    X\P
+Partial match: X
+    X\x{123}\P
+Partial match: X\x{123}
+    X\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\P
+Partial match: X\x{123}\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}\x{123}\x{123}
+    
+/X[^a]{2,4}+b/8
+    X\P
+Partial match: X
+    X\x{123}\P
+Partial match: X\x{123}
+    X\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\P
+Partial match: X\x{123}\x{123}\x{123}
+    X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: X\x{123}\x{123}\x{123}\x{123}
+
+/(Y)X\1{2,4}b/8
+    YX\P
+Partial match: YX
+    YXY\P
+Partial match: YXY
+    YXYY\P 
+Partial match: YXYY
+    YXYYY\P
+Partial match: YXYYY
+    YXYYYY\P 
+Partial match: YXYYYY
+    
+/(Y)X\1{2,4}?b/8
+    YX\P
+Partial match: YX
+    YXY\P
+Partial match: YXY
+    YXYY\P 
+Partial match: YXYY
+    YXYYY\P
+Partial match: YXYYY
+    YXYYYY\P 
+Partial match: YXYYYY
+    
+/(Y)X\1{2,4}+b/8
+    YX\P
+Partial match: YX
+    YXY\P
+Partial match: YXY
+    YXYY\P 
+Partial match: YXYY
+    YXYYY\P
+Partial match: YXYYY
+    YXYYYY\P 
+Partial match: YXYYYY
+
+/(\x{123})X\1{2,4}b/8
+    \x{123}X\P
+Partial match: \x{123}X
+    \x{123}X\x{123}\P
+Partial match: \x{123}X\x{123}
+    \x{123}X\x{123}\x{123}\P 
+Partial match: \x{123}X\x{123}\x{123}
+    \x{123}X\x{123}\x{123}\x{123}\P
+Partial match: \x{123}X\x{123}\x{123}\x{123}
+    \x{123}X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123}
+    
+/(\x{123})X\1{2,4}?b/8
+    \x{123}X\P
+Partial match: \x{123}X
+    \x{123}X\x{123}\P
+Partial match: \x{123}X\x{123}
+    \x{123}X\x{123}\x{123}\P 
+Partial match: \x{123}X\x{123}\x{123}
+    \x{123}X\x{123}\x{123}\x{123}\P
+Partial match: \x{123}X\x{123}\x{123}\x{123}
+    \x{123}X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123}
+    
+/(\x{123})X\1{2,4}+b/8
+    \x{123}X\P
+Partial match: \x{123}X
+    \x{123}X\x{123}\P
+Partial match: \x{123}X\x{123}
+    \x{123}X\x{123}\x{123}\P 
+Partial match: \x{123}X\x{123}\x{123}
+    \x{123}X\x{123}\x{123}\x{123}\P
+Partial match: \x{123}X\x{123}\x{123}\x{123}
+    \x{123}X\x{123}\x{123}\x{123}\x{123}\P 
+Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123}
+
+/\bthe cat\b/8
+    the cat\P
+ 0: the cat
+    the cat\P\P
+Partial match: the cat
+
+/abcd*/8
+    xxxxabcd\P
+ 0: abcd
+    xxxxabcd\P\P
+Partial match: abcd
+
+/abcd*/i8
+    xxxxabcd\P
+ 0: abcd
+    xxxxabcd\P\P
+Partial match: abcd
+    XXXXABCD\P
+ 0: ABCD
+    XXXXABCD\P\P
+Partial match: ABCD
+
+/abc\d*/8
+    xxxxabc1\P
+ 0: abc1
+    xxxxabc1\P\P
+Partial match: abc1
+
+/(a)bc\1*/8
+    xxxxabca\P
+ 0: abca
+ 1: a
+    xxxxabca\P\P
+Partial match: abca
+
+/abc[de]*/8
+    xxxxabcde\P
+ 0: abcde
+    xxxxabcde\P\P
+Partial match: abcde
+
+/-- End of testinput5 --/
--- a/ext/pcre/pcrelib/testdata/testoutput6
+++ b/ext/pcre/pcrelib/testdata/testoutput6
@ -1,3 +1,7 @@
+/-- This set of tests is for Unicode property support. It is compatible with
+    Perl 5.10, but not 5.8 because it tests some extra properties that are
+    not in the earlier release. --/ 
+
 /^\pC\pL\pM\pN\pP\pS\pZ</8
    \x7f\x{c0}\x{30f}\x{660}\x{66c}\x{f01}\x{1680}<
 0: \x{7f}\x{c0}\x{30f}\x{660}\x{66c}\x{f01}\x{1680}<
@ -98,14 +102,6 @@ No match
    \x{09f} 
 No match
  
-/^\p{Cs}/8
-    \?\x{dfff}
- 0: \x{dfff}
-    ** Failers
-No match
-    \x{09f} 
-No match
-  
 /^\p{Ll}/8
    a
 0: a
@ -338,18 +334,6 @@ No match
    \x{f3b}
 No match
  
-/^\p{Sc}+/8
-    $\x{a2}\x{a3}\x{a4}\x{a5}\x{a6}
- 0: $\x{a2}\x{a3}\x{a4}\x{a5}
-    \x{9f2}
- 0: \x{9f2}
-    ** Failers
-No match
-    X
-No match
-    \x{2c2}
-No match
-  
 /^\p{Sk}/8
    \x{2c2}
 0: \x{2c2}
@ -402,26 +386,6 @@ No match
    \x{2028}
 No match
  
-/^\p{Zs}/8
-    \ \
- 0:  
-    \x{a0}
- 0: \x{a0}
-    \x{1680}
- 0: \x{1680}
-    \x{180e}
- 0: \x{180e}
-    \x{2000}
- 0: \x{2000}
-    \x{2001}     
- 0: \x{2001}
-    ** Failers
-No match
-    \x{2028}
-No match
-    \x{200d} 
-No match
-  
 /\p{Nd}+(..)/8
      \x{660}\x{661}\x{662}ABC
 0: \x{660}\x{661}\x{662}AB
@ -494,34 +458,6 @@ No match
      \x{660}\x{661}\x{662}ABC
 No match
  
-/\p{Lu}/8i
-    A
- 0: A
-    a\x{10a0}B 
- 0: \x{10a0}
-    ** Failers 
- 0: F
-    a
-No match
-    \x{1d00}  
-No match
-
-/\p{^Lu}/8i
-    1234
- 0: 1
-    ** Failers
- 0: *
-    ABC 
-No match
-
-/\P{Lu}/8i
-    1234
- 0: 1
-    ** Failers
- 0: *
-    ABC 
-No match
-
 /(?<=A\p{Nd})XYZ/8
    A2XYZ
 0: XYZ
@ -548,103 +484,6 @@ No match
    WXYZ 
 No match

-/[\p{L}]/DZ
------------------------------------------------------------------
-        Bra
-        [\p{L}]
-        Ket
-        End
------------------------------------------------------------------
-Capturing subpattern count = 0
-No options
-No first char
-No need char
-
-/[\p{^L}]/DZ
------------------------------------------------------------------
-        Bra
-        [\P{L}]
-        Ket
-        End
------------------------------------------------------------------
-Capturing subpattern count = 0
-No options
-No first char
-No need char
-
-/[\P{L}]/DZ
------------------------------------------------------------------
-        Bra
-        [\P{L}]
-        Ket
-        End
------------------------------------------------------------------
-Capturing subpattern count = 0
-No options
-No first char
-No need char
-
-/[\P{^L}]/DZ
------------------------------------------------------------------
-        Bra
-        [\p{L}]
-        Ket
-        End
------------------------------------------------------------------
-Capturing subpattern count = 0
-No options
-No first char
-No need char
-
-/[abc\p{L}\x{0660}]/8DZ
------------------------------------------------------------------
-        Bra
-        [a-c\p{L}\x{660}]
-        Ket
-        End
------------------------------------------------------------------
-Capturing subpattern count = 0
-Options: utf8
-No first char
-No need char
-
-/[\p{Nd}]/8DZ
------------------------------------------------------------------
-        Bra
-        [\p{Nd}]
-        Ket
-        End
------------------------------------------------------------------
-Capturing subpattern count = 0
-Options: utf8
-No first char
-No need char
-    1234
- 0: 1
-
-/[\p{Nd}+-]+/8DZ
------------------------------------------------------------------
-        Bra
-        [+\-\p{Nd}]+
-        Ket
-        End
------------------------------------------------------------------
-Capturing subpattern count = 0
-Partial matching not supported
-Options: utf8
-No first char
-No need char
-    1234
- 0: 1234
-    12-34
- 0: 12-34
-    12+\x{661}-34  
- 0: 12+\x{661}-34
-    ** Failers
-No match
-    abcd  
-No match
-
 /[\P{Nd}]+/8
    abcd
 0: abcd
@ -725,28 +564,6 @@ No match
    ABC   
 No match

-/\p{Ll}/8i 
-    a
- 0: a
-    Az
- 0: z
-    ** Failers
- 0: a
-    ABC   
-No match
-
-/^\x{c0}$/8i
-    \x{c0}
- 0: \x{c0}
-    \x{e0} 
- 0: \x{e0}
-
-/^\x{e0}$/8i
-    \x{c0}
- 0: \x{c0}
-    \x{e0} 
- 0: \x{e0}
-
 /A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8
    A\x{391}\x{10427}\x{ff3a}\x{1fb0}
 0: A\x{391}\x{10427}\x{ff3a}\x{1fb0}
@ -777,54 +594,6 @@ No match
    A\x{391}\x{10427}\x{ff3a}\x{1fb8}
 0: A\x{391}\x{10427}\x{ff3a}\x{1fb8}

-/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iDZ
------------------------------------------------------------------
-        Bra
-     NC A\x{391}\x{10427}\x{ff3a}\x{1fb0}
-        Ket
-        End
------------------------------------------------------------------
-Capturing subpattern count = 0
-Options: caseless utf8
-First char = 'A' (caseless)
-No need char
-
-/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8DZ
------------------------------------------------------------------
-        Bra
-        A\x{391}\x{10427}\x{ff3a}\x{1fb0}
-        Ket
-        End
------------------------------------------------------------------
-Capturing subpattern count = 0
-Options: utf8
-First char = 'A'
-Need char = 176
-
-/AB\x{1fb0}/8DZ
------------------------------------------------------------------
-        Bra
-        AB\x{1fb0}
-        Ket
-        End
------------------------------------------------------------------
-Capturing subpattern count = 0
-Options: utf8
-First char = 'A'
-Need char = 176
-
-/AB\x{1fb0}/8DZi
------------------------------------------------------------------
-        Bra
-     NC AB\x{1fb0}
-        Ket
-        End
------------------------------------------------------------------
-Capturing subpattern count = 0
-Options: caseless utf8
-First char = 'A' (caseless)
-Need char = 'B' (caseless)
-
 /\x{391}+/8i
    \x{391}\x{3b1}\x{3b1}\x{3b1}\x{391}
 0: \x{391}\x{3b1}\x{3b1}\x{3b1}\x{391}
@ -849,86 +618,6 @@ Need char = 'B' (caseless)
    \x{ff5a}   
 0: \x{ff5a}
    
-/[\x{c0}\x{391}]/8i
-    \x{c0}
- 0: \x{c0}
-    \x{e0} 
- 0: \x{e0}
-
-/[\x{105}-\x{109}]/8iDZ
------------------------------------------------------------------
-        Bra
-        [\x{104}-\x{109}]
-        Ket
-        End
------------------------------------------------------------------
-Capturing subpattern count = 0
-Options: caseless utf8
-No first char
-No need char
-    \x{104}
- 0: \x{104}
-    \x{105}
- 0: \x{105}
-    \x{109}  
- 0: \x{109}
-    ** Failers
-No match
-    \x{100}
-No match
-    \x{10a} 
-No match
-    
-/[z-\x{100}]/8iDZ 
------------------------------------------------------------------
-        Bra
-        [Z\x{39c}\x{178}z-\x{101}]
-        Ket
-        End
------------------------------------------------------------------
-Capturing subpattern count = 0
-Options: caseless utf8
-No first char
-No need char
-    Z
- 0: Z
-    z
- 0: z
-    \x{39c}
- 0: \x{39c}
-    \x{178}
- 0: \x{178}
-    |
- 0: |
-    \x{80}
- 0: \x{80}
-    \x{ff}
- 0: \x{ff}
-    \x{100}
- 0: \x{100}
-    \x{101} 
- 0: \x{101}
-    ** Failers
-No match
-    \x{102}
-No match
-    Y
-No match
-    y           
-No match
-
-/[z-\x{100}]/8DZi
------------------------------------------------------------------
-        Bra
-        [Z\x{39c}\x{178}z-\x{101}]
-        Ket
-        End
------------------------------------------------------------------
-Capturing subpattern count = 0
-Options: caseless utf8
-No first char
-No need char
-
 /^\X/8
    A
 0: A
@ -1408,42 +1097,10 @@ No match
 1: L=abc
 2: abc

-/The next two should be Perl-compatible, but it fails to match \x{e0}. PCRE
-will match it only with UCP support, because without that it has no notion
-of case for anything other than the ASCII letters. / 
-
-/((?i)[\x{c0}])/8
-    \x{c0}
- 0: \x{c0}
- 1: \x{c0}
-    \x{e0} 
- 0: \x{e0}
- 1: \x{e0}
-
-/(?i:[\x{c0}])/8
-    \x{c0}
- 0: \x{c0}
-    \x{e0} 
- 0: \x{e0}
-    
 /^\p{Balinese}\p{Cuneiform}\p{Nko}\p{Phags_Pa}\p{Phoenician}/8
    \x{1b00}\x{12000}\x{7c0}\x{a840}\x{10900}
 0: \x{1b00}\x{12000}\x{7c0}\x{a840}\x{10900}

-/The next two are special cases where the lengths of the different cases of the 
-same character differ. The first went wrong with heap frame storage; the 2nd
-was broken in all cases./
-
-/^\x{023a}+?(\x{0130}+)/8i
-  \x{023a}\x{2c65}\x{0130}
- 0: \x{23a}\x{2c65}\x{130}
- 1: \x{130}
-  
-/^\x{023a}+([^X])/8i
-  \x{023a}\x{2c65}X
- 0: \x{23a}\x{2c65}
- 1: \x{2c65}
-
 /Check property support in non-UTF-8 mode/
 
 /\p{L}{4}/
@@ -1468,60 +1125,6 @@ No match
    A\x80
 0: A\x80

-/(?:[\PPa*]*){8,}/
-
-/[\P{Any}]/BZ
------------------------------------------------------------------
-        Bra
-        [\P{Any}]
-        Ket
-        End
------------------------------------------------------------------
-
-/[\P{Any}\E]/BZ
------------------------------------------------------------------
-        Bra
-        [\P{Any}]
-        Ket
-        End
------------------------------------------------------------------
-
-/(\P{Yi}+\277)/
-
-/(\P{Yi}+\277)?/
-
-/(?<=\P{Yi}{3}A)X/
-
-/\p{Yi}+(\P{Yi}+)(?1)/
-
-/(\P{Yi}{2}\277)?/
-
-/[\P{Yi}A]/
-
-/[\P{Yi}\P{Yi}\P{Yi}A]/
-
-/[^\P{Yi}A]/
-
-/[^\P{Yi}\P{Yi}\P{Yi}A]/
-
-/(\P{Yi}*\277)*/
-
-/(\P{Yi}*?\277)*/
-
-/(\p{Yi}*+\277)*/
-
-/(\P{Yi}?\277)*/
-
-/(\P{Yi}??\277)*/
-
-/(\p{Yi}?+\277)*/
-
-/(\P{Yi}{0,3}\277)*/
-
-/(\P{Yi}{0,3}?\277)*/
-
-/(\p{Yi}{0,3}+\277)*/
-
 /^[\p{Arabic}]/8
    \x{60e} 
 0: \x{60e}
@@ -1634,59 +1237,6 @@ No match
    \x{104aa}           
 No match

-/\p{Zl}{2,3}+/8BZ
------------------------------------------------------------------
-        Bra
-        prop Zl {2}
-        prop Zl ?+
-        Ket
-        End
------------------------------------------------------------------
-    \xe2\x80\xa8\xe2\x80\xa8
- 0: \x{2028}\x{2028}
-    \x{2028}\x{2028}\x{2028}
- 0: \x{2028}\x{2028}\x{2028}
-    
-/\p{Zl}/8BZ
------------------------------------------------------------------
-        Bra
-        prop Zl
-        Ket
-        End
------------------------------------------------------------------
-
-/\p{Lu}{3}+/8BZ
------------------------------------------------------------------
-        Bra
-        prop Lu {3}
-        Ket
-        End
------------------------------------------------------------------
-
-/\pL{2}+/8BZ
------------------------------------------------------------------
-        Bra
-        prop L {2}
-        Ket
-        End
------------------------------------------------------------------
-
-/\p{Cc}{2}+/8BZ
------------------------------------------------------------------
-        Bra
-        prop Cc {2}
-        Ket
-        End
------------------------------------------------------------------
-
-/\x{c0}+\x{116}+/8i
-    \x{c0}\x{e0}\x{116}\x{117}
- 0: \x{c0}\x{e0}\x{116}\x{117}
-
-/[\x{c0}\x{116}]+/8i
-    \x{c0}\x{e0}\x{116}\x{117}
- 0: \x{c0}\x{e0}\x{116}\x{117}
-
 /\p{Carian}\p{Cham}\p{Kayah_Li}\p{Lepcha}\p{Lycian}\p{Lydian}\p{Ol_Chiki}\p{Rejang}\p{Saurashtra}\p{Sundanese}\p{Vai}/8
    \x{102A4}\x{AA52}\x{A91D}\x{1C46}\x{10283}\x{1092E}\x{1C6B}\x{A93B}\x{A8BF}\x{1BA0}\x{A50A}====
 0: \x{102a4}\x{aa52}\x{a91d}\x{1c46}\x{10283}\x{1092e}\x{1c6b}\x{a93b}\x{a8bf}\x{1ba0}\x{a50a}
@@ -1719,20 +1269,6 @@ No match
 0: aA
 1: a

-/(\x{de})\1/8i
-    \x{de}\x{de}
- 0: \x{de}\x{de}
- 1: \x{de}
-    \x{de}\x{fe}
- 0: \x{de}\x{fe}
- 1: \x{de}
-    \x{fe}\x{fe}
- 0: \x{fe}\x{fe}
- 1: \x{fe}
-    \x{fe}\x{de}
- 0: \x{fe}\x{de}
- 1: \x{fe}
-
 /(\x{10a})\1/8i
    \x{10a}\x{10a}
 0: \x{10a}\x{10a}
@@ -1757,4 +1293,4 @@ No match
    \x41\x20\x50\xC2\x54\xC9\x20\x54\x4F\x44\x41\x59
 0: A P\xc2T\xc9 TODAY

-/ End of testinput6 /
+/-- End of testinput6 --/
--- a/ext/pcre/pcrelib/testdata/testoutput7
+++ b/ext/pcre/pcrelib/testdata/testoutput7
@@ -1,3 +1,6 @@
+/-- This set of tests check the DFA matching functionality of pcre_dfa_exec().
+    The -dfa flag must be used with pcretest when running it. --/
+     
 /abc/
    abc
 0: abc
@@ -981,7 +984,7 @@ Partial match: abc
   xyzfo\P 
 No match
   foob\P\>2 
-Partial match: b
+Partial match: foob
   foobar...\R\P\>4 
 0: ar
   xyzfo\P
@@ -7168,7 +7171,6 @@ No match
    
 /a\R{2,4}b/I<bsr_anycrlf>
 Capturing subpattern count = 0
-Partial matching not supported
 Options: bsr_anycrlf
 First char = 'a'
 Need char = 'b'
@@ -7187,7 +7189,6 @@ No match

 /a\R{2,4}b/I<bsr_unicode>
 Capturing subpattern count = 0
-Partial matching not supported
 Options: bsr_unicode
 First char = 'a'
 Need char = 'b'
@@ -7370,4 +7371,217 @@ No match
    \C-"ab"
 0: "ab"

-/ End of testinput7 /
+/\d+X|9+Y/
+    ++++123999\P
+Partial match: 123999
+    ++++123999Y\P
+ 0: 999Y
+
+/Z(*F)/
+    Z\P
+No match
+    ZA\P 
+No match
+    
+/Z(?!)/
+    Z\P 
+No match
+    ZA\P 
+No match
+
+/dog(sbody)?/
+    dogs\P
+ 0: dog
+    dogs\P\P 
+Partial match: dogs
+    
+/dog(sbody)??/
+    dogs\P
+ 0: dog
+    dogs\P\P 
+Partial match: dogs
+
+/dog|dogsbody/
+    dogs\P
+ 0: dog
+    dogs\P\P 
+Partial match: dogs
+ 
+/dogsbody|dog/
+    dogs\P
+ 0: dog
+    dogs\P\P 
+Partial match: dogs
+
+/Z(*F)Q|ZXY/
+    Z\P
+Partial match: Z
+    ZA\P 
+No match
+    X\P 
+No match
+
+/\bthe cat\b/
+    the cat\P
+ 0: the cat
+    the cat\P\P
+Partial match: the cat
+
+/dog(sbody)?/
+    dogs\D\P
+ 0: dog
+    body\D\R
+ 0: body
+
+/dog(sbody)?/
+    dogs\D\P\P
+Partial match: dogs
+    body\D\R
+ 0: body
+
+/abc/
+   abc\P
+ 0: abc
+   abc\P\P
+ 0: abc
+
+/abc\K123/
+    xyzabc123pqr
+Error -16
+    
+/(?<=abc)123/
+    xyzabc123pqr 
+ 0: 123
+    xyzabc12\P
+Partial match: abc12
+    xyzabc12\P\P
+Partial match: abc12
+
+/\babc\b/
+    +++abc+++
+ 0: abc
+    +++ab\P
+Partial match: +ab
+    +++ab\P\P  
+Partial match: +ab
+
+/(?=C)/g+
+    ABCDECBA
+ 0: 
+ 0+ CDECBA
+ 0: 
+ 0+ CBA
+
+/(abc|def|xyz)/I
+Capturing subpattern count = 1
+No options
+No first char
+No need char
+    terhjk;abcdaadsfe
+ 0: abc
+    the quick xyz brown fox 
+ 0: xyz
+    \Yterhjk;abcdaadsfe
+ 0: abc
+    \Ythe quick xyz brown fox 
+ 0: xyz
+    ** Failers
+No match
+    thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
+No match
+    \Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
+No match
+
+/(abc|def|xyz)/SI
+Capturing subpattern count = 1
+No options
+No first char
+No need char
+Subject length lower bound = 3
+Starting byte set: a d x 
+    terhjk;abcdaadsfe
+ 0: abc
+    the quick xyz brown fox 
+ 0: xyz
+    \Yterhjk;abcdaadsfe
+ 0: abc
+    \Ythe quick xyz brown fox 
+ 0: xyz
+    ** Failers
+No match
+    thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
+No match
+    \Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd
+No match
+
+/abcd*/+
+    xxxxabcd\P
+ 0: abcd
+ 0+ 
+ 1: abc
+    xxxxabcd\P\P
+Partial match: abcd
+    dddxxx\R 
+ 0: ddd
+ 0+ xxx
+ 1: dd
+ 2: d
+ 3: 
+    xxxxabcd\P\P
+Partial match: abcd
+    xxx\R 
+ 0: 
+ 0+ xxx
+
+/abcd*/i
+    xxxxabcd\P
+ 0: abcd
+ 1: abc
+    xxxxabcd\P\P
+Partial match: abcd
+    XXXXABCD\P
+ 0: ABCD
+ 1: ABC
+    XXXXABCD\P\P
+Partial match: ABCD
+
+/abc\d*/
+    xxxxabc1\P
+ 0: abc1
+ 1: abc
+    xxxxabc1\P\P
+Partial match: abc1
+
+/abc[de]*/
+    xxxxabcde\P
+ 0: abcde
+ 1: abcd
+ 2: abc
+    xxxxabcde\P\P
+Partial match: abcde
+
+/(?:(?1)|B)(A(*F)|C)/
+    ABCD
+ 0: BC
+    CCD
+ 0: CC
+    ** Failers
+No match
+    CAD   
+No match
+
+/^(?:(?1)|B)(A(*F)|C)/
+    CCD
+ 0: CC
+    BCD 
+ 0: BC
+    ** Failers
+No match
+    ABCD
+No match
+    CAD
+No match
+    BAD    
+No match
+
+/-- End of testinput7 --/
--- a/ext/pcre/pcrelib/testdata/testoutput8
+++ b/ext/pcre/pcrelib/testdata/testoutput8
@@ -1,8 +1,6 @@
-/-- Do not use the \x{} construct except with patterns that have the --/
-/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
-No match
-/-- that option is set. However, the latest Perls recognize them always. --/
-No match
+/-- This set of tests checks UTF-8 support with the DFA matching functionality
+    of pcre_dfa_exec(). The -dfa flag must be used with pcretest when running 
+    it. --/

 /\x{100}ab/8
  \x{100}ab
@@ -1288,4 +1286,38 @@ No match
    A\x{1ec5}ABCXYZ
 0: X

-/ End of testinput 8 / 
+/abcd*/8
+    xxxxabcd\P
+ 0: abcd
+ 1: abc
+    xxxxabcd\P\P
+Partial match: abcd
+
+/abcd*/i8
+    xxxxabcd\P
+ 0: abcd
+ 1: abc
+    xxxxabcd\P\P
+Partial match: abcd
+    XXXXABCD\P
+ 0: ABCD
+ 1: ABC
+    XXXXABCD\P\P
+Partial match: ABCD
+
+/abc\d*/8
+    xxxxabc1\P
+ 0: abc1
+ 1: abc
+    xxxxabc1\P\P
+Partial match: abc1
+
+/abc[de]*/8
+    xxxxabcde\P
+ 0: abcde
+ 1: abcd
+ 2: abc
+    xxxxabcde\P\P
+Partial match: abcde
+
+/-- End of testinput8 --/ 
--- a/ext/pcre/pcrelib/testdata/testoutput9
+++ b/ext/pcre/pcrelib/testdata/testoutput9
@@ -1,3 +1,7 @@
+/-- This set of tests check Unicode property support with the DFA matching 
+    functionality of pcre_dfa_exec(). The -dfa flag must be used with pcretest
+    when running it. --/
+
 /\pL\P{Nd}/8
    AB
 0: AB
@@ -1670,4 +1674,4 @@ No match
    \x{1d79}\x{a77d} 
 No match

-/ End / 
+/-- End of testinput9 --/ 
--- a/ext/pcre/upgrade-pcre.php
+++ b/ext/pcre/upgrade-pcre.php
@@ -84,7 +84,12 @@ recurse('pcrelib');

 $dirorig = scandir('pcrelib/testdata');
 $k = array_search('CVS', $dirorig);
-unset($dirorig[$k]);
+if ($k !== false)
+	unset($dirorig[$k]);
+
+$k = array_search('.svn', $dirorig);
+if ($k !== false)
+	unset($dirorig[$k]);

 $dirnew = scandir("$newpcre/testdata");
 $diff   = array_diff($dirorig, $dirnew);