Upgraded bundled PCRE to version 8.10

2025-01-20 18:53:37 +08:00 · 2010-07-02 17:17:16 +00:00 · ef22824315
commit ef22824315
parent 8584b90199
27 changed files with 4213 additions and 1252 deletions
--- a/4
+++ b/4
@ -1,8 +1,8 @@
-PHP                                                                        NEWS
+PHP                                                                        NEWS
 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
 ?? ??? 201?, PHP 5.3.99
 - Upgraded bundled sqlite to version 3.6.23.1. (Ilia)
-- Upgraded bundled PCRE to version 8.02. (Ilia)
+- Upgraded bundled PCRE to version 8.10. (Ilia)

 - Added caches to eliminate repeatable run-time bindings of functions, classes,
  constants, methods and properties (Dmitry)
--- a/ext/pcre/pcrelib/ChangeLog
+++ b/ext/pcre/pcrelib/ChangeLog
@ -1,6 +1,101 @@
 ChangeLog for PCRE
 ------------------

+Version 8.10 25-Jun-2010
+------------------------
+
+1.  Added support for (*MARK:ARG) and for ARG additions to PRUNE, SKIP, and
+    THEN.
+
+2.  (*ACCEPT) was not working when inside an atomic group.
+
+3.  Inside a character class, \B is treated as a literal by default, but
+    faulted if PCRE_EXTRA is set. This mimics Perl's behaviour (the -w option
+    causes the error). The code is unchanged, but I tidied the documentation.
+
+4.  Inside a character class, PCRE always treated \R and \X as literals,
+    whereas Perl faults them if its -w option is set. I have changed PCRE so
+    that it faults them when PCRE_EXTRA is set.
+
+5.  Added support for \N, which always matches any character other than
+    newline. (It is the same as "." when PCRE_DOTALL is not set.)
+
+6.  When compiling pcregrep with newer versions of gcc which may have
+    FORTIFY_SOURCE set, several warnings "ignoring return value of 'fwrite',
+    declared with attribute warn_unused_result" were given. Just casting the
+    result to (void) does not stop the warnings; a more elaborate fudge is
+    needed. I've used a macro to implement this.
+
+7.  Minor change to pcretest.c to avoid a compiler warning.
+
+8.  Added four artificial Unicode properties to help with an option to make
+    \s etc use properties (see next item). The new properties are: Xan
+    (alphanumeric), Xsp (Perl space), Xps (POSIX space), and Xwd (word).
+
+9.  Added PCRE_UCP to make \b, \d, \s, \w, and certain POSIX character classes
+    use Unicode properties. (*UCP) at the start of a pattern can be used to set
+    this option. Modified pcretest to add /W to test this facility. Added
+    REG_UCP to make it available via the POSIX interface.
+
+10. Added --line-buffered to pcregrep.
+
+11. In UTF-8 mode, if a pattern that was compiled with PCRE_CASELESS was
+    studied, and the match started with a letter with a code point greater than
+    127 whose first byte was different to the first byte of the other case of
+    the letter, the other case of this starting letter was not recognized
+    (#976).
+
+12. If a pattern that was studied started with a repeated Unicode property
+    test, for example, \p{Nd}+, there was the theoretical possibility of
+    setting up an incorrect bitmap of starting bytes, but fortunately it could
+    not have actually happened in practice until change 8 above was made (it
+    added property types that matched character-matching opcodes).
+
+13. pcre_study() now recognizes \h, \v, and \R when constructing a bit map of
+    possible starting bytes for non-anchored patterns.
+
+14. Extended the "auto-possessify" feature of pcre_compile(). It now recognizes
+    \R, and also a number of cases that involve Unicode properties, both
+    explicit and implicit when PCRE_UCP is set.
+
+15. If a repeated Unicode property match (e.g. \p{Lu}*) was used with non-UTF-8
+    input, it could crash or give wrong results if characters with values
+    greater than 0xc0 were present in the subject string. (Detail: it assumed
+    UTF-8 input when processing these items.)
+
+16. Added a lot of (int) casts to avoid compiler warnings in systems where
+    size_t is 64-bit (#991).
+
+17. Added a check for running out of memory when PCRE is compiled with
+    --disable-stack-for-recursion (#990).
+
+18. If the last data line in a file for pcretest does not have a newline on
+    the end, a newline was missing in the output.
+
+19. The default pcre_chartables.c file recognizes only ASCII characters (values
+    less than 128) in its various bitmaps. However, there is a facility for
+    generating tables according to the current locale when PCRE is compiled. It
+    turns out that in some environments, 0x85 and 0xa0, which are Unicode space
+    characters, are recognized by isspace() and therefore were getting set in
+    these tables, and indeed these tables seem to approximate to ISO 8859. This
+    caused a problem in UTF-8 mode when pcre_study() was used to create a list
+    of bytes that can start a match. For \s, it was including 0x85 and 0xa0,
+    which of course cannot start UTF-8 characters. I have changed the code so
+    that only real ASCII characters (less than 128) and the correct starting
+    bytes for UTF-8 encodings are set for characters greater than 127 when in
+    UTF-8 mode. (When PCRE_UCP is set - see 9 above - the code is different
+    altogether.)
+
+20. Added the /T option to pcretest so as to be able to run tests with non-
+    standard character tables, thus making it possible to include the tests
+    used for 19 above in the standard set of tests.
+
+21. A pattern such as (?&t)(?#()(?(DEFINE)(?<t>a)) which has a forward
+    reference to a subpattern the other side of a comment that contains an
+    opening parenthesis caused either an internal compiling error, or a
+    reference to the wrong subpattern.
+
+
 Version 8.02 19-Mar-2010
 ------------------------
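
Note on item 19 of the 8.10 ChangeLog above: 0x85 and 0xa0 are continuation bytes in UTF-8, so neither can begin a character, and the code points U+0085 and U+00A0 both encode with the lead byte 0xC2. The following is a minimal standalone sketch of that encoding step, not PCRE code. The names utf8_lead_byte and is_utf8_continuation are hypothetical; PCRE itself uses its internal _pcre_ord2utf8() for the conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a PCRE function: return the first byte of the
   UTF-8 encoding of code point cp. */
static unsigned char utf8_lead_byte(uint32_t cp)
{
assert(cp <= 0x10FFFF);
if (cp < 0x80) return (unsigned char)cp;                      /* 1 byte  */
if (cp < 0x800) return (unsigned char)(0xC0 | (cp >> 6));     /* 2 bytes */
if (cp < 0x10000) return (unsigned char)(0xE0 | (cp >> 12));  /* 3 bytes */
return (unsigned char)(0xF0 | (cp >> 18));                    /* 4 bytes */
}

/* A byte of the form 10xxxxxx is a continuation byte and can never start a
   UTF-8 character. */
static int is_utf8_continuation(unsigned char b)
{
return (b & 0xC0) == 0x80;
}
```

With these helpers, utf8_lead_byte(0x85) and utf8_lead_byte(0xA0) both give 0xC2, while is_utf8_continuation() is true for the raw bytes 0x85 and 0xa0. That is why a starting-byte map built for UTF-8 mode should carry 0xC2 rather than the raw byte values.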

--- a/ext/pcre/pcrelib/NEWS
+++ b/ext/pcre/pcrelib/NEWS
@ -1,6 +1,17 @@
 News about PCRE releases
 ------------------------

+Release 8.10 25-Jun-2010
+------------------------
+
+There are two major additions: support for (*MARK) and friends, and the option
+PCRE_UCP, which changes the behaviour of \b, \d, \s, and \w (and their
+opposites) so that they make use of Unicode properties. There are also a number
+of lesser new features, and several bugs have been fixed. A new option,
+--line-buffered, has been added to pcregrep, for use when it is connected to
+pipes.
+
+
 Release 8.02 19-Mar-2010
 ------------------------

--- a/ext/pcre/pcrelib/NON-UNIX-USE
+++ b/ext/pcre/pcrelib/NON-UNIX-USE
@ -188,9 +188,9 @@ significantly slower when this is done. There is more about stack usage in the
 LINKING PROGRAMS IN WINDOWS ENVIRONMENTS

 If you want to statically link a program against a PCRE library in the form of
-a non-dll .a file, you must define PCRE_STATIC before including pcre.h,
-otherwise the pcre_malloc() and pcre_free() exported functions will be declared
-__declspec(dllimport), with unwanted results.
+a non-dll .a file, you must define PCRE_STATIC before including pcre.h or
+pcrecpp.h, otherwise the pcre_malloc() and pcre_free() exported functions will
+be declared __declspec(dllimport), with unwanted results.


 CALLING CONVENTIONS IN WINDOWS ENVIRONMENTS
@ -497,5 +497,5 @@ build.log file in the root of the package also.


 =========================
-Last Updated: 19 January 2010
+Last Updated: 26 May 2010
 ****
--- a/ext/pcre/pcrelib/config.h
+++ b/ext/pcre/pcrelib/config.h
@ -271,13 +271,16 @@ them both to 0; an emulation function will be used. */
 #define PACKAGE_NAME "PCRE"

 /* Define to the full name and version of this package. */
-#define PACKAGE_STRING "PCRE 8.02"
+#define PACKAGE_STRING "PCRE 8.10"

 /* Define to the one symbol short name of this package. */
 #define PACKAGE_TARNAME "pcre"

+/* Define to the home page for this package. */
+#define PACKAGE_URL ""
+
 /* Define to the version of this package. */
-#define PACKAGE_VERSION "8.02"
+#define PACKAGE_VERSION "8.10"


 /* If you are compiling for a system other than a Unix-like system or
@ -333,7 +336,7 @@ them both to 0; an emulation function will be used. */

 /* Version number of package */
 #ifndef VERSION
-#define VERSION "8.02"
+#define VERSION "8.10"
 #endif

 /* Define to empty if `const' does not conform to ANSI C. */
--- a/ext/pcre/pcrelib/doc/pcre.txt
+++ b/ext/pcre/pcrelib/doc/pcre.txt
--- a/ext/pcre/pcrelib/pcre.h
+++ b/ext/pcre/pcrelib/pcre.h
@ -5,7 +5,7 @@
 /* This is the public header file for the PCRE library, to be #included by
 applications that call the PCRE functions.

-           Copyright (c) 1997-2009 University of Cambridge
+           Copyright (c) 1997-2010 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
 /* The current PCRE version information. */

 #define PCRE_MAJOR          8
-#define PCRE_MINOR          02
+#define PCRE_MINOR          10
 #define PCRE_PRERELEASE     
-#define PCRE_DATE           2010-03-19
+#define PCRE_DATE           2010-06-25

 /* When an application links to a PCRE DLL in Windows, the symbols that are
 imported have to be identified as such. When building PCRE, the appropriate
@ -131,6 +131,7 @@ both, so we keep them all distinct. */
 #define PCRE_NO_START_OPTIMISE  0x04000000
 #define PCRE_PARTIAL_HARD       0x08000000
 #define PCRE_NOTEMPTY_ATSTART   0x10000000
+#define PCRE_UCP                0x20000000

 /* Exec-time and get/set-time error codes */

@ -200,6 +201,7 @@ these bits, just add new ones on the end, in order to remain compatible. */
 #define PCRE_EXTRA_CALLOUT_DATA           0x0004
 #define PCRE_EXTRA_TABLES                 0x0008
 #define PCRE_EXTRA_MATCH_LIMIT_RECURSION  0x0010
+#define PCRE_EXTRA_MARK                   0x0020

 /* Types */

@ -225,6 +227,7 @@ typedef struct pcre_extra {
  void *callout_data;             /* Data passed back in callouts */
  const unsigned char *tables;    /* Pointer to character tables */
  unsigned long int match_limit_recursion; /* Max recursive calls to match() */
+  unsigned char **mark;           /* For passing back a mark pointer */
 } pcre_extra;

 /* The structure for passing out data via the pcre_callout_function. We use a
--- a/ext/pcre/pcrelib/pcre_chartables.c
+++ b/ext/pcre/pcrelib/pcre_chartables.c
@ -14,7 +14,7 @@ example ISO-8859-1. When dftables is run, it creates these tables in the
 current locale. If PCRE is configured with --enable-rebuild-chartables, this
 happens automatically.

-The following #includes are present because without the gcc 4.x may remove the
+The following #includes are present because without them gcc 4.x may remove the
 array definition from the final binary if PCRE is built into a static library
 and dead code stripping is activated. This leads to link errors. Pulling in the
 header ensures that the array gets flagged as "someone outside this compilation
--- a/ext/pcre/pcrelib/pcre_compile.c
+++ b/ext/pcre/pcrelib/pcre_compile.c
--- a/ext/pcre/pcrelib/pcre_exec.c
+++ b/ext/pcre/pcrelib/pcre_exec.c
--- a/ext/pcre/pcrelib/pcre_internal.h
+++ b/ext/pcre/pcrelib/pcre_internal.h
@ -192,9 +192,7 @@ stdint.h is available, include it; it may define INT64_MAX. Systems that do not
 have stdint.h (e.g. Solaris) may have inttypes.h. The macro int64_t may be set
 by "configure". */

-#ifdef PHP_WIN32
-#include "win32/php_stdint.h"
-#elif HAVE_STDINT_H
+#if HAVE_STDINT_H
 #include <stdint.h>
 #elif HAVE_INTTYPES_H
 #include <inttypes.h>
@ -477,7 +475,8 @@ know we are in UTF-8 mode. */
      } \
    }

-/* Get the next character, testing for UTF-8 mode, and advancing the pointer */
+/* Get the next character, testing for UTF-8 mode, and advancing the pointer.
+This is called when we don't know if we are in UTF-8 mode. */

 #define GETCHARINCTEST(c, eptr) \
  c = *eptr++; \
@ -514,7 +513,7 @@ if there are extra bytes. This is called when we know we are in UTF-8 mode. */

 /* Get the next UTF-8 character, testing for UTF-8 mode, not advancing the
 pointer, incrementing length if there are extra bytes. This is called when we
-know we are in UTF-8 mode. */
+do not know if we are in UTF-8 mode. */

 #define GETCHARLENTEST(c, eptr, len) \
  c = *eptr; \
@ -582,7 +581,7 @@ time, run time, or study time, respectively. */
   PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8| \
   PCRE_NO_AUTO_CAPTURE|PCRE_NO_UTF8_CHECK|PCRE_AUTO_CALLOUT|PCRE_FIRSTLINE| \
   PCRE_DUPNAMES|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE| \
-   PCRE_JAVASCRIPT_COMPAT)
+   PCRE_JAVASCRIPT_COMPAT|PCRE_UCP)

 #define PUBLIC_EXEC_OPTIONS \
  (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NOTEMPTY_ATSTART| \
@ -877,6 +876,7 @@ so that PCRE works on both ASCII and EBCDIC platforms, in non-UTF-mode only. */
 #define STRING_COMMIT0              "COMMIT\0"
 #define STRING_F0                   "F\0"
 #define STRING_FAIL0                "FAIL\0"
+#define STRING_MARK0                "MARK\0"
 #define STRING_PRUNE0               "PRUNE\0"
 #define STRING_SKIP0                "SKIP\0"
 #define STRING_THEN                 "THEN"
@ -906,6 +906,7 @@ so that PCRE works on both ASCII and EBCDIC platforms, in non-UTF-mode only. */
 #define STRING_BSR_ANYCRLF_RIGHTPAR "BSR_ANYCRLF)"
 #define STRING_BSR_UNICODE_RIGHTPAR "BSR_UNICODE)"
 #define STRING_UTF8_RIGHTPAR        "UTF8)"
+#define STRING_UCP_RIGHTPAR         "UCP)"

 #else  /* SUPPORT_UTF8 */

@ -1129,6 +1130,7 @@ only. */
 #define STRING_COMMIT0              STR_C STR_O STR_M STR_M STR_I STR_T "\0"
 #define STRING_F0                   STR_F "\0"
 #define STRING_FAIL0                STR_F STR_A STR_I STR_L "\0"
+#define STRING_MARK0                STR_M STR_A STR_R STR_K "\0"
 #define STRING_PRUNE0               STR_P STR_R STR_U STR_N STR_E "\0"
 #define STRING_SKIP0                STR_S STR_K STR_I STR_P "\0"
 #define STRING_THEN                 STR_T STR_H STR_E STR_N
@ -1158,6 +1160,7 @@ only. */
 #define STRING_BSR_ANYCRLF_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
 #define STRING_BSR_UNICODE_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_U STR_N STR_I STR_C STR_O STR_D STR_E STR_RIGHT_PARENTHESIS
 #define STRING_UTF8_RIGHTPAR        STR_U STR_T STR_F STR_8 STR_RIGHT_PARENTHESIS
+#define STRING_UCP_RIGHTPAR         STR_U STR_C STR_P STR_RIGHT_PARENTHESIS

 #endif  /* SUPPORT_UTF8 */

@ -1190,9 +1193,13 @@ only. */

 #define PT_ANY        0    /* Any property - matches all chars */
 #define PT_LAMP       1    /* L& - the union of Lu, Ll, Lt */
-#define PT_GC         2    /* General characteristic (e.g. L) */
-#define PT_PC         3    /* Particular characteristic (e.g. Lu) */
+#define PT_GC         2    /* Specified general characteristic (e.g. L) */
+#define PT_PC         3    /* Specified particular characteristic (e.g. Lu) */
 #define PT_SC         4    /* Script (e.g. Han) */
+#define PT_ALNUM      5    /* Alphanumeric - the union of L and N */
+#define PT_SPACE      6    /* Perl space - Z plus 9,10,12,13 */
+#define PT_PXSPACE    7    /* POSIX space - Z plus 9,10,11,12,13 */
+#define PT_WORD       8    /* Word - L plus N plus underscore */

 /* Flag bits and data types for the extended class (OP_XCLASS) for classes that
 contain UTF-8 characters with values greater than 255. */
@ -1209,9 +1216,15 @@ contain UTF-8 characters with values greater than 255. */
 /* These are escaped items that aren't just an encoding of a particular data
 value such as \n. They must have non-zero values, as check_escape() returns
 their negation. Also, they must appear in the same order as in the opcode
-definitions below, up to ESC_z. There's a dummy for OP_ANY because it
-corresponds to "." rather than an escape sequence, and another for OP_ALLANY
-(which is used for [^] in JavaScript compatibility mode).
+definitions below, up to ESC_z. There's a dummy for OP_ALLANY because it
+corresponds to "." in DOTALL mode rather than an escape sequence. It is also
+used for [^] in JavaScript compatibility mode. In non-DOTALL mode, "." behaves
+like \N.
+
+The special values ESC_DU, ESC_du, etc. are used instead of ESC_D, ESC_d, etc.
+when PCRE_UCP is set, when replacement of \d etc by \p sequences is required.
+They must be contiguous, and remain in order so that the replacements can be
+looked up from a table.

 The final escape must be ESC_REF as subsequent values are used for
 backreferences (\1, \2, \3, etc). There are two tests in the code for an escape
@ -1221,11 +1234,12 @@ put in between that don't consume a character, that code will have to change.
 */

 enum { ESC_A = 1, ESC_G, ESC_K, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s,
-       ESC_W, ESC_w, ESC_dum1, ESC_dum2, ESC_C, ESC_P, ESC_p, ESC_R, ESC_H,
-       ESC_h, ESC_V, ESC_v, ESC_X, ESC_Z, ESC_z, ESC_E, ESC_Q, ESC_g, ESC_k,
+       ESC_W, ESC_w, ESC_N, ESC_dum, ESC_C, ESC_P, ESC_p, ESC_R, ESC_H,
+       ESC_h, ESC_V, ESC_v, ESC_X, ESC_Z, ESC_z,
+       ESC_E, ESC_Q, ESC_g, ESC_k,
+       ESC_DU, ESC_du, ESC_SU, ESC_su, ESC_WU, ESC_wu,
       ESC_REF };

-
 /* Opcode table: Starting from 1 (i.e. after OP_END), the values up to
 OP_EOD must correspond in order to the list of escapes immediately above.

@ -1249,8 +1263,8 @@ enum {
  OP_WHITESPACE,         /*  9 \s */
  OP_NOT_WORDCHAR,       /* 10 \W */
  OP_WORDCHAR,           /* 11 \w */
-  OP_ANY,            /* 12 Match any character (subject to DOTALL) */
-  OP_ALLANY,         /* 13 Match any character (not subject to DOTALL) */
+  OP_ANY,            /* 12 Match any character except newline */
+  OP_ALLANY,         /* 13 Match any character */
  OP_ANYBYTE,        /* 14 Match any byte (\C); different to OP_ANY for UTF-8 */
  OP_NOTPROP,        /* 15 \P (not Unicode property) */
  OP_PROP,           /* 16 \p (Unicode property) */
@ -1380,20 +1394,24 @@ enum {

  /* These are backtracking control verbs */

-  OP_PRUNE,          /* 107 */
-  OP_SKIP,           /* 108 */
-  OP_THEN,           /* 109 */
-  OP_COMMIT,         /* 110 */
+  OP_MARK,           /* 107 always has an argument */
+  OP_PRUNE,          /* 108 */
+  OP_PRUNE_ARG,      /* 109 same, but with argument */
+  OP_SKIP,           /* 110 */
+  OP_SKIP_ARG,       /* 111 same, but with argument */
+  OP_THEN,           /* 112 */
+  OP_THEN_ARG,       /* 113 same, but with argument */
+  OP_COMMIT,         /* 114 */

  /* These are forced failure and success verbs */

-  OP_FAIL,           /* 111 */
-  OP_ACCEPT,         /* 112 */
-  OP_CLOSE,          /* 113 Used before OP_ACCEPT to close open captures */
+  OP_FAIL,           /* 115 */
+  OP_ACCEPT,         /* 116 */
+  OP_CLOSE,          /* 117 Used before OP_ACCEPT to close open captures */

  /* This is used to skip a subpattern with a {0} quantifier */

-  OP_SKIPZERO,       /* 114 */
+  OP_SKIPZERO,       /* 118 */

  /* This is not an opcode, but is used to check that tables indexed by opcode
  are the correct length, in order to catch updating errors - there have been
@ -1404,7 +1422,7 @@ enum {

 /* *** NOTE NOTE NOTE *** Whenever the list above is updated, the two macro
 definitions that follow must also be updated to match. There are also tables
-called "coptable" cna "poptable" in pcre_dfa_exec.c that must be updated. */
+called "coptable" and "poptable" in pcre_dfa_exec.c that must be updated. */


 /* This macro defines textual names for all the opcodes. These are used only
@ -1429,7 +1447,8 @@ for debugging. The macro is referenced only in pcre_printint.c. */
  "Once", "Bra", "CBra", "Cond", "SBra", "SCBra", "SCond",        \
  "Cond ref", "Cond nref", "Cond rec", "Cond nrec", "Cond def",   \
  "Brazero", "Braminzero",                                        \
-  "*PRUNE", "*SKIP", "*THEN", "*COMMIT", "*FAIL", "*ACCEPT",      \
+  "*MARK", "*PRUNE", "*PRUNE", "*SKIP", "*SKIP",                  \
+  "*THEN", "*THEN", "*COMMIT", "*FAIL", "*ACCEPT",                \
  "Close", "Skip zero"


@ -1495,8 +1514,9 @@ in UTF-8 mode. The code that uses this table must know about such things. */
  3, 3,                          /* RREF, NRREF                            */ \
  1,                             /* DEF                                    */ \
  1, 1,                          /* BRAZERO, BRAMINZERO                    */ \
-  1, 1, 1, 1,                    /* PRUNE, SKIP, THEN, COMMIT,             */ \
-  1, 1, 3, 1                     /* FAIL, ACCEPT, CLOSE, SKIPZERO          */
+  3, 1, 3,                       /* MARK, PRUNE, PRUNE_ARG,                */ \
+  1, 3, 1, 3,                    /* SKIP, SKIP_ARG, THEN, THEN_ARG,        */ \
+  1, 1, 1, 3, 1                  /* COMMIT, FAIL, ACCEPT, CLOSE, SKIPZERO  */


 /* A magic value for OP_RREF and OP_NRREF to indicate the "any recursion"
@ -1514,7 +1534,7 @@ enum { ERR0,  ERR1,  ERR2,  ERR3,  ERR4,  ERR5,  ERR6,  ERR7,  ERR8,  ERR9,
       ERR30, ERR31, ERR32, ERR33, ERR34, ERR35, ERR36, ERR37, ERR38, ERR39,
       ERR40, ERR41, ERR42, ERR43, ERR44, ERR45, ERR46, ERR47, ERR48, ERR49,
       ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
-       ERR60, ERR61, ERR62, ERR63, ERR64, ERR65, ERRCOUNT };
+       ERR60, ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERRCOUNT };

 /* The real format of the start of the pcre block; the index of names and the
 code vector run on as long as necessary after the end. We store an explicit
@ -1657,6 +1677,7 @@ typedef struct match_data {
  BOOL   noteol;                /* NOTEOL flag */
  BOOL   utf8;                  /* UTF8 flag */
  BOOL   jscript_compat;        /* JAVASCRIPT_COMPAT flag */
+  BOOL   use_ucp;               /* PCRE_UCP flag */
  BOOL   endonly;               /* Dollar not before final \n */
  BOOL   notempty;              /* Empty string match not wanted */
  BOOL   notempty_atstart;      /* Empty string match at start not wanted */
@ -1676,6 +1697,7 @@ typedef struct match_data {
  int    eptrn;                 /* Next free eptrblock */
  recursion_info *recursive;    /* Linked list of recursion data */
  void  *callout_data;          /* To pass back to callouts */
+  const uschar *mark;           /* Mark pointer to pass back */
 } match_data;

 /* A similar structure is used for the same purpose by the DFA matching
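
Note on the PT_SPACE and PT_PXSPACE property types added in the pcre_internal.h changes above: going by their comments, the two differ only in whether vertical tab (11) counts as space. The sketch below restates that for code points below 128 only, where the Z category contributes just U+0020. The function names are hypothetical and this is not how PCRE evaluates the properties internally.

```c
#include <assert.h>

/* Hypothetical helpers covering only code points below 128. The Z (separator)
   category contributes U+0020 in this range; the rest are the control
   characters listed in the PT_SPACE and PT_PXSPACE comments. */
static int demo_is_pxspace_ascii(unsigned int c)
{
assert(c < 128);
return c == 0x20 || (c >= 9 && c <= 13);      /* Z plus 9,10,11,12,13 */
}

static int demo_is_perl_space_ascii(unsigned int c)
{
return demo_is_pxspace_ascii(c) && c != 11;   /* same, minus VT (11) */
}
```

The two predicates agree on every value below 128 except 11.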
--- a/ext/pcre/pcrelib/pcre_printint.src
+++ b/ext/pcre/pcrelib/pcre_printint.src
@ -534,6 +534,14 @@ for(;;)
      }
    break;

+    case OP_MARK:
+    case OP_PRUNE_ARG:
+    case OP_SKIP_ARG:
+    case OP_THEN_ARG:
+    fprintf(f, "    %s %s", OP_names[*code], code + 2);
+    extra += code[1];
+    break;
+
    /* Anything else is just an item with no data*/

    default:
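
Note on the OP_MARK, OP_PRUNE_ARG, OP_SKIP_ARG and OP_THEN_ARG case above: the code implies a compiled layout of opcode byte, name-length byte, name bytes, then a terminating NUL. The name is printed from code + 2 as a C string, and the fixed length of 3 in the opcode length table is extended by code[1]. The sketch below steps over such an item under that assumed layout. The opcode value and all names are placeholders for illustration, not PCRE definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder opcode value for illustration only; the real value comes from
   the opcode enum in pcre_internal.h. */
enum { DEMO_OP_MARK = 107 };

/* Fixed part of a verb-with-argument item: opcode, length byte, final NUL. */
enum { DEMO_VERB_ARG_FIXED_LEN = 3 };

/* Hypothetical helper: given a pointer to a verb-with-argument item, store a
   pointer to its NUL-terminated name in *name and return the total number of
   bytes the item occupies. */
static size_t demo_verb_arg_length(const unsigned char *code,
  const char **name)
{
size_t namelen = code[1];
*name = (const char *)(code + 2);
assert(strlen(*name) == namelen);   /* length byte must agree with the name */
return DEMO_VERB_ARG_FIXED_LEN + namelen;
}
```

For an item holding the name "abc", the helper returns 6, so the next opcode starts six bytes further on. This mirrors the cc += _pcre_OP_lengths[op] + cc[1] step used in pcre_study.c.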
--- a/ext/pcre/pcrelib/pcre_study.c
+++ b/ext/pcre/pcrelib/pcre_study.c
@ -46,6 +46,7 @@ supporting functions. */

 #include "pcre_internal.h"

+#define SET_BIT(c) start_bits[c/8] |= (1 << (c&7))

 /* Returns from set_start_bits() */
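
Note on the SET_BIT macro added above: it records a possible starting byte in a 256-bit map held as 32 bytes, where byte c/8 holds the bit and c&7 selects it. The sketch below shows the same indexing written as functions, which avoids having to parenthesise a macro argument. The names are hypothetical and not part of PCRE.

```c
#include <assert.h>
#include <string.h>

enum { DEMO_MAP_BYTES = 32 };   /* 32 * 8 = 256 bits, one per byte value */

/* Set the bit for byte value c (0..255) in a 32-byte map. */
static void demo_set_bit(unsigned char *map, unsigned int c)
{
assert(c < 256);
map[c / 8] |= (unsigned char)(1u << (c & 7));
}

/* Return non-zero if the bit for byte value c is set. */
static int demo_test_bit(const unsigned char *map, unsigned int c)
{
assert(c < 256);
return (map[c / 8] & (1u << (c & 7))) != 0;
}
```

Setting 0x09 (tab) touches bit 1 of map byte 1, and setting 0xC2 touches bit 2 of map byte 24.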

@ -411,6 +412,15 @@ for (;;)
 #endif
    break;

+    /* Skip these, but we need to add in the name length. */
+
+    case OP_MARK:
+    case OP_PRUNE_ARG:
+    case OP_SKIP_ARG:
+    case OP_THEN_ARG:
+    cc += _pcre_OP_lengths[op] + cc[1];
+    break;
+
    /* For the record, these are the opcodes that are matched by "default":
    OP_ACCEPT, OP_CLOSE, OP_COMMIT, OP_FAIL, OP_PRUNE, OP_SET_SOM, OP_SKIP,
    OP_THEN. */
@ -429,25 +439,121 @@ for (;;)
 *      Set a bit and maybe its alternate case    *
 *************************************************/

-/* Given a character, set its bit in the table, and also the bit for the other
-version of a letter if we are caseless.
+/* Given a character, set its first byte's bit in the table, and also the
+corresponding bit for the other version of a letter if we are caseless. In
+UTF-8 mode, for characters greater than 127, we can only do the caseless thing
+when Unicode property support is available.

 Arguments:
  start_bits    points to the bit map
-  c             is the character
+  p             points to the character
  caseless      the caseless flag
  cd            the block with char table pointers
+  utf8          TRUE for UTF-8 mode

-Returns:        nothing
+Returns:        pointer after the character
+*/
+
+static const uschar *
+set_table_bit(uschar *start_bits, const uschar *p, BOOL caseless,
+  compile_data *cd, BOOL utf8)
+{
+unsigned int c = *p;
+
+SET_BIT(c);
+
+#ifdef SUPPORT_UTF8
+if (utf8 && c > 127)
+  {
+  GETCHARINC(c, p);
+#ifdef SUPPORT_UCP
+  if (caseless)
+    {
+    uschar buff[8];
+    c = UCD_OTHERCASE(c);
+    (void)_pcre_ord2utf8(c, buff);
+    SET_BIT(buff[0]);
+    }
+#endif
+  return p;
+  }
+#endif
+
+/* Not UTF-8 mode, or character is less than 128. */
+
+if (caseless && (cd->ctypes[c] & ctype_letter) != 0) SET_BIT(cd->fcc[c]);
+return p + 1;
+}
+
+
+
+/*************************************************
+*     Set bits for a positive character type     *
+*************************************************/
+
+/* This function sets starting bits for a character type. In UTF-8 mode, we can
+only do a direct setting for bytes less than 128, as otherwise there can be
+confusion with bytes in the middle of UTF-8 characters. In a "traditional"
+environment, the tables will only recognize ASCII characters anyway, but in at
+least one Windows environment, some higher bytes bits were set in the tables.
+So we deal with that case by considering the UTF-8 encoding.
+
+Arguments:
+  start_bits     the starting bitmap
+  cbit_type      the type of character wanted
+  table_limit    32 for non-UTF-8; 16 for UTF-8
+  cd             the block with char table pointers
+
+Returns:         nothing
 */

 static void
-set_table_bit(uschar *start_bits, unsigned int c, BOOL caseless,
+set_type_bits(uschar *start_bits, int cbit_type, int table_limit,
  compile_data *cd)
 {
-start_bits[c/8] |= (1 << (c&7));
-if (caseless && (cd->ctypes[c] & ctype_letter) != 0)
-  start_bits[cd->fcc[c]/8] |= (1 << (cd->fcc[c]&7));
+register int c;
+for (c = 0; c < table_limit; c++) start_bits[c] |= cd->cbits[c+cbit_type];
+if (table_limit == 32) return;
+for (c = 128; c < 256; c++)
+  {
+  if ((cd->cbits[c/8] & (1 << (c&7))) != 0)
+    {
+    uschar buff[8];
+    (void)_pcre_ord2utf8(c, buff);
+    SET_BIT(buff[0]);
+    }
+  }
+}
+
+
+/*************************************************
+*     Set bits for a negative character type     *
+*************************************************/
+
+/* This function sets starting bits for a negative character type such as \D.
+In UTF-8 mode, we can only do a direct setting for bytes less than 128, as
+otherwise there can be confusion with bytes in the middle of UTF-8 characters.
+Unlike in the positive case, where we can set appropriate starting bits for
+specific high-valued UTF-8 characters, in this case we have to set the bits for
+all high-valued characters. The lowest is 0xc2, but we overkill by starting at
+0xc0 (192) for simplicity.
+
+Arguments:
+  start_bits     the starting bitmap
+  cbit_type      the type of character wanted
+  table_limit    32 for non-UTF-8; 16 for UTF-8
+  cd             the block with char table pointers
+
+Returns:         nothing
+*/
+
+static void
+set_nottype_bits(uschar *start_bits, int cbit_type, int table_limit,
+  compile_data *cd)
+{
+register int c;
+for (c = 0; c < table_limit; c++) start_bits[c] |= ~cd->cbits[c+cbit_type];
+if (table_limit != 32) for (c = 24; c < 32; c++) start_bits[c] = 0xff;
 }
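
Note on set_nottype_bits() above: in UTF-8 mode it handles a negated type such as \D in two parts. The class table is complemented only for bytes below 128 (the first 16 map bytes), and every byte from 0xc0 upwards is then marked as a possible start, since any high-valued character might match. The sketch below restates that logic with a hand-built digit table. The function and table names are hypothetical, and only the ASCII digits are populated.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical re-statement of the negated-type logic. class_bits is a
   32-byte map of the characters in the class; start_bits is the 32-byte map
   of possible starting bytes being built up. */
static void demo_set_nottype_bits(unsigned char *start_bits,
  const unsigned char *class_bits, int utf8)
{
int table_limit = utf8 ? 16 : 32;
int c;
assert(start_bits != NULL && class_bits != NULL);
for (c = 0; c < table_limit; c++)
  start_bits[c] |= (unsigned char)~class_bits[c];
/* In UTF-8 mode, bytes 0x80..0xbf are continuation bytes and stay unset;
   bytes 0xc0..0xff (map bytes 24..31) are all marked as possible starts. */
if (utf8) for (c = 24; c < 32; c++) start_bits[c] = 0xff;
}

/* Fill a 32-byte class map with the ASCII digits '0'..'9'. */
static void demo_fill_digit_bits(unsigned char *class_bits)
{
int ch;
memset(class_bits, 0, 32);
for (ch = '0'; ch <= '9'; ch++)
  class_bits[ch / 8] |= (unsigned char)(1 << (ch & 7));
}
```

In UTF-8 mode the resulting map leaves bytes 16..23 clear and sets bytes 24..31 to 0xff. In non-UTF-8 mode all 32 bytes are simply the complement of the class table.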


@ -482,6 +588,7 @@ set_start_bits(const uschar *code, uschar *start_bits, BOOL caseless,
 {
 register int c;
 int yield = SSB_DONE;
+int table_limit = utf8? 16:32;

 #if 0
 /* ========================================================================= */
@ -605,12 +712,7 @@ do
      case OP_QUERY:
      case OP_MINQUERY:
      case OP_POSQUERY:
-      set_table_bit(start_bits, tcode[1], caseless, cd);
-      tcode += 2;
-#ifdef SUPPORT_UTF8
-      if (utf8 && tcode[-1] >= 0xc0)
-        tcode += _pcre_utf8_table4[tcode[-1] & 0x3f];
-#endif
+      tcode = set_table_bit(start_bits, tcode + 1, caseless, cd, utf8);
      break;

      /* Single-char upto sets the bit and tries the next */
@ -618,12 +720,7 @@ do
      case OP_UPTO:
      case OP_MINUPTO:
      case OP_POSUPTO:
-      set_table_bit(start_bits, tcode[3], caseless, cd);
-      tcode += 4;
-#ifdef SUPPORT_UTF8
-      if (utf8 && tcode[-1] >= 0xc0)
-        tcode += _pcre_utf8_table4[tcode[-1] & 0x3f];
-#endif
+      tcode = set_table_bit(start_bits, tcode + 3, caseless, cd, utf8);
      break;

      /* At least one single char sets the bit and stops */
@ -636,59 +733,86 @@ do
      case OP_PLUS:
      case OP_MINPLUS:
      case OP_POSPLUS:
-      set_table_bit(start_bits, tcode[1], caseless, cd);
+      (void)set_table_bit(start_bits, tcode + 1, caseless, cd, utf8);
      try_next = FALSE;
      break;

-      /* Single character type sets the bits and stops */
+      /* Special spacing and line-terminating items. These recognize specific
+      lists of characters. The difference between VSPACE and ANYNL is that the
+      latter can match the two-character CRLF sequence, but that is not
+      relevant for finding the first character, so their code here is
+      identical. */
+
+      case OP_HSPACE:
+      SET_BIT(0x09);
+      SET_BIT(0x20);
+      if (utf8)
+        {
+        SET_BIT(0xC2);  /* For U+00A0 */
+        SET_BIT(0xE1);  /* For U+1680, U+180E */
+        SET_BIT(0xE2);  /* For U+2000 - U+200A, U+202F, U+205F */
+        SET_BIT(0xE3);  /* For U+3000 */
+        }
+      else SET_BIT(0xA0);
+      try_next = FALSE;
+      break;
+
+      case OP_ANYNL:
+      case OP_VSPACE:
+      SET_BIT(0x0A);
+      SET_BIT(0x0B);
+      SET_BIT(0x0C);
+      SET_BIT(0x0D);
+      if (utf8)
+        {
+        SET_BIT(0xC2);  /* For U+0085 */
+        SET_BIT(0xE2);  /* For U+2028, U+2029 */
+        }
+      else SET_BIT(0x85);
+      try_next = FALSE;
+      break;
+
+      /* Single character types set the bits and stop. Note that if PCRE_UCP
+      is set, we do not see these op codes because \d etc are converted to
+      properties. Therefore, these apply in the case when only characters less
+      than 256 are recognized to match the types. */

      case OP_NOT_DIGIT:
-      for (c = 0; c < 32; c++)
-        start_bits[c] |= ~cd->cbits[c+cbit_digit];
+      set_nottype_bits(start_bits, cbit_digit, table_limit, cd);
      try_next = FALSE;
      break;

      case OP_DIGIT:
-      for (c = 0; c < 32; c++)
-        start_bits[c] |= cd->cbits[c+cbit_digit];
+      set_type_bits(start_bits, cbit_digit, table_limit, cd);
      try_next = FALSE;
      break;

      /* The cbit_space table has vertical tab as whitespace; we have to
-      discard it. */
+      ensure it is set as not whitespace. */

      case OP_NOT_WHITESPACE:
-      for (c = 0; c < 32; c++)
-        {
-        int d = cd->cbits[c+cbit_space];
-        if (c == 1) d &= ~0x08;
-        start_bits[c] |= ~d;
-        }
+      set_nottype_bits(start_bits, cbit_space, table_limit, cd);
+      start_bits[1] |= 0x08;
      try_next = FALSE;
      break;

      /* The cbit_space table has vertical tab as whitespace; we have to
-      discard it. */
+      not set it from the table. */

      case OP_WHITESPACE:
-      for (c = 0; c < 32; c++)
-        {
-        int d = cd->cbits[c+cbit_space];
-        if (c == 1) d &= ~0x08;
-        start_bits[c] |= d;
-        }
+      c = start_bits[1];    /* Save in case it was already set */
+      set_type_bits(start_bits, cbit_space, table_limit, cd);
+      start_bits[1] = (start_bits[1] & ~0x08) | c;
      try_next = FALSE;
      break;

      case OP_NOT_WORDCHAR:
-      for (c = 0; c < 32; c++)
-        start_bits[c] |= ~cd->cbits[c+cbit_word];
+      set_nottype_bits(start_bits, cbit_word, table_limit, cd);
      try_next = FALSE;
      break;

      case OP_WORDCHAR:
-      for (c = 0; c < 32; c++)
-        start_bits[c] |= cd->cbits[c+cbit_word];
+      set_type_bits(start_bits, cbit_word, table_limit, cd);
      try_next = FALSE;
      break;

@ -697,6 +821,7 @@ do

      case OP_TYPEPLUS:
      case OP_TYPEMINPLUS:
+      case OP_TYPEPOSPLUS:
      tcode++;
      break;

@ -720,52 +845,69 @@ do
      case OP_TYPEPOSQUERY:
      switch(tcode[1])
        {
+        default:
        case OP_ANY:
        case OP_ALLANY:
        return SSB_FAIL;

+        case OP_HSPACE:
+        SET_BIT(0x09);
+        SET_BIT(0x20);
+        if (utf8)
+          {
+          SET_BIT(0xC2);  /* For U+00A0 */
+          SET_BIT(0xE1);  /* For U+1680, U+180E */
+          SET_BIT(0xE2);  /* For U+2000 - U+200A, U+202F, U+205F */
+          SET_BIT(0xE3);  /* For U+3000 */
+          }
+        else SET_BIT(0xA0);
+        break;
+
+        case OP_ANYNL:
+        case OP_VSPACE:
+        SET_BIT(0x0A);
+        SET_BIT(0x0B);
+        SET_BIT(0x0C);
+        SET_BIT(0x0D);
+        if (utf8)
+          {
+          SET_BIT(0xC2);  /* For U+0085 */
+          SET_BIT(0xE2);  /* For U+2028, U+2029 */
+          }
+        else SET_BIT(0x85);
+        break;
+
        case OP_NOT_DIGIT:
-        for (c = 0; c < 32; c++)
-          start_bits[c] |= ~cd->cbits[c+cbit_digit];
+        set_nottype_bits(start_bits, cbit_digit, table_limit, cd);
        break;

        case OP_DIGIT:
-        for (c = 0; c < 32; c++)
-          start_bits[c] |= cd->cbits[c+cbit_digit];
+        set_type_bits(start_bits, cbit_digit, table_limit, cd);
        break;

        /* The cbit_space table has vertical tab as whitespace; we have to
-        discard it. */
+        ensure it gets set as not whitespace. */

        case OP_NOT_WHITESPACE:
-        for (c = 0; c < 32; c++)
-          {
-          int d = cd->cbits[c+cbit_space];
-          if (c == 1) d &= ~0x08;
-          start_bits[c] |= ~d;
-          }
+        set_nottype_bits(start_bits, cbit_space, table_limit, cd);
+        start_bits[1] |= 0x08;
        break;

        /* The cbit_space table has vertical tab as whitespace; we have to
-        discard it. */
+        avoid setting it. */

        case OP_WHITESPACE:
-        for (c = 0; c < 32; c++)
-          {
-          int d = cd->cbits[c+cbit_space];
-          if (c == 1) d &= ~0x08;
-          start_bits[c] |= d;
-          }
+        c = start_bits[1];    /* Save in case it was already set */
+        set_type_bits(start_bits, cbit_space, table_limit, cd);
+        start_bits[1] = (start_bits[1] & ~0x08) | c;
        break;

        case OP_NOT_WORDCHAR:
-        for (c = 0; c < 32; c++)
-          start_bits[c] |= ~cd->cbits[c+cbit_word];
+        set_nottype_bits(start_bits, cbit_word, table_limit, cd);
        break;

        case OP_WORDCHAR:
-        for (c = 0; c < 32; c++)
-          start_bits[c] |= cd->cbits[c+cbit_word];
+        set_type_bits(start_bits, cbit_word, table_limit, cd);
        break;
        }
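The whitespace cases above depend on where vertical tab sits in the 32-byte start bitmap: VT is 0x0B, so it is bit 3 (mask 0x08) of byte 1. The following is a minimal, self-contained sketch of that bit handling, not the pcre_study.c code itself. Every demo_ name is hypothetical, the space table is a stand-in for cd->cbits + cbit_space, and the table_limit/UTF-8 handling of set_type_bits() is left out.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAP_BYTES 32   /* 256 bits, one per byte value */
#define DEMO_VT 0x0B        /* vertical tab: byte 1, mask 0x08 */

/* Hypothetical stand-in for the cbit_space table, which counts VT as space. */
void demo_fill_space_table(unsigned char *table)
{
  static const unsigned char spaces[] = { 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20 };
  size_t i;
  memset(table, 0, DEMO_MAP_BYTES);
  for (i = 0; i < sizeof(spaces); i++)
    table[spaces[i] / 8] |= (unsigned char)(1u << (spaces[i] % 8));
}

int demo_bit_is_set(const unsigned char *map, int c)
{
  return (map[c / 8] & (1u << (c % 8))) != 0;
}

/* \s: OR in the table, then put the VT bit back to its earlier state. */
void demo_add_whitespace(unsigned char *start_bits, const unsigned char *table)
{
  int i;
  unsigned char saved = start_bits[1];   /* save in case VT was already set */
  for (i = 0; i < DEMO_MAP_BYTES; i++) start_bits[i] |= table[i];
  start_bits[1] = (unsigned char)((start_bits[1] & ~0x08) | saved);
}

/* \S: OR in the complement of the table, then force the VT bit on. */
void demo_add_not_whitespace(unsigned char *start_bits,
  const unsigned char *table)
{
  int i;
  for (i = 0; i < DEMO_MAP_BYTES; i++)
    start_bits[i] |= (unsigned char)~table[i];
  start_bits[1] |= 0x08;
}

/* Convenience wrappers: build a fresh map (optionally with VT preset), apply
   one of the operations, and report whether character c ended up set. */
int demo_after_whitespace(int c, int vt_preset)
{
  unsigned char table[DEMO_MAP_BYTES], bits[DEMO_MAP_BYTES];
  demo_fill_space_table(table);
  memset(bits, 0, sizeof(bits));
  if (vt_preset) bits[DEMO_VT / 8] |= 0x08;
  demo_add_whitespace(bits, table);
  return demo_bit_is_set(bits, c);
}

int demo_after_not_whitespace(int c)
{
  unsigned char table[DEMO_MAP_BYTES], bits[DEMO_MAP_BYTES];
  demo_fill_space_table(table);
  memset(bits, 0, sizeof(bits));
  demo_add_not_whitespace(bits, table);
  return demo_bit_is_set(bits, c);
}
```

With this sketch, demo_after_whitespace(DEMO_VT, 0) is 0 and demo_after_whitespace(DEMO_VT, 1) is 1: the table never turns the VT bit on, but a bit set by an earlier alternative survives. demo_after_not_whitespace(DEMO_VT) is 1 because VT is forced on for \S.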

--- a/ext/pcre/pcrelib/pcre_tables.c
+++ b/ext/pcre/pcrelib/pcre_tables.c
@ -241,6 +241,10 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
 #define STRING_Tifinagh0 STR_T STR_i STR_f STR_i STR_n STR_a STR_g STR_h "\0"
 #define STRING_Ugaritic0 STR_U STR_g STR_a STR_r STR_i STR_t STR_i STR_c "\0"
 #define STRING_Vai0 STR_V STR_a STR_i "\0"
+#define STRING_Xan0 STR_X STR_a STR_n "\0"
+#define STRING_Xps0 STR_X STR_p STR_s "\0"
+#define STRING_Xsp0 STR_X STR_s STR_p "\0"
+#define STRING_Xwd0 STR_X STR_w STR_d "\0"
 #define STRING_Yi0 STR_Y STR_i "\0"
 #define STRING_Z0 STR_Z "\0"
 #define STRING_Zl0 STR_Z STR_l "\0"
@ -374,6 +378,10 @@ const char _pcre_utt_names[] =
  STRING_Tifinagh0
  STRING_Ugaritic0
  STRING_Vai0
+  STRING_Xan0
+  STRING_Xps0
+  STRING_Xsp0
+  STRING_Xwd0
  STRING_Yi0
  STRING_Z0
  STRING_Zl0
@ -507,11 +515,15 @@ const ucp_type_table _pcre_utt[] = {
  { 891, PT_SC, ucp_Tifinagh },
  { 900, PT_SC, ucp_Ugaritic },
  { 909, PT_SC, ucp_Vai },
-  { 913, PT_SC, ucp_Yi },
-  { 916, PT_GC, ucp_Z },
-  { 918, PT_PC, ucp_Zl },
-  { 921, PT_PC, ucp_Zp },
-  { 924, PT_PC, ucp_Zs }
+  { 913, PT_ALNUM, 0 },
+  { 917, PT_PXSPACE, 0 },
+  { 921, PT_SPACE, 0 },
+  { 925, PT_WORD, 0 },
+  { 929, PT_SC, ucp_Yi },
+  { 932, PT_GC, ucp_Z },
+  { 934, PT_PC, ucp_Zl },
+  { 937, PT_PC, ucp_Zp },
+  { 940, PT_PC, ucp_Zs }
 };

 const int _pcre_utt_size = sizeof(_pcre_utt)/sizeof(ucp_type_table);
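The first field of each _pcre_utt entry is a byte offset into _pcre_utt_names, so inserting four 4-byte names (three letters plus the terminating NUL) moves every later entry up by 16; this is why Yi goes from 913 to 929. The sketch below recomputes the offsets for the tail of the table as a cross-check. It is illustrative only: the demo_ names are hypothetical, and only the nine entries visible in the hunk are reproduced.

```c
#include <stddef.h>
#include <string.h>

/* Tail of the name table from the hunk, as one string with embedded NULs. */
static const char demo_names[] = "Xan\0Xps\0Xsp\0Xwd\0Yi\0Z\0Zl\0Zp\0Zs";

#define DEMO_FIRST_OFFSET 913   /* offset of "Xan", taken from the diff */
#define DEMO_NAME_COUNT   9

/* Offset that entry `index` (0 = Xan) has in the full name table, or -1 if
   the index is outside the nine entries reproduced here. */
int demo_name_offset(int index)
{
  const char *p = demo_names;
  int offset = DEMO_FIRST_OFFSET;
  if (index < 0 || index >= DEMO_NAME_COUNT) return -1;
  while (index-- > 0)
    {
    size_t len = strlen(p) + 1;   /* name plus its terminating NUL */
    offset += (int)len;
    p += len;
    }
  return offset;
}
```

Calling demo_name_offset() with 0 through 8 gives 913, 917, 921, 925, 929, 932, 934, 937 and 940, which agrees with the offsets in the added table lines.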
--- a/ext/pcre/pcrelib/pcre_xclass.c
+++ b/ext/pcre/pcrelib/pcre_xclass.c
@ -6,7 +6,7 @@
 and semantics are as close as possible to those of the Perl 5 language.

                       Written by Philip Hazel
-           Copyright (c) 1997-2009 University of Cambridge
+           Copyright (c) 1997-2010 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@ -110,12 +110,13 @@ while ((t = *data++) != XCL_END)
      break;

      case PT_LAMP:
-      if ((prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || prop->chartype == ucp_Lt) ==
-          (t == XCL_PROP)) return !negated;
+      if ((prop->chartype == ucp_Lu || prop->chartype == ucp_Ll ||
+           prop->chartype == ucp_Lt) == (t == XCL_PROP)) return !negated;
      break;

      case PT_GC:
-      if ((data[1] == _pcre_ucp_gentype[prop->chartype]) == (t == XCL_PROP)) return !negated;
+      if ((data[1] == _pcre_ucp_gentype[prop->chartype]) == (t == XCL_PROP))
+        return !negated;
      break;

      case PT_PC:
@ -126,6 +127,33 @@ while ((t = *data++) != XCL_END)
      if ((data[1] == prop->script) == (t == XCL_PROP)) return !negated;
      break;

+      case PT_ALNUM:
+      if ((_pcre_ucp_gentype[prop->chartype] == ucp_L ||
+           _pcre_ucp_gentype[prop->chartype] == ucp_N) == (t == XCL_PROP))
+        return !negated;
+      break;
+
+      case PT_SPACE:    /* Perl space */
+      if ((_pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+           c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR)
+             == (t == XCL_PROP))
+        return !negated;
+      break;
+
+      case PT_PXSPACE:  /* POSIX space */
+      if ((_pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+           c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
+           c == CHAR_FF || c == CHAR_CR) == (t == XCL_PROP))
+        return !negated;
+      break;
+
+      case PT_WORD:
+      if ((_pcre_ucp_gentype[prop->chartype] == ucp_L ||
+           _pcre_ucp_gentype[prop->chartype] == ucp_N || c == CHAR_UNDERSCORE)
+             == (t == XCL_PROP))
+        return !negated;
+      break;
+
      /* This should never occur, but compilers may mutter if there is no
      default. */
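The two new space properties differ only in vertical tab: PT_SPACE (Perl space) leaves CHAR_VT out, PT_PXSPACE (POSIX space) includes it. Each case also uses the `== (t == XCL_PROP)` comparison so that one test serves both \p and \P items. The sketch below shows both points in isolation. It is a rough illustration rather than the library code: the demo_ names are hypothetical, and demo_is_separator() recognises only a partial, hand-picked list of Z-category characters where PCRE consults its Unicode property tables.

```c
enum { DEMO_XCL_PROP, DEMO_XCL_NOTPROP };

/* Partial stand-in for "general category is Z"; the real lookup goes through
   the Unicode property tables. */
int demo_is_separator(unsigned long c)
{
  if (c >= 0x2000 && c <= 0x200A) return 1;
  switch (c)
    {
    case 0x0020: case 0x00A0: case 0x1680: case 0x2028:
    case 0x2029: case 0x202F: case 0x205F: case 0x3000:
    return 1;
    default:
    return 0;
    }
}

/* Perl space: Z, HT, NL, FF, CR (no VT) */
int demo_perl_space(unsigned long c)
{
  return demo_is_separator(c) ||
         c == 0x09 || c == 0x0A || c == 0x0C || c == 0x0D;
}

/* POSIX space: Perl space plus VT */
int demo_posix_space(unsigned long c)
{
  return demo_perl_space(c) || c == 0x0B;
}

/* An item matches when membership agrees with the item being \p rather
   than \P. */
int demo_item_matches(int is_member, int item_type)
{
  return (is_member != 0) == (item_type == DEMO_XCL_PROP);
}
```

For U+000B, demo_perl_space() returns 0 and demo_posix_space() returns 1, so a \P item built on the Perl definition matches VT while a \p item does not.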

--- a/ext/pcre/pcrelib/pcreposix.c
+++ b/ext/pcre/pcrelib/pcreposix.c
@ -6,7 +6,7 @@
 and semantics are as close as possible to those of the Perl 5 language.

                       Written by Philip Hazel
-           Copyright (c) 1997-2009 University of Cambridge
+           Copyright (c) 1997-2010 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@ -55,6 +55,11 @@ previously been set. */
 #  define PCREPOSIX_EXP_DEFN __declspec(dllexport)
 #endif

+/* We include pcre.h before pcre_internal.h so that the PCRE library functions
+are declared as "import" for Windows by defining PCRE_EXP_DECL as "import".
+This is needed even though pcre_internal.h itself includes pcre.h, because it
+does so after it has set PCRE_EXP_DECL to "export" if it is not already set. */
+
 #include "pcre.h"
 #include "pcre_internal.h"
 #include "pcreposix.h"
@ -133,7 +138,7 @@ static const int eint[] = {
  REG_INVARG,  /* inconsistent NEWLINE options */
  REG_BADPAT,  /* \g is not followed by an (optionally braced) non-zero number */
  REG_BADPAT,  /* a numbered reference must not be zero */
-  REG_BADPAT,  /* (*VERB) with an argument is not supported */
+  REG_BADPAT,  /* an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT) */
  /* 60 */
  REG_BADPAT,  /* (*VERB) not recognized */
  REG_BADPAT,  /* number is too big */
@ -141,7 +146,9 @@ static const int eint[] = {
  REG_BADPAT,  /* digit expected after (?+ */
  REG_BADPAT,  /* ] is an invalid data character in JavaScript compatibility mode */
  /* 65 */
-  REG_BADPAT   /* different names for subpatterns of the same number are not allowed */
+  REG_BADPAT,  /* different names for subpatterns of the same number are not allowed */
+  REG_BADPAT,  /* (*MARK) must have an argument */
+  REG_INVARG,  /* this version of PCRE is not compiled with PCRE_UCP support */
 };

 /* Table of texts corresponding to POSIX error codes */
@ -245,6 +252,7 @@ if ((cflags & REG_NEWLINE) != 0)  options |= PCRE_MULTILINE;
 if ((cflags & REG_DOTALL) != 0)   options |= PCRE_DOTALL;
 if ((cflags & REG_NOSUB) != 0)    options |= PCRE_NO_AUTO_CAPTURE;
 if ((cflags & REG_UTF8) != 0)     options |= PCRE_UTF8;
+if ((cflags & REG_UCP) != 0)      options |= PCRE_UCP;
 if ((cflags & REG_UNGREEDY) != 0) options |= PCRE_UNGREEDY;

 preg->re_pcre = pcre_compile2(pattern, options, &errorcode, &errorptr,
@ -334,13 +342,13 @@ if ((eflags & REG_STARTEND) != 0)
 else
  {
  so = 0;
-  eo = strlen(string);
+  eo = (int)strlen(string);
  }

 rc = pcre_exec((const pcre *)preg->re_pcre, NULL, string + so, (eo - so),
-  0, options, ovector, nmatch * 3);
+  0, options, ovector, (int)(nmatch * 3));

-if (rc == 0) rc = nmatch;    /* All captured slots were filled in */
+if (rc == 0) rc = (int)nmatch;    /* All captured slots were filled in */

 /* Successful match */

--- a/ext/pcre/pcrelib/pcreposix.h
+++ b/ext/pcre/pcrelib/pcreposix.h
@ -62,6 +62,7 @@ extern "C" {
 #define REG_STARTEND  0x0080   /* BSD feature: pass subject string by so,eo */
 #define REG_NOTEMPTY  0x0100   /* NOT defined by POSIX; maps to PCRE_NOTEMPTY */
 #define REG_UNGREEDY  0x0200   /* NOT defined by POSIX; maps to PCRE_UNGREEDY */
+#define REG_UCP       0x0400   /* NOT defined by POSIX; maps to PCRE_UCP */

 /* This is not used by PCRE, but by defining it we make it easier
 to slot PCRE into existing programs that make POSIX calls. */
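REG_UCP follows the same pattern as the other non-POSIX flags: the wrapper tests the bit in cflags and ORs the matching PCRE option into the compile options. The sketch below imitates that translation step without needing the library. The demo_ names are hypothetical; the REG_UNGREEDY and REG_UCP values are copied from the hunk above, the REG_UTF8 value is an assumption, and the PCRE-side bits are placeholders, not the values in pcre.h.

```c
/* POSIX-side flag values. The last two are copied from the pcreposix.h hunk;
   the REG_UTF8 value is an assumption made for this sketch. */
#define DEMO_REG_UTF8     0x0040
#define DEMO_REG_UNGREEDY 0x0200
#define DEMO_REG_UCP      0x0400

/* PCRE-side option bits: placeholder values, not those in pcre.h. */
#define DEMO_PCRE_UTF8     0x01
#define DEMO_PCRE_UNGREEDY 0x02
#define DEMO_PCRE_UCP      0x04

/* Translate wrapper flags into compile options, one independent bit test per
   flag, in the style of the regcomp() hunk. */
int demo_translate_cflags(int cflags)
{
  int options = 0;
  if ((cflags & DEMO_REG_UTF8) != 0)     options |= DEMO_PCRE_UTF8;
  if ((cflags & DEMO_REG_UCP) != 0)      options |= DEMO_PCRE_UCP;
  if ((cflags & DEMO_REG_UNGREEDY) != 0) options |= DEMO_PCRE_UNGREEDY;
  return options;
}
```

Because each flag is tested on its own, DEMO_REG_UTF8 | DEMO_REG_UCP translates to both option bits, and flags the function does not know about are ignored.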
--- a/ext/pcre/pcrelib/testdata/testinput10
+++ b/ext/pcre/pcrelib/testdata/testinput10
@ -1,7 +1,8 @@
 /-- These are a few representative patterns whose lengths and offsets are to be 
 shown when the link size is 2. This is just a doublecheck test to ensure the 
 sizes don't go horribly wrong when something is changed. The pattern contents 
-are all themselves checked in other tests. --/
+are all themselves checked in other tests. Unicode, including property support, 
+is required for these tests. --/

 /((?i)b)/BM

@ -121,4 +122,14 @@ are all themselves checked in other tests. --/

 /[^\xaa]/8BM

+/[^\d]/8WB
+
+/[[:^alpha:][:^cntrl:]]+/8WB
+
+/[[:^cntrl:][:^alpha:]]+/8WB
+
+/[[:alpha:]]+/8WB
+
+/[[:^alpha:]\S]+/8WB
+
 /-- End of testinput10 --/
--- a/ext/pcre/pcrelib/testdata/testinput2
+++ b/ext/pcre/pcrelib/testdata/testinput2
@ -2,12 +2,12 @@
    of PCRE's API, error diagnostics, and the compiled code of some patterns.
    It also checks the non-Perl syntax the PCRE supports (Python, .NET, 
    Oniguruma). Finally, there are some tests where PCRE and Perl differ, 
-    either because PCRE can't be compatible, or there is potential Perl 
+    either because PCRE can't be compatible, or there is a possible Perl 
    bug. --/  
  
-/-- Originally, the Perl 5.10 things were in here too, but now I have separated
-    many (most?) of them out into test 11. However, there may still be some
-    that were overlooked. --/   
+/-- Originally, the Perl 5.10 and 5.11 things were in here too, but now I have 
+    separated many (most?) of them out into test 11. However, there may still 
+    be some that were overlooked. --/   

 /(a)b|/I

@ -51,6 +51,16 @@

 /(?X)[\B]/

+/(?X)[\R]/
+
+/(?X)[\X]/
+
+/[\B]/BZ
+
+/[\R]/BZ
+
+/[\X]/BZ
+
 /[z-a]/

 /^*/
@ -2279,8 +2289,6 @@ a random value. /Ix
 /a+b?(*THEN)c+(*FAIL)/C
    aaabccc
    
-/a(*PRUNE:XXX)b/
-
 /a(*MARK)b/ 

 /(?i:A{1,}\6666666666)/
@ -3232,4 +3240,255 @@ a random value. /Ix

 /(?P<L1>(?P<L2>0|)|(?P>L2)(?P>L1))/

+/abc(*MARK:)pqr/
+
+/abc(*:)pqr/
+
+/abc(*FAIL:123)xyz/
+
+/--- This should, and does, fail. In Perl, it does not, which I think is a 
+     bug because replacing the B in the pattern by (B|D) does make it fail. ---/
+
+/A(*COMMIT)B/+K
+    ACABX
+
+/--- These should be different, but in Perl 5.11 are not, which I think
+     is a bug in Perl. ---/
+
+/A(*THEN)B|A(*THEN)C/K
+    AC
+
+/A(*PRUNE)B|A(*PRUNE)C/K
+    AC
+    
+/--- A whole lot of tests of verbs with arguments are here rather than in test
+     11 because Perl doesn't seem to follow its specification entirely 
+     correctly. ---/
+
+/--- Perl 5.11 sets $REGERROR on the AC failure case here; PCRE does not. It is
+     not clear how Perl defines "involved in the failure of the match". ---/ 
+
+/^(A(*THEN:A)B|C(*THEN:B)D)/K
+    AB
+    CD
+    ** Failers
+    AC
+    CB    
+    
+/--- Check the use of names for success and failure. PCRE doesn't show these 
+names for success, though Perl does, contrary to its spec. ---/
+
+/^(A(*PRUNE:A)B|C(*PRUNE:B)D)/K
+    AB
+    CD
+    ** Failers
+    AC
+    CB    
+    
+/--- An empty name does not pass back an empty string. It is the same as if no
+name were given. ---/ 
+
+/^(A(*PRUNE:)B|C(*PRUNE:B)D)/K
+    AB
+    CD 
+
+/--- PRUNE goes to next bumpalong; COMMIT does not. ---/
+    
+/A(*PRUNE:A)B/K
+    ACAB
+
+/(*MARK:A)(*PRUNE:B)(C|X)/K
+    C
+    D 
+
+/(*MARK:A)(*THEN:B)(C|X)/K
+    C
+    D 
+
+/--- This should fail, as the skip causes a bump to offset 3 (the skip) ---/
+
+/A(*MARK:A)A+(*SKIP)(B|Z) | AC/xK
+    AAAC
+
+/--- Same --/
+
+/A(*MARK:A)A+(*MARK:B)(*SKIP:B)(B|Z) | AC/xK
+    AAAC
+
+/--- This should fail; the SKIP advances by one, but when we get to AC, the
+     PRUNE kills it. ---/ 
+    
+/A(*PRUNE:A)A+(*SKIP:A)(B|Z) | AC/xK
+    AAAC
+
+/A(*:A)A+(*SKIP)(B|Z) | AC/xK
+    AAAC
+
+/--- This should fail, as a null name is the same as no name ---/
+
+/A(*MARK:A)A+(*SKIP:)(B|Z) | AC/xK
+    AAAC
+
+/--- This fails in PCRE, and I think that is in accordance with Perl's 
+     documentation, though in Perl it succeeds. ---/
+    
+/A(*MARK:A)A+(*SKIP:B)(B|Z) | AAC/xK
+    AAAC
+
+/--- Mark names can be duplicated ---/
+
+/A(*:A)B|X(*:A)Y/K
+    AABC
+    XXYZ 
+    
+/^A(*:A)B|^X(*:A)Y/K
+    ** Failers
+    XAQQ
+    
+/--- A check on what happens after hitting a mark and them bumping along to
+something that does not even start. Perl reports tags after the failures here, 
+though it does not when the individual letters are made into something 
+more complicated. ---/
+
+/A(*:A)B|XX(*:B)Y/K
+    AABC
+    XXYZ 
+    ** Failers
+    XAQQ  
+    XAQQXZZ  
+    AXQQQ 
+    AXXQQQ 
+    
+/--- COMMIT at the start of a pattern should be the same as an anchor. Perl 
+optimizations defeat this. So does the PCRE optimization unless we disable it 
+with \Y. ---/
+
+/(*COMMIT)ABC/
+    ABCDEFG
+    ** Failers
+    DEFGABC\Y  
+    
+/--- Repeat some tests with added studying. ---/
+
+/A(*COMMIT)B/+KS
+    ACABX
+ 
+/A(*THEN)B|A(*THEN)C/KS
+    AC
+
+/A(*PRUNE)B|A(*PRUNE)C/KS
+    AC
+
+/^(A(*THEN:A)B|C(*THEN:B)D)/KS
+    AB
+    CD
+    ** Failers
+    AC
+    CB    
+
+/^(A(*PRUNE:A)B|C(*PRUNE:B)D)/KS
+    AB
+    CD
+    ** Failers
+    AC
+    CB    
+
+/^(A(*PRUNE:)B|C(*PRUNE:B)D)/KS
+    AB
+    CD 
+
+/A(*PRUNE:A)B/KS
+    ACAB
+
+/(*MARK:A)(*PRUNE:B)(C|X)/KS
+    C
+    D 
+
+/(*MARK:A)(*THEN:B)(C|X)/KS
+    C
+    D 
+
+/A(*MARK:A)A+(*SKIP)(B|Z) | AC/xKS
+    AAAC
+
+/A(*MARK:A)A+(*MARK:B)(*SKIP:B)(B|Z) | AC/xKS
+    AAAC
+    
+/A(*PRUNE:A)A+(*SKIP:A)(B|Z) | AC/xKS
+    AAAC
+
+/A(*:A)A+(*SKIP)(B|Z) | AC/xKS
+    AAAC
+
+/A(*MARK:A)A+(*SKIP:)(B|Z) | AC/xKS
+    AAAC
+
+/A(*MARK:A)A+(*SKIP:B)(B|Z) | AAC/xKS
+    AAAC
+
+/A(*:A)B|XX(*:B)Y/KS
+    AABC
+    XXYZ 
+    ** Failers
+    XAQQ  
+    XAQQXZZ  
+    AXQQQ 
+    AXXQQQ 
+    
+/(*COMMIT)ABC/
+    ABCDEFG
+    ** Failers
+    DEFGABC\Y  
+
+/^(ab (c+(*THEN)cd) | xyz)/x
+    abcccd  
+
+/^(ab (c+(*PRUNE)cd) | xyz)/x
+    abcccd  
+
+/^(ab (c+(*FAIL)cd) | xyz)/x
+    abcccd  
+    
+/--- Perl 5.11 gets some of these wrong ---/ 
+
+/(?>.(*ACCEPT))*?5/
+    abcde
+
+/(.(*ACCEPT))*?5/
+    abcde
+
+/(.(*ACCEPT))5/
+    abcde
+
+/(.(*ACCEPT))*5/
+    abcde
+
+/A\NB./BZ
+  ACBD
+  ** Failers
+  A\nB
+  ACB\n   
+
+/A\NB./sBZ
+  ACBD
+  ACB\n 
+  ** Failers
+  A\nB  
+  
+/A\NB/<crlf>
+  A\nB
+  A\rB
+  ** Failers
+  A\r\nB    
+
+/\R+b/BZ
+
+/\R+\n/BZ
+
+/\R+\d/BZ
+
+/\d*\R/BZ
+
+/\s*\R/BZ
+
 /-- End of testinput2 --/
--- a/ext/pcre/pcrelib/testdata/testinput5
+++ b/ext/pcre/pcrelib/testdata/testinput5
@ -745,4 +745,53 @@ can't tell the difference.) --/
 /X\W{3}X/8
    \PX

+/\h/SI
+
+/\h/SI8
+    ABC\x{09}
+    ABC\x{20}
+    ABC\x{a0}
+    ABC\x{1680}
+    ABC\x{180e}
+    ABC\x{2000}
+    ABC\x{202f} 
+    ABC\x{205f} 
+    ABC\x{3000} 
+
+/\v/SI
+
+/\v/SI8
+    ABC\x{0a}
+    ABC\x{0b}
+    ABC\x{0c}
+    ABC\x{0d}
+    ABC\x{85}
+    ABC\x{2028}
+
+/\R/SI
+
+/\R/SI8
+
+/\h*A/SI8
+    CDBABC
+    
+/\v+A/SI8
+
+/\s?xxx\s/8SI
+
+/\sxxx\s/8T1
+    AB\x{85}xxx\x{a0}XYZ
+    AB\x{a0}xxx\x{85}XYZ
+
+/\sxxx\s/I8ST1
+    AB\x{85}xxx\x{a0}XYZ
+    AB\x{a0}xxx\x{85}XYZ
+
+/\S \S/8T1
+    \x{a2} \x{84} 
+
+/\S \S/I8ST1
+    \x{a2} \x{84} 
+    A Z 
+
 /-- End of testinput5 --/
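The \h and \v subjects above line up with the lead bytes that pcre_study.c sets in UTF-8 mode (0xC2, 0xE1, 0xE2, 0xE3), since a start-byte map can only record the first byte of each character. The sketch below computes that first byte so the correspondence can be checked by hand. demo_utf8_lead_byte() is a hypothetical helper, not a PCRE function, and it assumes a valid code point no greater than U+10FFFF.

```c
/* First byte of the UTF-8 encoding of code point c.
   Assumes c <= 0x10FFFF; no validation is attempted. */
unsigned int demo_utf8_lead_byte(unsigned long c)
{
  if (c < 0x80)    return (unsigned int)c;                   /* 1 byte  */
  if (c < 0x800)   return (unsigned int)(0xC0 | (c >> 6));   /* 2 bytes */
  if (c < 0x10000) return (unsigned int)(0xE0 | (c >> 12));  /* 3 bytes */
  return (unsigned int)(0xF0 | (c >> 18));                   /* 4 bytes */
}
```

U+00A0 and U+0085 both start with 0xC2; U+1680 and U+180E with 0xE1; U+2000, U+2028, U+202F and U+205F with 0xE2; and U+3000 with 0xE3.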
--- a/ext/pcre/pcrelib/testdata/testinput6
+++ b/ext/pcre/pcrelib/testdata/testinput6
@ -752,4 +752,54 @@
 /\p{Avestan}\p{Bamum}\p{Egyptian_Hieroglyphs}\p{Imperial_Aramaic}\p{Inscriptional_Pahlavi}\p{Inscriptional_Parthian}\p{Javanese}\p{Kaithi}\p{Lisu}\p{Meetei_Mayek}\p{Old_South_Arabian}\p{Old_Turkic}\p{Samaritan}\p{Tai_Tham}\p{Tai_Viet}/8
    \x{10b00}\x{a6ef}\x{13007}\x{10857}\x{10b78}\x{10b58}\x{a980}\x{110c1}\x{a4ff}\x{abc0}\x{10a7d}\x{10c48}\x{0800}\x{1aad}\x{aac0}

+/^\w+/8W
+    Az_\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee}
+
+/^[[:xdigit:]]*/8W
+    1a\x{660}\x{bef}\x{16ee}
+  
+/^\d+/8W
+    1\x{660}\x{bef}\x{16ee}
+  
+/^[[:digit:]]+/8W
+    1\x{660}\x{bef}\x{16ee}
+
+/^>\s+/8W
+    >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} 
+  
+/^>\pZ+/8W
+    >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} 
+  
+/^>[[:space:]]*/8W
+    >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} 
+
+/^>[[:blank:]]*/8W
+    >\x{20}\x{a0}\x{1680}\x{180e}\x{2000}\x{202f}\x{9}\x{b}\x{2028} 
+
+/^[[:alpha:]]*/8W
+    Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}
+
+/^[[:alnum:]]*/8W
+    Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee}
+
+/^[[:cntrl:]]*/8W
+    \x{0}\x{09}\x{1f}\x{7f}\x{9f} 
+
+/^[[:graph:]]*/8W
+    A\x{a1}\x{a0}
+
+/^[[:print:]]*/8W
+    A z\x{a0}\x{a1}
+
+/^[[:punct:]]*/8W
+    .+\x{a1}\x{a0}
+
+/\p{Zs}*?\R/
+    ** Failers
+    a\xFCb   
+
+/\p{Zs}*\R/                                                                    
+    ** Failers 
+    a\xFCb   
+
 /-- End of testinput6 --/
--- a/ext/pcre/pcrelib/testdata/testinput9
+++ b/ext/pcre/pcrelib/testdata/testinput9
@ -847,4 +847,143 @@
    ** Failers 
    \x{1d79}\x{a77d} 

+/^\p{Xan}/8
+    ABCD
+    1234
+    \x{6ca}
+    \x{a6c}
+    \x{10a7}   
+    ** Failers
+    _ABC   
+
+/^\p{Xan}+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    ** Failers
+    _ABC   
+
+/^\p{Xan}*/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    
+/^\p{Xan}{2,9}/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    
+/^[\p{Xan}]/8
+    ABCD1234_
+    1234abcd_
+    \x{6ca}
+    \x{a6c}
+    \x{10a7}   
+    ** Failers
+    _ABC   
+ 
+/^[\p{Xan}]+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    ** Failers
+    _ABC   
+
+/^>\p{Xsp}/8
+    >\x{1680}\x{2028}\x{0b}
+    ** Failers
+    \x{0b} 
+
+/^>\p{Xsp}+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xsp}*/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>\p{Xsp}{2,9}/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>[\p{Xsp}]/8
+    >\x{2028}\x{0b}
+ 
+/^>[\p{Xsp}]+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}/8
+    >\x{1680}\x{2028}\x{0b}
+    >\x{a0} 
+    ** Failers
+    \x{0b} 
+
+/^>\p{Xps}+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}+?/8
+    >\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}*/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>\p{Xps}{2,9}/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>\p{Xps}{2,9}?/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>[\p{Xps}]/8
+    >\x{2028}\x{0b}
+ 
+/^>[\p{Xps}]+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^\p{Xwd}/8
+    ABCD
+    1234
+    \x{6ca}
+    \x{a6c}
+    \x{10a7}
+    _ABC    
+    ** Failers
+    [] 
+
+/^\p{Xwd}+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xwd}*/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    
+/^\p{Xwd}{2,9}/8
+    A_12\x{6ca}\x{a6c}\x{10a7}
+    
+/^[\p{Xwd}]/8
+    ABCD1234_
+    1234abcd_
+    \x{6ca}
+    \x{a6c}
+    \x{10a7}   
+    _ABC 
+    ** Failers
+    []   
+ 
+/^[\p{Xwd}]+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/-- Unicode properties for \b abd \B --/
+
+/\b...\B/8W
+    abc_
+    \x{37e}abc\x{376} 
+    \x{37e}\x{376}\x{371}\x{393}\x{394} 
+    !\x{c0}++\x{c1}\x{c2} 
+    !\x{c0}+++++ 
+
+/-- Without PCRE_UCP, non-ASCII always fail, even if < 256  --/
+
+/\b...\B/8
+    abc_
+    ** Failers 
+    \x{37e}abc\x{376} 
+    \x{37e}\x{376}\x{371}\x{393}\x{394} 
+    !\x{c0}++\x{c1}\x{c2} 
+    !\x{c0}+++++ 
+
+/-- With PCRE_UCP, non-UTF8 chars that are < 256 still check properties  --/
+
+/\b...\B/W
+    abc_
+    !\x{c0}++\x{c1}\x{c2} 
+    !\x{c0}+++++ 
+
 /-- End of testinput9 --/ 
--- a/ext/pcre/pcrelib/testdata/testoutput10
+++ b/ext/pcre/pcrelib/testdata/testoutput10
@ -1,7 +1,8 @@
 /-- These are a few representative patterns whose lengths and offsets are to be 
 shown when the link size is 2. This is just a doublecheck test to ensure the 
 sizes don't go horribly wrong when something is changed. The pattern contents 
-are all themselves checked in other tests. --/
+are all themselves checked in other tests. Unicode, including property support, 
+is required for these tests. --/

 /((?i)b)/BM
 Memory allocation (code space): 21
@ -666,4 +667,44 @@ Memory allocation (code space): 40
 39     End
 ------------------------------------------------------------------

+/[^\d]/8WB
+------------------------------------------------------------------
+  0  11 Bra
+  3     [^\p{Nd}]
+ 11  11 Ket
+ 14     End
+------------------------------------------------------------------
+
+/[[:^alpha:][:^cntrl:]]+/8WB
+------------------------------------------------------------------
+  0  44 Bra
+  3     [ -~\x80-\xff\P{L}]+
+ 44  44 Ket
+ 47     End
+------------------------------------------------------------------
+
+/[[:^cntrl:][:^alpha:]]+/8WB
+------------------------------------------------------------------
+  0  44 Bra
+  3     [ -~\x80-\xff\P{L}]+
+ 44  44 Ket
+ 47     End
+------------------------------------------------------------------
+
+/[[:alpha:]]+/8WB
+------------------------------------------------------------------
+  0  12 Bra
+  3     [\p{L}]+
+ 12  12 Ket
+ 15     End
+------------------------------------------------------------------
+
+/[[:^alpha:]\S]+/8WB
+------------------------------------------------------------------
+  0  15 Bra
+  3     [\P{L}\P{Xsp}]+
+ 15  15 Ket
+ 18     End
+------------------------------------------------------------------
+
 /-- End of testinput10 --/
--- a/ext/pcre/pcrelib/testdata/testoutput2
+++ b/ext/pcre/pcrelib/testdata/testoutput2
@ -2,12 +2,12 @@
    of PCRE's API, error diagnostics, and the compiled code of some patterns.
    It also checks the non-Perl syntax the PCRE supports (Python, .NET, 
    Oniguruma). Finally, there are some tests where PCRE and Perl differ, 
-    either because PCRE can't be compatible, or there is potential Perl 
+    either because PCRE can't be compatible, or there is a possible Perl 
    bug. --/  
  
-/-- Originally, the Perl 5.10 things were in here too, but now I have separated
-    many (most?) of them out into test 11. However, there may still be some
-    that were overlooked. --/   
+/-- Originally, the Perl 5.10 and 5.11 things were in here too, but now I have 
+    separated many (most?) of them out into test 11. However, there may still 
+    be some that were overlooked. --/   

 /(a)b|/I
 Capturing subpattern count = 1
@ -103,6 +103,36 @@ Failed: missing terminating ] for character class at offset 5
 /(?X)[\B]/
 Failed: invalid escape sequence in character class at offset 6

+/(?X)[\R]/
+Failed: invalid escape sequence in character class at offset 6
+
+/(?X)[\X]/
+Failed: invalid escape sequence in character class at offset 6
+
+/[\B]/BZ
+------------------------------------------------------------------
+        Bra
+        B
+        Ket
+        End
+------------------------------------------------------------------
+
+/[\R]/BZ
+------------------------------------------------------------------
+        Bra
+        R
+        Ket
+        End
+------------------------------------------------------------------
+
+/[\X]/BZ
+------------------------------------------------------------------
+        Bra
+        X
+        Ket
+        End
+------------------------------------------------------------------
+
 /[z-a]/
 Failed: range out of order in character class at offset 3

@ -3198,19 +3228,19 @@ Failed: POSIX collating elements are not supported at offset 0
 Failed: POSIX named classes are supported only within a class at offset 0

 /\l/I
-Failed: PCRE does not support \L, \l, \N, \U, or \u at offset 1
+Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1

 /\L/I
-Failed: PCRE does not support \L, \l, \N, \U, or \u at offset 1
+Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1

 /\N{name}/I
-Failed: PCRE does not support \L, \l, \N, \U, or \u at offset 1
+Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1

 /\u/I
-Failed: PCRE does not support \L, \l, \N, \U, or \u at offset 1
+Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1

 /\U/I
-Failed: PCRE does not support \L, \l, \N, \U, or \u at offset 1
+Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1

 /[/I
 Failed: missing terminating ] for character class at offset 1
@ -8667,11 +8697,8 @@ No match
 +13   ^  ^      (*FAIL)
 No match
    
-/a(*PRUNE:XXX)b/
-Failed: (*VERB) with an argument is not supported at offset 8
-
 /a(*MARK)b/ 
-Failed: (*VERB) not recognized at offset 7
+Failed: (*MARK) must have an argument at offset 7

 /(?i:A{1,}\6666666666)/
 Failed: number is too big at offset 19
@ -10668,4 +10695,435 @@ No match
 /(?P<L1>(?P<L2>0|)|(?P>L2)(?P>L1))/
 Failed: recursive call could loop indefinitely at offset 31

+/abc(*MARK:)pqr/
+Failed: (*MARK) must have an argument at offset 10
+
+/abc(*:)pqr/
+Failed: (*MARK) must have an argument at offset 6
+
+/abc(*FAIL:123)xyz/
+Failed: an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT) at offset 13
+
+/--- This should, and does, fail. In Perl, it does not, which I think is a 
+     bug because replacing the B in the pattern by (B|D) does make it fail. ---/
+
+/A(*COMMIT)B/+K
+    ACABX
+No match
+
+/--- These should be different, but in Perl 5.11 are not, which I think
+     is a bug in Perl. ---/
+
+/A(*THEN)B|A(*THEN)C/K
+    AC
+ 0: AC
+
+/A(*PRUNE)B|A(*PRUNE)C/K
+    AC
+No match
+    
+/--- A whole lot of tests of verbs with arguments are here rather than in test
+     11 because Perl doesn't seem to follow its specification entirely 
+     correctly. ---/
+
+/--- Perl 5.11 sets $REGERROR on the AC failure case here; PCRE does not. It is
+     not clear how Perl defines "involved in the failure of the match". ---/ 
+
+/^(A(*THEN:A)B|C(*THEN:B)D)/K
+    AB
+ 0: AB
+ 1: AB
+    CD
+ 0: CD
+ 1: CD
+    ** Failers
+No match
+    AC
+No match
+    CB    
+No match, mark = B
+    
+/--- Check the use of names for success and failure. PCRE doesn't show these 
+names for success, though Perl does, contrary to its spec. ---/
+
+/^(A(*PRUNE:A)B|C(*PRUNE:B)D)/K
+    AB
+ 0: AB
+ 1: AB
+    CD
+ 0: CD
+ 1: CD
+    ** Failers
+No match
+    AC
+No match, mark = A
+    CB    
+No match, mark = B
+    
+/--- An empty name does not pass back an empty string. It is the same as if no
+name were given. ---/ 
+
+/^(A(*PRUNE:)B|C(*PRUNE:B)D)/K
+    AB
+ 0: AB
+ 1: AB
+    CD 
+ 0: CD
+ 1: CD
+
+/--- PRUNE goes to next bumpalong; COMMIT does not. ---/
+    
+/A(*PRUNE:A)B/K
+    ACAB
+ 0: AB
+
+/(*MARK:A)(*PRUNE:B)(C|X)/K
+    C
+ 0: C
+ 1: C
+MK: A
+    D 
+No match, mark = B
+
+/(*MARK:A)(*THEN:B)(C|X)/K
+    C
+ 0: C
+ 1: C
+MK: A
+    D 
+No match, mark = B
+
+/--- This should fail, as the skip causes a bump to offset 3 (the skip) ---/
+
+/A(*MARK:A)A+(*SKIP)(B|Z) | AC/xK
+    AAAC
+No match
+
+/--- Same --/
+
+/A(*MARK:A)A+(*MARK:B)(*SKIP:B)(B|Z) | AC/xK
+    AAAC
+No match
+
+/--- This should fail; the SKIP advances by one, but when we get to AC, the
+     PRUNE kills it. ---/ 
+    
+/A(*PRUNE:A)A+(*SKIP:A)(B|Z) | AC/xK
+    AAAC
+No match
+
+/A(*:A)A+(*SKIP)(B|Z) | AC/xK
+    AAAC
+No match
+
+/--- This should fail, as a null name is the same as no name ---/
+
+/A(*MARK:A)A+(*SKIP:)(B|Z) | AC/xK
+    AAAC
+No match
+
+/--- This fails in PCRE, and I think that is in accordance with Perl's 
+     documentation, though in Perl it succeeds. ---/
+    
+/A(*MARK:A)A+(*SKIP:B)(B|Z) | AAC/xK
+    AAAC
+No match
+
+/--- Mark names can be duplicated ---/
+
+/A(*:A)B|X(*:A)Y/K
+    AABC
+ 0: AB
+MK: A
+    XXYZ 
+ 0: XY
+MK: A
+    
+/^A(*:A)B|^X(*:A)Y/K
+    ** Failers
+No match
+    XAQQ
+No match, mark = A
+    
+/--- A check on what happens after hitting a mark and them bumping along to
+something that does not even start. Perl reports tags after the failures here, 
+though it does not when the individual letters are made into something 
+more complicated. ---/
+
+/A(*:A)B|XX(*:B)Y/K
+    AABC
+ 0: AB
+MK: A
+    XXYZ 
+ 0: XXY
+MK: B
+    ** Failers
+No match
+    XAQQ  
+No match
+    XAQQXZZ  
+No match
+    AXQQQ 
+No match
+    AXXQQQ 
+No match
+    
+/--- COMMIT at the start of a pattern should be the same as an anchor. Perl 
+optimizations defeat this. So does the PCRE optimization unless we disable it 
+with \Y. ---/
+
+/(*COMMIT)ABC/
+    ABCDEFG
+ 0: ABC
+    ** Failers
+No match
+    DEFGABC\Y  
+No match
+    
+/--- Repeat some tests with added studying. ---/
+
+/A(*COMMIT)B/+KS
+    ACABX
+No match
+ 
+/A(*THEN)B|A(*THEN)C/KS
+    AC
+ 0: AC
+
+/A(*PRUNE)B|A(*PRUNE)C/KS
+    AC
+No match
+
+/^(A(*THEN:A)B|C(*THEN:B)D)/KS
+    AB
+ 0: AB
+ 1: AB
+    CD
+ 0: CD
+ 1: CD
+    ** Failers
+No match
+    AC
+No match
+    CB    
+No match, mark = B
+
+/^(A(*PRUNE:A)B|C(*PRUNE:B)D)/KS
+    AB
+ 0: AB
+ 1: AB
+    CD
+ 0: CD
+ 1: CD
+    ** Failers
+No match
+    AC
+No match, mark = A
+    CB    
+No match, mark = B
+
+/^(A(*PRUNE:)B|C(*PRUNE:B)D)/KS
+    AB
+ 0: AB
+ 1: AB
+    CD 
+ 0: CD
+ 1: CD
+
+/A(*PRUNE:A)B/KS
+    ACAB
+ 0: AB
+
+/(*MARK:A)(*PRUNE:B)(C|X)/KS
+    C
+ 0: C
+ 1: C
+MK: A
+    D 
+No match
+
+/(*MARK:A)(*THEN:B)(C|X)/KS
+    C
+ 0: C
+ 1: C
+MK: A
+    D 
+No match
+
+/A(*MARK:A)A+(*SKIP)(B|Z) | AC/xKS
+    AAAC
+No match
+
+/A(*MARK:A)A+(*MARK:B)(*SKIP:B)(B|Z) | AC/xKS
+    AAAC
+No match
+    
+/A(*PRUNE:A)A+(*SKIP:A)(B|Z) | AC/xKS
+    AAAC
+No match
+
+/A(*:A)A+(*SKIP)(B|Z) | AC/xKS
+    AAAC
+No match
+
+/A(*MARK:A)A+(*SKIP:)(B|Z) | AC/xKS
+    AAAC
+No match
+
+/A(*MARK:A)A+(*SKIP:B)(B|Z) | AAC/xKS
+    AAAC
+No match
+
+/A(*:A)B|XX(*:B)Y/KS
+    AABC
+ 0: AB
+MK: A
+    XXYZ 
+ 0: XXY
+MK: B
+    ** Failers
+No match
+    XAQQ  
+No match
+    XAQQXZZ  
+No match
+    AXQQQ 
+No match
+    AXXQQQ 
+No match
+    
+/(*COMMIT)ABC/
+    ABCDEFG
+ 0: ABC
+    ** Failers
+No match
+    DEFGABC\Y  
+No match
+
+/^(ab (c+(*THEN)cd) | xyz)/x
+    abcccd  
+No match
+
+/^(ab (c+(*PRUNE)cd) | xyz)/x
+    abcccd  
+No match
+
+/^(ab (c+(*FAIL)cd) | xyz)/x
+    abcccd  
+No match
+    
+/--- Perl 5.11 gets some of these wrong ---/ 
+
+/(?>.(*ACCEPT))*?5/
+    abcde
+ 0: a
+
+/(.(*ACCEPT))*?5/
+    abcde
+ 0: a
+ 1: a
+
+/(.(*ACCEPT))5/
+    abcde
+ 0: a
+ 1: a
+
+/(.(*ACCEPT))*5/
+    abcde
+ 0: a
+ 1: a
+
+/A\NB./BZ
+------------------------------------------------------------------
+        Bra
+        A
+        Any
+        B
+        Any
+        Ket
+        End
+------------------------------------------------------------------
+  ACBD
+ 0: ACBD
+  ** Failers
+No match
+  A\nB
+No match
+  ACB\n   
+No match
+
+/A\NB./sBZ
+------------------------------------------------------------------
+        Bra
+        A
+        Any
+        B
+        AllAny
+        Ket
+        End
+------------------------------------------------------------------
+  ACBD
+ 0: ACBD
+  ACB\n 
+ 0: ACB\x0a
+  ** Failers
+No match
+  A\nB  
+No match
+  
+/A\NB/<crlf>
+  A\nB
+ 0: A\x0aB
+  A\rB
+ 0: A\x0dB
+  ** Failers
+No match
+  A\r\nB    
+No match
+
+/\R+b/BZ
+------------------------------------------------------------------
+        Bra
+        \R++
+        b
+        Ket
+        End
+------------------------------------------------------------------
+
+/\R+\n/BZ
+------------------------------------------------------------------
+        Bra
+        \R+
+        \x0a
+        Ket
+        End
+------------------------------------------------------------------
+
+/\R+\d/BZ
+------------------------------------------------------------------
+        Bra
+        \R++
+        \d
+        Ket
+        End
+------------------------------------------------------------------
+
+/\d*\R/BZ
+------------------------------------------------------------------
+        Bra
+        \d*+
+        \R
+        Ket
+        End
+------------------------------------------------------------------
+
+/\s*\R/BZ
+------------------------------------------------------------------
+        Bra
+        \s*+
+        \R
+        Ket
+        End
+------------------------------------------------------------------
+
 /-- End of testinput2 --/
--- a/ext/pcre/pcrelib/testdata/testoutput5
+++ b/ext/pcre/pcrelib/testdata/testoutput5
@ -2076,4 +2076,150 @@ Partial match: abcde
    \PX
 Partial match: X

+/\h/SI
+Capturing subpattern count = 0
+No options
+No first char
+No need char
+Subject length lower bound = 1
+Starting byte set: \x09 \x20 \xa0 
+
+/\h/SI8
+Capturing subpattern count = 0
+Options: utf8
+No first char
+No need char
+Subject length lower bound = 1
+Starting byte set: \x09 \x20 \xc2 \xe1 \xe2 \xe3 
+    ABC\x{09}
+ 0: \x{09}
+    ABC\x{20}
+ 0:  
+    ABC\x{a0}
+ 0: \x{a0}
+    ABC\x{1680}
+ 0: \x{1680}
+    ABC\x{180e}
+ 0: \x{180e}
+    ABC\x{2000}
+ 0: \x{2000}
+    ABC\x{202f} 
+ 0: \x{202f}
+    ABC\x{205f} 
+ 0: \x{205f}
+    ABC\x{3000} 
+ 0: \x{3000}
+
+/\v/SI
+Capturing subpattern count = 0
+No options
+No first char
+No need char
+Subject length lower bound = 1
+Starting byte set: \x0a \x0b \x0c \x0d \x85 
+
+/\v/SI8
+Capturing subpattern count = 0
+Options: utf8
+No first char
+No need char
+Subject length lower bound = 1
+Starting byte set: \x0a \x0b \x0c \x0d \xc2 \xe2 
+    ABC\x{0a}
+ 0: \x{0a}
+    ABC\x{0b}
+ 0: \x{0b}
+    ABC\x{0c}
+ 0: \x{0c}
+    ABC\x{0d}
+ 0: \x{0d}
+    ABC\x{85}
+ 0: \x{85}
+    ABC\x{2028}
+ 0: \x{2028}
+
+/\R/SI
+Capturing subpattern count = 0
+No options
+No first char
+No need char
+Subject length lower bound = 2
+Starting byte set: \x0a \x0b \x0c \x0d \x85 
+
+/\R/SI8
+Capturing subpattern count = 0
+Options: utf8
+No first char
+No need char
+Subject length lower bound = 2
+Starting byte set: \x0a \x0b \x0c \x0d \xc2 \xe2 
+
+/\h*A/SI8
+Capturing subpattern count = 0
+Options: utf8
+No first char
+Need char = 'A'
+Subject length lower bound = 1
+Starting byte set: \x09 \x20 A \xc2 \xe1 \xe2 \xe3 
+    CDBABC
+ 0: A
+    
+/\v+A/SI8
+Capturing subpattern count = 0
+Options: utf8
+No first char
+Need char = 'A'
+Subject length lower bound = 2
+Starting byte set: \x0a \x0b \x0c \x0d \xc2 \xe2 
+
+/\s?xxx\s/8SI
+Capturing subpattern count = 0
+Options: utf8
+No first char
+Need char = 'x'
+Subject length lower bound = 4
+Starting byte set: \x09 \x0a \x0c \x0d \x20 x 
+
+/\sxxx\s/8T1
+    AB\x{85}xxx\x{a0}XYZ
+ 0: \x{85}xxx\x{a0}
+    AB\x{a0}xxx\x{85}XYZ
+ 0: \x{a0}xxx\x{85}
+
+/\sxxx\s/I8ST1
+Capturing subpattern count = 0
+Options: utf8
+No first char
+Need char = 'x'
+Subject length lower bound = 5
+Starting byte set: \x09 \x0a \x0c \x0d \x20 \xc2 
+    AB\x{85}xxx\x{a0}XYZ
+ 0: \x{85}xxx\x{a0}
+    AB\x{a0}xxx\x{85}XYZ
+ 0: \x{a0}xxx\x{85}
+
+/\S \S/8T1
+    \x{a2} \x{84} 
+ 0: \x{a2} \x{84}
+
+/\S \S/I8ST1
+Capturing subpattern count = 0
+Options: utf8
+No first char
+Need char = ' '
+Subject length lower bound = 3
+Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e 
+  \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d 
+  \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ 
+  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e 
+  f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 
+  \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 
+  \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 
+  \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 
+  \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff 
+    \x{a2} \x{84} 
+ 0: \x{a2} \x{84}
+    A Z 
+ 0: A Z
+
 /-- End of testinput5 --/
--- a/ext/pcre/pcrelib/testdata/testoutput6
+++ b/ext/pcre/pcrelib/testdata/testoutput6
@ -1285,4 +1285,72 @@ No match
    \x{10b00}\x{a6ef}\x{13007}\x{10857}\x{10b78}\x{10b58}\x{a980}\x{110c1}\x{a4ff}\x{abc0}\x{10a7d}\x{10c48}\x{0800}\x{1aad}\x{aac0}
 0: \x{10b00}\x{a6ef}\x{13007}\x{10857}\x{10b78}\x{10b58}\x{a980}\x{110c1}\x{a4ff}\x{abc0}\x{10a7d}\x{10c48}\x{800}\x{1aad}\x{aac0}

+/^\w+/8W
+    Az_\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee}
+ 0: Az_\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee}
+
+/^[[:xdigit:]]*/8W
+    1a\x{660}\x{bef}\x{16ee}
+ 0: 1a
+  
+/^\d+/8W
+    1\x{660}\x{bef}\x{16ee}
+ 0: 1\x{660}\x{bef}
+  
+/^[[:digit:]]+/8W
+    1\x{660}\x{bef}\x{16ee}
+ 0: 1\x{660}\x{bef}
+
+/^>\s+/8W
+    >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} 
+ 0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{09}
+  
+/^>\pZ+/8W
+    >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} 
+ 0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f}
+  
+/^>[[:space:]]*/8W
+    >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} 
+ 0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{09}\x{0b}
+
+/^>[[:blank:]]*/8W
+    >\x{20}\x{a0}\x{1680}\x{180e}\x{2000}\x{202f}\x{9}\x{b}\x{2028} 
+ 0: > \x{a0}\x{1680}\x{180e}\x{2000}\x{202f}\x{09}
+
+/^[[:alpha:]]*/8W
+    Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}
+ 0: Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}
+
+/^[[:alnum:]]*/8W
+    Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee}
+ 0: Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee}
+
+/^[[:cntrl:]]*/8W
+    \x{0}\x{09}\x{1f}\x{7f}\x{9f} 
+ 0: \x{00}\x{09}\x{1f}\x{7f}
+
+/^[[:graph:]]*/8W
+    A\x{a1}\x{a0}
+ 0: A
+
+/^[[:print:]]*/8W
+    A z\x{a0}\x{a1}
+ 0: A z
+
+/^[[:punct:]]*/8W
+    .+\x{a1}\x{a0}
+ 0: .+
+
+/\p{Zs}*?\R/
+    ** Failers
+No match
+    a\xFCb   
+No match
+
+/\p{Zs}*\R/                                                                    
+    ** Failers 
+No match
+    a\xFCb   
+No match
+
 /-- End of testinput6 --/
--- a/ext/pcre/pcrelib/testdata/testoutput9
+++ b/ext/pcre/pcrelib/testdata/testoutput9
@ -1674,4 +1674,364 @@ No match
    \x{1d79}\x{a77d} 
 No match

+/^\p{Xan}/8
+    ABCD
+ 0: A
+    1234
+ 0: 1
+    \x{6ca}
+ 0: \x{6ca}
+    \x{a6c}
+ 0: \x{a6c}
+    \x{10a7}   
+ 0: \x{10a7}
+    ** Failers
+No match
+    _ABC   
+No match
+
+/^\p{Xan}+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 1: ABCD1234\x{6ca}\x{a6c}
+ 2: ABCD1234\x{6ca}
+ 3: ABCD1234
+ 4: ABCD123
+ 5: ABCD12
+ 6: ABCD1
+ 7: ABCD
+ 8: ABC
+ 9: AB
+10: A
+    ** Failers
+No match
+    _ABC   
+No match
+
+/^\p{Xan}*/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 1: ABCD1234\x{6ca}\x{a6c}
+ 2: ABCD1234\x{6ca}
+ 3: ABCD1234
+ 4: ABCD123
+ 5: ABCD12
+ 6: ABCD1
+ 7: ABCD
+ 8: ABC
+ 9: AB
+10: A
+11: 
+    
+/^\p{Xan}{2,9}/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}
+ 1: ABCD1234
+ 2: ABCD123
+ 3: ABCD12
+ 4: ABCD1
+ 5: ABCD
+ 6: ABC
+ 7: AB
+    
+/^[\p{Xan}]/8
+    ABCD1234_
+ 0: A
+    1234abcd_
+ 0: 1
+    \x{6ca}
+ 0: \x{6ca}
+    \x{a6c}
+ 0: \x{a6c}
+    \x{10a7}   
+ 0: \x{10a7}
+    ** Failers
+No match
+    _ABC   
+No match
+ 
+/^[\p{Xan}]+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 1: ABCD1234\x{6ca}\x{a6c}
+ 2: ABCD1234\x{6ca}
+ 3: ABCD1234
+ 4: ABCD123
+ 5: ABCD12
+ 6: ABCD1
+ 7: ABCD
+ 8: ABC
+ 9: AB
+10: A
+    ** Failers
+No match
+    _ABC   
+No match
+
+/^>\p{Xsp}/8
+    >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}
+    ** Failers
+No match
+    \x{0b} 
+No match
+
+/^>\p{Xsp}+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}
+ 4: > \x{09}\x{0a}\x{0c}
+ 5: > \x{09}\x{0a}
+ 6: > \x{09}
+ 7: > 
+
+/^>\p{Xsp}*/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}
+ 4: > \x{09}\x{0a}\x{0c}
+ 5: > \x{09}\x{0a}
+ 6: > \x{09}
+ 7: > 
+ 8: >
+    
+/^>\p{Xsp}{2,9}/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}
+ 4: > \x{09}\x{0a}\x{0c}
+ 5: > \x{09}\x{0a}
+ 6: > \x{09}
+    
+/^>[\p{Xsp}]/8
+    >\x{2028}\x{0b}
+ 0: >\x{2028}
+ 
+/^>[\p{Xsp}]+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}
+ 4: > \x{09}\x{0a}\x{0c}
+ 5: > \x{09}\x{0a}
+ 6: > \x{09}
+ 7: > 
+
+/^>\p{Xps}/8
+    >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}
+    >\x{a0} 
+ 0: >\x{a0}
+    ** Failers
+No match
+    \x{0b} 
+No match
+
+/^>\p{Xps}+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: > 
+
+/^>\p{Xps}+?/8
+    >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}\x{2028}\x{0b}
+ 1: >\x{1680}\x{2028}
+ 2: >\x{1680}
+
+/^>\p{Xps}*/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: > 
+ 9: >
+    
+/^>\p{Xps}{2,9}/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+    
+/^>\p{Xps}{2,9}?/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+    
+/^>[\p{Xps}]/8
+    >\x{2028}\x{0b}
+ 0: >\x{2028}
+ 
+/^>[\p{Xps}]+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: > 
+
+/^\p{Xwd}/8
+    ABCD
+ 0: A
+    1234
+ 0: 1
+    \x{6ca}
+ 0: \x{6ca}
+    \x{a6c}
+ 0: \x{a6c}
+    \x{10a7}
+ 0: \x{10a7}
+    _ABC    
+ 0: _
+    ** Failers
+No match
+    [] 
+No match
+
+/^\p{Xwd}+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 1: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 2: ABCD1234\x{6ca}\x{a6c}
+ 3: ABCD1234\x{6ca}
+ 4: ABCD1234
+ 5: ABCD123
+ 6: ABCD12
+ 7: ABCD1
+ 8: ABCD
+ 9: ABC
+10: AB
+11: A
+
+/^\p{Xwd}*/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 1: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 2: ABCD1234\x{6ca}\x{a6c}
+ 3: ABCD1234\x{6ca}
+ 4: ABCD1234
+ 5: ABCD123
+ 6: ABCD12
+ 7: ABCD1
+ 8: ABCD
+ 9: ABC
+10: AB
+11: A
+12: 
+    
+/^\p{Xwd}{2,9}/8
+    A_12\x{6ca}\x{a6c}\x{10a7}
+ 0: A_12\x{6ca}\x{a6c}\x{10a7}
+ 1: A_12\x{6ca}\x{a6c}
+ 2: A_12\x{6ca}
+ 3: A_12
+ 4: A_1
+ 5: A_
+    
+/^[\p{Xwd}]/8
+    ABCD1234_
+ 0: A
+    1234abcd_
+ 0: 1
+    \x{6ca}
+ 0: \x{6ca}
+    \x{a6c}
+ 0: \x{a6c}
+    \x{10a7}   
+ 0: \x{10a7}
+    _ABC 
+ 0: _
+    ** Failers
+No match
+    []   
+No match
+ 
+/^[\p{Xwd}]+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 1: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 2: ABCD1234\x{6ca}\x{a6c}
+ 3: ABCD1234\x{6ca}
+ 4: ABCD1234
+ 5: ABCD123
+ 6: ABCD12
+ 7: ABCD1
+ 8: ABCD
+ 9: ABC
+10: AB
+11: A
+
+/-- Unicode properties for \b abd \B --/
+
+/\b...\B/8W
+    abc_
+ 0: abc
+    \x{37e}abc\x{376} 
+ 0: abc
+    \x{37e}\x{376}\x{371}\x{393}\x{394} 
+ 0: \x{376}\x{371}\x{393}
+    !\x{c0}++\x{c1}\x{c2} 
+ 0: ++\x{c1}
+    !\x{c0}+++++ 
+ 0: \x{c0}++
+
+/-- Without PCRE_UCP, non-ASCII always fail, even if < 256  --/
+
+/\b...\B/8
+    abc_
+ 0: abc
+    ** Failers 
+ 0: Fai
+    \x{37e}abc\x{376} 
+No match
+    \x{37e}\x{376}\x{371}\x{393}\x{394} 
+No match
+    !\x{c0}++\x{c1}\x{c2} 
+No match
+    !\x{c0}+++++ 
+No match
+
+/-- With PCRE_UCP, non-UTF8 chars that are < 256 still check properties  --/
+
+/\b...\B/W
+    abc_
+ 0: abc
+    !\x{c0}++\x{c1}\x{c2} 
+ 0: ++\xc1
+    !\x{c0}+++++ 
+ 0: \xc0++
+
 /-- End of testinput9 --/