upgrade PCRE to version 7.5

2025-01-22 11:44:09 +08:00 · 2008-01-13 12:44:57 +00:00
commit 4c501a0ab6
parent ccc0d6e32b
25 changed files with 1066 additions and 2254 deletions
--- a/2
+++ b/2
@ -29,7 +29,7 @@ PHP                                                                        NEWS
  invoking the date parser. (Scott)

 - Removed the experimental RPL (master/slave) functions from mysqli. (Andrey)
- Upgraded PCRE to version 7.4 (Nuno)
+- Upgraded PCRE to version 7.5 (Nuno)

 - Improved php.ini handling: (Jani)
  . Added ".htaccess" style user-defined php.ini files support for CGI/FastCGI
--- a/ext/pcre/pcrelib/ChangeLog
+++ b/ext/pcre/pcrelib/ChangeLog
@ -1,6 +1,143 @@
 ChangeLog for PCRE
 ------------------

+Version 7.5 10-Jan-08
+---------------------
+
+1.  Applied a patch from Craig: "This patch makes it possible to 'ignore'
+    values in parens when parsing an RE using the C++ wrapper."
+
+2.  Negative specials like \S did not work in character classes in UTF-8 mode.
+    Characters greater than 255 were excluded from the class instead of being
+    included.
+
+3.  The same bug as (2) above applied to negated POSIX classes such as
+    [:^space:].
+
+4.  PCRECPP_STATIC was referenced in pcrecpp_internal.h, but nowhere was it
+    defined or documented. It seems to have been a typo for PCRE_STATIC, so
+    I have changed it.
+
+5.  The construct (?&) was not diagnosed as a syntax error (it referenced the
+    first named subpattern) and a construct such as (?&a) would reference the
+    first named subpattern whose name started with "a" (in other words, the
+    length check was missing). Both these problems are fixed. "Subpattern name
+    expected" is now given for (?&) (a zero-length name), and this patch also
+    makes it give the same error for \k'' (previously it complained that that
+    was a reference to a non-existent subpattern).
+
+6.  The erroneous patterns (?+-a) and (?-+a) give different error messages;
+    this is right because (?- can be followed by option settings as well as by
+    digits. I have, however, made the messages clearer.
+
+7.  Patterns such as (?(1)a|b) (a pattern that contains fewer subpatterns
+    than the number used in the conditional) now cause a compile-time error.
+    This is actually not compatible with Perl, which accepts such patterns, but
+    treats the conditional as always being FALSE (as PCRE used to), but it
+    seems to me that giving a diagnostic is better.
+
+8.  Change "alphameric" to the more common word "alphanumeric" in comments
+    and messages.
+
+9.  Fix two occurrences of "backslash" in comments that should have been
+    "backspace".
+
+10. Remove two redundant lines of code that can never be obeyed (their function
+    was moved elsewhere).
+
+11. The program that makes PCRE's Unicode character property table had a bug
+    which caused it to generate incorrect table entries for sequences of
+    characters that have the same character type, but are in different scripts.
+    It amalgamated them into a single range, with the script of the first of
+    them. In other words, some characters were in the wrong script. There were
+    thirteen such cases, affecting characters in the following ranges:
+
+      U+002b0 - U+002c1
+      U+0060c - U+0060d
+      U+0061e - U+00612
+      U+0064b - U+0065e
+      U+0074d - U+0076d
+      U+01800 - U+01805
+      U+01d00 - U+01d77
+      U+01d9b - U+01dbf
+      U+0200b - U+0200f
+      U+030fc - U+030fe
+      U+03260 - U+0327f
+      U+0fb46 - U+0fbb1
+      U+10450 - U+1049d
+
+12. The -o option (show only the matching part of a line) for pcregrep was not
+    compatible with GNU grep in that, if there was more than one match in a
+    line, it showed only the first of them. It now behaves in the same way as
+    GNU grep.
+
+13. If the -o and -v options were combined for pcregrep, it printed a blank
+    line for every non-matching line. GNU grep prints nothing, and pcregrep now
+    does the same. The return code can be used to tell if there were any
+    non-matching lines.
+
+14. Added --file-offsets and --line-offsets to pcregrep.
+
+15. The pattern (?=something)(?R) was not being diagnosed as a potentially
+    infinitely looping recursion. The bug was that positive lookaheads were not
+    being skipped when checking for a possible empty match (negative lookaheads
+    and both kinds of lookbehind were skipped).
+
+16. Fixed two typos in the Windows-only code in pcregrep.c, and moved the
+    inclusion of <windows.h> to before rather than after the definition of
+    INVALID_FILE_ATTRIBUTES (patch from David Byron).
+
+17. Specifying a possessive quantifier with a specific limit for a Unicode
+    character property caused pcre_compile() to compile bad code, which led at
+    runtime to PCRE_ERROR_INTERNAL (-14). Examples of patterns that caused this
+    are: /\p{Zl}{2,3}+/8 and /\p{Cc}{2}+/8. It was the possessive "+" that
+    caused the error; without that there was no problem.
+
+18. Added --enable-pcregrep-libz and --enable-pcregrep-libbz2.
+
+19. Added --enable-pcretest-libreadline.
+
+20. In pcrecpp.cc, the variable 'count' was incremented twice in
+    RE::GlobalReplace(). As a result, the number of replacements returned was
+    double what it should be. I removed one of the increments, but Craig sent a
+    later patch that removed the other one (the right fix) and added unit tests
+    that check the return values (which was not done before).
+
+21. Several CMake things:
+
+    (1) Arranged that, when cmake is used on Unix, the libraries end up with
+        the names libpcre and libpcreposix, not just pcre and pcreposix.
+
+    (2) The above change means that pcretest and pcregrep are now correctly
+        linked with the newly-built libraries, not previously installed ones.
+
+    (3) Added PCRE_SUPPORT_LIBREADLINE, PCRE_SUPPORT_LIBZ, PCRE_SUPPORT_LIBBZ2.
+
+22. In UTF-8 mode, with newline set to "any", a pattern such as .*a.*=.b.*
+    crashed when matching a string such as a\x{2029}b (note that \x{2029} is a
+    UTF-8 newline character). The key issue is that the pattern starts .*;
+    this means that the match must be either at the beginning, or after a
+    newline. The bug was in the code for advancing after a failed match and
+    checking that the new position followed a newline. It was not taking
+    account of UTF-8 characters correctly.
+
+23. PCRE was behaving differently from Perl in the way it recognized POSIX
+    character classes. PCRE was not treating the sequence [:...:] as a
+    character class unless the ... were all letters. Perl, however, seems to
+    allow any characters between [: and :], though of course it rejects as
+    unknown any "names" that contain non-letters, because all the known class
+    names consist only of letters. Thus, Perl gives an error for [[:1234:]],
+    for example, whereas PCRE did not - it did not recognize a POSIX character
+    class. This seemed a bit dangerous, so the code has been changed to be
+    closer to Perl. The behaviour is not identical to Perl, because PCRE will
+    diagnose an unknown class for, for example, [[:l\ower:]] where Perl will
+    treat it as [[:lower:]]. However, PCRE does now give "unknown" errors where
+    Perl does, and where it didn't before.
+
+24. Rewrite so as to remove the single use of %n from pcregrep because in some
+    Windows environments %n is disabled by default.
+
+
 Version 7.4 21-Sep-07
 ---------------------

--- a/ext/pcre/pcrelib/NEWS
+++ b/ext/pcre/pcrelib/NEWS
@ -1,6 +1,14 @@
 News about PCRE releases
 ------------------------

+Release 7.5 10-Jan-08
+---------------------
+
+This is mainly a bug-fix release. However the ability to link pcregrep with
+libz or libbz2 and the ability to link pcretest with libreadline have been
+added. Also the --line-offsets and --file-offsets options were added to
+pcregrep.
+

 Release 7.4 21-Sep-07
 ---------------------
--- a/ext/pcre/pcrelib/NON-UNIX-USE
+++ b/ext/pcre/pcrelib/NON-UNIX-USE
@ -84,7 +84,7 @@ The following are generic comments about building the PCRE C library "by hand".
       ucptable.h

 (5) Also ensure that you have the following file, which is #included as source
-     when building a debugging version of PCRE and is also used by pcretest.
+     when building a debugging version of PCRE, and is also used by pcretest.

       pcre_printint.src

--- a/ext/pcre/pcrelib/README
+++ b/ext/pcre/pcrelib/README
@ -258,6 +258,24 @@ library. You can read more about them in the pcrebuild man page.

  This automatically implies --enable-rebuild-chartables (see above).

+. It is possible to compile pcregrep to use libz and/or libbz2, in order to
+  read .gz and .bz2 files (respectively), by specifying one or both of
+
+  --enable-pcregrep-libz
+  --enable-pcregrep-libbz2
+
+  Of course, the relevant libraries must be installed on your system.
+
+. It is possible to compile pcretest so that it links with the libreadline
+  library, by specifying
+
+  --enable-pcretest-libreadline
+
+  If this is done, when pcretest's input is from a terminal, it reads it using
+  the readline() function. This provides line-editing and history facilities.
+  Note that libreadline is GPL-licenced, so if you distribute a binary of
+  pcretest linked in this way, there may be licensing issues.
+
 The "configure" script builds the following files for the basic C library:

 . Makefile is the makefile that builds the library
@ -725,4 +743,4 @@ The distribution should contain the following files:
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 21 September 2007
+Last updated: 18 December 2007
--- a/ext/pcre/pcrelib/config.h
+++ b/ext/pcre/pcrelib/config.h
@ -51,6 +51,11 @@ them both to 0; an emulation function will be used. */
 /* Define to 1 if you have the <bits/type_traits.h> header file. */
 /* #undef HAVE_BITS_TYPE_TRAITS_H */

+/* Define to 1 if you have the <bzlib.h> header file. */
+#ifndef HAVE_BZLIB_H
+#define HAVE_BZLIB_H 1
+#endif
+
 /* Define to 1 if you have the <dirent.h> header file. */
 #ifndef HAVE_DIRENT_H
 #define HAVE_DIRENT_H 1
@ -86,6 +91,16 @@ them both to 0; an emulation function will be used. */
 #define HAVE_MEMORY_H 1
 #endif

+/* Define to 1 if you have the <readline/history.h> header file. */
+#ifndef HAVE_READLINE_HISTORY_H
+#define HAVE_READLINE_HISTORY_H 1
+#endif
+
+/* Define to 1 if you have the <readline/readline.h> header file. */
+#ifndef HAVE_READLINE_READLINE_H
+#define HAVE_READLINE_READLINE_H 1
+#endif
+
 /* Define to 1 if you have the <stdint.h> header file. */
 #ifndef HAVE_STDINT_H
 #define HAVE_STDINT_H 1
@ -152,6 +167,11 @@ them both to 0; an emulation function will be used. */
 /* Define to 1 if you have the <windows.h> header file. */
 /* #undef HAVE_WINDOWS_H */

+/* Define to 1 if you have the <zlib.h> header file. */
+#ifndef HAVE_ZLIB_H
+#define HAVE_ZLIB_H 1
+#endif
+
 /* Define to 1 if you have the `_strtoi64' function. */
 /* #undef HAVE__STRTOI64 */

@ -231,13 +251,13 @@ them both to 0; an emulation function will be used. */
 #define PACKAGE_NAME "PCRE"

 /* Define to the full name and version of this package. */
-#define PACKAGE_STRING "PCRE 7.4"
+#define PACKAGE_STRING "PCRE 7.5"

 /* Define to the one symbol short name of this package. */
 #define PACKAGE_TARNAME "pcre"

 /* Define to the version of this package. */
-#define PACKAGE_VERSION "7.4"
+#define PACKAGE_VERSION "7.5"


 /* If you are compiling for a system other than a Unix-like system or
@ -271,6 +291,17 @@ them both to 0; an emulation function will be used. */
 #define STDC_HEADERS 1
 #endif

+/* Define to allow pcregrep to be linked with libbz2, so that it is able to
+   handle .bz2 files. */
+/* #undef SUPPORT_LIBBZ2 */
+
+/* Define to allow pcretest to be linked with libreadline. */
+/* #undef SUPPORT_LIBREADLINE */
+
+/* Define to allow pcregrep to be linked with libz, so that it is able to
+   handle .gz files. */
+/* #undef SUPPORT_LIBZ */
+
 /* Define to enable support for Unicode properties */
 /* #undef SUPPORT_UCP */

@ -279,7 +310,7 @@ them both to 0; an emulation function will be used. */

 /* Version number of package */
 #ifndef VERSION
-#define VERSION "7.4"
+#define VERSION "7.5"
 #endif

 /* Define to empty if `const' does not conform to ANSI C. */
--- a/ext/pcre/pcrelib/pcre.h
+++ b/ext/pcre/pcrelib/pcre.h
@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
 /* The current PCRE version information. */

 #define PCRE_MAJOR          7
-#define PCRE_MINOR          4
+#define PCRE_MINOR          5
 #define PCRE_PRERELEASE     
-#define PCRE_DATE           2007-09-21
+#define PCRE_DATE           2008-01-10

 /* When an application links to a PCRE DLL in Windows, the symbols that are
 imported have to be identified as such. When building PCRE, the appropriate
--- a/ext/pcre/pcrelib/pcre_compile.c
+++ b/ext/pcre/pcrelib/pcre_compile.c
@ -239,7 +239,7 @@ static const char error_texts[] =
  /* 10 */
  "operand of unlimited repeat could match the empty string\0"  /** DEAD **/
  "internal error: unexpected repeat\0"
-  "unrecognized character after (?\0"
+  "unrecognized character after (? or (?-\0"
  "POSIX named classes are supported only within a class\0"
  "missing )\0"
  /* 15 */
@ -298,7 +298,9 @@ static const char error_texts[] =
  "(*VERB) with an argument is not supported\0"
  /* 60 */
  "(*VERB) not recognized\0"
-  "number is too big";
+  "number is too big\0"
+  "subpattern name expected\0"
+  "digit expected after (?+";


 /* Table to identify digits and hex digits. This is used when compiling
@ -494,16 +496,16 @@ ptr--;                            /* Set pointer back to the last byte */

 if (c == 0) *errorcodeptr = ERR1;

-/* Non-alphamerics are literals. For digits or letters, do an initial lookup in
-a table. A non-zero result is something that can be returned immediately.
+/* Non-alphanumerics are literals. For digits or letters, do an initial lookup
+in a table. A non-zero result is something that can be returned immediately.
 Otherwise further processing may be required. */

 #ifndef EBCDIC  /* ASCII coding */
-else if (c < '0' || c > 'z') {}                           /* Not alphameric */
+else if (c < '0' || c > 'z') {}                           /* Not alphanumeric */
 else if ((i = escapes[c - '0']) != 0) c = i;

 #else           /* EBCDIC coding */
-else if (c < 'a' || (ebcdic_chartab[c] & 0x0E) == 0) {}   /* Not alphameric */
+else if (c < 'a' || (ebcdic_chartab[c] & 0x0E) == 0) {}   /* Not alphanumeric */
 else if ((i = escapes[c - 0x48]) != 0)  c = i;
 #endif

@ -720,10 +722,10 @@ else
    break;

    /* PCRE_EXTRA enables extensions to Perl in the matter of escapes. Any
-    other alphameric following \ is an error if PCRE_EXTRA was set; otherwise,
-    for Perl compatibility, it is a literal. This code looks a bit odd, but
-    there used to be some cases other than the default, and there may be again
-    in future, so I haven't "optimized" it. */
+    other alphanumeric following \ is an error if PCRE_EXTRA was set;
+    otherwise, for Perl compatibility, it is a literal. This code looks a bit
+    odd, but there used to be some cases other than the default, and there may
+    be again in future, so I haven't "optimized" it. */

    default:
    if ((options & PCRE_EXTRA) != 0) switch(c)
@ -1504,8 +1506,9 @@ for (;;)
 can match the empty string or not. It is called from could_be_empty()
 below and from compile_branch() when checking for an unlimited repeat of a
 group that can match nothing. Note that first_significant_code() skips over
-assertions. If we hit an unclosed bracket, we return "empty" - this means we've
-struck an inner bracket whose current branch will already have been scanned.
+backward and negative forward assertions when its final argument is TRUE. If we
+hit an unclosed bracket, we return "empty" - this means we've struck an inner
+bracket whose current branch will already have been scanned.

 Arguments:
  code        points to start of search
@ -1527,6 +1530,16 @@ for (code = first_significant_code(code + _pcre_OP_lengths[*code], NULL, 0, TRUE

  c = *code;

+  /* Skip over forward assertions; the other assertions are skipped by
+  first_significant_code() with a TRUE final argument. */
+
+  if (c == OP_ASSERT)
+    {
+    do code += GET(code, 1); while (*code == OP_ALT);
+    c = *code;
+    continue;
+    }
+
  /* Groups with zero repeats can of course be empty; skip them. */

  if (c == OP_BRAZERO || c == OP_BRAMINZERO)
@ -1722,29 +1735,48 @@ return TRUE;
 *************************************************/

 /* This function is called when the sequence "[:" or "[." or "[=" is
-encountered in a character class. It checks whether this is followed by an
-optional ^ and then a sequence of letters, terminated by a matching ":]" or
-".]" or "=]".
+encountered in a character class. It checks whether this is followed by a
+sequence of characters terminated by a matching ":]" or ".]" or "=]". If we
+reach an unescaped ']' without the special preceding character, return FALSE.

-Argument:
+Originally, this function only recognized a sequence of letters between the
+terminators, but it seems that Perl recognizes any sequence of characters,
+though of course unknown POSIX names are subsequently rejected. Perl gives an
+"Unknown POSIX class" error for [:f\oo:] for example, where previously PCRE
+didn't consider this to be a POSIX class. Likewise for [:1234:].
+
+The problem in trying to be exactly like Perl is in the handling of escapes. We
+have to be sure that [abc[:x\]pqr] is *not* treated as containing a POSIX
+class, but [abc[:x\]pqr:]] is (so that an error can be generated). The code
+below handles the special case of \], but does not try to do any other escape
+processing. This makes it different from Perl for cases such as [:l\ower:]
+where Perl recognizes it as the POSIX class "lower" but PCRE does not recognize
+"l\ower". This is a lesser evil that not diagnosing bad classes when Perl does,
+I think.
+
+Arguments:
  ptr      pointer to the initial [
  endptr   where to return the end pointer
-  cd       pointer to compile data

 Returns:   TRUE or FALSE
 */

 static BOOL
-check_posix_syntax(const uschar *ptr, const uschar **endptr, compile_data *cd)
+check_posix_syntax(const uschar *ptr, const uschar **endptr)
 {
 int terminator;          /* Don't combine these lines; the Solaris cc */
 terminator = *(++ptr);   /* compiler warns about "non-constant" initializer. */
-if (*(++ptr) == '^') ptr++;
-while ((cd->ctypes[*ptr] & ctype_letter) != 0) ptr++;
-if (*ptr == terminator && ptr[1] == ']')
+for (++ptr; *ptr != 0; ptr++)
  {
-  *endptr = ptr;
-  return TRUE;
+  if (*ptr == '\\' && ptr[1] == ']') ptr++; else
+    {
+    if (*ptr == ']') return FALSE;
+    if (*ptr == terminator && ptr[1] == ']')
+      {
+      *endptr = ptr;
+      return TRUE;
+      }
+    }
  }
 return FALSE;
 }
@ -2381,6 +2413,7 @@ req_caseopt = ((options & PCRE_CASELESS) != 0)? REQ_CASELESS : 0;
 for (;; ptr++)
  {
  BOOL negate_class;
+  BOOL should_flip_negation;
  BOOL possessive_quantifier;
  BOOL is_quantifier;
  BOOL is_recurse;
@ -2604,7 +2637,7 @@ for (;; ptr++)
    they are encountered at the top level, so we'll do that too. */

    if ((ptr[1] == ':' || ptr[1] == '.' || ptr[1] == '=') &&
-        check_posix_syntax(ptr, &tempptr, cd))
+        check_posix_syntax(ptr, &tempptr))
      {
      *errorcodeptr = (ptr[1] == ':')? ERR13 : ERR31;
      goto FAILED;
@ -2629,6 +2662,12 @@ for (;; ptr++)
      else break;
      }

+    /* If a class contains a negative special such as \S, we need to flip the
+    negation flag at the end, so that support for characters > 255 works
+    correctly (they are all included in the class). */
+
+    should_flip_negation = FALSE;
+
    /* Keep a count of chars with values < 256 so that we can optimize the case
    of just a single character (as long as it's < 256). However, For higher
    valued UTF-8 characters, we don't yet do any optimization. */
@ -2684,7 +2723,7 @@ for (;; ptr++)

      if (c == '[' &&
          (ptr[1] == ':' || ptr[1] == '.' || ptr[1] == '=') &&
-          check_posix_syntax(ptr, &tempptr, cd))
+          check_posix_syntax(ptr, &tempptr))
        {
        BOOL local_negate = FALSE;
        int posix_class, taboffset, tabopt;
@ -2701,6 +2740,7 @@ for (;; ptr++)
        if (*ptr == '^')
          {
          local_negate = TRUE;
+          should_flip_negation = TRUE;  /* Note negative special */
          ptr++;
          }

@ -2775,7 +2815,7 @@ for (;; ptr++)
        c = check_escape(&ptr, errorcodeptr, cd->bracount, options, TRUE);
        if (*errorcodeptr != 0) goto FAILED;

-        if (-c == ESC_b) c = '\b';       /* \b is backslash in a class */
+        if (-c == ESC_b) c = '\b';       /* \b is backspace in a class */
        else if (-c == ESC_X) c = 'X';   /* \X is literal X in a class */
        else if (-c == ESC_R) c = 'R';   /* \R is literal R in a class */
        else if (-c == ESC_Q)            /* Handle start of quoted string */
@ -2803,6 +2843,7 @@ for (;; ptr++)
            continue;

            case ESC_D:
+            should_flip_negation = TRUE;
            for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_digit];
            continue;

@ -2811,6 +2852,7 @@ for (;; ptr++)
            continue;

            case ESC_W:
+            should_flip_negation = TRUE;
            for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_word];
            continue;

@ -2820,13 +2862,11 @@ for (;; ptr++)
            continue;

            case ESC_S:
+            should_flip_negation = TRUE;
            for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_space];
            classbits[1] |= 0x08;    /* Perl 5.004 onwards omits VT from \s */
            continue;

-            case ESC_E: /* Perl ignores an orphan \E */
-            continue;
-
            default:    /* Not recognized; fall through */
            break;      /* Need "default" setting to stop compiler warning. */
            }
@ -3061,7 +3101,7 @@ for (;; ptr++)
          d = check_escape(&ptr, errorcodeptr, cd->bracount, options, TRUE);
          if (*errorcodeptr != 0) goto FAILED;

-          /* \b is backslash; \X is literal X; \R is literal R; any other
+          /* \b is backspace; \X is literal X; \R is literal R; any other
          special means the '-' was literal */

          if (d < 0)
@ -3325,11 +3365,14 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
    zeroreqbyte = reqbyte;

    /* If there are characters with values > 255, we have to compile an
-    extended class, with its own opcode. If there are no characters < 256,
-    we can omit the bitmap in the actual compiled code. */
+    extended class, with its own opcode, unless there was a negated special
+    such as \S in the class, because in that case all characters > 255 are in
+    the class, so any that were explicitly given as well can be ignored. If
+    (when there are explicit characters > 255 that must be listed) there are no
+    characters < 256, we can omit the bitmap in the actual compiled code. */

 #ifdef SUPPORT_UTF8
-    if (class_utf8)
+    if (class_utf8 && !should_flip_negation)
      {
      *class_utf8data++ = XCL_END;    /* Marks the end of extra data */
      *code++ = OP_XCLASS;
@ -3355,20 +3398,19 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
      }
 #endif

-    /* If there are no characters > 255, negate the 32-byte map if necessary,
-    and copy it into the code vector. If this is the first thing in the branch,
-    there can be no first char setting, whatever the repeat count. Any reqbyte
-    setting must remain unchanged after any kind of repeat. */
+    /* If there are no characters > 255, set the opcode to OP_CLASS or
+    OP_NCLASS, depending on whether the whole class was negated and whether
+    there were negative specials such as \S in the class. Then copy the 32-byte
+    map into the code vector, negating it if necessary. */

+    *code++ = (negate_class == should_flip_negation) ? OP_CLASS : OP_NCLASS;
    if (negate_class)
      {
-      *code++ = OP_NCLASS;
      if (lengthptr == NULL)    /* Save time in the pre-compile phase */
        for (c = 0; c < 32; c++) code[c] = ~classbits[c];
      }
    else
      {
-      *code++ = OP_CLASS;
      memcpy(code, classbits, 32);
      }
    code += 32;
@ -4004,7 +4046,9 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
      int len;
      if (*tempcode == OP_EXACT || *tempcode == OP_TYPEEXACT ||
          *tempcode == OP_NOTEXACT)
-        tempcode += _pcre_OP_lengths[*tempcode];
+        tempcode += _pcre_OP_lengths[*tempcode] +
+          ((*tempcode == OP_TYPEEXACT &&
+             (tempcode[3] == OP_PROP || tempcode[3] == OP_NOTPROP))? 2:0);
      len = code - tempcode;
      if (len > 0) switch (*tempcode)
        {
@ -4231,16 +4275,13 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
            *errorcodeptr = ERR58;
            goto FAILED;
            }
-          if (refsign == '-')
+          recno = (refsign == '-')?
+            cd->bracount - recno + 1 : recno +cd->bracount;
+          if (recno <= 0 || recno > cd->final_bracount)
            {
-            recno = cd->bracount - recno + 1;
-            if (recno <= 0)
-              {
-              *errorcodeptr = ERR15;
-              goto FAILED;
-              }
+            *errorcodeptr = ERR15;
+            goto FAILED;
            }
-          else recno += cd->bracount;
          PUT2(code, 2+LINK_SIZE, recno);
          break;
          }
@ -4312,9 +4353,10 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
          skipbytes = 1;
          }

-        /* Check for the "name" actually being a subpattern number. */
+        /* Check for the "name" actually being a subpattern number. We are
+        in the second pass here, so final_bracount is set. */

-        else if (recno > 0)
+        else if (recno > 0 && recno <= cd->final_bracount)
          {
          PUT2(code, 2+LINK_SIZE, recno);
          }
@ -4508,7 +4550,9 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */

        /* We come here from the Python syntax above that handles both
        references (?P=name) and recursion (?P>name), as well as falling
-        through from the Perl recursion syntax (?&name). */
+        through from the Perl recursion syntax (?&name). We also come here from
+        the Perl \k<name> or \k'name' back reference syntax and the \k{name}
+        .NET syntax. */

        NAMED_REF_OR_RECURSE:
        name = ++ptr;
@ -4520,6 +4564,11 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */

        if (lengthptr != NULL)
          {
+          if (namelen == 0)
+            {
+            *errorcodeptr = ERR62;
+            goto FAILED;
+            }
          if (*ptr != terminator)
            {
            *errorcodeptr = ERR42;
@ -4533,14 +4582,19 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
          recno = 0;
          }

-        /* In the real compile, seek the name in the table */
+        /* In the real compile, seek the name in the table. We check the name
+        first, and then check that we have reached the end of the name in the
+        table. That way, if the name that is longer than any in the table,
+        the comparison will fail without reading beyond the table entry. */

        else
          {
          slot = cd->name_table;
          for (i = 0; i < cd->names_found; i++)
            {
-            if (strncmp((char *)name, (char *)slot+2, namelen) == 0) break;
+            if (strncmp((char *)name, (char *)slot+2, namelen) == 0 &&
+                slot[2+namelen] == 0)
+              break;
            slot += cd->name_entry_size;
            }

@ -4577,7 +4631,15 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
          {
          const uschar *called;

-          if ((refsign = *ptr) == '+') ptr++;
+          if ((refsign = *ptr) == '+')
+            {
+            ptr++;
+            if ((digitab[*ptr] & ctype_digit) == 0)
+              {
+              *errorcodeptr = ERR63;
+              goto FAILED;
+              }
+            }
          else if (refsign == '-')
            {
            if ((digitab[ptr[1]] & ctype_digit) == 0)
@ -5904,7 +5966,7 @@ to compile parts of the pattern into; the compiled code is discarded when it is
 no longer needed, so hopefully this workspace will never overflow, though there
 is a test for its doing so. */

-cd->bracount = 0;
+cd->bracount = cd->final_bracount = 0;
 cd->names_found = 0;
 cd->name_entry_size = 0;
 cd->name_table = NULL;
@ -5981,6 +6043,7 @@ field. Reset the bracket count and the names_found field. Also reset the hwm
 field; this time it's used for remembering forward references to subpatterns.
 */

+cd->final_bracount = cd->bracount;  /* Save for checking forward references */
 cd->bracount = 0;
 cd->names_found = 0;
 cd->name_table = (uschar *)re + re->name_table_offset;
--- a/ext/pcre/pcrelib/pcre_exec.c
+++ b/ext/pcre/pcrelib/pcre_exec.c
@ -4668,10 +4668,10 @@ for(;;)
    if (first_byte_caseless)
      while (start_match < end_subject &&
             md->lcc[*start_match] != first_byte)
-        start_match++;
+        { NEXTCHAR(start_match); }
    else
      while (start_match < end_subject && *start_match != first_byte)
-        start_match++;
+        { NEXTCHAR(start_match); }
    }

  /* Or to just after a linebreak for a multiline match if possible */
@ -4681,7 +4681,7 @@ for(;;)
    if (start_match > md->start_subject + start_offset)
      {
      while (start_match <= end_subject && !WAS_NEWLINE(start_match))
-        start_match++;
+        { NEXTCHAR(start_match); }

      /* If we have just passed a CR and the newline option is ANY or ANYCRLF,
      and we are now at a LF, advance the match position by one more character.
@ -4702,7 +4702,9 @@ for(;;)
    while (start_match < end_subject)
      {
      register unsigned int c = *start_match;
-      if ((start_bits[c/8] & (1 << (c&7))) == 0) start_match++; else break;
+      if ((start_bits[c/8] & (1 << (c&7))) == 0)
+        { NEXTCHAR(start_match); }
+      else break;
      }
    }

--- a/ext/pcre/pcrelib/pcre_internal.h
+++ b/ext/pcre/pcrelib/pcre_internal.h
@ -363,6 +363,7 @@ never be called in byte mode. To make sure it can never even appear when UTF-8
 support is omitted, we don't even define it. */

 #ifndef SUPPORT_UTF8
+#define NEXTCHAR(p) p++;
 #define GETCHAR(c, eptr) c = *eptr;
 #define GETCHARTEST(c, eptr) c = *eptr;
 #define GETCHARINC(c, eptr) c = *eptr++;
@ -372,6 +373,13 @@ support is omitted, we don't even define it. */

 #else   /* SUPPORT_UTF8 */

+/* Advance a character pointer one byte in non-UTF-8 mode and by one character
+in UTF-8 mode. */
+
+#define NEXTCHAR(p) \
+  p++; \
+  if (utf8) { while((*p & 0xc0) == 0x80) p++; }
+
 /* Get the next UTF-8 character, not advancing the pointer. This is called when
 we know we are in UTF-8 mode. */

@ -871,7 +879,7 @@ enum { ERR0,  ERR1,  ERR2,  ERR3,  ERR4,  ERR5,  ERR6,  ERR7,  ERR8,  ERR9,
       ERR30, ERR31, ERR32, ERR33, ERR34, ERR35, ERR36, ERR37, ERR38, ERR39,
       ERR40, ERR41, ERR42, ERR43, ERR44, ERR45, ERR46, ERR47, ERR48, ERR49,
       ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
-       ERR60, ERR61 };
+       ERR60, ERR61, ERR62, ERR63 };

 /* The real format of the start of the pcre block; the index of names and the
 code vector run on as long as necessary after the end. We store an explicit
@ -934,7 +942,8 @@ typedef struct compile_data {
  uschar *name_table;           /* The name/number table */
  int  names_found;             /* Number of entries so far */
  int  name_entry_size;         /* Size of each entry */
-  int  bracount;                /* Count of capturing parens */
+  int  bracount;                /* Count of capturing parens as we compile */
+  int  final_bracount;          /* Saved value after first pass */
  int  top_backref;             /* Maximum back reference */
  unsigned int backref_map;     /* Bitmap of low back refs */
  int  external_options;        /* External (initial) options */
@ -1036,7 +1045,7 @@ typedef struct dfa_match_data {
 #define ctype_letter  0x02
 #define ctype_digit   0x04
 #define ctype_xdigit  0x08
-#define ctype_word    0x10   /* alphameric or '_' */
+#define ctype_word    0x10   /* alphanumeric or '_' */
 #define ctype_meta    0x80   /* regexp meta char or zero (end pattern) */

 /* Offsets for the bitmap tables in pcre_cbits. Each table contains a set
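Note on NEXTCHAR: the UTF-8 version expands to two statements, which is why the call site in the exec loop wraps it in braces before the `else`. As an alternative sketch (not what this commit does), a do/while(0) wrapper makes such a macro behave as a single statement. `NEXTCHAR_SAFE` and `count_chars` are assumed names for illustration, and the macro expects an int named `utf8` in scope, as the original does.

```c
#include <stddef.h>

/* Hypothetical variant of NEXTCHAR that expands to one statement, so it can
   follow an unbraced if and still be followed by else. */
#define NEXTCHAR_SAFE(p) \
  do { \
    (p)++; \
    if (utf8) { while ((*(p) & 0xc0) == 0x80) (p)++; } \
  } while (0)

/* Count characters in a NUL-terminated string: bytes when utf8 is 0,
   UTF-8 characters otherwise. Assumes valid UTF-8 input; the terminating
   NUL stops the continuation-byte loop. */
static size_t count_chars(const unsigned char *p, int utf8)
{
  size_t n = 0;
  while (*p != 0)
    {
    NEXTCHAR_SAFE(p);
    n++;
    }
  return n;
}
```

For the bytes d1 82 61 00 this counts three bytes in byte mode and two characters in UTF-8 mode.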
--- a/ext/pcre/pcrelib/pcre_valid_utf8.c
+++ b/ext/pcre/pcrelib/pcre_valid_utf8.c
@ -60,7 +60,7 @@ an invalid string are then undefined.
 Originally, this function checked according to RFC 2279, allowing for values in
 the range 0 to 0x7fffffff, up to 6 bytes long, but ensuring that they were in
 the canonical format. Once somebody had pointed out RFC 3629 to me (it
-obsoletes 2279), additional restrictions were applies. The values are now
+obsoletes 2279), additional restrictions were applied. The values are now
 limited to be between 0 and 0x0010ffff, no more than 4 bytes long, and the
 subrange 0xd800 to 0xdfff is excluded.

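Note on the comment above: the RFC 3629 limits it describes can be expressed as a check on a decoded code point. The sketch below is illustrative only and is not the code in pcre_valid_utf8.c. The function names are assumptions, and the excluded subrange is the surrogate block 0xd800 to 0xdfff as RFC 3629 defines it.

```c
/* Return 1 if cp may be encoded as UTF-8 under RFC 3629, otherwise 0. */
static int is_valid_rfc3629_code_point(unsigned long cp)
{
  if (cp > 0x10fffful) return 0;                    /* beyond the last plane */
  if (cp >= 0xd800ul && cp <= 0xdffful) return 0;   /* UTF-16 surrogates */
  return 1;
}

/* Return the number of bytes in the canonical (shortest) UTF-8 encoding
   of cp, or 0 if cp is not encodable. The result is never more than 4. */
static int utf8_length(unsigned long cp)
{
  if (!is_valid_rfc3629_code_point(cp)) return 0;
  if (cp < 0x80ul) return 1;
  if (cp < 0x800ul) return 2;
  if (cp < 0x10000ul) return 3;
  return 4;
}
```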
--- a/ext/pcre/pcrelib/pcregrep.c
+++ b/ext/pcre/pcrelib/pcregrep.c
--- a/ext/pcre/pcrelib/pcreposix.c
+++ b/ext/pcre/pcrelib/pcreposix.c
@ -122,7 +122,9 @@ static const int eint[] = {
  REG_INVARG,  /* inconsistent NEWLINE options */
  REG_BADPAT,  /* \g is not followed by an (optionally braced) non-zero number */
  REG_BADPAT,  /* (?+ or (?- must be followed by a non-zero number */
-  REG_BADPAT   /* number is too big */
+  REG_BADPAT,  /* number is too big */
+  REG_BADPAT,  /* subpattern name expected */
+  REG_BADPAT   /* digit expected after (?+ */
 };

 /* Table of texts corresponding to POSIX error codes */
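Note on the table above: it is indexed by the compile error number, so the two new entries have to be added in step with ERR62 and ERR63 in pcre_internal.h. One way to catch a mismatch at compile time is sketched below. This is an assumption about how such a check could look, not something the commit contains; the shortened enum, the `X_` constants, and `to_posix_code` are hypothetical stand-ins.

```c
#include <stddef.h>

/* Shortened, hypothetical error list; the last member counts the entries. */
enum { ERR0, ERR1, ERR2, ERR3, ERR_COUNT };

/* Hypothetical stand-ins for the POSIX-style result codes. */
enum { X_OK = 0, X_BADPAT = 1, X_INVARG = 2 };

/* One entry per error number, in the same order as the enum. */
static const int eint_sketch[] = {
  X_OK,        /* no error */
  X_INVARG,    /* inconsistent options */
  X_BADPAT,    /* subpattern name expected */
  X_BADPAT     /* digit expected after (?+ */
};

/* Compile-time size check that works in C89: the array size is negative,
   and compilation fails, if the table and the enum fall out of step. */
typedef char eint_size_check[
  (sizeof(eint_sketch) / sizeof(eint_sketch[0]) == ERR_COUNT) ? 1 : -1];

/* Map an error number to a result code. Treating out-of-range numbers as
   X_BADPAT is an assumption made for this sketch. */
static int to_posix_code(int err)
{
  if (err < 0 || err >= ERR_COUNT) return X_BADPAT;
  return eint_sketch[err];
}
```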
--- a/ext/pcre/pcrelib/testdata/grepoutput
+++ b/ext/pcre/pcrelib/testdata/grepoutput
@ -358,10 +358,13 @@ after the binary zero
 ./testdata/grepinput:597:after the binary zero
 ---------------------------- Test 42 ------------------------------
 595:before
+595:zero
 596:zero
 597:after
+597:zero
 ---------------------------- Test 43 ------------------------------
 595:before
+595:zero
 596:zero
 597:zero
 ---------------------------- Test 44 ------------------------------
@ -387,3 +390,15 @@ PUT NEW DATA ABOVE THIS LINE.
 over the lazy dog.
 ---------------------------- Test 51 ------------------------------
 fox [1;31mjumps[00m
+---------------------------- Test 52 ------------------------------
+36972,6
+36990,4
+37024,4
+37066,5
+37083,4
+---------------------------- Test 53 ------------------------------
+595:15,6
+595:33,4
+596:28,4
+597:15,5
+597:32,4
--- a/ext/pcre/pcrelib/testdata/testinput1
+++ b/ext/pcre/pcrelib/testdata/testinput1
@ -3421,11 +3421,6 @@
 /((?m)^b)/
    a\nb\nc\n

-/(?(1)a|b)/
-
-/(?(1)b|a)/
-    a
-
 /(x)?(?(1)a|b)/
    *** Failers
    a
@ -4030,4 +4025,15 @@
 /(  (?(1)0|)*   )/x
    abcd

+/[[:abcd:xyz]]/
+    a]
+    :] 
+    
+/[abc[:x\]pqr]/
+    a
+    [
+    :
+    ]
+    p    
+
 / End of testinput1 /
--- a/ext/pcre/pcrelib/testdata/testinput2
+++ b/ext/pcre/pcrelib/testdata/testinput2
@ -398,8 +398,6 @@

 /(?(1?)a|b)/

-/(?(1)a|b|c)/
-
 /[a[:xyz:/

 /(?<=x+)y/
@ -568,15 +566,15 @@

 /ab\d+/I

-/a(?(1)b)/I
+/a(?(1)b)(.)/I

-/a(?(1)bag|big)/I
+/a(?(1)bag|big)(.)/I

-/a(?(1)bag|big)*/I
+/a(?(1)bag|big)*(.)/I

-/a(?(1)bag|big)+/I
+/a(?(1)bag|big)+(.)/I

-/a(?(1)b..|b..)/I
+/a(?(1)b..|b..)(.)/I

 /ab\d{0}e/I

@ -977,13 +975,13 @@

 /()a/I

-/(?(1)ab|ac)/I
+/(?(1)ab|ac)(.)/I

-/(?(1)abz|acz)/I
+/(?(1)abz|acz)(.)/I

-/(?(1)abz)/I
+/(?(1)abz)(.)/I

-/(?(1)abz)123/I
+/(?(1)abz)(1)23/I

 /(a)+/I

@ -2190,8 +2188,8 @@ a random value. /Ix

 /((?(-2)a))/BZ

-/^(?(+1)X|Y)/BZ
-    Y
+/^(?(+1)X|Y)(.)/BZ
+    Y!

 /(foo)\Kbar/
    foobar
@ -2535,4 +2533,60 @@ a random value. /Ix

 /(*CRLF)(*BSR_ANYCRLF)(*CR)ab/I

+/(?<a>)(?&)/
+
+/(?<abc>)(?&a)/
+
+/(?<a>)(?&aaaaaaaaaaaaaaaaaaaaaaa)/
+
+/(?+-a)/
+
+/(?-+a)/
+
+/(?(-1))/
+
+/(?(+10))/
+
+/(?(10))/
+
+/(?(+2))()()/
+
+/(?(2))()()/
+
+/\k''/
+
+/\k<>/
+
+/\k{}/
+
+/(?P=)/
+
+/(?P>)/
+
+/(?!\w)(?R)/
+
+/(?=\w)(?R)/
+
+/(?<!\w)(?R)/
+
+/(?<=\w)(?R)/
+
+/[[:foo:]]/
+
+/[[:1234:]]/
+
+/[[:f\oo:]]/
+
+/[[: :]]/
+
+/[[:...:]]/
+
+/[[:l\ower:]]/
+
+/[[:abc\:]]/
+
+/[abc[:x\]pqr:]]/
+
+/[[:a\dz:]]/
+
 / End of testinput2 /
--- a/ext/pcre/pcrelib/testdata/testinput4
+++ b/ext/pcre/pcrelib/testdata/testinput4
@ -535,4 +535,76 @@
 /\W{2}/8g
    +\x{a3}== 

+/\S/8g
+    \x{442}\x{435}\x{441}\x{442}
+
+/[\S]/8g
+    \x{442}\x{435}\x{441}\x{442}
+
+/\D/8g
+    \x{442}\x{435}\x{441}\x{442}
+
+/[\D]/8g
+    \x{442}\x{435}\x{441}\x{442}
+
+/\W/8g
+    \x{2442}\x{2435}\x{2441}\x{2442}
+
+/[\W]/8g
+    \x{2442}\x{2435}\x{2441}\x{2442}
+    
+/[\S\s]*/8
+    abc\n\r\x{442}\x{435}\x{441}\x{442}xyz 
+
+/[\x{41f}\S]/8g
+    \x{442}\x{435}\x{441}\x{442}
+
+/.[^\S]./8g
+    abc def\x{442}\x{443}xyz\npqr
+
+/.[^\S\n]./8g
+    abc def\x{442}\x{443}xyz\npqr
+
+/[[:^alnum:]]/8g  
+    +\x{2442}
+    
+/[[:^alpha:]]/8g 
+    +\x{2442}
+    
+/[[:^ascii:]]/8g 
+    A\x{442}
+    
+/[[:^blank:]]/8g 
+    A\x{442}
+    
+/[[:^cntrl:]]/8g 
+    A\x{442}
+    
+/[[:^digit:]]/8g 
+    A\x{442}
+    
+/[[:^graph:]]/8g 
+    \x19\x{e01ff}
+    
+/[[:^lower:]]/8g 
+    A\x{422}
+    
+/[[:^print:]]/8g 
+    \x{19}\x{e01ff}
+    
+/[[:^punct:]]/8g 
+    A\x{442}
+    
+/[[:^space:]]/8g 
+    A\x{442}
+    
+/[[:^upper:]]/8g 
+    a\x{442}
+    
+/[[:^word:]]/8g  
+    +\x{2442}
+    
+/[[:^xdigit:]]/8g
+    M\x{442}
+
 / End of testinput4 /
--- a/ext/pcre/pcrelib/testdata/testinput5
+++ b/ext/pcre/pcrelib/testdata/testinput5
@ -453,4 +453,12 @@ can't tell the difference.) --/
    a\x{85}b\<bsr_anycrlf>
    a\x0bb\<bsr_anycrlf>
 
+/.*a.*=.b.*/8<ANY>
+    QQQ\x{2029}ABCaXYZ=!bPQR
+    ** Failers
+    a\x{2029}b
+    \x61\xe2\x80\xa9\x62 
+
+/[[:a\x{100}b:]]/8
+
 / End of testinput5 /
--- a/ext/pcre/pcrelib/testdata/testinput6
+++ b/ext/pcre/pcrelib/testdata/testinput6
@ -832,4 +832,79 @@ was broken in all cases./

 /(\p{Yi}{0,3}+\277)*/

+/^[\p{Arabic}]/8
+    \x{60e} 
+    \x{656} 
+    \x{657} 
+    \x{658} 
+    \x{659} 
+    \x{65a} 
+    \x{65b} 
+    \x{65c} 
+    \x{65d} 
+    \x{65e} 
+    \x{66a} 
+    \x{6e9} 
+    \x{6ef}
+    \x{6fa}  
+    ** Failers
+    \x{600}
+    \x{650}
+    \x{651}  
+    \x{652}  
+    \x{653}  
+    \x{654} 
+    \x{655} 
+    \x{65f}  
+    
+/^\p{Cyrillic}/8
+    \x{1d2b} 
+    
+/^\p{Common}/8
+    \x{589}
+    \x{60c}
+    \x{61f}  
+    \x{964}
+    \x{965}  
+    \x{970}  
+
+/^\p{Inherited}/8
+    \x{64b}
+    \x{654}
+    \x{655}
+    \x{200c} 
+    ** Failers
+    \x{64a}
+    \x{656}     
+
+/^\p{Shavian}/8
+    \x{10450}
+    \x{1047f}
+    
+/^\p{Deseret}/8
+    \x{10400}
+    \x{1044f}
+    
+/^\p{Osmanya}/8
+    \x{10480}
+    \x{1049d}
+    \x{104a0}
+    \x{104a9}
+    ** Failers
+    \x{1049e}
+    \x{1049f}
+    \x{104aa}           
+
+/\p{Zl}{2,3}+/8BZ
+    \xe2\x80\xa8\xe2\x80\xa8
+    \x{2028}\x{2028}\x{2028}
+    
+/\p{Zl}/8BZ
+
+/\p{Lu}{3}+/8BZ
+
+/\pL{2}+/8BZ
+
+/\p{Cc}{2}+/8BZ
+
 / End of testinput6 /
--- a/ext/pcre/pcrelib/testdata/testoutput1
+++ b/ext/pcre/pcrelib/testdata/testoutput1
@ -5551,12 +5551,6 @@ No match
 0: b
 1: b

-/(?(1)a|b)/
-
-/(?(1)b|a)/
-    a
- 0: a
-
 /(x)?(?(1)a|b)/
    *** Failers
 No match
@ -6593,4 +6587,22 @@ No match
 0: 
 1: 

+/[[:abcd:xyz]]/
+    a]
+ 0: a]
+    :] 
+ 0: :]
+    
+/[abc[:x\]pqr]/
+    a
+ 0: a
+    [
+ 0: [
+    :
+ 0: :
+    ]
+ 0: ]
+    p    
+ 0: p
+
 / End of testinput1 /
--- a/ext/pcre/pcrelib/testdata/testoutput2
+++ b/ext/pcre/pcrelib/testdata/testoutput2
@ -109,7 +109,7 @@ Failed: missing ) at offset 4
 Failed: missing ) after comment at offset 7

 /(?z)abc/
-Failed: unrecognized character after (? at offset 2
+Failed: unrecognized character after (? or (?- at offset 2

 /.*b/I
 Capturing subpattern count = 0
@ -310,7 +310,7 @@ No match
 No match

 /ab(?z)cd/
-Failed: unrecognized character after (? at offset 4
+Failed: unrecognized character after (? or (?- at offset 4

 /^abc|def/I
 Capturing subpattern count = 0
@ -946,26 +946,23 @@ Failed: missing ) at offset 4
 Failed: unrecognized character after (?< at offset 3

 /a(?{)b/
-Failed: unrecognized character after (? at offset 3
+Failed: unrecognized character after (? or (?- at offset 3

 /a(?{{})b/
-Failed: unrecognized character after (? at offset 3
+Failed: unrecognized character after (? or (?- at offset 3

 /a(?{}})b/
-Failed: unrecognized character after (? at offset 3
+Failed: unrecognized character after (? or (?- at offset 3

 /a(?{"{"})b/
-Failed: unrecognized character after (? at offset 3
+Failed: unrecognized character after (? or (?- at offset 3

 /a(?{"{"}})b/
-Failed: unrecognized character after (? at offset 3
+Failed: unrecognized character after (? or (?- at offset 3

 /(?(1?)a|b)/
 Failed: malformed number or name after (?( at offset 4

-/(?(1)a|b|c)/
-Failed: conditional group contains more than two branches at offset 10
-
 /[a[:xyz:/
 Failed: missing terminating ] for character class at offset 8

@ -1599,32 +1596,32 @@ No options
 First char = 'a'
 Need char = 'b'

-/a(?(1)b)/I
-Capturing subpattern count = 0
+/a(?(1)b)(.)/I
+Capturing subpattern count = 1
 No options
 First char = 'a'
 No need char

-/a(?(1)bag|big)/I
-Capturing subpattern count = 0
+/a(?(1)bag|big)(.)/I
+Capturing subpattern count = 1
 No options
 First char = 'a'
 Need char = 'g'

-/a(?(1)bag|big)*/I
-Capturing subpattern count = 0
+/a(?(1)bag|big)*(.)/I
+Capturing subpattern count = 1
 No options
 First char = 'a'
 No need char

-/a(?(1)bag|big)+/I
-Capturing subpattern count = 0
+/a(?(1)bag|big)+(.)/I
+Capturing subpattern count = 1
 No options
 First char = 'a'
 Need char = 'g'

-/a(?(1)b..|b..)/I
-Capturing subpattern count = 0
+/a(?(1)b..|b..)(.)/I
+Capturing subpattern count = 1
 No options
 First char = 'a'
 Need char = 'b'
@ -1905,7 +1902,7 @@ No need char
 ------------------------------------------------------------------
        Bra
        ^
-        [\x00-/:-@[-`{-\xff]
+        [\x00-/:-@[-`{-\xff] (neg)
        Ket
        End
 ------------------------------------------------------------------
@ -1931,7 +1928,7 @@ No need char
 ------------------------------------------------------------------
        Bra
        ^
-        [\x00-@[-`{-\xff]
+        [\x00-@[-`{-\xff] (neg)
        Ket
        End
 ------------------------------------------------------------------
@ -1965,7 +1962,7 @@ No need char
 ------------------------------------------------------------------
        Bra
        ^
-        [\x80-\xff]
+        [\x80-\xff] (neg)
        Ket
        End
 ------------------------------------------------------------------
@ -1991,7 +1988,7 @@ No need char
 ------------------------------------------------------------------
        Bra
        ^
-        [\x00-\x08\x0a-\x1f!-\xff]
+        [\x00-\x08\x0a-\x1f!-\xff] (neg)
        Ket
        End
 ------------------------------------------------------------------
@ -2142,7 +2139,7 @@ No need char
 ------------------------------------------------------------------
        Bra
        ^
-        [ -~\x80-\xff]
+        [ -~\x80-\xff] (neg)
        Ket
        End
 ------------------------------------------------------------------
@ -2155,7 +2152,7 @@ No need char
 ------------------------------------------------------------------
        Bra
        ^
-        [\x00-/12:-\xff]
+        [\x00-/12:-\xff] (neg)
        Ket
        End
 ------------------------------------------------------------------
@ -2168,7 +2165,7 @@ No need char
 ------------------------------------------------------------------
        Bra
        ^
-        [\x00-\x08\x0a-\x1f!-\xff]
+        [\x00-\x08\x0a-\x1f!-\xff] (neg)
        Ket
        End
 ------------------------------------------------------------------
@ -2736,7 +2733,7 @@ No need char
 /[\S]/DZ
 ------------------------------------------------------------------
        Bra
-        [\x00-\x08\x0b\x0e-\x1f!-\xff]
+        [\x00-\x08\x0b\x0e-\x1f!-\xff] (neg)
        Ket
        End
 ------------------------------------------------------------------
@ -3441,26 +3438,26 @@ No options
 No first char
 Need char = 'a'

-/(?(1)ab|ac)/I
-Capturing subpattern count = 0
+/(?(1)ab|ac)(.)/I
+Capturing subpattern count = 1
 No options
 First char = 'a'
 No need char

-/(?(1)abz|acz)/I
-Capturing subpattern count = 0
+/(?(1)abz|acz)(.)/I
+Capturing subpattern count = 1
 No options
 First char = 'a'
 Need char = 'z'

-/(?(1)abz)/I
-Capturing subpattern count = 0
+/(?(1)abz)(.)/I
+Capturing subpattern count = 1
 No options
 No first char
 No need char

-/(?(1)abz)123/I
-Capturing subpattern count = 0
+/(?(1)abz)(1)23/I
+Capturing subpattern count = 1
 No options
 No first char
 Need char = '3'
@ -8308,7 +8305,7 @@ Failed: reference to non-existent subpattern at offset 6
 /((?(-2)a))/BZ
 Failed: reference to non-existent subpattern at offset 7

-/^(?(+1)X|Y)/BZ
+/^(?(+1)X|Y)(.)/BZ
 ------------------------------------------------------------------
        Bra
        ^
@ -8318,11 +8315,15 @@ Failed: reference to non-existent subpattern at offset 7
        Alt
        Y
        Ket
+        CBra 1
+        Any
+        Ket
        Ket
        End
 ------------------------------------------------------------------
-    Y
- 0: Y
+    Y!
+ 0: Y!
+ 1: !

 /(foo)\Kbar/
    foobar
@ -9302,4 +9303,86 @@ Forced newline sequence: CR
 First char = 'a'
 Need char = 'b'

+/(?<a>)(?&)/
+Failed: subpattern name expected at offset 9
+
+/(?<abc>)(?&a)/
+Failed: reference to non-existent subpattern at offset 12
+
+/(?<a>)(?&aaaaaaaaaaaaaaaaaaaaaaa)/
+Failed: reference to non-existent subpattern at offset 32
+
+/(?+-a)/
+Failed: digit expected after (?+ at offset 3
+
+/(?-+a)/
+Failed: unrecognized character after (? or (?- at offset 3
+
+/(?(-1))/
+Failed: reference to non-existent subpattern at offset 6
+
+/(?(+10))/
+Failed: reference to non-existent subpattern at offset 7
+
+/(?(10))/
+Failed: reference to non-existent subpattern at offset 6
+
+/(?(+2))()()/
+
+/(?(2))()()/
+
+/\k''/
+Failed: subpattern name expected at offset 3
+
+/\k<>/
+Failed: subpattern name expected at offset 3
+
+/\k{}/
+Failed: subpattern name expected at offset 3
+
+/(?P=)/
+Failed: subpattern name expected at offset 4
+
+/(?P>)/
+Failed: subpattern name expected at offset 4
+
+/(?!\w)(?R)/
+Failed: recursive call could loop indefinitely at offset 9
+
+/(?=\w)(?R)/
+Failed: recursive call could loop indefinitely at offset 9
+
+/(?<!\w)(?R)/
+Failed: recursive call could loop indefinitely at offset 10
+
+/(?<=\w)(?R)/
+Failed: recursive call could loop indefinitely at offset 10
+
+/[[:foo:]]/
+Failed: unknown POSIX class name at offset 3
+
+/[[:1234:]]/
+Failed: unknown POSIX class name at offset 3
+
+/[[:f\oo:]]/
+Failed: unknown POSIX class name at offset 3
+
+/[[: :]]/
+Failed: unknown POSIX class name at offset 3
+
+/[[:...:]]/
+Failed: unknown POSIX class name at offset 3
+
+/[[:l\ower:]]/
+Failed: unknown POSIX class name at offset 3
+
+/[[:abc\:]]/
+Failed: unknown POSIX class name at offset 3
+
+/[abc[:x\]pqr:]]/
+Failed: unknown POSIX class name at offset 6
+
+/[[:a\dz:]]/
+Failed: unknown POSIX class name at offset 3
+
 / End of testinput2 /
--- a/ext/pcre/pcrelib/testdata/testoutput4
+++ b/ext/pcre/pcrelib/testdata/testoutput4
@ -938,4 +938,135 @@ No match
 0: +\x{a3}
 0: ==

+/\S/8g
+    \x{442}\x{435}\x{441}\x{442}
+ 0: \x{442}
+ 0: \x{435}
+ 0: \x{441}
+ 0: \x{442}
+
+/[\S]/8g
+    \x{442}\x{435}\x{441}\x{442}
+ 0: \x{442}
+ 0: \x{435}
+ 0: \x{441}
+ 0: \x{442}
+
+/\D/8g
+    \x{442}\x{435}\x{441}\x{442}
+ 0: \x{442}
+ 0: \x{435}
+ 0: \x{441}
+ 0: \x{442}
+
+/[\D]/8g
+    \x{442}\x{435}\x{441}\x{442}
+ 0: \x{442}
+ 0: \x{435}
+ 0: \x{441}
+ 0: \x{442}
+
+/\W/8g
+    \x{2442}\x{2435}\x{2441}\x{2442}
+ 0: \x{2442}
+ 0: \x{2435}
+ 0: \x{2441}
+ 0: \x{2442}
+
+/[\W]/8g
+    \x{2442}\x{2435}\x{2441}\x{2442}
+ 0: \x{2442}
+ 0: \x{2435}
+ 0: \x{2441}
+ 0: \x{2442}
+    
+/[\S\s]*/8
+    abc\n\r\x{442}\x{435}\x{441}\x{442}xyz 
+ 0: abc\x{0a}\x{0d}\x{442}\x{435}\x{441}\x{442}xyz
+
+/[\x{41f}\S]/8g
+    \x{442}\x{435}\x{441}\x{442}
+ 0: \x{442}
+ 0: \x{435}
+ 0: \x{441}
+ 0: \x{442}
+
+/.[^\S]./8g
+    abc def\x{442}\x{443}xyz\npqr
+ 0: c d
+ 0: z\x{0a}p
+
+/.[^\S\n]./8g
+    abc def\x{442}\x{443}xyz\npqr
+ 0: c d
+
+/[[:^alnum:]]/8g  
+    +\x{2442}
+ 0: +
+ 0: \x{2442}
+    
+/[[:^alpha:]]/8g 
+    +\x{2442}
+ 0: +
+ 0: \x{2442}
+    
+/[[:^ascii:]]/8g 
+    A\x{442}
+ 0: \x{442}
+    
+/[[:^blank:]]/8g 
+    A\x{442}
+ 0: A
+ 0: \x{442}
+    
+/[[:^cntrl:]]/8g 
+    A\x{442}
+ 0: A
+ 0: \x{442}
+    
+/[[:^digit:]]/8g 
+    A\x{442}
+ 0: A
+ 0: \x{442}
+    
+/[[:^graph:]]/8g 
+    \x19\x{e01ff}
+ 0: \x{19}
+ 0: \x{e01ff}
+    
+/[[:^lower:]]/8g 
+    A\x{422}
+ 0: A
+ 0: \x{422}
+    
+/[[:^print:]]/8g 
+    \x{19}\x{e01ff}
+ 0: \x{19}
+ 0: \x{e01ff}
+    
+/[[:^punct:]]/8g 
+    A\x{442}
+ 0: A
+ 0: \x{442}
+    
+/[[:^space:]]/8g 
+    A\x{442}
+ 0: A
+ 0: \x{442}
+    
+/[[:^upper:]]/8g 
+    a\x{442}
+ 0: a
+ 0: \x{442}
+    
+/[[:^word:]]/8g  
+    +\x{2442}
+ 0: +
+ 0: \x{2442}
+    
+/[[:^xdigit:]]/8g
+    M\x{442}
+ 0: M
+ 0: \x{442}
+
 / End of testinput4 /
--- a/ext/pcre/pcrelib/testdata/testoutput5
+++ b/ext/pcre/pcrelib/testdata/testoutput5
@ -1595,4 +1595,17 @@ No match
    a\x0bb\<bsr_anycrlf>
 No match
 
+/.*a.*=.b.*/8<ANY>
+    QQQ\x{2029}ABCaXYZ=!bPQR
+ 0: ABCaXYZ=!bPQR
+    ** Failers
+No match
+    a\x{2029}b
+No match
+    \x61\xe2\x80\xa9\x62 
+No match
+
+/[[:a\x{100}b:]]/8
+Failed: unknown POSIX class name at offset 3
+
 / End of testinput5 /
--- a/ext/pcre/pcrelib/testdata/testoutput6
+++ b/ext/pcre/pcrelib/testdata/testoutput6
@ -1522,4 +1522,161 @@ No match

 /(\p{Yi}{0,3}+\277)*/

+/^[\p{Arabic}]/8
+    \x{60e} 
+ 0: \x{60e}
+    \x{656} 
+ 0: \x{656}
+    \x{657} 
+ 0: \x{657}
+    \x{658} 
+ 0: \x{658}
+    \x{659} 
+ 0: \x{659}
+    \x{65a} 
+ 0: \x{65a}
+    \x{65b} 
+ 0: \x{65b}
+    \x{65c} 
+ 0: \x{65c}
+    \x{65d} 
+ 0: \x{65d}
+    \x{65e} 
+ 0: \x{65e}
+    \x{66a} 
+ 0: \x{66a}
+    \x{6e9} 
+ 0: \x{6e9}
+    \x{6ef}
+ 0: \x{6ef}
+    \x{6fa}  
+ 0: \x{6fa}
+    ** Failers
+No match
+    \x{600}
+No match
+    \x{650}
+No match
+    \x{651}  
+No match
+    \x{652}  
+No match
+    \x{653}  
+No match
+    \x{654} 
+No match
+    \x{655} 
+No match
+    \x{65f}  
+No match
+    
+/^\p{Cyrillic}/8
+    \x{1d2b} 
+ 0: \x{1d2b}
+    
+/^\p{Common}/8
+    \x{589}
+ 0: \x{589}
+    \x{60c}
+ 0: \x{60c}
+    \x{61f}  
+ 0: \x{61f}
+    \x{964}
+ 0: \x{964}
+    \x{965}  
+ 0: \x{965}
+    \x{970}  
+ 0: \x{970}
+
+/^\p{Inherited}/8
+    \x{64b}
+ 0: \x{64b}
+    \x{654}
+ 0: \x{654}
+    \x{655}
+ 0: \x{655}
+    \x{200c} 
+ 0: \x{200c}
+    ** Failers
+No match
+    \x{64a}
+No match
+    \x{656}     
+No match
+
+/^\p{Shavian}/8
+    \x{10450}
+ 0: \x{10450}
+    \x{1047f}
+ 0: \x{1047f}
+    
+/^\p{Deseret}/8
+    \x{10400}
+ 0: \x{10400}
+    \x{1044f}
+ 0: \x{1044f}
+    
+/^\p{Osmanya}/8
+    \x{10480}
+ 0: \x{10480}
+    \x{1049d}
+ 0: \x{1049d}
+    \x{104a0}
+ 0: \x{104a0}
+    \x{104a9}
+ 0: \x{104a9}
+    ** Failers
+No match
+    \x{1049e}
+No match
+    \x{1049f}
+No match
+    \x{104aa}           
+No match
+
+/\p{Zl}{2,3}+/8BZ
+------------------------------------------------------------------
+        Bra
+        prop Zl {2}
+        prop Zl ?+
+        Ket
+        End
+------------------------------------------------------------------
+    \xe2\x80\xa8\xe2\x80\xa8
+ 0: \x{2028}\x{2028}
+    \x{2028}\x{2028}\x{2028}
+ 0: \x{2028}\x{2028}\x{2028}
+    
+/\p{Zl}/8BZ
+------------------------------------------------------------------
+        Bra
+        prop Zl
+        Ket
+        End
+------------------------------------------------------------------
+
+/\p{Lu}{3}+/8BZ
+------------------------------------------------------------------
+        Bra
+        prop Lu {3}
+        Ket
+        End
+------------------------------------------------------------------
+
+/\pL{2}+/8BZ
+------------------------------------------------------------------
+        Bra
+        prop L {2}
+        Ket
+        End
+------------------------------------------------------------------
+
+/\p{Cc}{2}+/8BZ
+------------------------------------------------------------------
+        Bra
+        prop Cc {2}
+        Ket
+        End
+------------------------------------------------------------------
+
 / End of testinput6 /
--- a/ext/pcre/pcrelib/ucptable.h
+++ b/ext/pcre/pcrelib/ucptable.h
@ -539,7 +539,8 @@ static const cnode ucp_table[] = {
  { 0x21000293, 0x14000000 },
  { 0x21000294, 0x1c000000 },
  { 0x21800295, 0x1400001a },
-  { 0x218002b0, 0x18000011 },
+  { 0x218002b0, 0x18000008 },
+  { 0x098002b9, 0x18000008 },
  { 0x098002c2, 0x60000003 },
  { 0x098002c6, 0x1800000b },
  { 0x098002d2, 0x6000000d },
@ -1039,15 +1040,18 @@ static const cnode ucp_table[] = {
  { 0x198005f3, 0x54000001 },
  { 0x09800600, 0x04000003 },
  { 0x0000060b, 0x5c000000 },
-  { 0x0980060c, 0x54000001 },
+  { 0x0900060c, 0x54000000 },
+  { 0x0000060d, 0x54000000 },
  { 0x0080060e, 0x68000001 },
  { 0x00800610, 0x30000005 },
  { 0x0900061b, 0x54000000 },
-  { 0x0080061e, 0x54000001 },
+  { 0x0000061e, 0x54000000 },
+  { 0x0900061f, 0x54000000 },
  { 0x00800621, 0x1c000019 },
  { 0x09000640, 0x18000000 },
  { 0x00800641, 0x1c000009 },
-  { 0x1b80064b, 0x30000013 },
+  { 0x1b80064b, 0x3000000a },
+  { 0x00800656, 0x30000008 },
  { 0x09800660, 0x34000009 },
  { 0x0080066a, 0x54000003 },
  { 0x0080066e, 0x1c000001 },
@ -1074,7 +1078,8 @@ static const cnode ucp_table[] = {
  { 0x31000711, 0x30000000 },
  { 0x31800712, 0x1c00001d },
  { 0x31800730, 0x3000001a },
-  { 0x3180074d, 0x1c000020 },
+  { 0x3180074d, 0x1c000002 },
+  { 0x00800750, 0x1c00001d },
  { 0x37800780, 0x1c000025 },
  { 0x378007a6, 0x3000000a },
  { 0x370007b1, 0x1c000000 },
@ -1460,7 +1465,10 @@ static const cnode ucp_table[] = {
  { 0x1f0017dd, 0x30000000 },
  { 0x1f8017e0, 0x34000009 },
  { 0x1f8017f0, 0x3c000009 },
-  { 0x25801800, 0x54000005 },
+  { 0x25801800, 0x54000001 },
+  { 0x09801802, 0x54000001 },
+  { 0x25001804, 0x54000000 },
+  { 0x09001805, 0x54000000 },
  { 0x25001806, 0x44000000 },
  { 0x25801807, 0x54000003 },
  { 0x2580180b, 0x30000002 },
@ -1513,14 +1521,20 @@ static const cnode ucp_table[] = {
  { 0x3d801b61, 0x68000009 },
  { 0x3d801b6b, 0x30000008 },
  { 0x3d801b74, 0x68000008 },
-  { 0x21801d00, 0x1400002b },
-  { 0x21801d2c, 0x18000035 },
-  { 0x21801d62, 0x14000015 },
+  { 0x21801d00, 0x14000025 },
+  { 0x13801d26, 0x14000004 },
+  { 0x0c001d2b, 0x14000000 },
+  { 0x21801d2c, 0x18000030 },
+  { 0x13801d5d, 0x18000004 },
+  { 0x21801d62, 0x14000003 },
+  { 0x13801d66, 0x14000004 },
+  { 0x21801d6b, 0x1400000c },
  { 0x0c001d78, 0x18000000 },
  { 0x21801d79, 0x14000003 },
  { 0x21001d7d, 0x14000ee6 },
  { 0x21801d7e, 0x1400001c },
-  { 0x21801d9b, 0x18000024 },
+  { 0x21801d9b, 0x18000023 },
+  { 0x13001dbf, 0x18000000 },
  { 0x1b801dc0, 0x3000000a },
  { 0x1b801dfe, 0x30000001 },
  { 0x21001e00, 0x24000001 },
@ -1982,7 +1996,9 @@ static const cnode ucp_table[] = {
  { 0x13001ffc, 0x2000fff7 },
  { 0x13801ffd, 0x60000001 },
  { 0x09802000, 0x7400000a },
-  { 0x0980200b, 0x04000004 },
+  { 0x0900200b, 0x04000000 },
+  { 0x1b80200c, 0x04000001 },
+  { 0x0980200e, 0x04000001 },
  { 0x09802010, 0x44000005 },
  { 0x09802016, 0x54000001 },
  { 0x09002018, 0x50000000 },
@ -2615,7 +2631,8 @@ static const cnode ucp_table[] = {
  { 0x090030a0, 0x44000000 },
  { 0x1d8030a1, 0x1c000059 },
  { 0x090030fb, 0x54000000 },
-  { 0x098030fc, 0x18000002 },
+  { 0x090030fc, 0x18000000 },
+  { 0x1d8030fd, 0x18000001 },
  { 0x1d0030ff, 0x1c000000 },
  { 0x03803105, 0x1c000027 },
  { 0x17803131, 0x1c00005d },
@ -2630,7 +2647,8 @@ static const cnode ucp_table[] = {
  { 0x0980322a, 0x68000019 },
  { 0x09003250, 0x68000000 },
  { 0x09803251, 0x3c00000e },
-  { 0x17803260, 0x6800001f },
+  { 0x17803260, 0x6800001d },
+  { 0x0980327e, 0x68000001 },
  { 0x09803280, 0x3c000009 },
  { 0x0980328a, 0x68000026 },
  { 0x098032b1, 0x3c00000e },
@ -2678,7 +2696,8 @@ static const cnode ucp_table[] = {
  { 0x1900fb3e, 0x1c000000 },
  { 0x1980fb40, 0x1c000001 },
  { 0x1980fb43, 0x1c000001 },
-  { 0x1980fb46, 0x1c00006b },
+  { 0x1980fb46, 0x1c000009 },
+  { 0x0080fb50, 0x1c000061 },
  { 0x0080fbd3, 0x1c00016a },
  { 0x0900fd3e, 0x58000000 },
  { 0x0900fd3f, 0x48000000 },
@ -2944,7 +2963,8 @@ static const cnode ucp_table[] = {
  { 0x0d01044d, 0x1400ffd8 },
  { 0x0d01044e, 0x1400ffd8 },
  { 0x0d01044f, 0x1400ffd8 },
-  { 0x2e810450, 0x1c00004d },
+  { 0x2e810450, 0x1c00002f },
+  { 0x2c810480, 0x1c00001d },
  { 0x2c8104a0, 0x34000009 },
  { 0x0b810800, 0x1c000005 },
  { 0x0b010808, 0x1c000000 },