*** empty log message ***
parent
97c9603b02
commit
beb6916cfc
@ -1,519 +0,0 @@
ChangeLog for PCRE
------------------


Version 2.08 31-Aug-99
----------------------

1. When startoffset was not zero and the pattern began with ".*", PCRE was not
trying to match at the startoffset position, but instead was moving forward to
the next newline as if a previous match had failed.

2. pcretest was not making use of PCRE_NOTEMPTY when repeating for /g and /G,
and could get into a loop if a null string was matched other than at the start
of the subject.

3. Added definitions of PCRE_MAJOR and PCRE_MINOR to pcre.h so the version can
be distinguished at compile time, and for completeness also added PCRE_DATE.

5. Added Paul Sokolovsky's minor changes to make it easy to compile a Win32 DLL
in GnuWin32 environments.
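
Item 3 above is what makes a compile-time version test possible. The fragment
below is only a sketch of how such a test might look. It assumes PCRE_MAJOR and
PCRE_MINOR expand to plain integers; the stand-in values and the PCRE_AT_LEAST
and has_startoffset names are hypothetical, present only so that the fragment
compiles without pcre.h.

```c
/* Stand-in values so this sketch compiles on its own. In real code they
   come from pcre.h (2.08 or later); they are NOT defined by older headers. */
#ifndef PCRE_MAJOR
#define PCRE_MAJOR 2
#define PCRE_MINOR 8
#endif

/* Hypothetical helper, not part of PCRE: true if the header in use is at
   least version major.minor. Usable in #if as well as in ordinary code. */
#define PCRE_AT_LEAST(major, minor) \
  (PCRE_MAJOR > (major) || (PCRE_MAJOR == (major) && PCRE_MINOR >= (minor)))

/* Selects code at compile time: the startoffset argument of pcre_exec()
   was added in 2.06. Returns 1 if it is available, 0 otherwise. */
int has_startoffset(void)
{
#if PCRE_AT_LEAST(2, 6)
  return 1;
#else
  return 0;
#endif
}
```

Because the macros first appear in 2.08, a real program would also have to
treat their absence as "older than 2.08" rather than supply defaults.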


Version 2.07 29-Jul-99
----------------------

1. The documentation is now supplied in plain text form and HTML as well as in
the form of man page sources.

2. C++ compilers don't like assigning (void *) values to other pointer types.
In particular this affects malloc(). Although there is no problem in Standard
C, I've put in casts to keep C++ compilers happy.

3. Typo on pcretest.c; a cast of (unsigned char *) in the POSIX regexec() call
should be (const char *).

4. If NOPOSIX is defined, pcretest.c compiles without POSIX support. This may
be useful on non-Unix systems whose users don't want to bother with the POSIX
stuff. However, I haven't made this a standard facility. The documentation
doesn't mention it, and the Makefile doesn't support it.

5. The Makefile now contains an "install" target, with editable destinations at
the top of the file. The pcretest program is not installed.

6. pgrep -V now gives the PCRE version number and date.

7. Fixed bug: a zero repetition after a literal string (e.g. /abcde{0}/) was
causing the entire string to be ignored, instead of just the last character.

8. If a pattern like /"([^\\"]+|\\.)*"/ is applied in the normal way to a
non-matching string, it can take a very, very long time, even for strings of
quite modest length, because of the nested recursion. PCRE now does better in
some of these cases. It does this by remembering the last required literal
character in the pattern, and pre-searching the subject to ensure it is present
before running the real match. In other words, it applies a heuristic to detect
some types of certain failure quickly, and in the above example, if presented
with a string that has no trailing " it gives "no match" very quickly.

9. A new runtime option PCRE_NOTEMPTY causes null string matches to be ignored;
other alternatives are tried instead.
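
The pre-search described in item 8 above can be pictured with a small sketch.
This is not PCRE's actual code: the function name and its arguments are
hypothetical, and it checks only that the required character occurs somewhere
at or after the starting position.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pre-check, not PCRE's actual code. Returns 1 if the byte
   req_char occurs anywhere in subject[start..length), and 0 otherwise. A 0
   result means that the real (possibly very slow) match cannot succeed, so
   it need not be attempted. */
int required_char_present(const char *subject, size_t length, size_t start,
                          int req_char)
{
  assert(subject != NULL && start <= length);
  return memchr(subject + start, req_char, length - start) != NULL;
}
```

For the pattern in item 8 the required character is the closing ", so a
subject with an opening " but no later " is rejected by a single scan instead
of by exhaustive backtracking.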


Version 2.06 09-Jun-99
----------------------

1. Change pcretest's output for amount of store used to show just the code
space, because the remainder (the data block) varies in size between 32-bit and
64-bit systems.

2. Added an extra argument to pcre_exec() to supply an offset in the subject to
start matching at. This allows lookbehinds to work when searching for multiple
occurrences in a string.

3. Added additional options to pcretest for testing multiple occurrences:

   /+   outputs the rest of the string that follows a match
   /g   loops for multiple occurrences, using the new startoffset argument
   /G   loops for multiple occurrences by passing an incremented pointer

4. PCRE wasn't doing the "first character" optimization for patterns starting
with \b or \B, though it was doing it for other lookbehind assertions. That is,
it wasn't noticing that a match for a pattern such as /\bxyz/ has to start with
the letter 'x'. On long subject strings, this gives a significant speed-up.


Version 2.05 21-Apr-99
----------------------

1. Changed the type of magic_number from int to long int so that it works
properly on 16-bit systems.

2. Fixed a bug which caused patterns starting with .* not to work correctly
when the subject string contained newline characters. PCRE was assuming
anchoring for such patterns in all cases, which is not correct because .* will
not pass a newline unless PCRE_DOTALL is set. It now assumes anchoring only if
DOTALL is set at top level; otherwise it knows that patterns starting with .*
must be retried after every newline in the subject.
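
The retry rule in item 2 above amounts to a short list of starting points: the
start of the subject and the position just after each newline. The helper
below is a hypothetical illustration of that rule, not PCRE's actual code; its
name and interface are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not PCRE's actual code. Records in starts[] the
   offsets at which a pattern beginning with ".*" has to be tried when DOTALL
   is not set: offset 0 and the offset just after every newline. Returns the
   number of such offsets found; only the first max of them are stored. */
size_t dotstar_start_points(const char *subject, size_t length,
                            size_t *starts, size_t max)
{
  size_t count = 0, i;

  assert(subject != NULL && starts != NULL);
  if (count < max) starts[count] = 0;
  count++;
  for (i = 0; i < length; i++)
    {
    if (subject[i] == '\n')
      {
      if (count < max) starts[count] = i + 1;
      count++;
      }
    }
  return count;
}
```

Other positions within a line need no separate attempt, because a leading .*
tried at the start of the line already covers them by backtracking.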


Version 2.04 18-Feb-99
----------------------

1. For parenthesized subpatterns with repeats whose minimum was zero, the
computation of the store needed to hold the pattern was incorrect (too large).
If such patterns were nested a few deep, this could multiply and become a real
problem.

2. Added /M option to pcretest to show the memory requirement of a specific
pattern. Made -m a synonym of -s (which does this globally) for compatibility.

3. Subpatterns of the form (regex){n,m} (i.e. limited maximum) were being
compiled in such a way that the backtracking after subsequent failure was
pessimal. Something like (a){0,3} was compiled as (a)?(a)?(a)? instead of
((a)((a)(a)?)?)? with disastrous performance if the maximum was of any size.


Version 2.03 02-Feb-99
----------------------

1. Fixed typo and small mistake in man page.

2. Added 4th condition (GPL supersedes if conflict) and created separate
LICENCE file containing the conditions.

3. Updated pcretest so that patterns such as /abc\/def/ work like they do in
Perl, that is the internal \ allows the delimiter to be included in the
pattern. Locked out the use of \ as a delimiter. If \ immediately follows
the final delimiter, add \ to the end of the pattern (to test the error).

4. Added the convenience functions for extracting substrings after a successful
match. Updated pcretest to make it able to test these functions.


Version 2.02 14-Jan-99
----------------------

1. Initialized the working variables associated with each extraction so that
their saving and restoring doesn't refer to uninitialized store.

2. Put dummy code into study.c in order to trick the optimizer of the IBM C
compiler for OS/2 into generating correct code. Apparently IBM isn't going to
fix the problem.

3. Pcretest: the timing code wasn't using LOOPREPEAT for timing execution
calls, and wasn't printing the correct value for compiling calls. Increased the
default value of LOOPREPEAT, and the number of significant figures in the
times.

4. Changed "/bin/rm" in the Makefile to "-rm" so it works on Windows NT.

5. Renamed "deftables" as "dftables" to get it down to 8 characters, to avoid
a building problem on Windows NT with a FAT file system.


Version 2.01 21-Oct-98
----------------------

1. Changed the API for pcre_compile() to allow for the provision of a pointer
to character tables built by pcre_maketables() in the current locale. If NULL
is passed, the default tables are used.


Version 2.00 24-Sep-98
----------------------

1. Since the (?>) facility is in Perl 5.005, don't require PCRE_EXTRA to enable
it any more.

2. Allow quantification of (?>) groups, and make it work correctly.

3. The first character computation wasn't working for (?>) groups.

4. Correct the implementation of \Z (it is permitted to match on the \n at the
end of the subject) and add 5.005's \z, which really does match only at the
very end of the subject.

5. Remove the \X "cut" facility; Perl doesn't have it, and (?> is neater.

6. Remove the ability to specify CASELESS, MULTILINE, DOTALL, and
DOLLAR_ENDONLY at runtime, to make it possible to implement the Perl 5.005
localized options. All options to pcre_study() were also removed.

7. Add other new features from 5.005:

   (?<=            positive lookbehind
   (?<!            negative lookbehind
   (?imsx-imsx)    added the unsetting capability
                   such a setting is global if at outer level; local otherwise
   (?imsx-imsx:)   non-capturing groups with option setting
   (?(cond)re|re)  conditional pattern matching

   A backreference to itself in a repeated group matches the previous
   captured string.

8. General tidying up of studying (both automatic and via "study")
consequential on the addition of new assertions.

9. As in 5.005, unlimited repeated groups that could match an empty substring
are no longer faulted at compile time. Instead, the loop is forcibly broken at
runtime if any iteration does actually match an empty substring.

10. Include the RunTest script in the distribution.

11. Added tests from the Perl 5.005_02 distribution. This showed up a few
discrepancies, some of which were old and were also with respect to 5.004. They
have now been fixed.


Version 1.09 28-Apr-98
----------------------

1. A negated single character class followed by a quantifier with a minimum
value of one (e.g. [^x]{1,6} ) was not compiled correctly. This could lead to
program crashes, or just wrong answers. This did not apply to negated classes
containing more than one character, or to minima other than one.


Version 1.08 27-Mar-98
----------------------

1. Add PCRE_UNGREEDY to invert the greediness of quantifiers.

2. Add (?U) and (?X) to set PCRE_UNGREEDY and PCRE_EXTRA respectively. The
latter must appear before anything that relies on it in the pattern.


Version 1.07 16-Feb-98
----------------------

1. A pattern such as /((a)*)*/ was not being diagnosed as in error (unlimited
repeat of a potentially empty string).


Version 1.06 23-Jan-98
----------------------

1. Added Markus Oberhumer's little patches for C++.

2. Literal strings longer than 255 characters were broken.


Version 1.05 23-Dec-97
----------------------

1. Negated character classes containing more than one character were failing if
PCRE_CASELESS was set at run time.


Version 1.04 19-Dec-97
----------------------

1. Corrected the man page, where some "const" qualifiers had been omitted.

2. Made debugging output print "{0,xxx}" instead of just "{,xxx}" to agree with
input syntax.

3. Fixed memory leak which occurred when a regex with back references was
matched with an offsets vector that wasn't big enough. The temporary memory
that is used in this case wasn't being freed if the match failed.

4. Tidied pcretest to ensure it frees memory that it gets.

5. Temporary memory was being obtained in the case where the passed offsets
vector was exactly big enough.

6. Corrected definition of offsetof() from change 5 below.

7. I had screwed up change 6 below and broken the rules for the use of
setjmp(). Now fixed.


Version 1.03 18-Dec-97
----------------------

1. An erroneous regex with a missing opening parenthesis was correctly
diagnosed, but PCRE attempted to access brastack[-1], which could cause crashes
on some systems.

2. Replaced offsetof(real_pcre, code) by offsetof(real_pcre, code[0]) because
it was reported that one broken compiler failed on the former because "code" is
also an independent variable.

3. The erroneous regex a[]b caused an array overrun reference.

4. A regex ending with a one-character negative class (e.g. /[^k]$/) did not
fail on data ending with that character. (It was going on too far, and checking
the next character, typically a binary zero.) This was specific to the
optimized code for single-character negative classes.

5. Added a contributed patch from the TIN world which does the following:

   + Add an undef for memmove, in case the system defines a macro for it.

   + Add a definition of offsetof(), in case there isn't one. (I don't know
     the reason behind this - offsetof() is part of the ANSI standard - but
     it does no harm).

   + Reduce the ifdef's in pcre.c using macro DPRINTF, thereby eliminating
     most of the places where whitespace preceded '#'. I have given up and
     allowed the remaining 2 cases to be at the margin.

   + Rename some variables in pcre to eliminate shadowing. This seems very
     pedantic, but does no harm, of course.

6. Moved the call to setjmp() into its own function, to get rid of warnings
from gcc -Wall, and avoided calling it at all unless PCRE_EXTRA is used.

7. Constructs such as \d{8,} were compiling into the equivalent of
\d{8}\d{0,65527} instead of \d{8}\d* which didn't make much difference to the
outcome, but in this particular case used more store than had been allocated,
which caused the bug to be discovered because it threw up an internal error.

8. The debugging code in both pcre and pcretest for outputting the compiled
form of a regex was going wrong in the case of back references followed by
curly-bracketed repeats.


Version 1.02 12-Dec-97
----------------------

1. Typos in pcre.3 and comments in the source fixed.

2. Applied a contributed patch to get rid of places where it used to remove
'const' from variables, and fixed some signed/unsigned and uninitialized
variable warnings.

3. Added the "runtest" target to Makefile.

4. Set default compiler flag to -O2 rather than just -O.


Version 1.01 19-Nov-97
----------------------

1. PCRE was failing to diagnose unlimited repeat of empty string for patterns
like /([ab]*)*/, that is, for classes with more than one character in them.

2. Likewise, it wasn't diagnosing patterns with "once-only" subpatterns, such
as /((?>a*))*/ (a PCRE_EXTRA facility).


Version 1.00 18-Nov-97
----------------------

1. Added compile-time macros to support systems such as SunOS4 which don't have
memmove() or strerror() but have other things that can be used instead.

2. Arranged that "make clean" removes the executables.


Version 0.99 27-Oct-97
----------------------

1. Fixed bug in code for optimizing classes with only one character. It was
initializing a 32-byte map regardless, which could cause it to run off the end
of the memory it had got.

2. Added, conditional on PCRE_EXTRA, the proposed (?>REGEX) construction.


Version 0.98 22-Oct-97
----------------------

1. Fixed bug in code for handling temporary memory usage when there are more
back references than supplied space in the ovector. This could cause segfaults.


Version 0.97 21-Oct-97
----------------------

1. Added the \X "cut" facility, conditional on PCRE_EXTRA.

2. Optimized negated single characters not to use a bit map.

3. Brought error texts together as macro definitions; clarified some of them;
fixed one that was wrong - it said "range out of order" when it meant "invalid
escape sequence".

4. Changed some char * arguments to const char *.

5. Added PCRE_NOTBOL and PCRE_NOTEOL (from POSIX).

6. Added the POSIX-style API wrapper in pcreposix.a and testing facilities in
pcretest.


Version 0.96 16-Oct-97
----------------------

1. Added a simple "pgrep" utility to the distribution.

2. Fixed an incompatibility with Perl: "{" is now treated as a normal character
unless it appears in one of the precise forms "{ddd}", "{ddd,}", or "{ddd,ddd}"
where "ddd" means "one or more decimal digits".

3. Fixed serious bug. If a pattern had a back reference, but the call to
pcre_exec() didn't supply a large enough ovector to record the related
identifying subpattern, the match always failed. PCRE now remembers the number
of the largest back reference, and gets some temporary memory in which to save
the offsets during matching if necessary, in order to ensure that
backreferences always work.

4. Increased the compatibility with Perl in a number of ways:

   (a) . no longer matches \n by default; an option PCRE_DOTALL is provided
       to request this handling. The option can be set at compile or exec time.

   (b) $ matches before a terminating newline by default; an option
       PCRE_DOLLAR_ENDONLY is provided to override this (but not in multiline
       mode). The option can be set at compile or exec time.

   (c) The handling of \ followed by a digit other than 0 is now supposed to be
       the same as Perl's. If the decimal number it represents is less than 10
       or there aren't that many previous left capturing parentheses, an octal
       escape is read. Inside a character class, it's always an octal escape,
       even if it is a single digit.

   (d) An escaped but undefined alphabetic character is taken as a literal,
       unless PCRE_EXTRA is set. Currently this just reserves the remaining
       escapes.

   (e) {0} is now permitted. (The previous item is removed from the compiled
       pattern).

5. Changed all the names of code files so that the basic parts are no longer
than 10 characters, and abolished the teeny "globals.c" file.

6. Changed the handling of character classes; they are now done with a 32-byte
bit map always.

7. Added the -d and /D options to pcretest to make it possible to look at the
internals of compilation without having to recompile pcre.
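
The rule in item 2 above, that "{" starts a quantifier only in the three
precise forms listed, can be written down directly. The function below is a
hypothetical sketch of that test, not the code PCRE uses; the name
is_counted_repeat is invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not PCRE's actual code. p points at a '{'. Returns 1
   if the text has one of the exact forms {ddd}, {ddd,} or {ddd,ddd}, where
   ddd is one or more decimal digits. Otherwise returns 0, meaning the '{'
   is to be treated as an ordinary character. */
int is_counted_repeat(const char *p)
{
  assert(p != NULL && *p == '{');
  p++;
  if (!isdigit((unsigned char)*p)) return 0;
  while (isdigit((unsigned char)*p)) p++;
  if (*p == '}') return 1;                 /* {ddd} */
  if (*p++ != ',') return 0;
  if (*p == '}') return 1;                 /* {ddd,} */
  if (!isdigit((unsigned char)*p)) return 0;
  while (isdigit((unsigned char)*p)) p++;
  return *p == '}';                        /* {ddd,ddd} */
}
```

With this test "a{3,5}" contains a quantifier, whereas in "a{,5}" and "a{x}"
the brace is just a literal character.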


Version 0.95 23-Sep-97
----------------------

1. Fixed bug in pre-pass concerning escaped "normal" characters such as \x5c or
\x20 at the start of a run of normal characters. These were being treated as
real characters, instead of the source characters being re-checked.


Version 0.94 18-Sep-97
----------------------

1. The functions are now thread-safe, with the caveat that the global variables
containing pointers to malloc() and free() or alternative functions are the
same for all threads.

2. Get pcre_study() to generate a bitmap of initial characters for non-
anchored patterns when this is possible, and use it if passed to pcre_exec().


Version 0.93 15-Sep-97
----------------------

1. /(b)|(:+)/ was computing an incorrect first character.

2. Add pcre_study() to the API and the passing of pcre_extra to pcre_exec(),
but not actually doing anything yet.

3. Treat "-" characters in classes that cannot be part of ranges as literals,
as Perl does (e.g. [-az] or [az-]).

4. Set the anchored flag if a branch starts with .* or .*? because that tests
all possible positions.

5. Split up into different modules to avoid including unneeded functions in a
compiled binary. However, compile and exec are still in one module. The "study"
function is split off.

6. The character tables are now in a separate module whose source is generated
by an auxiliary program - but can then be edited by hand if required. There are
now no calls to isalnum(), isspace(), isdigit(), isxdigit(), tolower() or
toupper() in the code.

7. Turn the malloc/free function variables into pcre_malloc and pcre_free and
make them global. Abolish the function for setting them, as the caller can now
set them directly.


Version 0.92 11-Sep-97
----------------------

1. A repeat with a fixed maximum and a minimum of 1 for an ordinary character
(e.g. /a{1,3}/) was broken (I mis-optimized it).

2. Caseless matching was not working in character classes if the characters in
the pattern were in upper case.

3. Make ranges like [W-c] work in the same way as Perl for caseless matching.

4. Make PCRE_ANCHORED public and accept as a compile option.

5. Add an options word to pcre_exec() and accept PCRE_ANCHORED and
PCRE_CASELESS at run time. Add escapes \A and \I to pcretest to cause it to
pass them.

6. Give an error if bad option bits passed at compile or run time.

7. Add PCRE_MULTILINE at compile and exec time, and (?m) as well. Add \M to
pcretest to cause it to pass that flag.

8. Add pcre_info(), to get the number of identifying subpatterns, the stored
options, and the first character, if set.

9. Recognize C+ or C{n,m} where n >= 1 as providing a fixed starting character.


Version 0.91 10-Sep-97
----------------------

1. PCRE was failing to diagnose unlimited repeats of subpatterns that could
match the empty string as in /(a*)*/. It was looping and ultimately crashing.

2. PCRE was looping on encountering an indefinitely repeated back reference to
a subpattern that had matched an empty string, e.g. /(a|)\1*/. It now does what
Perl does - treats the match as successful.

****
@ -1,32 +0,0 @@
PCRE LICENCE
------------

PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.

Written by: Philip Hazel <ph10@cam.ac.uk>

University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714.

Copyright (c) 1997-1999 University of Cambridge

Permission is granted to anyone to use this software for any purpose on any
computer system, and to redistribute it freely, subject to the following
restrictions:

1. This software is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

2. The origin of this software must not be misrepresented, either by
   explicit claim or by omission.

3. Altered versions must be plainly marked as such, and must not be
   misrepresented as being the original software.

4. If PCRE is embedded in any software that is released under the GNU
   General Purpose Licence (GPL), then the terms of that licence shall
   supersede any condition above with which it is incompatible.

End
@ -1,11 +0,0 @@
DEPTH = ../../..
topsrcdir = @topsrcdir@
srcdir = @srcdir@
VPATH = @srcdir@

LTLIBRARY_NAME = libpcre.la
LTLIBRARY_SOURCES = maketables.c get.c study.c pcre.c

include $(topsrcdir)/build/ltlib.mk
@ -1,416 +0,0 @@
README file for PCRE (Perl-compatible regular expressions)
----------------------------------------------------------

*******************************************************************************
* IMPORTANT FOR THOSE UPGRADING FROM VERSIONS BEFORE 2.00                     *
*                                                                             *
* Please note that there has been a change in the API such that a larger      *
* ovector is required at matching time, to provide some additional workspace. *
* The new man page has details. This change was necessary in order to support *
* some of the new functionality in Perl 5.005.                                *
*                                                                             *
* IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.00                             *
*                                                                             *
* Another (I hope this is the last!) change has been made to the API for the  *
* pcre_compile() function. An additional argument has been added to make it   *
* possible to pass over a pointer to character tables built in the current    *
* locale by pcre_maketables(). To use the default tables, this new argument   *
* should be passed as NULL.                                                   *
*                                                                             *
* IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.05                             *
*                                                                             *
* Yet another (and again I hope this really is the last) change has been made *
* to the API for the pcre_exec() function. An additional argument has been    *
* added to make it possible to start the match other than at the start of the *
* subject string. This is important if there are lookbehinds. The new man     *
* page has the details, but if you just want to convert existing programs,    *
* all you need to do is to stick in a new fifth argument to pcre_exec(), with *
* a value of zero. For example, change                                        *
*                                                                             *
*   pcre_exec(pattern, extra, subject, length, options, ovec, ovecsize)       *
* to                                                                          *
*   pcre_exec(pattern, extra, subject, length, 0, options, ovec, ovecsize)    *
*******************************************************************************
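
Why a start offset matters when there are lookbehinds can be seen with a toy
example. The code below does not use PCRE at all: find_b_after_b() is a
hypothetical stand-in for a matcher that implements only the pattern (?<=b)b,
and the two counting functions are invented for the sketch. One passes the
whole subject with an offset, the other passes an incremented pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a matcher, not PCRE. Finds the first 'b' at or after
   start_offset that is immediately preceded by another 'b', that is, the
   pattern (?<=b)b. Returns the offset of the match, or -1 if there is none.
   The lookbehind may inspect the byte before start_offset, but never
   anything before the beginning of subject. */
long find_b_after_b(const char *subject, size_t length, size_t start_offset)
{
  size_t i;

  assert(subject != NULL && start_offset <= length);
  for (i = start_offset; i < length; i++)
    {
    if (subject[i] == 'b' && i > 0 && subject[i - 1] == 'b')
      return (long)i;
    }
  return -1;
}

/* Counts all matches by passing the whole subject plus a start offset. */
int count_with_offset(const char *subject, size_t length)
{
  int count = 0;
  size_t start = 0;
  long pos;

  while ((pos = find_b_after_b(subject, length, start)) >= 0)
    {
    count++;
    start = (size_t)pos + 1;       /* continue after the matched character */
    }
  return count;
}

/* Counts matches by passing an incremented pointer instead, so the text
   before each restart point is invisible to the lookbehind. */
int count_with_pointer(const char *subject, size_t length)
{
  int count = 0;
  long pos;

  while ((pos = find_b_after_b(subject, length, 0)) >= 0)
    {
    count++;
    subject += pos + 1;
    length -= (size_t)pos + 1;
    }
  return count;
}
```

On the subject "abbb" the offset version finds two matches and the pointer
version only one, because after the first match the remaining text "b" no
longer has a visible preceding character.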


The distribution should contain the following files:

  ChangeLog         log of changes to the code
  LICENCE           conditions for the use of PCRE
  Makefile          for building PCRE in Unix systems
  README            this file
  RunTest           a Unix shell script for running tests
  Tech.Notes        notes on the encoding
  pcre.3            man page source for the functions
  pcre.3.txt        plain text version
  pcre.3.html       HTML version
  pcreposix.3       man page source for the POSIX wrapper API
  pcreposix.3.txt   plain text version
  pcreposix.3.HTML  HTML version
  dftables.c        auxiliary program for building chartables.c
  get.c             )
  maketables.c      )
  study.c           ) source of
  pcre.c            )   the functions
  pcreposix.c       )
  pcre.h            header for the external API
  pcreposix.h       header for the external POSIX wrapper API
  internal.h        header for internal use
  pcretest.c        test program
  pgrep.1           man page source for pgrep
  pgrep.1.txt       plain text version
  pgrep.1.HTML      HTML version
  pgrep.c           source of a grep utility that uses PCRE
  perltest          Perl test program
  testinput1        test data, compatible with Perl 5.004 and 5.005
  testinput2        test data for error messages and non-Perl things
  testinput3        test data, compatible with Perl 5.005
  testinput4        test data for locale-specific tests
  testoutput1       test results corresponding to testinput1
  testoutput2       test results corresponding to testinput2
  testoutput3       test results corresponding to testinput3
  testoutput4       test results corresponding to testinput4
  dll.mk            for Win32 DLL
  pcre.def          ditto
|
||||
|
||||
To build PCRE on a Unix system, first edit Makefile for your system. It is a
|
||||
fairly simple make file, and there are some comments near the top, after the
|
||||
text "On a Unix system". Then run "make". It builds two libraries called
|
||||
libpcre.a and libpcreposix.a, a test program called pcretest, and the pgrep
|
||||
command. You can use "make install" to copy these, and the public header file
|
||||
pcre.h, to appropriate live directories on your system. These installation
|
||||
directories are defined at the top of the Makefile, and you should edit them if
|
||||
necessary.
|
||||
|
||||
For a non-Unix system, read the comments at the top of Makefile, which give
|
||||
some hints on what needs to be done. PCRE has been compiled on Windows systems
|
||||
and on Macintoshes, but I don't know the details as I don't use those systems.
|
||||
It should be straightforward to build PCRE on any system that has a Standard C
|
||||
compiler.
|
||||
|
||||
Some help in building a Win32 DLL of PCRE in GnuWin32 environments was
|
||||
contributed by Paul.Sokolovsky@technologist.com. These environments are
|
||||
Mingw32 (http://www.xraylith.wisc.edu/~khan/software/gnu-win32/) and
|
||||
CygWin (http://sourceware.cygnus.com/cygwin/). Paul comments:
|
||||
|
||||
For CygWin, set CFLAGS=-mno-cygwin, and do 'make dll'. You'll get
|
||||
pcre.dll (containing pcreposix also), libpcre.dll.a, and dynamically
|
||||
linked pgrep and pcretest. If you have /bin/sh, run RunTest (three
|
||||
main test go ok, locale not supported).
|
||||
|
||||
To test PCRE, run the RunTest script in the pcre directory. This can also be
|
||||
run by "make runtest". It runs the pcretest test program (which is documented
|
||||
below) on each of the testinput files in turn, and compares the output with the
|
||||
contents of the corresponding testoutput file. A file called testtry is used to
|
||||
hold the output from pcretest. To run pcretest on just one of the test files,
|
||||
give its number as an argument to RunTest, for example:
|
||||
|
||||
RunTest 3
|
||||
|
||||
The first and third test files can also be fed directly into the perltest
|
||||
script to check that Perl gives the same results. The third file requires the
|
||||
additional features of release 5.005, which is why it is kept separate from the
|
||||
main test input, which needs only Perl 5.004. In the long run, when 5.005 is
|
||||
widespread, these two test files may get amalgamated.
|
||||
|
||||
The second set of tests check pcre_info(), pcre_study(), pcre_copy_substring(),
|
||||
pcre_get_substring(), pcre_get_substring_list(), error detection and run-time
|
||||
flags that are specific to PCRE, as well as the POSIX wrapper API.
|
||||
|
||||
The fourth set of tests checks pcre_maketables(), the facility for building a
|
||||
set of character tables for a specific locale and using them instead of the
|
||||
default tables. The tests make use of the "fr" (French) locale. Before running
|
||||
the test, the script checks for the presence of this locale by running the
|
||||
"locale" command. If that command fails, or if it doesn't include "fr" in the
|
||||
list of available locales, the fourth test cannot be run, and a comment is
|
||||
output to say why. If running this test produces instances of the error
|
||||
|
||||
** Failed to set locale "fr"
|
||||
|
||||
in the comparison output, it means that locale is not available on your system,
|
||||
despite being listed by "locale". This does not mean that PCRE is broken.
|
||||
|
||||
PCRE has its own native API, but a set of "wrapper" functions that are based on
|
||||
the POSIX API are also supplied in the library libpcreposix.a. Note that this
|
||||
just provides a POSIX calling interface to PCRE: the regular expressions
|
||||
themselves still follow Perl syntax and semantics. The header file
|
||||
for the POSIX-style functions is called pcreposix.h. The official POSIX name is
|
||||
regex.h, but I didn't want to risk possible problems with existing files of
|
||||
that name by distributing it that way. To use it with an existing program that
|
||||
uses the POSIX API, it will have to be renamed or pointed at by a link.
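
As a rough illustration of the POSIX-style calling sequence, the sketch below
compiles a pattern with regcomp() and pulls out one captured substring with
regexec(). It is written against the system's own <regex.h> so that it builds
without PCRE installed. The assumption is that a program using the wrapper
would include pcreposix.h instead and link with libpcreposix.a, at which point
the pattern would follow Perl syntax. The name first_number() is made up for
this example.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first run of digits that follows "abc" in
   subject and copy it into out. Returns 0 on a match, non-zero otherwise. */
int first_number(const char *subject, char *out, size_t outsize)
{
  regex_t re;
  regmatch_t m[2];          /* m[0] = whole match, m[1] = first capture */
  size_t len;
  int rc;

  if (outsize == 0) return -1;
  if (regcomp(&re, "abc([0-9]+)", REG_EXTENDED) != 0) return -1;

  rc = regexec(&re, subject, 2, m, 0);
  if (rc == 0)
    {
    len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len >= outsize) len = outsize - 1;     /* truncate to fit */
    memcpy(out, subject + m[1].rm_so, len);
    out[len] = '\0';
    }

  regfree(&re);
  return rc;
}
```

The same four calls (regcomp, regexec, regerror, regfree) are what an existing
POSIX program already uses, which is why only the header and library need to
change.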
|
||||
|
||||
|
||||
Character tables
|
||||
----------------
|
||||
|
||||
PCRE uses four tables for manipulating and identifying characters. The final
|
||||
argument of the pcre_compile() function is a pointer to a block of memory
|
||||
containing the concatenated tables. A call to pcre_maketables() can be used to
|
||||
generate a set of tables in the current locale. If the final argument for
|
||||
pcre_compile() is passed as NULL, a set of default tables that is built into
|
||||
the binary is used.
|
||||
|
||||
The source file called chartables.c contains the default set of tables. This is
|
||||
not supplied in the distribution, but is built by the program dftables
|
||||
(compiled from dftables.c), which uses the ANSI C character handling functions
|
||||
such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
|
||||
sources. This means that the default C locale which is set for your system will
|
||||
control the contents of these default tables. You can change the default tables
|
||||
by editing chartables.c and then re-building PCRE. If you do this, you should
|
||||
probably also edit Makefile to ensure that the file doesn't ever get
|
||||
re-generated.
|
||||
|
||||
The first two 256-byte tables provide lower casing and case flipping functions,
|
||||
respectively. The next table consists of three 32-byte bit maps which identify
|
||||
digits, "word" characters, and white space, respectively. These are used when
|
||||
building 32-byte bit maps that represent character classes.
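
A minimal sketch of how the two case tables could be filled from the ANSI C
character functions. The function name build_case_tables() is invented for
this illustration; the real tables are generated by dftables.

```c
#include <ctype.h>

/* Fill the two 256-byte case tables described above, using the C library's
   idea of case in the current locale. */
void build_case_tables(unsigned char lower[256], unsigned char flip[256])
{
  int i;
  for (i = 0; i < 256; i++)
    {
    lower[i] = (unsigned char)tolower(i);
    /* Lower case letters flip to upper case, everything else goes through
       tolower(), which leaves non-letters unchanged. */
    flip[i] = (unsigned char)(islower(i) ? toupper(i) : tolower(i));
    }
}
```

With these tables, caseless comparison is a matter of one table lookup per
character rather than a function call.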
|
||||
|
||||
The final 256-byte table has bits indicating various character types, as
|
||||
follows:
|
||||
|
||||
      1   white space character
      2   letter
      4   decimal digit
      8   hexadecimal digit
     16   alphanumeric or '_'
    128   regular expression metacharacter or binary zero
|
||||
|
||||
You should not alter the set of characters that contain the 128 bit, as that
|
||||
will cause PCRE to malfunction.
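
The bit assignments above can be sketched as follows. The CT_ macro names and
the function build_ctypes() are invented for this example, and the list of
metacharacters is an assumption read off the characters that carry the 128 bit
in the default table, not a statement of what dftables.c contains.

```c
#include <ctype.h>
#include <string.h>

/* Bit values as listed above. The macro names are invented for this sketch. */
#define CT_SPACE    1
#define CT_LETTER   2
#define CT_DIGIT    4
#define CT_XDIGIT   8
#define CT_WORD    16
#define CT_META   128

void build_ctypes(unsigned char table[256])
{
  int i;
  for (i = 0; i < 256; i++)
    {
    int x = 0;
    if (isspace(i))  x |= CT_SPACE;
    if (isalpha(i))  x |= CT_LETTER;
    if (isdigit(i))  x |= CT_DIGIT;
    if (isxdigit(i)) x |= CT_XDIGIT;
    if (isalnum(i) || i == '_') x |= CT_WORD;
    /* Assumed metacharacter set. strchr() also matches the terminating zero
       of the string, so binary zero gets the bit as well. */
    if (strchr("*+?{^.$|()[", i) != NULL) x |= CT_META;
    table[i] = (unsigned char)x;
    }
}
```

In the default C locale a digit such as '0' ends up with 4 + 8 + 16, and a
letter such as 'A' with 2 + 8 + 16, because A-F are also hexadecimal digits.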
|
||||
|
||||
|
||||
The pcretest program
|
||||
--------------------
|
||||
|
||||
This program is intended for testing PCRE, but it can also be used for
|
||||
experimenting with regular expressions.
|
||||
|
||||
If it is given two filename arguments, it reads from the first and writes to
|
||||
the second. If it is given only one filename argument, it reads from that file
|
||||
and writes to stdout. Otherwise, it reads from stdin and writes to stdout, and
|
||||
prompts for each line of input.
|
||||
|
||||
The program handles any number of sets of input on a single input file. Each
|
||||
set starts with a regular expression, and continues with any number of data
|
||||
lines to be matched against the pattern. An empty line signals the end of the
|
||||
set. The regular expressions are given enclosed in any non-alphameric
|
||||
delimiters other than backslash, for example
|
||||
|
||||
/(a|bc)x+yz/
|
||||
|
||||
White space before the initial delimiter is ignored. A regular expression may
|
||||
be continued over several input lines, in which case the newline characters are
|
||||
included within it. See the testinput files for many examples. It is possible
|
||||
to include the delimiter within the pattern by escaping it, for example
|
||||
|
||||
/abc\/def/
|
||||
|
||||
If you do so, the escape and the delimiter form part of the pattern, but since
|
||||
delimiters are always non-alphameric, this does not affect its interpretation.
|
||||
If the terminating delimiter is immediately followed by a backslash, for
|
||||
example,
|
||||
|
||||
/abc/\
|
||||
|
||||
then a backslash is added to the end of the pattern. This is done to provide a
|
||||
way of testing the error condition that arises if a pattern finishes with a
|
||||
backslash, because
|
||||
|
||||
/abc\/
|
||||
|
||||
is interpreted as the first line of a pattern that starts with "abc/", causing
|
||||
pcretest to read the next line as a continuation of the regular expression.
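
The delimiter rules can be sketched as a scan for the first unescaped
occurrence of the opening delimiter. This is only an illustration of the
behaviour described above; find_pattern_end() is a hypothetical name and the
code is not taken from pcretest.c.

```c
#include <stddef.h>

/* Return the offset of the terminating delimiter in line, or -1 if the
   pattern is not closed on this line (in which case a continuation line
   would be read). line[0] is taken to be the opening delimiter. */
long find_pattern_end(const char *line)
{
  char delim = line[0];
  size_t i = 1;

  if (delim == '\0') return -1;
  while (line[i] != '\0')
    {
    if (line[i] == '\\' && line[i + 1] != '\0')
      i += 2;                          /* skip the escape and what it escapes */
    else if (line[i] == delim)
      return (long)i;
    else
      i++;
    }
  return -1;
}
```

For /abc/\ the scan stops at the second slash, leaving the backslash as the
first character after the delimiter. For /abc\/ the slash is skipped as an
escaped character and no terminator is found on that line.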
|
||||
|
||||
The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
|
||||
PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. For
|
||||
example:
|
||||
|
||||
/caseless/i
|
||||
|
||||
These modifier letters have the same effect as they do in Perl. There are
|
||||
others which set PCRE options that do not correspond to anything in Perl: /A,
|
||||
/E, and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.
|
||||
|
||||
Searching for all possible matches within each subject string can be requested
|
||||
by the /g or /G modifier. After finding a match, PCRE is called again to search
|
||||
the remainder of the subject string. The difference between /g and /G is that
|
||||
the former uses the startoffset argument to pcre_exec() to start searching at
|
||||
a new point within the entire string (which is in effect what Perl does),
|
||||
whereas the latter passes over a shortened substring. This makes a difference
|
||||
to the matching process if the pattern begins with a lookbehind assertion
|
||||
(including \b or \B).
|
||||
|
||||
If any call to pcre_exec() in a /g or /G sequence matches an empty string, the
|
||||
next call is done with the PCRE_NOTEMPTY flag set so that it cannot match an
|
||||
empty string again. This imitates the way Perl handles such cases when using
|
||||
the /g modifier or the split() function.
|
||||
|
||||
There are a number of other modifiers for controlling the way pcretest
|
||||
operates.
|
||||
|
||||
The /+ modifier requests that as well as outputting the substring that matched
|
||||
the entire pattern, pcretest should in addition output the remainder of the
|
||||
subject string. This is useful for tests where the subject contains multiple
|
||||
copies of the same substring.
|
||||
|
||||
The /L modifier must be followed directly by the name of a locale, for example,
|
||||
|
||||
/pattern/Lfr
|
||||
|
||||
For this reason, it must be the last modifier letter. The given locale is set,
|
||||
pcre_maketables() is called to build a set of character tables for the locale,
|
||||
and this is then passed to pcre_compile() when compiling the regular
|
||||
expression. Without an /L modifier, NULL is passed as the tables pointer; that
|
||||
is, /L applies only to the expression on which it appears.
|
||||
|
||||
The /I modifier requests that pcretest output information about the compiled
|
||||
expression (whether it is anchored, has a fixed first character, and so on). It
|
||||
does this by calling pcre_info() after compiling an expression, and outputting
|
||||
the information it gets back. If the pattern is studied, the results of that
|
||||
are also output.
|
||||
|
||||
The /D modifier is a PCRE debugging feature, which also assumes /I. It causes
|
||||
the internal form of compiled regular expressions to be output after
|
||||
compilation.
|
||||
|
||||
The /S modifier causes pcre_study() to be called after the expression has been
|
||||
compiled, and the results used when the expression is matched.
|
||||
|
||||
The /M modifier causes the size of the memory block used to hold the compiled
|
||||
pattern to be output.
|
||||
|
||||
Finally, the /P modifier causes pcretest to call PCRE via the POSIX wrapper API
|
||||
rather than its native API. When this is done, all other modifiers except /i,
|
||||
/m, and /+ are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is
|
||||
set if /m is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always,
|
||||
and PCRE_DOTALL unless REG_NEWLINE is set.
|
||||
|
||||
Before each data line is passed to pcre_exec(), leading and trailing whitespace
|
||||
is removed, and it is then scanned for \ escapes. The following are recognized:
|
||||
|
||||
  \a     alarm (= BEL)
  \b     backspace
  \e     escape
  \f     formfeed
  \n     newline
  \r     carriage return
  \t     tab
  \v     vertical tab
  \nnn   octal character (up to 3 octal digits)
  \xhh   hexadecimal character (up to 2 hex digits)

  \A     pass the PCRE_ANCHORED option to pcre_exec()
  \B     pass the PCRE_NOTBOL option to pcre_exec()
  \Cdd   call pcre_copy_substring() for substring dd after a successful match
           (any decimal number less than 32)
  \Gdd   call pcre_get_substring() for substring dd after a successful match
           (any decimal number less than 32)
  \L     call pcre_get_substring_list() after a successful match
  \N     pass the PCRE_NOTEMPTY option to pcre_exec()
  \Odd   set the size of the output vector passed to pcre_exec() to dd
           (any number of decimal digits)
  \Z     pass the PCRE_NOTEOL option to pcre_exec()
|
||||
|
||||
A backslash followed by anything else just escapes the anything else. If the
|
||||
very last character is a backslash, it is ignored. This gives a way of passing
|
||||
an empty line as data, since a real empty line terminates the data input.
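
A sketch of the character escapes only, under the assumption that they are
expanded in a single left-to-right pass. The option escapes (\A, \B, \Cdd and
so on) are deliberately not handled here, so in this sketch they would fall
through to the "anything else" case, which is not what pcretest does. The name
unescape_data() is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Expand character escapes from in into out, which must have room for
   strlen(in) bytes. Returns the number of bytes written. */
size_t unescape_data(const char *in, unsigned char *out)
{
  size_t n = 0;
  while (*in != '\0')
    {
    int c = (unsigned char)*in++;
    if (c == '\\')
      {
      int i;
      if (*in == '\0') break;            /* trailing backslash is ignored */
      c = (unsigned char)*in++;
      switch (c)
        {
        case 'a': c = 7;    break;       /* alarm (BEL) */
        case 'b': c = 8;    break;       /* backspace */
        case 'e': c = 27;   break;       /* escape */
        case 'f': c = 12;   break;       /* formfeed */
        case 'n': c = '\n'; break;
        case 'r': c = '\r'; break;
        case 't': c = '\t'; break;
        case 'v': c = 11;   break;       /* vertical tab */
        case 'x':                        /* up to 2 hex digits */
          c = 0;
          for (i = 0; i < 2 && isxdigit((unsigned char)*in); i++, in++)
            c = c * 16 + (isdigit((unsigned char)*in) ? *in - '0'
                          : tolower((unsigned char)*in) - 'a' + 10);
          break;
        default:
          if (c >= '0' && c <= '7')      /* up to 3 octal digits */
            {
            c -= '0';
            for (i = 1; i < 3 && *in >= '0' && *in <= '7'; i++, in++)
              c = c * 8 + (*in - '0');
            }
          /* anything else: the character stands for itself */
          break;
        }
      }
    out[n++] = (unsigned char)c;
    }
  return n;
}
```

A data line consisting of a single backslash therefore yields zero bytes,
which is how an empty subject string can be supplied.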
|
||||
|
||||
If /P was present on the regex, causing the POSIX wrapper API to be used, only
|
||||
\B and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to
|
||||
regexec() respectively.
|
||||
|
||||
When a match succeeds, pcretest outputs the list of captured substrings that
|
||||
pcre_exec() returns, starting with number 0 for the string that matched the
|
||||
whole pattern. Here is an example of an interactive pcretest run.
|
||||
|
||||
$ pcretest
|
||||
PCRE version 2.06 08-Jun-1999
|
||||
|
||||
re> /^abc(\d+)/
|
||||
data> abc123
|
||||
0: abc123
|
||||
1: 123
|
||||
data> xyz
|
||||
No match
|
||||
|
||||
If the strings contain any non-printing characters, they are output as \xhh
|
||||
escapes. If the pattern has the /+ modifier, then the output for substring 0 is
|
||||
followed by the rest of the subject string, identified by "0+" like this:
|
||||
|
||||
re> /cat/+
|
||||
data> cataract
|
||||
0: cat
|
||||
0+ aract
|
||||
|
||||
If the pattern has the /g or /G modifier, the results of successive matching
|
||||
attempts are output in sequence, like this:
|
||||
|
||||
re> /\Bi(\w\w)/g
|
||||
data> Mississippi
|
||||
0: iss
|
||||
1: ss
|
||||
0: iss
|
||||
1: ss
|
||||
0: ipp
|
||||
1: pp
|
||||
|
||||
"No match" is output only if the first match attempt fails.
|
||||
|
||||
If any of \C, \G, or \L are present in a data line that is successfully
|
||||
matched, the substrings extracted by the convenience functions are output with
|
||||
C, G, or L after the string number instead of a colon. This is in addition to
|
||||
the normal full list. The string length (that is, the return from the
|
||||
extraction function) is given in parentheses after each string for \C and \G.
|
||||
|
||||
Note that while patterns can be continued over several lines (a plain ">"
|
||||
prompt is used for continuations), data lines may not. However, newlines can be
|
||||
included in data by means of the \n escape.
|
||||
|
||||
If the -p option is given to pcretest, it is equivalent to adding /P to each
|
||||
regular expression: the POSIX wrapper API is used to call PCRE. None of the
|
||||
following flags has any effect in this case.
|
||||
|
||||
If the option -d is given to pcretest, it is equivalent to adding /D to each
|
||||
regular expression: the internal form is output after compilation.
|
||||
|
||||
If the option -i is given to pcretest, it is equivalent to adding /I to each
|
||||
regular expression: information about the compiled pattern is given after
|
||||
compilation.
|
||||
|
||||
If the option -m is given to pcretest, it outputs the size of each compiled
|
||||
pattern after it has been compiled. It is equivalent to adding /M to each
|
||||
regular expression. For compatibility with earlier versions of pcretest, -s is
|
||||
a synonym for -m.
|
||||
|
||||
If the -t option is given, each compile, study, and match is run 20000 times
|
||||
while being timed, and the resulting time per compile or match is output in
|
||||
milliseconds. Do not set -t with -s, because you will then get the size output
|
||||
20000 times and the timing will be distorted. If you want to change the number
|
||||
of repetitions used for timing, edit the definition of LOOPREPEAT at the top of
|
||||
pcretest.c.
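
The timing arithmetic can be sketched like this. LOOPREPEAT and its value come
from the text above; the use of clock(), the function time_per_call_ms() and
the stand-in sample_work() are assumptions made for the illustration.

```c
#include <time.h>

#define LOOPREPEAT 20000    /* repetition count quoted in the text */

/* Stand-in for the operation being timed (hypothetical). */
void sample_work(void)
{
  volatile int sink = 0;
  int i;
  for (i = 0; i < 100; i++) sink += i;
}

/* Run fn LOOPREPEAT times and return the average CPU time per call in
   milliseconds. */
double time_per_call_ms(void (*fn)(void))
{
  clock_t start = clock();
  int i;
  for (i = 0; i < LOOPREPEAT; i++) fn();
  return ((double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC) / LOOPREPEAT;
}
```

Anything printed inside the timed loop is charged to the operation being
measured, which is why combining timing with size output distorts the figures.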
|
||||
|
||||
|
||||
|
||||
The perltest program
|
||||
--------------------
|
||||
|
||||
The perltest program tests Perl's regular expressions; it has the same
|
||||
specification as pcretest, and so can be given identical input, except that
|
||||
input patterns can be followed only by Perl's lower case modifiers. The
|
||||
contents of testinput1 and testinput3 meet this condition.
|
||||
|
||||
The data lines are processed as Perl double-quoted strings, so if they contain
|
||||
" \ $ or @ characters, these have to be escaped. For this reason, all such
|
||||
characters in testinput1 and testinput3 are escaped so that they can be used
|
||||
for perltest as well as for pcretest, and the special upper case modifiers such
|
||||
as /A that pcretest recognizes are not used in these files. The output should
|
||||
be identical, apart from the initial identifying banner.
|
||||
|
||||
The testinput2 and testinput4 files are not suitable for feeding to perltest,
|
||||
since they do make use of the special upper case modifiers and escapes that
|
||||
pcretest uses to test some features of PCRE. The first of these files also
|
||||
contains malformed regular expressions, in order to check that PCRE diagnoses
|
||||
them correctly.
|
||||
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
July 1999
|
@ -1,94 +0,0 @@
|
||||
#! /bin/sh

# Run PCRE tests

cf=diff

# Select which tests to run; if no selection, run all

do1=no
do2=no
do3=no
do4=no

while [ $# -gt 0 ] ; do
  case $1 in
    1) do1=yes;;
    2) do2=yes;;
    3) do3=yes;;
    4) do4=yes;;
    *) echo "Unknown test number $1"; exit 1;;
  esac
  shift
done

if [ $do1 = no -a $do2 = no -a $do3 = no -a $do4 = no ] ; then
  do1=yes
  do2=yes
  do3=yes
  do4=yes
fi

# Primary test, Perl-compatible

if [ $do1 = yes ] ; then
  echo "Testing main functionality (Perl compatible)"
  ./pcretest testinput1 testtry
  if [ $? = 0 ] ; then
    $cf testtry testoutput1
    if [ $? != 0 ] ; then exit 1; fi
  else exit 1
  fi
fi

# PCRE tests that are not Perl-compatible - API & error tests, mostly

if [ $do2 = yes ] ; then
  echo "Testing API and error handling (not Perl compatible)"
  ./pcretest -i testinput2 testtry
  if [ $? = 0 ] ; then
    $cf testtry testoutput2
    if [ $? != 0 ] ; then exit 1; fi
  else exit 1
  fi
fi

# Additional Perl-compatible tests for Perl 5.005's new features

if [ $do3 = yes ] ; then
  echo "Testing Perl 5.005 features (Perl 5.005 compatible)"
  ./pcretest testinput3 testtry
  if [ $? = 0 ] ; then
    $cf testtry testoutput3
    if [ $? != 0 ] ; then exit 1; fi
  else exit 1
  fi
fi

if [ $do1 = yes -a $do2 = yes -a $do3 = yes ] ; then
  echo "The three main tests all ran OK"
  echo " "
fi

# Locale-specific tests, provided the "fr" locale is available

if [ $do4 = yes ] ; then
  locale -a | grep '^fr$' >/dev/null
  if [ $? -eq 0 ] ; then
    echo "Testing locale-specific features (using 'fr' locale)"
    ./pcretest testinput4 testtry
    if [ $? = 0 ] ; then
      $cf testtry testoutput4
      if [ $? != 0 ] ; then exit 1; fi
      echo "Locale test ran OK"
      echo " "
    else exit 1
    fi
  else
    echo "Cannot test locale-specific features - 'fr' locale not found,"
    echo "or the \"locale\" command is not available to check for it."
    echo " "
  fi
fi

# End
|
@ -1,239 +0,0 @@
|
||||
Technical Notes about PCRE
|
||||
--------------------------
|
||||
|
||||
Many years ago I implemented some regular expression functions to an algorithm
|
||||
suggested by Martin Richards. These were not Unix-like in form, and were quite
|
||||
restricted in what they could do by comparison with Perl. The interesting part
|
||||
about the algorithm was that the amount of space required to hold the compiled
|
||||
form of an expression was known in advance. The code to apply an expression did
|
||||
not operate by backtracking, as the Henry Spencer and Perl code does, but
|
||||
instead checked all possibilities simultaneously by keeping a list of current
|
||||
states and checking all of them as it advanced through the subject string. (In
|
||||
the terminology of Jeffrey Friedl's book, it was a "DFA algorithm".) When the
|
||||
pattern was all used up, all remaining states were possible matches, and the
|
||||
one matching the longest substring of the subject string was chosen. This did not
|
||||
necessarily maximize the individual wild portions of the pattern, as is
|
||||
expected in Unix and Perl-style regular expressions.
|
||||
|
||||
By contrast, the code originally written by Henry Spencer and subsequently
|
||||
heavily modified for Perl actually compiles the expression twice: once in a
|
||||
dummy mode in order to find out how much store will be needed, and then for
|
||||
real. The execution function operates by backtracking and maximizing (or,
|
||||
optionally, minimizing in Perl) the amount of the subject that matches
|
||||
individual wild portions of the pattern. This is an "NFA algorithm" in Friedl's
|
||||
terminology.
|
||||
|
||||
For this set of functions that forms PCRE, I tried at first to invent an
|
||||
algorithm that used an amount of store bounded by a multiple of the number of
|
||||
characters in the pattern, to save on compiling time. However, because of the
|
||||
greater complexity in Perl regular expressions, I couldn't do this. In any
|
||||
case, a first pass through the pattern is needed, in order to find internal
|
||||
flag settings like (?i) at top level. So it works by running a very degenerate
|
||||
first pass to calculate a maximum store size, and then a second pass to do the
|
||||
real compile - which may use a bit less than the predicted amount of store. The
|
||||
idea is that this is going to turn out faster because the first pass is
|
||||
degenerate and the second can just store stuff straight into the vector. It
|
||||
does make the compiling functions bigger, of course, but they have got quite
|
||||
big anyway to handle all the Perl stuff.
|
||||
|
||||
The compiled form of a pattern is a vector of bytes, containing items of
|
||||
variable length. The first byte in an item is an opcode, and the length of the
|
||||
item is either implicit in the opcode or contained in the data bytes which
|
||||
follow it. A list of all the opcodes follows:
|
||||
|
||||
Opcodes with no following data
|
||||
------------------------------
|
||||
|
||||
These items are all just one byte long:
|
||||
|
||||
  OP_END                 end of pattern
  OP_ANY                 match any character
  OP_SOD                 match start of data: \A
  OP_CIRC                ^ (start of data, or after \n in multiline)
  OP_NOT_WORD_BOUNDARY   \B
  OP_WORD_BOUNDARY       \b
  OP_NOT_DIGIT           \D
  OP_DIGIT               \d
  OP_NOT_WHITESPACE      \S
  OP_WHITESPACE          \s
  OP_NOT_WORDCHAR        \W
  OP_WORDCHAR            \w
  OP_EODN                match end of data or \n at end: \Z
  OP_EOD                 match end of data: \z
  OP_DOLL                $ (end of data, or before \n in multiline)
|
||||
|
||||
|
||||
Repeating single characters
|
||||
---------------------------
|
||||
|
||||
The common repeats (*, +, ?) when applied to a single character appear as
|
||||
two-byte items using the following opcodes:
|
||||
|
||||
OP_STAR
|
||||
OP_MINSTAR
|
||||
OP_PLUS
|
||||
OP_MINPLUS
|
||||
OP_QUERY
|
||||
OP_MINQUERY
|
||||
|
||||
Those with "MIN" in their name are the minimizing versions. Each is followed by
|
||||
the character that is to be repeated. Other repeats make use of
|
||||
|
||||
OP_UPTO
|
||||
OP_MINUPTO
|
||||
OP_EXACT
|
||||
|
||||
which are followed by a two-byte count (most significant first) and the
|
||||
repeated character. OP_UPTO matches from 0 to the given number. A repeat with a
|
||||
non-zero minimum and a fixed maximum is coded as an OP_EXACT followed by an
|
||||
OP_UPTO (or OP_MINUPTO).
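
A sketch of the four-byte repeat items described above. The opcode numbers are
placeholders, not PCRE's real values, and the function names are invented.
Taking the OP_UPTO count to be the maximum minus the minimum is a reading of
the text rather than something it states.

```c
#include <stddef.h>

/* Opcode numbers are placeholders for this sketch, not PCRE's real values. */
enum { OP_EXACT = 1, OP_UPTO = 2, OP_MINUPTO = 3 };

/* Write one repeat item: opcode, two-byte count (most significant byte
   first), then the repeated character. Returns the item length. */
static size_t put_repeat(unsigned char *code, int op, int count, int ch)
{
  code[0] = (unsigned char)op;
  code[1] = (unsigned char)(count >> 8);
  code[2] = (unsigned char)(count & 255);
  code[3] = (unsigned char)ch;
  return 4;
}

/* Encode ch{min,max}, or the minimizing form if minimize is non-zero.
   Returns the number of bytes written to code (at most 8). */
size_t encode_char_repeat(unsigned char *code, int ch, int min, int max,
                          int minimize)
{
  size_t n = 0;
  if (min > 0)
    n += put_repeat(code + n, OP_EXACT, min, ch);
  if (max > min)
    n += put_repeat(code + n, minimize ? OP_MINUPTO : OP_UPTO, max - min, ch);
  return n;
}
```

Under these assumptions x{2,5} becomes OP_EXACT 2 x followed by OP_UPTO 3 x,
eight bytes in all.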
|
||||
|
||||
|
||||
Repeating character types
|
||||
-------------------------
|
||||
|
||||
Repeats of things like \d are done exactly as for single characters, except
|
||||
that instead of a character, the opcode for the type is stored in the data
|
||||
byte. The opcodes are:
|
||||
|
||||
OP_TYPESTAR
|
||||
OP_TYPEMINSTAR
|
||||
OP_TYPEPLUS
|
||||
OP_TYPEMINPLUS
|
||||
OP_TYPEQUERY
|
||||
OP_TYPEMINQUERY
|
||||
OP_TYPEUPTO
|
||||
OP_TYPEMINUPTO
|
||||
OP_TYPEEXACT
|
||||
|
||||
|
||||
Matching a character string
|
||||
---------------------------
|
||||
|
||||
The OP_CHARS opcode is followed by a one-byte count and then that number of
|
||||
characters. If there are more than 255 characters in sequence, successive
|
||||
instances of OP_CHARS are used.
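
The 255-character limit follows from the one-byte count. A minimal sketch of
splitting a long literal string into successive items, with a placeholder
opcode value and an invented function name:

```c
#include <stddef.h>
#include <string.h>

enum { OP_CHARS = 1 };   /* placeholder value, not PCRE's real opcode number */

/* Emit the literal string s of length len as one or more OP_CHARS items.
   code must have room for len + 2 * ((len + 254) / 255) bytes.
   Returns the number of bytes written. */
size_t emit_chars(unsigned char *code, const char *s, size_t len)
{
  size_t n = 0;
  while (len > 0)
    {
    size_t chunk = len > 255 ? 255 : len;
    code[n++] = OP_CHARS;
    code[n++] = (unsigned char)chunk;     /* one-byte count */
    memcpy(code + n, s, chunk);
    n += chunk;
    s += chunk;
    len -= chunk;
    }
  return n;
}
```

A 300-character literal therefore produces two items, one holding 255
characters and one holding 45.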
|
||||
|
||||
|
||||
Character classes
|
||||
-----------------
|
||||
|
||||
OP_CLASS is used for a character class, provided there are at least two
|
||||
characters in the class. If there is only one character, OP_CHARS is used for a
|
||||
positive class, and OP_NOT for a negative one (that is, for something like
|
||||
[^a]). Another set of repeating opcodes (OP_NOTSTAR etc.) is used for a
|
||||
repeated, negated, single-character class. The normal ones (OP_STAR etc.) are
|
||||
used for a repeated positive single-character class.
|
||||
|
||||
OP_CLASS is followed by a 32-byte bit map containing a 1
|
||||
bit for every character that is acceptable. The bits are counted from the least
|
||||
significant end of each byte.
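
Counting bits from the least significant end means that character c is
represented by bit (c & 7) of byte (c >> 3). A small sketch, with invented
function names:

```c
/* Mark character c as a member of the 32-byte class map. */
void class_add(unsigned char map[32], unsigned char c)
{
  map[c >> 3] |= (unsigned char)(1 << (c & 7));
}

/* Return 1 if character c is a member of the class map, else 0. */
int class_has(const unsigned char map[32], unsigned char c)
{
  return (map[c >> 3] >> (c & 7)) & 1;
}
```

Adding the ten ASCII digits to an empty map sets byte 6 to 0xff and byte 7 to
0x03, and leaves every other byte zero.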
|
||||
|
||||
|
||||
Back references
|
||||
---------------
|
||||
|
||||
OP_REF is followed by a single byte containing the reference number.
|
||||
|
||||
|
||||
Repeating character classes and back references
|
||||
-----------------------------------------------
|
||||
|
||||
Single-character classes are handled specially (see above). This section applies to
|
||||
OP_CLASS and OP_REF. In both cases, the repeat information follows the base
|
||||
item. The matching code looks at the following opcode to see if it is one of
|
||||
|
||||
OP_CRSTAR
|
||||
OP_CRMINSTAR
|
||||
OP_CRPLUS
|
||||
OP_CRMINPLUS
|
||||
OP_CRQUERY
|
||||
OP_CRMINQUERY
|
||||
OP_CRRANGE
|
||||
OP_CRMINRANGE
|
||||
|
||||
All but the last two are just single-byte items. The last two are followed by
|
||||
four bytes of data, comprising the minimum and maximum repeat counts.
|
||||
|
||||
|
||||
Brackets and alternation
|
||||
------------------------
|
||||
|
||||
A pair of non-identifying (round) brackets is wrapped round each expression at
|
||||
compile time, so alternation always happens in the context of brackets.
|
||||
Non-identifying brackets use the opcode OP_BRA, while identifying brackets use
|
||||
OP_BRA+1, OP_BRA+2, etc. [Note for North Americans: "bracket" to some English
|
||||
speakers, including myself, can be round, square, or curly. Hence this usage.]
|
||||
|
||||
A bracket opcode is followed by two bytes which give the offset to the next
|
||||
alternative OP_ALT or, if there aren't any branches, to the matching KET
|
||||
opcode. Each OP_ALT is followed by two bytes giving the offset to the next one,
|
||||
or to the KET opcode.
|
||||
|
||||
OP_KET is used for subpatterns that do not repeat indefinitely, while
|
||||
OP_KETRMIN and OP_KETRMAX are used for indefinite repetitions, minimally or
|
||||
maximally respectively. All three are followed by two bytes giving (as a
|
||||
positive number) the offset back to the matching BRA opcode.
|
||||
|
||||
If a subpattern is quantified such that it is permitted to match zero times, it
|
||||
is preceded by one of OP_BRAZERO or OP_BRAMINZERO. These are single-byte
|
||||
opcodes which tell the matcher that skipping this subpattern entirely is a
|
||||
valid branch.
|
||||
|
||||
A subpattern with an indefinite maximum repetition is replicated in the
|
||||
compiled data its minimum number of times (or once with a BRAZERO if the
|
||||
minimum is zero), with the final copy terminating with a KETRMIN or KETRMAX as
|
||||
appropriate.
|
||||
|
||||
A subpattern with a bounded maximum repetition is replicated in a nested
|
||||
fashion up to the maximum number of times, with BRAZERO or BRAMINZERO before
|
||||
each replication after the minimum, so that, for example, (abc){2,5} is
|
||||
compiled as (abc)(abc)((abc)((abc)(abc)?)?)?. The 200-bracket limit does not
|
||||
apply to these internally generated brackets.
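
The replication can be shown at the pattern level. The sketch below produces
the equivalent pattern text for a bounded repeat; it works on strings purely
for illustration, whereas the compiler replicates compiled code. The function
names are invented.

```c
#include <stddef.h>
#include <string.h>

/* Append s to the string in out if it fits; return 0 on success. */
static int append(char *out, size_t outsize, const char *s)
{
  size_t used = strlen(out), add = strlen(s);
  if (used + add + 1 > outsize) return -1;
  memcpy(out + used, s, add + 1);
  return 0;
}

/* Write the expansion of group{min,max} into out. Returns 0 on success,
   -1 if the arguments are invalid or out is too small. */
int expand_bounded(const char *group, int min, int max, char *out,
                   size_t outsize)
{
  int i, extra = max - min;
  if (min < 0 || extra < 0 || outsize == 0) return -1;
  out[0] = '\0';

  for (i = 0; i < min; i++)                 /* mandatory copies */
    if (append(out, outsize, group) != 0) return -1;

  for (i = 1; i < extra; i++)               /* open one nest per optional copy */
    if (append(out, outsize, "(") != 0 || append(out, outsize, group) != 0)
      return -1;

  if (extra > 0)                            /* innermost optional copy */
    {
    if (append(out, outsize, group) != 0 || append(out, outsize, "?") != 0)
      return -1;
    }

  for (i = 1; i < extra; i++)               /* close the nests */
    if (append(out, outsize, ")?") != 0) return -1;

  return 0;
}
```

Called with "(abc)", 2 and 5, this yields the expansion quoted above.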
|
||||
|
||||
|
||||
Assertions
|
||||
----------
|
||||
|
||||
Forward assertions are just like other subpatterns, but starting with one of
|
||||
the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
|
||||
OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
|
||||
is OP_REVERSE, followed by a two byte count of the number of characters to move
|
||||
back the pointer in the subject string. A separate count is present in each
|
||||
alternative of a lookbehind assertion, allowing them to have different fixed
|
||||
lengths.
|
||||
|
||||
|
||||
Once-only subpatterns
|
||||
---------------------
|
||||
|
||||
These are also just like other subpatterns, but they start with the opcode
|
||||
OP_ONCE.
|
||||
|
||||
|
||||
Conditional subpatterns
|
||||
-----------------------
|
||||
|
||||
These are like other subpatterns, but they start with the opcode OP_COND. If
|
||||
the condition is a back reference, this is stored at the start of the
|
||||
subpattern using the opcode OP_CREF followed by one byte containing the
|
||||
reference number. Otherwise, a conditional subpattern will always start with
|
||||
one of the assertions.
|
||||
|
||||
|
||||
Changing options
|
||||
----------------
|
||||
|
||||
If any of the /i, /m, or /s options are changed within a parenthesized group,
|
||||
an OP_OPT opcode is compiled, followed by one byte containing the new settings
|
||||
of these flags. If there are several alternatives in a group, there is an
|
||||
occurrence of OP_OPT at the start of all those following the first options
|
||||
change, to set appropriate options for the start of the alternative.
|
||||
Immediately after the end of the group there is another such item to reset the
|
||||
flags to their previous values. Other changes of flag within the pattern can be
|
||||
handled entirely at compile time, and so do not cause anything to be put into
|
||||
the compiled data.
|
||||
|
||||
|
||||
Philip Hazel
|
||||
January 1999
|
@ -1,146 +0,0 @@
|
||||
/*************************************************
|
||||
* Perl-Compatible Regular Expressions *
|
||||
*************************************************/
|
||||
|
||||
/* This file is automatically written by the dftables auxiliary
|
||||
program. If you edit it by hand, you might like to edit the Makefile to
|
||||
prevent its ever being regenerated.
|
||||
|
||||
This file is #included in the compilation of pcre.c to build the default
|
||||
character tables which are used when no tables are passed to the compile
|
||||
function. */
|
||||
|
||||
static unsigned char pcre_default_tables[] = {
|
||||
|
||||
/* This table is a lower casing table. */
|
||||
|
||||
0, 1, 2, 3, 4, 5, 6, 7,
|
||||
8, 9, 10, 11, 12, 13, 14, 15,
|
||||
16, 17, 18, 19, 20, 21, 22, 23,
|
||||
24, 25, 26, 27, 28, 29, 30, 31,
|
||||
32, 33, 34, 35, 36, 37, 38, 39,
|
||||
40, 41, 42, 43, 44, 45, 46, 47,
|
||||
48, 49, 50, 51, 52, 53, 54, 55,
|
||||
56, 57, 58, 59, 60, 61, 62, 63,
|
||||
64, 97, 98, 99,100,101,102,103,
|
||||
104,105,106,107,108,109,110,111,
|
||||
112,113,114,115,116,117,118,119,
|
||||
120,121,122, 91, 92, 93, 94, 95,
|
||||
96, 97, 98, 99,100,101,102,103,
|
||||
104,105,106,107,108,109,110,111,
|
||||
112,113,114,115,116,117,118,119,
|
||||
120,121,122,123,124,125,126,127,
|
||||
128,129,130,131,132,133,134,135,
|
||||
136,137,138,139,140,141,142,143,
|
||||
144,145,146,147,148,149,150,151,
|
||||
152,153,154,155,156,157,158,159,
|
||||
160,161,162,163,164,165,166,167,
|
||||
168,169,170,171,172,173,174,175,
|
||||
176,177,178,179,180,181,182,183,
|
||||
184,185,186,187,188,189,190,191,
|
||||
192,193,194,195,196,197,198,199,
|
||||
200,201,202,203,204,205,206,207,
|
||||
208,209,210,211,212,213,214,215,
|
||||
216,217,218,219,220,221,222,223,
|
||||
224,225,226,227,228,229,230,231,
|
||||
232,233,234,235,236,237,238,239,
|
||||
240,241,242,243,244,245,246,247,
|
||||
248,249,250,251,252,253,254,255,
|
||||
|
||||
/* This table is a case flipping table. */
|
||||
|
||||
0, 1, 2, 3, 4, 5, 6, 7,
|
||||
8, 9, 10, 11, 12, 13, 14, 15,
|
||||
16, 17, 18, 19, 20, 21, 22, 23,
|
||||
24, 25, 26, 27, 28, 29, 30, 31,
|
||||
32, 33, 34, 35, 36, 37, 38, 39,
|
||||
40, 41, 42, 43, 44, 45, 46, 47,
|
||||
48, 49, 50, 51, 52, 53, 54, 55,
|
||||
56, 57, 58, 59, 60, 61, 62, 63,
|
||||
64, 97, 98, 99,100,101,102,103,
|
||||
104,105,106,107,108,109,110,111,
|
||||
112,113,114,115,116,117,118,119,
|
||||
120,121,122, 91, 92, 93, 94, 95,
|
||||
96, 65, 66, 67, 68, 69, 70, 71,
|
||||
72, 73, 74, 75, 76, 77, 78, 79,
|
||||
80, 81, 82, 83, 84, 85, 86, 87,
|
||||
88, 89, 90,123,124,125,126,127,
|
||||
128,129,130,131,132,133,134,135,
|
||||
136,137,138,139,140,141,142,143,
|
||||
144,145,146,147,148,149,150,151,
|
||||
152,153,154,155,156,157,158,159,
|
||||
160,161,162,163,164,165,166,167,
|
||||
168,169,170,171,172,173,174,175,
|
||||
176,177,178,179,180,181,182,183,
|
||||
184,185,186,187,188,189,190,191,
|
||||
192,193,194,195,196,197,198,199,
|
||||
200,201,202,203,204,205,206,207,
|
||||
208,209,210,211,212,213,214,215,
|
||||
216,217,218,219,220,221,222,223,
|
||||
224,225,226,227,228,229,230,231,
|
||||
232,233,234,235,236,237,238,239,
|
||||
240,241,242,243,244,245,246,247,
|
||||
248,249,250,251,252,253,254,255,
|
||||
|
||||
/* This table contains bit maps for digits, 'word' chars, and white
|
||||
space. Each map is 32 bytes long and the bits run from the least
|
||||
significant end of each byte. */
|
||||
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03,
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
|
||||
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03,
|
||||
0xfe,0xff,0xff,0x87,0xfe,0xff,0xff,0x07,
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
|
||||
|
||||
0x00,0x3e,0x00,0x00,0x01,0x00,0x00,0x00,
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,
|
||||
|
||||
/* This table identifies various classes of character by individual bits:
|
||||
0x01 white space character
|
||||
0x02 letter
|
||||
0x04 decimal digit
|
||||
0x08 hexadecimal digit
|
||||
0x10 alphanumeric or '_'
|
||||
0x80 regular expression metacharacter or binary zero
|
||||
*/
|
||||
|
||||
0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 0- 7 */
|
||||
0x00,0x01,0x01,0x01,0x01,0x01,0x00,0x00, /* 8- 15 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 16- 23 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 24- 31 */
|
||||
0x01,0x00,0x00,0x00,0x80,0x00,0x00,0x00, /* - ' */
|
||||
0x80,0x80,0x80,0x80,0x00,0x00,0x80,0x00, /* ( - / */
|
||||
0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c, /* 0 - 7 */
|
||||
0x1c,0x1c,0x00,0x00,0x00,0x00,0x00,0x80, /* 8 - ? */
|
||||
0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* @ - G */
|
||||
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* H - O */
|
||||
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* P - W */
|
||||
0x12,0x12,0x12,0x80,0x00,0x00,0x80,0x10, /* X - _ */
|
||||
0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* ` - g */
|
||||
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* h - o */
|
||||
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* p - w */
|
||||
0x12,0x12,0x12,0x80,0x80,0x00,0x00,0x00, /* x -127 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 128-135 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 136-143 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 144-151 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 152-159 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 160-167 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 168-175 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 176-183 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 184-191 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 192-199 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 200-207 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 208-215 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 216-223 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 224-231 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 232-239 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 240-247 */
|
||||
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};/* 248-255 */
|
||||
|
||||
/* End of chartables.c */
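
The bit-map comment above says each class map is 32 bytes long with bits running from the least significant end of each byte. A minimal sketch of how such a map could be queried follows; `map_has_char` and `digit_map` are hypothetical names used only for illustration, and `digit_map` copies the first row of the digit map printed above (the remaining bytes are zero).

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: test whether character c is
   present in a 32-byte class map. Character c lives in byte (c >> 3),
   at bit (c & 7) counted from the least significant end. */
static int map_has_char(const unsigned char *map, unsigned int c)
{
  c &= 0xff;
  return (map[c >> 3] & (1u << (c & 7))) != 0;
}

/* The digit map from the table above: only bytes 6 and 7 are non-zero,
   covering '0'..'7' (0xff) and '8'..'9' (0x03). */
static const unsigned char digit_map[32] = {
  0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03
};
```

With this layout, `map_has_char(digit_map, '7')` is non-zero and `map_has_char(digit_map, ':')` is zero, because ':' (58) would need bit 2 of byte 7.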
|
@ -1,146 +0,0 @@
|
||||
/*************************************************
|
||||
* Perl-Compatible Regular Expressions *
|
||||
*************************************************/
|
||||
|
||||
/*
|
||||
PCRE is a library of functions to support regular expressions whose syntax
|
||||
and semantics are as close as possible to those of the Perl 5 language.
|
||||
|
||||
Written by: Philip Hazel <ph10@cam.ac.uk>
|
||||
|
||||
Copyright (c) 1997-1999 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Permission is granted to anyone to use this software for any purpose on any
|
||||
computer system, and to redistribute it freely, subject to the following
|
||||
restrictions:
|
||||
|
||||
1. This software is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
|
||||
2. The origin of this software must not be misrepresented, either by
|
||||
explicit claim or by omission.
|
||||
|
||||
3. Altered versions must be plainly marked as such, and must not be
|
||||
misrepresented as being the original software.
|
||||
|
||||
4. If PCRE is embedded in any software that is released under the GNU
|
||||
General Purpose Licence (GPL), then the terms of that licence shall
|
||||
supersede any condition above with which it is incompatible.
|
||||
-----------------------------------------------------------------------------
|
||||
|
||||
See the file Tech.Notes for some information on the internals.
|
||||
*/
|
||||
|
||||
|
||||
/* This is a support program to generate the file chartables.c, containing
|
||||
character tables of various kinds. They are built according to the default C
|
||||
locale and used as the default tables by PCRE. Now that pcre_maketables is
|
||||
a function visible to the outside world, we make use of its code from here in
|
||||
order to be consistent. */
|
||||
|
||||
#include <ctype.h>
|
||||
#include <stdio.h>
|
||||
#include <string.h>
|
||||
|
||||
#include "internal.h"
|
||||
|
||||
#define DFTABLES /* maketables.c notices this */
|
||||
#include "maketables.c"
|
||||
|
||||
|
||||
int main(void)
|
||||
{
|
||||
int i;
|
||||
unsigned const char *tables = pcre_maketables();
|
||||
|
||||
printf(
|
||||
"/*************************************************\n"
|
||||
"* Perl-Compatible Regular Expressions *\n"
|
||||
"*************************************************/\n\n"
|
||||
"/* This file is automatically written by the dftables auxiliary \n"
|
||||
"program. If you edit it by hand, you might like to edit the Makefile to \n"
|
||||
"prevent its ever being regenerated.\n\n"
|
||||
"This file is #included in the compilation of pcre.c to build the default\n"
|
||||
"character tables which are used when no tables are passed to the compile\n"
|
||||
"function. */\n\n"
|
||||
"static unsigned char pcre_default_tables[] = {\n\n"
|
||||
"/* This table is a lower casing table. */\n\n");
|
||||
|
||||
printf(" ");
|
||||
for (i = 0; i < 256; i++)
|
||||
{
|
||||
if ((i & 7) == 0 && i != 0) printf("\n ");
|
||||
printf("%3d", *tables++);
|
||||
if (i != 255) printf(",");
|
||||
}
|
||||
printf(",\n\n");
|
||||
|
||||
printf("/* This table is a case flipping table. */\n\n");
|
||||
|
||||
printf(" ");
|
||||
for (i = 0; i < 256; i++)
|
||||
{
|
||||
if ((i & 7) == 0 && i != 0) printf("\n ");
|
||||
printf("%3d", *tables++);
|
||||
if (i != 255) printf(",");
|
||||
}
|
||||
printf(",\n\n");
|
||||
|
||||
printf(
|
||||
"/* This table contains bit maps for digits, 'word' chars, and white\n"
|
||||
"space. Each map is 32 bytes long and the bits run from the least\n"
|
||||
"significant end of each byte. */\n\n");
|
||||
|
||||
printf(" ");
|
||||
for (i = 0; i < cbit_length; i++)
|
||||
{
|
||||
if ((i & 7) == 0 && i != 0)
|
||||
{
|
||||
if ((i & 31) == 0) printf("\n");
|
||||
printf("\n ");
|
||||
}
|
||||
printf("0x%02x", *tables++);
|
||||
if (i != cbit_length - 1) printf(",");
|
||||
}
|
||||
printf(" ,\n\n");
|
||||
|
||||
printf(
|
||||
"/* This table identifies various classes of character by individual bits:\n"
|
||||
" 0x%02x white space character\n"
|
||||
" 0x%02x letter\n"
|
||||
" 0x%02x decimal digit\n"
|
||||
" 0x%02x hexadecimal digit\n"
|
||||
" 0x%02x alphanumeric or '_'\n"
|
||||
" 0x%02x regular expression metacharacter or binary zero\n*/\n\n",
|
||||
ctype_space, ctype_letter, ctype_digit, ctype_xdigit, ctype_word,
|
||||
ctype_meta);
|
||||
|
||||
printf(" ");
|
||||
for (i = 0; i < 256; i++)
|
||||
{
|
||||
if ((i & 7) == 0 && i != 0)
|
||||
{
|
||||
printf(" /* ");
|
||||
if (isprint(i-8)) printf(" %c -", i-8);
|
||||
else printf("%3d-", i-8);
|
||||
if (isprint(i-1)) printf(" %c ", i-1);
|
||||
else printf("%3d", i-1);
|
||||
printf(" */\n ");
|
||||
}
|
||||
printf("0x%02x", *tables++);
|
||||
if (i != 255) printf(",");
|
||||
}
|
||||
|
||||
printf("};/* ");
|
||||
if (isprint(i-8)) printf(" %c -", i-8);
|
||||
else printf("%3d-", i-8);
|
||||
if (isprint(i-1)) printf(" %c ", i-1);
|
||||
else printf("%3d", i-1);
|
||||
printf(" */\n\n/* End of chartables.c */\n");
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* End of dftables.c */
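
The loops in main() above print eight table entries per row, separated by commas, with no comma after the final entry. The same formatting can be sketched as a function that writes into a buffer instead of stdout. `format_table` is a hypothetical name, and the two-space indent is an assumption made for this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of dftables.c: format `count` bytes as
   decimal, eight per row, into `out`. Returns the number of characters
   written, or -1 if the buffer is too small. */
static int format_table(const unsigned char *table, int count,
                        char *out, size_t outsize)
{
  size_t used = 0;
  int i;

  if (outsize == 0) return -1;
  out[0] = 0;

  for (i = 0; i < count; i++)
    {
    /* Indent the first row; start a new indented row every 8 entries. */
    const char *lead = (i == 0) ? "  " : ((i & 7) == 0) ? "\n  " : "";
    /* Every entry except the last is followed by a comma. */
    const char *tail = (i != count - 1) ? "," : "";
    int n = snprintf(out + used, outsize - used, "%s%3d%s",
                     lead, table[i], tail);
    if (n < 0 || (size_t)n >= outsize - used) return -1;
    used += (size_t)n;
    }

  return (int)used;
}
```

Testing `(i & 7) == 0` rather than counting rows keeps the row break tied to the entry index, which is also how the original loops decide where to break.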
|
@ -1,60 +0,0 @@
|
||||
# dll.mk - auxiliary Makefile for easily building DLLs for the mingw32 target
|
||||
# ver. 0.6 of 1999-03-25
|
||||
#
|
||||
# Homepage of this makefile - http://www.is.lg.ua/~paul/devel/
|
||||
# Homepage of original mingw32 project -
|
||||
# http://www.fu.is.saga-u.ac.jp/~colin/gcc.html
|
||||
#
|
||||
# How to use:
|
||||
# This makefile can:
|
||||
# 1. Automatically create a .def file from a list of objects
|
||||
# 2. Create a .dll from objects and a .def file, either auto-generated or your own
|
||||
# hand-written file, which must have the same basename as the dll
|
||||
# WARNING! There MUST be an object whose name matches the dll's name. Make sux.
|
||||
# 3. Create import library from .def (as for .dll, only its name required,
|
||||
# not dll itself)
|
||||
# By convention implibs for dll have .dll.a suffix, e.g. libstuff.dll.a
|
||||
# Why not just libstuff.a? 'Cos that's name for static lib, ok?
|
||||
# Process divided into 3 phases because:
|
||||
# 1. Pre-existent .def possible
|
||||
# 2. Generating an implib is fairly time-consuming
|
||||
#
|
||||
# Variables:
|
||||
# DLL_LDLIBS - libs for linking dll
|
||||
# DLL_LDFLAGS - flags for linking dll
|
||||
#
|
||||
# By using $(DLL_SUFFIX) instead of 'dll', e.g. stuff.$(DLL_SUFFIX)
|
||||
# you may help porting makefiles to other platforms
|
||||
#
|
||||
# Put this file in your make's include path (e.g. main include dir, for
|
||||
# more information see include section in make doc). Put in the beginning
|
||||
# of your own Makefile line "include dll.mk". Specify dependencies, e.g.:
|
||||
#
|
||||
# Do all stuff in one step
|
||||
# libstuff.dll.a: $(OBJECTS) stuff.def
|
||||
# stuff.def: $(OBJECTS)
|
||||
#
|
||||
# Steps separated, pre-provided .def, link with user32
|
||||
#
|
||||
# DLL_LDLIBS=-luser32
|
||||
# stuff.dll: $(OBJECTS)
|
||||
# libstuff.dll.a: $(OBJECTS)
|
||||
|
||||
|
||||
DLLWRAP=dllwrap
|
||||
DLLTOOL=dlltool
|
||||
|
||||
DLL_SUFFIX=dll
|
||||
|
||||
.SUFFIXES: .o .$(DLL_SUFFIX)
|
||||
|
||||
_%.def: %.o
|
||||
$(DLLTOOL) --export-all --output-def $@ $^
|
||||
|
||||
%.$(DLL_SUFFIX): %.o
|
||||
$(DLLWRAP) --dllname $(notdir $@) --driver-name $(CC) --def $*.def -o $@ $(filter %.o,$^) $(DLL_LDFLAGS) $(DLL_LDLIBS)
|
||||
|
||||
lib%.$(DLL_SUFFIX).a:%.def
|
||||
$(DLLTOOL) --dllname $(notdir $*.dll) --def $< --output-lib $@
|
||||
|
||||
# End
|
@ -1,189 +0,0 @@
|
||||
/*************************************************
|
||||
* Perl-Compatible Regular Expressions *
|
||||
*************************************************/
|
||||
|
||||
/*
|
||||
This is a library of functions to support regular expressions whose syntax
|
||||
and semantics are as close as possible to those of the Perl 5 language. See
|
||||
the file Tech.Notes for some information on the internals.
|
||||
|
||||
Written by: Philip Hazel <ph10@cam.ac.uk>
|
||||
|
||||
Copyright (c) 1997-1999 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Permission is granted to anyone to use this software for any purpose on any
|
||||
computer system, and to redistribute it freely, subject to the following
|
||||
restrictions:
|
||||
|
||||
1. This software is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
|
||||
2. The origin of this software must not be misrepresented, either by
|
||||
explicit claim or by omission.
|
||||
|
||||
3. Altered versions must be plainly marked as such, and must not be
|
||||
misrepresented as being the original software.
|
||||
|
||||
4. If PCRE is embedded in any software that is released under the GNU
|
||||
General Purpose Licence (GPL), then the terms of that licence shall
|
||||
supersede any condition above with which it is incompatible.
|
||||
-----------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
/* This module contains some convenience functions for extracting substrings
|
||||
from the subject string after a regex match has succeeded. The original idea
|
||||
for these functions came from Scott Wimer <scottw@cgibuilder.com>. */
|
||||
|
||||
|
||||
/* Include the internals header, which itself includes Standard C headers plus
|
||||
the external pcre header. */
|
||||
|
||||
#include "internal.h"
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Copy captured string to given buffer *
|
||||
*************************************************/
|
||||
|
||||
/* This function copies a single captured substring into a given buffer.
|
||||
Note that we use memcpy() rather than strncpy() in case there are binary zeros
|
||||
in the string.
|
||||
|
||||
Arguments:
|
||||
subject the subject string that was matched
|
||||
ovector pointer to the offsets table
|
||||
stringcount the number of substrings that were captured
|
||||
(i.e. the yield of the pcre_exec call, unless
|
||||
that was zero, in which case it should be 1/3
|
||||
of the offset table size)
|
||||
stringnumber the number of the required substring
|
||||
buffer where to put the substring
|
||||
size the size of the buffer
|
||||
|
||||
Returns: if successful:
|
||||
the length of the copied string, not including the zero
|
||||
that is put on the end; can be zero
|
||||
if not successful:
|
||||
PCRE_ERROR_NOMEMORY (-6) buffer too small
|
||||
PCRE_ERROR_NOSUBSTRING (-7) no such captured substring
|
||||
*/
|
||||
|
||||
int
|
||||
pcre_copy_substring(const char *subject, int *ovector, int stringcount,
|
||||
int stringnumber, char *buffer, int size)
|
||||
{
|
||||
int yield;
|
||||
if (stringnumber < 0 || stringnumber >= stringcount)
|
||||
return PCRE_ERROR_NOSUBSTRING;
|
||||
stringnumber *= 2;
|
||||
yield = ovector[stringnumber+1] - ovector[stringnumber];
|
||||
if (size < yield + 1) return PCRE_ERROR_NOMEMORY;
|
||||
memcpy(buffer, subject + ovector[stringnumber], yield);
|
||||
buffer[yield] = 0;
|
||||
return yield;
|
||||
}
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Copy all captured strings to new store *
|
||||
*************************************************/
|
||||
|
||||
/* This function gets one chunk of store and builds a list of pointers and all
|
||||
of the captured substrings in it. A NULL pointer is put on the end of the list.
|
||||
|
||||
Arguments:
|
||||
subject the subject string that was matched
|
||||
ovector pointer to the offsets table
|
||||
stringcount the number of substrings that were captured
|
||||
(i.e. the yield of the pcre_exec call, unless
|
||||
that was zero, in which case it should be 1/3
|
||||
of the offset table size)
|
||||
listptr set to point to the list of pointers
|
||||
|
||||
Returns: if successful: 0
|
||||
if not successful:
|
||||
PCRE_ERROR_NOMEMORY (-6) failed to get store
|
||||
*/
|
||||
|
||||
int
|
||||
pcre_get_substring_list(const char *subject, int *ovector, int stringcount,
|
||||
const char ***listptr)
|
||||
{
|
||||
int i;
|
||||
int size = sizeof(char *);
|
||||
int double_count = stringcount * 2;
|
||||
char **stringlist;
|
||||
char *p;
|
||||
|
||||
for (i = 0; i < double_count; i += 2)
|
||||
size += sizeof(char *) + ovector[i+1] - ovector[i] + 1;
|
||||
|
||||
stringlist = (char **)(pcre_malloc)(size);
|
||||
if (stringlist == NULL) return PCRE_ERROR_NOMEMORY;
|
||||
|
||||
*listptr = (const char **)stringlist;
|
||||
p = (char *)(stringlist + stringcount + 1);
|
||||
|
||||
for (i = 0; i < double_count; i += 2)
|
||||
{
|
||||
int len = ovector[i+1] - ovector[i];
|
||||
memcpy(p, subject + ovector[i], len);
|
||||
*stringlist++ = p;
|
||||
p += len;
|
||||
*p++ = 0;
|
||||
}
|
||||
|
||||
*stringlist = NULL;
|
||||
return 0;
|
||||
}
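
The single-chunk layout used by the function above (a pointer list, then a NULL, then the string data, all in one allocation) can be tried out without linking PCRE. The sketch below restates the same logic with plain malloc() and a hand-written offsets table; `build_list_sketch` is a hypothetical name, and returning the list directly instead of through an output parameter is a simplification made here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for illustration only. One allocation holds
   stringcount+1 pointers followed by the zero-terminated copies. */
static char **build_list_sketch(const char *subject, const int *ovector,
                                int stringcount)
{
  size_t size = sizeof(char *);        /* room for the final NULL pointer */
  char **list, **slot;
  char *p;
  int i;

  /* Each capture needs one pointer, its bytes, and a terminating zero. */
  for (i = 0; i < stringcount; i++)
    size += sizeof(char *) + (size_t)(ovector[2*i + 1] - ovector[2*i]) + 1;

  list = malloc(size);
  if (list == NULL) return NULL;

  slot = list;
  p = (char *)(list + stringcount + 1); /* string data starts after the pointers */

  for (i = 0; i < stringcount; i++)
    {
    size_t len = (size_t)(ovector[2*i + 1] - ovector[2*i]);
    memcpy(p, subject + ovector[2*i], len);
    *slot++ = p;
    p += len;
    *p++ = 0;
    }

  *slot = NULL;
  return list;
}
```

Because everything lives in one block, a single free() of the returned pointer releases the pointers and all the strings together.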
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Copy captured string to new store *
|
||||
*************************************************/
|
||||
|
||||
/* This function copies a single captured substring into a piece of new
|
||||
store.
|
||||
|
||||
Arguments:
|
||||
subject the subject string that was matched
|
||||
ovector pointer to the offsets table
|
||||
stringcount the number of substrings that were captured
|
||||
(i.e. the yield of the pcre_exec call, unless
|
||||
that was zero, in which case it should be 1/3
|
||||
of the offset table size)
|
||||
stringnumber the number of the required substring
|
||||
stringptr where to put a pointer to the substring
|
||||
|
||||
Returns: if successful:
|
||||
the length of the string, not including the zero that
|
||||
is put on the end; can be zero
|
||||
if not successful:
|
||||
PCRE_ERROR_NOMEMORY (-6) failed to get store
|
||||
PCRE_ERROR_NOSUBSTRING (-7) substring not present
|
||||
*/
|
||||
|
||||
int
|
||||
pcre_get_substring(const char *subject, int *ovector, int stringcount,
|
||||
int stringnumber, const char **stringptr)
|
||||
{
|
||||
int yield;
|
||||
char *substring;
|
||||
if (stringnumber < 0 || stringnumber >= stringcount)
|
||||
return PCRE_ERROR_NOSUBSTRING;
|
||||
stringnumber *= 2;
|
||||
yield = ovector[stringnumber+1] - ovector[stringnumber];
|
||||
substring = (char *)(pcre_malloc)(yield + 1);
|
||||
if (substring == NULL) return PCRE_ERROR_NOMEMORY;
|
||||
memcpy(substring, subject + ovector[stringnumber], yield);
|
||||
substring[yield] = 0;
|
||||
*stringptr = substring;
|
||||
return yield;
|
||||
}
|
||||
|
||||
/* End of get.c */
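
All three functions in get.c read a capture the same way: substring n occupies the pair ovector[2n] (start offset) and ovector[2n+1] (end offset), and any n outside 0..stringcount-1 is rejected. A minimal sketch of that lookup on its own follows; `substring_span` and `SKETCH_NOSUBSTRING` are hypothetical names, with -7 taken from the PCRE_ERROR_NOSUBSTRING value quoted in the comments above.

```c
/* Hypothetical names for illustration; -7 mirrors the value the
   comments above give for PCRE_ERROR_NOSUBSTRING. */
#define SKETCH_NOSUBSTRING (-7)

/* Look up capture `stringnumber` in an offsets table laid out as
   (start, end) pairs. On success stores the start offset and length
   and returns 0. */
static int substring_span(const int *ovector, int stringcount,
                          int stringnumber, int *start, int *length)
{
  if (stringnumber < 0 || stringnumber >= stringcount)
    return SKETCH_NOSUBSTRING;
  *start = ovector[2*stringnumber];
  *length = ovector[2*stringnumber + 1] - *start;
  return 0;
}
```

For the subject "key=value" with offsets {0,9, 0,3, 4,9}, capture 2 starts at offset 4 and has length 5. The length is taken from the offsets rather than from strlen(), which is why captures containing binary zeros still work.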
|
@ -1,343 +0,0 @@
|
||||
/*************************************************
|
||||
* Perl-Compatible Regular Expressions *
|
||||
*************************************************/
|
||||
|
||||
|
||||
/* This is a library of functions to support regular expressions whose syntax
|
||||
and semantics are as close as possible to those of the Perl 5 language. See
|
||||
the file Tech.Notes for some information on the internals.
|
||||
|
||||
Written by: Philip Hazel <ph10@cam.ac.uk>
|
||||
|
||||
Copyright (c) 1997-1999 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Permission is granted to anyone to use this software for any purpose on any
|
||||
computer system, and to redistribute it freely, subject to the following
|
||||
restrictions:
|
||||
|
||||
1. This software is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
|
||||
2. The origin of this software must not be misrepresented, either by
|
||||
explicit claim or by omission.
|
||||
|
||||
3. Altered versions must be plainly marked as such, and must not be
|
||||
misrepresented as being the original software.
|
||||
|
||||
4. If PCRE is embedded in any software that is released under the GNU
|
||||
General Purpose Licence (GPL), then the terms of that licence shall
|
||||
supersede any condition above with which it is incompatible.
|
||||
-----------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
/* This header contains definitions that are shared between the different
|
||||
modules, but which are not relevant to the outside. */
|
||||
|
||||
/* To cope with SunOS4 and other systems that lack memmove() but have bcopy(),
|
||||
define a macro for memmove() if USE_BCOPY is defined. */
|
||||
|
||||
#ifdef USE_BCOPY
|
||||
#undef memmove /* some systems may have a macro */
|
||||
#define memmove(a, b, c) bcopy(b, a, c)
|
||||
#endif
|
||||
|
||||
/* Standard C headers plus the external interface definition */
|
||||
|
||||
#include <ctype.h>
|
||||
#include <limits.h>
|
||||
#include <stddef.h>
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include "pcre.h"
|
||||
|
||||
/* In case there is no definition of offsetof() provided - though any proper
|
||||
Standard C system should have one. */
|
||||
|
||||
#ifndef offsetof
|
||||
#define offsetof(p_type,field) ((size_t)&(((p_type *)0)->field))
|
||||
#endif
|
||||
|
||||
/* These are the public options that can change during matching. */
|
||||
|
||||
#define PCRE_IMS (PCRE_CASELESS|PCRE_MULTILINE|PCRE_DOTALL)
|
||||
|
||||
/* Private options flags start at the most significant end of the four bytes,
|
||||
but skip the top bit so we can use ints for convenience without getting tangled
|
||||
with negative values. The public options defined in pcre.h start at the least
|
||||
significant end. Make sure they don't overlap, though now that we have expanded
|
||||
to four bytes there is plenty of space. */
|
||||
|
||||
#define PCRE_FIRSTSET 0x40000000 /* first_char is set */
|
||||
#define PCRE_REQCHSET 0x20000000 /* req_char is set */
|
||||
#define PCRE_STARTLINE 0x10000000 /* start after \n for multiline */
|
||||
#define PCRE_INGROUP 0x08000000 /* compiling inside a group */
|
||||
#define PCRE_ICHANGED 0x04000000 /* i option changes within regex */
|
||||
|
||||
/* Options for the "extra" block produced by pcre_study(). */
|
||||
|
||||
#define PCRE_STUDY_MAPPED 0x01 /* a map of starting chars exists */
|
||||
|
||||
/* Masks for identifying the public options which are permitted at compile
|
||||
time, run time or study time, respectively. */
|
||||
|
||||
#define PUBLIC_OPTIONS \
|
||||
(PCRE_CASELESS|PCRE_EXTENDED|PCRE_ANCHORED|PCRE_MULTILINE| \
|
||||
PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY)
|
||||
|
||||
#define PUBLIC_EXEC_OPTIONS \
|
||||
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY)
|
||||
|
||||
#define PUBLIC_STUDY_OPTIONS 0 /* None defined */
|
||||
|
||||
/* Magic number to provide a small check against being handed junk. */
|
||||
|
||||
#define MAGIC_NUMBER 0x50435245UL /* 'PCRE' */
|
||||
|
||||
/* Miscellaneous definitions */
|
||||
|
||||
typedef int BOOL;
|
||||
|
||||
#define FALSE 0
|
||||
#define TRUE 1
|
||||
|
||||
/* These are escaped items that aren't just an encoding of a particular data
|
||||
value such as \n. They must have non-zero values, as check_escape() returns
|
||||
their negation. Also, they must appear in the same order as in the opcode
|
||||
definitions below, up to ESC_z. The final one must be ESC_REF as subsequent
|
||||
values are used for \1, \2, \3, etc. There is a test in the code for an escape
|
||||
greater than ESC_b and less than ESC_X to detect the types that may be
|
||||
repeated. If any new escapes are put in-between that don't consume a character,
|
||||
that code will have to change. */
|
||||
|
||||
enum { ESC_A = 1, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s, ESC_W, ESC_w,
|
||||
ESC_Z, ESC_z, ESC_REF };
|
||||
|
||||
/* Opcode table: OP_BRA must be last, as all values >= it are used for brackets
|
||||
that extract substrings. Starting from 1 (i.e. after OP_END), the values up to
|
||||
OP_EOD must correspond in order to the list of escapes immediately above. */
|
||||
|
||||
enum {
|
||||
OP_END, /* End of pattern */
|
||||
|
||||
/* Values corresponding to backslashed metacharacters */
|
||||
|
||||
OP_SOD, /* Start of data: \A */
|
||||
OP_NOT_WORD_BOUNDARY, /* \B */
|
||||
OP_WORD_BOUNDARY, /* \b */
|
||||
OP_NOT_DIGIT, /* \D */
|
||||
OP_DIGIT, /* \d */
|
||||
OP_NOT_WHITESPACE, /* \S */
|
||||
OP_WHITESPACE, /* \s */
|
||||
OP_NOT_WORDCHAR, /* \W */
|
||||
OP_WORDCHAR, /* \w */
|
||||
OP_EODN, /* End of data or \n at end of data: \Z. */
|
||||
OP_EOD, /* End of data: \z */
|
||||
|
||||
OP_OPT, /* Set runtime options */
|
||||
OP_CIRC, /* Start of line - varies with multiline switch */
|
||||
OP_DOLL, /* End of line - varies with multiline switch */
|
||||
OP_ANY, /* Match any character */
|
||||
OP_CHARS, /* Match string of characters */
|
||||
OP_NOT, /* Match anything but the following char */
|
||||
|
||||
OP_STAR, /* The maximizing and minimizing versions of */
|
||||
OP_MINSTAR, /* all these opcodes must come in pairs, with */
|
||||
OP_PLUS, /* the minimizing one second. */
|
||||
OP_MINPLUS, /* This first set applies to single characters */
|
||||
OP_QUERY,
|
||||
OP_MINQUERY,
|
||||
OP_UPTO, /* From 0 to n matches */
|
||||
OP_MINUPTO,
|
||||
OP_EXACT, /* Exactly n matches */
|
||||
|
||||
OP_NOTSTAR, /* The maximizing and minimizing versions of */
|
||||
OP_NOTMINSTAR, /* all these opcodes must come in pairs, with */
|
||||
OP_NOTPLUS, /* the minimizing one second. */
|
||||
OP_NOTMINPLUS, /* This first set applies to "not" single characters */
|
||||
OP_NOTQUERY,
|
||||
OP_NOTMINQUERY,
|
||||
OP_NOTUPTO, /* From 0 to n matches */
|
||||
OP_NOTMINUPTO,
|
||||
OP_NOTEXACT, /* Exactly n matches */
|
||||
|
||||
OP_TYPESTAR, /* The maximizing and minimizing versions of */
|
||||
OP_TYPEMINSTAR, /* all these opcodes must come in pairs, with */
|
||||
OP_TYPEPLUS, /* the minimizing one second. These codes must */
|
||||
OP_TYPEMINPLUS, /* be in exactly the same order as those above. */
|
||||
OP_TYPEQUERY, /* This set applies to character types such as \d */
|
||||
OP_TYPEMINQUERY,
|
||||
OP_TYPEUPTO, /* From 0 to n matches */
|
||||
OP_TYPEMINUPTO,
|
||||
OP_TYPEEXACT, /* Exactly n matches */
|
||||
|
||||
OP_CRSTAR, /* The maximizing and minimizing versions of */
|
||||
OP_CRMINSTAR, /* all these opcodes must come in pairs, with */
|
||||
OP_CRPLUS, /* the minimizing one second. These codes must */
|
||||
OP_CRMINPLUS, /* be in exactly the same order as those above. */
|
||||
OP_CRQUERY, /* These are for character classes and back refs */
|
||||
OP_CRMINQUERY,
|
||||
OP_CRRANGE, /* These are different to the three sets above. */
|
||||
OP_CRMINRANGE,
|
||||
|
||||
OP_CLASS, /* Match a character class */
|
||||
OP_REF, /* Match a back reference */
|
||||
|
||||
OP_ALT, /* Start of alternation */
|
||||
OP_KET, /* End of group that doesn't have an unbounded repeat */
|
||||
OP_KETRMAX, /* These two must remain together and in this */
|
||||
OP_KETRMIN, /* order. They are for groups that repeat for ever. */
|
||||
|
||||
/* The assertions must come before ONCE and COND */
|
||||
|
||||
OP_ASSERT, /* Positive lookahead */
|
||||
OP_ASSERT_NOT, /* Negative lookahead */
|
||||
OP_ASSERTBACK, /* Positive lookbehind */
|
||||
OP_ASSERTBACK_NOT, /* Negative lookbehind */
|
||||
OP_REVERSE, /* Move pointer back - used in lookbehind assertions */
|
||||
|
||||
/* ONCE and COND must come after the assertions, with ONCE first, as there's
|
||||
a test for >= ONCE for a subpattern that isn't an assertion. */
|
||||
|
||||
OP_ONCE, /* Once matched, don't back up into the subpattern */
|
||||
OP_COND, /* Conditional group */
|
||||
OP_CREF, /* Used to hold an extraction string number */
|
||||
|
||||
OP_BRAZERO, /* These two must remain together and in this */
|
||||
OP_BRAMINZERO, /* order. */
|
||||
|
||||
OP_BRA /* This and greater values are used for brackets that
|
||||
extract substrings. */
|
||||
};
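
The comments above require the escape values ESC_A..ESC_z to match the opcodes OP_SOD..OP_EOD in order, and state that check_escape() returns the negation of the escape value. A minimal sketch of why that ordering matters is shown below: with matching order, turning a negated escape into an opcode needs no lookup table. The enums repeat only the first entries of the originals, and `opcode_from_escape` is a hypothetical helper, not a function from pcre.c.

```c
/* Copies of the leading enum entries from internal.h, repeated so the
   sketch is self-contained. */
enum { ESC_A = 1, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s, ESC_W, ESC_w,
       ESC_Z, ESC_z, ESC_REF };

enum { OP_END, OP_SOD, OP_NOT_WORD_BOUNDARY, OP_WORD_BOUNDARY,
       OP_NOT_DIGIT, OP_DIGIT, OP_NOT_WHITESPACE, OP_WHITESPACE,
       OP_NOT_WORDCHAR, OP_WORDCHAR, OP_EODN, OP_EOD };

/* Hypothetical helper: c is a value in the style returned by
   check_escape(), i.e. a data character (>= 0) or a negated ESC_ value.
   Returns the matching opcode, or -1 for data characters and for back
   references (ESC_REF and above). */
static int opcode_from_escape(int c)
{
  if (c >= 0) return -1;          /* ordinary data character */
  if (-c >= ESC_REF) return -1;   /* \1, \2, ... are handled separately */
  return -c;                      /* ESC_x and its opcode share a value */
}
```

If a new escape were inserted in only one of the two lists, every later escape would silently map to the wrong opcode, which is the hazard the original comments warn about.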
|
||||
|
||||
/* The highest extraction number. This is limited by the number of opcodes
|
||||
left after OP_BRA, i.e. 255 - OP_BRA. We actually set it somewhat lower. */
|
||||
|
||||
#define EXTRACT_MAX 99
|
||||
|
||||
/* The texts of compile-time error messages are defined as macros here so that
|
||||
they can be accessed by the POSIX wrapper and converted into error codes. Yes,
|
||||
I could have used error codes in the first place, but didn't feel like changing
|
||||
just to accommodate the POSIX wrapper. */
|
||||
|
||||
#define ERR1 "\\ at end of pattern"
|
||||
#define ERR2 "\\c at end of pattern"
|
||||
#define ERR3 "unrecognized character follows \\"
|
||||
#define ERR4 "numbers out of order in {} quantifier"
|
||||
#define ERR5 "number too big in {} quantifier"
|
||||
#define ERR6 "missing terminating ] for character class"
|
||||
#define ERR7 "invalid escape sequence in character class"
|
||||
#define ERR8 "range out of order in character class"
|
||||
#define ERR9 "nothing to repeat"
|
||||
#define ERR10 "operand of unlimited repeat could match the empty string"
|
||||
#define ERR11 "internal error: unexpected repeat"
|
||||
#define ERR12 "unrecognized character after (?"
|
||||
#define ERR13 "too many capturing parenthesized sub-patterns"
|
||||
#define ERR14 "missing )"
|
||||
#define ERR15 "back reference to non-existent subpattern"
|
||||
#define ERR16 "erroffset passed as NULL"
|
||||
#define ERR17 "unknown option bit(s) set"
|
||||
#define ERR18 "missing ) after comment"
|
||||
#define ERR19 "too many sets of parentheses"
|
||||
#define ERR20 "regular expression too large"
|
||||
#define ERR21 "failed to get memory"
|
||||
#define ERR22 "unmatched parentheses"
|
||||
#define ERR23 "internal error: code overflow"
|
||||
#define ERR24 "unrecognized character after (?<"
|
||||
#define ERR25 "lookbehind assertion is not fixed length"
|
||||
#define ERR26 "malformed number after (?("
|
||||
#define ERR27 "conditional group contains more than two branches"
|
||||
#define ERR28 "assertion expected after (?("
|
||||
|
||||
/* All character handling must be done as unsigned characters. Otherwise there
|
||||
are problems with top-bit-set characters and functions such as isspace().
|
||||
However, we leave the interface to the outside world as char *, because that
|
||||
should make things easier for callers. We define a short type for unsigned char
|
||||
to save lots of typing. I tried "uchar", but it causes problems on Digital
|
||||
Unix, where it is defined in sys/types, so use "uschar" instead. */
|
||||
|
||||
typedef unsigned char uschar;
|
||||
|
||||
/* The real format of the start of the pcre block; the actual code vector
|
||||
runs on as long as necessary after the end. */
|
||||
|
||||
typedef struct real_pcre {
|
||||
unsigned long int magic_number;
|
||||
const unsigned char *tables;
|
||||
unsigned long int options;
|
||||
uschar top_bracket;
|
||||
uschar top_backref;
|
||||
uschar first_char;
|
||||
uschar req_char;
|
||||
uschar code[1];
|
||||
} real_pcre;
|
||||
|
||||
/* The real format of the extra block returned by pcre_study(). */
|
||||
|
||||
typedef struct real_pcre_extra {
|
||||
uschar options;
|
||||
uschar start_bits[32];
|
||||
} real_pcre_extra;
|
||||
|
||||
|
||||
/* Structure for passing "static" information around between the functions
|
||||
doing the compiling, so that they are thread-safe. */
|
||||
|
||||
typedef struct compile_data {
|
||||
const uschar *lcc; /* Points to lower casing table */
|
||||
const uschar *fcc; /* Points to case-flipping table */
|
||||
const uschar *cbits; /* Points to character type table */
|
||||
const uschar *ctypes; /* Points to table of type maps */
|
||||
} compile_data;
|
||||
|
||||
/* Structure for passing "static" information around between the functions
|
||||
doing the matching, so that they are thread-safe. */
|
||||
|
||||
typedef struct match_data {
|
||||
int errorcode; /* As it says */
|
||||
int *offset_vector; /* Offset vector */
|
||||
int offset_end; /* One past the end */
|
||||
int offset_max; /* The maximum usable for return data */
|
||||
const uschar *lcc; /* Points to lower casing table */
|
||||
const uschar *ctypes; /* Points to table of type maps */
|
||||
BOOL offset_overflow; /* Set if too many extractions */
|
||||
BOOL notbol; /* NOTBOL flag */
|
||||
BOOL noteol; /* NOTEOL flag */
|
||||
BOOL endonly; /* Dollar not before final \n */
|
||||
BOOL notempty; /* Empty string match not wanted */
|
||||
const uschar *start_subject; /* Start of the subject string */
|
||||
const uschar *end_subject; /* End of the subject string */
|
||||
const uschar *start_match; /* Start of this match attempt */
|
||||
const uschar *end_match_ptr; /* Subject position at end match */
|
||||
int end_offset_top; /* Highwater mark at end of match */
|
||||
} match_data;
|
||||
|
||||
/* Bit definitions for entries in the pcre_ctypes table. */
|
||||
|
||||
#define ctype_space 0x01
|
||||
#define ctype_letter 0x02
|
||||
#define ctype_digit 0x04
|
||||
#define ctype_xdigit 0x08
|
||||
#define ctype_word 0x10 /* alphameric or '_' */
|
||||
#define ctype_meta 0x80 /* regexp meta char or zero (end pattern) */
|
||||
|
||||
/* Offsets for the bitmap tables in pcre_cbits. Each table contains a set
|
||||
of bits for a class map. */
|
||||
|
||||
#define cbit_digit 0 /* for \d */
|
||||
#define cbit_word 32 /* for \w */
|
||||
#define cbit_space 64 /* for \s */
|
||||
#define cbit_length 96 /* Length of the cbits table */
|
||||
|
||||
/* Offsets of the various tables from the base tables pointer, and
|
||||
total length. */
|
||||
|
||||
#define lcc_offset 0
|
||||
#define fcc_offset 256
|
||||
#define cbits_offset 512
|
||||
#define ctypes_offset (cbits_offset + cbit_length)
|
||||
#define tables_length (ctypes_offset + 256)
|
||||
|
||||
/* End of internal.h */
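
The offset macros above describe one contiguous block: 256 bytes of lower-casing table, 256 bytes of case-flipping table, 96 bytes of class bit maps, then 256 bytes of character types. The sketch below restates those offsets and fills only the two case tables using the ctype functions. The `SK_` names and `fill_case_tables` are hypothetical, and this is an illustration of the layout under the current locale, not the pcre_maketables() implementation.

```c
#include <ctype.h>

/* Hypothetical constants mirroring the offsets defined in internal.h. */
enum {
  SK_LCC_OFFSET    = 0,
  SK_FCC_OFFSET    = 256,
  SK_CBITS_OFFSET  = 512,
  SK_CBIT_LENGTH   = 96,
  SK_CTYPES_OFFSET = SK_CBITS_OFFSET + SK_CBIT_LENGTH,
  SK_TABLES_LENGTH = SK_CTYPES_OFFSET + 256
};

/* Fill the lower-casing and case-flipping sub-tables of a block that is
   at least SK_TABLES_LENGTH bytes long. The bit maps and the ctypes
   table are left untouched in this sketch. */
static void fill_case_tables(unsigned char *tables)
{
  int i;
  for (i = 0; i < 256; i++)
    {
    tables[SK_LCC_OFFSET + i] = (unsigned char)tolower(i);
    tables[SK_FCC_OFFSET + i] =
      (unsigned char)(islower(i) ? toupper(i) : tolower(i));
    }
}
```

With these values the ctypes table starts at byte 608 and the whole block is 864 bytes long, so a caller holding only the base pointer can reach any sub-table by adding the matching offset.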
|
@ -1,113 +0,0 @@
|
||||
/*************************************************
|
||||
* Perl-Compatible Regular Expressions *
|
||||
*************************************************/
|
||||
|
||||
/*
|
||||
PCRE is a library of functions to support regular expressions whose syntax
|
||||
and semantics are as close as possible to those of the Perl 5 language.
|
||||
|
||||
Written by: Philip Hazel <ph10@cam.ac.uk>
|
||||
|
||||
Copyright (c) 1997-1999 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Permission is granted to anyone to use this software for any purpose on any
|
||||
computer system, and to redistribute it freely, subject to the following
|
||||
restrictions:
|
||||
|
||||
1. This software is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
|
||||
2. The origin of this software must not be misrepresented, either by
|
||||
explicit claim or by omission.
|
||||
|
||||
3. Altered versions must be plainly marked as such, and must not be
|
||||
misrepresented as being the original software.
|
||||
|
||||
4. If PCRE is embedded in any software that is released under the GNU
|
||||
General Purpose Licence (GPL), then the terms of that licence shall
|
||||
supersede any condition above with which it is incompatible.
|
||||
-----------------------------------------------------------------------------
|
||||
|
||||
See the file Tech.Notes for some information on the internals.
|
||||
*/
|
||||
|
||||
|
||||
/* This file is compiled on its own as part of the PCRE library. However,
|
||||
it is also included in the compilation of dftables.c, in which case the macro
|
||||
DFTABLES is defined. */
|
||||
|
||||
#ifndef DFTABLES
|
||||
#include "internal.h"
|
||||
#endif
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Create PCRE character tables *
|
||||
*************************************************/
|
||||
|
||||
/* This function builds a set of character tables for use by PCRE and returns
|
||||
a pointer to them. They are built using the ctype functions, and consequently
|
||||
their contents will depend upon the current locale setting. When compiled as
|
||||
part of the library, the store is obtained via pcre_malloc(), but when compiled
|
||||
inside dftables, malloc() is used instead.
|
||||
|
||||
Arguments: none
|
||||
Returns: pointer to the contiguous block of data
|
||||
*/
|
||||
|
||||
unsigned const char *
|
||||
pcre_maketables(void)
|
||||
{
|
||||
unsigned char *yield, *p;
|
||||
int i;
|
||||
|
||||
#ifndef DFTABLES
|
||||
yield = (unsigned char*)(pcre_malloc)(tables_length);
|
||||
#else
|
||||
yield = (unsigned char*)malloc(tables_length);
|
||||
#endif
|
||||
|
||||
if (yield == NULL) return NULL;
|
||||
p = yield;
|
||||
|
||||
/* First comes the lower casing table */
|
||||
|
||||
for (i = 0; i < 256; i++) *p++ = tolower(i);
|
||||
|
||||
/* Next the case-flipping table */
|
||||
|
||||
for (i = 0; i < 256; i++) *p++ = islower(i)? toupper(i) : tolower(i);
|
||||
|
||||
/* Then the character class tables */
|
||||
|
||||
memset(p, 0, cbit_length);
|
||||
for (i = 0; i < 256; i++)
|
||||
{
|
||||
if (isdigit(i)) p[cbit_digit + i/8] |= 1 << (i&7);
|
||||
if (isalnum(i) || i == '_')
|
||||
p[cbit_word + i/8] |= 1 << (i&7);
|
||||
if (isspace(i)) p[cbit_space + i/8] |= 1 << (i&7);
|
||||
}
|
||||
p += cbit_length;
|
||||
|
||||
/* Finally, the character type table */
|
||||
|
||||
for (i = 0; i < 256; i++)
|
||||
{
|
||||
int x = 0;
|
||||
if (isspace(i)) x += ctype_space;
|
||||
if (isalpha(i)) x += ctype_letter;
|
||||
if (isdigit(i)) x += ctype_digit;
|
||||
if (isxdigit(i)) x += ctype_xdigit;
|
||||
if (isalnum(i) || i == '_') x += ctype_word;
|
||||
if (strchr("*+?{^.$|()[", i) != 0) x += ctype_meta;
|
||||
*p++ = x;
|
||||
}
|
||||
|
||||
return yield;
|
||||
}
|
||||
|
||||
/* End of maketables.c */
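The character class section of the tables is three 256-bit bitmaps, one bit per character code, stored at the `cbit_digit`, `cbit_word` and `cbit_space` offsets. The sketch below rebuilds just that section with the same loop as `pcre_maketables()` and shows how a single bit can be tested. The helper names `build_cbits()` and `in_class()` are assumptions made for this illustration; they do not exist in PCRE.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Offsets copied from internal.h: three 32-byte (256-bit) bitmaps. */
#define cbit_digit    0
#define cbit_word    32
#define cbit_space   64
#define cbit_length  96

/* Hypothetical helper: fill the bitmaps the way pcre_maketables() does.
   The result depends on the current locale, as in the original. */
static void build_cbits(unsigned char *cbits)
{
  int i;
  assert(cbits != NULL);
  memset(cbits, 0, cbit_length);
  for (i = 0; i < 256; i++)
    {
    if (isdigit(i)) cbits[cbit_digit + i/8] |= 1 << (i&7);
    if (isalnum(i) || i == '_') cbits[cbit_word + i/8] |= 1 << (i&7);
    if (isspace(i)) cbits[cbit_space + i/8] |= 1 << (i&7);
    }
}

/* Hypothetical helper: is character c a member of the class at offset? */
static int in_class(const unsigned char *cbits, int offset, unsigned char c)
{
  return (cbits[offset + c/8] & (1 << (c&7))) != 0;
}
```

Byte `c/8` selects the group of eight characters and bit `c&7` selects the character within it, so a class test costs one index and one mask.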
|
File diff suppressed because it is too large
@ -1,19 +0,0 @@
|
||||
EXPORTS
|
||||
|
||||
pcre_malloc DATA
|
||||
pcre_free DATA
|
||||
|
||||
pcre_compile
|
||||
pcre_copy_substring
|
||||
pcre_exec
|
||||
pcre_get_substring
|
||||
pcre_get_substring_list
|
||||
pcre_info
|
||||
pcre_maketables
|
||||
pcre_study
|
||||
pcre_version
|
||||
|
||||
regcomp
|
||||
regexec
|
||||
regerror
|
||||
regfree
|
@ -1,96 +0,0 @@
|
||||
/*************************************************
|
||||
* Perl-Compatible Regular Expressions *
|
||||
*************************************************/
|
||||
|
||||
/* Copyright (c) 1997-1999 University of Cambridge */
|
||||
|
||||
#ifndef _PCRE_H
|
||||
#define _PCRE_H
|
||||
|
||||
#define PCRE_MAJOR 2
|
||||
#define PCRE_MINOR 08
|
||||
#define PCRE_DATE 31-Aug-1999
|
||||
|
||||
#include "php_compat.h"
|
||||
|
||||
/* Win32 uses DLL by default */
|
||||
|
||||
#ifdef _WIN32
|
||||
# ifdef STATIC
|
||||
# define PCRE_DL_IMPORT
|
||||
# else
|
||||
# define PCRE_DL_IMPORT __declspec(dllimport)
|
||||
# endif
|
||||
#else
|
||||
# define PCRE_DL_IMPORT
|
||||
#endif
|
||||
|
||||
/* Have to include stdlib.h in order to ensure that size_t is defined;
|
||||
it is needed here for malloc. */
|
||||
|
||||
#include <sys/types.h>
|
||||
#include <stdlib.h>
|
||||
|
||||
/* Allow for C++ users */
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
/* Options */
|
||||
|
||||
#define PCRE_CASELESS 0x0001
|
||||
#define PCRE_MULTILINE 0x0002
|
||||
#define PCRE_DOTALL 0x0004
|
||||
#define PCRE_EXTENDED 0x0008
|
||||
#define PCRE_ANCHORED 0x0010
|
||||
#define PCRE_DOLLAR_ENDONLY 0x0020
|
||||
#define PCRE_EXTRA 0x0040
|
||||
#define PCRE_NOTBOL 0x0080
|
||||
#define PCRE_NOTEOL 0x0100
|
||||
#define PCRE_UNGREEDY 0x0200
|
||||
#define PCRE_NOTEMPTY 0x0400
|
||||
|
||||
/* Exec-time and get-time error codes */
|
||||
|
||||
#define PCRE_ERROR_NOMATCH (-1)
|
||||
#define PCRE_ERROR_NULL (-2)
|
||||
#define PCRE_ERROR_BADOPTION (-3)
|
||||
#define PCRE_ERROR_BADMAGIC (-4)
|
||||
#define PCRE_ERROR_UNKNOWN_NODE (-5)
|
||||
#define PCRE_ERROR_NOMEMORY (-6)
|
||||
#define PCRE_ERROR_NOSUBSTRING (-7)
|
||||
|
||||
/* Types */
|
||||
|
||||
typedef void pcre;
|
||||
typedef void pcre_extra;
|
||||
|
||||
/* Store get and free functions. These can be set to alternative malloc/free
|
||||
functions if required. Some magic is required for Win32 DLL; it is null on
|
||||
other OS. */
|
||||
|
||||
PCRE_DL_IMPORT extern void *(*pcre_malloc)(size_t);
|
||||
PCRE_DL_IMPORT extern void (*pcre_free)(void *);
|
||||
|
||||
#undef PCRE_DL_IMPORT
|
||||
|
||||
/* Functions */
|
||||
|
||||
extern pcre *pcre_compile(const char *, int, const char **, int *,
|
||||
const unsigned char *);
|
||||
extern int pcre_copy_substring(const char *, int *, int, int, char *, int);
|
||||
extern int pcre_exec(const pcre *, const pcre_extra *, const char *,
|
||||
int, int, int, int *, int);
|
||||
extern int pcre_get_substring(const char *, int *, int, int, const char **);
|
||||
extern int pcre_get_substring_list(const char *, int *, int, const char ***);
|
||||
extern int pcre_info(const pcre *, int *, int *);
|
||||
extern unsigned const char *pcre_maketables(void);
|
||||
extern pcre_extra *pcre_study(const pcre *, int, const char **);
|
||||
extern const char *pcre_version(void);
|
||||
|
||||
#ifdef __cplusplus
|
||||
} /* extern "C" */
|
||||
#endif
|
||||
|
||||
#endif /* End of pcre.h */
|
@ -1,141 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
pcreposix - POSIX API for Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
.B #include <pcreposix.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int regcomp(regex_t *\fIpreg\fR, const char *\fIpattern\fR,
|
||||
.ti +5n
|
||||
.B int \fIcflags\fR);
|
||||
.PP
|
||||
.br
|
||||
.B int regexec(regex_t *\fIpreg\fR, const char *\fIstring\fR,
|
||||
.ti +5n
|
||||
.B size_t \fInmatch\fR, regmatch_t \fIpmatch\fR[], int \fIeflags\fR);
|
||||
.PP
|
||||
.br
|
||||
.B size_t regerror(int \fIerrcode\fR, const regex_t *\fIpreg\fR,
|
||||
.ti +5n
|
||||
.B char *\fIerrbuf\fR, size_t \fIerrbuf_size\fR);
|
||||
.PP
|
||||
.br
|
||||
.B void regfree(regex_t *\fIpreg\fR);
|
||||
|
||||
|
||||
.SH DESCRIPTION
|
||||
This set of functions provides a POSIX-style API to the PCRE regular expression
|
||||
package. See the \fBpcre\fR documentation for a description of the native API,
|
||||
which contains additional functionality.
|
||||
|
||||
The functions described here are just wrapper functions that ultimately call
|
||||
the native API. Their prototypes are defined in the \fBpcreposix.h\fR header
|
||||
file, and on Unix systems the library itself is called \fBpcreposix.a\fR, so
|
||||
can be accessed by adding \fB-lpcreposix\fR to the command for linking an
|
||||
application which uses them. Because the POSIX functions call the native ones,
|
||||
it is also necessary to add \fB-lpcre\fR.
|
||||
|
||||
As I am pretty ignorant about POSIX, these functions must be considered as
|
||||
experimental. I have implemented only those option bits that can be reasonably
|
||||
mapped to PCRE native options. Other POSIX options are not even defined. It may
|
||||
be that it is useful to define, but ignore, other options. Feedback from more
|
||||
knowledgeable folk may cause this kind of detail to change.
|
||||
|
||||
When PCRE is called via these functions, it is only the API that is POSIX-like
|
||||
in style. The syntax and semantics of the regular expressions themselves are
|
||||
still those of Perl, subject to the setting of various PCRE options, as
|
||||
described below.
|
||||
|
||||
The header for these functions is supplied as \fBpcreposix.h\fR to avoid any
|
||||
potential clash with other POSIX libraries. It can, of course, be renamed or
|
||||
aliased as \fBregex.h\fR, which is the "correct" name. It provides two
|
||||
structure types, \fIregex_t\fR for compiled internal forms, and
|
||||
\fIregmatch_t\fR for returning captured substrings. It also defines some
|
||||
constants whose names start with "REG_"; these are used for setting options and
|
||||
identifying error codes.
|
||||
|
||||
|
||||
.SH COMPILING A PATTERN
|
||||
|
||||
The function \fBregcomp()\fR is called to compile a pattern into an
|
||||
internal form. The pattern is a C string terminated by a binary zero, and
|
||||
is passed in the argument \fIpattern\fR. The \fIpreg\fR argument is a pointer
|
||||
to a regex_t structure which is used as a base for storing information about
|
||||
the compiled expression.
|
||||
|
||||
The argument \fIcflags\fR is either zero, or contains one or more of the bits
|
||||
defined by the following macros:
|
||||
|
||||
REG_ICASE
|
||||
|
||||
The PCRE_CASELESS option is set when the expression is passed for compilation
|
||||
to the native function.
|
||||
|
||||
REG_NEWLINE
|
||||
|
||||
The PCRE_MULTILINE option is set when the expression is passed for compilation
|
||||
to the native function.
|
||||
|
||||
The yield of \fBregcomp()\fR is zero on success, and non-zero otherwise. The
|
||||
\fIpreg\fR structure is filled in on success, and one member of the structure
|
||||
is publicized: \fIre_nsub\fR contains the number of capturing subpatterns in
|
||||
the regular expression. Various error codes are defined in the header file.
|
||||
|
||||
|
||||
.SH MATCHING A PATTERN
|
||||
The function \fBregexec()\fR is called to match a pre-compiled pattern
|
||||
\fIpreg\fR against a given \fIstring\fR, which is terminated by a zero byte,
|
||||
subject to the options in \fIeflags\fR. These can be:
|
||||
|
||||
REG_NOTBOL
|
||||
|
||||
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
|
||||
function.
|
||||
|
||||
REG_NOTEOL
|
||||
|
||||
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
|
||||
function.
|
||||
|
||||
The portion of the string that was matched, and also any captured substrings,
|
||||
are returned via the \fIpmatch\fR argument, which points to an array of
|
||||
\fInmatch\fR structures of type \fIregmatch_t\fR, containing the members
|
||||
\fIrm_so\fR and \fIrm_eo\fR. These contain the offset to the first character of
|
||||
each substring and the offset to the first character after the end of each
|
||||
substring, respectively. The 0th element of the vector relates to the entire
|
||||
portion of \fIstring\fR that was matched; subsequent elements relate to the
|
||||
capturing subpatterns of the regular expression. Unused entries in the array
|
||||
have both structure members set to -1.
|
||||
|
||||
A successful match yields a zero return; various error codes are defined in the
|
||||
header file, of which REG_NOMATCH is the "expected" failure code.
|
||||
|
||||
|
||||
.SH ERROR MESSAGES
|
||||
The \fBregerror()\fR function maps a non-zero error code from either
|
||||
\fBregcomp\fR or \fBregexec\fR to a printable message. If \fIpreg\fR is not
|
||||
NULL, the error should have arisen from the use of that structure. A message
|
||||
terminated by a binary zero is placed in \fIerrbuf\fR. The length of the
|
||||
message, including the zero, is limited to \fIerrbuf_size\fR. The yield of the
|
||||
function is the size of buffer needed to hold the whole message.
|
||||
|
||||
|
||||
.SH STORAGE
|
||||
Compiling a regular expression causes memory to be allocated and associated
|
||||
with the \fIpreg\fR structure. The function \fBregfree()\fR frees all such
|
||||
memory, after which \fIpreg\fR may no longer be used as a compiled expression.
|
||||
|
||||
|
||||
.SH AUTHOR
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
.br
|
||||
University Computing Service,
|
||||
.br
|
||||
New Museums Site,
|
||||
.br
|
||||
Cambridge CB2 3QG, England.
|
||||
.br
|
||||
Phone: +44 1223 334714
|
||||
|
||||
Copyright (c) 1997-1999 University of Cambridge.
|
@ -1,182 +0,0 @@
|
||||
<HTML>
|
||||
<HEAD>
|
||||
<TITLE>pcreposix specification</TITLE>
|
||||
</HEAD>
|
||||
<body bgcolor="#FFFFFF" text="#00005A">
|
||||
<H1>pcreposix specification</H1>
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page in case the
|
||||
conversion went wrong.
|
||||
<UL>
|
||||
<LI><A NAME="TOC1" HREF="#SEC1">NAME</A>
|
||||
<LI><A NAME="TOC2" HREF="#SEC2">SYNOPSIS</A>
|
||||
<LI><A NAME="TOC3" HREF="#SEC3">DESCRIPTION</A>
|
||||
<LI><A NAME="TOC4" HREF="#SEC4">COMPILING A PATTERN</A>
|
||||
<LI><A NAME="TOC5" HREF="#SEC5">MATCHING A PATTERN</A>
|
||||
<LI><A NAME="TOC6" HREF="#SEC6">ERROR MESSAGES</A>
|
||||
<LI><A NAME="TOC7" HREF="#SEC7">STORAGE</A>
|
||||
<LI><A NAME="TOC8" HREF="#SEC8">AUTHOR</A>
|
||||
</UL>
|
||||
<LI><A NAME="SEC1" HREF="#TOC1">NAME</A>
|
||||
<P>
|
||||
pcreposix - POSIX API for Perl-compatible regular expressions.
|
||||
</P>
|
||||
<LI><A NAME="SEC2" HREF="#TOC1">SYNOPSIS</A>
|
||||
<P>
|
||||
<B>#include &lt;pcreposix.h&gt;</B>
|
||||
</P>
|
||||
<P>
|
||||
<B>int regcomp(regex_t *<I>preg</I>, const char *<I>pattern</I>,</B>
|
||||
<B>int <I>cflags</I>);</B>
|
||||
</P>
|
||||
<P>
|
||||
<B>int regexec(regex_t *<I>preg</I>, const char *<I>string</I>,</B>
|
||||
<B>size_t <I>nmatch</I>, regmatch_t <I>pmatch</I>[], int <I>eflags</I>);</B>
|
||||
</P>
|
||||
<P>
|
||||
<B>size_t regerror(int <I>errcode</I>, const regex_t *<I>preg</I>,</B>
|
||||
<B>char *<I>errbuf</I>, size_t <I>errbuf_size</I>);</B>
|
||||
</P>
|
||||
<P>
|
||||
<B>void regfree(regex_t *<I>preg</I>);</B>
|
||||
</P>
|
||||
<LI><A NAME="SEC3" HREF="#TOC1">DESCRIPTION</A>
|
||||
<P>
|
||||
This set of functions provides a POSIX-style API to the PCRE regular expression
|
||||
package. See the <B>pcre</B> documentation for a description of the native API,
|
||||
which contains additional functionality.
|
||||
</P>
|
||||
<P>
|
||||
The functions described here are just wrapper functions that ultimately call
|
||||
the native API. Their prototypes are defined in the <B>pcreposix.h</B> header
|
||||
file, and on Unix systems the library itself is called <B>pcreposix.a</B>, so
|
||||
can be accessed by adding <B>-lpcreposix</B> to the command for linking an
|
||||
application which uses them. Because the POSIX functions call the native ones,
|
||||
it is also necessary to add <B>-lpcre</B>.
|
||||
</P>
|
||||
<P>
|
||||
As I am pretty ignorant about POSIX, these functions must be considered as
|
||||
experimental. I have implemented only those option bits that can be reasonably
|
||||
mapped to PCRE native options. Other POSIX options are not even defined. It may
|
||||
be that it is useful to define, but ignore, other options. Feedback from more
|
||||
knowledgeable folk may cause this kind of detail to change.
|
||||
</P>
|
||||
<P>
|
||||
When PCRE is called via these functions, it is only the API that is POSIX-like
|
||||
in style. The syntax and semantics of the regular expressions themselves are
|
||||
still those of Perl, subject to the setting of various PCRE options, as
|
||||
described below.
|
||||
</P>
|
||||
<P>
|
||||
The header for these functions is supplied as <B>pcreposix.h</B> to avoid any
|
||||
potential clash with other POSIX libraries. It can, of course, be renamed or
|
||||
aliased as <B>regex.h</B>, which is the "correct" name. It provides two
|
||||
structure types, <I>regex_t</I> for compiled internal forms, and
|
||||
<I>regmatch_t</I> for returning captured substrings. It also defines some
|
||||
constants whose names start with "REG_"; these are used for setting options and
|
||||
identifying error codes.
|
||||
</P>
|
||||
<LI><A NAME="SEC4" HREF="#TOC1">COMPILING A PATTERN</A>
|
||||
<P>
|
||||
The function <B>regcomp()</B> is called to compile a pattern into an
|
||||
internal form. The pattern is a C string terminated by a binary zero, and
|
||||
is passed in the argument <I>pattern</I>. The <I>preg</I> argument is a pointer
|
||||
to a regex_t structure which is used as a base for storing information about
|
||||
the compiled expression.
|
||||
</P>
|
||||
<P>
|
||||
The argument <I>cflags</I> is either zero, or contains one or more of the bits
|
||||
defined by the following macros:
|
||||
</P>
|
||||
<P>
|
||||
<PRE>
|
||||
REG_ICASE
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
The PCRE_CASELESS option is set when the expression is passed for compilation
|
||||
to the native function.
|
||||
</P>
|
||||
<P>
|
||||
<PRE>
|
||||
REG_NEWLINE
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
The PCRE_MULTILINE option is set when the expression is passed for compilation
|
||||
to the native function.
|
||||
</P>
|
||||
<P>
|
||||
The yield of <B>regcomp()</B> is zero on success, and non-zero otherwise. The
|
||||
<I>preg</I> structure is filled in on success, and one member of the structure
|
||||
is publicized: <I>re_nsub</I> contains the number of capturing subpatterns in
|
||||
the regular expression. Various error codes are defined in the header file.
|
||||
</P>
|
||||
<LI><A NAME="SEC5" HREF="#TOC1">MATCHING A PATTERN</A>
|
||||
<P>
|
||||
The function <B>regexec()</B> is called to match a pre-compiled pattern
|
||||
<I>preg</I> against a given <I>string</I>, which is terminated by a zero byte,
|
||||
subject to the options in <I>eflags</I>. These can be:
|
||||
</P>
|
||||
<P>
|
||||
<PRE>
|
||||
REG_NOTBOL
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
|
||||
function.
|
||||
</P>
|
||||
<P>
|
||||
<PRE>
|
||||
REG_NOTEOL
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
|
||||
function.
|
||||
</P>
|
||||
<P>
|
||||
The portion of the string that was matched, and also any captured substrings,
|
||||
are returned via the <I>pmatch</I> argument, which points to an array of
|
||||
<I>nmatch</I> structures of type <I>regmatch_t</I>, containing the members
|
||||
<I>rm_so</I> and <I>rm_eo</I>. These contain the offset to the first character of
|
||||
each substring and the offset to the first character after the end of each
|
||||
substring, respectively. The 0th element of the vector relates to the entire
|
||||
portion of <I>string</I> that was matched; subsequent elements relate to the
|
||||
capturing subpatterns of the regular expression. Unused entries in the array
|
||||
have both structure members set to -1.
|
||||
</P>
|
||||
<P>
|
||||
A successful match yields a zero return; various error codes are defined in the
|
||||
header file, of which REG_NOMATCH is the "expected" failure code.
|
||||
</P>
|
||||
<LI><A NAME="SEC6" HREF="#TOC1">ERROR MESSAGES</A>
|
||||
<P>
|
||||
The <B>regerror()</B> function maps a non-zero error code from either
|
||||
<B>regcomp</B> or <B>regexec</B> to a printable message. If <I>preg</I> is not
|
||||
NULL, the error should have arisen from the use of that structure. A message
|
||||
terminated by a binary zero is placed in <I>errbuf</I>. The length of the
|
||||
message, including the zero, is limited to <I>errbuf_size</I>. The yield of the
|
||||
function is the size of buffer needed to hold the whole message.
|
||||
</P>
|
||||
<LI><A NAME="SEC7" HREF="#TOC1">STORAGE</A>
|
||||
<P>
|
||||
Compiling a regular expression causes memory to be allocated and associated
|
||||
with the <I>preg</I> structure. The function <B>regfree()</B> frees all such
|
||||
memory, after which <I>preg</I> may no longer be used as a compiled expression.
|
||||
</P>
|
||||
<LI><A NAME="SEC8" HREF="#TOC1">AUTHOR</A>
|
||||
<P>
|
||||
Philip Hazel &lt;ph10@cam.ac.uk&gt;
|
||||
<BR>
|
||||
University Computing Service,
|
||||
<BR>
|
||||
New Museums Site,
|
||||
<BR>
|
||||
Cambridge CB2 3QG, England.
|
||||
<BR>
|
||||
Phone: +44 1223 334714
|
||||
</P>
|
||||
<P>
|
||||
Copyright (c) 1997-1999 University of Cambridge.
|
@ -1,150 +0,0 @@
|
||||
NAME
|
||||
pcreposix - POSIX API for Perl-compatible regular expres-
|
||||
sions.
|
||||
|
||||
|
||||
|
||||
SYNOPSIS
|
||||
#include <pcreposix.h>
|
||||
|
||||
int regcomp(regex_t *preg, const char *pattern,
|
||||
int cflags);
|
||||
|
||||
int regexec(regex_t *preg, const char *string,
|
||||
size_t nmatch, regmatch_t pmatch[], int eflags);
|
||||
|
||||
size_t regerror(int errcode, const regex_t *preg,
|
||||
char *errbuf, size_t errbuf_size);
|
||||
|
||||
void regfree(regex_t *preg);
|
||||
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
This set of functions provides a POSIX-style API to the PCRE
|
||||
regular expression package. See the pcre documentation for a
|
||||
description of the native API, which contains additional
|
||||
functionality.
|
||||
|
||||
The functions described here are just wrapper functions that
|
||||
ultimately call the native API. Their prototypes are defined
|
||||
in the pcreposix.h header file, and on Unix systems the
|
||||
library itself is called pcreposix.a, so can be accessed by
|
||||
adding -lpcreposix to the command for linking an application
|
||||
which uses them. Because the POSIX functions call the native
|
||||
ones, it is also necessary to add -lpcre.
|
||||
|
||||
As I am pretty ignorant about POSIX, these functions must be
|
||||
considered as experimental. I have implemented only those
|
||||
option bits that can be reasonably mapped to PCRE native
|
||||
options. Other POSIX options are not even defined. It may be
|
||||
that it is useful to define, but ignore, other options.
|
||||
Feedback from more knowledgeable folk may cause this kind of
|
||||
detail to change.
|
||||
|
||||
When PCRE is called via these functions, it is only the API
|
||||
that is POSIX-like in style. The syntax and semantics of the
|
||||
regular expressions themselves are still those of Perl, sub-
|
||||
ject to the setting of various PCRE options, as described
|
||||
below.
|
||||
|
||||
The header for these functions is supplied as pcreposix.h to
|
||||
avoid any potential clash with other POSIX libraries. It
|
||||
can, of course, be renamed or aliased as regex.h, which is
|
||||
the "correct" name. It provides two structure types, regex_t
|
||||
for compiled internal forms, and regmatch_t for returning
|
||||
captured substrings. It also defines some constants whose
|
||||
names start with "REG_"; these are used for setting options
|
||||
and identifying error codes.
|
||||
|
||||
|
||||
|
||||
COMPILING A PATTERN
|
||||
The function regcomp() is called to compile a pattern into
|
||||
an internal form. The pattern is a C string terminated by a
|
||||
binary zero, and is passed in the argument pattern. The preg
|
||||
argument is a pointer to a regex_t structure which is used
|
||||
as a base for storing information about the compiled expres-
|
||||
sion.
|
||||
|
||||
The argument cflags is either zero, or contains one or more
|
||||
of the bits defined by the following macros:
|
||||
|
||||
REG_ICASE
|
||||
|
||||
The PCRE_CASELESS option is set when the expression is
|
||||
passed for compilation to the native function.
|
||||
|
||||
REG_NEWLINE
|
||||
|
||||
The PCRE_MULTILINE option is set when the expression is
|
||||
passed for compilation to the native function.
|
||||
|
||||
The yield of regcomp() is zero on success, and non-zero oth-
|
||||
erwise. The preg structure is filled in on success, and one
|
||||
member of the structure is publicized: re_nsub contains the
|
||||
number of capturing subpatterns in the regular expression.
|
||||
Various error codes are defined in the header file.
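As a rough illustration of the compile step, the sketch below compiles a pattern and reads back re_nsub. It is written against the system <regex.h> as a stand-in, on the assumption that pcreposix.h has been aliased to that name as described earlier. The helper count_groups() is a hypothetical name, and the REG_EXTENDED guard is an assumption: this wrapper does not define that option, while a system POSIX library needs it for parentheses to act as capturing groups.

```c
#include <assert.h>
#include <regex.h>   /* stand-in for pcreposix.h (assumption) */
#include <stddef.h>

/* This wrapper does not define REG_EXTENDED; make it a no-op there. */
#ifndef REG_EXTENDED
#define REG_EXTENDED 0
#endif

/* Hypothetical helper: number of capturing subpatterns in pattern,
   or -1 if regcomp() rejects it. */
static int count_groups(const char *pattern, int cflags)
{
  regex_t re;
  int n;
  assert(pattern != NULL);
  if (regcomp(&re, pattern, cflags | REG_EXTENDED) != 0) return -1;
  n = (int)re.re_nsub;   /* the one publicized member of regex_t */
  regfree(&re);          /* release the store held by the compiled form */
  return n;
}
```

For example, count_groups("(a)(b)c", REG_ICASE) compiles caselessly and reports two capturing subpatterns, while an unbalanced pattern such as "(a" makes regcomp() fail.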
|
||||
|
||||
|
||||
|
||||
MATCHING A PATTERN
|
||||
The function regexec() is called to match a pre-compiled
|
||||
pattern preg against a given string, which is terminated by
|
||||
a zero byte, subject to the options in eflags. These can be:
|
||||
|
||||
REG_NOTBOL
|
||||
|
||||
The PCRE_NOTBOL option is set when calling the underlying
|
||||
PCRE matching function.
|
||||
|
||||
REG_NOTEOL
|
||||
|
||||
The PCRE_NOTEOL option is set when calling the underlying
|
||||
PCRE matching function.
|
||||
|
||||
The portion of the string that was matched, and also any
|
||||
captured substrings, are returned via the pmatch argument,
|
||||
which points to an array of nmatch structures of type
|
||||
regmatch_t, containing the members rm_so and rm_eo. These
|
||||
contain the offset to the first character of each substring
|
||||
and the offset to the first character after the end of each
|
||||
substring, respectively. The 0th element of the vector
|
||||
relates to the entire portion of string that was matched;
|
||||
subsequent elements relate to the capturing subpatterns of
|
||||
the regular expression. Unused entries in the array have
|
||||
both structure members set to -1.
|
||||
|
||||
A successful match yields a zero return; various error codes
|
||||
are defined in the header file, of which REG_NOMATCH is the
|
||||
"expected" failure code.
|
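The matching step and the layout of the pmatch vector can be sketched as follows. As before, the system <regex.h> is used as a stand-in for pcreposix.h, the REG_EXTENDED guard is an assumption for portability between the two, and match_offsets() is a hypothetical helper name rather than part of either library.

```c
#include <assert.h>
#include <regex.h>   /* stand-in for pcreposix.h (assumption) */
#include <stddef.h>

#ifndef REG_EXTENDED
#define REG_EXTENDED 0
#endif

/* Hypothetical helper: compile pattern, match it against subject with the
   given eflags, and fill pmatch[0..nmatch-1]. Returns 0 on a match,
   otherwise the REG_ code from whichever call failed. */
static int match_offsets(const char *pattern, const char *subject, int eflags,
                         regmatch_t *pmatch, size_t nmatch)
{
  regex_t re;
  int rc;
  assert(pattern != NULL && subject != NULL);
  rc = regcomp(&re, pattern, REG_EXTENDED);
  if (rc != 0) return rc;
  rc = regexec(&re, subject, nmatch, pmatch, eflags);
  regfree(&re);
  return rc;
}
```

Matching "b(c+)d" against "abccde" with three pmatch entries should leave the whole match in entry 0 (offsets 1 and 5), the captured "cc" in entry 1 (offsets 2 and 4), and both members of the unused entry 2 set to -1. Passing REG_NOTBOL stops "^a" from matching at the start of "abc", which yields REG_NOMATCH.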
||||
|
||||
|
||||
|
||||
ERROR MESSAGES
|
||||
The regerror() function maps a non-zero error code from
|
||||
either regcomp or regexec to a printable message. If preg is
|
||||
not NULL, the error should have arisen from the use of that
|
||||
structure. A message terminated by a binary zero is placed
|
||||
in errbuf. The length of the message, including the zero, is
|
||||
limited to errbuf_size. The yield of the function is the
|
||||
size of buffer needed to hold the whole message.
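Because the yield is the buffer size needed for the whole message, one common way to avoid truncation is to call regerror() twice: once with a zero-length buffer to learn the size, and once to fetch the text. The sketch below assumes the system <regex.h> as a stand-in for pcreposix.h; error_text() is a hypothetical helper name and the REG_EXTENDED guard is an assumption for portability.

```c
#include <assert.h>
#include <regex.h>   /* stand-in for pcreposix.h (assumption) */
#include <stdlib.h>

#ifndef REG_EXTENDED
#define REG_EXTENDED 0
#endif

/* Hypothetical helper: return a freshly allocated copy of the message for
   errcode, or NULL if memory cannot be obtained. The caller frees it. */
static char *error_text(int errcode, const regex_t *preg)
{
  size_t need;
  char *buf;
  assert(errcode != 0);
  need = regerror(errcode, preg, NULL, 0);   /* size query, nothing written */
  if (need == 0) need = 1;
  buf = malloc(need);
  if (buf != NULL)
    {
    buf[0] = '\0';
    regerror(errcode, preg, buf, need);      /* now fetch the full message */
    }
  return buf;
}
```

A caller would pass the non-zero value returned by regcomp() or regexec(), together with the regex_t that was in use, and free() the result when done.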
|
||||
|
||||
|
||||
|
||||
STORAGE
|
||||
Compiling a regular expression causes memory to be allocated
|
||||
and associated with the preg structure. The function reg-
|
||||
free() frees all such memory, after which preg may no longer
|
||||
be used as a compiled expression.
|
||||
|
||||
|
||||
|
||||
AUTHOR
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
University Computing Service,
|
||||
New Museums Site,
|
||||
Cambridge CB2 3QG, England.
|
||||
Phone: +44 1223 334714
|
||||
|
||||
Copyright (c) 1997-1999 University of Cambridge.
|
@ -1,250 +0,0 @@
|
||||
/*************************************************
|
||||
* Perl-Compatible Regular Expressions *
|
||||
*************************************************/
|
||||
|
||||
/*
|
||||
This is a library of functions to support regular expressions whose syntax
|
||||
and semantics are as close as possible to those of the Perl 5 language. See
|
||||
the file Tech.Notes for some information on the internals.
|
||||
|
||||
This module is a wrapper that provides a POSIX API to the underlying PCRE
|
||||
functions.
|
||||
|
||||
Written by: Philip Hazel <ph10@cam.ac.uk>
|
||||
|
||||
Copyright (c) 1997-1999 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Permission is granted to anyone to use this software for any purpose on any
|
||||
computer system, and to redistribute it freely, subject to the following
|
||||
restrictions:
|
||||
|
||||
1. This software is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
|
||||
2. The origin of this software must not be misrepresented, either by
|
||||
explicit claim or by omission.
|
||||
|
||||
3. Altered versions must be plainly marked as such, and must not be
|
||||
misrepresented as being the original software.
|
||||
|
||||
4. If PCRE is embedded in any software that is released under the GNU
|
||||
General Purpose Licence (GPL), then the terms of that licence shall
|
||||
supersede any condition above with which it is incompatible.
|
||||
-----------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
#include "internal.h"
|
||||
#include "pcreposix.h"
|
||||
#include "stdlib.h"
|
||||
|
||||
|
||||
|
||||
/* Corresponding tables of PCRE error messages and POSIX error codes. */
|
||||
|
||||
static const char *estring[] = {
|
||||
ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9, ERR10,
|
||||
ERR11, ERR12, ERR13, ERR14, ERR15, ERR16, ERR17, ERR18, ERR19, ERR20,
|
||||
ERR21, ERR22, ERR23, ERR24, ERR25 };
|
||||
|
||||
static int eint[] = {
|
||||
REG_EESCAPE, /* "\\ at end of pattern" */
|
||||
REG_EESCAPE, /* "\\c at end of pattern" */
|
||||
REG_EESCAPE, /* "unrecognized character follows \\" */
|
||||
REG_BADBR, /* "numbers out of order in {} quantifier" */
|
||||
REG_BADBR, /* "number too big in {} quantifier" */
|
||||
REG_EBRACK, /* "missing terminating ] for character class" */
|
||||
REG_ECTYPE, /* "invalid escape sequence in character class" */
|
||||
REG_ERANGE, /* "range out of order in character class" */
|
||||
REG_BADRPT, /* "nothing to repeat" */
|
||||
REG_BADRPT, /* "operand of unlimited repeat could match the empty string" */
|
||||
REG_ASSERT, /* "internal error: unexpected repeat" */
|
||||
REG_BADPAT, /* "unrecognized character after (?" */
|
||||
REG_ESIZE, /* "too many capturing parenthesized sub-patterns" */
|
||||
REG_EPAREN, /* "missing )" */
|
||||
REG_ESUBREG, /* "back reference to non-existent subpattern" */
|
||||
REG_INVARG, /* "erroffset passed as NULL" */
|
||||
REG_INVARG, /* "unknown option bit(s) set" */
|
||||
REG_EPAREN, /* "missing ) after comment" */
|
||||
REG_ESIZE, /* "too many sets of parentheses" */
|
||||
REG_ESIZE, /* "regular expression too large" */
|
||||
REG_ESPACE, /* "failed to get memory" */
|
||||
REG_EPAREN, /* "unmatched brackets" */
|
||||
REG_ASSERT, /* "internal error: code overflow" */
|
||||
REG_BADPAT, /* "unrecognized character after (?<" */
|
||||
REG_BADPAT, /* "lookbehind assertion is not fixed length" */
|
||||
REG_BADPAT, /* "malformed number after (?(" */
|
||||
REG_BADPAT, /* "conditional group contains more than two branches" */
|
||||
REG_BADPAT /* "assertion expected after (?(" */
|
||||
};
|
||||
|
||||
/* Table of texts corresponding to POSIX error codes */
|
||||
|
||||
static const char *pstring[] = {
|
||||
"", /* Dummy for value 0 */
|
||||
"internal error", /* REG_ASSERT */
|
||||
"invalid repeat counts in {}", /* BADBR */
|
||||
"pattern error", /* BADPAT */
|
||||
"? * + invalid", /* BADRPT */
|
||||
"unbalanced {}", /* EBRACE */
|
||||
"unbalanced []", /* EBRACK */
|
||||
"collation error - not relevant", /* ECOLLATE */
|
||||
"bad class", /* ECTYPE */
|
||||
"bad escape sequence", /* EESCAPE */
|
||||
"empty expression", /* EMPTY */
|
||||
"unbalanced ()", /* EPAREN */
|
||||
"bad range inside []", /* ERANGE */
|
||||
"expression too big", /* ESIZE */
|
||||
"failed to get memory", /* ESPACE */
|
||||
"bad back reference", /* ESUBREG */
|
||||
"bad argument", /* INVARG */
|
||||
"match failed" /* NOMATCH */
|
||||
};
|
||||
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Translate PCRE text code to int *
|
||||
*************************************************/
|
||||
|
||||
/* PCRE compile-time errors are given as strings defined as macros. We can just
|
||||
look them up in a table to turn them into POSIX-style error codes. */
|
||||
|
||||
static int
|
||||
pcre_posix_error_code(const char *s)
|
||||
{
|
||||
size_t i;
|
||||
for (i = 0; i < sizeof(estring)/sizeof(char *); i++)
|
||||
if (strcmp(s, estring[i]) == 0) return eint[i];
|
||||
return REG_ASSERT;
|
||||
}
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Translate error code to string *
|
||||
*************************************************/
|
||||
|
||||
size_t
|
||||
regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size)
|
||||
{
|
||||
const char *message, *addmessage;
|
||||
size_t length, addlength;
|
||||
|
||||
message = (errcode < 0 || errcode >= (int)(sizeof(pstring)/sizeof(char *)))?
|
||||
"unknown error code" : pstring[errcode];
|
||||
length = strlen(message) + 1;
|
||||
|
||||
addmessage = " at offset ";
|
||||
addlength = (preg != NULL && (int)preg->re_erroffset != -1)?
|
||||
strlen(addmessage) + 6 : 0;
|
||||
|
||||
if (errbuf_size > 0)
|
||||
{
|
||||
if (addlength > 0 && errbuf_size >= length + addlength)
|
||||
sprintf(errbuf, "%s%s%-6d", message, addmessage, (int)preg->re_erroffset);
|
||||
else
|
||||
{
|
||||
strncpy(errbuf, message, errbuf_size - 1);
|
||||
errbuf[errbuf_size-1] = 0;
|
||||
}
|
||||
}
|
||||
|
||||
return length + addlength;
|
||||
}
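The value returned by regerror() above is the space the full message needs, so a caller can size a buffer by first calling it with a zero-length buffer. The following is a minimal sketch of that two-call pattern, not code from PCRE. It assumes the system's POSIX `<regex.h>`; to use the wrapper instead, include "pcreposix.h" and link against the PCRE libraries. `error_text` is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>   /* assumption: system POSIX header rather than "pcreposix.h" */
#include <stdlib.h>

/* Hypothetical helper: return a malloc'd copy of the message for errcode,
   or NULL if allocation fails. The caller frees the result. */
char *error_text(int errcode, const regex_t *preg)
{
    /* First call: with a zero-sized buffer nothing is written, and the
       return value is the space required, including the terminating NUL. */
    size_t needed = regerror(errcode, preg, NULL, 0);
    char *buf = malloc(needed);
    if (buf == NULL) return NULL;

    /* Second call: fill the buffer, which is now known to be large enough. */
    size_t written = regerror(errcode, preg, buf, needed);
    assert(written == needed);
    return buf;
}
```

A caller would pass the non-zero code returned by regcomp() together with the same regex_t, print the text, and then free() it.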
|
||||
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Free store held by a regex *
|
||||
*************************************************/
|
||||
|
||||
void
|
||||
regfree(regex_t *preg)
|
||||
{
|
||||
(pcre_free)(preg->re_pcre);
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Compile a regular expression *
|
||||
*************************************************/
|
||||
|
||||
/*
|
||||
Arguments:
|
||||
preg points to a structure for recording the compiled expression
|
||||
pattern the pattern to compile
|
||||
cflags compilation flags
|
||||
|
||||
Returns: 0 on success
|
||||
various non-zero codes on failure
|
||||
*/
|
||||
|
||||
int
|
||||
regcomp(regex_t *preg, const char *pattern, int cflags)
|
||||
{
|
||||
const char *errorptr;
|
||||
int erroffset;
|
||||
int options = 0;
|
||||
|
||||
if ((cflags & REG_ICASE) != 0) options |= PCRE_CASELESS;
|
||||
if ((cflags & REG_NEWLINE) != 0) options |= PCRE_MULTILINE;
|
||||
|
||||
preg->re_pcre = pcre_compile(pattern, options, &errorptr, &erroffset, NULL);
|
||||
preg->re_erroffset = erroffset;
|
||||
|
||||
if (preg->re_pcre == NULL) return pcre_posix_error_code(errorptr);
|
||||
|
||||
preg->re_nsub = pcre_info(preg->re_pcre, NULL, NULL);
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Match a regular expression *
|
||||
*************************************************/
|
||||
|
||||
int
|
||||
regexec(regex_t *preg, const char *str, size_t nmatch,
|
||||
regmatch_t pmatch[], int eflags)
|
||||
{
|
||||
int rc;
|
||||
int options = 0;
|
||||
|
||||
if ((eflags & REG_NOTBOL) != 0) options |= PCRE_NOTBOL;
|
||||
if ((eflags & REG_NOTEOL) != 0) options |= PCRE_NOTEOL;
|
||||
|
||||
preg->re_erroffset = (size_t)(-1); /* Only has meaning after compile */
|
||||
|
||||
rc = pcre_exec(preg->re_pcre, NULL, str, (int)strlen(str), 0, options,
|
||||
(int *)pmatch, nmatch * 2);
|
||||
|
||||
if (rc == 0) return 0; /* All pmatch were filled in */
|
||||
|
||||
if (rc > 0)
|
||||
{
|
||||
size_t i;
|
||||
for (i = rc; i < nmatch; i++) pmatch[i].rm_so = pmatch[i].rm_eo = -1;
|
||||
return 0;
|
||||
}
|
||||
|
||||
else switch(rc)
|
||||
{
|
||||
case PCRE_ERROR_NOMATCH: return REG_NOMATCH;
|
||||
case PCRE_ERROR_NULL: return REG_INVARG;
|
||||
case PCRE_ERROR_BADOPTION: return REG_INVARG;
|
||||
case PCRE_ERROR_BADMAGIC: return REG_INVARG;
|
||||
case PCRE_ERROR_UNKNOWN_NODE: return REG_ASSERT;
|
||||
case PCRE_ERROR_NOMEMORY: return REG_ESPACE;
|
||||
default: return REG_ASSERT;
|
||||
}
|
||||
}
|
||||
|
||||
/* End of pcreposix.c */
|
@ -1,82 +0,0 @@
|
||||
/*************************************************
|
||||
* Perl-Compatible Regular Expressions *
|
||||
*************************************************/
|
||||
|
||||
/* Copyright (c) 1997-1999 University of Cambridge */
|
||||
|
||||
#ifndef _PCREPOSIX_H
|
||||
#define _PCREPOSIX_H
|
||||
|
||||
/* This is the header for the POSIX wrapper interface to the PCRE Perl-
|
||||
Compatible Regular Expression library. It defines the things POSIX says should
|
||||
be there. I hope. */
|
||||
|
||||
/* Have to include stdlib.h in order to ensure that size_t is defined. */
|
||||
|
||||
#include <stdlib.h>
|
||||
|
||||
/* Allow for C++ users */
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
/* Options defined by POSIX. */
|
||||
|
||||
#define REG_ICASE 0x01
|
||||
#define REG_NEWLINE 0x02
|
||||
#define REG_NOTBOL 0x04
|
||||
#define REG_NOTEOL 0x08
|
||||
|
||||
/* Error values. Not all these are relevant or used by the wrapper. */
|
||||
|
||||
enum {
|
||||
REG_ASSERT = 1, /* internal error ? */
|
||||
REG_BADBR, /* invalid repeat counts in {} */
|
||||
REG_BADPAT, /* pattern error */
|
||||
REG_BADRPT, /* ? * + invalid */
|
||||
REG_EBRACE, /* unbalanced {} */
|
||||
REG_EBRACK, /* unbalanced [] */
|
||||
REG_ECOLLATE, /* collation error - not relevant */
|
||||
REG_ECTYPE, /* bad class */
|
||||
REG_EESCAPE, /* bad escape sequence */
|
||||
REG_EMPTY, /* empty expression */
|
||||
REG_EPAREN, /* unbalanced () */
|
||||
REG_ERANGE, /* bad range inside [] */
|
||||
REG_ESIZE, /* expression too big */
|
||||
REG_ESPACE, /* failed to get memory */
|
||||
REG_ESUBREG, /* bad back reference */
|
||||
REG_INVARG, /* bad argument */
|
||||
REG_NOMATCH /* match failed */
|
||||
};
|
||||
|
||||
|
||||
/* The structure representing a compiled regular expression. */
|
||||
|
||||
typedef struct {
|
||||
void *re_pcre;
|
||||
size_t re_nsub;
|
||||
size_t re_erroffset;
|
||||
} regex_t;
|
||||
|
||||
/* The structure in which a captured offset is returned. */
|
||||
|
||||
typedef int regoff_t;
|
||||
|
||||
typedef struct {
|
||||
regoff_t rm_so;
|
||||
regoff_t rm_eo;
|
||||
} regmatch_t;
|
||||
|
||||
/* The functions */
|
||||
|
||||
extern int regcomp(regex_t *, const char *, int);
|
||||
extern int regexec(regex_t *, const char *, size_t, regmatch_t *, int);
|
||||
extern size_t regerror(int, const regex_t *, char *, size_t);
|
||||
extern void regfree(regex_t *);
|
||||
|
||||
#ifdef __cplusplus
|
||||
} /* extern "C" */
|
||||
#endif
|
||||
|
||||
#endif /* End of pcreposix.h */
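The four functions declared above are normally used in the order compile, match, free. The following is a hedged sketch of that sequence, not code from PCRE. It is written against the system's POSIX `<regex.h>` so that it builds without PCRE; with the wrapper, the include would be "pcreposix.h". `find_first` is a hypothetical name, and the pattern in the usage note is chosen to mean the same thing in POSIX basic syntax and in Perl syntax.

```c
#include <regex.h>   /* assumption: system POSIX header rather than "pcreposix.h" */

/* Hypothetical helper: search subject for pattern, ignoring case. On a match,
   store the offsets of the whole match and return 0; otherwise return the
   non-zero code from regcomp() or regexec(). */
int find_first(const char *pattern, const char *subject, int *start, int *end)
{
    regex_t re;
    regmatch_t m[1];

    int rc = regcomp(&re, pattern, REG_ICASE);
    if (rc != 0) return rc;           /* compile failed; nothing to free */

    rc = regexec(&re, subject, 1, m, 0);
    if (rc == 0) {
        *start = (int)m[0].rm_so;     /* offset of the first matched character */
        *end = (int)m[0].rm_eo;       /* offset just past the match */
    }
    regfree(&re);
    return rc;
}
```

For example, `find_first("b[a-z]*r", "FooBar baz", &s, &e)` returns 0 with `s == 3` and `e == 6`, while a pattern that does not occur in the subject returns REG_NOMATCH.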
|
File diff suppressed because it is too large
@ -1,143 +0,0 @@
|
||||
#! /usr/bin/perl
|
||||
|
||||
# Program for testing regular expressions with perl to check that PCRE handles
|
||||
# them the same.
|
||||
|
||||
|
||||
# Function for turning a string into a string of printing chars
|
||||
|
||||
sub pchars {
|
||||
my($t) = "";
|
||||
|
||||
foreach $c (split(//, $_[0]))
|
||||
{
|
||||
if (ord $c >= 32 && ord $c < 127) { $t .= $c; }
|
||||
else { $t .= sprintf("\\x%02x", ord $c); }
|
||||
}
|
||||
$t;
|
||||
}
|
||||
|
||||
|
||||
|
||||
# Read lines from named file or stdin and write to named file or stdout; lines
|
||||
# consist of a regular expression, in delimiters and optionally followed by
|
||||
# options, followed by a set of test data, terminated by an empty line.
|
||||
|
||||
# Sort out the input and output files
|
||||
|
||||
if (@ARGV > 0)
|
||||
{
|
||||
open(INFILE, "<$ARGV[0]") || die "Failed to open $ARGV[0]\n";
|
||||
$infile = "INFILE";
|
||||
}
|
||||
else { $infile = "STDIN"; }
|
||||
|
||||
if (@ARGV > 1)
|
||||
{
|
||||
open(OUTFILE, ">$ARGV[1]") || die "Failed to open $ARGV[1]\n";
|
||||
$outfile = "OUTFILE";
|
||||
}
|
||||
else { $outfile = "STDOUT"; }
|
||||
|
||||
printf($outfile "Perl $] Regular Expressions\n\n");
|
||||
|
||||
# Main loop
|
||||
|
||||
NEXT_RE:
|
||||
for (;;)
|
||||
{
|
||||
printf " re> " if $infile eq "STDIN";
|
||||
last if ! ($_ = <$infile>);
|
||||
printf $outfile "$_" if $infile ne "STDIN";
|
||||
next if ($_ eq "");
|
||||
|
||||
$pattern = $_;
|
||||
|
||||
$delimiter = substr($_, 0, 1);
|
||||
while ($pattern !~ /^\s*(.).*\1/s)
|
||||
{
|
||||
printf " > " if $infile eq "STDIN";
|
||||
last if ! ($_ = <$infile>);
|
||||
printf $outfile "$_" if $infile ne "STDIN";
|
||||
$pattern .= $_;
|
||||
}
|
||||
|
||||
chomp($pattern);
|
||||
$pattern =~ s/\s+$//;
|
||||
|
||||
# Check that the pattern is valid
|
||||
|
||||
eval "\$_ =~ ${pattern}";
|
||||
if ($@)
|
||||
{
|
||||
printf $outfile "Error: $@";
|
||||
next NEXT_RE;
|
||||
}
|
||||
|
||||
# Read data lines and test them
|
||||
|
||||
for (;;)
|
||||
{
|
||||
printf "data> " if $infile eq "STDIN";
|
||||
last NEXT_RE if ! ($_ = <$infile>);
|
||||
chomp;
|
||||
printf $outfile "$_\n" if $infile ne "STDIN";
|
||||
|
||||
s/\s+$//;
|
||||
s/^\s+//;
|
||||
|
||||
last if ($_ eq "");
|
||||
|
||||
$_ = eval "\"$_\""; # To get escapes processed
|
||||
|
||||
$ok = 0;
|
||||
eval "if (\$_ =~ ${pattern}) {" .
|
||||
"\$z = \$&;" .
|
||||
"\$a = \$1;" .
|
||||
"\$b = \$2;" .
|
||||
"\$c = \$3;" .
|
||||
"\$d = \$4;" .
|
||||
"\$e = \$5;" .
|
||||
"\$f = \$6;" .
|
||||
"\$g = \$7;" .
|
||||
"\$h = \$8;" .
|
||||
"\$i = \$9;" .
|
||||
"\$j = \$10;" .
|
||||
"\$k = \$11;" .
|
||||
"\$l = \$12;" .
|
||||
"\$m = \$13;" .
|
||||
"\$n = \$14;" .
|
||||
"\$o = \$15;" .
|
||||
"\$p = \$16;" .
|
||||
"\$ok = 1; }";
|
||||
|
||||
if ($@)
|
||||
{
|
||||
printf $outfile "Error: $@\n";
|
||||
next NEXT_RE;
|
||||
}
|
||||
elsif (!$ok)
|
||||
{
|
||||
printf $outfile "No match\n";
|
||||
}
|
||||
else
|
||||
{
|
||||
@subs = ($z,$a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l,$m,$n,$o,$p);
|
||||
$last_printed = 0;
|
||||
for ($i = 0; $i <= 17; $i++)
|
||||
{
|
||||
if ($i == 0 || defined $subs[$i])
|
||||
{
|
||||
while ($last_printed++ < $i-1)
|
||||
{ printf $outfile ("%2d: <unset>\n", $last_printed); }
|
||||
printf $outfile ("%2d: %s\n", $i, &pchars($subs[$i]));
|
||||
$last_printed = $i;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
printf $outfile "\n";
|
||||
|
||||
# End
|
@ -1,76 +0,0 @@
|
||||
.TH PGREP 1
|
||||
.SH NAME
|
||||
pgrep - a grep with Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
.B pgrep [-Vchilnsvx] pattern [file] ...
|
||||
|
||||
|
||||
.SH DESCRIPTION
|
||||
\fBpgrep\fR searches files for character patterns, in the same way as other
|
||||
grep commands do, but it uses the PCRE regular expression library to support
|
||||
patterns that are compatible with the regular expressions of Perl 5. See
|
||||
\fBpcre(3)\fR for a full description of syntax and semantics.
|
||||
|
||||
If no files are specified, \fBpgrep\fR reads the standard input. By default,
|
||||
each line that matches the pattern is copied to the standard output, and if
|
||||
there is more than one file, the file name is printed before each line of
|
||||
output. However, there are options that can change how \fBpgrep\fR behaves.
|
||||
|
||||
Lines are limited to BUFSIZ characters. BUFSIZ is defined in \fB<stdio.h>\fR.
|
||||
The newline character is removed from the end of each line before it is matched
|
||||
against the pattern.
|
||||
|
||||
|
||||
.SH OPTIONS
|
||||
.TP 10
|
||||
\fB-V\fR
|
||||
Write the version number of the PCRE library being used to the standard error
|
||||
stream.
|
||||
.TP
|
||||
\fB-c\fR
|
||||
Do not print individual lines; instead just print a count of the number of
|
||||
lines that would otherwise have been printed. If several files are given, a
|
||||
count is printed for each of them.
|
||||
.TP
|
||||
\fB-h\fR
|
||||
Suppress printing of filenames when searching multiple files.
|
||||
.TP
|
||||
\fB-i\fR
|
||||
Ignore upper/lower case distinctions during comparisons.
|
||||
.TP
|
||||
\fB-l\fR
|
||||
Instead of printing lines from the files, just print the names of the files
|
||||
containing lines that would have been printed. Each file name is printed
|
||||
once, on a separate line.
|
||||
.TP
|
||||
\fB-n\fR
|
||||
Precede each line by its line number in the file.
|
||||
.TP
|
||||
\fB-s\fR
|
||||
Work silently, that is, display nothing except error messages.
|
||||
The exit status indicates whether any matches were found.
|
||||
.TP
|
||||
\fB-v\fR
|
||||
Invert the sense of the match, so that lines which do \fInot\fR match the
|
||||
pattern are now the ones that are found.
|
||||
.TP
|
||||
\fB-x\fR
|
||||
Force the pattern to be anchored (it must start matching at the beginning of
|
||||
the line) and in addition, require it to match the entire line. This is
|
||||
equivalent to having ^ and $ characters at the start and end of each
|
||||
alternative branch in the regular expression.
|
||||
|
||||
|
||||
.SH SEE ALSO
|
||||
\fBpcre(3)\fR, Perl 5 documentation
|
||||
|
||||
|
||||
.SH DIAGNOSTICS
|
||||
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
|
||||
for syntax errors or inaccessible files (even if matches were found).
|
||||
|
||||
|
||||
.SH AUTHOR
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
.br
|
||||
Copyright (c) 1997-1999 University of Cambridge.
|
@ -1,105 +0,0 @@
|
||||
<HTML>
|
||||
<HEAD>
|
||||
<TITLE>pgrep specification</TITLE>
|
||||
</HEAD>
|
||||
<body bgcolor="#FFFFFF" text="#00005A">
|
||||
<H1>pgrep specification</H1>
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page in case the
|
||||
conversion went wrong.
|
||||
<UL>
|
||||
<LI><A NAME="TOC1" HREF="#SEC1">NAME</A>
|
||||
<LI><A NAME="TOC2" HREF="#SEC2">SYNOPSIS</A>
|
||||
<LI><A NAME="TOC3" HREF="#SEC3">DESCRIPTION</A>
|
||||
<LI><A NAME="TOC4" HREF="#SEC4">OPTIONS</A>
|
||||
<LI><A NAME="TOC5" HREF="#SEC5">SEE ALSO</A>
|
||||
<LI><A NAME="TOC6" HREF="#SEC6">DIAGNOSTICS</A>
|
||||
<LI><A NAME="TOC7" HREF="#SEC7">AUTHOR</A>
|
||||
</UL>
|
||||
<LI><A NAME="SEC1" HREF="#TOC1">NAME</A>
|
||||
<P>
|
||||
pgrep - a grep with Perl-compatible regular expressions.
|
||||
</P>
|
||||
<LI><A NAME="SEC2" HREF="#TOC1">SYNOPSIS</A>
|
||||
<P>
|
||||
<B>pgrep [-Vchilnsvx] pattern [file] ...</B>
|
||||
</P>
|
||||
<LI><A NAME="SEC3" HREF="#TOC1">DESCRIPTION</A>
|
||||
<P>
|
||||
<B>pgrep</B> searches files for character patterns, in the same way as other
|
||||
grep commands do, but it uses the PCRE regular expression library to support
|
||||
patterns that are compatible with the regular expressions of Perl 5. See
|
||||
<B>pcre(3)</B> for a full description of syntax and semantics.
|
||||
</P>
|
||||
<P>
|
||||
If no files are specified, <B>pgrep</B> reads the standard input. By default,
|
||||
each line that matches the pattern is copied to the standard output, and if
|
||||
there is more than one file, the file name is printed before each line of
|
||||
output. However, there are options that can change how <B>pgrep</B> behaves.
|
||||
</P>
|
||||
<P>
|
||||
Lines are limited to BUFSIZ characters. BUFSIZ is defined in <B>&lt;stdio.h&gt;</B>.
|
||||
The newline character is removed from the end of each line before it is matched
|
||||
against the pattern.
|
||||
</P>
|
||||
<LI><A NAME="SEC4" HREF="#TOC1">OPTIONS</A>
|
||||
<P>
|
||||
<B>-V</B>
|
||||
Write the version number of the PCRE library being used to the standard error
|
||||
stream.
|
||||
</P>
|
||||
<P>
|
||||
<B>-c</B>
|
||||
Do not print individual lines; instead just print a count of the number of
|
||||
lines that would otherwise have been printed. If several files are given, a
|
||||
count is printed for each of them.
|
||||
</P>
|
||||
<P>
|
||||
<B>-h</B>
|
||||
Suppress printing of filenames when searching multiple files.
|
||||
</P>
|
||||
<P>
|
||||
<B>-i</B>
|
||||
Ignore upper/lower case distinctions during comparisons.
|
||||
</P>
|
||||
<P>
|
||||
<B>-l</B>
|
||||
Instead of printing lines from the files, just print the names of the files
|
||||
containing lines that would have been printed. Each file name is printed
|
||||
once, on a separate line.
|
||||
</P>
|
||||
<P>
|
||||
<B>-n</B>
|
||||
Precede each line by its line number in the file.
|
||||
</P>
|
||||
<P>
|
||||
<B>-s</B>
|
||||
Work silently, that is, display nothing except error messages.
|
||||
The exit status indicates whether any matches were found.
|
||||
</P>
|
||||
<P>
|
||||
<B>-v</B>
|
||||
Invert the sense of the match, so that lines which do <I>not</I> match the
|
||||
pattern are now the ones that are found.
|
||||
</P>
|
||||
<P>
|
||||
<B>-x</B>
|
||||
Force the pattern to be anchored (it must start matching at the beginning of
|
||||
the line) and in addition, require it to match the entire line. This is
|
||||
equivalent to having ^ and $ characters at the start and end of each
|
||||
alternative branch in the regular expression.
|
||||
</P>
|
||||
<LI><A NAME="SEC5" HREF="#TOC1">SEE ALSO</A>
|
||||
<P>
|
||||
<B>pcre(3)</B>, Perl 5 documentation
|
||||
</P>
|
||||
<LI><A NAME="SEC6" HREF="#TOC1">DIAGNOSTICS</A>
|
||||
<P>
|
||||
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
|
||||
for syntax errors or inaccessible files (even if matches were found).
|
||||
</P>
|
||||
<LI><A NAME="SEC7" HREF="#TOC1">AUTHOR</A>
|
||||
<P>
|
||||
Philip Hazel &lt;ph10@cam.ac.uk&gt;
|
||||
<BR>
|
||||
Copyright (c) 1997-1999 University of Cambridge.
|
@ -1,86 +0,0 @@
|
||||
NAME
|
||||
pgrep - a grep with Perl-compatible regular expressions.
|
||||
|
||||
|
||||
|
||||
SYNOPSIS
|
||||
pgrep [-Vchilnsvx] pattern [file] ...
|
||||
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
pgrep searches files for character patterns, in the same way
|
||||
as other grep commands do, but it uses the PCRE regular
|
||||
expression library to support patterns that are compatible
|
||||
with the regular expressions of Perl 5. See pcre(3) for a
|
||||
full description of syntax and semantics.
|
||||
|
||||
If no files are specified, pgrep reads the standard input.
|
||||
By default, each line that matches the pattern is copied to
|
||||
the standard output, and if there is more than one file, the
|
||||
file name is printed before each line of output. However,
|
||||
there are options that can change how pgrep behaves.
|
||||
|
||||
Lines are limited to BUFSIZ characters. BUFSIZ is defined in
|
||||
<stdio.h>. The newline character is removed from the end of
|
||||
each line before it is matched against the pattern.
|
||||
|
||||
|
||||
|
||||
OPTIONS
|
||||
-V Write the version number of the PCRE library being
|
||||
used to the standard error stream.
|
||||
|
||||
-c Do not print individual lines; instead just print
|
||||
a count of the number of lines that would other-
|
||||
wise have been printed. If several files are
|
||||
given, a count is printed for each of them.
|
||||
|
||||
-h Suppress printing of filenames when searching mul-
|
||||
tiple files.
|
||||
|
||||
-i Ignore upper/lower case distinctions during com-
|
||||
parisons.
|
||||
|
||||
-l Instead of printing lines from the files, just
|
||||
print the names of the files containing lines that
|
||||
would have been printed. Each file name is printed
|
||||
once, on a separate line.
|
||||
|
||||
-n Precede each line by its line number in the file.
|
||||
|
||||
-s Work silently, that is, display nothing except
|
||||
error messages. The exit status indicates whether
|
||||
any matches were found.
|
||||
|
||||
-v Invert the sense of the match, so that lines which
|
||||
do not match the pattern are now the ones that are
|
||||
found.
|
||||
|
||||
-x Force the pattern to be anchored (it must start
|
||||
matching at the beginning of the line) and in
|
||||
addition, require it to match the entire line.
|
||||
This is equivalent to having ^ and $ characters at
|
||||
the start and end of each alternative branch in
|
||||
the regular expression.
|
||||
|
||||
|
||||
|
||||
SEE ALSO
|
||||
pcre(3), Perl 5 documentation
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
DIAGNOSTICS
|
||||
Exit status is 0 if any matches were found, 1 if no matches
|
||||
were found, and 2 for syntax errors or inaccessible files
|
||||
(even if matches were found).
|
||||
|
||||
|
||||
|
||||
AUTHOR
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
Copyright (c) 1997-1999 University of Cambridge.
|
||||
|
@ -1,225 +0,0 @@
|
||||
/*************************************************
|
||||
* PCRE grep program *
|
||||
*************************************************/
|
||||
|
||||
#include <stdio.h>
|
||||
#include <string.h>
|
||||
#include <stdlib.h>
|
||||
#include <errno.h>
|
||||
#include "pcre.h"
|
||||
|
||||
|
||||
#define FALSE 0
|
||||
#define TRUE 1
|
||||
|
||||
typedef int BOOL;
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Global variables *
|
||||
*************************************************/
|
||||
|
||||
static pcre *pattern;
|
||||
static pcre_extra *hints;
|
||||
|
||||
static BOOL count_only = FALSE;
|
||||
static BOOL filenames_only = FALSE;
|
||||
static BOOL invert = FALSE;
|
||||
static BOOL number = FALSE;
|
||||
static BOOL silent = FALSE;
|
||||
static BOOL whole_lines = FALSE;
|
||||
|
||||
|
||||
|
||||
#ifdef STRERROR_FROM_ERRLIST
|
||||
/*************************************************
|
||||
* Provide strerror() for non-ANSI libraries *
|
||||
*************************************************/
|
||||
|
||||
/* Some old-fashioned systems still around (e.g. SunOS4) don't have strerror()
|
||||
in their libraries, but can provide the same facility by this simple
|
||||
alternative function. */
|
||||
|
||||
extern int sys_nerr;
|
||||
extern char *sys_errlist[];
|
||||
|
||||
char *
|
||||
strerror(int n)
|
||||
{
|
||||
if (n < 0 || n >= sys_nerr) return "unknown error number";
|
||||
return sys_errlist[n];
|
||||
}
|
||||
#endif /* STRERROR_FROM_ERRLIST */
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Grep an individual file *
|
||||
*************************************************/
|
||||
|
||||
static int
|
||||
pgrep(FILE *in, char *name)
|
||||
{
|
||||
int rc = 1;
|
||||
int linenumber = 0;
|
||||
int count = 0;
|
||||
int offsets[99];
|
||||
char buffer[BUFSIZ];
|
||||
|
||||
while (fgets(buffer, sizeof(buffer), in) != NULL)
|
||||
{
|
||||
BOOL match;
|
||||
int length = (int)strlen(buffer);
|
||||
if (length > 0 && buffer[length-1] == '\n') buffer[--length] = 0;
|
||||
linenumber++;
|
||||
|
||||
match = pcre_exec(pattern, hints, buffer, length, 0, 0, offsets, 99) >= 0;
|
||||
if (match && whole_lines && offsets[1] != length) match = FALSE;
|
||||
|
||||
if (match != invert)
|
||||
{
|
||||
if (count_only) count++;
|
||||
|
||||
else if (filenames_only)
|
||||
{
|
||||
fprintf(stdout, "%s\n", (name == NULL)? "<stdin>" : name);
|
||||
return 0;
|
||||
}
|
||||
|
||||
else if (silent) return 0;
|
||||
|
||||
else
|
||||
{
|
||||
if (name != NULL) fprintf(stdout, "%s:", name);
|
||||
if (number) fprintf(stdout, "%d:", linenumber);
|
||||
fprintf(stdout, "%s\n", buffer);
|
||||
}
|
||||
|
||||
rc = 0;
|
||||
}
|
||||
}
|
||||
|
||||
if (count_only)
|
||||
{
|
||||
if (name != NULL) fprintf(stdout, "%s:", name);
|
||||
fprintf(stdout, "%d\n", count);
|
||||
}
|
||||
|
||||
return rc;
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Usage function *
|
||||
*************************************************/
|
||||
|
||||
static int
|
||||
usage(int rc)
|
||||
{
|
||||
fprintf(stderr, "Usage: pgrep [-Vchilnsvx] pattern [file] ...\n");
|
||||
return rc;
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Main program *
|
||||
*************************************************/
|
||||
|
||||
int
|
||||
main(int argc, char **argv)
|
||||
{
|
||||
int i;
|
||||
int rc = 1;
|
||||
int options = 0;
|
||||
int errptr;
|
||||
const char *error;
|
||||
BOOL filenames = TRUE;
|
||||
|
||||
/* Process the options */
|
||||
|
||||
for (i = 1; i < argc; i++)
|
||||
{
|
||||
char *s;
|
||||
if (argv[i][0] != '-') break;
|
||||
s = argv[i] + 1;
|
||||
while (*s != 0)
|
||||
{
|
||||
switch (*s++)
|
||||
{
|
||||
case 'c': count_only = TRUE; break;
|
||||
case 'h': filenames = FALSE; break;
|
||||
case 'i': options |= PCRE_CASELESS; break;
|
||||
case 'l': filenames_only = TRUE; break;
|
||||
case 'n': number = TRUE; break;
|
||||
case 's': silent = TRUE; break;
|
||||
case 'v': invert = TRUE; break;
|
||||
case 'x': whole_lines = TRUE; options |= PCRE_ANCHORED; break;
|
||||
|
||||
case 'V':
|
||||
fprintf(stderr, "PCRE version %s\n", pcre_version());
|
||||
break;
|
||||
|
||||
default:
|
||||
fprintf(stderr, "pgrep: unknown option %c\n", s[-1]);
|
||||
return usage(2);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/* There must be at least a regexp argument */
|
||||
|
||||
if (i >= argc) return usage(0);
|
||||
|
||||
/* Compile the regular expression. */
|
||||
|
||||
pattern = pcre_compile(argv[i++], options, &error, &errptr, NULL);
|
||||
if (pattern == NULL)
|
||||
{
|
||||
fprintf(stderr, "pgrep: error in regex at offset %d: %s\n", errptr, error);
|
||||
return 2;
|
||||
}
|
||||
|
||||
/* Study the regular expression, as we will be running it many times */
|
||||
|
||||
hints = pcre_study(pattern, 0, &error);
|
||||
if (error != NULL)
|
||||
{
|
||||
fprintf(stderr, "pgrep: error while studying regex: %s\n", error);
|
||||
return 2;
|
||||
}
|
||||
|
||||
/* If there are no further arguments, do the business on stdin and exit */
|
||||
|
||||
if (i >= argc) return pgrep(stdin, NULL);
|
||||
|
||||
/* Otherwise, work through the remaining arguments as files. If there is only
|
||||
one, don't give its name on the output. */
|
||||
|
||||
if (i == argc - 1) filenames = FALSE;
|
||||
if (filenames_only) filenames = TRUE;
|
||||
|
||||
for (; i < argc; i++)
|
||||
{
|
||||
FILE *in = fopen(argv[i], "r");
|
||||
if (in == NULL)
|
||||
{
|
||||
fprintf(stderr, "%s: failed to open: %s\n", argv[i], strerror(errno));
|
||||
rc = 2;
|
||||
}
|
||||
else
|
||||
{
|
||||
int frc = pgrep(in, filenames? argv[i] : NULL);
|
||||
if (frc == 0 && rc == 1) rc = 0;
|
||||
fclose(in);
|
||||
}
|
||||
}
|
||||
|
||||
return rc;
|
||||
}
|
||||
|
||||
/* End */
|
@ -1,397 +0,0 @@
|
||||
/*************************************************
|
||||
* Perl-Compatible Regular Expressions *
|
||||
*************************************************/
|
||||
|
||||
/*
|
||||
This is a library of functions to support regular expressions whose syntax
|
||||
and semantics are as close as possible to those of the Perl 5 language. See
|
||||
the file Tech.Notes for some information on the internals.
|
||||
|
||||
Written by: Philip Hazel <ph10@cam.ac.uk>
|
||||
|
||||
Copyright (c) 1997-1999 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Permission is granted to anyone to use this software for any purpose on any
|
||||
computer system, and to redistribute it freely, subject to the following
|
||||
restrictions:
|
||||
|
||||
1. This software is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
|
||||
2. The origin of this software must not be misrepresented, either by
|
||||
explicit claim or by omission.
|
||||
|
||||
3. Altered versions must be plainly marked as such, and must not be
|
||||
misrepresented as being the original software.
|
||||
|
||||
4. If PCRE is embedded in any software that is released under the GNU
|
||||
General Purpose Licence (GPL), then the terms of that licence shall
|
||||
supersede any condition above with which it is incompatible.
|
||||
-----------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
|
||||
/* Include the internals header, which itself includes Standard C headers plus
|
||||
the external pcre header. */
|
||||
|
||||
#include "internal.h"
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Set a bit and maybe its alternate case *
|
||||
*************************************************/
|
||||
|
||||
/* Given a character, set its bit in the table, and also the bit for the other
|
||||
version of a letter if we are caseless.
|
||||
|
||||
Arguments:
|
||||
start_bits points to the bit map
|
||||
c is the character
|
||||
caseless the caseless flag
|
||||
cd the block with char table pointers
|
||||
|
||||
Returns: nothing
|
||||
*/
|
||||
|
||||
static void
|
||||
set_bit(uschar *start_bits, int c, BOOL caseless, compile_data *cd)
|
||||
{
|
||||
start_bits[c/8] |= (1 << (c&7));
|
||||
if (caseless && (cd->ctypes[c] & ctype_letter) != 0)
|
||||
start_bits[cd->fcc[c]/8] |= (1 << (cd->fcc[c]&7));
|
||||
}
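The bit map used by set_bit() holds one bit per character value, packed eight to a byte (32 bytes for 256 values). The following self-contained sketch shows the same indexing outside the library. It makes two assumptions: `toupper()` and `tolower()` from `<ctype.h>` stand in for the `cd->fcc` case-flipping table, and `isalpha()` stands in for the `ctype_letter` test. The `_sketch` names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

typedef unsigned char uschar;   /* same spelling as the source, defined here so the sketch stands alone */

/* Set the bit for character c in a 256-bit map, and also the bit for its
   other case when caseless is non-zero. */
void set_bit_sketch(uschar map[32], int c, int caseless)
{
    assert(c >= 0 && c < 256);
    map[c / 8] |= (uschar)(1 << (c & 7));     /* byte c/8, bit c%8 */
    if (caseless && isalpha(c)) {
        /* Assumption: the C library's case mapping replaces cd->fcc. */
        int other = islower(c) ? toupper(c) : tolower(c);
        map[other / 8] |= (uschar)(1 << (other & 7));
    }
}

/* Return non-zero if the bit for character c is set. */
int test_bit_sketch(const uschar map[32], int c)
{
    return (map[c / 8] & (1 << (c & 7))) != 0;
}
```

Starting from a zeroed map, `set_bit_sketch(map, 'a', 1)` sets the bits for both 'a' and 'A', so a later scan can reject any subject position whose character has no bit set.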
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Create bitmap of starting chars *
|
||||
*************************************************/
|
||||
|
||||
/* This function scans a compiled unanchored expression and attempts to build a
|
||||
bitmap of the set of initial characters. If it can't, it returns FALSE. As time
|
||||
goes by, we may be able to get more clever at doing this.
|
||||
|
||||
Arguments:
|
||||
code points to an expression
|
||||
start_bits points to a 32-byte table, initialized to 0
|
||||
caseless the current state of the caseless flag
|
||||
cd the block with char table pointers
|
||||
|
||||
Returns: TRUE if table built, FALSE otherwise
|
||||
*/
|
||||
|
||||
static BOOL
set_start_bits(const uschar *code, uschar *start_bits, BOOL caseless,
  compile_data *cd)
{
register int c;

/* This next statement and the later reference to dummy are here in order to
trick the optimizer of the IBM C compiler for OS/2 into generating correct
code. Apparently IBM isn't going to fix the problem, and we would rather not
disable optimization (in this module it actually makes a big difference, and
the pcre module can use all the optimization it can get). */

volatile int dummy;

do
  {
  const uschar *tcode = code + 3;
  BOOL try_next = TRUE;

  while (try_next)
    {
    try_next = FALSE;

    /* If a branch starts with a bracket or a positive lookahead assertion,
    recurse to set bits from within them. That's all for this branch. */

    if ((int)*tcode >= OP_BRA || *tcode == OP_ASSERT)
      {
      if (!set_start_bits(tcode, start_bits, caseless, cd))
        return FALSE;
      }

    else switch(*tcode)
      {
      default:
      return FALSE;

      /* Skip over lookbehind and negative lookahead assertions */

      case OP_ASSERT_NOT:
      case OP_ASSERTBACK:
      case OP_ASSERTBACK_NOT:
      try_next = TRUE;
      do tcode += (tcode[1] << 8) + tcode[2]; while (*tcode == OP_ALT);
      tcode += 3;
      break;

      /* Skip over an option setting, changing the caseless flag */

      case OP_OPT:
      caseless = (tcode[1] & PCRE_CASELESS) != 0;
      tcode += 2;
      try_next = TRUE;
      break;

      /* BRAZERO does the bracket, but carries on. */

      case OP_BRAZERO:
      case OP_BRAMINZERO:
      if (!set_start_bits(++tcode, start_bits, caseless, cd))
        return FALSE;
      dummy = 1;
      do tcode += (tcode[1] << 8) + tcode[2]; while (*tcode == OP_ALT);
      tcode += 3;
      try_next = TRUE;
      break;

      /* Single-char * or ? sets the bit and tries the next item */

      case OP_STAR:
      case OP_MINSTAR:
      case OP_QUERY:
      case OP_MINQUERY:
      set_bit(start_bits, tcode[1], caseless, cd);
      tcode += 2;
      try_next = TRUE;
      break;

      /* Single-char upto sets the bit and tries the next */

      case OP_UPTO:
      case OP_MINUPTO:
      set_bit(start_bits, tcode[3], caseless, cd);
      tcode += 4;
      try_next = TRUE;
      break;

      /* At least one single char sets the bit and stops */

      case OP_EXACT:       /* Fall through */
      tcode++;

      case OP_CHARS:       /* Fall through */
      tcode++;

      case OP_PLUS:
      case OP_MINPLUS:
      set_bit(start_bits, tcode[1], caseless, cd);
      break;

      /* Single character type sets the bits and stops */

      case OP_NOT_DIGIT:
      for (c = 0; c < 32; c++)
        start_bits[c] |= ~cd->cbits[c+cbit_digit];
      break;

      case OP_DIGIT:
      for (c = 0; c < 32; c++)
        start_bits[c] |= cd->cbits[c+cbit_digit];
      break;

      case OP_NOT_WHITESPACE:
      for (c = 0; c < 32; c++)
        start_bits[c] |= ~cd->cbits[c+cbit_space];
      break;

      case OP_WHITESPACE:
      for (c = 0; c < 32; c++)
        start_bits[c] |= cd->cbits[c+cbit_space];
      break;

      case OP_NOT_WORDCHAR:
      for (c = 0; c < 32; c++)
        start_bits[c] |= ~(cd->cbits[c] | cd->cbits[c+cbit_word]);
      break;

      case OP_WORDCHAR:
      for (c = 0; c < 32; c++)
        start_bits[c] |= (cd->cbits[c] | cd->cbits[c+cbit_word]);
      break;

      /* One or more character type fudges the pointer and restarts, knowing
      it will hit a single character type and stop there. */

      case OP_TYPEPLUS:
      case OP_TYPEMINPLUS:
      tcode++;
      try_next = TRUE;
      break;

      case OP_TYPEEXACT:
      tcode += 3;
      try_next = TRUE;
      break;

      /* Zero or more repeats of character types set the bits and then
      try again. */

      case OP_TYPEUPTO:
      case OP_TYPEMINUPTO:
      tcode += 2;               /* Fall through */

      case OP_TYPESTAR:
      case OP_TYPEMINSTAR:
      case OP_TYPEQUERY:
      case OP_TYPEMINQUERY:
      switch(tcode[1])
        {
        case OP_NOT_DIGIT:
        for (c = 0; c < 32; c++)
          start_bits[c] |= ~cd->cbits[c+cbit_digit];
        break;

        case OP_DIGIT:
        for (c = 0; c < 32; c++)
          start_bits[c] |= cd->cbits[c+cbit_digit];
        break;

        case OP_NOT_WHITESPACE:
        for (c = 0; c < 32; c++)
          start_bits[c] |= ~cd->cbits[c+cbit_space];
        break;

        case OP_WHITESPACE:
        for (c = 0; c < 32; c++)
          start_bits[c] |= cd->cbits[c+cbit_space];
        break;

        case OP_NOT_WORDCHAR:
        for (c = 0; c < 32; c++)
          start_bits[c] |= ~(cd->cbits[c] | cd->cbits[c+cbit_word]);
        break;

        case OP_WORDCHAR:
        for (c = 0; c < 32; c++)
          start_bits[c] |= (cd->cbits[c] | cd->cbits[c+cbit_word]);
        break;
        }

      tcode += 2;
      try_next = TRUE;
      break;

      /* Character class: set the bits and either carry on or not,
      according to the repeat count. */

      case OP_CLASS:
        {
        tcode++;
        for (c = 0; c < 32; c++) start_bits[c] |= tcode[c];
        tcode += 32;
        switch (*tcode)
          {
          case OP_CRSTAR:
          case OP_CRMINSTAR:
          case OP_CRQUERY:
          case OP_CRMINQUERY:
          tcode++;
          try_next = TRUE;
          break;

          case OP_CRRANGE:
          case OP_CRMINRANGE:
          if (((tcode[1] << 8) + tcode[2]) == 0)
            {
            tcode += 5;
            try_next = TRUE;
            }
          break;
          }
        }
      break; /* End of class handling */

      }     /* End of switch */
    }       /* End of try_next loop */

  code += (code[1] << 8) + code[2];   /* Advance to next branch */
  }
while (*code == OP_ALT);
return TRUE;
}



/*************************************************
*          Study a compiled expression           *
*************************************************/

/* This function is handed a compiled expression that it must study to produce
information that will speed up the matching. It returns a pcre_extra block
which then gets handed back to pcre_exec().

Arguments:
  re        points to the compiled expression
  options   contains option bits
  errorptr  points to where to place error messages;
            set NULL unless error

Returns:    pointer to a pcre_extra block,
            NULL on error or if no optimization possible
*/

pcre_extra *
pcre_study(const pcre *external_re, int options, const char **errorptr)
{
uschar start_bits[32];
real_pcre_extra *extra;
const real_pcre *re = (const real_pcre *)external_re;
compile_data compile_block;

*errorptr = NULL;

if (re == NULL || re->magic_number != MAGIC_NUMBER)
  {
  *errorptr = "argument is not a compiled regular expression";
  return NULL;
  }

if ((options & ~PUBLIC_STUDY_OPTIONS) != 0)
  {
  *errorptr = "unknown or incorrect option bit(s) set";
  return NULL;
  }

/* For an anchored pattern, or an unanchored pattern that has a first char, or
a multiline pattern that matches only at "line starts", no further processing
is done at present. */

if ((re->options & (PCRE_ANCHORED|PCRE_FIRSTSET|PCRE_STARTLINE)) != 0)
  return NULL;

/* Set the character tables in the block which is passed around */

compile_block.lcc = re->tables + lcc_offset;
compile_block.fcc = re->tables + fcc_offset;
compile_block.cbits = re->tables + cbits_offset;
compile_block.ctypes = re->tables + ctypes_offset;

/* See if we can find a fixed set of initial characters for the pattern. */

memset(start_bits, 0, 32 * sizeof(uschar));
if (!set_start_bits(re->code, start_bits, (re->options & PCRE_CASELESS) != 0,
  &compile_block)) return NULL;

/* Get an "extra" block and put the information therein. */

extra = (real_pcre_extra *)(pcre_malloc)(sizeof(real_pcre_extra));

if (extra == NULL)
  {
  *errorptr = "failed to get memory";
  return NULL;
  }

extra->options = PCRE_STUDY_MAPPED;
memcpy(extra->start_bits, start_bits, sizeof(start_bits));

return (pcre_extra *)extra;
}

/* End of study.c */
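
pcre_study() only builds the start_bits map; the speed-up comes when a matcher consults it before attempting a match at each position. The following is a hedged sketch of how such a consumer might skip subject positions that cannot begin a match. The function name sketch_skip_to_start and its signature are hypothetical and are not part of the PCRE API.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char uschar;

/* Return the first offset at or after `start` whose character has its bit
   set in the 32-byte map, or `length` if there is no such character.
   A matcher would attempt a full match only from the returned offset. */
static size_t sketch_skip_to_start(const uschar *subject, size_t length,
  size_t start, const uschar *start_bits)
{
  size_t i = start;
  assert(subject != NULL && start_bits != NULL && start <= length);
  while (i < length)
    {
    unsigned int c = subject[i];
    if ((start_bits[c >> 3] & (1 << (c & 7))) != 0) break;
    i++;
    }
  return i;
}
```

For a pattern such as /cat|dog|elephant/ the map would hold only c, d and e, so every other subject character is passed over with one table lookup instead of a full match attempt.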
@ -1,589 +0,0 @@
/(a)b|/

/abc/
    abc
    defabc
    \Aabc
    *** Failers
    \Adefabc
    ABC

/^abc/
    abc
    \Aabc
    *** Failers
    defabc
    \Adefabc

/a+bc/

/a*bc/

/a{3}bc/

/(abc|a+z)/

/^abc$/
    abc
    *** Failers
    def\nabc

/ab\gdef/X

/(?X)ab\gdef/X

/x{5,4}/

/z{65536}/

/[abcd/

/[\B]/

/[a-\w]/

/[z-a]/

/^*/

/(abc/

/(?# abc/

/(?z)abc/

/.*b/

/.*?b/

/cat|dog|elephant/
    this sentence eventually mentions a cat
    this sentences rambles on and on for a while and then reaches elephant

/cat|dog|elephant/S
    this sentence eventually mentions a cat
    this sentences rambles on and on for a while and then reaches elephant

/cat|dog|elephant/iS
    this sentence eventually mentions a CAT cat
    this sentences rambles on and on for a while to elephant ElePhant

/a|[bcd]/S

/(a|[^\dZ])/S

/(a|b)*[\s]/S

/(ab\2)/

/{4,5}abc/

/(a)(b)(c)\2/
    abcb
    \O0abcb
    \O3abcb
    \O6abcb
    \O9abcb
    \O12abcb

/(a)bc|(a)(b)\2/
    abc
    \O0abc
    \O3abc
    \O6abc
    aba
    \O0aba
    \O3aba
    \O6aba
    \O9aba
    \O12aba

/abc$/E
    abc
    *** Failers
    abc\n
    abc\ndef

/(a)(b)(c)(d)(e)\6/

/the quick brown fox/
    the quick brown fox
    this is a line with the quick brown fox

/the quick brown fox/A
    the quick brown fox
    *** Failers
    this is a line with the quick brown fox

/ab(?z)cd/

/^abc|def/
    abcdef
    abcdef\B

/.*((abc)$|(def))/
    defabc
    \Zdefabc

/abc/P
    abc
    *** Failers

/^abc|def/P
    abcdef
    abcdef\B

/.*((abc)$|(def))/P
    defabc
    \Zdefabc

/the quick brown fox/P
    the quick brown fox
    *** Failers
    The Quick Brown Fox

/the quick brown fox/Pi
    the quick brown fox
    The Quick Brown Fox

/abc.def/P
    *** Failers
    abc\ndef

/abc$/P
    abc
    abc\n

/(abc)\2/P

/(abc\1)/P
    abc

/)/

/a[]b/

/[^aeiou ]{3,}/
    co-processors, and for

/<.*>/
    abc<def>ghi<klm>nop

/<.*?>/
    abc<def>ghi<klm>nop

/<.*>/U
    abc<def>ghi<klm>nop

/<.*>(?U)/
    abc<def>ghi<klm>nop

/<.*?>/U
    abc<def>ghi<klm>nop

/={3,}/U
    abc========def

/(?U)={3,}?/
    abc========def

/(?<!bar|cattle)foo/
    foo
    catfoo
    *** Failers
    the barfoo
    and cattlefoo

/(?<=a+)b/

/(?<=aaa|b{0,3})b/

/(?<!(foo)a\1)bar/

/(?i)abc/

/(a|(?m)a)/

/(?i)^1234/

/(^b|(?i)^d)/

/(?s).*/

/[abcd]/S

/(?i)[abcd]/S

/(?m)[xy]|(b|c)/S

/(^a|^b)/m

/(?i)(^a|^b)/m

/(a)(?(1)a|b|c)/

/(?(?=a)a|b|c)/

/(?(1a)/

/(?(?i))/

/(?(abc))/

/(?(?<ab))/

/((?s)blah)\s+\1/

/((?i)blah)\s+\1/

/((?i)b)/DS

/(a*b|(?i:c*(?-i)d))/S

/a$/
    a
    a\n
    *** Failers
    \Za
    \Za\n

/a$/m
    a
    a\n
    \Za\n
    *** Failers
    \Za

/\Aabc/m

/^abc/m

/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/
    aaaaabbbbbcccccdef

/(?<=foo)[ab]/S

/(?<!foo)(alpha|omega)/S

/(?!alphabet)[ab]/S

/(?<=foo\n)^bar/m

/(?>^abc)/m
    abc
    def\nabc
    *** Failers
    defabc

/(?<=ab(c+)d)ef/

/(?<=ab(?<=c+)d)ef/

/(?<=ab(c|de)f)g/

/The next three are in testinput2 because they have variable length branches/

/(?<=bullock|donkey)-cart/
    the bullock-cart
    a donkey-cart race
    *** Failers
    cart
    horse-and-cart

/(?<=ab(?i)x|y|z)/

/(?>.*)(?<=(abcd)|(xyz))/
    alphabetabcd
    endingxyz

/(?<=ab(?i)x(?-i)y|(?i)z|b)ZZ/
    abxyZZ
    abXyZZ
    ZZZ
    zZZ
    bZZ
    BZZ
    *** Failers
    ZZ
    abXYZZ
    zzz
    bzz

/(?<!(foo)a)bar/
    bar
    foobbar
    *** Failers
    fooabar

/This one is here because Perl 5.005_02 doesn't fail it/

/^(a)?(?(1)a|b)+$/
    *** Failers
    a

/This one is here because I think Perl 5.005_02 gets the setting of $1 wrong/

/^(a\1?){4}$/
    aaaaaa

/These are syntax tests from Perl 5.005/

/a[b-a]/

/a[]b/

/a[/

/*a/

/(*)b/

/abc)/

/(abc/

/a**/

/)(/

/\1/

/\2/

/(a)|\2/

/a[b-a]/i

/a[]b/i

/a[/i

/*a/i

/(*)b/i

/abc)/i

/(abc/i

/a**/i

/)(/i

/:(?:/

/(?<%)b/

/a(?{)b/

/a(?{{})b/

/a(?{}})b/

/a(?{"{"})b/

/a(?{"{"}})b/

/(?(1?)a|b)/

/(?(1)a|b|c)/

/[a[:xyz:/

/(?<=x+)y/

/a{37,17}/

/abc/\

/abc/\P

/abc/\i

/(a)bc(d)/
    abcd
    abcd\C2
    abcd\C5

/(.{20})/
    abcdefghijklmnopqrstuvwxyz
    abcdefghijklmnopqrstuvwxyz\C1
    abcdefghijklmnopqrstuvwxyz\G1

/(.{15})/
    abcdefghijklmnopqrstuvwxyz
    abcdefghijklmnopqrstuvwxyz\C1\G1

/(.{16})/
    abcdefghijklmnopqrstuvwxyz
    abcdefghijklmnopqrstuvwxyz\C1\G1\L

/^(a|(bc))de(f)/
    adef\G1\G2\G3\G4\L
    bcdef\G1\G2\G3\G4\L
    adefghijk\C0

/^abc\00def/
    abc\00def\L\C0

/word ((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+
)((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+
)?)?)?)?)?)?)?)?)?otherword/M

/.*X/D

/.*X/Ds

/(.*X|^B)/D

/(.*X|^B)/Ds

/(?s)(.*X|^B)/D

/(?s:.*X|^B)/D

/\Biss\B/+
    Mississippi

/\Biss\B/+P
    Mississippi

/iss/G+
    Mississippi

/\Biss\B/G+
    Mississippi

/\Biss\B/g+
    Mississippi
    *** Failers
    Mississippi\A

/(?<=[Ms])iss/g+
    Mississippi

/(?<=[Ms])iss/G+
    Mississippi

/^iss/g+
    ississippi

/.*iss/g+
    abciss\nxyzisspqr

/.i./+g
    Mississippi
    Mississippi\A
    Missouri river
    Missouri river\A

/^.is/+g
    Mississippi

/^ab\n/g+
    ab\nab\ncd

/^ab\n/mg+
    ab\nab\ncd

/abc/

/abc|bac/

/(abc|bac)/

/(abc|(c|dc))/

/(abc|(d|de)c)/

/a*/

/a+/

/(baa|a+)/

/a{0,3}/

/baa{3,}/

/"([^\\"]+|\\.)*"/

/(abc|ab[cd])/

/(a|.)/

/a|ba|\w/

/abc(?=pqr)/

/...(?<=abc)/

/abc(?!pqr)/

/ab./

/ab[xyz]/

/abc*/

/ab.c*/

/a.c*/

/.c*/

/ac*/

/(a.c*|b.c*)/

/a.c*|aba/

/.+a/

/(?=abcda)a.*/

/(?=a)a.*/

/a(b)*/

/a\d*/

/ab\d*/

/a(\d)*/

/abcde{0,0}/

/ab\d+/

/a(?(1)b)/

/a(?(1)bag|big)/

/a(?(1)bag|big)*/

/a(?(1)bag|big)+/

/a(?(1)b..|b..)/

/ab\d{0}e/

/a?b?/
    a
    b
    ab
    \
    *** Failers
    \N

/|-/
    abcd
    -abc
    \Nab-c
    *** Failers
    \Nabc

/.*?/g+
    abc

/ End of test input /
@ -1,64 +0,0 @@
/^[\w]+/
    *** Failers
    École

/^[\w]+/Lfr
    École

/^[\w]+/
    *** Failers
    École

/^[\W]+/
    École

/^[\W]+/Lfr
    *** Failers
    École

/[\b]/
    \b
    *** Failers
    a

/[\b]/Lfr
    \b
    *** Failers
    a

/^\w+/
    *** Failers
    École

/^\w+/Lfr
    École

/(.+)\b(.+)/
    École

/(.+)\b(.+)/Lfr
    *** Failers
    École

/École/i
    École
    *** Failers
    école

/École/iLfr
    École
    école

/\w/IS

/\w/ISLfr

/^[\xc8-\xc9]/iLfr
    École
    école

/^[\xc8-\xc9]/Lfr
    École
    *** Failers
    école
@ -1,115 +0,0 @@
PCRE version 2.08 31-Aug-1999

/^[\w]+/
    *** Failers
No match
    École
No match

/^[\w]+/Lfr
    École
 0: École

/^[\w]+/
    *** Failers
No match
    École
No match

/^[\W]+/
    École
 0: \xc9

/^[\W]+/Lfr
    *** Failers
 0: ***
    École
No match

/[\b]/
    \b
 0: \x08
    *** Failers
No match
    a
No match

/[\b]/Lfr
    \b
 0: \x08
    *** Failers
No match
    a
No match

/^\w+/
    *** Failers
No match
    École
No match

/^\w+/Lfr
    École
 0: École

/(.+)\b(.+)/
    École
 0: \xc9cole
 1: \xc9
 2: cole

/(.+)\b(.+)/Lfr
    *** Failers
 0: *** Failers
 1: ***
 2: Failers
    École
No match

/École/i
    École
 0: \xc9cole
    *** Failers
No match
    école
No match

/École/iLfr
    École
 0: École
    école
 0: école

/\w/IS
Identifying subpattern count = 0
No options
No first char
No req char
Starting character set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
  Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z

/\w/ISLfr
Identifying subpattern count = 0
No options
No first char
No req char
Starting character set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
  Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
  À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å
  æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ

/^[\xc8-\xc9]/iLfr
    École
 0: É
    école
 0: é

/^[\xc8-\xc9]/Lfr
    École
 0: É
    *** Failers
No match
    école
No match