*** empty log message ***

This commit is contained in:
Andrei Zmievski 2000-04-11 17:35:42 +00:00
parent 97c9603b02
commit beb6916cfc
38 changed files with 0 additions and 26748 deletions

@ -1,519 +0,0 @@
ChangeLog for PCRE
------------------
Version 2.08 31-Aug-99
----------------------
1. When startoffset was not zero and the pattern began with ".*", PCRE was not
trying to match at the startoffset position, but instead was moving forward to
the next newline as if a previous match had failed.
2. pcretest was not making use of PCRE_NOTEMPTY when repeating for /g and /G,
and could get into a loop if a null string was matched other than at the start
of the subject.
3. Added definitions of PCRE_MAJOR and PCRE_MINOR to pcre.h so the version can
be distinguished at compile time, and for completeness also added PCRE_DATE.
5. Added Paul Sokolovsky's minor changes to make it easy to compile a Win32 DLL
in GnuWin32 environments.
Version 2.07 29-Jul-99
----------------------
1. The documentation is now supplied in plain text form and HTML as well as in
the form of man page sources.
2. C++ compilers don't like assigning (void *) values to other pointer types.
In particular this affects malloc(). Although there is no problem in Standard
C, I've put in casts to keep C++ compilers happy.
3. Typo in pcretest.c; a cast of (unsigned char *) in the POSIX regexec() call
should be (const char *).
4. If NOPOSIX is defined, pcretest.c compiles without POSIX support. This may
be useful for non-Unix systems that don't want to bother with the POSIX stuff.
However, I haven't made this a standard facility. The documentation doesn't
mention it, and the Makefile doesn't support it.
5. The Makefile now contains an "install" target, with editable destinations at
the top of the file. The pcretest program is not installed.
6. pgrep -V now gives the PCRE version number and date.
7. Fixed bug: a zero repetition after a literal string (e.g. /abcde{0}/) was
causing the entire string to be ignored, instead of just the last character.
8. If a pattern like /"([^\\"]+|\\.)*"/ is applied in the normal way to a
non-matching string, it can take a very, very long time, even for strings of
quite modest length, because of the nested recursion. PCRE now does better in
some of these cases. It does this by remembering the last required literal
character in the pattern, and pre-searching the subject to ensure it is present
before running the real match. In other words, it applies a heuristic to detect
some types of certain failure quickly, and in the above example, if presented
with a string that has no trailing " it gives "no match" very quickly.
9. A new runtime option PCRE_NOTEMPTY causes null string matches to be ignored;
other alternatives are tried instead.
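The pre-search heuristic in item 8 can be sketched as a cheap necessary-condition check that runs before the backtracking matcher. This is a minimal illustration, not PCRE's internal code: the function name `match_is_possible` is hypothetical, and it assumes the pattern is known to start with a literal `first` and to contain a later literal `req` (both are '"' for the quoted-string pattern above).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pre-check (not PCRE's actual code). For a pattern whose
   matches must start with literal `first` and must later contain literal
   `req`, return 0 when no match is possible, so the expensive backtracking
   matcher never has to run. */
int match_is_possible(const char *subject, size_t length, char first, char req)
{
    /* Earliest position at which any match could begin. */
    const char *start = memchr(subject, (unsigned char)first, length);
    if (start == NULL)
        return 0;

    /* Every match begins at or after `start`, so `req` must occur later. */
    size_t used = (size_t)(start - subject) + 1;
    return memchr(start + 1, (unsigned char)req, length - used) != NULL;
}
```

With a subject such as `say "hi` (no closing quote) the check returns 0 immediately, which is the quick "no match" behaviour the entry describes.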
Version 2.06 09-Jun-99
----------------------
1. Change pcretest's output for amount of store used to show just the code
space, because the remainder (the data block) varies in size between 32-bit and
64-bit systems.
2. Added an extra argument to pcre_exec() to supply an offset in the subject to
start matching at. This allows lookbehinds to work when searching for multiple
occurrences in a string.
3. Added additional options to pcretest for testing multiple occurrences:
/+ outputs the rest of the string that follows a match
/g loops for multiple occurrences, using the new startoffset argument
/G loops for multiple occurrences by passing an incremented pointer
4. PCRE wasn't doing the "first character" optimization for patterns starting
with \b or \B, though it was doing it for other lookbehind assertions. That is,
it wasn't noticing that a match for a pattern such as /\bxyz/ has to start with
the letter 'x'. On long subject strings, this gives a significant speed-up.
Version 2.05 21-Apr-99
----------------------
1. Changed the type of magic_number from int to long int so that it works
properly on 16-bit systems.
2. Fixed a bug which caused patterns starting with .* not to work correctly
when the subject string contained newline characters. PCRE was assuming
anchoring for such patterns in all cases, which is not correct because .* will
not pass a newline unless PCRE_DOTALL is set. It now assumes anchoring only if
DOTALL is set at top level; otherwise it knows that patterns starting with .*
must be retried after every newline in the subject.
Version 2.04 18-Feb-99
----------------------
1. For parenthesized subpatterns with repeats whose minimum was zero, the
computation of the store needed to hold the pattern was incorrect (too large).
If such patterns were nested a few deep, this could multiply and become a real
problem.
2. Added /M option to pcretest to show the memory requirement of a specific
pattern. Made -m a synonym of -s (which does this globally) for compatibility.
3. Subpatterns of the form (regex){n,m} (i.e. limited maximum) were being
compiled in such a way that the backtracking after subsequent failure was
pessimal. Something like (a){0,3} was compiled as (a)?(a)?(a)? instead of
((a)((a)(a)?)?)? with disastrous performance if the maximum was of any size.
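The nested form in item 3 can be generated mechanically: open n-1 optional groups that each start with one copy of the item, put a plain optional copy innermost, then close the groups. The sketch below covers only the {0,n} case, works on pattern text rather than PCRE's compiled code, and its names (`nested_optional`, `append`) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Append s to buf (capacity size), keeping room for the NUL. */
static int append(char *buf, size_t size, size_t *used, const char *s)
{
    size_t len = strlen(s);
    if (*used + len >= size)
        return -1;
    memcpy(buf + *used, s, len + 1);
    *used += len;
    return 0;
}

/* Write the nested expansion of (item){0,n}; for item "a" and n = 3 this is
   ((a)((a)(a)?)?)? rather than the flat (a)?(a)?(a)?. A later copy is tried
   only after the earlier one matched, which limits backtracking.
   Returns 0 on success, -1 if n < 1 or buf is too small. */
int nested_optional(const char *item, int n, char *buf, size_t size)
{
    size_t used = 0;
    if (n < 1 || size == 0)
        return -1;
    buf[0] = '\0';

    /* Open n-1 optional groups, each beginning with one copy of the item. */
    for (int i = 0; i < n - 1; i++)
        if (append(buf, size, &used, "((") || append(buf, size, &used, item) ||
            append(buf, size, &used, ")"))
            return -1;

    /* Innermost copy is simply optional. */
    if (append(buf, size, &used, "(") || append(buf, size, &used, item) ||
        append(buf, size, &used, ")?"))
        return -1;

    /* Close the groups opened above, making each one optional. */
    for (int i = 0; i < n - 1; i++)
        if (append(buf, size, &used, ")?"))
            return -1;
    return 0;
}
```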
Version 2.03 02-Feb-99
----------------------
1. Fixed typo and small mistake in man page.
2. Added 4th condition (GPL supersedes if conflict) and created separate
LICENCE file containing the conditions.
3. Updated pcretest so that patterns such as /abc\/def/ work like they do in
Perl, that is the internal \ allows the delimiter to be included in the
pattern. Locked out the use of \ as a delimiter. If \ immediately follows
the final delimiter, add \ to the end of the pattern (to test the error).
4. Added the convenience functions for extracting substrings after a successful
match. Updated pcretest to make it able to test these functions.
Version 2.02 14-Jan-99
----------------------
1. Initialized the working variables associated with each extraction so that
their saving and restoring doesn't refer to uninitialized store.
2. Put dummy code into study.c in order to trick the optimizer of the IBM C
compiler for OS/2 into generating correct code. Apparently IBM isn't going to
fix the problem.
3. Pcretest: the timing code wasn't using LOOPREPEAT for timing execution
calls, and wasn't printing the correct value for compiling calls. Increased the
default value of LOOPREPEAT, and the number of significant figures in the
times.
4. Changed "/bin/rm" in the Makefile to "-rm" so it works on Windows NT.
5. Renamed "deftables" as "dftables" to get it down to 8 characters, to avoid
a building problem on Windows NT with a FAT file system.
Version 2.01 21-Oct-98
----------------------
1. Changed the API for pcre_compile() to allow for the provision of a pointer
to character tables built by pcre_maketables() in the current locale. If NULL
is passed, the default tables are used.
Version 2.00 24-Sep-98
----------------------
1. Since the (?>) facility is in Perl 5.005, don't require PCRE_EXTRA to enable
it any more.
2. Allow quantification of (?>) groups, and make it work correctly.
3. The first character computation wasn't working for (?>) groups.
4. Correct the implementation of \Z (it is permitted to match on the \n at the
end of the subject) and add 5.005's \z, which really does match only at the
very end of the subject.
5. Remove the \X "cut" facility; Perl doesn't have it, and (?> is neater.
6. Remove the ability to specify CASELESS, MULTILINE, DOTALL, and
DOLLAR_END_ONLY at runtime, to make it possible to implement the Perl 5.005
localized options. All options to pcre_study() were also removed.
7. Add other new features from 5.005:
(?<= positive lookbehind
(?<! negative lookbehind
(?imsx-imsx) added the unsetting capability
such a setting is global if at outer level; local otherwise
(?imsx-imsx:) non-capturing groups with option setting
(?(cond)re|re) conditional pattern matching
A backreference to itself in a repeated group matches the previous
captured string.
8. General tidying up of studying (both automatic and via "study")
consequential on the addition of new assertions.
9. As in 5.005, unlimited repeated groups that could match an empty substring
are no longer faulted at compile time. Instead, the loop is forcibly broken at
runtime if any iteration does actually match an empty substring.
10. Include the RunTest script in the distribution.
11. Added tests from the Perl 5.005_02 distribution. This showed up a few
discrepancies, some of which were old and were also with respect to 5.004. They
have now been fixed.
Version 1.09 28-Apr-98
----------------------
1. A negated single character class followed by a quantifier with a minimum
value of one (e.g. [^x]{1,6} ) was not compiled correctly. This could lead to
program crashes, or just wrong answers. This did not apply to negated classes
containing more than one character, or to minima other than one.
Version 1.08 27-Mar-98
----------------------
1. Add PCRE_UNGREEDY to invert the greediness of quantifiers.
2. Add (?U) and (?X) to set PCRE_UNGREEDY and PCRE_EXTRA respectively. The
latter must appear before anything that relies on it in the pattern.
Version 1.07 16-Feb-98
----------------------
1. A pattern such as /((a)*)*/ was not being diagnosed as in error (unlimited
repeat of a potentially empty string).
Version 1.06 23-Jan-98
----------------------
1. Added Markus Oberhumer's little patches for C++.
2. Literal strings longer than 255 characters were broken.
Version 1.05 23-Dec-97
----------------------
1. Negated character classes containing more than one character were failing if
PCRE_CASELESS was set at run time.
Version 1.04 19-Dec-97
----------------------
1. Corrected the man page, where some "const" qualifiers had been omitted.
2. Made debugging output print "{0,xxx}" instead of just "{,xxx}" to agree with
input syntax.
3. Fixed memory leak which occurred when a regex with back references was
matched with an offsets vector that wasn't big enough. The temporary memory
that is used in this case wasn't being freed if the match failed.
4. Tidied pcretest to ensure it frees memory that it gets.
5. Temporary memory was being obtained in the case where the passed offsets
vector was exactly big enough.
6. Corrected definition of offsetof() from change 5 below.
7. I had screwed up change 6 below and broken the rules for the use of
setjmp(). Now fixed.
Version 1.03 18-Dec-97
----------------------
1. An erroneous regex with a missing opening parenthesis was correctly
diagnosed, but PCRE attempted to access brastack[-1], which could cause crashes
on some systems.
2. Replaced offsetof(real_pcre, code) by offsetof(real_pcre, code[0]) because
it was reported that one broken compiler failed on the former because "code" is
also an independent variable.
3. The erroneous regex a[]b caused an array overrun reference.
4. A regex ending with a one-character negative class (e.g. /[^k]$/) did not
fail on data ending with that character. (It was going on too far, and checking
the next character, typically a binary zero.) This was specific to the
optimized code for single-character negative classes.
5. Added a contributed patch from the TIN world which does the following:
+ Add an undef for memmove, in case the system defines a macro for it.
+ Add a definition of offsetof(), in case there isn't one. (I don't know
the reason behind this - offsetof() is part of the ANSI standard - but
it does no harm).
+ Reduce the ifdef's in pcre.c using macro DPRINTF, thereby eliminating
most of the places where whitespace preceded '#'. I have given up and
allowed the remaining 2 cases to be at the margin.
+ Rename some variables in pcre to eliminate shadowing. This seems very
pedantic, but does no harm, of course.
6. Moved the call to setjmp() into its own function, to get rid of warnings
from gcc -Wall, and avoided calling it at all unless PCRE_EXTRA is used.
7. Constructs such as \d{8,} were compiling into the equivalent of
\d{8}\d{0,65527} instead of \d{8}\d* which didn't make much difference to the
outcome, but in this particular case used more store than had been allocated,
which caused the bug to be discovered because it threw up an internal error.
8. The debugging code in both pcre and pcretest for outputting the compiled
form of a regex was going wrong in the case of back references followed by
curly-bracketed repeats.
Version 1.02 12-Dec-97
----------------------
1. Typos in pcre.3 and comments in the source fixed.
2. Applied a contributed patch to get rid of places where it used to remove
'const' from variables, and fixed some signed/unsigned and uninitialized
variable warnings.
3. Added the "runtest" target to Makefile.
4. Set default compiler flag to -O2 rather than just -O.
Version 1.01 19-Nov-97
----------------------
1. PCRE was failing to diagnose unlimited repeat of empty string for patterns
like /([ab]*)*/, that is, for classes with more than one character in them.
2. Likewise, it wasn't diagnosing patterns with "once-only" subpatterns, such
as /((?>a*))*/ (a PCRE_EXTRA facility).
Version 1.00 18-Nov-97
----------------------
1. Added compile-time macros to support systems such as SunOS4 which don't have
memmove() or strerror() but have other things that can be used instead.
2. Arranged that "make clean" removes the executables.
Version 0.99 27-Oct-97
----------------------
1. Fixed bug in code for optimizing classes with only one character. It was
initializing a 32-byte map regardless, which could cause it to run off the end
of the memory it had got.
2. Added, conditional on PCRE_EXTRA, the proposed (?>REGEX) construction.
Version 0.98 22-Oct-97
----------------------
1. Fixed bug in code for handling temporary memory usage when there are more
back references than supplied space in the ovector. This could cause segfaults.
Version 0.97 21-Oct-97
----------------------
1. Added the \X "cut" facility, conditional on PCRE_EXTRA.
2. Optimized negated single characters not to use a bit map.
3. Brought error texts together as macro definitions; clarified some of them;
fixed one that was wrong - it said "range out of order" when it meant "invalid
escape sequence".
4. Changed some char * arguments to const char *.
5. Added PCRE_NOTBOL and PCRE_NOTEOL (from POSIX).
6. Added the POSIX-style API wrapper in pcreposix.a and testing facilities in
pcretest.
Version 0.96 16-Oct-97
----------------------
1. Added a simple "pgrep" utility to the distribution.
2. Fixed an incompatibility with Perl: "{" is now treated as a normal character
unless it appears in one of the precise forms "{ddd}", "{ddd,}", or "{ddd,ddd}"
where "ddd" means "one or more decimal digits".
3. Fixed serious bug. If a pattern had a back reference, but the call to
pcre_exec() didn't supply a large enough ovector to record the related
identifying subpattern, the match always failed. PCRE now remembers the number
of the largest back reference, and gets some temporary memory in which to save
the offsets during matching if necessary, in order to ensure that
backreferences always work.
4. Increased the compatibility with Perl in a number of ways:
(a) . no longer matches \n by default; an option PCRE_DOTALL is provided
to request this handling. The option can be set at compile or exec time.
(b) $ matches before a terminating newline by default; an option
PCRE_DOLLAR_ENDONLY is provided to override this (but not in multiline
mode). The option can be set at compile or exec time.
(c) The handling of \ followed by a digit other than 0 is now supposed to be
the same as Perl's. If the decimal number it represents is less than 10, or
there have been at least that many previous left capturing parentheses, a back
reference is read; otherwise an octal escape is read. Inside a character
class, it's always an octal escape,
even if it is a single digit.
(d) An escaped but undefined alphabetic character is taken as a literal,
unless PCRE_EXTRA is set. Currently this just reserves the remaining
escapes.
(e) {0} is now permitted. (The previous item is removed from the compiled
pattern).
5. Changed all the names of code files so that the basic parts are no longer
than 10 characters, and abolished the teeny "globals.c" file.
6. Changed the handling of character classes; they are now done with a 32-byte
bit map always.
7. Added the -d and /D options to pcretest to make it possible to look at the
internals of compilation without having to recompile pcre.
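The rule in item 2 of this list (a "{" starts a quantifier only in the forms "{ddd}", "{ddd,}" or "{ddd,ddd}") is easy to express as a small recognizer. This sketch uses a hypothetical function name and checks only the syntax; it does not check the numeric limits or that the minimum is not greater than the maximum.

```c
#include <ctype.h>

/* Return 1 if p points at "{ddd}", "{ddd,}" or "{ddd,ddd}" (ddd = one or
   more decimal digits), i.e. a quantifier; otherwise 0, meaning the "{"
   is treated as an ordinary literal character. */
int is_brace_quantifier(const char *p)
{
    if (*p++ != '{')
        return 0;
    if (!isdigit((unsigned char)*p))      /* at least one digit required */
        return 0;
    while (isdigit((unsigned char)*p))
        p++;
    if (*p == '}')                        /* {ddd} */
        return 1;
    if (*p++ != ',')
        return 0;
    if (*p == '}')                        /* {ddd,} */
        return 1;
    if (!isdigit((unsigned char)*p))
        return 0;
    while (isdigit((unsigned char)*p))
        p++;
    return *p == '}';                     /* {ddd,ddd} */
}
```

Under this rule `a{,5}` and `a{x}` match the literal text, as in Perl.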
Version 0.95 23-Sep-97
----------------------
1. Fixed bug in pre-pass concerning escaped "normal" characters such as \x5c or
\x20 at the start of a run of normal characters. These were being treated as
real characters, instead of the source characters being re-checked.
Version 0.94 18-Sep-97
----------------------
1. The functions are now thread-safe, with the caveat that the global variables
containing pointers to malloc() and free() or alternative functions are the
same for all threads.
2. Get pcre_study() to generate a bitmap of initial characters for non-
anchored patterns when this is possible, and use it if passed to pcre_exec().
Version 0.93 15-Sep-97
----------------------
1. /(b)|(:+)/ was computing an incorrect first character.
2. Add pcre_study() to the API and the passing of pcre_extra to pcre_exec(),
but not actually doing anything yet.
3. Treat "-" characters in classes that cannot be part of ranges as literals,
as Perl does (e.g. [-az] or [az-]).
4. Set the anchored flag if a branch starts with .* or .*? because that tests
all possible positions.
5. Split up into different modules to avoid including unneeded functions in a
compiled binary. However, compile and exec are still in one module. The "study"
function is split off.
6. The character tables are now in a separate module whose source is generated
by an auxiliary program - but can then be edited by hand if required. There are
now no calls to isalnum(), isspace(), isdigit(), isxdigit(), tolower() or
toupper() in the code.
7. Turn the malloc/free functions variables into pcre_malloc and pcre_free and
make them global. Abolish the function for setting them, as the caller can now
set them directly.
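Item 7 describes allocator hooks held in global function-pointer variables that a caller may assign directly. The sketch below shows that pattern with hypothetical names (`lib_malloc`, `lib_free`, `lib_strdup`), not PCRE's own declarations. Because the hooks are process-wide, they are shared by all threads, which is the caveat noted in version 0.94.

```c
#include <stdlib.h>
#include <string.h>

/* Library side (hypothetical names): global hooks default to the C library. */
void *(*lib_malloc)(size_t) = malloc;
void (*lib_free)(void *) = free;

/* A library routine that obtains memory only through the hook. */
char *lib_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = lib_malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Caller side: a replacement allocator that counts calls. */
static size_t alloc_count = 0;
static void *counting_malloc(size_t n)
{
    alloc_count++;
    return malloc(n);
}
```

A caller installs its allocator with a plain assignment such as `lib_malloc = counting_malloc;`, with no setter function needed.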
Version 0.92 11-Sep-97
----------------------
1. A repeat with a fixed maximum and a minimum of 1 for an ordinary character
(e.g. /a{1,3}/) was broken (I mis-optimized it).
2. Caseless matching was not working in character classes if the characters in
the pattern were in upper case.
3. Make ranges like [W-c] work in the same way as Perl for caseless matching.
4. Make PCRE_ANCHORED public and accept as a compile option.
5. Add an options word to pcre_exec() and accept PCRE_ANCHORED and
PCRE_CASELESS at run time. Add escapes \A and \I to pcretest to cause it to
pass them.
6. Give an error if bad option bits passed at compile or run time.
7. Add PCRE_MULTILINE at compile and exec time, and (?m) as well. Add \M to
pcretest to cause it to pass that flag.
8. Add pcre_info(), to get the number of identifying subpatterns, the stored
options, and the first character, if set.
9. Recognize C+ or C{n,m} where n >= 1 as providing a fixed starting character.
Version 0.91 10-Sep-97
----------------------
1. PCRE was failing to diagnose unlimited repeats of subpatterns that could
match the empty string as in /(a*)*/. It was looping and ultimately crashing.
2. PCRE was looping on encountering an indefinitely repeated back reference to
a subpattern that had matched an empty string, e.g. /(a|)\1*/. It now does what
Perl does - treats the match as successful.
****

@ -1,32 +0,0 @@
PCRE LICENCE
------------
PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by: Philip Hazel <ph10@cam.ac.uk>
University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714.
Copyright (c) 1997-1999 University of Cambridge
Permission is granted to anyone to use this software for any purpose on any
computer system, and to redistribute it freely, subject to the following
restrictions:
1. This software is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2. The origin of this software must not be misrepresented, either by
explicit claim or by omission.
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
4. If PCRE is embedded in any software that is released under the GNU
General Purpose Licence (GPL), then the terms of that licence shall
supersede any condition above with which it is incompatible.
End

@ -1,11 +0,0 @@
DEPTH = ../../..
topsrcdir = @topsrcdir@
srcdir = @srcdir@
VPATH = @srcdir@
LTLIBRARY_NAME = libpcre.la
LTLIBRARY_SOURCES = maketables.c get.c study.c pcre.c
include $(topsrcdir)/build/ltlib.mk

@ -1,416 +0,0 @@
README file for PCRE (Perl-compatible regular expressions)
----------------------------------------------------------
*******************************************************************************
* IMPORTANT FOR THOSE UPGRADING FROM VERSIONS BEFORE 2.00 *
* *
* Please note that there has been a change in the API such that a larger *
* ovector is required at matching time, to provide some additional workspace. *
* The new man page has details. This change was necessary in order to support *
* some of the new functionality in Perl 5.005. *
* *
* IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.00 *
* *
* Another (I hope this is the last!) change has been made to the API for the *
* pcre_compile() function. An additional argument has been added to make it *
* possible to pass over a pointer to character tables built in the current *
* locale by pcre_maketables(). To use the default tables, this new argument *
* should be passed as NULL. *
* *
* IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.05 *
* *
* Yet another (and again I hope this really is the last) change has been made *
* to the API for the pcre_exec() function. An additional argument has been *
* added to make it possible to start the match other than at the start of the *
* subject string. This is important if there are lookbehinds. The new man *
* page has the details, but if you just want to convert existing programs, *
* you need to do is to stick in a new fifth argument to pcre_exec(), with a *
* value of zero. For example, change *
* *
* pcre_exec(pattern, extra, subject, length, options, ovec, ovecsize) *
* to *
* pcre_exec(pattern, extra, subject, length, 0, options, ovec, ovecsize) *
*******************************************************************************
The distribution should contain the following files:
ChangeLog log of changes to the code
LICENCE conditions for the use of PCRE
Makefile for building PCRE in Unix systems
README this file
RunTest a Unix shell script for running tests
Tech.Notes notes on the encoding
pcre.3 man page source for the functions
pcre.3.txt plain text version
pcre.3.html HTML version
pcreposix.3 man page source for the POSIX wrapper API
pcreposix.3.txt plain text version
pcreposix.3.HTML HTML version
dftables.c auxiliary program for building chartables.c
get.c )
maketables.c )
study.c ) source of
pcre.c ) the functions
pcreposix.c )
pcre.h header for the external API
pcreposix.h header for the external POSIX wrapper API
internal.h header for internal use
pcretest.c test program
pgrep.1 man page source for pgrep
pgrep.1.txt plain text version
pgrep.1.HTML HTML version
pgrep.c source of a grep utility that uses PCRE
perltest Perl test program
testinput1 test data, compatible with Perl 5.004 and 5.005
testinput2 test data for error messages and non-Perl things
testinput3 test data, compatible with Perl 5.005
testinput4 test data for locale-specific tests
testoutput1 test results corresponding to testinput1
testoutput2 test results corresponding to testinput2
testoutput3 test results corresponding to testinput3
testoutput4 test results corresponding to testinput4
dll.mk for Win32 DLL
pcre.def ditto
To build PCRE on a Unix system, first edit Makefile for your system. It is a
fairly simple make file, and there are some comments near the top, after the
text "On a Unix system". Then run "make". It builds two libraries called
libpcre.a and libpcreposix.a, a test program called pcretest, and the pgrep
command. You can use "make install" to copy these, and the public header file
pcre.h, to appropriate live directories on your system. These installation
directories are defined at the top of the Makefile, and you should edit them if
necessary.
For a non-Unix system, read the comments at the top of Makefile, which give
some hints on what needs to be done. PCRE has been compiled on Windows systems
and on Macintoshes, but I don't know the details as I don't use those systems.
It should be straightforward to build PCRE on any system that has a Standard C
compiler.
Some help in building a Win32 DLL of PCRE in GnuWin32 environments was
contributed by Paul.Sokolovsky@technologist.com. These environments are
Mingw32 (http://www.xraylith.wisc.edu/~khan/software/gnu-win32/) and
CygWin (http://sourceware.cygnus.com/cygwin/). Paul comments:
For CygWin, set CFLAGS=-mno-cygwin, and do 'make dll'. You'll get
pcre.dll (containing pcreposix also), libpcre.dll.a, and dynamically
linked pgrep and pcretest. If you have /bin/sh, run RunTest (three
main tests go ok, locale not supported).
To test PCRE, run the RunTest script in the pcre directory. This can also be
run by "make runtest". It runs the pcretest test program (which is documented
below) on each of the testinput files in turn, and compares the output with the
contents of the corresponding testoutput file. A file called testtry is used to
hold the output from pcretest. To run pcretest on just one of the test files,
give its number as an argument to RunTest, for example:
RunTest 3
The first and third test files can also be fed directly into the perltest
script to check that Perl gives the same results. The third file requires the
additional features of release 5.005, which is why it is kept separate from the
main test input, which needs only Perl 5.004. In the long run, when 5.005 is
widespread, these two test files may get amalgamated.
The second set of tests checks pcre_info(), pcre_study(), pcre_copy_substring(),
pcre_get_substring(), pcre_get_substring_list(), error detection and run-time
flags that are specific to PCRE, as well as the POSIX wrapper API.
The fourth set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
default tables. The tests make use of the "fr" (French) locale. Before running
the test, the script checks for the presence of this locale by running the
"locale" command. If that command fails, or if it doesn't include "fr" in the
list of available locales, the fourth test cannot be run, and a comment is
output to say why. If running this test produces instances of the error
** Failed to set locale "fr"
in the comparison output, it means that locale is not available on your system,
despite being listed by "locale". This does not mean that PCRE is broken.
PCRE has its own native API, but a set of "wrapper" functions that are based on
the POSIX API are also supplied in the library libpcreposix.a. Note that this
just provides a POSIX calling interface to PCRE: the regular expressions
themselves still follow Perl syntax and semantics. The header file
for the POSIX-style functions is called pcreposix.h. The official POSIX name is
regex.h, but I didn't want to risk possible problems with existing files of
that name by distributing it that way. To use it with an existing program that
uses the POSIX API, it will have to be renamed or pointed at by a link.
Character tables
----------------
PCRE uses four tables for manipulating and identifying characters. The final
argument of the pcre_compile() function is a pointer to a block of memory
containing the concatenated tables. A call to pcre_maketables() can be used to
generate a set of tables in the current locale. If the final argument for
pcre_compile() is passed as NULL, a set of default tables that is built into
the binary is used.
The source file called chartables.c contains the default set of tables. This is
not supplied in the distribution, but is built by the program dftables
(compiled from dftables.c), which uses the ANSI C character handling functions
such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
sources. This means that the default C locale which is set for your system will
control the contents of these default tables. You can change the default tables
by editing chartables.c and then re-building PCRE. If you do this, you should
probably also edit Makefile to ensure that the file doesn't ever get
re-generated.
The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
digits, "word" characters, and white space, respectively. These are used when
building 32-byte bit maps that represent character classes.
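A 32-byte map holds one bit per 8-bit character code (32 x 8 = 256 bits). The sketch below shows how such a map can be set and tested and filled from `isdigit()`; the byte and bit ordering (`c / 8`, `c % 8`) and the names used are assumptions for illustration, not taken from PCRE's source.

```c
#include <ctype.h>
#include <string.h>

/* One bit for each of the 256 possible character codes. */
typedef unsigned char class_map[32];

static void map_set(class_map map, unsigned c)
{
    map[c / 8] |= (unsigned char)(1u << (c % 8));
}

static int map_test(const class_map map, unsigned c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}

/* Mark every character that isdigit() accepts in the current C locale. */
void build_digit_map(class_map map)
{
    memset(map, 0, sizeof(class_map));
    for (unsigned c = 0; c < 256; c++)
        if (isdigit((int)c))
            map_set(map, c);
}
```

A class such as [\dx] can then be represented by copying the digit map and setting one more bit, and testing a subject character costs a single shift and mask.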
The final 256-byte table has bits indicating various character types, as
follows:
1 white space character
2 letter
4 decimal digit
8 hexadecimal digit
16 alphanumeric or '_'
128 regular expression metacharacter or binary zero
You should not alter the set of characters that contain the 128 bit, as that
will cause PCRE to malfunction.
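The character-type table above can be built with the standard ctype.h functions, in the spirit of what dftables does. In this sketch the enum names are hypothetical, and the exact set of metacharacters given the 128 bit is an assumption rather than a copy of dftables.c; binary zero is included as the list above states.

```c
#include <ctype.h>
#include <string.h>

/* Bit values from the list above (names are illustrative). */
enum {
    CT_SPACE  = 1,
    CT_LETTER = 2,
    CT_DIGIT  = 4,
    CT_XDIGIT = 8,
    CT_WORD   = 16,
    CT_META   = 128
};

/* Fill a 256-byte character-type table from the current C locale. */
void build_ctypes(unsigned char table[256])
{
    static const char meta[] = "\\^$.[|()?*+{";   /* assumed metacharacter set */
    for (int c = 0; c < 256; c++) {
        unsigned char bits = 0;
        if (isspace(c))             bits |= CT_SPACE;
        if (isalpha(c))             bits |= CT_LETTER;
        if (isdigit(c))             bits |= CT_DIGIT;
        if (isxdigit(c))            bits |= CT_XDIGIT;
        if (isalnum(c) || c == '_') bits |= CT_WORD;
        /* Binary zero is tested first: strchr(meta, 0) would find the NUL. */
        if (c == 0 || strchr(meta, c) != NULL)
            bits |= CT_META;
        table[c] = bits;
    }
}
```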
The pcretest program
--------------------
This program is intended for testing PCRE, but it can also be used for
experimenting with regular expressions.
If it is given two filename arguments, it reads from the first and writes to
the second. If it is given only one filename argument, it reads from that file
and writes to stdout. Otherwise, it reads from stdin and writes to stdout, and
prompts for each line of input.
The program handles any number of sets of input on a single input file. Each
set starts with a regular expression, and continues with any number of data
lines to be matched against the pattern. An empty line signals the end of the
set. The regular expressions are given enclosed in any non-alphameric
delimiters other than backslash, for example
/(a|bc)x+yz/
White space before the initial delimiter is ignored. A regular expression may
be continued over several input lines, in which case the newline characters are
included within it. See the testinput files for many examples. It is possible
to include the delimiter within the pattern by escaping it, for example
/abc\/def/
If you do so, the escape and the delimiter form part of the pattern, but since
delimiters are always non-alphameric, this does not affect its interpretation.
If the terminating delimiter is immediately followed by a backslash, for
example,
/abc/\
then a backslash is added to the end of the pattern. This is done to provide a
way of testing the error condition that arises if a pattern finishes with a
backslash, because
/abc\/
is interpreted as the first line of a pattern that starts with "abc/", causing
pcretest to read the next line as a continuation of the regular expression.
The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. For
example:
/caseless/i
These modifier letters have the same effect as they do in Perl. There are
others which set PCRE options that do not correspond to anything in Perl: /A,
/E, and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.
Searching for all possible matches within each subject string can be requested
by the /g or /G modifier. After finding a match, PCRE is called again to search
the remainder of the subject string. The difference between /g and /G is that
the former uses the startoffset argument to pcre_exec() to start searching at
a new point within the entire string (which is in effect what Perl does),
whereas the latter passes over a shortened substring. This makes a difference
to the matching process if the pattern begins with a lookbehind assertion
(including \b or \B).
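Why the choice between /g and /G matters for \b can be seen from a word-boundary test: it must look at the character before the current position. With the whole string plus a start offset that character is visible; with a pointer advanced past earlier text it is not. The function name below is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

static int is_word(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* \b at position pos of subject[0..length): true when exactly one of the
   characters on either side of pos is a word character. */
int at_word_boundary(const char *subject, size_t length, size_t pos)
{
    int before = pos > 0 && is_word((unsigned char)subject[pos - 1]);
    int after  = pos < length && is_word((unsigned char)subject[pos]);
    return before != after;
}
```

For "abcdef", offset 3 of the full string is not a boundary (the preceding 'c' is a word character), whereas position 0 of the shortened substring "def" appears to be one.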
If any call to pcre_exec() in a /g or /G sequence matches an empty string, the
next call is done with the PCRE_NOTEMPTY flag set so that it cannot match an
empty string again. This imitates the way Perl handles such cases when using
the /g modifier or the split() function.
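The empty-match rule above can be illustrated with a small self-contained C sketch. It does not call PCRE: toy_match() is a hypothetical matcher for the single pattern x*, and TOY_NOTEMPTY is an illustrative stand-in for PCRE_NOTEMPTY. The loop restarts at the end of each match and sets the flag only after an empty match, which guarantees progress. It is not pcretest's own code and does not claim to reproduce every detail of what Perl does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag standing in for PCRE_NOTEMPTY; the value is illustrative. */
#define TOY_NOTEMPTY 0x01

/* Toy unanchored matcher for the pattern x* (a greedy run of 'x'). It scans
   from startoffset; at each position it takes the longest run of 'x'. With
   TOY_NOTEMPTY set, an empty run is rejected and the scan moves on. Returns 1
   and sets *s, *e on success, 0 on failure. */
static int toy_match(const char *subj, size_t len, size_t startoffset,
                     int flags, size_t *s, size_t *e)
{
  size_t pos;
  for (pos = startoffset; pos <= len; pos++)
    {
    size_t end = pos;
    while (end < len && subj[end] == 'x') end++;
    if (end == pos && (flags & TOY_NOTEMPTY) != 0) continue;
    *s = pos;
    *e = end;
    return 1;
    }
  return 0;
}

/* Global matching in the style described for /g: after an empty match, the
   next call is made at the same offset with TOY_NOTEMPTY set. Stores up to
   max (start, end) pairs in out[] and returns how many were found. */
static size_t toy_match_all(const char *subj, size_t out[][2], size_t max)
{
  size_t len = strlen(subj), start = 0, n = 0, s, e;
  int flags = 0;
  while (n < max && toy_match(subj, len, start, flags, &s, &e))
    {
    out[n][0] = s;
    out[n][1] = e;
    n++;
    start = e;                              /* continue after this match */
    flags = (s == e) ? TOY_NOTEMPTY : 0;    /* forbid a repeated empty match */
    }
  return n;
}
```

For the subject "axxb" this reports (0,0), (1,3), and (3,3); the call after the last empty match, made with TOY_NOTEMPTY, finds nothing and the loop stops.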
There are a number of other modifiers for controlling the way pcretest
operates.
The /+ modifier requests that as well as outputting the substring that matched
the entire pattern, pcretest should in addition output the remainder of the
subject string. This is useful for tests where the subject contains multiple
copies of the same substring.
The /L modifier must be followed directly by the name of a locale, for example,
/pattern/Lfr
For this reason, it must be the last modifier letter. The given locale is set,
pcre_maketables() is called to build a set of character tables for the locale,
and this is then passed to pcre_compile() when compiling the regular
expression. Without an /L modifier, NULL is passed as the tables pointer; that
is, /L applies only to the expression on which it appears.
The /I modifier requests that pcretest output information about the compiled
expression (whether it is anchored, has a fixed first character, and so on). It
does this by calling pcre_info() after compiling an expression, and outputting
the information it gets back. If the pattern is studied, the results of that
are also output.
The /D modifier is a PCRE debugging feature, which also assumes /I. It causes
the internal form of compiled regular expressions to be output after
compilation.
The /S modifier causes pcre_study() to be called after the expression has been
compiled, and the results used when the expression is matched.
The /M modifier causes the size of the memory block used to hold the compiled
pattern to be output.
Finally, the /P modifier causes pcretest to call PCRE via the POSIX wrapper API
rather than its native API. When this is done, all other modifiers except /i,
/m, and /+ are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is
set if /m is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always,
and PCRE_DOTALL unless REG_NEWLINE is set.
Before each data line is passed to pcre_exec(), leading and trailing whitespace
is removed, and it is then scanned for \ escapes. The following are recognized:
\a alarm (= BEL)
\b backspace
\e escape
\f formfeed
\n newline
\r carriage return
\t tab
\v vertical tab
\nnn octal character (up to 3 octal digits)
\xhh hexadecimal character (up to 2 hex digits)
\A pass the PCRE_ANCHORED option to pcre_exec()
\B pass the PCRE_NOTBOL option to pcre_exec()
\Cdd call pcre_copy_substring() for substring dd after a successful match
(any decimal number less than 32)
\Gdd call pcre_get_substring() for substring dd after a successful match
(any decimal number less than 32)
\L call pcre_get_substring_list() after a successful match
\N pass the PCRE_NOTEMPTY option to pcre_exec()
\Odd set the size of the output vector passed to pcre_exec() to dd
(any number of decimal digits)
\Z pass the PCRE_NOTEOL option to pcre_exec()
A backslash followed by anything else just escapes the anything else. If the
very last character is a backslash, it is ignored. This gives a way of passing
an empty line as data, since a real empty line terminates the data input.
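A rough C sketch of the data-line decoding for the character escapes in the table above. decode_escapes() is a hypothetical helper, not pcretest's code; it leaves the option escapes (\A, \B, \C, and so on) uninterpreted and simply yields the escaped letter for them. The result is returned as a length because an octal or hex escape such as \0 or \x00 can put binary zeros into the data.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Decode the character escapes (\a \b \e \f \n \r \t \v, \nnn, \xhh) from
   src into dst, returning the decoded length. Any other escaped character
   stands for itself, and a final lone backslash is dropped. Option escapes
   such as \A or \Z are not interpreted here; they just yield the letter.
   dst must have room for strlen(src) bytes; it is not zero-terminated. */
static size_t decode_escapes(const char *src, char *dst)
{
  size_t n = 0;
  while (*src != 0)
    {
    int c = (unsigned char)*src++;
    if (c != '\\') { dst[n++] = (char)c; continue; }
    c = (unsigned char)*src;
    if (c == 0) break;                   /* trailing backslash is ignored */
    src++;
    switch (c)
      {
      case 'a': c = 7; break;
      case 'b': c = '\b'; break;
      case 'e': c = 27; break;
      case 'f': c = '\f'; break;
      case 'n': c = '\n'; break;
      case 'r': c = '\r'; break;
      case 't': c = '\t'; break;
      case 'v': c = '\v'; break;
      case 'x':                          /* up to 2 hex digits */
        {
        int i, v = 0;
        for (i = 0; i < 2 && isxdigit((unsigned char)*src); i++, src++)
          v = v * 16 + (isdigit((unsigned char)*src) ? *src - '0' :
                        tolower((unsigned char)*src) - 'a' + 10);
        c = v;
        break;
        }
      default:
        if (c >= '0' && c <= '7')        /* up to 3 octal digits */
          {
          int i, v = c - '0';
          for (i = 1; i < 3 && *src >= '0' && *src <= '7'; i++, src++)
            v = v * 8 + (*src - '0');
          c = v & 0xff;
          }
        break;                           /* anything else escapes itself */
      }
    dst[n++] = (char)c;
    }
  return n;
}
```

For the data line a\tb\x41\101\q\ it produces the six bytes a, TAB, b, A, A, q; the final backslash is dropped.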
If /P was present on the regex, causing the POSIX wrapper API to be used, only
\B and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to
regexec() respectively.
When a match succeeds, pcretest outputs the list of captured substrings that
pcre_exec() returns, starting with number 0 for the string that matched the
whole pattern. Here is an example of an interactive pcretest run.
$ pcretest
PCRE version 2.06 08-Jun-1999
re> /^abc(\d+)/
data> abc123
0: abc123
1: 123
data> xyz
No match
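Captured substrings come back from pcre_exec() as pairs of offsets in the output vector: substring n runs from ovector[2*n] up to, but not including, ovector[2*n+1], which is how pcre_copy_substring() in get.c indexes it. The sketch below formats such a vector into lines like those shown above. format_substrings() is hypothetical; it does not escape non-printing characters as pcretest does, and treating a negative start offset as an unset substring (printed as <unset>) is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Format the captured substrings described by ovector as lines such as
   "0: abc123\n1: 123\n". count is the number of substrings, i.e. the value
   pcre_exec() returned. Substring n starts at ovector[2*n] and ends just
   before ovector[2*n+1]. Returns the number of characters written, or -1
   if buf is too small or formatting fails. */
static int format_substrings(const char *subject, const int *ovector,
                             int count, char *buf, size_t size)
{
  size_t used = 0;
  int i;
  if (size == 0) return -1;
  buf[0] = 0;
  for (i = 0; i < count; i++)
    {
    int start = ovector[2*i], end = ovector[2*i+1];
    int r = (start < 0)
      ? snprintf(buf + used, size - used, "%d: <unset>\n", i)
      : snprintf(buf + used, size - used, "%d: %.*s\n", i,
                 end - start, subject + start);
    if (r < 0 || (size_t)r >= size - used) return -1;  /* error or no room */
    used += (size_t)r;
    }
  return (int)used;
}
```

With the offsets {0, 6, 3, 6} that /^abc(\d+)/ yields for "abc123", it writes "0: abc123" and "1: 123".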
If the strings contain any non-printing characters, they are output as \0x
escapes. If the pattern has the /+ modifier, then the output for substring 0 is
followed by the rest of the subject string, identified by "0+" like this:
re> /cat/+
data> cataract
0: cat
0+ aract
If the pattern has the /g or /G modifier, the results of successive matching
attempts are output in sequence, like this:
re> /\Bi(\w\w)/g
data> Mississippi
0: iss
1: ss
0: iss
1: ss
0: ipp
1: pp
"No match" is output only if the first match attempt fails.
If any of \C, \G, or \L are present in a data line that is successfully
matched, the substrings extracted by the convenience functions are output with
C, G, or L after the string number instead of a colon. This is in addition to
the normal full list. The string length (that is, the return from the
extraction function) is given in parentheses after each string for \C and \G.
Note that while patterns can be continued over several lines (a plain ">"
prompt is used for continuations), data lines may not. However, newlines can be
included in data by means of the \n escape.
If the -p option is given to pcretest, it is equivalent to adding /P to each
regular expression: the POSIX wrapper API is used to call PCRE. None of the
following flags has any effect in this case.
If the option -d is given to pcretest, it is equivalent to adding /D to each
regular expression: the internal form is output after compilation.
If the option -i is given to pcretest, it is equivalent to adding /I to each
regular expression: information about the compiled pattern is given after
compilation.
If the option -m is given to pcretest, it outputs the size of each compiled
pattern after it has been compiled. It is equivalent to adding /M to each
regular expression. For compatibility with earlier versions of pcretest, -s is
a synonym for -m.
If the -t option is given, each compile, study, and match is run 20000 times
while being timed, and the resulting time per compile or match is output in
milliseconds. Do not set -t with -s, because you will then get the size output
20000 times and the timing will be distorted. If you want to change the number
of repetitions used for timing, edit the definition of LOOPREPEAT at the top of
pcretest.c.
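The -t measurement amounts to running an operation LOOPREPEAT times and dividing the elapsed time by the repeat count. A minimal sketch, assuming processor time from the standard clock() function is what is wanted; work() is a hypothetical stand-in for a compile or match, and this is not pcretest's own timing code.

```c
#include <time.h>

#define LOOPREPEAT 20000   /* the default repeat count described above */

/* Hypothetical stand-in for the operation being timed. The volatile sink
   stops the compiler from optimizing the loop away. */
static volatile unsigned long sink;
static void work(void)
{
  unsigned long i;
  for (i = 0; i < 100; i++) sink += i;
}

/* Run work() LOOPREPEAT times and return the average processor time per
   call in milliseconds, as measured by clock(). */
static double time_per_call_ms(void)
{
  int i;
  clock_t start = clock();
  for (i = 0; i < LOOPREPEAT; i++) work();
  return ((double)(clock() - start) * 1000.0 / (double)CLOCKS_PER_SEC) /
         (double)LOOPREPEAT;
}
```

Averaging over many repetitions is what makes a per-call figure meaningful when a single compile or match is far shorter than the clock's resolution.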
The perltest program
--------------------
The perltest program tests Perl's regular expressions; it has the same
specification as pcretest, and so can be given identical input, except that
input patterns can be followed only by Perl's lower case modifiers. The
contents of testinput1 and testinput3 meet this condition.
The data lines are processed as Perl double-quoted strings, so if they contain
" \ $ or @ characters, these have to be escaped. For this reason, all such
characters in testinput1 and testinput3 are escaped so that they can be used
for perltest as well as for pcretest, and the special upper case modifiers such
as /A that pcretest recognizes are not used in these files. The output should
be identical, apart from the initial identifying banner.
The testinput2 and testinput4 files are not suitable for feeding to perltest,
since they do make use of the special upper case modifiers and escapes that
pcretest uses to test some features of PCRE. The first of these files also
contains malformed regular expressions, in order to check that PCRE diagnoses
them correctly.
Philip Hazel <ph10@cam.ac.uk>
July 1999

@ -1,94 +0,0 @@
#! /bin/sh
# Run PCRE tests
cf=diff
# Select which tests to run; if no selection, run all
do1=no
do2=no
do3=no
do4=no
while [ $# -gt 0 ] ; do
case $1 in
1) do1=yes;;
2) do2=yes;;
3) do3=yes;;
4) do4=yes;;
*) echo "Unknown test number $1"; exit 1;;
esac
shift
done
if [ $do1 = no -a $do2 = no -a $do3 = no -a $do4 = no ] ; then
do1=yes
do2=yes
do3=yes
do4=yes
fi
# Primary test, Perl-compatible
if [ $do1 = yes ] ; then
echo "Testing main functionality (Perl compatible)"
./pcretest testinput1 testtry
if [ $? = 0 ] ; then
$cf testtry testoutput1
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
fi
# PCRE tests that are not Perl-compatible - API & error tests, mostly
if [ $do2 = yes ] ; then
echo "Testing API and error handling (not Perl compatible)"
./pcretest -i testinput2 testtry
if [ $? = 0 ] ; then
$cf testtry testoutput2
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
fi
# Additional Perl-compatible tests for Perl 5.005's new features
if [ $do3 = yes ] ; then
echo "Testing Perl 5.005 features (Perl 5.005 compatible)"
./pcretest testinput3 testtry
if [ $? = 0 ] ; then
$cf testtry testoutput3
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
fi
if [ $do1 = yes -a $do2 = yes -a $do3 = yes ] ; then
echo "The three main tests all ran OK"
echo " "
fi
# Locale-specific tests, provided the "fr" locale is available
if [ $do4 = yes ] ; then
locale -a | grep '^fr$' >/dev/null
if [ $? -eq 0 ] ; then
echo "Testing locale-specific features (using 'fr' locale)"
./pcretest testinput4 testtry
if [ $? = 0 ] ; then
$cf testtry testoutput4
if [ $? != 0 ] ; then exit 1; fi
echo "Locale test ran OK"
echo " "
else exit 1
fi
else
echo "Cannot test locale-specific features - 'fr' locale not found,"
echo "or the \"locale\" command is not available to check for it."
echo " "
fi
fi
# End

@ -1,239 +0,0 @@
Technical Notes about PCRE
--------------------------
Many years ago I implemented some regular expression functions to an algorithm
suggested by Martin Richards. These were not Unix-like in form, and were quite
restricted in what they could do by comparison with Perl. The interesting part
about the algorithm was that the amount of space required to hold the compiled
form of an expression was known in advance. The code to apply an expression did
not operate by backtracking, as the Henry Spencer and Perl code does, but
instead checked all possibilities simultaneously by keeping a list of current
states and checking all of them as it advanced through the subject string. (In
the terminology of Jeffrey Friedl's book, it was a "DFA algorithm".) When the
pattern was all used up, all remaining states were possible matches, and the
one matching the longest substring of the subject string was chosen. This did not
necessarily maximize the individual wild portions of the pattern, as is
expected in Unix and Perl-style regular expressions.
By contrast, the code originally written by Henry Spencer and subsequently
heavily modified for Perl actually compiles the expression twice: once in a
dummy mode in order to find out how much store will be needed, and then for
real. The execution function operates by backtracking and maximizing (or,
optionally, minimizing in Perl) the amount of the subject that matches
individual wild portions of the pattern. This is an "NFA algorithm" in Friedl's
terminology.
For this set of functions that forms PCRE, I tried at first to invent an
algorithm that used an amount of store bounded by a multiple of the number of
characters in the pattern, to save on compiling time. However, because of the
greater complexity in Perl regular expressions, I couldn't do this. In any
case, a first pass through the pattern is needed, in order to find internal
flag settings like (?i) at top level. So it works by running a very degenerate
first pass to calculate a maximum store size, and then a second pass to do the
real compile - which may use a bit less than the predicted amount of store. The
idea is that this is going to turn out faster because the first pass is
degenerate and the second can just store stuff straight into the vector. It
does make the compiling functions bigger, of course, but they have got quite
big anyway to handle all the Perl stuff.
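The two-pass scheme can be sketched with a deliberately trivial "compiler" whose only rule is that a backslash makes the next byte literal. The first pass is degenerate: it returns an upper bound (the pattern length) without decoding anything. The second pass writes straight into a buffer of that size and may use less, as described above. All names are hypothetical and nothing here is PCRE's code.

```c
#include <stdlib.h>
#include <string.h>

/* Pass 1 (degenerate): an upper bound on the output size. In this toy format
   every input byte yields at most one output byte, so the pattern length is
   a safe bound, even though escapes make the real output shorter. */
static size_t toy_bound(const char *pattern)
{
  return strlen(pattern);
}

/* Pass 2: the real "compile". A backslash makes the next byte literal; a
   trailing lone backslash is dropped. Writes straight into out and returns
   the number of bytes used, which never exceeds toy_bound(pattern). */
static size_t toy_emit(const char *pattern, unsigned char *out)
{
  size_t n = 0;
  while (*pattern != 0)
    {
    if (*pattern == '\\' && *++pattern == 0) break;
    out[n++] = (unsigned char)*pattern++;
    }
  return n;
}

/* Compile in two passes: size, allocate once, then fill. The caller frees
   the result; *used receives the real length. Returns NULL on failure. */
static unsigned char *toy_compile(const char *pattern, size_t *used)
{
  unsigned char *code = malloc(toy_bound(pattern) + 1);  /* +1 avoids malloc(0) */
  if (code == NULL) return NULL;
  *used = toy_emit(pattern, code);
  return code;
}
```

Compiling a\.b reserves four bytes but uses three.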
The compiled form of a pattern is a vector of bytes, containing items of
variable length. The first byte in an item is an opcode, and the length of the
item is either implicit in the opcode or contained in the data bytes which
follow it. A list of all the opcodes follows:
Opcodes with no following data
------------------------------
These items are all just one byte long
OP_END end of pattern
OP_ANY match any character
OP_SOD match start of data: \A
OP_CIRC ^ (start of data, or after \n in multiline)
OP_NOT_WORD_BOUNDARY \B
OP_WORD_BOUNDARY \b
OP_NOT_DIGIT \D
OP_DIGIT \d
OP_NOT_WHITESPACE \S
OP_WHITESPACE \s
OP_NOT_WORDCHAR \W
OP_WORDCHAR \w
OP_EODN match end of data or \n at end: \Z
OP_EOD match end of data: \z
OP_DOLL $ (end of data, or before \n in multiline)
Repeating single characters
---------------------------
The common repeats (*, +, ?) when applied to a single character appear as
two-byte items using the following opcodes:
OP_STAR
OP_MINSTAR
OP_PLUS
OP_MINPLUS
OP_QUERY
OP_MINQUERY
Those with "MIN" in their name are the minimizing versions. Each is followed by
the character that is to be repeated. Other repeats make use of
OP_UPTO
OP_MINUPTO
OP_EXACT
which are followed by a two-byte count (most significant first) and the
repeated character. OP_UPTO matches from 0 to the given number. A repeat with a
non-zero minimum and a fixed maximum is coded as an OP_EXACT followed by an
OP_UPTO (or OP_MINUPTO).
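As an illustration of this encoding, the sketch below emits c{min,max} for a bounded range as OP_EXACT with the minimum, followed, when max > min, by OP_UPTO with the remaining count. The opcode numbers are hypothetical (PCRE's real values are defined in its internal header), and only the greedy, bounded case is covered.

```c
#include <stddef.h>

/* Hypothetical opcode values for illustration only. */
enum { OP_UPTO = 1, OP_EXACT = 2 };

/* Encode c{min,max} (0 < min <= max, max finite): OP_EXACT with the
   minimum, then OP_UPTO with (max - min) if max > min. Counts are two
   bytes, most significant first. Returns bytes written (at most 8). */
static size_t encode_range(unsigned char c, int min, int max,
                           unsigned char *out)
{
  size_t n = 0;
  out[n++] = OP_EXACT;
  out[n++] = (unsigned char)(min >> 8);
  out[n++] = (unsigned char)(min & 255);
  out[n++] = c;
  if (max > min)
    {
    out[n++] = OP_UPTO;                 /* matches 0 to (max - min) more */
    out[n++] = (unsigned char)((max - min) >> 8);
    out[n++] = (unsigned char)((max - min) & 255);
    out[n++] = c;
    }
  return n;
}
```

So a{2,5} becomes OP_EXACT 0 2 'a' OP_UPTO 0 3 'a', and a{3,3} needs only the first four bytes.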
Repeating character types
-------------------------
Repeats of things like \d are done exactly as for single characters, except
that instead of a character, the opcode for the type is stored in the data
byte. The opcodes are:
OP_TYPESTAR
OP_TYPEMINSTAR
OP_TYPEPLUS
OP_TYPEMINPLUS
OP_TYPEQUERY
OP_TYPEMINQUERY
OP_TYPEUPTO
OP_TYPEMINUPTO
OP_TYPEEXACT
Matching a character string
---------------------------
The OP_CHARS opcode is followed by a one-byte count and then that number of
characters. If there are more than 255 characters in sequence, successive
instances of OP_CHARS are used.
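Splitting a long literal into OP_CHARS items can be sketched as follows. The opcode value is hypothetical. Passing NULL for the output makes the same function compute the size only, which suits a sizing pass followed by a filling pass.

```c
#include <stddef.h>
#include <string.h>

enum { OP_CHARS = 3 };   /* hypothetical opcode value */

/* Emit a literal run of len bytes as one or more OP_CHARS items, each holding
   at most 255 characters after its one-byte count. If out is NULL, only the
   required size is computed. Returns the number of bytes needed or written. */
static size_t emit_chars(const unsigned char *s, size_t len, unsigned char *out)
{
  size_t n = 0;
  while (len > 0)
    {
    size_t chunk = len > 255 ? 255 : len;
    if (out != NULL)
      {
      out[n] = OP_CHARS;
      out[n + 1] = (unsigned char)chunk;
      memcpy(out + n + 2, s, chunk);
      }
    n += 2 + chunk;                     /* opcode + count + characters */
    s += chunk;
    len -= chunk;
    }
  return n;
}
```

A 300-character literal needs 304 bytes: a full item of 255 characters and a second of 45, each with its opcode and count byte.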
Character classes
-----------------
OP_CLASS is used for a character class, provided there are at least two
characters in the class. If there is only one character, OP_CHARS is used for a
positive class, and OP_NOT for a negative one (that is, for something like
[^a]). Another set of repeating opcodes (OP_NOTSTAR etc.) is used for a
repeated, negated, single-character class. The normal ones (OP_STAR etc.) are
used for a repeated positive single-character class.
OP_CLASS is followed by a 32-byte bit map containing a 1
bit for every character that is acceptable. The bits are counted from the least
significant end of each byte.
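With this layout, character c is represented by bit c%8 of byte c/8, counting from the least significant bit. The digit map in chartables.c shows this: bytes 6 and 7 are 0xff and 0x03, which sets bits 48 to 57, the codes for '0' to '9'. A minimal sketch of testing and setting such a map (the helper names are hypothetical):

```c
/* Nonzero if character c is in a 32-byte class bitmap whose bits are
   counted from the least significant end of each byte. */
static int class_has(const unsigned char map[32], unsigned char c)
{
  return (map[c / 8] & (1u << (c % 8))) != 0;
}

/* Add character c to such a bitmap. */
static void class_add(unsigned char map[32], unsigned char c)
{
  map[c / 8] |= (unsigned char)(1u << (c % 8));
}
```

For example, adding 'a' (97) sets bit 1 of byte 12, giving that byte the value 0x02.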
Back references
---------------
OP_REF is followed by a single byte containing the reference number.
Repeating character classes and back references
-----------------------------------------------
Single-character classes are handled specially (see above). What follows
applies to OP_CLASS and OP_REF: in both cases, the repeat information follows
the base item. The matching code looks at the following opcode to see if it is
one of
OP_CRSTAR
OP_CRMINSTAR
OP_CRPLUS
OP_CRMINPLUS
OP_CRQUERY
OP_CRMINQUERY
OP_CRRANGE
OP_CRMINRANGE
All but the last two are just single-byte items. The others are followed by
four bytes of data, comprising the minimum and maximum repeat counts.
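Reading the counts of a range item might look like this. The opcode values are hypothetical, and the byte order (two two-byte counts, most significant first, as for the OP_UPTO count) is an assumption; the text above only says that four bytes follow.

```c
/* Hypothetical opcode values for illustration only. */
enum { OP_CRSTAR = 10, OP_CRRANGE = 16, OP_CRMINRANGE = 17 };

/* Read the repeat counts that follow an OP_CRRANGE or OP_CRMINRANGE byte at
   p[0]: p[1..2] is assumed to be the minimum and p[3..4] the maximum, each
   most significant byte first. Returns the length of the whole item: 5 for
   a range item, 1 for the single-byte repeat opcodes. */
static int read_cr_range(const unsigned char *p, int *min, int *max)
{
  if (p[0] != OP_CRRANGE && p[0] != OP_CRMINRANGE) return 1;
  *min = (p[1] << 8) | p[2];
  *max = (p[3] << 8) | p[4];
  return 5;
}
```

Under these assumptions the data bytes 0 2 1 44 decode to a minimum of 2 and a maximum of 300.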
Brackets and alternation
------------------------
A pair of non-identifying (round) brackets is wrapped round each expression at
compile time, so alternation always happens in the context of brackets.
Non-identifying brackets use the opcode OP_BRA, while identifying brackets use
OP_BRA+1, OP_BRA+2, etc. [Note for North Americans: "bracket" to some English
speakers, including myself, can be round, square, or curly. Hence this usage.]
A bracket opcode is followed by two bytes which give the offset to the next
alternative OP_ALT or, if there aren't any branches, to the matching KET
opcode. Each OP_ALT is followed by two bytes giving the offset to the next one,
or to the KET opcode.
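Following the offset chain from a bracket to its KET can be sketched as below, for example to count the alternatives in a group. The opcode values are hypothetical, and the sketch assumes that each offset is a two-byte value, most significant first, counted from the opcode that carries it; the text above does not specify the base.

```c
/* Hypothetical opcode values for illustration only. */
enum { OP_ALT = 20, OP_KET = 21, OP_BRA = 30 };

/* Starting at a bracket opcode, follow the chain of offsets through each
   OP_ALT to the matching OP_KET and return the number of alternatives.
   Assumes well-formed compiled code. */
static int count_alternatives(const unsigned char *code)
{
  int count = 1;
  const unsigned char *p = code;
  for (;;)
    {
    p += (p[1] << 8) | p[2];   /* jump to the next OP_ALT or the OP_KET */
    if (*p == OP_KET) return count;
    count++;                   /* *p is OP_ALT: another alternative */
    }
}
```

With (a|bc) encoded as BRA 0 4 'a' ALT 0 5 'b' 'c' KET 0 9, the chain runs BRA, ALT, KET and the count is 2.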
OP_KET is used for subpatterns that do not repeat indefinitely, while
OP_KETRMIN and OP_KETRMAX are used for indefinite repetitions, minimally or
maximally respectively. All three are followed by two bytes giving (as a
positive number) the offset back to the matching BRA opcode.
If a subpattern is quantified such that it is permitted to match zero times, it
is preceded by one of OP_BRAZERO or OP_BRAMINZERO. These are single-byte
opcodes which tell the matcher that skipping this subpattern entirely is a
valid branch.
A subpattern with an indefinite maximum repetition is replicated in the
compiled data its minimum number of times (or once with a BRAZERO if the
minimum is zero), with the final copy terminating with a KETRMIN or KETRMAX as
appropriate.
A subpattern with a bounded maximum repetition is replicated in a nested
fashion up to the maximum number of times, with BRAZERO or BRAMINZERO before
each replication after the minimum, so that, for example, (abc){2,5} is
compiled as (abc)(abc)((abc)((abc)(abc)?)?)?. The 200-bracket limit does not
apply to these internally generated brackets.
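The nesting rule can be checked by generating the textual shape of the replicated brackets: min plain copies, then max - min optional copies, each nested inside the previous one so that it is attempted only if the previous copy matched. expand_repeat() is a hypothetical helper that only writes this shape as text; PCRE itself emits the corresponding opcodes, with BRAZERO or BRAMINZERO where each ? appears.

```c
#include <stddef.h>
#include <string.h>

/* Append s to buf, keeping room for the terminator. Requires *used < size.
   Returns 0 if it does not fit. */
static int put(char *buf, size_t size, size_t *used, const char *s)
{
  size_t len = strlen(s);
  if (len >= size - *used) return 0;
  memcpy(buf + *used, s, len + 1);
  *used += len;
  return 1;
}

/* Write the bracket shape for (x){min,max}, 0 <= min <= max. Returns the
   length written, or 0 if buf is too small. */
static size_t expand_repeat(const char *x, int min, int max,
                            char *buf, size_t size)
{
  size_t used = 0;
  int i, opt = max - min;
  if (size == 0) return 0;
  buf[0] = 0;
  for (i = 0; i < min; i++)              /* mandatory copies */
    if (!put(buf, size, &used, "(") || !put(buf, size, &used, x) ||
        !put(buf, size, &used, ")")) return 0;
  for (i = 0; i < opt - 1; i++)          /* open the nested optional groups */
    if (!put(buf, size, &used, "((") || !put(buf, size, &used, x) ||
        !put(buf, size, &used, ")")) return 0;
  if (opt > 0 &&                         /* innermost optional copy */
      (!put(buf, size, &used, "(") || !put(buf, size, &used, x) ||
       !put(buf, size, &used, ")?"))) return 0;
  for (i = 0; i < opt - 1; i++)          /* close them, each optional */
    if (!put(buf, size, &used, ")?")) return 0;
  return used;
}
```

expand_repeat("abc", 2, 5, ...) gives (abc)(abc)((abc)((abc)(abc)?)?)?, the same shape as the example above.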
Assertions
----------
Forward assertions are just like other subpatterns, but starting with one of
the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
is OP_REVERSE, followed by a two-byte count of the number of characters to move
back the pointer in the subject string. A separate count is present in each
alternative of a lookbehind assertion, allowing them to have different fixed
lengths.
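The effect of OP_REVERSE at the start of a lookbehind alternative can be sketched as a bounds-checked step backwards in the subject. reverse_start() is hypothetical, and reading the count most significant byte first is an assumption made by analogy with the other two-byte values.

```c
#include <stddef.h>

/* Move the current subject position back by the fixed length stored after
   OP_REVERSE, before the alternative is matched forwards. Returns the new
   position, or NULL if that would go before the start of the subject, in
   which case the alternative fails. */
static const char *reverse_start(const char *subject, const char *current,
                                 const unsigned char count_bytes[2])
{
  size_t count = ((size_t)count_bytes[0] << 8) | count_bytes[1];
  if ((size_t)(current - subject) < count) return NULL;
  return current - count;
}
```

Because each alternative carries its own count, a lookbehind such as (?<=ab|xyz) can step back by 2 or by 3 depending on which branch is being tried.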
Once-only subpatterns
---------------------
These are also just like other subpatterns, but they start with the opcode
OP_ONCE.
Conditional subpatterns
-----------------------
These are like other subpatterns, but they start with the opcode OP_COND. If
the condition is a back reference, this is stored at the start of the
subpattern using the opcode OP_CREF followed by one byte containing the
reference number. Otherwise, a conditional subpattern will always start with
one of the assertions.
Changing options
----------------
If any of the /i, /m, or /s options are changed within a parenthesized group,
an OP_OPT opcode is compiled, followed by one byte containing the new settings
of these flags. If there are several alternatives in a group, there is an
occurrence of OP_OPT at the start of all those following the first options
change, to set appropriate options for the start of the alternative.
Immediately after the end of the group there is another such item to reset the
flags to their previous values. Other changes of flag within the pattern can be
handled entirely at compile time, and so do not cause anything to be put into
the compiled data.
Philip Hazel
January 1999

@ -1,146 +0,0 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* This file is automatically written by the dftables auxiliary
program. If you edit it by hand, you might like to edit the Makefile to
prevent its ever being regenerated.
This file is #included in the compilation of pcre.c to build the default
character tables which are used when no tables are passed to the compile
function. */
static unsigned char pcre_default_tables[] = {
/* This table is a lower casing table. */
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59, 60, 61, 62, 63,
64, 97, 98, 99,100,101,102,103,
104,105,106,107,108,109,110,111,
112,113,114,115,116,117,118,119,
120,121,122, 91, 92, 93, 94, 95,
96, 97, 98, 99,100,101,102,103,
104,105,106,107,108,109,110,111,
112,113,114,115,116,117,118,119,
120,121,122,123,124,125,126,127,
128,129,130,131,132,133,134,135,
136,137,138,139,140,141,142,143,
144,145,146,147,148,149,150,151,
152,153,154,155,156,157,158,159,
160,161,162,163,164,165,166,167,
168,169,170,171,172,173,174,175,
176,177,178,179,180,181,182,183,
184,185,186,187,188,189,190,191,
192,193,194,195,196,197,198,199,
200,201,202,203,204,205,206,207,
208,209,210,211,212,213,214,215,
216,217,218,219,220,221,222,223,
224,225,226,227,228,229,230,231,
232,233,234,235,236,237,238,239,
240,241,242,243,244,245,246,247,
248,249,250,251,252,253,254,255,
/* This table is a case flipping table. */
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59, 60, 61, 62, 63,
64, 97, 98, 99,100,101,102,103,
104,105,106,107,108,109,110,111,
112,113,114,115,116,117,118,119,
120,121,122, 91, 92, 93, 94, 95,
96, 65, 66, 67, 68, 69, 70, 71,
72, 73, 74, 75, 76, 77, 78, 79,
80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90,123,124,125,126,127,
128,129,130,131,132,133,134,135,
136,137,138,139,140,141,142,143,
144,145,146,147,148,149,150,151,
152,153,154,155,156,157,158,159,
160,161,162,163,164,165,166,167,
168,169,170,171,172,173,174,175,
176,177,178,179,180,181,182,183,
184,185,186,187,188,189,190,191,
192,193,194,195,196,197,198,199,
200,201,202,203,204,205,206,207,
208,209,210,211,212,213,214,215,
216,217,218,219,220,221,222,223,
224,225,226,227,228,229,230,231,
232,233,234,235,236,237,238,239,
240,241,242,243,244,245,246,247,
248,249,250,251,252,253,254,255,
/* This table contains bit maps for digits, 'word' chars, and white
space. Each map is 32 bytes long and the bits run from the least
significant end of each byte. */
0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03,
0xfe,0xff,0xff,0x87,0xfe,0xff,0xff,0x07,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x3e,0x00,0x00,0x01,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 ,
/* This table identifies various classes of character by individual bits:
0x01 white space character
0x02 letter
0x04 decimal digit
0x08 hexadecimal digit
0x10 alphanumeric or '_'
0x80 regular expression metacharacter or binary zero
*/
0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 0- 7 */
0x00,0x01,0x01,0x01,0x01,0x01,0x00,0x00, /* 8- 15 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 16- 23 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 24- 31 */
0x01,0x00,0x00,0x00,0x80,0x00,0x00,0x00, /* - ' */
0x80,0x80,0x80,0x80,0x00,0x00,0x80,0x00, /* ( - / */
0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c, /* 0 - 7 */
0x1c,0x1c,0x00,0x00,0x00,0x00,0x00,0x80, /* 8 - ? */
0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* @ - G */
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* H - O */
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* P - W */
0x12,0x12,0x12,0x80,0x00,0x00,0x80,0x10, /* X - _ */
0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* ` - g */
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* h - o */
0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* p - w */
0x12,0x12,0x12,0x80,0x80,0x00,0x00,0x00, /* x -127 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 128-135 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 136-143 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 144-151 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 152-159 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 160-167 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 168-175 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 176-183 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 184-191 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 192-199 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 200-207 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 208-215 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 216-223 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 224-231 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 232-239 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 240-247 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};/* 248-255 */
/* End of chartables.c */
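The layout of pcre_default_tables can be read off the file above: a 256-byte lower casing table, a 256-byte case flipping table, three 32-byte bit maps, and a 256-byte character-type table, which therefore starts at offset 608. The sketch below locates the sub-tables and tests the type bits listed in the comment before the last table. The offset and bit names are hypothetical (PCRE's internal header has its own), and the values are derived only from the layout shown here.

```c
/* Offsets of the four sub-tables (hypothetical names). */
enum {
  LCC_OFF = 0,                  /* lower casing table */
  FCC_OFF = LCC_OFF + 256,      /* case flipping table */
  CBITS_OFF = FCC_OFF + 256,    /* digit, word, and space bit maps */
  CTYPES_OFF = CBITS_OFF + 3 * 32
};

/* Property bits, as listed in the comment before the last table. */
enum {
  CT_SPACE = 0x01, CT_LETTER = 0x02, CT_DIGIT = 0x04,
  CT_XDIGIT = 0x08, CT_WORD = 0x10, CT_META = 0x80
};

/* Nonzero if character c has any of the given property bits. */
static int has_ctype(const unsigned char *tables, unsigned char c, int bits)
{
  return (tables[CTYPES_OFF + c] & bits) != 0;
}

/* Lower-case c using the first table. */
static unsigned char to_lower(const unsigned char *tables, unsigned char c)
{
  return tables[LCC_OFF + c];
}
```

In the default tables, '7' has the entry 0x1c (digit, hex digit, word) and '_' has 0x10 (word only), so has_ctype() reports '_' as a word character but not a letter.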

@ -1,146 +0,0 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/*
PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by: Philip Hazel <ph10@cam.ac.uk>
Copyright (c) 1997-1999 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
computer system, and to redistribute it freely, subject to the following
restrictions:
1. This software is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2. The origin of this software must not be misrepresented, either by
explicit claim or by omission.
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
4. If PCRE is embedded in any software that is released under the GNU
General Purpose Licence (GPL), then the terms of that licence shall
supersede any condition above with which it is incompatible.
-----------------------------------------------------------------------------
See the file Tech.Notes for some information on the internals.
*/
/* This is a support program to generate the file chartables.c, containing
character tables of various kinds. They are built according to the default C
locale and used as the default tables by PCRE. Now that pcre_maketables is
a function visible to the outside world, we make use of its code from here in
order to be consistent. */
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include "internal.h"
#define DFTABLES /* maketables.c notices this */
#include "maketables.c"
int main(void)
{
int i;
unsigned const char *tables = pcre_maketables();
printf(
"/*************************************************\n"
"* Perl-Compatible Regular Expressions *\n"
"*************************************************/\n\n"
"/* This file is automatically written by the dftables auxiliary \n"
"program. If you edit it by hand, you might like to edit the Makefile to \n"
"prevent its ever being regenerated.\n\n"
"This file is #included in the compilation of pcre.c to build the default\n"
"character tables which are used when no tables are passed to the compile\n"
"function. */\n\n"
"static unsigned char pcre_default_tables[] = {\n\n"
"/* This table is a lower casing table. */\n\n");
printf(" ");
for (i = 0; i < 256; i++)
{
if ((i & 7) == 0 && i != 0) printf("\n ");
printf("%3d", *tables++);
if (i != 255) printf(",");
}
printf(",\n\n");
printf("/* This table is a case flipping table. */\n\n");
printf(" ");
for (i = 0; i < 256; i++)
{
if ((i & 7) == 0 && i != 0) printf("\n ");
printf("%3d", *tables++);
if (i != 255) printf(",");
}
printf(",\n\n");
printf(
"/* This table contains bit maps for digits, 'word' chars, and white\n"
"space. Each map is 32 bytes long and the bits run from the least\n"
"significant end of each byte. */\n\n");
printf(" ");
for (i = 0; i < cbit_length; i++)
{
if ((i & 7) == 0 && i != 0)
{
if ((i & 31) == 0) printf("\n");
printf("\n ");
}
printf("0x%02x", *tables++);
if (i != cbit_length - 1) printf(",");
}
printf(" ,\n\n");
printf(
"/* This table identifies various classes of character by individual bits:\n"
" 0x%02x white space character\n"
" 0x%02x letter\n"
" 0x%02x decimal digit\n"
" 0x%02x hexadecimal digit\n"
" 0x%02x alphanumeric or '_'\n"
" 0x%02x regular expression metacharacter or binary zero\n*/\n\n",
ctype_space, ctype_letter, ctype_digit, ctype_xdigit, ctype_word,
ctype_meta);
printf(" ");
for (i = 0; i < 256; i++)
{
if ((i & 7) == 0 && i != 0)
{
printf(" /* ");
if (isprint(i-8)) printf(" %c -", i-8);
else printf("%3d-", i-8);
if (isprint(i-1)) printf(" %c ", i-1);
else printf("%3d", i-1);
printf(" */\n ");
}
printf("0x%02x", *tables++);
if (i != 255) printf(",");
}
printf("};/* ");
if (isprint(i-8)) printf(" %c -", i-8);
else printf("%3d-", i-8);
if (isprint(i-1)) printf(" %c ", i-1);
else printf("%3d", i-1);
printf(" */\n\n/* End of chartables.c */\n");
return 0;
}
/* End of dftables.c */

@ -1,60 +0,0 @@
# dll.mk - auxiliary Makefile to easily build DLLs for the mingw32 target
# ver. 0.6 of 1999-03-25
#
# Homepage of this makefile - http://www.is.lg.ua/~paul/devel/
# Homepage of original mingw32 project -
# http://www.fu.is.saga-u.ac.jp/~colin/gcc.html
#
# How to use:
# This makefile can:
# 1. Automatically create a .def file from a list of objects
# 2. Create a .dll from objects and a .def file, either the automatic one or
# your own hand-written file, which must have the same basename as the dll
# WARNING! There MUST be an object whose name matches the dll's name (a
# limitation of make).
# 3. Create an import library from the .def file (as for the .dll, only its
# name is required, not the dll itself)
# By convention, import libraries for a dll have the .dll.a suffix, e.g.
# libstuff.dll.a, because libstuff.a is the name for the static library.
# The process is divided into 3 phases because:
# 1. a pre-existing .def file is possible
# 2. generating the import library is fairly time-consuming
#
# Variables:
# DLL_LDLIBS - libs for linking dll
# DLL_LDFLAGS - flags for linking dll
#
# By using $(DLL_SUFFIX) instead of 'dll', e.g. stuff.$(DLL_SUFFIX)
# you may help porting makefiles to other platforms
#
# Put this file in your make's include path (e.g. the main include dir; for
# more information see the include section of the make documentation). Put
# the line "include dll.mk" at the beginning of your own Makefile. Specify
# dependencies, e.g.:
#
# Do all stuff in one step
# libstuff.dll.a: $(OBJECTS) stuff.def
# stuff.def: $(OBJECTS)
#
# Steps separated, pre-provided .def, link with user32
#
# DLL_LDLIBS=-luser32
# stuff.dll: $(OBJECTS)
# libstuff.dll.a: $(OBJECTS)
DLLWRAP=dllwrap
DLLTOOL=dlltool
DLL_SUFFIX=dll
.SUFFIXES: .o .$(DLL_SUFFIX)
_%.def: %.o
$(DLLTOOL) --export-all --output-def $@ $^
%.$(DLL_SUFFIX): %.o
$(DLLWRAP) --dllname $(notdir $@) --driver-name $(CC) --def $*.def -o $@ $(filter %.o,$^) $(DLL_LDFLAGS) $(DLL_LDLIBS)
lib%.$(DLL_SUFFIX).a:%.def
$(DLLTOOL) --dllname $(notdir $*.dll) --def $< --output-lib $@
# End

@ -1,189 +0,0 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/*
This is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language. See
the file Tech.Notes for some information on the internals.
Written by: Philip Hazel <ph10@cam.ac.uk>
Copyright (c) 1997-1999 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
computer system, and to redistribute it freely, subject to the following
restrictions:
1. This software is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2. The origin of this software must not be misrepresented, either by
explicit claim or by omission.
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
4. If PCRE is embedded in any software that is released under the GNU
General Purpose Licence (GPL), then the terms of that licence shall
supersede any condition above with which it is incompatible.
-----------------------------------------------------------------------------
*/
/* This module contains some convenience functions for extracting substrings
from the subject string after a regex match has succeeded. The original idea
for these functions came from Scott Wimer <scottw@cgibuilder.com>. */
/* Include the internals header, which itself includes Standard C headers plus
the external pcre header. */
#include "internal.h"
/*************************************************
* Copy captured string to given buffer *
*************************************************/
/* This function copies a single captured substring into a given buffer.
Note that we use memcpy() rather than strncpy() in case there are binary zeros
in the string.
Arguments:
subject the subject string that was matched
ovector pointer to the offsets table
stringcount the number of substrings that were captured
(i.e. the yield of the pcre_exec call, unless
that was zero, in which case it should be 1/3
of the offset table size)
stringnumber the number of the required substring
buffer where to put the substring
size the size of the buffer
Returns: if successful:
the length of the copied string, not including the zero
that is put on the end; can be zero
if not successful:
PCRE_ERROR_NOMEMORY (-6) buffer too small
PCRE_ERROR_NOSUBSTRING (-7) no such captured substring
*/
int
pcre_copy_substring(const char *subject, int *ovector, int stringcount,
int stringnumber, char *buffer, int size)
{
int yield;
if (stringnumber < 0 || stringnumber >= stringcount)
return PCRE_ERROR_NOSUBSTRING;
stringnumber *= 2;
yield = ovector[stringnumber+1] - ovector[stringnumber];
if (size < yield + 1) return PCRE_ERROR_NOMEMORY;
memcpy(buffer, subject + ovector[stringnumber], yield);
buffer[yield] = 0;
return yield;
}
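The offset vector stores one (start, end) pair per substring. Substring n therefore occupies ovector[2n] and ovector[2n+1], and pair 0 is the whole match. The sketch below mirrors pcre_copy_substring() so that it runs without the library.

Some parts are assumptions. The offsets are filled in by hand to stand in for a real pcre_exec() result. The names demo_copy_substring and DEMO_ERROR_* are hypothetical, and the constants reuse the -6 and -7 values documented above.

```c
#include <string.h>

/* Hypothetical stand-ins for PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING. */
#define DEMO_ERROR_NOMEMORY    (-6)
#define DEMO_ERROR_NOSUBSTRING (-7)

/* Hypothetical re-implementation of pcre_copy_substring(): substring n is
   subject[ovector[2n]] up to (but not including) subject[ovector[2n+1]]. */
int demo_copy_substring(const char *subject, const int *ovector,
                        int stringcount, int stringnumber,
                        char *buffer, int size)
{
  int len;
  if (stringnumber < 0 || stringnumber >= stringcount)
    return DEMO_ERROR_NOSUBSTRING;
  len = ovector[2*stringnumber + 1] - ovector[2*stringnumber];
  if (size < len + 1) return DEMO_ERROR_NOMEMORY;   /* need room for the zero */
  memcpy(buffer, subject + ovector[2*stringnumber], len);  /* binary-safe */
  buffer[len] = 0;
  return len;
}
```

As an example, take the subject "date: 1999-08-31" with the offsets {6,16, 6,10, 11,13, 14,16}. These are what a pattern such as (\d+)-(\d+)-(\d+) would be expected to produce. Asking for substring 1 copies "1999" and returns 4. A 10-byte buffer is too small for substring 0, because the terminating zero needs an eleventh byte.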
/*************************************************
* Copy all captured strings to new store *
*************************************************/
/* This function gets one chunk of store and builds a list of pointers and all
of the captured substrings in it. A NULL pointer is put on the end of the list.
Arguments:
subject the subject string that was matched
ovector pointer to the offsets table
stringcount the number of substrings that were captured
(i.e. the yield of the pcre_exec call, unless
that was zero, in which case it should be 1/3
of the offset table size)
listptr set to point to the list of pointers
Returns: if successful: 0
if not successful:
PCRE_ERROR_NOMEMORY (-6) failed to get store
*/
int
pcre_get_substring_list(const char *subject, int *ovector, int stringcount,
const char ***listptr)
{
int i;
int size = sizeof(char *);
int double_count = stringcount * 2;
char **stringlist;
char *p;
for (i = 0; i < double_count; i += 2)
size += sizeof(char *) + ovector[i+1] - ovector[i] + 1;
stringlist = (char **)(pcre_malloc)(size);
if (stringlist == NULL) return PCRE_ERROR_NOMEMORY;
*listptr = (const char **)stringlist;
p = (char *)(stringlist + stringcount + 1);
for (i = 0; i < double_count; i += 2)
{
int len = ovector[i+1] - ovector[i];
memcpy(p, subject + ovector[i], len);
*stringlist++ = p;
p += len;
*p++ = 0;
}
*stringlist = NULL;
return 0;
}
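pcre_get_substring_list() places the pointer array and the string bytes in one chunk, so the caller can release everything with a single call. The standalone sketch below reproduces that layout.

It uses plain malloc() in place of pcre_malloc, an assumption made so that it runs alone. The name demo_get_substring_list is hypothetical. The pointers come first, which keeps them suitably aligned because the block comes straight from the allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical re-implementation of the single-block layout:
   [ptr 0][ptr 1]...[NULL][string 0 \0][string 1 \0]...
   Returns 0 on success or -6 (as PCRE_ERROR_NOMEMORY) if malloc() fails. */
int demo_get_substring_list(const char *subject, const int *ovector,
                            int stringcount, const char ***listptr)
{
  int i;
  size_t size = sizeof(char *);                /* the terminating NULL */
  char **stringlist;
  char *p;

  for (i = 0; i < stringcount; i++)            /* pointer + bytes + zero */
    size += sizeof(char *) + (size_t)(ovector[2*i+1] - ovector[2*i]) + 1;

  stringlist = malloc(size);
  if (stringlist == NULL) return -6;
  *listptr = (const char **)stringlist;

  p = (char *)(stringlist + stringcount + 1);  /* strings follow the pointers */
  for (i = 0; i < stringcount; i++)
  {
    int len = ovector[2*i+1] - ovector[2*i];
    memcpy(p, subject + ovector[2*i], len);
    stringlist[i] = p;
    p += len;
    *p++ = 0;
  }
  stringlist[stringcount] = NULL;
  return 0;
}
```

In a program that uses PCRE itself, the list would presumably be released with pcre_free(), which matches whatever pcre_malloc was set to.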
/*************************************************
* Copy captured string to new store *
*************************************************/
/* This function copies a single captured substring into a piece of new
store obtained via pcre_malloc().
Arguments:
subject the subject string that was matched
ovector pointer to the offsets table
stringcount the number of substrings that were captured
(i.e. the yield of the pcre_exec call, unless
that was zero, in which case it should be 1/3
of the offset table size)
stringnumber the number of the required substring
stringptr where to put a pointer to the substring
Returns: if successful:
the length of the string, not including the zero that
is put on the end; can be zero
if not successful:
PCRE_ERROR_NOMEMORY (-6) failed to get store
PCRE_ERROR_NOSUBSTRING (-7) substring not present
*/
int
pcre_get_substring(const char *subject, int *ovector, int stringcount,
int stringnumber, const char **stringptr)
{
int yield;
char *substring;
if (stringnumber < 0 || stringnumber >= stringcount)
return PCRE_ERROR_NOSUBSTRING;
stringnumber *= 2;
yield = ovector[stringnumber+1] - ovector[stringnumber];
substring = (char *)(pcre_malloc)(yield + 1);
if (substring == NULL) return PCRE_ERROR_NOMEMORY;
memcpy(substring, subject + ovector[stringnumber], yield);
substring[yield] = 0;
*stringptr = substring;
return yield;
}
/* End of get.c */

@ -1,343 +0,0 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* This is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language. See
the file Tech.Notes for some information on the internals.
Written by: Philip Hazel <ph10@cam.ac.uk>
Copyright (c) 1997-1999 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
computer system, and to redistribute it freely, subject to the following
restrictions:
1. This software is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2. The origin of this software must not be misrepresented, either by
explicit claim or by omission.
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
4. If PCRE is embedded in any software that is released under the GNU
General Purpose Licence (GPL), then the terms of that licence shall
supersede any condition above with which it is incompatible.
-----------------------------------------------------------------------------
*/
/* This header contains definitions that are shared between the different
modules, but which are not relevant to the outside. */
/* To cope with SunOS4 and other systems that lack memmove() but have bcopy(),
define a macro for memmove() if USE_BCOPY is defined. */
#ifdef USE_BCOPY
#undef memmove /* some systems may have a macro */
#define memmove(a, b, c) bcopy(b, a, c)
#endif
/* Standard C headers plus the external interface definition */
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "pcre.h"
/* In case there is no definition of offsetof() provided - though any proper
Standard C system should have one. */
#ifndef offsetof
#define offsetof(p_type,field) ((size_t)&(((p_type *)0)->field))
#endif
/* These are the public options that can change during matching. */
#define PCRE_IMS (PCRE_CASELESS|PCRE_MULTILINE|PCRE_DOTALL)
/* Private options flags start at the most significant end of the four bytes,
but skip the top bit so we can use ints for convenience without getting tangled
with negative values. The public options defined in pcre.h start at the least
significant end. Make sure they don't overlap, though now that we have expanded
to four bytes there is plenty of space. */
#define PCRE_FIRSTSET 0x40000000 /* first_char is set */
#define PCRE_REQCHSET 0x20000000 /* req_char is set */
#define PCRE_STARTLINE 0x10000000 /* start after \n for multiline */
#define PCRE_INGROUP 0x08000000 /* compiling inside a group */
#define PCRE_ICHANGED 0x04000000 /* i option changes within regex */
/* Options for the "extra" block produced by pcre_study(). */
#define PCRE_STUDY_MAPPED 0x01 /* a map of starting chars exists */
/* Masks for identifying the public options which are permitted at compile
time, run time or study time, respectively. */
#define PUBLIC_OPTIONS \
(PCRE_CASELESS|PCRE_EXTENDED|PCRE_ANCHORED|PCRE_MULTILINE| \
PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY)
#define PUBLIC_EXEC_OPTIONS \
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY)
#define PUBLIC_STUDY_OPTIONS 0 /* None defined */
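With these masks, option checking is a single AND with the complement of the permitted set. The sketch below shows this.

It copies the public option values from pcre.h and the private flags from this header. It asserts at compile time that the two ranges do not overlap, and it rejects unknown bits. DEMO_PRIVATE_OPTIONS and demo_check_options are hypothetical.

Returning -3 (the value of PCRE_ERROR_BADOPTION) in every case is a simplification. At compile time the library presumably reports the ERR17 text instead.

```c
/* Public option values copied from pcre.h. */
#define PCRE_CASELESS       0x0001
#define PCRE_MULTILINE      0x0002
#define PCRE_DOTALL         0x0004
#define PCRE_EXTENDED       0x0008
#define PCRE_ANCHORED       0x0010
#define PCRE_DOLLAR_ENDONLY 0x0020
#define PCRE_EXTRA          0x0040
#define PCRE_NOTBOL         0x0080
#define PCRE_NOTEOL         0x0100
#define PCRE_UNGREEDY       0x0200
#define PCRE_NOTEMPTY       0x0400

/* Private flags copied from internal.h. */
#define PCRE_FIRSTSET  0x40000000
#define PCRE_REQCHSET  0x20000000
#define PCRE_STARTLINE 0x10000000
#define PCRE_INGROUP   0x08000000
#define PCRE_ICHANGED  0x04000000

#define PUBLIC_OPTIONS \
  (PCRE_CASELESS|PCRE_EXTENDED|PCRE_ANCHORED|PCRE_MULTILINE| \
   PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY)
#define PUBLIC_EXEC_OPTIONS \
  (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY)

/* Hypothetical grouping of the private bits, used for the overlap check. */
#define DEMO_PRIVATE_OPTIONS \
  (PCRE_FIRSTSET|PCRE_REQCHSET|PCRE_STARTLINE|PCRE_INGROUP|PCRE_ICHANGED)

/* C89-style compile-time assertion: the array size becomes -1, which is a
   compile error, if any public bit collides with a private one. */
typedef char demo_no_overlap
  [((PUBLIC_OPTIONS | PUBLIC_EXEC_OPTIONS) & DEMO_PRIVATE_OPTIONS) == 0 ? 1 : -1];

/* Hypothetical check: 0 if `options` uses only bits in `allowed`, else -3. */
int demo_check_options(int options, int allowed)
{
  return (options & ~allowed) != 0 ? -3 : 0;
}
```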
/* Magic number to provide a small check against being handed junk. */
#define MAGIC_NUMBER 0x50435245UL /* 'PCRE' */
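The magic number is the four ASCII letters 'P', 'C', 'R', 'E' packed into one value. A pointer to something that is not a compiled pattern is therefore unlikely to carry it.

Here is a hedged sketch of the check. It uses a cut-down hypothetical structure that holds only that field. The return values are borrowed from PCRE_ERROR_NULL (-2) and PCRE_ERROR_BADMAGIC (-4).

```c
#include <stddef.h>

#define MAGIC_NUMBER 0x50435245UL        /* copied from internal.h */

/* Hypothetical cut-down block: only the field being checked. */
typedef struct demo_pcre { unsigned long magic_number; } demo_pcre;

/* Hypothetical check in the spirit of the library's sanity test. */
int demo_check_magic(const demo_pcre *re)
{
  if (re == NULL) return -2;             /* as PCRE_ERROR_NULL */
  return re->magic_number == MAGIC_NUMBER ? 0 : -4;   /* as PCRE_ERROR_BADMAGIC */
}

/* Rebuilds the constant from its letters to show where it comes from. */
unsigned long demo_magic_from_letters(void)
{
  return ((unsigned long)'P' << 24) | ((unsigned long)'C' << 16) |
         ((unsigned long)'R' << 8)  |  (unsigned long)'E';
}
```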
/* Miscellaneous definitions */
typedef int BOOL;
#define FALSE 0
#define TRUE 1
/* These are escaped items that aren't just an encoding of a particular data
value such as \n. They must have non-zero values, as check_escape() returns
their negation. Also, they must appear in the same order as in the opcode
definitions below, up to ESC_z. The final one must be ESC_REF as subsequent
values are used for \1, \2, \3, etc. There is a test in the code for an escape
greater than ESC_b and less than ESC_Z to detect the types that may be
repeated. If any new escapes are put in-between that don't consume a character,
that code will have to change. */
enum { ESC_A = 1, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s, ESC_W, ESC_w,
ESC_Z, ESC_z, ESC_REF };
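The ordering rules in the comment above can be checked mechanically. This sketch copies the escape enumeration and the first twelve opcodes (OP_END to OP_EOD). It then demonstrates three properties the comments rely on:

- A negative value from the escape scanner means a special escape.
- Up to ESC_z, an escape value doubles as its opcode.
- The repeatable character types lie strictly between ESC_b and ESC_Z.

The demo_ functions are hypothetical helpers, not part of PCRE.

```c
/* Copied from internal.h: the escape codes ... */
enum { ESC_A = 1, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s, ESC_W, ESC_w,
       ESC_Z, ESC_z, ESC_REF };

/* ... and the prefix of the opcode list that must line up with them. */
enum { OP_END, OP_SOD, OP_NOT_WORD_BOUNDARY, OP_WORD_BOUNDARY,
       OP_NOT_DIGIT, OP_DIGIT, OP_NOT_WHITESPACE, OP_WHITESPACE,
       OP_NOT_WORDCHAR, OP_WORDCHAR, OP_EODN, OP_EOD };

/* check_escape() returns the negation of these codes, so a negative
   result marks a special escape and a non-negative one is a data value. */
int demo_is_special(int c) { return c < 0; }

/* \D \d \S \s \W \w consume a character and so may be repeated. */
int demo_is_repeatable(int esc) { return esc > ESC_b && esc < ESC_Z; }

/* Up to ESC_z the escape value is also the opcode; -1 otherwise. */
int demo_opcode_for(int esc)
{
  return (esc >= ESC_A && esc <= ESC_z) ? esc : -1;
}
```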
/* Opcode table: OP_BRA must be last, as all values >= it are used for brackets
that extract substrings. Starting from 1 (i.e. after OP_END), the values up to
OP_EOD must correspond in order to the list of escapes immediately above. */
enum {
OP_END, /* End of pattern */
/* Values corresponding to backslashed metacharacters */
OP_SOD, /* Start of data: \A */
OP_NOT_WORD_BOUNDARY, /* \B */
OP_WORD_BOUNDARY, /* \b */
OP_NOT_DIGIT, /* \D */
OP_DIGIT, /* \d */
OP_NOT_WHITESPACE, /* \S */
OP_WHITESPACE, /* \s */
OP_NOT_WORDCHAR, /* \W */
OP_WORDCHAR, /* \w */
OP_EODN, /* End of data or \n at end of data: \Z. */
OP_EOD, /* End of data: \z */
OP_OPT, /* Set runtime options */
OP_CIRC, /* Start of line - varies with multiline switch */
OP_DOLL, /* End of line - varies with multiline switch */
OP_ANY, /* Match any character */
OP_CHARS, /* Match string of characters */
OP_NOT, /* Match anything but the following char */
OP_STAR, /* The maximizing and minimizing versions of */
OP_MINSTAR, /* all these opcodes must come in pairs, with */
OP_PLUS, /* the minimizing one second. */
OP_MINPLUS, /* This first set applies to single characters */
OP_QUERY,
OP_MINQUERY,
OP_UPTO, /* From 0 to n matches */
OP_MINUPTO,
OP_EXACT, /* Exactly n matches */
OP_NOTSTAR, /* The maximizing and minimizing versions of */
OP_NOTMINSTAR, /* all these opcodes must come in pairs, with */
OP_NOTPLUS, /* the minimizing one second. */
OP_NOTMINPLUS, /* This first set applies to "not" single characters */
OP_NOTQUERY,
OP_NOTMINQUERY,
OP_NOTUPTO, /* From 0 to n matches */
OP_NOTMINUPTO,
OP_NOTEXACT, /* Exactly n matches */
OP_TYPESTAR, /* The maximizing and minimizing versions of */
OP_TYPEMINSTAR, /* all these opcodes must come in pairs, with */
OP_TYPEPLUS, /* the minimizing one second. These codes must */
OP_TYPEMINPLUS, /* be in exactly the same order as those above. */
OP_TYPEQUERY, /* This set applies to character types such as \d */
OP_TYPEMINQUERY,
OP_TYPEUPTO, /* From 0 to n matches */
OP_TYPEMINUPTO,
OP_TYPEEXACT, /* Exactly n matches */
OP_CRSTAR, /* The maximizing and minimizing versions of */
OP_CRMINSTAR, /* all these opcodes must come in pairs, with */
OP_CRPLUS, /* the minimizing one second. These codes must */
OP_CRMINPLUS, /* be in exactly the same order as those above. */
OP_CRQUERY, /* These are for character classes and back refs */
OP_CRMINQUERY,
OP_CRRANGE, /* These are different from the three sets above. */
OP_CRMINRANGE,
OP_CLASS, /* Match a character class */
OP_REF, /* Match a back reference */
OP_ALT, /* Start of alternation */
OP_KET, /* End of group that doesn't have an unbounded repeat */
OP_KETRMAX, /* These two must remain together and in this */
OP_KETRMIN, /* order. They are for groups that repeat forever. */
/* The assertions must come before ONCE and COND */
OP_ASSERT, /* Positive lookahead */
OP_ASSERT_NOT, /* Negative lookahead */
OP_ASSERTBACK, /* Positive lookbehind */
OP_ASSERTBACK_NOT, /* Negative lookbehind */
OP_REVERSE, /* Move pointer back - used in lookbehind assertions */
/* ONCE and COND must come after the assertions, with ONCE first, as there's
a test for >= ONCE for a subpattern that isn't an assertion. */
OP_ONCE, /* Once matched, don't back up into the subpattern */
OP_COND, /* Conditional group */
OP_CREF, /* Used to hold an extraction string number */
OP_BRAZERO, /* These two must remain together and in this */
OP_BRAMINZERO, /* order. */
OP_BRA /* This and greater values are used for brackets that
extract substrings. */
};
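The enumeration follows two rules. Each maximizing repeat sits immediately before its minimizing partner. The single-character, "not" and character-type families also share one order. Together these let code move between related opcodes with plain arithmetic.

The sketch below illustrates that idea. It uses a copy of the three families renumbered from zero, since only their relative order matters. The demo_ helpers are hypothetical and not taken from the PCRE compiler.

```c
/* The three repeat families copied from the enumeration above,
   renumbered from 0; only their relative order matters here. */
enum {
  OP_STAR, OP_MINSTAR, OP_PLUS, OP_MINPLUS, OP_QUERY, OP_MINQUERY,
  OP_UPTO, OP_MINUPTO, OP_EXACT,
  OP_NOTSTAR, OP_NOTMINSTAR, OP_NOTPLUS, OP_NOTMINPLUS, OP_NOTQUERY,
  OP_NOTMINQUERY, OP_NOTUPTO, OP_NOTMINUPTO, OP_NOTEXACT,
  OP_TYPESTAR, OP_TYPEMINSTAR, OP_TYPEPLUS, OP_TYPEMINPLUS, OP_TYPEQUERY,
  OP_TYPEMINQUERY, OP_TYPEUPTO, OP_TYPEMINUPTO, OP_TYPEEXACT
};

/* Within a family (width 9), maximizing repeats sit at even offsets 0..6
   and their minimizing partner is the next value. Returns -1 for an
   opcode that has no minimizing form or is already minimizing. */
int demo_minimizing(int op)
{
  int offset = (op - OP_STAR) % (OP_NOTSTAR - OP_STAR);
  return (offset < OP_EXACT - OP_STAR && offset % 2 == 0) ? op + 1 : -1;
}

/* Same order in every family, so a fixed delta converts between them. */
int demo_as_type_repeat(int op) { return op - OP_STAR + OP_TYPESTAR; }
int demo_as_not_repeat(int op)  { return op - OP_STAR + OP_NOTSTAR; }
```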
/* The highest extraction number. This is limited by the number of opcodes
left after OP_BRA, i.e. 255 - OP_BRA. We actually set it somewhat lower. */
#define EXTRACT_MAX 99
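Counting the enumeration from OP_END = 0 puts OP_BRA at 69. A single opcode byte therefore has room for 255 - 69 = 186 bracket numbers, comfortably above EXTRACT_MAX.

The sketch below assumes that bracket n is encoded as the opcode byte OP_BRA + n. That encoding is inferred from the comment on OP_BRA, not confirmed by this header. demo_bracket_opcode and demo_bracket_number are hypothetical names.

```c
/* OP_BRA's value, obtained by counting the enumeration above with
   OP_END == 0; recount if the opcode list ever changes. */
#define DEMO_OP_BRA 69
#define EXTRACT_MAX 99          /* copied from internal.h */

/* Assumed encoding: bracket n is the opcode byte OP_BRA + n.
   Returns -1 if n is outside 0..EXTRACT_MAX. */
int demo_bracket_opcode(int n)
{
  if (n < 0 || n > EXTRACT_MAX) return -1;
  return DEMO_OP_BRA + n;
}

/* Inverse mapping: the bracket number for an opcode byte, or -1 if the
   byte is an ordinary (non-bracket) opcode. */
int demo_bracket_number(unsigned char op)
{
  return op >= DEMO_OP_BRA ? op - DEMO_OP_BRA : -1;
}
```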
/* The texts of compile-time error messages are defined as macros here so that
they can be accessed by the POSIX wrapper and converted into error codes. Yes,
I could have used error codes in the first place, but didn't feel like changing
just to accommodate the POSIX wrapper. */
#define ERR1 "\\ at end of pattern"
#define ERR2 "\\c at end of pattern"
#define ERR3 "unrecognized character follows \\"
#define ERR4 "numbers out of order in {} quantifier"
#define ERR5 "number too big in {} quantifier"
#define ERR6 "missing terminating ] for character class"
#define ERR7 "invalid escape sequence in character class"
#define ERR8 "range out of order in character class"
#define ERR9 "nothing to repeat"
#define ERR10 "operand of unlimited repeat could match the empty string"
#define ERR11 "internal error: unexpected repeat"
#define ERR12 "unrecognized character after (?"
#define ERR13 "too many capturing parenthesized sub-patterns"
#define ERR14 "missing )"
#define ERR15 "back reference to non-existent subpattern"
#define ERR16 "erroffset passed as NULL"
#define ERR17 "unknown option bit(s) set"
#define ERR18 "missing ) after comment"
#define ERR19 "too many sets of parentheses"
#define ERR20 "regular expression too large"
#define ERR21 "failed to get memory"
#define ERR22 "unmatched parentheses"
#define ERR23 "internal error: code overflow"
#define ERR24 "unrecognized character after (?<"
#define ERR25 "lookbehind assertion is not fixed length"
#define ERR26 "malformed number after (?("
#define ERR27 "conditional group contains more than two branches"
#define ERR28 "assertion expected after (?("
/* All character handling must be done as unsigned characters. Otherwise there
are problems with top-bit-set characters and functions such as isspace().
However, we leave the interface to the outside world as char *, because that
should make things easier for callers. We define a short type for unsigned char
to save lots of typing. I tried "uchar", but it causes problems on Digital
Unix, where it is defined in sys/types, so use "uschar" instead. */
typedef unsigned char uschar;
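The reason for uschar is that plain char may be signed. In Standard C, passing a negative value other than EOF to isspace() and the other <ctype.h> functions is undefined behaviour.

A minimal sketch of the idiom follows. demo_count_spaces is a hypothetical example function.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts whitespace bytes in a zero-terminated string. Reading through an
   unsigned char pointer (the "uschar" view) guarantees that a top-bit-set
   byte such as 0xA0 reaches isspace() as 160, not as a negative int. */
size_t demo_count_spaces(const char *s)
{
  const unsigned char *p = (const unsigned char *)s;
  size_t n = 0;
  for (; *p != 0; p++)
    if (isspace(*p)) n++;
  return n;
}
```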
/* The real format of the start of the pcre block; the actual code vector
runs on as long as necessary after the end. */
typedef struct real_pcre {
unsigned long int magic_number;
const unsigned char *tables;
unsigned long int options;
uschar top_bracket;
uschar top_backref;
uschar first_char;
uschar req_char;
uschar code[1];
} real_pcre;
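The one-element code[] array is the classic pre-C99 "struct hack". The block is allocated with room for the fixed fields plus the whole compiled program, and the program bytes run on past the declared end.

The sketch below copies the structure and shows that sizing with offsetof(). demo_alloc_pcre is hypothetical. C99 code would more usually declare a flexible array member (uschar code[];) for the same purpose.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char uschar;

/* Copy of the real_pcre layout above. */
typedef struct real_pcre {
  unsigned long int magic_number;
  const unsigned char *tables;
  unsigned long int options;
  uschar top_bracket;
  uschar top_backref;
  uschar first_char;
  uschar req_char;
  uschar code[1];
} real_pcre;

/* Hypothetical allocator: fixed header plus `length` bytes of code. */
real_pcre *demo_alloc_pcre(const uschar *code, size_t length)
{
  size_t size = offsetof(real_pcre, code) + length;
  real_pcre *re;
  uschar *dst;

  if (size < sizeof(real_pcre)) size = sizeof(real_pcre);  /* tiny programs */
  re = malloc(size);
  if (re == NULL) return NULL;
  memset(re, 0, offsetof(real_pcre, code));
  re->magic_number = 0x50435245UL;                  /* MAGIC_NUMBER */
  dst = (uschar *)re + offsetof(real_pcre, code);   /* start of code vector */
  memcpy(dst, code, length);
  return re;
}
```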
/* The real format of the extra block returned by pcre_study(). */
typedef struct real_pcre_extra {
uschar options;
uschar start_bits[32];
} real_pcre_extra;
/* Structure for passing "static" information around between the functions
doing the compiling, so that they are thread-safe. */
typedef struct compile_data {
const uschar *lcc; /* Points to lower casing table */
const uschar *fcc; /* Points to case-flipping table */
const uschar *cbits; /* Points to character type table */
const uschar *ctypes; /* Points to table of type maps */
} compile_data;
/* Structure for passing "static" information around between the functions
doing the matching, so that they are thread-safe. */
typedef struct match_data {
int errorcode; /* As it says */
int *offset_vector; /* Offset vector */
int offset_end; /* One past the end */
int offset_max; /* The maximum usable for return data */
const uschar *lcc; /* Points to lower casing table */
const uschar *ctypes; /* Points to table of type maps */
BOOL offset_overflow; /* Set if too many extractions */
BOOL notbol; /* NOTBOL flag */
BOOL noteol; /* NOTEOL flag */
BOOL endonly; /* Dollar not before final \n */
BOOL notempty; /* Empty string match not wanted */
const uschar *start_subject; /* Start of the subject string */
const uschar *end_subject; /* End of the subject string */
const uschar *start_match; /* Start of this match attempt */
const uschar *end_match_ptr; /* Subject position at end match */
int end_offset_top; /* Highwater mark at end of match */
} match_data;
/* Bit definitions for entries in the pcre_ctypes table. */
#define ctype_space 0x01
#define ctype_letter 0x02
#define ctype_digit 0x04
#define ctype_xdigit 0x08
#define ctype_word 0x10 /* alphameric or '_' */
#define ctype_meta 0x80 /* regexp meta char or zero (end pattern) */
/* Offsets for the bitmap tables in pcre_cbits. Each table contains a set
of bits for a class map. */
#define cbit_digit 0 /* for \d */
#define cbit_word 32 /* for \w */
#define cbit_space 64 /* for \s */
#define cbit_length 96 /* Length of the cbits table */
/* Offsets of the various tables from the base tables pointer, and
total length. */
#define lcc_offset 0
#define fcc_offset 256
#define cbits_offset 512
#define ctypes_offset (cbits_offset + cbit_length)
#define tables_length (ctypes_offset + 256)
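All four tables live in one block of tables_length (864) bytes. Each 32-byte class map in the cbits section holds one bit per character value: bit c&7 of byte c/8.

The sketch below fills only the lower-casing table and the \d map, using the same arithmetic as pcre_maketables(). It then looks characters up. demo_fill and demo_in_class are hypothetical helpers.

```c
#include <ctype.h>
#include <string.h>

/* Offsets copied from internal.h. */
#define cbit_digit     0
#define cbit_word     32
#define cbit_space    64
#define cbit_length   96
#define lcc_offset     0
#define fcc_offset   256
#define cbits_offset 512
#define ctypes_offset (cbits_offset + cbit_length)
#define tables_length (ctypes_offset + 256)

/* Hypothetical partial builder: lower-casing table plus the \d bitmap. */
void demo_fill(unsigned char *tables)
{
  int i;
  memset(tables, 0, tables_length);
  for (i = 0; i < 256; i++)
    tables[lcc_offset + i] = (unsigned char)tolower(i);
  for (i = 0; i < 256; i++)
    if (isdigit(i)) tables[cbits_offset + cbit_digit + i/8] |= 1 << (i&7);
}

/* Tests character c against the class map starting at offset `map`. */
int demo_in_class(const unsigned char *tables, int map, int c)
{
  return (tables[cbits_offset + map + c/8] & (1 << (c&7))) != 0;
}
```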
/* End of internal.h */

@ -1,113 +0,0 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/*
PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by: Philip Hazel <ph10@cam.ac.uk>
Copyright (c) 1997-1999 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
computer system, and to redistribute it freely, subject to the following
restrictions:
1. This software is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2. The origin of this software must not be misrepresented, either by
explicit claim or by omission.
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
4. If PCRE is embedded in any software that is released under the GNU
General Purpose Licence (GPL), then the terms of that licence shall
supersede any condition above with which it is incompatible.
-----------------------------------------------------------------------------
See the file Tech.Notes for some information on the internals.
*/
/* This file is compiled on its own as part of the PCRE library. However,
it is also included in the compilation of dftables.c, in which case the macro
DFTABLES is defined. */
#ifndef DFTABLES
#include "internal.h"
#endif
/*************************************************
* Create PCRE character tables *
*************************************************/
/* This function builds a set of character tables for use by PCRE and returns
a pointer to them. They are built using the ctype functions, and consequently
their contents will depend upon the current locale setting. When compiled as
part of the library, the store is obtained via pcre_malloc(), but when compiled
inside dftables, malloc() is used.
Arguments: none
Returns: pointer to the contiguous block of data
*/
unsigned const char *
pcre_maketables(void)
{
unsigned char *yield, *p;
int i;
#ifndef DFTABLES
yield = (unsigned char*)(pcre_malloc)(tables_length);
#else
yield = (unsigned char*)malloc(tables_length);
#endif
if (yield == NULL) return NULL;
p = yield;
/* First comes the lower casing table */
for (i = 0; i < 256; i++) *p++ = tolower(i);
/* Next the case-flipping table */
for (i = 0; i < 256; i++) *p++ = islower(i)? toupper(i) : tolower(i);
/* Then the character class tables */
memset(p, 0, cbit_length);
for (i = 0; i < 256; i++)
{
if (isdigit(i)) p[cbit_digit + i/8] |= 1 << (i&7);
if (isalnum(i) || i == '_')
p[cbit_word + i/8] |= 1 << (i&7);
if (isspace(i)) p[cbit_space + i/8] |= 1 << (i&7);
}
p += cbit_length;
/* Finally, the character type table */
for (i = 0; i < 256; i++)
{
int x = 0;
if (isspace(i)) x += ctype_space;
if (isalpha(i)) x += ctype_letter;
if (isdigit(i)) x += ctype_digit;
if (isxdigit(i)) x += ctype_xdigit;
if (isalnum(i) || i == '_') x += ctype_word;
if (strchr("*+?{^.$|()[", i) != 0) x += ctype_meta;
*p++ = x;
}
return yield;
}
/* End of maketables.c */

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -1,19 +0,0 @@
EXPORTS
pcre_malloc DATA
pcre_free DATA
pcre_compile
pcre_copy_substring
pcre_exec
pcre_get_substring
pcre_get_substring_list
pcre_info
pcre_maketables
pcre_study
pcre_version
regcomp
regexec
regerror
regfree

@ -1,96 +0,0 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* Copyright (c) 1997-1999 University of Cambridge */
#ifndef _PCRE_H
#define _PCRE_H
#define PCRE_MAJOR 2
#define PCRE_MINOR 08
#define PCRE_DATE 31-Aug-1999
#include "php_compat.h"
/* Win32 uses DLL by default */
#ifdef _WIN32
# ifdef STATIC
# define PCRE_DL_IMPORT
# else
# define PCRE_DL_IMPORT __declspec(dllimport)
# endif
#else
# define PCRE_DL_IMPORT
#endif
/* Have to include stdlib.h in order to ensure that size_t is defined;
it is needed here for malloc. */
#include <sys/types.h>
#include <stdlib.h>
/* Allow for C++ users */
#ifdef __cplusplus
extern "C" {
#endif
/* Options */
#define PCRE_CASELESS 0x0001
#define PCRE_MULTILINE 0x0002
#define PCRE_DOTALL 0x0004
#define PCRE_EXTENDED 0x0008
#define PCRE_ANCHORED 0x0010
#define PCRE_DOLLAR_ENDONLY 0x0020
#define PCRE_EXTRA 0x0040
#define PCRE_NOTBOL 0x0080
#define PCRE_NOTEOL 0x0100
#define PCRE_UNGREEDY 0x0200
#define PCRE_NOTEMPTY 0x0400
/* Exec-time and get-time error codes */
#define PCRE_ERROR_NOMATCH (-1)
#define PCRE_ERROR_NULL (-2)
#define PCRE_ERROR_BADOPTION (-3)
#define PCRE_ERROR_BADMAGIC (-4)
#define PCRE_ERROR_UNKNOWN_NODE (-5)
#define PCRE_ERROR_NOMEMORY (-6)
#define PCRE_ERROR_NOSUBSTRING (-7)
/* Types */
typedef void pcre;
typedef void pcre_extra;
/* Store get and free functions. These can be set to alternative malloc/free
functions if required. Some magic is required for Win32 DLL; on other systems
PCRE_DL_IMPORT expands to nothing. */
PCRE_DL_IMPORT extern void *(*pcre_malloc)(size_t);
PCRE_DL_IMPORT extern void (*pcre_free)(void *);
#undef PCRE_DL_IMPORT
/* Functions */
extern pcre *pcre_compile(const char *, int, const char **, int *,
const unsigned char *);
extern int pcre_copy_substring(const char *, int *, int, int, char *, int);
extern int pcre_exec(const pcre *, const pcre_extra *, const char *,
int, int, int, int *, int);
extern int pcre_get_substring(const char *, int *, int, int, const char **);
extern int pcre_get_substring_list(const char *, int *, int, const char ***);
extern int pcre_info(const pcre *, int *, int *);
extern unsigned const char *pcre_maketables(void);
extern pcre_extra *pcre_study(const pcre *, int, const char **);
extern const char *pcre_version(void);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* End of pcre.h */
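Because pcre_malloc and pcre_free are plain function-pointer variables, a program can swap in its own allocator before compiling any patterns, for example to count outstanding blocks.

To keep the sketch runnable without the library, it uses hypothetical stand-ins demo_malloc and demo_free with the same types. With PCRE linked in, the assignments would target pcre_malloc and pcre_free instead.

```c
#include <stdlib.h>

/* Hypothetical stand-ins with the same types as pcre_malloc / pcre_free. */
void *(*demo_malloc)(size_t) = malloc;
void (*demo_free)(void *) = free;

static size_t live_blocks = 0;   /* allocations not yet freed */

static void *counting_malloc(size_t n)
{
  void *p = malloc(n);
  if (p != NULL) live_blocks++;
  return p;
}

static void counting_free(void *p)
{
  if (p != NULL) live_blocks--;
  free(p);
}

/* With PCRE itself this would be: pcre_malloc = counting_malloc;
   pcre_free = counting_free; done before the first pcre_compile(). */
void demo_install_counting_allocator(void)
{
  demo_malloc = counting_malloc;
  demo_free = counting_free;
}
```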

@ -1,141 +0,0 @@
.TH PCRE 3
.SH NAME
pcreposix - POSIX API for Perl-compatible regular expressions.
.SH SYNOPSIS
.B #include <pcreposix.h>
.PP
.SM
.br
.B int regcomp(regex_t *\fIpreg\fR, const char *\fIpattern\fR,
.ti +5n
.B int \fIcflags\fR);
.PP
.br
.B int regexec(regex_t *\fIpreg\fR, const char *\fIstring\fR,
.ti +5n
.B size_t \fInmatch\fR, regmatch_t \fIpmatch\fR[], int \fIeflags\fR);
.PP
.br
.B size_t regerror(int \fIerrcode\fR, const regex_t *\fIpreg\fR,
.ti +5n
.B char *\fIerrbuf\fR, size_t \fIerrbuf_size\fR);
.PP
.br
.B void regfree(regex_t *\fIpreg\fR);
.SH DESCRIPTION
This set of functions provides a POSIX-style API to the PCRE regular expression
package. See the \fBpcre\fR documentation for a description of the native API,
which contains additional functionality.
The functions described here are just wrapper functions that ultimately call
the native API. Their prototypes are defined in the \fBpcreposix.h\fR header
file, and on Unix systems the library itself is called \fBpcreposix.a\fR, so it
can be accessed by adding \fB-lpcreposix\fR to the command for linking an
application which uses them. Because the POSIX functions call the native ones,
it is also necessary to add \fB-lpcre\fR.
As I am pretty ignorant about POSIX, these functions must be considered as
experimental. I have implemented only those option bits that can be reasonably
mapped to PCRE native options. Other POSIX options are not even defined. It may
be that it is useful to define, but ignore, other options. Feedback from more
knowledgeable folk may cause this kind of detail to change.
When PCRE is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
still those of Perl, subject to the setting of various PCRE options, as
described below.
The header for these functions is supplied as \fBpcreposix.h\fR to avoid any
potential clash with other POSIX libraries. It can, of course, be renamed or
aliased as \fBregex.h\fR, which is the "correct" name. It provides two
structure types, \fIregex_t\fR for compiled internal forms, and
\fIregmatch_t\fR for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting options and
identifying error codes.
.SH COMPILING A PATTERN
The function \fBregcomp()\fR is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument \fIpattern\fR. The \fIpreg\fR argument is a pointer
to a regex_t structure which is used as a base for storing information about
the compiled expression.
The argument \fIcflags\fR is either zero, or contains one or more of the bits
defined by the following macros:
REG_ICASE
The PCRE_CASELESS option is set when the expression is passed for compilation
to the native function.
REG_NEWLINE
The PCRE_MULTILINE option is set when the expression is passed for compilation
to the native function.
The yield of \fBregcomp()\fR is zero on success, and non-zero otherwise. The
\fIpreg\fR structure is filled in on success, and one member of the structure
is publicized: \fIre_nsub\fR contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.
.SH MATCHING A PATTERN
The function \fBregexec()\fR is called to match a pre-compiled pattern
\fIpreg\fR against a given \fIstring\fR, which is terminated by a zero byte,
subject to the options in \fIeflags\fR. These can be:
REG_NOTBOL
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.
REG_NOTEOL
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
The portion of the string that was matched, and also any captured substrings,
are returned via the \fIpmatch\fR argument, which points to an array of
\fInmatch\fR structures of type \fIregmatch_t\fR, containing the members
\fIrm_so\fR and \fIrm_eo\fR. These contain the offset to the first character of
each substring and the offset to the first character after the end of each
substring, respectively. The 0th element of the vector relates to the entire
portion of \fIstring\fR that was matched; subsequent elements relate to the
capturing subpatterns of the regular expression. Unused entries in the array
have both structure members set to -1.
A successful match yields a zero return; various error codes are defined in the
header file, of which REG_NOMATCH is the "expected" failure code.
.SH ERROR MESSAGES
The \fBregerror()\fR function maps a non-zero errorcode from either
\fBregcomp\fR or \fBregexec\fR to a printable message. If \fIpreg\fR is not
NULL, the error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in \fIerrbuf\fR. The length of the
message, including the zero, is limited to \fIerrbuf_size\fR. The yield of the
function is the size of buffer needed to hold the whole message.
.SH STORAGE
Compiling a regular expression causes memory to be allocated and associated
with the \fIpreg\fR structure. The function \fBregfree()\fR frees all such
memory, after which \fIpreg\fR may no longer be used as a compiled expression.
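Here is a minimal sketch of the compile, match, report and free sequence described above. So that it builds on its own, it uses the C library's <regex.h> as a stand-in for pcreposix.h. With PCRE you would include pcreposix.h and link with -lpcreposix -lpcre.

REG_EXTENDED is passed only because the system library needs it for ( ) grouping. Going by the text above, the PCRE wrapper does not define it, and its patterns follow Perl syntax regardless. demo_capture is a hypothetical helper.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

#define DEMO_NMATCH 10   /* whole match plus up to nine groups */

/* Hypothetical helper: compile `pattern`, match it against `subject`, and
   copy capture group `group` (0 = whole match) into `out`. Returns 0 on
   success, REG_NOMATCH if there is no match or the group did not take
   part, REG_ESPACE if `out` has no room, or the regcomp() error code. */
int demo_capture(const char *pattern, const char *subject, int group,
                 char *out, size_t outsz)
{
  regex_t re;
  regmatch_t pm[DEMO_NMATCH];
  int rc;

  if (out == NULL || outsz == 0) return REG_ESPACE;

  rc = regcomp(&re, pattern, REG_EXTENDED);
  if (rc != 0)
  {
    char msg[128];
    regerror(rc, &re, msg, sizeof msg);    /* message is cut to fit msg */
    fprintf(stderr, "regcomp failed: %s\n", msg);
    return rc;                              /* nothing to regfree() */
  }

  rc = regexec(&re, subject, DEMO_NMATCH, pm, 0);
  if (rc == 0)
  {
    if (group < 0 || group >= DEMO_NMATCH || pm[group].rm_so == -1)
      rc = REG_NOMATCH;                     /* unused entries hold -1 */
    else
    {
      size_t len = (size_t)(pm[group].rm_eo - pm[group].rm_so);
      if (len >= outsz) len = outsz - 1;    /* truncate to fit */
      memcpy(out, subject + pm[group].rm_so, len);
      out[len] = 0;
    }
  }
  regfree(&re);                             /* release the compiled form */
  return rc;
}
```

The pattern in the usage below, ([0-9]+)-([0-9]+), is valid in both Perl and POSIX extended syntax. Matching it against "call 555-0199 now" should leave "0199" in group 2.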
.SH AUTHOR
Philip Hazel <ph10@cam.ac.uk>
.br
University Computing Service,
.br
New Museums Site,
.br
Cambridge CB2 3QG, England.
.br
Phone: +44 1223 334714
Copyright (c) 1997-1999 University of Cambridge.

@ -1,182 +0,0 @@
<HTML>
<HEAD>
<TITLE>pcreposix specification</TITLE>
</HEAD>
<body bgcolor="#FFFFFF" text="#00005A">
<H1>pcreposix specification</H1>
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page in case the
conversion went wrong.
<UL>
<LI><A NAME="TOC1" HREF="#SEC1">NAME</A>
<LI><A NAME="TOC2" HREF="#SEC2">SYNOPSIS</A>
<LI><A NAME="TOC3" HREF="#SEC3">DESCRIPTION</A>
<LI><A NAME="TOC4" HREF="#SEC4">COMPILING A PATTERN</A>
<LI><A NAME="TOC5" HREF="#SEC5">MATCHING A PATTERN</A>
<LI><A NAME="TOC6" HREF="#SEC6">ERROR MESSAGES</A>
<LI><A NAME="TOC7" HREF="#SEC7">STORAGE</A>
<LI><A NAME="TOC8" HREF="#SEC8">AUTHOR</A>
</UL>
<LI><A NAME="SEC1" HREF="#TOC1">NAME</A>
<P>
pcreposix - POSIX API for Perl-compatible regular expressions.
</P>
<LI><A NAME="SEC2" HREF="#TOC1">SYNOPSIS</A>
<P>
<B>#include &#60;pcreposix.h&#62;</B>
</P>
<P>
<B>int regcomp(regex_t *<I>preg</I>, const char *<I>pattern</I>,</B>
<B>int <I>cflags</I>);</B>
</P>
<P>
<B>int regexec(regex_t *<I>preg</I>, const char *<I>string</I>,</B>
<B>size_t <I>nmatch</I>, regmatch_t <I>pmatch</I>[], int <I>eflags</I>);</B>
</P>
<P>
<B>size_t regerror(int <I>errcode</I>, const regex_t *<I>preg</I>,</B>
<B>char *<I>errbuf</I>, size_t <I>errbuf_size</I>);</B>
</P>
<P>
<B>void regfree(regex_t *<I>preg</I>);</B>
</P>
<LI><A NAME="SEC3" HREF="#TOC1">DESCRIPTION</A>
<P>
This set of functions provides a POSIX-style API to the PCRE regular expression
package. See the <B>pcre</B> documentation for a description of the native API,
which contains additional functionality.
</P>
<P>
The functions described here are just wrapper functions that ultimately call
the native API. Their prototypes are defined in the <B>pcreposix.h</B> header
file, and on Unix systems the library itself is called <B>pcreposix.a</B>, so it
can be accessed by adding <B>-lpcreposix</B> to the command for linking an
application which uses them. Because the POSIX functions call the native ones,
it is also necessary to add <B>-lpcre</B>.
</P>
<P>
As I am pretty ignorant about POSIX, these functions must be considered as
experimental. I have implemented only those option bits that can be reasonably
mapped to PCRE native options. Other POSIX options are not even defined. It may
be that it is useful to define, but ignore, other options. Feedback from more
knowledgeable folk may cause this kind of detail to change.
</P>
<P>
When PCRE is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
still those of Perl, subject to the setting of various PCRE options, as
described below.
</P>
<P>
The header for these functions is supplied as <B>pcreposix.h</B> to avoid any
potential clash with other POSIX libraries. It can, of course, be renamed or
aliased as <B>regex.h</B>, which is the "correct" name. It provides two
structure types, <I>regex_t</I> for compiled internal forms, and
<I>regmatch_t</I> for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting options and
identifying error codes.
</P>
<LI><A NAME="SEC4" HREF="#TOC1">COMPILING A PATTERN</A>
<P>
The function <B>regcomp()</B> is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument <I>pattern</I>. The <I>preg</I> argument is a pointer
to a regex_t structure which is used as a base for storing information about
the compiled expression.
</P>
<P>
The argument <I>cflags</I> is either zero, or contains one or more of the bits
defined by the following macros:
</P>
<P>
<PRE>
REG_ICASE
</PRE>
</P>
<P>
The PCRE_CASELESS option is set when the expression is passed for compilation
to the native function.
</P>
<P>
<PRE>
REG_NEWLINE
</PRE>
</P>
<P>
The PCRE_MULTILINE option is set when the expression is passed for compilation
to the native function.
</P>
<P>
The yield of <B>regcomp()</B> is zero on success, and non-zero otherwise. The
<I>preg</I> structure is filled in on success, and one member of the structure
is public: <I>re_nsub</I> contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.
</P>
<LI><A NAME="SEC5" HREF="#TOC1">MATCHING A PATTERN</A>
<P>
The function <B>regexec()</B> is called to match a pre-compiled pattern
<I>preg</I> against a given <I>string</I>, which is terminated by a zero byte,
subject to the options in <I>eflags</I>. These can be:
</P>
<P>
<PRE>
REG_NOTBOL
</PRE>
</P>
<P>
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.
</P>
<P>
<PRE>
REG_NOTEOL
</PRE>
</P>
<P>
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
</P>
<P>
The portion of the string that was matched, and also any captured substrings,
are returned via the <I>pmatch</I> argument, which points to an array of
<I>nmatch</I> structures of type <I>regmatch_t</I>, containing the members
<I>rm_so</I> and <I>rm_eo</I>. These contain the offset to the first character of
each substring and the offset to the first character after the end of each
substring, respectively. The 0th element of the vector relates to the entire
portion of <I>string</I> that was matched; subsequent elements relate to the
capturing subpatterns of the regular expression. Unused entries in the array
have both structure members set to -1.
</P>
<P>
A successful match yields a zero return; various error codes are defined in the
header file, of which REG_NOMATCH is the "expected" failure code.
</P>
<LI><A NAME="SEC6" HREF="#TOC1">ERROR MESSAGES</A>
<P>
The <B>regerror()</B> function maps a non-zero error code from either
<B>regcomp</B> or <B>regexec</B> to a printable message. If <I>preg</I> is not
NULL, the error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in <I>errbuf</I>. The length of the
message, including the zero, is limited to <I>errbuf_size</I>. The yield of the
function is the size of the buffer needed to hold the whole message, including
the terminating zero.
</P>
<LI><A NAME="SEC7" HREF="#TOC1">STORAGE</A>
<P>
Compiling a regular expression causes memory to be allocated and associated
with the <I>preg</I> structure. The function <B>regfree()</B> frees all such
memory, after which <I>preg</I> may no longer be used as a compiled expression.
</P>
<LI><A NAME="SEC8" HREF="#TOC1">AUTHOR</A>
<P>
Philip Hazel &#60;ph10@cam.ac.uk&#62;
<BR>
University Computing Service,
<BR>
New Museums Site,
<BR>
Cambridge CB2 3QG, England.
<BR>
Phone: +44 1223 334714
</P>
<P>
Copyright (c) 1997-1999 University of Cambridge.

@ -1,150 +0,0 @@
NAME
pcreposix - POSIX API for Perl-compatible regular expressions.
SYNOPSIS
#include <pcreposix.h>
int regcomp(regex_t *preg, const char *pattern,
int cflags);
int regexec(regex_t *preg, const char *string,
size_t nmatch, regmatch_t pmatch[], int eflags);
size_t regerror(int errcode, const regex_t *preg,
char *errbuf, size_t errbuf_size);
void regfree(regex_t *preg);
DESCRIPTION
This set of functions provides a POSIX-style API to the PCRE
regular expression package. See the pcre documentation for a
description of the native API, which contains additional
functionality.
The functions described here are just wrapper functions that
ultimately call the native API. Their prototypes are defined
in the pcreposix.h header file, and on Unix systems the
library itself is called pcreposix.a, so can be accessed by
adding -lpcreposix to the command for linking an application
which uses them. Because the POSIX functions call the native
ones, it is also necessary to add -lpcre.
As I am pretty ignorant about POSIX, these functions must be
considered experimental. I have implemented only those
option bits that can be reasonably mapped to PCRE native
options; other POSIX options are not even defined. It may be
useful to define, but ignore, other options. Feedback from
more knowledgeable folk may cause this kind of detail to
change.
When PCRE is called via these functions, it is only the API
that is POSIX-like in style. The syntax and semantics of the
regular expressions themselves are still those of Perl,
subject to the setting of various PCRE options, as described
below.
The header for these functions is supplied as pcreposix.h to
avoid any potential clash with other POSIX libraries. It
can, of course, be renamed or aliased as regex.h, which is
the "correct" name. It provides two structure types, regex_t
for compiled internal forms, and regmatch_t for returning
captured substrings. It also defines some constants whose
names start with "REG_"; these are used for setting options
and identifying error codes.
COMPILING A PATTERN
The function regcomp() is called to compile a pattern into
an internal form. The pattern is a C string terminated by a
binary zero, and is passed in the argument pattern. The preg
argument is a pointer to a regex_t structure which is used
as a base for storing information about the compiled
expression.
The argument cflags is either zero, or contains one or more
of the bits defined by the following macros:
REG_ICASE
The PCRE_CASELESS option is set when the expression is
passed for compilation to the native function.
REG_NEWLINE
The PCRE_MULTILINE option is set when the expression is
passed for compilation to the native function.
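A minimal sketch of the two compile-time flags, assuming it is built against the system <regex.h> (to use PCRE instead, include pcreposix.h and link with -lpcreposix -lpcre). The helper matches_with() and the SKETCH_CFLAGS macro are hypothetical; SKETCH_CFLAGS requests POSIX extended syntax only where REG_EXTENDED exists, since pcreposix.h does not define it. Be aware that POSIX REG_NEWLINE also stops . and [^...] from matching a newline, whereas this wrapper only sets PCRE_MULTILINE.

```c
#include <stddef.h>
#include <regex.h>

/* Assumption: the system <regex.h>. PCRE's pcreposix.h defines no
   REG_EXTENDED because its pattern syntax is always Perl's. */
#ifdef REG_EXTENDED
#define SKETCH_CFLAGS REG_EXTENDED
#else
#define SKETCH_CFLAGS 0
#endif

/* Hypothetical helper: returns 1 if pattern matches somewhere in subject,
   0 if it does not, and -1 on a compile or matching error. extra_cflags
   may contain REG_ICASE and/or REG_NEWLINE. */
int matches_with(const char *pattern, const char *subject, int extra_cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, SKETCH_CFLAGS | extra_cflags) != 0)
        return -1;
    /* nmatch == 0: only ask whether there is a match, not where. */
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : (rc == REG_NOMATCH ? 0 : -1);
}
```

For example, matches_with("^b", "a\nb", REG_NEWLINE) returns 1, while the same call with 0 as the last argument returns 0.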
The yield of regcomp() is zero on success, and non-zero
otherwise. The preg structure is filled in on success, and
one member of the structure is public: re_nsub contains the
number of capturing subpatterns in the regular expression.
Various error codes are defined in the header file.
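The following sketch checks the yield of regcomp() and reads re_nsub. It assumes the system <regex.h> (with PCRE, include pcreposix.h instead); count_subpatterns() and SKETCH_CFLAGS are hypothetical. Because the contents of preg are unspecified after a failed compile, the structure is passed to regfree() only on success.

```c
#include <stddef.h>
#include <regex.h>

/* Assumption: the system <regex.h>; pcreposix.h has no REG_EXTENDED. */
#ifdef REG_EXTENDED
#define SKETCH_CFLAGS REG_EXTENDED
#else
#define SKETCH_CFLAGS 0
#endif

/* Hypothetical helper: compiles pattern, stores its number of capturing
   subpatterns in *nsub, and returns regcomp()'s yield (0 on success,
   a REG_* error code otherwise). */
int count_subpatterns(const char *pattern, size_t *nsub)
{
    regex_t re;
    int rc = regcomp(&re, pattern, SKETCH_CFLAGS);

    if (rc != 0)
        return rc;          /* re is not a compiled expression: no regfree() */
    *nsub = re.re_nsub;     /* the one member intended for public use */
    regfree(&re);           /* release the memory regcomp() allocated */
    return 0;
}
```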
MATCHING A PATTERN
The function regexec() is called to match a pre-compiled
pattern preg against a given string, which is terminated by
a zero byte, subject to the options in eflags. These can be:
REG_NOTBOL
The PCRE_NOTBOL option is set when calling the underlying
PCRE matching function.
REG_NOTEOL
The PCRE_NOTEOL option is set when calling the underlying
PCRE matching function.
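One common use of REG_NOTBOL is searching repeatedly through a string: every search after the first starts part-way through the subject, where ^ should not match. A sketch assuming the system <regex.h> (or pcreposix.h with PCRE); count_matches() and SKETCH_CFLAGS are hypothetical, and the rule for stepping past empty matches is a design choice of this sketch rather than something the wrapper prescribes.

```c
#include <stddef.h>
#include <regex.h>

/* Assumption: the system <regex.h>; pcreposix.h has no REG_EXTENDED. */
#ifdef REG_EXTENDED
#define SKETCH_CFLAGS REG_EXTENDED
#else
#define SKETCH_CFLAGS 0
#endif

/* Hypothetical helper: counts non-overlapping matches of pattern in
   subject, or returns 0 if the pattern does not compile. */
size_t count_matches(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m;
    size_t count = 0;
    const char *p = subject;

    if (regcomp(&re, pattern, SKETCH_CFLAGS) != 0)
        return 0;
    /* Only the first search starts at a real beginning of line. */
    while (regexec(&re, p, 1, &m, p == subject ? 0 : REG_NOTBOL) == 0) {
        count++;
        if (m.rm_eo > m.rm_so)
            p += m.rm_eo;            /* resume just after a non-empty match */
        else if (p[m.rm_eo] != '\0')
            p += m.rm_eo + 1;        /* empty match: step over one character */
        else
            break;                   /* empty match at the end of the subject */
    }
    regfree(&re);
    return count;
}
```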
The portion of the string that was matched, and also any
captured substrings, are returned via the pmatch argument,
which points to an array of nmatch structures of type
regmatch_t, containing the members rm_so and rm_eo. These
contain the offset to the first character of each substring
and the offset to the first character after the end of each
substring, respectively. The 0th element of the vector
relates to the entire portion of string that was matched;
subsequent elements relate to the capturing subpatterns of
the regular expression. Unused entries in the array have
both structure members set to -1.
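A sketch of reading pmatch, assuming the system <regex.h> (or pcreposix.h with PCRE) and a hypothetical fixed limit of ten entries. The hypothetical helper capture_copy() copies one captured substring using rm_so and rm_eo, and treats rm_so == -1 as a group that did not take part in the match.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <regex.h>

/* Assumption: the system <regex.h>; pcreposix.h has no REG_EXTENDED. */
#ifdef REG_EXTENDED
#define SKETCH_CFLAGS REG_EXTENDED
#else
#define SKETCH_CFLAGS 0
#endif
#define SKETCH_MAXGROUPS 10   /* hypothetical limit, enough for these examples */

/* Returns a malloc()ed copy of capture `group` (0 = whole match) from the
   first match of pattern in subject, or NULL if the pattern is invalid,
   there is no match, or that group is unset. */
char *capture_copy(const char *pattern, const char *subject, size_t group)
{
    regex_t re;
    regmatch_t pm[SKETCH_MAXGROUPS];
    char *out = NULL;

    if (group >= SKETCH_MAXGROUPS || regcomp(&re, pattern, SKETCH_CFLAGS) != 0)
        return NULL;
    if (regexec(&re, subject, SKETCH_MAXGROUPS, pm, 0) == 0 &&
        pm[group].rm_so != -1) {
        /* rm_so: offset of the first character; rm_eo: offset just past the last. */
        size_t len = (size_t)(pm[group].rm_eo - pm[group].rm_so);
        out = malloc(len + 1);
        if (out != NULL) {
            memcpy(out, subject + pm[group].rm_so, len);
            out[len] = '\0';
        }
    }
    regfree(&re);
    return out;
}
```

With the pattern (a+)(x)?(b*) and the subject caab, groups 0, 1 and 3 yield "aab", "aa" and "b", while group 2 yields NULL.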
A successful match yields a zero return; various error codes
are defined in the header file, of which REG_NOMATCH is the
"expected" failure code.
ERROR MESSAGES
The regerror() function maps a non-zero error code from
either regcomp or regexec to a printable message. If preg is
not NULL, the error should have arisen from the use of that
structure. A message terminated by a binary zero is placed
in errbuf. The length of the message, including the zero, is
limited to errbuf_size. The yield of the function is the
size of the buffer needed to hold the whole message,
including the terminating zero.
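Because the yield of regerror() is the buffer size needed, a caller can ask for the size first and then allocate exactly that much. A sketch assuming the system <regex.h> (or pcreposix.h with PCRE); compile_error_message() and SKETCH_CFLAGS are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <regex.h>

/* Assumption: the system <regex.h>; pcreposix.h has no REG_EXTENDED. */
#ifdef REG_EXTENDED
#define SKETCH_CFLAGS REG_EXTENDED
#else
#define SKETCH_CFLAGS 0
#endif

/* Hypothetical helper: returns NULL if pattern compiles, otherwise a
   malloc()ed error message (or NULL if memory runs out). */
char *compile_error_message(const char *pattern)
{
    regex_t re;
    int rc = regcomp(&re, pattern, SKETCH_CFLAGS);
    size_t needed;
    char *msg;

    if (rc == 0) {
        regfree(&re);
        return NULL;
    }
    /* A zero-sized buffer is left untouched; the yield is the size needed,
       terminating zero included. */
    needed = regerror(rc, &re, NULL, 0);
    msg = malloc(needed);
    if (msg != NULL)
        regerror(rc, &re, msg, needed);   /* second call fills the buffer */
    return msg;
}
```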
STORAGE
Compiling a regular expression causes memory to be allocated
and associated with the preg structure. The function
regfree() frees all such memory, after which preg may no
longer be used as a compiled expression.
AUTHOR
Philip Hazel <ph10@cam.ac.uk>
University Computing Service,
New Museums Site,
Cambridge CB2 3QG, England.
Phone: +44 1223 334714
Copyright (c) 1997-1999 University of Cambridge.

@ -1,250 +0,0 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/*
This is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language. See
the file Tech.Notes for some information on the internals.
This module is a wrapper that provides a POSIX API to the underlying PCRE
functions.
Written by: Philip Hazel <ph10@cam.ac.uk>
Copyright (c) 1997-1999 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
computer system, and to redistribute it freely, subject to the following
restrictions:
1. This software is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2. The origin of this software must not be misrepresented, either by
explicit claim or by omission.
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
4. If PCRE is embedded in any software that is released under the GNU
General Purpose Licence (GPL), then the terms of that licence shall
supersede any condition above with which it is incompatible.
-----------------------------------------------------------------------------
*/
#include "internal.h"
#include "pcreposix.h"
#include <stdlib.h>
/* Corresponding tables of PCRE error messages and POSIX error codes. */
static const char *estring[] = {
ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9, ERR10,
ERR11, ERR12, ERR13, ERR14, ERR15, ERR16, ERR17, ERR18, ERR19, ERR20,
ERR21, ERR22, ERR23, ERR24, ERR25, ERR26, ERR27, ERR28 };
static int eint[] = {
REG_EESCAPE, /* "\\ at end of pattern" */
REG_EESCAPE, /* "\\c at end of pattern" */
REG_EESCAPE, /* "unrecognized character follows \\" */
REG_BADBR, /* "numbers out of order in {} quantifier" */
REG_BADBR, /* "number too big in {} quantifier" */
REG_EBRACK, /* "missing terminating ] for character class" */
REG_ECTYPE, /* "invalid escape sequence in character class" */
REG_ERANGE, /* "range out of order in character class" */
REG_BADRPT, /* "nothing to repeat" */
REG_BADRPT, /* "operand of unlimited repeat could match the empty string" */
REG_ASSERT, /* "internal error: unexpected repeat" */
REG_BADPAT, /* "unrecognized character after (?" */
REG_ESIZE, /* "too many capturing parenthesized sub-patterns" */
REG_EPAREN, /* "missing )" */
REG_ESUBREG, /* "back reference to non-existent subpattern" */
REG_INVARG, /* "erroffset passed as NULL" */
REG_INVARG, /* "unknown option bit(s) set" */
REG_EPAREN, /* "missing ) after comment" */
REG_ESIZE, /* "too many sets of parentheses" */
REG_ESIZE, /* "regular expression too large" */
REG_ESPACE, /* "failed to get memory" */
REG_EPAREN, /* "unmatched brackets" */
REG_ASSERT, /* "internal error: code overflow" */
REG_BADPAT, /* "unrecognized character after (?<" */
REG_BADPAT, /* "lookbehind assertion is not fixed length" */
REG_BADPAT, /* "malformed number after (?(" */
REG_BADPAT, /* "conditional group contains more than two branches" */
REG_BADPAT /* "assertion expected after (?(" */
};
/* Table of texts corresponding to POSIX error codes */
static const char *pstring[] = {
"", /* Dummy for value 0 */
"internal error", /* REG_ASSERT */
"invalid repeat counts in {}", /* BADBR */
"pattern error", /* BADPAT */
"? * + invalid", /* BADRPT */
"unbalanced {}", /* EBRACE */
"unbalanced []", /* EBRACK */
"collation error - not relevant", /* ECOLLATE */
"bad class", /* ECTYPE */
"bad escape sequence", /* EESCAPE */
"empty expression", /* EMPTY */
"unbalanced ()", /* EPAREN */
"bad range inside []", /* ERANGE */
"expression too big", /* ESIZE */
"failed to get memory", /* ESPACE */
"bad back reference", /* ESUBREG */
"bad argument", /* INVARG */
"match failed" /* NOMATCH */
};
/*************************************************
* Translate PCRE text code to int *
*************************************************/
/* PCRE compile-time errors are given as strings defined as macros. We can just
look them up in a table to turn them into POSIX-style error codes. */
static int
pcre_posix_error_code(const char *s)
{
size_t i;
for (i = 0; i < sizeof(estring)/sizeof(char *); i++)
if (strcmp(s, estring[i]) == 0) return eint[i];
return REG_ASSERT;
}
/*************************************************
* Translate error code to string *
*************************************************/
size_t
regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size)
{
const char *message, *addmessage;
size_t length, addlength;
message = (errcode >= (int)(sizeof(pstring)/sizeof(char *)))?
"unknown error code" : pstring[errcode];
length = strlen(message) + 1;
addmessage = " at offset ";
addlength = (preg != NULL && (int)preg->re_erroffset != -1)?
strlen(addmessage) + 6 : 0;
if (errbuf_size > 0)
{
if (addlength > 0 && errbuf_size >= length + addlength)
sprintf(errbuf, "%s%s%-6d", message, addmessage, (int)preg->re_erroffset);
else
{
strncpy(errbuf, message, errbuf_size - 1);
errbuf[errbuf_size-1] = 0;
}
}
return length + addlength;
}
/*************************************************
* Free store held by a regex *
*************************************************/
void
regfree(regex_t *preg)
{
(pcre_free)(preg->re_pcre);
}
/*************************************************
* Compile a regular expression *
*************************************************/
/*
Arguments:
preg points to a structure for recording the compiled expression
pattern the pattern to compile
cflags compilation flags
Returns: 0 on success
various non-zero codes on failure
*/
int
regcomp(regex_t *preg, const char *pattern, int cflags)
{
const char *errorptr;
int erroffset;
int options = 0;
if ((cflags & REG_ICASE) != 0) options |= PCRE_CASELESS;
if ((cflags & REG_NEWLINE) != 0) options |= PCRE_MULTILINE;
preg->re_pcre = pcre_compile(pattern, options, &errorptr, &erroffset, NULL);
preg->re_erroffset = erroffset;
if (preg->re_pcre == NULL) return pcre_posix_error_code(errorptr);
preg->re_nsub = pcre_info(preg->re_pcre, NULL, NULL);
return 0;
}
/*************************************************
* Match a regular expression *
*************************************************/
int
regexec(regex_t *preg, const char *str, size_t nmatch,
regmatch_t pmatch[], int eflags)
{
int rc;
int options = 0;
if ((eflags & REG_NOTBOL) != 0) options |= PCRE_NOTBOL;
if ((eflags & REG_NOTEOL) != 0) options |= PCRE_NOTEOL;
preg->re_erroffset = (size_t)(-1); /* Only has meaning after compile */
rc = pcre_exec(preg->re_pcre, NULL, str, (int)strlen(str), 0, options,
(int *)pmatch, nmatch * 2);
if (rc == 0) return 0; /* All pmatch were filled in */
if (rc > 0)
{
size_t i;
for (i = rc; i < nmatch; i++) pmatch[i].rm_so = pmatch[i].rm_eo = -1;
return 0;
}
else switch(rc)
{
case PCRE_ERROR_NOMATCH: return REG_NOMATCH;
case PCRE_ERROR_NULL: return REG_INVARG;
case PCRE_ERROR_BADOPTION: return REG_INVARG;
case PCRE_ERROR_BADMAGIC: return REG_INVARG;
case PCRE_ERROR_UNKNOWN_NODE: return REG_ASSERT;
case PCRE_ERROR_NOMEMORY: return REG_ESPACE;
default: return REG_ASSERT;
}
}
/* End of pcreposix.c */

@ -1,82 +0,0 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* Copyright (c) 1997-1999 University of Cambridge */
#ifndef _PCREPOSIX_H
#define _PCREPOSIX_H
/* This is the header for the POSIX wrapper interface to the PCRE Perl-
Compatible Regular Expression library. It defines the things POSIX says should
be there. I hope. */
/* Have to include stdlib.h in order to ensure that size_t is defined. */
#include <stdlib.h>
/* Allow for C++ users */
#ifdef __cplusplus
extern "C" {
#endif
/* Options defined by POSIX. */
#define REG_ICASE 0x01
#define REG_NEWLINE 0x02
#define REG_NOTBOL 0x04
#define REG_NOTEOL 0x08
/* Error values. Not all these are relevant or used by the wrapper. */
enum {
REG_ASSERT = 1, /* internal error ? */
REG_BADBR, /* invalid repeat counts in {} */
REG_BADPAT, /* pattern error */
REG_BADRPT, /* ? * + invalid */
REG_EBRACE, /* unbalanced {} */
REG_EBRACK, /* unbalanced [] */
REG_ECOLLATE, /* collation error - not relevant */
REG_ECTYPE, /* bad class */
REG_EESCAPE, /* bad escape sequence */
REG_EMPTY, /* empty expression */
REG_EPAREN, /* unbalanced () */
REG_ERANGE, /* bad range inside [] */
REG_ESIZE, /* expression too big */
REG_ESPACE, /* failed to get memory */
REG_ESUBREG, /* bad back reference */
REG_INVARG, /* bad argument */
REG_NOMATCH /* match failed */
};
/* The structure representing a compiled regular expression. */
typedef struct {
void *re_pcre;
size_t re_nsub;
size_t re_erroffset;
} regex_t;
/* The structure in which a captured offset is returned. */
typedef int regoff_t;
typedef struct {
regoff_t rm_so;
regoff_t rm_eo;
} regmatch_t;
/* The functions */
extern int regcomp(regex_t *, const char *, int);
extern int regexec(regex_t *, const char *, size_t, regmatch_t *, int);
extern size_t regerror(int, const regex_t *, char *, size_t);
extern void regfree(regex_t *);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* End of pcreposix.h */

File diff suppressed because it is too large

@ -1,143 +0,0 @@
#! /usr/bin/perl
# Program for testing regular expressions with perl to check that PCRE handles
# them the same.
# Function for turning a string into a string of printing chars
sub pchars {
my($t) = "";
foreach $c (split(//, $_[0]))
{
if (ord $c >= 32 && ord $c < 127) { $t .= $c; }
else { $t .= sprintf("\\x%02x", ord $c); }
}
$t;
}
# Read lines from named file or stdin and write to named file or stdout; lines
# consist of a regular expression, in delimiters and optionally followed by
# options, followed by a set of test data, terminated by an empty line.
# Sort out the input and output files
if (@ARGV > 0)
{
open(INFILE, "<$ARGV[0]") || die "Failed to open $ARGV[0]\n";
$infile = "INFILE";
}
else { $infile = "STDIN"; }
if (@ARGV > 1)
{
open(OUTFILE, ">$ARGV[1]") || die "Failed to open $ARGV[1]\n";
$outfile = "OUTFILE";
}
else { $outfile = "STDOUT"; }
printf($outfile "Perl $] Regular Expressions\n\n");
# Main loop
NEXT_RE:
for (;;)
{
printf " re> " if $infile eq "STDIN";
last if ! ($_ = <$infile>);
print $outfile $_ if $infile ne "STDIN";
next if ($_ eq "");
$pattern = $_;
$delimiter = substr($_, 0, 1);
while ($pattern !~ /^\s*(.).*\1/s)
{
printf " > " if $infile eq "STDIN";
last if ! ($_ = <$infile>);
print $outfile $_ if $infile ne "STDIN";
$pattern .= $_;
}
chomp($pattern);
$pattern =~ s/\s+$//;
# Check that the pattern is valid
eval "\$_ =~ ${pattern}";
if ($@)
{
printf $outfile "Error: $@";
next NEXT_RE;
}
# Read data lines and test them
for (;;)
{
printf "data> " if $infile eq "STDIN";
last NEXT_RE if ! ($_ = <$infile>);
chomp;
print $outfile "$_\n" if $infile ne "STDIN";
s/\s+$//;
s/^\s+//;
last if ($_ eq "");
$_ = eval "\"$_\""; # To get escapes processed
$ok = 0;
eval "if (\$_ =~ ${pattern}) {" .
"\$z = \$&;" .
"\$a = \$1;" .
"\$b = \$2;" .
"\$c = \$3;" .
"\$d = \$4;" .
"\$e = \$5;" .
"\$f = \$6;" .
"\$g = \$7;" .
"\$h = \$8;" .
"\$i = \$9;" .
"\$j = \$10;" .
"\$k = \$11;" .
"\$l = \$12;" .
"\$m = \$13;" .
"\$n = \$14;" .
"\$o = \$15;" .
"\$p = \$16;" .
"\$ok = 1; }";
if ($@)
{
printf $outfile "Error: $@\n";
next NEXT_RE;
}
elsif (!$ok)
{
printf $outfile "No match\n";
}
else
{
@subs = ($z,$a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l,$m,$n,$o,$p);
$last_printed = 0;
for ($i = 0; $i < @subs; $i++)
{
if ($i == 0 || defined $subs[$i])
{
while ($last_printed++ < $i-1)
{ printf $outfile ("%2d: <unset>\n", $last_printed); }
printf $outfile ("%2d: %s\n", $i, &pchars($subs[$i]));
$last_printed = $i;
}
}
}
}
}
printf $outfile "\n";
# End

@ -1,76 +0,0 @@
.TH PGREP 1
.SH NAME
pgrep - a grep with Perl-compatible regular expressions.
.SH SYNOPSIS
.B pgrep [-Vchilnsvx] pattern [file] ...
.SH DESCRIPTION
\fBpgrep\fR searches files for character patterns, in the same way as other
grep commands do, but it uses the PCRE regular expression library to support
patterns that are compatible with the regular expressions of Perl 5. See
\fBpcre(3)\fR for a full description of syntax and semantics.
If no files are specified, \fBpgrep\fR reads the standard input. By default,
each line that matches the pattern is copied to the standard output, and if
there is more than one file, the file name is printed before each line of
output. However, there are options that can change how \fBpgrep\fR behaves.
Lines are limited to BUFSIZ characters. BUFSIZ is defined in \fB<stdio.h>\fR.
The newline character is removed from the end of each line before it is matched
against the pattern.
.SH OPTIONS
.TP 10
\fB-V\fR
Write the version number of the PCRE library being used to the standard error
stream.
.TP
\fB-c\fR
Do not print individual lines; instead just print a count of the number of
lines that would otherwise have been printed. If several files are given, a
count is printed for each of them.
.TP
\fB-h\fR
Suppress printing of filenames when searching multiple files.
.TP
\fB-i\fR
Ignore upper/lower case distinctions during comparisons.
.TP
\fB-l\fR
Instead of printing lines from the files, just print the names of the files
containing lines that would have been printed. Each file name is printed
once, on a separate line.
.TP
\fB-n\fR
Precede each line by its line number in the file.
.TP
\fB-s\fR
Work silently, that is, display nothing except error messages.
The exit status indicates whether any matches were found.
.TP
\fB-v\fR
Invert the sense of the match, so that lines which do \fInot\fR match the
pattern are now the ones that are found.
.TP
\fB-x\fR
Force the pattern to be anchored (it must start matching at the beginning of
the line) and in addition, require it to match the entire line. This is
equivalent to having ^ and $ characters at the start and end of each
alternative branch in the regular expression.
.SH SEE ALSO
\fBpcre(3)\fR, Perl 5 documentation
.SH DIAGNOSTICS
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
for syntax errors or inaccessible files (even if matches were found).
.SH AUTHOR
Philip Hazel <ph10@cam.ac.uk>
.br
Copyright (c) 1997-1999 University of Cambridge.

@ -1,105 +0,0 @@
<HTML>
<HEAD>
<TITLE>pgrep specification</TITLE>
</HEAD>
<body bgcolor="#FFFFFF" text="#00005A">
<H1>pgrep specification</H1>
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page in case the
conversion went wrong.
<UL>
<LI><A NAME="TOC1" HREF="#SEC1">NAME</A>
<LI><A NAME="TOC2" HREF="#SEC2">SYNOPSIS</A>
<LI><A NAME="TOC3" HREF="#SEC3">DESCRIPTION</A>
<LI><A NAME="TOC4" HREF="#SEC4">OPTIONS</A>
<LI><A NAME="TOC5" HREF="#SEC5">SEE ALSO</A>
<LI><A NAME="TOC6" HREF="#SEC6">DIAGNOSTICS</A>
<LI><A NAME="TOC7" HREF="#SEC7">AUTHOR</A>
</UL>
<LI><A NAME="SEC1" HREF="#TOC1">NAME</A>
<P>
pgrep - a grep with Perl-compatible regular expressions.
</P>
<LI><A NAME="SEC2" HREF="#TOC1">SYNOPSIS</A>
<P>
<B>pgrep [-Vchilnsvx] pattern [file] ...</B>
</P>
<LI><A NAME="SEC3" HREF="#TOC1">DESCRIPTION</A>
<P>
<B>pgrep</B> searches files for character patterns, in the same way as other
grep commands do, but it uses the PCRE regular expression library to support
patterns that are compatible with the regular expressions of Perl 5. See
<B>pcre(3)</B> for a full description of syntax and semantics.
</P>
<P>
If no files are specified, <B>pgrep</B> reads the standard input. By default,
each line that matches the pattern is copied to the standard output, and if
there is more than one file, the file name is printed before each line of
output. However, there are options that can change how <B>pgrep</B> behaves.
</P>
<P>
Lines are limited to BUFSIZ characters. BUFSIZ is defined in <B>&#60;stdio.h&#62;</B>.
The newline character is removed from the end of each line before it is matched
against the pattern.
</P>
<LI><A NAME="SEC4" HREF="#TOC1">OPTIONS</A>
<P>
<B>-V</B>
Write the version number of the PCRE library being used to the standard error
stream.
</P>
<P>
<B>-c</B>
Do not print individual lines; instead just print a count of the number of
lines that would otherwise have been printed. If several files are given, a
count is printed for each of them.
</P>
<P>
<B>-h</B>
Suppress printing of filenames when searching multiple files.
</P>
<P>
<B>-i</B>
Ignore upper/lower case distinctions during comparisons.
</P>
<P>
<B>-l</B>
Instead of printing lines from the files, just print the names of the files
containing lines that would have been printed. Each file name is printed
once, on a separate line.
</P>
<P>
<B>-n</B>
Precede each line by its line number in the file.
</P>
<P>
<B>-s</B>
Work silently, that is, display nothing except error messages.
The exit status indicates whether any matches were found.
</P>
<P>
<B>-v</B>
Invert the sense of the match, so that lines which do <I>not</I> match the
pattern are now the ones that are found.
</P>
<P>
<B>-x</B>
Force the pattern to be anchored (it must start matching at the beginning of
the line) and in addition, require it to match the entire line. This is
equivalent to having ^ and $ characters at the start and end of each
alternative branch in the regular expression.
</P>
<LI><A NAME="SEC5" HREF="#TOC1">SEE ALSO</A>
<P>
<B>pcre(3)</B>, Perl 5 documentation
</P>
<LI><A NAME="SEC6" HREF="#TOC1">DIAGNOSTICS</A>
<P>
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
for syntax errors or inaccessible files (even if matches were found).
</P>
<LI><A NAME="SEC7" HREF="#TOC1">AUTHOR</A>
<P>
Philip Hazel &#60;ph10@cam.ac.uk&#62;
<BR>
Copyright (c) 1997-1999 University of Cambridge.

@ -1,86 +0,0 @@
NAME
pgrep - a grep with Perl-compatible regular expressions.
SYNOPSIS
pgrep [-Vchilnsvx] pattern [file] ...
DESCRIPTION
pgrep searches files for character patterns, in the same way
as other grep commands do, but it uses the PCRE regular
expression library to support patterns that are compatible
with the regular expressions of Perl 5. See pcre(3) for a
full description of syntax and semantics.
If no files are specified, pgrep reads the standard input.
By default, each line that matches the pattern is copied to
the standard output, and if there is more than one file, the
file name is printed before each line of output. However,
there are options that can change how pgrep behaves.
Lines are limited to BUFSIZ characters. BUFSIZ is defined in
<stdio.h>. The newline character is removed from the end of
each line before it is matched against the pattern.
OPTIONS
-V Write the version number of the PCRE library being
used to the standard error stream.
-c Do not print individual lines; instead just print
a count of the number of lines that would
otherwise have been printed. If several files are
given, a count is printed for each of them.
-h Suppress printing of filenames when searching
multiple files.
-i Ignore upper/lower case distinctions during
comparisons.
-l Instead of printing lines from the files, just
print the names of the files containing lines that
would have been printed. Each file name is printed
once, on a separate line.
-n Precede each line by its line number in the file.
-s Work silently, that is, display nothing except
error messages. The exit status indicates whether
any matches were found.
-v Invert the sense of the match, so that lines which
do not match the pattern are now the ones that are
found.
-x Force the pattern to be anchored (it must start
matching at the beginning of the line) and in
addition, require it to match the entire line.
This is equivalent to having ^ and $ characters at
the start and end of each alternative branch in
the regular expression.
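The -x option can be sketched through the POSIX API, assuming the system <regex.h>; line_matches_whole() and SKETCH_CFLAGS are hypothetical. Rather than pgrep.c's approach (PCRE_ANCHORED plus a check that the match ends at the end of the line), this sketch wraps the pattern in ^(...)$. The difference can matter: with Perl-style alternation an anchored match of a|ab against the line ab stops after a, so a check on the end offset alone rejects a line that ^(a|ab)$ accepts.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <regex.h>

/* Assumption: the system <regex.h>; pcreposix.h has no REG_EXTENDED. */
#ifdef REG_EXTENDED
#define SKETCH_CFLAGS REG_EXTENDED
#else
#define SKETCH_CFLAGS 0
#endif

/* Hypothetical helper: returns 1 if pattern matches the whole of line,
   0 if not, -1 on a compile or memory error. Caveat: the added group
   shifts the numbers of any back references in pattern by one. */
int line_matches_whole(const char *pattern, const char *line)
{
    size_t len = strlen(pattern) + sizeof "^()$";   /* sizeof counts the zero */
    char *wrapped = malloc(len);
    regex_t re;
    int rc;

    if (wrapped == NULL)
        return -1;
    snprintf(wrapped, len, "^(%s)$", pattern);      /* anchor every branch */
    rc = regcomp(&re, wrapped, SKETCH_CFLAGS);
    free(wrapped);
    if (rc != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : (rc == REG_NOMATCH ? 0 : -1);
}
```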
SEE ALSO
pcre(3), Perl 5 documentation
DIAGNOSTICS
Exit status is 0 if any matches were found, 1 if no matches
were found, and 2 for syntax errors or inaccessible files
(even if matches were found).
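The exit-status rule can be sketched as a small function. This is a hypothetical helper modelling how main() in pgrep.c combines per-file results; it assumes each file reports 0 (lines found), 1 (none found) or 2 (could not be opened). A regex syntax error is not modelled, because pgrep exits with 2 before reading any file.

```c
#include <stddef.h>

/* Hypothetical helper: combines per-file statuses into pgrep's exit status. */
int pgrep_exit_status(const int *file_status, size_t nfiles)
{
    int rc = 1;                  /* "no matches" until a file shows otherwise */
    size_t i;

    for (i = 0; i < nfiles; i++) {
        if (file_status[i] == 2)
            rc = 2;              /* an error wins, even over earlier matches */
        else if (file_status[i] == 0 && rc == 1)
            rc = 0;              /* a match only upgrades "no matches" */
    }
    return rc;
}
```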
AUTHOR
Philip Hazel <ph10@cam.ac.uk>
Copyright (c) 1997-1999 University of Cambridge.

@ -1,225 +0,0 @@
/*************************************************
* PCRE grep program *
*************************************************/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include "pcre.h"
#define FALSE 0
#define TRUE 1
typedef int BOOL;
/*************************************************
* Global variables *
*************************************************/
static pcre *pattern;
static pcre_extra *hints;
static BOOL count_only = FALSE;
static BOOL filenames_only = FALSE;
static BOOL invert = FALSE;
static BOOL number = FALSE;
static BOOL silent = FALSE;
static BOOL whole_lines = FALSE;
#ifdef STRERROR_FROM_ERRLIST
/*************************************************
* Provide strerror() for non-ANSI libraries *
*************************************************/
/* Some old-fashioned systems still around (e.g. SunOS4) don't have strerror()
in their libraries, but can provide the same facility by this simple
alternative function. */
extern int sys_nerr;
extern char *sys_errlist[];
char *
strerror(int n)
{
if (n < 0 || n >= sys_nerr) return "unknown error number";
return sys_errlist[n];
}
#endif /* STRERROR_FROM_ERRLIST */
/*************************************************
* Grep an individual file *
*************************************************/
static int
pgrep(FILE *in, char *name)
{
int rc = 1;
int linenumber = 0;
int count = 0;
int offsets[99];
char buffer[BUFSIZ];
while (fgets(buffer, sizeof(buffer), in) != NULL)
{
BOOL match;
int length = (int)strlen(buffer);
if (length > 0 && buffer[length-1] == '\n') buffer[--length] = 0;
linenumber++;
match = pcre_exec(pattern, hints, buffer, length, 0, 0, offsets, 99) >= 0;
if (match && whole_lines && offsets[1] != length) match = FALSE;
if (match != invert)
{
if (count_only) count++;
else if (filenames_only)
{
fprintf(stdout, "%s\n", (name == NULL)? "<stdin>" : name);
return 0;
}
else if (silent) return 0;
else
{
if (name != NULL) fprintf(stdout, "%s:", name);
if (number) fprintf(stdout, "%d:", linenumber);
fprintf(stdout, "%s\n", buffer);
}
rc = 0;
}
}
if (count_only)
{
if (name != NULL) fprintf(stdout, "%s:", name);
fprintf(stdout, "%d\n", count);
}
return rc;
}
/*************************************************
* Usage function *
*************************************************/
static int
usage(int rc)
{
fprintf(stderr, "Usage: pgrep [-Vchilnsvx] pattern [file] ...\n");
return rc;
}
/*************************************************
* Main program *
*************************************************/
int
main(int argc, char **argv)
{
int i;
int rc = 1;
int options = 0;
int errptr;
const char *error;
BOOL filenames = TRUE;
/* Process the options */
for (i = 1; i < argc; i++)
{
char *s;
if (argv[i][0] != '-') break;
s = argv[i] + 1;
while (*s != 0)
{
switch (*s++)
{
case 'c': count_only = TRUE; break;
case 'h': filenames = FALSE; break;
case 'i': options |= PCRE_CASELESS; break;
case 'l': filenames_only = TRUE; break;
case 'n': number = TRUE; break;
case 's': silent = TRUE; break;
case 'v': invert = TRUE; break;
case 'x': whole_lines = TRUE; options |= PCRE_ANCHORED; break;
case 'V':
fprintf(stderr, "PCRE version %s\n", pcre_version());
break;
default:
fprintf(stderr, "pgrep: unknown option %c\n", s[-1]);
return usage(2);
}
}
}
/* There must be at least a regexp argument */
if (i >= argc) return usage(2);
/* Compile the regular expression. */
pattern = pcre_compile(argv[i++], options, &error, &errptr, NULL);
if (pattern == NULL)
{
fprintf(stderr, "pgrep: error in regex at offset %d: %s\n", errptr, error);
return 2;
}
/* Study the regular expression, as we will be running it many times */
hints = pcre_study(pattern, 0, &error);
if (error != NULL)
{
fprintf(stderr, "pgrep: error while studying regex: %s\n", error);
return 2;
}
/* If there are no further arguments, do the business on stdin and exit */
if (i >= argc) return pgrep(stdin, NULL);
/* Otherwise, work through the remaining arguments as files. If there is only
one, don't give its name on the output. */
if (i == argc - 1) filenames = FALSE;
if (filenames_only) filenames = TRUE;
for (; i < argc; i++)
{
FILE *in = fopen(argv[i], "r");
if (in == NULL)
{
fprintf(stderr, "%s: failed to open: %s\n", argv[i], strerror(errno));
rc = 2;
}
else
{
int frc = pgrep(in, filenames? argv[i] : NULL);
if (frc == 0 && rc == 1) rc = 0;
fclose(in);
}
}
return rc;
}
/* End */

@ -1,397 +0,0 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/*
This is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language. See
the file Tech.Notes for some information on the internals.
Written by: Philip Hazel <ph10@cam.ac.uk>
Copyright (c) 1997-1999 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
computer system, and to redistribute it freely, subject to the following
restrictions:
1. This software is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2. The origin of this software must not be misrepresented, either by
explicit claim or by omission.
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
4. If PCRE is embedded in any software that is released under the GNU
General Purpose Licence (GPL), then the terms of that licence shall
supersede any condition above with which it is incompatible.
-----------------------------------------------------------------------------
*/
/* Include the internals header, which itself includes Standard C headers plus
the external pcre header. */
#include "internal.h"
/*************************************************
* Set a bit and maybe its alternate case *
*************************************************/
/* Given a character, set its bit in the table, and also the bit for the other
version of a letter if we are caseless.
Arguments:
start_bits points to the bit map
c is the character
caseless the caseless flag
cd the block with char table pointers
Returns: nothing
*/
static void
set_bit(uschar *start_bits, int c, BOOL caseless, compile_data *cd)
{
start_bits[c/8] |= (1 << (c&7));
if (caseless && (cd->ctypes[c] & ctype_letter) != 0)
start_bits[cd->fcc[c]/8] |= (1 << (cd->fcc[c]&7));
}
/*************************************************
* Create bitmap of starting chars *
*************************************************/
/* This function scans a compiled unanchored expression and attempts to build a
bitmap of the set of initial characters. If it can't, it returns FALSE. As time
goes by, we may be able to get more clever at doing this.
Arguments:
code points to an expression
start_bits points to a 32-byte table, initialized to 0
caseless the current state of the caseless flag
cd the block with char table pointers
Returns: TRUE if table built, FALSE otherwise
*/
static BOOL
set_start_bits(const uschar *code, uschar *start_bits, BOOL caseless,
compile_data *cd)
{
register int c;
/* This next statement and the later reference to dummy are here in order to
trick the optimizer of the IBM C compiler for OS/2 into generating correct
code. Apparently IBM isn't going to fix the problem, and we would rather not
disable optimization (in this module it actually makes a big difference, and
the pcre module can use all the optimization it can get). */
volatile int dummy;
do
{
const uschar *tcode = code + 3;
BOOL try_next = TRUE;
while (try_next)
{
try_next = FALSE;
/* If a branch starts with a bracket or a positive lookahead assertion,
recurse to set bits from within them. That's all for this branch. */
if ((int)*tcode >= OP_BRA || *tcode == OP_ASSERT)
{
if (!set_start_bits(tcode, start_bits, caseless, cd))
return FALSE;
}
else switch(*tcode)
{
default:
return FALSE;
/* Skip over lookbehind and negative lookahead assertions */
case OP_ASSERT_NOT:
case OP_ASSERTBACK:
case OP_ASSERTBACK_NOT:
try_next = TRUE;
do tcode += (tcode[1] << 8) + tcode[2]; while (*tcode == OP_ALT);
tcode += 3;
break;
/* Skip over an option setting, changing the caseless flag */
case OP_OPT:
caseless = (tcode[1] & PCRE_CASELESS) != 0;
tcode += 2;
try_next = TRUE;
break;
/* BRAZERO does the bracket, but carries on. */
case OP_BRAZERO:
case OP_BRAMINZERO:
if (!set_start_bits(++tcode, start_bits, caseless, cd))
return FALSE;
dummy = 1;
do tcode += (tcode[1] << 8) + tcode[2]; while (*tcode == OP_ALT);
tcode += 3;
try_next = TRUE;
break;
/* Single-char * or ? sets the bit and tries the next item */
case OP_STAR:
case OP_MINSTAR:
case OP_QUERY:
case OP_MINQUERY:
set_bit(start_bits, tcode[1], caseless, cd);
tcode += 2;
try_next = TRUE;
break;
/* Single-char upto sets the bit and tries the next */
case OP_UPTO:
case OP_MINUPTO:
set_bit(start_bits, tcode[3], caseless, cd);
tcode += 4;
try_next = TRUE;
break;
/* At least one single char sets the bit and stops */
case OP_EXACT: /* Fall through */
tcode++;
case OP_CHARS: /* Fall through */
tcode++;
case OP_PLUS:
case OP_MINPLUS:
set_bit(start_bits, tcode[1], caseless, cd);
break;
/* Single character type sets the bits and stops */
case OP_NOT_DIGIT:
for (c = 0; c < 32; c++)
start_bits[c] |= ~cd->cbits[c+cbit_digit];
break;
case OP_DIGIT:
for (c = 0; c < 32; c++)
start_bits[c] |= cd->cbits[c+cbit_digit];
break;
case OP_NOT_WHITESPACE:
for (c = 0; c < 32; c++)
start_bits[c] |= ~cd->cbits[c+cbit_space];
break;
case OP_WHITESPACE:
for (c = 0; c < 32; c++)
start_bits[c] |= cd->cbits[c+cbit_space];
break;
case OP_NOT_WORDCHAR:
for (c = 0; c < 32; c++)
start_bits[c] |= ~(cd->cbits[c] | cd->cbits[c+cbit_word]);
break;
case OP_WORDCHAR:
for (c = 0; c < 32; c++)
start_bits[c] |= (cd->cbits[c] | cd->cbits[c+cbit_word]);
break;
/* One or more character type fudges the pointer and restarts, knowing
it will hit a single character type and stop there. */
case OP_TYPEPLUS:
case OP_TYPEMINPLUS:
tcode++;
try_next = TRUE;
break;
case OP_TYPEEXACT:
tcode += 3;
try_next = TRUE;
break;
/* Zero or more repeats of character types set the bits and then
try again. */
case OP_TYPEUPTO:
case OP_TYPEMINUPTO:
tcode += 2; /* Fall through */
case OP_TYPESTAR:
case OP_TYPEMINSTAR:
case OP_TYPEQUERY:
case OP_TYPEMINQUERY:
switch(tcode[1])
{
case OP_NOT_DIGIT:
for (c = 0; c < 32; c++)
start_bits[c] |= ~cd->cbits[c+cbit_digit];
break;
case OP_DIGIT:
for (c = 0; c < 32; c++)
start_bits[c] |= cd->cbits[c+cbit_digit];
break;
case OP_NOT_WHITESPACE:
for (c = 0; c < 32; c++)
start_bits[c] |= ~cd->cbits[c+cbit_space];
break;
case OP_WHITESPACE:
for (c = 0; c < 32; c++)
start_bits[c] |= cd->cbits[c+cbit_space];
break;
case OP_NOT_WORDCHAR:
for (c = 0; c < 32; c++)
start_bits[c] |= ~(cd->cbits[c] | cd->cbits[c+cbit_word]);
break;
case OP_WORDCHAR:
for (c = 0; c < 32; c++)
start_bits[c] |= (cd->cbits[c] | cd->cbits[c+cbit_word]);
break;
}
tcode += 2;
try_next = TRUE;
break;
/* Character class: set the bits and either carry on or not,
according to the repeat count. */
case OP_CLASS:
{
tcode++;
for (c = 0; c < 32; c++) start_bits[c] |= tcode[c];
tcode += 32;
switch (*tcode)
{
case OP_CRSTAR:
case OP_CRMINSTAR:
case OP_CRQUERY:
case OP_CRMINQUERY:
tcode++;
try_next = TRUE;
break;
case OP_CRRANGE:
case OP_CRMINRANGE:
if (((tcode[1] << 8) + tcode[2]) == 0)
{
tcode += 5;
try_next = TRUE;
}
break;
}
}
break; /* End of class handling */
} /* End of switch */
} /* End of try_next loop */
code += (code[1] << 8) + code[2]; /* Advance to next branch */
}
while (*code == OP_ALT);
return TRUE;
}
/*************************************************
* Study a compiled expression *
*************************************************/
/* This function is handed a compiled expression that it must study to produce
information that will speed up the matching. It returns a pcre_extra block
which then gets handed back to pcre_exec().
Arguments:
external_re points to the compiled expression
options contains option bits
errorptr points to where to place error messages;
set NULL unless error
Returns: pointer to a pcre_extra block,
NULL on error or if no optimization possible
*/
pcre_extra *
pcre_study(const pcre *external_re, int options, const char **errorptr)
{
uschar start_bits[32];
real_pcre_extra *extra;
const real_pcre *re = (const real_pcre *)external_re;
compile_data compile_block;
*errorptr = NULL;
if (re == NULL || re->magic_number != MAGIC_NUMBER)
{
*errorptr = "argument is not a compiled regular expression";
return NULL;
}
if ((options & ~PUBLIC_STUDY_OPTIONS) != 0)
{
*errorptr = "unknown or incorrect option bit(s) set";
return NULL;
}
/* For an anchored pattern, or an unanchored pattern that has a first char, or a
multiline pattern that matches only at "line starts", no further processing at
present. */
if ((re->options & (PCRE_ANCHORED|PCRE_FIRSTSET|PCRE_STARTLINE)) != 0)
return NULL;
/* Set the character tables in the block which is passed around */
compile_block.lcc = re->tables + lcc_offset;
compile_block.fcc = re->tables + fcc_offset;
compile_block.cbits = re->tables + cbits_offset;
compile_block.ctypes = re->tables + ctypes_offset;
/* See if we can find a fixed set of initial characters for the pattern. */
memset(start_bits, 0, 32 * sizeof(uschar));
if (!set_start_bits(re->code, start_bits, (re->options & PCRE_CASELESS) != 0,
&compile_block)) return NULL;
/* Get an "extra" block and put the information therein. */
extra = (real_pcre_extra *)(pcre_malloc)(sizeof(real_pcre_extra));
if (extra == NULL)
{
*errorptr = "failed to get memory";
return NULL;
}
extra->options = PCRE_STUDY_MAPPED;
memcpy(extra->start_bits, start_bits, sizeof(start_bits));
return (pcre_extra *)extra;
}
/* End of study.c */

File diff suppressed because it is too large

@ -1,589 +0,0 @@
/(a)b|/
/abc/
abc
defabc
\Aabc
*** Failers
\Adefabc
ABC
/^abc/
abc
\Aabc
*** Failers
defabc
\Adefabc
/a+bc/
/a*bc/
/a{3}bc/
/(abc|a+z)/
/^abc$/
abc
*** Failers
def\nabc
/ab\gdef/X
/(?X)ab\gdef/X
/x{5,4}/
/z{65536}/
/[abcd/
/[\B]/
/[a-\w]/
/[z-a]/
/^*/
/(abc/
/(?# abc/
/(?z)abc/
/.*b/
/.*?b/
/cat|dog|elephant/
this sentence eventually mentions a cat
this sentences rambles on and on for a while and then reaches elephant
/cat|dog|elephant/S
this sentence eventually mentions a cat
this sentences rambles on and on for a while and then reaches elephant
/cat|dog|elephant/iS
this sentence eventually mentions a CAT cat
this sentences rambles on and on for a while to elephant ElePhant
/a|[bcd]/S
/(a|[^\dZ])/S
/(a|b)*[\s]/S
/(ab\2)/
/{4,5}abc/
/(a)(b)(c)\2/
abcb
\O0abcb
\O3abcb
\O6abcb
\O9abcb
\O12abcb
/(a)bc|(a)(b)\2/
abc
\O0abc
\O3abc
\O6abc
aba
\O0aba
\O3aba
\O6aba
\O9aba
\O12aba
/abc$/E
abc
*** Failers
abc\n
abc\ndef
/(a)(b)(c)(d)(e)\6/
/the quick brown fox/
the quick brown fox
this is a line with the quick brown fox
/the quick brown fox/A
the quick brown fox
*** Failers
this is a line with the quick brown fox
/ab(?z)cd/
/^abc|def/
abcdef
abcdef\B
/.*((abc)$|(def))/
defabc
\Zdefabc
/abc/P
abc
*** Failers
/^abc|def/P
abcdef
abcdef\B
/.*((abc)$|(def))/P
defabc
\Zdefabc
/the quick brown fox/P
the quick brown fox
*** Failers
The Quick Brown Fox
/the quick brown fox/Pi
the quick brown fox
The Quick Brown Fox
/abc.def/P
*** Failers
abc\ndef
/abc$/P
abc
abc\n
/(abc)\2/P
/(abc\1)/P
abc
/)/
/a[]b/
/[^aeiou ]{3,}/
co-processors, and for
/<.*>/
abc<def>ghi<klm>nop
/<.*?>/
abc<def>ghi<klm>nop
/<.*>/U
abc<def>ghi<klm>nop
/<.*>(?U)/
abc<def>ghi<klm>nop
/<.*?>/U
abc<def>ghi<klm>nop
/={3,}/U
abc========def
/(?U)={3,}?/
abc========def
/(?<!bar|cattle)foo/
foo
catfoo
*** Failers
the barfoo
and cattlefoo
/(?<=a+)b/
/(?<=aaa|b{0,3})b/
/(?<!(foo)a\1)bar/
/(?i)abc/
/(a|(?m)a)/
/(?i)^1234/
/(^b|(?i)^d)/
/(?s).*/
/[abcd]/S
/(?i)[abcd]/S
/(?m)[xy]|(b|c)/S
/(^a|^b)/m
/(?i)(^a|^b)/m
/(a)(?(1)a|b|c)/
/(?(?=a)a|b|c)/
/(?(1a)/
/(?(?i))/
/(?(abc))/
/(?(?<ab))/
/((?s)blah)\s+\1/
/((?i)blah)\s+\1/
/((?i)b)/DS
/(a*b|(?i:c*(?-i)d))/S
/a$/
a
a\n
*** Failers
\Za
\Za\n
/a$/m
a
a\n
\Za\n
*** Failers
\Za
/\Aabc/m
/^abc/m
/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/
aaaaabbbbbcccccdef
/(?<=foo)[ab]/S
/(?<!foo)(alpha|omega)/S
/(?!alphabet)[ab]/S
/(?<=foo\n)^bar/m
/(?>^abc)/m
abc
def\nabc
*** Failers
defabc
/(?<=ab(c+)d)ef/
/(?<=ab(?<=c+)d)ef/
/(?<=ab(c|de)f)g/
/The next three are in testinput2 because they have variable length branches/
/(?<=bullock|donkey)-cart/
the bullock-cart
a donkey-cart race
*** Failers
cart
horse-and-cart
/(?<=ab(?i)x|y|z)/
/(?>.*)(?<=(abcd)|(xyz))/
alphabetabcd
endingxyz
/(?<=ab(?i)x(?-i)y|(?i)z|b)ZZ/
abxyZZ
abXyZZ
ZZZ
zZZ
bZZ
BZZ
*** Failers
ZZ
abXYZZ
zzz
bzz
/(?<!(foo)a)bar/
bar
foobbar
*** Failers
fooabar
/This one is here because Perl 5.005_02 doesn't fail it/
/^(a)?(?(1)a|b)+$/
*** Failers
a
/This one is here because I think Perl 5.005_02 gets the setting of $1 wrong/
/^(a\1?){4}$/
aaaaaa
/These are syntax tests from Perl 5.005/
/a[b-a]/
/a[]b/
/a[/
/*a/
/(*)b/
/abc)/
/(abc/
/a**/
/)(/
/\1/
/\2/
/(a)|\2/
/a[b-a]/i
/a[]b/i
/a[/i
/*a/i
/(*)b/i
/abc)/i
/(abc/i
/a**/i
/)(/i
/:(?:/
/(?<%)b/
/a(?{)b/
/a(?{{})b/
/a(?{}})b/
/a(?{"{"})b/
/a(?{"{"}})b/
/(?(1?)a|b)/
/(?(1)a|b|c)/
/[a[:xyz:/
/(?<=x+)y/
/a{37,17}/
/abc/\
/abc/\P
/abc/\i
/(a)bc(d)/
abcd
abcd\C2
abcd\C5
/(.{20})/
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz\C1
abcdefghijklmnopqrstuvwxyz\G1
/(.{15})/
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz\C1\G1
/(.{16})/
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz\C1\G1\L
/^(a|(bc))de(f)/
adef\G1\G2\G3\G4\L
bcdef\G1\G2\G3\G4\L
adefghijk\C0
/^abc\00def/
abc\00def\L\C0
/word ((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+
)((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+
)?)?)?)?)?)?)?)?)?otherword/M
/.*X/D
/.*X/Ds
/(.*X|^B)/D
/(.*X|^B)/Ds
/(?s)(.*X|^B)/D
/(?s:.*X|^B)/D
/\Biss\B/+
Mississippi
/\Biss\B/+P
Mississippi
/iss/G+
Mississippi
/\Biss\B/G+
Mississippi
/\Biss\B/g+
Mississippi
*** Failers
Mississippi\A
/(?<=[Ms])iss/g+
Mississippi
/(?<=[Ms])iss/G+
Mississippi
/^iss/g+
ississippi
/.*iss/g+
abciss\nxyzisspqr
/.i./+g
Mississippi
Mississippi\A
Missouri river
Missouri river\A
/^.is/+g
Mississippi
/^ab\n/g+
ab\nab\ncd
/^ab\n/mg+
ab\nab\ncd
/abc/
/abc|bac/
/(abc|bac)/
/(abc|(c|dc))/
/(abc|(d|de)c)/
/a*/
/a+/
/(baa|a+)/
/a{0,3}/
/baa{3,}/
/"([^\\"]+|\\.)*"/
/(abc|ab[cd])/
/(a|.)/
/a|ba|\w/
/abc(?=pqr)/
/...(?<=abc)/
/abc(?!pqr)/
/ab./
/ab[xyz]/
/abc*/
/ab.c*/
/a.c*/
/.c*/
/ac*/
/(a.c*|b.c*)/
/a.c*|aba/
/.+a/
/(?=abcda)a.*/
/(?=a)a.*/
/a(b)*/
/a\d*/
/ab\d*/
/a(\d)*/
/abcde{0,0}/
/ab\d+/
/a(?(1)b)/
/a(?(1)bag|big)/
/a(?(1)bag|big)*/
/a(?(1)bag|big)+/
/a(?(1)b..|b..)/
/ab\d{0}e/
/a?b?/
a
b
ab
\
*** Failers
\N
/|-/
abcd
-abc
\Nab-c
*** Failers
\Nabc
/.*?/g+
abc
/ End of test input /

File diff suppressed because it is too large

@ -1,64 +0,0 @@
/^[\w]+/
*** Failers
École
/^[\w]+/Lfr
École
/^[\w]+/
*** Failers
École
/^[\W]+/
École
/^[\W]+/Lfr
*** Failers
École
/[\b]/
\b
*** Failers
a
/[\b]/Lfr
\b
*** Failers
a
/^\w+/
*** Failers
École
/^\w+/Lfr
École
/(.+)\b(.+)/
École
/(.+)\b(.+)/Lfr
*** Failers
École
/École/i
École
*** Failers
école
/École/iLfr
École
école
/\w/IS
/\w/ISLfr
/^[\xc8-\xc9]/iLfr
École
école
/^[\xc8-\xc9]/Lfr
École
*** Failers
école

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -1,115 +0,0 @@
PCRE version 2.08 31-Aug-1999
/^[\w]+/
*** Failers
No match
École
No match
/^[\w]+/Lfr
École
0: École
/^[\w]+/
*** Failers
No match
École
No match
/^[\W]+/
École
0: \xc9
/^[\W]+/Lfr
*** Failers
0: ***
École
No match
/[\b]/
\b
0: \x08
*** Failers
No match
a
No match
/[\b]/Lfr
\b
0: \x08
*** Failers
No match
a
No match
/^\w+/
*** Failers
No match
École
No match
/^\w+/Lfr
École
0: École
/(.+)\b(.+)/
École
0: \xc9cole
1: \xc9
2: cole
/(.+)\b(.+)/Lfr
*** Failers
0: *** Failers
1: ***
2: Failers
École
No match
/École/i
École
0: \xc9cole
*** Failers
No match
école
No match
/École/iLfr
École
0: École
école
0: école
/\w/IS
Identifying subpattern count = 0
No options
No first char
No req char
Starting character set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
/\w/ISLfr
Identifying subpattern count = 0
No options
No first char
No req char
Starting character set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å
æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ
/^[\xc8-\xc9]/iLfr
École
0: É
école
0: é
/^[\xc8-\xc9]/Lfr
École
0: É
*** Failers
No match
école
No match