This patch syncs the regex implementation with gnulib (commit 0ee5212).
Only two changes in GLIBC regex testing are required:
1. posix/bug-regex28.c: as previously discussed [1] the change of
expected results on the pattern should be safe.
2. posix/PCRE.tests: the ERE (a)|\1 is malformed (in the sense that
the \1 doesn't mean anything) and, although current GLIBC accepts
it, it has undefined behavior. This patch removes the specific test.
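As a rough illustration of how such a pattern can be probed from user
code, the sketch below compiles a string as an ERE and returns the
regcomp status. The helper name ere_compile_status is hypothetical and
is not part of glibc or of this patch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Compile PATTERN as a POSIX extended regular expression and return the
   regcomp status: 0 on success, otherwise a REG_* error code.
   ere_compile_status is a hypothetical helper, not a glibc function.  */
static int
ere_compile_status (const char *pattern)
{
  regex_t re;
  int status;

  assert (pattern != NULL);
  status = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  if (status == 0)
    regfree (&re);
  return status;
}
```

Calling ere_compile_status ("(a)|\\1") shows what a given libc does
with the removed test pattern. Because the behavior is undefined, the
result may differ between implementations and versions.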
This sync contains some patches from the thread 'Regex: Make libc regex
more usable outside GLIBC.' [2] which have been pushed upstream in
gnulib. This patch also fixes some regex issues (BZ #23233,
BZ #21163, BZ #18986, BZ #13762). I did not add testcases for
either #23233 or #13762 because I couldn't think of a simple way to
trigger the expected failure paths.
Checked on x86_64-linux-gnu and i686-linux-gnu.
[BZ #23233]
[BZ #21163]
[BZ #18986]
[BZ #13762]
* posix/Makefile (tests): Add bug-regex37 and bug-regex38.
* posix/PCRE.tests: Remove invalid test.
* posix/bug-regex28.c: Fix expected values for used syntax.
* posix/bug-regex37.c: New file.
* posix/bug-regex38.c: Likewise.
* posix/regcomp.c: Sync with gnulib.
* posix/regex.c: Likewise.
* posix/regex.h: Likewise.
* posix/regex_internal.c: Likewise.
* posix/regex_internal.h: Likewise.
* posix/regexec.c: Likewise.
[1] https://sourceware.org/ml/libc-alpha/2017-12/msg00807.html
[2] https://sourceware.org/ml/libc-alpha/2017-12/msg00237.html
next_last_offset.
(struct re_dfa_t): Remove unused member states_alloc.
* posix/regcomp.c (init_dfa): Don't initialize unused members.
2005-08-25 Paul Eggert <eggert@cs.ucla.edu>
* posix/regexec.c (set_regs): Don't alloca with an unbounded size.
alloca modernization/simplification for regex.
* posix/regex.c: Remove portability cruft for alloca. This no longer
needs to be at the start of the file, and can be moved into
regex_internal.h and simplified.
* posix/regex_internal.h: Include <alloca.h>.
(__libc_use_alloca) [!defined _LIBC]: New macro.
* posix/regexec.c (build_trtable): Remove "#ifdef _LIBC",
since the code now works outside glibc.
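The bounded-alloca idea in the entries above can be sketched roughly as
follows. The threshold SKETCH_ALLOCA_MAX and the function reverse_bytes
are assumptions made for illustration. The real __libc_use_alloca policy
is internal to glibc and is not reproduced here.

```c
#include <alloca.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold; glibc's own limit is decided elsewhere.  */
#define SKETCH_ALLOCA_MAX 4096

/* Store the N bytes at SRC into DST in reverse order, going through a
   scratch copy.  Small requests use the stack, large ones the heap.
   Return 0 on success, -1 if the heap allocation fails.  */
static int
reverse_bytes (unsigned char *dst, const unsigned char *src, size_t n)
{
  int use_heap = n > SKETCH_ALLOCA_MAX;
  unsigned char *tmp;
  size_t i;

  assert (dst != NULL && src != NULL);
  if (n == 0)
    return 0;
  if (use_heap)
    tmp = malloc (n);
  else
    tmp = alloca (n);		/* bounded, so the stack cannot blow up */
  if (tmp == NULL)
    return -1;
  memcpy (tmp, src, n);
  for (i = 0; i < n; i++)
    dst[i] = tmp[n - 1 - i];
  if (use_heap)
    free (tmp);
  return 0;
}
```

The point of the pattern is that the stack is only used when the size
is known to be small. Anything larger goes through malloc and is freed
explicitly.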
2005-09-06 Ulrich Drepper <drepper@redhat.com>
* include/regex.h: Remove use of _RE_ARGS.
2005-08-25 Paul Eggert <eggert@cs.ucla.edu>
* posix/regexec.c (find_recover_state): Change "err" to "*err".
2005-08-24 Paul Eggert <eggert@cs.ucla.edu>
* posix/regcomp.c (regerror): Pointer args are 'restrict',
as per POSIX.
* posix/regex.h (regerror): Likewise.
* manual/pattern.texi (POSIX Regexp Compilation): Likewise.
Similarly for regcomp and regexec. Also, first 2 args of regexec
and 2nd arg of regerror are const.
* posix/regex.c: Do not include <sys/types.h>, as POSIX no longer
requires this. (The code never needed it.)
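A small sketch of how the const-qualified regerror interface is
typically used: the first call asks for the required buffer size and
the second fills the buffer. The helper name regex_error_message is
hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Return a malloc'd copy of the message for CODE, or NULL if out of
   memory.  RE is only read, matching the const-qualified second
   parameter of regerror.  regex_error_message is a hypothetical name.  */
static char *
regex_error_message (int code, const regex_t *re)
{
  /* With a zero-sized buffer regerror only reports the size needed,
     including the terminating NUL.  */
  size_t need = regerror (code, re, NULL, 0);
  char *msg;

  assert (need > 0);
  msg = malloc (need);
  if (msg != NULL)
    regerror (code, re, msg, need);
  return msg;
}
```

The caller owns the returned string and releases it with free.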
2005-08-20 Paul Eggert <eggert@cs.ucla.edu>
* posix/regexec.c (sift_states_bkref): re_node_set_insert returns
int, not reg_errcode_t.
* posix/regex_internal.c (calc_state_hash): Put 'inline' before type,
since some broken compilers warn about it otherwise.
* posix/regcomp.c (create_initial_state): Remove duplicate decl.
2005-08-20 Paul Eggert <eggert@cs.ucla.edu>
* posix/regex.h (_RE_ARGS): Remove. No longer needed, since we assume
C89 or better. All uses removed.
2005-09-06 Ulrich Drepper <drepper@redhat.com>
* posix/regex.c: Prevent using C++ compilers.
2005-08-19 Paul Eggert <eggert@cs.ucla.edu>
* posix/regcomp.c (duplicate_node): Return new index, not an error
code, and let the caller return REG_ESPACE if out of space. This
removes an uninitialized-variable warning with GCC 4.0.1, and also
avoids taking the address of a local variable. All callers
changed.
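The calling convention described above (return the new index and let
the caller map failure to REG_ESPACE) might look roughly like this.
struct node_table and both functions are hypothetical stand-ins. They
do not mirror the real re_dfa_t layout.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable node table, used only for this sketch.  */
struct node_table
{
  int *nodes;
  size_t used;
  size_t alloc;
};

/* Append VALUE; return its index, or -1 if memory is exhausted.  */
static int
table_add_node (struct node_table *t, int value)
{
  if (t->used == t->alloc)
    {
      size_t new_alloc = t->alloc ? 2 * t->alloc : 4;
      int *p = realloc (t->nodes, new_alloc * sizeof *p);
      if (p == NULL)
	return -1;
      t->nodes = p;
      t->alloc = new_alloc;
    }
  t->nodes[t->used] = value;
  return (int) t->used++;
}

/* Duplicate node ORG_IDX; return the new index, or -1 on failure.  No
   out-parameter is needed, so callers never pass the address of a
   local variable.  */
static int
table_duplicate_node (struct node_table *t, size_t org_idx)
{
  assert (org_idx < t->used);
  return table_add_node (t, t->nodes[org_idx]);
}
```

A caller would test for -1 and return REG_ESPACE itself, which keeps
the error code out of the helper's interface.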
2005-09-06 Ulrich Drepper <drepper@redhat.com>
* include/time.h (__strptime_internal): Rename parameter to avoid
bogus compiler warning.
2005-08-19 Jim Meyering <jim@meyering.net>
* posix/regexec.c (proceed_next_node): Redo local variables to
avoid GCC shadowing warnings.
2005-09-06 Ulrich Drepper <drepper@redhat.com>
* posix/regex_internal.c (re_acquire_state): Minor code rearrangement.
(re_acquire_state_context): Likewise.
2005-08-19 Paul Eggert <eggert@cs.ucla.edu>
* posix/regex_internal.c (re_string_realloc_buffers):
(re_node_set_insert, re_node_set_insert_last, re_dfa_add_node):
Rename local variables to avoid GCC shadowing warnings.
2005-07-08 Eric Blake <ebb9@byu.net>
Paul Eggert <eggert@cs.ucla.edu>
* posix/regcomp.c (init_dfa): Store __btowc value in wint_t, not
wchar_t. Remove now-unnecessary cast.
(build_range_exp): Likewise.
Update.
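The reason for keeping the btowc result in a wint_t can be shown with a
short sketch: WEOF has to be tested before the value is narrowed to
wchar_t. The helper byte_to_wide is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Convert the single byte C to a wide character in the current locale.
   Return 1 and store the result in *OUT on success, 0 if C has no
   single-byte wide equivalent.  byte_to_wide is a hypothetical name.  */
static int
byte_to_wide (int c, wchar_t *out)
{
  /* wint_t is guaranteed to represent WEOF; wchar_t is not, so the
     comparison is done before any conversion to wchar_t.  */
  wint_t wc = btowc (c);

  assert (out != NULL);
  if (wc == WEOF)
    return 0;
  *out = (wchar_t) wc;
  return 1;
}
```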
2004-11-18 Jakub Jelinek <jakub@redhat.com>
[BZ #544]
* posix/regex.h (RE_NO_SUB): New define.
* posix/regex_internal.h (OP_DELETED_SUBEXP): New.
(re_dfa_t): Add subexp_map.
* posix/regcomp.c (struct subexp_optimize): New type.
(optimize_subexps): New routine.
(re_compile_internal): Call it.
(re_compile_pattern): Set preg->no_sub to 1 if RE_NO_SUB.
(free_dfa_content): Free subexp_map.
(calc_inveclosure, calc_eclosure): Skip OP_DELETED_SUBEXP
nodes.
* posix/regexec.c (re_search_internal): If subexp_map
is not NULL, duplicate registers as needed.
* posix/Makefile: Add rules to build and run tst-regex2.
* posix/tst-regex2.c: New test.
* posix/rxspencer/tests: Fix last two tests (\0 -> \1).
Add some new tests for nested subexpressions.
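RE_NO_SUB is a GNU syntax bit for the re_compile_pattern interface. The
sketch below uses the POSIX-level REG_NOSUB flag instead, which also
asks for a match without subexpression reporting. The helper
ere_matches is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if TEXT contains a match for the ERE PATTERN, -1 if PATTERN
   fails to compile, and 0 otherwise.  ere_matches is a hypothetical
   helper.  */
static int
ere_matches (const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  assert (pattern != NULL && text != NULL);
  /* REG_NOSUB: only success or failure is wanted, so no registers are
     requested and nmatch/pmatch are ignored by regexec.  */
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```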
* posix/regcomp.c (parse_expression): If token is OP_OPEN_DUP_NUM
and RE_CONTEXT_INVALID_DUP syntax flag is set, fail.
* posix/regex.h (RE_CONTEXT_INVALID_OPS): New macro.
(RE_SYNTAX_POSIX_BASIC): Use RE_CONTEXT_INVALID_OPS.
* posix/regcomp.c (parse_sub_exp): In case of not-matching ( )
return REG_EPAREN.
2003-09-26 Paolo Bonzini <bonzini@gnu.org>
* posix/regcomp.c (parse_sub_exp): Pass RE_CARET_ANCHORS_HERE
for the first token in a subexpression as well.
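To experiment with a caret that is the first token of a subexpression,
a basic regular expression can be compiled and run as in the sketch
below. The helper bre_matches is hypothetical. POSIX leaves the
treatment of such a caret in a BRE to the implementation, so results
for inputs containing a literal '^' may vary.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if TEXT contains a match for the basic regular expression
   PATTERN, -1 if PATTERN fails to compile, and 0 otherwise.
   bre_matches is a hypothetical helper.  */
static int
bre_matches (const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  assert (pattern != NULL && text != NULL);
  /* No REG_EXTENDED, so PATTERN is parsed as a BRE.  */
  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

For example, bre_matches ("\\(^b\\)", "bcd") finds a match at the start
of the string, while the same pattern does not match "abc".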
2003-10-02 Jakub Jelinek <jakub@redhat.com>
* posix/regcomp.c (peek_token): Add 2003-09-20 changes for anchor
handling again.
(parse_reg_exp): Likewise.
* posix/regex.h (RE_CARET_ANCHORS_HERE): Define.
* posix/bug-regex11.c (tests): Add new tests.
* posix/bug-regex12.c (tests): Add new test.
2003-09-20 Paolo Bonzini <bonzini@gnu.org>
* posix/regcomp.c (peek_token): Don't look back for ( or |
to check whether to treat a caret as special. It fails
for the (extended) regex \(^.
(parse, parse_reg_exp): Pass RE_CARET_ANCHORS_HERE to fetch_token.
* posix/regex.h: Define RE_CARET_ANCHORS_HERE.
* posix/regexec.c: Check out of bounds value before shifting.
* posix/regex_internal.h: Define __attribute for non-gcc.
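The general shape of a bounds check before a shift is sketched below.
It is a generic illustration, not the regexec.c code, and word_bit is a
hypothetical name.

```c
#include <assert.h>
#include <limits.h>

/* Return bit IDX of MASK, or 0 if IDX is outside the word.  Shifting by
   the width of the type or more is undefined in C, so the bound is
   tested before the shift.  */
static int
word_bit (unsigned long mask, unsigned int idx)
{
  if (idx >= sizeof mask * CHAR_BIT)
    return 0;
  return (int) ((mask >> idx) & 1UL);
}
```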
2003-04-16 Jakub Jelinek <jakub@redhat.com>
* elf/Makefile (distribute): Add tst-tlsmod{[7-9],1[0-2]}.c and
tst-tls10.h.
(tests): Add tst-tls1[0-2].
(modules-names): Add tst-tlsmod{[7-8],1[0-2]}.
($(objpfx)tst-tlsmod8.so): Depend on tst-tlsmod7.so.
($(objpfx)tst-tlsmod10.so): Depend on tst-tlsmod9.so.
($(objpfx)tst-tlsmod12.so): Depend on tst-tlsmod11.so.
($(objpfx)tst-tls10): Depend on tst-tlsmod8.so.
($(objpfx)tst-tls11): Depend on tst-tlsmod10.so.
($(objpfx)tst-tls12): Depend on tst-tlsmod12.so.
* elf/tst-tls10.c: New test.
* elf/tst-tls11.c: New test.
* elf/tst-tls12.c: New test.
* elf/tst-tls10.h: New file.
* elf/tst-tlsmod7.c: New file.
* elf/tst-tlsmod8.c: New file.
* elf/tst-tlsmod9.c: New file.
* elf/tst-tlsmod10.c: New file.
* elf/tst-tlsmod11.c: New file.
* elf/tst-tlsmod12.c: New file.
2003-04-15 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/bits/atomic.h: Moved ppc32/64 specific code to ...
* sysdeps/powerpc/powerpc32/bits/atomic.h: New file.
* sysdeps/powerpc/powerpc64/bits/atomic.h: New file.
* posix/regex.h: Include <sys/types.h>.
2002-05-21 Isamu Hasegawa <isamu@yamato.ibm.com>
* posix/regex.c: Define `inline' as an empty macro for compilers
that lack the keyword.
* posix/regex.h (RE_SYNTAX_GNU_AWK): Remove RE_CONTEXT_INVALID_OPS
for compatibility with gawk.
* posix/regcomp.c: Add fake implementation of isblank() for the
environments which lack the function.
Don't use free_charset() in case of non-i18n envs.
(build_range_exp): Don't use i18n related code in case of non-i18n
envs.
(build_collating_symbol): Likewise.
(build_equiv_class): Likewise.
(build_charclass): Likewise.
(re_compile_fastmap_iter): Likewise.
(parse_bracket_exp): Likewise.
(build_word_op): Likewise.
(regfree): Don't use free_charset() in case of non-i18n envs.
* posix/regex_internal.h: Remove COMPLEX_BRACKET from
re_token_type_t in case of non-i18n envs.
Don't define re_charset_t in case of non-i18n envs.
Change the type of wcs of re_string_t from wchar_t to wint_t,
since we also store WEOF.
* posix/regex_internal.c (re_string_realloc_buffers): Change
the type of wcs of re_string_t from wchar_t to wint_t.
(re_string_reconstruct): Likewise.
(create_ci_newstate): Don't use i18n related code in case of
non-i18n envs.
(create_cd_newstate): Likewise.
2002-05-24 Ulrich Drepper <drepper@redhat.com>
* iconv/loop.c: Fix typo.
2002-05-23 Jakub Jelinek <jakub@redhat.com>
* inet/ether_line.c (ether_line): Fix a typo causing only the
lower 4 bits of each ethernet address byte to be assigned.
Don't modify what line points to.
* inet/tst-ether_aton.c (main): Add ether_line tests.
2002-05-23 Marcus Brinkmann <marcus@gnu.org>
* manual/filesys.texi: Don't make readlink example leak memory
when readlink fails.
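A leak-free shape for such a readlink wrapper is sketched below. It is
written from scratch for illustration and is not the text of the manual
example. The name readlink_alloc is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the target of symbolic link PATH in a malloc'd, NUL-terminated
   buffer, or NULL on failure.  readlink_alloc is a hypothetical name.  */
static char *
readlink_alloc (const char *path)
{
  size_t size = 64;

  assert (path != NULL);
  for (;;)
    {
      char *buf = malloc (size);
      ssize_t n;

      if (buf == NULL)
	return NULL;
      n = readlink (path, buf, size);
      if (n < 0)
	{
	  int saved = errno;
	  free (buf);		/* without this the failure path leaks */
	  errno = saved;
	  return NULL;
	}
      if ((size_t) n < size)
	{
	  buf[n] = '\0';	/* readlink does not terminate the string */
	  return buf;
	}
      free (buf);		/* possibly truncated: retry with more room */
      size *= 2;
    }
}
```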
2001-07-06 Paul Eggert <eggert@twinsun.com>
* manual/argp.texi: Remove ignored LGPL copyright notice; it's
not appropriate for documentation anyway.
* manual/libc-texinfo.sh: "Library General Public License" ->
"Lesser General Public License".
2001-07-06 Andreas Jaeger <aj@suse.de>
* All files under GPL/LGPL version 2: Place under LGPL version
2.1.
2000-10-29 Greg Louis <glouis@dynamicro.on.ca>
* posix/regex.h (__restrict_arr): Move definition out of #ifndef block.
Required because egcs-2.91.66 (aka 1.1.2) defines __restrict, but
doesn't define __restrict_arr.
* manual/search.texi: Correct description of VISIT values.
Patch by Ben Collins <bcollins@debian.org>.
* stdlib/random_r.c (__setstate_r): Correct offset when computing
new rptr and fptr. Reported by Michael Fischer <fischer@cs.yale.edu>.
* posix/regex.h: Add macro definitions to allow compiling outside
glibc.
* stdlib/strfmon.c: Don't report an error if final NUL is at the
end of the buffer. Set errno correctly if floating-point number
would overflow buffer.
* posix/regex.h: Update comment of
RE_SYNTAX_POSIX_MINIMAL_EXTENDED.
1998-12-10 Ulrich Drepper <drepper@cygnus.com>
* inet/getnetgrent_r.c (innetgr): Check host and domain name with
strcasecmp, not strcmp. [PR libc/894].
1998-12-08 Andreas Jaeger <aj@arthur.rhein-neckar.de>
* posix/regex.h: Declare re_comp, re_exec if compiling for libc to
get prototypes.
* wctype/wctype.h: Add prototypes for __iswblank_l and iswblank.
1998-12-08 Andreas Jaeger <aj@arthur.rhein-neckar.de>
* sysdeps/unix/sysv/linux/gethostid.c: Include <netdb.h> to get
prototype for __gethostbyname_r.
* include/time.h: Add declarations of internal interfaces.
* time/tzset.c: Remove declarations of internal interfaces.
* time/gmtime.c: Likewise.
* time/localtime.c: Likewise.
* time/offtime.c: Likewise.
* time/tzfile.c: Likewise.
1998-07-09 13:34 Ulrich Drepper <drepper@cygnus.com>
* grp/grp.h: Define gid_t also for __USE_XOPEN.
* io/fcntl.h: Include <sys/stat.h> also for __USE_XOPEN.
* io/utime.h: Define time_t also for __USE_XOPEN.
* io/sys/stat.h: Define time_t also for __USE_XOPEN.
Define *_t types except for pid_t also for __USE_XOPEN.
Define S_* macros also for __USE_XOPEN.
* locale/langinfo.h: Define CODESET, CRNCYSTR, RADIXCHAR, and
THOUSEP also for __USE_XOPEN.
* math/math.c: Define M_* macros also for __USE_XOPEN.
* math/bits/mathcalls.h: Declare hypot also for __USE_XOPEN.
* posix/fnmatch.h: Define FNM_NOSYS also if _XOPEN_SOURCE is
defined.
* posix/glob.h: Likewise for GLOB_NOSYS.
* posix/regex.h: Likewise for REG_NOSYS.
* posix/wordexp.h: Likewise for WRDE_NOSYS.
* posix/unistd.h: Define *_t types also for __USE_XOPEN.
* posix/sys/wait.h: Define pid_t for __USE_XOPEN.
* pwd/pwd.h: Define gid_t and pid_t also for __USE_XOPEN.
* signal/signal.h: Define pid_t also for __USE_XOPEN.
* sysdeps/unix/sysv/linux/bits/fcntl.h: Define O_RSYNC and O_DSYNC also
for __USE_POSIX199309.
* sysdeps/unix/sysv/linux/bits/termios.h: Define the various B*
constants also for __USE_XOPEN.
* wcsmbs/wchar.h: For XPG4 include wctype.h.
* intl/dcgettext.c (find_msg): Initialize act to prevent warning.
* locale/setlocale.c (new_composite_name): Likewise for last_len.
* libio/stdio.h: Don't declare fclose_unlocked.
* sysdeps/posix/fpathconf.c: Handle _PC_FILESIZEBITS.
1998-07-08 Mark Kettenis <kettenis@phys.uva.nl>
* stdio/stdio.h: Add prototypes for fflush_unlocked,
getc_unlocked, getchar_unlocked, putc_unlocked, putchar_unlocked,
fgets_unlocked, fread_unlocked, fwrite_unlocked,
clearerr_unlocked, feof_unlocked, ferror_unlocked,
fileno_unlocked, flockfile, ftrylockfile, funlockfile.
[__USE_XOPEN && !__USE_GNU] Declare optarg, optind, opterr. Add
prototype for getopt.
* stdio/clearerr.c (clearerr_unlocked): Weak alias for clearerr.
* stdio/feof.c (feof_unlocked): Weak alias for feof.
* stdio/ferror.c (ferror_unlocked): Weak alias for ferror.
* stdio/fflush.c (fflush_unlocked): Weak alias for fflush.
* stdio/fgets.c (fgets_unlocked): Weak alias for fgets.
* stdio/fileno.c (fileno_unlocked): Weak alias for fileno.
* stdio/fputc.c (fputc_unlocked): Weak alias for fputc.
* stdio/fread.c (fread_unlocked): Weak alias for fread.
* stdio/fwrite.c (fwrite_unlocked): Weak alias for fwrite.
* stdio/getc.c (getc_unlocked): Weak alias for getc.
* stdio/getchar.c (getchar_unlocked): Weak alias for getchar.
* stdio/putc.c (putc_unlocked): Weak alias for putc.
* stdio/putchar.c (putchar_unlocked): Weak alias for putchar.
* stdio/Versions [GLIBC_2.1]: Add clearerr_unlocked,
feof_unlocked, ferror_unlocked, fflush_unlocked, fgets_unlocked,
fileno_unlocked, fputc_unlocked, fread_unlocked, fwrite_unlocked,
getc_unlocked, getchar_unlocked, putc_unlocked and
putchar_unlocked.
* libio/Versions: Move flockfile, ftrylockfile and funlockfile
from here ...
* stdio-common/Versions: ... to here.
1998-07-09 Andreas Jaeger <aj@arthur.rhein-neckar.de>
* Makerules (versioning): Correct typo.
1998-04-08 20:06 Ulrich Drepper <drepper@cygnus.com>
* iconv/gconv_conf.c (__gconv_read_conf): Use __realpath not realpath.
* iconv/gconv_db.c: Use __ protected regex functions.
* iconv/gconv_simple.c: Use __mbsinit not mbsinit.
* posix/getopt_init.c: Use __getpid not getpid.
* posix/regex.c: Rename all global functions to start with __ and
make old names weak aliases.
* posix/regex.h: Adapt prototypes for this.
* stdlib/canonicalize.c: Define __realpath, make canonicalize_file_name
a weak alias and use __getcwd instead of getcwd.
* stdlib/stdlib.h: Declare __realpath and __canonicalize_file_name.
* stdlib/strtod.c: Use __btowc instead of btowc.
* stdlib/strtol.c: Likewise.
* sysdeps/libm-ieee754/s_matherr.c: Weaken definition of matherr.
* sysdeps/unix/sysv/linux/errlist.c: Make sure definitions of sys_nerr
and sys_errlist are weak.
* wcsmbs/btowc.c: Define function as __btowc and make btowc weak alias.
* wcsmbs/mbrtowc.c: Use __mbsinit not mbsinit.
* wcsmbs/mbsnrtowcs.c: Likewise.
* wcsmbs/mbsrtowcs.c: Likewise.
* wcsmbs/wcsnrtombs.c: Likewise.
* wcsmbs/wcsrtombs.c: Likewise.
* wcsmbs/mbsinit.c: Define function as __mbsinit and make mbsinit
weak alias.
* wcsmbs/wchar.h: Declare __btowc and __mbsinit.
* wctype/wctype.c: Define function as __wctype and make wctype
weak alias.
* wctype/wctype.h: Declare __wctype.
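The "internal name plus weak alias" scheme used throughout the entry
above can be sketched with the GCC alias attribute on an ELF target.
The names impl_twice and twice are hypothetical. Identifiers starting
with __ are reserved for the implementation, so the sketch uses an
ordinary prefix instead.

```c
#include <assert.h>

/* Internal name: the one the library itself would call, so that a
   user-supplied definition of the public name cannot interfere.  */
int
impl_twice (int x)
{
  return 2 * x;
}

/* Public name, emitted as a weak alias for the internal symbol.  This
   relies on a GCC-compatible compiler and an object format with alias
   support, such as ELF.  */
extern int twice (int x) __attribute__ ((weak, alias ("impl_twice")));

/* Both names resolve to the same code.  */
static void
check_alias (void)
{
  assert (twice (21) == impl_twice (21));
}
```

Because the alias is weak, a program that defines its own twice
overrides the public name without affecting internal callers of
impl_twice.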