
268 Commits

Author SHA1 Message Date
Fredrik Lundh
bec95b9d88 rewrote the pattern.sub and pattern.subn methods in C
removed (conceptually flawed) getliteral helper; the new sub/subn code
uses a faster code path for literal replacement strings, but doesn't
(yet) look for literal patterns.

added STATE_OFFSET macro, and use it to convert state.start/ptr to
char indexes
2001-10-21 16:47:57 +00:00
Fredrik Lundh
971e78b55b rewrote the pattern.split method in C
also restored SRE Unicode support for 1.6/2.0/2.1
2001-10-20 17:48:46 +00:00
Fredrik Lundh
397a654791 SRE bug #441409:
compile should raise error for non-strings
SRE bug #432570, 448951:
    reset group after failed match

also bumped version number to 2.2.0
2001-10-18 19:30:16 +00:00
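A quick check of the non-string case in present-day CPython `re` (a descendant of SRE). This assumes Python 3. In current versions the error raised is `TypeError`; the 2001 message text may have differed.

```python
import re

try:
    re.compile(123)  # not a str, bytes, or compiled pattern
    outcome = "compiled"
except TypeError as exc:
    outcome = type(exc).__name__

print(outcome)  # TypeError
```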
Fredrik Lundh
59b68656f8 fixed #449964: sre.sub raises an exception if the template contains a
\g<x> group reference followed by a character escape

(also restructured a few things on the way to fixing #449000)
2001-09-18 20:55:24 +00:00
Fredrik Lundh
21009b9c6f an SRE bugfix a day keeps Guido away...
#462270: sub-tle difference between pre.sub and sre.sub.  PRE ignored
an empty match at the previous location, SRE didn't.

also synced with Secret Labs "sreopen" codebase.
2001-09-18 18:47:09 +00:00
Sjoerd Mullender
89dfe9e292 Removed unreachable return to silence SGI compiler. 2001-08-30 14:37:07 +00:00
Martin v. Löwis
339d0f720e Patch #445762: Support --disable-unicode
- Do not compile unicodeobject, unicodectype, and unicodedata if Unicode is disabled
- check for Py_USING_UNICODE in all places that use Unicode functions
- disables unicode literals, and the builtin functions
- add the types.StringTypes list
- remove Unicode literals from most tests.
2001-08-17 18:39:25 +00:00
Barry Warsaw
214a0b1382 init_sre(): Plug a little leak reported by Insure. 2001-08-16 20:33:48 +00:00
Fredrik Lundh
2d96f11d07 map re.sub() to string.replace(), when possible 2001-07-08 13:26:57 +00:00
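The idea of routing a plain-literal `re.sub()` to `str.replace()` can be sketched in Python as follows. `literal_sub` is a hypothetical helper and not SRE's actual code. It also assumes that "literal" means the pattern is unchanged by `re.escape()` and the replacement contains no backslashes.

```python
import re

def literal_sub(pattern: str, repl: str, string: str, count: int = 0) -> str:
    """Hypothetical helper: use str.replace when no regex features are involved."""
    # re.escape() leaves a string unchanged only if it contains nothing
    # it treats as special, so this is a conservative "plain literal" test.
    is_literal_pattern = re.escape(pattern) == pattern
    # Without backslashes the template has no group references or escapes.
    is_literal_repl = "\\" not in repl
    if is_literal_pattern and is_literal_repl:
        # str.replace uses -1 for "replace all"; re.sub uses 0.
        return string.replace(pattern, repl, count if count > 0 else -1)
    # Anything else goes through the regular engine.
    return re.sub(pattern, repl, string, count=count)

print(literal_sub("ab", "X", "abcab"))  # XcX
```

Both paths scan left to right and replace non-overlapping occurrences, which is why the shortcut gives the same result for literal input.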
Fredrik Lundh
d89a2e7731 bug #416670
added copy/deepcopy support to SRE (still not enabled, since it's not
covered by the test suite)
2001-07-03 20:32:36 +00:00
Fredrik Lundh
df781e6a3f reapplied darryl gallion's minimizing repeat fix. I'm still not 100%
sure about this one, but test #133283 now works even with the fix in
place, and so does the test suite.  we'll see what comes up...
2001-07-02 19:54:28 +00:00
Fredrik Lundh
f71ae461bf pythonware repository roundtrip (untabification) 2001-07-02 17:04:48 +00:00
Fredrik Lundh
19af43d78a added martin's BIGCHARSET patch to SRE 2.1.1. martin reports 2x
speedups for certain unicode character ranges.
2001-07-02 16:58:38 +00:00
Fredrik Lundh
b0f05bdfd3 merged with pythonware's SRE 2.1.1 codebase 2001-07-02 16:42:49 +00:00
Fredrik Lundh
9c7eab82b3 SRE: made "copyright" string static, to avoid potential linking
conflicts.
2001-04-15 19:00:58 +00:00
Fredrik Lundh
b25e1ad253 sre 2.1b2 update:
- take locale into account for word boundary anchors (#410271)
- restored 2.0's *? behaviour (#233283, #408936 and others)
- speed up re.sub/re.subn
2001-03-22 15:50:10 +00:00
Tim Peters
5687ffe0c5 SF patch 404928: Support for next Cygwin gcc (2.95.2-8) 2001-02-28 16:44:18 +00:00
Fredrik Lundh
1c5aa6901f bumped SRE version number to 2.1. cleaned up and added 1.5.2
compatibility patches.
2001-01-16 07:37:30 +00:00
Fredrik Lundh
6f5cba68fc fixed a memory leak in pattern cleanup (patch #103248 by cgw) 2001-01-16 07:05:29 +00:00
Fredrik Lundh
b35ffc0417 added "magic" number to the _sre module, to avoid weird errors caused
by compiler/engine mismatches
2001-01-15 12:46:09 +00:00
Fredrik Lundh
fa25a7d51f -- don't use recursion for unbounded non-greedy repeat
(bugs #115903, #115696)

This is based on a patch by Darrel Gallion.  I'm not 100%
sure about this fix, but I haven't managed to come up with
any test case it cannot handle...
2001-01-14 23:55:55 +00:00
Fredrik Lundh
770617b23e SRE fixes for 2.1 alpha:
-- added some more docstrings
-- fixed typo in scanner class (#125531)
-- the multiline flag (?m) shouldn't affect the \Z operator (#127259)
-- fixed non-greedy backtracking bug (#123769, #127259)
-- added sre.DEBUG flag (currently dumps the parsed pattern structure)
-- fixed a couple of glitches in groupdict (the #126587 memory leak
   had already been fixed by AMK)
2001-01-14 15:06:11 +00:00
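This sketch shows the `\Z` rule as it behaves in current CPython `re` (assuming Python 3). `MULTILINE` changes what `$` means but leaves `\Z` alone.

```python
import re

text = "a\nb"
# With re.MULTILINE, $ may match just before any newline...
dollar = re.search(r"a$", text, re.MULTILINE)
# ...but \Z still only matches at the very end of the string.
end_z = re.search(r"a\Z", text, re.MULTILINE)

print(dollar is not None)  # True
print(end_z is None)       # True
```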
Andrew M. Kuchling
48f224c877 Fix bug 126587: matchobject.groupdict() leaks memory because of a missing
DECREF
2000-12-22 14:39:10 +00:00
Fredrik Lundh
ebc37b28fa -- properly reset groups in findall (bug #117612)
-- fixed negative lookbehind to work correctly at the beginning
of the target string (bug #117242)

-- improved syntax check; you can no longer refer to a group
inside itself (bug #110866)
2000-10-28 19:30:41 +00:00
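Below is how the two fixes described above look in today's `re`, assuming Python 3. The exact error message text is version-dependent, so it is not shown.

```python
import re

# A negative lookbehind at position 0 has nothing before it, so it succeeds.
m = re.match(r"(?<!x)a", "abc")
print(m.group())  # a

# Referring to a group from inside that same group is rejected at compile time.
try:
    re.compile(r"(a\1)")
    self_ref_rejected = False
except re.error:
    self_ref_rejected = True
print(self_ref_rejected)  # True
```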
Fredrik Lundh
562586eb3a Accept keyword arguments for (most) pattern and match object
methods.  Closes buglet #115845.
2000-10-03 20:43:34 +00:00
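This example uses keyword arguments with compiled-pattern methods, assuming the parameter names in current CPython (`string`, `pos`, `endpos` for `search`; `repl`, `string`, `count` for `sub`).

```python
import re

pattern = re.compile(r"a")
# pos/endpos limit the search window to "xa" (indexes 1 and 2) of "xxab".
m = pattern.search(string="xxab", pos=1, endpos=3)
print(m.start())  # 2

# sub() accepts its arguments by name as well.
print(pattern.sub(repl="-", string="aaa", count=2))  # --a
```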
Fredrik Lundh
65d4bc616a Fixed negative lookahead/lookbehind. Closes bug #115618. 2000-10-03 16:29:23 +00:00
Fred Drake
d5fadf75e4 Rationalize use of limits.h, moving the inclusion to Python.h.
Add definitions of INT_MAX and LONG_MAX to pyport.h.
Remove includes of limits.h and conditional definitions of INT_MAX
and LONG_MAX elsewhere.

This closes SourceForge patch #101659 and bug #115323.
2000-09-26 05:46:01 +00:00
Fredrik Lundh
5644b7fad1 - fixed yet another gcc -pedantic warning
- added experimental "expand" method to match objects
- don't use the buffer interface on unicode strings
2000-09-21 17:03:25 +00:00
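The "expand" method survives in today's `re` as `Match.expand()`. It applies `sub()`-style template syntax to a single match. This sketch assumes Python 3.

```python
import re

m = re.match(r"(\w+) (\w+)", "hello world")
print(m.expand(r"\2 \1"))  # world hello

# Named groups work through the \g<name> form.
named = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Ada Lovelace")
print(named.expand(r"\g<last>, \g<first>"))  # Lovelace, Ada
```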
Fredrik Lundh
510c97ba2f return -1 for undefined groups (as implemented in 1.5.2) instead of
None (as documented) from start/end/span.  closes bug #113254
2000-09-02 16:36:57 +00:00
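Current CPython keeps the -1 convention. This sketch, assuming Python 3, shows a group that did not take part in the match.

```python
import re

m = re.match(r"(a)|b", "b")  # only the "b" alternative matched
print(m.group(1))  # None
print(m.start(1))  # -1
print(m.span(1))   # (-1, -1)
```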
Fredrik Lundh
e67d8e514f oops. accidentally reintroduced a memory leak. put the bugfix back. 2000-08-27 21:32:46 +00:00
Fredrik Lundh
33accc1f5c don't mistake memory errors (including reaching the recursion limit)
for success.  also, check return values from the mark functions.

this addresses (but doesn't really solve) bug #112693, and low-memory
problems reported by jack jansen.
2000-08-27 20:59:47 +00:00
Barry Warsaw
152fbe88e9 pattern_findall(): Plug small memory leak discovered by Insure.
PyList_Append() always increfs the inserted item.  Be sure to decref
it regardless of whether the append succeeds or fails.
2000-08-18 05:09:50 +00:00
Trent Mick
239548f37d The sre test suite currently overruns the stack on Win64, Linux64, and Monterey
(64-bit AIX). This is because the RECURSION_LIMIT is too high. This patch lowers
the recursion limit to 7500 so that the recursion check fires before a segfault.

Fredrik suggested/approved the fix in private email, modulo sre's recursion
limit checking not being necessary when PyOS_CheckStack is implemented for
Windows.
2000-08-16 22:29:55 +00:00
Fredrik Lundh
5810064476 -- changed findall to return empty strings instead of None
for undefined groups
2000-08-09 09:14:35 +00:00
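The `findall` rule as it works in current CPython (assuming Python 3): with several groups, each match yields a tuple, and a group that did not participate shows up as an empty string.

```python
import re

pairs = re.findall(r"(a)|(b)", "ab")
print(pairs)  # [('a', ''), ('', 'b')]
```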
Jack Jansen
0d15908629 Added a missing } in the USE_STACKCHECK code. 2000-08-07 21:02:50 +00:00
Fredrik Lundh
7898c3e685 -- reset marks if repeat_one tail doesn't match
(this should fix Sjoerd's xmllib problem)
-- added skip field to INFO header
-- changed compiler to generate charset INFO header
-- changed trace messages to support post-mortem analysis
2000-08-07 20:59:04 +00:00
Fredrik Lundh
18c2aa25a1 + if USE_STACKCHECK is defined, use PyOS_CheckStack to look
for excessive recursion.
2000-08-07 17:33:38 +00:00
Fredrik Lundh
96ab46529b -- added recursion limit (currently ~10,000 levels)
-- improved error messages
-- factored out SRE_COUNT; the same code is used by
   SRE_OP_REPEAT_ONE_TEMPLATE
-- minor cleanups
2000-08-03 16:29:50 +00:00
Fredrik Lundh
e186983842 final 0.9.8 updates:
-- added REPEAT_ONE operator
-- added ANY_ALL operator (used to represent "(?s).")
2000-08-01 22:47:49 +00:00
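The `(?s).` construct mentioned above means "any character, including newline". This is a short check against current `re`, assuming Python 3.

```python
import re

# Without DOTALL, "." does not match a newline...
plain = re.match(r".", "\n")
# ...with the inline (?s) flag (re.DOTALL) it matches any character.
dotall = re.match(r"(?s).", "\n")

print(plain)               # None
print(dotall is not None)  # True
```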
Fredrik Lundh
2f2c67d7e5 -- fixed width calculations for alternations
-- fixed literal check in branch operator
   (this broke test_tokenize, as reported by Mark Favas)
-- added REPEAT_ONE operator (still not enabled, though)
-- added some debugging stuff (maxlevel)
2000-08-01 21:05:41 +00:00
Fredrik Lundh
29c4ba9ada SRE 0.9.8: passes the entire test suite
-- reverted REPEAT operator to use "repeat context" strategy
   (from 0.8.X), but done right this time.
-- got rid of backtracking stack; use nested SRE_MATCH calls
   instead (should probably put it back again in 0.9.9 ;-)
-- properly reset state in scanner mode
-- don't use aggressive inlining by default
2000-08-01 18:20:07 +00:00
Fredrik Lundh
8a3ebf8ca8 -- SRE 0.9.6 sync. this includes:
 + added "regs" attribute
 + fixed "pos" and "endpos" attributes
 + reset "lastindex" and "lastgroup" in scanner methods
 + removed (?P#id) syntax; the "lastindex" and "lastgroup"
   attributes are now always set
 + removed string module dependencies in sre_parse
 + better debugging support in sre_parse
 + various tweaks to build under 1.5.2
2000-07-23 21:46:17 +00:00
Thomas Wouters
f3f33dcf03 Bunch of minor ANSIfications: 'void initfunc()' -> 'void initfunc(void)',
and a couple of functions that were missed in the previous batches. Not
terribly tested, but very carefully scrutinized, three times.

All these were found by the little findkrc.py that I posted to python-dev,
which means there might be more lurking. Cases such as this:

long
func(a, b)
	long a;
	long b; /* flagword */
{

and other cases where the last ; in the argument list isn't followed by a
newline and an opening curly bracket. Regexps to catch all are welcome, of
course ;)
2000-07-21 06:00:07 +00:00
Jeremy Hylton
03657cfdb0 replace PyXXX_Length calls with PyXXX_Size calls 2000-07-12 13:05:33 +00:00
Fredrik Lundh
2855290b84 maintenance release:
- reorganized some code to get rid of -Wall and -W4
  warnings

- fixed default argument handling for sub/subn/split
  methods (reported by Peter Schneider-Kamp).
2000-07-05 21:14:16 +00:00
Fredrik Lundh
72b82ba16d - fixed grouping error bug
- changed "group" operator to "groupref"
2000-07-03 21:31:48 +00:00
Fredrik Lundh
6f01398236 - added lookbehind support (?<=pattern), (?<!pattern).
the pattern must have a fixed width.

- got rid of array-module dependencies; the match
  program is now stored inside the pattern object, rather
  than in an extra string buffer.

- cleaned up a variety of potential leaks, API abuses,
  and other minor issues in the engine module.

- use mal's new isalnum macro, rather than my own
  workaround.

- untabified test_sre.py.  seems like I removed a couple
  of trailing spaces in the process...
2000-07-03 18:44:21 +00:00
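This sketch, assuming Python 3, shows both lookbehind forms and the fixed-width requirement as they behave in current CPython `re`.

```python
import re

# Positive lookbehind: digits that are preceded by "$".
price = re.search(r"(?<=\$)\d+", "cost: $42")
print(price.group())  # 42

# Negative lookbehind: a number that does NOT follow "$".
qty = re.search(r"(?<!\$)\b\d+", "cost: $42, qty: 7")
print(qty.group())  # 7

# A variable-width lookbehind is rejected at compile time.
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False
```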
Fredrik Lundh
c2301730b8 - experimental: added two new attributes to the match object:
"lastgroup" is the name of the last matched capturing group,
  "lastindex" is the index of the same group.  if no group was
  matched, both attributes are set to None.

  the (?P#) feature will be removed in the next release.
2000-07-02 22:25:39 +00:00
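A typical use of `lastgroup` is a simple tokenizer that checks which named alternative matched. `TOKEN_RE` and `tokenize` are hypothetical names for this sketch, and the sketch assumes Python 3. Note that `finditer` silently skips characters that no alternative matches.

```python
import re

# Hypothetical token spec: each alternative is a named group.
TOKEN_RE = re.compile(
    r"(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[+\-*/=])|(?P<SKIP>\s+)"
)

def tokenize(text):
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup  # name of the alternative that matched
        if kind != "SKIP":
            tokens.append((kind, m.group()))
    return tokens

print(tokenize("x = 42 + y"))
# [('NAME', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('NAME', 'y')]

# With no capturing group matched, both attributes are None.
bare = re.match(r"a", "a")
print(bare.lastgroup, bare.lastindex)  # None None
```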
Fredrik Lundh
7cafe4d7e4 - actually enabled charset anchors in the engine (still not
used by the code generator)

- changed max repeat value in engine (to match earlier array fix)

- added experimental "which part matched?" mechanism to sre; see
  http://hem.passagen.se/eff/2000_07_01_bot-archive.htm#416954
  or python-dev for details.
2000-07-02 17:33:27 +00:00
Fredrik Lundh
3562f11764 -- use charset bitmaps where appropriate. this gives a 5-10%
speedup for some tests, including the python tokenizer.

-- added support for an optional charset anchor to the engine
   (currently unused by the code generator).

-- removed workaround for array module bug.
2000-07-02 12:00:07 +00:00
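The general idea of a charset bitmap is to use one bit per 8-bit character, so a membership test becomes a single index and mask. The Python sketch below illustrates this. `make_bitmap` and `in_bitmap` are hypothetical helpers, and the 32-byte layout is an assumption. SRE itself packs the bits into its own code words.

```python
def make_bitmap(chars: str) -> bytearray:
    """Hypothetical helper: one bit per code point 0-255 (32 bytes total)."""
    bitmap = bytearray(32)
    for ch in chars:
        code = ord(ch)
        if code >= 256:
            raise ValueError("this bitmap only covers code points 0-255")
        bitmap[code >> 3] |= 1 << (code & 7)  # byte index, then bit within it
    return bitmap

def in_bitmap(bitmap: bytearray, ch: str) -> bool:
    code = ord(ch)
    # Characters outside 0-255 can never be in an 8-bit bitmap.
    return code < 256 and bool(bitmap[code >> 3] & (1 << (code & 7)))

digits = make_bitmap("0123456789")
print(in_bitmap(digits, "7"), in_bitmap(digits, "x"))  # True False
```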
Fredrik Lundh
c13222cdff - fixed "{ in any other context" bug
- minor comment touchups in the C module
2000-07-01 23:49:14 +00:00
Fredrik Lundh
22d2546520 today's SRE update:
-- changed 1.6 to 2.0 in the file headers

-- fixed ISALNUM macro for the unicode locale.  this
   solution isn't perfect, but the best I can do with
   Python's current unicode database.
2000-07-01 17:50:59 +00:00
Fredrik Lundh
ef34bd2c0d -- changed $ to match before a trailing newline, even
if the multiline flag isn't given.
2000-06-30 21:40:20 +00:00
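The `$` rule is still in place in current CPython `re`. This sketch assumes Python 3.

```python
import re

# Even without re.MULTILINE, $ matches just before a final "\n"...
before_newline = re.search(r"a$", "a\n")
# ...while \Z requires the true end of the string.
true_end = re.search(r"a\Z", "a\n")

print(before_newline is not None)  # True
print(true_end is None)            # True
```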
Fredrik Lundh
0640e1161f the mad patcher strikes again:
-- added pickling support (only works if sre is imported)

-- fixed wordsize problems in engine
   (instead of casting literals down to the character size,
   cast characters up to the literal size, which is the same as
   the code word size).  this prevents false hits when you're matching
   a unicode pattern against an 8-bit string. (unfortunately,
   this broke another test, but I think the test should be
   changed in this case; more on that on python-dev)

-- added sre.purge function
   (unofficial, clears the cache)
2000-06-30 13:55:15 +00:00
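A minimal round trip of a compiled pattern through `pickle`, assuming Python 3. In current CPython the pattern is reduced to its pattern string and flags and is recompiled when loaded.

```python
import pickle
import re

pattern = re.compile(r"a+", re.IGNORECASE)
restored = pickle.loads(pickle.dumps(pattern))

print(restored.pattern, restored.flags == pattern.flags)  # a+ True
print(restored.match("AAA").group())                      # AAA
```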
Fredrik Lundh
43b3b49b5a - fixed lookahead assertions (#10, #11, #12)
- untabified sre_constants.py
2000-06-30 10:41:31 +00:00
Fredrik Lundh
df02d0b3f0 - fixed default value handling in group/groupdict
- added test suite
2000-06-30 07:08:20 +00:00
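Default-value handling as it works in today's `re` (assuming Python 3). An optional group that did not match is `None` unless you pass a default.

```python
import re

m = re.match(r"(?P<sign>-)?(?P<num>\d+)", "42")
print(m.groupdict())             # {'sign': None, 'num': '42'}
print(m.groupdict(default="+"))  # {'sign': '+', 'num': '42'}
print(m.groups("+"))             # ('+', '42')
```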
Fredrik Lundh
01016fe972 - fixed split behaviour on empty matches
- fixed compiler problems when using locale/unicode flags

- fixed group/octal code parsing in sub/subn templates
2000-06-30 00:27:46 +00:00
Fredrik Lundh
29c08beab0 still trying to figure out how to fix the remaining
group reset problem.  in the meantime, I added some
optimizations:

- added "inline" directive to LOCAL

  (this assumes that AC_C_INLINE does what it's
  supposed to do).  to compile SRE on a non-unix
  platform that doesn't support inline, you have
  to add a "#define inline" somewhere...

- added code to generate a SRE_OP_INFO primitive

- added code to do fast prefix search

  (enabled by the USE_FAST_SEARCH define; default
  is on, in this release)
2000-06-29 23:33:12 +00:00
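The general shape of a fast prefix search is to jump straight to each occurrence of a literal prefix that every match must begin with, and only run the full matcher there. The sketch below is hypothetical: `prefix_search` is not an SRE function, `str.find` stands in for the engine's own prefix scan, and the caller must guarantee that every match really starts with `prefix`.

```python
import re

def prefix_search(prefix: str, pattern: re.Pattern, text: str):
    """Hypothetical sketch: try the full pattern only where the prefix occurs."""
    pos = 0
    while True:
        pos = text.find(prefix, pos)  # cheap literal scan
        if pos == -1:
            return None
        m = pattern.match(text, pos)  # full match attempt at this spot
        if m:
            return m
        pos += 1  # keep scanning after this occurrence

p = re.compile(r"foo\d+")
m = prefix_search("foo", p, "foox foo42")
print(m.start(), m.group())  # 5 foo42
```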
Fredrik Lundh
8094611eb8 - fixed another split problem
(those semantics are weird...)

- got rid of $Id$'s (for the moment, at least).  in other
  words, there should be no more "empty" checkins.

- internal: some minor cleanups.
2000-06-29 18:03:25 +00:00
Fredrik Lundh
be2211e940 - fixed split
(test_sre still complains about split, but that's caused by
  the group reset bug, not split itself)

- added more mark slots
  (should be dynamically allocated, but 100 is better than 32.
  and checking for the upper limit is better than overwriting
  the memory ;-)

- internal: renamed the cursor helper class

- internal: removed some bloat from sre_compile
2000-06-29 16:57:40 +00:00
Fredrik Lundh
b389df3402 - renamed "tolower" hook (it happened to work with
my compiler, but not on guido's box...)
2000-06-29 12:48:37 +00:00
Fredrik Lundh
75f2d675ed - last patch broke parse_template; fixed by changing some
tests in sre_patch back to previous version

- fixed return value from findall

- renamed a bunch of functions inside _sre (way too
  many leading underscores...)

</F>
2000-06-29 11:34:28 +00:00
Fredrik Lundh
6c68dc7b1a - removed "alpha only" licensing restriction
- removed some hacks that worked around 1.6 alpha bugs
- removed bogus test code from sre_parse
2000-06-29 10:34:56 +00:00
Fredrik Lundh
436c3d58a2 towards 1.6b1 2000-06-29 08:58:44 +00:00
Jeremy Hylton
b1aa19515f Fredrik Lundh: here's the 96.6% version of SRE 2000-06-01 17:39:12 +00:00
Guido van Rossum
b18618dab7 Vladimir Marangozov's long-awaited malloc restructuring.
For more comments, read the patches@python.org archives.
For documentation read the comments in mymalloc.h and objimpl.h.

(This is not exactly what Vladimir posted to the patches list; I've
made a few changes, and Vladimir sent me a fix in private email for a
problem that only occurs in debug mode.  I'm also holding back on his
change to main.c, which seems unnecessary to me.)
2000-05-03 23:44:39 +00:00
Guido van Rossum
29530886af Remove CRLF line endings.
Fredrik Lundh: add two missing casts.
2000-04-10 17:06:55 +00:00
Guido van Rossum
b700df9824 Adding Fredrik Lundh's _sre.c module and its header files.
NOTE: THIS IS VERY ROUGH ALPHA CODE!
2000-03-31 14:59:30 +00:00