Commit Graph

51 Commits

Author SHA1 Message Date
Guido van Rossum
8ca162f417 Partial introduction of bools where appropriate. 2002-04-07 06:36:23 +00:00
Neal Norwitz
f7fdedc320 SF #515022 remove unused variable 2002-02-11 18:18:29 +00:00
Fredrik Lundh
82b230732f bug #133283, #477728, #483789, #490573
backed out of broken minimal repeat patch from July

also fixed a couple of minor potential resource leaks in pattern_subx
(Guido had already fixed the big one)
2001-12-09 16:13:15 +00:00
Tim Peters
7533587d43 Improved error msg when a symbolic group name is redefined. Added docs
and NEWS.  Bugfix candidate?  That's a dilemma for Anthony <wink>:  /F
did fix a longstanding bug here, but the fix can cause code to raise an
exception that previously worked by accident.
2001-11-03 19:35:43 +00:00
Fredrik Lundh
8a0232d84f SF bug #476912: flag repeated use of the same groupname as
the error it really is (and always has been)
2001-11-02 13:59:51 +00:00
Fredrik Lundh
59b68656f8 fixed #449964: sre.sub raises an exception if the template contains a
\g<x> group reference followed by a character escape

(also restructured a few things on the way to fixing #449000)
2001-09-18 20:55:24 +00:00
Fred Drake
b8f2274985 Added docstrings by Neal Norwitz. This closes SF bug #450980. 2001-09-04 19:10:20 +00:00
Fredrik Lundh
b25e1ad253 sre 2.1b2 update:
- take locale into account for word boundary anchors (#410271)
- restored 2.0's *? behaviour (#233283, #408936 and others)
- speed up re.sub/re.subn
2001-03-22 15:50:10 +00:00
Fredrik Lundh
c0c7ee3a65 detect attempts to repeat anchors (fixes bug #130748) 2001-02-18 21:04:48 +00:00
Fredrik Lundh
f2989b22ff - restored 1.5.2 compatibility (sorry, eric)
- removed __all__ cruft from internal modules (sorry, skip)
- don't assume ASCII for string escapes (sorry, per)
2001-02-18 12:05:16 +00:00
Skip Montanaro
0de65807e6 bunch more __all__ lists
also modified check_all function to suppress all warnings since they aren't
relevant to what this test is doing (allows quiet checking of regsub, for
instance)
2001-02-15 22:15:14 +00:00
Eric S. Raymond
fc170b1fd5 String method conversion. 2001-02-09 11:51:27 +00:00
Fredrik Lundh
1c5aa6901f bumped SRE version number to 2.1. cleaned up and added 1.5.2
compatibility patches.
2001-01-16 07:37:30 +00:00
Fredrik Lundh
470ea5ab94 SRE: stricter pattern syntax checking (covers parts of bug #115900) 2001-01-14 21:00:44 +00:00
Fredrik Lundh
770617b23e SRE fixes for 2.1 alpha:
-- added some more docstrings
-- fixed typo in scanner class (#125531)
-- the multiline flag (?m) should't affect the \Z operator (#127259)
-- fixed non-greedy backtracking bug (#123769, #127259)
-- added sre.DEBUG flag (currently dumps the parsed pattern structure)
-- fixed a couple of glitches in groupdict (the #126587 memory leak
   had already been fixed by AMK)
2001-01-14 15:06:11 +00:00
Fredrik Lundh
ebc37b28fa -- properly reset groups in findall (bug #117612)
-- fixed negative lookbehind to work correctly at the beginning
of the target string (bug #117242)

-- improved syntax check; you can no longer refer to a group
inside itself (bug #110866)
2000-10-28 19:30:41 +00:00
Fredrik Lundh
13ac9926ac Fixed too ambitious "nothing to repeat" check. Closes bug #114033. 2000-10-07 17:38:23 +00:00
Fredrik Lundh
025468d246 SRE didn't handle character category followed by hyphen inside a
character class.  Fix provided by Andrew Kuchling.  Closes bug
#116251.
2000-10-07 10:16:19 +00:00
Fredrik Lundh
d11b5e54f0 Recompile pattern if (?x) flag was found inside the pattern during the
first scan.  Closes bug #115040.
2000-10-03 19:22:26 +00:00
Fredrik Lundh
19f977ba40 - don't hang if group id is followed by whitespace (closes bug #114660) 2000-09-24 14:46:23 +00:00
Fredrik Lundh
143328ba63 -- tightened up parsing of octal numbers
-- improved the SRE test harness: don't use asserts, test a few more
   things (including more boundary conditions)
2000-09-02 11:03:34 +00:00
Tim Peters
17289426e2 SourceForge patch 101396, by an anonymous friend.
"sre_parse.py missing '7' in DIGITS"
2000-09-02 07:44:32 +00:00
Fredrik Lundh
0c4fdbaee8 closes bug #112468 (and all the other bugs that surfaced when
I fixed the a bug in the regression test harness...)
2000-08-31 22:57:55 +00:00
Fredrik Lundh
7898c3e685 -- reset marks if repeat_one tail doesn't match
(this should fix Sjoerd's xmllib problem)
-- added skip field to INFO header
-- changed compiler to generate charset INFO header
-- changed trace messages to support post-mortem analysis
2000-08-07 20:59:04 +00:00
Fredrik Lundh
e186983842 final 0.9.8 updates:
-- added REPEAT_ONE operator
-- added ANY_ALL operator (used to represent "(?s).")
2000-08-01 22:47:49 +00:00
Fredrik Lundh
2f2c67d7e5 -- fixed width calculations for alternations
-- fixed literal check in branch operator
   (this broke test_tokenize, as reported by Mark Favas)
-- added REPEAT_ONE operator (still not enabled, though)
-- added some debugging stuff (maxlevel)
2000-08-01 21:05:41 +00:00
Fredrik Lundh
29c4ba9ada SRE 0.9.8: passes the entire test suite
-- reverted REPEAT operator to use "repeat context" strategy
   (from 0.8.X), but done right this time.
-- got rid of backtracking stack; use nested SRE_MATCH calls
   instead (should probably put it back again in 0.9.9 ;-)
-- properly reset state in scanner mode
-- don't use aggressive inlining by default
2000-08-01 18:20:07 +00:00
Fredrik Lundh
8a3ebf8ca8 -- SRE 0.9.6 sync. this includes:
+ added "regs" attribute
 + fixed "pos" and "endpos" attributes
 + reset "lastindex" and "lastgroup" in scanner methods
 + removed (?P#id) syntax; the "lastindex" and "lastgroup"
   attributes are now always set
 + removed string module dependencies in sre_parse
 + better debugging support in sre_parse
 + various tweaks to build under 1.5.2
2000-07-23 21:46:17 +00:00
Fredrik Lundh
72b82ba16d - fixed grouping error bug
- changed "group" operator to "groupref"
2000-07-03 21:31:48 +00:00
Fredrik Lundh
6f01398236 - added lookbehind support (?<=pattern), (?<!pattern).
the pattern must have a fixed width.

- got rid of array-module dependencies; the match pro-
  gram is now stored inside the pattern object, rather
  than in an extra string buffer.

- cleaned up a various of potential leaks, api abuses,
  and other minors in the engine module.

- use mal's new isalnum macro, rather than my own work-
  around.

- untabified test_sre.py.  seems like I removed a couple
  of trailing spaces in the process...
2000-07-03 18:44:21 +00:00
Fredrik Lundh
c2301730b8 - experimental: added two new attributes to the match object:
"lastgroup" is the name of the last matched capturing group,
  "lastindex" is the index of the same group.  if no group was
  matched, both attributes are set to None.

  the (?P#) feature will be removed in the next relase.
2000-07-02 22:25:39 +00:00
Fredrik Lundh
7cafe4d7e4 - actually enabled charset anchors in the engine (still not
used by the code generator)

- changed max repeat value in engine (to match earlier array fix)

- added experimental "which part matched?" mechanism to sre; see
  http://hem.passagen.se/eff/2000_07_01_bot-archive.htm#416954
  or python-dev for details.
2000-07-02 17:33:27 +00:00
Fredrik Lundh
3562f11764 -- use charset bitmaps where appropriate. this gives a 5-10%
speedup for some tests, including the python tokenizer.

-- added support for an optional charset anchor to the engine
   (currently unused by the code generator).

-- removed workaround for array module bug.
2000-07-02 12:00:07 +00:00
Fredrik Lundh
c13222cdff - fixed "{ in any other context" bug
- minor comment touchups in the C module
2000-07-01 23:49:14 +00:00
Fredrik Lundh
22d2546520 today's SRE update:
-- changed 1.6 to 2.0 in the file headers

-- fixed ISALNUM macro for the unicode locale.  this
   solution isn't perfect, but the best I can do with
   Python's current unicode database.
2000-07-01 17:50:59 +00:00
Fredrik Lundh
55a4f4a528 - fixed code generation error in multiline mode
- fixed parser flag propagation (of all stupid bugs...)
2000-06-30 22:37:31 +00:00
Fredrik Lundh
4ccea94152 - reverted to "\x is binary byte"
- removed evil tabs from sre_parse and sre_compile
2000-06-30 18:39:20 +00:00
Fredrik Lundh
0640e1161f the mad patcher strikes again:
-- added pickling support (only works if sre is imported)

-- fixed wordsize problems in engine
   (instead of casting literals down to the character size,
   cast characters up to the literal size (same as the code
   word size).  this prevents false hits when you're matching
   a unicode pattern against an 8-bit string. (unfortunately,
   this broke another test, but I think the test should be
   changed in this case; more on that on python-dev)

-- added sre.purge function
   (unofficial, clears the cache)
2000-06-30 13:55:15 +00:00
Fredrik Lundh
43b3b49b5a - fixed lookahead assertions (#10, #11, #12)
- untabified sre_constants.py
2000-06-30 10:41:31 +00:00
Fredrik Lundh
b71624e698 - added support for (?P=name)
(closes #3 and #7 from the status report)
2000-06-30 09:13:06 +00:00
Fredrik Lundh
90a0791322 - pedantic: make sure "python -t" doesn't complain... 2000-06-30 07:50:59 +00:00
Fredrik Lundh
01016fe972 - fixed split behaviour on empty matches
- fixed compiler problems when using locale/unicode flags

- fixed group/octal code parsing in sub/subn templates
2000-06-30 00:27:46 +00:00
Fredrik Lundh
8094611eb8 - fixed another split problem
(those semantics are weird...)

- got rid of $Id$'s (for the moment, at least).  in other
  words, there should be no more "empty" checkins.

- internal: some minor cleanups.
2000-06-29 18:03:25 +00:00
Fredrik Lundh
4781b07201 - make sure group names are valid identifiers
(closes the "SRE: symbolic reference" bug)
2000-06-29 12:38:45 +00:00
Fredrik Lundh
75f2d675ed - last patch broke parse_template; fixed by changing some
tests in sre_patch back to previous version

- fixed return value from findall

- renamed a bunch of functions inside _sre (way too
  many leading underscores...)

</F>
2000-06-29 11:34:28 +00:00
Fredrik Lundh
6c68dc7b1a - removed "alpha only" licensing restriction
- removed some hacks that worked around 1.6 alpha bugs
- removed bogus test code from sre_parse
2000-06-29 10:34:56 +00:00
Fredrik Lundh
436c3d58a2 towards 1.6b1 2000-06-29 08:58:44 +00:00
Andrew M. Kuchling
815d5b934b Patch from /F: this patch brings the CVS version of SRE in sync with the
latest public snapshot.""
2000-06-09 14:08:07 +00:00
Guido van Rossum
b81e70ebdb Fredrik Lundh: new snapshot. Mostly reindented.
This one should work with unicode expressions, and compile
a bit more silently.
2000-04-10 17:10:48 +00:00
Andrew M. Kuchling
e3ba931aa4 This patch looks large, but it just deletes the ^M characters and
untabifies the files.  No actual code changes were made.
2000-04-02 05:22:30 +00:00