33 Commits

Author SHA1 Message Date
Antoine Pitrou
fd036451bf #2834: Change re module semantics, so that str and bytes mixing is forbidden,
and str (unicode) patterns get full unicode matching by default. The re.ASCII
flag is also introduced to ask for ASCII matching instead.
2008-08-19 17:56:33 +00:00
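This minimal Python 3 sketch shows the semantics the commit describes, using only the public `re` API. A str pattern's `\w` matches non-ASCII letters unless `re.ASCII` is passed. Applying a str pattern to bytes, or a bytes pattern to str, raises `TypeError`. The helper `mixing_is_rejected` is hypothetical and exists only for illustration.

```python
import re

# str patterns use Unicode matching by default: \w matches "\u00e9" (e-acute).
unicode_match = re.match(r"\w", "\u00e9")

# re.ASCII restricts \w (and \d, \s, \b) to their ASCII meaning.
ascii_match = re.match(r"\w", "\u00e9", re.ASCII)


def mixing_is_rejected(pattern, subject):
    """Hypothetical helper: True if re refuses to mix str and bytes."""
    try:
        re.search(pattern, subject)
    except TypeError:
        return True
    return False


print(unicode_match is not None)          # True
print(ascii_match is None)                # True
print(mixing_is_rejected(r"\w", b"abc"))  # True: str pattern, bytes subject
print(mixing_is_rejected(rb"\w", "abc"))  # True: bytes pattern, str subject
```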
Antoine Pitrou
22628c4d6a #3231: re.compile fails with some bytes patterns 2008-07-22 17:53:22 +00:00
Gustavo Niemeyer
be733ee7fb More work on bug #672491 and patch #712900.
I've applied a modified version of Greg Chapman's patch. I've included
the fixes without introducing the reorganization mentioned, for the sake
of stability. Also, the second fix mentioned in the patch doesn't fix the
mentioned problem anymore, because of the change introduced by patch
#720991 (by Greg as well). The new fix wasn't complicated though, and is
included as well.

As a note, it seems that there are other places that require the
"protection" of LASTMARK_SAVE()/LASTMARK_RESTORE(), and are just waiting
for someone to find how to break them. In particular, I believe that every
recursion of SRE_MATCH() should be protected by these macros. I won't
do that right now since I'm not completely sure about this, and we don't
have much time for testing until the next release.
2003-04-20 07:35:44 +00:00
Guido van Rossum
577fb5a1db Fix from SF patch #633359 by Greg Chapman for SF bug #610299:
The problem is in sre_compile.py: the call to
    _compile_charset near the end of _compile_info forgets to
    pass in the flags, so that the info charset is not compiled
    with re.U. (The info charset is used when searching to find
    the first character at which a match could start; it is not
    generated for patterns beginning with a repeat like '\w{1}'.)
2003-02-24 01:18:35 +00:00
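This small Python 3 check shows the behavior the fix protects. It uses only the public `re.search` API and does not inspect `sre_compile` internals. According to the commit message, a pattern that starts with `\w` rather than a repeat gets an info charset. If that charset were compiled without `re.U`, the non-ASCII letter at index 2 would presumably be skipped as a start position, and `\w\w` would find nothing in this subject.

```python
import re

# The pattern starts with \w (not a repeat), so per the commit message SRE
# may build an "info charset" to locate candidate start positions.
# The subject's first word character is the non-ASCII letter "\u00e9".
subject = "--\u00e9t"

match = re.search(r"\w\w", subject, re.UNICODE)

# With the flags honored, the match starts at "\u00e9" (index 2).
print(match.span())  # (2, 4)
```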
Tim Peters
f2715e0764 Whitespace normalization. 2003-02-19 02:35:07 +00:00
Neal Norwitz
bb1844148a SF patch #682432, add lookbehind tests 2003-02-13 03:01:18 +00:00
Gustavo Niemeyer
4e7be06a65 Fixed bug #470582, using a modified version of patch #527371,
from Greg Chapman.

* Modules/_sre.c
  (lastmark_restore): New function, implementing algorithm to restore
  a state to a given lastmark. In addition to the similar algorithm used
  in a few places of SRE_MATCH, restore lastindex when restoring lastmark.
  (SRE_MATCH): Replace inline lastmark restoring with calls to
  lastmark_restore(), and add the call where it was missing. In
  SRE_OP_MARK, set lastindex only if i > lastmark.

* Lib/test/re_tests.py
* Lib/test/test_sre.py
  Included regression tests for the fixed bugs.

* Misc/NEWS
  Mention fixes.
2002-11-06 14:06:53 +00:00
Fredrik Lundh
82b230732f bug #133283, #477728, #483789, #490573
backed out the broken minimal repeat patch from July

also fixed a couple of minor potential resource leaks in pattern_subx
(Guido had already fixed the big one)
2001-12-09 16:13:15 +00:00
Fredrik Lundh
df781e6a3f reapplied Darryl Gallion's minimizing repeat fix. I'm still not 100%
sure about this one, but test #133283 now works even with the fix in
place, and so does the test suite.  We'll see what comes up...
2001-07-02 19:54:28 +00:00
Fredrik Lundh
b25e1ad253 sre 2.1b2 update:
- take locale into account for word boundary anchors (#410271)
- restored 2.0's *? behaviour (#233283, #408936 and others)
- speed up re.sub/re.subn
2001-03-22 15:50:10 +00:00
Fredrik Lundh
c0c7ee3a65 detect attempts to repeat anchors (fixes bug #130748) 2001-02-18 21:04:48 +00:00
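Current CPython still performs this check. An anchor such as `^`, `$` or `\b` cannot be followed by a repeat operator, and compiling such a pattern raises `re.error`. The sketch below uses a hypothetical helper, `compiles`, to show this.

```python
import re


def compiles(pattern):
    """Hypothetical helper: True if re.compile accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Repeating an anchor is rejected at compile time.
for pat in (r"^*", r"$+", r"\b?"):
    print(pat, compiles(pat))  # each prints False

# Repeating an ordinary atom is accepted.
print(compiles(r"a*"))  # True
```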
Fredrik Lundh
2e24044f9d from the really-stupid-bug department: uppercase literals should match
uppercase strings also when the IGNORECASE flag is set (bug #128899)

(also added test cases for recently fixed bugs to the regression suite
-- or in other words, check in re_tests.py too...)
2001-01-15 18:28:14 +00:00
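Here is a quick check of the IGNORECASE behavior described above, using the public `re.fullmatch` API (Python 3.4+). Every pair matches, including the uppercase-literal-against-uppercase-text case from bug #128899.

```python
import re

# With IGNORECASE, case differences are ignored in both directions.
cases = [
    ("A", "A"),          # uppercase literal, uppercase text (bug #128899)
    ("A", "a"),
    ("a", "A"),
    ("Hello", "hELLO"),
]
for pattern, text in cases:
    print(pattern, text, re.fullmatch(pattern, text, re.IGNORECASE) is not None)
```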
Fredrik Lundh
13ac9926ac Fixed too ambitious "nothing to repeat" check. Closes bug #114033. 2000-10-07 17:38:23 +00:00
Fredrik Lundh
025468d246 SRE didn't handle character category followed by hyphen inside a
character class.  Fix provided by Andrew Kuchling.  Closes bug
#116251.
2000-10-07 10:16:19 +00:00
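This minimal Python 3 sketch shows the construct from bug #116251. A category escape like `\w` followed by a trailing `-` inside a character class means "word character or literal hyphen", not a range.

```python
import re

# [\w-] is the \w category plus a literal "-".
word_or_hyphen = re.compile(r"[\w-]+")

print(word_or_hyphen.findall("well-known re-module tests"))
# ['well-known', 're-module', 'tests']
```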
Fredrik Lundh
d11b5e54f0 Recompile pattern if (?x) flag was found inside the pattern during the
first scan.  Closes bug #115040.
2000-10-03 19:22:26 +00:00
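The flag in question is the inline verbose flag `(?x)`, and this sketch shows what it does. Python 3.11 and later only accept global inline flags at the very start of a pattern, so the example puts `(?x)` first rather than mid-pattern as in the original bug.

```python
import re

# (?x) enables verbose mode for the whole pattern: unescaped whitespace is
# ignored and "#" starts a comment that runs to the end of the line.
pattern = re.compile(r"""(?x)
    (\d{4})   # year
    -
    (\d{2})   # month
""")

m = pattern.match("2000-10")
print(m.groups())                        # ('2000', '10')
print(bool(pattern.flags & re.VERBOSE))  # True: inline flag is recorded
```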
Fredrik Lundh
65d4bc616a Fixed negative lookahead/lookbehind. Closes bug #115618. 2000-10-03 16:29:23 +00:00
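The two assertions this commit fixed are shown below, using the current Python 3 `re` syntax. `(?!...)` is negative lookahead and `(?<!...)` is negative lookbehind.

```python
import re

# Negative lookahead: "foo" not immediately followed by "bar".
not_foobar = re.compile(r"foo(?!bar)")
# Negative lookbehind: a digit run not immediately preceded by "$".
not_dollar = re.compile(r"(?<!\$)\b\d+")

print(bool(not_foobar.match("foobar")))      # False
print(bool(not_foobar.match("foobaz")))      # True
print(not_dollar.findall("$5 and 7 items"))  # ['7']
```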
Fredrik Lundh
19f977ba40 - don't hang if group id is followed by whitespace (closes bug #114660) 2000-09-24 14:46:23 +00:00
Fredrik Lundh
0c4fdbaee8 closes bug #112468 (and all the other bugs that surfaced when
I fixed a bug in the regression test harness...)
2000-08-31 22:57:55 +00:00
Fredrik Lundh
8e6d571a7c -- enabled some temporarily disabled RE tests
-- added basic unicode tests to test_re
-- added test case for Sjoerd's xmllib problem to re_tests
2000-08-08 17:06:53 +00:00
Fredrik Lundh
2643b55a77 -- whitespace cleanup (real changes coming in next checkin) 2000-08-08 16:52:51 +00:00
Guido van Rossum
8430c583da AMK's latest 1998-04-03 21:47:12 +00:00
Guido van Rossum
dfa6790bd6 New re version from AMK 1997-12-08 17:12:06 +00:00
Guido van Rossum
cf00505325 Added tests for \b, \B (AMK). 1997-08-15 15:44:58 +00:00
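This short Python 3 sketch shows what the two anchors test. `\b` matches at a word boundary and `\B` matches where there is none.

```python
import re

text = "cat concat cats"

# \b on both sides: "cat" only as a whole word.
print(re.findall(r"\bcat\b", text))  # ['cat']

# \B before "cat": only where it is preceded by another word character.
print([m.start() for m in re.finditer(r"\Bcat", text)])  # [7]
```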
Guido van Rossum
95e8053a9f 1.5a3 prerelease 1 from AMK 1997-08-13 22:34:14 +00:00
Guido van Rossum
06c0ec94e4 Several additions from Jeffrey. 1997-07-17 22:36:39 +00:00
Guido van Rossum
a0e4c1bffc Jeffrey's latest -- seems to solve most problems! 1997-07-17 14:52:48 +00:00
Guido van Rossum
9ddd9dad80 Fixed a syntax error caused by a bad line in the Perl source. 1997-07-15 19:01:04 +00:00
Guido van Rossum
16bd0ff16a Merged my changes in, and added all converted Perl tests. 1997-07-15 18:45:20 +00:00
Guido van Rossum
337c6d41d4 Jeffrey's version 1997-07-15 18:42:58 +00:00
Guido van Rossum
23b8d4c15e Tweak re_tests and test_re to differentiate between
groups that have no value and groups that are out of bounds.
1997-07-15 15:49:52 +00:00
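In the modern Python 3 match-object API, the two situations this commit distinguishes behave differently. A group that exists but did not participate returns `None`, while a group number beyond the pattern's groups raises `IndexError`. This illustrates current behavior, not the 1997 test harness. `group_or_error` is a hypothetical helper.

```python
import re

m = re.match(r"(a)|(b)", "a")


def group_or_error(match, index):
    """Hypothetical helper: the group's value, or a marker if out of bounds."""
    try:
        return match.group(index)
    except IndexError:
        return "out of bounds"


print(group_or_error(m, 1))  # a
print(group_or_error(m, 2))  # None: group exists but has no value
print(group_or_error(m, 3))  # out of bounds: no group 3 in the pattern
```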
Guido van Rossum
847ed4afb5 More tweaks; re.py is nearly there... 1997-07-15 15:40:57 +00:00
Guido van Rossum
04a1d74229 Jeffrey's newest 1997-07-15 14:38:13 +00:00
Guido van Rossum
8e0ce30ce4 test suite for re.py 1997-07-11 19:34:44 +00:00