41 Commits

Author SHA1 Message Date
Ezio Melotti
153d97b24e #20288: merge with 3.3. 2014-02-01 21:22:26 +02:00
Ezio Melotti
f27b9a741a #20288: fix handling of invalid numeric charrefs in HTMLParser. 2014-02-01 21:21:01 +02:00
Ezio Melotti
95401c5f6b #13633: Added a new convert_charrefs keyword arg to HTMLParser that, when True, automatically converts all character references. 2013-11-23 19:52:05 +02:00
Ezio Melotti
f6de9eb2bb #19688: add back and deprecate the internal HTMLParser.unescape() method. 2013-11-22 05:49:29 +02:00
Ezio Melotti
4a9ee26750 #2927: Added the unescape() function to the html module. 2013-11-19 20:28:45 +02:00
Ezio Melotti
b7038817fe #19480: merge with 3.3. 2013-11-07 18:35:27 +02:00
Ezio Melotti
7165d8b9ba #19480: HTMLParser now accepts all valid start-tag names as defined by the HTML5 standard. 2013-11-07 18:33:24 +02:00
Ezio Melotti
88ebfb129b #15114: The html.parser module now raises a DeprecationWarning when the strict argument of HTMLParser or the HTMLParser.error method are used. 2013-11-02 17:08:24 +02:00
Ezio Melotti
4603487dc9 #18020: improve html.escape speed by an order of magnitude. Patch by Matt Bryant. 2013-07-07 11:11:24 +02:00
Ezio Melotti
f6ca26fbff #17802: merge with 3.3. 2013-05-01 16:20:00 +03:00
Ezio Melotti
8e596a765c #17802: Fix an UnboundLocalError in html.parser. Initial tests by Thomas Barlow. 2013-05-01 16:18:25 +03:00
Ezio Melotti
1698babd1b #14679: add an __all__ (that contains only HTMLParser) to html.parser. 2013-05-01 16:09:34 +03:00
Ezio Melotti
e6e96eea51 #16245: Fix the value of a few entities in html.entities.html5. 2012-10-23 15:51:27 +02:00
Ezio Melotti
518dbfd7b5 Reorder html.entities.html5 entities to make updates easier. Patch by Iuliia Proskurnia. 2012-10-23 14:45:58 +02:00
Ezio Melotti
46495182d0 #15156: HTMLParser now uses the new "html.entities.html5" dictionary. 2012-06-24 22:02:56 +02:00
Ezio Melotti
dc44f55cc9 #11113: add a new "html5" dictionary containing the named character references defined by the HTML5 standard and the equivalent Unicode character(s) to the html.entities module. 2012-06-24 04:37:41 +02:00
Ezio Melotti
3861d8b271 #15114: the strict mode of HTMLParser and the HTMLParseError exception are deprecated now that the parser is able to parse invalid markup. 2012-06-23 15:27:51 +02:00
Ezio Melotti
0780b6bc58 #14538: HTMLParser can now parse correctly start tags that contain a bare /. 2012-04-18 19:18:22 -06:00
Ezio Melotti
29877e8e04 HTMLParser is now able to handle slashes in the start tag. 2012-02-21 09:25:00 +02:00
Ezio Melotti
e31ddedb0e Fix an index and clean up comments. 2012-02-13 20:20:00 +02:00
Ezio Melotti
f4ab491901 Improve handling of declarations in HTMLParser. 2012-02-13 15:50:37 +02:00
Ezio Melotti
5211ffe4df #13993: HTMLParser is now able to handle broken end tags when strict=False. 2012-02-13 11:24:50 +02:00
Ezio Melotti
fa3702dc28 #13960: HTMLParser is now able to handle broken comments when strict=False. 2012-02-10 10:45:44 +02:00
Ezio Melotti
15cb489234 #13358: HTMLParser now calls handle_data only once for each CDATA. 2011-11-18 18:01:49 +02:00
Ezio Melotti
c2fe57762b #1745761, #755670, #13357, #12629, #1200313: improve attribute handling in HTMLParser. 2011-11-14 18:53:33 +02:00
Ezio Melotti
7de56f6a04 #670664: Fix HTMLParser to correctly handle the content of `<script>...</script>` and `<style>...</style>`. 2011-11-01 14:12:22 +02:00
Ezio Melotti
f50ffa94ab #13273: fix a bug that prevented HTMLParser from properly detecting some tags when strict=False. 2011-10-28 13:21:09 +03:00
Senthil Kumaran
d71bbf9fd5 Fix issue12938 - Update the docstring of html.escape. Include the information on single quote. 2011-09-13 07:14:13 +08:00
Ezio Melotti
d9e0b068af #12888: Fix a bug in HTMLParser.unescape that prevented it from escaping more than 128 entities. Patch by Peter Otten. 2011-09-05 17:11:06 +03:00
Éric Araujo
51b7aedadd Merge 3.1 2011-05-25 18:13:49 +02:00
Éric Araujo
39f180bb1f Fix display of html.parser.HTMLParser.feed docstring 2011-05-04 15:55:47 +02:00
Ezio Melotti
2e3607c1e7 #7311: fix html.parser to accept non-ASCII attribute values. 2011-04-07 22:03:31 +03:00
Senthil Kumaran
6c85838489 Merged revisions 87542 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r87542 | senthil.kumaran | 2010-12-28 23:55:16 +0800 (Tue, 28 Dec 2010) | 3 lines

  Fix Issue10759 - html.parser.unescape() fails on HTML entities with incorrect syntax
........
2010-12-28 16:10:56 +00:00
Senthil Kumaran
164540fee1 Fix Issue10759 - html.parser.unescape() fails on HTML entities with incorrect syntax 2010-12-28 15:55:16 +00:00
R. David Murray
b579dba119 #1486713: Add a tolerant mode to HTMLParser.
The motivation for adding this option is that the functionality it
provides used to be provided by sgmllib in Python2, and was used by,
for example, BeautifulSoup.  Without this option, the Python3 version
of BeautifulSoup and the many programs that use it are crippled.

The original patch was by 'kxroberto'.  I modified it heavily but kept his
heuristics and test.  I also added additional heuristics to fix #975556,
#1046092, and part of #6191.  This patch should be completely backward
compatible:  the behavior with the default strict=True is unchanged.
2010-12-03 04:06:39 +00:00
Georg Brandl
1f7fffb308 #2830: add html.escape() helper and move cgi.escape() uses in the standard library to it. It defaults to quote=True and also escapes single quotes, which makes casual use safer. The cgi.escape() interface is not touched, but emits a (silent) PendingDeprecationWarning. 2010-10-15 15:57:45 +00:00
Victor Stinner
30c223cff5 Merged revisions 81504 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

................
  r81504 | victor.stinner | 2010-05-24 23:46:25 +0200 (lun., 24 mai 2010) | 13 lines

  Recorded merge of revisions 81500-81501 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk

  ........
    r81500 | victor.stinner | 2010-05-24 23:33:24 +0200 (lun., 24 mai 2010) | 2 lines

    Issue #6662: Fix parsing of malformatted charref (&#bad;)
  ........
    r81501 | victor.stinner | 2010-05-24 23:37:28 +0200 (lun., 24 mai 2010) | 2 lines

    Add the author of the last fix (Issue #6662)
  ........
................
2010-05-24 21:48:07 +00:00
Victor Stinner
e021f4b206 Recorded merge of revisions 81500-81501 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81500 | victor.stinner | 2010-05-24 23:33:24 +0200 (lun., 24 mai 2010) | 2 lines

  Issue #6662: Fix parsing of malformatted charref (&#bad;)
........
  r81501 | victor.stinner | 2010-05-24 23:37:28 +0200 (lun., 24 mai 2010) | 2 lines

  Add the author of the last fix (Issue #6662)
........
2010-05-24 21:46:25 +00:00
Antoine Pitrou
fd036451bf #2834: Change re module semantics, so that str and bytes mixing is forbidden,
and str (unicode) patterns get full unicode matching by default. The re.ASCII
flag is also introduced to ask for ASCII matching instead.
2008-08-19 17:56:33 +00:00
Mark Dickinson
f64dcf3ce0 Change test_htmlparser to reflect the HTMLParser -> html.parser
rename in r63439.

Also fix one occurrence of unichr() in html.parser.
2008-05-21 13:51:18 +00:00
Fred Drake
3c50ea4303 rename HTMLParser to html.parser and htmlentitydefs to html.entities;
includes merge of trunk revision 63432
2008-05-17 22:02:32 +00:00