62 Commits

Author SHA1 Message Date
Dong-hee Na
157aef79b0
gh-95813: Improve HTMLParser from the view of inheritance (#95874)
* gh-95813: Improve HTMLParser from the view of inheritance

* gh-95813: Add unittest

* Address code review
2022-08-18 13:16:33 +02:00
Ezio Melotti
f28ec34c5c
gh-82927: Update files related to HTML entities. (GH-92504) 2022-06-21 22:03:12 +02:00
slateny
d707d073be
Add source for character mappings (#92014) 2022-05-06 12:28:09 +02:00
Alberto Mardegan
562c0d7398
bpo-45421: Remove dead code from html.parser (GH-28847)
Support for HtmlParserError was removed back in 2014 with commit 73a4359eb0,
however this small block was missed.
2021-10-12 10:12:21 -07:00
Christian Clauss
745c9d9dfc
Fix typos in the Lib directory (GH-28775)
Fix typos in the Lib directory as identified by codespell.

Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
2021-10-06 16:13:48 -07:00
Karl Dubost
9eb11a139f
bpo-41748: Handles unquoted attributes with commas (#24072)
* bpo-41748: Adds tests for unquoted attributes with comma

* bpo-41748: Handles unquoted attributes with comma

* bpo-41748: Addresses review comments

* bpo-41748: Addresses review comments

* Adds more test cases
* Simplifies the regex for handling spaces

* bpo-41748: Moves attributes tests under the right class

* bpo-41748: Addresses review about duplicate attributes

* bpo-41748: Adds NEWS.d entry for this patch
2021-02-01 21:32:50 +01:00
Inada Naoki
fae0ed5099
bpo-37328: remove deprecated HTMLParser.unescape (GH-14186)
It is deprecated since Python 3.4.
2019-08-27 11:48:06 +09:00
Motoki Naruse
3358d589fb bpo-30629: Remove second call of str.lower() in html.parser.parse_endtag. (#2099)
elem is the result of .lower() 6 lines above the handle_endtag call.
Patch by Motoki Naruse
2017-06-16 21:15:25 -04:00
Serhiy Storchaka
c842efc6ae Revert "Fixed a typo in the HTMLParser.feed docstrings" (#1771)
* Revert "Fixed a typo in the HTMLParser.feed docstrings. The docstring started with an 'r', like a rawstring. (#1759)"

The docstring was correct. I read the patch in opposite direction, as *adding* the "r" prefix.
This reverts commit 5ba185039f.
2017-05-24 07:20:45 +03:00
Jani Šumak
5ba185039f Fixed a typo in the HTMLParser.feed docstrings. The docstring started with an 'r', like a rawstring. (#1759) 2017-05-23 16:40:54 +03:00
R David Murray
44b548dda8 #27364: fix "incorrect" uses of escape character in the stdlib.
And most of the tools.

Patch by Emanual Barry, reviewed by me, Serhiy Storchaka, and
Martin Panter.
2016-09-08 13:59:53 -04:00
Martin Panter
46f50726a0 Issue #27076: Doc, comment and tests spelling fixes
Most fixes to Doc/ and Lib/ directories by Ville Skyttä.
2016-05-26 05:35:26 +00:00
Martin Panter
4827e488a4 Merge spelling fixes from 3.4 into 3.5 2015-10-31 12:16:18 +00:00
Martin Panter
1f1177d69a Fix some spelling errors in documentation and code comments 2015-10-31 11:48:53 +00:00
Ezio Melotti
20a2c6482e #23144: merge with 3.4. 2015-09-06 21:44:45 +03:00
Ezio Melotti
6f2bb98966 #23144: Make sure that HTMLParser.feed() returns all the data, even when convert_charrefs is True. 2015-09-06 21:38:06 +03:00
Serhiy Storchaka
82e07b92b3 Issue #23181: More "codepoint" -> "code point". 2015-01-18 11:33:31 +02:00
Serhiy Storchaka
d3faf43f9b Issue #23181: More "codepoint" -> "code point". 2015-01-18 11:28:37 +02:00
Ezio Melotti
6fc16d81af #21047: set the default value for the *convert_charrefs* argument of HTMLParser to True. Patch by Berker Peksag. 2014-08-02 18:36:12 +03:00
Ezio Melotti
11bec7a1b8 Add an __all__ to html.entities. 2014-08-02 15:15:02 +03:00
Ezio Melotti
73a4359eb0 #15114: the strict mode and argument of HTMLParser, HTMLParser.error, and the HTMLParserError exception have been removed. 2014-08-02 14:10:30 +03:00
Ezio Melotti
153d97b24e #20288: merge with 3.3. 2014-02-01 21:22:26 +02:00
Ezio Melotti
f27b9a741a #20288: fix handling of invalid numeric charrefs in HTMLParser. 2014-02-01 21:21:01 +02:00
Ezio Melotti
95401c5f6b #13633: Added a new convert_charrefs keyword arg to HTMLParser that, when True, automatically converts all character references. 2013-11-23 19:52:05 +02:00
Ezio Melotti
f6de9eb2bb #19688: add back and deprecate the internal HTMLParser.unescape() method. 2013-11-22 05:49:29 +02:00
Ezio Melotti
4a9ee26750 #2927: Added the unescape() function to the html module. 2013-11-19 20:28:45 +02:00
Ezio Melotti
b7038817fe #19480: merge with 3.3. 2013-11-07 18:35:27 +02:00
Ezio Melotti
7165d8b9ba #19480: HTMLParser now accepts all valid start-tag names as defined by the HTML5 standard. 2013-11-07 18:33:24 +02:00
Ezio Melotti
88ebfb129b #15114: The html.parser module now raises a DeprecationWarning when the strict argument of HTMLParser or the HTMLParser.error method are used. 2013-11-02 17:08:24 +02:00
Ezio Melotti
4603487dc9 #18020: improve html.escape speed by an order of magnitude. Patch by Matt Bryant. 2013-07-07 11:11:24 +02:00
Ezio Melotti
f6ca26fbff #17802: merge with 3.3. 2013-05-01 16:20:00 +03:00
Ezio Melotti
8e596a765c #17802: Fix an UnboundLocalError in html.parser. Initial tests by Thomas Barlow. 2013-05-01 16:18:25 +03:00
Ezio Melotti
1698babd1b #14679: add an __all__ (that contains only HTMLParser) to html.parser. 2013-05-01 16:09:34 +03:00
Ezio Melotti
e6e96eea51 #16245: Fix the value of a few entities in html.entities.html5. 2012-10-23 15:51:27 +02:00
Ezio Melotti
518dbfd7b5 Reorder html.entities.html5 entities to make updates easier. Patch by Iuliia Proskurnia. 2012-10-23 14:45:58 +02:00
Ezio Melotti
46495182d0 #15156: HTMLParser now uses the new "html.entities.html5" dictionary. 2012-06-24 22:02:56 +02:00
Ezio Melotti
dc44f55cc9 #11113: add a new "html5" dictionary containing the named character references defined by the HTML5 standard and the equivalent Unicode character(s) to the html.entities module. 2012-06-24 04:37:41 +02:00
Ezio Melotti
3861d8b271 #15114: the strict mode of HTMLParser and the HTMLParseError exception are deprecated now that the parser is able to parse invalid markup. 2012-06-23 15:27:51 +02:00
Ezio Melotti
0780b6bc58 #14538: HTMLParser can now parse correctly start tags that contain a bare /. 2012-04-18 19:18:22 -06:00
Ezio Melotti
29877e8e04 HTMLParser is now able to handle slashes in the start tag. 2012-02-21 09:25:00 +02:00
Ezio Melotti
e31ddedb0e Fix an index and clean up comments. 2012-02-13 20:20:00 +02:00
Ezio Melotti
f4ab491901 Improve handling of declarations in HTMLParser. 2012-02-13 15:50:37 +02:00
Ezio Melotti
5211ffe4df #13993: HTMLParser is now able to handle broken end tags when strict=False. 2012-02-13 11:24:50 +02:00
Ezio Melotti
fa3702dc28 #13960: HTMLParser is now able to handle broken comments when strict=False. 2012-02-10 10:45:44 +02:00
Ezio Melotti
15cb489234 #13358: HTMLParser now calls handle_data only once for each CDATA. 2011-11-18 18:01:49 +02:00
Ezio Melotti
c2fe57762b #1745761, #755670, #13357, #12629, #1200313: improve attribute handling in HTMLParser. 2011-11-14 18:53:33 +02:00
Ezio Melotti
7de56f6a04 #670664: Fix HTMLParser to correctly handle the content of `<script>...</script>` and `<style>...</style>`. 2011-11-01 14:12:22 +02:00
Ezio Melotti
f50ffa94ab #13273: fix a bug that prevented HTMLParser to properly detect some tags when strict=False. 2011-10-28 13:21:09 +03:00
Senthil Kumaran
d71bbf9fd5 Fix issue12938 - Update the docstring of html.escape. Include the information on single quote. 2011-09-13 07:14:13 +08:00
Ezio Melotti
d9e0b068af #12888: Fix a bug in HTMLParser.unescape that prevented it to escape more than 128 entities. Patch by Peter Otten. 2011-09-05 17:11:06 +03:00