#11113: add a new "html5" dictionary containing the named character references defined by the HTML5 standard and the equivalent Unicode character(s) to the html.entities module.

Ezio Melotti 2012-06-24 04:37:41 +02:00
parent b698d8e7e9
commit dc44f55cc9
3 changed files with 2259 additions and 2 deletions


@@ -9,13 +9,25 @@
 --------------
 
-This module defines three dictionaries, ``name2codepoint``, ``codepoint2name``,
-and ``entitydefs``. ``entitydefs`` is used to provide the :attr:`entitydefs`
+This module defines four dictionaries, :data:`html5`,
+:data:`name2codepoint`, :data:`codepoint2name`, and :data:`entitydefs`.
+:data:`entitydefs` is used to provide the :attr:`entitydefs`
 attribute of the :class:`html.parser.HTMLParser` class. The definition provided
 here contains all the entities defined by XHTML 1.0 that can be handled using
 simple textual substitution in the Latin-1 character set (ISO-8859-1).
 
 
+.. data:: html5
+
+   A dictionary that maps HTML5 named character references [#]_ to the
+   equivalent Unicode character(s), e.g. ``html5['gt;'] == '>'``.
+   Note that the trailing semicolon is included in the name (e.g. ``'gt;'``),
+   however some of the names are accepted by the standard even without the
+   semicolon: in this case the name is present with and without the ``';'``.
+
+   .. versionadded:: 3.3
+
+
 .. data:: entitydefs
 
    A dictionary mapping XHTML 1.0 entity definitions to their replacement text in
@@ -30,3 +42,8 @@ simple textual substitution in the Latin-1 character set (ISO-8859-1).
 .. data:: codepoint2name
 
    A dictionary that maps Unicode codepoints to HTML entity names.
+
+
+.. rubric:: Footnotes
+
+.. [#] See http://www.w3.org/TR/html5/named-character-references.html
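
For illustration only (this is not part of the commit), here is a minimal sketch of how the new ``html5`` dictionary behaves according to the documentation added above. It assumes Python 3.3 or later. The helper name ``resolve`` and the sample key ``'nosuchname;'`` are hypothetical and exist only for this example.

```python
from html.entities import html5


def resolve(name):
    """Hypothetical helper: return the character(s) for a reference name.

    The name is looked up exactly as given, so a trailing semicolon
    matters. Returns None when the name is not in the dictionary.
    """
    return html5.get(name)


# Documented example from the new docs: the key includes the semicolon.
print(repr(resolve('gt;')))          # '>'

# Some names are also accepted without the semicolon, so both keys exist.
print(repr(resolve('gt')))

# Unknown names simply yield None in this sketch.
print(repr(resolve('nosuchname;')))

# Names stored without a trailing semicolon.
bare = sorted(k for k in html5 if not k.endswith(';'))

# Names whose replacement is more than one Unicode character,
# which is why the docs say "character(s)".
multi = sorted(k for k, v in html5.items() if len(v) > 1)

print(len(html5), len(bare), len(multi))
```

Because the semicolon is part of the key, a caller that parses ``&gt;`` out of markup would need to pass ``'gt;'`` and not ``'gt'`` to get the strict, semicolon-terminated form.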

File diff suppressed because it is too large


@@ -54,6 +54,10 @@ Library
   It is used automatically on platforms supporting the necessary os.openat()
   and os.unlinkat() functions. Main code by Martin von Löwis.
 
+- Issue #11113: add a new "html5" dictionary containing the named character
+  references defined by the HTML5 standard and the equivalent Unicode
+  character(s) to the html.entities module.
+
 - Issue #15114: the strict mode of HTMLParser and the HTMLParseError exception
   are deprecated now that the parser is able to parse invalid markup.