mirror of
https://github.com/python/cpython.git
synced 2025-01-22 00:15:52 +08:00
46d88a1131
* bpo-35805: Add parser for Message-ID header. This parser is based on the definition of Identification Fields from RFC 5322 Sec 3.6.4. This should also prevent folding of Message-ID header using RFC 2047 encoded words and hence fix bpo-35805. * Prevent folding of non-ascii message-id headers. * Add fold method to MsgID token to prevent folding.
463 lines
18 KiB
ReStructuredText
463 lines
18 KiB
ReStructuredText
:mod:`email.headerregistry`: Custom Header Objects
|
|
--------------------------------------------------
|
|
|
|
.. module:: email.headerregistry
|
|
:synopsis: Automatic Parsing of headers based on the field name
|
|
|
|
.. moduleauthor:: R. David Murray <rdmurray@bitdance.com>
|
|
.. sectionauthor:: R. David Murray <rdmurray@bitdance.com>
|
|
|
|
**Source code:** :source:`Lib/email/headerregistry.py`
|
|
|
|
--------------
|
|
|
|
.. versionadded:: 3.6 [1]_
|
|
|
|
Headers are represented by customized subclasses of :class:`str`. The
|
|
particular class used to represent a given header is determined by the
|
|
:attr:`~email.policy.EmailPolicy.header_factory` of the :mod:`~email.policy` in
|
|
effect when the headers are created. This section documents the particular
|
|
``header_factory`` implemented by the email package for handling :RFC:`5322`
|
|
compliant email messages, which not only provides customized header objects for
|
|
various header types, but also provides an extension mechanism for applications
|
|
to add their own custom header types.
|
|
|
|
When using any of the policy objects derived from
|
|
:data:`~email.policy.EmailPolicy`, all headers are produced by
|
|
:class:`.HeaderRegistry` and have :class:`.BaseHeader` as their last base
|
|
class. Each header class has an additional base class that is determined by
|
|
the type of the header. For example, many headers have the class
|
|
:class:`.UnstructuredHeader` as their other base class. The specialized second
|
|
class for a header is determined by the name of the header, using a lookup
|
|
table stored in the :class:`.HeaderRegistry`. All of this is managed
|
|
transparently for the typical application program, but interfaces are provided
|
|
for modifying the default behavior for use by more complex applications.
|
|
|
|
The sections below first document the header base classes and their attributes,
|
|
followed by the API for modifying the behavior of :class:`.HeaderRegistry`, and
|
|
finally the support classes used to represent the data parsed from structured
|
|
headers.
|
|
|
|
|
|
.. class:: BaseHeader(name, value)
|
|
|
|
*name* and *value* are passed to ``BaseHeader`` from the
|
|
:attr:`~email.policy.EmailPolicy.header_factory` call. The string value of
|
|
any header object is the *value* fully decoded to unicode.
|
|
|
|
This base class defines the following read-only properties:
|
|
|
|
|
|
.. attribute:: name
|
|
|
|
The name of the header (the portion of the field before the ':'). This
|
|
is exactly the value passed in the
|
|
:attr:`~email.policy.EmailPolicy.header_factory` call for *name*; that
|
|
is, case is preserved.
|
|
|
|
|
|
.. attribute:: defects
|
|
|
|
A tuple of :exc:`~email.errors.HeaderDefect` instances reporting any
|
|
RFC compliance problems found during parsing. The email package tries to
|
|
be complete about detecting compliance issues. See the :mod:`~email.errors`
|
|
module for a discussion of the types of defects that may be reported.
|
|
|
|
|
|
.. attribute:: max_count
|
|
|
|
The maximum number of headers of this type that can have the same
|
|
``name``. A value of ``None`` means unlimited. The ``BaseHeader`` value
|
|
for this attribute is ``None``; it is expected that specialized header
|
|
classes will override this value as needed.
|
|
|
|
``BaseHeader`` also provides the following method, which is called by the
|
|
email library code and should not in general be called by application
|
|
programs:
|
|
|
|
.. method:: fold(*, policy)
|
|
|
|
Return a string containing :attr:`~email.policy.Policy.linesep`
|
|
characters as required to correctly fold the header according to
|
|
*policy*. A :attr:`~email.policy.Policy.cte_type` of ``8bit`` will be
|
|
treated as if it were ``7bit``, since headers may not contain arbitrary
|
|
binary data. If :attr:`~email.policy.EmailPolicy.utf8` is ``False``,
|
|
non-ASCII data will be :rfc:`2047` encoded.
|
|
|
|
|
|
``BaseHeader`` by itself cannot be used to create a header object. It
|
|
defines a protocol that each specialized header cooperates with in order to
|
|
produce the header object. Specifically, ``BaseHeader`` requires that
|
|
the specialized class provide a :func:`classmethod` named ``parse``. This
|
|
method is called as follows::
|
|
|
|
parse(string, kwds)
|
|
|
|
``kwds`` is a dictionary containing one pre-initialized key, ``defects``.
|
|
``defects`` is an empty list. The parse method should append any detected
|
|
defects to this list. On return, the ``kwds`` dictionary *must* contain
|
|
values for at least the keys ``decoded`` and ``defects``. ``decoded``
|
|
should be the string value for the header (that is, the header value fully
|
|
decoded to unicode). The parse method should assume that *string* may
|
|
contain content-transfer-encoded parts, but should correctly handle all valid
|
|
unicode characters as well so that it can parse un-encoded header values.
|
|
|
|
``BaseHeader``'s ``__new__`` then creates the header instance, and calls its
|
|
``init`` method. The specialized class only needs to provide an ``init``
|
|
method if it wishes to set additional attributes beyond those provided by
|
|
``BaseHeader`` itself. Such an ``init`` method should look like this::
|
|
|
|
def init(self, /, *args, **kw):
|
|
self._myattr = kw.pop('myattr')
|
|
super().init(*args, **kw)
|
|
|
|
That is, anything extra that the specialized class puts in to the ``kwds``
|
|
dictionary should be removed and handled, and the remaining contents of
|
|
``kw`` (and ``args``) passed to the ``BaseHeader`` ``init`` method.
|
|
|
|
|
|
.. class:: UnstructuredHeader
|
|
|
|
An "unstructured" header is the default type of header in :rfc:`5322`.
|
|
Any header that does not have a specified syntax is treated as
|
|
unstructured. The classic example of an unstructured header is the
|
|
:mailheader:`Subject` header.
|
|
|
|
In :rfc:`5322`, an unstructured header is a run of arbitrary text in the
|
|
ASCII character set. :rfc:`2047`, however, has an :rfc:`5322` compatible
|
|
mechanism for encoding non-ASCII text as ASCII characters within a header
|
|
value. When a *value* containing encoded words is passed to the
|
|
constructor, the ``UnstructuredHeader`` parser converts such encoded words
|
|
into unicode, following the :rfc:`2047` rules for unstructured text. The
|
|
parser uses heuristics to attempt to decode certain non-compliant encoded
|
|
words. Defects are registered in such cases, as well as defects for issues
|
|
such as invalid characters within the encoded words or the non-encoded text.
|
|
|
|
This header type provides no additional attributes.
|
|
|
|
|
|
.. class:: DateHeader
|
|
|
|
:rfc:`5322` specifies a very specific format for dates within email headers.
|
|
The ``DateHeader`` parser recognizes that date format, as well as
|
|
recognizing a number of variant forms that are sometimes found "in the
|
|
wild".
|
|
|
|
This header type provides the following additional attributes:
|
|
|
|
.. attribute:: datetime
|
|
|
|
If the header value can be recognized as a valid date of one form or
|
|
another, this attribute will contain a :class:`~datetime.datetime`
|
|
instance representing that date. If the timezone of the input date is
|
|
specified as ``-0000`` (indicating it is in UTC but contains no
|
|
information about the source timezone), then :attr:`.datetime` will be a
|
|
naive :class:`~datetime.datetime`. If a specific timezone offset is
|
|
found (including `+0000`), then :attr:`.datetime` will contain an aware
|
|
``datetime`` that uses :class:`datetime.timezone` to record the timezone
|
|
offset.
|
|
|
|
The ``decoded`` value of the header is determined by formatting the
|
|
``datetime`` according to the :rfc:`5322` rules; that is, it is set to::
|
|
|
|
email.utils.format_datetime(self.datetime)
|
|
|
|
When creating a ``DateHeader``, *value* may be
|
|
:class:`~datetime.datetime` instance. This means, for example, that
|
|
the following code is valid and does what one would expect::
|
|
|
|
msg['Date'] = datetime(2011, 7, 15, 21)
|
|
|
|
Because this is a naive ``datetime`` it will be interpreted as a UTC
|
|
timestamp, and the resulting value will have a timezone of ``-0000``. Much
|
|
more useful is to use the :func:`~email.utils.localtime` function from the
|
|
:mod:`~email.utils` module::
|
|
|
|
msg['Date'] = utils.localtime()
|
|
|
|
This example sets the date header to the current time and date using
|
|
the current timezone offset.
|
|
|
|
|
|
.. class:: AddressHeader
|
|
|
|
Address headers are one of the most complex structured header types.
|
|
The ``AddressHeader`` class provides a generic interface to any address
|
|
header.
|
|
|
|
This header type provides the following additional attributes:
|
|
|
|
|
|
.. attribute:: groups
|
|
|
|
A tuple of :class:`.Group` objects encoding the
|
|
addresses and groups found in the header value. Addresses that are
|
|
not part of a group are represented in this list as single-address
|
|
``Groups`` whose :attr:`~.Group.display_name` is ``None``.
|
|
|
|
|
|
.. attribute:: addresses
|
|
|
|
A tuple of :class:`.Address` objects encoding all
|
|
of the individual addresses from the header value. If the header value
|
|
contains any groups, the individual addresses from the group are included
|
|
in the list at the point where the group occurs in the value (that is,
|
|
the list of addresses is "flattened" into a one dimensional list).
|
|
|
|
The ``decoded`` value of the header will have all encoded words decoded to
|
|
unicode. :class:`~encodings.idna` encoded domain names are also decoded to
|
|
unicode. The ``decoded`` value is set by :attr:`~str.join`\ ing the
|
|
:class:`str` value of the elements of the ``groups`` attribute with ``',
|
|
'``.
|
|
|
|
A list of :class:`.Address` and :class:`.Group` objects in any combination
|
|
may be used to set the value of an address header. ``Group`` objects whose
|
|
``display_name`` is ``None`` will be interpreted as single addresses, which
|
|
allows an address list to be copied with groups intact by using the list
|
|
obtained from the ``groups`` attribute of the source header.
|
|
|
|
|
|
.. class:: SingleAddressHeader
|
|
|
|
A subclass of :class:`.AddressHeader` that adds one
|
|
additional attribute:
|
|
|
|
|
|
.. attribute:: address
|
|
|
|
The single address encoded by the header value. If the header value
|
|
actually contains more than one address (which would be a violation of
|
|
the RFC under the default :mod:`~email.policy`), accessing this attribute
|
|
will result in a :exc:`ValueError`.
|
|
|
|
|
|
Many of the above classes also have a ``Unique`` variant (for example,
|
|
``UniqueUnstructuredHeader``). The only difference is that in the ``Unique``
|
|
variant, :attr:`~.BaseHeader.max_count` is set to 1.
|
|
|
|
|
|
.. class:: MIMEVersionHeader
|
|
|
|
There is really only one valid value for the :mailheader:`MIME-Version`
|
|
header, and that is ``1.0``. For future proofing, this header class
|
|
supports other valid version numbers. If a version number has a valid value
|
|
per :rfc:`2045`, then the header object will have non-``None`` values for
|
|
the following attributes:
|
|
|
|
.. attribute:: version
|
|
|
|
The version number as a string, with any whitespace and/or comments
|
|
removed.
|
|
|
|
.. attribute:: major
|
|
|
|
The major version number as an integer
|
|
|
|
.. attribute:: minor
|
|
|
|
The minor version number as an integer
|
|
|
|
|
|
.. class:: ParameterizedMIMEHeader
|
|
|
|
MIME headers all start with the prefix 'Content-'. Each specific header has
|
|
a certain value, described under the class for that header. Some can
|
|
also take a list of supplemental parameters, which have a common format.
|
|
This class serves as a base for all the MIME headers that take parameters.
|
|
|
|
.. attribute:: params
|
|
|
|
A dictionary mapping parameter names to parameter values.
|
|
|
|
|
|
.. class:: ContentTypeHeader
|
|
|
|
A :class:`ParameterizedMIMEHeader` class that handles the
|
|
:mailheader:`Content-Type` header.
|
|
|
|
.. attribute:: content_type
|
|
|
|
The content type string, in the form ``maintype/subtype``.
|
|
|
|
.. attribute:: maintype
|
|
|
|
.. attribute:: subtype
|
|
|
|
|
|
.. class:: ContentDispositionHeader
|
|
|
|
A :class:`ParameterizedMIMEHeader` class that handles the
|
|
:mailheader:`Content-Disposition` header.
|
|
|
|
.. attribute:: content-disposition
|
|
|
|
``inline`` and ``attachment`` are the only valid values in common use.
|
|
|
|
|
|
.. class:: ContentTransferEncoding
|
|
|
|
Handles the :mailheader:`Content-Transfer-Encoding` header.
|
|
|
|
.. attribute:: cte
|
|
|
|
Valid values are ``7bit``, ``8bit``, ``base64``, and
|
|
``quoted-printable``. See :rfc:`2045` for more information.
|
|
|
|
|
|
|
|
.. class:: HeaderRegistry(base_class=BaseHeader, \
|
|
default_class=UnstructuredHeader, \
|
|
use_default_map=True)
|
|
|
|
This is the factory used by :class:`~email.policy.EmailPolicy` by default.
|
|
``HeaderRegistry`` builds the class used to create a header instance
|
|
dynamically, using *base_class* and a specialized class retrieved from a
|
|
registry that it holds. When a given header name does not appear in the
|
|
registry, the class specified by *default_class* is used as the specialized
|
|
class. When *use_default_map* is ``True`` (the default), the standard
|
|
mapping of header names to classes is copied in to the registry during
|
|
initialization. *base_class* is always the last class in the generated
|
|
class's ``__bases__`` list.
|
|
|
|
The default mappings are:
|
|
|
|
:subject: UniqueUnstructuredHeader
|
|
:date: UniqueDateHeader
|
|
:resent-date: DateHeader
|
|
:orig-date: UniqueDateHeader
|
|
:sender: UniqueSingleAddressHeader
|
|
:resent-sender: SingleAddressHeader
|
|
:to: UniqueAddressHeader
|
|
:resent-to: AddressHeader
|
|
:cc: UniqueAddressHeader
|
|
:resent-cc: AddressHeader
|
|
:bcc: UniqueAddressHeader
|
|
:resent-bcc: AddressHeader
|
|
:from: UniqueAddressHeader
|
|
:resent-from: AddressHeader
|
|
:reply-to: UniqueAddressHeader
|
|
:mime-version: MIMEVersionHeader
|
|
:content-type: ContentTypeHeader
|
|
:content-disposition: ContentDispositionHeader
|
|
:content-transfer-encoding: ContentTransferEncodingHeader
|
|
:message-id: MessageIDHeader
|
|
|
|
``HeaderRegistry`` has the following methods:
|
|
|
|
|
|
.. method:: map_to_type(self, name, cls)
|
|
|
|
*name* is the name of the header to be mapped. It will be converted to
|
|
lower case in the registry. *cls* is the specialized class to be used,
|
|
along with *base_class*, to create the class used to instantiate headers
|
|
that match *name*.
|
|
|
|
|
|
.. method:: __getitem__(name)
|
|
|
|
Construct and return a class to handle creating a *name* header.
|
|
|
|
|
|
.. method:: __call__(name, value)
|
|
|
|
Retrieves the specialized header associated with *name* from the
|
|
registry (using *default_class* if *name* does not appear in the
|
|
registry) and composes it with *base_class* to produce a class,
|
|
calls the constructed class's constructor, passing it the same
|
|
argument list, and finally returns the class instance created thereby.
|
|
|
|
|
|
The following classes are the classes used to represent data parsed from
|
|
structured headers and can, in general, be used by an application program to
|
|
construct structured values to assign to specific headers.
|
|
|
|
|
|
.. class:: Address(display_name='', username='', domain='', addr_spec=None)
|
|
|
|
The class used to represent an email address. The general form of an
|
|
address is::
|
|
|
|
[display_name] <username@domain>
|
|
|
|
or::
|
|
|
|
username@domain
|
|
|
|
where each part must conform to specific syntax rules spelled out in
|
|
:rfc:`5322`.
|
|
|
|
As a convenience *addr_spec* can be specified instead of *username* and
|
|
*domain*, in which case *username* and *domain* will be parsed from the
|
|
*addr_spec*. An *addr_spec* must be a properly RFC quoted string; if it is
|
|
not ``Address`` will raise an error. Unicode characters are allowed and
|
|
will be property encoded when serialized. However, per the RFCs, unicode is
|
|
*not* allowed in the username portion of the address.
|
|
|
|
.. attribute:: display_name
|
|
|
|
The display name portion of the address, if any, with all quoting
|
|
removed. If the address does not have a display name, this attribute
|
|
will be an empty string.
|
|
|
|
.. attribute:: username
|
|
|
|
The ``username`` portion of the address, with all quoting removed.
|
|
|
|
.. attribute:: domain
|
|
|
|
The ``domain`` portion of the address.
|
|
|
|
.. attribute:: addr_spec
|
|
|
|
The ``username@domain`` portion of the address, correctly quoted
|
|
for use as a bare address (the second form shown above). This
|
|
attribute is not mutable.
|
|
|
|
.. method:: __str__()
|
|
|
|
The ``str`` value of the object is the address quoted according to
|
|
:rfc:`5322` rules, but with no Content Transfer Encoding of any non-ASCII
|
|
characters.
|
|
|
|
To support SMTP (:rfc:`5321`), ``Address`` handles one special case: if
|
|
``username`` and ``domain`` are both the empty string (or ``None``), then
|
|
the string value of the ``Address`` is ``<>``.
|
|
|
|
|
|
.. class:: Group(display_name=None, addresses=None)
|
|
|
|
The class used to represent an address group. The general form of an
|
|
address group is::
|
|
|
|
display_name: [address-list];
|
|
|
|
As a convenience for processing lists of addresses that consist of a mixture
|
|
of groups and single addresses, a ``Group`` may also be used to represent
|
|
single addresses that are not part of a group by setting *display_name* to
|
|
``None`` and providing a list of the single address as *addresses*.
|
|
|
|
.. attribute:: display_name
|
|
|
|
The ``display_name`` of the group. If it is ``None`` and there is
|
|
exactly one ``Address`` in ``addresses``, then the ``Group`` represents a
|
|
single address that is not in a group.
|
|
|
|
.. attribute:: addresses
|
|
|
|
A possibly empty tuple of :class:`.Address` objects representing the
|
|
addresses in the group.
|
|
|
|
.. method:: __str__()
|
|
|
|
The ``str`` value of a ``Group`` is formatted according to :rfc:`5322`,
|
|
but with no Content Transfer Encoding of any non-ASCII characters. If
|
|
``display_name`` is none and there is a single ``Address`` in the
|
|
``addresses`` list, the ``str`` value will be the same as the ``str`` of
|
|
that single ``Address``.
|
|
|
|
|
|
.. rubric:: Footnotes
|
|
|
|
.. [1] Originally added in 3.3 as a :term:`provisional module <provisional
|
|
package>`
|