mirror of
https://github.com/python/cpython.git
synced 2024-12-23 00:34:40 +08:00
796 lines
32 KiB
ReStructuredText
796 lines
32 KiB
ReStructuredText
:mod:`pickle` --- Python object serialization
|
|
=============================================
|
|
|
|
.. index::
|
|
single: persistence
|
|
pair: persistent; objects
|
|
pair: serializing; objects
|
|
pair: marshalling; objects
|
|
pair: flattening; objects
|
|
pair: pickling; objects
|
|
|
|
.. module:: pickle
|
|
:synopsis: Convert Python objects to streams of bytes and back.
|
|
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
|
|
.. sectionauthor:: Barry Warsaw <barry@zope.com>
|
|
|
|
|
|
The :mod:`pickle` module implements a fundamental, but powerful algorithm for
|
|
serializing and de-serializing a Python object structure. "Pickling" is the
|
|
process whereby a Python object hierarchy is converted into a byte stream, and
|
|
"unpickling" is the inverse operation, whereby a byte stream is converted back
|
|
into an object hierarchy. Pickling (and unpickling) is alternatively known as
|
|
"serialization", "marshalling," [#]_ or "flattening", however, to avoid
|
|
confusion, the terms used here are "pickling" and "unpickling"..
|
|
|
|
.. warning::
|
|
|
|
The :mod:`pickle` module is not intended to be secure against erroneous or
|
|
maliciously constructed data. Never unpickle data received from an untrusted
|
|
or unauthenticated source.
|
|
|
|
|
|
Relationship to other Python modules
|
|
------------------------------------
|
|
|
|
The :mod:`pickle` module has an transparent optimizer (:mod:`_pickle`) written
|
|
in C. It is used whenever available. Otherwise the pure Python implementation is
|
|
used.
|
|
|
|
Python has a more primitive serialization module called :mod:`marshal`, but in
|
|
general :mod:`pickle` should always be the preferred way to serialize Python
|
|
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
|
|
files.
|
|
|
|
The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:
|
|
|
|
* The :mod:`pickle` module keeps track of the objects it has already serialized,
|
|
so that later references to the same object won't be serialized again.
|
|
:mod:`marshal` doesn't do this.
|
|
|
|
This has implications both for recursive objects and object sharing. Recursive
|
|
objects are objects that contain references to themselves. These are not
|
|
handled by marshal, and in fact, attempting to marshal recursive objects will
|
|
crash your Python interpreter. Object sharing happens when there are multiple
|
|
references to the same object in different places in the object hierarchy being
|
|
serialized. :mod:`pickle` stores such objects only once, and ensures that all
|
|
other references point to the master copy. Shared objects remain shared, which
|
|
can be very important for mutable objects.
|
|
|
|
* :mod:`marshal` cannot be used to serialize user-defined classes and their
|
|
instances. :mod:`pickle` can save and restore class instances transparently,
|
|
however the class definition must be importable and live in the same module as
|
|
when the object was stored.
|
|
|
|
* The :mod:`marshal` serialization format is not guaranteed to be portable
|
|
across Python versions. Because its primary job in life is to support
|
|
:file:`.pyc` files, the Python implementers reserve the right to change the
|
|
serialization format in non-backwards compatible ways should the need arise.
|
|
The :mod:`pickle` serialization format is guaranteed to be backwards compatible
|
|
across Python releases.
|
|
|
|
Note that serialization is a more primitive notion than persistence; although
|
|
:mod:`pickle` reads and writes file objects, it does not handle the issue of
|
|
naming persistent objects, nor the (even more complicated) issue of concurrent
|
|
access to persistent objects. The :mod:`pickle` module can transform a complex
|
|
object into a byte stream and it can transform the byte stream into an object
|
|
with the same internal structure. Perhaps the most obvious thing to do with
|
|
these byte streams is to write them onto a file, but it is also conceivable to
|
|
send them across a network or store them in a database. The module
|
|
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
|
|
DBM-style database files.
|
|
|
|
|
|
Data stream format
|
|
------------------
|
|
|
|
.. index::
|
|
single: External Data Representation
|
|
|
|
The data format used by :mod:`pickle` is Python-specific. This has the
|
|
advantage that there are no restrictions imposed by external standards such as
|
|
JSON or XDR (which can't represent pointer sharing); however it means that
|
|
non-Python programs may not be able to reconstruct pickled Python objects.
|
|
|
|
By default, the :mod:`pickle` data format uses a relatively compact binary
|
|
representation. If you need optimal size characteristics, you can efficiently
|
|
:doc:`compress <archiving>` pickled data.
|
|
|
|
The module :mod:`pickletools` contains tools for analyzing data streams
|
|
generated by :mod:`pickle`. :mod:`pickletools` source code has extensive
|
|
comments about opcodes used by pickle protocols.
|
|
|
|
There are currently 4 different protocols which can be used for pickling.
|
|
|
|
* Protocol version 0 is the original "human-readable" protocol and is
|
|
backwards compatible with earlier versions of Python.
|
|
|
|
* Protocol version 1 is an old binary format which is also compatible with
|
|
earlier versions of Python.
|
|
|
|
* Protocol version 2 was introduced in Python 2.3. It provides much more
|
|
efficient pickling of :term:`new-style class`\es. Refer to :pep:`307` for
|
|
information about improvements brought by protocol 2.
|
|
|
|
* Protocol version 3 was added in Python 3. It has explicit support for
|
|
:class:`bytes` objects and cannot be unpickled by Python 2.x. This is
|
|
the default as well as the current recommended protocol; use it whenever
|
|
possible.
|
|
|
|
|
|
Module Interface
|
|
----------------
|
|
|
|
To serialize an object hierarchy, you simply call the :func:`dumps` function.
|
|
Similarly, to de-serialize a data stream, you call the :func:`loads` function.
|
|
However, if you want more control over serialization and de-serialization,
|
|
you can create a :class:`Pickler` or an :class:`Unpickler` object, respectively.
|
|
|
|
The :mod:`pickle` module provides the following constants:
|
|
|
|
|
|
.. data:: HIGHEST_PROTOCOL
|
|
|
|
The highest protocol version available. This value can be passed as a
|
|
*protocol* value.
|
|
|
|
.. data:: DEFAULT_PROTOCOL
|
|
|
|
The default protocol used for pickling. May be less than HIGHEST_PROTOCOL.
|
|
Currently the default protocol is 3, a new protocol designed for Python 3.0.
|
|
|
|
|
|
The :mod:`pickle` module provides the following functions to make the pickling
|
|
process more convenient:
|
|
|
|
.. function:: dump(obj, file, protocol=None, \*, fix_imports=True)
|
|
|
|
Write a pickled representation of *obj* to the open :term:`file object` *file*.
|
|
This is equivalent to ``Pickler(file, protocol).dump(obj)``.
|
|
|
|
The optional *protocol* argument tells the pickler to use the given protocol;
|
|
supported protocols are 0, 1, 2, 3. The default protocol is 3; a
|
|
backward-incompatible protocol designed for Python 3.0.
|
|
|
|
Specifying a negative protocol version selects the highest protocol version
|
|
supported. The higher the protocol used, the more recent the version of
|
|
Python needed to read the pickle produced.
|
|
|
|
The *file* argument must have a write() method that accepts a single bytes
|
|
argument. It can thus be an on-disk file opened for binary writing, a
|
|
:class:`io.BytesIO` instance, or any other custom object that meets this
|
|
interface.
|
|
|
|
If *fix_imports* is True and *protocol* is less than 3, pickle will try to
|
|
map the new Python 3.x names to the old module names used in Python 2.x,
|
|
so that the pickle data stream is readable with Python 2.x.
|
|
|
|
.. function:: dumps(obj, protocol=None, \*, fix_imports=True)
|
|
|
|
Return the pickled representation of the object as a :class:`bytes`
|
|
object, instead of writing it to a file.
|
|
|
|
The optional *protocol* argument tells the pickler to use the given protocol;
|
|
supported protocols are 0, 1, 2, 3. The default protocol is 3; a
|
|
backward-incompatible protocol designed for Python 3.0.
|
|
|
|
Specifying a negative protocol version selects the highest protocol version
|
|
supported. The higher the protocol used, the more recent the version of
|
|
Python needed to read the pickle produced.
|
|
|
|
If *fix_imports* is True and *protocol* is less than 3, pickle will try to
|
|
map the new Python 3.x names to the old module names used in Python 2.x,
|
|
so that the pickle data stream is readable with Python 2.x.
|
|
|
|
.. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict")
|
|
|
|
Read a pickled object representation from the open :term:`file object` *file*
|
|
and return the reconstituted object hierarchy specified therein. This is
|
|
equivalent to ``Unpickler(file).load()``.
|
|
|
|
The protocol version of the pickle is detected automatically, so no protocol
|
|
argument is needed. Bytes past the pickled object's representation are
|
|
ignored.
|
|
|
|
The argument *file* must have two methods, a read() method that takes an
|
|
integer argument, and a readline() method that requires no arguments. Both
|
|
methods should return bytes. Thus *file* can be an on-disk file opened
|
|
for binary reading, a :class:`io.BytesIO` object, or any other custom object
|
|
that meets this interface.
|
|
|
|
Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
|
|
which are used to control compatibility support for pickle stream generated
|
|
by Python 2.x. If *fix_imports* is True, pickle will try to map the old
|
|
Python 2.x names to the new names used in Python 3.x. The *encoding* and
|
|
*errors* tell pickle how to decode 8-bit string instances pickled by Python
|
|
2.x; these default to 'ASCII' and 'strict', respectively.
|
|
|
|
.. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict")
|
|
|
|
Read a pickled object hierarchy from a :class:`bytes` object and return the
|
|
reconstituted object hierarchy specified therein
|
|
|
|
The protocol version of the pickle is detected automatically, so no protocol
|
|
argument is needed. Bytes past the pickled object's representation are
|
|
ignored.
|
|
|
|
Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
|
|
which are used to control compatibility support for pickle stream generated
|
|
by Python 2.x. If *fix_imports* is True, pickle will try to map the old
|
|
Python 2.x names to the new names used in Python 3.x. The *encoding* and
|
|
*errors* tell pickle how to decode 8-bit string instances pickled by Python
|
|
2.x; these default to 'ASCII' and 'strict', respectively.
|
|
|
|
|
|
The :mod:`pickle` module defines three exceptions:
|
|
|
|
.. exception:: PickleError
|
|
|
|
Common base class for the other pickling exceptions. It inherits
|
|
:exc:`Exception`.
|
|
|
|
.. exception:: PicklingError
|
|
|
|
Error raised when an unpicklable object is encountered by :class:`Pickler`.
|
|
It inherits :exc:`PickleError`.
|
|
|
|
Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
|
|
pickled.
|
|
|
|
.. exception:: UnpicklingError
|
|
|
|
Error raised when there is a problem unpickling an object, such as a data
|
|
corruption or a security violation. It inherits :exc:`PickleError`.
|
|
|
|
Note that other exceptions may also be raised during unpickling, including
|
|
(but not necessarily limited to) AttributeError, EOFError, ImportError, and
|
|
IndexError.
|
|
|
|
|
|
The :mod:`pickle` module exports two classes, :class:`Pickler` and
|
|
:class:`Unpickler`:
|
|
|
|
.. class:: Pickler(file, protocol=None, \*, fix_imports=True)
|
|
|
|
This takes a binary file for writing a pickle data stream.
|
|
|
|
The optional *protocol* argument tells the pickler to use the given protocol;
|
|
supported protocols are 0, 1, 2, 3. The default protocol is 3; a
|
|
backward-incompatible protocol designed for Python 3.0.
|
|
|
|
Specifying a negative protocol version selects the highest protocol version
|
|
supported. The higher the protocol used, the more recent the version of
|
|
Python needed to read the pickle produced.
|
|
|
|
The *file* argument must have a write() method that accepts a single bytes
|
|
argument. It can thus be an on-disk file opened for binary writing, a
|
|
:class:`io.BytesIO` instance, or any other custom object that meets this interface.
|
|
|
|
If *fix_imports* is True and *protocol* is less than 3, pickle will try to
|
|
map the new Python 3.x names to the old module names used in Python 2.x,
|
|
so that the pickle data stream is readable with Python 2.x.
|
|
|
|
.. method:: dump(obj)
|
|
|
|
Write a pickled representation of *obj* to the open file object given in
|
|
the constructor.
|
|
|
|
.. method:: persistent_id(obj)
|
|
|
|
Do nothing by default. This exists so a subclass can override it.
|
|
|
|
If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual. Any
|
|
other value causes :class:`Pickler` to emit the returned value as a
|
|
persistent ID for *obj*. The meaning of this persistent ID should be
|
|
defined by :meth:`Unpickler.persistent_load`. Note that the value
|
|
returned by :meth:`persistent_id` cannot itself have a persistent ID.
|
|
|
|
See :ref:`pickle-persistent` for details and examples of uses.
|
|
|
|
.. attribute:: fast
|
|
|
|
Deprecated. Enable fast mode if set to a true value. The fast mode
|
|
disables the usage of memo, therefore speeding the pickling process by not
|
|
generating superfluous PUT opcodes. It should not be used with
|
|
self-referential objects, doing otherwise will cause :class:`Pickler` to
|
|
recurse infinitely.
|
|
|
|
Use :func:`pickletools.optimize` if you need more compact pickles.
|
|
|
|
|
|
.. class:: Unpickler(file, \*, fix_imports=True, encoding="ASCII", errors="strict")
|
|
|
|
This takes a binary file for reading a pickle data stream.
|
|
|
|
The protocol version of the pickle is detected automatically, so no
|
|
protocol argument is needed.
|
|
|
|
The argument *file* must have two methods, a read() method that takes an
|
|
integer argument, and a readline() method that requires no arguments. Both
|
|
methods should return bytes. Thus *file* can be an on-disk file object opened
|
|
for binary reading, a :class:`io.BytesIO` object, or any other custom object
|
|
that meets this interface.
|
|
|
|
Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
|
|
which are used to control compatibility support for pickle stream generated
|
|
by Python 2.x. If *fix_imports* is True, pickle will try to map the old
|
|
Python 2.x names to the new names used in Python 3.x. The *encoding* and
|
|
*errors* tell pickle how to decode 8-bit string instances pickled by Python
|
|
2.x; these default to 'ASCII' and 'strict', respectively.
|
|
|
|
.. method:: load()
|
|
|
|
Read a pickled object representation from the open file object given in
|
|
the constructor, and return the reconstituted object hierarchy specified
|
|
therein. Bytes past the pickled object's representation are ignored.
|
|
|
|
.. method:: persistent_load(pid)
|
|
|
|
Raise an :exc:`UnpicklingError` by default.
|
|
|
|
If defined, :meth:`persistent_load` should return the object specified by
|
|
the persistent ID *pid*. If an invalid persistent ID is encountered, an
|
|
:exc:`UnpicklingError` should be raised.
|
|
|
|
See :ref:`pickle-persistent` for details and examples of uses.
|
|
|
|
.. method:: find_class(module, name)
|
|
|
|
Import *module* if necessary and return the object called *name* from it,
|
|
where the *module* and *name* arguments are :class:`str` objects. Note,
|
|
unlike its name suggests, :meth:`find_class` is also used for finding
|
|
functions.
|
|
|
|
Subclasses may override this to gain control over what type of objects and
|
|
how they can be loaded, potentially reducing security risks. Refer to
|
|
:ref:`pickle-restrict` for details.
|
|
|
|
|
|
.. _pickle-picklable:
|
|
|
|
What can be pickled and unpickled?
|
|
----------------------------------
|
|
|
|
The following types can be pickled:
|
|
|
|
* ``None``, ``True``, and ``False``
|
|
|
|
* integers, floating point numbers, complex numbers
|
|
|
|
* strings, bytes, bytearrays
|
|
|
|
* tuples, lists, sets, and dictionaries containing only picklable objects
|
|
|
|
* functions defined at the top level of a module
|
|
|
|
* built-in functions defined at the top level of a module
|
|
|
|
* classes that are defined at the top level of a module
|
|
|
|
* instances of such classes whose :attr:`__dict__` or :meth:`__setstate__` is
|
|
picklable (see section :ref:`pickle-inst` for details)
|
|
|
|
Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
|
|
exception; when this happens, an unspecified number of bytes may have already
|
|
been written to the underlying file. Trying to pickle a highly recursive data
|
|
structure may exceed the maximum recursion depth, a :exc:`RuntimeError` will be
|
|
raised in this case. You can carefully raise this limit with
|
|
:func:`sys.setrecursionlimit`.
|
|
|
|
Note that functions (built-in and user-defined) are pickled by "fully qualified"
|
|
name reference, not by value. This means that only the function name is
|
|
pickled, along with the name of the module the function is defined in. Neither the
|
|
function's code, nor any of its function attributes are pickled. Thus the
|
|
defining module must be importable in the unpickling environment, and the module
|
|
must contain the named object, otherwise an exception will be raised. [#]_
|
|
|
|
Similarly, classes are pickled by named reference, so the same restrictions in
|
|
the unpickling environment apply. Note that none of the class's code or data is
|
|
pickled, so in the following example the class attribute ``attr`` is not
|
|
restored in the unpickling environment::
|
|
|
|
class Foo:
|
|
attr = 'A class attribute'
|
|
|
|
picklestring = pickle.dumps(Foo)
|
|
|
|
These restrictions are why picklable functions and classes must be defined in
|
|
the top level of a module.
|
|
|
|
Similarly, when class instances are pickled, their class's code and data are not
|
|
pickled along with them. Only the instance data are pickled. This is done on
|
|
purpose, so you can fix bugs in a class or add methods to the class and still
|
|
load objects that were created with an earlier version of the class. If you
|
|
plan to have long-lived objects that will see many versions of a class, it may
|
|
be worthwhile to put a version number in the objects so that suitable
|
|
conversions can be made by the class's :meth:`__setstate__` method.
|
|
|
|
|
|
.. _pickle-inst:
|
|
|
|
Pickling Class Instances
|
|
------------------------
|
|
|
|
In this section, we describe the general mechanisms available to you to define,
|
|
customize, and control how class instances are pickled and unpickled.
|
|
|
|
In most cases, no additional code is needed to make instances picklable. By
|
|
default, pickle will retrieve the class and the attributes of an instance via
|
|
introspection. When a class instance is unpickled, its :meth:`__init__` method
|
|
is usually *not* invoked. The default behaviour first creates an uninitialized
|
|
instance and then restores the saved attributes. The following code shows an
|
|
implementation of this behaviour::
|
|
|
|
def save(obj):
|
|
return (obj.__class__, obj.__dict__)
|
|
|
|
def load(cls, attributes):
|
|
obj = cls.__new__(cls)
|
|
obj.__dict__.update(attributes)
|
|
return obj
|
|
|
|
Classes can alter the default behaviour by providing one or several special
|
|
methods:
|
|
|
|
.. method:: object.__getnewargs__()
|
|
|
|
In protocol 2 and newer, classes that implements the :meth:`__getnewargs__`
|
|
method can dictate the values passed to the :meth:`__new__` method upon
|
|
unpickling. This is often needed for classes whose :meth:`__new__` method
|
|
requires arguments.
|
|
|
|
|
|
.. method:: object.__getstate__()
|
|
|
|
Classes can further influence how their instances are pickled; if the class
|
|
defines the method :meth:`__getstate__`, it is called and the returned object
|
|
is pickled as the contents for the instance, instead of the contents of the
|
|
instance's dictionary. If the :meth:`__getstate__` method is absent, the
|
|
instance's :attr:`__dict__` is pickled as usual.
|
|
|
|
|
|
.. method:: object.__setstate__(state)
|
|
|
|
Upon unpickling, if the class defines :meth:`__setstate__`, it is called with
|
|
the unpickled state. In that case, there is no requirement for the state
|
|
object to be a dictionary. Otherwise, the pickled state must be a dictionary
|
|
and its items are assigned to the new instance's dictionary.
|
|
|
|
.. note::
|
|
|
|
If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
|
|
method will not be called upon unpickling.
|
|
|
|
|
|
Refer to the section :ref:`pickle-state` for more information about how to use
|
|
the methods :meth:`__getstate__` and :meth:`__setstate__`.
|
|
|
|
.. note::
|
|
|
|
At unpickling time, some methods like :meth:`__getattr__`,
|
|
:meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
|
|
instance. In case those methods rely on some internal invariant being true,
|
|
the type should implement :meth:`__getnewargs__` to establish such an
|
|
invariant; otherwise, neither :meth:`__new__` nor :meth:`__init__` will be
|
|
called.
|
|
|
|
.. index:: pair: copy; protocol
|
|
|
|
As we shall see, pickle does not use directly the methods described above. In
|
|
fact, these methods are part of the copy protocol which implements the
|
|
:meth:`__reduce__` special method. The copy protocol provides a unified
|
|
interface for retrieving the data necessary for pickling and copying
|
|
objects. [#]_
|
|
|
|
Although powerful, implementing :meth:`__reduce__` directly in your classes is
|
|
error prone. For this reason, class designers should use the high-level
|
|
interface (i.e., :meth:`__getnewargs__`, :meth:`__getstate__` and
|
|
:meth:`__setstate__`) whenever possible. We will show, however, cases where
|
|
using :meth:`__reduce__` is the only option or leads to more efficient pickling
|
|
or both.
|
|
|
|
.. method:: object.__reduce__()
|
|
|
|
The interface is currently defined as follows. The :meth:`__reduce__` method
|
|
takes no argument and shall return either a string or preferably a tuple (the
|
|
returned object is often referred to as the "reduce value").
|
|
|
|
If a string is returned, the string should be interpreted as the name of a
|
|
global variable. It should be the object's local name relative to its
|
|
module; the pickle module searches the module namespace to determine the
|
|
object's module. This behaviour is typically useful for singletons.
|
|
|
|
When a tuple is returned, it must be between two and five items long.
|
|
Optional items can either be omitted, or ``None`` can be provided as their
|
|
value. The semantics of each item are in order:
|
|
|
|
.. XXX Mention __newobj__ special-case?
|
|
|
|
* A callable object that will be called to create the initial version of the
|
|
object.
|
|
|
|
* A tuple of arguments for the callable object. An empty tuple must be given
|
|
if the callable does not accept any argument.
|
|
|
|
* Optionally, the object's state, which will be passed to the object's
|
|
:meth:`__setstate__` method as previously described. If the object has no
|
|
such method then, the value must be a dictionary and it will be added to
|
|
the object's :attr:`__dict__` attribute.
|
|
|
|
* Optionally, an iterator (and not a sequence) yielding successive items.
|
|
These items will be appended to the object either using
|
|
``obj.append(item)`` or, in batch, using ``obj.extend(list_of_items)``.
|
|
This is primarily used for list subclasses, but may be used by other
|
|
classes as long as they have :meth:`append` and :meth:`extend` methods with
|
|
the appropriate signature. (Whether :meth:`append` or :meth:`extend` is
|
|
used depends on which pickle protocol version is used as well as the number
|
|
of items to append, so both must be supported.)
|
|
|
|
* Optionally, an iterator (not a sequence) yielding successive key-value
|
|
pairs. These items will be stored to the object using ``obj[key] =
|
|
value``. This is primarily used for dictionary subclasses, but may be used
|
|
by other classes as long as they implement :meth:`__setitem__`.
|
|
|
|
|
|
.. method:: object.__reduce_ex__(protocol)
|
|
|
|
Alternatively, a :meth:`__reduce_ex__` method may be defined. The only
|
|
difference is this method should take a single integer argument, the protocol
|
|
version. When defined, pickle will prefer it over the :meth:`__reduce__`
|
|
method. In addition, :meth:`__reduce__` automatically becomes a synonym for
|
|
the extended version. The main use for this method is to provide
|
|
backwards-compatible reduce values for older Python releases.
|
|
|
|
.. _pickle-persistent:
|
|
|
|
Persistence of External Objects
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
.. index::
|
|
single: persistent_id (pickle protocol)
|
|
single: persistent_load (pickle protocol)
|
|
|
|
For the benefit of object persistence, the :mod:`pickle` module supports the
|
|
notion of a reference to an object outside the pickled data stream. Such
|
|
objects are referenced by a persistent ID, which should be either a string of
|
|
alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
|
|
any newer protocol).
|
|
|
|
The resolution of such persistent IDs is not defined by the :mod:`pickle`
|
|
module; it will delegate this resolution to the user defined methods on the
|
|
pickler and unpickler, :meth:`persistent_id` and :meth:`persistent_load`
|
|
respectively.
|
|
|
|
To pickle objects that have an external persistent id, the pickler must have a
|
|
custom :meth:`persistent_id` method that takes an object as an argument and
|
|
returns either ``None`` or the persistent id for that object. When ``None`` is
|
|
returned, the pickler simply pickles the object as normal. When a persistent ID
|
|
string is returned, the pickler will pickle that object, along with a marker so
|
|
that the unpickler will recognize it as a persistent ID.
|
|
|
|
To unpickle external objects, the unpickler must have a custom
|
|
:meth:`persistent_load` method that takes a persistent ID object and returns the
|
|
referenced object.
|
|
|
|
Here is a comprehensive example presenting how persistent ID can be used to
|
|
pickle external objects by reference.
|
|
|
|
.. literalinclude:: ../includes/dbpickle.py
|
|
|
|
|
|
.. _pickle-state:
|
|
|
|
Handling Stateful Objects
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
.. index::
|
|
single: __getstate__() (copy protocol)
|
|
single: __setstate__() (copy protocol)
|
|
|
|
Here's an example that shows how to modify pickling behavior for a class.
|
|
The :class:`TextReader` class opens a text file, and returns the line number and
|
|
line contents each time its :meth:`readline` method is called. If a
|
|
:class:`TextReader` instance is pickled, all attributes *except* the file object
|
|
member are saved. When the instance is unpickled, the file is reopened, and
|
|
reading resumes from the last location. The :meth:`__setstate__` and
|
|
:meth:`__getstate__` methods are used to implement this behavior. ::
|
|
|
|
class TextReader:
|
|
"""Print and number lines in a text file."""
|
|
|
|
def __init__(self, filename):
|
|
self.filename = filename
|
|
self.file = open(filename)
|
|
self.lineno = 0
|
|
|
|
def readline(self):
|
|
self.lineno += 1
|
|
line = self.file.readline()
|
|
if not line:
|
|
return None
|
|
if line.endswith('\n'):
|
|
line = line[:-1]
|
|
return "%i: %s" % (self.lineno, line)
|
|
|
|
def __getstate__(self):
|
|
# Copy the object's state from self.__dict__ which contains
|
|
# all our instance attributes. Always use the dict.copy()
|
|
# method to avoid modifying the original state.
|
|
state = self.__dict__.copy()
|
|
# Remove the unpicklable entries.
|
|
del state['file']
|
|
return state
|
|
|
|
def __setstate__(self, state):
|
|
# Restore instance attributes (i.e., filename and lineno).
|
|
self.__dict__.update(state)
|
|
# Restore the previously opened file's state. To do so, we need to
|
|
# reopen it and read from it until the line count is restored.
|
|
file = open(self.filename)
|
|
for _ in range(self.lineno):
|
|
file.readline()
|
|
# Finally, save the file.
|
|
self.file = file
|
|
|
|
|
|
A sample usage might be something like this::
|
|
|
|
>>> reader = TextReader("hello.txt")
|
|
>>> reader.readline()
|
|
'1: Hello world!'
|
|
>>> reader.readline()
|
|
'2: I am line number two.'
|
|
>>> new_reader = pickle.loads(pickle.dumps(reader))
|
|
>>> new_reader.readline()
|
|
'3: Goodbye!'
|
|
|
|
|
|
.. _pickle-restrict:
|
|
|
|
Restricting Globals
|
|
-------------------
|
|
|
|
.. index::
|
|
single: find_class() (pickle protocol)
|
|
|
|
By default, unpickling will import any class or function that it finds in the
|
|
pickle data. For many applications, this behaviour is unacceptable as it
|
|
permits the unpickler to import and invoke arbitrary code. Just consider what
|
|
this hand-crafted pickle data stream does when loaded::
|
|
|
|
>>> import pickle
|
|
>>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
|
|
hello world
|
|
0
|
|
|
|
In this example, the unpickler imports the :func:`os.system` function and then
|
|
apply the string argument "echo hello world". Although this example is
|
|
inoffensive, it is not difficult to imagine one that could damage your system.
|
|
|
|
For this reason, you may want to control what gets unpickled by customizing
|
|
:meth:`Unpickler.find_class`. Unlike its name suggests, :meth:`find_class` is
|
|
called whenever a global (i.e., a class or a function) is requested. Thus it is
|
|
possible to either completely forbid globals or restrict them to a safe subset.
|
|
|
|
Here is an example of an unpickler allowing only few safe classes from the
|
|
:mod:`builtins` module to be loaded::
|
|
|
|
import builtins
|
|
import io
|
|
import pickle
|
|
|
|
safe_builtins = {
|
|
'range',
|
|
'complex',
|
|
'set',
|
|
'frozenset',
|
|
'slice',
|
|
}
|
|
|
|
class RestrictedUnpickler(pickle.Unpickler):
|
|
|
|
def find_class(self, module, name):
|
|
# Only allow safe classes from builtins.
|
|
if module == "builtins" and name in safe_builtins:
|
|
return getattr(builtins, name)
|
|
# Forbid everything else.
|
|
raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
|
|
(module, name))
|
|
|
|
def restricted_loads(s):
|
|
"""Helper function analogous to pickle.loads()."""
|
|
return RestrictedUnpickler(io.BytesIO(s)).load()
|
|
|
|
A sample usage of our unpickler working has intended::
|
|
|
|
>>> restricted_loads(pickle.dumps([1, 2, range(15)]))
|
|
[1, 2, range(0, 15)]
|
|
>>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
|
|
Traceback (most recent call last):
|
|
...
|
|
pickle.UnpicklingError: global 'os.system' is forbidden
|
|
>>> restricted_loads(b'cbuiltins\neval\n'
|
|
... b'(S\'getattr(__import__("os"), "system")'
|
|
... b'("echo hello world")\'\ntR.')
|
|
Traceback (most recent call last):
|
|
...
|
|
pickle.UnpicklingError: global 'builtins.eval' is forbidden
|
|
|
|
|
|
.. XXX Add note about how extension codes could evade our protection
|
|
mechanism (e.g. cached classes do not invokes find_class()).
|
|
|
|
As our examples shows, you have to be careful with what you allow to be
|
|
unpickled. Therefore if security is a concern, you may want to consider
|
|
alternatives such as the marshalling API in :mod:`xmlrpc.client` or
|
|
third-party solutions.
|
|
|
|
|
|
.. _pickle-example:
|
|
|
|
Examples
|
|
--------
|
|
|
|
For the simplest code, use the :func:`dump` and :func:`load` functions. ::
|
|
|
|
import pickle
|
|
|
|
# An arbitrary collection of objects supported by pickle.
|
|
data = {
|
|
'a': [1, 2.0, 3, 4+6j],
|
|
'b': ("character string", b"byte string"),
|
|
'c': set([None, True, False])
|
|
}
|
|
|
|
with open('data.pickle', 'wb') as f:
|
|
# Pickle the 'data' dictionary using the highest protocol available.
|
|
pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
|
|
|
|
|
|
The following example reads the resulting pickled data. ::
|
|
|
|
import pickle
|
|
|
|
with open('data.pickle', 'rb') as f:
|
|
# The protocol version used is detected automatically, so we do not
|
|
# have to specify it.
|
|
data = pickle.load(f)
|
|
|
|
|
|
.. XXX: Add examples showing how to optimize pickles for size (like using
|
|
.. pickletools.optimize() or the gzip module).
|
|
|
|
|
|
.. seealso::
|
|
|
|
Module :mod:`copyreg`
|
|
Pickle interface constructor registration for extension types.
|
|
|
|
Module :mod:`pickletools`
|
|
Tools for working with and analyzing pickled data.
|
|
|
|
Module :mod:`shelve`
|
|
Indexed databases of objects; uses :mod:`pickle`.
|
|
|
|
Module :mod:`copy`
|
|
Shallow and deep object copying.
|
|
|
|
Module :mod:`marshal`
|
|
High-performance serialization of built-in types.
|
|
|
|
|
|
.. rubric:: Footnotes
|
|
|
|
.. [#] Don't confuse this with the :mod:`marshal` module
|
|
|
|
.. [#] The exception raised will likely be an :exc:`ImportError` or an
|
|
:exc:`AttributeError` but it could be something else.
|
|
|
|
.. [#] The :mod:`copy` module uses this protocol for shallow and deep copying
|
|
operations.
|
|
|
|
.. [#] The limitation on alphanumeric characters is due to the fact
|
|
the persistent IDs, in protocol 0, are delimited by the newline
|
|
character. Therefore if any kind of newline characters occurs in
|
|
persistent IDs, the resulting pickle will become unreadable.
|