:mod:`struct` --- Interpret strings as packed binary data
=========================================================

.. module:: struct
   :synopsis: Interpret strings as packed binary data.

.. index::
   pair: C; structures
   triple: packing; binary; data

This module performs conversions between Python values and C structs represented
as Python strings. It uses :dfn:`format strings` (explained below) as compact
descriptions of the layout of the C structs and the intended conversion to/from
Python values. This can be used in handling binary data stored in files or from
network connections, among other sources.

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; argument is a string describing what is
   wrong.


.. function:: pack(fmt, v1, v2, ...)

   Return a string containing the values ``v1, v2, ...`` packed according to the
   given format. The arguments must match the values required by the format
   exactly.


.. function:: pack_into(fmt, buffer, offset, v1, v2, ...)

   Pack the values ``v1, v2, ...`` according to the given format and write the
   packed bytes into the writable *buffer* starting at *offset*. Note that the
   offset is a required argument.


.. function:: unpack(fmt, string)

   Unpack the string (presumably packed by ``pack(fmt, ...)``) according to the
   given format. The result is a tuple even if it contains exactly one item. The
   string must contain exactly the amount of data required by the format
   (``len(string)`` must equal ``calcsize(fmt)``).


.. function:: unpack_from(fmt, buffer[, offset=0])

   Unpack the *buffer* according to the given format. The result is a tuple even
   if it contains exactly one item. The *buffer* must contain at least the amount
   of data required by the format (``len(buffer[offset:])`` must be at least
   ``calcsize(fmt)``).


.. function:: calcsize(fmt)

   Return the size of the struct (and hence of the string) corresponding to the
   given format.
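
As an illustration of the functions above, the following sketch round-trips
three values and then repeats the exercise inside a preallocated buffer. It
assumes an interpreter that provides ``bytearray`` and ``b'...'`` literals (on
Python 3 the packed result is a ``bytes`` object rather than a ``str``), and it
uses the explicit ``'<'`` prefix so that the result does not depend on the host.
All variable names are illustrative only.

```python
import struct

fmt = '<hhl'   # little-endian, standard sizes: two shorts and one long

# pack() returns exactly calcsize(fmt) bytes.
packed = struct.pack(fmt, 1, 2, 3)
size = struct.calcsize(fmt)

# unpack() always returns a tuple and requires len(packed) == size.
values = struct.unpack(fmt, packed)

# pack_into() writes into an existing writable buffer at a given offset;
# unpack_from() reads from a buffer that may be larger than one record.
buf = bytearray(4 + size)
struct.pack_into(fmt, buf, 4, 1, 2, 3)
from_buffer = struct.unpack_from(fmt, buf, 4)

print(size, values, from_buffer)
```

The four leading bytes of ``buf`` are left untouched, which is why
:func:`pack_into` needs its offset spelled out.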

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types:

+--------+-------------------------+--------------------+-------+
| Format | C Type                  | Python             | Notes |
+========+=========================+====================+=======+
| ``x``  | pad byte                | no value           |       |
+--------+-------------------------+--------------------+-------+
| ``c``  | :ctype:`char`           | string of length 1 |       |
+--------+-------------------------+--------------------+-------+
| ``b``  | :ctype:`signed char`    | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``B``  | :ctype:`unsigned char`  | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``t``  | :ctype:`_Bool`          | bool               | \(1)  |
+--------+-------------------------+--------------------+-------+
| ``h``  | :ctype:`short`          | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``H``  | :ctype:`unsigned short` | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``i``  | :ctype:`int`            | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``I``  | :ctype:`unsigned int`   | long               |       |
+--------+-------------------------+--------------------+-------+
| ``l``  | :ctype:`long`           | integer            |       |
+--------+-------------------------+--------------------+-------+
| ``L``  | :ctype:`unsigned long`  | long               |       |
+--------+-------------------------+--------------------+-------+
| ``q``  | :ctype:`long long`      | long               | \(2)  |
+--------+-------------------------+--------------------+-------+
| ``Q``  | :ctype:`unsigned long   | long               | \(2)  |
|        | long`                   |                    |       |
+--------+-------------------------+--------------------+-------+
| ``f``  | :ctype:`float`          | float              |       |
+--------+-------------------------+--------------------+-------+
| ``d``  | :ctype:`double`         | float              |       |
+--------+-------------------------+--------------------+-------+
| ``s``  | :ctype:`char[]`         | string             |       |
+--------+-------------------------+--------------------+-------+
| ``p``  | :ctype:`char[]`         | string             |       |
+--------+-------------------------+--------------------+-------+
| ``P``  | :ctype:`void \*`        | integer            |       |
+--------+-------------------------+--------------------+-------+

Notes:

(1)
   The ``'t'`` conversion code corresponds to the :ctype:`_Bool` type defined by
   C99. If this type is not available, it is simulated using a :ctype:`char`. In
   standard mode, it is always represented by one byte.

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :ctype:`long long`, or, on Windows,
   :ctype:`__int64`. They are always available in standard modes.

A format character may be preceded by an integral repeat count. For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; a count and its format must
not contain whitespace though.

For the ``'s'`` format character, the count is interpreted as the size of the
string, not a repeat count like for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
For packing, the string is truncated or padded with null bytes as appropriate to
make it fit. For unpacking, the resulting string always has exactly the
specified number of bytes. As a special case, ``'0s'`` means a single, empty
string (while ``'0c'`` means 0 characters).
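
The difference between a repeat count and the ``'s'`` length can be seen in a
short sketch. It assumes an interpreter that accepts ``b'...'`` literals (on
Python 3 the packed data and the unpacked strings are ``bytes`` objects); the
variable names are illustrative only.

```python
import struct

# A repeat count is shorthand for writing the code several times.
same_size = struct.calcsize('4h') == struct.calcsize('hhhh')

# Whitespace between format codes is ignored.
spaced = struct.calcsize('h h') == struct.calcsize('hh')

# For 's' the count is a length: short input is padded with null bytes...
padded = struct.pack('10s', b'spam')
# ...and long input is truncated.
truncated = struct.pack('3s', b'spam')

# '10c' describes ten separate one-byte items, so unpacking yields ten
# values, whereas '10s' yields a single ten-byte string.
as_chars = struct.unpack('10c', padded)
as_string = struct.unpack('10s', padded)

print(len(as_chars), len(as_string))
```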

The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a fixed number of bytes. The count is the total
number of bytes stored. The first byte stored is the length of the string, or
255, whichever is smaller. The bytes of the string follow. If the string
passed in to :func:`pack` is too long (longer than the count minus 1), only the
leading count-1 bytes of the string are stored. If the string is shorter than
count-1, it is padded with null bytes so that exactly count bytes in all are
used. Note that for :func:`unpack`, the ``'p'`` format character consumes count
bytes, but that the string returned can never contain more than 255 characters.
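
A small sketch of the Pascal string layout described above, assuming an
interpreter that accepts ``b'...'`` literals (the names are illustrative only):

```python
import struct

# Five bytes in total: one length byte, the data, then null padding.
short = struct.pack('5p', b'ab')

# Input longer than count-1 is cut to the leading count-1 bytes, and the
# length byte records the number of bytes actually stored.
clipped = struct.pack('3p', b'abcdef')

# Unpacking consumes all five bytes but returns only the stored string.
(recovered,) = struct.unpack('5p', short)

print(short, clipped, recovered)
```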

For the ``'I'``, ``'L'``, ``'q'`` and ``'Q'`` format characters, the return
value is a Python long integer.

For the ``'P'`` format character, the return value is a Python integer or long
integer, depending on the size needed to hold a pointer when it has been cast to
an integer type. A *NULL* pointer will always be returned as the Python integer
``0``. When packing pointer-sized values, Python integer or long integer objects
may be used. For example, the Alpha and Merced processors use 64-bit pointer
values, meaning a Python long integer will be used to hold the pointer; other
platforms use 32-bit pointers and will use a Python integer.

For the ``'t'`` format character, the return value is either :const:`True` or
:const:`False`. When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be True when unpacking.

By default, C numbers are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+--------------------+
| Character | Byte order             | Size and alignment |
+===========+========================+====================+
| ``@``     | native                 | native             |
+-----------+------------------------+--------------------+
| ``=``     | native                 | standard           |
+-----------+------------------------+--------------------+
| ``<``     | little-endian          | standard           |
+-----------+------------------------+--------------------+
| ``>``     | big-endian             | standard           |
+-----------+------------------------+--------------------+
| ``!``     | network (= big-endian) | standard           |
+-----------+------------------------+--------------------+

If the first character is not one of these, ``'@'`` is assumed.
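
The effect of the byte order character can be sketched as follows. The example
packs the same unsigned integer with each prefix; ``sys.byteorder`` is used only
to work out which result the host's native order should match, and the variable
names are illustrative.

```python
import struct
import sys

little = struct.pack('<I', 1)    # least significant byte first
big = struct.pack('>I', 1)       # most significant byte first
network = struct.pack('!I', 1)   # network order, same as big-endian

# '=' follows the host's byte order; sys.byteorder says which one that is.
native = struct.pack('=I', 1)
expected_native = little if sys.byteorder == 'little' else big

print(little, big, native == expected_native)
```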

Native byte order is big-endian or little-endian, depending on the host system.
For example, Motorola and Sun processors are big-endian; Intel and DEC
processors are little-endian.

Native size and alignment are determined using the C compiler's
:keyword:`sizeof` expression. This is always combined with native byte order.

Standard size and alignment are as follows: no alignment is required for any
type (so you have to use pad bytes); :ctype:`short` is 2 bytes; :ctype:`int` and
:ctype:`long` are 4 bytes; :ctype:`long long` (:ctype:`__int64` on Windows) is 8
bytes; :ctype:`float` and :ctype:`double` are 32-bit and 64-bit IEEE floating
point numbers, respectively. :ctype:`_Bool` is 1 byte.

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter is standardized.
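
The distinction can be made concrete with :func:`calcsize`. In this sketch the
standard sizes are fixed, while the native result depends on the platform's C
compiler, so no particular native value is assumed; the names are illustrative.

```python
import struct

# Standard mode: no padding, so char (1 byte) + int (4 bytes) is always 5.
standard = struct.calcsize('=ci')

# Native mode: the compiler's alignment rules apply. Many platforms align
# the int on a 4-byte boundary, giving 8, but this is platform-dependent.
native = struct.calcsize('@ci')

# Standard sizes for short, int, long, long long, float and double.
fixed = [struct.calcsize('=' + code) for code in 'hilqfd']

print(standard, native, fixed)
```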

The form ``'!'`` is available for those poor souls who claim they can't remember
whether network byte order is big-endian or little-endian.

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

The ``'P'`` format character is only available for the native byte ordering
(selected as the default or with the ``'@'`` byte order character). The byte
order character ``'='`` chooses to use little- or big-endian ordering based on
the host system. The struct module does not interpret this as native ordering,
so the ``'P'`` format is not available.

Examples (all using native byte order, size and alignment, on a big-endian
machine)::

   >>> from struct import *
   >>> pack('hhl', 1, 2, 3)
   '\x00\x01\x00\x02\x00\x00\x00\x03'
   >>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
   (1, 2, 3)
   >>> calcsize('hhl')
   8

Hint: to align the end of a structure to the alignment requirement of a
particular type, end the format with the code for that type with a repeat count
of zero. For example, the format ``'llh0l'`` specifies two pad bytes at the
end, assuming longs are aligned on 4-byte boundaries. This only works when
native size and alignment are in effect; standard size and alignment do not
enforce any alignment.
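
The hint can be checked with :func:`calcsize`. The sketch below compares the
format with and without the trailing zero-count code; the native sizes depend on
the platform, so only their relationship is relied on, and the names are
illustrative.

```python
import struct

# Native mode: the trailing '0l' pads the end up to a long's alignment.
unpadded = struct.calcsize('@llh')
padded = struct.calcsize('@llh0l')

# Standard mode enforces no alignment, so the zero-count code adds nothing.
standard_plain = struct.calcsize('=llh')
standard_zero = struct.calcsize('=llh0l')

print(unpadded, padded, standard_plain, standard_zero)
```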


.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Struct Objects
--------------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to the
   format string *format*. Creating a Struct object once and calling its methods
   is more efficient than calling the :mod:`struct` functions with the same format
   since the format string only needs to be compiled once.


Compiled Struct objects support the following methods and attributes:

.. method:: Struct.pack(v1, v2, ...)

   Identical to the :func:`pack` function, using the compiled format.
   (``len(result)`` will equal :attr:`self.size`.)


.. method:: Struct.pack_into(buffer, offset, v1, v2, ...)

   Identical to the :func:`pack_into` function, using the compiled format.


.. method:: Struct.unpack(string)

   Identical to the :func:`unpack` function, using the compiled format.
   (``len(string)`` must equal :attr:`self.size`).


.. method:: Struct.unpack_from(buffer[, offset=0])

   Identical to the :func:`unpack_from` function, using the compiled format.
   (``len(buffer[offset:])`` must be at least :attr:`self.size`).


.. attribute:: Struct.format

   The format string used to construct this Struct object.


.. attribute:: Struct.size

   The calculated size of the struct (and hence of the string) corresponding
   to :attr:`format`.
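
A minimal sketch of a compiled Struct object reused for several records,
assuming an interpreter that provides ``bytearray``. The ``record`` name and the
two-slot buffer are hypothetical and chosen only for illustration.

```python
import struct

record = struct.Struct('<hhl')   # compiled once, reused below

packed = record.pack(1, 2, 3)
values = record.unpack(packed)

# Use the size attribute to lay out fixed-width slots in one buffer.
buf = bytearray(record.size * 2)
record.pack_into(buf, record.size, 4, 5, 6)      # fill the second slot
second = record.unpack_from(buf, record.size)

print(record.size, values, second)
```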