(i.e. email.test), so move the guts of them here from Lib/test. The
latter directory will retain stubs to run the email.test tests using
Python's standard regression test.
test_email_torture.py is a torture tester which will not run under
Python's test suite because I don't want to commit megs of data to
that project (it will fail cleanly there). When run under the mimelib
project it'll stress test the package with megs of message samples
collected from various locations in the wild.
(i.e. email.test), so move the guts of them here from Lib/test. The
latter directory will retain stubs to run the email.test tests using
Python's standard regression test.
test_email_torture.py is a torture tester which will not run under
Python's test suite because I don't want to commit megs of data to
that project (it will fail cleanly there). When run under the mimelib
project it'll stress test the package with megs of message samples
collected from various locations in the wild.
email/test/data is a copy of Lib/test/data. The fate of the latter is
still undecided.
backwards compatibility, we're silently deprecating get_type(),
get_subtype() and get_main_type(). We may eventually noisily
deprecate these. For now, we'll just fix a bug in the splitting of
the main and subtypes.
get_content_type(), get_content_maintype(), get_content_subtype(): New
methods which replace the above. These /always/ return a content type
string and do not take a failobj, because an email message always at
least has a default content type.
set_default_type(): Someday there may be additional default content
types, so don't hard code an assertion about the value of the ctype
argument.
version of PySlice_GetIndicesEx"):
> OK. Michael, if you want to check in indices(), go ahead.
Then I did what was needed, but didn't check it in. Here it is.
quoting:
in non-strict mode, messages don't require a blank line at the end
with a missing end-terminator. A single newline is sufficient now.
Handle trailing whitespace at the end of a boundary. Had to switch
from using string.split() to re.split()
Handle whitespace on the end of a parameter list for Content-type.
Handle whitespace on the end of a plain content-type header.
Specifically,
get_type(): Strip the content type string.
_get_params_preserve(): Strip the parameter names and values on both
sides.
_parsebody(): Lots of changes as described above, with some stylistic
changes by Barry (who hopefully didn't screw things up ;).
the default range to end at 2**20 (machines are much faster now).
Fixed what was quite a arguably a bug, explaining an old mystery: the
"!sort" case here contructs what *was* a quadratic-time disaster for
the old quicksort implementation. But under the current samplesort, it
always ran much faster than *sort (the random case). This never made
sense. Turns out it was because !sort was sorting an integer array,
while all the other cases sort floats; and comparing ints goes much
quicker than comparing floats in Python. After changing !sort to chew
on floats instead, it's now slower than the random sort case, which
makes more sense (but is just a few percent slower; samplesort is
massively less sensitive to "bad patterns" than quicksort).
The implementation now stores all the lines of the request in a buffer
and makes a single send() call when the request is finished,
specifically when endheaders() is called.
This appears to improve performance. The old code called send() for
each line. The sends are all short, so they caused bad interactions
with the Nagle algorithm and delayed acknowledgements. In simple
tests, the second packet was delayed by 100s of ms. The second send was
delayed by the Nagle algorithm, waiting for the ack. The delayed ack
strategy delays the ack in hopes of piggybacking it on a data packet,
but the server won't send any data until it receives the complete
request.
This change minimizes the problem that Nagle + delayed ack will cause
a problem, although a request large enough to be broken into two
packets will still suffer some delay. Luckily the MSS is large enough
to accomodate most single packets.
XXX Bug fix candidate?
existed at the time atexit first got imported. That's a bug, and this
fixes it.
Also reworked test_atexit.py to test for this too, and to stop using
an "expected output" file, and to test what actually happens at exit
instead of just simulating what it thinks atexit will do at exit.
Bugfix candidate, but it's messy so I'll backport to 2.2 myself.
The test of httplib makes it difficult to maintain httplib. There are
two many idioms that pyclbr doesn't seem to understand, and I don't
understand how to update these tests to make them work.
Also remove commented out test of urllib2.
more spaces only crashed pdb.
While I was at it, cleaned up some style nits (spaces between function
and parenthesis, and redundant parentheses in if statement).
takes much longer to run in the context of the test suite than when run in
isolation. That's because it forces a large number of full collections,
which take time proportional to the total number of gc'ed objects in the
whole system.
But since the dangerous implementation trickery that caused this test to
fail in 2.0, 2.1 and 2.2 doesn't exist in 2.3 anymore (the trashcan
mechanism stopped doing evil things when the possibility for compiling
without cyclic gc was taken away), such an expensive test is no longer
justified. This checkin leaves the test intact, but fiddles the
constants to reduce the runtime by about a factor of 5.
debug-build failure when an instance of a new-style class is resurrected
by a __del__ method -- we simply never had any code that tried this.
This is already fixed in 2.3 CVS. In 2.2.1, it blows up via
Fatal Python error: GC object already in linked list
I'll fix it in 2.2.1 CVS next.
The recent SSL changes resulted in important, but subtle changes to
close() semantics. Since builtin socket makefile() is not called for
SSL connections, we don't get separately closeable fds for connection
and response. Comments in the code explain how to restore makefile
semantics.
Bug fix candidate.
.splitlines() on them, since they may be Header instances.
test_multilingual(), test_header_ctor_default_args(): New tests of
make_header() and that Header can take all default arguments.
create a Header instance. Closes feature request #539481.
Header.__init__(): Allow the initial string to be omitted.
__eq__(), __ne__(): Support rich comparisons for equality of Header
instances withy Header instances or strings.
Also, update a bunch of docstrings.
argument to the constructor -- defaulting to true -- which is
different than Anthony's approach of using global state.
parse(), parsestr(): Grow a `headersonly' argument which stops parsing
once the header block has been seen, i.e. it does /not/ parse or even
read the body of the message. This is used for parsing message/rfc822
type messages.
We need test cases for the non-strict parsing. Anthony will supply
these.
_parsebody(): We can get rid of the isdigest end-of-line kludges,
although we still need to know if we're parsing a multipart/digest so
we can set the default type accordingly.
text/plain but the RFCs state that inside a multipart/digest, the
default type is message/rfc822. To preserve idempotency, we need a
separate place to define the default type than the Content-Type:
header.
get_default_type(), set_default_type(): Accessor and mutator methods
for the default type.
recursive generation).
_dispatch(): If the message object doesn't have a Content-Type:
header, check its default type instead of assuming it's text/plain.
This makes for correct generation of message/rfc822 containers.
_handle_multipart(): We can get rid of the isdigest kludge. Just
print the message as normal and everything will work out correctly.
_handle_mulitpart_digest(): We don't need this anymore either.
ndiff function, so just alias it to assertEqual in that case.
Various: make sure all openfile()/read()'s are wrapped in
try/finally's so the file gets closed.
A bunch of new tests checking the corner cases for multipart/digest
and message/rfc822.
If multiple header fields with the same name occur, they are combined
according to the rules in RFC 2616 sec 4.2:
Appending each subsequent field-value to the first, each separated by
a comma. The order in which header fields with the same field-name are
received is significant to the interpretation of the combined field
value.
Section 19.6 of RFC 2616 (HTTP/1.1):
It is beyond the scope of a protocol specification to mandate
compliance with previous versions. HTTP/1.1 was deliberately
designed, however, to make supporting previous versions easy....
And we would expect HTTP/1.1 clients to:
- recognize the format of the Status-Line for HTTP/1.0 and 1.1
responses;
- understand any valid response in the format of HTTP/0.9, 1.0, or
1.1.
The changes to the code do handle response in the format of HTTP/0.9.
Some users may consider this a bug because all responses with a
sufficiently corrupted status line will look like an HTTP/0.9
response. These users can pass strict=1 to the HTTP constructors to
get a BadStatusLine exception instead.
While this is a new feature of sorts, it enhances the robustness of
the code (be tolerant in what you accept). Thus, I consider it a bug
fix candidate.
XXX strict needs to be documented.
[1.3] Added documentation of the namespace URI for elements with no namespace.
[1.4] New property http://www.python.org/sax/properties/encoding.
[1.5] Support optional string interning in pyexpat.
[1.15]
Added understanding of the feature_validation, feature_external_pes,
and feature_string_interning features.
Added support for the feature_external_ges feature.
Added support for the property_xml_string property.
[1.16]
Made it recognize the namespace prefixes feature.
[1.17]
removed erroneous first line
[1.19]
Support optional string interning in pyexpat.
[1.21]
Restore compatibility with versions of Python that did not support weak
references. These do not get the cyclic reference fix, but they will
continue to work as they did before.
[1.22]
Activate entity processing unless standalone.
Specifically,
decode_rfc2231(), encode_rfc2231(): Functions to encode and decode RFC
2231 style parameters.
decode_params(): Function to decode a list of parameters.
Specifically,
_formatparam(): Teach this about encoded `param' arguments, which are
a 3-tuple of items (charset, language, value). language is ignored.
_unquotevalue(): Handle both 3-tuple RFC 2231 values and unencoded
values.
_get_params_preserve(): Decode the parameters before returning them.
get_params(), get_param(): Use _unquotevalue().
get_filename(), get_boundary(): Teach these about encoded (3-tuple)
parameters.
folding. Note that some of the Japanese tests have changed, but I
don't really know if they are correct or not. :(
Someone with Japanese and RFC 2047 expertise, please take a look!
headers with no charset or 'us-ascii' charsets. Actually this is only
partially true: we know about semicolons (but not true parameters) and
we know about whitespace (but not technically folding whitespace).
Still it should be good enough for all practical purposes.
Other changes include:
__init__(): Add a continuation_ws argument, which defaults to a single
space. Set this to change the whitespace used for continuation lines
when a header must be split. Also, changed the way header line
lengths are calculated, so that they take into account continuation_ws
(when tabs-expanded) and any provided header_name parameter. This
should do much better on returning split headers for which the first
and subsequent lines must fit into a specified width.
guess_maxlinelen(): Removed. I don't think we need this method as
part of the public API.
encode_chunks() -> _encode_chunks(): I don't think we need this one as
part of the public API either.
know anything about RFC 2047 encoded headers. Fortunately we have a
perfectly good header splitter in Header.encode(). So we just call
that to give us a properly formatted and split header.
Header.encode() didn't know about "highest-level syntactic breaks" but
that's been fixed now too.
Didn't use the patch, because universal newlines support made it easy.
It might be worth fixing the actual problem in the 2.2 maintenance
branch, in which case the patch is still needed.
Setting the buffer_text attribute to true causes the parser to collect
character data, waiting as long as possible to report it to the Python
callback. This can save an enormous number of callbacks from C to
Python, which can be a substantial performance improvement.
buffer_text defaults to false.
The HTTPResponse class now handles 100 continue responses, instead of
choking on them. It detects them internally in the _begin() method
and ignores them. Based on a patch by Bob Kline.
This closes SF bugs 498149 and 551273.
The FakeSocket class (for SSL) is now usable with HTTP/1.1
connections. The old version of the code could not work with
persistent connections, because the makefile() implementation read
until EOF before returning. If the connection is persistent, the
server sends a response and leaves the connection open. A client that
reads until EOF will block until the server gives up on the connection
-- more than a minute in my test case.
The problem was fixed by implementing a reasonable makefile(). It
reads data only when it is needed by the layers above it. It's
implementation uses an internal buffer with a default size of 8192.
Also, rename begin() method of HTTPResponse to _begin() because it
should only be called by the HTTPConnection.
the Idle debugger.
M PyShell.py : Call RemoteDebugger.close_remote_debugger()
M RemoteDebugger.py: Add close_remote_debugger(); further polish code used
to start the debugger sections.
M rpc.py : Add comments on Idlefork methods register(), unregister()
comment out unused methods
M run.py : Add stop_the_debugger(); polish code
M Debugger.py : Added clear_file_breaks()
M EditorWindow.py : Clear breaks when closed, commments->docstrings,
comment out some debugging print statements
M PyShell.py : comments->docstrings ; clarify extending EditorWindow
methods.
M RemoteDebugger.py: Add clear_all_file_breaks() functionality,
clarify some comments.
The default implementation calls _compile() to compile individual
files. This method must be implemented by the subclass. This change
factors out most of the remaining common code in all the compilers
except mwerks.
In a fresh interpreter, type.mro(tuple) would segfault, because
PyType_Ready() isn't called for tuple yet. To fix, call
PyType_Ready(type) if type->tp_dict is NULL.
Use a repr() on the subprocess side when fetching dict values for stack.
The various dict entities are not needed by the debugger GUI, only
their representation.
These built-in functions are replaced by their (now callable) type:
slice()
buffer()
and these types can also be called (but have no built-in named
function named after them)
classobj (type name used to be "class")
code
function
instance
instancemethod (type name used to be "instance method")
The module "new" has been replaced with a small backward compatibility
placeholder in Python.
A large portion of the patch simply removes the new module from
various platform-specific build recipes. The following binary Mac
project files still have references to it:
Mac/Build/PythonCore.mcp
Mac/Build/PythonStandSmall.mcp
Mac/Build/PythonStandalone.mcp
[I've tweaked the code layout and the doc strings here and there, and
added a comment to types.py about StringTypes vs. basestring. --Guido]