This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Created on 2011-04-21 13:42 by bero, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name: python-2.7.1-fix-httplib-UnicodeDecodeError.patch
Uploaded: bero, 2011-04-21 13:42
Description: Proposed fix

Sending e.g. a JPEG file with an httplib POST request (e.g. through mechanize) can result in an error like this:
  File "/usr/lib64/python2.7/httplib.py", line 947, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.7/httplib.py", line 988, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python2.7/httplib.py", line 941, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 802, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 2566: invalid start byte
The code triggering this is the attempt to merge msg and message_body into a single request in httplib.py, lines 791 and following.
The patch I'm attaching treats an invalid string of unknown encoding (e.g. binary data wrapped as string) like something that isn't a string.
Works for me with the patch.
Did you run the httplib test with your patch? Interactively
>>> from test.test_httplib import test_main as f; f()
(verbose mode, over 40 tests)
In 3.x, the patch would be to http/client.py, line 802 in the 3.2 release:
if isinstance(message_body, str) # becomes
if isinstance(message_body, bytes)
Will this be an issue in 3.x?
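For readers unfamiliar with the merge step under discussion, here is a simplified sketch in Python 3. It is not the standard library source: the function name `plan_sends` and its list return value are hypothetical, and it only illustrates the idea of folding a bytes body into the header block so that both go out in a single send.

```python
def plan_sends(header_lines, message_body=None):
    """Return the payloads that would be sent, in order.

    Hypothetical helper: header_lines is a list of bytes objects;
    message_body is bytes, some other object (e.g. a file), or None.
    """
    # Terminate the header block with an empty line.
    msg = b"\r\n".join(header_lines + [b"", b""])
    if isinstance(message_body, bytes):
        # Same type as the headers, so merging is safe.
        msg += message_body
        message_body = None
    payloads = [msg]
    if message_body is not None:
        # Anything that is not bytes is sent separately, untouched.
        payloads.append(message_body)
    return payloads


headers = [b"POST /upload HTTP/1.1", b"Host: localhost"]
merged = plan_sends(headers, b"\xff\xd8\xff\xe0")
separate = plan_sends(headers, bytearray(b"\xff\xd8\xff\xe0"))
```

With a bytes body, `merged` holds a single payload; with any other body type, `separate` holds the header block and the body as two payloads.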
Not sure how to get it into verbose mode (I presume you don't mean "python -v"), but normal mode (22 tests) works fine:
Python 2.7.1 (r271:86832, Apr 22 2011, 13:40:40)
[GCC 4.6.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from test.test_httplib import test_main as f
test_auto_headers (test.test_httplib.HeaderTests) ... ok
test_ipv6host_header (test.test_httplib.HeaderTests) ... ok
test_putheader (test.test_httplib.HeaderTests) ... ok
test_responses (test.test_httplib.OfflineTest) ... ok
test_bad_status_repr (test.test_httplib.BasicTest) ... ok
test_chunked (test.test_httplib.BasicTest) ... ok
test_chunked_head (test.test_httplib.BasicTest) ... ok
test_epipe (test.test_httplib.BasicTest) ... ok
test_filenoattr (test.test_httplib.BasicTest) ... ok
test_host_port (test.test_httplib.BasicTest) ... ok
test_incomplete_read (test.test_httplib.BasicTest) ... ok
test_negative_content_length (test.test_httplib.BasicTest) ... ok
test_partial_reads (test.test_httplib.BasicTest) ... ok
test_read_head (test.test_httplib.BasicTest) ... ok
test_response_headers (test.test_httplib.BasicTest) ... ok
test_send (test.test_httplib.BasicTest) ... ok
test_send_file (test.test_httplib.BasicTest) ... ok
test_status_lines (test.test_httplib.BasicTest) ... ok
testTimeoutAttribute (test.test_httplib.TimeoutTest)
This will prove that the timeout gets through ... ok
test_attributes (test.test_httplib.HTTPSTimeoutTest) ... ok
testHTTPConnectionSourceAddress (test.test_httplib.SourceAddressTest) ... ok
testHTTPSConnectionSourceAddress (test.test_httplib.SourceAddressTest) ... ok
----------------------------------------------------------------------
Ran 22 tests in 0.004s
Not sure if this is an issue with 3.x - I haven't used 3.x so far.
Hello Bernhard, 
I tried a POST of a JPEG file through urllib2 (which internally uses httplib). It goes through the code that you pointed out, and I don't face any problem. I am able to POST binaries using httplib.
I am also surprised at the UnicodeDecodeError being raised. The POST data is a string (8-bit string) in Python 2.7, and that portion of the code will have no problem creating the content.
You will get a UnicodeDecodeError only if you explicitly pass a unicode object as data, and never when you pass a string or binary string.
Perhaps mechanize is doing something wrong here and sending a unicode object.
So, this really does not look like a bug to me.
(Also, a note on the patch: it tries to silence the error, which is the wrong thing to do.)
If you can provide a simple snippet to reproduce this error, feel free to reopen this. I am closing this as 'works for me'.
Thanks.
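The mechanism described in this message can be sketched as follows. Python 2 implicitly decodes a byte string as ASCII when it is concatenated with a unicode object; the sketch performs that decode explicitly so that it also runs on Python 3. The helper name `py2_style_concat` and the sample header and bodies are illustrative assumptions, not code from httplib.

```python
def py2_style_concat(text, raw):
    """Emulate Python 2's `unicode + str` by decoding the bytes as ASCII."""
    return text + raw.decode("ascii")


header_block = u"POST /upload HTTP/1.1\r\nHost: localhost\r\n\r\n"
ascii_body = b"name=value"
binary_body = b"\xff\xd8\xff\xe0"  # non-ASCII bytes, as found in binary uploads

# ASCII-only bytes decode cleanly, so the concatenation succeeds.
ok = py2_style_concat(header_block, ascii_body)

# Bytes >= 0x80 cannot be decoded as ASCII, so this raises.
try:
    py2_style_concat(header_block, binary_body)
    failure = None
except UnicodeDecodeError as exc:
    failure = exc
```

This is consistent with the observation that byte strings on both sides never trigger the error: no decode happens unless one operand is unicode.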
I have the same problem as the original submitter.
The reason it previously worked for you was probably that you didn't use the "right" unicode string in the urllib2.Request. The following code will raise the exception (I enclose the data file for completeness, but it fails with basically any binary data).
It works fine with Python 2.6.6, but fails with Python 2.7.1.
import urllib2
f = open("data", "r")
mydata = f.read()
f.close()
#this fails
url=unicode('http://localhost/test')
#this works
#url=str('http://localhost/test')
#this also works 
#url=unicode('http://localhost')
req = urllib2.Request(url, data=mydata)
urllib2.urlopen(req)
    
The bug was about sending binary "data" via httplib. In the example you
wrote, you are sending a unicode "url" and experiencing a failure for
certain examples.
In 2.7, URLs should be of str type; we don't have a function to
deal with unicode URLs separately, and sending a unicode URL is an
error.
Hello,
I would like to subscribe to the issue. The problem seems to indeed exist in Python 2.7. 
What I'm doing is proxying HTTP requests (using Django). The PUT / POST requests work fine on Python 2.6 but fail on 2.7 with the error already presented in bero's first message.
I'm using httplib2 and the code looks like:
http = httplib2.Http(timeout=5)
try:
    resp, content = http.request(
        request_url, method,
        body=body, headers=headers)
except (AttributeError, httplib.ResponseNotReady), e:
    # ...
Body is the result of Django's request.read(), which in fact contains the binary data from the PUT / POST request.
The full stack trace is:
Traceback:
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
  111.                         response = callback(request, *callback_args, **callback_kwargs)
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/auth.py" in _decorated_view
  33.         return view(request, *args, **kwargs)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
  39.         resp = view_func(*args, **kwargs)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
  52.         return view_func(*args, **kwargs)
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/views.py" in dispatch
  55.         original=request.build_absolute_uri())
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/handlers/its.py" in proxy
  51.                 body=body, headers=headers)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in request
  1129.                     (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in _request
  901.         (response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in _conn_request
  862.                 conn.request(method, request_uri, body, headers)
File "/usr/local/lib/python2.7/httplib.py" in request
  941.         self._send_request(method, url, body, headers)
File "/usr/local/lib/python2.7/httplib.py" in _send_request
  975.         self.endheaders(body)
File "/usr/local/lib/python2.7/httplib.py" in endheaders
  937.         self._send_output(message_body)
File "/usr/local/lib/python2.7/httplib.py" in _send_output
  795.             msg += message_body
    
Hello again,
After some digging I found that the "real" problem was that the provided URL was a unicode string and the concatenation was failing. Maybe this is not a big deal, but I think we should at least do a proper assertion on the provided URL or some other checks, because the error encountered is confusing, to say the least.
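A check along the lines suggested here might look like the sketch below. It is an assumption about what such a guard could do in calling code, not something httplib or urllib2 provides, and the name `to_ascii_bytes` is hypothetical. It converts a text URL to ASCII bytes up front and fails with a clear message if that is not possible.

```python
def to_ascii_bytes(url):
    """Return url as ASCII-encoded bytes.

    Hypothetical guard: byte strings pass through unchanged, text is
    encoded as ASCII, and non-ASCII text raises ValueError.
    """
    if isinstance(url, bytes):
        return url
    try:
        return url.encode("ascii")
    except UnicodeEncodeError:
        raise ValueError("URL contains non-ASCII characters: %r" % (url,))


safe_url = to_ascii_bytes(u"http://localhost/test")
```

On Python 2, where bytes is the same type as str, the result could be passed on as an ordinary str URL.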
Ion, as you perhaps noticed, posting a message 'subscribes' you (puts you on the nosy list). One can also add oneself as nosy with the little button under it without saying anything.
This should be reopened because we do not change error classes in bugfix releases (i.e., future 2.7.x releases) because that can break code -- unless the error class is contrary to the doc and we decide the doc is right. Even as a new feature, a change is dubious and would need careful consideration.
There is another problem that makes this even more critical: OS X 10.7 includes Python 2.7.1 as the *default* interpreter.
So we'll need both a fix for the future and a workaround.
BTW, the hack with sys.setdefaultencoding cannot be used if you really send binary data.
Sorin, this is an issue that claimed a bug, not a bug. The resolution is that the claim appears false because the problem arose from using a unicode rather than a bytes URL. The error message may be confusing, but the error class cannot be changed. Senthil says that he *did* send non-ASCII bytes with no problem.
I have to add some details here. First, this bug has nothing to do with the URL; it reproduces with normal URLs.
Still, the problem with the line "msg += message_body" is quite complex when combined with Python 2.7:
type(msg) is unicode
type(message_body) is str ... even if I tried to manually force Python to use bytes. It seems that in 2.7 bytes is an alias for str. Because of this, the code fails only on 2.7, where it tries to convert binary data to a unicode string.
If I am not mistaken, the code will work with Python 3.x, because there bytes() is not str().
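The Python 3 side of this comparison can be illustrated with a short sketch (Python 3 only; the header and body values are made up for the example). There, str and bytes are distinct types, and mixing them raises a TypeError immediately instead of attempting an implicit ASCII decode.

```python
header_text = "POST /upload HTTP/1.1\r\nHost: localhost\r\n\r\n"
body = b"\xff\xd8\xff\xe0"

# Mixing the two types is rejected outright; nothing is decoded implicitly.
try:
    header_text + body
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

# Encoding the header explicitly gives bytes on both sides.
request = header_text.encode("ascii") + body
```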
Hi Sorin,
On Sat, Jun 25, 2011 at 07:54:24PM +0000, sorin wrote:
> type(message_body) is str ... even if I tried to manually force
> Python to use bytes. It seems that in 2.7 bytes are alias to str.
> Due to this the code will fail to run only on 2.7 because it will
> try to convert binary data to unicode string.
Bit confused here. You encode a string to bytes and decode it back
to str. One does not force bytes to str. And if you use str or bytes
consistently in Python 2.7, you won't face the problem.
2022-04-11 14:57:16  admin  set  github: 56107
2011-07-04 16:16:35  eric.araujo  set  messages: - msg134878
2011-06-25 20:22:42  orsenthil  set  messages: + msg139116
2011-06-25 19:54:23  ssbarnea  set  messages: + msg139110
2011-06-24 18:22:47  terry.reedy  set  messages: + msg138972
2011-06-24 15:40:23  ssbarnea  set  messages: + msg138954
2011-06-24 15:27:10  orsenthil  set  messages: + msg138952
2011-06-24 11:25:05  ssbarnea  set  messages: + msg138914
2011-06-24 11:00:41  ssbarnea  set  nosy: + ssbarnea; messages: + msg138908
2011-06-10 23:45:50  terry.reedy  set  messages: + msg138142
2011-06-10 18:00:15  terry.reedy  set  messages: + msg138128
2011-06-10 09:06:29  cyrus  set  messages: + msg138059
2011-06-10 08:48:13  cyrus  set  nosy: + cyrus; messages: + msg138056
2011-05-16 01:51:57  orsenthil  set  messages: + msg136060
2011-05-15 18:29:59  Jiri.Horky  set  files: + data; nosy: + Jiri.Horky; messages: + msg136043
2011-05-06 13:08:19  orsenthil  set  status: open -> closed; messages: + msg135290; assignee: orsenthil; resolution: works for me; stage: test needed -> resolved
2011-04-30 16:30:08  eric.araujo  set  nosy: + eric.araujo; messages: + msg134878
2011-04-30 06:57:04  bero  set  messages: + msg134840
2011-04-30 00:11:31  terry.reedy  set  nosy: + terry.reedy; messages: + msg134824; stage: test needed
2011-04-21 17:37:57  santoso.wijaya  set  nosy: + santoso.wijaya
2011-04-21 13:44:13  ezio.melotti  set  nosy: + orsenthil, ezio.melotti
2011-04-21 13:42:33  bero  create