Created on 2011-04-21 13:42 by bero, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files
python-2.7.1-fix-httplib-UnicodeDecodeError.patch (uploaded by bero, 2011-04-21 13:42): Proposed fix
Sending e.g. a JPEG file with an httplib POST request (e.g. through mechanize) can result in an error like this:
  File "/usr/lib64/python2.7/httplib.py", line 947, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.7/httplib.py", line 988, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python2.7/httplib.py", line 941, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 802, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 2566: invalid start byte
The code triggering this is the attempt to merge msg and message_body into a single request in httplib.py, lines 791+.
The patch I'm attaching treats a string of unknown encoding that cannot be decoded (e.g. binary data wrapped as a string) like something that isn't a string.
Works for me with the patch.
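The failure in the traceback can be reproduced without httplib. In Python 2, concatenating a unicode object with a byte string implicitly decodes the byte string with the ASCII codec, so any byte at or above 0x80 raises UnicodeDecodeError. The sketch below spells that coercion out explicitly so it also runs on Python 3. `py2_style_concat` and the sample request line are hypothetical names for illustration, not part of httplib.

```python
TEXT_TYPE = type(u"")   # unicode on Python 2, str on Python 3
BYTES_TYPE = type(b"")  # str on Python 2, bytes on Python 3


def py2_style_concat(msg, message_body):
    """Emulate the coercion Python 2 applies to `msg += message_body`."""
    if isinstance(msg, TEXT_TYPE) and isinstance(message_body, BYTES_TYPE):
        # Python 2 performs this decode implicitly; here it is explicit.
        message_body = message_body.decode("ascii")
    return msg + message_body


headers_as_text = u"POST /upload HTTP/1.1\r\n\r\n"
headers_as_bytes = b"POST /upload HTTP/1.1\r\n\r\n"
jpeg_prefix = b"\xff\xd8\xff\xe0"  # first bytes of a typical JPEG file

# Bytes + bytes: nothing is decoded, so binary data passes through.
request = py2_style_concat(headers_as_bytes, jpeg_prefix)

# Text + bytes: the ASCII decode fails on byte 0xff.
try:
    py2_style_concat(headers_as_text, jpeg_prefix)
    failed = False
except UnicodeDecodeError:
    failed = True
```

If this model is right, the error only appears when something upstream has turned msg into a unicode object; the binary body alone is not enough.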
Did you run the httplib test with your patch? Interactively
>>> from test.test_httplib import test_main as f; f()
(verbose mode, over 40 tests)
In 3.x, the patch would be to http/client.py, line 802 in 3.2 release
if isinstance(message_body, str) # becomes
if isinstance(message_body, bytes)
Will this be an issue in 3.x?
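For comparison, here is a simplified sketch of the merge step as it could look with a bytes check. It is modeled on the pattern discussed above, not copied from http/client.py, and `build_first_send` is a hypothetical name. Because the headers are already bytes, appending a bytes body cannot trigger any implicit decoding.

```python
def build_first_send(header_lines, message_body=None):
    """Return (first_chunk, remaining_body) for a request.

    header_lines is a list of already-encoded header lines (bytes).
    """
    # Terminate the header block with an empty line.
    msg = b"\r\n".join(list(header_lines) + [b"", b""])
    if isinstance(message_body, bytes):
        # Send headers and body together in one chunk.
        msg += message_body
        message_body = None
    # Anything that is not bytes (e.g. a file object) is sent separately.
    return msg, message_body


chunk, rest = build_first_send(
    [b"POST /upload HTTP/1.1", b"Host: localhost"],
    b"\xff\xd8\xff\xe0",
)
```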
Not sure how to get it into verbose mode (I presume you don't mean "python -v"), but normal mode (22 tests) works fine:
Python 2.7.1 (r271:86832, Apr 22 2011, 13:40:40)
[GCC 4.6.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from test.test_httplib import test_main as f
test_auto_headers (test.test_httplib.HeaderTests) ... ok
test_ipv6host_header (test.test_httplib.HeaderTests) ... ok
test_putheader (test.test_httplib.HeaderTests) ... ok
test_responses (test.test_httplib.OfflineTest) ... ok
test_bad_status_repr (test.test_httplib.BasicTest) ... ok
test_chunked (test.test_httplib.BasicTest) ... ok
test_chunked_head (test.test_httplib.BasicTest) ... ok
test_epipe (test.test_httplib.BasicTest) ... ok
test_filenoattr (test.test_httplib.BasicTest) ... ok
test_host_port (test.test_httplib.BasicTest) ... ok
test_incomplete_read (test.test_httplib.BasicTest) ... ok
test_negative_content_length (test.test_httplib.BasicTest) ... ok
test_partial_reads (test.test_httplib.BasicTest) ... ok
test_read_head (test.test_httplib.BasicTest) ... ok
test_response_headers (test.test_httplib.BasicTest) ... ok
test_send (test.test_httplib.BasicTest) ... ok
test_send_file (test.test_httplib.BasicTest) ... ok
test_status_lines (test.test_httplib.BasicTest) ... ok
testTimeoutAttribute (test.test_httplib.TimeoutTest)
This will prove that the timeout gets through ... ok
test_attributes (test.test_httplib.HTTPSTimeoutTest) ... ok
testHTTPConnectionSourceAddress (test.test_httplib.SourceAddressTest) ... ok
testHTTPSConnectionSourceAddress (test.test_httplib.SourceAddressTest) ... ok
----------------------------------------------------------------------
Ran 22 tests in 0.004s
Not sure if this is an issue with 3.x - I haven't used 3.x so far.
Hello Bernhard,
I tried a POST of a JPEG file through urllib2 (which internally uses httplib). It goes through the code that you pointed out, and I did not face any problem. I am able to POST binaries using httplib.
I am also surprised at the UnicodeDecodeError being raised. The POST data is a string (8-bit string) in Python 2.7, and that portion of the code will have no problem creating the content.
You will get a UnicodeDecodeError only if you explicitly pass a unicode object as data, never when you pass a string or binary string.
Perhaps mechanize is doing something wrong here and sending a unicode object.
So, this really does not look like a bug to me.
(Also a note on the patch: it tries to silence the error, which is the wrong thing to do.)
If you can provide a simple snippet to reproduce this error, feel free to reopen this. I am closing this as 'works for me'.
Thanks.
I have the same problem as the original submitter.
The reason it worked for you was probably that you did not pass a unicode string to urllib2.Request. The following code raises the exception (I enclose the data file for completeness, but it fails with basically any binary data).
It works fine with Python 2.6.6, but fails with Python 2.7.1.
import urllib2

# Read the payload as raw bytes.
with open("data", "rb") as f:
    mydata = f.read()

# This fails:
url = unicode('http://localhost/test')
# This works:
# url = str('http://localhost/test')
# This also works:
# url = unicode('http://localhost')

req = urllib2.Request(url, data=mydata)
urllib2.urlopen(req)
The bug was about sending binary "data" via httplib. In the example you
wrote, you are sending a unicode "url" and experiencing a failure for
certain examples.
In 2.7, URLs should be of type str. We don't have a function to
deal with unicode URLs separately, and sending a unicode URL is an
error.
Hello,
I would like to subscribe to the issue. The problem seems to indeed exist in Python 2.7.
What I'm doing is proxying HTTP requests (using Django). The PUT / POST requests work fine on Python 2.6 but fail on 2.7 with the error already presented in bero's first message.
I'm using httplib2 and the code looks like
try:
    http = httplib2.Http(timeout=5)
    resp, content = http.request(
        request_url, method,
        body=body, headers=headers)
except (AttributeError, httplib.ResponseNotReady), e:
    # ...
body is the result of Django's request.read(), which contains the binary data from the PUT / POST request.
The full stack trace is:
Traceback:
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
111. response = callback(request, *callback_args, **callback_kwargs)
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/auth.py" in _decorated_view
33. return view(request, *args, **kwargs)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
39. resp = view_func(*args, **kwargs)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
52. return view_func(*args, **kwargs)
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/views.py" in dispatch
55. original=request.build_absolute_uri())
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/handlers/its.py" in proxy
51. body=body, headers=headers)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in request
1129. (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in _request
901. (response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in _conn_request
862. conn.request(method, request_uri, body, headers)
File "/usr/local/lib/python2.7/httplib.py" in request
941. self._send_request(method, url, body, headers)
File "/usr/local/lib/python2.7/httplib.py" in _send_request
975. self.endheaders(body)
File "/usr/local/lib/python2.7/httplib.py" in endheaders
937. self._send_output(message_body)
File "/usr/local/lib/python2.7/httplib.py" in _send_output
795. msg += message_body
Hello again,
After some digging I found that the "real" problem was that the provided URL was a unicode string, so the concatenation was failing. Maybe this is not a big deal, but I think we should at least do a proper assertion on the provided URL or some other check, because the error encountered is confusing at best.
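A caller-side check along these lines would surface the problem earlier and with a clearer message. This is a minimal sketch, not something httplib does: `ensure_byte_url` is a hypothetical helper, and it assumes the URL should contain only ASCII characters.

```python
TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def ensure_byte_url(url):
    """Return url as a byte string, or raise a descriptive ValueError."""
    if isinstance(url, TEXT_TYPE):
        try:
            # A pure-ASCII text URL converts to bytes without loss.
            return url.encode("ascii")
        except UnicodeEncodeError:
            raise ValueError(
                "URL contains non-ASCII characters: %r" % (url,))
    # Already a byte string: pass it through unchanged.
    return url


checked = ensure_byte_url(u"http://localhost/test")
```

On Python 2 the result is a plain str, so a later concatenation with a binary body stays in the byte domain.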
Ion, as you perhaps noticed, posting a message 'subscribes' you (puts you on the nosy list). One can also add oneself as nosy with the little button under it without saying anything.
This should be reopened because we do not change error classes in bugfix releases (i.e., future 2.7.x releases) because that can break code -- unless the error class is contrary to the doc and we decide the doc is right. Even as a new feature, such a change is dubious and would need to be considered carefully.
There is another issue that makes the problem even more critical: OS X 10.7 includes Python 2.7.1 as the *default* interpreter.
So we'll need both a fix for the future and a workaround.
BTW, the hack with sys.setdefaultencoding cannot be used if you really send binary data.
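To illustrate why changing the default encoding does not help with binary payloads, the sketch below assumes the hack in question is `sys.setdefaultencoding("utf-8")` (the thread does not say which encoding was meant). Byte 0xff, which starts every JPEG file, is invalid in both ASCII and UTF-8, so the decode fails either way.

```python
jpeg_prefix = b"\xff\xd8\xff\xe0"  # first bytes of a typical JPEG file

errors = {}
for codec in ("ascii", "utf-8"):
    try:
        jpeg_prefix.decode(codec)
    except UnicodeDecodeError as exc:
        # Record why each codec rejected the data.
        errors[codec] = exc.reason
```

Both codecs end up in `errors`, which suggests that arbitrary binary data has to stay a byte string rather than be decoded at all.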
Sorin, this is an issue that claimed a bug, not a bug. The resolution is that the claim appears false, because the problem arose from using a unicode URL rather than a bytes URL. The error message may be confusing, but the error class cannot be changed. Senthil says that he *did* send non-ASCII bytes with no problem.
I have to add some details here. First, this bug has nothing to do with the URL; it reproduces with normal URLs.
Still, the problem with the line "msg += message_body" is quite complex when combined with Python 2.7:
type(msg) is unicode
type(message_body) is str ... even if I tried to manually force Python to use bytes. It seems that in 2.7 bytes is an alias for str. Due to this, the code will fail only on 2.7, because it will try to convert binary data to a unicode string.
If I am not mistaken, the code will work with Python 3.x, because there bytes is not str.
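The difference between the two major versions can be checked directly. This small sketch runs on both; the sample request line is made up for illustration.

```python
import sys

PY2 = sys.version_info[0] == 2
bytes_is_str = bytes is str  # True on Python 2.6/2.7, False on Python 3

text = u"POST /upload HTTP/1.1\r\n\r\n"
body = b"\xff\xd8\xff\xe0"

try:
    text + body
    outcome = "concatenated"
except UnicodeDecodeError:
    # Python 2: the implicit ASCII decode of body failed.
    outcome = "UnicodeDecodeError"
except TypeError:
    # Python 3: text and bytes are never mixed implicitly.
    outcome = "TypeError"
```

Python 3 also refuses the concatenation, but with a TypeError that points at the type mismatch instead of at the content of the data.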
Hi Sorin,
On Sat, Jun 25, 2011 at 07:54:24PM +0000, sorin wrote:
> type(message_body) is str ... even if I tried to manually force
> Python to use bytes. It seems that in 2.7 bytes is an alias for str.
> Due to this, the code will fail only on 2.7, because it will
> try to convert binary data to a unicode string.
Bit confused here. You encode the string to bytes and decode it back
to str. One does not force bytes to str. And if you use str or bytes
consistently in Python 2.7, you won't face the problem.
2022-04-11 14:57:16 | admin | set | github: 56107
2011-07-04 16:16:35 | eric.araujo | set | messages: - msg134878
2011-06-25 20:22:42 | orsenthil | set | messages: + msg139116
2011-06-25 19:54:23 | ssbarnea | set | messages: + msg139110
2011-06-24 18:22:47 | terry.reedy | set | messages: + msg138972
2011-06-24 15:40:23 | ssbarnea | set | messages: + msg138954
2011-06-24 15:27:10 | orsenthil | set | messages: + msg138952
2011-06-24 11:25:05 | ssbarnea | set | messages: + msg138914
2011-06-24 11:00:41 | ssbarnea | set | nosy: + ssbarnea; messages: + msg138908
2011-06-10 23:45:50 | terry.reedy | set | messages: + msg138142
2011-06-10 18:00:15 | terry.reedy | set | messages: + msg138128
2011-06-10 09:06:29 | cyrus | set | messages: + msg138059
2011-06-10 08:48:13 | cyrus | set | nosy: + cyrus; messages: + msg138056
2011-05-16 01:51:57 | orsenthil | set | messages: + msg136060
2011-05-15 18:29:59 | Jiri.Horky | set | files: + data; nosy: + Jiri.Horky; messages: + msg136043
2011-05-06 13:08:19 | orsenthil | set | status: open -> closed; messages: + msg135290; assignee: orsenthil; resolution: works for me; stage: test needed -> resolved
2011-04-30 16:30:08 | eric.araujo | set | nosy: + eric.araujo; messages: + msg134878
2011-04-30 06:57:04 | bero | set | messages: + msg134840
2011-04-30 00:11:31 | terry.reedy | set | nosy: + terry.reedy; messages: + msg134824; stage: test needed
2011-04-21 17:37:57 | santoso.wijaya | set | nosy: + santoso.wijaya
2011-04-21 13:44:13 | ezio.melotti | set | nosy: + orsenthil, ezio.melotti
2011-04-21 13:42:33 | bero | create |