Ticket #2118 (closed defect: fixed)
UnicodeDecodeError when using unicode in URLs
| Reported by: | streawkceur | Owned by: | faide |
|---|---|---|---|
| Priority: | normal | Milestone: | 1.1 |
| Component: | TurboGears | Version: | 1.0.7 |
| Severity: | normal | Keywords: | |
| Cc: |
Description
When requesting a URL with Unicode characters in it (e.g. the umlaut "ü" encoded as "%C3%BC"), the logging in turbogears.controllers.RootController._cp_log_access fails with a UnicodeDecodeError.
It works when changing
'r': request.requestLine, to 'r': request.requestLine.decode('utf8'),
'f': request.headers.get('referer', ''), to 'f': request.headers.get('referer', '').decode('utf8'),
and self.accesslog.info(s) to self.accesslog.info(s.encode('utf8'))
Might be a bit more difficult if your app uses a different encoding.
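The change described above could be sketched roughly as follows, with the encoding made configurable for apps that do not use UTF-8. This is only an illustration: `to_text` and `format_access_line` are hypothetical helpers and not part of TurboGears or CherryPy, and the log template is reduced to the request line and the referer.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings with the given encoding."""
    if isinstance(value, bytes):
        # "replace" substitutes undecodable bytes instead of raising,
        # so a malformed URL cannot break access logging.
        return value.decode(encoding, "replace")
    return value


def format_access_line(request_line, referer, encoding="utf-8"):
    """Build a simplified access-log line and return it as encoded bytes."""
    # Decode every byte string first so the template only mixes text values.
    line = u'"%(r)s" "%(f)s"' % {
        "r": to_text(request_line, encoding),
        "f": to_text(referer, encoding),
    }
    # Encode once, at the point where the line leaves the application.
    return line.encode(encoding)


utf8_line = format_access_line(b"GET /\xc3\xbc HTTP/1.1", b"")
latin1_line = format_access_line(b"GET /\xfc HTTP/1.1", b"", encoding="latin-1")
```

Decoding at the input side and encoding once at the output side keeps byte strings and text from being mixed inside the format operation, which is where the implicit ASCII conversion would otherwise kick in.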
Cheers
Change History
comment:2 follow-up: ↓ 3 Changed 3 years ago by streawkceur
I just quickstarted a new project and added a default method to the Root controller:
@expose()
def default(self, *args, **kwargs):
    return 'default'
When I open this URL http://localhost:8080/%C3%BC (UTF-8 "ü") I get a 500:
Unrecoverable error in the server.
Page handler: 'ordinal not in range(128)'
Traceback (most recent call last):
  File "/Library/Python/2.5/site-packages/CherryPy-2.3.0-py2.5.egg/cherrypy/_cpwsgi.py", line 125, in wsgiApp
    environ['wsgi.input'])
  File "/Library/Python/2.5/site-packages/CherryPy-2.3.0-py2.5.egg/cherrypy/_cphttptools.py", line 88, in run
    _cputil.get_special_attribute("_cp_log_access", "_cpLogAccess")()
  File "/Library/Python/2.5/site-packages/TurboGears-1.0.7-py2.5.egg/turbogears/controllers.py", line 507, in _cp_log_access
    self.accesslog.info(s.encode('utf8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 11: ordinal not in range(128)
OS X 10.5.5, Firefox 3.0.5, Python 2.5.1 (r251:54863, Jan 17 2008, 19:35:17), TG 1.0.7.
CherryPy passes only byte strings for the referer and the request line. This seems to cause problems when logging them: I believe Python cannot print characters outside the 7-bit ASCII range unless they are explicitly decoded/encoded.
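For reference, the error in the traceback comes from an ASCII decode step: in Python 2, calling .encode() on a byte string, or mixing a byte string with a unicode string, first decodes the bytes implicitly with the ASCII codec. The sketch below makes that step explicit so it can be reproduced in isolation. The request line is a made-up sample, and `ascii_decode_error` is a hypothetical helper.

```python
# Sample request line containing the UTF-8 bytes for "ü" (0xc3 0xbc).
raw = b"GET /\xc3\xbc HTTP/1.1"


def ascii_decode_error(data):
    """Return the UnicodeDecodeError raised by an ASCII decode, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


# The ASCII codec rejects the first byte above 0x7f.
error = ascii_decode_error(raw)

# Naming the real encoding explicitly succeeds.
text = raw.decode("utf-8")
```

Here `error.start` is 5, the offset of the 0xc3 byte in the sample line, and `error.reason` is the same "ordinal not in range(128)" text shown by the page handler.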
comment:3 in reply to: ↑ 2 Changed 3 years ago by chrisz
  File "/Library/Python/2.5/site-packages/TurboGears-1.0.7-py2.5.egg/turbogears/controllers.py", line 507, in _cp_log_access
    self.accesslog.info(s.encode('utf8'))
Seems you're working with a modified version of controllers.py already. The original version has
self.accesslog.info(s)
here in line 507. This should not fail since it simply passes the byte string as it is to the log. Can you try again with the unmodified controllers.py?
I cannot reproduce this. Can you give some more details (platform, browser, Py and TG version, traceback)?