
Ticket #1130 (closed defect: fixed)

Opened 11 years ago

Last modified 7 years ago

logging crashes when URL contains unicode symbols and user is logged in (mod_python system)

Reported by: dado1945 Owned by: anonymous
Priority: normal Milestone: 1.0.x bugfix
Component: TurboGears Version: 1.0.8
Severity: minor Keywords:
Cc:

Description (last modified by jorge.vargas)

It is quite hard to reproduce this bug because it requires TurboGears running behind Apache with mod_python. I think a configuration similar to this one was used: http://trac.turbogears.org/turbogears/wiki/ModPythonIntegration09

The other requirements are unicode characters in the URL and a logged-in user (I am using the standard TG identity mechanism).

This produces the following traceback (and the system does not work as intended):

2006-09-28 12:30:53,818 cherrypy.msg INFO : Page handler: 'ordinal not in range(128)'
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/CherryPy-2.2.1-py2.4.egg/cherrypy/_cpwsgi.py", line 75, in wsgiApp
    environ['wsgi.input'])
  File "/usr/lib/python2.4/site-packages/CherryPy-2.2.1-py2.4.egg/cherrypy/_cphttptools.py", line 78, in run
    _cputil.get_special_attribute("_cp_log_access", "_cpLogAccess")()
  File "/usr/lib/python2.4/site-packages/TurboGears-1.0b1-py2.4.egg/turbogears/controllers.py", line 450, in _cp_log_access
    '') or "-",
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 28: ordinal not in range(128)

The reason can be illustrated with the following example:

>>> a
'dalius'
>>> b
'\xc5\xbev\xc4\x97ris'
>>> '%s %s' % (a, b)
'dalius \xc5\xbev\xc4\x97ris'
>>> a = u'dalius'
>>> '%s %s' % (a, b)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)
>>>

If we check /usr/lib/python2.4/site-packages/TurboGears-1.0b1-py2.4.egg/turbogears/controllers.py line 450, we find that username is a unicode object while the URL is a UTF-8 encoded byte string. I think the following check would help:

if isinstance(username, unicode):
    username = username.encode('utf-8')
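For readers on Python 3, here is a minimal sketch of the same idea, assuming the access-log line is assembled as bytes: every field is coerced to UTF-8 bytes before formatting, so a unicode username and a UTF-8 URL can no longer collide. The helpers to_utf8 and format_access_line and the sample values are hypothetical, not TurboGears code.

```python
def to_utf8(value):
    """Return value as UTF-8 bytes; text is encoded, bytes pass through unchanged."""
    # Hypothetical helper mirroring the proposed
    # "if isinstance(username, unicode): username = username.encode('utf-8')" check.
    if isinstance(value, str):
        return value.encode("utf-8")
    return value


def format_access_line(username, request_line):
    """Build an access-log line from a text username and a raw (bytes) request line."""
    # bytes %-formatting needs Python 3.5+ (PEP 461); both operands are bytes here.
    # An empty or missing username is logged as "-", as in the original code.
    return b"%s %s" % (to_utf8(username) or b"-", to_utf8(request_line))


line = format_access_line("dalius", b"GET /\xc5\xbev\xc4\x97ris HTTP/1.1")
print(line)  # b'dalius GET /\xc5\xbev\xc4\x97ris HTTP/1.1'
```

The log line stays a byte string, so this only works out if the request line really is UTF-8; that is the same assumption the original check makes.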

Change History

comment:1 Changed 11 years ago by jorge.vargas

  • Severity changed from critical to normal
  • Description modified
  • Milestone set to 1.0b3

-fixed some formatting... embrace wiki!

-I do not understand why it requires mod_python if you can reproduce it on the console.

Are you referring to http://trac.turbogears.org/turbogears/browser/tags/1.0b1/turbogears/controllers.py#L450? There is nothing about the user there, only a few lines up.

On the other hand, I believe URLs must be ASCII, at least according to the specs.

Last but not least, I believe your problem is with the usernames not being unicode, not with the URLs (unless you're passing the users in the URL, as in per-user pages such as http://www.foo.com/elpargo, where elpargo is the user).

The value you're pulling comes from "username = cherrypy.request.user_name", in which case the fix should be in the code that puts that value into the request, I think.

comment:2 Changed 11 years ago by dado1945

  • Thanks for fixing the wiki ;)
  • I can't reproduce this on the console.

Yes, I'm referring to the very same line: http://trac.turbogears.org/turbogears/browser/tags/1.0b1/turbogears/controllers.py#L450

The URL is ASCII, but at line 450 it arrives as a UTF-8 encoded string. Example of a UTF-8 URL: http://dado1945.storas.lt/wiki/lt/dado1945/Japoni%C5%A1kos%20%C4%AFdomyb%C4%97s
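To illustrate how a URL can be "ASCII but UTF-8": the percent-escapes in the example URL are UTF-8 bytes, so once the server unquotes the path it holds a UTF-8 byte string rather than text. A small Python 3 sketch using only the standard urllib.parse module (this is not TurboGears or CherryPy code):

```python
from urllib.parse import quote, unquote_to_bytes

# The path segment from the example URL, as sent on the wire (pure ASCII).
escaped = "Japoni%C5%A1kos%20%C4%AFdomyb%C4%97s"

# Unquoting yields raw bytes: a UTF-8 byte string, not text.
raw = unquote_to_bytes(escaped)
print(raw)  # b'Japoni\xc5\xa1kos \xc4\xafdomyb\xc4\x97s'

# Only an explicit decode turns those bytes into text.
text = raw.decode("utf-8")

# Going the other way, quote() percent-encodes non-ASCII text as UTF-8 bytes.
assert quote(text) == escaped
```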

The problem is in line 439: when you log in, username becomes a unicode string: http://trac.turbogears.org/turbogears/browser/tags/1.0b1/turbogears/controllers.py#L439

And TurboGears fails to generate the log entry (as in my example with \xc5\xbev\xc4\x97ris).

comment:3 Changed 11 years ago by alberto

  • Milestone changed from 1.0b3 to 1.1

comment:4 Changed 11 years ago by jorge.vargas

  • Severity changed from normal to minor

Should this be fixed with a call to unicode()?

comment:5 Changed 11 years ago by dado1945

Please either add the fix for the unicode check or, if you think that one "if" clause is performance overkill, reassign this problem to the CherryPy team, asking that cherrypy.request.user_name always return a non-unicode string :)

comment:6 Changed 11 years ago by alberto

  • Milestone changed from 1.1 to __unclassified__

Batch moved into unclassified from 1.1 to properly track progress on the latter

comment:8 Changed 10 years ago by dado1945

This problem is reproducible in the latest release (1.0.2.2)

P.S. Spammers seem to be attacking your site.

comment:9 Changed 9 years ago by kikidonk

  • Version changed from 1.0b1 to 1.0.8

comment:10 Changed 9 years ago by kikidonk

(sorry about formatting, resubmitting) I stumbled across a similar error running 1.0.8.

I suspect the whole TurboGears identity framework implicitly assumes that identity.user_name will always be a byte string, because by default the table uses a String column type. In my case I needed to support unicode user_names, using a UnicodeText column type.

If one does that, several lines of code break and generate 500 errors, which I consider a pretty critical bug.

Now I'm a bit confused whether this is a bug or a known limitation of the user_name column's type in the identity system.

If it is a limitation, I suppose I have to manually encode the username to some encoding just before inserting it into a String column in the DB?

comment:11 Changed 9 years ago by chrisz

TG (even the very old version 1.0b1 you're pointing to) always stores the user name in a unicode column in the database (in both the SQLObject and SQLAlchemy variants), not as a byte string. I have no problems using non-ASCII user names in TG 1.0.8, and they appear as UTF-8 in my log file.

This ticket resembles #2118, which I cannot reproduce either. If you really think there is a hidden problem, give us some more info about your platform, versions and database (also whether you use SO or SA), plus step-by-step instructions on how to reproduce the issue.

comment:12 Changed 9 years ago by dado1945

kikidonk, you have a different problem than the one I had. In my case the unicode is in the URL, not in the username. I solved my problem in a rather drastic way: I moved to a different web framework (Pylons)...

comment:13 Changed 9 years ago by kikidonk

OK, I understand the problem now!

You can reproduce easily using a quickstarted project:

$ tg-admin quickstart -i -s
$ tg-admin sql create

Modify in controllers.py:15
def index(self): -> def default(self, *args, **kwargs):

$ tg-admin shell
> u = User()
> u.user_name = 'ß'.decode('utf-8')
> u.password = 'foo'
> session.flush()

Now, go to  http://localhost:8080/éé?id=éé

You will see the error related to http://trac.turbogears.org/turbogears/browser/tags/1.0b1/turbogears/controllers.py#L439

What happens is that it attempts to generate the log string, but the arguments are a mix of unicode (the user name) and a byte-encoded string (cherrypy.request.requestLine).

The error by itself is simple enough to generate in a python shell:

print "%s %s" % ('ß'.decode('utf-8'), 'ß')

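If the arguments mix unicode and a byte string containing non-ASCII bytes, Python 2 promotes the result to unicode and tries to decode the byte string with the default ASCII codec, which cannot handle the UTF-8 bytes of ß, so it raises a UnicodeDecodeError.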
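Python 3 no longer performs this implicit coercion, but the failure can be reproduced explicitly. The sketch below is a hypothetical emulation (py2_style_format is not a real API) of what Python 2's % operator does when a unicode argument meets a byte string: it decodes the bytes with the default ASCII codec.

```python
def py2_style_format(fmt, *args):
    """Emulate Python 2 %-formatting when at least one argument is unicode text."""
    # Only the mixed unicode/bytes case is emulated here.
    if any(isinstance(a, str) for a in args):
        # Python 2 promoted the result to unicode and decoded every byte-string
        # argument with the default codec, which is ASCII.
        args = tuple(a.decode("ascii") if isinstance(a, bytes) else a for a in args)
    return fmt % args


username = "\u00df"                      # unicode user name u'ß'
request_line = b"GET /\xc3\xa9\xc3\xa9"  # UTF-8 bytes for '/éé'

print(py2_style_format("%s %s", "dalius", b"GET /"))  # ASCII bytes decode fine

try:
    py2_style_format("%s %s", username, request_line)
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
```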

To wrap up:

  • The bug appears only when the username contains non-ASCII characters and, at the same time, the HTTP request contains non-ASCII characters
  • The bug makes logging crash, returning a 500 Internal Server Error
  • The fix is to either
    • encode the username to UTF-8 in the log string (and hope the log file will have a consistent encoding, that is, assume cherrypy.request.requestLine is UTF-8), or
    • decode cherrypy.request.requestLine to a unicode string (but the encoding is not known, unless we assume UTF-8 and fall back to Latin-1...)
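The second option could look like the Python 3 sketch below. Like the comment, it assumes the request line is most likely UTF-8 and falls back to Latin-1, which can decode any byte sequence and therefore never raises. decode_request_line is a hypothetical helper, not part of CherryPy or TurboGears.

```python
def decode_request_line(raw):
    """Decode a raw request line to text: try UTF-8 first, then fall back to Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xFF to a code point, so this cannot fail,
        # although it produces mojibake if the bytes were in some other encoding.
        return raw.decode("latin-1")


username = "\u00df"  # unicode user name u'ß'
for raw in (b"GET /\xc3\xa9\xc3\xa9 HTTP/1.1", b"GET /\xe9\xe9 HTTP/1.1"):
    # Both operands are now text, so the log line is built without implicit coercion.
    line = "%s %s" % (username, decode_request_line(raw))
    print(ascii(line))  # prints '\xdf GET /\xe9\xe9 HTTP/1.1' both times
```

The fallback guarantees the log line can always be built; the cost is that a non-UTF-8, non-Latin-1 request line is logged as garbled text instead of crashing the request.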

This is a duplicate of bug #2118, as you mentioned.

comment:14 Changed 9 years ago by kikidonk

And while I'm at it: now that I know a unicode user_name is supported, there is also a bug here: http://trac.turbogears.org/browser/tags/1.0b1/turbogears/identity/visitor.py#L116

Reproduce easily using the steps I described above, but instead of opening the url in your browser:

curl -u 'ß:foo' 'http://localhost:8080/éé?id=éé'

This makes the code go through 'identity_from_http_auth', which reads the Authorization header (CherryPy returns it as a byte string) and then base64-decodes it.

In the end, validate_identity is called (line 118) with a byte-encoded username.

This means that the same byte string is passed as-is to SQLAlchemy, which cannot properly compare the username in the DB (stored as a unicode string) with the given byte-string username, and it generates a warning:

/usr/lib/python2.5/site-packages/SQLAlchemy-0.4.8-py2.5.egg/sqlalchemy/engine/default.py:241: SAWarning: Unicode type received non-unicode bind param value '\xc3\x9f'

Hopefully the db engine will have some good fallbacks allowing the query to run correctly, but this is not something we can rely on.

Possible fix: decode the username as UTF-8 just before calling validate_identity (falling back to Latin-1, as I don't know whether the encoding of HTTP headers is specified anywhere).

comment:15 Changed 9 years ago by chrisz

Thanks, kikidonk, this is much clearer now.

I can reproduce the issue in comment:13 (you forgot to mention that you also need to log in as user "ß"; also, it happens only with certain browsers such as MSIE, while Firefox does not seem to cause the problem because it always URL-encodes the URL).

Btw, instead of 'ß'.decode('utf-8') you can also write u'ß' if your code is written in utf-8. And you should point to the current code at http://trac.turbogears.org/browser/branches/1.0 instead of the very old http://trac.turbogears.org/browser/tags/1.0b1/.

I can also confirm the issue in comment:14. TurboGears should really convert the base64-decoded user:pass string to unicode in decode_basic_credentials before splitting it on ':' and returning it. Unfortunately, RFC 2617 does not specify any charset here, and some googling shows there is disagreement about whether Latin-1 or UTF-8 should be used. Probably we should try to decode as UTF-8 first and, if that fails, decode as Latin-1.
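A Python 3 sketch of the approach described in this comment: base64-decode the credentials, convert them to text (UTF-8 first, Latin-1 as fallback), and only then split on the first ':'. It is modelled on the description here, not copied from the actual r6032 change; the function name only mirrors the one mentioned above, and the header parsing details are assumptions.

```python
import base64
import binascii


def decode_basic_credentials(authorization):
    """Return (user_name, password) as text from a 'Basic ...' Authorization header."""
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None, None
    try:
        raw = base64.b64decode(encoded.strip(), validate=True)
    except (binascii.Error, ValueError):
        return None, None
    # RFC 2617 specifies no charset: try UTF-8, fall back to Latin-1 (never fails).
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")
    # Split only on the first ':' so passwords may contain colons.
    user_name, _, password = text.partition(":")
    return user_name, password


header = "Basic " + base64.b64encode("\u00df:foo".encode("utf-8")).decode("ascii")
print(decode_basic_credentials(header) == ("\u00df", "foo"))  # True
```

Because the result is already unicode, the value handed to validate_identity matches the type of the unicode user_name column.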

These kinds of problems are why I really appreciate the changes made in Python 3.0...
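For comparison, a short Python 3 illustration of the change referred to here: text (str) and bytes are separate types, and Python 3 refuses to concatenate them instead of silently decoding with ASCII. Note that %-formatting a bytes object into a str does not raise; it inserts the bytes' repr, which is another reason to decode explicitly.

```python
username = "\u00df"             # text (the user name u'ß' in Python 2 terms)
request_line = b"GET /\xc3\xa9"  # raw UTF-8 bytes

try:
    username + " " + request_line
except TypeError as exc:
    # Python 3 refuses to mix str and bytes instead of guessing an encoding.
    print("refused:", exc)

# %-formatting does not raise, but it embeds the bytes' repr rather than the text.
print(ascii("%s %s" % (username, request_line)))

# The explicit, unambiguous version: decode first, then combine text with text.
print(ascii(username + " " + request_line.decode("utf-8")))
```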

comment:16 Changed 9 years ago by chrisz

Fixed both problems in r6032. Please let me know if you agree with the fix, then I will port it over to the other TG 1.x branches as well.

comment:17 Changed 9 years ago by kikidonk

The fixes look good for both cases. You might want to correct your use of 'ascci' to 'ascii' though :)

comment:18 Changed 9 years ago by chrisz

  • Status changed from new to closed
  • Resolution set to fixed

Ok, fixed the typo :) and ported to the other branches in r6033.

@dado1945: I really think it was the same bug, which appeared only if you used a non-ASCII URL, a non-ASCII user name and an MSIE browser at the same time. You may consider using TG2, since it is based on Pylons. However, TG1 has improved a lot since you opened the bug and is more mature now; maybe you want to give it another try.

Closing this and #2118 now. Please reopen if you think there are still any remaining issues.

comment:19 Changed 9 years ago by dado1945

I never use MSIE, so I don't know if it is the same bug. That doesn't matter anymore ;-) I believe it is more mature now, etc. But I just don't see a way back to TG (even TG2) anymore, though I don't rule out that I might write some TG code for a beer ;-)

I must agree that TG was a very good introduction to web programming.

comment:20 Changed 9 years ago by kikidonk

Just for the record, I was able to reproduce this error using Firefox 2 and 3, so it's not only MSIE. I won't dig for the cause, since the bug is now fixed :)

comment:21 Changed 7 years ago by chrisz

  • Milestone changed from __unclassified__ to 1.0.x bugfix