Ticket #381 (closed defect: fixed)

Opened 13 years ago

Last modified 12 years ago

[PATCH] Unicode characters not encoded properly for Safari

Reported by: Canis Lupus
Owned by: anonymous
Priority: normal
Milestone: 0.9
Component: TurboGears
Version:
Severity: normal
Keywords:
Cc:

Description

In controllers.py, the following line:

    unicodechars = re.compile(r"([^\x00-\x9F])")

appears to be wrong, and probably ought to be:

    unicodechars = re.compile(r"([^\x00-\x7F])")

unicodechars is only used in the block marked "fix the Safari XMLHttpRequest encoding problem", and characters from 0x80 through 0x9F are causing the ascii codec (applied via .encode("ascii") to the result of unicodechars.sub() later in the block) to choke. I don't know what the Safari problem was, so I don't know if my 'probable fix' is correct, but it Works For Me™.
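To make the gap concrete, here is a minimal sketch (written for Python 3, whereas the ticket dates from the Python 2 era) that lists which non-ASCII code points each pattern fails to match. The names `old_pattern`, `new_pattern` and `unmatched_non_ascii` are made up for this illustration and do not appear in controllers.py.

```python
import re

# Pattern currently in controllers.py: leaves U+0000..U+009F untouched.
old_pattern = re.compile(r"([^\x00-\x9F])")
# Proposed pattern: leaves only the ASCII range U+0000..U+007F untouched.
new_pattern = re.compile(r"([^\x00-\x7F])")


def unmatched_non_ascii(pattern):
    """Return the code points in 0x80..0xFF that the pattern does not match.

    Any code point returned here would reach .encode("ascii") unescaped.
    """
    return [cp for cp in range(0x80, 0x100) if not pattern.search(chr(cp))]


old_gap = unmatched_non_ascii(old_pattern)
new_gap = unmatched_non_ascii(new_pattern)

# The old pattern misses 32 code points (0x80..0x9F); the new one misses none.
print(len(old_gap), len(new_gap))
```

Every code point in `old_gap` is outside the ASCII range, so it would pass through the substitution unchanged and then fail in the ascii codec.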

Change History

comment:1 Changed 13 years ago by anonymous

  • Summary changed from Unicode characters not encoded properly for Safari to (fix included) Unicode characters not encoded properly for Safari

comment:2 Changed 13 years ago by kevin

  • Milestone set to 0.9
  • Summary changed from (fix included) Unicode characters not encoded properly for Safari to [PATCH] Unicode characters not encoded properly for Safari

comment:3 Changed 13 years ago by kevin

  • Summary changed from [PATCH] Unicode characters not encoded properly for Safari to Unicode characters not encoded properly for Safari

Is there a utf-8 string you can give me that shows the invalid value? Here's the sub you mention:

        output = unicodechars.sub(
            lambda m: "&#x%x;" % ord(m.group(1)), 
            output).encode("ascii")

Note that the characters are being replaced by &#x(SOMEVAL);, so the result should certainly be valid ascii, unless there are other characters outside the range (but your change is reducing the range). A test case would be great.
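For reference, a small sketch of what that substitution produces when every non-ASCII character is matched. This assumes Python 3 and the narrower pattern proposed in the description; the helper name `to_ascii_with_entities` and the sample string are invented for the example.

```python
import re

# Assumption: the narrower pattern from the ticket description.
unicodechars = re.compile(r"([^\x00-\x7F])")


def to_ascii_with_entities(text):
    """Replace each non-ASCII character with a hex character reference,
    then encode the result as ASCII bytes."""
    escaped = unicodechars.sub(lambda m: "&#x%x;" % ord(m.group(1)), text)
    return escaped.encode("ascii")


# U+00E9 (e with acute accent) and U+2014 (em-dash) are both rewritten.
result = to_ascii_with_entities("caf\u00e9 \u2014 ok")
print(result)  # b'caf&#xe9; &#x2014; ok'
```

The encode step only succeeds because nothing above 0x7F survives the substitution.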

comment:4 Changed 13 years ago by kevin

  • Summary changed from Unicode characters not encoded properly for Safari to [PATCH] Unicode characters not encoded properly for Safari

Here's an update from Canis explaining where I went awry in my last comment :)

The short response is: Re-read the regexp, it has a "^" in it :)

The long one is:
    unicodechars = re.compile(r"([^\x00-\x9F])")

...the ^ at the beginning negates the sense of the character range, so the
sub replaces characters _above_ x9F with &#xxx; entities.  The specific
string that 'broke' TurboGears for me was the entity for an em-dash
(rendered here as —) in a Kid template. Kid resolves the entity to a
literal (x97), which is in the not-captured range of the regexp. Simple
test in the interactive prompt:

    >>> import re
    >>> unicodechars = re.compile(r"([^\x00-\x9F])")
    >>> unicodechars.sub("made safe?", u"\x97")
    '\x97'
    >>> _.encode("ascii")
    UnicodeEncodeError: 'ascii' codec can't encode character u'\x97' in position 0: ordinal not in range(128)

So yes, I'm reducing the range, but I'm reducing the range of "safe, do
not translate" characters to match that of the ascii codec:

    >>> unicodechars = re.compile(r"([^\x00-\x7F])")
    >>> unicodechars.sub("made safe!", u"\x97")
    'made safe!'
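Since a test case was requested, the session above could be turned into something like the following. It is only a sketch under stated assumptions: it targets Python 3 rather than the Python 2 of the original session, and `escape_and_encode`, `old_pattern`, `new_pattern` and the outcome variables are hypothetical names, not TurboGears code.

```python
import re


def escape_and_encode(pattern, text):
    """Mimic the Safari fix block: entity-escape matches, then encode as ASCII."""
    escaped = pattern.sub(lambda m: "&#x%x;" % ord(m.group(1)), text)
    return escaped.encode("ascii")


old_pattern = re.compile(r"([^\x00-\x9F])")
new_pattern = re.compile(r"([^\x00-\x7F])")

# With the old range, U+0097 is not matched, so the ascii codec rejects it.
try:
    escape_and_encode(old_pattern, "\x97")
    old_outcome = "encoded"
except UnicodeEncodeError:
    old_outcome = "UnicodeEncodeError"

# With the new range, U+0097 is escaped before encoding.
new_outcome = escape_and_encode(new_pattern, "\x97")

print(old_outcome)  # UnicodeEncodeError
print(new_outcome)  # b'&#x97;'
```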

comment:5 Changed 13 years ago by kevin

  • Status changed from new to closed
  • Resolution set to fixed

committed in [561]
