Ticket #1995 (closed defect: invalid)

Opened 6 years ago

Last modified 6 years ago

expose('json') encodes, but does not set coding in Content-Type header

Reported by: PeterRussell
Owned by: faide
Priority: normal
Milestone: 1.1
Component: TurboGears
Version: 1.0.7
Severity: minor
Keywords: unicode json expose
Cc:

Description

When using the expose decorator with the json template, non-ASCII characters in Unicode strings appear to be encoded as UTF-8 by default, but the Content-Type header carries no charset parameter to reflect this. Either the encoding should be specified in the header, or all non-ASCII characters should be escaped with \uXXXX codes. A test case that accepts either of these behaviors is attached.
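
The two alternatives named above can be sketched as follows. This is only an illustration using the Python 3 standard-library json module as a stand-in; it is not the TurboGears or simplejson code path, and the payload is a made-up example.

```python
import json

# Hypothetical payload containing one non-ASCII character.
payload = {"a": "é"}

# Alternative 1: escape every non-ASCII character as \uXXXX.
# The result is pure ASCII, so no charset information is needed.
escaped = json.dumps(payload, ensure_ascii=True)

# Alternative 2: keep the character as-is and encode the text as UTF-8 bytes.
raw_utf8 = json.dumps(payload, ensure_ascii=False).encode("utf-8")

print(escaped)   # {"a": "\u00e9"}
print(raw_utf8)  # b'{"a": "\xc3\xa9"}'
```

Both forms decode to the same object; they differ only in the bytes that go over the wire.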

Attachments

non-ascii-json.patch Download (1.8 KB) - added by PeterRussell 6 years ago.
Failing test case

Change History

Changed 6 years ago by PeterRussell

Failing test case

comment:1 Changed 6 years ago by chrisz

  • Status changed from new to closed
  • Resolution set to fixed
  • Severity changed from major to minor

Thanks for submitting the patch. Your observation is right, but I believe this is expected behavior that should not be changed.

Since the default encoding for JSON (application/json) is in fact utf-8, this encoding does not need to be specified in the header by adding an additional charset parameter. (Optionally, you can also use utf-16 or utf-32 instead of utf-8, but even then you don't need to specify which of these encodings is used since it is clear from looking at the first 4 bytes.)
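
A minimal sketch of the first-4-bytes check, following the null-byte patterns listed in section 3 of RFC 4627. The function name detect_json_encoding is hypothetical, and the fallback for inputs shorter than four bytes is an assumption rather than something the RFC specifies. The check relies on the RFC's rule that a JSON text starts with two ASCII characters.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from its first four bytes."""
    if len(data) < 4:
        # Assumption: too short to inspect, fall back to the default.
        return "utf-8"
    b0, b1, b2, b3 = data[:4]
    if b0 == 0 and b1 == 0 and b2 == 0:
        return "utf-32-be"  # 00 00 00 xx
    if b0 == 0 and b2 == 0:
        return "utf-16-be"  # 00 xx 00 xx
    if b1 == 0 and b2 == 0 and b3 == 0:
        return "utf-32-le"  # xx 00 00 00
    if b1 == 0 and b3 == 0:
        return "utf-16-le"  # xx 00 xx 00
    return "utf-8"          # xx xx xx xx


text = '{"a": "é"}'
for enc in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    # Each encoding is recognized without any charset parameter.
    print(enc, detect_json_encoding(text.encode(enc)) == enc)
```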

See http://www.ietf.org/rfc/rfc4627.txt for the exact specs.

Particularly notice that section 6 says that there are no required/optional parameters for application/json (i.e. particularly, no "charset"). This is different from application/xhtml+xml, where charset is an optional parameter (but also not required).

Please reopen if you think I'm misinterpreting the rfc or the rfc should not be taken seriously for whatever pragmatic reason (in this case, please explain where the current implementation will cause problems).

comment:2 Changed 6 years ago by Chris Arndt

  • Status changed from closed to reopened
  • Resolution fixed deleted

comment:3 Changed 6 years ago by Chris Arndt

  • Status changed from reopened to closed
  • Resolution set to invalid

Setting the resolution to "invalid" to allow for proper statistics. (Or should this be "wontfix"? But there is nothing to fix...)

comment:4 Changed 6 years ago by PeterRussell

Thanks for the speedy response. You're quite right about the RFC (in fact I remembered it myself on my way home from work). I suppose it may be a bug that simplejson doesn't decode UTF-8 encoded JSON, given the spec, but it's not a bug in TurboGears.

I apologise for wasting your time.

comment:5 Changed 6 years ago by chrisz

Are you sure simplejson doesn't decode utf-8? The simplejson homepage says "the decoder can handle incoming JSON strings of any specified encoding (UTF-8 by default)". If this doesn't work correctly, you should create a bug report at http://code.google.com/p/simplejson/issues/list.

comment:6 Changed 6 years ago by PeterRussell

simplejson.loads(u'{"a": "é"}'.encode('utf-8')) fails for me.

comment:7 Changed 6 years ago by chrisz

Maybe you need to update your simplejson since it is working for me.

comment:8 Changed 6 years ago by PeterRussell

I'm using 1.9.3, which is the latest version on PyPI.

The bug is filed here: http://code.google.com/p/simplejson/issues/detail?id=22
