Ticket #1995 (closed defect: invalid)
expose('json') encodes, but does not set coding in Content-Type header
| Reported by: | PeterRussell | Owned by: | faide |
|---|---|---|---|
| Priority: | normal | Milestone: | 1.1 |
| Component: | TurboGears | Version: | 1.0.7 |
| Severity: | minor | Keywords: | unicode json expose |
| Cc: |
Description
When using the expose decorator with the json template, non-ASCII characters in Unicode strings appear to be encoded in utf-8 by default, but the charset part of the Content-Type header doesn't reflect this. Either the encoding should be specified in the headers, or all non-ascii characters should be escaped with \uXXXX codes. A test case that should support either of these modes of operation is attached.
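The two modes of operation described above can be sketched with Python's standard-library json module. This is only an illustration of the idea, not TurboGears' actual JSON rendering code; the names payload, escaped_body, utf8_body and utf8_header are made up for this example.

```python
import json

payload = {"a": "é"}

# Mode 1: escape every non-ASCII character as \uXXXX, so the body is
# pure ASCII and no charset information is needed to read it.
escaped_body = json.dumps(payload, ensure_ascii=True).encode("ascii")

# Mode 2: send the characters as raw UTF-8 and state the encoding in the
# Content-Type header (hypothetical header value for this sketch).
utf8_body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
utf8_header = "application/json; charset=utf-8"

print(escaped_body)
print(utf8_body, utf8_header)
```

Both bodies decode to the same object; they differ only in how the "é" travels over the wire.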
Attachments
Change History
Changed 5 years ago by PeterRussell
- Attachment non-ascii-json.patch added
comment:1 Changed 5 years ago by chrisz
- Status changed from new to closed
- Resolution set to fixed
- Severity changed from major to minor
Thanks for submitting the patch. Your observation is right, but I believe this is expected behavior that should not be changed.
Since the default encoding for JSON (application/json) is in fact utf-8, this encoding does not need to be specified in the header by adding an additional charset parameter. (Optionally, you can also use utf-16 or utf-32 instead of utf-8, but even then you don't need to specify which of these encodings is used since it is clear from looking at the first 4 bytes.)
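The "first 4 bytes" check mentioned above follows section 3 of RFC 4627: because a JSON text starts with two ASCII characters, the pattern of zero bytes in the first four octets identifies the encoding. A minimal sketch of that check follows; the function name detect_json_encoding is hypothetical, and the fallback to UTF-8 for inputs shorter than four bytes is an assumption of this example rather than something the RFC spells out.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from its first four octets."""
    head = data[:4]
    if len(head) < 4:
        # Assumption: too short to apply the pattern test, fall back to UTF-8.
        return "utf-8"
    # True where the octet is zero, False otherwise.
    nulls = tuple(byte == 0 for byte in head)
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Any other pattern (typically no zero bytes at all) is treated as UTF-8.
    return patterns.get(nulls, "utf-8")


text = '{"a": "é"}'
for encoding in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    print(encoding, detect_json_encoding(text.encode(encoding)))
```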
See http://www.ietf.org/rfc/rfc4627.txt for the exact specs.
Note in particular that section 6 says application/json has no required or optional parameters (so, in particular, no "charset"). This differs from application/xhtml+xml, where charset is an optional parameter.
Please reopen if you think I'm misinterpreting the RFC, or if the RFC should not be taken seriously for some pragmatic reason (in that case, please explain where the current implementation will cause problems).
comment:2 Changed 5 years ago by Chris Arndt
- Status changed from closed to reopened
- Resolution fixed deleted
comment:3 Changed 5 years ago by Chris Arndt
- Status changed from reopened to closed
- Resolution set to invalid
Setting resolution to "invalid" to allow for proper statistics. (Or should this be "wontfix"? But there is nothing to fix...)
comment:4 Changed 5 years ago by PeterRussell
Thanks for the speedy response. You're quite right about the RFC (in fact I remembered it myself on my way home from work). I suppose it may be a bug that simplejson doesn't decode utf-8 encoded JSON, given the spec, but it's not a bug in TurboGears.
I apologise for wasting your time.
comment:5 Changed 5 years ago by chrisz
Are you sure simplejson doesn't decode utf-8? The simplejson homepage says "the decoder can handle incoming JSON strings of any specified encoding (UTF-8 by default)". If this doesn't work correctly, you should create a bug report at http://code.google.com/p/simplejson/issues/list.
comment:6 Changed 5 years ago by PeterRussell
simplejson.loads(u'{"a": "é"}'.encode('utf-8')) fails for me.
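For comparison, the same round trip can be tried with the standard-library json module on Python 3.6 or later, which accepts UTF-8 encoded bytes directly. This is a sketch under that assumption and says nothing about how simplejson 1.9.3 behaves; the variable names are made up for the example.

```python
import json

# The same document as in the call above, encoded as UTF-8 bytes.
raw = '{"a": "é"}'.encode("utf-8")

# Decoding explicitly before parsing works regardless of the parser's
# handling of byte strings.
from_text = json.loads(raw.decode("utf-8"))

# The standard-library parser (Python 3.6+) also accepts the bytes as-is.
from_bytes = json.loads(raw)

print(from_text, from_bytes)
```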
comment:7 Changed 5 years ago by chrisz
Maybe you need to update your simplejson since it is working for me.
comment:8 Changed 5 years ago by PeterRussell
I'm using 1.9.3, which is the latest version in PyPI.
The bug is filed here: http://code.google.com/p/simplejson/issues/detail?id=22
Failing test case