
Ticket #242 (closed defect: fixed)

Opened 13 years ago

Last modified 8 years ago

Problems with I18N and (X)HTML entities

Reported by: Jorge Godoy <jgodoy@…> Owned by: anonymous
Priority: high Milestone: 1.0.4
Component: tg-admin (non-toolbox) Version:
Severity: major Keywords:
Cc:

Description

Kid (or CherryPy, but I believe it is Kid) has fixed the problem of using HTML entities within the template, but the i18n module is still choking on them.

Here's the traceback from collecting the strings to build a POT file:

================================================================================
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/CherryPy-2.1.0-py2.4.egg/cherrypy/_cphttptools.py", line 271, in run
    main()
  File "/usr/lib/python2.4/site-packages/CherryPy-2.1.0-py2.4.egg/cherrypy/_cphttptools.py", line 502, in main
    body = page_handler(*args, **cherrypy.request.paramMap)
  File "/home/godoy/desenvolvimento/python/TurboGears/trunk/turbogears/controllers.py", line 196, in newfunc
    html, *args, **kw)
  File "/home/godoy/desenvolvimento/python/TurboGears/trunk/turbogears/database.py", line 174, in run_with_transaction
    retval = func(*args, **kw)
  File "/home/godoy/desenvolvimento/python/TurboGears/trunk/turbogears/controllers.py", line 222, in _execute_func
    output = func(self, *args, **kw)
  File "/home/godoy/desenvolvimento/python/TurboGears/trunk/turbogears/toolbox/admi18n/__init__.py", line 271, in string_collection
    self.collect_string_for_files(files)
  File "/home/godoy/desenvolvimento/python/TurboGears/trunk/turbogears/toolbox/admi18n/__init__.py", line 233, in collect_string_for_files
    pygettext.main()
  File "/home/godoy/desenvolvimento/python/TurboGears/trunk/turbogears/toolbox/admi18n/pygettext.py", line 676, in main
    if os.path.splitext(filename)[-1].lower() == '.kid': eater.extract_kid_strings()
  File "/home/godoy/desenvolvimento/python/TurboGears/trunk/turbogears/toolbox/admi18n/pygettext.py", line 471, in extract_kid_strings
    f = ElementTree(file=self.curfile)
  File "/usr/lib/python2.4/site-packages/elementtree-1.2.6-py2.4.egg/elementtree/ElementTree.py", line 543, in __init__
  File "/usr/lib/python2.4/site-packages/elementtree-1.2.6-py2.4.egg/elementtree/ElementTree.py", line 583, in parse
  File "/usr/lib/python2.4/site-packages/elementtree-1.2.6-py2.4.egg/elementtree/ElementTree.py", line 1242, in feed
  File "/usr/lib/python2.4/site-packages/elementtree-1.2.6-py2.4.egg/elementtree/ElementTree.py", line 1195, in _default
ExpatError: undefined entity &copy;: line 70, column 8
================================================================================

From the Expat error, I believe I could try fixing that by adding something to my DTD... I'll try pointing to a local DTD to allow the validation, but since neither Kid nor tidy chokes on those entities, it should pass...

The same also happens for &nbsp; and I believe for other entities as well.
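
The failure can be reproduced outside the toolbox with a few lines. This is only a minimal sketch: it assumes a current Python 3 with the standard-library xml.etree.ElementTree rather than the elementtree 1.2.6 egg from the traceback, and the helper name try_parse is made up for illustration.

```python
import xml.etree.ElementTree as ET


def try_parse(markup):
    """Return the root element's text, or the parser's error message."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError as err:
        return "error: %s" % err


# Named HTML entities are unknown to a plain XML parser without a DTD.
copy_result = try_parse("<p>&copy; 2006</p>")
nbsp_result = try_parse("<p>a&nbsp;b</p>")
# A predefined XML entity such as &amp; parses without trouble.
amp_result = try_parse("<p>a &amp; b</p>")

print(copy_result)
print(nbsp_result)
print(amp_result)
```

The exact wording of the error may differ between parser versions, but both entity cases should report an undefined entity while the &amp; case parses normally.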

Attachments

reverse-changeset-1689.patch Download (2.3 KB) - added by Chris Arndt 11 years ago.
Reverse changeset r1689

Change History

comment:1 Changed 13 years ago by anonymous

  • Milestone set to 0.9

comment:2 Changed 13 years ago by oefe

Isn't Kid using ElementTree as well?

In general, it's probably best to avoid HTML entities in XML. As XML is Unicode-based, there is no need for them. Just write your templates in UTF-8 or another suitable encoding. "©" is more readable than &copy; anyway. If you are using an HTML editor that insists on entities, maybe you can configure it to emit numeric character references instead?
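
If the editor cannot be configured that way, the conversion could also be done as a preprocessing step. The following is a rough sketch, not something Kid or TurboGears provides: the function name to_numeric_references is hypothetical, it relies on the name2codepoint table from Python 3's html.entities module, and it uses a plain regular expression that does not special-case comments or CDATA sections.

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser already knows; leave them untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_references(markup):
    """Replace HTML named entities with numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep predefined or unknown names as-is
        return "&#%d;" % name2codepoint[name]
    return _ENTITY_RE.sub(replace, markup)


converted = to_numeric_references("<p>&copy; 2006 &amp; later&nbsp;on</p>")
print(converted)  # <p>&#169; 2006 &amp; later&#160;on</p>
```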

comment:3 Changed 13 years ago by godoy

It is doable, but "  " (two spaces) generates different output than "&nbsp;&nbsp;", so this is really a problem if you need the entities to be there. This is still happening in trunk as of r1164.

comment:4 Changed 13 years ago by cogumbreiro

The solution to this problem is here:

 http://online.effbot.org/2003_07_01_archive.htm#escape

comment:5 Changed 13 years ago by fredrik

&nbsp; isn't the same thing as a space; to embed a &nbsp;, you need to insert chr(160), not chr(32). To do this in XML, just use "&#160;" instead of "&nbsp;" (or use an editor that allows you to embed "no-breaking space" in your document).
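
A quick check of this, assuming Python 3's standard-library xml.etree.ElementTree (the variable names are only illustrative):

```python
import xml.etree.ElementTree as ET

# &#160; is a numeric character reference, so no DTD is needed to parse it.
root = ET.fromstring("<p>a&#160;b</p>")
text = root.text

print(repr(text))
print(ord(text[1]))    # code point of the character between "a" and "b"
print(text == "a b")   # compares against an ordinary space, chr(32)
```

The parsed text contains the character with code point 160, which is not equal to an ordinary space.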

comment:6 Changed 13 years ago by elvelind

  • Status changed from new to closed
  • Resolution set to fixed

fixed in r1689

comment:7 Changed 11 years ago by amit

  • Status changed from closed to reopened
  • Resolution fixed deleted
  • Component changed from Toolbox to tg-admin (non-toolbox)
  • Milestone changed from 0.9 to 1.0.4

$ tg-admin i18n collect

fails with the same error. command/i18n.py needs to be fixed as well.

comment:8 Changed 11 years ago by Chris Arndt

First, I can't reproduce the error with tg-admin i18n collect. I put &copy; in the welcome.kid of a fresh quickstarted project and the i18n collect command ran without errors and the messages.pot file showed a © char in unicode encoding in the msgid.

But I think the "fix" introduced in r1689 was wrong. Kid templates are not (X)HTML, they are XML! In XML the only predefined entities are &amp; (&), &lt; (<), &gt; (>), &apos; (') and &quot; (") (see e.g. this Wikipedia article). All other characters should be written as Unicode characters or as numeric character references of the form &#nnnn; or &#xhhhh; (or write a DTD for your template that declares the entities you are using).
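
To illustrate the DTD option, here is a small sketch that declares the entities in an internal DTD subset. It assumes Python 3's standard-library xml.etree.ElementTree; the sample document is invented, and whether Kid itself accepts a template written this way has not been checked here.

```python
import xml.etree.ElementTree as ET

# The internal subset declares each named entity in terms of a numeric
# character reference, so the parser can expand &copy; and &nbsp;.
TEMPLATE = """<!DOCTYPE html [
  <!ENTITY copy "&#169;">
  <!ENTITY nbsp "&#160;">
]>
<html><p>&copy;&nbsp;2006</p></html>"""

root = ET.fromstring(TEMPLATE)
paragraph_text = root.find("p").text
print(repr(paragraph_text))
```

With the declarations in place, the parser expands the references itself instead of reporting an undefined entity.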

I'm attaching a patch that reverses r1689 and I will later resolve this ticket as invalid, if nobody objects.

I also tested the full chain of steps for setting up i18n for a quickstarted app (with tg-admin and with the Toolbox), with a character entity reference in one of my templates and with the changes from r1689 removed. Everything worked correctly, so it seems that the fix from r1689 is no longer necessary anyway.

Changed 11 years ago by Chris Arndt

Reverse changeset r1689

comment:9 Changed 11 years ago by amit

I have seen the problem, with this message:

undefined entity &nbsp;: line 132, column 24

comment:10 Changed 11 years ago by Chris Arndt

Can you reproduce this with a current SVN checkout? Can you provide an example template and the exact steps leading to the error?

comment:11 Changed 11 years ago by amit

Hi, I have tested with the latest SVN sources (1.0 branch). It seems to be fixed. Please close this ticket...

Thanks.

comment:12 Changed 11 years ago by Chris Arndt

  • Status changed from reopened to closed
  • Resolution set to fixed