Ticket #2258 (closed defect: fixed)

Opened 8 years ago

Last modified 8 years ago

[PATCH] merge loses translations

Reported by: rejoc Owned by: chrisz
Priority: normal Milestone: 1.1
Component: I18n Version: 1.1 HEAD
Severity: normal Keywords: i18n merge
Cc:

Description

This is the bug report for part of ticket #2257.

If you have a .po file containing

#: demo/templates/master.html:75
msgid "some text"
msgstr "un petit texte"

#: demo/templates/welcome.html:25
msgid "some text"
msgstr ""

it compiles and gives the expected translation of "un petit texte" for all the occurrences of "some text".

Now you do a "tg-admin i18n merge" (usually after a collect, because you modified something) and you get:

#: demo/templates/master.html:75
msgid "some text"
msgstr ""

#: demo/templates/welcome.html:25
msgid "some text"
msgstr ""

You lost your translation because merge uses the last msgstr it finds in the .po for a given msgid.
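
The effect can be reproduced with a minimal sketch. This is not the TurboGears code: `parse_entries` and `collapse_last_wins` are hypothetical helpers, and the parser only handles single-line msgid/msgstr pairs like the ones above.

```python
import re

PO_TEXT = '''
#: demo/templates/master.html:75
msgid "some text"
msgstr "un petit texte"

#: demo/templates/welcome.html:25
msgid "some text"
msgstr ""
'''


def parse_entries(po_text):
    """Return (msgid, msgstr) pairs in file order (single-line strings only)."""
    pattern = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)
    return pattern.findall(po_text)


def collapse_last_wins(entries):
    """Keep the last msgstr seen for each msgid, even if it is empty."""
    catalog = {}
    for msgid, msgstr in entries:
        catalog[msgid] = msgstr  # a later empty string overwrites a translation
    return catalog


entries = parse_entries(PO_TEXT)
catalog = collapse_last_wins(entries)
print(catalog)  # the translation for "some text" has been replaced by ""
```

Because the empty entry for welcome.html comes last, it overwrites the translation taken from master.html.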

Attachments

i18n_pygettext_catalog.py.diff (362 bytes) - added by rejoc 8 years ago.
keeps the latest non empty translation from original .po file when merging

Change History

Changed 8 years ago by rejoc

keeps the latest non empty translation from original .po file when merging

comment:1 follow-up: ↓ 2 Changed 8 years ago by Gustavo

That's the default and intended behavior, therefore I think it should be documented instead of changing it by default. See also #2257.

comment:2 in reply to: ↑ 1 ; follow-up: ↓ 3 Changed 8 years ago by rejoc

Replying to Gustavo:

That's the default and intended behavior, therefore I think it should be documented instead of changing it by default. See also #2257.

Is this the intended behaviour?

There are cases where, if you do the following

tg-admin i18n collect
tg-admin i18n add fr
... edit .po file to add translations to some of the msgids
tg-admin i18n compile
# you get your translations.
tg-admin i18n merge  # no collect so .pot is the same
tg-admin i18n compile

you lose some of your translations :-(

Within tg-admin, compile takes the last non-empty translation as the valid one. I found it reasonable for merge to use the same method.

This is what the patch provides. No more.

comment:3 in reply to: ↑ 2 ; follow-up: ↓ 4 Changed 8 years ago by Gustavo

Replying to rejoc:

Yes, that's part of the intended behavior. If it were as simple as having xgettext (or equivalent) merge duplicate messages, I bet it would have been implemented in the official tools many years ago.

Any message collection mechanism that handles duplicate messages is bogus. You *can't* assume that its algorithm will work in all situations. That's why msguniq and msgcomm exist.

comment:4 in reply to: ↑ 3 Changed 8 years ago by rejoc

Replying to Gustavo:

If it were as simple as having xgettext (or equivalent) merge duplicate messages, I bet it would have been implemented in the official tools many years ago.

Give it a try and you'll see that xgettext on a collection of files handles duplicate messages and merges them like this:

xgettext *.py 

gives the following record in the messages.po file

#: test2.py:2 test.py:2 test.py:4 test.py:7
msgid "some duplicated text"
msgstr ""
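
The grouping shown in that record can be sketched as follows. This only illustrates collecting source locations per msgid; it is not xgettext's actual implementation, and `merge_references` and `format_record` are hypothetical names.

```python
from collections import OrderedDict


def merge_references(occurrences):
    """Group (msgid, location) pairs into one list of locations per msgid."""
    records = OrderedDict()
    for msgid, location in occurrences:
        records.setdefault(msgid, []).append(location)
    return records


def format_record(msgid, locations):
    """Render one untranslated record with all its source locations."""
    return '#: %s\nmsgid "%s"\nmsgstr ""' % (" ".join(locations), msgid)


occurrences = [
    ("some duplicated text", "test2.py:2"),
    ("some duplicated text", "test.py:2"),
    ("some duplicated text", "test.py:4"),
    ("some duplicated text", "test.py:7"),
]

for msgid, locations in merge_references(occurrences).items():
    print(format_record(msgid, locations))
```

At extraction time every msgstr is still empty, so grouping duplicates this way cannot discard a translation.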

comment:5 Changed 8 years ago by Chris Arndt

  • Milestone changed from 1.1b4 to 1.1

comment:6 Changed 8 years ago by chrisz

  • Status changed from new to assigned
  • Owner changed from Chris Arndt to chrisz

I think the actual problem is caused by tg-admin i18n merge already merging duplicate messages in the .po file before merging these together with the messages in the .pot file.

Trying to merge duplicate messages may be bad, as Gustavo pointed out, but it is already done by tg-admin i18n merge (turbogears.i18n.pygettext.catalog).

Now tg-admin i18n compile (turbogears.i18n.msgfmt, copied from the Python i18n tool with the same name) also merges the duplicates in the .po file.

The problem is, as rejoc rightly points out, that tg-admin i18n merge always takes the last translation, while tg-admin i18n compile takes the last nonempty translation (just like the original Python i18n tool msgfmt does). Btw, all of them only regard non-fuzzy translations, so this is consistent.

To avoid the loss of translations mentioned by rejoc we could either avoid the merging of duplicates inside the .po file in the first step of tg-admin i18n merge, or at least we should do it reasonably and consistently by regarding only *nonempty* translations, as done by rejoc's simple patch.

To have this issue solved in TG 1.1rc1 which will be released today, I'll go with the second option and check in the patch.
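
The second option can be sketched like this. It is a simplified illustration of "only a non-empty translation may replace an existing one", not the actual patch to turbogears.i18n.pygettext.catalog; `collapse_last_nonempty` is a hypothetical name.

```python
def collapse_last_nonempty(entries):
    """Collapse duplicate msgids, keeping the last non-empty msgstr.

    An empty msgstr is stored only if nothing has been recorded for
    that msgid yet, so it never overwrites an existing translation.
    """
    catalog = {}
    for msgid, msgstr in entries:
        if msgstr or msgid not in catalog:
            catalog[msgid] = msgstr
    return catalog


entries = [
    ("some text", "un petit texte"),  # demo/templates/master.html:75
    ("some text", ""),                # demo/templates/welcome.html:25
    ("other text", ""),               # hypothetical untranslated message
]

result = collapse_last_nonempty(entries)
print(result)
```

If the same msgid carries two different non-empty translations, the later one still wins, which matches the behavior described for compile.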

comment:7 Changed 8 years ago by chrisz

Applied in r6674.

comment:8 follow-up: ↓ 9 Changed 8 years ago by Chris Arndt

So this means we can't have different translations for the same message ids in different files? Your patch now causes the last translation in the .po file to be inserted everywhere for the same message id, regardless of which file it is in. Is that the intended behavior?

comment:9 in reply to: ↑ 8 Changed 8 years ago by chrisz

Replying to Chris Arndt:

Your patch now causes the last translation in the .po file to be inserted everywhere for the same message id, regardless of which file it is in. Is that the intended behavior?

Whether intended or not, that was already the behavior before the patch. The patch only alleviates this by not replacing an existing translation with an empty translation.

comment:10 Changed 8 years ago by Chris Arndt

  • Status changed from assigned to closed
  • Resolution set to fixed

Ok, I trust you on this. If somebody disagrees, please discuss on the mailing list and we can decide whether to reopen this ticket.

comment:11 Changed 8 years ago by chrisz

Btw, TG2 (Babel) shows the same behavior. It's only different in that messages with the same id are already merged during a "collect" ("extract_messages"), not only during a "merge" ("update_catalog"). Therefore #2257 is not an issue in TG2.

comment:12 Changed 8 years ago by chrisz

Just to clarify, the problem with message ids requiring different translations is intrinsic to gettext - it's not caused by TG. You can only solve this by using different domains (not supported by TG, by default we merge everything together into one domain specified in i18n.domain) or specifying a msgctxt (not yet supported by the Python std lib).
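
As a rough illustration of why a context helps, here is a sketch of a catalog keyed by (msgctxt, msgid) instead of msgid alone. The contexts "menu" and "status", the translations, and the `translate` helper are made up for the example; this is not how TG or the standard library stores messages.

```python
# Hypothetical catalog: the same msgid maps to different translations
# depending on the context it appears in.
CATALOG = {
    ("menu", "Open"): "Ouvrir",
    ("status", "Open"): "Ouvert",
}


def translate(msgctxt, msgid, catalog=CATALOG):
    """Look up msgid within msgctxt; fall back to the untranslated msgid."""
    return catalog.get((msgctxt, msgid), msgid)


print(translate("menu", "Open"))
print(translate("status", "Open"))
print(translate("dialog", "Open"))  # unknown context, falls back to the msgid
```

With msgid as the only key, the two entries would collide and one translation would have to win everywhere.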
