Ticket #2258 (closed defect: fixed)
[PATCH] merge loses translations
Reported by: | rejoc | Owned by: | chrisz |
---|---|---|---|
Priority: | normal | Milestone: | 1.1 |
Component: | I18n | Version: | 1.1 HEAD |
Severity: | normal | Keywords: | i18n merge |
Cc: |
Description
This is the bug report for part of ticket #2257.
If you have a .po file containing:
#: demo/templates/master.html:75
msgid "some text"
msgstr "un petit texte"

#: demo/templates/welcome.html:25
msgid "some text"
msgstr ""
it compiles and gives the expected translation "un petit texte" for all occurrences of "some text".
Now you run "tg-admin i18n merge" (usually after a collect, because you modified something) and... you get:
#: demo/templates/master.html:75
msgid "some text"
msgstr ""

#: demo/templates/welcome.html:25
msgid "some text"
msgstr ""
You lost your translation because merge uses the last msgstr it finds in the .po for a given msgid.
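A minimal Python sketch of this failure mode, assuming (hypothetically) that merge collapses the .po entries into a dict keyed by msgid before combining them with the .pot file. The tuple layout and the name `last_wins` are illustrative only, not the actual turbogears.i18n.pygettext.catalog code.

```python
# Hypothetical, simplified model of the .po file above:
# (reference, msgid, msgstr) tuples in file order.
PO_ENTRIES = [
    ("demo/templates/master.html:75", "some text", "un petit texte"),
    ("demo/templates/welcome.html:25", "some text", ""),
]


def last_wins(entries):
    """Collapse duplicate msgids; every later msgstr overwrites an earlier one."""
    translations = {}
    for _reference, msgid, msgstr in entries:
        # An empty msgstr overwrites a real translation, which is the bug.
        translations[msgid] = msgstr
    return translations


print(last_wins(PO_ENTRIES))  # {'some text': ''}
```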
Attachments
Change History
Changed 10 years ago by rejoc: attachment i18n_pygettext_catalog.py.diff added
comment:1 follow-up: ↓ 2 Changed 10 years ago by Gustavo
That's the default and intended behavior, therefore I think it should be documented instead of changing it by default. See also #2257.
comment:2 in reply to: ↑ 1 ; follow-up: ↓ 3 Changed 10 years ago by rejoc
Replying to Gustavo:
> That's the default and intended behavior, therefore I think it should be documented instead of changing it by default. See also #2257.
Is this the intended behaviour?
There are cases where, if you do the following:
tg-admin i18n collect
tg-admin i18n add fr
... edit .po file to add translations to some of the msgids
tg-admin i18n compile   # you get your translations.
tg-admin i18n merge     # no collect, so the .pot is the same
tg-admin i18n compile
you lose some of your translations :-(
Within tg-admin, compile takes the last non-empty translation as the valid one. I found it reasonable for merge to use the same method.
This is what the patch provides. No more.
comment:3 in reply to: ↑ 2 ; follow-up: ↓ 4 Changed 10 years ago by Gustavo
Replying to rejoc:
> Is this the intended behaviour?
Yes, that's part of the intended behavior. If it were as simple as having xgettext (or an equivalent tool) merge duplicate messages, I bet it would have been implemented in the official tools many years ago.
Any message collection mechanism that handles duplicate messages is bogus. You *can't* assume that its algorithm will work in all situations. That's why msguniq and msgcomm exist.
comment:4 in reply to: ↑ 3 Changed 10 years ago by rejoc
Replying to Gustavo:
> If it were as simple as having xgettext (or an equivalent tool) merge duplicate messages, I bet it would have been implemented in the official tools many years ago.
Give it a try and you'll see that xgettext, run on a collection of files, handles duplicate messages and merges them like this:
xgettext *.py
gives the following record in the messages.po file:
#: test2.py:2 test.py:2 test.py:4 test.py:7
msgid "some duplicated text"
msgstr ""
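The grouping xgettext performs here can be modelled as collecting every reference under a single msgid. This is a simplified sketch, not xgettext's implementation: `merge_duplicates` and `format_entry` are hypothetical names, references are kept in input order, and the msgid is assumed to contain no quotes, backslashes, or newlines (real PO output escapes those).

```python
def merge_duplicates(occurrences):
    """Group (reference, msgid) pairs by msgid, keeping first-seen msgid order."""
    merged = {}  # dicts preserve insertion order (Python 3.7+)
    for reference, msgid in occurrences:
        merged.setdefault(msgid, []).append(reference)
    return merged


def format_entry(msgid, references):
    """Render one untranslated PO record; assumes msgid needs no escaping."""
    return '#: {}\nmsgid "{}"\nmsgstr ""'.format(" ".join(references), msgid)


occurrences = [
    ("test2.py:2", "some duplicated text"),
    ("test.py:2", "some duplicated text"),
    ("test.py:4", "some duplicated text"),
    ("test.py:7", "some duplicated text"),
]

for msgid, references in merge_duplicates(occurrences).items():
    print(format_entry(msgid, references))
```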
comment:6 Changed 9 years ago by chrisz
- Status changed from new to assigned
- Owner changed from Chris Arndt to chrisz
I think the actual problem is caused by tg-admin i18n merge already merging duplicate messages in the .po file before merging these together with the messages in the .pot file.
Trying to merge duplicate messages may be bad, as Gustavo pointed out, but it is already done by tg-admin i18n merge (turbogears.i18n.pygettext.catalog).
Now tg-admin i18n compile (turbogears.i18n.msgfmt, copied from the Python i18n tool with the same name) also merges the duplicates in the .po file.
The problem is, as rejoc rightly points out, that tg-admin i18n merge always takes the last translation, while tg-admin i18n compile takes the last nonempty translation (just like the original Python i18n tool msgfmt does). Btw, all of them only regard non-fuzzy translations, so this is consistent.
To avoid the loss of translations mentioned by rejoc we could either avoid the merging of duplicates inside the .po file in the first step of tg-admin i18n merge, or at least we should do it reasonably and consistently by regarding only *nonempty* translations, as done by rejoc's simple patch.
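A sketch of the rule rejoc's patch introduces, as described above: when duplicate msgids in the .po file are collapsed, only non-empty, non-fuzzy translations may replace an earlier one, and the result is then used to fill in the msgids from the .pot file. The `(msgid, msgstr, fuzzy)` tuple format and the names `collapse_po` and `merge` are hypothetical; this is not the patched turbogears.i18n.pygettext.catalog code.

```python
def collapse_po(entries):
    """Collapse duplicate msgids, keeping the last non-empty, non-fuzzy msgstr.

    entries: (msgid, msgstr, fuzzy) tuples in .po file order (hypothetical format).
    """
    translations = {}
    for msgid, msgstr, fuzzy in entries:
        if msgstr and not fuzzy:
            translations[msgid] = msgstr
        else:
            # Empty or fuzzy entries never replace an existing translation.
            translations.setdefault(msgid, "")
    return translations


def merge(pot_msgids, po_entries):
    """Give every msgid from the .pot file its surviving translation, or ""."""
    translations = collapse_po(po_entries)
    return {msgid: translations.get(msgid, "") for msgid in pot_msgids}


po_entries = [
    ("some text", "un petit texte", False),  # master.html:75
    ("some text", "", False),                # welcome.html:25
]
print(merge(["some text"], po_entries))  # {'some text': 'un petit texte'}
```

This still yields a single translation per msgid; it only prevents an empty msgstr from replacing a real one.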
To have this issue solved in TG 1.1rc1 which will be released today, I'll go with the second option and check in the patch.
comment:8 follow-up: ↓ 9 Changed 9 years ago by Chris Arndt
So this means we can't have different translations for the same message ids in different files? Your patch now causes the last translation in the .po file to be inserted everywhere for the same message id, regardless of which file it is in. Is that the intended behavior?
comment:9 in reply to: ↑ 8 Changed 9 years ago by chrisz
Replying to Chris Arndt:
> Your patch now causes the last translation in the .po file to be inserted everywhere for the same message id, regardless of which file it is in. Is that the intended behavior?
Whether intended or not, that was already the behavior before the patch. The patch only alleviates this by not replacing an existing translation with an empty translation.
comment:10 Changed 9 years ago by Chris Arndt
- Status changed from assigned to closed
- Resolution set to fixed
Ok, I trust you on this. If somebody disagrees, please discuss on the mailing list and we can decide whether to reopen this ticket.
comment:11 Changed 9 years ago by chrisz
Btw, TG2 (Babel) shows the same behavior. It's only different in that messages with the same id are already merged during a "collect" ("extract_messages"), not only during a "merge" ("update_catalog"). Therefore #2257 is not an issue in TG2.
comment:12 Changed 9 years ago by chrisz
Just to clarify, the problem with message ids requiring different translations is intrinsic to gettext - it's not caused by TG. You can only solve this by using different domains (not supported by TG, by default we merge everything together into one domain specified in i18n.domain) or specifying a msgctxt (not yet supported by the Python std lib).
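To illustrate why msgctxt solves this: GNU gettext stores a contextual message in the compiled .mo catalog under the key msgctxt + "\x04" + msgid, so the same msgid in different contexts no longer collides. A minimal sketch, assuming hypothetical contexts "heading" and "footer", a made-up second French string, and simplified (msgctxt, msgid, msgstr) entries; the function names are illustrative only.

```python
CONTEXT_SEPARATOR = "\x04"  # GNU gettext joins msgctxt and msgid with EOT


def catalog_key(msgid, msgctxt=None):
    """Build the lookup key; messages without a context use the bare msgid."""
    if msgctxt is None:
        return msgid
    return msgctxt + CONTEXT_SEPARATOR + msgid


def collapse(entries):
    """Collapse (msgctxt, msgid, msgstr) entries; last non-empty msgstr per key wins."""
    translations = {}
    for msgctxt, msgid, msgstr in entries:
        key = catalog_key(msgid, msgctxt)
        if msgstr:
            translations[key] = msgstr
        else:
            translations.setdefault(key, "")
    return translations


with_context = [
    ("heading", "some text", "un petit texte"),  # hypothetical contexts
    ("footer", "some text", "quelques mots"),    # made-up translation
]
without_context = [(None, msgid, msgstr) for _ctx, msgid, msgstr in with_context]

print(len(collapse(with_context)))     # 2: both translations survive
print(len(collapse(without_context)))  # 1: one translation wins
```

Python 3.8 later added gettext.pgettext(), which looks messages up under this same combined key.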
i18n_pygettext_catalog.py.diff: keeps the latest non-empty translation from the original .po file when merging.