Skip to content

v7.2.6 breaks translations of permalink titles #11689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
chrisjsewell opened this issue Sep 20, 2023 · 15 comments
Closed

v7.2.6 breaks translations of permalink titles #11689

chrisjsewell opened this issue Sep 20, 2023 · 15 comments

Comments

@chrisjsewell
Copy link
Member

chrisjsewell commented Sep 20, 2023

Describe the bug

Heya @AA-Turner 7e9a206 broke translations;
if you're changing text wrapped in _(...), then best to check if there was an existing translation 😅

How to Reproduce

Environment Information

-

Sphinx extensions

No response

Additional context

No response

@picnixz
Copy link
Member

picnixz commented Sep 20, 2023

I can't find occurrences of "Permalink to this equation" in the locales of sphinx/locales (but maybe the github search is broken). Can you tell us which (built-in) translation is broken and what would be the existing translation?

@picnixz picnixz added the i18n label Sep 20, 2023
@chrisjsewell
Copy link
Member Author

chrisjsewell commented Sep 20, 2023

Permalink to this heading was translated to Lien permanent vers cette rubrique (for fr), and now it is not (this was identified in the myst-parser tests)

@picnixz
Copy link
Member

picnixz commented Sep 20, 2023

What I asked is: did Sphinx have a translation in sphinx/locales for Permalink to this heading which was then removed due to the update?

Also, if the issue affects myst-parser tests and there was no translation on our side, then you should open an issue on their side since we cannot do anything.

(Or tell me if there is some magic behind that I'm not aware of since I'm no expert in i18n things, like is there a way to actually check that there exists some translation somewhere?)

@chrisjsewell
Copy link
Member Author

chrisjsewell commented Sep 20, 2023

@picnixz I'm the author and maintainer of myst-parser, it is nothing to do with that, it is sphinx that broke it

Or tell me if there is some magic behind that I'm not aware of since I'm no expert in i18n thing

then maybe put in some effort to look into it before commenting 🫠

did Sphinx have a translation in sphinx/locales for Permalink to this heading which was then removed due to the update?

I have now looked into it, and appears there is a bot that updates all the translations, without actually checking that translations have been broken

All the translation for Permalink to this... were removed in: #11538

@picnixz
Copy link
Member

picnixz commented Sep 20, 2023

Ok. Then I'll let @AA-Turner revert the broken parts since I don't know if there is a "fast" way to do it (btw, is there a way to "alert" us that the translations actually exist? like is there a CI/CD check that we could add for this? otherwise we'll probably do it the old way using bash scripts)

@chrisjsewell
Copy link
Member Author

btw, is there a way to "alert" us that the translations actually exist?

yeh I think that would be good cheers,
but obviously a rule of thumb would be that if any text is wrapped in _(...) then it should have translations, and you shouldn't change it without changing the translations

@chrisjsewell
Copy link
Member Author

FYI, there is another v7.2 discrepancy I noted; that desc_signature, desc_name, etc now seem to have duplicate classes, e.g. classes="sig sig-object" is now classes="sig sig-object sig sig-object"

Don't know if I'll have time to track down this

@picnixz
Copy link
Member

picnixz commented Sep 20, 2023

that desc_signature, desc_name, etc now seem to have duplicate classes

I'm not sure but I cannot reproduce this. At least, in HTML5Translator.visit_desc_signature, when I print the node, I get:

<desc_signature _toc_name="foo()" _toc_parts="('foo',)" class="" classes="sig sig-object py" fullname="foo" ids="foo" module="True">
    <desc_name classes="sig-name descname" xml:space="preserve">
        foo
    <desc_parameterlist multi_line_parameter_list="False" xml:space="preserve">
        <desc_parameter xml:space="preserve">
            <desc_sig_name classes="n">
                x

and the rst is

.. function:: foo(x)

   Some function.

   :param x: The x
   :type x: int

@chrisjsewell
Copy link
Member Author

I'm not sure but I cannot reproduce this

Ah, actually I think this was broken in about Sphinx==7.1.2, but is now fixed in 7.2.6

@picnixz
Copy link
Member

picnixz commented Sep 20, 2023

Good then. By the way, I looked at how translations are actually updated and it's using transifex, which I don't know at all. I don't know if there is a way to check with transifex itself. If I needed to implement the feature from scratch, here's what I would do:

  • When writing a PR, extract the translations from the modified files only (and not from every files).

  • Design a tool that actually make the correct diff, namely extracts from the base file the "old" translated string and from the modified file the "new" translated string (so that we know which translation was changed to what).

    For all existing translations for the modified files, check if they were replaced by a new translated string or not. Here, we have two cases:

    1. The text (key) actually contained an error, so the key in the translated messages should be changed but not its content (translated text). This is an example of what happened in this issue.
    2. The text (key) changes its meaning and so should the translated messages. I don't have an example of that.

    In the first case, the CI/CD check should pass but we would have some follow-up bot which would then update the .pot and corresponding .po files by keeping the old translation but just changing the key. In the second case, the PR should be rejected and the files must be manually changed (i.e., the translations should be re-created).

    In order to know whether the bot should consider the first or second case, maybe we could add some label tha triggers specific checks or only trigger the check if the PR is named "[i18n-update]" (1st case) or "[i18n-new]" (2nd case).

@AA-Turner
Copy link
Member

Thanks for the note Chris.

this was identified in the myst-parser tests

Would you consider contributing some of these tests upstream?


This fails as the translations weren't synchronised for each release. This used to happen either manually in the release wheel or by merging the automatic transifex PR. However with the move to trusted publishing, this has stopped. The solution I suppose is either (a) always trigger and merge the transifex PR or (b) integrate translations into the wheel build process.

(a) is the simpler option, though fallible, and increases repository size each time due to the binary blobs of content (.mo files)

(b) would require a little more work, but I think is my preference currently -- we'd be able to remove all .mo files from the source tree, I think. This would also ensure that the translations have synchronised as close as possible to the moment of release.

A

@AA-Turner
Copy link
Member

I'd also welcome soundings on if we ought to make a 7.2.7 release (an homage to the aeroplane, naturally).

A

@chrisjsewell
Copy link
Member Author

Would you consider contributing some of these tests upstream?

I would indeed, although not sure how much time I have right now 😅

@n-peugnet
Copy link
Contributor

but obviously a rule of thumb would be that if any text is wrapped in _(...) then it should have translations, and you shouldn't change it without changing the translations

I don't think this is how the translation workflow works with gettext. The source message have been changed to be more meaningful in the opinion of the software developer. Now this new message needs to be translated. The software developer is not expected to know all the languages the software is translated into, so it is not up to him/her to update all the translation files. Instead, new translation templates (.pot) files are generated from the source and each translation maintainer need to update its translation file (.po) and translate the new message. Translations always arrive a little bit later than the new messages but this is the way it is expected to work out.

@n-peugnet
Copy link
Contributor

@picnixz:

  • Design a tool that actually make the correct diff, namely extracts from the base file the "old" translated string and from the modified file the "new" translated string (so that we know which translation was changed to what).

    For all existing translations for the modified files, check if they were replaced by a new translated string or not. Here, we have two cases:

    1. The text (key) actually contained an error, so the key in the translated messages should be changed but not its content (translated text). This is an example of what happened in this issue.

    2. The text (key) changes its meaning and so should the translated messages. I don't have an example of that.

Well this is quite similar to what gettext already does. When a string changes, this creates a new message, gettext then tries to match this new message against the previous ones, finds the most similar one, use it as the translation and mark it as fuzzy. That way, the translators know that the string has been updated and can see the diff between the previous and updated string and choose if they want to make changes to the translated string or if it is not needed and just remove the fuzzy tag.
In the meantime, as long as the translation is marked as fuzzy, it is not used and falls-back to the source message, to prevent displaying outdated informations.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants