
Semantic Comments #53

Closed
@zbraniecki

One of the features we're debating for Fluent is a concept of Semantic Comments.

The idea is to define a syntax for comments that can be easily read in a "text-like" mode, but can also carry semantic meaning that tools can interpret, perhaps similar to how a Markdown file works.

Generally speaking, semantic comments are not meant to have any meaning at runtime, so comments do not need to be parsed or carried into the runtime environment.

The original idea has been described in projectfluent/fluent#16 and then extended to cover several areas:

Meta information

# A message displayed in the UI for license acceptance
#
# @context button
# @tone formal, polite
# @mode simple
# @policy firefox-brand-policy
license-accept-button = I have read Firefox Policy and accept the license terms

Original issue: projectfluent/fluent#139

Meta information generally describes some form of key-value pairs where keys are defined to carry a meaning that can be interpreted by tools like CAT tools, validators, etc.

Examples of use cases:

  • UI context of the message
  • Tone/formality of the message
  • Mode/Toolkit - HTML, React, no-html, simple etc.
  • License ID
  • Policy meta-information (brand usage etc.)
  • Message versions (tier 2)
  • Variables
  • ...

By design the concept is fairly extensible, and the value can also be an array, boolean, number or record, not just a string.
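As a rough illustration of how a tool might read such key-value pairs, here is a minimal Python sketch. The `parse_meta` helper and its regular expression are hypothetical and not part of any Fluent tooling; it assumes each `@key value` pair sits on its own comment line and keeps every value as a raw string (a list per key, since a key may repeat).

```python
import re

# Hypothetical pattern: "# @key value" on a single comment line.
META_LINE = re.compile(r"^#\s*@(?P<key>[A-Za-z][\w-]*)\s+(?P<value>.+?)\s*$")


def parse_meta(comment_lines):
    """Split comment lines into a free-text description and @key values."""
    description, meta = [], {}
    for line in comment_lines:
        match = META_LINE.match(line)
        if match:
            # Keep a list per key because the same key may appear more than once.
            meta.setdefault(match["key"], []).append(match["value"])
        else:
            text = line.lstrip("#").strip()
            if text:
                description.append(text)
    return " ".join(description), meta


comment = [
    "# A message displayed in the UI for license acceptance",
    "#",
    "# @context button",
    "# @tone formal, polite",
    "# @mode simple",
    "# @policy firefox-brand-policy",
]
description, meta = parse_meta(comment)
```

Interpreting a value as an array, boolean, number or record would be a second step layered on top of the raw strings, which is left out here because the issue does not define that syntax.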

Variables

# Variables:
#   $value (Number) - Value of the unit (for example: 4.6, 500)
#   $unit (String) - Name of the unit (for example: "bytes", "KB")
sitedata-total-size = Your stored cookies, site data and cache are currently using { $value } { $unit } of disk space.

Original issue: projectfluent/fluent#140

This is a particular example of meta-information, which could be desugared to:

#   @var $value (Number) - Value of the unit (for example: 4.6, 500)
#   @var $unit (String) - Name of the unit (for example: "bytes", "KB")
sitedata-total-size = Your stored cookies, site data and cache are currently using { $value } { $unit } of disk space.

The value of the @var meta-information key then has its own syntax, containing the name of the variable, its type, its description and a list of examples.
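A tool could split that value into its four parts roughly as follows. This is only a sketch in Python: `VarInfo`, `parse_var` and both patterns are hypothetical names, and it assumes the exact `$name (Type) - description (for example: a, b)` layout used in the example above. Examples are split naively on commas and left as strings.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns matching the layout shown in the example above.
VAR_VALUE = re.compile(r"^\$(?P<name>[\w-]+)\s+\((?P<type>\w+)\)\s+-\s+(?P<rest>.*)$")
EXAMPLES = re.compile(r"\(for example:\s*(?P<items>.*)\)\s*$")


@dataclass
class VarInfo:
    name: str
    type: str
    description: str
    examples: list = field(default_factory=list)


def parse_var(value):
    """Parse the value of one @var entry; return None if it does not match."""
    match = VAR_VALUE.match(value.strip())
    if match is None:
        return None
    rest = match["rest"]
    examples = []
    found = EXAMPLES.search(rest)
    if found:
        # Naive split: an example that itself contains a comma would break this.
        examples = [item.strip().strip('"') for item in found["items"].split(",")]
        rest = rest[: found.start()].strip()
    return VarInfo(match["name"], match["type"], rest, examples)


info = parse_var('$unit (String) - Name of the unit (for example: "bytes", "KB")')
```

With a record like this, a validator could compare `info.type` against the formatter a message applies to the variable, and a preview tool could substitute one of `info.examples`.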

All of those are fully human-readable and can at the same time be interpreted by tooling, allowing for easy error checks (You're passing a string to a DATETIME formatter!) and for a more WYSIWYG/QA mode where the localizer can see a formatted string with an example value and check if it looks good.

Here's a prototype I once wrote for a tool that guides a localizer through a variant-based translation without expecting the user to step out of the "Q/A" mode - https://labs.braniecki.net/l10n-tool/
Such a UX requires examples for variables, and this meta-information could provide them.

String Versions

# @version 2
my-string = This is a slightly updated string

Original issue: projectfluent/fluent#141

This comes from an observation that string changes fall into one of three types:

  1. Original string spelling update
  2. Slight modifications
  3. Significant update

In scenario (1) we'd like to update the source string without bothering anyone. Gettext, for example, doesn't allow for that since it uses the source string as an ID, thus any update to it invalidates all translations.
In scenario (3), we'd like to invalidate all "old" translations since they are not applicable to the new meaning of the source string. Fluent handles that by asking developers to change the ID, breaking the social contract and establishing a new one under a new ID.

Scenario (2) is the more nuanced one. The value of the original string does get updated, and we'd like to notify translators that they may want to look at the message again and consider whether their translation should be updated. But since we didn't change the meaning, we don't want to invalidate all translations: we believe the old translation is better for users than falling back to the source language for the string.

Currently Fluent doesn't provide that model, while our initial analysis indicates that at least 10% of changes could fall into that category. Currently, to stay on the safe side, we invalidate such translations, incurring cost and requiring translators to provide a new translation.
Some of that cost is mitigated by MT, but not all.

This proposal addresses that by providing an optional meta-information key named @version. The initial, implicit version of every string is 1. When a string gets updated in scenario (2), the developer can "bump" its version to @version 2, and tooling can then detect that all translations without @version are on version 1 and let translators know that this message has an update they may want to take a look at.
At the same time, software will still use version 1 of the translations since it's good enough.
Once the translator takes a look, they either update the translation or mark it as "good", and @version 2 gets set in the meta-information of the translated message. The cycle continues.
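The cycle could be modeled in tooling along these lines. This is a minimal Python sketch under the assumption that each message's meta-information is available as a plain dict of strings; `message_version`, `review_status` and `mark_reviewed` are hypothetical names, not an existing API.

```python
def message_version(meta):
    """Return the @version of a message; a missing @version means version 1."""
    return int(meta.get("version", 1))


def review_status(source_meta, translation_meta):
    """Tell tooling whether a translation lags behind the source string."""
    if message_version(translation_meta) >= message_version(source_meta):
        return "up-to-date"
    # The translation stays usable at runtime; this only flags it for review.
    return "needs-review"


def mark_reviewed(source_meta, translation_meta):
    """Record that the translator updated or approved the translation."""
    updated = dict(translation_meta)
    updated["version"] = str(message_version(source_meta))
    return updated


source = {"version": "2"}   # developer bumped the source string
translation = {}            # no @version, so implicitly version 1

before = review_status(source, translation)
after = review_status(source, mark_reviewed(source, translation))
```

Here `before` is `"needs-review"` and `after` is `"up-to-date"`. Nothing in this flow touches the runtime, which matches the intent that an older translation keeps being used until someone reviews it.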

I suspect that the 10% is lower than the actual value of that mode. I believe that if we were to start using it in production, developers would more often prefer scenario (2) over (3), and as a result more translations would be preserved for longer, with less cost to users and localizers.

Title Line

## Privacy Section - Site Data
##
## This section will contain several messages
## that should be translated by a lawyer if possible.

privacy-msg1 = Foo
privacy-msg2 = Foo 2

##

Original issue: projectfluent/fluent#138

Multiline comments pose an issue for some forms of CAT tool display. In Pontoon (a CAT tool developed for Fluent), we could use the concept of Group Comments (#40) to visually cluster lists of messages in the left pane, but since a Group Comment can span multiple lines, there's no good way to semantically decide how to display a multi-line group comment in a single line for that use case.

Title Line could be an optional rule stating that if a comment is multi-line and its first line is separated from the rest by an empty line, then that first line is treated as a Title Line.

Such a line could serve as a "summary" of the comment, group, or resource, and have some highlighting markup added when displayed.
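The rule is simple enough to sketch in a few lines of Python. `split_title` is a hypothetical helper; it assumes an "empty line" means a comment line that holds nothing but the `#` sigils, as in the example above.

```python
def split_title(comment_lines):
    """Return (title, body); title is None when the Title Line rule does not apply."""
    # Drop the leading '#' sigils and surrounding whitespace from each line.
    lines = [line.lstrip("#").strip() for line in comment_lines]
    has_title = (
        len(lines) >= 3      # multi-line comment
        and lines[0] != ""   # non-empty first line
        and lines[1] == ""   # separated from the rest by an empty line
        and any(lines[2:])   # something actually follows the separator
    )
    if has_title:
        return lines[0], "\n".join(lines[2:])
    return None, "\n".join(lines)


group_comment = [
    "## Privacy Section - Site Data",
    "##",
    "## This section will contain several messages",
    "## that should be translated by a lawyer if possible.",
]
title, body = split_title(group_comment)
```

For this input `title` is `"Privacy Section - Site Data"`, which a CAT tool could show as the one-line label for the group while keeping `body` for the detail view.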

In the original issue you can see two mocks of how a regular syntax highlighter and Pontoon could benefit from being able to identify the title line of a multiline comment.

There are potentially other uses of such meta-information. As with the rest of the planning, we may decide to settle on just the data model and leave the syntax to describe it out of scope, but I think it's useful to imagine some implementable syntax for each piece of the data model that could actually work.

Labels: out-of-scope?, requirements, resolve-candidate