Update JavaScript stemmer code to the latest version of Snowball (v2.1.0) #8867
The stemmer code currently used in Sphinx is based on an old version of Snowball (it was added in 059a74c in 2015). Furthermore, it is not pure JS code, but rather “JSX” (not the widely known React JSX, but https://github.com/jsx/JSX, which has also been dead since 2015).
Since then, there have been two Snowball releases: v2.0.0 in October 2019 and v2.1.0 recently, in January 2021. In snowballstem/snowball@ddbe1fa9d5da7db0, JSX was replaced with native JS, which is much easier to maintain and understand.
I should also note that our Python code uses the snowballstemmer module, which is also at v2.1.0 now, as that module is based on the same Snowball code base (so this pull request brings the JS up to date with the Python).
What I did in this pull request:
- I updated the non-minified JS files to the latest version from Snowball. To generate them, I cloned the Snowball repository and ran `make dist_libstemmer_js`, then copied them from the generated archive to the Sphinx repository.
- The minified files were embedded in Python code, which made the code ugly and difficult to maintain. For consistency with the non-minified files, I created a separate directory (`minified-js`) and put the files there.
- The minified files were created from the non-minified ones using `uglifyjs`. Its `-m` option enables name mangling, i.e. using short names for variables that are only used internally.
- To make the page load faster, the JS code is still embedded into `language_data.js`, not copied as its own file.
- Note: there is now `base-stemmer.js`, which defines the base class for all stemmers. That file is copied/embedded alongside the language-specific code.

With this pull request, it will also be easier to add new languages. In addition to what Sphinx already has, Snowball now supports Arabic, Armenian, Basque, Catalan, Greek, Hindi, Indonesian, Irish, Lithuanian, Nepali, Serbian, Tamil and Yiddish. I did not add new languages here, just updated the existing ones.
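To illustrate why `base-stemmer.js` has to travel together with the language-specific file when the code is embedded into `language_data.js`, here is a minimal sketch. It is not the actual Sphinx implementation: `buildStemmerSection`, the `Stemmer` alias, and the toy `BaseStemmer`/`ToyStemmer` sources are hypothetical stand-ins, and the real embedding is done on the Python side.

```javascript
// Hypothetical helper (an assumption for illustration, not Sphinx's code):
// builds the stemmer portion of language_data.js from two minified sources.
function buildStemmerSection(baseStemmerSource, languageStemmerSource, className) {
    return [
        // The base class goes first, because the language-specific code relies on it.
        baseStemmerSource.trim(),
        languageStemmerSource.trim(),
        // Expose the language-specific class under one fixed name (assumed here
        // to be `Stemmer`) so the calling code need not know the language.
        'var Stemmer = ' + className + ';',
    ].join('\n') + '\n';
}

// Toy stand-ins for the contents of base-stemmer.js and one language file.
// The toy stemmer only strips a trailing "s"; it is not a Snowball algorithm.
const toyBase =
    'function BaseStemmer(){var c="";' +
    'this.setCurrent=function(w){c=w};' +
    'this.getCurrent=function(){return c}}';
const toyLang =
    'function ToyStemmer(){var b=new BaseStemmer();' +
    'this.stemWord=function(w){b.setCurrent(w.replace(/s$/,""));' +
    'return b.getCurrent()}}';

const section = buildStemmerSection(toyBase, toyLang, 'ToyStemmer');

// Evaluate the embedded code in an isolated function scope and stem one word.
const stemmed = new Function(section + 'return new Stemmer().stemWord("cats");')();
console.log(stemmed); // prints "cat"
```

If the base source were left out, evaluating the section would fail at `new BaseStemmer()`, which is why both files are copied or embedded as a pair.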