Remove CJK language setting and fix reading time on mixed text. Fixes #10031 #10032

DeinAlptraum · 2022-06-18T13:44:38Z

This removes the hasCJKLanguage setting, instead determining it dynamically on a word-by-word basis by checking for the presence of CJK characters.

This fixes the reading time for mixed texts by computing separate counts for CJK and non-CJK text, computing a reading time for each with its own formula, and summing the two.
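As a rough illustration of the approach described above (not the code in this PR), the sketch below classifies each whitespace-separated field by whether it contains a CJK rune, counts runes for CJK fields and words for everything else, and sums the two reading times. The helper names `isCJK`, `countMixed` and `readingTime` are hypothetical. The set of scripts checked, the choice to count every rune of a field that contains CJK, and rounding the summed minutes up are all assumptions. The 501 and 213 rates are the ones quoted later in this thread.

```go
package main

import (
	"fmt"
	"math"
	"strings"
	"unicode"
	"unicode/utf8"
)

// isCJK reports whether r belongs to one of the scripts this sketch
// treats as CJK. The exact set of scripts is an assumption.
func isCJK(r rune) bool {
	return unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul)
}

// countMixed splits text on whitespace and returns the number of runes in
// fields that contain at least one CJK rune, plus the number of remaining
// fields, which are counted as ordinary words.
func countMixed(text string) (cjkRunes, words int) {
	for _, field := range strings.Fields(text) {
		if strings.IndexFunc(field, isCJK) >= 0 {
			// Assumption: every rune of a field containing CJK is counted.
			cjkRunes += utf8.RuneCountInString(field)
		} else {
			words++
		}
	}
	return cjkRunes, words
}

// readingTime sums the two partial reading times in minutes and rounds up.
// Rounding once after summing is an assumption of this sketch.
func readingTime(cjkRunes, words int) int {
	minutes := float64(cjkRunes)/501 + float64(words)/213
	return int(math.Ceil(minutes))
}

func main() {
	cjk, words := countMixed("Hugo は 速い static site generator")
	fmt.Println("CJK runes:", cjk, "words:", words)
	fmt.Println("minutes:", readingTime(cjk, words))
}
```

Splitting on whitespace first means the same loop handles scripts with and without spaces, since only the rune count matters for the CJK fields.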

CLAassistant · 2022-06-18T13:44:42Z

All committers have signed the CLA.

github-actions · 2023-06-19T02:02:12Z

This PR has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help.
Please check https://github.com/gohugoio/hugo/blob/master/CONTRIBUTING.md#code-contribution and verify that this code contribution fits with the description. If yes, tell us in a comment.
This PR will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

DeinAlptraum · 2023-06-19T13:06:08Z

Yes, I believe it fits the description, including closing an open issue.

DeinAlptraum · 2025-05-23T09:27:31Z

I have updated this branch to resolve conflicts, and have also improved the implementation slightly.

jmooring · 2025-05-25T16:45:41Z

I'm not a linguistics expert, but my understanding is that CJK languages do have explicit word separators like spaces in English. But we're relying on strings.Fields to separate words, so I'm not sure how this addresses #10031.

cc: @davidsneighbour

DeinAlptraum · 2025-05-25T23:45:30Z

@jmooring no, there are no word separators in Chinese/Japanese. The concept of a "word" is not even clearly defined in those languages, which is why counting characters is the best one can do.
I just looked up Korean and it seems that they do use spaces. We could use a different formula here or only treat CJ differently if you prefer.

In either case, I don't think this affects the resolution of #10031: the problem is that, with hasCJKLanguage enabled, the reading time is always computed as if the text consisted only of CJK characters. For example, a text that contains one CJK character and 10,000 non-CJK words currently results in a reading time of about 20 minutes (rather than the expected 47 minutes) because it applies the CJK formula to the whole text, rather than "501 runes per minute" for the one CJK character and "213 words per minute" for the remaining 10,000 words.
Splitting at spaces for CJK languages is not an issue even if they have spaces (like Korean) because we only count the number of runes, so this doesn't affect the result anyway.
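The arithmetic behind the 20 versus 47 minute figures can be checked with a small sketch. It assumes round-up integer division and takes the 501 and 213 rates from the comment above. The function names are hypothetical and this is not Hugo's actual code.

```go
package main

import "fmt"

const (
	cjkRunesPerMinute = 501 // rate quoted in this thread for CJK runes
	wordsPerMinute    = 213 // rate quoted in this thread for non-CJK words
)

// ceilDiv returns n/d rounded up, for non-negative n and positive d.
func ceilDiv(n, d int) int {
	return (n + d - 1) / d
}

// cjkOnlyMinutes models the behaviour described above: every counted
// unit, CJK or not, is divided by the CJK rate.
func cjkOnlyMinutes(cjkRunes, words int) int {
	return ceilDiv(cjkRunes+words, cjkRunesPerMinute)
}

// mixedMinutes applies each rate to its own count and rounds the sum up
// once. It computes cjkRunes/501 + words/213 exactly over the common
// denominator 501*213.
func mixedMinutes(cjkRunes, words int) int {
	numerator := cjkRunes*wordsPerMinute + words*cjkRunesPerMinute
	return ceilDiv(numerator, cjkRunesPerMinute*wordsPerMinute)
}

func main() {
	fmt.Println("CJK formula only:", cjkOnlyMinutes(1, 10000)) // 20
	fmt.Println("separate formulas:", mixedMinutes(1, 10000))  // 47
}
```

With one CJK rune and 10,000 words, the first function divides 10,001 by 501, while the second adds roughly 0.002 and 46.95 minutes before rounding up.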

This PR resolves that issue with mixed text, by computing separate counts for CJK characters as rune count and everything else via word count. This is still not perfect obviously (is Korean read at a similar speed as Chinese? what about Arabic anyway? etc.) but it's an improvement over the current situation at least, and eliminates an unnecessary setting.

github-actions bot added the Stale label Jun 19, 2023

github-actions bot removed the Stale label Jun 20, 2023

Fix reading time on mixed CJK/non-CJK text. Fixes gohugoio#10031

fde4893

DeinAlptraum force-pushed the fix-reading-time-mixed-cjk branch from b200173 to fde4893 Compare May 23, 2025 09:23

DeinAlptraum changed the title ~~Fix reading time on mixed CJK/non-CJK text. Fixes #10031~~ Remove CJK language setting and fix reading time on mixed CJK/non-CJK text. Fixes #10031 May 23, 2025

DeinAlptraum changed the title ~~Remove CJK language setting and fix reading time on mixed CJK/non-CJK text. Fixes #10031~~ Remove CJK language setting and fix reading time on mixed text. Fixes #10031 May 23, 2025

Remove unused package

3733707
