Remove CJK language setting and fix reading time on mixed text. Fixes #10031 #10032
This PR has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help.
Yes, I believe it fits the description, including closing an open issue.
b200173 to fde4893
I have updated this branch to resolve conflicts, and have also improved the implementation slightly.
I'm not a linguistics expert, but my understanding is that CJK languages do have explicit word separators like spaces in English. But we're relying on cc: @davidsneighbour
@jmooring No, there are no word separators in Chinese/Japanese. The concept of a "word" is not even clearly defined in those languages, which is why counting characters is the best one can do. In either case, I don't think this affects the resolution of #10031: the problem is that, with This PR resolves that issue with mixed text by counting CJK characters as a rune count and everything else as a word count. This is obviously still not perfect (is Korean read at a similar speed to Chinese? What about Arabic? etc.), but it is at least an improvement over the current situation, and it eliminates an unnecessary setting.
This removes the hasCJKLanguage setting, instead determining it dynamically on a word-by-word basis by checking for the presence of CJK characters. This fixes the reading time for mixed texts by computing separate word counts for CJK and non-CJK text, then computing their reading times separately via the two formulas, and finally summing them up.