Commit 50253a9

feat: array_to_sentence_string and number_of_words filters from Jekyll, #443
1 parent b12eb8a commit 50253a9

9 files changed: +303 -10 lines changed

docs/source/_data/sidebar.yml

Lines changed: 2 additions & 0 deletions
@@ -29,6 +29,7 @@ filters:
   overview: overview.html
   abs: abs.html
   append: append.html
+  array_to_sentence_string: array_to_sentence_string.html
   at_least: at_least.html
   at_most: at_most.html
   capitalize: capitalize.html
@@ -63,6 +64,7 @@ filters:
   modulo: modulo.html
   newline_to_br: newline_to_br.html
   normalize_whitespace: normalize_whitespace.html
+  number_of_words: number_of_words.html
   plus: plus.html
   pop: pop.html
   push: push.html
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+---
+title: array_to_sentence_string
+---
+
+{% since %}v10.13.0{% endsince %}
+
+Convert an array into a sentence, which is useful for listing tags. An optional argument sets the connector word (default: `and`).
+
+Input
+```liquid
+{{ "foo,bar,baz" | split: "," | array_to_sentence_string }}
+```
+
+Output
+```text
+foo, bar, and baz
+```
+
+Input
+```liquid
+{{ "foo,bar,baz" | split: "," | array_to_sentence_string: "or" }}
+```
+
+Output
+```text
+foo, bar, or baz
+```
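
The joining rule documented above can be sketched as a standalone function. This is only an illustration: `toSentence` is a hypothetical name that is not part of LiquidJS, and the logic is modelled on the `array_to_sentence_string` implementation shown later in this commit.

```typescript
// Hypothetical standalone helper mirroring the documented behaviour.
function toSentence (items: unknown[], connector = 'and'): string {
  if (items.length === 0) return ''
  if (items.length === 1) return String(items[0])
  // Two items are joined by the connector alone, with no comma.
  if (items.length === 2) return `${items[0]} ${connector} ${items[1]}`
  // Three or more items: comma-separated, with the connector before the last item.
  const head = items.slice(0, -1).join(', ')
  return `${head}, ${connector} ${items[items.length - 1]}`
}

console.log(toSentence(['foo', 'bar', 'baz']))       // foo, bar, and baz
console.log(toSentence(['foo', 'bar', 'baz'], 'or')) // foo, bar, or baz
console.log(toSentence(['foo', 'bar']))              // foo and bar
```

Note that the serial comma only appears once there are three or more items.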
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+---
+title: number_of_words
+---
+
+{% since %}v10.13.0{% endsince %}
+
+Count the number of words in some text. This filter takes an optional argument to control the handling of Chinese-Japanese-Korean (CJK) characters in the input string:
+- Passing `'cjk'` as the argument counts every detected CJK character as one word, regardless of whether it is separated by whitespace.
+- Passing `'auto'` (auto-detect) works similarly to `'cjk'` but is more performant if the filter is used on a variable string that may or may not contain CJK characters.
+
+Input
+```liquid
+{{ "Hello world!" | number_of_words }}
+```
+
+Output
+```text
+2
+```
+
+Input
+```liquid
+{{ "你好hello世界world" | number_of_words }}
+```
+
+Output
+```text
+1
+```
+
+Input
+```liquid
+{{ "你好hello世界world" | number_of_words: "cjk" }}
+```
+
+Output
+```text
+6
+```
+
+Input
+```liquid
+{{ "你好hello世界world" | number_of_words: "auto" }}
+```
+
+Output
+```text
+6
+```
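
To make the three modes concrete, here is a minimal standalone sketch of the counting rules. `countWords` and the two regex names are hypothetical and chosen for illustration; the character ranges are copied from this commit's `src/filters/string.ts`, but the control flow is simplified and may differ from the real filter in performance characteristics.

```typescript
// Hypothetical names; ranges cover Han, Hiragana, Katakana and Hangul syllables.
const cjkChar = /[\u4E00-\u9FFF\uF900-\uFAFF\u3400-\u4DBF\u3040-\u309F\u30A0-\u30FF\uAC00-\uD7AF]/g
const nonCjkRun = /[^\u4E00-\u9FFF\uF900-\uFAFF\u3400-\u4DBF\u3040-\u309F\u30A0-\u30FF\uAC00-\uD7AF\s]+/g

function countWords (text: string, mode?: 'cjk' | 'auto'): number {
  const input = text.trim()
  if (!input) return 0
  const cjkCount = (input.match(cjkChar) || []).length
  if (mode === 'cjk' || (mode === 'auto' && cjkCount > 0)) {
    // Each CJK character is one word; each run of other non-space characters is one word.
    return cjkCount + (input.match(nonCjkRun) || []).length
  }
  // No mode, or 'auto' without CJK characters: split on whitespace only.
  return input.split(/\s+/).length
}

console.log(countWords('Hello world!'))              // 2
console.log(countWords('你好hello世界world'))          // 1
console.log(countWords('你好hello世界world', 'cjk'))   // 6
console.log(countWords('你好hello世界world', 'auto'))  // 6
```

The mixed string counts as 6 because it contains four CJK characters plus the two Latin runs `hello` and `world`; without a mode it contains no whitespace, so it is a single word.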

docs/source/filters/overview.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ There's 40+ filters supported by LiquidJS. These filters can be categorized into
 Categories | Filters
 --- | ---
 Math | plus, minus, modulo, times, floor, ceil, round, divided_by, abs, at_least, at_most
-String | append, prepend, capitalize, upcase, downcase, strip, lstrip, rstrip, strip_newlines, split, replace, replace_first, replace_last,remove, remove_first, remove_last, truncate, truncatewords, normalize_whitespace
+String | append, prepend, capitalize, upcase, downcase, strip, lstrip, rstrip, strip_newlines, split, replace, replace_first, replace_last, remove, remove_first, remove_last, truncate, truncatewords, normalize_whitespace, number_of_words, array_to_sentence_string
 HTML/URI | escape, escape_once, url_encode, url_decode, strip_html, newline_to_br, xml_escape, cgi_escape, uri_escape
 Array | slice, map, sort, sort_natural, uniq, where, where_exp, group_by, group_by_exp, find, find_exp, first, last, join, reverse, concat, compact, size, push, pop, shift, unshift
 Date | date, date_to_xmlschema, date_to_rfc822, date_to_string, date_to_long_string
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+---
+title: array_to_sentence_string
+---
+
+{% since %}v10.13.0{% endsince %}
+
+把数组转化为句子,用于做标签列表。有一个可选的连接词参数。
+
+输入
+```liquid
+{{ "foo,bar,baz" | split: "," | array_to_sentence_string }}
+```
+
+输出
+```text
+foo, bar, and baz
+```
+
+输入
+```liquid
+{{ "foo,bar,baz" | split: "," | array_to_sentence_string: "or" }}
+```
+
+输出
+```text
+foo, bar, or baz
+```
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+---
+title: number_of_words
+---
+
+{% since %}v10.13.0{% endsince %}
+
+计算文本中的单词数。此过滤器接受一个可选参数,用于控制输入字符串中汉字-日语-韩语(CJK)字符的处理方式:
+- `'cjk'`:将每个检测到的 CJK 字符计为一个单词,无论是否由空格分隔。
+- `'auto'`:与 `'cjk'` 类似,但如果过滤器用于可能包含或不包含 CJK 字符的字符串,则性能更好。
+
+输入
+```liquid
+{{ "Hello world!" | number_of_words }}
+```
+
+输出
+```text
+2
+```
+
+输入
+```liquid
+{{ "你好hello世界world" | number_of_words }}
+```
+
+输出
+```text
+1
+```
+
+输入
+```liquid
+{{ "你好hello世界world" | number_of_words: "cjk" }}
+```
+
+输出
+```text
+6
+```
+
+输入
+```liquid
+{{ "你好hello世界world" | number_of_words: "auto" }}
+```
+
+输出
+```text
+6
+```

docs/source/zh-cn/filters/overview.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ LiquidJS 共支持 40+ 个过滤器,可以分为如下几类:
 类别 | 过滤器
 --- | ---
 数学 | plus, minus, modulo, times, floor, ceil, round, divided_by, abs, at_least, at_most
-字符串 | append, prepend, capitalize, upcase, downcase, strip, lstrip, rstrip, strip_newlines, split, replace, replace_first, replace_last, remove, remove_first, remove_last, truncate, truncatewords, normalize_whitespace
+字符串 | append, prepend, capitalize, upcase, downcase, strip, lstrip, rstrip, strip_newlines, split, replace, replace_first, replace_last, remove, remove_first, remove_last, truncate, truncatewords, normalize_whitespace, number_of_words, array_to_sentence_string
 HTML/URI | escape, escape_once, url_encode, url_decode, strip_html, newline_to_br, xml_escape, cgi_escape, uri_escape
 数组 | slice, map, sort, sort_natural, uniq, where, where_exp, group_by, group_by_exp, find, find_exp, first, last, join, reverse, concat, compact, size, push, pop, shift, unshift
 日期 | date, date_to_xmlschema, date_to_rfc822, date_to_string, date_to_long_string

src/filters/string.ts

Lines changed: 51 additions & 8 deletions
@@ -3,8 +3,20 @@
  *
  * * prefer stringify() to String() since `undefined`, `null` should eval ''
  */
+
+// Han (Chinese) characters: \u4E00-\u9FFF
+// Additional Han characters: \uF900-\uFAFF (CJK Compatibility Ideographs)
+// Additional Han characters: \u3400-\u4DBF (CJK Unified Ideographs Extension A)
+// Katakana (Japanese): \u30A0-\u30FF
+// Hiragana (Japanese): \u3040-\u309F
+// Hangul (Korean): \uAC00-\uD7AF
 import { assert, escapeRegExp, stringify } from '../util'
 
+const rCJKWord = /[\u4E00-\u9FFF\uF900-\uFAFF\u3400-\u4DBF\u3040-\u309F\u30A0-\u30FF\uAC00-\uD7AF]/gu
+
+// Runs of non-CJK, non-whitespace characters (each run is counted as one word)
+const rNonCJKWord = /[^\u4E00-\u9FFF\uF900-\uFAFF\u3400-\u4DBF\u3040-\u309F\u30A0-\u30FF\uAC00-\uD7AF\s]+/gu
+
 export function append (v: string, arg: string) {
   assert(arguments.length === 2, 'append expect 2 arguments')
   return stringify(v) + stringify(arg)
@@ -32,16 +44,16 @@ export function upcase (str: string) {
 }
 
 export function remove (v: string, arg: string) {
-  return stringify(v).split(String(arg)).join('')
+  return stringify(v).split(stringify(arg)).join('')
 }
 
 export function remove_first (v: string, l: string) {
-  return stringify(v).replace(String(l), '')
+  return stringify(v).replace(stringify(l), '')
 }
 
 export function remove_last (v: string, l: string) {
   const str = stringify(v)
-  const pattern = String(l)
+  const pattern = stringify(l)
   const index = str.lastIndexOf(pattern)
   if (index === -1) return str
   return str.substring(0, index) + str.substring(index + pattern.length)
@@ -56,7 +68,7 @@ export function rstrip (str: string, chars?: string) {
 }
 
 export function split (v: string, arg: string) {
-  const arr = stringify(v).split(String(arg))
+  const arr = stringify(v).split(stringify(arg))
   // align to ruby split, which is the behavior of shopify/liquid
   // see: https://ruby-doc.org/core-2.4.0/String.html#method-i-split
   while (arr.length && arr[arr.length - 1] === '') arr.pop()
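
The switch from `String(arg)` to `stringify(arg)` in these hunks matters when the argument is `undefined` or `null`. The sketch below illustrates the difference for a `remove`-style split and join. `stringifyLike` is a simplified, hypothetical stand-in for the `stringify` helper imported from `'../util'`; the only behaviour assumed is the one stated in the file's header comment, namely that `undefined` and `null` evaluate to `''`.

```typescript
// Simplified stand-in (assumption): the real helper may handle more cases.
function stringifyLike (value: unknown): string {
  return value === null || value === undefined ? '' : String(value)
}

const input = 'a-undefined-b'
const missing: unknown = undefined

// With String(), the pattern becomes the literal text 'undefined',
// so that word is silently removed from the input.
const before = input.split(String(missing)).join('')

// With the stand-in, the pattern is '', so the string is split into
// single characters and joined back unchanged.
const after = input.split(stringifyLike(missing)).join('')

console.log(before) // a--b
console.log(after)  // a-undefined-b
```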
@@ -83,19 +95,19 @@ export function capitalize (str: string) {
 }
 
 export function replace (v: string, pattern: string, replacement: string) {
-  return stringify(v).split(String(pattern)).join(replacement)
+  return stringify(v).split(stringify(pattern)).join(replacement)
 }
 
 export function replace_first (v: string, arg1: string, arg2: string) {
-  return stringify(v).replace(String(arg1), arg2)
+  return stringify(v).replace(stringify(arg1), arg2)
 }
 
 export function replace_last (v: string, arg1: string, arg2: string) {
   const str = stringify(v)
-  const pattern = String(arg1)
+  const pattern = stringify(arg1)
   const index = str.lastIndexOf(pattern)
   if (index === -1) return str
-  const replacement = String(arg2)
+  const replacement = stringify(arg2)
   return str.substring(0, index) + replacement + str.substring(index + pattern.length)
 }
 
@@ -117,3 +129,34 @@ export function normalize_whitespace (v: string) {
   v = stringify(v)
   return v.replace(/\s+/g, ' ')
 }
+
+export function number_of_words (input: string, mode?: 'cjk' | 'auto') {
+  input = stringify(input).trim()
+  if (!input) return 0
+  switch (mode) {
+    case 'cjk':
+      // Count each CJK character plus each run of non-CJK characters
+      return (input.match(rCJKWord) || []).length + (input.match(rNonCJKWord) || []).length
+    case 'auto':
+      // If any CJK character is present, count as in 'cjk'; otherwise split on whitespace
+      return rCJKWord.test(input)
+        ? input.match(rCJKWord)!.length + (input.match(rNonCJKWord) || []).length
+        : input.split(/\s+/).length
+    default:
+      // Count whitespace-separated words only
+      return input.split(/\s+/).length
+  }
+}
+
+export function array_to_sentence_string (array: unknown[], connector = 'and') {
+  switch (array.length) {
+    case 0:
+      return ''
+    case 1:
+      return array[0]
+    case 2:
+      return `${array[0]} ${connector} ${array[1]}`
+    default:
+      return `${array.slice(0, -1).join(', ')}, ${connector} ${array[array.length - 1]}`
+  }
+}
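
One detail worth noting when reading the `'auto'` branch: `rCJKWord` carries the `g` flag, so `test()` is stateful through `lastIndex`. The sketch below, which uses a hypothetical single-letter regex for brevity, suggests why calling `test()` and then `match()` on the same global regex still behaves predictably: `String.prototype.match` with a global regex always scans from index 0 and leaves `lastIndex` at 0.

```typescript
// Hypothetical example regex; any regex with the g flag behaves the same way.
const hasA = /a/g
const text = 'xa'

const first = hasA.test(text)          // matches at index 1
const indexAfterTest = hasA.lastIndex  // now points past the match

// A second test() resumes from lastIndex, finds nothing, and resets lastIndex to 0.
const second = hasA.test(text)

// Leave the regex in the "dirty" state again, then call match().
hasA.test(text)
const matches = text.match(hasA)       // scans from index 0 regardless of lastIndex
const indexAfterMatch = hasA.lastIndex

console.log(first, indexAfterTest, second) // true 2 false
console.log(matches, indexAfterMatch)      // [ 'a' ] 0
```

Reusing a global regex with `test()` alone across calls would be fragile; in the filter above, a successful `test()` is always followed by `match()`, which resets the state.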

test/integration/filters/string.spec.ts

Lines changed: 96 additions & 0 deletions
@@ -238,4 +238,100 @@ describe('filters/string', function () {
       expect(liquid.parseAndRenderSync('{{ "a \n b c" | normalize_whitespace }}')).toEqual('a b c')
     })
   })
+  describe('number_of_words', () => {
+    it('should count words of Latin sentence', async () => {
+      const html = await liquid.parseAndRender('{{ "I\'m not hungry" | number_of_words: "auto"}}')
+      expect(html).toEqual('3')
+    })
+
+    it('should count words of Latin sentence without a mode', async () => {
+      const html = await liquid.parseAndRender('{{ "Hello world!" | number_of_words }}')
+      expect(html).toEqual('2')
+    })
+
+    it('should count words of CJK sentence', async () => {
+      const html = await liquid.parseAndRender('{{ "你好hello世界world" | number_of_words }}')
+      expect(html).toEqual('1')
+    })
+
+    it('should count words of CJK sentence with mode "cjk"', async () => {
+      const html = await liquid.parseAndRender('{{ "你好hello世界world" | number_of_words: "cjk" }}')
+      expect(html).toEqual('6')
+    })
+
+    it('should count words of CJK sentence with mode "auto"', async () => {
+      const html = await liquid.parseAndRender('{{ "你好hello世界world" | number_of_words: "auto" }}')
+      expect(html).toEqual('6')
+    })
+    it('should handle empty input', async () => {
+      const html = await liquid.parseAndRender('{{ "" | number_of_words }}')
+      expect(html).toEqual('0')
+    })
+
+    it('should handle input with only whitespace', async () => {
+      const html = await liquid.parseAndRender('{{ " " | number_of_words }}')
+      expect(html).toEqual('0')
+    })
+
+    it('should count words with punctuation marks', async () => {
+      const html = await liquid.parseAndRender('{{ "Hello! This is a test." | number_of_words }}')
+      expect(html).toEqual('5')
+    })
+
+    it('should count words with special characters', async () => {
+      const html = await liquid.parseAndRender('{{ "This is a test with special characters: !@#$%^&*()-_+=`~[]{};:\'\\"\\|<,>.?/" | number_of_words }}')
+      expect(html).toEqual('8')
+    })
+
+    it('should count words with multiple spaces between words', async () => {
+      const html = await liquid.parseAndRender('{{ " Hello world! " | number_of_words }}')
+      expect(html).toEqual('2')
+    })
+
+    it('should count words with mixed CJK characters', async () => {
+      const html = await liquid.parseAndRender('{{ "你好こんにちは안녕하세요" | number_of_words: "cjk" }}')
+      expect(html).toEqual('12')
+    })
+  })
+  describe('array_to_sentence_string', () => {
+    it('should handle an empty array', async () => {
+      const html = await liquid.parseAndRender('{{ arr | array_to_sentence_string }}', { arr: [] })
+      expect(html).toEqual('')
+    })
+
+    it('should handle an array with one element', async () => {
+      const html = await liquid.parseAndRender('{{ arr | array_to_sentence_string }}', { arr: ['apple'] })
+      expect(html).toEqual('apple')
+    })
+
+    it('should handle an array with two elements', async () => {
+      const html = await liquid.parseAndRender('{{ arr | array_to_sentence_string }}', { arr: ['apple', 'banana'] })
+      expect(html).toEqual('apple and banana')
+    })
+
+    it('should handle an array with more than two elements', async () => {
+      const html = await liquid.parseAndRender('{{ arr | array_to_sentence_string }}', { arr: ['apple', 'banana', 'orange'] })
+      expect(html).toEqual('apple, banana, and orange')
+    })
+
+    it('should handle an array with custom connector', async () => {
+      const html = await liquid.parseAndRender('{{ arr | array_to_sentence_string: "or" }}', { arr: ['apple', 'banana', 'orange'] })
+      expect(html).toEqual('apple, banana, or orange')
+    })
+
+    it('should handle an array of numbers', async () => {
+      const html = await liquid.parseAndRender('{{ arr | array_to_sentence_string }}', { arr: [1, 2, 3] })
+      expect(html).toEqual('1, 2, and 3')
+    })
+
+    it('should handle an array of mixed types', async () => {
+      const html = await liquid.parseAndRender('{{ arr | array_to_sentence_string }}', { arr: ['apple', 2, 'orange'] })
+      expect(html).toEqual('apple, 2, and orange')
+    })
+
+    it('should handle an array produced by split', async () => {
+      const html = await liquid.parseAndRender('{{ "foo,bar,baz" | split: "," | array_to_sentence_string }}')
+      expect(html).toEqual('foo, bar, and baz')
+    })
+  })
 })
