
Commit 663493e

updated README.md
1 parent 1337ec2 commit 663493e

1 file changed: +2 -2 lines changed

README.md

Lines changed: 2 additions & 2 deletions
@@ -69,8 +69,8 @@ https://huggingface.co/datasets/cahya/simple-wikipedia/resolve/main/simple-wikip
 ## Tools using this tokenizer
 
 We also created the [json2bin](https://github.com/cahya-wirawan/json2bin) application to convert datasets from JSONL format
-into binidx format, a data format used for training RWKV models. It supports batch encoding with multithreading and
-can convert a dataset more than 70 times faster than the original json2binidx program written in Python.
+into binidx format, a data format used for training RWKV models. It uses multithreading to scale up the performance and
+can convert a dataset more than 70 times faster (around 360 MB/s) than the original json2binidx program written in Python.
 
 ## Changelog
 - Version 0.9.0