1 file changed: +2 -2 lines changed
@@ -69,8 +69,8 @@ https://huggingface.co/datasets/cahya/simple-wikipedia/resolve/main/simple-wikip
 ## Tools using this tokenizer

 We also created the [json2bin](https://github.com/cahya-wirawan/json2bin) application to convert datasets from JSONL format
-into binidx format, a data format used for training RWKV models. It supports batch encoding with multithreading and
-can convert a dataset more than 70 times faster than the original json2binidx program written in Python.
+into binidx format, a data format used for training RWKV models. It uses multithreading to scale up the performance and
+can convert a dataset more than 70 times faster (around 360 MB/s) than the original json2binidx program written in Python.

 ## Changelog
 - Version 0.9.0
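
As a rough reading of the figures in the added lines (a back-of-the-envelope sketch, not a benchmark): if json2bin reaches around 360 MB/s and that is more than 70 times faster than the Python json2binidx, the implied Python throughput is at most about 5 MB/s. The helper names and the 10 GB dataset size below are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope check of the throughput claim in this diff.
# Assumptions: 1 GB = 1000 MB; the 10 GB dataset size is hypothetical.

JSON2BIN_MB_PER_S = 360.0   # "around 360 MB/s" from the README change
SPEEDUP = 70.0              # "more than 70 times faster", used as a lower bound


def implied_baseline_mb_per_s(fast_mb_per_s: float, speedup: float) -> float:
    """Upper bound on the slower tool's throughput implied by the speedup."""
    return fast_mb_per_s / speedup


def conversion_seconds(dataset_mb: float, mb_per_s: float) -> float:
    """Time to convert a dataset at a constant throughput."""
    return dataset_mb / mb_per_s


baseline = implied_baseline_mb_per_s(JSON2BIN_MB_PER_S, SPEEDUP)
dataset_mb = 10_000.0  # hypothetical 10 GB JSONL file

print(f"implied Python baseline: <= {baseline:.1f} MB/s")
print(f"json2bin:    {conversion_seconds(dataset_mb, JSON2BIN_MB_PER_S):.0f} s")
print(f"json2binidx: >= {conversion_seconds(dataset_mb, baseline) / 60:.0f} min")
```

Under these assumptions the same file takes well under a minute with json2bin and on the order of half an hour with the Python tool; real timings depend on hardware, thread count, and the dataset.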