Also download and tokenize datasets in pure C #230
matiasdelellis
started this conversation in
Ideas
Replies: 1 comment
That's just because we're initializing from OpenAI GPT-2 weights and we're using Python to download and write them conveniently.
"There is no need for 245MB of PyTorch or 107MB of cPython"...?
I loved this statement, but when I went to install the dependencies, it turned out that several gigabytes of Python dependencies are needed just to download the datasets. 😞
I guess this could also be implemented in pure C. Of course, I say this without understanding how it works 😅, but your projects are great, and I suppose this would be a good goal in line with the project. 😬