Embeddings via API? #51
I tried defining the embedding model as part of the model list, and it still ran locally. I'm also looking for an API solution for embeddings.
@PatrickMer you need to change chunking.py, but the way it's set up now is for local deployment, not API.
@PatrickMer did you get the repo running??
Yeah, I got the repo running locally on the example, and then using OpenRouter for the example, but when trying to use a larger dataset (25 files, 75MB), the embedding uses all of my RAM (~14GB) and is not feasible. What changes did you make to chunking.py?
I didn't make any changes lol, it's taking way too long to chunk, it's stuck on the chunking step.
@drewskidang I tried changing chunking.py to use CUDA instead of CPU for the semantic model. It worked, and chunking started to run without using too much RAM, but I still ran out of RAM and got a memory error about 5 minutes in. Do you think using a smaller model could work? Or is the memory issue coming from loading the data into memory?
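For anyone hitting the same memory error: one thing that may help, independent of device, is embedding the chunks in fixed-size batches so only one batch of texts is in flight at a time, rather than materializing everything at once. This is a minimal sketch, not the repo's actual chunking.py; `embed_fn` here is a stand-in you'd replace with the real model's encode call.

```python
import numpy as np

def embed_in_batches(texts, embed_fn, batch_size=32):
    """Embed texts in fixed-size batches to cap peak RAM usage."""
    parts = []
    for i in range(0, len(texts), batch_size):
        parts.append(embed_fn(texts[i:i + batch_size]))
    return np.concatenate(parts, axis=0)

# Stand-in embedder for illustration only; swap in the real model's encode().
fake_embed = lambda batch: np.zeros((len(batch), 8), dtype=np.float32)
vectors = embed_in_batches([f"chunk {i}" for i in range(100)], fake_embed)
# vectors.shape == (100, 8)
```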
@PatrickMer @drewskidang have you tried using the fast chunking mode? The quality difference for most tasks should be negligible.
@sumukshashidhar This was using fast chunking mode, I only tried semantic chunking thinking that using VRAM for the model might reduce the program's RAM usage. Have you tried running larger datasets, and if so, on what hardware?
@PatrickMer I redid the entire chunking to use LlamaIndex for data loading lol, and used the Together API for embeddings.
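For reference, Together's embeddings endpoint follows the OpenAI-compatible request shape (a `model` plus an `input` list), so the local embedding call can be swapped for an HTTP request. A rough sketch of the payload, with an illustrative model name; this is not the actual code from @drewskidang's change:

```python
import json

# Hypothetical request body for an OpenAI-compatible /v1/embeddings endpoint.
# The model name below is illustrative; pick whatever embedding model your
# provider actually serves.
payload = {
    "model": "togethercomputer/m2-bert-80M-8k-retrieval",
    "input": ["chunk one", "chunk two"],
}
body = json.dumps(payload)
# POST `body` to the provider's embeddings endpoint with your API key in the
# Authorization header; the response carries one vector per input string.
```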
@drewskidang haha, did it work? Could you share that?
@PatrickMer will do tomorrow lol, remind me if I don't.
@PatrickMer I did try it with a large dataset, but on 8xH100 machines 😅, which was an oversight. I'll investigate this part further. In the meantime @drewskidang, it would be great if you could share the embedding API implementation / make a PR 😄
I don't have the hardware to run the embeddings locally. Is there a way to configure it to use the API?