Embeddings VIA API? #51

Open

drewskidang opened this issue Apr 6, 2025 · 12 comments

Comments

@drewskidang

I don't have the hardware to run the embeddings locally. Is there a way to configure it to use an API?

@PatrickMer

I tried defining the embedding model as part of the model list, but it still ran locally. I'm also looking for an API solution for embeddings.

```yaml
model_list:
  - model_name: deepseek/deepseek-chat-v3-0324:free
    provider: null
    api_key: $OPENROUTER_TOKEN
    base_url: "https://openrouter.ai/api/v1"
    max_concurrent_requests: 4
  - model_name: intfloat/multilingual-e5-large-instruct
    provider: hf-inference
    api_key: $HF_TOKEN
    base_url: null
    max_concurrent_requests: 4

model_roles:
  ingestion:
    - deepseek/deepseek-chat-v3-0324:free
  summarization:
    - deepseek/deepseek-chat-v3-0324:free
  chunking:
    - intfloat/multilingual-e5-large-instruct # your sentence-level chunking model
  single_shot_question_generation:
    - deepseek/deepseek-chat-v3-0324:free
  multi_hop_question_generation:
    - deepseek/deepseek-chat-v3-0324:free
```

@drewskidang
Author

@PatrickMer you need to change chunking.py; the way it runs now is set up for local inference, not an API.
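
For reference, a minimal sketch of that kind of change, assuming chunking.py currently embeds sentences with a locally loaded model; the function name and model id below are illustrative, not the repo's actual symbols:

```python
# Hypothetical sketch: embed sentences via the HF Inference API instead of a
# locally loaded model. Function name and model id are illustrative only.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="intfloat/multilingual-e5-large-instruct",  # assumed embedding model
    token=os.environ["HF_TOKEN"],
)

def embed_sentences(sentences: list[str]) -> list[list[float]]:
    # One request per sentence keeps local memory flat. Depending on how the
    # model is served, feature_extraction may return token-level vectors that
    # still need pooling before use.
    return [client.feature_extraction(s).tolist() for s in sentences]
```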

@drewskidang
Author

@PatrickMer did you get the repo running??

@PatrickMer

PatrickMer commented Apr 6, 2025

Yeah, I got the repo running on the example locally and then with OpenRouter, but when trying a larger dataset (25 files, 75 MB), the embedding step uses all of my RAM (~14 GB) and isn't feasible.

What changes did you make to chunking.py?

@drewskidang
Author

I didn't make any changes lol. It's taking way too long to chunk; it's stuck at the chunking stage.

@PatrickMer

@drewskidang I tried changing chunking.py to use CUDA instead of CPU for the semantic model. It worked, and chunking started to run without using too much RAM, but I still ran out of RAM and got a memory error about 5 minutes in.

Do you think using a smaller model could work? Or is the memory issue coming from loading the data into memory?
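
For what it's worth, a minimal sketch of the GPU-plus-batching approach described above, assuming chunking.py loads the embedding model with sentence-transformers (the actual repo code may differ):

```python
# Hypothetical sketch: load the chunking model on the GPU and encode in small
# batches so peak RAM stays bounded. Assumes sentence-transformers is used.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "intfloat/multilingual-e5-large-instruct",  # assumed model
    device="cuda",
)

def embed_in_batches(sentences, batch_size=32):
    # encode() batches internally; a small batch_size keeps VRAM and the
    # intermediate buffers modest.
    return model.encode(sentences, batch_size=batch_size, show_progress_bar=True)
```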

@sumukshashidhar
Member

@PatrickMer @drewskidang have you tried using the fast chunking mode? the quality difference for most tasks should be negligible
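
For anyone looking for where that lives in the config, a hypothetical snippet is below; the exact key names are assumptions and may differ between versions, so check the repo's example configs:

```yaml
# Hypothetical: exact keys may differ between versions of the repo.
pipeline:
  chunking:
    run: true
    chunking_configuration:
      chunking_mode: fast_chunking  # instead of semantic_chunking
```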

@PatrickMer

PatrickMer commented Apr 6, 2025

> Yeah, I got the repo running on the example locally and then with OpenRouter, but when trying a larger dataset (25 files, 75 MB), the embedding step uses all of my RAM (~14 GB) and isn't feasible.

@sumukshashidhar This was using fast chunking mode; I only tried semantic chunking thinking that using VRAM for the model might reduce the program's RAM usage. Have you tried running larger datasets, and if so, on what hardware?

@drewskidang
Author

@PatrickMer I redid the entire chunking to use LlamaIndex for data loading lol, and used the Together API for embeddings.
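
A rough sketch of what that kind of setup could look like, assuming LlamaIndex's SimpleDirectoryReader for loading and the Together embeddings endpoint (not the actual code referenced in the thread; the model id and data path are placeholders):

```python
# Hypothetical sketch: load documents with LlamaIndex, chunk them, and embed
# the chunks via the Together API instead of a local model.
import os
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from together import Together

# Load and chunk the documents.
docs = SimpleDirectoryReader("data/").load_data()
nodes = SentenceSplitter(chunk_size=512, chunk_overlap=64).get_nodes_from_documents(docs)
texts = [n.get_content() for n in nodes]

# Embed remotely; for very large corpora, send the texts in batches.
client = Together(api_key=os.environ["TOGETHER_API_KEY"])
resp = client.embeddings.create(
    model="togethercomputer/m2-bert-80M-8k-retrieval",  # assumed embedding model
    input=texts,
)
embeddings = [item.embedding for item in resp.data]
```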

@PatrickMer

@drewskidang haha, did it work? Could you share that?

@drewskidang
Author

@PatrickMer will do tomorrow lol, remind me if I don't.

@sumukshashidhar
Member

@PatrickMer I did try it with large datasets, but on 8xH100 machines 😅, which was an oversight. I'll investigate this part further. In the meantime, @drewskidang, it would be great if you could share the embedding API implementation / make a PR 😄
