[Feature]: Resource ingestion pipeline to the vector database #43

jurmy24 · 2024-11-01T09:35:38Z

Is your feature request related to a problem? Please describe.
Converting PDFs (e.g. textbooks) into well-divided chunks takes a lot of work, as does finding all the relevant information (page numbers, associated chapter, subsection, etc.) and putting it into the vector database chunks (plus the associated resources and sections tables).

Describe the solution you'd like
I want a pipeline under the scripts/database folder that takes a PDF as input and automatically uploads the chunks and their metadata to the database. Discuss the best way to do this with me.
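To make the request more concrete, here is a minimal sketch of the chunking and metadata step only. It assumes the PDF text has already been extracted into one string per page by some PDF library, and it leaves out embedding and the database upload entirely. The `Chunk` fields, the `chunk_pages` function, and the heading patterns ("Chapter 3 ..." and "3.2 ...") are hypothetical illustrations, not the project's actual schema or a rule that fits every textbook.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical heading patterns; real textbooks will need their own rules.
CHAPTER_RE = re.compile(r"^Chapter\s+\d+\b.*$")
SUBSECTION_RE = re.compile(r"^\d+\.\d+\s+\S.*$")


@dataclass
class Chunk:
    """One piece of text plus the metadata the issue asks for (assumed fields)."""
    text: str
    page: int
    chapter: Optional[str]
    subsection: Optional[str]


def chunk_pages(pages: list[str], max_chars: int = 500) -> list[Chunk]:
    """Split page texts into chunks tagged with page, chapter and subsection.

    `pages` holds one string per PDF page, in order. Chunks never span a
    page or a heading, so each chunk has exactly one page number.
    """
    chunks: list[Chunk] = []
    chapter: Optional[str] = None
    subsection: Optional[str] = None

    for page_number, page_text in enumerate(pages, start=1):
        buffer: list[str] = []
        for raw_line in page_text.splitlines():
            line = raw_line.strip()
            if not line:
                continue

            is_chapter = CHAPTER_RE.match(line) is not None
            is_subsection = SUBSECTION_RE.match(line) is not None
            if is_chapter or is_subsection:
                # Close the chunk that belongs to the previous heading.
                if buffer:
                    chunks.append(Chunk(" ".join(buffer), page_number, chapter, subsection))
                    buffer = []
                if is_chapter:
                    chapter, subsection = line, None
                else:
                    subsection = line
                continue

            # Start a new chunk when adding this line would exceed the limit.
            if buffer and len(" ".join(buffer)) + 1 + len(line) > max_chars:
                chunks.append(Chunk(" ".join(buffer), page_number, chapter, subsection))
                buffer = []
            buffer.append(line)

        # Flush whatever is left at the end of the page.
        if buffer:
            chunks.append(Chunk(" ".join(buffer), page_number, chapter, subsection))

    return chunks
```

Each resulting `Chunk` could then be embedded and written to the chunks table, with the chapter and subsection values mapped onto the resources and sections tables. That mapping depends on the actual schema and is not shown here.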

jurmy24 · 2024-12-09T20:32:11Z

This is quite a big feature request. Will likely split it up.

jurmy24 · 2025-01-28T20:25:59Z

Too big of an issue. Delegate to twiga-warehouse repository.

jurmy24 added the enhancement and dev labels Nov 1, 2024

jurmy24 linked a pull request Nov 30, 2024 that will close this issue

Jurmy24/development/flow review #61

Merged

jurmy24 removed a link to a pull request Nov 30, 2024

Jurmy24/development/flow review #61

Merged

jurmy24 added this to the Resource Ingestion Pipeline milestone Dec 10, 2024

jurmy24 assigned alvaro-mazcu, iamrobzy and louhelhir Dec 10, 2024

jurmy24 added the invalid label Jan 28, 2025

jurmy24 closed this as not planned Jan 28, 2025

jurmy24 closed this as completed May 24, 2025
