
[Feature]: Resource ingestion pipeline to the vector database #43


Closed
jurmy24 opened this issue Nov 1, 2024 · 2 comments
Labels
dev (Anything related to internal tooling/tests/CICD), enhancement (New feature or request), invalid (This doesn't seem right)

Comments

@jurmy24
Member

jurmy24 commented Nov 1, 2024

Is your feature request related to a problem? Please describe.
It takes a lot of work to convert PDFs (e.g. of textbooks) into well-divided chunks, to find the relevant information for each chunk (page numbers, associated chapter, subsection, etc.), and to put all of this into the vector database chunks (plus the associated resources and sections tables).

Describe the solution you'd like
I want a pipeline under the scripts/database folder that takes a PDF as input and automatically uploads the chunks and their metadata to the database. Discuss the best way to do this with me.
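
As a starting point for that discussion, here is a minimal sketch of the chunking step only. It is not the project's implementation. It assumes the PDF text has already been extracted into one string per page, and that chapters start with a line like "Chapter 3 ...". The names `Chunk`, `chunk_pages`, `CHAPTER_RE` and `max_chars` are hypothetical, and uploading to the database is left out.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical heading pattern (an assumption); real textbooks need their own rules.
CHAPTER_RE = re.compile(r"^Chapter\s+(\d+)\b.*$", re.MULTILINE)


@dataclass
class Chunk:
    content: str
    page_number: int        # 1-based page the chunk came from
    chapter: Optional[str]  # most recent chapter heading seen, if any


def chunk_pages(pages: List[str], max_chars: int = 500) -> List[Chunk]:
    """Split per-page text into chunks of at most max_chars characters,
    tagging each chunk with its page number and current chapter heading."""
    chunks: List[Chunk] = []
    current_chapter: Optional[str] = None
    for page_number, text in enumerate(pages, start=1):
        # Treat blank-line-separated blocks as paragraphs.
        for paragraph in text.split("\n\n"):
            paragraph = paragraph.strip()
            if not paragraph:
                continue
            # A paragraph that starts with a chapter heading updates the
            # chapter recorded on this and all following chunks.
            match = CHAPTER_RE.match(paragraph)
            if match:
                current_chapter = match.group(0).strip()
            # Cut long paragraphs into fixed-size pieces.
            for start in range(0, len(paragraph), max_chars):
                piece = paragraph[start:start + max_chars]
                chunks.append(Chunk(piece, page_number, current_chapter))
    return chunks


if __name__ == "__main__":
    demo_pages = [
        "Chapter 1 Cells\n\nCells are the basic unit of life.",
        "More about cells.",
    ]
    for chunk in chunk_pages(demo_pages):
        print(chunk)
```

Each resulting `Chunk` would still need to be embedded and written to the chunks table, with matching rows in the resources and sections tables. Subsection detection would need a similar rule to the chapter one.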

@jurmy24 jurmy24 added enhancement New feature or request dev Anything related to internal tooling/tests/CICD labels Nov 1, 2024
@jurmy24 jurmy24 linked a pull request Nov 30, 2024 that will close this issue
@jurmy24 jurmy24 removed a link to a pull request Nov 30, 2024
@jurmy24
Member Author

jurmy24 commented Dec 9, 2024

This is quite a big feature request. Will likely split it up.

@jurmy24 jurmy24 added this to the Resource Ingestion Pipeline milestone Dec 10, 2024
@jurmy24 jurmy24 added the invalid This doesn't seem right label Jan 28, 2025
@jurmy24
Member Author

jurmy24 commented Jan 28, 2025

Too big an issue. Delegate to the twiga-warehouse repository.

@jurmy24 jurmy24 closed this as not planned Jan 28, 2025
@jurmy24 jurmy24 closed this as completed May 24, 2025