[Feature]: Resource ingestion pipeline to the vector database #43
Labels: dev, enhancement, invalid
Is your feature request related to a problem? Please describe.
Converting PDFs (e.g. textbooks) into well-divided chunks takes a lot of manual work. So does finding the relevant metadata for each chunk (page numbers, associated chapter, subsection, etc.) and writing it into the vector database chunks, along with the associated resources and sections tables.
Describe the solution you'd like
I want a pipeline under the scripts/database folder that takes a PDF as input and automatically uploads the chunks and their metadata to the database. Let's discuss the best way to do this.
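As a starting point for that discussion, here is a rough sketch of the chunking step only. It assumes the PDF text has already been extracted into (page number, page text) pairs; the extraction library and the database upload are left out because neither is decided yet. The names `Chunk`, `chunk_pages`, `CHAPTER_RE` and `SECTION_RE` are hypothetical, and the heading patterns are a naive guess that real textbooks will almost certainly need tuned.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical heading patterns (assumption): "Chapter 3 ..." and "3.2 Title".
CHAPTER_RE = re.compile(r"^Chapter\s+\d+\b.*$")
SECTION_RE = re.compile(r"^\d+\.\d+\s+\S.*$")


@dataclass
class Chunk:
    text: str
    page_start: int
    page_end: int
    chapter: Optional[str]
    section: Optional[str]


def chunk_pages(pages: Iterable[Tuple[int, str]], max_chars: int = 1000) -> List[Chunk]:
    """Split (page_number, page_text) pairs into chunks that carry metadata.

    A new chunk starts at every detected heading, or when adding a line
    would push the current chunk past max_chars.
    """
    chunks: List[Chunk] = []
    chapter: Optional[str] = None
    section: Optional[str] = None
    buf: List[str] = []
    start_page: Optional[int] = None
    end_page: Optional[int] = None
    size = 0

    def flush() -> None:
        # Emit the buffered lines as one chunk tagged with the current headings.
        nonlocal buf, start_page, end_page, size
        if buf:
            chunks.append(Chunk(" ".join(buf), start_page, end_page, chapter, section))
        buf, start_page, end_page, size = [], None, None, 0

    for page_no, text in pages:
        for raw in text.splitlines():
            line = raw.strip()
            if not line:
                continue
            if CHAPTER_RE.match(line):
                flush()
                chapter, section = line, None
                continue
            if SECTION_RE.match(line):
                flush()
                section = line
                continue
            if buf and size + len(line) > max_chars:
                flush()
            if start_page is None:
                start_page = page_no
            end_page = page_no
            buf.append(line)
            size += len(line) + 1
    flush()
    return chunks


if __name__ == "__main__":
    demo = [
        (1, "Chapter 1 Basics\n1.1 Intro\nHello world.\nMore text."),
        (2, "Still intro.\n1.2 Next\nSecond section."),
    ]
    for c in chunk_pages(demo):
        print(c)
```

Each `Chunk` would then map to one row in the chunks table, with `chapter` and `section` resolved against the sections table. How that mapping and the embedding step should work is still open.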