feat: RAG zotero pipeline #197

cloter · 2024-05-12T23:31:52Z

I would like to pull documents from a local Zotero library for RAG.
Presently we need to include each document manually. It would be really helpful to include a whole tree of documents, either from a directory on the PC or (and this is the feature request) from Zotero.
Zotero has an API that could be used for this, see:
https://www.zotero.org/support/dev/client_coding/javascript_api#running_ad_hoc_javascript_in_zotero
I have not looked into Open WebUI's code (I am not a very experienced programmer), but if somebody has any hints I could try to tackle this.

tjbck (Maintainer) · 2024-06-01T21:33:04Z

https://github.com/urschrei/pyzotero looks promising!

1 reply

ma3oun Aug 19, 2024

It seems this library only works with the online service. I didn't find anything for a local Zotero library. It requires an API key and a user ID to access your online library. Not what we're looking for.
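For the local case, one workaround that avoids the web API entirely is to walk a directory tree and collect the PDFs in it. The sketch below is only an illustration of that idea: `find_pdfs` is a hypothetical helper, and `~/Zotero/storage` is assumed to be the attachment folder (it is Zotero's default location, but yours may differ).

```python
from pathlib import Path


def find_pdfs(root):
    """Return every PDF file below `root`, sorted for stable output."""
    root = Path(root).expanduser()
    if not root.is_dir():
        return []
    # Compare the suffix case-insensitively so "paper.PDF" is also found.
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    # Assumed default Zotero attachment folder; adjust to your setup.
    for pdf in find_pdfs("~/Zotero/storage"):
        print(pdf)
```

The resulting list of paths could then be handed to whatever upload or embedding step is used for single documents today.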

htc502 · 2025-01-28T22:05:31Z

Any updates on this?

bear8203 · 2025-04-29T02:44:06Z

I would like a script that automates the vector DB side, so that the knowledge base or RAG DB picks up my references easily. My ChatGPT 4.5 said:

Zotero Integration via Pipelines for Automatic Knowledge Base Updates

Context

Many Open-WebUI users manage extensive research libraries through Zotero. Currently, updating the Open-WebUI Knowledge Base with Zotero content requires manual exports and uploads of PDF files. Automating this process could significantly enhance usability and keep the Knowledge Base updated seamlessly.

Proposal

Integrate Zotero's API directly within Open-WebUI's Pipelines to periodically fetch and embed documents automatically. This would enable users to keep their Knowledge Base continuously synchronized with their Zotero library, removing manual overhead.
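Keeping the Knowledge Base synchronized does not have to mean re-embedding everything on every run. A minimal sketch, assuming each item returned by the Zotero API carries a `key` and a `version` field: remember the last version seen per key and reprocess only items that are new or changed. The helper names `select_changed` and `mark_seen` and the `seen` mapping are hypothetical.

```python
def select_changed(items, seen):
    """Return items that are new or newer than the version recorded in `seen`.

    `items` is a list of dicts with "key" and "version";
    `seen` maps an item key to the last version that was processed.
    """
    return [it for it in items if it["version"] > seen.get(it["key"], -1)]


def mark_seen(items, seen):
    """Return a copy of `seen` updated with the versions of `items`."""
    updated = dict(seen)
    for it in items:
        updated[it["key"]] = it["version"]
    return updated


# Made-up example data: one unchanged item, one item edited since last sync.
items = [
    {"key": "AAAA1111", "version": 5},
    {"key": "BBBB2222", "version": 9},
]
seen = {"AAAA1111": 5, "BBBB2222": 7}

changed = select_changed(items, seen)
print([it["key"] for it in changed])  # ['BBBB2222']
seen = mark_seen(changed, seen)
```

Persisting `seen` between runs (for example as a small JSON file) would be up to the pipeline.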

How it Could Work

Use Zotero's official REST API to fetch metadata and attachments (especially PDFs).

Automatically download and parse PDFs to extract text.

Generate embeddings from the text using existing embedding models.

Upsert these embeddings into the configured Vector Database (e.g., Qdrant).

Example Pipeline Implementation (Python):

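import uuid
from io import BytesIO

import requests
from pypdf import PdfReader  # pypdf is the maintained successor to PyPDF2
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

ZOTERO_API_KEY = "YOUR_API_KEY"
USER_ID = "YOUR_USER_ID"
QDRANT_URL = "http://localhost:6333"
COLLECTION_NAME = "zotero_kb"
HEADERS = {"Zotero-API-Key": ZOTERO_API_KEY}
model = SentenceTransformer('BAAI/bge-small-en-v1.5')

def fetch_zotero_items():
    # Returns a single page of items. The API paginates, so a large
    # library needs repeated requests with the "start" parameter.
    url = f"https://api.zotero.org/users/{USER_ID}/items"
    params = {"format": "json", "limit": 100}
    response = requests.get(url, headers=HEADERS, params=params, timeout=30)
    response.raise_for_status()
    return response.json()

def extract_pdf_text(pdf_url):
    # The file download needs the same API key as the metadata request.
    pdf_response = requests.get(pdf_url, headers=HEADERS, timeout=60)
    pdf_response.raise_for_status()
    pdf = PdfReader(BytesIO(pdf_response.content))
    # Guard against pages that yield no text (e.g. scanned images).
    return "\n".join(page.extract_text() or "" for page in pdf.pages)

def embed_and_upload_to_qdrant():
    items = fetch_zotero_items()
    qdrant = QdrantClient(url=QDRANT_URL)

    # Create the collection on first run, sized to the embedding model.
    if not qdrant.collection_exists(COLLECTION_NAME):
        qdrant.create_collection(
            collection_name=COLLECTION_NAME,
            vectors_config=VectorParams(
                size=model.get_sentence_embedding_dimension(),
                distance=Distance.COSINE,
            ),
        )

    for item in items:
        attachment = item.get('links', {}).get('attachment')
        if not attachment or attachment.get('attachmentType') != 'application/pdf':
            continue

        # The link points at the attachment item; appending "/file"
        # requests the stored file itself.
        pdf_text = extract_pdf_text(attachment['href'] + '/file')
        if not pdf_text.strip():
            continue
        vector = model.encode(pdf_text).tolist()

        qdrant.upsert(
            collection_name=COLLECTION_NAME,
            points=[PointStruct(
                # Qdrant IDs must be unsigned integers or UUIDs, so derive
                # a stable UUID from the Zotero item key.
                id=str(uuid.uuid5(uuid.NAMESPACE_URL, item['key'])),
                vector=vector,
                payload={
                    "title": item['data'].get('title', ''),
                    "url": item['data'].get('url', ''),
                },
            )],
        )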
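One caveat with embedding a whole PDF as a single vector: embedding models only read a limited number of tokens, so most of a long paper would be silently ignored. A common workaround is to split the text into overlapping chunks and embed each chunk separately. The sketch below is an illustration, not part of the proposal above; `chunk_text` and its default sizes are assumptions, and it splits by characters rather than tokens for simplicity.

```python
def chunk_text(text, size=1000, overlap=200):
    """Split `text` into chunks of at most `size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in one of the two chunks.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")

    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunk = text[start:start + size]
        if chunk.strip():  # skip whitespace-only chunks
            chunks.append(chunk)
        if start + size >= len(text):
            break  # this chunk already reached the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Each chunk would then get its own vector and point ID, with the Zotero item key stored in the payload so that results can be traced back to the source document.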

Benefits

Reduces manual work and potential human error.

Ensures Knowledge Base remains up-to-date with research activities.

Enhances productivity and usability for research-oriented users.

Suggested Implementation

Provide a configurable Zotero pipeline template within Open-WebUI's official pipeline examples.

Optionally, enable scheduling or manual triggering through Open-WebUI functions (commands).
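How scheduling would hook into Open-WebUI is not specified in this thread, so the following is only a plain-Python sketch of the idea: call a sync job at a fixed interval. `run_periodically` and its parameters are hypothetical names, and `max_runs` exists mainly so the loop can be stopped in tests or manual runs.

```python
import time


def run_periodically(job, interval_seconds, max_runs=None):
    """Call `job()` repeatedly, sleeping `interval_seconds` between calls.

    Runs forever when `max_runs` is None; otherwise stops after that many
    calls. Returns the number of calls made.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        # Do not sleep after the final run.
        if max_runs is None or runs < max_runs:
            time.sleep(interval_seconds)
    return runs


if __name__ == "__main__":
    # Stand-in job; a real pipeline would call its Zotero sync function here.
    run_periodically(lambda: print("syncing Zotero library..."),
                     interval_seconds=0.1, max_runs=2)
```

A manual trigger is then just a direct call to the same job function, without the loop.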

I would greatly appreciate the community's feedback and support on this idea. If implemented, this could significantly benefit many users relying on Zotero for their research and knowledge management.

Thanks for considering this enhancement!
