-
https://github.com/urschrei/pyzotero looks promising!
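As a rough idea of what pyzotero makes easy, here is a minimal sketch that downloads every stored PDF attached to the top-level items of a library. It assumes pyzotero is installed (`pip install pyzotero`), that `library_id` is your numeric Zotero user ID, and that `api_key` has read access. `pdf_attachments` and `download_library_pdfs` are hypothetical helper names. Standalone PDFs that are themselves top-level items are not covered.

```python
from pathlib import Path


def pdf_attachments(children):
    """Pick child attachments that are stored PDFs; returns (attachment key, output filename)."""
    picked = []
    for child in children:
        data = child["data"]
        if (
            data.get("itemType") == "attachment"
            and data.get("contentType") == "application/pdf"
            # linked files exist only on the owner's disk, so the API cannot serve them
            and data.get("linkMode") in ("imported_file", "imported_url")
        ):
            # prefix with the key so two papers both named "paper.pdf" do not collide
            filename = data.get("filename", "attachment.pdf")
            picked.append((child["key"], f'{child["key"]}_{filename}'))
    return picked


def download_library_pdfs(library_id, api_key, out_dir, library_type="user"):
    """Download every stored PDF attached to a top-level item into out_dir."""
    # imported lazily so pdf_attachments() can be used without pyzotero installed
    from pyzotero import zotero

    zot = zotero.Zotero(library_id, library_type, api_key)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    # everything() follows pagination; top() lists parent items only
    for item in zot.everything(zot.top()):
        for key, filename in pdf_attachments(zot.children(item["key"])):
            zot.dump(key, filename, str(out))
            saved.append(out / filename)
    return saved
```

The resulting folder of PDFs could then be uploaded to a knowledge base in one go instead of file by file.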
-
Any updates on this?
-
I'd like a script that automates the vector DB side: it would push my Zotero library into the Knowledge Base (RAG database) so that references are easy to pull up. Here is what ChatGPT 4.5 suggested:

Zotero Integration via Pipelines for Automatic Knowledge Base Updates

Context
Many Open WebUI users manage extensive research libraries in Zotero. Currently, getting Zotero content into the Open WebUI Knowledge Base requires manually exporting and uploading PDF files. Automating this would make it much easier to keep the Knowledge Base up to date.

Proposal
Integrate Zotero's API into Open WebUI's Pipelines to fetch and embed documents periodically. This would keep the Knowledge Base continuously synchronized with the user's Zotero library, without manual overhead.

How it Could Work
1. Use Zotero's official REST API to fetch item metadata and attachments (especially PDFs).
2. Download the PDFs and extract their text.
3. Generate embeddings from the text with an existing embedding model.
4. Upsert the embeddings into the configured vector database (e.g., Qdrant).

Example Pipeline Implementation (Python):

```python
import uuid
from io import BytesIO

import requests
from PyPDF2 import PdfReader  # pypdf, the maintained successor, offers the same PdfReader
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

ZOTERO_API_KEY = "YOUR_API_KEY"
USER_ID = "YOUR_USER_ID"
QDRANT_URL = "http://localhost:6333"
COLLECTION_NAME = "zotero_kb"

HEADERS = {"Zotero-API-Key": ZOTERO_API_KEY, "Zotero-API-Version": "3"}
model = SentenceTransformer("BAAI/bge-small-en-v1.5")


def fetch_zotero_items():
    """Fetch all top-level items, following pagination (the API returns at most 100 per page)."""
    url = f"https://api.zotero.org/users/{USER_ID}/items/top?format=json&limit=100"
    items = []
    while url:
        response = requests.get(url, headers=HEADERS, timeout=30)
        response.raise_for_status()
        items.extend(response.json())
        # Zotero sends a Link header; requests parses it into response.links
        url = response.links.get("next", {}).get("url")
    return items


def extract_pdf_text(attachment_url):
    """Download an attachment's file (this request needs the API key too) and extract its text."""
    pdf_response = requests.get(f"{attachment_url}/file", headers=HEADERS, timeout=60)
    pdf_response.raise_for_status()
    pdf = PdfReader(BytesIO(pdf_response.content))
    # extract_text() may return None or "" for scanned pages without a text layer
    return "\n".join(page.extract_text() or "" for page in pdf.pages)


def embed_and_upload_to_qdrant():
    items = fetch_zotero_items()
    qdrant = QdrantClient(url=QDRANT_URL)
    if not qdrant.collection_exists(COLLECTION_NAME):
        qdrant.create_collection(
            collection_name=COLLECTION_NAME,
            vectors_config=VectorParams(
                size=model.get_sentence_embedding_dimension(),
                distance=Distance.COSINE,
            ),
        )
    for item in items:
        # links.attachment points at the item's best attachment (an API URL, not the file itself)
        attachment = item.get("links", {}).get("attachment")
        if not attachment or attachment.get("attachmentType") != "application/pdf":
            continue
        pdf_text = extract_pdf_text(attachment["href"])
        # NOTE: the model only reads the first ~512 tokens; long papers should be chunked
        vector = model.encode(pdf_text).tolist()
        qdrant.upsert(
            collection_name=COLLECTION_NAME,
            points=[
                PointStruct(
                    # Qdrant IDs must be unsigned ints or UUIDs, so derive a stable UUID from the key
                    id=str(uuid.uuid5(uuid.NAMESPACE_URL, item["key"])),
                    vector=vector,
                    payload={
                        "title": item["data"].get("title", ""),
                        "url": item["data"].get("url", ""),
                    },
                )
            ],
        )


if __name__ == "__main__":
    embed_and_upload_to_qdrant()
```

Benefits
- Reduces manual work and the potential for human error.
- Keeps the Knowledge Base up to date with ongoing research.
- Improves productivity and usability for research-oriented users.

Suggested Implementation
- Provide a configurable Zotero pipeline template among Open WebUI's official pipeline examples.
- Optionally, allow scheduled or manual triggering through Open WebUI functions (commands).

I would greatly appreciate the community's feedback and support on this idea. If implemented, it could benefit many users who rely on Zotero for research and knowledge management. Thanks for considering this enhancement!
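One caveat with the example pipeline: it encodes each whole PDF as a single vector, but `BAAI/bge-small-en-v1.5` only reads roughly the first 512 tokens of its input, so most of a paper would be silently ignored. A common remedy is to split the text into overlapping chunks and store one point per chunk. The sketch below uses a plain word window; the 200-word size and 40-word overlap are arbitrary defaults, not tuned values, and `chunk_words` / `chunk_point_id` are hypothetical helper names. It also derives a deterministic UUID per chunk, because Qdrant point IDs must be unsigned integers or UUIDs.

```python
import uuid


def chunk_words(text, size=200, overlap=40):
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reached the end of the text
            break
    return chunks


def chunk_point_id(item_key, index):
    """Stable UUID per (Zotero item key, chunk index), usable as a Qdrant point ID."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"zotero:{item_key}:{index}"))
```

In the upload loop, `model.encode(chunks)` returns one vector per chunk, and each `(i, chunk)` pair would become its own point with `id=chunk_point_id(item["key"], i)` and the chunk text in the payload. Because the IDs are deterministic, re-running the sync overwrites existing points instead of duplicating them.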
-
I would like to pull documents from a local Zotero library for RAG.
Currently each document has to be added manually. It would be really helpful to include a whole tree of documents at once, either from a directory on the PC or (and this is the feature request) directly from Zotero.
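The directory half of this request mostly comes down to walking a folder tree and collecting matching files. A minimal sketch, assuming Zotero's default data directory (`~/Zotero`), where stored attachment files sit under `storage/<item key>/`; adjust the path if your data directory was moved. `find_documents` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed default location of Zotero's stored attachments; change if yours differs
ZOTERO_STORAGE = Path.home() / "Zotero" / "storage"


def find_documents(root, suffixes=(".pdf",)):
    """Recursively collect files under `root` whose extension is in `suffixes` (case-insensitive)."""
    wanted = {s.lower() for s in suffixes}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


if __name__ == "__main__":
    for path in find_documents(ZOTERO_STORAGE):
        print(path)
```

Pointing `find_documents` at any other folder gives the "whole tree from a directory" case; the same list of paths can then be fed to whatever upload step is used.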
Zotero has an API that could be used for this, see:
https://www.zotero.org/support/dev/client_coding/javascript_api#running_ad_hoc_javascript_in_zotero
I haven't looked into Open WebUI's code (I'm not a very experienced programmer), but if somebody has any hints, I could try to tackle this.
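One possible starting point that avoids changing Open WebUI's code is its REST API: upload a file, then attach it to a knowledge collection. The sketch below assumes the endpoints `POST /api/v1/files/` and `POST /api/v1/knowledge/{id}/file/add` (with a JSON body `{"file_id": ...}`) as described in Open WebUI's API documentation at the time of writing; verify them against your version. `BASE_URL`, `TOKEN`, and `KNOWLEDGE_ID` are placeholders, and it assumes the upload response carries the new file's `id`. It uses only the standard library, so nothing extra needs installing.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Placeholders: your Open WebUI address, an API key, and the target knowledge collection's ID
BASE_URL = "http://localhost:3000"
TOKEN = "YOUR_OPEN_WEBUI_API_KEY"
KNOWLEDGE_ID = "YOUR_KNOWLEDGE_ID"


def build_multipart(field, path):
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    path = Path(path)
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{path.name}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + path.read_bytes() + tail, f"multipart/form-data; boundary={boundary}"


def _post(url, data, content_type):
    """POST raw bytes with the bearer token and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": content_type,
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def upload_to_knowledge(path):
    """Upload one file, then add it to the knowledge collection."""
    body, ctype = build_multipart("file", path)
    uploaded = _post(f"{BASE_URL}/api/v1/files/", body, ctype)
    payload = json.dumps({"file_id": uploaded["id"]}).encode()
    return _post(f"{BASE_URL}/api/v1/knowledge/{KNOWLEDGE_ID}/file/add", payload, "application/json")
```

Looping `upload_to_knowledge` over a list of PDF paths (for example, files collected from a local folder or exported from Zotero) would cover the "whole tree" case without touching Open WebUI internals.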