ComfyUI-Ollama-Describer

🚀 News

[22-02-2025] Added Support for Pydantic Schema Input

Now, nodes can accept Pydantic Schemas as input, making it easier to define structured outputs.
To generate Pydantic schemas, you can use the Python Interpreter Node by Christian Byrne.

Introduction

This extension for ComfyUI enables the use of Ollama LLM models, such as Gemma, Llava (multimodal), Llama2, Llama3, and Mistral.

📌 Features:

Ollama Image Describer 🖼️: Generate structured descriptions of images.
Ollama Text Describer 📝: Extract meaningful insights from text.
Ollama Image Captioner 📷: Create automatic captions for images.
Ollama Captioner Extra Options ⚙️: Advanced customization for captions.
Text Transformer 🔄: Prepend, append, or modify text dynamically.
JSON Property Extractor 📑: Extract specific values from JSON outputs.

Installation

1️⃣ Install Ollama

Follow the official Ollama installation guide.

2️⃣ Install via ComfyUI Manager (Recommended)

The easiest way to install this extension is through ComfyUI Manager:

Open ComfyUI Manager.
Search for ComfyUI-Ollama-Describer.
Click Install and restart ComfyUI.

3️⃣ Install Manually

git clone https://github.com/alisson-anjos/ComfyUI-Ollama-Describer.git

Path should be custom_nodes\ComfyUI-Ollama-Describer.

4️⃣ Install Dependencies

Windows:

Run install.bat

Linux/Mac/Windows:

pip install -r requirements.txt

Usage

Ollama Image Describer 📷🔍

Extracts structured descriptions from images using vision-enabled LLMs.
Useful for analyzing images and generating detailed captions, including objects, actions, and surroundings.

Key Parameters:

model: Select LLaVa models (7B, 13B, etc.).
custom_model: Specify a custom model from Ollama's library.
api_host: Define the API address (e.g., http://localhost:11434).
timeout: Max response time before canceling the request.
temperature: Controls randomness (0 = factual, 1 = creative).
top_k, top_p, repeat_penalty: Fine-tune text generation.
max_tokens: Maximum response length in tokens.
seed_number: Set seed for reproducibility (-1 for random).
keep_model_alive: Defines how long the model stays loaded after execution:
- 0: Unloads immediately.
- -1: Stays loaded indefinitely.
- Any other value (e.g., 10) keeps it in memory for that number of seconds.
prompt: The main instruction for the model.
system_context: Provide additional context for better responses.
structured_output_format: Accepts either a Python dictionary or a valid JSON string to define the expected response structure.

JSON Property Extractor 📑

Used to extract specific values from structured JSON outputs returned by Ollama Image Describer or Ollama Text Describer.
Works by selecting a key (or path) inside a JSON structure and outputting only the requested data.
Useful for filtering, extracting key insights, or formatting responses for further processing.
Compatible with structured_output_format, which allows defining structured outputs via a Python dictionary or a valid JSON string.

Ollama Text Describer 📝🔎

Processes text inputs to generate structured descriptions or summaries.
Ideal for refining text-based outputs and enhancing context understanding.

Ollama Image Captioner 🖼️📖

Automatically generates concise and relevant captions for images.
Processes images from a specified folder, iterates through each file, and generates .txt caption files saved in the output directory.
Useful for bulk image captioning, dataset preparation, and AI-assisted annotation.
Useful for image-to-text applications, content tagging, and accessibility.

Ollama Captioner Extra Options 🎛️

Works in conjunction with Ollama Image Captioner to provide additional customization for captions.
Allows fine-tuning of captions by enabling or disabling specific details like lighting, camera angle, composition, and aesthetic quality.
Useful for controlling caption verbosity, accuracy, and inclusion of metadata like camera settings or image quality.
Helps tailor the output for different applications such as dataset labeling, content creation, and accessibility enhancements.
Provides additional customization settings for generated captions.
Helps refine style, verbosity, and accuracy based on user preferences.

Text Transformer ✏️

Allows users to modify, append, prepend, or replace text dynamically.
Useful for formatting, restructuring, and enhancing text-based outputs.

🛠️ Technical Details

Understanding Model Suffixes & Quantization

Suffix	Meaning
Q	Quantized model (smaller, faster)
4, 8, etc.	Number of bits used (lower = smaller & faster)
K	K-means quantization (more efficient)
M	Medium-sized model
F16 / F32	Floating-point precision (higher = more accurate)

More details on quantization: Medium Article.

Perplexity Explained 🧠

Measures how well a model predicts text.
Lower perplexity = better predictions.

Name		Name	Last commit message	Last commit date
Latest commit History 63 Commits
.github/workflows		.github/workflows
images		images
workflows		workflows
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
config.json		config.json
config.py		config.py
install.bat		install.bat
install.py		install.py
nodes.py		nodes.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

ComfyUI-Ollama-Describer

🚀 News

[22-02-2025] Added Support for Pydantic Schema Input

Introduction

📌 Features:

Installation

1️⃣ Install Ollama

2️⃣ Install via ComfyUI Manager (Recommended)

3️⃣ Install Manually

4️⃣ Install Dependencies

Windows:

Linux/Mac/Windows:

Usage

Ollama Image Describer 📷🔍

Key Parameters:

JSON Property Extractor 📑

Ollama Text Describer 📝🔎

Ollama Image Captioner 🖼️📖

Ollama Captioner Extra Options 🎛️

Text Transformer ✏️

🛠️ Technical Details

Understanding Model Suffixes & Quantization

Perplexity Explained 🧠

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 5

Languages

License

alisson-anjos/ComfyUI-Ollama-Describer

Folders and files

Latest commit

History

Repository files navigation

ComfyUI-Ollama-Describer

🚀 News

[22-02-2025] Added Support for Pydantic Schema Input

Introduction

📌 Features:

Installation

1️⃣ Install Ollama

2️⃣ Install via ComfyUI Manager (Recommended)

3️⃣ Install Manually

4️⃣ Install Dependencies

Windows:

Linux/Mac/Windows:

Usage

Ollama Image Describer 📷🔍

Key Parameters:

JSON Property Extractor 📑

Ollama Text Describer 📝🔎

Ollama Image Captioner 🖼️📖

Ollama Captioner Extra Options 🎛️

Text Transformer ✏️

🛠️ Technical Details

Understanding Model Suffixes & Quantization

Perplexity Explained 🧠

References

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 5

Languages

Packages