Commit d313734 (1 parent: 068c78b)

1 file changed: +157 -162 lines changed

README.md

Lines changed: 157 additions & 162 deletions
@@ -1,165 +1,160 @@
# Hugging Face Local Inference Examples

This repository contains a collection of Python scripts demonstrating how to run various AI tasks locally using models from the Hugging Face Hub and the `transformers` library (along with related libraries like `datasets`, `sentence-transformers`, etc.).

These examples cover a range of modalities including **Text**, **Vision**, **Audio**, and **Multimodal** combinations, showcasing different models and pipelines available within the Hugging Face ecosystem. Each script aims to be runnable with minimal modification (often just providing an input file path or configuring text/labels/data within the script).

## Examples Included

The scripts are categorized by the primary data modalities they handle (minimal `pipeline` invocation sketches follow some of the lists below):

### 📝 Text Examples

1. **Sentiment Analysis (`run_sentiment.py`)**
    * Task: Text Classification (Positive/Negative)
    * Model: `distilbert-base-uncased-finetuned-sst-2-english` (Pipeline Default)
2. **Text Generation (`run_generation.py`)**
    * Task: Generating text following a prompt.
    * Model: `gpt2`
3. **Zero-Shot Text Classification (`run_zero_shot.py`)**
    * Task: Classifying text using arbitrary labels without specific fine-tuning.
    * Model: `facebook/bart-large-mnli` (Pipeline Default)
4. **Named Entity Recognition (NER) (`run_ner.py`)**
    * Task: Identifying named entities (Person, Location, Org).
    * Model: `dbmdz/bert-large-cased-finetuned-conll03-english`
5. **Summarization (`run_summarization.py`)**
    * Task: Creating a shorter summary of a longer text.
    * Model: `facebook/bart-large-cnn`
6. **Translation (EN->FR) (`run_translation.py`)**
    * Task: Translating text from English to French.
    * Model: `Helsinki-NLP/opus-mt-en-fr`
7. **Question Answering (Extractive Text) (`run_qa.py`)**
    * Task: Finding the answer span within a context paragraph given a question.
    * Model: `distilbert-base-cased-distilled-squad`
8. **Fill-Mask (`run_fill_mask.py`)**
    * Task: Predicting masked words in a sentence (Masked Language Modeling).
    * Model: `roberta-base`
9. **Sentence Embeddings & Similarity (`run_embeddings.py`, `run_similarity_search.py`)**
    * Task: Generating semantic vector representations and finding similar sentences.
    * Model: `sentence-transformers/all-MiniLM-L6-v2` (via the `sentence-transformers` library)
10. **Emotion Classification (`run_emotion.py`)**
    * Task: Text Classification (Detecting emotions like joy, anger, sadness).
    * Model: `j-hartmann/emotion-english-distilroberta-base`
11. **Table Question Answering (`run_table_qa.py`)**
    * Task: Answering questions based on tabular data (requires `pandas`, `torch-scatter`).
    * Model: `google/tapas-base-finetuned-wtq`
12. **Dialogue Simulation (`run_dialogue_generation.py`)**
    * Task: Simulating multi-turn conversation via the text generation pipeline.
    * Model: `microsoft/DialoGPT-medium`
13. **Part-of-Speech (POS) Tagging (`run_pos_tagging.py`)**
    * Task: Identifying grammatical parts of speech for each word.
    * Model: `vblagoje/bert-english-uncased-finetuned-pos`

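Many of these text tasks can be driven through the high-level `pipeline` API. As a minimal sketch (not taken verbatim from the scripts; it assumes the dependencies from the Prerequisites section are installed):

```python
from transformers import pipeline

# Sentiment analysis with the pipeline's default checkpoint
# (distilbert-base-uncased-finetuned-sst-2-english, as listed above)
classifier = pipeline("sentiment-analysis")
print(classifier("I love running models locally!"))
# -> [{'label': 'POSITIVE', 'score': ...}]
```
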
### 🖼️ Vision Examples (Purely Image Input/Output)

1. **Image Classification (`run_image_classification.py`)**
    * Task: Classifying the main subject of an image.
    * Model: `google/vit-base-patch16-224`
2. **Object Detection (`run_object_detection_annotated.py`)**
    * Task: Identifying multiple objects in an image with bounding boxes and labels (plus annotation).
    * Model: `facebook/detr-resnet-50`
3. **Depth Estimation (`run_depth_estimation.py`)**
    * Task: Estimating depth from a single image, saving a depth map.
    * Model: `Intel/dpt-large`
4. **Image Segmentation (`run_segmentation.py`)**
    * Task: Assigning category labels (e.g., road, sky, car) to each pixel (requires `matplotlib`, `numpy`).
    * Model: `nvidia/segformer-b0-finetuned-ade-512-512`
5. **Image Super-Resolution (`run_super_resolution.py`)**
    * Task: Upscaling an image (x2) to enhance resolution.
    * Model: `caidas/swin2SR-classical-sr-x2-64`

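The vision pipelines follow the same pattern. A minimal sketch for image classification (the file name is a placeholder for any local image; `Pillow` from the Prerequisites section is assumed):

```python
from transformers import pipeline

# Image classification with the ViT checkpoint listed above
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
print(classifier("path/to/image.jpg")[:3])  # top-scoring labels with confidence scores
```
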
### 🎧 Audio Examples (Purely Audio Input/Output)

1. **Audio Classification (`run_audio_classification.py`)**
    * Task: Classifying the type of sound in an audio file (e.g., Speech, Music). Requires `torchaudio`.
    * Model: `MIT/ast-finetuned-audioset-10-10-0.4593`

### 🔄 Multimodal Examples (Vision + Text)

1. **Image Captioning (`run_image_captioning.py`)**
    * Task: Generating a text description for an image.
    * Model: `nlpconnect/vit-gpt2-image-captioning`
2. **Visual Question Answering (VQA) (`run_vqa.py`)**
    * Task: Answering questions based on image content.
    * Model: `dandelin/vilt-b32-finetuned-vqa`
3. **Zero-Shot Image Classification (`run_zero_shot_image.py`)**
    * Task: Classifying images against arbitrary text labels (requires `ftfy`, `regex`).
    * Model: `openai/clip-vit-base-patch32`
4. **Document Question Answering (DocVQA) (`run_docvqa.py`)**
    * Task: Answering questions based on document image content (requires `sentencepiece`).
    * Model: `naver-clova-ix/donut-base-finetuned-docvqa`

### 🔄 Multimodal Examples (Audio + Text)

1. **Automatic Speech Recognition (ASR) (`run_asr_flexible.py`)**
    * Task: Transcribing speech from an audio file to text.
    * Model: `openai/whisper-base`
2. **Zero-Shot Audio Classification (`run_zero_shot_audio.py`)**
    * Task: Classifying sounds against arbitrary text labels.
    * Model: `laion/clap-htsat-unfused`
3. **Text-to-Speech (TTS) (`run_tts.py`)**
    * Task: Generating speech audio from text (requires `SpeechRecognition`, `protobuf`).
    * Model: `microsoft/speecht5_tts` + `microsoft/speecht5_hifigan`

*(Refer to comments within each script for more specific details on models and implementation.)*

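For the audio and speech examples, the same `pipeline` API applies. A minimal ASR sketch using the Whisper checkpoint listed above (the audio path is a placeholder; `ffmpeg` from the Prerequisites section is assumed for decoding):

```python
from transformers import pipeline

# Speech-to-text with Whisper
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
print(asr("path/to/audio.wav")["text"])
```
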
## Prerequisites

Before running these scripts, ensure you have the following:

1. **Python:** Python 3.8 or later is recommended.
2. **System Dependencies (Ubuntu/Debian):** Some scripts (especially audio-related) require system libraries. Install common ones using:

    ```bash
    # libsndfile1 is for reading/writing audio files
    # ffmpeg is often needed by libraries for handling various audio/video formats
    sudo apt update && sudo apt install libsndfile1 ffmpeg
    ```

    *(`tesseract-ocr` has been removed from the requirements. Other operating systems may require different commands.)*
3. **Python Libraries:** It's highly recommended to use a Python virtual environment. You can install all common dependencies used across the remaining examples with a single command:

    ```bash
    pip install "transformers[audio,sentencepiece]" torch datasets soundfile librosa sentence-transformers Pillow torchvision timm requests pandas torch-scatter ftfy regex numpy torchaudio matplotlib SpeechRecognition protobuf
    ```

    * **Note:** `pytesseract` has been removed from the requirements. Using `"transformers[audio,sentencepiece]"` helps install common audio dependencies and `sentencepiece`. Not every script requires *all* of these libraries, but installing them all ensures you can run most examples. Refer to comments within the files for minimal requirements if needed.

## General Usage

1. **Clone the Repository:**

    ```bash
    git clone <repository-url>
    cd <repository-directory>
    ```

2. **Create Virtual Environment (Recommended):**

    ```bash
    python3 -m venv .venv
    source .venv/bin/activate
    ```

    *(Use `.\.venv\Scripts\activate` on Windows.)*
3. **Install System Dependencies:** Follow the instructions in the Prerequisites section if applicable for your OS (especially `libsndfile1` and `ffmpeg` on Ubuntu/Debian).
4. **Install Python Libraries:** Run the combined pip command from the Prerequisites section within your activated virtual environment.
5. **Configure Script Inputs (IMPORTANT):**
    * Many scripts require you to provide input, such as a path to a local **image file**, an **audio file**, specific **text/questions**, **candidate labels**, or **table data** inside the script.
    * **Open the specific `.py` script you want to run** in a text editor before executing it.
    * Look for comments indicating `USER ACTION REQUIRED` or variables like `user_image_path`, `user_audio_path`, `user_doc_image_path`, `question`, `candidate_labels`, `data` (for tables), `text_to_speak`, etc.
    * **Modify these variables** according to the script's needs (e.g., provide a valid file path, change the question text, update labels, define table data). Some scripts include logic to download a sample file if a local one isn't found; read the script comments for details. (A hypothetical configuration block is sketched after this list.)
6. **Run the Script:**
    * Execute the desired script using Python from your terminal (ensure your virtual environment is active):

    ```bash
    python <script_name>.py
    ```

    (e.g., `python run_sentiment.py`, `python run_docvqa.py`)

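As a purely hypothetical illustration of step 5, the configurable variables near the top of a script typically look something like this (the real names and values vary per script; check each file's comments):

```python
# --- USER ACTION REQUIRED: edit these values before running ---
# Hypothetical example block; the actual variable names differ per script.
user_image_path = "my_photo.jpg"                  # path to a local image file
question = "What is shown in the picture?"        # question for QA-style scripts
candidate_labels = ["a dog", "a cat", "a car"]    # labels for zero-shot scripts
```
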
## Model Downloads

The first time you run a script using a specific Hugging Face model, the necessary model weights, configuration, and tokenizer/processor files will be automatically downloaded from the Hugging Face Hub and cached locally (usually in `~/.cache/huggingface/` or `C:\Users\<User>\.cache\huggingface\`). Subsequent runs using the same model will load directly from the cache, making them much faster and enabling offline use (provided all necessary files are cached).

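A minimal sketch of forcing cache-only (offline) loading once the files have been downloaded; `TRANSFORMERS_OFFLINE` is an environment variable recognized by the `transformers` library:

```python
import os

# Must be set before transformers is imported; only locally cached files will be used
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # loads from the local cache, no network access
print(classifier("Works offline once the model is cached."))
```
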
## Hardware Considerations

* **CPU:** Most scripts will run on a CPU, but performance (especially for larger models or complex tasks like vision, audio, and generation) might be slow.
* **GPU:** An NVIDIA GPU with CUDA configured correctly and a compatible version of `torch` installed is highly recommended for significantly faster inference. The scripts include basic logic to attempt using the GPU if available (see the sketch below).
* **RAM:** Models vary greatly in size. Ensure you have sufficient RAM. Smaller models might need 4-8 GB, while larger ones (like `large` variants and vision/audio/document models) might require 16 GB or more.

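A minimal sketch of the kind of GPU-if-available logic mentioned above, using the `pipeline` `device` argument (the scripts themselves may do this differently):

```python
import torch
from transformers import pipeline

# device=0 selects the first CUDA GPU; -1 falls back to CPU
device = 0 if torch.cuda.is_available() else -1
generator = pipeline("text-generation", model="gpt2", device=device)
print(generator("Local inference is", max_new_tokens=20)[0]["generated_text"])
```
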
# NLP Examples: A Collection of AI Scripts 🤖

![NLP Examples](https://img.shields.io/badge/NLP%20Examples-Collection%20of%20AI%20Scripts-blue)

Welcome to the **NLP Examples** repository! This project features a variety of Python scripts that demonstrate how to run AI tasks locally. We use models from the Hugging Face Hub and the `transformers` library, along with related libraries like `datasets` and `sentence-transformers`. Our examples cover a range of modalities, including text, vision, and audio, showcasing different models and pipelines.

## Table of Contents

1. [Features](#features)
2. [Installation](#installation)
3. [Usage](#usage)
4. [Examples](#examples)
5. [Contributing](#contributing)
6. [License](#license)
7. [Links](#links)

## Features

- **Text Processing**: Utilize state-of-the-art NLP models like BERT for tasks such as text classification and sentiment analysis.
- **Audio Processing**: Explore automatic speech recognition (ASR) models to transcribe audio files into text.
- **Vision Tasks**: Implement models like DETR for object detection in images.
- **Comprehensive Examples**: Each script is self-contained and includes detailed comments to guide you through the code.

## Installation

To get started, you'll need to set up your environment. Follow these steps:

1. **Clone the Repository**:

    ```bash
    git clone https://github.com/Sleepparalysis1/NLP-Examples.git
    cd NLP-Examples
    ```

2. **Create a Virtual Environment** (optional but recommended):

    ```bash
    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
    ```

3. **Install Dependencies**:

    Use `pip` to install the required libraries.

    ```bash
    pip install -r requirements.txt
    ```

## Usage

Each script in this repository serves a specific purpose. You can run them directly from the command line. For example, to run a text classification script, use:

```bash
python text_classification.py --input "Your text here"
```

Make sure to check the script for additional options.

## Examples

### Text Classification with BERT

This example shows how to use a BERT model for text classification (a pipeline-based alternative follows the snippet).

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load pre-trained model and tokenizer
# Note: 'bert-base-uncased' ships without a fine-tuned classification head,
# so the head below is randomly initialized; use a fine-tuned checkpoint for meaningful labels.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

# Prepare input
inputs = tokenizer("Hello, world!", return_tensors="pt")

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

print(f"Predicted class: {predictions.item()}")
```

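For quick experiments, a pipeline with an already fine-tuned sentiment checkpoint avoids the untrained-head caveat noted in the comments above (a sketch, assuming the same `transformers` install):

```python
from transformers import pipeline

# Uses a checkpoint with a trained sentiment head instead of bare bert-base-uncased
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("Hello, world!"))
```
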
### Automatic Speech Recognition

This example demonstrates how to transcribe audio using an ASR model. The model expects a raw 16 kHz waveform, so the audio file is loaded first (here with `librosa`, which must be installed).

```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import librosa
import torch

# Load pre-trained model and processor
processor = Wav2Vec2Processor.from_pretrained('facebook/wav2vec2-base-960h')
model = Wav2Vec2ForCTC.from_pretrained('facebook/wav2vec2-base-960h')

# Load the audio file as a 16 kHz waveform (placeholder path)
speech, _ = librosa.load("path/to/audio.wav", sr=16000)

# Transcribe audio
input_values = processor(speech, sampling_rate=16000, return_tensors="pt").input_values
with torch.no_grad():
    logits = model(input_values).logits

# Get predicted ids
predicted_ids = torch.argmax(logits, dim=-1)

# Decode the ids to text
transcription = processor.batch_decode(predicted_ids)
print(f"Transcription: {transcription[0]}")
```

### Object Detection with DETR

This example illustrates how to use the DETR model for object detection. Detections are printed; drawing the boxes on the image is left as an exercise.

```python
from transformers import DetrImageProcessor, DetrForObjectDetection
import torch
from PIL import Image

# Load pre-trained model and processor
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

# Load image (placeholder path)
image = Image.open("path/to/image.jpg")

# Prepare input
inputs = processor(images=image, return_tensors="pt")

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)

# Process outputs: convert to boxes/labels scaled to the original image size
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"{model.config.id2label[label.item()]}: {score:.2f} at {box.tolist()}")
```

## Contributing

We welcome contributions to this repository! If you have an idea for a new example or improvement, please follow these steps:

1. Fork the repository.
2. Create a new branch (`git checkout -b feature/YourFeature`).
3. Make your changes and commit them (`git commit -m 'Add new feature'`).
4. Push to your branch (`git push origin feature/YourFeature`).
5. Open a pull request.

Please ensure your code adheres to the existing style and includes comments for clarity.

## License

* The Python scripts in this repository are provided as examples, likely under the MIT License (or specify your chosen license).
* The Hugging Face libraries (`transformers`, `datasets`, etc.) are typically licensed under Apache 2.0.
* Individual models downloaded from the Hugging Face Hub have their own licenses. Please refer to the model card on the Hub for specific terms of use for each model (note that some models, like Donut or specific fine-tunes, might have non-commercial or other restrictions).

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Links

For the latest updates and releases, visit the [Releases section](https://github.com/Sleepparalysis1/NLP-Examples/releases). Download and execute the files to explore the examples; the Releases section also hosts additional resources and updates.
