# Hugging Face Local Inference Examples

This repository contains a collection of Python scripts demonstrating how to run various AI tasks locally using models from the Hugging Face Hub and the `transformers` library (along with related libraries like `datasets`, `sentence-transformers`, etc.).

These examples cover a range of modalities including **Text**, **Vision**, **Audio**, and **Multimodal** combinations, showcasing different models and pipelines available within the Hugging Face ecosystem. Each script aims to be runnable with minimal modification (often just providing an input file path or configuring text/labels/data within the script).

## Examples Included

The scripts are categorized by the primary data modalities they handle; a short pipeline sketch follows each group:

### 📝 Text Examples

1. **Sentiment Analysis (`run_sentiment.py`)**
   * Task: Text Classification (Positive/Negative)
   * Model: `distilbert-base-uncased-finetuned-sst-2-english` (Pipeline Default)
2. **Text Generation (`run_generation.py`)**
   * Task: Generating text following a prompt.
   * Model: `gpt2`
3. **Zero-Shot Text Classification (`run_zero_shot.py`)**
   * Task: Classifying text using arbitrary labels without specific fine-tuning.
   * Model: `facebook/bart-large-mnli` (Pipeline Default)
4. **Named Entity Recognition (NER) (`run_ner.py`)**
   * Task: Identifying named entities (Person, Location, Org).
   * Model: `dbmdz/bert-large-cased-finetuned-conll03-english`
5. **Summarization (`run_summarization.py`)**
   * Task: Creating a shorter summary of a longer text.
   * Model: `facebook/bart-large-cnn`
6. **Translation (EN->FR) (`run_translation.py`)**
   * Task: Translating text from English to French.
   * Model: `Helsinki-NLP/opus-mt-en-fr`
7. **Question Answering (Extractive Text) (`run_qa.py`)**
   * Task: Finding the answer span within a context paragraph given a question.
   * Model: `distilbert-base-cased-distilled-squad`
8. **Fill-Mask (`run_fill_mask.py`)**
   * Task: Predicting masked words in a sentence (Masked Language Modeling).
   * Model: `roberta-base`
9. **Sentence Embeddings & Similarity (`run_embeddings.py`, `run_similarity_search.py`)**
   * Task: Generating semantic vector representations and finding similar sentences.
   * Model: `sentence-transformers/all-MiniLM-L6-v2` (via the `sentence-transformers` library)
10. **Emotion Classification (`run_emotion.py`)**
    * Task: Text Classification (Detecting emotions like joy, anger, sadness).
    * Model: `j-hartmann/emotion-english-distilroberta-base`
11. **Table Question Answering (`run_table_qa.py`)**
    * Task: Answering questions based on tabular data (requires `pandas`, `torch-scatter`).
    * Model: `google/tapas-base-finetuned-wtq`
12. **Dialogue Simulation (`run_dialogue_generation.py`)**
    * Task: Simulating multi-turn conversation via the text generation pipeline.
    * Model: `microsoft/DialoGPT-medium`
13. **Part-of-Speech (POS) Tagging (`run_pos_tagging.py`)**
    * Task: Identifying grammatical parts of speech for each word.
    * Model: `vblagoje/bert-english-uncased-finetuned-pos`
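
Most of the text examples follow the same `transformers` `pipeline` pattern. As a minimal sketch (the input sentences and candidate labels below are made-up placeholders, not taken from the scripts), the sentiment and zero-shot examples could look like this:

```python
from transformers import pipeline

# Sentiment analysis with the pipeline's default checkpoint
# (distilbert-base-uncased-finetuned-sst-2-english)
sentiment = pipeline("sentiment-analysis")
print(sentiment("I love running models locally!"))
# -> a list like [{'label': 'POSITIVE', 'score': 0.99...}]

# Zero-shot classification against arbitrary candidate labels
zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(zero_shot(
    "The GPU ran out of memory during generation.",
    candidate_labels=["hardware", "cooking", "sports"],
))
```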

### 🖼️ Vision Examples (Purely Image Input/Output)

1. **Image Classification (`run_image_classification.py`)**
   * Task: Classifying the main subject of an image.
   * Model: `google/vit-base-patch16-224`
2. **Object Detection (`run_object_detection_annotated.py`)**
   * Task: Identifying multiple objects in an image with bounding boxes and labels (plus annotation).
   * Model: `facebook/detr-resnet-50`
3. **Depth Estimation (`run_depth_estimation.py`)**
   * Task: Estimating depth from a single image, saving a depth map.
   * Model: `Intel/dpt-large`
4. **Image Segmentation (`run_segmentation.py`)**
   * Task: Assigning category labels (e.g., road, sky, car) to each pixel (requires `matplotlib`, `numpy`).
   * Model: `nvidia/segformer-b0-finetuned-ade-512-512`
5. **Image Super-Resolution (`run_super_resolution.py`)**
   * Task: Upscaling an image (x2) to enhance resolution.
   * Model: `caidas/swin2SR-classical-sr-x2-64`
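
The vision examples can be driven through `pipeline` in the same way. A minimal image-classification sketch (the image path is a placeholder you would replace with a local file):

```python
from transformers import pipeline

# Image classification with a ViT checkpoint
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

# Placeholder path; any local JPEG/PNG works
for prediction in classifier("path/to/image.jpg"):
    print(f"{prediction['label']}: {prediction['score']:.3f}")
```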

### 🎧 Audio Examples (Purely Audio Input/Output)

1. **Audio Classification (`run_audio_classification.py`)**
   * Task: Classifying the type of sound in an audio file (e.g., Speech, Music). Requires `torchaudio`.
   * Model: `MIT/ast-finetuned-audioset-10-10-0.4593`
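
A minimal sketch of the same idea for audio (the path is a placeholder; decoding relies on `ffmpeg`/`torchaudio` being installed as noted in the Prerequisites):

```python
from transformers import pipeline

# Audio classification with an Audio Spectrogram Transformer checkpoint
classifier = pipeline("audio-classification", model="MIT/ast-finetuned-audioset-10-10-0.4593")

# Placeholder path; a short WAV clip is easiest to start with
for prediction in classifier("path/to/sound.wav"):
    print(f"{prediction['label']}: {prediction['score']:.3f}")
```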

### 🔄 Multimodal Examples (Vision + Text)

1. **Image Captioning (`run_image_captioning.py`)**
   * Task: Generating a text description for an image.
   * Model: `nlpconnect/vit-gpt2-image-captioning`
2. **Visual Question Answering (VQA) (`run_vqa.py`)**
   * Task: Answering questions based on image content.
   * Model: `dandelin/vilt-b32-finetuned-vqa`
3. **Zero-Shot Image Classification (`run_zero_shot_image.py`)**
   * Task: Classifying images against arbitrary text labels (requires `ftfy`, `regex`).
   * Model: `openai/clip-vit-base-patch32`
4. **Document Question Answering (DocVQA) (`run_docvqa.py`)**
   * Task: Answering questions based on document image content (requires `sentencepiece`).
   * Model: `naver-clova-ix/donut-base-finetuned-docvqa`
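
A minimal sketch of the captioning and VQA flavours of these tasks (the image path and the question are placeholders):

```python
from transformers import pipeline

# Image captioning: image in, text out
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
print(captioner("path/to/photo.jpg"))

# Visual question answering: image + question in, answer candidates out
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
print(vqa(image="path/to/photo.jpg", question="How many people are in the picture?"))
```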

### 🔄 Multimodal Examples (Audio + Text)

1. **Automatic Speech Recognition (ASR) (`run_asr_flexible.py`)**
   * Task: Transcribing speech from an audio file to text.
   * Model: `openai/whisper-base`
2. **Zero-Shot Audio Classification (`run_zero_shot_audio.py`)**
   * Task: Classifying sounds against arbitrary text labels.
   * Model: `laion/clap-htsat-unfused`
3. **Text-to-Speech (TTS) (`run_tts.py`)**
   * Task: Generating speech audio from text (requires `SpeechRecognition`, `protobuf`).
   * Model: `microsoft/speecht5_tts` + `microsoft/speecht5_hifigan`
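
As a minimal sketch, ASR with Whisper also fits the `pipeline` pattern (the audio path is a placeholder; long recordings may need chunking options not shown here):

```python
from transformers import pipeline

# Speech-to-text with Whisper
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")

# Placeholder path; ffmpeg handles most common audio formats
result = asr("path/to/speech.wav")
print(result["text"])
```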

*(Refer to the comments within each script for more specific details on models and implementation.)*

## Prerequisites

Before running these scripts, ensure you have the following:

1. **Python:** Python 3.8 or later is recommended.
2. **System Dependencies (Ubuntu/Debian):** Some scripts (especially audio-related) require system libraries. Install common ones using:
   ```bash
   # libsndfile1 is for reading/writing audio files
   # ffmpeg is often needed by libraries for handling various audio/video formats
   sudo apt update && sudo apt install libsndfile1 ffmpeg
   ```
   *(Other operating systems may require different commands.)*
3. **Python Libraries:** It's highly recommended to use a Python virtual environment. You can install all common dependencies used across the examples with a single command:
   ```bash
   pip install "transformers[audio,sentencepiece]" torch datasets soundfile librosa sentence-transformers Pillow torchvision timm requests pandas torch-scatter ftfy regex numpy torchaudio matplotlib SpeechRecognition protobuf
   ```
   * **Note:** Using `"transformers[audio,sentencepiece]"` installs common audio dependencies and `sentencepiece`. Not every script requires *all* of these libraries, but installing them all ensures you can run most examples. Refer to the comments within each file for minimal requirements if needed.

## General Usage

1. **Clone the Repository:**
   ```bash
   git clone <repository-url>
   cd <repository-directory>
   ```
2. **Create Virtual Environment (Recommended):**
   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   ```
   *(Use `.\.venv\Scripts\activate` on Windows.)*
3. **Install System Dependencies:** Follow the instructions in the Prerequisites section if applicable for your OS (especially `libsndfile1` and `ffmpeg` on Ubuntu/Debian).
4. **Install Python Libraries:** Run the combined pip command from the Prerequisites section within your activated virtual environment.
5. **Configure Script Inputs (IMPORTANT):**
   * Many scripts require you to provide input, such as a path to a local **image file**, an **audio file**, specific **text/questions**, **candidate labels**, or **table data** inside the script.
   * **Open the specific `.py` script you want to run** in a text editor before executing it.
   * Look for comments indicating `USER ACTION REQUIRED` or variables like `user_image_path`, `user_audio_path`, `user_doc_image_path`, `question`, `candidate_labels`, `data` (for tables), `text_to_speak`, etc.
   * **Modify these variables** according to the script's needs (e.g., provide a valid file path, change the question text, update labels, define table data); a sketch of the typical pattern follows this list. Some scripts include logic to download a sample file if a local one isn't found; read the script comments for details.
6. **Run the Script:**
   * Execute the desired script using Python from your terminal (ensure your virtual environment is active):
   ```bash
   python <script_name>.py
   ```
   (e.g., `python run_sentiment.py`, `python run_docvqa.py`)
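
The exact variable names differ from script to script, but the block you edit in step 5 typically looks something like the following. This is a hypothetical sketch using the variable names mentioned above, not a copy of any particular script:

```python
# --- USER ACTION REQUIRED: configure your inputs here ---
# Hypothetical example values; check each script for the variables it actually uses.
user_image_path = "inputs/my_photo.jpg"        # local image for the vision examples
user_audio_path = "inputs/my_recording.wav"    # local audio for the audio/ASR examples
question = "What is the total amount due?"     # question for the QA/VQA/DocVQA examples
candidate_labels = ["invoice", "receipt", "letter"]  # labels for the zero-shot examples
```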

## Model Downloads

The first time you run a script using a specific Hugging Face model, the necessary model weights, configuration, and tokenizer/processor files will be automatically downloaded from the Hugging Face Hub and cached locally (usually in `~/.cache/huggingface/` or `C:\Users\<User>\.cache\huggingface\`). Subsequent runs using the same model will load directly from the cache, making them much faster and enabling offline use (provided all necessary files are cached).
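
If you want to be explicit about offline use, `from_pretrained` accepts a `local_files_only` flag that loads strictly from the cache instead of contacting the Hub. A minimal sketch (the checkpoint name is just an example):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# Load strictly from the local cache; raises an error instead of downloading
# if the files have never been cached.
tokenizer = AutoTokenizer.from_pretrained(model_id, local_files_only=True)
model = AutoModelForSequenceClassification.from_pretrained(model_id, local_files_only=True)
```

Setting the `TRANSFORMERS_OFFLINE=1` environment variable has a similar effect for the whole process.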

## Hardware Considerations

* **CPU:** Most scripts will run on a CPU, but performance (especially for larger models or complex tasks like vision, audio, or generation) may be slow.
* **GPU:** An NVIDIA GPU with CUDA configured correctly and a compatible version of `torch` installed is highly recommended for significantly faster inference. The scripts include basic logic to attempt using the GPU if available (see the sketch below).
* **RAM:** Models vary greatly in size. Ensure you have sufficient RAM. Smaller models might need 4-8 GB, while larger ones (like `large` variants and vision/audio/document models) might require 16 GB or more.
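
The device-selection logic mentioned above usually amounts to a couple of lines; a minimal sketch of the common pattern (individual scripts may differ):

```python
import torch
from transformers import pipeline

# Use the first CUDA GPU if available, otherwise fall back to the CPU
device = 0 if torch.cuda.is_available() else -1

generator = pipeline("text-generation", model="gpt2", device=device)
print(generator("Local inference is", max_new_tokens=20)[0]["generated_text"])
```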

# NLP Examples: A Collection of AI Scripts 🤖

Welcome to the **NLP Examples** repository! This project features a collection of Python scripts that demonstrate how to run various AI tasks locally, using models from the Hugging Face Hub and the `transformers` library along with related libraries like `datasets` and `sentence-transformers`. The examples cover a range of modalities, including text, vision, and audio, and showcase different models and pipelines.

## Table of Contents

1. [Features](#features)
2. [Installation](#installation)
3. [Usage](#usage)
4. [Examples](#examples)
5. [Contributing](#contributing)
6. [License](#license)
7. [Links](#links)

## Features

- **Text Processing**: Utilize state-of-the-art NLP models like BERT for tasks such as text classification and sentiment analysis.
- **Audio Processing**: Explore automatic speech recognition (ASR) models to transcribe audio files into text.
- **Vision Tasks**: Implement models like DETR for object detection in images.
- **Comprehensive Examples**: Each script is self-contained and includes detailed comments to guide you through the code.

## Installation

To get started, you'll need to set up your environment. Follow these steps:

1. **Clone the Repository**:

   ```bash
   git clone https://github.com/Sleepparalysis1/NLP-Examples.git
   cd NLP-Examples
   ```

2. **Create a Virtual Environment** (optional but recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows use `venv\Scripts\activate`
   ```

3. **Install Dependencies**:

   Use `pip` to install the required libraries.

   ```bash
   pip install -r requirements.txt
   ```

## Usage

Each script in this repository serves a specific purpose. You can run them directly from the command line. For example, to run a text classification script, use:

```bash
python text_classification.py --input "Your text here"
```

Make sure to check the script for additional options.

## Examples

### Text Classification with BERT

This example shows how to use a BERT model for text classification. Note that `bert-base-uncased` is a base checkpoint: its classification head is newly initialized, so the predicted class is only meaningful after fine-tuning (or when you swap in an already fine-tuned checkpoint).

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load pre-trained model and tokenizer.
# The sequence-classification head on top of bert-base-uncased is randomly
# initialized; use a fine-tuned checkpoint for meaningful predictions.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

# Prepare input
inputs = tokenizer("Hello, world!", return_tensors="pt")

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

print(f"Predicted class: {predictions.item()}")
```

### Automatic Speech Recognition

This example demonstrates how to transcribe audio using an ASR model. The raw waveform is loaded first (here with `librosa`, resampled to the 16 kHz rate the model expects) and then passed to the processor.

```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import librosa
import torch

# Load pre-trained model and processor
processor = Wav2Vec2Processor.from_pretrained('facebook/wav2vec2-base-960h')
model = Wav2Vec2ForCTC.from_pretrained('facebook/wav2vec2-base-960h')

# Load the audio file as a 16 kHz mono waveform
audio_path = "path/to/audio.wav"
speech, _ = librosa.load(audio_path, sr=16000)

# Transcribe audio
input_values = processor(speech, sampling_rate=16000, return_tensors="pt").input_values
with torch.no_grad():
    logits = model(input_values).logits

# Get predicted ids
predicted_ids = torch.argmax(logits, dim=-1)

# Decode the ids to text
transcription = processor.batch_decode(predicted_ids)
print(f"Transcription: {transcription[0]}")
```

### Object Detection with DETR

This example illustrates how to use the DETR model for object detection and print the detected objects with their confidence scores and bounding boxes, using the processor's `post_process_object_detection` helper (available in recent `transformers` versions).

```python
from transformers import DetrImageProcessor, DetrForObjectDetection
import torch
from PIL import Image

# Load pre-trained model and processor
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

# Load image
image = Image.open("path/to/image.jpg")

# Prepare input
inputs = processor(images=image, return_tensors="pt")

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw outputs into labeled boxes, keeping detections above a confidence threshold
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"{model.config.id2label[label.item()]}: {score:.2f} at {[round(c, 1) for c in box.tolist()]}")
```

## Contributing

We welcome contributions to this repository! If you have an idea for a new example or improvement, please follow these steps:

1. Fork the repository.
2. Create a new branch (`git checkout -b feature/YourFeature`).
3. Make your changes and commit them (`git commit -m 'Add new feature'`).
4. Push to your branch (`git push origin feature/YourFeature`).
5. Open a pull request.

Please ensure your code adheres to the existing style and includes comments for clarity.

## License

* The Python scripts in this project are licensed under the MIT License; see the [LICENSE](LICENSE) file for details.
* The Hugging Face libraries (`transformers`, `datasets`, etc.) are typically licensed under Apache 2.0.
* Individual models downloaded from the Hugging Face Hub have their own licenses. Please refer to the model card on the Hub for the specific terms of use of each model (note that some models, like Donut or specific fine-tunes, might have non-commercial or other restrictions).

## Links

For the latest updates, additional resources, and releases, visit the [Releases section](https://github.com/Sleepparalysis1/NLP-Examples/releases). Download and execute the files to explore the examples.