
Add support for image classification #226

Merged (7 commits) on Jun 23, 2022
Changes from all commits
14 changes: 9 additions & 5 deletions docs/source/onnxruntime/modeling_ort.mdx
@@ -12,13 +12,13 @@ specific language governing permissions and limitations under the License.

# Optimum Inference with ONNX Runtime

Optimum is a utility package for building and running inference with accelerated runtimes like ONNX Runtime.
Optimum can be used to load optimized models from the [Hugging Face Hub](https://hf.co/models) and create pipelines
to run accelerated inference without rewriting your APIs.

## Switching from Transformers to Optimum Inference

The Optimum Inference models are API compatible with Hugging Face Transformers models. This means you can just replace your `AutoModelForXxx` class with the corresponding `ORTModelForXxx` class in `optimum`. For example, this is how you can use a question answering model in `optimum`:

```diff
from transformers import AutoTokenizer, pipeline
```

@@ -57,8 +57,8 @@ You can find a complete walkthrough of Optimum Inference for ONNX Runtime in this [n
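For a concrete picture of this swap, here is a minimal, hedged sketch for question answering; the checkpoint names are reused from the pipeline example later on this page, and the exact documented example may differ:

```python
# Minimal sketch: use an ONNX Runtime model in place of a vanilla Transformers model.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

# "optimum/roberta-base-squad2" is an already-converted ONNX checkpoint used elsewhere in these docs
model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(onnx_qa(question="Where do I live?", context="My name is Philipp and I live in Nuremberg."))
```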

### Working with the Hugging Face Model Hub

The Optimum model classes like [`~onnxruntime.ORTModelForSequenceClassification`] are integrated with the [Hugging Face Model Hub](https://hf.co/models), which means you can not only
load models from the Hub, but also push your models to the Hub with the `push_to_hub()` method. Below is an example which downloads a vanilla Transformers model
from the Hub and converts it to an optimum onnxruntime model and pushes it back into a new repository.

<!-- TODO: Add Quantizer into example when UX improved -->
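A minimal sketch of that flow, assuming the `from_transformers=True` export path and the `push_to_hub()` method mentioned above; the checkpoint, local path, and repository names are placeholders:

```python
# Hedged sketch: download a vanilla Transformers model, convert it to ONNX Runtime,
# save it locally, and push the converted model to a new Hub repository.
from optimum.onnxruntime import ORTModelForSequenceClassification

# placeholder checkpoint; any supported sequence-classification model should work
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", from_transformers=True
)
model.save_pretrained("a_local_path_for_converted_model")

# the exact push_to_hub() signature is an assumption; check the ORTModel API reference
model.push_to_hub("a_local_path_for_converted_model", repository_id="my-user/my-onnx-repo")
```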
@@ -105,3 +105,7 @@ from the Hub and converts it to an optimum onnxruntime model and pushes it back

[[autodoc]] onnxruntime.modeling_ort.ORTModelForCausalLM

## ORTModelForImageClassification

[[autodoc]] onnxruntime.modeling_ort.ORTModelForImageClassification
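Since this class is new in this pull request, here is a hedged usage sketch; the checkpoint name and the `from_transformers=True` conversion step are assumptions rather than part of the documented example:

```python
# Hedged sketch: image classification with the new ORTModelForImageClassification class.
from transformers import AutoFeatureExtractor, pipeline
from optimum.onnxruntime import ORTModelForImageClassification

model_id = "google/vit-base-patch16-224"  # placeholder ViT checkpoint
model = ORTModelForImageClassification.from_pretrained(model_id, from_transformers=True)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)

onnx_clf = pipeline("image-classification", model=model, feature_extractor=feature_extractor)
print(onnx_clf("http://images.cocodataset.org/val2017/000000039769.jpg"))
```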

32 changes: 16 additions & 16 deletions docs/source/pipelines.mdx
@@ -12,8 +12,7 @@ specific language governing permissions and limitations under the License.

# Optimum pipelines for inference

The [`~pipelines.pipeline`] function makes it simple to use models from the [Model Hub](https://huggingface.co/models) for accelerated inference on a variety of tasks such as text classification, question answering and image classification.

<Tip>

@@ -31,11 +30,12 @@ Currently supported tasks are:
* `question-answering`
* `zero-shot-classification`
* `text-generation`
* `image-classification`

## Optimum pipeline usage

While each task has an associated pipeline class, it is simpler to use the general [`~pipelines.pipeline`] function which wraps all the task-specific pipelines in one object.
The [`~pipelines.pipeline`] function automatically loads a default model and tokenizer/feature-extractor capable of inference for your task.

1. Start by creating a pipeline by specifying an inference task:

@@ -46,7 +46,7 @@

2. Pass your input text/image to the [`~pipelines.pipeline`] function (see the combined sketch below):

```python
>>> classifier("I like you. I love you.")
```

@@ -57,9 +57,9 @@ _Note: The default models used in the [`~pipelines.pipeline`] function are not o
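Putting both steps together, here is a compact, hedged sketch that also covers the newly supported `image-classification` task; the `optimum.pipelines` import path and the default-model behaviour are assumptions rather than guarantees:

```python
# Step 1: create a pipeline by specifying an inference task (a default model is loaded)
from optimum.pipelines import pipeline

classifier = pipeline("text-classification")

# Step 2: pass your input text to the pipeline
print(classifier("I like you. I love you."))

# the newly added image-classification task follows the same pattern with an image URL or path
image_classifier = pipeline("image-classification")
print(image_classifier("http://images.cocodataset.org/val2017/000000039769.jpg"))
```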

### Using vanilla Transformers model and converting to ONNX

The [`~pipelines.pipeline`] function accepts any supported model from the [Model Hub](https://huggingface.co/models).
There are tags on the Model Hub that allow you to filter for a model you'd like to use for your task.
Once you've picked an appropriate model, load it with the `from_pretrained("{model_id}",from_transformers=True)` method associated with the `ORTModelFor*` and
`AutoTokenizer` class. For example, here's how you can load the [`~onnxruntime.ORTModelForQuestionAnswering`] class for question answering:

@@ -80,10 +80,10 @@
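A hedged sketch of the conversion step described above, using the question-answering checkpoint that appears elsewhere on this page:

```python
# Hedged sketch: load a vanilla Transformers checkpoint and export it to ONNX on the fly.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForQuestionAnswering

model_id = "deepset/roberta-base-squad2"
model = ORTModelForQuestionAnswering.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```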

### Using Optimum models

The [`~pipelines.pipeline`] function is tightly integrated with the [Model Hub](https://huggingface.co/models) and can load optimized models directly, e.g. those created with ONNX Runtime.
There are tags on the Model Hub that allow you to filter for a model you'd like to use for your task.
Once you've picked an appropriate model, load it with the `from_pretrained()` method associated with the corresponding `ORTModelFor*`
and `AutoTokenizer`/`AutoFeatureExtractor` class. For example, here's how you can load an optimized model for question answering:

```python
>>> from transformers import AutoTokenizer
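# (hedged sketch of how the example might continue; checkpoint names reused from elsewhere in these docs)
# load an already-converted ONNX checkpoint directly from the Hub, no export step needed
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
>>> model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")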
```

@@ -132,7 +132,7 @@ Below you can find two examples on how you could use the [`~onnxruntime.ORTOptimizer`] a
onnx_quantized_model_output_path=save_path / "model-quantized.onnx",
quantization_config=qconfig,
)
>>> quantizer.model.config.save_pretrained(save_path)  # saves config.json

# load optimized model from local path or repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_path,file_name="model-quantized.onnx")
@@ -176,7 +176,7 @@ Below you can find two examples on how you could use the [`~onnxruntime.ORTOptimizer`] a
onnx_optimized_model_output_path=save_path / "model-optimized.onnx",
optimization_config=optimization_config,
)
>>> optimizer.model.config.save_pretrained(save_path)  # saves config.json

# load optimized model from local path or repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_path,file_name="model-optimized.onnx")
@@ -198,16 +198,16 @@ Below you can find two examples on how you could use the [`~onnxruntime.ORTOptimizer`] a
## Transformers pipeline usage

The [`~pipelines.pipeline`] function is just a light wrapper around the `transformers.pipeline` function to enable checks for supported tasks and additional features,
like quantization and optimization. That said, you can use the `transformers.pipeline` function and just replace your `AutoModelFor*` class with the corresponding optimum
`ORTModelFor*` class.

```diff
from transformers import AutoTokenizer, pipeline
-from transformers import AutoModelForQuestionAnswering
+from optimum.onnxruntime import ORTModelForQuestionAnswering

-model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")
+model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

onnx_qa = pipeline("question-answering",model=model,tokenizer=tokenizer)
```
1 change: 1 addition & 0 deletions optimum/onnxruntime/__init__.py
@@ -53,6 +53,7 @@ class ORTQuantizableOperator(Enum):
from .modeling_ort import (
ORTModelForCausalLM,
ORTModelForFeatureExtraction,
ORTModelForImageClassification,
ORTModelForQuestionAnswering,
ORTModelForSequenceClassification,
ORTModelForTokenClassification,
1 change: 0 additions & 1 deletion optimum/onnxruntime/configuration.py
@@ -20,7 +20,6 @@
from datasets import Dataset
from packaging.version import Version, parse

from onnxruntime import GraphOptimizationLevel
from onnxruntime import __version__ as ort_version
from onnxruntime.quantization import CalibraterBase, CalibrationMethod, QuantFormat, QuantizationMode, QuantType
from onnxruntime.quantization.calibrate import create_calibrator