
Add support for image classification #226

Merged (7 commits) on Jun 23, 2022
Changes from all commits
14 changes: 9 additions & 5 deletions docs/source/onnxruntime/modeling_ort.mdx
@@ -12,13 +12,13 @@ specific language governing permissions and limitations under the License.

# Optimum Inference with ONNX Runtime

Optimum is a utility package for building and running inference with accelerated runtimes like ONNX Runtime.
Optimum can be used to load optimized models from the [Hugging Face Hub](https://hf.co/models) and create pipelines
to run accelerated inference without rewriting your APIs.

## Switching from Transformers to Optimum Inference

The Optimum Inference models are API compatible with Hugging Face Transformers models. This means you can just replace your `AutoModelForXxx` class with the corresponding `ORTModelForXxx` class in `optimum`. For example, this is how you can use a question answering model in `optimum`:

```diff
from transformers import AutoTokenizer, pipeline
```

@@ -57,8 +57,8 @@ You can find a complete walkthrough of Optimum Inference for ONNX Runtime in this [n
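For a concrete picture of this swap, here is a minimal, hedged sketch for question answering; the checkpoint names are reused from the pipeline example later on this page, and the exact documented example may differ:

```python
# Minimal sketch: use an ONNX Runtime model in place of a vanilla Transformers model.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

# "optimum/roberta-base-squad2" is an already-converted ONNX checkpoint used elsewhere in these docs
model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(onnx_qa(question="Where do I live?", context="My name is Philipp and I live in Nuremberg."))
```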

### Working with the Hugging Face Model Hub

The Optimum model classes like [`~onnxruntime.ORTModelForSequenceClassification`] are integrated with the [Hugging Face Model Hub](https://hf.co/models), which means you can not only
load models from the Hub, but also push your models to the Hub with the `push_to_hub()` method. Below is an example which downloads a vanilla Transformers model
from the Hub and converts it to an optimum onnxruntime model and pushes it back into a new repository.

<!-- TODO: Add Quantizer into example when UX improved -->
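A minimal sketch of that flow, assuming the `from_transformers=True` export path and the `push_to_hub()` method mentioned above; the checkpoint, local path, and repository names are placeholders:

```python
# Hedged sketch: download a vanilla Transformers model, convert it to ONNX Runtime,
# save it locally, and push the converted model to a new Hub repository.
from optimum.onnxruntime import ORTModelForSequenceClassification

# placeholder checkpoint; any supported sequence-classification model should work
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", from_transformers=True
)
model.save_pretrained("a_local_path_for_converted_model")

# the exact push_to_hub() signature is an assumption; check the ORTModel API reference
model.push_to_hub("a_local_path_for_converted_model", repository_id="my-user/my-onnx-repo")
```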
@@ -105,3 +105,7 @@ from the Hub and converts it to an optimum onnxruntime model and pushes it back

[[autodoc]] onnxruntime.modeling_ort.ORTModelForCausalLM

## ORTModelForImageClassification

[[autodoc]] onnxruntime.modeling_ort.ORTModelForImageClassification
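Since this class is new in this pull request, here is a hedged usage sketch; the checkpoint name and the `from_transformers=True` conversion step are assumptions rather than part of the documented example:

```python
# Hedged sketch: image classification with the new ORTModelForImageClassification class.
from transformers import AutoFeatureExtractor, pipeline
from optimum.onnxruntime import ORTModelForImageClassification

model_id = "google/vit-base-patch16-224"  # placeholder ViT checkpoint
model = ORTModelForImageClassification.from_pretrained(model_id, from_transformers=True)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)

onnx_clf = pipeline("image-classification", model=model, feature_extractor=feature_extractor)
print(onnx_clf("http://images.cocodataset.org/val2017/000000039769.jpg"))
```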

32 changes: 16 additions & 16 deletions docs/source/pipelines.mdx
@@ -12,8 +12,7 @@ specific language governing permissions and limitations under the License.

# Optimum pipelines for inference

The [`~pipelines.pipeline`] function makes it simple to use models from the [Model Hub](https://huggingface.co/models) for accelerated inference on a variety of tasks such as text classification, question answering and image classification.

<Tip>

@@ -31,11 +30,12 @@ Currently supported tasks are:
* `question-answering`
* `zero-shot-classification`
* `text-generation`
* `image-classification`

## Optimum pipeline usage

While each task has an associated pipeline class, it is simpler to use the general [`~pipelines.pipeline`] function which wraps all the task-specific pipelines in one object.
The [`~pipelines.pipeline`] function automatically loads a default model and tokenizer/feature-extractor capable of inference for your task.

1. Start by creating a pipeline by specifying an inference task:

@@ -46,7 +46,7 @@

2. Pass your input text/image to the [`~pipelines.pipeline`] function (see the combined sketch below):

```python
>>> classifier("I like you. I love you.")
```

@@ -57,9 +57,9 @@ _Note: The default models used in the [`~pipelines.pipeline`] function are not o
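Putting both steps together, here is a compact, hedged sketch that also covers the newly supported `image-classification` task; the `optimum.pipelines` import path and the default-model behaviour are assumptions rather than guarantees:

```python
# Step 1: create a pipeline by specifying an inference task (a default model is loaded)
from optimum.pipelines import pipeline

classifier = pipeline("text-classification")

# Step 2: pass your input text to the pipeline
print(classifier("I like you. I love you."))

# the newly added image-classification task follows the same pattern with an image URL or path
image_classifier = pipeline("image-classification")
print(image_classifier("http://images.cocodataset.org/val2017/000000039769.jpg"))
```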

### Using vanilla Transformers model and converting to ONNX

The [`~pipelines.pipeline`] function accepts any supported model from the [Model Hub](https://huggingface.co/models).
There are tags on the Model Hub that allow you to filter for a model you'd like to use for your task.
Once you've picked an appropriate model, load it with the `from_pretrained("{model_id}",from_transformers=True)` method associated with the `ORTModelFor*` and
`AutoTokenizer` class. For example, here's how you can load the [`~onnxruntime.ORTModelForQuestionAnswering`] class for question answering:

@@ -80,10 +80,10 @@
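A hedged sketch of the conversion step described above, using the question-answering checkpoint that appears elsewhere on this page:

```python
# Hedged sketch: load a vanilla Transformers checkpoint and export it to ONNX on the fly.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForQuestionAnswering

model_id = "deepset/roberta-base-squad2"
model = ORTModelForQuestionAnswering.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```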

### Using Optimum models

The [`~pipelines.pipeline`] function is tightly integrated with the [Model Hub](https://huggingface.co/models) and can load optimized models directly, e.g. those created with ONNX Runtime.
There are tags on the Model Hub that allow you to filter for a model you'd like to use for your task.
Once you've picked an appropriate model, load it with the `from_pretrained()` method associated with the corresponding `ORTModelFor*`
and `AutoTokenizer`/`AutoFeatureExtractor` class. For example, here's how you can load an optimized model for question answering:

```python
>>> from transformers import AutoTokenizer
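# (hedged sketch of how the example might continue; checkpoint names reused from elsewhere in these docs)
# load an already-converted ONNX checkpoint directly from the Hub, no export step needed
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
>>> model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")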
```

@@ -132,7 +132,7 @@ Below you can find two examples on how you could use the [`~onnxruntime.ORTOptimizer`] a
onnx_quantized_model_output_path=save_path / "model-quantized.onnx",
quantization_config=qconfig,
)
>>> quantizer.model.config.save_pretrained(save_path)  # saves config.json

# load optimized model from local path or repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_path,file_name="model-quantized.onnx")
@@ -176,7 +176,7 @@ Below you can find two examples on how you could use the [`~onnxruntime.ORTOptimizer`] a
onnx_optimized_model_output_path=save_path / "model-optimized.onnx",
optimization_config=optimization_config,
)
>>> optimizer.model.config.save_pretrained(save_path)  # saves config.json

# load optimized model from local path or repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_path,file_name="model-optimized.onnx")
@@ -198,16 +198,16 @@ Below you can find two examples on how you could use the [`~onnxruntime.ORTOptimizer`] a
## Transformers pipeline usage

The [`~pipelines.pipeline`] function is just a light wrapper around the `transformers.pipeline` function to enable checks for supported tasks and additional features,
like quantization and optimization. That said, you can use the `transformers.pipeline` function and just replace your `AutoModelFor*` class with the corresponding optimum
`ORTModelFor*` class.

```diff
from transformers import AutoTokenizer, pipeline
-from transformers import AutoModelForQuestionAnswering
+from optimum.onnxruntime import ORTModelForQuestionAnswering

-model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")
+model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

onnx_qa = pipeline("question-answering",model=model,tokenizer=tokenizer)
```
1 change: 1 addition & 0 deletions optimum/onnxruntime/__init__.py
@@ -53,6 +53,7 @@ class ORTQuantizableOperator(Enum):
from .modeling_ort import (
ORTModelForCausalLM,
ORTModelForFeatureExtraction,
ORTModelForImageClassification,
ORTModelForQuestionAnswering,
ORTModelForSequenceClassification,
ORTModelForTokenClassification,
1 change: 0 additions & 1 deletion optimum/onnxruntime/configuration.py
@@ -20,7 +20,6 @@
from datasets import Dataset
from packaging.version import Version, parse

from onnxruntime import GraphOptimizationLevel
from onnxruntime import __version__ as ort_version
from onnxruntime.quantization import CalibraterBase, CalibrationMethod, QuantFormat, QuantizationMode, QuantType
from onnxruntime.quantization.calibrate import create_calibrator