
[backend] Add ONNX & OpenVINO support for Cross Encoder (reranker) models #3319


Merged · 5 commits · Apr 15, 2025
docs/cross_encoder/usage/efficiency.rst (602 changes: 602 additions & 0 deletions)

Large diffs are not rendered by default.

docs/cross_encoder/usage/usage.rst (3 changes: 2 additions & 1 deletion)

@@ -73,4 +73,5 @@ Once you have `installed <../../installation.html>`_ Sentence Transformers, you
:caption: Tasks

Cross-Encoder vs Bi-Encoder <../../../examples/cross_encoder/applications/README>
-../../../examples/sentence_transformer/applications/retrieve_rerank/README
+../../../examples/sentence_transformer/applications/retrieve_rerank/README
+efficiency
Binary file added docs/img/ce_backends_benchmark_cpu.png
Binary file added docs/img/ce_backends_benchmark_gpu.png
docs/sentence_transformer/usage/efficiency.rst (20 changes: 10 additions & 10 deletions)
@@ -132,9 +132,9 @@ Optimizing ONNX Models

.. include:: backend_export_sidebar.rst

-ONNX models can be optimized using Optimum, allowing for speedups on CPUs and GPUs alike. To do this, you can use the :func:`~sentence_transformers.backend.export_optimized_onnx_model` function, which saves the optimized model in a directory or model repository that you specify. It expects:
+ONNX models can be optimized using `Optimum <https://huggingface.co/docs/optimum/index>`_, allowing for speedups on CPUs and GPUs alike. To do this, you can use the :func:`~sentence_transformers.backend.export_optimized_onnx_model` function, which saves the optimized model in a directory or model repository that you specify. It expects:

-- ``model``: a Sentence Transformer model loaded with the ONNX backend.
+- ``model``: a Sentence Transformer or Cross Encoder model loaded with the ONNX backend.
- ``optimization_config``: ``"O1"``, ``"O2"``, ``"O3"``, or ``"O4"`` representing optimization levels from :class:`~optimum.onnxruntime.AutoOptimizationConfig`, or an :class:`~optimum.onnxruntime.OptimizationConfig` instance.
- ``model_name_or_path``: a path to save the optimized model file, or the repository name if you want to push it to the Hugging Face Hub.
- ``push_to_hub``: (Optional) a boolean to push the optimized model to the Hugging Face Hub.
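For reference, a minimal sketch of how this applies to the newly supported Cross Encoder models, assuming the top-level import path for the export function; the reranker checkpoint and output path are illustrative, not taken from this PR:

```python
from sentence_transformers import CrossEncoder, export_optimized_onnx_model

# Load a reranker with the ONNX backend (illustrative checkpoint name)
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", backend="onnx")

# Apply O3 graph optimizations and save the result to a local directory
export_optimized_onnx_model(
    model,
    optimization_config="O3",
    model_name_or_path="ms-marco-MiniLM-L6-v2-onnx-o3",
)
```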
@@ -204,9 +204,9 @@ Quantizing ONNX Models

.. include:: backend_export_sidebar.rst

-ONNX models can be quantized to int8 precision using Optimum, allowing for faster inference on CPUs. To do this, you can use the :func:`~sentence_transformers.backend.export_dynamic_quantized_onnx_model` function, which saves the quantized model in a directory or model repository that you specify. Dynamic quantization, unlike static quantization, does not require a calibration dataset. It expects:
+ONNX models can be quantized to int8 precision using `Optimum <https://huggingface.co/docs/optimum/index>`_, allowing for faster inference on CPUs. To do this, you can use the :func:`~sentence_transformers.backend.export_dynamic_quantized_onnx_model` function, which saves the quantized model in a directory or model repository that you specify. Dynamic quantization, unlike static quantization, does not require a calibration dataset. It expects:

-- ``model``: a Sentence Transformer model loaded with the ONNX backend.
+- ``model``: a Sentence Transformer or Cross Encoder model loaded with the ONNX backend.
- ``quantization_config``: ``"arm64"``, ``"avx2"``, ``"avx512"``, or ``"avx512_vnni"`` representing quantization configurations from :class:`~optimum.onnxruntime.AutoQuantizationConfig`, or an :class:`~optimum.onnxruntime.QuantizationConfig` instance.
- ``model_name_or_path``: a path to save the quantized model file, or the repository name if you want to push it to the Hugging Face Hub.
- ``push_to_hub``: (Optional) a boolean to push the quantized model to the Hugging Face Hub.
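As above, a minimal sketch for a Cross Encoder; the checkpoint and output names are illustrative, and ``avx512`` is just one of the listed quantization configurations:

```python
from sentence_transformers import CrossEncoder, export_dynamic_quantized_onnx_model

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", backend="onnx")

# int8 dynamic quantization for AVX512 CPUs; no calibration dataset is needed
export_dynamic_quantized_onnx_model(
    model,
    quantization_config="avx512",
    model_name_or_path="ms-marco-MiniLM-L6-v2-onnx-int8",
)
```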
@@ -329,15 +329,15 @@ Quantizing OpenVINO Models

.. include:: backend_export_sidebar.rst

-OpenVINO models can be quantized to int8 precision using Optimum Intel to speed up inference.
+OpenVINO models can be quantized to int8 precision using `Optimum Intel <https://huggingface.co/docs/optimum/main/en/intel/index>`_ to speed up inference.
To do this, you can use the :func:`~sentence_transformers.backend.export_static_quantized_openvino_model` function,
which saves the quantized model in a directory or model repository that you specify.
Post-Training Static Quantization expects:

-- ``model``: a Sentence Transformer model loaded with the OpenVINO backend.
+- ``model``: a Sentence Transformer or Cross Encoder model loaded with the OpenVINO backend.
- ``quantization_config``: (Optional) The quantization configuration. This parameter accepts either:
-  ``None`` for the default 8-bit quantization, a dictionary representing quantization configurations, or
-  an :class:`~optimum.intel.OVQuantizationConfig` instance.
+  ``None`` for the default 8-bit quantization, a dictionary representing quantization configurations, or
+  an :class:`~optimum.intel.OVQuantizationConfig` instance.
- ``model_name_or_path``: a path to save the quantized model file, or the repository name if you want to push it to the Hugging Face Hub.
- ``dataset_name``: (Optional) The name of the dataset to load for calibration. If not specified, defaults to ``sst2`` subset from the ``glue`` dataset.
- ``dataset_config_name``: (Optional) The specific configuration of the dataset to load.
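A minimal sketch with the default settings, where ``quantization_config=None`` falls back to 8-bit quantization calibrated on the ``sst2`` subset of ``glue``, per the parameters above; the checkpoint and output names are illustrative:

```python
from sentence_transformers import CrossEncoder, export_static_quantized_openvino_model

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", backend="openvino")

# Post-training static int8 quantization with the default configuration
export_static_quantized_openvino_model(
    model,
    quantization_config=None,
    model_name_or_path="ms-marco-MiniLM-L6-v2-openvino-qint8",
)
```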
@@ -541,8 +541,8 @@ Based on the benchmarks, this flowchart should help you decide which backend to
}
}}%%
graph TD
-A(What is your hardware?) -->|GPU| B(Is your text usually smaller than 500 characters?)
-A -->|CPU| C(Is a 0.4% accuracy loss acceptable?)
+A(What is your hardware?) -->|GPU| B(Is your text usually smaller<br>than 500 characters?)
+A -->|CPU| C(Is a 0.4% accuracy loss<br>acceptable?)
B -->|yes| D[onnx-O4]
B -->|no| F[float16]
C -->|yes| G[openvino-qint8]
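To act on the flowchart's recommendation, loading an exported model looks roughly like this; the ``file_name`` values assume the default suffixes used by the export functions above and may differ in practice:

```python
from sentence_transformers import SentenceTransformer

# CPU path from the flowchart: openvino-qint8
model = SentenceTransformer(
    "path/to/model",
    backend="openvino",
    model_kwargs={"file_name": "openvino/openvino_model_qint8_quantized.xml"},
)

# GPU path with short texts: onnx-O4
# model = SentenceTransformer(
#     "path/to/model",
#     backend="onnx",
#     model_kwargs={"file_name": "onnx/model_O4.onnx"},
# )
```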
docs/sentence_transformer/usage/usage.rst (2 changes: 1 addition & 1 deletion)

@@ -56,6 +56,6 @@ Once you have `installed <../../installation.html>`_ Sentence Transformers, you
../../../examples/sentence_transformer/applications/parallel-sentence-mining/README
../../../examples/sentence_transformer/applications/image-search/README
../../../examples/sentence_transformer/applications/embedding-quantization/README
-efficiency
custom_models
+efficiency
