
[backend] Add ONNX & OpenVINO support for Cross Encoder (reranker) models #3319


Merged
merged 5 commits into from
Apr 15, 2025

Conversation

tomaarsen
Collaborator

Hello!

Pull Request overview

  • Add ONNX & OpenVINO support for Cross Encoder (reranker) models
  • Add CrossEncoder support to the model optimization functions to optimize or quantize models with ONNX or OpenVINO.
  • Add documentation for speeding up inference for Cross Encoder models
  • Update the Sentence Transformer docs for speeding up inference to fix the mermaid graph

Details

[Benchmark figures: ce_backends_benchmark_cpu, ce_backends_benchmark_gpu]

Usage is straightforward:

from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", backend="onnx")

# Verify that everything works as expected
query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)

This will 1) check whether an ONNX model already exists in the model repository or local path, and 2) if not, export one.
If a model is exported, it's recommended to save it with model.save_pretrained() to avoid re-exporting it on every load.

  • Tom Aarsen

@tomaarsen
Collaborator Author

Many of the original cross-encoder models have had ONNX (normal, optimized, quantized) and OpenVINO (normal, static quantized) variants uploaded: https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2/tree/main/onnx

  • Tom Aarsen

Contributor

@Copilot Copilot AI left a comment


Copilot reviewed 5 out of 8 changed files in this pull request and generated 1 comment.

Files not reviewed (3)
  • docs/cross_encoder/usage/usage.rst: Language not supported
  • docs/sentence_transformer/usage/efficiency.rst: Language not supported
  • docs/sentence_transformer/usage/usage.rst: Language not supported
Comments suppressed due to low confidence (1)

sentence_transformers/cross_encoder/CrossEncoder.py:445

  • The parameter 'is_local' is annotated as a string but represents a boolean flag. Update its type annotation to 'bool' for clarity and consistency.
def _backend_warn_to_save(self, model_name_or_path: str, is_local: str, backend_name: str) -> None:

@tomaarsen tomaarsen merged commit f604c67 into UKPLab:master Apr 15, 2025
1 of 9 checks passed
Development

Successfully merging this pull request may close these issues.

Feature Request: Support for ONNX backend for CrossEncoders.