Feature Request: Support for ONNX backend for CrossEncoders. #3039


Closed
SupreethRao99 opened this issue Nov 6, 2024 · 7 comments · Fixed by #3319

Comments

@SupreethRao99

Recently, I noticed that the SentenceTransformer class has gained the ability to use the ONNX backend, which is incredibly beneficial for enhancing performance, especially on CPUs.

I would like to request a similar feature for the CrossEncoder class. Adding support for the ONNX backend in CrossEncoder would be a significant enhancement. It would greatly accelerate reranking tasks on CPU, making the library even more powerful and efficient.

Here are some potential benefits:

  • Improved Performance: Faster inference times on CPU, useful when GPUs are not available.
  • Scalability: Ability to handle larger reranking workloads with reduced latency.
  • Consistency: Ensuring that both the SentenceTransformer and CrossEncoder classes can leverage the same performance optimizations.
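
For reference, the existing API on the SentenceTransformer side looks roughly like the sketch below (the model name and sentences are illustrative, and an ONNX runtime needs to be installed):

```python
from sentence_transformers import SentenceTransformer

# Existing feature on the SentenceTransformer side: selecting the ONNX backend
# at load time. Requires onnxruntime (e.g. pip install sentence-transformers[onnx]).
# The model name and sentences below are illustrative.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", backend="onnx")

embeddings = model.encode([
    "ONNX can speed up inference on CPU.",
    "CrossEncoder reranking could benefit from the same option.",
])
print(embeddings.shape)
```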

Thank you for considering this feature request.

@tomaarsen
Collaborator

Hello!

Thanks for the suggestion. Since I took over this project, I have made various improvements to SentenceTransformer models, such as multi-GPU training, bf16, loss logging, new backends, etc. My intention is to spend some time, starting next week, extending these improvements to CrossEncoder, on both the training and the inference side. That will include adding ONNX/OV backends to CrossEncoder.
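
To illustrate the direction (a sketch only, assuming CrossEncoder adopts the same backend argument that SentenceTransformer already exposes; the final API may differ):

```python
from sentence_transformers import CrossEncoder

# Sketch only: assumes CrossEncoder gains the same backend argument as
# SentenceTransformer. The model name and sentence pairs are illustrative.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", backend="onnx")

scores = model.predict([
    ("what is onnx", "ONNX is an open format for representing machine learning models."),
    ("what is onnx", "The weather is nice today."),
])
print(scores)
```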

  • Tom Aarsen

@arjungandeeva

Hello @tomaarsen, is there any update on this? I am not able to see any ONNX models on Hugging Face for:
https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2
https://huggingface.co/cross-encoder/ms-marco-MiniLM-L4-v2

Could you please add ONNX models for these?

@tomaarsen
Collaborator

Hello!

I wrote an implementation for ONNX and OpenVINO support last Friday, and I'll be benchmarking and testing it this week.
It will be included in the next v4.1 release, and all existing models on the cross-encoder organization on Hugging Face will have ONNX models uploaded.

I'm aiming to release v4.1 as soon as possible.

  • Tom Aarsen

@toniopelo

I could test this performance-wise on a consumer GPU if there is a pre-release available, @tomaarsen.
I have a large dataset to score with CrossEncoder and would greatly benefit from this to speed up the process.
Do you have a rough idea of the expected speedup with the ONNX backend now?

Anyway, thanks for your great work on this lib and the very clean v4 release!

@tomaarsen
Collaborator

You're in luck! I've prepared my Pull Request here: #3319
It contains two pictures that detail the expected average speedup on both GPUs and CPUs. You can also already install this branch and use it if you'd like, but the documentation on how to use it isn't on https://sbert.net yet (only in the PR itself). The full v4.1 release with this feature should be published next week.

  • Tom Aarsen

@toniopelo

toniopelo commented Apr 12, 2025

Wow, your response time doesn't need a speedup, it's definitely SOTA! 🥇
The pictures are great for getting an idea of the speedup. If I understand correctly, there is nothing faster than torch fp16 on GPU?
If so, and I am already running with model_kwargs={"torch_dtype": "float16"}, then I have nothing to gain speedup-wise from another backend like ONNX, am I right?

@tomaarsen
Collaborator

That's right! Sometimes bfloat16 is marginally faster (e.g. 1-2%), but it can also perform slightly worse.

ONNX can also be slightly faster under certain settings, but fp16 outperformed ONNX on average.
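
For reference, the fp16 setup discussed above looks roughly like this (a minimal sketch; the model name and pairs are illustrative):

```python
from sentence_transformers import CrossEncoder

# fp16 on GPU: per the discussion above, this is typically the fastest option
# for CrossEncoder inference. The model name and pairs are illustrative.
model = CrossEncoder(
    "cross-encoder/ms-marco-MiniLM-L6-v2",
    device="cuda",
    model_kwargs={"torch_dtype": "float16"},
)

scores = model.predict([
    ("how to speed up reranking", "Running the model in fp16 reduces inference latency on GPU."),
    ("how to speed up reranking", "Bananas are rich in potassium."),
])
print(scores)
```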
