Feature Request: Support for ONNX backend for CrossEncoders. #3039
Comments
Hello! Thanks for the suggestion. Since I took over this project, I have made various improvements to
Hello @tomaarsen, is there any update on this? I am not able to see any ONNX models on Hugging Face. Could you please add the ONNX models for these?
Hello! I wrote an implementation for ONNX and OpenVINO support last Friday, and I'll be benchmarking and testing it this week. I'm aiming to release v4.1 as soon as possible.
I could test this performance-wise on a consumer GPU if there is a pre-release available, @tomaarsen. Anyway, thanks for your great work on this lib and the very clean v4 release!
You're in luck! I've prepared my Pull Request here: #3319
Wow, your response time doesn't need a speedup; it's definitely SOTA! 🥇 |
That's right! Sometimes bfloat16 is marginally faster (e.g. 1-2%), but it can also perform slightly worse. ONNX can also be slightly faster under certain settings, but fp16 outperformed ONNX on average.
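Differences of 1-2% are easy to mistake for measurement noise, so a repeatable timing harness helps when comparing backends or dtypes. This is a generic stdlib sketch, not the benchmark used for the numbers above; `predict_fn` stands in for any model's predict/encode callable:

```python
import time

def mean_latency(predict_fn, batch, repeats=20, warmup=3):
    """Return the mean seconds per call of predict_fn(batch).

    Warmup calls are excluded so one-time costs (graph compilation,
    weight loading, caches) don't skew the average.
    """
    for _ in range(warmup):
        predict_fn(batch)
    start = time.perf_counter()
    for _ in range(repeats):
        predict_fn(batch)
    return (time.perf_counter() - start) / repeats

# Usage idea: compare two configurations on the same batch, e.g.
#   mean_latency(fp16_model.predict, pairs)
#   mean_latency(onnx_model.predict, pairs)
```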
Recently, I noticed that the `SentenceTransformers` class has gained the ability to use the ONNX backend, which is incredibly beneficial for enhancing performance, especially on CPUs. I would like to request a similar feature for the `CrossEncoder` class. Adding support for the ONNX backend in `CrossEncoder` would be a significant enhancement: it would greatly accelerate reranking tasks on CPU, making the library even more powerful and efficient.

Here are some potential benefits:

- Both the `SentenceTransformers` and `CrossEncoder` classes can leverage the same performance optimizations.

Thank you for considering this feature request.