This repository was archived by the owner on Oct 30, 2023. It is now read-only.
Question #15
Closed
Description
Hello guys,
Very nice piece of work.
I was wondering why you didn't use an einsum implementation of the bilinear attention to speed up training.
The equation is a natural fit for it. You should see a significant gain, and it would be nice, for once, to have highly optimized code available on GitHub.
Best,
T.C
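
For readers unfamiliar with the suggestion, here is a minimal sketch of what an einsum formulation of a bilinear attention score could look like. It is not taken from this repository: the function names, the tensor shapes, and the single weight matrix `W` are assumptions made for illustration, and NumPy stands in for whatever framework the project actually uses. The score for query position i and key position j is taken to be q_i^T W k_j, computed for a whole batch in one call.

```python
import numpy as np


def bilinear_scores_einsum(q, W, k):
    """Bilinear attention scores via a single einsum (hypothetical helper).

    q: (batch, n_q, d_q) query features
    W: (d_q, d_k) bilinear weight matrix
    k: (batch, n_k, d_k) key features
    Returns scores of shape (batch, n_q, n_k), where
    scores[b, i, j] = q[b, i] @ W @ k[b, j].
    """
    return np.einsum("bid,de,bje->bij", q, W, k)


def bilinear_scores_matmul(q, W, k):
    """Reference version using explicit matrix products, for comparison."""
    return (q @ W) @ np.swapaxes(k, 1, 2)


# Small random inputs; the shapes are arbitrary and for illustration only.
rng = np.random.default_rng(0)
q = rng.standard_normal((2, 3, 4))
W = rng.standard_normal((4, 5))
k = rng.standard_normal((2, 6, 5))

scores = bilinear_scores_einsum(q, W, k)
```

Both functions compute the same quantity; the einsum version simply states the index contraction directly. Whether it actually trains faster depends on the framework, the hardware, and the tensor sizes, so any gain would need to be measured rather than assumed.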