Description
I'm training the SAINT algorithm on a massive dataset, but I'm restricted to batch sizes of 32-38: only subsets of the data are highly correlated with each other (rows that share a collection timestamp, which is one of the features), and only those subsets would benefit from intersample attention.
I haven't found anything built into the library for batching rows by such a group. It seems likely that something exists, or that someone has already come up with a solution, since this feels like a natural extension of intersample attention.
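To make the request concrete, here is a rough sketch of the batching I have in mind. It is pure Python and not tied to this library: `TimestampBatchSampler`, its parameters, and the toy `timestamps` list are all hypothetical names made up for illustration. It groups row indices by timestamp and yields one batch per group, splitting any group larger than the cap.

```python
import random
from collections import defaultdict


class TimestampBatchSampler:
    """Yield batches of row indices in which every row shares one timestamp.

    Hypothetical helper for illustration only, not part of any library.
    """

    def __init__(self, timestamps, max_batch_size=38, shuffle=True, seed=None):
        # Collect the row indices belonging to each distinct timestamp.
        groups = defaultdict(list)
        for idx, ts in enumerate(timestamps):
            groups[ts].append(idx)

        # One batch per group; oversized groups are split into chunks
        # so that no batch exceeds max_batch_size.
        self.batches = []
        for indices in groups.values():
            for start in range(0, len(indices), max_batch_size):
                self.batches.append(indices[start:start + max_batch_size])

        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __iter__(self):
        # Shuffle the order of batches, never the membership of a batch.
        order = list(range(len(self.batches)))
        if self.shuffle:
            self.rng.shuffle(order)
        for i in order:
            yield self.batches[i]

    def __len__(self):
        return len(self.batches)


# Toy example: six rows collected at three timestamps.
timestamps = ["t0", "t0", "t1", "t0", "t1", "t2"]
sampler = TimestampBatchSampler(timestamps, shuffle=False)
print(list(sampler))  # [[0, 1, 3], [2, 4], [5]]
```

In plain PyTorch, an object like this can be passed as `batch_sampler=` to `torch.utils.data.DataLoader`, since it yields lists of indices. What I don't know is whether the library's training loop lets me supply my own DataLoader or sampler, which is really the question here.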
Advice would be appreciated!
Thanks.