@jdart1 I am currently in the lead-up to my final university exams, so I probably won't have time to implement this for around 2-3 months (it is non-trivial and requires new operations on each backend).
Some thoughts about it, though:
- This seems likely to cause a massive training slowdown (in pos/sec terms) for king-bucketed networks.
- I can only see evidence of improved performance on deep neural networks, in particular transformers (which the Ceres net is).
Ceres chess (https://github.com/dje-dev/Ceres) is using this recently published optimization algorithm: https://arxiv.org/abs/2409.11321. There is a Python implementation. It is reportedly faster and more performant than AdamW.
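For reference, the core update from the linked paper (AdEMAMix) can be sketched in a few lines. This is a minimal, dependency-free sketch written for illustration, not the referenced Python implementation. It shows why the optimizer needs new backend operations: on top of AdamW's two per-parameter buffers it keeps a third one, a slow EMA of the gradients (`beta3`) mixed into the update with weight `alpha`. The hyperparameter values below are illustrative assumptions, and the paper's warmup schedules for `alpha` and `beta3` are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class AdEMAMixState:
    """Per-parameter optimizer state: three buffers instead of AdamW's two."""
    m_fast: list  # fast EMA of gradients (beta1), same as Adam's first moment
    m_slow: list  # slow EMA of gradients (beta3), the extra AdEMAMix buffer
    nu: list      # EMA of squared gradients (beta2), same as Adam's second moment
    step: int = 0


def init_state(n):
    return AdEMAMixState([0.0] * n, [0.0] * n, [0.0] * n)


def ademamix_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
                  beta3=0.9999, alpha=5.0, eps=1e-8, weight_decay=0.0):
    """Update `params` in place. Hyperparameter defaults are illustrative only;
    the alpha/beta3 warmup schedulers from the paper are not implemented."""
    state.step += 1
    t = state.step
    bc1 = 1.0 - beta1 ** t  # bias correction for the fast EMA
    bc2 = 1.0 - beta2 ** t  # bias correction for the second moment
    for i, g in enumerate(grads):
        state.m_fast[i] = beta1 * state.m_fast[i] + (1.0 - beta1) * g
        state.m_slow[i] = beta3 * state.m_slow[i] + (1.0 - beta3) * g
        state.nu[i] = beta2 * state.nu[i] + (1.0 - beta2) * g * g
        m_hat = state.m_fast[i] / bc1
        nu_hat = state.nu[i] / bc2
        # The slow EMA is not bias-corrected; it is added with weight alpha.
        update = (m_hat + alpha * state.m_slow[i]) / (math.sqrt(nu_hat) + eps)
        # Decoupled (AdamW-style) weight decay.
        params[i] -= lr * (update + weight_decay * params[i])
```

With `alpha=0.0` the slow buffer has no effect on the update and the step reduces to AdamW, which makes it easy to sanity-check against an existing AdamW kernel. The extra buffer also means one more read/write per weight on every optimizer step.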