SOAP optimizer #367

jdart1 · 2025-04-08T14:58:54Z

Ceres chess (https://github.com/dje-dev/Ceres) is using this recently published optimization algorithm: https://arxiv.org/abs/2409.11321. There is a Python implementation. It reportedly trains faster and performs better than AdamW.

jw1912 · 2025-04-08T16:39:50Z

@jdart1 I am currently in the lead-up to final university exams, so I probably will not have time to implement this for around 2-3 months (it is non-trivial and requires new operations on each backend).
Some thoughts about it though:

- This seems likely to be an insane training slowdown (in pos/sec terms) for king-bucketed networks.
- I can only see evidence for its improved performance on deep neural networks, in particular transformers (as the Ceres net is).
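
As a rough illustration of the slowdown concern, here is a back-of-the-envelope sketch, not a benchmark. It assumes SOAP as described in the linked paper: for each `rows x cols` weight matrix it keeps two Gram-matrix accumulators plus their eigenvector matrices on top of the usual Adam moments, and rotates the gradient, momentum, and update through those bases with dense matrix products every step. The function name, the factor of 4, and the layer shapes are hypothetical choices for illustration. The estimate ignores the periodic eigenbasis refresh and any cheaper variants of the algorithm.

```python
def soap_overhead(rows: int, cols: int) -> dict:
    """Naive estimate of SOAP's extra cost for one rows x cols weight matrix.

    Assumes a dense, unoptimized implementation; real code may do better.
    """
    # AdamW keeps a first and second moment per weight.
    adam_state = 2 * rows * cols
    # SOAP additionally keeps L (rows x rows), R (cols x cols)
    # and their eigenvector matrices Q_L and Q_R.
    soap_extra_state = 2 * (rows * rows + cols * cols)
    # Multiply-adds to push one matrix through both bases: Q_L^T @ X @ Q_R.
    rotation_macs = rows * rows * cols + rows * cols * cols
    # Assumed per step: rotate gradient, rotate momentum, rotate the update
    # back, and update L and R (G @ G^T and G^T @ G cost one rotation's worth).
    per_step_macs = 4 * rotation_macs
    return {
        "adam_state": adam_state,
        "soap_extra_state": soap_extra_state,
        "per_step_macs": per_step_macs,
    }


if __name__ == "__main__":
    # Hypothetical shapes: a king-bucketed input layer (768 features x 16
    # buckets feeding 1024 hidden units) versus a small square hidden layer.
    for name, shape in [("bucketed input", (768 * 16, 1024)),
                        ("hidden", (1024, 1024))]:
        cost = soap_overhead(*shape)
        ratio = cost["soap_extra_state"] / cost["adam_state"]
        print(f"{name}: extra state is {ratio:.1f}x the Adam state, "
              f"{cost['per_step_macs']:.2e} multiply-adds per step")
```

Under these assumptions the extra state and work grow with the square of the larger dimension, which is why a very wide input layer would be hit much harder than the roughly square matrices found in a transformer.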

jdart1 · 2025-04-08T17:13:17Z

It is only a suggestion. If it is less efficient then there is no point in implementing it.

cosmobobak · 2025-04-18T14:44:42Z

On the topic of alternate optimisers, there's also Muon.

jdart1 added the enhancement New feature or request label Apr 8, 2025
