Hi,
Thank you very much for your brilliant work on Adan!
From your paper, Adan should reach a lower loss (both train and test) than AdamW according to Figure 1. However, I got a higher training loss with Adan than with AdamW on ViT-H:
Steps | AdamW train loss | Adan train loss |
---|---|---|
200 | 6.9077 | 6.9077 |
400 | 6.9074 | 6.9075 |
600 | 6.9068 | 6.9073 |
800 | 6.9061 | 6.907 |
1000 | 6.905 | 6.9064 |
1200 | 6.9036 | 6.9056 |
1400 | 6.9014 | 6.9044 |
1600 | 6.899 | 6.9028 |
1800 | 6.8953 | 6.9003 |
2000 | 6.8911 | 6.8971 |
2200 | 6.8848 | 6.8929 |
2400 | 6.8789 | 6.8893 |
2600 | 6.8699 | 6.8843 |
2800 | 6.8626 | 6.8805 |
3000 | 6.8528 | 6.8744 |
3200 | 6.8402 | 6.868 |
3400 | 6.8293 | 6.862 |
3600 | 6.8172 | 6.8547 |
3800 | 6.7989 | 6.8465 |
4000 | 6.7913 | 6.8405 |
I used the same HPs as AdamW and only changed the betas from (0.9, 0.999) to (0.9, 0.92, 0.999).
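For context, this is roughly how I construct the optimizer. Only `betas` differs from my AdamW run; the import path and the lr / weight_decay values shown here are placeholders standing in for my actual config, so please correct me if any argument is wrong:

```python
import torch
from adan import Adan  # assumed import path for the Adan optimizer class

model = torch.nn.Linear(768, 1000)  # placeholder for the actual ViT-H model

# Only `betas` is changed vs. my AdamW config; lr and weight_decay are
# placeholder values standing in for the ones I kept from the AdamW run.
optimizer = Adan(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.92, 0.999),
    weight_decay=0.05,
)
```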
I only trained for a few steps to see the trend, but the loss gap from AdamW already seems quite big. Should I change other HPs to make better use of Adan? How can I get a lower loss than AdamW?
I also noticed that Adan prefers a large batch size on vision tasks; should we use a larger batch size?
Or should I simply train for more steps to see the trend?
Thank you!