1 parent a91084e commit a36fc09
README.md
@@ -54,13 +54,13 @@ You can use token-shift in usual QKV self-attention too. I looked at the weights

 p.s. There is a MHA_pro model in this repo with strong performance. Give it a try :)

-# Sampling method
+# The top-a Sampling method

-We also propose a new sampling method (as in src/utils.py):
+We also propose a new sampling method called top-a (as in src/utils.py):

 (1) Find the max probability p_max after softmax.

-(2) Remove all entries whose probability is lower than 0.02 * pow(p_max, 2)
+(2) Remove all entries whose probability is lower than 0.02 * pow(p_max, 2). So it's adaptive, hence "top-a".

 (3) Feel free to tune the 0.02 and 2 factor.
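The top-a steps described in the diff can be sketched as follows. This is a minimal illustration, not the repo's actual code in src/utils.py; the function name `top_a_sampling` and the NumPy-based interface are assumptions, and the 0.02 / 2 factors are the tunable defaults mentioned above.

```python
import numpy as np

def top_a_sampling(logits, ratio=0.02, power=2.0):
    """Sketch of top-a sampling (assumed interface, not the repo's code).

    Entries whose probability is below ratio * p_max**power are removed,
    so the cutoff adapts to how peaked the distribution is."""
    # (1) softmax, then find the max probability p_max
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    p_max = np.max(probs)
    # (2) remove all entries below the adaptive cutoff 0.02 * p_max**2
    cutoff = ratio * np.power(p_max, power)
    probs[probs < cutoff] = 0.0
    # renormalize the surviving entries and sample
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)
```

When the model is confident (large p_max), the cutoff is high and sampling stays near-greedy; when the distribution is flat, the cutoff drops and more candidates survive, which is the adaptivity the name refers to.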