
Commit a36fc09

Update README.md
1 parent a91084e commit a36fc09


1 file changed (+3 -3 lines changed)


README.md

Lines changed: 3 additions & 3 deletions
@@ -54,13 +54,13 @@ You can use token-shift in usual QKV self-attention too. I looked at the weights
 
 p.s. There is a MHA_pro model in this repo with strong performance. Give it a try :)
 
-# Sampling method
+# The top-a Sampling method
 
-We also propose a new sampling method (as in src/utils.py):
+We also propose a new sampling method called top-a (as in src/utils.py):
 
 (1) Find the max probability p_max after softmax.
 
-(2) Remove all entries whose probability is lower than 0.02 * pow(p_max, 2)
+(2) Remove all entries whose probability is lower than 0.02 * pow(p_max, 2). So it's adaptive, hence "top-a".
 
 (3) Feel free to tune the 0.02 and 2 factor.
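The three top-a steps described in the README text above can be sketched as follows. This is a minimal pure-Python illustration, not the code in src/utils.py. The function names `softmax`, `top_a_filter` and `sample_top_a` and the parameter names `ratio` (the 0.02 factor) and `power` (the 2 factor) are hypothetical. Filtered probabilities are renormalized before sampling, which is an assumption of this sketch.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_a_filter(probs, ratio=0.02, power=2.0):
    """Drop entries below ratio * p_max**power, then renormalize.

    ratio and power are hypothetical names for the README's 0.02 and 2 factors.
    """
    p_max = max(probs)                    # step (1): max probability after softmax
    threshold = ratio * p_max ** power    # step (2): adaptive cutoff
    kept = [p if p >= threshold else 0.0 for p in probs]
    # With ratio <= 1 and power >= 1 the cutoff never exceeds p_max,
    # so at least one entry survives and total > 0.
    total = sum(kept)
    return [p / total for p in kept]


def sample_top_a(logits, ratio=0.02, power=2.0, rng=random):
    """Sample one index from logits after top-a filtering (hypothetical helper)."""
    probs = top_a_filter(softmax(logits), ratio, power)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

For `[0.9, 0.09, 0.01]` the cutoff is 0.02 * 0.81 = 0.0162, so the 0.01 entry is dropped. For a flat `[0.25] * 4` the cutoff is only 0.00125, so nothing is dropped. This is the adaptive behaviour: the more confident the distribution, the stricter the filter.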
