Problem with the entropy #2
The entropy of a Gaussian distribution is
k/2 * log(2 * pi * e) + 1/2 * log(|Sigma|) according to Wikipedia, where k is the dimension of the distribution.
However, in the code the entropy is calculated as -1/2 * (log(2*pi + |sigma|) + 1).
Why?
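For reference, a minimal sketch of the Wikipedia formula under a spherical covariance sigma^2 * I, where log(|Sigma|) = k * log(sigma^2), so the total entropy is k times the per-dimension value. The function name and shapes are illustrative, not taken from the repo:

```python
import math
import torch

# Differential entropy of a 1-D Gaussian: 1/2 * log(2*pi*e*sigma^2),
# which equals 1/2 * (log(2*pi*sigma^2) + 1).
# For a k-dim Gaussian with spherical covariance sigma^2 * I,
# log|Sigma| = k * log(sigma^2), so the total is k times the 1-D value.
def gaussian_entropy(sigma_sq: torch.Tensor) -> torch.Tensor:
    """sigma_sq: per-dimension variances, shape [k]."""
    return (0.5 * (torch.log(2 * math.pi * sigma_sq) + 1)).sum()

print(gaussian_entropy(torch.tensor([0.25, 0.25])))  # ~1.45 nats for k=2
```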
It's a multidimensional normal distribution with a spherical covariance. Then you can divide by k to reduce the weight of the entropy loss. The -1 is to make the quantity positive, so that gradient descent will push it close to 0. But you are right, there is still a typo: it should be -1/2 * (log(2*pi * |sigma|) + 1).
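If the typo is the '+' where a '*' was intended, a quick numeric check (with a hypothetical variance of 0.25) shows how much the two expressions differ:

```python
import math

sigma_sq = 0.25  # hypothetical variance
buggy = -0.5 * (math.log(2 * math.pi + sigma_sq) + 1)  # '+' as in the code
fixed = -0.5 * (math.log(2 * math.pi * sigma_sq) + 1)  # '*' after the typo fix
print(buggy)  # ~ -1.44
print(fixed)  # ~ -0.73
```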
@alexis-jacq @xuehy Have you tried the modified entropy? I also found that the original entropy calculation seems wrong, and changed it to @alexis-jacq's version. But the original one seems to perform better, though I'm testing on a different environment (not MuJoCo). I want to know how the modified entropy changes learning in the MuJoCo environment. Unfortunately, I couldn't run MuJoCo because of my Python version...
I have a doubt about using the entropy as well. If we use as the loss the probability density function f of the Gaussian, with mu and sigma^2 estimated by the net, evaluated at the point a corresponding to the executed action, we find as its derivative with respect to sigma^2: f(a) * ((a - mu)^2 / (2*sigma^4) - 1/(2*sigma^2)). If we also add the entropy to the loss (with a minus sign), following the formula mentioned by @alexis-jacq, its derivative with respect to sigma^2 gains the extra term -1/(2*sigma^2), scaled by the constant factor. Since it is suggested to multiply the entropy by a constant factor (1e-4 in Mnih's paper), it seems to me that the contribution of the entropy would be very marginal. Am I missing something?
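To make the magnitude comparison concrete, here is a small autograd sketch under assumed values (sigma^2 = 0.25, a - mu = 1, and the log-density as the likelihood term, which is what A3C-style losses typically use; none of these numbers come from the repo):

```python
import math
import torch

# Hypothetical numbers to compare gradient magnitudes w.r.t. sigma^2:
# the log-density term versus the entropy term scaled by 1e-4.
sigma_sq = torch.tensor(0.25, requires_grad=True)
mu, action = torch.tensor(0.0), torch.tensor(1.0)

log_prob = -0.5 * torch.log(2 * math.pi * sigma_sq) \
           - (action - mu) ** 2 / (2 * sigma_sq)
entropy = 0.5 * (torch.log(2 * math.pi * sigma_sq) + 1)

(g_logp,) = torch.autograd.grad(log_prob, sigma_sq)
(g_ent,) = torch.autograd.grad(entropy, sigma_sq)
print(g_logp.item())        # 6.0  = (a-mu)^2/(2*sigma^4) - 1/(2*sigma^2)
print(1e-4 * g_ent.item())  # 2e-4 = 1e-4 * 1/(2*sigma^2)
```

With these numbers the likelihood gradient is 6.0 while the scaled entropy gradient is 2e-4, consistent with the intuition that a 1e-4 coefficient makes the entropy contribution nearly negligible for moderate variances.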