Problem with the entropy #2


Open · xuehy opened this issue Jul 31, 2017 · 3 comments

xuehy commented Jul 31, 2017

The entropy of a Gaussian distribution is
k/2 * log(2 * pi * e) + 1/2 * log(|Sigma|) according to Wikipedia, where k is the dimension of the distribution.

However, in the code the entropy is calculated as -1/2 * (log(2*pi + sigma_sq) + 1).

Why?
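
For reference, the two expressions can be compared numerically (a quick sketch of my own, using NumPy; not code from this repo):

```python
import numpy as np

sigma_sq = 0.25  # an example variance for one action dimension

# per-dimension differential entropy of N(mu, sigma_sq), textbook formula
textbook = 0.5 * (np.log(2 * np.pi * sigma_sq) + 1)

# what the current code computes for that dimension (note the '+' inside the log)
in_code = -0.5 * (np.log(2 * np.pi + sigma_sq) + 1)

print(textbook, in_code)  # the second is not simply the negative of the first
```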

alexis-jacq commented Aug 4, 2017

It's a multidimensional normal distribution with a spherical covariance:
you have Nd_Sigma = 1d_sigma^2 * Nd_Identity, so |Nd_Sigma| = 1d_sigma^(2*k).

Then, k/2 * log(2 * pi * e) + 1/2 * log(|Sigma|) = k/2 * log(2*pi*e*1d_sigma^2) = k/2 * (log(2*pi*1d_sigma^2) + 1).

Afterwards, you can divide by k to reduce the weight of the entropy loss. The -1 factor is there to make the quantity positive, so that gradient descent drives it close to 0.
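
Numerically, the reduction checks out (a small sketch of mine, assuming NumPy; not from the repo):

```python
import numpy as np

k, sigma = 4, 0.5                       # dimension and 1d standard deviation
Sigma = sigma**2 * np.eye(k)            # spherical covariance

# full multivariate formula vs. the reduced spherical form
full = 0.5 * k * np.log(2 * np.pi * np.e) + 0.5 * np.log(np.linalg.det(Sigma))
reduced = 0.5 * k * (np.log(2 * np.pi * sigma**2) + 1)

print(full, reduced)                    # identical up to floating-point error
```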

But you are right, there is still a typo. It should be
entropy = -0.5*((sigma_sq * 2*pi.expand_as(sigma_sq)).log()+1)
instead of
entropy = -0.5*((sigma_sq + 2*pi.expand_as(sigma_sq)).log()+1)
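
One way to sanity-check the corrected line (my own sketch, assuming a recent PyTorch with torch.distributions; the repo itself may use older Variable-style code):

```python
import math
import torch
from torch.distributions import Normal

sigma_sq = torch.tensor([0.04, 0.25, 1.0])          # example per-dimension variances
pi = torch.tensor([math.pi])

# corrected entropy term, as in the fixed line above
entropy = -0.5 * ((sigma_sq * 2 * pi.expand_as(sigma_sq)).log() + 1)

# closed-form entropy of a univariate normal: 0.5 * (log(2*pi*sigma_sq) + 1)
reference = Normal(torch.zeros_like(sigma_sq), sigma_sq.sqrt()).entropy()

print(entropy)       # equals -reference for every dimension
print(-reference)
```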

kkjh0723 commented Sep 8, 2017

@alexis-jacq @xuehy Have you tried the modified entropy? I also found that the original entropy calculation seems wrong and changed it to @alexis-jacq's version. But the original one seems to perform better, though I'm testing on a different environment (not MuJoCo). I want to know how the modified entropy changes learning in the MuJoCo environments. Unfortunately, I couldn't run MuJoCo because of my Python version...

giubacchio commented Feb 21, 2018

I have a doubt about using the entropy as well. If we use as loss the negative log of the Gaussian probability density, with u and sigma squared estimated by the net, evaluated at the executed action, its derivative with respect to sigma squared is:
dL/dsigma_sq = 1/(2*sigma_sq) - (x-u)^2/(2*sigma_sq^2)

If we also add the entropy to the loss (with a minus sign), following the formula mentioned by @alexis-jacq, its derivative with respect to sigma squared would be:
d(-E)/dsigma_sq = -1/(2*sigma_sq)
which is equivalent (apart from the minus sign) to the first term of dL/dsigma_sq.
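
That derivative is easy to confirm with autograd (a quick sketch of mine, assuming PyTorch >= 1.10 for torch.pi; not from the repo):

```python
import torch

sigma_sq = torch.tensor(0.25, requires_grad=True)

# negative entropy term per dimension: -E = -0.5 * (log(2*pi*sigma_sq) + 1)
neg_entropy = -0.5 * ((2 * torch.pi * sigma_sq).log() + 1)
neg_entropy.backward()

print(sigma_sq.grad)                  # tensor(-2.), i.e. -1/(2*sigma_sq)
print(-1.0 / (2 * sigma_sq.item()))   # analytic value: -2.0
```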

Since it is suggested to multiply the entropy by a constant factor (1e-4 in Mnih's paper), it seems to me that the contribution of the entropy would be very marginal.

Am I missing something?
Thanks
