-
-
Notifications
You must be signed in to change notification settings - Fork 71
Parity for stan_variable
of CmdStanVB
.
#636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I didn't realize this until I was working on #634. It's definitely an odd inconsistency, but it's not clear to me which one is better. If you think of VI as mainly being useful for initializing something like MCMC (which my impression is that this seems to be how a lot of the Stan community views it) then the point estimate is sort of what you want. To peak behind the curtain, once Pathfinder lands in CmdStan (stan-dev/stan#3123) I am expecting the general recommendation to be "don't use ADVI without a good reason", so maybe this is not a super critical distinction anyway. |
@WardBrian: even if we're using it to initialize MCMC, we want to initialize at a draw from the approximate posterior. You can see this most easily if the target is multivariate normal---then the optimizer finds the mode, the approximation we get is perfect, and we want to initialize with a draw from the posterior, which is a draw from the approximate distribution, not its mode. The higher-level reason to start with a draw from the posterior is that a Markov chain that preserves the stationary distribution (as ours do) will be stationary throughout if you initialize with a draw from the posterior. |
This is tricky. The suggestion from @tillahoffmann is more consistent and how this should've been coded initially. But given that it was coded inconsistently and released, it might make more sense to just choose a different name going forward and deprecate the old names. I don't like for like |
@bob-carpenter with your clarification in mind I’m not sure why if there’s any reason we chose the current behavior, I can’t recall. I think the name That aside, the issue is not as simple as introducing a different name, since we also alias the |
that's why the name was chosen. I don't think we should change it now. the implementation for ADVI is the result of both laziness and ignorance on my part. a) we were steering people away from ADVI |
I don't like the overload I included an option for turning the Jacobian adjustment on for Laplace approximation, but we still can't do proper Laplace approximation until the same is done for the optimization step. I opened an issue but haven't started coding it: With the Jacobian adjustment for constrained parameters turned off, in cases where there are constrained parameters, we get a penalized maximum likelihood estimate (penalized MLE), not a max a posteriori (MAP) estimate. This should be easy to fix---a template parameter just needs to be plumbed down to the optimizer. |
Keeping on the original thread, I’m not opposed to doing a major version release with breaking changes. There are a few other breaking changes which would be nice (particularly with things like #634), and I’m sure @mitzimorris can think of some things which would be nice to change that we’ve learned in the past year. I don’t think very many people are using the things we would be breaking, for example I believe Facebook’s prophet would be unaffected. We’d want to take some time to think about it and what we feel needs to change, but I don’t see why it would be off the table |
for multi-dimensional variables in the Stan program, we need an accessor which doesn't flatten the elements into a 1-d array. the use of |
I think The |
I've opened a draft PR in #652 which addresses this issue pending a decision on naming the method. |
Calling
stan_variable(name)
on aCmdStanVB
object currently returns a point estimate whereas callingstan_variable(name)
on aCmdStanMCMC
object returns posterior samples. Would there be interest to updateCmdStanVB.stan_variable
to return samples from the variational posterior approximation such that variational and MCMC sampling can be swapped in/out easily? This would be a breaking change.The text was updated successfully, but these errors were encountered: