Skip to content

Parity for stan_variable of CmdStanVB. #636

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tillahoffmann opened this issue Nov 8, 2022 · 10 comments · Fixed by #681
Closed

Parity for stan_variable of CmdStanVB. #636

tillahoffmann opened this issue Nov 8, 2022 · 10 comments · Fixed by #681

Comments

@tillahoffmann
Copy link
Contributor

Calling stan_variable(name) on a CmdStanVB object currently returns a point estimate whereas calling stan_variable(name) on a CmdStanMCMC object returns posterior samples. Would there be interest to update CmdStanVB.stan_variable to return samples from the variational posterior approximation such that variational and MCMC sampling can be swapped in/out easily? This would be a breaking change.

@WardBrian
Copy link
Member

I didn't realize this until I was working on #634. It's definitely an odd inconsistency, but it's not clear to me which one is better. If you think of VI as mainly being useful for initializing something like MCMC (which my impression is that this seems to be how a lot of the Stan community views it) then the point estimate is sort of what you want.
If you view it as a true alternative to sampling, you probably want the draws.

To peak behind the curtain, once Pathfinder lands in CmdStan (stan-dev/stan#3123) I am expecting the general recommendation to be "don't use ADVI without a good reason", so maybe this is not a super critical distinction anyway.

@bob-carpenter
Copy link
Member

@WardBrian: even if we're using it to initialize MCMC, we want to initialize at a draw from the approximate posterior.

You can see this most easily if the target is multivariate normal---then the optimizer finds the mode, the approximation we get is perfect, and we want to initialize with a draw from the posterior, which is a draw from the approximate distribution, not its mode.

The higher-level reason to start with a draw from the posterior is that a Markov chain that preserves the stationary distribution (as ours do) will be stationary throughout if you initialize with a draw from the posterior.

@bob-carpenter
Copy link
Member

This would be a breaking change.

This is tricky. The suggestion from @tillahoffmann is more consistent and how this should've been coded initially.

But given that it was coded inconsistently and released, it might make more sense to just choose a different name going forward and deprecate the old names. I don't like stan_variable anyway. Instead, we can use stan_sample or stan_draws to extract draws and stan_point_estimate or something like that for the VI parameter estimate. Then we can deprecate stan_variable everywhere and use stan_draws consistently going forward. And then eventually just get rid of stan_variable.

for like stan_draws(name) instead of stan_variable(name). Then we can implement stan_draws for both VI and MCMC and deprecate the originals without breaking backward compatibility.

@WardBrian
Copy link
Member

@bob-carpenter with your clarification in mind I’m not sure why if there’s any reason we chose the current behavior, I can’t recall.

I think the name stan_variable is nice because it also makes sense for true point estimates like from optimization, or for generated quantities. To me I read it as “please give me the variable named from the Stan program”.

That aside, the issue is not as simple as introducing a different name, since we also alias the . operator to stan_variable, so fit.xyz is equivalent to fit.stan_variable(“xyz”). If we introduced a new name, we’d either need to leave the existing behavior for . (suboptimal), or change it over (breaking)

@mitzimorris
Copy link
Member

mitzimorris commented Nov 8, 2022

To me I read it as “please give me the variable named from the Stan program”.

that's why the name was chosen. I don't think we should change it now.

the implementation for ADVI is the result of both laziness and ignorance on my part.

a) we were steering people away from ADVI
b) the CSV file format was different enough that it would have required yet another one-off parser
c) I thought that there was some question as to whether or not the sample should or could have been generated differently - perhaps that was my misunderstanding.

@bob-carpenter
Copy link
Member

I don't like the overload stan_variable for different inference techniques. I think of Stan variables as being part of a Stan program. Those don't have values. Instead, I think of the program as specifying a concrete model implementation. Given a model, there are multiple ways to do inference and we provide MCMC, VI, and optimization within Stan. All of these can produce samples---MCMC directly, VI by sampling from the approximating distribution, and optimization by sampling from the Laplace approximation. They can all produce point estimates, too---MCMC lets you take posterior mean or median, VI provides an approximate posterior mean, and optimization as currently coded gives you penalized maximum likelihood (if you could leave the Jacobian correction on, it'd also let you take a max a posteriori estimate).

I included an option for turning the Jacobian adjustment on for Laplace approximation, but we still can't do proper Laplace approximation until the same is done for the optimization step. I opened an issue but haven't started coding it:

stan-dev/stan#3149

With the Jacobian adjustment for constrained parameters turned off, in cases where there are constrained parameters, we get a penalized maximum likelihood estimate (penalized MLE), not a max a posteriori (MAP) estimate. This should be easy to fix---a template parameter just needs to be plumbed down to the optimizer.

@WardBrian
Copy link
Member

Keeping on the original thread, I’m not opposed to doing a major version release with breaking changes. There are a few other breaking changes which would be nice (particularly with things like #634), and I’m sure @mitzimorris can think of some things which would be nice to change that we’ve learned in the past year.

I don’t think very many people are using the things we would be breaking, for example I believe Facebook’s prophet would be unaffected.

We’d want to take some time to think about it and what we feel needs to change, but I don’t see why it would be off the table

@mitzimorris
Copy link
Member

I think of Stan variables as being part of a Stan program.

for multi-dimensional variables in the Stan program, we need an accessor which doesn't flatten the elements into a 1-d array. the use of variable was meant to reflect that. happy to find better names.

@WardBrian
Copy link
Member

I think rstan calls the same idea extract, which I don't mind. It gets around the issue of having to answer extract what (samples, point estimates, etc) by just focusing on the verb.

The . issue would still probably warrant a breaking change, which would (in my opinion) only really be worth it if we went and did some more standardization. For example, stan_variable for optimization (which, to Bob's point, we call CmdStanMLE, which is probably the wrong name for the result of optimization going forward) sometimes returns a ndarray and sometimes returns a float - for consistency I think it should return a 1-d array in the scalar case just for uniform handling. There are a few more things like this in the package.

@tillahoffmann tillahoffmann mentioned this issue Feb 13, 2023
2 tasks
@tillahoffmann
Copy link
Contributor Author

I've opened a draft PR in #652 which addresses this issue pending a decision on naming the method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants