Parity for `stan_variable` of `CmdStanVB`. #636

tillahoffmann · 2022-11-08T15:36:38Z

Calling stan_variable(name) on a CmdStanVB object currently returns a point estimate whereas calling stan_variable(name) on a CmdStanMCMC object returns posterior samples. Would there be interest to update CmdStanVB.stan_variable to return samples from the variational posterior approximation such that variational and MCMC sampling can be swapped in/out easily? This would be a breaking change.

The text was updated successfully, but these errors were encountered:

WardBrian · 2022-11-08T15:43:06Z

I didn't realize this until I was working on #634. It's definitely an odd inconsistency, but it's not clear to me which one is better. If you think of VI as mainly being useful for initializing something like MCMC (which my impression is that this seems to be how a lot of the Stan community views it) then the point estimate is sort of what you want.
If you view it as a true alternative to sampling, you probably want the draws.

To peak behind the curtain, once Pathfinder lands in CmdStan (stan-dev/stan#3123) I am expecting the general recommendation to be "don't use ADVI without a good reason", so maybe this is not a super critical distinction anyway.

bob-carpenter · 2022-11-08T20:10:21Z

@WardBrian: even if we're using it to initialize MCMC, we want to initialize at a draw from the approximate posterior.

You can see this most easily if the target is multivariate normal---then the optimizer finds the mode, the approximation we get is perfect, and we want to initialize with a draw from the posterior, which is a draw from the approximate distribution, not its mode.

The higher-level reason to start with a draw from the posterior is that a Markov chain that preserves the stationary distribution (as ours do) will be stationary throughout if you initialize with a draw from the posterior.

bob-carpenter · 2022-11-08T20:16:12Z

This would be a breaking change.

This is tricky. The suggestion from @tillahoffmann is more consistent and how this should've been coded initially.

But given that it was coded inconsistently and released, it might make more sense to just choose a different name going forward and deprecate the old names. I don't like stan_variable anyway. Instead, we can use stan_sample or stan_draws to extract draws and stan_point_estimate or something like that for the VI parameter estimate. Then we can deprecate stan_variable everywhere and use stan_draws consistently going forward. And then eventually just get rid of stan_variable.

for like stan_draws(name) instead of stan_variable(name). Then we can implement stan_draws for both VI and MCMC and deprecate the originals without breaking backward compatibility.

WardBrian · 2022-11-08T20:21:20Z

@bob-carpenter with your clarification in mind I’m not sure why if there’s any reason we chose the current behavior, I can’t recall.

I think the name stan_variable is nice because it also makes sense for true point estimates like from optimization, or for generated quantities. To me I read it as “please give me the variable named from the Stan program”.

That aside, the issue is not as simple as introducing a different name, since we also alias the . operator to stan_variable, so fit.xyz is equivalent to fit.stan_variable(“xyz”). If we introduced a new name, we’d either need to leave the existing behavior for . (suboptimal), or change it over (breaking)

mitzimorris · 2022-11-08T20:31:40Z

To me I read it as “please give me the variable named from the Stan program”.

that's why the name was chosen. I don't think we should change it now.

the implementation for ADVI is the result of both laziness and ignorance on my part.

a) we were steering people away from ADVI
b) the CSV file format was different enough that it would have required yet another one-off parser
c) I thought that there was some question as to whether or not the sample should or could have been generated differently - perhaps that was my misunderstanding.

bob-carpenter · 2022-11-08T20:37:37Z

I don't like the overload stan_variable for different inference techniques. I think of Stan variables as being part of a Stan program. Those don't have values. Instead, I think of the program as specifying a concrete model implementation. Given a model, there are multiple ways to do inference and we provide MCMC, VI, and optimization within Stan. All of these can produce samples---MCMC directly, VI by sampling from the approximating distribution, and optimization by sampling from the Laplace approximation. They can all produce point estimates, too---MCMC lets you take posterior mean or median, VI provides an approximate posterior mean, and optimization as currently coded gives you penalized maximum likelihood (if you could leave the Jacobian correction on, it'd also let you take a max a posteriori estimate).

I included an option for turning the Jacobian adjustment on for Laplace approximation, but we still can't do proper Laplace approximation until the same is done for the optimization step. I opened an issue but haven't started coding it:

stan-dev/stan#3149

With the Jacobian adjustment for constrained parameters turned off, in cases where there are constrained parameters, we get a penalized maximum likelihood estimate (penalized MLE), not a max a posteriori (MAP) estimate. This should be easy to fix---a template parameter just needs to be plumbed down to the optimizer.

WardBrian · 2022-11-08T20:42:15Z

Keeping on the original thread, I’m not opposed to doing a major version release with breaking changes. There are a few other breaking changes which would be nice (particularly with things like #634), and I’m sure @mitzimorris can think of some things which would be nice to change that we’ve learned in the past year.

I don’t think very many people are using the things we would be breaking, for example I believe Facebook’s prophet would be unaffected.

We’d want to take some time to think about it and what we feel needs to change, but I don’t see why it would be off the table

mitzimorris · 2022-11-08T20:55:39Z

I think of Stan variables as being part of a Stan program.

for multi-dimensional variables in the Stan program, we need an accessor which doesn't flatten the elements into a 1-d array. the use of variable was meant to reflect that. happy to find better names.

WardBrian · 2022-11-08T21:12:15Z

I think rstan calls the same idea extract, which I don't mind. It gets around the issue of having to answer extract what (samples, point estimates, etc) by just focusing on the verb.

The . issue would still probably warrant a breaking change, which would (in my opinion) only really be worth it if we went and did some more standardization. For example, stan_variable for optimization (which, to Bob's point, we call CmdStanMLE, which is probably the wrong name for the result of optimization going forward) sometimes returns a ndarray and sometimes returns a float - for consistency I think it should return a 1-d array in the scalar case just for uniform handling. There are a few more things like this in the package.

tillahoffmann · 2023-02-13T23:08:57Z

I've opened a draft PR in #652 which addresses this issue pending a decision on naming the method.

tillahoffmann mentioned this issue Feb 13, 2023

Variational samples #652

Closed

2 tasks

WardBrian linked a pull request Aug 15, 2023 that will close this issue

Stan 2.33: Move IO munging to external package, refactors #681

Merged

2 tasks

WardBrian closed this as completed in #681 Sep 5, 2023

WardBrian mentioned this issue Sep 5, 2023

Planning the next CmdStanPy release #690

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parity for `stan_variable` of `CmdStanVB`. #636

Parity for `stan_variable` of `CmdStanVB`. #636

tillahoffmann commented Nov 8, 2022

WardBrian commented Nov 8, 2022

bob-carpenter commented Nov 8, 2022

bob-carpenter commented Nov 8, 2022

WardBrian commented Nov 8, 2022

mitzimorris commented Nov 8, 2022 •

edited

Loading

bob-carpenter commented Nov 8, 2022

WardBrian commented Nov 8, 2022

mitzimorris commented Nov 8, 2022

WardBrian commented Nov 8, 2022

tillahoffmann commented Feb 13, 2023

Parity for stan_variable of CmdStanVB. #636

Parity for stan_variable of CmdStanVB. #636

Comments

tillahoffmann commented Nov 8, 2022

WardBrian commented Nov 8, 2022

bob-carpenter commented Nov 8, 2022

bob-carpenter commented Nov 8, 2022

WardBrian commented Nov 8, 2022

mitzimorris commented Nov 8, 2022 • edited Loading

bob-carpenter commented Nov 8, 2022

WardBrian commented Nov 8, 2022

mitzimorris commented Nov 8, 2022

WardBrian commented Nov 8, 2022

tillahoffmann commented Feb 13, 2023

Parity for `stan_variable` of `CmdStanVB`. #636

Parity for `stan_variable` of `CmdStanVB`. #636

mitzimorris commented Nov 8, 2022 •

edited

Loading