Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stan 2.33: Move IO munging to external package, refactors #681

Merged
merged 20 commits into from
Sep 5, 2023

Conversation

WardBrian
Copy link
Member

@WardBrian WardBrian commented Jul 25, 2023

Submission Checklist

  • Run unit tests
  • Declare copyright holder and open-source license: see below

Summary

IO changes for 2.33 (tuple support).

  • use https://github.com/WardBrian/stanio (tbd: will this move under stan-dev?)
    • support writing tuples to json
    • support reading/extracting tuples from draws array
  • breaking changes:
    • internals of InferenceMetadata changed (not technically part of our public API, but it was mentioned some places in our docs)
    • VB's .stan_variable function now returns the varational draws, not the mean approximation (same as Variational samples #652)
    • MLE and VB always return a np.ndarray from .stan_variable, never just a Python float

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):

Simons Foundation

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@WardBrian WardBrian added the feature New feature or request label Jul 25, 2023
@codecov-commenter
Copy link

codecov-commenter commented Jul 25, 2023

Codecov Report

Merging #681 (7945536) into develop (107a347) will decrease coverage by 0.60%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop     #681      +/-   ##
===========================================
- Coverage    80.60%   80.01%   -0.60%     
===========================================
  Files           72       72              
  Lines        11292    10951     -341     
===========================================
- Hits          9102     8762     -340     
+ Misses        2190     2189       -1     

see 40 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@WardBrian WardBrian changed the title First pass at writing tuples to JSON Stan 2.33: Move IO munging to external package, refactors Aug 14, 2023
@WardBrian WardBrian added method inputs Python objects to CmdStan inputs method outputs CmdStan outputs to Python objects labels Aug 14, 2023
@WardBrian
Copy link
Member Author

@mitzimorris - I think this is ready to review

It is probably hard to review without simultaneously "reviewing" stanio

@WardBrian WardBrian linked an issue Aug 15, 2023 that may be closed by this pull request
@tillahoffmann tillahoffmann mentioned this pull request Aug 17, 2023
2 tasks
@WardBrian WardBrian requested a review from mitzimorris August 23, 2023 18:46
Copy link
Member

@mitzimorris mitzimorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should go with breaking change w/r/t metatdata visibility.

@WardBrian
Copy link
Member Author

The breaking change is also the different return type of stan_variable for optimize and variational

@mitzimorris
Copy link
Member

mitzimorris commented Aug 24, 2023

I took a look at the FBProphet code, and it doesn't use stan_variable - although it runs optimization, it's pulling things back as a np array.
I'm willing to go with a breaking change.
the other place to check is ArviZ. checking now

@WardBrian
Copy link
Member Author

I don't think arviz touches anything other than CmdStanMCMC

@mitzimorris
Copy link
Member

regarding moving stanio under stan-dev - should we create repo stanio with subdirs according to language -
python, r, c++?

@WardBrian
Copy link
Member Author

I don't know enough about the R ecosystem to know if they need this - I think rvars/posterior cover a lot of the existing use-cases, but of course tuples change things.

C++ I think it would be very difficult to write something like this, since the return type of the extracting functions depends on the data

@WardBrian WardBrian mentioned this pull request Aug 25, 2023
2 tasks
@mitzimorris
Copy link
Member

I think it's OK to make this a repo under standev just for the python io used by CmdStanPy and BridgeStan and future Python packages.

@WardBrian WardBrian marked this pull request as ready for review September 5, 2023 15:04
@tillahoffmann
Copy link
Contributor

Would it make sense to add a dtype argument somewhere in the data loading, e.g., as an extra argument to CmdStanModel.{sample,optimize,...} to specify the dtype returned by stan_variable? This could be useful for reducing the memory footprint of larger models. For example, to evaluate WAIC for the election88 dataset of CBS News polls, the log_lik variable has num_draws_sampling * chains * 11565 elements. Using cmdstanpy defaults that amounts to ~ 740 MB or ~ 1.5 GB if we include the logits for the binary outcome. Not enormous, but my laptop wasn't too happy when I compared several models.

This may not be the right thread, but thought I'd mention it in case it affects the data munging.

@WardBrian
Copy link
Member Author

@tillahoffmann I think that's worth its own issue. It is sort of orthogonal to this issue, since the transformations we apply here are agnostic to what the dtype of the draws are. If we changed what we were reading it from the files, these should propagate nicely

@WardBrian WardBrian merged commit a28bb95 into develop Sep 5, 2023
@WardBrian WardBrian deleted the feature/tuple-io branch September 5, 2023 16:15
WardBrian added a commit that referenced this pull request Sep 5, 2023
Effectively reverts the breaking changes in #681 temporarily
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request method inputs Python objects to CmdStan inputs method outputs CmdStan outputs to Python objects
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parity for stan_variable of CmdStanVB.
4 participants