Support for temporary/fantasy training data #177
Comments
Is there a way to do this efficiently without resetting the mean/covar caches? It seems like re-computing the full kernel matrix would be quite expensive if all we do is modify the data by adding a small number of fantasies.
So dealing with the mean caches is basically a case of dealing with linear systems involving bordered matrices, since we want to update the existing solve after appending a block of rows and columns. The covar cache is only used/computed with LOVE, which would take some thought on how to update. This is actually something that @andrewgordonwilson and I are actively researching: how to update LOVE in the setting where you add individual data points. I have some ideas about this, but they are a bit complicated for a github issue.
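To make the bordered-matrix idea concrete, here is a minimal numpy sketch (an illustration, not gpytorch code; the function name `update_bordered_solve` is made up). Given a cached solve of the original system, the bordered solve only needs m extra solves against the original matrix plus an m-by-m Schur-complement solve:

```python
import numpy as np

def update_bordered_solve(A, B, D, b, d, cache):
    """Update the solve of A x = b (with cache = A^{-1} b) after the
    system is bordered to [[A, B], [B.T, D]] @ [x1; x2] = [b; d].

    Uses the Schur complement S = D - B.T A^{-1} B, so the only work
    against the original n x n matrix is the m extra solves A^{-1} B.
    """
    AinvB = np.linalg.solve(A, B)             # n x m: m solves against A
    S = D - B.T @ AinvB                       # m x m Schur complement
    x2 = np.linalg.solve(S, d - B.T @ cache)  # small m x m solve
    x1 = cache - AinvB @ x2                   # correct the cached solution
    return np.concatenate([x1, x2])
```

The result matches a from-scratch solve of the bordered system, but reuses the cached `A^{-1} b` so the cost scales with the number of new points rather than with a full re-factorization.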
For what it's worth, doing the solve from scratch using CG has the same asymptotic complexity for exact GPs.
Being smart about warm-starting should probably be very helpful here as well, right? E.g. the initial guess could take the solution from the previous solve for the existing points, and something ad hoc, like the mean across the previous solution, for the new points.
Good point, initializing with the existing mean cache with a few extra zeros concatenated for the fantasy examples is a smart idea. Assuming we don't expect the training data to ever change radically, what do you think about making it the default behavior: whenever a mean cache already exists, expand it to match the training data size and always use it as the initialization?
That sounds good to me. Would you want to use zeros or the mean across the mean cache? |
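A minimal numpy sketch of the warm-start idea being discussed (illustrative only; `conjugate_gradient` and `warm_start_from_cache` are hypothetical names, not gpytorch APIs). The cached solution for the old points is padded with zeros for the fantasy points and used as the CG initial guess:

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Plain conjugate gradients for a symmetric positive-definite A,
    with an optional warm-start initial guess x0."""
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float).copy()
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def warm_start_from_cache(mean_cache, n_fantasy):
    """Expand the cached solve with zeros for the new fantasy points."""
    return np.concatenate([mean_cache, np.zeros(n_fantasy)])
```

Since the old entries of the warm start already nearly satisfy the enlarged system, CG typically needs far fewer iterations than starting from all zeros.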
Alright, I gave this some more thought, and for temporary fantasy points and exact GPs specifically there is an O(kn + m^2) time approximate solution for updating the mean cache and covar cache IF we are already using LOVE, where m is the number of fantasy points and k is the rank of the decomposition used for LOVE. I'll implement this idea as the default behavior when we're already using LOVE. There will be some fairly involved internal changes with this, so I can either get started on this now or continue with the original plan of helping finish up priors first. cc @andrewgordonwilson, since the trick I'm talking about here is highly relevant to our discussion about updating the precomputed cache for LOVE.
Let's try to get the priors in first, so we can avoid working on diverging branches as much as possible.
This is related to the need for "fantasy" observations for BayesOpt, where, in addition to training data, we want to condition a model on extra temporary training data with sampled function values as labels.
Right now, this is technically supported via the `set_train_data` method, which can make arbitrary changes to the training data with `strict=False`, and would let us just directly append the fantasy data to the training data (`gpytorch/gpytorch/models/exact_gp.py`, line 50 in c16ec46).
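To show the bookkeeping this workaround pushes onto the user, here is a minimal pure-Python sketch (the `ToyExactGP` class is a stand-in invented for illustration, not gpytorch's actual model): the user appends fantasy data and must remember how many points to slice off later.

```python
import numpy as np

class ToyExactGP:
    """Hypothetical stand-in for a model whose training data can be
    swapped out wholesale, mimicking a set_train_data-style method."""

    def __init__(self, train_x, train_y):
        self.train_x = train_x
        self.train_y = train_y

    def set_train_data(self, inputs, targets):
        # With strict checking disabled, arbitrary size changes are allowed.
        self.train_x = inputs
        self.train_y = targets

# The user must track how many fantasy points were appended:
model = ToyExactGP(np.arange(5.0), np.arange(5.0) ** 2)
fantasy_x, fantasy_y = np.array([9.0, 10.0]), np.array([81.0, 100.0])
n_fantasy = len(fantasy_x)

model.set_train_data(np.concatenate([model.train_x, fantasy_x]),
                     np.concatenate([model.train_y, fantasy_y]))
# ... condition on the fantasies, evaluate an acquisition function, etc. ...

# ... and then slice them off again afterwards:
model.set_train_data(model.train_x[:-n_fantasy], model.train_y[:-n_fantasy])
```

Forgetting the slice-off step silently leaves the fantasies in the training set, which is exactly the failure mode a dedicated interface would avoid.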
If we'd prefer a better interface that doesn't involve the user tracking how many fantasy points they've added so they can remove them later, we could add a similar method (`set_fantasy_data`?) that modifies `self.fantasy_inputs` and `self.fantasy_targets` attributes (default `None`), similar to the `train_inputs` and `train_targets` ones modified there.

If we did this, then we'd basically need to update `__call__` in three ways:

1. First, concatenate in `fantasy_inputs` (`gpytorch/gpytorch/models/exact_gp.py`, lines 92 to 95 in c16ec46).
2. Then concatenate `fantasy_targets` on to `fantasy_labels` (`gpytorch/gpytorch/models/exact_gp.py`, line 104 in c16ec46).
3. Finally, update the `n_train` argument to account for the fantasy training data (`gpytorch/gpytorch/models/exact_gp.py`, line 110 in c16ec46).
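A rough sketch of what such an interface could look like (beyond `set_fantasy_data`, `fantasy_inputs`, and `fantasy_targets` from the proposal above, all names here are made up for illustration; this is not the actual gpytorch implementation):

```python
import numpy as np

class FantasyExactGP:
    """Sketch of the proposed interface: fantasy data lives in separate
    attributes (default None) and is concatenated on internally, so the
    user never tracks how many points to remove."""

    def __init__(self, train_inputs, train_targets):
        self.train_inputs = train_inputs
        self.train_targets = train_targets
        self.fantasy_inputs = None
        self.fantasy_targets = None

    def set_fantasy_data(self, inputs, targets):
        self.fantasy_inputs = inputs
        self.fantasy_targets = targets

    def clear_fantasy_data(self):
        self.fantasy_inputs = None
        self.fantasy_targets = None

    def _full_data(self):
        # Mirrors the three __call__ changes: concatenate inputs,
        # concatenate targets, and report the enlarged n_train.
        if self.fantasy_inputs is None:
            return self.train_inputs, self.train_targets, len(self.train_inputs)
        inputs = np.concatenate([self.train_inputs, self.fantasy_inputs])
        targets = np.concatenate([self.train_targets, self.fantasy_targets])
        return inputs, targets, len(inputs)
```

Discarding the fantasies is then a single `clear_fantasy_data()` call, and the real training data is never mutated.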