Support for multiple criteria? #245
Yes, we thought about that but haven't reached a conclusion yet. You could post the gist of your current solution, maybe we can find a way to simplify it. In general, we proceeded along the line of "simple things should be simple, complex things should be possible". With regard to your problem, one could argue that it is not a "simple thing" anymore, so we have to ask whether we want to support it directly. Coming to the "complex things should be possible" part, it is our philosophy to have many small but well documented methods on `NeuralNet`. With that out of the way, if I had to implement the feature of supporting multiple criteria, this is what I would do:
If all of that is possible without unduly increasing code complexity, I could see us including such a feature, but it's certainly no easy task. The advantage would be that, if well implemented, it is easier for the user and more flexible, but I'm afraid that the code could get very complex.
Thanks for taking the time to consider this. Your list of requirements makes a lot of sense and indeed suggests that the implementation might get very complex. Thinking more about this, I realized that, as complex as it already is, this "declarative" approach to specifying a composite loss function is not flexible enough (at least for my current use case). I have a network with multiple outputs, and each individual loss function is concerned with only a subset of them. Thus some additional logic is needed to dissect the network output tuple and feed it to the individual losses as appropriate. Doing this in a declarative way would make things even more complicated, and modifying individual losses to accept the entire network output tuple and only use what's needed would defeat the purpose of having a declarative specification in the first place. Further, adding support for specifying the aggregation method and its parameters (weights) would add even more complexity. With this in mind, I think the declarative approach is not a good fit here. Besides, it contributes little to my original "feature request" (which was to enable monitoring the individual losses, not to simplify loss function composition). Thus I think the alternative approach that I proposed has more potential. Specifically, a user would implement a custom criterion class that encapsulates the individual losses and the aggregation logic and can deal with a multi-output network (if needed). In addition to returning a single loss number, it would output a dict of loss components, which skorch could then record in the history. This requires considerably fewer modifications to skorch.
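To make the idea concrete, here is a minimal sketch of what such a composite criterion could look like; the component names, the weight, and the exact return convention (aggregate loss plus a dict of parts) are illustrative assumptions, not a settled API:

```python
import torch.nn as nn

class CompositeCriterion(nn.Module):
    """Sketch: wraps several losses for a multi-output network."""
    def __init__(self, weight_b=1.0):
        super().__init__()
        self.loss_a = nn.MSELoss()
        self.loss_b = nn.CrossEntropyLoss()
        self.weight_b = weight_b

    def forward(self, y_pred, y_true):
        # dissect the network's output tuple; each loss sees only its part
        out_a, out_b = y_pred
        true_a, true_b = y_true
        a = self.loss_a(out_a, true_a)
        b = self.loss_b(out_b, true_b)
        # aggregate loss plus a dict of components for monitoring
        return a + self.weight_b * b, {'a': a, 'b': b}
```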
The naïve approach would be to define two criteria and overwrite `get_loss`:

```python
def get_loss(self, y_pred, y_true, **kwargs):
    y_pred_a, y_true_a = y_pred[:, 0], y_true[:, 0]
    y_pred_b, y_true_b = y_pred[:, 1], y_true[:, 1]
    loss_a = self.criterion_a_(y_pred_a, y_true_a)
    loss_b = self.criterion_b_(y_pred_b, y_true_b)
    self.history.record_batch('loss_a', loss_a)
    self.history.record_batch('loss_b', loss_b)
    return loss_a + loss_b
```

This has the drawback that you'd have to overwrite `initialize_criterion` as well to set up the two criteria. A better alternative is to use a single criterion that returns a dict of the individual losses and only aggregate them in `get_loss`:

```python
def get_loss(self, y_pred, y_true, **kwargs):
    losses = self.criterion_(y_pred, y_true)
    self.history.record_batch('loss_a', losses['a'])
    self.history.record_batch('loss_b', losses['b'])
    return losses['a'] + losses['b']
```

The only remaining problem with such an approach is that you need an additional scoring callback that logs the additional loss terms, e.g.:

```python
def loss_a_score(net, X=None, y=None):
    return net.history[-1, 'batches', -1, 'loss_a']

def loss_b_score(net, X=None, y=None):
    return net.history[-1, 'batches', -1, 'loss_b']

Net(callbacks=[
    BatchScoring(loss_a_score, target_extractor=skorch.utils.noop),
    BatchScoring(loss_b_score, target_extractor=skorch.utils.noop),
])
```

I think it would be helpful if we'd provide a convenience function to easily create such a loss scoring callback.
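For illustration, such a convenience function could look roughly like the following; `loss_scoring` is a hypothetical helper, not an existing skorch function, and it simply wraps the pattern shown above:

```python
import skorch
from skorch.callbacks import BatchScoring

def loss_scoring(name):
    """Build a BatchScoring callback that reads a recorded batch-level loss."""
    def score(net, X=None, y=None):
        return net.history[-1, 'batches', -1, name]
    score.__name__ = name
    return BatchScoring(score, name=name, target_extractor=skorch.utils.noop)

# usage: Net(callbacks=[loss_scoring('loss_a'), loss_scoring('loss_b')])
```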
For me it does not feel right that `get_loss` has the side effect of writing to the history. I would rather see something like this inside `train_step`:

```python
...
losses = self.get_loss(y_pred, yi, X=Xi, training=True)
loss = sum(losses.values()) if isinstance(losses, dict) else losses
loss.backward()
...
return {
    'loss': losses,
    'y_pred': y_pred,
}
```

The loss (now possibly a dict) would then be recorded as usual inside the fitting loop under `train_loss`.
In general, ottonemo's 2nd suggestion looks good enough to me for prototyping. For bringing your model to production, I see why you would want a more robust solution.
That's a good thing, because it means skorch allows you to build an intuition about what should and what shouldn't be done :D
This can be handled with the
I don't like this too much because everybody now has to deal with the possibility that the loss might be a dict instead of a scalar. I have another suggestion, though, that I'd like to discuss. Perhaps we could change `history.record` so that it knows how to deal with dicts, unpacking them into separate history entries and extracting the data from the tensors automatically.
This would also require the recorded tensor values to be converted to plain numbers. The good thing about this would be that the consumers of the history would not have to care whether the criterion returned a single loss or a dict of losses.
This is certainly doable, since `get_loss` already knows whether we are in training or validation mode:

```python
def get_loss(self, y_pred, y_true, training=True, **kwargs):
    losses = self.criterion_(y_pred, y_true)
    name_suffix = '_train' if training else '_valid'
    self.history.record_batch('loss_a' + name_suffix, losses['a'])
    self.history.record_batch('loss_b' + name_suffix, losses['b'])
    return losses['a'] + losses['b']
```

The dict unpacking in `record` does not regard train/validation though; the two history entries would share their name. We would need to add a suffix to the names depending on the step we are in, e.g.:

```python
# in train_step:
history.record({(k + '_train'): v for k, v in losses.items()})
# in validation_step:
history.record({(k + '_valid'): v for k, v in losses.items()})
```

Since we'd have to iterate over the values anyway, we could convert them to plain numbers (e.g. via `.item()`) at the same time.
'Tis true
Thanks for the inputs. I see two discussion points here: a) how to record, b) in which function to record. Let's deal with them separately.

How to record

I like Benjamin's idea of dictionary unpacking with automatic data extraction from tensors. I think it would be logical to adopt the following rule: if the value passed to `record` is a dict, unpack it into separate history entries, prefixing each key with the record name:

```python
history  # [{'train_loss': 0.5, 'valid_loss': 0.6}]
history.record('train_loss', {'a': 1, 'b': 2})
history  # [{'train_loss': 0.5, 'valid_loss': 0.6, 'train_loss_a': 1, 'train_loss_b': 2}]
```

This fits nicely with @ottonemo's suggestion, simplifying the recording code in the step functions.

Could you elaborate why?

In which function to record

Agree. But I still don't see the benefit of manipulating the history inside this function; a function should do what its name suggests, nothing more.
This looks promising. There might be a little problem, though, with the naming: if a key of the loss dict already contains "loss", the prefixed history entry would end up with an awkward name like train_loss_loss_a.
This was for my first proposal; with the handling you proposed, it wouldn't be necessary.
I don't think it would be a best practice. As mentioned earlier, this suggestion is rather meant for fast prototyping. We're not against having a cleaner solution down the line ;)
I think the convention should be that the criterion class outputs either a single loss tensor, or a dictionary of individual losses (the keys should not have "loss" in the name). Then, at the point where the history is recorded, one or two calls to `record` are enough:

```python
losses = self.get_loss(y_pred, yi, X=Xi, training=True)
net.history.record('train_loss', losses)
if isinstance(losses, dict):
    aggregate_loss = sum(losses.values())
    net.history.record('train_loss', aggregate_loss)
```
Okay, you suggest to move recording the losses to the fit loop. Then you propose to just sum the loss values. This could be too restrictive. I was thinking more along the lines of the criterion returning the aggregate loss under a specific key, plus the individual loss components. That way, it is easier to control how the losses are aggregated. Finally, I would really hope that we could get this line to work with both the loss being a scalar and the loss being a dict, but right now I don't see how we could achieve that.
Regarding the discussion where to place the loss logging, I think we first have to make clear what the purpose of the step functions is. I'm also in favor of keeping the housekeeping in the step functions at a minimum to make them easy to comprehend and to overwrite; everything that is placed there should be meaningful and local. By locality I mean that when introducing code into the step functions, we cannot rely on its side effects. Imagine we move the logging to the step functions: this would mean that we now rely on the logging code being present there. However, since the step functions are meant to be overwritten, this might already have been done in some code base somewhere, which now, after upgrading, does not log losses anymore. Therefore adding side effects here is problematic.

Bottom line: logging additional things is OK since we don't (by default) rely on it later on, but logging crucial things is not OK.
Taking all your comments into account, here is my current proposal. I'll describe what changes are needed where, and then explain how this proposal addresses the points you've made.
By convention, the output of criterion classes should be either of:

- a single aggregate loss tensor, or
- a tuple of the aggregate loss tensor and a dict of the individual loss components.

get_loss: Supports both types of output from criterion classes. Homogenizes it to the tuple form (aggregate loss, dict of components).

train_step / validation_step: The only change is that the loss components are returned alongside the aggregate loss.

fit_loop: Small simplification, history is recorded with a single call, passing the loss through as is.

history.record: If the value is a dict, prefix with the attribute name (as per my previous proposal). If the dict is empty, ignore. If the value is a tuple, recursively call record on each element.

Discussion
No housekeeping in the step functions; they just pass the loss components along.

fit_loop is not meant to be overwritten by users. This makes it a good place to perform the recording.
Agree. The criterion is now responsible for aggregation.

... and the loss can now even be a tuple of a scalar and a dict.
Overall, I like your suggestions, especially since they only require minimal changes in the existing code. What I would discuss is whether we really want to have functions with a variable number of return values (a single loss vs. a tuple).
Probably the same behavior for tuples and lists. All in all, I'm also a little bit afraid that the history could become too complex.
Regarding
I also agree with Benjamin's point regarding functions with a variable number of return values. However, in this case I believe the benefits outweigh the drawbacks. Also, we call the criterion only inside `get_loss`.
Some of my losses already depend on that. With our current solution, pretty much all of the added complexity goes into the history recording internals. Of course this requires effort to implement and test, but I believe that from the user's perspective it does not complicate anything, it only adds value.
OK, let's recap.
What needs to be done
The currently proposed solution sketches like this:

```python
def get_loss(...):  # -> (float, dict)
    loss = self.criterion_(y_pred, y_true)
    return loss if isinstance(loss, tuple) else (loss, {})

def train_step(...):
    # ...
    loss, loss_parts = self.get_loss(...)
    loss.backward()
    # ...
    return (loss, loss_parts)

# history.record
def record(k, v):
    if isinstance(v, tuple):
        [record(k, n) for n in v]
    elif isinstance(v, dict):
        [record(k + '_' + vk, vv) for vk, vv in v.items()]
    else:
        ...  # normal record
```

Did I get everything right?

Edit: I want to note that adding this functionality to the history will prevent us from writing such data structures (tuples, dicts) to the history as plain values. Not sure if that is good.
Great summary!
Benjamin proposed to allow unpacking both tuples and lists in `record`:

```python
# history.record
def record(k, v):
    if isinstance(v, (tuple, list)):
        [record(k, n) for n in v]
    elif isinstance(v, dict):
        [record(k + '_' + vk, vv) for vk, vv in v.items()]
    else:
        ...  # normal record
```
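To illustrate what that unpacking would do, here is a self-contained toy version that writes into a plain dict instead of skorch's History (purely a sketch, not the real implementation):

```python
def record(row, k, v):
    """Toy stand-in for the proposed History.record unpacking behavior."""
    if isinstance(v, (tuple, list)):
        for item in v:
            record(row, k, item)
    elif isinstance(v, dict):
        for vk, vv in v.items():
            record(row, k + '_' + vk, vv)
    else:
        row[k] = v  # the "normal record" case

row = {}
record(row, 'train_loss', (0.9, {'a': 0.4, 'b': 0.5}))
print(row)  # {'train_loss': 0.9, 'train_loss_a': 0.4, 'train_loss_b': 0.5}
```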
That's a valid concern. When I proposed to add an "events" list to the history, Benjamin's answer gave me the impression that his idea is to keep the history simple and flat. But if you have a different vision, then this last point should be seriously considered.
I tried implementing this and I stumbled upon the following issue: converting the tensor values to plain Python numbers before recording makes the fit loop quite ugly:

```python
history.record('train_loss', (step['loss'][0].item(), {k: v.item() for k, v in step['loss'][1].items()}))
# alternatively
history.record('train_loss', step['loss'][0].item())
history.record('train_loss', {k: v.item() for k, v in step['loss'][1].items()})
```

Maybe something like this, but I'm still not too happy about it:

```python
def train_step(...):
    loss, loss_parts = self.get_loss(y_pred, yi, X=Xi, training=True)
    return {
        'loss': loss,
        'parts': loss_parts,
        'y_pred': y_pred,
    }

def fit_loop(...):
    # ...
    self.history.record_batch('train_loss', step['loss'].item())
    self.history.record_batch('train_loss', step.get('parts', {}))
    # ...
```

We could also think about wrapping the values in a transparent container object which the history knows how to unpack and convert.
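For what it's worth, such a "transparent container" could be as simple as the following sketch; the class name and method are made up for illustration, and the history would still need to learn how to unpack it:

```python
class LossParts:
    """Hypothetical wrapper bundling the aggregate loss with its components."""
    def __init__(self, loss, parts=None):
        self.loss = loss
        self.parts = parts or {}

    def to_record(self, prefix):
        # flatten into plain Python numbers, ready for the history
        out = {prefix: float(self.loss)}
        out.update({prefix + '_' + k: float(v) for k, v in self.parts.items()})
        return out

# e.g. LossParts(loss, {'a': loss_a, 'b': loss_b}).to_record('train_loss')
```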
Why not? Nobody should ever write tensors to the history. If somebody does so, then it's a mistake (GPU memory gets blocked, eventually leading to an out-of-memory error). If type checking/conversion is built into the history, that's an additional safety net, isn't that a good thing?

👍 for your last proposal. The last hunk can be reduced to:

```python
self.history.record_batch('train_loss', step['loss'], step['parts'])
```

assuming that the conversion is implemented, empty dicts are ignored, and `record_batch` accepts more than one value.
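As a toy illustration of those three assumptions (automatic conversion, empty dicts ignored, several values per call), a stand-alone version of such a `record_batch` might behave like this; again just a sketch against a plain dict, not skorch's actual API:

```python
import torch

def record_batch(row, attr, *values):
    """Record several values at once, converting tensors and skipping empty dicts."""
    for v in values:
        if isinstance(v, dict):
            if not v:
                continue  # empty dicts are ignored
            for k, x in v.items():
                row[attr + '_' + k] = x.item() if hasattr(x, 'item') else x
        else:
            row[attr] = v.item() if hasattr(v, 'item') else v

row = {}
record_batch(row, 'train_loss', torch.tensor(0.9), {'a': torch.tensor(0.4)})
print(row)  # {'train_loss': 0.9..., 'train_loss_a': 0.4...}
```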
Any news on this? That would be extremely helpful!
No news on this front unfortunately. What exactly is the problem you need to solve? As mentioned in an earlier comment, a quick and dirty solution is to override `get_loss`.
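Concretely, that quick-and-dirty route could look roughly like this (a sketch only; that the criterion returns a dict with keys 'a' and 'b' is an assumption about your setup):

```python
import skorch

class MyNet(skorch.NeuralNet):
    def get_loss(self, y_pred, y_true, X=None, training=False):
        # assumes the criterion returns a dict of loss components
        losses = self.criterion_(y_pred, y_true)
        suffix = '_train' if training else '_valid'
        self.history.record_batch('loss_a' + suffix, losses['a'].item())
        self.history.record_batch('loss_b' + suffix, losses['b'].item())
        return losses['a'] + losses['b']
```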
Hi! I solved the problem by subclassing However, I struggle with
The reason behind is that Edit: I just passed |
At the moment skorch does not explicitly support fitting with respect to multiple loss functions. Of course, one can define a composite loss function that computes individual losses internally and outputs a sum, but this does not allow monitoring the loss components individually through skorch callbacks and/or the history.
Perhaps `NeuralNet` can be made to accept a list (or a dict) of criteria? And the report would look something like this:

Or, alternatively, composite loss functions may output a dict with individual losses, and `validation/train_step()` may take care of summing them up.

I was wondering if you have already considered adding such a feature? Do you see any major obstacles? If you can provide me with some guidance, I'd be interested in implementing this.