Mixing the order of `--config` and `fit` in LightningCLI can cause confusion #19714

awaelchli · 2024-03-29T13:50:38Z

Bug description

If you launch with

python main.py --config ... fit

instead of

python main.py fit --config ...

Then you end up with cryptic errors such as

No action for key "trainer.accelerator"

See report on twitter:
https://x.com/4ndr3aR/status/1772676605837484054?s=20

This is because the LightningCLI parser for the fit stage only comes into play after the fit subcommand is parsed, and only that parser can match the provided config file. In general, the order matters for jsonargparse for good reasons.
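To illustrate why the position of `--config` matters, here is a minimal sketch that uses the standard library's argparse as a stand-in. It is not jsonargparse or LightningCLI, and the `dest` names `global_config` and `fit_config` are made up for the example. The same option string is registered on the root parser and on the `fit` subparser, so which parser receives the file depends only on where the option appears.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Stand-in only: plain argparse, not jsonargparse or LightningCLI.
    root = argparse.ArgumentParser(prog="main.py")
    # A root-level --config, seen only when it comes before the subcommand.
    root.add_argument("--config", dest="global_config", default=None)
    sub = root.add_subparsers(dest="subcommand")
    fit = sub.add_parser("fit")
    # A fit-level --config, seen only when it comes after `fit`.
    fit.add_argument("--config", dest="fit_config", default=None)
    return root


parser = build_parser()
before = parser.parse_args(["--config", "config.yaml", "fit"])
after = parser.parse_args(["fit", "--config", "config.yaml"])

# Same file, same subcommand, but a different parser ends up owning the file.
print(before.global_config, before.fit_config)
print(after.global_config, after.fit_config)
```

In this sketch a file passed before `fit` is handled by the root parser, which knows nothing about fit-only keys such as `trainer.accelerator`.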

Is there something we can do to improve the error for the user, with a sanity check before parsing begins?

Repro example:

import sys

import torch
from lightning.pytorch import LightningModule
from torch.utils.data import DataLoader, Dataset

from lightning.pytorch.cli import LightningCLI


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        return self(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)

    def train_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=2)


sys.argv = ["bug_report_model.py", "--config", "config.yaml", "fit"]
cli = LightningCLI(BoringModel)

cc @Borda @carmocca @mauvilsa

awaelchli · 2024-03-29T14:38:34Z

It also has the consequence that with LightningCLI(..., run=False) you can't provide a config file, and if you attempt to, you get:

error: Validation failed: No action for key "ckpt_path" to check its value.

awaelchli · 2024-03-29T15:06:21Z

The problem is I don't know how to make a smart error message here, because the config file could also contain a section fit and then it works. Making a smart error message here would require parsing the config file before making a decision. Would we want this approach?
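A rough sketch of what such a pre-parse check could look like, under stated assumptions: every function name here is hypothetical and none of them exists in Lightning or jsonargparse. The config loader is passed in as a callable so the sketch stays free of a YAML dependency, and the subcommand list is the set of LightningCLI subcommands.

```python
from typing import Any, Callable, Mapping, Optional, Sequence

# LightningCLI subcommands; the helpers below are hypothetical illustrations.
SUBCOMMANDS = ("fit", "validate", "test", "predict")


def config_before_subcommand(
    argv: Sequence[str], subcommands: Sequence[str] = SUBCOMMANDS
) -> Optional[str]:
    """Return the path of a --config that appears before any subcommand, else None."""
    for i, arg in enumerate(argv):
        if arg in subcommands:
            return None  # subcommand came first, so the order is the usual one
        if arg == "--config" and i + 1 < len(argv):
            return argv[i + 1]
        if arg.startswith("--config="):
            return arg.split("=", 1)[1]
    return None


def has_subcommand_section(
    config: Mapping[str, Any], subcommands: Sequence[str] = SUBCOMMANDS
) -> bool:
    """True if the config has a top-level entry such as `fit:`."""
    return any(key in config for key in subcommands)


def order_hint(
    argv: Sequence[str], load_config: Callable[[str], Mapping[str, Any]]
) -> Optional[str]:
    """Return a hint when a global --config lacks a subcommand section."""
    path = config_before_subcommand(argv)
    if path is None:
        return None
    if has_subcommand_section(load_config(path)):
        return None  # a global config with a `fit:` entry is a valid use
    return (
        f"'{path}' was passed before the subcommand but has no subcommand "
        f"section. Did you mean to pass --config after the subcommand?"
    )
```

The cost is the one raised above: the file has to be read once before the real parsing starts.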

mauvilsa · 2024-03-30T14:30:54Z

One idea I can propose is to have a way to disable the --config option before the subcommand. It is true that a global config file with a fit: entry, or an entry for any other subcommand, is a supported feature. But I don't think that everyone needs or wants this option. If it is not there, the possibility of confusion is avoided, since --config will only be accepted after the subcommand.

awaelchli · 2024-03-31T01:48:07Z

Hey @mauvilsa, thanks for the input. It's sort of what I tried to do in #19715 until I saw the tests failing that had a fit: entry. I think that even if we added that restriction, the only way to use LightningCLI(..., run=False) with a config file would be to have the fit: entry, so I'm not sure we can consider removing this feature.

mauvilsa · 2024-03-31T13:32:17Z

Actually what I meant was to have a parameter, e.g. multi_subcommand_config: bool = True, such that the global --config is not added when it is set to False. By default it should be True, otherwise it would be a breaking change.
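A minimal sketch of how such a flag could behave, again using the standard library's argparse as a stand-in rather than jsonargparse or LightningCLI. The parameter name `multi_subcommand_config` is taken from the proposal above; `build_parser` and `accepts` are hypothetical helpers for the example.

```python
import argparse
from typing import Sequence


def build_parser(multi_subcommand_config: bool = True) -> argparse.ArgumentParser:
    # Stand-in only: plain argparse, not the real LightningCLI parser.
    root = argparse.ArgumentParser(prog="main.py")
    if multi_subcommand_config:
        # Default behavior: a global --config is accepted before the subcommand.
        root.add_argument("--config", dest="global_config", default=None)
    sub = root.add_subparsers(dest="subcommand")
    fit = sub.add_parser("fit")
    fit.add_argument("--config", dest="fit_config", default=None)
    return root


def accepts(parser: argparse.ArgumentParser, argv: Sequence[str]) -> bool:
    """Return True if parsing succeeds; argparse exits via SystemExit on error."""
    try:
        parser.parse_args(list(argv))
    except SystemExit:
        return False
    return True


strict = build_parser(multi_subcommand_config=False)
print(accepts(strict, ["fit", "--config", "config.yaml"]))
print(accepts(strict, ["--config", "config.yaml", "fit"]))
```

With the flag off in this sketch, `--config` before the subcommand fails immediately at the root parser instead of surfacing later as a key validation error.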

carmocca · 2024-04-03T11:56:36Z

I wouldn't touch anything here. This is a feature (as already noticed) and I don't expect anybody to go ahead and set this strange flag that most won't understand why it exists.

I would say that a lot of the confusion stems from the error message presented by jsonargparse: "No action for key foo to check its value". @mauvilsa This assumes that the user of the CLI understands how the CLI works (what an action is, what a key is, what checking their values means), but that's generally not true: most users are not the developers who implemented the CLI and have no idea what's going on under the hood.

Could jsonargparse update or provide a way to customize this message? For the LightningCLI and the CLI in litgpt it would be better to show something like "Failed to parse the foo key in your config or arguments. Make sure the format matches that returned by --print_config"

mauvilsa · 2024-04-04T06:39:25Z

> Could jsonargparse update or provide a way to customize this message? For the LightningCLI and the CLI in litgpt it would be better to show something like "Failed to parse the foo key in your config or arguments. Make sure the format matches that returned by --print_config"

The confusion reported in this issue is about providing a config before or after the subcommand. The custom error message above does not help. In fact, --print_config can also be used before or after the subcommand, which makes it just as confusing.

I do agree that the error messages in jsonargparse can be improved, which is something generic and not particular to Lightning. It is better to invest time in that than in supporting a way to customize error messages, which also requires effort. One possibility could be to track the source of the problematic key and print different error messages depending on its origin. E.g. if the problem is before the subcommand, the error could suggest that the user look at the help without providing a subcommand. And if it is after, then the error could suggest the help for that particular subcommand.
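As a sketch of the origin-dependent message idea, assuming the parser can tell which subcommand (if any) a key was given under: the function `unknown_key_message` and its wording are hypothetical and do not reflect jsonargparse's actual behavior or messages.

```python
from typing import Optional


def unknown_key_message(
    key: str, origin_subcommand: Optional[str], prog: str = "main.py"
) -> str:
    """Build an error message that points at the help matching the key's origin.

    origin_subcommand is None when the key came from before the subcommand.
    """
    if origin_subcommand is None:
        # Key was given globally: point at the top-level help.
        return (
            f'Key "{key}" is not accepted before the subcommand. '
            f"See `{prog} --help` for the options available there."
        )
    # Key was given to a subcommand: point at that subcommand's help.
    return (
        f'Key "{key}" is not accepted by the `{origin_subcommand}` subcommand. '
        f"See `{prog} {origin_subcommand} --help` for its options."
    )


print(unknown_key_message("trainer.accelerator", None))
print(unknown_key_message("trainer.acelerator", "fit"))
```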

awaelchli added the bug (Something isn't working) and needs triage (Waiting to be triaged by maintainers) labels Mar 29, 2024

github-actions bot added the ver: 2.2.x label Mar 29, 2024

awaelchli added the feature (Is an improvement or enhancement) and lightningcli (pl.cli.LightningCLI) labels and removed the needs triage (Waiting to be triaged by maintainers) label Mar 29, 2024

awaelchli added this to the 2.3 milestone Mar 29, 2024

awaelchli self-assigned this Mar 29, 2024

awaelchli mentioned this issue Mar 29, 2024

Improve error message in LightningCLI if subcommand is not the first argument #19715

Closed

awaelchli modified the milestones: 2.3, future Jun 2, 2024
