Skip to content

Mixing the order of --config and fit in LightningCLI can cause confusion #19714

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
awaelchli opened this issue Mar 29, 2024 · 7 comments
Open
Assignees
Labels
bug Something isn't working feature Is an improvement or enhancement lightningcli pl.cli.LightningCLI ver: 2.2.x
Milestone

Comments

@awaelchli
Copy link
Contributor

awaelchli commented Mar 29, 2024

Bug description

If you launch with

python main.py --config ... fit 

instead of

python main.py fit --config ...

Then you end up with cryptic errors such as

No action for key "trainer.accelerator

See report on twitter:
https://x.com/4ndr3aR/status/1772676605837484054?s=20

image

This is because the LightningCLI parser is built after applying the fit stage is parsed, on only that can match the provided config file. In general, the order matters for jsonargparse for good reasons.

Is there something we can do to improve the error for the user, with a sanity check before parsing begins?

Repro example:

import sys

import torch
from lightning.pytorch import LightningModule
from torch.utils.data import DataLoader, Dataset

from lightning.pytorch.cli import LightningCLI


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        return self(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)

    def train_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=2)


sys.argv = ["bug_report_model.py", "--config", "config.yaml", "fit"]
cli = LightningCLI(BoringModel)

cc @Borda @carmocca @mauvilsa

@awaelchli awaelchli added bug Something isn't working needs triage Waiting to be triaged by maintainers labels Mar 29, 2024
@awaelchli awaelchli added feature Is an improvement or enhancement lightningcli pl.cli.LightningCLI and removed needs triage Waiting to be triaged by maintainers labels Mar 29, 2024
@awaelchli awaelchli added this to the 2.3 milestone Mar 29, 2024
@awaelchli awaelchli self-assigned this Mar 29, 2024
@awaelchli
Copy link
Contributor Author

It also has the consequence that with LightningCLI(..., run=False) you can't provide a config file, and if you attempt you get:

error: Validation failed: No action for key "ckpt_path" to check its value.

@awaelchli
Copy link
Contributor Author

awaelchli commented Mar 29, 2024

The problem is I don't know how to make a smart error message here, because the config file could also contain a section fit and then it works. Making a smart error message here would require parsing the config file before making a decision. Would we want this approach?

@mauvilsa
Copy link
Contributor

One idea I can propose is to have a way to disable the --config option before the subcommand. It is true that a global config file with a fit: entry or any other subcommand. But I don't think that everyone needs or wants this option. Not being there avoids the possibility of confusion, since --config will only be accepted after the subcommand.

@awaelchli
Copy link
Contributor Author

Hey @mauvilsa thanks for the input. It's sort of what I tried to do in #19715 until I saw the tests failing that had a fit: entry. I think that even if we did that restriction, the only way for doing LightningCLI(..., run=False) with a config file would be to have the fit: entry so I'm not sure we can consider the option to remove this feature.

@mauvilsa
Copy link
Contributor

Actually what I meant was to have a parameter, e.g. multi_subcommand_config: bool = True that if set to False then the global --config is not added. By default it should be True, otherwise it would be a breaking change.

@carmocca
Copy link
Contributor

carmocca commented Apr 3, 2024

I wouldn't touch anything here. This is a feature (as already noticed) and I don't expect anybody to go ahead and set this strange flag that most won't understand why it exists.

I would say that a lot of the confusion stems from the error message presented by jsonargparse: "No action for key foo to check its value". @mauvilsa This of assumes that the user of the CLI understands how the CLI works (what is an action, what is a key, what is checking their values) but that's generally not true as most of the users are not the developers who implemented the CLI and the users have no idea about what's going on under the hood.

Could jsonargparse update or provide a way to customize this message? For the LightningCLI and the CLI in litgpt it would be better to show something like "Failed to parse the foo key in your config or arguments. Make sure the format matches that returned by --print_config"

@mauvilsa
Copy link
Contributor

mauvilsa commented Apr 4, 2024

Could jsonargparse update or provide a way to customize this message? For the LightningCLI and the CLI in litgpt it would be better to show something like "Failed to parse the foo key in your config or arguments. Make sure the format matches that returned by --print_config"

The confusion reported in this issue is about providing a config before or after the subcommand. The custom error message above does not help. In fact, --print_config can also be used before or after the subcommand, making still confusing.

I do agree that the error messages in jsonargparse can be improved, which is something generic, and not particular to lightning. And better to invest time on that, than supporting a way to customize error messages which also requires effort. One possibility could be to track the source of the problematic key, and print different error messages depending on it origin. E.g. if the problem is before the subcommand, the error could suggest the user to look at the help without providing a subcommand. And if it is after, then the error could suggest the help for that particular subcommand.

@awaelchli awaelchli modified the milestones: 2.3, future Jun 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working feature Is an improvement or enhancement lightningcli pl.cli.LightningCLI ver: 2.2.x
Projects
None yet
3 participants