Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape template filenames in glob patterns #16407

Merged
merged 28 commits into from
Mar 3, 2025
Merged

Escape template filenames in glob patterns #16407

merged 28 commits into from
Mar 3, 2025

Conversation

ntBre
Copy link
Contributor

@ntBre ntBre commented Feb 26, 2025

Summary

Fixes #9381. This PR fixes errors like

Cause: error parsing glob '/Users/me/project/{{cookiecutter.project_dirname}}/__pycache__': nested alternate groups are not allowed

caused by glob special characters in filenames like {{cookiecutter.project_dirname}}. When the user is matching that directory exactly, they can use the workaround given by #7959 (comment), but that doesn't work for a nested config file with relative paths. For example, the directory tree in the reproduction repo linked here:

.
├── README.md
├── hello.py
├── pyproject.toml
├── uv.lock
└── {{cookiecutter.repo_name}}
    ├── main.py
    ├── pyproject.toml
    └── tests
        └── maintest.py

where the inner pyproject.toml contains a relative glob:

[tool.ruff.lint.per-file-ignores]
"tests/*" = ["F811"]

Test Plan

A new CLI test in both the linter and formatter. The formatter test may not be necessary because I didn't have to modify any additional code to pass it, but the original report mentioned both check and format, so I wanted to be sure both were fixed.

@ntBre ntBre added the bug Something isn't working label Feb 26, 2025
Copy link
Contributor

github-actions bot commented Feb 26, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@ntBre
Copy link
Contributor Author

ntBre commented Feb 26, 2025

I'm not sure what's going wrong on Windows. It looks like the tempdir_filter isn't working. I force pushed back over it, but I even tried removing my second filter here and the temp path is still wrong.

@MichaReiser
Copy link
Member

I'm not sure what's going wrong on Windows. It looks like the tempdir_filter isn't working. I force pushed back over it, but I even tried removing my second filter here and the temp path is still wrong.

Yeah, the filters are brutal on windows. Can you try what we have in

let mut settings = insta::Settings::clone_current();
settings.add_filter(&tempdir_filter(&project_dir), "<temp_dir>/");
settings.add_filter(r#"\\(\w\w|\s|\.|")"#, "/$1");

Comment on lines 658 to 665
Err(e) if *e.kind() == globset::ErrorKind::NestedAlternates => {
let new = s.replace("{{", "[{][{]").replace("}}", "[}][}]");
log::warn!(
"Error parsing original glob: `{s:?}`, trying with escaped braces: `{new:?}`"
);
Ok(Glob::new(&new)?)
}
glob => Ok(glob?),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, indeed, feels somewhat hacky :D

I think my main concern with this is that it replaces nested alternates anywhere in the glob -- even in the part that the user provided.

What I understand is that the problem is mainly about {{ and }} in the base path up to the project. Could we escape the {{ and }} in the project's base-path only instead of applying this fix to the entire glob?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, although the nested alternates will always cause an error even if the user provided them. I thought it might be nice to allow them to avoid manual escaping in that case, but at least they can control that directly, unlike the base paths.

We could also add the escapes to the whole path just before creating the glob. I think that version would also look less hacky than catching this error and retrying, but then we also do the work of trying to replace each time. That was the main reason I tried to keep the replacement in the error case, but we're still only compiling the globs once, so it's probably not a big deal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, although the nested alternates will always cause an error even if the user provided them. I thought it might be nice to allow them to avoid manual escaping in that case, but at least they can control that directly, unlike the base paths.

The problem I see with this is that they may have wanted to use nested alternates and are now surprised that the glob doesn't match anything.

@ntBre
Copy link
Contributor Author

ntBre commented Feb 27, 2025

Just pushing a test for Windows, I haven't addressed the hacky path stuff yet.

@ntBre
Copy link
Contributor Author

ntBre commented Feb 27, 2025

It looks like the regex escapes in tempdir_filter are causing problems, but it works for all of the other tests using this...

C:\/Users\/RUNNER~1\/AppData\/Local\/Temp\/.tmpCazbkF\\{{cookiecutter.repo_name}}\/tests\\*

OH, maybe it's the cookiecutter causing problems again. The escapes for the braces might be getting in the way of the tempdir replacement. Let me try nesting it one level deeper.

@MichaReiser
Copy link
Member

You can try to use regex::escape to avoid regex problems

@ntBre
Copy link
Contributor Author

ntBre commented Feb 27, 2025

tempdir_filter uses regex::escape already, unless I'm misunderstanding. My current guess is that the escapes added by regex::escape are disrupting the TMP replacement because they end up right next to the tempdir.

C:\/Users\/RUNNER~1\/AppData\/Local\/Temp\/.tmpCazbkF\\{{cookiecutter.repo_name}}\/tests\\*
                                                     ^^^ this looks suspicious to me

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look at this!

I think the thing I'm not understanding here is why nested alternates are the only problem here? Isn't the problem that we're using arbitrary input to build a glob? And shouldn't that imply the input (in this case, a file path) should always be escaped? Like, the nested alternates is just one way for this problem to manifest. But what if a file path had a * or a [ in it? We would want to escape it there too.

So I guess, can we fix this by just always escaping the literal inputs to a glob?

pub fn try_glob_new(s: &str) -> Result<Glob> {
match Glob::new(s) {
Err(e) if *e.kind() == globset::ErrorKind::NestedAlternates => {
let new = s.replace("{{", "[{][{]").replace("}}", "[}][}]");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also not technically right in all cases I think, e.g., [{{], although that's pretty pathological.

@ntBre
Copy link
Contributor Author

ntBre commented Feb 27, 2025

Thanks for the review! I failed to notice the globset::escape function. I think we can just use that on the base part of the path that we're adding to the glob pattern. That will be much more robust than my manual escaping here and also leave the user's glob alone in case that's what they intended.

@ntBre
Copy link
Contributor Author

ntBre commented Feb 27, 2025

This is now a much simpler fix, thanks again for the reviews and suggestions! I also removed the warning, which was the source of the path in the snapshot, so hopefully the Windows tests will pass too.

I realized it was also possible to trigger this for the non-absolute path, although in a pretty niche case. The test I added extends per-file-ignores with the working directory set to tempdir/{{cookiecutter.repo_name}}. I peeked at the implementation of Path::absolutize and it just calls Path::absolutize_from with CWD (via the private get_cwd! macro), so I think the change here should be safe.

@ntBre
Copy link
Contributor Author

ntBre commented Feb 27, 2025

I was checking for the other uses of Glob::new, and I think this could also be a problem for the FilePath::User variant.

#[derive(Debug, Clone, CacheKey, PartialEq, PartialOrd, Eq, Ord)]
pub enum FilePattern {
Builtin(&'static str),
User(String, PathBuf),
}

I'll start adding the escape calls for it too.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this looks great

Ok(Self::User(pattern, absolute))
Ok(Self::User(
s.to_string(),
GlobPath::normalize(s, &*path_dedot::CWD),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we introduce a cwd helper function in fs. WASM doesn't like path_dedot::CWD

pub fn relativize_path<P: AsRef<Path>>(path: P) -> String {
let path = path.as_ref();
#[cfg(target_arch = "wasm32")]
let cwd = Path::new(".");
#[cfg(not(target_arch = "wasm32"))]
let cwd = path_absolutize::path_dedot::CWD.as_path();
if let Ok(path) = path.strip_prefix(cwd) {
return format!("{}", path.display());
}
format!("{}", path.display())
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good idea, I factored this out into a get_cwd function and call it everywhere in the linter. I left the calls in the ruff and ruff_workspace crates alone for now. Should I replace those too?

None => fs::normalize_path(path),
};

let project_root = project_root.unwrap_or(&path_dedot::CWD);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, let's avoid spreading path_dedot::CWD over our code base and instead use a helper to maximize WASM compatibility

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ntBre ntBre merged commit d93ed29 into main Mar 3, 2025
21 checks passed
@ntBre ntBre deleted the brent/globbing branch March 3, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ruff format or ruff check returns "nested alternate groups are not allowed" error
3 participants