Skip to content

Speed up pattern parsing in pathlib.Path.glob() #126363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
barneygale opened this issue Nov 3, 2024 · 0 comments
Closed

Speed up pattern parsing in pathlib.Path.glob() #126363

barneygale opened this issue Nov 3, 2024 · 0 comments
Labels
performance Performance or resource usage topic-pathlib

Comments

@barneygale
Copy link
Contributor

barneygale commented Nov 3, 2024

pathlib.Path.glob() converts a given string pattern to a Path object:

if not isinstance(pattern, PurePath):
pattern = self.with_segments(pattern)

Which has the following drawbacks:

  • It instantiates a Path object, which is slow
  • It normalizes a Windows UNC drive if given, which is unnecessary - any anchor causes NotImplementedError to be raised.
  • It strips the trailing slash from the pattern, so we need to peek at pattern._raw_path to restore it

It would be better to add a variant of PurePath._parse_path() for dealing specifically with glob patterns.

Linked PRs

@barneygale barneygale added performance Performance or resource usage topic-pathlib labels Nov 3, 2024
barneygale added a commit to barneygale/cpython that referenced this issue Nov 3, 2024
The implementation of `Path.glob()` does rather a hacky thing: it calls
`self.with_segments()` to convert the given pattern to a `Path` object, and
then peeks at the private `_raw_path` attribute to see if pathlib removed a
trailing slash from the pattern.

In this patch, we make `glob()` use a new `_parse_pattern()` classmethod
that splits the pattern into parts while preserving information about any
trailing slash. This skips the cost of creating a `Path` object, and avoids
some path anchor normalization, which makes `Path.glob()` slightly faster.
But mostly it's about making the code less naughty.
barneygale added a commit that referenced this issue Nov 4, 2024
The implementation of `Path.glob()` does rather a hacky thing: it calls
`self.with_segments()` to convert the given pattern to a `Path` object, and
then peeks at the private `_raw_path` attribute to see if pathlib removed a
trailing slash from the pattern.

In this patch, we make `glob()` use a new `_parse_pattern()` classmethod
that splits the pattern into parts while preserving information about any
trailing slash. This skips the cost of creating a `Path` object, and avoids
some path anchor normalization, which makes `Path.glob()` slightly faster.
But mostly it's about making the code less naughty.

Co-authored-by: Tomas R. <[email protected]>
picnixz pushed a commit to picnixz/cpython that referenced this issue Dec 8, 2024
…ython#126364)

The implementation of `Path.glob()` does rather a hacky thing: it calls
`self.with_segments()` to convert the given pattern to a `Path` object, and
then peeks at the private `_raw_path` attribute to see if pathlib removed a
trailing slash from the pattern.

In this patch, we make `glob()` use a new `_parse_pattern()` classmethod
that splits the pattern into parts while preserving information about any
trailing slash. This skips the cost of creating a `Path` object, and avoids
some path anchor normalization, which makes `Path.glob()` slightly faster.
But mostly it's about making the code less naughty.

Co-authored-by: Tomas R. <[email protected]>
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
…ython#126364)

The implementation of `Path.glob()` does rather a hacky thing: it calls
`self.with_segments()` to convert the given pattern to a `Path` object, and
then peeks at the private `_raw_path` attribute to see if pathlib removed a
trailing slash from the pattern.

In this patch, we make `glob()` use a new `_parse_pattern()` classmethod
that splits the pattern into parts while preserving information about any
trailing slash. This skips the cost of creating a `Path` object, and avoids
some path anchor normalization, which makes `Path.glob()` slightly faster.
But mostly it's about making the code less naughty.

Co-authored-by: Tomas R. <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage topic-pathlib
Projects
None yet
Development

No branches or pull requests

1 participant