Tarfile is unnecessarily slow #121267
Labels
performance
Performance or resource usage
stdlib
Python modules in the Lib dir
type-feature
A feature request or enhancement
Comments
jforberg
added a commit
to jforberg/cpython
that referenced
this issue
Jul 2, 2024
Tarfile in the default write mode spends much of its time resolving UIDs into usernames and GIDs into group names. By caching these mappings, a significant speedup can be achieved. In my simple benchmark[1], this extra caching speeds up tarfile by 8x. [1] https://gist.github.com/jforberg/86af759c796199740c31547ae828aef2
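The caching idea in this commit message can be illustrated with a minimal sketch. This is not the code from the referenced commit; the helper names `cached_uname` and `cached_gname` are hypothetical, and the sketch assumes that memoizing `pwd.getpwuid` and `grp.getgrgid` results for the lifetime of the process is acceptable.

```python
import functools
import os

try:
    # pwd and grp exist only on Unix-like platforms.
    import grp
    import pwd
except ImportError:
    grp = pwd = None


@functools.lru_cache(maxsize=None)
def cached_uname(uid):
    """Hypothetical helper: resolve a UID to a user name, at most once per UID."""
    if pwd is None:
        return ""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No entry for this UID in the user database.
        return ""


@functools.lru_cache(maxsize=None)
def cached_gname(gid):
    """Hypothetical helper: resolve a GID to a group name, at most once per GID."""
    if grp is None:
        return ""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        # No entry for this GID in the group database.
        return ""


if __name__ == "__main__" and hasattr(os, "getuid"):
    # The first call queries the user database; the second is served from the cache.
    print(cached_uname(os.getuid()), cached_uname(os.getuid()))
    print(cached_uname.cache_info())
```

One trade-off of a process-wide cache like this is that a user or group renamed while the process is running will keep its old name in the cache.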
Could you paste your cProfile output? I was a bit surprised that most of the time is spent on reading the pwd and grp.
@gaogaotiantian Here is the output:
jforberg
added a commit
to jforberg/cpython
that referenced
this issue
Jul 3, 2024
Tarfile in the default write mode spends much of its time resolving UIDs into usernames and GIDs into group names. By caching these mappings, a significant speedup can be achieved. In my simple benchmark[1], this extra caching speeds up tarfile by 8x. [1] https://gist.github.com/jforberg/86af759c796199740c31547ae828aef2
hauntsaninja
added a commit
that referenced
this issue
Oct 30, 2024
Tarfile in the default write mode spends much of its time resolving UIDs into usernames and GIDs into group names. By caching these mappings, a significant speedup can be achieved. In my simple benchmark[1], this extra caching speeds up tarfile by 8x. [1] https://gist.github.com/jforberg/86af759c796199740c31547ae828aef2 --------- Co-authored-by: Tian Gao <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> Co-authored-by: Shantanu <[email protected]>
Recategorizing this issue as per #121269 (comment).
picnixz
added a commit
to picnixz/cpython
that referenced
this issue
Dec 8, 2024
…on#121269) Tarfile in the default write mode spends much of its time resolving UIDs into usernames and GIDs into group names. By caching these mappings, a significant speedup can be achieved. In my simple benchmark[1], this extra caching speeds up tarfile by 8x. [1] https://gist.github.com/jforberg/86af759c796199740c31547ae828aef2 --------- Co-authored-by: Tian Gao <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> Co-authored-by: Shantanu <[email protected]>
ebonnal
pushed a commit
to ebonnal/cpython
that referenced
this issue
Jan 12, 2025
…on#121269) Tarfile in the default write mode spends much of its time resolving UIDs into usernames and GIDs into group names. By caching these mappings, a significant speedup can be achieved. In my simple benchmark[1], this extra caching speeds up tarfile by 8x. [1] https://gist.github.com/jforberg/86af759c796199740c31547ae828aef2 --------- Co-authored-by: Tian Gao <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> Co-authored-by: Shantanu <[email protected]>
Bug report
Bug description:
There is room for improvement in tarfile write performance. In a simple benchmark I find that tarfile spends most of its time doing repeated user name/group name queries.
https://gist.github.com/jforberg/86af759c796199740c31547ae828aef2
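One way to check where the time goes is to profile a tar write with `cProfile`. The following is a rough sketch and not the benchmark from the gist above; the function names, file count, and file contents are made up for illustration.

```python
import cProfile
import io
import os
import pstats
import tarfile
import tempfile


def write_archive(src_dir):
    """Write src_dir into an in-memory, uncompressed tar archive; return its size."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(src_dir, arcname="data")
    return len(buf.getvalue())


def profile_tar_write(n_files=200):
    """Profile write_archive on n_files tiny files and return the report text."""
    with tempfile.TemporaryDirectory() as src_dir:
        # Create the input files outside the profiled call.
        for i in range(n_files):
            with open(os.path.join(src_dir, f"f{i}.txt"), "w") as fh:
                fh.write("x")
        profiler = cProfile.Profile()
        profiler.runcall(write_archive, src_dir)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(15)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_tar_write())
```

Sorting by cumulative time makes it easier to see whether user and group name lookups account for a large share of the run on a given system.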
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux
Linked PRs