Bug report
Bug description:
Hello,
I am currently debugging this issue.
I have noticed that the bug can be reproduced when the problematic file is truncated to 9 GiB, but it does not happen when the file is truncated to 8 GiB.
The problem seems to be that the next member offset is computed incorrectly. It seems to point 512 B after the correct TAR header, which, in this case, lands in the data for the extended attributes, such as 30 mtime=1752348[...].
One of the differences seems to be this code part, which is not hit for the working case:
Lines 1562 to 1569 in 47b01da
While looking into the line above, i.e., into _apply_pax_info, I noticed that there is no definite order for applying the size, even though it can appear multiple times!
Lines 1615 to 1634 in 47b01da
In the non-working case, the PAX headers look like this:
{'GNU.sparse.major': '1',
'GNU.sparse.minor': '0',
'GNU.sparse.name': 'userdata',
'GNU.sparse.realsize': '9663676416',
'atime': '1752349406.975921575',
'ctime': '1752349534.57652562',
'mtime': '1752349534.57652562',
'size': '9602318848'}
I.e., the size member first gets set to GNU.sparse.realsize and then to size. The debug output looks like this:
[_apply_pax_info] SET SIZE to: 9663676416 from key: GNU.sparse.realsize
[_apply_pax_info] SET SIZE to: 9602318848 from key: size
[_apply_pax_info] SET key to: 1752349534.5765257 from key: mtime
Is it specified that the order of the PAX headers must always be this way? Otherwise, one might just as well encounter them like this:
{'atime': '1752349406.975921575',
'ctime': '1752349534.57652562',
'mtime': '1752349534.57652562',
'size': '9602318848',
'GNU.sparse.major': '1',
'GNU.sparse.minor': '0',
'GNU.sparse.name': 'userdata',
'GNU.sparse.realsize': '9663676416'}
and either one of these orders would be a bug.
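The ordering concern can be illustrated with a simplified model. The sketch below is not the CPython implementation; apply_sizes is a hypothetical helper that only mimics the "last matching key wins" behavior suggested by the debug output above, using the header values from this report.

```python
def apply_sizes(pax_headers):
    """Hypothetical stand-in for the size handling in _apply_pax_info.

    Assumption: both 'GNU.sparse.realsize' and 'size' overwrite the same
    attribute, so whichever key is iterated last determines the result.
    """
    size = None
    for key, value in pax_headers.items():
        if key in ("GNU.sparse.realsize", "size"):
            size = int(value)
    return size


# Order seen in the non-working case (dicts preserve insertion order).
observed = {"GNU.sparse.realsize": "9663676416", "size": "9602318848"}
# The same records in the opposite order.
swapped = {"size": "9602318848", "GNU.sparse.realsize": "9663676416"}

result_observed = apply_sizes(observed)
result_swapped = apply_sizes(swapped)
print(result_observed)
print(result_swapped)
```

With identical records, the two orders yield different final sizes, which is why the result should not depend on iteration order.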
The working case does not have this ambiguity:
{'GNU.sparse.major': '1',
'GNU.sparse.minor': '0',
'GNU.sparse.name': 'userdata',
'GNU.sparse.realsize': '8589934592',
'atime': '1752349538.445543898',
'ctime': '1752351104.53673501',
'mtime': '1752351104.53673501'}
The debug output looks like this:
[_apply_pax_info] SET SIZE to: 8589934592 from key: GNU.sparse.realsize
[_apply_pax_info] SET key to: 1752351104.536735 from key: mtime
I.e., even if there is no ordering problem, there already are different semantics for the TarInfo.size member, as in one case it will contain GNU.sparse.realsize and in the other it will contain [PAXHeader.]size.
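A possible explanation for why the size record only shows up in the 9 GiB case, offered as an assumption rather than something confirmed here: the size field of a ustar header holds 11 octal digits, so stored sizes of 8 GiB or more cannot be written there and have to go into a PAX size record instead. The helper name needs_pax_size below is hypothetical.

```python
# The ustar size field is 12 bytes: 11 octal digits plus a terminator.
USTAR_MAX_SIZE = 8**11 - 1  # 8589934591, one byte below 8 GiB


def needs_pax_size(stored_size):
    """Hypothetical check: does this stored size overflow the ustar field?"""
    return stored_size > USTAR_MAX_SIZE


# Stored size from the non-working case (the 'size' PAX record).
overflow_case = needs_pax_size(9602318848)
# Largest stored size that still fits into the header field.
fits_case = needs_pax_size(USTAR_MAX_SIZE)
print(overflow_case, fits_case)
```

If this assumption holds, any sparse file whose stored data reaches 8 GiB would carry both GNU.sparse.realsize and size, and would therefore hit the ambiguity described above.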
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux