Skip to content

Fix zfs_get_data access to files with wrong generation #11682

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 20, 2021

Conversation

tuxoko
Copy link
Contributor

@tuxoko tuxoko commented Mar 3, 2021

Motivation and Context

Description

If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.

This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.

Signed-off-by: Chunwei Chen [email protected]
Closes #10593

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@problame
Copy link
Contributor

problame commented Mar 3, 2021

Maybe a newb question because I am not familiar with the concept of ZPL generations but: What if the destroy + mkdir happens in the same txg? Wouldn't the generation number be the same?

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Mar 3, 2021
@tuxoko
Copy link
Contributor Author

tuxoko commented Mar 3, 2021

@problame
generation number is assigned at file creation.
In order to have 2 files on the same inode having the same gen number, you need create, remove and create happen on the same txg.
But regardless, remove and subsequent create can't happen on the same txg, because you can't create before remove is finished in the syncing context.

@adamdmoss
Copy link
Contributor

While I can't vouch for its correctness, I've been running with this PR for a week or two now with no apparent ill effects.

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job running this down. The fix itself makes good sense to me, I just wonder if we want to give it a more clear name since private2 is only ever used as a generation number.

If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.

This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.

Signed-off-by: Chunwei Chen <[email protected]>
Closes openzfs#10593

Change-Id: I07307002ad3e0a7de577bab487dc11c447645a83
@tuxoko tuxoko force-pushed the upstream_get_data_gen branch from 17ebb12 to a3494fa Compare March 19, 2021 22:14
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Mar 20, 2021
@behlendorf behlendorf merged commit 296a4a3 into openzfs:master Mar 20, 2021
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.

This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes openzfs#10593
Closes openzfs#11682
adamdmoss pushed a commit to adamdmoss/zfs that referenced this pull request Apr 10, 2021
If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.

This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes openzfs#10593
Closes openzfs#11682
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.

This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes openzfs#10593
Closes openzfs#11682
@scineram
Copy link

scineram commented Jun 1, 2021

I can't find this in 2.0.5-staging. @tonyhutter

tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Jun 3, 2021
If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.

This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes openzfs#10593
Closes openzfs#11682
@tonyhutter
Copy link
Contributor

@scineram thanks, I just pushed it into my zfs-2.0.5 patchset (#12182)

tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Jun 10, 2021
If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.

This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes openzfs#10593
Closes openzfs#11682
tonyhutter pushed a commit that referenced this pull request Jun 23, 2021
If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.

This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #10593
Closes #11682
tonyhutter pushed a commit that referenced this pull request Jun 23, 2021
If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.

This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #10593
Closes #11682
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consistent General Protection Fault; zfs_rangelock_enter() -> avl_nearest() -> avl_walk()
7 participants