Description
System information
| Type | Version/Name |
| --- | --- |
| Distribution Name | RHEL |
| Distribution Version | 8.10 |
| Kernel Version | 4.18 |
| Architecture | x86-64 |
| OpenZFS Version | 2.2.4 |
Describe the problem you're observing
We've seen cases where two spares were assigned to the same failed vdev:
```
  NAME                        STATE     READ WRITE CKSUM
  tank20                      DEGRADED     0     0     0
    draid2:8d:90c:2s-0        DEGRADED     0     0     0
      L0                      ONLINE       0     0     0
      L1                      ONLINE       0     0     0
      L2                      ONLINE       0     0     0
      L3                      ONLINE       0     0     0
      L4                      ONLINE       0     0     0
      L5                      ONLINE       0     0     0
      spare-6                 DEGRADED     0     0 13.2K
        replacing-0           DEGRADED     0     0     0
          spare-0             DEGRADED     0     0     0
            L6/old            FAULTED      0     0     0  external device fault
            draid2-0-1        ONLINE       0     0     0
          L6                  ONLINE       0     0     0
        draid2-0-0            ONLINE       0     0     0
      L7                      ONLINE       0     0     0
```
Detaching the spares got the pool back to being healthy again. Here is the procedure our admins used to return the pool to normal (a command sketch of the same sequence follows the list):
1. `zpool detach <GUID of L6/old>`
   - It detached, but we were still left with two ONLINE spares.
2. `zpool detach draid2-0-1`
   - The spare detached and the good L6 moved up one indentation level, but draid2-0-0 did not auto-detach.
3. `zpool detach draid2-0-0`
   - The spare detached, leaving everything looking normal.
4. Started a scrub.
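For reference, that sequence maps roughly to the commands below. This is only a sketch: the pool name is taken from the status output above, and the GUID is left as a placeholder since it was not included in the report.

```sh
# Recovery sequence as run by the admins (sketch; GUID placeholder not filled in).

# 1. Detach the faulted original disk by GUID. This detached cleanly but
#    still left two ONLINE spares attached under spare-6.
zpool detach tank20 <GUID of L6/old>

# 2. Detach the inner distributed spare. L6 moved up one level in the tree,
#    but draid2-0-0 did not auto-detach.
zpool detach tank20 draid2-0-1

# 3. Detach the outer distributed spare, leaving the tree looking normal.
zpool detach tank20 draid2-0-0

# 4. Scrub to verify the pool.
zpool scrub tank20
```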
Describe how to reproduce the problem
We will need to develop a test case to reproduce this. I think it would be roughly the following (see the sketch after this list):
- Create a dRAID pool with 2 distributed spares.
- Fault one of the disks (call it disk1).
- Let the dRAID spare kick in.
- Replace disk1 with a new disk (call it disk1-new).
- While it's resilvering to disk1-new, fault disk1-new.
- See if the second spare kicks in.
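A minimal shell sketch of those steps, assuming file-backed vdevs and a small dRAID geometry. The pool name, file paths, and geometry here are hypothetical, and automatic spare activation assumes zed is running, so a manual fallback is shown as a comment.

```sh
# Hypothetical reproduction sketch using file vdevs; adjust sizes/paths as needed.
truncate -s 256M /var/tmp/d{0..13}

# dRAID pool with 2 distributed spares (draid2-0-0 and draid2-0-1).
zpool create tank draid2:4d:14c:2s /var/tmp/d{0..13}

# Fault one child ("disk1") and let a distributed spare kick in.
zpool offline -f tank /var/tmp/d3
# If zed does not engage the spare automatically, attach it by hand:
# zpool replace tank /var/tmp/d3 draid2-0-0

# Replace the faulted disk with a new device ("disk1-new").
truncate -s 256M /var/tmp/d3.new
zpool replace tank /var/tmp/d3 /var/tmp/d3.new

# While the resilver to disk1-new is still running, fault it as well,
# then check whether the second distributed spare kicks in.
zpool offline -f tank /var/tmp/d3.new
zpool status -v tank
```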
Include any warning/errors/backtraces from the system logs