Two dRAID spares for one vdev #16547

@tonyhutter

Description

System information

Type                  Version/Name
Distribution Name     RHEL
Distribution Version  8.10
Kernel Version        4.18
Architecture          x86-64
OpenZFS Version       2.2.4

Describe the problem you're observing

We've seen cases where two spares were assigned to the same failed vdev:

     NAME                  STATE     READ WRITE CKSUM
     tank20                DEGRADED     0     0     0
       draid2:8d:90c:2s-0  DEGRADED     0     0     0
         L0                ONLINE       0     0     0
         L1                ONLINE       0     0     0
         L2                ONLINE       0     0     0
         L3                ONLINE       0     0     0
         L4                ONLINE       0     0     0
         L5                ONLINE       0     0     0
         spare-6           DEGRADED     0     0 13.2K
           replacing-0     DEGRADED     0     0     0
             spare-0       DEGRADED     0     0     0
               L6/old      FAULTED      0     0     0  external device fault
               draid2-0-1  ONLINE       0     0     0
             L6            ONLINE       0     0     0
           draid2-0-0      ONLINE       0     0     0
         L7                ONLINE       0     0     0 
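When many pools need checking, a listing like the one above can be scanned for child slots that have more than one dRAID spare nested under them. The following is a minimal sketch, not an OpenZFS tool. The helper names `spares_per_child` and `doubly_spared` are hypothetical. It assumes the `zpool status` config layout shown here, where nesting is expressed only through leading whitespace, and it assumes distributed spares follow the `draid<parity>-<vdev>-<spare>` naming seen in this issue.

```python
import re

# Assumption: dRAID distributed spares are named draid<parity>-<vdev>-<spare>,
# e.g. "draid2-0-1", as in the listing in this issue.
DRAID_SPARE = re.compile(r"^draid\d+-\d+-\d+$")


def spares_per_child(status_text):
    """Map each child slot of a dRAID top-level vdev to the dRAID spares nested under it."""
    result = {}
    top_indent = child_indent = current_child = None
    for raw in status_text.splitlines():
        line = raw.expandtabs(8)  # real zpool status output starts rows with a tab
        fields = line.split()
        if not fields or fields[0] == "NAME":
            continue
        name = fields[0]
        indent = len(line) - len(line.lstrip(" "))
        if name.startswith("draid") and ":" in name:
            # Top-level dRAID vdev such as "draid2:8d:90c:2s-0".
            top_indent, child_indent, current_child = indent, None, None
        elif top_indent is None:
            continue  # pool row, or rows outside any dRAID vdev
        elif indent <= top_indent:
            top_indent = None  # left the dRAID vdev (e.g. the pool's "spares" section)
        elif child_indent is None or indent == child_indent:
            # A direct child of the dRAID vdev (a disk, or a spare-N/replacing-N slot).
            child_indent, current_child = indent, name
            result[current_child] = []
        elif DRAID_SPARE.match(name):
            # A distributed spare somewhere below the current child slot.
            result[current_child].append(name)
    return result


def doubly_spared(status_text):
    """Return only the child slots with more than one dRAID spare attached."""
    return {child: spares
            for child, spares in spares_per_child(status_text).items()
            if len(spares) > 1}


# Abbreviated copy of the status listing from this issue.
SAMPLE = """\
     NAME                  STATE     READ WRITE CKSUM
     tank20                DEGRADED     0     0     0
       draid2:8d:90c:2s-0  DEGRADED     0     0     0
         L5                ONLINE       0     0     0
         spare-6           DEGRADED     0     0 13.2K
           replacing-0     DEGRADED     0     0     0
             spare-0       DEGRADED     0     0     0
               L6/old      FAULTED      0     0     0  external device fault
               draid2-0-1  ONLINE       0     0     0
             L6            ONLINE       0     0     0
           draid2-0-0      ONLINE       0     0     0
         L7                ONLINE       0     0     0
"""

if __name__ == "__main__":
    print(doubly_spared(SAMPLE))  # {'spare-6': ['draid2-0-1', 'draid2-0-0']}
```

Keying the result on the direct child slot (here `spare-6`) rather than on a disk name means the check still fires even when the original disk has been renamed to something like `L6/old`.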

Detaching the spares returned the pools to a healthy state. Here is the procedure our admins used to get the pool back to normal:

1. zpool detach <pool> <GUID of L6/old>
    1. L6/old detached, but both dRAID spares were still ONLINE.
2. zpool detach <pool> draid2-0-1
    1. The spare detached and the healthy L6 moved up one indentation level,
       but draid2-0-0 did not detach automatically.
3. zpool detach <pool> draid2-0-0
    1. The spare detached, and the pool layout looked normal again.
4. Started a scrub (zpool scrub <pool>).
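The procedure above can be scripted so it is repeatable across affected pools. This is a hedged sketch, not a supported tool: `recovery_commands` and `run` are hypothetical helpers, the detach order simply mirrors what the admins did, and the script does not re-check `zpool status` between steps the way the admins did, so inspect the pool after each detach before continuing. The GUID placeholder is left elided as in the original steps.

```python
import shlex
import subprocess


def recovery_commands(pool, old_disk_guid, spares):
    """Return the admins' recovery steps as argv lists, in the order they were run.

    old_disk_guid: GUID of the faulted original disk (L6/old above).
    spares: the attached dRAID spares in detach order (draid2-0-1, then draid2-0-0).
    """
    cmds = [["zpool", "detach", pool, old_disk_guid]]
    cmds += [["zpool", "detach", pool, spare] for spare in spares]
    cmds.append(["zpool", "scrub", pool])  # verify data once the layout is normal again
    return cmds


def run(cmds, execute=False):
    """Print every command; only execute when explicitly asked (dry run by default)."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(recovery_commands("tank20", "<GUID of L6/old>", ["draid2-0-1", "draid2-0-0"]))
```

Defaulting to a dry run is deliberate: detaching the wrong device from a degraded dRAID vdev is disruptive, so the printed plan should be reviewed against `zpool status` before calling `run(..., execute=True)`.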

Describe how to reproduce the problem

We still need to develop a test case that reproduces this. I think it would go roughly like this:

  1. Create a dRAID pool with 2 spares.
  2. Fault one of the disks; call it disk1.
  3. Let the first dRAID spare kick in.
  4. Replace disk1 with a new disk, called disk1-new.
  5. While disk1 is resilvering to disk1-new, fault disk1-new.
  6. Check whether the second spare also kicks in.
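As a starting point for that test case, the steps above can be expressed as concrete commands against file-backed vdevs. This is a sketch under several assumptions, not a verified reproducer. The pool name, directory, and sizes are hypothetical. Step 3 attaches the first distributed spare by hand with `zpool replace` instead of waiting for ZED, and step 6 only shows status, so seeing the second spare activate on its own still depends on ZED running. On an empty pool the resilver in step 4 may finish before step 5 runs; writing data to the pool first (not included here) should widen that window. The script only prints the commands, for use on a disposable test machine.

```python
import shlex

# Hypothetical test parameters; adjust for your environment.
POOL = "testpool"
VDEV_DIR = "/var/tmp/draid"
NUM_VDEVS = 12  # draid2:8d:2s needs at least parity + data + spares = 2 + 8 + 2 children
VDEV_SIZE = "1G"


def repro_plan():
    """Return the reproduction steps as (description, argv) pairs."""
    vdevs = [f"{VDEV_DIR}/vdev{i}" for i in range(NUM_VDEVS)]
    disk1 = vdevs[1]
    disk1_new = f"{VDEV_DIR}/vdev1-new"

    plan = [("make vdev directory", ["mkdir", "-p", VDEV_DIR])]
    plan += [("create backing file", ["truncate", "-s", VDEV_SIZE, path])
             for path in vdevs + [disk1_new]]
    plan += [
        ("1. create dRAID pool with 2 spares",
         ["zpool", "create", POOL, "draid2:8d:2s", *vdevs]),
        ("2. fault disk1",
         ["zpool", "offline", "-f", POOL, disk1]),
        # Assumption: attach the first distributed spare manually instead of
        # relying on ZED; -w blocks until the rebuild onto the spare finishes.
        ("3. rebuild disk1 onto the first dRAID spare",
         ["zpool", "replace", "-w", POOL, disk1, "draid2-0-0"]),
        # No -w here, so step 5 can run while disk1-new is still resilvering.
        ("4. replace disk1 with disk1-new",
         ["zpool", "replace", POOL, disk1, disk1_new]),
        ("5. fault disk1-new during its resilver",
         ["zpool", "offline", "-f", POOL, disk1_new]),
        ("6. check whether the second spare (draid2-0-1) kicked in",
         ["zpool", "status", POOL]),
    ]
    return plan


if __name__ == "__main__":
    for step, argv in repro_plan():
        print(f"# {step}\n{shlex.join(argv)}")
```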

Include any warning/errors/backtraces from the system logs

Labels: Type: Defect (Incorrect behavior, e.g. crash, hang)