
Panic in spa_vdev_remove_cancel_sync() on metaslab with empty spacemap #17359

@fuporovvStack


System information

Type                  Version/Name
Distribution Name     FreeBSD
Distribution Version  15
Kernel Version
Architecture          x86_64
OpenZFS Version       master

I hit a kernel panic in spa_vdev_remove_cancel_sync() while running the ZFS test suite.

Last successful test:

/home/user/Sources/zfs/tests/zfs-tests/tests/functional/cli_root/zpool_wait/zpool_wait_remove (run as root) [00:17] [PASS]

FreeBSD kgdb output:

(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=0) at /usr/src/sys/kern/kern_shutdown.c:404
#2  0xffffffff804a2a3a in db_dump (dummy=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>) at /usr/src/sys/ddb/db_command.c:596
#3  0xffffffff804a282d in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=true) at /usr/src/sys/ddb/db_command.c:508
#4  0xffffffff804a24ed in db_command_loop () at /usr/src/sys/ddb/db_command.c:555
#5  0xffffffff804a5ec6 in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:267
#6  0xffffffff80ba4edf in kdb_trap (type=type@entry=3, code=code@entry=0, tf=tf@entry=0xfffffe01c7d6f870) at /usr/src/sys/kern/subr_kdb.c:790
#7  0xffffffff8108b5db in trap (frame=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:608
#8  <signal handler called>
#9  kdb_enter (why=<optimized out>, msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:556
#10 0xffffffff80b54ecb in vpanic (fmt=0xffffffff811ffa3b "%s", ap=ap@entry=0xfffffe01c7d6faa0) at /usr/src/sys/kern/kern_shutdown.c:967
#11 0xffffffff80b54d33 in panic (fmt=0xffffffff81b9c3a0 <cnputs_mtx> "\276\277\026\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:892
#12 0xffffffff8108c0b6 in trap_fatal (frame=<optimized out>, eva=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:960
#13 0xffffffff8108c0b6 in trap_pfault (frame=0xfffffe01c7d6fb40, usermode=false, signo=<optimized out>, ucode=<optimized out>)
#14 <signal handler called>
#15 spa_vdev_remove_cancel_sync (arg=<optimized out>, tx=0xfffff80784d74d00) at /home/user/Sources/zfs/module/zfs/vdev_removal.c:1935
#16 0xffffffff82b377f5 in dsl_sync_task_sync (dst=0xfffffe0149096a50, tx=tx@entry=0xfffff80784d74d00) at /home/user/Sources/zfs/module/zfs/dsl_synctask.c:256
#17 0xffffffff82b250ab in dsl_pool_sync (dp=dp@entry=0xfffff800af145000, txg=txg@entry=28) at /home/user/Sources/zfs/module/zfs/dsl_pool.c:853
#18 0xffffffff82b6d3ab in spa_sync_iterate_to_convergence (spa=0xfffffe01c1ce3000, tx=0xfffff8078b720600) at /home/user/Sources/zfs/module/zfs/spa.c:10013
#19 spa_sync (spa=spa@entry=0xfffffe01c1ce3000, txg=txg@entry=28) at /home/user/Sources/zfs/module/zfs/spa.c:10261
#20 0xffffffff82b889dd in txg_sync_thread (arg=arg@entry=0xfffff800af145000) at /home/user/Sources/zfs/module/zfs/txg.c:601
#21 0xffffffff80b082b2 in fork_exit (callout=0xffffffff82b88530 <txg_sync_thread>, arg=0xfffff800af145000, frame=0xfffffe01c7d6ff40) at /usr/src/sys/kern/kern_fork.c:1152
#22 <signal handler called>
(kgdb) frame 15
#15 spa_vdev_remove_cancel_sync (arg=<optimized out>, tx=0xfffff80784d74d00) at /home/user/Sources/zfs/module/zfs/vdev_removal.c:1935
1935			    msp->ms_sm->sm_size;
(kgdb) print *msp
$1 = {ms_lock = {lock_object = {lo_name = 0xffffffff82d4a39c <.L.str.48+1> "ms->ms_lock", lo_flags = 577961984, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, ms_sync_lock = {lock_object = {lo_name = 0xffffffff82d5bf7c <.L.str.49+1> "ms->ms_sync_lock", lo_flags = 577961984, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, ms_load_cv = {
    cv_description = 0xffffffff82d2dffb <.L.str.50+1> "ms->ms_load_cv", cv_waiters = 0}, ms_sm = 0x0, ms_id = 0, ms_start = 0, ms_size = 536870912, ms_fragmentation = 0, ms_allocating = {0xfffff807a367bc00, 0xfffff807f1da1400, 0xfffff807e801dc00, 0xfffff80800280c00}, ms_allocatable = 0xfffff807e7a04c00, ms_allocated_this_txg = 0, ms_allocating_total = 0, 
  ms_freeing = 0xfffff807e5259c00, ms_freed = 0xfffff807ee4f7000, ms_defer = {0xfffff807e5194c00, 0xfffff807f1356000}, ms_checkpointing = 0xfffff807d50df000, ms_trim = 0xfffff80011fe4800, ms_condensing = 0, ms_condense_wanted = 0, ms_disabled = 0, ms_loaded = 1, ms_loading = 0, ms_flush_cv = {cv_description = 0xffffffff82d3fd9e <.L.str.51+1> "ms->ms_flush_cv", cv_waiters = 0}, 
  ms_flushing = 0, ms_synchist = {0 <repeats 32 times>}, ms_deferhist = {{0 <repeats 32 times>}, {0 <repeats 32 times>}}, ms_allocated_space = 0, ms_deferspace = 0, ms_weight = 522417556774977537, ms_activation_weight = 0, ms_selected_txg = 7, ms_load_time = 10260584374358, ms_unload_time = 0, ms_selected_time = 1747813153, ms_alloc_txg = 0, ms_max_size = 536870912, 
  ms_allocator = -1, ms_primary = 0, ms_allocatable_by_size = {bt_compar = 0xffffffff82b4c120 <metaslab_rangesize32_compare>, bt_find_in_buf = 0xffffffff82b4c160 <metaslab_rt_find_rangesize32_in_buf>, bt_elem_size = 8, bt_leaf_size = 4096, bt_leaf_cap = 510, bt_height = 0, bt_num_elems = 1, bt_num_nodes = 1, bt_root = 0xfffff800af21b000, bt_bulk = 0xfffff800af21b000}, 
  ms_unflushed_frees_by_size = {bt_compar = 0xffffffff82b4c120 <metaslab_rangesize32_compare>, bt_find_in_buf = 0xffffffff82b4c160 <metaslab_rt_find_rangesize32_in_buf>, bt_elem_size = 8, bt_leaf_size = 4096, bt_leaf_cap = 510, bt_height = -1, bt_num_elems = 0, bt_num_nodes = 0, bt_root = 0x0, bt_bulk = 0x0}, ms_lbas = {0 <repeats 64 times>}, ms_group = 0xfffff807d483f000, 
  ms_group_node = {avl_child = {0x0, 0x0}, avl_pcb = 1}, ms_txg_node = {tn_next = {0x0, 0x0, 0x0, 0x0}, tn_member = "\000\000\000"}, ms_spa_txg_node = {avl_child = {0x0, 0x0}, avl_pcb = 0}, ms_class_txg_node = {list_next = 0xfffff8003cd86758, list_prev = 0xfffff801ed489228}, ms_unflushed_allocs = 0xfffff807f289e000, ms_unflushed_frees = 0xfffff807114d7000, ms_unflushed_txg = 0, 
  ms_unflushed_dirty = 0, ms_synced_length = 0, ms_new = 0}

As the dump shows, msp->ms_sm is 0x0, so the msp->ms_sm->sm_size dereference at vdev_removal.c:1935 faults. It looks like the condition does not handle the case where msp->ms_sm == NULL.
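
To illustrate the kind of guard that seems to be missing, here is a minimal, self-contained sketch. It is not the code from vdev_removal.c: the two structs are cut-down stand-ins that keep only the fields visible in the dump above, and metaslab_sm_size_or_zero() is a hypothetical helper invented for this example. Whether treating a missing space map as size 0 (or skipping the metaslab entirely) is the right behavior inside spa_vdev_remove_cancel_sync() is an assumption that would need to be checked against the surrounding logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for illustration only. The real space_map_t and
 * metaslab_t carry many more fields (see the kgdb dump above).
 */
typedef struct space_map {
	uint64_t sm_size;
} space_map_t;

typedef struct metaslab {
	space_map_t *ms_sm;	/* NULL when the metaslab has no space map */
	uint64_t ms_start;
	uint64_t ms_size;
} metaslab_t;

/*
 * Hypothetical helper (not an OpenZFS function): return the space map size,
 * or 0 when the metaslab has no space map, instead of dereferencing NULL.
 */
static uint64_t
metaslab_sm_size_or_zero(const metaslab_t *msp)
{
	assert(msp != NULL);
	if (msp->ms_sm == NULL)
		return (0);
	return (msp->ms_sm->sm_size);
}
```

Called on a metaslab shaped like the one in the dump (ms_sm = 0x0, ms_size = 536870912), the helper returns 0 rather than faulting; called on a metaslab whose ms_sm points at a space map, it returns that map's sm_size.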

Labels: Type: Defect (incorrect behavior, e.g. crash, hang)
