incusd crashes when a message is written to console websocket #1449

Closed
@tsuzu

Description

Required information

  • Distribution: Flatcar Linux
  • Distribution version: 3941.0.0
  • Incus version: 6.7
  • The output of "incus info":
incus info
config:
  cluster.https_address: <omitted>:8443
  core.https_address: :8443
  oidc.audience: <omitted>
  oidc.client.id: <omitted>
  oidc.issuer: <omitted>
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_dev_incus
- migration_pre_copy
- infiniband
- dev_incus_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- dev_incus_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- images_all_projects
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- zfs_delegate
- storage_api_remote_volume_snapshot_copy
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- image_restriction_privileged
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- certificate_description
- disk_io_bus_virtio_blk
- loki_config_instance
- instance_create_start
- clustering_evacuation_stop_options
- boot_host_shutdown_action
- agent_config_drive
- network_state_ovn_lr
- image_template_permissions
- storage_bucket_backup
- storage_lvm_cluster
- shared_custom_block_volumes
- auth_tls_jwt
- oidc_claim
- device_usb_serial
- numa_cpu_balanced
- image_restriction_nesting
- network_integrations
- instance_memory_swap_bytes
- network_bridge_external_create
- network_zones_all_projects
- storage_zfs_vdev
- container_migration_stateful
- profiles_all_projects
- instances_scriptlet_get_instances
- instances_scriptlet_get_cluster_members
- instances_scriptlet_get_project
- network_acl_stateless
- instance_state_started_at
- networks_all_projects
- network_acls_all_projects
- storage_buckets_all_projects
- resources_load
- instance_access
- project_access
- projects_force_delete
- resources_cpu_flags
- disk_io_bus_cache_filesystem
- instance_oci
- clustering_groups_config
- instances_lxcfs_per_instance
- clustering_groups_vm_cpu_definition
- disk_volume_subpath
- projects_limits_disk_pool
- network_ovn_isolated
- qemu_raw_qmp
- network_load_balancer_health_check
- oidc_scopes
- network_integrations_peer_name
- qemu_scriptlet
- instance_auto_restart
- storage_lvm_metadatasize
- ovn_nic_promiscuous
- ovn_nic_ip_address_none
- instances_state_os_info
- network_load_balancer_state
- instance_nic_macvlan_mode
- storage_lvm_cluster_create
- network_ovn_external_interfaces
- instances_scriptlet_get_instances_count
- cluster_rebalance
- custom_volume_refresh_exclude_older_snapshots
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
- oidc
auth_user_name: root
auth_user_method: unix
environment:
  addresses:
  - <omitted>
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIB/zCCAYagAwIBAgIQWRCbU44hrisQmKspUyDV8jAKBggqhkjOPQQDAzAyMRkw
    FwYDVQQKExBMaW51eCBDb250YWluZXJzMRUwEwYDVQQDDAxyb290QHBhcHJpa2Ew
    HhcNMjQxMTI1MTg0ODUzWhcNMzQxMTIzMTg0ODUzWjAyMRkwFwYDVQQKExBMaW51
    eCBDb250YWluZXJzMRUwEwYDVQQDDAxyb290QHBhcHJpa2EwdjAQBgcqhkjOPQIB
    BgUrgQQAIgNiAASZI1i282jvdDzNX9aYzAw+cgPMVjH17A6a0VB/QQWaxn31W67h
    ee5CEkrGyFnyi6aiJJLuvpkJ39XWQgr5qEorIb9MgTsAmvJKTnBuZIMv4JQeChEi
    IGS/BbBJCCdlchWjYTBfMA4GA1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggrBgEF
    BQcDATAMBgNVHRMBAf8EAjAAMCoGA1UdEQQjMCGCB3BhcHJpa2GHBH8AAAGHEAAA
    AAAAAAAAAAAAAAAAAAEwCgYIKoZIzj0EAwMDZwAwZAIwI/dyfbAzC8BrHmApElAt
    jLaZQoOQtrkfNQc7Njon+H6unHsjp8MO1hRDz/omhBE6AjA8XASjj6LD0hzMA4Pa
    ou1D30Yt9GrlVbt4bBbSdmKjqkjNzuCOzJC5fEQmrVNY7qU=
    -----END CERTIFICATE-----
  certificate_fingerprint: d87686e669cb458392f13f57e74da9f5c40d95b945514d61445daf4c18629d40
  driver: lxc | qemu
  driver_version: 6.0.2 | 9.0.3
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_binfmt: "false"
    unpriv_fscaps: "true"
  kernel_version: 6.6.21-flatcar
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Debian GNU/Linux
  os_version: "12"
  project: default
  server: incus
  server_clustered: true
  server_event_mode: full-mesh
  server_name: paprika
  server_pid: 215824
  server_version: "6.7"
  storage: lvm
  storage_version: 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.48.0
  storage_supported_drivers:
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.48.0
    remote: false
  - name: lvmcluster
    version: 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.48.0
    remote: true
  - name: btrfs
    version: "6.2"
    remote: false

Issue description

When I opened the Text console of a VM/container in the web UI and then moved to another pane, such as Configuration, incusd crashed with the following output:

panic: concurrent write to websocket connection

goroutine 12380 [running]:
github.com/gorilla/websocket.(*messageWriter).flushFrame(0xc00087de68, 0x1, {0x373a6c0?, 0xc00087de01?, 0xc00087de98?})
        /root/go/pkg/mod/github.com/gorilla/websocket@<omitted>/conn.go:617 +0x4b8
github.com/gorilla/websocket.(*Conn).WriteMessage(0xc0004f1340, 0x1fa654c?, {0x373a6c0, 0x0, 0x0})
        /root/go/pkg/mod/github.com/gorilla/websocket@<omitted>/conn.go:770 +0x126
github.com/lxc/incus/v6/shared/ws.(*wrapper).Close(0xc0010620c0)
        /build/incus/shared/ws/rwc.go:92 +0x17f
github.com/lxc/incus/v6/shared/ws.MirrorRead.func1()
        /build/incus/shared/ws/mirror.go:38 +0x21d
created by github.com/lxc/incus/v6/shared/ws.MirrorRead in goroutine 12377
        /build/incus/shared/ws/mirror.go:32 +0x1e5

or

panic: concurrent write to websocket connection

goroutine 2786 [running]:
github.com/gorilla/websocket.(*messageWriter).flushFrame(0xc000871d48, 0x1, {0xc001957116?, 0x0?, 0xc000119508?})
        /root/go/pkg/mod/github.com/gorilla/websocket@<omitted>/conn.go:617 +0x4b8
github.com/gorilla/websocket.(*Conn).WriteMessage(0xc0019109a0, 0x0?, {0xc001957116, 0x2, 0x2})
        /root/go/pkg/mod/github.com/gorilla/websocket@<omitted>/conn.go:770 +0x126
main.(*consoleWs).doConsole.func4()
        /build/incus/cmd/incusd/instance_console.go:324 +0x49
main.(*consoleWs).doConsole(0xc00104e1e0, 0x0?)
        /build/incus/cmd/incusd/instance_console.go:345 +0x4c2
main.(*consoleWs).Do(0xc00104e1e0, 0xc001178780)
        /build/incus/cmd/incusd/instance_console.go:202 +0x89
github.com/lxc/incus/v6/internal/server/operations.(*Operation).Start.func1(0xc001178780)
        /build/incus/internal/server/operations/operations.go:290 +0x26
created by github.com/lxc/incus/v6/internal/server/operations.(*Operation).Start in goroutine 2602
        /build/incus/internal/server/operations/operations.go:289 +0x105
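
Both traces point at the same constraint: gorilla/websocket connections support only one concurrent writer, and two goroutines calling WriteMessage on the same *websocket.Conn at the same time trigger exactly this panic. In the first trace the write comes from the mirror's Close path, and in the second from doConsole; in both cases another goroutine must have been writing to the same connection at the same moment. As a purely illustrative sketch (the type and field names are hypothetical, not the actual incus code or a proposed patch), the usual way to make such writes safe is to serialize them behind a mutex:

package ws

import (
    "sync"

    "github.com/gorilla/websocket"
)

// safeConn serializes all writes to a single websocket connection so that
// two goroutines can never hit gorilla/websocket's
// "concurrent write to websocket connection" panic.
type safeConn struct {
    writeMu sync.Mutex
    conn    *websocket.Conn
}

// WriteMessage takes the write lock before delegating to gorilla/websocket,
// so a barrier/close frame and a console payload cannot interleave.
func (s *safeConn) WriteMessage(messageType int, data []byte) error {
    s.writeMu.Lock()
    defer s.writeMu.Unlock()
    return s.conn.WriteMessage(messageType, data)
}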

Steps to reproduce

I can reproduce the problem as follows:

  1. Open the incus web UI.
  2. Open the Text console of a container.
  3. Move to another pane, such as Configuration.
  4. Go back to step 2.

After a few tries, incusd crashes.
However, I'm still looking for a way to reproduce this with the CLI or a Go program.
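
A rough sketch of one possible approach, using gorilla/websocket directly against the operation websocket endpoint that shows up in the daemon log below; the server address, operation UUID, and secret are placeholders, and a fresh console operation would have to be requested for every iteration:

package main

import (
    "crypto/tls"
    "log"

    "github.com/gorilla/websocket"
)

func main() {
    // The cluster uses a self-signed certificate here, so verification is
    // skipped for this sketch.
    dialer := websocket.Dialer{
        TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
    }

    // Placeholder URL: a new console operation (and its websocket secret)
    // would need to be created via POST /1.0/instances/<name>/console
    // before each iteration.
    url := "wss://<server>:8443/1.0/operations/<uuid>/websocket?secret=<secret>"

    for i := 0; i < 100; i++ {
        conn, _, err := dialer.Dial(url, nil)
        if err != nil {
            log.Fatalf("dial: %v", err)
        }

        // Send one keystroke and drop the connection immediately, trying to
        // race the server-side barrier/close write.
        _ = conn.WriteMessage(websocket.BinaryMessage, []byte("\r"))
        _ = conn.Close()
    }
}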

Information to attach

  • [ ] Any relevant kernel output (dmesg)
  • [ ] Container log (incus info NAME --show-log)
  • [ ] Container configuration (incus config show NAME --expanded)
  • [x] Main daemon log (at /var/log/incus/incusd.log)
    • attached above
  • [ ] Output of the client with --debug
  • [x] Output of the daemon with --debug (alternatively output of incus monitor --pretty while reproducing the issue)
INFO   [2024-12-02T16:09:27Z] Action: instance-console, Source: /1.0/instances/testvm, Requestor: oidc/github|8574909 (172.71.24.104)  type=console
DEBUG  [2024-12-02T16:09:27Z] [paprika] Started mirroring websocket         address="172.71.24.99:59566"
DEBUG  [2024-12-02T16:09:27Z] [paprika] Websocket: Started read mirror      address="172.71.24.99:59566"
DEBUG  [2024-12-02T16:09:27Z] [paprika] Websocket: Started write mirror     address="172.71.24.99:59566"
INFO   [2024-12-02T16:09:27Z] Action: instance-console, Source: /1.0/instances/testvm, Requestor: oidc/github|8574909 (172.71.24.104)  type=console
DEBUG  [2024-12-02T16:09:27Z] [paprika] Handling API request                ip="162.158.119.126:25678" method=GET protocol=oidc url="/1.0/operations/6490c63e-db50-4f2c-909d-86b6285a363c/websocket?secret=044caddb2b7fae88fb9e9a96e96f94c7adbe8844d972bafd11910b1774d2588f" username="github|8574909"
DEBUG  [2024-12-02T16:09:27Z] [paprika] Connecting to operation             class=websocket description="Showing console" operation=6490c63e-db50-4f2c-909d-86b6285a363c project=default
DEBUG  [2024-12-02T16:09:27Z] [paprika] Connected to operation              class=websocket description="Showing console" operation=6490c63e-db50-4f2c-909d-86b6285a363c project=default
DEBUG  [2024-12-02T16:09:29Z] [paprika] Websocket: Stopped read mirror      address="172.71.24.99:59566" err="read unix:@->/run/incus/testvm/qemu.console: file already closed"
DEBUG  [2024-12-02T16:09:29Z] [paprika] Got error getting next reader: websocket: close 1005 (no status)
DEBUG  [2024-12-02T16:09:29Z] [paprika] Finished mirroring console to websocket  address="172.71.24.99:59566"
DEBUG  [2024-12-02T16:09:29Z] [paprika] Websocket: Sending barrier message  address="172.71.24.99:59566"
DEBUG  [2024-12-02T16:09:29Z] [paprika] Console control websocket finished
Error: websocket: close 1006 (abnormal closure): unexpected EOF
