Docker daemon and Containerd dockerd out of sync in 18.09 #421

Closed
@deft-code

Description

  • This is a bug report
  • This is a feature request
  • I searched existing issues before opening this one

We're seeing two bad behaviors. First, for some reason dockerd fails (crashes?) when first installed. Second, when dockerd crashes, it is unable to restart because the containerd task "dockerd" is still running.

Expected behavior

apt-get install docker-ce
version 2:18.09.0ce0.4.tp4-0~debian installed
docker ps -aq
nothing
systemctl stop docker.service
success
systemctl is-active docker.service
inactive
docker info
fails
systemctl start docker.service
systemctl is-active docker.service
active

Actual behavior

docker ps -aq
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
systemctl is-active docker.service
failed
docker info
still works!
systemctl stop docker.service
systemctl start docker.service
systemctl is-active docker.service
activating (NOT activated, the daemon process doesn't exist yet)
/usr/bin/dockerd
container dockerd already has a running process
ctr -n docker tasks list
TASK PID STATUS
dockerd NNNNN RUNNING
ctr -n docker tasks kill dockerd
ctr -n docker tasks list
TASK PID STATUS
dockerd NNNNN STOPPED
systemctl is-active docker.service
activating
ctr -n docker tasks delete dockerd
systemctl is-active docker.service
active // The daemon successfully restarted once containerd was unblocked.
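
The manual recovery above could be scripted roughly as follows. This is only a sketch of the workaround, not a fix: the function names `task_status` and `clear_stale_task` and the overridable `CTR` variable are hypothetical, and it assumes `ctr -n docker tasks list` prints the three-column TASK / PID / STATUS table shown in this report.

```shell
#!/bin/sh
# Sketch: clear a stale "dockerd" task from containerd's "docker" namespace.
# CTR is a hypothetical override so the logic can be tried without containerd.
CTR="${CTR:-ctr}"

# Print the STATUS column for task $1, reading `tasks list` output on stdin.
# Prints nothing if the task is not listed.
task_status() {
  awk -v name="$1" 'NR > 1 && $1 == name { print $3 }'
}

# Kill the task if it is RUNNING, then delete it if it is still listed.
# Assumption: the task reports STOPPED by the time it is listed again,
# as it did in the transcript above; a real script may need to wait.
clear_stale_task() {
  name="$1"
  status=$("$CTR" -n docker tasks list | task_status "$name")
  if [ "$status" = "RUNNING" ]; then
    "$CTR" -n docker tasks kill "$name"
    status=$("$CTR" -n docker tasks list | task_status "$name")
  fi
  if [ -n "$status" ]; then
    "$CTR" -n docker tasks delete "$name"
  fi
}

# Usage (on an affected host): clear_stale_task dockerd
```

Once the task is deleted, systemd's auto-restart of docker.service succeeded in our case without further intervention.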

Steps to reproduce the behavior

On a clean VM, install the latest docker-ce version,
then immediately try to use docker (in our case, docker ps).
The socket is bad, so we attempt to restart the daemon.

We can manually reproduce the problem by killing the dockerd daemon with SIGKILL:
kill -9 <PID of /usr/bin/dockerd>
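
After the kill, the out-of-sync state can be checked for with something like the sketch below. The helper name `is_out_of_sync` is hypothetical; it only encodes the symptom described in this report (docker.service not active while containerd still lists the dockerd task as RUNNING).

```shell
#!/bin/sh
# Sketch: succeed when the service state and the containerd task disagree.
# $1: output of `systemctl is-active docker.service`
# $2: STATUS column for "dockerd" from `ctr -n docker tasks list` (may be empty)
is_out_of_sync() {
  [ "$1" != "active" ] && [ "$2" = "RUNNING" ]
}

# Usage on an affected host:
#   svc=$(systemctl is-active docker.service)
#   task=$(ctr -n docker tasks list | awk 'NR > 1 && $1 == "dockerd" { print $3 }')
#   if is_out_of_sync "$svc" "$task"; then echo "stale dockerd task"; fi
```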

Output of docker version:

Client:
 Version:           18.09.0-ce-tp4
 API version:       1.39
 Go version:        go1.10.3
 Git commit:        33764aa
 Built:             Fri Aug 24 23:19:58 2018
 OS/Arch:           linux/amd64
 Experimental:      false
Server:
 Engine:
  Version:          18.09.0-ce-tp4
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       33764aa
  Built:            
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 18.09.0-ce-tp4
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc containerd
Default Runtime: containerd
Init Binary: docker-init
containerd version: 6f13ff3ea48a6bc2fb9b47c0acce24cf274dafd9 (expected: 468a545b9edcd5932818eb9de8e72413e616e86e)
runc version: 459bfaec1fc6c17d8bfb12d0a0f69e7e7271ed2a (expected: 69663f0bd4b60df09991c08812a60108003fa340)
init version: fec3683
Kernel Version: 4.9.0-8-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.617GiB
Name: docker-roundtrip-test-8e22218e2cfd48f1
ID: CTS2:JUUA:WELS:4TIL:HPJ3:4P2B:JVL5:SYCD:PS2I:DOJO:XHBA:MBXV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.)

systemctl status docker.service
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Tue 2018-08-28 18:10:06 UTC; 78ms ago
     Docs: https://docs.docker.com
  Process: 17679 ExecStart=/usr/bin/dockerd (code=exited, status=1/FAILURE)
  Process: 17673 ExecStartPre=/usr/libexec/containerd-offline-installer /var/lib/containerd-offline-installer/containerd-shim-process.tar docker.io/docker/containerd-shim-process (code=exited, status=0/SUCCESS)
 Main PID: 17679 (code=exited, status=1/FAILURE)
      CPU: 109ms

Aug 28 18:10:06 docker-roundtrip-test-8e22218e2cfd48f1 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 28 18:10:06 docker-roundtrip-test-8e22218e2cfd48f1 systemd[1]: docker.service: Unit entered failed state.
Aug 28 18:10:06 docker-roundtrip-test-8e22218e2cfd48f1 systemd[1]: docker.service: Failed with result 'exit-code'.

systemctl status containerd.service
● containerd.service - containerd container runtime
   Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2018-08-28 00:50:47 UTC; 17h ago
     Docs: https://containerd.io
 Main PID: 25324 (containerd)
    Tasks: 20 (limit: 4915)
   Memory: 170.6M
      CPU: 17min 36.789s
   CGroup: /system.slice/containerd.service
           ├─25324 /usr/bin/containerd
           └─26372 /opt/containerd/bin/containerd-shim-process-v1 -namespace docker -address /run/containerd/containerd.sock -publish-binary /usr/bin/containerd

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable

ctr -n docker tasks list
TASK       PID      STATUS    
dockerd    26382    RUNNING

I've attached the whole output of the commands we ran when we encountered the problem. Much of the file is just noise. However, you can see that docker and containerd were not previously installed and that, immediately after install, docker commands could not find the socket.

If we manually recover the VM, it works fine thereafter (i.e. we can't manually reproduce the initial failure). I suspect there is something of a race between docker.service and containerd's dockerd task.

output.txt
