Description
This is consistently reproducible for me now. When a container is unpaused, its etcd keys are first unwatched and possibly deleted before the unpause operation completes. Concurrently with the unpause, a container remove can happen (I'm still trying to figure out what triggers this; my theory is that unwatching the container keys leads whatever is watching them to request that the container be deleted). No matter the result of the unpause operation, the future that requested it always completes successfully, so the etcd key for the container is re-written as part of the .map after the unpause future completes. Since the container was deleted, that key is no longer tracked and will live indefinitely unless the component is restarted. This is a critical bug for me moving forward with the scheduler: it's easily reproducible, and it can block activations from processing for an affected action when the memory queue still thinks a container is running that isn't.
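To make the "always successful future" point concrete, here is a minimal, self-contained Scala sketch of the pattern I believe is at play (the function names are mine, not the invoker's): if the layer that runs docker unpause recovers the failure itself, for example just to emit a metric, the future it hands back is successful and every downstream .map still runs.

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

// Hypothetical stand-in for the docker CLI call; here it always fails.
def runDockerUnpause(id: String): Future[Unit] =
  Future.failed(new RuntimeException(s"docker unpause $id failed"))

// If the lower layer handles the failure itself (e.g. only emits a metric) and
// recovers to a successful value, callers can no longer observe the error.
def resume(id: String): Future[Unit] =
  runDockerUnpause(id).recover {
    case t => println(s"emit unpause error metric: ${t.getMessage}")
  }

// The downstream .map then always runs; in the proxy this is where the
// existingContainers key for the (already removed) container gets written back.
val done = resume("ca688d9b746b").map { _ =>
  println("re-register existingContainers key in etcd")
}

Await.ready(done, 5.seconds)

Running something like this prints both the metric line and the re-register line, which mirrors the ordering I see in the logs below.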
Relevant logs (a couple of them I added locally to follow the exact code path better):
[2022-09-14T00:00:12.526Z] [INFO] [#tid_sid_unknown] [ContainerManager] Choose a warmed container whisk/warmed
[2022-09-14T00:00:12.580Z] [INFO] [#tid_sid_unknown] [FunctionPullingContainerPool] Choose warmed container
[2022-09-14T00:00:12.581Z] [INFO] [#tid_sid_containerCreation] [DockerClientWithFileAccess] running /usr/bin/docker unpause ca688d9b746b1da6a992e8c43e03a6df44a56608ec20ce3cccff07dd0940a22d (timeout: 10 seconds)
#these two unwatch logs show what I described above: the unregister of the data can happen before the register of the data that happens when the unpause completes
[2022-09-14T00:00:12.581Z] [INFO] [#tid_sid_unknown] [WatcherService] unwatch endpoint: UnwatchEndpoint(whisk/warmed
[2022-09-14T00:00:12.581Z] [INFO] [#tid_sid_unknown] [WatcherService] unwatch endpoint: UnwatchEndpoint(whisk/namespace
[2022-09-14T00:00:12.581Z] [INFO] [#tid_sid_invokerNanny] [DockerClientWithFileAccess] running /usr/bin/docker rm -f ca688d9b746b1da6a992e8c43e03a6df44a56608ec20ce3cccff07dd0940a22d (timeout: 1 minute)
#added this log locally indicating the etcd key is removed
[2022-09-14T00:00:12.585Z] [INFO] [#tid_sid_unknown] [EtcdWorker] successfully removed whisk/warmed
[2022-09-14T00:00:12.588Z] [INFO] [#tid_sid_unknown] [EtcdWorker] successfully removed whisk/namespace
[2022-09-14T00:00:12.595Z] [INFO] [#tid_DgxZq5kjsHIwixkSsI5onK8zYSgpOkxM] [MemoryQueue] Remove containerId because ca688d9b746b1da6a992e8c43e03a6df44a56608ec20ce3cccff07dd0940a22d is not alive
#unpause operation completes
[2022-09-14T00:00:12.705Z] [INFO] [#tid_sid_containerCreation] [FunctionPullingContainerProxy] Resumed container ContainerId(ca688d9b746b1da6a992e8c43e03a6df44a56608ec20ce3cccff07dd0940a22d)
[2022-09-14T00:00:12.706Z] [INFO] [#tid_sid_unknown] [WatcherService] watch endpoint: WatchEndpoint(whisk/namespace
#key is added back to etcd for deleted container
[2022-09-14T00:00:12.711Z] [INFO] [#tid_sid_unknown] [EtcdWorker] successfully stored whisk/namespace
#added this log locally to denote when the memory queue sees an existing container and adds it to its existing set
[2022-09-14T00:00:12.711Z] [INFO] [#tid_sid_unknown] [MemoryQueue] [redacted:Running] start tracking container ca688d9b746b1da6a992e8c43e03a6df44a56608ec20ce3cccff07dd0940a22d
[2022-09-14T00:11:09.557Z] [INFO] [#tid_sid_unknown] [MemoryQueue] [redacted:Running] The queue is timed out but there are still 0 activation messages or (running: 1 -> Set(ca688d9b746b1da6a992e8c43e03a6df44a56608ec20ce3cccff07dd0940a22d), in-progress: 0 -> Set()) containers
Relevant code:
when(Paused) {
  case Event(job: Initialize, data: WarmData) =>
    implicit val transId = job.transId
    val parent = context.parent
    cancelTimer(IdleTimeoutName)
    cancelTimer(KeepingTimeoutName)
    data.container
      .resume()
      .map { _ =>
        logging.info(this, s"Resumed container ${data.container.containerId}")
        // put existing key again
        dataManagementService ! RegisterData(
          s"${ContainerKeys.existingContainers(
            data.invocationNamespace,
            data.action.fullyQualifiedName(true),
            data.action.rev,
            Some(instance),
            Some(data.container.containerId))}",
          "")
        parent ! Resumed(data)
        // the new queue may locates on an different scheduler, so recreate the activation client when necessary
        // since akka port will no be used, we can put any value except 0 here
        data.clientProxy ! RequestActivation(
          newScheduler = Some(SchedulerEndpoints(job.schedulerHost, job.rpcPort, 10)))
        startSingleTimer(UnusedTimeoutName, StateTimeout, unusedTimeout)
        timedOut = false
      }
      .recover {
        case t: Throwable =>
          logging.error(this, s"Failed to resume container ${data.container.containerId}, error: $t")
          parent ! ResumeFailed(data)
          self ! ResumeFailed(data)
      }

    // always clean data in etcd regardless of success and failure
    dataManagementService ! UnregisterData(
      ContainerKeys.warmedContainers(
        data.invocationNamespace,
        data.action.fullyQualifiedName(false),
        data.revision,
        instance,
        data.container.containerId))
    goto(Running)
You can see here that dataManagementService sends the request to unregister the etcd keys without being chained to the unpause operation, so it can happen before the docker unpause occurs. The .map is always entered even if the docker unpause fails (verifiable through the unpause error metric emitted from LoggingMarkers.INVOKER_DOCKER_CMD). The only thing left for me to really figure out is why docker rm occurs at the same time as the unpause operation. Maybe it's as simple as the unpause failing for whatever reason and triggering the removal, in which case the fix may be as simple as making sure the etcd write doesn't occur if the unpause fails; but I'm not yet convinced the docker rm isn't caused by some other side effect.
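For what it's worth, here is a rough, self-contained sketch of the direction I have in mind (stand-in names, not a tested patch against FunctionPullingContainerProxy): only write the existingContainers key back when the resume actually succeeded, and order the warmed-key cleanup after the unpause attempt instead of firing it concurrently.

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success}
import ExecutionContext.Implicits.global

// Hypothetical stand-ins for the pieces involved; names are mine.
def dockerUnpause(id: String): Future[Unit] =
  Future.failed(new RuntimeException("unpause failed"))
def registerExistingKey(id: String): Unit = println(s"register existingContainers key for $id")
def unregisterWarmedKey(id: String): Unit = println(s"unregister warmed key for $id")
def notifyResumeFailed(id: String): Unit = println(s"send ResumeFailed for $id")

def resumeAndTrack(id: String): Future[Unit] =
  dockerUnpause(id)
    .transform {
      case Success(_) =>
        // Only re-write the etcd key once the container is known to be running again.
        registerExistingKey(id)
        Success(())
      case Failure(t) =>
        // Keep the failure visible so nothing downstream re-registers a dead container.
        notifyResumeFailed(id)
        Failure(t)
    }
    .andThen {
      // Still always clean the warmed key, but after the unpause attempt rather than
      // concurrently with it, so the unwatch can no longer race ahead of the resume.
      case _ => unregisterWarmedKey(id)
    }

Await.ready(resumeAndTrack("ca688d9b746b"), 5.seconds)

Whether this fully addresses the problem depends on what actually triggers the docker rm, which I still need to confirm.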