Skip to content

Self hosted runners failing with WRITE ERROR: A session for this runner already exists #3441

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
soumik-avoma opened this issue Aug 26, 2024 · 10 comments
Labels
bug Something isn't working

Comments

@soumik-avoma
Copy link

We are using self-hosted ephimeral runners for our github jobs. Recently we are seeing jobs failing randomly with the error This job failed on the UI. On further digging we found out that it's trying to run the job on a runner which is already in Active state instead of Idle. We also saw the following logs.

ERR Terminal] WRITE ERROR: A session for this runner already exists
It gets stuck in this process for long and then fails. Ex:

Waiting for a runner to pick up this job...
Job is about to start running on the runner: ip-10-0-87-43 (organization)

We even tried upgrading the action agent to v2.319.1 , but the issue still persists.
All of our runners have the same labels hence the job could easily pickup any runner in Idle state, but instead its trying to pickup a runner which is in Active state.

@soumik-avoma soumik-avoma added the bug Something isn't working label Aug 26, 2024
@luketomlinson
Copy link
Contributor

Hi @soumik-avoma, do you have a link to a workflow run where this issue is occurring?

@soumik-avoma
Copy link
Author

Hi @luketomlinson,
Since the workflow are private, hence won't be able to share that. But here are the logs:

2024-08-26 16:05:53.816 | [2024-08-26 10:35:53Z INFO MessageListener] Sleeping for 30 seconds before retrying. |  
-- | -- | --
  |   | 2024-08-26 16:05:53.816 | [2024-08-26 10:35:53Z INFO MessageListener] The session conflict exception haven't reached retry limit. |  
  |   | 2024-08-26 16:05:53.816 | [2024-08-26 10:35:53Z ERR  Terminal] WRITE ERROR: A session for this runner already exists. |  
  |   | 2024-08-26 16:05:53.816 | [2024-08-26 10:35:53Z INFO MessageListener] The session for this runner already exists. |  
  |   | 2024-08-26 16:05:53.816 | at GitHub.Runner.Listener.MessageListener.CreateSessionAsync(CancellationToken token) |  
  |   | 2024-08-26 16:05:53.816 | at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Guid locationId, Object routeValues, ApiResourceVersion version, HttpContent content, IEnumerable`1 queryParameters, Object userState, CancellationToken cancellationToken) |  
  |   | 2024-08-26 16:05:53.816 | at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken) |  
  |   | 2024-08-26 16:05:53.816 | at GitHub.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken) |  
  |   | 2024-08-26 16:05:53.816 | at GitHub.Services.WebApi.VssHttpClientBase.HandleResponseAsync(HttpResponseMessage response, CancellationToken cancellationToken) |  
  |   | 2024-08-26 16:05:53.816 | [2024-08-26 10:35:53Z ERR  MessageListener] GitHub.DistributedTask.WebApi.TaskAgentSessionConflictException: The actions runner ip-10-0-74-151-1724668474 already has an active session. |  
  |   | 2024-08-26 16:05:53.816 | [2024-08-26 10:35:53Z ERR  MessageListener] Catch exception during create session. |  
  |   | 2024-08-26 16:05:53.816 | [2024-08-26 10:35:53Z ERR  GitHubActionsService] POST request to https://pipelinesghubeus2.actions.githubusercontent.com/85uueRDPTlsbvfv9f8kaXsAIyvEp4gmsRFQMwFV501YbBtzszu/_apis/distributedtask/pools/1/sessions failed. HTTP Status: Conflict |  
  |   | 2024-08-26 16:05:53.566 | [2024-08-26 10:35:53Z INFO Terminal] WRITE LINE: |  
  |   | 2024-08-26 16:05:53.566 | [2024-08-26 10:35:53Z INFO Terminal] WRITE LINE: |  
  |   | 2024-08-26 16:05:53.566 | [2024-08-26 10:35:53Z INFO MessageListener] VssConnection created |  
  |   | 2024-08-26 16:05:52.813 | [2024-08-26 10:35:52Z INFO GitHubActionsService] Finished operation Location.GetConnectionData |  
  |   | 2024-08-26 16:05:52.813 | [2024-08-26 10:35:52Z INFO GitHubActionsService] Finished operation Location.GetConnectionData |  
  |   | 2024-08-26 16:05:52.813 | [2024-08-26 10:35:52Z INFO GitHubActionsService] Finished operation Location.GetConnectionData |  
  |   | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO GitHubActionsService] Starting operation Location.GetConnectionData |  
  |   | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO RunnerServer] Establish connection with 60 seconds timeout. |  
  |   | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO RunnerServer] EstablishVssConnection |  
  |   | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO GitHubActionsService] Starting operation Location.GetConnectionData |  
  |   | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO RunnerServer] Establish connection with 60 seconds timeout. |  
  |   | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO RunnerServer] EstablishVssConnection |  
  |   | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO GitHubActionsService] Starting operation Location.GetConnectionData |  
  |   | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO RunnerServer] Establish connection with 100 seconds timeout. |  
  |   | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO RunnerServer] EstablishVssConnection |  
  |   | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO MessageListener] Connecting to the Runner Server... |  
  |   | 2024-08-26 16:05:52.312 | [2024-08-26 10:35:52Z INFO MessageListener] Attempt to create session. |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO Listener] Runner execution has finished with return code 0 |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Finished process 2052 with exit code 0, and elapsed time 00:00:00.0011824. |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished. |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished. |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Process started with process id 2052, waiting for process exit. |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Updated oom_score_adj to 500 for PID: 2052. |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper]   High priority process: 'False' |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper]   Keep redirected STDIN open: 'False' |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper]   Persist current code page: 'False' |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper]   Redirected STDIN: 'False' |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper]   Force kill process on cancellation: 'False' |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper]   Encoding web name:  ; code page: '' |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper]   Require exit code zero: 'True' |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper]   Working directory: '/home/ubuntu/actions-runner' |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper]   Arguments: '755 "/home/ubuntu/actions-runner/svc.sh"' |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper]   File name: '/usr/bin/chmod' |  
  |   | 2024-08-26 16:05:25.232 | [2024-08-26 10:34:56Z INFO ProcessInvokerWrapper] Starting process: |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO UnixUtil] Running /usr/bin/chmod 755 "/home/ubuntu/actions-runner/svc.sh" |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO UnixUtil] Location: '/usr/bin/chmod' |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO UnixUtil] Which2: 'chmod' |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO HostContext] Well known directory 'Root': '/home/ubuntu/actions-runner' |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO HostContext] Well known directory 'Bin': '/home/ubuntu/actions-runner/bin' |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO HostContext] Well known directory 'Bin': '/home/ubuntu/actions-runner/bin' |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO HostContext] Well known directory 'Root': '/home/ubuntu/actions-runner' |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO HostContext] Well known directory 'Bin': '/home/ubuntu/actions-runner/bin' |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO SystemDControlManager] Service name 'actions.runner.ORG_NAME.ip-10-0-74-151-1724668474.service' display name 'GitHub Actions Runner (ORG_NAME.ip-10-0-74-151-1724668474)' will be used for service configuration. |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO Terminal] WRITE LINE: |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO Terminal] WRITE LINE: |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO ConfigurationStore] Settings Saved. |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO ConfigurationStore] Saving runner settings. |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO PromptManager] ReadValue |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO CommandSettings] Flag 'unattended': 'True' |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:56Z INFO CommandSettings] Arg 'work': '' |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:55Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:55Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/ubuntu/actions-runner/.credentials_rsaparams |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:55Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/ubuntu/actions-runner/.credentials_rsaparams |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO GitHubActionsService] Finished operation Location.GetConnectionData |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO GitHubActionsService] Finished operation Location.GetConnectionData |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO GitHubActionsService] Finished operation Location.GetConnectionData |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO GitHubActionsService] Starting operation Location.GetConnectionData |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO RunnerServer] Establish connection with 60 seconds timeout. |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO RunnerServer] EstablishVssConnection |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO GitHubActionsService] Starting operation Location.GetConnectionData |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO RunnerServer] Establish connection with 60 seconds timeout. |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO RunnerServer] EstablishVssConnection |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO GitHubActionsService] Starting operation Location.GetConnectionData |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO RunnerServer] Establish connection with 100 seconds timeout. |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:54Z INFO RunnerServer] EstablishVssConnection |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/ubuntu/actions-runner/.credentials_rsaparams |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CredentialManager] Creating credential type: OAuth |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CredentialManager] Creating type OAuth |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CredentialManager] GetCredentialProvider |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO ConfigurationStore] stored True |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO ConfigurationStore] HasCredentials() |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO ConfigurationStore] Credentials Saved. |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO ConfigurationStore] Saving OAuth credential @ /home/ubuntu/actions-runner/.credentials |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CommandSettings] Flag 'ephemeral': 'True' |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CommandSettings] Flag 'ephemeral': 'True' |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CommandSettings] Flag 'disableupdate': 'False' |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CommandSettings] Flag 'no-default-labels': 'False' |  
  |   | 2024-08-26 16:05:25.231 | [2024-08-26 10:34:53Z INFO CommandSettings] Flag 'disableupdate': 'False'

@luketomlinson
Copy link
Contributor

👋Hi @soumik-avoma
From our logs, I see that runner attempting to create a session twice. Once at 2024-08-26T10:35:13UTC and again 11 seconds later at 2024-08-26T10:35:24UTC. The 2nd one results in the conflict you were seeing. Is it possible in your setup/automation there is something causing ./run.sh to be called multiple times?

@soumik-avoma
Copy link
Author

This is what we use for registering the runner. We do not call run.sh anywhere explicitly.

    payload=$(curl -sX POST -H "Authorization: token $GITHUB_PAT" https://api.github.com/orgs/ORG_NAME/actions/runners/registration-token)
    export RUNNER_TOKEN=$(echo $payload | jq .token --raw-output)
    mkdir -p actions-runner && cd actions-runner
    curl -o actions-runner-linux-$ARCH-2.319.1.tar.gz -L https://github.com/actions/runner/releases/download/v2.319.1/actions-runner-linux-$ARCH-2.319.1.tar.gz
    echo "03d993c65e0c4daa5e3bf5a5a35ba356f363bdb5ceb6b5808fd52fdb813dd8e8  actions-runner-linux-$ARCH-2.319.1.tar.gz" | shasum -a 256 -c
    tar xzf ./actions-runner-linux-$ARCH-2.319.1.tar.gz
    cd /home/ubuntu/actions-runner
    ./config.sh \
        --name $(hostname) \
        --token $RUNNER_TOKEN \
        --url https://github.com/ORG_NAME \
        --unattended \
        --labels $PURPOSE \
        --replace \
        --ephemeral
    sudo ./svc.sh install

@Donnie
Copy link

Donnie commented Oct 2, 2024

Hi @soumik-avoma please try at least doubling the memory of your runner.
It happens, that after an OOM event, the runner restarts. And then it tries to create a new session whereas the previous session was not gracefully ended.

@liriID
Copy link

liriID commented Dec 12, 2024

Hi @soumik-avoma please try at least doubling the memory of your runner. It happens, that after an OOM event, the runner restarts. And then it tries to create a new session whereas the previous session was not gracefully ended.

How can we verify this with logs or other hints?

@fgendorf
Copy link

We have the same situation in our self hosted runners windows

@kaybutter
Copy link

Running into the same issue with macOS runners. Action runners keep randomly disappearing from github and if I check the local logs, I see they failed like this a few days before.

@wozniakpl
Copy link

We had a similar issue. We run the ARC in EKS and deleting the pod wasn't helping because we kept seeing that A session for this runner already exists.

Manually removing the runner (not the runner scale set) helped. After the pod went down, a new one was spawned, a new runner was registered and it started accepting jobs.

@AndreySvinarenko
Copy link

AndreySvinarenko commented Apr 11, 2025

Same issue, subscribed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

8 participants