
SSH issue after running devsec.hardening.ssh_hardening role #854


Open
jobetinfosec opened this issue Mar 5, 2025 · 18 comments

Comments

@jobetinfosec

I ran this role against a freshly installed Ubuntu 24.04 server, and at the end, the following error showed up:

fatal: [domain.tld]: FAILED! => {"changed": false, "msg": "Unable to start service ssh: Job for ssh.service failed because the control process exited with error code.\nSee \"systemctl status ssh.service\" and \"journalctl -xeu ssh.service\" for details.\n"}

Via a dashboard console, I managed to log in as the root user and check the logs:

fatal: chroot ("/run/sshd"): No such file or directory [preauth]

How may I fix this?

@schurzi
Contributor

schurzi commented Mar 6, 2025

Hey @jobetinfosec, we would appreciate it if you used the provided template for reporting issues.

Which version of our collection are you using? Since this is a bug that was fixed in 10.0.0 (more specifically #784), it should not happen anymore.

@jobetinfosec
Author

Hi @schurzi
I'm using devsec.hardening ver. 10.3.0

@schurzi
Contributor

schurzi commented Mar 6, 2025

Interesting. What does the task "Ensure privilege separation directory exists" report in your Ansible output?
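
For context, that task presumably boils down to making sure /run/sshd exists before sshd starts; a minimal equivalent sketch (not the role's verbatim code) would be:

- name: Ensure privilege separation directory exists
  ansible.builtin.file:
    path: /run/sshd      # directory sshd chroots into for privilege separation
    state: directory
    owner: root
    group: root
    mode: "0755"

Note that /run is a tmpfs, so a directory created by hand will not survive a reboot.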

@jobetinfosec
Author

TASK [devsec.hardening.ssh_hardening : Ensure privilege separation directory exists]
ok: [test]

@jobetinfosec
Author

I think I found the culprit...
When I ran the playbook the first time, I had only run an apt update command, and the SSH error came up.
Now I also ran apt upgrade and there are no more SSH errors... for God's sake...

@schurzi
Contributor

schurzi commented Mar 8, 2025

I am glad you solved the issue for your case. I consider failures that lead to an inaccessible server very serious, so I'd like to understand how you arrived at this problem. I tried several ways to replicate this issue with my test servers. I could not reproduce this problem. Can you describe a bit more clearly how I can trigger this problem?

@jobetinfosec
Author

Hi @schurzi
First of all, I ran the devsec scripts against an Ubuntu server running the 24.04 release.
The first time, I ran the scripts without updating anything on the target server, and I got a missing auditd package warning.
The second time, after running apt update, the fatal: chroot ("/run/sshd"): No such file or directory [preauth] error showed up.
The third time, after running apt update and apt upgrade, the scripts ran successfully.

@jobetinfosec
Author

Hi @schurzi

However, when testing it again on another server, this time using an Ansible playbook, a further issue came up...

Mar 06 16:05:46 test systemd[1]: ssh.service: Found left-over process 853 (sshd) in control group while starting unit.>
Mar 06 16:05:46 test systemd[1]: ssh.service: This usually indicates unclean termination of a previous run, or service>
Mar 06 16:05:46 test sshd[15968]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
Mar 06 16:05:46 test sshd[15968]: fatal: Cannot bind any address.
Mar 06 16:05:46 test systemd[1]: ssh.service: Main process exited, code=exited, status=255/EXCEPTION
Subject: Unit process exited
Defined-By: systemd
Support: http://www.ubuntu.com/support

An ExecStart= process belonging to unit ssh.service has exited.

The process' exit code is 'exited' and its exit status is 255.
Mar 06 16:05:46 test systemd[1]: ssh.service: Failed with result 'exit-code'.
Subject: Unit failed
Defined-By: systemd
Support: http://www.ubuntu.com/support

The unit ssh.service has entered the 'failed' state with result 'exit-code'.
Mar 06 16:05:46 test systemd[1]: ssh.service: Unit process 853 (sshd) remains running after unit stopped.
Mar 06 16:05:46 test systemd[1]: Failed to start ssh.service - OpenBSD Secure Shell server.
Subject: A start job for unit ssh.service has failed
Defined-By: systemd
Support: http://www.ubuntu.com/support

A start job for unit ssh.service has finished with a failure.

The job identifier is 2221 and the job result is failed.

The Ansible playbook I used simply updates and upgrades system packages, adds 3 sudo users, and installs a few basic packages (see the sketch after the list):

- certbot
- composer
- curl
- git
- htop
- net-tools
- python3-pip
- screen
- supervisor
- tree
- unzip
- vim     
- whois
- zip
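
A minimal sketch of such a playbook (the hosts pattern and the admin_users and basic_packages variables are hypothetical placeholders, not the exact playbook used):

- name: Base setup
  hosts: all
  become: true
  tasks:
    - name: Update and upgrade system packages
      ansible.builtin.apt:
        update_cache: true
        upgrade: dist

    - name: Add sudo users (actual user details omitted)
      ansible.builtin.user:
        name: "{{ item }}"
        groups: sudo
        append: true
      loop: "{{ admin_users | default([]) }}"

    - name: Install the basic packages listed above
      ansible.builtin.apt:
        name: "{{ basic_packages }}"
        state: present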

Any idea?

@jobetinfosec
Author

jobetinfosec commented Mar 13, 2025

@schurzi
Any news about the above?

BTW, if it can be of any help, this is the Ansible version I'm currently using:
ansible [core 2.18.2]

@jacksonblankenship

jacksonblankenship commented Mar 14, 2025

I am glad you solved the issue for your case. I consider failures that lead to an inaccessible server very serious, so I'd like to understand how you arrived at this problem. I tried several ways to replicate this issue with my test servers. I could not reproduce this problem. Can you describe a bit more clearly how I can trigger this problem?

I'm experiencing a similar issue on a DigitalOcean droplet (512 MB Memory / 10 GB Disk / SFO3 - Ubuntu 24.04 LTS x64) while running as root. My playbook otherwise runs fine, but it fails during SSH hardening.

Root cause update: After further testing, I've found that using these two roles together (geerlingguy.docker and devsec.hardening.ssh_hardening) causes the issue regardless of execution order. The server becomes inaccessible when both roles are used in the same playbook.

The culprit appears to be this line, which "Resets the ssh connection to apply user changes." This reset conflicts with the SSH hardening configuration, effectively locking out access to the server.

To reproduce: Create a minimal playbook that includes both roles (in any order) and the server will become inaccessible after execution.

---
- name: Example
  hosts: example_host
  become: true

  roles:
    - role: geerlingguy.docker
    - role: devsec.hardening.ssh_hardening
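
For reference, the connection reset mentioned above is presumably a task along these lines (inferred from the task name; not the role's verbatim code):

# Sketch: drop the persistent SSH connection so new group memberships
# (e.g. the docker group) take effect for the Ansible user.
- name: Reset ssh connection to apply user changes.
  ansible.builtin.meta: reset_connection

meta: reset_connection closes the persistent SSH control connection, so the next task has to reconnect; combined with the hardening role's config changes and sshd restart, that reconnection appears to be where access is lost.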

@jobetinfosec
Author

@schurzi

I need to solve this issue. Did you manage to replicate this error somehow?

@thomasgl-orange

FYI, we have also encountered issues when integrating ssh_hardening in our playbooks for Ubuntu 24.04, very similar to #854 (comment) (with a left-over process and thus a busy port 22 preventing the new sshd from binding).

It turned out that in our base 24.04 image, openssh was still at version 1:9.6p1-3ubuntu13.5, and there are important fixes to socket activation in a later version, 1:9.6p1-3ubuntu13.6. We now apply an apt dist-upgrade role before running the ssh_hardening role (it currently gives us openssh 1:9.6p1-3ubuntu13.8, which also includes these fixes), and the transition from socket-activated mode to regular service mode no longer fails.
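
In playbook form, that workaround amounts to something like the following (a sketch assuming a plain dist-upgrade is acceptable on the target; the hosts pattern is a placeholder):

- name: Upgrade packages, then harden SSH
  hosts: all
  become: true
  pre_tasks:
    - name: Dist-upgrade so openssh picks up the socket-activation fixes
      ansible.builtin.apt:
        update_cache: true
        upgrade: dist
  roles:
    - role: devsec.hardening.ssh_hardening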

@schurzi
Contributor

schurzi commented Mar 26, 2025

Sorry, I am currently swamped with other tasks and will not get to work on this in the next few weeks.

I believe the comment from @thomasgl-orange might have the solution in it. I'd like to verify this first, and then we could include an update task in our role. I am not sure, however, what else needs to be done besides the update. We will need to test whether we also need to reconnect the Ansible ssh session and reload systemd, and how this should be ordered with our config changes.
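
If that turns out to be the right direction, the role-side change being discussed might look roughly like this (untested sketch; the ordering relative to the existing config tasks is exactly the open question):

# Sketch of the tasks mentioned above: update openssh, reload systemd,
# then reconnect the Ansible ssh session.
- name: Update openssh-server to pick up the socket-activation fixes
  ansible.builtin.apt:
    name: openssh-server
    state: latest
    update_cache: true

- name: Reload systemd so changed unit/socket definitions are picked up
  ansible.builtin.systemd:
    daemon_reload: true

- name: Reconnect the Ansible ssh session
  ansible.builtin.meta: reset_connection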

@AimbotNooby

FYI: We run a daily image-build pipeline with Ubuntu 24 and also get the same error, but only in about 30% of cases; the error often does not occur again on a new run.
I have not yet investigated the cause in more detail.

@jobetinfosec
Author

Hi @schurzi
The solution suggested by @thomasgl-orange (openssh-server 1:9.6p1-3ubuntu13.8) doesn't solve the issue for me.
Since the playbook disables root access, I created a sudo user and checked that I was able to ssh into the host.
After successfully running the ssh playbook, trying to ssh produces a "Too many authentication failures" error.
I even tried commenting out the "Change Debian/Ubuntu systems so ssh starts traditionally instead of socket-activated" task just to check, but the same error shows up: "Too many authentication failures".
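
For anyone following along, the switch that task performs is, as far as I understand it, roughly the following (a sketch based on the task name, not the role's verbatim code):

- name: Stop and disable socket activation for ssh
  ansible.builtin.systemd:
    name: ssh.socket
    enabled: false
    state: stopped

- name: Enable and start ssh as a regular service
  ansible.builtin.systemd:
    name: ssh.service
    enabled: true
    state: started
    daemon_reload: true

The "Too many authentication failures" message, by contrast, usually comes from the client offering more keys than sshd's MaxAuthTries allows, so it may be a separate symptom from the socket-activation problem.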

@ChristianIvicevic

I am experiencing the same issue every other run when initializing a new instance of a VPS. There is definitely some flakiness or race condition leading to this. Checking the logs reveals the same error messages previous posters have shared. Let me know to what extent I can provide you with more detailed info to debug this issue further.

[Screenshot of the error logs attached]

@fliespl

fliespl commented Apr 25, 2025

Same on my end... Killing the sshd / ssh processes via the provider console fixed the issue. Ubuntu 24.04 (Hetzner).

Not sure if it makes a difference, but I am using:


[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=1200

@ChristianIvicevic

FYI, we have also encountered issues when integrating ssh_hardening in our playbooks for Ubuntu 24.04, very similar to #854 (comment) (with a left-over process and thus a busy port 22 preventing the new sshd from binding).

It turned out that in our base 24.04 image, openssh was still at version 1:9.6p1-3ubuntu13.5, and there are important fixes to socket activation in a later version, 1:9.6p1-3ubuntu13.6. We now apply an apt dist-upgrade role before running the ssh_hardening role (it currently gives us openssh 1:9.6p1-3ubuntu13.8, which also includes these fixes), and the transition from socket-activated mode to regular service mode no longer fails.

FWIW, this is something I can anecdotally confirm as well. I can't speak to the exact details, but running apt upgrade before the playbook fixes this issue for me!
