
SSH issue after running devsec.hardening.ssh_hardening role #854


Open
jobetinfosec opened this issue Mar 5, 2025 · 18 comments

Comments

@jobetinfosec

I ran this role against a freshly installed Ubuntu 24.04 server, and at the end, the following error showed up:

fatal: [domain.tld]: FAILED! => {"changed": false, "msg": "Unable to start service ssh: Job for ssh.service failed because the control process exited with error code.\nSee \"systemctl status ssh.service\" and \"journalctl -xeu ssh.service\" for details.\n"}

Via a dashboard console, I managed to log in as the root user and check the logs:

fatal: chroot ("/run/sshd"): No such file or directory [preauth]

How may I fix this?

@schurzi
Contributor

schurzi commented Mar 6, 2025

Hey @jobetinfosec, we would appreciate it if you used the provided template for reporting issues.

Which version of our collection are you using? Since this is a bug that was fixed in 10.0.0 (more specifically #784), it should not happen anymore.

@jobetinfosec
Author

Hi @schurzi
I'm using devsec.hardening ver. 10.3.0

@schurzi
Contributor

schurzi commented Mar 6, 2025

Interesting. What does the task "Ensure privilege separation directory exists" report in your Ansible output?
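
For context, that task presumably boils down to making sure /run/sshd exists before sshd starts; a minimal equivalent sketch (not the role's verbatim code) would be:

- name: Ensure privilege separation directory exists
  ansible.builtin.file:
    path: /run/sshd      # directory sshd chroots into for privilege separation
    state: directory
    owner: root
    group: root
    mode: "0755"

Note that /run is a tmpfs, so a directory created by hand will not survive a reboot.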

@jobetinfosec
Author

TASK [devsec.hardening.ssh_hardening : Ensure privilege separation directory exists]
ok: [test]

@jobetinfosec
Author

I think I found the culprit...
When I ran the playbook the first time, I had only run an apt update command, and the SSH error came up.
Now I also ran apt upgrade and there are no more SSH errors... for God's sake...

@schurzi
Contributor

schurzi commented Mar 8, 2025

I am glad you solved the issue for your case. I consider failures that lead to an inaccessible server very serious, so I'd like to understand how you arrived at this problem. I tried several ways to replicate this issue with my test servers. I could not reproduce this problem. Can you describe a bit more clearly how I can trigger this problem?

@jobetinfosec
Author

Hi @schurzi
First of all, I ran the devsec scripts against an Ubuntu server running the 24.04 release.
The first time, I ran the scripts without updating anything on the target server, and I got a missing auditd package warning.
The second time, after running apt update, the fatal: chroot ("/run/sshd"): No such file or directory [preauth] error showed up.
The third time, after running apt update and apt upgrade, the scripts ran successfully.

@jobetinfosec
Author

Hi @schurzi

However, when testing it again on another server, this time using an Ansible playbook, a further issue came up...

Mar 06 16:05:46 test systemd[1]: ssh.service: Found left-over process 853 (sshd) in control group while starting unit.>
Mar 06 16:05:46 test systemd[1]: ssh.service: This usually indicates unclean termination of a previous run, or service>
Mar 06 16:05:46 test sshd[15968]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
Mar 06 16:05:46 test sshd[15968]: fatal: Cannot bind any address.
Mar 06 16:05:46 test systemd[1]: ssh.service: Main process exited, code=exited, status=255/EXCEPTION
Subject: Unit process exited
Defined-By: systemd
Support: http://www.ubuntu.com/support

An ExecStart= process belonging to unit ssh.service has exited.

The process' exit code is 'exited' and its exit status is 255.
Mar 06 16:05:46 test systemd[1]: ssh.service: Failed with result 'exit-code'.
Subject: Unit failed
Defined-By: systemd
Support: http://www.ubuntu.com/support

The unit ssh.service has entered the 'failed' state with result 'exit-code'.
Mar 06 16:05:46 test systemd[1]: ssh.service: Unit process 853 (sshd) remains running after unit stopped.
Mar 06 16:05:46 test systemd[1]: Failed to start ssh.service - OpenBSD Secure Shell server.
Subject: A start job for unit ssh.service has failed
Defined-By: systemd
Support: http://www.ubuntu.com/support

A start job for unit ssh.service has finished with a failure.

The job identifier is 2221 and the job result is failed.

The Ansible playbook I used simply updates and upgrades system packages, adds 3 sudo users, and installs a few basic packages (see the sketch after the list):

- certbot
- composer
- curl
- git
- htop
- net-tools
- python3-pip
- screen
- supervisor
- tree
- unzip
- vim     
- whois
- zip
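
A minimal sketch of such a playbook (the hosts pattern and the admin_users and basic_packages variables are hypothetical placeholders, not the exact playbook used):

- name: Base setup
  hosts: all
  become: true
  tasks:
    - name: Update and upgrade system packages
      ansible.builtin.apt:
        update_cache: true
        upgrade: dist

    - name: Add sudo users (actual user details omitted)
      ansible.builtin.user:
        name: "{{ item }}"
        groups: sudo
        append: true
      loop: "{{ admin_users | default([]) }}"

    - name: Install the basic packages listed above
      ansible.builtin.apt:
        name: "{{ basic_packages }}"
        state: present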

Any idea?

@jobetinfosec
Author

jobetinfosec commented Mar 13, 2025

@schurzi
Any news about the above?

BTW, if it can be of any help, this is the Ansible version I'm currently using:
ansible [core 2.18.2]

@jacksonblankenship

jacksonblankenship commented Mar 14, 2025

I am glad you solved the issue for your case. I consider failures that lead to an inaccessible server very serious, so I'd like to understand how you arrived at this problem. I tried several ways to replicate this issue with my test servers. I could not reproduce this problem. Can you describe a bit more clearly how I can trigger this problem?

I'm experiencing a similar issue on a DigitalOcean droplet (512 MB Memory / 10 GB Disk / SFO3 - Ubuntu 24.04 LTS x64) while running as root. My playbook otherwise runs fine, but it fails during SSH hardening.

Root cause update: After further testing, I've found that using these two roles together (geerlingguy.docker and devsec.hardening.ssh_hardening) causes the issue regardless of execution order. The server becomes inaccessible when both roles are used in the same playbook.

The culprit appears to be this line, which "Resets the ssh connection to apply user changes." This reset conflicts with the SSH hardening configuration, effectively locking out access to the server.

To reproduce: Create a minimal playbook that includes both roles (in any order) and the server will become inaccessible after execution.

---
- name: Example
  hosts: example_host
  become: true

  roles:
    - role: geerlingguy.docker
    - role: devsec.hardening.ssh_hardening
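
For reference, the connection reset mentioned above is presumably a task along these lines (inferred from the task name; not the role's verbatim code):

# Sketch: drop the persistent SSH connection so new group memberships
# (e.g. the docker group) take effect for the Ansible user.
- name: Reset ssh connection to apply user changes.
  ansible.builtin.meta: reset_connection

meta: reset_connection closes the persistent SSH control connection, so the next task has to reconnect; combined with the hardening role's config changes and sshd restart, that reconnection appears to be where access is lost.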

@jobetinfosec
Author

@schurzi

I need to solve this issue. Did you manage to replicate this error somehow?

@thomasgl-orange

FYI, we have also encountered issues when integrating ssh_hardening in our playbooks for Ubuntu 24.04, very similar to #854 (comment) (with a left-over process and thus a busy port 22 preventing the new sshd from binding).

It turned out that in our base 24.04 image, openssh was still at version 1:9.6p1-3ubuntu13.5, and there are important fixes to socket activation in a later version, 1:9.6p1-3ubuntu13.6. We now apply an apt dist-upgrade role before running the ssh_hardening role (it currently gives us openssh 1:9.6p1-3ubuntu13.8, which also includes these fixes), and the transition from socket-activated mode to regular service mode no longer fails.
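
In playbook form, that workaround amounts to something like the following (a sketch assuming a plain dist-upgrade is acceptable on the target; the hosts pattern is a placeholder):

- name: Upgrade packages, then harden SSH
  hosts: all
  become: true
  pre_tasks:
    - name: Dist-upgrade so openssh picks up the socket-activation fixes
      ansible.builtin.apt:
        update_cache: true
        upgrade: dist
  roles:
    - role: devsec.hardening.ssh_hardening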

@schurzi
Contributor

schurzi commented Mar 26, 2025

Sorry, I am currently swamped with other tasks and will not get to work on this in the next few weeks.

I believe the comment from @thomasgl-orange might have the solution in it. I'd like to verify this first, and then we could include an update task in our role. I am not sure, however, what else needs to be done besides the update. We will need to test whether we also need to reconnect the Ansible ssh session and reload systemd, and how this should be ordered with our config changes.
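
If that turns out to be the right direction, the role-side change being discussed might look roughly like this (untested sketch; the ordering relative to the existing config tasks is exactly the open question):

# Sketch of the tasks mentioned above: update openssh, reload systemd,
# then reconnect the Ansible ssh session.
- name: Update openssh-server to pick up the socket-activation fixes
  ansible.builtin.apt:
    name: openssh-server
    state: latest
    update_cache: true

- name: Reload systemd so changed unit/socket definitions are picked up
  ansible.builtin.systemd:
    daemon_reload: true

- name: Reconnect the Ansible ssh session
  ansible.builtin.meta: reset_connection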

@AimbotNooby

FYI: We run a daily image-build pipeline with Ubuntu 24 and also get the same error, but only in about 30% of cases; the error often does not occur again on a new run.
I have not yet investigated the cause in more detail.

@jobetinfosec
Author

Hi @schurzi
The solution suggested by @thomasgl-orange (openssh-server 1:9.6p1-3ubuntu13.8) doesn't solve the issue for me.
Since the playbook disables root access, I created a sudo user and checked that I was able to ssh into the host.
After successfully running the ssh playbook, trying to ssh produces a "Too many authentication failures" error.
I even tried commenting out the "Change Debian/Ubuntu systems so ssh starts traditionally instead of socket-activated" task just to check, but the same error shows up: "Too many authentication failures".
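
For anyone following along, the switch that task performs is, as far as I understand it, roughly the following (a sketch based on the task name, not the role's verbatim code):

- name: Stop and disable socket activation for ssh
  ansible.builtin.systemd:
    name: ssh.socket
    enabled: false
    state: stopped

- name: Enable and start ssh as a regular service
  ansible.builtin.systemd:
    name: ssh.service
    enabled: true
    state: started
    daemon_reload: true

The "Too many authentication failures" message, by contrast, usually comes from the client offering more keys than sshd's MaxAuthTries allows, so it may be a separate symptom from the socket-activation problem.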

@ChristianIvicevic

I am experiencing the same issue every other run when initializing a new instance of a VPS. There is definitely some flakiness or race condition leading to this. Checking the logs reveals the same error messages previous posters have shared. Let me know to what extent I can provide you with more detailed info to debug this issue further.

[Screenshot of the error logs attached]

@fliespl

fliespl commented Apr 25, 2025

Same on my end... Killing the sshd / ssh processes via the provider console fixed the issue. Ubuntu 24.04 (Hetzner).

Not sure if it makes a difference, but I am using:


[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=1200

@ChristianIvicevic

FYI, we have also encountered issues when integrating ssh_hardening in our playbooks for Ubuntu 24.04, very similar to #854 (comment) (with a left-over process and thus a busy port 22 preventing the new sshd from binding).

It turned out that in our base 24.04 image, openssh was still at version 1:9.6p1-3ubuntu13.5, and there are important fixes to socket activation in a later version, 1:9.6p1-3ubuntu13.6. We now apply an apt dist-upgrade role before running the ssh_hardening role (it currently gives us openssh 1:9.6p1-3ubuntu13.8, which also includes these fixes), and the transition from socket-activated mode to regular service mode no longer fails.

FWIW, this is something I can anecdotally confirm as well. I can't speak to the exact details, but running apt upgrade before the playbook fixes this issue for me!
