[BUG] 3.0.0 fails to start on Debian due to bootstrap checks #18273

Open
landon-lengyel opened this issue May 12, 2025 · 27 comments
Labels
bug Something isn't working Other

Comments

@landon-lengyel

Describe the bug

Hi,

I have found (both on my install and on a clean install) that OpenSearch 3.0.0 fails to start on Debian 12 when the network.host: 0.0.0.0 option is set.

The network.host: 0.0.0.0 option appears to cause a "bootstrap check" to run during startup (there is not much OpenSearch documentation about what this is or how to work with it).

With network.host: 0.0.0.0, the only way I can find to actually start the service is to edit /lib/systemd/system/opensearch.service and add seccomp to the line: SystemCallFilter=madvise mincore mlock mlock2 munlock get_mempolicy sched_getaffinity sched_setaffinity fcntl

Thank you for your help.

Related component

Other

To Reproduce

  1. Install OpenSearch on fully updated Debian 12.
  2. Edit opensearch.yml to include the option network.host: 0.0.0.0. This will cause the bootstrap check to run.
  3. Attempt to restart OpenSearch service.

Expected behavior

OpenSearch starts successfully, or at least gives some indication of how to fix the issue. It's unclear to me what the issue even is. Am I not supposed to use 0.0.0.0? Am I expected to add seccomp to the systemd option? Or is this just a bug?

Additional Details

Plugins
Please list all plugins currently enabled.

Screenshots
If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

  • OS: Debian 12
  • Version: 3.0.0

Additional context
Log:

[2025-05-12T15:01:20,036][INFO ][o.o.s.a.i.AuditLogImpl   ] [osc01-test] Closing AuditLogImpl
[2025-05-12T15:01:20,039][INFO ][o.o.n.Node               ] [osc01-test] closed
[2025-05-12T15:01:23,653][WARN ][o.o.b.JNANatives         ] [osc01-test] unable to install syscall filter:
java.lang.UnsupportedOperationException: seccomp(BOGUS_OPERATION): Operation not permitted
        at org.opensearch.bootstrap.SystemCallFilter.linuxImpl(SystemCallFilter.java:311) ~[opensearch-3.0.0.jar:3.0.0]
        at org.opensearch.bootstrap.SystemCallFilter.init(SystemCallFilter.java:666) ~[opensearch-3.0.0.jar:3.0.0]
        at org.opensearch.bootstrap.JNANatives.tryInstallSystemCallFilter(JNANatives.java:281) [opensearch-3.0.0.jar:3.0.0]
        at org.opensearch.bootstrap.Natives.tryInstallSystemCallFilter(Natives.java:128) [opensearch-3.0.0.jar:3.0.0]
        at org.opensearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:130) [opensearch-3.0.0.jar:3.0.0]
        at org.opensearch.bootstrap.Bootstrap.setup(Bootstrap.java:192) [opensearch-3.0.0.jar:3.0.0]
        at org.opensearch.bootstrap.Bootstrap.init(Bootstrap.java:405) [opensearch-3.0.0.jar:3.0.0]
        at org.opensearch.bootstrap.OpenSearch.init(OpenSearch.java:168) [opensearch-3.0.0.jar:3.0.0]
        at org.opensearch.bootstrap.OpenSearch.execute(OpenSearch.java:159) [opensearch-3.0.0.jar:3.0.0]
        at org.opensearch.common.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:110) [opensearch-3.0.0.jar:3.0.0]
        at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138) [opensearch-cli-3.0.0.jar:3.0.0]
        at org.opensearch.cli.Command.main(Command.java:101) [opensearch-cli-3.0.0.jar:3.0.0]
        at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:125) [opensearch-3.0.0.jar:3.0.0]
        at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:91) [opensearch-3.0.0.jar:3.0.0]
[2025-05-12T15:01:23,806][WARN ][stderr                   ] [osc01-test] May 12, 2025 3:01:23 PM org.opensearch.javaagent.bootstrap.AgentPolicy setPolicy
[2025-05-12T15:01:23,807][WARN ][stderr                   ] [osc01-test] INFO: Policy attached successfully: org.opensearch.bootstrap.OpenSearchPolicy@103082dd
[2025-05-12T15:01:23,814][INFO ][o.o.n.Node               ] [osc01-test] version[3.0.0], pid[3019], build[deb/dc4efa821904cc2d7ea7ef61c0f577d3fc0d8be9/2025-05-03T06:23:34.992456558Z], OS[Linux/6.1.0-34-amd64/amd64], JVM[Eclipse Adoptium/OpenJDK 64-Bit Server VM/21.0.7/21.0.7+6-LTS]

...

[2025-05-12T15:01:33,749][INFO ][o.o.s.l.BuiltinLogTypeLoader] [osc01-test] Loaded [ad_ldap_logtype.json] log type
[2025-05-12T15:01:33,839][INFO ][o.o.t.TransportService   ] [osc01-test] publish_address {MYIP:9300}, bound_addresses {[::]:9300}
[2025-05-12T15:01:33,840][INFO ][o.o.t.TransportService   ] [osc01-test] Remote clusters initialized successfully.
[2025-05-12T15:01:33,999][INFO ][o.o.b.BootstrapChecks    ] [osc01-test] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2025-05-12T15:01:34,002][ERROR][o.o.b.Bootstrap          ] [osc01-test] node validation exception
[2] bootstrap checks failed
[1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[2]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_cluster_manager_nodes / cluster.initial_master_nodes] must be configured
[2025-05-12T15:01:34,003][INFO ][o.o.n.Node               ] [osc01-test] stopping ...
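
In a long log like this one, the numbered failure lines are the part that matters. A minimal sketch for pulling them out, assuming the log lives at /var/log/opensearch/opensearch.log (the path used in later comments) and that each failure line starts with "[N]: " as it does here; failed_bootstrap_checks is a hypothetical helper name:

```shell
# Print only the numbered bootstrap-check failure lines from an OpenSearch log.
# The default path is an assumption; pass another file as the first argument.
failed_bootstrap_checks() {
  log_file="${1:-/var/log/opensearch/opensearch.log}"
  # Failure lines look like "[1]: message"; timestamped lines such as
  # "[2025-05-12T15:01:34,002]..." do not match because of the "]: " requirement.
  grep -E '^\[[0-9]+\]: ' "$log_file"
}
```

Calling failed_bootstrap_checks with no argument reads the default log; the function exits non-zero when no failure lines are found.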
@landon-lengyel landon-lengyel added bug Something isn't working untriaged labels May 12, 2025
@github-actions github-actions bot added the Other label May 12, 2025
@landon-lengyel
Author

I'm guessing that this is likely related to these two issues:
17181
18083

@cwperks
Member

cwperks commented May 12, 2025

FYI @RajatGupta02 @kumargu

Does opensearch.service need to be updated?

@kumargu
Contributor

kumargu commented May 13, 2025

@landon-lengyel could you also help us with the version of systemd installed on your Debian 12 environment?

@mrluanma

@kumargu systemd in Debian 12 is version v252.

I'm running into this on Ubuntu 22.04 as well:

ii  systemd        249.11-0ubuntu3.15 amd64        system and service manager

@RajatGupta02
Contributor

I think the issue is this.
@mrluanma @landon-lengyel Could you please try this and confirm if it runs for you?

@mrluanma

mrluanma commented May 13, 2025

@RajatGupta02 That can't be my issue because I'm trying to upgrade a cluster from 2.x to 3.x.

root@byte-api-7:~# grep -C 3 'system call filters failed to install' /var/log/opensearch/opensearch.log
[2025-05-13T15:12:57,159][INFO ][o.o.b.BootstrapChecks    ] [byte-api-7] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2025-05-13T15:12:57,164][ERROR][o.o.b.Bootstrap          ] [byte-api-7] node validation exception
[1] bootstrap checks failed
[1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[2025-05-13T15:12:57,165][INFO ][o.o.n.Node               ] [byte-api-7] stopping ...
[2025-05-13T15:12:57,167][INFO ][o.o.s.a.r.AuditMessageRouter] [byte-api-7] Closing AuditMessageRouter
[2025-05-13T15:12:57,167][INFO ][o.o.s.a.s.SinkProvider   ] [byte-api-7] Closing InternalOpenSearchSink

Two workarounds worked for me:

  1. disable bootstrap.system_call_filter
root@im25xacxf0n8as:~# grep bootstrap.system_call_filter /etc/opensearch/opensearch.yml
bootstrap.system_call_filter: false
  2. add seccomp to SystemCallFilter as @landon-lengyel detailed in the OP.
root@im25xacxf0n8as:~# grep SystemCallFilter /etc/systemd/system/opensearch.service.d/override.conf
SystemCallFilter=seccomp
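
The second workaround can be sketched as a small helper that writes the drop-in. This assumes the drop-in directory shown in the grep output above and adds the [Service] section header that systemd expects around SystemCallFilter=; write_seccomp_override is a hypothetical name, not something shipped with the package:

```shell
# Write a systemd drop-in that additionally allows the seccomp syscall.
# The default directory is the one from the grep output above; pass a
# different directory as the first argument to try it out safely.
write_seccomp_override() {
  unit_dir="${1:-/etc/systemd/system/opensearch.service.d}"
  mkdir -p "$unit_dir"
  # SystemCallFilter= must sit under [Service] (assumed here; the grep
  # output above only shows the setting itself).
  cat > "$unit_dir/override.conf" <<'EOF'
[Service]
SystemCallFilter=seccomp
EOF
}
```

After writing the file as root, run systemctl daemon-reload and restart the service for the drop-in to take effect. systemd merges repeated SystemCallFilter= allow-list entries, so this should add seccomp to the packaged list rather than replace it.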

@kumargu
Contributor

kumargu commented May 13, 2025

@RajatGupta02 That can't be my issue because I'm trying to upgrade a cluster from 2.x to 3.x.

The new systemd configuration that blocks syscalls was introduced in 3.0.

@kumargu
Contributor

kumargu commented May 13, 2025

Did the workaround mentioned before work for you? We would recommend trying that workaround rather than bypassing the systemd rules.

@landon-lengyel
Author

I think the issue is this. @mrluanma @landon-lengyel Could you please try this and confirm if it runs for you?

For the test server that I set up just for this, setting discovery.type: single-node did solve it. I missed that. The error message could be greatly improved for that situation.
But like @mrluanma, my main cluster has multiple nodes, and all nodes are still having the issue.
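
A quick way to tell whether the discovery check is the one biting you is to look for the settings named in the error message, or for discovery.type: single-node. This is only a rough text-level sketch: it assumes flat, top-level keys in /etc/opensearch/opensearch.yml (not nested YAML) and does not validate the values; has_discovery_settings is a hypothetical helper name:

```shell
# Succeed if the config file sets one of the discovery options named in the
# bootstrap error, or discovery.type: single-node. Commented-out lines are
# ignored because the pattern is anchored at the start of the line.
has_discovery_settings() {
  config_file="${1:-/etc/opensearch/opensearch.yml}"
  grep -Eq \
    -e '^(discovery\.seed_hosts|discovery\.seed_providers|cluster\.initial_cluster_manager_nodes|cluster\.initial_master_nodes):' \
    -e '^discovery\.type:[[:space:]]*single-node' \
    "$config_file"
}
```

If this succeeds and the node still refuses to start, the remaining failure is most likely the system call filter check rather than discovery.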

@kumargu
Contributor

kumargu commented May 13, 2025

@landon-lengyel we will evaluate the implications of adding seccomp to the SystemCallFilter rule. That's probably the right fix here, unless we find a better alternative. We will come back on this tomorrow.

@kumargu
Contributor

kumargu commented May 14, 2025

We are not able to repro the issue with latest DEBIAN Ubuntu X86 running systemd version 255.

  • Is it possible for you to upgrade the systemd version in your environment?

@landon-lengyel
Author

Which version of Debian did you use that had systemd 255?

Here is a new config I created to test with a cluster. It is experiencing the same issue:

cluster.name: osc01-test
node.name: osc01-test01
node.attr.type: virtual
#path.data:
#path.logs:
network.host: 0.0.0.0
network.publish_host: osc01-test01.example.net
discovery.seed_hosts: [ "osc01-test01.example.net","osc01-test02.example.net" ]
cluster.initial_cluster_manager_nodes: ["osc01-test01.example.net"]

transport.ssl.enforce_hostname_verification: true
plugins.security.ssl.transport.pemcert_filepath: /etc/opensearch/osc01-test01.pem
plugins.security.ssl.transport.pemkey_filepath: /etc/opensearch/osc01-test01.key
plugins.security.ssl.transport.pemtrustedcas_filepath: /etc/opensearch/combined-ca.pem
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: /etc/opensearch/osc01-test01.pem
plugins.security.ssl.http.pemkey_filepath: /etc/opensearch/osc01-test01.key
plugins.security.ssl.http.pemtrustedcas_filepath: /etc/opensearch/combined-ca.pem
plugins.security.allow_default_init_securityindex: true
plugins.security.authcz.admin_dn:
  - 'CN=osc01n01-admin,OU=IT,O=EXAMPLE,L=SLC,ST=UTAH,C=US'
plugins.security.nodes_dn:
  - 'CN=osc01-test01.example.net'
  - 'CN=osc01-test02.example.net'
plugins.security.audit.type: internal_opensearch_data_stream
plugins.security.audit.config.data_stream.template.name: opensearch-security-auditlog
plugins.security.enable_snapshot_restore_privilege: true
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]

plugins.security.ssl.http.clientauth_mode: OPTIONAL

@mrluanma

We are not able to repro the issue with latest DEBIAN Ubuntu X86 running systemd version 255.

I can repro on Ubuntu 24.10 with systemd 256.

root@lima-default:~# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.10
Release:	24.10
Codename:	oracular

root@lima-default:~# dpkg -l systemd
ii  systemd        256.5-2ubuntu3.1 amd64        system and service manager

root@lima-default:~# wget https://artifacts.opensearch.org/releases/bundle/opensearch/3.0.0/opensearch-3.0.0-linux-x64.deb
root@lima-default:~# env OPENSEARCH_INITIAL_ADMIN_PASSWORD=pwI59DRyZFYj3Ygn dpkg -i opensearch-3.0.0-linux-x64.deb
root@lima-default:~# sed -i 's/#network.host: 192.168.0.1/network.host: 0.0.0.0/' /etc/opensearch/opensearch.yml

root@lima-default:~# systemctl start opensearch
Job for opensearch.service failed because the control process exited with error code.
See "systemctl status opensearch.service" and "journalctl -xeu opensearch.service" for details.

root@lima-default:~# grep -C 3 'system call filters failed to install' /var/log/opensearch/opensearch.log
[2025-05-15T11:15:49,592][INFO ][o.o.b.BootstrapChecks    ] [lima-default] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2025-05-15T11:15:49,598][ERROR][o.o.b.Bootstrap          ] [lima-default] node validation exception
[2] bootstrap checks failed
[1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[2]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_cluster_manager_nodes / cluster.initial_master_nodes] must be configured
[2025-05-15T11:15:49,602][INFO ][o.o.n.Node               ] [lima-default] stopping ...
[2025-05-15T11:15:49,603][INFO ][o.o.s.a.r.AuditMessageRouter] [lima-default] Closing AuditMessageRouter

@mrluanma

New repro with systemd 255 on Ubuntu 24.04:

root@lima-ubuntu-24-04:~# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.2 LTS
Release:	24.04
Codename:	noble

root@lima-ubuntu-24-04:~# dpkg -l systemd
ii  systemd        255.4-1ubuntu8.6 arm64        system and service manager

root@lima-ubuntu-24-04:~# wget https://artifacts.opensearch.org/releases/bundle/opensearch/3.0.0/opensearch-3.0.0-linux-arm64.deb
root@lima-ubuntu-24-04:~# env OPENSEARCH_INITIAL_ADMIN_PASSWORD=pwI59DRyZFYj3Ygn dpkg -i opensearch-3.0.0-linux-arm64.deb
root@lima-ubuntu-24-04:~# sed -i 's/#network.host: 192.168.0.1/network.host: 0.0.0.0/' /etc/opensearch/opensearch.yml
root@lima-ubuntu-24-04:~# sed -i 's/#cluster.initial_cluster_manager_nodes:/cluster.initial_cluster_manager_nodes:/' /etc/opensearch/opensearch.yml

root@lima-ubuntu-24-04:~# systemctl start opensearch
Job for opensearch.service failed because the control process exited with error code.
See "systemctl status opensearch.service" and "journalctl -xeu opensearch.service" for details.

root@lima-ubuntu-24-04:~# grep -C 3 'system call filters failed to install' /var/log/opensearch/opensearch.log
[2025-05-15T12:03:12,319][INFO ][o.o.b.BootstrapChecks    ] [lima-ubuntu-24-04] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2025-05-15T12:03:12,321][ERROR][o.o.b.Bootstrap          ] [lima-ubuntu-24-04] node validation exception
[1] bootstrap checks failed
[1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[2025-05-15T12:03:12,322][INFO ][o.o.s.a.r.AuditMessageRouter] [lima-ubuntu-24-04] Closing AuditMessageRouter
[2025-05-15T12:03:12,322][INFO ][o.o.n.Node               ] [lima-ubuntu-24-04] stopping ...
[2025-05-15T12:03:12,323][INFO ][o.o.s.a.s.SinkProvider   ] [lima-ubuntu-24-04] Closing InternalOpenSearchSink

@RajatGupta02
Contributor

We were able to reproduce it, and we are identifying a fix.
thanks

@inntran

inntran commented May 15, 2025

I'm able to reproduce the same issue on AlmaLinux 9.5 with systemd 252, using the RPM package of OpenSearch 3.0.0.

systemd 252 (252-46.el9_5.3.alma.1)
+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified

By adding seccomp to the list of SystemCallFilter, I can start the service.

@kumargu
Contributor

kumargu commented May 15, 2025

@RajatGupta02 let's publish the change to include seccomp in SystemCallFilter. It's a non-destructive change in the presence of CAP_SYS_ADMIN and CAP_SYS_NET.

@inntran

inntran commented May 15, 2025

As of systemd 252, see output of systemd-analyze syscall-filter
madvise, get_mempolicy are included in @system-service directly.
mlock, mlock2, munlock are included in @memlock which is included in @system-service.
sched_getaffinity is included in @default, which has syscalls that are always permitted, and included in @system-service
sched_setaffinity is included in @resources, which is included in @system-service.
fcntl is included in @file-system, which is included in @system-service.

So only seccomp and mincore are required in addition to @system-service.

In this comment, a set name starts with the "@" character, followed by the name of the set. See: https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#System%20Call%20Filtering
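
To repeat this check on another systemd version, the output of systemd-analyze syscall-filter can be piped through a small filter. This sketch assumes the output lists one indented syscall or set name per line, which is how it looks on the systems I checked; syscall_listed is a hypothetical helper name:

```shell
# Read `systemd-analyze syscall-filter <set>` output on stdin and succeed if
# the given name is listed directly in that set. Members of nested sets
# (lines starting with "@") are not expanded here.
syscall_listed() {
  # Match a line consisting only of optional indentation plus the name.
  grep -Eq "^[[:space:]]*${1}\$"
}
```

For example, systemd-analyze syscall-filter @system-service | syscall_listed mincore exits non-zero when mincore is not listed directly in @system-service.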

@kumargu
Contributor

kumargu commented May 15, 2025

As of systemd 252, see output of systemd-analyze syscall-filter
madvise, get_mempolicy are included in @system-service directly.
mlock, mlock2, munlock are included in @memlock which is included in @system-service.
sched_getaffinity is included in @default, which has syscalls that are always permitted, and included in @system-service
sched_setaffinity is included in @resources, which is included in @system-service.
fcntl is included in @file-system, which is included in @system-service.

@inntran we want to be explicit because we don't know what version of systemd would be installed on client machines.

@inntran

inntran commented May 15, 2025

May I suggest we separate seccomp and mincore from the others, since the others are very likely already included in @system-service, e.g. by using two SystemCallFilter lines.

@kumargu
Contributor

kumargu commented May 15, 2025

yeah sure. makes sense.

@kumargu
Contributor

kumargu commented May 15, 2025

#18309 is merged, let us know if the issue is fixed.

@landon-lengyel
Author

My test and production environment seem to start fine with that new service file. Thank you for tackling this one so quickly! I hope to see the repos updated with it soon.
@kumargu

@kumargu
Contributor

kumargu commented May 17, 2025

I will circle back on this GH issue once I know whether there's a plan to release a patch version for 3.0.

@cwperks
Member

cwperks commented May 18, 2025

Not sure if it's related, but I saw a forum post on Debian startup issues: https://forum.opensearch.org/t/problem-installing-opensearch-on-debian-in-lxc-unprivileged-container/24344?u=cwperks

@RajatGupta02
Contributor

Not sure if it's related, but I saw a forum post on Debian startup issues: https://forum.opensearch.org/t/problem-installing-opensearch-on-debian-in-lxc-unprivileged-container/24344?u=cwperks

This seems to be due to a custom opensearch-shm path being configured by the user; we only have a /dev/shm path configured. So it's not related to this issue.

@krisfreedain
Member

Catch All Triage - 1 2
