
salt.states.service fails to recognize init.d/sysv services on systemd systems #11900


Closed
uvsmtid opened this issue Apr 10, 2014 · 13 comments
Labels
Bug (broken, incorrect, or confusing behavior), Execution-Module, P3 (Priority 3), Platform (relates to OS, containers, platform-based utilities like FS, system-based apps), severity-medium (3rd level, incorrect or bad functionality, confusing and lacks a work around), stale

Comments

@uvsmtid
Contributor

uvsmtid commented Apr 10, 2014

Problem/Example

This is a simple Salt state to enable and start jenkins service:

# jenkins.sls
activate_jenkins_service:
    service.running:
        - name: jenkins
        - enable: True

Official Jenkins installation on RedHat/CentOS/Fedora uses init.d/sysv scripts.

Manually enabling and starting the service through init.d/sysv scripts works perfectly, even on systemd-based Fedora 20:

systemctl enable jenkins                                                                                                                                                                             
jenkins.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig jenkins on

systemctl start jenkins

On the other hand, Salt fails to execute the state:

salt-call -l all state.sls jenkins
...
          ID: activate_jenkins_service
    Function: service.running
        Name: jenkins
      Result: False
     Comment: The named service jenkins is not available
     Changes:   
...

Cause

The problem stems from the fact that Salt executes the systemctl list-unit-files command, which lists only systemd unit files and excludes init.d/sysv scripts:

...
[INFO    ] Executing state service.running for jenkins
[INFO    ] Executing command 'systemctl --full list-unit-files | col -b' in directory '/root'
...

Because Salt doesn't see the required jenkins service in the list of unit files, it never passes the subsequent enable/start/... commands to systemctl, and it doesn't let systemctl report "authoritatively" on the actual existence of the service.
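The failing pre-validation can be illustrated with a small sketch (an illustration of the behavior only, not Salt's actual code; the sample output and function names are hypothetical):

```python
def parse_unit_files(systemctl_output):
    """Extract service names from `systemctl --full list-unit-files` output."""
    units = set()
    for line in systemctl_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].endswith('.service'):
            units.add(parts[0][:-len('.service')])
    return units

def service_available(name, systemctl_output):
    """Mimic the failing check: a service 'exists' only if systemd lists
    a unit file for it. Sysv init scripts never appear in this list."""
    return name in parse_unit_files(systemctl_output)

# Only native units show up in list-unit-files; the sysv-managed jenkins
# service is absent even though `systemctl start jenkins` works fine.
sample = """UNIT FILE                STATE
sshd.service             enabled
crond.service            enabled"""

print(service_available('sshd', sample))     # True
print(service_available('jenkins', sample))  # False
```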

Proposal

This issue is very closely related to issue #8444 (as far as the proposed solution is concerned) and is described in this comment.

Rather than executing any pre-validation logic (i.e. looking the service name up somewhere), Salt should rely on systemd (and its systemctl command) to determine whether a state to enable/start/... the service failed or succeeded. In other words, Salt should optimistically execute systemctl with the given service name and report the result of that execution instead of trying to predict its outcome.
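The proposed "optimistic" approach could be sketched like this (a sketch only, not Salt's API; the function names and the injectable runner are assumptions made for illustration):

```python
import subprocess

def _real_run(argv):
    """Execute a command and return (exit_code, stderr)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stderr

def apply_service_state(action, service, run_cmd=_real_run):
    """Optimistically run `systemctl <action> <service>` and report the
    outcome based solely on the exit code, instead of pre-validating the
    name against list-unit-files. Returns (succeeded, message)."""
    code, err = run_cmd(['systemctl', action, service])
    if code == 0:
        return True, '%s %s: OK' % (action, service)
    # systemd itself tells us authoritatively that the service failed.
    return False, 'systemctl %s %s failed (rc=%d): %s' % (
        action, service, code, err.strip())

# Fake runner standing in for a host where jenkins is a sysv script:
# systemd redirects enable to chkconfig, so the state succeeds.
def fake_runner(argv):
    if argv[2] == 'jenkins':
        return 0, ''
    return 5, 'Failed to start nosuch.service: Unit nosuch.service not found.'

print(apply_service_state('enable', 'jenkins', fake_runner))
print(apply_service_state('start', 'nosuch', fake_runner)[0])  # False
```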

Workaround

Again, see it in issue #8444.

Versions

Master and minion are the same host with Fedora 20 x86_64:

 salt --versions-report
           Salt: 2014.1.1
         Python: 2.7.5 (default, Feb 19 2014, 13:47:28)
         Jinja2: 2.7.1
       M2Crypto: 0.21.1
 msgpack-python: 0.1.13
   msgpack-pure: Not Installed
       pycrypto: 2.6.1
         PyYAML: 3.10
          PyZMQ: 13.0.2
            ZMQ: 3.2.4
@cachedout
Contributor

I agree with what you're saying here. Let's try and get this in.

@cachedout
Contributor

@mtorromeo says this should be fixed by #11921. @uvsmtid can you verify?

@uvsmtid
Contributor Author

uvsmtid commented Apr 16, 2014

@cachedout and @mtorromeo Thanks for the updates!

I cherry-picked both 90bece1 and 9617d33 on top of 2014.1 (latest develop had some unrelated issues) in my virtualenv.

#8444 looks fixed

I used a state similar to the one mentioned there in its example:

activate_vpn_service:
    service.running:
        - name: [email protected]
        - enable: True

Indeed, commit 9617d33 handles @ in systemd unit names to make it work.
And while it still uses the systemctl --full list-units command (see the problems for init.d/sysv services next), parameterized services were listed in all my attempts.

#11900 (this issue) still has problems

See the example of the Jenkins service state at the beginning of this issue.
After trying variations of enable/disable and start/stop, I can conclude that it doesn't work in the general case. And here is why...

The code after commit 90bece1 still uses the command systemctl --full list-units, which simply does not list init.d/sysv services until they are started on the system (only while they are running; enable/disable won't affect anything).
For example, start jenkins service manually and try to list it:

sudo systemctl start jenkins
systemctl --full list-units | grep jenkins
jenkins.service
# OK

Then stop jenkins service manually and execute:

sudo systemctl stop jenkins
systemctl --full list-units | grep jenkins
# ERROR: no output captured by grep

Although this seems more like an issue with systemd (I have even updated it here), the fastest fix is still possible through Salt alone. The argument is that systemctl --full list-units is not required to manage a service.
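Since list-units cannot be trusted for stopped sysv services, another way to detect them is to scan the init script directory directly. A minimal sketch of that idea (not Salt's implementation; the function names and the assumption of a conventional /etc/init.d layout are mine):

```python
import os

def sysv_services(initd_dir='/etc/init.d'):
    """List sysv services by scanning the init script directory directly,
    since `systemctl --full list-units` omits stopped sysv services."""
    if not os.path.isdir(initd_dir):
        return set()
    return {name for name in os.listdir(initd_dir)
            if os.path.isfile(os.path.join(initd_dir, name))
            and os.access(os.path.join(initd_dir, name), os.X_OK)}

def service_available(name, systemd_units, initd_dir='/etc/init.d'):
    """A service is available if systemd knows a unit for it OR an
    executable init script of that name exists on disk."""
    return name in systemd_units or name in sysv_services(initd_dir)
```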

@cachedout
Contributor

@uvsmtid This is great feedback, thank you! I'll go ahead and close #8444 then and we'll keep working on this one.

@smithjm

smithjm commented Aug 20, 2014

This is still broken in 2014.1.10 on Fedora-20. While there is a kludgy workaround, this really does need to be fixed. The workaround, for those in a CI/CD environment who need to clear out any blocks in their pipeline, is ugly but works (this example is for Centrify, which also uses sysv init-style files but is manageable with systemd under FC20):

centrify-service:
  service.running:
    - name: centrifydc
    - enable: True
    - reload: True
    - watch:
      - file: /etc/centrifydc/centrifydc.conf
    - require:
      - pkg: centrify-packages
      - file: centrify-config
      - cmd: centrify-adjoin
{%- if salt['grains.get']('osfinger', 'undefined') == 'Fedora-20' %}
    - provider: service
{%- endif %}

@LordFPL

LordFPL commented Apr 3, 2015

Hello,

A little update on a strange behavior:

salt-call service.available registrator.service
[INFO    ] Executing command 'systemctl --all --full --no-legend --no-pager list-units | col -b' in directory '/root'
[INFO    ] Executing command 'systemctl --full --no-legend --no-pager list-unit-files | col -b' in directory '/root'
[INFO    ] Legacy init script: "README".
[INFO    ] Legacy init script: "functions".
[INFO    ] Legacy init script: "netconsole".
[INFO    ] Legacy init script: "network".
local:
    False

But :

salt-call service.available registrator
[INFO    ] Executing command 'systemctl --all --full --no-legend --no-pager list-units | col -b' in directory '/root'
[INFO    ] Executing command 'systemctl --full --no-legend --no-pager list-unit-files | col -b' in directory '/root'
[INFO    ] Legacy init script: "README".
[INFO    ] Legacy init script: "functions".
[INFO    ] Legacy init script: "netconsole".
[INFO    ] Legacy init script: "network".
local:
    True

Why isn't the ".service" suffix supported? With systemd both forms work :/

(and it gave me quite a headache to track this down...)
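The asymmetry above suggests the availability check compares raw names; normalizing the ".service" suffix before comparing would make both forms behave the same. A hypothetical sketch (not Salt's code; names are for illustration):

```python
def canonical(name):
    """Treat 'foo' and 'foo.service' as the same unit, the way systemctl
    itself does for the default .service unit type."""
    return name[:-len('.service')] if name.endswith('.service') else name

def service_available(name, known_units):
    """Compare canonicalized names so 'registrator' and
    'registrator.service' give the same answer."""
    known = {canonical(u) for u in known_units}
    return canonical(name) in known

units = ['registrator', 'sshd.service']
print(service_available('registrator.service', units))  # True
print(service_available('registrator', units))          # True
print(service_available('sshd', units))                 # True
```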

@jfindlay jfindlay added the Platform Relates to OS, containers, platform-based utilities like FS, system based apps label May 26, 2015
@blbradley
Contributor

This happens to me when using hadoop-formula's hadoop.hdfs state. It starts three different services. The first service started by the highstate during a fresh run is not found. The rest of the services are found and function as normal. A second highstate run proceeds normally. This possibly indicates that Salt is reloading systemd later in the process than needed.

State:

{% if hdfs.is_namenode or hdfs.is_datanode %}
hdfs-services:
  service.running:
    - enable: True
    - names:
{% if hdfs.is_namenode %}
      - hadoop-secondarynamenode
      - hadoop-namenode
{% endif %}
{% if hdfs.is_datanode %}
      - hadoop-datanode
{% endif %}

{% endif %}

I also extend hdfs-services with provider: debian_service. I've tried it with the default for Debian Jessie (provider: systemd) with the same results.

/var/log/salt/minion:

[INFO    ] Executing command 'service hadoop-namenode status' in directory '/root'
[ERROR   ] Command 'service hadoop-namenode status' failed with return code: 3
[ERROR   ] output: * hadoop-namenode.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)
[INFO    ] Executing command 'service hadoop-namenode start' in directory '/root'
[ERROR   ] Command 'service hadoop-namenode start' failed with return code: 6
[ERROR   ] output: Failed to start hadoop-namenode.service: Unit hadoop-namenode.service failed to load: No such file or directory.

Versions report:

                  Salt: 2015.5.0
                Python: 2.7.9 (default, Mar  1 2015, 12:57:24)
                Jinja2: 2.7.3
              M2Crypto: 0.21.1
        msgpack-python: 0.4.2
          msgpack-pure: Not Installed
              pycrypto: 2.6.1
               libnacl: Not Installed
                PyYAML: 3.11
                 ioflo: Not Installed
                 PyZMQ: 14.4.0
                  RAET: Not Installed
                   ZMQ: 4.0.5
                  Mako: 1.0.0
 Debian source package: 2015.5.0+ds-1~bpo8+1

I would like to debug this further but haven't debugged Salt much since I switched from Salt SSH to Master/Minion setup. Suggestions?

@jfindlay jfindlay added severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around P3 Priority 3 and removed severity-low 4th level, cosmetic problems, work around exists labels Jul 28, 2015
@thequailman

Running CentOS 7, Salt version 2015.8.8.2. Cassandra is affected by this as well. As a workaround, this kludge works:

cassandra_kludge:
  cmd.run:
    - name: systemctl enable cassandra
    - unless: systemctl -a | grep cassandra

cassandra_service:
  service.running:
    - name: cassandra
    - init_delay: 10
    - require:
        - cmd: cassandra_kludge

@uvsmtid
Contributor Author

uvsmtid commented Apr 28, 2016

This even made me update the bug in systemd again.

My tests still confirm that systemd provides no known way to list disabled services based on init.d/sysv scripts. The best current solution would be enabling/starting/stopping/disabling the service and checking the error code returned by systemctl: it will fail if there is no such service, but succeed if there is one, without needing to know about it upfront.

@Talkless
Contributor

I have discovered a somewhat similar problem on Debian Jessie when I deploy a new sysv script and try to use the service.running state. I get:

2016-11-28 13:59:10,206 [salt.state       ][INFO    ][1092] Running state [pgbouncer-web-login] at time 13:59:10.205771
2016-11-28 13:59:10,207 [salt.state       ][INFO    ][1092] Executing state service.running for pgbouncer-web-login
2016-11-28 13:59:10,209 [salt.loaded.int.module.cmdmod][INFO    ][1092] Executing command ['systemctl', 'status', 'pgbouncer-web-login.service', '-n', '0'] in directory '/root'
2016-11-28 13:59:10,229 [salt.loaded.int.module.cmdmod][DEBUG   ][1092] output: * pgbouncer-web-login.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)
2016-11-28 13:59:10,230 [salt.state       ][ERROR   ][1092] The named service pgbouncer-web-login is not available

The whole idea is to create an /etc/init.d/pgbouncer-web-login daemon, which is a modified copy of /etc/init.d/pgbouncer (pgbouncer does not yet support systemd) with different ports, configs, etc., because of the need to have multiple pgbouncer pools, but those are details.

I had no problem on Wheezy, but on Jessie with systemd it seems I have to execute systemctl daemon-reload (using module.wait -> cmd.run) to make the new init.d script "visible" and service.running work.

But does that mean that service.running should always reload the systemd configuration? Would that be... "bad" in any case?
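One conservative answer to the daemon-reload question can be sketched as: reload only when systemd claims "not-found" for a name that does have an init script on disk. This is only an illustration of the idea (the function and the status-string heuristic are hypothetical, not Salt behavior):

```python
import os

def needs_daemon_reload(status_output, name, initd_dir='/etc/init.d'):
    """Reload systemd only when it reports the unit as not-found even
    though an init.d script for it exists on disk (i.e. a freshly
    deployed sysv script systemd has not yet picked up), rather than
    paying for an unconditional daemon-reload on every state run."""
    not_found = 'Loaded: not-found' in status_output
    has_script = os.path.exists(os.path.join(initd_dir, name))
    return not_found and has_script

# Status text shaped like the `systemctl status` output in the log above.
status = """* pgbouncer-web-login.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)"""
```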

@seanjnkns
Contributor

Still see this same issue:
Salt Version:
Salt: 2016.3.5

Dependency Versions:
cffi: 0.8.6
cherrypy: Not Installed
dateutil: 1.5
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: 0.21.1
Mako: 0.8.1
msgpack-pure: Not Installed
msgpack-python: 0.4.8
mysql-python: 1.2.5
pycparser: 2.14
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.5 (default, Sep 15 2016, 22:37:39)
python-gnupg: Not Installed
PyYAML: 3.11
PyZMQ: 15.3.0
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.4

System Versions:
dist: centos 7.2.1511 Core
machine: x86_64
release: 4.4.52-2.el7.centos.x86_64
system: Linux
version: CentOS Linux 7.2.1511 Core

Using a very simple file.managed + service.running/enable

vxlan SysV service file:

/etc/init.d/vxlan:
  file.managed:
    - source: salt://services/vxlan/vxlan
    - user: root
    - group: root
    - mode: 755
    - require_in:
      - service: vxlan

vxlan:
  service.running:
    - enable: True

If I run chkconfig --add vxlan and then re-run these states, there's no problem. BTW, this appears to be a regression, as I don't recall having this issue in 2016.3.4. I haven't tested 2016.11.3, which came out today, as we're not quite ready to move to that yet. Although, I'm inclined to just change this to a systemd service, given I have full control over this one, regardless of the bug in salt.

@devopsprosiva

I ran into the same issue with the cassandra init service on CentOS 7. @gtmanfred suggested using the provider option for service.running, which fixed the issue for me.

https://docs.saltstack.com/en/latest/ref/states/providers.html

start cassandra:
  service.running:
    - name: cassandra
    - provider: rh_service

@stale

stale bot commented May 8, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label May 8, 2019
@stale stale bot closed this as completed May 15, 2019