Bug 983129

Summary: nagios-3.4.1-2.el6 init script overwrites pid file unnecessarily
Product: [Fedora] Fedora EPEL
Reporter: Jason Kincl <jkincl>
Component: nagios
Assignee: Jose Pedro Oliveira <jose.p.oliveira.oss>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: unspecified
Version: el6
CC: affix, jkincl, jose.p.oliveira.oss, kevin.sumner, lemenkov, linux, ondrejj, shawn.starr
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: nagios-3.5.1-1.el6
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2013-09-15 18:33:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description Jason Kincl 2013-07-10 14:57:35 UTC
Description of problem:
In the init script for nagios that is shipped with the RPM (/etc/rc.d/init.d/nagios), the "start" action runs the following command right after starting the nagios binary:

pidof nagios > $NagiosRunFile

This overwrites the PID that the nagios binary itself writes to /var/run/nagios.pid when it starts, replacing it with a list that also contains the PIDs of child processes spawned from the nagios parent process. As a result, spurious errors are displayed when "stop" or "reload" is executed later on, because those child processes have already finished executing.
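
To illustrate the failure mode, here is a minimal sketch (not taken from the shipped init script) of what a PID file looks like once some of the listed processes have exited. The helper name report_stale_pids and the temporary file are hypothetical; the only assumption about the real script is that it signals every PID found in the file.

```shell
#!/bin/sh
# Hypothetical helper: print each PID found in a PID file together with
# whether it still refers to a running process.
# Note: "kill -0" sends no signal, it only tests whether the process can be
# signaled. It also fails for processes owned by another user when the
# caller is not root, so "stale" here means "cannot be signaled".
report_stale_pids() {
    pidfile=$1
    # Word splitting is intentional: pidof separates PIDs with spaces.
    for pid in $(cat "$pidfile"); do
        if kill -0 "$pid" 2>/dev/null; then
            echo "$pid alive"
        else
            echo "$pid stale"
        fi
    done
}

# Demo: one live PID (this shell) and the PID of a child that has exited.
sleep 0 &
dead_pid=$!
wait "$dead_pid"
demo_file=$(mktemp)
echo "$$ $dead_pid" > "$demo_file"
report_stale_pids "$demo_file"
rm -f "$demo_file"
```

Each stale entry corresponds to one "No such process" message when a script runs kill against every PID in the file.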

Version-Release number of selected component (if applicable):

nagios-3.4.1-2.el6

How reproducible:



Steps to Reproduce:
1. Start nagios daemon with init script with some hosts and services configured
2. Wait a few minutes for the initial spawned child processes to finish
3. Use the init script to reload, restart, or stop the nagios daemon

Actual results:

# /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: /etc/init.d/nagios: line 74: kill: (30429) - No such process
/etc/init.d/nagios: line 74: kill: (30408) - No such process
done.
Starting nagios: done.


Expected results:

# /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.


Additional info:

Comment 1 Jose Pedro Oliveira 2013-07-14 00:36:04 UTC
Jason,

I can't reproduce the problem with nagios-3.5.0-1.el6:
  
   # rpm -q nagios
   nagios-3.5.0-1.el6.i686

   # /etc/init.d/nagios restart
   Running configuration check...done.
   Stopping nagios: done.
   Starting nagios: done.

Could you upgrade to the latest nagios version available in EPEL6 and see if you can reproduce the problem?

tia,
jpo

Comment 2 Kevin Sumner 2013-08-20 22:30:39 UTC
I'm actually seeing this with the 3.4.4 version, but after checking the 3.5.0 RPM from EPEL6, I believe this would still be the case.  Whether nagios.pid ends up with multiple PIDs depends on a race between the Nagios process spawning a child and the init script executing pidof.

- If the Nagios process spawns a child first, two or more PIDs are seen in nagios.pid
- If pidof runs before a spawn, only one PID is found in nagios.pid

So the outcome depends on many factors, particularly whether nagios needs to spawn immediately (e.g. for a particular check to run).  I would imagine that an empty or small Nagios config would favor correct behavior, as there's little to no need for it to spawn children, so the problem would be difficult to reproduce.  As a case study, I've only seen this begin to happen after our config has grown significantly over the past year.
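
A quick way to tell which side of the race a given start landed on is to check whether the PID file holds exactly one PID. The following is a minimal sketch under that assumption; the helper name pidfile_has_single_pid is hypothetical and not part of the init script, and the sample PIDs are the ones from the error output in the description.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the file content is a single run of
# digits, i.e. exactly one PID. Command substitution strips the trailing
# newline, so any remaining space or newline means more than one entry.
pidfile_has_single_pid() {
    content=$(cat "$1" 2>/dev/null) || return 1
    case $content in
        ''|*[!0-9]*) return 1 ;;
    esac
    return 0
}

# Demo with the PIDs from the error output in the description.
one=$(mktemp)
two=$(mktemp)
echo "30408" > "$one"
echo "30408 30429" > "$two"
pidfile_has_single_pid "$one" && echo "single PID"
pidfile_has_single_pid "$two" || echo "multiple PIDs or invalid"
rm -f "$one" "$two"
```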

Regardless, here's the patch that should fix this, taken against nagios-3.5.0-1.el6.x86_64.  Since Nagios manages its own PID file just fine, there's no need for the init script to overwrite the PID file with the output of pidof.

diff -u /tmp/nagios.orig /etc/init.d/nagios
--- /tmp/nagios.orig    2013-08-20 22:19:50.158724164 +0000
+++ /etc/init.d/nagios  2013-08-20 22:19:59.501536667 +0000
@@ -138,7 +138,6 @@
                        chown $NagiosUser:$NagiosGroup $NagiosRunFile
                        [ -x /sbin/restorecon ] && /sbin/restorecon $NagiosRunFile
                        $NagiosBin -d $NagiosCfgFile
-                        pidof nagios > $NagiosRunFile
                        if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
                        echo " done."
                        exit 0
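
With the pidof line removed, the script relies entirely on the PID file that Nagios writes itself. If a script needed that PID immediately after start, one option would be to poll for it, as in the sketch below. This is an illustration only and not part of the patch above: the helper name wait_for_pidfile, the timeout, and the demo PID are hypothetical, and a background subshell stands in for the daemon.

```shell
#!/bin/sh
# Hypothetical helper: wait up to $2 seconds for a non-empty PID file at $1,
# then print its content. Returns 1 if the file is still empty at timeout.
wait_for_pidfile() {
    pidfile=$1
    timeout=$2
    waited=0
    # -s is true only once the file exists and has content; the init script
    # creates the file (via chown/restorecon setup) before the daemon fills it.
    while [ ! -s "$pidfile" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    cat "$pidfile"
}

# Demo: the subshell plays the daemon and writes its PID file after a delay.
demo_file=$(mktemp)
( sleep 1; echo 12345 > "$demo_file" ) &
wait_for_pidfile "$demo_file" 5
wait
rm -f "$demo_file"
```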

Comment 3 Jose Pedro Oliveira 2013-08-29 01:55:59 UTC
TODO list (starting point: git master branch):

 1. The "pidof nagios > $NagiosRunFile" line is being added by the patch
    nagios-0001-from-rpm.patch.

 2. The patch nagios-0002-SELinux-relabeling.patch also needs to be updated.

Comment 4 Fedora Update System 2013-08-29 03:13:51 UTC
nagios-3.5.0-2.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/nagios-3.5.0-2.el6

Comment 5 Jose Pedro Oliveira 2013-08-29 03:21:35 UTC
Changes also in nagios-3.5.0-9.fc20 and nagios-3.5.0-9.fc21.

Koji nagios builds:
http://koji.fedoraproject.org/koji/packageinfo?packageID=2593

Comment 6 Fedora Update System 2013-08-29 17:42:25 UTC
Package nagios-3.5.0-2.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing nagios-3.5.0-2.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-11385/nagios-3.5.0-2.el6
then log in and leave karma (feedback).

Comment 7 Fedora Update System 2013-08-30 22:31:26 UTC
nagios-3.5.1-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/nagios-3.5.1-1.el6

Comment 8 Fedora Update System 2013-09-15 18:33:01 UTC
nagios-3.5.1-1.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.