Bug 983129 - nagios-3.4.1-2.el6 init script overwrites pid file unnecessarily
Summary: nagios-3.4.1-2.el6 init script overwrites pid file unnecessarily
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: nagios
Version: el6
Hardware: All
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Jose Pedro Oliveira
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-07-10 14:57 UTC by Jason Kincl
Modified: 2013-09-15 18:33 UTC
CC List: 8 users

Fixed In Version: nagios-3.5.1-1.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-15 18:33:01 UTC
Type: Bug
Embargoed:



Description Jason Kincl 2013-07-10 14:57:35 UTC
Description of problem:
When "start" is executed, the init script shipped with the RPM (/etc/rc.d/init.d/nagios) starts the nagios binary and then immediately runs:

pidof nagios > $NagiosRunFile

This command overwrites the PID that the nagios binary itself writes to /var/run/nagios.pid at startup, and the new contents can include the PIDs of child processes spawned by the nagios parent process. Those child processes have usually exited by the time "stop" or "reload" is executed, so the init script prints spurious "No such process" errors when it tries to signal them.
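
A quick way to check for this condition is sketched below. It assumes the PID file contains whitespace-separated PIDs, as pidof would write them; the helper names count_pids and stale_pids are hypothetical and are not part of the nagios package.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a PID file; not shipped with nagios.

# Print how many PIDs are recorded in the file given as $1.
# A file written by the daemon itself should yield 1.
count_pids() {
    # wc -w counts whitespace-separated words; tr strips any padding.
    wc -w < "$1" | tr -d '[:space:]'
}

# Print each PID from the file given as $1 that cannot be signalled.
stale_pids() {
    for pid in $(cat "$1"); do
        # kill -0 sends no signal; it only tests whether the PID can be
        # signalled (it also fails when permission is denied).
        kill -0 "$pid" 2>/dev/null || echo "$pid"
    done
}
```

For example, running count_pids /var/run/nagios.pid a few minutes after "start" should print a value greater than 1 on an affected system, and stale_pids on the same file should list the PIDs that later trigger the kill errors.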

Version-Release number of selected component (if applicable):

nagios-3.4.1-2.el6

How reproducible:



Steps to Reproduce:
1. Start nagios daemon with init script with some hosts and services configured
2. Wait a few minutes for the initial spawned child processes to finish
3. Use init script to reload, restart, stop the nagios daemon

Actual results:

# /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: /etc/init.d/nagios: line 74: kill: (30429) - No such process
/etc/init.d/nagios: line 74: kill: (30408) - No such process
done.
Starting nagios: done.


Expected results:

# /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.


Additional info:

Comment 1 Jose Pedro Oliveira 2013-07-14 00:36:04 UTC
Jason,

I can't reproduce the problem with nagios-3.5.0-1.el6:
  
   # rpm -q nagios
   nagios-3.5.0-1.el6.i686

   # /etc/init.d/nagios restart
   Running configuration check...done.
   Stopping nagios: done.
   Starting nagios: done.

Could you upgrade to the latest nagios version available in EPEL6 and see if you can reproduce the problem?

tia,
jpo

Comment 2 Kevin Sumner 2013-08-20 22:30:39 UTC
I'm actually seeing this with version 3.4.4, and after checking the 3.5.0 RPM from EPEL6, I believe it would still be the case there.  Whether nagios.pid ends up with multiple PIDs depends on a race between the Nagios process spawning a child and the init script executing pidof.

- If the Nagios process spawns a child first, a second PID (possibly more) is seen in nagios.pid
- If pidof runs before a spawn, only one PID is found in nagios.pid

So the outcome depends on a lot of factors, particularly whether nagios needs to spawn immediately (e.g. for a particular check to run).  I would imagine that an empty or small Nagios config would favor correct behavior, as there's little to no need for it to spawn children, so the problem would be difficult to reproduce.  As a case study, I've only seen this begin to happen after our config has grown significantly over the past year.

Regardless, here's the patch that should fix this, taken against nagios-3.5.0-1.el6.x86_64.  Since Nagios manages its own PID file just fine, there's no need for the init script to overwrite the PID file with the output of pidof.

diff -u /tmp/nagios.orig /etc/init.d/nagios
--- /tmp/nagios.orig    2013-08-20 22:19:50.158724164 +0000
+++ /etc/init.d/nagios  2013-08-20 22:19:59.501536667 +0000
@@ -138,7 +138,6 @@
                        chown $NagiosUser:$NagiosGroup $NagiosRunFile
                        [ -x /sbin/restorecon ] && /sbin/restorecon $NagiosRunFile
                        $NagiosBin -d $NagiosCfgFile
-                        pidof nagios > $NagiosRunFile
                        if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
                        echo " done."
                        exit 0
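
As a complementary safeguard, the stop and reload paths could refuse to signal anything unless the PID file holds exactly one numeric PID. The following is only a sketch of that idea, not code from the shipped init script; the function name read_single_pid is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of the shipped init script.
# Print the PID from the file given as $1 only if the file holds exactly
# one numeric entry; return non-zero otherwise.
read_single_pid() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    # Unquoted expansion splits the file contents on whitespace.
    set -- $(cat "$pidfile")
    [ "$#" -eq 1 ] || return 1
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # reject anything that is not all digits
    esac
    echo "$1"
}

# Possible use in a stop routine (assumes $NagiosRunFile is set as in
# the init script):
#   if pid=$(read_single_pid "$NagiosRunFile"); then
#       kill "$pid"
#   fi
```

With this check, a file containing "30408 30429" would be rejected outright instead of producing one kill error per stale PID.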

Comment 3 Jose Pedro Oliveira 2013-08-29 01:55:59 UTC
TODO list (starting point: git master branch):

 1. The "pidof nagios > $NagiosRunFile" line is being added by the patch
    nagios-0001-from-rpm.patch.

 2. The patch nagios-0002-SELinux-relabeling.patch also needs to be updated

Comment 4 Fedora Update System 2013-08-29 03:13:51 UTC
nagios-3.5.0-2.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/nagios-3.5.0-2.el6

Comment 5 Jose Pedro Oliveira 2013-08-29 03:21:35 UTC
Changes also in nagios-3.5.0-9.fc20 and nagios-3.5.0-9.fc21.

Koji nagios builds:
http://koji.fedoraproject.org/koji/packageinfo?packageID=2593

Comment 6 Fedora Update System 2013-08-29 17:42:25 UTC
Package nagios-3.5.0-2.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing nagios-3.5.0-2.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-11385/nagios-3.5.0-2.el6
then log in and leave karma (feedback).

Comment 7 Fedora Update System 2013-08-30 22:31:26 UTC
nagios-3.5.1-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/nagios-3.5.1-1.el6

Comment 8 Fedora Update System 2013-09-15 18:33:01 UTC
nagios-3.5.1-1.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.

