Bug 1699993 - Erlang crash dump every minute
Summary: Erlang crash dump every minute
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rabbitmq-server
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: z8
Target Release: 13.0 (Queens)
Assignee: Peter Lemenkov
QA Contact: pkomarov
URL:
Whiteboard:
Duplicates: 1714128 1751615
Depends On:
Blocks: 1715315
Reported: 2019-04-15 14:31 UTC by Chris Hudson
Modified: 2019-12-29 09:55 UTC (History)

Fixed In Version: rabbitmq-server-3.6.15-4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-03 16:58:10 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 4060901 | 0 | Troubleshoot | None | Erlang crash dump every minute on running OpenStack cloud | 2019-05-07 17:06:07 UTC
Red Hat Product Errata | RHBA-2019:2623 | 0 | None | None | None | 2019-09-03 16:58:32 UTC

Comment 2 John Eckersberg 2019-04-15 19:20:14 UTC
Important bits from the case notes (Chris and I were looking at this last week, before the BZ was opened):

The crash slogan:

Slogan: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}}},{k
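
To check whether a given crash dump carries this same slogan, something like the following could be used. It is only a sketch: it assumes the dump is a text file containing a line that starts with "Slogan:", and the function names and the file path in the usage comment are illustrative, not taken from the case.

```python
def read_slogan(lines):
    """Return the text after 'Slogan:' from the first matching line, or None."""
    for line in lines:
        if line.startswith("Slogan:"):
            return line[len("Slogan:"):].strip()
    return None


def is_nodistribution(slogan):
    """True if the slogan mentions the 'nodistribution' exit reason."""
    return slogan is not None and "nodistribution" in slogan


# Usage (the path is hypothetical; use wherever the dump was actually written):
# with open("erl_crash.dump", errors="replace") as f:
#     print(is_nodistribution(read_slogan(f)))
```

A substring match is deliberately crude here, since the slogan is truncated in the dump and cannot be parsed reliably as an Erlang term.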

Systemtap was used to watch all exec()s and capture the cmdline of the crashing beam.smp.  It looks like:

Fri Apr  5 17:41:51 2019   2013  74073 134463       beam.smp /usr/lib64/erlang/erts-7.3.1.6/bin/beam.smp -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -sname epmd-starter-144528111 -proto_dist "inet_tcp" -noshell -eval halt().
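
The captured command line corresponds to a plain erl invocation with the same flags, so the failure can in principle be reproduced by hand outside the monitor. A minimal sketch, assuming erl is on PATH and that it is run as the same user and with the same home directory as the broker; the function names and the node-name suffix are made up for illustration:

```python
import shutil
import subprocess


def epmd_starter_cmd(suffix="manual-test", proto_dist="inet_tcp"):
    """Build an erl command line with the same flags as the captured exec."""
    return [
        "erl",
        "-sname", f"epmd-starter-{suffix}",
        "-proto_dist", proto_dist,
        "-noshell",
        "-eval", "halt().",
    ]


def run_epmd_starter(timeout=30):
    """Run the command; return (exit status, stderr), or None if erl is absent."""
    if shutil.which("erl") is None:
        return None
    proc = subprocess.run(
        epmd_starter_cmd(), capture_output=True, text=True, timeout=timeout
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    print(run_epmd_starter())
```

A non-zero exit status here would suggest that distribution fails to start for this short-lived node independently of RabbitMQ itself.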

This gets called from rabbit_nodes_common:ensure_epmd here:

https://github.com/rabbitmq/rabbitmq-common/blob/v3.6.x/src/rabbit_nodes_common.erl#L37

Which is called from the rabbit_epmd_monitor process here:

https://github.com/rabbitmq/rabbitmq-server/blob/v3.6.x/src/rabbit_epmd_monitor.erl#L108

The epmd monitor fires the check timer every 60 seconds, thus explaining the regular period seen here.

What is not clear is why the epmd-starter exec fails to start distribution.  There is some initial debugging in the case files around possible issues with IPv6 and/or hostname resolution, but it's not apparent that either is responsible.

Everything seems to be functioning properly.  The service is up, registered, epmd is running, everything is clustered.  Just the epmd-starter crashes.  Since epmd is already running, it doesn't have any practical effect on the running system.
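
A quick way to confirm the "epmd is already running" part on an affected node is to see whether anything accepts connections on the epmd port. This is a rough sketch that assumes epmd listens on its default port 4369; the function name is illustrative, and a successful connect only shows that something is listening there, not that it is epmd (epmd -names is the more direct check).

```python
import socket

EPMD_PORT = 4369  # default epmd port; an assumption about this deployment


def epmd_reachable(host="localhost", port=EPMD_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("epmd port reachable:", epmd_reachable())
```

Running it against both the short hostname and the IPv4/IPv6 loopback addresses may help narrow down the resolution questions raised above.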

Comment 13 Peter Lemenkov 2019-06-14 14:49:02 UTC
*** Bug 1714128 has been marked as a duplicate of this bug. ***

Comment 17 David Hill 2019-07-04 14:24:36 UTC
Customer tried with 3.6.16 and it appears to have solved his problem.

Comment 21 Chris Hudson 2019-07-08 14:01:26 UTC
I will report back on #1 in comment 20. Setting needinfo on Peter re: #2 and #3.

-Chris

Comment 24 Peter Lemenkov 2019-07-23 15:59:12 UTC
Please try the rabbitmq-server-3.6.15-4.el7ost build. It shouldn't create so many coredumps.

Comment 36 errata-xmlrpc 2019-09-03 16:58:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2623

Comment 37 Peter Lemenkov 2019-09-12 09:33:16 UTC
*** Bug 1751615 has been marked as a duplicate of this bug. ***

