Bug 1186749 - after the controller failover, the nova computer service cannot connect to controller
Summary: after the controller failover, the nova computer service cannot connect to co...
Keywords:
Status: CLOSED DUPLICATE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 5.0 (RHEL 7)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 6.0 (Juno)
Assignee: Eoghan Glynn
QA Contact: nlevinki
URL:
Whiteboard:
Depends On: 1175685
Blocks:
 
Reported: 2015-01-28 13:26 UTC by lidong chen
Modified: 2019-09-09 16:00 UTC
CC List: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-05-04 18:28:33 UTC
Target Upstream Version:
Embargoed:


Attachments
computer1 (480.00 KB, application/x-tar)
2015-01-28 13:26 UTC, lidong chen
computer2 (1.09 MB, application/x-tar)
2015-01-28 13:28 UTC, lidong chen
controller1 (1.39 MB, application/x-7z-compressed)
2015-01-28 13:31 UTC, lidong chen
controller2 (2.46 MB, application/x-7z-compressed)
2015-01-28 13:33 UTC, lidong chen
controller3 (3.22 MB, application/x-7z-compressed)
2015-01-28 13:36 UTC, lidong chen

Description lidong chen 2015-01-28 13:26:45 UTC
Created attachment 985144
computer1

Description of problem:
I used rhel_osp_installer to deploy a highly available OpenStack environment.
When the controller failed over, I found that the nova-compute service stopped working correctly.

I used the 'pcs cluster standby' command at 2015-01-28 22:54 to trigger the failover.

After the controller failover, the nova-compute service cannot connect to the controller.

I used the 'nova-manage service list' command to display the service status:

[root@mac04f93882f3ea ~]# nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-cert        mac04f93882f3a2.example.com          internal         enabled    :-)   2015-01-28 15:50:35
nova-consoleauth mac04f93882f3a2.example.com          internal         enabled    :-)   2015-01-28 15:50:41
nova-scheduler   mac04f93882f3a2.example.com          internal         enabled    :-)   2015-01-28 15:50:40
nova-conductor   mac04f93882f3a2.example.com          internal         enabled    :-)   2015-01-28 15:50:32
nova-cert        macac4e914657d8.example.com          internal         enabled    XXX   2015-01-28 15:47:29
nova-consoleauth macac4e914657d8.example.com          internal         enabled    :-)   2015-01-28 15:50:40
nova-cert        mac04f93882f3ea.example.com          internal         enabled    :-)   2015-01-28 15:50:37
nova-consoleauth mac04f93882f3ea.example.com          internal         enabled    :-)   2015-01-28 15:50:35
nova-scheduler   macac4e914657d8.example.com          internal         enabled    XXX   2015-01-28 15:47:26
nova-conductor   macac4e914657d8.example.com          internal         enabled    XXX   2015-01-28 15:47:30
nova-scheduler   mac04f93882f3ea.example.com          internal         enabled    :-)   2015-01-28 15:50:39
nova-conductor   mac04f93882f3ea.example.com          internal         enabled    :-)   2015-01-28 15:50:31
nova-compute     mac04f93882f3f2.example.com          nova             enabled    XXX   2015-01-28 15:46:27
nova-compute     mac04f93882f3ca.example.com          nova             enabled    XXX   2015-01-28 15:46:29
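To pick out the affected services from output like the table above, a minimal Python sketch can parse the rows and flag those whose State is XXX. It also approximates how that state is decided: assuming nova's default DB servicegroup driver and the default service_down_time of 60 seconds, a service is shown as down once its last heartbeat (Updated_At) is older than that. The helper names parse_service_list and is_stale are hypothetical, and using the newest Updated_At in the table as a stand-in for the current time is an assumption, since the report does not record exactly when the command ran.

```python
from datetime import datetime, timedelta

# A few rows copied from the 'nova-manage service list' output above.
SAMPLE = """\
Binary           Host                                 Zone             Status     State Updated_At
nova-consoleauth macac4e914657d8.example.com          internal         enabled    :-)   2015-01-28 15:50:40
nova-scheduler   macac4e914657d8.example.com          internal         enabled    XXX   2015-01-28 15:47:26
nova-compute     mac04f93882f3f2.example.com          nova             enabled    XXX   2015-01-28 15:46:27
nova-compute     mac04f93882f3ca.example.com          nova             enabled    XXX   2015-01-28 15:46:29
"""


def parse_service_list(text):
    """Turn the whitespace-separated table into a list of dicts."""
    services = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 7 fields (Updated_At splits into date and time);
        # this skips the 6-field header row and blank lines.
        if len(fields) != 7:
            continue
        binary, host, zone, status, state, day, clock = fields
        services.append({
            "binary": binary,
            "host": host,
            "zone": zone,
            "status": status,
            "up": state == ":-)",  # nova-manage prints ":-)" for up, "XXX" for down
            "updated_at": datetime.strptime(day + " " + clock, "%Y-%m-%d %H:%M:%S"),
        })
    return services


def is_stale(updated_at, now, service_down_time=60):
    """Assumed approximation of the DB servicegroup check (not nova's code):
    a service counts as down once its heartbeat is older than service_down_time."""
    return now - updated_at > timedelta(seconds=service_down_time)


services = parse_service_list(SAMPLE)
# Assumption: treat the newest heartbeat in the table as "now".
now = max(s["updated_at"] for s in services)
for s in services:
    if not s["up"]:
        age = (now - s["updated_at"]).total_seconds()
        print("%-16s %-30s down, heartbeat %ds older than newest, stale=%s"
              % (s["binary"], s["host"], age, is_stale(s["updated_at"], now)))
```

Running the same parser over the full table would also flag the nova-cert, nova-scheduler and nova-conductor rows on macac4e914657d8.example.com.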

This is the nova-compute log:
2015-01-28 22:54:25.939 12696 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'conductor': Socket closed
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 622, in ensure
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     return method(*args, **kwargs)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 718, in _publish
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     publisher = cls(self.conf, self.channel, topic, **kwargs)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 379, in __init__
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     **options)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.reconnect(channel)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     routing_key=self.routing_key)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 85, in __init__
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.revive(self._channel)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 218, in revive
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.declare()
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 105, in declare
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.exchange.declare()
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 166, in declare
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     nowait=nowait, passive=passive,
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 620, in exchange_declare
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     (40, 11),  # Channel.exchange_declare_ok
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 67, in wait
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.channel_id, allowed_methods)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 240, in _wait_method
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.method_reader.read_method()
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     raise m
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit IOError: Socket closed
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit 
2015-01-28 22:54:25.942 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 192.168.88.184:5672
2015-01-28 22:54:25.942 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2015-01-28 22:54:26.956 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 192.168.88.184:5672
2015-01-28 22:54:52.327 12696 AUDIT nova.compute.resource_tracker [-] Auditing locally available compute resources
2015-01-28 22:55:26.972 12696 ERROR nova.servicegroup.drivers.db [-] model server went away
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db Traceback (most recent call last):
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py", line 95, in _report_state
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     service.service_ref, state_catalog)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/nova/conductor/api.py", line 218, in service_update
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     return self._manager.service_update(context, service, values)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py", line 330, in service_update
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     service=service_p, values=values)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     wait_for_reply=True, timeout=timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     timeout=timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     return self._send(target, ctxt, message, wait_for_reply, timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 403, in _send
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     result = self._waiter.wait(msg_id, timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 267, in wait
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     reply, ending = self._poll_connection(msg_id, timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 217, in _poll_connection
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     % msg_id)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db MessagingTimeout: Timed out waiting for a reply to message ID c00916ecaa8b4926a8abecb4267ddf56
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db 
2015-01-28 22:55:26.973 12696 WARNING nova.openstack.common.loopingcall [-] task run outlasted interval by 51.035014 sec
2015-01-28 22:56:27.008 12696 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager.update_available_resource: Timed out waiting for a reply to message ID d928c1473ad24846a44f2651d5caa96d
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/openstack/common/periodic_task.py", line 182, in run_periodic_tasks
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     task(self, context)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5527, in update_available_resource
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     rt.update_available_resource(context)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/openstack/common/lockutils.py", line 249, in inner
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     return f(*args, **kwargs)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 315, in update_available_resource
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     context, self.host, self.nodename)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/objects/base.py", line 110, in wrapper
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     args, kwargs)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py", line 425, in object_class_action
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     objver=objver, args=args, kwargs=kwargs)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     wait_for_reply=True, timeout=timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     timeout=timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     return self._send(target, ctxt, message, wait_for_reply, timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 403, in _send
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     result = self._waiter.wait(msg_id, timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 267, in wait
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     reply, ending = self._poll_connection(msg_id, timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 217, in _poll_connection
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     % msg_id)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task MessagingTimeout: Timed out waiting for a reply to message ID d928c1473ad24846a44f2651d5caa96d
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task 
2015-01-28 22:56:27.009 12696 WARNING nova.openstack.common.loopingcall [-] task run outlasted interval by 50.03548 sec
2015-01-28 22:56:27.012 12696 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'conductor': Socket closed
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 622, in ensure
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     return method(*args, **kwargs)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 718, in _publish
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     publisher = cls(self.conf, self.channel, topic, **kwargs)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 379, in __init__
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     **options)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.reconnect(channel)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     routing_key=self.routing_key)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 85, in __init__
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.revive(self._channel)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 218, in revive
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.declare()
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 105, in declare
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.exchange.declare()
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 166, in declare
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     nowait=nowait, passive=passive,
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 620, in exchange_declare
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     (40, 11),  # Channel.exchange_declare_ok
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 67, in wait
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.channel_id, allowed_methods)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 240, in _wait_method
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.method_reader.read_method()
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     raise m
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit IOError: Socket closed
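The log can be summarized the same way. The sketch below is a hypothetical helper, not an OpenStack tool: it extracts the message IDs that hit MessagingTimeout and measures the time from the last "Connected to AMQP server" line to the first timeout. On the excerpt above the gap is about 60 seconds, which is consistent with oslo.messaging's default rpc_response_timeout of 60 seconds, assuming the deployment kept that default: the AMQP reconnect itself succeeded, but no reply to the conductor call arrived within the timeout.

```python
import re
from datetime import datetime

# Prefix format seen in the excerpt: "<date> <time>.<ms> <pid> <LEVEL> <logger> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) ?(?P<msg>.*)$"
)
TIMEOUT_RE = re.compile(r"Timed out waiting for a reply to message ID ([0-9a-f]+)")

# A few lines copied from the nova-compute log above.
SAMPLE = [
    "2015-01-28 22:54:26.956 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 192.168.88.184:5672",
    "2015-01-28 22:55:26.972 12696 ERROR nova.servicegroup.drivers.db [-] model server went away",
    "2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db MessagingTimeout: Timed out waiting for a reply to message ID c00916ecaa8b4926a8abecb4267ddf56",
    "2015-01-28 22:56:27.008 12696 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager.update_available_resource: Timed out waiting for a reply to message ID d928c1473ad24846a44f2651d5caa96d",
    "2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task MessagingTimeout: Timed out waiting for a reply to message ID d928c1473ad24846a44f2651d5caa96d",
]


def parse(lines):
    """Yield (timestamp, level, logger, message) for lines matching LINE_RE."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            yield ts, m.group("level"), m.group("logger"), m.group("msg")


def summarize(lines):
    records = list(parse(lines))
    timed_out = []        # unique message IDs, in order of first appearance
    first_timeout = None
    for ts, _level, _logger, msg in records:
        m = TIMEOUT_RE.search(msg)
        if m:
            if first_timeout is None:
                first_timeout = ts
            # ERROR and TRACE lines repeat the same ID; keep it once.
            if m.group(1) not in timed_out:
                timed_out.append(m.group(1))
    reconnects = [ts for ts, _l, _g, msg in records if "Connected to AMQP server" in msg]
    gap = None
    if first_timeout is not None:
        before = [ts for ts in reconnects if ts <= first_timeout]
        if before:
            gap = (first_timeout - before[-1]).total_seconds()
    return {"timed_out_ids": timed_out, "reconnect_to_timeout_s": gap}


print(summarize(SAMPLE))
```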

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Use rhel_osp_installer to deploy a highly available OpenStack environment.
2. Run the 'pcs cluster standby' command on each controller node to trigger a failover.
3. Run the 'nova-manage service list' command to display the service status.
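The reproduction steps can be scripted roughly as follows. This is a sketch under several assumptions: the pacemaker node names match the controller host names in the service list above, a hypothetical 120-second settle time is enough for the resources to move, and each node is taken out of standby with 'pcs cluster unstandby' before the next one is tested (the report does not say how the nodes were restored). The runner and sleep parameters exist only so the sequence can be checked without a cluster.

```python
import subprocess
import time

# Assumption: pacemaker node names equal these host names, which run
# nova-scheduler/nova-conductor in the service list above.
CONTROLLERS = [
    "mac04f93882f3a2.example.com",
    "macac4e914657d8.example.com",
    "mac04f93882f3ea.example.com",
]


def run(argv):
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def failover_cycle(controllers, settle_seconds=120, runner=run, sleep=time.sleep):
    """Put each controller in standby in turn and record the service list."""
    results = {}
    for node in controllers:
        runner(["pcs", "cluster", "standby", node])    # step 2: trigger failover
        sleep(settle_seconds)                          # hypothetical settle time
        results[node] = runner(["nova-manage", "service", "list"])  # step 3
        runner(["pcs", "cluster", "unstandby", node])  # assumption: restore the node
        sleep(settle_seconds)
    return results
```

On a controller where both pcs and nova-manage are available, failover_cycle(CONTROLLERS) returns one service-list snapshot per standby event, which can then be fed to a parser like the one shown after the service table.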

Actual results:
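The nova-compute services on both compute nodes (mac04f93882f3f2 and mac04f93882f3ca) are reported as down (State XXX), and the nova-compute log shows "Socket closed" errors when publishing to the 'conductor' topic, followed by MessagingTimeout errors on calls to nova-conductor.

Expected results:
The nova-compute services reconnect after the failover and are reported as up (State :-)).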

Additional info:

Comment 1 lidong chen 2015-01-28 13:28:15 UTC
Created attachment 985146
computer2

Comment 2 lidong chen 2015-01-28 13:31:45 UTC
Created attachment 985147
controller1

Comment 4 lidong chen 2015-01-28 13:33:11 UTC
Created attachment 985155
controller2

Comment 5 lidong chen 2015-01-28 13:36:12 UTC
Created attachment 985156
controller3

