Bug 1020216 - libvirt fails to shut down domain: could not destroy libvirt domain: Requested operation is not valid: domain is not running
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 22
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs
Reported: 2013-10-17 10:08 UTC by Richard W.M. Jones
Modified: 2016-04-26 14:12 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-09-21 22:09:03 UTC
Type: Bug
Embargoed:



Description Richard W.M. Jones 2013-10-17 10:08:59 UTC
Description of problem:

I get this error intermittently when calling virDomainDestroyFlags
with flags=VIR_DOMAIN_DESTROY_GRACEFUL.

Fatal error: exception Guestfs.Error("could not destroy libvirt domain: Requested operation is not valid: domain is not running [code=55 domain=10]")

The domain has possibly exited by itself before we call
virDomainDestroyFlags.

However, and this is strange: if I add a sleep to the guest
so it doesn't shut down immediately, eg. 'sleep 30', then
virDomainDestroyFlags will hang for 30 seconds, and *then*
give the same error as above.

There are no errors in the qemu log file.

qemu does not appear to be segfaulting (so different from bug 853369).

Version-Release number of selected component (if applicable):

libvirt-daemon-1.1.3-2.fc21.x86_64
qemu-1.4.2-11.fc19.x86_64
kernel-3.10.9-200.fc19.x86_64

(Will try updating to qemu from Rawhide shortly)

How reproducible:

Not reliably reproducible.  Right now on my laptop it's happening
90% of the time, but usually it doesn't happen at all.

Steps to Reproduce:
1. Run a virt tool such as virt-resize.

Comment 1 Richard W.M. Jones 2013-10-17 11:23:27 UTC
Some more random data points:

If the machine is loaded with disk activity, then the bug doesn't
happen.  It seems like a race condition of some sort.

Upgrading to qemu-1.6.0-10.fc21 does appear to have made the bug
happen less often.

I'm afraid I don't have a good reproducer for this.  It may
be connected with ./configure --enable-valgrind-daemon which is
a debugging option that changes the order of shutdown: in production
builds we always rely on libvirt actively killing qemu, but when
--enable-valgrind-daemon is used, the appliance can shut itself
down.  Production builds would never have this option enabled.

For reference the command I'm actually using to reproduce this locally is:

LIBGUESTFS_DEBUG=1 ./run ./builder/website/test-guest.sh fedora-18

Comment 2 Richard W.M. Jones 2013-10-18 12:43:39 UTC
(In reply to Richard W.M. Jones from comment #0)
> However, and this is strange: if I add a sleep to the guest
> so it doesn't shut down immediately, eg. 'sleep 30', then
> virDomainDestroyFlags will hang for 30 seconds, and *then*
> give the same error as above.

Note: This part is NOT strange.  The hang here was in libguestfs.
Just ignore this paragraph in the bug description.

Comment 3 Daniel Berrangé 2013-10-18 12:45:40 UTC
On the surface this doesn't really look like a bug. If the guest is not running when virDomainDestroyFlags is called, then getting back this error code is expected. So the real question here is why QEMU exited before libguestfs expected it to.
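
To illustrate the point that this error simply signals "the domain was already stopped", here is a minimal sketch of how a caller might recognise it. The helper name is hypothetical, and it assumes the error text carries the "[code=55 ...]" suffix shown in the description (55 being libvirt's VIR_ERR_OPERATION_INVALID). It says nothing about whether QEMU exited cleanly or crashed.

```shell
# Hypothetical helper: succeeds if the given error text carries libvirt
# error code 55, as in the message quoted in the description.
is_operation_invalid() {
  case "$1" in
    *"[code=55 "*) return 0 ;;
    *) return 1 ;;
  esac
}

msg='could not destroy libvirt domain: Requested operation is not valid: domain is not running [code=55 domain=10]'

if is_operation_invalid "$msg"; then
  verdict="already stopped"
else
  verdict="other error"
fi
echo "$verdict"
```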

Comment 4 Daniel Berrangé 2013-10-18 12:49:11 UTC
Can you capture a trace of libvirtd with the following log settings:

  LIBVIRT_LOG_OUTPUTS="1:qemu 1:command 1:security 1:process 1:cgroup"

while triggering the 'virDomainDestroyFlags' API, and also provide the corresponding /var/log/libvirt/qemu/$GUEST.log?  Comparing the timestamps in the two logs may let us identify the sequence of events.

Comment 5 Richard W.M. Jones 2013-10-18 13:49:52 UTC
Unfortunately, the overhead of debugging makes the bug go away ...

Here is the script I'm using:

-------------------
vfile=/tmp/libvirt.log
gfile=/tmp/guestfs.log
rm -f $vfile $gfile
dir=$HOME/d/libguestfs

export LIBVIRT_DEBUG=1
export LIBVIRT_LOG_OUTPUTS="1:qemu 1:command 1:security 1:process 1:cgroup 1:file:$vfile"
export LIBGUESTFS_DEBUG=1
export LIBGUESTFS_TRACE=1

$dir/run $dir/builder/virt-builder \
  fedora-19 --output /tmp/fedora-19.img --size 10G |& tee $gfile

ls -l $vfile $gfile
-------------------

Why does that script never write to libvirt.log?

(In reply to Daniel Berrange from comment #3)
> On the surface this doesn't really look like a bug. If the guest is not
> running when virDomainDestroyFlags is called, then getting back this error
> code is expected. So the real question here is why QEMU exited before
> libguestfs expected it to.

As I mentioned on IRC:

(1) We need to find out if qemu segfaulted during shutdown.
That's the reason for the graceful flag:
https://bugzilla.redhat.com/show_bug.cgi?id=853369#c12

(2) While it may be true that currently virDomainDestroyFlags acts
like you've described, it's not useful behaviour.  What we really
want is more like how Unix kill + waitpid works, ie. you can kill
a process and wait for its exit status, and that works even if the
process exits itself before or between the two system calls.
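
A rough shell analogue of the kill + waitpid behaviour described in (2), as a sketch only: the child here exits by itself with an arbitrary status (7) before the kill is attempted, yet the exit status can still be collected afterwards. The variable names are illustrative.

```shell
# Start a child that exits by itself with a known status.
( exit 7 ) &
child=$!

# Give the child time to exit before we try to kill it.
sleep 1

# The kill may fail because the process is already gone; that is not an error here.
kill -TERM "$child" 2>/dev/null || true

# The exit status is still available even though the child exited first.
status=0
wait "$child" || status=$?
echo "child exit status: $status"
```

The caller does not need to know whether the process was still alive at the time of the kill; it only looks at the collected status.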

Comment 6 Daniel Berrangé 2013-10-18 14:46:29 UTC
My bad, I gave the wrong env variable name

  LIBVIRT_LOG_FILTERS="1:qemu 1:command 1:security 1:process 1:cgroup"
  LIBVIRT_LOG_OUTPUTS="1:file:/var/log/libvirt/libvirtd.log"
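
For reference, a sketch of how the corrected variables might slot into the script from comment 5, reusing its /tmp/libvirt.log path instead of /var/log/libvirt/libvirtd.log. This assumes the process that should do the logging actually inherits this environment, which may not hold for a daemon started by a service manager.

```shell
# Log file path taken from the script in comment 5.
vfile=/tmp/libvirt.log

# Filters select which log categories are recorded and at what level (1 = debug).
export LIBVIRT_LOG_FILTERS="1:qemu 1:command 1:security 1:process 1:cgroup"
# Outputs select where messages at or above the given level are written.
export LIBVIRT_LOG_OUTPUTS="1:file:$vfile"
```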

Comment 7 Richard W.M. Jones 2014-03-06 17:39:19 UTC
For some reason this bug has started happening again.

libvirt-1.2.2-1.fc21.x86_64
qemu-1.7.0-5.fc21.x86_64

I'll see if I can collect some debug information this time ...

Comment 8 Jaroslav Reznik 2015-03-03 15:08:42 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Comment 9 Cole Robinson 2015-09-21 22:09:03 UTC
Haven't heard much on this bug for a while, so I'm assuming it's gone. If anyone is still hitting this, please reopen.

