Bug 1020216
Summary: | libvirt fails to shut down domain: could not destroy libvirt domain: Requested operation is not valid: domain is not running | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Richard W.M. Jones <rjones> |
Component: | libvirt | Assignee: | Libvirt Maintainers <libvirt-maint> |
Status: | CLOSED DEFERRED | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 22 | CC: | berrange, clalancette, crobinso, itamar, jforbes, laine, libvirt-maint, veillard, virt-maint |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2015-09-21 22:09:03 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 910269 |
Description
Richard W.M. Jones
2013-10-17 10:08:59 UTC
Some more random data points: If the machine is loaded with disk activity, then the bug doesn't happen. It seems like a race condition of some sort. Upgrading to qemu-1.6.0-10.fc21 does appear to have made the bug happen less often.

I'm afraid I don't have a good reproducer for this. It may be connected with ./configure --enable-valgrind-daemon, which is a debugging option that changes the order of shutdown: in production builds we always rely on libvirt actively killing qemu, but when --enable-valgrind-daemon is used, the appliance can shut itself down. Production builds would never have this option enabled.

For reference, the command I'm actually using to reproduce this locally is:

LIBGUESTFS_DEBUG=1 ./run ./builder/website/test-guest.sh fedora-18

(In reply to Richard W.M. Jones from comment #0)
> However, and this is strange: if I add a sleep to the guest
> so it doesn't shut down immediately, eg. 'sleep 30', then
> virDomainDestroyFlags will hang for 30 seconds, and *then*
> give the same error as above.

Note: This part is NOT strange. The hang here was in libguestfs. Just ignore this paragraph in the bug description.

On the surface this doesn't really look like a bug. If the guest is not running when virDomainDestroyFlags is called, then getting back this error code is expected. So the real question here is why QEMU is exited before libguestfs expected it to.

Can you capture a trace of libvirtd with the following log settings

LIBVIRT_LOG_OUTPUTS="1:qemu 1:command 1:security 1:process 1:cgroup"

while triggering the 'virDomainDestroyFlags' API, and also provide the corresponding /var/log/libvirt/qemu/$GUEST.log? The timestamps between the two may let us identify the sequencing.

Unfortunately, the overhead of debugging makes the bug go away ...
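For reference, a minimal sketch of the logging setup requested above, written the way libvirt expects it: libvirt separates *which* messages are logged (LIBVIRT_LOG_FILTERS, a space-separated list of priority:match pairs, where priority 1 is debug) from *where* they are written (LIBVIRT_LOG_OUTPUTS, e.g. priority:file:path). The filter list therefore belongs in LIBVIRT_LOG_FILTERS, not in LIBVIRT_LOG_OUTPUTS. The log path is taken from the reporter's script and is only an example.

```shell
#!/bin/sh
# Destination for the debug log (path taken from the reporter's script;
# change it as needed).
logfile=/tmp/libvirt.log

# Filters: which source modules to log, at priority 1 (debug).
export LIBVIRT_LOG_FILTERS="1:qemu 1:command 1:security 1:process 1:cgroup"

# Outputs: where to write messages of priority 1 (debug) and above.
export LIBVIRT_LOG_OUTPUTS="1:file:$logfile"

# These only affect processes started from this environment; for the QEMU
# driver that presumably means libvirtd has to be (re)started with them.
echo "LIBVIRT_LOG_FILTERS=$LIBVIRT_LOG_FILTERS"
echo "LIBVIRT_LOG_OUTPUTS=$LIBVIRT_LOG_OUTPUTS"
```

If a libvirtd is already running with a different environment, setting these variables in the client's shell is unlikely to change what it logs.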
Here is the script I'm using:

-------------------
vfile=/tmp/libvirt.log
gfile=/tmp/guestfs.log
rm -f $vfile $gfile
dir=$HOME/d/libguestfs
export LIBVIRT_DEBUG=1
export LIBVIRT_LOG_OUTPUTS="1:qemu 1:command 1:security 1:process 1:cgroup 1:file:$vfile"
export LIBGUESTFS_DEBUG=1
export LIBGUESTFS_TRACE=1
$dir/run $dir/builder/virt-builder \
    fedora-19 --output /tmp/fedora-19.img --size 10G |& tee $gfile
ls -l $vfile $gfile
-------------------

Why does that script never write to libvirt.log?

(In reply to Daniel Berrange from comment #3)
> On the surface this doesn't really look like a bug. If the guest is not
> running when virDomainDestroyFlags is called, then getting back this error
> code is expected. So the real question here is why QEMU is exited before
> libguestfs expected it to.

As I mentioned on IRC:

(1) We need to find out if qemu segfaulted during shutdown. That's the reason for the graceful flag: https://bugzilla.redhat.com/show_bug.cgi?id=853369#c12

(2) While it may be true that currently virDomainDestroyFlags acts like you've described, it's not useful behaviour. What we really want is more like how Unix kill + waitpid works, i.e. you can kill a process and wait for its exit status, and that works even if the process exits itself before or between the two system calls.

My bad, I gave the wrong env variable name:

LIBVIRT_LOG_FILTERS="1:qemu 1:command 1:security 1:process 1:cgroup"
LIBVIRT_LOG_OUTPUTS="1:file:/var/log/libvirt/libvirtd.log"

For some reason this bug has started happening again.

libvirt-1.2.2-1.fc21.x86_64
qemu-1.7.0-5.fc21.x86_64

I'll see if I can collect some debug information this time ...

This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle. Changing version to '22'.

More information and reason for this action is here: https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Haven't heard much on this bug for a while, so assuming it's gone. If anyone is still hitting this, please reopen.
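The kill + waitpid model asked for in point (2) can be illustrated with plain processes. This is a shell analogy, not libvirt code: a child exits on its own before the "destroy" step, the kill then finds nothing to kill, yet the exit status is still collectable afterwards. That is the property being requested of virDomainDestroyFlags, which instead returns "domain is not running" and gives no way to learn how the guest exited.

```shell
#!/bin/bash
# A child that exits on its own, with status 3, before anyone kills it.
sh -c 'exit 3' &
pid=$!

# Give the child time to exit first, mimicking the guest shutting itself
# down before the caller gets round to destroying it.
sleep 1

# The "destroy" step: finding the process already gone is not an error here.
if ! kill "$pid" 2>/dev/null; then
    echo "kill: process $pid had already exited"
fi

# The "wait" step still yields the child's real exit status, because the
# shell remembers the status of its background children.
wait "$pid"
status=$?
echo "exit status: $status"
```

In C the same pattern is kill(2) followed by waitpid(2): until the parent reaps it, an exited child remains a zombie, so the status is never lost regardless of the order in which the exit and the kill happen.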