Bug 1910063
Summary: internal inactive snapshot of a VM with <disk type='volume'> results in an abort() of libvirtd

Product: Fedora
Component: libvirt
Version: 33
Status: POST
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Reporter: Morgan Weetman <mweetman>
Assignee: Libvirt Maintainers <libvirt-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: agedosier, berrange, clalancette, itamar, jforbes, laine, libvirt-maint, pkrempa, rjones, veillard, virt-maint
Target Milestone: ---
Target Release: ---
Bug Blocks: 910269
Description (Morgan Weetman, 2020-12-22 14:16:08 UTC)
I don't know if it's relevant, but virt-v2v creates the following libvirt XML:

```
$ virt-v2v -i vmx OWASP\ Broken\ Web\ Apps.vmx -o local -of qcow2 -os /var/tmp/
[   0.0] Opening the source -i vmx OWASP Broken Web Apps.vmx
[   0.0] Creating an overlay to protect the source from being modified
[   0.3] Opening the overlay
[  12.8] Inspecting the overlay
[  15.0] Checking for sufficient free disk space in the guest
[  15.0] Estimating space required on target for each disk
[  15.0] Converting Ubuntu 10.04.1 LTS to run on KVM
virt-v2v: This guest has virtio drivers installed.
[  25.3] Mapping filesystem data to avoid copying unused and blank areas
[  26.4] Closing the overlay
[  26.7] Assigning disks to buses
[  26.7] Checking if the guest needs BIOS or UEFI to boot
[  26.7] Initializing the target -o local -os /var/tmp/
[  26.7] Copying disk 1/1 to /var/tmp/OWASP Broken Web Apps VM v1.2-sda (qcow2)
[  69.6] Creating output metadata
[  69.6] Finishing off

$ cat /var/tmp/OWASP\ Broken\ Web\ Apps\ VM\ v1.2.xml
<?xml version='1.0' encoding='utf-8'?>
<domain type='kvm'>
  <!-- generated by virt-v2v 1.43.2local,libvirt -->
  <name>OWASP Broken Web Apps VM v1.2</name>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo='http://libosinfo.org/xmlns/libvirt/domain/1.0'>
      <libosinfo:os id='http://ubuntu.com/ubuntu/10.04'/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu>1</vcpu>
  <features>
    <acpi/>
    <apic/>
  </features>
  <os>
    <type arch='x86_64' machine='q35'>hvm</type>
  </os>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/tmp/OWASP Broken Web Apps VM v1.2-sda'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <!-- sata slot 0 is empty -->
    <!-- sata slot 1 is empty -->
    <disk device='cdrom' type='file'>
      <driver name='qemu' type='raw'/>
      <target dev='sdc' bus='sata'/>
    </disk>
    <interface type='network'>
      <source network='ethernet0'/>
      <model type='virtio'/>
      <mac address='00:0c:29:8f:ca:00'/>
    </interface>
    <video>
      <model type='qxl' ram='65536' heads='1'/>
    </video>
    <graphics type='vnc' autoport='yes' port='-1'/>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
    </rng>
    <memballoon model='virtio'/>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <console type='pty'/>
  </devices>
</domain>
```

It wouldn't be something stupid like the qcow2 file not having a .qcow2 extension, would it? The path and file type are correct in the XML:

```
$ file /var/tmp/OWASP\ Broken\ Web\ Apps\ VM\ v1.2-sda
/var/tmp/OWASP Broken Web Apps VM v1.2-sda: QEMU QCOW2 Image (v3), 8589934592 bytes
```

---

Morgan, could you tell us which versions of virt-v2v, libvirt, and qemu you are using, please?

---

Hi Richard, I did rename the disk to have a .qcow2 extension at the same time I removed the spaces, but I discarded that as an unlikely cause.

virt-v2v-1.43.1-5.fc33.1.x86_64
libvirt-6.6.0-3.fc33.x86_64
qemu-kvm-core-5.1.0-5.fc33.x86_64

thanks

---

You were right, Richard: it was the lack of a file extension for the disk causing the crash. I ignored it because I believed that all Linux tools used magic or MIME type to determine the file type rather than trusting the given extension.

---

I wonder if the virt-manager reconnect is a sign of libvirtd itself crashing? Can you see if there's any sign of that in your system logs? If so, it'd be useful to collect a stack trace of all threads from libvirtd.
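[Editor's note] The point above about magic-based detection is broadly right for inspection tools such as file(1): a qcow2 image starts with the 4-byte magic `QFI\xfb` followed by a big-endian version field, regardless of the file's extension. A minimal stdlib-only sketch of that check (the helper name is made up for illustration):

```python
import os
import struct
import tempfile

QCOW_MAGIC = b"QFI\xfb"  # 0x514649fb, the qcow/qcow2 signature bytes


def looks_like_qcow2(path):
    """Check a disk image's magic bytes instead of trusting its extension.

    Mirrors what file(1) does; the helper name is hypothetical.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != QCOW_MAGIC:
        return False
    version = struct.unpack(">I", header[4:8])[0]  # big-endian u32 at offset 4
    return version in (2, 3)


if __name__ == "__main__":
    # Fake qcow2 v3 header: magic + version, padded; note: no .qcow2 extension.
    fd, path = tempfile.mkstemp(suffix="-sda")
    os.write(fd, QCOW_MAGIC + struct.pack(">I", 3) + b"\0" * 64)
    os.close(fd)
    print(looks_like_qcow2(path))  # True despite the missing extension
    os.remove(path)
```

Whether a given consumer trusts content or extension is up to that tool, which is why the renamed file behaved differently.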
Hi Daniel, yes, it looks like it crashes libvirtd, which then restarts:

```
Error creating snapshot: End of file while reading data: Input/output error

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/details/snapshots.py", line 237, in _do_create_snapshot
    self.vm.create_snapshot(xml)
  File "/usr/share/virt-manager/virtManager/object/domain.py", line 1124, in create_snapshot
    self._backend.snapshotCreateXML(xml, flags)
  File "/usr/lib64/python3.9/site-packages/libvirt.py", line 2814, in snapshotCreateXML
    if ret is None:raise libvirtError('virDomainSnapshotCreateXML() failed', dom=self)
libvirt.libvirtError: End of file while reading data: Input/output error
```

```
Jan 05 12:51:54 host.domain.tld libvirtd[1083138]: Domain id=16 name='guestfs-4vkzt50wou6tjvi4' uuid=8c9ee820-9bc9-45ae-a41a-f763f7e21312 is tainted: host-cpu
Jan 05 12:51:56 host.domain.tld libvirtd[1083138]: missing device in NIC_RX_FILTER_CHANGED event
Jan 05 12:52:07 host.domain.tld libvirtd[1083138]: End of file while reading data: Input/output error
Jan 05 12:52:23 host.domain.tld libvirtd[1083138]: Domain id=17 name='guestfs-d1vki4vjni4svcbt' uuid=64d35f85-ebd2-4dd8-868e-ce841c35f887 is tainted: custom-argv
Jan 05 12:52:23 host.domain.tld libvirtd[1083138]: Domain id=17 name='guestfs-d1vki4vjni4svcbt' uuid=64d35f85-ebd2-4dd8-868e-ce841c35f887 is tainted: host-cpu
Jan 05 12:52:24 host.domain.tld libvirtd[1083138]: missing device in NIC_RX_FILTER_CHANGED event
Jan 05 12:52:43 host.domain.tld libvirtd[1083138]: End of file while reading data: Input/output error
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Main process exited, code=killed, status=6/ABRT
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Failed with result 'signal'.
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Unit process 1601 (dnsmasq) remains running after unit stopped.
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Unit process 1602 (dnsmasq) remains running after unit stopped.
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Unit process 1653 (dnsmasq) remains running after unit stopped.
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Unit process 1654 (dnsmasq) remains running after unit stopped.
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Consumed 11min 30.671s CPU time.
Jan 05 12:53:04 host.domain.tld systemd[1]: libvirtd.service: Scheduled restart job, restart counter is at 1.
Jan 05 12:53:04 host.domain.tld systemd[1]: Stopped Virtualization daemon.
Jan 05 12:53:04 host.domain.tld systemd[1]: libvirtd.service: Consumed 11min 30.671s CPU time.
Jan 05 12:53:04 host.domain.tld systemd[1]: libvirtd.service: Found left-over process 1601 (dnsmasq) in control group while starting unit. Ignoring.
Jan 05 12:53:04 host.domain.tld systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 05 12:53:04 host.domain.tld systemd[1]: libvirtd.service: Found left-over process 1602 (dnsmasq) in control group while starting unit. Ignoring.
Jan 05 12:53:04 host.domain.tld systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 05 12:53:04 host.domain.tld systemd[1]: libvirtd.service: Found left-over process 1653 (dnsmasq) in control group while starting unit. Ignoring.
Jan 05 12:53:04 host.domain.tld systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 05 12:53:04 host.domain.tld systemd[1]: libvirtd.service: Found left-over process 1654 (dnsmasq) in control group while starting unit. Ignoring.
Jan 05 12:53:04 host.domain.tld systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 05 12:53:04 host.domain.tld systemd[1]: Starting Virtualization daemon...
Jan 05 12:53:04 host.domain.tld systemd[1]: Started Virtualization daemon.
Jan 05 12:53:04 host.domain.tld dnsmasq[1601]: read /etc/hosts - 2 addresses
Jan 05 12:53:04 host.domain.tld dnsmasq[1653]: read /etc/hosts - 2 addresses
Jan 05 12:53:04 host.domain.tld dnsmasq[1653]: read /var/lib/libvirt/dnsmasq/isolated.addnhosts - 0 addresses
Jan 05 12:53:04 host.domain.tld dnsmasq[1601]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Jan 05 12:53:04 host.domain.tld dnsmasq-dhcp[1601]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Jan 05 12:53:04 host.domain.tld dnsmasq-dhcp[1653]: read /var/lib/libvirt/dnsmasq/isolated.hostsfile
```

---

I tried reproducing this bug with upstream libvirt and was unsuccessful. Please attach additional information according to our documentation on filing bugs (https://libvirt.org/bugs.html#quality): specifically, if libvirtd crashed, attach a backtrace as Daniel requested, and also attach debug logs (https://libvirt.org/kbase/debuglogs.html).

---

We can definitely see libvirtd crashing from the system log here:

```
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Main process exited, code=killed, status=6/ABRT
```

so we do need the stack trace.

---

Created attachment 1745811 [details]
libvirtd debug log

Debug log attached.

---

Created attachment 1745813 [details]
stack trace during snapshot attempt

---
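[Editor's note] For context on the API being exercised here: the traceback above fails inside `dom.snapshotCreateXML()`, the Python binding for `virDomainSnapshotCreateXML()`, which takes a `<domainsnapshot>` XML document. A stdlib-only sketch of building such a request (the snapshot name and description are made up; the actual libvirt calls appear only in a comment so the sketch runs without libvirt installed):

```python
import xml.etree.ElementTree as ET


def snapshot_xml(name, description=None):
    # Build a minimal <domainsnapshot> document of the kind that
    # virDomainSnapshotCreateXML() accepts.
    root = ET.Element("domainsnapshot")
    ET.SubElement(root, "name").text = name
    if description:
        ET.SubElement(root, "description").text = description
    return ET.tostring(root, encoding="unicode")


snap_xml = snapshot_xml("s1", "internal snapshot before upgrade")
print(snap_xml)
# <domainsnapshot><name>s1</name><description>internal snapshot before upgrade</description></domainsnapshot>

# With libvirt-python installed, the crashing call corresponds to:
#   import libvirt
#   conn = libvirt.open("qemu:///system")
#   dom = conn.lookupByName("OWASP Broken Web Apps VM v1.2")
#   dom.snapshotCreateXML(snap_xml, 0)  # aborted libvirtd in the affected builds
```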
So the backtrace:

```
Thread 2 (Thread 0x7ff9f1b1d640 (LWP 1546780) "rpc-worker"):
#0  0x00007ffa0096d9d5 in raise () from /lib64/libc.so.6
#1  0x00007ffa009568a4 in abort () from /lib64/libc.so.6
#2  0x00007ffa00df8f46 in virCommandAddArg.cold () from /lib64/libvirt.so.0
#3  0x00007ff9cb99c62d in qemuDomainSnapshotForEachQcow2Raw () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#4  0x00007ff9cba40ede in qemuDomainSnapshotCreateXML () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#5  0x00007ffa0106d634 in virDomainSnapshotCreateXML () from /lib64/libvirt.so.0
#6  0x00005641413c04b8 in remoteDispatchDomainSnapshotCreateXMLHelper ()
#7  0x00007ffa00f78b80 in virNetServerProgramDispatch () from /lib64/libvirt.so.0
#8  0x00007ffa00f7db64 in virNetServerHandleJob () from /lib64/libvirt.so.0
#9  0x00007ffa00e8d392 in virThreadPoolWorker () from /lib64/libvirt.so.0
#10 0x00007ffa00e8c985 in virThreadHelper () from /lib64/libvirt.so.0
#11 0x00007ffa00b1f3f9 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffa00a31903 in clone () from /lib64/libc.so.6
```

corresponds to the backtrace we've got in an upstream issue reported recently: https://gitlab.com/libvirt/libvirt/-/issues/97

What doesn't add up is that the problem in that issue happens only with <disk type='volume'>, which is not in the XML that libguestfs uses (posted by Rich), nor is there any mention of another disk being added.
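[Editor's note] The check being performed here, whether the domain definition actually contains a volume-backed disk, can be sketched with stdlib XML parsing. The helper name and the example definition are illustrative only; the pool/volume names mirror the ones visible in the attached debug log:

```python
import xml.etree.ElementTree as ET


def volume_disks(domain_xml):
    """Return (pool, volume) pairs for every <disk type='volume'> in a
    domain definition; such disks need a pool/vol -> path translation
    before snapshotting can touch them."""
    root = ET.fromstring(domain_xml)
    found = []
    for disk in root.findall("./devices/disk"):
        if disk.get("type") == "volume":
            src = disk.find("source")
            found.append((src.get("pool"), src.get("volume")))
    return found


example = """
<domain type='kvm'>
  <name>blah</name>
  <devices>
    <disk type='volume' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source pool='default' volume='blah-sda'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""
print(volume_disks(example))  # [('default', 'blah-sda')]
```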
From the debug log, though, I can see that the definition has two disks, and also that a <disk type='volume'> translation happens right before the crash:

```
2021-01-07 12:23:19.900+0000: 1471318: debug : virStoragePoolLookupByName:367 : conn=0x7f69300046d0, name=default
2021-01-07 12:23:19.900+0000: 1471318: debug : virStorageVolLookupByName:1330 : pool=0x7f694000c810, name=blah-sda
```

Thus I'm inferring that the reproducer steps above are not accurate, and that in fact another disk configured as <disk type='volume'> was added to the VM, so this bug is the same as the above-mentioned upstream issue. The following commit fixes it:

```
commit 926563dc3a6c3f1598c5a54626acf31b7d3a6043
Author: Peter Krempa <pkrempa>
Date:   Mon Nov 23 11:13:19 2020 +0100

    qemuDomainSnapshotForEachQcow2: Pass in 'def' rather than selecting it internally

    In some cases such as when creating an internal inactive snapshot we
    know that the domain definition in the snapshot is equivalent to the
    current definition. Additionally we set up the current definition for
    the snapshotting but not the one contained in the snapshot. Thus in
    some cases the caller knows better which def to use.

    Make qemuDomainSnapshotForEachQcow2 take the definition by the caller
    and copy the logic for selecting the definition to callers where we
    don't know for sure that the above claim applies.

    This fixes internal inactive snapshots when <disk type='volume'> is
    used as we translate the pool/vol combo only in the current def.

    Resolves: https://gitlab.com/libvirt/libvirt/-/issues/97
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>
```

v6.9.0-378-g926563dc3a

The commit is in the v6.10.0 release, which is available in the virt-preview repository. Moving to POST so that the Fedora maintainer sees this and decides whether it's worth backporting this commit.
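[Editor's note] The shape of the fix, passing the definition in from the caller instead of selecting it inside the helper, can be illustrated abstractly. The real change is C code in libvirt's qemu driver; the Python names below are hypothetical stand-ins for it:

```python
class Def:
    """Stand-in for a domain definition holding a list of disks."""
    def __init__(self, disks):
        self.disks = disks


processed = []


def process(disk):
    # Stand-in for the per-disk qcow2 work (e.g. building the qemu-img
    # command that aborted when the source path was still untranslated).
    processed.append(disk)


# Before: the helper picks a definition itself. For an internal inactive
# snapshot it could choose the snapshot's copy, in which a
# <disk type='volume'> source was never translated to a concrete path.
def for_each_qcow2_old(current_def, snap_def, inactive):
    definition = snap_def if inactive else current_def
    for disk in definition.disks:
        process(disk)


# After (the pattern of the upstream fix): the caller, which knows which
# definition it actually prepared for snapshotting, passes it in.
def for_each_qcow2_new(definition):
    for disk in definition.disks:
        process(disk)


current = Def(["vda (translated path)"])
for_each_qcow2_new(current)
print(processed)  # ['vda (translated path)']
```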