Bug 1910063 - internal inactive snapshot of a VM with <disk type='volume'> results in an abort() of libvirtd
Summary: internal inactive snapshot of a VM with <disk type='volume'> results in an abort() of libvirtd
Keywords:
Status: POST
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 33
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs
 
Reported: 2020-12-22 14:16 UTC by Morgan Weetman
Modified: 2021-01-09 11:43 UTC (History)
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments (Terms of Use)
libvirtd debug log (12.53 MB, text/plain), 2021-01-09 10:06 UTC, Morgan Weetman
stack trace during snapshot attempt (8.97 KB, text/plain), 2021-01-09 10:24 UTC, Morgan Weetman

Description Morgan Weetman 2020-12-22 14:16:08 UTC
Description of problem:
A qcow2 image whose name contains spaces cannot be snapshotted; virt-manager crashes and disconnects.

Version-Release number of selected component (if applicable):
virt-manager-3.2.0-1.fc33.noarch

How reproducible:


Steps to Reproduce:
1. Download https://sourceforge.net/projects/owaspbwa/
2. Extract
3. sudo virt-v2v -i vmx -of qcow2 -os <storage_pool> <vm_name>.vmx
4. Attempt to create a snapshot; virt-manager crashes and disconnects

Actual results:
virt-manager crashes

Expected results:
snapshot is successfully created, or a warning is provided regarding qcow2 names with spaces

Additional info:

Comment 1 Richard W.M. Jones 2020-12-22 15:06:22 UTC
I don't know if it's relevant, but virt-v2v creates the following libvirt XML:

$ virt-v2v -i vmx OWASP\ Broken\ Web\ Apps.vmx -o local -of qcow2 -os /var/tmp/
[   0.0] Opening the source -i vmx OWASP Broken Web Apps.vmx
[   0.0] Creating an overlay to protect the source from being modified
[   0.3] Opening the overlay
[  12.8] Inspecting the overlay
[  15.0] Checking for sufficient free disk space in the guest
[  15.0] Estimating space required on target for each disk
[  15.0] Converting Ubuntu 10.04.1 LTS to run on KVM
virt-v2v: This guest has virtio drivers installed.
[  25.3] Mapping filesystem data to avoid copying unused and blank areas
[  26.4] Closing the overlay
[  26.7] Assigning disks to buses
[  26.7] Checking if the guest needs BIOS or UEFI to boot
[  26.7] Initializing the target -o local -os /var/tmp/
[  26.7] Copying disk 1/1 to /var/tmp/OWASP Broken Web Apps VM v1.2-sda (qcow2)
[  69.6] Creating output metadata
[  69.6] Finishing off

$ cat /var/tmp/OWASP\ Broken\ Web\ Apps\ VM\ v1.2.xml 
<?xml version='1.0' encoding='utf-8'?>
<domain type='kvm'>
  <!-- generated by virt-v2v 1.43.2local,libvirt -->
  <name>OWASP Broken Web Apps VM v1.2</name>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo='http://libosinfo.org/xmlns/libvirt/domain/1.0'>
      <libosinfo:os id='http://ubuntu.com/ubuntu/10.04'/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu>1</vcpu>
  <features>
    <acpi/>
    <apic/>
  </features>
  <os>
    <type arch='x86_64' machine='q35'>hvm</type>
  </os>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/tmp/OWASP Broken Web Apps VM v1.2-sda'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <!-- sata slot 0 is empty -->
    <!-- sata slot 1 is empty -->
    <disk device='cdrom' type='file'>
      <driver name='qemu' type='raw'/>
      <target dev='sdc' bus='sata'/>
    </disk>
    <interface type='network'>
      <source network='ethernet0'/>
      <model type='virtio'/>
      <mac address='00:0c:29:8f:ca:00'/>
    </interface>
    <video>
      <model type='qxl' ram='65536' heads='1'/>
    </video>
    <graphics type='vnc' autoport='yes' port='-1'/>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
    </rng>
    <memballoon model='virtio'/>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <console type='pty'/>
  </devices>
</domain>

It wouldn't be something stupid like the qcow2 file not having a .qcow2
extension, would it?  The path and file type are correct in the XML:

$ file /var/tmp/OWASP\ Broken\ Web\ Apps\ VM\ v1.2-sda 
/var/tmp/OWASP Broken Web Apps VM v1.2-sda: QEMU QCOW2 Image (v3), 8589934592 bytes

Comment 2 Richard W.M. Jones 2020-12-22 15:07:04 UTC
Morgan, could you tell us which version of:

virt-v2v
libvirt
qemu

you are using please.

Comment 3 Morgan Weetman 2020-12-22 21:25:29 UTC
Hi Richard,

I did rename the disk to have a .qcow2 extension at the same time I removed the spaces, but I discarded that as an unlikely cause.

virt-v2v-1.43.1-5.fc33.1.x86_64
libvirt-6.6.0-3.fc33.x86_64
qemu-kvm-core-5.1.0-5.fc33.x86_64

thanks

Comment 4 Morgan Weetman 2020-12-23 06:19:59 UTC
You were right, Richard: it was the lack of a file extension on the disk that was causing the crash.
I ignored it because I believed that all Linux tools used magic or the MIME type to determine the file type rather than trusting the given extension.
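
For what it's worth, the format that qemu and libvirt act on does come from the image header, not the file name; a minimal sketch of confirming that with qemu-img (the path is the one virt-v2v printed above):

#!/usr/bin/env python3
# Read the image format from the file header with qemu-img, ignoring the
# (missing) extension. Path taken from the virt-v2v output earlier in this bug.
import json
import subprocess

path = "/var/tmp/OWASP Broken Web Apps VM v1.2-sda"
info = json.loads(
    subprocess.check_output(["qemu-img", "info", "--output=json", path])
)
print(info["format"])  # expected to print "qcow2" despite the extension-less name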

Comment 5 Daniel Berrangé 2021-01-04 13:43:49 UTC
I wonder if the virt-manager reconnect is a sign of libvirtd itself crashing? Can you see if there's any sign of that in your system logs? If so, it'd be useful to collect a stack trace of all threads from libvirtd.

Comment 6 Morgan Weetman 2021-01-05 02:01:23 UTC
Hi Daniel,

Yes, it looks like it crashes libvirtd which then restarts:

Error creating snapshot: End of file while reading data: Input/output error

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/details/snapshots.py", line 237, in _do_create_snapshot
    self.vm.create_snapshot(xml)
  File "/usr/share/virt-manager/virtManager/object/domain.py", line 1124, in create_snapshot
    self._backend.snapshotCreateXML(xml, flags)
  File "/usr/lib64/python3.9/site-packages/libvirt.py", line 2814, in snapshotCreateXML
    if ret is None:raise libvirtError('virDomainSnapshotCreateXML() failed', dom=self)
libvirt.libvirtError: End of file while reading data: Input/output error
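
For reference, the failing call in that traceback can be driven directly with the libvirt Python bindings; a minimal sketch, assuming the converted guest is defined under qemu:///system with the name virt-v2v gave it (the snapshot name is illustrative):

#!/usr/bin/env python3
# Minimal sketch of the call that fails in the traceback above. The domain name
# is assumed to be the one virt-v2v created; the snapshot name is made up.
import libvirt

SNAPSHOT_XML = "<domainsnapshot><name>test-snap</name></domainsnapshot>"

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("OWASP Broken Web Apps VM v1.2")
try:
    # On an affected libvirtd this raises libvirtError("End of file while
    # reading data") because the daemon abort()s while preparing the snapshot.
    dom.snapshotCreateXML(SNAPSHOT_XML, 0)
finally:
    conn.close()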




Jan 05 12:51:54 host.domain.tld libvirtd[1083138]: Domain id=16 name='guestfs-4vkzt50wou6tjvi4' uuid=8c9ee820-9bc9-45ae-a41a-f763f7e21312 is tainted: host-cpu
Jan 05 12:51:56 host.domain.tld libvirtd[1083138]: missing device in NIC_RX_FILTER_CHANGED event
Jan 05 12:52:07 host.domain.tld libvirtd[1083138]: End of file while reading data: Input/output error
Jan 05 12:52:23 host.domain.tld libvirtd[1083138]: Domain id=17 name='guestfs-d1vki4vjni4svcbt' uuid=64d35f85-ebd2-4dd8-868e-ce841c35f887 is tainted: custom-argv
Jan 05 12:52:23 host.domain.tld libvirtd[1083138]: Domain id=17 name='guestfs-d1vki4vjni4svcbt' uuid=64d35f85-ebd2-4dd8-868e-ce841c35f887 is tainted: host-cpu
Jan 05 12:52:24 host.domain.tld libvirtd[1083138]: missing device in NIC_RX_FILTER_CHANGED event
Jan 05 12:52:43 host.domain.tld libvirtd[1083138]: End of file while reading data: Input/output error
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Main process exited, code=killed, status=6/ABRT
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Failed with result 'signal'.
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Unit process 1601 (dnsmasq) remains running after unit stopped.
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Unit process 1602 (dnsmasq) remains running after unit stopped.
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Unit process 1653 (dnsmasq) remains running after unit stopped.
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Unit process 1654 (dnsmasq) remains running after unit stopped.
Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Consumed 11min 30.671s CPU time.
Jan 05 12:53:04 host.domain.tld systemd[1]: libvirtd.service: Scheduled restart job, restart counter is at 1.
Jan 05 12:53:04 host.domain.tld systemd[1]: Stopped Virtualization daemon.
Jan 05 12:53:04 host.domain.tld systemd[1]: libvirtd.service: Consumed 11min 30.671s CPU time.
Jan 05 12:53:04 host.domain.tld systemd[1]: libvirtd.service: Found left-over process 1601 (dnsmasq) in control group while starting unit. Ignoring.
Jan 05 12:53:04 host.domain.tld systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 05 12:53:04 host.domain.tld systemd[1]: libvirtd.service: Found left-over process 1602 (dnsmasq) in control group while starting unit. Ignoring.
Jan 05 12:53:04 host.domain.tld systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 05 12:53:04 host.domain.tld systemd[1]: libvirtd.service: Found left-over process 1653 (dnsmasq) in control group while starting unit. Ignoring.
Jan 05 12:53:04 host.domain.tld systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 05 12:53:04 host.domain.tld systemd[1]: libvirtd.service: Found left-over process 1654 (dnsmasq) in control group while starting unit. Ignoring.
Jan 05 12:53:04 host.domain.tld systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 05 12:53:04 host.domain.tld systemd[1]: Starting Virtualization daemon...
Jan 05 12:53:04 host.domain.tld systemd[1]: Started Virtualization daemon.
Jan 05 12:53:04 host.domain.tld dnsmasq[1601]: read /etc/hosts - 2 addresses
Jan 05 12:53:04 host.domain.tld dnsmasq[1653]: read /etc/hosts - 2 addresses
Jan 05 12:53:04 host.domain.tld dnsmasq[1653]: read /var/lib/libvirt/dnsmasq/isolated.addnhosts - 0 addresses
Jan 05 12:53:04 host.domain.tld dnsmasq[1601]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Jan 05 12:53:04 host.domain.tld dnsmasq-dhcp[1601]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Jan 05 12:53:04 host.domain.tld dnsmasq-dhcp[1653]: read /var/lib/libvirt/dnsmasq/isolated.hostsfile

Comment 7 Peter Krempa 2021-01-05 14:58:04 UTC
I tried reproducing this bug with upstream libvirt and was unsuccessful. Please attach additional information according to our documentation on filing bugs:

https://libvirt.org/bugs.html#quality

Specifically, if libvirtd crashed, attach a backtrace as Daniel requested, and also attach debug logs:

https://libvirt.org/kbase/debuglogs.html

Comment 8 Daniel Berrangé 2021-01-05 15:02:34 UTC
We can definitely see libvirtd crashing from the system log here:

Jan 05 12:53:03 host.domain.tld systemd[1]: libvirtd.service: Main process exited, code=killed, status=6/ABRT

so we do need the stack trace.

Comment 9 Morgan Weetman 2021-01-09 10:06:55 UTC
Created attachment 1745811 [details]
libvirtd debug log

Debug log attached

Comment 10 Morgan Weetman 2021-01-09 10:24:39 UTC
Created attachment 1745813 [details]
stack trace during snapshot attempt

Comment 11 Peter Krempa 2021-01-09 11:43:11 UTC
So the backtrace:

Thread 2 (Thread 0x7ff9f1b1d640 (LWP 1546780) "rpc-worker"):
#0  0x00007ffa0096d9d5 in raise () from /lib64/libc.so.6
#1  0x00007ffa009568a4 in abort () from /lib64/libc.so.6
#2  0x00007ffa00df8f46 in virCommandAddArg.cold () from /lib64/libvirt.so.0
#3  0x00007ff9cb99c62d in qemuDomainSnapshotForEachQcow2Raw () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#4  0x00007ff9cba40ede in qemuDomainSnapshotCreateXML () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#5  0x00007ffa0106d634 in virDomainSnapshotCreateXML () from /lib64/libvirt.so.0
#6  0x00005641413c04b8 in remoteDispatchDomainSnapshotCreateXMLHelper ()
#7  0x00007ffa00f78b80 in virNetServerProgramDispatch () from /lib64/libvirt.so.0
#8  0x00007ffa00f7db64 in virNetServerHandleJob () from /lib64/libvirt.so.0
#9  0x00007ffa00e8d392 in virThreadPoolWorker () from /lib64/libvirt.so.0
#10 0x00007ffa00e8c985 in virThreadHelper () from /lib64/libvirt.so.0
#11 0x00007ffa00b1f3f9 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffa00a31903 in clone () from /lib64/libc.so.6

corresponds to the backtrace we've got in an upstream issue reported recently:

https://gitlab.com/libvirt/libvirt/-/issues/97

What doesn't add up is that the problem in that issue happens only with <disk type='volume'>, which isn't in the XML that libguestfs uses (posted by Rich), nor is there any mention of another disk being added.

From the debug log, though, I can see that the definition has 2 disks, and also that a <disk type='volume'> translation happens right before the crash:

2021-01-07 12:23:19.900+0000: 1471318: debug : virStoragePoolLookupByName:367 : conn=0x7f69300046d0, name=default
2021-01-07 12:23:19.900+0000: 1471318: debug : virStorageVolLookupByName:1330 : pool=0x7f694000c810, name=blah-sda

Thus I'm inferring that the reproducer steps above are not accurate and that in fact another disk, configured as <disk type='volume'>, was added to the VM, so this bug is the same as the above-mentioned upstream issue.
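
For illustration, a second disk of the kind implied by that pool/volume lookup would be defined and exercised roughly as follows (pool 'default' and volume 'blah-sda' are taken from the debug log; the domain name, target device, driver type and snapshot name are assumptions):

#!/usr/bin/env python3
# Sketch of the configuration that appears to trigger the abort(): a disk
# defined as <disk type='volume'>. Pool and volume names come from the debug
# log; everything else (domain name, target dev, snapshot name) is assumed.
import libvirt

DISK_XML = """
<disk type='volume' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source pool='default' volume='blah-sda'/>
  <target dev='vdb' bus='virtio'/>
</disk>
"""

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("OWASP Broken Web Apps VM v1.2")
# Add the volume-backed disk to the persistent (inactive) definition ...
dom.attachDeviceFlags(DISK_XML, libvirt.VIR_DOMAIN_AFFECT_CONFIG)
# ... after which an internal inactive snapshot attempt abort()s an unfixed libvirtd.
dom.snapshotCreateXML("<domainsnapshot><name>crash-test</name></domainsnapshot>", 0)
conn.close()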

The following commit fixes it:

commit 926563dc3a6c3f1598c5a54626acf31b7d3a6043
Author: Peter Krempa <pkrempa>
Date:   Mon Nov 23 11:13:19 2020 +0100

    qemuDomainSnapshotForEachQcow2: Pass in 'def' rather than selecting it internally
    
    In some cases such as when creating an internal inactive snapshot we
    know that the domain definition in the snapshot is equivalent to the
    current definition. Additionally we set up the current definition for
    the snapshotting but not the one contained in the snapshot. Thus in some
    cases the caller knows better which def to use.
    
    Make qemuDomainSnapshotForEachQcow2 take the definition by the caller
    and copy the logic for selecting the definition to callers where we
    don't know for sure that the above claim applies.
    
    This fixes internal inactive snapshots when <disk type='volume'> is used
    as we translate the pool/vol combo only in the current def.
    
    Resolves: https://gitlab.com/libvirt/libvirt/-/issues/97
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>

v6.9.0-378-g926563dc3a

The commit is in the v6.10.0 release, which is available in the virt-preview repository. Moving to POST so that the Fedora maintainer sees this and decides whether it's worth backporting this commit.
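
A quick way to check whether the libvirtd you are connected to is new enough to carry the fix is to compare the daemon's reported version against 6.10.0 (the threshold is my reading of the v6.9.0-378-g926563dc3a description above); a minimal sketch:

#!/usr/bin/env python3
# Report the daemon's libvirt version and compare against 6.10.0, the first
# upstream release assumed to contain commit 926563dc3a (threshold taken from
# the v6.9.0-378 tag description above).
import libvirt

conn = libvirt.open("qemu:///system")
ver = conn.getLibVersion()  # encoded as major * 1000000 + minor * 1000 + release
major, minor, release = ver // 1000000, (ver % 1000000) // 1000, ver % 1000
status = "should contain the fix" if ver >= 6010000 else "needs the backport"
print(f"libvirt daemon {major}.{minor}.{release}: {status}")
conn.close()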

