Bug 1866823 - Launching guest via libguestfs on ppc64le fails
Summary: Launching guest via libguestfs on ppc64le fails
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: 8.3
Hardware: ppc64le
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 8.3
Assignee: Laurent Vivier
QA Contact: Qunfang Zhang
URL:
Whiteboard:
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs
Reported: 2020-08-06 13:40 UTC by Lon Hohberger
Modified: 2020-10-28 00:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-28 05:11:17 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 187741 0 None None None 2020-08-19 23:34:41 UTC

Description Lon Hohberger 2020-08-06 13:40:30 UTC
Description of problem:

qemu-kvm: Requested safe cache capability level not supported by kvm, try appending -machine cap-cfpc=broken

... when building indirection images in brew.

Extended errors:

libguestfs: launch libvirt guest
libguestfs: error: could not create appliance through libvirt.

Try running qemu directly without libvirt using this environment variable:
export LIBGUESTFS_BACKEND=direct

Original error from libvirt: internal error: process exited while connecting to monitor: 2020-08-05T20:00:42.080975Z qemu-kvm: Requested safe cache capability level not supported by kvm, try appending -machine cap-cfpc=broken [code=1 int1=-1]
libguestfs: trace: launch = -1 (error)
libguestfs: trace: close
libguestfs: closing guestfs handle 0x1001c6cde80 (state 0)
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfs9Sb7CN
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfsVCqvKz




Version-Release number of selected component (if applicable): qemu-kvm-15:4.2.0-29.module+el8.3.0+7212+401047e6 from virt:rhel


How reproducible: 100%


Steps to Reproduce:
1. Run an indirectionimage build in brew which uses libguestfs to finalize information in qcow2 images.

Actual results:

ppc64le builds fail
x86_64 builds work


Expected results:

Both architectures work


Additional info:

1. This happens regardless of whether LIBGUESTFS_BACKEND=direct is used.

2. This is a new problem on RHEL 8.3.0 builds. 8.2.x builds work correctly.
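To take brew and libvirt out of the picture, the failure can probably be reproduced with libguestfs-test-tool on the affected build host. The following is a minimal sketch, not the reporter's actual procedure: the helper names are hypothetical, and it assumes libguestfs-test-tool is installed. It sets the LIBGUESTFS_BACKEND=direct variable suggested by the error text and checks the captured output for the qemu-kvm message quoted above.

```python
import os
import subprocess

# Message text copied from the qemu-kvm error in this report.
CFPC_ERROR = "Requested safe cache capability level not supported by kvm"


def direct_backend_env(base=None):
    """Return a copy of the environment that makes libguestfs bypass libvirt."""
    env = dict(os.environ if base is None else base)
    env["LIBGUESTFS_BACKEND"] = "direct"  # run qemu directly, as the error text suggests
    env["LIBGUESTFS_DEBUG"] = "1"         # verbose launch messages
    env["LIBGUESTFS_TRACE"] = "1"         # trace API calls, as in the log above
    return env


def is_cfpc_failure(output):
    """True if the captured output contains the cap-cfpc launch error."""
    return CFPC_ERROR in output


def run_test_tool():
    """Run libguestfs-test-tool (assumed installed); return (exit code, output)."""
    proc = subprocess.run(
        ["libguestfs-test-tool"],
        env=direct_backend_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout
```

Calling run_test_tool() on the ppc64le builder and passing its output to is_cfpc_failure() would show whether the same error appears without libvirt, which is what item 1 above reports.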

Comment 4 Laurent Vivier 2020-08-06 19:57:44 UTC
Looks like a firmware issue (see BZ 1768551),

or perhaps an already-known problem between ppc64le and libguestfs (cc: Rich)?

Comment 5 Richard W.M. Jones 2020-08-06 20:43:50 UTC
I think it's either bug 1726075, or maybe it's a completely new one.  David - help!

Comment 6 David Gibson 2020-08-07 00:48:15 UTC
There are several possibilities.

a. It could be bug 1726075.  I suspect not, but I can't rule it out since the qemu version isn't given.

b. It could be that the host's firmware isn't recent enough to implement the Spectre workarounds that qemu is trying to activate.

c. It could be that libguestfs is trying to use KVM PR rather than KVM HV.


To narrow this down we'll need:
    1) What sort of machine is this happening on?  POWER8?  POWER9?  Which model?

    2) Is this happening in: a bare metal host?  an LPAR under PowerVM?  a KVM guest?

    3) What KVM modules are loaded? (lsmod output)

    4) What firmware versions are on the system (output of "lsprop /proc/device-tree/ibm,firmware-versions")
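The four items above could be gathered in one pass with something like the sketch below. It is only an illustration: the function names are hypothetical, the parsing assumes the usual "key : value" layout of /proc/cpuinfo on ppc64le and the usual /proc/modules layout, and lsprop is assumed to be available (it ships with powerpc-utils).

```python
import subprocess


def cpuinfo_fields(cpuinfo_text, keys=("cpu", "platform", "model", "machine")):
    """Return the first value seen for each requested key in /proc/cpuinfo text."""
    found = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in keys and key not in found:
            found[key] = value.strip()
    return found


def kvm_modules(proc_modules_text):
    """Return the names of loaded modules that start with 'kvm'."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("kvm"):
            names.append(fields[0])
    return names


def kvm_flavours(modules):
    """Map loaded module names to the KVM variants discussed above (HV, PR)."""
    flavours = []
    if "kvm_hv" in modules:
        flavours.append("HV")
    if "kvm_pr" in modules:
        flavours.append("PR")
    return flavours


def firmware_versions():
    """Run the lsprop command quoted above and return its output."""
    return subprocess.run(
        ["lsprop", "/proc/device-tree/ibm,firmware-versions"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    ).stdout


def read_text(path):
    """Small helper to read a /proc file as text."""
    with open(path) as handle:
        return handle.read()
```

On the affected system, cpuinfo_fields(read_text("/proc/cpuinfo")) and kvm_flavours(kvm_modules(read_text("/proc/modules"))) would cover questions 1 to 3, and firmware_versions() covers question 4 where the device-tree node exists (it may be absent inside a KVM guest).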

Comment 7 Lon Hohberger 2020-08-07 13:36:37 UTC
Note: retried with AV 8.3.0 with this build:

qemu-kvm  ppc64le  15:5.0.0-0.module+el8.3.0+6620+5d5e1420

Same failure, FWIW.

I will submit tasks that run "cat /proc/cpuinfo" and "lsmod". Are there any other things we need to gather?

I can follow up with the hardware owners to figure out the host's hardware information.

I believe the build occurs within a KVM guest. ImageFactory boots a VM using KVM from a "utility image", which contains the environment necessary to build application-specific images (in this case, initrds as well as full operating system images). ImageFactory then SSHs in to run commands.

Among these commands are various 'virt-customize' and such, which, due to the architecture of libguestfs, launch "lite VMs" inside the VM in order to perform those tasks.

(Again - this works on 8.2.0.)

Comment 8 Laurent Vivier 2020-08-07 13:56:01 UTC
For the record, the problem is introduced by:

commit 2782ad4c4102d57f7f8e135dce0c1adb0149de77
Author: Suraj Jitindar Singh <sjitindarsingh>
Date:   Fri Mar 1 15:46:09 2019 +1100

    target/ppc/spapr: Enable mitigations by default for pseries-4.0 machine type
    
    There are currently 3 mitigations the availability of which is controlled
    by the spapr-caps mechanism, cap-cfpc, cap-sbbc, and cap-ibs. Enable these
    mitigations by default for the pseries-4.0 machine type.
    
    By now machine firmware should have been upgraded to allow these
    settings.
    
    Signed-off-by: Suraj Jitindar Singh <sjitindarsingh>
    Message-Id: <20190301044609.9626-3-sjitindarsingh>
    Signed-off-by: David Gibson <david.id.au>

This explains why it works with rhel-8.2.0 (not rhel-av-8.2.0) because it is based on pseries-3.1 (pseries-rhel7.6.0).

All the pseries-rhel8.X.0 machine types have this change (provided by rhel-8.3.0 and all rhel-av-8.X.0).

Comment 9 Lon Hohberger 2020-08-07 17:23:08 UTC
Excellent - so it sounds like we need to upgrade firmware on the host, then?

Comment 10 Lon Hohberger 2020-08-07 17:36:08 UTC
I've asked for cpuinfo/lsmod/lsmcode; leaving needinfo until we have it

Comment 12 Laurent Vivier 2020-08-10 11:15:14 UTC
The content of /sys/devices/system/cpu/vulnerabilities/* might also help to know the vulnerability level.
And the result of "lsprop /proc/device-tree/ibm,opal/fw-features/*/" helps to know the firmware level.

I think we need either "Meltdown: Not affected" in the vulnerabilities (to enable cap-cfpc=fixed) or,
to enable cap-cfpc=workaround, "inst-l1d-flush-trig2" and "inst-l1d-flush-ori30,30,0" set to enabled in fw-features.
"fw-l1d-thread-split" is also needed on P9 but not on P8.

Comment 13 Laurent Vivier 2020-08-10 11:22:14 UTC
(In reply to Laurent Vivier from comment #12)
...
> (to enable cap-cfpc=workaround) we need "inst-l1d-flush-trig2" and
> "inst-l1d-flush-ori30,30,0" set to enabled in fw-features.

we need "inst-l1d-flush-trig2" or "inst-l1d-flush-ori30,30,0", not both.
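The rule from comment 12, with the correction above, can be written down as a small decision function. This is a sketch of the rule as stated in these comments, not a copy of qemu's code: the function name and arguments are hypothetical, and returning "broken" when neither condition holds is an assumption based on the error message, which suggests cap-cfpc=broken as the only level left.

```python
# Either of these fw-features entries is enough for the L1D flush (comment 13).
L1D_FLUSH_FEATURES = ("inst-l1d-flush-trig2", "inst-l1d-flush-ori30,30,0")


def expected_cfpc_level(meltdown_status, enabled_fw_features, is_power9):
    """Guess the best cap-cfpc level the host can offer.

    meltdown_status: the Meltdown status string from
        /sys/devices/system/cpu/vulnerabilities/ (for example "Not affected").
    enabled_fw_features: names of fw-features entries reported as enabled.
    is_power9: True on POWER9, where "fw-l1d-thread-split" is also required.
    """
    if meltdown_status.strip() == "Not affected":
        return "fixed"

    enabled = set(enabled_fw_features)
    has_flush = any(name in enabled for name in L1D_FLUSH_FEATURES)
    has_thread_split = (not is_power9) or ("fw-l1d-thread-split" in enabled)

    if has_flush and has_thread_split:
        return "workaround"

    # Assumption: nothing safer is available, so only cap-cfpc=broken would start.
    return "broken"
```

A host whose firmware predates these features would come out as "broken" here, which matches the situation where qemu refuses the default level of a pseries-rhel8.X.0 machine type.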

Comment 14 David Gibson 2020-08-24 04:49:35 UTC
Lon, ping?

This is one of our few remaining bugs for RHEL8.3, and we can't really resolve it without more input from you.

Comment 16 David Gibson 2020-09-21 04:28:07 UTC
Lon, without further input here, I'm going to have to close this as INSUFFICIENT_DATA.

Comment 21 Lon Hohberger 2020-10-07 18:33:18 UTC
I was late getting this information; however, the data show that this isn't actually a bug in qemu, simply outdated firmware. I'll file a ticket to get that updated.

Comment 22 David Gibson 2020-10-09 03:43:32 UTC
Understood, I thought that was a likely cause.

