1769600 – power9 boxes cannot successfully boot any Fedora image with qemu-4.1.0-2.fc31 (pseries-4.1 machine)

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	31
Hardware:	ppc64le
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	PPCTracker

Reported:	2019-11-06 22:56 UTC by Adam Williamson
Modified:	2020-01-13 12:41 UTC
CC List:	31 users
Fixed In Version:	kernel-5.3.15-300.fc31 kernel-5.3.15-200.fc30
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-12-10 02:55:05 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
powerpc/xive: skip ioremap() of ESB pages for LSI interrupts (5.58 KB, patch) 2019-12-03 16:30 UTC, Cedric Le Goater

Description Adam Williamson 2019-11-06 22:56:38 UTC

I upgraded the openQA staging boxes to Fedora 31 on October 30th. Since then, it seems the power9 worker hosts (which run VMs in which tests happen) have not successfully booted any images at all. I just tested and confirmed that they cannot boot an image which they previously booted fine, when running Fedora 30 before the upgrade, so this isn't a problem with the images, I don't think.

Here's a (sped-up) video of what happens: https://openqa.stg.fedoraproject.org/tests/665396/file/video.ogv . It seems we see some SLOF output, then the grub menu appears, the test hits enter to boot, we see a cursor at top-left for a little bit, then we see more SLOF output, then we see the grub menu again, it times out, we see more SLOF output and it hangs, at "Booting Linux via __start() @ 0x0000000002000000 ..."

The qemu command is:

/usr/bin/qemu-system-ppc64 -g 1024x768 -vga virtio -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -soundhw ac97 -global isa-fdc.driveA= -m 4096 -machine usb=off -cpu host -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -boot once=d,menu=on,splash-time=5000 -device nec-usb-xhci -device usb-tablet -device usb-kbd -smp 1 -enable-kvm -no-shutdown -vnc :105,share=force-shared -device virtio-serial -chardev socket,path=virtio_console,server,nowait,id=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev socket,path=qmp_socket,server,nowait,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-file,filename=/var/lib/openqa/pool/15/raid/hd0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0,file=hd0-file,cache.no-flush=on -device virtio-blk,id=hd0-device,drive=hd0,serial=hd0 -blockdev driver=file,node-name=cd0-overlay0-file,filename=/var/lib/openqa/pool/15/raid/cd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay0,file=cd0-overlay0-file,cache.no-flush=on -device scsi-cd,id=cd0-device,drive=cd0-overlay0,serial=cd0

I noticed that the version of SLOF in F31 is somewhat old, and is tagged as being for qemu 4.0 while F31 has qemu 4.1. But I tried rebuilding both the SLOF version that was marked as being for qemu 4.1 (20190703) and the latest SLOF (20191022) and neither seems to help, so I don't think the bug is just 'SLOF is out of date', so I'm assigning it to qemu.

Comment 1 Adam Williamson 2019-11-07 03:07:18 UTC

So I rebuilt the current F30 qemu - qemu-3.1.1-2.fc30 - for F31 (I had to disable tests and backport a couple of build fix patches). With that qemu, things work again. So the problem is something between that version of qemu and the version in F31 (qemu-4.1.0-2.fc31).

Comment 2 Adam Williamson 2019-11-07 03:09:37 UTC

I just tested running qemu directly at a console without a graphical device, and that gets me a traceback:

[    0.015468] ------------[ cut here ]------------
[    0.015518] kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612!
[    0.015578] Oops: Exception in kernel mode, sig: 5 [#1]
[    0.015627] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=1024 NUMA pSeries
[    0.015697] Modules linked in:
[    0.015739] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-0.rc6.git0.1.fc32.ppc64le #1
[    0.015812] NIP:  c000000000f63294 LR: c000000000f62e44 CTR: 0000000000000000
[    0.015889] REGS: c0000000fa45f0d0 TRAP: 0700   Not tainted  (5.4.0-0.rc6.git0.1.fc32.ppc64le)
[    0.015971] MSR:  8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44000424  XER: 00000000
[    0.016050] CFAR: c000000000f63128 IRQMASK: 0 
[    0.016050] GPR00: c000000000f62e44 c0000000fa45f360 c000000001be5400 0000000000000000 
[    0.016050] GPR04: c0000000019c7d38 c0000000fa340030 00000000fa330009 c000000001c15e18 
[    0.016050] GPR08: 0000000000000040 ffe0000000000000 0000000000000000 8418dd352dbd190f 
[    0.016050] GPR12: 0000000000000000 c000000001e00000 c00a000080060000 c00a000080060000 
[    0.016050] GPR16: 0000ffffffffffff 80000000000001ae c000000001c24d98 ffffffffffff0000 
[    0.016050] GPR20: c00a00008007ffff c000000001cafca0 c00a00008007ffff ffffffffffff0000 
[    0.016050] GPR24: c00a000080080000 c00a000080080000 c000000001cafca8 c00a000080080000 
[    0.016050] GPR28: c0000000fa32e010 c00a000080060000 ffffffffffff0000 c0000000fa330000 
[    0.016711] NIP [c000000000f63294] ioremap_page_range+0x4c4/0x6e0
[    0.016778] LR [c000000000f62e44] ioremap_page_range+0x74/0x6e0
[    0.016846] Call Trace:
[    0.016876] [c0000000fa45f360] [c000000000f62e44] ioremap_page_range+0x74/0x6e0 (unreliable)
[    0.016969] [c0000000fa45f460] [c0000000000934bc] do_ioremap+0x8c/0x120
[    0.017037] [c0000000fa45f4b0] [c0000000000938e8] __ioremap_caller+0x128/0x140
[    0.017116] [c0000000fa45f500] [c0000000000931a0] ioremap+0x30/0x50
[    0.017184] [c0000000fa45f520] [c0000000000d1380] xive_spapr_populate_irq_data+0x170/0x260
[    0.017263] [c0000000fa45f5c0] [c0000000000cc90c] xive_irq_domain_map+0x8c/0x170
[    0.017344] [c0000000fa45f600] [c000000000219124] irq_domain_associate+0xb4/0x2d0
[    0.017424] [c0000000fa45f690] [c000000000219fe0] irq_create_mapping+0x1e0/0x3b0
[    0.017506] [c0000000fa45f730] [c00000000021ad6c] irq_create_fwspec_mapping+0x27c/0x3e0
[    0.017586] [c0000000fa45f7c0] [c00000000021af68] irq_create_of_mapping+0x98/0xb0
[    0.017666] [c0000000fa45f830] [c0000000008d4e48] of_irq_parse_and_map_pci+0x168/0x230
[    0.017746] [c0000000fa45f910] [c000000000075428] pcibios_setup_device+0x88/0x250
[    0.017826] [c0000000fa45f9a0] [c000000000077b84] pcibios_setup_bus_devices+0x54/0x100
[    0.017906] [c0000000fa45fa10] [c0000000000793f0] __of_scan_bus+0x160/0x310
[    0.017973] [c0000000fa45faf0] [c000000000075fc0] pcibios_scan_phb+0x330/0x390
[    0.018054] [c0000000fa45fba0] [c00000000139217c] pcibios_init+0x8c/0x128
[    0.018121] [c0000000fa45fc20] [c0000000000107b0] do_one_initcall+0x60/0x2c0
[    0.018201] [c0000000fa45fcf0] [c000000001384624] kernel_init_freeable+0x290/0x378
[    0.018280] [c0000000fa45fdb0] [c000000000010d24] kernel_init+0x2c/0x148
[    0.018348] [c0000000fa45fe20] [c00000000000bdbc] ret_from_kernel_thread+0x5c/0x80
[    0.018427] Instruction dump:
[    0.018468] 41820014 3920fe7f 7d494838 7d290074 7929d182 f8e10038 69290001 0b090000 
[    0.018552] 7a098420 0b090000 7bc95960 7929a802 <0b090000> 7fc68b78 e8610048 7dc47378 
[    0.018636] ---[ end trace 85d1e7e46925cee9 ]---
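
For triaging other guest logs, the signature in this traceback can be checked mechanically. What follows is a minimal sketch, assuming the guest console log is available as text on stdin. The function name has_xive_ioremap_bug is made up for illustration; the two matched strings are taken from the trace above.

```shell
# Hypothetical helper: succeed if a guest console log (read from stdin)
# shows the signature of the traceback above, i.e. a "kernel BUG at" line
# together with xive_spapr_populate_irq_data in the call trace.
has_xive_ioremap_bug() {
    log=$(cat)
    # First condition: the kernel hit a BUG() somewhere.
    case $log in
        *'kernel BUG at '*) ;;
        *) return 1 ;;
    esac
    # Second condition: the XIVE sPAPR IRQ setup path is in the log.
    case $log in
        *xive_spapr_populate_irq_data*) return 0 ;;
        *) return 1 ;;
    esac
}
```

It could be used as `has_xive_ioremap_bug < serial0 && echo "looks like this bug"`, where serial0 is the serial log file named in the qemu command from the description.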

Comment 3 Adam Williamson 2019-11-07 03:12:37 UTC

Using machine type pseries-3.1 or pseries-4.0 - instead of the default pseries-4.1 - works. So this is something to do with the pseries-4.1 machine type.

Comment 4 Michel Normand 2019-11-07 08:42:45 UTC

*** Bug 1769445 has been marked as a duplicate of this bug. ***

Comment 5 Laurent Vivier 2019-11-07 17:50:06 UTC

This happens because by default interrupt mode is dual with pseries-4.1 and on POWER9 it will switch to xive.

You can try starting the default machine forcing the interrupt mode with "-M pseries,ic-mode=xics"
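
The two workarounds mentioned so far (an older machine type, as in comment 3, or forcing XICS as suggested here) could be wrapped roughly as below. This is only a sketch: machine_workaround_args and its mode names are hypothetical, and only the -M values come from this bug.

```shell
# Hypothetical helper: print the extra QEMU arguments for one of the two
# workarounds discussed in this bug. The mode names "old-machine" and
# "xics" are invented for this sketch.
machine_workaround_args() {
    case "$1" in
        # Use the previous machine type, which does not default to dual mode.
        old-machine) printf '%s\n' '-M pseries-4.0' ;;
        # Keep the default machine type but force the XICS interrupt mode.
        xics)        printf '%s\n' '-M pseries,ic-mode=xics' ;;
        *)           echo "usage: machine_workaround_args old-machine|xics" >&2
                     return 2 ;;
    esac
}
```

For example, `qemu-system-ppc64 $(machine_workaround_args xics) -enable-kvm -m 4096` plus the usual drive options would start the default machine with XICS forced. The command substitution is left unquoted on purpose so that it splits into two arguments.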

Comment 6 Laurent Vivier 2019-11-07 18:05:26 UTC

Cédric, any idea about this problem with XIVE?

Comment 7 Cedric Le Goater 2019-11-07 18:10:06 UTC

What is the host kernel, and which firmware is being used on the system?

Comment 8 Adam Williamson 2019-11-07 19:28:02 UTC

[root@openqa-ppc64le-02 adamwill][PROD]# uname -r
5.3.7-301.fc31.ppc64le
[root@openqa-ppc64le-02 adamwill][PROD]# lsmcode
Version of System Firmware : 
 Product Name          : OpenPOWER Firmware
 Product Version       : SUPERMICRO-P9DSU-V1.16-20180531-prod
 Product Extra         : 	skiboot-v6.0-p1da203b
 Product Extra         : 	bmc-firmware-version-1.27
 Product Extra         : 	occ-77bb5e6-p623d1cd
 Product Extra         : 	hostboot-f911e5c-pda8239f
 Product Extra         : 	machine-xml-218a77a
 Product Extra         : 	sbe-8e0105e
 Product Extra         : 	hcode-hw051018a.op920
 Product Extra         : 	petitboot-v1.7.1-pf773c0d
 Product Extra         : 	linux-4.16.7-openpower2-pbc45895

Comment 9 Cedric Le Goater 2019-11-07 22:01:32 UTC

This is a Boston system. KVM XIVE native support on these systems is
partial because the FW is a little old, and QEMU runs with the equivalent
of kernel_irqchip=off.

Could you attach the .config file of the guest kernel, please?

Comment 10 Kevin Fenzi 2019-11-17 18:08:20 UTC

I am seeing this also on another power9 box. 

It had: 

Version of System Firmware :
 Product Name          : OpenPOWER Firmware
 Product Version       : SUPERMICRO-P9DSU-V2.10-20190208-prod
 Product Extra         :        skiboot-v6.0.16
 Product Extra         :        bmc-firmware-version-2.04
 Product Extra         :        occ-39d7745
 Product Extra         :        hostboot-3c093dc-pc0ab4f8
 Product Extra         :        buildroot-2018.05.1-9-gc99f2ee
 Product Extra         :        capp-ucode-p9-dd2-v4
 Product Extra         :        machine-xml-218a77a
 Product Extra         :        hostboot-binaries-hw020419a.op920
 Product Extra         :        sbe-9515af0
 Product Extra         :        hcode-hw020719a.op920
 Product Extra         :        petitboot-v1.7.5-p79ec4a8
 Product Extra         :        linux-4.17.12-openpower1-ped131c9

and I updated to the latest firmware I could find: 

 Product Name          : OpenPOWER Firmware
 Product Version       : SUPERMICRO-P9DSU-V2.14-20190807-prod
 Product Extra         :        skiboot-v6.0.20
 Product Extra         :        bmc-firmware-version-2.07
 Product Extra         :        occ-8fa3854
 Product Extra         :        hostboot-8591ded-p4f715ce
 Product Extra         :        buildroot-2018.11.3-12-g222837a
 Product Extra         :        capp-ucode-p9-dd2-v4
 Product Extra         :        machine-xml-734a35e
 Product Extra         :        hostboot-binaries-hw072719a.op920
 Product Extra         :        sbe-b6ee17b
 Product Extra         :        hcode-hw072719a.op920
 Product Extra         :        petitboot-v1.7.5-p11ed908
 Product Extra         :        linux-4.19.57-openpower1-p48ee860

no change. Passing -machine pseries-4.0 works fine.

Comment 11 Kevin Fenzi 2019-12-01 19:41:12 UTC

Any news here? This is causing some rawhide images to fail... perhaps the default could be moved back to pseries-4.0 in f31 on ppc64le for now? 

Or is there any workaround that would let us change that default globally? (passing -machine is not really an option since we would need to modify all the various things that call qemu: imagefactory/oz/virt-install/etc).

Comment 12 Cedric Le Goater 2019-12-02 18:49:13 UTC

I cannot reproduce with mainline. Could you share the kernel .config file, please?

Comment 13 Kevin Fenzi 2019-12-02 19:51:42 UTC

I am using 5.3.11-300.fc31.ppc64le stock fedora kernel here.

Comment 14 Cedric Le Goater 2019-12-03 10:08:55 UTC

I could reproduce with these guest kernels :

  https://dl.fedoraproject.org/pub/alt/rawhide-kernel-nodebug/ppc64le/

I suspect an ioremap(-1) in the kernel which was not failing before.

Comment 15 Cedric Le Goater 2019-12-03 16:29:20 UTC

This is a kernel bug in the XIVE sPAPR driver for the INTx PCI interrupts
which are LSI. These are special and we should not be doing the ioremap.
This is failing in Linux 5.4 (+CONFIG_DEBUG_VM).

Comment 16 Cedric Le Goater 2019-12-03 16:30:30 UTC

Created attachment 1641728 [details]
powerpc/xive: skip ioremap() of ESB pages for LSI interrupts

Comment 17 Dan Horák 2019-12-04 11:55:03 UTC

Thanks, Cedric. The patch has been posted as https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-December/201480.html

Comment 18 Fedora Update System 2019-12-05 20:15:28 UTC

FEDORA-2019-7795371386 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-7795371386

Comment 19 Kevin Fenzi 2019-12-06 22:24:53 UTC

Two questions here... 

Does this fix need to be in the host? Or the guest? or both?

Should this work for stable kernels too? or is there something in newer kernels that would cause it to work, but not work backported to older releases?

Comment 20 Fedora Update System 2019-12-07 02:19:41 UTC

kernel-5.3.15-200.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-7795371386

Comment 21 Fedora Update System 2019-12-07 03:38:59 UTC

kernel-5.3.15-300.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-985cb39611

Comment 22 Cedric Le Goater 2019-12-07 08:06:01 UTC

> Does this fix need to be in the host? Or the guest? or both?

Guest side only.
 
> Should this work for stable kernels too? 

Yes. I sent the patch to stable@ also.

> or is there something in newer kernels that would cause it to 
> work, but not work backported to older releases?

Backports should be fine. 

The issue only shows up on Linux 5.4 plus CONFIG_DEBUG_VM.
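
Since the BUG only triggers with CONFIG_DEBUG_VM, it may help to check whether a given guest kernel was built with it. A rough sketch follows; config_has_debug_vm is a hypothetical name, and the default path assumes the Fedora convention of installing the kernel config as /boot/config-<release>.

```shell
# Hypothetical helper: succeed if the given kernel config file enables
# CONFIG_DEBUG_VM. With no argument it falls back to the config of the
# running kernel (assumed to live at /boot/config-$(uname -r)).
config_has_debug_vm() {
    config=${1:-/boot/config-$(uname -r)}
    # Anchor the pattern so CONFIG_DEBUG_VM_PGFLAGS and friends do not match.
    grep -q '^CONFIG_DEBUG_VM=y$' "$config"
}
```

Run inside the guest (or against an extracted config file), `config_has_debug_vm && echo "DEBUG_VM enabled"` tells whether that kernel is a candidate for hitting the BUG.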

Comment 23 Fedora Update System 2019-12-10 02:55:05 UTC

kernel-5.3.15-300.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.

Comment 24 Fedora Update System 2019-12-10 03:04:28 UTC

kernel-5.3.15-200.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.
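
To see whether a Fedora 30 or 31 guest already runs a kernel at or above the fixed builds, the release string can be compared against 5.3.15. This is a sketch under assumptions: kernel_has_fix is a hypothetical name, only the upstream version before the first "-" is compared, GNU sort (for -V) is assumed to be available, and the result says nothing about Rawhide or 5.4 pre-release builds, which may or may not carry the patch.

```shell
# Hypothetical helper: succeed if a kernel release string such as
# "5.3.15-300.fc31.ppc64le" has an upstream version of at least 5.3.15,
# the version of the builds listed under "Fixed In Version".
kernel_has_fix() {
    # Drop everything from the first "-" onwards, leaving e.g. "5.3.15".
    upstream=${1%%-*}
    # Version-sort the two strings; if 5.3.15 sorts first (or they are
    # equal), the given kernel is at least as new as the fixed one.
    lowest=$(printf '%s\n' "$upstream" 5.3.15 | sort -V | head -n 1)
    [ "$lowest" = 5.3.15 ]
}
```

Inside a guest this could be called as `kernel_has_fix "$(uname -r)" || echo "kernel predates the fix"`.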

Comment 25 Adam Williamson 2019-12-10 20:12:55 UTC

Kevin asked me to test this, but the easiest thing for me to test with is Rawhide images. Two issues there: I am not 100% sure whether the fix for this is actually in the Rawhide kernel yet (it seems clear it was specifically backported to f30 and f31 kernels, but I cannot tell for sure if it's in Rawhide kernel too), and we haven't had ppc64le images in Rawhide composes since 20191205.n.0. This seems to be because ppc64le kernel builds were turned off for some reason around then and only turned back on yesterday. The next Rawhide compose should get ppc64le images, I'll see if those work when they show up.

Comment 26 Richard W.M. Jones 2019-12-10 20:31:02 UTC

Don't know if it helps here, but libguestfs has started working again on Rawhide ppc64le,
whereas it was broken until yesterday because (variously) the kernel was missing or the kernel
didn't boot on qemu TCG. For example, this build uses libguestfs for some testing:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1421132

Comment 27 Adam Williamson 2019-12-10 20:48:40 UTC

Could be indicative, yeah. I know why the kernels went missing, and it has nothing to do with this bug beyond making the fix harder to verify. But if cases where we *had* kernels were previously failing to boot and are now booting with the recently completed kernel build, that's a good sign.

Comment 28 Adam Williamson 2019-12-14 02:02:38 UTC

This does seem to be fixed for me, at least - I booted today's Rawhide Server netinst image on openqa-ppc64le-02 with `-M pseries-4.1` and it seems to have booted fine, didn't hit the traceback.

Comment 29 Kevin Fenzi 2019-12-14 04:14:19 UTC

I guess I'm hitting some other issue... libguestfs-test-tool doesn't work either on host or in guests:

Preparing to boot Linux version 5.3.15-300.fc31.ppc64le (mockbuild.fedoraproject.org) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Thu Dec 5 14:47:38 UTC 2019
Detected machine type: 0000000000000101
command line: panic=1 console=hvc0 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=screen
Max number of cores passed to firmware: 1024 (NR_CPUS = 1024)
Calling ibm,client-architecture-support...libguestfs: error: appliance closed the connection unexpectedly, see earlier error messages
libguestfs: child_cleanup: 0x14089dad0: child process died
libguestfs: error: guestfs_launch failed, see earlier error messages
libguestfs: trace: launch = -1 (error)
libguestfs: trace: close
libguestfs: closing guestfs handle 0x14089dad0 (state 0)

The cloud and container images are failing in a step where it runs libguestfs on the image... 
https://koji.fedoraproject.org/koji/taskinfo?taskID=39510967

Shall I file a new libguestfs bug on that and we can debug further?

Comment 30 Adam Williamson 2019-12-18 18:56:14 UTC

I think so? At least, I'm pretty sure the issue as I first filed it is fixed. A new bug can't hurt.

Comment 31 Kevin Fenzi 2019-12-18 20:05:37 UTC

FYI, I've filed: https://bugzilla.redhat.com/show_bug.cgi?id=1784961 on the guestfs issues. 

This is what is preventing f30/f31/rawhide cloud and containers from composing on ppc64le.

It might be related to qemu starting, erroring and restarting:

qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
Falling back to kernel-irqchip=off

Comment 32 Cedric Le Goater 2019-12-19 07:36:00 UTC

QEMU warns that it is using the XIVE emulated device and not the KVM XIVE device 
because the support is not available on the host, the reason being the lack of
migration support in the FW, like on Boston systems.

It should work just the same, a little slower if you measure performance.
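
To tell from captured QEMU output whether a host fell back to the emulated XIVE device, one could look for the notice quoted in comment 31. A minimal sketch, assuming QEMU's stderr has been saved or piped as text; xive_emulated_fallback is a hypothetical name and the matched string is copied from that comment.

```shell
# Hypothetical helper: succeed if QEMU output on stdin contains the
# "Falling back to kernel-irqchip=off" notice quoted in comment 31.
xive_emulated_fallback() {
    # -F treats the pattern as a fixed string, -q suppresses output.
    grep -qF 'Falling back to kernel-irqchip=off'
}
```

For instance, `xive_emulated_fallback < qemu-stderr.log && echo "using emulated XIVE"`, where qemu-stderr.log is a placeholder for wherever the output was captured.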

Comment 33 Dan Horák 2020-01-11 13:20:54 UTC

Cedric, the problem we are experiencing is the silent restart of the VM after it warns about "IRQ_XIVE capability". Some tools like libguestfs don't expect such behaviour and fail, see bug 1784961.

Comment 34 Cedric Le Goater 2020-01-13 07:37:44 UTC

When a new interrupt mode is negotiated (XICS -> XIVE) between the guest OS and 
the hypervisor, the device tree is updated and the machine is reset. This is 
a "standard" procedure in the PAPR environment but yes, it can be a problem
for the libvirt tools.

QEMU 5.0 has a set of changes that get rid of this reset.

Comment 35 Dan Horák 2020-01-13 12:41:27 UTC

Thanks, Cedric, makes sense.
