
Bug 1871958

Summary: 5.9 pre kernels hang on reboot in aarch64 vm's
Product: [Fedora] Fedora
Reporter: Kevin Fenzi <kevin>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: acaringi, airlied, bskeggs, dan, hdegoede, ichavero, itamar, jarodwilson, jeremy, jeremy.linton, jglisse, john.j5live, jonathan, josef, kernel-maint, kraxel, lgoncalv, linville, masami256, mboddu, mchehab, mjg59, pbrobinson, pwhalen, robatino, steved
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-09-16 09:52:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 245418, 1829022
Attachments:
Description: full boot log; Flags: none

Description Kevin Fenzi 2020-08-24 17:17:02 UTC
Rawhide composes have been failing since 5.9-rc1 landed.

This is due to the aarch64 Cloud-Base and/or Workstation images failing to compose.

They time out. On investigation, the image builds seem to complete fine, but at the end the VMs just hang on reboot. Here are the last messages from one of the VMs:

[  OK  ] Stopped Rebuild Hardware Database.
[  OK  ] Stopped Rebuild Journal Catalog.
         Stopping Update UTMP about System Boot/Shutdown...
[  OK  ] Stopped Update UTMP about System Boot/Shutdown.
[  OK  ] Stopped Create Volatile Files and Directories.
[  OK  ] Stopped Import network configuration from initramfs.
[  OK  ] Stopped Restore /run/initramfs on shutdown.
[  OK  ] Stopped target Local File Systems.
         Unmounting /mnt/sysimage/boot/efi...
         Unmounting /mnt/sysimage/dev/pts...
         Unmounting /mnt/sysimage/dev/shm...
         Unmounting /mnt/sysimage/proc...
         Unmounting /mnt/sysimage/run...
         Unmounting /mnt/sysimage/sys/firmware/efi/efivars...
         Unmounting /mnt/sysimage/sys/fs/selinux...
         Unmounting /mnt/sysroot/boot/efi...
         Unmounting /mnt/sysroot/dev/pts...
         Unmounting /mnt/sysroot/dev/shm...
         Unmounting /mnt/sysroot/proc...
         Unmounting /mnt/sysroot/run...
         Unmounting /mnt/sysroot/sys/firmware/efi/efivars...
         Unmounting /mnt/sysroot/sys/fs/selinux...
         Unmounting Temporary Directory (/tmp)...
[  OK  ] Unmounted /mnt/sysimage/boot/efi.
[  OK  ] Unmounted /mnt/sysimage/dev/pts.
[  OK  ] Unmounted /mnt/sysimage/dev/shm.
[  OK  ] Unmounted /mnt/sysimage/proc.
[  OK  ] Unmounted /mnt/sysimage/run.
[  OK  ] Unmounted /mnt/sysimage/sys/firmware/efi/efivars.
[  OK  ] Unmounted /mnt/sysimage/sys/fs/selinux.
[  OK  ] Unmounted /mnt/sysroot/boot/efi.
[  OK  ] Unmounted /mnt/sysroot/dev/pts.
[  OK  ] Unmounted /mnt/sysroot/dev/shm.
[  OK  ] Unmounted /mnt/sysroot/proc.
[  OK  ] Unmounted /mnt/sysroot/run.
[  OK  ] Unmounted /mnt/sysroot/sys/firmware/efi/efivars.
[  OK  ] Unmounted /mnt/sysroot/sys/fs/selinux.
         Unmounting /mnt/sysimage/dev...
         Unmounting /mnt/sysimage/sys...
         Unmounting /mnt/sysroot/dev...
         Unmounting /mnt/sysroot/sys...
[  OK  ] Unmounted /mnt/sysimage/dev.
[  OK  ] Unmounted /mnt/sysimage/sys.
[  OK  ] Unmounted /mnt/sysroot/dev.
[  OK  ] Unmounted /mnt/sysroot/sys.
         Unmounting /mnt/sysimage...
         Unmounting /mnt/sysroot...
[  OK  ] Unmounted /mnt/sysimage.
[  OK  ] Unmounted Temporary Directory (/tmp).
[  OK  ] Stopped target Swap.
         Deactivating swap Compressed swap on /dev/zram0...
[  OK  ] Deactivated swap Compressed swap on /dev/zram0.
         Stopping Create swap on /dev/zram0...
[  OK  ] Stopped Create swap on /dev/zram0.
[  OK  ] Removed slice system-swap\x2dcreate.slice.
[  OK  ] Unmounted /mnt/sysroot.
[  OK  ] Stopped target Local File Systems (Pre).
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Stopped Create System Users.
[  OK  ] Stopped Remount Root and Kernel File Systems.
[  OK  ] Reached target Shutdown.
[  OK  ] Reached target Final Step.
[  OK  ] Finished Reboot.
[  OK  ] Reached target Reboot.
[ 1774.256311] dracut Warning: Killing all remaining processes
dracut Warning: Killing all remaining processes
[ 1774.866439] dracut Warning: Unmounted /oldroot.
Rebooting.

It then sits there until the compose times out and fails.

These VMs run on Fedora 32 hosts, sometimes on Mustangs and sometimes on Lenovo eMags.

Comment 1 Peter Robinson 2020-08-24 17:26:35 UTC
What's the version of qemu/libvirt/edk2 on the hosts?

Comment 2 Kevin Fenzi 2020-08-24 17:54:08 UTC
libvirt-daemon-6.1.0-4.fc32.aarch64
qemu-4.2.1-1.fc32.aarch64
edk2-aarch64-20200201stable-1.fc32.noarch

Comment 3 Peter Robinson 2020-08-24 17:58:07 UTC
Usually on aarch64 it's the PSCI firmware interface that deals with reboots and related bits. I'm just looking through the changes there to see if anything of note shows up.

Comment 4 Paul Whalen 2020-08-24 20:00:18 UTC
I haven't been able to reproduce on an F33 mustang or F32 eMag. 

Verified the same packages are installed on the F32 host:

libvirt-daemon-6.1.0-4.fc32.aarch64
qemu-system-aarch64-4.2.1-1.fc32.aarch64
edk2-aarch64-20200201stable-1.fc32.noarch

[root@ampere-hr350a-06 ~]# uname -r
5.7.16-200.fc32.aarch64

On the vm:
5.9.0-0.rc1.20200821gitda2968ff879b.1.fc34.aarch64

Comment 5 Peter Robinson 2020-08-24 21:44:59 UTC
I wonder if this is an imagefactory-specific issue.

Comment 6 Kevin Fenzi 2020-08-24 22:00:58 UTC
It's possible. Or the way it's defining the guest?

Comment 7 Paul Whalen 2020-08-25 13:09:19 UTC
Reproduced with imagefactory-1.1.15-2.fc32.noarch when trying to run the cloud-base image build.

Comment 8 Kevin Fenzi 2020-08-28 22:05:41 UTC
Marking automatic f34-beta blocker: "Bugs which entirely prevent the composition of one or more of the release-blocking images required to be built for a currently-pending (pre-)release"

Comment 9 Paul Whalen 2020-08-31 16:52:02 UTC
Cloud-base kickstart installs/reboots ok outside of imagefactory.

Comment 10 Mohan Boddu 2020-09-02 14:45:52 UTC
I think kernel-5.9.0-0.rc3.1.fc34 is still having issues; see https://pagure.io/releng/failed-composes/issue/1697

Untagging it and running another compose now.

Comment 11 Mohan Boddu 2020-09-03 15:21:10 UTC
We need to find a fix for imagefactory asap. Another rawhide failure due to kernel-5.9.0-0.rc3.20200902git9c7d619be5a0.1.fc34.

I don't want to keep untagging them to get a compose out.

@Paul Whalen, any update on your issue tracking?

Comment 12 Paul Whalen 2020-09-03 16:14:01 UTC
(In reply to Mohan Boddu from comment #11)
> We need to find a fix for imagefactory asap. Another rawhide failure due to
> kernel-5.9.0-0.rc3.20200902git9c7d619be5a0.1.fc34.
> 
> I don't want to keep untagging them to get a compose out.
> 
> @Paul Whalen, any update on your issue tracking?

Unfortunately not. Perhaps we can make the image fail-able until resolved?

Comment 13 Mohan Boddu 2020-09-03 20:49:20 UTC
But Cloud Base for x86_64 and aarch64 is release-blocking. Although it's rawhide, we have always followed the same conditions.

Maybe we could make an exception.

@Kevin, are you okay with it?

Comment 14 Peter Robinson 2020-09-04 06:23:17 UTC
I started looking at it last week. I'm on PTO this week and will deal with it Monday.

Comment 15 Dan Horák 2020-09-08 15:08:36 UTC
It might be oz/imagefactory passing some not-wanted-anymore options to libvirt/qemu or something like that ...

Comment 16 Paul Whalen 2020-09-10 01:47:47 UTC
There is a crash when using the virtio_gpu driver that seems to be causing this (full boot log attached):

[   12.928989] [drm] Initialized virtio_gpu 0.1.0 0 for virtio5 on minor 0
[   11.982271] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000000
[   11.984567] Mem abort info:
[   11.985118]   ESR = 0x96000004
[   11.985759]   EC = 0x25: DABT (current EL), IL = 32 bits
[   11.986789]   SET = 0, FnV = 0
[   11.987382]   EA = 0, S1PTW = 0
[   11.988028] Data abort info:
[   11.988638]   ISV = 0, ISS = 0x00000004
[   11.989762]   CM = 0, WnR = 0
[   11.990424] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001d24e9000
[   11.991753] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[   11.993227] Internal error: Oops: 96000004 [#1] SMP
[   11.993955] Modules linked in: virtio_gpu(+) drm_kms_helper crct10dif_ce ghash_ce syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm virtio_blk virtio_console xhci_pci(+) xhci_pci_renesas qemu_fw_cfg virtio_mmio dm_multipath aes_neon_bs
[   11.997225] CPU: 3 PID: 527 Comm: systemd-udevd Not tainted 5.9.0-0.rc3.20200902git9c7d619be5a0.1.fc34.aarch64 #1
[   11.998815] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[   12.000068] pstate: 60400005 (nZCv daif +PAN -UAO BTYPE=--)
[   12.001083] pc : swiotlb_map+0x194/0x1b0
[   12.001771] lr : swiotlb_map+0x174/0x1b0
[   12.002470] sp : ffff800010ac35e0
[   12.003089] x29: ffff800010ac35e0 x28: 0000000000000000
[   12.004094] x27: ffff000193108850 x26: ffff000192b1a000
[   12.005138] x25: 0000000000000000 x24: ffffa60579bd40f8
[   12.006185] x23: 00000001d2654000 x22: 0000000000000000
[   12.007211] x21: 0000000000000000 x20: 0000000000001000
[   12.008244] x19: ffff000193108850 x18: 0000000000000000
[   12.009274] x17: 0000000000000000 x16: ffffa605769dc7d4
[   12.009345] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.09
[   12.010352] x15: ffffa60577ca2808 x14: 0000000000000000
[   12.011933] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   12.013058] x13: 0000000000000000 x12: 0000000000000000
[   12.014364] usb usb1: Product: xHCI Host Controller
[   12.015315] x11: 0000000000000000 x10: 0000000000000000
[   12.016229] usb usb1: Manufacturer: Linux 5.9.0-0.rc3.20200902git9c7d619be5a0.1.fc34.aarch64 xhci-hcd
[   12.017155] x9 : ffffa605769e3b08 x8 : 0000000000000070
[   12.018924] usb usb1: SerialNumber: 0000:02:00.0
[   12.019853] x7 : ffff0001fe608000 x6 : 0000000000000000
[   12.021693] x5 : 0000000000000001 x4 : 0000000000001000
[   12.022632] x3 : ffff800010ac3628 x2 : ffff000193064a00
[   12.023573] x1 : ffffa60511d55620 x0 : 0000000000000000
[   12.024518] Call trace:
[   12.024959]  swiotlb_map+0x194/0x1b0
[   12.025613]  dma_direct_map_sg+0x12c/0x214
[   12.026412]  dma_map_sg_attrs+0x94/0xac
[   12.027225]  drm_gem_shmem_get_pages_sgt+0x84/0xd4 [drm]
[   12.028310]  virtio_gpu_object_shmem_init+0x4c/0x170 [virtio_gpu]
[   12.029617]  virtio_gpu_object_create+0x184/0x220 [virtio_gpu]
[   12.030815]  virtio_gpu_mode_dumb_create+0xb0/0x1a0 [virtio_gpu]
[   12.032032]  drm_mode_create_dumb+0x9c/0xc0 [drm]
[   12.033039]  drm_client_buffer_create+0x84/0x114 [drm]
[   12.034133]  drm_client_framebuffer_create+0x30/0x90 [drm]
[   12.035254]  drm_fb_helper_generic_probe+0x5c/0x170 [drm_kms_helper]
[   12.036587]  drm_fb_helper_single_fb_probe+0x2a8/0x440 [drm_kms_helper]
[   12.037910]  __drm_fb_helper_initial_config_and_unlock+0x48/0x154 [drm_kms_helper]
[   12.039426]  drm_fbdev_client_hotplug+0xbc/0x1c0 [drm_kms_helper]
[   12.040717]  drm_fbdev_generic_setup+0xc0/0x1c0 [drm_kms_helper]
[   12.041165] hub 1-0:1.0: USB hub found
[   12.041942]  virtio_gpu_probe+0xc0/0x194 [virtio_gpu]
[   12.043722]  virtio_dev_probe+0x154/0x200
[   12.044524]  really_probe+0xf0/0x504
[   12.045235]  driver_probe_device+0xe4/0x100
[   12.046065]  device_driver_attach+0xd4/0xe0
[   12.046895]  __driver_attach+0xb4/0x180
[   12.047619]  bus_for_each_dev+0x6c/0xb0
[   12.048300]  driver_attach+0x30/0x3c
[   12.048969]  bus_add_driver+0x154/0x250
[   12.049669]  driver_register+0x84/0x140
[   12.050338]  register_virtio_driver+0x30/0x50
[   12.050459] hub 1-0:1.0: 15 ports detected
[   12.051101]  virtio_gpu_driver_init+0x28/0x1000 [virtio_gpu]
[   12.052762]  do_one_initcall+0x44/0x170
[   12.053559]  do_init_module+0x60/0x27c
[   12.054282]  load_module+0x60c/0x760
[   12.054692] xhci_hcd 0000:02:00.0: xHCI Host Controller
[   12.054930]  __do_sys_init_module+0xb0/0x120
[   12.056549]  __arm64_sys_init_module+0x28/0x34
[   12.057357]  el0_svc_common.constprop.0+0x80/0x1b0
[   12.058223]  do_el0_svc+0x30/0xa0
[   12.058900]  el0_sync_handler+0x90/0x1ec
[   12.059807]  el0_sync+0x15c/0x180
[   12.060570] Code: aa1403e4 f9423660 910123e3 f9423e66 (f9400005)
[   12.061936] ---[ end trace e80946b15db81549 ]---
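
The trace above can be located mechanically in a captured console log. The following is a minimal sketch in Python, not something used in this report: the function name `find_oops` and the two marker patterns are illustrative assumptions based only on the lines shown above ("Internal error: Oops: ..." and "pc : ...").

```python
import re

# Matches the oops line printed in the trace above, e.g.
# "[   11.993227] Internal error: Oops: 96000004 [#1] SMP".
OOPS_RE = re.compile(r"Internal error: Oops: ([0-9a-fA-F]+)")
# Matches the faulting program counter line, e.g.
# "[   12.001083] pc : swiotlb_map+0x194/0x1b0".
PC_RE = re.compile(r"\bpc : (\S+)")


def find_oops(log_text):
    """Return a list of (oops_code, faulting_symbol) pairs found in log_text.

    faulting_symbol is None when no "pc :" line follows the oops line.
    """
    results = []
    pending = None  # code of an oops still waiting for its "pc :" line
    for line in log_text.splitlines():
        oops = OOPS_RE.search(line)
        if oops:
            if pending is not None:
                results.append((pending, None))
            pending = oops.group(1)
            continue
        pc = PC_RE.search(line)
        if pc and pending is not None:
            results.append((pending, pc.group(1)))
            pending = None
    if pending is not None:
        results.append((pending, None))
    return results
```

Run against the full boot log, this would report the oops code together with the faulting symbol (`swiotlb_map+0x194/0x1b0` in the trace above), which makes it easier to spot the same crash in later compose logs.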

Comment 17 Paul Whalen 2020-09-10 01:48:49 UTC
Created attachment 1714358 [details]
full boot log

Comment 18 Peter Robinson 2020-09-10 11:29:25 UTC
Gerd, are you aware of any issues around virtio-gpu on aarch64?

Comment 19 Gerd Hoffmann 2020-09-14 09:34:45 UTC
(In reply to Peter Robinson from comment #18)
> Gerd, are you aware of any issues around virtio-gpu on aarch64?

virtio-gpu in general has problems, 5.9-rc5 & newer should be good again.

Comment 20 Paul Whalen 2020-09-15 18:37:38 UTC
Fedora-Rawhide-20200915.n.1 compose with kernel-5.9.0-0.rc5.11.fc34 completed successfully.
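
For anyone gating composes on the kernel build, here is a minimal Python sketch of checking whether a 5.9 pre-release kernel is at rc5 or newer. The function name `has_virtio_gpu_fix` and the regular expression are hypothetical. The rc5 threshold is an assumption taken from comment 19 ("5.9-rc5 & newer should be good again") and the successful compose above, not from an identified fix commit.

```python
import re

# Pre-release kernel versions in this report look like
# "5.9.0-0.rc3.20200902git9c7d619be5a0.1.fc34" or "5.9.0-0.rc5.11.fc34".
RC_RE = re.compile(r"(\d+)\.(\d+)\.\d+-0\.rc(\d+)\.")


def has_virtio_gpu_fix(version):
    """Return True for a 5.9 pre-release at rc5 or newer, False for older rcs.

    Returns None when the string is not a 5.9 pre-release version, since
    this thread says nothing about other kernels.
    """
    m = RC_RE.search(version)
    if not m:
        return None
    major, minor, rc = (int(g) for g in m.groups())
    if (major, minor) != (5, 9):
        return None
    # Assumption: rc5 is the first good pre-release, per comment 19.
    return rc >= 5
```

With the versions mentioned in this bug, `kernel-5.9.0-0.rc3.20200902git9c7d619be5a0.1.fc34` is reported as not fixed and `kernel-5.9.0-0.rc5.11.fc34` as fixed.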

Comment 21 Peter Robinson 2020-09-16 09:52:26 UTC
> virtio-gpu in general has problems, 5.9-rc5 & newer should be good again.

Confirmed it is. Thanks