Bug 1789199 - kernel hangs if the qemu -smp topology is different from the host
Summary: kernel hangs if the qemu -smp topology is different from the host
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 31
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PPCTracker
Reported: 2020-01-09 02:31 UTC by Gustavo Luiz Duarte
Modified: 2020-11-24 18:54 UTC (History)
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-24 18:54:55 UTC
Type: Bug
Embargoed:


Attachments
sosreport (17.38 MB, application/x-xz)
2020-01-10 19:35 UTC, Gustavo Luiz Duarte


Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 183138 0 None None None 2020-01-09 09:21:07 UTC

Description Gustavo Luiz Duarte 2020-01-09 02:31:33 UTC
Description of problem:

I'm running KVM on a POWER8 server (host and guest running Fedora 31). If I don't pass threads=8 to -smp, which matches the host topology, the VM comes up but the guest kernel hangs, sometimes printing soft lockups or RCU stalls like the ones shown under "Actual results".

Passing "-smp cores=4,threads=8" to qemu boots fine, but any of the following will hang:
-smp cores=4,threads=4
-smp cores=4,threads=1
-smp 4
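
For anyone retesting, the four topologies above can be turned into full command lines with a small dry-run helper. This is only a sketch: the `qemu_cmd` function and the `IMAGE` variable are hypothetical names introduced here, and the remaining options are copied from the reproduction step in this report. It only prints the commands and does not launch qemu.

```shell
#!/bin/sh
# Dry-run helper (hypothetical, not part of the original report):
# prints one qemu command line per -smp value listed in the description.
# IMAGE defaults to the disk image name used in the reproduction step.
IMAGE="${IMAGE:-fedora-cloud.raw}"

qemu_cmd() {
    # $1 is the value handed to -smp, e.g. "cores=4,threads=8"
    printf 'qemu-system-ppc64 -enable-kvm -m 2048 -smp %s -nodefaults -nographic -serial stdio -drive file=%s\n' "$1" "$IMAGE"
}

# First entry is the topology reported to boot; the other three are reported to hang.
for smp in cores=4,threads=8 cores=4,threads=4 cores=4,threads=1 4; do
    qemu_cmd "$smp"
done
```

Running each printed command in turn and comparing the guest console output should show which topologies still reproduce the hang on a given qemu and kernel version.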


Version-Release number of selected component (if applicable):

kernel-5.4.8-200.fc31.ppc64le (host and guest)
qemu-4.1.1-1.fc31.ppc64le


How reproducible:
Always


Steps to Reproduce:
1. qemu-system-ppc64 -enable-kvm -m 2048 -smp 16 -nodefaults -nographic -serial stdio -drive file=fedora-cloud.raw


Actual results:

Guest kernel hangs:

[    0.672791] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[    0.673946] rcu:     2-...!: (12 GPs behind) idle=4f0/0/0x0 softirq=19/19 fqs=1 
[    0.675267]  (detected by 0, t=97799 jiffies, g=-1111, q=111)
[    0.676335] Sending NMI from CPU 0 to CPUs 2:
[    0.677171] NMI backtrace for cpu 2
[    0.677173] CPU 2 didn't respond to backtrace IPI, inspecting paca.
[    0.677176] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 0 (swapper/2)
[    0.677895] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.3.14-300.fc31.ppc64le #1
[    0.679046] Back trace of paca->saved_r1 (0xc00000007a983920) (possibly stale):
[    0.679049] Call Trace:
[    0.680420] NIP:  c0000000000d79cc LR: c0000000000da170 CTR: c00000003fffce00
[    0.680423] REGS: c00000007a4d3b00 TRAP: 0501   Not tainted  (5.3.14-300.fc31.ppc64le)
[    0.681792] rcu: rcu_sched kthread starved for 97796 jiffies! g-1111 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
[    0.683307] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44000008  XER: 00000000
[    0.683762] rcu: RCU grace-period kthread stack dump:
[    0.683766] rcu_sched       I    0    10      2 0x00000808
[    0.685212] CFAR: c000000000c57a28 IRQMASK: 0 
[    0.685212] GPR00: 0000000044000008 c00000007a4d3d90 c000000001772000 0000000000000000 
[    0.685212] GPR04: 0000000000000001 ffffffffffffffff 0000000000000000 0000000000000808 
[    0.685212] GPR08: 0000000000000000 c00000003fffe400 0000000000000001 c00000007fb10f80 
[    0.685212] GPR12: c0000000000da130 c00000003fffce00 
[    0.685235] NIP [c0000000000d79cc] plpar_hcall_norets+0x1c/0x28
[    0.686692] Call Trace:
[    0.686698] [c00000007a4cb890] [c00000007fb96600] 0xc00000007fb96600 (unreliable)
[    0.688889] LR [c0000000000da170] pseries_lpar_idle+0x40/0x60
[    0.688891] Call Trace:
[    0.690385] [c00000007a4cba70] [c000000000022800] __switch_to+0x340/0x520
[    0.690391] [c00000007a4cbad0] [c000000000d1a7a0] __schedule+0x2b0/0x810
[    0.691467] [c00000007a4d3d90] [c0000000001f7da4] tick_nohz_idle_stop_tick+0x2c4/0x380 (unreliable)
[    0.691472] [c00000007a4d3df0] [c000000000024288] arch_cpu_idle+0x68/0x160
[    0.692507] [c00000007a4cbba0] [c000000000d1ad50] schedule+0x50/0x110
[    0.692513] [c00000007a4cbbd0] [c000000000d1faf8] schedule_timeout+0x1f8/0x460
[    0.699427] [c00000007a4d3e20] [c000000000d21160] default_idle_call+0x50/0x8c
[    0.699432] [c00000007a4d3e40] [c000000000174374] do_idle+0x334/0x3b0
[    0.700175] [c00000007a4cbcc0] [c0000000001cf568] rcu_gp_kthread+0x988/0xd10
[    0.700179] [c00000007a4cbdb0] [c00000000015b834] kthread+0x154/0x1a0
[    0.700694] [c00000007a4d3ec0] [c000000000174628] cpu_startup_entry+0x38/0x50
[    0.700699] [c00000007a4d3ef0] [c000000000054eb0] start_secondary+0x630/0x660
[    0.701663] [c00000007a4cbe20] [c00000000000bed8] ret_from_kernel_thread+0x5c/0x64
[    0.713142] [c00000007a4d3f90] [c00000000000b35c] start_secondary_prolog+0x10/0x14
[    0.713943] Instruction dump:
[    0.714259] 394a3ee0 f9490000 e8010010 7c0803a6 4e800020 3c4c016a 3842a650 7c421378 
[    0.715081] 7c000026 90010008 60000000 44000022 <80010008> 7c0ff120 4e800020 7c0802a6 


Expected results:
The guest kernel should boot normally, or QEMU should reject the configuration if the topology is invalid or unsupported.

Additional info:

Comment 1 IBM Bug Proxy 2020-01-09 13:11:01 UTC
------- Comment From muriloo.com 2020-01-09 08:09 EDT-------
Is it possible to attach an sosreport of the host?

Comment 2 Gustavo Luiz Duarte 2020-01-10 19:35:40 UTC
Created attachment 1651363
sosreport

Attaching the sosreport of the host.

Comment 3 Ben Cotton 2020-11-03 17:07:26 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 4 Ben Cotton 2020-11-24 18:54:55 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

