Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1665063
Summary: | qemu aarch64 pl011 driver shouldn't use synchronous writes | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Richard W.M. Jones <rjones> | ||||||
Component: | qemu-kvm | Assignee: | Guowen Shan <gshan> | ||||||
qemu-kvm sub component: | Devices | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||
Status: | CLOSED WONTFIX | Docs Contact: | |||||||
Severity: | high | ||||||||
Priority: | urgent | CC: | gshan, knoel, lcapitulino, rbalakri, virt-maint | ||||||
Version: | 8.0 | Keywords: | OtherQA, Regression | ||||||
Target Milestone: | rc | ||||||||
Target Release: | --- | ||||||||
Hardware: | aarch64 | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | 1661940 | Environment: | |||||||
Last Closed: | 2020-03-01 22:32:40 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 910269, 1677408 | ||||||||
Attachments: |
|
Description
Richard W.M. Jones
2019-01-10 12:40:20 UTC
(In reply to Richard W.M. Jones from comment #0) > (Bug 1661940 is about changing the ordering of events to make > qemu more robust, so don't complain that libvirt is doing it > wrong :-) I mean to make *libvirtd* more robust :-) Setting this to 8.1, unless we consider it a blocker. btw, I'm wondering, the pl011 driver seems very old. Is there another uart driver we could use for arm? QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks Created attachment 1664266 [details]
TCP server/client program
Created attachment 1664267 [details]
qemu patch - enforce flooding on PL011 on startup
I think Richard already provided all the details. The issue seems caused by flooding on TCP connection, which isn't accepted on server side. I don't know how to build a private libvirtd, so the attached "tcp.c" and patch are used to reproduce the issue. First of all, run binary "tcp", which is build from "tcp.c" as below. The transmission fails and returns -EAGAIN on client side after continuous 431 transmissons. It means the client can send data even the connection isn't accepted on server side. Besides, the sending buffer can be overruned quickly. session1# ./tcp -s 50900 server: connected with client localhost (127.0.0.1) session2# ./tcp -c 50900 line[0]: succeed, 1024 line[1]: succeed, 1024 line[2]: succeed, 1024 line[3]: succeed, 1024 : line[431]: succeed, 1024 line[432]: succeed, 673 line[433]: ret=-1, errno=11 Another experiment is start a VM with customized qemu, which is built from upstream qemu and the the attached patch. It also proves the flooding on PL011 can prevent the system from booting: session1# ./tcp -s 50900 session2# ~/sandbox/src/qemu/qemu.main.aarch64/aarch64-softmmu/qemu-system-aarch64 \ -machine virt,gic-version=3 -cpu max -m 4096 \ -smp 2,sockets=2,cores=1,threads=1 \ -monitor none -serial tcp:127.0.0.1:50900 -nographic -s \ : flood_uart: 0 flood_uart: 1 flood_uart: 2 flood_uart: 3 flood_uart: 4 flood_uart: 5 : flood_uart: 2573 flood_uart: 2574 <system stops here> For this, a patch has been posted for review. I will do downstream backporting once it's finalized. https://patchwork.kernel.org/patch/11393393/ The community chooses not to fix the corner case according to the discussion as below link shows. Instead, the user should have the TCP connection accepted and kept being polled for data on the server side. So I'm closing this with "WONTFIX". https://patchwork.kernel.org/patch/11393393/ |