Bug 1478201
Summary: | kernel runs out of memory with 256 virtio-scsi disks | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Richard W.M. Jones <rjones>
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint>
Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | rawhide | CC: | gansalmon, ichavero, itamar, jforbes, jonathan, kernel-maint, labbott, madhu.chinakonda, mchehab, pcahyna, rjones, tbzatek
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-05-02 08:55:16 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 910269 | |
Attachments: | | |
Description
Richard W.M. Jones
2017-08-03 21:44:22 UTC
I bisected this to:

5c279bd9e40624f4ab6e688671026d6005b066fa is the first bad commit
commit 5c279bd9e40624f4ab6e688671026d6005b066fa
Author: Christoph Hellwig <hch>
Date:   Fri Jun 16 10:27:55 2017 +0200

    scsi: default to scsi-mq

    Remove the SCSI_MQ_DEFAULT config option and default to the blk-mq I/O
    path now that we had plenty of testing, and have I/O schedulers for
    blk-mq.  The module option to disable the blk-mq path is kept around
    for now.

    Signed-off-by: Christoph Hellwig <hch>
    Signed-off-by: Martin K. Petersen <martin.petersen>

:040000 040000 57ec7d5d2ba76592a695f533a69f747700c31966 c79f6ecb070acc4fadf6fc05ca9ba32bc9c0c665 M drivers

To bisect this I used the following libguestfs script, which adds 1 appliance disk + 255 scratch disks (all virtio-scsi) to a VM and checks that it boots up to userspace. The crash happens before we reach userspace.

#!/usr/bin/perl -w
use Sys::Guestfs;
my $g = Sys::Guestfs->new ();
$g->set_trace (1);
$g->set_verbose (1);
# Add 255 scratch disks of 1 MiB each; with the appliance disk that makes 256.
my $i;
for ($i = 0; $i < 255; ++$i) {
    $g->add_drive_scratch (1024*1024);
}
$g->launch ();
$g->shutdown ();
print "PASSED\n"

I wrote a script that uses a binary search to find the maximum number of disks that can be added to our guest, which has 1 vCPU and 500MB RAM (no swap):

With scsi-mq enabled: 175 disks
With scsi-mq disabled: 1755 disks

Created attachment 1309205 (find-max-disks.pl)

The test I used for comment 3. This requires supermin >= 5.1.18 and a patched libguestfs:
https://github.com/rwmjones/libguestfs/tree/max-disks

I started a thread on LKML. No takers at present ...
https://lkml.org/lkml/2017/8/4/601

Patches posted to the kernel:
https://lkml.org/lkml/2017/8/10/708
and qemu:
https://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02085.html

If these are accepted then we will also need changes to libvirt and libguestfs.

Did these get picked up?

This is not fixed upstream. Please leave this bug open.

I have some more news to report on this. It was temporarily fixed in 4.15/4.16, but it has regressed again in 4.17.0-rc1.

4.15.0-0.rc2.git2.1.fc28.x86_64: >= 256 virtio-scsi disks *
4.15.0-0.rc8.git0.1.fc28.x86_64: >= 256 virtio-scsi disks *
4.16.3-300.fc28.x86_64: >= 256 virtio-scsi disks *
4.17.0-0.rc1.git1.1.fc29.x86_64: 191 virtio-scsi disks

Could this be something to do with Rawhide kernels & debug settings? How do I find out if a Rawhide kernel has debug enabled?

* The version of libguestfs I'm using doesn't allow me to add more than 256 disks.

In general, rawhide kernels have debug enabled, other than the rc*-git0.1 versions. If you want to test whether it is a debug vs non-debug issue, you can always check the kernels from the rawhide-nodebug repository.

kernel-4.17.0-0.rc3.git1.2.fc29.x86_64 (nodebug): >= 256 virtio-scsi disks

So yes, it looks like enabling debug reduces the number of virtio-scsi disks that can be added, for whatever reason. Since this is now working I'm going to close this bug as fixed upstream.

Is it really fixed? I am having a similar problem with the scsi_debug driver, bz1675071. The bug does not show up in Fedora, but this seems to be simply because scsi-mq is off by default.
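
For anyone wanting to reproduce the disk-count figures, the binary search mentioned in the description can be sketched roughly as below. This is not the attached find-max-disks.pl, only a minimal illustration of the idea. The boots_with function is a hypothetical stand-in that simulates a guest failing above a fixed limit, and the upper bound of 4096 is an arbitrary assumption. A real test would instead add that many scratch disks with add_drive_scratch and call launch, as in the bisection script. The search also assumes the result is monotonic: if N disks fail to boot, N+1 fail too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for "launch an appliance with $n scratch disks and
# check that it reaches userspace". Here it only simulates a guest that
# stops booting above $limit disks.
sub boots_with {
    my ($n, $limit) = @_;
    return $n <= $limit;
}

# Binary search for the largest $n in [$lo, $hi] for which $test->($n) is
# true. Returns $lo - 1 if even $lo disks fail.
sub find_max_disks {
    my ($lo, $hi, $test) = @_;
    my $best = $lo - 1;
    while ($lo <= $hi) {
        my $mid = int (($lo + $hi) / 2);
        if ($test->($mid)) {
            $best = $mid;       # $mid disks boot, try more
            $lo = $mid + 1;
        } else {
            $hi = $mid - 1;     # $mid disks fail, try fewer
        }
    }
    return $best;
}

# Simulated limit of 175, the scsi-mq figure reported in this bug.
my $max = find_max_disks (1, 4096, sub { boots_with ($_[0], 175) });
print "max disks: $max\n";
```

With an upper bound of 4096 this needs at most 13 boot attempts, compared with hundreds for a linear scan, which matters when each attempt is a full appliance launch.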