Bug 1471302
Summary: | 4.12 renders mtx unable to manipulate a tape library | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Jason Tibbitts <j>
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | rawhide | CC: | fedora-kernel-scsi, gansalmon, ichavero, itamar, jonathan, kernel-maint, labbott, madhu.chinakonda, mchehab
Target Milestone: | --- | Keywords: | Reopened
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-04-06 18:39:39 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Jason Tibbitts
2017-07-14 23:23:43 UTC
So the last good build was 4.12.0-0.rc0.git2.1.fc27 which corresponds to v4.11-4395-g89c9fea. The next successful koji build is 4.12.0-0.rc0.git4.1.fc27 which corresponds to v4.11-8539-gaf82455. Sadly there are three days of merge window in between.

I know I need to either bisect down further in there or manually examine the changes between those two commits but sadly I don't really have any idea how. I only ever learned how to use mock to build kernel packages, so if you give me a patch I can easily test it but I don't know how to take the vanilla upstream source and do a proper bisect.

In retrospect, that was rather poor wording. The last good build is 4.12.0-0.rc0.git2.1.fc27. The next koji build after that which actually produced packages is 4.12.0-0.rc0.git4.1.fc27 which I have tried and found to be bad. So the problem came in between those two releases, which correspond to v4.11-4395-g89c9fea (good) and v4.11-8539-gaf82455 (bad).

I spent the whole day building kernels and git bisect gives me:

    5c66d9393f583778e8dc1ee6a69c5bbe9ab28eaa is the first bad commit
    commit 5c66d9393f583778e8dc1ee6a69c5bbe9ab28eaa
    Author: NeilBrown <neilb>
    Date:   Mon Apr 10 12:15:13 2017 +1000

        scsi: ibmvfc: don't check for failure from mempool_alloc()

        mempool_alloc() cannot fail when passed GFP_NOIO or any other
        gfp setting that is permitted to sleep. So remove this pointless code.

        Signed-off-by: NeilBrown <neilb>
        Acked-by: Tyrel Datwyler <tyreld.ibm.com>
        Signed-off-by: Martin K. Petersen <martin.petersen>

    :040000 040000 2ade2de52266bc7b55c0be0d0b1c371b4dc72a1a 223c5a3fb34016c7dc230177773ff63185126442 M	drivers

That doesn't feel related to me, but I'll revert, rebuild and see what happens.
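For a rough sense of why narrowing this range down takes a full day of kernel builds, here is a small back-of-the-envelope sketch. It assumes the good commit is an ancestor of the bad one, so that the difference between the two `git describe` counts (4395 and 8539) approximates the number of candidate commits, and it models bisection as plain halving. The function names are made up for illustration, and real `git bisect` on a merge-heavy history may need a step more or less.

```c
#include <assert.h>

/*
 * Approximate number of candidate commits between two "git describe"
 * counts (the N in v4.11-N-gHASH). Only meaningful under the assumption
 * that the good commit is an ancestor of the bad one.
 */
unsigned long commits_between(unsigned long good_n, unsigned long bad_n)
{
    assert(bad_n >= good_n);
    return bad_n - good_n;
}

/*
 * Worst-case number of build-and-test rounds needed to narrow
 * `candidates` commits down to a single one by repeated halving.
 */
unsigned int bisect_steps(unsigned long candidates)
{
    unsigned int steps = 0;

    while (candidates > 1) {
        candidates = (candidates + 1) / 2; /* keep the larger half */
        steps++;
    }
    return steps;
}
```

With the numbers from this report, `commits_between(4395, 8539)` is 4144 and `bisect_steps(4144)` is 13, so roughly a dozen kernel builds per bisection run.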
I redid the bisection, hopefully without screwing it up, and this time it ended at:

    commit 28676d869bbb5257b5f14c0c95ad3af3a7019dd5
    Author: Johannes Thumshirn <jthumshirn>
    Date:   Fri Apr 7 09:34:15 2017 +0200

        scsi: sg: check for valid direction before starting the request

        Check for a valid direction before starting the request, otherwise we
        risk running into an assertion in the scsi midlayer checking for valid
        requests.

        [mkp: fixed typo]

        Signed-off-by: Johannes Thumshirn <jthumshirn>
        Link: http://www.spinics.net/lists/linux-scsi/msg104400.html
        Reported-by: Dmitry Vyukov <dvyukov>
        Signed-off-by: Hannes Reinecke <hare>
        Tested-by: Johannes Thumshirn <jthumshirn>
        Reviewed-by: Christoph Hellwig <hch>
        Signed-off-by: Martin K. Petersen <martin.petersen>

which does look like it could be related.

Supposedly fixed by the following, but I'll do some more builds to test. If this does fix things I'll send it to stable and ask that it be included in our 4.12 kernels so my backups don't break.

    commit 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47
    Author: Johannes Thumshirn <jthumshirn>
    Date:   Fri Jul 7 10:56:38 2017 +0200

        scsi: sg: fix SG_DXFER_FROM_DEV transfers

        SG_DXFER_FROM_DEV transfers do not necessarily have a dxferp as we set
        it to NULL for the old sg_io read/write interface, but must have a
        length bigger than 0. This fixes a regression introduced by commit
        28676d869bbb ("scsi: sg: check for valid direction before starting the
        request")

        Signed-off-by: Johannes Thumshirn <jthumshirn>
        Fixes: 28676d869bbb ("scsi: sg: check for valid direction before starting the request")
        Reported-by: Chris Clayton <chris2553>
        Tested-by: Chris Clayton <chris2553>
        Cc: Douglas Gilbert <dgilbert>
        Reviewed-by: Hannes Reinecke <hare>
        Tested-by: Chris Clayton <chris2553>
        Acked-by: Douglas Gilbert <dgilbert>
        Signed-off-by: Martin K. Petersen <martin.petersen>

    diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
    index 21225d6..1e82d41 100644
    --- a/drivers/scsi/sg.c
    +++ b/drivers/scsi/sg.c
    @@ -758,8 +758,11 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
     		if (hp->dxferp || hp->dxfer_len > 0)
     			return false;
     		return true;
    -	case SG_DXFER_TO_DEV:
     	case SG_DXFER_FROM_DEV:
    +		if (hp->dxfer_len < 0)
    +			return false;
    +		return true;
    +	case SG_DXFER_TO_DEV:
     	case SG_DXFER_TO_FROM_DEV:
     		if (!hp->dxferp || hp->dxfer_len == 0)
     			return false;

Turns out that patch doesn't help as far as I can tell. A build of current master HEAD still has the problem, and cherry picking that patch on top of a clean v4.12 also still has the problem. Reverting 28676d869bbb5257b5f14c0c95ad3af3a7019dd5 on top of a clean 4.12 checkout works just fine.

Created attachment 1305032
Upstream fix manually applied against 4.12.
The original author decided to mostly revert the original patch and go with a much simpler check. I manually applied that against 4.12 and it's also fine. The resulting patch is attached.
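To make the effect of the `sg_is_valid_dxfer()` hunk quoted earlier in this report more concrete, here is a minimal userspace sketch of the check before and after commit 68c59fcea1f2. It is not kernel code: the `struct io_hdr` type, the `DXFER_*` enum, and the function names are simplified stand-ins invented for illustration, the field types are simplified, and the fall-through `return false` for other directions is an assumption, since the quoted diff does not show that part of the function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins, not the kernel's definitions. */
enum dxfer_dir { DXFER_NONE, DXFER_TO_DEV, DXFER_FROM_DEV, DXFER_TO_FROM_DEV };

struct io_hdr {
    enum dxfer_dir dxfer_direction;
    int dxfer_len;   /* simplified type for this sketch */
    void *dxferp;
};

/* The check as it reads on the "-" side of the quoted diff. */
bool valid_before(const struct io_hdr *hp)
{
    switch (hp->dxfer_direction) {
    case DXFER_NONE:
        return !(hp->dxferp || hp->dxfer_len > 0);
    case DXFER_TO_DEV:
    case DXFER_FROM_DEV:
    case DXFER_TO_FROM_DEV:
        /* Every data transfer needs both a buffer and a length. */
        return hp->dxferp != NULL && hp->dxfer_len != 0;
    }
    return false; /* assumption: other directions are not modelled */
}

/* The check as it reads on the "+" side of the quoted diff. */
bool valid_after(const struct io_hdr *hp)
{
    switch (hp->dxfer_direction) {
    case DXFER_NONE:
        return !(hp->dxferp || hp->dxfer_len > 0);
    case DXFER_FROM_DEV:
        /* A NULL dxferp is now tolerated for reads. */
        return hp->dxfer_len >= 0;
    case DXFER_TO_DEV:
    case DXFER_TO_FROM_DEV:
        return hp->dxferp != NULL && hp->dxfer_len != 0;
    }
    return false; /* assumption: other directions are not modelled */
}

/* A read with no dxferp is rejected before the patch and accepted after;
 * a write with no buffer and no length is rejected by both versions. */
void run_examples(void)
{
    struct io_hdr rd = { DXFER_FROM_DEV, 512, NULL };
    struct io_hdr wr = { DXFER_TO_DEV, 0, NULL };

    assert(!valid_before(&rd));
    assert(valid_after(&rd));
    assert(!valid_before(&wr));
    assert(!valid_after(&wr));
}
```

Note that in this model a `DXFER_TO_DEV` request with no buffer and zero length is still rejected after the patch. Whether that is the case mtx was hitting is not established anywhere in this report.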
Created attachment 1305357
Upstream patch fixing this issue
The attached patch was committed to the scsi-fixes tree so it should appear in 4.13. It was also sent to stable so hopefully it will appear in a 4.12 point release soon.
I should add that this has already been committed to Fedora's 4.12 series, so I'll go ahead and close this ticket out. My thanks to everyone who helped me to get this bisected and fixed.

kernel-4.12.4-300.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-14ad2c5d17

kernel-4.12.4-300.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-14ad2c5d17

This somehow got marked as reopened? It was reopened when the update was submitted, and I'm thinking that particular update was superseded by another one at some point and never actually got pushed to stable. The problem with mtx has certainly not reoccurred.