Bug 228689
Summary: | 2.6.19: EFAIL on MPATH failback | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Charlie Bennett <ccb> |
Component: | kernel | Assignee: | LVM and device-mapper development team <lvm-team> |
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 6 | CC: | bmarzins, cebbert, davej, mchristi, triage, wtogami |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | i386 | | |
OS: | Linux | | |
Whiteboard: | bzcl34nup | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2008-05-06 19:13:47 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Charlie Bennett
2007-02-14 14:48:36 UTC
What does the log look like when failover works the way it should? And exactly which rpm is multipath-tools? I can't find it anywhere on the Fedora site or distro CD.

I'll scan my logs for a successful failover-failback test in the morning and snip my logs for you. The RPM which packages multipath-tools is device-mapper-multipath. According to my records, it hasn't been updated since the original release of FC6. I think the reason it passes with no I/O load is probably that the kernel sees the failover / failback request from userspace, not from the LLD in the kernel. In the I/O load case, the kernel sees the failure, and this causes a removal of the SCSI target / device, which multipath does not handle well.

SCSI commands hold a reference to the SCSI device, so if there were commands lingering then the device would not be completely removed and you would hit the above errors. This is probably not likely given the trace in the first comment (looks like IO is being failed correctly).

The more likely problem is that older dm/dm-multipath versions could not make changes to the dm table if IO was running through a device on it. So if you do IO, fail all paths, and then have dm-multipath internally queue the IO (use queue_if_no_path or no_path_retry or whatever it is called in the version you are using), then dm itself holds a reference to the open SCSI devices because of the table reference. When the rport is deleted and it tries to delete the SCSI devices, it only partially cleans up the mess. Because dm still has a reference, the device still sits around, and when we fix the link and try to re-add devices at the SCSI level we hit the nice collision we see in the log.

There are workaround patches for this problem, but James Smart and Hannes have worked on real fixes for the SCSI cleanup and re-addition problem. For dm, the NEC guys fixed the table/queueing problem so we can now completely remove the devices, and so that also sort of fixes the problem you saw. I am not sure which kernel those patches went into. We should ask Alasdair.

(In reply to comment #4)
> For dm, the NEC guys fixed the table/queueing problem so we can now completely
> remove the devices and so that also sort of fixes the problem you saw. I am not
> sure what kernel those patches went in. We should ask alasdair.

Is there an FC6 kernel based on 2.6.21? If so, it has the dm fixes, which would prevent getting into the improperly cleaned-up SCSI devices mess.

Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6 thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change the bug to the respective version. If you are unable to do this, please add a comment to this bug requesting the change.

Thanks for your help, and we apologize again that we haven't handled these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
to ensure this doesn't happen again.

And if you'd like to join the bug triage team to help make things better, check out
http://fedoraproject.org/wiki/BugZappers

This bug is open for a Fedora version that is no longer maintained and will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
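
Editor's note on the queueing options named in the thread (queue_if_no_path / no_path_retry): these are normally set in /etc/multipath.conf. The fragment below is only a sketch of how such a setup is commonly written. It is not taken from the reporter's system, and the exact option names and accepted values may differ in the device-mapper-multipath version shipped with FC6, so treat every value here as an assumption to check against the local man page.

```
# /etc/multipath.conf (illustrative fragment, assumed values)
defaults {
    # "queue" keeps queueing I/O while no path is available, which is
    # the situation described above where dm holds on to the devices.
    # A number instead retries that many path-checker intervals and
    # then fails the I/O; "fail" fails it immediately.
    no_path_retry    queue
}

# Older-style equivalent, set through the device-mapper feature string:
# defaults {
#     features    "1 queue_if_no_path"
# }
```

With a bounded retry count instead of "queue", queued I/O is eventually failed, so dm does not hold the table reference open indefinitely after all paths are lost.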