Bug 845143
Summary: | Soft lockup on Ionics Stratus ARM computer
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | 17
Hardware: | arm
OS: | Linux
Status: | CLOSED CURRENTRELEASE
Severity: | high
Priority: | unspecified
Reporter: | Steven A. Falco <safalco>
Assignee: | Jon Masters <jcm>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | andrew, gansalmon, itamar, jonathan, kernel-maint, lkundrak, madhu.chinakonda, matthew.hirsch, pbrobinson
Target Milestone: | ---
Target Release: | ---
Doc Type: | Bug Fix
Story Points: | ---
Last Closed: | 2013-04-13 08:27:47 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
Category: | ---
oVirt Team: | ---
Cloudforms Team: | ---
Bug Blocks: | 245418
Description
Steven A. Falco
2012-08-02 00:15:34 UTC
Created attachment 601844: Complete console log showing the lockup
I'll add a "me too" - I have been having this problem (a very similar BUG message) on a sheevaplug. It could be related to the network stack. I run into it when:
1) running a Python script that streams data over the network,
2) accessing saned via xinetd,
3) at other times, seemingly at random.

I have encountered this using kernel-kirkwood-3.4.2-3.fc17.armv5tel.rpm and 3.6.0-0.rc6.git2.1.fc18.armv5tel.kirkwood. (For me, the 3.5.X kernels don't boot due to a different RTC bug.) I'll attach an example of the BUG message below.

There is at least one other person running into this bug on a dreamplug: http://forums.fedoraforum.org/showthread.php?t=284430

I have noticed that the RTC is set to a strange value after the soft lockup.

Created attachment 617361: soft lockup bug console log
Thank you for your bug reports! Could you please check whether the following patch fixes the issue for you? https://github.com/lkundrak/linux/commit/e88886956244c67d924d612c9a8af7d01f1adc26

Unfortunately, I no longer own the IONICS Stratus, so I cannot try out the proposed fix. However, the patch you mentioned sure seems like a good candidate. Matt - you had a similar problem. Are you in a position to try the fix?

I will try this with my sheevaplug - it will be a couple of weeks before I can get to it though. I'm in the middle of a huge deadline. Sorry!

Actually, if you have a compiled kernel rpm package I can try it sooner, but I don't have an ARM build setup going right now.

Matt, this is what I use on my Guruplug currently: http://fedorapeople.org/~lkundrak/kernel-kirkwood/ Hope it helps!

Lubomir, does that kernel have the patch you suggested applied? I have installed it on my sheevaplug, and all seems well so far. I'll report back if any soft lockups happen. Thanks! Matt

Just a followup - looks good so far. I haven't had any lockups under the circumstances that previously caused them. However, I do get messages like these in the messages file:

[92579.500850] mv_xor mv_xor.0: mv_xor_clean_completed_slots 362
[92579.500914] mv_xor mv_xor.0: mv_xor_prep_dma_memcpy dest: 18ad9960 src 1d0cb084 len: 258 flags: 18
[92579.500936] mv_xor mv_xor.0: mv_xor_prep_dma_memcpy sw_desc de692000 async_tx de69203c
[92579.500951] mv_xor mv_xor.0: mv_xor_tx_submit sw_desc de692000: async_tx de69203c
[92579.500968] mv_xor mv_xor.0: mv_xor_start_new_chain 270: sw_desc de692000
[92579.500980] mv_xor mv_xor.0: activate chan.
[92579.501000] mv_xor mv_xor.0: intr cause 3
[92579.501014] mv_xor mv_xor.0: mv_xor_device_clear_eoc_cause, val 0xfffffffe
[92579.501036] mv_xor mv_xor.0: __mv_xor_slot_cleanup 402
[92579.501050] mv_xor mv_xor.0: current_desc 7844d40
[92579.501062] mv_xor mv_xor.0: mv_xor_clean_completed_slots 362
[92579.501097] mv_xor mv_xor.0: mv_xor_clean_slot 379: desc de692000 flags 18
[92579.501113] mv_xor mv_xor.0: mv_xor_free_slots 255 slot de692000
[92579.501164] mv_xor mv_xor.0: mv_xor_clean_completed_slots 362

Thank you for your testing, Matt. What you see is expected -- the build is based on a Rawhide kernel that has tons of debugging and sanity checks enabled (this is from CONFIG_DMA_DEBUG or something like that), equivalent to kernel-debug from release builds. It is probably a bit slower than a release kernel as well, so if you intend to keep using this, you may want to reapply the patch to a release kernel instead. Have a nice 2013!

As mentioned on the linux-arm-kernel mailing list, removing these checks is wrong. The hardware imposes a minimum buffer size of 16 bytes. In practice, any DMA operation shorter than 128 bytes is rejected, because doing the copy in software is faster than setting up the DMA, handling the interrupt when it's finished, scheduling and running a tasklet, running the completion callback, etc. The crypto/async_tx/async_memcpy.c:async_memcpy() function has no issue with this minimum size limit: it falls back to memcpy().

Have you tried the following? On low-memory systems it basically keeps a little memory free for things like network buffers. We've seen it help on some other ARM platforms.

cat << EOF > /etc/sysctl.conf
vm.min_free_kbytes = 12288
EOF

Two notes:
I tried kernel-kirkwood-3.7.3-101.fc17.armv5tel, and it still seemed to suffer from this problem.
I have been running Lubomir's patch for two months without encountering this bug, so something he did should give a clue as to what the problem is, even if it's not the "right" fix.
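A note on the sysctl suggestion in this thread: the heredoc as posted redirects with `>`, which replaces the whole of /etc/sysctl.conf and so discards any other settings stored there. The following is a minimal sketch of a more cautious variant that updates a single key and leaves the rest of the file alone. The function name `set_sysctl_key` is hypothetical and made up for illustration; it is not part of initscripts or procps.

```shell
#!/bin/sh
# Hypothetical helper (not a standard tool): set "KEY = VALUE" in a
# sysctl-style config file while keeping every other line intact.
set_sysctl_key() {
    sfile=$1
    skey=$2
    svalue=$3

    # Create the file if it does not exist yet.
    [ -f "$sfile" ] || : > "$sfile"

    # Copy every line whose key (text before '=', whitespace removed)
    # differs from the one being set; comments and blank lines survive.
    awk -F= -v k="$skey" '{
        name = $1
        gsub(/[ \t]/, "", name)
        if (name != k) print
    }' "$sfile" > "$sfile.tmp"

    # Append the new assignment, then write back in place so the
    # original file keeps its ownership and permissions.
    printf '%s = %s\n' "$skey" "$svalue" >> "$sfile.tmp"
    cat "$sfile.tmp" > "$sfile" && rm -f "$sfile.tmp"
}

# Example usage (needs root, so left commented out here):
#   set_sysctl_key /etc/sysctl.conf vm.min_free_kbytes 12288
#   sysctl -p                       # reload /etc/sysctl.conf
#   sysctl -n vm.min_free_kbytes    # print the value now in effect
```

Reading the value back with `sysctl -n vm.min_free_kbytes` (or from /proc/sys/vm/min_free_kbytes) before changing anything also shows whether the distribution has already set it.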
It seems vm.min_free_kbytes = 12288 is already set by default by initscripts-9.37.2-1.fc17.armv5tel, so that does not address this issue.

Please retest with 3.8.x.

Please reopen this bug. I still have this problem with the latest kernel, 3.8.3-102.fc17.armv5tel.kirkwood.

Please reopen this bug. The current kernel still has this problem.