Bug 121902
| Summary: | [FIXED] java hangs i686 kernel (eventually), but not i586 | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Keith Irwin <keith.irwin> |
| Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
| Status: | CLOSED RAWHIDE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | rawhide | CC: | bikehead, byte, chongym, d.bz-redhat, erich, jdennis, jesus.salvo, jjweston, k.georgiou, mack.sessoms, mhw, ronny-rhbugzilla, tinfinity, wtogami |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i686 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2004-06-09 09:59:38 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 125270 | | |
| Attachments: | | | |
Description
Keith Irwin
2004-04-28 23:37:37 UTC
I take at least some of this back. Playing around with java (compiling a big project here or there, running jboss, etc) can lock things up regardless. Well, java itself locks up, which locks the terminal it's in. I can't kill -9 the processes, and when I ssh in from elsewhere and "reboot," there seems to be some trouble umounting the /home partition (where all this takes place). Where should I file this? Kernel?

I had exactly the same symptoms. System would work fine for a while, then the keyboard would stop working (sometimes sticking a key, which seems to consume all the CPU handling it). I can still SSH in. Eventually the system hangs completely. I blamed my crypto-filesystem, so tried again a bunch of times without it. I even tried a uniprocessor kernel. Still got the same problem. Finally, I downgraded to a -305 kernel I had lying around, and the problems went away. The system is a single P4-HT system, 2G memory, matrox video drivers, also running java. I think my IDE runs under 1.4.1_03 (so it's not just JDK 1.4.2_04). It's not local-X related, I had the same problem running X-over-ssh.

If you can trace it to the kernel, I'm going to move it to that "component" in hopes that someone will get a look at it sooner rather than later.

Now, if I could just find that kernel ...

Adding arjanv to the CC list as he's the contact for kernel related matters, I think.

And, finally, changing summary.

Spelling. (I need my coffee.)

Can I get slabtop and dmesg output?

I'd be happy to oblige: are you talking just a normal dmesg output once the machine is booted? While java is running (before things crash)? And how do I give you slabtop info?

Created attachment 99791 [details]
dmesg output
Here's the dmesg output.
Created attachment 99792 [details]
Snapshot of slabtop output.
Here's the slabtop output.
Created attachment 99794 [details]
Slabtop output with java running (jboss).
Started jboss, here's slabtop. Will try to do something "post crash" next.
Created attachment 99795 [details]
Slabtop output after key lock up (in X).
Here it is after the keys have locked.
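The dmesg and slabtop snapshots requested and attached above can also be collected non-interactively. The following is only a minimal Python sketch of one way to do that, not what anyone in this thread actually ran. It assumes that `slabtop` accepts `-o` (print one screen and exit) and that the user is allowed to read the slab statistics (often root only). The names `SNAPSHOT_COMMANDS`, `capture` and `save_snapshots` are made up for illustration.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping of output file name to the command that fills it.
# "slabtop -o" is assumed to print a single screen and then exit.
SNAPSHOT_COMMANDS = {
    "dmesg.txt": ["dmesg"],
    "slabtop.txt": ["slabtop", "-o"],
}


def capture(argv, run=subprocess.run):
    """Run one command and return its standard output as text.

    `run` is injectable so the function can be exercised without
    actually executing dmesg or slabtop.
    """
    result = run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def save_snapshots(directory, run=subprocess.run):
    """Write each command's output to a file in `directory`; return the paths."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for filename, argv in SNAPSHOT_COMMANDS.items():
        path = out_dir / filename
        path.write_text(capture(argv, run=run))
        paths.append(path)
    return paths
```

Calling `save_snapshots("before-java")` before starting java and `save_snapshots("after-lockup")` from an ssh session afterwards would give two sets of files that can be attached to the bug and compared.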
Minor correction - I downgraded to kernel-2.6.5-1.315.

I run FC2 T3 i386 on an AMD-64. I can compile the kernel and do other large compile jobs no problem. However, when I compile a large java project we are developing, the system locks hard: no response, no ssh. It's completely dead. Reboot time. I've tried different Sun VMs (1.4.1_03, 1.4.2_4) and they all do it. I hope this gets fixed before release or FC2 will be completely useless as my java development platform.

I just wanted to add that on my machine running X has nothing to do with it. I booted the machine at init level 3 and ran my large compile and it locked the machine in the exact same way.

Are there any large open-source (or otherwise freely available) Java programs that can be compiled in order to reproduce this bug? (If I could reproduce it then maybe I could try to narrow down the cause of the bug.)

I can confirm this bug; it bit me the other day. Symptoms are as in the original report: keyboard not working, the letter "e" repeated forever in the currently focused window (I was typing "locate" in a terminal when this occurred), java process resisting any attempt at a kill -9 (and an empty strace, too). At the moment the keyboard stopped working, I was simply testing the code (started through Java Web Start) and nothing was being compiled. Of course the bug could have been actually triggered a few seconds before, when it was still compiling. For what it's worth, my code uses a bit of JNI and two ORBs (MICO on the native side and Sun's own on the Java side). This was with the Java 1.5 beta. System is a HT P4 2.8GHz with 2GB of memory, SATA, GeForce4 (nv driver) and a Radeon9200 PCI (open-source driver).

Same problem here. It drives me crazy. After recompiling one kernel with the settings I like (e.g. preemptive option), the PC does not completely crash, but only the Java process crashes/stalls and uses 100% CPU time. It cannot be killed.
You can use eclipse and try a number of features; sooner or later the whole PC crashes. It must be new and related to one of the most recent kernels; with test 2 I had no problem. Test 3 does not even boot on one of my PCs.

Please try the latest i686 kernel, currently 2.6.5-1.358. Then try booting with kernel parameter 'vdso=0'. If that does not change any behavior, see if the i586 kernel of the same version works any better.
http://java.sun.com/j2se/1.5.0/download.jsp
Also out of curiosity, is behavior improved any with Sun's Java 1.5.0 beta? Please test everything and report back.

So far I have tried it with 2.6.5-1.358 i686 with and without 'vdso=0' and get the same failure. I would try it with the i586 kernel, but can someone tell me a way to install it over an i686 kernel without reinstalling everything? Removing the kernel seems dangerous. Perhaps I do a "rpm -hiv --force"? I understand this is a test machine, but I would like to do the replace in the safest manner possible.

Safest way would be to install an older kernel (hopefully you still have one lying around; if not, then http://people.redhat.com/arjanv/2.6 still had 356 last time I checked), reboot into that, remove 358, then install the i586 358 and reboot into it.

Switching to the i586 kernel made the problem go away. I hammered the system with multiple simultaneous java compiles and runs without a problem. With an i686 kernel it would have definitely locked the system up. I've verified this on two different systems. (The weird thing is that I'm positive I installed the i586 kernel but doing uname -a yields "Linux kelly 2.6.5-1.358 #1 Sat May 8 09:00:01 EDT 2004 i686 athlon i386 GNU/Linux"; I can't find i586 mentioned anywhere except in the rpm.)

rpm -q --qf '%{name}-%{version}-%{release}.%{arch}\n' kernel
Use this command to display archs of installed kernels. Use "kernel-smp".

Oops, the above comment might look funny. If you see a yen symbol before the "n" at the end, it is really a backslash.
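The `rpm -q --qf` query quoted above prints one `name-version-release.arch` string per installed kernel package, which is how the i586 and i686 builds can be told apart when `uname -a` reports i686 either way. As a minimal sketch only, the Python below pulls the architecture suffix out of such lines. The function names and `SAMPLE_OUTPUT` are hypothetical; the sample is made-up illustrative data, not output from any machine in this report.

```python
def package_arch(query_line):
    """Return the architecture suffix of one 'name-version-release.arch' line."""
    line = query_line.strip()
    # The arch is everything after the last dot, e.g. "i586".
    base, dot, arch = line.rpartition(".")
    if not dot or not base or not arch:
        raise ValueError("not a name-version-release.arch string: %r" % query_line)
    return arch


def installed_arches(rpm_output):
    """Map each non-empty line of the query output to its architecture."""
    return {
        line.strip(): package_arch(line)
        for line in rpm_output.splitlines()
        if line.strip()
    }


# Made-up sample output for illustration; real output depends on the machine.
SAMPLE_OUTPUT = "kernel-2.6.5-1.356.i686\nkernel-2.6.5-1.358.i586\n"
```

Passing the real output of the query (for `kernel`, or `kernel-smp` on SMP systems) to `installed_arches` would show at a glance which build of each version is installed.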
Also I meant use "kernel-smp" if you have SMP kernels instead of uniprocessor.

In order to diagnose this kernel problem, the kernel developers will probably need either Open Source or 'Free as in Beer' test cases that they can run on their own machines. If you know of any 100% reliably reproducible cases that are legally distributable, please post URLs for download. It would save kernel developer time if you can provide detailed installation, build, and reproducing instructions too.

I tried to reproduce it using compiles of eclipse and jboss, but no luck. Still trying to figure out what is special about my build.

Thank you! With "kernel-2.6.5-1.356.i586" my system can finally run Java apps again :-) I have a P4 2.6 GHz (not HT). Would be interesting to see what the cause is, perhaps a compiler bug or so?

Complete lockup with FC2 when:
- executing twgcon (IBM Director 4.12 console component), freely (as in beer) installable on qualified IBM equipment (ThinkPad, xSeries, ...);
- with the IBMJava2-JRE-1.4.1-8 RPM (from RHEL3 lacd) instead of the standard included IBM Java 1.3 (the latter does not run on FC);
- reproducible, does not happen on FC1;
- machine is pingable, but cannot be ssh'd into; Alt-SysRq works.

http://people.redhat.com/arjanv/2.6/RPMS.kernel/
Please test the newer i686 kernels from here. Also note that the i586 kernel can be used as a temporary workaround for now.

WRT Comment #29: I am already running 2.6.6-1.370 (evaluating possible ACPI & IEEE1394 fixes), but, this being my production machine (yeah I know), I am not very inclined to test complete lockups if there is no indication in the rpm changelogs that possible fixes/workarounds for e.g. this particular problem are being worked on. In other words, I really appreciate the (sparse) bugzilla #xyz references in the changelogs. :)

I'm getting my machine locking up whenever I load an applet in Epiphany. Also, if I start Eclipse through a Gnome launcher my machine locks up too.
But if I start Eclipse from the command line, or configure the launcher to "run in a terminal", it seems to work fine, though I haven't had the system installed long enough to know if I run into other problems with more use of Eclipse.

I started having this problem after fc2t2 as well. I'm trying to stress test apache tomcat 5.x. I get a complete lock up of fc2 when starting tomcat using the ibm sdk 141. If I use sun sdk 1.4.2_04, tomcat starts but then folds 15 minutes into a stress test (out of memory errors in catalina.out file; however, "top" shows fc2 has plenty of physical memory available).

I also experienced a java process under Fedora Core 2 (kernel 2.6.5-1.358smp) failing to terminate... consuming 99% cpu, and kill -9 has no effect. Had to reboot to clear it, though the machine was not "locked up". Sun java version 1.4.2_04-b05. I was running just a regular Java program which I tried to kill with "ctrl-C". I don't believe I was compiling at the time, though I may have been. I had previously been running core 1 on the same machine for a long time without ever seeing this problem. 2 days after the core 2 upgrade, this happens. Definitely seems to be a core 2 specific issue! I'm not running a "test" release but the "official" release, fully up-to-date as of today.

How do you install a 586 kernel? I get conflicts with the 686 kernel. Do you do a force, or do you boot in rescue mode off the CD?

I've gotten my java apps to work using "setarch -3 i386 (command)".

Vanilla RHEL 3 Update 2 on Opteron + Sun JVM 1.4.2 == Abort. Just typing "java" to get command line args causes it.

Please try kernel-2.6.6-1.422. I think mingo found a kernel fix for this, but I am not sure if it made it into the test kernels yet.

First of all, I have been running the 414 i686 kernel with no problems, where I had a problem with 358 i686 and had to run with an i586 kernel. I tried to install 422, but I get the following error when running as root.
Preparing...
########################################### [100%]
1:kernel ########################################### [100%]
memlock: Cannot allocate memory
Couldn't lock into memory, exiting.
mkinitrd failed so I do it by hand:
sudo mkinitrd /boot/initrd-2.6.6-1.422.img 2.6.6-1.422
memlock: Cannot allocate memory
Couldn't lock into memory, exiting.
I tried this twice. Have others successfully installed this kernel?

With kernel-2.6.6-1.422, IBM Director 4.12 actually runs without hardlocking my machine.

I've overcome my mkinitrd problem (losetup couldn't allocate memory for a new loop device?) and installed 422. I am not seeing the lockups that I saw before.

The 686 build of kernel-2.6.6-1.422 is working for me too, running WebSphere Application Server 5.1 with IBM's JDK 1.4.1. kernel-2.6.5 would hard-lock the system when starting the app server; I had been using the 586 build as a workaround.

I am guessing the memlock problem is due to a bug in that kernel revision that was since fixed. In any case I believe this bug is now fixed for the next FC2 update kernel. Keeping bug open for now so it is easier for others to find.

Can someone provide more information about the fix? Or will this information be included inside the changelog?

Pascal, you can find more information in this kernel issue: http://bugzilla.kernel.org/show_bug.cgi?id=2839

Created attachment 103728 [details]
syslog
I've just updated FC1 to FC2. Java is unstable, I have some 50+ logs from yesterday (Unexpected Signal : 11). This morning, the system hung up with this in the syslog (the whole file is attached):
Sep 11 12:38:46 charizard kernel: kernel BUG at mm/rmap.c:348!
[...]
Sep 11 12:38:46 charizard kernel: Process java (pid: 7850, threadinfo=e12c0000 task=e024cc50)
Linux charizard 2.6.8-1.521.stk16 #1 Fri Sep 3 08:45:37 CDT 2004 i686 i686 i386 GNU/Linux
I'll try a 586 kernel, but have to build a >4K stack if the system is to stay the same (NVidia driver - hence the taint). Can provide logs if useful, open new bug, report elsewhere, etc. etc., please let me know.

Jim, we absolutely cannot support any use with the nvidia driver. Also we do not support if you rebuild the kernel yourself, or use non-standard kernels from 3rd parties. Given that nobody else complained about this problem for a LONG TIME, I wonder if there is something wrong with your configuration or 3rd party kernel.

Hi Warren, the problem seems to have come back again. I updated the kernel to 2.6.8-1.521, and this is the error I get with IBM's Java SDK 1.4.1:
JVMDG080: Cannot find class com/ibm/jvm/Trace
JVMXM012: Error occurred in diagnostics initialization(2)
Could not create the Java virtual machine.
Blackdown's Java 1.4.1 still runs OK, though I'm not sure if it will hang after a long period of time.

Does Sun java have this problem? It causes the kernel and entire system to hang? The original problem was that Java would cause the kernel to fail. If Java itself is failing, then maybe it is just a java problem?

After trying lots of kernels and after a complete reinstall (no NVIDIA driver yet), I changed out the memory. This has definitely improved the situation, although galeon, some gnome_applets and Java still occasionally crash.
FC1 had been stable for some considerable time (>year?), so either there was a hardware failure coincident with the FC2 upgrade, or something in FC2 works the memory harder/differently from FC1, exposing a pre-existing, but formerly benign, hardware problem. (I considered installing FC1 to determine this, but couldn't be bothered.) Either way, absent new syslog messages, there's no reason to believe the current crashes are kernel related. Sorry to cause trouble.
PS The IBM problem seems well known (also bizarre). The advice seems to be to wait for the next update from IBM. Why this would be considered even vaguely kernel related is a mystery to me.

Updating to kernel 2.6.8-1.521 from kernel-2.6.7-1.494.2.2 causes this com/ibm/jvm/Trace error for me as well. According to the IBM folks this will be fixed in their next JDK update, and in the meantime there is a workaround of setting LD_ASSUME_KERNEL=2.4.
http://www-106.ibm.com/developerworks/forums/dw_thread.jsp?message=4203085&cat=10&thread=60016&forum=367#4203085

In case anyone is still following along, IBM's Java SDK 1.4.2 build cxia32142-20040926 works on kernel 2.6.8-1.521. If you are a nut case like me trying to run WebSphere App Server on this kernel, you need WAS 5.1.1 with the JDK update from here: http://www-1.ibm.com/support/docview.wss?rs=180&uid=swg24007893 ... at least until 5.1.2 is out. :)

That fix mentioned in http://bugzilla.kernel.org/show_bug.cgi?id=2839 ... does it only apply for x86_64? Given what was changed was:
--- arch/x86_64/mm/fault.c.orig 2004-06-10 19:51:45.000000000 +0200
+++ arch/x86_64/mm/fault.c 2004-06-10 20:38:38.000000000 +0200