Bug 1224764
Created attachment 1029500 [details]
dmesg taken from working kernel 3.19.7-200
Created attachment 1033734 [details]
acpidump
Can you try following the instructions at https://fedoraproject.org/wiki/Kernel/EarlyDebugging to get more information? A bit more context would be helpful.
Created attachment 1035607 [details]
[1/2] backtrace of crash with lpj=2195008 loglevel=7 bootdelay=500
Booted the problematic kernel with lpj=2195008 loglevel=7 bootdelay=500 and this is all I get.
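One way to confirm which parameters a kernel actually received is to read /proc/cmdline after booting a kernel that works with the same options. The following is only a minimal Python sketch and is not part of the original report: `parse_cmdline` and `read_running_cmdline` are hypothetical helper names, and the parser is deliberately simplified.

```python
def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Bare flags without '=' map to None. Simplification (assumption):
    quoted values that contain spaces are not handled.
    """
    params = {}
    for token in text.split():
        # Split on the first '=' only, so root=UUID=abc keeps its value intact.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_running_cmdline(path="/proc/cmdline"):
    """Parse the command line of the currently running kernel (Linux only)."""
    with open(path) as f:
        return parse_cmdline(f.read())
```

Calling `read_running_cmdline()` on the affected machine and looking up `lpj` or `loglevel` in the result shows whether the boot loader passed them through as typed.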
Created attachment 1035608 [details]
[2/2] backtrace of crash with lpj=2195008 loglevel=7 bootdelay=500
With those additional kernel parameters, the crash is shown very quickly, so I guess this happens very early on boot.
Tested kernel-4.0.5-300.fc22.x86_64 and the issue still remains.
I've tested kernel-4.2.0-0.rc3.git0.1.fc24.x86_64.rpm from rawhide on this box (Fedora 22) and got the same result (panic). Please let me know if there's anything else I can test or provide.
Thanks,
I tried booting kernel 4.0.8 again with lpj=2195008 loglevel=7 bootdelay=100000, but the log messages before the backtrace are printed way too fast to read. I could only get an extremely blurry picture with my phone's video recorder. Is there any way to get the kernel to print these messages so they can be read properly?
Thanks,
Created attachment 1054976 [details]
[1/2] Kernel msgs before backtrace
Created attachment 1054977 [details]
[2/2] Kernel msgs before backtrace
You can try the (experimental) scripts I wrote to do a bisect between the f21 kernel you were using and 4.0.4. This will identify which commit is actually breaking things for you. Please see https://pagure.io/fedbisect
Hi
I tested your scripts to bisect between kernels. I had some problems with the Makefile, which the script tried to automerge a couple of times. When this happened, I ran the script again. After fixing that, the process stopped here:
    [root@zotac fedora]# ./fedbisect.sh good
    Makefile: needs merge
    Makefile: needs merge
    Makefile: unmerged (eb4eca56843a9fc205bfefed40c11927542ec368)
    Makefile: unmerged (ef748e17702f5109bf2678fb57f7929ef411d938)
    Makefile: unmerged (28126de3118a1337f9f83b94b0812ec2058a64fa)
    fatal: git-write-tree: error building trees
    Cannot save the current index state
    Makefile: needs merge
    error: you need to resolve your current index first
    829a3ada9cc7d4c30fa61f8033403fb6c8f8092a is the first bad commit
    commit 829a3ada9cc7d4c30fa61f8033403fb6c8f8092a
    Author: Jesse Gross <jesse>
    Date: Fri Jan 2 18:26:03 2015 -0800

        geneve: Simplify locking.

        The existing Geneve locking scheme was pulled over directly from
        VXLAN. However, VXLAN has a number of built in mechanisms which make
        the locking more complex and are unlikely to be necessary with
        Geneve. This simplifies the locking to use a basic scheme of a mutex
        when doing updates plus RCU on receive. In addition to making the
        code easier to read, this also avoids the possibility of a race when
        creating or destroying sockets since UDP sockets and the list of
        Geneve sockets are protected by different locks. After this change,
        the entire operation is atomic.

        Signed-off-by: Jesse Gross <jesse>
        Signed-off-by: David S. Miller <davem>

    :040000 040000 ae876f8b2255f74b093bc55339356c8e1831754c a59803cc74a79fda347cbbd5a256edcf7a898af8 M include
    :040000 040000 3ae4afd4d2b076ee7f268ce05164b93b984992e8 ddaa34292d988c43eb241e4436ad31f1a6e50b57 M net
    # first bad commit: [829a3ada9cc7d4c30fa61f8033403fb6c8f8092a] geneve: Simplify locking.
    Found your commit!
Please let me know if this makes sense or if the conflicts I had have messed up the bisect process.
Hi again,
I ran your scripts again to bisect between kernels, because I think I messed up the first run with the conflicts I mentioned. This time it reached a different bad commit, which seems to make more sense:
    ./fedbisect.sh start 3.19.8-200.fc21 4.0.0-1.fc22
    [...]
    ./fedbisect.sh bad
    No local changes to save
    659006bf3ae37a08706907ce1a36ddf57c9131d2 is the first bad commit
    commit 659006bf3ae37a08706907ce1a36ddf57c9131d2
    Author: Thomas Gleixner <tglx>
    Date: Thu Jan 15 21:22:26 2015 +0000

        x86/x2apic: Split enable and setup function

        enable_x2apic() is a convoluted unreadable mess because it is used
        for both enablement in early boot and for setup in cpu_init(). Split
        the code into x2apic_enable() for enablement and x2apic_setup() for
        setup of (secondary cpus). Make use of the new state tracking to
        simplify the logic.

        Signed-off-by: Thomas Gleixner <tglx>
        Cc: Jiang Liu <jiang.liu.com>
        Cc: Joerg Roedel <joro>
        Cc: Tony Luck <tony.luck>
        Cc: Borislav Petkov <bp>
        Link: http://lkml.kernel.org/r/20150115211703.129287153@linutronix.de
        Signed-off-by: Thomas Gleixner <tglx>

    :040000 040000 d14acf68224b6524568662dba1c3df4a5d4e8e46 979ea61c8245c1a1c47f14179f31cb96619e9357 M arch
    # first bad commit: [659006bf3ae37a08706907ce1a36ddf57c9131d2] x86/x2apic: Split enable and setup function
    Found your commit!
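For readers unfamiliar with bisecting, the idea is a binary search over the commit history between a known-good and a known-bad kernel. The sketch below is a generic illustration in Python, not the fedbisect implementation; `first_bad` and `is_bad` are hypothetical names, and the history is assumed to be linear, which real kernel history is not.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) is True.

    Assumptions: commits are ordered oldest to newest, commits[0] is known
    good, commits[-1] is known bad, and every commit after the first bad
    one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # in practice: build, boot, report good/bad
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Each step halves the remaining range, so even thousands of commits need only a dozen or so test boots. The search trusts every answer it is given, so a single wrong good/bad verdict (for example one recorded while the tree was in a conflicted state) steers it to an unrelated commit, which may be what happened in the first run.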
Regarding the conflicts I mentioned, this usually happened in the kernel Makefile:
    javier@zotac ~ % git diff
    diff --git a/Makefile b/Makefile
    index e41a335..4a7be84 100644
    --- a/Makefile
    +++ b/Makefile
    @@ -1,7 +1,7 @@
     VERSION = 3
     PATCHLEVEL = 19
     SUBLEVEL = 0
    -EXTRAVERSION = -rc4
    +EXTRAVERSION = -fedbisect-1
     NAME = Diseased Newt
     
     # *DOCUMENTATION*
Which I fixed with:
    javier@zotac ~ % git checkout -- ../Makefile
I guess restarting the process by running the ./fedbisect.sh <good|bad> script again messed up the first bisect attempt. Please let me know if you need me to test anything else. Thanks.
Now that it's clearer why the kernel crashes, I changed a BIOS setting called Local x2apic to disabled and managed to boot kernel 4.0.8, but I'm not sure what I'm missing. Still, I would like to continue debugging this problem, since this CPU enables that setting by default every time it crashes and loads the BIOS default settings.
Thanks for doing the bisect with the experimental scripts. I need to make the script work across rcs as well. The change you found makes sense and it's good to know that changing a BIOS setting works as well. I'll send a report upstream.
Actually, before I send an e-mail out, can you try the latest kernel and verify that it is still broken?
I've upgraded to the latest available kernel in f22: kernel-core-4.1.3-201.fc22.x86_64. Disabling x2apic in the BIOS boots and enabling it crashes with the same trace as before, so yes, it's still broken. Thanks.
Created attachment 1064089 [details]
possible fix for apic crash
Can you try the following patch from tglx?
Hi
The patch doesn't work. I've compiled kernel 4.1.6 with the patch following the instructions found here: https://fedoraproject.org/wiki/Building_a_custom_kernel
In order to apply the patch, I added it to the kernel.spec file in the standalone patches section:
    # Standalone patches
    Patch512: 0001-Test-patch-from-tglx.patch
and where the patches are applied:
    # Misc fixes
    ApplyPatch 0001-Test-patch-from-tglx.patch
And built the kernel. I also checked that the kernel-4.1.fc22/linux-4.1.6-201.fc22.x86_64/arch/x86/kernel/apic/apic.c file included the patch. I rebooted, enabled local x2apic in the BIOS and booted kernel 4.1.6, and the crash happened.
Created attachment 1064487 [details]
More kernel panic error msgs.
One thing I forgot to mention when doing the bisect is that one of the kernels that crashed included the following error:
Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (255 vs 0).
See the attached screenshot for more details.
Thanks for testing. Did the patch have any effect at all, or was it still the same crash? The screenshot you showed during the bisect is useful as well (at least I think so; I'll have to pass that along upstream).
The crash with the patched kernel appeared to be the same one, so I think it did not have any effect.
Can you please boot with that patch applied and add the following on the kernel command line:
    nox2apic
Thanks,
tglx
From the picture I'm seeing it's a zotac zbox. Some of them have a pin header for connecting a serial port. Does yours have one by chance?
Hi
I've booted with 4.1.6 with the patch applied and nox2apic and the machine booted fine without crashing.
Thanks,
You're right, this is a zotac zbox ID82. Unfortunately, I've checked the motherboard and COM 1 doesn't include the pin header.
> I've booted with 4.1.6 with the patch applied and nox2apic and the machine
> booted fine without crashing.
Can you please provide the dmesg of that boot?
Created attachment 1065560 [details]
dmesg from 4.1.6 patched, using param nox2apic and booting with local x2apic enabled on bios.
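When comparing dmesg dumps such as the ones attached here, it can help to look only at the APIC-related lines. The snippet below is a small Python sketch added for illustration; `apic_lines` is a hypothetical helper, and it matches on the substring "apic" only, without assuming any particular kernel message wording.

```python
def apic_lines(dmesg_text):
    """Return the lines of a dmesg dump that mention 'apic' in any case.

    This also catches 'x2apic', 'IOAPIC' and 'LAPIC' entries.
    """
    return [line for line in dmesg_text.splitlines() if "apic" in line.lower()]
```

Running it over two saved dumps (for instance one taken with nox2apic and one without) and diffing the results narrows the comparison to the relevant messages.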
Can you upload your .config as well, please?
Created attachment 1065731 [details]
Proposed fix
Can you please replace the first patch by this one. I think I identified the reason for the wreckage. Remove nox2apic from the command line again.
Thanks,
tglx
Created attachment 1065794 [details]
config for patched 4.1.6
Hi again,
The new patch works and the computer boots with local x2apic enabled in the BIOS and without the nox2apic kernel parameter. Thanks!
Created attachment 1065829 [details]
dmesg from 4.1.6 with patch 2 booting with local x2apic enabled on bios.
Created attachment 1065830 [details]
config from 4.1.6 with patch 2.
Fix hit Linus tree and is tagged for stable:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a57e456a7b28431b55e407e5ab78ebd5b378d19e
Javier, thanks for your help!
kernel-4.2.0-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-14782
kernel-4.2.0-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
    su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-14782
Hi
Is this going to be backported to Fedora 22?
Thanks,
Yes, it's in the tree. The next time a build happens it will be released.
kernel-4.2.0-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
kernel-4.1.6-201.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-15130
kernel-4.1.6-201.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
    su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-15130
kernel-4.1.6-201.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
kernel-4.1.7-100.fc21 has been submitted as an update to Fedora 21. https://bodhi.fedoraproject.org/updates/FEDORA-2015-15933
kernel-4.1.7-100.fc21 has been pushed to the Fedora 21 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
    su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-15933
kernel-4.1.7-100.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.
Created attachment 1029498 [details]
Picture of kernel panic
Description of problem:
I've upgraded from Fedora 21 to 22 (x86_64) and kernel 4.0.4 doesn't boot; it crashes with a kernel panic that includes the following:
    native_apic_mem_read+0x3/0x10
Previous kernel (kernel-core-3.19.7-200.fc21.x86_64) booted fine.
Version-Release number of selected component (if applicable):
    Name    : kernel-core
    Arch    : x86_64
    Epoch   : 0
    Version : 4.0.4
    Release : 301.fc22
How reproducible:
Always.
Additional info:
I've booted with noefi, nox2apic, acpi_rsdp=APIC but same results.
Thanks,