Bug 1438316 - CPU x: Machine Check 0: Bank 128: 00000000880x080 Reports
Summary: CPU x: Machine Check 0: Bank 128: 00000000880x080 Reports
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 25
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-04-03 02:31 UTC by gizmo
Modified: 2017-04-14 22:49 UTC (History)
CC List: 13 users

Fixed In Version: kernel-4.10.9-100.fc24 kernel-4.10.9-200.fc25
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-14 22:19:47 UTC
Type: Bug
Embargoed:


Attachments
mce errors (deleted), 2017-04-04 17:58 UTC, Mikhail Krutov

Description gizmo 2017-04-03 02:31:37 UTC
Description of problem:
Seeing multiple mcelog messages which appear to be generated at startup but are output to local terminal after startup.

Version-Release number of selected component (if applicable):
Problem first appeared in kernel 4.10.5-200 and continues in 4.10.6-200.

How reproducible:
For me, it occurs every time; all I have to do is restart the system or wait a while.


Steps to Reproduce:
1. Reboot
2. Open terminal (using Guake in my case, but seems to appear on any terminal, such as xterm or gnome-terminal)

Actual results:
Random Machine Check 0 errors, that seem to be various combinations of the address 00000000880x080 and CPU x (where 'x' is some number).

Expected results:
No Machine Checks unless there is an actual hardware fault.

Additional info:
I have run both RAM and CPU tests (memtest) and have not found any errors with my hardware (Dell Precision M4800 with i7-4900MQ CPU and 16 GB RAM).  I did notice that there is no /var/log/mcelog file (which is the default location, according to the mcelog man page).  Further investigation shows that mcelog is being launched as follows:

/usr/sbin/mcelog --ignorenodev --daemon --foreground

According to the mcelog man page, --foreground is intended to be used only for debugging.  Is this intended?  I certainly have not made any intentional configuration changes that would have caused mcelog's launch to be altered.

Comment 1 Mikhail Krutov 2017-04-04 10:15:00 UTC
Happens to me as well.

memtest didn't find any issues; hardware is Asus N750Vj (Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz, 16 GB RAM); same software versions. Does not happen on the 4.8.6-300.fc25 kernel.

Comment 2 Prarit Bhargava 2017-04-04 17:53:51 UTC
(In reply to Krutov Mikhail from comment #1)
> Happens to me as well.
> 
> memtest didn't find any issues; hardware is Asus N750Vj (Intel(R) Core(TM)
> i7-4700HQ CPU @ 2.40GHz, 16 GB RAM); same software versions. Does not
> happen on the 4.8.6-300.fc25 kernel.

Are your addresses the same as in the description?

P.

Comment 3 Mikhail Krutov 2017-04-04 17:58:55 UTC
Created attachment 1268728
mce errors

Yes, it seems so to me. I've uploaded the output of "dmesg | grep bank" as mce_errors.log.
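To compare such output against the values in the description, a small script can tally the reported banks and status values. The following is only a sketch: the line shape is assumed from this bug's title and the kernel's usual "CPU <n>: Machine Check ... Bank <n>: <hex>" message, the sample lines are made up, and summarize_mce is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line shape (not taken from the deleted attachment): a CPU number,
# the words "Machine Check", then "Bank <n>: <hex value>".
MCE_LINE = re.compile(
    r"CPU\s+(?P<cpu>\d+):\s+Machine Check.*?"
    r"Bank\s+(?P<bank>\d+):\s+(?P<status>[0-9a-fA-F]+)"
)


def summarize_mce(lines):
    """Count each (bank, value) pair and record which CPUs reported it."""
    counts = Counter()
    cpus = {}
    for line in lines:
        match = MCE_LINE.search(line)
        if match is None:
            continue  # not an MCE line
        key = (int(match.group("bank")), match.group("status").lower())
        counts[key] += 1
        cpus.setdefault(key, set()).add(int(match.group("cpu")))
    return counts, cpus


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    sample = [
        "mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 128: 0000000088000080",
        "CPU 3: Machine Check 0: Bank 128: 0000000088000080",
        "some unrelated kernel message",
    ]
    counts, cpus = summarize_mce(sample)
    for (bank, status), n in counts.items():
        print(f"bank {bank} value {status}: {n} event(s) on CPUs {sorted(cpus[(bank, status)])}")
```

Feeding it the saved log (for example, the lines of mce_errors.log) would show at a glance whether two machines report the same bank and value.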

Comment 4 Randy Barlow 2017-04-04 19:25:58 UTC
I have been receiving MCEs on my laptop (an X1 Carbon) the entire time I've had it; I suspect poor cooling is the cause. Nevertheless, they were never written to my terminals until yesterday, when I updated to kernel-4.10.6-100.fc24.x86_64. I just tried rebooting into 4.9.17-100.fc24.x86_64 and triggered an MCE there, and nothing was written to my terminals. Thus, I think something changed between these kernel versions that causes the messages to be written to the terminals. It is pretty disruptive, since I do most of my work in terminals and the messages make it difficult to use programs like tmux and vim.

Comment 5 Randy Barlow 2017-04-04 19:40:16 UTC
nekoexmachina on Freenode mentioned a workaround for this issue to me, though I haven't tried it myself. /etc/rsyslog.conf has this line:

*.emerg                                                 :omusrmsg:*

nekoexmachina suggested changing it to:

*.emerg                                                 /var/log/emergency

I don't believe this line is what is suddenly causing this issue (I suspect the kernel is now logging MCEs at emerg level when it previously logged them at some other level). But it might be a nice way to avoid having the terminals spammed. I'm not sure what other messages are logged at emerg that you might *want* to have spammed to terminals though, so this might not be a generally advisable workaround. For example, are shutdown messages sent with this config? (I'm not sure.)
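For context on why an emerg-level message reaches every terminal: a syslog priority value encodes facility * 8 + level, and level 0 is emerg, which is what the "*.emerg" selector matches for any facility. The sketch below decodes such a value; decode_pri and matches_emerg_selector are hypothetical helper names, and nothing here is taken from the rsyslog or kernel sources.

```python
# Standard syslog severity names, indexed by level number (0 is most severe).
LEVELS = ("emerg", "alert", "crit", "err", "warning", "notice", "info", "debug")


def decode_pri(pri):
    """Split a syslog priority value into (facility number, level name)."""
    facility, level = divmod(pri, 8)
    return facility, LEVELS[level]


def matches_emerg_selector(pri):
    """True if a '*.emerg' selector would match: any facility, level emerg."""
    return decode_pri(pri)[1] == "emerg"


if __name__ == "__main__":
    # Facility 0 is the kernel; values 0 and 3 are kernel emerg and kernel err.
    for pri in (0, 3, 8):
        print(pri, decode_pri(pri), matches_emerg_selector(pri))
```

Under this encoding, a kernel message moved from, say, err (3) to emerg (0) would start matching the "*.emerg" rule without any change to the rsyslog configuration, which is consistent with the suspicion above.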

Comment 6 Prarit Bhargava 2017-04-04 20:34:31 UTC
Randy, that's interesting information.  When you updated to the new kernel, did you only update the kernel or other packages as well?

P.

Comment 7 Mikhail Krutov 2017-04-04 20:35:46 UTC
Prarit, in my case I can still run kernel 4.8.6 (my pre-update kernel) and have no messages in my terminals, even without the rsyslog workaround.

Hope that info helps!

Comment 8 Randy Barlow 2017-04-04 22:24:16 UTC
labbott found an upstream kernel commit that fixes this issue:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc66afea58f858ff6da7f79b8a595a67bbb4f9a9

I think we can change the component of the bug to the kernel.

Comment 9 Randy Barlow 2017-04-04 22:25:17 UTC
Prarit, similar to what Krutov said, I can still run an older kernel that I have installed and this issue does not occur.

Comment 10 Prarit Bhargava 2017-04-10 11:42:42 UTC
(In reply to Randy Barlow from comment #8)
> labbott found an upstream kernel commit that fixes this issue:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc66afea58f858ff6da7f79b8a595a67bbb4f9a9
> 
> I think we can change the component of the bug to the kernel.

Yeah, this looks like it will fix the problem.  I suspect the next kernel rebase will pick up this patch (if it hasn't already).

P.

Comment 11 Fedora Update System 2017-04-10 23:27:25 UTC
kernel-4.10.9-200.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-3a9ec92dd6

Comment 12 Fedora Update System 2017-04-10 23:29:12 UTC
kernel-4.10.9-100.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2017-502cf68d68

Comment 13 Prarit Bhargava 2017-04-11 11:32:09 UTC
Neither of the above builds will contain linux.git commit cc66afea58f8 ("x86/mce: Don't print MCEs when mcelog is active").  The commit is in 4.11-rc6.

Moving back to ASSIGNED.

It would be interesting to get test results with 

https://koji.fedoraproject.org/koji/buildinfo?buildID=878715

P.

Comment 14 Fedora Update System 2017-04-11 18:54:42 UTC
kernel-4.10.9-100.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-502cf68d68

Comment 15 Fedora Update System 2017-04-11 19:24:51 UTC
kernel-4.10.9-200.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-3a9ec92dd6

Comment 16 Justin M. Forbes 2017-04-11 19:30:00 UTC
(In reply to Prarit Bhargava from comment #13)
> Neither of the above builds will contain linux.git commit cc66afea58f8
> ("x86/mce: Don't print MCEs when mcelog is active").  The commit is in
> 4.11-rc6.
> 
> Moving back to ASSIGNED.
> 
> It would be interesting to get test results with 
> 
> https://koji.fedoraproject.org/koji/buildinfo?buildID=878715
> 
> P.

Yes, those builds do contain commit cc66afea58f8 as "0001-x86-mce-Don-t-print-MCEs-when-mcelog-is-active.patch"

# rhbz 1438316
Patch859: 0001-x86-mce-Don-t-print-MCEs-when-mcelog-is-active.patch

Did you look at them, or just assume based on upstream stable release?
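A check for a backported patch like the one quoted above can be sketched by scanning the spec file text for matching Patch lines. This is only an illustration: find_patches is a hypothetical helper, and the spec excerpt in the example contains just the two lines quoted in this comment plus made-up surrounding lines.

```python
import re

# Matches spec lines such as "Patch859: some-file.patch".
PATCH_LINE = re.compile(r"^(Patch\d*):\s*(\S+)", re.MULTILINE)


def find_patches(spec_text, needle):
    """Return (tag, filename) pairs whose filename contains `needle`."""
    needle = needle.lower()
    return [
        (tag, name)
        for tag, name in PATCH_LINE.findall(spec_text)
        if needle in name.lower()
    ]


if __name__ == "__main__":
    # Excerpt for illustration; only the rhbz comment and Patch859 line
    # come from this bug, the other lines are placeholders.
    spec_excerpt = """\
Name: kernel
Patch001: some-other-change.patch
# rhbz 1438316
Patch859: 0001-x86-mce-Don-t-print-MCEs-when-mcelog-is-active.patch
"""
    for tag, name in find_patches(spec_excerpt, "mcelog"):
        print(tag, name)
```

Searching the spec for the patch name avoids relying on the upstream stable version number, which says nothing about patches carried on top of it.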

Comment 17 Prarit Bhargava 2017-04-11 20:34:49 UTC
(In reply to Justin M. Forbes from comment #16)
> (In reply to Prarit Bhargava from comment #13)
> > Neither of the above builds will contain linux.git commit cc66afea58f8
> > ("x86/mce: Don't print MCEs when mcelog is active").  The commit is in
> > 4.11-rc6.
> > 
> > Moving back to ASSIGNED.
> > 
> > It would be interesting to get test results with 
> > 
> > https://koji.fedoraproject.org/koji/buildinfo?buildID=878715
> > 
> > P.
> 
> Yes, those builds do contain commit cc66afea58f8 as
> "0001-x86-mce-Don-t-print-MCEs-when-mcelog-is-active.patch"
> 
> # rhbz 1438316
> Patch859: 0001-x86-mce-Don-t-print-MCEs-when-mcelog-is-active.patch
> 
> Did you look at them, or just assume based on upstream stable release?

Nope, did an rpm -q --changelog on them and somehow missed it :/.  Sorry for that.

P.

Comment 18 Fedora Update System 2017-04-14 22:19:47 UTC
kernel-4.10.9-100.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 19 Fedora Update System 2017-04-14 22:49:58 UTC
kernel-4.10.9-200.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

