Bug 1047637 - kernel 3.12.5-302 page faults at boot on AMD-64 8-core piledriver
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-nouveau
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Ben Skeggs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1003217 1055241
Depends On:
Blocks:
Reported: 2014-01-01 12:31 UTC by bob
Modified: 2015-06-29 14:00 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-06-29 14:00:57 UTC
Type: Bug
Embargoed:


Attachments
screen shot of page fault error (4.10 MB, image/jpeg), 2014-01-01 18:29 UTC, bob
lspci -nvvv | grep 01:00.0 -A12 (834 bytes, text/plain), 2014-01-01 20:47 UTC, bob
lspci | grep VGA (86 bytes, text/plain), 2014-01-01 20:48 UTC, bob
lspci -nvvvv (26.48 KB, text/plain), 2014-01-01 20:48 UTC, bob
dmesg output (149.01 KB, text/plain), 2014-01-02 13:39 UTC, bob
lsmod output (3.56 KB, text/plain), 2014-01-02 13:40 UTC, bob


Links
System           ID     Private  Priority  Status  Summary  Last Updated
FreeDesktop.org  73233  0        None      None    None     Never

Description bob 2014-01-01 12:31:30 UTC
Description of problem:

The AMD-Vi page fault error continues to rear its ugly head in kernel version 3.12.5-302.fc20. At boot, several pages of status messages scroll by normally, until eventually the kernel issues a page fault warning and goes into an endless loop.

Version-Release number of selected component (if applicable):

kernel.x86-64 3.12.2-302.fc20

How reproducible:

Intermittent. In prior releases the boot problem was a 100% show-stopper, where the kernel would never allow the machine to boot.  After the FC20 upgrade the problem is now intermittent -- sometimes the kernel will process the temporary AMD-Vi page fault errors and proceed normally, but sometimes the machine will get stuck in an endless loop.  I'd say I have a 50/50 chance of a successful boot in most circumstances.  This renders the machine unreliable for unattended operation.  Reverting to 3.4.4-200.fc18 results in reliable operation.

Steps to Reproduce:
1. Upgrade to current kernel release
2. Try to boot

Actual results:

AMD-Vi page fault errors and endless loop at boot

Expected results:

normal boot.

Additional info:

Comment 1 Michele Baldessari 2014-01-01 17:15:12 UTC
Can we get a screenshot or even better the exact error messages, please?

Comment 2 bob 2014-01-01 18:29:09 UTC
Created attachment 844192 [details]
screen shot of page fault error

Page fault errors during boot. AMD-64 8-core Piledriver CPU.

Comment 3 Michele Baldessari 2014-01-01 19:57:07 UTC
Hi Bob,

OK, so it is device 01:00.0 generating those. Can you paste:
lspci -nvvv | grep 01:00.0 -A12

to this BZ please?

Also, is this an Optimus system?
Can you post "lspci | grep VGA" here, please?

thanks,
Michele

Comment 4 Michele Baldessari 2014-01-01 19:59:44 UTC
Meh, ignore the previous command and just post the full content of "lspci -nvvvv" here, please.
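
One way to confirm which device is behind the faults is to count them per PCI address in a saved kernel log. The following is a minimal sketch, assuming the messages have the form `AMD-Vi: Event logged [IO_PAGE_FAULT device=...]`. The function name `count_faults_by_device` is hypothetical, and the sample lines are made up for illustration rather than copied from the attached screenshot.

```python
import re
from collections import Counter

# Assumed message shape (not taken from the attachment):
#   AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=... address=... flags=...]
FAULT_RE = re.compile(
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"device=([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
)


def count_faults_by_device(log_text):
    """Return a Counter mapping PCI address -> number of IO_PAGE_FAULT events."""
    return Counter(match.group(1) for match in FAULT_RE.finditer(log_text))


# Illustrative sample only; a real log would come from saved `dmesg` output.
SAMPLE_LOG = """\
[   12.345678] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0017 address=0x0000000000001000 flags=0x0010]
[   12.345701] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0017 address=0x0000000000001040 flags=0x0010]
[   12.400000] some unrelated kernel message
"""

if __name__ == "__main__":
    # Print each faulting device with its event count, most frequent first.
    for device, hits in count_faults_by_device(SAMPLE_LOG).most_common():
        print(device, hits)
```

The address reported most often can then be looked up in the `lspci` output to identify the hardware.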

Comment 5 bob 2014-01-01 20:47:04 UTC
Created attachment 844238 [details]
lspci -nvvv | grep 01:00.0 -A12

results of the following command:

lspci -nvvv | grep 01:00.0 -A12

Comment 6 bob 2014-01-01 20:48:00 UTC
Created attachment 844239 [details]
lspci | grep VGA

Comment 7 bob 2014-01-01 20:48:55 UTC
Created attachment 844240 [details]
lspci -nvvvv

Comment 8 Michele Baldessari 2014-01-02 07:19:22 UTC
So the device creating these is a:
NV43 [GeForce 6600 GT]

Kernel driver in use: nouveau

Attempting to bisect the change from 3.4.4 where this works would be way too
much work, I'm afraid.

Can you get me the output of dmesg and lsmod please?

Thanks,
Michele
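
The lookup above (which kernel driver is bound to a given device) can also be done by script over a saved `lspci` dump. This is a minimal sketch, assuming the usual verbose layout in which each device block starts with its slot address and indented detail lines follow. The function name `drivers_by_slot` and the sample text, including its numeric IDs, are hypothetical and not taken from the attachment.

```python
import re

# A device block starts at column 0 with a slot such as "01:00.0".
SLOT_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s", re.IGNORECASE)
DRIVER_PREFIX = "Kernel driver in use:"


def drivers_by_slot(lspci_text):
    """Map each PCI slot to the value of its 'Kernel driver in use' line."""
    drivers = {}
    slot = None
    for line in lspci_text.splitlines():
        header = SLOT_RE.match(line)
        if header:
            # New device block: remember which slot the detail lines belong to.
            slot = header.group(1)
            continue
        stripped = line.strip()
        if slot and stripped.startswith(DRIVER_PREFIX):
            drivers[slot] = stripped[len(DRIVER_PREFIX):].strip()
    return drivers


# Illustrative sample only; real input would be the saved `lspci -nvvvv` text.
SAMPLE = """\
00:00.0 0600: 1002:5a14 (rev 02)
\tSubsystem: 1043:8496

01:00.0 0300: 10de:0140 (rev a2)
\tKernel driver in use: nouveau
"""

if __name__ == "__main__":
    print(drivers_by_slot(SAMPLE).get("01:00.0"))
```

Devices without a bound driver are simply absent from the returned mapping.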

Comment 9 bob 2014-01-02 13:39:56 UTC
Created attachment 844560 [details]
dmesg output

Comment 10 bob 2014-01-02 13:40:36 UTC
Created attachment 844561 [details]
lsmod output


thanks again.

Comment 11 Michele Baldessari 2014-01-02 18:03:12 UTC
I opened https://bugs.freedesktop.org/show_bug.cgi?id=73233 upstream for this.
Feel free to CC yourself there so we don't need to wait on me to relay stuff here.

Comment 12 Josh Boyer 2014-01-06 19:42:55 UTC
*** Bug 1003217 has been marked as a duplicate of this bug. ***

Comment 13 bob 2014-01-19 19:52:05 UTC
Still a problem with 3.12.7-300.fc20.x86_64:

https://bugzilla.redhat.com/show_bug.cgi?id=1055241

Comment 14 Josh Boyer 2014-01-20 14:24:54 UTC
*** Bug 1055241 has been marked as a duplicate of this bug. ***

Comment 15 bob 2014-01-21 08:39:55 UTC
FYI: Switching to the proprietary Nvidia video driver completely eliminates the won't-boot problem that is caused by the Nouveau driver. The proprietary driver works great. The open source driver won't let my machine boot.

Insofar as this Nouveau driver has been a plague upon Fedora since Fedora 18 and it just isn't getting fixed, I'm throwing in the towel on the Nouveau driver, and I'll deal with running the proprietary Nvidia driver because at least it will let my machine boot.

I'll follow this bug report via email updates just in case any further information is needed.  Thanks for your help.

Comment 16 bob 2014-10-01 21:01:49 UTC
Update:

I just built another system based on the AMD FX-8350 CPU and an ASUS M5A92 LE 2.0 motherboard.  This bug is persistent.

Attempting to boot the F20 DVD installation media fails when the Nouveau driver is loaded.  The AMD-Vi page-fault errors continue to occur upon booting the Live DVD if IOMMU is enabled in BIOS.  Turning off IOMMU allows the Live DVD to boot, though the install ultimately fails.  (The install fails because Anaconda won't properly bring up the ethernet adapter with the 64-bit DVD, though ethernet works fine with 32-bit installation media.)  

So it seems that I'm confirming the persistence of the Nouveau video driver bug with 64-bit F20 installation media, if IOMMU and AMD-Vi are turned on.

(I guess I'm going to need to report a new/separate bug related to the defective DVD installation medium in bringing up ethernet.)

Comment 17 Fedora End Of Life 2015-05-29 10:16:24 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 18 Fedora End Of Life 2015-06-29 14:00:57 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

