
Bug 528312 (udev-intel-flood)

Summary: udev takes almost 100% CPU due to xorg (intel) continuously re-initializing displays
Product: [Fedora] Fedora
Reporter: Matěj Cepl <mcepl>
Component: xorg-x11-server
Assignee: Adam Jackson <ajax>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: medium
Version:
CC: alessandro.suardi, alexvillacislasso, andy, apbartok, awilliam, bobg+redhat, bookreviewer, bschneiders, bugzilla.redhat, cbm, cindwhite1, erecio, eric.brunet, erik-fedora, fzuuzf, gbarros, jaroslav.pulchart, jirka, jrb, jruemker, karlcz, karl+rhbugzilla, knnthsrnsn, lfarkas, luto, martin, mcepl, mdl-mailing, me, mefoster, mhlavink, mishu, nkudriavtsev, opensource, paul, pbaumgar, pingou, pnewell0705, ramindeh, rcrodgers622, redhat-bugzilla, redhat, rhbugzilla, smarlow, theo148, tvujec, valent.turkovic, whanlon, xgl-maint, zingale
Target Milestone: ---
Keywords: Patch, Reopened, Triaged
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: card_GM45
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 591709
Environment:
Last Closed: 2011-11-30 15:37:30 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 432388, 591709

Attachments:

Description	Flags
/var/log/messages	none
/var/log/audit/audit.log	none
stdout from dmesg	none
strace.txt	none
strace-02.txt	none
strace-3.txt	none
ps-axc-log.txt	none
ps-axc-02.txt	none
udevadm.txt	none
udevadm-2.txt	none
/var/log/Xorg.0.log	none
'udevadm monitor --env' from right before suspend, then resume, and until things work as normal	none
upstream patch reformatted to fit Fedora kernel tree	none
Disable HDMI hotplug for 2.6.32.9	none
/proc/interrupts dump from a machine where xorg intel and apcupsd seem to fight	none
patch from the lkml discussion	none
udevadm monitor --property of kernel 2.6.32.10-94.bz528312.fc12.x86_64	none
dmesg with drm.debug=0x0f from effected system	none
messages, dmesg and xorg.0.log from effected system	none
dmesg with drm.debug=0x0f on Intel Mobile 4	none
relevant parts of /var/log/messages corresponding to previous dmesg output	none
Workaround patch	none
Output of intel_gpu_dump	none
Ported the workaround patch to kernel 3.5.3	none

Description Matěj Cepl 2009-10-11 07:42:33 UTC

Created attachment 364366 [details]
/var/log/messages

Description of problem:
After resuming the computer from suspend-to-RAM, following a brief pause, udevd takes 100% CPU and makes the computer unusable for a couple of seconds. Then the computer works as it is supposed to.

Version-Release number of selected component (if applicable):
udev-debuginfo-145-9.fc12.x86_64
udev-145-10.fc12.x86_64

How reproducible:
75% (it mostly happens, but not always)

Steps to Reproduce:
1. Suspend/resume the notebook
  
Actual results:
see above

Expected results:
computer should just work upon resume

Additional info:

Comment 1 Matěj Cepl 2009-10-11 07:43:32 UTC

Created attachment 364367 [details]
/var/log/audit/audit.log

Comment 2 Matěj Cepl 2009-10-11 07:44:01 UTC

Created attachment 364368 [details]
stdout from dmesg

Comment 3 Harald Hoyer 2009-10-11 10:48:41 UTC

can you strace udevd?

what is the output of "ps axc" during the 100% CPU?

Comment 4 Matěj Cepl 2009-10-12 15:52:14 UTC

So, of course, the moment I want to debug it, I cannot reproduce it. Setting NEEDINFO and let's see how it goes. When I have more information, I will let you know; otherwise feel free to close.

Comment 5 Matěj Cepl 2009-10-13 19:36:20 UTC

Created attachment 364640 [details]
strace.txt

Comment 6 Matěj Cepl 2009-10-13 19:36:27 UTC

Created attachment 364641 [details]
strace-02.txt

Comment 7 Matěj Cepl 2009-10-13 19:36:41 UTC

Created attachment 364642 [details]
strace-3.txt

Comment 8 Matěj Cepl 2009-10-13 19:36:49 UTC

Created attachment 364643 [details]
ps-axc-log.txt

Comment 9 Matěj Cepl 2009-10-13 19:36:56 UTC

Created attachment 364644 [details]
ps-axc-02.txt

Comment 10 Matěj Cepl 2009-10-13 20:18:58 UTC

(In reply to comment #3)
> can you strace udevd?
> 
> what is the output of "ps axc" during the 100% CPU?  

It is hard to start all data collection when udev strikes, but I think I managed to get some data at least. What do you think?

Comment 11 Harald Hoyer 2009-10-14 08:14:12 UTC

ok :) the important part in strace is missing...

"strace -s 2048" raises the maximum string length that strace prints.

It seems something emits a lot of "change" events (meaning a lot of open-for-write/close() calls on a device), or inotify does not work.

Try to run as root:

# udevadm monitor --env

so we can see exactly what is happening
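
A captured monitor log can be long, so a small tally helps to spot which device is flooding. The following is only a sketch: it assumes event header lines of the form "KERNEL[timestamp] action devpath (subsystem)" or "UDEV  [timestamp] action devpath (subsystem)", and the sample lines and the helper name count_events are made up for illustration, not taken from the attachments.

```python
import re
from collections import Counter

# Assumed shape of an event header line in captured `udevadm monitor` output:
#   KERNEL[1255555555.100000] change   /devices/.../drm/card0 (drm)
EVENT_RE = re.compile(
    r"^(KERNEL|UDEV)\s*\[\s*[\d.]+\]\s+(\w+)\s+(\S+)\s+\(([^)]+)\)"
)

def count_events(lines):
    """Count (source, action, devpath) triples; property lines are ignored."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.match(line)
        if m:
            source, action, devpath, _subsystem = m.groups()
            counts[(source, action, devpath)] += 1
    return counts

# Hypothetical sample, including KEY=VALUE property lines as printed by --env.
SAMPLE = """\
KERNEL[1255555555.100000] change   /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
HOTPLUG=1
SUBSYSTEM=drm

UDEV  [1255555555.200000] change   /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
KERNEL[1255555555.300000] change   /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
KERNEL[1255555556.000000] add      /devices/virtual/input/input9 (input)
""".splitlines()

if __name__ == "__main__":
    # Print the most frequent events first.
    for (source, action, devpath), n in count_events(SAMPLE).most_common(3):
        print(n, source, action, devpath)
```

To use it on a real capture, pass the lines of the saved log file to count_events instead of SAMPLE.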

Comment 12 Matěj Cepl 2009-10-14 21:35:16 UTC

Created attachment 364819 [details]
udevadm.txt

Comment 13 Matěj Cepl 2009-10-14 21:35:24 UTC

Created attachment 364820 [details]
udevadm-2.txt

Comment 14 Matěj Cepl 2009-10-14 21:36:29 UTC

Hmm, the plot thickens ... I am afraid that Xorg IS the root of all evil after all :(

Comment 15 Harald Hoyer 2009-10-15 08:39:41 UTC

yes.. it seems to open()/close() very fast and in a loop.. please reassign.

Comment 16 Matěj Cepl 2009-10-15 09:49:56 UTC

Created attachment 364885 [details]
/var/log/Xorg.0.log

Comment 17 Mary Ellen Foster 2009-10-22 13:34:56 UTC

*** Bug 528894 has been marked as a duplicate of this bug. ***

Comment 18 Mary Ellen Foster 2009-10-23 15:17:42 UTC

I'm not sure what has changed, but I've rebooted several times today (I usually would see these symptoms after a cold boot) and X hasn't gone off the deep end once.

Comment 19 Mary Ellen Foster 2009-10-23 15:20:39 UTC

Addendum: looking at the list of packages that were updated on this computer yesterday -- yesterday because I didn't get symptoms this morning even -- it looks like it *might* have been the update to xorg-x11-drv-evdev-2.3.0-1.fc12.x86_64 that fixed things ...

Comment 20 Adam Williamson 2009-10-23 21:45:28 UTC

discussed at today's blocker bug meeting: this is downgraded to target as the impact is not serious enough to be a blocker (just means the system is very slow for a couple of minutes after resuming).

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 21 Mary Ellen Foster 2009-10-26 10:08:17 UTC

Well, looks like I spoke too soon on Friday -- when I booted this morning, the computer was pretty much unusable for about 15 minutes after I logged in (X using 80+% CPU, load average over 5). So for me at least, it's more like very slow for 15 minutes after booting, not just for "a couple of minutes after resuming" -- certainly serious for me. :(

Addendum: and it started doing it again about 10 minutes after it stopped! Argh, must be Monday ...

Comment 22 Adam Jackson 2009-10-29 23:12:01 UTC

Reporters, what outputs does your machine claim to have?  'xrandr' output is sufficient.

Comment 23 Mary Ellen Foster 2009-10-30 09:56:44 UTC

Here's xrandr on my machine (desktop, Intel graphics):
Screen 0: minimum 320 x 200, current 1280 x 1024, maximum 8192 x 8192
VGA1 connected 1280x1024+0+0 (normal left inverted right x axis y axis) 338mm x 270mm
   1280x1024      60.0*+
   1024x768       60.0
   800x600        60.3
   640x480        60.0
DVI1 disconnected (normal left inverted right x axis y axis)
DP1 disconnected (normal left inverted right x axis y axis)
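
For comparing reports like the one above, the connected and disconnected outputs can be pulled out of saved xrandr text. This is a minimal sketch that assumes each output line starts with the output name followed by the word "connected" or "disconnected"; the function name parse_outputs is illustrative.

```python
def parse_outputs(xrandr_text):
    """Return {output_name: is_connected} from plain `xrandr` text output."""
    outputs = {}
    for line in xrandr_text.splitlines():
        parts = line.split()
        # Output lines look like "VGA1 connected ..." or "DVI1 disconnected ...";
        # the "Screen 0: ..." line and the mode lines do not match this test.
        if len(parts) >= 2 and parts[1] in ("connected", "disconnected"):
            outputs[parts[0]] = parts[1] == "connected"
    return outputs

# The xrandr output quoted in this comment.
SAMPLE = """\
Screen 0: minimum 320 x 200, current 1280 x 1024, maximum 8192 x 8192
VGA1 connected 1280x1024+0+0 (normal left inverted right x axis y axis) 338mm x 270mm
   1280x1024      60.0*+
   1024x768       60.0
   800x600        60.3
   640x480        60.0
DVI1 disconnected (normal left inverted right x axis y axis)
DP1 disconnected (normal left inverted right x axis y axis)
"""

if __name__ == "__main__":
    for name, connected in parse_outputs(SAMPLE).items():
        print(name, "connected" if connected else "disconnected")
```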

Comment 24 MartinG 2009-10-31 23:05:50 UTC

One more datapoint from me:

I've seen this bug in Rawhide for some time now (a month or two?).
Lenovo Thinkpad T400, Intel Corporation Mobile 4 Series, i915.
When I do a clean boot, everything works just fine, but when I suspend to ram,
and then resume, I have normal behaviour for maybe 5-10 seconds, then
everything locks up, 'top' shows X is eating cpu. After 10-30 seconds I can use
the laptop for maybe 2-3 seconds, and things lock up again. This continues for
about 1-2 minutes. Then everything works as normal.

For me this is 100% reproducible.

I use KMS (ie. I don't have nomodeset in grub).

$ xrandr
Screen 0: minimum 320 x 200, current 1440 x 900, maximum 8192 x 8192
LVDS1 connected 1440x900+0+0 (normal left inverted right x axis y axis) 304mm x 190mm
   1440x900       60.2*+   50.0
   1024x768       60.0
   800x600        60.3     56.2
   640x480        59.9
VGA1 disconnected (normal left inverted right x axis y axis)
DVI1 disconnected (normal left inverted right x axis y axis)
DP1 disconnected (normal left inverted right x axis y axis)
DVI2 disconnected (normal left inverted right x axis y axis)
DP2 disconnected (normal left inverted right x axis y axis)
DP3 disconnected (normal left inverted right x axis y axis)


My current versions:
$ rpm -qa \*udev\* \*intel\*
libgudev1-145-11.fc12.x86_64
xorg-x11-drv-intel-2.9.1-1.fc12.x86_64
system-config-printer-udev-1.1.13-6.fc12.x86_64
libudev-145-11.fc12.i686
libudev-145-11.fc12.x86_64
libgudev1-145-11.fc12.i686
udev-145-11.fc12.x86_64
intel-gpu-tools-2.9.1-1.fc12.x86_64
xorg-x11-drv-intel-devel-2.9.1-1.fc12.x86_64

Comment 25 MartinG 2009-10-31 23:07:50 UTC

Created attachment 366986 [details]
'udevadm monitor --env' from right before suspend, then resume, and until things work as normal

Comment 26 Matěj Cepl 2009-10-31 23:43:49 UTC

bradford:~$ xrandr 
Screen 0: minimum 320 x 200, current 1440 x 900, maximum 8192 x 8192
LVDS1 connected 1440x900+0+0 (normal left inverted right x axis y axis) 303mm x 190mm
   1440x900       60.0*+   50.0  
   1024x768       60.0  
   800x600        60.3     56.2  
   640x480        59.9  
VGA1 disconnected (normal left inverted right x axis y axis)
DVI1 disconnected (normal left inverted right x axis y axis)
DP1 disconnected (normal left inverted right x axis y axis)
DVI2 disconnected (normal left inverted right x axis y axis)
DP2 disconnected (normal left inverted right x axis y axis)
DP3 disconnected (normal left inverted right x axis y axis)
bradford:~$

Comment 27 Matěj Cepl 2009-10-31 23:45:04 UTC

and yes behavior is identical to what MartinG described in comment 24

Comment 28 MartinG 2009-11-02 22:19:07 UTC

I see that this bug is set as a "fedora-x-target Fedora Universal X target" blocker. Shouldn't it also be an F12 blocker? Or is it too hardware specific?

Just asking, no biggie. Please let me know if I can provide any other logs etc...

Comment 29 Adam Williamson 2009-11-02 22:24:36 UTC

see comment #20, we did discuss it at a meeting and decided the impact was not severe enough to qualify as a release blocker.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 30 MartinG 2009-11-04 22:03:23 UTC

Good news! I just upgraded the kernel from kernel-2.6.31.5-96.fc12.x86_64 to kernel-2.6.31.5-115.fc12.x86_64, and have not hit the bug yet (crossing fingers). I've just done five suspend(ram)/resume cycles, successfully. I also upgraded a bunch of other packages, xorg-x11-server-common-1.7.0-5.fc12.x86_64, xorg-x11-server-Xorg-1.7.0-5.fc12.x86_64, glibc-2.11-1.x86_64, DeviceKit-disks-009-3.fc12.x86_64 to mention some.

And may I add that resuming is incredibly fast, yay! (I've got a solid state disk, btw)

Comment 31 Matěj Cepl 2009-11-05 10:33:34 UTC

(In reply to comment #28)
> I see that this bug is set as a "fedora-x-target Fedora Universal X target"
> blocker. Shouldn't it also be a F12 blocker? Or is it too hardware specific?

fedora-x-target blocks F12Target
fedora-x-blocker blocks F12Blocker

Comment 32 MartinG 2009-11-05 15:06:04 UTC

I spoke too soon. I had this lockup issue again. What I did this time, was to put the laptop in suspend-to-ram while the power cord was plugged, let it be suspended for several hours, then unplug the powercord while in suspend, and then open the lid to wake it up. After about five seconds of "normal" behaviour, the mouse got jerky, and things locked up for several seconds. 

Thanks for the clarification in comment #31, btw.

Comment 33 Matěj Cepl 2009-11-05 17:17:47 UTC

Since this bugzilla report was filed, there have been several major updates in various components of the Xorg system, which may have resolved this issue. Users who have experienced this problem are encouraged to upgrade their system to the latest version of their packages (at least F12Beta, but even better if the very latest versions).

If you still experience this problem on an up-to-date system, please let us know in a comment on this bug; likewise, let us know if the upgraded system works for you.

If you won't be able to reply in one month, I will have to close this bug as INSUFFICIENT_DATA. Thank you.

[This is a bulk message for all open Fedora Rawhide Xorg-related bugs. I'm adding myself to the CC list for each bug, so I'll see any comments you make after this and do my best to make sure every issue gets proper attention.]

Comment 34 MartinG 2009-11-05 19:55:20 UTC

My comment #32 is with a fully updated system. No new updates available in Rawhide per Thu Nov  5 19:52:22 UTC 2009. So this bug is still valid.

Current packages:
kernel-2.6.31.5-115.fc12.x86_64

# rpm -qa \*intel\* \*Xorg\* \*drm\* \*glibc\* \*udev\*
libgudev1-145-11.fc12.x86_64
libdrm-devel-2.4.15-1.fc12.x86_64
xorg-x11-drv-intel-2.9.1-1.fc12.x86_64
glibc-2.11-1.i686
system-config-printer-udev-1.1.13-6.fc12.x86_64
glibc-2.11-1.x86_64
libdrm-2.4.15-1.fc12.i686
glibc-headers-2.11-1.x86_64
libdrm-2.4.15-1.fc12.x86_64
libudev-145-11.fc12.i686
libudev-145-11.fc12.x86_64
glibc-debuginfo-2.10.90-25.x86_64
libgudev1-145-11.fc12.i686
udev-145-11.fc12.x86_64
intel-gpu-tools-2.9.1-1.fc12.x86_64
xorg-x11-drv-intel-devel-2.9.1-1.fc12.x86_64
glibc-devel-2.11-1.x86_64
glibc-common-2.11-1.x86_64
xorg-x11-server-Xorg-1.7.0-5.fc12.x86_64

Lenovo Thinkpad T400

I'd be happy to test suggested packages from koji if any.

Comment 35 Adam Williamson 2009-11-05 21:43:35 UTC

martin: #33 was an automated comment which makes not much sense in this context, sorry.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 36 MartinG 2009-11-07 23:30:44 UTC

Bug still around using kernel-2.6.31.5-122.fc12.x86_64, xorg-x11-server-Xorg-1.7.1-7.fc12.x86_64.

Comment 37 Bug Zapper 2009-11-16 13:29:56 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 38 Harald Hoyer 2009-11-17 12:23:52 UTC

*** Bug 538006 has been marked as a duplicate of this bug. ***

Comment 39 Matěj Cepl 2009-11-24 21:23:19 UTC

We filed this bug in the upstream database (https://bugs.freedesktop.org/show_bug.cgi?id=25259) and believe that it is more appropriate to let it be resolved upstream.

We will continue to track the issue in the centralized upstream bug tracker, and will review any bug fixes that become available for consideration in future updates.

Thank you for the bug report.

Comment 40 william hanlon 2009-12-10 20:55:14 UTC

I experience this bug on an i386 desktop system running fully updated Fedora 12, and it is a show stopper for me. Anyone else experiencing it would agree. The system becomes practically unusable. It should not be closed.

see bugs #538196 and #541184.

Comment 41 Matthew Hails 2009-12-11 09:40:16 UTC

I also experience this problem - on an HP EliteBook 6930p, running Fedora 12 i686 fully updated, without using suspend. The problem is very erratic, and I often have long periods of usability, but once it kicks in the system is pretty much unusable. I also consider it a showstopper.

Comment 42 Adam Williamson 2009-12-11 22:13:01 UTC

it's closed upstream because it's being worked on upstream. It doesn't mean it won't be fixed.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 43 Adam Williamson 2009-12-11 22:14:27 UTC

*** Bug 538196 has been marked as a duplicate of this bug. ***

Comment 44 Adam Williamson 2009-12-11 22:14:32 UTC

*** Bug 541184 has been marked as a duplicate of this bug. ***

Comment 45 Matěj Cepl 2009-12-15 08:20:11 UTC

Created attachment 378444 [details]
upstream patch reformatted to fit Fedora kernel tree

Comment 46 Ralf Ertzinger 2009-12-16 13:54:17 UTC

I have built 2.6.30.10-104 from the F-11 CVS tree with the patch from #45 included, and it does not fix the problem on my machine.

Comment 47 Ralf Ertzinger 2009-12-16 14:08:39 UTC

However, also excluding HDMIC_HOTPLUG_INT_STATUS seems to do the trick for me, the udev storms are gone.

Comment 48 Matěj Cepl 2009-12-17 17:48:10 UTC

(In reply to comment #46)
> I have built 2.6.30.10-104 from the F-11 CVS tree with the patch from #45
> included, and it does not fix the problem on my machine.  

Yeah, comments in the upstream bug indicate that it helps only for some models; it actually seems to help me, but not you.

Comment 49 Levente Farkas 2009-12-21 07:21:28 UTC

That's what happened to me too. And my old F11 udev scripts no longer work. But the Xorg problem is more important.

Comment 50 Adam Williamson 2009-12-22 15:10:57 UTC

*** Bug 509762 has been marked as a duplicate of this bug. ***

Comment 51 EMR_Fedora 2009-12-31 20:23:15 UTC

Please note after applying the patch listed for my HDMI/udevd issue, I get the following every second in my syslog:

Dec 31 15:19:40 pcsca65 kernel: DRHD: handling fault status reg 3
Dec 31 15:19:40 pcsca65 kernel: DMAR:[DMA Write] Request device [00:02.0] fault addr b08003000
Dec 31 15:19:40 pcsca65 kernel: DMAR:[fault reason 05] PTE Write access is not set
Dec 31 15:19:40 pcsca65 kernel: DRHD: handling fault status reg 3
Dec 31 15:19:40 pcsca65 kernel: DMAR:[DMA Write] Request device [00:02.0] fault addr b08003000
Dec 31 15:19:40 pcsca65 kernel: DMAR:[fault reason 05] PTE Write access is not set

I tried commenting out just two bits, and then three bits; same error message.

Comment 52 MartinG 2010-01-15 19:35:54 UTC

Anything I can test on my Lenovo T400? I still have this bug with kernel 2.6.32.2-18.fc13.x86_64 (rawhide) on Intel Mobile 4, i915; on every resume from suspend (to RAM), the system functions normally for some seconds, and then locks up for up to a minute or so.

Comment 53 Adam Williamson 2010-02-05 17:27:37 UTC

Is this the same as https://bugzilla.redhat.com/show_bug.cgi?id=523646 ?

Comment 54 Matěj Cepl 2010-02-06 14:44:38 UTC

I don't think so, see my bug 523646 comment 55

Comment 55 Levente Farkas 2010-03-03 16:53:09 UTC

Even if it's upstream, why do you close this bug?
udev fills my Xorg.0.log, which grows to a few gigabytes, since udev keeps re-discovering my Samsung LCD. And since udev uses 100% CPU, I can't use my system. It's still valid on a fully updated F12!

Comment 56 william hanlon 2010-03-03 18:00:45 UTC

(In reply to comment #55)
> even if it's upstream why do you close this bug?
> udev fills my Xorg.0.log and it becomes a few gigabytes since udev re-discover
> my samsung lcd. and since udev use 100% cpu i can't use my system. it's still
> valid on a fully updated f12!    

I agree as well. Perhaps I'm ignorant of how things are done, but closing the bug kind of sweeps it under the rug doesn't it? You have a product that is practically crippled when used on certain popular hardware. Should it not stay open in some fashion so that if you do some sort of audit of bugs that need to be addressed, it'll show up? Even if it's moved upstream it still affects your currently released product and it should be recognized as being an outstanding issue.

Comment 57 EMR_Fedora 2010-03-04 06:43:02 UTC

I agree too. I have been running a kernel.org (hand patched) kernel b/c it's the only way I can use my computer.

Comment 58 Matěj Cepl 2010-03-04 17:30:17 UTC

*** Bug 523646 has been marked as a duplicate of this bug. ***

Comment 59 Danny Yee 2010-03-04 18:16:15 UTC

If this is the bug that's going to be kept, and the others closed, the title should be changed.  This problem has nothing to do with resume from suspend - except as one possible trigger - as I was getting this without any suspension involved.

I eventually got the system to work by reverting to a Fedora 11 kernel, but this is hardly a robust solution.  This appears to affect all Intel X4500 drivers, so it's a pretty big problem.

Comment 60 Adam Williamson 2010-03-04 20:06:50 UTC


-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers




Comment 61 Adam Williamson 2010-03-04 20:07:03 UTC

adjusted, hope that's correct.

Comment 62 Pierre-YvesChibon 2010-03-05 07:22:48 UTC

(In reply to comment #55)
> since udev use 100% cpu i can't use my system. it's still
> valid on a fully updated f12!    
Same problem for me, I still have to "killall udevd" each time I reboot the computer

Comment 63 Steven Dollins 2010-03-06 19:46:24 UTC

The duplicate bug https://bugzilla.redhat.com/show_bug.cgi?id=523646 had
priority high and had F13Blocker status.  Could we please have those
designations added to this one?

Comment 64 Adam Williamson 2010-03-07 06:30:03 UTC

you can nominate it as a blocker yourself, it requires no special privileges. Priority is to be set by the package maintainer only - https://fedoraproject.org/wiki/BugZappers/BugStatusWorkFlow#Priority_and_Severity

Comment 65 Steven Dollins 2010-03-09 06:53:42 UTC

The recently released F12 kernel 2.6.32.9 appears to have fixed the bug for my ASUS ul80vt notebook (hybrid Intel GM45 / nVidia discrete graphics).  I can cold boot, warm boot, and resume from hibernate or suspend with only a few EDID probes triggered by each.

Thank you for the kernel version bump.

Does anyone still have trouble after updating to this kernel?

Comment 66 Pierre-YvesChibon 2010-03-09 07:58:03 UTC

I have just rebooted on 2.6.32.9-67.fc12.x86_64 and, after a while, Xorg again starts to use one full processor, slowing the computer to a crawl.
The usual killall udevd still works (and is still needed).

In my case (might be related I have no idea), I have:
* a dual screen (vga and hdmi)
* a wireless keyboard/mouse on usb
* graphic card: 4 Series Chipset Integrated Graphics Controller [8086:2E12] (driver i915)

Comment 67 Matthew Hails 2010-03-11 14:44:21 UTC

Problem still happens for me using 2.6.32.9-70.fc12.i686, on HP EliteBook 6930p (Intel GM45). System is completely unusable for 5 - 10 minutes at a time. And that's just cold booting, no suspend/resume.

I can get rid of the problem (as before) by patching my kernel as per https://bugs.freedesktop.org/show_bug.cgi?id=25259 - by completely disabling the HDMI bits in the hotplug mask (I'm only using VGA output, not HDMI).

But since the last comment in that freedesktop bug suggests it might be fixed in 2.6.33rc7, perhaps I shouldn't be surprised it's not fixed in 2.6.32.9.
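
The idea behind the workaround (clearing the HDMI bits from the hotplug interrupt mask so those sources no longer raise events) can be illustrated with plain bit arithmetic. This is a sketch only: the constant names and bit positions below are placeholders chosen for illustration, not the kernel's actual register definitions, and the real change is a C patch to the i915 driver.

```python
# Placeholder bit positions, for illustration only; the real values are
# defined in the kernel's i915 register headers and may differ per chipset.
CRT_BIT   = 1 << 9
SDVOB_BIT = 1 << 26
SDVOC_BIT = 1 << 25
HDMIB_BIT = 1 << 29
HDMIC_BIT = 1 << 28
HDMID_BIT = 1 << 27

# Every hotplug source the (hypothetical) mask would normally enable.
ALL_HOTPLUG = (CRT_BIT | SDVOB_BIT | SDVOC_BIT |
               HDMIB_BIT | HDMIC_BIT | HDMID_BIT)

# The bits the workaround removes.
HDMI_BITS = HDMIB_BIT | HDMIC_BIT | HDMID_BIT

def without_hdmi(mask):
    """Return the mask with all HDMI hotplug bits cleared."""
    return mask & ~HDMI_BITS

if __name__ == "__main__":
    print(hex(ALL_HOTPLUG), "->", hex(without_hdmi(ALL_HOTPLUG)))
```

The trade-off is the one discussed in this thread: with those bits cleared, plugging in an HDMI display is no longer detected automatically.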

Comment 68 Levente Farkas 2010-03-11 14:59:08 UTC

For me it also still exists with kernel-PAE-2.6.32.9-70.fc12.i686

Comment 69 Jaroslav Pulchart 2010-03-11 15:28:06 UTC

Same for my F12 installation with kernel 2.6.32.9-70. Suspend and resume is the trigger of this issue for me.

Comment 70 Colin Macdonald 2010-03-11 16:40:39 UTC

I'm curious about something: in the upstream bug, comment #2 (https://bugs.freedesktop.org/show_bug.cgi?id=25259#c2), ajax comments that this is because of a patch (uevent.patch?) that Fedora ships that does input plugging events.

So why not just disable uevent.patch in the SRPM?  That's what I did and haven't seen this problem since.  

Plugging in a VGA monitor, having X notice and extending desktop is cool: but not as cool as having working suspend/resume!

Wouldn't this be a stopgap solution for F12?  Then try to get it working in F13...

Comment 71 Levente Farkas 2010-03-11 17:01:53 UTC

100% agree!

Comment 72 Adam Williamson 2010-03-11 22:13:10 UTC

That does sound pretty reasonable to me.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 73 holger.schletz 2010-03-13 09:59:54 UTC

(In reply to comment #70)
> So why not just disable uevent.patch in the SRPM?  That's what I did and
> haven't seen this problem since.  

I tried to disable the patch just to find out that it's not there. What exactly have you done to get it working?

The well-known patch for 2.6.31 (disabling all HDMI bits) no longer works with 2.6.32. Is there a working solution for the latest F12 kernel?

Comment 74 Aram Agajanian 2010-03-16 00:02:27 UTC

I was able to disable the uevent patch in xorg-x11-drv-intel as per comment #70.  

The user reports that the slowdown problem has not occurred since the build without the uevent patch was installed.

Comment 75 Matthew Hails 2010-03-16 08:56:35 UTC

Created attachment 400407 [details]
Disable HDMI hotplug for 2.6.32.9

Comment 76 Matthew Hails 2010-03-16 08:57:56 UTC

(In reply to comment #73)
> The well-known patch for 2.6.31 (disabling all HDMI bits) no longer works with
> 2.6.32. Is there a working solution for the latest F12 kernel?    

Works for me. See attachment (id=400407). It's the only way my laptop is usable!

Comment 77 Jaroslav Pulchart 2010-03-16 09:59:56 UTC

Could we expect this patch in kernel build 2.6.32.10.*?

Comment 78 Adam Williamson 2010-03-16 17:36:41 UTC

I suppose it might be nice to throw in a kernel parameter to enable the hotplug code, for people for whom it works and who actually find it useful?



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 79 Andrew Parker 2010-03-16 18:01:21 UTC

(In reply to comment #75)
> Created an attachment (id=400407) [details]
> Disable HDMI hotplug for 2.6.32.9    

alas this didn't work for me, i still have to kill off udevd to be able to do anything on my systems.

Comment 80 william hanlon 2010-03-16 19:21:49 UTC

I've been running kernel-2.6.32.9-70.fc12.i686 for about a week now and it has resolved my problems. I'm running on a desktop using VGA only (no suspend/resume issues here).

Comment 81 Need Real Name 2010-03-19 02:10:04 UTC

This problem still occurs for me on an Intel desktop motherboard.  It is a little more difficult to trigger than just booting, but is still consistently triggered on my mythtv machine at home.

X will start and auto-login to my mythtv GUI user without trouble.  But when I activate mythfrontend, it will trigger this spinning, so it seems somehow the application is inducing the server to start probing the outputs again.

Interestingly, on my system apcupsd is also spinning at the same time, and if I kill apcupsd, xorg will stop spinning and behave normally. But apcupsd doesn't act abnormally except when xorg is also spinning.  Given that apcupsd is just talking to my UPS via USB, perhaps this clue will help people track down the interrupt handling mess?  I will attach a dump of /proc/interrupts from the machine.

Comment 82 Need Real Name 2010-03-19 02:12:20 UTC

Created attachment 401161 [details]
/proc/interrupts dump from a machine where xorg intel and apcupsd seem to fight

This is the /proc/interrupts dump I mentioned in my previous comment
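
To see which interrupt lines are busiest in such a dump, the per-CPU counters can be summed and sorted. The sketch below assumes the usual layout (a header row of CPU names, then "IRQ: count per CPU, controller type, device names"); the sample rows and the function name parse_interrupts are hypothetical and are not taken from the attachment.

```python
def parse_interrupts(text):
    """Return [(irq, total_count, description)] sorted by total, highest first."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    rows = []
    for line in lines[1:]:
        if ":" not in line:
            continue
        irq, rest = line.split(":", 1)  # split on the first colon only
        fields = rest.split()
        counts = fields[:ncpu]
        # Skip summary rows that do not carry one counter per CPU.
        if len(counts) < ncpu or not all(c.isdigit() for c in counts):
            continue
        total = sum(int(c) for c in counts)
        rows.append((irq.strip(), total, " ".join(fields[ncpu:])))
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Hypothetical sample rows, for illustration only.
SAMPLE = """\
           CPU0       CPU1
  0:       1000       2000   IO-APIC-edge      timer
 16:     500000     700000   IO-APIC-fasteoi   i915, uhci_hcd:usb3
 23:         10     300000   IO-APIC-fasteoi   ehci_hcd:usb2
ERR:          0
"""

if __name__ == "__main__":
    for irq, total, desc in parse_interrupts(SAMPLE):
        print(irq, total, desc)
```

Taking two dumps a few seconds apart and comparing the totals shows which lines are firing during the slowdown, rather than since boot.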

Comment 83 Andrew Parker 2010-03-30 10:24:49 UTC

System is still unusable with 2.6.32.10-90.fc12

Comment 84 fzuuzf 2010-03-30 14:57:18 UTC

The patch in
http://lkml.org/lkml/2010/3/27/88
seems to help here.
I would like to know, if it also helps you.

Comment 85 Matěj Cepl 2010-03-31 12:31:17 UTC

Created attachment 403730 [details]
patch from the lkml discussion

(In reply to comment #84)
> The patch in
> http://lkml.org/lkml/2010/3/27/88
> seems to help here.
> I would like to know, if it also helps you.    

Taken from http://thread.gmane.org/gmane.linux.kernel/967076 (or http://article.gmane.org/gmane.linux.kernel/967076/raw if you prefer).

Comment 86 Andrew Parker 2010-03-31 21:34:24 UTC

(In reply to comment #84)
> The patch in
> http://lkml.org/lkml/2010/3/27/88
> seems to help here.
> I would like to know, if it also helps you.    

Works like a charm for me.

Comment 88 Matěj Cepl 2010-04-01 15:03:10 UTC

A scratch build for testing is now brewing at http://koji.fedoraproject.org/koji/taskinfo?taskID=2088892; anybody can download and try it.

Comment 89 Andrew Parker 2010-04-01 23:17:04 UTC

(In reply to comment #88)
> Testing scratch build is now brewing at
> http://koji.fedoraproject.org/koji/taskinfo?taskID=2088892 anybody can download
> and try this.    

This works for me too.

thanks

Comment 90 Guil Barros 2010-04-02 12:16:10 UTC

same here, fixed.

Comment 92 MartinG 2010-04-03 11:44:10 UTC

Almost gone, but not quite:
I installed kernel-2.6.32.10-94.bz528312.fc12.x86_64, rebooted, logged in and then put the laptop to sleep (Lenovo Thinkpad T400, Intel Mobile 4 series). Then, unplugged the power supply, and opened the lid to wake it up; as usual, the laptop partly froze for several seconds (maybe one minute).

However, when I tried to reproduce it by doing a couple more suspend/resume cycles, everything seemed to work smoothly.

"udevadm monitor --property" gives about 52 KiB of text.

(btw, the severe flickering seen on eg. kernel-2.6.34-0.19.rc2.git4.fc14.x86_64 is gone too)

Comment 93 MartinG 2010-04-03 14:34:23 UTC

Created attachment 404303 [details]
udevadm monitor --property of kernel 2.6.32.10-94.bz528312.fc12.x86_64

The problem persists on kernel 2.6.32.10-94.bz528312.fc12.x86_64, but seems less frequent. Attached is "udevadm monitor --property" from right before the suspend/resume cycle until things behave normally again. I had to let the machine stay in suspend for a while to reproduce it. The partial lockup lasts for about a minute. This is on a Lenovo T400, Intel Mobile 4.

Comment 94 fzuuzf 2010-04-03 17:00:35 UTC

(In reply to comment #93)
> This is on a Lenovo T400, Intel Mobile 4.

What kind & count of connectors to external displays does it have?

Comment 95 MartinG 2010-04-03 18:35:33 UTC

> What kind & count of connectors to external displays does it have?

I have none connected, but there is one VGA connector directly on the laptop, and the docking station has one or maybe two DVI connectors, if I recall correctly. (I am not using the docking station.)

Comment 96 Andrew Parker 2010-04-11 17:48:48 UTC

kernel-PAE-2.6.32.11-99.fc12.i686 works great for me.

Note that this is with a desktop, so suspend/resume was n/a for me.

Comment 97 Jaroslav Pulchart 2010-04-12 06:19:21 UTC

My laptop: Lenovo T400 with docking station, DVI connected to second monitor
Kernel: 2.6.32.10-94.bz528312.fc12.x86_64

OK:
- I cannot reproduce this issue after suspend to disk or ram :)

ISSUE:
- after some "working time" the GUI "freezes" again with udev at 100% :(

Comment 98 ramindeh 2010-04-12 21:00:17 UTC

I have the same problem: udevd eating up the CPU. The problem did not occur in FC10; it started with a clean install of FC12. In the "normal" state, the CPU is hot (60° and more) and the load is at 52%, which would correspond to one full core of the CPU :-(

Killing udevd (all the processes) helps, but then disks will not automount and the cursor focus is lost every minute or so - e.g. in Konsole the focus goes away from the shell and the menu Edit is selected. I have noticed that opening a Dolphin (file manager) window will increase the CPU load, same thing when I mount a removable drive.

There is always one udevd process which eats the CPU, a second one which is always restarted by the heavy one, and about a dozen which seem to be just idling and which will not restart if killed.


System info:

Kernel: 2.6.32.11-99.fc12.x86_64
Hardware: MSI-GT725
CPU: Intel core2duo P-9500 dual core 
ATI M98L mobility Radeon HD-4850

$ xrandr
Screen 0: minimum 320 x 200, current 1680 x 1050, maximum 8192 x 8192
LVDS connected 1680x1050+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
   1680x1050      60.0*+
   1400x1050      60.0     60.0  
   1280x1024      59.9     60.0  
   1440x900       59.9  
   1280x960       60.0     59.9  
   1280x854       59.9  
   1280x800       59.8  
   1280x720       59.9  
   1152x768       59.8  
   1024x768       60.0     59.9  
   800x600        60.3     59.9     56.2  
   848x480        59.7  
   720x480        59.7  
   640x480        59.9     59.4  
VGA-0 disconnected (normal left inverted right x axis y axis)
HDMI-0 disconnected (normal left inverted right x axis y axis)


$ less /proc/interrupts
          CPU0       CPU1       
  0:    5639116    8699832   IO-APIC-edge      timer
  1:          5       4995   IO-APIC-edge      i8042
  4:          0          0   IO-APIC-edge      enecir
  8:          1          0   IO-APIC-edge      rtc0
  9:        543     240060   IO-APIC-fasteoi   acpi
 12:         71         65   IO-APIC-edge      i8042
 16:          1        248   IO-APIC-fasteoi   uhci_hcd:usb3, firewire_ohci, mmc0
 17:         89         15   IO-APIC-fasteoi   HDA Intel
 18:          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb8
 19:     183242         20   IO-APIC-fasteoi   uhci_hcd:usb5, uhci_hcd:usb7
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 22:       2631        141   IO-APIC-fasteoi   HDA Intel
 23:          0     191516   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb6
 24:          0          0   PCI-MSI-edge      pciehp
 25:          0          0   PCI-MSI-edge      pciehp
 26:          0          0   PCI-MSI-edge      pciehp
 27:          0          0   PCI-MSI-edge      pciehp
 28:          0          0   PCI-MSI-edge      pciehp
 29:    8954244      10423   PCI-MSI-edge      ahci
 30:        163     149265   PCI-MSI-edge      radeon
 31:         14      34116   PCI-MSI-edge      eth0
 32:          0          0   PCI-MSI-edge      iwlagn
NMI:          0          0   Non-maskable interrupts
LOC:   12111100   10081211   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
PND:          0          0   Performance pending work
RES:     148588     242181   Rescheduling interrupts
CAL:         58        140   Function call interrupts
TLB:      84812      81857   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         54         54   Machine check polls
ERR:          1
MIS:          0

From /var/log/Xorg.0.log :
(EE) AIGLX error: dlopen of /usr/lib64/dri/r600_dri.so failed (/usr/lib64/dri/r600_dri.so: cannot open shared object file: No such file or directory)


$ udevadm monitor --env 
(spews lots of entries like the one below - after udevd is killed, only one entry per 2-3 seconds) 

monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[1271104588.993266] change   /devices/pci0000:00/0000:00:1f.2/host4/target4:0:0/4:0:0:0 (scsi)
UDEV_LOG=3
ACTION=change
DEVPATH=/devices/pci0000:00/0000:00:1f.2/host4/target4:0:0/4:0:0:0
SUBSYSTEM=scsi
SDEV_MEDIA_CHANGE=1
DEVTYPE=scsi_device
DRIVER=sr
MODALIAS=scsi:t-0x05
SEQNUM=471291

UDEV  [1271104589.008249] change   /devices/pci0000:00/0000:00:1f.2/host4/target4:0:0/4:0:0:0/block/sr0 (block)
UDEV_LOG=3
ACTION=change
DEVPATH=/devices/pci0000:00/0000:00:1f.2/host4/target4:0:0/4:0:0:0/block/sr0
SUBSYSTEM=block
DEVNAME=/dev/sr0
DEVTYPE=disk
SEQNUM=388466
ID_CDROM=1
ID_CDROM_CD_R=1
ID_CDROM_CD_RW=1
ID_CDROM_DVD=1
ID_CDROM_DVD_R=1
ID_CDROM_DVD_RW=1
ID_CDROM_DVD_RAM=1
ID_CDROM_DVD_PLUS_R=1
ID_CDROM_DVD_PLUS_RW=1
ID_CDROM_DVD_PLUS_R_DL=1
ID_CDROM_MRW=1
ID_CDROM_MRW_W=1
ID_VENDOR=Optiarc
ID_VENDOR_ENC=Optiarc\x20
ID_MODEL=DVD_RW_AD-7560S
ID_MODEL_ENC=DVD\x20RW\x20AD-7560S\x20
ID_REVISION=SX01
ID_TYPE=cd
ID_BUS=scsi
ID_PATH=pci-0000:00:1f.2-scsi-4:0:0:0
ACL_MANAGE=1
ANACBIN=/sbin
GENERATED=1
DKD_PRESENTATION_NOPOLICY=0
MAJOR=11
MINOR=0
DEVLINKS=/dev/block/11:0 /dev/scd0 /dev/disk/by-path/pci-0000:00:1f.2-scsi-4:0:0:0 /dev/cdrom /dev/cdrw /dev/dvd /dev/dvdrw

KERNEL[1271104589.055744] change   /devices/pci0000:00/0000:00:1f.2/host4/target4:0:0/4:0:0:0 (scsi)
UDEV_LOG=3
ACTION=change
DEVPATH=/devices/pci0000:00/0000:00:1f.2/host4/target4:0:0/4:0:0:0
SUBSYSTEM=scsi
SDEV_MEDIA_CHANGE=1
DEVTYPE=scsi_device
DRIVER=sr
MODALIAS=scsi:t-0x05
SEQNUM=471292


...8<.....................................

Hope this helps.
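
For anyone who wants to quantify a storm like the one above instead of eyeballing it, here is a minimal sketch that tallies the header lines of saved "udevadm monitor" output per source, action, subsystem and device path. The function name tally_events and the regular expression are illustrative assumptions based only on the line format visible in this comment; they are not part of udev.

```python
import re
from collections import Counter

# Matches header lines such as:
#   KERNEL[1271104588.993266] change   /devices/.../4:0:0:0 (scsi)
#   UDEV  [1271104589.008249] change   /devices/.../block/sr0 (block)
# The pattern is an assumption derived from the output pasted in this bug.
HEADER = re.compile(
    r"^(KERNEL|UDEV)\s*\[(\d+\.\d+)\]\s+(\S+)\s+(\S+)\s+\(([^)]+)\)\s*$"
)


def tally_events(lines):
    """Count events per (source, action, subsystem, devpath).

    Property lines (KEY=value) and blank lines are ignored.
    """
    counts = Counter()
    for line in lines:
        match = HEADER.match(line)
        if match:
            source, _timestamp, action, devpath, subsystem = match.groups()
            counts[(source, action, subsystem, devpath)] += 1
    return counts


# Small sample taken from the output above.
SAMPLE = [
    "KERNEL[1271104588.993266] change   /devices/pci0000:00/0000:00:1f.2/host4/target4:0:0/4:0:0:0 (scsi)",
    "UDEV_LOG=3",
    "ACTION=change",
    "UDEV  [1271104589.008249] change   /devices/pci0000:00/0000:00:1f.2/host4/target4:0:0/4:0:0:0/block/sr0 (block)",
    "KERNEL[1271104589.055744] change   /devices/pci0000:00/0000:00:1f.2/host4/target4:0:0/4:0:0:0 (scsi)",
]

for key, count in tally_events(SAMPLE).most_common():
    print(count, *key)
```

Feeding it a longer capture (for example a file saved from "udevadm monitor --env") should make it obvious which device is generating the flood; in the output above that appears to be the optical drive sr0 rather than a display connector.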

Comment 99 Need Real Name 2010-04-14 02:58:00 UTC

I tested 2.6.32.11-99.fc12.x86_64 and at least on the first boot it seems to stabilize pretty quickly. However, watching top via ssh, I did notice that Xorg and apcupsd both get very busy at the same time several times (getting up to 50-60% CPU each for 10 or more seconds, much more than I see on other systems during X startup), I think corresponding to initial gdm startup and then again during auto-login to GNOME desktop on my media-center PC.

Also, I notice while watching top that some CPU% numbers are spurious like negative numbers or 9999% during a single refresh of the screen, then go back to sensible numbers.  I have seen this on several kernel versions now, during this bootup phase when Xorg tends to go crazy. I don't think I ever see such behavior from top on other systems. Is it possible there is an underlying system time bug that triggers this Xorg/intel problem...?  The clocksource is defaulting to HPET.

Also, unlike my earlier comment #81 killing apcupsd does not always resolve the issue when it was malfunctioning. Sometimes killing apcupsd and even udevd were not enough, and I had to kill -9 Xorg as well (so gdm would restart it).

Comment 100 Bryan Schneiders 2010-04-28 21:41:26 UTC

Still experiencing this problem on kernel-2.6.32.11-99.fc12.x86_64

Comment 101 Nicholas Kudriavtsev 2010-05-01 10:41:29 UTC

Hello! I have a bug https://bugzilla.redhat.com/show_bug.cgi?id=528312 with partially different symptoms, but I hope the source of the bug is the same. I found the kernel commit that introduced the regression: https://bugzilla.redhat.com/show_bug.cgi?id=573200#c11 .

I have not posted a patch to disable the commit myself. You can look at http://lkml.indiana.edu/hypermail//linux/kernel/1001.1/00966.html to see how to do that.

Comment 102 andrewgfry 2010-05-01 12:51:53 UTC

kernel-2.6.32.11-99.fc12.x86_64 seems to have resolved the problem for me. I experienced the bug every time with the -90 kernel (had other hassle with -94.bz528312, but -99 has been fine).

(Desktop system only, was previously seeing high CPU usage and huge Xorg.log files every time, but seems resolved by -99).

Thanks.

Comment 103 Nicholas Kudriavtsev 2010-05-01 14:06:51 UTC

If we speak about the same bug, it is resolved only for i8xx chipsets. I have GM45.

Comment 104 Jaroslav Pulchart 2010-05-07 06:50:40 UTC

I updated to F13 (devel) and this issue is still valid (Xorg.0.log is full of
"EDID for output ...") after suspend to disk and resume. Kernel version
2.6.33.3-79.fc13.x86_64.

Comment 105 Bryan Schneiders 2010-05-07 14:33:14 UTC

This issue is still live in Fedora 12.

I'm using:
kernel-2.6.32.11-99.fc12.x86_64
xorg-x11-drv-intel-2.9.1-1.fc12.x86_64

I've rebuilt the xorg-x11-drv-intel package without the uevent.patch as a workaround.  This makes Xorg not use 100% CPU and keeps the Xorg.0.log from filling up with EDID events.  But udevd is still constantly using some CPU instead.

This is an HP Pavilion p6340f.  The sticker says "Intel GMA X4500 integrated graphics".  lspci says "VGA compatible controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03)" and suggests it's using the i915 driver. dmesg says "agpgart-intel 0000:00:00.0: Intel G45/G43 Chipset".

Comment 106 Alessandro Suardi 2010-05-23 12:39:20 UTC

First time for me to notice udevd spinning at 100% on one of my two cores, GM45 chipset on a Dell E6400 - on 2.6.34-git8. I was just working in Xorg and found laptop hot on my lap... killing udevd brought back my laptop to normal state.

No suspend/resume.
No extra logging found in Xorg.0.log.

I build and run approx 75-90% of all released -git kernels, and I had *never* seen this before. Last known good kernel: 2.6.34-git4.

Latest Xorg related yum update:
May 16 18:20:05 Updated: xorg-x11-drv-evdev-2.3.3-1.fc12.x86_64

Comment 107 Alessandro Suardi 2010-05-23 19:08:52 UTC

Oh, never mind... Mine seems to be a new mainline kernel bug, as per:

http://lkml.org/lkml/2010/5/23/100

Sorry for the noise.

Comment 108 Adam Jackson 2010-05-24 17:45:31 UTC

Can someone who can reproduce this please boot with drm.debug=0x0f and attach dmesg from the resulting udev storm?

Comment 109 andrewgfry 2010-05-25 07:15:34 UTC

Created attachment 416285 [details]
dmesg with drm.debug=0x0f from affected system

Booted 2.6.32.12-115.fc12.x86_64 with drm.debug=0x0f.
System misbehaved at around May 25 16:49.
Attached is copy of dmesg.

Comment 110 andrewgfry 2010-05-25 07:22:28 UTC

Created attachment 416287 [details]
messages, dmesg and xorg.0.log from affected system

Booted kernel with drm.debug=0x0f

It ran for perhaps 10 mins, and at around May 25 16:49 (perhaps) started displaying poor behaviour. (The current kernel seems to run for a short time (10-15 mins) before getting upset; the previous one had displayed the problem immediately!)

Attached contains dmesg, messages, and xorg.0.log.

Comment 111 Jaroslav Pulchart 2010-05-25 15:50:32 UTC

Hi, yes, the uptime was now 20 minutes (for me) before the issue turned up :( (F13 kernel 2.6.33.4-95.fc13.x86_64)

Comment 112 MartinG 2010-05-25 18:07:41 UTC

Created attachment 416461 [details]
dmesg with drm.debug=0x0f on Intel Mobile 4

After my second suspend/resume cycle (uptime 23 hours or so), the system hung for about a minute (right before May 25 20:06:46 CEST 2010). See attached dmesg. This is on a Lenovo Thinkpad T400:

$ cat /proc/cmdline 
ro root=/dev/VolGroup00/lv_root rhgb quiet selinux=0 vga=0x318 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=no rd_plytheme=charge intel_iommu=igfx_off drm.debug=0x0f

$ uname -r
2.6.34-11.fc14.x86_64

$ rpm -qa \*intel\*
intel-gpu-tools-2.10.0-5.fc14.x86_64
xorg-x11-drv-intel-2.10.0-5.fc14.x86_64

$ lspci|grep VGA
00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)

Comment 113 MartinG 2010-05-25 18:11:46 UTC

Created attachment 416462 [details]
relevant parts of /var/log/messages corresponding to previous dmesg output

Comment 114 Bob Glickstein 2010-06-07 04:45:36 UTC

I filed https://bugzilla.redhat.com/show_bug.cgi?id=600465 but it's almost certainly a duplicate of this bug.  A couple of data points that I haven't seen mentioned:

1. When the system is unresponsive it's possible to hotkey out of X to a textual virtual terminal and performance returns to normal immediately.  Switching back to the X session makes the system bog down again.

2. This bug appeared for the first time only a week or two ago for me and was very intermittent and solved by a reboot.  It got worse and worse, especially in the last couple of days, to the point where my system is almost totally unusable, almost all the time.  Possibly related: the weather turned distinctly hotter in the last few days (the only other relevant variable I can think of).

Mine is a Dell "Studio Slim" x86_64 desktop.  This is a total showstopper for me.

Comment 115 Andy Lutomirski 2010-06-09 22:13:49 UTC

In intel_setup_outputs in drivers/gpu/drm/i915/intel_display.c, try commenting out the entire contents of the "} else if (SUPPORTS_DIGITAL_OUTPUTS(dev)) {" block.  That fixes it for me (at the cost of breaking displayport).

Comment 116 Bob Glickstein 2010-06-11 13:46:35 UTC

In a clarification I just added to bug 600465 (a likely duplicate of this bug), I pointed out that while some sufferers of this bug have symptoms for just a minute or two, mine continue indefinitely once they begin.  From some of the comments here it sounds like others may be having the same experience?  Please clarify if you can.

I also noted: "this problem was recurring a few times each day when I first reported it, but in the past few days it has happened less than once per day.  The only difference I can think of between then and now has been the ambient temperature -- it was very hot, but it cooled off.  This weekend is supposed to be very hot again, so we'll see if the problem worsens"

Comment 117 Andy Lutomirski 2010-06-11 14:33:19 UTC

Bob: what kernel are you on?  I think 2.6.35 has a regression; I'm about to post a patch for 2.6.35 so that it at least doesn't make the problem worse.

Also, for my amusement, can you wait until the problem is happening, then (after switching to a VT if you have to) run intel_reg_read 0x61114 a bunch of times and send me all the output?

intel_reg_read 0x61110 might also be interesting, but you only have to do that once.

Comment 118 Andy Lutomirski 2010-06-12 09:51:37 UTC

Created attachment 423471 [details]
Workaround patch

For those of you who just want to use your computer without waiting for this to get fixed for real, try the attached kernel patch.  Then boot with i915.hotplug_mask=0x38000000.  That might mean you have to run xrandr (no parameters needed) after plugging or unplugging a digital cable.

You could also try specifying even fewer bits to see if some combination keeps the problem fixed but lets hotplug work.  For example, 0x08000000 stops the bug for me (but I haven't tested hotplug yet since I'm away from my docking station).

Comment 119 Bob Glickstein 2010-06-12 13:43:06 UTC

Andy: I'm using an up-to-date F12, so I'm on kernel 2.6.32.12-115.fc12.x86_64.

Also, my intel-gpu-tools is 2.9.1 and doesn't have intel_reg_read.

Please see bug 600465 for some possibly informative attachments I just added in response to a NEEDINFO request.

Comment 120 Need Real Name 2010-06-12 18:31:16 UTC

Another workaround, which works for me on my Q45 based system: killall -STOP Xorg during the storm, then wait until udev finishes spinning at 100% (a minute or less), then killall -9 Xorg.  After this, the system seems to behave normally.  For me, the storm only happens after bootup, and not every time, but I never use suspend/resume on this host...

Comment 121 Scott Marlow 2010-06-18 11:56:25 UTC

(In reply to comment #118)
> For those of you who just want to use your computer without waiting for this to
> get fixed for real, try the attached kernel patch.  Then boot with
> i915.hotplug_mask=0x38000000.  That might mean you have to run xrandr (no
> parameters needed) after plugging or unplugging a digital cable.

Thanks for the workaround, I applied it and my laptop has been working fine for the past 18 hours.

Comment 122 Roland Tapken 2010-06-22 09:58:20 UTC

I've been using the workaround patch from Andy for a week now, and I've never had this performance issue again (F13 with Intel GM45 and kernel 2.6.33.5-112). But a permanent solution is really needed :-(

Comment 123 Bob Glickstein 2010-06-25 15:23:26 UTC

Update: I built a kernel with the patch from comment #118 (and ran it with the appropriate flags) and it DID NOT HELP.  The symptoms appeared after a warm and a cold reboot.

So I tried disabling the uevent patch in xorg-x11-drv-intel as suggested in comment #70 and it DID HELP.  That is, the udev storms continued to happen, but they did not slow X to a crawl.  In fact, a udev storm is happening right now as I type this, but it's monopolizing just one of my four cores, which I can live with for now.

(Cross-posting this update to bug #600465.)

Comment 124 Roland Tapken 2010-06-25 16:13:33 UTC

Bob, did you append "i915.hotplug_mask=0x38000000" to your kernel arguments?

Comment 125 Bob Glickstein 2010-06-25 16:59:49 UTC

Yes I did; in fact I hard-coded it into grub.conf.  And I double-checked the kernel arguments at the boot menu.  And I triple-checked with dmesg that that argument made it to the running kernel.  (I also counted the correct number of trailing zeroes a few times to make sure I had it right.)
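
The "did the argument reach the running kernel" check described above can also be scripted. The following is a minimal sketch, assuming the command line has been read as a single string (for example from /proc/cmdline); the helper name find_kernel_param is hypothetical, and i915.hotplug_mask only exists on kernels carrying the workaround patch from comment #118.

```python
def find_kernel_param(cmdline, name):
    """Return the integer value of `name=value` on a kernel command line.

    Returns None if the parameter is absent. The value is parsed with
    base auto-detection, so "0x38000000" and "939524096" both work.
    If the parameter appears more than once, the last occurrence wins.
    """
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if sep and key == name:
            value = int(raw, 0)
    return value


# Shortened, hypothetical command line used only for illustration.
example = "ro root=/dev/sda1 rhgb quiet i915.hotplug_mask=0x38000000"
mask = find_kernel_param(example, "i915.hotplug_mask")
print(hex(mask) if mask is not None else "parameter not present")
```

On a live system the string would come from open("/proc/cmdline").read(). Note that seeing the parameter on the command line only shows that it was passed, not that the driver honoured it.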

Comment 126 Andy Lutomirski 2010-06-25 17:54:01 UTC

Bob: what kernel version are you running?  If it's 2.6.35-anything, try the patch here in addition to the hotplug_mask patch:

https://patchwork.kernel.org/patch/105727/

Failing that, can you run either intel_reg_read 0x61110 or intel_reg_dumper and post the output?  Both of them live in intel-gpu-tools.

Comment 127 Bob Glickstein 2010-06-25 18:28:45 UTC

Hi Andy,

As I reported in comment #119, my version of intel-gpu-tools does not have intel_reg_read (or intel_reg_dumper).  But it does have something called intel_gpu_dump, so just in case that's useful, I'll attach its output.

On the other hand, I'm on a slightly newer kernel now: 2.6.32.14-127.i915_irq_patch.fc12.x86_64.  Not new enough for the patch you suggested -- though it looks like the patch will apply to 2.6.32 just fine, except for the

  hotplug_en &= CRT_HOTPLUG_MASK;

line in intel_crt.c, which doesn't exist.  (In fact, although i915_reg.h defines CRT_HOTPLUG_MASK, no code in that directory appears to use it.)  If I get some time this weekend perhaps I will try the patch anyway.

Comment 128 Bob Glickstein 2010-06-25 18:29:39 UTC

Created attachment 426952 [details]
Output of intel_gpu_dump

Comment 129 Andy Lutomirski 2010-06-25 19:46:19 UTC

Unfortunately, the gpu dump doesn't help, and your kernel might have different hotplug code.  Basically, if the PORT_HOTPLUG_EN (0x61110) register has any of bits 0x38000000 set, then my patch didn't work.  If not, then either your kernel does something strange or there's a different bug.  Is there any chance you could download the intel-gpu-tools source and build it?

git link and tarballs are here:

http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/

You shouldn't need the other patch unless fc12 backported a regression from 2.6.35, which sounds rather unlikely.

Comment 130 Bob Glickstein 2010-06-26 03:34:42 UTC

OK, I built the newer intel-gpu-tools, and the output of intel_reg_read 0x61110 is... 0x38000320.

So I quadruple-checked the kernel params:

% dmesg
...
Kernel command line: ro root=/dev/mapper/vg_marzipan2-lv_root noiswmd LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet i915.hotplug_mask=0x38000000
...

and I also checked that I'm running a kernel that actually contains the patch in question:

% modinfo i915
...
parm:           hotplug_mask:Disable these hotplug bits (non-Ironlake) (uint)
...

Eager to help get to the bottom of this.  Feels like we're close.  (There's just a few other places that write to PORT_HOTPLUG_EN.)  Let me know what else I can do.

Comment 131 Andy Lutomirski 2010-06-26 03:52:27 UTC

You probably need the fix in commit 6e0032f0ae4440e75256bee11b163552cae21962, which you can find here:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6e0032f0ae4440e75256bee11b163552cae21962

I'll be out of town for a few days, so good luck :)

Comment 132 Bob Glickstein 2010-06-26 15:53:37 UTC

Thanks Andy.  I applied that diff and built another kernel, and this time:

% sudo intel_reg_read 0x61110
0x61110 : 0x320

I'll do what I can to provoke a udev storm and will report back in a few days.
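
For reference, the register test from comment #129 ("if PORT_HOTPLUG_EN has any of bits 0x38000000 set, the patch didn't work") comes down to a single bitwise AND. Below is a minimal sketch applied to the two values reported in this thread; the function names are illustrative, and the parser assumes the "address : value" output format shown just above.

```python
# Bits that the workaround is supposed to keep cleared in PORT_HOTPLUG_EN.
HOTPLUG_MASK = 0x38000000


def parse_intel_reg_read(line):
    """Parse a line like '0x61110 : 0x320' into (address, value)."""
    address, value = line.split(":")
    return int(address.strip(), 16), int(value.strip(), 16)


def offending_bits(reg_value, mask=HOTPLUG_MASK):
    """Return the masked hotplug bits that are still set (0 if none)."""
    return reg_value & mask


# Values taken from comments #130 (before the extra fix) and #132 (after).
for label, value in (("before", 0x38000320), ("after", 0x320)):
    bits = offending_bits(value)
    status = "mask NOT applied" if bits else "mask applied"
    print(f"{label}: {value:#x} -> offending bits {bits:#x} ({status})")
```

A non-zero result means at least one of the digital hotplug enable bits is still set, which matches what was seen before the additional commit was applied.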

Comment 133 Bob Glickstein 2010-06-30 20:01:35 UTC

A few days later: the issue is completely resolved.  Thanks again!

Comment 134 Valent Turkovic 2010-07-05 08:40:25 UTC

In which version of the Fedora Linux kernel is this bug fixed?

Comment 135 Andy Lutomirski 2010-07-05 13:43:52 UTC

Valent: none.  The bug seems to be resolved if you apply my patch and boot with i915.hotplug_mask=0x38000000.  That causes hotplug of digital outputs to work considerably less well (the kernel won't notice in a timely manner), so it's unlikely to get applied anywhere.

Comment 136 Albert 2010-07-06 18:51:55 UTC

The patch has been working nicely so far. I only rebuilt the patched i915.ko module. I am wondering if someone was kind enough to make the freshest i915.ko module available when a kernel update comes out, as this would be enormous help to people like me (not as if the patch itself wasn't useful on its own).

Comment 137 Bryan Schneiders 2010-09-02 19:48:19 UTC

Any news on a patch in Fedora, not just the workarounds above?  We're still using one of those workarounds, which isn't perfect, and this bug continues to be an issue on a large number of workstations.

Comment 138 Éric Brunet 2010-09-11 08:07:00 UTC

I'm also hit by this bug on a Dell E4200 laptop (Intel chipset, x86-64, F13 up to date, KDE). A couple of minutes after most resumes, the system hangs for about one minute with X taking all the CPU, then everything is back to normal. The /var/log/Xorg.0.log file grows a lot during one of these storms. On the last instance, I had this

[ 17359.712] (II) intel(0): EDID for output LVDS1
[ 17359.712] (II) intel(0): Manufacturer: LCD  Model: 2109  Serial#: 909718585
[ 17359.712] (II) intel(0): Year: 2010  Week: 12
[ 17359.712] (II) intel(0): EDID Version: 1.3
[ 17359.712] (II) intel(0): Digital Display Input
[ 17359.712] (II) intel(0): Max Image Size [cm]: horiz.: 26  vert.: 16
[ 17359.712] (II) intel(0): Gamma: 2.20
[ 17359.712] (II) intel(0): No DPMS capabilities specified
[ 17359.712] (II) intel(0): Supported color encodings: RGB 4:4:4 YCrCb 4:4:4
[ 17359.712] (II) intel(0): First detailed timing is preferred mode
[ 17359.712] (II) intel(0): redX: 0.580 redY: 0.340   greenX: 0.310 greenY: 0.550
[ 17359.712] (II) intel(0): blueX: 0.155 blueY: 0.155   whiteX: 0.313 whiteY: 0.329
[ 17359.712] (II) intel(0): Manufacturer's mask: 0
[ 17359.713] (II) intel(0): Supported detailed timing:
[ 17359.713] (II) intel(0): clock: 82.0 MHz   Image Size:  261 x 163 mm
[ 17359.713] (II) intel(0): h_active: 1280  h_sync: 1352  h_sync_end 1480 h_blank_end 1660 h_border: 0
[ 17359.713] (II) intel(0): v_active: 800  v_sync: 803  v_sync_end 809 v_blanking: 823 v_border: 0
[ 17359.713] (II) intel(0): Supported detailed timing:
[ 17359.713] (II) intel(0): clock: 56.3 MHz   Image Size:  261 x 163 mm
[ 17359.714] (II) intel(0): h_active: 1280  h_sync: 1352  h_sync_end 1480 h_blank_end 1694 h_border: 0
[ 17359.714] (II) intel(0): v_active: 800  v_sync: 803  v_sync_end 809 v_blanking: 831 v_border: 0
[ 17359.714] (II) intel(0):  HMW1K@121EWU
[ 17359.714] (II) intel(0):
[ 17359.714] (II) intel(0): EDID (in hex):
[ 17359.714] (II) intel(0):     00ffffffffffff003064092139343936
[ 17359.715] (II) intel(0):     0c140103901a10780a87f594574f8c27
[ 17359.715] (II) intel(0):     27505400000001010101010101010101
[ 17359.715] (II) intel(0):     0101010101010820007c512017304880
[ 17359.715] (II) intel(0):     360005a31000001afe15009e51201f30
[ 17359.715] (II) intel(0):     4880360005a31000001a000000fe0048
[ 17359.715] (II) intel(0):     4d57314b403132314557550a000000fe
[ 17359.715] (II) intel(0):     00000000000000000001010a202000dc
[ 17359.715] (II) intel(0): EDID vendor "LCD", prod id 8457
[ 17359.716] (II) intel(0): Printing DDC gathered Modelines:
[ 17359.716] (II) intel(0): Modeline "1280x800"x0.0   82.00  1280 1352 1480 1660  800 803 809 823 +hsync -vsync (49.4 kHz)
[ 17359.716] (II) intel(0): Modeline "1280x800"x0.0   56.30  1280 1352 1480 1694  800 803 809 831 +hsync -vsync (33.2 kHz)
[ 17359.717] (II) intel(0): Not using default mode "320x240" (doublescan mode not supported)
[ 17359.717] (II) intel(0): Not using default mode "400x300" (doublescan mode not supported)
[ 17359.717] (II) intel(0): Not using default mode "400x300" (doublescan mode not supported)
[ 17359.717] (II) intel(0): Not using default mode "512x384" (doublescan mode not supported)
[ 17359.717] (II) intel(0): Not using default mode "640x480" (doublescan mode not supported)
[ 17359.717] (II) intel(0): Not using default mode "640x512" (doublescan mode not supported)
[ 17359.717] (II) intel(0): Not using default mode "800x600" (doublescan mode not supported)
[ 17359.717] (II) intel(0): Not using default mode "896x672" (doublescan mode not supported)
[ 17359.717] (II) intel(0): Not using default mode "928x696" (doublescan mode not supported)
[ 17359.717] (II) intel(0): Not using default mode "960x720" (doublescan mode not supported)
[ 17359.717] (II) intel(0): Not using default mode "700x525" (doublescan mode not supported)
[ 17359.717] (II) intel(0): Not using default mode "1024x768" (doublescan mode not supported)
[ 17359.717] (II) intel(0): Printing probed modes for output LVDS1
[ 17359.717] (II) intel(0): Modeline "1280x800"x60.0   82.00  1280 1352 1480 1660  800 803 809 823 +hsync -vsync (49.4 kHz)
[ 17359.717] (II) intel(0): Modeline "1280x800"x40.0   56.30  1280 1352 1480 1694  800 803 809 831 +hsync -vsync (33.2 kHz)
[ 17359.717] (II) intel(0): Modeline "1024x768"x60.0   65.00  1024 1048 1184 1344  768 771 777 806 -hsync -vsync (48.4 kHz)
[ 17359.718] (II) intel(0): Modeline "800x600"x60.3   40.00  800 840 968 1056  600 601 605 628 +hsync +vsync (37.9 kHz)
[ 17359.718] (II) intel(0): Modeline "800x600"x56.2   36.00  800 824 896 1024  600 601 603 625 +hsync +vsync (35.2 kHz)
[ 17359.718] (II) intel(0): Modeline "640x480"x59.9   25.18  640 656 752 800  480 490 492 525 -hsync -vsync (31.5 kHz)
[ 17359.744] (II) intel(0): EDID for output VGA1
[ 17359.753] (II) intel(0): EDID for output HDMI1
[ 17359.753] (II) intel(0): EDID for output DP1
[ 17359.762] (II) intel(0): EDID for output HDMI2
[ 17359.762] (II) intel(0): EDID for output DP2
[ 17359.762] (II) intel(0): EDID for output DP3

repeated 276 times, for a total duration of 40 seconds.
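
A count like the one above can be obtained from the log mechanically. The following is a minimal sketch, assuming the timestamped "(II) intel(0): EDID for output ..." line format shown in this comment; the function name summarize_edid_probes is hypothetical, and older Xorg logs without the leading timestamp would need a different pattern.

```python
import re
from collections import Counter

# Matches e.g. "[ 17359.712] (II) intel(0): EDID for output LVDS1".
# Lines such as "EDID Version: 1.3" or "EDID (in hex):" do not match.
EDID_LINE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+\(II\)\s+intel\(\d+\):\s+EDID for output (\S+)\s*$"
)


def summarize_edid_probes(lines):
    """Return (per-output probe counts, seconds between first and last probe)."""
    counts = Counter()
    first = last = None
    for line in lines:
        match = EDID_LINE.match(line)
        if not match:
            continue
        timestamp = float(match.group(1))
        counts[match.group(2)] += 1
        if first is None:
            first = timestamp
        last = timestamp
    duration = 0.0 if first is None else last - first
    return counts, duration


# Tiny sample in the same format as the excerpt above.
SAMPLE_LOG = [
    "[ 17359.712] (II) intel(0): EDID for output LVDS1",
    "[ 17359.712] (II) intel(0): EDID Version: 1.3",
    "[ 17359.744] (II) intel(0): EDID for output VGA1",
    "[ 17360.212] (II) intel(0): EDID for output LVDS1",
]

probe_counts, seconds = summarize_edid_probes(SAMPLE_LOG)
print(dict(probe_counts), f"over {seconds:.3f} s")
```

Run against a real /var/log/Xorg.0.log, the LVDS1 count gives the number of full re-probe rounds, and the duration gives the length of the storm as recorded by the X server.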

I have the impression that it happens nearly always after long suspends (more than 20 minutes) but not always after short suspends (couple of seconds, for testing). I have the impression that the storm starts when there is some high system activity (when I launch konqueror or bring on the front an opened oowriter), but it might just be impressions. It never happens on a cold boot.

On a top, I see xorg taking the CPU, but I don't see udevd.

This is an incredibly annoying bug.

Comment 139 Till Maas 2010-09-13 15:16:23 UTC

I get the excessive udev output currently only with a certain hdmi-dvi converter. Using another one works.

Comment 140 Matěj Cepl 2010-10-08 15:35:45 UTC

*** Bug 640884 has been marked as a duplicate of this bug. ***

Comment 141 Albert 2010-10-12 09:44:29 UTC

This bug seems to have been eliminated in F14. (There is a MUCH more annoying bug instead: 632031, but that's a different story. Hopefully this other one won't affect you.)

Comment 142 George Lebl 2010-10-12 20:42:44 UTC

I have what is to become F14 here and just had an occurrence of this bug yesterday.  And a very bad one: it didn't go away for quite a long time, and even changing VTs, moving windows around, etc. wasn't helping.  I had an office full of students so I didn't have time to investigate.  Finally I just shut down and rebooted.  The machine is up to date.

If anything I would say the problems got more common and more severe since I moved from F13 to F14.  Yesterday's storm generated an X log file of 114 megabytes.

Kernel is: 2.6.35.6-39.fc14.x86_64
udev-161-4.fc14.x86_64
xorg-x11-drv-intel-2.12.0-6.fc14.1.x86_64

Comment 143 Éric Brunet 2010-10-13 12:39:49 UTC

(In reply to comment #141)
> This bug seems to have been eliminated in F14. (There is a MUCH more annoying
> bug instead: 632031, but that's a different story. Hopefully this other one
> won't affect you.)

On my system (Dell E4200 laptop, Intel chipset, x86-64, KDE), I could indeed fix the bug (it didn't show up after 5 suspend/resume cycles) by simply installing and running the F14 kernel kernel-2.6.35.6-39.fc14.x86_64 without changing anything else.

And I can still suspend !

Comment 144 Bug Zapper 2011-06-02 17:37:31 UTC

This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 145 Roland Tapken 2011-06-02 20:18:15 UTC

Please change version to 14 as the problem still exists. Probably even in fc15, not tested yet.

Comment 146 Andy Lutomirski 2011-06-02 20:23:48 UTC

I'm not using my affected laptop much anymore, so I've stopped really thinking about this bug.

The real fix is known but it's complicated.  Maybe someone can be persuaded to do it some day :)

Comment 147 Adam Williamson 2011-06-04 15:17:12 UTC


-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 148 Adam Jackson 2011-11-30 15:37:30 UTC

This bug is well into tl;dr land, F14 is EOL very soon, and I've not seen any instance of this in ages.  Closing.  Please file new bugs if this is still an issue in F15 or later.

Comment 149 Andy Lutomirski 2011-11-30 17:06:53 UTC

Fair enough.

I'm about to resurrect the affected laptop (probably on Ubuntu LTS this time), and I'll reopen a bug somewhere if the problem is still there.  My newer Intel machines are unaffected.

Comment 150 Jaroslav Pulchart 2011-11-30 18:35:12 UTC

I'm without affected notebook now too.

Comment 151 Nicholas Kudriavtsev 2011-11-30 20:11:16 UTC

I still have an affected notebook, but it is fine with F15 and F16.

Comment 152 Ralf Ertzinger 2011-11-30 21:39:21 UTC

I have a (formerly) affected desktop that's still running F14, but I haven't experienced the bug in a long time.

Comment 153 Roland Tapken 2011-12-01 21:45:14 UTC

The bug still exists on Acer TravelMate 1810TZ on F15 using kernel 2.6.41.1-1.fc15.x86_64 and disappears when applying the patch from comment #118.

Comment 154 Roland Tapken 2011-12-01 21:46:24 UTC

Sorry, meant Acer *Aspire* 1810TZ.

Comment 155 Roland Tapken 2012-06-08 13:29:38 UTC

And nothing changed for 3.4.0... losing my hope to boot with a standard kernel some day ;-)

Comment 156 Roland Tapken 2012-09-12 17:13:47 UTC

Created attachment 612185 [details]
Ported the workaround patch to kernel 3.5.3

Since this bug is still valid for me I ported this patch to the current kernel of fc17, 3.5.3.

Comment 157 Roland Tapken 2013-01-08 17:49:13 UTC

It seems that this bug has been solved in the kernel; I haven't experienced it anymore.

Maybe it was commit d1757408bfe3adca81ff1c88fcb2d578864f8e9d by Jani Nikula:

> drm/i915: only enable sdvo hotplug irq if needed

or 768b107e4b3be0acf6f58e914afe4f337c00932b by Daniel Vetter

> drm/i915: disable sdvo hotplug on i945g/gm
>   v2: While at it, remove the bogus hotplug_active read, and do not mask
>   hotplug_active[0] before checking whether the irq is needed

Since I seemed to be the only one who has still been struggling with this I think this bug can be closed now.