Bug 117218
| Field | Value |
|---|---|
| Summary | Entropy pool not updated (/dev/random blocks) |
| Product | Red Hat Enterprise Linux 3 |
| Component | kernel |
| Version | 3.0 |
| Hardware | i686 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | Aaron Straus <aaron> |
| Assignee | Ernie Petrides <petrides> |
| CC | guillaume.berche, hudson, jrichard, leonard-rh-bugzilla, petrides, redhat, riel, sopwith, walter |
| Doc Type | Bug Fix |
| Last Closed | 2005-07-22 20:42:47 UTC |
Description
Aaron Straus
2004-03-01 18:44:24 UTC
We had to reboot the machine today. /dev/random is now fine.

Just had the same problem on a Dell PowerEdge 2550 with an LSI RAID card. Kernel kernel-smp-2.4.21-9.EL. Rebooting corrected it as well. Additional information: "service random stop; service random start" did not help. /dev/urandom does not block.

It happened again. Reboot fixed it again. Kernel 9.0.1-smp this time. This is causing downtime on a production server. What should I look for if (when) it happens again to provide additional information?

*** Bug 101266 has been marked as a duplicate of this bug. ***

Reproduced on 2.4.21-9 (9.0.1-smp) on two different machines (Supermicro P4 dual-Xeon with HT with qlogic2300 HBAs, Supermicro P4 dual-Xeon with HT with 3ware IDE RAID).

Also have a look here. Seems like a similar problem, here on a Fedora Core 1 system: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=118921

AFAICT /dev/random running out of entropy is expected behaviour. It needs input for new entropy. This is why /dev/urandom is there. I would say NOTABUG, but won't close as I am not 100% sure.

man 4 random states: "When the entropy pool is empty, reads from /dev/random will block until additional environmental noise is gathered." I.e. known behaviour. Closing NOTABUG.

I disagree. It is true that /dev/random __should__ block until there is sufficient entropy. However, the problem is that the entropy pool is __never__ refilled. Moving the mouse, typing on the keyboard and disk activity should all refill the pool, and you should get bytes out of /dev/random at that point. On my system this never happened. Also, nothing was reading /dev/random, so nothing was draining the entropy pool. I believe this is a bug?

What Aaron said. The problem is that it blocks forever. It does not gather additional environmental noise and pass it to the app trying to do the read.

This is undoubtedly a bug and will be fixed in RHEL3 U3. I have back-ported changes from 2.6 which will be committed after U3 opens. The problem is that critical data structures are completely unguarded by locks and consequently end up in a state in which no entropy is generated on SMP systems. Has anyone reproduced this problem on a uniprocessor?

Sorry for this. I got confused by the fact that bug 118921, which references this bug, only states that /dev/random blocks when it runs out of entropy. That is expected behaviour. I should have looked more closely before closing this bug. Different reporters have different issues. Again, sorry. It hopefully won't happen again.

The problem of the entropy pool not filling again occurred for me on a uniprocessor machine, so this is not (exclusively) SMP-related. Maybe there are two issues with the same effect?

Yes, I believe that missed wakeups are also a possibility. However, I have had no success reproducing this on any in-house machine other than production servers. Consequently, I am making a test kernel available for testing on my Red Hat "people" page. The URL is http://people.redhat.com/~mdewand/.dev_random/. There you will find the following two choices for download:

kernel-2.4.21-12.EL.mdewand.rand.1.i686.rpm
kernel-smp-2.4.21-12.EL.mdewand.rand.1.i686.rpm

These kernels contain changes to the /dev/random driver back-ported from 2.6. I would appreciate any feedback that anyone can provide regarding their experiences with either of these kernels.

*** Bug 119526 has been marked as a duplicate of this bug. ***

Ernie, thanks for including my bug in this; sorry I missed it when I searched for it. Just a reminder: I see this on RH 8 systems as well, though I'm aware that they are out of support. So this problem probably exists in other kernels as well.

Any feedback regarding the RPMs I posted a week ago?

Sorry, I currently have no chance to reproduce this on a server, since they are in production and I don't have adequate test equipment here at the moment.
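On the earlier question of what to look for when it happens again: one thing worth recording is the kernel's own entropy estimate while reads are hanging. The following is only a minimal sketch, assuming the kernel exposes `/proc/sys/kernel/random/entropy_avail`; the function names and the `floor` threshold are made up for illustration and are not part of any Red Hat tooling.

```python
import time

# Assumed location of the kernel's entropy estimate (in bits).
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path=ENTROPY_AVAIL):
    """Return the current entropy estimate as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def sample_entropy(samples=5, interval=1.0, path=ENTROPY_AVAIL):
    """Take several readings, sleeping `interval` seconds between them."""
    readings = []
    for i in range(samples):
        readings.append(read_entropy_avail(path))
        if i < samples - 1:
            time.sleep(interval)
    return readings


def looks_stuck(readings, floor=64):
    """Heuristic only: every reading is below `floor` and none ever rises."""
    below_floor = all(r < floor for r in readings)
    never_rises = all(b <= a for a, b in zip(readings, readings[1:]))
    return below_floor and never_rises
```

Calling `looks_stuck(sample_entropy(samples=30))` while there is keyboard or disk activity would give a rough indication of whether the estimate ever moves; attaching the raw readings to the bug is probably more useful than the verdict itself.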
They were installed on the server that had the problem last week (Wednesday, I think). So far so good, but the problem was rare enough that it will be a few weeks before I can comfortably say it's fixed. If it does turn out to be a fix, can you provide patched kernels for any kernel updates until it's included in U3?

Hi Stefan H., could you maybe try some stress testing? I'm thinking about reading from /dev/random to /dev/null until it's empty or so. It should IMHO be possible to read from /dev/random faster than the entropy pool can fill up again. Then we could see whether new entropy is still generated once the pool is exhausted. By the way: are you also running it on a server without keyboard/mouse? And does the disk have low or high activity?

Sorry for not following up on this sooner. We have tested /dev/random a number of times over the last few weeks as you describe, and the entropy pool always fills back up correctly now after being exhausted. The server has a keyboard and mouse attached through a KVM, but it is not selected most of the time; someone logs into it for a few minutes every couple of days on average.

Did this patch make it into 9.0.3? Or do we need to wait for RHEL3 U3 to get it in the mainline kernel?

The fixes for this problem that Mark DeWandel back-ported from 2.6 have just been committed to the RHEL3 U3 patch pool this evening (in kernel version 2.4.21-15.3.EL).

Stefan, just to clarify, the fix did *not* make it into -9.0.3.EL nor into -15.EL (the U2 kernel). Thus, the first officially supported RHEL3 kernel with the fix will be the U3 kernel.

Will these fixes also be ported over to Fedora soon? Is anything known about their next kernel release that might include this? Thank you guys for taking this bug seriously!

Stefan, my understanding is that the fixes came from 2.6, which is what Fedora (as of FC2) is based on. So I'd guess the fixes are there already. If you need me to check out a specific FC kernel version to verify that the fixes are contained there, please let me know. I'll attach the RHEL3 U3 patch that I committed last night in the next comment for reference.

Created attachment 100157 [details]
/dev/random driver fixes committed in RHEL3 U3
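The stress test suggested earlier in the thread (drain /dev/random, then see whether it ever becomes readable again) can be sketched roughly as follows. This is an illustration under assumptions, not the procedure anyone in this bug actually ran: the helper names are hypothetical, the device is opened non-blocking so the script itself cannot hang, and `max_bytes` caps the drain because some kernels never make /dev/random block.

```python
import os
import select


def open_random(path="/dev/random"):
    """Open the device non-blocking so an empty pool raises instead of sleeping."""
    return os.open(path, os.O_RDONLY | os.O_NONBLOCK)


def drain(fd, chunk=64, max_bytes=1 << 20):
    """Read until the fd would block (or max_bytes is hit); return bytes read."""
    total = 0
    while total < max_bytes:
        try:
            data = os.read(fd, min(chunk, max_bytes - total))
        except BlockingIOError:
            break  # nothing left: a blocking reader would go to sleep here
        if not data:
            break  # end of file (only possible for regular files or closed pipes)
        total += len(data)
    return total


def wait_for_refill(fd, timeout):
    """Return True if the fd becomes readable again within `timeout` seconds."""
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)
```

A run would look like `fd = open_random(); drain(fd); wait_for_refill(fd, 600)` while generating disk or keyboard activity. On an affected kernel the expectation, going by the reports above, is that the last call times out; on a fixed kernel it should return True once enough noise has been gathered.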
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-433.html

Thank you very much. Does somebody know when ports of these fixes will occur in Fedora Core 2?

Stefan, your question in comment #30 is already answered in comment #27.

Still seeing this on 2.4.21-20.ELsmp. The errata suggests it's been fixed, and yet on a server with no keyboard and mouse the /dev/random device can produce no output for minutes at a time. This is causing all Java JINI services to hang on startup, as well as some Java SSL services. The only safe workaround for JINI is

    rm /dev/random
    mknod -m 0444 /dev/random c 1 9

as suggested in http://linux.about.com/od/commands/l/blcmdl4_random.htm

Comment 32: Jos, please see comment 8, 9 and 10.

(In reply to comment #33)
> Comment 32: Jos, please see comment 8, 9 and 10.

I did read comments #8 to #10 but could not find the answer to Jos's question: does the fix included in RHEL3 U3 and attached in comment #28 make /dev/random usable on a headless server such as the ones in most data centers (i.e. without mouse and keyboard)? Was the fix able to include other environmental data such as interrupts, or hardware-specific data such as hard disk statistics? (BTW, more details about how the fix works would certainly help in understanding how the bug was fixed.) Is the workaround of using /dev/urandom instead still necessary on headless computers running RHEL3 U3?

Sorry, it seems that while adding myself to the CC list I changed the status of this bug by mistake, which was not my intention. I therefore tried to put it back to the previous state left by "John Flanagan on 2004-09-02 00:31 EST", i.e. "CLOSED ERRATA", but was refused permission to do so. But I would still appreciate details about how this bug was fixed.

Hello, Guillaume. From reading the patch in comment #28, it looks like the fixes were oriented around sleep/wakeup synchronization (as opposed to incorporating new sources of randomness). Unfortunately, the person who did this work is no longer here. Sorry I'm not able to get better answers for you. Reclosing bug.
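A note on the mknod workaround quoted above: on Linux, character device major 1 minor 8 is /dev/random and major 1 minor 9 is /dev/urandom, so the recreated node is simply the non-blocking urandom device under the other name. A small sketch for checking which device a given node actually points at (the function names are hypothetical):

```python
import os
import stat

# Linux character device numbers for the two random devices.
RANDOM_DEV = (1, 8)
URANDOM_DEV = (1, 9)


def classify(major, minor):
    """Map a (major, minor) pair to 'random', 'urandom' or 'other'."""
    if (major, minor) == RANDOM_DEV:
        return "random"
    if (major, minor) == URANDOM_DEV:
        return "urandom"
    return "other"


def describe_node(path="/dev/random"):
    """Report which device the node at `path` refers to."""
    st = os.stat(path)
    if not stat.S_ISCHR(st.st_mode):
        return "not a character device"
    return classify(os.major(st.st_rdev), os.minor(st.st_rdev))
```

If `describe_node("/dev/random")` reports `urandom`, the workaround is in place on that machine and reads will not block, at the cost of no longer waiting for fresh entropy.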