Bug 117218 - Entropy pool not updated (/dev/random blocks)
Summary: Entropy pool not updated (/dev/random blocks)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Ernie Petrides
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-03-01 18:44 UTC by Aaron Straus
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-07-22 20:42:47 UTC
Target Upstream Version:
Embargoed:


Attachments
/dev/random driver fixes committed in RHEL3 U3 (deleted)
2004-05-11 18:40 UTC, Ernie Petrides


Links
System: Red Hat Product Errata
ID: RHBA-2004:433
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 3
Last Updated: 2004-09-02 04:00:00 UTC

Description Aaron Straus 2004-03-01 18:44:24 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5)
Gecko/20040130 Firebird/0.7

Description of problem:
We have a number of Dell PowerEdge 1750s.   All are configured
identically.  

On one machine we've gotten it to a state where the entropy pool is
never updated so /dev/random always blocks on reads.  No process that
I can see is reading /dev/random.  /dev/random has worked on this
machine in the past.  We have an identically configured machine where
/dev/random will return random bytes.   It does not help to do disk
accesses or use the keyboard & mouse.

I have not tried rebooting yet; my guess is that it will fix it.

Version-Release number of selected component (if applicable):
kernel-2.4.21-9.0.1.EL

How reproducible:
Sometimes

Steps to Reproduce:
1.  cat /dev/random | od
2.  wiggle mouse, type on keyboard, do disk accesses

Actual Results:  no output from cat

Expected Results:  on an identically configured machine random bytes
are printed

Additional info:
All the machines have SCSI drives.
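
To separate "the pool is empty right now" from "the pool never refills", it may help to sample the kernel's entropy estimate and attempt a bounded read instead of an open-ended `cat`. The following is a minimal sketch and not part of the original report. It assumes a Linux `/proc/sys/kernel/random/entropy_avail` file and the GNU coreutils `timeout` and `head -c` commands (which may not be available on older installations); the function names and the `ENTROPY_FILE` and `RANDOM_DEV` variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; both paths can be overridden through the environment.

# Print the kernel's current entropy estimate, in bits.
entropy_avail() {
    cat "${ENTROPY_FILE:-/proc/sys/kernel/random/entropy_avail}"
}

# read_with_timeout NBYTES SECONDS
# Try to read NBYTES from the random device, giving up after SECONDS.
# Prints the number of bytes actually obtained (fewer than NBYTES
# means the read was still blocked when the timeout expired).
read_with_timeout() {
    nbytes="$1"
    secs="$2"
    timeout "$secs" head -c "$nbytes" "${RANDOM_DEV:-/dev/random}" | wc -c | tr -d ' '
}
```

For example, running `entropy_avail; read_with_timeout 16 10; entropy_avail` while generating keyboard or disk activity would be expected to return all 16 bytes on a healthy machine. A low, unchanging estimate together with a short read would be more consistent with the stuck pool described here.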

Comment 2 Aaron Straus 2004-03-06 18:04:03 UTC
We had to reboot the machine today.  /dev/random is now fine.

Comment 3 Stefan Hudson 2004-03-11 16:55:09 UTC
Just had the same problem on a Dell PowerEdge 2550 with an LSI RAID
card.  Kernel: kernel-smp-2.4.21-9.EL.  Rebooting corrected it as well.

Additional information:
"service random stop; service random start" did not help.
/dev/urandom does not block.

Comment 4 Stefan Hudson 2004-03-16 23:45:27 UTC
It happened again.  Reboot fixed it again.  Kernel 9.0.1-smp this
time.  This is causing downtime on a production server.  What should I
look for if (when) it happens again to provide additional information?

Comment 5 Mark DeWandel 2004-03-17 14:47:50 UTC
*** Bug 101266 has been marked as a duplicate of this bug. ***

Comment 6 yuval yeret 2004-03-22 09:48:21 UTC
Reproduced on 2.4.21-9 (9.0.1-smp) on two different machines 
(supermicro p4 dual-xeon with HT with qlogic2300 HBAs, 
 supermicro p4 dual-xeon with HT with 3ware IDE RAID)



Comment 7 Stefan Neufeind 2004-03-22 20:36:18 UTC
Also have a look here. It seems like a similar problem, in this case on a
Fedora Core 1 system:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=118921

Comment 8 Leonard den Ottolander 2004-03-25 19:53:25 UTC
AFAICT /dev/random running out of entropy is expected behaviour. It
needs input for new entropy. This is why /dev/urandom is there.

I would say NOTABUG, but won't close as I am not 100% sure.


Comment 9 Leonard den Ottolander 2004-03-25 20:04:03 UTC
man 4 random states:

When the entropy pool is empty, reads from /dev/random will block
until additional environmental noise is gathered.

I.e. known behaviour. Closing NOTABUG.


Comment 10 Aaron Straus 2004-03-25 20:29:24 UTC
I disagree.  It is true that /dev/random __should__ block until there
is sufficient entropy.  However, the problem is that the entropy pool
is __never__ refilled.  Moving the mouse, typing on the keyboard and
disk activity should all refill the pool and you should get bytes out
of /dev/random at that point.  On my system this never happened.

Also nothing was reading /dev/random, so nothing was draining the
entropy pool.  

I believe this is a bug?

Comment 11 Elliot Lee 2004-03-25 20:42:24 UTC
What aaron said. The problem is that it blocks forever. It does not
gather additional environmental noise and pass it to the app trying to
do the read.

Comment 12 Mark DeWandel 2004-03-25 20:58:46 UTC
This is undoubtedly a bug and will be fixed in RHEL3 U3.  I have
back-ported changes from 2.6 which will be committed after U3 opens.
The problem is that critical data structures are completely unguarded
by locks and consequently end up in a state in which no entropy is
generated on SMP systems.  Has anyone reproduced this problem on a
uniprocessor?
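
The failure mode described here (shared state updated concurrently without a lock) can be illustrated in user space. The sketch below is only an analogy; it is not the kernel code or the back-ported patch. Several background shells credit a shared counter file, and an exclusive lock taken with the util-linux `flock` command serializes each read-modify-write so that no update is lost. The function names and file layout are hypothetical.

```shell
#!/bin/sh
# credit_locked COUNTER_FILE LOCK_FILE
# Add 1 to the number stored in COUNTER_FILE while holding an
# exclusive lock on LOCK_FILE (file descriptor 9).
credit_locked() {
    (
        flock 9 || exit 1
        n=$(cat "$1")
        echo $((n + 1)) > "$1"
    ) 9>"$2"
}

# run_writers COUNTER_FILE LOCK_FILE WRITERS PER_WRITER
# Start WRITERS background shells that each credit the counter
# PER_WRITER times, wait for all of them, then print the final count.
run_writers() {
    echo 0 > "$1"
    w=0
    while [ "$w" -lt "$3" ]; do
        (
            i=0
            while [ "$i" -lt "$4" ]; do
                credit_locked "$1" "$2"
                i=$((i + 1))
            done
        ) &
        w=$((w + 1))
    done
    wait
    cat "$1"
}
```

With the lock in place, `run_writers count lock 4 25` should always print 100. Removing the `flock 9` line leaves the read-modify-write unguarded, and the final count may then come out wrong because concurrent writers overwrite each other's updates.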


Comment 13 Leonard den Ottolander 2004-03-25 22:24:01 UTC
Sorry for this. I got confused by the fact that bug 118921, which
references this bug, only states that /dev/random blocks when it runs
out of entropy. That is expected behaviour.

Instead of closing this bug I should have looked more closely before
doing so. Different reporters have different issues. Again, sorry. It
hopefully won't happen again.


Comment 14 Stefan Neufeind 2004-03-25 23:02:07 UTC
The problem of not having the entropy pool fill again occurred for me 
on a uniprocessor machine, so this is not (exclusively) SMP-related. 
Maybe there are two issues with the same effect?

Comment 15 Mark DeWandel 2004-03-30 18:41:49 UTC
Yes, I believe that missed wakeups are also a possibility.  However,
I have had no success reproducing this on any in-house machine other
than production servers.  Consequently, I am making a test kernel
available for testing on my Red Hat "people" page.  The URL is
http://people.redhat.com/~mdewand/.dev_random/.  Here you will
find the following two choices for download:

kernel-2.4.21-12.EL.mdewand.rand.1.i686.rpm
kernel-smp-2.4.21-12.EL.mdewand.rand.1.i686.rpm

These kernels contain changes to the /dev/random driver back-ported
from 2.6.  I would appreciate any feedback that anyone can provide
regarding their experiences with either of these kernels.

Comment 16 Ernie Petrides 2004-03-31 22:35:43 UTC
*** Bug 119526 has been marked as a duplicate of this bug. ***

Comment 17 Jim Richard 2004-04-05 01:02:12 UTC
Ernie,

Thanks for including my bug in this; sorry I missed it when I searched
for it. Just a reminder: I see this on RH 8 systems as well, though I'm
aware that they are out of support. So this problem probably exists in
other kernels as well.



Comment 18 Mark DeWandel 2004-04-06 12:26:47 UTC
Any feedback regarding the RPMs I posted a week ago?

Comment 19 Stefan Neufeind 2004-04-06 13:45:03 UTC
Sorry, I currently have no chance to reproduce this on a server, since 
they are production and I don't have adequate test equipment here at 
the moment.

Comment 20 Stefan Hudson 2004-04-06 15:33:47 UTC
They were installed on the server that had the problem last week
(wednesday, I think).  So far so good, but the problem was rare enough
that it will be a few weeks before I can comfortably say it's fixed.

If it does turn out to be a fix, can you provide patched kernels for
any kernel updates until it's included in U3?

Comment 21 Stefan Neufeind 2004-04-06 15:44:10 UTC
Hi Stefan H.,

could you maybe try some stress testing? I'm thinking of reading 
from /dev/random to /dev/null until the pool is empty. It should imho 
be possible to read from /dev/random faster than the entropy pool can 
fill up again. Then we could see whether new entropy is still 
generated once the pool is exhausted.

By the way: are you also running it on a server without keyboard/mouse? And 
does the disk have low or high activity?
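
The stress test suggested above could be scripted roughly as follows. This is a sketch under assumptions and not the exact commands anyone in this thread reported using. It assumes a Linux `/proc/sys/kernel/random/entropy_avail` file, the GNU coreutils `timeout` command, and `awk`; the function names and the `ENTROPY_FILE` and `RANDOM_DEV` variables are hypothetical.

```shell
#!/bin/sh
# drain_pool SECONDS BLOCKS
# Read up to BLOCKS 64-byte blocks from the random device into /dev/null,
# giving up after SECONDS so a blocked read cannot hang the script.
drain_pool() {
    timeout "${1:-5}" dd if="${RANDOM_DEV:-/dev/random}" of=/dev/null \
        bs=64 count="${2:-64}" 2>/dev/null
    return 0
}

# watch_refill SAMPLES INTERVAL
# Print the kernel's entropy estimate SAMPLES times, INTERVAL seconds apart.
watch_refill() {
    samples="$1"
    interval="$2"
    i=0
    while [ "$i" -lt "$samples" ]; do
        cat "${ENTROPY_FILE:-/proc/sys/kernel/random/entropy_avail}"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
    return 0
}

# refilled
# Read samples on stdin; succeed if any later sample exceeds the first one.
refilled() {
    awk 'NR == 1 { first = $1; next }
         $1 > first { up = 1 }
         END { exit (up ? 0 : 1) }'
}
```

A run such as `drain_pool 5 64; watch_refill 10 1 | refilled && echo "pool is refilling"` first exhausts the pool and then checks whether the estimate ever rises again. On a machine showing this bug, the samples would be expected to stay flat.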

Comment 23 Stefan Hudson 2004-05-10 22:21:40 UTC
Sorry for not following up on this sooner.  We have tested /dev/random
a number of times over the last few weeks as you describe, and the
entropy pool always fills back up correctly now after being exhausted.

The server has a keyboard and mouse attached through a KVM, but it is
not selected most of the time - someone logs into it for a few minutes
every couple days on average.

Did this patch make it into 9.0.3?  Or do we need to wait for RHEL3-U3
to get it in the mainline kernel?

Comment 24 Ernie Petrides 2004-05-11 04:01:21 UTC
The fixes for this problem that Mark DeWandel back-ported from 2.6
have just been committed to the RHEL3 U3 patch pool this evening
(in kernel version 2.4.21-15.3.EL).


Comment 25 Ernie Petrides 2004-05-11 04:16:19 UTC
Stefan, just to clarify, the fix did *not* make it into -9.0.3.EL
nor into -15.EL (the U2 kernel).  Thus, the first officially
supported RHEL3 kernel with the fix will be the U3 kernel.


Comment 26 Stefan Neufeind 2004-05-11 05:21:44 UTC
Will these fixes also soon be ported over to Fedora? Anything known 
about their next kernel-release that might include this?

Thank you guys for taking this bug seriously!

Comment 27 Ernie Petrides 2004-05-11 18:38:10 UTC
Stefan, my understanding is that the fixes came from 2.6, which
is what Fedora (as of FC2) is based on.  So I'd guess the fixes
are there already.  If you need me to check out a specific FC
kernel version to verify that the fixes are contained there,
please let me know.  I'll attach the RHEL3 U3 patch that I
committed last night in the next comment for reference.


Comment 28 Ernie Petrides 2004-05-11 18:40:05 UTC
Created attachment 100157 [details]
/dev/random driver fixes committed in RHEL3 U3

Comment 29 John Flanagan 2004-09-02 04:31:06 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-433.html


Comment 30 Stefan Neufeind 2004-09-02 05:40:20 UTC
Thank you very much. Does somebody know when ports of these fixes 
will occur in Fedora Core 2?

Comment 31 Leonard den Ottolander 2004-09-02 10:24:21 UTC
Stefan, your question in comment #30 is already answered in comment #27.


Comment 32 Jos Martin 2004-11-25 18:39:27 UTC
Still seeing this on 2.4.21-20.ELsmp - the errata suggests it's been
fixed and yet on a server with no keyboard and mouse the /dev/random
device can produce no output for minutes at a time. This is causing
all java JINI services to hang on startup and some java SSL services.
The only safe workaround for JINI is 

rm /dev/random
mknod -m 0444 /dev/random c 1 9

as suggested in http://linux.about.com/od/commands/l/blcmdl4_random.htm
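
On Linux, character device major 1, minor 8 is `/dev/random` and major 1, minor 9 is `/dev/urandom`, so the `mknod` command above makes the path `/dev/random` refer to the non-blocking urandom device. A minimal sketch for checking which device a path actually refers to, assuming GNU `stat` (the helper names are hypothetical):

```shell
#!/bin/sh
# dev_numbers PATH
# Print "MAJOR MINOR" in decimal for the device node at PATH.
# GNU stat reports them in hexadecimal through %t and %T.
dev_numbers() {
    out=$(stat -c '%t %T' "$1") || return 1
    set -- $out
    echo "$((0x$1)) $((0x$2))"
}

# points_at_urandom PATH
# Succeed if PATH carries the urandom device numbers (major 1, minor 9).
points_at_urandom() {
    [ "$(dev_numbers "$1")" = "1 9" ]
}
```

For example, `points_at_urandom /dev/random && echo "workaround is in place"` prints the message only after the node has been recreated as shown above.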

Comment 33 Leonard den Ottolander 2004-11-25 21:45:43 UTC
Comment 32: Jos, please see comment 8, 9 and 10.


Comment 34 Guillaume Berche 2005-07-22 11:58:30 UTC
(In reply to comment #33)
> Comment 32: Jos, please see comment 8, 9 and 10.
> 

I did read comments #8 to #10 but could not find the answer to Jos's
question: does the fix included in RHEL3 U3 and attached in comment #28 make use of
/dev/random possible on a headless server such as the ones in most data centers
(i.e. without mouse and keyboard)?

Was the fix able to include other environmental data such as interrupts or
hardware specific data such as hard disk statistics (BTW more details about how
the fix works would certainly help in understanding how the bug was fixed)?

Is the workaround of using /dev/urandom instead still necessary on headless
computers running RHEL3 U3?

Comment 35 Guillaume Berche 2005-07-22 13:54:56 UTC
Sorry, it seems that while adding myself to the CC list I have by mistake
changed the status of this bug, which was not my intention. I therefore tried to
put it back to the previous state left by "John Flanagan on 2004-09-02 00:31
EST", i.e. "CLOSED ERRATA", but was refused permission to do so.

But I would still appreciate details about how this bug was fixed.

Comment 36 Ernie Petrides 2005-07-22 20:42:47 UTC
Hello, Guillaume.  From reading the patch in comment #28, it looks like
the fixes were oriented around sleep/wakeup synchronization (as opposed
to incorporating new sources of randomness).  Unfortunately, the person
who did this work is no longer here.  Sorry I'm not able to get better
answers for you.

Reclosing bug.


