Bug 132439 - System clock runs faster under high (interrupt?) load...

Product:          Red Hat Enterprise Linux 2.1
Component:        kernel
Version:          2.1
Hardware:         i686
OS:               Linux
Status:           CLOSED DUPLICATE
Severity:         medium
Priority:         medium
Reporter:         Phil D'Amore <damorep>
Assignee:         Don Howard <dhoward>
QA Contact:       Brian Brock <bbrock>
CC:               barryn, everaldo.peixoto, guppie, rhbugs, riel
Target Milestone: ---
Target Release:   ---
Doc Type:         Bug Fix
Last Closed:      2006-02-06 20:15:38 UTC
Description (Phil D'Amore, 2004-09-13 14:36:36 UTC)
According to the openntpd maintainer (whom I spoke with at SUCON), the adjtime system call in Linux isn't POSIX compliant and allows the time to jump. This could well be a bug we need to fix for RHEL4, and maybe RHEL3 and RHEL 2.1 too.

*** Bug 133918 has been marked as a duplicate of this bug. ***

Two comments here:

1) FYI, according to the vsftpd maintainer, whom we've contacted about this issue, the latest vsftpd is fixed to the point that transfers will at least not fail. There are still problems (I'm guessing bandwidth limits will probably not work properly in this scenario), and the maintainer is addressing them, but transfers will work.

2) While it is all well and good that adjtime isn't POSIX compliant, the reason I opened this bug in the first place is that I believe the root of the problem is that, under load, the system clock runs extremely fast. If that didn't happen, the rest of this would be irrelevant, at least to this bug.

Similar problem on FC4 x86_64. The system clock runs more than twice as fast as it should, and the system load is also unusually high.
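To make the slew-versus-step distinction concrete: a well-behaved adjtime() amortizes a correction gradually instead of jumping the clock. The sketch below is an illustration only, not kernel code; the 0.5 ms-per-second figure is an assumption (the slew rate traditionally used by Linux adjtime), so treat the numbers as indicative.

```python
# Illustration only: how a slewing correction amortizes a clock offset
# instead of stepping the clock. SLEW_RATE is an assumption (the
# traditional Linux adjtime slew rate), not a value measured here.

SLEW_RATE = 0.0005  # seconds of correction applied per second of real time

def slew_duration(offset_s: float) -> float:
    """Wall-clock seconds needed to absorb `offset_s` purely by slewing."""
    return abs(offset_s) / SLEW_RATE

def remaining_offset(offset_s: float, elapsed_s: float) -> float:
    """Uncorrected offset left after `elapsed_s` seconds of slewing."""
    corrected = min(abs(offset_s), SLEW_RATE * elapsed_s)
    sign = 1.0 if offset_s >= 0 else -1.0
    return sign * (abs(offset_s) - corrected)
```

At this rate a one-second error takes about 2000 seconds (over half an hour) to slew away, which is why slewing alone can never hold a clock accurate when the clock itself gains time rapidly under load.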
When you run procinfo it aborts with a buffer overflow, as follows (two refresh cycles of the capture are interleaved):

```
Linux 2.6.11-1.1369_FC4 (bhcompile@dolly) (gcc 4.0.0 20050525) #1 1CPU [zefir]

Memory:      Total      Used      Free    Shared   Buffers
Mem:             0        24       -24         0         0
Swap:            0         0         0

Bootup: Mon Aug  8 12:30:49 2005   Load average: 0.76 0.59 0.64 2/116 3872
Bootup: Mon Aug  8 12:30:49 2005   Load average: 0.89 0.62 0.65 1/116 3872

user  : 0:01:57.74   3.4%   page in :       0
user  : 0:01:43.58   0.2%   page in :       0
nice  : 0:00:00.00   0.0%   page out:       0
system: 7:15:04.00  50.4%   swap in :       0
idle  : 7:07:17.86  49.5%   swap out:       0
uptime: 0:57:43.59          context : 8416472

irq  0:    3453095 timer    irq 11:          1
irq  0:17936650240 timer    irq 11:       5179
irq  1:      51793 i8042    irq 12:     574909 i8042
irq  3:      10358          irq 14:   71920696 ide0
irq  4:      25896          irq 15:  149756224 ide1
irq  5:      10358          irq169:    1704011 acpi
irq  6:       5179          irq177:       5179 ATI IXP, yenta
irq  7:       5179          irq193:   44060876 eth0
irq  8:          0 rtc      irq201: 2982460928 ehci_hcd:usb1, ohci_
irq  9:       5179          irq209:      15538 ohci1394
irq 10:       5179

*** buffer overflow detected ***: procinfo terminated
======= Backtrace: =========
/lib64/libc.so.6(__chk_fail+0x2f)[0x36475dd23f]
/lib64/libc.so.6[0x36475dc819]
/lib64/libc.so.6(_IO_default_xsputn+0x86)[0x3647567d56]
/lib64/libc.so.6(_IO_vfprintf+0xd5b)[0x3647541c4b]
/lib64/libc.so.6(__vsprintf_chk+0xa9)[0x36475dc8c9]
/lib64/libc.so.6(__sprintf_chk+0x80)[0x36475dc800]
procinfo[0x40699b]
procinfo[0x40232a]
procinfo[0x40634e]
/lib64/libc.so.6(__libc_start_main+0xdc)[0x364751c4cc]
procinfo[0x401999]
======= Memory map: ========
00400000-00409000 r-xp 00000000 fd:00 5393600      /usr/bin/procinfo
00509000-0050a000 rw-p 00009000 fd:00 5393600      /usr/bin/procinfo
0050a000-0052c000 rw-p 0050a000 00:00 0            [heap]
3647300000-364731a000 r-xp 00000000 fd:00 1044711  /lib64/ld-2.3.5.so
3647731000-3647733000 rw-p 00131000 fd:00 1044712  /lib64/libc-2.3.5.so
3648200000-3648203000 r-xp 00000000 fd:00 1044553  /lib64/libtermcap.so.2.0.8
3648203000-3648302000 ---p 00003000 fd:00 1044553  /lib64/libtermcap.so.2.0.8
3648302000-3648303000 rw-p 00002000 fd:00 1044553  /lib64/libtermcap.so.2.0.8
364dc00000-364dc0d000 r-xp 00000000 fd:00 1044717  /lib64/libgcc_s-4.0.0-20050520.so.1
364dc0d000-364dd0c000 ---p 0000d000 fd:00 1044717  /lib64/libgcc_s-4.0.0-20050520.so.1
364dd0c000-364dd0d000 rw-p 0000c000 fd:00 1044717  /lib64/libgcc_s-4.0.0-20050520.so.1
2aaaaaaab000-2aaaaaab3000 rw-p 2aaaaaaab000 00:00 0
2aaaaaabd000-2aaaaaac0000 rw-p 2aaaaaabd000 00:00 0
7fffffd14000-7fffffd29000 rw-p 7fffffd14000 00:00 0  [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0  [vdso]
Aborted
```

Issuing the command uptime just after the procinfo above, we have:

```
$ uptime
 13:30:25 up 59 min, 2 users, load average: 0.42, 0.54, 0.62
```

The real system uptime at that moment was only 26 minutes. I must point out that in this case the useful system load was very low (only the Mozilla browser, filling in the comment field of this bug report). Another symptom is that the keyboard repeat rate is very high (I'm not sure, but I would say more than twice as fast as it should be).

For what it's worth, I'm also seeing the symptoms described in this bug on an FC4 x86_64 system. My clock drifts at about the same rate as the original reporter's, but not at 2x speed as in comment #4. procinfo dies the same horrible death as in comment #4, but that's probably unrelated. There's an interesting correlation between the clock drift and the amount of data transferred over the network: for every ~11 MB transferred, the clock advances one second.
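The observed rate (about one extra second of clock gain per 11 MB of network traffic) can be turned into a quick predictive check. The following is only back-of-the-envelope arithmetic built on that reported estimate, not anything from the kernel:

```python
# Back-of-the-envelope check of the reported drift rate:
# ~1 extra second of clock advance per ~11 MB of network traffic.

DRIFT_S_PER_MB = 1.0 / 11.0  # reporter's observed rate (a rough estimate)

def expected_drift_s(mb_transferred: float) -> float:
    """Predicted clock gain, in seconds, for a given transfer size."""
    return mb_transferred * DRIFT_S_PER_MB

def implied_throughput_mb_s(drift_s: float, window_s: float) -> float:
    """Average transfer rate implied by a drift of `drift_s` over `window_s`."""
    return (drift_s / DRIFT_S_PER_MB) / window_s
```

A correction of ~20 s per 5-minute window would imply roughly 220 MB moved in those 5 minutes, i.e. about 0.73 MB/s of sustained traffic, which is a plausible rate for a bulk download and so is consistent with the per-megabyte estimate.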
I'm currently running ntpd -q every five minutes to correct the drift; usually the correction is only a few seconds, less during the night when traffic is lower. I tried downloading an ISO image and the correction amount jumped to ~20 seconds per 5 minutes.

This is an ABIT AV8 motherboard (VIA K8T800 chipset) with an AMD Athlon 64 4600+ dual-core CPU, a 250 GB SATA Maxtor hard disk, 4 GB of RAM, and an ATI Radeon 7000 display adapter. The on-board network adapter is apparently "VIA Technologies, Inc. VT6120/VT6121/VT6122 Gigabit Ethernet Adapter (rev 11)", using the via_velocity driver, on kernel 2.6.13-1.1526_FC4smp. I'm not running cpufreq, and PowerNow has been disabled. Various listings from the system can be found at http://jaguaari.miuku.net/clock/ if they're of assistance. Unfortunately this is a production system, located some 150 km away from me, so my debugging possibilities are rather limited.

Speaking of FC4, the new 2.6.13-1.1532_FC4smp kernel seems to behave an order of magnitude better than 1526, although this is based on very preliminary test results.

I've experienced the same problem on a dual-core AMD64 machine using a vanilla 2.6.14-2 kernel. When running a disk benchmark with bonnie, the clock is accelerated by a factor of 60 or so; I can see the minutes passing by in seconds :-). Using the kernel options "notsc" and "clock=pmtmr" seems to have fixed the problem, however, at least to the point where the clock drift is no longer in the range of a 60x acceleration.

The original problem reported here was that the system clock on 2.1 was running fast during periods of high network interrupt activity. Was the netconsole module loaded? A 2.1 bug was recently fixed where netconsole was incorrectly influencing the system clock; see BZ 163211 for details. There are reports of other clock issues as well; please see BZ 175293 for this issue on dual-core AMD RHEL4/FC4.
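A quick way to look for this kind of drift on a running system (a modern diagnostic sketch, not something used in the original reports) is to compare the wall clock against a monotonic clock over an interval; if timer interrupts are being over-counted, the wall clock pulls ahead:

```python
import time

def measure_drift(interval_s: float = 5.0) -> float:
    """Return (wall-clock advance) minus (monotonic advance) over the
    interval. Near zero on a healthy system; clearly positive when the
    wall clock is running fast, as in the reports above."""
    w0, m0 = time.time(), time.monotonic()
    time.sleep(interval_s)
    return (time.time() - w0) - (time.monotonic() - m0)
```

On current kernels the active timer source can also be inspected directly via /sys/devices/system/clocksource/clocksource0/current_clocksource, which is the modern counterpart of booting with "notsc" / "clock=pmtmr" to rule out a bad TSC.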
The original bug reported here is more than six months old and suspected to be fixed, so I am closing this ticket. Please re-open if the fast-clock issue is still seen on current RHEL 2.1.