Bug 1174664 - firefox loops in vclock_gettime() [NEEDINFO]
Summary: firefox loops in vclock_gettime()
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 21
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Marcelo Tosatti
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-12-16 09:54 UTC by Juan Quintela
Modified: 2015-02-24 16:12 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-24 16:12:48 UTC
Type: Bug
Embargoed:
jforbes: needinfo?


Attachments
cpuinfo (deleted), 2014-12-16 09:57 UTC, Juan Quintela
virtual machine description (deleted), 2014-12-16 10:03 UTC, Juan Quintela

Description Juan Quintela 2014-12-16 09:54:35 UTC
Description of problem:

Firefox hangs and the screen stops redrawing.  All threads are waiting for the one looping in vclock_gettime().  This happens when Firefox is used inside a virtual machine.


Version-Release number of selected component (if applicable):

Since F21.


How reproducible:

Once or twice a day.



Steps to Reproduce:
1. Launch Firefox inside a VM.
2. Wait until it hangs.  Logging out and back in, and launching Firefox after each login, helps to reproduce.

Actual results:

Firefox has hung and cannot be used.

Expected results:

Firefox works as expected.


Additional info:

Comment 1 Juan Quintela 2014-12-16 09:57:32 UTC
Created attachment 969494
cpuinfo

Comment 2 Juan Quintela 2014-12-16 10:01:32 UTC
As Marcelo recommended, I switched to kernel-debug and added slub_debug=ZFPU to the kernel command line.  There is no slub debug output at all.  Looking at the MSRs:

# dmesg | grep msr
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 1:1ffd7001, primary cpu clock
[    0.000000] kvm-stealtime: cpu 0, msr 11ae0d380
[    0.050340] kvm-clock: cpu 1, msr 1:1ffd7041, secondary cpu clock
[    0.063066] kvm-stealtime: cpu 1, msr 11b00d380
[root@browser ~]#

And looking at this from qemu:

# virsh qemu-monitor-command --hmp browser  'xp /8x 0x1ffd7000'
000000001ffd7000: 0x00000000 0x00000000 0x00000000 0x00000000
000000001ffd7010: 0x00000000 0x00000000 0x00000000 0x00000000

# virsh qemu-monitor-command --hmp browser  'xp /8x 0x1ffd7040'
000000001ffd7040: 0x00000000 0x00000000 0x00000000 0x00000000
000000001ffd7050: 0x00000000 0x00000000 0x00000000 0x00000000

A bit later:

# virsh qemu-monitor-command --hmp browser  'xp /8x 0x1ffd7040'
000000001ffd7040: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
000000001ffd7050: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a

And right now:


# virsh qemu-monitor-command --hmp browser  'xp /8x 0x1ffd7000'
000000001ffd7000: 0x384adf93 0x5a5a5a5a 0x057364b8 0x00007f4d
000000001ffd7010: 0x00000000 0x00000013 0x000ffc7b 0x10180000


[root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser  'xp /8x 0x1ffd7040'
000000001ffd7040: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
000000001ffd7050: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a


I haven't found what is corrupting the values at the MSR addresses.
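
A note on the addresses, as a hedged sketch rather than a confirmed finding: if the guest kernel prints the MSR value as two 32-bit halves (high:low), with bit 0 of the low half used as the enable flag, then "msr 1:1ffd7001" would correspond to guest-physical address 0x11ffd7000, not 0x1ffd7000. That would put the clock page above 4 GiB, in the same range as the steal-time addresses (0x11ae0d380), and would be worth ruling out before trusting the dumps. The log format is an assumption and should be checked against the kvmclock source of the running kernel; pvclock_gpa is a hypothetical helper name.

```python
import re

# Assumption: the dmesg line has the form "msr <high>:<low>" with both
# halves in hex, and bit 0 of <low> is an enable flag, not an address bit.
KVMCLOCK_RE = re.compile(r"kvm-clock: cpu (\d+), msr ([0-9a-f]+):([0-9a-f]+)")


def pvclock_gpa(dmesg_line):
    """Return (cpu, guest_physical_address) or None if the line does not match."""
    m = KVMCLOCK_RE.search(dmesg_line)
    if m is None:
        return None
    cpu = int(m.group(1))
    high = int(m.group(2), 16)
    low = int(m.group(3), 16)
    # Combine the halves and clear the assumed enable bit.
    return cpu, (high << 32) | (low & ~1)


lines = [
    "[    0.000000] kvm-clock: cpu 0, msr 1:1ffd7001, primary cpu clock",
    "[    0.050340] kvm-clock: cpu 1, msr 1:1ffd7041, secondary cpu clock",
]
for line in lines:
    cpu, gpa = pvclock_gpa(line)
    print(f"cpu {cpu}: xp /8x {gpa:#x}")
```

The printed addresses can be passed to the same `virsh qemu-monitor-command --hmp browser 'xp /8x ...'` invocation used above to compare against the 0x1ffd7000 dumps.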

Comment 3 Juan Quintela 2014-12-16 10:03:24 UTC
Created attachment 969505
virtual machine description

Comment 4 Juan Quintela 2014-12-16 10:06:10 UTC
Note that the guest has two vCPUs.  With only one I was not able to reproduce the hang (I have to try harder).

Note also that this is on a Haswell host; a similar guest on a Sandy Bridge host didn't show this problem.  But that guest also runs F20 instead of F21.

The problem has been happening since mid F21 development.  I tried to debug it with Marcelo, and he ended up asking me to file this bug report.

Comment 5 Juan Quintela 2014-12-16 12:31:31 UTC
Another round:

# dmesg | grep msr
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 1:1ffd8001, primary cpu clock
[    0.000000] kvm-stealtime: cpu 0, msr 11fc0d100
[    0.041174] kvm-clock: cpu 1, msr 1:1ffd8041, secondary cpu clock
[    0.053011] kvm-stealtime: cpu 1, msr 11fc8d100


After start:

[root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser  'xp /8x 0x1ffd8000'
000000001ffd8000: 0x3b401060 0xfffc7f4b 0x3b42d040 0xfffc7f4b
000000001ffd8010: 0x3b42d460 0xfffc7f4b 0x3b42d4c0 0xfffc7f4b


[root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser  'xp /8x 0x1ffd8040'
000000001ffd8040: 0x3b42d700 0xfffc7f4b 0x3b42d760 0xfffc7f4b
000000001ffd8050: 0x3b42d7c0 0xfffc7f4b 0x3b42d820 0xfffc7f4b

When Firefox hangs:

[root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser  'xp /8x 0x1ffd8000'
000000001ffd8000: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
000000001ffd8010: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a


[root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser  'xp /8x 0x1ffd8040'
000000001ffd8040: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
000000001ffd8050: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
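
The 0x5a fill is the same byte the kernel defines as POISON_INUSE in include/linux/poison.h, which the slab debug code writes when poisoning (the P in slub_debug=ZFPU) is enabled. That may mean the page was handed to the slab allocator, although nothing here confirms it. A minimal sketch for checking a saved xp dump against that pattern; parse_xp and is_poisoned are hypothetical helper names:

```python
POISON_INUSE = 0x5A  # value of POISON_INUSE in include/linux/poison.h


def parse_xp(output):
    """Collect the 32-bit words from 'xp /Nx' monitor output."""
    words = []
    for line in output.splitlines():
        if ":" not in line:
            continue  # skip prompts and blank lines
        _, _, data = line.partition(":")
        words.extend(int(w, 16) for w in data.split())
    return words


def is_poisoned(words, byte=POISON_INUSE):
    """True if every word consists only of the given poison byte."""
    pattern = int.from_bytes(bytes([byte]) * 4, "little")
    return bool(words) and all(w == pattern for w in words)


after_start = """\
000000001ffd8000: 0x3b401060 0xfffc7f4b 0x3b42d040 0xfffc7f4b
000000001ffd8010: 0x3b42d460 0xfffc7f4b 0x3b42d4c0 0xfffc7f4b
"""
when_hung = """\
000000001ffd8000: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
000000001ffd8010: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
"""
print("after start poisoned:", is_poisoned(parse_xp(after_start)))
print("when hung poisoned:  ", is_poisoned(parse_xp(when_hung)))
```

Running this periodically against the xp output would show when the region flips to the poison pattern, which could help narrow down what is running in the guest at that moment.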

Comment 6 Kamil Dudka 2015-01-09 15:21:27 UTC
There is a possible duplicate of this bug: bug 1178975

Comment 7 Justin M. Forbes 2015-01-27 15:00:34 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 21 kernel bugs.

Fedora 21 has now been rebased to 3.18.3-201.fc21.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 8 Fedora Kernel Team 2015-02-24 16:12:48 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in over 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

