Bug 1194366 - ftrace writes to random memory when loading a module
Summary: ftrace writes to random memory when loading a module
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs
Reported: 2015-02-19 16:28 UTC by Richard W.M. Jones
Modified: 2015-10-08 08:10 UTC

Fixed In Version: kernel-4.0.0-0.rc1.git0.2.fc23
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-25 12:24:59 UTC
Type: Bug
Embargoed:


Attachments
crc32-arm64.ko (8.50 KB, application/x-object)
2015-02-19 17:14 UTC, Richard W.M. Jones
crc32.ko (6.34 KB, application/x-object)
2015-02-19 17:15 UTC, Richard W.M. Jones

Description Richard W.M. Jones 2015-02-19 16:28:23 UTC
Description of problem:

kernel-3.20.0-0.rc0.git7.3.bz1193875.fc23.aarch64 cannot run KVM
guests (sometimes anyway).  It looks like it's missing some
emulation from kvm.ko:

[   66.722766] kvm [1301]: load/store instruction decoding not implemented

It happens when the guest inserts a kernel module.  It may be
`crc32-arm64.ko' but I cannot be completely certain about that,
since it may be inserting another module but not getting the
debug message out.

Version-Release number of selected component (if applicable):

kernel-3.20.0-0.rc0.git7.3.bz1193875.fc23.aarch64
qemu-2.2.0-5.fc22.aarch64

How reproducible:

100%

Steps to Reproduce:
1. Run: libguestfs-test-tool

Comment 1 Richard W.M. Jones 2015-02-19 16:31:25 UTC
I should note that what specifically happens is the guest
starts up, and then suddenly aborts (when inserting the guest
kernel module).

Comment 2 Richard W.M. Jones 2015-02-19 16:46:55 UTC
In the test above, I was using guest kernel == host kernel ==
3.20.0-0.rc0.git7.3.bz1193875.fc23.

I tried this again, with:
guest kernel == 3.20.0-0.rc0.git7.3.bz1193875.fc23
host kernel == 3.19.0-0.rc7.git1.1.fc22
This *also* crashes when loading the kernel module in the
guest.

So it seems as if the problem is some new processor instruction
is used in a kernel module (possibly crc32-arm64.ko), which KVM
is unable to emulate.

Comment 3 Richard W.M. Jones 2015-02-19 16:57:55 UTC
A couple of other observations:

(1) Doesn't fail with host == guest == 3.19.0-0.rc7.git1.1.fc22.aarch64

(2) I'm very certain the troublesome kernel module is either
`crc32-arm64.ko' or `crc32.ko', and I'm about 90% certain it is
`crc32-arm64.ko'.

Comment 4 Richard W.M. Jones 2015-02-19 17:14:56 UTC
Created attachment 993700
crc32-arm64.ko

It occurred to me that maybe people wouldn't be able to
download the suspected modules, so I'll attach them here.

Comment 5 Richard W.M. Jones 2015-02-19 17:15:14 UTC
Created attachment 993701
crc32.ko

Comment 6 Richard W.M. Jones 2015-02-19 17:24:47 UTC
Here is a diff of the instructions used in crc32-arm64 3.19 vs 3.20.

 ldrh
 mov
 mvn
+nop
 orr
 ret

Seems strange.  I checked the 3.20 module and there is a nop instruction
inserted after every ret or unconditional branch.  I have no idea if
nop would be a problem on aarch64.  Maybe this is a wild goose chase.

Comment 7 Richard W.M. Jones 2015-02-20 09:45:34 UTC
I noticed that qemu dumps the registers on stderr before exiting:

error: kvm run failed Function not implemented
PC=fffffe000046cf4c  SP=fffffe0028383ba0
X00=fffffdfffaa20020 X01=fffffe0028383c00 X02=fffffffffffffffc X03=00000000d503201f
X04=fffffdfffaa20024 X05=ffffffffffffffff X06=0000000000000bb0 X07=fffffe0001a3c3b8
X08=fffffe0028380000 X09=fffffe0000f91000 X10=fffffe0001cfc000 X11=fffffe000123b000
X12=0000000000000000 X13=fffffe0001a3b808 X14=ffff000000000000 X15=ffffffffffffffff
X16=fffffe0000165898 X17=0000000000000001 X18=0000000000000d71 X19=0000040000000000
X20=fffffdfffc000020 X21=0000000000000140 X22=fffffe0029471180 X23=fffffe0000f218e8
X24=0000000000000000 X25=0000000000000000 X26=fffffe0001d6d000 X27=fffffdfffc0007a8
X28=fffffe0029660000 X29=fffffe0028383ba0 X30=fffffe00001e1bb0 PSTATE=600001c5 (flags -ZC-)

Not very helpful without knowing the address space layout of
the guest kernel.
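
One detail in the dump can be read without the guest's address layout:
X03 holds 0xd503201f, which is the A64 encoding of `nop'. The sketch
below parses part of the dump and checks that. The helper name
parse_qemu_registers is made up for illustration. The length check
assumes X00 and X04 are the destination and end pointers of a copy
routine, which this comment alone does not establish.

```python
# Architectural A64 encoding of the NOP instruction.
A64_NOP = 0xD503201F


def parse_qemu_registers(dump):
    """Parse 'NAME=hex' pairs from a QEMU register dump into a dict of ints.

    Hypothetical helper for illustration; tokens without '=' or with a
    non-hex value (such as '(flags') are skipped.
    """
    regs = {}
    for token in dump.split():
        name, sep, value = token.partition("=")
        if not sep:
            continue
        try:
            regs[name] = int(value, 16)
        except ValueError:
            pass
    return regs


# First lines of the register dump quoted in this comment.
DUMP = """PC=fffffe000046cf4c  SP=fffffe0028383ba0
X00=fffffdfffaa20020 X01=fffffe0028383c00 X02=fffffffffffffffc X03=00000000d503201f
X04=fffffdfffaa20024 X05=ffffffffffffffff X06=0000000000000bb0 X07=fffffe0001a3c3b8"""

regs = parse_qemu_registers(DUMP)

# Is the value in X03 an A64 NOP?
value_is_nop = regs["X03"] == A64_NOP

# Assumption: X00 is the destination and X04 the end pointer of a copy,
# so their difference would be the number of bytes being written.
copy_len = regs["X04"] - regs["X00"]

print(value_is_nop, copy_len)
```

If that reading is right, the guest was writing a single 4-byte nop
instruction, which fits the extra nops seen in the 3.20 module in
comment 6.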

Comment 8 Richard W.M. Jones 2015-02-24 12:17:00 UTC
I resolved PC against the symbol table, and it happens in the
guest kernel function '__copy_to_user', at the place marked
with <<< below:

fffffe00003e3040 <__copy_to_user>:
fffffe00003e3040:       8b020004        add     x4, x0, x2
fffffe00003e3044:       f1002042        subs    x2, x2, #0x8
fffffe00003e3048:       540000a4        b.mi    fffffe00003e305c <__copy_to_user+0x1c>
fffffe00003e304c:       f8408423        ldr     x3, [x1],#8
fffffe00003e3050:       f1002042        subs    x2, x2, #0x8
fffffe00003e3054:       f8008403        str     x3, [x0],#8
fffffe00003e3058:       54ffffa5        b.pl    fffffe00003e304c <__copy_to_user+0xc>
fffffe00003e305c:       b1001042        adds    x2, x2, #0x4
fffffe00003e3060:       54000084        b.mi    fffffe00003e3070 <__copy_to_user+0x30>
fffffe00003e3064:       b8404423        ldr     w3, [x1],#4
fffffe00003e3068:       d1001042        sub     x2, x2, #0x4
fffffe00003e306c:       b8004403        str     w3, [x0],#4   <<<<<<<
fffffe00003e3070:       b1000842        adds    x2, x2, #0x2
fffffe00003e3074:       54000084        b.mi    fffffe00003e3084 <__copy_to_user+0x44>
fffffe00003e3078:       78402423        ldrh    w3, [x1],#2
fffffe00003e307c:       d1000842        sub     x2, x2, #0x2
fffffe00003e3080:       78002403        strh    w3, [x0],#2
fffffe00003e3084:       b1000442        adds    x2, x2, #0x1
fffffe00003e3088:       54000064        b.mi    fffffe00003e3094 <__copy_to_user+0x54>
fffffe00003e308c:       39400023        ldrb    w3, [x1]
fffffe00003e3090:       39000003        strb    w3, [x0]
fffffe00003e3094:       d2800000        mov     x0, #0x0                        // #0
fffffe00003e3098:       d65f03c0        ret


Unfortunately qemu doesn't dump a stack trace before it exits.  I
will try to attach gdb to see if that gives any extra information.
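
For anyone repeating this, resolving an address against a symbol table
amounts to finding the last symbol whose start address is not above
the address. A minimal sketch follows. Only the __copy_to_user entry
comes from the disassembly above; the two neighbouring entries are
hypothetical placeholders, and a real table would come from something
like System.map for the exact guest kernel build.

```python
import bisect

# (start address, name), sorted by address.
# Only __copy_to_user is taken from the disassembly in this comment;
# the other two entries are hypothetical placeholders.
SYMBOLS = [
    (0xFFFFFE00003E3000, "hypothetical_prev_symbol"),
    (0xFFFFFE00003E3040, "__copy_to_user"),
    (0xFFFFFE00003E30C0, "hypothetical_next_symbol"),
]


def resolve(addr, symbols=SYMBOLS):
    """Return (name, offset) of the symbol containing addr, or None."""
    starts = [start for start, _ in symbols]
    # Index of the last symbol starting at or below addr.
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return name, addr - start


# Address of the instruction marked <<< in the disassembly.
name, offset = resolve(0xFFFFFE00003E306C)
print(f"{name}+{offset:#x}")
```

This lookup does not know where a symbol ends, so an address past the
end of the last function would still be attributed to it.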

Comment 9 Richard W.M. Jones 2015-02-24 12:24:54 UTC
gdb gives this stack trace, which looks bogus to me:

Program received signal SIGABRT, Aborted.
__copy_to_user () at arch/arm64/lib/copy_to_user.S:43
43	USER(9f, str	w3, [x0], #4	)
(gdb) bt
#0  __copy_to_user () at arch/arm64/lib/copy_to_user.S:43
#1  0xfffffe00001a6558 in __probe_kernel_write (dst=<optimized out>, 
    src=<optimized out>, size=<optimized out>) at mm/maccess.c:56
#2  0x0000000000000000 in ?? ()

More gdb information:

(gdb) info registers 
x0             0xfffffdfffaa20020	-2199113301984
x1             0xfffffe0028343c20	-2198348743648
x2             0xfffffffffffffffc	-4
x3             0xd503201f	3573751839
x4             0xfffffdfffaa20024	-2199113301980
x5             0xffffffffffffffff	-1
x6             0xfffffe0000a1b588	-2199012657784
x7             0xfffffe0000a1b570	-2199012657808
x8             0xfffffe0000a1b558	-2199012657832
x9             0xfffffdfee01a4480	-2203853372288
x10            0x101010101010101	72340172838076673
x11            0x6	6
x12            0x0	0
x13            0xffffffffffffffff	-1
x14            0xffff000000000000	-281474976710656
x15            0xffffffffffffffff	-1
x16            0xfffffe000013a5e0	-2199021967904
x17            0x1	1
x18            0x0	0
x19            0x40000000000	4398046511104
x20            0xfffffdfffc000020	-2199090364384
x21            0x140	320
x22            0x0	0
x23            0xfffffe0000dc17d8	-2199008831528
x24            0xfffffe000009c5b0	-2199022615120
x25            0xfffffe0000f65000	-2199007113216
x26            0x0	0
x27            0x0	0
x28            0xfffffe0029120000	-2198334210048
x29            0xfffffe0028343bc0	-2198348743744
x30            0xfffffe00001a6558	-2199021525672
sp             0xfffffe0028343bc0	0xfffffe0028343bc0
pc             0xfffffe00003e306c	0xfffffe00003e306c <__copy_to_user+44>
cpsr           0x600001c5	1610613189
fpsr           0x0	0
fpcr           0x0	0

(gdb) frame 1
#1  0xfffffe00001a6558 in __probe_kernel_write (dst=<optimized out>, 
    src=<optimized out>, size=<optimized out>) at mm/maccess.c:56
56		ret = __copy_to_user_inatomic((__force void __user *)dst, src, size);
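
The faulting instruction word from comment 8, b8004403, can be decoded
by hand to confirm it is a post-indexed store with writeback. The
sketch below covers only the A64 "load/store register (immediate
post-indexed)" class for integer registers, and the function name is
made up for illustration. As far as I understand, KVM on arm64 can
only emulate a trapped access when the hardware supplies a decoded
syndrome, which it does not do for writeback forms; that would fit the
"load/store instruction decoding not implemented" message, but nothing
in this report confirms it.

```python
def decode_ldst_post_index(word):
    """Decode an A64 integer load/store (immediate, post-indexed) word.

    Returns a dict of fields, or None if the word is not in that class.
    Hypothetical helper; it ignores the SIMD/FP and sign-extending forms.
    """
    # Fixed bits of this encoding class:
    # [29:27]=111, [26]=0 (integer), [25:24]=00, [21]=0, [11:10]=01.
    if (word >> 27) & 0b111 != 0b111:
        return None
    if (word >> 26) & 1:
        return None
    if (word >> 24) & 0b11 != 0b00:
        return None
    if (word >> 21) & 1:
        return None
    if (word >> 10) & 0b11 != 0b01:
        return None

    opc = (word >> 22) & 0b11
    if opc > 1:
        return None  # sign-extending loads are not handled here

    size = (word >> 30) & 0b11
    imm9 = (word >> 12) & 0x1FF
    if imm9 & 0x100:  # sign-extend the 9-bit immediate
        imm9 -= 0x200

    return {
        "is_store": opc == 0,
        "access_bytes": 1 << size,
        "imm": imm9,
        "rn": (word >> 5) & 0x1F,  # base register, updated after the access
        "rt": word & 0x1F,         # register being stored or loaded
    }


# The word at the faulting PC: str w3, [x0], #4
fields = decode_ldst_post_index(0xB8004403)
print(fields)
```

The decoded fields (a 4-byte store of register 3 through base register
0, then base += 4) agree with the `str w3, [x0], #4' that gdb shows at
copy_to_user.S:43.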

Comment 10 Richard W.M. Jones 2015-02-24 14:14:32 UTC
Thread on kvmarm:
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-February/thread.html#13632

Comment 11 Richard W.M. Jones 2015-02-24 16:22:43 UTC
ftrace is implicated:
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-February/013652.html

Comment 12 Richard W.M. Jones 2015-02-24 18:12:21 UTC
Marc Zyngier posted a patch here which works for me:

http://lists.infradead.org/pipermail/linux-arm-kernel/2015-February/325445.html

I intend to add this to the kernel package in Rawhide unless
someone gets there first.

Comment 13 Richard W.M. Jones 2015-10-08 08:10:00 UTC
Similar new bug in 4.2.0:
https://bugzilla.redhat.com/show_bug.cgi?id=1269779

