Bug 1126199 - qemu is mis-linked on aarch64 when PIE+RELRO+combreloc
Summary: qemu is mis-linked on aarch64 when PIE+RELRO+combreloc
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: binutils
Version: rawhide
Hardware: aarch64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kyle McMartin
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARM64, F-ExcludeArch-aarch64
Reported: 2014-08-03 13:16 UTC by Richard W.M. Jones
Modified: 2015-09-01 03:55 UTC (History)
CC List: 15 users

Fixed In Version: binutils-2.24-22.fc22
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-08-22 02:38:45 UTC
Type: Bug
Embargoed:


Attachments
cpus.o-no-opt.txt (140.40 KB, text/plain)
2014-08-03 14:37 UTC, Richard W.M. Jones
cpus.o-opt.txt (140.13 KB, text/plain)
2014-08-03 14:38 UTC, Richard W.M. Jones

Description Richard W.M. Jones 2014-08-03 13:16:50 UTC
Description of problem:

When qemu is compiled on aarch64 with optimizations enabled,
the resulting binary crashes reliably here:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  qemu_kvm_cpu_thread_fn (arg=0x556cefd880)
    at /usr/src/debug/qemu-2.1.0/cpus.c:858
858	    current_cpu = cpu;

#0  0x00000055670bccbc in qemu_kvm_cpu_thread_fn (arg=0x556cefd880)
    at /usr/src/debug/qemu-2.1.0/cpus.c:858
#1  0x0000007f8d92f04c in start_thread (arg=0x7f88add550)
    at pthread_create.c:312
#2  0x0000007f8b25e590 in thread_start ()
    at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Version-Release number of selected component (if applicable):

qemu 2.1.0

How reproducible:

100%

Steps to Reproduce:
1. Start any VM.

Additional info:

The current_cpu macro is doing some Thread-Local Storage stuff,
which might be relevant.
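A minimal C sketch of the pattern at cpus.c:858 can help test the TLS theory outside QEMU: each worker thread stores its argument in a thread-local pointer and reads it back. All names here (FakeCPUState, fake_current_cpu, tls_smoke_test) are hypothetical stand-ins, not QEMU code, and the sketch assumes GCC's __thread extension and POSIX threads. It is not known to reproduce this crash; build it with flags similar to the failing build (PIE, RELRO, BIND_NOW), for example gcc -O2 -fPIE -pie -pthread -Wl,-z,relro -Wl,-z,now.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical stand-in for QEMU's CPUState. */
typedef struct FakeCPUState {
    int cpu_index;
} FakeCPUState;

/* Hypothetical stand-in for current_cpu; volatile keeps -O2 from
 * folding the TLS store and load away. */
static __thread FakeCPUState *volatile fake_current_cpu;

/* Mirrors "current_cpu = cpu;" at the top of qemu_kvm_cpu_thread_fn(). */
static void *fake_cpu_thread_fn(void *arg)
{
    fake_current_cpu = arg;      /* the kind of store that faulted in the report */
    return fake_current_cpu;     /* read back through the same TLS slot */
}

/* Starts nthreads workers and returns how many checks failed;
 * 0 means every thread read back its own pointer and the calling
 * thread's slot was left untouched. */
int tls_smoke_test(int nthreads)
{
    FakeCPUState cpus[MAX_THREADS];
    pthread_t threads[MAX_THREADS];
    int started = 0, failures = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    for (int i = 0; i < nthreads; i++) {
        cpus[i].cpu_index = i;
        if (pthread_create(&threads[i], NULL, fake_cpu_thread_fn, &cpus[i]) != 0)
            break;
        started++;
    }
    failures += nthreads - started;          /* threads that failed to start */
    for (int i = 0; i < started; i++) {
        void *seen = NULL;
        if (pthread_join(threads[i], &seen) != 0 || seen != &cpus[i])
            failures++;
    }
    if (fake_current_cpu != NULL)            /* the calling thread never stored */
        failures++;
    return failures;
}
```

Call it from a small driver such as int main(void) { return tls_smoke_test(4) != 0; }. On a mislinked binary the store would more likely crash inside fake_cpu_thread_fn() than produce a nonzero count.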

Comment 1 Richard W.M. Jones 2014-08-03 14:37:30 UTC
Created attachment 923631
cpus.o-no-opt.txt

Compiled code with no optimization (working).

Comment 2 Richard W.M. Jones 2014-08-03 14:38:07 UTC
Created attachment 923632
cpus.o-opt.txt

Compiled code with optimizations and PIE (not working).

Comment 3 Richard W.M. Jones 2014-08-04 11:17:49 UTC
I added this patch to qemu in Rawhide to temporarily work
around the issue while we try to work out what's going on:

http://pkgs.fedoraproject.org/cgit/qemu.git/commit/?id=a6c45000fe26a552c7f72ba90e5ebfb9d27ffb90

Comment 4 Richard W.M. Jones 2014-08-04 13:36:14 UTC
Kyle McMartin asked me to try -mtls-dialect=trad. However, it
crashes in the same place.

Note that I'm only guessing that it's to do with TLS.  It could
be something completely different.

Comment 5 Richard W.M. Jones 2014-08-15 20:44:45 UTC
The reproducer for this is as follows:

Check out qemu from git.

./configure \
  --target-list="aarch64-softmmu" \
  --extra-cflags="-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches" \
  --extra-ldflags="-Wl,-z,relro -Wl,-z,now" \
  --enable-kvm

make

You will need an uncompressed kernel (any kernel will do), so do
something like:

zcat /boot/vmlinuz-3.WHATEVER.fc22.aarch64 > /tmp/vmlinux

Then try to boot the kernel in qemu:

gdb --args ./aarch64-softmmu/qemu-system-aarch64 -nodefaults \
  -machine virt,accel=kvm -kernel /tmp/vmlinux -monitor none -serial stdio

and gdb will catch the segfault.

Note that I am using an aarch64 host running Fedora Rawhide.

Comment 6 Kyle McMartin 2014-08-20 18:30:24 UTC
http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2568981
http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2568986

can you try both these builds and let me know which work? I think I've narrowed the problem down, but it's a bit nasty.

Comment 7 Richard W.M. Jones 2014-08-20 21:32:40 UTC
The bz1126199jkkm1 package:

error: kvm run failed Bad address

This error message causes abort() to be called, so the process
dies with SIGABRT:

(gdb) bt
#0  0x0000007fb549d098 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x0000007fb549ee0c in __GI_abort () at abort.c:89
#2  0x0000005589d51f18 in kvm_cpu_exec (cpu=cpu@entry=0x558a7d1f60)
    at /usr/src/debug/qemu-2.1.0/kvm-all.c:1727
#3  0x0000005589d40dcc in qemu_kvm_cpu_thread_fn (arg=0x558a7d1f60)
    at /usr/src/debug/qemu-2.1.0/cpus.c:874
#4  0x0000007fb7d4604c in start_thread (arg=0x7fb2f38550)
    at pthread_create.c:312
#5  0x0000007fb554b590 in thread_start ()
    at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89
(gdb) frame 2
#2  0x0000005589d51f18 in kvm_cpu_exec (cpu=cpu@entry=0x558a7d1f60)
    at /usr/src/debug/qemu-2.1.0/kvm-all.c:1727
1727	            abort();
(gdb) frame 3
#3  0x0000005589d40dcc in qemu_kvm_cpu_thread_fn (arg=0x558a7d1f60)
    at /usr/src/debug/qemu-2.1.0/cpus.c:874
874	            r = kvm_cpu_exec(cpu);
(gdb) print cpu
$1 = (CPUState *) 0x558a7d1f60
(gdb) print *cpu
$2 = {
  parent_obj = {
    parent_obj = {
      class = 0x558a7d1d90, 
      free = 0x7fb7c1f564 <g_free>, 
      properties = {
        tqh_first = 0x558a7c1960, 
        tqh_last = 0x558a7e0038
      }, 
      ref = 2, 
      parent = 0x558a7e6990
    }, 
    id = 0x0, 
    realized = true, 
    pending_deleted_event = false, 
    opts = 0x0, 
    hotplugged = 0, 
    parent_bus = 0x0, 
    gpios = {
      lh_first = 0x558a7e0440
    }, 
    child_bus = {
      lh_first = 0x0
    }, 
    num_child_bus = 0, 
    instance_id_alias = -1, 
    alias_required_for_version = 0
  }, 
  nr_cores = 1, 
  nr_threads = 1, 
  numa_node = 0, 
  thread = 0x558a7ecb60, 
  thread_id = 3556, 
  host_tid = 0, 
  running = false, 
  halt_cond = 0x558a7ecb80, 
  queued_work_first = 0x0, 
  queued_work_last = 0x0, 
  thread_kicked = false, 
  created = true, 
  stop = false, 
  stopped = false, 
  exit_request = 0, 
  interrupt_request = 0, 
  singlestep_enabled = 0, 
  icount_extra = 0, 
  jmp_env = {{
      __jmpbuf = {0 <repeats 22 times>}, 
      __mask_was_saved = 0, 
      __saved_mask = {
        __val = {0 <repeats 16 times>}
      }
    }}, 
  as = 0x558a291b28 <address_space_memory>, 
  tcg_as_listener = 0x0, 
  env_ptr = 0x558a7da218, 
  current_tb = 0x0, 
  tb_jmp_cache = {0x0 <repeats 4096 times>}, 
  gdb_regs = 0x558a7ecb30, 
  gdb_num_regs = 68, 
  gdb_num_g_regs = 34, 
  node = {
    tqe_next = 0x0, 
    tqe_prev = 0x558a2231f0 <cpus>
  }, 
  breakpoints = {
    tqh_first = 0x0, 
    tqh_last = 0x558a7da1a8
  }, 
  watchpoints = {
    tqh_first = 0x0, 
    tqh_last = 0x558a7da1b8
  }, 
  watchpoint_hit = 0x0, 
  opaque = 0x0, 
  mem_io_pc = 0, 
  mem_io_vaddr = 0, 
  kvm_fd = 10, 
  kvm_vcpu_dirty = false, 
  kvm_state = 0x558a7bfba0, 
  kvm_run = 0x7fb7fd9000, 
  cpu_index = 0, 
  halted = 0, 
  icount_decr = {
    u32 = 0, 
    u16 = {
      low = 0, 
      high = 0
    }
  }, 
  can_do_io = 0, 
  exception_index = 0, 
  tcg_exit_req = 0
}

Comment 8 Richard W.M. Jones 2014-08-20 21:41:46 UTC
OK let's ignore the previous comment.  I checked back with
unoptimized qemu from git and that is now failing in the
same way as above on this machine.

Comment 9 Richard W.M. Jones 2014-08-20 22:03:16 UTC
This time with a working kernel.

The bz1126199jkkm1 package works.

The bz1126199jkkm2 package works.

Comment 10 Kyle McMartin 2014-08-20 23:20:07 UTC
Spiffy, this is going to be fun to debug... Thanks Richard, just wanted to double check that you were seeing the same results, since the issue is weird. :)

Comment 11 Kyle McMartin 2014-08-22 00:25:15 UTC
OK, it appears to be fixed with upstream binutils... I'll work on identifying a fix.

A workaround for the moment is to set -Wl,-z,nocombreloc, which stops the linker from combining and sorting the .rela sections; that seems to result in the right GOT entries for the TLS vars.

regards, Kyle
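The workaround can be compared against the default link with a small program that defines several TLS variables and checks that each thread gets its own, non-aliased copies. This is a sketch assuming GCC and glibc; tls_a, tls_b, tls_c and tls_vars_check() are hypothetical names. One caveat: with the variables defined static in a single file, GCC will likely use the local-exec TLS model, which needs no GOT entries at all. Moving the definitions into a second source file and accessing them through extern declarations (as QEMU's header does for current_cpu) is closer to the failing case, so a passing run is not proof that a given binutils is fixed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NTHREADS 4
#define SENTINEL ((uintptr_t)0xC0FFEE)

/* Per-thread values stay below NTHREADS * 16, so the sentinel cannot
 * be mistaken for one of them. */
static_assert(SENTINEL > (uintptr_t)NTHREADS * 16,
              "sentinel must not collide with per-thread values");

/* Three hypothetical thread-local variables; volatile keeps -O2 from
 * folding the stores and loads away. */
static __thread volatile uintptr_t tls_a, tls_b, tls_c;

static char thread_ok;   /* its address is returned by threads that pass */

/* Stores thread-specific values in each TLS variable and reads them back.
 * Returns &thread_ok on success, NULL on aliasing or a wrong value. */
static void *tls_vars_thread_fn(void *arg)
{
    uintptr_t base = (uintptr_t)arg * 16;
    volatile uintptr_t *slots[] = { &tls_a, &tls_b, &tls_c };
    const int n = (int)(sizeof slots / sizeof slots[0]);

    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (slots[i] == slots[j])    /* distinct variables must not share storage */
                return NULL;
    for (int i = 0; i < n; i++)
        *slots[i] = base + (uintptr_t)i;
    for (int i = 0; i < n; i++)
        if (*slots[i] != base + (uintptr_t)i)
            return NULL;
    return &thread_ok;
}

/* Returns the number of failed checks; 0 means every worker passed and
 * the calling thread's own copies kept their sentinel values. */
int tls_vars_check(void)
{
    pthread_t threads[NTHREADS];
    int started = 0, failures = 0;

    tls_a = SENTINEL;
    tls_b = SENTINEL;
    tls_c = SENTINEL;
    for (int t = 0; t < NTHREADS; t++) {
        if (pthread_create(&threads[t], NULL, tls_vars_thread_fn,
                           (void *)(uintptr_t)t) != 0)
            break;
        started++;
    }
    failures += NTHREADS - started;      /* threads that failed to start */
    for (int t = 0; t < started; t++) {
        void *result = NULL;
        if (pthread_join(threads[t], &result) != 0 || result != &thread_ok)
            failures++;
    }
    if (tls_a != SENTINEL || tls_b != SENTINEL || tls_c != SENTINEL)
        failures++;
    return failures;
}
```

To use it, build once with something like gcc -O2 -fPIE -pie -pthread -Wl,-z,relro -Wl,-z,now and once more with -Wl,-z,nocombreloc added, run both through a driver such as int main(void) { return tls_vars_check() != 0; }, and compare the readelf -r output of the two binaries.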

Comment 12 Kyle McMartin 2014-08-22 02:17:46 UTC
the fix is:

commit f44a1f8e513b37bcc52ba9ea0c172c3e94852756
Author: Christophe Lyon <christophe.lyon>
Date:   Tue Jan 14 15:53:50 2014 +0100

    2014-01-14  Michael Hudson-Doyle  <michael.hudson>
            Kugan Vivekanandarajah  <kugan.vivekanandarajah>
    
        bfd/
        * elfnn-aarch64.c (elfNN_aarch64_final_link_relocate): Use correct
        offset while calculating relocation address.
        (elfNN_aarch64_create_small_pltn_entry): Likewise.
        (elfNN_aarch64_init_small_plt0_entry): Likewise.

I'll commit it to binutils after I do a bit more testing.

Comment 13 Kyle McMartin 2014-08-22 02:38:45 UTC
test results look good, pushed.

Comment 14 Richard W.M. Jones 2014-08-22 08:15:29 UTC
Thanks Kyle!

I have verified that a self-compiled binutils-2.24-22 fixes the
problem for me.

