Bug 1304591
Summary: | golang segmentation fault (core dumped) | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | aslersiam | ||||||
Component: | golang | Assignee: | Jakub Čajka <jcajka> | ||||||
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
Severity: | unspecified | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | rawhide | CC: | admiller, amurdaca, aslersiam, david.voit, esm, fweimer, golang-updates, jcajka, jdulaney, lemenkov, ls, pbrobinson, rbarlow, releng, renich, s, vbatts | ||||||
Target Milestone: | --- | ||||||||
Target Release: | --- | ||||||||
Hardware: | x86_64 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | golang-1.6-1.fc24 | Doc Type: | Bug Fix | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2016-02-22 16:01:33 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 1305208, 1301536 | ||||||||
Attachments: |
|
Hello,
I had a bit of a problem reproducing it locally; I'm now investigating it further. Could you please provide the output of the following commands:
lscpu
uname -a
rpm -qa glibc
Thanks,
Jakub

Hello Jakub, sorry for the delay. Here is the output:

[aslersiam@localhost ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 55
Model name: Intel(R) Celeron(R) CPU N2840 @ 2.16GHz
Stepping: 8
CPU MHz: 1476.974
CPU max MHz: 2582.3000
CPU min MHz: 499.8000
BogoMIPS: 4326.40
Virtualization: VT-x
L1d cache: 24K
L1i cache: 32K
L2 cache: 1024K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer rdrand lahf_lm 3dnowprefetch epb tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms dtherm ida arat
[aslersiam@localhost ~]$
============================================================================
[aslersiam@localhost ~]$ uname -a
Linux localhost.localdomain 4.5.0-0.rc2.git1.1.fc24.x86_64 #1 SMP Tue Feb 2 22:02:01 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[aslersiam@localhost ~]$
============================================================================
[aslersiam@localhost ~]$ rpm -qa glibc
glibc-2.22.90-32.fc24.x86_64
[aslersiam@localhost ~]$
============================================================================

Maybe I need to upload a specific file from the ABRT log?
[aslersiam@localhost ccpp-2016-02-04-01:29:58-5483]$ ls
abrt_version dso_list maps pkg_epoch time
analyzer environ mountinfo pkg_name type
architecture executable namespaces pkg_release uid
cgroup exploitable open_fds pkg_version username
cmdline global_pid os_info proc_pid_status uuid
component hostname os_release pwd var_log_messages
core_backtrace kernel package reason
coredump last_occurrence pid runlevel
count limits pkg_arch tid

Thanks :)

FWIW, I'm seeing repeatable "go test" segfaults with 1.6 on x86_64 (i386 was fine) in copr:
https://copr-be.cloud.fedoraproject.org/results/logic/vault/fedora-rawhide-x86_64/00158247-golang-github-hashicorp-golang-lru/

From build.log.gz:
/var/tmp/rpm-tmp.5sQFy2: line 34: 3601 Segmentation fault (core dumped) go test -compiler gc -ldflags "${LDFLAGS:-}" github.com/hashicorp/golang-lru

Version golang-1.6-0.2.rc1.fc24.x86_64.

Er, ignore what I said about i386. ;) I only have an x86_64 reproduction right now, no idea if it's working correctly on 32-bit.

(In reply to Ed Marshall from comment #4)
> Er, ignore what I said about i386. ;) I only have an x86_64 reproduction
> right now, no idea if it's working correctly on 32-bit.

My observation is that this is reproducible only on CPUs without AVX support and in the presence of a glibc built using GCC 6 (the latest golang built for F23 seems to work just fine). But I'm still looking for the root cause and a reduced reproducer. Actually, I saw this issue in COPR about a week ago, but I had a hard time reproducing it locally. According to the results from COPR, it seems to work fine on 32-bit Intel: https://copr.fedorainfracloud.org/coprs/jcajka/golang-rawhide/build/157662/.

Note that the Go program in question needs to use CGO (i.e. call into libc or other C code) for the issue to show up. A trivial hello world program compiled[1] with the race detector (thus forcing CGO) is a 100% reliable way to reproduce this, at least on my machine (Nehalem-era Core i7, no AVX).
Your observation regarding glibc seems about right. Version glibc-2.22.90-29.fc24.x86_64 is the last one that doesn't show the issue on Rawhide.

[1] use -race flag when building

For me, just running the Go command itself results in a crash:

(gdb) r
Starting program: /usr/bin/go
Missing separate debuginfos, use: dnf debuginfo-install golang-bin-1.6-0.3.rc1.fc24.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7de1be5 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
(gdb) bt
#0 0x00007ffff7de1be5 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#1 0x00007ffff7de6ea4 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#2 0x00007ffff7def2af in _dl_runtime_resolve_sse () from /lib64/ld-linux-x86-64.so.2
#3 0x0000000000829a9c in x_cgo_mmap ()
#4 0x0000000000000000 in ?? ()

→ So it crashes extremely early in the process, in the dynamic linker.

(gdb) disassemble
Dump of assembler code for function _dl_lookup_symbol_x:
0x00007ffff7de1b60 <+0>: push %rbp
0x00007ffff7de1b61 <+1>: mov %rsp,%rbp
0x00007ffff7de1b64 <+4>: push %r15
0x00007ffff7de1b66 <+6>: push %r14
…
0x00007ffff7de1bdb <+123>: test %r12,%r12
0x00007ffff7de1bde <+126>: mov %rax,-0xa0(%rbp)
=> 0x00007ffff7de1be5 <+133>: movaps %xmm0,-0x90(%rbp)
0x00007ffff7de1bec <+140>: je 0x7ffff7de1bfb <_dl_lookup_symbol_x+155>
0x00007ffff7de1bee <+142>: testl $0xfffffffa,0x10(%rbp)
0x00007ffff7de1bf5 <+149>: jne 0x7ffff7de2c9b <_dl_lookup_symbol_x+4411>

→ The crash is at an SSE instruction (movaps). These typically have alignment requirements (the addresses must be a multiple of 16). It's a store onto the stack, so the most likely explanation is that the stack is misaligned.

(gdb) print/x $rbp
$4 = 0x7fffffffe1c8

→ Yep, not a multiple of 16. Let's see where this comes from.

(gdb) break x_cgo_mmap
Breakpoint 1 at 0x829a90
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/go
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, 0x0000000000829a90 in x_cgo_mmap ()
(gdb) disassemble
Dump of assembler code for function x_cgo_mmap:
=> 0x0000000000829a90 <+0>: sub $0x8,%rsp
0x0000000000829a94 <+4>: mov %r9d,%r9d
0x0000000000829a97 <+7>: callq 0x829d40 <mmap@plt>
0x0000000000829a9c <+12>: cmp $0xffffffffffffffff,%rax
0x0000000000829aa0 <+16>: je 0x829ab0 <x_cgo_mmap+32>
0x0000000000829aa2 <+18>: add $0x8,%rsp
0x0000000000829aa6 <+22>: retq
0x0000000000829aa7 <+23>: nopw 0x0(%rax,%rax,1)
0x0000000000829ab0 <+32>: callq 0x829bd0 <__errno_location@plt>
0x0000000000829ab5 <+37>: movslq (%rax),%rax
0x0000000000829ab8 <+40>: add $0x8,%rsp
0x0000000000829abc <+44>: retq
End of assembler dump.
(gdb) print $rsp
$1 = (void *) 0x7fffffffe320

→ This is incorrect. On function entry, %rsp + 8 must be a multiple of 16. Let's go further up the call stack.

(gdb) up
#1 0x00000000004e8f35 in runtime.callCgoMmap () at /usr/lib/golang/src/runtime/sys_linux_amd64.s:269
269 CALL AX
(gdb) disassemble
Dump of assembler code for function runtime.callCgoMmap:
0x00000000004e8f10 <+0>: mov 0x8(%rsp),%rdi
0x00000000004e8f15 <+5>: mov 0x10(%rsp),%rsi
0x00000000004e8f1a <+10>: mov 0x18(%rsp),%edx
0x00000000004e8f1e <+14>: mov 0x1c(%rsp),%ecx
0x00000000004e8f22 <+18>: mov 0x20(%rsp),%r8d
0x00000000004e8f27 <+23>: mov 0x24(%rsp),%r9d
0x00000000004e8f2c <+28>: mov 0x78b225(%rip),%rax # 0xc74158 <_cgo_mmap>
0x00000000004e8f33 <+35>: callq *%rax
=> 0x00000000004e8f35 <+37>: mov %rax,0x28(%rsp)
0x00000000004e8f3a <+42>: retq
0x00000000004e8f3b <+43>: int3
0x00000000004e8f3c <+44>: int3
0x00000000004e8f3d <+45>: int3
0x00000000004e8f3e <+46>: int3
0x00000000004e8f3f <+47>: int3
End of assembler dump.
(gdb)

→ This is a hand-written assembly routine. It is called with a correctly aligned %rsp.
But then, it calls another function without making sure that this function, in turn, has %rsp + 8 as a multiple of 16 when entered.

So, in short, this is a Go bug in the hand-written assembler code used as part of cgo.

(In reply to Florian Weimer from comment #7)
Already there, too :), unfortunately got some unexpected distraction yesterday, requiring my immediate attention..., hopefully I will have fix today...

“LD_BIND_NOW=1 go” gets past this particular failure because all the binding happens before Go code is run.

*** Bug 1309149 has been marked as a duplicate of this bug. ***

Turns out I am hitting this with Docker, as well.

Hello John, you may be seeing https://bugzilla.redhat.com/show_bug.cgi?id=1304062 which we think may be a dupe of this bug.

*** Bug 1304062 has been marked as a duplicate of this bug. ***

(In reply to Randy Barlow from comment #12)
> Hello John, you may be seeing
> https://bugzilla.redhat.com/show_bug.cgi?id=1304062 which we think may be a
> dupe of this bug.

Indeed, that's what I am thinking.
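
The alignment arithmetic from the gdb analysis earlier in this thread can be checked with a short sketch. It is not part of the original report: the function names are hypothetical, and the only inputs are the %rsp and %rbp values printed in that gdb session. It assumes the x86-64 System V convention cited in the analysis (at function entry %rsp + 8 is a multiple of 16) and that movaps needs a 16-byte aligned memory operand.

```go
package main

import "fmt"

// alignedAtEntry (hypothetical helper) reports whether rsp satisfies the rule
// quoted in the analysis: at function entry the CALL has just pushed an 8-byte
// return address, so rsp+8 must be a multiple of 16.
func alignedAtEntry(rsp uint64) bool {
	return (rsp+8)%16 == 0
}

// alignedFor16ByteStore (hypothetical helper) reports whether addr is usable
// as the memory operand of an aligned 16-byte store such as movaps.
func alignedFor16ByteStore(addr uint64) bool {
	return addr%16 == 0
}

func main() {
	const rspAtEntry = 0x7fffffffe320 // %rsp on entry to x_cgo_mmap, from the gdb session
	const rbpAtCrash = 0x7fffffffe1c8 // %rbp in _dl_lookup_symbol_x, from the gdb session

	// Entry check fails: 0x...e320 + 8 leaves a remainder of 8.
	fmt.Println(alignedAtEntry(rspAtEntry)) // false

	// The faulting store targets -0x90(%rbp), which is also off by 8.
	fmt.Println(alignedFor16ByteStore(rbpAtCrash - 0x90)) // false
}
```

Both checks print false, which matches the conclusion that the stack was already misaligned when x_cgo_mmap was entered and stayed misaligned down to the faulting store.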
(In reply to Randy Barlow from comment #12)
> Hello John, you may be seeing
> https://bugzilla.redhat.com/show_bug.cgi?id=1304062 which we think may be a
> dupe of this bug.

If you look at 1309149, you can see the discussion that led to this conclusion.

Jakub reported this upstream: https://github.com/golang/go/issues/14384

Created attachment 1128591 [details]
go1.6-runtime-psABI-alignment.patch
From the linked upstream ticket
I attached this patch, did a copr build, and it seems to have worked: https://copr.fedorainfracloud.org/coprs/davidvoit/runc/build/161355/

(In reply to David Voit from comment #18)
> I attached this patch, did a copr build, and it seems to have worked:
> https://copr.fedorainfracloud.org/coprs/davidvoit/runc/build/161355/

Updated to that build, getting the same thing. From gdb:

(gdb) bt
#0 _dl_lookup_symbol_x (undef_name=0x4062e2 "mmap", undef_map=0x7fb9dc870128, ref=ref@entry=0x7fffcd31dd50, symbol_scope=0x7fb9dc870480, version=0x7fb9dc7b1988, type_class=type_class@entry=1, flags=1, skip_map=0x0) at dl-lookup.c:809
#1 0x00007fb9dc658ea4 in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:111
#2 0x00007fb9dc6612af in _dl_runtime_resolve_sse () at ../sysdeps/x86_64/dl-trampoline.h:112
#3 0x00000000010cee6c in ?? ()
#4 0x0000000000000000 in ?? ()

Then maybe I just got lucky on the copr machine... I tried to compile runc on copr and got a segfault. Even building go produced one on the first try; on the second build it worked. So I have now tested this in a mock chroot (Intel Core2 Duo laptop):
Without the patch, just calling go: segmentation fault.
With the rawhide copr build, it worked.
John, could you double check that you installed the rpm? Or maybe we have two issues here?

To fix all affected packages, a "mini" rebuild of all packages that build-require golang is needed. I requested a side-tag rebuild by release engineering in ticket https://fedorahosted.org/rel-eng/ticket/6351.

Did a scratch rebuild of docker: http://koji.fedoraproject.org/koji/taskinfo?taskID=13099108
Note that I did not bump the release.

*** Bug 1307505 has been marked as a duplicate of this bug. ***
*** Bug 1307554 has been marked as a duplicate of this bug. ***
*** Bug 1307555 has been marked as a duplicate of this bug. ***
*** Bug 1307556 has been marked as a duplicate of this bug. ***
*** Bug 1307560 has been marked as a duplicate of this bug. ***
*** Bug 1307565 has been marked as a duplicate of this bug. ***
*** Bug 1307566 has been marked as a duplicate of this bug. ***
*** Bug 1307567 has been marked as a duplicate of this bug. ***
*** Bug 1307573 has been marked as a duplicate of this bug. ***
*** Bug 1307574 has been marked as a duplicate of this bug. ***
*** Bug 1307576 has been marked as a duplicate of this bug. ***
*** Bug 1307577 has been marked as a duplicate of this bug. ***
*** Bug 1307579 has been marked as a duplicate of this bug. ***
*** Bug 1307586 has been marked as a duplicate of this bug. ***
*** Bug 1307585 has been marked as a duplicate of this bug. ***
*** Bug 1307584 has been marked as a duplicate of this bug. *** |
Created attachment 1121024 [details]
backtrace

Description of problem:
Basically, I can't run golang; the terminal output says "segmentation fault".

Version-Release number of selected component (if applicable):
golang-1.6-0.2.rc1.fc24.x86_64

How reproducible:
a- install golang, then create an example file and try to run it
b- run go with any flag [ go --help ]

Steps to Reproduce:
1. dnf -y install golang
2. go --help [ or any flag ]
3.

Actual results:

Expected results:

Additional info:
I can't report this with ABRT because it says that the backtrace is unusable. For this reason I've uploaded the backtrace file.