Bug 1452813
Summary: | Programs segfault when linked to libtcmalloc: Relink `<...>' with `/lib64/libtcmalloc.so.4' for IFUNC symbol `_ZdlPvm' | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Richard W.M. Jones <rjones>
Component: | gperftools | Assignee: | Tom "spot" Callaway <tcallawa>
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified | Priority: | unspecified
Version: | rawhide | CC: | alkondratenko, amit, berrange, cfergeau, dan, dwmw2, fweimer, itamar, pbonzini, ppisar, rjones, tcallawa, virt-maint
Target Milestone: | --- | Target Release: | ---
Hardware: | ppc64 | OS: | Unspecified
Fixed In Version: | gperftools-2.5.93-1.fc26 | Doc Type: | If docs needed, set a value
Last Closed: | 2017-06-09 19:09:35 UTC | Type: | Bug
Story Points: | --- | Category: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | oVirt Team: | ---
Cloudforms Team: | --- | Bug Blocks: | 910269, 1071880
Description
Richard W.M. Jones
2017-05-19 17:11:04 UTC
I was able to reproduce this on emulated hardware. The problem happens in an ifunc during ELF relocations:

(gdb) run -help
Starting program: /usr/bin/qemu-system-ppc64 -help

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00003fffb5aa8640 in ?? ()
#2 0x00003fffb5aaa544 in ?? ()
#3 0x00003fffb7fba268 in resolve_ifunc (sym_map=0x3fffb5bb0000, map=<optimized out>, value=70367497069760) at ../sysdeps/powerpc/powerpc64/dl-machine.h:666
#4 elf_machine_rela (skip_ifunc=<optimized out>, reloc_addr_arg=0x3fffb5afecd0, version=<optimized out>, sym=<optimized out>, reloc=0x3fffb5aa1a78, map=0x3fffb5bb0000) at ../sysdeps/powerpc/powerpc64/dl-machine.h:708
#5 elf_dynamic_do_Rela (skip_ifunc=<optimized out>, lazy=<optimized out>, nrelative=<optimized out>, relsize=<optimized out>, reladdr=<optimized out>, map=<optimized out>) at do-rel.h:137
#6 _dl_relocate_object (scope=0x3fffb5bb0378, reloc_mode=<optimized out>, consider_profiling=<optimized out>) at dl-reloc.c:259
#7 0x00003fffb7fa588c in dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:2047
#8 0x00003fffb7fd18b4 in _dl_sysdep_start (start_argptr=<optimized out>, dl_main=0x3fffb7fa27d0 <dl_main>) at ../elf/dl-sysdep.c:253
#9 0x00003fffb7fa1de8 in _dl_start_final (arg=0x3ffffffff270, info=0x3fffffffecd0) at rtld.c:303
#10 0x00003fffb7fa74b4 in _dl_start (arg=0x3ffffffff270) at rtld.c:411
#11 0x00003fffb7fa1578 in _start () from /lib64/ld64.so.2

Possibly the library which is failing is /lib64/libtcmalloc.so.4.
The symbol which is being relocated may be _ZdlPvm.
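For readers who do not read Itanium-ABI mangled names: `_ZdlPvm` is the sized form of the global `operator delete`. The following is a minimal sketch for checking this, assuming GCC or Clang (it relies on `abi::__cxa_demangle` from `<cxxabi.h>`); the helper name `demangle` is only illustrative.

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <string>

// Demangle an Itanium-ABI symbol name.
// Returns the input unchanged if the name cannot be demangled.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);  // the buffer returned by __cxa_demangle is malloc()-allocated
    return result;
}
```

With this helper, `demangle("_ZdlPvm")` yields `operator delete(void*, unsigned long)`, while the unsized form `_ZdlPv` yields `operator delete(void*)`.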
The last few lines of LD_DEBUG output before the crash are:

12630: symbol=_ZdlPvm; lookup in file=/lib64/libtcmalloc.so.4 [0]
12630: binding file /lib64/libtcmalloc.so.4 [0] to /lib64/libtcmalloc.so.4 [0]: normal symbol `_ZdlPvm'

I bumped the release and rebuilt libtcmalloc (https://koji.fedoraproject.org/koji/taskinfo?taskID=19684296) but I already suspect that was the wrong thing to do. I suspect we need to rebuild the dependent packages instead (i.e. qemu, mongodb, and possibly many more). I will rebuild qemu shortly.

This looks like bug 1312462 has resurfaced.

Bumped and rebuilt qemu: https://koji.fedoraproject.org/koji/taskinfo?taskID=19684916
This is just a test to see if it needs to be rebuilt against the newer tcmalloc which was released on May 15th.

*** Bug 1453099 has been marked as a duplicate of this bug. ***

The qemu rebuild failed on ppc64 when it tried to run the just-built qemu-system-ppc64 command. So it's not just a simple matter of rebuilding dependencies. It's an actual bug in tcmalloc. As Florian notes, it's most likely that bug 1312462 has reappeared.

Thanks for cc-ing me in. And even more thanks for quickly importing the release candidate of gperftools. Yes, I've re-enabled the ifunc-driven runtime switch for sized-delete support in tcmalloc. The change I've made compared to the previous time is to avoid calling any libc functions (like strlen or strcmp). My understanding is that since no calls to any functions should happen, we should be immune to the "ifunc handler cannot call anything" problem. So I am curious what exactly is going on. Perhaps I've missed something.

Can you post a symbolized backtrace of the crash? Also, the ifunc stuff can be disabled by passing --disable-dynamic-sized-delete-support to configure while we debug.

Does this patch help: https://gist.github.com/alk/d97b2df483dfc512621385c53bd6f63f ? I suspect it might, but maybe I am too naive anyway.
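For context, here is a minimal sketch of the ifunc mechanism being discussed. It assumes GCC or Clang on an ELF/glibc target, since the `ifunc` attribute is a GNU extension that is not available everywhere. All names here (`free_hint`, `resolve_free_hint`, `kUseSizedPath`, and so on) are hypothetical and are not taken from gperftools. The resolver runs while the dynamic loader is still processing relocations, so this one deliberately calls no functions and reads no global variables.

```cpp
#include <cassert>
#include <cstddef>

extern "C" {

// Hypothetical stand-ins for a size-aware and a size-ignoring code path.
static std::size_t hint_sized(std::size_t size) { return size; }
static std::size_t hint_unsized(std::size_t) { return 0; }

typedef std::size_t (*hint_fn)(std::size_t);

// Compile-time choice for the sketch; a real resolver would want to decide
// at load time, which is exactly the part that is hard to do safely.
constexpr bool kUseSizedPath = false;

// IFUNC resolver: invoked by the dynamic loader, not by ordinary code.
// extern "C" keeps the name unmangled so the attribute string can refer to it.
hint_fn resolve_free_hint(void) {
    return kUseSizedPath ? hint_sized : hint_unsized;
}

// Every call to free_hint() is bound to whatever the resolver returned.
std::size_t free_hint(std::size_t size)
    __attribute__((ifunc("resolve_free_hint")));

}  // extern "C"

// Small wrapper reporting which implementation ended up bound.
std::size_t bound_hint_for(std::size_t size) {
    assert(size > 0);
    return free_hint(size);
}
```

In this sketch `bound_hint_for(32)` returns 0, because the resolver selects the size-ignoring path.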
(In reply to Aliaksei Kandratsenka from comment #10)
> Does this patch help:
> https://gist.github.com/alk/d97b2df483dfc512621385c53bd6f63f ? I suspect it
> might, but maybe I am too naive anyway.

I will try a few things. It's rather slow going because I have to test everything under emulation.

(In reply to Richard W.M. Jones from comment #11)
> (In reply to Aliaksei Kandratsenka from comment #10)
> > Does this patch help:
> > https://gist.github.com/alk/d97b2df483dfc512621385c53bd6f63f ? I suspect it
> > might, but maybe I am too naive anyway.
>
> I will try a few things. It's rather slow going because I have
> to test everything under emulation.

Richard, do you want me to apply his patch in Comment 10 and do a new build? Alternatively, if you need me to pass --disable-dynamic-sized-delete-support for now, let me know.

Sorry about the delays - it's very painful building gperftools under emulation. However I also used scratch builds in Koji to answer the questions above:

(In reply to Aliaksei Kandratsenka from comment #9)
> Also ifunc stuff can be disabled by passing
> --disable-dynamic-sized-delete-support to configure while we debug.

Yes, this DOES fix the problem (not surprisingly, really). Do we lose very much by disabling this? If C++ code uses sized delete + tcmalloc, will it fail to {compile|run}?

(In reply to Aliaksei Kandratsenka from comment #10)
> Does this patch help:
> https://gist.github.com/alk/d97b2df483dfc512621385c53bd6f63f ? I suspect it
> might, but maybe I am too naive anyway.

No, this patch does NOT fix the problem.

Thanks. Is it ppc-only now? And is -Wl,-z,now necessary and sufficient (and maybe also relro)? I'll need some help debugging further. There are no PLT calls in the new code and no calls to ifunc-ed functions either. So unless "you cannot call anything at all from ifunc handler on ppc (must inline everything)" holds, I cannot see how it may fail. So there is some generic value in clarifying ifunc semantics by debugging this further. And yes, I can disable this feature upstream (say, whitelist arm64 and x86 where I can test).

(In reply to Aliaksei Kandratsenka from comment #14)
> Thanks. Is it ppc-only now? And -Wl,-z,now is necessary and sufficient (and
> maybe also relro?) ?

It is ppc64 and ppc64le only. It is NOT related to -z now or any other special linker flag. Merely linking to -ltcmalloc is sufficient.

I'm not able to reproduce this with the upstream gperftools (from git), but still trying ...

Created attachment 1281209
build.sh
Well, I tried to reproduce what we see with the Fedora package
using the upstream git repo, and I cannot reproduce it.
This may or may not be surprising - it may be that the ifunc
problem depends in great detail on some aspect of the precise order
in which the libtcmalloc.so library is linked together at build time.
Anyway, attached is the build.sh script I was using to try to
reproduce this (on ppc64le hardware), in case someone else wants
to have a go.
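On the earlier question of whether C++ code that uses sized delete would still compile and run without the dynamic switch: nothing requires the sized form of `operator delete` to actually use the size. The following is a minimal sketch of that idea using malloc/free-backed replacements. It is an illustration only and does not claim to be how gperftools implements it; the counter and the helper `frees_for_one_sized_delete` are hypothetical names used for the demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Demo-only counter of calls reaching the unsized operator delete.
static long g_unsized_frees = 0;

// Replacement global allocation function backed by malloc().
void* operator new(std::size_t size) {
    if (void* p = std::malloc(size != 0 ? size : 1)) {
        return p;
    }
    throw std::bad_alloc();
}

// Unsized delete: the only place that really frees memory in this sketch.
void operator delete(void* p) noexcept {
    ++g_unsized_frees;
    std::free(p);
}

// Sized delete with no size-aware fast path: ignore the size and forward.
void operator delete(void* p, std::size_t) noexcept {
    ::operator delete(p);
}

// Allocate a block, release it through the sized form, and report how many
// times the unsized form was reached as a result.
long frees_for_one_sized_delete() {
    void* p = ::operator new(64);
    assert(p != nullptr);
    static_cast<volatile char*>(p)[0] = 1;  // touch the block so it is really used
    const long before = g_unsized_frees;
    ::operator delete(p, 64);
    return g_unsized_frees - before;
}
```

Callers of the sized form keep working; what is lost in this arrangement is only whatever speedup a size-aware deallocation path would have provided.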
Then let's debug the specific crash that triggered this ticket. Is there any way to get symbol names in the crash?

Hm. So apparently the duplicate ticket #1453099 is crashing on amd64. So perhaps debugging that crash would be easier. Is there some easy way for me to reproduce #1453099, say within docker?

Thanks again for raising it. I can reproduce the problem on debian sid amd64 by adding LDFLAGS='-Wl,-z,now -Wl,-z,relro' and running unit tests.

The specific issue is that the __environ relocation is not available during ifunc handler invocation.

So this is hopeless indeed. I will disable this feature again. Sorry for the noise (but please consider fixing and expanding scope of ifunc; it would be nice if all normal relocations could be done before ifunc resolutions start).

Fixed upstream by:

commit f2bae51e7e609855c26095f14ffbb84082694acb
Author: Aliaksey Kandratsenka <alkondratenko>
Date: Mon May 22 18:58:15 2017 -0700

Revert "Revert "disable dynamic sized delete support by default""

This reverts commit b82d89cb7c8781a6028f6f5959cabdc5a273aec3.

Dynamic sized delete support relies on ifunc handler being able to look up environment variable. The issue is, when stuff is linked with -z now linker flags, all relocations are performed early. And sadly ifunc relocations are not treated specially. So when ifunc handler runs, it cannot rely on any dynamic relocations at all, otherwise crash is real possibility. So we cannot afford doing it until (and if) ifunc is fixed.

This was brought to my attention by Fedora people at https://bugzilla.redhat.com/show_bug.cgi?id=1452813

Spot: You'll have to either disable sized delete on every architecture or add the above commit.

(In reply to Aliaksei Kandratsenka from comment #19)
> Specific issue is __environ relocation is not available during ifunc handler
> invocation.
>
> So this is hopeless indeed. I will disable this feature again. Sorry for the
> noise (but please consider fixing and expanding scope of ifunc; it would be
> nice if all normal relocations could be done before ifunc resolutions start).

We can give you a valid __environ relocation (I have a patch for that), but with BIND_NOW, the variable itself will still not have been initialized when the IFUNC resolver runs, so you still won't be able to detect that something has been configured through the process environment.

gperftools-2.5.93-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-685f48d47a

gperftools-2.5.93-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-685f48d47a

gperftools-2.5.93-1.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.