Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1965374
Summary: | glibc: valgrind suppressions no longer active after debuginfo removal | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Mattias Ellert <mattias.ellert> |
Component: | glibc | Assignee: | Florian Weimer <fweimer> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 34 | CC: | andrii.verbytskyi, aoliva, arjun.is, codonell, dj, dmalcolm, dodji, fweimer, jakub, jan.kratochvil, jwakely, law, mcermak, mfabian, mjw, mpolacek, msebor, nickc, pfrankli, rth, sipoyare, tuliom |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glibc-2.33.9000-21.fc35 glibc-2.32-8.fc33 glibc-2.33-18.fc34 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-06 13:25:38 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Mattias Ellert
2021-05-27 15:08:52 UTC
It seems that valgrind had suppressions that relied on glibc debugging information: ==20030== Conditional jump or move depends on uninitialised value(s) ==20030== at 0x4C195EC: ??? (in /usr/lib64/libc-2.33.so) ==20030== by 0x4BED35F: ??? (in /usr/lib64/libc-2.33.so) ==20030== by 0x4BF753B: ??? (in /usr/lib64/libc-2.33.so) ==20030== by 0x4C7506B: __sprintf_chk (in /usr/lib64/libc-2.33.so) ==20030== by 0x48CF4F7: HepMC3::WriterAscii::write_event(HepMC3::GenEvent const&) (stdio2.h:38) ==20030== by 0x10A50B: main (testIO6.cc:27) After removal of that debugging information, these suppressions no longer work. Perhaps there is a fix for this without bringing back all the debugging information. I've just tried to rebuild valgrind-3.17.0-3.fc35.src.rpm for aarch64 on f33 and f34 and see -- Running tests in gdbserver_tests ----------------------------------- hginfo: valgrind --tool=helgrind --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-hginfo -q ./../helgrind/tests/hg01_all_ok (progB: ./gdb --quiet -l 60 --nx ../helgrind/tests/hg01_all_ok) *** hginfo failed (stderr) *** *** hginfo failed (stdoutB) *** *** hginfo failed (stderrB) *** hgtls: valgrind --tool=helgrind --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-hgtls -q ./../none/tests/tls (progB: ./gdb --quiet -l 60 --nx ../none/tests/tls) *** hgtls failed (stdoutB) *** mcblocklistsearch: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcblocklistsearch -q ./../memcheck/tests/leak-tree (progB: ./gdb --quiet -l 60 --nx 1>&2 ../memcheck/tests/leak-tree) mcbreak: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcbreak ./t (progB: ./gdb --quiet -l 60 --nx ./t) sh: line 1: 14371 Aborted (core dumped) ./gdb --quiet -l 60 --nx ./t < mcbreak.stdinB.gdb > mcbreak.stdoutB.out 2> mcbreak.stderrB.out Best regards, Andrii Is this an arm64 only issue? Or are you seeing this on other architectures too? I've tried for arm64 only. The f33/f34 builds failed, but f32 and rawhide were ok. Just restarted the builds for other architectures. The rebuild fails on multiple architectures. It succeeds only for all *86 and for aarch64@rawhide/f32. glibc-2.33.9000-16.fc35 may improve this situation: https://koji.fedoraproject.org/koji/buildinfo?buildID=1772153 It is available in the f35-build-side-42221 side tag. I may not have guessed the relevant symbols correctly, but we can extend the symbol set easily enough. If rawhide is confirmed fixed by this, I will backport to Fedora 34 and 33. The HepMC3 build in koschei for f35 (rawhide) failed with glibc-2.33.9000-18.fc35.aarch64 https://koschei.fedoraproject.org/package/HepMC3?collection=f35 (In reply to Mattias Ellert from comment #7) > The HepMC3 build in koschei for f35 (rawhide) failed with > glibc-2.33.9000-18.fc35.aarch64 > https://koschei.fedoraproject.org/package/HepMC3?collection=f35 The valgrind errors suggest that the required all symbols are present, so maybe the issue wasn't the missing symbols after all. Hmm. Unfortunately I do not have the resources to analyze this further at this time, sorry. I had to revert the addition of the __str* symbols due to bug 1973026. I spent some time figuring out why __strlen_mte is apparently selected: ==2874915== Conditional jump or move depends on uninitialised value(s) ==2874915== at 0x4C29AEC: __strlen_mte (in /usr/lib64/libc.so.6) ==2874915== by 0x4BF008F: __vfprintf_internal (in /usr/lib64/libc.so.6) ==2874915== by 0x4BFA0FB: ??? (in /usr/lib64/libc.so.6) ==2874915== by 0x4C84E9B: __sprintf_chk (in /usr/lib64/libc.so.6) ==2874915== by 0x48CF037: HepMC3::WriterAscii::write_event(HepMC3::GenEvent const&) (stdio2.h:38) ==2874915== by 0x10D71F: main (testBoost.cc:132) This may be a binutils bug. It looks like the IFUNC is bypassed and the weak definition of __strlen_mte is used to bind strlen locally. For example: Dump of assembler code for function __strdup: 0x0000000000093e90 <+0>: paciasp 0x0000000000093e94 <+4>: stp x29, x30, [sp, #-32]! 0x0000000000093e98 <+8>: mov x29, sp 0x0000000000093e9c <+12>: stp x19, x20, [sp, #16] 0x0000000000093ea0 <+16>: mov x20, x0 0x0000000000093ea4 <+20>: bl 0x9dac0 <__strlen_mte> 0x0000000000093ea8 <+24>: add x19, x0, #0x1 0x0000000000093eac <+28>: mov x0, x19 0x0000000000093eb0 <+32>: bl 0x27e50 <malloc@plt> 0x0000000000093eb4 <+36>: cbz x0, 0x93ed0 <__strdup+64> 0x0000000000093eb8 <+40>: mov x2, x19 0x0000000000093ebc <+44>: mov x1, x20 0x0000000000093ec0 <+48>: ldp x19, x20, [sp, #16] 0x0000000000093ec4 <+52>: ldp x29, x30, [sp], #32 0x0000000000093ec8 <+56>: autiasp 0x0000000000093ecc <+60>: b 0x9b780 <__memcpy_generic> 0x0000000000093ed0 <+64>: ldp x19, x20, [sp, #16] 0x0000000000093ed4 <+68>: ldp x29, x30, [sp], #32 0x0000000000093ed8 <+72>: autiasp 0x0000000000093edc <+76>: ret End of assembler dump. I really don't want to bring back the full symbol table for libc.so.6 because it is so large. Maybe __strlen_asimd can be emulated better by valgrind, but it is significantly more complex. __strlen_mte is used deliberately: /* It doesn't make sense to send libc-internal strlen calls through a PLT. */ .globl __GI_strlen; __GI_strlen = __strlen_mte So no binutils bug. These are the intercepted symbols from valgrind shared/vg_replace_strmem.c: bcmp bcopy __GI_memchr __GI_memcmp __GI_memcpy __GI_memmove __GI_mempcpy __GI___rawmemchr __GI_stpcpy __GI_strcasecmp __GI___strcasecmp_l __GI_strcasecmp_l __GI_strcat __GI_strchr __GI_strcmp __GI_strcpy __GI_strcspn __GI_strlen __GI_strncasecmp __GI___strncasecmp_l __GI_strncasecmp_l __GI_strncmp __GI_strncpy __GI_strnlen __GI_strrchr __GI_wcsnlen __GI_wmemchr index memchr memcmp __memcmp_sse2 __memcmp_sse4_1 memcpy __memcpy_avx_unaligned_erms __memcpy_chk __memcpy_sse2 memcpyZAGLIBCZu2Zd2Zd5 memcpyZAZAGLIBCZu2Zd14 memcpyZDVARIANTZDsse3x memcpyZDVARIANTZDsse42 memcpyZPZa memmove __memmove_chk memmoveZDVARIANTZDsse3x memmoveZDVARIANTZDsse42 memmoveZPZa mempcpy memrchr memset memsetZPZa putenv rawmemchr rindex setenv stpcpy __stpcpy_chk __stpcpy_sse2 __stpcpy_sse2_unaligned stpncpy strcasecmp strcasecmp_l strcasestr strcat strchr strchrnul __strchr_sse2 __strchr_sse2_no_bsf strcmp __strcmp_sse2 __strcmp_sse42 strcpy __strcpy_chk strcspn strlcat strlcpy strlen __strlen_sse2 __strlen_sse2_no_bsf __strlen_sse42 strncasecmp strncasecmp_l strncat strncmp __strncmp_sse2 __strncmp_sse42 strncpy __strncpy_sse2 __strncpy_sse2_unaligned strnlen strpbrk strrchr __strrchr_sse2 __strrchr_sse2_no_bsf __strrchr_sse42 strspn strstr __strstr_sse2 __strstr_sse42 unsetenv wcschr wcscmp wcscpy wcslen wcsncmp wcsnlen wcsrchr wmemchr Note that some of these might be ancient, non-existing or unneeded these days. The intercepts are there for functions that are so optimized that valgrind/memcheck cannot proof them correct (at the time). Thanks. I will not include the public symbols, only the __ variants, assuming that valgrind can lift them from .dynsym. HepMC3 scratch build in f35-build-side-42819 with glibc-2.33.9000-21.fc35: https://koji.fedoraproject.org/koji/taskinfo?taskID=70305267 Succeeds on all architectures, including aarch64 that used to fail. Thank you all! Another HepMC3 scratch build against glibc-2.33.9000-22.fc35: https://koji.fedoraproject.org/koji/taskinfo?taskID=70343510 I'm going to treat this as a glibc bug (and not valgrind improperly depending on glibc internals). I plan to backport the changes to Fedora 34 and 33, which also received the glibc debuginfo updates. FEDORA-2021-483200b24b has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-483200b24b FEDORA-2021-483200b24b has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-483200b24b` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-483200b24b See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates. FEDORA-2021-483200b24b has been pushed to the Fedora 34 stable repository. If problem still persists, please make note of it in this bug report. HepMC3 build in koschei for Fedora 34 now OK: https://koschei.fedoraproject.org/package/HepMC3?collection=f34 Only Fedora 33 is still broken. FEDORA-2021-f29b4643c7 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-f29b4643c7 FEDORA-2021-f29b4643c7 has been pushed to the Fedora 33 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-f29b4643c7` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-f29b4643c7 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates. If there are needed more symbols they need to be added to .dynsym, not to an incomplete .symtab: Bug 1975895 Comment 2 (In reply to Jan Kratochvil from comment #22) > If there are needed more symbols they need to be added to .dynsym, not to an > incomplete .symtab: Bug 1975895 Comment 2 Many of these .symtab symbols are hidden, so they cannot be added to .dynsym. Ideally, tools would be fixed not to assume that .symtab is a superset of .dynsym. (In reply to Florian Weimer from comment #23) > Many of these .symtab symbols are hidden, so they cannot be added to > .dynsym. Ideally, tools would be fixed not to assume that .symtab is a > superset of .dynsym. Why the STB_LOCAL symbols cannot be in .dynsym? Maybe some tool does not support that and it could be fixed? Isn't it another option instead of the .symtab+.dynsym merging? Wouldn't be the STB_LOCAL in .dynsym more compatible across consumers? Or I suspect it would break existing ld.so (I have not checked)? FEDORA-2021-f29b4643c7 has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report. Reproducer (build with -O2): #define _FORTIFY_SOURCE 2 #include <string> #include <stdio.h> void __attribute__((weak)) test(const char *a, const char *b) { std::string mu = a; std::string lu = b; char buf[256]; sprintf(buf, "U %s %s", mu.c_str(), lu.c_str()); puts(buf); } int main() { test("GEV", "MM"); } (In reply to Florian Weimer from comment #26) > Reproducer (build with -O2): > > #define _FORTIFY_SOURCE 2 > #include <string> > #include <stdio.h> > > void __attribute__((weak)) > test(const char *a, const char *b) > { > std::string mu = a; > std::string lu = b; > char buf[256]; > sprintf(buf, "U %s %s", mu.c_str(), lu.c_str()); > puts(buf); > } > > int > main() > { > test("GEV", "MM"); > } I cannot reproduce this on either fedora 34 or rawhide on x86_64. In which situations does this cause valgrind memcheck errors? P.S. Note that this bug is already closed. (In reply to Mark Wielaard from comment #27) > (In reply to Florian Weimer from comment #26) > > Reproducer (build with -O2): > > > > #define _FORTIFY_SOURCE 2 > > #include <string> > > #include <stdio.h> > > > > void __attribute__((weak)) > > test(const char *a, const char *b) > > { > > std::string mu = a; > > std::string lu = b; > > char buf[256]; > > sprintf(buf, "U %s %s", mu.c_str(), lu.c_str()); > > puts(buf); > > } > > > > int > > main() > > { > > test("GEV", "MM"); > > } > > I cannot reproduce this on either fedora 34 or rawhide on x86_64. > In which situations does this cause valgrind memcheck errors? The bug only occurred on aarch64. I made some changes to .symtab in Fedora 34, so I wanted to re-run the reproducer. I verified that it fails with glibc-2.33-12.fc34 (if I recall correctly; not all the relevant builds are still in Koji, but that one is still there). The reproducer passes on aarch64 on current rawhide and Fedora 34, and Fedora 34 with my upcoming changes in glibc-2.33-19.fc34 (essentially backporting the .symtab construction to Fedora 34). So there is nothing to do here really. I just kept rewriting that reproducer from scratch over and over again, so I wanted to have something that I can cut-and-paste easily. Sorry for any confusion caused. |