Bug 1951492
Summary: | A glibc test hangs upon pthread cancellation when glibc is compiled with annobin turned on | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Arjun Shankar <ashankar> |
Component: | annobin | Assignee: | Nick Clifton <nickc> |
Status: | ASSIGNED | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Priority: | unspecified |
Version: | rawhide | CC: | awilliam, bcotton, fweimer, jakub, nickc, pbrobinson, pwhalen, robatino, yaneti |
Keywords: | Bugfix | Flags: | bcotton: fedora_prioritized_bug+ |
Hardware: | armv7hl | OS: | Linux |
Type: | Bug | Bug Blocks: | 245418 |
Description
Arjun Shankar
2021-04-20 09:16:10 UTC
In theory annobin should have no effect on the execution of any binary to which it has been applied. The plugin just creates a non-loadable note section and some extra symbols in the symbol table. In practice those extra symbols can sometimes be problematic, and maybe this is the case in this particular scenario. Without knowing more about why the lock is being held, it is hard to say any more. But a possible place to look is any ARM-specific code in the thread library. In particular, is there any code that scans the symbol table of ARM binaries, possibly looking for function symbols or the like?

ARM EABI uses non-DWARF exception handling. Perhaps that's why it's disturbed by annobin data and the extra symbols?

Hi Arjun,
If it is the annobin symbols that are causing a problem, then you *might* be able to make the test work by stripping them out. For example: objcopy --strip-unneeded a.out a.stripped. Of course this might also break the ARM unwinder by removing symbols that it needs, so no guarantees that it won't make things worse...
Cheers
Nick

Proposed as a Blocker for 35-beta by Fedora user pbrobinson using the blocker tracking app because: This is actually a mass rebuild blocker but we don't have the ability to add that, so adding it here so it's tracked somewhere.

This may be fixed by annobin-9.72-1.fc35. Arjun - please can you check?

(In reply to Nick Clifton from comment #5)
> This may be fixed by annobin-9.72-1.fc35. Arjun - please can you check ?
Thanks, Nick! I'm on it.

"This is actually a mass rebuild blocker but we don't have the ability to add that so adding it here so it's tracked somewhere."
That's what the prioritized bug tracker is for: https://docs.fedoraproject.org/en-US/program_management/prioritized_bugs/

Hi Arjun,
Given your recent results, I think that there were actually two problems:
1. The hang in pthread cancellation. This I think was not caused by the annobin problem (below) but rather something else. A recent commit to the glibc sources appears to have fixed the problem, even if annobin is used when compiling the sources.
2. When a relocatable link is performed on ARM object files that have been annotated by the annobin plugin, the resulting unwind information is corrupt. I think that this has been fixed in the annobin-9.72-1.fc35 build.
Do you agree? If so, then I think that we can close this BZ. If 1) is true but 2) is not, then it would be better to open a separate BZ for it. But if 1) is false, then more investigation is needed, although I am not sure where.
Cheers
Nick

Hi Nick!
So, I tested with "-Wl,--force-group-allocation" for libc_pic.os and that seems to remove the hang, i.e.:
* Without the option but with annobin turned on: it hangs
* With the option and with annobin turned on: it does not hang
Note that this is at a glibc commit that was already hanging.
What we know now:
1. A hang started occurring at glibc commit "C1" (say).
2. Any *one* of three events appears to remove the hang:
* turning off annobin
* building libc_pic.os with --force-group-allocation
* fast-forwarding glibc to commit "C2"
Does this pinpoint any more about where bug #1 might lie?

Hi Arjun,
> Does this pinpoint any more about where bug #1 might lie?
Yes - I think that it is safe to say that there is a latent problem with ARM unwind information and annobin-annotated code. Commit C1 exposed this problem (which has presumably existed for a long time, but is only now coming to light), and commit C2 has hidden it again.
I had really hoped that annobin-9.73 would fix this problem, as it contains ARM-specific code to disable the generation of section groups. (I believe annobin's use of section groups to be the underlying cause of the problem.)
So back to the drawing board for me I guess.
Cheers
Nick
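
The observations above amount to a small test matrix: annobin on or off, libc_pic.os linked with or without --force-group-allocation, and glibc at commit C1 or C2. For anyone repeating the experiment, the following is a minimal sketch of a timeout wrapper that classifies each run as a pass, a failure, or a hang. It assumes that each configuration can be exercised by a single command. The function names, the labels, and the stand-in commands are hypothetical and are not the actual glibc test invocations.

```python
import subprocess
import sys


def classify_run(command, timeout_seconds):
    """Run one command and return 'pass', 'fail', or 'hang'.

    'hang' means the command did not finish within timeout_seconds.
    """
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before
        # raising TimeoutExpired, so no process is left behind here.
        return "hang"
    return "pass" if completed.returncode == 0 else "fail"


def classify_builds(builds, timeout_seconds):
    """Classify every labelled command in the builds mapping."""
    return {
        label: classify_run(command, timeout_seconds)
        for label, command in builds.items()
    }


if __name__ == "__main__":
    # Hypothetical stand-ins: replace each command with the test binary
    # produced by the corresponding build configuration.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    quick = [sys.executable, "-c", "pass"]
    builds = {
        "C1, annobin on": sleeper,
        "C1, annobin off": quick,
        "C1, annobin on, --force-group-allocation": quick,
        "C2, annobin on": quick,
    }
    for label, result in classify_builds(builds, timeout_seconds=1).items():
        print(f"{label}: {result}")
```

The timeout has to be chosen well above the normal run time of the test on the armv7hl builder, otherwise a slow run would be misreported as a hang.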
In today's Prioritized Bugs meeting[1], we accepted this as a Prioritized Bug.
[1] https://meetbot.fedoraproject.org/fedora-meeting-1/2021-06-16/fedora_prioritized_bugs_and_issues.2021-06-16-15.01.log.html#l-46
If anyone has additional input or can do additional testing, please comment.