
Bug 1951492

Summary: A glibc test hangs upon pthread cancellation when glibc is compiled with annobin turned on
Product: [Fedora] Fedora
Component: annobin
Version: 35
Hardware: armv7hl
OS: Linux
Status: CLOSED EOL
Severity: unspecified
Priority: unspecified
Reporter: Arjun Shankar <ashankar>
Assignee: Nick Clifton <nickc>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: awilliam, bcotton, fweimer, jakub, nickc, pbrobinson, pwhalen, robatino, yaneti
Keywords: Bugfix
Type: Bug
Last Closed: 2022-12-13 15:21:23 UTC
Bug Blocks: 245418

Description Arjun Shankar 2021-04-20 09:16:10 UTC
malloc/tst-malloc-stats-cancellation hangs when I use the following configure line:

../configure CFLAGS="-v -w -g -O2 -iplugindir=/usr/lib/gcc/armv7hl-redhat-linux-gnueabi/11/plugin -fplugin=annobin" --prefix=/usr --with-nonshared-cflags="-fplugin=annobin -fplugin-arg-annobin-disable" --disable-werror

...but not when I use this:

../configure CFLAGS="-v -w -g -O2" --prefix=/usr --disable-werror

I'm not sure how the hang is related to annobin, but: a child thread is cancelled but the cancellation does not occur cleanly: a lock on stderr is not released; and the parent tries to acquire the lock after the child's cancellation, ending up waiting on it until the test times out.
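
For readers unfamiliar with this failure mode, here is a minimal Python sketch of it. It is an analogy only, not the glibc test and not glibc's implementation. Python has no pthread cancellation, so an exception stands in for the cancellation request and a plain threading.Lock stands in for the stderr stream lock. The names Cancelled, run_child, parent_can_lock and demo are hypothetical. The sketch shows that if the cleanup step is skipped when the child is cancelled, the lock stays held after the child exits, and the parent's later attempt to take it can only time out.

```python
import threading


class Cancelled(Exception):
    """Hypothetical stand-in for a pthread cancellation request."""


def run_child(stream_lock, cleanup_runs):
    """Start a child that is 'cancelled' while it holds stream_lock.

    cleanup_runs=True models a clean cancellation (the cleanup handler
    releases the lock); cleanup_runs=False models the reported failure,
    where the lock is never released.
    """
    def child():
        stream_lock.acquire()          # analogous to locking stderr
        try:
            raise Cancelled()          # cancellation acted on while locked
        except Cancelled:
            if cleanup_runs:
                stream_lock.release()  # cleanup handler ran
            # else: handler skipped, lock stays held after the thread exits

    thread = threading.Thread(target=child)
    thread.start()
    thread.join()


def parent_can_lock(stream_lock, timeout=0.2):
    """Return True if the parent can take the lock within `timeout` seconds."""
    acquired = stream_lock.acquire(timeout=timeout)
    if acquired:
        stream_lock.release()
    return acquired


def demo(cleanup_runs):
    """Run one child/parent round and report whether the parent got the lock."""
    lock = threading.Lock()            # assumption: stands in for stderr's lock
    run_child(lock, cleanup_runs)
    return parent_can_lock(lock)


if __name__ == "__main__":
    print("clean cancellation, parent gets lock:", demo(True))
    print("unclean cancellation, parent gets lock:", demo(False))
```

In the real test the parent waits without a short timeout, which is why the symptom is a hang that lasts until the test harness gives up.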

Comment 1 Nick Clifton 2021-04-20 12:46:42 UTC
In theory annobin should have no effect on the execution of any binary to which it has been applied.  The plugin just creates a non-loadable note section and some extra symbols in the symbol table.  In practice those extra symbols can sometimes be problematic, and maybe this is the case in this particular scenario.

Without knowing more about why the lock is being held, it is hard to say any more.  But a possible place to look is any ARM-specific code in the thread library.  In particular, is there any code that scans the symbol table of ARM binaries, possibly looking for function symbols or the like?

Comment 2 Florian Weimer 2021-04-20 13:26:31 UTC
ARM EABI uses non-DWARF exception handling. Perhaps that's why it's disturbed by annobin data and the extra symbols?

Comment 3 Nick Clifton 2021-04-21 12:15:34 UTC
Hi Arjun,

  If it is the annobin symbols that are causing a problem, then you *might* be able to make the test work by stripping them out.  For example:

    objcopy --strip-unneeded a.out a.stripped

  Of course this might also break the ARM unwinder by removing symbols that it needs, so no guarantees that it won't make things worse...

Cheers
  Nick

Comment 4 Fedora Blocker Bugs Application 2021-05-19 08:02:17 UTC
Proposed as a Blocker for 35-beta by Fedora user pbrobinson using the blocker tracking app because:

 This is actually a mass rebuild blocker but we don't have the ability to add that so adding it here so it's tracked somewhere.

Comment 5 Nick Clifton 2021-05-20 12:27:15 UTC
This may be fixed by annobin-9.72-1.fc35.  Arjun - please can you check ?

Comment 6 Arjun Shankar 2021-05-20 14:04:41 UTC
(In reply to Nick Clifton from comment #5)
> This may be fixed by annobin-9.72-1.fc35.  Arjun - please can you check ?

Thanks, Nick! I'm on it.

Comment 7 Adam Williamson 2021-05-20 16:12:32 UTC
"This is actually a mass rebuild blocker but we don't have the ability to add that so adding it here so it's tracked somewhere."

That's what the prioritized bug tracker is for:
https://docs.fedoraproject.org/en-US/program_management/prioritized_bugs/

Comment 8 Nick Clifton 2021-05-21 08:22:33 UTC
Hi Arjun,

  Given your recent results, I think that there were actually two problems:

    1. The hang in pthread cancellation.  This I think was not caused
       by the annobin problem (below) but rather something else.  A
       recent commit to the glibc sources appears to have fixed the
       problem, even if annobin is used when compiling the sources.

    2. When a relocatable link is performed on ARM object files that
       have been annotated by the annobin plugin, the resulting 
       unwind information is corrupt.  I think that this has been 
       fixed in the annobin-9.72-1.fc35 build.

  Do you agree ?  If so, then I think that we can close this BZ.  If 1)
  is true but 2) is not, then it would be better to open a separate BZ
  for it.  But if 1) is false, then more investigation is needed,
  although I am not sure where.

Cheers
  Nick

Comment 9 Arjun Shankar 2021-05-24 15:06:52 UTC
Hi Nick!

So, I tested with "-Wl,--force-group-allocation" for libc_pic.os and
that seems to remove the hang. i.e.:

* Without the option but with annobin turned on: it hangs
* With the option and with annobin turned on: it does not hang

Note that this is at a glibc commit that was already hanging.

What we know now:

1. A hang started occurring at glibc commit "C1" (say).

2. Any *one* of three events appears to remove the hang:
 * turning off annobin
 * building libc_pic.os with --force-group-allocation
 * fast-forwarding glibc to commit "C2"

Does this pinpoint any more about where bug #1 might lie?
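
As a compact restatement of the observations above, here is a small Python sketch that encodes them as a predicate and prints the resulting table. It is a summary aid only. "C1" and "C2" are the placeholder commit labels used in this comment, and hang_expected and observation_table are hypothetical names. Only the combinations actually reported here were tested; treating the three factors as independent, so that the predicate also covers untested combinations, is an assumption.

```python
from itertools import product


def hang_expected(annobin, force_group_allocation, commit):
    """Predict a hang from the three factors reported in this comment.

    Assumption: the hang needs all three conditions at once, because
    changing any single one of them was seen to remove it.
    """
    return annobin and not force_group_allocation and commit == "C1"


def observation_table():
    """Enumerate every combination of the three factors with its prediction."""
    rows = []
    for annobin, fga, commit in product((True, False), (True, False), ("C1", "C2")):
        rows.append((annobin, fga, commit, hang_expected(annobin, fga, commit)))
    return rows


if __name__ == "__main__":
    print("annobin  force-group-allocation  commit  hang")
    for annobin, fga, commit, hang in observation_table():
        print(f"{annobin!s:<8} {fga!s:<23} {commit:<7} {hang}")
```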

Comment 10 Nick Clifton 2021-05-28 10:33:41 UTC
Hi Arjun,

> Does this pinpoint any more about where bug #1 might lie?

  Yes - I think that it is safe to say that there is a latent problem with ARM unwind information and annobin-annotated code.  Commit C1 exposed this problem (which presumably has existed for a long time, but is only now coming to light) and commit C2 has hidden it again.

  I had really hoped that annobin-9.73 would fix this problem, as it contains ARM-specific code to disable the generation of section groups.  (I believe annobin's use of section groups to be the underlying cause of the problem).

  So back to the drawing board for me I guess.

Cheers
  Nick

Comment 11 Ben Cotton 2021-06-16 15:28:40 UTC
In today's Prioritized Bugs meeting[1], we accepted this as a Prioritized Bug.

[1] https://meetbot.fedoraproject.org/fedora-meeting-1/2021-06-16/fedora_prioritized_bugs_and_issues.2021-06-16-15.01.log.html#l-46

Comment 12 Ben Cotton 2021-07-09 15:32:30 UTC
If anyone has additional input or can do additional testing, please comment.

Comment 13 Ben Cotton 2021-08-10 12:59:22 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 35 development cycle.
Changing version to 35.

Comment 14 Ben Cotton 2021-08-11 15:38:57 UTC
In today's Prioritized Bugs meeting, we agreed that this bug is no longer a prioritized bug as the mass rebuild seems to have completed successfully without a fix.

https://meetbot.fedoraproject.org/fedora-meeting-1/2021-08-11/fedora_prioritized_bugs_and_issues.2021-08-11-15.00.html

Comment 15 Ben Cotton 2022-11-29 16:55:34 UTC
This message is a reminder that Fedora Linux 35 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '35'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 35 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 16 Ben Cotton 2022-12-13 15:21:23 UTC
Fedora Linux 35 entered end-of-life (EOL) status on 2022-12-13.

Fedora Linux 35 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.