
Bug 1108925

Summary: libcap-ng: issues with python testsuite
Product: [Fedora] Fedora Reporter: Peter Robinson <pbrobinson>
Component: gcc Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhide CC: blc, codonell, jakub, kmcmartin, law, mjuszkie, pbrobinson, pfrankli, spoyarek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-08-01 13:34:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 922257    
Attachments:
Description Flags
the build logs none

Description Peter Robinson 2014-06-12 21:04:08 UTC
Created attachment 908307
the build logs

We're seeing libcap-ng fail to build against glibc-2.19.90-17.fc21 or any later release.

With glibc-2.19.90-16.fc21 and earlier it builds fine.

http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2405574

Comment 1 Marcin Juszkiewicz 2014-06-12 21:16:16 UTC
Note: those checks were made with up-to-date rawhide with just glibc packages exchanged to older versions.

Comment 2 Siddhesh Poyarekar 2014-06-12 21:24:04 UTC
Could you please isolate what failed so that we know exactly what's wrong? There seem to be a bunch of tests in a single file.

In other words, a reproducer would be helpful.

Comment 3 Marcin Juszkiewicz 2014-06-12 21:35:42 UTC
Siddhesh: cd bindings/python/test/; make check (or make test)

It will run a Python script which has several checks; when one of them fails, it exits. With glibc 2.19.90-16 it passes. With newer versions it fails.
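Since the script stops at the first failing check, one way to isolate the failure is to run each check independently and report them all. The following is only a minimal sketch of that idea: `run_checks`, `check_passes` and `check_fails` are hypothetical names, and the two checks are stand-ins, not the real checks from the libcap-ng test script.

```python
from typing import Callable, Dict, List, Tuple


def run_checks(checks: Dict[str, Callable[[], None]]) -> List[Tuple[str, bool, str]]:
    """Run every check and record pass/fail instead of exiting on the first failure."""
    results = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # a failing check raises; keep going
            results.append((name, False, f"{type(exc).__name__}: {exc}"))
        else:
            results.append((name, True, ""))
    return results


# Hypothetical stand-ins for the real checks in the Python test script.
def check_passes() -> None:
    pass  # a check that succeeds simply returns


def check_fails() -> None:
    raise RuntimeError("capability state did not round-trip")


if __name__ == "__main__":
    for name, ok, detail in run_checks({"passes": check_passes, "fails": check_fails}):
        print(f"{'PASS' if ok else 'FAIL'} {name} {detail}".rstrip())
```

Wrapping the real checks this way would show whether only one check fails with the newer glibc or whether several do.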

Comment 4 Siddhesh Poyarekar 2014-06-13 09:06:42 UTC
It looks like Kyle's patch is causing it.  Kyle, can you please look at it?

Comment 5 Kyle McMartin 2014-06-13 13:40:45 UTC
Yes. Although, given it's a one-line patch that fixes dlopen-ing anything with TLS, I'm inclined to say we should just declare TLSDESC-by-default a failed experiment and flip to traditional TLS in gcc. Sigh. (Given that, until my patch, we always used a different code path for TLS descriptors until static TLS space was exhausted, I'm not surprised there is ugliness lurking.)

Comment 6 Kyle McMartin 2014-06-13 13:54:14 UTC
Hmm, hang on. It works fine built against an older glibc which contains that patch. I suspect that might be a red herring. Possibly new gcc skew?

Comment 7 Kyle McMartin 2014-06-13 13:55:06 UTC
Nope, 4.9.0-5 in both. Awesome.

Comment 8 Kyle McMartin 2014-06-13 17:20:17 UTC
Reverting it and rebuilding glibc does indeed seem to fix it. Wonderful. So there's a subtle bug in the _dl_tlsdesc_dynamic code somewhere...

Comment 9 Kyle McMartin 2014-06-17 17:01:55 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=7052250

Well, the failures in the Python test suite are not limited to aarch64. I see them on i686 and x86_64 as well.

Better still, an earlier test fails when you run it in mock --shell in the same way on aarch64 and x86_64.

Better-better still, this means it has nothing to do with TLS descriptors, and may be a generic TLS code generation bug in GCC, since x86_64 does not use them to access dynamic TLS symbols.

Comment 10 Kyle McMartin 2014-06-17 18:34:49 UTC
Building with CFLAGS="-O1" results in a working src/.libs/libcap-ng.so.0 so this looks like an optimization bug on AArch64... my current theory is that the reason the glibc version matters is that in the older glibc, we'd get a static TLS slot, which means we'd take the fast return path and probably avoid clobbering the register.

The x86_64 thing seems like a red herring at the moment... I'm working on reducing this to the appropriate gcc optimizer flag.

Comment 11 Kyle McMartin 2014-06-17 19:54:05 UTC
OK, building with -O2 -fno-schedule-insns -fno-schedule-insns2 appears to get things working again.
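Reducing a working flag set to the flags that actually matter can be sketched as a greedy loop: drop one flag at a time and keep the removal only if the build still works. This is a rough illustration under stated assumptions, not the procedure used in this bug. `minimal_flags` and `fake_build_works` are hypothetical names, the extra candidate flags are arbitrary, and the stand-in predicate only encodes what is reported above (the build works when both scheduling flags are present). A real predicate would rebuild libcap-ng with the given CFLAGS and run the Python tests.

```python
from typing import Callable, List


def minimal_flags(candidates: List[str], works: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop flags, keeping a removal only if the build still works."""
    if not works(candidates):
        raise ValueError("the full flag set does not produce a working build")
    needed = list(candidates)
    for flag in candidates:
        trial = [f for f in needed if f != flag]
        if works(trial):
            needed = trial
    return needed


def fake_build_works(flags: List[str]) -> bool:
    # Hypothetical stand-in for "rebuild with these flags and run the test suite".
    # It only mirrors the reported observation that both flags together work.
    return "-fno-schedule-insns" in flags and "-fno-schedule-insns2" in flags


if __name__ == "__main__":
    candidates = [
        "-fno-inline",            # arbitrary extra candidate
        "-fno-schedule-insns",
        "-fno-tree-vectorize",    # arbitrary extra candidate
        "-fno-schedule-insns2",
    ]
    print(minimal_flags(candidates, fake_build_works))
```

The greedy pass needs one rebuild per candidate flag; it finds a set from which no single flag can be dropped, which is not necessarily the smallest possible set.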