Bug 1862110

Summary: F33 has PAC/BTI enabled for rawhide, but this causes binutils ld to produce a broken PLT
Product: [Fedora] Fedora Reporter: Mark Wielaard <mjw>
Component: binutils Assignee: Nick Clifton <nickc>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 33 CC: aoliva, dvlasenk, extras-qa, fche, fweimer, jakub, jeremy.linton, jlinton, law, me, mjw, nickc, pbrobinson, sipoyare
Target Milestone: ---   
Target Release: ---   
Hardware: aarch64   
OS: Unspecified   
Whiteboard:
Fixed In Version: binutils-2.35-7.fc33 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1861423 Environment:
Last Closed: 2021-11-10 18:30:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1861423    
Bug Blocks: 245418, 1847148    

Description Mark Wielaard 2020-07-30 13:33:50 UTC
This clone of the bug is against binutils, and is for:

Part 2) rpm debugedit (which uses elfutils libelf) failing to update a file because of "invalid section entry size".

+++ This bug was initially created as a clone of Bug #1861423 +++

Description of problem: When built with -mbranch-protection=standard, elfutils has a unit test failure on aarch64. It also appears to be causing debuginfo extraction problems in other packages.

When built on aarch64:

============================================================================
Testsuite summary for elfutils 0.180
============================================================================
# TOTAL: 219
# PASS:  213
# SKIP:  5
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See tests/test-suite.log
Please report to https://sourceware.org/bugzilla
============================================================================
FAIL: run-backtrace-native-core.sh
==================================

/usr/bin/coredumpctl
           PID: 7477 (backtrace-child)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Tue 2020-07-28 11:01:25 EDT (2s ago)
  Command Line: /root/t/elfutils/elfutils-0.180/tests/backtrace-child --gencore
    Executable: /root/t/elfutils/elfutils-0.180/tests/backtrace-child
 Control Group: /user.slice/user-0.slice/session-3.scope
          Unit: session-3.scope
         Slice: user-0.slice
       Session: 3
     Owner UID: 0 (root)
       Boot ID: e42abccd30874f80a5904ce3a8e2c9f1
    Machine ID: e4e16166188344d5acacabe5d9d3dd3c
      Hostname: localhost.localdomain
       Storage: /var/lib/systemd/coredump/core.backtrace-child.0.e42abccd30874f80a5904ce3a8e2c9f1.7477.1595948485000000000000.zst
       Message: Process 7477 (backtrace-child) of user 0 dumped core.
                
                Stack trace of thread 7482:
                #0  0x0000ffffa733aaf8 raise (libpthread.so.0 + 0x13af8)
                #1  0x0000aaaaafa2de4c n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xe4c)
                #2  0x0000aaaaafa2de4c n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xe4c)
                #3  0x0000aaaaafa2df2c n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xf2c)
                #4  0x0000aaaaafa2df44 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xf44)
                #5  0x0000aaaaafa2df54 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xf54)
                #6  0x0000ffffa732ef74 start_thread (libpthread.so.0 + 0x7f74)
                
                Stack trace of thread 7477:
                #0  0x0000ffffa73303c0 __pthread_clockjoin_ex (libpthread.so.0 + 0x93c0)
                #1  0x0000aaaaafa2dc34 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xc34)
                #2  0x0000aaaaafa2dc34 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xc34)
                #3  0x0000ffffa71c5878 __libc_start_main (libc.so.6 + 0x24878)
backtrace: backtrace.c:144: callback_verify: Assertion `symname != NULL && strcmp (symname, "backtracegen") == 0' failed.
./test-subr.sh: line 84:  8904 Aborted                 (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" $VALGRIND_CMD "$@"
backtrace-child-core.7477: no main
rmdir: failed to remove 'test-7404': Directory not empty
FAIL run-backtrace-native-core.sh (exit status: 1)


Version-Release number of selected component (if applicable):
0.180

How reproducible: at the moment 100% 


Steps to Reproduce:
1. Install rawhide/F33 with gcc 10.2.1 and recent binutils
2. Build elfutils on that machine with `fedpkg local`


Actual results:
As seen above

backtrace: backtrace.c:144: callback_verify: Assertion `symname != NULL && strcmp (symname, "backtracegen") == 0' failed.


(glibc failure caused by elfutils)

++ /usr/lib/rpm/debugedit -b /root/t/glibc -d /usr/src/debug -i -l ./debugsources.list /root/rpmbuild/BUILDROOT/glibc-2.31.9000-21.fc33.aarch64/usr/bin/gencat
Failed to update file: invalid section entry size


Expected results:

Additional info:

--- Additional comment from Jeremy Linton on 2020-07-28 16:43:24 UTC ---



--- Additional comment from Jeremy Linton on 2020-07-28 16:45:19 UTC ---



--- Additional comment from Mark Wielaard on 2020-07-28 21:38:44 UTC ---

So this is really 2 bugs.

1) elfutils backtrace failing when building with -mbranch-protection=standard

2) rpm debugedit (which uses elfutils libelf) failing to update a file because of "invalid section entry size".

I can replicate 1) by building upstream elfutils with CFLAGS="-g -O2 -mbranch-protection=standard" CXXFLAGS="$CFLAGS"
In that case both run-backtrace-native.sh and run-backtrace-native-core.sh fail. They succeed without -mbranch-protection=standard.

Issue 2) can be shown with the gencat ELF file attachment:
# eu-elflint --gnu ./gencat 
section [14] '.plt': size not multiple of entry size
section [23] '.dynamic': entry 22: unknown tag

And indeed, the .plt section is bad:
[14] .plt                 PROGBITS     0000000000401140 00001140 00000410 24 AX     0   0 16

0x410 = 1040, which is not divisible by the entry size 24
(it looks like there are 43 entries and then 8 extra bytes)

I'll try to figure out issue 1. Issue 2, however, must originate elsewhere, probably in binutils ld, which generated the .plt section.

--- Additional comment from Mark Wielaard on 2020-07-28 21:45:33 UTC ---

> section [23] '.dynamic': entry 22: unknown tag

BTW, this is:   <unknown>: 0x70000001 000000000000000000
If someone knows what d_tag type 0x70000001 (DT_LOPROC + 1) is, that would be appreciated.
It isn't listed in glibc /usr/include/elf.h (which is what elfutils uses).
The only entry for aarch64 is #define DT_AARCH64_VARIANT_PCS  (DT_LOPROC + 5)

--- Additional comment from Mark Wielaard on 2020-07-29 10:16:07 UTC ---

Note that this does NOT seem to impact the mass rebuild going on.
As far as I can see, builds on aarch64 are fine; elfutils itself got rebuilt without showing any failures:
https://kojipkgs.fedoraproject.org//packages/elfutils/0.180/6.fc33/data/logs/aarch64/build.log

It does look like it is using -mbranch-protection=standard
But I also see SKIP: run-backtrace-native-core.sh which means no core file was generated on the koji builder.

Same for glibc, I don't see any debugedit failures in the aarch64 build.log:
https://kojipkgs.fedoraproject.org//work/tasks/5655/47975655/build.log

--- Additional comment from Florian Weimer on 2020-07-29 10:28:08 UTC ---

This issue may also trigger during an aarch64 rebuild of glibc if PAC+BTI is enabled:

extracting debug info from /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/lib64/libutil-2.31.9000.so
explicitly decompress any DWARF compressed ELF sections in /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/sbin/ldconfig
extracting debug info from /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/sbin/ldconfig
explicitly decompress any DWARF compressed ELF sections in /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/usr/bin/gencat
extracting debug info from /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/usr/bin/gencat
Failed to update file: invalid section entry size
error: Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)
    Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)

My guess: We do not see it more widely because glibc in the buildroot is built without PAC+BTI. The link editor does not produce the problematic output as a result, masking any elfutils problems that may exist.

--- Additional comment from Jakub Jelinek on 2020-07-29 10:37:17 UTC ---

/* Processor specific dynamic array tags.  */
#define DT_AARCH64_BTI_PLT      (DT_LOPROC + 1)
#define DT_AARCH64_PAC_PLT      (DT_LOPROC + 3)
#define DT_AARCH64_VARIANT_PCS  (DT_LOPROC + 5)
is what binutils sources have.

--- Additional comment from Mark Wielaard on 2020-07-29 10:44:43 UTC ---

(In reply to Jakub Jelinek from comment #7)
> /* Processor specific dynamic array tags.  */
> #define DT_AARCH64_BTI_PLT      (DT_LOPROC + 1)
> #define DT_AARCH64_PAC_PLT      (DT_LOPROC + 3)
> #define DT_AARCH64_VARIANT_PCS  (DT_LOPROC + 5)
> is what binutils sources have.

Ah, great, so this does seem to confirm that something is up with the .plt section.
Is there any documentation on what it means to have those tags in the dynamic array?

I looked at the change request at https://fedoraproject.org/wiki/Changes/Aarch64_PointerAuthentication
and asked around, but nobody seems to know anything about any ELF, DWARF or gABI changes.
But I guess there must be some, given the issues with the dynamic tags, the .plt section, and the fact that unwinding seems broken.

Can we merge them into glibc elf.h to expose them to other tools?

--- Additional comment from Mark Wielaard on 2020-07-29 10:58:08 UTC ---

(In reply to Florian Weimer from comment #6)
> This issue may also trigger during an aarch64 rebuild of glibc if PAC+BTI is
> enabled:
> 
> extracting debug info from
> /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/usr/bin/gencat
> Failed to update file: invalid section entry size
> error: Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)
>     Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)

This issue is analyzed a bit in comment #3.
You can also see this running eu-elflint on gencat:
section [14] '.plt': size not multiple of entry size

Given some of the other observations, might it be that the linker somehow creates .plt entries of different sizes when creating gencat?
That would cause sh_size % sh_entsize != 0 which makes debugedit/libelf throw an error when it encounters such an .plt section.

--- Additional comment from Mark Wielaard on 2020-07-29 11:13:46 UTC ---

GDB does seem able to unwind through the core file, but eu-stack doesn't:

# gdb --core tests/test-187673/core.187694 tests/backtrace-child

(gdb) thread apply all bt

Thread 2 (Thread 0xffff9777e010 (LWP 187694)):
#0  0x0000ffff97726610 in __pthread_clockjoin_ex () from /lib64/libpthread.so.0
#1  0x0000aaaad1523b3c in main (argc=<optimized out>, argv=<optimized out>) at backtrace-child.c:241

Thread 1 (Thread 0xffff975a6110 (LWP 187695)):
#0  0x0000ffff97730d48 in raise () from /lib64/libpthread.so.0
#1  0x0000aaaad1523d4c in sigusr2 (signo=<optimized out>) at backtrace-child.c:132
#2  0x0000aaaad1523e2c in stdarg (f=<optimized out>) at backtrace-child.c:176
#3  0x0000aaaad1523e44 in backtracegen () at backtrace-child.c:190
#4  0x0000aaaad1523e54 in start (arg=<optimized out>) at backtrace-child.c:205
#5  0x0000ffff97725294 in start_thread () from /lib64/libpthread.so.0
#6  0x0000ffff9767d27c in thread_start () from /lib64/libc.so.6

# eu-stack -v --core tests/test-187673/core.187694 --exec tests/backtrace-child
PID 187694 - core
TID 187695:
#0  0x0000ffff97730d48     raise - libpthread.so.0
#1  0x0000aaaad1523d4c - 1 sigusr2 - backtrace-child
    /root/elfutils/tests/backtrace-child.c:132:3
#2  0x0000aaaad1523e2c - 1 stdarg - backtrace-child
    /root/elfutils/tests/backtrace-child.c:176:3
#3  0x0000ffff9774c000 - 1 - libpthread.so.0
eu-stack: dwfl_thread_getframes tid 187695 at 0xffff9774bfff in libpthread.so.0: No DWARF information found
TID 187694:
#0  0x0000ffff97726610     __pthread_clockjoin_ex - libpthread.so.0
#1  0x0000aaaad1523b3c - 1 main - backtrace-child
    /root/elfutils/tests/backtrace-child.c:241:5
#2  0x0000ffff975cb838 - 1 __libc_start_main - libc.so.6
#3  0xf00000f4a90153f3 - 1
#4  0xf00000f4a90153f3 - 1
eu-stack: dwfl_thread_getframes tid 187694 at 0xf00000f4a90153f2 in <unknown>: No DWARF information found

--- Additional comment from Mark Wielaard on 2020-07-29 12:01:49 UTC ---

Note that most backtraces actually work, unless they go through a signal frame.
Is there anything about PAC that changes how one unwinds through a signal frame?

--- Additional comment from Florian Weimer on 2020-07-29 12:04:01 UTC ---

Regarding the gencat problem, the PLT0 entry for gencat has a different size than the other PLT entries:

Disassembly of section .plt:

0000000000401140 <.plt>:
  401140:       d503245f        bti     c
  401144:       a9bf7bf0        stp     x16, x30, [sp, #-16]!
  401148:       d00000f0        adrp    x16, 41f000 <__FRAME_END__+0x1abd4>
  40114c:       f9474a11        ldr     x17, [x16, #3728]
  401150:       913a4210        add     x16, x16, #0xe90
  401154:       d61f0220        br      x17
  401158:       d503201f        nop
  40115c:       d503201f        nop

0000000000401160 <memcpy@plt>:
  401160:       d503245f        bti     c
  401164:       d00000f0        adrp    x16, 41f000 <__FRAME_END__+0x1abd4>
  401168:       f9474e11        ldr     x17, [x16, #3736]
  40116c:       913a6210        add     x16, x16, #0xe98
  401170:       d61f0220        br      x17
  401174:       d503201f        nop

0000000000401178 <strlen@plt>:
  401178:       d503245f        bti     c
  40117c:       d00000f0        adrp    x16, 41f000 <__FRAME_END__+0x1abd4>
  401180:       f9475211        ldr     x17, [x16, #3744]
  401184:       913a8210        add     x16, x16, #0xea0
  401188:       d61f0220        br      x17
  40118c:       d503201f        nop

I don't think that's valid ELF. Another oddity is that the binary has just an AARCH64_BTI_PLT entry:

Dynamic section at offset 0xfc60 contains 29 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-aarch64.so.1]
 0x000000000000000c (INIT)               0x401120
 0x000000000000000d (FINI)               0x403868
 0x0000000000000019 (INIT_ARRAY)         0x41fc40
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x41fc48
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x0000000000000004 (HASH)               0x400330
 0x000000006ffffef5 (GNU_HASH)           0x400498
 0x0000000000000005 (STRTAB)             0x400990
 0x0000000000000006 (SYMTAB)             0x4004e0
 0x000000000000000a (STRSZ)              575 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x41fe80
 0x0000000000000002 (PLTRELSZ)           1008 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x400d30
 0x0000000000000007 (RELA)               0x400c88
 0x0000000000000008 (RELASZ)             168 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x0000000070000001 (AARCH64_BTI_PLT)    
 0x0000000000000018 (BIND_NOW)           
 0x000000006ffffffb (FLAGS_1)            Flags: NOW
 0x000000006ffffffe (VERNEED)            0x400c38
 0x000000006fffffff (VERNEEDNUM)         2
 0x000000006ffffff0 (VERSYM)             0x400bd0
 0x0000000000000000 (NULL)               0x0

But it enables both BTI *and* PAC:

Displaying notes found in: .note.gnu.property
  Owner                Data size        Description
  GNU                  0x00000010       NT_GNU_PROPERTY_TYPE_0
      Properties: AArch64 feature: BTI, PAC

Maybe ld got confused in some way? I'm going to file a binutils bug once I have a few more details.

--- Additional comment from Jeremy Linton (ARM) on 2020-07-29 15:11:35 UTC ---

So, the AArch64 ELF document https://developer.arm.com/documentation/ihi0056/g/ describes the ELF-related changes.


In reference to comment #11, I remember there was a tweak to general exception handling that affected libc (that patch landed a year or so ago, IIRC), but I need to dig up the details.

Comment 1 Mark Wielaard 2020-07-30 13:34:34 UTC
The upstream bug is https://sourceware.org/bugzilla/show_bug.cgi?id=26312 which has a proposed patch at https://sourceware.org/pipermail/binutils/2020-July/112643.html

Comment 2 Nick Clifton 2020-08-10 11:24:20 UTC
Fixed in: binutils-2.35-7.fc33

Comment 3 Ben Cotton 2020-08-11 13:50:56 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle.
Changing version to 33.

Comment 4 Ben Cotton 2021-11-04 17:29:58 UTC
This message is a reminder that Fedora 33 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 33 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 5 Jeremy Linton 2021-11-10 18:30:07 UTC
This is AFAIK working now; the elfutils test in question is passing. The only failures are in debuginfod-find, which fails in the same way on x86 with a local build.

Comment 6 Frank Ch. Eigler 2021-11-10 23:09:42 UTC
(In reply to Jeremy Linton from comment #5)
> This is AFAIK working now, the elfutils test in question is passing. The
> only failures are in debuginfod-find which fails in the same way on x86 with
> a local build.

Ears perking up: what are you observing with debuginfod-find?

Comment 7 Mark Wielaard 2021-11-15 10:26:16 UTC
(In reply to Jeremy Linton from comment #5)
> This is AFAIK working now, the elfutils test in question is passing. The
> only failures are in debuginfod-find which fails in the same way on x86 with
> a local build.

Yes, https://sourceware.org/bugzilla/show_bug.cgi?id=26312 has been resolved.
But debuginfod-find shouldn't fail (on any arch). What failure are you seeing exactly?

Comment 8 Jeremy Linton 2022-04-26 19:56:22 UTC
Well, whatever it was, it's gone now:

============================================================================
Testsuite summary for elfutils 0.187
============================================================================
# TOTAL: 257
# PASS:  253
# SKIP:  4
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0