Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1869884 - chromium/ffmpeg fails to build on aarch64 on Fedora 33+
Summary: chromium/ffmpeg fails to build on aarch64 on Fedora 33+
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: binutils
Version: 33
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Nick Clifton
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARMTracker
TreeView+ depends on / blocked
 
Reported: 2020-08-18 22:08 UTC by Tom "spot" Callaway
Modified: 2021-02-09 15:45 UTC (History)
13 users (show)

Fixed In Version: binutils-2.35-13.fc34
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-09 15:45:33 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
preprocessed source (gcc -E) (deleted)
2020-08-19 18:03 UTC, Tom "spot" Callaway
no flags Details
rsp file (deleted)
2020-08-19 18:12 UTC, Tom "spot" Callaway
no flags Details
binary object, no patch (deleted)
2020-08-24 15:26 UTC, Tom "spot" Callaway
no flags Details
binary object, with patch (deleted)
2020-08-24 15:27 UTC, Tom "spot" Callaway
no flags Details
preprocessed source (gcc -E), no patch (deleted)
2020-08-24 15:27 UTC, Tom "spot" Callaway
no flags Details
preprocessed source (gcc -E), patch applied (deleted)
2020-08-24 15:28 UTC, Tom "spot" Callaway
no flags Details
libclearkeycdm.so linked with old binutils and no patch applied to videodsp.S (deleted)
2020-08-25 16:08 UTC, Tom "spot" Callaway
no flags Details
videodsp, no patch, old (f32) binutils (deleted)
2020-08-25 16:10 UTC, Tom "spot" Callaway
no flags Details

Description Tom "spot" Callaway 2020-08-18 22:08:26 UTC
Description of problem:

chromium (or more specifically, the ffmpeg part of chromium) is failing on aarch64 on F33+ (everywhere else it is fine).


Version-Release number of selected component (if applicable):

84.0.4147.125-2

How reproducible:

Every aarch64 build fails.

Actual results:

g++ -shared -Wl,--fatal-warnings -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -Wl,-O2 -Wl,--gc-sections -rdynamic -o "./libclearkeycdm.so" -Wl,-soname="libclearkeycdm.so" @"./libclearkeycdm.so.rsp"
obj/third_party/ffmpeg/ffmpeg_internal/videodsp.o: in function `ff_prefetch_aarch64':
(.text+0x10): relocation truncated to fit: R_AARCH64_CONDBR19 against symbol `ff_prefetch_aarch64' defined in .text section in obj/third_party/ffmpeg/ffmpeg_internal/videodsp.o
collect2: error: ld returned 1 exit status

Expected results:

No errors.

Additional info:

third_party/ffmpeg/libavcodec/aarch64/videodsp.S

#include "libavutil/aarch64/asm.S"

function ff_prefetch_aarch64, export=1
        subs            w2,  w2,  #2
        prfm            pldl1strm, [x0]
        prfm            pldl1strm, [x0,  x1]
        add             x0,  x0,  x1,  lsl #1
        b.gt            X(ff_prefetch_aarch64)
        ret
endfunc

Comment 1 Tom "spot" Callaway 2020-08-19 18:03:54 UTC
Created attachment 1711927 [details]
preprocessed source (gcc -E)

preprocessed source

Comment 2 Tom "spot" Callaway 2020-08-19 18:11:14 UTC
One note I just realized: this error is being thrown at link time, but only for the clear_cdm_key target. The code is compiled for the headless_shell and chrome targets as well, and it links successfully into those binaries. The clear_cdm_key target is built from the chrome buildenv, so it inherits the chrome build of videodev.o, but it cannot link into the libclearkey.so file.

I forced it to rebuild the ffmpeg bits again and it did not change the outcome.

See:

[6/6] g++ -shared -Wl,--fatal-warnings -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -Wl,-O2 -Wl,--gc-sections -rdynamic -o "./libclearkeycdm.so" -Wl,-soname="libclearkeycdm.so" @"./libclearkeycdm.so.rsp"
FAILED: libclearkeycdm.so 
g++ -shared -Wl,--fatal-warnings -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -Wl,-O2 -Wl,--gc-sections -rdynamic -o "./libclearkeycdm.so" -Wl,-soname="libclearkeycdm.so" @"./libclearkeycdm.so.rsp"
obj/third_party/ffmpeg/ffmpeg_internal/videodsp.o: in function `ff_prefetch_aarch64':
(.text+0x10): relocation truncated to fit: R_AARCH64_CONDBR19 against symbol `ff_prefetch_aarch64' defined in .text section in obj/third_party/ffmpeg/ffmpeg_internal/videodsp.o
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

Comment 3 Tom "spot" Callaway 2020-08-19 18:12:54 UTC
Created attachment 1711929 [details]
rsp file

Comment 4 Tom "spot" Callaway 2020-08-19 18:14:02 UTC
I have an F33 aarch64 instance at the ready to test or provide more debugging info.

Comment 5 Tom "spot" Callaway 2020-08-19 18:36:01 UTC
When I downgraded binutils to 2.34-4.fc32, the issue disappears. Reassigning to binutils.

Comment 6 Tom "spot" Callaway 2020-08-19 19:58:57 UTC
Additional data point: forcing gold at link time works around this error:

g++ -shared -Wl,--fatal-warnings -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -Wl,-O2 -Wl,--gc-sections -rdynamic -o "./libclearkeycdm.so" -Wl,-soname="libclearkeycdm.so" @"./libclearkeycdm.so.rsp" -fuse-ld=gold

This isn't really practical to do for chromium, because gold is not reliable for the other parts. Hopefully it does help narrow where the bug is though. Please let me know what more debugging I can generate for you.

Comment 7 Nick Clifton 2020-08-21 09:02:28 UTC
Hi Tom,

  It sounds to me like this might due to how the linker is assigning sections to the address space.
  (ld.bfd and ld.gold have different heuristics for this).  Reproducing the problem is going to be
  a pain without all of the object files specified in the .rsp file, and all of the system libraries
  too.

  As a possible workaround, if you change the code in ff_prefetch_aarch64 to:

        add       x0,  x0,  x1,  lsl #1
        b.le      1f
        b         X(ff_prefetch_aarch64)
     1:
        ret

  Would that suffice ?

Cheers
  Nick

Comment 8 Tom "spot" Callaway 2020-08-21 15:53:22 UTC
That workaround resolves the issue. I still have this environment spun up on EC2, so if you need something to figure out what the proper binutils fix is, let me know.

Comment 9 Nick Clifton 2020-08-22 15:27:09 UTC
Hi Tom,

  I have been talking to some of the engineers at ARM but there is a problem with the
  example code:

    function ff_prefetch_aarch64, export=1
    [...]
    b.gt X(ff_prefetch_aarch64)

  Is this really a branch back to the start of the ff_prefetch_aarch64 function ?
  It seems likely that the X() macro mangles the name somehow, but this is not
  clear from the text.

  Are you able to upload a copy of the object file containing this assembled code ?
  That would help with the analysis.

  Also - does adding "-Wl,--stub-group-size=-1" or "-Wl,--stub-group-size=+1" to
  the final g++ command line make the problem go away ?

Cheers
  Nick

Comment 10 Jakub Jelinek 2020-08-24 07:07:28 UTC
Nick, I think you want the preprocessed assembly (i.e. videosp.s) instead, there are quite complicated macros and perhaps something changed on the binutils as side on how they are handled.
See https://fossies.org/linux/ffmpeg/libavutil/aarch64/asm.S ?

Comment 11 Nick Clifton 2020-08-24 10:02:21 UTC
(In reply to Jakub Jelinek from comment #10)
> Nick, I think you want the preprocessed assembly (i.e. videosp.s) instead,

Actually both would be good.  The point about seeing the object file is that I can examine the relocations that have been created.  This will tell me the real name of the function that is the destination of the branch, plus also the type of the relocation.  The AArch64 ABI specifies that the R_AARCH64_CONDBR19 relocation should not be used if the branch could be out of range.  So I need to know if this is an assembler problem - the assembler is not flagging up potentially bad code - or a linker problem - the linker is trying to relax the branch but generating the wrong relocation.

Of course I can assemble the .s file, but there is the possibility that I might not be using the same set of command line options as Tom's build.  So an already assembled file is best.

Comment 12 Jakub Jelinek 2020-08-24 10:06:26 UTC
True.  And with the workaround you've provided, it might be also interesting to see how it got linked (i.e. what the unconditional branch branched around by the negated conditional branch branches too, how far exactly it is).

Comment 13 Tom "spot" Callaway 2020-08-24 15:26:16 UTC
Okay. 

1. Neither -Wl,--stub-group-size=-1 nor -Wl,--stub-group-size=+1 had any impact on the issue, it still throws the error at link of libclearkeycdm.so.
2. I will attach a copy of the generated videodsp.o without the workaround patch as videodsp-no-patch.o
3. I generated preprocessed source without the workaround patch like this:
gcc -MMD -MF obj/third_party/ffmpeg/ffmpeg_internal/videodsp.o.d -DHAVE_AV_CONFIG_H -D_POSIX_C_SOURCE=200112 -D_XOPEN_SOURCE=600 -DPIC -DFFMPEG_CONFIGURATION=NULL -D_ISOC99_SOURCE -D_LARGEFILE_SOURCE -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_NSS_CERTS=1 -DUSE_X11=1 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -DOPUS_FIXED_POINT -I../../third_party/ffmpeg/chromium/config/Chromium/linux/arm64 -I../../third_party/ffmpeg -I../../third_party/ffmpeg/compat/atomics/gcc -I../.. -Igen -I../../third_party/opus/src/include -DHAVE_VFP_ARGS=1 -fPIC -fno-strict-aliasing --param=ssp-buffer-size=4 -fstack-protector -funwind-tables -fPIC -pipe -pthread -std=gnu11 -g0 -c ../../third_party/ffmpeg/libavcodec/aarch64/videodsp.S -E &> videodsp-preprocessed-E-no-patch.output
4. I then applied the patch and compiled again, this copy of videodsp.o will be attached as videodsp-patched.o
5. Same for the preprocessed source (videodsp-preprocessed-E-patched.output).

Comment 14 Tom "spot" Callaway 2020-08-24 15:26:59 UTC
Created attachment 1712386 [details]
binary object, no patch

Comment 15 Tom "spot" Callaway 2020-08-24 15:27:20 UTC
Created attachment 1712387 [details]
binary object, with patch

Comment 16 Tom "spot" Callaway 2020-08-24 15:27:45 UTC
Created attachment 1712388 [details]
preprocessed source (gcc -E), no patch

Comment 17 Tom "spot" Callaway 2020-08-24 15:28:06 UTC
Created attachment 1712389 [details]
preprocessed source (gcc -E), patch applied

Comment 18 Nick Clifton 2020-08-25 10:40:31 UTC
Hi Tom,

  Thanks for the uploads.

  So, it does look like the code is looping back on itself.  The disassembled
  output for the no-patch object file looks like this:

  % objdump -dr videodsp-no-patch.o
  0000000000000000 <ff_prefetch_aarch64>:
     0:	71000842 	subs	w2, w2, #0x2
     4:	f9800001 	prfm	pldl1strm, [x0]
     8:	f8a16801 	prfm	pldl1strm, [x0, x1]
     c:	8b010400 	add	x0, x0, x1, lsl #1
    10:	5400000c 	b.gt	0 <ff_prefetch_aarch64>
			10: R_AARCH64_CONDBR19	ff_prefetch_aarch64
    14:	d65f03c0 	ret

  The problem is that the ff_prefetch_aarch64 symbol is global, so the linker
  changes the branch to go to the entry in the PLT table for the function.  Which,
  because the program is so big, is too far away for a conditional branch.  (You
  were lucky with the other builds that the branch was in range.  This problem
  could have arisen at any time in the past, if the build had been big enough).

  So this actually looks like a programming bug.  What the assembly code
  should look like is this:

    function ff_prefetch_aarch64, export=1
    1:
        subs            w2,  w2,  #2
        prfm            pldl1strm, [x0]
        prfm            pldl1strm, [x0,  x1]
        add             x0,  x0,  x1,  lsl #1
        b.gt            1b
        ret
    endfunc

  Ie a conditional branch back to a local symbol placed at the start of the
  function.  (You should find that this code is actually faster than the old
  code, since now there will be no indirection via the PLT).

  In general the rule is that AArch64 assembler code should never use a 
  conditional branch to move to the start of a global function.  Does this
  make sense ?

Cheers
  Nick

Comment 19 Tom "spot" Callaway 2020-08-25 14:18:28 UTC
Okay, that makes sense, except that:

1. Downgrading to the old binutils works.
2. This code has been in ffmpeg for a long time without issue.

Is it plausible that binutils has gotten stricter in some way?

Comment 20 Jeff Law 2020-08-25 14:37:50 UTC
Its certainly possible that binutils has gotten stricter, but that would also imply that the code wasn't being used before since if the branch was out of range you'd end up going to somewhere fairly unpredictable.

It could also be the case that we've got more PLT entries or that the PLT entries are larger and as a result the branch no longer reaches the appropriate PLT entry.

It could also be the case that there's been a layout change in the linker.

There's other scenarios I can theorize about, but those are the most likely in my mind.

Nick could probably dig this out but I'm not sure it's all that useful in the end -- a conditional branch to a PLT entry is a recipe for disaster on many architectures precisely because conditional branches usually have a limited branching range.  The code really needs to get fixed.

Comment 21 Jakub Jelinek 2020-08-25 14:51:54 UTC
Tom, do you have the library linked with older binutils?  Where does the branch branch to in that case?  To the start of the function, or the PLT, and if the latter, how far is it from the function?
I've tried to look at chromium-libs in older Fedora releases for aarch64, but can't find ff_prefetch_aarch64 in there...

Comment 22 Tom "spot" Callaway 2020-08-25 16:08:48 UTC
Created attachment 1712571 [details]
libclearkeycdm.so linked with old binutils and no patch applied to videodsp.S

Comment 23 Tom "spot" Callaway 2020-08-25 16:10:01 UTC
Created attachment 1712572 [details]
videodsp, no patch, old (f32) binutils

Comment 24 Jakub Jelinek 2020-08-25 16:34:48 UTC
In the old binutils built shared library:
000000000048a24c <ff_prefetch_aarch64>:
  48a24c:       71000842        subs    w2, w2, #0x2
  48a250:       f9800001        prfm    pldl1strm, [x0]
  48a254:       f8a16801        prfm    pldl1strm, [x0, x1]
  48a258:       8b010400        add     x0, x0, x1, lsl #1
  48a25c:       54ffff8c        b.gt    48a24c <ff_prefetch_aarch64>
  48a260:       d65f03c0        ret
  48a264:       d503201f        nop
  48a268:       d503201f        nop
  48a26c:       d503201f        nop
and no ff_prefetch_aarch64 in .plt section,

and in the object file:
0000000000000000 <ff_prefetch_aarch64>:
   0:   71000842        subs    w2, w2, #0x2
   4:   f9800001        prfm    pldl1strm, [x0]
   8:   f8a16801        prfm    pldl1strm, [x0, x1]
   c:   8b010400        add     x0, x0, x1, lsl #1
  10:   5400000c        b.gt    0 <ff_prefetch_aarch64>
                        10: R_AARCH64_CONDBR19  ff_prefetch_aarch64
  14:   d65f03c0        ret

So, Nick, why is a PLT entry added for it with the new binutils?

Comment 25 Jakub Jelinek 2020-08-26 12:50:53 UTC
Trying:
.text
.globl foo
foo:
testl %edi, %edi
jne foo
ret
on x86_64-linux and i686-linux, in both cases when this is linked into a shared library the conditional branch goes to the local foo, rather than to a PLT slot.
While when the conditional branch is to bar (undefined external), then it goes through PLT on x86_64 and on i686 becomes a DT_TEXTREL relocation.
So I'd say that aarch64 should behave similarly.

Comment 26 Nick Clifton 2020-08-26 15:37:20 UTC
Hi Jakub,

  What about symbol preemption ?

  If foo (or ff_prefetch_aarch64) has default visibility then it could be replaced by another definition in a different component.  Hence you need to indirect via the PLT.

  Despite saying this however, there has been a change in the binutils which is probably related to this problem:

	* elfnn-aarch64.c (elfNN_aarch64_final_link_relocate): Club
	BFD_RELOC_AARCH64_BRANCH19 and BFD_RELOC_AARCH64_TSTBR14
	cases with BFD_RELOC_AARCH64_JUMP26.
	(elfNN_aarch64_check_relocs): Likewise.

  This patch is intended to handle the case where a conditional branch is made to an undefined symbol.  (The linker used to just change this to a branch to 0 and not issue an error message).  I am talking to the ARM engineers who created the patch to see if they have any ideas on the issue.

Cheers
  Nick

Comment 27 Nick Clifton 2020-09-10 16:50:31 UTC
Hi Tom,

  Would you mind seeing how a build goes with the latest rawhide binutils: binutils-2.35-13.fc34

  It won't fix the problem - you need to patch the assembler source code for that - but it
  should issue a better error message, explicitly stating that that kind of conditional branch
  is not allowed.

Cheers
  Nick

Comment 28 Nick Clifton 2020-09-11 08:09:22 UTC
(In reply to Nick Clifton from comment #27)

>   It won't fix the problem - you need to patch the assembler source code for
> that - but it
>   should issue a better error message, explicitly stating that that kind of
> conditional branch
>   is not allowed.

Actually I take that back.  The patch should fix the build issue, and you do 
not need to change the assembler.  At least that is what I hope will happen.
Please let me know.

Comment 29 Tom "spot" Callaway 2021-02-09 15:45:33 UTC
This is all working now (has been for a while).

Thanks!


Note You need to log in before you can comment on or make changes to this bug.