Bug 799153 - ceph fails to build on ARM architectures
Summary: ceph fails to build on ARM architectures
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: ceph
Version: 17
Hardware: arm
OS: Unspecified
high
medium
Target Milestone: ---
Assignee: Josef Bacik
QA Contact: Fedora Extras Quality Assurance
URL: https://github.com/ivmai/libatomic_op...
Whiteboard:
Depends On:
Blocks: ARMTracker
Reported: 2012-03-01 21:55 UTC by Niels de Vos
Modified: 2012-04-07 12:26 UTC (History)

Fixed In Version: ceph-0.44-5.fc17
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-04-07 12:26:32 UTC
Type: ---
Embargoed:


Attachments
Proposed patch (unverified) (2.12 KB, patch)
2012-03-01 21:55 UTC, Niels de Vos
Compile test to narrow down the cause and fix (1001 bytes, text/plain)
2012-03-07 08:22 UTC, Niels de Vos
build failure of ceph-0.44-5 on ARMv5tel (45.62 KB, text/plain)
2012-04-04 07:23 UTC, Niels de Vos
Changed to ceph.spec to complete building of ceph-0.44 (1.32 KB, patch)
2012-04-04 15:33 UTC, Niels de Vos

Description Niels de Vos 2012-03-01 21:55:36 UTC
Created attachment 566953 [details]
Proposed patch (unverified)

Description of problem:
The Fedora ARM SIG is working on making ARM a primary architecture. koji-shadow is building all the packages on native ARM machines. ceph is a key package and is blocking many other builds that depend on it.

The current F17 version of the package fails to build on at least two ARM platforms (armv5tel and armv7hl).

Version-Release number of selected component (if applicable):
ceph-0.41-2.fc17 and ceph-0.41-1.fc17

How reproducible:
100%

Steps to Reproduce:
1. fedpkg clone ceph
2. cd ceph
3. fedpkg switch-branch f17
4. arm-koji build --scratch f17 $(fedpkg giturl)
  
Actual results:
Build failures like these on armv5tel:
- http://arm.koji.fedoraproject.org/koji/getfile?taskID=536763&name=build.log

Build failures like these on armv7hl:
- http://arm.koji.fedoraproject.org/koji/getfile?taskID=537202&name=build.log

Expected results:
ceph builds completely.


Additional info:

Errors:
./include/atomic.h: In member function 'size_t ceph::atomic_t::inc()':
./include/atomic.h:40:36: error: 'AO_fetch_and_add1' was not declared in this scope
./include/atomic.h: In member function 'size_t ceph::atomic_t::dec()':
./include/atomic.h:43:42: error: 'AO_fetch_and_sub1_write' was not declared in this scope
...

The ceph.spec contains a BuildRequires for libatomic_ops-devel, which is provided by gc.

AO_fetch_and_add1 is defined in /usr/include/atomic_ops/sysdeps/gcc/arm.h. That file only provides useful definitions for ARMv6 and higher architectures. A possible workaround is to define AO_USE_PTHREAD_DEFS and not use the optimised code on armv5tel.

An unmodified v7hl-only scratch build failed as well:
- http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=537177

Errors:
/usr/bin/ld: ./.libs/libosdc.a(libcommon_la-ceph_context.o): undefined reference to symbol 'sem_post@@GLIBC_2.4'
/usr/bin/ld: note: 'sem_post@@GLIBC_2.4' is defined in DSO /lib/libpthread.so.0 so try adding it to the linker command line
/lib/libpthread.so.0: could not read symbols: Invalid operation


I guess that AO_USE_PTHREAD_DEFS should be defined automatically within the libatomic_ops-devel package (created from the gc srpm) on < ARMv6. It also requires adding -lpthread to LDFLAGS (both armv5tel and armv7hl). But a possible workaround like the attached patch may be the quicker solution to get ceph built.

A scratch-build with the attached patch will show if I am right or wrong:
- http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=538271

Comment 1 Niels de Vos 2012-03-02 21:50:43 UTC
The test-build finished correctly on both architectures.

gc-7.2-0.7.alpha6.fc17 includes a patch (not upstream yet) that should cause armv5tel to fall back on AO_USE_PTHREAD_DEFS by default. The first error mentioned in comment #0 should have been solved with this new gc package.

The second error, which is caused by the missing -lpthread in LDFLAGS, needs to be fixed in ceph itself. I am not sure what the best upstreamable solution for that is. I guess that configure.ac could contain a test for the architecture and, if it is running on ARM, add -lpthread to the (global?) LDFLAGS.

Thoughts?
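
The architecture test suggested above could look roughly like the following. This is only a sketch of the idea, not what upstream ceph does: the function name add_pthread_on_arm is hypothetical, and in a real configure.ac the CPU name would normally come from $host_cpu (set by AC_CANONICAL_HOST) instead of a function argument.

```shell
# Hypothetical helper: append -lpthread to a set of linker flags when the
# host CPU is an ARM variant. The CPU name is passed as an argument so the
# logic can be tried outside of configure.
add_pthread_on_arm() {
    host_cpu="$1"
    ldflags="$2"
    case "$host_cpu" in
        arm*)
            # Both ARM targets in this report failed to link without libpthread.
            ldflags="${ldflags:+$ldflags }-lpthread"
            ;;
    esac
    printf '%s\n' "$ldflags"
}
```

For example, LDFLAGS=$(add_pthread_on_arm "$host_cpu" "$LDFLAGS") would leave the flags untouched on non-ARM hosts.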

Comment 2 Niels de Vos 2012-03-05 12:58:47 UTC
Upstream libatomic_ops-devel says that the problem is not a bug. There is no guarantee that fetch_and_add() and similar primitives are available. configure.ac may need extensions to check for AO_HAVE_* defines. Alternatively, defining AO_REQUIRE_CAS will force fetch_and_add() and similar primitives to be made available.

Information gathered from:
- https://github.com/ivmai/libatomic_ops/issues/3#issuecomment-4302250

Currently running a test build that defines AO_REQUIRE_CAS:
- http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=560468

Actions needed:
If the build succeeds, ceph needs to define AO_REQUIRE_CAS and link against libpthread. This is likely something that needs to be passed to upstream ceph, but we can have it in our spec file for the time being.
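
A configure-time check for one of the AO_HAVE_* defines mentioned in this comment might be sketched as below. The helper name ao_have is hypothetical, and the sketch assumes a gcc- or clang-compatible compiler that accepts "-E -x c -" to preprocess standard input; it only reports whether the header defines the macro and does not prove that the resulting code links or runs correctly.

```shell
# Hypothetical probe: succeed only if <atomic_ops.h> is found and defines
# the given AO_HAVE_* macro for the current compiler and flags.
ao_have() {
    macro="$1"
    cc="${CC:-cc}"
    # The #error directive makes preprocessing fail when the macro is absent.
    printf '#include <atomic_ops.h>\n#ifndef %s\n#error missing\n#endif\n' "$macro" |
        $cc -E -x c - >/dev/null 2>&1
}
```

A build script could then branch on it, for instance "if ao_have AO_HAVE_fetch_and_add1; then ...; fi", and fall back to another implementation otherwise.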

Comment 3 Niels de Vos 2012-03-07 08:22:51 UTC
Created attachment 568191 [details]
Compile test to narrow down the cause and fix

This test-case shows that there is an issue with libatomic_ops-devel that affects the building of ceph on ARMv5tel.

Notes from the comments in the attached rhbz799153-libatomic_ops.c:
> Upstream thinks/suggests that AO_REQUIRE_CAS should be sufficient.
> AO_USE_PTHREAD_DEFS should not be needed, but it seems to be the only
> working solution to compile ceph.
>
> This little test uses one of the problematic functions that cause the
> ceph build to fail. If we can find the correct switches to compile
> this, ceph should be able to use them as well.

Until upstream in https://github.com/ivmai/libatomic_ops/issues/3 advises us with a working solution, I suggest using the attached proposed patch for the ceph.spec.

ARMv5tel should fall back on the pthread implementation for atomic_ops (limitations documented at https://github.com/ivmai/libatomic_ops/blob/master/doc/README.txt#L39). ARMv7hl should NOT define AO_USE_PTHREAD_DEFS as there is an optimized implementation available for ARMv6+.

Comment 4 Niels de Vos 2012-03-07 09:59:22 UTC
There also seems to be a ./configure option called --without-libatomic-ops. Maybe we should use this on ARMv5tel?

A new test-build at:
- http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=571629

Comment 5 Niels de Vos 2012-03-07 11:08:21 UTC
Okay, the build in comment #4 failed as well...

This is getting confusing :-/

To verify, I started another build where AO_USE_PTHREAD_DEFS is defined (and AO_REQUIRE_CAS is not). The build.log has already passed the critical section where the other builds failed (http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=571842).


Summary:

After testing and checking with upstream, the only working solution that I am aware of is making two changes in the ceph.spec:
1. add LDFLAGS=-lpthread to any ARM build
2. add CFLAGS=-DAO_USE_PTHREAD_DEFS for ARMv5tel

These are exactly the changes in the attached patch from comment #0.

Comment 6 Niels de Vos 2012-04-04 07:23:44 UTC
Created attachment 575052 [details]
build failure of ceph-0.44-5 on ARMv5tel

After updating, it seems that the proposed solution no longer works. The build.log contains the exact error:

    {standard input}:276346: Error: unknown pseudo-op: `.lb'

My current best guess is that libatomic_ops contains this assembler instruction for an ARMv5tel branch.

Comment 7 Niels de Vos 2012-04-04 11:41:31 UTC
Adding -DAO_USE_PTHREAD_DEFS=1 to CFLAGS on ARMv5tel and passing it on to ./configure makes the build pass the previous error:
http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=688847

Building unchanged sources for ARMv7hl:
http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=688368

It is unclear to me whether we still need to pass -lpthread in LDFLAGS. We will know once the build has finished or aborted.

Comment 8 Niels de Vos 2012-04-04 13:23:18 UTC
ARMv7hl needs the -lpthread in LDFLAGS:

/usr/bin/ld: ./.libs/libosdc.a(libcommon_la-ceph_context.o): undefined reference to symbol 'sem_post@@GLIBC_2.4'
/usr/bin/ld: note: 'sem_post@@GLIBC_2.4' is defined in DSO /lib/libpthread.so.0 so try adding it to the linker command line

Comment 9 Niels de Vos 2012-04-04 14:55:47 UTC
The modified (comment #7) ARMv5tel src.rpm needs -lpthread in LDFLAGS too:

/usr/bin/ld: ./.libs/libosdc.a(libcommon_la-ceph_context.o): undefined reference to symbol 'sem_post@@GLIBC_2.4'
/usr/bin/ld: note: 'sem_post@@GLIBC_2.4' is defined in DSO /lib/libpthread.so.0 so try adding it to the linker command line
/lib/libpthread.so.0: could not read symbols: Invalid operation

Comment 10 Niels de Vos 2012-04-04 15:33:46 UTC
Created attachment 575174 [details]
Changed to ceph.spec to complete building of ceph-0.44

Updates in the ceph.spec needed to build ceph on ARM:
- Add LDFLAGS=-lpthread on any ARM architecture
- Add CFLAGS=-DAO_USE_PTHREAD_DEFS on ARMv5tel
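
The two rules above can be sketched as a small shell function. This is not the attached patch, whose contents are not shown in this report: the function name arm_extra_flags is hypothetical, and in the spec file the same selection would more likely be written with %ifarch conditionals.

```shell
# Hypothetical helper: print the extra build flags for a given target CPU,
# following the two rules listed above.
arm_extra_flags() {
    target_cpu="$1"
    extra_cflags=""
    extra_ldflags=""
    case "$target_cpu" in
        # Any ARM architecture needs libpthread on the link line.
        arm*) extra_ldflags="-lpthread" ;;
    esac
    case "$target_cpu" in
        # Only ARMv5tel falls back to the pthread-based atomic_ops.
        armv5tel) extra_cflags="-DAO_USE_PTHREAD_DEFS" ;;
    esac
    printf 'CFLAGS=%s LDFLAGS=%s\n' "$extra_cflags" "$extra_ldflags"
}
```

Calling arm_extra_flags armv7hl yields only the linker flag, which matches the note in comment 3 that ARMv7hl should not define AO_USE_PTHREAD_DEFS.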

Test build with this patch:
http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=690253

The patch can be applied with 'patch -p1 < bz799153-build-ceph-on-arm.patch' in a cloned ceph package repository.

Note that the patch adds .0.arm to the release of the package; this needs to be corrected before committing the changes.

Comment 11 Niels de Vos 2012-04-05 08:19:56 UTC
Patch has been applied and a build has finished successfully:
- http://koji.fedoraproject.org/koji/buildinfo?buildID=311692
- http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=690253

Comment 12 Niels de Vos 2012-04-07 12:26:32 UTC
The errata is available at https://admin.fedoraproject.org/updates/FEDORA-2012-5354 but seems to be missing a reference to this bug.

I assume the bug fixes will be included in the next releases and am therefore closing this bug.

