Bug 1366328

Summary: libosmium 2.6+ is FTBFS on aarch64/ppc64le due to failing tests
Product: [Fedora] Fedora
Component: libosmium
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Status: CLOSED RAWHIDE
Severity: unspecified
Priority: unspecified
Reporter: Peter Robinson <pbrobinson>
Assignee: Tom Hughes <tom>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: tom
Type: Bug
Last Closed: 2017-01-17 22:30:47 UTC
Bug Blocks: 245418, 1071880, 922257, 1051573

Description Peter Robinson 2016-08-11 15:58:19 UTC
libosmium-2.8.0-1.fc25

http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=3665401

aarch64 test failures:

The following tests FAILED:
	 35 - io_test_reader (Failed)
	 36 - io_test_reader_with_mock_decompression (Failed)
	 37 - io_test_reader_with_mock_parser (Failed)
	 39 - io_test_output_iterator (Failed)
	 41 - io_test_writer (Failed)
	 57 - testdata-testcases (Failed)
	 58 - testdata-xml (Failed)
	 59 - testdata-overview (OTHER_FAULT)
	 60 - testdata-multipolygon (Failed)

Comment 1 Tom Hughes 2016-08-14 10:36:47 UTC
I'm struggling to find any way to figure this out without access to aarch64 hardware.

They mostly seem to be heap corruptions, so I assumed they would be easy to find with valgrind or ASAN on another platform, but I've tried both and found nothing so far.

Comment 2 Peter Robinson 2016-08-14 10:41:08 UTC
(In reply to Tom Hughes from comment #1)
> I'm struggling to find any way to figure this out without access to aarch64
> hardware.

You can run an aarch64 install in a VM on x86_64:
https://fedoraproject.org/wiki/Architectures/AArch64/F24/Installation#Install_with_QEMU

Comment 3 Tom Hughes 2016-08-14 10:42:54 UTC
Yeah I've done that for ARM32 in the past and it's not an experience I particularly want to repeat...

Maybe when I've got a week or two spare I'll take a look.

Comment 4 Tom Hughes 2016-08-14 14:43:45 UTC
So I got an F24 VM up and running, distro-synced it to rawhide and rebooted, ready to try a build, only to find that it no longer boots and bombs out before reaching grub with:

Failed to set MokListRT: Invalid Parameter
FSOpen: Open '\EFI\fedora\grubaa64.efi' Success


Synchronous Exception at 0x00000000B834C498

  X0 0x1100069417FFFFE2   X1 0x00000000BBFF0018   X2 0x00000000B83526E8   X3 0x00000000000FD000
  X4 0x0000000000000000   X5 0x0000000000000007   X6 0x0000000000000000   X7 0x00000000BBBE24D4
  X8 0x0000000000000208   X9 0x00000000BF02AA00  X10 0x0000000000000023  X11 0x00000000000000AB
 X12 0x0000000070FFE07A  X13 0x0000000000000000  X14 0x0000000000000000  X15 0x0000000000000000
 X16 0x00000000BF02AC80  X17 0x0000000000000000  X18 0x0000000000000000  X19 0x00000000BBFF0018
 X20 0x0000000000000000  X21 0x00000000B8352000  X22 0x0000000000000000  X23 0xAA1303E4F940E022
 X24 0x0000000000000000  X25 0x0000000000000000  X26 0x0000000000000000  X27 0x0000000000000000
 X28 0x0000000000000000   FP 0x00000000BF02AA10   LR 0x00000000B834D040  

  V0 0x0000000000000000 0000000000000000   V1 0x0000000000000000 0000000000000000
  V2 0x0000000000000000 0000000000000000   V3 0x0000000000000000 0000000000000000
  V4 0x0000000000000000 0000000000000000   V5 0x0000000000000000 0000000000000000
  V6 0x0000000000000000 0000000000000000   V7 0x0000000000000000 0000000000000000
  V8 0x0000000000000000 0000000000000000   V9 0x0000000000000000 0000000000000000
 V10 0x0000000000000000 0000000000000000  V11 0x0000000000000000 0000000000000000
 V12 0x0000000000000000 0000000000000000  V13 0x0000000000000000 0000000000000000
 V14 0x0000000000000000 0000000000000000  V15 0x0000000000000000 0000000000000000
 V16 0x0000000000000000 0000000000000000  V17 0x0000000000000000 0000000000000000
 V18 0x0000000000000000 0000000000000000  V19 0x0000000000000000 0000000000000000
 V20 0x0000000000000000 0000000000000000  V21 0x0000000000000000 0000000000000000
 V22 0x0000000000000000 0000000000000000  V23 0x0000000000000000 0000000000000000
 V24 0x0000000000000000 0000000000000000  V25 0x0000000000000000 0000000000000000
 V26 0x0000000000000000 0000000000000000  V27 0x0000000000000000 0000000000000000
 V28 0x0000000000000000 0000000000000000  V29 0x0000000000000000 0000000000000000
 V30 0x0000000000000000 0000000000000000  V31 0x0000000000000000 0000000000000000

  SP 0x00000000BF02AA10  ELR 0x00000000B834C498  SPSR 0x60000305  FPSR 0x00000000
 ESR 0x94000004          FAR 0x1100069417FFFFE2

 ESR : EC 0x25  IL 0x0  ISS 0x00000004

Data abort: Translation fault, zeroth level
ASSERT [ArmCpuDxe] /builddir/build/BUILD/tianocore-edk2-a8c39ba/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c(184): ((BOOLEAN)(0==1))

Comment 5 Tom Hughes 2016-08-15 21:47:41 UTC
So I've managed to build this on aarch64 now and run valgrind on one of the failing tests and the first report is:

==11158== Thread 2 _osmium_write:
==11158== Invalid read of size 8
==11158==    at 0x1B5674: wait (future:325)
==11158==    by 0x1B5674: _M_get_result (future:687)
==11158==    by 0x1B5674: get (future:766)
==11158==    by 0x1B5674: pop (queue_util.hpp:142)
==11158==    by 0x1B5674: operator() (write_thread.hpp:85)
==11158==    by 0x1B5674: osmium::io::Writer::write_thread(osmium::thread::Queue<std::future<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::unique_ptr<osmium::io::Compressor, std::default_delete<osmium::io::Compressor> >&&, std::promise<bool>&&) (writer.hpp:124)
==11158==    by 0x49BABFB: ??? (in /usr/lib64/libstdc++.so.6.0.22)
==11158==    by 0x48C7173: start_thread (in /usr/lib64/libpthread-2.24.90.so)
==11158==    by 0x4C93F87: thread_start (in /usr/lib64/libc-2.24.90.so)
==11158==  Address 0x4d6cba0 is 16 bytes inside a block of size 48 free'd
==11158==    at 0x48854F4: operator delete(void*) (vg_replace_malloc.c:576)
==11158==    by 0x1B58A3: _M_release (shared_ptr_base.h:166)
==11158==    by 0x1B58A3: ~__shared_count (shared_ptr_base.h:662)
==11158==    by 0x1B58A3: ~__shared_ptr (shared_ptr_base.h:928)
==11158==    by 0x1B58A3: ~shared_ptr (shared_ptr.h:93)
==11158==    by 0x1B58A3: ~__basic_future (future:641)
==11158==    by 0x1B58A3: ~future (future:731)
==11158==    by 0x1B58A3: destroy<std::future<std::__cxx11::basic_string<char> > > (new_allocator.h:124)
==11158==    by 0x1B58A3: destroy<std::future<std::__cxx11::basic_string<char> > > (alloc_traits.h:467)
==11158==    by 0x1B58A3: pop_front (stl_deque.h:1554)
==11158==    by 0x1B58A3: pop (stl_queue.h:271)
==11158==    by 0x1B58A3: wait_and_pop (queue.hpp:171)
==11158==    by 0x1B58A3: pop (queue_util.hpp:141)
==11158==    by 0x1B58A3: operator() (write_thread.hpp:85)
==11158==    by 0x1B58A3: osmium::io::Writer::write_thread(osmium::thread::Queue<std::future<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::unique_ptr<osmium::io::Compressor, std::default_delete<osmium::io::Compressor> >&&, std::promise<bool>&&) (writer.hpp:124)
==11158==    by 0x49BABFB: ??? (in /usr/lib64/libstdc++.so.6.0.22)
==11158==    by 0x48C7173: start_thread (in /usr/lib64/libpthread-2.24.90.so)
==11158==    by 0x4C93F87: thread_start (in /usr/lib64/libc-2.24.90.so)
==11158==  Block was alloc'd at
==11158==    at 0x48843D4: operator new(unsigned long) (vg_replace_malloc.c:334)
==11158==    by 0x1B94AF: allocate (new_allocator.h:104)
==11158==    by 0x1B94AF: allocate (alloc_traits.h:416)
==11158==    by 0x1B94AF: __allocate_guarded<std::allocator<std::_Sp_counted_ptr_inplace<std::__future_base::_State_baseV2, std::allocator<std::__future_base::_State_baseV2>, (__gnu_cxx::_Lock_policy)2u> > > (allocated_ptr.h:103)
==11158==    by 0x1B94AF: __shared_count<std::__future_base::_State_baseV2, std::allocator<std::__future_base::_State_baseV2> > (shared_ptr_base.h:613)
==11158==    by 0x1B94AF: __shared_ptr<std::allocator<std::__future_base::_State_baseV2> > (shared_ptr_base.h:1100)
==11158==    by 0x1B94AF: shared_ptr<std::allocator<std::__future_base::_State_baseV2> > (shared_ptr.h:319)
==11158==    by 0x1B94AF: allocate_shared<std::__future_base::_State_baseV2, std::allocator<std::__future_base::_State_baseV2> > (shared_ptr.h:620)
==11158==    by 0x1B94AF: make_shared<std::__future_base::_State_baseV2> (shared_ptr.h:636)
==11158==    by 0x1B94AF: promise (future:1024)
==11158==    by 0x1B94AF: void osmium::io::detail::add_to_queue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(osmium::thread::Queue<std::future<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) (queue_util.hpp:81)
==11158==    by 0x1B98C7: send_to_output_queue (output_format.hpp:117)
==11158==    by 0x1B98C7: osmium::io::detail::XMLOutputFormat::write_header(osmium::io::Header const&) (xml_output_format.hpp:475)
==11158==    by 0x1BA46F: operator() (writer.hpp:231)
==11158==    by 0x1BA46F: ensure_cleanup<osmium::io::Writer::Writer(const osmium::io::File&, TArgs&& ...) [with TArgs = {osmium::io::Header&, osmium::io::overwrite}]::<lambda()> > (writer.hpp:152)
==11158==    by 0x1BA46F: osmium::io::Writer::Writer<osmium::io::Header&, osmium::io::overwrite>(osmium::io::File const&, osmium::io::Header&, osmium::io::overwrite&&) (writer.hpp:230)
==11158==    by 0x1BAA9F: osmium::io::Writer::Writer<osmium::io::Header&, osmium::io::overwrite>(char const*, osmium::io::Header&, osmium::io::overwrite&&) (writer.hpp:242)
==11158==    by 0x1B048B: ____C_A_T_C_H____T_E_S_T____7() (test_output_iterator.cpp:11)
==11158==    by 0x1C81EB: invoke (catch.hpp:6582)
==11158==    by 0x1C81EB: invoke (catch.hpp:7519)
==11158==    by 0x1C81EB: invokeActiveTestCase (catch.hpp:6158)
==11158==    by 0x1C81EB: runCurrentTest (catch.hpp:6129)
==11158==    by 0x1C81EB: runTest (catch.hpp:5949)
==11158==    by 0x1C81EB: Catch::runTests(Catch::Ptr<Catch::Config> const&) (catch.hpp:6297)
==11158==    by 0x1AF597: run (catch.hpp:6405)
==11158==    by 0x1AF597: run (catch.hpp:6384)
==11158==    by 0x1AF597: main (catch.hpp:10333)

which relates to this piece of code:

  std::future<T> data_future;
  m_queue.wait_and_pop(data_future);
  data = std::move(data_future.get());

and specifically seems to say that the attempt to get the value of the future on the third line is accessing memory that was freed while popping the future from the queue on the previous line.

That doesn't seem to make much sense though, because wait_and_pop is doing:

  value = std::move(m_queue.front());
  m_queue.pop();

so the returned value is moved out of the queue before the pop and hence the pop should be destroying an empty value, not the returned value.
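
For illustration only, here is a minimal self-contained C++ sketch of the move-then-pop pattern described above. The names `SketchQueue`, `round_trip` and `moved_future_survives_pop` are hypothetical and this is not the libosmium code; the real `osmium::thread::Queue` may differ in its details. It assumes a conforming standard library, where a moved-from `std::future` no longer refers to any shared state, so the pop should only destroy an empty future. It needs thread support when built (for example `-pthread` with GCC).

```cpp
#include <cassert>
#include <condition_variable>
#include <future>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for a blocking queue; not the library's implementation.
template <typename T>
class SketchQueue {
    std::mutex m_mutex;
    std::condition_variable m_data_available;
    std::queue<T> m_queue;

public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock{m_mutex};
            m_queue.push(std::move(value));
        }
        m_data_available.notify_one();
    }

    // Block until an element is available, then move it out and pop the slot.
    void wait_and_pop(T& value) {
        std::unique_lock<std::mutex> lock{m_mutex};
        m_data_available.wait(lock, [this] { return !m_queue.empty(); });
        value = std::move(m_queue.front()); // front() is now moved-from
        m_queue.pop();                      // destroys only the moved-from element
    }
};

// A producer queues a future and fulfils its promise; a consumer thread
// pops the future and reads the result.
std::string round_trip(std::string payload) {
    SketchQueue<std::future<std::string>> queue;
    std::string result;

    std::thread consumer{[&queue, &result] {
        std::future<std::string> data_future;
        queue.wait_and_pop(data_future);
        result = data_future.get();
    }};

    std::promise<std::string> promise;
    queue.push(promise.get_future());
    promise.set_value(std::move(payload));

    consumer.join(); // result is safe to read once the consumer has finished
    return result;
}

// Single-threaded check of the move-then-pop step on a plain std::queue.
bool moved_future_survives_pop() {
    std::promise<int> promise;
    std::queue<std::future<int>> queue;
    queue.push(promise.get_future());

    std::future<int> value = std::move(queue.front());
    const bool front_is_empty = !queue.front().valid();
    queue.pop();

    promise.set_value(42);
    return front_is_empty && value.valid() && value.get() == 42;
}
```

If both functions behave as expected under valgrind on a given platform, the pattern itself is sound there, which would point the finger away from the queue logic and towards something else in the build.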

Comment 6 Peter Robinson 2016-08-15 21:54:45 UTC
> which relates to this piece of code:
> 
>   std::future<T> data_future;
>   m_queue.wait_and_pop(data_future);
>   data = std::move(data_future.get());
> 
> and specifically seems to say that the attempt to get the value of the
> future on the third line is accessing memory that was freed while popping
> the future from the queue on the previous line.
> 
> That doesn't seem to make much sense though, because wait_and_pop is doing:
> 
>   value = std::move(m_queue.front());
>   m_queue.pop();
> 
> so the returned value is moved out of the queue before the pop and hence the
> pop should be destroying an empty value, not the returned value.

Compiler error?

Comment 7 Tom Hughes 2016-08-15 21:58:58 UTC
Yes, I'm wondering about a possible compiler or libstdc++ issue...

Comment 8 Tom Hughes 2016-11-16 10:44:03 UTC
This also seems to fail in the same way on ppc64le (but not ppc64) so I have excluded that as well for now. Failed scratch build logs for 2.10.2 on each:

https://kojipkgs.fedoraproject.org/work/tasks/2495/16472495/build.log
https://kojipkgs.fedoraproject.org/work/tasks/2499/16472499/build.log

Also reported upstream now:

https://github.com/osmcode/libosmium/issues/176

Comment 9 Tom Hughes 2016-11-20 13:36:04 UTC
Restoring correct trackers per current ExcludeArch policy (see https://fedoraproject.org/wiki/Packaging:Guidelines#Architecture_Build_Failures).

Comment 10 Tom Hughes 2017-01-17 22:30:47 UTC
I have no idea what changed, but libosmium 2.11.0 has built successfully and passed tests on all architectures, so either something has changed in libosmium or a bug has been fixed in the toolchain or system libraries.