
Bug 1705301

Summary: mpi4py FTBFS with Python 3.8
Product: [Fedora] Fedora
Reporter: Miro Hrončok <mhroncok>
Component: mpi4py
Assignee: Thomas Spura <tomspur>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: rawhide
CC: dakingun, orion, python-sig, tomspur, zbyszek
Last Closed: 2019-06-03 15:47:07 UTC
Type: Bug
Bug Blocks: 1686977, 1705296    
Attachments:
  Full log from Copr (flags: none)

Description Miro Hrončok 2019-05-01 23:52:00 UTC
Created attachment 1561224: Full log from Copr

After the symptoms described in bz1705296 I've rebuilt mpich and openmpi, but now mpi4py no longer builds. The failing build is mpi4py-3.0.1-4.fc31, and these tests fail in %check:


======================================================================
FAIL: testCompareAndSwap (test_rma.TestRMASelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 228, in testCompareAndSwap
    self.assertEqual(rbuf[1], -1)
AssertionError: 0 != -1

======================================================================
FAIL: testFetchAndOp (test_rma.TestRMASelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 190, in testFetchAndOp
    self.assertEqual(rbuf[1], -1)
AssertionError: -116 != -1

======================================================================
FAIL: testCompareAndSwap (test_rma.TestRMAWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 228, in testCompareAndSwap
    self.assertEqual(rbuf[1], -1)
AssertionError: 0 != -1

======================================================================
FAIL: testFetchAndOp (test_rma.TestRMAWorld)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_rma.py", line 190, in testFetchAndOp
    self.assertEqual(rbuf[1], -1)
AssertionError: -124 != -1

----------------------------------------------------------------------
Ran 1100 tests in 3.549s

FAILED (failures=4, skipped=61)
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[9253,1],0]
  Exit code:    1
--------------------------------------------------------------------------
error: Bad exit status from /var/tmp/rpm-tmp.VaRIOu (%check)

Full log attached.

Comment 1 Zbigniew Jędrzejewski-Szmek 2019-05-02 18:32:33 UTC
I opened https://bitbucket.org/mpi4py/mpi4py/issues/124/test-failure-with-openmpi-401.

Comment 2 Miro Hrončok 2019-05-27 10:20:02 UTC
There is a new build failure after Python 3.8.0a4:

src/mpi4py.MPI.c:314:11: error: too few arguments to function ‘PyCode_New’
  314 |           PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos)
      |           ^~~~~~~~~~

The bundled C file was generated by a Cython that still calls PyCode_New with the old argument list, so the sources need to be recythonized.

Comment 3 Miro Hrončok 2019-05-27 10:23:57 UTC
Adding this to %prep seems to help:

# Remove precythonized C sources
rm $(grep -rl '/\* Generated by Cython')



Building in Copr to see if the previous failure is still there.
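
For anyone who wants to preview what that %prep line would delete before running it, here is a minimal Python sketch of the same search. It assumes the same behaviour as the grep call above (recursive from a starting directory, matching any file that contains the marker). The name `find_cythonized` is made up for illustration and is not part of mpi4py or the spec file.

```python
import os

# Same marker the grep pattern in the %prep snippet matches.
MARKER = b"/* Generated by Cython"


def find_cythonized(root="."):
    """Return sorted paths of files under root that contain the Cython marker."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            if MARKER in data:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Dry run: only list the files, do not delete anything.
    for path in find_cythonized("."):
        print(path)
```

This only lists candidates. The actual removal in the spec file is still done by the rm line above.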

Comment 4 Miro Hrončok 2019-05-27 10:39:50 UTC
After recythonizing the sources, the build gets past compilation, but the test run segfaults:

+ mpiexec -np 1 python3 test/runtests.py -v --no-builddir --thread-level=serialized -e spawn
[41f0acf557e440989184fec990a11425:4660 :0:4660] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7ff83a90b948)
==== backtrace ====
    0  /lib64/libucs.so.0(+0x194a3) [0x7ff83a8934a3]
    1  /lib64/libucs.so.0(+0x1965a) [0x7ff83a89365a]
    2  /lib64/libuct.so.0(+0x1b72b) [0x7ff83aa4172b]
    3  /lib64/ld-linux-x86-64.so.2(+0xfe4a) [0x7ff83da9fe4a]
    4  /lib64/ld-linux-x86-64.so.2(+0xff51) [0x7ff83da9ff51]
    5  /lib64/ld-linux-x86-64.so.2(+0x13eae) [0x7ff83daa3eae]
    6  /lib64/libc.so.6(_dl_catch_exception+0x79) [0x7ff83d9ff1f9]
    7  /lib64/ld-linux-x86-64.so.2(+0x1372e) [0x7ff83daa372e]
    8  /lib64/libdl.so.2(+0x239c) [0x7ff83d53739c]
    9  /lib64/libc.so.6(_dl_catch_exception+0x79) [0x7ff83d9ff1f9]
   10  /lib64/libc.so.6(_dl_catch_error+0x33) [0x7ff83d9ff293]
   11  /lib64/libdl.so.2(+0x2b09) [0x7ff83d537b09]
   12  /lib64/libdl.so.2(dlopen+0x4a) [0x7ff83d53742a]
   13  /usr/lib64/openmpi/lib/libopen-pal.so.40(+0x6ead7) [0x7ff83cb23ad7]
   14  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_repository_open+0x1f4) [0x7ff83cb01524]
   15  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_find+0x35b) [0x7ff83cb004eb]
   16  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_components_register+0x2e) [0x7ff83cb0bdfe]
   17  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_register+0x256) [0x7ff83cb0c2e6]
   18  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_open+0x14) [0x7ff83cb0c344]
   19  /usr/lib64/openmpi/lib/libmpi.so.40(ompi_mpi_init+0x695) [0x7ff83cc76795]
   20  /usr/lib64/openmpi/lib/libmpi.so.40(PMPI_Init_thread+0x99) [0x7ff83cca6bf9]
   21  /builddir/build/BUILDROOT/mpi4py-3.0.1-2.fc31.x86_64/usr/lib64/python3.8/site-packages/openmpi/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x329bc) [0x7ff83cd849bc]
   22  /lib64/libpython3.8.so.1.0(PyModule_ExecDef+0x77) [0x7ff83d724b27]
   23  /lib64/libpython3.8.so.1.0(+0x1c7b93) [0x7ff83d724b93]
   24  /lib64/libpython3.8.so.1.0(_PyMethodDef_RawFastCallDict+0x350) [0x7ff83d67f9e0]
   25  /lib64/libpython3.8.so.1.0(_PyCFunction_FastCallDict+0x23) [0x7ff83d67fa93]
   26  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x640d) [0x7ff83d6e82ad]
   27  /lib64/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x311) [0x7ff83d66e721]
   28  /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0x196) [0x7ff83d6ac346]
   29  /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
   30  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x57ea) [0x7ff83d6e768a]
   31  /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0xfa) [0x7ff83d6ac2aa]
   32  /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
   33  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xd7d) [0x7ff83d6e2c1d]
   34  /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0xfa) [0x7ff83d6ac2aa]
   35  /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
   36  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xc1c) [0x7ff83d6e2abc]
   37  /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0xfa) [0x7ff83d6ac2aa]
   38  /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
   39  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xc1c) [0x7ff83d6e2abc]
   40  /lib64/libpython3.8.so.1.0(_PyFunction_FastCallDict+0x11a) [0x7ff83d66f44a]
   41  /lib64/libpython3.8.so.1.0(+0x121787) [0x7ff83d67e787]
   42  /lib64/libpython3.8.so.1.0(_PyObject_CallMethodIdObjArgs+0xb9) [0x7ff83d6a65d9]
   43  /lib64/libpython3.8.so.1.0(PyImport_ImportModuleLevelObject+0x26b) [0x7ff83d67263b]
   44  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x3219) [0x7ff83d6e50b9]
   45  /lib64/libpython3.8.so.1.0(_PyFunction_FastCallKeywords+0xfa) [0x7ff83d6ac2aa]
   46  /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
   47  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xc1c) [0x7ff83d6e2abc]
   48  /lib64/libpython3.8.so.1.0(+0x1da7df) [0x7ff83d7377df]
   49  /lib64/libpython3.8.so.1.0(+0x159bbf) [0x7ff83d6b6bbf]
   50  /lib64/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xc1c) [0x7ff83d6e2abc]
   51  /lib64/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x311) [0x7ff83d66e721]
   52  /lib64/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x39) [0x7ff83d66f329]
   53  /lib64/libpython3.8.so.1.0(PyEval_EvalCode+0x1b) [0x7ff83d6ff84b]
   54  /lib64/libpython3.8.so.1.0(+0x20ee30) [0x7ff83d76be30]
   55  /lib64/libpython3.8.so.1.0(PyRun_FileExFlags+0x97) [0x7ff83d76c3b7]
   56  /lib64/libpython3.8.so.1.0(PyRun_SimpleFileExFlags+0x19a) [0x7ff83d7736da]
   57  /lib64/libpython3.8.so.1.0(_Py_RunMain+0x353) [0x7ff83d774d13]
   58  /lib64/libpython3.8.so.1.0(+0x217eb6) [0x7ff83d774eb6]
   59  /lib64/libpython3.8.so.1.0(_Py_UnixMain+0x35) [0x7ff83d774f55]
   60  /lib64/libc.so.6(__libc_start_main+0xf3) [0x7ff83d8eb193]
===================
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node 41f0acf557e440989184fec990a11425 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Comment 5 Orion Poplawski 2019-05-27 13:31:49 UTC
The segfault is a current issue with openmpi 4/UCX that has yet to be resolved.
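
To see at a glance that the backtrace in comment 4 ends in the UCX libraries rather than in mpi4py or Python, it can help to collapse it by shared object. The following is a rough sketch that assumes the frame format shown in that log (index, library path, symbol or offset in parentheses, address in brackets). `summarize_backtrace` is a hypothetical helper, not an existing tool.

```python
import re
from itertools import groupby

# Matches lines such as "    0  /lib64/libucs.so.0(+0x194a3) [0x7ff83a8934a3]"
FRAME_RE = re.compile(r"^\s*(\d+)\s+(\S+?)\((.*?)\)\s+\[0x[0-9a-f]+\]\s*$")


def summarize_backtrace(text):
    """Collapse consecutive frames that come from the same shared object.

    Returns a list of (library_basename, frame_count) tuples in stack
    order, innermost frame first. Lines that are not frames are ignored.
    """
    libs = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            libs.append(match.group(2).rsplit("/", 1)[-1])
    return [(lib, len(list(group))) for lib, group in groupby(libs)]


if __name__ == "__main__":
    import sys
    for lib, count in summarize_backtrace(sys.stdin.read()):
        print(f"{count:3d}  {lib}")
```

Fed the log from comment 4, the first entries are libucs.so.0 and libuct.so.0, reached through dlopen from libopen-pal during ompi_mpi_init, which matches the openmpi/UCX diagnosis.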

Comment 6 Miro Hrončok 2019-06-03 11:46:30 UTC
Orion, do you happen to have some pointers for that segfault?

Comment 7 Orion Poplawski 2019-06-03 14:30:10 UTC
I'm hoping that it's been resolved with the latest openmpi build - can you try another build?

Comment 8 Miro Hrončok 2019-06-03 14:45:21 UTC
OK. Rebuilding updated openmpi first.

Comment 9 Miro Hrončok 2019-06-03 15:47:07 UTC
mpi4py builds now with the updated openmpi.