Bug 1799473

Summary: gromacs: FTBFS in Fedora rawhide/f32
Product: [Fedora] Fedora
Reporter: Fedora Release Engineering <releng>
Component: gromacs
Assignee: Christoph Junghans <junghans>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 32
CC: dakingun, dominik, junghans, orion
Target Milestone: ---
Flags: junghans: needinfo+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-02-23 23:34:30 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1750908
Attachments:
Description   Flags
build.log     none
root.log      none
state.log     none

Description Fedora Release Engineering 2020-02-06 17:16:54 UTC
gromacs failed to build from source in Fedora rawhide/f32

https://koji.fedoraproject.org/koji/taskinfo?taskID=41318028


For details on the mass rebuild see:

https://fedoraproject.org/wiki/Fedora_32_Mass_Rebuild
Please fix gromacs at your earliest convenience and set the bug's status to
ASSIGNED when you start fixing it. If the bug remains in NEW state for 8 weeks,
gromacs will be orphaned. Before branching of Fedora 33,
gromacs will be retired, if it still fails to build.

For more details on the FTBFS policy, please visit:
https://fedoraproject.org/wiki/Fails_to_build_from_source

Comment 1 Fedora Release Engineering 2020-02-06 17:16:57 UTC
Created attachment 1659221 [details]
build.log

file build.log too big, will only attach last 32768 bytes

Comment 2 Fedora Release Engineering 2020-02-06 17:16:59 UTC
Created attachment 1659222 [details]
root.log

file root.log too big, will only attach last 32768 bytes

Comment 3 Fedora Release Engineering 2020-02-06 17:17:01 UTC
Created attachment 1659223 [details]
state.log

Comment 4 Christoph Junghans 2020-02-06 17:43:04 UTC
The ppc64 error:
-- Could not find any flag to build test source (this could be due to either the compiler or binutils)
CMake Error at cmake/gmxManageSimd.cmake:51 (message):
  Cannot find IBM VSX compiler flag.  Use a newer compiler, or disable SIMD
  support (slower).
Call Stack (most recent call first):
  cmake/gmxManageSimd.cmake:265 (gmx_give_fatal_error_when_simd_support_not_found)
  CMakeLists.txt:719 (gmx_manage_simd)
-- Configuring incomplete, errors occurred!

Something is wrong with the SIMD flag detection.
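
To check outside of CMake whether the compiler in the build root accepts a VSX-related flag at all, a probe along these lines may help. This is only a rough sketch: the helper name compiler_accepts_flag is hypothetical, and the assumption that gcc is the compiler in use and that -mvsx / -maltivec are the relevant flags is mine. The actual GROMACS check compiles a test source that uses SIMD intrinsics, which this sketch does not attempt to reproduce.

```python
import os
import subprocess
import tempfile


def compiler_accepts_flag(compiler: str, flag: str) -> bool:
    """Return True if `compiler` compiles a trivial C file with `flag`.

    Hypothetical helper: it only tests whether the flag is accepted,
    not whether VSX intrinsics actually build.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        obj = os.path.join(tmp, "probe.o")
        with open(src, "w") as fh:
            fh.write("int main(void) { return 0; }\n")
        try:
            result = subprocess.run(
                [compiler, flag, "-c", src, "-o", obj],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                check=False,
            )
        except OSError:
            # Compiler binary is missing or cannot be executed.
            return False
        return result.returncode == 0


if __name__ == "__main__":
    # Assumed flags; adjust to whatever the CMake log reports trying.
    for flag in ("-mvsx", "-maltivec"):
        print(flag, compiler_accepts_flag("gcc", flag))
```

If the flags are accepted here but CMake still fails, the problem is more likely in running or linking the test program than in the compiler's flag handling.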

On aarch64 the error is:
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.
Abnormal return value for ' gmx mdrun    -nb cpu   -notunepme >mdrun.out 2>&1' was 1
Retrying mdrun with better settings...
.....
98% tests passed, 1 tests failed out of 46
Label Time Summary:
GTest              =  33.68 sec*proc (40 tests)
IntegrationTest    =   6.53 sec*proc (5 tests)
MpiTest            =   2.22 sec*proc (3 tests)
SlowTest           =  20.76 sec*proc (1 test)
UnitTest           =   6.39 sec*proc (34 tests)
Total Test time (real) = 2697.26 sec
The following tests FAILED:
         43 - regressiontests/kernel (Timeout)
Errors while running CTest

Comment 5 Christoph Junghans 2020-02-06 17:53:57 UTC
Using "mock -r fedora-rawhide-ppc64le --no-clean gromacs-2019.5-2.fc32.1.src.rpm"
I get:
+++ /usr/bin/ps -p 160 -ocomm=
Signal 4 (ILL) caught by ps (3.3.15).
/usr/bin/ps:ps/display.c:66: please report this bug
++ my_shell=

Comment 6 Ben Cotton 2020-02-11 17:06:25 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 32 development cycle.
Changing version to 32.

Comment 7 Christoph Junghans 2020-02-12 03:02:03 UTC
Details on the aarch64 error:
22/27 Test #22: UtilityMpiUnitTests ..............***Failed    0.52 sec
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(586)..............:
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................:
MPID_nem_init(324).................:
MPID_nem_tcp_init(175).............:
MPID_nem_tcp_get_business_card(401):
MPID_nem_tcp_init(373).............: gethostbyname failed, 9642102373514ac7b8330d80c6ee96d2 (errno 0)
Invalid error code (-2) (error ring index 127 invalid)
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(586)..............:
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................:
MPID_nem_init(324).................:
MPID_nem_tcp_init(175).............:
MPID_nem_tcp_get_business_card(401):
MPID_nem_tcp_init(373).............: gethostbyname failed, 9642102373514ac7b8330d80c6ee96d2 (errno 0)

So this seems to be a bug in mpich.
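
The stack above ends in a failed gethostbyname call on what looks like the builder's own hostname. A minimal sketch for checking, inside the same build root, whether the local hostname resolves at all is shown here. The function name can_resolve and the injectable resolver parameter are hypothetical and only exist to make the check easy to exercise; this does not reproduce MPICH's internal logic.

```python
import socket
from typing import Callable


def can_resolve(
    hostname: str,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if `resolver` can map `hostname` to an address.

    socket.gethostbyname raises socket.gaierror or socket.herror
    (both subclasses of OSError) when the lookup fails.
    """
    try:
        resolver(hostname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Check the name the machine reports for itself, as an MPI
    # launcher would typically need to do.
    name = socket.gethostname()
    status = "resolvable" if can_resolve(name) else "NOT resolvable"
    print(f"{name}: {status}")
```

If the builder's hostname is reported as not resolvable, any library that insists on resolving it during initialization would be expected to fail in a similar way.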

Comment 8 Christoph Junghans 2020-02-14 22:39:25 UTC
ppc64le issue reported upstream: https://redmine.gromacs.org/issues/3380

Comment 9 Christoph Junghans 2020-02-14 22:40:13 UTC
(In reply to Christoph Junghans from comment #7)

MPICH issue patched here: https://src.fedoraproject.org/rpms/mpich/pull-request/2

Comment 10 Fedora Release Engineering 2020-02-16 04:26:47 UTC
Dear Maintainer,

your package has not been built successfully in 32. Action is required from you.

If you can fix your package to build, perform a build in koji, and either create
an update in bodhi, or close this bug without creating an update, if updating is
not appropriate [1]. If you are working on a fix, set the status to ASSIGNED to
acknowledge this. Following the latest policy for such packages [2], your package
will be orphaned if this bug remains in NEW state more than 8 weeks.

A week before the mass branching of Fedora 33 according to the schedule [3],
any packages not successfully rebuilt at least on Fedora 31 will be
retired regardless of the status of this bug.

[1] https://fedoraproject.org/wiki/Updates_Policy
[2] https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/
[3] https://fedoraproject.org/wiki/Releases/33/Schedule