Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1556989 - gromacs: EwaldUnitTests and SimdUnitTests fail on ppc64le with gcc-8.0.1
Summary: gromacs: EwaldUnitTests and SimdUnitTests fail on ppc64le with gcc-8.0.1
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: gromacs
Version: 29
Hardware: ppc64le
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Christoph Junghans
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PPCTracker
TreeView+ depends on / blocked
 
Reported: 2018-03-15 17:08 UTC by Christoph Junghans
Modified: 2019-11-27 22:47 UTC (History)
15 users (show)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-27 22:47:29 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
GNU Compiler Collection 84907 0 None None None 2019-06-18 13:35:06 UTC

Description Christoph Junghans 2018-03-15 17:08:44 UTC
Description of problem:
Gromacs' SimdUnitTests fails it:

/builddir/build/BUILD/gromacs-2018/src/gromacs/simd/tests/bootstrap_loadstore.cpp:112: Failure
Value of: pCopyDst[i]
  Actual: 5
Expected: pCopySrc[i]
Which is: 6
SIMD load or store not moving data correctly for element 0

Version-Release number of selected component (if applicable):
gromacs-2018.0 on f28, f29 with gcc-8

How reproducible:
Build gromacs-2018-1.fc29.1 (https://src.fedoraproject.org/rpms/gromacs/c/c3a5a134b34548618e3b98aea66b700b7fb4e122?branch=master) on f28 or f29 (currently with coming with gcc-8)

Steps to Reproduce:
1. fedpkg clone gromacs
2. cd gromacs
3. git checkout c3a5a134b34548618e3b98aea66b700b7fb4e122
4. fedpkg srpm
5. fedpkg build --scratch --srpm gromacs-2018-1.fc29.1.src.rpm

Actual results:
Test fail

Expected results:
Test should pass (and pass on f27 with older gcc)

Additional info:
Reported upstream here https://redmine.gromacs.org/issues/2421, they think it is a compiler bug.

https://koji.fedoraproject.org/koji/taskinfo?taskID=25294220

Comment 1 Jakub Jelinek 2018-03-15 18:27:56 UTC
I see gromacs-2018-1.fc29.2.src.rpm has been built successfully.
So, does the -1.fc29.2 contain some workaround for this?
https://kojipkgs.fedoraproject.org/packages/gromacs/2018/1.fc29.2/data/logs/ppc64le/build.log
has
      Start  7: EwaldUnitTests
 7/24 Test  #7: EwaldUnitTests ...................   Passed    0.41 sec
      Start 23: SimdUnitTests
23/24 Test #23: SimdUnitTests ....................   Passed    0.06 sec

Comment 2 Dan Horák 2018-03-15 20:39:07 UTC
It does have a workaround, if I see right - https://src.fedoraproject.org/rpms/gromacs/c/c5c17791036e12d4ddcd6e017b22f5e7bd33ffcb?branch=master - VSX is not used

Comment 3 Jakub Jelinek 2018-03-16 16:09:29 UTC
This is a gromacs bug.

I assume at minimum you need something like following, where vec_xl and vec_xst are proper intrinsics for unaligned vector loads and stores,
but looking around I see various *oadU* and *toreU* functions/function templates calling non-U suffixed ones, so that might need fixing too.
E.g. gatherLoadUTranspose uses simdLoad rather than simdLoadU etc., that looks wrong to me.

--- gromacs-2018/src/gromacs/simd/impl_ibm_vsx/impl_ibm_vsx_simd_float.h.jj	2017-12-11 13:44:54.000000000 +0100
+++ gromacs-2018/src/gromacs/simd/impl_ibm_vsx/impl_ibm_vsx_simd_float.h	2018-03-16 17:02:45.915186989 +0100
@@ -121,14 +121,14 @@ static inline SimdFloat gmx_simdcall
 simdLoadU(const float *m, SimdFloatTag = {})
 {
     return {
-               *reinterpret_cast<const __vector float *>(m)
+               vec_xl(0, m)
     };
 }
 
 static inline void gmx_simdcall
 storeU(float *m, SimdFloat a)
 {
-    *reinterpret_cast<__vector float *>(m) = a.simdInternal_;
+    vec_xst(a.simdInternal_, 0, m);
 }
 
 static inline SimdFloat gmx_simdcall
@@ -157,14 +157,14 @@ static inline SimdFInt32 gmx_simdcall
 simdLoadU(const std::int32_t *m, SimdFInt32Tag)
 {
     return {
-               *reinterpret_cast<const __vector int *>(m)
+               vec_xl(0, m)
     };
 }
 
 static inline void gmx_simdcall
 storeU(std::int32_t * m, SimdFInt32 a)
 {
-    *reinterpret_cast<__vector int *>(m) = a.simdInternal_;
+    vec_xst(a.simdInternal_, 0, m);
 }
 
 static inline SimdFInt32 gmx_simdcall
--- gromacs-2018/src/gromacs/simd/impl_ibm_vsx/impl_ibm_vsx_simd_double.h.jj	2017-12-11 13:44:54.000000000 +0100
+++ gromacs-2018/src/gromacs/simd/impl_ibm_vsx/impl_ibm_vsx_simd_double.h	2018-03-16 17:05:08.248153217 +0100
@@ -121,14 +121,14 @@ static inline SimdDouble gmx_simdcall
 simdLoadU(const double *m, SimdDoubleTag = {})
 {
     return {
-               *reinterpret_cast<const __vector double *>(m)
+               vec_xl(0, m)
     };
 }
 
 static inline void gmx_simdcall
 storeU(double *m, SimdDouble a)
 {
-    *reinterpret_cast<__vector double *>(m) = a.simdInternal_;
+    vec_xst(a.simdInternal_, 0, m);
 }
 
 static inline SimdDouble gmx_simdcall

Comment 4 Jakub Jelinek 2018-03-16 17:58:16 UTC
Feel free to point upstream at the above mentioned GCC http://gcc.gnu.org/PR84907 , which has been closed as INVALID.  -fsanitize=undefined would have likely diagnosed this bug.

Comment 5 Erik Lindahl 2018-03-16 21:50:03 UTC
When we first wrote this code (just when Power8 appeared), gcc-6 produced buggy assembly when using the intrinsics.

After discussing with Michael Gschwind from IBM (whose team wrote the VSX compiler layer for gcc), we were strongly instructed that we should NOT be using the low-level intrinsics directly, but the prescribed usage was (at least at the time) to simply dereference memory and let the compiler do its job.

Thus, if you claim this is invalid use, I would appreciate a pointer to GCC documentation, authors or mailing lists posts specifying that things have either changed, or that those instructions were incorrect for PPC64le/VSX - since they only instructions at least quite recently were the opposite ;-)

Comment 6 Bill Schmidt 2018-03-16 22:10:46 UTC
Hi Erik,

I'm a co-author and current maintainer of the ELF v2 ABI document that included these instructions.  Michael has sadly moved on from this project.

We've recently published errata against the document when we realized that the recommendations in the ABI actually constitute undefined behavior according to the C specification, and was broken with respect to alignment guarantees in GCC.  You can find the original document at https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture, and the errata describing the changes at https://openpowerfoundation.org/?resource_lib=openpower-elfv2-errata-elfv2-abi-version-1-4.

Your existing code works with GCC 7 and earlier code, but optimizations that take alignment into account in the upcoming GCC 8 release expose its undefined behavior, leading to this bug report.

I apologize for the difficulties that this causes you, but indeed the right answer is to always use the vec_xl and vec_xst intrinsics for unaligned loads and stores.  Please feel free to contact me directly with any questions or concerns.

Thanks,

Bill Schmidt, Ph.D.
STSM and GCC Architect for Linux on Power
Linux Technology Center
IBM Corporation

Comment 7 Erik Lindahl 2018-03-16 22:17:10 UTC
Hi Bill,

Thanks - no need to apologize, I realize things change, and we always appreciate clear definitions.

1) We might not have tested with earlier gcc versions in a while, but does this mean it is now considered a valid bug if the intrinsics don't work e.g. on gcc-6?

2) Can we also count on the intrinsics working for all aligned/unaligned load/store operations on xlC? Looking back at my correspondence, it appears that's where we first saw problems a few years ago.

Cheers,

Erik

Comment 8 Bill Schmidt 2018-03-16 22:22:31 UTC
1) Yes, absolutely.  GCC 6 is still in maintenance until this summer, and we want to be sure everything is working.  To my knowledge, vec_xl and vec_xst are working well in GCC 6.  Please report any bugs at the GCC bugzilla site and add wschmidt.org to the CC list to get my attention directly.

2) To my knowledge, yes, but I know the XLC team will want to know of any problems.  We've worked closely with them on compatibility issues over the years, and I know that they've been actively testing the intrinsics.  If you find problems, I can definitely put you in touch with the right person.

Thanks very much for your kind understanding!  Please let me know if I can be of any further assistance.

Bill

Comment 9 Erik Lindahl 2018-03-16 22:32:39 UTC
Just for reference, fix available at https://gerrit.gromacs.org/#/c/7688/ - it will be in the next Gromacs patch release.

Comment 10 Christoph Junghans 2018-03-20 05:37:12 UTC
Build that included that patch: 
https://koji.fedoraproject.org/koji/taskinfo?taskID=25824985

Comment 11 Christoph Junghans 2018-04-11 20:29:53 UTC
gromacs-2018.1 (bug #1559202) doesn't fix the issue, hopefully gromacs-2018.2 will.

Comment 12 Erik Lindahl 2018-04-11 20:54:24 UTC
yeah, sorry - there was one more fix we needed ( https://gerrit.gromacs.org/#/c/7696/ ) that was uploaded the day after, but unfortunately 2018.1 was pushed out before that was committed on Mar 27. Please test the release-2018 branch for us if you want to make 100% sure - if that doesn't work we'll investigate before we push out 2018.2.

Comment 13 Christoph Junghans 2018-04-11 21:28:58 UTC
(In reply to Erik Lindahl from comment #12)
> yeah, sorry - there was one more fix we needed (
> https://gerrit.gromacs.org/#/c/7696/ ) that was uploaded the day after, but
> unfortunately 2018.1 was pushed out before that was committed on Mar 27.
> Please test the release-2018 branch for us if you want to make 100% sure -
> if that doesn't work we'll investigate before we push out 2018.2.
No worries, I have an eye on https://redmine.gromacs.org/issues/2421

Comment 14 Jan Kurik 2018-08-14 10:10:49 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 29 development cycle.
Changing version to '29'.

Comment 15 Ben Cotton 2019-10-31 19:09:54 UTC
This message is a reminder that Fedora 29 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 29 on 2019-11-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '29'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 29 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 16 Ben Cotton 2019-11-27 22:47:29 UTC
Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.