Bug 1752241 - octave test fails with illegal instruction on s390x
Summary: octave test fails with illegal instruction on s390x
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: openblas
Version: 8.0
Hardware: s390x
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: 8.2
Assignee: Nikola Forró
QA Contact: RHEL CS Apps Subsystem QE
URL:
Whiteboard:
Depends On:
Blocks: ZedoraTracker
Reported: 2019-09-15 02:30 UTC by Orion Poplawski
Modified: 2020-04-28 15:55 UTC (History)
CC List: 12 users

Fixed In Version: openblas-0.3.3-4.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-28 15:55:31 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
fix izamax (628 bytes, patch)
2019-09-16 17:52 UTC, Dan Horák


Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 181498 0 None None None 2019-09-16 15:02:51 UTC
Red Hat Product Errata RHBA-2020:1664 0 None None None 2020-04-28 15:55:35 UTC

Description Orion Poplawski 2019-09-15 02:30:12 UTC
Description of problem:

BUILDSTDERR:   sparse/eigs.m ..................................................fatal: caught signal Illegal instruction -- stopping myself...
BUILDSTDERR: /bin/sh: line 1:   320 Illegal instruction     (core dumped) /bin/sh ../run-octave --norc --silent --no-history -p /builddir/build/BUILD/octave-5.1.0/test/mex /builddir/build/BUILD/octave-5.1.0/test/fntests.m /builddir/build/BUILD/octave-5.1.0/test

Version-Release number of selected component (if applicable):
octave-5.1.0-2

Comment 1 Orion Poplawski 2019-09-15 02:31:11 UTC
Tried to get a backtrace with libSegFault.so to no avail.

Comment 2 Dan Horák 2019-09-15 06:51:24 UTC
I'll try to look. Is it octave from the epel8 branch?

Comment 3 Orion Poplawski 2019-09-16 03:10:45 UTC
Yes.  Thanks.

Comment 4 Dan Horák 2019-09-16 12:55:14 UTC
OK, reproduced locally; below is the traceback.

(gdb) where
#0  0x000003ffa852e1a8 in izamax_k () from /lib64/libopenblas.so.0
#1  0x000003ffa83a7d46 in izamax_ () from /lib64/libopenblas.so.0
#2  0x000003ffa8a3e564 in zlatrs_ () from /lib64/libopenblas.so.0
#3  0x000003ffa8a80f2e in ztrcon_ () from /lib64/libopenblas.so.0
#4  0x000003ffab157ef4 in ComplexMatrix::utsolve (this=this@entry=0x3ffc53f1278, mattype=..., b=..., info=@0x3ffc53f0e54: 0, rcon=@0x3ffc53f0e58: 0, sing_handler=0x0, calc_cond=true, 
    transt=blas_no_trans) at liboctave/array/CMatrix.cc:1566
#5  0x000003ffab15b8b2 in ComplexMatrix::solve (this=this@entry=0x3ffc53f1278, mattype=..., b=..., info=@0x3ffc53f0e54: 0, rcon=@0x3ffc53f0e58: 0, sing_handler=0x0, singular_fallback=true, 
    transt=blas_no_trans) at liboctave/array/CMatrix.cc:1977
#6  0x000003ffab47e478 in lusolve<ComplexMatrix, ComplexMatrix> (L=..., U=..., m=..., m@entry=<error reading variable: value has been optimized out>) at ./liboctave/array/dim-vector.h:285
#7  0x000003ffab48f75c in EigsComplexNonSymmetricMatrixShift<ComplexMatrix> (m=..., sigma=..., k_arg=k_arg@entry=10, p_arg=<optimized out>, info=@0x3ffc53f1734: 0, eig_vec=..., 
    eig_val=..., _b=..., permB=..., cresid=..., os=..., tol=<optimized out>, tol@entry=2.2204460492503131e-16, rvec=false, cholB=false, disp=0, maxit=7) at /usr/include/c++/8/complex:1307
#8  0x000003ff66c8e726 in F__eigs__ (interp=..., args=..., nargout=<optimized out>) at libinterp/dldfcn/__eigs__.cc:457
#9  0x000003ffac6beb5a in octave_builtin::call (this=0x2aa73c688a0, tw=..., nargout=<optimized out>, args=...) at libinterp/octave-value/ov-builtin.cc:71

(gdb) disas
Dump of assembler code for function izamax_k:
...
   0x000003ffa852e182 <+978>:	vrepg	%v5,%v7,1
   0x000003ffa852e188 <+984>:	wfcdb	%v26,%v6
   0x000003ffa852e18e <+990>:	jne	0x3ffa852e1a8 <izamax_k+1016>
   0x000003ffa852e192 <+994>:	vsteg	%v6,160(%r15),0
   0x000003ffa852e198 <+1000>:	vmnlg	%v1,%v5,%v7
   0x000003ffa852e19e <+1006>:	vlgvg	%r5,%v1,0
   0x000003ffa852e1a4 <+1012>:	j	0x3ffa852e1a6 <izamax_k+1014>
=> 0x000003ffa852e1a8 <+1016>:	wfchdb	%v16,%v26,%v6
   0x000003ffa852e1ae <+1022>:	vsel	%v1,%v5,%v7,%v16
   0x000003ffa852e1b4 <+1028>:	vsel	%v0,%v26,%v6,%v16
   0x000003ffa852e1ba <+1034>:	vlgvg	%r5,%v1,0
   0x000003ffa852e1c0 <+1040>:	std	%f0,160(%r15)
   0x000003ffa852e1c4 <+1044>:	cgrjh	%r2,%r11,0x3ffa852e1ce <izamax_k+1054>
   0x000003ffa852e1ca <+1050>:	j	0x3ffa852dede <izamax_k+302>
   0x000003ffa852e1ce <+1054>:	sllg	%r4,%r11,1
   0x000003ffa852e1d4 <+1060>:	ld	%f4,160(%r15)
   0x000003ffa852e1d8 <+1064>:	j	0x3ffa852de8c <izamax_k+220>
   0x000003ffa852e1dc <+1068>:	lghi	%r2,1
   0x000003ffa852e1e0 <+1072>:	j	0x3ffa852de4e <izamax_k+158>
   0x000003ffa852e1e4 <+1076>:	brasl	%r14,0x3ffa837b5d8 <__stack_chk_fail@plt>
End of assembler dump.

Could be a z14 instruction slipping into z13 code, or a similar issue. I haven't checked the openblas build for rhel8/epel8 yet.

Comment 5 Dan Horák 2019-09-16 13:49:20 UTC
0x000003ffa852e1a4 <+1012>:	j	0x3ffa852e1a6 <izamax_k+1014>

looks suspicious: it jumps into the middle of an instruction (+1014 falls inside the 4-byte j at +1012), while it should jump much further, to right after the "std" instruction.

Comment 6 Dan Horák 2019-09-16 14:06:42 UTC
https://github.com/xianyi/OpenBLAS/blob/v0.3.3/kernel/zarch/izamax.c#L188 is the source code in question
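For context, izamax returns the 1-based index of the first element with the largest |Re(z)| + |Im(z)|, which is the BLAS convention (not the true complex modulus). The Python sketch below mirrors only the reference BLAS semantics; it is not the vectorized zarch kernel from izamax.c, and the function name and signature are chosen for illustration. Something like it could serve as an oracle when checking a patched kernel's results.

```python
def izamax(n, zx, incx=1):
    """Reference-style IZAMAX: 1-based index of the first element of the
    strided vector zx with the largest |Re(z)| + |Im(z)|."""
    # Reference BLAS returns 0 for an empty vector or a non-positive stride.
    if n < 1 or incx <= 0:
        return 0
    if n == 1:
        return 1

    def cabs1(z):
        # BLAS uses the cheap 1-norm of the complex number, not abs(z).
        return abs(z.real) + abs(z.imag)

    best_index = 1
    best_value = cabs1(zx[0])
    ix = incx
    for i in range(2, n + 1):
        value = cabs1(zx[ix])
        # Strict '>' keeps the first maximum when several elements tie.
        if value > best_value:
            best_index = i
            best_value = value
        ix += incx
    return best_index
```

For example, izamax(3, [1+1j, -3+0j, 2-2j]) compares 2, 3 and 4 and returns 3.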

Comment 7 Dan Horák 2019-09-16 14:53:44 UTC
Reassigned to RHEL; trying to figure out whether it's an openblas issue or a toolchain issue.

Comment 8 Dan Horák 2019-09-16 17:52:03 UTC
With the fixed openblas I got:

Summary:

  PASS                            15407
  FAIL                                5
  REGRESSION                          1
  XFAIL (reported bug)               28
  SKIP (missing feature)            124
  SKIP (run-time condition)          34

Comment 9 Dan Horák 2019-09-16 17:52:55 UTC
Created attachment 1615586 [details]
fix izamax

Comment 10 Dan Horák 2019-09-16 19:33:23 UTC
I strongly recommend explicitly setting TARGET=Z13 during the build, so the RPMs won't get a different default when the builders move to another machine.

Comment 11 Nikola Forró 2019-09-24 15:05:18 UTC
Thanks Dan.

> I strongly recommend explicitly setting TARGET=Z13 during the build, so the RPMs won't get a different default when the builders move to another machine.

Do you think I should also disable DYNAMIC_ARCH as is the case with other non-x86_64 arches?

Comment 12 Dan Horák 2019-10-01 09:50:48 UTC
AFAIK using DYNAMIC_ARCH is OK, because it builds all variants and selects the right one at runtime. What we should fix are the builds that neither support DYNAMIC_ARCH nor set TARGET explicitly (like s390x).
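The difference between a pinned TARGET and DYNAMIC_ARCH can be modeled with a small Python sketch. Everything in it is hypothetical: the CPU list, the helper names, and the assumption that an unset TARGET falls back to the build host's CPU (the risk raised about builders moving to another machine). It only illustrates why pinning TARGET=Z13 makes the s390x RPM independent of the builder, while DYNAMIC_ARCH builds all variants and picks one at runtime.

```python
# Hypothetical ordering of s390x CPU levels, oldest first (illustration only).
CPU_LEVELS = ["z13", "z14", "z15"]


def level(cpu):
    return CPU_LEVELS.index(cpu)


def fixed_target_build(builder_cpu, target=None):
    # Without DYNAMIC_ARCH a single kernel set is compiled. Assumption: if
    # TARGET is not given, the build falls back to the builder's own CPU.
    return target if target is not None else builder_cpu


def runs_on(kernel_target, machine_cpu):
    # Code built for a given level needs a machine of at least that level.
    return level(machine_cpu) >= level(kernel_target)


def dynamic_arch_select(available, machine_cpu):
    # With DYNAMIC_ARCH every variant is built and, at runtime, the newest
    # one the machine supports is chosen (None if nothing fits).
    usable = [t for t in available if runs_on(t, machine_cpu)]
    return max(usable, key=level, default=None)
```

Under these assumptions, fixed_target_build("z14") yields z14 code that does not run on a z13, whereas fixed_target_build("z14", "z13") yields z13 code that runs on every supported machine.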

Comment 15 IBM Bug Proxy 2019-11-21 12:10:23 UTC
------- Comment From Andreas.Krebbel.com 2019-11-21 07:04 EDT-------
"vlgvg  %[index],%%v1,0  \n\t"
"j 3    \n\t"
"2:     \n\t"

"j 3" is wrong. It must be either "j 3f" or "j 3b". This problem has been fixed in OpenBLAS in February this year.

Comment 16 Nikola Forró 2019-11-21 12:39:33 UTC
> "j 3" is wrong. It must be either "j 3f" or "j 3b". This problem has been fixed in OpenBLAS in February this year.

Yes, the patch changes "j 3" to "j 3f".
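The difference between "j 3" and "j 3f" comes from how GNU as treats numeric local labels: "3f" refers to the nearest following definition of label 3, "3b" to the nearest preceding one, and a bare "3" is just the integer constant 3, not a label reference at all. The Python model below is a deliberately simplified sketch of that lookup (one label per line, only these operand forms); the function name and the instruction lines in the example are illustrative, loosely based on the izamax excerpt quoted above.

```python
import re


def resolve_jump(lines, jump_line):
    """Simplified model of how a jump operand is interpreted.

    Returns ("label", index) for 'Nf'/'Nb' references, where index is the
    line defining 'N:', or ("constant", N) for a bare number."""
    # The operand is the token after the mnemonic, e.g. "3", "3f" or "3b".
    operand = lines[jump_line].split()[1]
    match = re.fullmatch(r"(\d+)([fb]?)", operand)
    if match is None:
        raise ValueError("unsupported operand: " + operand)
    number, direction = match.group(1), match.group(2)
    if direction == "":
        # A bare number is an absolute expression, not a label reference.
        return ("constant", int(number))
    label = number + ":"
    if direction == "f":
        # Forward reference: nearest definition after the jump.
        candidates = range(jump_line + 1, len(lines))
    else:
        # Backward reference: nearest definition before the jump.
        candidates = range(jump_line - 1, -1, -1)
    for i in candidates:
        if lines[i].strip() == label:
            return ("label", i)
    raise ValueError("label %s%s is not defined" % (number, direction))
```

With lines such as ["vlgvg %r5,%v1,0", "j 3", "2:", "wfchdb %v16,%v26,%v6", "std %f0,160(%r15)", "3:"], the bare "j 3" resolves to a constant, while "j 3f" resolves to the "3:" after std, which matches the target Dan expected. The stray constant is consistent with the odd branch target in the disassembly, although this model does not reproduce the actual s390x encoding.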

Comment 17 IBM Bug Proxy 2019-11-21 13:00:21 UTC
------- Comment From Andreas.Krebbel.com 2019-11-21 07:53 EDT-------
(In reply to comment #12)
> > "j 3" is wrong. It must be either "j 3f" or "j 3b". This problem has been fixed in OpenBLAS in February this year.
>
> Yes, the patch changes "j 3" to "j 3f".

Oh right. I missed that.

In upstream OpenBLAS there are a bunch of patches to add z14 support. These also fix a couple of issues with the z13 support. There should be no test suite failures anymore at the upstream level. We will check what needs to be backported and open a separate Bugzilla for this.

Comment 18 Dan Horák 2019-11-21 13:31:48 UTC
IIRC, s390x is the only (RHEL) arch in openblas that can't be built with DYNAMIC_ARCH (aka runtime CPU level detection). Without it, we can only build the z13 variant for RHEL 8, as z13 is the minimum supported arch.

Comment 19 IBM Bug Proxy 2019-11-21 14:10:24 UTC
------- Comment From arnez.com 2019-11-21 09:06 EDT-------
> IIRC s390x is the only (RHEL) arch in openblas that can't build with
> DYNAMIC_ARCH (aka runtime CPU level detection).
Right.  There was a proposed project as part of the OpenMainframeProject's internship program to fix that:
https://github.com/openmainframeproject-internship/resources/blob/master/proposed_projects/OpenBLAS.mdp
But it hasn't been picked up by anyone yet.

Comment 20 IBM Bug Proxy 2019-11-21 14:20:33 UTC
------- Comment From arnez.com 2019-11-21 09:14 EDT-------
Oops, please replace "OpenBLAS.mdp" by "OpenBLAS.md" in the URL above.

Comment 22 errata-xmlrpc 2020-04-28 15:55:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1664

