Bug 1752241
Summary: | octave test fails with illegal instruction on s390x
---|---
Product: | Red Hat Enterprise Linux 8
Reporter: | Orion Poplawski <orion>
Component: | openblas
Assignee: | Nikola Forró <nforro>
Status: | CLOSED ERRATA
QA Contact: | RHEL CS Apps Subsystem QE <rhel-cs-apps-subsystem-qe>
Severity: | low
Priority: | unspecified
Version: | 8.0
CC: | alex, bugproxy, cbm, dan, fkluknav, hannsj_uhl, jaromir.capik, jkejda, mmahut, orion, rakesh.pandit, susi.lehtola
Target Milestone: | rc
Target Release: | 8.2
Hardware: | s390x
OS: | Linux
Fixed In Version: | openblas-0.3.3-4.el8
Last Closed: | 2020-04-28 15:55:31 UTC
Type: | Bug
Bug Blocks: | 467765
Description
Orion Poplawski
2019-09-15 02:30:12 UTC
Tried to get a backtrace with libSegFault.so to no avail.

I'll try to look. Is it octave from epel8 branch?

Yes. Thanks.

OK, reproduced locally, below is the traceback.

(gdb) where
#0 0x000003ffa852e1a8 in izamax_k () from /lib64/libopenblas.so.0
#1 0x000003ffa83a7d46 in izamax_ () from /lib64/libopenblas.so.0
#2 0x000003ffa8a3e564 in zlatrs_ () from /lib64/libopenblas.so.0
#3 0x000003ffa8a80f2e in ztrcon_ () from /lib64/libopenblas.so.0
#4 0x000003ffab157ef4 in ComplexMatrix::utsolve (this=this@entry=0x3ffc53f1278, mattype=..., b=..., info=@0x3ffc53f0e54: 0, rcon=@0x3ffc53f0e58: 0, sing_handler=0x0, calc_cond=true, transt=blas_no_trans) at liboctave/array/CMatrix.cc:1566
#5 0x000003ffab15b8b2 in ComplexMatrix::solve (this=this@entry=0x3ffc53f1278, mattype=..., b=..., info=@0x3ffc53f0e54: 0, rcon=@0x3ffc53f0e58: 0, sing_handler=0x0, singular_fallback=true, transt=blas_no_trans) at liboctave/array/CMatrix.cc:1977
#6 0x000003ffab47e478 in lusolve<ComplexMatrix, ComplexMatrix> (L=..., U=..., m=..., m@entry=<error reading variable: value has been optimized out>) at ./liboctave/array/dim-vector.h:285
#7 0x000003ffab48f75c in EigsComplexNonSymmetricMatrixShift<ComplexMatrix> (m=..., sigma=..., k_arg=k_arg@entry=10, p_arg=<optimized out>, info=@0x3ffc53f1734: 0, eig_vec=..., eig_val=..., _b=..., permB=..., cresid=..., os=..., tol=<optimized out>, tol@entry=2.2204460492503131e-16, rvec=false, cholB=false, disp=0, maxit=7) at /usr/include/c++/8/complex:1307
#8 0x000003ff66c8e726 in F__eigs__ (interp=..., args=..., nargout=<optimized out>) at libinterp/dldfcn/__eigs__.cc:457
#9 0x000003ffac6beb5a in octave_builtin::call (this=0x2aa73c688a0, tw=..., nargout=<optimized out>, args=...) at libinterp/octave-value/ov-builtin.cc:71

(gdb) disas
Dump of assembler code for function izamax_k:
...
0x000003ffa852e182 <+978>: vrepg %v5,%v7,1
0x000003ffa852e188 <+984>: wfcdb %v26,%v6
0x000003ffa852e18e <+990>: jne 0x3ffa852e1a8 <izamax_k+1016>
0x000003ffa852e192 <+994>: vsteg %v6,160(%r15),0
0x000003ffa852e198 <+1000>: vmnlg %v1,%v5,%v7
0x000003ffa852e19e <+1006>: vlgvg %r5,%v1,0
0x000003ffa852e1a4 <+1012>: j 0x3ffa852e1a6 <izamax_k+1014>
=> 0x000003ffa852e1a8 <+1016>: wfchdb %v16,%v26,%v6
0x000003ffa852e1ae <+1022>: vsel %v1,%v5,%v7,%v16
0x000003ffa852e1b4 <+1028>: vsel %v0,%v26,%v6,%v16
0x000003ffa852e1ba <+1034>: vlgvg %r5,%v1,0
0x000003ffa852e1c0 <+1040>: std %f0,160(%r15)
0x000003ffa852e1c4 <+1044>: cgrjh %r2,%r11,0x3ffa852e1ce <izamax_k+1054>
0x000003ffa852e1ca <+1050>: j 0x3ffa852dede <izamax_k+302>
0x000003ffa852e1ce <+1054>: sllg %r4,%r11,1
0x000003ffa852e1d4 <+1060>: ld %f4,160(%r15)
0x000003ffa852e1d8 <+1064>: j 0x3ffa852de8c <izamax_k+220>
0x000003ffa852e1dc <+1068>: lghi %r2,1
0x000003ffa852e1e0 <+1072>: j 0x3ffa852de4e <izamax_k+158>
0x000003ffa852e1e4 <+1076>: brasl %r14,0x3ffa837b5d8 <__stack_chk_fail@plt>
End of assembler dump.

Could be a z14 instruction slipping into z13 code or a similar issue. I haven't checked the openblas build for rhel8/epel8 yet.

0x000003ffa852e1a4 <+1012>: j 0x3ffa852e1a6 <izamax_k+1014>
looks suspicious: it jumps into the middle of the next instruction, while it should jump much further, right after the "std" instruction.

https://github.com/xianyi/OpenBLAS/blob/v0.3.3/kernel/zarch/izamax.c#L188 is the source code in question.

Reassigned to RHEL, trying to figure out if it's an openblas issue or a toolchain issue.

With fixed openblas I've got
Summary:
PASS 15407
FAIL 5
REGRESSION 1
XFAIL (reported bug) 28
SKIP (missing feature) 124
SKIP (run-time condition) 34

Created attachment 1615586
fix izamax
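The suspicious jump noted above can be checked mechanically. Below is a minimal Python sketch, assuming only the instruction start addresses printed in the disassembly; the helper names are made up for illustration and are not part of gdb or OpenBLAS.

```python
# Instruction start addresses copied from the izamax_k disassembly in this report.
INSN_STARTS = [
    0x3ffa852e182, 0x3ffa852e188, 0x3ffa852e18e, 0x3ffa852e192,
    0x3ffa852e198, 0x3ffa852e19e, 0x3ffa852e1a4, 0x3ffa852e1a8,
    0x3ffa852e1ae, 0x3ffa852e1b4, 0x3ffa852e1ba, 0x3ffa852e1c0,
    0x3ffa852e1c4, 0x3ffa852e1ca, 0x3ffa852e1ce, 0x3ffa852e1d4,
    0x3ffa852e1d8, 0x3ffa852e1dc, 0x3ffa852e1e0, 0x3ffa852e1e4,
]


def is_instruction_boundary(target, starts=INSN_STARTS):
    """True if `target` is the first byte of an instruction in the listing."""
    return target in starts


def containing_instruction(target, starts=INSN_STARTS):
    """Return the start of the listed instruction whose bytes cover `target`.

    Returns None when `target` is outside the listed range; the length of the
    last listed instruction is unknown, so it is never reported as containing.
    """
    ordered = sorted(starts)
    for start, following in zip(ordered, ordered[1:]):
        if start <= target < following:
            return start
    return None


bad_target = 0x3ffa852e1a6   # target of the "j" at +1012
good_target = 0x3ffa852e1a8  # target of the "jne" at +990
print(is_instruction_boundary(bad_target), is_instruction_boundary(good_target))
print(hex(containing_instruction(bad_target)))
```

By this listing, 0x3ffa852e1a6 is not an instruction start: it lies between the "j" at 0x3ffa852e1a4 and the "wfchdb" at 0x3ffa852e1a8, whereas the "jne" target 0x3ffa852e1a8 is a proper boundary.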
I strongly recommend explicitly setting TARGET=Z13 during the build, so the rpms won't get a different default when the builders move to another machine.

Thanks Dan.
> I strongly recommend explicitly setting TARGET=Z13 during the build, so the rpms won't get a different default when the builders move to another machine.
Do you think I should also disable DYNAMIC_ARCH as is the case with other non-x86_64 arches?
AFAIK using DYNAMIC_ARCH is OK, because it builds all variants and selects the right one during runtime. What we should fix is the builds that don't support DYNAMIC_ARCH and don't set TARGET explicitly (like s390x).

------- Comment From Andreas.Krebbel.com 2019-11-21 07:04 EDT-------
"vlgvg %[index],%%v1,0 \n\t"
"j 3 \n\t"
"2: \n\t"

"j 3" is wrong. It must be either "j 3f" or "j 3b". This problem has been fixed in OpenBLAS in February this year.

> "j 3" is wrong. It must be either "j 3f" or "j 3b". This problem has been fixed in OpenBLAS in February this year.
Yes, the patch changes "j 3" to "j 3f".
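For readers unfamiliar with numeric local labels in GNU as: a reference needs an "f" (next definition, forward) or "b" (most recent definition, backward) suffix, and a bare "3" is just a number, not a label reference. The following is a toy Python resolver illustrating that rule, not the assembler's actual implementation. Only the first three lines of the example listing come from this report; the position of the "3:" label and the "nop" placeholders are assumptions.

```python
import re


def resolve_local_label(lines, ref_line, ref):
    """Resolve a numeric local label reference such as '3f' or '3b'.

    `lines` is a list of assembly source lines and `ref_line` is the index of
    the line that contains the reference. Returns the index of the line that
    defines the label.
    """
    match = re.fullmatch(r"(\d+)([fb])", ref)
    if match is None:
        # A bare number such as "3" carries no direction suffix.
        raise ValueError(f"{ref!r} is not a local label reference")
    label = match.group(1) + ":"
    if match.group(2) == "f":
        candidates = range(ref_line + 1, len(lines))  # search forward
    else:
        candidates = range(ref_line - 1, -1, -1)      # search backward
    for i in candidates:
        tokens = lines[i].split()
        if tokens and tokens[0] == label:
            return i
    raise LookupError(f"no definition found for {ref!r}")


ASM = [
    "vlgvg %[index],%%v1,0",  # from the report
    "j 3f",                   # from the report, with the fix applied
    "2:",                     # from the report
    "nop",                    # placeholder, not the real kernel body
    "3:",                     # assumed position of the target label
    "nop",                    # placeholder
]

print(resolve_local_label(ASM, 1, "3f"))
```

In this toy listing "3f" on line index 1 resolves to index 4, while passing a bare "3" raises ValueError because it is not a label reference at all.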
------- Comment From Andreas.Krebbel.com 2019-11-21 07:53 EDT-------
(In reply to comment #12)
> > "j 3" is wrong. It must be either "j 3f" or "j 3b". This problem has been fixed in OpenBLAS in February this year.
>
> Yes, the patch changes "j 3" to "j 3f".

Oh right. I missed that.

In upstream OpenBLAS there are a bunch of patches to add z14 support. These also fix a couple of issues with the z13 support. There should be no testsuite failures anymore with the upstream level. We will check what needs to be backported and open a separate Bugzilla for this.

IIRC s390x is the only (RHEL) arch in openblas that can't build with DYNAMIC_ARCH (aka runtime CPU level detection). Without that we can only build the z13 variant for RHEL-8 as it's the minimum supported arch.

------- Comment From arnez.com 2019-11-21 09:06 EDT-------
> IIRC s390x is the only (RHEL) arch in openblas that can't build with
> DYNAMIC_ARCH (aka runtime CPU level detection).

Right. There was a proposed project as part of the OpenMainframeProject's internship program to fix that: https://github.com/openmainframeproject-internship/resources/blob/master/proposed_projects/OpenBLAS.mdp
But it hasn't been picked up by anyone yet.

------- Comment From arnez.com 2019-11-21 09:14 EDT-------
Oops, please replace "OpenBLAS.mdp" by "OpenBLAS.md" in the URL above.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:1664
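The difference between runtime CPU level detection and a single fixed TARGET, as discussed in the comments above, can be modelled roughly as follows. This is a toy Python sketch, not OpenBLAS's dispatch code; the level names and the function are hypothetical.

```python
# CPU levels ordered oldest to newest; labels used only for this sketch.
LEVELS = ("z13", "z14")


def select_variant(detected, built):
    """Pick the newest built kernel variant that the detected CPU can run.

    `detected` is the CPU level of the machine running the library and
    `built` is the set of variants compiled into it.
    """
    runnable = LEVELS[: LEVELS.index(detected) + 1]
    usable = [level for level in runnable if level in built]
    if not usable:
        # Nothing in the library is safe to execute on this CPU.
        raise RuntimeError(f"no built variant can run on {detected}")
    return usable[-1]


# Library built with every variant: each machine gets its best match.
print(select_variant("z14", {"z13", "z14"}))
# Library built for the minimum level only: still runs on newer hardware.
print(select_variant("z14", {"z13"}))
```

In this model a library built only for "z13" runs everywhere at or above that level, while one that silently defaulted to "z14" on a newer builder has no usable variant on a z13 machine, which is the risk an explicit TARGET avoids.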