1064271 – perl-Net-SSLeay tests failing on s390(x) with glibc-2.18.90-21.fc21



Summary: perl-Net-SSLeay tests failing on s390(x) with glibc-2.18.90-21.fc21

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	glibc
Sub Component:
Version:	rawhide
Hardware:	s390x
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Assignee:	Carlos O'Donell
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	ZedoraTracker

Reported:	2014-02-12 10:26 UTC by Dan Horák
Modified:	2016-11-24 12:34 UTC
CC List:	14 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-08-28 00:33:36 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
diff between good and bad buildroots (4.53 KB, text/plain) 2014-02-28 12:04 UTC, Dan Horák	no flags
output when running the test with LD_DEBUG=versions (40.24 KB, text/plain) 2014-03-28 09:35 UTC, Dan Horák	no flags
backtrace (7.84 KB, text/plain) 2014-03-28 11:45 UTC, Michal Toman	no flags

Description Dan Horák 2014-02-12 10:26:28 UTC

tests are failing on s390(x)

...
Executing(%check): /bin/sh -e /var/tmp/rpm-tmp.piSV6w
+ umask 022
+ cd /builddir/build/BUILD
+ cd Net-SSLeay-1.58
+ make test
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'inc', 'blib/lib', 'blib/arch')" t/local/*.t t/handle/local/*.t
t/handle/local/05_use.t ................ 
Failed 1/1 subtests 
t/local/01_pod.t ....................... ok
t/local/02_pod_coverage.t .............. skipped: these tests are for only for release candidate testing. Enable with RELEASE_TESTING=1
t/local/03_use.t ....................... 
Failed 1/1 subtests 
t/local/04_basic.t ..................... 
Failed 6/6 subtests 
t/local/05_passwd_cb.t ................. 
Failed 13/13 subtests 
t/local/06_tcpecho.t ................... 
No subtests run 
t/local/07_sslecho.t ................... 
No subtests run 
t/local/08_pipe.t ...................... 
No subtests run 
t/local/15_bio.t ....................... 
Failed 7/7 subtests 
t/local/20_autoload.t .................. 
No subtests run 
t/local/21_constants.t ................. 
No subtests run 
t/local/30_error.t ..................... 
No subtests run 
t/local/31_rsa_generate_key.t .......... 
No subtests run 
t/local/32_x509_get_cert_info.t ........ 
Failed 1243/1243 subtests 
t/local/33_x509_create_cert.t .......... 
Failed 124/124 subtests 
t/local/34_x509_crl.t .................. 
Failed 41/41 subtests 
t/local/35_ephemeral.t ................. 
Failed 3/3 subtests 
t/local/36_verify.t .................... 
Failed 25/25 subtests 
t/local/37_asn1_time.t ................. 
Failed 10/10 subtests 
t/local/38_priv-key.t .................. 
Failed 10/10 subtests 
t/local/39_pkcs12.t .................... 
Failed 19/19 subtests 
t/local/40_npn_support.t ............... 
No subtests run 
t/local/41_alpn_support.t .............. 
No subtests run 
t/local/50_digest.t .................... 
Failed 230/230 subtests 
t/local/61_threads-cb-crash.t .......... 
No subtests run 
t/local/62_threads-ctx_new-deadlock.t .. 
No subtests run 
t/local/kwalitee.t ..................... skipped: these tests are for only for release candidate testing. Enable with RELEASE_TESTING=1
Test Summary Report
-------------------
t/handle/local/05_use.t              (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 1 tests but ran 0.
t/local/03_use.t                     (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 1 tests but ran 0.
t/local/04_basic.t                   (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 6 tests but ran 0.
t/local/05_passwd_cb.t               (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 13 tests but ran 0.
t/local/06_tcpecho.t                 (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: No plan found in TAP output
t/local/07_sslecho.t                 (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: No plan found in TAP output
t/local/08_pipe.t                    (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: No plan found in TAP output
t/local/15_bio.t                     (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 7 tests but ran 0.
t/local/20_autoload.t                (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: No plan found in TAP output
t/local/21_constants.t               (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: No plan found in TAP output
t/local/30_error.t                   (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: No plan found in TAP output
t/local/31_rsa_generate_key.t        (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: No plan found in TAP output
t/local/32_x509_get_cert_info.t      (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 1243 tests but ran 0.
t/local/33_x509_create_cert.t        (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 124 tests but ran 0.
t/local/34_x509_crl.t                (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 41 tests but ran 0.
t/local/35_ephemeral.t               (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 3 tests but ran 0.
t/local/36_verify.t                  (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 25 tests but ran 0.
t/local/37_asn1_time.t               (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 10 tests but ran 0.
t/local/38_priv-key.t                (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 10 tests but ran 0.
t/local/39_pkcs12.t                  (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 19 tests but ran 0.
t/local/40_npn_support.t             (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: No plan found in TAP output
t/local/41_alpn_support.t            (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: No plan found in TAP output
t/local/50_digest.t                  (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: Bad plan.  You planned 230 tests but ran 0.
t/local/61_threads-cb-crash.t        (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: No plan found in TAP output
t/local/62_threads-ctx_new-deadlock.t (Wstat: 11 Tests: 0 Failed: 0)
  Non-zero wait status: 11
  Parse errors: No plan found in TAP output
Files=28, Tests=2,  1 wallclock secs ( 0.07 usr  0.02 sys +  0.68 cusr  0.12 csys =  0.89 CPU)
Result: FAIL
Failed 25/28 test programs. 0/2 subtests failed.
make: *** [test_dynamic] Error 255
error: Bad exit status from /var/tmp/rpm-tmp.piSV6w (%check)
RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.piSV6w (%check)
Child return code was: 1
EXCEPTION: Command failed. See logs for output.
 # ['bash', '--login', '-c', 'rpmbuild -bb --target s390x --nodeps  builddir/build/SPECS/perl-Net-SSLeay.spec']
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mockbuild/trace_decorator.py", line 70, in trace
    result = func(*args, **kw)
  File "/usr/lib/python2.7/site-packages/mockbuild/util.py", line 376, in do
    raise mockbuild.exception.Error, ("Command failed. See logs for output.\n # %s" % (command,), child.returncode)
Error: Command failed. See logs for output.
 # ['bash', '--login', '-c', 'rpmbuild -bb --target s390x --nodeps  builddir/build/SPECS/perl-Net-SSLeay.spec']
LEAVE do --> EXCEPTION RAISED

for full logs please see http://s390.koji.fedoraproject.org/koji/taskinfo?taskID=1351832


Version-Release number of selected component (if applicable):
perl-Net-SSLeay-1.58-1.fc21

Setting as urgent as it currently blocks any progress on Rawhide for s390(x)

Comment 1 Paul Howarth 2014-02-12 12:41:42 UTC

What was the last version of perl-Net-SSLeay/openssl that built successfully on s390(x)?

Comment 2 Dan Horák 2014-02-12 12:50:32 UTC

the previous perl-Net-SSLeay-1.57-1.fc21 (with openssl-1.0.1e-37.fc21.s390x in buildroot) was OK; for the full history please see
http://s390.koji.fedoraproject.org/koji/packageinfo?packageID=5993

Comment 3 Dan Horák 2014-02-12 12:51:11 UTC

and FWIW perl-Net-SSLeay-1.58-1.fc21 rebuilds fine in F-20

Comment 4 Paul Howarth 2014-02-12 13:14:03 UTC

Does perl-Net-SSLeay-1.57-1.fc21 still build OK?

It looks like the module is failing to load at all, which suggests a toolchain issue, but there's very little difference in the buildroots.

Comment 5 Dan Horák 2014-02-12 13:25:08 UTC

it doesn't - http://s390.koji.fedoraproject.org/koji/taskinfo?taskID=1351908 :-(
and thanks for the hint

the buildroot for perl-Net-SSLeay-1.58-1.fc21 uses the same NVRs as the build on primary, with the exception of pcre, which is rebuilt with a larger stack for tests

Comment 6 Paul Howarth 2014-02-12 18:56:45 UTC

Can you try with the regular version of pcre?

Comment 7 Dan Horák 2014-02-13 09:09:29 UTC

pcre-8.34-3.fc21 is used instead of pcre-8.34-2.fc21; the only difference is http://pkgs.fedoraproject.org/cgit/pcre.git/commit/?id=e73104aed3ff90f784f8ee2d04ede2a94c34e412 - it's only about a larger stack for %check

Comment 8 Paul Howarth 2014-02-13 09:57:46 UTC

Is there a way of testing with pcre-8.34-2.fc21, to try to isolate if that's what's causing the failure? Or otherwise bisecting the buildroot changes that caused a previously-working build to fail?

Comment 9 Dan Horák 2014-02-28 12:04:05 UTC

Created attachment 869010
diff between good and bad buildroots

Comment 10 Dan Horák 2014-02-28 12:46:57 UTC

And the prime suspect is glibc: after downgrading it, the test suite passes again.

Comment 11 Dan Horák 2014-02-28 12:48:02 UTC

and fails too with glibc-2.18.90-22.fc21

Comment 12 Dan Horák 2014-02-28 13:39:38 UTC

and also with glibc-2.18.90-27.fc21

Comment 13 Dan Horák 2014-02-28 13:40:42 UTC

the last working version is glibc-2.18.90-20.fc21

Comment 14 Dan Horák 2014-02-28 14:02:55 UTC

and fails even with glibc-2.19.90-3.fc21, so help is needed

Comment 15 Carlos O'Donell 2014-02-28 14:42:24 UTC

In the past I've had little luck getting an s390 box with rawhide on it. Does someone have a box already set up so I could just log in and run the rpmbuild to see what's failing in the build?

Comment 16 Dan Horák 2014-02-28 14:53:02 UTC

Currently I can provide a rawhide mock chroot where I tried all the various glibc versions during the build. I haven't tried upgrading the F-20 guest to rawhide yet.

Comment 17 Siddhesh Poyarekar 2014-02-28 14:57:21 UTC

The upstream resync from 2.18.90-20 to -21 was essentially glibc-2.18-753-gd5780fe..glibc-2.18-788-g497b1e6.  Below are the S/390 specific commits:


commit 87ded0c382b835e5d7ca8b5e059a8a044a6c3976
Author: Andreas Krebbel <krebbel.ibm.com>
Date:   Tue Jan 7 09:40:39 2014 +0100

    S/390: Remove __tls_get_addr argument cast.

commit c5eebdd084b77b0b581a3aa02213fa7cc5851216
Author: Andreas Krebbel <krebbel.ibm.com>
Date:   Tue Jan 7 09:40:00 2014 +0100

    S/390: Get rid of unused variable warning in dl-machine.h

commit 05d138ef07481b16f1aaee648798cc51182ec65e
Author: Andreas Krebbel <krebbel.ibm.com>
Date:   Tue Jan 7 09:37:31 2014 +0100

    S/390: Make ucontext_t extendible.

commit 93a45ff1ca6d459618bb0cf93580c4b2809a4b61
Author: Andreas Krebbel <krebbel.ibm.com>
Date:   Tue Jan 7 09:36:31 2014 +0100

    S/390: Make jmp_buf extendible.

Comment 18 Dan Horák 2014-03-24 12:25:50 UTC

some new info: the module must be built with the bad glibc for the tests to fail; upgrading/downgrading glibc after building doesn't affect the results

Comment 19 Dan Horák 2014-03-24 14:16:42 UTC

and I can confirm it is one (or more) of the patches from comment 17: glibc-2.18.90-21.fc21 is bad, while glibc-2.18.90-21.fc21 with those 4 patches reverted is good

Comment 20 Dan Horák 2014-03-24 14:48:01 UTC

and

commit 93a45ff1ca6d459618bb0cf93580c4b2809a4b61
Author: Andreas Krebbel <krebbel.ibm.com>
Date:   Tue Jan 7 09:36:31 2014 +0100

    S/390: Make jmp_buf extendible.

is the problem ...

Comment 21 Dan Horák 2014-03-27 16:42:46 UTC

reduced reproducer
- install F-20
- update glibc from http://fedora.danny.cz/s390/glibc-2.18.90-20.fc21.dh.1/ - it is glibc-2.18.90-20.fc21 + commit 93a45ff1
- rpmbuild --rebuild http://fedora.danny.cz/s390/perl-Net-SSLeay-1.58-1.fc21.src.rpm


for every failed test, the following info appears in the kernel log:

[ 6672.505145] User process fault: interruption code 0x6003B in SSLeay.so[3fff6650000+89000]
[ 6672.505155] failing address: 0
[ 6672.505159] CPU: 0 PID: 16420 Comm: perl Not tainted 3.13.6-200.fc20.s390x #1
[ 6672.505162] task: 0000000072333c98 ti: 000000005859c000 task.ti: 000000005859c000
[ 6672.505176] User PSW : 0705000180000000 000003fff66b682c (0x3fff66b682c)
[ 6672.505178]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:0 PM:0 EA:3
User GPRS: 0000000088af88e8 0000000000000000 00000000887cc010 0000000000000000
[ 6672.505185]            000003fffcedc360 0000000000000001 0000000000000020 000003fff66db0e8
[ 6672.505188]            0000000000000000 0000000000000001 0000000088aca5a8 000003fffd1ed3a0
[ 6672.505192]            000003fffcebb000 000003fff66ceff0 000003fff66b6826 000003ffff852bd8
[ 6672.505202] User Code: 000003fff66b681a: e320b0000016        llgf    %r2,0(%r11)
           000003fff66b6820: c0e5fffd273c       brasl   %r14,3fff665b698
          #000003fff66b6826: e31026b80004       lg      %r1,1720(%r2)
          >000003fff66b682c: e31010100004       lg      %r1,16(%r1)
           000003fff66b6832: e32010000002       ltg     %r2,0(%r1)
           000003fff66b6838: e320b0000016       llgf    %r2,0(%r11)
           000003fff66b683e: a78402ed           brc     8,3fff66b6e18
           000003fff66b6842: c0e5fffd272b       brasl   %r14,3fff665b698
[ 6672.505324] Last Breaking-Event-Address:
[ 6672.505328]  [<000003fffcecf64e>] 0x3fffcecf64e

Comment 22 Carlos O'Donell 2014-03-27 21:46:35 UTC

(In reply to Dan Horák from comment #20)
> and
> 
> commit 93a45ff1ca6d459618bb0cf93580c4b2809a4b61
> Author: Andreas Krebbel <krebbel.ibm.com>
> Date:   Tue Jan 7 09:36:31 2014 +0100
> 
>     S/390: Make jmp_buf extendible.
> 
> is the problem ...

I've contacted Andreas upstream and asked him for help looking into this since he is the author of the patch.

Comment 23 Dan Horák 2014-03-28 07:01:03 UTC

(In reply to Carlos O'Donell from comment #22)
> (In reply to Dan Horák from comment #20)
> > and
> > 
> > commit 93a45ff1ca6d459618bb0cf93580c4b2809a4b61
> > Author: Andreas Krebbel <krebbel.ibm.com>
> > Date:   Tue Jan 7 09:36:31 2014 +0100
> > 
> >     S/390: Make jmp_buf extendible.
> > 
> > is the problem ...
> 
> I've contacted Andreas upstream and asked him for help looking into this
> since he is the author of the patch.

oh, I did the same a couple of days ago, but forgot to mention it here :-)

Comment 24 Dan Horák 2014-03-28 09:35:21 UTC

Created attachment 879767
output when running the test with LD_DEBUG=versions

Comment 25 Michal Toman 2014-03-28 11:44:40 UTC

Running the build step by step I've found that a simple 'use Net::SSLeay' causes a segfault when used with the newer glibc. Attaching a backtrace.

Comment 26 Michal Toman 2014-03-28 11:45:13 UTC

Created attachment 879794
backtrace

Comment 27 Dan Horák 2014-03-28 14:45:54 UTC

this function from SSLeay.xs

UV get_my_thread_id(void) /* returns threads->tid() value */
{
    dSP;
    UV tid = 0;
    int count = 0;
 
#ifdef USE_ITHREADS
    ENTER;
    SAVETMPS;
    PUSHMARK(SP);
    XPUSHs(sv_2mortal(newSVpv("threads", 0)));
    PUTBACK;
    count = call_method("tid", G_SCALAR|G_EVAL);
    SPAGAIN;
    if (SvTRUE(ERRSV) || count != 1)
       /* if threads not loaded or an error occurs return 0 */
       tid = 0;
    else
       tid = (UV)POPi;
    PUTBACK;
    FREETMPS;
    LEAVE;
#endif
 
    return tid;
}

expands to

UV get_my_thread_id(void)
{
    SV **sp = (((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Istack_sp);
    UV tid = 0;
    int count = 0;


    Perl_push_scope(((PerlInterpreter *)pthread_getspecific(PL_thr_key)));
    Perl_save_int(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), (int*)&(((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Itmps_floor)), (((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Itmps_floor) = (((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Itmps_ix);
    (void)( { if (++(((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Imarkstack_ptr) == (((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Imarkstack_max)) Perl_markstack_grow(((PerlInterpreter *)pthread_getspecific(PL_thr_key))); *(((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Imarkstack_ptr) = (I32)((sp) - (((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Istack_base)); } );
    ((void)(__builtin_expect((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Istack_max) - sp < (int)(1),0) && (sp = Perl_stack_grow(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), sp,sp,(int) (1)))), *++sp = (Perl_sv_2mortal(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), Perl_newSVpv(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), "threads",0))));
    (((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Istack_sp) = sp;
    count = Perl_call_method(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), "tid",2|8);
    sp = (((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Istack_sp);
    if ((((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? &((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv)))) && ((((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? &((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv))))->sv_flags & 0x00200000) ? Perl_sv_2bool_flags(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), (*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? &((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv))),2) : ( !(((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? &((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv))))->sv_flags & (0x00000100|0x00000200|0x00000400|0x00000800| 0x00001000|0x00002000|0x00004000|0x00008000) || (((svtype)(((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? 
&((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv))))->sv_flags & 0xff)) == SVt_REGEXP || (((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? &((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv))))->sv_flags & (0xff|0x00004000|0x00008000|0x01000000)) == (SVt_PVLV|0x01000000))) ? 0 : (((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? &((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv))))->sv_flags & 0x00000400) ? ( ((XPV*)(((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? &((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv)))))->sv_any) && ( ((XPV*)(((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? 
&((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv)))))->sv_any)->xpv_cur > 1 || ( ((XPV*)(((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? &((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv)))))->sv_any)->xpv_cur && *((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? &((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv))))->sv_u.svu_pv != '0' ) ) ) : (((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? &((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv))))->sv_flags & (0x00000100|0x00000200)) ? ( ((((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? 
&((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv))))->sv_flags & 0x00000100) && ((XPVIV*) ((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? &((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv))))->sv_any)->xiv_u.xivu_iv != 0) || ((((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? &((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv))))->sv_flags & 0x00000200) && ((XPVNV*) ((*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? &((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv))))->sv_any)->xnv_u.xnv_nv != 0.0)) : (Perl_sv_2bool_flags(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), (*((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv ? &((0+((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv))->sv_u.svu_gp)->gp_sv) : &((0+(Perl_gv_add_by_type(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Ierrgv)),SVt_NULL))->sv_u.svu_gp)->gp_sv))),0))))) || count != 1)

       tid = 0;
    else
       tid = (UV)((IV)({SV *_sv = ((SV *)({ void *_p = ((*sp--)); _p; })); ((((_sv)->sv_flags & (0x00000100|0x00200000)) == 0x00000100) ? ((XPVIV*) (_sv)->sv_any)->xiv_u.xivu_iv : Perl_sv_2iv_flags(((PerlInterpreter *)pthread_getspecific(PL_thr_key)), _sv,2)); }));
    (((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Istack_sp) = sp;
    if ((((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Itmps_ix) > (((PerlInterpreter *)pthread_getspecific(PL_thr_key))->Itmps_floor)) Perl_free_tmps(((PerlInterpreter *)pthread_getspecific(PL_thr_key)));
    Perl_pop_scope(((PerlInterpreter *)pthread_getspecific(PL_thr_key)));


    return tid;
}

Comment 28 Andreas Krebbel 2014-04-03 11:03:57 UTC

The change of the jmpbuf size requires that all packages that exchange jmpbufs are upgraded at once.

The sequence you describe above:
"reduced reproducer
- install F-20
- update glibc from http://fedora.danny.cz/s390/glibc-2.18.90-20.fc21.dh.1/ - it is glibc-2.18.90-20.fc21 + commit 93a45ff1
- rpmbuild --rebuild http://fedora.danny.cz/s390/perl-Net-SSLeay-1.58-1.fc21.src.rpm"

fails since you have the perl-Net* package built with the new jmpbuf and perl itself with the old. They both expect different sizes for the same data type.

I'm not sure where the original failure came from, but it probably has to do with the order in which you've upgraded the packages.

I haven't figured out all the details yet but I can confirm that just commenting out the additional fields in /usr/include/bits/setjmp.h makes the problem disappear.

Comment 29 Dan Horák 2014-04-03 11:41:27 UTC

Thanks, Andreas, your explanation makes sense. I'm going to dig into the perl itself first.

Comment 30 Jeff Law 2014-04-03 12:52:47 UTC

Andreas,

Let me make sure I understand. You're saying that code which exchanges jmp_bufs has to be upgraded in lock-step, or it will (possibly silently) fail?

In effect what we've got here is an ABI/API break across the glibc version, right?

Comment 31 Andreas Krebbel 2014-04-03 13:52:53 UTC

Yes.

This is an expected result of the jmpbuf extension. I've tried to minimize the effect by versioning all the accessor functions but symbol versioning is not available for data structures. In the end there is just one header file defining the structure. Code which uses the old header file is not compatible with code using the new header file in case jmpbufs are transferred between the two.

This happened in the past already. E.g. for Power with the introduction of Altivec or with the long double 64->128 bit extension.

Comment 32 Carlos O'Donell 2014-04-04 00:38:01 UTC

(In reply to Andreas Krebbel from comment #31)
> Yes.
> 
> This is an expected result of the jmpbuf extension. I've tried to minimize
> the effect by versioning all the accessor functions but symbol versioning is
> not available for data structures. In the end there is just one header file
> defining the structure. Code which uses the old header file is not
> compatible with code using the new header file in case jmpbufs are
> transferred between the two.
> 
> This happened in the past already. E.g. for Power with the introduction of
> Altivec or with the long double 64->128 bit extension.

We can do better though :-)

All of this could have been handled by using the compiler to generate a .gnu.attribute entry for the new ABI when such a structure was used. Then the static linker could generate a warning when linking mixed ABI objects (undefined + new ABI) or an error (old ABI + new ABI). This results in a much better user experience and the .gnu.attributes track which ABI components are in use (look at ARM which tracks the size of wchar_t).

Nobody likes to do this because it's work and nobody has yet extended the compiler to do this kind of suppression of the "don't care" state to make objects as interoperable as possible.

Background reading:
Binutils documentation on attributes:
https://sourceware.org/binutils/docs-2.21/as/GNU-Object-Attributes.html#GNU-Object-Attributes
Discussion around "don't care attributes"
https://www.sourceware.org/ml/libc-alpha/2011-02/msg00130.html

Comment 33 Andreas Krebbel 2014-04-04 08:03:18 UTC

(In reply to Carlos O'Donell from comment #32)
> All of this could have been handled by using the compiler to generate a
> .gnu.attribute entry for the new ABI when such a structure was used. Then
> the static linker could generate a warning when linking mixed ABI objects
> (undefined + new ABI) or an error (old ABI + new ABI). This results in a
> much better user experience and the .gnu.attributes track which ABI
> components are in use (look at ARM which tracks the size of wchar_t).

So far this has been used solely for indicating ABI relevant changes inflicted by compiler options.  What you propose would be the first use for changes of Glibc data structures. It probably requires some more work to either 
detect all usages of such data structures and compare their definitions within GCC to emit the proper flags
- or -
to provide a language level type attribute to put an abi tag on data structures which is then translated by GCC to the .gnu.attr... stuff (after tracking down all its embedded uses).

While I think that mechanism would have been useful for static linking the situation with dynamic linking and Glibc data structures is a bit better since we have the accessor functions under control. Of course there might be somebody directly accessing a jmpbuf but that's hopefully a very rare case. Due to the symbol versioning of the accessor functions there are only few cases left where this is actually a problem. In general you can dynamically link two objects using different jmpbuf versions. They would use different sets of setjmp/longjmp symbols in glibc and all should be fine. Problems only occur if they pass jmpbuf objects to each other. So the mechanism above would trigger in too many cases to be useful I think.

Note: In fact even passing jmpbufs between .so's isn't a problem currently since the reserved fields are never accessed. The only problem we have right now is if:
1. a jmpbuf is embedded in another data structure (not being the last element)
2. that data structure is shared among modules assuming different jmpbuf sizes

Comment 34 Andreas Krebbel 2014-04-04 08:20:30 UTC

(In reply to Dan Horák from comment #29)
> Thanks, Andreas, your explanation makes sense. I'm going to dig into the
> perl itself first.

To my understanding the problem is that a sigjmp_buf is embedded into the main perl interpreter structure.

cop.h:
struct jmpenv {
    struct jmpenv *     je_prev;
    Sigjmp_buf          je_buf;         /* <---- jmpbuf */
    int                 je_ret;
    bool                je_mustcatch;
};
typedef struct jmpenv JMPENV;

intrpvar.h:
...
PERLVAR(I, top_env,     JMPENV *)
PERLVAR(I, start_env,   JMPENV)        /* <---- !!! */
PERLVARI(I, errors,     SV *,   NULL)
...

The struct interpreter is passed to many .so's involved with perl via the my_perl argument. In one of the examples I've debugged, the problem arose from having perl-version built with the old glibc headers and perl itself built with the new version.

So the /usr/lib64/perl5/vendor_perl/auto/version/vxs/vxs.so module coming from perl-version used different offsets into the my_perl structure than perl itself.

If all the required perl .so files come from RPMs, rebuilding all of them at once should help. What I don't know is whether perl .so files dealing with struct interpreter might come in from other sources as well, like CPAN?!

Comment 35 Carlos O'Donell 2014-04-04 22:00:48 UTC

Andreas,

I've written up "Packaging Changes" notes for this in upstream:
https://sourceware.org/glibc/wiki/Release/2.19#Packaging_Changes

Could you please check in a note to the 2.19 section of the NEWS file in upstream stating that there is an ABI change for s390/s390x? Could you please also backport that to the active 2.19 branch (requires Allan McRae to sign off)?

This way we've covered our bases and made it clear in NEWS and release notes that there is a potential ABI issue coming down the pipe.

I will work within Red Hat to get this information to all of our customers.

(In reply to Andreas Krebbel from comment #33)
> (In reply to Carlos O'Donell from comment #32)
> > All of this could have been handled by using the compiler to generate a
> > .gnu.attribute entry for the new ABI when such a structure was used. Then
> > the static linker could generate a warning when linking mixed ABI objects
> > (undefined + new ABI) or an error (old ABI + new ABI). This results in a
> > much better user experience and the .gnu.attributes track which ABI
> > components are in use (look at ARM which tracks the size of wchar_t).
> 
> So far this has been used solely for indicating ABI relevant changes
> inflicted by compiler options.  What you propose would be the first use for
> changes of Glibc data structures. It probably requires some more work to
> either 
> detect all usages of such data structures and compare their definitions
> within GCC to emit the proper flags
> - or -
> to provide a language level type attribute to put an abi tag on data
> structures which is then translated by GCC to the .gnu.attr... stuff (after
> tracking down all its embedded uses).

That is correct. Nobody wants to be the first to attempt this :-)

Worse is that this only works when building your application.

At runtime, if the library is updated, you need to use an ELF header flag (e_flags) bit or 2 bits to annotate the ABI change, and this allows ldconfig to correctly discover and handle allowing old binaries to load new modules with the new ABI.

Note that this is ABI markup at the object file level for runtime diagnostics, but we really want that data to live at the function and variable level and trickle up. Keeping the ABI markup at the function level for the runtime is probably too costly. Imagine the dynamic loader comparing function ABIs as it resolves PLT entries!

> While I think that mechanism would have been useful for static linking the
> situation with dynamic linking and Glibc data structures is a bit better
> since we have the accessor functions under control. Of course there might be
> somebody directly accessing a jmpbuf but that's hopefully a very rare case.
> Due to the symbol versioning of the accessor functions there are only few
> cases left where this is actually a problem. In general you can dynamically
> link two objects using different jmpbuf versions. They would use different
> sets of setjmp/longjmp symbols in glibc and all should be fine. Problems
> only occur if they pass jmpbuf objects to each other. So the mechanism above
> would trigger in too many cases to be useful I think.

That is correct, but this issue shows that it's actually common to run into these problems changing the size of any of the structures exported for public use by glibc.

Fixing the accessor macros never works perfectly. Too many applications simply embed the jmpbuf directly into another structure, and that structure is eventually used by newer compiled object code which expects the new size, and it fails.

I expect Ruby is going to fail also since it embeds jmp_buf similarly.

> Note: In fact even passing jmpbufs between .so's isn't a problem currently
> since the reserved fields are never accessed. The only problem we have right
> now is if:
> 1. a jmpbuf is embedded in another data structure (not being the last
> element)
> 2. that data structure is shared among modules assuming different jmpbuf
> sizes

That is correct.

Unfortunately this is much more common than you think.

Either way, if we need to extend jmp_buf and struct ucontext we need to do it.

Our primary goals should be:

* Clear communication to our customers of both the benefits and the problems.

* Better diagnostics for mixing code that could result in an ABI breakage.

I think we can and should be doing better on that second bullet point.

Comment 36 Carlos O'Donell 2014-04-04 22:03:00 UTC

(In reply to Andreas Krebbel from comment #34)
> If all the required perl .so files come from RPMs rebuilding all of them at
> once should help. What I don't know is whether perl .so files dealing with
> struct interpreter might come in from other sources as well like CPAN?!

We can only support those modules we build ourselves and distribute with RHEL. In that case we can make sure everything is rebuilt and works. What we can't guarantee is that an old module built by a user works correctly. So any user upgrading to say RHEL8 (hypothetical) will need to rebuild all of their perl modules because of the ABI breakage.

Comment 37 Dan Horák 2014-04-10 13:20:49 UTC

I have restarted rawhide builds and the change seems to be more severe than I thought originally. The perl stack is mixing old and rebuilt modules too often ...

Comment 38 Petr Pisar 2014-04-10 13:46:59 UTC

(In reply to Dan Horák from comment #37)
> I have restarted rawhide builds and the change seems to be more severe than
> I thought originally. The perl stack is mixing old and rebuilt modules too
> often ...

Do you talk about perl.spec itself or about building a Perl package in general? 

In the first case, this should not happen in minimal build root.

In the second case, you have to do the bootstrap, i.e. rebuild the packages in dependency order with the perl_bootstrap spec macro defined and the rebuild_from_scratch macro changed in perl.spec, and you have to treat dual-living packages specially.

Comment 39 Dan Horák 2014-04-11 08:02:55 UTC

(In reply to Petr Pisar from comment #38)
> (In reply to Dan Horák from comment #37)
> > I have restarted rawhide builds and the change seems to be more severe than
> > I thought originally. The perl stack is mixing old and rebuilt modules too
> > often ...
> 
> Do you talk about perl.spec itself or about building a Perl package in
> general? 

perl-5.18.2-297.fc21 build went fine, thanks for the fix. The problem lies in apps that use perl (e.g. automake) and in additional perl modules.

> In the first case, this should not happen in minimal build root.
> 
> In the second case, you have to do the bootstrap. I.e. to rebuild the
> packages in dependency order and with defined perl_bootstrap spec macro and
> with changed rebuild_from_scratch macro in perl.spec and you have to treat
> dual-living packages specially.

Yeah, I'm thinking about some kind of bootstrap. Unfortunately doing such thing solely for a secondary arch is difficult, so I'm thinking about the options.

Comment 40 Carlos O'Donell 2014-08-28 00:33:36 UTC

This is now fixed as IBM have reverted their patches and we've synchronized with upstream.
