
Bug 1677602

Summary: pseudo-RNG mis-compiled with gcc9
Product: [Fedora] Fedora
Component: gcc
Version: rawhide
Hardware: s390x
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Reporter: Dan Horák <dan>
Assignee: Jakub Jelinek <jakub>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: aoliva, bugproxy, davejohansen, dmalcolm, fweimer, hannsj_uhl, ingvar, jakub, jwakely, law, mpolacek, msebor, nickc
Fixed In Version: gcc-9.0.1-0.6.fc30
Last Closed: 2019-03-01 13:02:29 UTC
Type: Bug
Bug Blocks: 467765, 1675181
Attachments:
reduced standalone test case (flags: none)
preprocessed reduced standalone test case (flags: none)

Description Dan Horák 2019-02-15 10:46:02 UTC
Created attachment 1535113 [details]
reduced standalone test case

Description of problem:
It looks like gcc9 mis-compiles a pseudo-RNG that is part of the jemalloc test suite (https://github.com/jemalloc/jemalloc/blob/dev/test/unit/SFMT.c and https://github.com/jemalloc/jemalloc/blob/dev/test/src/SFMT.c).

When the do_recursion() function in the attached source code is compiled with -O0, the check passes.


Version-Release number of selected component (if applicable):
gcc-9.0.1-0.4.fc30.s390x

How reproducible:


Steps to Reproduce:
1. gcc -o test -O2 -Wall test.c
2. ./test

Actual results:
Output mismatch for i=2
Output mismatch for i=6


Expected results:
no output

Comment 1 Dan Horák 2019-02-15 10:46:37 UTC
Created attachment 1535114 [details]
preprocessed reduced standalone test case

Comment 2 Jakub Jelinek 2019-02-15 13:39:59 UTC
Needs -march=zEC12 -mtune=z13 -O2 to reproduce (admittedly I haven't tried other arches; the default I had in my cross compiler didn't reproduce it).
Started with http://gcc.gnu.org/r266203 . Let me manually bisect which function is affected.

Comment 3 Dan Horák 2019-02-15 13:49:41 UTC
do_recursion() breaks the generator

Comment 4 Jakub Jelinek 2019-02-15 13:52:02 UTC
Yeah, verified that too: taking the code produced by r266202 (which works) and patching in the r266203 do_recursion makes it fail, while the r266203 versions of init_gen_rand and init_by_array are fine (those are the only two other changed functions).

Comment 5 Jakub Jelinek 2019-02-15 14:42:24 UTC
Reduced testcase:
#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 && __CHAR_BIT__ == 8
struct S { unsigned int u[4]; };

static void
foo (struct S *out, struct S const *in, int shift)
{
  unsigned long long th, tl, oh, ol;
  th = ((unsigned long long) in->u[3] << 32) | in->u[2];
  tl = ((unsigned long long) in->u[1] << 32) | in->u[0];
  oh = th >> (shift * 8);
  ol = tl >> (shift * 8);
  ol |= th << (64 - shift * 8);
  out->u[1] = ol >> 32;
  out->u[0] = ol;
  out->u[3] = oh >> 32;
  out->u[2] = oh;
}

static void
bar (struct S *out, struct S const *in, int shift)
{
  unsigned long long th, tl, oh, ol;
  th = ((unsigned long long) in->u[3] << 32) | in->u[2];
  tl = ((unsigned long long) in->u[1] << 32) | in->u[0];
  oh = th << (shift * 8);
  ol = tl << (shift * 8);
  oh |= tl >> (64 - shift * 8);
  out->u[1] = ol >> 32;
  out->u[0] = ol;
  out->u[3] = oh >> 32;
  out->u[2] = oh;
}

__attribute__((noipa)) static void
baz (struct S *r, struct S *a, struct S *b, struct S *c, struct S *d)
{
  struct S x, y;
  bar (&x, a, 1);
  foo (&y, c, 1);
  r->u[0] = a->u[0] ^ x.u[0] ^ ((b->u[0] >> 11) & 0xdfffffefU) ^ y.u[0] ^ (d->u[0] << 18);
  r->u[1] = a->u[1] ^ x.u[1] ^ ((b->u[1] >> 11) & 0xddfecb7fU) ^ y.u[1] ^ (d->u[1] << 18);
  r->u[2] = a->u[2] ^ x.u[2] ^ ((b->u[2] >> 11) & 0xbffaffffU) ^ y.u[2] ^ (d->u[2] << 18);
  r->u[3] = a->u[3] ^ x.u[3] ^ ((b->u[3] >> 11) & 0xbffffff6U) ^ y.u[3] ^ (d->u[3] << 18);
}

int
main ()
{
  struct S a[] = { { 0x000004d3, 0xbc5448db, 0xf22bde9f, 0xebb44f8f },
		   { 0x03a32799, 0x60be8246, 0xa2d266ed, 0x7aa18536 },
		   { 0x15a38518, 0xcf655ce1, 0xf3e09994, 0x50ef69fe },
		   { 0x88274b07, 0xe7c94866, 0xc0ea9f47, 0xb6a83c43 },
		   { 0xcd0d0032, 0x5d47f5d7, 0x5a0afbf6, 0xaea87b24 },
		   { 0, 0, 0, 0 } };
  baz (&a[5], &a[0], &a[1], &a[2], &a[3]);
  if (a[4].u[0] != a[5].u[0] || a[4].u[1] != a[5].u[1]
      || a[4].u[2] != a[5].u[2] || a[4].u[3] != a[5].u[3])
    __builtin_abort ();
  return 0;
}
#else
int
main ()
{
  return 0;
}
#endif

Comment 6 Jakub Jelinek 2019-02-15 14:52:31 UTC
When miscompiled, a[5].u[2] is 0xa40afbf6 instead of expected 0x5a0afbf6 (different upper byte).

Comment 7 Jakub Jelinek 2019-02-15 17:14:12 UTC
So, I think the problem is in the
        rxsbg   %r1,%r11,40,63,56
instruction. %r11 holds the correct value here, 0x50ef69fef3e09994ULL, and we want to perform %r1_SI ^= (SI) (%r11_DI >> 8), but instead of XORing in 0xfef3e099 it XORs in 0xf3e099.
In *.final it is:
(insn 67 65 68 2 (parallel [
            (set (reg:SI 1 %r1 [189])
                (xor:SI (subreg:SI (zero_extract:DI (reg/v:DI 11 %r11 [orig:89 th ] [89])
                            (const_int 32 [0x20])
                            (const_int 24 [0x18])) 4)
                    (reg:SI 1 %r1 [187])))
            (clobber (reg:CC 33 %cc))
        ]) "rh1677602.c":42:73 1415 {*rxsbg_sidi_srl}
     (expr_list:REG_DEAD (reg/v:DI 11 %r11 [orig:89 th ] [89])
        (expr_list:REG_UNUSED (reg:CC 33 %cc)
            (nil))))
which probably looks correct: zero_extract counts the bits in memory order, so for a 64-bit big-endian number we want to skip the first 24 bits, then use 32 bits, and finally skip the last 8 bits.

Comment 8 Jakub Jelinek 2019-03-01 13:02:29 UTC
Should be fixed in gcc-9.0.1-0.6.fc30.s390x and later.