Bug 2273618

Summary: Optimizing with -O2 causes wrong results on s390x
Product: [Fedora] Fedora
Reporter: Jonas Ådahl <jadahl>
Component: gcc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 40
CC: dan, dmalcolm, fweimer, jakub, jlaw, josmyers, jwakely, mcermak, mpolacek, msebor, nickc, nixuser, sipoyare
Hardware: s390x
OS: Linux
Fixed In Version: gcc-14.0.1-0.14.fc41
Last Closed: 2024-04-12 13:45:04 UTC
Bug Blocks: 467765
Attachments: Reproducer (flags: none)

Description Jonas Ådahl 2024-04-05 11:00:05 UTC
While investigating faulty rendering in GNOME Shell on s390x, I discovered that compiling mutter with -O0 made the issue go away.

I eventually narrowed it down to a function that did a memcpy from a local float array to a stack-allocated float array in a callee.

I could also work around it in three ways:

* Put #pragma GCC optimize ("O0") around the affected function.
* Mark the float array being copied from as volatile.
* Replace the memcpy with a for loop.

With that in mind, I took the relevant code and removed as much as I could while still reproducing the problem. The memcpy alone isn't enough; it needs a bit of surrounding "noise" to reproduce.

Attaching a C file that reproduces the issue. When run, it exits cleanly if the bug doesn't reproduce. If it does reproduce, it prints:

1.000000 == 0.000000 failed
Aborted (core dumped)

The three discovered workarounds are included in the C file, hidden behind `#if 0`.
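
For readers without access to the attachment, here is a minimal sketch of what two of the workarounds (the optimize pragma and the for loop) might look like. It is not the attached reproducer and is not expected to trigger the bug; the function names `fill_unoptimized`, `fill_with_loop`, `is_expected` and `check_workarounds` are hypothetical, and the array contents are borrowed from the values used elsewhere in this report. The volatile variant is left out because passing a volatile array to memcpy needs a cast whose validity is debatable.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the affected function. These names are
   illustrative only and do not come from mutter or the attachment. */

/* Workaround 1: compile only this function without optimization. */
#pragma GCC push_options
#pragma GCC optimize ("O0")
static void
fill_unoptimized (float *dst)
{
  float src[4] = { 0.0f, 0.0f, 1.0f, 1.0f };
  memcpy (dst, src, sizeof src);
}
#pragma GCC pop_options

/* Workaround 3: copy element by element instead of calling memcpy. */
static void
fill_with_loop (float *dst)
{
  float src[4] = { 0.0f, 0.0f, 1.0f, 1.0f };
  for (int i = 0; i < 4; i++)
    dst[i] = src[i];
}

/* Returns 1 if dst holds the values both fill functions should write. */
static int
is_expected (const float *dst)
{
  return dst[0] == 0.0f && dst[1] == 0.0f
         && dst[2] == 1.0f && dst[3] == 1.0f;
}

/* Runs both variants into a caller-owned buffer and checks the result. */
static void
check_workarounds (void)
{
  float out[4];

  fill_unoptimized (out);
  assert (is_expected (out));

  fill_with_loop (out);
  assert (is_expected (out));
}
```

Calling `check_workarounds ()` from `main` and building with -O2 would exercise both variants. The push_options/pop_options pair is one way to limit the pragma to a single function; compilers that don't understand these pragmas typically just warn and ignore them.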

Reproducible: Always

Comment 1 Jonas Ådahl 2024-04-05 11:00:46 UTC
Created attachment 2025354
Reproducer

Comment 2 Dan Horák 2024-04-05 11:45:42 UTC
Jonas, could you also make the attachment public? Thanks.

Comment 3 Jonas Ådahl 2024-04-05 11:55:08 UTC
(In reply to Dan Horák from comment #2)
> Jonas, could you also make the attachment public? Thanks.

Done; sorry about that.

Comment 4 Dan Horák 2024-04-05 12:05:02 UTC
Thanks. For the record, it reproduces on z14 with gcc-14.0.1-0.13.fc41.s390x, but not with gcc-13.2.1-4.fc38.s390x.

Comment 5 Jakub Jelinek 2024-04-05 12:11:44 UTC
Simplified for -march=z13 -O0:

typedef struct { const float *a; int b, c; float *d; } S;

__attribute__((noipa)) void
bar (void)
{
}

__attribute__((noinline, optimize (2))) static void
foo (S *e)
{
  const float *f;
  float *g;
  float h[4] = { 0.0, 0.0, 1.0, 1.0 };
  if (!e->b)
    f = h;
  else
    f = e->a;
  g = &e->d[0];
  __builtin_memcpy (g, f, sizeof (float) * 4);
  bar ();
  if (!e->b)
    if (g[0] != 0.0 || g[1] != 0.0 || g[2] != 1.0 || g[3] != 1.0)
      __builtin_abort ();
}

int
main ()
{
  float d[4];
  S e = { .d = d };
  foo (&e);
  return 0;
}

Bisecting now.

Comment 6 Jakub Jelinek 2024-04-05 13:10:07 UTC
Bisected to https://gcc.gnu.org/r14-5831