Bug 1864107

Summary:

m4: FTBFS in Fedora rawhide/f33

Product:

[Fedora] Fedora

Reporter:

Fedora Release Engineering <releng>

Component:

m4

Assignee:

Vitezslav Crhonek <vcrhonek>

Status:

CLOSED RAWHIDE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Priority:

unspecified

Version:

33

CC:

codonell, mpolacek, praiskup, vcrhonek

Keywords:

Reopened

Hardware:

Unspecified

OS:

Unspecified

Fixed In Version:

m4-1.4.18-16.fc34

Last Closed:

2020-10-13 09:47:44 UTC

Bug Blocks:

1803234

Attachments:

Description	Flags
build.log	none
root.log	none
state.log	none
test-float.i	none

Description Fedora Release Engineering 2020-08-03 18:00:23 UTC

m4 failed to build from source in Fedora rawhide/f33

https://koji.fedoraproject.org/koji/taskinfo?taskID=47996902


For details on the mass rebuild see:

https://fedoraproject.org/wiki/Fedora_33_Mass_Rebuild
Please fix m4 at your earliest convenience and set the bug's status to
ASSIGNED when you start fixing it. If the bug remains in NEW state for 8 weeks,
m4 will be orphaned. Before branching of Fedora 34,
m4 will be retired, if it still fails to build.

For more details on the FTBFS policy, please visit:
https://fedoraproject.org/wiki/Fails_to_build_from_source

Comment 1 Fedora Release Engineering 2020-08-03 18:00:25 UTC

Created attachment 1705778 [details]
build.log

file build.log too big, will only attach last 32768 bytes

Comment 2 Fedora Release Engineering 2020-08-03 18:00:26 UTC

Created attachment 1705779 [details]
root.log

file root.log too big, will only attach last 32768 bytes

Comment 3 Fedora Release Engineering 2020-08-03 18:00:27 UTC

Created attachment 1705780 [details]
state.log

Comment 4 Vitezslav Crhonek 2020-08-04 07:43:23 UTC

make check fails on ppc64le:

../build-aux/test-driver: line 107: 3320412 Aborted                 (core dumped) "$@" > $log_file 2>&1
FAIL: test-float

Comment 5 Vitezslav Crhonek 2020-08-04 09:10:57 UTC

test-float.c:318: assertion 'm + m > m' failed

Program received signal SIGABRT, Aborted.
0x00007ffff7d88f04 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7d88f04 in raise () from /lib64/libc.so.6
#1  0x00007ffff7d69868 in abort () from /lib64/libc.so.6
#2  0x0000000100000e20 in test_float () at test-float.c:165
#3  main () at test-float.c:359

Comment 6 Vitezslav Crhonek 2020-08-06 13:53:27 UTC

Worked around by disabling %check on ppc64le for now.

Comment 7 Carlos O'Donell 2020-08-10 21:21:24 UTC

Reopening. This is a bug in gnulib's detection of a working float.h. You're going to need to update this.

When run on POWER9 hardware I see the following:

cat test-float.log
test-float.c:318: assertion 'm + m > m' failed
FAIL test-float (exit status: 134)

Adding instrumentation I see the following:

LDBL_MAX = inf
m = inf
test-float.c:320: assertion 'm + m > m' failed
Aborted (core dumped)

This can't be right and looks like a compiler issue.

We should not have ended up with LDBL_MAX being equal to INF to start with.

Unfortunately I can't reduce this; the relevant code, once extracted, works as intended.

Leading up to the assert:

=> 0x0000000100000c8c <+876>:	addis   r9,r2,-2
   0x0000000100000c90 <+880>:	addi    r9,r9,-25152
   0x0000000100000c94 <+884>:	lfd     f0,0(r9)
   0x0000000100000c98 <+888>:	lfd     f1,8(r9)
   0x0000000100000c9c <+892>:	stfd    f0,304(r1)
   0x0000000100000ca0 <+896>:	stfd    f1,312(r1)
   0x0000000100000ca8 <+904>:	lfd     f1,304(r1)
   0x0000000100000cac <+908>:	lfd     f2,312(r1)
   0x0000000100000cb0 <+912>:	lfd     f3,304(r1)
   0x0000000100000cb4 <+916>:	lfd     f4,312(r1)

Parameters should be f1-f4.

$f1 == inf
$f2 == 0

$f3 == inf
$f4 == 0

So we are about to do "m + m" and the value of m is already wrong.

   0x0000000100000cb8 <+920>:	bl      0x100001758 <__gcc_qadd+8>

Do the add.

   0x0000000100000cbc <+924>:	nop
   0x0000000100000cc0 <+928>:	lfd     f0,304(r1)

Reload half of m.

   0x0000000100000cc4 <+932>:	fmr     f12,f1
   0x0000000100000cc8 <+936>:	fmr     f13,f2

Move result from f1/f2 to f12/f13.

   0x0000000100000ccc <+940>:	lfd     f1,312(r1)

Reload other half of m.

=> 0x0000000100000cd0 <+944>:	fcmpu   cr0,f12,f0

Compare INF to INF and the assert (INF + INF > INF) fails.

What is odd is that 304/312 + r1 is stored to by this earlier sequence (you see it in the original disassembly):

=> 0x0000000100000c8c <+876>:	addis   r9,r2,-2
   0x0000000100000c90 <+880>:	addi    r9,r9,-25152
   0x0000000100000c94 <+884>:	lfd     f0,0(r9)
   0x0000000100000c98 <+888>:	lfd     f1,8(r9)

Address 0+$r9 is 0x100001cc0 and it's here:

100000000-100010000 r-xp 00000000 fd:00 1774810                          /root/rpmbuild/BUILD/m4-1.4.18/tests/test-float
100010000-100020000 r--p 00000000 fd:00 1774810                          /root/rpmbuild/BUILD/m4-1.4.18/tests/test-float
100020000-100030000 rw-p 00010000 fd:00 1774810                          /root/rpmbuild/BUILD/m4-1.4.18/tests/test-float

That value is in the executable image, probably a constant pool.

It's odd that we'd load INF from a constant pool that should contain LDBL_MAX.

The pre-processed source is more interesting:

  {
    volatile long double m =
# 315 "test-float.c" 3
                            (gl_LDBL_MAX.ld)
# 315 "test-float.c"
                                    ;
    int n;

    do { if (!(m + m > m)) { fprintf (
# 318 "test-float.c" 3 4
   stderr
# 318 "test-float.c"
   , "%s:%d: assertion '%s' failed\n", "test-float.c", 318, "m + m > m");
# 318 "test-float.c" 3
   rpl_fflush
# 318 "test-float.c"
   (
# 318 "test-float.c" 3 4
   stderr
# 318 "test-float.c"
   ); abort (); } } while (0);

It looks like we're triggering the generation and inclusion of lib/float.h, and that doesn't work.

union gl_long_double_union
  {
    struct { double hi; double lo; } dd;
    long double ld;
  };
extern const union gl_long_double_union gl_LDBL_MAX;
# define LDBL_MAX (gl_LDBL_MAX.ld)

const union gl_long_double_union gl_LDBL_MAX =
  { { DBL_MAX, DBL_MAX / (double)134217728UL / (double)134217728UL } };

Eventually this loaded value is invalid.

This is either a compiler problem or a problem in the gnulib float.h headers.

Comment 8 Carlos O'Donell 2020-08-11 03:11:39 UTC

Created attachment 1711030 [details]
test-float.i

Attaching pre-processed test-float.i

Comment 9 Carlos O'Donell 2020-08-11 03:52:50 UTC

Removing float.h from inclusion reveals the next problem.

test-float.c:324: assertion 'x + x == x' failed
Aborted (core dumped)

This is what I was expecting given my review of the code.

#include <stdio.h>
#include <assert.h>
#include <float.h>
#include <math.h>

int
main (void)
{
  int n = 107;
  volatile long double m = LDBL_MAX;
  volatile long double pow2_n = powl (2, n);
  volatile long double x = m + (m / pow2_n);

  printf ("n = %d\n", n);
  printf ("m = %Lf (%La)\n", m, m);
  printf ("pow2_n = %Lf (%La)\n", pow2_n, pow2_n);
  printf ("m / pow2_n = %Lf (%La)\n", (m / pow2_n), (m / pow2_n));
  printf ("x = %Lf (%La)\n", x, x);

  if (x > m)
    assert (x + x == x);
  return 0;
}

gcc -o ~/test-ldbl-max ~/test-ldbl-max.c -lm

~/test-ldbl-max
n = 107
m = 179769313486231580793728971405301199252069012264752390332004544495176179865349768338004270583473493681874097135387894924752516923758125018237039690323659469736010689648748751591634331824498526377862231967249520608291850653495428451067676993116107021027413767397958053860876625383538022115414866471826801819648.000000 (0x1.fffffffffffff7ffffffffffff8p+1023)
pow2_n = 162259276829213363391578010288128.000000 (0x1p+107)
m / pow2_n = 1107913932560222581216724223049124694376931327937918798971295069363205703164244740389102844506567402654244799528342026118673562844811584683014545030137100678976901567468093855075985516353544747282849589098225960074532039651619564827101237983225846137075291097947344654582153216.000000 (0x1.fffffffffffff7ffffffffffff8p+916)
x = 179769313486231580793728971405301199252069012264752390332004544495176179865349768338004270583473493681874097135387894924752516923758125018237039690323659469736010689648748751591634331824498526377862231967249520608291850653495428451067676993116107021027413767397958053860876625383538022115414866471826801819648.000000 (0x1.fffffffffffff7ffffffffffffcp+1023)
test-ldbl-max: /root/test-ldbl-max.c:21: main: Assertion `x + x == x' failed.
Aborted (core dumped)

There is a representable value that is in theory larger than LDBL_MAX and so we assert.

Note that x > m, because 0x1.fffffffffffff7ffffffffffffcp+1023 > 0x1.fffffffffffff7ffffffffffff8p+1023, but x + x is most certainly INF, not x.

Is this a problem with __LDBL_MAX__ as defined by the compiler?

Comment 10 Carlos O'Donell 2020-08-11 03:54:01 UTC

Marek, What do you make of the test case in comment #9?

Comment 11 Ben Cotton 2020-08-11 14:16:32 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle.
Changing version to 33.

Comment 12 Marek Polacek 2020-08-11 16:06:30 UTC

(In reply to Carlos O'Donell from comment #10)
> Marek, What do you make of the test case in comment #9?

Looks like there indeed is a bug in GCC: https://gcc.gnu.org/PR95450.  It hasn't been fixed yet.  I'll try to bisect it.

Comment 13 Vitezslav Crhonek 2020-08-18 10:30:53 UTC

Thank you very much for your investigation of the issue. I'll remove the workaround when the bug is fixed in GCC.

Comment 14 Vitezslav Crhonek 2020-10-13 09:47:44 UTC

The bug in GCC has been fixed; the workaround is no longer needed.