Bug 1799842
| Summary: | pacemaker: FTBFS in Fedora rawhide/f32 | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Fedora Release Engineering <releng> |
| Component: | pacemaker | Assignee: | Jan Pokorný [poki] <jpokorny> |
| Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 32 | CC: | andrew, anprice, dan, hannsj_uhl, jpokorny, lhh, mhroncok |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | pacemaker-2.0.3-4.fc33 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-03-06 17:32:16 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1811158, 1799903 | | |
| Bug Blocks: | 485231, 1750908, 1803234, 1785415, 1792464 | | |
| Attachments: | | | |
Description
Fedora Release Engineering
2020-02-06 19:07:43 UTC
Created attachment 1660228 [details]
build.log
Created attachment 1660229 [details]
root.log
file root.log too big, will only attach last 32768 bytes
Created attachment 1660230 [details]
state.log
re [comment 2]:

> DEBUG util.py:596: Error:
> DEBUG util.py:596: Problem: package publican-4.3.2-14.fc31.noarch requires fop, but none of the providers can be installed
> DEBUG util.py:596: - package fop-2.2-4.fc30.noarch requires avalon-framework >= 4.1.4, but none of the providers can be installed
> DEBUG util.py:596: - conflicting requests
> DEBUG util.py:596: - nothing provides mvn(avalon-logkit:avalon-logkit) needed by avalon-framework-4.3-24.fc31.noarch

This is a chained dependency problem, immediately coming from publican; see the respective [bug 1799903 comment 4] for a more complete story. Marking this immediate dependency here -- nothing we can do about the FTBFS state right away (well, except for excluding documentation from what we ship in Fedora, which is rather an extreme workaround).

This bug appears to have been reported against 'rawhide' during the Fedora 32 development cycle. Changing version to 32.

Dear Maintainer, your package has not been built successfully in 32. Action is required from you. If you can fix your package to build, perform a build in koji, and either create an update in bodhi, or close this bug without creating an update, if updating is not appropriate [1]. If you are working on a fix, set the status to ASSIGNED to acknowledge this. Following the latest policy for such packages [2], your package will be orphaned if this bug remains in NEW state more than 8 weeks. A week before the mass branching of Fedora 33 according to the schedule [3], any packages not successfully rebuilt at least on Fedora 31 will be retired regardless of the status of this bug.

[1] https://fedoraproject.org/wiki/Updates_Policy
[2] https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/
[3] https://fedoraproject.org/wiki/Releases/33/Schedule

pacemaker fails to build with Python 3.9.0a3 due to this Python-unrelated FTBFS with gcc 10:

...
/usr/bin/ld: pacemaker_attrd-attrd_utils.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:109: multiple definition of `attrd_cluster'; pacemaker_attrd-pacemaker-attrd.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:109: first defined here
/usr/bin/ld: pacemaker_attrd-attrd_alerts.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:110: multiple definition of `attributes'; pacemaker_attrd-pacemaker-attrd.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:110: first defined here
/usr/bin/ld: pacemaker_attrd-attrd_alerts.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:109: multiple definition of `attrd_cluster'; pacemaker_attrd-pacemaker-attrd.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:109: first defined here
/usr/bin/ld: pacemaker_attrd-attrd_elections.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:109: multiple definition of `attrd_cluster'; pacemaker_attrd-pacemaker-attrd.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:109: first defined here
/usr/bin/ld: pacemaker_attrd-attrd_elections.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:110: multiple definition of `attributes'; pacemaker_attrd-pacemaker-attrd.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:110: first defined here
/usr/bin/ld: warning: /usr/lib/gcc/x86_64-redhat-linux/10/../../../../lib64/libqb.so contains output sections; did you forget -T?
collect2: error: ld returned 1 exit status

See https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/RYVPP45PMWPXYBBDKXO3CI7YGL7CDQG6/ and https://gcc.gnu.org/gcc-10/porting_to.html#common for more information about the failure.
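The "multiple definition" errors above fit the pattern described in the GCC 10 porting notes linked above: GCC 10 defaults to -fno-common, so a variable that is defined (not merely declared) in a header gets emitted by every object file that includes it. A minimal sketch of the usual remedy follows. The names demo_cluster_nodes and demo_node_count are hypothetical and not taken from pacemaker, and the header/source split is indicated only in comments so that the snippet compiles as a single unit.

```c
/* ---- would live in a shared header (hypothetical demo.h) ---- */

/* Declaration only: 'extern' tells every includer that the object
 * exists somewhere else, so no translation unit emits its own copy. */
extern int demo_cluster_nodes;

int demo_node_count(void);

/* ---- would live in exactly one .c file (hypothetical demo.c) ---- */

/* The single definition that the linker resolves all references to. */
int demo_cluster_nodes = 3;

/* Any file that includes the header can read the shared object. */
int demo_node_count(void)
{
    return demo_cluster_nodes;
}
```

As a stopgap, building with -fcommon restores the pre-GCC-10 behavior, but the declaration/definition split is the portable fix.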
For the build logs, see:
https://copr-be.cloud.fedoraproject.org/results/@python/python3.9/fedora-rawhide-x86_64/01248360-pacemaker/

For all our attempts to build pacemaker with Python 3.9, see:
https://copr.fedorainfracloud.org/coprs/g/python/python3.9/package/pacemaker/

Testing and mass rebuild of packages is happening in copr. You can follow these instructions to test locally in mock if your package builds with Python 3.9:
https://copr.fedorainfracloud.org/coprs/g/python/python3.9/

Let us know here if you have any questions.

Python 3.9 will be included in Fedora 33. To make that update smoother, we're building Fedora packages with early pre-releases of Python 3.9. A build failure prevents us from testing all dependent packages (transitive [Build]Requires), so if this package is required a lot, it's important for us to get it fixed soon. We'd appreciate help from the people who know this package best, but if you don't want to work on this now, let us know so we can try to work around it on our side.

Dear Maintainer, your package has not been built successfully in 32. Action is required from you. If you can fix your package to build, perform a build in koji, and either create an update in bodhi, or close this bug without creating an update, if updating is not appropriate [1]. If you are working on a fix, set the status to ASSIGNED to acknowledge this. Following the latest policy for such packages [2], your package will be orphaned if this bug remains in NEW state more than 8 weeks. A week before the mass branching of Fedora 33 according to the schedule [3], any packages not successfully rebuilt at least on Fedora 31 will be retired regardless of the status of this bug.
[1] https://fedoraproject.org/wiki/Updates_Policy
[2] https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/
[3] https://fedoraproject.org/wiki/Releases/33/Schedule

re [comment 7]:

Miro, sorry for blocking you. pacemaker was stalled on a train wreck of build dependencies ([comment 4]), which didn't move forward until some two weeks ago (if I trace it back to [bug 1799365] correctly).

To add insult to injury, without being unblocked on build prereqs first, the standard workflow (free of cutting down some features at configure time) didn't allow me to anticipate and fix further problems related to GCC 10 -- these had actually been discovered by that point and fixed in the master branch, but not in this 2.0.3 version of pacemaker.

So this time around, it was all rather painful and took more iteration rounds than usual.

And, unfortunately, s390x build failed without any explanatory message from the linker command, which was the culprit. Hence, s390x arch is disabled at the moment.

Let me know if you see any further problems, e.g. Python related.

Marking this bug as blocking F-ExcludeArch-s390x for the reason
just mentioned:
> And, unfortunately, s390x build failed [...]
The problem is that all output goes into /dev/null, otherwise one could see the following:

[sharkcz@devel10 pengine]$ gcc -DHAVE_CONFIG_H -I. -I../../include -I../../include -I../../include -I../../libltdl -I../../libltdl -DPCMK_TIME_EMERGENCY_CGT -UPCMK_TIME_EMERGENCY_CGT -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/libxml2 -I/usr/include/heartbeat -I/usr/include/dbus-1.0 -I/usr/lib64/dbus-1.0/include -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=zEC12 -mtune=z13 -fasynchronous-unwind-tables -fstack-clash-protection -ggdb -fgnu89-inline -Wall -Waggregate-return -Wbad-function-cast -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat-security -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wwrite-strings -Wunused-but-set-variable -Wformat=2 -Wformat-nonliteral -Werror -c utils.c -o libpe_status_la-utils.o
In file included from ../../include/crm_internal.h:21,
 from utils.c:10:
In function ‘pe_action_set_reason’,
 inlined from ‘custom_action’ at utils.c:605:13:
../../include/crm/common/logging.h:235:13: error: ‘%s’ directive argument is null [-Werror=format-overflow=]
 235 | qb_log_from_external_source(__func__, __FILE__, fmt, level, \
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 236 | __LINE__, converted_tag , ##args); \
 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../include/crm/pengine/internal.h:19:43: note: in expansion of macro ‘crm_log_tag’
 19 | # define pe_rsc_trace(rsc, fmt, args...) crm_log_tag(LOG_TRACE, rsc ? rsc->id : "<NULL>", fmt, ##args)
 | ^~~~~~~~~~~
utils.c:2502:9: note: in expansion of macro ‘pe_rsc_trace’
 2502 | pe_rsc_trace(action->rsc, "Changing %s reason from '%s' to '%s'", action->uuid, action->reason, reason);
 | ^~~~~~~~~~~~
utils.c: In function ‘custom_action’:
utils.c:2502:69: note: format string is defined here
 2502 | pe_rsc_trace(action->rsc, "Changing %s reason from '%s' to '%s'", action->uuid, action->reason, reason);
 | ^~
cc1: all warnings being treated as errors

Preliminary fix: https://github.com/ClusterLabs/pacemaker/pull/2004

poki, https://copr.fedorainfracloud.org/coprs/g/python/python3.9/package/pacemaker/ is a success, thanks.

Dan or whoever with easy access to s390x machine:

As Dan let me know out-of-band, there is a subtle problem with hidden stdout/stderr of some compilation commands in pacemaker build -- coincidentally those that are failing on s390x, making it difficult to debug.

But, I've observed this troublesome pattern based on build logs from failed s390x builds. We are building with the help of libtool, meaning that each module for what is to be linked together to form a shared library is compiled twice:

1. first (standard?) pass, producing output to .libs/<LIBNAME>_la_<MODULE>.o
   - this finishes just fine
   - this is run without any stdout/stderr hiding

2. second (libtool-specific?) pass, producing output to <LIBNAME>_la_<MODULE>.o
   - this is what fails while the former has already finished without problems (?!)
   - this is run with said stdout/stderr hiding (Dan thought we are hiding this intentionally -- nope, it's automatism in libtool and it works like this everywhere), likely justified by the fact that when the build per 1.
     succeeds, this one must as well
   - the only spottable difference with this compilation command is that it lacks -fPIC -DPIC switches just prior to terminating "-o <OUTPUT>" part

I am rather lost why said difference would play such a crucial role regarding whether the build succeeds or not -- and while the same difference in "-fPIC -DPIC" presence happens also with other archs, it doesn't break the build there like it does in case of s390x.

I conducted a scratch build with an immediate fix[1] for what Dan shared ([comment 11]) and it manifested the problem at another location/occasion[2], but due to the libtool's success->success (presumably at least at s390x) flawed logic, I can't proceed further.

Therefore, take this as a request for help, please.

[1] https://github.com/ClusterLabs/pacemaker/pull/2004
[2] https://koji.fedoraproject.org/koji/taskinfo?taskID=42209989 or this snippet from respective build.log:

compilation per 1. above:

> libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../include
> -I../../include -I../../include -I../../libltdl -I../../libltdl
> -I../.. -I../..
-DPCMK_TIME_EMERGENCY_CGT -UPCMK_TIME_EMERGENCY_CGT
> -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include
> -I/usr/include/libxml2 -I/usr/include/heartbeat
> -I/usr/include/dbus-1.0 -I/usr/lib64/dbus-1.0/include
> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -O2 -g -pipe -Wall
> -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
> -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong
> -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=zEC12
> -mtune=z13 -fasynchronous-unwind-tables -fstack-clash-protection -ggdb
> -fgnu89-inline -Wall -Waggregate-return -Wbad-function-cast
> -Wcast-align -Wdeclaration-after-statement -Wendif-labels
> -Wfloat-equal -Wformat-security -Wmissing-prototypes
> -Wmissing-declarations -Wnested-externs -Wno-long-long
> -Wno-strict-aliasing -Wpointer-arith -Wwrite-strings
> -Wunused-but-set-variable -Wformat=2 -Wformat-nonliteral -Werror -c
> pcmk_sched_group.c -fPIC -DPIC -o
> .libs/libpacemaker_la-pcmk_sched_group.o

immediately followed with compilation per 2. above:

> libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../include
> -I../../include -I../../include -I../../libltdl -I../../libltdl
> -I../.. -I../..
-DPCMK_TIME_EMERGENCY_CGT -UPCMK_TIME_EMERGENCY_CGT
> -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include
> -I/usr/include/libxml2 -I/usr/include/heartbeat
> -I/usr/include/dbus-1.0 -I/usr/lib64/dbus-1.0/include
> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -O2 -g -pipe -Wall
> -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
> -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong
> -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=zEC12
> -mtune=z13 -fasynchronous-unwind-tables -fstack-clash-protection -ggdb
> -fgnu89-inline -Wall -Waggregate-return -Wbad-function-cast
> -Wcast-align -Wdeclaration-after-statement -Wendif-labels
> -Wfloat-equal -Wformat-security -Wmissing-prototypes
> -Wmissing-declarations -Wnested-externs -Wno-long-long
> -Wno-strict-aliasing -Wpointer-arith -Wwrite-strings
> -Wunused-but-set-variable -Wformat=2 -Wformat-nonliteral -Werror -c
> pcmk_sched_group.c -o libpacemaker_la-pcmk_sched_group.o >/dev/null
> 2>&1

resulting in:

> make[3]: *** [Makefile:733: libpacemaker_la-pcmk_sched_constraints.lo] Error 1

The -fPIC/no-PIC builds could be for both static and shared libs, or the libtool mode could be set incorrectly; in theory the two compiles could take different paths in the compiler and so produce a different set of warnings. I need to look closer or I can give you access to the s390x machine I use.

One thing is true generally, production builds shouldn't use -Werror.

> One thing is true generally, production builds shouldn't use -Werror.
For us, it's deliberate, I believe:
better safe (broken build) than sorry (crash at run-time).
The first problem causing the build to fail you've reported and
that was already patched was a real crash-worthy problem (well,
depending on libc implementation, but nonetheless).
So that's the habit we follow, sparing us additional hassles for
CI purposes, for instance.
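
The "crash-worthy" problem referred to here is the ‘%s’ directive argument is null diagnostic quoted earlier in this report: passing a null pointer for %s is undefined behavior in C, and while glibc prints "(null)", other libc implementations may crash. A minimal sketch of one common guard follows. It is not the actual fix from the pull request; str_or_placeholder and format_reason_change are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not a pacemaker API): map a NULL string to a
 * printable placeholder so that "%s" never receives a null pointer. */
const char *str_or_placeholder(const char *s)
{
    return (s != NULL) ? s : "<NULL>";
}

/* Hypothetical stand-in for the trace message from the diagnostic.
 * Returns the snprintf() result, i.e. the untruncated message length. */
int format_reason_change(char *buf, size_t len, const char *uuid,
                         const char *old_reason, const char *new_reason)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "Changing %s reason from '%s' to '%s'",
                    str_or_placeholder(uuid),
                    str_or_placeholder(old_reason),
                    str_or_placeholder(new_reason));
}
```

Guarding at the call site keeps the format string itself a literal, which also stays compatible with the -Wformat-nonliteral flag seen in the build command.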
TBH, I dare say that nobody will go fishing for warning messages
in automated build systems (like koji), and for big projects, one could
easily become blind to warnings during standard dev edit-recompile cycles.
So it's purposefully enforced like this, with the ability to opt out.
Also, odds are that this very case exposed a flaw in libtool's
reasoning as mentioned (successful compilation 1. does not necessarily
imply success of compilation 2., so it's counter-productive to disable
any outputs going from 2., as they may actually contain the meat in
form of what failed).
Without going into the details of the build system or libtool, I have a solution below:

diff --git a/pacemaker.spec b/pacemaker.spec
index d7ae4b9..2a24086 100644
--- a/pacemaker.spec
+++ b/pacemaker.spec
@@ -387,6 +387,7 @@ export CPPFLAGS="-UPCMK_TIME_EMERGENCY_CGT $CPPFLAGS"
 %{?with_coverage: --with-coverage} \
 %{!?with_doc: --with-brand=} \
 %{?gnutls_priorities: --with-gnutls-priorities="%{gnutls_priorities}"} \
+ --disable-static \
 --with-initdir=%{_initrddir} \
 --with-runstatedir=%{_rundir} \
 --localstatedir=%{_var} \

because static libs aren't packaged anyway.

Looks like gcc has a problem with the "static part"; combined with the redirection to /dev/null (maybe libtool expects the same output from the compiler for both static and shared compile), an unexplainable problem appears. Still, it would be good to know why gcc has different opinions on the code between static and shared (no-PIC/PIC).

Sounds like an immediate plan, thanks.
Will you proceed to file a respective libtool bug?
It is my gut feeling that gcc is able to prove some more invariants
in a static-way compilation than otherwise ... perhaps unless LTO
is applied (which will be really interesting to eventually enable,
since I guess it will discover a whole lot of new potential problems).
This means that libtool's assumption is indeed flawed and this
part should go away, at least for architectures known to differ,
such as s390x:
> # Allow error messages only from the first compilation.
> if test yes = "$suppress_opt"; then
> suppress_output=' >/dev/null 2>&1'
> fi
(eventually with some smart middle ground that would compare
stdout/stderr between 1. and 2. and only present them from 2.
if they differ).
I'm going to file a gcc bug, because it's the root cause. It should provide consistent warnings across arches.

Thanks, the work here is over, at least for now, I think. Build using ./configure --disable-static went through. Not sure what ramifications are there for blocking the main s390x arch bug -- do as you wish. If there are any more problems, feel free to reopen.

Jan, do you plan to fix the other warnings in the source code too as they seem to be valid based on the feedback from the gcc devels in bug 1811158? Let me know if you need access to a s390x machine for reproducing them.

Yeah, I realize, we'd rather deal with the underlying problems anyway. My plan is to fiddle around with extra options suggested in [bug 1811158 comment 2] to see if it induces the problems first. If I get stuck, I'll kindly ask for access to s390x, thanks for the offer.