Bug 494026
| Summary: | mono build is blocked by ppc-build. | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Fabian Deutsch <fabian.deutsch> |
| Component: | mono | Assignee: | Paul F. Johnson <paul> |
| Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | rawhide | CC: | a.badger, adam, awilliam, dkaylor, lxtnow, munroesj, notting, paul, scottt.tw, sindrepb |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | powerpc | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-01-25 16:14:44 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 238953, 179260, 473302 | | |
| Attachments: | | | |
Description
Fabian Deutsch
2009-04-03 18:43:48 UTC
I've spoken to some of the devs at Novell, who cannot reproduce the problem on any of their systems (including whatever they use for openSUSE). They have asked if we can run the build through gdb and report back the problems. I've asked spot, but have not had any response.

Blocking the ppc and F11 trackers.

The last successful build was RC1. The first failed build was RC2. RC2 did change ppc-specific bits: ppc64 TLS was added, plus support for NPTL on both ppc32 and ppc64.

I pulled the mono-2.4 release and ran into a problem where gcc-4.4 is being more pedantic than gcc-4.2/3. For example, very early we see:

    In file included from ./include/private/gc_priv.h:95,
                     from alloc.c:19:
    ./include/private/gc_locks.h: In function ‘GC_test_and_set’:
    ./include/private/gc_locks.h:162: error: ‘asm’ operand has impossible constraints
    make[3]: *** [alloc.lo] Error 1

The code in question is:

    __asm__ __volatile__(
        "1:\tlwarx %0,0,%3\n"    /* load and reserve */
        "\tcmpwi %0, 0\n"        /* if load is */
        "\tbne 2f\n"             /* non-zero, return already set */
        "\tstwcx. %2,0,%1\n"     /* else store conditional */
        "\tbne- 1b\n"            /* retry if lost reservation */
        "\tsync\n"               /* import barrier */
        "2:\t\n"                 /* oldval is zero if we set */
        : "=&r"(oldval), "=p"(addr)
        : "r"(temp), "1"(addr)
        : "cr0","memory");

The latest from bdwgc is a little different:

    __asm__ __volatile__(
        "1:lwarx %0,0,%1\n"      /* load and reserve */
        "cmpwi %0, 0\n"          /* if load is */
        "bne 2f\n"               /* non-zero, return already set */
        "stwcx. %2,0,%1\n"       /* else store conditional */
        "bne- 1b\n"              /* retry if lost reservation */
        "2:\n"                   /* oldval is zero if we set */
        : "=&r"(oldval)
        : "r"(addr), "r"(temp)
        : "memory", "cr0");

Note that =p(addr) is gone and the back reference "1"(addr) is not needed. I would guess gcc-4.4 is choking on the =p(addr).

Does mono need to update to a newer version of Boehm GC?

Created attachment 338995: patch to GC_test_and_set
This patch fixes the compile error with gcc-4.4.
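For readers following the constraint discussion, here is a minimal sketch of a test-and-set written in the style of the newer bdwgc snippet quoted above, with the address passed as a plain input operand instead of a "=p" output. This is not the attached patch. The names `sketch_test_and_set` and `sketch_clear` are made up for illustration, and the fallback to GCC's `__sync` builtins on non-PowerPC targets is an assumption added so the sketch compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from bdwgc or mono): atomically set *addr to 1
 * and return the previous value, so 0 means the caller acquired the lock. */
static int sketch_test_and_set(volatile unsigned int *addr)
{
    assert(addr != NULL);
#if defined(__powerpc__) || defined(__ppc__)
    int oldval;
    int temp = 1; /* value stored when the lock is free */
    __asm__ __volatile__(
        "1: lwarx  %0,0,%1\n"   /* load and reserve */
        "   cmpwi  %0,0\n"      /* is it already set? */
        "   bne    2f\n"        /* yes: return the non-zero old value */
        "   stwcx. %2,0,%1\n"   /* no: store conditionally */
        "   bne-   1b\n"        /* retry if the reservation was lost */
        "2:\n"
        : "=&r"(oldval)         /* only output, marked early-clobber */
        : "r"(addr), "r"(temp)  /* the address is a plain input */
        : "memory", "cr0");
    return oldval;
#else
    /* Assumption: on other targets, use the GCC atomic builtin instead. */
    return (int)__sync_lock_test_and_set(addr, 1u);
#endif
}

/* Hypothetical helper: release the lock by storing 0. */
static void sketch_clear(volatile unsigned int *addr)
{
    __sync_lock_release(addr);
}
```

A first call on a zero-initialised lock word should return 0 and a second call should return non-zero until `sketch_clear` is called. Like the bdwgc snippet it mirrors, the PowerPC path has no `sync` after the loop, so treat the memory-ordering behaviour as unverified.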
The above patch allows the make for mono-2.4 to complete given the configure options:

    ./configure --prefix=/usr/local --with-moonlight=no

Now I see regression errors in the make check:

    Test run: image=/home/ppcteam/mono-ppc/mono-131222/mono/mini/basic-long.exe, opts=
    Test 'test_2_neg' failed result (got 3, expected 2).
    Test 'test_0_neg_large' failed result (got 1, expected 0).
    Test 'test_1_simple_neg' failed result (got 0, expected 1).
    Results: total tests: 88, failed: 3, cfailed: 0 (pass: 96.59%)
    Elapsed time: 0.010395 secs (0.001288, 0.009107), Code size: 26908

We get different results with older versions of GCC (4.1/4.3). With those we see:

    365 test(s) passed. 2 test(s) did not pass.
    Failed tests:
    finalizer-wait.exe
    critical-finalizers.exe

So there could be a latent PPC GCC-4.4 bug, but it could also be due to other system differences.

I realized that I was using the trunk instead of the 2.4 branch, so I am trying again with:
http://ftp.novell.com/pub/mono/sources/mono/mono-2.4.tar.bz2
I also updated the gcc packages from yum.

I've alerted the Novell people (via the mono developers list) to this problem and posted this BZ URL. Hopefully, we'll be able to get a ppc build running soon.

/me wishes someone would donate a PPC box to him so he could help get this fixed....

We can build mono-2.4 from the tar file on F11 with the attached patch. Working with vargaz from monodev, this patch has been applied to trunk and the mono-2-4 branch. But I am still seeing the basic-long failure in the make check. This is not the case with other distros (with older gcc). So please verify that mono-2-4 builds in the Fedora build environment; then we can close this bug and open new bugs for these make check failures.

Nope, still not building, same problem.
http://koji.fedoraproject.org/koji/getfile?taskID=1290915&name=build.log
:-(

Did you verify that the attached patch is applied and that your gcc is up to date?
Also, this may be a clue: "make[6]: execvp: mcs: Permission denied". Please check your security settings!

Link to koji build page: http://koji.fedoraproject.org/koji/taskinfo?taskID=1290915

From the build.log:

    Patch #8 (mono-24-ppc-glocks.patch):
    + /bin/cat /builddir/build/SOURCES/mono-24-ppc-glocks.patch
    + /usr/bin/patch -s -p1 -b --suffix .glocks-ppc --fuzz=0

Checked cvs to ensure this is the patch provided in this bug report.

From the root.log:

    DEBUG util.py:256: gcc.ppc 0:4.4.0-0.32 gcc-c++.ppc 0:4.4.0-0.32

I note that the spec currently has moonlight enabled:

    %configure --with-ikvm=yes --with-jit=yes --with-xen_opt=yes \
        --with-moonlight=yes --disable-static --with-preview=yes \
        --with-libgdiplus=installed

I'm not 100% sure, but I think that "make[6]: execvp: mcs: Permission denied" is because Paul has changed the spec file to rebootstrap the package, so /usr/bin/mcs is not being found. After that, the code tries to use the mcslite bootstrapping binaries and fails. We shouldn't have to rebootstrap, though, because releng put the old package, which had ppc builds, back into the buildroot.

Okay, bootstrap code turned off, confirmed patch has been applied. Latest build gets farther but still fails with a stack overflow on ppc:
http://koji.fedoraproject.org/koji/getfile?taskID=1295223&name=build.log
Build task is: http://koji.fedoraproject.org/koji/taskinfo?taskID=1295223

OK, this is still some goofy problem specific to your build environment, because I can build mono-2.4 from the svn branch (and from the tar file with the patch). I have verified that I can compile Mono.Xml.Xsl/PatternTokenizer.cs within my F11 mono-2.4 build. One difference is that I build --with-moonlight=no.

Can you try with moonlight=yes please? At least we can factor that one out then. That said, when I pushed the 2.4 release and RC2 + RC3 through, they had moonlight=no as well (actually, they didn't have any of the moonlight options set, as the default is no).
I've tried with --with-moonlight=no and also with just:

    %configure --with-moonlight=no --disable-static

In each of those cases it stops at the same point in the build.

Steven, is this error coming from mcs or some other tool? One thing to remember is that this is being built with the bootstrapping code disabled, so we're using mcs from mono-core-2.4-RC1 in this build. (Although Paul's build.log with bootstrapping enabled showed that the bootstrapping mcs will error at a different point in the build.)

Can you also confirm that the tarball before the patch has this md5sum:

    da2bf1c0aba2958d26c5e8a9a49fd9d1 mono-2.4.tar.bz2

(I noticed that the RCs and final all have the same name. :-( )

Finally, if you think this is something to do with the buildsystem's environment, can you try rpm --rebuild of the source rpm on your F11 system? It's available here:

    curl 'http://koji.fedoraproject.org/koji/getfile?taskID=1295214&name=mono-2.4-14.fc11.src.rpm' > mono-2.4-14.fc11.src.rpm
    rpm --rebuild mono-2.4-14.fc11.src.rpm

When I add --disable-static I see the following failure:

    echo "#define XSLT_PATTERN" > Mono.Xml.Xsl/PatternParser.cs
    ./../../jay/jay -ct Mono.Xml.Xsl/PatternParser.jay < ./../../jay/skeleton.cs >>Mono.Xml.Xsl/PatternParser.cs
    ./../../jay/jay: 3 rules never reduced
    ./../../jay/jay: 1 shift/reduce conflict, 46 reduce/reduce conflicts.
    echo "#define XSLT_PATTERN" > Mono.Xml.Xsl/PatternTokenizer.cs
    cat System.Xml.XPath/Tokenizer.cs >>Mono.Xml.Xsl/PatternTokenizer.cs
    MCS [basic] System.Xml.dll
    Stack overflow in unmanaged: IP: 0xf8b4b54, fault addr: 0xff18bda0
    Stack overflow in unmanaged: IP: 0xf8b4b54, fault addr: 0xff18abd0
    Stack overflow in unmanaged: IP: 0xf8b4b54, fault addr: 0xff189ff0
    Stack overflow in unmanaged: IP: 0xf8b4b54, fault addr: 0xff188e20
    Stack overflow in unmanaged: IP: 0xf8b4b54, fault addr: 0xff187c50
    Stack overflow in unmanaged: IP: 0xfd98510, fault addr: 0xff186ef0
    Stack overflow in unmanaged: IP: 0xf8b4b54, fault addr: 0xff185ea0
    Stack overflow in unmanaged: IP: 0xf8b4b54, fault addr: 0xff184cd0
    Stack overflow: IP: 0xfd98510, fault addr: 0xff183f70
    At Unmanaged

So the --disable-static seems to be part of the issue.

Also verified the checksum:

    mono-ppc]$ md5sum mono-2.4.tar.bz2
    da2bf1c0aba2958d26c5e8a9a49fd9d1 mono-2.4.tar.bz2

Following on from the previous comment: if I don't have --disable-static but do have --with-static_mono=no, I get the stack overflow. --disable-static implies --with-static_mono=no.
http://koji.fedoraproject.org/koji/taskinfo?taskID=1295972

And if we link statically against libmono, we are able to build on ppc:
https://koji.fedoraproject.org/koji/taskinfo?taskID=1296119

Apparently, --disable-static is not supported or tested upstream, and future releases may rely on the static libs (which makes for a quicker runtime as well). Can we allow mono to have static libs in?

The --disable-static option is also not used in the openSUSE build, which builds fine (according to some #mono-devel IRC people). It is also not present in their spec file.

We won't ship the static libs. It would be okay to build mono itself against a static libmono.a *as a temporary workaround*. We'd definitely want to fix this by F12.

The big issue I'd want an answer to is whether this is specific to mono dynamically linking to libmono.
If it's going to happen with other things that link against libmono to embed the runtime, then this is a much larger issue than if it just affects mono.

By the looks, we build against libmono.a but then don't need to bundle it. It is also only an issue with mono and nothing else - yet.

(In reply to comment #27)
> By the looks, we build against libmono.a but then don't need to bundle it.

Yep. We can rm -rf the static libraries.

> It is also only an issue with mono and nothing else - yet.

This is an oversimplification. Nothing we ship uses libmono ATM. But that doesn't mean that people using Fedora aren't embedding libmono into their applications. Since we don't know precisely what the trigger is other than linking to libmono dynamically, we don't know how far-reaching this is.

Okay, new mono build is in the buildsystem:
http://koji.fedoraproject.org/koji/taskinfo?taskID=1299249
This will be mono-2.4 final with the patch from Steven (Thanks!), building against a static libmono.

Removing this from the F11Blocker bug and adding it to the F12Blocker. Things we need to do as soon as possible, and definitely before the F12 release:

* Get mono linking dynamically against libmono again.
* Try to bootstrap mono onto ppc64.

That will be a trick! The cross build (compiling 64-bit on a 32-bit default system) requires lots of 64-bit packages that may or may not exist, and some hacking around brain-dead pkgconfig/libtool-isms. Starting with glib-2.0 and pcre, I have had to hack pkgconfig and libtool .la files for 64-bit packages that did not bother to provide them. In some cases I resorted to running configure and then hacking config.status to replace lib with lib64.

Also, you will need a 64-bit clean version of glibconfig.h, as mono depends on it to get its int/pointer casts right. Without this fix any 64-bit mono build is doomed. It seems that glib is incapable of providing a biarch-clean devel package.
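To illustrate why a configuration header generated for the wrong word size breaks int/pointer casts, here is a small sketch. It assumes GCC or a compatible compiler that predefines `__SIZEOF_POINTER__`. The `SKETCH_*` names are hypothetical stand-ins and not glib's actual macros; a real glibconfig.h records the pointer size it was generated for, which is what goes wrong when a 32-bit copy is picked up by a 64-bit build.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the pointer size a generated config header
 * would hard-code. A header generated for -m32 would say 4 here; this
 * sketch defaults to the compiler's own value so that it builds. */
#ifndef SKETCH_SIZEOF_VOID_P
#define SKETCH_SIZEOF_VOID_P __SIZEOF_POINTER__
#endif

/* Compile-time check: the array size becomes -1 (an error) if the header's
 * pointer size disagrees with the compiler's, e.g. a 32-bit header used
 * with -m64. */
typedef char sketch_pointer_width_check[
    (sizeof(void *) == SKETCH_SIZEOF_VOID_P) ? 1 : -1];

/* Hypothetical cast macros that go through intptr_t, so the intermediate
 * integer always has the same width as a pointer. */
#define SKETCH_INT_TO_POINTER(i) ((void *)(intptr_t)(i))
#define SKETCH_POINTER_TO_INT(p) ((int)(intptr_t)(p))

/* Round-trips an int through a pointer-sized value. */
static int sketch_roundtrip(int value)
{
    void *p = SKETCH_INT_TO_POINTER(value);
    assert(sizeof(intptr_t) == sizeof(void *));
    return SKETCH_POINTER_TO_INT(p);
}
```

Compiling this with `-DSKETCH_SIZEOF_VOID_P=4 -m64` should fail at the typedef, which is roughly the kind of mismatch a biarch-unclean devel package can introduce silently when no such check exists.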
Finally, mkbundle is brain-dead, as it blindly execs as and ld without applying the appropriate -m64/-m32 switches. I have patches to hack around that as well.

It is obviously simpler to build 64-bit on a 64-bit primary system. I suspect it is just as hard to build a 32-bit mono on a 64-bit primary system, but I have not tried that yet.

Luckily we build a ppc64 primary system, we just don't ship it by default :-)
http://koji.fedoraproject.org/koji/taskinfo?taskID=1300604

Assuming that my next build with bootstrapping off works[1]_, we'll just have to figure out what's going wrong with dynamic linking to libmono.

.. _[1]: http://koji.fedoraproject.org/koji/taskinfo?taskID=1300753

Steven, I *think* I have a simpler test case. Once you have mono-2.4 final built and installed:

    cd mono-2.4/samples/embed
    gcc -o teste teste.c `pkg-config --cflags --libs mono` -lm
    mcs test.cs
    ./teste test.exe
    Segmentation fault

And:

    cd mono-2.4/samples/embed
    gcc -Wall -o test-invoke test-invoke.c `pkg-config --cflags --libs mono` -lm
    mcs invoke.cs
    ./test-invoke invoke.exe
    Segmentation fault

These work when run on an F10 i386 box with mono-2.4 final rpms, so it seems like it's related... although it could be that I'm looking at a different bug now.

Do you get this if you use gmcs instead of mcs?

Yes.

Just checked and this has not been fixed:
http://koji.fedoraproject.org/koji/taskinfo?taskID=1447139

As discussed at today's blocker bug review meeting, since we have a current 'workaround' (actually blessed by upstream) with no catastrophic consequences, we can't consider this as blocking the F12 release. Dropping to F12Target.

--
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Steven, do you have the time to look at this at some point or are you terribly busy? By F13, ppc is going to be a secondary arch and we'll probably go to dynamic linking for the F13 rawhide cycle whether or not this bug is fixed.