Bug 1045193
Summary: | python3 fails test_faulthandler test_gdb tests on aarch64 | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Peter Robinson <pbrobinson>
Component: | glibc | Assignee: | Siddhesh Poyarekar <spoyarek>
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | rawhide | CC: | amcnabb, bkabrda, blc, codonell, dmalcolm, fweimer, jakub, kmcmartin, law, mnewsome, mstuchli, pfrankli, rth, spoyarek, tomspur, vstinner
Target Milestone: | --- | Keywords: | Reopened
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | python3-3.4.1-3.fc21 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2014-06-03 12:42:08 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1045187 | |
Bug Blocks: | 922257 | |
Attachments: | | |

Description
Peter Robinson
2013-12-19 20:33:15 UTC
blc gave me access to the build chroot.

For "test_gdb", I saw similar noise from gdb: "Failed to read a valid object file image from memory." as seen in bug 1045187, with the pretty-printers appearing to otherwise be functioning normally.

test_faulthandler failed thusly according to the build logs in comment #0:

    ======================================================================
    FAIL: test_register_chain (test.test_faulthandler.FaultHandlerTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/Python-3.3.2/Lib/test/test_faulthandler.py", line 588, in test_register_chain
        self.check_register(chain=True)
      File "/builddir/build/BUILD/Python-3.3.2/Lib/test/test_faulthandler.py", line 566, in check_register
        self.assertRegex(trace, regex)
    AssertionError: Regex didn't match: '^Traceback \\(most recent call first\\):\n File "<string>", line 7 in func\n File "<string>", line 28 in <module>$' not found in 'Traceback (most recent call first):\n File "<string>", line 7 in func\n File "<string>", line 28 in <module>\npython: /builddir/build/BUILD/Python-3.3.2/Modules/gcmodule.c:332: update_refs: Assertion `gc->gc.gc_refs == (-3)\' failed.'
    ----------------------------------------------------------------------

However, on attempting to reproduce in the chroot, I get:

    ======================================================================
    FAIL: test_register_chain (__main__.FaultHandlerTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/builddir/build/BUILD/Python-3.3.2/Lib/test/test_faulthandler.py", line 588, in test_register_chain
        self.check_register(chain=True)
      File "/builddir/build/BUILD/Python-3.3.2/Lib/test/test_faulthandler.py", line 572, in check_register
        self.assertEqual(exitcode, 0)
    AssertionError: -11 != 0

The test_gdb case is covered in 1045187, so this BZ specifically concerns the test_faulthandler issue.

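For readers unfamiliar with the failing test, here is a minimal sketch of roughly what test_register_chain exercises. It is not the code from Lib/test/test_faulthandler.py: the helper names `run_register_chain` and `describe` and the embedded child script are illustrative, and it assumes a POSIX system where SIGUSR1 is available. A child interpreter installs a Python-level handler, registers faulthandler on the same signal with chain=True, and signals itself. A negative return code means the child was killed by a signal, so the -11 reported above corresponds to SIGSEGV.

```python
import signal
import subprocess
import sys

# Script run in a child interpreter; a simplified stand-in for the test body.
CHILD = """
import faulthandler, os, signal, sys

def handler(signum, frame):
    handler.called = True
handler.called = False

# Install a Python-level handler first, then let faulthandler chain to it.
signal.signal(signal.SIGUSR1, handler)
faulthandler.register(signal.SIGUSR1, file=sys.stderr,
                      all_threads=False, chain=True)

def func():
    os.kill(os.getpid(), signal.SIGUSR1)

func()
sys.exit(0 if handler.called else 1)
"""


def run_register_chain():
    """Run the child script and return (returncode, stderr text)."""
    proc = subprocess.run([sys.executable, "-c", CHILD],
                          stderr=subprocess.PIPE, universal_newlines=True)
    return proc.returncode, proc.stderr


def describe(returncode):
    """Translate a subprocess return code into a readable status."""
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


if __name__ == "__main__":
    code, trace = run_register_chain()
    print(describe(code))
    print(trace, end="")
```

On an unaffected system the child should exit with status 0 after faulthandler has written a traceback to stderr; on the affected aarch64 builds the equivalent test saw the child die instead.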
python-2.7.5-11.fc21 built fine with the 3.13 kernel and gcc-4.8.2-14.fc21

Closed the wrong bug. So still seeing the test_faulthandler issue.

http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2254074

    Ran 47 tests in 2.520s
    OK (skipped=3)
    344 tests OK.
    1 test failed:
        test_faulthandler
    2 tests altered the execution environment:
        test_site test_urllib2_localnet
    26 tests skipped:
        test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
        test_codecmaps_kr test_codecmaps_tw test_curses test_devpoll
        test_ioctl test_kqueue test_msilib test_ossaudiodev test_pep277
        test_smtpnet test_socketserver test_startfile test_systemtap
        test_timeout test_tk test_ttk_guionly test_unicode_file
        test_urllib2net test_urllibnet test_winreg test_winsound
        test_xmlrpc_net test_zipfile64
    4 skips unexpected on linux:
        test_ioctl test_systemtap test_tk test_ttk_guionly
    [2380528 refs]

I don't have access to an aarch64 machine, so I can't debug this. Is there a possibility of getting a testing machine for this?

(In reply to Bohuslav "Slavek" Kabrda from comment #7)
> I don't have access to an aarch64 machine, so I can't debug this. Is there a possibility of getting a testing machine for this?

I believe there's access to devices by beaker. Brendan can you confirm this with Bohuslav please?

Bohuslav, hardware is available, I will send you information.

I managed to track this down a bit and created upstream bug report - all the relevant information I've come up with so far is summarized there: http://bugs.python.org/issue21131

Any status update on this? The last scratch build I tried I get the following on aarch64:

    Ran 47 tests in 2.591s
    OK (skipped=3)
    343 tests OK.
    2 tests failed:
        test_faulthandler test_sqlite
    2 tests altered the execution environment:
        test_site test_urllib2_localnet
    26 tests skipped:
        test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
        test_codecmaps_kr test_codecmaps_tw test_curses test_devpoll
        test_ioctl test_kqueue test_msilib test_ossaudiodev test_pep277
        test_smtpnet test_socketserver test_startfile test_systemtap
        test_timeout test_tk test_ttk_guionly test_unicode_file
        test_urllib2net test_urllibnet test_winreg test_winsound
        test_xmlrpc_net test_zipfile64
    4 skips unexpected on linux:
        test_ioctl test_systemtap test_tk test_ttk_guionly
    [2373339 refs]

Slavek, any update?

Hi Brendan,
- the test_sqlite failure has already been solved upstream and the fix for it will be part of upcoming Python 3.4 (it's already being built in Koji side tag and will be merged for F21)
- as for the test_faulthandler, I think that no one actually knows what's going on there - not even upstream - I guess the best thing to do here would be disabling the test on aarch64. If that's ok with you, we'll do it for Python 3.4 which will eventually get merged to F21.

If it's okay with you it's okay with me.

(In reply to Brendan Conoboy from comment #14)
> If it's okay with you it's okay with me.

Good. I'll let you know when we merge Python 3.4 with the fix and disabled test into Koji Rawhide tag.

(In reply to Bohuslav "Slavek" Kabrda from comment #15)
> (In reply to Brendan Conoboy from comment #14)
> > If it's okay with you it's okay with me.
>
> Good. I'll let you know when we merge Python 3.4 with the fix and disabled test into Koji Rawhide tag.

What's the timeframe for 3.4 landing in rawhide main repos?

My current guess is by the end of month. The problem is that although we announced on fedora-devel and python-devel mailing lists, most of the maintainers don't rebuild their packages.
While we don't require all packages to be rebuilt before we merge f21-python into Rawhide, we'd like to have at least the "important" ones (e.g. big frameworks, important build tools, etc) - that's a lot of packages to rebuild. So far it's been going fine, but we may still hit some obstacles, so I can't give you a better estimate right now, sorry.

(In reply to Bohuslav "Slavek" Kabrda from comment #17)
> My current guess is by the end of month. The problem is that although we announced on fedora-devel and python-devel mailing lists, most of the maintainers don't rebuild their packages.
> While we don't require all packages to be rebuilt before we merge f21-python into Rawhide, we'd like to have at least the "important" ones (e.g. big frameworks, important build tools, etc) - that's a lot of packages to rebuild. So far it's been going fine, but we may still hit some obstacles, so I can't give you a better estimate right now, sorry.

The fact of the matter is what you have now is what you'll get from maintainers. You will need to do it yourself.... like pretty much all the other maintainers of the core of big stacks do. If you wait for them to do it the fact is it won't happen so just get on with it, the longer you wait the more in the main rawhide repos will change and the more screwed you'll be.

(In reply to Peter Robinson from comment #18)
> (In reply to Bohuslav "Slavek" Kabrda from comment #17)
> > My current guess is by the end of month. The problem is that although we announced on fedora-devel and python-devel mailing lists, most of the maintainers don't rebuild their packages.
> > While we don't require all packages to be rebuilt before we merge f21-python into Rawhide, we'd like to have at least the "important" ones (e.g. big frameworks, important build tools, etc) - that's a lot of packages to rebuild. So far it's been going fine, but we may still hit some obstacles, so I can't give you a better estimate right now, sorry.
>
> The fact of the matter is what you have now is what you'll get from maintainers. You will need to do it yourself.... like pretty much all the other maintainers of the core of big stacks do. If you wait for them to do it the fact is it won't happen so just get on with it, the longer you wait the more in the main rawhide repos will change and the more screwed you'll be.

I *am* doing it myself, that's exactly why I'm saying that I have a very bad time estimate.

> I *am* doing it myself, that's exactly why I'm saying that I have a very bad
> time estimate.
Why aren't you using the mass rebuild scripts then and automating it? Ask the perl team what they use, or possibly the ruby team. There's ways and means to automate this

(In reply to Peter Robinson from comment #20)
> > I *am* doing it myself, that's exactly why I'm saying that I have a very bad time estimate.
>
> Why aren't you using the mass rebuild scripts then and automating it? Ask the perl team what they use, or possibly the ruby team. There's ways and means to automate this

There are also many circular dependencies in the Python stack, which means automated scripts aren't much help until certain packages are rebuilt. Once I manage to rebuild these, I'll use automated scripts.

Just BTW, Dennis Gilmore announced that relengs will be merging all Koji side tags on 2014-05-26 [1], because of Fedora mass rebuild, so if we don't manage to do it sooner, this is the date.

[1] https://lists.fedoraproject.org/pipermail/devel-announce/2014-May/001404.html

f21-python has just been merged to rawhide. Could you please re-test with python3-3.4.1-3.fc21?

Retested, we've regressed. 4 test failures. Because of all the noarch packages built against 3.4 this is now blocking _ALL_ aarch64 builds and is now critical path.

    Ran 47 tests in 2.364s
    OK (skipped=3)
    358 tests OK.
    4 tests failed:
        test_ensurepip test_faulthandler test_os test_venv
    1 test altered the execution environment:
        test_site
    26 tests skipped:
        test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
        test_codecmaps_kr test_codecmaps_tw test_curses test_devpoll
        test_ioctl test_kqueue test_msilib test_ossaudiodev test_pep277
        test_smtpnet test_socketserver test_startfile test_systemtap
        test_timeout test_tk test_ttk_guionly test_unicode_file
        test_urllib2net test_urllibnet test_winreg test_winsound
        test_xmlrpc_net test_zipfile64
    error: Bad exit status from /var/tmp/rpm-tmp.CGoTsu (%check)
        Bad exit status from /var/tmp/rpm-tmp.CGoTsu (%check)
    RPM build errors:
    Child return code was: 1

http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2356557

(In reply to Peter Robinson from comment #24)
> Retested, we've regressed. 4 test failures. Because of all the noarch packages built against 3.4 this is now blocking _ALL_ aarch64 builds and is now critical path.
>
> Ran 47 tests in 2.364s
> OK (skipped=3)
> 358 tests OK.
> 4 tests failed:
>     test_ensurepip test_faulthandler test_os test_venv

I've disabled test_faulthandler for now, it's reported in the linked Python upstream issue. As for the others:
- test_ensurepip and test_venv are caused by the same root issue - Python doesn't find python3-pip and/or python3-setuptools package(s) where it should, I don't know why right now
- test_os failure seems to be new, I'll need to investigate

I'm working on this right now with the highest priority. I'll let you know as soon as I figure something out.

> 1 test altered the execution environment:
>     test_site
> 26 tests skipped:
>     test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
>     test_codecmaps_kr test_codecmaps_tw test_curses test_devpoll
>     test_ioctl test_kqueue test_msilib test_ossaudiodev test_pep277
>     test_smtpnet test_socketserver test_startfile test_systemtap
>     test_timeout test_tk test_ttk_guionly test_unicode_file
>     test_urllib2net test_urllibnet test_winreg test_winsound
>     test_xmlrpc_net test_zipfile64
> error: Bad exit status from /var/tmp/rpm-tmp.CGoTsu (%check)
>     Bad exit status from /var/tmp/rpm-tmp.CGoTsu (%check)
> RPM build errors:
> Child return code was: 1
>
> http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2356557

So, it seems that the test_os problem is caused by a regression in glibc - I tested with the older glibc-2.18.90-17.fc21.aarch64, where it doesn't fail, and with the current glibc-2.19.90-18.fc21.aarch64, where it fails.

gdb debugging gave me this:
- the test calls os.tcgetpgrp() with an invalid file descriptor (*) (this is done on purpose to test proper errno)
- this calls cPython's posix_tcgetpgrp() function, which is just a simple wrapper over tcgetpgrp()
- tcgetpgrp() returns -1, but errno is still set to 0 (but should be set to EBADF), which makes the test fail (and is actually a bug)

I'm ccing main glibc maintainer - could you please have a look at this? It seems that this is affecting not only tcgetpgrp, but also tcsetpgrp and ioctl. Since (if I'm not mistaken) tcgetpgrp and tcsetpgrp make underlying calls to ioctl, my guess is that the bug is actually just in one place - ioctl (not sure though). I'm attaching a reproducer that demonstrates this behaviour when used with glibc-2.19.90-18.fc21.aarch64.

(*) invalid is, in this case, a descriptor of a closed file

Created attachment 900077 [details]
Reproducer demonstrating broken behaviour of tcgetpgrp/tcsetpgrp and ioctl

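The behaviour that test_os relies on can also be sketched from the Python side. This is not the attached reproducer but a hypothetical equivalent: the helper names `closed_fd` and `errno_from` are made up for illustration, and the sketch assumes a Linux system where `termios.TIOCGPGRP` is defined and pid_t is 4 bytes. With a working glibc both calls should fail with EBADF; with the regression described above, the errno reported for the failed call would be 0 instead.

```python
import errno
import fcntl
import os
import termios


def closed_fd():
    """Return a descriptor number that has just been closed (hence invalid)."""
    read_end, write_end = os.pipe()
    os.close(write_end)
    os.close(read_end)
    return read_end


def errno_from(call, *args):
    """Call call(*args); return the errno of the OSError raised, or None."""
    try:
        call(*args)
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    fd = closed_fd()
    # os.tcgetpgrp() wraps tcgetpgrp(3), which is itself built on ioctl(2).
    print("tcgetpgrp errno:", errno_from(os.tcgetpgrp, fd))
    # Issue the underlying ioctl request directly, with a 4-byte buffer
    # standing in for the pid_t result (an assumption about pid_t's size).
    print("ioctl errno:",
          errno_from(fcntl.ioctl, fd, termios.TIOCGPGRP, b"\0" * 4))
    print("expected errno:", errno.EBADF)
```

Checking both the wrapper and the raw ioctl request is one way to tell whether a wrong errno comes from tcgetpgrp itself or from the shared ioctl path underneath it.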

(In reply to Bohuslav "Slavek" Kabrda from comment #26)
> gdb debugging gave me this:
> - the test calls os.tcgetpgrp() with an invalid file descriptor (*) (this is done on purpose to test proper errno)
> - this calls cPython's posix_tcgetpgrp() function, which is just a simple wrapper over tcgetpgrp()
> - tcgetpgrp() returns -1, but errno is still set to 0 (but should be set to EBADF), which makes the test fail (and is actually a bug)

The tcgetpgrp() function is a thin wrapper around the __ioctl() function which is itself a syscall wrapper. If downgrading glibc fixes the issue then it's more likely a problem with the syscall wrapper than the kernel (which I assume remained constant and is returning the right errno).

In the -18 release we pulled in Richard Henderson's changes to sysdep.h and that is likely the problem.

Richard, Would you be able to have a look at this?

For reference we're seeing the same issue with python2

http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2357787

(In reply to Peter Robinson from comment #24)
> Retested, we've regressed. 4 test failures. Because of all the noarch packages built against 3.4 this is now blocking _ALL_ aarch64 builds and is now critical path.
>
> Ran 47 tests in 2.364s
> OK (skipped=3)
> 358 tests OK.
> 4 tests failed:
>     test_ensurepip test_faulthandler test_os test_venv
> 1 test altered the execution environment:
>     test_site
> 26 tests skipped:
>     test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
>     test_codecmaps_kr test_codecmaps_tw test_curses test_devpoll
>     test_ioctl test_kqueue test_msilib test_ossaudiodev test_pep277
>     test_smtpnet test_socketserver test_startfile test_systemtap
>     test_timeout test_tk test_ttk_guionly test_unicode_file
>     test_urllib2net test_urllibnet test_winreg test_winsound
>     test_xmlrpc_net test_zipfile64
> error: Bad exit status from /var/tmp/rpm-tmp.CGoTsu (%check)
>     Bad exit status from /var/tmp/rpm-tmp.CGoTsu (%check)
> RPM build errors:
> Child return code was: 1
>
> http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2356557

The original test_venv failure was likely caused by python-pip being built against Python 3.3 in your repo, rewheel therefore couldn't find it and it failed. It appears you have since rebuilt pip against Python 3.4, so that is no longer an issue, however test_venv still fails because the rewheel patch was not updated to reflect changed directory structure in Lib/ensurepip, I'll fix that today.

I've also added a note about how to bootstrap python 3.4 with the rewheel module to Python3 spec, hopefully that should help us avoid the initial issue with test_venv in the future.

(In reply to Bohuslav "Slavek" Kabrda from comment #27)
> Created attachment 900077 [details]
> Reproducer demonstrating broken behaviour of tcgetpgrp/tcsetpgrp and ioctl

Presumably you meant

    - pid_t pgid = tcgetpgrp((int)fp);
    + pid_t pgid = tcgetpgrp(fileno(fp));

But either way, with glibc-2.17-55.9.sa1.3.aarch64 I get

    $ ./a.out
    tcgetpgrp returned: -1 errno is: 9
    ioctl returned: -1 errno is: 9

which appears to be exactly what you were looking for. It's certainly the same results as I get on x86_64.

(In reply to Carlos O'Donell from comment #28)
> In the -18 release we pulled in Richard Henderson's changes to sysdep.h and that is likely the problem.

Pardon? The -18 release was

    * Wed Jul 31 2013 Siddhesh Poyarekar <siddhesh> - 2.17-18

Further, I have yet to push my sysdep.h changes to any RH branch, so I'm not really certain to which patches you are referring...

(In reply to Richard Henderson from comment #32)
> (In reply to Carlos O'Donell from comment #28)
> > In the -18 release we pulled in Richard Henderson's changes to sysdep.h and that is likely the problem.
>
> Pardon? The -18 release was
>
> * Wed Jul 31 2013 Siddhesh Poyarekar <siddhesh> - 2.17-18
>
> Further, I have yet to push my sysdep.h changes to any RH branch, so I'm not really certain to which patches you are referring...

Keep in mind this is rawhide, and -18 is:

    * Mon May 26 2014 Siddhesh Poyarekar <siddhesh> - 2.19.90-18
    - Sync with upstream master.
    - Adjust rtkaio patches to build with upstream master.

The glibc team rebases the rawhide branches against upstream master on a weekly basis. Therefore you need not do anything to get your upstream patches into rawhide. I noted that -18 was the upstream rebase release which included your changes, and noted that those changes touched code in the errno handling paths. I haven't debugged any further.

Does that clarify the situation?

Ah, wonderful. We're now on the same page.

And yes, I broke ioctl here:

    ca3cfa40c16ef34c74951a07a57cfcbcd58898b1

committed on May 25th, and fixed it here:

    74f31c18593111725478a991b395ae45661985a3

committed on May 30th, which is after the 2.19.90-18 revision cited. So in theory everything should be fixed in the next pull.

> committed on May 30th, which is after the 2.19.90-18 revision cited.
> So in theory everything should be fixed in the next pull.
Can we expedite that pull? It's currently blocking all aarch64 builds

(In reply to Peter Robinson from comment #35)
> > committed on May 30th, which is after the 2.19.90-18 revision cited.
> > So in theory everything should be fixed in the next pull.
>
> Can we expedite that pull? It's currently blocking all aarch64 builds

I'm assigning to Siddhesh. We'll do the rawhide update on Wednesday after which a rebuild of python3 should just work. We'll work in the background to see if we can get this done sooner by promoting rth to packager so he can do it himself.

glibc rawhide has now been rebased to upstream master.

This bug has been reported upstream and I just fixed it:

https://bugs.python.org/issue21131
"test_faulthandler.test_register_chain fails on 64bit ppc/arm with kernel >= 3.10"

So ppc64 is also affected, not only ARM.

It was a bug in the size of the stack allocated by faulthandler for its signal handlers. The bug depends on the CPU model and the FPU state size: the stack that faulthandler allocates is too small.