Bug 894088
Summary: | illegal instruction errors for python programs/rpm on Fedora 18 Beta kirkwood image | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Till Maas <opensource> |
Component: | glibc | Assignee: | Carlos O'Donell <codonell> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 18 | CC: | blc, codonell, dennis, fweimer, jakub, jcm, law, ndevos, opensource, pfrankli, schwab, spoyarek |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | arm | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-01-02 21:13:29 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 245418 | ||
Attachments: |
Description
Till Maas
2013-01-10 16:41:11 UTC
Some ideas on debugging:

1. Does any python script work?
2. Can you run 'python -m pdb /usr/bin/cnucnu' - enter 'c' to continue the execution at the Pdb-prompt
3. Can you strace and/or ltrace the /usr/bin/cnucnu

(In reply to comment #1)
> Some ideas on debugging:
>
> 1. Does any python script work?

Some invocations work:
/usr/bin/cnucnu --help

Also yum worked at least well enough to install cnucnu. Installed packages are:
Jan 09 17:31:14 Installed: python-magic-5.11-4.fc18.armv5tel
Jan 09 17:31:15 Installed: python-bugzilla-0.7.0-2.fc18.noarch
Jan 09 17:31:19 Installed: pyOpenSSL-0.13-4.fc18.armv5tel
Jan 09 17:31:21 Installed: python-simplejson-2.6.0-2.fc18.armv5tel
Jan 09 17:31:22 Installed: python-zope-event-3.5.1-4.fc18.noarch
Jan 09 17:31:23 Installed: python-zope-interface-4.0.2-3.fc18.armv5tel
Jan 09 17:31:24 Installed: python-bunch-1.0.1-3.fc18.noarch
Jan 09 17:31:26 Installed: python-fedora-0.3.29-2.fc18.noarch
Jan 09 17:31:29 Installed: pyserial-2.6-3.fc18.noarch
Jan 09 17:31:41 Installed: python-babel-0.9.6-5.fc18.noarch
Jan 09 17:31:46 Installed: m2crypto-0.21.1-9.fc18.armv5tel
Jan 09 17:31:47 Installed: python-fpconst-0.7.3-10.fc18.noarch
Jan 09 17:32:17 Installed: python-twisted-core-12.1.0-2.fc18.armv5tel
Jan 09 17:32:19 Installed: SOAPpy-0.11.6-15.fc18.noarch
Jan 09 17:32:26 Installed: python-twisted-web-12.1.0-2.fc18.armv5tel
Jan 09 17:32:39 Installed: python-genshi-0.6-4.fc17.armv5tel
Jan 09 17:32:43 Installed: libyaml-0.1.4-3.fc18.armv5tel
Jan 09 17:32:44 Installed: PyYAML-3.10-6.fc18.armv5tel
Jan 09 17:32:45 Installed: cnucnu-0-0.11.20121004git618ed580.fc18.noarch

> 2. Can you run 'python -m pdb /usr/bin/cnucnu'
> - enter 'c' to continue the execution at the Pdb-prompt

# python -m pdb /usr/bin/yum
> /usr/bin/yum(2)<module>()
-> import sys
(Pdb) c
Ungültiger Maschinenbefehl (German locale output for "Illegal instruction")

# python -m pdb /usr/bin/cnucnu report-outdated
> /usr/bin/cnucnu(20)<module>()
-> import logging
(Pdb) c
Ungültiger Maschinenbefehl (German locale output for "Illegal instruction")

Just running "import sys" in an interactive python prompt worked without problems.

> 3. Can you strace and/or ltrace the /usr/bin/cnucnu

The system just crashed completely during tests. I will see if the programs are installed after it has rebooted.

Actually rpm does not work either:

not working:
rpm -qa
rpm -Uhv strace-4.7-2.fc18.armv5tel.rpm

working:
rpm --help

There is no trace command that I could use to debug further.

Unpack the rpm with rpm2cpio and run the command there? (Or unpack it on a separate box and scp it and/or gdb over).

There does not seem to exist ltrace for arm:
http://ftp-stud.hs-esslingen.de/pub/fedora-secondary/development/18/arm/os/Packages/l/

strace for rpm -ql:
munmap(0xb6fe9000, 27378) = 0
open("/pkcs11.txt", O_RDONLY) = -1 ENOENT (No such file or directory)
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPC, si_addr=0x49327724} ---
+++ killed by SIGILL +++
Illegal instruction

strace for cnucnu:
brk(0xb75000) = 0xb75000
gettimeofday({1357856683, 804006}, NULL) = 0
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPC, si_addr=0xb6b4b700} ---
+++ killed by SIGILL +++
Illegal instruction

Created attachment 676573 [details]
core dump from running rpm -ql
At that low a level... throwing at glibc.

My best guess is an architecture mismatch between the built files and your hardware. Alternatively you have flaky DRAM.

I started to look into this, but the easiest way is to use an x86/x86_64 cross-gdb to load the core file and inspect the instruction at the time of failure. Unfortunately from FC17 x86_64 I don't see a cross-gdb (just cross gcc, and binutils). We need a cross-gdb.

The next easiest thing is for me to just run a simulated FC18 ARM environment. I've started downloading the FC18 Versatile (QEMU) image, but I figured I might as well ask some questions.

(1) Can someone with a cross-gdb or native ARM gdb determine what the faulting instruction was?
(2) Run `readelf -a /usr/bin/rpm` and attach it. I'm most interested in the GNU attributes section of the output (at the end) since it describes what ARM architecture the file was built for.
(3) Run `ldd /usr/bin/rpm` and provide the output.
(4) Run `objdump -ldr /usr/bin/rpm >& dump.asm` and attach dump.asm.

Thanks.

In addition:

(5) Run `cat /proc/cpuinfo` and provide the output.
(6) Run `uname -a` and provide the output.
(7) Did you build and boot a custom kernel for this system? If yes, provide the .config.

Thanks.

Core was generated by `rpm -ql'.
Program terminated with signal 4, Illegal instruction.
#0 pt_TestAbort () at ../../../mozilla/nsprpub/pr/src/pthreads/ptio.c:1219
1219 PR_SetError(PR_PENDING_INTERRUPT_ERROR, 0);
(gdb) l
1214 static PRBool pt_TestAbort(void)
1215 {
1216 PRThread *me = PR_GetCurrentThread();
1217 if(_PT_THREAD_INTERRUPTED(me))
1218 {
1219 PR_SetError(PR_PENDING_INTERRUPT_ERROR, 0);
1220 me->state &= ~PT_THREAD_ABORTED;
1221 return PR_TRUE;
1222 }
1223 return PR_FALSE;
(gdb) disassemble
Dump of assembler code for function pt_TestAbort:
0x49327700 <+0>: push {r4, lr}
0x49327704 <+4>: bl 0x4930f8e4
0x49327708 <+8>: ldr r1, [r0, #3279928] ; 0xa8
0x4932770c <+12>: mov r4, r0
0x49327710 <+16>: cmp r1, #0
0x49327714 <+20>: bne 0x49327740 <pt_TestAbort+64>
0x49327718 <+24>: ldr r0, [r0]
0x4932771c <+28>: ands r0, r0, #16
0x49327720 <+32>: popeq {r4, pc}
=> 0x49327724 <+36>: ldr r0, [pc, #40882336] ; 0x49327748 <pt_TestAbort+72>
0x49327728 <+40>: bl 0x49310034
0x4932772c <+44>: ldr r3, [r4]
0x49327730 <+48>: mov r0, #1
0x49327734 <+52>: bic r3, r3, #16
0x49327738 <+56>: str r3, [r4]
0x4932773c <+60>: pop {r4, pc}
0x49327740 <+64>: mov r0, #0
0x49327744 <+68>: pop {r4, pc}
0x49327748 <+72>: ; <UNDEFINED> instruction: 0xffffe897
End of assembler dump.

Created attachment 679810 [details]
readelf -a /usr/bin/rpm
# ldd /usr/bin/rpm
librpm.so.3 => /lib/librpm.so.3 (0xb6f2a000)
librpmio.so.3 => /lib/librpmio.so.3 (0xb6eff000)
libselinux.so.1 => /lib/libselinux.so.1 (0xb6ed9000)
libcap.so.2 => /lib/libcap.so.2 (0xb6ecd000)
libacl.so.1 => /lib/libacl.so.1 (0xb6ebe000)
libdb-5.3.so => /lib/libdb-5.3.so (0xb6d4b000)
libbz2.so.1 => /lib/libbz2.so.1 (0xb6d31000)
libelf.so.1 => /lib/libelf.so.1 (0xb6d12000)
liblzma.so.5 => /lib/liblzma.so.5 (0xb6ce9000)
liblua-5.1.so => /lib/liblua-5.1.so (0xb6cbd000)
libm.so.6 => /lib/libm.so.6 (0xb6c12000)
libnss3.so => /lib/libnss3.so (0xb6b15000)
libpopt.so.0 => /lib/libpopt.so.0 (0xb6b04000)
libz.so.1 => /lib/libz.so.1 (0xb6ae8000)
libdl.so.2 => /lib/libdl.so.2 (0xb6adb000)
libpthread.so.0 => /lib/libpthread.so.0 (0xb6abb000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb6a94000)
libc.so.6 => /lib/libc.so.6 (0xb694d000)
/lib/ld-linux.so.3 (0xb6f89000)
libpcre.so.1 => /lib/libpcre.so.1 (0xb68ef000)
libattr.so.1 => /lib/libattr.so.1 (0xb68e2000)
librt.so.1 => /lib/librt.so.1 (0xb68d2000)
libnssutil3.so => /lib/libnssutil3.so (0xb68ab000)
libplc4.so => /lib/libplc4.so (0xb689f000)
libplds4.so => /lib/libplds4.so (0xb6894000)
libnspr4.so => /lib/libnspr4.so (0xb685a000)

Created attachment 679813 [details]
objdump -ldr /usr/bin/rpm >& dump.asm
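The diagnostics requested in items (2) to (6) above could be gathered in one pass with a small script. The following is only a sketch: the function names `diagnostic_commands` and `collect` are made up for illustration, and it assumes Python 3.7 or later plus the usual binutils tools are present on the affected system, which may not hold on a machine where Python itself crashes.

```python
import subprocess


def diagnostic_commands(binary="/usr/bin/rpm"):
    """Return the commands asked for in the thread, as argv lists."""
    return [
        ["readelf", "-a", binary],      # ELF headers and attribute sections
        ["ldd", binary],                # shared library dependencies
        ["objdump", "-ldr", binary],    # disassembly with line info and relocs
        ["cat", "/proc/cpuinfo"],       # CPU model and features
        ["uname", "-a"],                # running kernel
    ]


def collect(binary="/usr/bin/rpm"):
    """Run each command and map its command line to the captured output."""
    results = {}
    for argv in diagnostic_commands(binary):
        key = " ".join(argv)
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, errors="replace", check=False
            )
            results[key] = proc.stdout + proc.stderr
        except OSError as exc:
            # A missing tool should not stop the remaining commands.
            results[key] = f"could not run: {exc}"
    return results
```

Calling `collect()` and writing each value to its own file would give one attachment per command, matching how the outputs were attached here.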
I've gathered these files and output from a Fedora-18-Beta-kirkwood-arm chroot on an armv7hl.

The Kirkwood machines are armv5tel. Till, can you provide the info that was requested in comment #9?

(In reply to comment #10)
> Core was generated by `rpm -ql'.
> Program terminated with signal 4, Illegal instruction.
> #0 pt_TestAbort () at ../../../mozilla/nsprpub/pr/src/pthreads/ptio.c:1219
> 1219 PR_SetError(PR_PENDING_INTERRUPT_ERROR, 0);
> (gdb) l
> 1214 static PRBool pt_TestAbort(void)
> 1215 {
> 1216 PRThread *me = PR_GetCurrentThread();
> 1217 if(_PT_THREAD_INTERRUPTED(me))
> 1218 {
> 1219 PR_SetError(PR_PENDING_INTERRUPT_ERROR, 0);
> 1220 me->state &= ~PT_THREAD_ABORTED;
> 1221 return PR_TRUE;
> 1222 }
> 1223 return PR_FALSE;
> (gdb) disassemble
> Dump of assembler code for function pt_TestAbort:
> 0x49327700 <+0>: push {r4, lr}
> 0x49327704 <+4>: bl 0x4930f8e4
> 0x49327708 <+8>: ldr r1, [r0, #3279928] ; 0xa8

The assembler or gdb has incorrectly assembled or disassembled an immediate-offset register-relative load instruction.

> 0x4932770c <+12>: mov r4, r0
> 0x49327710 <+16>: cmp r1, #0
> 0x49327714 <+20>: bne 0x49327740 <pt_TestAbort+64>
> 0x49327718 <+24>: ldr r0, [r0]
> 0x4932771c <+28>: ands r0, r0, #16
> 0x49327720 <+32>: popeq {r4, pc}
> => 0x49327724 <+36>: ldr r0, [pc, #40882336] ; 0x49327748 <pt_TestAbort+72>

This is a PC-relative load, but the offset of the PC-relative load is too big. The offset should be *very* small and no more than ~4K at most. The "illegal instruction" is because this is likely an invalid encoding. This is starting to look like a binutils bug.

> 0x49327728 <+40>: bl 0x49310034
> 0x4932772c <+44>: ldr r3, [r4]
> 0x49327730 <+48>: mov r0, #1
> 0x49327734 <+52>: bic r3, r3, #16
> 0x49327738 <+56>: str r3, [r4]
> 0x4932773c <+60>: pop {r4, pc}
> 0x49327740 <+64>: mov r0, #0
> 0x49327744 <+68>: pop {r4, pc}
> 0x49327748 <+72>: ; <UNDEFINED> instruction: 0xffffe897
> End of assembler dump.
The PC-relative load is trying to load the value in the constant pool at the *end* of the function e.g. 0xffffe897, but instead it looks like the assembler encoded the value incorrectly or gdb disassembled it incorrectly.

Can you get an `objdump -ldr <file>` of the dynamic library that contains the function `pt_TestAbort'? I want to see what the bfd disassembler says about the function above and compare it to gdb.

Notes:
- Unfortunately the binutils used to build /usr/bin/rpm didn't use gnu attributes to mark the architectures supported by the binary.

Created attachment 680127 [details]
output of "objdump -ldr /lib/libnspr4.so", the library that I expect to have pt_TestAbort inlined
Hmm, the only files that reference pt_TestAbort are:
- /lib/debug/usr/lib/libnspr4.so.debug
- /usr/src/debug/nspr-4.9.2/mozilla/nsprpub/pr/src/pthreads/ptio.c
My guess is that the calling of pt_TestAbort has been optimized and pt_TestAbort
was inlined. Trying to figure out what called pt_TestAbort results in this:
(gdb) bt
#0 pt_TestAbort () at ../../../mozilla/nsprpub/pr/src/pthreads/ptio.c:1219
#1 0x01135900 in ?? ()
#2 0x01135900 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I cannot reproduce the "Illegal Instruction" error (using a chroot on a
different ARM architecture), but setting a gdb breakpoint on pt_TestAbort
results in the following backtrace:
Breakpoint 1, pt_TestAbort () at ../../../mozilla/nsprpub/pr/src/pthreads/ptio.c:1215
1215 {
(gdb) bt
#0 pt_TestAbort () at ../../../mozilla/nsprpub/pr/src/pthreads/ptio.c:1215
#1 0xb68d1614 in PR_Access (name=name@entry=0x18700 "/secmod.db", how=how@entry=PR_ACCESS_EXISTS) at ../../../mozilla/nsprpub/pr/src/pthreads/ptio.c:3611
#2 0xb691187c in nssutil_ReadSecmodDB (filename=0x17268 "secmod.db", dbname=0xb6ffeeb8 <__stack_chk_guard> "", dbname@entry=0x186a8 "/pkcs11.txt",
params=0xa <Address 0xa out of bounds>,
params@entry=0x17ed0 "configdir='' certPrefix='' keyPrefix='' secmod='' flags=readOnly,noCertDB,noModDB,forceOpen,optimizeSpace updatedir='' updateCertPrefix='' updateKeyPrefix='' updateid='' updateTokenDescription='' ", rw=-1090523340, appName=<optimized out>, dbType=<optimized out>) at utilmod.c:358
#3 0xb69119f8 in NSSUTIL_DoModuleDBFunction (function=function@entry=0,
parameters=parameters@entry=0x17ed0 "configdir='' certPrefix='' keyPrefix='' secmod='' flags=readOnly,noCertDB,noModDB,forceOpen,optimizeSpace updatedir='' updateCertPrefix='' updateKeyPrefix='' updateid='' updateTokenDescription='' ", args=args@entry=0x0) at utilmod.c:669
#4 0xb6530ce8 in NSC_ModuleDBFunc (function=0,
parameters=0x17ed0 "configdir='' certPrefix='' keyPrefix='' secmod='' flags=readOnly,noCertDB,noModDB,forceOpen,optimizeSpace updatedir='' updateCertPrefix='' updateKeyPrefix='' updateid='' updateTokenDescription='' ", args=0x0) at pkcs11.c:2652
#5 0xb6ba6b1c in SECMOD_GetModuleSpecList (module=module@entry=0x17e48) at pk11pars.c:914
#6 0xb6ba6da8 in SECMOD_LoadModule (
modulespec=modulespec@entry=0x17278 "name=\"NSS Internal Module\" parameters=\"configdir='' certPrefix='' keyPrefix='' secmod='' flags=readOnly,noCertDB,noModDB,forceOpen,optimizeSpace updatedir='' updateCertPrefix='' updateKeyPrefix='' upd"..., parent=parent@entry=0x0, recurse=recurse@entry=1) at pk11pars.c:1028
#7 0xb6b7b540 in nss_InitModules (isContextInit=0, optimizeSpace=1, forceOpen=1, noModDB=1, noCertDB=0, readOnly=0, pwRequired=<optimized out>, configStrings=0x0,
configName=0xb6c4f02c "", updateName=0xb6c4f02c "", updateID=0xb6c4f02c "", updKeyPrefix=0xb6c4f02c "",
updCertPrefix=0xb6b7bd70 <NSS_NoDB_Init+104> "D", <incomplete sequence \342>, updateDir=0x0, secmodName=0x17238 "Hr\001", keyPrefix=<optimized out>,
certPrefix=<optimized out>, configdir=0xb6c4f02c "") at nssinit.c:438
#8 nss_Init (configdir=0xb6c4f02c "", certPrefix=<optimized out>, keyPrefix=<optimized out>, secmodName=0x17238 "Hr\001", updateDir=0xb6c4f02c "",
updCertPrefix=0xb6c4f02c "", updKeyPrefix=0xb6c4f02c "", updateID=0xb6c4f02c "", updateName=0xb6c4f02c "", initContextPtr=initContextPtr@entry=0x0,
initParams=initParams@entry=0x0, readOnly=readOnly@entry=1, noCertDB=noCertDB@entry=1, noModDB=noModDB@entry=1, forceOpen=forceOpen@entry=1,
noRootInit=noRootInit@entry=1, optimizeSpace=optimizeSpace@entry=1, noSingleThreadedModules=noSingleThreadedModules@entry=0,
allowAlreadyInitializedModules=allowAlreadyInitializedModules@entry=0, dontFinalizeModules=dontFinalizeModules@entry=0) at nssinit.c:643
#9 0xb6b7bd70 in NSS_NoDB_Init (configdir=configdir@entry=0x0) at nssinit.c:878
#10 0xb6f63718 in rpmInitCrypto () at digest_nss.c:46
#11 0xb6fafc50 in rpmReadConfigFiles (file=0x0, target=0x1 <Address 0x1 out of bounds>, target@entry=0x0) at rpmrc.c:1617
#12 0xb6fa00b4 in rpmcliConfigured () at poptALL.c:66
#13 0xb6fa05ac in rpmcliInit (argc=argc@entry=2, argv=0xbefff814, optionsTable=<optimized out>) at poptALL.c:290
#14 0x00008fb4 in main (argc=2, argv=<optimized out>) at rpmqv.c:91
PR_Access is available in libnspr4.so, attaching its "objdump -ldr".
(In reply to comment #8)
> My best guess is an architecture mismatch between the built files and your
> hardware. Alternatively you have flaky DRAM.

I tried to reproduce this on a second Dockstar with a new image. There the error did not occur during initial testing, but I then broke the initramfs and did not finish testing because I ran out of time. I also did not get to try to reproduce the error on the Dockstar that showed the error initially.

(In reply to comment #9)
> In addition:
>
> (5) Run `cat /proc/cpuinfo` and provide the output.
>
> (6) Run `uname -a` and provide the output.
>
> (7) Did you build and boot a custom kernel for this system? If yes, provide
> the .config.

I do not have the original USB image anymore, but it was the standard default kirkwood image with the default kernel. I can provide the cpuinfo information once I get one Dockstar to boot again.

The "objdump -ldr /lib/libnspr4.so" contains the sequence that leads up to the crash in the function:

0001f4b8 <PRP_NakedBroadcast>:
...
1f700: e92d4010 push {r4, lr}
1f704: ebffa076 bl 78e4 <_init+0x308>
1f708: e59010a8 ldr r1, [r0, #168] ; 0xa8
1f70c: e1a04000 mov r4, r0
1f710: e3510000 cmp r1, #0
1f714: 1a000009 bne 1f740 <PRP_NakedBroadcast+0x288>
1f718: e5900000 ldr r0, [r0]
1f71c: e2100010 ands r0, r0, #16
1f720: 08bd8010 popeq {r4, pc}
1f724: e59f001c ldr r0, [pc, #28] ; 1f748 <PRP_NakedBroadcast+0x290>
1f728: ebffa241 bl 8034 <_init+0xa58>
1f72c: e5943000 ldr r3, [r4]
1f730: e3a00001 mov r0, #1
1f734: e3c33010 bic r3, r3, #16
1f738: e5843000 str r3, [r4]
1f73c: e8bd8010 pop {r4, pc}
1f740: e3a00000 mov r0, #0
1f744: e8bd8010 pop {r4, pc}
1f748: ffffe897 ; <UNDEFINED> instruction: 0xffffe897
...

This is the only occurrence of that sequence. You'll note that PR_Access does call PRP_NakedBroadcast. Here the offsets are reasonable, and the load address is sufficiently (8-byte) aligned.
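The two load instruction words in the objdump listing above can be decoded by hand as a cross-check. The sketch below assumes the classic 32-bit ARM (A32) encoding of LDR with an immediate offset: a 12-bit offset field, an add/subtract bit, and a PC that reads as the instruction address plus 8. The function name `decode_ldr_immediate` is made up for illustration and handles only this one instruction form.

```python
def decode_ldr_immediate(word, address):
    """Decode an A32 word-sized LDR with immediate offset.

    Returns (rn, rd, offset, target); target is the literal's address when
    the base register is the PC (r15), otherwise None.
    """
    is_single_transfer = (word >> 26) & 0b11 == 0b01  # bits 27..26
    register_offset = (word >> 25) & 1                # I bit: 0 means immediate
    pre_indexed = (word >> 24) & 1                    # P bit
    add = (word >> 23) & 1                            # U bit: 1 adds the offset
    byte_access = (word >> 22) & 1                    # B bit
    writeback = (word >> 21) & 1                      # W bit
    is_load = (word >> 20) & 1                        # L bit
    if (not is_single_transfer or register_offset or not pre_indexed
            or byte_access or writeback or not is_load):
        raise ValueError("not a plain LDR Rd, [Rn, #imm] instruction")

    rn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF
    imm12 = word & 0xFFF            # 12 bits, so the magnitude is at most 4095
    offset = imm12 if add else -imm12

    # In A32 state the PC reads as the current instruction address plus 8.
    target = address + 8 + offset if rn == 15 else None
    return rn, rd, offset, target


# Words and addresses taken from the objdump listing of libnspr4.so.
print(decode_ldr_immediate(0xE59010A8, 0x1F708))  # ldr r1, [r0, #168]
rn, rd, offset, target = decode_ldr_immediate(0xE59F001C, 0x1F724)
print(rn, rd, offset, hex(target))                # ldr r0, [pc, #28] -> 0x1f748
```

Applying the same computation at the prelinked address from the core dump, 0x49327724 + 8 + 28, gives 0x49327748, which is the address gdb printed in its annotation. Since the offset field holds only 12 bits, an offset such as 40882336 cannot come from this encoding at all.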
It might just have been a defect in gdb that the target of the load is printed in the disassembly instead of the actual offset (as it should be).

There is no reason for this instruction to fault the CPU. Can I get a confirmation from the reporter that this happens consistently? Were there multiple threads present? `info threads' from gdb after loading the core?

This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.

This message is a reminder that Fedora 18 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 18. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 18 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.