Bug 104029
Summary: | Possible bug in Radeon DRI driver | | |
---|---|---|---|
Product: | [Retired] Red Hat Linux Beta | Reporter: | Nils Philippsen <nphilipp> |
Component: | XFree86 | Assignee: | Mike A. Harris <mharris> |
Status: | CLOSED RAWHIDE | QA Contact: | David Lawrence <dkl> |
Severity: | medium | Priority: | medium |
Version: | beta1 | CC: | behdad, chris.ricker, drepper, jdennis, mingo, notting, sajchurchey |
Hardware: | All | OS: | Linux |
Fixed In Version: | 4.3.0-25 | Doc Type: | Bug Fix |
Last Closed: | 2003-11-06 06:52:11 UTC | Bug Blocks: | 100643 |
Description
Nils Philippsen
2003-09-09 06:15:36 UTC
Created attachment 94316
backtrace of crashed glmatrix
Created attachment 94317
backtrace of crashed skyrocket
rss_glx upload now finished. Get the OpenAL packages as well if you want to try it.

Hmm, it works with exec-shield switched off. Your call whether this is an XFree86, application or kernel bug ;-).

Unfortunately, the backtraces are useless without debugging information in them. It's hard to determine where the problem may lie; however, if I had to hazard a guess, I would guess the Mesa DRI 3D driver. Can you rebuild the X src.rpm with debug symbols by changing the DebuggableBuild toggle in the spec file and adding .dbg to the release, then upgrading to the new packages? That would help quite a lot. TIA

This reminds me of a bug I filed about issues with an R9000 and DRI; lemme see if I can dig it up.

Indeed, it seems these two bugs might be the same bug. I'll leave both open until we can conclude for sure whether it's a dupe or not, though. I was hoping to get a good backtrace from Nils in order to query bugzilla for similarish bugs, but I think you saved me a bit of time. ;o) Thanks Bill

I'd be happy to test, but I guess I'd need to tidy up my system beforehand; I don't want to build XFree86 more than once if possible, because this is not exactly the "killer compilation machine" ;-). How much space do I need in .../redhat/ to do this? Any other prereqs?

We think we know the cause of this problem now. Try disabling exec-shield on your machine and running the apps that previously failed. The DRI 3D drivers appear to be trying to execute malloc()'d code somewhere that wasn't mapped with PROT_EXEC. Please let me know if disabling exec-shield makes the problem stop. Compiling X requires 1-2 GB of free disk space, and rpmbuild will complain about any missing dependencies. Hope this helps.

As I wrote above: the stuff works as before with "kernel.exec-shield = 0".

Ok, then this is definitely the same issue. Basically, something is wrong with the Mesa DRI drivers. They're dlopen()'d, and so should be exec-shield friendly; however, they're not.
Something appears to be allocating memory without PROT_EXEC and then executing it. The problem area has yet to be located. I discussed this with both Ingo and Uli earlier today, and I have a vague idea where the problems are (radeon_vtxfmt_c.c and r200_vtxfmt_c.c). However, tracking the problem down to its ultimate cause will have to wait until I have completed the XFree86 security erratum. That erratum covers a remotely exploitable hole in the XFree86 font server and X server, as well as other local holes. If this issue is considered an urgent blocker (it is not currently flagged as one in bugzilla), perhaps John can investigate. If John has time for this, I can quickly discuss what is known with him over the phone. I believe he has investigated similar issues inside the ELF loader before as well.

Basically, anywhere malloc()'d memory ends up containing executable code, mprotect() must first be called on that memory to mark it executable. Likewise, anywhere mmap() is used for executable code, the memory must be mapped with PROT_EXEC. It's almost certain that Mesa or some of the DRI drivers themselves are malloc()ing or mmap()ing incorrectly, and this causes the code to fail when exec-shield is enabled. Note that this has nothing at all to do with the X server ELF loader. These modules are not X server modules but Mesa DRI modules. One more piece of information: libGL dlopen()s the DRI modules, and the tdfx, radeon, and r200 modules also use dlopen() internally. It's unlikely that the problems are caused in these areas. Perhaps some memory is being malloc()'d and then used as a function jump table without being mprotect()'ed. That is just an unverified hypothesis, though. It came to mind while I was studying the complex web of token pasting, macro abuse and function pointer abuse in the DRI driver code.

Bill: Basically, DRI will not work at all if exec-shield is enabled.
If you think this is a Target or Blocker, feel free to mark it as such, as I'm prioritizing bugs based on their Blocker/Target status. John: Do you have cycles to spare to look into this, and/or are you interested in poking at it? I don't expect it would take more than a day or so to figure out. TIA

Hmm... I seem to be having problems reproducing this. Kernel is 2.4.22-1.2039.nptl, xscreensaver is xscreensaver-4.12-1, XFree86 is 4.3.0-29. It does not seem to matter whether /proc/sys/kernel/exec-shield is 0 or 1; xscreensaver glmatrix works. How are you changing exec-shield? Are you cat'ing "0" or "1" into /proc/sys/kernel/exec-shield? Is anybody else able to reproduce this? I've coded a fix, but unless I can reproduce the failure, it may be a moot point.

Yeah, this spontaneously started working for me in more recent builds, and I'm not sure why.

Hmm... that's a bit worrisome. Mesa hasn't changed. Is it possible that exec-shield got broken such that it's not enabled anymore?

"WORKSFORME":
--- 8< ---
nils@wombat:~> sudo sysctl -w "kernel.exec-shield=1"
kernel.exec-shield = 1
nils@wombat:~> /usr/X11R6/lib/xscreensaver/glmatrix
Segmentation fault (core dumped)
nils@wombat:~> uname -r
2.4.22-1.2044.nptl
nils@wombat:~> rpm -q XFree86
XFree86-4.3.0-30
nils@wombat:~> rpm -q xscreensaver
xscreensaver-4.12-1
nils@wombat:~> sudo sysctl -w "kernel.exec-shield=0"
kernel.exec-shield = 0
nils@wombat:~> /usr/X11R6/lib/xscreensaver/glmatrix
nils@wombat:~>
--- >8 ---

Created attachment 94602
patch provides execute permission on memory allocated for code generation
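Code generated at run time has to live in executable memory. The two approaches mentioned in this report are an anonymous mmap() with PROT_EXEC, and heap memory made executable with mprotect(). Below is a minimal sketch of both, assuming a POSIX system that provides MAP_ANONYMOUS. It is not the code from the attached patch, and the function names are hypothetical.

mprotect() operates on whole pages. For that reason, the heap variant uses posix_memalign() to get a page-aligned buffer whose size is rounded up to a page multiple. On systems that refuse writable-and-executable memory, both allocators can fail, so callers must check for NULL.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS under strict -std modes on glibc */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round a byte count up to a whole number of pages. */
static size_t round_to_pages(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (size + page - 1) / page * page;
}

/* Variant 1 (hypothetical name): an anonymous mapping that is executable
 * from the start. MAP_ANONYMOUS is not backed by a file, so fd is -1. */
void *exec_alloc_mmap(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

void exec_free_mmap(void *p, size_t size)
{
    if (p != NULL)
        munmap(p, size);
}

/* Variant 2 (hypothetical name): heap memory marked executable afterwards.
 * mprotect() needs a page-aligned address, hence posix_memalign(). */
void *exec_alloc_mprotect(size_t size)
{
    size_t len = round_to_pages(size);
    void *p = NULL;
    if (posix_memalign(&p, (size_t)sysconf(_SC_PAGESIZE), len) != 0)
        return NULL;
    if (mprotect(p, len, PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
        free(p);
        return NULL;
    }
    return p;
}

void exec_free_mprotect(void *p, size_t size)
{
    if (p == NULL)
        return;
    /* Drop PROT_EXEC again before the pages go back to the allocator. */
    mprotect(p, round_to_pages(size), PROT_READ | PROT_WRITE);
    free(p);
}
```

Both variants return memory that is readable, writable and executable. The mmap() variant has one advantage: it never changes the protection of pages that belong to the malloc heap.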
Created attachment 94603
patch modifies Imakefiles to correctly install in "tls" directories for a TLS-enabled build
This patch fixes these patches:
XFree86-4.3.0-redhat-libGL-opt.patch
XFree86-4.3.0-redhat-libGL-opt-v2.patch
Actually, I think libGL-opt-v2 supersedes the other patch.
This patch corrects two flaws in the above patch.
1) GlxUseThreadLocalStorage was referenced before it was defined. This produced
errors on every Makefile that was made.
2) The intent is to produce two binary-different versions of the libraries, one
without threading and one supporting thread-local storage (TLS). The TLS
variants of the libraries are installed in a "tls" subdirectory. However, the
original patch never updated the destination directory for installs. Instead,
the TLS versions of the libraries were installed in the non-TLS (parent) directory,
where the non-TLS versions of the libraries lived. The spec file then copied
the files into the tls subdirectory. In other words, "make install" is very
broken, and if run outside of an rpm build it will trash the libraries on the
system :-(
After applying this patch, we need to fix the spec file by removing the
code that copies and links the TLS libraries. We need to keep the code that
adds GlxUseThreadLocalStorage to host.def and runs make clean, make Makefiles,
make, and make install. When GlxUseThreadLocalStorage is defined to YES, the files
will be installed where they belong.
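Imake runs its templates and Imakefiles through the C preprocessor. Because of that, both fixes above can be illustrated in plain C: the ordering rule from point 1) and the install-directory fix from point 2). This is a hypothetical sketch, not the content of the actual Imakefile patch. Its assumptions:

- YES and NO mirror the 1/0 values Imake's templates use.
- The host.def setting is simulated with a #define placed first.
- The /usr/X11R6/lib paths and the gl_install_dir() helper are illustrative.

```c
#include <assert.h>
#include <string.h>

/* Boolean values as Imake's templates spell them (assumed 1 and 0). */
#define YES 1
#define NO  0

/* 1. Site configuration comes first: this stands in for the line the spec
 *    file adds to host.def for the TLS build. */
#define GlxUseThreadLocalStorage YES

/* 2. Apply the default only if host.def did not set it, and do so before
 *    anything tests the symbol. */
#ifndef GlxUseThreadLocalStorage
#define GlxUseThreadLocalStorage NO
#endif

/* 3. The install destination follows the setting, so "make install" puts
 *    TLS libraries in the tls subdirectory itself instead of relying on the
 *    spec file to copy them there. Paths are illustrative only. */
#if GlxUseThreadLocalStorage
#define GL_LIB_INSTALL_DIR "/usr/X11R6/lib/tls"
#else
#define GL_LIB_INSTALL_DIR "/usr/X11R6/lib"
#endif

/* Hypothetical helper exposing the directory chosen at configuration time. */
const char *gl_install_dir(void)
{
    return GL_LIB_INSTALL_DIR;
}
```

Removing the simulated host.def line makes gl_install_dir() return the parent directory, which corresponds to the non-TLS build.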
Mike: I'm assigning this to you so you can apply the patches and update the spec file. Please read the note on the TLS Imakefile patch. That patch, BTW, was created after XFree86-4.3.0-redhat-libGL-opt-v2.patch was applied. Perhaps it should be merged with that patch, as it's all related and undoes some of what the earlier patch did. You'll have to delete some stuff from the spec file too; hopefully the comment above is clear. I will submit the other patch upstream to the DRI folks. I noticed other potential bugs in mem.c that I'd like to bring to their attention as well. You may be interested to know that the patch contains two alternate implementations, one that uses mprotect and one that uses anonymous mmap. Both were tested, and the patch turns on the mmap variant. I also added extensive documentation in mem.c.

*** Bug 101647 has been marked as a duplicate of this bug. ***

The execute permission patch applies cleanly but does not compile, failing at:

gcc -m32 -O2 -march=i386 -mcpu=i686 -fno-strict-aliasing -pipe -ansi -pedantic -Wall -Wpointer-arith -Wundef -fno-merge-constants -I../../../../../exports/include -I../../../../../exports/include/X11 -I../../../../../include/extensions -I../../../../../extras/Mesa/include -I../../../../../lib/GL/include -I../../../../../extras/Mesa/src -I../../../../../programs/Xserver/include -I../../../../.. -I../../../../../exports/include -Dlinux -D__i386__ -D_POSIX_C_SOURCE=199309L -D_POSIX_SOURCE -D_XOPEN_SOURCE -D_BSD_SOURCE -D_SVID_SOURCE -D_GNU_SOURCE -DSHAPE -DXINPUT -DXKB -DLBX -DXAPPGROUP -DXCSECURITY -DTOGCUP -DXF86BIGFONT -DDPMSExtension -DPIXPRIV -DPANORAMIX -DRENDER -DRANDR -DGCCUSESGAS -DAVOID_GLYPHBLT -DPIXPRIV -DSINGLEDEPTH -DXFreeXDGA -DXvExtension -DXFree86LOADER -DXFree86Server -DXF86VIDMODE -DXvMCExtension -DSMART_SCHEDULE -DXResExtension -DX_BYTE_ORDER=X_LITTLE_ENDIAN -DNDEBUG -DFUNCPROTO=15 -DNARROWPROTO -DIN_MODULE -DXFree86Module -DGLXEXT -DXF86DRI -DGLX_DIRECT_RENDERING -DGLX_USE_DLOPEN -DGLX_USE_MESA -c mem.c
In file included from /usr/include/bits/types.h:29,
                 from /usr/include/unistd.h:190,
                 from mem.c:42:
/usr/lib/gcc-lib/i386-redhat-linux/3.2/include/stddef.h:201: conflicting types for `xf86size_t'
../../../../../programs/Xserver/include/xf86_libc.h:59: previous declaration of `xf86size_t'
In file included from mem.c:42:
/usr/include/unistd.h:193: conflicting types for `xf86ssize_t'
../../../../../programs/Xserver/include/xf86_libc.h:60: previous declaration of `xf86ssize_t'
In file included from mem.c:42:
/usr/include/unistd.h:310: conflicting types for `xf86read'
../../../../../programs/Xserver/include/xf86_ansic.h:270: previous declaration of `xf86read'
/usr/include/unistd.h:313: conflicting types for `xf86write'
../../../../../programs/Xserver/include/xf86_ansic.h:271: previous declaration of `xf86write'
/usr/include/unistd.h:383: conflicting types for `xf86usleep'
../../../../../programs/Xserver/include/xf86_ansic.h:342: previous declaration of `xf86usleep'
In file included from mem.c:42:
/usr/include/unistd.h:820:29: macro "getpagesize" passed 1 arguments, but takes just 0
In file included from mem.c:43:
/usr/include/sys/mman.h:59: conflicting types for `xf86mmap'
../../../../../programs/Xserver/include/xf86_ansic.h:272: previous declaration of `xf86mmap'
/usr/include/sys/mman.h:77: conflicting types for `xf86munmap'
../../../../../programs/Xserver/include/xf86_ansic.h:273: previous declaration of `xf86munmap'
mem.c:148: conflicting types for `_mesa_malloc'
../../../../../extras/Mesa/src/mem.h:59: previous declaration of `_mesa_malloc'
mem.c:158: conflicting types for `_mesa_calloc'
../../../../../extras/Mesa/src/mem.h:60: previous declaration of `_mesa_calloc'
mem.c:188: conflicting types for `_mesa_align_malloc'
../../../../../extras/Mesa/src/mem.h:63: previous declaration of `_mesa_align_malloc'
mem.c:215: conflicting types for `_mesa_align_calloc'
../../../../../extras/Mesa/src/mem.h:64: previous declaration of `_mesa_align_calloc'
mem.c:494: conflicting types for `_mesa_exec_malloc'
../../../../../extras/Mesa/src/mem.h:66: previous declaration of `_mesa_exec_malloc'
mem.c:569: conflicting types for `_mesa_memset16'
../../../../../extras/Mesa/src/mem.h:141: previous declaration of `_mesa_memset16'
make[7]: *** [mem.o] Error 1
make[7]: Leaving directory `/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc/programs/Xserver/GL/mesa/src'
make[6]: *** [all] Error 2
make[6]: Leaving directory `/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc/programs/Xserver/GL/mesa'
make[5]: *** [mesa] Error 2
make[5]: Leaving directory `/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc/programs/Xserver/GL'
make[4]: *** [GL] Error 2
make[4]: Leaving directory `/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc/programs/Xserver'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc/programs'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc'
make[1]: *** [World] Error 2
make[1]: Leaving directory `/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc'
make: *** [World] Error 2
make: Leaving directory `/home/mharris/rpmbuild/BUILD/XFree86-4.3.0/xc'
error: Bad exit status from /home/mharris/rpmbuild/tmp/rpm-tmp.4364 (%build)

Some feedback from jakub on the patch:
<jakub> mharris: better use fd -1 instead of 0 for MAP_ANONYMOUS
<jakub> mharris: and at least glapi.c should simply EXEC_MALLOC (getpagesize () - 128, 16) the first time it needs it
<jakub> mharris: and then return addresses from that buffer...
<jakub> mharris: and free in __attribute__((destructor))

O.K., I'll confess I don't understand why this isn't compiling. My patch was against rev 29 of the rpm. I suspect things have changed, since the gcc arguments passed to mem.c are not the same as mine. It appears xf86size_t is only defined in xf86_libc.h and xf86_OSlib.h, and neither of them gets included when I compile mem.c. Since I can't reproduce this, perhaps the best thing is to send me a pointer to the src rpm that generates the error, and I'll try debugging the build using that.

It's been more than a month now. I guess this has been fixed, right?

I just tried building the latest package, XFree86-4.3.0-42.src.rpm, with this patch enabled, and it built fine on x86. From my vantage point the bug is fixed. However, I did notice that application of the patch was disabled in the spec file; that will probably have to be remedied. I'll just assume the compile problem was one of those mysteries of the universe. If the rpmbuild fails again with the patch enabled, give me the details and assign it back to me. Otherwise I'm assuming this will sail through.

Version 2 of this patch was applied to the spec file on Sept 25:

* Thu Sep 25 2003 Mike A. Harris <mharris> 4.3.0-34
- Updated to XFree86-4.3.0-xf-4_3-branch-2003-09-26.patch to pick up new security fixes from CVS
- Updated XFree86-4.3.0-redhat-libGL-exec-shield-fixes.patch to new patch XFree86-4.3.0-redhat-libGL-exec-shield-fixes-v2.patch which reorders some includes in mem.c so it builds. Still cambridge only.

It had previously been flagged to compile in only for build_cambridge, and recently I renamed that flag to build_yarrow for the final release name. It was flagged this way because RHL 8.0/9 doesn't have exec-shield anyway. I didn't want to introduce the possibility of regression in erratum updates for 9, or to needlessly break 8.0 if for some reason it didn't work (pedantic paranoia, mostly). I wanted it only in Fedora Core 1 until it had been tested well enough in the wild to potentially apply to other builds. So this patch has been applied for over a month; the bug report just wasn't updated to reflect that. Doh.

No problems reported yet, John, so your fix seems to work. I'm closing the bug for now. If anyone has problems with exec-shield, please reopen it, unless you think it is a different issue. In that case, open a new bug report for us to investigate. If you test this, feel free to add a "tested and it works for me now" note to this report as well.

Closing as RAWHIDE, fixed in 4.3.0-25
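jakub's IRC suggestion above can be sketched as a small bump allocator: map one executable buffer the first time glapi.c needs it, hand out addresses from that buffer, and release it in a destructor. This is a hypothetical illustration, not the committed code. Its assumptions:

- exec_buf_alloc() and the static variables are invented names.
- The buffer is a single page mapped directly with mmap() (fd -1, MAP_ANONYMOUS), rather than through the EXEC_MALLOC macro from the IRC log.
- __attribute__((destructor)) assumes GCC or Clang.

The sketch is also not thread-safe; a real libGL version would need locking.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS under strict -std modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define EXEC_ALIGN 16              /* alignment taken from jakub's EXEC_MALLOC(..., 16) */

/* Hypothetical names: one executable page shared by all callers. */
static unsigned char *exec_buf;    /* NULL until first use */
static size_t exec_buf_size;
static size_t exec_buf_used;

/* Return 'size' bytes of executable memory, or NULL when the page is full
 * or cannot be mapped. Individual blocks are never freed. */
void *exec_buf_alloc(size_t size)
{
    if (exec_buf == NULL) {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        void *p = mmap(NULL, page, PROT_READ | PROT_WRITE | PROT_EXEC,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        exec_buf = p;
        exec_buf_size = page;
        exec_buf_used = 0;
    }
    /* Round the current offset up to the next EXEC_ALIGN boundary. */
    size_t start = (exec_buf_used + EXEC_ALIGN - 1) & ~(size_t)(EXEC_ALIGN - 1);
    if (start > exec_buf_size || size > exec_buf_size - start)
        return NULL;
    exec_buf_used = start + size;
    return exec_buf + start;
}

/* Unmap the page when the library is unloaded or the process exits. */
__attribute__((destructor))
static void exec_buf_release(void)
{
    if (exec_buf != NULL) {
        munmap(exec_buf, exec_buf_size);
        exec_buf = NULL;
    }
}
```

Two 10-byte requests land 16 bytes apart, because the second offset is rounded up to the alignment. A request larger than the remaining space returns NULL rather than mapping a second page.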