Bug 952946
Summary: | 32-bit process stack space allocation is broken in PIE mode | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Tom Lane <tgl> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | NEW --- | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | rawhide | CC: | cse.cem+redhatbugz, fweimer, gansalmon, hhorak, itamar, jonathan, kernel-maint, madhu.chinakonda, moez.roy, pmatouse, praiskup, sgrubb, tgl |
Target Milestone: | --- | Keywords: | Reopened, Triaged |
Target Release: | --- | ||
Hardware: | other | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-07-19 10:09:36 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1199775, 947022 | ||
Attachments: |
Created attachment 736660
patch to print getrlimit and memory map every so often (thanks to John Reiser for suggestions)
Created attachment 736661
postmaster log extract
Note the discussion on fedora-devel: http://lists.fedoraproject.org/pipermail/devel/2013-April/181553.html

This is probably something we want to chase, as using the hardened build is definitely desired.

I've been able to replicate this on a 32-bit F18 installation, not using mock, just building the SRPM as a normal unprivileged user. So that lets mock off the hook for sure, and I can say it's not a 64-bit-kernel-vs-32-bit-userland issue either. This run was with kernel 3.8.7-201.fc18.i686.PAE, and whatever F18 packages beaker is installing at the moment.

Created attachment 737408
standalone test case
Here's a standalone testcase in 41 lines of C. Compile with
gcc -m32 -pie -fPIE -g -o ./where where.c
The problem is that sbrk() can grow the heap of a -pie program until the heap overlaps the stack, with no complaint from kernel nor glibc.
The testcase repeatedly expands the heap by 0.5MB, printing the address space each time. Whenever it detects an overlap, the testcase pauses by reading one byte from stdin. Execution continues until sbrk() fails.
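The core of the loop described above might look roughly like the following sketch. It is not the attached where.c (whose source is not reproduced here): the function name grow_heap_until_overlap is hypothetical, the stack pointer is approximated by the address of a local variable, and the memory-map dump and stdin pause are left out. Each new break is compared against the lowest address the stack may reach under RLIMIT_STACK.

```c
#define _DEFAULT_SOURCE /* expose sbrk() in glibc's <unistd.h> */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper, not the attached where.c: grow the heap by `step`
 * bytes up to `max_steps` times and count how often the new break lands
 * inside the range RLIMIT_STACK reserves below the stack.
 * Returns the number of overlaps seen, or -1 if getrlimit() fails. */
int grow_heap_until_overlap(intptr_t step, int max_steps)
{
    struct rlimit rl;
    int marker;                 /* its address approximates the stack pointer */
    int overlaps = 0;

    assert(step > 0);
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 0;               /* no finite reservation to compare against */

    uintptr_t sp = (uintptr_t)&marker;
    uintptr_t stack_floor = sp > rl.rlim_cur ? sp - (uintptr_t)rl.rlim_cur : 0;

    for (int i = 0; i < max_steps; i++) {
        void *old_brk = sbrk(step);         /* returns the previous break */
        if (old_brk == (void *)-1) {        /* stop once sbrk() fails */
            perror("sbrk");
            break;
        }
        uintptr_t heap_end = (uintptr_t)old_brk + (uintptr_t)step;
        printf("sbrk(%#lx)=%p\n", (unsigned long)step, old_brk);
        if (heap_end > stack_floor) {
            printf("warning: possible overlap of heap and stack\n");
            overlaps++;
        }
    }
    return overlaps;
}
```

In a 64-bit process the heap normally sits far below the stack, so a call such as grow_heap_until_overlap(0x80000, 4) will usually report no overlap; the interesting case is a 32-bit PIE build like the one produced by the gcc command above.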
Typical output when the testcase detects an overlap that nobody else notices:
f779e000-f779f000 rw-p 00000000 08:15 7024004 ./where
f779f000-f77a1000 rw-p 00000000 00:00 0
f83b3000-ff133000 rw-p 00000000 00:00 0 [heap]
ff926000-ff947000 rw-p 00000000 00:00 0 [stack]
stack rlim_cur=0x800000 rlim_max=0xffffffff stack=0xff945e08
sbrk(0x80000)=0xff133000
warning: possible overlap of heap and stack
The overlap happens because (0xff133000 + 0x80000) > (0xff945e08 - 0x800000), i.e. 0xff1b3000 > 0xff145e08.
The high end of the sbrk is (0xff133000 + 0x80000), which is the low end plus the size; the low end of stack space is (0xff945e08 - 0x800000), which is the current stack address minus the maximum stack size (rlim_cur).
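The inequality above can be checked mechanically with the numbers from the sample output. The function name heap_overlaps_stack is hypothetical; it simply encodes the two quantities just described (high end of the sbrk versus low end of stack space).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper encoding the comparison above: does growing the heap
 * from heap_lo by incr bytes reach past sp - stack_max, the lowest address
 * the stack may legitimately grow down to? 64-bit arithmetic keeps the
 * sums from wrapping when fed 32-bit addresses. */
bool heap_overlaps_stack(uint64_t heap_lo, uint64_t incr,
                         uint64_t sp, uint64_t stack_max)
{
    assert(incr <= UINT64_MAX - heap_lo);                    /* no wraparound */
    uint64_t heap_hi  = heap_lo + incr;                       /* high end of the sbrk */
    uint64_t stack_lo = sp > stack_max ? sp - stack_max : 0;  /* low end of stack space */
    return heap_hi > stack_lo;
}
```

With the sample values, heap_overlaps_stack(0xff133000, 0x80000, 0xff945e08, 0x800000) compares heap_hi = 0xff1b3000 against stack_lo = 0xff145e08 and returns true: the heap reaches 0x6d1f8 bytes (about 436 KiB) into the stack's reserved range. A 0x10000-byte increment would have ended at 0xff143000, just below it.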
(In reply to comment #9)
> The problem is that sbrk() can grow the heap of a -pie program until the
> heap overlaps the stack, with no complaint from kernel nor glibc.

Oooh, great diagnosis. I was about to object that this must be a different symptom from what I'm seeing in Postgres, but I'd forgotten that the Postgres test case is eating heap space even faster than it's eating stack (as you can easily see from the successive memory maps in the log in comment #2). There's still some daylight between heap and stack in the last map dump, but extrapolation says they'd have overlapped by several MB by the time of the crash. I'm now thinking that the core dump comes when heap-data manipulations corrupt the stack.

Further experimentation says that sbrk will complain only when extending the heap would overrun the currently mapped bottom of stack (0xff926000 in John's sample map above), forgetting that we may have promised via RLIMIT_STACK that the stack can be extended to below that. And apparently, the stack expansion code doesn't notice that it's intruding on already-allocated heap space either.

Created attachment 737459
postgres log with (edited) memory map dump after each 1K of stack growth
I modified the previously shown check-stack.patch so it'd print the memory map each time it printed "observed stack", ie after each 1K of stack growth. This attachment shows the last few printouts before crash; for brevity I omitted all but the last three lines of each memory map. It's rather interesting to watch the map start labeling the stack as "[heap]". But I think what's really happening is that the code that ought to expand the stack just silently gives up once there's no room to expand the stack anymore. Which is not terribly surprising. So IMO John is right to affix the blame on the sbrk() side of things: the heap should not have been allowed to intrude into the region reserved by RLIMIT_STACK. These maps show conclusively that it was so allowed --- the region reserved for stack should go down to 0xff3b5000, but here's heap allocated up to 0xff9ce000.
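The gap described above, between what sbrk() checks (the currently mapped bottom of [stack]) and what RLIMIT_STACK promises (the top of [stack] minus the limit), can be made concrete with a small sketch. All three function names are hypothetical, and the three-way classification is just a way of labelling the behaviour reported in this thread, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parse the address range at the start of a /proc/<pid>/maps line, e.g.
 * "ff926000-ff947000 rw-p 00000000 00:00 0 [stack]". */
bool parse_maps_range(const char *line, unsigned long long *start,
                      unsigned long long *end)
{
    assert(line != NULL && start != NULL && end != NULL);
    return sscanf(line, "%llx-%llx", start, end) == 2;
}

/* Lowest address the stack may grow down to: the top of the [stack]
 * mapping minus the RLIMIT_STACK soft limit. */
unsigned long long reserved_stack_floor(unsigned long long stack_end,
                                        unsigned long long rlim_cur)
{
    return stack_end > rlim_cur ? stack_end - rlim_cur : 0;
}

/* Classify a proposed new program break (end of the heap):
 *   0 = stays below the reserved stack range
 *   1 = enters the reserved range but stays below the mapped stack,
 *       the case this report says sbrk() lets through
 *   2 = reaches the currently mapped stack, the case sbrk() refuses */
int classify_brk(unsigned long long new_brk, unsigned long long stack_start,
                 unsigned long long floor)
{
    if (new_brk >= stack_start)
        return 2;
    if (new_brk > floor)
        return 1;
    return 0;
}
```

Fed John's [stack] line and rlim_cur=0x800000, the reserved floor comes out at 0xff147000. The break after his sbrk(0x80000), 0xff133000 + 0x80000 = 0xff1b3000, lands in category 1: far below the mapped stack at 0xff926000, so the kernel accepted it, yet inside the range RLIMIT_STACK promised to the stack.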
*********** MASS BUG UPDATE **************
We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.
Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.
If you experience different issues, please open a new bug report for those.

Yes. This is still problematic. With ASLR enabled, if you allocate memory (it does not matter whether you use brk(), sbrk() or malloc()), the [heap] mapping grows up into the area where the stack should reside (at least if RLIMIT_STACK was supposed to be guaranteed). Ideally, glibc's allocation functions should not extend the heap into the RLIMIT_STACK region when ASLR is on. Otherwise, the stack mapping may shrink to a _very_ small space. Is there any known workaround for this behaviour (e.g. if we were allowed to guarantee some minimum stack size)?
Pavel

Created attachment 806268
adjusted reproducer using malloc()
The attached reproducer is a further enhancement of John's reproducer, showing that it is
possible to cut the stack's virtual address space down to a minimum by enlarging
the heap. Just run 'make && ./program' (reproduced on x86_64/i386
F19).
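Pavel's question about guaranteeing a minimum stack size gets no answer in this thread. One idea that follows from the earlier observation that sbrk() only refuses to cross the currently mapped bottom of the stack is to pre-fault the stack early in main(), so the [stack] mapping already covers the needed depth before the heap grows. This is an untested sketch: prefault_stack is a hypothetical name, it protects only the depth actually touched (not the full RLIMIT_STACK), and the touched pages become resident memory.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical mitigation, not from this report: touch `bytes` of stack
 * below the current frame so the kernel extends the [stack] mapping now,
 * before the heap has grown toward it. Keep `bytes` well below
 * RLIMIT_STACK; the touched pages become resident memory.
 * Returns 0 on success, -1 if the page size is unavailable. */
int prefault_stack(size_t bytes)
{
    assert(bytes > 0);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    volatile char buf[bytes];           /* VLA: allocated on the stack */
    /* Walk from the highest address downward, one page at a time, so each
     * access lands just below pages that are already mapped. */
    size_t off = bytes;
    while (off > (size_t)page) {
        buf[off - 1] = 0;
        off -= (size_t)page;
    }
    buf[0] = 0;                         /* lowest byte, covers the last page */
    return 0;
}
```

Called at the top of main(), e.g. prefault_stack(4u << 20) for 4 MiB, this trades resident memory for a stack mapping that, per the behaviour described in this report, later brk() calls should refuse to grow into.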
*********** MASS BUG UPDATE **************
This bug has been in a needinfo state for more than 1 month and is being closed with insufficient data due to inactivity. If this is still an issue with Fedora 19, please feel free to reopen the bug and provide the additional information requested.

I think leaving this bug in NEEDINFO state is a mistake; it implies (at least to some people) that more information is needed from the bug reporter. ISTM that at this point the ball is definitely in the kernel maintainers' court: to either fix it, or provide a workaround for sbrk's failure to honor RLIMIT_STACK.

*********** MASS BUG UPDATE **************
We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.
Fedora 19 has now been rebased to 3.14.4-100.fc19. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.
If you experience different issues, please open a new bug report for those.

This bug is languishing (obviously). I'm marking it so it won't get hit by the auto-rebase-needinfo dealings. Has anyone reported this to the upstream MM developers? That would likely be the best bet for resolution.

This came up again just a few days ago on the Postgres mailing lists: http://www.postgresql.org/message-id/20140519091808.GA7296@msgid.df7cb.de
Debian's PG packager is now seeing it on all their 32-bit architectures (*not* only i386).

I think marking this as "i686 only" was incorrect, see comment #20. We don't have a generic '32-bit hardware' category and 'All' is also incorrect. I guess I'll mark it 'other'.

This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 20. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '20'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.
Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 20 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.
Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

I'm told that this has been fixed upstream by http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a87938b2e246b81b4fb713edb371a9fa3c5c3c86
Dunno if that's migrated into any RH kernels yet, but if so you could try checking whether the problem's gone away, and if so consider enabling PIE for PG.

(In reply to Tom Lane from comment #24)
> I'm told that this has been fixed upstream by
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a87938b2e246b81b4fb713edb371a9fa3c5c3c86
>
> Dunno if that's migrated into any RH kernels yet, but if so you could try
> checking whether the problem's gone away, and if so consider enabling PIE
> for PG.

It's in 4.1, so all current Fedora releases have that fix. Tested on F22 i386, and PostgreSQL's testsuite passed. Same for Rawhide scratch build.
I'm afraid that the problem with RLIMIT_STACK not being guaranteed still exists (the reproducer in comment #15 still segfaults on i386), but that is apparently not a problem for PostgreSQL (so I'll enable hardening). I wonder whether the [heap] and [stack] collision shouldn't be fixed as well... (I bet it's better to leave this bug open). FYI, after a bit of testing, I also filed bug 1263974 (I'm not 100% sure about its correctness); that issue made my testing a bit more uncomfortable.

This message is a reminder that Fedora 21 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 21. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '21'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.
Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 21 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.
Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.
Thank you for reporting this bug and we are sorry it could not be fixed.

As far as I can tell, this still hasn't been fixed.
Created attachment 736659
patch for postgresql.spec to enable _hardened_build and add debugging patch

Description of problem:
I find that enabling _hardened_build breaks PostgreSQL 32-bit builds: they fail regression tests fairly consistently, with symptoms indicating that the kernel is providing only 2MB of stack space even though getrlimit(RLIMIT_STACK) claims the stack limit is 8MB. This does not happen without _hardened_build, and even with it the failure is not 100% consistent, so there's something rotten in the address space randomization stuff. It should be noted that I'm testing 32-bit builds under mock with a 64-bit kernel; I do not know whether the kernel's word width is relevant here.

Version-Release number of selected component (if applicable):
kernel-3.9.0-0.rc5.git1.301.fc19.x86_64

How reproducible:
Seems close to 100% when using the F19-alpha environment on a laptop. I see the same behavior on my due-for-retirement F16 workstation, although on that box it only fails maybe 50% of the time; don't know if this is related to the beefier hardware or the older kernel.

Steps to Reproduce:
[ sorry for the overcomplicated test case, but I've been unable to reproduce this with a simple test program ]
1. Grab current postgresql sources from Fedora package git, and add _hardened_build to the specfile; optionally add check-stack.patch, which attempts to provide some relevant debug output.
2. Build in a 32-bit environment under mock, viz
   /usr/bin/mock -r fedora-19-i386 /tmp/postgresql-9.2.4-1y.fc20.src.rpm
   On an actual 32-bit machine it might not be necessary to use mock ... or then again maybe the 64-bit kernel is an important part of the equation.

Actual results:
Regression tests fail due to a crash in the "infinite_recurse()" test case, which is meant to verify that the platform's stack depth limit has been correctly detected. If this doesn't happen immediately, try it a few times.

Expected results:
Should pass reliably.
Additional info: I've attached a specfile patch, the referenced check-stack.patch, and an extract from the postmaster log showing what the check-stack patch prints before dying. It is quite clear that the effective stack depth limit is only about 2MB, even though getrlimit claims it's 8MB. I've tried to generate an equivalent failure using a short test program, without much success. I speculate that the reason postgres fails has to do with the fact that it loads a fair number of shared libraries (cf memory maps in log), or with the fact that it creates a SysV-style shared memory segment.
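The infinite_recurse() test depends on the program detecting its own stack depth limit. A minimal sketch of such a guard follows; the names init_stack_guard, stack_depth_ok and recurse_until_limit, the slop parameter and the 2 MB fallback are all hypothetical and not taken from PostgreSQL. Under the bug described in this report, a guard like this can still report success while the process crashes, because it trusts getrlimit(RLIMIT_STACK) and the heap may already occupy addresses that limit promised to the stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

/* Hypothetical stack-depth guard, loosely modelled on the kind of check the
 * infinite_recurse() regression test exercises; not PostgreSQL's code. */
static uintptr_t stack_base;   /* stack address recorded at startup */
static size_t stack_budget;    /* bytes of stack we allow ourselves */

/* Record the current stack address and derive a budget from RLIMIT_STACK,
 * keeping `slop` bytes in reserve. The 2 MB fallback for an unlimited or
 * unreadable limit is an arbitrary choice for this sketch. */
void init_stack_guard(size_t slop)
{
    char here;
    struct rlimit rl;

    stack_base = (uintptr_t)&here;
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        stack_budget = 2u * 1024 * 1024;
    else if (rl.rlim_cur > slop)
        stack_budget = (size_t)(rl.rlim_cur - slop);
    else
        stack_budget = (size_t)(rl.rlim_cur / 2);
}

/* True while the distance from stack_base to the current frame is within budget. */
bool stack_depth_ok(void)
{
    char here;
    assert(stack_budget > 0 && "init_stack_guard() must run first");
    uintptr_t cur = (uintptr_t)&here;
    size_t depth = stack_base > cur ? stack_base - cur : cur - stack_base;
    return depth <= stack_budget;
}

/* Recurse, using roughly 1 KB of stack per call, until the guard trips;
 * returns the depth reached. */
int recurse_until_limit(int depth)
{
    volatile char pad[1024];
    pad[0] = (char)depth;
    if (!stack_depth_ok())
        return depth;
    int reached = recurse_until_limit(depth + 1);
    pad[1] = pad[0];   /* touch the frame after the call so it is not a tail call */
    return reached;
}
```

On a healthy system, init_stack_guard(1024 * 1024) followed by recurse_until_limit(0) stops cleanly about 1 MB short of the limit; in the failing 32-bit PIE builds described here, the equivalent check in PostgreSQL evidently did not prevent the crash.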