Bug 1341829
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | Systemd-coredump doesn't save any core files | | |
| Product | [Fedora] Fedora | Reporter | Göran Uddeborg <goeran> |
| Component | selinux-policy-targeted | Assignee | Lukas Vrabec <lvrabec> |
| Status | CLOSED NEXTRELEASE | QA Contact | Ben Levenson <benl> |
| Severity | medium | Docs Contact | |
| Priority | medium | | |
| Version | 24 | CC | cfergeau, cschalle, dwalsh, fedora, fedora, fweimer, jberan, jkurik, johannbg, knight, lnykryn, lvrabec, matej, mattdm, mcatanzaro+wrong-account-do-not-cc, msekleta, muadda, riehecky, ssahani, s, stefw, systemd-maint, tpopela, zbyszek |
| Target Milestone | --- | Keywords | Triaged |
| Target Release | --- | | |
| Hardware | Unspecified | | |
| OS | Unspecified | | |
| Whiteboard | | | |
| Fixed In Version | selinux-policy-3.13.1-241.fc26.noarch | Doc Type | If docs needed, set a value |
| Doc Text | | Story Points | --- |
| Clone Of | | Environment | |
| Last Closed | 2017-07-04 18:22:29 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | 1365435 | | |
| Bug Blocks | 1309172, 1405995 | | |
| Attachments | | | |
Description
Göran Uddeborg
2016-06-01 19:59:08 UTC
You have to set ulimit now (and disable SELinux, bug #1317927) for systemd-coredump to work in F24. This really sucks; neither was required in F23. I was hoping that having coredumpctl enabled by default could be a F25 feature thanks to recent integration work by the ABRT team, but it looks like that requires either setting ulimit systemwide (probably preferable) or reverting the change to respect ulimit.

(In reply to Michael Catanzaro from comment #1)
> requires either setting ulimit systemwide (probably preferable)

Starting with systemd-229, 'ulimit -c' (RLIMIT_CORE) is "unlimited" for all processes by default [1][2][3] (bug #1309172).

1: https://github.com/systemd/systemd/blob/master/src/core/main.c#L1500
2: https://github.com/systemd/systemd/blob/master/NEWS (CHANGES WITH 229)
3: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/HQ4JFTYLPT5GRW6AD4M2MWGMRAPE7ITN/

Hmm, this seems not to be the case on my system, not sure why. /proc/1/limits shows

Max core file size unlimited unlimited bytes

but then most other processes have it set to 0. This is a system I've been upgrading from f21 or so, so maybe this is related to a change which was not properly done on upgrade? (don't think I have a clean f24 install around)

(In reply to Jakub Filak from comment #2)
> (In reply to Michael Catanzaro from comment #1)
> > requires either setting ulimit systemwide (probably preferable)
>
> Starting with systemd-229, 'ulimit -c' (RLIMIT_CORE) is "unlimited" for all
> process by default [1][2][3] (bug #1309172).
>

On a fresh f24 install:

core file size (blocks, -c) 0

Indeed, it seems that systemd is screwing up somehow. On a fresh installation of F24, "Max core file size" is unlimited for PID 1, but for various daemons it's set to 0.
This doesn't match what systemd thinks should be set for the service:

$ systemctl show -p LimitCORE avahi-daemon
LimitCORE=18446744073709551615

Locally I'm running F24 with systemd from git, and the limits are infinity as expected. But I don't see any relevant changes in code between v229 and my version. I'm stumped as to what the cause of this discrepancy is, so I've reported the bug upstream.

Oh, it seems to be selinux related... With "permissive", most daemons and my user session have no limit, while with "enforcing", most daemons and the user session all have 0. But /var/log/audit/audit.log does not seem to contain any useful data. Maybe some dontaudit rule? I'll reassign this to selinux.

-- short reproducer:
$ grep core /proc/$(pidof systemd-journald)/limits

This returns "Max core file size unlimited unlimited bytes" under F24/targeted/permissive, and "Max core file size 0 0 bytes" with F24/targeted/enforcing.

Hi Zbigniew, is this a different bug from bug #1317927?

#1317927 is long and messy, but it seems that it's a separate issue: ProtectSystem=full and denying mounton, which is used to implement it (https://bugzilla.redhat.com/show_bug.cgi?id=1317927#c18). The effect of both is very similar (no core file), but it seems that there are two underlying causes.

Hmm, iirc selinux policy can actually affect rlimit setting, and I think the policy prohibits this for PID 1 atm. Note that PID 1 in systemd will bump RLIMIT_CORE to infinity early on, but it ignores failures on this. All services started during runtime simply inherit this then. If the bumping fails nothing will be inherited. It would hence be good to know if selinux permissive vs. enforcing has an effect on RLIMIT_CORE for PID 1 itself. This is how it should look:

$ grep core /proc/1/limits
Max core file size unlimited unlimited bytes

This is on a permissive system. Question is, does it look like that on enforcing too?
If not, then I figure all that's missing is an selinux policy change to permit PID 1 to bump RLIMIT_CORE for itself. (And of course, it might make sense to change systemd to log at debug level if bumping fails, instead of being entirely quiet about it.)

(In reply to Lennart Poettering from comment #9)
> This is on a permissive system. Question is, does it look like that on
> enforcing too?

Unfortunately that is how it looks on my enforcing system.

Indeed. After a fresh reboot with selinux-policy-targeted-3.13.1-191.10.fc24 (the version supposed to fix bug #1317927?) I get the output below. Retrying the test case, I still get the crash listed with "coredumpctl list", but "coredumpctl gdb <pid>" fails the same way as in comment 0. (I guess everyone realized I meant "coredumpctl gdb <pid>" in step 5.)

mimmi$ sudo grep core /proc/1/limits
Max core file size unlimited unlimited bytes
mimmi$ sudo grep core /proc/`pidof systemd-journald`/limits
Max core file size 0 unlimited bytes

Same issue here.

Goran,
Could you set selinux to permissive and also run:
# semodule -DB
And then reproduce the issue? Could you see any AVCs then?
Thank you.

Created attachment 1191322 [details]
Audit log with dontaudit disabled
No problem, I attach the audit log from the experiment. There are three AVCs. Two of them I recognize as ones I usually see when I turn off dontaudit. I'm less sure about the socket read/write attempt. Could that be a clue?
Sorry, my bad! I was redoing my initial experiment with dontaudits disabled. But that of course got hit by my login session having core size limit set to 0. I guess you meant I should reboot the machine with dontaudits disabled, and hand you THAT list. I'll get back to that, but I'll have to find a "service window" when I don't disturb.

Created attachment 1192584 [details]
Audit log from reboot with dontaudit disabled
Ok, so here is a new try. I attach the audit log file, starting at the boot with dontaudit rules disabled.
Looking a bit, I started to think about all those "rlimitinh" denials. That is something which seems to be dontaudited everywhere. Wouldn't that have exactly this effect? According to http://seedit.sourceforge.net/doc/access_vectors/ "If this is denied, signal state is cleared". I'm not sure exactly what "cleared" means in this case. But it sounds suspicious to me.

I make many mistakes in this report. :-( The quote should be "If this is denied, rlimit is cleared".

Hi Göran,
Could you test it with local policy?
$ cat local.cil
(allow init_t systemd_coredump_t (process (noatsecure rlimitinh)))
# semodule -i local.cil
And reproduce your issue. Thanks.

"cil", that was something new to me! Anyway, I installed the module and rebooted. I couldn't see any change. I also wonder how it COULD have helped. If I understand the module correctly, it will only affect the systemd-coredump process itself.

Maybe this report has become a bit confused. So maybe it's appropriate to clarify MY understanding of the situation. The problem is that by default, when a process gets a signal that normally would generate a core, no core is generated and collected by systemd-coredump. The reason seems to be that the ulimit for core files is set to 0, again by default. If I explicitly change the ulimit to unlimited in a shell, and retry the experiment, I DO get a core file saved.

According to comment 2, the intention is for the core ulimit to be unlimited. The fact it isn't seems to be the problem. My guess in comment 16 and comment 17 was that this could be because SELinux mostly denies the rlimitinh access. These denials don't show up in the log since they are dontaudit:ed. But they still foil systemd's attempt to allow core dumps in general.

Since I consider coredumpctl to be a priority feature for Fedora Workstation, I am planning to propose disabling SELinux by default in Workstation until this can be fixed.
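The comparison used throughout this report (PID 1 versus a daemon it started) can be sketched as a small shell helper. This is only an illustration, not something posted in the thread: the function names `core_limit` and `compare_core_limits` are hypothetical, and the sketch assumes the `/proc/<pid>/limits` layout quoted in the comments above. Reading another user's limits file may need root, as with the `sudo grep` calls earlier.

```shell
# Print the soft and hard core-size limits found in a /proc/<pid>/limits
# listing read on standard input. The line looks like
#   Max core file size   0   unlimited   bytes
# so the soft limit is field 5 and the hard limit is field 6.
core_limit() {
    awk '/^Max core file size/ { print $5, $6 }'
}

# Hypothetical helper: show PID 1 next to another process for comparison.
compare_core_limits() {
    pid="$1"
    printf 'PID 1: %s\n' "$(core_limit < /proc/1/limits)"
    printf 'PID %s: %s\n' "$pid" "$(core_limit < "/proc/$pid/limits")"
}
```

For example, `compare_core_limits "$(pidof systemd-journald)"` would show whether the daemon kept the limit that PID 1 has.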
Tested with a fresh copy of Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso:
- booting in the default configuration:
  ulimit -c is 0 in liveuser's gnome-terminal
  ulimit -c is 0 in xterm started from alt-f2
- booting with enforcing=0 on the kernel command line:
  ulimit -c is unlimited in gnome-terminal
  ulimit -c is unlimited in xterm started from alt-f2

If I raise ulimit I get a successful core dump:

[liveuser@localhost ~]$ ulimit -c unlimited
[liveuser@localhost ~]$ ulimit -c
unlimited
[liveuser@localhost ~]$ sudo systemctl stop abrt*
[liveuser@localhost ~]$ bash -c 'kill -SEGV $$'
Segmentation fault (core dumped)
[liveuser@localhost ~]$ coredumpctl
TIME PID UID GID SIG PRESENT EXE
Tue 2016-10-18 13:26:17 EDT 2549 1000 1000 11 * /usr/bin/bash
[liveuser@localhost ~]$ coredumpctl gdb
(works)

So the issue seems to boil down to selinux rules.

(In reply to Zbigniew Jędrzejewski-Szmek from comment #22)
> Tested with a fresh copy of Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso:
> - booting in the default configuration:
> ulimit -c is 0 in liveuser's gnome-terminal
> ulimit -c is 0 in xterm started from alt-f2
> - booting with enforcing=0 on the kernel command line:
> ulimit -c is unlimited in gnome-terminal
> ulimit -c is unlimited in xterm started from alt-f2

That means that SELinux is preventing systemd from updating its RLIMIT_CORE:

https://github.com/systemd/systemd/blob/master/src/core/main.c#L1516
setrlimit(RLIMIT_CORE, &RLIMIT_MAKE_CONST(RLIM_INFINITY))

The issue was introduced in Fedora 24: when systemd-229 was released and the default RLIMIT_CORE was changed to UNLIMITED (and because the ABRT maintainers didn't know about this major change), ABRT started leaving core files all around the file system. That proves that SELinux wasn't preventing systemd from updating RLIMIT_CORE at that time.
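The manual workaround shown in this comment (raising the limit in the shell before crashing a process) relies on the fact that a process may raise its soft limit up to its hard limit without extra privileges. A minimal sketch of that step follows; the function names are hypothetical, and it assumes a shell whose `ulimit` builtin accepts `-S`, `-H` and `-c`, as bash does.

```shell
# Succeed if the current shell (and so its children) may write core files,
# that is, if the soft RLIMIT_CORE is anything other than 0.
core_dumps_possible() {
    [ "$(ulimit -Sc)" != "0" ]
}

# Raise the soft core-size limit to whatever the hard limit allows.
# If the hard limit is itself 0, this changes nothing and only a
# privileged process can lift it.
raise_core_soft_limit() {
    ulimit -Sc "$(ulimit -Hc)"
}
```

On a system showing "0 unlimited" for a process, as in the journald output earlier in this report, only the soft limit is affected, which is why `ulimit -c unlimited` succeeds in the session above.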
(In reply to Jakub Filak from comment #23)
> That means that SELinux is preventing systemd from updating its RLIMIT_CORE

You don't think, as I suspected in comment #16, that it allows SETTING of the limit, but prevents children from INHERITING the new value?

That seems to be happening. PID 1 has core=unlimited rlimit, but various child processes have core=0. The default for PID 1 seems to be core=0; that's what I see if I boot with init=/bin/bash, and strace reveals no setrlimit calls from bash. So it seems systemd successfully sets rlimit core=unlimited for itself, but this is not inherited as expected.

(In reply to Michael Catanzaro from comment #21)
> Since I consider coredumpctl to be a priority feature for Fedora
> Workstation, I am planning to propose disabling SELinux by default in
> Workstation until this can be fixed.

Just noting I recall that SELinux enablement is a Fedora shipping default the Council previously stated was not variable per addition. Which means we need to figure out how to resolve this issue between SELinux, systemd, and any other required developer teams.

Sorry, *edition.

On the Evaluation meeting for Prioritized bugs we have agreed not to approve this bug for the "Prioritized bugs list". It seems likely that this will end up fixed as a dependency of other changes in Fedora Workstation, so we don't think we need to call it out as requiring special attention.

Hi,
Could somebody test it with the latest selinux-policy rpm package?
http://koji.fedoraproject.org/koji/buildinfo?buildID=822892
I added some changes there.

It's still broken:

Dec 08 08:19:14 victory-road systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Dec 08 08:19:14 victory-road systemd[1]: Started Process Core Dump (PID 5730/UID 0).
Dec 08 08:19:14 victory-road audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-5730-0 comm="systemd" exe="/usr/lib/systemd/systemd"
Dec 08 08:19:14 victory-road systemd-coredump[5737]: Core Dumping has been disabled for process 2403 (epiphany).
Dec 08 08:19:14 victory-road systemd-coredump[5737]: Process 2403 (epiphany) of user 1000 dumped core.

I guess ulimit is still 0.

Do you see any AVC denials? Please run:
# semodule -DB
# reproduce the scenario
# ausearch -m USER_AVC,AVC -ts recent
Thanks.

Yeah I see a bunch of denials for systemd, then (ironically?) a couple for setroubleshootd itself, then these two:

time->Thu Dec 8 11:59:51 2016
type=AVC msg=audit(1481219991.533:361): avc: denied { rlimitinh } for pid=14725 comm="systemd-coredum" scontext=system_u:system_r:init_t:s0 tcontext=system_u:system_r:systemd_coredump_t:s0 tclass=process permissive=0
----
time->Thu Dec 8 11:59:51 2016
type=AVC msg=audit(1481219991.533:362): avc: denied { noatsecure } for pid=14725 comm="systemd-coredum" scontext=system_u:system_r:init_t:s0 tcontext=system_u:system_r:systemd_coredump_t:s0 tclass=process permissive=0

This is with selinux-policy-3.13.1-225.1.fc25 from updates-testing.

Created attachment 1229583 [details]
ausearch -m USER_AVC,AVC -ts recent
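When the ausearch output is long, the denials relevant to this bug are easier to spot once each record is reduced to its permission, source type and target type. The filter below is a rough sketch of that idea, not a tool used in the thread: `summarize_avc` is a hypothetical name, and the pattern assumes the record layout of the two AVC lines quoted above, with the type as the third colon-separated field of each context.

```shell
# Reduce each "avc: denied" record on standard input to
#   <permissions> <source type> -> <target type>
# Lines that are not AVC denials (timestamps, separators) are dropped.
summarize_avc() {
    sed -n 's/.*avc:  *denied  *{ *\([^}]*[^ }]\) *} .*scontext=[^:]*:[^:]*:\([^:]*\):.*tcontext=[^:]*:[^:]*:\([^: ]*\):.*/\1 \2 -> \3/p'
}
```

A possible use is `ausearch -m USER_AVC,AVC -ts recent | summarize_avc | sort | uniq -c`, which groups identical denials; for the records above it would list `rlimitinh init_t -> systemd_coredump_t` and `noatsecure init_t -> systemd_coredump_t`.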
We were discussing this in the Workstation working group meeting -- the bug is still present. Lukas, any progress here?

Paul,
Working on a fix right now. I'll provide more info ASAP.

Okay, I have a fix for this issue.

Quick workaround:
1. # cat domain.cil
   (allow init_t domain (process (rlimitinh)))
2. semodule -i domain.cil

Testing:
# getenforce
Enforcing
# grep core /proc/`pidof systemd-journald`/limits
Max core file size unlimited unlimited bytes
# sleep 30
# hit ^\ to generate a SIGQUIT
# coredumpctl
Tue 2017-01-17 16:25:38 CET 1207 0 0 3 present /usr/bin/sleep

Build will be available ASAP.

*** Bug 1365435 has been marked as a duplicate of this bug. ***

Hi Lukas, will an update be available for this soon?

(In reply to Michael Catanzaro from comment #38)
> Hi Lukas, will an update be available for this soon?

Hi Lukas, the change deadline for this is March 3. It's been a month and a half since you identified a fix for this issue; can you please release an update?

(In reply to Michael Catanzaro from comment #39)
> Hi Lukas, the change deadline for this is March 3. It's been a month and a
> half since you identified a fix for this issue; can you please release an
> update?

Er, actually the deadline is today. March 3 is the date of the FESCo review meeting.

Hi,
# sesearch -A -s init_t -t domain -c process | grep rlimi
allow init_t domain : process { sigchld sigkill sigstop signull signal getpgid getattr setrlimit rlimitinh } ;
# rpm -q selinux-policy
selinux-policy-3.13.1-241.fc26.noarch

This issue is already fixed in F26.

So if this is already fixed, surely the bug should be closed...?

Here's how bug workflow happens:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_status

MODIFIED means there's a fix available that the developer believes can be tested. If, in the Fedora case, the reporter or QA tests and verifies the fix, they can report that back (VERIFIED) and then the bug can be closed by the assignee.
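The `grep rlimi` in the sesearch check above matches both `setrlimit` and `rlimitinh`, so a hit alone does not show that the inheritance permission is granted. A stricter check could look like the sketch below. The function name `rule_grants` is hypothetical, and the sketch assumes allow rules are printed with space-separated permissions, as in the output quoted in this comment.

```shell
# Succeed if any "allow" rule on standard input contains the given
# permission as a whole word (so "rlimitinh" does not match "setrlimit").
rule_grants() {
    awk -v perm="$1" '
        /^allow / { for (i = 1; i <= NF; i++) if ($i == perm) found = 1 }
        END { exit !found }
    '
}
```

For example, `sesearch -A -s init_t -t domain -c process | rule_grants rlimitinh && echo granted` would report whether the installed policy lets processes started by init_t inherit its resource limits.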
If there's an F25 update available, I'm happy to test that. But I trust that it works, given that I've verified that running Lukas's semodule command in comment #36 fixed the issue for me locally.

@Michael, as a reporter I'm also happy to test an F25 update. Testing the F26 update will have to wait a little, though. When it comes to trust, I'm more of the kind "I believe it when I see it". :-)

Appears to be working on F26 Alpha.

Works for me too with F26 packages (3.13.1-251.fc26) right now. Since I'm the reporter, I take it I can move this bug to VERIFIED.

Based on the comments, seems to be fixed. Let's close this.