Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1352064 - Thread local state doesn't get reset across fork
Summary: Thread local state doesn't get reset across fork
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: libcap-ng
Version: 25
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Steve Grubb
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1351995 1352066
TreeView+ depends on / blocked
 
Reported: 2016-07-01 14:23 UTC by Daniel Berrangé
Modified: 2017-11-14 16:42 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1352066 (view as bug list)
Environment:
Last Closed: 2017-11-14 16:42:01 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Use pthread_atfork to reset global state (deleted)
2016-07-01 14:55 UTC, Daniel Berrangé
no flags Details | Diff

Description Daniel Berrangé 2016-07-01 14:23:42 UTC
Description of problem:
The latest version of audit-libs has started using capng_have_capability() before sending audit messages.

This has in turn exposed a bug in libcap-ng's use of thread local state.

Specifically, if you call any capng_* function and then call fork(), the 

static __thread struct cap_ng m

will get initialized with the current PID.

If you then fork() and call another capng API, (for example, you're trying to drop caps for a child process you're spawning), this thread local doesn't get reset. 

As a result capng_apply() will try to set capabilities on the parent PID and gets an error

strace clearly shows the problem:

[pid 26690] capset({_LINUX_CAPABILITY_VERSION_3, 26689}, {0, 0, 0}) = -1 EPERM (Operation not permitted)


not it is runing pid 26690 but setting caps on 26689.

AFAICT, this bug has existed forever. The new audit-libs 2.6.1 has just started calling capng_have_capability() which has in turn broken libvirt which uses capng when spawning processes after fork.

The following demo shows the problem

#include <libaudit.h>
#include <cap-ng.h>

#include <stdio.h>
#include <sys/wait.h>


int main(int argc, char **argv)
{
  int fd = audit_open();

  if (fd < 0) {
    perror("audit_open");
    return 1;
  }

  audit_log_user_message(fd, AUDIT_VIRT_CONTROL, "test", NULL,
			 NULL, NULL, 1);

  close(fd);

  pid_t child = fork();

  if (child == 0) {
    capng_clear(CAPNG_SELECT_CAPS);
    if (capng_apply(CAPNG_SELECT_CAPS) < 0) {
      perror("capng_apply");
      _exit(1);
    }
    
    _exit(0);
  }

  int status = 0;
  waitpid(child, &status, 0);

  fprintf(stderr, "Child exited %d\n", status);
  return 0;
}


$ gcc -lcap-ng -laudit -o cap cap.c
[berrange@t530wlan ~]$ ./cap
capng_apply: Operation not permitted
Child exited 256


If you downgrade to audit-libs-2.6 the problem goes away, though clearly the bug in cap-ng still exists.


Version-Release number of selected component (if applicable):
audit-libs-2.6.1-1.fc24.x86_64
audit-libs-2.6.1-1.fc24.i686
libcap-ng-0.7.7-4.fc24.x86_64
libcap-ng-0.7.7-4.fc24.i686

Comment 1 Steve Grubb 2016-07-01 14:46:14 UTC
As for the libvirt issue, the audit package has been fixed to save and restore state in upstream commit 1304. A new audit package will be released shortly. See bz 1351954.

Comment 2 Daniel Berrangé 2016-07-01 14:55:30 UTC
Created attachment 1174946 [details]
Use pthread_atfork to reset global state

Comment 3 Jan Kurik 2016-07-26 05:04:09 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 25 development cycle.
Changing version to '25'.

Comment 4 Steve Grubb 2017-11-14 16:42:01 UTC
Fixed in upstream commit 7759e6f. Going to close this as upstream.

Daniel, thanks for pointing out the bug and a potential solution. What was committed used pthread_atfork as a weak symbol to fix things for people using pthreads. But also to document the issue in a man page explaining the problem and 2 potential solutions.


Note You need to log in before you can comment on or make changes to this bug.