Bug 1591804 - something keeps /lib/libnss_systemd.so.2 open on minimal appliance image, breaking composes
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: sssd
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jakub Hrozek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARMTracker
 
Reported: 2018-06-15 15:28 UTC by Kevin Fenzi
Modified: 2018-06-25 23:21 UTC (History)

Fixed In Version: sssd-1.16.2-3.fc29
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-25 23:21:11 UTC
Type: Bug
Embargoed:



Description Kevin Fenzi 2018-06-15 15:28:16 UTC
Rawhide composes have been failing since 2018-06-12.

The Fedora arm minimal appliance (a required deliverable) has been failing to build. 

appliance-creator (in appliance-tools) is used to make this image. It creates a file and loop mounts it, then installs into it, umounts it and compresses it. 

The umount step was failing, saying the device was busy. 

lsof in the loop mounted fs gave: 

COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
appliance 13668 root  mem    REG 253,11          11247 ./libnss_systemd.so.2 (stat: No such file or directory)

Sadly this just said it was a child of appliance-creator, not what was using it. 
Somewhat blindly, we untagged sssd-1.16.2-1.fc29 as it was one of the few packages that changed that looked like it might try and use a nss library... and it worked. We got a good compose with sssd-1.16.1-9.fc29

So, are there any changes between 1.16.1-9 and 1.16.2-1 that would use that library, probably in a scriptlet, and somehow not close it?

Comment 1 Fabiano Fidêncio 2018-06-16 09:56:11 UTC
Kevin, Paul,

Paul provided me a machine so I'd be able to give it a try.
In order to bisect what may be causing the issue (and unblock you guys), I'd like to confirm that the hanging process is the sssd that's installed in the mounted fs, not in the system one, right?

Considering that's the case, what's the easiest way to change the sssd version installed in the mounted fs?

Comment 2 Kevin Fenzi 2018-06-16 17:51:39 UTC
Yeah, it's the one in the image that appliance-creator is making. 

It makes a file, loop mounts it and installs into it. At the end it should umount it and finish up, but this bug is causing that loop mount to be busy and so it cannot umount the loop and it fails. 

It should use your enabled repos on the machine. So perhaps exclude sssd from the normal repos and add a local repo and put sssd versions in there?

koji calls it like so: 

'/usr/bin/appliance-creator', '-c', '/chroot_tmpdir/koji-image-f29-build-27652379.ks', '-d', '-v', '--logfile', '/chroot_tmpdir/appliance.log', '--cache', '/chroot_tmpdir/koji-appliance', '-o', 'app-output', '--format', 'raw', '--name', 'Fedora-Minimal-armhfp-Rawhide-20180616.n.0', '--version', 'Rawhide', '--release', '20180616.n.0'

You can find a ks at: https://kojipkgs.fedoraproject.org//work/tasks/2379/27652379/koji-image-f29-build-27652379.ks

Thanks much for looking into this! Let me know if I can assist any further...

Comment 3 Fabiano Fidêncio 2018-06-16 19:19:48 UTC
Kevin,

Paul has shared that. Rereading it, I realized I can simply tweak the .ks file to use my own version of sssd (hopefully). I will take a look at that.

Thanks to both of you!

Comment 4 Sumit Bose 2018-06-21 16:30:23 UTC
Hi,

I think 

@@ -909,6 +859,9 @@ done
 %attr(750,root,root) %dir %{_var}/log/%{name}
 %attr(700,root,root) %dir %{_sysconfdir}/sssd
 %attr(711,root,root) %dir %{_sysconfdir}/sssd/conf.d
+%if (0%{?use_openssl} == 1)
+%attr(711,sssd,sssd) %dir %{_sysconfdir}/sssd/pki
+%endif
 %ghost %attr(0600,root,root) %config(noreplace) %{_sysconfdir}/sssd/sssd.conf
 %dir %{_sysconfdir}/logrotate.d
 %config(noreplace) %{_sysconfdir}/logrotate.d/sssd

is causing the issue.

Fabiano, can you change the owner of the pki directory to root so that it reads:

%if (0%{?use_openssl} == 1)
%attr(711,root,root) %dir %{_sysconfdir}/sssd/pki
%endif

(same permissions and owner as e.g. ../sssd/conf.d)


While installing the package, the user 'sssd' has to be looked up to set the ownership. But since appliance-creator is running chrooted in the image directory at this point and the sssd user does not exist in /etc/passwd of the image, glibc's nss code has to check all configured nss modules (sss, files and systemd here). It looks like sss and files were already loaded for other lookups while appliance-creator was not yet running in the chroot environment. But systemd is loaded for the first time, and since appliance-creator is in the chroot at that point, libnss_systemd.so.2 is loaded from the image and not from the main system. It stays open because nss modules are unloaded only when the process exits.
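
One way to check this kind of behaviour is to look at which libnss_* libraries a process has mapped after a lookup for an unknown user. The following is only a minimal sketch, assuming a Linux /proc filesystem; the helper names mapped_nss_modules and current_process_nss_modules and the user name are made up for illustration and are not part of appliance-tools or sssd.

```python
import os
import pwd


def mapped_nss_modules(maps_text):
    """Return the sorted libnss_* paths found in the text of /proc/<pid>/maps."""
    found = set()
    for line in maps_text.splitlines():
        # Line format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if os.path.basename(path).startswith("libnss_"):
            found.add(path)
    return sorted(found)


def current_process_nss_modules(unknown_user="no-such-user-for-nss-test"):
    """Trigger a failing user lookup, then list the nss modules now mapped.

    The user name is a hypothetical placeholder that is assumed not to exist.
    """
    try:
        pwd.getpwnam(unknown_user)
    except KeyError:
        pass  # expected: the user does not exist
    with open("/proc/self/maps") as maps_file:
        return mapped_nss_modules(maps_file.read())
```

Calling current_process_nss_modules() once before and once after entering a chroot should show whether a module such as libnss_systemd.so.2 was picked up from the host or from the image, since the mapped path is recorded as seen at load time.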

I found https://pagure.io/appliance-tools/c/398360b2b5e86072cee058e0e3d2eb9a74eb158e?branch=master which I guess was added to fix a similar issue.

To avoid this issue in the future, libnss_systemd.so.2 can be loaded in the same way at startup, or the dlopen can be replaced with a lookup for a non-existent user, e.g.:

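+def do_unknown_user_hack():
+    # Look up a user that should not exist, so that glibc loads every
+    # configured nss module before appliance-creator enters the chroot.
+    import pwd as forgettable
+    try:
+        forgettable.getpwnam('fwefwkejkgre')
+    except KeyError:
+        pass
+    del forgettable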
+
 if __name__ == "__main__":
-    hack = do_nss_sss_hack()
+    do_unknown_user_hack()
     sys.exit(main())
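
For comparison, the other option mentioned above (loading the module explicitly at startup) could look roughly like the sketch below. It is an assumption-laden illustration and not the code from the linked appliance-tools commit: the function name preload_nss_modules and its default argument are hypothetical, and it relies on the dynamic loader reusing an already loaded library when glibc later asks for the same name.

```python
import ctypes


def preload_nss_modules(names=("libnss_systemd.so.2",)):
    """Try to dlopen each named nss module from the host system.

    Returns the names that could be loaded. Modules that are missing are
    skipped, so the call is safe on systems without them.
    """
    loaded = []
    for name in names:
        try:
            # ctypes does not close the library handle again, so the module
            # stays mapped for the lifetime of the process.
            ctypes.CDLL(name)
        except OSError:
            continue  # module not installed on this host
        loaded.append(name)
    return loaded
```

This would have to run before appliance-creator changes root. The unknown-user lookup has the advantage that it does not need a hard-coded list of module names, because glibc walks whatever is configured in nsswitch.conf.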


On the other hand you might not want to change anything and consider the failure as a test for unknown users in the installed packages.

HTH

bye,
Sumit

P.S. I think I somehow broke the test system you provided, because I've seen 'umount: /var/tmp/imgcreate-ZwmDIR/install_root/sys: target is busy' in my last tests as well.

Comment 5 Kevin Fenzi 2018-06-21 17:44:18 UTC
Awesome detective work. Thanks for digging into this.

Comment 6 Fabiano Fidêncio 2018-06-21 19:46:49 UTC
Kevin,

I've pushed the fix suggested by Sumit. I've tested it on the machine provided by Paul and it works like a charm.

The build is on its way, and from tomorrow (or later today) you should be unblocked.

I'll keep this bug open until I hear positive feedback from you. Okay?

Comment 7 Kevin Fenzi 2018-06-21 21:02:27 UTC
Sure. :) If we get a rawhide compose tomorrow I think we can close it as fixed.

Comment 8 Kevin Fenzi 2018-06-25 23:21:11 UTC
This looks fine now. Thanks for all your work fixing it!

