Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1430250 - bind-pkcs11 keeps failing to connect to LDAP server during FreeIPA server deployment on current Rawhide
Summary: bind-pkcs11 keeps failing to connect to LDAP server during FreeIPA server dep...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: 389-ds
Version: rawhide
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: wibrown@redhat.com
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On: 1404409 1432149
Blocks: F26AlphaBlocker
TreeView+ depends on / blocked
 
Reported: 2017-03-08 07:47 UTC by Adam Williamson
Modified: 2017-04-10 17:08 UTC (History)
15 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-17 01:08:08 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
ipaserver-install.log from a recent openQA test (3.63 MB, text/plain)
2017-03-08 07:49 UTC, Adam Williamson
no flags Details
system journal from recent openQA install (uncompress then use 'journalctl -f system.journal' to view) (1.47 MB, application/octet-stream)
2017-03-08 07:53 UTC, Adam Williamson
no flags Details

Description Adam Williamson 2017-03-08 07:47:16 UTC
In current Rawhide openQA tests, FreeIPA server deployment appears to succeed, but client enrolment fails, apparently because some expected DNS records are absent. Looking into why that might be on the server side, I see that named-pkcs11 seems to be consistently failing to communicate with the LDAP server, for some reason:

Mar 07 12:49:38 ipa001.domain.local named-pkcs11[9338]: LDAP error: Can't contact LDAP server: unable to start SyncRepl session
Mar 07 12:49:38 ipa001.domain.local named-pkcs11[9338]: LDAP configuration synchronization failed: socket is not connected
Mar 07 12:49:38 ipa001.domain.local named-pkcs11[9338]: ldap_syncrepl will reconnect in 60 seconds
Mar 07 12:50:38 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:50:38 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:50:38 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:50:38 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 2
Mar 07 12:50:38 ipa001.domain.local named-pkcs11[9338]: successfully reconnected to LDAP server
Mar 07 12:50:38 ipa001.domain.local named-pkcs11[9338]: LDAP error: Can't contact LDAP server: unable to start SyncRepl session
Mar 07 12:50:38 ipa001.domain.local named-pkcs11[9338]: LDAP configuration synchronization failed: socket is not connected
Mar 07 12:50:38 ipa001.domain.local named-pkcs11[9338]: ldap_syncrepl will reconnect in 60 seconds
Mar 07 12:51:38 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:51:38 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:51:38 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:51:38 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 2
Mar 07 12:51:38 ipa001.domain.local named-pkcs11[9338]: successfully reconnected to LDAP server
Mar 07 12:51:39 ipa001.domain.local named-pkcs11[9338]: LDAP error: Can't contact LDAP server: unable to start SyncRepl session
Mar 07 12:51:39 ipa001.domain.local named-pkcs11[9338]: LDAP configuration synchronization failed: socket is not connected
Mar 07 12:51:39 ipa001.domain.local named-pkcs11[9338]: ldap_syncrepl will reconnect in 60 seconds
Mar 07 12:52:39 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:52:39 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:52:39 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:52:39 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 2
Mar 07 12:52:39 ipa001.domain.local named-pkcs11[9338]: successfully reconnected to LDAP server
Mar 07 12:52:39 ipa001.domain.local named-pkcs11[9338]: LDAP error: Can't contact LDAP server: unable to start SyncRepl session
Mar 07 12:52:39 ipa001.domain.local named-pkcs11[9338]: LDAP configuration synchronization failed: socket is not connected
Mar 07 12:52:39 ipa001.domain.local named-pkcs11[9338]: ldap_syncrepl will reconnect in 60 seconds
Mar 07 12:53:19 ipa001.domain.local named-pkcs11[9338]: no valid RRSIG resolving 'root-servers.net/DS/IN': 199.7.83.42#53
Mar 07 12:53:19 ipa001.domain.local named-pkcs11[9338]: network unreachable resolving 'root-servers.net/DS/IN': 2001:503:a83e::2:30#53
Mar 07 12:53:19 ipa001.domain.local named-pkcs11[9338]: network unreachable resolving 'root-servers.net/DS/IN': 2001:503:231d::2:30#53
Mar 07 12:53:39 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:53:39 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:53:39 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:53:39 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 2
Mar 07 12:53:39 ipa001.domain.local named-pkcs11[9338]: successfully reconnected to LDAP server
Mar 07 12:53:39 ipa001.domain.local named-pkcs11[9338]: LDAP error: Can't contact LDAP server: unable to start SyncRepl session
Mar 07 12:53:39 ipa001.domain.local named-pkcs11[9338]: LDAP configuration synchronization failed: socket is not connected
Mar 07 12:53:39 ipa001.domain.local named-pkcs11[9338]: ldap_syncrepl will reconnect in 60 seconds
Mar 07 12:54:39 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:54:39 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:54:39 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 1
Mar 07 12:54:39 ipa001.domain.local named-pkcs11[9338]: GSSAPI client step 2
Mar 07 12:54:39 ipa001.domain.local named-pkcs11[9338]: successfully reconnected to LDAP server
Mar 07 12:54:40 ipa001.domain.local named-pkcs11[9338]: LDAP error: Can't contact LDAP server: unable to start SyncRepl session
Mar 07 12:54:40 ipa001.domain.local named-pkcs11[9338]: LDAP configuration synchronization failed: socket is not connected
Mar 07 12:54:40 ipa001.domain.local named-pkcs11[9338]: ldap_syncrepl will reconnect in 60 seconds

I'm not totally sure if this is the root of the DNS issues, but I'm just reporting everything that looks off in the server deployment for now. (See also #1430427 for another thing I noticed).

In fact, now I poke through the logs a bit more, ipa-dnskeysyncd is complaining about not being able to contact the LDAP server also:

Mar 07 12:52:50 ipa001.domain.local ipa-dnskeysyncd[9422]: ipa         : INFO     LDAP bind...
Mar 07 12:52:50 ipa001.domain.local ipa-dnskeysyncd[9422]: ipa         : ERROR    syncrepl_poll: LDAP error ({'desc': "Can't contact LDAP server"})
Mar 07 12:52:50 ipa001.domain.local ipa-dnskeysyncd[9422]:     while ldap_connection.syncrepl_poll(all=1, msgid=ldap_search):
Mar 07 12:52:50 ipa001.domain.local ipa-dnskeysyncd[9422]:   File "/usr/lib64/python2.7/site-packages/ldap/syncrepl.py", line 352, in syncrepl_poll
Mar 07 12:52:50 ipa001.domain.local ipa-dnskeysyncd[9422]:   File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 521, in result4
Mar 07 12:52:50 ipa001.domain.local ipa-dnskeysyncd[9422]:     ldap_result = self._ldap_call(self._l.result4,msgid,all,timeout,add_ctrls,add_intermediates,add_extop)
Mar 07 12:52:50 ipa001.domain.local ipa-dnskeysyncd[9422]:   File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 106, in _ldap_call
Mar 07 12:52:50 ipa001.domain.local ipa-dnskeysyncd[9422]: SERVER_DOWN: {'desc': "Can't contact LDAP server"}

I'll attach the entire journal and the server deployment log, in case they help.

Comment 1 Adam Williamson 2017-03-08 07:49:51 UTC
Created attachment 1261118 [details]
ipaserver-install.log from a recent openQA test

Comment 2 Adam Williamson 2017-03-08 07:53:25 UTC
Created attachment 1261119 [details]
system journal from recent openQA install (uncompress then use 'journalctl -f system.journal' to view)

Comment 3 Tomáš Hozza 2017-03-08 13:52:26 UTC
Maybe a bind-dyndb-ldap issue?

Comment 4 Tomas Krizek 2017-03-08 15:27:16 UTC
This is actually caused by bug 1404409 in 389-ds. This issue occurs when using GSSAPI to authenticate to LDAP. As a workaround, simple authentication can be configured in /etc/named.conf.

Comment 5 Adam Williamson 2017-03-09 18:47:47 UTC
So we're a week out from Alpha go/no-go, and FreeIPA is definitely not in good enough shape for release. I'm gonna propose this bug as an Alpha blocker as I at least *suspect* this one is the cause of the DNS problems. On that basis I propose it as a violation of "The core functional requirements for all Featured Server Roles must be met", as one of the 'core functional requirements' for the Domain Controller role is "Multiple clients must be able to enrol and unenrol in the domain" (and also "The Domain Controller must serve DNS SRV records for ldap and kerberos on port 53").

What can we do to get this fixed? I was interested by wibrown's mention in the other bug of "Maybe I have fixed this inadvertently with the LFDS disable?", but I can't find any commit since the 1.3.6.1 release with either 'lfds' or 'disable' in its commit message, so I don't know what change that is referring to and can't check to see if it fixes the problem.

I also don't know what settings to use in named.conf exactly to use 'simple authentication', to see if that makes enrolment work properly. 'auth_method' should be 'simple', fine, but what 'bind_dn' and 'password' do I use?

Comment 6 wibrown@redhat.com 2017-03-09 22:31:12 UTC
This was fixed in:

https://pagure.io/389-ds-base/issue/49139

We brought in the nunc-stans source tree because it was too hard to bundle in the rpm. This has been getting some extensive testing in the rpm and such since.

I've updated my copr repo https://pagure.io/389-ds-base/issue/49139 so this should have the needed fixes in it. If not, we have to go back and investigate.

Comment 7 Adam Williamson 2017-03-09 22:39:14 UTC
Thanks for the reference. A COPR is really no use for Fedora, we need Fedora builds with the relevant fixes included. Does this basically mean backporting https://pagure.io/389-ds-base/c/a82a2cc6479c298db2b95ed476b91d85d9ee934a?branch=master and https://pagure.io/389-ds-base/c/617c61e81c9cc3561f3910697fbd3c3be821ded8?branch=master to the Fedora package? If so, I can try that, if the patches will apply cleanly.

Comment 8 wibrown@redhat.com 2017-03-09 23:39:48 UTC
What we have in that copr is going to be the next release of 1.3.6 which is what will go into fedora. You don't need to backport those fixes, and really, they are going to be hard to backport. 

We need to talk to mreynolds about when the next fedora build is going to be made.

Comment 9 wibrown@redhat.com 2017-03-09 23:40:46 UTC
When's our next build of 1.3.6 for fedora?

Comment 10 Adam Williamson 2017-03-10 00:14:32 UTC
Fedora 26 is now frozen for Alpha. There really needs to be a better co-ordination going on here between the FreeIPA-related teams and the Fedora schedule; there should've been a plan long ago about which versions of what would be in Alpha, and they all should've landed weeks ago and be working by now.

We aren't supposed to land major changes during the milestone freezes, we're only supposed to make just enough changes to fix blocker and freeze exception bugs. Of course, since this whole stack has been broken six ways from Sunday for the last three months it matters a bit less if we land a major change to it, but still. (If you're wondering why this only just came up, various other bits of the FreeIPA stack have been broken over the last several months which prevented our tests even getting to this point to encounter this bug).

So, anyway. The situation is that we need a build without this problem to be done for Fedora 26 (and rawhide) and submitted as an update (the update is only needed for F26) ASAP: 26 Alpha go/no-go is next Thursday, which means we really need a compose to test by Tuesday at the latest. The earlier the better, because there could *well* be further bugs in the FreeIPA stack we need to sort out after this one is dealt with - we haven't had FreeIPA actually working on Rawhide for months, so who knows what else is lurking.

Given the circumstances it's fine for now if the new build has to be a substantial update with other changes that aren't relevant to this bug, but in future it'd be good if the development/package maintenance process could be adjusted to avoid us having to accept major changes through the freezes.

Thanks!

Comment 11 wibrown@redhat.com 2017-03-10 00:19:02 UTC
DS 1.3.6 was always going to be in this release, and I believe already is. We just need to make the point release to bring it up to date with the fixes. There are no more substantially large changes, it's all fixes and build process improvement. 

As for the other issue with testing, well I'll let other people talk about that.

Comment 12 Adam Williamson 2017-03-10 00:41:11 UTC
Yes, the current package is of 1.3.6.1. But I noticed that release was from October 2016, i.e. five months ago, and there've been rather a lot of commits to master between then and now; if we're getting all those commits in the next build, that's a lot of change.

Comment 13 Tomáš Hozza 2017-03-10 10:25:28 UTC
Based on the discussion, this is not caused by BIND. Moving to appropriate component.

Comment 14 Tomas Krizek 2017-03-10 11:37:53 UTC
(In reply to Adam Williamson from comment #5)

> I also don't know what settings to use in named.conf exactly to use 'simple
> authentication', to see if that makes enrolment work properly. 'auth_method'
> should be 'simple', fine, but what 'bind_dn' and 'password' do I use?

auth_method "simple";
bind_dn "cn=Directory Manager";
password "<DIRMAN_PASSWORD>";

After you modify the named.conf, don't forget to restart BIND.

# systemctl restart named-pkcs11

bind-dyndb-ldap should be able to connect to LDAP and work properly at this point.

Comment 15 Mike Ruckman 2017-03-13 18:42:51 UTC
Discussed in today's Blocker Review meeting. This bug is accepted as an Alpha blocker due to the severity of the breakage and lack of an workaround - voilating the following Alpha criterion: "Release-blocking roles and the supported role configuration interfaces must meet the core functional Role Definition Requirements to the extent that supported roles can be successfully deployed..."

Comment 16 Adam Williamson 2017-03-13 23:37:20 UTC
Note: https://bodhi.fedoraproject.org/updates/FEDORA-2017-042d32a655 is intended to fix this and in fact may, but deployment with that 389-ds-base still fails. Don't know yet if the failure is just another manifestation of the same issue, or a different issue that occurs *before* this issue, or a different issue that occurs *after* this issue. More info in parent bug https://bugzilla.redhat.com/show_bug.cgi?id=1404409 .

Comment 17 wibrown@redhat.com 2017-03-14 04:07:54 UTC
A patch has now been provided to upstream.

https://pagure.io/389-ds-base/issue/49165

Comment 18 Fedora Update System 2017-03-14 17:14:46 UTC
389-ds-base-1.3.6.2-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-b9fbd8b768

Comment 19 Fedora Update System 2017-03-15 04:23:55 UTC
389-ds-base-1.3.6.2-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-b9fbd8b768

Comment 20 Fedora Update System 2017-03-15 14:23:57 UTC
389-ds-base-1.3.6.2-2.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-d0ad136c8c

Comment 21 Fedora Update System 2017-03-16 00:50:43 UTC
389-ds-base-1.3.6.2-2.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-d0ad136c8c

Comment 22 Fedora Update System 2017-03-17 01:08:08 UTC
389-ds-base-1.3.6.2-2.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.