Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1724380 - 3DES removal breaks credential acquisition
Summary: 3DES removal breaks credential acquisition
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: krb5
Version: rawhide
Hardware: All
OS: All
unspecified
high
Target Milestone: ---
Assignee: Robbie Harwood
QA Contact: Fedora Extras Quality Assurance
URL: https://github.com/krb5/krb5/pull/952
Whiteboard: openqa
Depends On:
Blocks: F31BetaBlocker
TreeView+ depends on / blocked
 
Reported: 2019-06-27 00:38 UTC by Adam Williamson
Modified: 2019-07-15 21:22 UTC (History)
10 users (show)

Fixed In Version: krb5-1.17-35.fc31
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-15 17:10:06 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
/var/log tarball from failed test (server) (deleted)
2019-06-27 00:43 UTC, Adam Williamson
no flags Details
named.run log after the failures (deleted)
2019-06-27 00:44 UTC, Adam Williamson
no flags Details

Description Adam Williamson 2019-06-27 00:38:31 UTC
In the Fedora-Rawhide-20190604.n.0 compose, the openQA 'upgrade a FreeIPA server and client from previous stable release' tests started failing.

What these linked tests do is this:

1. Server test deploys a Fedora 30 FreeIPA server (with DNS server enabled)
2. Client test (another Fedora 30 system) enrols to the domain
3. Server test upgrades to Rawhide, checks ipa.service is running, signals the client to upgrade
4. Client test upgrades, checks client functionality works

Throughout, the client uses the server as its DNS server. What seems to be happening is that, at step 4, the client upgrade fails, because the client suddenly cannot resolve kojipkgs.fedoraproject.org (which it needs to do to run the upgrade, the way things are set up in openQA).

When I dug into this a little deeper, it seems that the problem is that, after the server upgrade, when the BIND running on the server forwards the resolution of kojipkgs.fedoraproject.org , the answer it gets back is the *external public* IP for that domain, not the *internal infra private* IP for it. This implies that it is not using the correct forwarder. When the test initially deploys the server, it configures the infra internal DNS servers as forwarders, and this is clearly working or the test would fail at step 2. But it seems that after the upgrade, the server stops forwarding requests to the infra internal DNS servers and instead is forwarding them somewhere else - probably to 8.8.8.8, which IIRC is the default. This means we get the public IP for kojipkgs, which is not accessible from inside infra, and so the test fails.

Looking into the BIND logs on the server, it seems to have trouble connecting to LDAP after the upgrade. We see errors like this:

26-Jun-2019 19:47:23.581 LDAP configuration synchronization failed: socket is not connected
26-Jun-2019 19:47:23.581 ldap_syncrepl will reconnect in 60 seconds

...

26-Jun-2019 19:48:23.581 trying to establish LDAP connection to ldapi://%2fvar%2frun%2fslapd-DOMAIN-LOCAL.socket
26-Jun-2019 19:48:23.581 Using default keytab file name: FILE:/etc/named.keytab
26-Jun-2019 19:48:23.581 Credentials are not present in cache (Matching credential not found)
26-Jun-2019 19:48:23.581 Attempting to acquire new Kerberos credentials
26-Jun-2019 19:48:23.582 Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Cannot contact any KDC for realm 'DOMAIN.LOCAL')
26-Jun-2019 19:48:23.582 ldap_syncrepl will reconnect in 60 seconds

...

26-Jun-2019 19:49:16.111 LDAP configuration synchronization failed: socket is not connected
26-Jun-2019 19:49:16.111 ldap_syncrepl will reconnect in 60 seconds

...

26-Jun-2019 19:50:17.739 trying to establish LDAP connection to ldapi://%2fvar%2frun%2fslapd-DOMAIN-LOCAL.socket
26-Jun-2019 19:50:17.739 Using default keytab file name: FILE:/etc/named.keytab
26-Jun-2019 19:50:17.769 Credentials are not present in cache (Matching credential not found)
26-Jun-2019 19:50:17.769 Attempting to acquire new Kerberos credentials
26-Jun-2019 19:50:17.962 Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Pre-authentication failed: Invalid argument)
26-Jun-2019 19:50:17.963 ldap_syncrepl will reconnect in 60 seconds
26-Jun-2019 19:51:17.964 trying to establish LDAP connection to ldapi://%2fvar%2frun%2fslapd-DOMAIN-LOCAL.socket
26-Jun-2019 19:51:17.964 Using default keytab file name: FILE:/etc/named.keytab
26-Jun-2019 19:51:17.966 Credentials are not present in cache (Matching credential not found)
26-Jun-2019 19:51:17.966 Attempting to acquire new Kerberos credentials
26-Jun-2019 19:51:17.976 Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Pre-authentication failed: Invalid argument)
26-Jun-2019 19:51:17.976 ldap_syncrepl will reconnect in 60 seconds

krb5kdc.log shows this:

Jun 26 19:50:17 ipa001.domain.local krb5kdc[1027](info): AS_REQ (7 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19)}) 10.0.2.100: NEEDED_PREAUTH: DNS/ipa001.domain.local for krbtgt/DOMAIN.LOCAL, Additional pre-authentication required
Jun 26 19:50:17 ipa001.domain.local krb5kdc[1027](info): closing down fd 11
Jun 26 19:50:21 ipa001.domain.local krb5kdc[1029](info): AS_REQ (7 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19)}) 10.0.2.100: NEEDED_PREAUTH: ipa-dnskeysyncd/ipa001.domain.local for krbtgt/DOMAIN.LOCAL, Additional pre-authentication required
Jun 26 19:50:21 ipa001.domain.local krb5kdc[1029](info): closing down fd 11
Jun 26 19:50:21 ipa001.domain.local krb5kdc[1029](info): AS_REQ (7 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19)}) 10.0.2.100: NEEDED_PREAUTH: ipa-dnskeysyncd/ipa001.domain.local for krbtgt/DOMAIN.LOCAL, Additional pre-authentication required
Jun 26 19:50:21 ipa001.domain.local krb5kdc[1029](info): closing down fd 11
Jun 26 19:51:17 ipa001.domain.local krb5kdc[1027](info): AS_REQ (7 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19)}) 10.0.2.100: NEEDED_PREAUTH: DNS/ipa001.domain.local for krbtgt/DOMAIN.LOCAL, Additional pre-authentication required
Jun 26 19:51:17 ipa001.domain.local krb5kdc[1027](info): closing down fd 11
Jun 26 19:51:23 ipa001.domain.local krb5kdc[1029](info): AS_REQ (7 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19)}) 10.0.2.100: NEEDED_PREAUTH: ipa-dnskeysyncd/ipa001.domain.local for krbtgt/DOMAIN.LOCAL, Additional pre-authentication required
Jun 26 19:51:23 ipa001.domain.local krb5kdc[1029](info): closing down fd 11
Jun 26 19:51:23 ipa001.domain.local krb5kdc[1029](info): AS_REQ (7 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19)}) 10.0.2.100: NEEDED_PREAUTH: ipa-dnskeysyncd/ipa001.domain.local for krbtgt/DOMAIN.LOCAL, Additional pre-authentication required
Jun 26 19:51:23 ipa001.domain.local krb5kdc[1029](info): closing down fd 11

I'm pinning this on krb5 as it's about the only relevant thing that changed in the affected compose. FreeIPA didn't, 389-ds didn't and bind didn't. And there are those Kerberos error messages.

Proposing as a Beta blocker as this violates "It must be possible to successfully complete a direct upgrade from a fully updated installation of each of the last two stable Fedora Server releases with the system configured as a FreeIPA domain controller or postgresql server as specified in the relevant criteria. The upgraded system must meet all relevant release criteria, including criteria relating to functionality of the server software" - https://fedoraproject.org/wiki/Fedora_30_Beta_Release_Criteria#Server_upgrade_requirements

Comment 1 Adam Williamson 2019-06-27 00:43:30 UTC
Created attachment 1584995 [details]
/var/log tarball from failed test (server)

This is a tarball of /var/log on the server after the failures. Note the upgrade starts around 23:40 UTC - which is 19:40 system local time - and completes around 8 minutes later. journalctl shows time stamps in your local timezone by default; use --utc to see UTC timestamps.

Comment 2 Adam Williamson 2019-06-27 00:44:13 UTC
Created attachment 1584997 [details]
named.run log after the failures

Note this uses system local timestamps (so upgrade is at 19:40, ends at 19:48).

Comment 3 Adam Williamson 2019-06-27 17:26:48 UTC
rcrit and rharwood asked for the output of `klist -kt -e /etc/krb5.keytab`. Ask and ye shall receive! Prior to the upgrade, it is:

Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   2 06/27/2019 11:43:03 host/ipa001.domain.local (aes256-cts-hmac-sha1-96) 
   2 06/27/2019 11:43:03 host/ipa001.domain.local (aes128-cts-hmac-sha1-96) 
   2 06/27/2019 11:43:03 host/ipa001.domain.local (DEPRECATED:des3-cbc-sha1) 
   2 06/27/2019 11:43:03 host/ipa001.domain.local (DEPRECATED:arcfour-hmac) 
   2 06/27/2019 11:43:03 host/ipa001.domain.local (camellia128-cts-cmac) 
   2 06/27/2019 11:43:03 host/ipa001.domain.local (camellia256-cts-cmac) 

after the upgrade, it is:

Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   2 06/27/2019 11:43:03 host/ipa001.domain.local (aes256-cts-hmac-sha1-96) 
   2 06/27/2019 11:43:03 host/ipa001.domain.local (aes128-cts-hmac-sha1-96) 
   2 06/27/2019 11:43:03 host/ipa001.domain.local (DEPRECATED:des3-hmac-sha1) 
   2 06/27/2019 11:43:03 host/ipa001.domain.local (DEPRECATED:arcfour-hmac) 
   2 06/27/2019 11:43:03 host/ipa001.domain.local (camellia128-cts-cmac) 
   2 06/27/2019 11:43:03 host/ipa001.domain.local (camellia256-cts-cmac) 

they're *nearly* identical, but there's one change: "(DEPRECATED:des3-cbc-sha1)" changes to "(DEPRECATED:des3-hmac-sha1)".

Comment 4 Robbie Harwood 2019-07-08 18:06:52 UTC
So the KDC is receiving initial requests that look correctly formed (no removed enctypes or anything), to which it's replying asking for preauth.  But the client seems unwilling to follow up on that?  I'd kind of like to see KRB5_TRACE output... I guess of whatever bind does for initial credential acquisition.

SSSD appears to be having the same problem (/var/log/messages): Jun 26 19:47:16 ipa001 [sssd[ldap_child[15584]]][15584]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Cannot contact any KDC for realm 'DOMAIN.LOCAL'. Unable to create GSSAPI-encrypted LDAP connection.                                 

It might be easier to check SSSD - crank debug_level to 9.  Maybe also just a copy of the keytab itself, if that's possible (no worries if that's hard).

I guess it's also possible that selinux is to blame here.  It looks like that has updated as well?  I'm not sure how to check this, but I didn't see anything in the logs, so maybe not.

Comment 5 Adam Williamson 2019-07-08 18:36:22 UTC
Tomasz Torcz just found something:

<zdzichu> adamw: hey, about https://bugzilla.redhat.com/show_bug.cgi?id=1717229; could you fpaste output of 'grep netscape.security /etc/pki/pki-tomcat/ca/CS.cfg'?
<zdzichu> adamw: I had a problem with freeipa recently, class names weren't updates to correct ones
<adamw> zdzichu: https://paste.fedoraproject.org/paste/mIZl-Ktm0XK7oSyWQS7qng
<zdzichu> yep, seems like the same. It should have org.mozilla.jss in front, like in https://paste.fedoraproject.org/paste/QY9ECyRbHOIWKbHw2ysoKQ
<adamw> oo

so I changed all those to add org.mozilla.jss on the front and rebooted. Then it seemed to blow up on a certificate validity issue. So I tried rewinding time and restarting certmonger...which fails because it can't talk to a KDC. At this point I would like to set the whole thing on fire and go punt it in a freaking lake.

Can I just give you root permissions on the thing and let you go to town with it?

Comment 6 Alexander Bokovoy 2019-07-08 18:42:53 UTC
I think the error message on 19:47:16 and even earlier at 19:46 are due to 389-ds not yet fully started due to ipa-server-upgrade starting/restarting all services.

Jun 26 19:46:06 ipa001 systemd[1]: Started 389 Directory Server DOMAIN-LOCAL..
Jun 26 19:46:07 ipa001 named-pkcs11[14818]: Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Cannot contact any KDC for realm 'DOMAIN.LOCAL')

In fact, there are multiple 389-ds restarts:
$ grep 'starting up' dirsrv/slapd-DOMAIN-LOCAL/errors
[26/Jun/2019:19:31:30.538443342 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:31:35.320162467 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:31:39.163542700 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:31:45.288498343 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:34:12.104069800 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:34:18.268245611 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:34:24.390636122 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:34:50.934026949 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:35:12.907075341 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:35:24.231159036 -0400] - INFO - main - 389-Directory/1.4.1.2 B2019.089.026 starting up
[26/Jun/2019:19:45:39.015522571 -0400] - INFO - main - 389-Directory/1.4.1.4 B2019.169.2126 starting up
[26/Jun/2019:19:45:50.133540561 -0400] - INFO - main - 389-Directory/1.4.1.4 B2019.169.2126 starting up
[26/Jun/2019:19:45:53.801909332 -0400] - INFO - main - 389-Directory/1.4.1.4 B2019.169.2126 starting up
[26/Jun/2019:19:46:04.957738093 -0400] - INFO - main - 389-Directory/1.4.1.4 B2019.169.2126 starting up
[26/Jun/2019:19:47:20.251421928 -0400] - INFO - main - 389-Directory/1.4.1.4 B2019.169.2126 starting up
[26/Jun/2019:19:49:12.239974795 -0400] - INFO - main - 389-Directory/1.4.1.4 B2019.169.2126 starting up

If we check the upgrade logs, there is certain correlation, definitely:
$ grep dirsrv@ ipaupgrade.log
2019-06-26T23:45:38Z DEBUG args=['/bin/systemctl', 'is-active', 'dirsrv']
2019-06-26T23:45:38Z DEBUG args=['/bin/systemctl', 'start', 'dirsrv']
2019-06-26T23:45:40Z DEBUG Start of dirsrv complete
2019-06-26T23:45:47Z DEBUG args=['/bin/systemctl', 'restart', 'dirsrv']
2019-06-26T23:45:51Z DEBUG Restart of dirsrv complete
2019-06-26T23:45:51Z DEBUG args=['/bin/systemctl', 'stop', 'dirsrv']
2019-06-26T23:45:53Z DEBUG Stop of dirsrv complete
2019-06-26T23:45:53Z DEBUG args=['/bin/systemctl', 'is-active', 'dirsrv']
2019-06-26T23:45:53Z DEBUG args=['/bin/systemctl', 'start', 'dirsrv']
2019-06-26T23:45:54Z DEBUG args=['/bin/systemctl', 'is-active', 'dirsrv']
2019-06-26T23:45:54Z DEBUG Start of dirsrv complete
2019-06-26T23:46:03Z DEBUG args=['/sbin/restorecon', '/etc/systemd/system/dirsrv.d/ipa-env.conf']
2019-06-26T23:46:03Z DEBUG args=['/bin/systemctl', 'stop', 'dirsrv']
2019-06-26T23:46:04Z DEBUG Stop of dirsrv complete
2019-06-26T23:46:04Z DEBUG args=['/bin/systemctl', 'start', 'dirsrv']
2019-06-26T23:46:06Z DEBUG args=['/bin/systemctl', 'is-active', 'dirsrv']
2019-06-26T23:46:06Z DEBUG Start of dirsrv complete
2019-06-26T23:47:16Z DEBUG args=['/bin/systemctl', 'stop', 'dirsrv']
2019-06-26T23:47:18Z DEBUG Stop of dirsrv complete

So we can ignore anything before 19:47:19-04:00 because that was simply part of the upgrade sequence which completed at that time:
2019-06-26T23:47:19Z INFO The ipa-server-upgrade command was successful

Since krb5kdc uses LDAPI (as well as bind-dyndb-ldap in named), we should be checking when LDAPI access becomes available:
$ grep 'Listening on /var' messages
Jun 26 19:31:36 ipa001 ns-slapd[6331]: [26/Jun/2019:19:31:36.284311814 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:31:40 ipa001 ns-slapd[6380]: [26/Jun/2019:19:31:40.065880083 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:31:46 ipa001 ns-slapd[6488]: [26/Jun/2019:19:31:46.190858516 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:34:13 ipa001 ns-slapd[8869]: [26/Jun/2019:19:34:13.037814733 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:34:19 ipa001 ns-slapd[8964]: [26/Jun/2019:19:34:19.164672460 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:34:25 ipa001 ns-slapd[9042]: [26/Jun/2019:19:34:25.308606786 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:34:52 ipa001 ns-slapd[9956]: [26/Jun/2019:19:34:52.024964480 -0400] - INFO - slapd_daemon - slapd started.  Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:35:14 ipa001 ns-slapd[10024]: [26/Jun/2019:19:35:14.048293669 -0400] - INFO - slapd_daemon - slapd started.  Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:35:25 ipa001 ns-slapd[10078]: [26/Jun/2019:19:35:25.461827754 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:45:40 ipa001 ns-slapd[14073]: [26/Jun/2019:19:45:40.042291711 -0400] - INFO - slapd_daemon - slapd started.  Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:45:51 ipa001 ns-slapd[14108]: [26/Jun/2019:19:45:51.100285602 -0400] - INFO - slapd_daemon - slapd started.  Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:45:54 ipa001 ns-slapd[14145]: [26/Jun/2019:19:45:54.737239713 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:46:06 ipa001 ns-slapd[14634]: [26/Jun/2019:19:46:06.052633915 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:47:21 ipa001 ns-slapd[15621]: [26/Jun/2019:19:47:21.187925693 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests
Jun 26 19:49:13 ipa001 ns-slapd[964]: [26/Jun/2019:19:49:13.995964793 -0400] - INFO - slapd_daemon - Listening on /var/run/slapd-DOMAIN-LOCAL.socket for LDAPI requests

E.g. after 19:49:13-04:00 we should be able to talk to the LDAPI socket unless 389-ds did crash.
Jun 26 19:49:15 ipa001 named-pkcs11[1045]: Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Pre-authentication failed: Invalid argument)
Jun 26 19:49:15 ipa001 named-pkcs11[1045]: Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Pre-authentication failed: Invalid argument)
Jun 26 19:50:17 ipa001 named-pkcs11[1045]: Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Pre-authentication failed: Invalid argument)
Jun 26 19:51:17 ipa001 named-pkcs11[1045]: Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Pre-authentication failed: Invalid argument)
Jun 26 19:52:17 ipa001 named-pkcs11[1045]: Failed to get initial credentials (TGT) using principal 'DNS/ipa001.domain.local' and keytab 'FILE:/etc/named.keytab' (Pre-authentication failed: Invalid argument)

According to krb5kdc.log, at the same time the only thing happened was receiving of the AS_REQ requests mentioned above. And 389-ds access log confirms that at least until the last of these requests KDC did contact 389-ds over LDAPI socket:

[26/Jun/2019:19:50:43.830134200 -0400] conn=6 op=20 SRCH base="cn=Default Host Password Policy,cn=computers,cn=accounts,dc=domain,dc=local" scope=0 filter=
"(objectClass=*)" attrs="krbMaxPwdLife krbMinPwdLife krbPwdMinDiffChars krbPwdMinLength krbPwdHistoryLength krbPwdMaxFailure krbPwdFailureCountInterval krb
PwdLockoutDuration"
[26/Jun/2019:19:50:43.830258319 -0400] conn=6 op=20 RESULT err=0 tag=101 nentries=1 etime=0.0000264641
[26/Jun/2019:19:51:17.971345902 -0400] conn=6 op=22 SRCH base="dc=domain,dc=local" scope=2 filter="(&(|(objectClass=krbprincipalaux)(objectClass=krbprincip
al)(objectClass=ipakrbprincipal))(|(ipaKrbPrincipalAlias=DNS/ipa001.domain.local)(krbPrincipalName:caseIgnoreIA5Match:=DNS/ipa001.domain.local
@DOMAIN.LOCAL)))" attrs="krbPrincipalName krbCanonicalName krbUPEnabled krbPrincipalKey krbTicketPolicyReference krbPrincipalExpiration krbPasswordExpirati
on krbPwdPolicyReference krbPrincipalType krbPwdHistory krbLastPwdChange krbPrincipalAliases krbLastSuccessfulAuth krbLastFailedAuth krbLoginFailedCount kr
bPrincipalAuthInd krbExtraData krbLastAdminUnlock krbObjectReferences krbTicketFlags krbMaxTicketLife krbMaxRenewableAge nsAccountLock passwordHistory ipaK
rbAuthzData ipaUserAuthType ipatokenRadiusConfigLink objectClass"
[26/Jun/2019:19:51:17.972786043 -0400] conn=6 op=22 RESULT err=0 tag=101 nentries=1 etime=0.0001862323
[26/Jun/2019:19:51:17.972989777 -0400] conn=6 op=23 SRCH base="cn=DOMAIN.LOCAL,cn=kerberos,dc=domain,dc=local" scope=0 filter="(objectClass=krbticketpolicy
aux)" attrs="krbMaxTicketLife krbMaxRenewableAge krbTicketFlags"
[26/Jun/2019:19:51:17.973194096 -0400] conn=6 op=23 RESULT err=0 tag=101 nentries=1 etime=0.0000359671
[26/Jun/2019:19:51:17.973390908 -0400] conn=6 op=24 SRCH base="dc=domain,dc=local" scope=2 filter="(&(|(objectClass=krbprincipalaux)(objectClass=krbprincip
al)(objectClass=ipakrbprincipal))(|(ipaKrbPrincipalAlias=krbtgt/DOMAIN.LOCAL)(krbPrincipalName:caseIgnoreIA5Match:=krbtgt/DOMAIN.LOCAL@DOMAIN.
LOCAL)))" attrs="krbPrincipalName krbCanonicalName krbUPEnabled krbPrincipalKey krbTicketPolicyReference krbPrincipalExpiration krbPasswordExpiration krbPw
dPolicyReference krbPrincipalType krbPwdHistory krbLastPwdChange krbPrincipalAliases krbLastSuccessfulAuth krbLastFailedAuth krbLoginFailedCount krbPrincip
alAuthInd krbExtraData krbLastAdminUnlock krbObjectReferences krbTicketFlags krbMaxTicketLife krbMaxRenewableAge nsAccountLock passwordHistory ipaKrbAuthzD
ata ipaUserAuthType ipatokenRadiusConfigLink objectClass"
[26/Jun/2019:19:51:17.973965382 -0400] conn=6 op=24 RESULT err=0 tag=101 nentries=1 etime=0.0000739196
[26/Jun/2019:19:51:17.974161567 -0400] conn=6 op=25 SRCH base="cn=Default Service Password Policy,cn=services,cn=accounts,dc=domain,dc=local" scope=0 filte
r="(objectClass=*)" attrs="krbMaxPwdLife krbMinPwdLife krbPwdMinDiffChars krbPwdMinLength krbPwdHistoryLength krbPwdMaxFailure krbPwdFailureCountInterval k
rbPwdLockoutDuration"
[26/Jun/2019:19:51:17.974420173 -0400] conn=6 op=25 RESULT err=0 tag=101 nentries=1 etime=0.0000416655

There are few more requests in the 389-ds access log after that from another krb5kdc process (conn=7) about ipa-dnskeysyncd/ipa001.domain.local principal so it seems both krb5kdc processes were able to connect to the LDAP server. This means they either got incorrect data back or interpreted it wrong?

Comment 7 Adam Williamson 2019-07-08 18:50:08 UTC
Sorry, disregard #c5 - that's meant to be for 1717229 .

Comment 8 Adam Williamson 2019-07-08 18:52:06 UTC
Robbie: "It might be easier to check SSSD - crank debug_level to 9." - that should already be set to 9 for all openQA tests, it's in the test scripts. Did you check the logs? We run 'sss_debuglevel 9' right after the client enrols: https://pagure.io/fedora-qa/os-autoinst-distri-fedora/blob/master/f/tests/realmd_join_sssd.pm#_82

Comment 9 Robbie Harwood 2019-07-08 18:57:54 UTC
I did.  There's nothing in the krb5_child.log - I would have expected to see krb5 trace output in there.  Indeed, most of these appear much shorter than 9 would suggest - sssd.log is also empty, even.

Comment 10 Robbie Harwood 2019-07-08 21:15:01 UTC
Alexander suggested on IRC that this is probably because the way SSSD's debug level is being set isn't persistent across restarts.  Moreover, I think he's right that this failure occurs while the KDC is down for upgrading, and isn't actually a problem.

I've attempted to reproduce this setup locally, and am unable to.  Specifically, a default IPA server installation on fc30 (with DNS) upgrades without a hitch to rawhide, and I observe no problems from named.

What I can't figure out is how you got a keytab with those enctypes in it to begin with - even setting crypto-policies to LEGACY doesn't do it for me.

Is this on a machine you'd be willing/able to give me access to, or can you give me reproduction instructions (since you've reproduced it above already)?

Comment 11 Adam Williamson 2019-07-08 23:26:43 UTC
The errors I posted happen constantly, over and over, until the test ends (when it finishes uploading the logs). It's not a transient error.

However...this actually seems to have been resolved somehow in the last couple of composes. It failed from 20190604.n.0 through 20190703.n.0; it started working again in 20190704.n.1 and worked again in 20190707.n.0 yesterday. That pattern is exactly the same on staging and production, which means it's unlikely to be some kind of test blip, but someone really did something that fixed this in 0704.n.1.

I guess let's close this for now, if it starts happening again we can re-open...

Comment 12 Adam Williamson 2019-07-08 23:29:47 UTC
freeipa-4.8.0-1.fc31 showed up in that compose, so that's the obvious candidate for what fixed this. There was also an NSS update, not sure if that's relevant. Previous freeipa was 4.7.90.pre1-6.fc31 .

Comment 13 Robbie Harwood 2019-07-10 20:06:21 UTC
freeipa removed their 3des configuration, which I imagine is why.

I've finally managed to reproduce this outside of freeipa:

[root@kdc ~]# rpm -qv krb5-workstation
krb5-workstation-1.17-34.fc31.x86_64
[root@kdc ~]# klist -ek
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   2 host/kdc.example.com (aes256-cts-hmac-sha1-96) 
   2 host/kdc.example.com (aes128-cts-hmac-sha1-96) 
   2 host/kdc.example.com (DEPRECATED:des3-hmac-sha1) 
   2 host/kdc.example.com (DEPRECATED:arcfour-hmac) 
   2 host/kdc.example.com (camellia256-cts-cmac) 
   2 host/kdc.example.com (camellia128-cts-cmac) 
[root@kdc ~]# kinit -k
kinit: Bad encryption type while getting initial credentials
[root@kdc ~]# KRB5_TRACE=/dev/stderr kinit -k
[776] 1562788938.944469: Getting initial credentials for host/kdc.example.com
[776] 1562788938.944470: Looked up etypes in keytab: aes256-cts, aes128-cts, rc4-hmac, camellia256-cts, camellia128-cts
[776] 1562788938.944472: Sending unauthenticated request
[776] 1562788938.944473: Sending request (215 bytes) to EXAMPLE.COM
[776] 1562788938.944474: Resolving hostname kdc.example.com
[776] 1562788938.944475: Sending initial UDP request to dgram 192.168.121.184:88
[776] 1562788938.944476: Received answer (793 bytes) from dgram 192.168.121.184:88
[776] 1562788938.944477: Sending DNS URI query for _kerberos.EXAMPLE.COM.
[776] 1562788939.42448: No URI records found
[776] 1562788939.42449: Sending DNS SRV query for _kerberos-master._udp.EXAMPLE.COM.
[776] 1562788939.42450: Sending DNS SRV query for _kerberos-master._tcp.EXAMPLE.COM.
[776] 1562788939.42451: No SRV records found
[776] 1562788939.42452: Response was not from master KDC
[776] 1562788939.42453: Processing preauth types: PA-ETYPE-INFO2 (19)
[776] 1562788939.42454: Selected etype info: etype aes256-cts, salt "EXAMPLE.COMhostkdc.example.com", params ""
[776] 1562788939.42455: Produced preauth for next request: (empty)
[776] 1562788939.42456: Getting AS key, salt "EXAMPLE.COMhostkdc.example.com", params ""
[776] 1562788939.42457: Retrieving host/kdc.example.com from FILE:/etc/krb5.keytab (vno 0, enctype aes256-cts) with result: -1765328196/Bad encryption type
[776] 1562788939.42458: Getting initial credentials for host/kdc.example.com
[776] 1562788939.42459: Looked up etypes in keytab: aes256-cts, aes128-cts, rc4-hmac, camellia256-cts, camellia128-cts
[776] 1562788939.42461: Sending unauthenticated request
[776] 1562788939.42462: Sending request (215 bytes) to EXAMPLE.COM (master)
[776] 1562788939.42463: Sending DNS URI query for _kerberos.EXAMPLE.COM.
[776] 1562788939.42464: No URI records found
[776] 1562788939.42465: Sending DNS SRV query for _kerberos-master._udp.EXAMPLE.COM.
[776] 1562788939.42466: Sending DNS SRV query for _kerberos-master._tcp.EXAMPLE.COM.
[776] 1562788939.42467: No SRV records found
kinit: Bad encryption type while getting initial credentials
[root@kdc ~]# 

while in krb5kdc.log:

Jul 10 20:02:18 kdc.example.com krb5kdc[770](info): AS_REQ (7 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19)}) 192.168.121.184: ISSUE: authtime 1562788938, etypes {rep=aes256-cts-hmac-sha1-96(18), tkt=aes256-cts-hmac-sha1-96(18), ses=aes256-cts-hmac-sha1-96(18)}, host/kdc.example.com for krbtgt/EXAMPLE.COM

Comment 14 Adam Williamson 2019-07-10 20:21:37 UTC
Cool. Aha - now I think about it again...there's another possibility for why the tests stopped failing: I bet a newer FreeIPA showed up in *Fedora 30*, so the *start point* for the upgrade process is different. And...yeah, that looks to be exactly what happened:

https://bodhi.fedoraproject.org/updates/FEDORA-2019-76a8f0144a

Still...this started happening in June, when both F30 and Rawhide should have had 4.7.90-pre1 as the current FreeIPA version. So...I guess somehow upgrading from F30+4.7.90-pre1 to Rawhide+4.7.90-pre1 triggers the bug, but upgrading from F30+4.8.0 to Rawhide+4.8.0 doesn't?

Comment 15 Robbie Harwood 2019-07-10 21:16:48 UTC
Yes, it looks like freeIPA backported the 3DES non-configuration change to fc30 as well.  I guess the freeIPA openqa is the only person/thing testing migration, because this bug turns out to be actually important for me to have fixed before fc31 release.  (Fix posted upstream; will backport once merged.)

Comment 16 Stephen Gallagher 2019-07-15 13:28:50 UTC
So, based on Comment #14, this isn't currently an issue going from F30 4.8.0 to F31 4.8.0, or it's still a problem if you've been upgrading from 4.7.x since earlier in F30? If the latter, I'd call that a +1 blocker. If it's the former, we formally only support upgrades from the latest stable packages in the previous release, so I'd be -1 blocker, +1 FE.

Comment 17 Adam Williamson 2019-07-15 16:03:28 UTC
I am honestly not sure exactly when you might run into it at this point. Robbie, what do you see as still outstanding here?

Comment 18 Robbie Harwood 2019-07-15 17:10:06 UTC
I was just gonna fix it and call it a day... the build in "fixed in" (krb5-1.17-35.fc31) resolves the issue using the patch I wrote upstream.

Comment 19 Adam Williamson 2019-07-15 17:19:25 UTC
Do we also need to fix it in F30?

Comment 20 Geoffrey Marr 2019-07-15 19:00:12 UTC
Discussed during the 2019-07-15 blocker review meeting: [1]

The decision to delay the classification of this as a blocker bug was made as there is definitely a significant bug here, but as things stand we aren't clear if it breaks the criteria for F31 due to the changes in F30. We will investigate this further and decide blocker status once we have more information.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-07-15/f31-blocker-review.2019-07-15-16.05.txt

Comment 21 Robbie Harwood 2019-07-15 21:22:09 UTC
No, because 3DES removal isn't part of fc30.


Note You need to log in before you can comment on or make changes to this bug.