Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1757224
Summary: | Tickets act like they're expiring prematurely when using KCM cache | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Gwyn Ciesla <gwync> | ||||||||||||
Component: | sssd | Assignee: | Michal Zidek <mzidek> | ||||||||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||||||
Severity: | unspecified | Docs Contact: | |||||||||||||
Priority: | unspecified | ||||||||||||||
Version: | 31 | CC: | abokovoy, awilliam, bojan, james, jhrozek, j, lslebodn, mpitt, mzidek, nalin, npmccallum, pasik, pbrezina, rharwood, sbose, ssorce, tomek | ||||||||||||
Target Milestone: | --- | ||||||||||||||
Target Release: | --- | ||||||||||||||
Hardware: | Unspecified | ||||||||||||||
OS: | Unspecified | ||||||||||||||
Whiteboard: | |||||||||||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||||||||
Doc Text: | Story Points: | --- | |||||||||||||
Clone Of: | Environment: | ||||||||||||||
Last Closed: | 2020-01-31 09:58:38 UTC | Type: | Bug | ||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||
Documentation: | --- | CRM: | |||||||||||||
Verified Versions: | Category: | --- | |||||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||
Embargoed: | |||||||||||||||
Attachments: |
|
Description
Gwyn Ciesla
2019-09-30 21:22:08 UTC
Additionally, this is not only occurring for my local krb5 domain, but for FEDORAPROJECT.ORG also. Can you klist -A the next time it happens, before and after you kinit and post the output of the command? Before: [gwyn@gwythsefyll ~]$ klist -A Ticket cache: KCM:1000:62367 Default principal: limb Valid starting Expires Service principal 10/01/2019 04:16:01 10/02/2019 04:15:53 krbtgt/FEDORAPROJECT.ORG renew until 10/07/2019 16:15:56 10/01/2019 08:03:52 10/02/2019 04:15:53 host/pkgs.fedoraproject.org renew until 10/07/2019 16:15:56 Ticket cache: KCM:1000 Default principal: gwyn@BAMBOO Valid starting Expires Service principal 10/01/2019 08:03:12 10/02/2019 08:03:12 krbtgt/BAMBOO@BAMBOO renew until 10/01/2019 08:03:12 After: [gwyn@gwythsefyll ~]$ klist -A Ticket cache: KCM:1000 Default principal: gwyn@BAMBOO Valid starting Expires Service principal 10/01/2019 08:10:36 10/02/2019 08:10:36 krbtgt/BAMBOO@BAMBOO renew until 10/01/2019 08:10:36 Ticket cache: KCM:1000:62367 Default principal: limb Valid starting Expires Service principal 10/01/2019 08:10:33 10/02/2019 08:10:27 krbtgt/FEDORAPROJECT.ORG renew until 10/08/2019 08:10:27 So the tickets are clearly not expired (they last 1 day), so how do you determine that they are expired? Are you receiving an error from some application? Is this possibly a duplicate of: https://bugzilla.redhat.com/show_bug.cgi?id=1757299 ? That's actually the main symptom, yes, inability to ssh to my GSSAPI-auth hosts of conduct dist-git operations, which require both an rsa key and valid krb5 ticket. So probably. What realm's hosts are you trying to reach when this happens? Fedora's or bamboo's ? *** Bug 1757299 has been marked as a duplicate of this bug. *** Either. ok, I was hoping kswitch could be a workaround and it was just a matter of not finding the right tickets ... Robbie did anything change in krb5 libs wrt ticket expiration checks? Only KCM-related checks. Does it occur if you disable KCM? How do I disable that explicitly? I currently have default settings for default_ccache_name, and was using KEYRING previously, and still saw this issue. A few ways. `yum erase sssd-kcm` will do it, or edit/remove the file in /etc/krb5.conf.d (I think it's called sssd_kcm). Found it. It made no difference. Can you show the same output as before, with KEYRING this time? Can you also post KRB5_TRACE output for whatever you're doing that says it's expired? Sure, how do I generate the KRB5_TRACE info? set the KRB5_TRACE env variable to a file or /dev/stderr for example. the run the tool that complains about expired tickets. collect the file (or the output if you used stderr) and post it here (mark private if needed) Created attachment 1622097 [details]
Before
[gwyn@gwythsefyll git]$ klist -A
Ticket cache: KCM:1000:4294920187
Default principal: gwyn@BAMBOO
Valid starting Expires Service principal
10/02/2019 15:17:45 10/03/2019 15:17:45 krbtgt/BAMBOO@BAMBOO
renew until 10/02/2019 15:17:45
Created attachment 1622110 [details]
After
[gwyn@gwythsefyll git]$ klist -A
Ticket cache: KCM:1000:4294920187
Default principal: gwyn@BAMBOO
Valid starting Expires Service principal
10/02/2019 15:41:46 10/03/2019 15:41:40 host/bamboo@
renew until 10/02/2019 15:41:40
10/02/2019 15:41:40 10/03/2019 15:41:40 krbtgt/BAMBOO@BAMBOO
renew until 10/02/2019 15:41:40
10/02/2019 15:41:46 10/03/2019 15:41:40 host/bamboo@BAMBOO
renew until 10/02/2019 15:41:40
Hi Gwyn the two comments above still show you are using KCM "Ticket cache: KCM:1000:4294920187" That line should have the keyword KEYRING instead to test with keyring caches. Oops. Looks like it's setting KRB5CCNAME=KCM: for my new logins. I can't figure out why, but I'm manually unsetting in the meantime. Ok, oddness. I'm manually unsetting that environment variable, and the problem hasn't come back. What's odd though is that I'm not sure why it's being set. Did you disable KCM by removing sssd-kcm, or removing the file in /etc/krb5.conf.d? If not, it's going to keep setting itself as the default. Do I understand correctly that the problem doesn't occur when KCM is not in use? Ok, that did it. That is correct, it seems to be KCM related. That then broke logins with sssd until I re-installed sssd-kcm. Is this because I have cache_credentials = True in sssd.conf? Sending to sssd for further investigation (though I remain on CC) because trace output suggests that KCM is producing a NOTFOUND response. It works well with sssd-kcm-2.2.2-1.fc31.x86_64 krb5-libs-1.17-15.fc30.x86_64 Do you know which change between krb5-libs-1.17-15 and krb5-libs-1.17-45 could broke it ? I think it's a KCM bug, not a krb5 bug, hence why I assigned it there. It works well with sh$ rpm -q freeipa-server openssh krb5-libs sssd-kcm freeipa-server-4.8.1-2.fc31.x86_64 openssh-8.0p1-8.fc31.1.x86_64 krb5-libs-1.17-40.fc31.x86_64 sssd-kcm-2.2.2-1.fc31.x86_64 Upgrading krb5 broke it Preparing : 1/1 Running scriptlet: krb5-libs-1.17-45.fc31.x86_64 1/1 Upgrading : krb5-libs-1.17-45.fc31.x86_64 1/12 Upgrading : libkadm5-1.17-45.fc31.x86_64 2/12 Upgrading : krb5-devel-1.17-45.fc31.x86_64 3/12 Upgrading : krb5-server-1.17-45.fc31.x86_64 4/12 Running scriptlet: krb5-server-1.17-45.fc31.x86_64 4/12 Upgrading : krb5-workstation-1.17-45.fc31.x86_64 5/12 Upgrading : krb5-pkinit-1.17-45.fc31.x86_64 6/12 Cleanup : krb5-devel-1.17-40.fc31.x86_64 7/12 Running scriptlet: krb5-server-1.17-40.fc31.x86_64 8/12 Cleanup : krb5-server-1.17-40.fc31.x86_64 8/12 Running scriptlet: krb5-server-1.17-40.fc31.x86_64 8/12 Cleanup : krb5-workstation-1.17-40.fc31.x86_64 9/12 Cleanup : libkadm5-1.17-40.fc31.x86_64 10/12 Cleanup : krb5-pkinit-1.17-40.fc31.x86_64 11/12 Cleanup : krb5-libs-1.17-40.fc31.x86_64 12/12 Running scriptlet: krb5-libs-1.17-40.fc31.x86_64 12/12 Verifying : krb5-devel-1.17-45.fc31.x86_64 1/12 Verifying : krb5-devel-1.17-40.fc31.x86_64 2/12 Verifying : krb5-libs-1.17-45.fc31.x86_64 3/12 Verifying : krb5-libs-1.17-40.fc31.x86_64 4/12 Verifying : krb5-pkinit-1.17-45.fc31.x86_64 5/12 Verifying : krb5-pkinit-1.17-40.fc31.x86_64 6/12 Verifying : krb5-server-1.17-45.fc31.x86_64 7/12 Verifying : krb5-server-1.17-40.fc31.x86_64 8/12 Verifying : krb5-workstation-1.17-45.fc31.x86_64 9/12 Verifying : krb5-workstation-1.17-40.fc31.x86_64 10/12 Verifying : libkadm5-1.17-45.fc31.x86_64 11/12 Verifying : libkadm5-1.17-40.fc31.x86_64 So either there is regression in krb5 or some change in krb5 revealed bug in sssd-kcm. Anyway we need to know more details from krb5 POV. It is trivial to reproduce on f31 using reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=1757299#c0 There is not anything between -40 and -45 in koji for 31. My bisect failed there :-( If you're going to do triage for someone else's component, at least do actual investigation instead of uselessly bisecting. It's also not at all helpful to tell me that "it works well" - *what* works well? What are you testing? How do you reproduce? etc. You also didn't answer any of my questions - why does krb5 get a NOTFOUND back from sssd here? What does KCM think is going on? The only change related to KCM in any way between fc30 and fc31 is around handling clockskew. Presumably the clockskew in your environment is negligible - hence bisecting won't be illuminating and is in fact more useless noise for us to sift through. Remember, we're not here to point fingers - we're here to fix the bug. (In reply to Gwyn Ciesla from comment #24) > That then broke logins with sssd until I re-installed sssd-kcm. Is this > because I have cache_credentials = True in sssd.conf? No, and that shouldn't happen, can you report it as a separate bug against SSSD ? (In reply to Simo Sorce from comment #32) > (In reply to Gwyn Ciesla from comment #24) > > That then broke logins with sssd until I re-installed sssd-kcm. Is this > > because I have cache_credentials = True in sssd.conf? > > No, and that shouldn't happen, can you report it as a separate bug against > SSSD ? https://bugzilla.redhat.com/show_bug.cgi?id=1758661 Robbie, looking at the first klist and trace, krb5 is asking for a specific ticket (host/bamboo) which is not present, so KCM properly returns NOTFOUND. When krb5 asks for the krbtgt/BAMBOO@BAMBOO ticket it gets 0/Success, but somehow it doesn't then proceed to try to obtain a ticket via AS request, and instead it loops back to try and find the host ticket again, rinse and repeat until krb5 finally gives up. Robbie, it sounds like krb5 believes KCM is returning an expired ticket, where there any changes to the krb5/KCM interface that may have triggered this ? Per #c31, the only change to cc_kcm was handling clockskew properly. There have been no other changes to the cc_kcm between f30 and f31. It was this, if you're curious: https://github.com/krb5/krb5/pull/969 I agree that it looks like krb5 and KCM disagree on the state of the ticket. Per #c25, what I was asking for was information on what KCM thinks the state of the ticket is. FWIW, I'm seeing this too, with both my FreeIPA domain ticket and my Fedora ticket. I constantly have to re-kinit to do anything with fedpkg or send any mail (since I use GSS auth for my mail client). It's very annoying. Adam can you confirm you are also using the KCM cache type? If so can you reproduce if you downgrade the krb5 package to one before the change Robbie highlighted ? Yeah, I'm using KCM apparently. I'll try downgrading krb5 to 1.17-40.fc31 - the last build before the "Fix KCM client time offset propagation" fix - and report back how that goes. So I woke up this morning and ssh'ed into one of my servers and gssapi worked, with the ticket I was issued yesterday. That certainly wasn't happening before. So I'd say yes, downgrading to 1.17-40.fc31 fixes this. Again, can we please stop bisecting krb5 and get the information to actually fix this instead? I need to know what KCM thinks is going on with the contents of the ccache so *we* can fix the issue. I don't care which package changed to cause breakage, and neither should anyone else. You didn't explain how either of the people encountering the bug can provide the information you want, though. And Simo asked me to try downgrading krb5. So I did. I'm happy to provide whatever info you need if you explain how to get it. My guess would be KCM logs at debug level 9? That's the usual procedure I think sssd follows for triage. I don't know that that's super useful, I have debug=9, and mine just has lots of this: (Wed Oct 9 17:04:01 2019) [sssd[kcm]] [orderly_shutdown] (0x0010): SIGTERM: killing children (In reply to Adam Williamson from comment #41) > You didn't explain how either of the people encountering the bug can provide > the information you want, though. And Simo asked me to try downgrading krb5. > So I did. I'm happy to provide whatever info you need if you explain how to > get it. To generate the sssd-kcm logs, either add this to your sssd.conf file: [kcm] debug_level = 10 or, if you don't have one, it should be enough to add that content to a file under /etc/sssd/conf.d/ with this recent sssd version. Then, restart the sssd-kcm service: systemctl restart sssd-kcm.service Finally, run whatever use-case doesn't work for you. The KCM logs will be generated at /var/log/sssd/sssd_kcm.log. It is recommended to disable the debug logs when you no longer need the debugging to be enabled as the sssd-kcm service can generate quite a large amount of debugging information. By the way, I tried to reproduce the bug as well today, because honestly I think it's more likely that something is broken in sssd-kcm than libkrb5, just on the grounds of the respective maturity of the codebases, but I didn't see the issue. I also tried with multiple principals, tried switching them back and forth with kswitch etc. Jakub: how long did you wait? So far I can say for me it seems like a ticket will stop working 42 minutes after being issued, but I'm not sure what the lower bound is. 7 minutes after being issued still worked OK. So, seems to be somewhere between 7 and 42 mins on current data. Created attachment 1624490 [details]
kcm debug=10 log
ooh, that's a lot of info.
Gwyn: it'd probably help in reading it if you have timings of when you actually hit the problem, i.e. when you tried to do something and it failed because the ticket was 'expired'? I dunno if it's easy for Jakub or Robbie, but I certainly can't just skim that log and spot such an instance :) I set the debug and restarted at 15:11:59. I then immediately attempted to ssh with GSSAPI and was prompted for a password, which begins at 15:12:03. ah, OK. I'm doing the same but I re k-init'ed after the restart in case that was important, so I'm waiting for it to start failing again right now. Thank you for the logs. Jakub, I'm curious about these lines: (Thu Oct 10 15:12:43 2019) [sssd[kcm]] [kcm_op_get_kdc_offset_getbyname_done] (0x1000): KDC offset: 2147483647 and also (Thu Oct 10 15:12:13 2019) [sssd[kcm]] [kcm_op_get_kdc_offset_getbyname_done] (0x1000): KDC offset: 1 For the first one, that looks like an endian issue - is the printing patch present in that version? Any idea what's up with the second one? On a VM with updated rawhide, sssd-kcm configured. kinit at REDHAT.COM, then at FEDORAPROJECT.ORG (so FEDORAPROJECT.ORG as first entry in cccol). FP.O ticket has 10 minute lifetime. Checked functionality using `koji hello`: immediately after kinit, 5 minutes, and roughly every 30 seconds thereafter. No failures noted until the ticket hit the actual expiration. 2147483647, if the endian were switched, would be -129. Can folks sync with an NTP server and retry? I did sudo chronyc waitsync. No change. I'm always synced to nyp, and for my domain I can see that both server and client are on the same time. Ten minutes is not long enough to reproduce the bug. Give it 45 at least. Created attachment 1624569 [details]
my debug kcm log
OK, I reproduced again and here's my kcm debug log. The failed ssh attempt was at 18:52:39. I ran ntpdate pool.ntp.org after that and it adjusted the time by -0.003482 sec, then retried the ssh and it still fails.
Hm, in Adam's case the offsets are INT32_MAX, which at least gives me a hunch.. Can either of you try this build? https://koji.fedoraproject.org/koji/taskinfo?taskID=38214003 It doesn't contain a proper fix, but hopefully it would propel us to the right direction.. Installed, monitoring. Ditto. Installed, set debug to 10, did kinit again, will see where we're at in an hour. well, the ticket I got at time of last comment is still working, and it's been an hour and ten minutes, and I was able to reproduce the problem after ~45 minutes before. So let's tentatively say that build is working. Do you need logs? My observations match Adam's. Adam, Gwyn, thanks for testing. Jakub, so since it looks the issue is in SSSD's KCM, can you change to component to 'sssd' and clone the ticket to other products where you think this issue might be present as well? Thanks. bye, Sumit (In reply to Jakub Hrozek from comment #55) > Hm, in Adam's case the offsets are INT32_MAX, which at least gives me a > hunch.. Can either of you try this build? > https://koji.fedoraproject.org/koji/taskinfo?taskID=38214003 It doesn't > contain a proper fix, but hopefully it would propel us to the right > direction.. That scratch build help a bit but I am not sure whether it is the right solution. The most difficult part of debugging was missing debug information about kdc offset from krb5. Neither klist nor KRB5_TRACE shows that and IMH) there is a place for improvement in MIT krb5 (but that should be covered in different PR So the key difference is how krb5 ticket was created. Becasue kinit and krb5_child(which is used by sssd) trigger different KCM operation. kinit: sh# grep kcm_cmd_done sssd_kcm.log_kinit (Mon Oct 14 06:12:31 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success (Mon Oct 14 06:12:31 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [1432158223]: No matching credentials found (Mon Oct 14 06:12:31 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CACHE_UUID_LIST returned [0]: Success (Mon Oct 14 06:12:31 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success (Mon Oct 14 06:12:31 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [1432158223]: No matching credentials found (Mon Oct 14 06:12:36 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation INITIALIZE returned [0]: Success (Mon Oct 14 06:12:36 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation SET_KDC_OFFSET returned [0]: Success (Mon Oct 14 06:12:36 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation STORE returned [0]: Success (Mon Oct 14 06:12:36 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [0]: Success (Mon Oct 14 06:12:36 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation STORE returned [0]: Success (Mon Oct 14 06:12:36 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [0]: Success (Mon Oct 14 06:12:36 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation STORE returned [0]: Success krb5_child(sssd-common-2.2.2-1.fc31) sh# grep kcm_cmd_done sssd_kcm.log_krb5_child (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_KDC_OFFSET returned [1432158223]: No matching credentials found (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CRED_UUID_LIST returned [1432158218]: No credentials available (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CACHE_UUID_LIST returned [0]: Success (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [1432158223]: No matching credentials found (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CACHE_UUID_LIST returned [0]: Success (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [1432158223]: No matching credentials found (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GEN_NEW returned [0]: Success (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation INITIALIZE returned [0]: Success (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation STORE returned [0]: Success (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation SET_DEFAULT_CACHE returned [0]: Success (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [0]: Success As you can see there is missing operation `SET_KDC_OFFSET` and thus default value INT32_MAX was used. And `SET_KDC_OFFSET` was not used with sssd-kcm-2.2.2-1.fc31.1 even thought that build "fix" the issue. krb5_child:(sssd-common-2.2.2-1.fc31.1) sh# grep kcm_cmd_done sssd_kcm.log_krb5_child_good (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_KDC_OFFSET returned [1432158223]: No matching credentials found (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CRED_UUID_LIST returned [1432158218]: No credentials available (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CACHE_UUID_LIST returned [0]: Success (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [1432158223]: No matching credentials found (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CACHE_UUID_LIST returned [0]: Success (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [1432158223]: No matching credentials found (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GEN_NEW returned [0]: Success (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation INITIALIZE returned [0]: Success (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation STORE returned [0]: Success (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation SET_DEFAULT_CACHE returned [0]: Success (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [0]: Success So the question is whether sssd-kcm shoudl be fixed or krb5_child. (In reply to Lukas Slebodnik from comment #61) > The most difficult part of debugging was missing debug information about kdc offset from krb5. Neither klist nor KRB5_TRACE shows that and IMH) there is a place for improvement in MIT krb5 (but that should be covered in different PR Thank you for your opinion. Logging KCM protocol goo is the responsibility of KCM, not krb5. There was no difficulty in finding the offset once we had KCM logs; the difficulty was in getting KCM logs. (In reply to Robbie Harwood from comment #62) > (In reply to Lukas Slebodnik from comment #61) > > The most difficult part of debugging was missing debug information about kdc offset from krb5. Neither klist nor KRB5_TRACE shows that and IMH) there is a place for improvement in MIT krb5 (but that should be covered in different PR > > Thank you for your opinion. > > Logging KCM protocol goo is the responsibility of KCM, not krb5. There was > no difficulty in finding the offset once we had KCM logs; the difficulty was > in getting KCM logs. We need to agree to disagree here. The krb5 code always call function set_kdc_offset in initialization of kcm but that function does something only of krb5 context has os flag KRB5_OS_TOFFSET_VALID. And it was properly set to global krb5_context(kr->ctx) after calling krb5_get_init_creds_password. But another krb5 context is created when creating ccache[1]. (comment in the code explain reason) Copying time from one context to another seems to help. I assume it was not a problem with KEYRING ccache because it did not set default kdc offset to INT32_MAX And the value 0 works well due to chrony/ntpdate. [1] https://pagure.io/SSSD/sssd/blob/6baf291ba2bd5f873b8e5e84e98d9cd251a5b2a3/f/src/providers/krb5/krb5_child.c#_978 Created attachment 1625695 [details]
krb5_child patch
I did not run any regression tests
(In reply to Lukas Slebodnik from comment #65) > Created attachment 1625695 [details] > krb5_child patch > > I did not run any regression tests Thank you, perhaps we could use that as well, but I still think that using the KCM cache should not require the user to know that they need to set the time themselves.. (In reply to Jakub Hrozek from comment #66) > (In reply to Lukas Slebodnik from comment #65) > > Created attachment 1625695 [details] > > krb5_child patch > > > > I did not run any regression tests > > Thank you, perhaps we could use that as well, but I still think that using > the KCM cache should not require the user to know that they need to set the > time themselves.. Another POV is that it helped to find a very old bug in krb5_child. Maybe it should be configurable but I can image you can consider such way to be overkill :-) It is up to you to decide. Anyway, fedora 31 final freeze is over and final GA release is approaching. So it would be good to patch at least fedora and it does not matter which way. https://fedoraproject.org/wiki/Releases/31/Schedule btw Heimdal had a similar bug, at least that makes me feel a little better: https://github.com/heimdal/heimdal/commit/9f58896af958ae5e6e3ebde8c48dad4eda841986 (In reply to Jakub Hrozek from comment #68) > btw Heimdal had a similar bug, at least that makes me feel a little better: > https://github.com/heimdal/heimdal/commit/ > 9f58896af958ae5e6e3ebde8c48dad4eda841986 heimdal had uninitialized data which is something different :-) sssd-kcm was in much better situation. Therefore workaround with kinit worked for Adam W. But any authentication (screnn unlock ..) would override ticker by krb5_child. So workaround with kinit is not ideal. *** Bug 1762881 has been marked as a duplicate of this bug. *** BTW, uninstalling sssd-kcm and using kernel keyring was a short term workaround here. Tickets issued on login into a Gnome session would be reported by stuff like Evolution, ssh, NFS as expired, although they looked OK when using klist. Weird... Fixed in master: commit 2c9bdcf579e430fa8f7e5595a17cf7242adb5216 FEDORA-2019-3bdedf56fb has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-3bdedf56fb FEDORA-2019-e9fc910d7f has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-e9fc910d7f sssd-2.2.2-3.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-e9fc910d7f sssd-2.2.2-3.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-1b16b0e9ae sssd-2.2.2-3.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-3bdedf56fb sssd-2.2.2-3.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report. sssd-2.2.2-3.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report. Our Cockpit tests confirm that this is fixed (https://github.com/cockpit-project/bots/pull/479). Can this be closed now? (In reply to Martin Pitt from comment #80) > Our Cockpit tests confirm that this is fixed > (https://github.com/cockpit-project/bots/pull/479). Can this be closed now? It has never been pushed to f29 because bodhi update was obsoleted https://bodhi.fedoraproject.org/updates/FEDORA-2019-3bdedf56fb However, f29 is EOL and other versions have already fixed version. Martin, fill free to close as CURRENT_RELEASE :-) @Lukas: F29 hasn't even been on my radar, and this bz has "Version: 31". So time to close then indeed. Thanks! fwiw I seem to be seeing something like this again the last few days. I'll try and look into it more carefully, but the basic symptom (constantly having to run kinit) seems to be back. I kinit'ed to do a Fedora build this morning, suspended my system, went out for a few hours, came back, woke it up again, tried to do another build, and my ticket had expired... |