Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.

Bug 1757224

Summary: Tickets act like they're expiring prematurely when using KCM cache
Product: [Fedora] Fedora Reporter: Gwyn Ciesla <gwync>
Component: sssdAssignee: Michal Zidek <mzidek>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 31CC: abokovoy, awilliam, bojan, james, jhrozek, j, lslebodn, mpitt, mzidek, nalin, npmccallum, pasik, pbrezina, rharwood, sbose, ssorce, tomek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-01-31 09:58:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Before
none
After
none
kcm debug=10 log
none
my debug kcm log
none
krb5_child patch none

Description Gwyn Ciesla 2019-09-30 21:22:08 UTC
I use kerberos for multiple things in my environment. My tickets are supposed to be valid for 24 hours and renewable for a week. After a few hours, I have to kinit again to access any kerberized functions.

I was using default_ccache_name = KEYRING:persistent:%{uid}, but commented it out to see if it would help. It didn't.

This was not occurring on f30, but is on f31.

Comment 1 Gwyn Ciesla 2019-09-30 21:22:49 UTC
Additionally, this is not only occurring for my local krb5 domain, but for FEDORAPROJECT.ORG also.

Comment 2 Simo Sorce 2019-09-30 21:41:33 UTC
Can you klist -A the next time it happens, before and after you kinit and post the output of the command?

Comment 3 Gwyn Ciesla 2019-10-01 13:10:58 UTC
Before:

[gwyn@gwythsefyll ~]$ klist -A
Ticket cache: KCM:1000:62367
Default principal: limb

Valid starting       Expires              Service principal
10/01/2019 04:16:01  10/02/2019 04:15:53  krbtgt/FEDORAPROJECT.ORG
	renew until 10/07/2019 16:15:56
10/01/2019 08:03:52  10/02/2019 04:15:53  host/pkgs.fedoraproject.org
	renew until 10/07/2019 16:15:56

Ticket cache: KCM:1000
Default principal: gwyn@BAMBOO

Valid starting       Expires              Service principal
10/01/2019 08:03:12  10/02/2019 08:03:12  krbtgt/BAMBOO@BAMBOO
	renew until 10/01/2019 08:03:12

After:
[gwyn@gwythsefyll ~]$ klist -A
Ticket cache: KCM:1000
Default principal: gwyn@BAMBOO

Valid starting       Expires              Service principal
10/01/2019 08:10:36  10/02/2019 08:10:36  krbtgt/BAMBOO@BAMBOO
	renew until 10/01/2019 08:10:36

Ticket cache: KCM:1000:62367
Default principal: limb

Valid starting       Expires              Service principal
10/01/2019 08:10:33  10/02/2019 08:10:27  krbtgt/FEDORAPROJECT.ORG
	renew until 10/08/2019 08:10:27

Comment 4 Simo Sorce 2019-10-01 13:22:13 UTC
So the tickets are clearly not expired (they last 1 day), so how do you determine that they are expired?
Are you receiving an error from some application?

Is this possibly a duplicate of: https://bugzilla.redhat.com/show_bug.cgi?id=1757299 ?

Comment 5 Gwyn Ciesla 2019-10-01 13:25:19 UTC
That's actually the main symptom, yes, inability to ssh to my GSSAPI-auth hosts of conduct dist-git operations, which require both an rsa key and valid krb5 ticket. So probably.

Comment 6 Simo Sorce 2019-10-01 13:31:06 UTC
What realm's hosts are you trying to reach when this happens?
Fedora's or bamboo's ?

Comment 7 Simo Sorce 2019-10-01 13:38:00 UTC
*** Bug 1757299 has been marked as a duplicate of this bug. ***

Comment 8 Gwyn Ciesla 2019-10-01 13:39:58 UTC
Either.

Comment 9 Simo Sorce 2019-10-01 14:01:10 UTC
ok, I was hoping kswitch could be a workaround and it was just a matter of not finding the right tickets ...

Robbie did anything change in krb5 libs wrt ticket expiration checks?

Comment 10 Robbie Harwood 2019-10-02 16:33:16 UTC
Only KCM-related checks.  Does it occur if you disable KCM?

Comment 11 Gwyn Ciesla 2019-10-02 16:37:39 UTC
How do I disable that explicitly? I currently have default settings for default_ccache_name, and was using KEYRING previously, and still saw this issue.

Comment 12 Robbie Harwood 2019-10-02 17:09:02 UTC
A few ways.  `yum erase sssd-kcm` will do it, or edit/remove the file in /etc/krb5.conf.d (I think it's called sssd_kcm).

Comment 13 Gwyn Ciesla 2019-10-02 18:07:15 UTC
Found it. It made no difference.

Comment 14 Robbie Harwood 2019-10-02 20:14:49 UTC
Can you show the same output as before, with KEYRING this time?

Can you also post KRB5_TRACE output for whatever you're doing that says it's expired?

Comment 15 Gwyn Ciesla 2019-10-02 20:19:03 UTC
Sure, how do I generate the KRB5_TRACE info?

Comment 16 Simo Sorce 2019-10-02 20:38:08 UTC
set the KRB5_TRACE env variable to a file or /dev/stderr for example.
the run the tool that complains about expired tickets.
collect the file (or the output if you used stderr) and post it here (mark private if needed)

Comment 17 Gwyn Ciesla 2019-10-02 20:41:37 UTC
Created attachment 1622097 [details]
Before

[gwyn@gwythsefyll git]$ klist -A
Ticket cache: KCM:1000:4294920187
Default principal: gwyn@BAMBOO

Valid starting       Expires              Service principal
10/02/2019 15:17:45  10/03/2019 15:17:45  krbtgt/BAMBOO@BAMBOO
	renew until 10/02/2019 15:17:45

Comment 18 Gwyn Ciesla 2019-10-02 20:42:38 UTC
Created attachment 1622110 [details]
After

[gwyn@gwythsefyll git]$ klist -A
Ticket cache: KCM:1000:4294920187
Default principal: gwyn@BAMBOO

Valid starting       Expires              Service principal
10/02/2019 15:41:46  10/03/2019 15:41:40  host/bamboo@
	renew until 10/02/2019 15:41:40
10/02/2019 15:41:40  10/03/2019 15:41:40  krbtgt/BAMBOO@BAMBOO
	renew until 10/02/2019 15:41:40
10/02/2019 15:41:46  10/03/2019 15:41:40  host/bamboo@BAMBOO
	renew until 10/02/2019 15:41:40

Comment 19 Simo Sorce 2019-10-02 20:48:56 UTC
Hi Gwyn the two comments above still show you are using KCM

"Ticket cache: KCM:1000:4294920187"

That line should have the keyword KEYRING instead to test with keyring caches.

Comment 20 Gwyn Ciesla 2019-10-03 13:25:55 UTC
Oops. Looks like it's setting KRB5CCNAME=KCM: for my new logins. I can't figure out why, but I'm manually unsetting in the meantime.

Comment 21 Gwyn Ciesla 2019-10-03 15:37:49 UTC
Ok, oddness. I'm manually unsetting that environment variable, and the problem hasn't come back. What's odd though is that I'm not sure why it's being set.

Comment 22 Robbie Harwood 2019-10-03 17:13:58 UTC
Did you disable KCM by removing sssd-kcm, or removing the file in /etc/krb5.conf.d?  If not, it's going to keep setting itself as the default.

Do I understand correctly that the problem doesn't occur when KCM is not in use?

Comment 23 Gwyn Ciesla 2019-10-03 18:23:11 UTC
Ok, that did it. That is correct, it seems to be KCM related.

Comment 24 Gwyn Ciesla 2019-10-03 19:44:37 UTC
That then broke logins with sssd until I re-installed sssd-kcm. Is this because I have cache_credentials = True in sssd.conf?

Comment 25 Robbie Harwood 2019-10-03 19:48:47 UTC
Sending to sssd for further investigation (though I remain on CC) because trace output suggests that KCM is producing a NOTFOUND response.

Comment 26 Lukas Slebodnik 2019-10-04 13:59:50 UTC
It works well with
sssd-kcm-2.2.2-1.fc31.x86_64
krb5-libs-1.17-15.fc30.x86_64

Do you know which change between krb5-libs-1.17-15 and krb5-libs-1.17-45 could broke it ?

Comment 27 Robbie Harwood 2019-10-04 15:13:40 UTC
I think it's a KCM bug, not a krb5 bug, hence why I assigned it there.

Comment 28 Lukas Slebodnik 2019-10-04 15:31:49 UTC
It works well with

sh$ rpm -q freeipa-server openssh krb5-libs sssd-kcm
freeipa-server-4.8.1-2.fc31.x86_64
openssh-8.0p1-8.fc31.1.x86_64
krb5-libs-1.17-40.fc31.x86_64
sssd-kcm-2.2.2-1.fc31.x86_64

Upgrading krb5 broke it

  Preparing        :                                                                                                                                                                   1/1 
  Running scriptlet: krb5-libs-1.17-45.fc31.x86_64                                                                                                                                     1/1 
  Upgrading        : krb5-libs-1.17-45.fc31.x86_64                                                                                                                                    1/12 
  Upgrading        : libkadm5-1.17-45.fc31.x86_64                                                                                                                                     2/12 
  Upgrading        : krb5-devel-1.17-45.fc31.x86_64                                                                                                                                   3/12 
  Upgrading        : krb5-server-1.17-45.fc31.x86_64                                                                                                                                  4/12 
  Running scriptlet: krb5-server-1.17-45.fc31.x86_64                                                                                                                                  4/12 
  Upgrading        : krb5-workstation-1.17-45.fc31.x86_64                                                                                                                             5/12 
  Upgrading        : krb5-pkinit-1.17-45.fc31.x86_64                                                                                                                                  6/12 
  Cleanup          : krb5-devel-1.17-40.fc31.x86_64                                                                                                                                   7/12 
  Running scriptlet: krb5-server-1.17-40.fc31.x86_64                                                                                                                                  8/12 
  Cleanup          : krb5-server-1.17-40.fc31.x86_64                                                                                                                                  8/12 
  Running scriptlet: krb5-server-1.17-40.fc31.x86_64                                                                                                                                  8/12 
  Cleanup          : krb5-workstation-1.17-40.fc31.x86_64                                                                                                                             9/12 
  Cleanup          : libkadm5-1.17-40.fc31.x86_64                                                                                                                                    10/12 
  Cleanup          : krb5-pkinit-1.17-40.fc31.x86_64                                                                                                                                 11/12 
  Cleanup          : krb5-libs-1.17-40.fc31.x86_64                                                                                                                                   12/12 
  Running scriptlet: krb5-libs-1.17-40.fc31.x86_64                                                                                                                                   12/12 
  Verifying        : krb5-devel-1.17-45.fc31.x86_64                                                                                                                                   1/12 
  Verifying        : krb5-devel-1.17-40.fc31.x86_64                                                                                                                                   2/12 
  Verifying        : krb5-libs-1.17-45.fc31.x86_64                                                                                                                                    3/12 
  Verifying        : krb5-libs-1.17-40.fc31.x86_64                                                                                                                                    4/12 
  Verifying        : krb5-pkinit-1.17-45.fc31.x86_64                                                                                                                                  5/12 
  Verifying        : krb5-pkinit-1.17-40.fc31.x86_64                                                                                                                                  6/12 
  Verifying        : krb5-server-1.17-45.fc31.x86_64                                                                                                                                  7/12 
  Verifying        : krb5-server-1.17-40.fc31.x86_64                                                                                                                                  8/12 
  Verifying        : krb5-workstation-1.17-45.fc31.x86_64                                                                                                                             9/12 
  Verifying        : krb5-workstation-1.17-40.fc31.x86_64                                                                                                                            10/12 
  Verifying        : libkadm5-1.17-45.fc31.x86_64                                                                                                                                    11/12 
  Verifying        : libkadm5-1.17-40.fc31.x86_64

So either there is regression in krb5 or some change in krb5 revealed bug in sssd-kcm.
Anyway we need to know more details from krb5 POV.

Comment 29 Lukas Slebodnik 2019-10-04 15:32:45 UTC
It is trivial to reproduce on f31 using reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=1757299#c0

Comment 30 Lukas Slebodnik 2019-10-04 15:34:43 UTC
There is not anything between -40 and -45 in koji for 31. My bisect failed there :-(

Comment 31 Robbie Harwood 2019-10-04 18:22:54 UTC
If you're going to do triage for someone else's component, at least do actual investigation instead of uselessly bisecting.

It's also not at all helpful to tell me that "it works well" - *what* works well?  What are you testing?  How do you reproduce?  etc.

You also didn't answer any of my questions - why does krb5 get a NOTFOUND back from sssd here?  What does KCM think is going on?

The only change related to KCM in any way between fc30 and fc31 is around handling clockskew.  Presumably the clockskew in your environment is negligible - hence bisecting won't be illuminating and is in fact more useless noise for us to sift through.  Remember, we're not here to point fingers - we're here to fix the bug.

Comment 32 Simo Sorce 2019-10-04 18:46:27 UTC
(In reply to Gwyn Ciesla from comment #24)
> That then broke logins with sssd until I re-installed sssd-kcm. Is this
> because I have cache_credentials = True in sssd.conf?

No, and that shouldn't happen, can you report it as a separate bug against SSSD ?

Comment 33 Gwyn Ciesla 2019-10-04 18:50:28 UTC
(In reply to Simo Sorce from comment #32)
> (In reply to Gwyn Ciesla from comment #24)
> > That then broke logins with sssd until I re-installed sssd-kcm. Is this
> > because I have cache_credentials = True in sssd.conf?
> 
> No, and that shouldn't happen, can you report it as a separate bug against
> SSSD ?

https://bugzilla.redhat.com/show_bug.cgi?id=1758661

Comment 34 Simo Sorce 2019-10-04 18:55:26 UTC
Robbie, looking at the first klist and trace, krb5 is asking for a specific ticket (host/bamboo) which is not present, so KCM properly returns NOTFOUND.
When krb5 asks for the krbtgt/BAMBOO@BAMBOO ticket it gets 0/Success, but somehow it doesn't then proceed to try to obtain a ticket via AS request, and instead it loops back to try and find the host ticket again, rinse and repeat until krb5 finally gives up.

Robbie, it sounds like krb5 believes KCM is returning an expired ticket, where there any changes to the krb5/KCM interface that may have triggered this ?

Comment 35 Robbie Harwood 2019-10-08 18:14:44 UTC
Per #c31, the only change to cc_kcm was handling clockskew properly.  There have been no other changes to the cc_kcm between f30 and f31.  It was this, if you're curious: https://github.com/krb5/krb5/pull/969

I agree that it looks like krb5 and KCM disagree on the state of the ticket.  Per #c25, what I was asking for was information on what KCM thinks the state of the ticket is.

Comment 36 Adam Williamson 2019-10-09 18:36:01 UTC
FWIW, I'm seeing this too, with both my FreeIPA domain ticket and my Fedora ticket. I constantly have to re-kinit to do anything with fedpkg or send any mail (since I use GSS auth for my mail client). It's very annoying.

Comment 37 Simo Sorce 2019-10-09 18:39:07 UTC
Adam can you confirm you are also using the KCM cache type?
If so can you reproduce if you downgrade the krb5 package to one before the change Robbie highlighted ?

Comment 38 Adam Williamson 2019-10-09 20:53:18 UTC
Yeah, I'm using KCM apparently. I'll try downgrading krb5 to 1.17-40.fc31 - the last build before the "Fix KCM client time offset propagation" fix - and report back how that goes.

Comment 39 Adam Williamson 2019-10-10 15:31:33 UTC
So I woke up this morning and ssh'ed into one of my servers and gssapi worked, with the ticket I was issued yesterday. That certainly wasn't happening before. So I'd say yes, downgrading to 1.17-40.fc31 fixes this.

Comment 40 Robbie Harwood 2019-10-10 16:20:07 UTC
Again, can we please stop bisecting krb5 and get the information to actually fix this instead?  I need to know what KCM thinks is going on with the contents of the ccache so *we* can fix the issue.  I don't care which package changed to cause breakage, and neither should anyone else.

Comment 41 Adam Williamson 2019-10-10 16:35:35 UTC
You didn't explain how either of the people encountering the bug can provide the information you want, though. And Simo asked me to try downgrading krb5. So I did. I'm happy to provide whatever info you need if you explain how to get it.

Comment 42 Robbie Harwood 2019-10-10 19:07:38 UTC
My guess would be KCM logs at debug level 9?  That's the usual procedure I think sssd follows for triage.

Comment 43 Gwyn Ciesla 2019-10-10 19:14:00 UTC
I don't know that that's super useful, I have debug=9, and mine just has lots of this:

(Wed Oct  9 17:04:01 2019) [sssd[kcm]] [orderly_shutdown] (0x0010): SIGTERM: killing children

Comment 44 Jakub Hrozek 2019-10-10 20:04:39 UTC
(In reply to Adam Williamson from comment #41)
> You didn't explain how either of the people encountering the bug can provide
> the information you want, though. And Simo asked me to try downgrading krb5.
> So I did. I'm happy to provide whatever info you need if you explain how to
> get it.

To generate the sssd-kcm logs, either add this to your sssd.conf file:
   [kcm]
   debug_level = 10
or, if you don't have one, it should be enough to add that content to a file
under /etc/sssd/conf.d/ with this recent sssd version.

Then, restart the sssd-kcm service:
   systemctl restart sssd-kcm.service

Finally, run whatever use-case doesn't work for you. The KCM logs will be generated at /var/log/sssd/sssd_kcm.log. It is recommended to disable the debug logs when you no longer need the debugging to be enabled as the sssd-kcm service can generate quite a large amount of debugging information.

By the way, I tried to reproduce the bug as well today, because honestly I think it's more likely that something is broken in sssd-kcm than libkrb5, just on the grounds of the respective maturity of the codebases, but I didn't see the issue. I also tried with multiple principals, tried switching them back and forth with kswitch etc.

Comment 45 Adam Williamson 2019-10-10 20:09:45 UTC
Jakub: how long did you wait? So far I can say for me it seems like a ticket will stop working 42 minutes after being issued, but I'm not sure what the lower bound is. 7 minutes after being issued still worked OK. So, seems to be somewhere between 7 and 42 mins on current data.

Comment 46 Gwyn Ciesla 2019-10-10 20:14:05 UTC
Created attachment 1624490 [details]
kcm debug=10 log

ooh, that's a lot of info.

Comment 47 Adam Williamson 2019-10-10 20:17:21 UTC
Gwyn: it'd probably help in reading it if you have timings of when you actually hit the problem, i.e. when you tried to do something and it failed because the ticket was 'expired'? I dunno if it's easy for Jakub or Robbie, but I certainly can't just skim that log and spot such an instance :)

Comment 48 Gwyn Ciesla 2019-10-10 20:20:06 UTC
I set the debug and restarted at 15:11:59. I then immediately attempted to ssh with GSSAPI and was prompted for a password, which begins at 15:12:03.

Comment 49 Adam Williamson 2019-10-10 20:22:08 UTC
ah, OK. I'm doing the same but I re k-init'ed after the restart in case that was important, so I'm waiting for it to start failing again right now.

Comment 50 Robbie Harwood 2019-10-10 20:46:41 UTC
Thank you for the logs.  Jakub, I'm curious about these lines:

(Thu Oct 10 15:12:43 2019) [sssd[kcm]] [kcm_op_get_kdc_offset_getbyname_done] (0x1000): KDC offset: 2147483647

and also

(Thu Oct 10 15:12:13 2019) [sssd[kcm]] [kcm_op_get_kdc_offset_getbyname_done] (0x1000): KDC offset: 1

For the first one, that looks like an endian issue - is the printing patch present in that version?  Any idea what's up with the second one?

On a VM with updated rawhide, sssd-kcm configured.  kinit at REDHAT.COM, then at FEDORAPROJECT.ORG (so FEDORAPROJECT.ORG as first entry in cccol).  FP.O ticket has 10 minute lifetime.  Checked functionality using `koji hello`: immediately after kinit, 5 minutes, and roughly every 30 seconds thereafter.  No failures noted until the ticket hit the actual expiration.

Comment 51 Robbie Harwood 2019-10-10 21:04:11 UTC
2147483647, if the endian were switched, would be -129.  Can folks sync with an NTP server and retry?

Comment 52 Gwyn Ciesla 2019-10-10 21:14:57 UTC
I did sudo chronyc waitsync. No change.

Comment 53 Adam Williamson 2019-10-10 23:19:44 UTC
I'm always synced to nyp, and for my domain I can see that both server and client are on the same time.

Ten minutes is not long enough to reproduce the bug. Give it 45 at least.

Comment 54 Adam Williamson 2019-10-11 01:56:30 UTC
Created attachment 1624569 [details]
my debug kcm log

OK, I reproduced again and here's my kcm debug log. The failed ssh attempt was at 18:52:39. I ran ntpdate pool.ntp.org after that and it adjusted the time by -0.003482 sec, then retried the ssh and it still fails.

Comment 55 Jakub Hrozek 2019-10-11 07:36:18 UTC
Hm, in Adam's case the offsets are INT32_MAX, which at least gives me a hunch.. Can either of you try this build? https://koji.fedoraproject.org/koji/taskinfo?taskID=38214003 It doesn't contain a proper fix, but hopefully it would propel us to the right direction..

Comment 56 Gwyn Ciesla 2019-10-11 14:34:18 UTC
Installed, monitoring.

Comment 57 Adam Williamson 2019-10-11 15:15:41 UTC
Ditto. Installed, set debug to 10, did kinit again, will see where we're at in an hour.

Comment 58 Adam Williamson 2019-10-11 16:26:09 UTC
well, the ticket I got at time of last comment is still working, and it's been an hour and ten minutes, and I was able to reproduce the problem after ~45 minutes before. So let's tentatively say that build is working. Do you need logs?

Comment 59 Gwyn Ciesla 2019-10-11 19:33:05 UTC
My observations match Adam's.

Comment 60 Sumit Bose 2019-10-12 11:09:13 UTC
Adam, Gwyn, thanks for testing.

Jakub, so since it looks the issue is in SSSD's KCM, can you change to component to 'sssd' and clone the ticket to other products where you think this issue might be present as well? Thanks.

bye,
Sumit

Comment 61 Lukas Slebodnik 2019-10-14 10:33:51 UTC
(In reply to Jakub Hrozek from comment #55)
> Hm, in Adam's case the offsets are INT32_MAX, which at least gives me a
> hunch.. Can either of you try this build?
> https://koji.fedoraproject.org/koji/taskinfo?taskID=38214003 It doesn't
> contain a proper fix, but hopefully it would propel us to the right
> direction..

That scratch build help a bit but I am not sure whether it is the right solution.

The most difficult part of debugging was missing debug information about kdc offset from krb5.
Neither klist nor KRB5_TRACE shows that and IMH) there is a place for improvement in MIT krb5
(but that should be covered in different PR

So the key difference is how krb5 ticket was created. Becasue kinit and krb5_child(which is used by sssd)
trigger different KCM operation.

kinit:
  sh# grep kcm_cmd_done sssd_kcm.log_kinit
  (Mon Oct 14 06:12:31 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success
  (Mon Oct 14 06:12:31 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [1432158223]: No matching credentials found
  (Mon Oct 14 06:12:31 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CACHE_UUID_LIST returned [0]: Success
  (Mon Oct 14 06:12:31 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success
  (Mon Oct 14 06:12:31 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [1432158223]: No matching credentials found
  (Mon Oct 14 06:12:36 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation INITIALIZE returned [0]: Success
  (Mon Oct 14 06:12:36 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation SET_KDC_OFFSET returned [0]: Success
  (Mon Oct 14 06:12:36 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation STORE returned [0]: Success
  (Mon Oct 14 06:12:36 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [0]: Success
  (Mon Oct 14 06:12:36 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation STORE returned [0]: Success
  (Mon Oct 14 06:12:36 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [0]: Success
  (Mon Oct 14 06:12:36 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation STORE returned [0]: Success

krb5_child(sssd-common-2.2.2-1.fc31)
  sh# grep kcm_cmd_done sssd_kcm.log_krb5_child
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_KDC_OFFSET returned [1432158223]: No matching credentials found
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CRED_UUID_LIST returned [1432158218]: No credentials available
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CACHE_UUID_LIST returned [0]: Success
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [1432158223]: No matching credentials found
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CACHE_UUID_LIST returned [0]: Success
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [1432158223]: No matching credentials found
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GEN_NEW returned [0]: Success
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation INITIALIZE returned [0]: Success
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation STORE returned [0]: Success
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation SET_DEFAULT_CACHE returned [0]: Success
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success
  (Mon Oct 14 06:14:16 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [0]: Success

As you can see there is missing operation `SET_KDC_OFFSET` and thus default value INT32_MAX was used.

And `SET_KDC_OFFSET` was not used with sssd-kcm-2.2.2-1.fc31.1 even thought that build "fix" the issue.

krb5_child:(sssd-common-2.2.2-1.fc31.1)
  sh# grep kcm_cmd_done sssd_kcm.log_krb5_child_good                         
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_KDC_OFFSET returned [1432158223]: No matching credentials found
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CRED_UUID_LIST returned [1432158218]: No credentials available
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CACHE_UUID_LIST returned [0]: Success
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [1432158223]: No matching credentials found
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CACHE_UUID_LIST returned [0]: Success
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [1432158223]: No matching credentials found
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GEN_NEW returned [0]: Success
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation INITIALIZE returned [0]: Success
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation STORE returned [0]: Success
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation SET_DEFAULT_CACHE returned [0]: Success
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_DEFAULT_CACHE returned [0]: Success
  (Mon Oct 14 06:23:46 2019) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_PRINCIPAL returned [0]: Success

So the question is whether sssd-kcm shoudl be fixed or krb5_child.

Comment 62 Robbie Harwood 2019-10-14 15:02:25 UTC
(In reply to Lukas Slebodnik from comment #61)
> The most difficult part of debugging was missing debug information about kdc offset from krb5. Neither klist nor KRB5_TRACE shows that and IMH) there is a place for improvement in MIT krb5 (but that should be covered in different PR

Thank you for your opinion.

Logging KCM protocol goo is the responsibility of KCM, not krb5.  There was no difficulty in finding the offset once we had KCM logs; the difficulty was in getting KCM logs.

Comment 63 Lukas Slebodnik 2019-10-14 21:58:37 UTC
(In reply to Robbie Harwood from comment #62)
> (In reply to Lukas Slebodnik from comment #61)
> > The most difficult part of debugging was missing debug information about kdc offset from krb5. Neither klist nor KRB5_TRACE shows that and IMH) there is a place for improvement in MIT krb5 (but that should be covered in different PR
> 
> Thank you for your opinion.
> 
> Logging KCM protocol goo is the responsibility of KCM, not krb5.  There was
> no difficulty in finding the offset once we had KCM logs; the difficulty was
> in getting KCM logs.

We need to agree to disagree here.

Comment 64 Lukas Slebodnik 2019-10-14 22:17:01 UTC
The krb5 code always call function set_kdc_offset in initialization of kcm but that function does something
only of krb5 context has os flag KRB5_OS_TOFFSET_VALID.

And it was properly set to global krb5_context(kr->ctx) after calling krb5_get_init_creds_password.
But another krb5 context is created when creating ccache[1]. (comment in the code explain reason)
Copying time from one context to another seems to help.

I assume it was not a problem with KEYRING ccache because it did not set default kdc offset to INT32_MAX
And the value 0 works well due to chrony/ntpdate.

[1] https://pagure.io/SSSD/sssd/blob/6baf291ba2bd5f873b8e5e84e98d9cd251a5b2a3/f/src/providers/krb5/krb5_child.c#_978

Comment 65 Lukas Slebodnik 2019-10-14 22:19:02 UTC
Created attachment 1625695 [details]
krb5_child patch

I did not run any regression tests

Comment 66 Jakub Hrozek 2019-10-15 06:58:26 UTC
(In reply to Lukas Slebodnik from comment #65)
> Created attachment 1625695 [details]
> krb5_child patch
> 
> I did not run any regression tests

Thank you, perhaps we could use that as well, but I still think that using the KCM cache should not require the user to know that they need to set the time themselves..

Comment 67 Lukas Slebodnik 2019-10-15 21:23:19 UTC
(In reply to Jakub Hrozek from comment #66)
> (In reply to Lukas Slebodnik from comment #65)
> > Created attachment 1625695 [details]
> > krb5_child patch
> > 
> > I did not run any regression tests
> 
> Thank you, perhaps we could use that as well, but I still think that using
> the KCM cache should not require the user to know that they need to set the
> time themselves..

Another POV is that it helped to find a very old bug in krb5_child.
Maybe it should be configurable but I can image you can consider such way to be overkill :-)
It is up to you to decide.

Anyway,
fedora 31 final freeze is over and final GA release is approaching.
So it would be good to patch at least fedora and it does not matter which way.

https://fedoraproject.org/wiki/Releases/31/Schedule

Comment 68 Jakub Hrozek 2019-10-16 20:11:59 UTC
btw Heimdal had a similar bug, at least that makes me feel a little better: https://github.com/heimdal/heimdal/commit/9f58896af958ae5e6e3ebde8c48dad4eda841986

Comment 69 Lukas Slebodnik 2019-10-17 11:08:12 UTC
(In reply to Jakub Hrozek from comment #68)
> btw Heimdal had a similar bug, at least that makes me feel a little better:
> https://github.com/heimdal/heimdal/commit/
> 9f58896af958ae5e6e3ebde8c48dad4eda841986

heimdal had uninitialized data which is something different :-)
sssd-kcm was in much better situation. Therefore workaround with kinit worked for Adam W.
But any authentication (screnn unlock ..) would override ticker by krb5_child. So workaround with kinit is not ideal.

Comment 70 Simo Sorce 2019-10-17 20:17:22 UTC
*** Bug 1762881 has been marked as a duplicate of this bug. ***

Comment 71 Bojan Smojver 2019-10-22 10:18:19 UTC
BTW, uninstalling sssd-kcm and using kernel keyring was a short term workaround here. Tickets issued on login into a Gnome session would be reported by stuff like Evolution, ssh, NFS as expired, although they looked OK when using klist. Weird...

Comment 72 Michal Zidek 2019-10-22 15:09:50 UTC
Fixed in master:
commit 2c9bdcf579e430fa8f7e5595a17cf7242adb5216

Comment 73 Fedora Update System 2019-10-22 19:31:43 UTC
FEDORA-2019-3bdedf56fb has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-3bdedf56fb

Comment 74 Fedora Update System 2019-10-22 19:46:16 UTC
FEDORA-2019-e9fc910d7f has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-e9fc910d7f

Comment 75 Fedora Update System 2019-10-23 15:44:52 UTC
sssd-2.2.2-3.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-e9fc910d7f

Comment 76 Fedora Update System 2019-10-25 19:33:46 UTC
sssd-2.2.2-3.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-1b16b0e9ae

Comment 77 Fedora Update System 2019-10-25 21:26:42 UTC
sssd-2.2.2-3.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-3bdedf56fb

Comment 78 Fedora Update System 2019-11-02 02:27:36 UTC
sssd-2.2.2-3.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.

Comment 79 Fedora Update System 2019-11-09 22:38:21 UTC
sssd-2.2.2-3.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 80 Martin Pitt 2020-01-31 07:24:19 UTC
Our Cockpit tests confirm that this is fixed (https://github.com/cockpit-project/bots/pull/479). Can this be closed now?

Comment 81 Lukas Slebodnik 2020-01-31 09:45:24 UTC
(In reply to Martin Pitt from comment #80)
> Our Cockpit tests confirm that this is fixed
> (https://github.com/cockpit-project/bots/pull/479). Can this be closed now?

It has never been pushed to f29 because bodhi update was obsoleted https://bodhi.fedoraproject.org/updates/FEDORA-2019-3bdedf56fb
However,  f29 is EOL and other versions have already fixed version.

Martin,
fill free to close as CURRENT_RELEASE :-)

Comment 82 Martin Pitt 2020-01-31 09:58:38 UTC
@Lukas: F29 hasn't even been on my radar, and this bz has "Version: 31". So time to close then indeed. Thanks!

Comment 83 Adam Williamson 2020-03-10 21:09:34 UTC
fwiw I seem to be seeing something like this again the last few days. I'll try and look into it more carefully, but the basic symptom (constantly having to run kinit) seems to be back. I kinit'ed to do a Fedora build this morning, suspended my system, went out for a few hours, came back, woke it up again, tried to do another build, and my ticket had expired...