Bug 1727424
Summary: | libdnf 0.35.1 crashes with "Assertion `repoImpl->libsolvRepo == repo' failed" | |||
---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Adam Williamson <awilliam> | |
Component: | libdnf | Assignee: | Jaroslav Rohel <jrohel> | |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | |
Severity: | urgent | Docs Contact: | ||
Priority: | high | |||
Version: | rawhide | CC: | bojan, dmach, fzatlouk, jmracek, jrohel, ksrot, mblaha, pkratoch, robatino, rpm-software-management, yaneti | |
Target Milestone: | --- | Keywords: | Triaged | |
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | openqa AcceptedBlocker | |||
Fixed In Version: | libdnf-0.35.1-2.fc30 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1730224 (view as bug list) | Environment: | ||
Last Closed: | 2019-07-23 15:16:53 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1644937, 1730224 |
Description
Adam Williamson
2019-07-06 04:36:53 UTC
Thank you very much for the report. Could you provide a simple reproducer? I just tried "pkcon install acpi" and it worked as expected. I would prefer a reproducer without Cockpit, or at least one with a description for a person with no previous experience with Cockpit. We are taking the issue seriously.

Using the Cockpit reproducer would be tedious, as you need to set up a whole FreeIPA server (although if there's any other Cockpit operation which causes it to install packages, those might trigger it too). Running GNOME Software and trying to refresh available updates may do it. I do have the same crash here on my own Rawhide desktop, presumably from a background update refresh attempt.

I just fiddled about with it a bit here, and I was able to reproduce it by killing all running gnome-software processes, restarting packagekit, and running gnome-software. Just did it again and it crashed again, so that's 2 for 2. Try that?

The issue is very difficult to resolve without a reproducer. I created a patch, https://github.com/rpm-software-management/libdnf/pull/759 , that theoretically could help. Could you please:
1. Try the patch to see if it resolves the issue?
2. Try to reproduce the issue with libdnf-0.33 and libdnf-0.31?
Thanks a lot.

A scratch build with the patch: https://koji.fedoraproject.org/koji/taskinfo?taskID=36130933

Jaroslav: as noted on IRC, I put a reproducer that worked for me in my comment. I'm pretty sure the bug didn't happen with 0.31, as that was the version previously in Rawhide and we weren't hitting this crash till 0.35 landed. I don't know about 0.33, as that never made it to a Rawhide compose. I will check that, and the scratch build.

No, the scratch build doesn't help. My reproducer (killall gnome-software, systemctl restart packagekit, gnome-software) still crashes packagekitd, with the same assertion error. I'll test with 0.33.

0.33 does not seem to have the bug. I'll triage more tomorrow.
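The three reproducer steps quoted in this thread (killall gnome-software, systemctl restart packagekit, gnome-software) could be wrapped in a small script for repeated runs. This is only a sketch: the function name `run_reproducer` and the `dry_run` flag are illustrative and not part of any Fedora tool, and actually running the steps assumes a desktop session with sufficient privileges to restart packagekit.

```python
import subprocess

# The three steps reported above as reliably crashing packagekitd.
REPRODUCER_STEPS = [
    ["killall", "gnome-software"],
    ["systemctl", "restart", "packagekit"],
    ["gnome-software"],
]


def run_reproducer(dry_run=True):
    """Return the steps as strings; execute them only when dry_run is False."""
    executed = []
    for cmd in REPRODUCER_STEPS:
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=False: killall exits non-zero when no process is running,
            # which should not abort the remaining steps.
            subprocess.run(cmd, check=False)
    return executed


if __name__ == "__main__":
    # Dry run by default: only print what would be executed.
    print("\n".join(run_reproducer()))
```

Calling `run_reproducer(dry_run=False)` executes the steps; the last one blocks until GNOME Software exits, and any packagekitd crash would then need to be checked separately (for example in the journal).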
If I had to guess a suspect... maybe ce7d1f25681c42079c348328bdfae26eb23d3051?

OK, I was one commit out :). Bisected down to this commit: https://github.com/rpm-software-management/libdnf/commit/61a235c960b552640e73909c5bc52585c5a3f844 which, now I look at it, has this rather smoking-gun-looking line: https://github.com/rpm-software-management/libdnf/commit/61a235c960b552640e73909c5bc52585c5a3f844#diff-f5b2fb1705fa70e7aeb3eb12b877c6feR1419 ...so, yeah.

That line should be OK; the null pointer is an initial value that is overridden later on. The problem occurs when there's a repo with enabled=0, enabled_metadata=1; typically fedora-cisco-openh264 on Fedora. The reason for the crash is that the refcount of the repo object is decreased, the repo gets deallocated and is then used afterwards, which triggers the crash. We're still unable to identify the root cause: we have no idea why the refcount gets decreased for repos with enabled=0, enabled_metadata=1. I managed to fix that in PackageKit, simply by postponing the deallocation: https://koji.fedoraproject.org/koji/taskinfo?taskID=36183157 But we're still trying to discover the root cause in libdnf and understand what's going on.

Adam, after a couple of days of reviewing the code and the Repo implementation, we have come to the conclusion that you were absolutely right about the place where it breaks. Jaroslav Rohel is working on a fix. The code is unnecessarily complicated: it re-initializes the underlying libsolvRepo several times, and the handling of references is far from ideal. Unfortunately, the code probably cannot be simplified without breaking the current C API, so we'll do that in the next major libdnf version.

I added PR https://github.com/rpm-software-management/libdnf/pull/761 . But I found another problem during CI tests, so the PR is blocked until I fix it.
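To check whether a system has a repo matching the triggering condition described in this bug (enabled=0 together with enabled_metadata=1), a short script along these lines may help. It is a sketch only: it parses .repo text with Python's standard `configparser` rather than libdnf, the helper name `metadata_only_repos` and the sample repo text are made up for illustration, and it assumes that a missing `enabled` means enabled and a missing `enabled_metadata` means disabled.

```python
import configparser


def metadata_only_repos(repo_text):
    """Return IDs of repos that have enabled=0 and enabled_metadata=1."""
    # interpolation=None keeps '%' in URLs from being treated specially.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(repo_text)
    hits = []
    for repo_id in parser.sections():
        section = parser[repo_id]
        # Assumed defaults: enabled is on, enabled_metadata is off.
        enabled = section.getboolean("enabled", fallback=True)
        enabled_metadata = section.getboolean("enabled_metadata", fallback=False)
        if not enabled and enabled_metadata:
            hits.append(repo_id)
    return hits


# Hypothetical sample, shaped like the fedora-cisco-openh264 case named above.
SAMPLE = """\
[fedora]
name=Fedora
enabled=1

[fedora-cisco-openh264]
name=Fedora cisco openh264
enabled=0
enabled_metadata=1
"""

print(metadata_only_repos(SAMPLE))  # prints ['fedora-cisco-openh264']
```

On a real system the same function could be fed the contents of each file under /etc/yum.repos.d/ in turn.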
Discussed during the 2019-07-15 blocker review meeting [1]. The decision to classify this bug as an AcceptedBlocker was made: "AFAWCS, these two crashes have the same basic cause and break GNOME Software and Cockpit package installation. They do not seem to happen 100% of the time but on current information we think they're significant enough to violate 'The installed system must be able appropriately to install software with the default tool for the relevant software type in all release-blocking desktops'"

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2019-07-15/f31-blocker-review.2019-07-15-16.05.log.txt

Hello, I made scratch builds with the newest patch.
rawhide: https://koji.fedoraproject.org/koji/taskinfo?taskID=36317514
Fedora 30: https://koji.fedoraproject.org/koji/taskinfo?taskID=36317358
Can you please confirm the issue is fixed there? Thank you.

OK, https://openqa.stg.fedoraproject.org/tests/overview?distri=fedora&version=30&build=Kojitask-36317358-NOREPORT&groupid=2 is testing that build; if the realmd_join_cockpit and desktop_update_graphical tests pass, that would indicate the bug is fixed.

All tests passed, so from the looks of that, the fix does work. Thanks.

FEDORA-2019-672a74d688 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-672a74d688

This bug was filed against Rawhide, so we can just close it at this point. (We could probably close the other one too as the bad update never made it out of u-t, but meh).

dnf-4.2.7-2.fc30, libdnf-0.35.1-2.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-672a74d688

dnf-4.2.7-2.fc30, libdnf-0.35.1-2.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.