Bug 1862204 - Persistent "Error: Loading repository 'blah' has failed"
Summary: Persistent "Error: Loading repository 'blah' has failed"
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: libdnf
Version: CentOS Stream
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Lukáš Hrázký
QA Contact: swm-qe
URL:
Whiteboard:
Depends On:
Blocks: dnf-community
Reported: 2020-07-30 17:34 UTC by Sergei Iudin
Modified: 2020-12-02 12:19 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-30 16:35:34 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Sergei Iudin 2020-07-30 17:34:31 UTC
Description of problem: dnf works only the first time, right after updating the cache and building the solv db. Once the db is built, it fails with:
"Error: Loading repository 'kernel_contbuild' has failed"

This repo works with yum just fine.

Version-Release number of selected component (if applicable):
libsolv-0.7.7-1.el8
libdnf-0.39.1-5

I also tried libdnf and libsolv built from master; they have the same issue.

How reproducible:
Always. I'm providing a dummy repo that reproduces the issue:

( https://drive.google.com/file/d/1YHWJhhFFQPYfJ_bSEauF2Adzo040uKeL/view?usp=sharing )

Steps to Reproduce:
1. Add this repo definition to the dnf configuration:
[kernel_contbuild]
baseurl=file:/tmp/repo/
enabled=1
gpgcheck=0
name=kernel_contbuild


2. [ ~] sudo rm -rf /var/cache/dnf/*

3. [ ~] sudo dnf search kernel-core
kernel_contbuild                                                                                                                                                                                              27 MB/s |  28 kB     00:00
===================================================================================================== Name Exactly Matched: kernel-core =====================================================================================================
kernel-core-5.3.7-301.fc31.x86_64 : The Linux kernel

4. [ ~] sudo dnf search kernel-core
Error: Loading repository 'kernel_contbuild' has failed


Actual results:
Error: Loading repository 'kernel_contbuild' has failed



Expected results:
kernel-core-5.3.7-301.fc31.x86_64 : The Linux kernel


Additional info:
I've debugged this for a while, and I know that the actual exception happens somewhere here:
https://paste2.org/9ZzyLmjC
https://github.com/openSUSE/libsolv/blob/master/src/repo_solv.c#L549

Comment 1 Jan Blazek 2020-07-31 06:45:51 UTC
Hello, the URL you have in your repo file does not seem to follow the file URI scheme standard. See https://en.wikipedia.org/wiki/File_URI_scheme

Can you try to change the baseurl option to baseurl=file:///tmp/repo/ ?
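
As a side check on the two spellings, here is a minimal Python sketch that resolves a file: URL to a local path using only the standard library. The helper name file_url_path is hypothetical, and the sketch only shows what Python's urlsplit makes of each form; it says nothing about how dnf, libdnf or librepo handle the baseurl.

```python
from urllib.parse import urlsplit


def file_url_path(url):
    """Return the local path named by a file: URL, or raise ValueError.

    Hypothetical helper for illustration; not part of dnf or libdnf.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError(f"not a file URL: {url!r}")
    # An empty authority (or "localhost") means the path is on this machine.
    if parts.netloc not in ("", "localhost"):
        raise ValueError(f"file URL names a remote host: {parts.netloc!r}")
    return parts.path


if __name__ == "__main__":
    for candidate in ("file:/tmp/repo/", "file:///tmp/repo/"):
        print(candidate, "->", file_url_path(candidate))
```

If both forms resolve to the same path here, the spelling of the URL is unlikely to be the whole story, but the triple-slash form is the conventional one and is worth ruling out.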

Comment 2 Sergei Iudin 2020-07-31 07:18:59 UTC
[root@ /tmp]# cat /etc/dnf/dnf.conf

[main]
cachedir=/var/cache/dnf
debuglevel=2
distroverpkg=redhat-release
exactarch=1
exclude=freerdp-libs freerdp vinagre libvirt-glib libvirt-gobject libvirt-gconfig gnome-boxes virt-viewer libmad libmad-devel kernel-4.18* fb-runit runit kernel-headers-4.18* *i686*
gpgcheck=1
http_caching=packages
installonlypkgs=kernel kernel-bigmem kernel-enterprise kernel-smp kernel-debug kernel-unsupported kernel-source kernel-devel kernel-PAE kernel-PAE-debug
keepcache=0
logfile=/var/log/yum.log
metadata_expire=1h
obsoletes=1
pkgpolicy=newest
plugins=1
showdupesfromrepos=1
sslcacert=/var/facebook/rootcanal/ca.pem
sslverify=0
timeout=5
tolerant=1

[kernel_contbuild]
baseurl=file:///tmp/repo/
enabled=1
gpgcheck=0
name=kernel_contbuild
[root@ /tmp]# rm /var/cache/dnf/* -rf
[root@ /tmp]# dnf search kernel-core
kernel_contbuild                                                                                                                                                                                              27 MB/s |  28 kB     00:00
===================================================================================================== Name Exactly Matched: kernel-core =====================================================================================================
kernel-core-5.3.7-301.fc31.x86_64 : The Linux kernel
[root@ /tmp]# dnf search kernel-core
Error: Loading repository 'kernel_contbuild' has failed

Comment 4 Lukáš Hrázký 2020-11-09 14:38:05 UTC
Hello Sergei, sorry about the delay. Can you still reproduce this and are you sure you have a valid repository in /tmp/repo? (e.g. the /tmp contents get deleted after a reboot...)

dnf definitely should print a descriptive error message in this case. Can you run dnf with -v and post the output?

Would it be possible to share the faulty repository so that we can easily reproduce the issue?

Comment 5 Sergei Iudin 2020-11-13 02:18:43 UTC
Hello.

>Can you still reproduce this
Yes, I can still reproduce it.

>Would it be possible to share the faulty repository
Yes, it is possible to share the repo. The link is in the original post: ( https://drive.google.com/file/d/1YHWJhhFFQPYfJ_bSEauF2Adzo040uKeL/view?usp=sharing )

>are you sure you have a valid repository in /tmp/repo
It is a valid repo as far as yum is concerned, at least: yum works perfectly fine with this repo.

Comment 6 Lukáš Hrázký 2020-11-23 17:12:24 UTC
The repodata are invalid, you're missing (at least) the top-level xml tags in all of filelists.xml, other.xml, primary.xml. For example, your contents of other.xml:

<package nevra="kernel-core-0:5.3.7-301.fc31.x86_64" type="rpm">
  <version epoch="0" ver="5.3.7" rel="301.fc31"/>
</package>

My generated repodata (using createrepo_c):

<?xml version="1.0" encoding="UTF-8"?>
<otherdata xmlns="http://linux.duke.edu/metadata/other" packages="1">
<package pkgid="f0509e333636e5c34726c8a2b8260bf88fe0a35b95cae6dda62191fee1be4c6a" name="kernel-core" arch="x86_64">
  <version epoch="0" ver="5.3.7" rel="301.fc31"/>
  <changelog author="Justin M. Forbes &lt;jforbes&gt;" date="1570104000">- Fix CVE-2019-17052 CVE-2019-17053 CVE-2019-17054 CVE-2019-17055 CVE-2019-17056
  (rhbz 1758239 1758240 1758242 1758243 1758245 1758246 1758248 1758249 1758256 1758257)</changelog>
  <!-- ... changelog entries omitted for brevity -->
</package>
</otherdata>

I'm not sure why it passed on the first attempt, and it would certainly be better if we got a meaningful error message from libsolv. I've reported the issue upstream: https://github.com/openSUSE/libsolv/issues/413

Alas, that's just about proper error reporting, and improving that may take some time. The real issue is your repodata are wrong.
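
The missing top-level tags can be checked mechanically. Below is a minimal sketch, assuming Python 3 with only the standard library, that parses a repodata file and compares its root element with the one createrepo_c emits. The otherdata root and its namespace come from the sample above; the roots listed for primary.xml and filelists.xml are the ones createrepo_c normally writes and are stated here as an assumption, since this report does not show them. The names EXPECTED_ROOTS and check_root are hypothetical.

```python
import xml.etree.ElementTree as ET

# Root element (in ElementTree's "{namespace}tag" form) per metadata file.
# Only the "other" entry is taken from this report; the other two are assumed
# from typical createrepo_c output.
EXPECTED_ROOTS = {
    "primary": "{http://linux.duke.edu/metadata/common}metadata",
    "filelists": "{http://linux.duke.edu/metadata/filelists}filelists",
    "other": "{http://linux.duke.edu/metadata/other}otherdata",
}

# other.xml as supplied in the faulty repo.
REPORTED = b"""<package nevra="kernel-core-0:5.3.7-301.fc31.x86_64" type="rpm">
  <version epoch="0" ver="5.3.7" rel="301.fc31"/>
</package>
"""

# other.xml as generated by createrepo_c, trimmed to one package without changelogs.
GENERATED = b"""<?xml version="1.0" encoding="UTF-8"?>
<otherdata xmlns="http://linux.duke.edu/metadata/other" packages="1">
<package pkgid="f0509e333636e5c34726c8a2b8260bf88fe0a35b95cae6dda62191fee1be4c6a" name="kernel-core" arch="x86_64">
  <version epoch="0" ver="5.3.7" rel="301.fc31"/>
</package>
</otherdata>
"""


def check_root(kind, xml_bytes):
    """Return a list of problems with the document's top-level element."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    expected = EXPECTED_ROOTS[kind]
    if root.tag != expected:
        return [f"root element is {root.tag!r}, expected {expected!r}"]
    return []


if __name__ == "__main__":
    for label, data in (("reported", REPORTED), ("generated", GENERATED)):
        print(label, check_root("other", data) or "root element OK")
```

Note that a file holding a single <package> element is still well-formed XML, so a generic XML parser accepts it; the check has to look at which element sits at the top, not merely at whether the file parses.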

Comment 7 Sergei Iudin 2020-11-23 19:02:30 UTC
Does that mean "tags" are a mandatory field per the specification? May I have a link to the specs? Is the spec different for yum and dnf? And, well, the specs obviously are very unlikely to have changed between the first and subsequent invocations of dnf.

A satisfactory outcome of this bug would be clearly written specifications and an app that behaves according to those written specifications.

Comment 8 Lukáš Hrázký 2020-11-24 09:34:48 UTC
Not sure what exactly you mean by "tags" as a mandatory field, I was talking about xml tags like "<otherdata>" in the example above. And also the "<?xml ...?>" line, which is called XML prolog.

Not a proper specification, but here are schemas for the xml files: https://github.com/openSUSE/libzypp/tree/master/zypp/parser/yum/schema

The data are the same for yum and dnf.

The fact that it appeared to load fine on the first run and failed with an error on the second is, I think, just quirky libsolv behavior.

In an ideal world, we would have perfect specifications for everything, but that's not the world we live in and there's a lot of other work to do. It's also open-source, you're welcome to join the efforts.

Comment 9 Lukáš Hrázký 2020-11-30 16:35:34 UTC
Closing, since there's only error-handling to fix and there's the upstream issue for that. If you come to a conclusion there's something to fix on dnf side, feel free to reopen.

Comment 10 Sergei Iudin 2020-11-30 17:23:45 UTC
Hello.

I disagree that this is not a bug: the metadata works with yum just fine, and dnf is supposed to be backward compatible. I also disagree that the metadata is invalid, for the reason above plus the fact that it works on the first invocation.
But, well, thanks anyway. I've seen the code, and I understand that it's close to impossible to work out why it fails with this metadata, probably even for the author.

Comment 11 Lukáš Hrázký 2020-11-30 17:40:14 UTC
I don't really wanna go and examine the yum xml parser, it's a waste of time for the most part. If you understand xml, you'll know that without the outer tags the document is formally invalid. True, you may be able to read the data from it, but you're not conforming to the original schema, and that's the most important thing in any data exchange protocol. We're not dealing with "maybe I can read this"; reliable software needs to adhere to the proper format. The other two things are a proper specification and proper error reporting. We don't seem to have either in this case, so yes, that needs some work... like many other areas in dnf that carry technical debt. You have the schemas now, so hopefully you can validate your xmls. Sorry about the inconvenience.

I'll also note the official tool to create repodata is createrepo_c, you may have your reasons for not using that, but it's a consideration and you can compare your xmls to the output of that if you want to quickly see what's different.

Thanks for understanding.

Comment 12 Sergei Iudin 2020-12-01 20:57:53 UTC
FYI - it is not the xml itself and not the xml parser: adding `<?xml version="1.0" encoding="UTF-8"?>` doesn't change anything, I tried just now.

I understand your point about open source. I could certainly try to reverse engineer this masterpiece of programming with gdb to figure out exactly why it isn't working, but I don't think that is feasible if even the developers are unable to figure it out.

I would really like to have a valid argument for the position that this is valid xml, but I can't make that argument even in theory: there is no definition of valid xml for dnf, which is very sad.

Comment 13 Lukáš Hrázký 2020-12-02 12:19:53 UTC
Sergei, I haven't tried to figure out exactly what the issue is. I could track it down, but I don't have the time right now, and given your somewhat derogatory comments you probably understand that it would take a while, only to find out that you need to fix your data and that libsolv needs its error handling/reporting fixed, which may be non-trivial and invasive. I've only verified that the repo works when I generate the data using createrepo_c, hence the issue is something in your repodata. You could even try bisecting the differences between your xmls and the ones generated by createrepo_c to find out what exactly is causing the issue.

But it's not just about adding the xml prolog (the <?xml ...?> tag). You're also missing the outer <otherdata/> tag in the case of other.xml. That is likely not the cause either, although it certainly should already make libsolv fail with an error. So the problem must be some other difference in the repodata.

And there actually is a definition of a valid xml, I've linked the schemas for you in comment#8.

Further discussion, if you decide to do more investigation, might be better recorded in the upstream bug, especially if you'd e.g. find that the crash is still happening once your xmls are valid: https://github.com/openSUSE/libsolv/issues/413
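
One way to start the comparison suggested above is to diff the shape of the two documents rather than their text. The following is a minimal sketch, assuming Python 3 with only the standard library; it lists which (element name, attribute names) combinations appear in one file but not the other. The helper names local, shape and shape_diff are hypothetical, and the two samples are the other.xml snippets from comment 6 with the changelog text elided. It reports differences only and makes no claim about which of them libsolv trips over.

```python
import xml.etree.ElementTree as ET

# Reporter's other.xml, as quoted in comment 6.
MINE = b"""<package nevra="kernel-core-0:5.3.7-301.fc31.x86_64" type="rpm">
  <version epoch="0" ver="5.3.7" rel="301.fc31"/>
</package>
"""

# createrepo_c output from comment 6, trimmed; changelog body elided.
REFERENCE = b"""<?xml version="1.0" encoding="UTF-8"?>
<otherdata xmlns="http://linux.duke.edu/metadata/other" packages="1">
<package pkgid="f0509e333636e5c34726c8a2b8260bf88fe0a35b95cae6dda62191fee1be4c6a" name="kernel-core" arch="x86_64">
  <version epoch="0" ver="5.3.7" rel="301.fc31"/>
  <changelog author="Justin M. Forbes &lt;jforbes&gt;" date="1570104000">(elided)</changelog>
</package>
</otherdata>
"""


def local(tag):
    """Strip ElementTree's '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def shape(xml_bytes):
    """Return the set of (element name, sorted attribute names) in a document."""
    root = ET.fromstring(xml_bytes)
    return {(local(elem.tag), tuple(sorted(elem.attrib))) for elem in root.iter()}


def shape_diff(mine, reference):
    """Return (only in mine, only in reference), each as a sorted list."""
    a, b = shape(mine), shape(reference)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    only_mine, only_reference = shape_diff(MINE, REFERENCE)
    print("only in mine:     ", only_mine)
    print("only in reference:", only_reference)
```

Each reported difference can then be fixed in the hand-made repodata one at a time, rerunning dnf twice after each change, until the second run stops failing.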

