Bug 1104375 - Collisions possible in proxy lookaside cache
Summary: Collisions possible in proxy lookaside cache
Alias: None
Product: Spacewalk
Classification: Community
Component: Proxy Server
Version: 2.2
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Stephen Herr
QA Contact: Red Hat Satellite QA List
Depends On:
Blocks: 1105282 space22
Reported: 2014-06-03 20:57 UTC by Stephen Herr
Modified: 2014-07-17 08:41 UTC (History)

Fixed In Version: spacewalk-proxy-2.2.4-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1105282 (view as bug list)
Last Closed: 2014-07-17 08:41:34 UTC


Description Stephen Herr 2014-06-03 20:57:16 UTC
Description of problem:
If you use rhn_package_manager to upload rpms to a Proxy cache and place their metadata into a custom channel, the rpms themselves are not uploaded to Spacewalk but are only placed in a local cache on the Proxy. This local cache can have collisions if there are multiple rpms with the same name-epoch-version-release-arch (nevra) combination. That is unusual but possible, for example when rpms are signed with two different keys (beta and normal) or when rpms in the Spacewalk are built in two different places (RHEL and CentOS). In this case whichever rpm is in the lookaside cache will be returned for all requests, regardless of whether it is the correct one for the current channel.
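
The collision can be illustrated with a minimal Python sketch. The path layout, the helper names, and the dict standing in for the cache directory are all assumptions made for illustration; this is not the Proxy's actual code or on-disk layout.

```python
import hashlib

def nevra_path(name, epoch, version, release, arch):
    # Hypothetical layout: the path depends on nevra only, never on content.
    return f"{name}/{epoch}-{version}-{release}/{arch}/{name}-{version}-{release}.{arch}.rpm"

lookaside = {}  # stands in for the cache directory: path -> file bytes

def upload(nevra, payload):
    lookaside[nevra_path(*nevra)] = payload

def serve(nevra):
    return lookaside[nevra_path(*nevra)]

nevra = ("foo", "0", "1.0", "1", "x86_64")
beta_rpm = b"foo-1.0-1 signed with the beta key"
normal_rpm = b"foo-1.0-1 signed with the normal key"

upload(nevra, beta_rpm)
upload(nevra, normal_rpm)  # same path, so the first upload is replaced

# The two files differ by checksum, yet only one of them can ever be served.
different_content = (
    hashlib.sha256(beta_rpm).digest() != hashlib.sha256(normal_rpm).digest()
)
served = serve(nevra)
```

Whichever upload happens last is what every channel receives for that nevra.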

Spacewalk has solved this problem by incorporating the checksum of the rpm into the filesystem path it gets placed in.

Comment 1 Stephen Herr 2014-06-04 15:19:16 UTC
Spacewalk can solve this problem by incorporating the checksum in the path because the checksum is calculated when the rpm is first imported, and then the path is simply saved in the database. Subsequent requests simply look up the path in the database and find the appropriate file.
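
A rough sketch of that idea, assuming a SHA-256 checksum, a made-up path layout, and a dict in place of the real database (none of these details come from the Spacewalk source):

```python
import hashlib

def checksum_path(nevra, payload):
    # Hypothetical layout: the content checksum is one of the path components.
    name, epoch, version, release, arch = nevra
    digest = hashlib.sha256(payload).hexdigest()
    filename = f"{name}-{version}-{release}.{arch}.rpm"
    return f"{name}/{epoch}-{version}-{release}/{arch}/{digest}/{filename}"

package_paths = {}  # stands in for the server database: package id -> stored path

def import_rpm(package_id, nevra, payload):
    # The checksum is computed once, at import time, and the path is saved.
    package_paths[package_id] = checksum_path(nevra, payload)

def path_for(package_id):
    # Later requests never recompute anything; they read the saved path.
    return package_paths[package_id]

nevra = ("foo", "0", "1.0", "1", "x86_64")
import_rpm(1, nevra, b"build from RHEL")
import_rpm(2, nevra, b"build from CentOS")
```

Both packages share a nevra, but their stored paths differ because their checksums differ.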

This will not work for Proxy. Proxy could calculate the checksum at rpm import time, but later requests do not contain the checksum information and Proxy has no database to store the path in.

Proxy receives GET requests that contain only the channel-label and package-name as identifying information. Theoretically we could organize the Proxy lookaside cache so that the channel-label is part of the path; this would solve the immediate problem (a given channel can only contain one rpm with a given nevra), but it's a bad idea. It would mean storing a duplicate copy of the rpm for every original, cloned, and custom channel it is in, and the user would have to upload the rpm into the cache once for every channel instead of just once. In some use cases this would require the Proxy filesystem to be many times the size of the Spacewalk filesystem.
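
To make the storage cost concrete, here is a small sketch of a per-channel layout. The channel labels, the file size, and the layout itself are invented for illustration.

```python
def per_channel_paths(channels, filename):
    # Hypothetical layout: one cache entry per (channel, file) pair.
    return [f"{channel}/{filename}" for channel in channels]

channels = ["base", "clone-of-base", "custom-test", "custom-prod"]
paths = per_channel_paths(channels, "foo-1.0-1.x86_64.rpm")

rpm_size = 50 * 1024 * 1024           # illustrative size only
bytes_stored = len(paths) * rpm_size  # one full copy per channel
```

The same file is stored (and must be uploaded) once per channel that contains it.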

Theoretically we could implement a new function call up to the Spacewalk to get the rpm's checksum and use it to calculate the path, but this is also probably a bad idea. The Proxy lookaside cache is meant to be a very fast local lookup; it happens before we even hit squid to see what it has cached. It makes little sense to me to include a call up to Spacewalk (over a potentially high-latency network) for every single request, even for those files we eventually do find in the lookaside or squid caches.

However, that leaves no other feasible options that I can see. We simply don't have enough information in the request to determine which of the same-nevra-different-checksum rpms is desired. Proxy has no database, no permanent channel-to-package mappings, no permanent stateful information of any kind.

Where does that leave us? Simply documenting that you may not store rpms with the same nevra but different checksums on the proxy, and that once you store an rpm with a given nevra in the lookaside cache it will be returned for all requests for any channel, regardless of whether that's the rpm that's actually in that channel? Other options that I don't see?

Comment 2 Stephen Herr 2014-06-04 20:15:28 UTC
I found another solution.

The Proxy does get a channel -> package map from Spacewalk and uses it to generate a cached list of package -> path mappings. I'll have to make a new API method on Spacewalk that also returns checksum information for the packages, and then ensure that updated Proxies call that and use the information appropriately to generate paths.
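
A minimal sketch of how such a mapping might look, assuming the upstream call returns one row per package with a checksum field. The row format, function names, channel labels, and path layout are hypothetical and not taken from the actual API.

```python
def build_path_map(packages):
    # packages: hypothetical rows of (name, epoch, version, release, arch, checksum)
    # as an upstream call might return them for a single channel.
    path_map = {}
    for name, epoch, version, release, arch, checksum in packages:
        filename = f"{name}-{version}-{release}.{arch}.rpm"
        path_map[filename] = (
            f"{name}/{epoch}-{version}-{release}/{arch}/{checksum}/{filename}"
        )
    return path_map

# One cached map per channel label, so the channel-label and package-name
# from the GET request are enough to pick the right file.
channel_maps = {
    "beta-channel": build_path_map([("foo", "0", "1.0", "1", "x86_64", "aaa111")]),
    "prod-channel": build_path_map([("foo", "0", "1.0", "1", "x86_64", "bbb222")]),
}

def resolve(channel_label, package_name):
    return channel_maps[channel_label][package_name]
```

Because the map is built per channel, no extra upstream call is needed at request time.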

Comment 3 Stephen Herr 2014-06-06 21:19:15 UTC
Committing to Spacewalk master:

We should use a path structure that avoids collisions if:
1) The upstream server is new enough to have implemented bz 1105282
2) This is not a source rpm. Unfortunately for source rpms there's no reliable way to find it again later, so they'll just have to use the old collidable path structure.
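
The two conditions above could be sketched roughly as follows. This is an illustration under assumed names and an assumed path layout, not the committed code; here `checksum=None` stands for an upstream server that cannot supply checksums.

```python
def cache_path(name, epoch, version, release, arch, checksum=None, is_source=False):
    filename = f"{name}-{version}-{release}.{arch}.rpm"
    base = f"{name}/{epoch}-{version}-{release}/{arch}"
    # Fall back to the old layout when the upstream server gave no checksum
    # or when the package is a source rpm.
    if checksum is None or is_source:
        return f"{base}/{filename}"         # old, collidable layout
    return f"{base}/{checksum}/{filename}"  # collision-avoiding layout

new_style = cache_path("foo", "0", "1.0", "1", "x86_64", checksum="aaa111")
old_upstream = cache_path("foo", "0", "1.0", "1", "x86_64")
source_rpm = cache_path("foo", "0", "1.0", "1", "src", checksum="aaa111", is_source=True)
```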

Comment 4 Stephen Herr 2014-06-07 17:40:14 UTC

Comment 5 Stephen Herr 2014-06-08 02:09:51 UTC
Checkstyle fixes:

Comment 6 Milan Zázrivec 2014-07-17 08:41:34 UTC
Spacewalk 2.2 has been released:
