Bug 1105282 - [Server] Collisions possible in proxy lookaside cache
Summary: [Server] Collisions possible in proxy lookaside cache
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Spacewalk
Classification: Community
Component: Server
Version: 2.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Stephen Herr
QA Contact: Red Hat Satellite QA List
URL:
Whiteboard:
Depends On: 1104375
Blocks: space22
Reported: 2014-06-05 18:44 UTC by Stephen Herr
Modified: 2014-07-17 08:41 UTC (History)
0 users

Fixed In Version: spacewalk-backend-2.2.34-1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1104375
Environment:
Last Closed: 2014-07-17 08:41:11 UTC
Embargoed:


Description Stephen Herr 2014-06-05 18:44:13 UTC
+++ This bug was initially created as a clone of Bug #1104375 +++

Description of problem:
If you are using rhn_package_manager to upload rpms to a Proxy cache and place their metadata into a custom channel, the rpms themselves are not uploaded to Spacewalk but are only placed in a local cache on the Proxy. This local cache can have collisions if there are multiple rpms with the same name-epoch-version-release-arch combination. That is unusual but possible, for example when rpms are signed with two different keys (beta and normal) or when rpms in the Spacewalk are built in two different places (RHEL and CentOS). In this case whichever rpm is in the lookaside cache will be returned for all requests, regardless of whether it is the correct one for the current channel.
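
To illustrate the collision, here is a minimal sketch, assuming a cache layout keyed only on nevra. The cache root, the directory layout, and the helper name `nevra_path` are hypothetical and are not taken from the Proxy code.

```python
import hashlib
import posixpath

# Hypothetical cache root; the real Proxy location is not stated in this bug.
CACHE_ROOT = "/srv/example-lookaside"


def nevra_path(name, epoch, version, release, arch):
    """Build a cache path from nevra only (illustrative layout, not the real one)."""
    filename = "%s-%s-%s.%s.rpm" % (name, version, release, arch)
    evr = "%s:%s-%s" % (epoch, version, release)
    return posixpath.join(CACHE_ROOT, name, evr, arch, filename)


# Two different builds of the same nevra, e.g. signed with different keys.
beta_payload = b"payload signed with the beta key"
normal_payload = b"payload signed with the normal key"

path_a = nevra_path("foo", "0", "1.0", "1.el6", "x86_64")
path_b = nevra_path("foo", "0", "1.0", "1.el6", "x86_64")

# Same path for both files, although their contents differ.
collides = path_a == path_b
contents_differ = (
    hashlib.sha256(beta_payload).hexdigest()
    != hashlib.sha256(normal_payload).hexdigest()
)
print(collides, contents_differ)
```

Whichever file is written to that path last is the one every channel receives.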

Spacewalk has solved this problem by incorporating the checksum of the rpm into the filesystem path it gets placed in.

--- Additional comment from Stephen Herr on 2014-06-04 11:19:16 EDT ---

Spacewalk can solve this problem by incorporating the checksum in the path because the checksum is calculated when the rpm is first imported, and then the path is simply saved in the database. Subsequent requests simply look up the path from the database and find the appropriate file.
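
A rough sketch of that idea, assuming a sha256 checksum and a dict standing in for the database. The root directory, path layout, and function names are made up for illustration and are not the actual Spacewalk implementation.

```python
import hashlib
import posixpath

# Hypothetical storage root, chosen only for this example.
STORE_ROOT = "/srv/example-packages"

# Stand-in for the database table that maps a package id to its stored path.
path_db = {}


def import_rpm(package_id, filename, payload):
    """Compute the checksum once at import time and remember the resulting path."""
    checksum = hashlib.sha256(payload).hexdigest()
    path = posixpath.join(STORE_ROOT, checksum[:3], checksum, filename)
    path_db[package_id] = path
    return path


def lookup_path(package_id):
    """Later requests never recompute the checksum; they read the saved path."""
    return path_db[package_id]


# Same nevra (same filename), different contents: the paths no longer collide.
first = import_rpm(1, "foo-1.0-1.el6.x86_64.rpm", b"built on RHEL")
second = import_rpm(2, "foo-1.0-1.el6.x86_64.rpm", b"built on CentOS")
print(first != second)
```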

This will not work for Proxy. Proxy could calculate the checksum at rpm import time, but later requests do not contain the checksum information and Proxy has no database to store the path in.

Proxy receives GET requests that contain only channel-label and package-name as identifying information. Theoretically we could organize the Proxy lookaside cache so that the channel-label is part of the path; this would solve the immediate problem (a given channel can only contain one rpm with a given nevra), but it's a bad idea. It would mean that we store a duplicate copy of the rpm for every original, cloned, and custom channel it is in, and that the user would have to upload the rpm into the cache once for every channel instead of just once. In some use cases the Proxy filesystem would have to be many times the size of the Spacewalk filesystem.

Theoretically we could implement a new function call up to the Spacewalk to get the rpm's checksum and use it to calculate the path, but this is also probably a bad idea. The Proxy lookaside cache is meant to be a very fast local lookup; it happens before we even hit squid to see what it has cached. It makes little sense to me to include a call up to Spacewalk (over a potentially high-latency network) for every single request, even for those files we eventually do find in the lookaside or squid caches.

However, that leaves no other feasible options that I can see. We simply don't have enough information in the request to determine which of the same-nevra-different-checksum rpms is desired. Proxy has no database, no permanent channel-to-package mappings, no permanent stateful information of any kind.

Where does that leave us? Simply documenting that you may not store rpms with the same nevra but different checksums on the proxy, and that once you store an rpm with a given nevra in the lookaside cache it will be returned for all requests for any channel, regardless of whether that's the rpm that's actually in that channel? Are there other options that I don't see?

--- Additional comment from Stephen Herr on 2014-06-04 16:15:28 EDT ---

I found another solution.

The Proxy does get a channel -> package map from Spacewalk and uses it to generate a cached list of package -> path mappings. I'll have to make a new API method on Spacewalk that also returns checksum information for the packages, and then ensure that updated Proxies call that and use the information appropriately to generate paths.
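
As a sketch of how that could look on the Proxy side, assume the new call returns, per channel, a list of (name, epoch, version, release, arch, checksum) tuples. The tuple shape, the cache root, the channel labels, and the function names below are assumptions for illustration; the real API is whatever the commits referenced in this bug define.

```python
import posixpath

# Hypothetical cache root, chosen only for this example.
CACHE_ROOT = "/srv/example-lookaside"


def build_path_map(channel_packages):
    """Turn one channel's package listing into a filename -> cache path map."""
    mapping = {}
    for name, epoch, version, release, arch, checksum in channel_packages:
        filename = "%s-%s-%s.%s.rpm" % (name, version, release, arch)
        # The checksum in the path keeps same-nevra files apart.
        mapping[filename] = posixpath.join(CACHE_ROOT, checksum, filename)
    return mapping


# Made-up listings: the same nevra with a different checksum in each channel.
channel_maps = {
    "custom-beta": build_path_map(
        [("foo", "0", "1.0", "1.el6", "x86_64", "aaa111")]
    ),
    "custom-release": build_path_map(
        [("foo", "0", "1.0", "1.el6", "x86_64", "bbb222")]
    ),
}


def resolve(channel_label, filename):
    """Resolve a GET request (channel label + package filename) to a cache path."""
    return channel_maps.get(channel_label, {}).get(filename)


beta = resolve("custom-beta", "foo-1.0-1.el6.x86_64.rpm")
release = resolve("custom-release", "foo-1.0-1.el6.x86_64.rpm")
print(beta != release)
```

Because the lookup is keyed on the channel's cached listing, each rpm is still stored once per checksum instead of once per channel, and no call up to Spacewalk is needed per request.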

Comment 1 Stephen Herr 2014-06-05 18:48:50 UTC
This is a clone for the Spacewalk-side changes that need to be in place to support this fix; the original bug will track the Proxy-side changes.

We need a new API for Proxy to call that will provide the list of packages in the channel with associated checksum information.

Committing to Spacewalk master:
522f10ad718d0bd555c0cf756f7086291f4b1277

Comment 2 Stephen Herr 2014-06-06 20:25:10 UTC
And:
2740d32dda23568c2bea19ece0eb84180f55549e

Comment 3 Milan Zázrivec 2014-07-17 08:41:11 UTC
Spacewalk 2.2 has been released:

    https://fedorahosted.org/spacewalk/wiki/ReleaseNotes22

