Bug 1915976 - DNF/RPM Copy on Write enablement for all variants
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: Changes Tracking
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Matthew Almond
Depends On: 1919003 1922920
Blocks: F35Changes
Reported: 2021-01-13 20:58 UTC by Ben Cotton
Modified: 2021-02-03 18:43 UTC (History)

Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | rpm-software-management librepo pull 222 | 0 | None | open | Add support for rpm2extents transcoder | 2021-02-01 00:28:47 UTC
Github | rpm-software-management rpm pull 1470 | 0 | None | open | RPM with Copy on Write | 2021-02-01 00:28:03 UTC

Internal Links: 1922920

Description Ben Cotton 2021-01-13 20:58:19 UTC
This is a tracking bug for Change: DNF/RPM Copy on Write enablement for all variants
For more details, see: https://fedoraproject.org/wiki/Changes/RPMCoW

RPM Copy on Write provides a better experience for Fedora Users as it reduces the amount of I/O and offsets CPU cost of package decompression. RPM Copy on Write uses reflinking capabilities in btrfs, which is the default filesystem in Fedora 33 for most variants.
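
For readers unfamiliar with reflinking, here is a minimal sketch of what "sharing extents instead of copying" looks like from userspace on Linux. It is not how the rpm pull request implements it (that work is in C and operates on extents inside a transcoded package); `clone_or_copy` is a hypothetical helper name, and the fallback to a plain copy is an assumption made so the sketch also runs on filesystems without reflink support.

```python
import fcntl
import shutil

# Linux ioctl request number for FICLONE (from <linux/fs.h>).
FICLONE = 0x40049409


def clone_or_copy(src_path, dst_path):
    """Reflink src to dst if the filesystem allows it, else do a plain copy.

    Returns "reflink" or "copy" so the caller can see which path was taken.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # Ask the kernel to make dst share src's extents (no data copied).
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return "reflink"
        except OSError:
            # Not supported here (e.g. ext4, tmpfs, or a cross-device pair).
            shutil.copyfileobj(src, dst)
            return "copy"
```

On btrfs this should normally report "reflink"; elsewhere it quietly degrades to "copy", which is the case the I/O savings described above do not apply to.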

Comment 1 Matthew Almond 2021-01-14 23:39:24 UTC
Here's my plan:

1. Fix the simple stuff in librepo PR
2. Re-write my super trivial dnf python plugin as a libdnf plugin. This eliminates a new top level package from Fedora. It will be a sub-package.
3. Prototype signature/digest verification during transcode (rpm2extents) in rpm. I think this is challenging, but not impossible.
4. Address other issues in rpm PR
5. Produce PRs for the src.fedoraproject.org/rpms packages. These are simple spec changes, and patches derived from PRs. Depending on rate of updates, I might get some or all of the upstream PRs accepted/merged. I expect each will need some kind of version bump so we can update rawhide
6. Performance numbers. I have another patch for rpm that adds a "measure" plugin. It's super hacky and needs a dnf plugin to collect the values. I aim to get some public numbers
7. I want to make the measure tool usable for others, so we go beyond "trust me" to something independently verifiable.

I am aiming for end of January for all this, and I realize that this is insanely optimistic. For visibility, I'm giving a 45-minute talk at CentOS Dojo 2021 @ FOSDEM (https://hopin.com/events/centos-dojo-fosdem) on Feb 5th. I aim to cover what we've got so far and (more interestingly) what I think we can do next.

Comment 2 Matthew Almond 2021-01-20 02:15:14 UTC
1. done
2. I thought I could avoid creating a new package in Fedora by contributing a change to libdnf. It turns out this is not the right approach: the right approach for libdnf is to use separate sources and reference libdnf-devel. Add to this that CentOS doesn't ship with -devel packages, and the complexity is much higher than I anticipated. My plan now: I *will* create a new package in Fedora, nominally called 'dnf-plugin-cow', that builds python3-dnf-plugin-cow (naming convention), and later will build libdnf-plugin-cow.
3. I will switch focus to this. I have partial code and need to spend more time on it.

Comment 3 Konstantin 2021-01-20 11:59:53 UTC
Proposal authors say:

> Ballpark performance difference is about half the duration for file download+install time

It all sounds good. The problem is that reflinks are not necessarily faster than a plain copy, so seeing actual numbers would be helpful.

Incidentally, just yesterday I benchmarked ccache with and without reflinks¹, and I found out that reflinks/CoW was consistently 30 *slower* in building libinput than plain copy.

Of course `ccache` is not dnf, and since I'm no ccache dev, I can't know for sure that it isn't because ccache screwed up their reflinks usage badly. I doubt it though, in part because the benchmark also shows 30% less CPU usage, so apparently the excess time is spent doing IO, which is likely inside BTRFS. Anyway, that's the reasoning behind the question on actual numbers.

1: https://github.com/ccache/ccache/issues/213#issuecomment-762714286
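
The reasoning above (longer wall time but less CPU time points at I/O) can be checked with a small timing harness. This is only a sketch: `measure` and `io_wait_share` are hypothetical helper names, and `time.process_time()` only counts CPU used by the calling process, so work done in child processes (as a real ccache or dnf run would have) is not captured.

```python
import time


def measure(fn, *args):
    """Run fn(*args) once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def io_wait_share(wall, cpu):
    """Fraction of wall time not accounted for by this process's CPU time."""
    if wall <= 0:
        return 0.0
    return max(0.0, (wall - cpu) / wall)
```

Running the same workload once with reflinks and once with plain copies, and comparing the two `io_wait_share` values, would show whether the extra time is spent waiting rather than computing.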

Comment 4 Matthew Almond 2021-02-01 05:50:50 UTC
I wrote an excellent reply, and I just lost it all due to a form-repost action in Bugzilla. Gah. I'm going to summarize what I remember.

> 1. Fix the simple stuff in librepo PR
Done. It is now waiting on rpm PR 1470 before it can be accepted.

> 2. Re-write my super trivial dnf python plugin as a libdnf plugin. This eliminates a new top level package from Fedora. It will be a sub-package.

See https://github.com/facebookincubator/dnf-plugin-cow/issues/1

> 3. Prototype signature/digest verification during transcode (rpm2extents) in rpm. I think this is challenging, but not impossible.

There are two concerns:

## DoS of disk space on client
There's a potential DoS exploit where a rogue mirror could feed valid data into a compressor that could fill up the local storage during download. This sounds bad, but after looking at rpmfiArchiveReadToFilePsm() I see the header's idea of size is honored, so we only need to verify the header, not the header+payload.
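
To illustrate why honoring a declared size defuses the disk-filling case, here is a rough sketch using Python's zlib. It is not rpm's code and does not reflect what rpmfiArchiveReadToFilePsm() does internally; `bounded_decompress` is a hypothetical helper, and `declared_size` stands in for a size read from a header that has already been verified.

```python
import zlib


def bounded_decompress(compressed, declared_size, chunk=64 * 1024):
    """Decompress at most declared_size bytes, then stop.

    Output beyond declared_size is never produced, however much the
    compressed stream would expand to.
    """
    d = zlib.decompressobj()
    out = bytearray()
    data = compressed
    while len(out) < declared_size and not d.eof:
        # Never ask for more than what is still allowed.
        room = min(chunk, declared_size - len(out))
        piece = d.decompress(data, room)
        out += piece
        # Input not consumed because of the output cap is kept for next round.
        data = d.unconsumed_tail
        if not piece and not data:
            break  # truncated input: nothing more can be produced
    return bytes(out)
```

A stream that would expand to gigabytes still yields only `declared_size` bytes, so the header is the only thing that needs to be trusted to bound disk usage.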

## Potential Remote Code Execution (RCE) exploit on a vulnerable decompression library
I've been experimenting with conditional transcoding dependent on file size. Files under a limit (e.g. 64MB?) can be buffered in memory, revealing the full file digest, so it's possible to employ all the checks. The main restriction is that signatures with a trusted key should not be transcoded. If a given file is over the limit, it is not transcoded. The bit that sucks here is that larger rpms are the ones that benefit most from transcoding.
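
The size-gated check could look roughly like the sketch below. Everything here is an assumption for illustration: `buffer_and_verify` is a hypothetical name, SHA-256 stands in for whichever digest the package actually carries, and `stream` is assumed to be a buffered binary stream whose `read(n)` returns fewer than `n` bytes only at end of file.

```python
import hashlib

# Hypothetical threshold; 64MB is only floated as a possibility above.
BUFFER_LIMIT = 64 * 1024 * 1024


def buffer_and_verify(stream, expected_sha256, limit=BUFFER_LIMIT):
    """Decide whether a package may be transcoded.

    Returns (ok_to_transcode, data). data holds the buffered bytes when the
    file fit under the limit and its digest matched, otherwise None.
    """
    # Read one byte past the limit so "exactly at the limit" and
    # "over the limit" can be told apart.
    data = stream.read(limit + 1)
    if len(data) > limit:
        return False, None  # too large to buffer: leave it untranscoded
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        return False, None  # digest mismatch: never feed it to a decompressor
    return True, data
```

The point of the ordering is that the decompressor only ever sees bytes whose digest has already been checked, at the cost of skipping large packages entirely.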

> 4. Address other issues in rpm PR

https://github.com/rpm-software-management/rpm/pull/1470/commits

> 5. Produce PRs for the src.fedoraproject.org/rpms packages. These are simple spec changes, and patches derived from PRs. Depending on rate of updates, I might get some or all of the upstream PRs accepted/merged. I expect each will need some kind of version bump so we can update rawhide

I've got forks of rpm and librepo, which I need to keep in sync with the PRs. I've made a COPR repo for testing on f33: https://copr.fedorainfracloud.org/coprs/malmond/rpmcow/

> 6. Performance numbers. I have another patch for rpm that adds a "measure" plugin. It's super hacky and needs a dnf plugin to collect the values. I aim to get some public numbers
> 7. I want to make the measure tool usable for others, so we go beyond "trust me" to something independently verifiable.

The goal is to satisfy: https://fedoraproject.org/wiki/Changes/RPMCoW#Performance_Metrics . I've added a new section https://fedoraproject.org/wiki/Changes/RPMCoW#update_2021-01-31 to explain progress. I'm going to use bug 1922920 to track this work.

Comment 5 Matthew Almond 2021-02-03 17:06:37 UTC
I've been communicating with the maintainer of RPM on the pull request [1], and it's become clear that this likely depends on the creation of a public, supportable API for RPM. This is not achievable within the window for Fedora 34, so I'm withdrawing the change for Fedora 34 at this time. I will continue to work on this, and expect to re-submit for Fedora 35.

[1] https://github.com/rpm-software-management/rpm/pull/1470#issuecomment-772410935

Comment 6 Ben Cotton 2021-02-03 17:10:13 UTC
(In reply to Matthew Almond from comment #5)
> I'm withdrawing the change for Fedora 34 at this time. I will
> continue to work on this, and expect to re-submit for Fedora 35.

You don't need to withdraw it if you don't want to. We can defer it to F35. However, if you think the end result will be significantly different from your F34 proposal, then it's better to withdraw and resubmit at a later time. I'll let you decide which approach is more appropriate.

Comment 7 Matthew Almond 2021-02-03 18:36:41 UTC
Didn't know that was an option here[1]. I do prefer to defer - the substance of the change proposal doesn't change, just some of the implementation details.

[1] based on https://docs.fedoraproject.org/en-US/program_management/changes_policy/

Comment 8 Ben Cotton 2021-02-03 18:43:59 UTC
Okay, I'll update the appropriate trackers, etc. I'll also add a to-do item to improve that documentation so it explicitly addresses deferring changes.

