Bug 1225501 - query performance does not scale
Summary: query performance does not scale
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: libdnf
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: rpm-software-management
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1080837 1156501
Reported: 2015-05-27 14:04 UTC by Daniel Mach
Modified: 2018-05-29 14:20 UTC (History)

Fixed In Version: libdnf-0.14
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-29 14:20:35 UTC
Type: Bug
Embargoed:


Attachments
reproducer (1.76 KB, text/x-python)
2015-05-27 14:04 UTC, Daniel Mach
new reproducer working with dnf 2.x and f26 (1.76 KB, text/x-python)
2017-08-01 10:23 UTC, Daniel Mach


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1306304 0 unspecified CLOSED [perf] cache installed set of packages in query (for updates) 2022-05-16 11:32:56 UTC

Internal Links: 1306304

Description Daniel Mach 2015-05-27 14:04:20 UTC
Created attachment 1030579 [details]
reproducer

hawkey (or libsolv?) performs a sequential scan for every single query argument.
This makes queries slower than in yum, which probably benefits from using a database (sqlite3) backend with indexed data.


Results from my test, comparing data cached in memory (with package sets narrowed down for individual queries) against queries without caching:

1 iteration:
dict cache:  2.5s <-- cache building overhead
queries:     1.8s


5 iterations:
dict cache:  3.9s
queries:     9.5s


10 iterations:
dict cache:  5.4s
queries:    18.2s


20 iterations:
dict cache:  9.0s
queries:    36.2s


100 iterations:
dict cache:  35.3s
queries:    191.3s
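
The difference between the two approaches can be illustrated without dnf at all. The following is only a minimal sketch with made-up toy data (the PROVIDES and REQUIRES dictionaries and all function names are hypothetical, and this is not the hawkey or libsolv API): a sequential scan visits every package for each lookup, while a dictionary built once answers each lookup directly.

```python
from collections import defaultdict

# Hypothetical toy data standing in for repository metadata:
# package name -> names it provides / names it requires.
PROVIDES = {
    "bash": ["bash", "/bin/sh"],
    "glibc": ["glibc", "libc.so.6"],
    "coreutils": ["coreutils", "/usr/bin/env"],
}
REQUIRES = {
    "bash": ["libc.so.6"],
    "coreutils": ["libc.so.6", "/bin/sh"],
    "glibc": [],
}


def providers_by_scan(name):
    """Sequential scan: inspect every package for every lookup."""
    return {pkg for pkg, provs in PROVIDES.items() if name in provs}


def build_index():
    """Build the provides_name -> {packages} dictionary once."""
    index = defaultdict(set)
    for pkg, provs in PROVIDES.items():
        for prov in provs:
            index[prov].add(pkg)
    return index


def providers_by_index(index, name):
    """Dictionary lookup: one hash lookup regardless of package count."""
    return index.get(name, set())


index = build_index()
for pkg, reqs in REQUIRES.items():
    for req in reqs:
        # Both strategies must agree; only their cost differs.
        assert providers_by_scan(req) == providers_by_index(index, req)
```

Building the index costs one pass over all provides, which corresponds to the fixed overhead visible in the 1-iteration result; every later lookup then avoids the full scan.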

Comment 1 Honza Silhan 2015-07-24 08:52:49 UTC
Could you please provide more profiler data? It would help us get this fixed.

Comment 2 Honza Silhan 2015-10-21 13:47:50 UTC
*** Bug 1272109 has been marked as a duplicate of this bug. ***

Comment 3 Fedora Admin XMLRPC Client 2016-07-08 09:24:43 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 4 Fedora End Of Life 2016-07-19 19:09:16 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 5 Igor Gnatenko 2017-02-21 08:42:16 UTC
btw, I have to note that it's impossible to get proper performance with hawkey/libdnf/dnf for repoclosure, because they don't expose libsolv objects.

If you need speed, use libsolv directly.

Comment 6 Igor Gnatenko 2017-02-21 08:49:45 UTC
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    14                                           @profile
    15                                           def main():
    16         1         3801   3801.0      0.0      d = dnf.Base()
    17         1           14     14.0      0.0      d.conf.cachedir = "./dnf-cache"
    18         1          789    789.0      0.0      repo = dnf.repo.Repo("repo-0", d.conf)
    19         1           83     83.0      0.0      repo.baseurl = "http://dl.fedoraproject.org/pub/fedora/linux/releases/22/Server/x86_64/os/"
    20         1           12     12.0      0.0      d.repos.add(repo)
    21                                           
    22         1         6644   6644.0      0.0      d.fill_sack(load_system_repo=False, load_available_repos=True)
    23                                           
    24                                           
    25         1           25     25.0      0.0      print("DICT CACHE")
    26                                           
    27         1            5      5.0      0.0      t10 = datetime.now()
    28                                           
    29         1          899    899.0      0.0      RELDEP_RE = re.compile("^(?P<name>.*)( (?P<flag>[<>=]+) (?P<version>.*))?$")
    30                                           
    31         1            1      1.0      0.0      pkgs_by_dep = {}   # provides_name -> [pkgs]
    32         1            1      1.0      0.0      pkgs_by_file = {}  # /file/path -> [pkgs]
    33                                           
    34      2482         7217      2.9      0.0      for pkg in d.sack.query():
    35     54090        79387      1.5      0.4          for prov in pkg.provides:
    36     51609        82624      1.6      0.4              match = RELDEP_RE.match(str(prov))
    37     51609        60466      1.2      0.3              name = match.groupdict()["name"]
    38     51609        70579      1.4      0.3              pkgs_by_dep.setdefault(name, set()).add(pkg)
    39    172182       230086      1.3      1.0          for prov in pkg.files:
    40    169701       341161      2.0      1.5              pkgs_by_file.setdefault(str(prov), set()).add(pkg)
    41                                           
    42     20158        22652      1.1      0.1      for key in pkgs_by_dep:
    43     20157      1316334     65.3      6.0          pkgs_by_dep[key] = d.sack.query().filter(pkg=pkgs_by_dep[key]).apply()
    44                                           
    45                                           
    46        11           18      1.6      0.0      for i in range(ITERATIONS):
    47     24820       129733      5.2      0.6          for pkg in d.sack.query():
    48    226700       404662      1.8      1.8              for req in pkg.requires:
    49    201890       446666      2.2      2.0                  match = RELDEP_RE.match(str(req))
    50    201890       280349      1.4      1.3                  name = match.groupdict()["name"]
    51    201890       218405      1.1      1.0                  if name.startswith("/"):
    52      4490         7620      1.7      0.0                      pkgs_by_file.get(name, [])
    53                                                           else:
    54    197400       260652      1.3      1.2                      q = pkgs_by_dep.get(name, None)
    55    197400       327988      1.7      1.5                      if q:
    56    130580      3404829     26.1     15.4                          q.filter(provides=req)
    57                                                               else:
    58     66820        58576      0.9      0.3                          []
    59                                           
    60         1            7      7.0      0.0      t11 = datetime.now()
    61         1            2      2.0      0.0      delta = t11 - t10
    62         1           38     38.0      0.0      print("total: %ss" % delta.total_seconds())
    63                                           
    64         1           20     20.0      0.0      print()
    65         1            9      9.0      0.0      print("-----")
    66         1            7      7.0      0.0      print()
    67                                           
    68         1            8      8.0      0.0      print("QUERIES")
    69                                           
    70         1            4      4.0      0.0      t20 = datetime.now()
    71                                           
    72        11           18      1.6      0.0      for i in range(ITERATIONS):
    73     24820        79194      3.2      0.4          for pkg in d.sack.query():
    74    226700       386157      1.7      1.7              for req in pkg.requires:
    75    201890     13841045     68.6     62.7                  list(d.sack.query().filter(provides=req))
    76                                           
    77         1            7      7.0      0.0      t21 = datetime.now()
    78         1            2      2.0      0.0      delta = t21 - t20
    79         1           48     48.0      0.0      print("total: %ss" % delta.total_seconds())
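
The two hot lines in the profile above can be compared with a little arithmetic. This is only a sketch using the Hits and Time values copied from lines 56 and 75 of the listing; it assumes the Time column is in microseconds (the line_profiler default, which the output above does not state).

```python
# Values copied from the profile above (columns: Hits, Time).
# Assumption: Time is in microseconds, the line_profiler default unit.
uncached = {"hits": 201890, "time": 13841045}  # line 75: d.sack.query().filter(provides=req)
cached = {"hits": 130580, "time": 3404829}     # line 56: q.filter(provides=req)
requirements = 201890                          # line 49: requirements examined in the cached loop

# Average cost of one filter() call on each path.
per_hit_uncached = uncached["time"] / uncached["hits"]
per_hit_cached = cached["time"] / cached["hits"]

# Share of requirements for which the cached path still calls filter().
share_filtered = cached["hits"] / requirements

print("uncached filter: %.1f us per call" % per_hit_uncached)
print("cached filter:   %.1f us per call" % per_hit_cached)
print("cached path calls filter() for %.1f%% of requirements" % (share_filtered * 100))
```

The per-call figures match the Per Hit column (68.6 and 26.1). The cached path gains twice: each filter() call runs against a narrowed package set, and roughly a third of the requirements never reach filter() at all.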

Comment 7 Fedora End Of Life 2017-02-28 09:44:17 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

Comment 8 Daniel Mach 2017-08-01 10:23:32 UTC
Created attachment 1307485 [details]
new reproducer working with dnf 2.x and f26

Cache performance has degraded significantly (a regression in libdnf/hawkey? unicode literals?),
but the overall query performance remains where it was.

Comment 9 Daniel Mach 2018-05-29 14:20:35 UTC
Query performance was fixed upstream; the fix will be released as part of libdnf-0.14.

