Bug 1276753 - malloc: arena free list can become cyclic, increasing contention
Summary: malloc: arena free list can become cyclic, increasing contention
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: glibc
Version: 7.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 7.3
Assignee: Florian Weimer
QA Contact: Arjun Shankar
Docs Contact: Marc Muehlfeld
URL:
Whiteboard:
Depends On:
Blocks: 1297579 1364088
Reported: 2015-10-30 18:10 UTC by Paulo Andrade
Modified: 2019-11-14 07:06 UTC (History)

Fixed In Version: glibc-2.17-156.el7
Doc Type: Bug Fix
Doc Text:
Core C library (glibc) enhanced to increase *malloc()* scalability
A defect in the implementation of the *malloc()* function could result in unnecessary serialization of memory allocation requests across threads. This update fixes the bug and substantially increases the concurrent throughput of allocation requests for applications that frequently create and destroy threads.
Clone Of: 1264189
Environment:
Last Closed: 2016-11-03 08:27:05 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
IBM Linux Technology Center | 133482 | 0 | None | None | None | 2019-01-08 08:06:55 UTC
Red Hat Bugzilla | 1264189 | 1 | None | None | None | 2021-01-20 06:05:38 UTC
Red Hat Bugzilla | 1356648 | 0 | unspecified | CLOSED | glibc: Further malloc arena free list management fix | 2022-05-16 11:32:56 UTC
Red Hat Product Errata | RHSA-2016:2573 | 0 | normal | SHIPPED_LIVE | Low: glibc security, bug fix, and enhancement update | 2016-11-03 12:05:56 UTC
Sourceware | 19048 | 0 | P2 | RESOLVED | malloc: arena free list can become cyclic, increasing contention | 2021-02-05 05:48:43 UTC
Sourceware | 19182 | 0 | P2 | RESOLVED | malloc deadlock between ptmalloc_lock_all and _int_new_arena/reused_arena | 2021-02-05 05:48:43 UTC
Sourceware | 19243 | 0 | P1 | RESOLVED | reused_arena can pick an arena on the free list, leading to an assertion failure and reference count corruption | 2021-02-05 05:48:43 UTC
Sourceware | 20370 | 0 | P2 | RESOLVED | malloc: Arena free list management is still racy (incorrect fix in bug 19243) | 2021-02-05 05:48:43 UTC

Internal Links: 1356648

Comment 14 Carlos O'Donell 2016-01-11 15:08:40 UTC
I'm opening this bug up more publicly and including IBM.

We plan to deliver a fix for this bug in rhel-7.3.

This bug was created to track the backport of the patches to fix upstream sourceware bug 19048.

The fixes to be backported are as follows:
---
commit 1bd5483e104c8bde6e61dc5e3f8a848bc861872d
Author: Florian Weimer <fweimer>
Date:   Tue Dec 29 20:32:35 2015 +0100

    malloc: Test various special cases related to allocation failures
    
    This test case exercises unusual code paths in allocation functions,
    related to allocation failures.  Specifically, the test can reveal
    the following bugs:
    
    (a) calloc returns non-zero memory on fallback to sysmalloc.
    (b) calloc can self-deadlock because it fails to release
        the arena lock on certain allocation failures.
    (c) pvalloc can dereference a NULL arena pointer.
    
    (a) and (b) appear specific to a faulty downstream backport.
    (c) was fixed as part of commit 10ad46bc6526edc5c7afcc57112da96917ff3629.
    
    The test for (a) was inspired by a reproducer supplied by Jeff Layton.
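
As a rough illustration of the kind of property item (a) is about, the sketch below checks that a `calloc` result is fully zeroed even when the allocator may hand back recently dirtied memory, and that an overflowing request fails instead of returning a short block. This is not the upstream test case: the helper names `all_zero`, `calloc_returns_zeroed`, and `calloc_overflow_fails` are hypothetical, and the sketch does not try to force the sysmalloc fallback path.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if every byte of the buffer is zero. */
static int all_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Allocate, dirty the block, free it, and allocate again.  The second
   block may reuse the dirtied memory, so calloc has to clear it.
   n must be nonzero.  Returns 1 if the second block is all zero. */
static int calloc_returns_zeroed(size_t n)
{
    unsigned char *p = calloc(n, 1);
    if (p == NULL)
        return 0;
    memset(p, 0xff, n);
    free(p);

    p = calloc(n, 1);
    if (p == NULL)
        return 0;
    int ok = all_zero(p, n);
    free(p);
    return ok;
}

/* A request whose total size overflows size_t must fail with NULL. */
static int calloc_overflow_fails(void)
{
    volatile size_t huge = SIZE_MAX;  /* volatile: avoid compile-time folding */
    void *p = calloc(huge, 2);
    if (p != NULL) {
        free(p);
        return 0;
    }
    return 1;
}
```

Both helpers return 1 on a correct allocator, so they can be called from any small driver program.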
---
commit 7962541a32eff5597bc4207e781cfac8d1bb0d87
Author: Florian Weimer <fweimer>
Date:   Wed Dec 23 17:23:33 2015 +0100

    malloc: Update comment for list_lock
---
commit 90c400bd4904b0240a148f0b357a5cbc36179239
Author: Florian Weimer <fweimer>
Date:   Mon Dec 21 16:42:46 2015 +0100

    malloc: Fix list_lock/arena lock deadlock [BZ #19182]
    
        * malloc/arena.c (list_lock): Document lock ordering requirements.
        (free_list_lock): New lock.
        (ptmalloc_lock_all): Comment on free_list_lock.
        (ptmalloc_unlock_all2): Reinitialize free_list_lock.
        (detach_arena): Update comment.  free_list_lock is now needed.
        (_int_new_arena): Use free_list_lock around detach_arena call.
        Acquire arena lock after list_lock.  Add comment, including FIXME
        about incorrect synchronization.
        (get_free_list): Switch to free_list_lock.
        (reused_arena): Acquire free_list_lock around detach_arena call
        and attached threads counter update.  Add two FIXMEs about
        incorrect synchronization.
        (arena_thread_freeres): Switch to free_list_lock.
        * malloc/malloc.c (struct malloc_state): Update comments to
        mention free_list_lock.
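
One way to picture the lock ordering in this ChangeLog is the small pthread-based model below. It is a sketch under assumptions, not the glibc code: it uses plain `pthread_mutex_t` instead of glibc's internal lock macros, and `struct arena`, `publish_arena`, `park_arena`, and `take_free_arena` are hypothetical names. The rule it illustrates, as an interpretation of the entries above, is that an arena mutex is only acquired after `list_lock` has been dealt with, and that no other lock is acquired while `free_list_lock` is held.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, trimmed-down arena; the real struct malloc_state has
   many more fields. */
struct arena {
    pthread_mutex_t mutex;      /* per-arena lock */
    struct arena *next;         /* global arena list, guarded by list_lock */
    struct arena *next_free;    /* free list, guarded by free_list_lock */
    size_t attached_threads;    /* guarded by free_list_lock */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t free_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct arena *arena_list;
static struct arena *free_list;

/* Link a new arena into the global list, then return with its mutex
   held.  list_lock is released before the arena mutex is taken. */
static void publish_arena(struct arena *a)
{
    pthread_mutex_init(&a->mutex, NULL);
    a->next_free = NULL;
    a->attached_threads = 1;

    pthread_mutex_lock(&list_lock);
    a->next = arena_list;
    arena_list = a;
    pthread_mutex_unlock(&list_lock);

    pthread_mutex_lock(&a->mutex);
}

/* Drop one reference; the last detaching thread parks the arena on the
   free list.  Only free_list_lock is held here. */
static void park_arena(struct arena *a)
{
    pthread_mutex_lock(&free_list_lock);
    if (--a->attached_threads == 0) {
        a->next_free = free_list;
        free_list = a;
    }
    pthread_mutex_unlock(&free_list_lock);
}

/* Pop an arena from the free list.  free_list_lock is released before
   the arena mutex is acquired. */
static struct arena *take_free_arena(void)
{
    pthread_mutex_lock(&free_list_lock);
    struct arena *a = free_list;
    if (a != NULL) {
        free_list = a->next_free;
        a->next_free = NULL;
        ++a->attached_threads;
    }
    pthread_mutex_unlock(&free_list_lock);

    if (a != NULL)
        pthread_mutex_lock(&a->mutex);
    return a;
}
```

Keeping `free_list_lock` as a short-lived leaf lock means the free list and the attached-thread counter can be updated without taking part in the `list_lock`/arena-lock ordering at all.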
---
commit 3da825ce483903e3a881a016113b3e59fd4041de
Author: Florian Weimer <fweimer>
Date:   Wed Dec 16 12:39:48 2015 +0100

    malloc: Fix attached thread reference count handling [BZ #19243]
    
    reused_arena can increase the attached thread count of arenas on the
    free list.  This means that the assertion that the reference count is
    zero is incorrect.  In this case, the reference count initialization
    is incorrect as well and could cause arenas to be put on the free
    list too early (while they still have attached threads).
    
        * malloc/arena.c (get_free_list): Remove assert and adjust
        reference count handling.  Add comment about reused_arena
        interaction.
        (reused_arena): Add comments about get_free_list interaction.
        * malloc/tst-malloc-thread-exit.c: New file.
        * malloc/Makefile (tests): Add tst-malloc-thread-exit.
        (tst-malloc-thread-exit): Link against libpthread.
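
The reference count problem described in this commit message can be modelled in a few lines. The sketch below is a single-threaded simplification with hypothetical names (`struct arena`, `attach_by_reuse`, `pop_overwrite`, `pop_increment`); it assumes, as the message states, that a reuse path can attach a thread to an arena that is still on the free list. Overwriting the counter when the arena is popped then loses that attachment, while incrementing it does not.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two relevant arena fields. */
struct arena {
    struct arena *next_free;    /* link in the free list */
    size_t attached_threads;    /* threads currently using the arena */
};

static struct arena *free_list;

static void push_free(struct arena *a)
{
    a->next_free = free_list;
    free_list = a;
}

/* Models a reuse path that attaches a thread without removing the
   arena from the free list. */
static void attach_by_reuse(struct arena *a)
{
    ++a->attached_threads;
}

/* Model of the faulty handling: assumes the arena was unattached and
   overwrites the counter. */
static struct arena *pop_overwrite(void)
{
    struct arena *a = free_list;
    if (a != NULL) {
        free_list = a->next_free;
        a->next_free = NULL;
        a->attached_threads = 1;    /* loses any earlier attachment */
    }
    return a;
}

/* Model of the adjusted handling: counts the new thread in addition to
   whoever is already attached. */
static struct arena *pop_increment(void)
{
    struct arena *a = free_list;
    if (a != NULL) {
        free_list = a->next_free;
        a->next_free = NULL;
        ++a->attached_threads;
    }
    return a;
}
```

With `pop_overwrite`, two threads end up sharing an arena whose counter says 1, so the first thread to exit would put the arena back on the free list while the other thread still uses it.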
---
commit 400e12265d99964f8445bb6d717321eb73152cc5
Author: Florian Weimer <fweimer>
Date:   Tue Nov 24 16:37:15 2015 +0100

    Replace MUTEX_INITIALIZER with _LIBC_LOCK_INITIALIZER in generic code
    
        * sysdeps/mach/hurd/libc-lock.h (_LIBC_LOCK_INITIALIZER): Define.
        (__libc_lock_define_initialized): Use it.
        * sysdeps/nptl/libc-lockP.h (_LIBC_LOCK_INITIALIZER): Define.
        * malloc/arena.c (list_lock): Use _LIBC_LOCK_INITIALIZER.
        * malloc/malloc.c (main_arena): Likewise.
        * sysdeps/generic/malloc-machine.h (MUTEX_INITIALIZER): Remove.
        * sysdeps/nptl/malloc-machine.h (MUTEX_INITIALIZER): Remove.
---
commit a62719ba90e2fa1728890ae7dc8df9e32a622e7b
Author: Florian Weimer <fweimer>
Date:   Wed Oct 28 19:32:46 2015 +0100

    malloc: Prevent arena free_list from turning cyclic [BZ #19048]
    
        [BZ# 19048]
        * malloc/malloc.c (struct malloc_state): Update comment.  Add
        attached_threads member.
        (main_arena): Initialize attached_threads.
        * malloc/arena.c (list_lock): Update comment.
        (ptmalloc_lock_all, ptmalloc_unlock_all): Likewise.
        (ptmalloc_unlock_all2): Reinitialize arena reference counts.
        (deattach_arena): New function.
        (_int_new_arena): Initialize arena reference count and deattach
        replaced arena.
        (get_free_list, reused_arena): Update reference count and deattach
        replaced arena.
        (arena_thread_freeres): Update arena reference count and only put
        unreferenced arenas on the free list.
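
To show why a reference count keeps the free list acyclic, here is a minimal single-threaded model. It is a sketch, not the glibc implementation: `struct arena`, `release_unchecked`, `release_counted`, and `free_list_is_cyclic` are hypothetical names, and the "before" behaviour is an assumption based on the bug title, namely that an arena shared by two threads gets pushed onto the free list once per exiting thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two relevant arena fields. */
struct arena {
    struct arena *next_free;    /* link in the singly linked free list */
    size_t attached_threads;    /* number of threads using this arena */
};

static struct arena *free_list;

/* Assumed pre-fix behaviour: every exiting thread pushes its arena. */
static void release_unchecked(struct arena *a)
{
    a->next_free = free_list;
    free_list = a;
}

static void attach(struct arena *a)
{
    ++a->attached_threads;
}

/* Reference-counted release: only the last detaching thread pushes. */
static void release_counted(struct arena *a)
{
    assert(a->attached_threads > 0);
    if (--a->attached_threads == 0) {
        a->next_free = free_list;
        free_list = a;
    }
}

/* Floyd cycle detection over the free list; returns 1 if cyclic. */
static int free_list_is_cyclic(void)
{
    const struct arena *slow = free_list;
    const struct arena *fast = free_list;
    while (fast != NULL && fast->next_free != NULL) {
        slow = slow->next_free;
        fast = fast->next_free->next_free;
        if (slow == fast)
            return 1;
    }
    return 0;
}
```

Pushing the same arena twice with `release_unchecked` makes its `next_free` point to itself, which is the cyclic state; with `release_counted` the arena reaches the list exactly once, after the second of two attached threads has released it.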

---
commit 6782806d8f6664d87d17bb30f8ce4e0c7c931e17
Author: Florian Weimer <fweimer>
Date:   Sat Oct 17 12:06:48 2015 +0200

    malloc: Rewrite with explicit TLS access using __thread
---

Comment 15 Carlos O'Donell 2016-01-11 15:11:04 UTC
*** Bug 1297423 has been marked as a duplicate of this bug. ***

Comment 24 Florian Weimer 2016-04-27 17:31:32 UTC
Note: I will make this bug public soon so that others can comment if they feel so inclined.

Comment 25 Florian Weimer 2016-04-27 17:33:59 UTC
*** Bug 1330623 has been marked as a duplicate of this bug. ***

Comment 26 Sumeet Keswani 2016-04-27 17:39:01 UTC
In our case (Vertica database server), once the arena free list goes circular,
it affects the application moving forward independent of concurrency (as the application becomes sick).

This causes significant performance degradation in high concurrency situations.

We tested the efficacy of the patch posted on sourceware internally by (re)building glibc and the patch was stable and improved performance under concurrent load.
A couple of customers have tested it not just for stability (it is) but also for performance (it helps).

Comment 27 David Linden 2016-04-27 17:50:08 UTC
What is the target release date for 7.3, that is, when will glibc-2.17-131.el7 be available?

What are the prospects for publishing glibc-2.17-131.el7 as an update prior to 7.3?

What are the prospects of backporting this to the glibc-2.12 stream for RHEL6?

Comment 28 Florian Weimer 2016-04-27 17:53:02 UTC
(In reply to Sumeet Keswani from comment #26)
> In our case (Vertica database server), once the arena free list goes circular,
> it affects the application moving forward independent of concurrency (as the
> application becomes sick).
> 
> This causes significant performance degradation in high concurrency
> situations.
> 
> We tested the efficacy of the patch posted on sourceware internally by
> (re)building glibc and the patch was stable and improved performance under
> concurrent load.
> A couple of customers have tested it not just for stability (it is) but also
> for performance (it helps).

You should see a similar performance improvement on Red Hat Enterprise Linux 6.8 Beta, where we fixed this issue as bug 1264189 (currently private).

Comment 30 Sumeet Keswani 2016-06-01 17:44:47 UTC
Is this fix included in glibc-2.12-1.192?

It does not show up in this advisory (RHBA-2016:0834-1):
https://rhn.redhat.com/errata/RHBA-2016-0834.html

How can users on RHEL 6.X get this fix?

Comment 31 Sumeet Keswani 2016-06-01 17:47:39 UTC
Can I get access to BZ 1264189?

Comment 32 Joseph Kachuck 2016-06-01 17:55:55 UTC
Hello,
I have requested HPE access to BZ 1264189.
Please note this BZ was closed with errata:
https://rhn.redhat.com/errata/RHBA-2016-0834.html

Thank You
Joe Kachuck

Comment 33 Sumeet Keswani 2016-06-01 18:04:04 UTC
It's not listed in the errata (RHBA-2016:0834-1), hence I was not certain how to point users to that for a fix.

Comment 34 Florian Weimer 2016-06-01 18:23:44 UTC
(In reply to Sumeet Keswani from comment #33)
> It's not listed in the errata (RHBA-2016:0834-1), hence I was not certain how
> to point users to that for a fix.

This bug was fixed with RHBA-2016:0834-1 for Red Hat Enterprise Linux 6.8, under bug 1264189.  That bug largely consists of private comments and is not very illuminating to external parties as a result.

Comment 35 Arjun Shankar 2016-07-14 12:02:42 UTC
The upstream bug (https://sourceware.org/bugzilla/show_bug.cgi?id=19048) had a test.c and a check-free_list.sh script attached to it. Running the script against a running instance of the test exposes the bug on ppc64 and s390x (not sure why not on other architectures) even on the patched glibc. So this needs to be looked at again. Florian's doing that right now.

Comment 41 Georg Markgraf 2016-09-09 08:44:36 UTC
Arjun, is this verified on both architectures, ppc64 and s390x?

Comment 42 Martin Cermak 2016-09-09 15:20:35 UTC
(In reply to Georg Markgraf from comment #41)
> Arjun, is this verified on both architectures, ppc64 and s390x?

Georg, yes, verification was done on all architectures supported by RHEL 7, including ppc64 and s390x.

Comment 43 IBM Bug Proxy 2016-09-12 17:31:12 UTC
------- Comment From MSTRUBEL.com 2016-09-12 13:27 EDT-------
I ran the arena tests (from glibc bugzilla) and did straces with and without TRIM option enabled using old & new glibc version. It seems the patches are effective.

Thank you very much for taking care of this issue,

I think that this BZ can be closed now.

best regards
Matthias Strubel

Comment 44 Florian Weimer 2016-09-12 17:35:06 UTC
(In reply to IBM Bug Proxy from comment #43)
> ------- Comment From MSTRUBEL.com 2016-09-12 13:27 EDT-------
> I ran the arena tests (from glibc bugzilla) and did straces with and without
> TRIM option enabled using old & new glibc version. It seems the patches are
> effective.
> 
> Thank you very much for taking care of this issue,

Thank you for the additional testing.

> I think that this BZ can be closed now.

This bug will be closed automatically once we ship the update as part of Red Hat Enterprise Linux 7.3.

Comment 46 errata-xmlrpc 2016-11-03 08:27:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2573.html

