Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1831086 - softhsm use-after-free on process exit
Summary: softhsm use-after-free on process exit
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: softhsm
Version: 33
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Paul Wouters
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
: 1838487 (view as bug list)
Depends On:
Blocks: 1834909
TreeView+ depends on / blocked
 
Reported: 2020-05-04 15:44 UTC by David Woodhouse
Modified: 2020-12-09 01:54 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1834909 (view as bug list)
Environment:
Last Closed: 2020-12-09 01:54:56 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github opendnssec SoftHSMv2 issues 548 0 None closed Use after free on ENGINE_finish(rdrand_engine). 2020-12-09 00:36:57 UTC

Description David Woodhouse 2020-05-04 15:44:18 UTC
My OpenConnect builds are failing on rawhide. It turns out that when softhsm is used, it prevents the process from exiting cleanly.

Reproduce by just rebuilding the openconnect RPM. Or checkout out from git://git.infradead.org/users/dwmw2/openconnect.git and running 'make check'...


make  check-TESTS
make[2]: Entering directory '/home/dwmw2/git/openconnect/gtls/tests'
make[3]: Entering directory '/home/dwmw2/git/openconnect/gtls/tests'
PASS: autocompletion
PASS: auth-username-pass
PASS: auth-certificate
PASS: auth-nonascii
PASS: cert-fingerprint
PASS: id-test


The test it's stuck on, and hasn't printed yet, is auth-pkcs11. If I connect to the stuck openconnect process with gdb, I see this backtrace:

919	      if ((atomic_load_acquire (&rwlock->__data.__readers)
(gdb) bt
#0  0x00007f3be4ed14b7 in __pthread_rwlock_wrlock_full (abstime=0x0, rwlock=0x134f7b0) at pthread_rwlock_common.c:919
#1  __GI___pthread_rwlock_wrlock (rwlock=0x134f7b0) at pthread_rwlock_wrlock.c:27
#2  0x00007f3be4bfbe7d in CRYPTO_THREAD_write_lock (lock=<optimized out>) at crypto/threads_pthread.c:78
#3  0x00007f3be4b61745 in ENGINE_finish (e=0x134f7f0) at crypto/engine/eng_init.c:101
#4  0x00007f3bd4b55e8b in OSSLCryptoFactory::~OSSLCryptoFactory() () from /usr/lib64/pkcs11/libsofthsm2.so
#5  0x00007f3bd4b55f1d in OSSLCryptoFactory::~OSSLCryptoFactory() () from /usr/lib64/pkcs11/libsofthsm2.so
#6  0x00007f3be4d38680 in __run_exit_handlers (status=status@entry=0, listp=0x7f3be4ebe738 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#7  0x00007f3be4d387c0 in __GI_exit (status=status@entry=0) at exit.c:139
#8  0x0000000000403f4c in main (argc=<optimized out>, argv=<optimized out>) at ../main.c:1906

Comment 1 David Woodhouse 2020-05-04 16:22:36 UTC
==228434== Invalid read of size 4
==228434==    at 0x54D9316: __pthread_rwlock_wrlock_full (pthread_rwlock_common.c:581)
==228434==    by 0x54D9316: pthread_rwlock_wrlock (pthread_rwlock_wrlock.c:27)
==228434==    by 0x58ADE7C: CRYPTO_THREAD_write_lock (threads_pthread.c:78)
==228434==    by 0x5813744: ENGINE_finish (eng_init.c:101)
==228434==    by 0x134794D6: OSSLCryptoFactory::~OSSLCryptoFactory() (OSSLCryptoFactory.cpp:240)
==228434==    by 0x13479558: OSSLCryptoFactory::~OSSLCryptoFactory() (OSSLCryptoFactory.cpp:259)
==228434==    by 0x552767F: __run_exit_handlers (exit.c:108)
==228434==    by 0x55277BF: exit (exit.c:139)
==228434==    by 0x404043: main (main.c:1553)
==228434==  Address 0x14bb74c8 is 24 bytes inside a block of size 56 free'd
==228434==    at 0x4839A0C: free (vg_replace_malloc.c:540)
==228434==    by 0x583FBE1: OPENSSL_cleanup (init.c:601)
==228434==    by 0x583FBE1: OPENSSL_cleanup (init.c:497)
==228434==    by 0x552767F: __run_exit_handlers (exit.c:108)
==228434==    by 0x55277BF: exit (exit.c:139)
==228434==    by 0x404043: main (main.c:1553)
==228434==  Block was alloc'd at
==228434==    at 0x483880B: malloc (vg_replace_malloc.c:309)
==228434==    by 0x584859D: CRYPTO_zalloc (mem.c:230)
==228434==    by 0x58ADE0B: CRYPTO_THREAD_lock_new (threads_pthread.c:29)
==228434==    by 0x58137B9: do_engine_lock_init (eng_lib.c:25)
==228434==    by 0x58137B9: do_engine_lock_init_ossl_ (eng_lib.c:21)
==228434==    by 0x54DBD7E: __pthread_once_slow (pthread_once.c:116)
==228434==    by 0x58ADEFC: CRYPTO_THREAD_run_once (threads_pthread.c:118)
==228434==    by 0x5813878: ENGINE_new (eng_lib.c:33)
==228434==    by 0x58152FD: ENGINE_rdrand (eng_rdrand.c:70)
==228434==    by 0x58152FD: engine_load_rdrand_int (eng_rdrand.c:85)
==228434==    by 0x583F8BC: ossl_init_engine_rdrand (init.c:353)
==228434==    by 0x583F8BC: ossl_init_engine_rdrand_ossl_ (init.c:347)
==228434==    by 0x54DBD7E: __pthread_once_slow (pthread_once.c:116)
==228434==    by 0x58ADEFC: CRYPTO_THREAD_run_once (threads_pthread.c:118)
==228434==    by 0x584004C: OPENSSL_init_crypto (init.c:723)
==228434==    by 0x584004C: OPENSSL_init_crypto (init.c:620)

Comment 2 David Woodhouse 2020-05-07 14:22:32 UTC
This is because SoftHSM's atexit hander (well, the destructor for the OSSLCryptoFactory object) is trying to clean up the rdrand engine after OpenSSL's own atexit handler has run and cleaned up *everything*.

Proof of concept fix at https://github.com/dwmw2/SoftHSMv2/commit/f9c24e04f21e241ee93dd862f5ebb56dd54461c5

I've worked around it for now in my OpenConnect builds by exporting OPENSSL_ia32cap=~0x4000000000000000 when running the tests, to disable RDRAND.

Comment 4 Fedora Update System 2020-05-12 15:50:01 UTC
FEDORA-2020-3735365352 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-3735365352

Comment 5 Fedora Update System 2020-05-13 03:55:57 UTC
FEDORA-2020-3735365352 has been pushed to the Fedora 32 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-3735365352`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-3735365352

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 6 Fedora Update System 2020-05-13 05:51:47 UTC
FEDORA-2020-7dfd335e05 has been pushed to the Fedora 31 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-7dfd335e05`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-7dfd335e05

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 7 Lukas Slebodnik 2020-05-13 08:51:13 UTC
It is even worse with the latest version.

sh-5.0$ rm -rf /tmp/softhsm2_none*
sh-5.0$ mkdir /tmp/softhsm2_none
sh-5.0$ cat > /tmp/softhsm2_none.conf <<EOF
directories.tokendir = /tmp/softhsm2_none
objectstore.backend = file
slots.removable = true
EOF

sh-5.0$ SOFTHSM2_CONF=/tmp/softhsm2_none.conf /usr/bin/softhsm2-util --init-token  --label "Test Token" --pin 123456 --so-pin 123456 --free
Slot 0 has a free/uninitialized token.
The token has been initialized and is reassigned to slot 1837621079
Segmentation fault (core dumped)
sh-5.0$ echo $?
139

sh-5.0$ rpm -q softhsm
softhsm-2.6.1-2.fc33.x86_64

Comment 8 Lukas Slebodnik 2020-05-13 08:54:22 UTC
sh-5.0$ SOFTHSM2_CONF=/tmp/softhsm2_none.conf valgrind /usr/bin/softhsm2-util --init-token  --label "Test Token" --pin 123456 --so-pin 123456 --free
==196== Memcheck, a memory error detector
==196== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==196== Using Valgrind-3.16.0.GIT and LibVEX; rerun with -h for copyright info
==196== Command: /usr/bin/softhsm2-util --init-token --label Test\ Token --pin 123456 --so-pin 123456 --free
==196== 
Slot 0 has a free/uninitialized token.
The token has been initialized and is reassigned to slot 806032075
==196== Jump to the invalid address stated on the next line
==196==    at 0x5518E90: ???
==196==    by 0x49D9399: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.1.1.1g)
==196==    by 0x4D91316: __run_exit_handlers (in /usr/lib64/libc-2.31.9000.so)
==196==    by 0x4D914BF: exit (in /usr/lib64/libc-2.31.9000.so)
==196==    by 0x4D79158: (below main) (in /usr/lib64/libc-2.31.9000.so)
==196==  Address 0x5518e90 is not stack'd, malloc'd or (recently) free'd
==196== 
==196== 
==196== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==196==  Access not within mapped region at address 0x5518E90
==196==    at 0x5518E90: ???
==196==    by 0x49D9399: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.1.1.1g)
==196==    by 0x4D91316: __run_exit_handlers (in /usr/lib64/libc-2.31.9000.so)
==196==    by 0x4D914BF: exit (in /usr/lib64/libc-2.31.9000.so)
==196==    by 0x4D79158: (below main) (in /usr/lib64/libc-2.31.9000.so)
==196==  If you believe this happened as a result of a stack
==196==  overflow in your program's main thread (unlikely but
==196==  possible), you can try to increase the size of the
==196==  main thread stack using the --main-stacksize= flag.
==196==  The main thread stack size used in this run was 8388608.
==196== 
==196== HEAP SUMMARY:
==196==     in use at exit: 91,224 bytes in 2,995 blocks
==196==   total heap usage: 14,625 allocs, 11,630 frees, 1,680,726 bytes allocated
==196== 
==196== LEAK SUMMARY:
==196==    definitely lost: 0 bytes in 0 blocks
==196==    indirectly lost: 0 bytes in 0 blocks
==196==      possibly lost: 0 bytes in 0 blocks
==196==    still reachable: 91,224 bytes in 2,995 blocks
==196==         suppressed: 0 bytes in 0 blocks
==196== Rerun with --leak-check=full to see details of leaked memory
==196== 
==196== For lists of detected and suppressed errors, rerun with: -s
==196== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

Comment 9 Lukas Slebodnik 2020-05-13 08:58:43 UTC
I cannot see crash with older version.

sh-5.0$ rm -rf /tmp/softhsm2_none*
sh-5.0$ mkdir /tmp/softhsm2_none
sh-5.0$ cat > /tmp/softhsm2_none.conf <<EOF  
> directories.tokendir = /tmp/softhsm2_none 
> objectstore.backend = file 
> slots.removable = true 
> EOF
sh-5.0$ rpm -q softhsm
softhsm-2.6.1-1.fc33.x86_64
sh-5.0$ 
sh-5.0$ SOFTHSM2_CONF=/tmp/softhsm2_none.conf valgrind /usr/bin/softhsm2-util --init-token  --label "Test Token" --pin 123456 --so-pin 123456 --free
==34== Memcheck, a memory error detector
==34== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==34== Using Valgrind-3.16.0.GIT and LibVEX; rerun with -h for copyright info
==34== Command: /usr/bin/softhsm2-util --init-token --label Test\ Token --pin 123456 --so-pin 123456 --free
==34== 
Slot 0 has a free/uninitialized token.
The token has been initialized and is reassigned to slot 2117858883
==34== 
==34== HEAP SUMMARY:
==34==     in use at exit: 0 bytes in 0 blocks
==34==   total heap usage: 14,224 allocs, 14,224 frees, 1,647,246 bytes allocated
==34== 
==34== All heap blocks were freed -- no leaks are possible
==34== 
==34== For lists of detected and suppressed errors, rerun with: -s
==34== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Comment 10 David Woodhouse 2020-05-13 11:25:07 UTC
Thanks for the report. I think I see the problem.

https://github.com/opendnssec/SoftHSMv2/issues/548#issuecomment-627919265

Comment 11 David Woodhouse 2020-05-13 12:24:02 UTC
https://github.com/opendnssec/SoftHSMv2/pull/551

Please could you test the build at https://koji.fedoraproject.org/koji/taskinfo?taskID=44446665

Comment 12 Lukas Slebodnik 2020-05-13 17:29:31 UTC
(In reply to David Woodhouse from comment #11)
> https://github.com/opendnssec/SoftHSMv2/pull/551
> 
> Please could you test the build at
> https://koji.fedoraproject.org/koji/taskinfo?taskID=44446665

Thank you.
it works for me

Comment 13 Adam Williamson 2020-05-13 17:33:12 UTC
FreeIPA server deployment broke in today's Rawhide, it fails due to a crash in softhsm2-util . I don't have a backtrace ATM as it seems like Rawhide's default config prevents systemd capturing the core dump when the crash happens, but from timing, looks quite likely to be this bug. Proposing as an F33 Beta blocker, violates Basic criterion "It must be possible to configure a Fedora Server system installed according to the above criteria as a FreeIPA domain controller, using the official deployment tools provided in the distribution FreeIPA packages" - https://fedoraproject.org/wiki/Basic_Release_Criteria#FreeIPA_server_requirements

Comment 14 David Woodhouse 2020-05-13 18:53:47 UTC
I've taken the liberty of submitting a 2.6.1-3 build for rawhide. I'll let Paul handle F31/F32 updates, perhaps waiting until the latest fix is actually merged into SoftHSM upstream (which hopefully won't take long as I broke them too).

Sorry :)

Comment 15 David Woodhouse 2020-05-15 09:49:32 UTC
I've made buildroot overrides for softhsm-2.6.1-3 in F31 and F32 so that I can build the new OpenConnect 8.10 release for CVE-2020-12823.

Comment 16 Alexander Bokovoy 2020-05-15 10:08:54 UTC
David, after you've done with the rebuild, feel free to add OpenConnect builds to the Bodhi requests of softhsm so they get pushed together:

F32: https://bodhi.fedoraproject.org/updates/FEDORA-2020-77604f07ae
F31: https://bodhi.fedoraproject.org/updates/FEDORA-2020-901be5b51e

Comment 17 David Woodhouse 2020-05-15 11:36:05 UTC
Thanks. In fact I think I'll keep them separate though; the bugs are mostly orthogonal except that the self-tests for OpenConnect were tripping over the SoftHSM one.

Comment 18 Tomas Mraz 2020-05-21 09:39:57 UTC
*** Bug 1838487 has been marked as a duplicate of this bug. ***

Comment 19 Daniel Berrangé 2020-05-21 09:57:49 UTC
IMHO it would be desirable if openssl did more robust cleanup to avoid the danger of use after free. 

At the very least  setting "global_engine_lock = NULL" in engine_cleanup_int should have caused this to crash immediately rather proceed to access free'd memory and burn 100% CPU.

Comment 20 David Woodhouse 2020-05-21 10:10:24 UTC
Agreed.

Comment 21 Adam Williamson 2020-05-21 16:13:10 UTC
FreeIPA tests passed in Rawhide in the last few composes we got, so I'd say from that perspective this is fixed. Do we need to keep the bug open till F31 and F32 updates go stable?

Comment 22 Adam Williamson 2020-05-23 00:15:26 UTC
Dropping F33 blocker tags, at least.

Comment 23 Ben Cotton 2020-08-11 13:22:37 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle.
Changing version to 33.

Comment 24 Paul Wouters 2020-12-09 01:54:56 UTC
this was fixed in f32/f33 a while ago


Note You need to log in before you can comment on or make changes to this bug.