Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.

Bug 1797052

Summary: CVE-2020-9391 kernel: brk discards top byte of addresses on aarch64, causing heap corruption in glibc malloc
Product: [Fedora] Fedora Reporter: Miro Hrončok <mhroncok>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 31CC: airlied, aoliva, arjun.is, bskeggs, codonell, cstratak, dj, fweimer, gsuckevi, hdegoede, ichavero, itamar, jarodwilson, jcline, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, law, linville, masami256, mchehab, mfabian, mhroncok, mjg59, pbrobinson, pfrankli, pviktori, python-sig, rth, siddhesh, steved, torsava, vstinner
Target Milestone: ---Keywords: Security
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
URL: https://koschei.fedoraproject.org/package/python27
Whiteboard:
Fixed In Version: kernel-5.5.6-201.fc31 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-02-29 03:21:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 245418, 1775943, 1799090, 1799091, 1814237    

Description Miro Hrončok 2020-01-31 19:52:03 UTC
Description of problem:
Package python27 fails to build from source in Fedora rawhide on aarch64:

...
test_args_error (test.test_io.CBufferedWriterTest) ... ok
test_close_error_on_close (test.test_io.CBufferedWriterTest) ... ok
test_constructor (test.test_io.CBufferedWriterTest) ... make: *** [Makefile:894: test] Segmentation fault (core dumped)
error: Bad exit status from /var/tmp/rpm-tmp.e12Exf (%check)

This is reproducible.

Version-Release: 2.7.17-1.fc32

Steps to Reproduce:
$ fedpkg clone python27
$ cd python27
$ fedpkg build

Additional info:
This package is tracked by Koschei. See:
https://koschei.fedoraproject.org/package/python27

The first failing build is https://koschei.fedoraproject.org/build/7736139

It includes gcc 10 and an glibc update.

Comment 1 Miro Hrončok 2020-02-03 10:42:16 UTC
python36:


Fatal Python error: Segmentation fault
Current thread 0x0000ffff73fff1e0 (most recent call first):
  File "/builddir/build/BUILD/Python-3.6.10/Lib/http/client.py", line 622 in _safe_read
  File "/builddir/build/BUILD/Python-3.6.10/Lib/http/client.py", line 472 in read
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/test_wsgiref.py", line 293 in run_client
  File "/builddir/build/BUILD/Python-3.6.10/Lib/threading.py", line 864 in run
  File "/builddir/build/BUILD/Python-3.6.10/Lib/threading.py", line 916 in _bootstrap_inner
  File "/builddir/build/BUILD/Python-3.6.10/Lib/threading.py", line 884 in _bootstrap
Thread 0x0000ffffaa47dcc0 (most recent call first):
  File "/builddir/build/BUILD/Python-3.6.10/Lib/socketserver.py", line 803 in write
  File "/builddir/build/BUILD/Python-3.6.10/Lib/wsgiref/handlers.py", line 453 in _write
  File "/builddir/build/BUILD/Python-3.6.10/Lib/wsgiref/handlers.py", line 279 in write
  File "/builddir/build/BUILD/Python-3.6.10/Lib/wsgiref/handlers.py", line 180 in finish_response
  File "/builddir/build/BUILD/Python-3.6.10/Lib/wsgiref/handlers.py", line 138 in run
  File "/builddir/build/BUILD/Python-3.6.10/Lib/wsgiref/simple_server.py", line 133 in handle
  File "/builddir/build/BUILD/Python-3.6.10/Lib/socketserver.py", line 724 in __init__
  File "/builddir/build/BUILD/Python-3.6.10/Lib/socketserver.py", line 364 in finish_request
  File "/builddir/build/BUILD/Python-3.6.10/Lib/socketserver.py", line 351 in process_request
  File "/builddir/build/BUILD/Python-3.6.10/Lib/socketserver.py", line 320 in _handle_request_noblock
  File "/builddir/build/BUILD/Python-3.6.10/Lib/socketserver.py", line 300 in handle_request
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/test_wsgiref.py", line 298 in test_interrupted_write
  File "/builddir/build/BUILD/Python-3.6.10/Lib/unittest/case.py", line 622 in run
  File "/builddir/build/BUILD/Python-3.6.10/Lib/unittest/case.py", line 670 in __call__
  File "/builddir/build/BUILD/Python-3.6.10/Lib/unittest/suite.py", line 122 in run
  File "/builddir/build/BUILD/Python-3.6.10/Lib/unittest/suite.py", line 84 in __call__
  File "/builddir/build/BUILD/Python-3.6.10/Lib/unittest/suite.py", line 122 in run
  File "/builddir/build/BUILD/Python-3.6.10/Lib/unittest/suite.py", line 84 in __call__
  File "/builddir/build/BUILD/Python-3.6.10/Lib/unittest/suite.py", line 122 in run
  File "/builddir/build/BUILD/Python-3.6.10/Lib/unittest/suite.py", line 84 in __call__
  File "/builddir/build/BUILD/Python-3.6.10/Lib/unittest/runner.py", line 176 in run
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/support/__init__.py", line 1921 in _run_suite
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/support/__init__.py", line 2017 in run_unittest
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/libregrtest/runtest.py", line 178 in test_runner
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/libregrtest/runtest.py", line 182 in runtest_inner
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/libregrtest/runtest.py", line 127 in runtest
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/libregrtest/main.py", line 407 in run_tests_sequential
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/libregrtest/main.py", line 514 in run_tests
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/libregrtest/main.py", line 617 in _main
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/libregrtest/main.py", line 582 in main
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/libregrtest/main.py", line 638 in main
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/regrtest.py", line 46 in _main
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/regrtest.py", line 50 in <module>
  File "/builddir/build/BUILD/Python-3.6.10/Lib/runpy.py", line 85 in _run_code
  File "/builddir/build/BUILD/Python-3.6.10/Lib/runpy.py", line 193 in _run_module_as_main
/var/tmp/rpm-tmp.td1Zds: line 67: 348542 Segmentation fault      (core dumped) WITHIN_PYTHON_RPM_BUILD= LD_LIBRARY_PATH=$ConfDir $ConfDir/python -m test.regrtest -wW --slowest --findleaks -x test_distutils -x test_bdist_rpm -x test_gdb -x test_faulthandler


python35:
test_constructor (test.test_io.CBufferedWriterTest) ... Fatal Python error: Segmentation fault
Current thread 0x0000ffffadf43c80 (most recent call first):
/var/tmp/rpm-tmp.JpDd3o: line 42: 955334 Segmentation fault      (core dumped) WITHIN_PYTHON_RPM_BUILD= LD_LIBRARY_PATH=$ConfDir $ConfDir/python -m test.regrtest --verbose --findleaks -x test_distutils -x test_faulthandler -x test_gdb
error: Bad exit status from /var/tmp/rpm-tmp.JpDd3o (%check)


python34:

test_constructor (test.test_io.CBufferedWriterTest) ... Fatal Python error: Segmentation fault
Current thread 0x0000ffff8d389c60 (most recent call first):
  File "/builddir/build/BUILD/Python-3.4.10/Lib/unittest/case.py", line 200 in handle
  File "/builddir/build/BUILD/Python-3.4.10/Lib/unittest/case.py", line 745 in assertRaises
  File "/builddir/build/BUILD/Python-3.4.10/Lib/test/test_io.py", line 1446 in test_constructor
  File "/builddir/build/BUILD/Python-3.4.10/Lib/unittest/case.py", line 618 in run
  File "/builddir/build/BUILD/Python-3.4.10/Lib/unittest/case.py", line 666 in __call__
  File "/builddir/build/BUILD/Python-3.4.10/Lib/unittest/suite.py", line 122 in run
  File "/builddir/build/BUILD/Python-3.4.10/Lib/unittest/suite.py", line 84 in __call__
  File "/builddir/build/BUILD/Python-3.4.10/Lib/unittest/suite.py", line 122 in run
  File "/builddir/build/BUILD/Python-3.4.10/Lib/unittest/suite.py", line 84 in __call__
  File "/builddir/build/BUILD/Python-3.4.10/Lib/unittest/suite.py", line 122 in run
  File "/builddir/build/BUILD/Python-3.4.10/Lib/unittest/suite.py", line 84 in __call__
  File "/builddir/build/BUILD/Python-3.4.10/Lib/unittest/runner.py", line 168 in run
  File "/builddir/build/BUILD/Python-3.4.10/Lib/test/support/__init__.py", line 1779 in _run_suite
  File "/builddir/build/BUILD/Python-3.4.10/Lib/test/support/__init__.py", line 1813 in run_unittest
  File "/builddir/build/BUILD/Python-3.4.10/Lib/test/regrtest.py", line 1278 in test_runner
/var/tmp/rpm-tmp.Pf2Vjj: line 41: 1623633 Segmentation fault      (core dumped) WITHIN_PYTHON_RPM_BUILD= LD_LIBRARY_PATH=$ConfDir $ConfDir/python -m test.regrtest --verbose --findleaks -x test_distutils -x test_faulthandler -x test_gdb -x test_venv

Comment 2 Victor Stinner 2020-02-10 09:18:10 UTC
Quick check with unpatched Python 2.7.17 (from python.org) compiled with gcc -O0 (disable all compiler optimizations): the whole test suite pass, only test_ctypes fails (see below). So this issue sounds like a compiler (GCC) issue.

I tested gcc-10.0.1-0.7.fc32.aarch64. The latest build (2020-01-23) used an old gcc 10.0.1-0.4.fc32: https://koschei.fedoraproject.org/package/python27

Maybe the fix has been fixed between gcc 10.0.1-0.4.fc32 and gcc-10.0.1-0.7.fc32.aarch64. Or maybe the issue comes from compiler optimizations.

--

test_ctypes failure:

test test_ctypes failed -- Traceback (most recent call last):
  File "/Python-2.7.17/Lib/ctypes/test/test_win32.py", line 130, in test_struct_by_value
    self.assertEqual(ret.left, left.value)
AssertionError: -200 != 10

* https://bugzilla.redhat.com/show_bug.cgi?id=1174037
* https://bugs.python.org/issue32203
* https://bitbucket.org/cffi/cffi/issues/312/tests-failed-with-armv8
* https://bugs.gentoo.org/610626

Comment 3 Victor Stinner 2020-02-10 09:28:50 UTC
I created scratch builds:

* python27-2.7.17-2.fc32.src.rpm: https://koji.fedoraproject.org/koji/taskinfo?taskID=41445285
* python36-3.6.10-2.fc32.src.rpm: https://koji.fedoraproject.org/koji/taskinfo?taskID=41445241

Comment 4 Victor Stinner 2020-02-10 09:44:53 UTC
I compiled Python 3.6.10 (tarball from python.org) with "./configure && make": gcc -O3 (no PGO nor LTO): test_io pass successfully, but test_ctypes.test_callbacks() does crash with SIGILL:

<mock-chroot> sh-5.0# gdb -args ./python -m test -v test_ctypes 
GNU gdb (GDB) Fedora 9.1-1.fc32
(...)
(gdb) run
Starting program: /builddir/Python-3.6.10/python -m test -v test_ctypes
 
== CPython 3.6.10 (default, Feb 10 2020, 09:29:03) [GCC 10.0.1 20200130 (Red Hat 10.0.1-0.7)]
== Linux-5.3.15-300.fc31.aarch64-aarch64-with-fedora-32-Rawhide little-endian
(...)
test_subclass (ctypes.test.test_arrays.ArrayTestCase) ... ok
test_byval (ctypes.test.test_as_parameter.AsParamPropertyWrapperTestCase) ... ok
test_callbacks (ctypes.test.test_as_parameter.AsParamPropertyWrapperTestCase) ... 
Program received signal SIGILL, Illegal instruction.
0x0000fffff7fe5058 in ?? ()
    
(gdb) where
#0  0x0000fffff7fe5058 in ?? ()
#1  0x0000ffffe9799000 in ?? ()
#2  0x0000000000000010 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Comment 5 Victor Stinner 2020-02-10 09:54:47 UTC
> test_ctypes.test_callbacks() does crash with SIGILL

Florian Weimer wrote a fix one year ago which is part of libffi 3.3 release, but Fedora Rawhide still provides an old libffi-3.1-24.fc32.aarch64 (libffi 3.1 was released in 2014).

* https://github.com/libffi/libffi/commit/44a6c28545186d78642487927952844156fc7ab5
* https://github.com/libffi/libffi/issues/470
* https://bugs.python.org/issue36024

Comment 6 Victor Stinner 2020-02-10 10:06:50 UTC
> Florian Weimer wrote a fix one year ago which is part of libffi 3.3 release, but Fedora Rawhide still provides an old libffi-3.1-24.fc32.aarch64 (libffi 3.1 was released in 2014).

bz#873990 proposes to upgrade to libffi 3.3.

Comment 7 Victor Stinner 2020-02-10 10:27:08 UTC
> python27-2.7.17-2.fc32.src.rpm: https://koji.fedoraproject.org/koji/taskinfo?taskID=41445285

test_ctypes.test_constructor() crashed on Aarch64. "[GCC 10.0.1 20200130 (Red Hat 10.0.1-0.7)"

> python36-3.6.10-2.fc32.src.rpm: https://koji.fedoraproject.org/koji/taskinfo?taskID=41445241

Build successfully on Aarch64.

Comment 8 Victor Stinner 2020-02-10 10:54:27 UTC
> test_ctypes.test_constructor() crashed on Aarch64. "[GCC 10.0.1 20200130 (Red Hat 10.0.1-0.7)"

I fail to reproduce this issue when I build Python manually :-( I tested:

Python 2.7.17 (default, Feb 10 2020, 10:24:12) 
[GCC 10.0.1 20200130 (Red Hat 10.0.1-0.7)] on linux2

=> gcc-10.0.1-0.7.fc32.aarch64

Commands:

tar -xf Python-2.7.17.tar.xz 
cd Python-2.7.17

export 'CFLAGS=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -fasynchronous-unwind-tables -fstack-clash-protection -D_GNU_SOURCE -fPIC -fwrapv'
export 'OPT=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -fasynchronous-unwind-tables -fstack-clash-protection -D_GNU_SOURCE -fPIC -fwrapv'
export 'LDFLAGS=-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld'

./configure --build=aarch64-redhat-linux-gnu --host=aarch64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-ipv6 --enable-shared --enable-unicode=ucs4 --with-dbmliborder=gdbm:ndbm:bdb --with-system-expat --with-system-ffi --with-dtrace --with-tapset-install-dir=/usr/share/systemtap/tapset

make 'EXTRA_CFLAGS=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -fasynchronous-unwind-tables -fstack-clash-protection -D_GNU_SOURCE -fPIC -fwrapv ' -j5

LD_LIBRARY_PATH=$PWD ./python -m test -v test_ctypes 
LD_LIBRARY_PATH=$PWD ./python -m test -v test_io

=> both tests pass successfully

Comment 9 Victor Stinner 2020-02-10 12:22:07 UTC
I failed to reproduce the crash with: "rpmbuild --rebuild python27-2.7.17-2.fc32.src.rpm", all tests passed.

Comment 10 Ben Cotton 2020-02-11 16:33:54 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 32 development cycle.
Changing version to 32.

Comment 11 Miro Hrončok 2020-02-12 11:56:46 UTC
This now blocks a setuptools upgrade, trough https://src.fedoraproject.org/rpms/python27/pull-request/6

Comment 12 Victor Stinner 2020-02-13 10:51:50 UTC
New attempt using mock --force-arch.

On the host:
---
sudo dnf install qemu-user-static
mock -r fedora-32-aarch64 --forcearch aarch64 --init    
mock -r fedora-32-aarch64 --forcearch aarch64 --install dnf
mock -r fedora-32-aarch64 --forcearch aarch64 --enable-network --shell
---

In the mock container:
---
dnf install -y fedpkg
# use git with HTTPS, since fedpkg wants to use Kerberos,
# but I failed to get a fedoraproject.org Kerberos token in mock
git clone https://src.fedoraproject.org/rpms/python27.git
cd python27

dnf install -y dnf-plugins-core
dnf builddep -y python27

fedpkg --release master srpm
rpmbuild --rebuild python*.src.rpm
---

Comment 13 Miro Hrončok 2020-02-13 12:31:11 UTC
FTR here is a scratchbuild that packages the entire builddir for inspection:

https://koji.fedoraproject.org/koji/taskinfo?taskID=41477522

Unpack it in an empty folder via:

rpm2cpio python27-2.7.17-2.fc33.aarch64.rpm | cpio -idmv


This is how it was created (in case it is garbage collected):

-  WITHIN_PYTHON_RPM_BUILD= EXTRATESTOPTS="$EXTRATESTOPTS" make test
+  WITHIN_PYTHON_RPM_BUILD= EXTRATESTOPTS="$EXTRATESTOPTS" make test || (cd && tar -czvf %{buildroot}/builddir.tar.gz %{_builddir})

...

 %files
+/builddir.tar.gz

Comment 14 Miro Hrončok 2020-02-13 15:43:28 UTC
On aarch64-test02.fedorainfracloud.org f31 (aarch64 vm), with up to date mock 2.0, I did not reproduce the issue (rawhide mockbuild --enablerepo=local).

Comment 15 Victor Stinner 2020-02-13 17:48:41 UTC
I managed to reproduce the crash in a Fedora 31 VM (which is running on AArch64 baremetal):

dnf install fedpkg -y
fedpkg clone python27 --anonymous
cd python27
fedpkg mockbuild --mock-config fedora-rawhide-aarch64 --no-clean-all --enablerepo=local

Enter the mock with --shell:

cd /builddir/build/BUILD/Python-2.7.17/build/optimized
# command extract from: make test
LD_LIBRARY_PATH=/builddir/build/BUILD/Python-2.7.17/build/optimized ./python -Wd -3 -E -tt  /builddir/build/BUILD/Python-2.7.17/Lib/test/regrtest.py -l  test_io

Comment 16 Victor Stinner 2020-02-14 08:50:04 UTC
Some notes on this issue.

== Hardware AArch64 vs emulated AArch64 ==

Reproduced on baremetal:

* Crashes have been seen on Koji which builds packages in a Fedora Rawhide mock container hosted on Fedora 31 which runs on AArch64 baremetal: Fedora 31 VM => Fedora Rawhide container
* I reproduced the issue in a Fedora Rawhide mock container running on Fedora 31 VM which runs on a Centos 8 which runs on AArch64 baremetal: Centos 8 => Fedora 31 VM => Fedora Rawhide container

Not reproduced on emulated AArch64:

* I failed to reproduce the issue on aarch64-test01.fedorainfracloud.org which is an AArch64 VM: <unknown OS but likely x86-64> => Fedora 31 AArch64 VM => Fedora Rawhide container
* Miro failed to reproduce the issue on aarch64-test02.fedorainfracloud.org which is also an AArch64 VM: <unknown OS but likely x86-64> => Fedora 31 AArch64 VM => Fedora Rawhide container
* I failed to reproduce the issue on an AArch64 container created by mock --force-arch=aarch64 running on my x86-64 laptop: Fedora 31 (x86-64) => Fedora Rawhide AArch64 container (QEMU User Mode)

== Packages and tests ==

Crashs seen in packages:

* python27
* python34
* python35
* python36

It seems like building python27 trigger the crash at each package build (like 3 failures on 3 builds: 100%), whereas it occurs randomly when building the python36 package (1 failure on between 3 and 5 attempts, I don't recall, sorry).

Tests which crashed:

* python27: test_io.CBufferedWriterTest.test_constructor()
* python34: test_io.CBufferedWriterTest.test_constructor()
* python35: test.test_io.CBufferedWriterTest.test_constructor()
* python36:

  * test_wsgiref: test_interrupted_write()
  * test_random.test_choices_algorithms()


== How tests are run ==

The python27 and python36 packages run all test files in the same process (-jN option not used): "Run tests sequentially". I didn't check python34 and python35, but they are likely doing the same.


python27:

LD_LIBRARY_PATH=/builddir/build/BUILD/Python-2.7.17/build/optimized ./python -Wd -3 -E -tt  /builddir/build/BUILD/Python-2.7.17/Lib/test/regrtest.py -l --verbose -x test_gdb
== CPython 2.7.17 (default, Jan 30 2020, 00:00:00) [GCC 10.0.1 20200130 (Red Hat 10.0.1-0.7)]
==   Linux-5.4.17-200.fc31.aarch64-aarch64-with-fedora-32-Rawhide little-endian
==   /builddir/build/BUILD/Python-2.7.17/build/test_python_26593
== CPU count: 8
Run tests sequentially
(...)
test_readonly_attributes (test.test_io.PyBufferedReaderTest) ... ok
test_repr (test.test_io.PyBufferedReaderTest) ... ok
test_threads (test.test_io.PyBufferedReaderTest) ... skipped "resource u'cpu' is not enabled"
test_uninitialized (test.test_io.PyBufferedReaderTest) ... ok
test_args_error (test.test_io.CBufferedWriterTest) ... ok
test_close_error_on_close (test.test_io.CBufferedWriterTest) ... ok
test_constructor (test.test_io.CBufferedWriterTest) ... make: *** [Makefile:894: test] Segmentation fault (core dumped)
error: Bad exit status from /var/tmp/rpm-tmp.2GniPt (%check)


python36:

STARTING: CHECKING OF PYTHON FOR CONFIGURATION: optimized
+ WITHIN_PYTHON_RPM_BUILD=
+ LD_LIBRARY_PATH=/builddir/build/BUILD/Python-3.6.10/build/optimized
+ /builddir/build/BUILD/Python-3.6.10/build/optimized/python -m test.regrtest -wW --slowest --findleaks -x test_distutils -x test_bdist_rpm -x test_gdb -x test_faulthandler
== CPython 3.6.10 (default, Jan 30 2020, 00:00:00) [GCC 10.0.1 20200130 (Red Hat 10.0.1-0.7)]
== Linux-5.4.17-200.fc31.aarch64-aarch64-with-fedora-32-Rawhide little-endian
== cwd: /builddir/build/BUILD/Python-3.6.10/build/optimized/build/test_python_26844
== CPU count: 8
== encodings: locale=UTF-8, FS=utf-8
Run tests sequentially
(...)
0:14:44 load avg: 0.62 [267/405] test_random
Fatal Python error: Segmentation fault

Current thread 0x0000ffff97239cc0 (most recent call first):
  File "/builddir/build/BUILD/Python-3.6.10/Lib/random.py", line 356 in choices
  File "/builddir/build/BUILD/Python-3.6.10/Lib/test/test_random.py", line 696 in test_choices_algorithms
  (...)


== Reproduce the issue manually ==

The python27 bug is the easiest to reproduce manually:

cd /builddir/build/BUILD/Python-2.7.17/build/optimized
LD_LIBRARY_PATH=/builddir/build/BUILD/Python-2.7.17/build/optimized ./python -Wd -3 -E -tt  /builddir/build/BUILD/Python-2.7.17/Lib/test/regrtest.py -l --verbose test_io

Output:

(...)
test_writelines_userlist (test.test_io.CBufferedRandomTest) ... ok
test_writes (test.test_io.CBufferedRandomTest) ... ./bug.sh: line 1:   117 Segmentation fault      (core dumped) LD_LIBRARY_PATH=/builddir/build/BUILD/Python-2.7.17/build/optimized ./python -Wd -3 -E -tt /builddir/build/BUILD/Python-2.7.17/Lib/test/regrtest.py -l --verbose test_io


Sadly, any subtle change in the command line, memory allocators, source code, etc. makes the bug disappear :-( It makes the bug really hard to investigate.


== Debug ==

gdb traceback in python27:

test_writes (test.test_io.CBufferedRandomTest) ... 
Program received signal SIGSEGV, Segmentation fault.
0x0000fffff7cba478 in _int_malloc () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.8-2.fc32.aarch64 openssl-libs-1.1.1d-6.fc32.aarch64 zlib-1.2.11-21.fc32.aarch64
(gdb) where
#0  0x0000fffff7cba478 in _int_malloc () from /lib64/libc.so.6
#1  0x0000fffff7cbb29c in malloc () from /lib64/libc.so.6
#2  0x0000fffff7e48ca0 in string_repeat (a=0xffffe9fff290, n=<optimized out>) at /builddir/build/BUILD/Python-2.7.17/Objects/stringobject.c:1115
#3  0x0000fffff7e9d2d0 in PyEval_EvalFrameEx (
    f=f@entry=Frame 0xaaaaaad6af30, for file /builddir/build/BUILD/Python-2.7.17/Lib/test/test_io.py, line 1185, in check_writes (self=<CBufferedRandomTest(_resultForDoCleanups=<TextTestResult(_original_stdout=<file at remote 0xfffff7af51e0>, dots=False, skipped=[(<CIOTest(_resultForDoCleanups=<...>, _type_equality_funcs={<type at remote 0xfffff7f7ce18>: 'assertTupleEqual', <type at remote 0xfffff7f81e78>: 'assertMultiLineEqual', <type at remote 0xfffff7f74690>: 'assertListEqual', <type at remote 0xfffff7f78b10>: 'assertSetEqual', <type at remote 0xfffff7f78c98>: 'assertSetEqual', <type at remote 0xfffff7f767e8>: 'assertDictEqual'}, _testMethodDoc=None, _testMethodName='test_unbounded_file', _cleanups=[]) at remote 0xffffe9fa4250>, 'test can only run in a 32-bit address space'), (<PyIOTest(_resultForDoCleanups=<...>, _type_equality_funcs={<type at remote 0xfffff7f7ce18>: 'assertTupleEqual', <type at remote 0xfffff7f81e78>: 'assertMultiLineEqual', <type at remote 0xfffff7f74690>: 'assertListEqual', <type at remote 0xfffff7f...(truncated), throwflag=throwflag@entry=0)
    at /builddir/build/BUILD/Python-2.7.17/Python/ceval.c:1485
(...)

A crash in malloc() is very likely a memory overflow which occurred "previously".

I tried different things to make the bug more likely or to get more information when it happens:

* set MALLOC_CHECK_=2 or MALLOC_CHECK_=3 environment variable: enable glibc memory debugger
* use Valgrind: pymalloc allocator of python27 emits tons of false alarm. python27 should be rebuilt with ./configure --with-valgrind and Valgrind should use Misc/valgrind.suppr suppression file of Python. But I had troubles to reproduce the issue if I modify the code. I should try again.
* Use python36 which has builtin memory debugger which can be enabled with PYTHONMALLOC=debug at runtime (no need to rebuild). Sadly, the bug is really hard to reproduce on python36. On python36, -X dev command line option can be used to enable PYTHONMALLOC=debug.

Comment 17 Victor Stinner 2020-02-14 16:10:14 UTC
I can reproduce the crash with the following commands which don't use the Fedora package at all, only Python tarball from python.org:
---
set -e -x
curl -O https://www.python.org/ftp/python/2.7.17/Python-2.7.17.tar.xz
tar -xf Python-2.7.17.tar.xz
cd Python-2.7.17
mkdir build
cd build
../configure -C \
 --enable-ipv6 \
 --enable-shared \
 --enable-unicode=ucs4 \
 --with-system-expat \
 --with-system-ffi \
 CC=gcc \
 'LDFLAGS=-specs=/usr/lib/rpm/redhat/redhat-hardened-ld'
make clean
CFLAGS='-fno-strict-aliasing -O2 -g -pipe -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-clash-protection -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG'
make EXTRA_CFLAGS="$CFLAGS" -j12
LD_LIBRARY_PATH=$PWD ./python -m test -v test_io
---

Reproduced with versions:
---
# rpm -q gcc glibc redhat-rpm-config
gcc-9.2.1-1.fc32.3.aarch64
glibc-2.30.9000-29.fc32.aarch64
redhat-rpm-config-147-1.fc32.noarch
---

Comment 18 Victor Stinner 2020-02-14 16:31:08 UTC
Even more simplified commands to configure + make Python:

../configure -C --enable-shared --enable-unicode=ucs4 --with-system-expat --with-system-ffi  CC=gcc OPT=''
make -j12 EXTRA_CFLAGS='-fno-strict-aliasing -O2 -fwrapv -DNDEBUG'
LD_LIBRARY_PATH=$PWD ./python -m test -v test_io

Comment 19 Victor Stinner 2020-02-14 16:49:30 UTC
Oh, I can now reproduce the crash on Fedora 31 AArch64 as well, using the commands of Comment 17 + Comment 18,

(...)
[vstinner@python-builder-fedora-stable-aarch64 build]$ LD_LIBRARY_PATH=$PWD ./python -m test -v test_io
(...)
test_writes (test.test_io.CBufferedRandomTest) ... Segmentation fault (core dumped)

[vstinner@python-builder-fedora-stable-aarch64 build]$ rpm -q gcc glibc redhat-rpm-config
gcc-9.2.1-1.fc31.aarch64
glibc-2.30-10.fc31.aarch64
redhat-rpm-config-142-1.fc31.noarch

[vstinner@python-builder-fedora-stable-aarch64 build]$ uname -a
Linux python-builder-fedora-stable-aarch64 5.4.17-200.fc31.aarch64 #1 SMP Sat Feb 1 18:45:35 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

Comment 20 Victor Stinner 2020-02-14 16:59:59 UTC
Valgrind doesn't see any error. I configure without pymalloc (Python memory allocator), so glibc malloc() is used directly.

<mock-chroot> sh-5.0# LD_LIBRARY_PATH=$PWD valgrind --log-file=valgrind.log ./python -m test -v test_io
(...)

<mock-chroot> sh-5.0# cat valgrind.log 
==40376== Memcheck, a memory error detector
==40376== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==40376== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==40376== Command: ./python -m test -v test_io
==40376== Parent PID: 2
==40376== 
==40376== 
==40376== HEAP SUMMARY:
==40376==     in use at exit: 4,051,358 bytes in 24,065 blocks
==40376==   total heap usage: 11,690,828 allocs, 11,666,763 frees, 1,197,486,256 bytes allocated
==40376== 
==40376== LEAK SUMMARY:
==40376==    definitely lost: 0 bytes in 0 blocks
==40376==    indirectly lost: 0 bytes in 0 blocks
==40376==      possibly lost: 1,695,882 bytes in 8,677 blocks
==40376==    still reachable: 2,355,476 bytes in 15,388 blocks
==40376==         suppressed: 0 bytes in 0 blocks
==40376== Rerun with --leak-check=full to see details of leaked memory
==40376== 
==40376== For lists of detected and suppressed errors, rerun with: -s
==40376== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


--

Using jemalloc, the test doesn't crash anymore:

<mock-chroot> sh-5.0# LD_PRELOAD=/usr/lib64/libjemalloc.so.2 LD_LIBRARY_PATH=$PWD ./python -m test -v test_io
(...)
Tests result: SUCCESS

--

Using MALLOC_CHECK_=3, the bug hides as well:

<mock-chroot> sh-5.0# MALLOC_CHECK_=3 LD_LIBRARY_PATH=$PWD ./python -m test -v test_io
(...)
Tests result: SUCCESS

Comment 21 Victor Stinner 2020-02-14 17:11:18 UTC
Simplified commands which reproduces the issue on Fedora 31 AArch64:

./configure OPT="-O3 -ggdb" --without-pymalloc && make clean && make -j10 && ./python -m test -v test_io

But test_io doesn't crash if Python is built with gcc -O2. It may be a compiler bug.

Note: I never saw this bug on other architectures. It seems specific to AArch64.

Comment 22 Victor Stinner 2020-02-14 23:55:34 UTC
New reproducer full script:
---
set -e -x

# disable ASLR
sudo bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space'

test -e Python-2.7.17.tar.xz || curl -O https://www.python.org/ftp/python/2.7.17/Python-2.7.17.tar.xz
tar -xf Python-2.7.17.tar.xz
cd Python-2.7.17
mkdir build
cd build
../configure -C OPT="-O0 -ggdb" --without-pymalloc
make -j10

cat > tests << EOF
test.test_io.BufferedRandomTest.test_threads
test.test_io.CBufferedRandomTest.test_constructor
test.test_io.CBufferedRandomTest.test_writes_and_seeks
test.test_io.PyBufferedWriterTest.test_writes_and_seeks
EOF

export PYTHONHASHSEED=424969

./python -m test --matchfile=tests -v test_io
# sometimes the first run behaves differently because of the creaton of .pyc files
./python -m test --matchfile=tests -v test_io
---

I disabled ASLR and set a fixed Python hash seed to reduce randomness. I also disabled SELinux, just in case.

Shorter script to reproduce the crash:
---
set -e -x
PYTHONHASHSEED=424969 ./python -m test -m test.test_io.CBufferedRandomTest.test_constructor -m test.test_io.CBufferedRandomTest.test_writes_and_seeks -m test.test_io.PyBufferedWriterTest.test_writes_and_seeks -v test_io
# sometimes the first run doesn't crash
PYTHONHASHSEED=424969 ./python -m test -m test.test_io.CBufferedRandomTest.test_constructor -m test.test_io.CBufferedRandomTest.test_writes_and_seeks -m test.test_io.PyBufferedWriterTest.test_writes_and_seeks -v test_io
---

Comment 23 Victor Stinner 2020-02-14 23:56:56 UTC
After a very long refactoring...


I manage to simply test_io (3409 lines of Python code, ignoring all imports) to just 3 malloc+free calls...

Reproducer C program:
---
#include <stdio.h>
#include <stdlib.h>

#define PY_SSIZE_T_MAX ((ssize_t)(((size_t)-1)>>1))

void my_alloc(size_t size)
{
        void *ptr;
        ptr = malloc(size);
        if (ptr != NULL) {
                printf("malloc(%zu) -> ok\n", size);
                free(ptr);
        }
        else {
                printf("malloc(%zu) -> FAIL\n", size);
        }
}

int main()
{
        int i;
        for(i=0; i<2; i++) {
                my_alloc(1170037);
        }

        my_alloc(PY_SSIZE_T_MAX);

        for(i=0; i<4; i++) {
                my_alloc(1170037);
        }

        printf("ok\n");
        return 0;
}
---

Example:
---
malloc(1170037) -> ok
malloc(1170037) -> ok
malloc(9223372036854775807) -> FAIL
Segmentation fault (core dumped)
---

Comment 24 Victor Stinner 2020-02-15 00:19:57 UTC
I can still reproduce the crash with glibc-2.30-4.fc31.aarch64 which is the oldest version available on Koji for Fedora 31: glibc-2.30-3.fc31 has been trashed.

Comment 25 Victor Stinner 2020-02-15 00:23:19 UTC
Centos 8 AArch64 doesn't seem to be affected: Comment 23 reproducer doesn't crash.

$ rpm -q glibc
glibc-2.28-72.el8_1.1.aarch64

Comment 26 Victor Stinner 2020-02-15 01:14:44 UTC
This bug was tricky to detect since Valgrind didn't complain, the bug was worked around when using jemalloc (hint!), and MALLOC_CHECK_ also hides the bug!

[vstinner@python-builder-fedora-stable-aarch64 ~]$ MALLOC_CHECK_=3 ./malloc 
malloc(1170037) -> ok
malloc(1170037) -> ok
malloc(9223372036854775807) -> FAIL
malloc(1170037) -> ok
malloc(1170037) -> ok
malloc(1170037) -> ok
malloc(1170037) -> ok
ok

Comment 27 Victor Stinner 2020-02-15 01:22:42 UTC
Internally, malloc(PY_SSIZE_T_MAX) calls mmap(PY_SSIZE_T_MAX) which fails and then sbrk(0x7fffffffffee2000) which also fails. After that, the next malloc() call does crash.

The problem is that the sbrk() calls *reduces* the size of the heap from 1,306,624 bytes to 135,168 bytes.

Comment 28 Fedora Release Engineering 2020-02-16 04:34:18 UTC
Dear Maintainer,

your package has not been built successfully in 32. Action is required from you.

If you can fix your package to build, perform a build in koji, and either create
an update in bodhi, or close this bug without creating an update, if updating is
not appropriate [1]. If you are working on a fix, set the status to ASSIGNED to
acknowledge this. Following the latest policy for such packages [2], your package
will be orphaned if this bug remains in NEW state more than 8 weeks.

A week before the mass branching of Fedora 33 according to the schedule [3],
any packages not successfully rebuilt at least on Fedora 31 will be
retired regardless of the status of this bug.

[1] https://fedoraproject.org/wiki/Updates_Policy
[2] https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/
[3] https://fedoraproject.org/wiki/Releases/33/Schedule

Comment 29 Miro Hrončok 2020-02-16 15:26:58 UTC
Removing the F32FTBFS tracker, gcc builds just fine, it's other packages that don't build.

Comment 31 Florian Weimer 2020-02-18 12:33:24 UTC
Catalin Marinas posted a kernel patch (“mm: Avoid creating virtual address aliases in brk()/mmap()/mremap()”):

http://lists.infradead.org/pipermail/linux-arm-kernel/2020-February/712003.html

Comment 32 Victor Stinner 2020-02-18 12:34:31 UTC
This issue is a regression caused by the following kernel commit which landed in Linux kernel 5.4 which has been released at Nov 24, 2019:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce18d171cb7368557e6498a3ce111d7d3dc03e4d

Koji was running Linux kernel 5.3 or older until mi-January 2020 (I don't know the exact day). We only started to notice the crash recently when we tried to rebuild python27 which reproduces the crash in a reliable way.

The first python27 build which failed ran at Jan 20, 2020 and it used a kernel 5.5.0 according to RPM installed in the buildroot (but the build doesn't provide all logs, so I'm not 100% sure). The previous (successful) python27 build was done in 2019-10-21.

Comment 33 Miro Hrončok 2020-02-19 16:22:45 UTC
The aarch64-test02.fedorainfracloud.org f31 aarch64 vm got an updated kernel, so we might be able to reproduce there now as well.

Comment 34 Jeremy Cline 2020-02-19 21:39:09 UTC
I've picked the fix up for f30 in kernel-5.4.21-100.fc30 along with today's Rawhide build (kernel-5.6.0-0.rc2.git2.1.fc33). It should also arrive in F31 via the rebase to v5.5.5.

Comment 35 Miro Hrončok 2020-02-20 07:35:04 UTC
Thanks, Jeremy!

Comment 36 Miro Hrončok 2020-02-23 08:16:34 UTC
Fedora Infra ticket to update the kernel on aarch64 Koji: https://pagure.io/fedora-infrastructure/issue/8677

Comment 37 Miro Hrončok 2020-02-23 18:30:29 UTC
(In reply to Miro Hrončok from comment #33)
> The aarch64-test02.fedorainfracloud.org f31 aarch64 vm got an updated
> kernel, so we might be able to reproduce there now as well.

FTR I was able to reproduce the crash on aarch64-test01.fedorainfracloud.org (not enough disk space on aarch64-test02) with kernel 5.4.19-200.fc31.aarch64.

Comment 38 Charalampos Stratakis 2020-02-24 15:40:28 UTC
There hasn't been a new fixed build for F31 as the fix was applied on top of the 5.5.5-200 rebase. Could a build be created?

Comment 39 Jeremy Cline 2020-02-24 15:53:20 UTC
v5.5.6 just got released so it'll be in today's kernel-5.5.6-200.fc31 build, sorry about that.

Comment 40 Fedora Update System 2020-02-25 12:29:20 UTC
FEDORA-2020-3cd64d683c has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-3cd64d683c

Comment 43 Florian Weimer 2020-02-25 18:04:43 UTC
Posted to oss-security: https://www.openwall.com/lists/oss-security/2020/02/25/6

Comment 44 Fedora Update System 2020-02-27 18:34:51 UTC
kernel-5.5.6-201.fc31, kernel-headers-5.5.6-200.fc31, kernel-tools-5.5.6-200.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-3cd64d683c

Comment 45 Fedora Update System 2020-02-29 03:21:09 UTC
kernel-5.5.6-201.fc31, kernel-headers-5.5.6-200.fc31, kernel-tools-5.5.6-200.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.

Comment 46 Victor Stinner 2020-02-29 16:05:56 UTC
Using aarch64-test01.fedorainfracloud.org, I was able to reproduce the crash using Comment 23 program (C code using malloc) on 5.4.19-200.fc31.aarch64.

I confirm that I'm not longer able to reproduce the crash on 5.5.6-201.fc31.aarch64.

Thanks for the fix ;-)

Comment 47 Justin M. Forbes 2020-03-17 13:48:42 UTC
*** Bug 1814238 has been marked as a duplicate of this bug. ***