Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1788918 - librepo fails to build with Python 3.9 (segmentation fault in %check)
Summary: librepo fails to build with Python 3.9 (segmentation fault in %check)
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: librepo
Version: rawhide
Hardware: Unspecified
OS: Unspecified
high
unspecified
Target Milestone: ---
Assignee: Lukáš Hrázký
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PYTHON39
TreeView+ depends on / blocked
 
Reported: 2020-01-08 11:30 UTC by Miro Hrončok
Modified: 2020-02-10 15:59 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-10 15:57:26 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Miro Hrončok 2020-01-08 11:30:16 UTC
librepo fails to build with Python 3.9.0a2.

...
2: test_download_packages_with_expectedsize (tests.test_yum_package_downloading.TestCaseYumPackagesDownloading) ... ok
2: test_download_packages_with_fastestmirror_enabled_1 (tests.test_yum_package_downloading.TestCaseYumPackagesDownloading) ... ok
2: test_download_packages_with_fastestmirror_enabled_2 (tests.test_yum_package_downloading.TestCaseYumPackagesDownloading) ... /builddir/build/BUILD/librepo-1.11.1/build-py3/tests/python/tests/run_nosetests.sh: line 1:   778 Segmentation fault      (core dumped) LD_LIBRARY_PATH="/builddir/build/BUILD/librepo-1.11.1/build-py3/librepo/:" PYTHONPATH="/builddir/build/BUILD/librepo-1.11.1/build-py3/librepo/python/python3" /usr/bin/python3 -m nose -s -v "/builddir/build/BUILD/librepo-1.11.1/tests/python/tests/"
(build hangs here)


For the build logs, see:
https://copr-be.cloud.fedoraproject.org/results/@python/python3.9/fedora-rawhide-x86_64/01141146-librepo/

For all our attempts to build librepo with Python 3.9, see:
https://copr.fedorainfracloud.org/coprs/g/python/python3.9/package/librepo/

Testing and mass rebuild of packages is happening in copr. You can follow these instructions to test locally in mock if your package builds with Python 3.9:
https://copr.fedorainfracloud.org/coprs/g/python/python3.9/

Let us know here if you have any questions.

Python 3.9 will be included in Fedora 33, but the initial bootstrapping has already started.
A build failure this early in the bootstrap sequence blocks us very much.

Comment 1 Lukáš Hrázký 2020-01-13 16:20:10 UTC
Hi Miro,

I'm looking into this and I kind of suspect the cause may be potentially some mess in python versions against which the python packages being used during the tests were built. Either that or possibly a python bug.

I've installed a systemd-nspawn container from the official rawhide repositories, added the copr repository with python 3.9 from a link above and installed python 3.9. I've reproduced the issue and got a full traceback (it is very long, btw., I've had over 100 frames one time and 246 frames another time). This is the important part:
#0  get_small_int (ival=2) at /usr/src/debug/python3-3.9.0~a2-1.fc32.x86_64/Objects/longobject.c:47
#1  PyLong_FromLong (ival=2) at /usr/src/debug/python3-3.9.0~a2-1.fc32.x86_64/Objects/longobject.c:313
#2  0x00007fffe9718878 in fastestmirror_callback (data=0x7fffe81866d0, stage=LR_FMSTAGE_DETECTION, ptr=0x7fffffff9628) at /home/lu/dev/librepo/librepo/python/handle-py.c:155

So PyLong_FromLong() is called with ival=2, which leaves little room for librepo to be at fault, except for memory corruption. For small numbers PyLong_FromLong() calls get_small_int(), which pulls the return value from some cache for small numbers. The code is here:
https://github.com/python/cpython/blob/d8efc1495194228c3a4cd472200275d6491d8e2d/Objects/longobject.c#L47

On this line, tstate is NULL (and it being dereferenced causes the segfault). Googling around, it seems one possibility for tstate being NULL is when a module tries to use a python library that is different than the one the interpreter uses.

That's where I'm at now, trying to figure out what's built against what might be quite tedious, I'm not sure how to proceed yet.

Is there a way to make sure different python library versions aren't in play here, e.g. by ensuring all the packages really are built against python 3.9? This setup with an ad-hoc repository lends itself to be susceptible to this kind of an issue.

Comment 2 Miro Hrončok 2020-01-14 00:48:35 UTC
The buildroot was:

Packages built from our copr and hence 99% sure Python 3.9 based:

file
file-libs
gdb-minimal
gpgme
gpgme-devel
libbrotli
libpwquality
libselinux
libselinux-devel
libxml2
libxml2-devel
python-pip-wheel
python-setuptools-wheel
python3
python3-babel
python3-click
python3-devel
python3-docutils
python3-flask
python3-gpg
python3-chardet
python3-idna
python3-imagesize
python3-itsdangerous
python3-jinja2
python3-libs
python3-markupsafe
python3-nose
python3-packaging
python3-pygments
python3-pyparsing
python3-pysocks
python3-pytz
python3-pyxattr
python3-requests
python3-rpm-generators
python3-setuptools
python3-six
python3-snowballstemmer
python3-sphinx
python3-sphinxcontrib-applehelp
python3-sphinxcontrib-devhelp
python3-sphinxcontrib-htmlhelp
python3-sphinxcontrib-jsmath
python3-sphinxcontrib-qthelp
python3-sphinxcontrib-serializinghtml
python3-sphinx-theme-alabaster
python3-urllib3
python3-werkzeug
rpm
rpm-build
rpm-build-libs
rpm-libs

Packages built from Koji and hence 100% sure Python 3.8 based (or non-Python or Python version agnostic):

adobe-mappings-cmap
adobe-mappings-cmap-deprecated
adobe-mappings-pdf
alternatives
annobin
atk
audit-libs
avahi-libs
basesystem
bash
binutils
binutils-gold
bzip2
bzip2-libs
ca-certificates
cairo
cairo-gobject
cmake
cmake-data
cmake-filesystem
cmake-rpm-macros
coreutils
coreutils-common
cpio
cpp
cracklib
crypto-policies
cups-libs
curl
cyrus-sasl-lib
dbus-libs
diffutils
doxygen
dwz
efi-srpm-macros
elfutils
elfutils-default-yama-scope
elfutils-libelf
elfutils-libs
emacs-filesystem
expat
fedora-gpg-keys
fedora-release
fedora-release-common
fedora-repos
fedora-repos-rawhide
filesystem
findutils
fontconfig
fontpackages-filesystem
fpc-srpm-macros
freetype
fribidi
gawk
gc
gcc
gd
gdbm-libs
gdk-pixbuf2
gdk-pixbuf2-modules
ghc-srpm-macros
glibc
glibc-common
glibc-devel
glibc-headers
glibc-minimal-langpack
glib2
glib2-devel
gmp
gnat-srpm-macros
gnupg2
gnutls
google-droid-sans-fonts
go-srpm-macros
graphite2
graphviz
grep
gtk-update-icon-cache
gtk2
gts
guile22
gzip
harfbuzz
hicolor-icon-theme
check
check-devel
info
isl
jasper-libs
jbigkit-libs
jbig2dec-libs
jsoncpp
kernel-headers
keyutils-libs
krb5-libs
lasi
lcms2
libacl
libarchive
libassuan
libassuan-devel
libatomic_ops
libattr
libattr-devel
libblkid
libblkid-devel
libcap
libcap-ng
libcom_err
libcroco
libcurl
libcurl-devel
libdatrie
libdb
libdb-utils
libfdisk
libffi
libffi-devel
libfontenc
libgcc
libgcrypt
libgomp
libgpg-error
libgpg-error-devel
libgs
libICE
libidn
libidn2
libijs
libimagequant
libjpeg-turbo
libksba
libmcpp
libmetalink
libmount
libmount-devel
libmpc
libnghttp2
libnsl2
libpaper
libpkgconf
libpng
libpsl
librsvg2
libsemanage
libsepol
libsepol-devel
libsigsegv
libSM
libsmartcols
libssh
libssh-config
libstdc++
libtasn1
libthai
libtiff
libtirpc
libtool-ltdl
libunistring
libusbx
libutempter
libuuid
libuv
libverto
libwebp
libXau
libXaw
libxcb
libXcomposite
libxcrypt
libxcrypt-devel
libXcursor
libXdamage
libXext
libXfixes
libXft
libXi
libXinerama
libXmu
libXpm
libXrandr
libXrender
libXt
libXxf86vm
libX11
libX11-common
libzstd
libzstd-devel
lua-libs
lz4-libs
make
mcpp
mpfr
ncurses
ncurses-base
ncurses-libs
netpbm
nettle
nim-srpm-macros
npth
ocaml-srpm-macros
openblas-srpm-macros
openjpeg2
openldap
openssl-devel
openssl-libs
pam
pango
patch
pcre
pcre-cpp
pcre-devel
pcre-utf16
pcre-utf32
pcre2
pcre2-devel
pcre2-utf16
pcre2-utf32
perl-Carp
perl-constant
perl-Errno
perl-Exporter
perl-File-Path
perl-interpreter
perl-IO
perl-libs
perl-macros
perl-parent
perl-PathTools
perl-Scalar-List-Utils
perl-Socket
perl-srpm-macros
perl-Text-Tabs+Wrap
perl-threads
perl-threads-shared
perl-Unicode-Normalize
pixman
pkgconf
pkgconf-m4
pkgconf-pkg-config
popt
publicsuffix-list-dafsa
python-rpm-macros
python-srpm-macros
python3-rpm-macros
p11-kit
p11-kit-trust
qt5-srpm-macros
readline
redhat-rpm-config
rhash
rust-srpm-macros
sed
setup
shadow-utils
shared-mime-info
sqlite-libs
subunit
subunit-devel
systemd-libs
tar
tzdata
unzip
urw-base35-bookman-fonts
urw-base35-c059-fonts
urw-base35-d050000l-fonts
urw-base35-fonts
urw-base35-fonts-common
urw-base35-gothic-fonts
urw-base35-nimbus-mono-ps-fonts
urw-base35-nimbus-roman-fonts
urw-base35-nimbus-sans-fonts
urw-base35-p052-fonts
urw-base35-standard-symbols-ps-fonts
urw-base35-z003-fonts
util-linux
which
xapian-core-libs
xorg-x11-fonts-ISO8859-1-100dpi
xorg-x11-font-utils
xorg-x11-server-utils
xz
xz-devel
xz-libs
zchunk-devel
zchunk-libs
zip
zlib
zlib-devel
zstd

Comment 3 Petr Viktorin 2020-01-14 08:05:49 UTC
Victor, this looks like something you've worked on. Do you have any insight to share re. tstate & the small int cache?

Comment 4 Lukáš Hrázký 2020-01-14 13:44:58 UTC
Miro, thanks for the list, but apart from going through it and making sure there aren't any obvious python packages built with 3.8 and used in the tests, I'm still not sure what I can do. And since it seems librepo is most likely not the culprit and I don't have an effective way to debug this, I'm reluctant to spend more time on it.

Also, do you have a way I could list the packages based on the repo they come from on my test system? It seems non-trivial to do that...

I'm kind of thinking that making sure python 3.8 isn't a part of the build chain for any packages might be the easiest way to verify the issue, but I have little clue about the build process, so I might be totally off here.

Petr, thanks for the link and I'm adding a needinfo on Victor, hoping he can help shed some light on this.

Comment 5 Miro Hrončok 2020-01-14 14:03:21 UTC
(In reply to Lukáš Hrázký from comment #4)
> Miro, thanks for the list, but apart from going through it and making sure
> there aren't any obvious python packages built with 3.8 and used in the
> tests, I'm still not sure what I can do.

I just posted it here for completeness.

Here's the thing: Any "normal" Python package would require the correct python(abi) or libpython version - and hence it would bail the build already when resolving the dependencies.

However, if an extension module is build to a nonstandard location, it is possible that a 3.8 built module is present in the buildroot. Apart from filenames, nothing gives us a hint about this. I will try to reproduce this in mock and search for 3.8 filenames in the buildroot and report back.

> Also, do you have a way I could list the packages based on the repo they
> come from on my test system? It seems non-trivial to do that...

You can grep the output of `dnf list`, third column is the repo.

> I'm kind of thinking that making sure python 3.8 isn't a part of the build
> chain for any packages might be the easiest way to verify the issue, but I
> have little clue about the build process, so I might be totally off here.

Python is often present in the buildroot of non-Python packages as well, so this would be extremely hard and produce a lot of false positives :(

Comment 6 Miro Hrončok 2020-01-14 14:09:47 UTC
I have only found the following outliers:

/usr/share/crypto-policies/python/**/*.cpython-38*.pyc
/usr/share/gcc-9/python/libstdcxx/**/*.cpython-38*.pyc
/usr/share/gdb/auto-load/usr/lib64/__pycache__/*.cpython-38*.pyc
/usr/share/glib-2.0/codegen/__pycache__/*.cpython-38*.pyc
/usr/share/glib-2.0/gdb/__pycache__/*.cpython-38*.pyc

All just bytecode cache, should not harm anything during the build, certainly not segfault librepo.

Comment 7 Lukáš Hrázký 2020-01-14 14:51:23 UTC
(In reply to Miro Hrončok from comment #5)
> Here's the thing: Any "normal" Python package would require the correct
> python(abi) or libpython version - and hence it would bail the build already
> when resolving the dependencies.

Yeah, I know this should ideally be handled in RPM dependencies...

> You can grep the output of `dnf list`, third column is the repo.

Right, I tried to do it with `dnf repoquery`... So I've just verified all python* packages on my test setup are from the python 3.9 repo and also all the files (that are in the standard location) are under paths containing the 3.9 version (i.e. /usr/lib/python3.9 and /usr/lib64/python3.9), so those should obviously be the correct version.

So unless there's something very wrong with the built packages (which I'd assume would manifest in other places), maybe this is a red herring and the null pointer is caused by something else. It would be a pain to debug and trying to minimize the reproducer also might not be much fun, I'll wait for Victor to chime in before attempting that.

Comment 8 Victor Stinner 2020-01-15 15:27:56 UTC
It seems like librepo doesn't use the Python C API properly. librepo must hold the GIL to use this API, whereas it seems like it's not the case. Commented code:

static void
fastestmirror_callback(void *data, LrFastestMirrorStages stage, void *ptr)
{
    ...
    if (!ptr) {
        pydata = Py_None;
    } else {
        switch (stage) {
        case LR_FMSTAGE_CACHELOADING:
        case LR_FMSTAGE_CACHELOADINGSTATUS:
        case LR_FMSTAGE_STATUS:
            pydata = PyStringOrNone_FromString((char *) ptr);
            break;
        case LR_FMSTAGE_DETECTION:
            pydata = PyLong_FromLong(*((long *) ptr));
            break;
        default:
            pydata = Py_None;
        }
    }

    /* PyEval_RestoreThread = take the GIL */
    EndAllowThreads(self->state);
    result = PyObject_CallFunction(self->fastestmirror_cb,
                        "(OlO)", user_data, (long) stage, pydata);
    Py_XDECREF(result);
    /* PyEval_SaveThread() = release the GIL */
    BeginAllowThreads(self->state);

    if (pydata != Py_None)
        Py_XDECREF(pydata);

    return;
}

https://github.com/rpm-software-management/librepo/blob/5ade10012d40fad27a7aa1085091c2c5fdd099ff/librepo/python/handle-py.c#L155

The GIL is only taken to call PyObject_CallFunction() and Py_XDECREF(result). For example, PyLong_FromLong() call at line 155 is done without holding the GIL.

To me, it's a bug in librepo.

The behavior changed in Python 3.9: get_small_int() now requires to get the Python thread state.

--

FYI I'm the one who modified get_small_int():
https://bugs.python.org/issue38858

For the rationale, see my article:
https://vstinner.github.io/cpython-pass-tstate.html

Comment 9 Lukáš Hrázký 2020-01-16 15:44:06 UTC
Thanks for the information, Victor. I've gone through the code and tried to make sure no Python API is called without holding the GIL. The tests are passing for me now.

PR: https://github.com/rpm-software-management/librepo/pull/179

Comment 10 Victor Stinner 2020-01-17 13:18:10 UTC
> Thanks for the information, Victor. I've gone through the code and tried to make sure no Python API is called without holding the GIL. The tests are passing for me now.

You can use a debug build of Python, like python3-debug, to check if the GIL is held when using the Python C API.

Comment 11 Lukáš Hrázký 2020-01-17 13:34:39 UTC
(In reply to Victor Stinner from comment #10)
> You can use a debug build of Python, like python3-debug, to check if the GIL
> is held when using the Python C API.

That's handy, but I'm not sure I'm using it correctly. I've installed the package, looked around briefly and haven't found much info on how to use it. So I used the `/usr/bin/python3-debug` binary to run the tests. It passed with my patch and I still got a segfault without my patch...

Comment 12 Miro Hrončok 2020-02-10 14:06:23 UTC
Can we please get a backport of https://github.com/rpm-software-management/librepo/pull/179 ?

Comment 13 amatej 2020-02-10 15:57:26 UTC
I did the backport to rawhide. :)

Comment 14 Miro Hrončok 2020-02-10 15:59:06 UTC
Thank you, awesome!


Note You need to log in before you can comment on or make changes to this bug.