Bug 1881915
Summary: | kstars segmentation fault in doActivate when starting | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Matt Fagnani <matthew.fagnani> |
Component: | kstars | Assignee: | Jeff Law <law> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 33 | CC: | astro-sig, jh.xsnrg, jreznik, kde-sig, law, lupinix.fedora, rdieter, than |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | kstars-3.4.3-4.fc33 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-09-27 00:16:53 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Matt Fagnani
2020-09-23 11:30:17 UTC
I can confirm the crash, I also get it with Qt 5.14.2 on F33.

OK, I figured out that we have another LTO issue here. Will disable it for now and rebuild the package. Thank you very much for your report!

Pinging Jeff Law about the hypothesis that we have an LTO related issue here.

I don't see anything in this BZ which argues for or against this being an LTO issue. It is worth noting that most of the qt libraries are compiled without LTO for various reasons, including qt5-qtbase. From the backtrace info I would hazard a guess that there's a NULL instance pointer and we offset that by 8 to get the address of something within the object. I'll dig into that a little and see what I can find. But again, at this point there's nothing I've seen that argues for or against this being an LTO issue.

When I build with "%define _lto_cflags %{nil}" kstars runs fine, without that: Crash on startup. I did not dive into the details though just tried it similar to https://bugzilla.redhat.com/show_bug.cgi?id=1880290

FEDORA-2020-b289261bdf has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-b289261bdf

Thanks. Please include that kind of information in the bug report next time. It can significantly cut down on the amount of duplicated work. Right now I'd hazard a guess that qGuiApp isn't initialized. If that is indeed the case, then it could be LTO changing the order of static initializers across translation units within kstars. The C++ standard leaves the ordering of static initializers across TUs undefined and I've seen at least one other application make assumptions about static initializer ordering. *If* (and it's a big if at this point) that is the case, then this is a package bug. I'm still poking around.

I'm close on c#7, but not 100% correct. I think it's actually two instances of QCoreApplication::self, one of which appears to be initialized, the other not. One instance comes from libqt5Core, the other from the main application.
This is similar to another issue I'm tracking, but also subtly different. I won't know if they're the same until I look into how the final kstars executable and libqt5Core are composed.

kstars-3.4.3-4.fc33 fixed the crash when starting. kstars has run normally. Thanks.

When running kstars on Wayland the crash had 1 thread, while when running kstars on X the core dump had 2 threads. The functions appear to involve initialization. The following gdb trace of all threads shows the crash when running QT_QPA_PLATFORM=xcb kstars & with kstars-3.4.3-3.fc33.x86_64 in Plasma on Wayland:

Core was generated by `kstars'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007f309d5bce5b in QScopedPointer<QObjectData, QScopedPointerDeleter<QObjectData> >::operator-> --Type <RET> for more, q to quit, c to continue without paging--c (this=<optimized out>) at kernel/qobject.cpp:3766 3766 void doActivate(QObject *sender, int signal_index, void **argv) (gdb) thread apply all bt full Thread 2 (Thread 0x7f308ab1c640 (LWP 8780)): #0 0x00007f309c2a3a0f in __GI___poll (fds=0x7f308ab1b9f8, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29 sc_ret = -516 sc_cancel_oldtype = 0 sc_ret = <optimized out> #1 0x00007f309aa52842 in _xcb_conn_wait.part.0 () from /lib64/libxcb.so.1 No symbol table info available. #2 0x00007f309aa541cc in xcb_wait_for_event () from /lib64/libxcb.so.1 No symbol table info available.
#3 0x00007f308ae64c80 in QXcbEventQueue::run (this=0x55a2a4222750) at qxcbeventqueue.cpp:228 event = <optimized out> connection = 0x55a2a422e510 tail = 0x7f308af2eec0 <QXcbEventQueue::qXcbEventNodeFactory(xcb_generic_event_t*)::qXcbNodePool> enqueueEvent = <optimized out> #4 0x00007f309d3fd4cc in QThreadPrivate::start(void*) () at thread/qthread_unix.cpp:329 currentThreadData = 0x55a2a42405e0 current_thread_data_once = 2 (anonymous namespace)::destroy_current_thread_data_key_dtor_instance_ = {<No data fields>} current_thread_data_key = 5 #5 0x00007f309ccf63f9 in start_thread (arg=0x7f308ab1c640) at pthread_create.c:463 ret = <optimized out> pd = 0x7f308ab1c640 --Type <RET> for more, q to quit, c to continue without paging--c unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139846462064192, -164550953755017166, 140734544037022, 140734544037023, 0, 139846462064192, 276532280879153202, 276509965578600498}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = 0 #6 0x00007f309c2aeb03 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 No locals. Thread 1 (Thread 0x7f309e75c2c0 (LWP 8779)): #0 0x00007f309d5bce5b in QScopedPointer<QObjectData, QScopedPointerDeleter<QObjectData> >::operator-> (this=<optimized out>) at kernel/qobject.cpp:3766 No locals. #1 qGetPtrHelper<QScopedPointer<QObjectData, QScopedPointerDeleter<QObjectData> > > (ptr=...) at ../../include/QtCore/../../src/corelib/global/qglobal.h:1135 No locals. #2 QObject::d_func (this=<optimized out>) at kernel/qobject.h:132 No locals. #3 QObjectPrivate::get (o=<optimized out>) at kernel/qobject_p.h:339 No locals. 
#4 doActivate<false> (sender=0x0, signal_index=9, argv=0x7fff508144f0) at kernel/qobject.cpp:3768 sp = <optimized out> signal_spy_set = <optimized out> empty_argv = {0x7fff508145b0} senderDeleted = <optimized out> #5 0x00007f309d9f0ee2 in QGuiApplication::screenAdded (this=<optimized out>, _t1=<optimized out>) at .moc/moc_qguiapplication.cpp:389 _a = {0x0, 0x7fff508144e8} #6 0x00007f309d9e123c in QWindowSystemInterface::handleScreenAdded (ps=0x55a2a4246650, isPrimary=<optimized out>) at ../../include/QtCore/../../src/corelib/kernel/qcoreapplication.h:116 screen = 0x55a2a42bd300 #7 0x00007f308ae682b0 in QXcbConnection::initializeScreens (this=0x55a2a4190c40) at qxcbscreen.h:172 screen = 0x55a2a4246650 __for_range = @0x55a2a4190f30: {<QListSpecialMethods<QXcbScreen*>> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, static is_always_lock_free = true}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0x55a2a4257080}, d = 0x55a2a4257080}} __for_begin = <optimized out> __for_end = <optimized out> it = {data = 0x55a2a4235d20, rem = 0, index = 9456} xcbScreenNumber = <optimized out> primaryScreen = 0x55a2a4246650 #8 0x00007f308ae43bd0 in QXcbConnection::QXcbConnection (this=0x55a2a4190c40, nativeInterface=<optimized out>, canGrabServer=<optimized out>, defaultVisualId=<optimized out>, displayName=<optimized out>) at qxcbconnection.cpp:103 focusInDelay = <optimized out> focusInDelay = <optimized out> #9 0x00007f308ae46853 in QXcbIntegration::QXcbIntegration (this=0x55a2a422b320, parameters=..., argc=@0x7fff50814bcc: 1, argv=<optimized out>) at ../../../../include/QtCore/../../src/corelib/tools/qscopedpointer.h:138 displayName = <optimized out> noGrabArg = <optimized out> doGrabArg = <optimized out> underDebugger = <optimized out> conn = 0x0 numParameters = 0 canNotGrabEnv = false displayName = <optimized out> noGrabArg = <optimized out> doGrabArg = <optimized out> 
underDebugger = <optimized out> numParameters = <optimized out> conn = <optimized out> j = <optimized out> i = <optimized out> arg = <optimized out> ok = <optimized out> i = <optimized out> display = <optimized out> qt_category_enabled = <optimized out> #10 0x00007f309f9f746f in QXcbIntegrationPlugin::create (this=<optimized out>, system=..., argv=0x7fff50814f88, argc=@0x7fff50814bcc: 1, parameters=...) at qxcbmain.cpp:56 xcbIntegration = <optimized out> #11 QXcbIntegrationPlugin::create (this=<optimized out>, system=..., parameters=..., argc=@0x7fff50814bcc: 1, argv=0x7fff50814f88) at qxcbmain.cpp:53 xcbIntegration = <optimized out> #12 0x00007f309d9e9f4b in QPlatformIntegrationFactory::create (platform=..., paramList=..., argc=@0x7fff50814bcc: 1, argv=<optimized out>, platformPluginPath=...) at kernel/qplatformintegrationfactory.cpp:51 No locals. #13 0x00007f309d9f4690 in init_platform (argv=<optimized out>, argc=@0x7fff50814bcc: 1, platformThemeName=..., platformPluginPath=..., pluginNamesWithArguments=...) 
at kernel/qguiapplication.cpp:1223 arguments = {<QList<QString>> = {<QListSpecialMethods<QString>> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, static is_always_lock_free = true}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0x55a2a41ab640}, d = 0x55a2a41ab640}}, <No data fields>} name = {static null = {<No data fields>}, d = 0x55a2a421d920} argumentsKey = {static null = {<No data fields>}, d = 0x55a2a421e110} pluginArgument = @0x55a2a421d960: {static null = {<No data fields>}, d = 0x55a2a421d920} __for_range = @0x7fff508148e8: {<QList<QString>> = {<QListSpecialMethods<QString>> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, static is_always_lock_free = true}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0x55a2a421d950}, d = 0x55a2a421d950}}, <No data fields>} __for_begin = <optimized out> __for_end = <optimized out> plugins = {<QList<QString>> = {<QListSpecialMethods<QString>> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, static is_always_lock_free = true}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0x55a2a421d950}, d = 0x55a2a421d950}}, <No data fields>} platformArguments = {<QList<QString>> = {<QListSpecialMethods<QString>> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, static is_always_lock_free = true}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0x55a2a24a1530 <QListData::shared_null>}, d = 0x55a2a24a1530 <QListData::shared_null>}}, <No data fields>} availablePlugins = {<QList<QString>> = {<QListSpecialMethods<QString>> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static 
_S_alignment = 4, _M_i = -1}, static is_always_lock_free = true}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0x55a2a421ea80}, d = 0x55a2a421ea80}}, <No data fields>} themeNames = {<QList<QString>> = {<QListSpecialMethods<QString>> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, static is_always_lock_free = true}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0x7fff50814980}, d = 0x7fff50814980}}, <No data fields>} plugins = <optimized out> platformArguments = <optimized out> availablePlugins = <optimized out> themeNames = <optimized out> pluginArgument = <optimized out> __for_range = <optimized out> __for_begin = <optimized out> __for_end = <optimized out> arguments = <optimized out> name = <optimized out> argumentsKey = <optimized out> qt_category_enabled = <optimized out> qt_category_enabled = <optimized out> fatalMessage = <optimized out> themeName = <optimized out> __for_range = <optimized out> __for_begin = <optimized out> __for_end = <optimized out> themeName = <optimized out> __for_range = <optimized out> __for_begin = <optimized out> __for_end = <optimized out> nativeInterface = <optimized out> argument = <optimized out> __for_range = <optimized out> __for_begin = <optimized out> __for_end = <optimized out> equalsPos = <optimized out> name = <optimized out> value = <optimized out> #14 QGuiApplicationPrivate::createPlatformIntegration (this=0x55a2a421c6a0) at kernel/qguiapplication.cpp:1474 platformPluginPath = {static null = {<No data fields>}, d = 0x55a2a24a1e80 <QArrayData::shared_null>} platformName = {d = 0x55a2a421d870} sessionType = {d = 0x55a2a421d8a0} platformNameEnv = {d = 0x55a2a421d870} platformThemeName = {static null = {<No data fields>}, d = 0x55a2a24a1e80 <QArrayData::shared_null>} icon = {static null = {<No data fields>}, d = 0x55a2a24a1e80 <QArrayData::shared_null>} j = <optimized out> #15 0x00007f309d9f5ca0 in 
QGuiApplicationPrivate::createEventDispatcher (this=<optimized out>) at kernel/qguiapplication.cpp:1491 No locals. #16 0x00007f309d594f86 in QCoreApplicationPrivate::init (this=<optimized out>) at kernel/qcoreapplication.cpp:852 q = <optimized out> appPaths = 0x0 manualPaths = 0x0 thisThreadData = <optimized out> #17 0x00007f309d9f85f4 in QGuiApplicationPrivate::init (this=0x55a2a421c6a0) at kernel/qguiapplication.cpp:1520 loadTestability = <optimized out> pluginList = {<QListSpecialMethods<QByteArray>> = {<No data fields>}, {p = {static shared_null = {ref = {atomic = {_q_value = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = -1}, static is_always_lock_free = true}}}, alloc = 0, begin = 0, end = 0, array = {0x0}}, d = 0x55a2a421c810}, d = 0x55a2a421c810}} session_id = {static null = {<No data fields>}, d = 0x55a2a24a1e80 <QArrayData::shared_null>} session_key = {static null = {<No data fields>}, d = 0x55a2a421c768} s = {static null = {<No data fields>}, d = 0x0} j = <optimized out> envPlugins = {d = 0x8} q = <optimized out> #18 0x00007f309dfb9ef9 in QApplicationPrivate::init (this=0x55a2a421c6a0) at kernel/qapplication.cpp:513 No locals. #19 0x000055a2a1a8929e in main () No symbol table info available.

Thanks. If you look at frame #4 in thread #1, that's where you see the issue. The first argument is NULL which is a result of the problems related to QCoreApplication::self. At this time I think the issue is in the LTO plugin and/or the static linker since we should only have one instance of that object, but we actually have two -- one defined in the DSO the other in the main application.

I'm going on PTO for the remainder of the week, but this is high on my list when I get back Monday. If I get lucky I'll have a testcase ready for the other GCC developers before I leave and they can take a peek while I wander Yellowstone with my daughter ;-)

I thought that doActivate being called in frame 4 and then in frame 0 looked strange.
Your explanation helped me understand more of what was happening. I saw kwin_wayland segmentation faults in QWeakPointer<QObject>::QWeakPointer with doActivate in the trace in frame 8 at https://bugzilla.redhat.com/show_bug.cgi?id=1797165#c2 and in std::__atomic_base<int>::operator++ with doActivate in frame 11 at https://bugzilla.redhat.com/show_bug.cgi?id=1851769 Could those crashes be related to LTO? Thanks.

Highly unlikely for both since the initial reports are from June and Feb of this year -- both of which pre-date LTO enablement which occurred in late July. Given the references to std::__atomic_base it might be worth trying to build the components with gcc-11 which has some additional diagnostics about losing qualifiers on atomics. Note there are no official gcc-11 rpms for Fedora yet and my private ones have those diagnostics disabled. So I don't have a list of affected packages at this time.

FEDORA-2020-b289261bdf has been pushed to the Fedora 33 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-b289261bdf` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-b289261bdf See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2020-b289261bdf has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report.

Thanks Jeff. kstars has a splash screen while it's starting. One of the doActivate calls could have been for the splash screen and the other for the main kstars window. The kwin_wayland segmentation fault in std::__atomic_base<int>::operator++ looked to be due to an invalid pointer this=0x7000700070006 which was also in frame #2 QBasicAtomicInteger<int>::ref.
Memory corruption might've resulted in the invalid pointer due to something in kwayland(-server) with the mouse cursor position. Both the kwin_wayland crashes in https://bugzilla.redhat.com/show_bug.cgi?id=1797165 and https://bugzilla.redhat.com/show_bug.cgi?id=1851769 are very similar below frames 3 and 6 in operator at /usr/src/debug/kwayland-server-5.19.5-1.fc33.x86_64/src/server/pointer_interface.cpp:217 respectively. Would the diagnostics you mentioned be able to detect something wrong like that?

Note that twinkle appears to suffer from the exact same problem as kstars.

In response to c#15, it depends on *why* things were corrupted. valgrind is a great tool to use in these scenarios. This discussion should really be taken to the relevant BZ rather than cluttering this one.

So after much head-banging I think what's going on with kstars and twinkle (and likely other QT applications) is ultimately the same problem that's discussed here: https://bugreports.qt.io/browse/QTBUG-45755?focusedCommentId=281535&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel

It's not 100% the same in the sense that qtbase no longer uses -Bsymbolic and forces the right pic/pie options when building its DSOs. However, the underlying issue of needing to avoid R_COPY relocs and local data/function binding are the same. In fact, if you read through all the analysis we're talking about the exact same hunk of code (QCoreApplication::instance()) and the exact same behavior. In fact the little testcase can be used to show the problematic binding when linking against properly compiled QT libraries.

I think we just need to ensure we're compiling the applications with -fPIC which will avoid the problematic local binding. We can scan all the RPMs for binaries with R_X86_64_COPY relocs that reference QCoreApplication::self then look and see of that set of packages/executables, which are not compiled with PIC.
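The scan described above (looking for binaries with R_X86_64_COPY relocations that reference QCoreApplication::self) could be sketched roughly as follows. This is not the script used for the Fedora-wide scan, only a minimal illustration under stated assumptions: that binutils `readelf` is installed, that the symbol uses the Itanium C++ ABI mangling `_ZN16QCoreApplication4selfE`, and that relocation rows in `readelf -rW` output have the usual "offset, info, type, symbol value, symbol name" column layout. The helper names `copy_relocs` and `scan_binary` are hypothetical.

```python
import re
import subprocess

# Itanium C++ ABI mangled name of the static data member QCoreApplication::self.
QCORE_SELF = "_ZN16QCoreApplication4selfE"

# One relocation row of `readelf -rW` output that carries a symbol:
#   offset  info  type  symbol-value  symbol-name[@version] + addend
# (assumed layout; rows without a symbol, e.g. R_X86_64_RELATIVE, do not match)
RELOC_ROW = re.compile(
    r"^\s*[0-9a-fA-F]+\s+[0-9a-fA-F]+\s+(?P<type>R_\w+)"
    r"\s+[0-9a-fA-F]+\s+(?P<sym>[^\s@]+)"
)


def copy_relocs(readelf_text, symbol=QCORE_SELF):
    """Return the relocation rows that are COPY relocs against `symbol`."""
    hits = []
    for line in readelf_text.splitlines():
        m = RELOC_ROW.match(line)
        if m and m.group("type") == "R_X86_64_COPY" and m.group("sym") == symbol:
            hits.append(line.strip())
    return hits


def scan_binary(path):
    """Run readelf on one ELF file and return its COPY relocs against
    QCoreApplication::self (an empty list means none were found)."""
    out = subprocess.run(
        ["readelf", "-rW", path], capture_output=True, text=True, check=True
    ).stdout
    return copy_relocs(out)
```

A non-empty result from `scan_binary("/usr/bin/kstars")` would only flag the executable as a candidate; whether it was built with PIC still has to be checked separately.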
I've fixed kstars to build with -fPIC which avoids the local binding and copy relocation issues. Builds with the fix and LTO re-enabled are spinning. Between Florian's symbol data and my script I can identify all the packages that will need similar treatment and will be testing/fixing them as quickly as possible. Thanks for the report and your patience while I wrapped my head around the assumptions made by the QT libraries.

Hello all. I am still having trouble with kstars and the immediate seg fault at launch on Fedora 33, even after adding -fPIC to the CMakeLists.txt. I am building the package on Copr, and using this spec file, due to a couple modifications: https://invent.kde.org/xsnrg/kstars-spec

You will see the two lines are there for the -fPIC flag:

+SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fexceptions -fPIC")
+SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fexceptions -fPIC")

The most recent build as of this note, is here: https://copr.fedorainfracloud.org/coprs/xsnrg/kstars-bleeding/build/1707750/

All the builds are successful, and I have been testing with a simple docker container generated like so:

Dockerfile:

from amd64/fedora:33
USER root
RUN dnf upgrade -y --refresh
RUN dnf install -y dnf-utils
RUN dnf copr enable -y xsnrg/stellarsolver-bleeding
RUN dnf copr enable -y xsnrg/kstars-bleeding
RUN dnf copr enable -y xsnrg/libindi-bleeding
RUN dnf install -y stellarsolver kstars xeyes
RUN useradd user -G wheel
RUN echo "user ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
USER user
CMD ["/bin/bash"]

Build: `sudo podman build -t fedora33 .`

Launch: `sudo podman run -ti -e DISPLAY --rm -v ~/.Xauthority:/root/.Xauthority:Z --net=host fedora33`

I then just run kstars from the prompt. With changing only to fedora32, everything runs fine, but with fedora33, I get the immediate seg fault listed in the above description.
My next attempt at a Fedora 33 build will be to put the %define _lto_cflags %{nil} entry back in and trying that, but I wanted to post here in the hopes that somebody would notice something that I was doing that needed fixed, if -fPIC is working for others.

Thank you
Jim

Update: adding %define _lto_cflags %{nil} to the spec file allows my copr build of kstars to start and run from the container.

Also worth noting, the kstars 3.4.3 package bundled in Fedora 33 also starts and runs from the container. I checked this by removing my build of kstars, removing the copr configuration, and dnf installing kstars in Fedora 33. I then looked at possible patch problems in my copr build, but everything seems to be patching okay:

+ echo 'Patch #0 (CMakeLists.txt.patch):'
Patch #0 (CMakeLists.txt.patch):
+ /usr/bin/patch --no-backup-if-mismatch -p1 --fuzz=0
patching file CMakeLists.txt
+ RPM_EC=0
++ jobs -p
+ exit 0

Indeed the # of "-fPIC" flags in the build logs between the copr build and the kojipkgs build are about the same, with copr being just a few more, likely due to version 3.5.0 master vs 3.4.3 in the koji build. I am still left wondering why my build with only -fPIC aborts on launch.

Building with lto enabled is the known trigger, see comment #5

If I understand Jeff's changes correctly, he added -fPIC, but put LTO back in. See comment #19. This is what did not work for me. I only found success with adding %define _lto_cflags %{nil} to the spec.
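Comparing two build logs by eye, as described above, is error-prone. The following is a rough sketch of how the flag counts could be tallied per log. It assumes each compiler invocation appears on a single log line containing a standalone `-c` argument, and that LTO shows up as an argument beginning with `-flto`; both are assumptions about the log format, and the function name `flag_summary` is hypothetical.

```python
def flag_summary(build_log_text):
    """Count compile commands in a build log and how many of them carry
    -fPIC, -fPIE and an -flto* option."""
    summary = {"compiles": 0, "fPIC": 0, "fPIE": 0, "flto": 0}
    for line in build_log_text.splitlines():
        words = line.split()
        # Treat a line as a compile command only if it has a bare "-c".
        if "-c" not in words:
            continue
        summary["compiles"] += 1
        if "-fPIC" in words:
            summary["fPIC"] += 1
        if "-fPIE" in words:
            summary["fPIE"] += 1
        # Matches -flto and -flto=<n>; -fno-lto is deliberately not counted.
        if any(w.startswith("-flto") for w in words):
            summary["flto"] += 1
    return summary
```

Running this over the copr and koji logs and comparing the two dictionaries would show whether every compile command really received -fPIC, and whether LTO options are present in one build but not the other.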