Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 2166931 - [fedora-eln] [s390x] provision fails due to lvm2-lvmdbusd.service: Failed with result 'core-dump'.
Summary: [fedora-eln] [s390x] provision fails due to lvm2-lvmdbusd.service: Failed wit...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 39
Hardware: s390x
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: LVM Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ZedoraTracker
TreeView+ depends on / blocked
 
Reported: 2023-02-03 14:27 UTC by Bruno Goncalves
Modified: 2023-08-21 16:21 UTC (History)
28 users (show)

Fixed In Version: lvm2-2.03.21-1.fc39
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-21 16:21:09 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Bruno Goncalves 2023-02-03 14:27:54 UTC
Description of problem:

Trying to provision ELN compose on s390x with:

[   37.406150] anaconda[2046]: Starting automated install.Saving storage configu 
ration...                                                                        
[   37.406193] anaconda[2046]: Failed to save storage configuration              
[   37.406212] anaconda[2046]: The installation was stopped due to an error whic 
h occurred while running in non-interactive cmdline mode. Since there cannot be  
any questions in cmdline mode, edit your kickstart file and retry installation.  
                                                                                 
[   37.406234] anaconda[2046]: The exact error message is:                       
[   37.406253] anaconda[2046]: Non interactive installation failed: Failed to ge 
t Name property of the /com/redhat/lvmdbus1/Vg/0 object: GDBus.Error:org.freedes 

Version-Release number of selected component (if applicable):
anaconda 38.19-1.eln125

How reproducible:
100%

Steps to Reproduce:
1.Try to provision s390x with ELN compose

Comment 5 Jan Stodola 2023-02-03 15:38:21 UTC
Reassigning to lvm2 based on comment 3.

Comment 6 Ben Cotton 2023-02-07 15:08:05 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 38 development cycle.
Changing version to 38.

Comment 7 Bruno Goncalves 2023-02-21 14:45:32 UTC
Moving back to Rawhide as it continues to hit eln composes...

Comment 9 Marian Csontos 2023-02-21 20:52:58 UTC
Could you please retest with fresh build lvm2-2.03.19-1

Comment 15 Tony Asleson 2023-03-28 16:17:40 UTC
OK, to get ELN installed to gather more debug I selected a system that ELN would install on, used F38 and installed using autopart_type=plain for kickstart to clear out lvm signature on disk and then re-provisioned same system with ELN and autopart_type=plain which worked.  The system came up with many errors and RO root fs which I didn't dig into.  I remounted root fs RW manually over serial console as sshd failed to start.  I then ran lvmdusd which consistently seg. faults.


The issue looks like it resides on python itself.  I don't know why we aren't seeing this on rawhide or F38.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000003ff8127cbdc in _Py_Dealloc (op=0x3ff80bbb4b0) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/object.c:2390
2390	    (*dealloc)(op);
[Current thread is 1 (Thread 0x3ff7d4fe8c0 (LWP 1343))]
(gdb) bt
#0  0x000003ff8127cbdc in _Py_Dealloc (op=<unknown at remote 0x3ff80bbb4b0>) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/object.c:2390
#1  Py_DECREF (op=<unknown at remote 0x3ff80bbb4b0>) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Include/object.h:538
#2  _PyFrame_Clear (frame=frame@entry=0x3ff816fc3f0) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Python/frame.c:146
#3  0x000003ff8126262e in _PyEvalFrameClearAndPop (frame=0x3ff816fc3f0, tstate=0x2aa04cd35c0) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Python/ceval.c:6407
#4  _PyEval_Vector (tstate=0x2aa04cd35c0, func=func@entry=0x3ff80630540, locals=locals@entry=0x0, args=args@entry=0x3ff7d4fcfe0, argcount=argcount@entry=2, kwnames=0x0)
    at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Python/ceval.c:6440
#5  0x000003ff812b84ee in _PyFunction_Vectorcall (kwnames=0x0, nargsf=2, stack=0x3ff7d4fcfe0, func=<function at remote 0x3ff80630540>) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/call.c:393
#6  _PyObject_VectorcallTstate (kwnames=0x0, nargsf=2, args=0x3ff7d4fcfe0, callable=<function at remote 0x3ff80630540>, tstate=0x2aa04cd35c0) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Include/internal/pycore_call.h:92
#7  method_vectorcall (method=method@entry=<method at remote 0x3ff7e7fb080>, args=args@entry=0x3ff7d4fcfe8, nargsf=nargsf@entry=9223372036854775809, kwnames=kwnames@entry=0x0)
    at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/classobject.c:59
#8  0x000003ff8129b086 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x3ff7d4fcfe8, callable=<method at remote 0x3ff7e7fb080>, tstate=0x2aa04cd35c0)
    at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Include/internal/pycore_call.h:92
#9  PyObject_CallOneArg (func=<method at remote 0x3ff7e7fb080>, arg=<optimized out>) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/call.c:376
#10 0x000003ff81370e88 in call_attribute (self=<optimized out>, attr=<method at remote 0x3ff7e7fb080>, name='syscall') at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/typeobject.c:7671
#11 0x000003ff812def8e in slot_tp_getattr_hook (
    self=<CDLL(_name='libc.so.6', _FuncPtr=<_ctypes.PyCFuncPtrType at remote 0x3ff74000be0>, _handle=4395924165904, syscall=<_FuncPtr(__name__='syscall') at remote 0x3ff7e7e7950>) at remote 0x3ff80d9dd10>, name='syscall')
    at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/typeobject.c:7712
#12 0x000003ff8125f904 in PyObject_GetAttr (
    v=<CDLL(_name='libc.so.6', _FuncPtr=<_ctypes.PyCFuncPtrType at remote 0x3ff74000be0>, _handle=4395924165904, syscall=<_FuncPtr(__name__='syscall') at remote 0x3ff7e7e7950>) at remote 0x3ff80d9dd10>, name='syscall')
    at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/object.c:923
#13 0x000003ff81262178 in _PyObject_GetMethod (
    obj=<CDLL(_name='libc.so.6', _FuncPtr=<_ctypes.PyCFuncPtrType at remote 0x3ff74000be0>, _handle=4395924165904, syscall=<_FuncPtr(__name__='syscall') at remote 0x3ff7e7e7950>) at remote 0x3ff80d9dd10>, name='syscall', 
    method=0x3ff7d4fd3b8) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/object.c:1164
#14 0x000003ff8126d960 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Python/ceval.c:4490
#15 0x000003ff8126261a in _PyEval_EvalFrame (throwflag=0, frame=0x3ff816fc2e8, tstate=0x2aa04cd35c0) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Include/internal/pycore_ceval.h:73
#16 _PyEval_Vector (tstate=0x2aa04cd35c0, func=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=argcount@entry=1, kwnames=0x0) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Python/ceval.c:6435
#17 0x000003ff81296bc8 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/call.c:393
#18 0x000003ff8129f868 in _PyObject_Call (tstate=0x2aa04cd35c0, callable=<function at remote 0x3ff7f7316c0>, args=('update thread exiting!',), kwargs=0x0) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/call.c:328
#19 0x000003ff8126dc80 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Python/ceval.c:5379
#20 0x000003ff8126261a in _PyEval_EvalFrame (throwflag=0, frame=0x3ff816fc188, tstate=0x2aa04cd35c0) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Include/internal/pycore_ceval.h:73
#21 _PyEval_Vector (tstate=0x2aa04cd35c0, func=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=argcount@entry=1, kwnames=0x0) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Python/ceval.c:6435
#22 0x000003ff81296bc8 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/call.c:393
#23 0x000003ff8129f868 in _PyObject_Call (tstate=0x2aa04cd35c0, callable=<function at remote 0x3ff7eaa1a80>, 
    args=(<StateUpdate(lock=<_thread.RLock at remote 0x3ff7e7d8340>, queue=<Queue(maxsize=0, queue=<collections.deque at remote 0x3ff7e7cc6d0>, mutex=<_thread.lock at remote 0x3ff7e7ef140>, not_empty=<Condition(_lock=<...>, acquire=<built-in method acquire of _thread.lock object at remote 0x3ff7e7ef140>, release=<built-in method release of _thread.lock object at remote 0x3ff7e7ef140>, _waiters=<collections.deque at remote 0x3ff7e7cc7c0>) at remote 0x3ff7e7efe10>, not_full=<Condition(_lock=<...>, acquire=<built-in method acquire of _thread.lock object at remote 0x3ff7e7ef140>, release=<built-in method release of _thread.lock object at remote 0x3ff7e7ef140>, _waiters=<collections.deque at remote 0x3ff7e7cc8b0>) at remote 0x3ff7e7ef810>, all_tasks_done=<Condition(_lock=<...>, acquire=<built-in method acquire of _thread.lock object at remote 0x3ff7e7ef140>, release=<built-in method release of _thread.lock object at remote 0x3ff7e7ef140>, _waiters=<collections.deque at remote 0x3ff7e7cc9a0>) at remote 0x3ff7e7efd10>, unfin...(truncated), kwargs={}) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/call.c:328
#24 0x000003ff8126dc80 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Python/ceval.c:5379
#25 0x000003ff8126261a in _PyEval_EvalFrame (throwflag=0, frame=0x3ff816fc020, tstate=0x2aa04cd35c0) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Include/internal/pycore_ceval.h:73
#26 _PyEval_Vector (tstate=0x2aa04cd35c0, func=func@entry=0x3ff80cb7f60, locals=locals@entry=0x0, args=args@entry=0x3ff7d4fdd78, argcount=argcount@entry=1, kwnames=0x0)
    at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Python/ceval.c:6435
#27 0x000003ff812b8598 in _PyFunction_Vectorcall (kwnames=0x0, nargsf=1, stack=0x3ff7d4fdd78, func=<function at remote 0x3ff80cb7f60>) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/call.c:393
#28 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0x3ff7d4fdd78, callable=<function at remote 0x3ff80cb7f60>, tstate=0x2aa04cd35c0) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Include/internal/pycore_call.h:92
#29 method_vectorcall (method=<optimized out>, args=0x3ff815fabf0 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/classobject.c:67
#30 0x000003ff8129f868 in _PyObject_Call (tstate=0x2aa04cd35c0, callable=<method at remote 0x3ff7e7f8a40>, args=(), kwargs=0x0) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Objects/call.c:328
#31 0x000003ff813a05d6 in thread_run (boot_raw=0x3ff7e7d2370) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Modules/_threadmodule.c:1092
#32 0x000003ff813570f4 in pythread_wrapper (arg=<optimized out>) at /usr/src/debug/python3.11-3.11.2-1.eln126.s390x/Python/thread_pthread.h:241
#33 0x000003ff80ea462a in start_thread (arg=0x3ff7d4fe8c0) at pthread_create.c:444
#34 0x000003ff80f28342 in thread_start () at ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:68
(gdb)

Comment 16 Tony Asleson 2023-03-28 16:23:12 UTC
Created attachment 1954208 [details]
core file

Comment 17 Miro Hrončok 2023-03-28 16:54:43 UTC
Why are all the useful comments in this bugzilla marked as confidential?

Comment 18 Miro Hrončok 2023-03-28 18:38:36 UTC
Thanks. The instructions to reproduce assume prior knowledge which I do not have. Could you please provide step-by-step instructions on how do I trigger this exactly?

Note that the crash happens in Python, but it does not mean it is caused by Python, it might as well be gobject-introspection or some other Python extension module not coutning references properly.

Comment 19 Tony Asleson 2023-03-28 23:25:05 UTC
(In reply to Miro Hrončok from comment #18)
> Thanks. The instructions to reproduce assume prior knowledge which I do not
> have. Could you please provide step-by-step instructions on how do I trigger
> this exactly?

# /usr/sbin/lvmdbusd

Typically is crashes at start, if it doesn't hit ^C to exit daemon and it will core then.

> Note that the crash happens in Python, but it does not mean it is caused by
> Python, it might as well be gobject-introspection or some other Python
> extension module not coutning references properly.

True, however the backtrace doesn't seem to implicate other libraries.

Comment 20 Miro Hrončok 2023-03-29 00:06:36 UTC
I get this:

# /usr/sbin/lvmdbusd
Traceback (most recent call last):
  File "/usr/sbin/lvmdbusd", line 16, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/lib/python3.11/site-packages/lvmdbusd/main.py", line 193, in main
    cfg.bus = dbus.SystemBus()
              ^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/site-packages/dbus/_dbus.py", line 192, in __new__
    return Bus.__new__(cls, Bus.TYPE_SYSTEM, mainloop=mainloop,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/site-packages/dbus/_dbus.py", line 99, in __new__
    bus = BusConnection.__new__(subclass, bus_type, mainloop=mainloop)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/site-packages/dbus/bus.py", line 120, in __new__
    bus = cls._new_for_bus(address_or_type, mainloop=mainloop)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.FileNotFound: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory


I suppose this won't be testable in a container, right? Will checkout a virtual machine later.

Comment 22 Miro Hrončok 2023-03-29 09:20:14 UTC
python3-3.11.2-1.eln126.s390x:


[root@s390x-kvm-011 ~]# /usr/sbin/lvmdbusd
^CMar 29 05:13:31.287154: 2709:0 - Exiting daemon with signal 2
Segmentation fault (core dumped)

Same system, but with python3-3.11.2-1.fc39.s390x:

[root@s390x-kvm-011 ~]# /usr/sbin/lvmdbusd
^CMar 29 05:12:01.471404: 2663:-1 - Exiting daemon with signal 2


I cannot reproduce this with the debug build of Python (python3-debug-3.11.2-1.eln126.s390x):

[root@s390x-kvm-011 ~]# python3-debug /usr/sbin/lvmdbusd
<frozen importlib._bootstrap>:673: ImportWarning: DynamicImporter.exec_module() not found; falling back to load_module()
/usr/lib/python3.11/site-packages/lvmdbusd/cmdhandler.py:244: ResourceWarning: unclosed file <_io.BufferedReader name=5>
  results = _t_call(command, debug)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib/python3.11/site-packages/lvmdbusd/cmdhandler.py:244: ResourceWarning: unclosed file <_io.BufferedReader name=7>
  results = _t_call(command, debug)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib/python3.11/site-packages/lvmdbusd/cmdhandler.py:244: ResourceWarning: unclosed file <_io.BufferedReader name=10>
  results = _t_call(command, debug)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib/python3.11/site-packages/lvmdbusd/cmdhandler.py:244: ResourceWarning: unclosed file <_io.BufferedReader name=12>
  results = _t_call(command, debug)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib/python3.11/site-packages/pyudev/monitor.py:513: DeprecationWarning: "event_handler" argument will be removed in 1.0. Use Monitor.poll() instead.
  warnings.warn(
^CMar 29 05:19:07.789763: 3106:-1 - Exiting daemon with signal 2


:/

Comment 23 Miro Hrončok 2023-03-29 09:26:39 UTC
I can also reproduce this on a real s390x machine, in mock:

$ mock -r fedora-eln-s390x init
$ mock -r fedora-eln-s390x install /usr/bin/dbus-daemon /usr/sbin/lvmdbusd
$ mock -r fedora-eln-s390x shell
<mock-chroot> sh-5.2# mkdir -p /run/dbus
<mock-chroot> sh-5.2# dbus-daemon --system
<mock-chroot> sh-5.2# mkdir -p /var/run/lock/lvm
<mock-chroot> sh-5.2# /usr/sbin/lvmdbusd
^CMar 29 09:24:04.699467: 39:0 - Exiting daemon with signal 2
Segmentation fault

$ mock -r fedora-eln-s390x --orphanskill

Comment 24 Miro Hrončok 2023-03-29 09:30:05 UTC
<mock-chroot> sh-5.2# python3 -X faulthandler /usr/sbin/lvmdbusd
^CMar 29 09:28:16.069389: 38:0 - Exiting daemon with signal 2
Fatal Python error: Segmentation fault

Current thread 0x000003ff7d6ff8c0 (most recent call first):
  File "/usr/lib/python3.11/site-packages/lvmdbusd/utils.py", line 309 in _format_log_entry
  File "/usr/lib/python3.11/site-packages/lvmdbusd/utils.py", line 346 in log_debug
  File "/usr/lib/python3.11/site-packages/lvmdbusd/main.py", line 57 in process_request
  File "/usr/lib64/python3.11/threading.py", line 975 in run
  File "/usr/lib64/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/usr/lib64/python3.11/threading.py", line 995 in _bootstrap

Thread 0x000003ff811f7720 (most recent call first):
  File "/usr/lib64/python3.11/threading.py", line 1132 in _wait_for_tstate_lock
  File "/usr/lib64/python3.11/threading.py", line 1112 in join
  File "/usr/lib/python3.11/site-packages/lvmdbusd/main.py", line 237 in main
  File "/usr/sbin/lvmdbusd", line 16 in <module>

Extension modules: _dbus_bindings, gi._gi, _dbus_glib_bindings (total: 3)
Segmentation fault


But:

<mock-chroot> sh-5.2# python3 -X faulthandler -X tracemalloc /usr/sbin/lvmdbusd
^CMar 29 09:29:00.550427: 47:-1 - Exiting daemon with signal 2
(no segfault)

Comment 25 Victor Stinner 2023-03-29 15:13:59 UTC
It seems like the issue comes from the "tid = ctypes.CDLL('libc.so.6').syscall(186)" line of lvmdbusd.utils._format_log_entry(), knowning that the signal handler logs messages: handler() of lvmdbusd.utils.

Each ctypes.CDLL('libc.so.6') call creates a new ctypes.CDLL instance which is destroyed after the syscall() method call.

I can also trigger the crash on my modified code with the SIGUSR1 signal, not only at exit (CTRL+C sending SIGINT).

Well, Python 3.11 should not crash, even if creating a new ctypes.CDLL instance on each signal is surprising, it's valid code and it worked well on Python 3.10 if I understand correctly. It looks like Python 3.11 regression to me. I will investigate the issue.

Comment 26 Tony Asleson 2023-03-29 15:43:53 UTC
I changed:

tid = ctypes.CDLL('libc.so.6').syscall(186)

to

tid = threading.get_native_id()

and the seg. fault goes away.

threading.get_native_id() is 3.8 or later, which I don't think is available in all the expected run-times, but I'll double check.
This line of code has existed since 2016 and has worked on many different python run-times, but it's kind of a hack.

Comment 27 Tony Asleson 2023-03-29 16:41:42 UTC
Reproduces with:


import ctypes
import datetime
import os


def _format_log_entry(msg):
    tid = ctypes.CDLL('libc.so.6').syscall(186)
    msg = "%s: %d:%d - %s" % \
        (datetime.datetime.now().strftime("%b %d %H:%M:%S.%f"),
        os.getpid(), tid, msg)
    print(msg)


if __name__ == "__main__":
    i = 0
    while True:
        _format_log_entry("msg %d" % i)
        i += 1



# python3 repro.py 
Mar 29 12:40:18.324178: 3888:-1 - msg 0
Mar 29 12:40:18.324260: 3888:-1 - msg 1
Mar 29 12:40:18.324291: 3888:-1 - msg 2
Mar 29 12:40:18.324318: 3888:-1 - msg 3
Mar 29 12:40:18.324361: 3888:-1 - msg 4
Mar 29 12:40:18.324392: 3888:-1 - msg 5
Mar 29 12:40:18.324428: 3888:-1 - msg 6
Mar 29 12:40:18.324460: 3888:-1 - msg 7
Mar 29 12:40:18.324492: 3888:0 - msg 8
Segmentation fault (core dumped)

Comment 28 Dan Horák 2023-03-29 16:55:12 UTC
Does it reproduce on Fedora too is it ELN specific?

Comment 29 Tony Asleson 2023-03-29 16:56:30 UTC
(In reply to Dan Horák from comment #28)
> Does it reproduce on Fedora too is it ELN specific?

ELN s390x specific AFAIK

Comment 30 Victor Stinner 2023-03-30 12:31:59 UTC
tl; dr The lvmdbusd code to call gettid() syscall is specific to Linux x86-64 and has undefined behavior on other architectures. It's a bug in lvmdbusd, not in Python. A workaround is to replace "tid = ctypes.CDLL('libc.so.6').syscall(186)" with "tid = threading.get_native_id()".

By the way, calling ctypes.CDLL('libc.so.6') at each (log formatter) function call is inefficient, it should be called once at the first call, and then store the object somewhere.

I reassign the bug to the lvm2 component.


> tid = ctypes.CDLL('libc.so.6').syscall(186)

Well... 186 is a hardcoded syscall number. It's the x86-64 syscall number of gettid(). Example on Fedora x86-64 (linux 6.1.18-200.fc37.x86_64):
---
$ grep __NR_gettid /usr/include -R
/usr/include/asm/unistd_32.h:#define __NR_gettid 224
/usr/include/asm/unistd_64.h:#define __NR_gettid 186
(...)
---

On x86 (32-bit), it's a different syscall number: 224.

But syscall numbers are architecture specific. On s390, 186 is sigaltstack() syscall, whereas gettid() syscall number is 236:
---
$ grep -E '186|NR_gettid' /usr/include/asm/unistd_64.h
#define __NR_sigaltstack 186
#define __NR_gettid 236
---

The sigaltstack() syscall has 2 arguments, whereas lvmdbusd calls the glibc syscall() function with no arguments. In practice, it means that the syscall arguments are "undefined": their value depend on the unknown stack state and unknown CPU register values...

On x86-64, the ctypes code just works. Example on Fedora 37 x86-64:
---
$ python3
Python 3.11.2 (main, Feb  8 2023, 00:00:00) [GCC 12.2.1 20221121 (Red Hat 12.2.1-4)] on linux
>>> import threading; print(threading.get_native_id())
142315
>>> import ctypes; print(ctypes.CDLL('libc.so.6').syscall(186))
142315
---

On s390x, syscall(186) is sigaltstack() which may just fail since arguments are invalid random values, or it can change the signal stack and lead to a crash "sometimes" (depending on the stack and CPU register states):
---
$ python3.11
Python 3.11.2 (main, Feb  8 2023, 00:00:00) [GCC 13.0.1 20230127 (Red Hat 13.0.1-0)] on linux
>>> import threading; print(threading.get_native_id())
41771
>>> import ctypes; print(ctypes.CDLL('libc.so.6').syscall(186))
-1
---

Comment 31 Zbigniew Jędrzejewski-Szmek 2023-03-30 13:08:50 UTC
The locking in that function also looks suspicious:

def _common_log(msg, *attributes):
        cfg.stdout_lock.acquire()
        msg = _format_log_entry(msg)

        if STDOUT_TTY and attributes:
                print(color(msg, *attributes))
        else:
                print(msg)

        cfg.stdout_lock.release()
        sys.stdout.flush()

The locking would make some sense if it was around print() and flush().
Including the formatting in the lock is pointless, because it just accesses
local variables and is slow. The lock is effective around print() only, but
print() by itself is atomic. OTOH, the only interesting race with other code
would be if it managed to print() something before the flush. But that is
not covered by the lock.

Comment 32 Tony Asleson 2023-03-30 16:09:27 UTC
Correction posted upstream: 

https://sourceware.org/git/?p=lvm2.git;a=commit;h=300c8209a810f287992376e8a0561fdf878d1520

Comment 33 Tony Asleson 2023-03-30 16:13:07 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #31)
> The locking would make some sense if it was around print() and flush().
> Including the formatting in the lock is pointless, because it just accesses
> local variables and is slow. The lock is effective around print() only, but
> print() by itself is atomic. OTOH, the only interesting race with other code
> would be if it managed to print() something before the flush. But that is
> not covered by the lock.

Thanks, suggestion integrated
https://sourceware.org/git/?p=lvm2.git;a=commit;h=f7ca470abe995210e8008084c0b4436292b2816b

Comment 34 Zbigniew Jędrzejewski-Szmek 2023-03-30 16:19:42 UTC
Ouch. Please, never never say 'except:', except when you want to catch ^C.
In other words, unless you are writing an interactive interpreter, 'except:'
is usually an error.

Comment 35 Fedora Release Engineering 2023-08-16 07:06:34 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.


Note You need to log in before you can comment on or make changes to this bug.