Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1452662 - Very slow shutdown, it takes 1 minute 30 seconds
Summary: Very slow shutdown, it takes 1 minute 30 seconds
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 26
Hardware: x86_64
OS: Linux
unspecified
unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-05-19 12:43 UTC by Leslie Satenstein
Modified: 2017-12-07 03:39 UTC (History)
15 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1310608
Environment:
Last Closed: 2017-12-07 03:39:42 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Leslie Satenstein 2017-05-19 12:43:24 UTC
+++ This bug was initially created as a clone of Bug #1310608 +++

Description of problem:
This is not 100% reproducible, but occurs in more than 60% of shutdowns. It seems it cannot shutdown device mapper because it is busy, so it timeouts after 1:30. I suspected akmods, because the CPU is quite busy for the time it waits, but from the log it doesn't seems it's causing the problem.

Version-Release number of selected component (if applicable):
systemd-222-14.fc23.x86_64

How reproducible:
Sometimes (in more than 60% of shutdowns)

Steps to Reproduce:
1. poweroff

Actual results:
Waits and timeouts after 1 minute 30 seconds, then it shutdowns.

Expected results:
Immediate shutdown.

Additional info:
It starts happening after upgrade to F23.

Is there any debug option howto find out what is blocking the shutdown?

Notice the time difference in journal:
úno 22 01:27:31 yarda kernel: thinkpad_acpi: EC reports that Thermal Table has changed
úno 22 01:28:56 yarda systemd[1]: dev-dm\x2d0.device: Job dev-dm\x2d0.device/stop timed out.

--- Additional comment from Jan Synacek on 2016-02-22 06:07:22 EST ---

Also, this seems to be the same problem as in https://github.com/systemd/systemd/issues/2691.

--- Additional comment from Jan Synacek on 2016-02-22 06:09:40 EST ---

(In reply to Jaroslav Škarvada from comment #0)
> Is there any debug option howto find out what is blocking the shutdown?

You can try adding "systemd.log_level=debug" to the kernel command line and attach the output of the boot where the hang happened (use "journalctl -b -1 (minus one) after reboot).

--- Additional comment from Jaroslav Škarvada on 2016-02-22 15:52:10 EST ---

The delay is caused by systemd-coredump process which needs a bit more time on my core i7 than 90 seconds to complete (it's dumping memory and xz compressing it which is really CPU demanding especially if there is a lot of RAM). The problem is that the process is killed after 90 seconds, so nothing is dumped, which is really confusing. I see some serious problems here:

- user should be alerted what's going there on the console, i.e. that's coredump process is running, just showing "waiting for device-mapper service" (or similar) and counting down is totally useless

- it should be logged to journal that coredump process started and it's waiting for its completion, but unfortunately nothing is currently logged

- and finally the coredump process shouldn't be killed after 90 seconds, or it shouldn't use xz compression when dumping, but compress the core later (e.g. during next boot) which may result in much faster completion, otherwise it's useless time waste

By observing the coredump, it seems it's caused by xfce4-power-manager crash (SIGSEGV), backtrace follows:

#0  0x00007f8890ff3b3a in strlen () at /lib64/libc.so.6
#1  0x000055b775fab868 in get_device_description (upower=0x7f8874009270, device=device@entry=0x7f88740394e0)
    at xfpm-power-common.c:267
#2  0x000055b775fa7526 in power_manager_button_update_device_icon_and_details (button=0x55b777b04230, device=0x7f88740394e0)
    at ../panel-plugins/power-manager-plugin/power-manager-button.c:454
#3  0x00007f88922ee7a5 in g_closure_invoke () at /lib64/libgobject-2.0.so.0
#4  0x00007f8892300851 in signal_emit_unlocked_R () at /lib64/libgobject-2.0.so.0
#5  0x00007f8892309530 in g_signal_emit_valist () at /lib64/libgobject-2.0.so.0
#6  0x00007f88923098ff in g_signal_emit () at /lib64/libgobject-2.0.so.0
#7  0x00007f88922f2cd4 in g_object_dispatch_properties_changed () at /lib64/libgobject-2.0.so.0
#8  0x00007f88922f51c1 in g_object_notify () at /lib64/libgobject-2.0.so.0
#9  0x00007f88922ee7a5 in g_closure_invoke () at /lib64/libgobject-2.0.so.0
#10 0x00007f8892300851 in signal_emit_unlocked_R () at /lib64/libgobject-2.0.so.0
#11 0x00007f8892309530 in g_signal_emit_valist () at /lib64/libgobject-2.0.so.0
#12 0x00007f88923098ff in g_signal_emit () at /lib64/libgobject-2.0.so.0
#13 0x00007f88922f2cd4 in g_object_dispatch_properties_changed () at /lib64/libgobject-2.0.so.0
#14 0x00007f88922f51c1 in g_object_notify () at /lib64/libgobject-2.0.so.0
#15 0x00007f889254692f in up_device_glue_proxy_g_properties_changed () at /lib64/libupower-glib.so.3
#16 0x00007f888c3dad30 in ffi_call_unix64 () at /lib64/libffi.so.6
#17 0x00007f888c3da79b in ffi_call () at /lib64/libffi.so.6
#18 0x00007f88922eefc9 in g_cclosure_marshal_generic () at /lib64/libgobject-2.0.so.0
#19 0x00007f88922ee7a5 in g_closure_invoke () at /lib64/libgobject-2.0.so.0
#20 0x00007f8892300dfe in signal_emit_unlocked_R () at /lib64/libgobject-2.0.so.0
#21 0x00007f8892309530 in g_signal_emit_valist () at /lib64/libgobject-2.0.so.0
#22 0x00007f88923098ff in g_signal_emit () at /lib64/libgobject-2.0.so.0
#23 0x00007f889283283d in on_name_owner_changed () at /lib64/libgio-2.0.so.0
#24 0x00007f8892822554 in emit_signal_instance_in_idle_cb () at /lib64/libgio-2.0.so.0
#25 0x00007f8891fefe3a in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#26 0x00007f8891ff01d0 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#27 0x00007f8891ff04f2 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#28 0x00007f8894830255 in gtk_main () at /lib64/libgtk-3.so.0
#29 0x000055b775f934db in xfpm_start (bus=<optimized out>, client_id=0x0, dump=<optimized out>) at xfpm-main.c:240
#30 0x000055b775f93b4f in main (argc=1, argv=0x7ffe2e113e48) at xfpm-main.c:447

I am not sure why it's crashing, I guess there is probably null dereference when parsing some garbage/NULL got from the upower upon it's deinitialization. Well, it's very probably easily fixable xfce4-power-manager bug, so cloning there. But I think the above mentioned systemd issues should be also addressed.

--- Additional comment from Jan Synacek on 2016-02-23 03:04:18 EST ---

Thanks for debugging this!

I'm removing the blocking bug, since it's a separate issue.

--- Additional comment from Jaroslav Škarvada on 2016-03-18 09:39:37 EDT ---

Maybe the underlying problem is more complex and just increasing the timeout value may be not enough. Yesterday when it dumped another core with increased timeout (I increased it to 15 minutes), the dump itself took cca 2 minutes and then the CPU went idle and nothing was happening until it was killed after 15 minutes timeout. It seems there is some deadlock, probably related to the dump:

[  OK  ] Started Show Plymouth Power Off Screen.
[  OK  ] Stoppped Restore /run/initramfs on shutdown.
[   ***] (5 of 8) A stop job is running for LVM PV ... on /dev/dm-0 (24s / 1min 30s)[ 7987.947850] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[*     ] (1 of 8) A stop job is running for LVM PV ... on /dev/dm-0 (1min 30s / no limit)[ 8053.333338] systemd[1]: dev-dm\x2d0.device: Job dev-dm\x2d0.device/stop timed out.
[     *] (2 of 2) A stop job is running for Session 1 of user yarda (3min 16s / 15 min)

So:

> i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment

I am not sure what does it exactly means, but it is probably something from i915 driver and unrelated to this problem

> (1 of 8) A stop job is running for LVM PV ... on /dev/dm-0 (1min 30s / no limit)

Why no limit here?

> systemd[1]: dev-dm\x2d0.device: Job dev-dm\x2d0.device/stop timed out

Why Job dev-dm\x2d0.device? The escape sequence looks strange in the report.

And finally I don't know for what it is waiting for. The systemd-coredump job seems to finish (even if not announced in the logs) and dumped the core. Maybe it didn't correctly exit? Also this was the first time I reproduced this.

--- Additional comment from Jaroslav Škarvada on 2016-03-20 05:36:33 EDT ---

(In reply to Jaroslav Škarvada from comment #5)

> [*     ] (1 of 8) A stop job is running for LVM PV ... on /dev/dm-0 (1min
> 30s / no limit)[ 8053.333338] systemd[1]: dev-dm\x2d0.device: Job
> dev-dm\x2d0.device/stop timed out.

This job timed out after 1:30 even I set 15 minutes:
DefaultTimeoutStopSec=900s

The other job seems to have "no limit" and the third job 15 minutes. Why? Guys it seems totally broken.

--- Additional comment from Jaroslav Škarvada on 2016-03-20 05:38:47 EDT ---

And finally I am able to reproduce the deadlock even with the xfce4-power-manager problem resolved. So there is probably another issue.

--- Additional comment from Jaroslav Škarvada on 2016-04-05 05:50:51 EDT ---

It seems that the problem occur only if there is a coredump created during the machine shutdown after the abrt handler is uninstalled (i.e. after abrt shutdown). I think the systemd code for catching the coredumps should be reviewed as it seems it is causing deadlocks and prolonging the shutdown process. What worse, user/admin is not notified what's going under the hood.

--- Additional comment from Jan Synacek on 2016-05-02 01:54:21 EDT ---

https://github.com/systemd/systemd/pull/3148/commits/3e7a1f50e473a374e1657d2051237e2db04c4db2

--- Additional comment from Zbigniew Jędrzejewski-Szmek on 2016-05-21 23:43:36 EDT ---

This should be fixed in rawhide now. Reassigning to 24, to keep track of the issue there.

--- Additional comment from Fedora Update System on 2016-07-21 02:48:03 EDT ---

systemd-229-9.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-47bda25e7a

--- Additional comment from Fedora Update System on 2016-07-21 14:53:26 EDT ---

systemd-229-9.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-47bda25e7a

--- Additional comment from Fedora Update System on 2016-07-23 14:48:34 EDT ---

systemd-229-9.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

--- Additional comment from Leslie Satenstein on 2017-04-16 14:25:24 EDT ---

Please reopen, the problem is persistant with F25, F26 Alpha, and the nightly builds.

As well, 
the watchdog timer is waiting on some event. It does not report the event for which it is waiting.


If I enter a few ctl-breaks (less than 7), the timeout counter can be reduced to around 30 seconds from 90.  If I do 7+ successive ctl-breaks, the watchdog timer is system is supposed exit for the reboot, but the system locks up, waiting for the event that has not arrived.

--- Additional comment from Jaroslav Škarvada on 2017-04-18 05:47:36 EDT ---

I can also reproduce it sometimes (very seldom, it's much better now), but it maybe different source of the problem than I originally reported. And taking into account account it now occurs in cca. 5% of reboots/shutdowns to me, it's practically undebugeable for me.

Comment 1 Leslie Satenstein 2017-07-17 04:26:44 UTC
Fedora 26 problem is not as frequent as F25 was.  

Not sure as it happens if one runs a terminal program and kills the app,  leaving a defunct (zombie).

7 control alt dels   stops the count down, but system does not reboot.

That becomes a manual process.

Comment 2 Leslie Satenstein 2017-12-07 03:39:42 UTC
Fixed for F27 and works with Rawhide (F28)


Note You need to log in before you can comment on or make changes to this bug.