Bug 1829096 - [regression in 5.6] hang after resuming from suspend
Summary: [regression in 5.6] hang after resuming from suspend
Keywords:
Status: CLOSED DUPLICATE of bug 1830150
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 32
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-28 21:08 UTC by Dominik 'Rathann' Mierzejewski
Modified: 2020-05-05 10:07 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-05 10:07:10 UTC
Type: Bug
Embargoed:


Attachments
dmesg from booting Fedora 32 kernel-5.6.7-300.fc32.x86_64 (77.80 KB, text/plain)
2020-04-28 21:08 UTC, Dominik 'Rathann' Mierzejewski


Links
System ID Private Priority Status Summary Last Updated
Linux Kernel 207491 0 None None None 2020-04-28 21:08:12 UTC

Description Dominik 'Rathann' Mierzejewski 2020-04-28 21:08:12 UTC
Created attachment 1682630
dmesg from booting Fedora 32 kernel-5.6.7-300.fc32.x86_64

1. Please describe the problem:
After upgrading from 5.5.17 to 5.6.7 (Fedora 31 and 32), the machine no longer resumes successfully from suspend. The screen comes back on, but there is no reaction to mouse or keyboard input, and the machine remains unreachable over the network (iwlwifi). The Fedora rawhide kernel-5.7.0-0.rc3.1.fc33 is affected as well.

2. What is the Version-Release number of the kernel:
5.6.7-300.fc32.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
5.6.6-300.fc32.x86_64
I haven't tried all 5.6 kernels yet, but 

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
1. Boot 5.6.6 or later kernel.
2. Suspend.
3. Resume (press power button).

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
Yes.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
Attached.

Comment 1 Dominik 'Rathann' Mierzejewski 2020-04-29 07:57:34 UTC
5.6.0-300.fc32 also suffers from this.

I managed to get an Oops with a partial backtrace after adding no_console_suspend to the kernel command line:
[   86.898573] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   86.900150] OOM killer disabled.
[   86.900165] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   86.901579] wlp1s0: deauthenticating from aa:bb:cc:dd:ee:ff by local choice (Reason: 3=DEAUTH_LEAVING)
[   86.907405] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[   86.910092] sd 0:0:0:0: [sda] Stopping disk
[   86.939714] Removing pn544
[   87.216904] PM: suspend devices took 0.315 seconds
[   87.230238] ACPI: EC: interrupt blocked
[   87.244083] ACPI: Preparing to enter system sleep state S3
[   87.245040] ACPI: EC: event blocked
[   87.245046] ACPI: EC: EC stopped
[   87.245049] PM: Saving platform NVS memory
[   87.245306] Disabling non-boot CPUs ...
[   87.246142] IRQ 16: no longer affine to CPU1
[   87.247209] smpboot: CPU 1 is now offline
[   87.252696] IRQ 45: no longer affine to CPU2
[   87.253848] smpboot: CPU 2 is now offline
[   87.256900] IRQ 23: no longer affine to CPU3
[   87.256902] IRQ 43: no longer affine to CPU3
[   87.256905] IRQ 49: no longer affine to CPU3
[   87.257919] smpboot: CPU 3 is now offline
[   87.261044] ACPI: Low-level resume complete
[   87.261115] ACPI: EC: EC started
[   87.261118] PM: Restoring platform NVS memory
[   87.265116] Enabling non-boot CPUs ...
[   87.265175] x86: Booting SMP configuration:
[   87.265179] smpboot: Booting Node 0 Processor 1 APIC 0x2
[   87.268461] CPU1 is up
[   87.268508] smpboot: Booting Node 0 Processor 2 APIC 0x1
[   87.269612] CPU2 is up
[   87.269649] smpboot: Booting Node 0 Processor 3 APIC 0x3
[   87.270604] CPU3 is up
[   87.272498] ACPI: Waking up from system sleep state S3
[   87.274019] ACPI: button: The lid device is not compliant to SW_LID.
[   87.274453] ACPI: EC: interrupt unblocked
[   87.298871] ACPI: EC: event unblocked
[   87.309328] sd 0:0:0:0: [sda] Starting disk
[   87.313944] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[   87.314115] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[   87.315003] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[   87.315012] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   87.315014] #PF: supervisor read access in kernel mode
[   87.315015] #PF: error_code(0x0000) - not-present page
[   87.315016] PGD 0 P4D 0
[   87.315019] Oops: 0000 [#1] SMP PTI
[   87.315021] CPU: 0 PID: 1796 Comm: systemd-sleep Not tainted 5.6.7-300.fc32.x86_64 #1
[   87.315023] Hardware name: Sony Corporation SVP1322C5E/VAIO, BIOS R2091V7 03/24/2014
[   87.315028] RIP: 0010:sony_nc_resume+0x1de/0x200 [sony_laptop]
[   87.315030] Code: ff ff ff e9 40 ff ff ff 4c 89 e2 be 00 01 00 00 bf 22 01 00 00 e8 f2 df ff ff 85 c0 75 23 0f b6 44 24 0c 48 8b 15 12 97 00 00 <8b> 3a 39 c7 0f 84 14 ff ff ff 0f b7 ff e8 30 e0 ff ff e9 07 ff ff
[   87.315032] RSP: 0018:ffffa98140953d10 EFLAGS: 00010282
[   87.315034] RAX: 00000000fffffffb RBX: 000000000000000f RCX: 000000000000937d
[   87.315036] RDX: 0000000000000000 RSI: c5f11fee0037bf32 RDI: 0000000000030080
[   87.315037] RBP: ffff8c69963ab260 R08: 0000000000000461 R09: 0000000000000029
[   87.315039] R10: ffff8c696c4a59a0 R11: 0000000000000000 R12: ffffa98140953d1c
[   87.315040] R13: 0000000000000000 R14: ffffffffba3df2e1 R15: 0000000000000010
[   87.315042] FS:  00007f45e0ab2b80(0000) GS:ffff8c6997a00000(0000) knlGS:0000000000000000
[   87.315044] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000000050033
[   87.315045] CR2: 0000000000000000 CR3: 00000001fcc7a001 CR4: 00000000001606f0
[   87.315047] Call Trace:
[   87.315052]  ? _cond_resched+0x16/0x40
[   87.315054]  ? sony_nc_thermal_mode_show+0x60/0x60 [sony_laptop]
[   87.315057]  dpm_run_callback+0x4f/0x140
[   87.315059]  device_resume+0x136/0x200
[   87.315062]  dpm_resume+0xce/0x2e0
[   87.315064]  dpm_resume_end+0xd/0x20
[   87.628326] ata1.00: configured for UDMA/133
[remaining console output garbled]
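
For anyone triaging a similar log, the faulting function and module can be pulled out of the RIP line mechanically. The following is a minimal sketch, assuming the x86-64 oops layout shown above; `oops_rip` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Print "symbol module" for every RIP line in a saved kernel log.
# Assumes the layout "RIP: 0010:func+0xoff/0xlen [module]" seen in this trace.
# RIP lines for built-in code (no "[module]" suffix) are skipped.
oops_rip() {
    sed -n 's/.*RIP: [0-9a-f]*:\([A-Za-z0-9_.]*\)+0x[0-9a-f]*\/0x[0-9a-f]* \[\([A-Za-z0-9_]*\)].*/\1 \2/p' "$@"
}
```

Called as `oops_rip dmesg.txt`, or with the log piped on standard input. In the trace above, the marked bytes `<8b> 3a` are a 32-bit load through RDX, and RDX is 0, which matches the CR2 value of 0000000000000000. For a full disassembly of the Code: line, the kernel source tree ships scripts/decodecode.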

Comment 2 Hans de Goede 2020-04-29 09:28:14 UTC
OK, so this seems to be a bug in the sony-laptop driver. Let's try blacklisting it as a first step towards debugging this.

Can you try adding "modprobe.blacklist=sony-laptop" to your kernel command line?

Comment 3 Hans de Goede 2020-04-29 09:36:01 UTC
p.s.

After adding the kernel command line option and rebooting, please run "lsmod | grep sony". This should not show sony-laptop; if it does, then the blacklisting did not work for some reason.
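
The same check can be scripted. This is a minimal sketch, assuming a /proc/modules-style listing where the module name is the first field; `module_loaded` is a hypothetical helper name. Note that the kernel lists the module as sony_laptop (underscore), even though modprobe also accepts sony-laptop.

```shell
#!/bin/sh
# Succeed if the named module appears in a /proc/modules-style listing.
# $1: module name (dashes are converted to underscores, as the kernel lists them)
# $2: file to read; defaults to /proc/modules
module_loaded() {
    name=$(printf '%s' "$1" | tr '-' '_')
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

For example, `module_loaded sony-laptop && echo "still loaded"` prints the message only if the blacklisting did not take effect.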

Comment 4 Dominik 'Rathann' Mierzejewski 2020-04-29 11:01:14 UTC
Plain modprobe.blacklist doesn't work, but rd.driver.blacklist together with "blacklist sony-laptop" in /etc/modprobe.d/sony-laptop.conf does work. Also, resume works fine without the sony-laptop module loaded. Thanks.
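
For reference, the two pieces of that setup could be put in place roughly as follows. This is a sketch only: `write_blacklist` is a hypothetical helper, and the grubby invocation in the comment is one common way to edit the kernel command line on Fedora, not necessarily what was used here.

```shell
#!/bin/sh
# Write a modprobe.d blacklist entry for one module.
# $1: module name
# $2: target directory; defaults to /etc/modprobe.d (needs root)
write_blacklist() {
    module=$1
    dir=${2:-/etc/modprobe.d}
    printf 'blacklist %s\n' "$module" > "$dir/$module.conf"
}

# As root:
#   write_blacklist sony-laptop
# The dracut option for the initramfs stage goes on the kernel command line, e.g.:
#   grubby --update-kernel=ALL --args="rd.driver.blacklist=sony-laptop"
```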

Comment 5 Dominik 'Rathann' Mierzejewski 2020-04-29 20:56:37 UTC
FWIW, I don't see any commits between 5.5.17 and 5.6.7 touching the sony_laptop module, but there are some new errors in dmesg compared to 5.5.17:
[   18.419670] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[   18.422698] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[   18.424856] sony_laptop: couldn't set up keyboard backlight function (-22)
[   18.428007] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[   18.430306] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[   18.433781] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[   18.435843] sony_laptop: No USB Charge capability found
[   18.438865] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[   18.441430] sony_laptop: couldn't set up lid resume function (-5)
[   18.443902] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[   18.446393] sony_laptop: couldn't to read the thermal profiles
[   18.448270] sony_laptop: couldn't set up thermal profile function (-22)
[   18.452028] sony_laptop: SNC setup done.

Comment 6 Dominik 'Rathann' Mierzejewski 2020-04-29 21:28:20 UTC
Instead of blacklisting the module altogether, I added a work-around to unload the module for suspend/resume only:

$ cat /usr/lib/systemd/system-sleep/sony-laptop 
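#!/bin/sh
# systemd-sleep hook: $1 is "pre" before sleeping and "post" after waking.
# Work-around for https://bugzilla.redhat.com/show_bug.cgi?id=1829096
if [ "${1}" = "pre" ]; then
  # Unload the driver so that its resume callback never runs.
  logger --journald <<__end
MESSAGE=removing sony-laptop module before suspending, see https://bugzilla.redhat.com/show_bug.cgi?id=1829096
__end
  modprobe -r sony-laptop
elif [ "${1}" = "post" ]; then
  # Load the driver again once the system is back up.
  logger --journald <<__end
MESSAGE=reloading sony-laptop module after resuming, see https://bugzilla.redhat.com/show_bug.cgi?id=1829096
__end
  modprobe sony-laptop
fi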

Seems to work reliably.
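
systemd-sleep calls each hook in that directory with two arguments: "pre" or "post", and the name of the sleep operation (for example "suspend"). To check the dispatch without root and without touching the module, a dry-run sketch like the one below can help; `hook_action` is a hypothetical name, and it only prints the command the hook would run.

```shell
#!/bin/sh
# Print the command a sony-laptop sleep hook would run for the given arguments.
# $1: "pre" or "post"; $2: sleep operation (unused here, as in the hook above)
hook_action() {
    case "$1" in
        pre)  echo "modprobe -r sony-laptop" ;;
        post) echo "modprobe sony-laptop" ;;
        *)    : ;;   # unknown phase: do nothing
    esac
}
```

For example, `hook_action pre suspend` prints the unload command and `hook_action post suspend` prints the reload command.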

Comment 7 Hans de Goede 2020-04-30 09:36:59 UTC
Hmm, OK, so this regression is likely caused by some changes inside the ACPI subsystem. I'm afraid that the best way to track this down is to do a kernel bisect between 5.5.0 and 5.6.0 then. I know this is a bit time-consuming, but it really is the easiest way to find the commit which causes this issue.
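
A bisect halves the candidate range at every step, so the number of build-and-test rounds grows only with the logarithm of the number of commits. The sketch below estimates that count and outlines a typical session in comments; `bisect_steps` is a hypothetical helper, and v5.5 and v5.6 are assumed to be the upstream tags matching the versions named above.

```shell
#!/bin/sh
# Rough number of test rounds a bisect needs for N candidate commits:
# the smallest k with 2^k >= N.
bisect_steps() {
    n=$1
    k=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        k=$((k + 1))
    done
    echo "$k"
}

# Typical session inside a kernel git checkout:
#   git bisect start
#   git bisect bad v5.6      # resume hangs
#   git bisect good v5.5     # resume works
#   ...build, boot, suspend and resume, then mark the result:
#   git bisect good          # or: git bisect bad
#   git bisect reset         # once the first bad commit is reported
```

For example, `bisect_steps 1000` prints 10, so even a range of a thousand commits needs only about ten kernel builds.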

Comment 8 William Bader 2020-05-04 20:24:11 UTC
>the best way to track this down is to do a kernel bisect between 5.5.0 and 5.6.0 then.

I bisected it at https://bugzilla.redhat.com/show_bug.cgi?id=1830150#c24

Comment 9 Hans de Goede 2020-05-05 10:07:10 UTC

*** This bug has been marked as a duplicate of bug 1830150 ***

