Bug 1849196 - [ARK] kernel bug list_del corruption on s390x from stress-ng mknod and stress-ng symlink
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: s390x
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ZedoraTracker
Reported: 2020-06-19 19:40 UTC by Jeff Bastian
Modified: 2021-06-18 07:18 UTC
CC List: 25 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments
5.7.2-88cd4de.cki-console.log.xz (41.57 KB, application/octet-stream)
2020-06-19 19:47 UTC, Jeff Bastian
no flags
5.7.4-4418f34.cki-console.log.xz (35.29 KB, application/octet-stream)
2020-06-19 19:50 UTC, Jeff Bastian
no flags
5.7.4-9b26e20.cki-console.log.xz (35.64 KB, application/octet-stream)
2020-06-19 19:50 UTC, Jeff Bastian
no flags
5.7.3-bbd1511.cki-console.log.xz (40.76 KB, application/octet-stream)
2020-06-19 19:50 UTC, Jeff Bastian
no flags
5.7.4-inf.cki-console.log.xz (40.48 KB, application/octet-stream)
2020-06-19 19:51 UTC, Jeff Bastian
no flags
5.8.0-rc1-c1f840d.cki-console.log.xz (37.09 KB, application/octet-stream)
2020-06-19 19:51 UTC, Jeff Bastian
no flags


Links
System: IBM Linux Technology Center | ID: 186309 | Private: 0 | Priority: None | Status: None | Summary: None | Last Updated: 2020-06-20 16:13:51 UTC

Description Jeff Bastian 2020-06-19 19:40:49 UTC
1. Please describe the problem:
The stress-ng mknod and symlink stressors trigger a kernel BUG in the ARK kernel on s390x:

mknod:
[ 1256.534428] list_del corruption. next->prev should be 000003e000be7c98, but was 00000001a7a6d1b0
[ 1256.534463] ------------[ cut here ]------------
[ 1256.534466] kernel BUG at lib/list_debug.c:54!
[ 1256.534535] monitor event: 0040 ilc:2 [#1] SMP
[ 1256.534540] Modules linked in: ...<snip>...
[ 1256.534806] CPU: 2 PID: 582352 Comm: stress-ng-mknod Kdump: loaded Not tainted 5.8.0-rc1-c1f840d.cki #1
[ 1256.534810] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0)
...

symlink:
[ 1754.761295] list_del corruption. prev->next should be 000003e000e27a68, but was 00000001d6daa1f0
[ 1754.880585] ------------[ cut here ]------------
[ 1754.880588] kernel BUG at lib/list_debug.c:51!
[ 1754.880656] monitor event: 0040 ilc:2 [#1] SMP
[ 1754.880662] Modules linked in: ...<snip>...
[ 1754.880738] CPU: 3 PID: 592107 Comm: stress-ng-symli Kdump: loaded Not tainted 5.7.2-88cd4de.cki #1
[ 1754.880740] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0)
...

2. What is the Version-Release number of the kernel:
Several recent ARK kernel builds, including:
5.8.0-rc1-c1f840d.cki
5.7.4-9b26e20.cki
5.7.4-4418f34.cki
5.7.4-inf.cki
5.7.3-bbd1511.cki
5.7.2-88cd4de.cki

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Judging by the CKI logs, this first appeared in 5.7.2-88cd4de.cki

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Build and run stress-ng and focus on the mknod stressor.

git clone git://kernel.ubuntu.com/cking/stress-ng.git
cd stress-ng
git checkout -b V0.09.56 V0.09.56
make
./stress-ng --mknod 0 --timeout 5 --log-file mknod.log

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Unknown, but I can try the Rawhide kernel if it's valuable.

6. Are you running any modules that are not shipped directly with Fedora's kernel?

No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Coming soon.
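The reproducer in step 4 drives stress-ng itself. When narrowing this down, it may help to know which syscalls were in flight: the mknod trace fails inside unlink() (do_unlinkat -> xfs_remove), and the symlink trace fails inside symlink() while the security xattr is being set (do_symlinkat -> xfs_init_security); both paths end in __xfs_trans_commit -> xfs_log_commit_cil -> remove_wait_queue. The loop below is a hypothetical sketch of that create-then-unlink pattern, not stress-ng's implementation, and it has not been shown to trigger this bug. `churn_nodes()` and the `CHURN_*` names are made up for illustration, and the target directory is assumed to live on the XFS file system under test.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical selector: which create call to pair with unlink(). */
enum churn_kind { CHURN_MKNOD, CHURN_SYMLINK };

/*
 * Create and immediately unlink `count` entries in `dir`.
 * CHURN_MKNOD creates FIFOs with mknod(), which needs no CAP_MKNOD;
 * CHURN_SYMLINK creates dangling symlinks with symlink().
 * Returns the number of create+unlink cycles that fully succeeded.
 */
static int churn_nodes(const char *dir, int count, enum churn_kind kind)
{
    char path[4096];
    int ok = 0;

    assert(dir != NULL && count >= 0);
    for (int i = 0; i < count; i++) {
        snprintf(path, sizeof path, "%s/churn-%d", dir, i);

        int rc = (kind == CHURN_MKNOD)
                     ? mknod(path, S_IFIFO | 0600, 0)
                     : symlink("does-not-exist", path);
        if (rc != 0) {
            perror(kind == CHURN_MKNOD ? "mknod" : "symlink");
            continue;
        }
        /* The mknod trace above fails on this unlink() path. */
        if (unlink(path) != 0) {
            perror("unlink");
            continue;
        }
        ok++;
    }
    return ok;
}
```

stress-ng's `--mknod 0` starts one worker per online CPU, so running several copies of such a loop in parallel against the same XFS mount would be closer to the original test than a single process.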

Comment 1 Jeff Bastian 2020-06-19 19:47:22 UTC
Created attachment 1698160
5.7.2-88cd4de.cki-console.log.xz

serial console log from kernel 5.7.2-88cd4de.cki

Comment 2 Jeff Bastian 2020-06-19 19:50:51 UTC
Created attachment 1698163
5.7.4-4418f34.cki-console.log.xz

serial console log from kernel 5.7.4-4418f34.cki

Comment 3 Jeff Bastian 2020-06-19 19:50:54 UTC
Created attachment 1698164
5.7.4-9b26e20.cki-console.log.xz

serial console log from kernel 5.7.4-9b26e20.cki

Comment 4 Jeff Bastian 2020-06-19 19:50:58 UTC
Created attachment 1698165
5.7.3-bbd1511.cki-console.log.xz

serial console log from kernel 5.7.3-bbd1511.cki

Comment 5 Jeff Bastian 2020-06-19 19:51:01 UTC
Created attachment 1698166
5.7.4-inf.cki-console.log.xz

serial console log from kernel 5.7.4-inf.cki

Comment 6 Jeff Bastian 2020-06-19 19:51:05 UTC
Created attachment 1698167
5.8.0-rc1-c1f840d.cki-console.log.xz

serial console log from kernel 5.8.0-rc1-c1f840d.cki

Comment 8 Jeff Bastian 2020-06-19 20:00:02 UTC
The full trace from kernel 5.8.0-rc1-c1f840d.cki

[ 1256.534428] list_del corruption. next->prev should be 000003e000be7c98, but was 00000001a7a6d1b0
[ 1256.534463] ------------[ cut here ]------------
[ 1256.534466] kernel BUG at lib/list_debug.c:54!
[ 1256.534535] monitor event: 0040 ilc:2 [#1] SMP
[ 1256.534540] Modules linked in: loop binfmt_misc psnap llc salsa20_generic camellia_generic cast6_generic cast_common serpent_generic twofish_generic twofish_common ofb lrw tgr192 wp512 rmd320 rmd256 rmd160 rmd128 md4 lcs ctcm fsm zfcp scsi_transport_fc dasd_fba_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc qeth_l2 qeth qdio ccwgroup vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio drm drm_panel_orientation_quirks backlight i2c_core ip_tables xfs libcrc32c crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha512_s390 sha256_s390 sha1_s390 sha_common dasd_eckd_mod dasd_mod pkey zcrypt
[ 1256.534806] CPU: 2 PID: 582352 Comm: stress-ng-mknod Kdump: loaded Not tainted 5.8.0-rc1-c1f840d.cki #1
[ 1256.534810] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0)
[ 1256.534820] Krnl PSW : 0404e00180000000 000000009d9b173c (__list_del_entry_valid+0x8c/0xb8)
[ 1256.534831]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 1256.534833] Krnl GPRS: 0000000000000064 000000009e5eace8 0000000000000054 0000001f5042a08
[ 1256.534834]            00000001f5051800 0000000000000000 0000000190040180 00000001f1c86570
[ 1256.534836]            00000001f1c86620 070000009ddc1582 000003e000be7c80 00000001a7a6d1a8
[ 1256.534837]            00000001f11ac000 00000001e44f1400 000000009d9b1738 000003e000be7b30
[ 1256.534851] Krnl Code: 000000009d9b172c: e33010000004        lg      %r3,0(%r1)
[ 1256.534851]            000000009d9b1732: c0e5ffdda063        brasl   %r14,000000009d5657f8
[ 1256.534851]           #000000009d9b1738: af000000            mc      0,0
[ 1256.534851]           >000000009d9b173c: b9040032            lgr     %r3,%r2
[ 1256.534851]            000000009d9b1740: c020003251ab        larl    %r2,000000009dffba96
[ 1256.534851]            000000009d9b1746: c0e5ffdda059        brasl   %r14,000000009d5657f8
[ 1256.534851]            000000009d9b174c: af000000            mc      0,0
[ 1256.534851]            000000009d9b1750: b9040032            lgr     %r3,%r2
[ 1256.534866] Call Trace:
[ 1256.534868]  [<000000009d9b173c>] __list_del_entry_valid+0x8c/0xb8
[ 1256.534871] ([<000000009d9b1738>] __list_del_entry_valid+0x88/0xb8)
[ 1256.534876]  [<000000009d54d800>] remove_wait_queue+0x48/0xa0
[ 1256.535024]  [<000003ff8019f3d0>] xfs_log_commit_cil+0x900/0xa50 [xfs]
[ 1256.535056]  [<000003ff80197704>] __xfs_trans_commit+0x9c/0x3a8 [xfs]
[ 1256.535089]  [<000003ff80189a9c>] xfs_remove+0x274/0x328 [xfs]
[ 1256.535121]  [<000003ff80183962>] xfs_vn_unlink+0x5a/0xa8 [xfs]
[ 1256.535126]  [<000000009d768874>] vfs_unlink+0x134/0x250
[ 1256.535128]  [<000000009d76d00a>] do_unlinkat+0x1ba/0x318
[ 1256.535133]  [<000000009ddc692c>] system_call+0xe0/0x2b0
[ 1256.535134] Last Breaking-Event-Address:
[ 1256.810858]  [<000000009ddc7b40>] __s390_indirect_jump_r14+0x0/0xc
[ 1256.810925] ---[ end trace d4f63cd47d1c630e ]---

Comment 9 Jeff Bastian 2020-06-19 20:07:50 UTC
The full trace from kernel 5.7.2-88cd4de.cki

[ 1754.761295] list_del corruption. prev->next should be 000003e000e27a68, but was 00000001d6daa1f0
[ 1754.880585] ------------[ cut here ]------------
[ 1754.880588] kernel BUG at lib/list_debug.c:51!
[ 1754.880656] monitor event: 0040 ilc:2 [#1] SMP
[ 1754.880662] Modules linked in: unix_diag binfmt_misc psnap llc salsa20_generic camellia_generic cast6_generic cast_common serpent_generic twofish_generic twofish_common ofb lrw tgr192 wp512 rmd320 rmd256 rmd160 rmd128 md4 loop tun af_key crypto_user scsi_transport_iscsi xt_multiport overlay xt_CONNSECMARK xt_SECMARK nft_counter xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink ah6 ah4 sctp lcs ctcm fsm zfcp scsi_transport_fc dasd_fba_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc qeth_l2 qeth qdio ccwgroup vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio drm drm_panel_orientation_quirks backlight i2c_core ip_tables xfs libcrc32c crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha512_s390 sha256_s390 sha1_s390 sha_common dasd_eckd_mod dasd_mod pkey zcrypt
[ 1754.880738] CPU: 3 PID: 592107 Comm: stress-ng-symli Kdump: loaded Not tainted 5.7.2-88cd4de.cki #1
[ 1754.880740] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0)
[ 1754.880742] Krnl PSW : 0404e00180000000 0000000021651180 (__list_del_entry_valid+0xa0/0xb8)
[ 1754.880754]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 1754.880756] Krnl GPRS: 0000000000000064 000000002226fbe8 0000000000000054 00000001f509fa10
[ 1754.880757]            00000001f50ae408 0000000000000000 0000000184ca0d30 00000001f104aae0
[ 1754.880759]            00000001f104ab90 0700000021a58c52 000003e000e27a50 00000001d6daa1e8
[ 1754.880760]            0000000089acc000 00000001e4125c00 000000002165117c 000003e000e27900
[ 1754.880768] Krnl Code: 0000000021651170: c0200031e8a3        larl    %r2,0000000021c8e2b6
[ 1754.880768]            0000000021651176: c0e5ffddf2d9        brasl   %r14,000000002120f728
[ 1754.880768]           #000000002165117c: af000000            mc      0,0
[ 1754.880768]           >0000000021651180: b9040032            lgr     %r3,%r2
[ 1754.880768]            0000000021651184: c0200031e87d        larl    %r2,0000000021c8e27e
[ 1754.880768]            000000002165118a: c0e5ffddf2cf        brasl   %r14,000000002120f728
[ 1754.880768]            0000000021651190: af000000            mc      0,0
[ 1754.880768]            0000000021651194: 0707                bcr     0,%r7
[ 1754.880783] Call Trace:
[ 1754.880786]  [<0000000021651180>] __list_del_entry_valid+0xa0/0xb8
[ 1754.880789] ([<000000002165117c>] __list_del_entry_valid+0x9c/0xb8)
[ 1754.880793]  [<00000000211f77d8>] remove_wait_queue+0x48/0xa0
[ 1754.884725]  [<000003ff801e2240>] xfs_log_commit_cil+0x900/0xa50 [xfs]
[ 1754.884763]  [<000003ff801da55c>] __xfs_trans_commit+0x9c/0x3a8 [xfs]
[ 1754.884795]  [<000003ff8015dfd8>] xfs_attr_try_sf_addname+0x68/0xc8 [xfs]
[ 1754.884827]  [<000003ff8015f09a>] xfs_attr_set_args+0x9a/0x128 [xfs]
[ 1754.885084]  [<000003ff8015f37e>] xfs_attr_set+0x1be/0x2f8 [xfs]
[ 1754.885117]  [<000003ff801c61e8>] xfs_initxattrs+0x98/0xb8 [xfs]
[ 1754.885122]  [<000000002157d542>] security_inode_init_security+0x152/0x160
[ 1754.885154]  [<000003ff801c6144>] xfs_init_security+0x2c/0x38 [xfs]
[ 1754.885187]  [<000003ff801c7bc8>] xfs_vn_symlink+0xb0/0x1d0 [xfs]
[ 1754.885190]  [<000000002140d7ae>] vfs_symlink+0xfe/0x1c8
[ 1754.885192]  [<000000002141020a>] do_symlinkat+0xa2/0xf8
[ 1754.885196]  [<0000000021a5dfd0>] system_call+0xdc/0x2c8
[ 1754.885197] Last Breaking-Event-Address:
[ 1754.885199]  [<0000000021a5f560>] __s390_indirect_jump_r14+0x0/0xc
[ 1754.885249] ---[ end trace 9e0dbe149edf1c8a ]---
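Both full traces fail in the same check: remove_wait_queue() calls list_del() on a wait-queue entry, and __list_del_entry_valid() in lib/list_debug.c finds that a neighbour of that entry no longer points back at it. In each message, the "should be" address is the entry being deleted and the "but was" address is what the neighbour actually holds. The user-space model below is a simplified sketch of those two neighbour checks, not the kernel source: the real function also checks for the LIST_POISON1/LIST_POISON2 markers and reports through CHECK_DATA_CORRUPTION(), which here produced the BUG at list_debug.c:51 (prev->next) and list_debug.c:54 (next->prev). The names `list_del_entry_valid()` and `list_del_checked()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified user-space model of the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    assert(head != NULL);
    head->next = head;
    head->prev = head;
}

/* Insert `entry` right after `head`, like the kernel's list_add(). */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Model of the two neighbour checks; returns false instead of calling BUG(). */
static bool list_del_entry_valid(const struct list_head *entry)
{
    const struct list_head *prev = entry->prev;
    const struct list_head *next = entry->next;

    if (prev->next != entry) {
        fprintf(stderr, "list_del corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)prev->next);
        return false;
    }
    if (next->prev != entry) {
        fprintf(stderr, "list_del corruption. next->prev should be %p, but was %p\n",
                (void *)entry, (void *)next->prev);
        return false;
    }
    return true;
}

/* Unlink `entry` only if both neighbours still point back at it. */
static bool list_del_checked(struct list_head *entry)
{
    if (!list_del_entry_valid(entry))
        return false;
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    /* The kernel writes LIST_POISON1/2 here; NULL keeps the model simple. */
    entry->next = NULL;
    entry->prev = NULL;
    return true;
}
```

Overwriting the list head's `prev` pointer after building a two-element list makes the model report the same "next->prev should be ..." message seen in the mknod trace; overwriting `next` instead reproduces the "prev->next" form from the symlink trace.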

Comment 10 Jeff Bastian 2020-06-19 20:20:02 UTC
Hanns-Joachim, can you mirror this for IBM BZ?

Comment 11 IBM Bug Proxy 2020-06-22 10:51:36 UTC
------- Comment From geraldsc.com 2020-06-22 06:41 EDT-------
This looks like a common code / XFS issue and should probably be reported to the XFS maintainers. No s390 code is involved here.

I also cannot reproduce this on my ext4 system. Can you verify that this only shows up with XFS? Does it also show up on other architectures?

Comment 12 Jeff Bastian 2020-06-24 18:36:05 UTC
The same tests pass on x86_64, ppc64le, and aarch64. They fail only on s390x, and they fail fairly regularly.

I'll try ext4 and see what happens.

Comment 13 Jeff Bastian 2020-07-10 21:02:55 UTC
I ran the xfs tests four times on an ext4 file system and could not reproduce the problem.

