Bug 1762199 - RAID6 Grow causes md to crash in aarch64 AMD Seattle
Summary: RAID6 Grow causes md to crash in aarch64 AMD Seattle
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 30
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jes Sorensen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARMTracker
 
Reported: 2019-10-16 07:58 UTC by Sahaj Sarup
Modified: 2020-01-09 17:04 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-09 17:04:36 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
dump (62.78 KB, text/plain)
2019-10-16 07:59 UTC, Sahaj Sarup

Description Sahaj Sarup 2019-10-16 07:58:43 UTC
Description of problem:

Growing a RAID6 array with mdadm causes the Linux md kernel modules to crash, and the crash repeats on every reboot.

Hardware:

SoftIron Overdrive 3000 with 16 GB of ECC memory.

Version-Release number of selected component (if applicable):

Logs from: 5.2.18-200.fc30.aarch64
Also tested on 5.0.xx-fc30 and 5.3.xx-fc31

How reproducible:
Very

Steps to Reproduce:

Setup:
1. 5x 4 TB HDDs, a mix of WD and Seagate
2. Each HDD is GPT-partitioned with a single 3.6 TB ext4 partition
3. Let the partitions be sda1, sdb1, sdc1, sdd1, sde1
4. mdadm will use these ext4 partitions, not the raw drives

Create RAID6 with four drives:
1. mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
2. mkfs.ext4 /dev/md0
3. Reboot to make sure raid holds.

Grow RAID6 by adding the 5th drive
1. mdadm --add /dev/md0 /dev/sde1
2. mdadm -v --grow --raid-devices=5 /dev/md0
3. It is at this point that the crash occurs.


Actual results:

- dmesg reports the crash dump (see attached log)
- /proc/mdstat reports an infinitely increasing ETA, similar to:
[>....................] reshape = 0.0% (370132/2930265088) finish=6389349238472939.1min speed=0.0K/sec
- all mdadm commands hang.

Expected results:
- Reshape continues without issues and the RAID remains active

Additional Info:

I have already tried the following things, all resulting in the same crash dump.
- physically disconnecting the 5th drive
- changing all the cables
- noncq
- using a PCIe SATA card instead of the onboard SATA ports

Fedora was always booted with acpi=force.

Comment 1 Sahaj Sarup 2019-10-16 07:59:27 UTC
Created attachment 1626328 [details]
dump

Comment 2 Peter Robinson 2019-10-16 08:44:39 UTC
Can you have a look at this, or advise who would be best to look at it? I'm concerned there's a possible data-loss issue with an mdraid RAID-6 crash on aarch64.

Comment 3 Mark Salter 2019-10-16 15:51:03 UTC
I'll see if I can reproduce with the 5 puny half-TB drives I have immediate access to.

Comment 4 Sahaj Sarup 2019-10-17 18:49:38 UTC
Bit of an update on the situation that might help. (Also this is just FYI, not asking for data recovery help)

I was able to continue the reshape process on an x86 box and ended up adding the 5th drive as an active RAID disk.
At first all seemed fine, but when I ran fsck on the md0 device it turned out the entire ext4 filesystem on it had been corrupted pretty badly.

Now, whether this happened because I tried to recover the RAID numerous times instead of just plugging the drives into another system, or whether it was caused by the mdraid crash, I'm not sure.

The fsck output was an endless list of: inode <big number in billions> seems to contain garbage. Clear?

Comment 5 Mark Salter 2019-10-17 19:20:39 UTC
Thanks for the update. I reproduced on my Seattle running the 5.3.5-200.fc30 kernel. I'll keep digging at it.

Comment 6 Mark Salter 2019-10-21 13:54:02 UTC
So the backtrace really tells the story here. There's a NULL-pointer dereference in __list_del_entry_valid(). The caller, ccp_tx_submit(), has this:

  static dma_cookie_t ccp_tx_submit(struct dma_async_tx_descriptor *tx_desc)
  {
        ...
        list_del(&desc->entry);
        list_add_tail(&desc->entry, &chan->pending);
        ...
  }


Apparently, given the right set of circumstances, desc->entry is uninitialized, leading to a NULL-pointer dereference. This makes the problem go away:

diff --git a/drivers/crypto/ccp/ccp-dmaengine.c b/drivers/crypto/ccp/ccp-dmaengine.c
index a54f9367a580..0770a83bf1a5 100644
--- a/drivers/crypto/ccp/ccp-dmaengine.c
+++ b/drivers/crypto/ccp/ccp-dmaengine.c
@@ -342,6 +342,7 @@ static struct ccp_dma_desc *ccp_alloc_dma_desc(struct ccp_dma_chan *chan,
        desc->tx_desc.flags = flags;
        desc->tx_desc.tx_submit = ccp_tx_submit;
        desc->ccp = chan->ccp;
+       INIT_LIST_HEAD(&desc->entry);
        INIT_LIST_HEAD(&desc->pending);
        INIT_LIST_HEAD(&desc->active);
        desc->status = DMA_IN_PROGRESS;

Most code paths end up adding ccp_dma_desc->entry to another list, which implicitly initializes it and avoids the problem. I'm still trying to understand the bad path to make sure it is valid. Once I'm satisfied, I'll post a patch upstream.
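
To make the failure mode concrete, here is a small stand-alone userspace model of the list primitives involved (a simplification for illustration only, not the driver's or the kernel's actual code). A descriptor whose embedded list_head is still all zeroes has entry.prev == entry.next == NULL, so the pointer surgery in list_del() dereferences NULL; a node that has been through INIT_LIST_HEAD() is a valid one-element list and survives the same list_del() + list_add_tail() sequence that ccp_tx_submit() performs:

  /* Simplified userspace model of the crash -- illustration only. */
  #include <stdio.h>
  #include <stdlib.h>

  struct list_head {
          struct list_head *next, *prev;
  };

  static void INIT_LIST_HEAD(struct list_head *h)
  {
          h->next = h;
          h->prev = h;
  }

  /* Same pointer surgery as the kernel's list_del() (minus poisoning). */
  static void list_del(struct list_head *e)
  {
          e->prev->next = e->next;        /* NULL dereference if e->prev is NULL */
          e->next->prev = e->prev;
  }

  static void list_add_tail(struct list_head *e, struct list_head *head)
  {
          e->prev = head->prev;
          e->next = head;
          head->prev->next = e;
          head->prev = e;
  }

  struct fake_desc {
          struct list_head entry;         /* stand-in for ccp_dma_desc->entry */
  };

  int main(void)
  {
          struct list_head pending;
          /* calloc() stands in for the driver's descriptor allocation
           * (zeroed memory, judging from the NULL pointers in the backtrace). */
          struct fake_desc *d = calloc(1, sizeof(*d));

          if (!d)
                  return 1;

          INIT_LIST_HEAD(&pending);

          if (d->entry.prev == NULL)
                  printf("entry never linked: list_del() here would dereference NULL\n");

          /* The one-line fix: make the node a valid (empty) list first. */
          INIT_LIST_HEAD(&d->entry);

          /* Now the ccp_tx_submit()-style sequence is safe. */
          list_del(&d->entry);
          list_add_tail(&d->entry, &pending);
          printf("after INIT_LIST_HEAD() the same sequence succeeds\n");

          free(d);
          return 0;
  }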

BTW, ccp is an AMD crypto chip with a dma offload engine. I'm surprised that this hasn't been seen before on an AMD x86 box.

Comment 7 Gary R Hook (AMD) 2019-10-22 15:11:35 UTC
We think that the precipitating condition may be that the source or destination length of a transfer is 0. If so, then the code in ccp_create_desc() would kick out of a loop without adding the new descriptor to a list, and entry would be uninitialized.
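
A rough sketch of what that ordering buys (hypothetical demo_* names, not the actual ccp driver API): if the lengths are vetted before anything is allocated, a zero-length transfer never produces a descriptor in the first place, so there is nothing left half-initialized for a later submit to unlink.

  /* Illustrative sketch only; demo_* names are made up. */
  #include <stddef.h>
  #include <stdlib.h>

  struct demo_desc {
          int queued;     /* stands in for membership of the channel's pending list */
  };

  static struct demo_desc *demo_create_desc(size_t src_len, size_t dst_len)
  {
          /* Verify the lengths first: a zero-length transfer is a no-op,
           * so bail out before a descriptor is ever allocated. */
          if (src_len == 0 || dst_len == 0)
                  return NULL;

          return calloc(1, sizeof(struct demo_desc));
  }

Either guard on its own closes the window; comment 8 below confirms that the length-check approach, without the INIT_LIST_HEAD() change, also makes the problem go away.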

We also think that initializing the list head is the "right" solution.

Not having seen a failure like this myself, I'd be interested in a test of an alternative patch, applied without your patch above:

    crypto:ccp - Verify length parameters before creating a descriptor
    
    The source and destination lengths should be vetted early. If either
    is zero, the transfer is a no-op, so return.
    
    Signed-off-by: Gary R Hook <gary.hook>

diff --git a/drivers/crypto/ccp/ccp-dmaengine.c b/drivers/crypto/ccp/ccp-dmaengine.c
index 0770a83bf1a5..4669c5ac3c81 100644
--- a/drivers/crypto/ccp/ccp-dmaengine.c
+++ b/drivers/crypto/ccp/ccp-dmaengine.c
@@ -376,16 +376,20 @@ static struct ccp_dma_desc *ccp_create_desc(struct dma_chan *dma_chan,
        if (!dst_nents || !src_nents)
                return NULL;
 
+       src_len = sg_dma_len(src_sg);
+       if (!src_len)
+               return NULL;
+
+       dst_len = sg_dma_len(dst_sg);
+       if (!dst_len)
+               return NULL;
+
        desc = ccp_alloc_dma_desc(chan, flags);
        if (!desc)
                return NULL;
 
        total_len = 0;
-
-       src_len = sg_dma_len(src_sg);
        src_offset = 0;
-
-       dst_len = sg_dma_len(dst_sg);
        dst_offset = 0;
 
        while (true) {

Comment 8 Mark Salter 2019-10-22 18:21:37 UTC
Yes, that patch (without mine) also makes the problem go away.

Comment 9 Gary R Hook (AMD) 2019-10-29 14:32:20 UTC
Good to hear; thanks! Yours is the official fix, as I stated above, but we're glad to understand the corner case better.

Would you like us to handle the backport to 8.2? We should try to get this in, I think.

Comment 10 Gary R Hook (AMD) 2019-10-29 14:37:35 UTC
Sure thing: accepted in cryptodev-2.6: 0d3c6781d8d8

I am in the process of getting this into 8.2 (as part of a larger patchset). Do you want to assign this BZ to me, or use it to track a KB item?

Comment 11 Gary R Hook (AMD) 2019-10-29 14:38:44 UTC
Gah! Ignore comment 10; wrong BZ.

Comment 12 Mark Salter 2019-11-26 16:56:41 UTC
The fix has landed in Linus' tree: commit 691505a803a7

