Bug 736387
| Summary: | kernel-2.6.40-4.fc15.x86_64 fails to boot due to failure to start MD RAID | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Doug Ledford <dledford> |
| Component: | mdadm | Assignee: | Doug Ledford <dledford> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 15 | CC: | agajania, agk, brian.broussard, bugzilla, cb20777, c.bradley, cjg9411, dev, dledford, harald, Jes.Sorensen, maciej.patelczyk, mbroz, michael.wuersch, msmsms10079, pb, rhbugzilla, serge, vezza |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | mdadm-3.2.2-15.fc15 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 729205 | | |
| : | 744217 744219 (view as bug list) | Environment: | |
| Last Closed: | 2011-12-14 23:37:16 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Doug Ledford
2011-09-07 15:04:07 UTC
Thanks for cloning the bug - I am not familiar with the internals of the early Linux boot process, so up to now I was not aware that the bugs weren't related. I did not edit any lines out after the first line (i.e., the line 'dracut: dracut-009-12.fc15'). Can I contribute anything else to help in resolving this issue?

Michael

In the other bug I cloned from this one, a fact came up that might be relevant here. Can you try grabbing the dracut package from your install media and downgrading your copy of dracut to what was shipped with F15, then rebuild the initramfs that fails to boot with the old dracut and try booting again?

I did not use any media but instead relied on PreUpgrade to get to F15. But I will download an ISO quickly and try as advised.

No luck so far. I checked the version of dracut on the DVD: dracut-009-10.fc15.noarch.rpm, whereas I had installed 009-12.fc15. Since I can boot with 2.6.35.14-95.fc14.x86_64, I booted and ran:

    sudo yum downgrade dracut

Output:

    ...
    Running Transaction
      Installing : dracut-009-10.fc15.noarch
      Cleanup    : dracut-009-12.fc15.noarch
    Removed:   dracut.noarch 0:009-12.fc15
    Installed: dracut.noarch 0:009-10.fc15

Then I ran:

    sudo dracut initramfs-2.6.40.4-5.fc15.x86_64.img 2.6.40.4-5.fc15.x86_64 --force

and did a reboot. Same error message as before.

Michael

For some reason, on your system, the hard drives are not being found. Can you boot into the working kernel, then run dmesg and post the output of that into this bug, please?

Created attachment 522371 [details]
dmesg output
I have attached the log.
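(For readers following along: the downgrade-and-rebuild sequence from the exchange above, consolidated into one minimal sketch. It assumes the F15 media ships dracut-009-10 and that the failing kernel is 2.6.40.4-5.fc15.x86_64; substitute whatever versions your own system has.)

    # Boot the last working kernel, then downgrade dracut to the version
    # shipped on the install media...
    sudo yum downgrade dracut
    # ...and rebuild the initramfs of the kernel that fails to boot.
    cd /boot
    sudo dracut --force initramfs-2.6.40.4-5.fc15.x86_64.img 2.6.40.4-5.fc15.x86_64
    sudo reboot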
OK, so when the machine boots up successfully, it is starting drives sda and sdb as an IMSM raid array; when you try to boot the new kernel, it drops you to a debug shell. From that debug shell, I need you to do a few things. First, verify that /dev/sda and /dev/sdb exist. Next, if they exist, try to assemble them using mdadm via the following commands:

    /sbin/mdadm -I /dev/sda
    /sbin/mdadm -I /dev/sdb

If those commands work, then you should now have a new md device. Try running this command on that new device:

    /sbin/mdadm -I /dev/md<device_number>

If that gets your raid array up and running, then the question becomes "Why isn't this happening automatically like it's supposed to?" To try and answer that, make sure that the files /lib/udev/rules.d/64-md-raid.rules and /lib/udev/rules.d/65-md-incremental.rules exist. Let me know what you find out.

Perhaps my note https://bugzilla.redhat.com/show_bug.cgi?id=729205#c15 helps; at least in my case, downgrading to mdadm-3.1.5-2 and recreating the initramfs files results in a properly working newer kernel. An initramfs containing the mdadm binary from mdadm-3.2.2-6 or 3.2.2-9 does not work in my case and results in a broken boot. Any hints on how to debug the mdadm problem in the dracut shell?

I booted into the dracut debug shell and entered:

    /sbin/mdadm -I /dev/sda
    /sbin/mdadm -I /dev/sdb

Output was:

    mdadm: no RAID superblock on /dev/sda

and the same for /dev/sdb, respectively. /lib/udev/rules.d/64-md-raid.rules does exist, whereas /lib/udev/rules.d/65-md-incremental.rules does not.

Here's the raid info from the "good" kernel:

    [user ~]$ sudo mdadm --detail /dev/md0
    /dev/md0:
            Version : imsm
         Raid Level : container
      Total Devices : 2
    Working Devices : 2
      Member Arrays : /dev/md127

        Number   Major   Minor   RaidDevice
           0       8        0        -        /dev/sda
           1       8       16        -        /dev/sdb

    sudo mdadm --detail /dev/md127
    /dev/md127:
          Container : /dev/md0, member 0
         Raid Level : raid1
         Array Size : 1953511424 (1863.01 GiB 2000.40 GB)
      Used Dev Size : 1953511556 (1863.01 GiB 2000.40 GB)
       Raid Devices : 2
      Total Devices : 2

        Update Time : Mon Sep 12 09:16:20 2011
              State : active
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0

        Number   Major   Minor   RaidDevice State
           1       8        0        0      active sync   /dev/sda
           0       8       16        1      active sync   /dev/sdb

    cat /proc/mdstat
    Personalities : [raid1]
    md127 : active raid1 sda[1] sdb[0]
          1953511424 blocks super external:/md0/0 [2/2] [UU]

    md0 : inactive sdb[1](S) sda[0](S)
          4514 blocks super external:imsm

    unused devices: <none>

Peter, Michael: if you boot into an initramfs that does not work, then what do you get when you run mdadm -E /dev/sda? Does it simply say there is no superblock at all, or does it say it finds one but it's invalid, and if it does say it's invalid, does it say why?

mdadm -E /dev/sda shows proper output like:

    /dev/sda
              Magic : Intel Raid ISM Cfg Sig.
    ...
         Attributes : All supported
    ...
    [OS] (name of configured RAID1 set in BIOS)
    ...
      Migrate State : repair (because of all these failed boots...)

cat /proc/mdstat tells:

    md127 : inactive sda[1] sdb[0]
          ... blocks super external:/md0/0

    md0 : inactive sdb[1](S) sda[0](S)
          ... blocks super external:imsm

To me it looks like the new version of mdadm simply forgets to activate the RAID, while the old version does.

Sorry for the delay, here's the output of mdadm:

    dracut:/# /sbin/mdadm -E /dev/sd?
    /dev/sda:
              Magic : Intel Raid ISM Cfg Sig.
            Version : 1.1.00
        Orig Family : 0932e0b0
             Family : 0932e0b0
         Generation : 00261fa8
         Attributes : All supported
               UUID : ...:...:...
           Checksum : 045764af correct
        MPB Sectors : 1
              Disks : 2
       RAID Devices : 1

      Disk00 Serial : JK11A8B9JL8X5F
              State : active
                 Id : 00000000
        Usable Size : 3907023112 (1863.01 GiB 2000.40 GB)

    [System:]
               UUID : ...:...:...
         RAID LEVEL : 1
            Members : 2
              SLOTS : [UU]
        FAILED DISK : none
          This Slot : 0
         Array Size : 3907022848 (1863.01 GiB 2000.40 GB)
       Per Dev Size : 3907023112 (1863.01 GiB 2000.40 GB)
      Sector Offset : 0
        Num Stripes : 15261808
         Chunk Size : 64 KiB
           Reserved : 0
      Migrate State : idle
          Map State : normal
        Dirty State : dirty

      Disk00 Serial : JK11A8B9JL8X5F
              State : active
                 Id : 00000000
        Usable Size : 3907023112 (1863.01 GiB 2000.40 GB)

... (pretty much the same for /dev/sdb, as above)

Peter, you left out part of the contents of /proc/mdstat - what does the personality line read on a failed boot? (And I would like the same info from you, Michael, aka the full contents of /proc/mdstat on a failed boot.)

Next notes:

1. Always a successful boot with the old mdadm:

    Personalities : [raid1]

2. Did now get a successful boot to a "NORMAL" (BIOS) array also with the new mdadm. But here the resync starts immediately:

    [    3.260537] md: md0 stopped.
    [    3.263234] md: bind<sda>
    [    3.263338] md: bind<sdb>
    [    3.263490] dracut: mdadm: Container /dev/md0 has been assembled with 2 drives
    [    3.272304] md: md127 stopped.
    [    3.272514] md: bind<sdb>
    [    3.272653] md: bind<sda>
    [    3.273900] md: raid1 personality registered for level 1
    [    3.274490] md/raid1:md127: not clean -- starting background reconstruction
                   ^^^^ BIOS told "NORMAL" !
    [    3.274564] md/raid1:md127: active with 2 out of 2 mirrors
    [    3.274643] md127: detected capacity change from 0 to 160038912000
    [    3.282507] md: md127 switched to read-write mode.
    [    3.282761] md: resync of RAID array md127
    [    3.282790] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
    [    3.282826] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
    [    3.282882] md: using 128k window, over a total of 156288132k.
    [    3.292149] dracut: mdadm: Started /dev/md127 with 2 devices
    [    3.401892] md127: p1 p2 p3 p4 < p5 p6 p7 p8 >
    [    3.724356] md: md1 stopped.
    [    3.727722] md: bind<sdc1>
    [    3.730487] md: bind<sdd1>
    [    3.734248] md/raid1:md1: active with 2 out of 2 mirrors
    [    3.736817] md1: detected capacity change from 0 to 160039174144
    [    3.739379] dracut: mdadm: /dev/md1 has been started with 2 drives.
    [    3.743099] md1: unknown partition table

Just to note here, I run 2 RAID1 sets with 4 drives: /dev/sd{a,b} is IMSM (dual boot with Windows), /dev/sd{c,d} is a Linux-only software RAID.

3. Rebooting now during this running resync results in BIOS "VERIFY" (just note that I think during shutdown something like a store of the current sync position is shown). Booting with the new mdadm now results in a broken boot, where

    Personalities : [raid1]

and md1 (the Linux software RAID) is active, while md127 is inactive. So as others have already seen: if the IMSM RAID is in "VERIFY" mode, mdadm will not start the RAID.

cat /proc/mdstat does not list anything when dropped to the dracut debug shell, i.e.:

    dracut:/# cat /proc/mdstat
    Personalities :
    unused devices: <none>

Output for the old kernel (the one which is able to boot) is:

    Personalities : [raid1]
    md127 : active raid1 sda[1] sdb[0]
          1953511424 blocks super external:/md0/0 [2/2] [UU]
          [>....................]  resync =  0.0% (1727872/1953511556) finish=5236.1min speed=6212K/sec

    md0 : inactive sdb[1](S) sda[0](S)
          4514 blocks super external:imsm

    unused devices: <none>

OK, I've got enough info to try and reproduce it here. I'll see if I can work up a fix to this.
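(Doug's manual-assembly check, gathered into one sketch for reference. The container name md0 matches the --detail output above, but the device number assigned on a failed boot may differ - use whatever /proc/mdstat actually reports.)

    # From the dracut debug shell: verify the disks exist, then
    # incrementally assemble the members and the resulting container.
    ls -l /dev/sda /dev/sdb
    /sbin/mdadm -I /dev/sda
    /sbin/mdadm -I /dev/sdb
    cat /proc/mdstat                  # an imsm container (e.g. md0) should appear
    /sbin/mdadm -I /dev/md0           # start the member array inside the container
    cat /proc/mdstat                  # the raid1 volume (e.g. md127) should now be active
    # The udev rules that are supposed to do all of this automatically:
    ls /lib/udev/rules.d/64-md-raid.rules /lib/udev/rules.d/65-md-incremental.rules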
It seems that the mdadm-3.2.2 binary is misinterpreting some of the bits in the IMSM superblock, so that it doesn't assemble arrays in VERIFY state, and when the BIOS thinks an array is clean, mdadm thinks it is dirty and starts a rebuild.

I ran additional tests because even after downgrading to mdadm-3.1.5-2.fc15.i686 the rebuild starts on a clean array, which keeps my system very busy for 90 minutes after each reboot. Cross-downgrading to mdadm-3.1.3-0.git20100804.3.fc14 from F14 and creating a new ramdisk finally solves the issue. Please check all changes from 3.1.3 to 3.1.5/3.2.2.

I have been trying to bisect my way from 3.1.5 to 3.2.2 and have not really had much luck with it. I am running a setup where I have 2 drives in a raid1, and one drive for the OS, so I do not depend on assembly during the initramfs stage. I did notice that in some cases, if I ran mdadm -I manually, after 3-4 tries the raid would suddenly come up and start syncing. In other cases it would show up in PENDING state as inactive. Still catching up on this, so I am not sure what causes this to result in a raid being marked PENDING. Could we have a race with a missing memory barrier or something? I'll try and go back to 3.1.3 as well.

Cheers,
Jes

Jes: for clarification's sake, when you say PENDING, do you mean something BIOS related, or do you mean the array is inactive and marked PENDING in the output of /proc/mdstat?

Doug: This is /proc/mdstat output. I am pretty sure the word was PENDING, but I'll have to double check, as I am not near the box showing the problems right now.

Ok, just to confirm: if I use 3.1.5 I sometimes get the md device into PENDING, like this:

    [root@mahomaho mdadm-nbrown]# cat /proc/mdstat
    Personalities : [raid1]
    md126 : active (read-only) raid1 sda[1] sdb[0]
          41943040 blocks super external:/md127/0 [2/2] [UU]
          resync=PENDING

    md127 : inactive sdb[1](S) sda[0](S)
          4514 blocks super external:imsm

    unused devices: <none>

This is after I run 'mdadm -I /dev/sda ; mdadm -I /dev/sdb' for a raid1 md device. Going back to 3.1.3 seems to make it start resync'ing reliably.

Jes

Hi,

More testing and some bad news: I can reproduce this with 3.1.3 as well! It is just a bit harder to reproduce with the older version. My guess is we have a race condition somewhere, and something happened along the way that altered the timing. It is fairly easy to reproduce if you are looking for it; sometimes I have to try 30-40 times, but it shows up in the end, even with 3.1.3. My setup is fairly simple: IMSM raid in raid1 mode over two drives (sd[ab]) and my OS installed on sdc. Boot, then manually run this:

    ./mdadm -S /dev/md126 ; ./mdadm -S /dev/md127 ; cat /proc/mdstat
    ./mdadm -I /dev/sda ; ./mdadm -I /dev/sdb ; cat /proc/mdstat

Repeat until the array shows up as PENDING (like in the previous post) or inactive. In some cases md126 isn't found, in other cases both show up as inactive, and in some cases I get PENDING.

Jes

I was recently bitten by this, presumably after my F15 kernel (or mdadm) was routinely updated. Here is my information in case there are any helpful clues. I have the following on my system:

    kernel: 2.6.40.6-0.fc15.x86_64
    kernel: 2.6.40.4-5.fc15.x86_64
    kernel: 2.6.40.3-0.fc15.x86_64 - using this one for now
    mdadm:  3.2.2-9.fc15

IMSM mirror: boot on an MD partition, root and the rest on LVM on a second MD partition. If I boot into any of the above kernels with the IMSM in "VERIFY" mode, the boot fails with a dracut error trying to access the files under LVM.
If I boot into the "4-5" kernel, with IMSM in "NORMAL" mode, the boot fails with dracut not finding the LVM volumes, and the IMSM gets set to "VERIFY". What I do to recover is to boot from an F15 DVD, enter rescue mode, and wait a few hours for the MD raid to rebuild and get set to NORMAL again. I have not yet tried to boot in "6-0" with the IMSM in "NORMAL" mode. Questions: 1) Is this problem well enough understood that there is a fix somewhere? I couldn't see anything in "testing"? 2) How do you guys get all that debug info exported from a system that fails to boot and drops into the very limited debug shell? I've been using pencil and paper -- very laborious. Charles, Thanks for the data! The problem you are seeing with the 4-5 kernel may have been fixed in Fedora 16, but I am not 100% sure it is safe to ask you to update your dracut binary to this one: https://koji.fedoraproject.org/koji/buildinfo?buildID=266766 Maybe Harald can comment on this. With regard to understanding the problem, then unfortunately no, it isn't well enough understood yet to say what is causing this. I am going to do a fresh Fedora 15 and play with the two kernel versions you mention, it could give us a hint. Last, how to copy data across, I find the simplest way is to use a USB stick. Switch to console mode CTRL-ALT-F1, mount it, then copy /tmp/*log to the USB stick. Jes Harald, Can you comment on whether the mdraid changes you made to dracut are applicable to the latest version of Fedora 15 as well, per the two previous comments? Thanks, Jes Charles, I am seeing it here too, I had a clean raid1, booted it into kernel: 2.6.40.6-0.fc15.x86_64 and it got marked dirty. Taking it offline and re-adding it and it behaves like previously reporting in this bug. I will try and roll back to kernel: 2.6.40.3-0.fc15.x86_64 Cheers, Jes Tried 2.6.40-3.0 and I still see the same - then rolled back to 2.6.38.6-26.rc1.fc15.x86_64 and there I also see the problem with the array refusing to start syncing..... Hej guys, I also have the same line-up (and so the same problem) with my workstation as Charles Butterfield has. @Jes: If I can do any dirty testing (I already saved my data to another disk), let me know. ;-) Greetz, Gerhard If after upgrading to mdadm 3.2.2 you see message like this: "First, Rodney, you're original bug was this: dracut: mdadm: Container /dev/md127 has been assembled with 2 drives dracut: mdadm (IMSM): Unsupported attributes: 40000000 dracut: mdadm IMSM metadata load not allowed due to attribute incompatibility" which is in first comment by Doug then i suggest that you should try the following patch from Neil's repo: commit id: 418f9b368a1200370695527d22aba8c3606172c5 IMSM: allow some array attribute bits to be ignored. Some bits are not handled by mdadm, but their presence should not cause failure. In particular MPB_ATTRIB_NEVER_USE appears harmless. Reported-by: Thomas Steinborn <thestonewell> Signed-off-by: NeilBrown <neilb> Doug could you try this? Gerhard, Just to be sure, in your case are you trying to boot off the raid1 device or is it a secondary device in the system that doesn't get assembled correctly at boot? Thanks, Jes Hej Jes, exactly. I try to start from the raid1 device and after a while I get an error-msg and the dracut-shell. The only difference to Charles is, I don't use an LVM - just four primary partitions. Greetz, Gerhard mdadm-3.2.2-10.fc15 has been submitted as an update for Fedora 15. 
https://admin.fedoraproject.org/updates/mdadm-3.2.2-10.fc15 *** Bug 727696 has been marked as a duplicate of this bug. *** Neil Brown (mdadm maintainer) spotted a bug in one of my fixes. I'll update and push a fixed version. Package mdadm-3.2.2-10.fc15: * should fix your issue, * was pushed to the Fedora 15 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing mdadm-3.2.2-10.fc15' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2011-14760 then log in and leave karma (feedback). mdadm-3.2.2-12.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/mdadm-3.2.2-12.fc15 Hmm, this version still behaves strange. - updated to mentioned version - initramfs rebuilt - boot - array is recognized (good) - array in resync mode (good) - reboot after 10 min (sync still ongoing) - array still in resync mode (good) - wait until array is 100% resync'ed - reboot, BIOS shows "Normal" - array starts resyncing again. It looks also that there is a difference between normal reboots and using ALT-SYSRQ boot (needed sometimes because hanging in "unmount" state) if array is in "Normal" state. ALT-SYSRQ B keeps "Normal", while normal reboot triggers resync. mdadm-3.2.2-10.fc15 works well enough to allow me to use my system. Last night I installed mdadm-3.2.2-10.fc15, and recreated the initramfs for all 3 of my most recent kernels. I asked how to rebuild the initramfs and got the following guidance, which I am sharing: To rebuild various initramfs, do the following: cd /boot dracut -f initramfs-<kernel version>.img <kernel version> # for each version where <kernel version> = 2.6.40.3-0.fc15.x86_64, etc Lastly, ignore the warnings about missing modules, they seem to be benign (at least in my case). Thanks guys! Thanks for the feedback! Note that mdadm-3.2.2-12 is out, fixing a bug in the previous version. However if -10 works for you, not a problem, the bug is not malicious. Peter, I have seen the reboot issue occasionally. I don't think it is related to this particular problem, but rather an issue with raids not being shutdown correctly at reboot. If you could file a separate BZ on that issue, that would be good. Cheers, Jes I did: sudo yum update --enablerepo=updates-testing mdadm-3.2.2-10.fc15 sudo dracut initramfs-2.6.40.4-5.fc15.x86_64.img 2.6.40.4-5.fc15.x86_64 --force and rebooted, selecting the corresponding kernel. Still fails to boot and instead drops into to dracut debug shell. Cheers, Michael Michael, Please grab mdadm-3.2.2-12.fc15 Then as root run 'dracut -f "" 2.6.40.4-5.fc15.x86_64' The symptoms you are seeing sounds very much like it is picking up the old initramfs image. Jes Thanks, Jes and sorry for bothering you - I did not see the latest comment. I will check the new version and report back as soon as it is available (it does not seem to have arrived in the updates-testing repo yet). Michael No luck, so far: I grabbed the rpm in the meantime from http://kojipkgs.fedoraproject.org/packages/mdadm/3.2.2/12.fc15/x86_64/mdadm-3.2.2-12.fc15.x86_64.rpm. I have also used yum update to get the latest kernel version (2.6.40.6-0.fc15.x86_64). 
Installed the mdadm update, rebuilt the initramfs for the latest kernel (but 4-5 does not work either), and rebooted with:

    sudo yum install /home/wuersch/mdadm-3.2.2-12.fc15.x86_64.rpm
    sudo dracut initramfs-2.6.40.6-0.fc15.x86_64.img 2.6.40.6-0.fc15.x86_64 --force
    sudo shutdown -r now

Still, I am getting to the dracut debug shell, with dmesg showing:

    dracut: Autoassembling MD Raid
    dracut Warning: No root device "block:/dev/disk/by-uuid/812eb062-d765-4065-be34-4a2cf4160064"

(As mentioned a couple of weeks ago, I can still boot with 2.6.35.14-95.fc14.x86_64.) Let me know if I can provide any additional information to sort this out.

Michael

Michael,

Very odd - could you try and grab a copy of /init.log and a snapshot of the screen when it goes wrong? You should be able to mount a USB stick from the dracut shell.

Thanks,
Jes

Created attachment 529846 [details]
dmesg with rdshell rdinitdebug
There's no /init.log (I have removed rhgb quiet from the kernel command line and added rdshell rdinitdebug instead). See the attachment above.

The init.log might be in a different directory. However, your dmesg output seems to be a cycle of messages about a Fedora 15 disc in the DVD drive filling the log. Could you try booting without this disc in the drive?

Thanks,
Jes

Created attachment 529854 [details]
dmesg with rdshell rdinitdebug
Removed the DVD as requested.
--Michael
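(For anyone capturing these logs themselves, a sketch of the USB-stick approach Jes suggested earlier in the thread. The device name /dev/sdc1 is an assumption - check dmesg for the stick's actual name - and the two init.log paths reflect the locations mentioned later in this bug.)

    # From the dracut debug shell: mount a USB stick and copy the logs to it.
    mkdir /tmp/usb
    mount /dev/sdc1 /tmp/usb         # assumed device name; verify with dmesg
    dmesg > /tmp/usb/dmesg-failed-boot.txt
    cp /run/initramfs/init.log /tmp/usb/ 2>/dev/null || cp /init.log /tmp/usb/
    umount /tmp/usb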
Still mostly messages from the DVD drive - please add "log_buf_len=1M".

Ok, input from Harald: please try with these parameters added:

    rdshell rdbreak rdinitdebug quiet loglevel=9 log_buf_len=1M

Btw., init.log should be either /init.log or /run/initramfs/init.log.

Created attachment 529876 [details]
dmesg with rdshell rdbreak rdinitdebug quiet loglevel=9 log_buf_len=1M
Here's the dmesg output again. Do you still need init.log in addition?
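(For reference, these debug parameters go on the kernel line of the grub configuration. The stanza below is illustrative only: the vmlinuz path is a placeholder, and the root UUID is taken from the dracut warning quoted earlier in this bug.)

    # /boot/grub/grub.conf -- kernel line with the requested debug options appended
    kernel /vmlinuz-2.6.40.6-0.fc15.x86_64 ro root=UUID=812eb062-d765-4065-be34-4a2cf4160064 rdshell rdbreak rdinitdebug quiet loglevel=9 log_buf_len=1M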
    [    4.509309] dracut: + /sbin/mdadm -As --auto=yes --run

Michael, on a failed boot can you please grab the contents of /etc/mdadm.conf from the rdshell. Also, mdadm -E /dev/sda and mdadm -E /dev/sdb would be helpful. Finally, you can try removing any instances of rd_MD_UUID from the grub command line and see if it boots successfully that way.

Michael,

In addition, once you get dropped into the rdshell, are there any references on the screen at that point about 'reshape' and the lack of a data file?

Thanks,
Jes

Created attachment 530036 [details]
/run/initramfs/init.log from a failed boot
Created attachment 530037 [details]
Output from mdadm -E /dev/sda
Created attachment 530038 [details]
Output from mdadm -E /dev/sdb
Created attachment 530039 [details]
/etc/mdadm.conf from a failed boot
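(For comparison with the attached file: a working IMSM setup like the one shown earlier would normally yield mdadm -Es output along these lines. This is a sketch - the volume name and UUIDs are placeholders, not values from the attachments.)

    # Illustrative ARRAY lines for an IMSM container plus its raid1 member volume:
    ARRAY metadata=imsm UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
    ARRAY /dev/md/Volume0 container=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd member=0 UUID=eeeeeeee:ffffffff:11111111:22222222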
Jes, I see neither 'reshape' nor any data file mentioned on the screen. The only output (except for the stuff also in the dmesg output) is:

    Dropping to debug shell.
    sh: can't access tty; job control turned off
    dracut:/#

Thanks for investigating,
Michael

Doug,

Removing rd_MD_UUID from the kernel parameters via grub still brings me to the debug shell.

Michael

mdadm-3.2.2-14.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/mdadm-3.2.2-14.fc15

Does not fix the issue when issuing a sudo shutdown -r 0. A fresh build with 2.6.40.6-0 and then mdadm-3.2.2-14 via yum worked fine with a root shutdown -r 0 and shutdown -h 0, but when a user that has sudo rights to /sbin/shutdown rebooted, the system failed with:

    sh: can't access tty; job control turned off
    dracut:/#

Target is a Dell Optiplex 990 with Intel Matrix RAID.

mdadm-3.2.2-15.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/mdadm-3.2.2-15.fc15

Fresh build: yum update, and added mdadm-3.2.2-15.fc15. Note: took a minimal build from the DVD, then yum update -y as root; reboot (OK). Make a user, log in as that user, su, reboot (failed):

    sh: can't access tty; job control turned off
    dracut:/#

Target systems are Dell 960 & 990 with Intel Matrix RAID.

Same as with -14, right?

Yes. Note the reboot was issued with "su", not "su -", so I am not sure if it is an environment issue, but when the system comes back up, the RAID states it is in Verify (not Normal or Initial)... Question: should mdadm be able to recover from this? Is there a way to get away from 0.90 metadata? Note: in F16 I have not been able to reproduce this issue, just a number of other issues, mainly with third-party components...

(In reply to comment #66)
> same as with -14, right?

Brian,

This is still the issue where it goes wrong when a user reboots, but not when root issues the reboot? If that is the case, I would still expect it to be related to the environment when the command is issued. However, this BZ is about problems with IMSM RAIDs, which is different metadata than 0.90, so I don't quite understand how the two are intersecting.

Jes

Thanks, I was not seeing the link; I just see a lot of comments on the metadata and want to make sure I did not miss something. I have looked over the source code and am not understanding all the differences between the F15 2.6.x and the F16 3.1.x kernels, and how their associated files are responding differently to the same hardware. I am looking at porting our robotic solution to F16 and just moving forward. In your opinion, why does this not happen in the latest F16? If F16 will not see this issue because of some design/logic differences, then we will just move forward. These IMSM concerns are at the forefront of my mind today, as all my field systems have them; most are FC11 running great, and for the last few months we have been shipping FC15 with kernel 2.6.38, also doing fine - until a yum update, and we lost the machine on the next reboot. Also note: with mdadm-3.2.2-13 on three identical machines running FC15 (2.6.38 & 2.6.40) and FC16 (3.1.1), I only see the issue with the FC15 2.6.40 kernel and its associated files. I do not like not understanding the WHY, but I need to move on with a working solution, so I will let you know if I see this in the F16 solution.

thanks
brian

(In reply to comment #68)
> Brian,
>
> This is still the issue where it goes wrong when a user reboots, but not
> when root issues the reboot? If that is the case, I would still expect
> it to be related to the environment when the command is issued.
>
> However this BZ is about problems with IMSM RAIDs, which is different
> metadata than 0.90, so I don't quite understand how the two are intersecting.
>
> Jes

Brian,

There are some differences in how dracut assembles the raid between F15 and F16 - I suspect this is where it goes wrong. The mdadm packages should be pretty much identical between the two Fedora releases. If F16 works for you, I'd recommend going down that path.

Jes

thanks

(In reply to comment #70)
> Brian,
>
> There are some differences in how dracut assembles the raid between F15 and
> F16 - I suspect this is where it goes wrong. The mdadm packages should be
> pretty much identical between the two Fedora releases.
>
> If F16 works for you, I'd recommend going down that path.
>
> Jes

Just to elaborate on this a little bit: Fedora is moving to a different initramfs scheme. The initramfs brings the raid devices up, switches to the real root, and execs systemd as init; when you shut down, systemd kills everything that was started on the real root filesystem but leaves things started from the initrd alive, switches the initrd root back to being the system root, and then tears everything down in the initrd in the reverse order that the initrd started it up. This is a complex set of operations that they don't have done yet, and it won't appear in F15, but they have started to lay groundwork for it in the dracut package, IIUC.

Now, if you install the latest dracut on your system, and you install the latest kernel on your system, and the bug is actually in dracut and not the kernel, then it will appear to be a kernel bug, because only new kernels will be affected - when in fact it's a dracut bug. Since dracut is used to build initramfs images, and those images are not updated just because dracut is updated, the new dracut only shows its bug on new kernel installs. So the kernel version can be a big red herring many times when it comes to mdadm/raid bootup issues. The real culprit in many of those cases is the initramfs image (either because dracut made a bad one, or there is a bad mdadm binary on it, or bad udev rules files on it, etc.). So I wouldn't be so sure that your problem is related to the 2.6.40 kernel on F15, is my point ;-) I'd probably be looking more closely at one or both of systemd and dracut on that failing F15 box.

mdadm-3.2.2-15.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.
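(To summarize the resolution for anyone landing on this bug: a sketch of the update path described in this thread, assuming the fix has reached the stable repository. Substitute the kernel versions actually installed on your machine; the three listed here are the ones mentioned above.)

    # Pull the fixed mdadm (use --enablerepo=updates-testing while the
    # update is still in testing), then rebuild each affected initramfs.
    su -c 'yum update mdadm'
    cd /boot
    for k in 2.6.40.3-0.fc15.x86_64 2.6.40.4-5.fc15.x86_64 2.6.40.6-0.fc15.x86_64; do
        dracut -f initramfs-$k.img $k
    done
    reboot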
> > However this BZ is about problems with IMSM RAIDs, which is different > metadata than 0.90, so I don't quite understand how the two are intersecting. > > Jes Brian, There are some differences in how dracut assembles the raid between F15 and F16 - I suspect this is where it goes wrong. The mdadm packages should be pretty much identical between the two Fedora releases. If F16 works for you, I'd recommend going down that path. Jes thanks (In reply to comment #70) > Brian, > > There are some differences in how dracut assembles the raid between F15 and > F16 - I suspect this is where it goes wrong. The mdadm packages should be > pretty much identical between the two Fedora releases. > > If F16 works for you, I'd recommend going down that path. > > Jes (In reply to comment #70) > Brian, > > There are some differences in how dracut assembles the raid between F15 and > F16 - I suspect this is where it goes wrong. The mdadm packages should be > pretty much identical between the two Fedora releases. > > If F16 works for you, I'd recommend going down that path. > > Jes Just to elaborate on this a little bit: Fedora is moving to a different initramfs scheme that involves the initramfs bringing raid devices up, then switching to the real root and exec'ing systemd as init, then when you shutdown, systemd will kill everything off that was started on the real root filesystem but leave things started from the initrd alive, switch the initrd root back to being the system root, then tear everything down in the initrd in reverse order that the initrd started it up. This is a complex set of operations that they don't have done yet, and won't appear in f15, but they have started to lay ground work in place in the dracut package IIUTC. Now, if you install the latest dracut on your system, and you install the latest kernel on your system, and the bug is actually in dracut and not the kernel, then it will appear to be a kernel bug because only new kernels will be effected when in fact it's a dracut bug and since dracut is used to build initramfs images and then those images are not updated just because dracut is updated, the new dracut only shows its bug on new kernel installs. So, the kernel issue can be a big red herring many times when it comes to mdadm/raid bootup issues. The real culprit in many of those cases is the initramfs image (either because dracut made a bad one, or there is a bad mdadm binary on it, or bad udev rules files on it, etc). So I wouldn't be so sure that your problem is related to the 2.6.40 kernel on f15 is my point ;-) I'd probably be looking more closely at one or both of systemd and dracut on that failing f15 box. mdadm-3.2.2-15.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report. |