Bug 203241
Summary: | PATCH: mkinitrd does not create /dev/dm-x devices for dmraid causing total boot failure
---|---
Product: | [Fedora] Fedora
Component: | mkinitrd
Version: | rawhide
Status: | CLOSED DUPLICATE
Severity: | urgent
Priority: | urgent
Hardware: | All
OS: | Linux
Reporter: | Hans de Goede <hdegoede>
Assignee: | Peter Jones <pjones>
QA Contact: | David Lawrence <dkl>
CC: | chabotc, gnomeuser, growltiger
Doc Type: | Bug Fix
Last Closed: | 2006-08-31 21:04:46 UTC
Description
Hans de Goede
2006-08-19 18:25:36 UTC
Created attachment 134511 [details]
PATCH fixing dmraid booting in mkinitrd
Created attachment 134512 [details]
PATCH: dmraid boot initrd script workaround
Do we know when this patch will get into rawhide?

It might be worth mentioning that the updated kernel issued for FC5 does not boot dmraid either, as I most painfully discovered a few minutes ago. This definitely needs to get fixed soon.

I am desperate to get past this problem. I confess that I have so far relied upon RPMs to update, so I am hesitant about the patch process. Would someone be so kind as to provide some directions here (or a pointer to some site that provides a walkthrough) as to the order and procedure of applying this patch? Thanks in advance.

By the way, do both patches need to be applied? The initial comment suggests that one might apply one OR the other.

No, either one will work. In your case the workaround, rather than the real fix, has the advantage that it will work without a recompile.

Instructions:
Save the second attachment as mkinitrd.diff, then run:
"cd /sbin"
"patch -p1 < [path-to]/mkinitrd.diff"
where [path-to] should be replaced by the path to mkinitrd.diff.
And then rerun mkinitrd:
"mkinitrd -f /boot/initrd-`uname -r`.img `uname -r`"

Note that this patch and the entire diagnosis are based on FC6-test2 + rawhide updates and may or may not apply to FC-5. With this patch, mkinitrd >= 5.1.6 ("rpm -q mkinitrd" to find out) and no usb-storage lines in /etc/modprobe.conf, dmraid should work.

I am in no man's land, since I am still on FC5 and my mkinitrd == 5.0.32-1. I suppose I will have to wait for a solution for that version. Anyway, thanks for the directions. I hope your efforts help others who are stuck in this most annoying predicament.

Good news. Just synced with rawhide and the new kernel booted with dm raid0 without any modifications. Looks like nash has been fixed.

I'm using the via_sata driver.

Dwaine

(In reply to comment #9)
> Good news. Just synced with rawhide and the new kernel booted with dm raid0
> without any modifications. Looks like nash has been fixed.
>
> I'm using the via_sata driver.
Hmm, I just checked my mirror and the nash there isn't fixed. Maybe this bug only applies to systems using nv_sata, although I have a hard time believing that. Could you cut and paste or attach the contents of your /etc/fstab here?

Thanks!

Would someone point me in the right direction to learning how I, too, can sync FC5 to RawHide? (Assuming that is possible; I also use via_sata.) Thanks.

(In reply to comment #11)
> Would someone point me in the right direction to learning how I, too, can sync
> FC5 to RawHide? (Assuming that is possible; I also use via_sata.) Thanks.

First of all, this may make your system unbootable, even with the older kernel!

With that said, edit:
/etc/yum.repos.d/fedora-core.repo
This file has 3 sections, of which only the top one is enabled by default. You can see this because the top section contains the line:
enabled=1
Change this to:
enabled=0
If you've enabled any other sections yourself, disable them too.

Do the same for:
/etc/yum.repos.d/fedora-updates.repo
and:
/etc/yum.repos.d/fedora-extras.repo

Now edit /etc/yum.repos.d/fedora-development.repo and enable the top section, that is, modify it so that it contains:
enabled=1
Do the same for:
/etc/yum.repos.d/fedora-extras-development.repo

Now your yum points to the development branch of Fedora. I think in this case it is wise to do a piecemeal update, as you're only interested in mkinitrd, so after making the above changes type:
yum update mkinitrd
Do not use "yum -y update mkinitrd"!

Once yum has done all its magic it will give a list of packages that it will update. This will include mkinitrd, glibc(-xxx) and probably device-mapper and mdraid. This list may be around 10 packages long; if it's much longer, please post it here and press N to stop yum from doing the actual update. If you are comfortable with the list, press Y to continue, and once yum is done you've got the new mkinitrd, which is all you need.
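The repo edits described above can also be scripted. The following is only a sketch, assuming each .repo file uses plain "enabled=0" / "enabled=1" lines without spaces and that only the first section needs changing. set_first_section_enabled is a hypothetical helper name, not a yum command, and it is safest to try it on a copy of the file first.

```shell
#!/bin/sh
# Hypothetical helper (not part of yum): set "enabled=" in the FIRST
# section of a .repo file and leave all later sections untouched.
# Assumption: the line is written exactly as "enabled=0" or "enabled=1".
set_first_section_enabled() {
    repo_file=$1
    value=$2
    tmp=$(mktemp) || return 1
    if awk -v val="$value" '
        /^\[/ { section++ }    # a new [section] header starts
        section == 1 && /^enabled=/ { print "enabled=" val; next }
        { print }
    ' "$repo_file" > "$tmp"; then
        cat "$tmp" > "$repo_file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example use, as root, following the steps in the comment above:
#   set_first_section_enabled /etc/yum.repos.d/fedora-core.repo 0
#   set_first_section_enabled /etc/yum.repos.d/fedora-development.repo 1
```

Calling the same helper with the values swapped puts the files back the way they were.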
After this you may revert the changes to /etc/yum.repos.d/*. If you don't revert them and do a yum update later, you will get updated to a full development system!

pjones,

Can we get some progress on this? Maybe I can inspire some confidence in the validity of the attached patches / diagnosis of the problem by explaining how I came to these conclusions:

As said, a friend of mine has a Dell XPS, which by default comes with an nvidia SATA dmraid setup. With a kernel update some time ago this broke for him (and many others). He had managed to fix this manually by adding the necessary "dm xxxx" lines to the init scripts in his initrd, using an initrd generated by mkinitrd 5.0.46 as base. With initrds generated by later mkinitrd versions (with the magic lines added manually for the first few newer versions, and added by mkinitrd itself for later versions), his system broke once again.

So I started by collecting mkinitrd versions 5.0.46 - 5.1.9 and managed to find most of them. By trial and error I found out that this new breakage was introduced by 5.0.47, so 5.0.47 and newer do not work on his system even with the necessary magic "dm xxxxxx" lines in place.

After pinpointing the exact version which broke, I wanted to know where exactly it broke, so I recompiled the Fedora busybox rpm to include the ash applet and inserted "busybox ash" lines between all the lines in the initrd init script. This way I could closely observe the behaviour of nash / the init script during the initrd stage of the boot.

I soon noticed that with 5.0.46 /dev/dm-x nodes showed up in /dev after the magic "dm xxxxx" lines in the init script, whereas with 5.0.47 these didn't show up. The absence of these devices in turn caused the "mkrootdev xxxxx" line from the init script to fail, which in turn caused total boot failure. I could fix the boot with 5.0.47 (and later) by doing a manual mknod from ash for either /dev/dm-x or /dev/root.
Then I first tried rerunning mkblkdevs after the "dm xxxxx" lines, which worked but didn't seem pretty (this is what the second attached patch does). So I did a "diff -ur" between the sources of 5.0.46 and 5.0.47 (huge diff, many internal changes) and found the removal of the mksmartnods call, which is re-added in the first attached patch; that fixes this in a less ugly way.

I hope that explains how I came to these patches and why one of them is needed. Now PLEASE apply one of these before FC-6 so that people with a similar setup can have a working system out of the box.

I hit a roadblock on the yum update mkinitrd trail. Here is the output of that action:

Loading "installonlyn" plugin
Setting up Update Process
Setting up repositories
livna [1/6]
extras-development [2/6]
development [3/6]
gst-0.10-apps [4/6]
gst-0.10-deps [5/6]
gst-0.10-gst [6/6]
Reading repository metadata in from local files
Resolving Dependencies
--> Populating transaction set with selected packages. Please wait.
---> Downloading header for mkinitrd to pack into transaction set.
mkinitrd-5.1.9-1.x86_64.r 100% |=========================| 49 kB 00:05
---> Package mkinitrd.x86_64 0:5.1.9-1 set to be updated
--> Running transaction check
--> Processing Dependency: rtld(GNU_HASH) for package: mkinitrd
--> Restarting Dependency Resolution with new changes.
--> Populating transaction set with selected packages. Please wait.
---> Downloading header for glibc to pack into transaction set.
glibc-2.4.90-23.x86_64.rp 100% |=========================| 135 kB 00:15
---> Package glibc.x86_64 0:2.4.90-23 set to be updated
--> Running transaction check
--> Processing Dependency: glibc-common = 2.4.90-23 for package: glibc
--> Processing Conflict: glibc-common conflicts glibc > 2.4
--> Restarting Dependency Resolution with new changes.
--> Populating transaction set with selected packages. Please wait.
---> Downloading header for glibc-common to pack into transaction set.
glibc-common-2.4.90-23.x8 100% |=========================| 707 kB 01:31
---> Package glibc-common.x86_64 0:2.4.90-23 set to be updated
--> Running transaction check
--> Processing Dependency: glibc-common = 2.4-8 for package: glibc
--> Processing Conflict: glibc-common conflicts glibc < 2.4.90
--> Restarting Dependency Resolution with new changes.
--> Populating transaction set with selected packages. Please wait.
---> Downloading header for glibc to pack into transaction set.
glibc-2.4.90-23.i686.rpm 100% |=========================| 134 kB 00:18
---> Package glibc.i686 0:2.4.90-23 set to be updated
--> Running transaction check
--> Processing Dependency: glibc = 2.4-8 for package: glibc-headers
--> Processing Dependency: glibc = 2.4-8 for package: glibc-devel
--> Restarting Dependency Resolution with new changes.
--> Populating transaction set with selected packages. Please wait.
---> Downloading header for glibc-devel to pack into transaction set.
glibc-devel-2.4.90-23.x86 100% |=========================| 100 kB 00:14
---> Package glibc-devel.x86_64 0:2.4.90-23 set to be updated
---> Downloading header for glibc-headers to pack into transaction set.
glibc-headers-2.4.90-23.x 100% |=========================| 133 kB 00:14
---> Package glibc-headers.x86_64 0:2.4.90-23 set to be updated
--> Running transaction check
--> Processing Dependency: glibc-headers = 2.4-8 for package: glibc-devel
--> Processing Dependency: glibc = 2.4-8 for package: glibc-devel
--> Restarting Dependency Resolution with new changes.
--> Populating transaction set with selected packages. Please wait.
--> Running transaction check
--> Processing Dependency: glibc-headers = 2.4-8 for package: glibc-devel
--> Processing Dependency: glibc = 2.4-8 for package: glibc-devel
--> Restarting Dependency Resolution with new changes.
--> Populating transaction set with selected packages. Please wait.
--> Running transaction check
Error: Unable to satisfy dependencies
Error: Package glibc-devel needs glibc-headers = 2.4-8, this is not available.
Error: Package glibc-devel needs glibc = 2.4-8, this is not available.

So, first let me apologize for being such a tyro, but here I am. Do I need to go back and get these earlier versions of glibc? Or is this a hiding for nothing? Thanks to all.

Hmm, going a bit off-topic for this bug, you could try:
yum update mkinitrd 'glibc*'
If that doesn't help, please include the output of:
rpm -qa|grep glibc
in your next comment.

Yes, updating glibc* and then mkinitrd did the trick. Thanks to Messrs. de Goede & Garden, and, for that matter, all the other cognoscenti who contributed to Bug 30241 & Bug 18642 for providing the magic recipes and incantations to work through this problem. Software engineering may not be the dismal science, but it sure travels some grim paths at times.

For the record, this, in brief, is my setup:
AMD 64 4200
RAID0 (2 x 250 GB Western Digital ATA)
ASUS A8V

I know that this is not a production solution. But if I wanted that I guess I would be using RHEL 4WS as I do at the office. Again, thanks to all concerned for seeing me through the darkness.

Sorry, I meant Bug 203241, you know, this one, and not 30241. Stupid fingers.

(In reply to comment #16)
> Yes, updating glibc* and then mkinitrd did the trick.
Did just the update do the trick, or did you also apply the second patch attached to this bug?

I first yum updated glibc* (to v2.4.90) and then yum updated mkinitrd (to v5.1.9-1), both from the development repositories as you suggested above. After those packages were installed, I re-enabled the standard repositories (and disabled the development ones), applied the kernel update (to take my machine to 2.6.17-1.2174_FC5), and rebooted whilst holding my breath. And so, here I am.

By the way, if memory serves, after the glibc update, the dependency list for mkinitrd was that package only.
Also, all suggested updates, save the kernel, were applied before I hybridized my system. And as a further correction, my RAID0 consists of SATA (not just ATA :-)) drives. Again, thanks to all for helping me through this.

Hmm, so you didn't use / apply any of the patches attached here and still have a working setup; that makes you the second person. Could you attach / include in a comment your /etc/fstab and the output of the "mount" command? Just the lines concerning your / (root) filesystem will do.

Thanks!

Looks like we are getting somewhere, thanks Jesse Keating. See the transcript from irc / #fedora-devel below:

f13 Horray! dm-raid still bust-o on rawhide (:
f13 pjones: strangely enough, rescue mode is able to mount it just fine.
hansg f13, maybe the patch I submitted here will fix this: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=203241
* _Zoltan_ has quit (Read error: 113 (No route to host))
hansg f13, the second patch (workaround) can be applied directly to /sbin/mkinitrd and then recreate the initrd
hansg f13, if you can try this, it helps you and you can then convince pjones to take a look at #203241 you'll be my hero
* f13 looks
pjones I don't have any argument against fixing it. I just don't have any time, either.
hansg I've done a lot of digging and I'm willing to do more if needed, but some response showing that my work (around 12 hours so far) isn't going to /dev/null would be appreciated
hansg f13, also make sure you are using the latest mkinitrd and that you do not have any scsi_adapter usb-storage aliases in /etc/modprobe.conf
f13 hansg: so the problem I'm having is IO error reading sda2 or something like that.
f13 I'll patch, we'll see.
* behdad has quit ("Leaving.")
hansg f13, then it's most likely usb-storage aliases in /etc/modprobe.conf
* jwb grows tired of callion and dnielsen spotting on blogs
f13 hansg: I watched the mkinitrd creation, there were no usb modules added to initrd.
f13 hansg: for rawhide do I need both the mkinitrd patch _and_ the initscripts patch?
f13 n/m, I read it now
* f13 tries the /sbin/mkinitrd patch
hansg f13, you say no usb modules at all? Or just not usb-storage? If you've got no usb-modules at all then you're using a pretty old mkinitrd (or a very new one with which I'm not familiar yet)
hansg f13, rpm -q mkinitrd ?
f13 hansg: oh fun! I updated to newest mkinitrd and now I get the usb modules brought in.
f13 uhci-hcd, ohci-hcd, ehci-hcd
hansg f13, good! That might fix the sda2 error
f13 hrm,
f13 I should try this w/out your patch first.
hansg those are a normal erm "feature" of the newest mkinitrd, as long as usb-storage isn't added things are ok
f13 nod
hansg yes testing without the patch first is a good idea I think
f13 it's helpful when you don't get udev but you have a usb keyboard
* somegeek has quit (Read error: 104 (Connection reset by peer))
f13 peter and I kept hitting this on my ppc mini.
f13 udev would barf the box, but w/out udev we couldn't use the usb keyboard (:
f13 hansg: rebooting w/out your patch.
hansg yes they are I had the same problem when I added a static shell to the initrd to debug this on a friend's PC, no keyboard
f13 hansg: so, with the unpatched new mkinitrd and a recreated initrd, it just works.
hansg that's good news, lots of people tell me that, but it doesn't work on my friend's PC without the patch :|
f13 suck.
hansg what does "mount" say for root?
hansg and /etc/fstab?
f13 /dev/dm-1 on /boot type ext3 (rw)
f13 /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
f13 /dev/VolGroup00/LogVol00 / ext3 defaults 1 1
f13 LABEL=/boot /boot ext3 defaults 1 2
f13 --- Physical volume ---
f13 PV Name /dev/dm-2
hansg Thanks, that's different from what my friend has, he has /dev/dm-X as root device instead of /dev/mapper/VolGroup00-LogVol00, that's probably why the missing /dev/dm-x nodes (missing from the /dev in the initrd) bite him
* tibbs_ has quit (Remote closed the connection)
f13 hansg: probably, is he not using LVM?
* stickster (n=pfrields@fedora/stickster) has joined #fedora-devel
hansg I'm pretty sure that if you change your fstab to contain LABEL=/ for root and then rerun mkinitrd you will need my patch
hansg because when using a label the label gets translated to /dev/dm-X and not /dev/mapper/XXXXXX
f13 hrm.
f13 all that is crackrock. I hear pjones screaming about that through the cubewall on a weekly basis.
f13 the stupid naming of crud that is
pjones we shouldn't _ever_ be mounting a dm-N device
pjones If we do, that's a bug.
pjones (but ugh, what a PITA)
f13 pjones: my box has it mounted for /boot/ :/
pjones So I see.
f13 /dev/dm-1 on /boot type ext3 (rw)
f13 ah
* somegeek (i=levin@tor/regular/somegeek) has joined #fedora-devel
f13 so the label translation stuff is getting it wrong again?
hansg then my patch is wrong and the real bug is that LABEL= lines can get translated to /dev/dm-X stuff?

---

Chris (chabotc), can you try to change the line for your root filesystem in /etc/fstab to use /dev/mapper/XXXXXp3 as device instead of LABEL=/ and then recreate your initrd with a pristine (unpatched) mkinitrd?

Some more irc logs:

hansg f13, pjones, If i understand correctly we've pretty much got the dmraid problem confined / defined to wrong LABEL=xxx translation, right?
pjones I did say I haven't looked at it, right?
pjones but even if we get dm-1 instead of /dev/mapper/pdc_whatever , as long as they're the same major:minor that shouldn't cause a failure
hansg pjones, it does because mkinitrd > 5.0.46 (nash > 5.0.46 actually) no longer creates /dev/dm-x in the ramdisk /dev dir and does the mkrootdev line from the init ramdisk script fail when it gets passed /dev/dm-x as a parameter
* Foolish has quit (Read error: 104 (Connection reset by peer))
hansg s/does/thus/
pjones yeah, but it shouldn't be getting /dev/dm-N as a parameter.
pjones if it is, there's another problem being missed
hansg pjones, agreed which seems to happen in the LABEL -> device translation
pjones taking patches ;)
hansg the strange thing is I did try putting /dev/mapper/XXXX in the initrd-init script manually and that didn't work either, but maybe that was with an older mkinitrd when I was debugging this I've tried about 10 different mkinitrd versions
hansg I've asked my friend to try it with /dev/mapper/XXXX as root in his fstab, if that fixes things for him I'll take a stab at fixing the LABEL -> device creation

It turns out that, although /dev/dm-x related, the patches attached to this bug are completely wrong. The real problem (for normal setups) is that booting by LABEL= from lvm or dmraid fails. Bug 204768 was created for this problem and contains a proper patch, so I'm closing this one as a dup of 204768.

*** This bug has been marked as a duplicate of 204768 ***
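
For anyone checking whether their own setup matches the case discussed in this bug, here is a small sketch that reports how the root filesystem is named in an fstab file. It assumes a standard whitespace-separated fstab, and root_spec_kind is a hypothetical helper name, not an existing tool. Going by the discussion above, a LABEL= root on lvm or dmraid is the case that fails, while a /dev/mapper/ path was reported to work.

```shell
#!/bin/sh
# Hypothetical helper: classify the device spec used for "/" in an fstab file.
# Prints one of: label, dm-node, mapper, other, none.
root_spec_kind() {
    fstab=${1:-/etc/fstab}
    # Take the first non-comment entry whose mount point (field 2) is "/".
    spec=$(awk '$1 !~ /^#/ && $2 == "/" { print $1; exit }' "$fstab")
    case $spec in
        LABEL=*)       echo "label" ;;    # e.g. LABEL=/
        /dev/dm-*)     echo "dm-node" ;;  # e.g. /dev/dm-1
        /dev/mapper/*) echo "mapper" ;;   # e.g. /dev/mapper/VolGroup00-LogVol00
        "")            echo "none" ;;     # no root entry found
        *)             echo "other" ;;    # e.g. /dev/VolGroup00/LogVol00
    esac
}

# Example use:
#   root_spec_kind /etc/fstab
```

This only inspects fstab; it does not show what mkinitrd actually writes into the initrd init script.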