Bug 117575
Summary: rhgb breaks root-on-LVM fsck (was: [device-mapper] block device numbers are unstable)
Product: [Fedora] Fedora
Component: rhgb
Version: rawhide
Hardware: i386
OS: Linux
Status: CLOSED NEXTRELEASE
Severity: medium
Priority: medium
Reporter: Alexandre Oliva <oliva>
Assignee: Daniel Veillard <veillard>
QA Contact: Mike McLean <mikem>
CC: bugzilla_rhn, gbpeck, jorton, sct, th0ma7
Doc Type: Bug Fix
Last Closed: 2004-09-09 12:44:17 UTC
Bug Blocks: 123268
Description
Alexandre Oliva
2004-03-05 14:59:35 UTC
Known problem. I've been using the following scriptlet in rc.sysinit as a temporary workaround:

    # If we're running with LVM and with no devfs, we need to populate /dev/mapper
    # now to pick up dynamic major/minor numbers.
    dm_minor=`LC_ALL=C fgrep device-mapper /proc/misc | LC_ALL=C awk '{print $1}'`
    if [ "x$dm_minor" != "x" -a ! -e /dev/.devfsd ] ; then
        mkdir -p /dev/mapper
        mount -t tmpfs -o context=system_u:object_r:fixed_disk_device_t tmpfs /dev/mapper/
        mknod --mode 0600 /dev/mapper/control c 10 "$dm_minor"
        # We can't lock the lvm config files if root is still r/o
        lvm vgscan --ignorelockingfailure --mknodes
    fi

but it has nasty interactions with SELinux. The current plan is to use the initrd's root device node for the initial fsck instead, and to unmount the initrd only after that check has completed.

Created attachment 98326
lvm-on-root rc.sysinit fix
Defer initrd unload until after root fs check;
Use the /initrd/dev/* root device node if available.
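The device selection described in the attachment can be sketched as a small shell function. This is a minimal sketch, not the code in attachment 98326: it assumes the initrd is still mounted (for example at /initrd) with its /dev intact, and that the root device is an absolute /dev path; the function name initrd_fsck_device is hypothetical. It also joins the two paths without producing the doubled slash (/initrd//dev/...) mentioned later in the thread.

```shell
# Hypothetical helper (not from attachment 98326): print the device node the
# root fsck should use. Prefer the node inside the still-mounted initrd,
# because its device numbers match the running kernel; otherwise fall back
# to the node in the (possibly stale) read-only root filesystem.
initrd_fsck_device() {
    initrd_dir=${1%/}                        # "/initrd/" -> "/initrd"
    root_dev=$2                              # e.g. "/dev/vg00/lvol0"
    candidate="$initrd_dir/${root_dev#/}"    # join without a doubled slash
    # A real script might insist on -b (block device); -e keeps this testable.
    if [ -e "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        printf '%s\n' "$root_dev"
    fi
}
```

For example, `initrd_fsck_device /initrd/ /dev/vg00/lvol0` prints /initrd/dev/vg00/lvol0 when that node exists, and plain /dev/vg00/lvol0 otherwise.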
This patch fixes things for me. Bill, can you give it a check and merge it if it looks sane?

*** Bug 116573 has been marked as a duplicate of this bug. ***

This patch isn't working for me... My system does boot with kernel 2.6.1-1.65smp, but it takes about 5 minutes to get past "Setting up local disks"... My system disk looks like this:

- 1st: Win2k3 Server, 9 GB
- 2nd: /boot, 256 MB
- 3rd: vg00:
  - /, 4096 (lvol1)
  - /var, 2048 (lvol2)
  - /tmp, 1024 (lvol3)

I've tried kernels 2.6.3-2.1.242smp, 246smp and 254smp, and I always get this:

    insmod: error inserting '/lib/modules/2.6.3-2.1.246smp/kernel/drivers/acpi/toshiba_acpi.ko': -1 No such device
    Initialising USB Controller (uhci-hcd): OK
    Mounting USB filesystem: OK
    Checking root filesystem
    fsck.ext3: /dev/vg00/lvol0:
    The superblock could not be read or does not describe a correct ext2
    filesystem. If the device is valid and it really contains an ext2
    filesystem (and not swap or ufs or something else), then the superblock
    is corrupt, and you might try running e2fsck with an alternative superblock:
        e2fsck -b 8193 <device>

    Invalid argument while trying to open /dev/vg00/lvol0
    FAILED

    *** An error occurred during the filesystem check.
    *** Dropping you to a shell; the system will reboot
    *** when you leave the shell.
    Give root password for maintenance
    (or type Control-D to continue):
    cat: /sys//devices/pci0000:00/0000:00:04.3/usb2/2-1/2-1:1.0/host1/1:0:0:0/type: No such file or directory
    Unable to open /etc/fstab for writing: read-only file system
    cat: /sys//class/usb/lp0/bNumConfigurations: No such file or directory
    /etc/hotplug/usb.agent: line 144: [: too many arguments

After typing the root password:

    (repair filesystem) 1# df
    Filesystem             1K-blocks    Used Available Use% Mounted on
    /dev/mapper/vg00-lvol0   4128448 3292616    626120  85% /
    none                     4128448 3292616    626120  85% /dev/shm
    (repair filesystem) 2#

I've tried the patch and attached my actual /etc/rc.d/rc.sysinit file.

Created attachment 98529
Patched rc.sysinit file
What is specified as the root filesystem in your /etc/fstab and /etc/grub.conf files?

/boot/grub/grub.conf (partial):

    title Fedora Core (2.6.3-2.1.253smp)
        root (hd0,1)
        kernel /vmlinuz-2.6.3-2.1.253smp ro root=/dev/vg00/lvol0 rhgb
        initrd /initrd-2.6.3-2.1.253smp.img
    title MS Windows Server 2003
        rootnoverify (hd0,0)
        chainloader +1

Maybe it's new in kernel 2.6, but my LV is seen as /dev/mapper/vg00-lvolX instead of /dev/vg00/lvolX... and the error I'm getting at boot time with any updated kernel is: fsck.ext3: /dev/vg00/lvol0: blabla..

Added in CVS, will be in 7.47-1.

Seems to have fixed the problem for me. It could use a cosmetic improvement, though: at the moment it prints `/initrd//dev/int/fc2test: ...' in the fsck message instead of the FS label. I don't particularly care about not getting the label, but the double slash looks ugly and shouldn't be hard to fix.

Maybe my problem is a bit different? Nobody has a clue?

Can you try with the current initscripts package?

Still the same problem... Booting takes really long with the 2.6.1 kernel, and I'm unable to boot with any updated kernel.

What's the fsck error message you get now? At least the pathname to the root device should have changed after sct's patch.

In fact, the first comment I made about my fsck error was with sct's patch applied:

    fsck.ext3: /dev/vg00/lvol0:
    The superblock could not be read or does not describe a correct ext2
    filesystem. If the device is valid and it really contains an ext2
    filesystem (and not swap or ufs or something else), then the superblock
    is corrupt, and you might try running e2fsck with an alternative superblock:
        e2fsck -b 8193 <device>

    Invalid argument while trying to open /dev/vg00/lvol0
    FAILED

    *** An error occurred during the filesystem check.
    *** Dropping you to a shell; the system will reboot
    *** when you leave the shell.

Did you install sct's patch or update the package, like he suggested? I suppose you did the former, since the latter has the effect of prepending /initrd/ to the root block device.
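The same logical volume shows up in this thread both as /dev/vg00/lvol0 and as /dev/mapper/vg00-lvol0. LVM2 names the device-mapper device <vg>-<lv>, doubling any hyphen that occurs inside either name, while /dev/<vg>/<lv> is an extra node or symlink that LVM creates (for example via vgscan --mknodes, as in the scriptlet earlier in the thread). A minimal sketch of that naming rule; dm_node and dm_node_from_path are hypothetical helpers, not LVM commands.

```shell
# Map an LVM2 volume group / logical volume pair to its device-mapper node.
# The names are joined with a single '-', and any '-' inside either name is
# doubled so the split stays unambiguous.
dm_node() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

# Split a "/dev/<vg>/<lv>" path, as written in grub.conf or fstab.
dm_node_from_path() {
    rest=${1#/dev/}                      # "vg00/lvol0"
    dm_node "${rest%%/*}" "${rest#*/}"   # "vg00" and "lvol0"
}
```

With these helpers, `dm_node_from_path /dev/vg00/lvol0` yields /dev/mapper/vg00-lvol0, and a volume group named my-vg with an LV named root maps to /dev/mapper/my--vg-root.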
It is the root filesystem it's trying to fsck, right? Make sure your initrd.img actually has code to enable root on LVM2.

I did install sct's patch... and with the usual up2date I haven't found any new initscripts package. It is the root filesystem on which fsck is trying to run. On the 2.6 kernel, how can I tell whether lvm2 is enabled within the initrd image? Do I simply call "mkinitrd NewImageFile 2.6.3kernel-smp-what-so-ever", or do I have to add a parameter like "--with=lvm2"? At boot time I've seen "vg00 activated has lvm2" or something like that.

I've just been able to boot the 2.6.3 kernel... finally. The problem is that, when installing the newer kernel (2.6.1 to 2.6.3-xxx), it didn't change the grub.conf file properly. It seems that the device has to be "/dev/vg00-lvol0" instead of "/dev/vg00/lvol0". So I just changed my line:

    kernel /vmlinuz-2.6.3-2.1.253.2.1smp ro root=/dev/vg00/lvol0 rhgb

to

    kernel /vmlinuz-2.6.3-2.1.253.2.1smp ro root=/dev/vg00-lvol0

and it worked (I removed rhgb just to see all the output). It still takes about 3 to 5 minutes to get past the "Setting up Logical Volume Management" section... I get the error message "/dev/cdrom1: open failed: read-only file system". How can I remove /dev/cdrom1 from the LVM check at boot time?

Create /etc/lvm/lvm.conf with an entry such as:

    devices {
        filter = [ "a|/dev/md.*|", "r|.*|" ]
    }

The above will only look for physical volumes in RAID block devices; adjust to suit your needs. You may want to have the `r' entry first, and a catch-all for `a'. After you're happy with the performance of lvm vgscan on the booted system, re-run mkinitrd and it will copy lvm.conf into the initrd image.

The proposed lvm.conf works really nicely. It now boots up in a few seconds, like it used to with other Red Hat builds. And with the correction to the grub configuration, it fixed the 2.6.3 upgrade from 2.6.1. Everything is fine now... thanks a lot!

"/dev/vg00-lvol0" looks wrong: I'd expect it to be "/dev/mapper/dev/vg00-lvol0".
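The ordering advice for the lvm.conf filter follows from how LVM evaluates it: rules are tried in order, the first regex that matches a device path decides between accept (a) and reject (r), and a device that matches no rule is accepted. The function below is a rough emulation of that rule for experimenting with patterns before rebuilding the initrd. It is a sketch under the assumption that grep -E behaves closely enough to LVM's own regex matcher; lvm_filter_verdict is a hypothetical name and this is not how LVM itself is implemented.

```shell
# Rough emulation of lvm.conf "devices { filter = [...] }" evaluation.
# Each rule looks like a|REGEX| or r|REGEX|; the character after a/r is the
# delimiter. The first rule whose regex matches the path wins; if no rule
# matches, the device is accepted.
lvm_filter_verdict() {
    dev=$1
    shift
    for rule in "$@"; do
        action=${rule%"${rule#?}"}     # first character: 'a' or 'r'
        body=${rule#?}                 # delimiter, regex, delimiter
        delim=${body%"${body#?}"}      # the delimiter character, e.g. '|'
        regex=${body#?}                # drop the leading delimiter
        regex=${regex%"$delim"}        # drop the trailing delimiter
        if printf '%s\n' "$dev" | grep -Eq -- "$regex"; then
            if [ "$action" = a ]; then echo accept; else echo reject; fi
            return 0
        fi
    done
    echo accept
}
```

With the filter from the thread, `lvm_filter_verdict /dev/cdrom1 'a|/dev/md.*|' 'r|.*|'` reports reject. The reversed form suggested above, `'r|/dev/cdrom.*|' 'a|.*|'`, also rejects /dev/cdrom1 but accepts every other device, which is the safer choice when physical volumes are not all on RAID.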
Are you sure it's the former? Also, are you running in SELinux enforcing mode or not?

I've just tried /dev/mapper/vg00-lvol0 instead of /dev/vg00-lvol0 (which was working too :) and it worked properly with the latest 2.6.3 kernel. When I installed the system, I created vg00 and its LVs with the older Core2-test1 installer. It created a grub.conf file with /dev/vg00/lvol0 as the / filesystem. With 2.6.3 it seems I needed to use /dev/mapper/vg00-lvol0 instead, but the kernel upgrade did not change the grub file automatically; instead it simply cloned the last entry (from the 2.6.1 kernel). As for SELinux enforcing mode, I don't know... I would need to read more about it. Note: I have not tried /dev/mapper/dev/vg00-lvol0... It didn't match the usual df output. But if you insist, I can try it.

"/dev/mapper/dev/vg00-lvol0" was a typo; it should indeed be "/dev/mapper/vg00-lvol0", as you tried. If it's working for you now, I can only imagine that it was an initrd issue, as nothing relevant has changed in the kernel itself. But as long as it's working, I'll close it for now. Please reopen if you find it is not properly fixed.

Err... I had reopened this because of the cosmetic issue pointed out in https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=117575#c10. Should I file a separate bugzilla for it?

Yes, please do.

FC3test1 fails to boot again if the root device-mapper device changes. /initrd/dev does not exist any more, so once again we're trying to fsck the outdated device-mapper device in the read-only root filesystem.

FYI, I believe I'm having the same issue. I have no problem booting 2.6.3, but cannot boot with any kernel > 2.6.3: same error as comment #15, above. When I enter lvm commands by hand after being dropped into the shell, I get the locking type 1 errors. When I try --ignorelockingfailure, the lvm commands succeed, but all logical partitions are reported as invalid and will not mount. I've tried both "stable" and "latest" dm and lvm packages.
All of the above works with 2.6.3, but not with 2.6.4, 2.6.5 or 2.6.7 (I didn't try 2.6.6). All kernels were built from the 2.6.3 .config (make oldconfig). I also tried with and without SELinux support in the kernel, and I hand-checked the initrd files created by mkinitrd against the working 2.6.3 one: the only differences are the kernel modules.

Created attachment 102011
lvm-on-root rc.sysinit fix for FC3test1
rhgb really abuses /initrd. It mounts a ramfs on top of the original /initrd,
hiding the device nodes we actually need, which means we end up not
umounting the original /initrd at all (or at least not as early as we could).
This patch, which depends on the directory /dev/initrd-dev existing (I just made
it up, and I'm open to suggestions), binds /initrd/dev to this new directory, umounts
/initrd early, before rhgb starts and mounts something over it, and then umounts
/dev/initrd-dev after fscking the root filesystem. How does this look?
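The ordering this patch relies on can be written out as a dry-run sketch. The function below only prints the commands in sequence; nothing is mounted, unmounted or checked. The directory /dev/initrd-dev comes from the patch description, but the function name plan_root_fsck is hypothetical and the fsck line is a placeholder, not the actual rc.sysinit invocation.

```shell
# Dry-run sketch of the ordering described for attachment 102011.
# It only prints commands; the fsck line is a placeholder.
plan_root_fsck() {
    root_dev=$1                  # e.g. /dev/mapper/vg00-lvol0
    keep=/dev/initrd-dev         # directory name proposed in the patch
    echo "mount --bind /initrd/dev $keep"   # keep the initrd's device nodes reachable
    echo "umount /initrd"                   # free /initrd before rhgb mounts over it
    echo "# rhgb may start here"            # marker line only; nothing is started
    echo "fsck $keep/${root_dev#/dev/}"     # check root via the initrd's node
    echo "umount $keep"                     # only after the root fsck has completed
}
```

For /dev/mapper/vg00-lvol0 the check would run against /dev/initrd-dev/mapper/vg00-lvol0, whose major/minor numbers match the running kernel even when the node in the read-only root filesystem is stale.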
Ugh, I made a mistake while editing the patch file I posted. The `elif' line added in the first hunk is missing a `; then' at the end. Sorry about that.

I'm more and more convinced that we shouldn't have to change rc.sysinit because of rhgb's insistence on abusing /initrd. I'm changing this bug to rhgb.

Do I understand correctly that you want rhgb to create its ramfs in a different location than /initrd? If so, do you have a suggestion for where it should mount it? Any name can potentially be used for something else. The point, I guess, was to avoid stepping on the users' or administrators' toes, because /initrd is already somewhat reserved... Daniel

The problem is precisely that /initrd is reserved: it is still in use at the time rhgb starts (if it starts early). I have suggested /tmp/rhgb, /var/rhgb, or /var/tmp/rhgb. If the directory doesn't exist at the time we attempt to early-start rhgb (before the root fsck), we wait until root is remounted rw, and then we can create whatever directory we need and proceed. It would be just a small delay for rhgb in the unlikely case that the directory we need is not there. The only problem is if /tmp or /var are mounted as separate filesystems: then umounting the rhgb-mounted filesystem may be a bit tricky. Another option: if rhgb umounts /initrd twice, as it appears to me that it does (because when it starts it mounts something else on top of /initrd, but after it completes, I can't see the original initrd mounted any more), it might as well use /initrd/tmp/rhgb. I agree it's not a simple problem, but we can't have two different programs competing to decide what's mounted on /initrd, and clearly what rhgb mounts on /initrd is not an initrd image, so I think rhgb is the one that must give.

Ugh. This problem is present in FC2 as well. As soon as I upgraded a remote box to kernel-2.6.8-1.520, it wouldn't reboot, because the device number associated with the logical volume holding the root filesystem had, for some reason, changed.
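The directory-selection fallback proposed in this discussion can be sketched as follows: use the first of /tmp/rhgb, /var/rhgb and /var/tmp/rhgb that already exists, and otherwise tell the caller to defer rhgb until root is read-write and the directory can be created. This is a hypothetical illustration of the suggestion, not rhgb code; the function name and the optional root argument (so the check can run against a test tree) are assumptions.

```shell
# Sketch of the suggested fallback: print the first existing candidate
# directory, or "defer" (exit status 1) if none exists yet, so the caller
# can start rhgb after root has been remounted read-write.
# $1 is an optional prefix standing in for "/", used for testing.
rhgb_mount_dir() {
    root=${1%/}
    for dir in /tmp/rhgb /var/rhgb /var/tmp/rhgb; do
        if [ -d "$root$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    echo defer
    return 1
}
```

The candidate order mirrors the list given above. As noted in the thread, when /tmp or /var is a separate filesystem, the caller would still need to umount rhgb's filesystem before that filesystem can be unmounted at shutdown.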
I have moved rhgb out of /initrd into /etc/rhgb; this should fix this problem. This is committed in CVS, and test (S)RPMS are available under http://people.redhat.com/veillard/testing/ for FC2 and Rawhide, version 0.12.4. I hope this resolves the issue; it really should. Daniel

Thanks for the change. /etc/rhgb looks like an excellent solution! FWIW, udev in the initrd (and possibly even in rc.sysinit) already offers a solution to the problem of fscking the root filesystem, since /dev is then writable, but for people who choose to disable udev, it would still be a problem.