Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 863978
Summary: | Odd btrfs failures in F18 and Rawhide | |||
---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Richard W.M. Jones <rjones> | |
Component: | btrfs-progs | Assignee: | Zach Brown <zab> | |
Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | |
Severity: | unspecified | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 18 | CC: | aoliva, blair, bugzilla, esandeen, gansalmon, itamar, jonathan, josef, kernel-maint, kxra, lightdot, madhu.chinakonda, mmahut, npmccallum, rjones, robatino, sjensen, sweil | |
Target Milestone: | --- | Keywords: | Reopened | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 907554 (view as bug list) | Environment: | ||
Last Closed: | 2013-07-03 14:24:00 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 910269, 907554, 910270, 922891 |
Description
Richard W.M. Jones
2012-10-08 08:49:22 UTC
Here is another one (i686 this time .. does seem to be more prevalent on 32 bit, but I have seen it one time on 64 bit): btrfsck /dev/sda1 Check tree block failed, want=139264, have=0 Check tree block failed, want=139264, have=0 Check tree block failed, want=139264, have=0 read block failed check_tree_block Couldn't read chunk root guestfsd: error: /dev/sda1: Check tree block failed, want=139264, have=0 Check tree block failed, want=139264, have=0 Check tree block failed, want=139264, have=0 read block failed check_tree_block Couldn't read chunk root libguestfs: recv_from_daemon: 256 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 4c | 00 00 00 01 | 00 12 34 91 | ... libguestfs: trace: btrfs_fsck = -1 (error) btrfs_fsck: /dev/sda1: Check tree block failed, want=139264, have=0 Check tree block failed, want=139264, have=0 Check tree block failed, want=139264, have=0 read block failed check_tree_block Couldn't read chunk root http://kojipkgs.fedoraproject.org//work/tasks/261/4570261/build.log Here are another couple (on x86-64): mkfs.btrfs /dev/sda1 libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 3d | 00 00 00 01 | 00 12 34 a1 | ... libguestfs: trace: mkfs_btrfs = 0 libguestfs: trace: mount "/dev/sda1" "/" libguestfs: send_to_daemon: 68 bytes: 00 00 00 40 | 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 00 | ... guestfsd: main_loop: proc 317 (mkfs_btrfs) took 0.33 seconds guestfsd: main_loop: new request, len 0x40 mount -o /dev/sda1 /sysroot/ [ 93.385809] device fsid 5137f603-6b78-46f6-9d95-e85bf55202c2 devid 1 transid 3 /dev/sda1 [ 93.386822] device fsid 5137f603-6b78-46f6-9d95-e85bf55202c2 devid 1 transid 3 /dev/sda1 [ 93.408020] btrfs: disk space caching is enabled [ 93.459089] btrfs: failed to recover relocation [ 93.477535] btrfs: open_ctree failed mount: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so libguestfs: recv_from_daemon: 284 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 01 | 00 12 34 a2 | ... libguestfs: trace: mount = -1 (error) mount: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so mkfs.btrfs /dev/sda1 libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 3d | 00 00 00 01 | 00 12 34 a9 | ... libguestfs: trace: mkfs_btrfs = 0 libguestfs: trace: mount "/dev/sda1" "/" libguestfs: send_to_daemon: 68 bytes: 00 00 00 40 | 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 00 | ... guestfsd: main_loop: proc 317 (mkfs_btrfs) took 0.35 seconds guestfsd: main_loop: new request, len 0x40 mount -o /dev/sda1 /sysroot/ [ 96.384211] device fsid 962db3c0-4153-450b-9ca7-c9216e81afe3 devid 1 transid 3 /dev/sda1 [ 96.385314] device fsid 962db3c0-4153-450b-9ca7-c9216e81afe3 devid 1 transid 3 /dev/sda1 [ 96.394158] btrfs: disk space caching is enabled [ 96.428656] btrfs: failed to recover relocation [ 96.437190] btrfs: open_ctree failed mount: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so libguestfs: recv_from_daemon: 284 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 01 | 00 12 34 aa | ... libguestfs: trace: mount = -1 (error) mount: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so http://kojipkgs.fedoraproject.org//work/tasks/354/4570354/build.log The common factor seems to be that the filesystem is created (mkfs.btrfs /dev/sda1) and then it is immediately used -- eg. mounted or some btrfs subtool is run on it. There is no pause or sync between the operations. In libguestfs we have been testing this for a long time, and this change has only just happened, so it must be some change in the kernel or btrfs tools. Raised upstream: http://thread.gmane.org/gmane.comp.file-systems.btrfs/20257 Chris Mason on the upstream mailing list suggested adding this patch to btrfs-progs: commit 6eba9002956ac40db87d42fb653a0524dc568810 Author: Goffredo Baroncelli <kreijack> Date: Tue Sep 4 19:59:26 2012 +0200 Correct un-initialized fsid variable I did that and it did appear to fix the problem. I'm still running tests to be sure about this. (Instead of cherry picking that patch, we could instead move to a later version of btrfs-progs, since we haven't updated in about 2 months). btrfs-progs-0.19.20120817git043a639-2.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/btrfs-progs-0.19.20120817git043a639-2.fc18 Package btrfs-progs-0.19.20120817git043a639-2.fc18: * should fix your issue, * was pushed to the Fedora 18 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing btrfs-progs-0.19.20120817git043a639-2.fc18' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-15895/btrfs-progs-0.19.20120817git043a639-2.fc18 then log in and leave karma (feedback). (In reply to comment #4) > Chris Mason on the upstream mailing list suggested adding > this patch to btrfs-progs: > > commit 6eba9002956ac40db87d42fb653a0524dc568810 > Author: Goffredo Baroncelli <kreijack> > Date: Tue Sep 4 19:59:26 2012 +0200 > > Correct un-initialized fsid variable > > I did that and it did appear to fix the problem. I'm > still running tests to be sure about this. Unfortunately although this patch has reduced the occurrence of the bug, I saw it happen again last night even with the patched btrfs-progs. modprobe btrfs [ 15.716610] Btrfs loaded grep ^[[:space:]]*btrfs$ /proc/filesystems mkfs.btrfs /dev/sda1 /dev/sdb1 [ 16.656467] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 1 /dev/sda1 [ 16.657467] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 1 /dev/sda1 [ 17.227381] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 2 transid 3 /dev/sdb1 fsync /dev/sda fsync /dev/sdb fsync /dev/sdc fsync /dev/sdd libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 3d | 00 00 00 01 | 00 12 34 04 | ... libguestfs: trace: mkfs_btrfs = 0 libguestfs: trace: mount "/dev/sda1" "/" libguestfs: send_to_daemon: 68 bytes: 00 00 00 40 | 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 00 | ... guestfsd: main_loop: proc 317 (mkfs_btrfs) took 2.22 seconds guestfsd: main_loop: new request, len 0x40 mount -o /dev/sda1 /sysroot/ [ 17.512337] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 4 /dev/sda1 [ 17.758300] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 2 transid 4 /dev/sdb1 [ 17.857285] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 4 /dev/sda1 [ 17.893279] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 4 /dev/sda1 [ 17.909277] btrfs: disk space caching is enabled [ 17.943272] btrfs bad tree block start 0 135168 [ 17.955270] btrfs: open_ctree failed mount: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so libguestfs: recv_from_daemon: 284 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 01 | 00 12 34 05 | ... libguestfs: trace: mount = -1 (error) http://kojipkgs.fedoraproject.org//work/tasks/9775/4579775/build.log Is Bug 864765 related to this one? I'm not actually finding the SIGABRT in any of the logs, but maybe I've overlooked something. The file system is clearly created and valid without errors, but anaconda is crashing before subvols are created. That other bug looks like a different bug ... Still happens. No resolution upstream yet. Richard, was the device in which the btrfs filesystem is created used recently by another btrfs mount before since the last boot? I've seen recent versions of btrfs keep the device active, possibly still pending some I/O, even after umount of all mount-points of a filesystem complete successfully, and I'm guessing this might explain corruption should information from the old filesystem make to disk and overwrite info from the newly-created one. (In reply to comment #11) > Richard, was the device in which the btrfs filesystem is created used > recently by another btrfs mount before since the last boot? I've seen > recent versions of btrfs keep the device active, possibly still pending some > I/O, even after umount of all mount-points of a filesystem complete > successfully, and I'm guessing this might explain corruption should > information from the old filesystem make to disk and overwrite info from the > newly-created one. No. I can get this by creating a new VM from scratch with a sparse raw disk, running mkfs.btrfs immediately followed by mount. I talked with upstream devs about this, and it is apparently something to do with the writes from userspace (ie. mkfs) not being seen/read by the mount (kernel space). I have to admit I did not fully understand the explanation. I thought I would update people on this bug with the current Rawhide kernel. There is still a bug, although it's a little different now from the original report. Because it seem different, I have cloned this bug (to preserve the CC list) as bug 907554. I would say that the current bug appears to be fixed, but bug 907554 looks rather similar to it. Actually I'm going to reopen this because it occurs on Fedora 18 still (although it's very rare). kernel-3.7.2-204.fc18.x86_64 Captured a few minutes ago: mount -o /dev/sda1 /sysroot/ [ 22.833046] device fsid 3f86f060-d5ef-4b25-b37a-f076f2956038 devid 1 transid 1 /dev/sda1 [ 22.834326] device fsid 3f86f060-d5ef-4b25-b37a-f076f2956038 devid 1 transid 1 /dev/sda1 [ 22.835959] btrfs: disk space caching is enabled [ 22.841717] btrfs: failed to recover relocation [ 22.842300] btrfs: open_ctree failed mount: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so Another example: mount -o /dev/sda1 /sysroot/ [ 75.882682] device fsid c79ef5dc-239a-49cb-b1fd-fd459ca912bd devid 1 transid 4 /dev/sda1 [ 76.014647] device fsid c79ef5dc-239a-49cb-b1fd-fd459ca912bd devid 1 transid 4 /dev/sda1 [ 76.255755] btrfs: disk space caching is enabled [ 76.308880] device fsid c79ef5dc-239a-49cb-b1fd-fd459ca912bd devid 2 transid 4 /dev/sdb1 [ 76.395366] btrfs bad tree block start 0 135168 [ 76.477635] btrfs: open_ctree failed mount: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so libguestfs: recv_from_daemon: 284 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 01 | 00 12 34 05 | ... libguestfs: trace: mount = -1 (error) *stdin*:14: libguestfs: error: mount: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so Full log: http://koji.fedoraproject.org/koji/getfile?taskID=4930070&name=build.log Although this error is much rarer than earlier, it's still happening. After this discussion upstream: http://thread.gmane.org/gmane.comp.file-systems.btrfs/23249 I added this patch to the Rawhide btrfs-progs package: https://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=commitdiff;h=8fe354744cd7b5c4f7a3314dcdbb5095192a032f This fixes the problem on Rawhide (but NOT on Fedora 18). *** Bug 816304 has been marked as a duplicate of this bug. *** According to the thread Richard posted above, the actual fix for this is here: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6f60cbd3ae442cb35861bb522f388db123d42ec1 Has this been resolved? (In reply to Nathaniel McCallum from comment #18) > According to the thread Richard posted above, the actual fix for this is > here: > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/ > ?id=6f60cbd3ae442cb35861bb522f388db123d42ec1 > > Has this been resolved? Should be. That went into 3.9 and F18 is well past that. However, there were different btrfs-progs commits too. (In reply to Nathaniel McCallum from comment #18) > According to the thread Richard posted above, the actual fix for this is > here: > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/ > ?id=6f60cbd3ae442cb35861bb522f388db123d42ec1 > > Has this been resolved? This looks a lot more plausible as a fix. However I'll need to run another libguestfs build with all the btrfs tests reenabled to really verify it. Yup, confirmed this is fixed. |