Bug 559290
Summary: LVMError: lvcreate failed for VolGroup/lv_root - 512M insufficient for Fedora install
Product: Fedora
Reporter: James Laska <jlaska>
Component: lvm2
Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: 13
CC: agk, awilliam, bmarzins, bmr, dcantrell, den.mail, dwysocha, heinzm, jlaska, jonathan, jonstanley, jturner, kparal, lmacken, lvm-team, mbanas, mbroz, meetmehiro, mpatocka, msnitzer, petersen, prajnoha, prockai, robatino, vanmeeuwen+fedora
Target Milestone: ---
Target Release: ---
Keywords: CommonBugs
Hardware: All
OS: Linux
Whiteboard: anaconda_trace_hash:3d87f7226a0e99c05a950567995483787c5b0890411719a260b384bdfd22b8d8 https://fedoraproject.org/wiki/Common_F13_bugs#lvm-memory-usage
Fixed In Version: lvm2-2.02.62-2.fc13
Doc Type: Bug Fix
Last Closed: 2010-03-19 18:07:07 UTC
Bug Blocks: 538274
Description
James Laska
2010-01-27 16:55:01 UTC
Created attachment 387129 [details]
Attached traceback automatically from anaconda.
Encountered during a rats_install test run (for details see https://fedoraproject.org/wiki/Test_Results:Fedora_13_Rawhide_Acceptance_Test_1). RATS can be initiated by running:

# git clone git://git.fedorahosted.org/autoqa.git
# cd autoqa/tests/rats_install
# ln -s ../../lib/python ../../lib/autoqa
# PYTHONPATH="../../lib" install.py -a x86_64 -i http://serverbeach1.fedoraproject.org/pub/alt/stage/rawhide/x86_64/os -x "updates=http://jlaska.fedorapeople.org/updates-557588.img" http://download.fedoraproject.org/pub/fedora/linux/development/x86_64/os

The kickstart used is fairly basic and includes the following storage options:

# partitioning - nuke and start fresh
zerombr
clearpart --all --initlabel
autopart
bootloader --location=mbr

Created attachment 387393 [details]
anaconda-logs.tgz
# tar -ztvf /tmp/anaconda-logs.tgz
-rw-r--r-- root/root 82985 2010-01-28 12:45 tmp/autoqa/log/minimon/anacdump.txt
-rw-r--r-- root/root 7400 2010-01-28 12:44 tmp/autoqa/log/minimon/anaconda.log
-rw-r--r-- root/root 5433 2010-01-28 12:42 tmp/autoqa/log/minimon/program.log
-rw-r--r-- root/root 51965 2010-01-28 12:44 tmp/autoqa/log/minimon/storage.log
-rw-r--r-- root/root 46011 2010-01-28 12:42 tmp/autoqa/log/minimon/messages.log
One important piece of information appears to be missing: dlehman shared the fact that these failures occurred within a guest install. The guest was apparently only given 512M of memory? The first big test to help isolate where the problem is would be to test the same install on a physical (host) machine with mem=512M on the kernel command line. This will tell us whether the problem is in the virt block stack or beneath it.

Bare metal x86_64 with mem=512 doesn't get very far at all. All I see is an initial boot message:

Probing EDD (edd=off to disable)... ok

Followed by nothing else. Sure sounds like 512 isn't enough anymore.

Um, mem=512 is 512K... try mem=512M :) I'd be very surprised to find 512M is suddenly a non-starter for x86_64. I have an F12 x86_64 w/ LVM root running perfectly fine within a 512M guest (it happens to be running rawhide's kernel-2.6.33-0.23.rc5.git1.fc13, but I can put any kernel you'd like me to try on it).

(In reply to comment #6)
> Um, mem=512 is 512K...

Actually 'mem=512' may even be 512b; regardless, 'mem=512M' should help you make forward progress.

Heh, what, 512K isn't good enough? :)

Okay, when using mem=512M, I'm able to get to stage#2 and start selecting drives to use for install. But then anaconda gets OOM killed.

(In reply to comment #8)
> Heh, what, 512K isn't good enough? :)
>
> Okay, when using mem=512M, I'm able to get to stage#2 and start selecting
> drives to use for install. But then anaconda gets OOM killed.

It gets OOM killed when? Before or after you've provisioned the storage (I assume you're just using the defaults, which include LVM LVs et al)? Before formatting the filesystems? Just after formatting the filesystems? While installing packages? In any case: it sounds like rawhide's current installer kernel has a nasty leak somewhere. Comment #0 indicates an anaconda crash/OOM when you're creating an LVM LV. Does it always crash/OOM there?
Can you change the storage config to not use LVM at all and see if you still get the OOM?

(In reply to comment #9)
> It gets OOM killed when? Before or after you've provisioned the storage (I
> assume you're just using the default -- which include LVM LVs et al). Before
> formatting the filesystems? Just after formatting the filesystems? While
> installing packages?

Yes, I'm using the default partitioning scheme, which includes LVM. When tested yesterday, it failed prior to formatting. However, I believe it may have been examining the LVM metadata on the drives for consistency.

> In any case: sounds like rawhide's current installer kernel has a nasty leak
> somewhere. comment #0 indicates an anaconda crash/OOM when you're creating an
> LVM LV. Does it always crash/OOM there? Can you change the storage config to
> not use LVM at all and see if you still get the OOM?

I can try this when I have physical access to the system again. Otherwise, I can reproduce this using a virt guest. I should note that the OOM kill doesn't happen during partitioning; it seems to happen while anaconda is scanning existing partitions on the disks selected for install. The disks previously had LVM data on them. For example, the last command I see on the console is:

16:26:46,011 INFO : Running... ['lvm', 'lvchange', '-a', 'y', 'vg_test1217/lv_root']

So I think this means that there isn't a way to work around this by not using LVM partitions, since LVM operations still take place during the install process.

I did try blanking out the disks before proceeding. This allowed me to attempt an install without using LVM. However, I then encountered bug#560017

(In reply to comment #11)
> I did try blanking out the disks before proceeding.
> This allowed me to attempt an install without using LVM. However, I then
> encountered bug#560017

I've retested that procedure, and was able to install a virt guest with only 512M of memory without using any LVM logical or physical volumes.

(In reply to comment #0)
> The following was filed automatically by anaconda:
> anaconda 13.22 exception report
> Traceback (most recent call first): ...
> LVMError: lvcreate failed for VolGroup/lv_root: 11:48:42,310 INFO :
> Running... ['lvm', 'lvcreate', '-L', '6760m', '-n', 'lv_root', 'VolGroup']

BTW, this shouldn't have anything to do with OOM, but mbroz just pointed out that new LVM2 will treat '6760m' differently than '6760M': 'm' is 1*1000*1000 whereas 'M' is 1*1024*1024.

Created attachment 389060 [details]
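The suffix difference mbroz describes can be sanity-checked with a short sketch. This is purely illustrative of the arithmetic in his comment, not lvm's actual size parser:

```python
def size_bytes(spec: str) -> int:
    """Interpret a size like '6760m' or '6760M'.

    Per the comment above: lowercase 'm' is an SI megabyte (1000*1000
    bytes), uppercase 'M' is a binary mebibyte (1024*1024 bytes).
    Illustrative only; not lvm's real parser.
    """
    value, suffix = int(spec[:-1]), spec[-1]
    factor = {"m": 1000 * 1000, "M": 1024 * 1024}[suffix]
    return value * factor

# For the 6760 figure from the traceback, the two readings differ by
# 328,373,760 bytes (a bit over 313 MiB), so the resulting LV size
# depends on which suffix the tool emits.
delta = size_bytes("6760M") - size_bytes("6760m")
print(delta)  # → 328373760
```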
Attached traceback automatically from anaconda.
cc+ agk

Reviewed during the F13 Alpha blocker bug meeting: 512M was thought to be a common environment in virtualization, and LVM is used in the default install path. The team concluded that this issue could be moved to block F13Beta and reassigned to LVM for further investigation. Alasdair, do you have any thoughts on this issue?

07:04:25,471 INFO : Running... ['lvm', 'pvcreate', '/dev/sda3']
07:04:29,894 INFO : Running... ['lvm', 'pvcreate', '/dev/sdb1']
07:04:31,039 INFO : Running... ['lvm', 'vgcreate', '-s', '4m', 'vg_ibml4blp2', '/dev/sda3', '/dev/sdb1']
07:04:38,116 INFO : Running... ['lvm', 'lvcreate', '-L', '24096m', '-n', 'lv_root', 'vg_ibml4blp2']

Can someone dig through the log and find the exact sizes of /dev/sda3 and /dev/sdb1? Then create similar-sized partitions, run a similar 4 lvm commands outside the installer, check for errors, and see how much memory is used (this will depend to some extent on other devices present and what lvm.conf settings anaconda uses). There may be two separate problems here - a failure followed by a failure to handle the failure sanely.

And are we *still* missing an anaconda 'debug' option to gather lvm debugging information too? There's really no reason to be guessing what lvm is or isn't doing. Either add -vvvv to the lvm cmdline and log output to a file, or set up a debug log file with --config (or in lvm.conf).

Created attachment 389425 [details]
Attached traceback automatically from anaconda.
Note, further review shows this is the cause of the rawhide acceptance test failures on i386 and x86_64 (see https://fedoraproject.org/wiki/Test_Results:Fedora_13_Rawhide_Acceptance_Test_3).

Created attachment 389811 [details]
lvm lvcreate -vvvv -L 6760m -n lv_root vg_test1166test1166
Able to reproduce the failure from the install environment by running commands manually. The following command triggers the failure.
# lvm lvcreate -vvvv -L 6760m -n lv_root vg_test1166
See attached file for verbose command output.
(In reply to comment #20)
> Created an attachment (id=389811) [details]
> lvm lvcreate -vvvv -L 6760m -n lv_root vg_test1166test1166
>
> Able to reproduce the failure from the install environment by running commands
> manually. The following command triggers the failure.
>
> # lvm lvcreate -vvvv -L 6760m -n lv_root vg_test1166
>
> See attached file for verbose command output.

To be clear, the "failure" is _not_ an lvm command failure. The lvcreate runs just fine. Interestingly, while the lvcreate is running, "install exited abnormally [1/1]" is reported and the system starts shutting down. Here is a condensed overview of what that log attachment provides:

# lvm pvcreate /dev/vda2
# lvm vgcreate -s 4m vg_test1166 /dev/vda2
# lvm lvcreate -vvvv -L 6760m -n lv_root vg_test1166

#metadata/pv_map.c:49 Allowing allocation on /dev/vda2 start PE 0 length 1922
#metadata/pv_manip.c:272 /dev/vda2 0: 0 1690: lv_root(0:0)
#metadata/pv_manip.c:272 /dev/vda2 1: 1690 232: NULL(0:0)
...
#mm/memlock.c:100 Locking memory
#install exited abnormally [1/1]
#disabling swap...
#unmounting filesystems... /mnt/runtime done
#mm/memlock.c:141 memlock_count inc to 1
... /proc done /dev/pts done
#mm/memlock.c:150 memlock_count dec to 0
#libdm-common.c:450 Created /dev/mapper/vg_test1166-lv_root /sys done
# /selinux done waiting for mdraid sets to become clean...
...
Logical volume "lv_root" created
#locking/file_locking.c:74 Unlocking /var/lock/lvm/V_vg_test1166
##locking/file_locking.c:51 _undo_flock /var/lock/lvm/V_vg_test1166
##device/dev-io.c:532 Closed /dev/vda2
#sending termination signals...done
#sending kill signals...done
#you may safely reboot your system
#Kernel panic - not syncing: Attempted to kill init!

So this seems to beg the question: why did the "install exit abnormally"? If it is a lack of memory (OOM), shouldn't we see that in the log? If not, can you boot the VM with a serial console configured to capture the full kernel log?
It is very strange that lvcreate would somehow induce anaconda to fail when the 'lvcreate' is executed from the commandline (independent of anaconda).

All testing is done using a serial console. What you see is what you get.

Problem 1:
----------
20:45:09,212 ERR kernel:Killed process 808 (lvm) vsz:120884kB, anon-rss:532kB, file-rss:85288kB

LVM is mmapping 85MB of memory and locking itself. As someone pointed out on our phone call, it may be some extreme locale data, translations, or so (look at /proc/<pid>/maps to see which files are being mapped).

Problem 2:
----------
20:45:09,212 WARN kernel:69578 total pagecache pages

Why is there more than 227MB of cached pages while the kernel kills a user process? LVM locked 85MB of that; who locked the rest? Either there is a kernel bug (the kernel triggers OOM even with cached pages) or some other process used mlockall() on its memory. BTW, for a similar bug, look at https://bugzilla.redhat.com/show_bug.cgi?id=565995; I found that the RHEL installer crashes with OOM even with plenty of cached pages.

This out-of-memory crash is caused by lvm and glibc interaction. For each process with a locale set, glibc mmaps the file /usr/lib/locale/locale-archive. The file contains definitions for all the locales and can be up to 100MB in size (depending on the particular Linux distribution). Lvm calls mlockall() (to avoid deadlocks while suspending the block devices), which locks all of its address space in memory, including this locale-archive file. Thus, lvm uses 100MB of fixed memory during its invocation, and that is what causes the out-of-memory condition.

You can see it even on a normally installed system. If you type: LANG= lvm and look at the VSZ of lvm, it is slightly above 20MB. If you type: LANG=cs_CZ.iso-8859-2 lvm and look at the VSZ, it is over 100MB.

As a short-term fix, we would recommend Anaconda developers unset the environment variable "LANG" or set it to "C" when executing lvm commands.
Using the default locale will prevent glibc from mapping that large file and will cut lvm's memory consumption. As a long-term fix, glibc must be fixed not to map such a big file; see bug 553193. There are other problems because of this large file: dmeventd is a daemon that locks itself in memory, and it constantly takes 100MB of the user's memory because of this locale file.

*** Bug 567489 has been marked as a duplicate of this bug. ***

512M RAM is not enough on i386 either, though there is less VM pressure than on 64-bit.

I found what looks like a similar failure in the live install: a live install of Alpha RC4 fails on an x86-64 VM with 512MB of RAM; the system hangs in the post-install filesystem modification stage. If you run with a console running top open, it hangs earlier and shows all RAM on the system exhausted. It succeeds if I give the VM 1GB of RAM. I suspect this has the same root cause.

New package built in rawhide which we hope has a lower memory footprint - it has a workaround to avoid pinning the locale archive file into memory. Please give it a try. lvm2-2_02_62-1_fc14

Giving Rawhide a try is non-trivial, I think, as it's not easily installable. I think it'd be easier to test if you put it in f13 so we can try an install with the next f13 build...

-- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers

I believe this is fixed in f13 - at least I managed an install on Friday with 512M.

Hmm, but I don't see the newer build - so was there another workaround in anaconda?

No, not as far as we're aware. Memory usage during install is an inexact science; it's not beyond the realms of possibility that it may succeed with 512MB in some circumstances. You _were_ doing an install which actually involved an LVM, right? If you did custom partitioning with no LVM, you obviously wouldn't hit this problem.
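The short-term workaround recommended earlier in this thread (forcing the default locale when invoking lvm, so glibc never maps locale-archive into the address space that mlockall() pins) might look roughly like this from a caller such as anaconda. The helper names here are hypothetical illustrations, not anaconda's actual code:

```python
import os
import subprocess

def lvm_command_env():
    """Environment for lvm invocations: force the "C" locale so glibc
    does not mmap /usr/lib/locale/locale-archive, which lvm's
    mlockall() would otherwise pin in memory (up to ~100MB)."""
    env = dict(os.environ)
    env["LANG"] = "C"
    env.pop("LC_ALL", None)  # LC_ALL would override LANG if set
    return env

def run_lvm(args):
    """Run an lvm subcommand with the slimmed-down environment."""
    return subprocess.run(["lvm"] + list(args), env=lvm_command_env(),
                          capture_output=True, text=True)

# Example (needs lvm2 installed and root privileges, so not run here):
# run_lvm(["lvcreate", "-L", "6760M", "-n", "lv_root", "VolGroup"])
```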
-- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers

Yup, just a default install: I couldn't complete them earlier without bumping VM memory to 1GB, IIRC.

(In reply to comment #28)
> New package built in rawhide which we hope has a lower memory footprint - it
> has a workaround to avoid pinning the locale archive file into memory.
>
> Please give it a try.
>
> lvm2-2_02_62-1_fc14

Can you provide a Fedora 13 build that contains these changes? With that build, we can compose a test install image.

This bug appears to have been reported against 'rawhide' during the Fedora 13 development cycle. Changing version to '13'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Try:
Package: lvm2-2.02.62-1.fc13
Tag: dist-f13-updates-candidate
Status: complete
Built by: agk
ID: 161973

Ticket filed with Fedora release engineering to request one-off install images containing the new lvm2 build. For details, see https://fedorahosted.org/rel-eng/ticket/3515

I just installed Fedora 13 Beta TC0 x86_64 from DVD media: http://serverbeach1.fedoraproject.org/pub/alt/stage/13-Beta.TC0/ I used KVM, and 512MB RAM was enough for a default install; no problems occurred.

VirtualBox still doesn't allow GUI install in Beta TC0. However, text install works with 384M with either the i386 or x86_64 DVD ISO. It fails with 256M on x86_64 with essentially the same LVM-related anaconda traceback as in the OP (I didn't try it with i386).

Tested a graphical (VNC) install of Fedora 13 Beta TC0 on a 512M x86_64 guest (using lvm2-2.02.62-1.fc13). The reported problem is not reproducible. Alasdair: are there other test results you'd like to see against this updated package in Fedora 13? Or is this ready to be included in Fedora 13 Beta?

(In reply to comment #41)
> Alasdair: are there other test results you'd like see against this updated
> package in Fedora 13? Or is this ready to be included in Fedora 13 Beta?
We discussed this on today's LVM call; Alasdair feels this lvm2 package is ready for inclusion in Fedora 13 Beta.

Please submit it as an update via bodhi then, so that we can provide karma and get it included in the branched compose.

As I understand it, MIN_GUI_RAM is still the same as in F12 (393216 = 384*1024). Shouldn't this be the minimum value to test (not 512M)? Also, my testing with vbox shows that the installer's message about not having enough RAM for a graphical install happens when the RAM is less than 415M (not less than 384M, as I would expect). Does anyone else see this behavior, and does it make sense?

lvm2-2.02.62-1.fc13 has been submitted as an update for Fedora 13. http://admin.fedoraproject.org/updates/lvm2-2.02.62-1.fc13

It was decided during the blocker review meeting to focus this bug only on the reported problem. Therefore, since the reported issue no longer occurs with the updated lvm2 packages, I'm going to mark this issue closed. For continued investigation around the documented minimum hardware requirements for Fedora, please see bug#499585

lvm2-2.02.62-2.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.

Hi All, I am installing Fedora 13 on the system below.

System Details:
DELL POWER EDGE: 2850
BIOS VERSION: AO1
x86_64 and 1024MB of RAM

We have created 6 CDs of "Fedora 13 Linux". During installation, we are facing the following errors: "LVMError: lvcreate failed for VolGroup/lv_root" and installation exited abnormally. Could you please help us to resolve this issue?

(In reply to comment #48)
> lvm2-2.02.62-2.fc13 has been pushed to the Fedora 13 stable repository. If
> problems still persist, please make note of it in this bug report.

Hi All, I am installing Fedora 13 on the system below.

System Details:
DELL POWER EDGE: 2850
BIOS VERSION: AO1
x86_64 and 1024MB of RAM

We have created 6 CDs of "Fedora 13 Linux".
During installation, we are facing the following errors: "LVMError: lvcreate failed for VolGroup/lv_root" and installation exited abnormally. Could you please help us to resolve this issue?

(In reply to comment #50)
> Could you please help us to resolve this issue?

The failure details you posted are not specific enough to attribute the error to this bug report. Can you auto-fill your bug report using the bug reporting mechanism provided by the installer? Alternatively, if that is not available, please file a new bug report manually and include all log files referenced at http://fedoraproject.org/wiki/How_to_debug_installation_problems#Log_Files. When you create the new bug, please add a comment here pointing to the new bug report.

I had "Out of memory" problems when installing Fedora 13 i386 from Net install media. My computer is an IBM Thinkpad X31 (PM740 with 1024MB RAM, IDE 40GB HDD). I downloaded the entire Fedora 13 "Everything" repository, and it was available on my local server via FTP. The images were also available via FTP. Those local mirrors were built via rsync and were up to date as of July 11th, 2010. Every netinstall ISO was verified with sha256sum, and every burnt CD was verified with sha256sum.

The installation goes as follows:
- choosing a normal hard drive, then partitioning with 1 /boot primary partition and 1 encrypted LVM group with some volumes for the rest of the system (/, /home, ...) and 1.2GB of swap
- choosing the standard install (default packages)
- the first 1000 packages install fast
- beyond 1000 packages, the install process gets slower and slower (using swap increasingly intensively) until it takes 10+ minutes to install 500kB!
- then, after downloading the OpenOffice core package and trying to unpack it, it says installation exited abnormally

I checked the logs and could read:

ERR kernel: Out of Memory: kill process 72 (loader) score 2804 or a child
ERR kernel: Killed process 359 (anaconda) vsz:290504kB, annon-rss:36kB, file-rss:36kB

I tried to add some kernel parameters at boot (acpi=off nodma=ide and such) but still had no chance of getting Fedora 13 installed (with default packages). I also tried to add swap (up to 10GB!), but it took 20 minutes to install 1100 packages, then 2 hours to install 10 more, until it tried to install "gnome games doc" for half an hour, at which point I interrupted the process. After 3 hours of installation there were still 30 packages to install. The 1024MB of RAM was used entirely and several gigs of the 10GB swap were used. I tried to boot from the net install CD, or from the install kernel, with and without the "askmethod" parameter: still out of memory.

I finally got around that problem by deselecting packages (OpenOffice, Gimp, and other not-useful-for-now packages) to get the number of packages below 1100. It got slow at the end, but it got through the entire installation process without complaining. I tried 7 times with different installation methods, and spent 10+ hours trying to install it. So there's definitely a memory problem. Feel free to ask questions. Bye for now. Dag

Please file a new bug for your problem. It is not the same as this bug.