Bug 1128341
Summary: | aarch64: anaconda in VM dies with "SystemError: Could not determine system architecture." because blivet assumes aarch64 always has EFI | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Richard W.M. Jones <rjones> |
Component: | python-blivet | Assignee: | David Lehman <dlehman> |
Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | rawhide | CC: | aaron, amit.shah, amulhern, anaconda-maint-list, bcl, berrange, cfergeau, crobinso, dlehman, drjones, dwmw2, g.kaviyarasu, itamar, jonathan, lersek, pbonzini, pbrobinson, pwhalen, rjones, scottt.tw, vanmeeuwen+fedora, virt-maint, vpodzime |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | aarch64 | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2014-12-06 21:04:11 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 922257 | | |
Description
Richard W.M. Jones
2014-08-09 09:07:33 UTC
I was looking through the source of blivet, which is where the bug happens (not Anaconda), and it seems that blivet assumes that aarch64 is always EFI-based, ie from blivet/platform.py:

    ...
    elif arch.isEfi():
        if arch.isMactel():
            return MacEFI()
        elif arch.isAARCH64():
            return Aarch64EFI()
        else:
            return EFI()
    elif arch.isX86():
        return X86()
    elif arch.isARM():
        armMachine = arch.getARMMachine()
        if armMachine == "omap":
            return omapARM()
        else:
            return ARM()
    else:
        raise SystemError("Could not determine system architecture.")

I'm installing to a VM, and currently (and for the foreseeable future) VMs won't have EFI. We will eventually add it, but it's going to be quite complicated.

So anyway I believe this is an assumption in Blivet which is only correct for baremetal, not for VMs.

(By the way I could not find the right component for Blivet, so I'm leaving this as Anaconda for now).

Please note I am currently looking at how hard it would be to get aarch64 VMs to use EFI. If that can be done relatively easily, then fixing this bug would not be necessary. Will get back on this point.

*** Bug 1166876 has been marked as a duplicate of this bug. ***

I hit the same issue (see my duped bug), however I was passing UEFI/AAVMF roms to the VM. I just don't know if they are loaded when qemu boots off -kernel.

This blivet check is just a minor bit though, I'm guessing even if we taught it that aarch64 isn't strictly _always_ on uefi, anaconda wouldn't set up the correct post install boot environment if it doesn't detect UEFI (just a guess). So figuring out what's going on at the qemu/kernel level is probably necessary

(In reply to Cole Robinson from comment #4)
> I hit the same issue (see my duped bug), however I was passing UEFI/AAVMF
> roms to the VM. I just don't know if they are loaded when qemu boots off
> -kernel.
>
> This blivet check is just a minor bit though, I'm guessing even if we taught
> it that aarch64 isn't strictly _always_ on uefi, anaconda wouldn't set up
> the correct post install boot environment if it doesn't detect UEFI (just a
> guess). So figuring out what's going on at the qemu/kernel level is probably
> necessary

Is there a qemu/virtinstall/libvirt bug for this? If not, please open one with all the details.

Here's some background on how the '-kernel' qemu option works, for the x86_64 (-M pc) target and the aarch64 (-M virt) target.

(1) For x86_64, the -kernel flag causes qemu to load the kernel image, massage it a bit, and expose it under a number of fw_cfg keys. Then, it is the responsibility of the boot firmware to look for these fw_cfg keys, and to download & dispatch the kernel if it is available. Both SeaBIOS and OVMF implement this. Differently, of course, but both firmwares handle it.

In qemu, see

    pc_memory_init() [hw/i386/pc.c]
      load_linux()
        load_multiboot() [hw/i386/multiboot.c]
          fw_cfg_add_bytes( ... FW_CFG_KERNEL_DATA ... )
        fw_cfg_add_bytes( ... FW_CFG_KERNEL_DATA ... )

load_linux() either calls load_multiboot() or it doesn't, based on the image format; but in either case, FW_CFG_KERNEL_DATA is populated.

In addition, "pc-bios/optionrom/linuxboot.S" in qemu provides a minimal boot loader (basically just an init trampoline) that gets compiled into an option ROM. (See qemu commit 57a46d05.)

When SeaBIOS is used as firmware, it dispatches this minimal option ROM. The option ROM downloads the kernel using fw_cfg, and jumps to it.

When OVMF is used as firmware, the minimal option ROM is ignored. Instead, OVMF looks for the fw_cfg keys in question directly, and downloads the kernel, fixes it up, and jumps to it. See "OvmfPkg/Library/PlatformBdsLib/QemuKernel.c" and "OvmfPkg/Library/LoadLinuxLib".

Note that in the OVMF case, the kernel loaded thusly does run in a full UEFI environment, where the runtime services et al are available.
(2) In case of the aarch64 target, '-kernel' works differently.

    machvirt_init() [hw/arm/virt.c]
      arm_load_kernel() [hw/arm/boot.c]
        write_bootloader()

If the -kernel option was not used, then write_bootloader() is not reached, and execution will simply start at address 0, which is where the UEFI binary resides normally.

If -kernel was used, then it is loaded into guest RAM, and a minimal boot loader is generated in qemu dynamically, in ARM machine code (see "bootloader_aarch64" and write_bootloader()).

    loader_start = vbi->memmap[VIRT_MEM].base, 0x40000000.

When -kernel is used, the flash contents will simply not be executed (although the flash contents are correct).

Note that an arm64 guest kernel loaded this way cannot be used for installing a guest. Since the UEFI blob is never launched, the guest kernel loaded with '-kernel' will have no access to UEFI runtime services, and -- to name just one thing -- it won't be able to configure boot options with "efibootmgr" for the installed guest.

In other words, you can install a UEFI guest only when the installer kernel and Anaconda etc. already run in a full-blown UEFI environment. For this reason "blivet" is right to reject the environment that it finds itself in. (== Cole was right in comment 4.)

The solution to this is to:
- rework the arm code in qemu similarly to the i386 pc code (that is, always jump to the firmware, and expose the kernel to the firmware only as "data" initially, be it through a future fw_cfg mechanism that's appropriate for arm, or place it in guest RAM and reference it from the DTB),
- *and* ArmVirtualizationQemu (== the arm32/arm64 counterpart of OVMF in edk2) needs to reuse (or clone) OVMF's QemuKernel.c and LoadLinuxLib features.

Which translates to:
- no BZ for blivet,
- at least one BZ for (upstream) qemu,
- at least one BZ for (upstream) ArmVirtualizationQemu in edk2.

This feature is not small. What do we need it for?
I'm aware of two possible use cases:
- quick guest kernel development cycle facilitated by the '-kernel' option of qemu (you just rebuild the kernel on the host, and boot the guest directly with it, without having to copy it to a guest disk, updating grub.cfg in the guest, and so on)
- guest installation from URL (ie. --location http://...) with virt-install.

(In reply to Laszlo Ersek from comment #6)
> This feature is not small. What do we need it for? I'm aware of two possible
> use cases:
> - quick guest kernel development cycle facilitated by the '-kernel' option
> of qemu (you just rebuild the kernel on the host, and boot the guest
> directly with it, without having to copy it to a guest disk, updating
> grub.cfg in the guest, and so on)
> - guest installation from URL (ie. --location http://...) with virt-install.

On x86, -kernel is used in three places in our "stack" that I'm aware of:

(1) virt-install uses it in order to implement the --location option, which is what you mention above.

(2) libguestfs uses it to boot /boot/vmlinuz instead of having to build a disk image containing the host kernel. As you mentioned above, but it's not a development option, it's how we work.

(3) It's a valid way to configure guest VMs, using the <kernel> directive in libvirt. This is sometimes used as a way to get around bootloader problems in the guest, eg. if grub is broken in the guest or you can't install a bootloader for some reason, it's convenient to pull out the guest kernel [virt-builder --get-kernel], modify the libvirt config, and have a working guest again.

Of these only (1) impacts blivet / anaconda / installation of VMs.

(1) is seriously useful, but I didn't realize it would be so awkward to implement on ARM ..

Fixing this in blivet will only get you so far because the platform class used for aarch64 is a subclass of the efi platform class.
It will require either that you make the VMs look more realistic or that someone implement a blivet platform (and anaconda bootloader) class to match this vm-only reality.

Personally, I don't think this bug should be assigned to blivet, but I don't know where else to assign it.

Laszlo has done some work towards fixing it in qemu and edk2/aavmf, so let's move it to qemu

So, after analyzing this bug to death, I'm moving it back to python-blivet, and closing it as NOTABUG at once. "python-blivet" is not at fault.

Re comment 10, I prefer not to simply redirect the BZ to another component. We have covered a lot of ground in the comments above, and the objective has significantly diverged from the original bug report (see comment #0).

Virtual UEFI firmware is now available for aarch64 guests. If issues remain that block Fedora 21 guest installation:

https://fedoraproject.org/wiki/Architectures/AArch64/Install_with_QEMU

then those are related to:
- shim (trying to load "grubx64.efi") eg. in case of PXE installation,
- bad installer media / lorax (no bootable ElTorito image), eg. in case of virtio-scsi ISO installation,
- qemu (failure to combine guest UEFI with -kernel / -initrd / -append boot), eg. in case of URL installation (virt-install --location),
- and potentially other packages.

Let's leave this python-blivet BZ die in peace.

Regarding URL installation specifically, please open a brand new BZ for the qemu component, with the following title:

    qemu-system-aarch64: support "-drive if=pflash" and "-kernel" simultaneously

Please open such a BZ for *each* Product that wishes to track this feature.

(The most recent upstream posting for this feature is at <http://thread.gmane.org/gmane.comp.emulators.qemu/309428>.
The text of the cover letter can be reused as comment#0 for these new qemu BZs)

As for libguestfs (comment #7 points (2) and (3)), those use cases don't need guest UEFI; they already work nicely with the traditional "-kernel" option of qemu (no guest UEFI needs to be passed with "-drive if=pflash").

Is there an actual resolution to the bug, because it's still happening even though I've passed the UEFI ROMs to the guest.

(In reply to Richard W.M. Jones from comment #14)
> Is there an actual resolution to the bug, because it's still happening
> even though I've passed the UEFI ROMs to the guest.

It turns out the version of qemu in Rawhide isn't new enough to activate UEFI in a guest. I've backported Laszlo's UEFI patches to qemu and will update it shortly.

> Let's leave this python-blivet BZ die in peace.
Apologies for resurrecting such an old bug, but...
Why is it correct to assume aarch64 should always use UEFI? This is not always the case.
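For reference, the assumption comes from the dispatch quoted in the description. The following is a minimal, self-contained sketch of that logic, not blivet's actual code: the platform classes are empty stand-ins, the `machine`, `is_efi`, `is_mactel` and `arm_machine` parameters are hypothetical replacements for the `arch.*` probes, and the branches hidden behind the `...` in the quoted snippet are left out. It only illustrates why an aarch64 system that does not report EFI falls through to the final `else`.

```python
class Platform:
    """Stand-in base class; the real blivet classes carry far more state."""


class EFI(Platform):
    pass


class MacEFI(EFI):
    pass


class Aarch64EFI(EFI):
    pass


class X86(Platform):
    pass


class ARM(Platform):
    pass


class omapARM(ARM):
    pass


def get_platform(machine, is_efi, is_mactel=False, arm_machine=None):
    """Mirror the quoted branches of blivet/platform.py.

    All parameters are hypothetical: they replace the arch.* probes so the
    decision can be exercised without real hardware.
    """
    if is_efi:
        if is_mactel:
            return MacEFI()
        elif machine == "aarch64":
            return Aarch64EFI()
        else:
            return EFI()
    elif machine == "x86":
        return X86()
    elif machine == "arm":
        if arm_machine == "omap":
            return omapARM()
        else:
            return ARM()
    else:
        # aarch64 is only handled inside the EFI branch, so a non-EFI
        # aarch64 guest ends up here.
        raise SystemError("Could not determine system architecture.")


if __name__ == "__main__":
    print(type(get_platform("aarch64", is_efi=True)).__name__)
    try:
        get_platform("aarch64", is_efi=False)
    except SystemError as exc:
        print("non-EFI aarch64:", exc)
```

In this sketch the only way for an aarch64 machine to get a platform object is through the EFI branch, which matches the behaviour reported in comment #0.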
> Why is it correct to assume aarch64 should always use UEFI? This is not
> always the case.
Well that depends:
* blivet now (as of F-24+) has support for msdos as well as GPT partition tables.
* u-boot supports (as of 2016.05) uEFI boot and uEFI services emulation and that's the way we intend on supporting Fedora on various SBBs
* VMs on aarch64 will always use the tianocore uEFI firmware so in the context of this bug it's fixed
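Whether a given guest actually came up through UEFI firmware can be checked from inside the running system: the Linux kernel exposes a `/sys/firmware/efi` directory only when it was booted via EFI. Below is a minimal sketch of such a check. The function name and the `sysfs_root` parameter are hypothetical, added so the check can be pointed at a test directory, and this is not presented as blivet's exact implementation.

```python
import os


def booted_via_efi(sysfs_root="/sys"):
    """Return True if <sysfs_root>/firmware/efi exists.

    sysfs_root is a hypothetical parameter for testing; on a real system
    the default of /sys is what matters.
    """
    return os.path.isdir(os.path.join(sysfs_root, "firmware", "efi"))


if __name__ == "__main__":
    print("EFI boot" if booted_via_efi() else "non-EFI boot")
```

Run in a guest started with `-kernel` and no firmware, a check like this would be expected to report a non-EFI boot, which is the situation described in the earlier comments.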
> * u-boot supports (as of 2016.05) uEFI boot and uEFI services emulation and that's the way we intend on supporting Fedora on various SBBs
Hmm, I did not know that. Still doesn't stop non-EFI aarch64 from being a valid platform, though. But if Fedora's policy is to only support EFI-based aarch64 architectures then at least I know that now, thanks for the response.
> Hmm, I did not know that. Still doesn't stop non-EFI aarch64 from being a
> valid platform, though. But if Fedora's policy is to only support EFI-based
> aarch64 architectures then at least I know that now, thanks for the response.
Support and work are two different things. We have limited resources, so we need to focus, especially for low-level things like boot paths, on what lets us support the largest number of platforms for the least effort (i.e. the best bang for our buck). We don't explicitly exclude anything, and if people wish to provide patches for other methods I'll happily review them.
Alright, I'll have a look into the different options. Cheers.

This bug just crossed my path and reminded me of bug 1267667, which I wrote long, long ago but has never received any comments...