Bug 1701078
Summary: | [Fedora29][aarch64][Gigabyte][r270] Internal error: Oops: 96000004 [#1] SMP | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | PaulB <pbunyan> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 29 | CC: | 33186108, ahs3, airlied, bskeggs, dbenoit, hdegoede, ichavero, itamar, jarodwilson, jbastian, jcm, jeremy, jeremy.linton, jglisse, jlinton, john.j5live, jonathan, josef, jpoulin, kernel-maint, linville, mchehab, mjg59, msalter, pbrobinson, pbunyan, pwhalen, rrichter, steved, wcohen, winson.lin |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | aarch64 | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-05-08 18:20:29 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 245418 |
Description
PaulB
2019-04-18 01:03:14 UTC
All,

---------------------
Here is a reproducer:
---------------------
distro: Fedora-29 Everything aarch64
kernel: 4.18.5-300.fc29.aarch64
anaconda: 29.24.3-1.fc29
https://beaker.engineering.redhat.com/jobs/3476413

Note: the systems that reproduce this issue are gigabyte-r270:
GIGABYTE R270 BIOS T49 02/02/2018

fwiw... the gigabyte-r120 systems with BIOS T49 install Fedora29 without issue:
distro: Fedora-29 Everything aarch64
kernel: 4.18.5-300.fc29.aarch64
anaconda: 29.24.3-1.fc29
host: gigabyte-r120
bios: BIOS T49 02/02/2018
https://beaker.engineering.redhat.com/jobs/3476078 - PASS
https://beaker.engineering.redhat.com/jobs/3476079 - PASS

Best,
-pbunyan

All,

------------------------------------
Answering the outstanding questions:
------------------------------------
https://bugzilla.redhat.com/show_bug.cgi?id=1701078#c0
---<-snip->---
3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
unknown - I will test Fedora28 and follow up.
yes - this issue is reproduced with Fedora28.
see here - https://beaker.engineering.redhat.com/jobs/3482697 - FAIL

https://bugzilla.redhat.com/show_bug.cgi?id=1701078#c0
---<-snip->---
4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:
yes - this fails consistently
5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:
unknown - I will test Rawhide and follow up.
yes - Fedora-Rawhide-20190417.n.0 Everything aarch64 also fails.
Though, the failure is different.
see here:
https://beaker.engineering.redhat.com/jobs/3484673
http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/04/34846/3484673/6760278/console.log
---<-snip->---
[ 59.756861] ---[ end trace 9061ffef8a40d3d7 ]---
[ 59.761519] WARNING: CPU: 7 PID: 1669 at arch/arm64/mm/numa.c:60 cpumask_of_node+0x44/0x70
[ 59.769778] Modules linked in: vfat fat nicvf cavium_ptp cavium_rng_vf crct10dif_ce ghash_ce nicpf joydev mdio_thunder thunder_bgx mdio_cavium thunderx_zip thunder_xcv thunderx_edac cavium_rng ipmi_ssif ipmi_devintf ipmi_msghandler ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm gpio_keys i2c_thunderx thunderx_mmc
[ 59.800902] CPU: 7 PID: 1669 Comm: NetworkManager Tainted: G W 5.1.0-0.rc5.git1.1.fc31.aarch64 #1
[ 59.811066] Hardware name: GIGABYTE R270-T65-00/MT60-SC5-00, BIOS T49 02/02/2018
[ 59.818454] pstate: 60400005 (nZCv daif +PAN -UAO)
[ 59.823238] pc : cpumask_of_node+0x44/0x70
[ 59.827327] lr : cpumask_local_spread+0xb8/0x160
[ 59.831935] sp : ffff00001d76b340
[ 59.835241] x29: ffff00001d76b340 x28: ffff81077e9ed44d
[ 59.840546] x27: 0000000000000000 x26: 0000000000000001
[ 59.845851] x25: ffff000011845374 x24: 0000000000000000
[ 59.851156] x23: ffff000011845374 x22: ffff000011845098
[ 59.856460] x21: 0000000000000001 x20: 0000000000000001
[ 59.861765] x19: 0000000000000001 x18: 00000000fffffffc
[ 59.867070] x17: 0000000000000000 x16: 0000000000000000
[ 59.872374] x15: 0000000000000001 x14: ffffffffffffffff
[ 59.877678] x13: ffff000000000000 x12: 0000000000000028
[ 59.882983] x11: 0101010101010101 x10: ffff7f7f7f7fff7f
[ 59.888288] x9 : 0000000000000000 x8 : ffff8107a938e180
[ 59.893592] x7 : 0000000000000000 x6 : 0000000000000000
[ 59.898897] x5 : 0000000000000080 x4 : ffffffffffffffff
[ 59.904201] x3 : 0000000000000000 x2 : 0000000000000000
[ 59.909506] x1 : 0000000000000060 x0 : 0000000000000001
[ 59.914810] Call trace:
[ 59.917249] cpumask_of_node+0x44/0x70
[ 59.920991] cpumask_local_spread+0xb8/0x160
[ 59.925257] nicvf_register_interrupts+0x324/0x388 [nicvf]
[ 59.930737] nicvf_open+0x2a8/0x6f8 [nicvf]
[ 59.934913] __dev_open+0xdc/0x178
[ 59.938307] __dev_change_flags+0x170/0x1c8
[ 59.942482] dev_change_flags+0x3c/0x78
[ 59.946311] do_setlink+0x7c8/0x9a0
[ 59.949792] __rtnl_newlink+0x590/0x6a8
[ 59.953620] rtnl_newlink+0x54/0x80
[ 59.957101] rtnetlink_rcv_msg+0x184/0x538
[ 59.961189] netlink_rcv_skb+0x40/0xf8
[ 59.964930] rtnetlink_rcv+0x28/0x38
[ 59.968497] netlink_unicast+0x15c/0x1d0
[ 59.972412] netlink_sendmsg+0x1b0/0x350
[ 59.976326] sock_sendmsg+0x4c/0x68
[ 59.979807] ___sys_sendmsg+0x288/0x2b8
[ 59.983634] __sys_sendmsg+0x64/0xa0
[ 59.987202] __arm64_sys_sendmsg+0x2c/0x38
[ 59.991291] el0_svc_common+0x78/0x128
[ 59.995032] el0_svc_handler+0x38/0x78
[ 59.998773] el0_svc+0x8/0xc
[ 60.001646] irq event stamp: 0
[ 60.004692] hardirqs last enabled at (0): [<0000000000000000>] (null)
[ 60.012080] hardirqs last disabled at (0): [<ffff0000100f97c4>] copy_process.isra.0.part.0+0x304/0x1500
[ 60.021464] softirqs last enabled at (0): [<ffff0000100f97c4>] copy_process.isra.0.part.0+0x304/0x1500
[ 60.030852] softirqs last disabled at (0): [<0000000000000000>] (null)
[ 60.038246] ---[ end trace 9061ffef8a40d3d8 ]---
---<-snip->---

Best,
-pbunyan

pwhalen,
What's the process for nursing an aarch64 Fedora BZ along?
Who assigns the Fedora BZ?
Thank you, Paul.

Best,
-pbunyan

Do you have any SR-IOV VF (Virtual Function) or similar functionality enabled in the BIOS?

(In reply to PaulB from comment #3)
> pwhalen,
> What's the process for nursing an aarch64 Fedora BZ along?
> Who assigns the Fedora BZ?

It should get picked up by one of the maintainers. Added to the ARM Tracker and I'll keep an eye on it.

(In reply to Paul Whalen from comment #5)
> (In reply to PaulB from comment #3)
> > pwhalen,
> > What's the process for nursing an aarch64 Fedora BZ along?
> > Who assigns the Fedora BZ?
>
> It should get picked up by one of the maintainers.
> Added to the ARM Tracker and I'll keep an eye on it.

Paul,
Please add the Fedora maintainer to the cc list so the BZ is on their radar.
Thank you.

Best,
-pbunyan

I think the VFs must be enabled, as nicvf_main only gets triggered if VFs are found. But in the 5.1 crash I suspect there is an error in the SRAT/SLIT/DSDT (looking closer, maybe the NIC is trying to set the node, and no such node exists), which means the node request likely isn't valid. If a couple of prints are sprinkled around nicvf_set_irq_affinity() and compared with the ACPI node information, I'm guessing you will see a mismatch.

Oh, just to complete the install, try `modprobe.blacklist=nicvf` on the kernel command line.

Setting `modprobe.blacklist=nicvf` appears to result in a dracut initqueue timeout and drops you out into the Dracut emergency shell.
https://beaker.engineering.redhat.com/jobs/3507807

That said, this could be an issue with the underlying gigabyte r270 that I'm using. I've queued a copy of Paul's initial recipe on my host to verify that I can reproduce the original issue:
https://beaker.engineering.redhat.com/jobs/3507865

I will update this issue with links to the console logs as soon as the jobs complete.

Results confirm my findings:
w/ modprobe.blacklist=nicvf -> Dracut initqueue timeout
http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/04/35078/3507807/6807486/console.log
w/out modprobe.blacklist=nicvf -> Panic reproduced
http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/04/35078/3507865/6807589/console.log

Was there anything further we can test on our side to assist with the debugging process?

Can you clarify what releases/kernels you are testing on? It would likely be useful to re-test on the F-30 GA release. Can you also clarify whether there have been changes in the firmware around NICs/SR-IOV etc.

(In reply to Peter Robinson from comment #12)
> Can you clarify what releases/kernels you are testing on?
Questions have already been answered in the previous comments of this BZ: ----------------------------------- Fedora29 [4.18.5-300.fc29.aarch64]: ----------------------------------- https://bugzilla.redhat.com/show_bug.cgi?id=1701078#c0 see Beaker job: https://beaker.engineering.redhat.com/jobs/3476180 ----------------------------------- Fedora28 [4.16.3-301.fc28.aarch64]: ----------------------------------- https://bugzilla.redhat.com/show_bug.cgi?id=1701078#c2 see Beaker job: https://beaker.engineering.redhat.com/jobs/3482697 -------------------------------------------------------------- Fedora-Rawhide-20190417.n.0 [5.1.0-0.rc5.git1.1.fc31.aarch64]: -------------------------------------------------------------- https://bugzilla.redhat.com/show_bug.cgi?id=1701078#c2 see Beaker job: https://beaker.engineering.redhat.com/jobs/3484673 > > It would likely be useful to re-test on F-30 GA release. > Jeremy Poulin <jpoulin>, please retest F-30 GA release, for Peter. > Can you also clarify whether there has been changes in the firmware around > NICs/SRV IO etc. Firmware version T49 has been around for sometime. There have been no recent changes. Best, -pbunyan To Summarize the Tests I ran ============================ F29 with flag https://beaker.engineering.redhat.com/jobs/3507807 -> http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/04/35078/3507807/6807486/console.log F29 without flag https://beaker.engineering.redhat.com/jobs/3507865 -> http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/04/35078/3507865/6807589/console.log Both my tests were run with the following: distro: Fedora-29 Everything aarch64 kernel: 4.18.5-300.fc29.aarch64 The test that panicked (w/out modprobe.blacklist=nicvf) go to anaconda, and was using version: anaconda: 29.24.3-1.fc29 The test that included modprobe.blacklist=nicvf never reached the anaconda step, else I believe it would use the same version. 
BIOS Date: 02/02/2018 14:11:01 Ver: T49
(This is the latest firmware to my knowledge - https://www.gigabyte.com/us/ARM-Server/R270-T65-rev-100#support-dl-bios).

> Can you also clarify whether there has been changes in the firmware around NICs/SRV IO etc.

I defer to Paul's answer on this. I will run the same jobs targeting F30 GA.

F30 Results
===========
F30 with flag
https://beaker.engineering.redhat.com/jobs/3509182 -> https://beaker.engineering.redhat.com/recipes/6810199/logs/console.log
This times out in the initqueue step just like it had for F29.

F30 without flag
https://beaker.engineering.redhat.com/jobs/3509076 -> http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/05/35090/3509076/6809994/console.log
The panic is still present.

distro: Fedora-30 Everything aarch64
kernel: 5.0.9-301.fc30.aarch64
anaconda: 30.25.6-2.fc30
BIOS Date: 02/02/2018 14:11:01 Ver: T49

(In reply to PaulB from comment #13)
> (In reply to Peter Robinson from comment #12)
> > Can you clarify what releases/kernels you are testing on?
>
> Questions have already been answered in the previous comments of this BZ:

Actually no they weren't, there was no previous mention of F-28 in this bug at all. Please be a little bit more friendly if you actually want this dealt with!

> -----------------------------------
> Fedora29 [4.18.5-300.fc29.aarch64]:
> -----------------------------------
> https://bugzilla.redhat.com/show_bug.cgi?id=1701078#c0
> see Beaker job: https://beaker.engineering.redhat.com/jobs/3476180

This is a Fedora bug; Beaker isn't publicly available to everyone who may be replying to this bug, so the references in Fedora are invalid. If the comments are private they don't exist in the Fedora community space, and I don't see any missing comment numbers, so I don't believe that's the case.
> ----------------------------------- > Fedora28 [4.16.3-301.fc28.aarch64]: > ----------------------------------- > https://bugzilla.redhat.com/show_bug.cgi?id=1701078#c2 > see Beaker job: https://beaker.engineering.redhat.com/jobs/3482697 > > -------------------------------------------------------------- > Fedora-Rawhide-20190417.n.0 [5.1.0-0.rc5.git1.1.fc31.aarch64]: > -------------------------------------------------------------- > https://bugzilla.redhat.com/show_bug.cgi?id=1701078#c2 > see Beaker job: https://beaker.engineering.redhat.com/jobs/3484673 > > > > > It would likely be useful to re-test on F-30 GA release. > > > Jeremy Poulin <jpoulin>, please retest F-30 GA release, for Peter. > > > Can you also clarify whether there has been changes in the firmware around > > NICs/SRV IO etc. > Firmware version T49 has been around for sometime. > There have been no recent changes. That does not answer my question. To re word it. Has there been any specific configuration of NFV or related changes with in the firmware configuration. Can you reset the firmware to default settings. We've had numerous ThunderX systems (both X1 and X2) confirmed running without issues on Fedora all recent versions of Fedora. The only relatively recent issue we've had was an issue with their crypto drivers and that was some time ago and it was fixed in time for the GA release (28 I think from memory) so this issue is something specific to this system hence the questions, I am trying to ascertain what is different. > That does not answer my question. To re word it. Has there been any specific
> configuration of NFV or related changes with in the firmware configuration.
> Can you reset the firmware to default settings. We've had numerous ThunderX
> systems (both X1 and X2) confirmed running without issues on Fedora all
> recent versions of Fedora.
>
> The only relatively recent issue we've had was an issue with their crypto
> drivers and that was some time ago and it was fixed in time for the GA
> release (28 I think from memory) so this issue is something specific to this
> system hence the questions, I am trying to ascertain what is different.
We checked the host in question and there were no configuration files related to NFV, kvm, or anything we thought would be related. We are in the process of resetting the firmware back to default settings, and I'm going to re-run the jobs for F29 to determine if the issues persist with default settings. Do you need me to test anything outside of F29 on the fresh host?
Thanks!
F-30 would be good, in Fedora we never re-spin the installers so ultimately we need to look forward to F-30+ Good to know. I will include results for F-30. :) Some additional information since I'm not sure what might be relevant: == Platform Information == Manufacturer: Cavium Product Name: ThunderX CRB BIOS Version: T49 BIOS Release Date: 02/02/2018 == Firmware Information == Product Name: MergePoint EMS Product Information: MergePoint Embedded Management Software Firmware Version: 7.70 Firmware Updated: 06 Oct 2016, 19:13:24 (UTC+0000) ASIC Type: ast2400 == CPLD Information == MB CPLD Version: R06 BPB CPLD Version: R03 Additionally, there didn't seem to be any NFV related options for configuring in the firmware configuration options. So, it may be the actual difference between the machines is a variation in the thunderX model. The problematic one from the log is a CN8890-2000BG2601-ST-Y-G, AKA it has a bunch of extra accelerators that aren't part of the normal CP model. It also looks like the machine is booting in DT mode, which AFAIK, is not really optimal as this is an enterprise platform. You might add `acpi=force` or assure that the firmware is running in ACPI mode. 
Looking at the log (usually I too tend to keep fedora defects "community" by using a public id), it seems there are a number of firmware problems with node and IOMMU Ids: [ 30.985300] Failed to set up IOMMU for device 0000:01:01.4; retaining platform DMA ops [ 30.993496] thunderx_mmc: probe of 0000:01:01.4 failed with error -2 [ 30.993759] Failed to set up IOMMU for device 0000:01:01.3; retaining platform DMA ops [ 30.999979] Failed to set up IOMMU for device 0004:01:01.4; retaining platform DMA ops [ 31.008101] libphy: mdio_thunder: probed [ 31.015939] thunderx_mmc: probe of 0004:01:01.4 failed with error -2 [ 31.016588] thunder_xcv, ver 1.0 [ 31.016707] Failed to set up IOMMU for device 0000:01:09.2; retaining platform DMA ops [ 31.020400] mdio_thunder 0000:01:01.3: Added bus at 87e005003800 [ 31.030270] thunder_bgx, ver 1.0 [ 31.030388] Failed to set up IOMMU for device 0000:03:00.0; retaining platform DMA ops [ 31.037402] libphy: mdio_thunder: probed [ 31.043258] Failed to set up IOMMU for device 0000:01:10.0; retaining platform DMA ops [ 31.046642] mdio_thunder 0000:01:01.3: Added bus at 87e005003880 [ 31.072434] Failed to set up IOMMU for device 0004:01:01.3; retaining platform DMA ops [ 31.072949] i2c-thunderx 0000:01:09.2: Probed. 
Set system clock to 800000000 [ 31.073916] input: soc@0:gpio-keys as /devices/platform/soc@0/soc@0:gpio-keys/input/input3 [ 31.083501] libphy: mdio_thunder: probed [ 31.092988] i2c-thunderx 0000:01:09.2: SMBUS alert not active on this bus [ 31.101645] mdio_thunder 0004:01:01.3: Added bus at 97e005003800 [ 31.105526] ThunderX-ZIP 0000:03:00.0: Found ZIP device 0 177d:a01a on Node 0 [ 31.105603] Failed to set up IOMMU for device 0000:01:09.4; retaining platform DMA ops [ 31.112374] libphy: mdio_thunder: probed [ 31.118482] thunder_bgx 0000:01:10.0: BGX0 QLM mode: XFI [ 31.118552] Failed to set up IOMMU for device 0004:03:00.0; retaining platform DMA ops [ 31.118570] ThunderX-ZIP 0004:03:00.0: Found ZIP device 1 177d:a01a on Node -1 [ 31.123141] alg: No test for lzs (lzs-cavium) [ 31.125549] mdio_thunder 0004:01:01.3: Added bus at 97e005003880 [ 31.133824] i2c-thunderx 0000:01:09.4: Probed. Set system clock to 800000000 [ 31.141709] alg: No test for lzs (lzs-scomp-cavium) [ 31.142626] i2c-thunderx 0000:01:09.4: SMBUS alert not active on this bus [ 31.144143] audit: type=1130 audit(1556737441.150:7): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udev-trigger comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 31.207445] Failed to set up IOMMU for device 0004:01:09.4; retaining platform DMA ops [ 31.208088] Failed to set up IOMMU for device 0000:01:10.1; retaining platform DMA ops [ 31.216015] i2c-thunderx 0004:01:09.4: Probed. 
Set system clock to 800000000 [ 31.223415] thunder_bgx 0000:01:10.1: BGX1 QLM mode: XLAUI [ 31.230429] i2c-thunderx 0004:01:09.4: SMBUS alert not active on this bus [ 31.236398] Failed to set up IOMMU for device 0004:01:10.0; retaining platform DMA ops [ 31.250801] thunder_bgx 0004:01:10.0: BGX2 QLM mode: XLAUI [ 31.256714] Failed to set up IOMMU for device 0004:01:10.1; retaining platform DMA ops [ 31.264754] thunder_bgx 0004:01:10.1: BGX3 QLM mode: XLAUI [ 31.277055] nicpf, ver 1.0 [ 31.279912] Failed to set up IOMMU for device 0002:01:00.0; retaining platform DMA ops [ 31.280575] Failed to set up IOMMU for device 0008:21:00.0; retaining platform DMA ops I'm put rrichter on CC, he may be able to help point in the right direction. No new information was obtained from running an install post firmware reset. I've explicitly listed the results below, just to be consistent. My next test will be to try to force acpi mode, as was Jeremy's suggestion in https://bugzilla.redhat.com/show_bug.cgi?id=1701078#c20. Post Firmware Reset Results =========================== F30 --------------------------- w/out modprobe.blacklist=nicvf https://beaker.engineering.redhat.com/jobs/3511486 -> http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/05/35114/3511486/6815368/console.log Panic still occurs. w/ modprobe.blacklist=nicvf https://beaker.engineering.redhat.com/jobs/3511487 -> http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/05/35114/3511487/6815369/console.log Timeout on initqueue still drops to emergency shell. F29 --------------------------- w/out modprobe.blacklist=nicvf https://beaker.engineering.redhat.com/jobs/3511488 -> http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/05/35114/3511488/6815370/console.log Panic still occurs. 
w/ modprobe.blacklist=nicvf https://beaker.engineering.redhat.com/jobs/3511489 -> http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/05/35114/3511489/6815371/console.log Timeout on initqueue still drops to emergency shell. (In reply to Jeremy Poulin from comment #17) > > That does not answer my question. To re word it. Has there been any specific > > configuration of NFV or related changes with in the firmware configuration. > > Can you reset the firmware to default settings. We've had numerous ThunderX > > systems (both X1 and X2) confirmed running without issues on Fedora all > > recent versions of Fedora. > > > > The only relatively recent issue we've had was an issue with their crypto > > drivers and that was some time ago and it was fixed in time for the GA > > release (28 I think from memory) so this issue is something specific to this > > system hence the questions, I am trying to ascertain what is different. > > We checked the host in question and there were no configuration files > related to NFV, kvm, or anything we thought would be related. We are in the > process of resetting the firmware back to default settings, and I'm going to > re-run the jobs for F29 to determine if the issues persist with default > settings. Do you need me to test anything outside of F29 on the fresh host? > > Thanks! All, I am adding winson.lin to this BZ. Winson is excellent and is our firmware contact for Gigabyte systems. He would have first hand knowledge and access to the firmware change log. Winson - can you assist in answering the question regarding the system firmware and NFV, please. --------------- reference note: --------------- All the gigabyte system have firmware version T49. Please note this issue is seen when installing Fedora on the gigabyte-r270 system only. 
Installing the gigabyte-r120 systems with Fedora is fine: https://beaker.engineering.redhat.com/jobs/3476079 - PASS https://beaker.engineering.redhat.com/jobs/3476079 - PASS Also RHEL8 installs fine on both gigabyte-r270 and gigabyte-r120 systems. Best, -pbunyan Hi ALL, I need your side system power on console log , for CPU SKU information. Like as below : SKU: CN8890-2000BG2601-AAP-PR-Y-G SKU: CN8890-2000BG2601-CP-Y-G SKU: CN8890-2000BG2601-ST-Y-G https://www.marvell.com/documents/o6h6who7rnkhiicjhbfh/ ThunderX_CP: Up to 48 highly efficient cores along with integrated vSoC, multiple 10/40 GbE and high memory bandwidth. This family is optimized for private and public cloud web servers and content delivery, web caching and social media data analytics workloads. ThunderX_ST: Up to 48 highly efficient cores along with integrated vSoC, multiple SATAv3 controllers, 10/40 GbE & PCIe Gen3 ports, high memory bandwidth, dual socket coherency, and scalable fabric for east-west as well as north-south traffic connectivity. This family includes hardware accelerators for data protection/ integrity/security, user to user efficient data movement (RoCE) and compressed storage. This family is optimized for Hadoop, block & object storage, distributed file storage and hot/warm/cold storage type workloads. ThunderX_SC: Up to 48 highly efficient cores along with integrated vSoC, 10/40 GbE connectivity, multiple PCIe Gen3 ports, high memory bandwidth, dual socket coherency, and scalable fabric for east-west as well as north-south traffic connectivity. The hardware accelerators include Cavium’s industry leading 4th generation NITROX and TurboDPI technol- ogy with acceleration for IPSec, SSL, Anti-virus, Anti-malware, firewall and DPI. This family is optimized for Secure Web frontend, security appliances and Cloud RAN type workloads. 
ThunderX_NT: Up to 48 highly efficient cores along with integrated vSoC, 10/40/100 GbE connectivity, multiple PCIe Gen3 ports, high memory bandwidth, dual socket coherency, and scalable fabric with feature rich capabilities for bandwidth provisioning , QoS, traffic Shaping and tunnel termination. The hardware accelerators include high packet throughput processing, network virtualization and data monitoring. This family is optimized for media servers, scale-out embedded application and NFV type workloads BR, Winson
>> I need your side system power on console log , for CPU SKU information.
I need the early power-on console logs from both of your systems, gigabyte-r270 and gigabyte-r120.
Thank you.
BR, Winson
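Pulling the SKU line out of a saved power-on console log could be scripted roughly as follows. This is only a sketch: the function name `find_skus`, the regular expression, and the reading of the third dash-separated field as the family code (CP, ST, and so on) are assumptions inferred from the SKU strings quoted in this bug, not a documented vendor format.

```python
import re

# Assumed layout, inferred only from the SKU strings quoted in this bug:
#   SKU: <model>-<second field>-<family>-...
SKU_RE = re.compile(r"SKU:\s*(CN\d+)-([0-9A-Z]+)-([A-Z]+)")


def find_skus(console_log: str) -> list[tuple[str, str]]:
    """Return (model, family) pairs for every SKU line found in the log text."""
    return [(m.group(1), m.group(3)) for m in SKU_RE.finditer(console_log)]


if __name__ == "__main__":
    # Sample lines copied from the examples listed in this bug.
    sample = (
        "SKU: CN8890-2000BG2601-CP-Y-G\n"
        "SKU: CN8890-2000BG2601-ST-Y-G\n"
    )
    for model, family in find_skus(sample):
        print(model, family)
```

Running it against the full console.log of each host would show at a glance whether two machines carry the same family code.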
So I tested out Jeremy Linton's suggestion to use acpi=force for Fedora 30, and that appeared to install properly:

F30 w/ acpi=force
---------------------------
https://beaker.engineering.redhat.com/jobs/3513166 -> http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/05/35131/3513166/6819262/console.log

Despite the installation working correctly, the job still aborts. The issue encountered with this build is that the "restraint" package is not available; however, this is a known issue and is being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1699254. The relevant lines from the log are below:

+ yum -y install restraint-rhts beakerlib beakerlib-redhat
Last metadata expiration check: 0:00:46 ago on Fri May 3 15:34:53 2019.
Error:
 Problem: conflicting requests
  - package restraint-rhts-0.1.39-1.fc30eng.x86_64 does not have a compatible architecture
  - nothing provides restraint(x86-64) = 0.1.39-1.fc30eng needed by restraint-rhts-0.1.39-1.fc30eng.x86_64
(try to add '--skip-broken' to skip uninstallable packages)

Winson,
The SKU for my r270 is:
SKU: CN8890-2000BG2601-ST-Y-G
The SKU for Paul's r120 is:
SKU: CN8880-1800BG2601-CP-Y-G
Is this all the information you need?

Just to confirm that acpi=force does the trick on F29, I ran the job again expecting that it would pass (since restraint is built for aarch64 for Fedora 29). It works as expected.

F29 w/ acpi=force
=================
https://beaker.engineering.redhat.com/jobs/3476079 -> http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/04/34760/3476079/6744278/console.log

Edit for comment 27. I provided the wrong links:
https://beaker.engineering.redhat.com/jobs/3518106 -> http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/05/35181/3518106/6828554/console.log

The upstream kernel, and thus Fedora, both prefer DeviceTree first and fall back to ACPI.
We have switched that around in RHEL to make ACPI preferred since ACPI is required by the SBSA/SBBR standards for ARM Servers (and we tried to convince upstream to do the same, but they said no). But when testing Fedora, it's easy to forget to add acpi=force to the kernel command line args.

(In reply to winson.lin from comment #24)
> >> I need your side system power on console log , for CPU SKU information.
>
> Both need early power on console log from your side gigabyte-r270 and
> gigabyte-r120 systems.
>
> Thanks you.
>
> BR, Winson

Winson,
Thank you for your reply.
Jeremy added the info you requested here:
https://bugzilla.redhat.com/show_bug.cgi?id=1701078#c26
However, as you can see, this issue is resolved for Fedora with the use of acpi=force on the kernel command line.

I don't know if it would be helpful to link back to any relevant documentation of the upstream decision to reject the change in default preference as it relates to ARM (I searched for it but was unsuccessful), but otherwise I believe this issue can be closed as resolved.

I'm also having problems now finding the discussion in the mailing list archives, but here is the patch that made ACPI the fallback mechanism upstream:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b10d79f76085b577673395daf92d6208ae09196f

If you need more documentation, let me know. And again, the RHEL kernel has a small patch that flips the behavior around and makes ACPI preferred and DeviceTree the fallback; it just changes one variable in arch/arm64/kernel/acpi.c:

-static bool param_acpi_on __initdata;
+static bool param_acpi_on __initdata = true;

Thanks Jeff! I think what you have provided is sufficient context. I am closing this as WONTFIX since, while it may be an issue, the solution has already been discussed and rejected upstream.
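The preference order described above (DeviceTree first upstream, ACPI first in RHEL) can be illustrated with a small model. This is a simplified sketch, not the kernel code: it assumes the upstream check amounts to "use DeviceTree if `acpi=off` is given, or if a populated DeviceTree exists and neither `acpi=on` nor `acpi=force` was passed", and it ignores what happens when ACPI table parsing itself fails. The function name `boot_table_choice` and its parameters are hypothetical; `acpi_on_default=True` stands in for the one-line RHEL change to `param_acpi_on`.

```python
def boot_table_choice(cmdline: str, dt_is_stub: bool = False,
                      acpi_on_default: bool = False) -> str:
    """Simplified model of how an arm64 kernel picks DeviceTree or ACPI.

    cmdline          -- kernel command line string
    dt_is_stub       -- True if the firmware passed no real DeviceTree (assumption)
    acpi_on_default  -- models the RHEL patch that initialises param_acpi_on to true
    """
    acpi_off = False
    acpi_on = acpi_on_default
    acpi_force = False

    # Only the three acpi= values relevant to this bug are modelled.
    for arg in cmdline.split():
        if arg == "acpi=off":
            acpi_off = True
        elif arg == "acpi=on":
            acpi_on = True
        elif arg == "acpi=force":
            acpi_force = True

    # DeviceTree wins unless ACPI was requested or no usable DeviceTree exists.
    if acpi_off or (not acpi_on and not acpi_force and not dt_is_stub):
        return "devicetree"
    return "acpi"


if __name__ == "__main__":
    print(boot_table_choice("ro quiet"))                        # upstream/Fedora default
    print(boot_table_choice("ro quiet acpi=force"))             # workaround from this bug
    print(boot_table_choice("ro quiet", acpi_on_default=True))  # RHEL-style default
```

In this model an empty command line on firmware that supplies both tables selects DeviceTree, which is consistent with the observation in this bug that the r270 was booting in DT mode until `acpi=force` was added.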
If someone believes that there is sufficient grounds to re-open the discussion on this topic upstream, I'd welcome them to re-open this ticket to document the results of that discussion. I found one thread. There are others but all end the same way. http://lists.infradead.org/pipermail/linux-arm-kernel/2016-December/475059.html Winson, I brought this issue up in today's aarch64 meeting. Seems the thought is that if the systems firmware has an option to "enable acpi" then the DeviceTree would not be offered. All the gigabyte system have firmware version T49. I looked thru the firmware options in T49 for both r270 and r120 and I did not see an option to "enable acpi" specifically. Is there a plan to add this option in future firmware release? Apologies for the recap - but the consensus is a better resolution would be a firmware fix, rather than kernel command line option. ========== reference: ========== --------------------------------------------------------------------------------------- Please note this issue is seen when installing Fedora on the gigabyte-r270 system only: --------------------------------------------------------------------------------------- Fedora29 [4.18.5-300.fc29.aarch64]: https://beaker.engineering.redhat.com/jobs/3476180 Fedora28 [4.16.3-301.fc28.aarch64]: https://beaker.engineering.redhat.com/jobs/3482697 Fedora-Rawhide-20190417.n.0 [5.1.0-0.rc5.git1.1.fc31.aarch64]: https://beaker.engineering.redhat.com/jobs/3484673 --------------------------------------------------------- Installing the gigabyte-r120 systems with Fedora is fine: --------------------------------------------------------- https://beaker.engineering.redhat.com/jobs/3476079 - PASS https://beaker.engineering.redhat.com/jobs/3476079 - PASS Best, -pbunyan Hi ALL, ftp://ODMcustomer:download@ftp.gigabyte.com.tw/ThunderX/BIOS/F02a/ Please download F02a for NFV. 
(If there is still a call trace, you can adjust the ACPI setup items for debugging.)

BR, Winson

Hi ALL,
Has Fedora29 been used on Gigabyte ThunderX2 ARM servers?
https://www.gigabyte.com/tw/ARM-Server/
( R281-T94 / R281-T91 / R181-T92 / R181-T90 )

BR, Winson

(In reply to winson.lin from comment #37)
> Hi ALL,
>
> About Fedora29 have use on Gigabyte ThunderX2 ARM Server ?
>
> https://www.gigabyte.com/tw/ARM-Server/
>
> ( R281-T94 / R281-T91 / R181-T92 / R181-T90 )
>
> BR, Winson

Winson,
We have no "Gigabyte" cn99xx ThunderX2 systems at this time.
The "Gigabyte" aarch64 systems we currently have are all cn88xx ThunderX systems.

We do have other vendors' cn99xx ThunderX2 systems. However, as you know, each vendor has its own firmware. The other vendor's firmware has the enable/disable acpi option in the BIOS.

Also, I have downloaded and updated one of our R270,T60 (cn88xx) systems with firmware F02a:
https://bugzilla.redhat.com/show_bug.cgi?id=1701078#c36

I see that firmware F02a has the enable/disable acpi option. I have enabled acpi in the firmware and am currently retesting. I will follow up when the results are complete.

Thank you for your attention/assistance, Winson.

Best,
-pbunyan

(In reply to PaulB from comment #38)
> (In reply to winson.lin from comment #37)
> > Hi ALL,
> >
> > About Fedora29 have use on Gigabyte ThunderX2 ARM Server ?
> >
> > https://www.gigabyte.com/tw/ARM-Server/
> >
> > ( R281-T94 / R281-T91 / R181-T92 / R181-T90 )
> >
> > BR, Winson
>
> Winson,
> We have no "Gigabyte" cn99xx ThunderX2 systems at this time.
> The "Gigabyte" aarch64 systems we currently have are all
> cn88xx ThunderX systems.
>
> We do have other vendor cn99xx ThunderX2 systems.
> However as you know each vendor has their own firmwares.
> The other vendor firmware has the enable/disable acpi option in the bios.
> > > Also I have downloaded and updated on of our R270,T60 (cn88xx) with > firmware F02a: > https://bugzilla.redhat.com/show_bug.cgi?id=1701078#c36 > > I see that firmware F02a has the enable/disable acpi option. > I have enable acpi in the firmware and am currently retesting. > I will follow up when the results are complete. > > Thank you for your attention/assistance, Winson. > > Best, > -pbunyan All, Retesting R270,T60 (cn88xx) with firmware F02a (with acpi enabled in the bios), I am, unfortunately, able to reproduce this issue: Fedora29: https://beaker.engineering.redhat.com/jobs/3536831 Fedora30: https://beaker.engineering.redhat.com/jobs/3536832 So it seems the bios option is NOT working as expected in firmware F02a. Best, -pbunyan Test on R270 with firmware F02 (https://www.gigabyte.cn/ARM-Server/R270-T64-rev-110/support#support-dl-bios/) and able to reproduce on kernel 4.19.90 from https://www.kernel.org/ |