Bug 2209635 - Clang doesn't work on s390x architecture in Fedora 38 and later
Summary: Clang doesn't work on s390x architecture in Fedora 38 and later
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 39
Hardware: s390x
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Ilya Leoshkevich
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ZedoraTracker
Reported: 2023-05-24 10:19 UTC by Mikhail Mitskevich
Modified: 2023-11-07 08:40 UTC
CC List: 23 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-07 08:40:58 UTC
Type: ---
Embargoed:


Links
System: Gitlab
ID: qemu-project qemu issues 1668
Private: 0
Priority: None
Status: opened
Summary: Fedora 38 build of clang 16 fails when run under s390x emulation (both system & linux-user)
Last Updated: 2023-05-26 09:55:40 UTC

Description Mikhail Mitskevich 2023-05-24 10:19:54 UTC
Clang doesn't work on the s390x architecture in Fedora 38 and later;
every option fails except --version.

Reproducible: Always

Steps to Reproduce:
1. Run latest release of Fedora in Docker:
docker run --rm -it --pull always --platform linux/s390x fedora:latest bash
2. Update the system:
# dnf -y update
3. Install clang:
# dnf install -y clang
# rpm -q clang
clang-16.0.3-1.fc38.s390x
4. Show clang version (only working option):
# clang --version
clang version 16.0.3 (Fedora 16.0.3-1.fc38)
Target: s390x-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
5. Run another clang command:
# clang --help
Output:
clang-16: error: unsupported option '--help'; did you mean '--help'?
clang-16: error: no input files
Actual Results:  
clang-16: error: unsupported option '--help'; did you mean '--help'?
clang-16: error: no input files

Expected Results:  
Clang help

The bug reproduces on Fedora 38 and Rawhide on s390x. It does not reproduce on other architectures.
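
The telltale sign in the output above is a diagnostic that suggests the very option it just rejected. For automated checks (for example in CI), that signature can be detected with a small helper. This is only a sketch: the function name is made up for illustration, and the pattern assumes the message format shown in this report.

```shell
# Hypothetical helper, not part of clang or Fedora.
# Reads text on stdin and succeeds (exit 0) when it contains an
# "unsupported option" diagnostic whose suggestion is identical to the
# rejected option, as in the output shown in this report.
has_self_suggesting_error() {
    grep -q "unsupported option '\(.*\)'; did you mean '\1'?"
}

# Possible use, reusing the container invocation from the steps above
# (with fedora:38 in place of fedora:latest):
#   docker run --rm --platform linux/s390x fedora:38 \
#       sh -c 'dnf install -y clang >/dev/null 2>&1; clang --help 2>&1' \
#       | has_self_suggesting_error && echo "bug reproduced"
```

Matching on the repeated option rather than on any "unsupported option" message keeps ordinary typos, where the suggestion differs from the input, from being counted as a reproduction.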

Comment 1 Tulio Magno Quites Machado Filho 2023-05-25 12:52:05 UTC
I can reproduce this issue.
Interestingly, it doesn't affect c9s.

While trying to debug it, sometimes I get a segfault:

Program received signal SIGSEGV, Segmentation fault.
0x000003fff48b7fb2 in AddTaggedVal () at /usr/src/debug/clang-16.0.3-1.fc38.s390x/include/clang/Basic/Diagnostic.h:1191
1191        DiagStorage->DiagArgumentsVal[DiagStorage->NumDiagArgs++] = V;

Comment 2 Tulio Magno Quites Machado Filho 2023-05-25 14:10:34 UTC
Interestingly, I can only reproduce this issue when emulating a system on qemu.
I can't reproduce this on the following system:
version         : FF
identification  : 2EB428
machine         : 8561

@mitskevichmn Are you emulating s390x? If not, could you confirm which system you're using, please?

Comment 3 Mikhail Mitskevich 2023-05-25 15:17:18 UTC
(In reply to Tulio Magno Quites Machado Filho from comment #2)
> Interestingly, I can only reproduce this issue when emulating a system on
> qemu.
> I can't reproduce this on the following system:
> version         : FF
> identification  : 2EB428
> machine         : 8561
> 
> @mitskevichmn Are you emulating s390x? If not, could you confirm
> which system you're using, please?

Yes, I emulated s390x in Docker via the tonistiigi/binfmt emulator image, which uses qemu. The native system is amd64.

Comment 4 Tulio Magno Quites Machado Filho 2023-05-25 19:48:57 UTC
qemu maintainers, could you give us a hand here, please?

We're seeing a case where clang is misbehaving only when running on qemu.
In my case, I have qemu-7.0.0-15.fc37 installed.

I can reproduce the issue with the following commands:

 - On the host:
$ testcloud create --arch s390x --timeout 300 --ram 8196 --disksize 40 --vcpus 1 fedora:38

 - On the guest:
# dnf install -y clang
# clang --help
Output:
clang-16: error: unsupported option '--help'; did you mean '--help'?
clang-16: error: no input files

Comment 5 Richard W.M. Jones 2023-05-25 20:07:30 UTC
To be clear, this is qemu-system-s390x running on x86-64 host (not on s390 host)?
That would mean software emulation (TCG) is involved.

Comment 6 Richard W.M. Jones 2023-05-25 20:08:10 UTC
Also we'll need the full qemu command line and any errors printed by qemu,
which may appear in a log file or libvirt log.

Comment 7 Tulio Magno Quites Machado Filho 2023-05-25 23:11:26 UTC
(In reply to Richard W.M. Jones from comment #5)
> To be clear, this is qemu-system-s390x running on x86-64 host (not on s390
> host)?
> That would mean software emulation (TCG) is involved.

This is correct. I was able to reproduce it with TCG enabled.

This is the qemu command line:

/usr/bin/qemu-system-s390x -name guest=naughty_wiles,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-3-naughty_wiles/master-key.aes"} -machine s390-ccw-virtio-7.0,usb=off,dump-guest-core=off,memory-backend=s390.ram -accel tcg -cpu qemu -m 8196 -object {"qom-type":"memory-backend-ram","id":"s390.ram","size":8594128896} -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -uuid ce418863-f9dc-4909-a892-4113311eff29 -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=34,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -no-shutdown -boot strict=on -blockdev {"driver":"file","filename":"/var/lib/testcloud/backingstores/Fedora-Cloud-Base-38-1.6.s390x.qcow2","node-name":"libvirt-3-storage","cache":{"direct":false,"no-flush":true},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-3-format","read-only":true,"cache":{"direct":false,"no-flush":true},"driver":"qcow2","file":"libvirt-3-storage","backing":null} -blockdev {"driver":"file","filename":"/var/lib/testcloud/instances/naughty_wiles/naughty_wiles-local.qcow2","node-name":"libvirt-2-storage","cache":{"direct":false,"no-flush":true},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":false,"no-flush":true},"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"} -device {"driver":"virtio-blk-ccw","devno":"fe.0.0000","drive":"libvirt-2-format","id":"virtio-disk0","bootindex":1,"write-cache":"on"} -blockdev {"driver":"file","filename":"/var/lib/testcloud/instances/naughty_wiles/naughty_wiles-seed.img","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"} -device 
{"driver":"virtio-blk-ccw","devno":"fe.0.0001","drive":"libvirt-1-format","id":"virtio-disk1"} -netdev tap,fd=35,id=hostnet0 -device {"driver":"virtio-net-ccw","netdev":"hostnet0","id":"net0","mac":"52:54:00:67:15:88","devno":"fe.0.0002"} -chardev pty,id=charserial0 -device {"driver":"sclpconsole","chardev":"charserial0","id":"serial0"} -device {"driver":"virtio-keyboard-ccw","id":"input0","devno":"fe.0.0003"} -audiodev {"id":"audio1","driver":"none"} -device {"driver":"virtio-balloon-ccw","id":"balloon0","devno":"fe.0.0004"} -object {"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"} -device {"driver":"virtio-rng-ccw","rng":"objrng0","id":"rng0","devno":"fe.0.0005"} -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

There are no error messages in the logs (I looked at journalctl and /var/log/libvirt/qemu/naughty_wiles.log).

Comment 8 Mikhail Mitskevich 2023-05-26 07:09:12 UTC
(In reply to Richard W.M. Jones from comment #5)
> To be clear, this is qemu-system-s390x running on x86-64 host (not on s390
> host)?
> That would mean software emulation (TCG) is involved.

Yes. Host system is x86-64.

Comment 9 Daniel Berrangé 2023-05-26 09:34:38 UTC
(In reply to Richard W.M. Jones from comment #5)
> To be clear, this is qemu-system-s390x running on x86-64 host (not on s390
> host)?
> That would mean software emulation (TCG) is involved.

The best way to reproduce is the one illustrated in the initial bug description, i.e. running an s390x container using podman/docker; internally that spawns /usr/bin/qemu-s390x-static via binfmt_misc. This has much less overhead for debugging than running a full virtual machine with emulation.

I tried some tests:

 * qemu 7.2.1 (F38) with F36 container (clang 14) => --help works
 * qemu 7.2.1 (F38) with F37 container (clang 15) => --help works
 * qemu 7.2.1 (F38) with F38 container (clang 16) => --help fails

 * qemu 7.1.0 (upstream) with F36 container (clang 14) => --help works
 * qemu 7.1.0 (upstream) with F37 container (clang 15) => --help works
 * qemu 7.1.0 (upstream) with F38 container (clang 16) => --help fails

 * qemu v8.0.0-394-gc2b7158455 (upstream) with F36 container (clang 14) => --help works
 * qemu v8.0.0-394-gc2b7158455 (upstream) with F37 container (clang 15) => --help works
 * qemu v8.0.0-394-gc2b7158455 (upstream) with F38 container (clang 16) => --help fails

So the problem is specific to some code in the clang 16 build.

Given that multiple qemu versions show the same behaviour, it doesn't look like a regression (at least not since 7.1.0); it is more likely a latent bug only just now exposed by the new clang builds.

What we can't tell is whether the problem is exposed by a change in clang 15 -> 16, or by the newer GCC version that compiled clang 16, i.e. if we compiled the older clang 15 with the F38 GCC, would it expose the same bug or not?
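
The container axis of the test matrix above can be scripted roughly as follows. This is a sketch under assumptions: the function and variable names are invented for illustration, the host already has an s390x binfmt handler registered, and a non-zero exit from the container is treated as "--help fails" (which would also catch an unrelated dnf failure). It does not switch between qemu versions; that still has to be done on the host.

```shell
# Container engine to use; overridable, e.g. CONTAINER_RUNNER=podman.
: "${CONTAINER_RUNNER:=docker}"

# check_release N: install clang in an s390x fedora:N container and run
# "clang --help". Succeeds only if both steps succeed.
check_release() {
    "$CONTAINER_RUNNER" run --rm --platform linux/s390x "fedora:$1" \
        sh -c 'dnf install -y clang >/dev/null 2>&1 && clang --help >/dev/null 2>&1'
}

# run_matrix N...: print one result line per Fedora release given.
run_matrix() {
    for rel in "$@"; do
        if check_release "$rel"; then
            echo "fedora:$rel => --help works"
        else
            echo "fedora:$rel => --help fails"
        fi
    done
}

# Possible use: run_matrix 36 37 38
```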

Comment 10 Daniel Berrangé 2023-05-26 09:56:52 UTC
I filed this upstream since I think it'll need upstream s390x maintainer attention

 https://gitlab.com/qemu-project/qemu/-/issues/1668

Comment 11 Paolo Bonzini 2023-05-26 09:58:16 UTC
Is this qemu-s390x or qemu-system-s390x?

Comment 12 Daniel Berrangé 2023-05-26 10:01:51 UTC
(In reply to Paolo Bonzini from comment #11)
> Is this qemu-s390x or qemu-system-s390x?

Both - initial bug report shows docker which is qemu-s390x, but comment #4 shows a VM which is qemu-system-s390x.

IOW the problem is likely to be TCG.

Comment 13 Tulio Magno Quites Machado Filho 2023-05-26 17:15:46 UTC
(In reply to Daniel Berrangé from comment #9)
> What we can't tell is whether the problem is exposed by a change in clang 15 ->
> 16, or by the newer GCC version that compiled clang 16, i.e. if we compiled
> the older clang 15 with the F38 GCC, would it expose the same bug or not?

I suspect it could be related to a newer GCC version change because clang 16 is built on c9s with GCC 12 and works.
Meanwhile Fedora 38 and rawhide use GCC 13.

Comment 14 Daniel Berrangé 2023-05-26 17:26:01 UTC
(In reply to Tulio Magno Quites Machado Filho from comment #13)
> (In reply to Daniel Berrangé from comment #9)
> > What we can't tell is whether the problem is exposed by a change in clang 15 ->
> > 16, or by the newer GCC version that compiled clang 16, i.e. if we compiled
> > the older clang 15 with the F38 GCC, would it expose the same bug or not?
> 
> I suspect it could be related to a newer GCC version change because clang 16
> is built on c9s with GCC 12 and works.
> Meanwhile Fedora 38 and rawhide use GCC 13.

Ah, good reference for c9s, thanks.

Relevant changes in the toolchain in F38 were:

https://fedoraproject.org/wiki/Changes/Add_FORTIFY_SOURCE%3D3_to_distribution_build_flags
https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer
https://fedoraproject.org/wiki/Changes/GNUToolchainF38
https://fedoraproject.org/wiki/Changes/z13BaselineForIBMZ

Of those, I feel like the last one - moving from zEC12 -> z13 is a strong candidate:

- %__cflags_arch_s390x %[0%{?rhel} >= 9 ? "-march=z14 -mtune=z15" : "-march=zEC12 -mtune=z13"]
+ %__cflags_arch_s390x %[0%{?rhel} >= 9 ? "-march=z14 -mtune=z15" : "-march=z13 -mtune=z14"] 

This will almost certainly have resulted in code triggering new code paths in QEMU that we weren't previously hitting due to the older baseline.
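
To make the conditional in that macro easier to read, here is a rough shell model of it. The function name and the "old"/"new" selector are illustrative assumptions, not part of any RPM tooling; only the flag strings and the rhel >= 9 test are taken from the diff above. In the macro, 0%{?rhel} expands to 0 when rhel is undefined, which is the Fedora case.

```shell
# Hypothetical model of the %__cflags_arch_s390x macro shown above.
# $1: value of %{?rhel} (pass an empty string for Fedora)
# $2: "old" for the macro before the change, "new" for after it
s390x_cflags() {
    rhel_version=${1:-0}
    if [ "$rhel_version" -ge 9 ]; then
        # RHEL/CentOS Stream 9 and later: unchanged by the diff
        echo "-march=z14 -mtune=z15"
    elif [ "$2" = "new" ]; then
        # Fedora after the baseline change
        echo "-march=z13 -mtune=z14"
    else
        # Fedora before the baseline change
        echo "-march=zEC12 -mtune=z13"
    fi
}

# Possible use:
#   s390x_cflags "" old
#   s390x_cflags "" new
#   s390x_cflags 9 new
```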

Comment 15 Ilya Leoshkevich 2023-05-26 17:47:19 UTC
Apparently the problem is the LOCFHR implementation in QEMU. I posted a tentative fix in https://gitlab.com/qemu-project/qemu/-/issues/1668.
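
For readers unfamiliar with the instruction: LOCFHR (LOAD HIGH ON CONDITION) copies the high 32 bits (bits 0-31) of register R2 into the high 32 bits of R1 when the current condition code is selected by the mask M3, and otherwise leaves R1 unchanged. The sketch below models only that architected behaviour as I understand it from the z/Architecture definition. It is not QEMU code and does not show what the defect in QEMU was; the function name and the hex-string register representation are assumptions made for illustration.

```shell
# locfhr R1 R2 M3 CC
#   R1, R2: 64-bit register values written as 16 hex digits
#   M3:     4-bit condition mask (8 selects CC0, 4 CC1, 2 CC2, 1 CC3)
#   CC:     current condition code, 0..3
# Prints the value of R1 after the instruction.
locfhr() {
    r1=$1 r2=$2 m3=$3 cc=$4
    if [ $(( m3 & (8 >> cc) )) -ne 0 ]; then
        r2_high=$(printf '%.8s' "$r2")             # bits 0-31 of R2
        r1_low=$(printf '%s' "$r1" | cut -c9-16)   # bits 32-63 of R1, kept
        printf '%s%s\n' "$r2_high" "$r1_low"
    else
        printf '%s\n' "$r1"                        # condition not met
    fi
}

# Possible use:
#   locfhr 1111111122222222 aaaaaaaabbbbbbbb 8 0   # prints aaaaaaaa22222222
#   locfhr 1111111122222222 aaaaaaaabbbbbbbb 8 1   # prints 1111111122222222
```

An emulator that gets any part of this wrong silently corrupts register halves in the emulated program, which fits the symptom of failures appearing only under emulation and not on real hardware.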

Comment 16 Richard W.M. Jones 2023-05-27 06:51:45 UTC
Thanks Ilya! Upstream patch series:
https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg06962.html

Comment 17 Tulio Magno Quites Machado Filho 2023-05-31 17:52:58 UTC
I applied this patch series and confirmed that clang --help does work now.
I went ahead and ran a more complex test and found that some binaries were crashing.
I reported bug 2211472 in order to track this new issue.

Thank you all!

Comment 18 Richard W.M. Jones 2023-06-01 08:04:22 UTC
I assigned the bug to Ilya in order to triage the bug, but note there is
nothing to do here.  It will get fixed automatically next time qemu is released
and once that happens we can just close the bug.

Comment 19 Stephen Gallagher 2023-06-08 15:58:59 UTC
For the record, I'm also seeing this bug in my CI pipelines for sscg: https://github.com/sgallagher/sscg/actions

Comment 20 Tom Stellard 2023-11-06 23:52:56 UTC
What version of QEMU will this fix land in?

Comment 21 Ilya Leoshkevich 2023-11-07 02:22:57 UTC
The fix is in qemu v8.1.0.

Comment 22 Richard W.M. Jones 2023-11-07 08:40:58 UTC
Fixed in Fedora 39 and above.

