Bug 1990657
Summary: | non-reproducible rustc/LLVM failures when compiling sha1collisiondetection crate | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Fabio Valentini <decathorpe>
Component: | rust | Assignee: | Rust SIG <rust-sig>
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 35 | CC: | amulhern, igor.raits, jistone, rust-sig, sguelton, TicoTimo, tstellar, zebob.m
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | rust-1.57.0-1.fc36 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-12-03 08:34:58 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1998105, 2002229, 2005256, 2021910, 2021912, 2024123 | |
Description
Fabio Valentini
2021-08-05 20:57:06 UTC
cc Tom and Serge in case this is an LLVM bug.
> I could not reproduce this issue with Rust toolchains installed on Fedora 34 via rustup (neither stable nor nightly had this problem).
That might mean something in the rust-lang/llvm-project fork already fixed this for the upstream toolchain.
This bug appears to have been reported against 'rawhide' during the Fedora 35 development cycle. Changing version to 35.
This issue is now affecting all current Fedora branches (33, 34, 35, rawhide), since Rust 1.54.0 was pushed to stable everywhere. It's blocking any updates for packages that depend on the sha1collisiondetection crate, for example, sequoia-openpgp.
Is there anything I can do to help debug this issue? It's starting to block updates of the Sequoia PGP stack.
Can you test if it's fixed with this build: https://koji.fedoraproject.org/koji/taskinfo?taskID=74838670
Not sure. Do I need to rebuild Rust against LLVM 13 instead of the llvm12 compat package for this to work?
Support for LLVM 13 won't be ready until Rust 1.56, currently in upstream beta. I would have you compare upstream stable/beta, but you already said that doesn't reproduce? If you can extract the LLVM IR / bitcode and reproduce with llvm12's opt and llc, then that could be tried with newer opt and llc for comparison. https://rustc-dev-guide.rust-lang.org/backend/debugging.html (in particular, rustc ... -C no-prepopulate-passes --emit llvm-bc)
I tried to capture bitcode myself, but it wasn't reproducible. The bitcode is exactly the same between good and bad runs, but I didn't find any combination of opt/llc that crashed in any way.
I did find something with valgrind on the rustc process, which consistently has this error even on "good" runs:
==325== Invalid read of size 1
==325==    at 0x93E6CF4: getVisibility (GlobalValue.h:229)
==325==    by 0x93E6CF4: LLVMGetVisibility (Core.cpp:1992)
==325==    by 0x4F6D05C: LLVMRustGetVisibility (RustWrapper.cpp:1602)
==325==    by 0x51AE144: rustc_codegen_llvm::mono_item::<impl rustc_codegen_llvm::context::CodegenCx>::should_assume_dso_local (mono_item.rs:106)
==325==    by 0x51A41DF: rustc_codegen_llvm::consts::<impl rustc_codegen_llvm::context::CodegenCx>::get_static (consts.rs:289)
==325==    by 0x51A1E05: rustc_codegen_llvm::common::<impl rustc_codegen_ssa::traits::consts::ConstMethods for rustc_codegen_llvm::context::CodegenCx>::scalar_to_backend (common.rs:267)
==325==    by 0x521D477: rustc_codegen_ssa::mir::operand::OperandRef<V>::from_const (operand.rs:85)
==325==    by 0x523D07A: eval_mir_constant_to_operand<rustc_codegen_llvm::builder::Builder> (constant.rs:20)
==325==    by 0x523D07A: rustc_codegen_ssa::mir::operand::<impl rustc_codegen_ssa::mir::FunctionCx<Bx>>::codegen_operand (operand.rs:450)
==325==    by 0x5238B33: rustc_codegen_ssa::mir::rvalue::<impl rustc_codegen_ssa::mir::FunctionCx<Bx>>::codegen_rvalue_operand (rvalue.rs:546)
==325==    by 0x522DB68: codegen_statement<rustc_codegen_llvm::builder::Builder> (statement.rs:24)
==325==    by 0x522DB68: codegen_block<rustc_codegen_llvm::builder::Builder> (block.rs:901)
==325==    by 0x522DB68: rustc_codegen_ssa::mir::codegen_mir (mod.rs:258)
==325==    by 0x51B6E09: rustc_codegen_ssa::base::codegen_instance (base.rs:342)
==325==    by 0x51E249C: <rustc_middle::mir::mono::MonoItem as rustc_codegen_ssa::mono_item::MonoItemExt>::define (mono_item.rs:70)
==325==    by 0x51F713E: rustc_codegen_llvm::base::compile_codegen_unit::module_codegen (base.rs:141)
==325==  Address 0x1734a400 is 8 bytes after a block of size 56 alloc'd
==325==    at 0x4840FF5: operator new(unsigned long) (vg_replace_malloc.c:417)
==325==    by 0x94DDFE4: allocateFixedOperandUser (User.cpp:127)
==325==    by 0x94DDFE4: llvm::User::operator new(unsigned long, unsigned int) (User.cpp:146)
==325==    by 0x93CE0C6: operator new (ConstantsContext.h:55)
==325==    by 0x93CE0C6: llvm::ConstantExprKeyType::create(llvm::Type*) const (ConstantsContext.h:612)
==325==    by 0x93DA482: create (ConstantsContext.h:715)
==325==    by 0x93DA482: llvm::ConstantUniqueMap<llvm::ConstantExpr>::getOrCreate(llvm::Type*, llvm::ConstantExprKeyType) (ConstantsContext.h:734)
==325==    by 0x93E02F2: getFoldedCast (Constants.cpp:1937)
==325==    by 0x93E02F2: getBitCast (Constants.cpp:2194)
==325==    by 0x93E02F2: llvm::ConstantExpr::getBitCast(llvm::Constant*, llvm::Type*, bool) (Constants.cpp:2185)
==325==    by 0x94AA7E0: llvm::Module::getOrInsertGlobal(llvm::StringRef, llvm::Type*) (Module.cpp:226)
==325==    by 0x51C67D5: declare_global (declare.rs:60)
==325==    by 0x51C67D5: rustc_codegen_llvm::consts::check_and_apply_linkage (consts.rs:157)
==325==    by 0x51A34BC: rustc_codegen_llvm::consts::<impl rustc_codegen_llvm::context::CodegenCx>::get_static (consts.rs:234)
==325==    by 0x51A1E05: rustc_codegen_llvm::common::<impl rustc_codegen_ssa::traits::consts::ConstMethods for rustc_codegen_llvm::context::CodegenCx>::scalar_to_backend (common.rs:267)
==325==    by 0x521D477: rustc_codegen_ssa::mir::operand::OperandRef<V>::from_const (operand.rs:85)
==325==    by 0x523D07A: eval_mir_constant_to_operand<rustc_codegen_llvm::builder::Builder> (constant.rs:20)
==325==    by 0x523D07A: rustc_codegen_ssa::mir::operand::<impl rustc_codegen_ssa::mir::FunctionCx<Bx>>::codegen_operand (operand.rs:450)
==325==    by 0x5238B33: rustc_codegen_ssa::mir::rvalue::<impl rustc_codegen_ssa::mir::FunctionCx<Bx>>::codegen_rvalue_operand (rvalue.rs:546)
I'm not sure what's wrong here, but I did find one commit that's new in 13 which mentions UB in User subclasses, detected by GCC: https://github.com/llvm/llvm-project/commit/d58c7a92380e030af6e6f82ce55bc14a919f39ea
And *possibly* related to that, upstream Rust+LLVM are built with Clang on x86-64, so if UB is involved, that may have a different/worse effect when LLVM is built by GCC in Fedora.
I'll try to get a scratch build with Rust 1.56-beta and LLVM 13 so we can see what that does.
I *can* reproduce this with upstream toolchains, both stable 1.55 with LLVM 12 and beta 1.56 with LLVM 13. I used mock with --no-cleanup-after, then used rustup to get upstream stable/beta in that chroot. A simple "cargo +stable build" (or "+beta") hits the same kind of errors, though not every time -- use "cargo +stable clean -p sha1collisiondetection" to remove just that part and try again.
Yup, this is still happening with Rust 1.56 / LLVM 13 in Rawhide: https://koschei.fedoraproject.org/build/11383210
I figured out the error from valgrind -- llvm::Module::getOrInsertGlobal returns a Constant*, but LLVMGetVisibility expects a GlobalValue* (which is a subclass). Most of the time you do get a GlobalVariable* (a further subclass), except that when getOrInsertGlobal is given a different type than the existing global has, it instead returns a constant bitcast expression, as you see in this backtrace with getBitCast. The type casting used in LLVMGetVisibility does have a debug assertion, so I ran a build with assertions enabled and it failed:
rustc: /checkout/src/llvm-project/llvm/include/llvm/Support/Casting.h:269: typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::GlobalValue, Y = llvm::Value]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
So, casting the wrong pointer type is Undefined Behavior, and the non-reproducible aspect of this bug is just "luck" of whatever happens to be in memory there. I'll look for or file a bug upstream, and then see if I can figure out why we're getting that mismatch for a bitcast.
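The mismatch described above can be illustrated with a small toy model. This is only a sketch in plain Rust, not LLVM's or rustc's actual code: `Module`, `Global`, `Value`, `as_global`, `visibility`, and the type strings are hypothetical stand-ins invented for the illustration. It assumes the behaviour reported in this bug, namely that asking for an existing global under a different type yields a bitcast expression rather than the global itself, and shows why a checked downcast turns the silent invalid read into a visible error.

```rust
// Toy model only. `Module`, `Global`, `Value`, and `as_global` are hypothetical
// stand-ins written for this sketch; they are not LLVM's or rustc's real API.
use std::collections::HashMap;

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Visibility {
    Default,
    Hidden,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Global {
    name: String,
    ty: String,
    visibility: Visibility,
}

/// What a lookup can hand back: a real global, or a cast wrapped around one.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    GlobalVariable(Global),
    BitCast { global: String, to_ty: String },
}

#[derive(Default)]
struct Module {
    globals: HashMap<String, Global>,
}

impl Module {
    /// Same type as the existing global: return the global itself.
    /// Different type: return a bitcast expression, which has no visibility.
    fn get_or_insert_global(&mut self, name: &str, ty: &str) -> Value {
        let global = self.globals.entry(name.to_string()).or_insert_with(|| Global {
            name: name.to_string(),
            ty: ty.to_string(),
            visibility: Visibility::Default,
        });
        if global.ty == ty {
            Value::GlobalVariable(global.clone())
        } else {
            Value::BitCast { global: name.to_string(), to_ty: ty.to_string() }
        }
    }
}

/// Checked downcast: `None` when the value is not actually a global.
fn as_global(value: &Value) -> Option<&Global> {
    match value {
        Value::GlobalVariable(global) => Some(global),
        Value::BitCast { .. } => None,
    }
}

/// Reads the visibility only after the downcast has been checked.
fn visibility(value: &Value) -> Result<Visibility, String> {
    as_global(value)
        .map(|global| global.visibility)
        .ok_or_else(|| format!("not a global value: {:?}", value))
}

fn main() {
    let mut module = Module::default();
    // The definition is seen first, then a second declaration with another type.
    let defined = module.get_or_insert_global("sha1_dvs", "[dv_info_t; N]");
    let redeclared = module.get_or_insert_global("sha1_dvs", "[dv_info_t; 0]");
    println!("{:?}", visibility(&defined));
    println!("{:?}", visibility(&redeclared));
}
```

In this model the second lookup cannot be mistaken for a global, because the enum forces the caller to handle the bitcast case. An unchecked pointer cast has no such guard, which matches the description of reading whatever happens to be in memory.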
There's still a real compiler bug here, but you can avoid it by removing the redundant externs:
--- sha1collisiondetection-0.2.3/lib/sha1.rs.orig
+++ sha1collisiondetection-0.2.3/lib/sha1.rs
@@ -2,10 +2,7 @@
          non_upper_case_globals, unused_assignments, unused_mut)]
 use libc::memcpy;
 use libc::abort;
-extern "C" {
-    static mut sha1_dvs: [dv_info_t; 0];
-    fn ubc_check(W: *const uint32_t, dvmask: *mut uint32_t);
-}
+use crate::ubc_check::{sha1_dvs, ubc_check};
 use libc::size_t;
 pub type __uint32_t = u32; // libc::uint32_t, but that is deprecated.
 pub type __uint64_t = u64; // libc::uint64_t, but that is deprecated.
(In reply to Josh Stone from comment #12)
> There's still a real compiler bug here, but you can avoid it by removing the
> redundant externs:
>
> --- sha1collisiondetection-0.2.3/lib/sha1.rs.orig
> +++ sha1collisiondetection-0.2.3/lib/sha1.rs
> @@ -2,10 +2,7 @@
>           non_upper_case_globals, unused_assignments, unused_mut)]
>  use libc::memcpy;
>  use libc::abort;
> -extern "C" {
> -    static mut sha1_dvs: [dv_info_t; 0];
> -    fn ubc_check(W: *const uint32_t, dvmask: *mut uint32_t);
> -}
> +use crate::ubc_check::{sha1_dvs, ubc_check};
>  use libc::size_t;
>  pub type __uint32_t = u32; // libc::uint32_t, but that is deprecated.
>  pub type __uint64_t = u64; // libc::uint64_t, but that is deprecated.
That's what I was about to do. @decathorpe do you mind if I push this fix until upstream fixes the compiler bug?
Please, let me handle this one. I will push the update to 0.2.4 at the same time. Just tell me which side tag I should build the package into.
Never mind, I pushed the changes I want to dist-git for rawhide, f35, and f34. A scratch build for rawhide succeeded, so I hope this really works around the problem. https://src.fedoraproject.org/rpms/rust-sha1collisiondetection/c/c50fc54f74d7c8fd66a0e24b8ee086ff209853d4?branch=rawhide Feel free to build it where you need it. However, in the future, I would appreciate it if you didn't update sequoia packages without asking me. They're security sensitive, and I would've wanted to make sure that everything is built in the right order (i.e. against the latest bug-and-security-fixed dependencies).
FEDORA-2021-8cf89f9ce7 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2021-8cf89f9ce7
FEDORA-2021-8cf89f9ce7 has been pushed to the Fedora 36 stable repository. If problem still persists, please make note of it in this bug report.
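The shape of the workaround patch earlier in this report (one definition, imported with `use` instead of redeclared in an `extern "C"` block) can be sketched as a self-contained program. Everything concrete here is an assumption made for illustration: `DvInfo`, its field, the array length of 2, the safe slice-based signature, and the function body are placeholders, not the crate's real definitions. The sketch also uses an immutable `static` and `super::` paths to stay simple, where the crate uses `static mut` and `crate::` paths.

```rust
// Sketch only: `DvInfo`, the array contents, and the function body are
// hypothetical placeholders, not the real sha1collisiondetection code.
#[allow(non_upper_case_globals, dead_code)]
mod ubc_check {
    /// Hypothetical stand-in for the crate's `dv_info_t`.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct DvInfo {
        pub dvtype: i32,
    }

    /// The single definition of the table, with its full (made-up) length.
    pub static sha1_dvs: [DvInfo; 2] = [DvInfo { dvtype: 1 }, DvInfo { dvtype: 2 }];

    /// Placeholder body, unrelated to the real check.
    pub fn ubc_check(w: &[u32], dvmask: &mut u32) {
        *dvmask = w.iter().fold(0, |acc, x| acc ^ x);
    }
}

mod sha1 {
    // One import of the existing items. There is no second declaration here
    // whose type (for example a zero-length array) could disagree with the
    // definition in `ubc_check`.
    use super::ubc_check::{sha1_dvs, ubc_check};

    /// Number of table entries, as seen through the imported static.
    pub fn dv_count() -> usize {
        sha1_dvs.len()
    }

    /// Calls the imported function and returns the mask it writes.
    pub fn mask_for(w: &[u32]) -> u32 {
        let mut mask = 0;
        ubc_check(w, &mut mask);
        mask
    }
}

fn main() {
    println!("{} entries, mask {:#x}", sha1::dv_count(), sha1::mask_for(&[1, 2, 4]));
}
```

Because `sha1` refers to the same item that `ubc_check` defines, the compiler only ever sees one type for `sha1_dvs`, so the lookup described earlier in this report never has to reconcile two declarations.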