
Bug 1990657

Summary: non-reproducible rustc/LLVM failures when compiling sha1collisiondetection crate
Product: Fedora
Component: rust
Version: 35
Status: CLOSED ERRATA
Fixed In Version: rust-1.57.0-1.fc36
Last Closed: 2021-12-03 08:34:58 UTC
Type: Bug
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Fabio Valentini <decathorpe>
Assignee: Rust SIG <rust-sig>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: amulhern, igor.raits, jistone, rust-sig, sguelton, TicoTimo, tstellar, zebob.m
Bug Blocks: 1998105, 2002229, 2005256, 2021910, 2021912, 2024123

Description Fabio Valentini 2021-08-05 20:57:06 UTC
Description of problem:

Since the upgrade from Rust 1.53 to 1.54, the rust-sha1collisiondetection package has had non-deterministic build failures in rawhide that look like they come from the Rust <-> LLVM interaction, or from memory corruption somewhere. This can also be seen on Koschei:

https://koschei.fedoraproject.org/package/rust-sha1collisiondetection?

There are three possible outcomes for compiling the sha1collisiondetection crate, as far as I can tell:

1) LLVM error

LLVM ERROR: Invalid LLVMRustVisibility value!
(exit status: 101)

2) rustc / LLVM crash

local variable requires a valid scope

or

invalid scope
!6464 = !DILocalVariable(name: "ihvtmp", scope: !"

or

invalid scope
!6469 = !DILocalVariable(name: "ihvtmp", scope: !"\01\00\00\00\AF\86\00\00\01\00u\98\FF\FF\00\00\0A\00\00\00\FF\FF\00\00D\00\00\00\FF\FF\00\00\C1\01\00\00\00\00\00\00\D0\D9\E6\8D\FF\FF\00\00\80\00\00\8C\FF\FF\00\00\00\00\00\00\01\00\00\00\FE\22\00\00\01\00u\98\01\00\00\00=\11\00\00\01\00u\98\FF\FF\00\00D\00\00\00\FF\FF\00\00\81\01\00\00\00\00\00\00\D0\D9\E6\8D\FF\FF\00\00\80\00\00\8C\FF\FF\00\00\01\00\00\00\09\86\00\00\01\00u\98\01\00\00\00L\11\00\00\01\00u\98\01\00\00\00\83\85\00\00\01\00u\98\FF\FF\00\00A\01\00\00\00\00\00\00 ... (a few more pages of this garbage)

(signal: 11, SIGSEGV: invalid memory reference)

3) Compilation finishes successfully
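When rebuilding repeatedly to gauge how often each case happens, a small classifier over the build log can tally these outcomes. The following Rust sketch is only an illustration: `Outcome` and `classify` are hypothetical names, and it assumes the failure messages always contain the substrings quoted above.

```rust
/// The three outcomes observed when compiling sha1collisiondetection,
/// plus a catch-all for failures that match none of the quoted messages.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    /// "LLVM ERROR: Invalid LLVMRustVisibility value!"
    LlvmVisibilityError,
    /// Debug-info "invalid scope" errors or a SIGSEGV in rustc.
    Crash,
    /// The build finished successfully.
    Success,
    /// The build failed for some other reason.
    OtherFailure,
}

/// Hypothetical helper: classify one build from its success flag and log text.
fn classify(success: bool, log: &str) -> Outcome {
    if success {
        return Outcome::Success;
    }
    if log.contains("Invalid LLVMRustVisibility value!") {
        Outcome::LlvmVisibilityError
    } else if log.contains("local variable requires a valid scope")
        || log.contains("invalid scope")
        || log.contains("SIGSEGV")
    {
        Outcome::Crash
    } else {
        Outcome::OtherFailure
    }
}

fn main() {
    // Snippets taken from the messages quoted in this report.
    let logs = [
        (false, "LLVM ERROR: Invalid LLVMRustVisibility value!\n(exit status: 101)"),
        (false, "invalid scope\n(signal: 11, SIGSEGV: invalid memory reference)"),
        (true, "build log omitted"),
    ];
    for (success, log) in logs {
        println!("{:?}", classify(success, log));
    }
}
```

Matching on substrings is deliberately loose, since the crash output can be followed by pages of garbage bytes.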


Version-Release number of selected component (if applicable):

rust-1.54.0-1.fc35.x86_64
llvm-libs-12.0.1-2.fc35.x86_64


How reproducible:

Non-deterministic, but it seems to happen about 50% of the time, both on my local machine and in Koschei.


Steps to Reproduce:

1. fedpkg clone rust-sha1collisiondetection
2. cd rust-sha1collisiondetection
3. fedpkg mockbuild


Actual results:

Around 50% of the time, rustc/LLVM hits one of the errors above and the build fails.


Expected results:

Build should pass 100% of the time.


Additional info:

I can only reproduce this on my machine when using the rust 1.54.0 RPM package on Fedora rawhide. I could not reproduce this issue with Rust toolchains installed on Fedora 34 via rustup (neither stable nor nightly had this problem).

The build logs I took the error message snippets from can be found on Koschei:
https://koschei.fedoraproject.org/package/rust-sha1collisiondetection?

The aarch64 build log from the failed build on 2021-08-01 seems particularly ... bad. Is LLVM reading past the end of a string there?

Comment 1 Josh Stone 2021-08-06 00:30:50 UTC
cc Tom and Serge in case this is an LLVM bug.

> I could not reproduce this issue with Rust toolchains installed on Fedora 34 via rustup (neither stable nor nightly had this problem).

That might mean something in the rust-lang/llvm-project fork already fixed this for the upstream toolchain.

Comment 2 Ben Cotton 2021-08-10 13:35:41 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 35 development cycle.
Changing version to 35.

Comment 3 Fabio Valentini 2021-08-24 15:14:10 UTC
This issue is now affecting all current Fedora branches (33, 34, 35, rawhide), since Rust 1.54.0 was pushed to stable everywhere.
It's blocking any updates for packages that depend on the sha1collisiondetection crate, for example, sequoia-openpgp.

Comment 4 Fabio Valentini 2021-09-11 19:08:35 UTC
Is there anything I can do to help debug this issue?
It's starting to block updates of the Sequoia PGP stack.

Comment 5 Tom Stellard 2021-09-13 12:24:06 UTC
Can you test if it's fixed with this build: https://koji.fedoraproject.org/koji/taskinfo?taskID=74838670

Comment 6 Fabio Valentini 2021-09-13 13:25:47 UTC
Not sure. Do I need to rebuild Rust against LLVM 13 instead of the llvm12 compat package for this to work?

Comment 7 Josh Stone 2021-09-13 21:41:53 UTC
Support for LLVM 13 won't be ready until Rust 1.56, currently in upstream beta. I would have you compare upstream stable/beta, but you already said that doesn't reproduce?

If you can extract the LLVM IR / bitcode and reproduce with llvm12's opt and llc, then that could be tried with newer opt and llc for comparison.
https://rustc-dev-guide.rust-lang.org/backend/debugging.html
(in particular, rustc ... -C no-prepopulate-passes --emit llvm-bc)
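As a sketch of that workflow, the commands could be assembled from Rust with `std::process::Command`. This is illustrative only: `rustc_emit_bitcode` and `opt_then_llc` are hypothetical helpers, the crate's real rustc arguments (the "..." above) must be supplied by the caller, for example copied from `cargo build -v` output, and the `-O2` level and the output file names are assumptions.

```rust
use std::process::Command;

/// Hypothetical helper: append the rustc-dev-guide flags to the crate's
/// normal rustc arguments so rustc emits bitcode without first running
/// its own LLVM pass pipeline.
fn rustc_emit_bitcode(rustc_args: &[&str]) -> Command {
    let mut cmd = Command::new("rustc");
    cmd.args(rustc_args)
        .args(["-C", "no-prepopulate-passes", "--emit", "llvm-bc"]);
    cmd
}

/// Hypothetical helper: replay the bitcode through `opt`, then `llc`, so each
/// stage can be rerun on its own with different LLVM versions.
fn opt_then_llc(bitcode: &str, opt_level: &str) -> (Command, Command) {
    let mut opt = Command::new("opt");
    opt.args([opt_level, bitcode, "-o", "optimized.bc"]);
    let mut llc = Command::new("llc");
    llc.args(["optimized.bc", "-o", "out.s"]);
    (opt, llc)
}

fn main() {
    // Placeholder arguments; substitute the real ones from `cargo build -v`.
    let rustc = rustc_emit_bitcode(&["--crate-type", "lib", "src/lib.rs"]);
    let (opt, llc) = opt_then_llc("lib.bc", "-O2");
    // Print the command lines instead of running them.
    for cmd in [&rustc, &opt, &llc] {
        println!("{:?}", cmd);
    }
}
```

Calling `.status()` on each command in order would run the pipeline; keeping the stages separate makes it easy to swap in a newer `opt` or `llc` for comparison.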

Comment 8 Josh Stone 2021-09-14 16:51:06 UTC
I tried to capture bitcode myself, but it wasn't reproducible. The bitcode is exactly the same between good and bad runs, but I didn't find any combination of opt/llc that crashed in any way.

I did find something with valgrind on the rustc process, which consistently has this error even on "good" runs:

==325== Invalid read of size 1
==325==    at 0x93E6CF4: getVisibility (GlobalValue.h:229)
==325==    by 0x93E6CF4: LLVMGetVisibility (Core.cpp:1992)
==325==    by 0x4F6D05C: LLVMRustGetVisibility (RustWrapper.cpp:1602)
==325==    by 0x51AE144: rustc_codegen_llvm::mono_item::<impl rustc_codegen_llvm::context::CodegenCx>::should_assume_dso_local (mono_item.rs:106)
==325==    by 0x51A41DF: rustc_codegen_llvm::consts::<impl rustc_codegen_llvm::context::CodegenCx>::get_static (consts.rs:289)
==325==    by 0x51A1E05: rustc_codegen_llvm::common::<impl rustc_codegen_ssa::traits::consts::ConstMethods for rustc_codegen_llvm::context::CodegenCx>::scalar_to_backend (common.rs:267)
==325==    by 0x521D477: rustc_codegen_ssa::mir::operand::OperandRef<V>::from_const (operand.rs:85)
==325==    by 0x523D07A: eval_mir_constant_to_operand<rustc_codegen_llvm::builder::Builder> (constant.rs:20)
==325==    by 0x523D07A: rustc_codegen_ssa::mir::operand::<impl rustc_codegen_ssa::mir::FunctionCx<Bx>>::codegen_operand (operand.rs:450)
==325==    by 0x5238B33: rustc_codegen_ssa::mir::rvalue::<impl rustc_codegen_ssa::mir::FunctionCx<Bx>>::codegen_rvalue_operand (rvalue.rs:546)
==325==    by 0x522DB68: codegen_statement<rustc_codegen_llvm::builder::Builder> (statement.rs:24)
==325==    by 0x522DB68: codegen_block<rustc_codegen_llvm::builder::Builder> (block.rs:901)
==325==    by 0x522DB68: rustc_codegen_ssa::mir::codegen_mir (mod.rs:258)
==325==    by 0x51B6E09: rustc_codegen_ssa::base::codegen_instance (base.rs:342)
==325==    by 0x51E249C: <rustc_middle::mir::mono::MonoItem as rustc_codegen_ssa::mono_item::MonoItemExt>::define (mono_item.rs:70)
==325==    by 0x51F713E: rustc_codegen_llvm::base::compile_codegen_unit::module_codegen (base.rs:141)
==325==  Address 0x1734a400 is 8 bytes after a block of size 56 alloc'd
==325==    at 0x4840FF5: operator new(unsigned long) (vg_replace_malloc.c:417)
==325==    by 0x94DDFE4: allocateFixedOperandUser (User.cpp:127)
==325==    by 0x94DDFE4: llvm::User::operator new(unsigned long, unsigned int) (User.cpp:146)
==325==    by 0x93CE0C6: operator new (ConstantsContext.h:55)
==325==    by 0x93CE0C6: llvm::ConstantExprKeyType::create(llvm::Type*) const (ConstantsContext.h:612)
==325==    by 0x93DA482: create (ConstantsContext.h:715)
==325==    by 0x93DA482: llvm::ConstantUniqueMap<llvm::ConstantExpr>::getOrCreate(llvm::Type*, llvm::ConstantExprKeyType) (ConstantsContext.h:734)
==325==    by 0x93E02F2: getFoldedCast (Constants.cpp:1937)
==325==    by 0x93E02F2: getBitCast (Constants.cpp:2194)
==325==    by 0x93E02F2: llvm::ConstantExpr::getBitCast(llvm::Constant*, llvm::Type*, bool) (Constants.cpp:2185)
==325==    by 0x94AA7E0: llvm::Module::getOrInsertGlobal(llvm::StringRef, llvm::Type*) (Module.cpp:226)
==325==    by 0x51C67D5: declare_global (declare.rs:60)
==325==    by 0x51C67D5: rustc_codegen_llvm::consts::check_and_apply_linkage (consts.rs:157)
==325==    by 0x51A34BC: rustc_codegen_llvm::consts::<impl rustc_codegen_llvm::context::CodegenCx>::get_static (consts.rs:234)
==325==    by 0x51A1E05: rustc_codegen_llvm::common::<impl rustc_codegen_ssa::traits::consts::ConstMethods for rustc_codegen_llvm::context::CodegenCx>::scalar_to_backend (common.rs:267)
==325==    by 0x521D477: rustc_codegen_ssa::mir::operand::OperandRef<V>::from_const (operand.rs:85)
==325==    by 0x523D07A: eval_mir_constant_to_operand<rustc_codegen_llvm::builder::Builder> (constant.rs:20)
==325==    by 0x523D07A: rustc_codegen_ssa::mir::operand::<impl rustc_codegen_ssa::mir::FunctionCx<Bx>>::codegen_operand (operand.rs:450)
==325==    by 0x5238B33: rustc_codegen_ssa::mir::rvalue::<impl rustc_codegen_ssa::mir::FunctionCx<Bx>>::codegen_rvalue_operand (rvalue.rs:546)

I'm not sure what's wrong here, but I did find one commit that's new in 13 which mentions UB in User subclasses, detected by GCC:
https://github.com/llvm/llvm-project/commit/d58c7a92380e030af6e6f82ce55bc14a919f39ea

And, *possibly* related to that, upstream Rust+LLVM are built with Clang on x86-64, so if UB is involved, it may have different or worse effects when LLVM is built with GCC, as in Fedora.

I'll try to get a scratch build with Rust 1.56-beta and LLVM 13 so we can see what that does.

Comment 9 Josh Stone 2021-09-14 17:37:21 UTC
I *can* reproduce this with upstream toolchains, both stable 1.55 with LLVM 12 and beta 1.56 with LLVM 13. I used mock with --no-cleanup-after, then used rustup to install upstream stable/beta in that chroot. A simple "cargo +stable build" (or "+beta") hits the same kind of errors, though not every time; use "cargo +stable clean -p sha1collisiondetection" to remove just that part and try again.
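The clean-and-rebuild loop above lends itself to a small harness that counts passes and failures. The Rust sketch below assumes rustup's `cargo +toolchain` proxy syntax as used above, and that it runs inside the unpacked crate or mock chroot; `Tally`, `run_trials`, `cargo_trial` and the `flaky-build` program name are hypothetical. The build step is passed in as a closure so the counting logic can be exercised without cargo.

```rust
use std::process::Command;

/// Pass/fail counts over repeated builds.
#[derive(Debug, Default, PartialEq, Eq)]
struct Tally {
    passed: usize,
    failed: usize,
}

/// Run `build` `n` times and count how often it succeeds.
fn run_trials<F: FnMut() -> bool>(n: usize, mut build: F) -> Tally {
    let mut tally = Tally::default();
    for _ in 0..n {
        if build() {
            tally.passed += 1;
        } else {
            tally.failed += 1;
        }
    }
    tally
}

/// One trial: clean only the sha1collisiondetection package, then rebuild
/// with the given rustup toolchain. The clean step's status is ignored; only
/// the build's exit status is counted, and a missing `cargo` counts as a failure.
fn cargo_trial(toolchain: &str) -> bool {
    let tc = format!("+{}", toolchain);
    let _ = Command::new("cargo")
        .args([tc.as_str(), "clean", "-p", "sha1collisiondetection"])
        .status();
    Command::new("cargo")
        .args([tc.as_str(), "build"])
        .status()
        .map(|s| s.success())
        .unwrap_or(false)
}

fn main() {
    // Usage: flaky-build <runs> [toolchain]. Without a valid run count it
    // only prints usage, so nothing is built by accident.
    let mut args = std::env::args().skip(1);
    let runs: usize = match args.next().and_then(|s| s.parse().ok()) {
        Some(n) => n,
        None => {
            eprintln!("usage: flaky-build <runs> [toolchain]");
            return;
        }
    };
    let toolchain = args.next().unwrap_or_else(|| "stable".to_string());
    let tally = run_trials(runs, || cargo_trial(&toolchain));
    println!("{:?}", tally);
}
```

With a failure rate around 50%, a few dozen runs per toolchain should be enough to tell "still broken" from "fixed" with reasonable confidence.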

Comment 10 Fabio Valentini 2021-10-22 08:56:15 UTC
Yup, this is still happening with Rust 1.56 / LLVM 13 in Rawhide:
https://koschei.fedoraproject.org/build/11383210

Comment 11 Josh Stone 2021-11-19 17:32:04 UTC
I figured out the error from valgrind: llvm::Module::getOrInsertGlobal returns a Constant*, but LLVMGetVisibility expects a GlobalValue* (which is a subclass). Most of the time you do get a GlobalVariable* (a further subclass), except that when getOrInsertGlobal is given a different type for an existing global, it instead returns a constant bitcast expression, as you can see in this backtrace with getBitCast. The type cast used in LLVMGetVisibility does have a debug assertion, so I ran a build with those assertions enabled, and it failed:

rustc: /checkout/src/llvm-project/llvm/include/llvm/Support/Casting.h:269: typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::GlobalValue, Y = llvm::Value]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

So, casting to the wrong pointer type is undefined behavior, and the non-reproducible aspect of this bug is just the "luck" of whatever happens to be in memory there.

I'll look for or file a bug upstream, and then see if I can figure out why we're getting that mismatch for a bitcast.
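To make the failure mode concrete, here is a toy Rust model of the mechanism described in this comment. It is not LLVM's API: `Constant`, `Module`, `visibility` and `strip_pointer_casts` are hypothetical names chosen to echo LLVM's `GlobalVariable`, `dyn_cast<GlobalValue>` and `stripPointerCasts()`. In this model the first lookup of a name fixes its type, and a later lookup with a different type gets a bitcast wrapper that has no visibility, which is the case an unchecked cast would read through. The type strings, including the `N` placeholder length, are illustrative.

```rust
use std::collections::HashMap;

/// Toy model of the two kinds of `Constant*` that getOrInsertGlobal can return.
#[derive(Debug, Clone, PartialEq)]
enum Constant {
    /// A global variable; in LLVM this is a GlobalValue, so it has a visibility.
    GlobalVariable { name: String, ty: String, visibility: Visibility },
    /// A constant bitcast expression; not a GlobalValue, so no visibility.
    BitCast { operand: Box<Constant>, to_ty: String },
}

/// Only default visibility is modeled here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Visibility {
    Default,
}

/// Toy module: global name -> (type it was first created with, visibility).
#[derive(Default)]
struct Module {
    globals: HashMap<String, (String, Visibility)>,
}

impl Module {
    /// Return the global if the requested type matches its existing type,
    /// otherwise a bitcast of the existing global to the requested type.
    fn get_or_insert_global(&mut self, name: &str, ty: &str) -> Constant {
        let (existing_ty, visibility) = self
            .globals
            .entry(name.to_string())
            .or_insert_with(|| (ty.to_string(), Visibility::Default))
            .clone();
        let gv = Constant::GlobalVariable { name: name.to_string(), ty: existing_ty.clone(), visibility };
        if existing_ty == ty {
            gv
        } else {
            Constant::BitCast { operand: Box::new(gv), to_ty: ty.to_string() }
        }
    }
}

impl Constant {
    /// Checked access in the spirit of dyn_cast: None for a bitcast, instead
    /// of reading a field that does not exist (the UB case in LLVMGetVisibility).
    fn visibility(&self) -> Option<Visibility> {
        match self {
            Constant::GlobalVariable { visibility, .. } => Some(*visibility),
            Constant::BitCast { .. } => None,
        }
    }

    /// In the spirit of stripPointerCasts(): look through bitcasts.
    fn strip_pointer_casts(&self) -> &Constant {
        match self {
            Constant::BitCast { operand, .. } => operand.strip_pointer_casts(),
            other => other,
        }
    }
}

fn main() {
    let mut m = Module::default();
    // A definition with a concrete length ("N" is a placeholder).
    let def = m.get_or_insert_global("sha1_dvs", "[N x dv_info_t]");
    // A redundant extern declaration as a zero-length array, like `[dv_info_t; 0]`.
    let decl = m.get_or_insert_global("sha1_dvs", "[0 x dv_info_t]");
    println!("{:?} -> {:?}", def, def.visibility());
    println!("{:?} -> {:?}", decl, decl.visibility());
    println!("stripped -> {:?}", decl.strip_pointer_casts().visibility());
}
```

In the model, the checked accessor turns the bad read into a visible `None`; that is only an illustration of why an unchecked cast here is dangerous, not a description of how upstream resolved it.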

Comment 12 Josh Stone 2021-11-19 20:32:51 UTC
There's still a real compiler bug here, but you can avoid it by removing the redundant externs:

--- sha1collisiondetection-0.2.3/lib/sha1.rs.orig
+++ sha1collisiondetection-0.2.3/lib/sha1.rs
@@ -2,10 +2,7 @@
          non_upper_case_globals, unused_assignments, unused_mut)]
 use libc::memcpy;
 use libc::abort;
-extern "C" {
-    static mut sha1_dvs: [dv_info_t; 0];
-    fn ubc_check(W: *const uint32_t, dvmask: *mut uint32_t);
-}
+use crate::ubc_check::{sha1_dvs, ubc_check};
 use libc::size_t;
 pub type __uint32_t = u32; // libc::uint32_t, but that is deprecated.
 pub type __uint64_t = u64; // libc::uint64_t, but that is deprecated.
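For anyone unfamiliar with the pattern, here is a self-contained Rust sketch of what the workaround does: each item is defined once, and the other module imports it with `use` rather than redeclaring it in an `extern "C"` block. The `dv_info_t` fields, the two-entry table and the `ubc_check` body are placeholders rather than the crate's code, and a plain `static` and references stand in for the crate's `static mut` and raw pointers to keep the example free of `unsafe`.

```rust
// Hypothetical stand-in for the crate's ubc_check module.
#[allow(non_camel_case_types, non_upper_case_globals)]
mod ubc_check {
    /// Placeholder for the crate's disturbance-vector record.
    #[repr(C)]
    #[derive(Clone, Copy, Debug)]
    pub struct dv_info_t {
        pub dv_type: i32,
        pub mask_bit: u32,
    }

    /// Defined exactly once, with a concrete length (2 is a placeholder).
    pub static sha1_dvs: [dv_info_t; 2] = [
        dv_info_t { dv_type: 1, mask_bit: 1 << 0 },
        dv_info_t { dv_type: 2, mask_bit: 1 << 1 },
    ];

    /// Placeholder body: ORs together every table entry's mask bit.
    /// The real function inspects the 80 expanded SHA-1 message words.
    pub fn ubc_check(_w: &[u32; 80], dvmask: &mut u32) {
        *dvmask = sha1_dvs.iter().fold(0, |acc, dv| acc | dv.mask_bit);
    }
}

mod sha1 {
    // Before the workaround, this module redeclared both items itself:
    //
    //     extern "C" {
    //         static mut sha1_dvs: [dv_info_t; 0];
    //         fn ubc_check(W: *const uint32_t, dvmask: *mut uint32_t);
    //     }
    //
    // Importing the Rust definitions leaves no second, zero-length
    // declaration of sha1_dvs for codegen to reconcile with the definition.
    use crate::ubc_check::{sha1_dvs, ubc_check};

    pub fn first_dv_type() -> i32 {
        sha1_dvs[0].dv_type
    }

    pub fn mask_for(w: &[u32; 80]) -> u32 {
        let mut dvmask = 0;
        ubc_check(w, &mut dvmask);
        dvmask
    }
}

fn main() {
    let w = [0u32; 80];
    println!("first dv_type = {}", sha1::first_dv_type());
    println!("dvmask = {:#x}", sha1::mask_for(&w));
}
```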

Comment 13 Robert-André Mauchin 🐧 2021-11-20 17:28:19 UTC
(In reply to Josh Stone from comment #12)
> There's still a real compiler bug here, but you can avoid it by removing the
> redundant externs:
> 
> --- sha1collisiondetection-0.2.3/lib/sha1.rs.orig
> +++ sha1collisiondetection-0.2.3/lib/sha1.rs
> @@ -2,10 +2,7 @@
>           non_upper_case_globals, unused_assignments, unused_mut)]
>  use libc::memcpy;
>  use libc::abort;
> -extern "C" {
> -    static mut sha1_dvs: [dv_info_t; 0];
> -    fn ubc_check(W: *const uint32_t, dvmask: *mut uint32_t);
> -}
> +use crate::ubc_check::{sha1_dvs, ubc_check};
>  use libc::size_t;
>  pub type __uint32_t = u32; // libc::uint32_t, but that is deprecated.
>  pub type __uint64_t = u64; // libc::uint64_t, but that is deprecated.

That's what I was about to do.

@decathorpe, do you mind if I push this fix until upstream fixes the compiler bug?

Comment 14 Fabio Valentini 2021-11-20 21:19:03 UTC
Please, let me handle this one. I will push the update to 0.2.4 at the same time.
Just tell me which side tag I should build the package into.

Comment 15 Fabio Valentini 2021-11-20 21:41:39 UTC
Never mind, I pushed the changes I want to dist-git for rawhide, f35, and f34.
A scratch build for rawhide succeeded, so I hope this really works around the problem.
https://src.fedoraproject.org/rpms/rust-sha1collisiondetection/c/c50fc54f74d7c8fd66a0e24b8ee086ff209853d4?branch=rawhide
Feel free to build it where you need it.

However, in the future, I would appreciate it if you didn't update sequoia packages without asking me. They're security sensitive, and I would've wanted to make sure that everything is built in the right order (i.e. against the latest bug-and-security-fixed dependencies).

Comment 16 Fedora Update System 2021-12-03 07:20:33 UTC
FEDORA-2021-8cf89f9ce7 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2021-8cf89f9ce7

Comment 17 Fedora Update System 2021-12-03 08:34:58 UTC
FEDORA-2021-8cf89f9ce7 has been pushed to the Fedora 36 stable repository.
If problem still persists, please make note of it in this bug report.