Bug 1183242 - Random crashes in dot
Summary: Random crashes in dot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: cairo
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Benjamin Otte
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1188217 1188797
Depends On:
Blocks:
 
Reported: 2015-01-17 15:10 UTC by Mamoru TASAKA
Modified: 2015-03-01 06:51 UTC (History)
11 users (show)

Fixed In Version: cairo-1.14.0-2.fc21
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-01 06:50:04 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Backtraces from 4 different segfaulting executions of dot. (26.95 KB, text/plain)
2015-01-19 13:38 UTC, Mattias Ellert
no flags Details

Description Mamoru TASAKA 2015-01-17 15:10:57 UTC
Description of problem:
root-5.34.24-2.fc22 FTBFS on f22-ruby:
http://koji.fedoraproject.org/koji/buildinfo?buildID=604357

Build itself is successful, however root-doc (noarch) differs between
i686 and x86_64, I don't know if this is ruby22 issue or not.

Comment 1 Mattias Ellert 2015-01-18 17:35:34 UTC
(In reply to Mamoru TASAKA from comment #0)
> I don't know if this is ruby22 issue or not.

It is not related to ruby.

These problems are due to crashes in dot, a program provided by the graphviz package.

The crashes seem random; it is not the same invocation that crashes on every attempt.

Comment 2 Mamoru TASAKA 2015-01-18 22:17:17 UTC
shogun built successfully, however the -doc subpackage differs between archs (marked as arch-dependent for now). This may be related.

Comment 3 Mattias Ellert 2015-01-19 13:38:47 UTC
Created attachment 981514 [details]
Backtraces from 4 different segfaulting executions of dot.

As can be seen the segfault happens at the same location each of the 4 times.

The segfault happens inside the libcairo library. I cannot tell whether the bug is in libcairo or whether dot calls libcairo with bad input.

Comment 4 Mattias Ellert 2015-01-20 06:11:15 UTC
Further investigation shows that this is indeed a bug in cairo.

The bug is fixed upstream:

http://cgit.freedesktop.org/cairo/patch/src/cairo-image-compositor.c?id=5c82d91a5e15d29b1489dcb413b24ee7fdf59934

Reassigning to cairo.

Comment 5 Mattias Ellert 2015-01-29 06:24:41 UTC
Increasing priority because the buggy version is now proposed for stable releases (F20 updates testing and F21 updates testing).

Comment 6 Kevin Fenzi 2015-01-29 15:25:13 UTC
Can you describe better the impact here? 

Is this all invocations of dot always crashing? 

Or only with some command line arguments? or ?

Comment 7 Mamoru TASAKA 2015-01-29 15:30:11 UTC
Perhaps every package that uses doxygen to generate documentation may hit this bug. At least root and shogun did (after 9 hours of building, the shogun build failed with "noarch packages do not match between arch").

Comment 8 Mattias Ellert 2015-01-29 16:07:43 UTC
Not every invocation of dot crashes. And not every invocation with a specific input either.

The generation of the documentation during the root build calls dot many thousands of times, and the probability of a crash is large enough that I have not been able to build the package properly with this version of cairo.
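
To illustrate why thousands of dot runs can make a failed build nearly certain even if a single run rarely crashes, here is a minimal sketch. The per-invocation crash probability and the invocation counts are hypothetical (no measured rate is given in this report), and the sketch assumes that crashes in separate runs are independent.

```python
def prob_at_least_one_crash(p_crash: float, invocations: int) -> float:
    """Probability that at least one of `invocations` independent runs crashes.

    p_crash is a hypothetical per-run crash probability, not a measured value.
    """
    if not 0.0 <= p_crash <= 1.0:
        raise ValueError("p_crash must be between 0 and 1")
    if invocations < 0:
        raise ValueError("invocations must be non-negative")
    # The build is clean only if every single run survives: (1 - p)^n.
    return 1.0 - (1.0 - p_crash) ** invocations


if __name__ == "__main__":
    # Hypothetical rate of one crash per thousand runs.
    for n in (1, 100, 1000, 5000):
        print(n, round(prob_at_least_one_crash(0.001, n), 3))
```

Under these assumptions the chance of a clean build shrinks geometrically with the number of dot runs, which fits a package that calls dot thousands of times failing on almost every attempt.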

What happens in the root case is that once a crash in dot is detected no more dot runs are done and the remaining ones are skipped.

Since more than one architecture is being built, the file content of the doc package will differ between architectures if the crashing dot run is a different invocation on each architecture. If this happens, koji will fail the build because the noarch package as built on different architectures has different file lists. My last build in koji eventually "succeeded" because all architectures in the final attempt crashed on the very first dot invocation. This meant that the file lists were identical for all architectures, but completely broken.
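
The mechanism described above can be sketched as follows. This is only an illustration, not koji's actual comparison code: the function names, the graph file names and the crash positions are all hypothetical.

```python
from __future__ import annotations


def files_before_crash(graphs: list[str], first_crash: int | None) -> set[str]:
    """Files produced when dot crashes at index `first_crash` (None = no crash).

    Models the root build behaviour described in this report: the crashing
    run and every run after it produce no output.
    """
    if first_crash is None:
        return set(graphs)
    return set(graphs[:first_crash])


def diff_file_lists(lists_by_arch: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each architecture, return the files it lacks relative to the union."""
    all_files: set[str] = set().union(*lists_by_arch.values())
    missing = {arch: all_files - files for arch, files in lists_by_arch.items()}
    # Keep only architectures that are actually missing something.
    return {arch: lost for arch, lost in missing.items() if lost}


if __name__ == "__main__":
    graphs = ["a.png", "b.png", "c.png"]  # hypothetical graph outputs
    lists = {
        "i686": files_before_crash(graphs, 1),
        "x86_64": files_before_crash(graphs, 2),
    }
    print(diff_file_lists(lists))  # non-empty: the noarch file lists differ
    broken = {arch: files_before_crash(graphs, 0) for arch in ("i686", "x86_64")}
    print(diff_file_lists(broken))  # empty: lists match, but nothing was generated
```

The second case mirrors the build that "succeeded": a file-list comparison alone cannot tell identical-and-complete from identical-and-empty.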

I built a private cairo update with the upstream patch applied, and I have built the root package several times without problems on a Fedora Rawhide VM with the fixed cairo.

Comment 9 Kevin Fenzi 2015-01-29 16:42:27 UTC
ok, sounds like we should get this in then... 

I will look at doing a rawhide build and we can confirm things with it and let it run for a bit to make sure there's no fallout from the patch...

Comment 10 Kevin Fenzi 2015-01-29 20:46:04 UTC
https://koji.fedoraproject.org/koji/taskinfo?taskID=8773403

Please test your builds after it lands in the buildroot and let me know if you see any issues.

Comment 11 Mattias Ellert 2015-01-29 22:20:42 UTC
Building root with the cairo update in the Koji buildroot worked flawlessly:

https://koji.fedoraproject.org/koji/buildinfo?buildID=608017

Comment 12 Kevin Fenzi 2015-01-29 22:35:53 UTC
Great. So, how about we let it sit in rawhide until early next week, then push a revised f21/f20 update with the patch if everything looks ok?

Comment 13 Björn 'besser82' Esser 2015-01-31 08:50:54 UTC
The libyui-package failed [1] possibly for the reason described in this bug, too.  After the update issued by Kevin the build is fine [2] again.

I'm currently running an f22-scratch-build of shogun [3] on Koji.  Let's see what the outcome with "noarch"'ed documentation-packages will be…


[1]  https://koji.fedoraproject.org/koji/taskinfo?taskID=8747790
[2]  https://koji.fedoraproject.org/koji/taskinfo?taskID=8785874
[3]  https://koji.fedoraproject.org/koji/taskinfo?taskID=8785883

Comment 14 Björn 'besser82' Esser 2015-01-31 08:59:16 UTC
Last url of shogun-build was wrong.  Here [1] is the correct one.


[1]  https://koji.fedoraproject.org/koji/taskinfo?taskID=8785898

Comment 15 Björn 'besser82' Esser 2015-02-02 10:21:10 UTC
Problem seems to be still present…  Two builds [1,2] of libyui-qt have been sequentially failing on f22 armv7hl with `malloc(): smallbin double linked list corrupted`.  ;(


[1]  https://kojipkgs.fedoraproject.org//work/tasks/2242/8792242/build.log
[2]  https://kojipkgs.fedoraproject.org//work/tasks/2286/8792286/build.log

Comment 16 Kevin Fenzi 2015-02-02 14:22:14 UTC
Yeah, but I don't see it crashing in dot there or cairo. Instead it seems to be in doxygen?

Running*** Error in `/usr/bin/doxygen': malloc(): smallbin double linked list corrupted: 0x00b01770 ***
Patchi*** Error in `/usr/bin/doxygen': corrupted double-linked list: 0x01811888 ***

Can you file a new doxygen bug on that? 
Or is there something I am missing that makes you think it's related to this package/fix?

Comment 17 Mattias Ellert 2015-02-02 15:09:11 UTC
The libyui-qt issue is different. It is an "out-of-memory" problem, while the original problem reported was a "segmentation fault" problem. Also the backtraces are very different.

The problem is related to how doxygen determines how many invocations of dot to run in parallel. It checks the number of available cores but ignores the available memory, so if you have many cores and little memory you run out of memory, and this seems to happen on the ARM builders.
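
As a rough sketch of what a memory-aware limit could look like (this is not what doxygen does; the function name and the per-dot memory estimate are hypothetical), the number of parallel dot runs can be capped by both the core count and the available memory:

```python
def dot_thread_cap(cores: int, available_mib: int, mib_per_dot: int) -> int:
    """Number of parallel dot runs allowed by both core count and memory.

    mib_per_dot is a hypothetical estimate of the peak memory of one dot run.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if available_mib < 0 or mib_per_dot < 1:
        raise ValueError("memory figures must be positive")
    by_memory = available_mib // mib_per_dot
    # Always allow one run, otherwise no graphs would be generated at all.
    return max(1, min(cores, by_memory))


if __name__ == "__main__":
    # Hypothetical builder: many cores, little memory.
    print(dot_thread_cap(cores=8, available_mib=1024, mib_per_dot=512))
```

With many cores and little memory the memory term decides the limit, which is the situation described for the ARM builders.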

I added a hack to the specfile to limit the number of dot invocations that are run in parallel and the package built twice without problem also on ARM:

Hack:

$ git diff
diff --git a/libyui-qt.spec b/libyui-qt.spec
index 0d5a4f4..3a88829 100644
--- a/libyui-qt.spec
+++ b/libyui-qt.spec
@@ -89,6 +89,8 @@ pushd %{_cmake_build_subdir}
        -DRESPECT_FLAGS=ON                              \
        ..
 
+sed 's/\(DOT_NUM_THREADS\s*=\s*\).*/\11/' -i Doxyfile
+
 %{__make} %{?_smp_mflags}
 %{__make} %{?_smp_mflags} docs
 popd

The Doxyfile is generated from a Doxyfile.in file installed by libyui-devel, so if you want to apply this change at source level you would need to fix it in a different package.

$ repoquery -ql libyui-devel | grep Doxyfile
/usr/share/libyui/buildtools/Doxyfile.in
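
For anyone who prefers to make the same change outside the specfile, the sed substitution from the hack above can be sketched in Python. The function name is hypothetical. Unlike the sed expression, this version is anchored to the start of the line, so a comment that merely mentions DOT_NUM_THREADS is left untouched.

```python
import re

# Matches "DOT_NUM_THREADS = <anything>" at the start of a line and captures
# the key together with its original spacing around "=".
_DOT_THREADS = re.compile(r"^(DOT_NUM_THREADS[ \t]*=[ \t]*).*$", re.MULTILINE)


def set_dot_num_threads(doxyfile_text: str, threads: int = 1) -> str:
    """Return the Doxyfile text with DOT_NUM_THREADS set to `threads`."""
    if threads < 0:
        raise ValueError("threads must be non-negative")
    return _DOT_THREADS.sub(lambda m: f"{m.group(1)}{threads}", doxyfile_text)


if __name__ == "__main__":
    sample = "HAVE_DOT = YES\nDOT_NUM_THREADS        = 0\n"
    print(set_dot_num_threads(sample), end="")
```

The function works on the file contents as a string; reading and writing the generated Doxyfile is left to the caller.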

Successful builds with the change applied:

https://koji.fedoraproject.org/koji/taskinfo?taskID=8794640
https://koji.fedoraproject.org/koji/taskinfo?taskID=8795147

Comment 18 Milan Bouchet-Valat 2015-02-03 20:19:09 UTC
*** Bug 1188217 has been marked as a duplicate of this bug. ***

Comment 19 Milan Bouchet-Valat 2015-02-03 20:22:07 UTC
I've observed the same crash with a short piece of R code on F21. It crashed 100% of the time until I installed 1.14.0-2.fc22, which is indeed a good fix/workaround. +1 for moving it to f21-updates-testing when possible.

Comment 20 Fedora Update System 2015-02-08 16:54:13 UTC
cairo-1.14.0-2.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/FEDORA-2015-1407/cairo-1.14.0-2.fc21

Comment 21 Fedora Update System 2015-02-08 16:55:59 UTC
cairo-1.14.0-2.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/FEDORA-2015-1384/cairo-1.14.0-2.fc20

Comment 22 Milan Bouchet-Valat 2015-02-08 18:32:49 UTC
Thanks!

Any idea what's going on upstream? Do you think they need a reproducer?

Comment 23 Kevin Fenzi 2015-02-08 19:15:15 UTC
(In reply to Milan Bouchet-Valat from comment #22)
> Thanks!
> 
> Any idea what's going on upstream? Do you think they need a reproducer?

The patch in the Fedora cairo-1.14.0-2 (to fix this bug) is already committed upstream. It was just not in the 1.14.0 release. So, not sure upstream needs anything here. ;)

Comment 24 Milan Bouchet-Valat 2015-02-08 20:01:28 UTC
OK, great -- I was under the impression that they considered the patch as a mere workaround.

Comment 25 Fedora Update System 2015-02-09 05:27:28 UTC
Package cairo-1.14.0-2.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing cairo-1.14.0-2.fc21'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-1407/cairo-1.14.0-2.fc21
then log in and leave karma (feedback).

Comment 26 Marek Kašík 2015-02-09 09:38:15 UTC
*** Bug 1188797 has been marked as a duplicate of this bug. ***

Comment 27 Fedora Update System 2015-03-01 06:50:04 UTC
cairo-1.14.0-2.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 28 Fedora Update System 2015-03-01 06:51:53 UTC
cairo-1.14.0-2.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

