Bug 1914777 - bout++ fails to build with Python 3.10: test-multigrid_laplace - timeout
Summary: bout++ fails to build with Python 3.10: test-multigrid_laplace - timeout
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: bout++
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: david08741
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PYTHON3.10
Reported: 2021-01-11 08:38 UTC by Tomáš Hrnčiar
Modified: 2021-01-17 14:29 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-17 14:29:05 UTC
Type: Bug
Embargoed:


Description Tomáš Hrnčiar 2021-01-11 08:38:25 UTC
bout++ fails to build with Python 3.10.0a4.

======= FAILURES ========

----- test-multigrid_laplace -----
rm: cannot remove 'data/BOUT.dmp.*.nc': No such file or directory

(It is likely that a timeout occured)
======= 1 failed in 929.92 seconds ========
make: *** [makefile:49: check-integrated-tests] Error 1

For the build logs, see:
https://copr-be.cloud.fedoraproject.org/results/@python/python3.10/fedora-rawhide-x86_64/01868967-bout++/

For all our attempts to build bout++ with Python 3.10, see:
https://copr.fedorainfracloud.org/coprs/g/python/python3.10/package/bout++/

Testing and mass rebuild of packages is happening in copr. You can follow these instructions to test locally in mock whether your package builds with Python 3.10:
https://copr.fedorainfracloud.org/coprs/g/python/python3.10/

Let us know here if you have any questions.

Python 3.10 will be included in Fedora 35. To make that update smoother, we're building Fedora packages with early pre-releases of Python 3.10.
A build failure prevents us from testing all dependent packages (transitive [Build]Requires), so if this package is required a lot, it's important for us to get it fixed soon.
We'd appreciate help from the people who know this package best, but if you don't want to work on this now, let us know so we can try to work around it on our side.

Comment 1 Miro Hrončok 2021-01-11 10:29:56 UTC
IIRC this should only happen in Copr and not Koji. A workaround is to enable network access.

See https://bugzilla.redhat.com/show_bug.cgi?id=1793612#c1 for details.

Comment 2 david08741 2021-01-11 11:06:53 UTC
I don't think it is that simple; the MPI issue is, I think, fixed, at least on rawhide.

The test should not be particularly slow, either: normally 20 to 30 seconds, so well below the 600-second timeout.

I will try to investigate this, and thus keep the bug open.

Comment 3 david08741 2021-01-12 15:52:50 UTC
I am tempted to say the issue is that copr does not have enough cores. Even though the test only uses 3 threads, that might be sufficient to trigger the timeout.
On an old 2-core system the test finishes in about 4 seconds if it is using 1 thread, but with 3 threads it takes over 4 minutes.
I am not sure what copr is using, but I think it is also using old CPUs and very few cores (1?), in which case it might take well more than 10 minutes.
On a decent 64-core system the single-thread version takes 1.3 seconds and the 3-thread version 1.0 seconds.

If this keeps being an issue, I can disable the test on copr, or whenever only one core is available.

The underlying issue is that MPI is optimized to be fast on non-oversubscribed systems. While in the real world MPI should never be used oversubscribed, this is common for testing, in which case the "idle" threads are busy waiting on the other threads ...
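
A minimal sketch of the "disable the test if too few cores are available" idea from this comment. This is not taken from the bout++ test suite: the function name `should_skip_oversubscribed` and its parameters are hypothetical, and the rule "skip when cores < workers" is an assumption about what a reasonable check could look like.

```python
import os


def should_skip_oversubscribed(workers, available_cores=None):
    """Return True if running `workers` parallel workers would oversubscribe the machine.

    `available_cores` can be passed explicitly (useful for testing);
    otherwise it is detected from the current process.
    """
    if available_cores is None:
        try:
            # On Linux this respects the CPU affinity mask of the process.
            available_cores = len(os.sched_getaffinity(0))
        except AttributeError:
            # Not available on every platform; fall back to the total core count.
            available_cores = os.cpu_count() or 1
    return available_cores < workers


if __name__ == "__main__":
    # The test in this bug uses 3 workers.
    if should_skip_oversubscribed(3):
        print("skipping: fewer than 3 cores available")
    else:
        print("running test")
```

With a check like this, a 1-core or 2-core builder would skip the 3-worker test instead of running it oversubscribed.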

Comment 4 Miro Hrončok 2021-01-12 16:06:38 UTC
Any explanation why it works with network enabled?

Comment 5 david08741 2021-01-12 16:12:46 UTC
Pure luck - I guess ...
Timeout is 600 seconds, in the case with network enabled it took:
test-multigrid_laplace           ✓ 588.655 s

In that case increasing the timeout might be the easiest solution ...

Comment 6 david08741 2021-01-17 14:29:05 UTC
I have increased the timeout from 10 to 15 minutes; I think that should fix the issue.
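
For reference, the margin implied by the numbers in this thread can be worked out as below. This is only an arithmetic sketch: the helper name `timeout_headroom` is hypothetical, and it assumes the 588.655 s run from comment 5 is representative of the copr builders.

```python
def timeout_headroom(runtime_s, timeout_s):
    """Return the fraction of the timeout that a run leaves unused."""
    return 1.0 - runtime_s / timeout_s


runtime = 588.655  # seconds, the passing run with network enabled (comment 5)

# Old 10-minute timeout: the run passed with almost no margin.
print(f"{timeout_headroom(runtime, 600):.1%}")  # 1.9%

# New 15-minute timeout: roughly a third of the budget is left.
print(f"{timeout_headroom(runtime, 900):.1%}")  # 34.6%
```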

