Bug 173719 - Review Request: openmpi - a new MPI implementation
Summary: Review Request: openmpi - a new MPI implementation
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jason Vas Dias
QA Contact: David Lawrence
URL: http://mitgcm.org/eh3/fedora_misc/ope...
Whiteboard:
Depends On:
Blocks: FE-ACCEPT
 
Reported: 2005-11-19 14:51 UTC by Ed Hill
Modified: 2007-11-30 22:11 UTC
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-02-23 18:38:31 UTC
Type: ---
Embargoed:



Description Ed Hill 2005-11-19 14:51:11 UTC
Spec Url:  http://mitgcm.org/eh3/fedora_misc/openmpi.spec
SRPM Url:  http://mitgcm.org/eh3/fedora_misc/openmpi-1.0-1.fc4.src.rpm

Description:
Open MPI is a project combining technologies and resources from several 
other projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI) in order to build 
the best MPI library available.  A completely new MPI-2 compliant 
implementation, Open MPI offers advantages for system and software 
vendors, application developers and computer science researchers.

Notes: 
 - package builds in mock on FC-4
 - can install side-by-side with the Core-provided lam MPI package
 - takes a little time to compile [but nowhere near as bad as
     Xorg or the kernel ;-)]

Comment 1 Ed Hill 2005-11-19 15:14:17 UTC
Oh, fudge!  The package needs just a little more work because three files 
conflict with lam in Core.  I'll post an updated version as soon as I can 
fix it.

Comment 2 Ed Hill 2005-11-19 19:48:14 UTC
Updated version peacefully co-exists with lam:
  http://mitgcm.org/eh3/fedora_misc/openmpi-1.0-2.src.rpm
  http://mitgcm.org/eh3/fedora_misc/openmpi.spec

Comment 3 Deji Akingunola 2005-11-20 05:01:20 UTC
Hi Ed,

(In reply to comment #2)
> Updated version peacefully co-exists with lam:

Well, you're still left with mpicc, mpic++, and mpif77 in %{_bindir}, though.
But how about following through with the idea of using alternatives? I've been
testing openmpi, and might actually use it instead of mpich; however,
models/programs that have been built and configured to use mpirun/mpiexec will
not be able to use openmpi as it's currently packaged.
Also, gfortran is used to build the Fortran 90 module when found, so mpif90
needs to be packaged too.



Comment 4 Ed Hill 2005-11-20 05:30:48 UTC
Hi Deji,

For openmpi, the names you list above are just soft-links, so I removed
them.  In addition, the above SRPM uses --program-suffix=".openmpi", so
it's a definite step towards what's needed for alternatives to work.  The
other half is to get lam converted as Tom indicated at:

https://www.redhat.com/archives/fedora-extras-list/2005-November/msg00406.html

and I'm hoping Tom and/or a lam maintainer will be able to change the
in-Core parts.

I'm only just learning how alternatives works, so things will need to
get added/changed in the openmpi package.  Hopefully it won't be too
hard to figure out.  ;-)

Comment 5 Orion Poplawski 2006-01-23 23:29:21 UTC
How are we doing with this?  I see there is a 1.0.1 and even a 1.0.2a as well...

Comment 6 Ed Hill 2006-01-24 00:26:37 UTC
Hi Orion (and everyone else interested in MPI), here's what I think ought
to happen, in approximate order:

 1) upgrade this package to the latest upstream (easy!)
 2) setup package to work with /usr/sbin/alternatives
    a) make this package work with it
    b) make the in-Core LAM work with it
    c) make the in-submission MPICH2 work with it (bug #171993)

and, also, I'd like to request that

 - LAM be removed from Core, and/or
 - OpenMPI replace LAM within Core

since the LAM developers have very publicly announced that LAM is in 
maintenance-only mode and have thrown their full weight behind OpenMPI:

  http://www.lam-mpi.org/

Tom ("Spot") has informally volunteered to help with the in-Core bits but 
I think we need to make some progress here first before pressuring (or is 
it begging?) the Core folks to help improve the MPI situation.  :-)

So, I'll try to get 1 and 2a done in a week or so...

Comment 7 Roland Dreier 2006-01-24 03:49:42 UTC
Is it possible to build InfiniBand support for Open MPI, possibly in a separate
package?  The only thing Open MPI needs to build against for this is
libibverbs-devel, which is packaged (by me) for Fedora Extras.  This would be
really useful, because it would mean that users could run MPI apps on IB
entirely with standard binaries, without having to rebuild anything on Fedora.

Building against the libibverbs I just built for Fedora will require the latest
Open MPI tree -- it requires the change described in the 1.0.2 changelog as:

- Update to match newest OpenIB user-level library API.  Thanks to
  Roland Dreier for submitting this patch.

I can provide more info if required.

Comment 8 Ed Hill 2006-01-24 04:17:32 UTC
Hi Roland, thanks for pointing that out.  Since libibverbs is in Extras, 
it should certainly be used!  So, let's add it to the TO-DO list for this
package.

Comment 9 Orion Poplawski 2006-01-24 22:45:36 UTC
I've put a 1.0.1 version with a start on using alternatives here:

http://www.cora.nwra.com/~orion/fedora/openmpi-1.0.1-1.src.rpm

Some notes and questions that have arisen:

- Used %{version} in Source0 line
- Removed Requires: gcc-gfortran.  I'm sure many folks will use this without Fortran.
- Add Requires(post): /usr/sbin/alternatives.
- Add post/preun scripts to install alternatives.  Note that the alternatives
for mpirun and mpiexec can't be used until lam is converted to use alternatives.
- Removed %exclude %{_libdir}/debug, unnecessary
- Made %sysconfdir stuff %config

- Do we really need the .openmpi suffix on the following:

/usr/bin/ompi_info.openmpi
/usr/bin/orted.openmpi
/usr/bin/orteprobe.openmpi
/usr/bin/orterun.openmpi

Is anyone other than openmpi going to provide these?

I dropped the .openmpi suffix option and instead simply renamed the compiler
wrappers and handled the mpirun/mpiexec links via alternatives.

- Do we want to switch all of the mpi* commands as a unit, or individually. I've
done the mpirun/mpiexec as a unit named "mpi" and the compiler wrappers
individually here because I think I can imagine wanting different fortran
compilers on different machines and I'm not sure we can split an alternatives
unit between two sub-packages.


Comment 10 Deji Akingunola 2006-01-24 23:28:34 UTC
(In reply to comment #9)

> - Removed Requires: gcc-gfortran.  I'm sure many folks will use this without
> Fortran.

Most MPI implementations configure and build the Fortran module by default, so
I guess rpm will pull in the dependency anyway.
 
> - Add Requires(post): /usr/sbin/alternatives.
> - Add post/preun scripts to install alternatives.  Note that the alternatives
> for mpirun and mpiexec can't be used until lam is converted to use alternatives.

I've been meaning to submit a patch for lam (in Core) to do this, but have been
very busy lately; maybe the Redhat maintainer will be willing to get it going
with a patch.

> 
> /usr/bin/ompi_info.openmpi
> /usr/bin/orted.openmpi
> /usr/bin/orteprobe.openmpi
> /usr/bin/orterun.openmpi
> 
> Is anyone other than openmpi going to provide these?

I don't think so.

> 
> I dropped the .openmpi suffix option and instead simply renamed the compiler
> wrappers and handled the mpirun/mpiexec links via alternatives.
> 
> - Do we want to switch all of the mpi* commands as a unit, or individually. 

What do you mean by a unit here?  I believe what you have in the src.rpm above
is okay; many applications built to use MPI specifically look for these
individual executables.

Nice work.

Deji

PS:  Why not just pass --includedir=%{_includedir}/%{name} and
--libdir=%{_libdir}/%{name} to configure instead of moving the files around?


Comment 11 Ed Hill 2006-01-25 04:05:29 UTC
Hi Orion, I just downloaded, built, and installed your 1.0.1-1 SRPM.  The
alternatives bits look OK (although I've not looked at them thoroughly) but
the /usr/bin/mpi*.openmpi programs all fail to start with similar errors:

  mpic++.openmpi: error while loading shared libraries: libopal.so.0: 
    cannot open shared object file: No such file or directory

so it looks like the configure arguments need work -- I think we need to 
specify --libdir=/usr/lib/openmpi

In any case, thank you for finding some time to look at this package!

Comment 12 Orion Poplawski 2006-01-25 23:01:25 UTC
New version: http://www.cora.nwra.com/~orion/fedora/openmpi-1.0.1-2.src.rpm

This uses --includedir and --libdir to specify install locations and creates
/etc/ld.so.conf.d/openmpi.conf to point to the library directory.  This gives us
some funky directory structures, but oh well:

/usr/include/openmpi
/usr/include/openmpi/mpi.h
/usr/include/openmpi/mpif.h
/usr/include/openmpi/openmpi
/usr/include/openmpi/openmpi/ompi
/usr/include/openmpi/openmpi/ompi/mpi
/usr/include/openmpi/openmpi/ompi/mpi/cxx
/usr/include/openmpi/openmpi/ompi/mpi/cxx/comm.h

/usr/lib64/openmpi/openmpi/mca_allocator_basic.so
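A minimal sketch of creating that /etc/ld.so.conf.d/openmpi.conf entry during %install, written as plain shell so it can be tried outside rpmbuild. BUILDROOT stands in for $RPM_BUILD_ROOT and LIBDIR for the --libdir value (here the lib64 path from the listing above); both names are hypothetical, and /sbin/ldconfig would still need to run in %post.

```shell
# Hypothetical stand-ins for $RPM_BUILD_ROOT and the configure --libdir value.
BUILDROOT=$(mktemp -d)
LIBDIR=/usr/lib64/openmpi

# Write the ld.so.conf.d fragment so the dynamic linker can find libopal,
# libmpi, etc. in the private library directory.
install_ldconf() {
    mkdir -p "$BUILDROOT/etc/ld.so.conf.d"
    printf '%s\n' "$LIBDIR" > "$BUILDROOT/etc/ld.so.conf.d/openmpi.conf"
}

install_ldconf
cat "$BUILDROOT/etc/ld.so.conf.d/openmpi.conf"
```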

I've also submitted a patch for lam, bug #178967

I'm currently testing with the patched lam.  The more eyes on this the better
though.

Comment 13 Jason Vas Dias 2006-02-16 00:35:30 UTC
I'm currently trying to get openmpi-1.0.1 into FC-5 (or at least FC Rawhide
post FC-5), based on Orion's latest openmpi-1.0.1-3.src.rpm, and with a modified
lam that installs its conflicting files via /usr/sbin/alternatives.
Depending on the decision on which FC distro openmpi goes into, we still may need
openmpi Extras releases for the other FC distros.



Comment 14 Ed Hill 2006-02-16 01:45:33 UTC
Hi Jason, is there any chance that you (and the other Red Hat packagers)
would consider using environment-modules for side-by-side installs of
different MPI implementations?  Compared to the alternatives system,
environment-modules is far superior.  Alternatives does not (and probably
never will) gracefully handle situations such as multiple implementations
of the same API.  For instance, what do you do with the LAM, MPICH, etc.
man pages?  If you use environment-modules, it's very easy to install as
many different MPI implementations (and their corresponding man pages,
etc.) as you want and then define multiple MPI "modules" (that is, scripts
within the environment-modules system) that each set the correct MANPATH and any
other values.  And I mention man pages only as an example -- the idea readily
extends to binaries, libs, headers, what-have-you...

Further, the use of environment-modules is a __de facto__ standard.  The
vast majority of (super-)computing centers, clusters, etc. where MPI is
chiefly used also use environment-modules.  It's popular at these locations
exactly because it works so smoothly.  It's a simple, easily understood,
extensible, and elegant approach.  And it provides end users with maximal
flexibility and choice.

To the best of my knowledge, not a *single* system on this planet uses
alternatives for MPI.  It's borderline moronic.  Why would you folks
want to ignore the hard-won wisdom of an entire field and go with some
clunky, inferior alternatives-based approach when you can have something
so much better?

PS: Orion has very helpfully packaged environment-modules for Extras so
it's *very* easy for you folks to adopt...  [Hint-hint...!]

PPS: I'll be glad to discuss this further with anyone who has questions.
Feel free to contact me by email, etc.

Comment 15 Orion Poplawski 2006-02-16 05:21:20 UTC
I have to second Ed's comments.  I tried to shoehorn lam, mpich, and openmpi
into an alternatives-style system and it just doesn't fit.  It might be
acceptable to use it for binaries and have, for example:

/usr/bin/mpif77.{name}
/usr/bin/mpirun.{name}
...

with alternatives for those, but not for anything else.  Everything else should
be in:

/usr/include/{name}
/usr/lib/{name}
/usr/share/{name}/man
...

with environment-modules files to select.

There are still unresolved issues though:

- Do we still want a default implementation accessible with no configuration by
the user?  If so, how to update /etc/ld.so.conf.d/ to point to the various
libraries, or do we instead load a particular module (how?).

- Can we guarantee that different mpi programs get the proper set of libraries
when started remotely?

- Others?

Ed - do most places get around the library issues by linking statically?



Comment 16 Ed Hill 2006-02-16 05:53:31 UTC
Hi Orion, most environment-module scripts that I've seen use syntax such as:

  prepend-path PATH $SOMEPACKAGE_HOME/bin
  prepend-path MANPATH $SOMEPACKAGE_HOME/man
  prepend-path LD_LIBRARY_PATH $SOMEPACKAGE_HOME/lib
  ...etc...

which takes care of the binaries, libs, headers, etc.
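Roughly what prepend-path does when such a module is loaded can be approximated in shell as below. This is only an illustration: environment-modules itself is Tcl and also handles unloading, which this ignores. The helper name prepend_path and the OPENMPI_HOME value (the /usr/share/openmpi tree discussed in this thread) are hypothetical.

```shell
# prepend_path VAR DIR: put DIR at the front of the colon-separated list in
# VAR, dropping any earlier copy of DIR so repeated loads don't grow the list.
prepend_path() {
    local var="$1" dir="$2" cur new
    eval "cur=\${$var:-}"
    # Remove existing occurrences of DIR (exact, whole-entry matches only).
    new=$(printf '%s' "$cur" | tr ':' '\n' | grep -vxF -e "$dir" | paste -sd: -)
    if [ -n "$new" ]; then
        eval "$var=\"\$dir:\$new\""
    else
        eval "$var=\"\$dir\""
    fi
    export "$var"
}

# Hypothetical module contents for an Open MPI tree under /usr/share/openmpi.
OPENMPI_HOME=/usr/share/openmpi
prepend_path MANPATH "$OPENMPI_HOME/man"
prepend_path LD_LIBRARY_PATH "$OPENMPI_HOME/lib"
```

Removing duplicates is a design choice of this sketch so that "loading" twice is harmless; it is not a claim about how the modules package behaves.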

And, if the Core packagers choose to avoid environment-modules and select
one particular MPI implementation as the "standard" for Core, that's still
perfectly OK.  The "standard" or "preferred" MPI implementation can be
installed exactly as LAM is currently installed and then environment-modules
can be used in conjunction with any other MPI implementations
(say, multiple ones within Fedora Extras) per the above.  I know this works
because many folks do this on their clusters and/or networks of workstations.
For instance, we have the Core-supplied LAM installed and we have $N$ other
MPI implementations installed and they all work.

And users are free to *dynamically* select (whenever they want) which MPI
bits they'd like to use for a particular task with either environment-modules
(which is, ultimately, just a convenience) or by manually selecting
the desired paths for builds, execution, etc.

It's that easy!  And it doesn't require any nasty static linking or other
ugly hacks.  It's very clean.

Comment 17 Jason Vas Dias 2006-02-16 16:23:33 UTC
I'll investigate using environment-modules / finding the best way to let 
OpenMPI and LAM co-exist. I don't think it is an option to completely 
replace LAM with OpenMPI, as software currently linked with LAM will break.


Comment 18 Orion Poplawski 2006-02-16 17:13:39 UTC
I think we can use modules with something like:

alias mpirun mpirun.{name}
alias mpicc  mpicc.{name}
prepend-path MANPATH $SOMEPACKAGE_HOME/man
prepend-path LD_LIBRARY_PATH $SOMEPACKAGE_HOME/lib
...etc...

to coexist with alternatives/FHS compliance for binaries.


Comment 19 Ed Hill 2006-02-16 17:29:19 UTC
Hi Jason and Orion, Thank you for the responses -- I'm thrilled that other
folks are interested in MPI on Fedora!

When installing multiple MPI implementations, it's best if you avoid putting
headers and libs into, for instance, /usr/include and /usr/lib.  If you
instead put them into, for example, /usr/include/openmpi or /usr/lib/lam,
then compilers will have to use some sort of environment variables or other
build-time information to find them.  And that's desirable when installing
multiple implementations because it means that, for instance, a particular
software build won't accidentally include a lam-provided /usr/include/mpif.h
when what you really want is the version at /usr/include/mpich-2/mpif.h.

Basically, if we do our best to avoid "polluting" the standard locations
then users are much less likely to have problems with side-by-side installs.

And yes, I realize that these problems can be fixed by improving the build
systems for all the software out there that uses MPI.  Good build systems
can make it easy to ensure that you get the headers, libs, etc. that you 
want.  But file layout can also make it easier when dealing with the vast
number of thrown-together-with-duct-tape build systems that seem to be 
the norm in the real world.  :-)


Comment 20 Jonathan Underwood 2006-02-16 18:18:28 UTC
Yes - as an MPI user, I would also add to the plea not to use alternatives for
solving this issue. I think you'd be creating a maintenance nightmare across
the packages for each MPI implementation. Plus, alternatives isn't something
that users should be fiddling with, I'd have thought. Environment modules seems
like a much better solution.

Comment 21 Jason Vas Dias 2006-02-16 19:02:58 UTC
Perhaps the simplest solution is to ship openmpi in a way that does
not clash at all with lam, and that allows users to select use of either 
alternatives(8), or environment-modules, or home-grown solutions, but that
does not depend on the new environment-modules package or use alternatives(8)
in the .spec file. 

For example:
the 6 mpi* clashing binaries could be installed as:
  /usr/bin/om-$bin
(e.g. /usr/bin/om-mpicc).

The clashing libraries could be installed in /usr/lib/openmpi. These
are:
   libmpi*.so  (clashes with the lam RPM)
   libopal.so  (clashes with the opal RPM - the "Open Phone Abstraction Library"!)
Probably it is simpler just to put all openmpi libs in /usr/lib/openmpi.

and includes in /usr/include/openmpi.

Then links could be created in:
  /usr/share/openmpi/{bin,lib,include}
so that e.g.
  /usr/share/openmpi/bin/mpicc -> /usr/bin/om-mpicc
  /usr/share/openmpi/lib/libmpi.so -> /usr/lib/openmpi/libmpi.so
etc.
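The bin/ links above could be generated at install time with a loop along these lines. This is a sketch, not the actual spec: BUILDROOT (standing in for $RPM_BUILD_ROOT) and make_compat_links are hypothetical names. The six wrapper names are the clashing binaries listed for lam/openmpi in this thread, and the link targets are absolute, as in the example.

```shell
# Hypothetical stand-in for $RPM_BUILD_ROOT.
BUILDROOT=$(mktemp -d)
SHARE=/usr/share/openmpi

# Create /usr/share/openmpi/bin/<name> -> /usr/bin/om-<name> for each
# wrapper that would otherwise clash with lam.
make_compat_links() {
    mkdir -p "$BUILDROOT$SHARE/bin"
    for prog in mpirun mpiexec mpicc mpic++ mpiCC mpif77; do
        # The targets dangle inside the buildroot; they resolve once installed.
        ln -sf "/usr/bin/om-$prog" "$BUILDROOT$SHARE/bin/$prog"
    done
}

make_compat_links
ls -l "$BUILDROOT$SHARE/bin"
```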

There are as yet NO man-pages shipped in the openmpi distribution (another
reason why openmpi is still of "experimental" status IMHO).

Then existing lam users and the existing lam package would be totally 
unaffected.

The openmpi package should also provide a pkg-config openmpi.pc file to
easily provide the linker library and compiler include options to builds
that use openmpi.
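An openmpi.pc along these lines would yield flags like those that `pkg-config --libs --cflags openmpi` prints in comment #24. The field values are inferred from that output and the Version from the 1.0.1 package; writing the file to a temporary PKGDIR is only for illustration (the package would presumably install it under %{_libdir}/pkgconfig).

```shell
PKGDIR=$(mktemp -d)   # hypothetical location, for illustration only

# Quoted 'EOF' keeps ${prefix}, ${libdir}, ... literal for pkg-config to expand.
cat > "$PKGDIR/openmpi.pc" <<'EOF'
prefix=/usr
libdir=${prefix}/lib/openmpi
includedir=${prefix}/include/openmpi

Name: openmpi
Description: Open MPI implementation of the Message Passing Interface
Version: 1.0.1
Libs: -L${libdir} -lmpi
Cflags: -I${includedir}
EOF
```

With pkg-config installed, `PKG_CONFIG_PATH="$PKGDIR" pkg-config --cflags --libs openmpi` can be used to check the result.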

Users could decide to use alternatives ( ie. move the lam clashing executables
to /usr/bin/lam-*, and make /usr/bin/mpicc an alternative between 
/usr/bin/om-mpicc and /usr/bin/lam-mpicc ), or could use environment-modules
(set PATH, LD_LIBRARY_PATH to select between /usr/{lib,bin,include} and 
/usr/share/openmpi/{bin,lib,include} with modules). The environment-modules
scripts could actually be shipped in /usr/share/openmpi/environment-modules
and used at users' discretion, and we could also ship a script to setup 
alternatives(8). But the existing lam package could stay the same, and 
no new packages would be required by the new openmpi package.

I think the above is what I'll be aiming for with the Core release, unless 
there are any objections. 

Comment 22 Ed Hill 2006-02-16 19:35:15 UTC
Hi Jason, I think most of comment #21 is fine but I have one objection:

*** PLEASE *** move the lam-supplied files [I'll refrain from using the
word "crap" here :-)] currently in /usr/include/* to /usr/include/lam/*
and from /usr/lib/* to /usr/lib/lam/*.  It's something we really ought to
do.  In fact, if lam were submitted to Fedora Extras in its current shape
it would not pass review for exactly these reasons.

And yes, the above lam changes would require some *small* changes to the 
packages that depend upon it.  Not a big deal.  Whatever small pain it 
causes now, the benefits for end users and Extras packagers dwarf it.

[And whoever decided it was a good idea to put files named "freq.h", 
 "boot.h", "net.h", and "args.h" in /usr/include should be banished to 
 rotating backup tapes in a cold machine room for the next month or two.]


Comment 23 Orion Poplawski 2006-02-17 17:00:22 UTC
I'll second Ed's comments.

As for man pages, while it's not an issue for openmpi/lam at the moment, it will
be, and it is for lam/mpich.  So, how do you handle the conflicting man pages? 
Can we install in /usr/share/<name>/man?  Is that acceptable?  I'm facing the
same issue with ncarg conflicting with some generic man3 names from allegro.

Comment 24 Jason Vas Dias 2006-02-17 23:32:42 UTC
OK, I've now imported openmpi-1.0.1-1 into rawhide (it remains to be seen
whether it will make FC-5). The SRPM and i386 RPMs are at:
   http://people.redhat.com/~jvdias/openmpi/

Please test this out and let me know of any issues ASAP - thanks.

I've also built lam-7.1.1-10.FC5, with the suggested changes - it should be in
tomorrow's rawhide - the srpm is also at the people page above.

LAM headers now all live under /usr/include/lam , libraries under /usr/lib/lam,
and man-pages in /usr/share/lam/man .

There are directory structures that could be used for environment-modules in 
/usr/share/{lam,openmpi}/{bin,lib,include,man}, and an attempt at module files
in /usr/share/{lam,openmpi}/*.module.

The binaries which clash between lam and openmpi are shipped named with a
prefix that differentiates them:
  /usr/bin/{om-,lam-}{mpirun,mpiexec,mpicc,mpic++,mpiCC,mpif77}

By default, the lam package creates links to its /usr/bin/lam-* files, without
the 'lam-' prefix - it will still be the default MPI implementation.

There are now pkg-config files for lam and openmpi, and openmpi now contains
a /usr/sbin/mpi_alternatives script that can be used to create alternatives
for lam / openmpi:

$ pkg-config --libs --cflags lam
-I/usr/include/lam  -L/usr/lib/lam -lmpi
$ pkg-config --libs --cflags openmpi
-I/usr/include/openmpi  -L/usr/lib/openmpi -lmpi
$ mpi_alternatives
Usage: mpi_alternatives < install | display | remove | set>
    Sets up alternatives for MPI (Message Passing Interface) between LAM and
OpenMPI implementations.
$ mpi_alternatives install
$ mpi_alternatives display
mpi - status is manual.
 link currently points to /usr/bin/lam-mpirun
/usr/bin/om-mpirun - priority 50
 slave mpiCC: /usr/bin/om-mpic++
 slave mpic++: /usr/bin/om-mpic++
 slave mpicc: /usr/bin/om-mpicc
 slave mpiexec: /usr/bin/om-mpiexec
 slave mpif77: /usr/bin/om-mpif77
/usr/bin/lam-mpirun - priority 50
 slave mpiCC: /usr/bin/lam-mpic++
 slave mpic++: /usr/bin/lam-mpic++
 slave mpicc: /usr/bin/lam-mpicc
 slave mpiexec: /usr/bin/lam-mpiexec
 slave mpif77: /usr/bin/lam-mpif77
Current `best' version is /usr/bin/om-mpirun.
$ mpi_alternatives set openmpi
$ mpi_alternatives display
mpi - status is manual.
 link currently points to /usr/bin/om-mpirun
/usr/bin/om-mpirun - priority 50
 slave mpiCC: /usr/bin/om-mpic++
 slave mpic++: /usr/bin/om-mpic++
 slave mpicc: /usr/bin/om-mpicc
 slave mpiexec: /usr/bin/om-mpiexec
 slave mpif77: /usr/bin/om-mpif77
/usr/bin/lam-mpirun - priority 50
 slave mpiCC: /usr/bin/lam-mpic++
 slave mpic++: /usr/bin/lam-mpic++
 slave mpicc: /usr/bin/lam-mpicc
 slave mpiexec: /usr/bin/lam-mpiexec
 slave mpif77: /usr/bin/lam-mpif77
Current `best' version is /usr/bin/om-mpirun.
$ mpi_alternatives set lam
$ mpi_alternatives display
mpi - status is manual.
 link currently points to /usr/bin/lam-mpirun
/usr/bin/om-mpirun - priority 50
 slave mpiCC: /usr/bin/om-mpic++
 slave mpic++: /usr/bin/om-mpic++
 slave mpicc: /usr/bin/om-mpicc
 slave mpiexec: /usr/bin/om-mpiexec
 slave mpif77: /usr/bin/om-mpif77
/usr/bin/lam-mpirun - priority 50
 slave mpiCC: /usr/bin/lam-mpic++
 slave mpic++: /usr/bin/lam-mpic++
 slave mpicc: /usr/bin/lam-mpicc
 slave mpiexec: /usr/bin/lam-mpiexec
 slave mpif77: /usr/bin/lam-mpif77
Current `best' version is /usr/bin/om-mpirun.
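The mpi_alternatives script itself isn't reproduced here. As a hedged sketch of the alternatives(8) registration behind the transcript above: the group name "mpi", the master link /usr/bin/mpirun, priority 50, and the slave links are read off the `display` output, while the helper name mpi_alt_cmd is hypothetical. It only prints the command instead of running it.

```shell
# mpi_alt_cmd PREFIX: print the alternatives(8) command that would register
# one MPI implementation (PREFIX is "om" or "lam") in the "mpi" group,
# using the link layout shown above. Dry run only.
mpi_alt_cmd() {
    local p="/usr/bin/$1-"
    printf '%s' "alternatives --install /usr/bin/mpirun mpi ${p}mpirun 50"
    for s in mpicc mpic++ mpiexec mpif77; do
        printf ' --slave /usr/bin/%s %s %s%s' "$s" "$s" "$p" "$s"
    done
    # As in the display output, mpiCC points at the C++ wrapper.
    printf ' --slave /usr/bin/mpiCC mpiCC %smpic++\n' "$p"
}

mpi_alt_cmd om
mpi_alt_cmd lam
```

After both commands were run as root, `alternatives --set mpi /usr/bin/om-mpirun` would select Open MPI, which is presumably what `mpi_alternatives set openmpi` does.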

Please let me know of any issues / problems with the above - thanks.

Comment 25 Deji Akingunola 2006-02-18 03:00:53 UTC
Oops, no mpif90 ?? I'm very sure gfortran in rawhide compiles the f90 module
just fine, please include it.
Anyway thanks for this nice new structure.

Comment 26 Jason Vas Dias 2006-02-18 15:43:40 UTC
Yes, there is mpif90 in openmpi-devel :
/usr/bin/om-mpif90
/usr/share/openmpi/bin/mpif90

It was a mistake to name it om-mpif90 in /usr/bin as it does not clash with LAM -
I'll name it /usr/bin/mpif90 in the release that gets submitted to FC.



Comment 27 Ed Hill 2006-02-20 17:27:01 UTC
Hi Jason, thank you for looking into this!

I've locally built, installed, and started testing the openmpi and lam 
versions that you list in comment #24 above:

  # md5sum openmpi-1.0.1-1.src.rpm lam-7.1.1-10.FC5.src.rpm
  b4fe04dbdd4a3a80a20e9e4d68fd5b85  openmpi-1.0.1-1.src.rpm
  80dfb5f422480835013ccc2b1ea7792c  lam-7.1.1-10.FC5.src.rpm

Both built and installed without any problems (AFAICT) on a "stock" FC4 
system (not yet tried builds in mock -- but will next).

The first problem I've run into are links such as:

  cd /usr/share/openmpi/bin/
  ls -l mpicc
  lrwxrwxrwx  1 root root 59 Feb 17 23:13 mpicc -> 
    ../(/var/tmp/openmpi-1.0.1-1-root-edhill//usr/bin)/om-mpicc

which seems to be a failed sed invocation.  I'll test some more and post 
a thorough set of comments as soon as I have a chance.  And thanks again!

Comment 28 Ed Hill 2006-02-20 17:50:47 UTC
Problem in comment #27 exists for the referenced lam version also:

  ls -l /usr/share/lam/bin

returns a directory full of broken links such as:

  lrwxrwxrwx  1 root root 47 Feb 17 22:30 mpicc ->
    ../(/var/tmp/lam-7.1.1-root//usr/bin)/lam-mpicc

but the lam* binaries in /usr/bin work nicely such as:

  /usr/bin/lam-mpicc -o mpi_hi mpi_hi.c
  /usr/bin/lamboot
  /usr/bin/lam-mpirun -np 2 ./mpi_hi
    XXX: hello world
    XXX: hello world

Comment 29 Jason Vas Dias 2006-02-20 18:06:40 UTC
Hi - 
RE: broken links:
> lrwxrwxrwx  1 root root 59 Feb 17 23:13 mpicc -> 
> ../(/var/tmp/openmpi-1.0.1-1-root-edhill//usr/bin)/om-mpicc

This appears to be due to a bug in FC-4 bash - on FC-5 / Rawhide, in the
openmpi/devel/ directory, I get:

$ (. relpath.sh; relpath /a/b/c/d /a/b/d/c/e)
../../../c/d

which is correct, and leads to correct links being created (see $rpath in
.spec file). But on FC-4, I get:
$ (. relpath.sh; relpath /a/b/c/d /a/b/d/c/e)
../(/a/b/c/d)

which leads to the incorrect links. I'm now fixing relpath.sh to work correctly
on FC-4, and will submit a new version ASAP.
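For reference, here is an independent sketch of a relpath function with the calling convention shown above (`relpath TARGET BASE` prints TARGET relative to directory BASE). It is not the relpath.sh from the SRPM, which isn't shown here. It expects absolute paths without "." or ".." components and uses `local`, so it assumes a shell where that works.

```shell
# relpath TARGET BASE: print TARGET relative to directory BASE.
# Both must be absolute and free of "." and ".." components.
relpath() {
    case "$1" in /*) ;; *) echo "relpath: absolute paths required" >&2; return 1 ;; esac
    case "$2" in /*) ;; *) echo "relpath: absolute paths required" >&2; return 1 ;; esac
    local target="${1%/}" base="${2%/}"
    local common="$base" up="" rest result
    # Climb from BASE toward / until COMMON equals TARGET or is one of its
    # ancestors, adding one "../" per level climbed.
    while [ "$target" != "$common" ] && [ "${target#"$common"/}" = "$target" ]; do
        common="${common%/*}"
        up="../$up"
    done
    # Append whatever part of TARGET lies below COMMON.
    if [ "$target" = "$common" ]; then
        rest=""
    else
        rest="${target#"$common"/}"
    fi
    result="$up$rest"
    result="${result%/}"
    echo "${result:-.}"
}

# Link target for /usr/share/openmpi/bin/mpicc -> /usr/bin/om-mpicc:
echo "$(relpath /usr/bin /usr/share/openmpi/bin)/om-mpicc"   # prints ../../../bin/om-mpicc
```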

Comment 30 Jason Vas Dias 2006-02-20 18:10:51 UTC
Aha! this fixes relpath.sh on FC-4:
  $ sed -i 's/local//g' relpath.sh

The 'local' feature for variables is not properly implemented in FC-4 bash.

I'll submit these changes (and the change to fix the mpif90 link) now.


Comment 31 Jason Vas Dias 2006-02-20 20:07:25 UTC
The FC4 build and mpif90 issues are now fixed with the openmpi-1.0.1-1 RPMs at:
  http://people.redhat.com/~jvdias/openmpi
these are now submitted to Fedora rawhide CVS.

I've had a response from the Fedora "powers-that-be": openmpi CANNOT go into
FC-5, but will be going into FC Rawhide after the release of FC-5 GOLD.

So we'll need to put openmpi into Extras for FC-5, and it will be available
from the Rawhide/development repos and will be in the FC-6 release 
(only 6 months away).

I'm happy to submit openmpi into FC-5/Extras, unless one of you wants to be
the openmpi Extras package owner ...  let me know.

I don't think we should submit openmpi to FC-4 Extras, which would require
respinning the FC-4 LAM, and a rebuild of all LAM using FC-4 RPMs , which 
is too much churn for a stable FC release.

Comment 32 Ed Hill 2006-02-21 01:21:03 UTC
Hi Jason, not having OpenMPI in either FC5 or FE4 seems very reasonable.  
And it'll be nice to have it in FE5 along with your lam cleanups.  Thank
you!  If it helps I'll withdraw my openmpi submission and act as a 
reviewer (and probably Orion will want to help out with reviews and/or 
patches).

So I built (in mock) the newer openmpi version from comment #31 above (it
would be less confusing if the release number were bumped with each change,
but it's not a big deal):

  e78dee1eae42c099c958a8a335ef758a  openmpi-1.0.1-1.src.rpm

and the soft-links are now fixed--good.  Unfortunately, it seems to have
problems running the daemon and locating some run-time help files:

$ /usr/share/openmpi/bin/mpicc -o om_hi ./mpi_hi.c
$ /usr/share/openmpi/bin/orted
$ /usr/share/openmpi/bin/mpirun -np 1 om_hi
--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
    orterun:proc-aborted
from the file:
    help-orterun.txt
But I couldn't find any file matching that name.  Sorry!
--------------------------------------------------------------------------
[ernie:05944] ERROR: A daemon on node localhost failed to start as expected.
[ernie:05944] ERROR: There may be more information available from
[ernie:05944] ERROR: the remote shell (see above).
[ernie:05944] The daemon received a signal 11.

but it's possible that I've botched something.  I've got to get some other
work done so I'll look into it later this week.

Comment 33 Jason Vas Dias 2006-02-22 17:14:43 UTC
RE: > Sorry!  You were supposed to get help about:
    > orterun:proc-aborted

This was caused by the wrong installation location for the help*.txt files -
they now live in /usr/share/openmpi/help/openmpi, and the 'pkgdatadir' 
configuration option tells the libraries where to find them.

Another problem was with the libmpi shared libraries - these used to live in
/usr/lib/libmpi* .  Having lam and openmpi both install /etc/ld.so.conf.d/*.conf
files meant that the LAM libmpi* libraries were always resolved first - 
('l' precedes 'o' alphabetically) . 

Now, both lam-7.1.1-11+ and openmpi own a '%ghost' file,
/etc/ld.so.conf.d/mpi.conf,
which is created as a link to either /usr/share/{lam,openmpi}/ld.conf during
the '%post' script of both RPMs, but only if it does not already exist.
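The %post rule described above (create the shared link only if it is absent) might look roughly like this. ROOT and mpi_conf_post are hypothetical names that let the logic run in a scratch directory, and the real spec scriptlets may differ.

```shell
ROOT=$(mktemp -d)   # hypothetical scratch root; empty on a real system

# mpi_conf_post IMPL: link /etc/ld.so.conf.d/mpi.conf to
# /usr/share/IMPL/ld.conf unless some MPI package already created it.
mpi_conf_post() {
    local impl="$1"
    local conf="$ROOT/etc/ld.so.conf.d/mpi.conf"
    mkdir -p "$ROOT/etc/ld.so.conf.d"
    # -L also catches a dangling link, which -e alone would not see.
    if [ ! -e "$conf" ] && [ ! -L "$conf" ]; then
        ln -s "/usr/share/$impl/ld.conf" "$conf"
    fi
    # A real %post would run /sbin/ldconfig here.
}

mpi_conf_post lam       # first MPI package installed creates the link
mpi_conf_post openmpi   # link already exists, so it is left alone
```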

openmpi's /usr/sbin/mpi_alternatives now makes /etc/ld.so.conf.d/mpi.conf the
primary alternative:

$ mpi_alternatives display
mpi - status is manual.
 link currently points to /usr/share/openmpi/ld.conf
/usr/share/openmpi/ld.conf - priority 50
 slave mpiCC: /usr/bin/om-mpic++
 slave mpic++: /usr/bin/om-mpic++
 slave mpicc: /usr/bin/om-mpicc
 slave mpiexec: /usr/bin/om-mpiexec
 slave mpif77: /usr/bin/om-mpif77
 slave mpirun: /usr/bin/om-mpirun
/usr/share/lam/ld.conf - priority 50
 slave mpiCC: /usr/bin/lam-mpic++
 slave mpic++: /usr/bin/lam-mpic++
 slave mpicc: /usr/bin/lam-mpicc
 slave mpiexec: /usr/bin/lam-mpiexec
 slave mpif77: /usr/bin/lam-mpif77
 slave mpirun: /usr/bin/lam-mpirun
Current `best' version is /usr/share/openmpi/ld.conf.
 
I've uploaded the new openmpi-1.0.1-1.fe5 and lam-7.1.1-11 RPMs to 
http://people.redhat.com/~jvdias/openmpi .

Unless there are any objections, I'll import openmpi-1.0.1-1.fe5 into 
FC5 Extras tomorrow. Please try out the new RPMs - thanks!

Comment 34 Jason Vas Dias 2006-02-23 18:38:31 UTC
OK, openmpi-1.0.1-1.fe5 is now submitted to FC-5 Extras -
use with lam-7.1.1-11+ from FC-5.


Comment 35 Christian Iseli 2006-03-29 14:06:52 UTC
Set blocker to FE-ACCEPT

