[mvapich-discuss] Building mvapich-based applications without access to infiniband system

Tom Mitchell tom.mitchell at qlogic.com
Thu Jan 3 17:47:46 EST 2008


On Jan 03 11:56, Ben Held wrote:
> 
>    Our company offers a commercial product that we currently build for standard
>    MPICH-1 and LAM.  We have a client that has a new Infiniband Linux cluster
>    that has MVAPICH installed on it.  Our company does not own any infiniband
>    hardware,  but  we  are  faced  with providing an application for this
>    customer’s cluster.  Is this possible, and how do we proceed?  It appears
>    that the build process for mvapich automatically detects the hardware (that
>    we don’t have), so I have concerns that building mvapich here and then
>    linking it into our app will result in a binary that will not run on their
>    cluster.


Ben,

Jeff Squyres had very good advice.

I would like to add that MPI is an API, not an ABI.   As you
branch out you will have to pay attention to the binary stacks
that you target at compile time.   Examples include HP-MPI,
Cisco's MPI, QLogic's MPI, Open MPI, LAM, Intel MPI... and more,
including a customer's hand-crafted MPI.

The MVAPICH that your client built for his Infiniband Linux
cluster will have been compiled with a specific set of options
and a specific compiler.  Having built MVAPICH, the client
will have matching versions of the helper scripts mpicc, mpif77,
mpif90...; these scripts match the correct compiler to the
correct library and almost all the other moving parts.

If you look at the ABI issue for compilers in isolation you
can find subtle things like Fortran logical true and false
having underlying differences in their binary representation.
For some Fortran compilers the logical .TRUE. and .FALSE. use
the int pair 1 and 0, while others use -1 and 0....  getargs and
memcpy are other places where I know ABI mismatches can happen.
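To make the .TRUE./.FALSE. hazard concrete, here is a small sketch.
The two encodings are the real part; the helper names are made up
for illustration.  A test written for one compiler's convention
silently misreads the other's:

```shell
# Convention A encodes .TRUE. as 1 (e.g. gfortran's default);
# convention B encodes .TRUE. as -1 (some legacy compilers).
TRUE_A=1    # convention A .TRUE.
TRUE_B=-1   # convention B .TRUE.

naive_is_true()  { [ "$1" -eq 1 ]; }   # assumes convention A only
robust_is_true() { [ "$1" -ne 0 ]; }   # nonzero-is-true survives both

naive_is_true  "$TRUE_A" && echo "naive sees A's true"
naive_is_true  "$TRUE_B" || echo "naive misses B's true"
robust_is_true "$TRUE_B" && echo "robust sees B's true"
```

The same mismatch happens in C code that checks a Fortran LOGICAL
with `== 1` instead of `!= 0` across the language boundary.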

The logical .TRUE. and .FALSE. case is interesting because
correct Boolean logic transformations by the compiler can
convert working code into code that fails in strange ways after
ABI cross linking or a change in optimization....  This can
be critical for Basic Linear Algebra packages, where a
researcher finds that compiler A gives +5% on library foo.so,
compiler B gives +5% on library bar.so, and then MPI was built
with compiler C.  Or worse, the ld search order finds unexpected
and different packages out on nodes in a cluster.

To research this a bit, look at Open MPI.  The Open MPI configure
script and README have comments that discuss and address
the logical .TRUE. and .FALSE. issue.  For Open MPI users the
ompi_info command is valuable for rediscovering which Fortran
compiler Open MPI was configured with, but it may not show all the
compiler flags (like "....Portland Group compilers provide
the "-Munixlogical" option, and Intel compilers (version >=
8.) provide the "-fpscomp logicals" option....").  Also the
environment (see also alternatives) can get in the mix....
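A sketch of that ompi_info interrogation; the grep pattern is just
my guess at the interesting fields, not an official recipe:

```shell
# ompi_info reports the compilers and options Open MPI was built with.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep -i -e 'fortran' -e 'compiler'
else
    echo "ompi_info not on PATH"
fi
```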

Now with gcc we also have gcc3 and gcc4 versions to watch...

As you branch out, your build environments need full and detailed
records so you can reproduce and debug these issues.

Since MPI is an API you would do well to collect as many MPIs
and compilers as you can find then build and test with each.

In this case you only have the one additional customer's
MVAPICH and cluster to work with.   That has the potential
of making your life easy as long as the customer can give
you access.  It is the next handful of customers  that makes
things interesting.

If you build your package on the customer's cluster, do log all
you can about the cluster and build environment.  A security
fix, apt-get, yum update, emerge world or up2date can change
things that you do not expect  ;-)
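A minimal logging sketch, assuming only POSIX tools; the file name
and the set of fields captured are my own choices, so extend it with
whatever your packaging needs (rpm -qa, dpkg -l, module list, ...):

```shell
# Snapshot the build host before building, so later
# "it worked last month" debugging has something to diff against.
LOG=build-environment.log
{
    echo "== date ==";     date -u
    echo "== kernel ==";   uname -a
    echo "== compiler =="; cc --version 2>/dev/null | head -n 1
    echo "== env ==";      env | sort
} > "$LOG"
test -s "$LOG" && echo "environment recorded in $LOG"
```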

Have fun,
mitch

> 
> 
>    Thanks,
> 
>    Ben
> 
> 
>    Ben Held
>    Simulation Technology & Applied Research, Inc.
>    11520 N. Port Washington Rd., Suite 201
>    Mequon, WI 53092
>    P: 1.262.240.0291 x101
>    F: 1.262.240.0294
>    E: [1]ben.held at staarinc.com
>    [2]http://www.staarinc.com
> 
> References
> 
>    1. mailto:ben.held at staarinc.com
>    2. http://www.staarinc.com/

> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


-- 
	T o m   M i t c h e l l
	Host Solutions Group, QLogic Corp.  
	http://www.qlogic.com   http://support.qlogic.com


