[mvapich-discuss] compiling MVAPICH against GCC-4.3.3

Jonathan Perkins perkinjo at cse.ohio-state.edu
Tue Jun 30 11:21:43 EDT 2009


On Tue, Jun 30, 2009 at 04:21:08PM +0200, Michael Rapson wrote:
> Hi there,
> 
> The exotic thing about the architecture is that everything dates from
> 2006. It is an IBM e1350 Cluster with AMD Opteron processors (x86_64)
> but it was commissioned in 2006 and the software has been left as is
> since then. (Only the original software is supported fully). In
> particular it is still using OFED version 1.1.

I wouldn't think that this would cause a problem.  Can you try using a
fresh copy of the source (perhaps last night's tarball) and give it a
run through?

> 
> There is an existing MVAPICH installation on the machine (version
> 0.9.8 apparently) which I have used to compare my new build against,
> in particular getting those parameters I mentioned, but I need to use
> more recent gcc compilers for some of my code. (The gcc version on the
> machine is 3.3.3.) I have figured out / borrowed from other
> applications a good submit script for the old MVAPICH install (pasted
> below) but as I say this method of running the executable gives a
> 'Child exited abnormally!' error (slightly better than when I use the
> mpirun_rsh -rsh alternative, where I get permission denied messages).

It may be easier to debug the mpirun_rsh alternative.  Is ssh enabled on
your system?  Was there a particular file or executable listed when it
said permission denied?

> 
> So regarding the unusual architecture, it's mainly that all the system
> files are very old compared to the gcc-4.3.3 compiler. (I have used
> one of the other tips for linking to an old ofed package, removing the
> -DXRC flag as suggested in
> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-March/002184.html.)
> Perhaps this explains why the PARAMETER statement's values aren't
> determined correctly?

This is unrelated.

> 
> Any ideas about whether some of the environment variables below are
> necessary or could be causing problems would be most appreciated.

The following variables are no longer used:
VIADEV_DEFAULT_RETRY_COUNT
VIADEV_DEFAULT_TIME_OUT
VIADEV_DEFAULT_MAX_SG_LIST
DISABLE_RDMA_ALLTOALL
DISABLE_RDMA_ALLGATHER
DISABLE_RDMA_BARRIER

This should be changed:
VIADEV_SQ_SIZE_MAX=64;    --> VIADEV_SQ_SIZE=64;
VIADEV_ENABLE_AFFINITY=0; --> VIADEV_USE_AFFINITY=0;
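
A minimal sketch of applying those edits to the old environment string
with standard shell tools, assuming the settings are separated by
semicolons as in the quoted submit script. The function name
update_viadev_env is made up for illustration; paste the printed result
back into the "# @ environment" line by hand.

```shell
#!/bin/sh
# Sketch only: drop the variables that are no longer used and rename the
# two that changed. Reads a ';'-separated list of NAME=VALUE on stdin.
update_viadev_env() {
    tr ';' '\n' |
    # Trim surrounding blanks so each setting starts at column one.
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
    # Remove empty lines left over from wrapped input.
    grep -v '^$' |
    # Remove the settings that are no longer used.
    grep -v -E '^(VIADEV_DEFAULT_RETRY_COUNT|VIADEV_DEFAULT_TIME_OUT|VIADEV_DEFAULT_MAX_SG_LIST|DISABLE_RDMA_ALLTOALL|DISABLE_RDMA_ALLGATHER|DISABLE_RDMA_BARRIER)=' |
    # Rename the two settings whose names changed.
    sed -e 's/^VIADEV_SQ_SIZE_MAX=/VIADEV_SQ_SIZE=/' \
        -e 's/^VIADEV_ENABLE_AFFINITY=/VIADEV_USE_AFFINITY=/' |
    # Join the survivors back into one '; '-separated line.
    paste -s -d ';' - | sed 's/;/; /g'
}

# The environment string from the old submit script.
old_env='GOTO_NUM_THREADS=1; OMP_NUM_THREADS=1;
VIADEV_CLUSTER_SIZE=AUTO; VIADEV_DEFAULT_RETRY_COUNT=15;
VIADEV_DEFAULT_TIME_OUT=22; VIADEV_NUM_RDMA_BUFFER=4;
VIADEV_ADAPTIVE_RDMA_LIMIT=2; VIADEV_SQ_SIZE_MAX=64;
VIADEV_DEFAULT_MAX_SG_LIST=1; VIADEV_MAX_INLINE_SIZE=80;
VIADEV_SRQ_SIZE=2048; VIADEV_VBUF_TOTAL_SIZE=2048;
VIADEV_VBUF_POOL_SIZE=512; VIADEV_VBUF_SECONDARY_POOL_SIZE=128;
VIADEV_ENABLE_AFFINITY=0; DISABLE_RDMA_ALLTOALL=1;
DISABLE_RDMA_ALLGATHER=1; DISABLE_RDMA_BARRIER=1'

printf '%s\n' "$old_env" | update_viadev_env
```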

If a fresh build with the edited environment variables doesn't work,
can you isolate the run to a single node to see whether the error shows
up in that case as well?
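
One way to set up such a single-node run, as a sketch: build a hostfile
that repeats one host name once per rank and hand it to mpirun_rsh. The
install path is a placeholder, the file name single_node.hosts is made
up, and -ssh assumes ssh logins to the node work. The command is only
printed so that it can be checked before being run.

```shell
#!/bin/sh
# Sketch only: list the same node NP times so every rank lands on one
# machine and no inter-node traffic is involved.
NP=4
NODE=$(uname -n)                 # name of the node this script runs on
HOSTFILE=./single_node.hosts     # hypothetical file name

: > "$HOSTFILE"                  # start with an empty hostfile
i=0
while [ "$i" -lt "$NP" ]; do
    echo "$NODE" >> "$HOSTFILE"
    i=$((i + 1))
done

# Placeholder path: point this at the new MVAPICH install.
MPIRUN_RSH=/path/to/new/mvapich/bin/mpirun_rsh
EXE=src/snes/examples/tutorials/ex5f

# Print the command instead of executing it.
echo "$MPIRUN_RSH -ssh -np $NP -hostfile $HOSTFILE $EXE"
```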

>
> 
> Thanks,
> 
> Michael
> 
> # @ shell = /usr/bin/ksh
> # @ output = $(Executable).$(Cluster).out
> # @ error = $(Executable).$(Cluster).err
> # @ wall_clock_limit = 12:00:00
> # @ class= UAT
> # @ node = 4
> # @ node_usage = not_shared
> # @ job_type = MPICH
> # @ notification = error
> # @ resources = ConsumableCpus(1)
> # @ tasks_per_node = 4
> # @ environment = GOTO_NUM_THREADS=1; OMP_NUM_THREADS=1;
> VIADEV_CLUSTER_SIZE=AUTO; VIADEV_DEFAULT_RETRY_COUNT=15;
> VIADEV_DEFAULT_TIME_OUT=22; VIADEV_NUM_RDMA_BUFFER=4;
> VIADEV_ADAPTIVE_RDMA_LIMIT=2; VIADEV_SQ_SIZE_MAX=64;
> VIADEV_DEFAULT_MAX_SG_LIST=1; VIADEV_MAX_INLINE_SIZE=80;
> VIADEV_SRQ_SIZE=2048; VIADEV_VBUF_TOTAL_SIZE=2048;
> VIADEV_VBUF_POOL_SIZE=512; VIADEV_VBUF_SECONDARY_POOL_SIZE=128;
> VIADEV_ENABLE_AFFINITY=0; DISABLE_RDMA_ALLTOALL=1;
> DISABLE_RDMA_ALLGATHER=1; DISABLE_RDMA_BARRIER=1
> 
> # @ queue
> echo "++++++++++"
> echo "host files is:"
> echo " "
> cat $LOADL_HOSTFILE
> cp $LOADL_HOSTFILE $LOADL_STEP_OUT.hostfile
> echo " "
> echo "++++++++++"
> 
> /CHPC/usr/local/mvapich/bin/mpirun \
> -np $LOADL_TOTAL_TASKS \
> -hostfile $LOADL_HOSTFILE \
> src/snes/examples/tutorials/ex5f
> #src/snes/examples/tutorials/ex19
> 
> 
> 
> On Tue, Jun 30, 2009 at 3:53 PM, Jonathan
> Perkins<perkinjo at cse.ohio-state.edu> wrote:
> > On Tue, Jun 30, 2009 at 03:11:01PM +0200, Michael Rapson wrote:
> >> Hi there Jonathan,
> >>
> >> Thanks for the patch, I applied it and it fixed the c++ problem.
> >
> > Glad to hear this.
> >
> >>
> >> I was able to build the library, but needed to use a work around for
> >> an unrelated problem (I think). For some reason the value of
> >> MPI_ADDRESS_KIND and MPI_OFFSET_KIND in mpif.h (all copies) is not
> >> determined correctly. I edited all versions of mpif.h by hand and
> >> gave these terms the value 8 (found in the other MVAPICH install) then ran
> >> make mpi-modules, make install, and make mpi-lib-test.
> >
> > That's odd.  Is there anything exotic about your system (architecture)?
> >
> >>
> >> The library passed all its internal checks and I was planning to send
> >> you a note letting you know that it worked once I had run some of the
> >> tests in packages depending on MPI (PETSc and Trilinos). Cracking the
> >> llsubmit script is taking longer than I thought, though. I am getting
> >> "Child exited abnormally!" errors, which I see from the archives can be
> >> related to the scheduler (the cluster uses Tivoli LoadLeveler
> >> software).
> >>
> >> So, in summary: I am 90% sure the patch worked on my system, but I am
> >> tracking down the correct submission script before I can be certain.
> >> Thanks for the help!
> >
> > Thanks for the feedback.
> >
> >>
> >> Cheers,
> >> Michael
> >>
> >> On Mon, Jun 29, 2009 at 1:59 PM, Jonathan
> >> Perkins<perkinjo at cse.ohio-state.edu> wrote:
> >> > On Mon, Jun 29, 2009 at 07:45:01AM +0200, Michael Rapson wrote:
> >> >> Hi all,
> >> >>
> >> >> I am coming in at the end of a conversation between Atencio and
> >> >> Jonathan discussing a problem compiling against GCC-4.3.3. I am also
> >> >> compiling against GCC-4.3.3 and am running into the same issue with
> >> >> the iostream.h header file (and I presume many similar files since
> >> >> this is just a testcase). I am new to the MVAPICH mailing list and it
> >> >> seems like a patch for the problem has been written, but it hasn't
> >> >> made it onto the archives of mvapich-discuss. Does someone have a copy
> >> >> of the patch that they could send me, or where else could I download it
> >> >> from?
> >> >
> >> > I'm attaching it in this reply, it won't show on the list but you'll get
> >> > it directly.
> >> >
> >> >>
> >> >> I have been trying to install mvapich-1.1-2009-06-21. Is it likely
> >> >> that the patch would have already been incorporated into the newer
> >> >> daily tarballs?
> >> >
> >> > It should have been, this is an oversight on my part.  It'll be in
> >> > tonight's nightly tarball.
> >> >
> >> >>
> >> >> Thanks for your help.
> >> >>
> >> >> Regards,
> >> >> Michael
> >> >> _______________________________________________
> >> >> mvapich-discuss mailing list
> >> >> mvapich-discuss at cse.ohio-state.edu
> >> >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >> >
> >> > --
> >> > Jonathan Perkins
> >> > http://www.cse.ohio-state.edu/~perkinjo
> >> >
> >>
> >
> > --
> > Jonathan Perkins
> > http://www.cse.ohio-state.edu/~perkinjo
> >
> 

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo