[mvapich-discuss] Solaris x86
Di Domenico, Michael
mdidomenico at silverstorm.com
Tue Apr 18 15:01:31 EDT 2006
Lei,
I'll look into remote access, but I'm not sure it's possible... But if
you're testing Solaris 10 x86 in your lab, then I'll see if I can debug
it a little more here and see what I may be missing.
For reference, I used Solaris 10 1/06 x86.
I did a full OEM install of the OS from the DVD.
Then I installed these packages from Sunfreeware.com:
application SMCautoc autoconf
application SMCautom automake
application SMCbison bison
application SMCflex flex
application SMCgcc gcc
application SMCgzip gzip
application SMCiconv libiconv
application SMCmake make
application SMCperl perl
application SMCrsync rsync
application SMCtar tar
That is all I've done to the machine, other than bringing up IP over IB using:
ifconfig ibd0 plumb
ifconfig ibd0 inet 192.168.101.41 netmask + up
Otherwise, I've followed the procedures for compiling mvapich using the
make.mvapich.udapl script...
Thanks again for your help...
-----Original Message-----
From: LEI CHAI [mailto:chai.15 at osu.edu]
Sent: Tuesday, April 18, 2006 2:46 PM
To: Di Domenico, Michael
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: RE: RE: RE: RE: [mvapich-discuss] Solaris x86
Michael,
I think we may be missing something very small here, but since it works
smoothly on our cluster, it is hard to guess what might be wrong.
If you could provide us with an account on your cluster, it would be
easier for us to find the problem.
Thanks.
Lei
----- Original Message -----
From: "Di Domenico, Michael" <mdidomenico at silverstorm.com>
Date: Tuesday, April 18, 2006 1:58 pm
Subject: RE: RE: RE: RE: [mvapich-discuss] Solaris x86
> Lei,
>
> I appreciate your help on this... thanks
>
> bash-3.00# pwd
> /root
>
> bash-3.00# ls /root
> mvapich-0.9.7 mvapich-0.9.7.tar.gz print.pl
>
> bash-3.00# ls /root/mvapich-0.9.7
> a.out              install-mine.log             mpich.static.dsw
> acconfig.h         installtest                  mpichconf.h
> aclocal.m4         lib                          mpichconf.h.in
> aclocal_tcl.m4     LICENSE.TXT                  mpichversion.o
> bin                make-mine.log                mpid
> buildmsg           make.mvapich.def             multirail.mpd.sh
> ccbugs             make.mvapich.gen2            mvapich.mpd.sh
> config-mine.log    make.mvapich.gen2_multirail  osu_benchmarks
> config.log         make.mvapich.tcp             README
> config.status      make.mvapich.udapl           README_MPICH
> configure          make.mvapich.vapi            romio
> configure.in       make.mvapich.vapi_multirail  sbin
> COPYRIGHT          Makefile                     share
> COPYRIGHT_MVAPICH  Makefile.in                  src
> doc                makelinks                    util
> etc                man                          www
> examples           mpe                          www.index
> f90modules         MPI-2-C++
> include            mpich.dsw
>
> bash-3.00# /root/mvapich-0.9.7/bin/mpirun_rsh -np 2 tse41 tse42 DAPL_PROVIDER=ibd0 /opt/mvapich/examples/cpi
> /usr/bin/env: No such file or directory
> /usr/bin/env: No such file or directory
>
> Changing the command line to:
>
> bash-3.00# /root/mvapich-0.9.7/bin/mpirun -np 2 -machinefile /opt/mvapich/share/machines.udapl /opt/mvapich/examples/cpi
> [0] Abort: cannot open IA at line 214 in file viainit.c
> mpirun: executable version 0 does not match our version 3.
> done.
>
> Applying the patch provided...
>
> bash-3.00# /root/mvapich-0.9.7/bin/mpirun_rsh -np 2 -hostfile /opt/mvapich/share/machines.udapl /opt/mvapich/examples/cpi
> /usr/bin/env: No such file or directory
> sh: /root/mvapich-0.9.7: does not exist
>
> -----Original Message-----
> From: LEI CHAI [chai.15 at osu.edu]
> Sent: Tuesday, April 18, 2006 1:20 PM
> To: Di Domenico, Michael
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: Re: RE: RE: RE: [mvapich-discuss] Solaris x86
>
> Michael,
>
> We also have Solaris/x86 in our lab, and we have tested MVAPICH on
> Solaris without seeing this problem. For the time being, could you
> use $COMPILE_PATH/bin/mpirun_rsh instead of $INSTALL/bin/mpirun_rsh?
> $COMPILE_PATH is the directory containing the MVAPICH source code:
>
> $COMPILE_PATH/bin/mpirun_rsh -np 2 node1 node2 DAPL_PROVIDER=ibd0 ./cpi
>
> If you still see the "cannot open IA" problem, could you apply the
> patch below and let us know the output? The patch just prints out
> the IA name.
>
> Thanks.
> Lei
>
> -------------------------------------------------
> --- viainit.c.orig Tue Apr 18 13:04:10 2006
> +++ viainit.c.new Tue Apr 18 13:05:26 2006
> @@ -211,7 +211,7 @@
> &async_evd_handle, &viadev.nic);
> if (ret != DAT_SUCCESS)
> {
> - udapl_error_abort (GEN_EXIT_ERR, "cannot open IA");
> + udapl_error_abort (GEN_EXIT_ERR, "cannot open IA: %s", dapl_provider);
> }
>
> viadev.maxtransfersize = viadev_max_rdma_size;
>
> -----------------------------------------------
>
> ----- Original Message -----
> From: "Di Domenico, Michael" <mdidomenico at silverstorm.com>
> Date: Tuesday, April 18, 2006 12:29 pm
> Subject: RE: RE: RE: [mvapich-discuss] Solaris x86
>
> > Lei,
> >
> > Something must not be copied correctly during the install step of
> > the make script, and it is corrupting the executable... More than
> > likely, I would suspect that a tool you're using to move the files
> > behaves differently on Solaris than it does on Linux....
> >
> > I've also added DAPL_PROVIDER to the ~/.bashrc and ~/.profile files.
> > If I ssh from one machine to another it does get set, as evidenced
> > by echo $DAPL_PROVIDER...
> >
> >
> > ...output truncated....
> > installed MPICH in /opt/mvapich
> > /opt/mvapich/sbin/mpiuninstall may be used to remove the installation.
> > Congratulations on successfully building MVAPICH. Please send your
> > feedback to mvapich-help at cse.ohio-state.edu.
> > bash-3.00# /opt/mvapich/bin/mpirun_rsh
> > bash: /opt/mvapich/bin/mpirun_rsh: Invalid argument
> > bash-3.00# file /opt/mvapich/bin/mpirun_rsh
> > can't read ELF header
> > /opt/mvapich/bin/mpirun_rsh:
> > bash-3.00#
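The `file` output above reports that the installed mpirun_rsh has no readable ELF header. One quick way to compare the installed copy with the one Michael found in mpid/udapl/process is to inspect the first four bytes, which in every ELF binary are 0x7f followed by `ELF`. This is a minimal sketch that assumes a POSIX `od`. The name `is_elf` is hypothetical and not part of MVAPICH.

```shell
# is_elf FILE
# Hypothetical helper: succeeds only if FILE begins with the ELF magic
# bytes 0x7f 'E' 'L' 'F'. A truncated or corrupted copy fails the test.
is_elf() {
    local magic
    # -A n: no offsets, -t x1: one hex byte per item, -N 4: read 4 bytes
    magic=$(od -A n -t x1 -N 4 "$1" 2>/dev/null | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Usage (paths taken from this thread):
#   for f in /opt/mvapich/bin/mpirun_rsh \
#            /root/mvapich-0.9.7/mpid/udapl/process/mpirun_rsh; do
#       if is_elf "$f"; then echo "$f: ELF"; else echo "$f: not ELF"; fi
#   done
```

If the build-tree copy passes this check and the installed copy fails, the damage happened during the install step rather than during compilation.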
> >
> > -----Original Message-----
> > From: LEI CHAI [chai.15 at osu.edu]
> > Sent: Tuesday, April 18, 2006 12:01 PM
> > To: Di Domenico, Michael
> > Cc: mvapich-discuss at cse.ohio-state.edu
> > Subject: Re: RE: RE: [mvapich-discuss] Solaris x86
> >
> > Michael,
> >
> > One small thing: please make sure to export DAPL_PROVIDER in the
> > .bashrc file instead of exporting it in the current shell. Exporting
> > it in the current shell does not help, and we are taking a look at
> > it.
> >
> > Also, we do not understand why you need to copy mpirun_rsh from
> > mpid/udapl/process. If you run mvapich-0.9.7/make.mvapich.udapl to
> > rebuild MVAPICH, mpirun_rsh should be generated automatically in your
> > $INSTALL/bin directory. Could you just run $INSTALL/bin/mpirun_rsh
> > without any arguments and let us know the result?
> >
> > Thanks.
> > Lei
> >
> >
> > ----- Original Message -----
> > From: "Di Domenico, Michael" <mdidomenico at silverstorm.com>
> > Date: Tuesday, April 18, 2006 11:20 am
> > Subject: RE: RE: [mvapich-discuss] Solaris x86
> >
> > > Lei,
> > >
> > > I had a feeling you were going to say that... See my outputs below.
> > > The IB card is definitely up, it's detected successfully by the
> > > kernel, and I can run mvapich using IP over IB with no issues...
> > >
> > > bash-3.00# tail /etc/dat/dat.conf
> > > ....output truncated....
> > > # IAname version threadsafe default library-path provider-version \
> > > # instance-data platform-information
> > > #
> > > ibd0 u1.2 nonthreadsafe default udapl_tavor.so.1 SUNW.1.0 " " "driver_name=tavor"
> > >
> > > bash-3.00# ifconfig ibd0
> > > ibd0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
> > >         inet 192.168.101.41 netmask ffffff00 broadcast 192.168.101.255
> > >         ipib 0:0:4:6:fe:80:0:0:0:0:0:0:0:6:6a:0:a0:0:3:a1
> > >
> > > bash-3.00# ping 192.168.101.41
> > > 192.168.101.41 is alive (tse41-ib)
> > > bash-3.00# ping 192.168.101.42
> > > 192.168.101.42 is alive (tse42-ib)
> > >
> > > bash-3.00# echo $DAPL_PROVIDER
> > > ibd0
> > > bash-3.00#
> > >
> > > -----Original Message-----
> > > From: LEI CHAI [chai.15 at osu.edu]
> > > Sent: Tuesday, April 18, 2006 11:13 AM
> > > To: Di Domenico, Michael
> > > Cc: mvapich-discuss at cse.ohio-state.edu
> > > Subject: Re: RE: [mvapich-discuss] Solaris x86
> > >
> > > Michael,
> > >
> > > There are several possible reasons that you see this error:
> > >
> > > 1. There is no valid entry in /etc/dat/dat.conf
> > >
> > > 2. There is no "export DAPL_PROVIDER=ibd0" in your .bashrc file, or
> > > "source ~/.bashrc" was not done if you were already in the shell.
> > >
> > > 3. InfiniBand on the node is not working properly.
> > >
> > > I guess you have taken care of 1 and 2. For 3, could you run
> > > "ifconfig ibd0" and let us know the output?
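Lei's three possible causes can be combined into one check to run on each node before launching. This is a hedged sketch, not an MVAPICH tool. The name `preflight` is hypothetical. The sketch assumes, as in this thread, that the IAname and the IPoIB interface share the name ibd0, and its .bashrc test is a plain text match rather than a real parse of the shell file. On Solaris, `awk -v` may require `nawk` or `/usr/xpg4/bin/awk`.

```shell
# preflight IANAME [DAT_CONF] [RC_FILE] [IFACE]
# Hypothetical helper checking the three causes of "cannot open IA" listed
# above. Prints one ok/FAIL line per check; returns 1 if any check fails.
preflight() {
    local ia=$1
    local conf=${2:-/etc/dat/dat.conf}
    local rc=${3:-$HOME/.bashrc}
    local ifc=${4:-$ia}     # assumption: interface name == IAname (ibd0)
    local status=0

    # 1. dat.conf needs an entry whose first field is the IAname
    if awk -v ia="$ia" '$1 == ia { found = 1 } END { exit !found }' "$conf" 2>/dev/null; then
        echo "ok: $ia found in $conf"
    else
        echo "FAIL: no $ia entry in $conf"; status=1
    fi

    # 2. .bashrc should export DAPL_PROVIDER (plain text match only)
    if grep "export DAPL_PROVIDER=$ia" "$rc" >/dev/null 2>&1; then
        echo "ok: DAPL_PROVIDER=$ia exported in $rc"
    else
        echo "FAIL: no 'export DAPL_PROVIDER=$ia' in $rc"; status=1
    fi

    # 3. the InfiniBand interface should carry the UP flag
    if ifconfig "$ifc" 2>/dev/null | grep '<UP' >/dev/null; then
        echo "ok: $ifc is UP"
    else
        echo "FAIL: $ifc is not UP (or ifconfig failed)"; status=1
    fi

    return $status
}
```

Running `preflight ibd0` on tse41 and on tse42 (for example over ssh, so that the remote login environment is the one being tested) covers both ends of the job.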
> > >
> > > Thanks.
> > > Lei
> > >
> > >
> > > ----- Original Message -----
> > > From: "Di Domenico, Michael" <mdidomenico at silverstorm.com>
> > > Date: Tuesday, April 18, 2006 9:24 am
> > > Subject: RE: [mvapich-discuss] Solaris x86
> > >
> > > > Lei,
> > > >
> > > > I copied mpirun_rsh from the mpid/udapl/process directory, which
> > > > seems to be a valid executable, and now I get:
> > > >
> > > > bash-3.00# ./mpirun -hostfile ../share/machines.udapl ./cpi
> > > > [0] Abort: cannot open IA at line 214 in file viainit.c
> > > > mpirun: executable version 0 does not match our version 3.
> > > > done.
> > > >
> > > > ________________________________
> > > >
> > > > From: lei chai [chai.15 at osu.edu]
> > > > Sent: Monday, April 17, 2006 10:14 PM
> > > > To: Di Domenico, Michael; mvapich-discuss at cse.ohio-state.edu
> > > > Subject: Re: [mvapich-discuss] Solaris x86
> > > >
> > > >
> > > >
> > > > Michael,
> > > >
> > > > Thanks for reporting the mpirun problem. We have now fixed it.
> > > > Please go to your mvapich-0.9.7/mpid/udapl directory and rename
> > > > the files mpirun.vapi.args and mpirun.vapi.in to
> > > > mpirun.udapl.args and mpirun.udapl.in. Then replace "vapi" in
> > > > mvapich-0.9.7/mpid/udapl/mpirun.lst with "udapl". You also need
> > > > to add "export DAPL_PROVIDER=ibd0" to your .bashrc file. After
> > > > rebuilding, you can run a program with:
> > > >
> > > > mpirun -n 2 -machinefile my-machine-file ./cpi
> > > >
> > > > where my-machine-file contains the host names.
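Lei's manual fix-up consists of renaming two files, editing mpirun.lst, exporting DAPL_PROVIDER and writing a machine file. It can be scripted roughly as follows. This is only a sketch. The function names are hypothetical, and the layout is assumed to be the source tree layout named in the message. The sketch rewrites every occurrence of "vapi" in mpirun.lst, which is one reading of the instruction, so check the file before rebuilding. The machine file uses one host name per line, which is the usual MPICH format.

```shell
# fix_udapl_mpirun SRC_DIR  (e.g. /root/mvapich-0.9.7)
# Hypothetical helper: rename the vapi mpirun files in mpid/udapl and
# replace "vapi" with "udapl" in mpid/udapl/mpirun.lst.
fix_udapl_mpirun() {
    local dir=$1/mpid/udapl
    mv "$dir/mpirun.vapi.args" "$dir/mpirun.udapl.args" || return 1
    mv "$dir/mpirun.vapi.in"   "$dir/mpirun.udapl.in"   || return 1
    # POSIX sed has no in-place (-i) option, so go through a temp file
    sed 's/vapi/udapl/g' "$dir/mpirun.lst" > "$dir/mpirun.lst.new" &&
        mv "$dir/mpirun.lst.new" "$dir/mpirun.lst"
}

# add_provider_export RC_FILE IANAME
# Append "export DAPL_PROVIDER=IANAME" unless an export is already there.
add_provider_export() {
    local rc=$1 ia=$2
    if ! grep 'export DAPL_PROVIDER=' "$rc" >/dev/null 2>&1; then
        echo "export DAPL_PROVIDER=$ia" >> "$rc"
    fi
}

# write_machinefile FILE HOST...
# Write one host name per line.
write_machinefile() {
    local file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Usage (then rebuild with ./make.mvapich.udapl):
#   fix_udapl_mpirun /root/mvapich-0.9.7
#   add_provider_export ~/.bashrc ibd0
#   write_machinefile my-machine-file tse41 tse42
#   mpirun -n 2 -machinefile my-machine-file ./cpi
```

`add_provider_export` leaves an existing export untouched, so running the script twice does not stack duplicate lines in .bashrc.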
> > > >
> > > > We have never had a problem with mpirun_rsh before. Please follow
> > > > Matt's suggestion and let us know the result.
> > > >
> > > > Thanks.
> > > > Lei
> > > >
> > > > ----- Original Message -----
> > > > From: Di Domenico, Michael <mdidomenico at silverstorm.com>
> > > > To: lei chai <chai.15 at osu.edu>; mvapich-discuss at cse.ohio-state.edu
> > > > Sent: Monday, April 17, 2006 5:00 PM
> > > > Subject: RE: [mvapich-discuss] Solaris x86
> > > >
> > > > Lei,
> > > >
> > > > Thanks for the reply, but it still doesn't work...
> > > >
> > > > --- first try with mpirun_rsh
> > > >
> > > > bash-3.00# /opt/mvapich/bin/mpirun_rsh -np 2 tse41-ib tse42-ib DAPL_PROVIDER="ibd0" ./cpi
> > > > bash: /opt/mvapich/bin/mpirun_rsh: Invalid argument
> > > >
> > > > --- second try with mpirun (just to see what happens)
> > > >
> > > > bash-3.00# /opt/mvapich/bin/mpirun -np 2 tse41-ib tse42-ib DAPL_PROVIDER="ibd0" ./cpi
> > > > Warning: Command line arguments for program should be given
> > > > after the program name.  Assuming that tse42-ib is a
> > > > command line argument for the program.
> > > > Warning: Command line arguments for program should be given
> > > > after the program name.  Assuming that DAPL_PROVIDER=ibd0 is a
> > > > command line argument for the program.
> > > > Unrecognized argument tse41-ib ignored.
> > > > Cannot find MPIRUN machine file for machine udapl
> > > > and architecture solaris86 .
> > > > (No device specified.)
> > > > bash-3.00#
> > > >
> > > > ________________________________
> > > >
> > > > From: lei chai [chai.15 at osu.edu]
> > > > Sent: Monday, April 17, 2006 4:50 PM
> > > > To: Di Domenico, Michael; mvapich-discuss at cse.ohio-state.edu
> > > > Subject: Re: [mvapich-discuss] Solaris x86
> > > >
> > > > Hi,
> > > >
> > > > Thank you for trying out MVAPICH-0.9.7. Please use mpirun_rsh
> > > > instead of mpirun. To use the uDAPL device, please specify an
> > > > IAname, e.g.
> > > >
> > > > /opt/mvapich/bin/mpirun_rsh -np 2 node1 node2 DAPL_PROVIDER="IAname" ./cpi
> > > >
> > > > The IAname can be found in /etc/dat/dat.conf; it is the first
> > > > field.
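Because the IAname is the first field of each non-comment line in /etc/dat/dat.conf, awk can extract it. The following is a minimal sketch. The function name is hypothetical. Using the first listed entry as DAPL_PROVIDER is an assumption that makes sense only on a node with a single uDAPL provider, such as the one ibd0 line shown elsewhere in this thread.

```shell
# list_ianames [DAT_CONF]
# Hypothetical helper: print the first field (the IAname) of every
# non-blank, non-comment line of dat.conf.
list_ianames() {
    awk 'NF > 0 && $1 !~ /^#/ { print $1 }' "${1:-/etc/dat/dat.conf}"
}

# Usage, assuming the first listed provider is the one you want:
#   DAPL_PROVIDER=$(list_ianames | head -n 1)
#   /opt/mvapich/bin/mpirun_rsh -np 2 node1 node2 DAPL_PROVIDER="$DAPL_PROVIDER" ./cpi
```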
> > > >
> > > > Hope this helps.
> > > >
> > > > Regards,
> > > > Lei
> > > >
> > > > ----- Original Message -----
> > > > From: Di Domenico, Michael <mdidomenico at silverstorm.com>
> > > > To: mvapich-discuss at cse.ohio-state.edu
> > > > Sent: Monday, April 17, 2006 4:06 PM
> > > > Subject: [mvapich-discuss] Solaris x86
> > > >
> > > > I'm trying to get MVAPICH 0.9.7 to compile and run on
> > > > Solaris 10 1/06 x86 using the GNU toolset downloaded from
> > > > sunfreeware.com...
> > > >
> > > > I'm attaching the outputs from ./make.mvapich.udapl.
> > > >
> > > > Everything seems to compile, but I never get an mpirun.udapl
> > > > file... Any clues that I missed in the make outputs?
> > > >
> > > > bash-3.00# cd /opt/mvapich/examples/
> > > > bash-3.00# ls
> > > > cpi        cpi.o      cpip.c      Makefile     MPI-2-C++  README
> > > > cpi.c      cpilog.c   hello++.cc  Makefile.in  mpirun     simpleio.c
> > > > bash-3.00# ./mpirun ./cpi
> > > > Cannot find MPIRUN machine file for machine udapl
> > > > and architecture solaris86 .
> > > > (No device specified.)
> > > > bash-3.00# sh -x ./mpirun ./cpi
> > > > ....output truncated....
> > > > + [ -x /opt/mvapich/bin/mpirun.udapl ]
> > > > + echo Cannot find MPIRUN machine file for machine udapl
> > > > Cannot find MPIRUN machine file for machine udapl
> > > > + echo and architecture solaris86 .
> > > > and architecture solaris86 .
> > > > + [ -n ]
> > > > + echo (No device specified.)
> > > > (No device specified.)
> > > > + exit 1
> > > > bash-3.00# ls /opt/mvapich/bin
> > > > mpiCC         mpiman           mpirun.args       mpirun_dbg.ddd        mpirun_dbg.xxgdb
> > > > mpicc         mpireconfig      mpirun.vapi       mpirun_dbg.gdb        mpirun_rsh
> > > > mpichversion  mpireconfig.dat  mpirun.vapi.args  mpirun_dbg.ladebug    tarch
> > > > mpicxx        mpirun           mpirun_dbg.dbx    mpirun_dbg.totalview  tdevice
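The `sh -x` trace shows what the mpirun wrapper is looking for: an executable `mpirun.<device>` in the install bin directory, here `mpirun.udapl`. The listing, however, contains only `mpirun.vapi`. The function below is a simplified, hypothetical reconstruction of that single check. The real wrapper does more, including the architecture lookup. The function is useful mainly to confirm which device launchers an install actually contains.

```shell
# find_device_mpirun BINDIR DEVICE
# Hypothetical, simplified version of the check seen in the trace:
# succeed and print the path if BINDIR/mpirun.DEVICE is executable.
find_device_mpirun() {
    local bindir=$1 device=$2
    if [ -x "$bindir/mpirun.$device" ]; then
        echo "$bindir/mpirun.$device"
    else
        echo "Cannot find MPIRUN machine file for machine $device" >&2
        return 1
    fi
}

# Usage: on the install listed above, the first call fails; the second
# should find mpirun.vapi, assuming that file is executable.
#   find_device_mpirun /opt/mvapich/bin udapl
#   find_device_mpirun /opt/mvapich/bin vapi
```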
> > > >
> > > >
> > > > ________________________________
> > > >
> > > >
> > > > _______________________________________________
> > > > mvapich-discuss mailing list
> > > > mvapich-discuss at cse.ohio-state.edu
> > > >
> > > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss