[mvapich-discuss] Solaris x86

Di Domenico, Michael mdidomenico at silverstorm.com
Tue Apr 18 15:01:31 EDT 2006


Lei,

I'll look into remote access, though I'm not sure it's possible... But if
you're testing Solaris 10 x86 in your lab, then I'll see if I can debug
it a little more here and see what I may be missing.

For reference, I used Solaris 10 1/06 x86.
I did a Full OEM install of the O/S from the DVD,
then installed these packages from Sunfreeware.com:
application SMCautoc                         autoconf
application SMCautom                         automake
application SMCbison                         bison
application SMCflex                          flex
application SMCgcc                           gcc
application SMCgzip                          gzip
application SMCiconv                         libiconv
application SMCmake                          make
application SMCperl                          perl
application SMCrsync                         rsync
application SMCtar                           tar

This is all I've done to the machine, other than bring up IPoIB using
ifconfig ibd0 plumb
ifconfig ibd0 inet 192.168.101.41 netmask + up
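
One way to confirm that the interface really came up is to look at the flag list that ifconfig prints on its first line. The following is only a minimal sketch: the function name iface_is_up is hypothetical, and it assumes the flags appear between "<" and ">" as in the ifconfig output quoted further down in this thread.

```shell
# Hypothetical helper: succeeds only if an ifconfig status line lists
# both the UP and RUNNING flags inside its <...> flag list.
iface_is_up() {
    line=$1
    flags=${line#*<}      # drop everything up to the first '<'
    flags=${flags%%>*}    # drop everything from the first '>' onward
    case ",$flags," in
        *,UP,*) ;;
        *) return 1 ;;
    esac
    case ",$flags," in
        *,RUNNING,*) return 0 ;;
        *) return 1 ;;
    esac
}

# On the node itself (assumes the interface is named ibd0):
#   iface_is_up "$(ifconfig ibd0 | head -1)" && echo "ibd0 is up"
```

This only checks the reported flags; it does not prove that uDAPL can open the adapter.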

Otherwise, I've followed the procedures for compiling mvapich using the
make.mvapich.udapl scripts...

Thanks again for your help...


-----Original Message-----
From: LEI CHAI [mailto:chai.15 at osu.edu] 
Sent: Tuesday, April 18, 2006 2:46 PM
To: Di Domenico, Michael
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: RE: RE: RE: RE: [mvapich-discuss] Solaris x86

Michael,

I think we may be missing something very small here, but since it works
smoothly on our cluster, it is tough to just guess what might be wrong.
If you could provide us an account on your cluster, it will be easier
for us to find the problem.

Thanks.
Lei


----- Original Message -----
From: "Di Domenico, Michael" <mdidomenico at silverstorm.com>
Date: Tuesday, April 18, 2006 1:58 pm
Subject: RE: RE: RE: RE: [mvapich-discuss] Solaris x86

> Lei,
> 
> I appreciate your help on this... thanks
> 
> bash-3.00# pwd
> /root
> 
> bash-3.00# ls /root
> mvapich-0.9.7         mvapich-0.9.7.tar.gz  print.pl
> 
> bash-3.00# ls /root/mvapich-0.9.7
> a.out                        install-mine.log             mpich.static.dsw
> acconfig.h                   installtest                  mpichconf.h
> aclocal.m4                   lib                          mpichconf.h.in
> aclocal_tcl.m4               LICENSE.TXT                  mpichversion.o
> bin                          make-mine.log                mpid
> buildmsg                     make.mvapich.def             multirail.mpd.sh
> ccbugs                       make.mvapich.gen2            mvapich.mpd.sh
> config-mine.log              make.mvapich.gen2_multirail  osu_benchmarks
> config.log                   make.mvapich.tcp             README
> config.status                make.mvapich.udapl           README_MPICH
> configure                    make.mvapich.vapi            romio
> configure.in                 make.mvapich.vapi_multirail  sbin
> COPYRIGHT                    Makefile                     share
> COPYRIGHT_MVAPICH            Makefile.in                  src
> doc                          makelinks                    util
> etc                          man                          www
> examples                     mpe                          www.index
> f90modules                   MPI-2-C++
> include                      mpich.dsw
> 
> bash-3.00# /root/mvapich-0.9.7/bin/mpirun_rsh -np 2 tse41 tse42
> DAPL_PROVIDER=ibd0 /opt/mvapich/examples/cpi
> /usr/bin/env: No such file or directory
> /usr/bin/env: No such file or directory
> 
> Changing the command line to
> 
> bash-3.00# /root/mvapich-0.9.7/bin/mpirun -np 2 -machinefile
> /opt/mvapich/share/machines.udapl /opt/mvapich/examples/cpi
> [0] Abort: cannot open IA at line 214 in file viainit.c
> mpirun: executable version 0 does not match our version 3.
> done.
> 
> Applying the patch provided...
> 
> bash-3.00# /root/mvapich-0.9.7/bin/mpirun_rsh -np 2 -hostfile
> /opt/mvapich/share/machines.udapl /opt/mvapich/examples/cpi 
> /usr/bin/env: No such file or directory
> sh: /root/mvapich-0.9.7: does not exist
> 
> -----Original Message-----
> From: LEI CHAI [chai.15 at osu.edu] 
> Sent: Tuesday, April 18, 2006 1:20 PM
> To: Di Domenico, Michael
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: Re: RE: RE: RE: [mvapich-discuss] Solaris x86
> 
> Michael,
> 
> We also have Solaris/X86 in our lab, and we have tested MVAPICH on
> Solaris and didn't have this problem. For the time being, could you
> use $COMPILE_PATH/bin/mpirun_rsh instead of $INSTALL/bin/mpirun_rsh?
> $COMPILE_PATH is the directory of MVAPICH source code:
> 
> $COMPILE_PATH/bin/mpirun_rsh -np 2 node1 node2 DAPL_PROVIDER=ibd0 ./cpi
> 
> If you still see the "cannot open IA" problem, could you apply the
> patch below and let us know the output? The patch is just to print
> out the IAname.
> 
> Thanks.
> Lei
> 
> -------------------------------------------------
> --- viainit.c.orig      Tue Apr 18 13:04:10 2006
> +++ viainit.c.new       Tue Apr 18 13:05:26 2006
> @@ -211,7 +211,7 @@
>                        &async_evd_handle, &viadev.nic);
>     if (ret != DAT_SUCCESS)
>       {
> -          udapl_error_abort (GEN_EXIT_ERR, "cannot open IA");
> +          udapl_error_abort (GEN_EXIT_ERR, "cannot open IA: %s", dapl_provider);
>       }
> 
>     viadev.maxtransfersize = viadev_max_rdma_size;
> 
> -----------------------------------------------
> 
> ----- Original Message -----
> From: "Di Domenico, Michael" <mdidomenico at silverstorm.com>
> Date: Tuesday, April 18, 2006 12:29 pm
> Subject: RE: RE: RE: [mvapich-discuss] Solaris x86
> 
> > Lei,
> > 
> > Something must not be moved correctly during the install process of
> > the make script, and it is corrupting the executable... More than
> > likely, I would personally suspect that a tool you're using to move
> > the files behaves differently on Solaris than it does on Linux...
> > 
> > I've also added DAPL_PROVIDER to the ~/.bashrc and ~/.profile
> > files. If I ssh from one machine to another it does get set, as
> > evidenced by echo $DAPL_PROVIDER...
> > 
> > 
> > ...output truncated....
> > installed MPICH in /opt/mvapich
> > /opt/mvapich/sbin/mpiuninstall may be used to remove the installation.
> > Congratulations on successfully building MVAPICH. Please send your
> > feedback to mvapich-help at cse.ohio-state.edu.
> > bash-3.00# /opt/mvapich/bin/mpirun_rsh
> > bash: /opt/mvapich/bin/mpirun_rsh: Invalid argument
> > bash-3.00# file /opt/mvapich/bin/mpirun_rsh
> > can't read ELF header
> > /opt/mvapich/bin/mpirun_rsh:
> > bash-3.00#
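
If the install step is suspected of damaging the binary, comparing the installed file against the copy left in the build tree narrows it down quickly. This is a minimal sketch, not part of the MVAPICH scripts: the function name same_file is hypothetical, and the example paths are assumptions taken from the directories mentioned in this thread.

```shell
# Hypothetical helper: report whether an installed file is byte-identical
# to the copy left in the build tree.
same_file() {
    built=$1
    installed=$2
    # An absent or zero-length installed file is reported separately.
    if [ ! -s "$installed" ]; then
        echo "missing or empty"
        return 2
    fi
    if cmp -s "$built" "$installed"; then
        echo "identical"
    else
        echo "differs"
        return 1
    fi
}

# Example (paths are assumptions based on this thread):
#   same_file /root/mvapich-0.9.7/bin/mpirun_rsh /opt/mvapich/bin/mpirun_rsh
```

A result of "differs" or "missing or empty" would point at the copy step rather than at the compiler or linker.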
> > 
> > -----Original Message-----
> > From: LEI CHAI [chai.15 at osu.edu] 
> > Sent: Tuesday, April 18, 2006 12:01 PM
> > To: Di Domenico, Michael
> > Cc: mvapich-discuss at cse.ohio-state.edu
> > Subject: Re: RE: RE: [mvapich-discuss] Solaris x86
> > 
> > Michael,
> > 
> > One small thing: please make sure to export DAPL_PROVIDER in the
> > .bashrc file instead of exporting it in the current shell. Exporting
> > it in the current shell does not help, and we are taking a look at it.
> > 
> > Also, we do not understand why you need to copy mpirun_rsh from
> > mpid/udapl/process. If you run mvapich-0.9.7/make.mvapich.udapl to
> > rebuild mvapich, mpirun_rsh should be generated automatically in
> > your $INSTALL/bin directory. Could you just run
> > $INSTALL/bin/mpirun_rsh without any argument and let us know the
> > result?
> > 
> > Thanks.
> > Lei
> > 
> > 
> > ----- Original Message -----
> > From: "Di Domenico, Michael" <mdidomenico at silverstorm.com>
> > Date: Tuesday, April 18, 2006 11:20 am
> > Subject: RE: RE: [mvapich-discuss] Solaris x86
> > 
> > > Lei,
> > > 
> > > I had a feeling you were going to say that... See my outputs
> > > below. The IB card is definitely up, it's detected successfully by
> > > the kernel, and I can run mvapich using IP over IB with no issues...
> > > 
> > > bash-3.00# tail /etc/dat/dat.conf
> > > ....output truncated....
> > > # IAname version threadsafe default library-path provider-version \
> > > #       instance-data platform-information
> > > #
> > > ibd0  u1.2  nonthreadsafe  default  udapl_tavor.so.1  SUNW.1.0  " "  "driver_name=tavor"
> > > 
> > > bash-3.00# ifconfig ibd0
> > > ibd0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
> > >        inet 192.168.101.41 netmask ffffff00 broadcast 192.168.101.255
> > >        ipib 0:0:4:6:fe:80:0:0:0:0:0:0:0:6:6a:0:a0:0:3:a1
> > > 
> > > bash-3.00# ping 192.168.101.41
> > > 192.168.101.41 is alive (tse41-ib)
> > > bash-3.00# ping 192.168.101.42
> > > 192.168.101.42 is alive (tse42-ib)
> > > 
> > > bash-3.00# echo $DAPL_PROVIDER
> > > ibd0
> > > bash-3.00#
> > > 
> > > -----Original Message-----
> > > From: LEI CHAI [chai.15 at osu.edu] 
> > > Sent: Tuesday, April 18, 2006 11:13 AM
> > > To: Di Domenico, Michael
> > > Cc: mvapich-discuss at cse.ohio-state.edu
> > > Subject: Re: RE: [mvapich-discuss] Solaris x86
> > > 
> > > Michael,
> > > 
> > > There are several possible reasons that you see this error:
> > > 
> > > 1. There is no valid entry in /etc/dat/dat.conf
> > > 
> > > 2. There is no "export DAPL_PROVIDER=ibd0" in your .bashrc file, or
> > > "source ~/.bashrc" was not done if you were already in the shell.
> > > 
> > > 3. InfiniBand on the node is not working properly.
> > > 
> > > I guess you have taken care of 1 and 2. For 3, could you do an
> > > "ifconfig ibd0" and let us know the output?
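
Checks 1 and 2 above can be scripted so they are repeatable on every node. This is a minimal sketch under stated assumptions: the function names ia_in_datconf and check_dapl_env are hypothetical, the IA name is taken to be the first whitespace-separated field of a non-comment line in dat.conf, and the default path is /etc/dat/dat.conf as used in this thread.

```shell
# Hypothetical helper: succeeds if NAME is the first field of a
# non-comment line in a dat.conf-style file.
ia_in_datconf() {
    name=$1
    conf=${2:-/etc/dat/dat.conf}
    while read -r first rest; do
        case $first in
            '#'*|'') continue ;;    # skip comments and blank lines
        esac
        [ "$first" = "$name" ] && return 0
    done < "$conf"
    return 1
}

# Hypothetical helper: check that DAPL_PROVIDER is set and names an
# entry in the given dat.conf.
check_dapl_env() {
    conf=${1:-/etc/dat/dat.conf}
    if [ -z "${DAPL_PROVIDER:-}" ]; then
        echo "DAPL_PROVIDER is not set"
        return 1
    fi
    if ! ia_in_datconf "$DAPL_PROVIDER" "$conf"; then
        echo "no entry for $DAPL_PROVIDER in $conf"
        return 1
    fi
    echo "ok: $DAPL_PROVIDER"
}
```

Running check_dapl_env through a non-interactive remote shell on each node shows whether the variable actually reaches the environment that the launcher uses, which is what check 2 is about.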
> > > 
> > > Thanks.
> > > Lei
> > > 
> > > 
> > > ----- Original Message -----
> > > From: "Di Domenico, Michael" <mdidomenico at silverstorm.com>
> > > Date: Tuesday, April 18, 2006 9:24 am
> > > Subject: RE: [mvapich-discuss] Solaris x86
> > > 
> > > > Lei,
> > > > 
> > > > 
> > > > 
> > > > I copied mpirun_rsh from the mpid/udapl/process directory, which
> > > > seems to be a valid executable, and now I get
> > > > 
> > > > 
> > > > 
> > > > bash-3.00# ./mpirun -hostfile ../share/machines.udapl ./cpi
> > > > 
> > > > [0] Abort: cannot open IA at line 214 in file viainit.c
> > > > 
> > > > mpirun: executable version 0 does not match our version 3.
> > > > 
> > > > done.
> > > > 
> > > > 
> > > > 
> > > > ________________________________
> > > > 
> > > > From: lei chai [chai.15 at osu.edu] 
> > > > Sent: Monday, April 17, 2006 10:14 PM
> > > > To: Di Domenico, Michael; mvapich-discuss at cse.ohio-state.edu
> > > > Subject: Re: [mvapich-discuss] Solaris x86
> > > > 
> > > > 
> > > > 
> > > > Michael,
> > > > 
> > > > 
> > > > 
> > > > Thanks for reporting the mpirun problem. We have now fixed it.
> > > > Please go to your mvapich-0.9.7/mpid/udapl directory and rename
> > > > the files mpirun.vapi.args and mpirun.vapi.in to
> > > > mpirun.udapl.args and mpirun.udapl.in. Then replace "vapi" in
> > > > mvapich-0.9.7/mpid/udapl/mpirun.lst with "udapl". You also need
> > > > to add "export DAPL_PROVIDER=ibd0" to your .bashrc file. After
> > > > rebuilding, you can run a program:
> > > > 
> > > > 
> > > > 
> > > > mpirun -n 2 -machinefile my-machine-file ./cpi
> > > > 
> > > > 
> > > > 
> > > > where my-machine-file contains host names.
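
The rename and the mpirun.lst edit described above can be scripted roughly as follows. This is a sketch, not the official fix: the function name fix_udapl_mpirun is hypothetical, and it assumes that "vapi" occurs in mpirun.lst only as part of the mpirun.vapi.* file names, so the pattern is deliberately narrow and will not touch strings such as "mvapich".

```shell
# Hypothetical helper: apply the rename and the mpirun.lst edit inside
# DIR (for example mvapich-0.9.7/mpid/udapl).
fix_udapl_mpirun() {
    dir=$1
    # Rename the two template files if they are still under the old name.
    if [ -f "$dir/mpirun.vapi.args" ]; then
        mv "$dir/mpirun.vapi.args" "$dir/mpirun.udapl.args" || return 1
    fi
    if [ -f "$dir/mpirun.vapi.in" ]; then
        mv "$dir/mpirun.vapi.in" "$dir/mpirun.udapl.in" || return 1
    fi
    # Rewrite mpirun.lst through a temporary file, so no in-place sed
    # option is required. Assumption: only mpirun.vapi.* names need changing.
    sed 's/mpirun\.vapi/mpirun.udapl/g' "$dir/mpirun.lst" > "$dir/mpirun.lst.new" &&
        mv "$dir/mpirun.lst.new" "$dir/mpirun.lst"
}

# Example:
#   fix_udapl_mpirun mvapich-0.9.7/mpid/udapl
```

The function is safe to run twice, since the renames are skipped once the old names are gone. The rebuild and the DAPL_PROVIDER export still have to be done separately.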
> > > > 
> > > > 
> > > > 
> > > > We have never had a problem with mpirun_rsh before. Please follow
> > > > Matt's suggestion and let us know the result.
> > > > 
> > > > 
> > > > 
> > > > Thanks.
> > > > 
> > > > Lei
> > > > 
> > > > 
> > > > 
> > > > 	----- Original Message ----- 
> > > > 
> > > > 	From: Di Domenico, Michael <mdidomenico at silverstorm.com>
> > > > 
> > > > 	To: lei chai <chai.15 at osu.edu>; mvapich-discuss at cse.ohio-state.edu
> > > > 
> > > > 	Sent: Monday, April 17, 2006 5:00 PM
> > > > 
> > > > 	Subject: RE: [mvapich-discuss] Solaris x86
> > > > 
> > > >         
> > > > 
> > > > 	Lei,
> > > > 
> > > >         
> > > > 
> > > > 	Thanks for the reply, but it still doesn't work...
> > > > 
> > > >         
> > > > 
> > > > 	--- first try with mpirun_rsh 
> > > > 
> > > >         
> > > > 
> > > > 	bash-3.00# /opt/mvapich/bin/mpirun_rsh -np 2 tse41-ib tse42-ib DAPL_PROVIDER="ibd0" ./cpi
> > > > 
> > > > 	bash: /opt/mvapich/bin/mpirun_rsh: Invalid argument
> > > > 
> > > >         
> > > > 
> > > > 	--- second try with mpirun (just to see what happens)
> > > > 
> > > >         
> > > > 
> > > > 	bash-3.00# /opt/mvapich/bin/mpirun -np 2 tse41-ib tse42-ib DAPL_PROVIDER="ibd0" ./cpi
> > > > 
> > > > 	Warning: Command line arguments for program should be given
> > > > 
> > > > 	after the program name.  Assuming that tse42-ib is a
> > > > 
> > > > 	command line argument for the program.
> > > > 
> > > > 	Warning: Command line arguments for program should be given
> > > > 
> > > > 	after the program name.  Assuming that DAPL_PROVIDER=ibd0 is a
> > > > 
> > > > 	command line argument for the program.
> > > > 
> > > > 	Unrecognized argument tse41-ib ignored.
> > > > 
> > > > 	Cannot find MPIRUN machine file for machine udapl
> > > > 
> > > > 	and architecture solaris86 .
> > > > 
> > > > 	(No device specified.)
> > > > 
> > > > 	bash-3.00#
> > > > 
> > > >         
> > > > 
> > > > 	
> > > > ________________________________
> > > > 
> > > > 
> > > > 	From: lei chai [chai.15 at osu.edu] 
> > > > 	Sent: Monday, April 17, 2006 4:50 PM
> > > > 	To: Di Domenico, Michael; mvapich-discuss at cse.ohio-state.edu
> > > > 	Subject: Re: [mvapich-discuss] Solaris x86
> > > > 
> > > >         
> > > > 
> > > > 	Hi,
> > > > 
> > > >         
> > > > 
> > > > 	Thank you for trying out MVAPICH-0.9.7. Please use mpirun_rsh
> > > > 	instead of mpirun. To use the uDAPL device, please specify an
> > > > 	IAname, e.g.
> > > > 
> > > >         
> > > > 
> > > > 	/opt/mvapich/bin/mpirun_rsh -np 2 node1 node2 DAPL_PROVIDER="IAname" ./cpi
> > > > 
> > > >         
> > > > 
> > > > 	The IAname can be found in /etc/dat/dat.conf; it is the first
> > > > 	field.
> > > > 
> > > >         
> > > > 
> > > > 	Hope this helps.
> > > > 
> > > >         
> > > > 
> > > > 	Regards,
> > > > 
> > > > 	Lei
> > > > 
> > > >         
> > > > 
> > > >        	----- Original Message ----- 
> > > > 
> > > >        	From: Di Domenico, Michael <mdidomenico at silverstorm.com>
> > > > 
> > > >        	To: mvapich-discuss at cse.ohio-state.edu 
> > > > 
> > > >        	Sent: Monday, April 17, 2006 4:06 PM
> > > > 
> > > >        	Subject: [mvapich-discuss] Solaris x86
> > > > 
> > > >                 
> > > > 
> > > >        	I'm trying to get MVAPICH 0.9.7 to compile and run on
> > > >        	Solaris 10 1/06 x86 using the GNU toolset downloaded from
> > > >        	sunfreeware.com...
> > > > 
> > > >                 
> > > > 
> > > >        	I'm attaching the outputs from ./make.mvapich.udapl.
> > > > 
> > > >                 
> > > > 
> > > >        	Everything seems to compile, but I don't ever seem to
> > > >        	get an mpirun.udapl file... Any clues that I missed from the
> > > >        	make outputs?
> > > > 
> > > >                 
> > > > 
> > > >        	bash-3.00# cd /opt/mvapich/examples/
> > > > 
> > > >        	bash-3.00# ls
> > > > 
> > > >        	cpi          cpi.o        cpip.c       Makefile     MPI-2-C++    README
> > > > 
> > > >        	cpi.c        cpilog.c     hello++.cc   Makefile.in  mpirun       simpleio.c
> > > > 
> > > >        	bash-3.00# ./mpirun ./cpi
> > > > 
> > > >        	Cannot find MPIRUN machine file for machine udapl
> > > > 
> > > >        	and architecture solaris86 .
> > > > 
> > > >        	(No device specified.)
> > > > 
> > > >        	bash-3.00# sh -x ./mpirun ./cpi
> > > > 
> > > >        	....output truncated....
> > > > 
> > > >        	+ [ -x /opt/mvapich/bin/mpirun.udapl ] 
> > > > 
> > > >        	+ echo Cannot find MPIRUN machine file for machine udapl
> > > > 
> > > > 
> > > >        	Cannot find MPIRUN machine file for machine udapl
> > > > 
> > > >        	+ echo and architecture solaris86 . 
> > > > 
> > > >        	and architecture solaris86 .
> > > > 
> > > >        	+ [ -n  ] 
> > > > 
> > > >        	+ echo (No device specified.) 
> > > > 
> > > >        	(No device specified.)
> > > > 
> > > >        	+ exit 1 
> > > > 
> > > >        	bash-3.00# ls /opt/mvapich/bin
> > > > 
> > > >        	mpiCC                 mpiman                mpirun.args           mpirun_dbg.ddd        mpirun_dbg.xxgdb
> > > > 
> > > >        	mpicc                 mpireconfig           mpirun.vapi           mpirun_dbg.gdb        mpirun_rsh
> > > > 
> > > >        	mpichversion          mpireconfig.dat       mpirun.vapi.args      mpirun_dbg.ladebug    tarch
> > > > 
> > > >        	mpicxx                mpirun                mpirun_dbg.dbx        mpirun_dbg.totalview  tdevice
> > > > 
> > > >        	
> > > > ________________________________
> > > > 
> > > > 
> > > >        	_______________________________________________
> > > >        	mvapich-discuss mailing list
> > > >        	mvapich-discuss at cse.ohio-state.edu
> > > > 	
> > > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> > > > 
> > > > 
> > > 
> > 
> > 
> 
> 



