[mvapich-discuss] No execution with mvapich-gen2

Owen Stampflee ostampflee at terrasoftsolutions.com
Wed Mar 29 00:32:49 EST 2006


Yeah, same problem w/0.9.7 as we were having previously...

I've set ulimit -l to be unlimited, ibv_*_pingpong tests work...
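
As a minimal sketch, the limit could also be checked on every node through a
non-interactive ssh session, in the same way as the `ssh node1 ulimit -l` check
quoted below. The helper names, the host arguments, and the MIN_KB threshold
are assumptions made up for illustration; they are not part of MVAPICH or
OpenIB.

```shell
#!/bin/sh
# Sketch only: MIN_KB and the host names passed in are illustrative assumptions.
MIN_KB=${MIN_KB:-65536}   # hypothetical minimum, in KB as reported by `ulimit -l`

# Succeeds if the given `ulimit -l` value is "unlimited" or at least MIN_KB.
memlock_ok() {
    case "$1" in
        unlimited)    return 0 ;;
        ''|*[!0-9]*)  return 1 ;;   # empty or non-numeric output
        *)            [ "$1" -ge "$MIN_KB" ] ;;
    esac
}

# Print one status line per host, reading the limit in a non-interactive ssh session.
check_hosts() {
    for host in "$@"; do
        limit=$(ssh "$host" 'ulimit -l' 2>/dev/null)
        if memlock_ok "$limit"; then
            echo "$host: ok (memlock=$limit)"
        else
            echo "$host: too low or unreachable (memlock=${limit:-none})"
        fi
    done
}
```

After sourcing the file, something like `check_hosts 192.168.111.248` would
report whether that node's remote shells inherit the raised limit.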

[root@node-192-168-111-248 ~]# mpirun_rsh -np 2 192.168.111.248 192.168.111.248 /home/cpi
mpirun: executable version 0 does not match our version 3.
done.

but it works with a single process on the same node:
[root@node-192-168-111-248 ~]# mpirun_rsh -np 1 192.168.111.248 /home/cpi
Process 0 on node-192-168-111-248
pi is approximately 3.1416009869231254, Error is 0.0000083333333323
wall clock time = 0.000260

Here is the strace -f output of the mpirun_rsh run that doesn't execute properly:
http://cvs.terraplex.com/~owen/mpirun_rsh.strace

Cheers,
Owen

On Tue, 2006-03-21 at 21:11 -0500, Abhinav Vishnu wrote:
> Hi Owen,
> 
> > Should 0.9.7 work properly with OpenIB 1.0rc1?
> 
> There should be no problems running 0.9.7 with OpenIB 1.0rc1.
> Please let us know if you face any problems.
> 
> Thanks,
> 
> -- Abhinav
> >
> > Thanks,
> > Owen
> >
> > On Tue, 2006-03-21 at 18:10 -0500, Abhinav Vishnu wrote:
> > > Owen,
> > >
> > > Thanks for your mail.
> > >
> > > We are looking forward to your experience with 0.9.7.
> > > Please keep us posted.
> > >
> > > Thanks,
> > >
> > > -- Abhinav
> > >
> > > -------------------------------
> > > Abhinav Vishnu,
> > > Graduate Research Associate,
> > > Department Of Comp. Sc. & Engg.
> > > The Ohio State University.
> > > -------------------------------
> > >
> > > On Tue, 21 Mar 2006, Owen Stampflee wrote:
> > >
> > > > Sorry for the lengthy delay, here's where I'm at currently:
> > > >
> > > > On Tue, 2006-02-21 at 23:14 -0500, Sayantan Sur wrote:
> > > > > Owen,
> > > > >
> > > > > Thanks for trying out mvapich-gen2. Sorry to know about your problems.
> > > > > Hopefully, we can resolve this issue quickly.
> > > > >
> > > > > > I get the following output running the cpi example...
> > > > > >
> > > > > > > [root@m2 examples]# mpirun_rsh -np 1 localhost ./cpi
> > > > > > mpirun: executable version 0 does not match our version 2.
> > > > > > done.
> > > > >
> > > > > Could you tell us what happens if you do:
> > > > >
> > > > > $ mpirun_rsh -np 1 m2 ./cpi
> > > > Same result, the application doesn't run.
> > > >
> > > > > > > I'm using openib svn5411, and mvapich-gen2-1.0 with patches 101, 104, 105,
> > > > > > > and 106. Oddly enough, even with the recent 5411 version of openib, patch 103
> > > > > > > (CQ creation) doesn't compile.
> > > > >
> > > > > If you've been following the OpenIB mailing list, then you must be aware
> > > > > of this. Sometime last October, the ibv_create_cq (which is the Gen2
> > > > > interface) verb arguments changed. To work around this interface change,
> > > > > we introduced patch #103 which uses the new verb by DEFAULT.
> > > > >
> > > > > So, if you are at patch level 106, then you do NOT need to specify
> > > > > -DGEN2_OLD_CQ_VERB. Just using the default mvapich.make.gcc should be
> > > > > enough.
> > > > >
> > > > > Just for clarification, can you send us the compilation failure you get
> > > > > with patch #106? Also, if you just download the integrated tarball from
> > > > > the mvapich-gen2 download page (instead of applying all patches by
> > > > > hand), do you still get the same results?
> > > > I'll follow this up in a second email about my mvapich-0.9.7 compile results.
> > > >
> > > > > > I'm building everything in 32-bit mode, and using -D_IA32_ (those look
> > > > > > fairly sane but I could have missed something).
> > > > > >
> > > > > > All the OpenIB pingpong tests are fine, I'm really quite stumped on
> > > > > > where to go from here.
> > > > >
> > > > > Gen2 uses the lockable memory limits set by the system administrator. In
> > > > > order to use MVAPICH, you must set this parameter to `unlimited' or to a
> > > > > larger memory size so that MVAPICH is able to register communication
> > > > > > buffers. This is common to all MPI and other higher-level software on
> > > > > top of Gen2.
> > > > >
> > > > > There are three steps to setting up the lockable memory privileges for
> > > > > users:
> > > > >
> > > > > 1) In /etc/security/limits.conf: Add a line
> > > > >
> > > > > *               soft    memlock         unlimited
> > > > >
> > > > > 2) In /etc/init.d/sshd: Add a line
> > > > >
> > > > > ulimit -l unlimited
> > > > >
> > > > > 3) Restart sshd
> > > > >
> > > > > /etc/init.d/sshd restart
> > > > >
> > > > > All subsequent SSH sessions by users should have this new lockable
> > > > > memory limit set. To verify this, you can do:
> > > > >
> > > > > $ ssh node1 ulimit -l
> > > > >
> > > > > If this shows unlimited, then the setup was OK.
> > > > >
> > > > > Please let us know if this was able to resolve your problems.
> > > > Tried this as well, same result: the application doesn't run. I'm currently
> > > > attempting to get OpenIB 1.0rc1 running, as well as mvapich-0.9.7.
> > > >
> > > > Cheers,
> > > > Owen
> > > >
> > > > _______________________________________________
> > > > mvapich-discuss mailing list
> > > > mvapich-discuss at cse.ohio-state.edu
> > > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> > > >
> > >
> > >
> > > 
> >
> 
> 