[mvapich-discuss] Can't run jobs on multiple nodes

Jonathan Perkins perkinjo at cse.ohio-state.edu
Thu Aug 16 13:22:22 EDT 2012


It's good that the basic benchmark is working and your locked memory
limit isn't causing any problems when you're using an interactive shell.

There is one caveat here: it's possible that the locked memory limit is
different when using the batch scheduling system.  Can you schedule a
batch job that simply runs ulimit -l on each node?  You should also try
running osu_latency through the batch system.
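
For example, a minimal submit job along these lines should show what the
processes actually see under the scheduler.  This is only a sketch: it reuses
the MPI_HOME, queue, and PE from your submit script, and assumes osu_latency
is in the job's working directory.

#!/bin/bash
#$ -N limit_check
#$ -q Ltest.q
#$ -pe mvapich2_test 48
#$ -cwd

MPI_HOME=/share/apps/mvapich2/1.8/intel_Composer_XE_12.2.137/bin

# small helper that reports the locked memory limit on whichever node it runs
cat > check_ulimit.sh << 'EOF'
#!/bin/bash
echo "$(hostname): $(ulimit -l)"
EOF
chmod +x check_ulimit.sh

# same launcher your LAMMPS job uses, so limits are inherited the same way
$MPI_HOME/mpiexec -n $NSLOTS ./check_ulimit.sh

# latency test between the two nodes, launched from inside the batch job
$MPI_HOME/mpirun_rsh -np 2 compute-0-3 compute-0-4 ./osu_latency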

Another thing that you can try is to run your original application with
mpirun_rsh directly and see if you get different results.
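
For example, from an interactive shell (again only a sketch, reusing the
MPI_HOME and input file from your submit script; a small run that still spans
both nodes is enough to see whether the launcher makes a difference):

MPI_HOME=/share/apps/mvapich2/1.8/intel_Composer_XE_12.2.137/bin
$MPI_HOME/mpirun_rsh -np 4 compute-0-3 compute-0-3 compute-0-4 compute-0-4 \
    lammps-20Aug12/src/lmp_linux < lammps-20Aug12/examples/crack/in.crack > out.crack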

You normally do not need to uninstall MVAPICH2 when installing an
additional build.  However, if the first installation is in the user's
default PATH, it could cause problems when they try to use the
other install.

I suggest the following types of installs...

## Production build ##
--prefix=/usr/local/mvapich2-<version>/production

## Debug build ##
--prefix=/usr/local/mvapich2-<version>/debug --enable-g=dbg --disable-fast

This way you can have both types of builds available at once.
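
For example, roughly (only a sketch: the tarball name is a placeholder, and
any other configure options from your original build should be added back in):

tar xzf mvapich2-1.8.tgz && cd mvapich2-1.8
./configure --prefix=/usr/local/mvapich2-1.8/debug --enable-g=dbg --disable-fast
make && make install

# select a build at run time by putting its bin directory first in PATH
export PATH=/usr/local/mvapich2-1.8/debug/bin:$PATH
which mpirun_rsh    # should now point at the debug build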

On Thu, Aug 16, 2012 at 11:54:04AM -0500, Xing Wang wrote:
> Hi Jonathan
> 
> Thanks so much for the quick reply and kind help! Here is what I found:
> 
> 
> 1. I tested osu_latency between compute-0-3 and compute-0-4 and it seems OK. Detailed results:
> [testuser@*** osu-micro-benchmarks]$ mpirun_rsh -np 2 compute-0-3 compute-0-4 ./osu_latency
> # OSU MPI Latency Test v3.6
> # Size Latency (us)
> 0 1.27
> 1 1.36
> 2 1.41
> 4 1.33
> 8 1.30
> 16 1.29
> 32 1.31
> 64 1.34
> 128 1.49
> 256 2.16
> 512 2.37
> 1024 2.81
> 2048 3.70
> 4096 4.44
> 8192 6.29
> 16384 10.44
> 32768 14.72
> 65536 23.38
> 131072 40.67
> 262144 75.96
> 524288 145.59
> 1048576 288.11
> 2097152 565.41
> 4194304 1124.86
> 
> 
> 2. I typed ulimit -l and ulimit -a on both compute-0-3 and compute-0-4, and here are the results:
> 
> 
> [testuser at compute-0-3 ~]$ ulimit -l
> unlimited
> [testuser at compute-0-3 ~]$ ulimit -a
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 257560
> max locked memory (kbytes, -l) unlimited
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 10240
> cpu time (seconds, -t) unlimited
> max user processes (-u) 257560
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
> 
> 
> 
> The information from the other node (compute-0-4) is exactly the same:
> 
> 
> [testuser at compute-0-4 ~]$ ulimit -l
> unlimited
> [testuser at compute-0-4 ~]$ ulimit -a
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 257560
> max locked memory (kbytes, -l) unlimited
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 10240
> cpu time (seconds, -t) unlimited
> max user processes (-u) 257560
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
> 
> 
> 
> 
> 
> 3. Thanks for the hint to use a debug build. Another silly question (please excuse the newcomer): if I want to have a debug build and use GDB to find out what's going on, do I need to uninstall the current MVAPICH2 and re-run configure, make, and make install? I'm happy to do it, but just want to confirm.
> 
> 
> Thanks so much for the help! Any comments/suggestions/questions are truly appreciated.
> 
> 
> --
> Sincerely, 
> Xing Wang
> 
> Graduate Student 
> Department of Engineering Physics 
> UW-Madison
> 1509 University Ave.
> Madison, WI, 53706 
> 
> 
>  
> 
> 
> 
> On 12/08/16, Jonathan Perkins wrote:
> > Hello, let's try seeing if a simple case works.
> > 
> > Does something basic like osu_latency work between two nodes? What does
> > ulimit -l show when run on the two nodes?
> > 
> > Also, a debug build of mvapich2 should provide more information in this
> > error case. In addition to --enable-g=dbg, I suggest adding
> > --disable-fast.
> > 
> > http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.8.html#x1-1120009.1.10
> > 
> > On Thu, Aug 16, 2012 at 10:12:53AM -0500, Xing Wang wrote:
> > > Hi All,
> > > 
> > > Thanks for reading the email. Currently I'm working on a new 44-node cluster. I guess my question is a silly one, but since I'm new to Linux/MVAPICH2, your help/comments would be very helpful to me and sincerely appreciated.
> > > 
> > > 
> > > Problem situation:
> > > 
> > > 
> > > We want to run LAMMPS (a parallel computing software) on the new cluster. The MPI implementation is MVAPICH2-1.8 and the batch-queuing system is Oracle Grid Engine (GE) 6.2u5. I've set up a queue and assigned 2 compute nodes (compute-0-3 and compute-0-4, each with 24 processors) to it. Before running LAMMPS, I tested MVAPICH2 and Grid Engine by submitting a simple parallel script (free -m, which queries the memory on multiple nodes), and it worked very well. 
> > > 
> > > 
> > > Then I installed and ran LAMMPS as a cluster user. If I run the jobs on multiple processors within a single node, it works very well. However, if I expand the job to two nodes (i.e. I request more than 24 processors in the parallel submit script), it gets stuck and an error message appears as follows:
> > > 
> > > 
> > > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > > [cli_35]: aborting job:
> > > Fatal error in MPI_Init:
> > > Other MPI error
> > > [proxy:0:0 at compute-0-4.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:955): assert (!closed) failed
> > > [proxy:0:0 at compute-0-4.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> > > [proxy:0:0 at compute-0-4.local] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
> > > [mpiexec at compute-0-4.local] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:69): one of the processes terminated badly; aborting
> > > [mpiexec at compute-0-4.local] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> > > [mpiexec at compute-0-4.local] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:191): launcher returned error waiting for completion
> > > [mpiexec at compute-0-4.local] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion 
> > > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > > 
> > > Does anyone have similar experience? Your comments/help/suggestions would be really helpful. 
> > > 
> > > 
> > > Here is more information in case it is needed:
> > > 
> > > 
> > > 
> > > 
> > > 1. The parallel environment (PE) configuration:
> > > pe_name mvapich2_test
> > > slots 9999
> > > user_lists NONE
> > > xuser_lists NONE
> > > start_proc_args /opt/gridengine/mpi/startmpi.sh $pe_hostfile
> > > stop_proc_args NONE
> > > allocation_rule $fill_up
> > > control_slaves TRUE
> > > job_is_first_task FALSE
> > > urgency_slots min
> > > accounting_summary FALSE
> > > 
> > > 
> > > 2. The queue setup:
> > > qname Ltest.q
> > > hostlist @LAMMPShosts
> > > seq_no 0
> > > load_thresholds np_load_avg=3.75
> > > suspend_thresholds NONE
> > > nsuspend 1
> > > suspend_interval 00:05:00
> > > priority 0
> > > min_cpu_interval 00:05:00
> > > processors UNDEFINED
> > > qtype BATCH INTERACTIVE
> > > ckpt_list NONE
> > > pe_list make mpich mpi orte mvapich2_test
> > > rerun FALSE
> > > slots 6,[compute-0-3.local=24],[compute-0-4.local=24]
> > > tmpdir /tmp
> > > shell /bin/bash
> > > prolog NONE
> > > epilog NONE
> > > shell_start_mode posix_compliant
> > > starter_method NONE
> > > suspend_method NONE
> > > resume_method NONE
> > > terminate_method NONE
> > > notify 00:00:60
> > > owner_list NONE
> > > user_lists NONE
> > > xuser_lists NONE
> > > subordinate_list NONE
> > > complex_values NONE
> > > projects NONE
> > > xprojects NONE
> > > calendar NONE
> > > initial_state default
> > > s_rt INFINITY
> > > h_rt INFINITY
> > > s_cpu INFINITY
> > > h_cpu INFINITY
> > > s_fsize INFINITY
> > > h_fsize INFINITY
> > > s_data INFINITY
> > > h_data INFINITY
> > > s_stack INFINITY
> > > h_stack INFINITY
> > > s_core INFINITY
> > > h_core INFINITY
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 3. The host group @LAMMPShosts:
> > > 
> > > 
> > > # qconf -shgrp @LAMMPShosts
> > > group_name @LAMMPShosts
> > > hostlist compute-0-3.local compute-0-4.local
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 4. The submit script:
> > > #!/bin/bash
> > > #$ -N Lammps_test
> > > 
> > > 
> > > # request the queue for this job
> > > # for VASP test, replace <queue_name> with Vtest.q
> > > # for LAMMPS test, replace <queue_name> with Ltest.q
> > > #$ -q Ltest.q
> > > 
> > > 
> > > # request computational resources for this job as follows
> > > # replace <num> below with the number of CPUs for the job
> > > # For Vtest.q, <num>=0~48; for Ltest.q, <num>=0~48
> > > #$ -pe mvapich2_test 36
> > > 
> > > 
> > > # request wall time (max is 96:00:00)
> > > #$ -l h_rt=48:00:00
> > > 
> > > 
> > > # run the job from the directory of submission. Uncomment only if you don't want the defaults.
> > > #$ -cwd
> > > # combine SGE standard output and error files
> > > #$ -o $JOB_NAME.o$JOB_ID
> > > #$ -e $JOB_NAME.e$JOB_ID
> > > # transfer all your environment variables. Uncomment only if you don't want the defaults
> > > #$ -V
> > > 
> > > 
> > > # Use full pathname to make sure we are using the right mpi
> > > MPI_HOME=/share/apps/mvapich2/1.8/intel_Composer_XE_12.2.137/bin
> > > ## $MPI_HOME/mpiexec -n $NSLOTS lammps-20Aug12/src/lmp_linux < in.poly > out.poly
> > > $MPI_HOME/mpiexec -n $NSLOTS lammps-20Aug12/src/lmp_linux < lammps-20Aug12/examples/crack/in.crack > out.crack
> > > 
> > > 
> > > 
> > > --
> > > Sincerely, 
> > > Xing Wang
> > > 
> > > Graduate Student 
> > > Department of Engineering Physics 
> > > UW-Madison
> > > Madison, WI, 53706
> > > _______________________________________________
> > > mvapich-discuss mailing list
> > > mvapich-discuss at cse.ohio-state.edu
> > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> > > 
> > 
> > -- 
> > Jonathan Perkins
> > http://www.cse.ohio-state.edu/~perkinjo
> 

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

