[mvapich-discuss] Re: [mvapich] Announcing the release of MVAPICH2 1.8.1

Xing Wang xwang348 at wisc.edu
Fri Sep 28 23:28:14 EDT 2012


Hi Karl and DK,

Thanks so much for these helpful suggestions! I have some further questions and would sincerely appreciate your help. (Please excuse me if the questions are too silly; I'm new to SGE.)

For Karl
1. I also believe we're running MVAPICH2 with loose integration under SGE, since I just followed the standard three steps ("configure + make + make install") during installation. I would also go for the first option, since using "qdel" is unavoidable in our group and we want "qdel" to clean up all the processes on the corresponding nodes.
So could you give me more details about how to implement this mechanism in the epilogue? Should I add a script to SGE, or change some configuration? If it's convenient, could you give me an example that works in your setup?
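For reference, here is the kind of epilog I have in mind. It is completely untested and just my guess from the sge_conf man page, so please correct me; the binary name "vasp" and the assumption that $PE_HOSTFILE is still visible when the epilog runs are mine:
---------------------------------------------
#!/bin/bash
# Guessed queue epilog for loose integration: log in to every node listed
# in the job's hostfile and kill leftover MPI processes owned by the job
# user. $PE_HOSTFILE and $USER are assumed to be set by SGE at epilog time;
# "vasp" is a placeholder for whatever binary we actually run.
if [ -f "$PE_HOSTFILE" ]; then
    for host in $(awk '{print $1}' "$PE_HOSTFILE" | sort -u); do
        ssh "$host" "pkill -u $USER vasp" < /dev/null
    done
fi
exit 0
---------------------------------------------
If this is roughly the right idea, I suppose I would point the "epilog" entry of Vtest.q (currently NONE, see the queue dump below) at such a script?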

2. Would it help with this problem if I switched to "tight integration"? If so, could you tell me where I can find a guide/manual? I searched online but only found some obsolete pages, e.g.
(http://arc.liv.ac.uk/SGE/howto/mvapich/MVAPICH_Integration.html)
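From what I have read so far, my guess is that tight integration mostly means letting SGE start the slave processes itself (via qrsh -inherit) instead of startmpi.sh, so the PE would look roughly like the sketch below rather than my current one (pasted at the bottom of this mail). This is only my understanding, not something I have tested:
---------------------------------------------
pe_name            mvapich2_tight
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
---------------------------------------------
Is that the right direction, or is more than the PE definition involved?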

For DK
1. We were trying MVAPICH2 1.8, and it's true that with mpiexec.mpirun_rsh, "qdel" cleans up all the processes successfully. However, we ran into a more important problem: we could only run jobs within ONE node. If we ask for more than one node (for example, 48 processors at 24 processors/node), all the processes still land on the first node and the job gets stuck there. Could you give any comments/advice on this? Is it possible that my "pe", "queue", or submit script is wrong? Please let me know which part of the setup you need to see, and I will reply asap. A sketch of the submit script follows below.
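In case it helps, our submit script looks roughly like the sketch below. The mpirun_rsh path and the "vasp" binary name are placeholders for what is actually installed here, and the awk line is my attempt to turn SGE's $PE_HOSTFILE into the one-host-per-slot file that mpirun_rsh expects; maybe that is where my mistake is:
---------------------------------------------
#!/bin/bash
#$ -N vasp_test
#$ -pe mvapich2 48
#$ -q Vtest.q
#$ -cwd

# $PE_HOSTFILE lines look like "hostname slots queue processors";
# expand them into one hostname per slot for mpirun_rsh.
awk '{for (i = 0; i < $2; i++) print $1}' $PE_HOSTFILE > hosts.$JOB_ID

# /opt/mvapich2/bin and ./vasp are placeholders for our real paths.
/opt/mvapich2/bin/mpirun_rsh -np $NSLOTS -hostfile hosts.$JOB_ID ./vasp
---------------------------------------------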

I really appreciate your kind help and suggestions, especially over the weekend.

Cheers,
Xing
--------------------------------------------
[root at turnbull ~]# qconf -sp mvapich2
pe_name mvapich2
slots 9999
user_lists NONE
xuser_lists NONE
start_proc_args /opt/gridengine/mpi/startmpi.sh $pe_hostfile
stop_proc_args NONE
allocation_rule $fill_up
control_slaves FALSE
job_is_first_task TRUE
urgency_slots min
accounting_summary TRUE
---------------------------------------------
[root at turnbull ~]# qconf -sq Vtest.q
qname Vtest.q
hostlist @VASPhosts
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make mpich mpi orte mvapich2
rerun FALSE
slots 1,[compute-1-0.local=24],[compute-1-1.local=24], \
 [compute-1-2.local=24],[compute-1-3.local=24], \
 [compute-1-4.local=24],[compute-1-5.local=24], \
 [compute-1-6.local=24],[compute-1-7.local=24], \
 [compute-1-8.local=24],[compute-1-9.local=24], \
 [compute-1-10.local=24],[compute-1-11.local=24], \
 [compute-1-12.local=24],[compute-1-13.local=24], \
 [compute-1-14.local=24],[compute-1-15.local=24]
tmpdir /tmp
shell /bin/bash
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
---------------------------------------------

On 12/09/28, Dhabaleswar Panda wrote:
> The MVAPICH team is pleased to announce the release of MVAPICH2 1.8.1. This is a bug-fix release compared to MVAPICH2 1.8.
> 
> Bug-fixes in MVAPICH2 1.8.1 (since the MVAPICH2 1.8 GA release) are listed below:
> 
> - Fix issue in intra-node knomial bcast
> - Handle gethostbyname return values gracefully
> - Fix corner case issue in two-level gather code path
> - Fix bug in CUDA events/streams pool management
> - Fix in GPU device pointer detection
>   - Thanks to Brody Huval from Stanford for the report
> - Fix issue in selecting CUDA run-time variables when running on single node in SMP only mode
> - Fix ptmalloc initialization issue when MALLOC_CHECK_ is defined in the environment
>   - Thanks to Mehmet Belgin from Georgia Institute of Technology for the report
> - Fix memory corruption and handle heterogeneous architectures in gather collective
> - Fix issue in detecting the correct HCA type
> - Fix issue in ring start-up to select correct HCA when MV2_IBA_HCA is specified
> - Fix SEGFAULT in MPI_Finalize when IB loop-back is used
> - Fix memory corruption on nodes with 64 cores
>   - Thanks to M Xie for the report
> - Fix hang in MPI_Finalize with Nemesis interface when ptmalloc initialization fails
>   - Thanks to Carson Holt from OICR for the report
> - Fix memory corruption in shared memory communication
>   - Thanks to Craig Tierney from NOAA for the report and testing the patch
> - Fix issue in IB ring start-up selection with mpiexec.hydra
> - Option for selecting non-default gid-index in a loss-less fabric setup in RoCE mode
> - Improved error reporting
> - Option to disable signal handler setup
> 
> Most of these bug-fixes are also available with MVAPICH2 1.9a release. MVAPICH2 users are strongly requested to upgrade their 1.8 installations to 1.8.1 or 1.9a.
> 
> For downloading MVAPICH2 1.8.1, associated user guide, quick start guide, and accessing the SVN, please visit the following URL:
> 
> http://mvapich.cse.ohio-state.edu
> 
> All questions, feedback, bug reports, hints for performance tuning, patches, and enhancements are welcome. Please post them to the mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
> 
> Thanks,
> 
> The MVAPICH Team
> _______________________________________________
> mvapich mailing list
> mvapich at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich

--
Sincerely, 
WANG, Xing

Graduate Student 
Department of Engineering Physics & 
Nuclear Engineering, UW-Madison
Room 137, 1509 University Ave.
Madison, WI, 53706 
(Cell) 608-320-7086


