[mvapich-discuss] Announcing the Release of MVAPICH2 1.8RC1 and OSU Micro-Benchmarks (OMB) 3.5.2

Jens Glaser jglaser at umn.edu
Thu Mar 22 23:27:55 EDT 2012


Great. Thank you for the quick fix! It is working now.

Jens.

On Mar 22, 2012, at 8:10 PM, Devendar Bureddy wrote:

> Hi Jens
> 
> Thank you for letting us know about this issue.  This is a corner case we
> missed in heterogeneous GPU configurations.  The issue should occur only
> when running with 3 processes in your configuration (GPU0 to IOH1,
> GPU1&2 to IOH2).  The attached patch should fix it.  Can you please try
> the patch and let us know if it works for you?
> 
> Please follow the instructions below to apply the patch.
> 
> $ tar xf mvapich2-1.8rc1.tar.gz
> $ cd mvapich2-1.8rc1
> $ patch -p1 < diff.patch
> patching file src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_util.c
> $
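> 
> After the patch is applied, the library needs to be rebuilt and
> reinstalled.  The exact steps depend on your original configure options;
> as a rough sketch (the install prefix below is a placeholder):
> 
> $ ./configure --prefix=<install-prefix> [your other configure options]
> $ make
> $ make install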
> 
> Thanks
> Devendar
> 
> On Thu, Mar 22, 2012 at 3:14 PM, Jens Glaser <jglaser at umn.edu> wrote:
>> Hi,
>> 
>> I am having trouble using the new version of MVAPICH2 with CUDA support.
>> 
>> I am running on a host with 3 GPUs connected to two I/O hubs (GPU0 to IOH1, GPU1&2 to IOH2), and MPI_Init hangs on this system when I run it with mpirun -np 3.
>> 
>> Details:
>> 
>> Configure line:
>> 
>> ./configure --prefix=/nics/d/home/jglaser/mpich2-install --enable-cuda --with-cuda-include=/sw/keeneland/cuda/4.1/linux_binary/include/ --with-cuda-libpath=/sw/keeneland/cuda/4.1/linux_binary/lib64 --enable-shared --with-ib-libpath=/usr/lib64/
>> 
>> Test program:
>> ================
>> #include <mpi.h>
>> #include <cuda_runtime.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> 
>> int main(int argc, char ** argv)
>>    {
>>    /* select the GPU for this rank before MPI_Init, using the local rank
>>       exported by the MVAPICH2 launcher */
>>    cudaSetDevice(atoi(getenv("MV2_COMM_WORLD_LOCAL_RANK")));
>>    printf("before init\n");
>>    MPI_Init(&argc,&argv);
>>    printf("after init\n");
>>    MPI_Finalize();
>>    printf("after finalize\n");
>>    return 0;
>>    }
>> ================
>> 
>> Compile with NVCC and appropriate options (obtained from mpicc -show)
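>> 
>> For example (the paths below are placeholders; the actual include,
>> library, and link flags should be taken from the output of mpicc -show
>> on your system):
>> 
>> $ nvcc -o mpitest mpitest.c -I<mvapich2-install>/include -L<mvapich2-install>/lib -lmpich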
>> 
>> Test program output
>> 
>> mpirun -np 3 ./mpitest
>> before init
>> before init
>> before init
>> Ctrl-C caught... cleaning up processes
>> (it hangs)
>> 
>> It works with two GPUs:
>> mpirun -np 2 ./mpitest
>> before init
>> before init
>> after init
>> after init
>> after finalize
>> after finalize
>> 
>> The previous version of MVAPICH2 (1.8a2) worked without problems.
>> 
>> Any idea?
>> 
>> Thanks,
>> 
>> Jens
>> 
>> On Mar 22, 2012, at 12:21 PM, Dhabaleswar Panda wrote:
>> 
>>> The MVAPICH team is pleased to announce the release of MVAPICH2 1.8RC1
>>> and OSU Micro-Benchmarks (OMB) 3.5.2.
>>> 
>>> Features, Enhancements, and Bug Fixes for MVAPICH2 1.8RC1 are listed
>>> here.
>>> 
>>> * New Features and Enhancements (since 1.8a2):
>>> 
>>>    - New design for intra-node communication from GPU Device buffers
>>>      using CUDA IPC for better performance and correctness
>>>        - Thanks to Joel Scherpelz from NVIDIA for his suggestions
>>>    - Enabled shared memory communication for host transfers when CUDA is
>>>      enabled
>>>    - Optimized and tuned collectives for GPU device buffers
>>>    - Enhanced pipelined inter-node device transfers
>>>    - Enhanced shared memory design for GPU device transfers for
>>>      large messages
>>>    - Enhanced support for CPU binding with socket and numanode level
>>>      granularity
>>>    - Support suspend/resume functionality with mpirun_rsh
>>>    - Exporting local rank, local size, global rank and global size
>>>      through environment variables (both mpirun_rsh and hydra)
>>>    - Update to hwloc v1.4
>>>    - Checkpoint-Restart support in OFA-IB-Nemesis interface
>>>    - Enabling run-through stabilization support to handle process
>>>      failures in OFA-IB-Nemesis interface
>>>    - Enhancing OFA-IB-Nemesis interface to handle IB errors gracefully
>>>    - Performance tuning on various platforms
>>>    - Support for Mellanox IB FDR adapter
>>> 
>>> * Bug Fixes (since 1.8a2):
>>> 
>>>    - Fix a hang issue on InfiniHost SDR/DDR cards
>>>        - Thanks to Nirmal Seenu from Fermilab for the report
>>>    - Fix an issue with runtime parameter MV2_USE_COALESCE usage
>>>    - Fix an issue with LiMIC2 when CUDA is enabled
>>>    - Fix an issue with intra-node communication using datatypes and GPU
>>>      device buffers
>>>    - Fix an issue with Dynamic Process Management when launching
>>>      processes on multiple nodes
>>>        - Thanks to Rutger Hofman from VU Amsterdam for the report
>>>    - Fix build issue in hwloc source with mcmodel=medium flags
>>>        - Thanks to Nirmal Seenu from Fermilab for the report
>>>    - Fix a build issue in hwloc with --disable-shared or
>>>      --disable-static options
>>>    - Use portable stdout and stderr redirection
>>>        - Thanks to Dr. Axel Philipp from MTU Aero Engines for the patch
>>>    - Fix a build issue with PGI 12.2
>>>        - Thanks to Thomas Rothrock from U.S. Army SMDC for the patch
>>>    - Fix an issue with send message queue in OFA-IB-Nemesis interface
>>>    - Fix a process cleanup issue in Hydra when MPI_ABORT is called
>>>      (upstream MPICH2 patch)
>>>    - Fix an issue with non-contiguous datatypes in MPI_Gather
>>>    - Fix a few memory leaks and warnings
>>> 
>>> The bug fix for OSU Micro-Benchmarks (OMB) 3.5.2 is listed here.
>>> 
>>> * Bug Fix (since OMB 3.5.1):
>>>  - Fix a typo which led to the use of incorrect buffers
>>> 
>>> The complete set of features and enhancements for MVAPICH2 1.8RC1 compared
>>> to MVAPICH2 1.7 is as follows:
>>> 
>>> * Features & Enhancements:
>>>    - Support for MPI communication from NVIDIA GPU device memory
>>>        - High performance RDMA-based inter-node point-to-point
>>>          communication (GPU-GPU, GPU-Host and Host-GPU)
>>>        - High performance intra-node point-to-point communication for
>>>          multi-GPU adapters/node (GPU-GPU, GPU-Host and Host-GPU)
>>>        - Taking advantage of CUDA IPC (available in CUDA 4.1) in
>>>          intra-node communication for multiple GPU adapters/node
>>>        - Optimized and tuned collectives for GPU device buffers
>>>        - MPI datatype support for point-to-point and collective
>>>          communication from GPU device buffers
>>>    - Support suspend/resume functionality with mpirun_rsh
>>>    - Enhanced support for CPU binding with socket and numanode level
>>>      granularity
>>>    - Exporting local rank, local size, global rank and global size
>>>      through environment variables (both mpirun_rsh and hydra)
>>>    - Update to hwloc v1.4
>>>    - Checkpoint-Restart support in OFA-IB-Nemesis interface
>>>    - Enabling run-through stabilization support to handle process
>>>      failures in OFA-IB-Nemesis interface
>>>    - Enhancing OFA-IB-Nemesis interface to handle IB errors gracefully
>>>    - Performance tuning on various architecture clusters
>>>    - Support for Mellanox IB FDR adapter
>>>    - Adjust shared-memory communication block size at runtime
>>>    - Enable XRC by default at configure time
>>>    - New shared memory design for enhanced intra-node small message
>>>      performance
>>>    - Tuned inter-node and intra-node performance on different cluster
>>>      architectures
>>>    - Support for fallback to R3 rendezvous protocol if RGET fails
>>>    - SLURM integration with mpiexec.mpirun_rsh to use SLURM allocated
>>>      hosts without specifying a hostfile
>>>    - Support added to automatically use PBS_NODEFILE in Torque and PBS
>>>      environments
>>>    - Enable signal-triggered (SIGUSR2) migration
>>>    - Reduced memory footprint of the library
>>>    - Enhanced one-sided communication design with reduced memory
>>>      requirement
>>>    - Enhancements and tuned collectives (Bcast and Alltoallv)
>>>    - Flexible HCA selection with Nemesis interface
>>>        - Thanks to Grigori Inozemtsev, Queens University
>>>    - Support iWARP interoperability between Intel NE020 and
>>>      Chelsio T4 Adapters
>>>    - The environment variable that enables RoCE has been renamed from
>>>      MV2_USE_RDMAOE to MV2_USE_RoCE
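>>>        - As a usage sketch (the hostfile and executable names are
>>>          placeholders), the renamed variable is passed to the launcher
>>>          in the usual way:
>>>            $ mpirun_rsh -np 2 -hostfile hosts MV2_USE_RoCE=1 ./a.out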
>>> 
>>> Sample performance numbers for MPI communication from NVIDIA GPU memory
>>> using MVAPICH2 1.8RC1 and OMB 3.5.2 can be obtained from the following
>>> URL:
>>> 
>>> http://mvapich.cse.ohio-state.edu/performance/gpu.shtml
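>>> 
>>> As a usage sketch (node1, node2, and the benchmark path are placeholders),
>>> a device-to-device bandwidth test with OMB might be launched as follows,
>>> assuming both MVAPICH2 and OMB were built with CUDA support:
>>> 
>>>    $ mpirun_rsh -np 2 node1 node2 MV2_USE_CUDA=1 ./osu_bw D D
>>> 
>>> where "D" selects device (GPU) buffers and "H" selects host buffers.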
>>> 
>>> To download MVAPICH2 1.8RC1, OMB 3.5.2, the associated user guide, and the
>>> quick start guide, or to access the SVN repository, please visit the
>>> following URL:
>>> 
>>> http://mvapich.cse.ohio-state.edu
>>> 
>>> All questions, feedback, bug reports, hints for performance tuning,
>>> patches, and enhancements are welcome. Please post them to the
>>> mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
>>> 
>>> We are also happy to report that the number of downloads from the MVAPICH
>>> project site has crossed 100,000. The MVAPICH team extends its thanks to all
>>> MVAPICH/MVAPICH2 users and their organizations.
>>> 
>>> Thanks,
>>> 
>>> The MVAPICH Team
>>> 
>>> 
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>> 
>> 
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> 
> 
> 
> -- 
> Devendar
> <diff.patch>



