[mvapich-discuss] Announcing the Release of MVAPICH2 1.8RC1 and
OSU Micro-Benchmarks (OMB) 3.5.2
Jens Glaser
jglaser at umn.edu
Thu Mar 22 23:27:55 EDT 2012
Great. Thank you for the quick fix! It is working now.
Jens.
On Mar 22, 2012, at 8:10 PM, Devendar Bureddy wrote:
> Hi Jens
>
> Thank you for letting us know about this issue. This is a corner case we
> missed in heterogeneous GPU configurations. The issue should occur
> only when running with 3 processes in your configuration (GPU0 on
> IOH1, GPU1&2 on IOH2). The attached patch should fix it. Could you
> please try this patch and let us know if it works for you?
>
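> If you want to verify which GPU pairs on your node can reach each other
> directly (CUDA IPC relies on peer access, which is generally not
> available between GPUs on different IO hubs), a small standalone check
> along these lines can help (a sketch, using the CUDA 4.x runtime API):
>
> #include <cuda_runtime.h>
> #include <stdio.h>
>
> int main(void)
> {
>     int n, i, j, can;
>     cudaGetDeviceCount(&n);
>     for (i = 0; i < n; i++)
>         for (j = 0; j < n; j++)
>             if (i != j) {
>                 /* pairs split across IO hubs typically report 0 */
>                 cudaDeviceCanAccessPeer(&can, i, j);
>                 printf("GPU%d -> GPU%d: peer access %s\n",
>                        i, j, can ? "yes" : "no");
>             }
>     return 0;
> }
>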
> Please follow the instructions below to apply the patch.
>
> $ tar xf mvapich2-1.8rc1.tar.gz
> $ cd mvapich2-1.8rc1
> $ patch -p1 < diff.patch
> patching file src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_util.c
> $
>
> Thanks
> Devendar
>
> On Thu, Mar 22, 2012 at 3:14 PM, Jens Glaser <jglaser at umn.edu> wrote:
>> Hi,
>>
>> I am having trouble using the new version of MVAPICH2 with CUDA support.
>>
>> I am running on a host with 3 GPUs attached to two IO hubs (GPU0 on IOH1, GPU1&2 on IOH2), and MPI_Init hangs on this system when I run it with mpirun -np 3.
>>
>> Details:
>>
>> Configure line:
>>
>> ./configure --prefix=/nics/d/home/jglaser/mpich2-install --enable-cuda --with-cuda-include=/sw/keeneland/cuda/4.1/linux_binary/include/ --with-cuda-libpath=/sw/keeneland/cuda/4.1/linux_binary/lib64 --enable-shared --with-ib-libpath=/usr/lib64/
>>
>> Test program:
>> ================
>> #include <mpi.h>
>> #include <cuda_runtime.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> int main(int argc, char **argv)
>> {
>>     /* bind each process to a GPU based on its local rank,
>>        which the launcher exports before MPI_Init */
>>     cudaSetDevice(atoi(getenv("MV2_COMM_WORLD_LOCAL_RANK")));
>>     printf("before init\n");
>>     MPI_Init(&argc, &argv);
>>     printf("after init\n");
>>     MPI_Finalize();
>>     printf("after finalize\n");
>>     return 0;
>> }
>> ================
>>
>> Compiled with nvcc and the appropriate options (obtained from mpicc -show).
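>>
>> For example (illustrative invocation; copy the actual flags from the
>> mpicc -show output):
>>
>> $ mpicc -show
>> $ nvcc mpitest.cu -o mpitest <include/library flags from mpicc -show>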
>>
>> Test program output
>>
>> mpirun -np 3 ./mpitest
>> before init
>> before init
>> before init
>> Ctrl-C caught... cleaning up processes
>> (it hangs)
>>
>> It works with two GPUs:
>> mpirun -np 2 ./mpitest
>> before init
>> before init
>> after init
>> after init
>> after finalize
>> after finalize
>>
>> The previous version of MVAPICH2 (1.8a2) worked without problems.
>>
>> Any ideas?
>>
>> Thanks,
>>
>> Jens
>>
>> On Mar 22, 2012, at 12:21 PM, Dhabaleswar Panda wrote:
>>
>>> The MVAPICH team is pleased to announce the release of MVAPICH2 1.8RC1
>>> and OSU Micro-Benchmarks (OMB) 3.5.2.
>>>
>>> Features, enhancements, and bug fixes for MVAPICH2 1.8RC1 are listed
>>> below.
>>>
>>> * New Features and Enhancements (since 1.8a2):
>>>
>>> - New design for intra-node communication from GPU device buffers
>>> using CUDA IPC for better performance and correctness
>>> - Thanks to Joel Scherpelz from NVIDIA for his suggestions
>>> - Enabled shared memory communication for host transfers when CUDA is
>>> enabled
>>> - Optimized and tuned collectives for GPU device buffers
>>> - Enhanced pipelined inter-node device transfers
>>> - Enhanced shared memory design for GPU device transfers for
>>> large messages
>>> - Enhanced support for CPU binding with socket and numanode level
>>> granularity
>>> - Support suspend/resume functionality with mpirun_rsh
>>> - Exporting local rank, local size, global rank and global size
>>>   through environment variables (both mpirun_rsh and hydra; see the
>>>   sketch after this list)
>>> - Update to hwloc v1.4
>>> - Checkpoint-Restart support in OFA-IB-Nemesis interface
>>> - Enabling run-through stabilization support to handle process
>>> failures in OFA-IB-Nemesis interface
>>> - Enhancing OFA-IB-Nemesis interface to handle IB errors gracefully
>>> - Performance tuning on various platforms
>>> - Support for Mellanox IB FDR adapter
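>>>
>>> As an illustration of the exported variables, a minimal sketch follows.
>>> MV2_COMM_WORLD_LOCAL_RANK appears in the report above; the other three
>>> names are assumed to follow the same pattern (please check the user
>>> guide):
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>>
>>> int main(void)
>>> {
>>>     /* read launcher-exported placement info before MPI_Init,
>>>        e.g. to pick a GPU per local rank */
>>>     const char *lrank = getenv("MV2_COMM_WORLD_LOCAL_RANK");
>>>     const char *lsize = getenv("MV2_COMM_WORLD_LOCAL_SIZE");  /* assumed name */
>>>     const char *grank = getenv("MV2_COMM_WORLD_RANK");        /* assumed name */
>>>     const char *gsize = getenv("MV2_COMM_WORLD_SIZE");        /* assumed name */
>>>     printf("local rank %s of %s, global rank %s of %s\n",
>>>            lrank ? lrank : "?", lsize ? lsize : "?",
>>>            grank ? grank : "?", gsize ? gsize : "?");
>>>     return 0;
>>> }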
>>>
>>> * Bug Fixes (since 1.8a2):
>>>
>>> - Fix a hang issue on InfiniHost SDR/DDR cards
>>> - Thanks to Nirmal Seenu from Fermilab for the report
>>> - Fix an issue with usage of the runtime parameter MV2_USE_COALESCE
>>> - Fix an issue with LiMIC2 when CUDA is enabled
>>> - Fix an issue with intra-node communication using datatypes and GPU
>>> device buffers
>>> - Fix an issue with Dynamic Process Management when launching
>>> processes on multiple nodes
>>> - Thanks to Rutger Hofman from VU Amsterdam for the report
>>> - Fix build issue in hwloc source with mcmodel=medium flags
>>> - Thanks to Nirmal Seenu from Fermilab for the report
>>> - Fix a build issue in hwloc with the --disable-shared or
>>>   --disable-static options
>>> - Use portable stdout and stderr redirection
>>> - Thanks to Dr. Axel Philipp from MTU Aero Engines for the patch
>>> - Fix a build issue with PGI 12.2
>>> - Thanks to Thomas Rothrock from U.S. Army SMDC for the patch
>>> - Fix an issue with send message queue in OFA-IB-Nemesis interface
>>> - Fix a process cleanup issue in Hydra when MPI_ABORT is called
>>> (upstream MPICH2 patch)
>>> - Fix an issue with non-contiguous datatypes in MPI_Gather (the kind
>>>   of usage is sketched after this list)
>>> - Fix a few memory leaks and warnings
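>>>
>>> To illustrate the kind of usage affected by the MPI_Gather fix, here
>>> is a small sketch that gathers one non-contiguous (strided) column
>>> from each rank; the datatype and sizes are illustrative:
>>>
>>> #include <mpi.h>
>>> #include <stdlib.h>
>>>
>>> int main(int argc, char **argv)
>>> {
>>>     int rank, size, i;
>>>     double mat[12];          /* 4 x 3 matrix, row-major */
>>>     double *recv = NULL;
>>>     MPI_Datatype col;
>>>
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>
>>>     for (i = 0; i < 12; i++)
>>>         mat[i] = rank * 100 + i;
>>>
>>>     /* 4 blocks of 1 double with stride 3: one non-contiguous column */
>>>     MPI_Type_vector(4, 1, 3, MPI_DOUBLE, &col);
>>>     MPI_Type_commit(&col);
>>>
>>>     if (rank == 0)
>>>         recv = malloc(size * 4 * sizeof(double));
>>>
>>>     /* each rank sends its column; the root receives 4 contiguous
>>>        doubles per rank */
>>>     MPI_Gather(mat, 1, col, recv, 4, MPI_DOUBLE, 0, MPI_COMM_WORLD);
>>>
>>>     MPI_Type_free(&col);
>>>     free(recv);
>>>     MPI_Finalize();
>>>     return 0;
>>> }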
>>>
>>> Bug fixes for OSU Micro-Benchmarks (OMB) 3.5.2 are listed below.
>>>
>>> * Bug Fix (since OMB 3.5.1):
>>> - Fix a typo that led to the use of incorrect buffers
>>>
>>> The complete set of features and enhancements for MVAPICH2 1.8RC1
>>> compared to MVAPICH2 1.7 is as follows:
>>>
>>> * Features & Enhancements:
>>> - Support for MPI communication from NVIDIA GPU device memory
>>> - High performance RDMA-based inter-node point-to-point
>>> communication (GPU-GPU, GPU-Host and Host-GPU)
>>> - High performance intra-node point-to-point communication for
>>> multi-GPU adapters/node (GPU-GPU, GPU-Host and Host-GPU)
>>> - Taking advantage of CUDA IPC (available in CUDA 4.1) in
>>> intra-node communication for multiple GPU adapters/node
>>> - Optimized and tuned collectives for GPU device buffers
>>> - MPI datatype support for point-to-point and collective
>>> communication from GPU device buffers
>>> - Support suspend/resume functionality with mpirun_rsh
>>> - Enhanced support for CPU binding with socket and numanode level
>>> granularity
>>> - Exporting local rank, local size, global rank and global size
>>> through environment variables (both mpirun_rsh and hydra)
>>> - Update to hwloc v1.4
>>> - Checkpoint-Restart support in OFA-IB-Nemesis interface
>>> - Enabling run-through stabilization support to handle process
>>> failures in OFA-IB-Nemesis interface
>>> - Enhancing OFA-IB-Nemesis interface to handle IB errors gracefully
>>> - Performance tuning on various architecture clusters
>>> - Support for Mellanox IB FDR adapter
>>> - Adjust shared-memory communication block size at runtime
>>> - Enable XRC by default at configure time
>>> - New shared memory design for enhanced intra-node small message
>>> performance
>>> - Tuned inter-node and intra-node performance on different cluster
>>> architectures
>>> - Support for fallback to R3 rendezvous protocol if RGET fails
>>> - SLURM integration with mpiexec.mpirun_rsh to use SLURM allocated
>>> hosts without specifying a hostfile
>>> - Support added to automatically use PBS_NODEFILE in Torque and PBS
>>> environments
>>> - Enable signal-triggered (SIGUSR2) migration
>>> - Reduced memory footprint of the library
>>> - Enhanced one-sided communication design with reduced memory
>>> requirement
>>> - Enhancements and tuned collectives (Bcast and Alltoallv)
>>> - Flexible HCA selection with Nemesis interface
>>> - Thanks to Grigori Inozemtsev, Queens University
>>> - Support iWARP interoperability between Intel NE020 and
>>> Chelsio T4 Adapters
>>> - The environment variable that enables RoCE has been renamed from
>>>   MV2_USE_RDMAOE to MV2_USE_RoCE (example below)
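>>>
>>> For example, to enable RoCE with the new variable name (hostnames are
>>> illustrative):
>>>
>>> $ mpirun_rsh -np 2 node1 node2 MV2_USE_RoCE=1 ./osu_latency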
>>>
>>> Sample performance numbers for MPI communication from NVIDIA GPU memory
>>> using MVAPICH2 1.8RC1 and OMB 3.5.2 can be obtained from the following
>>> URL:
>>>
>>> http://mvapich.cse.ohio-state.edu/performance/gpu.shtml
>>>
>>> To download MVAPICH2 1.8RC1, OMB 3.5.2, the associated user guide, and
>>> the quick start guide, or to access the SVN, please visit the
>>> following URL:
>>>
>>> http://mvapich.cse.ohio-state.edu
>>>
>>> All questions, feedback, bug reports, performance tuning hints,
>>> patches, and enhancements are welcome. Please post them to the
>>> mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
>>>
>>> We are also happy to report that the number of downloads from the
>>> MVAPICH project site has crossed 100,000. The MVAPICH team extends its
>>> thanks to all MVAPICH/MVAPICH2 users and their organizations.
>>>
>>> Thanks,
>>>
>>> The MVAPICH Team
>>>
>>>
>
> --
> Devendar
> <diff.patch>