When one MPI process is launched per GPU and an MPI_Send/MPI_Recv pair that references GPU pointers runs between two GPUs attached to the same machine, can MVAPICH recognize this situation and use CUDA P2P transfers for it? Thanks.