[mvapich-discuss] Having problems installing mvapich2-gdr/2.2-4 for the PGI compiler

Hari Subramoni subramoni.1 at osu.edu
Mon Jul 10 13:04:44 EDT 2017


Hello,

We're building the Intel RPMs. They should be available in a couple of
days.

Let us take a look at it and get back to you soon.

Regards,
Hari.

On Jul 10, 2017 12:02 PM, "Raghu Reddy" <raghu.reddy at noaa.gov> wrote:

> Hi team,
>
>
>
> Here is our environment:
>
>
>
> - Intel Haswell processors
> - P100 GPUs (8 per node)
> - Mellanox QDR IB
> - RHEL 7.3
> - CUDA 8.0
> - Running stock OFED
>
>
>
> We have already installed the GNU version of the mvapich2-gdr library and
> are using it with the Intel compiler (because the Intel builds are not yet
> available), and it is working fine. Thank you!
>
>
>
> Our users are now requesting the version that works with the PGI
> compiler. We have tried a couple of different downloads but have not been
> able to get it to work.
>
>
>
> For our initial testing, we are trying to test a simple MPI hello world
> program without involving the GPUs.
>
>
>
> Just for testing purposes, we do not want to install it in the standard
> location, so instead of using rpm commands to install it, we are using
> rpm2cpio/cpio to extract it into a known, non-default location.
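>
> (As a side note, if the RPM is built relocatable, a non-default install
> could presumably also be done directly with rpm's --prefix option, along
> the lines of "rpm -ivh --nodeps --prefix /path/of/our/choice <rpm-file>";
> the path there is just a placeholder. For now we are sticking with the
> rpm2cpio/cpio extraction shown below.)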
>
>
>
> When we install the GNU version of the library, it works fine with the PGI
> compiler.
>
> When we install the PGI version of the library, we are unable to get an
> MPI hello world program working with the PGI compiler.
>
>
>
> *Installing the GNU version of mvapich2-gdr:*
>
>
>
> rpm2cpio /home/admin/theia_software/mvapich2-gdr/mvapich2-gdr-2.2-4.cuda8.0.stock.gnu4.8.5.el7.centos.x86_64.rpm | cpio -i -v -d -m
>
> mv opt opt-gnu-nomcast-nopbs
>
>
>
> *Installing the PGI version of mvapich2-gdr:*
>
>
>
> rpm2cpio /home/admin/theia_software/mvapich2-gdr/mvapich2-gdr-2.2-4.cuda8.0.stock.pgi16.10.el7.centos.x86_64.rpm | cpio -i -v -d -m
>
> mv opt opt-pgi-nomcast-nopbs
>
>
>
> Having installed these two versions, I did a quick check with an MPI
> hello world program, compiling it with the PGI compiler against each
> installation directly (without involving the wrappers).
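>
> (For comparison, compiling through the wrapper should also work; my
> understanding is that, since MVAPICH2 is MPICH-derived, mpicc accepts an
> override of the underlying compiler, along the lines of
> "$MPIROOT/bin/mpicc -cc=pgcc hello_mpi_c.c" or
> "env MPICH_CC=pgcc $MPIROOT/bin/mpicc hello_mpi_c.c" -- the exact wrapper
> location under $MPIROOT is an assumption on my part. For these tests I
> linked against the libraries directly so the two installs are easy to
> compare.)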
>
>
>
> When I did that direct-link check, it worked fine with the GNU version
> even though the code was compiled with PGI, but when I used the PGI
> version it failed as shown below:
>
>
>
> *Using GNU version of mvapich2-gdr with the PGI compiler:*
>
>
>
> sg001% module purge
>
> sg001% module load pgi/17.5 cuda/8.0
>
> sg001% module load mvapich2-gdr/2.2-4-gnu-mcast-nopbs-rr
>
>
>
>
> sg001% echo $MPIROOT
>
> /tds_scratch3/SYSADMIN/nesccmgmt/Raghu.Reddy/apps/mvapich2/opt-gnu-mcast-nopbs/mvapich2/gdr/mcast/2.2/cuda8.0/mpirun/gnu4.8.5
>
> sg001%
>
>
>
> sg001% pgcc -I$MPIROOT/include -L$MPIROOT/lib64 -lmpich hello_mpi_c.c -L$CUDALIBDIR -lcuda -lcudart
>
> sg001%
>
>
>
> sg001% env LD_PRELOAD=$MPIROOT/lib64/libmpi.so /apps/mvapich2-gdr/2.2-3/cuda8.0-intel/bin/mpirun -np 4 ./a.out
>
> Hello from rank 0 out of 4; procname = sg001
>
> Hello from rank 1 out of 4; procname = sg001
>
> Hello from rank 2 out of 4; procname = sg001
>
> Hello from rank 3 out of 4; procname = sg001
>
> sg001%
>
> sg001%
>
>
>
> *Using the PGI version of mvapich2-gdr with the PGI compiler:*
>
>
>
> sg001% module purge
>
> sg001% module load pgi/17.5 cuda/8.0
>
> sg001% module load mvapich2-gdr/2.2-4-pgi-mcast-nopbs-rr
>
>
> sg001%
>
>
>
> sg001% echo $MPIROOT
>
> /tds_scratch3/SYSADMIN/nesccmgmt/Raghu.Reddy/apps/mvapich2/opt-pgi-mcast-nopbs/mvapich2/gdr/mcast/2.2/cuda8.0/mpirun/pgi16.10
>
> sg001%
>
>
>
> sg001% pgcc -I$MPIROOT/include -L$MPIROOT/lib64 -lmpich hello_mpi_c.c -L$CUDALIBDIR -lcuda -lcudart
>
> sg001%
>
>
>
> sg001% env LD_PRELOAD=$MPIROOT/lib64/libmpi.so /apps/mvapich2-gdr/2.2-3/cuda8.0-intel/bin/mpirun -np 4 ./a.out
>
> [sg001:mpi_rank_2][error_sighandler] Caught error: Segmentation fault (signal 11)
> [sg001:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)
> [sg001:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
> [sg001:mpi_rank_3][error_sighandler] Caught error: Segmentation fault (signal 11)
>
>
>
> ===================================================================================
>
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>
> =   PID 86087 RUNNING AT sg001
>
> =   EXIT CODE: 139
>
> =   CLEANING UP REMAINING PROCESSES
>
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
>
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
>
> This typically refers to a problem with your application.
>
> Please see the FAQ page for debugging suggestions
>
> sg001%
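>
> (In case it helps with debugging, one check we can run is to confirm which
> MPI library the binary actually resolves at run time, e.g. with
> "ldd ./a.out | grep -i mpi", and, if needed, rerun the failing case with
> LD_DEBUG=libs in the environment to see exactly which libmpi.so the
> dynamic loader picks up. We are also launching with the mpirun from the
> older 2.2-3 Intel install, so we can retry with the launcher that ships
> with the 2.2-4 PGI install in case the two need to match.)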
>
>
>
> For completeness, here is the program that was used:
>
>
>
> sfe01% cat hello_mpi_c.c
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char **argv)
> {
>    int ierr, myid, npes;
>    int len;
>    char name[MPI_MAX_PROCESSOR_NAME];
>
>    ierr = MPI_Init(&argc, &argv);
>
>    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
>    ierr = MPI_Comm_size(MPI_COMM_WORLD, &npes);
>    ierr = MPI_Get_processor_name( name, &len );
>
>    printf("Hello from rank %d out of %d; procname = %s\n", myid, npes, name);
>
>    ierr = MPI_Finalize();
> }
> sfe01%
>
>
>
> Any suggestions on how to fix this problem?
>
>
>
> Also, I was wondering if there is an ETA for the Intel version? As I
> mentioned above, for the time being we are using the GNU version of the
> library with the Intel compiler and it is working fine. We just used the
> module files from an earlier version of the library (for which we did have
> an Intel download available) as a workaround for the Fortran 90 modules.
>
>
>
> Thanks,
>
> Raghu
>
>
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>