[mvapich-discuss] Issues running mvapich2 with slurm

Matthew Russell matthew.g.russell at gmail.com
Mon Oct 22 12:26:41 EDT 2012


Hi,

I'm not sure whether this belongs here or on a slurm mailing list, but I
figured I'd try here first.

I can't run even a simple hello_world executable built with mvapich2 under
slurm.

I built mvapich2 to use slurm as its PMI, i.e.

./configure --prefix=/cm/shared/apps/mvapich2/pgi/64/1.8 --with-pmi=slurm \
   --with-pm=no CPPFLAGS=-I/cm/shared/apps/slurm/2.4.2/slurm-2.4.2/ \
   LDFLAGS=-L/cm/shared/apps/slurm/2.4.2/lib/


When I try to run apps though, I get:
[matt@dena]~/cluster_tests% srun -n16 --mpi=none hello_mvapich2_slurm
srun: error: Unable to confirm allocation for job 14: Invalid job id specified
srun: Check SLURM_JOB_ID environment variable for expired or invalid job.
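
Since srun's own message points at SLURM_JOB_ID, one first check is whether
the shell still carries that variable from an earlier allocation. A minimal
sketch, assuming a POSIX shell; the function name check_slurm_job_env is
hypothetical and not part of slurm:

```shell
# Hypothetical helper: report whether this shell has SLURM_JOB_ID set.
# It only inspects the environment; it does not contact the slurm controller.
check_slurm_job_env() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "SLURM_JOB_ID is set to ${SLURM_JOB_ID}"
    else
        echo "SLURM_JOB_ID is not set"
    fi
}

check_slurm_job_env
```

If it reports a job id, that id can be compared against what squeue lists for
the user to see whether the job still exists.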

When I first allocate nodes with salloc and then run srun inside the
allocation, I get:
[matt@dena]~/cluster_tests% salloc -N 2
salloc: Granted job allocation 16
[matt@dena]~/cluster_tests% srun /home/matt/cluster_tests/hello_mvapich2_slurm
In: PMI_Abort(1, Fatal error in MPI_Init:
Other MPI error
)
In: PMI_Abort(1, Fatal error in MPI_Init:
Other MPI error
)
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
slurmd[dena2]: *** STEP 16.0 KILLED AT 2012-10-19T12:45:22 WITH SIGNAL 9 ***
srun: error: dena2: task 1: Exited with exit code 1
srun: error: dena1: task 0: Exited with exit code 1
slurmd[dena1]: *** STEP 16.0 KILLED AT 2012-10-19T12:45:22 WITH SIGNAL 9 ***
slurmd[dena2]: *** STEP 16.0 KILLED AT 2012-10-19T12:45:22 WITH SIGNAL 9 ***

I've searched around, but have had no luck finding solutions to either
problem.  I have no idea how to proceed.

Help?  Thanks.