[mvapich-discuss] mpirun_rsh -export quirks

Lockwood, Glenn glock at sdsc.edu
Fri Dec 13 16:34:08 EST 2013


Hi

It looks like "mpirun_rsh -export" does not overwrite environment variables that are already set in shell startup files (see src/pm/mpirun/environ.c:43 in mvapich2 1.9).  I am raising it here in case anyone else runs into this behavior, since it does not appear to be mentioned in the mvapich2 1.9 manual.

The issue arose because our system has a dual-rail configuration (two HCAs per host) that causes mvapich2 to hang unless we explicitly export MV2_IBA_HCA=mlx4_0 and MV2_NUM_HCAS=1.  We set both in /etc/profile so that mvapich2 jobs are single-rail by default, but found that having users do something like

export MV2_IBA_HCA=mlx4_0:mlx4_1
export MV2_NUM_HCAS=2
mpirun_rsh -export -np X -hostfile Y ./a.out

would export everything EXCEPT the MV2_IBA_HCA and MV2_NUM_HCAS variables, so jobs stayed single-rail.  The cause is that /etc/profile sets these variables on each node before mpispawn is launched, and mpispawn then does not overwrite them with the values from the user's environment.
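
To make the ordering concrete, here is a minimal shell sketch of the "set only if not already set" behavior described above.  It is not the mvapich2 code (that lives in environ.c); apply_exported and DEMO_UNSET_VAR are hypothetical names used purely for illustration.

```shell
#!/bin/sh
# Illustration only: mimics "do not overwrite a variable that is already set".
# apply_exported is a hypothetical helper, not part of mvapich2.
apply_exported() {
    # $1 = variable name, $2 = value forwarded from the submit environment
    if eval "[ -z \"\${$1+set}\" ]"; then
        export "$1=$2"
    fi
}

# Step 1: the startup file (e.g. /etc/profile) runs first on the node.
export MV2_IBA_HCA=mlx4_0
export MV2_NUM_HCAS=1

# Step 2: the forwarded values arrive, but existing settings are kept.
apply_exported MV2_IBA_HCA mlx4_0:mlx4_1
apply_exported MV2_NUM_HCAS 2
apply_exported DEMO_UNSET_VAR forwarded   # hypothetical variable, not set above

echo "MV2_IBA_HCA=$MV2_IBA_HCA"        # prints MV2_IBA_HCA=mlx4_0
echo "MV2_NUM_HCAS=$MV2_NUM_HCAS"      # prints MV2_NUM_HCAS=1
echo "DEMO_UNSET_VAR=$DEMO_UNSET_VAR"  # prints DEMO_UNSET_VAR=forwarded
```

Only the variable that the startup file never touched picks up the forwarded value, which matches what we saw with the two MV2_* settings.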

The obvious workaround was to pass these two variables explicitly on the mpirun_rsh command line, e.g.,

mpirun_rsh -export -np X -hostfile Y MV2_IBA_HCA=mlx4_0:mlx4_1 MV2_NUM_HCAS=2 ./a.out

which does work.  This conditional export of the job's submit environment seems worth documenting in the mvapich2 user manual, though, since I imagine many sites set default MV2_* variables system-wide as we do.
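
For sites with more than a couple of MV2_* overrides, the explicit form could be generated rather than typed.  The following is only a sketch: mv2_env_args is a hypothetical helper, it assumes no MV2_* value contains whitespace or newlines, and it prints the command with echo instead of launching anything.

```shell
#!/bin/sh
# Hypothetical helper: print every MV2_* variable in the current
# environment as NAME=VALUE, one per line.
mv2_env_args() {
    env | grep '^MV2_' | sort
}

# The user's intended dual-rail settings from the example above.
export MV2_IBA_HCA=mlx4_0:mlx4_1
export MV2_NUM_HCAS=2

# Word splitting of $(...) is intentional: each NAME=VALUE becomes one argument.
# X and Y are the same placeholders used earlier in this message.
# Remove "echo" to actually launch once the printed command looks right.
echo mpirun_rsh -export -np X -hostfile Y $(mv2_env_args) ./a.out
```

This keeps the command line in sync with whatever the user has exported in the submitting shell, at the cost of forwarding every MV2_* variable rather than a chosen few.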

Glenn


--
Glenn K. Lockwood, Ph.D.
User Services Group
San Diego Supercomputer Center
glock at sdsc.edu / (858) 246-1075




