[mvapich-discuss] collectives fail under mvapich2-1.0 (fwd)

Edmund Sumbar esumbar at ualberta.ca
Thu Sep 27 10:27:49 EDT 2007


Thanks for your prompt reply, Wei...

wei huang wrote:
> We wonder how many processes you carried out the tests on. Are you
> using all 8 processors when running the coll tests and SKaMPI tests?

The tests passed for the 2- and 4-CPU cases within a node,
but they stall as soon as I go between nodes, for example,
with one CPU on each of two nodes.

Our nodes run Linux kernel 2.6.21-smp with the associated
InfiniBand driver modules installed.  Please let me know
what other information you need to investigate this issue.
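[Editor's sketch, not from the thread: since a stalled collective produces no
output and blocks the next test, one way to make the hang visible is to wrap
each run in GNU coreutils `timeout`, which kills the command after a limit and
exits with status 124. The `sleep` below stands in for a hung test binary.]

```shell
# Sketch: `timeout` kills a command that exceeds the limit and exits
# with status 124, so a stalled test is flagged instead of hanging
# the loop forever. `sleep 10` plays the role of a hung collective.
timeout 1 sleep 10
echo "exit=$?"   # 124 means the command was killed on timeout
```

[In the test loop further down this would look roughly like
`timeout 300 mpiexec "$test" >out 2>err`, with exit status 124
identifying the test that stalled.]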


>> I'm trying to confirm my mvapich2-1.0 installation by running
>> SKaMPI (pt2pt, coll, onesided), Intel MPI Benchmark (MPI1 only)
>> and the mvapich2 coll tests.
>>
>> Running the mvapich2 coll tests in alphabetical order, I find
>> that it stalls on "icallgather," that is, no output after
>> several minutes and the next test is not run.  I also experience
>> stalling with the coll and onesided SKaMPI tests.  The IMB-MPI1
>> tests pass however.
>>
>> Mvapich2-1.0 was compiled using make.mvapich2.ofa (gcc 4.2.0).
>> The tests were run under Torque using mpiexec-0.81 between two
>> nodes (dual-socket, dual-core Opterons).
>>
>>    for test in allgather2 allgather3 ...; do
>>      mpiexec "$test" >out 2>err
>>    done
>>
>> No problem, when run within a node.
>>
>> Any ideas?


-- 
Ed[mund [Sumbar]]
AICT Research Support, Univ of Alberta

