[mvapich-discuss] some questions about mvapich-0.9.8

Abhinav Vishnu vishnu at cse.ohio-state.edu
Tue Apr 24 11:39:44 EDT 2007


Hi,

Thanks for using MVAPICH and reporting the problem to us. Can you also
confirm whether you are using MVAPICH or MVAPICH2?

> I'm developing a program for cluster based on mvapich-0.9.8. I hope the MPI
> processes can return error completion status through the Completion Queue when
> they call ibv_poll_cq() and their peers aborted abnormally. But what I can see
> now is that they call ibv_poll_cq() again and again without any Error
> Completion.

I hope that I have understood your question correctly: in an MPI job,
some of the processes have aborted, and the remaining processes (which
have not aborted) should get a completion queue entry reporting this
status.

In our experiments, this is what actually happens: the processes that
have not aborted receive an event and abort too. This is also mentioned
in our user guide at the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich_user_guide.html#x1-390007.2.6
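
For reference, the difference between "the queue is still empty" and "an
error completion arrived" can be sketched as below. This is only an
illustration, not MVAPICH code. The names wc_model, poll_fn,
wait_one_completion, scripted_cq, scripted_poll and max_polls are
hypothetical stand-ins so the loop can be tried without an HCA. The only
parts taken from libibverbs are the return convention of ibv_poll_cq()
(a count of completions, negative on failure) and the fact that
IBV_WC_SUCCESS is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct ibv_wc fields read here. */
struct wc_model {
    uint64_t wr_id;   /* caller-chosen id of the work request */
    int      status;  /* 0 means success (IBV_WC_SUCCESS is 0) */
};

/* Same return convention as ibv_poll_cq(): >= 0 is the number of
 * completions written to wc, < 0 means the poll call itself failed. */
typedef int (*poll_fn)(void *cq, int num_entries, struct wc_model *wc);

enum drain_result {
    DRAIN_OK = 0,          /* a successful completion was reaped      */
    DRAIN_WC_ERROR = 1,    /* a completion with non-success status    */
    DRAIN_POLL_FAILED = 2, /* the poll call returned a negative value */
    DRAIN_NOTHING = 3      /* queue stayed empty for max_polls polls  */
};

/* Poll at most max_polls times and report what was seen, instead of
 * spinning forever on an empty queue. */
enum drain_result wait_one_completion(poll_fn poll, void *cq,
                                      long max_polls, struct wc_model *out)
{
    assert(poll != NULL && max_polls >= 0);
    for (long i = 0; i < max_polls; i++) {
        struct wc_model wc;
        int n = poll(cq, 1, &wc);
        if (n < 0)
            return DRAIN_POLL_FAILED;
        if (n == 0)
            continue;               /* nothing yet, poll again */
        if (out != NULL)
            *out = wc;
        return wc.status == 0 ? DRAIN_OK : DRAIN_WC_ERROR;
    }
    return DRAIN_NOTHING;
}

/* Hypothetical fake queue: reports "empty" for the first few polls,
 * then hands out the scripted completions one at a time. */
struct scripted_cq {
    const struct wc_model *entries;
    size_t count;
    size_t next;
    long   empty_polls_first;
};

int scripted_poll(void *cq, int num_entries, struct wc_model *wc)
{
    struct scripted_cq *q = cq;
    if (num_entries < 1)
        return 0;
    if (q->empty_polls_first > 0) {
        q->empty_polls_first--;
        return 0;
    }
    if (q->next >= q->count)
        return 0;
    *wc = q->entries[q->next++];
    return 1;
}
```

In real verbs code the callback would be a thin wrapper around
ibv_poll_cq(), and the status check would compare against IBV_WC_SUCCESS.
A DRAIN_NOTHING result corresponds to the situation described in the
question: polling repeatedly without ever seeing a completion.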

Can you provide more information about your cluster? The more the better
:-). Please include the type of InfiniBand HCAs, the kind of switch
connectivity you have, and whether you are using mpirun_rsh or some other
startup client. In addition, please also let us know exactly how you are
creating the faults.

Thanks much,

:- Abhinav

> Can you guide me? I'm anxious to resolve this problem.
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


