[mvapich-discuss] Problem with mvapich1.0: job dies when MPI processes wait too long at mpi_barrier?

Krishna Chaitanya Kandalla kandalla at cse.ohio-state.edu
Tue Nov 24 11:18:15 EST 2009


Terrence,
Thanks for reporting the problem. Can you provide some more details
about the failure? Are you seeing any error messages, or can you get
a backtrace? Can you also try running your job with the run-time
parameter VIADEV_USE_SHMEM_BARRIER=0 set?
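
As a rough sketch of how such a run-time parameter is usually passed with
the mpirun_rsh launcher: the process count, the hostfile name, and the
executable name below are placeholders, not details from your setup.

```shell
# Placeholders (assumptions): 4 processes, a hostfile named ./hosts,
# and an executable named ./my_app. Adjust all three to your job.
# NAME=value pairs placed before the executable are passed to the MPI processes.
mpirun_rsh -np 4 -hostfile ./hosts VIADEV_USE_SHMEM_BARRIER=0 ./my_app
```

Whether the failure still occurs with this setting should help narrow down
whether the shared-memory barrier is involved.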

Thanks,
Krishna





On 11/24/2009 10:46 AM, Terrence.LIAO at total.com wrote:
>
> Dear Mvapich,
>
> Our code has poor load balance: in the final stage of the run, only
> one MPI process continues to perform calculations while the others
> wait at mpi_barrier. We have found that the run appears to fail when
> the wait is too long. Do you have any advice on how to fix this problem?
>
> Thank you very much.
>
> -- Terrence
> --------------------------------------------------------
> Terrence Liao, Ph.D.
> Sr. HPC Research Scientist
> TOTAL E&P RESEARCH & TECHNOLOGY USA, LLC
> 1201 Louisiana, Suite 1800, Houston, TX 77002
> Tel: 713.647.3498  Fax: 713.647.3638
> Email: terrence.liao at total.com
>
