[mvapich-discuss] Problem with mvapich1.0 on job dead due to
mpi process wait too long on mpi_barrier ??
Krishna Chaitanya Kandalla
kandalla at cse.ohio-state.edu
Tue Nov 24 11:18:15 EST 2009
Terrence,
Thanks for reporting the problem. Can you provide some more details
about the failure? Are you seeing any error messages, or can you
obtain a backtrace? Can you also try running your job with the
run-time parameter VIADEV_USE_SHMEM_BARRIER=0 set?
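
As a rough illustration of passing that setting at launch, here is a minimal sketch that assembles the command line. It assumes the job is started with mpirun_rsh, which takes NAME=VALUE settings before the executable. The process count, the hostfile name "hosts", the binary "./app", and the helper build_launch_command are hypothetical placeholders, not details from the report.

```python
import shlex


def build_launch_command(num_procs, hostfile, executable, env=None):
    """Return an mpirun_rsh argument list with NAME=VALUE settings
    placed before the executable (hypothetical helper)."""
    env = env or {}
    cmd = ["mpirun_rsh", "-np", str(num_procs), "-hostfile", hostfile]
    # Run-time parameters go between the launcher options and the binary.
    cmd += [f"{name}={value}" for name, value in env.items()]
    cmd.append(executable)
    return cmd


# Placeholder values: 8 processes, a hostfile called "hosts", a binary "./app".
cmd = build_launch_command(
    8, "hosts", "./app", {"VIADEV_USE_SHMEM_BARRIER": "0"}
)
print(shlex.join(cmd))
```

The printed line can be pasted into a job script. If the job is started through a batch system wrapper instead, the way the parameter is passed may differ.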
Thanks,
Krishna
On 11/24/2009 10:46 AM, Terrence.LIAO at total.com wrote:
>
> Dear Mvapich,
>
> Our code has poor load balance: in the final stage of the run, only
> one MPI process continues to perform calculations while the others
> wait at mpi_barrier. We have found that the run fails when the wait
> is too long. Do you have any advice on how to fix this problem?
>
> Thank you very much.
>
> -- Terrence
> --------------------------------------------------------
> Terrence Liao, Ph.D.
> Sr. HPC Research Scientist
> TOTAL E&P RESEARCH & TECHNOLOGY USA, LLC
> 1201 Louisiana, Suite 1800, Houston, TX 77002
> Tel: 713.647.3498 Fax: 713.647.3638
> Email: terrence.liao at total.com