[mvapich-discuss] OpenFabrics

Sayantan Sur surs at cse.ohio-state.edu
Thu Aug 31 17:58:38 EDT 2006


Hello Kevin,

>This does not fix the problem.  Also, this patch makes the program
>vulnerable to misreporting bandwidths in the case when multiple
>processes are sending across one link.  Because timing is done locally
>rather than globally, without a barrier at the start of timing,
>processes can serialize and each report full bandwidth, and the
>benchmark will not perceive that the total completion time was doubled.
>  
>
Okay, the reason behind that patch was that the `other' processes would 
proceed to MPI_Reduce (and even further, since MPI_Reduce need not block 
for all processes). Once the two processes in the bandwidth test enter 
MPI_Barrier(MPI_COMM_WORLD), the `other' processes waiting in that call 
can proceed and need not wait until both of those processes have finished 
the `find_bw' subroutine.

Anyway, since that doesn't solve the problem, we can set that patch aside 
for now.
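
For what it's worth, the usual way to avoid the serialization issue you 
describe is to put a global barrier immediately before starting the timer 
and to compute bandwidth from the slowest rank's elapsed time. Below is a 
minimal sketch of that pattern (this is not the actual osu_bw code; the 
message size, iteration count, and pairing of ranks are only 
illustrative):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MSG_SIZE (1 << 20)   /* 1 MB messages, illustrative */
#define ITERS    100

int main(int argc, char **argv)
{
    int rank, size, i;
    char *buf;
    double start, elapsed, max_elapsed;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    buf = malloc(MSG_SIZE);
    memset(buf, 'a', MSG_SIZE);

    /* Global barrier so every rank starts its timer at the same point. */
    MPI_Barrier(MPI_COMM_WORLD);
    start = MPI_Wtime();

    /* Even ranks send to the next rank; odd ranks receive. */
    for (i = 0; i < ITERS; i++) {
        if (rank % 2 == 0 && rank + 1 < size)
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, rank + 1, 0, MPI_COMM_WORLD);
        else if (rank % 2 == 1)
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    elapsed = MPI_Wtime() - start;

    /* Report bandwidth from the slowest rank's time, so serialized
       transfers cannot each claim full link bandwidth. */
    MPI_Reduce(&elapsed, &max_elapsed, 1, MPI_DOUBLE, MPI_MAX, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("Aggregate bandwidth: %.2f MB/s\n",
               (double)(size / 2) * ITERS * MSG_SIZE / max_elapsed / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}

The real osu_bw test uses non-blocking sends over a window of messages; 
the only point here is the barrier before timing and the MPI_MAX 
reduction over the elapsed times.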

>
>[root@hwlab-dhcp-228 OSU]# mpirun -np 2 -machinefile mpihosts ./osu_latency
>
>
>I do not have hostnames currently attached to the ib0 addresses.  I'll
>look at doing this and see if it helps.
>  
>
Could you tell us what happened? Did the test succeed, or did it fail 
with the same error as before? Also, please let us know whether you see 
the same error with other well-known MPI benchmarks.
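
In case it helps when you try attaching hostnames to the ib0 addresses: 
one simple approach (the addresses and names below are just placeholders) 
is to add entries for the IPoIB interfaces to /etc/hosts on every node 
and then list those names in the machinefile, for example:

    # /etc/hosts (same entries on every node)
    192.168.10.1   node1-ib0
    192.168.10.2   node2-ib0

    # mpihosts
    node1-ib0
    node2-ib0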

Thanks,
Sayantan.

-- 
http://www.cse.ohio-state.edu/~surs


