[mvapich-discuss] DAPL_MISMATCH error

Hoot Thompson hoot at ptpnow.com
Fri Oct 5 13:10:56 EDT 2012


=> mvapich2-1.8-r5435

=> [jhthomps at sriov-3 mvapich2]$ mpiname -a
MVAPICH2 1.8 Mon Apr 30 14:56:40 EDT 2012 ch3:mrail

Compilation
CC: /usr/local/other/utilities/intel/composer_xe_2013.0.079/bin/intel64/icc -DNDEBUG -DNVALGRIND -O2
CXX: /usr/local/other/utilities/intel/composer_xe_2013.0.079/bin/intel64/icpc -DNDEBUG -DNVALGRIND -O2
F77: /usr/local/other/utilities/intel/composer_xe_2013.0.079/bin/intel64/ifort -O2
FC: /usr/local/other/utilities/intel/composer_xe_2013.0.079/bin/intel64/ifort -O2

Configuration
--with-rdma=gen2 --enable-fast --with-pm=mpd,hydra --prefix=/usr/local/other/utilities/mvapich2




=> mpiexec.hydra -n 24 -f ~/naspb/hosts-vm-2-13-12 ./xhpl_intel64
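
For reference, the failure text in the original report reads like a key-value store refusing a second put of the same key. Here is a minimal sketch of that behavior, assuming the store simply rejects any key that already exists. The class and function names are hypothetical and this is not MVAPICH2, MPD, or Hydra code; it only illustrates how a repeated put of DAPL_MISMATCH could produce a "duplicate key" reason.

```python
class DuplicateKeyError(Exception):
    """Raised when a key is put twice into the same KVS (hypothetical name)."""


class ToyKVS:
    """Toy stand-in for a process-manager key-value space; not the real implementation."""

    def __init__(self, name):
        self.name = name
        self._pairs = {}

    def put(self, key, value):
        # Assumption: a key may be written only once per KVS.
        if key in self._pairs:
            raise DuplicateKeyError(f"duplicate_key{key}")
        self._pairs[key] = value

    def get(self, key):
        return self._pairs[key]


def demo():
    kvs = ToyKVS("kvs_4347_0")
    kvs.put("DAPL_MISMATCH", "1")      # first put succeeds
    try:
        kvs.put("DAPL_MISMATCH", "1")  # second put of the same key is rejected
    except DuplicateKeyError as err:
        return f"failed, reason='{err}'"
    return "ok"


if __name__ == "__main__":
    print(demo())
```

If that model is roughly right, the question becomes which component puts DAPL_MISMATCH more than once on the failing pair of VMs, which the sketch does not answer.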


On 10/05/2012 11:15 AM, Devendar Bureddy wrote:
> On Fri, Oct 5, 2012 at 11:02 AM, Hoot Thompson <hoot at ptpnow.com> wrote:
>> You're right, the prior problem I had was indeed network related. This
>> problem has some of the same error characteristics, but it's not preceded by
>> the getaddrinfo error. Let me set the stage: this is actually a system of
>> four virtualized environments, including virtualized IB (SR-IOV). The test
>> works fine between two of the VMs but not between the other two. All VMs
>> should be set up identically, but something is obviously different. So if I
>> understand correctly, the DAPL_MISMATCH message is not coming out of your
>> code. Any idea where it might be coming from?
>   Can you please provide the following information:
>       -  MVAPICH2 version and configure info ("mpiname -a" output
> should be fine)
>       -  your run command with all runtime flags
>
> -Devendar
>
>
>> Thanks as always for your help
>>
>> Hoot
>>
>> -----Original Message-----
>> From: Devendar Bureddy [mailto:bureddy at cse.ohio-state.edu]
>> Sent: Thursday, October 04, 2012 9:14 PM
>> To: Hoot Thompson
>> Cc: mvapich-discuss at cse.ohio-state.edu
>> Subject: Re: [mvapich-discuss] DAPL_MISMATCH error
>>
>> Hi Hoot
>>
>> We are not sure what is going wrong here.  I couldn't find this symbol
>> in the MVAPICH2 code base.  Can you please give more details about your
>> MVAPICH2 version and configuration?  It seems that you reported a
>> similar issue a couple of days ago and it turned out to be a
>> network setup issue.  Can you please make sure the setup is fine?
>>
>> -Devendar
>> On Thu, Oct 4, 2012 at 6:39 PM, Hoot Thompson <hoot at ptpnow.com> wrote:
>>> Not sure of the origin of this error, but it occurs when I increase the
>>> number of processes by spreading to an additional compute node while
>>> executing a Linpack run:
>>>
>>> [cli_26]: Command cmd=put kvsname=kvs_4347_0 key=DAPL_MISMATCH valuie=1 failed, reason=’duplicate _keyDAPL_MISMATCH’
>>>
>>> Any thoughts?
>>>
>>>
>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>>
>>
>> --
>> Devendar
>>
>>
>
>


