[mvapich-discuss] mvapich 0.9.7 vapi driver + SilverStorm drivers

Anton Starikov A.Starikov at utwente.nl
Fri Apr 7 15:59:05 EDT 2006


Thanks a lot for the fast response!


Rimmer, Todd wrote:
> Troy & Co,
> 
> I have identified the issue.  The problem is in mpid/vapi/vapi_const.h:
> #define VIADEV_DEFAULT_QP_OUS_RD_ATOM         (8)
> 
> This value exceeds the capability reported by the HCA driver (which is 4), so the HCA driver rejects the modify_qp call with an error.
> 
> This parameter can be set in the environment via VIADEV_DEFAULT_QP_OUS_RD_ATOM.
> Alternatively, vapi_const.h can be edited to change the default.
> 
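As a sketch of how such an environment override might be parsed (assuming a plain getenv()/strtol() lookup; `parse_qp_ous_rd_atom` and `qp_ous_rd_atom_from_env` are hypothetical names, not MVAPICH's actual parameter-handling code), the value falls back to the compile-time default of 8 from vapi_const.h whenever the variable is unset or malformed:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Compile-time default, mirroring VIADEV_DEFAULT_QP_OUS_RD_ATOM in vapi_const.h. */
#define DEFAULT_QP_OUS_RD_ATOM 8

/* Hypothetical helper: parse an override string such as the result of
 * getenv("VIADEV_DEFAULT_QP_OUS_RD_ATOM"). NULL, empty, malformed or
 * non-positive input falls back to the compile-time default. */
int parse_qp_ous_rd_atom(const char *s)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return DEFAULT_QP_OUS_RD_ATOM;

    errno = 0;
    v = strtol(s, &end, 10);
    /* Reject trailing garbage, overflow and non-positive values. */
    if (errno != 0 || *end != '\0' || v <= 0 || v > INT_MAX)
        return DEFAULT_QP_OUS_RD_ATOM;
    return (int)v;
}

/* Hypothetical wrapper: look the override up in the process environment. */
int qp_ous_rd_atom_from_env(void)
{
    return parse_qp_ous_rd_atom(getenv("VIADEV_DEFAULT_QP_OUS_RD_ATOM"));
}
```

On the HCA discussed here, setting the variable to 4 (the reported maximum) would keep the modify_qp call within range.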
> You can manually check the capabilities of the HCA via /proc/iba/mt*/*/capacities and other /proc entries in this directory.
> 
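The /proc check can also be scripted. The sketch below assumes a POSIX system with glob(); `dump_matching_files` is a hypothetical helper, and because the format of the capacities files is not described in this thread, it simply copies each matching file to an output stream under a header line:

```c
#define _POSIX_C_SOURCE 200809L
#include <glob.h>
#include <stdio.h>

/* Copy every file whose path matches the glob pattern to `out`, each
 * preceded by a "== path ==" header line. Returns the number of files
 * copied, 0 if nothing matched, or -1 if glob() itself failed. */
int dump_matching_files(const char *pattern, FILE *out)
{
    glob_t g;
    size_t i;
    int count = 0;
    int rc = glob(pattern, 0, NULL, &g);

    if (rc != 0) {
        globfree(&g);
        return rc == GLOB_NOMATCH ? 0 : -1;
    }
    for (i = 0; i < g.gl_pathc; i++) {
        FILE *f = fopen(g.gl_pathv[i], "r");
        char buf[4096];
        size_t n;

        if (f == NULL)
            continue;             /* skip entries we cannot read */
        fprintf(out, "== %s ==\n", g.gl_pathv[i]);
        /* Read until EOF; /proc files often report a size of 0. */
        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            fwrite(buf, 1, n, out);
        fclose(f);
        count++;
    }
    globfree(&g);
    return count;
}
```

For example, calling it with the pattern "/proc/iba/mt*/*/capacities" and stdout would print every capacities file found on a node with the SilverStorm driver loaded.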
> Alternatively, MVAPICH could be adjusted to limit its parameters to the range of values supported; this could be done by adding code fragments such as the following after the EVAPI_get_hca_hndl call (near line 321 in viainit.c):
>     {
>         VAPI_hca_vendor_t hca_vendor;
>         VAPI_hca_cap_t hca_cap;
> 
>         ret = VAPI_query_hca_cap(viadev.nic, &hca_vendor, &hca_cap);
>         if (VAPI_OK != ret)
>             vapidev_error_abort(VAPI_RETURN_ERR,
>                                 "Could not get HCA capabilities (%s)",
>                                 VAPI_strerror(ret));
>         if (viadev_default_qp_ous_rd_atom > hca_cap.max_qp_ous_rd_atom)
>         {
>             viadev_default_qp_ous_rd_atom = hca_cap.max_qp_ous_rd_atom;
>             fprintf(stderr, "reduced qp_ous_rd_atom to %d\n", viadev_default_qp_ous_rd_atom);
>         }
>     }
> 
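Stripped of the VAPI types, the clamping step in the fragment above reduces to the following standalone sketch. `struct hca_caps` is a hypothetical stand-in for the single VAPI_hca_cap_t field used (max_qp_ous_rd_atom), so the logic can be compiled and exercised without the VAPI headers:

```c
#include <stdio.h>

/* Hypothetical stand-in for VAPI_hca_cap_t: only the field consulted
 * by the fragment above is modelled here. */
struct hca_caps {
    int max_qp_ous_rd_atom;
};

/* Return `requested` limited to the HCA's maximum number of outstanding
 * RDMA read/atomic operations, warning on stderr when it is reduced. */
int clamp_qp_ous_rd_atom(int requested, const struct hca_caps *cap)
{
    if (requested > cap->max_qp_ous_rd_atom) {
        fprintf(stderr, "reduced qp_ous_rd_atom from %d to %d\n",
                requested, cap->max_qp_ous_rd_atom);
        return cap->max_qp_ous_rd_atom;
    }
    return requested;
}
```

With the values from this thread (a default of 8 and an HCA maximum of 4), clamp_qp_ous_rd_atom(8, &cap) returns 4, while values already within range pass through unchanged.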
> The SilverStorm distribution of MVAPICH does not have this issue because it uses the SilverStorm Communications manager to assist in configuring QPs, so the settings are always within the range of the hardware's capabilities.
> 
> FYI, to aid in debugging similar issues in the future, the SilverStorm HCA driver supports runtime adjustment of its logging. By enabling warning output (echo "VpdDbg=0xf000000" > /proc/iba/mt*/config), you will get messages in /var/log/messages that help debug errors such as this. In this case, the message was:
> 	Apr  7 14:39:50 duster kernel: TVpd: !WARNING! ResponderResources too large
> 
> For more information about the driver logging facilities:
> 	cat /proc/iba/mt*/config
> For stack-level logging options and other facilities:
> 	cat /proc/iba/config
> 
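For completeness, a hedged C sketch of the same echo step, assuming POSIX glob() and root privileges; `write_to_matching_files` is a hypothetical helper, and only the VpdDbg=0xf000000 string and the /proc/iba/mt*/config path come from the message above:

```c
#define _POSIX_C_SOURCE 200809L
#include <glob.h>
#include <stdio.h>

/* Write `text` plus a trailing newline (as echo would) to every file
 * matching the glob pattern. Returns the number of files written
 * successfully, 0 if nothing matched, or -1 if glob() itself failed. */
int write_to_matching_files(const char *pattern, const char *text)
{
    glob_t g;
    size_t i;
    int written = 0;
    int rc = glob(pattern, 0, NULL, &g);

    if (rc != 0) {
        globfree(&g);
        return rc == GLOB_NOMATCH ? 0 : -1;
    }
    for (i = 0; i < g.gl_pathc; i++) {
        FILE *f = fopen(g.gl_pathv[i], "w");
        int ok;

        if (f == NULL)
            continue;             /* e.g. not running as root */
        ok = fprintf(f, "%s\n", text) >= 0;
        /* Buffered data is flushed on fclose, so write errors surface here. */
        if (fclose(f) != 0)
            ok = 0;
        if (ok)
            written++;
    }
    globfree(&g);
    return written;
}
```

Calling write_to_matching_files("/proc/iba/mt*/config", "VpdDbg=0xf000000") would apply the setting to every matching HCA config file.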
> Todd Rimmer
> Chief Systems Architect             SilverStorm Technologies
> Voice: 610-233-4852                   Fax: 610-233-4777
> TRimmer at SilverStorm.com         www.SilverStorm.com
> 
> 
> 
>> -----Original Message-----
>> From: Rimmer, Todd 
>> Sent: Friday, April 07, 2006 2:37 PM
>> To: 'panda at cse.ohio-state.edu'; Troy Telford
>> Cc: Anton Starikov; mvapich-discuss at cse.ohio-state.edu
>> Subject: RE: [mvapich-discuss] mvapich 0.9.7 vapi driver + SilverStorm
>> drivers
>>
>>
>> Troy & Co.
>>
>> I saw your email, and we will look into the issue and
>> provide a response to this group.
>>
>> FYI, you should pursue any future SilverStorm stack questions 
>> through your SilverStorm support representative.
>>
>> While looking at this problem, I noticed that MVAPICH 0.9.7 produces
>> compile errors (unrelated to the IB stack) in
>> mpid/vapi/process/pmgr_client_mpirun_rsh.c near line 60:
>>
>> int pmgr_client_init(int *argc_p, char ***argv_p, int *np_p, int *me_p,
>>                      int *id_p, char ***processes_p)
>> {
>>     char *str;
>>     char *str_token;
>>     char **pmgr_processes;
>>     int i;
>>     setvbuf(stdout, NULL, _IONBF, 0);
>>     struct sockaddr_in sockaddr;
>>
>>     /* Get information from environment, not from the argument list */
>>
>> Declaring a variable (here, struct sockaddr_in sockaddr) after a
>> statement such as the setvbuf() call is not permitted in ANSI C.
>>
>> This should be coded as:
>>
>> int pmgr_client_init(int *argc_p, char ***argv_p, int *np_p, int *me_p,
>>                      int *id_p, char ***processes_p)
>> {
>>     char *str;
>>     char *str_token;
>>     char **pmgr_processes;
>>     int i;
>>     struct sockaddr_in sockaddr;
>>
>>     setvbuf(stdout, NULL, _IONBF, 0);
>>     /* Get information from environment, not from the argument list */
>>
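A minimal compilable illustration of the same ordering rule, assuming a POSIX system for <sys/socket.h> and <netinet/in.h>; `init_unbuffered` is a hypothetical function that only mirrors the shape of the corrected pmgr_client_init. Built with gcc -std=c89 -pedantic, moving the struct sockaddr_in declaration below the setvbuf() call should produce the warning "ISO C90 forbids mixed declarations and code"; with the declaration first, as here, that warning does not apply:

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical example: every local variable, including the
 * struct sockaddr_in, is declared before the first statement,
 * as C89/C90 requires. */
int init_unbuffered(void)
{
    struct sockaddr_in sockaddr;   /* declaration before any statement */

    /* Statements begin only after the declaration block. */
    setvbuf(stdout, NULL, _IONBF, 0);
    memset(&sockaddr, 0, sizeof sockaddr);
    sockaddr.sin_family = AF_INET;
    return sockaddr.sin_family == AF_INET ? 0 : -1;
}
```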
>> Todd Rimmer
>> Chief Systems Architect             SilverStorm Technologies
>> Voice: 610-233-4852                   Fax: 610-233-4777
>> TRimmer at SilverStorm.com         www.SilverStorm.com
>>
>>
>>> -----Original Message-----
>>> From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu]
>>> Sent: Friday, April 07, 2006 1:10 PM
>>> To: Troy Telford
>>> Cc: Dhabaleswar Panda; Anton Starikov;
>>> mvapich-discuss at cse.ohio-state.edu; Rimmer, Todd
>>> Subject: Re: [mvapich-discuss] mvapich 0.9.7 vapi driver + SilverStorm
>>> drivers
>>>
>>>
>>> Hi Troy, 
>>>
>>> Thanks for chiming in and indicating that you are also seeing similar
>>> problems with 0.9.7 and SilverStorm drivers.
>>>
>>> It appears that SilverStorm's VAPI driver has different functions or
>>> data structures than Mellanox IBGD and this is leading to this
>>> incompatibility problem.
>>>
>>> Since SilverStorm people are running a version of MVAPICH (not sure
>>> which older version of MVAPICH is being used), they will be able to
>>> indicate clearly the changes needed to make the latest MVAPICH 0.9.7
>>> run with their latest proprietary driver. This will help SilverStorm's
>>> customers take advantage of the latest features of MVAPICH.
>>>
>>> As I had indicated earlier, we do not have access to SilverStorm's
>>> stack. Thus, it is very hard for us to figure out what is going on
>>> here and what changes are needed.
>>>
>>> I am cc'ing this note to Mr. Todd Rimmer, Chief Systems Architect of
>>> SilverStorm.
>>>
>>> Mr. Rimmer: Could you please let all of us know about this
>>> incompatibility problem and how SilverStorm's users can solve it.
>>>
>>> Thanks, 
>>>
>>> DK
>>>
>>>
>>>
>>>
>>>> On Sun, 02 Apr 2006 21:11:23 -0600, Dhabaleswar Panda  
>>>> <panda at cse.ohio-state.edu> wrote:
>>>>
>>>>>>> Are you trying the gen2 device or the vapi device? If you are using
>>>>>>> the vapi device, I hope you are using the ADAPTIVE_RDMA_FAST_PATH
>>>>>>> support (not the SRQ support, which is available with the gen2 device
>>>>>>> only). We are working on SRQ support for the vapi device, and it will
>>>>>>> be available soon.
>>>>>> I use the vapi device with ADAPTIVE_RDMA_FAST_PATH.
>>>>> Thanks for this information.
>>>>>
>>>>>>> This discussion list has some members from SilverStorm. May I request
>>>>>>> them to extend help on this and let us know why this error occurs
>>>>>>> with the latest SilverStorm driver.
>>>>>>>
>>>>>>> If other users are using the same driver (3.2.0.0.25) from
>>>>>>> SilverStorm with mvapich 0.9.7, may I request them to indicate whether
>>>>>>> they are seeing this error or not.
>>>>>>> [0] Abort: Could not modify qp to RTR (Invalid Parameter) at line 935 in file viainit.c
>>>> I just wanted to chime in and report that I too am seeing this error with
>>>> MVAPICH 0.9.7 and the SilverStorm 3.2.0.0.25 drivers.
>>>>
>>>> I'm heartened by the progress towards the second-generation OpenIB stack,
>>>> but sometimes it's not an option.
>>>>
>>>> Interestingly enough, SilverStorm seems to distribute its own branch of
>>>> MVAPICH (albeit an older one). Binaries compiled with the semi-functional
>>>> MVAPICH (i.e., using SilverStorm drivers & libs) will run fine on the
>>>> SilverStorm branch of MVAPICH. (Running SS MVAPICH-compiled binaries on
>>>> MVAPICH 0.9.7 results in the same error as reported above.)
>>>>
>>>> I've also verified that I'm not running into a case of mixed libraries
>>>> from MVAPICH and SilverStorm's MPI. (Start with a fresh OS install, then
>>>> install the OS & MPI, test, wipe the system, start over.)
>>>> -- 
>>>> Troy Telford
>>>>
>>>
> 
> 


