[mvapich-discuss] vbuf problem

David Race drace at appro.com
Mon Sep 15 19:45:43 EDT 2008


Hello,

We are using mvapich2-1.2rc2 on a system in which each computer has four Mellanox DDR interfaces and 16 CPUs.  When we define

MV2_NUM_HCAS=4

we get a failure at line 230 of vbuf.c, which is in the following code:

    for (i = 0; i < rdma_num_hcas; ++i)
    {
        reg->mem_handle[i] = ibv_reg_mr(
            ptag_save[i],
            vbuf_dma_buffer,
            nvbufs * rdma_vbuf_total_size,
            IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
        if (!reg->mem_handle[i])
        {
            fprintf(stderr, "[%s %d] Cannot register vbuf region\n", __FILE__, __LINE__);
            return -1;
        }
    }

We get this failure with as few as 289 processors.  Has anyone run across this problem before?  Is there a suggested set of environment variables that might help prevent the failure?
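
One possibility worth ruling out (an assumption, not something established in this report) is that the registration fails because ibv_reg_mr() has to pin the buffer and the per-process locked-memory limit is too low. The sketch below adds up the bytes the loop above asks to register in one process and prints them next to RLIMIT_MEMLOCK. The function names and the sample sizes are hypothetical, and the sum is only a rough upper bound: it counts the region once per HCA, which may or may not match how the kernel accounts for repeated registration of the same pages.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/resource.h>

/* Bytes requested by the registration loop in one process: one region of
 * nvbufs * vbuf_total_size bytes, registered once per HCA.
 * (Hypothetical helper; the parameter names mirror the snippet above.) */
uint64_t registration_bytes(uint64_t nvbufs, uint64_t vbuf_total_size,
                            unsigned num_hcas)
{
    assert(num_hcas > 0);
    return nvbufs * vbuf_total_size * num_hcas;
}

/* Returns 1 if `bytes` fits under the given locked-memory soft limit. */
int fits_memlock(uint64_t bytes, rlim_t soft_limit)
{
    return soft_limit == RLIM_INFINITY || bytes <= (uint64_t)soft_limit;
}

/* Prints the current RLIMIT_MEMLOCK soft limit next to the requested size.
 * Returns 0 on success, -1 if getrlimit() fails. */
int report_memlock(uint64_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("RLIMIT_MEMLOCK soft limit: unlimited\n");
    else
        printf("RLIMIT_MEMLOCK soft limit: %llu bytes\n",
               (unsigned long long)rl.rlim_cur);
    printf("requested registrations:   %llu bytes (%s)\n",
           (unsigned long long)bytes,
           fits_memlock(bytes, rl.rlim_cur) ? "fits" : "exceeds limit");
    return 0;
}
```

For example, report_memlock(registration_bytes(512, 12288, 4)) compares 25165824 bytes against the limit; 512 and 12288 are placeholder values, not MVAPICH2 defaults. The limit seen in an interactive shell can differ from the one inherited by processes started through the job launcher, so the check is most useful when run the same way the MPI job is started.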

Thanks

David Race, Ph.D.
Principal Engineer
Appro International, Inc.
25003 Pitkin Road, Suite F600
Spring, TX  77386
Phone:  469-212-4860
Email:   drace at appro.com


