[mvapich-discuss] vbuf problem
Lei Chai
chai.15 at osu.edu
Mon Sep 15 23:26:15 EDT 2008
Hi David,
Thanks for reporting the error. We have not tested MVAPICH2 with 4 HCAs per node. Could you run the command "ulimit -l" on your system and let us know the output? If it's not "unlimited", please follow the instructions in section 9.3.4 of the user guide (http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2rc2.html#x1-530009.3.4) to set the limit to "unlimited", then try again.
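If it is easier to check the limit from inside a job than from a login shell, a minimal sketch along the following lines may help. It assumes a POSIX system and uses only the standard getrlimit() call; the helper names (memlock_soft_limit, memlock_is_unlimited, describe_memlock) are made up for illustration and are not part of MVAPICH2. The finite value is printed in 1024-byte units, which is what bash's "ulimit -l" reports.

```c
#include <stdio.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: fetch the soft locked-memory limit of this process.
 * Returns 0 on success and stores the limit (in bytes) in *out,
 * or returns -1 if getrlimit() fails. */
int memlock_soft_limit(rlim_t *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}

/* Hypothetical helper: 1 if the given limit means "unlimited", else 0. */
int memlock_is_unlimited(rlim_t soft)
{
    return soft == RLIM_INFINITY;
}

/* Hypothetical helper: format the limit the way "ulimit -l" shows it in
 * bash, i.e. "unlimited" or a count of 1024-byte units.
 * Returns the snprintf() result. */
int describe_memlock(char *buf, size_t len, rlim_t soft)
{
    if (memlock_is_unlimited(soft))
        return snprintf(buf, len, "unlimited");
    return snprintf(buf, len, "%llu", (unsigned long long)(soft / 1024));
}
```

Calling memlock_soft_limit() early in main() on each node and printing the describe_memlock() result shows the limit that the MPI processes actually inherit, which can differ from what an interactive shell reports if the processes are started through a daemon or a batch system.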
If you still see the error, then may I ask you the following questions:
- Did you see the error with a benchmark or an application? And what benchmark/application is it?
- What configure/make/run-time options did you use?
- Do you see the error when using fewer than 4 HCAs?
These will help us get more insight into the problem.
Thanks,
Lei
David Race wrote:
> Hello,
>
> We are using mvapich2-1.2rc2 on a system that has four Mellanox DDR interfaces and 16 CPUs in each computer. When we define
>
> MV2_NUM_HCAS=4
>
> we get a failure at line 230 of vbuf.c, which points to the following code:
>
> for (i = 0; i < rdma_num_hcas; ++i)
> {
>     reg->mem_handle[i] = ibv_reg_mr(
>         ptag_save[i],
>         vbuf_dma_buffer,
>         nvbufs * rdma_vbuf_total_size,
>         IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
>     if (!reg->mem_handle[i])
>     {
>         fprintf(stderr, "[%s %d] Cannot register vbuf region\n", __FILE__, __LINE__);
>         return -1;
>     }
> }
> We get this failure with as few as 289 processors. Has anyone run across this problem before? Is there a suggested set of environment variables that might help prevent the failure?
>
> Thanks
>
> David Race, Ph.D.
> Principal Engineer
> Appro International, Inc.
> 25003 Pitkin Road, Suite F600
> Spring, TX 77386
> Phone: 469-212-4860
> Email: drace at appro.com