[mvapich-discuss] Hard coded /tmp patch for shared memory files

Lei Chai chai.15 at osu.edu
Tue Jul 29 17:56:40 EDT 2008


Hi Adam,

Thank you for your feedback on using shared memory segments. It is 
helpful. We will investigate the resource cleanup issue, and until we 
find a single best solution we will keep both methods available.

Thanks,
Lei


Adam Moody wrote:
> Hi Lei,
> In practice, we have found some disadvantages with using shared 
> memory segments as well.  Some codes may segfault or be killed early 
> by the user, which leaves their shared memory segments orphaned.  
> Over time, the cluster runs into problems with resource exhaustion.  
> It's difficult to know which segments can be freed, especially on 
> nodes that may be running several jobs.  We encountered such problems 
> with another MPI implementation on a cluster that is cpu-scheduled, 
> such that each node may run multiple jobs at once.
>
> We don't see this problem when using files in /tmp, since they are 
> unlinked very soon after they are created (so that the OS will do the 
> cleanup) and before MPI returns control to the user application from 
> MPI_Init.  It may be good to keep both methods available.  I think 
> we'd prefer the /tmp files here.
> -Adam Moody
> Lawrence Livermore National Laboratory
>
>
> Lei Chai wrote:
>
>> Hi John,
>>
>> Thanks for reporting the problem and sending the patch to us. We have 
>> also realized the limitation, and have come up with a solution that 
>> does not require an actual file path for shared memory communication 
>> (by using shmget and shmat function calls, thanks to suggestions from 
>> TACC). The new solution will be available in the next mvapich2 release.
>>
>> Thanks again,
>> Lei
>>
>>
>> John Partridge wrote:
>>
>>> We recently had a customer issue with shared memory files being
>>> hard coded to /tmp. The circumstances were that the system was
>>> a diskless cluster with /tmp being an in-memory file system.
>>>
>>> The /tmp file system was not large enough to support the shared
>>> memory files. So, the customer asked if we could make mvapich use
>>> an alternative path for the shared memory files.
>>>
>>> The version the customer is using is mvapich-0.9.9-1326 (from ofed-1.3),
>>> and we produced a patch to get an alternative path via an environment
>>> variable. The patch is attached in case you might want to include it
>>> in a future release of mvapich/mvapich2.
>>>
>>> Regards
>>> John
>>>
>>> ------------------------------------------------------------------------ 
>>>
>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>>
>>



More information about the mvapich-discuss mailing list