[mvapich-discuss] Hard coded /tmp patch for shared memory files
Lei Chai
chai.15 at osu.edu
Tue Jul 29 17:56:40 EDT 2008
Hi Adam,
Thank you for your feedback on using shared memory segments. It is
helpful. We will investigate the resource cleanup issue, and until we
find a single solution that works well we will keep both methods available.
Thanks,
Lei
Adam Moody wrote:
> Hi Lei,
> In practice, we found there are some disadvantages to using shared
> memory segments as well. A code may seg fault or be killed early by
> the user, which leaves its shared memory segment orphaned. Over time,
> the cluster runs into resource exhaustion. It is difficult to know
> which segments can be freed, especially on nodes that may be running
> several jobs. We encountered such problems with another MPI
> implementation on a cpu-scheduled cluster, where each node may run
> multiple jobs at once.
>
> We don't see this problem when using files in /tmp, since they are
> unlinked very soon after they are created (so the OS does the cleanup)
> and before MPI_Init returns control to the user application. It may be
> good to keep both methods available; we'd prefer the /tmp files here.
> -Adam Moody
> Lawrence Livermore National Laboratory
>
>
> Lei Chai wrote:
>
>> Hi John,
>>
>> Thanks for reporting the problem and sending the patch to us. We have
>> also realized the limitation, and have come up with a solution that
>> does not require an actual file path for shared memory communication
>> (by using shmget and shmat function calls, thanks to suggestions from
>> TACC). The new solution will be available in the next mvapich2 release.
>>
>> Thanks again,
>> Lei
>>
>>
>> John Partridge wrote:
>>
>>> We recently had a customer issue with shared memory files being
>>> hard coded to /tmp. The system was a diskless cluster with /tmp
>>> being an in-memory file system.
>>>
>>> The /tmp file system was not large enough to hold the shared memory
>>> files, so the customer asked whether we could make mvapich use an
>>> alternative path for them.
>>>
>>> The customer is using mvapich-0.9.9-1326 (from ofed-1.3), and we
>>> produced a patch that takes an alternative path from an environment
>>> variable. The patch is attached in case you want to include it in a
>>> future release of mvapich/mvapich2.
>>>
>>> Regards
>>> John
>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>