[mvapich-discuss] OpenIB Presentation on MVAPICH/MVAPICH2 has been linked to the MVAPICH page

Choudhury, Durga Durga.Choudhury at drs-ss.com
Mon Feb 13 13:53:13 EST 2006


Hi All

Does anyone know of any work on implementing (hardware) fault tolerance
in MPI (i.e., handling node or connection failures in the middle of a
calculation in software)? I downloaded some academic software that
claims to achieve this, but I could not get it to run correctly even on
a Linux PC cluster (and our target hardware is a lot more sophisticated).

Has any work in this regard been done, or is any being planned, in
Prof. Panda's lab? Any pointers will be greatly appreciated.

Thanks
Durga

-----Original Message-----
From: mvapich-discuss-bounces at cse.ohio-state.edu
[mailto:mvapich-discuss-bounces at cse.ohio-state.edu] On Behalf Of
Dhabaleswar Panda
Sent: Friday, February 10, 2006 10:47 PM
To: mvapich-discuss at cse.ohio-state.edu
Cc: Dhabaleswar Panda
Subject: [mvapich-discuss] OpenIB Presentation on MVAPICH/MVAPICH2 has
been linked to the MVAPICH page

The presentation made at the OpenIB workshop in Sonoma is now linked
to the MVAPICH page. I have also sent the slides to Matt for them to be
linked to the main OpenIB page.

This presentation describes the current status of the MVAPICH and
MVAPICH2 projects, the latest performance numbers (such as SDR-DDR
comparisons), and upcoming features. In particular, it includes an
initial set of performance numbers for the following upcoming features:

- SRQ with Flow control for scalability to multi-thousand nodes
   - basic performance benefits
   - reduced memory requirements as systems scale and performance
     benefits to applications

- Fault Tolerance features
   - Memory-to-memory reliability
   - Network-level Fault Tolerance with Automatic Path Migration (APM)
   - Process-level Fault Tolerance with Checkpoint-Restart

- Multi-threading Support

- Multi-network Support through uDAPL

- Adaptive Connection Management
   - On-demand based schemes for scalability to multi-thousand nodes

We are working towards rolling out these features in successive
MVAPICH and MVAPICH2 releases during the next 2-3 months.

Please feel free to take a look at the slides and let us know if you
have any comments or suggestions on these upcoming features.

Thanks, 

DK
