[Mvapich-discuss] MPI Running Problem.

Shineman, Nat shineman.5 at osu.edu
Fri Jun 23 09:48:20 EDT 2023


Hi Ryan,

Based on the description of your setup, you need to build with the ch3:sock device or the ch3:nemesis:tcp device. The default device for MVAPICH2 2.3.7 is ch3:mrail, which is designed for high-performance InfiniBand (IB) networks. Since you are using a standard Ethernet network, it will not work on your system. Please note that the MVAPICH team does not actively support standard Ethernet networks, as our focus is on high-performance networks, so most of the designs in the ch3:sock and ch3:nemesis:tcp devices will be those found in stock MPICH. You can build for these devices by running ./configure --device=ch3:nemesis. For more information, please see section 4.11 of our user guide: https://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-userguide.html.
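For reference, a minimal build sequence for the TCP device might look like the following (the source directory, install prefix, and -j count are placeholders, not part of the instructions above):

    # assuming the MVAPICH2 2.3.7 source tree has been unpacked
    cd mvapich2-2.3.7
    ./configure --device=ch3:nemesis --prefix=/opt/mvapich2-tcp
    make -j4
    make install

After installing, make sure the mpicc and mpirun from this build come first in the PATH on every node.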

Thanks,
Nat
________________________________
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of Lei, Ryan via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Thursday, June 22, 2023 15:41
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: [Mvapich-discuss] MPI Running Problem.

Hello Sir/Madam,

Nice to meet you. I am currently building a cluster with two Jetson Nano slave nodes and one Jetson Orin master node, connected over Ethernet. I have verified passwordless SSH and all the connections, and all the device IPs have been set properly. The MVAPICH2 version is 2.3.7. When I run "mpirun -np 3 -hostfile hostfile ./hello", I get an error that no HCAs were found, and the job aborts. Even if I set -iface to pick a specific interface (eth0), the same problem occurs. I have tried "MV2_IBA_HCA=eth0 MV2_DEFAULT_PORT=0" and some other ways to bypass the HCA/IB detection, but none of them work. Do you have any idea how to solve this problem?
[1.jpg]
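For context, the hostfile and command are of the following form (the third entry stands for the master node and is a placeholder here, not an address from my setup):

    # hostfile: one line per node
    192.168.2.10
    192.168.2.11
    <master-node-ip>

    mpirun -np 3 -hostfile hostfile ./hello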

Also, if I try "mpirun -np 2 -hosts 192.168.2.10,192.168.2.11 /home/reu/hello" (those are the IPs of the two nodes), I get an error asking me to check the firewall, with a different port reported each time. I need some help with that as well.
[2.jpg]
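For what it's worth, a quick way to rule the firewall in or out, assuming the Jetson boards run a stock Ubuntu image with ufw (these commands are illustrative, not something I have already tried):

    # show whether a firewall is active on each node
    sudo ufw status
    # temporarily disable it, for testing only
    sudo ufw disable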

Lastly, I can run "rsh reu@192.168.2.10 mpirun -n 2 ./hello" (or .11) without any trouble, but when I run "rsh reu@192.168.2.10,reu@192.168.2.11 mpirun ..." as one command, it asks me for a password for some reason. What is the problem there?
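For reference, a minimal check of passwordless logins, assuming OpenSSH is the remote shell used by the launcher (the commands below are a sketch, not taken from my current setup):

    # generate a key on the master node if one does not exist yet
    ssh-keygen -t rsa
    # install the public key on each slave node
    ssh-copy-id reu@192.168.2.10
    ssh-copy-id reu@192.168.2.11
    # each of these should now complete without a password prompt
    ssh reu@192.168.2.10 hostname
    ssh reu@192.168.2.11 hostname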

Thanks for your time.

Sincerely,
Ryan Lei




-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1.jpg
Type: image/jpeg
Size: 180805 bytes
Desc: 1.jpg
URL: <http://lists.osu.edu/pipermail/mvapich-discuss/attachments/20230623/57918b2f/attachment-0010.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2.jpg
Type: image/jpeg
Size: 191132 bytes
Desc: 2.jpg
URL: <http://lists.osu.edu/pipermail/mvapich-discuss/attachments/20230623/57918b2f/attachment-0011.jpg>

