<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Aptos;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Aptos",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:11.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Hi Gabriel,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Thanks for contacting us. <o:p>
</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">We are taking a look at this. We will get back to you once we have an update.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">-Akshay<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:'Calibri',sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:'Calibri',sans-serif"> Mvapich-discuss &lt;mvapich-discuss-bounces+panirajaguptha.1=osu.edu@lists.osu.edu&gt;
<b>On Behalf Of </b>GABRIEL SOTODOSOS MORALES via Mvapich-discuss<br>
<b>Sent:</b> Tuesday, June 11, 2024 6:57 AM<br>
<b>To:</b> mvapich-discuss@lists.osu.edu<br>
<b>Subject:</b> [Mvapich-discuss] Problems trying to run SparkPi example with MPI4Spark<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hi Mvapich-discuss,<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I'm trying to run the SparkPi example on my cluster using the Standalone Cluster Manager. However, the job hangs while deploying tasks to the executors, with the following message:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><i>"WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources"</i><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I have followed the steps in the user guide, so I don't know whether I did something wrong or missed a step. With the same configuration in Spark, I can run the SparkPi example without problems.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I am using MVAPICH-3.0 compiled as follows: <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">--prefix=/beegfs/home/javier.garciablas/gsotodos/bin_noref/mvapich/ --enable-threads=multiple --enable-romio --with-device=ch4:ofi:psm2 --with-libfabric=/opt/libfabric<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">And here are my configuration files:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">spark-env.sh:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">export SPARK_HOME=$HOME/mpi4spark-0.2-x86-bin<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">export SPARK_NO_DAEMONIZE=1<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$MV2J_HOME<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MV2J_HOME/lib<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">export SPARK_LIBRARY_PATH=$MV2J_HOME/lib<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">export JAVA_BINARY=$JAVA_HOME/bin<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">export WORK_DIR=$SPARK_HOME/exec-wdir<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">spark-defaults.conf:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">spark.executor.extraJavaOptions -Djava.library.path=$HOME/mvapich2-j-2.3.7/lib<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">app.sh:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">./bin/spark-submit --master spark://$1:7077 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar 1024<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">sbin/start-mpi4spark.sh:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">HOSTFILE=hostfile<br>
procs=`wc -l < ${HOSTFILE}`<br>
javac -cp $MV2J_HOME/lib/mvapich2-j.jar SparkMPI.java<br>
host=`tail -2 ${HOSTFILE} | head -1`<br>
<br>
{<br>
$MPILIB/bin/mpirun_rsh -export-all -np $procs -hostfile ${HOSTFILE} SLURM_JOB_ID=$SLURM_JOB_ID MV2_RNDV_PROTOCOL=RGET MV2_USE_RDMA_FAST_PATH=0 MV2_USE_COALESCE=0 MV2_SUPPORT_DPM=1 MV2_HOMOGENEOUS_CLUSTER=1 MV2_ENABLE_AFFINITY=0 LD_PRELOAD= $MPILIB/lib/libmpi.so
java -cp $MV2J_HOME/lib/mvapich2-j.jar:. -Djava.library.path=$MV2J_HOME/lib SparkMPI $host<br>
} >& exec.log<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">After launching sbin/start-mpi4spark.sh, the master and worker nodes stay alive, but the execution gets stuck as described above. Am I missing something? Thanks in advance for the help.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Best regards,<o:p></o:p></p>
</div>
<p class="MsoNormal">Gabriel.<o:p></o:p></p>
</div>
</body>
</html>