[Mvapich-discuss] Problems trying to run SparkPi example with MPI4Spark
Paniraja Guptha, Akshay
panirajaguptha.1 at osu.edu
Wed Jun 12 12:41:59 EDT 2024
Hi Gabriel,
Thanks for contacting us.
We are taking a look at this. We will get back to you once we have an update.
-Akshay
From: Mvapich-discuss <mvapich-discuss-bounces+panirajaguptha.1=osu.edu at lists.osu.edu> On Behalf Of GABRIEL SOTODOSOS MORALES via Mvapich-discuss
Sent: Tuesday, June 11, 2024 6:57 AM
To: mvapich-discuss at lists.osu.edu
Subject: [Mvapich-discuss] Problems trying to run SparkPi example with MPI4Spark
Hi Mvapich-discuss,
I'm trying to run the SparkPi example on my cluster using the Standalone Cluster Manager. However, the job gets stuck when deploying tasks to the executors, with the following message:
"WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources"
I have followed the steps in the user guide, but I don't know if I did something wrong or missed something. With the same configuration, I can run the SparkPi example on plain Spark without problems.
I am using MVAPICH 3.0, built with the following configure options:
--prefix=/beegfs/home/javier.garciablas/gsotodos/bin_noref/mvapich/ --enable-threads=multiple --enable-romio --with-device=ch4:ofi:psm2 --with-libfabric=/opt/libfabric
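In case it helps, this is how I double-check which build is actually being picked up (assuming the MPICH-derived mpichversion tool is installed under the same prefix):

# Verify the install prefix and configure options of the build in use
/beegfs/home/javier.garciablas/gsotodos/bin_noref/mvapich/bin/mpichversion
# Make sure the mpirun_rsh on PATH comes from the same prefix
which mpirun_rsh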
And here are my configuration files:
spark-env.sh:
export SPARK_HOME=$HOME/mpi4spark-0.2-x86-bin
export SPARK_NO_DAEMONIZE=1
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$MV2J_HOME
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MV2J_HOME/lib
export SPARK_LIBRARY_PATH=$MV2J_HOME/lib
export JAVA_BINARY=$JAVA_HOME/bin
export WORK_DIR=$SPARK_HOME/exec-wdir
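Note that my spark-env.sh assumes MV2J_HOME and MPILIB are already exported; I set them beforehand like this (both paths come from my install above and the spark-defaults.conf below):

# Exported before sourcing spark-env.sh
export MV2J_HOME=$HOME/mvapich2-j-2.3.7
export MPILIB=/beegfs/home/javier.garciablas/gsotodos/bin_noref/mvapich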
spark-defaults.conf:
spark.executor.extraJavaOptions -Djava.library.path=$HOME/mvapich2-j-2.3.7/lib
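Since spark-defaults.conf is not shell-expanded, I am not sure $HOME is resolved there; a variant with the path written out, assuming my home directory is /beegfs/home/javier.garciablas/gsotodos:

# Same setting with an absolute path, in case $HOME is taken literally
spark.executor.extraJavaOptions -Djava.library.path=/beegfs/home/javier.garciablas/gsotodos/mvapich2-j-2.3.7/lib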
app.sh:
./bin/spark-submit --master spark://$1:7077 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar 1024
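I launch it with the Spark master's hostname as the only argument (the node name below is a placeholder):

# $1 becomes the Spark master host in spark://$1:7077
./app.sh node01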
sbin/start-mpi4spark.sh:
HOSTFILE=hostfile
# One MPI process per line in the hostfile
procs=$(wc -l < ${HOSTFILE})
javac -cp $MV2J_HOME/lib/mvapich2-j.jar SparkMPI.java
# Use the second-to-last host in the hostfile
host=$(tail -2 ${HOSTFILE} | head -1)
{
$MPILIB/bin/mpirun_rsh -export-all -np $procs -hostfile ${HOSTFILE} \
    SLURM_JOB_ID=$SLURM_JOB_ID MV2_RNDV_PROTOCOL=RGET MV2_USE_RDMA_FAST_PATH=0 \
    MV2_USE_COALESCE=0 MV2_SUPPORT_DPM=1 MV2_HOMOGENEOUS_CLUSTER=1 \
    MV2_ENABLE_AFFINITY=0 LD_PRELOAD=$MPILIB/lib/libmpi.so \
    java -cp $MV2J_HOME/lib/mvapich2-j.jar:. -Djava.library.path=$MV2J_HOME/lib \
    SparkMPI $host
} >& exec.log
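For reference, the hostfile contains one hostname per line (the names below are placeholders); the script launches one MPI process per line and hands the second-to-last host to SparkMPI:

# hostfile: one host per line
node01
node02
node03
node04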
After launching sbin/start-mpi4spark.sh, the master and worker nodes stay alive, but the execution gets stuck as described above. Am I missing something? Thanks in advance for your help.
Best regards,
Gabriel.