[Mvapich-discuss] Problems trying to run SparkPi example with MPI4Spark
GABRIEL SOTODOSOS MORALES
gsotodos at pa.uc3m.es
Thu Jun 13 03:54:02 EDT 2024
Hi Akshay,
Thank you so much for your help. I am testing new things with your library.
If I can help you in any way, please let me know.
I tried to start the Spark cluster with the traditional script
(./sbin/start-all.sh) with the same result; for some reason, no workers seem
to be available.
On the other hand, I have downloaded the tarball available on the
website; do you have a public repository where the source code is hosted?
Thanks again for your help. Best regards.
Gabriel.
On Wed, 12 Jun 2024 at 18:42, Paniraja Guptha, Akshay (<
panirajaguptha.1 at osu.edu>) wrote:
> Hi Gabriel,
>
> Thanks for contacting us.
>
> We are taking a look at this. We will get back to you once we have an
> update.
>
>
>
> -Akshay
>
>
>
> *From:* Mvapich-discuss <mvapich-discuss-bounces+panirajaguptha.1=
> osu.edu at lists.osu.edu> *On Behalf Of *GABRIEL SOTODOSOS MORALES via
> Mvapich-discuss
> *Sent:* Tuesday, June 11, 2024 6:57 AM
> *To:* mvapich-discuss at lists.osu.edu
> *Subject:* [Mvapich-discuss] Problems trying to run SparkPi example with
> MPI4Spark
>
>
>
> Hi Mvapich-discuss,
>
>
>
> I'm trying to run the SparkPi example in my cluster using the Standalone
> Cluster Manager. However, my application gets stuck when deploying tasks
> to the executors, with the following message:
>
>
>
> *"WARN TaskSchedulerImpl: Initial job has not accepted any resources;
> check your cluster UI to ensure that workers are registered and have
> sufficient resources"*
>
>
>
> I have followed the steps in the user guide; I don't know if I did
> something wrong or missed something. With the same configuration in
> Spark, I can run the SparkPi example without problems.
>
>
>
> I am using MVAPICH-3.0 compiled as follows:
>
> --prefix=/beegfs/home/javier.garciablas/gsotodos/bin_noref/mvapich/
> --enable-threads=multiple --enable-romio --with-device=ch4:ofi:psm2
> --with-libfabric=/opt/libfabric
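>
> For reference, those flags correspond to a configure invocation roughly like
> the following (the prefix and libfabric paths are specific to my cluster and
> will differ on other systems):

```shell
# Sketch of the MVAPICH-3.0 build with the flags quoted above;
# paths are placeholders from my own layout, not a recommendation.
./configure --prefix=/beegfs/home/javier.garciablas/gsotodos/bin_noref/mvapich/ \
            --enable-threads=multiple --enable-romio \
            --with-device=ch4:ofi:psm2 \
            --with-libfabric=/opt/libfabric
make -j && make install
```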
>
>
>
> And here are my configuration files:
>
> spark-env.sh:
>
> export SPARK_HOME=$HOME/mpi4spark-0.2-x86-bin
>
> export SPARK_NO_DAEMONIZE=1
>
> export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$MV2J_HOME
>
> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MV2J_HOME/lib
>
> export SPARK_LIBRARY_PATH=$MV2J_HOME/lib
>
> export JAVA_BINARY=$JAVA_HOME/bin
>
> export WORK_DIR=$SPARK_HOME/exec-wdir
>
>
>
> spark-defaults.conf:
>
> spark.executor.extraJavaOptions
> -Djava.library.path=$HOME/mvapich2-j-2.3.7/lib
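>
> One caveat I am not certain applies here, but which seems worth noting:
> spark-defaults.conf is read as plain Java properties, so shell variables such
> as $HOME are not expanded there, and on a standalone cluster the executor JVM
> may receive the literal string. Spelling the path out absolutely avoids the
> ambiguity (the path below is only a placeholder):

```
# Placeholder path; substitute the real absolute location of mvapich2-j.
spark.executor.extraJavaOptions  -Djava.library.path=/path/to/mvapich2-j-2.3.7/lib
```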
>
>
>
> app.sh:
>
> ./bin/spark-submit --master spark://$1:7077 --class
> org.apache.spark.examples.SparkPi
> examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar 1024
>
>
>
> sbin/start-mpi4spark.sh:
>
>
>
> HOSTFILE=hostfile
> procs=$(wc -l < ${HOSTFILE})
> javac -cp $MV2J_HOME/lib/mvapich2-j.jar SparkMPI.java
> host=$(tail -2 ${HOSTFILE} | head -1)
>
> {
> $MPILIB/bin/mpirun_rsh -export-all -np $procs -hostfile ${HOSTFILE}
> SLURM_JOB_ID=$SLURM_JOB_ID MV2_RNDV_PROTOCOL=RGET MV2_USE_RDMA_FAST_PATH=0
> MV2_USE_COALESCE=0 MV2_SUPPORT_DPM=1 MV2_HOMOGENEOUS_CLUSTER=1
> MV2_ENABLE_AFFINITY=0 LD_PRELOAD= $MPILIB/lib/libmpi.so java -cp
> $MV2J_HOME/lib/mvapich2-j.jar:. -Djava.library.path=$MV2J_HOME/lib SparkMPI
> $host
> } >& exec.log
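>
> To illustrate what the two command substitutions in the script compute, here
> is a minimal standalone sketch using a made-up four-entry hostfile (the node
> names are hypothetical):

```shell
# Hypothetical hostfile mirroring the logic in start-mpi4spark.sh.
printf 'node1\nnode2\nnode3\nnode4\n' > /tmp/hostfile

procs=$(wc -l < /tmp/hostfile)           # one MPI rank per hostfile line -> 4
host=$(tail -2 /tmp/hostfile | head -1)  # second-to-last entry -> node3

echo "procs=$procs host=$host"
```

The `tail -2 | head -1` pair picks the second-to-last host in the file, which
the script then passes to SparkMPI.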
>
>
>
> After launching sbin/start-mpi4spark.sh, the master and worker nodes stay
> alive, but the execution gets stuck as described above. Am I missing
> something? Thanks in advance for your help.
>
>
>
> Best regards.
>
> Gabriel.
>