[Mvapich-discuss] Problems trying to run SparkPi example with MPI4Spark

GABRIEL SOTODOSOS MORALES gsotodos at pa.uc3m.es
Thu Jun 20 02:59:14 EDT 2024


Hi,

I'm having the same problems with MVAPICH2 2.3.7. I compiled this version
as follows:

--prefix=$HOME/gsotodos/bin_noref/mvapich2/ --enable-threads=multiple
--enable-romio --with-device=ch3:psm --with-libfabric=/opt/libfabric
--with-psm2=/opt/psm2/usr
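
For reference, a full build sequence with those flags would look roughly like
this (the tarball name and the -j4 parallelism are assumptions on my part; the
configure options are exactly the ones listed above):

  # assumes the mvapich2-2.3.7.tar.gz release tarball; adjust parallelism to taste
  tar xzf mvapich2-2.3.7.tar.gz && cd mvapich2-2.3.7
  ./configure --prefix=$HOME/gsotodos/bin_noref/mvapich2/ \
              --enable-threads=multiple --enable-romio \
              --with-device=ch3:psm --with-libfabric=/opt/libfabric \
              --with-psm2=/opt/psm2/usr
  make -j4 && make install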

Please let me know if I can help you in any way.

Regards,
Gabriel.

On Thu, Jun 13, 2024 at 19:06, Al Attar, Kinan via Mvapich-discuss (<
mvapich-discuss at lists.osu.edu>) wrote:

> Hi Gabriel,
>
> Can you please try running MPI4Spark with MVAPICH2 2.3.7? Please let us
> know if this version is working for you or not. Thanks.
>
> Regards,
> Kinan
>
> From: Mvapich-discuss <mvapich-discuss-bounces+shafi.16=
> osu.edu at lists.osu.edu> on behalf of GABRIEL SOTODOSOS MORALES via
> Mvapich-discuss <mvapich-discuss at lists.osu.edu>
> Date: Thursday, June 13, 2024 at 3:54 AM
> To: Paniraja Guptha, Akshay <panirajaguptha.1 at osu.edu>
> Cc: Announcement about MVAPICH (MPI over InfiniBand, RoCE, Omni-Path,
> Slingshot, iWARP and EFA) Libraries developed at NBCL/OSU <
> mvapich-discuss at lists.osu.edu>
> Subject: Re: [Mvapich-discuss] Problems trying to run SparkPi example
> with MPI4Spark
>
> Hi Akshay,
>
>
>
> Thank you so much for your help. I am testing new things with your
> library. If I can help you in any way, please let me know.
>
>
>
> I tried to start the Spark cluster with the traditional script
> (./sbin/start-all.sh) with the same result; for some reason, no workers seem
> to be available.
>
>
>
> On another note, I have downloaded the tarball you have available on the
> website; is there a public repository with the source code?
>
>
>
> Thanks again for your help. Best regards.
>
> Gabriel.
>
>
>
> On Wed, Jun 12, 2024 at 18:42, Paniraja Guptha, Akshay (<
> panirajaguptha.1 at osu.edu>) wrote:
>
> Hi Gabriel,
>
> Thanks for contacting us.
>
> We are taking a look at this. We will get back to you once we have an
> update.
>
>
>
> -Akshay
>
>
>
> From: Mvapich-discuss <mvapich-discuss-bounces+panirajaguptha.1=
> osu.edu at lists.osu.edu> On Behalf Of GABRIEL SOTODOSOS MORALES via
> Mvapich-discuss
> Sent: Tuesday, June 11, 2024 6:57 AM
> To: mvapich-discuss at lists.osu.edu
> Subject: [Mvapich-discuss] Problems trying to run SparkPi example with
> MPI4Spark
>
>
>
> Hi Mvapich-discuss,
>
>
>
> I'm trying to run the SparkPi example on my cluster using the Standalone
> Cluster Manager. However, the execution gets stuck when deploying tasks
> to the executors, with the following message:
>
>
>
> *"WARN TaskSchedulerImpl: Initial job has not accepted any resources;
> check your cluster UI to ensure that workers are registered and have
> sufficient resources"*
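>
> That warning generally means the master has no registered workers (or none
> with enough free cores and memory) to offer the application. A quick way to
> confirm what the master sees, assuming the standalone master web UI is on its
> default port 8080, is to query its JSON status endpoint from the master node:
>
>   # lists registered workers with their state, cores, and memory (port 8080 assumed)
>   curl -s http://localhost:8080/json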
>
>
>
> I have followed the steps in the user guide; I don't know if I did
> something wrong or missed something. With the same configuration in
> Spark, I can run the SparkPi example without problems.
>
>
>
> I am using MVAPICH-3.0 compiled as follows:
>
> --prefix=/beegfs/home/javier.garciablas/gsotodos/bin_noref/mvapich/
> --enable-threads=multiple --enable-romio --with-device=ch4:ofi:psm2
> --with-libfabric=/opt/libfabric
>
>
>
> And here are my configuration files:
>
> spark-env.sh:
>
> export SPARK_HOME=$HOME/mpi4spark-0.2-x86-bin
>
> export SPARK_NO_DAEMONIZE=1
>
> export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$MV2J_HOME
>
> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MV2J_HOME/lib
>
> export SPARK_LIBRARY_PATH=$MV2J_HOME/lib
>
> export JAVA_BINARY=$JAVA_HOME/bin
>
> export WORK_DIR=$SPARK_HOME/exec-wdir
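>
> Not shown in spark-env.sh is MV2J_HOME itself, which has to point at the
> MVAPICH2-J installation. A sketch of that export, assuming the same location
> used in spark-defaults.conf below:
>
> export MV2J_HOME=$HOME/mvapich2-j-2.3.7   # assumed; match the actual install path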
>
>
>
> spark-defaults.conf:
>
> spark.executor.extraJavaOptions -Djava.library.path=$HOME/mvapich2-j-2.3.7/lib
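>
> One caveat, to the best of my knowledge: spark-defaults.conf is read as a
> plain properties file, so $HOME is not shell-expanded there the way it is in
> spark-env.sh and may reach the executor JVM literally. Writing the path out
> (shown below with a placeholder) avoids the ambiguity:
>
> # placeholder path - substitute the real MVAPICH2-J lib directory
> spark.executor.extraJavaOptions -Djava.library.path=/absolute/path/to/mvapich2-j-2.3.7/lib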
>
>
>
> app.sh:
>
> ./bin/spark-submit --master spark://$1:7077 \
>   --class org.apache.spark.examples.SparkPi \
>   examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar 1024
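>
> If the "Initial job has not accepted any resources" warning is caused by a
> resource mismatch, a variant of app.sh that requests deliberately small
> executor resources can help rule that out (the flag values here are
> illustrative, not taken from the original script):
>
> # --executor-memory / --total-executor-cores values are illustrative only
> ./bin/spark-submit --master spark://$1:7077 \
>   --class org.apache.spark.examples.SparkPi \
>   --executor-memory 1g --total-executor-cores 2 \
>   examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar 1024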
>
>
>
> sbin/start-mpi4spark.sh:
>
>
>
> HOSTFILE=hostfile
> procs=`wc -l < ${HOSTFILE}`
> javac -cp $MV2J_HOME/lib/mvapich2-j.jar SparkMPI.java
> host=`tail -2 ${HOSTFILE} | head -1`
>
> {
>   # MV2_SUPPORT_DPM=1 enables MVAPICH2's dynamic process management;
>   # MV2_ENABLE_AFFINITY=0 turns off core binding so the multi-threaded JVM is
>   # not pinned to a single core; preloading libmpi.so lets the MVAPICH2-J JNI
>   # layer resolve MPI symbols inside the JVM.
>   $MPILIB/bin/mpirun_rsh -export-all -np $procs -hostfile ${HOSTFILE} \
>     SLURM_JOB_ID=$SLURM_JOB_ID MV2_RNDV_PROTOCOL=RGET MV2_USE_RDMA_FAST_PATH=0 \
>     MV2_USE_COALESCE=0 MV2_SUPPORT_DPM=1 MV2_HOMOGENEOUS_CLUSTER=1 \
>     MV2_ENABLE_AFFINITY=0 LD_PRELOAD=$MPILIB/lib/libmpi.so \
>     java -cp $MV2J_HOME/lib/mvapich2-j.jar:. -Djava.library.path=$MV2J_HOME/lib \
>     SparkMPI $host
> } >& exec.log
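>
> Everything the launcher prints ends up in exec.log because of the redirection
> above; a quick, generic scan of that file for MPI-level failures before digging
> into the Spark side is:
>
>   # generic scan; adjust the patterns as needed
>   grep -iE 'error|fail|abort' exec.log | head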
>
>
>
> After launching sbin/start-mpi4spark.sh, the master and worker nodes stay
> alive, but the execution gets stuck as described above. Am I missing
> something? Thanks in advance for the help.
>
>
>
> Best regards.
>
> Gabriel.
>
> _______________________________________________
> Mvapich-discuss mailing list
> Mvapich-discuss at lists.osu.edu
> https://lists.osu.edu/mailman/listinfo/mvapich-discuss
>