[Mvapich-discuss] Problems trying to run SparkPi example with MPI4Spark
Al Attar, Kinan
alattar.2 at buckeyemail.osu.edu
Thu Jun 13 13:05:44 EDT 2024
Hi Gabriel,
Could you please try running MPI4Spark with MVAPICH2 2.3.7? Please let us know whether that version works for you. Thanks.
Regards,
Kinan
From: Mvapich-discuss <mvapich-discuss-bounces+shafi.16=osu.edu at lists.osu.edu> on behalf of GABRIEL SOTODOSOS MORALES via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Date: Thursday, June 13, 2024 at 3:54 AM
To: Paniraja Guptha, Akshay <panirajaguptha.1 at osu.edu>
Cc: Announcement about MVAPICH (MPI over InfiniBand, RoCE, Omni-Path, Slingshot, iWARP and EFA) Libraries developed at NBCL/OSU <mvapich-discuss at lists.osu.edu>
Subject: Re: [Mvapich-discuss] Problems trying to run SparkPi example with MPI4Spark
Hi Akshay,
Thank you so much for your help. I am testing new things with your library. If I can help you in any way, please let me know.
I tried to start the Spark cluster with the traditional script (./sbin/start-all.sh), with the same result: for some reason, no workers seem to be available.
On another note, I have downloaded the tarball available on your website. Do you have a public repository that hosts the source code?
Thanks again for your help. Best regards.
Gabriel.
On Wed, Jun 12, 2024 at 18:42, Paniraja Guptha, Akshay (<panirajaguptha.1 at osu.edu>) wrote:
Hi Gabriel,
Thanks for contacting us.
We are taking a look at this. We will get back to you once we have an update.
-Akshay
From: Mvapich-discuss <mvapich-discuss-bounces+panirajaguptha.1=osu.edu at lists.osu.edu> On Behalf Of GABRIEL SOTODOSOS MORALES via Mvapich-discuss
Sent: Tuesday, June 11, 2024 6:57 AM
To: mvapich-discuss at lists.osu.edu
Subject: [Mvapich-discuss] Problems trying to run SparkPi example with MPI4Spark
Hi Mvapich-discuss,
I'm trying to run the SparkPi example on my cluster using the Standalone Cluster Manager. However, the job gets stuck when deploying tasks to the executors, with the following message:
"WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources"
I have followed the steps in the user guide, so I don't know whether I did something wrong or missed a step. With the same configuration in Spark, I can run the SparkPi example without problems.
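The warning points at worker registration, which can also be checked from the command line instead of the cluster UI. The following is a minimal sketch, not a confirmed procedure: it assumes the standalone master's web UI is reachable on its default port 8080 and serves its state as JSON at /json/ with an "aliveworkers" field. MASTER_HOST and the helper name alive_workers are placeholders introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read master-state JSON on stdin and print the
# value of its "aliveworkers" field (assumed field name).
alive_workers() {
    # Strip all whitespace so the JSON is a single line, then pull out
    # the digits that follow the "aliveworkers" key.
    tr -d ' \n\t' | sed -n 's/.*"aliveworkers":\([0-9][0-9]*\).*/\1/p'
}

# Usage against a live master (MASTER_HOST is a placeholder):
#   curl -s "http://MASTER_HOST:8080/json/" | alive_workers
```

A result of 0 would be consistent with the "Initial job has not accepted any resources" warning, since no worker is available to accept tasks.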
I am using MVAPICH-3.0 compiled as follows:
--prefix=/beegfs/home/javier.garciablas/gsotodos/bin_noref/mvapich/ --enable-threads=multiple --enable-romio --with-device=ch4:ofi:psm2 --with-libfabric=/opt/libfabric
And here are my configuration files:
spark-env.sh:
export SPARK_HOME=$HOME/mpi4spark-0.2-x86-bin
export SPARK_NO_DAEMONIZE=1
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$MV2J_HOME
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MV2J_HOME/lib
export SPARK_LIBRARY_PATH=$MV2J_HOME/lib
export JAVA_BINARY=$JAVA_HOME/bin
export WORK_DIR=$SPARK_HOME/exec-wdir
spark-defaults.conf:
spark.executor.extraJavaOptions -Djava.library.path=$HOME/mvapich2-j-2.3.7/lib
app.sh:
./bin/spark-submit --master spark://$1:7077 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar 1024
sbin/start-mpi4spark.sh:
HOSTFILE=hostfile
procs=`wc -l < ${HOSTFILE}`
javac -cp $MV2J_HOME/lib/mvapich2-j.jar SparkMPI.java
host=`tail -2 ${HOSTFILE} | head -1`
{
$MPILIB/bin/mpirun_rsh -export-all -np $procs -hostfile ${HOSTFILE} SLURM_JOB_ID=$SLURM_JOB_ID MV2_RNDV_PROTOCOL=RGET MV2_USE_RDMA_FAST_PATH=0 MV2_USE_COALESCE=0 MV2_SUPPORT_DPM=1 MV2_HOMOGENEOUS_CLUSTER=1 MV2_ENABLE_AFFINITY=0 LD_PRELOAD=$MPILIB/lib/libmpi.so java -cp $MV2J_HOME/lib/mvapich2-j.jar:. -Djava.library.path=$MV2J_HOME/lib SparkMPI $host
} >& exec.log
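The two values that the launch script derives from the hostfile can be checked in isolation. This is a minimal sketch, assuming a hostfile with one hostname per line; the hostnames node0 to node3 are hypothetical, and the temporary file stands in for the real hostfile.

```shell
#!/bin/sh
# Build a sample hostfile; node0..node3 are hypothetical hostnames.
HOSTFILE=$(mktemp)
printf 'node0\nnode1\nnode2\nnode3\n' > "$HOSTFILE"

# Number of MPI processes: one per hostfile line.
# tr removes the padding that some wc implementations add.
procs=$(wc -l < "$HOSTFILE" | tr -d ' ')

# Host argument passed to SparkMPI: the second-to-last hostfile line
# (same selection as `tail -2 | head -1` in the script).
host=$(tail -n 2 "$HOSTFILE" | head -n 1)

echo "procs=$procs host=$host"
rm -f "$HOSTFILE"
```

With this sample file the script prints procs=4 host=node2. Note that wc -l counts newline characters, so a hostfile whose last line lacks a trailing newline yields a process count one lower than the number of hosts.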
After launching sbin/start-mpi4spark.sh, the master and worker nodes stay alive, but the execution gets stuck as described above. Am I missing something? Thanks in advance for your help.
Best regards.
Gabriel.