[Mvapich-discuss] Got "Initial job has not accepted any resources" when run MPI4Spark
Paniraja Guptha, Akshay
panirajaguptha.1 at osu.edu
Thu May 23 09:32:26 EDT 2024
Hi,
Thanks for contacting us. We will take a look at the issue and get back to you.
-Akshay Paniraja Guptha
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> On Behalf Of ???(??) via Mvapich-discuss
Sent: Wednesday, May 22, 2024 10:55 PM
To: mvapich-discuss <mvapich-discuss at lists.osu.edu>
Subject: [Mvapich-discuss] Got "Initial job has not accepted any resources" when run MPI4Spark
Hello, I am very interested in the MPI4Spark framework and am trying to run it on a 4-node cluster with a RoCE network. I ran the mvapich2 and mvapich2-j examples successfully. Unfortunately, I get the warning “Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources” when I run MPI4Spark in both standalone and YARN mode. I read the worker.log files and found that both workers received the job request but do not seem to launch the executor. Is this problem caused by a misconfiguration? The detailed configurations and logs are listed below. I would appreciate it if someone could help me figure it out. :)
Hostfile
mpi4spark000
mpi4spark001
mpi4spark002
mpi4spark003
spark-env.sh
export SPARK_HOME=/root/mpi4spark/mpi4spark-0.2-x86-bin
export SPARK_NO_DAEMONIZE=1
export SPARK_USE_MPI=1
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$MV2J_HOME
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MV2J_HOME/lib
export SPARK_LIBRARY_PATH=$MV2J_HOME/lib
export JAVA_BINARY=$JAVA_HOME/bin
export WORK_DIR=$SPARK_HOME/exec-wdir
spark-defaults.conf
spark.executor.extraJavaOptions -Djava.library.path=$MV2J_HOME/lib
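One thing that may be worth ruling out for the spark-defaults.conf line above: Spark reads this file as a plain properties file and, as far as I know, does not expand shell variables in it, so `$MV2J_HOME` may reach the executor JVM as a literal string. A minimal sketch that writes the path out explicitly is below. The function name and the `/opt/mvapich2-j` fallback are placeholders for illustration, not values from this setup.

```shell
# Print the spark-defaults.conf entry with the native library path spelled out.
# $1 is the MVAPICH2-J install prefix, i.e. the value $MV2J_HOME has on the nodes.
render_executor_java_opts() {
  printf 'spark.executor.extraJavaOptions -Djava.library.path=%s/lib\n' "$1"
}

# /opt/mvapich2-j is a placeholder used only when MV2J_HOME is unset.
render_executor_java_opts "${MV2J_HOME:-/opt/mvapich2-j}"
```

The printed line can replace the existing entry in conf/spark-defaults.conf on each node. Whether this is related to the warning is not established by the logs.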
app.sh
./bin/spark-submit --master spark://$1:7077 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar
Master.log
starting org.apache.spark.deploy.master.Master, logging to /root/mpi4spark/mpi4spark-0.2-x86-bin/logs/spark-root-org.apache.spark.deploy.master.Master-1-mpi4spark002.out
Spark Command: /root/mpi4spark/jdk1.8.0_321//bin/java -cp /root/mpi4spark/mpi4spark-0.2-x86-bin//conf/:/root/mpi4spark/mpi4spark-0.2-x86-bin/jars/*:/root/mpi4spark/hadoop-3.3.4//etc/hadoop/ -Xmx1g org.apache.spark.deploy.master.Master --host mpi4spark002 --port 7077 --webui-port 8080
========================================
2024-05-22 16:37:50,724 INFO master.Master: Started daemon with process name: 2621571 at mpi4spark002
2024-05-22 16:37:50,728 INFO util.SignalUtils: Registering signal handler for TERM
2024-05-22 16:37:50,728 INFO util.SignalUtils: Registering signal handler for HUP
2024-05-22 16:37:50,728 INFO util.SignalUtils: Registering signal handler for INT
2024-05-22 16:37:50,932 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-05-22 16:37:50,968 INFO spark.SecurityManager: Changing view acls to: root
2024-05-22 16:37:50,968 INFO spark.SecurityManager: Changing modify acls to: root
2024-05-22 16:37:50,968 INFO spark.SecurityManager: Changing view acls groups to:
2024-05-22 16:37:50,969 INFO spark.SecurityManager: Changing modify acls groups to:
2024-05-22 16:37:50,969 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2024-05-22 16:37:53,395 INFO nio.NioEventLoop: Starting MPI4Spark.
2024-05-22 16:37:53,495 INFO util.Utils: Successfully started service 'sparkMaster' on port 7077.
2024-05-22 16:37:53,511 INFO master.Master: Starting Spark master at spark://mpi4spark002:7077
2024-05-22 16:37:53,514 INFO master.Master: Running Spark version 3.3.0-SNAPSHOT
2024-05-22 16:37:53,534 INFO util.log: Logging initialized @3209ms to org.sparkproject.jetty.util.log.Slf4jLog
2024-05-22 16:37:53,567 INFO server.Server: jetty-9.4.43.v20210629; built: 2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm 1.8.0_321-b07
2024-05-22 16:37:53,579 INFO server.Server: Started @3255ms
2024-05-22 16:37:53,605 INFO server.AbstractConnector: Started ServerConnector@2efb59fc{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2024-05-22 16:37:53,605 INFO util.Utils: Successfully started service 'MasterUI' on port 8080.
2024-05-22 16:37:53,606 INFO ui.MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://mpi4spark002:-1
2024-05-22 16:37:53,620 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1332e5ce{/app,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,621 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1f910681{/app/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,622 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2a186645{/,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,623 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@772642ad{/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,628 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@751cafb2{/static,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,628 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b2adb3b{/app/kill,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,629 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1d5107f3{/driver/kill,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,630 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7e39159e{/workers/kill,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,715 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@64ff458a{/metrics/master/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,716 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5b0b91df{/metrics/applications/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,725 INFO master.Master: I have been elected leader! New state: ALIVE
2024-05-22 16:37:53,826 INFO master.Master: Registering worker 200.2.149.6:46337 with 64 cores, 502.3 GiB RAM
2024-05-22 16:37:53,831 INFO master.Master: Registering worker 200.2.149.2:40247 with 64 cores, 502.3 GiB RAM
2024-05-22 16:37:53,906 INFO master.Master: Registering app Spark Pi
2024-05-22 16:37:53,908 INFO master.Master: Registered app Spark Pi with ID app-20240522163753-0000
2024-05-22 16:37:53,923 INFO master.Master: Launching executor app-20240522163753-0000/0 on worker worker-20240522163753-200.2.149.2-40247
2024-05-22 16:37:53,927 INFO master.Master: Launching executor app-20240522163753-0000/1 on worker worker-20240522163753-200.2.149.6-46337
Worker0.log
starting org.apache.spark.deploy.worker.Worker, logging to /root/mpi4spark/mpi4spark-0.2-x86-bin/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-mpi4spark000.out
Spark Command: /root/mpi4spark/jdk1.8.0_321//bin/java -cp /root/mpi4spark/mpi4spark-0.2-x86-bin//conf/:/root/mpi4spark/mpi4spark-0.2-x86-bin/jars/*:/root/mpi4spark/hadoop-3.3.4//etc/hadoop/ -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://mpi4spark002:7077
========================================
2024-05-22 16:37:51,718 INFO worker.Worker: Started daemon with process name: 2807211 at mpi4spark000
2024-05-22 16:37:51,721 INFO util.SignalUtils: Registering signal handler for TERM
2024-05-22 16:37:51,722 INFO util.SignalUtils: Registering signal handler for HUP
2024-05-22 16:37:51,722 INFO util.SignalUtils: Registering signal handler for INT
2024-05-22 16:37:51,931 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-05-22 16:37:51,966 INFO spark.SecurityManager: Changing view acls to: root
2024-05-22 16:37:51,966 INFO spark.SecurityManager: Changing modify acls to: root
2024-05-22 16:37:51,966 INFO spark.SecurityManager: Changing view acls groups to:
2024-05-22 16:37:51,967 INFO spark.SecurityManager: Changing modify acls groups to:
2024-05-22 16:37:51,967 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
[mpi4spark000:mpi_rank_0][MPID_Init] [Performance Suggestion]: Application has requested for multi-thread capability. If allocating memory from different pthreads/OpenMP threads, please consider setting MV2_USE_ALIGNED_ALLOC=1 for improved performance.
Use MV2_USE_THREAD_WARNING=0 to suppress this error message.
2024-05-22 16:37:53,396 INFO nio.NioEventLoop: Starting MPI4Spark.
2024-05-22 16:37:53,496 INFO util.Utils: Successfully started service 'sparkWorker' on port 40247.
2024-05-22 16:37:53,497 INFO worker.Worker: Worker decommissioning not enabled.
2024-05-22 16:37:53,601 INFO worker.Worker: Starting Spark worker 200.2.149.2:40247 with 64 cores, 502.3 GiB RAM
2024-05-22 16:37:53,604 INFO worker.Worker: Running Spark version 3.3.0-SNAPSHOT
2024-05-22 16:37:53,605 INFO worker.Worker: Spark home: /root/mpi4spark/mpi4spark-0.2-x86-bin
2024-05-22 16:37:53,616 INFO resource.ResourceUtils: ==============================================================
2024-05-22 16:37:53,616 INFO resource.ResourceUtils: No custom resources configured for spark.worker.
2024-05-22 16:37:53,616 INFO resource.ResourceUtils: ==============================================================
2024-05-22 16:37:53,635 INFO util.log: Logging initialized @2311ms to org.sparkproject.jetty.util.log.Slf4jLog
2024-05-22 16:37:53,667 INFO server.Server: jetty-9.4.43.v20210629; built: 2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm 1.8.0_321-b07
2024-05-22 16:37:53,678 INFO server.Server: Started @2355ms
2024-05-22 16:37:53,708 INFO server.AbstractConnector: Started ServerConnector@41cf10a2{HTTP/1.1, (http/1.1)}{0.0.0.0:8081}
2024-05-22 16:37:53,708 INFO util.Utils: Successfully started service 'WorkerUI' on port 8081.
2024-05-22 16:37:53,710 INFO ui.WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://mpi4spark000:-1
2024-05-22 16:37:53,722 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@8e107a9{/logPage,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,724 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4375cbe0{/logPage/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,724 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5e18e1c2{/,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,726 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6ef7379{/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,731 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@548ada47{/static,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,731 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a02c874{/log,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,732 INFO worker.Worker: Connecting to master mpi4spark002:7077...
2024-05-22 16:37:53,742 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73e20207{/metrics/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,775 INFO client.TransportClientFactory: Successfully created connection to mpi4spark002/200.2.151.2:7077 after 31 ms (0 ms spent in bootstraps)
2024-05-22 16:37:53,837 INFO worker.Worker: Successfully registered with master spark://mpi4spark002:7077
2024-05-22 16:37:53,944 INFO worker.Worker: Asked to launch executor app-20240522163753-0000/0 for Spark Pi
2024-05-22 16:37:53,959 INFO spark.SecurityManager: Changing view acls to: root
2024-05-22 16:37:53,959 INFO spark.SecurityManager: Changing modify acls to: root
2024-05-22 16:37:53,959 INFO spark.SecurityManager: Changing view acls groups to:
2024-05-22 16:37:53,960 INFO spark.SecurityManager: Changing modify acls groups to:
2024-05-22 16:37:53,960 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
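A side observation from the logs above: the workers register from 200.2.149.2 and 200.2.149.6, while the connection to the master goes to mpi4spark002/200.2.151.2. Whether these addresses sit on the same network depends on the netmask, which the logs do not show. The sketch below only compares the first three octets, i.e. it assumes a /24 mask, and both function names are made up for illustration.

```shell
# Return the first three octets of an IPv4 address.
# Assumes a /24 netmask; the real netmask is not visible in the logs.
prefix24() {
  echo "${1%.*}"
}

# Succeed when two addresses share the same first three octets.
same_prefix24() {
  [ "$(prefix24 "$1")" = "$(prefix24 "$2")" ]
}

# Addresses copied from the worker and master log lines above.
if same_prefix24 200.2.149.2 200.2.151.2; then
  echo "same /24"
else
  echo "different /24"
fi
```

If the nodes turn out to be multi-homed, stock Spark lets each daemon choose its bind address through SPARK_LOCAL_IP in spark-env.sh. Whether MPI4Spark needs every process on the same interface is not something these logs settle.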
Worker1.log
starting org.apache.spark.deploy.worker.Worker, logging to /root/mpi4spark/mpi4spark-0.2-x86-bin/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-mpi4spark001.out
Spark Command: /root/mpi4spark/jdk1.8.0_321//bin/java -cp /root/mpi4spark/mpi4spark-0.2-x86-bin//conf/:/root/mpi4spark/mpi4spark-0.2-x86-bin/jars/*:/root/mpi4spark/hadoop-3.3.4//etc/hadoop/ -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://mpi4spark002:7077
========================================
2024-05-22 16:37:51,720 INFO worker.Worker: Started daemon with process name: 2883486 at mpi4spark001
2024-05-22 16:37:51,723 INFO util.SignalUtils: Registering signal handler for TERM
2024-05-22 16:37:51,724 INFO util.SignalUtils: Registering signal handler for HUP
2024-05-22 16:37:51,724 INFO util.SignalUtils: Registering signal handler for INT
2024-05-22 16:37:51,936 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-05-22 16:37:51,971 INFO spark.SecurityManager: Changing view acls to: root
2024-05-22 16:37:51,971 INFO spark.SecurityManager: Changing modify acls to: root
2024-05-22 16:37:51,972 INFO spark.SecurityManager: Changing view acls groups to:
2024-05-22 16:37:51,972 INFO spark.SecurityManager: Changing modify acls groups to:
2024-05-22 16:37:51,972 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2024-05-22 16:37:53,397 INFO nio.NioEventLoop: Starting MPI4Spark.
2024-05-22 16:37:53,506 INFO util.Utils: Successfully started service 'sparkWorker' on port 46337.
2024-05-22 16:37:53,507 INFO worker.Worker: Worker decommissioning not enabled.
2024-05-22 16:37:53,616 INFO worker.Worker: Starting Spark worker 200.2.149.6:46337 with 64 cores, 502.3 GiB RAM
2024-05-22 16:37:53,619 INFO worker.Worker: Running Spark version 3.3.0-SNAPSHOT
2024-05-22 16:37:53,620 INFO worker.Worker: Spark home: /root/mpi4spark/mpi4spark-0.2-x86-bin
2024-05-22 16:37:53,632 INFO resource.ResourceUtils: ==============================================================
2024-05-22 16:37:53,632 INFO resource.ResourceUtils: No custom resources configured for spark.worker.
2024-05-22 16:37:53,632 INFO resource.ResourceUtils: ==============================================================
2024-05-22 16:37:53,651 INFO util.log: Logging initialized @2328ms to org.sparkproject.jetty.util.log.Slf4jLog
2024-05-22 16:37:53,685 INFO server.Server: jetty-9.4.43.v20210629; built: 2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm 1.8.0_321-b07
2024-05-22 16:37:53,696 INFO server.Server: Started @2373ms
2024-05-22 16:37:53,723 INFO server.AbstractConnector: Started ServerConnector@679db676{HTTP/1.1, (http/1.1)}{0.0.0.0:8081}
2024-05-22 16:37:53,723 INFO util.Utils: Successfully started service 'WorkerUI' on port 8081.
2024-05-22 16:37:53,724 INFO ui.WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://mpi4spark001:-1
2024-05-22 16:37:53,737 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@648bb0dd{/logPage,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,738 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@15e3912{/logPage/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,739 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5f9f4fef{/,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,740 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@50387b20{/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,745 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1117f451{/static,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,746 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@35f11bf7{/log,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,747 INFO worker.Worker: Connecting to master mpi4spark002:7077...
2024-05-22 16:37:53,757 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@113317dd{/metrics/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,785 INFO client.TransportClientFactory: Successfully created connection to mpi4spark002/200.2.151.2:7077 after 25 ms (0 ms spent in bootstraps)
2024-05-22 16:37:53,835 INFO worker.Worker: Successfully registered with master spark://mpi4spark002:7077
2024-05-22 16:37:53,946 INFO worker.Worker: Asked to launch executor app-20240522163753-0000/1 for Spark Pi
2024-05-22 16:37:53,961 INFO spark.SecurityManager: Changing view acls to: root
2024-05-22 16:37:53,961 INFO spark.SecurityManager: Changing modify acls to: root
2024-05-22 16:37:53,961 INFO spark.SecurityManager: Changing view acls groups to:
2024-05-22 16:37:53,961 INFO spark.SecurityManager: Changing modify acls groups to:
2024-05-22 16:37:53,961 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
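Both worker excerpts above end right after "Asked to launch executor" and the SecurityManager lines. In stock Apache Spark, the worker's ExecutorRunner normally logs a "Launch command: ..." line next. Whether MPI4Spark 0.2 prints the same line is an assumption here. Under that assumption, a small helper (hypothetical name) can report per log file how many launch requests were received and how many launch commands followed:

```shell
# Count executor launch requests and launch commands in one worker log.
# 'Launch command' is the line stock Spark's ExecutorRunner logs;
# MPI4Spark may log something different (assumption).
executor_launch_status() {
  log_file="$1"
  # grep -c prints 0 but exits non-zero when nothing matches, hence '|| true'.
  asked=$(grep -c 'Asked to launch executor' "$log_file" || true)
  launched=$(grep -c 'Launch command' "$log_file" || true)
  echo "asked=${asked} launched=${launched}"
}

# Example use against the worker logs under $SPARK_HOME/logs:
#   for f in "$SPARK_HOME"/logs/*Worker*.out; do executor_launch_status "$f"; done
```

A result such as `asked=1 launched=0` would match the behaviour described in the question, namely that the request arrives but no executor process is started.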
app.log
2024-05-22 16:37:53,006 INFO spark.SparkContext: Running Spark version 3.3.0-SNAPSHOT
2024-05-22 16:37:53,043 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-05-22 16:37:53,116 INFO resource.ResourceUtils: ==============================================================
2024-05-22 16:37:53,116 INFO resource.ResourceUtils: No custom resources configured for spark.driver.
2024-05-22 16:37:53,117 INFO resource.ResourceUtils: ==============================================================
2024-05-22 16:37:53,117 INFO spark.SparkContext: Submitted application: Spark Pi
2024-05-22 16:37:53,133 INFO resource.ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
2024-05-22 16:37:53,141 INFO resource.ResourceProfile: Limiting resource is cpu
2024-05-22 16:37:53,141 INFO resource.ResourceProfileManager: Added ResourceProfile id: 0
2024-05-22 16:37:53,173 INFO spark.SecurityManager: Changing view acls to: root
2024-05-22 16:37:53,174 INFO spark.SecurityManager: Changing modify acls to: root
2024-05-22 16:37:53,174 INFO spark.SecurityManager: Changing view acls groups to:
2024-05-22 16:37:53,174 INFO spark.SecurityManager: Changing modify acls groups to:
2024-05-22 16:37:53,174 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2024-05-22 16:37:53,395 INFO nio.NioEventLoop: Starting MPI4Spark.
2024-05-22 16:37:53,462 INFO util.Utils: Successfully started service 'sparkDriver' on port 45597.
2024-05-22 16:37:53,478 INFO spark.SparkEnv: Registering MapOutputTracker
2024-05-22 16:37:53,498 INFO spark.SparkEnv: Registering BlockManagerMaster
2024-05-22 16:37:53,509 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2024-05-22 16:37:53,510 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
2024-05-22 16:37:53,512 INFO spark.SparkEnv: Registering BlockManagerMasterHeartbeat
2024-05-22 16:37:53,524 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-4d5b7f26-5a1f-4c8c-9b8c-f60132a6f261
2024-05-22 16:37:53,538 INFO memory.MemoryStore: MemoryStore started with capacity 408.9 MiB
2024-05-22 16:37:53,548 INFO spark.SparkEnv: Registering OutputCommitCoordinator
2024-05-22 16:37:53,571 INFO util.log: Logging initialized @1227ms to org.sparkproject.jetty.util.log.Slf4jLog
2024-05-22 16:37:53,641 INFO server.Server: jetty-9.4.43.v20210629; built: 2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm 1.8.0_321-b07
2024-05-22 16:37:53,654 INFO server.Server: Started @1311ms
2024-05-22 16:37:53,680 INFO server.AbstractConnector: Started ServerConnector@5225305d{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2024-05-22 16:37:53,680 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
2024-05-22 16:37:53,681 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://mpi4spark003:-1
2024-05-22 16:37:53,694 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3fcdcf{/,null,AVAILABLE,@Spark}
2024-05-22 16:37:53,703 INFO spark.SparkContext: Added JAR file:/root/mpi4spark/mpi4spark-0.2-x86-bin/examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar at spark://mpi4spark003:45597/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar with timestamp 1716367073001
2024-05-22 16:37:53,818 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://mpi4spark002:7077...
2024-05-22 16:37:53,853 INFO client.TransportClientFactory: Successfully created connection to mpi4spark002/200.2.151.2:7077 after 24 ms (0 ms spent in bootstraps)
2024-05-22 16:37:53,912 INFO cluster.StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20240522163753-0000
2024-05-22 16:37:53,915 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36911.
2024-05-22 16:37:53,916 INFO netty.NettyBlockTransferService: Server created on mpi4spark003:36911
2024-05-22 16:37:53,916 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2024-05-22 16:37:53,920 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, mpi4spark003, 36911, None)
2024-05-22 16:37:53,922 INFO storage.BlockManagerMasterEndpoint: Registering block manager mpi4spark003:36911 with 408.9 MiB RAM, BlockManagerId(driver, mpi4spark003, 36911, None)
2024-05-22 16:37:53,923 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, mpi4spark003, 36911, None)
2024-05-22 16:37:53,924 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, mpi4spark003, 36911, None)
2024-05-22 16:37:53,929 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20240522163753-0000/0 on worker-20240522163753-200.2.149.2-40247 (200.2.149.2:40247) with 64 core(s)
2024-05-22 16:37:53,930 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20240522163753-0000/0 on hostPort 200.2.149.2:40247 with 64 core(s), 1024.0 MiB RAM
2024-05-22 16:37:53,930 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20240522163753-0000/1 on worker-20240522163753-200.2.149.6-46337 (200.2.149.6:46337) with 64 core(s)
2024-05-22 16:37:54,020 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3fcdcf{/,null,STOPPED,@Spark}
2024-05-22 16:37:54,021 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28d79cba{/jobs,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,021 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@29f0c4f2{/jobs/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,022 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@13047d7d{/jobs/job,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,022 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@65bb9029{/jobs/job/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,022 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2b214b94{/stages,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,023 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49601f82{/stages/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,023 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@24fabd0f{/stages/stage,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,024 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@61f3fbb8{/stages/stage/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,024 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@432034a{/stages/pool,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,025 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@60e5272{/stages/pool/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,025 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@69c93ca4{/storage,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,025 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@173373b4{/storage/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,026 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@60dd3c23{/storage/rdd,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,026 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5e9456ae{/storage/rdd/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,026 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1f1cae23{/environment,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,027 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@985696{/environment/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,027 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@215a34b4{/executors,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,027 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@35d3ab60{/executors/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,028 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@71870da7{/executors/threadDump,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,028 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@45792847{/executors/threadDump/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,034 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4e25147a{/static,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,034 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5f303ecd{/,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,035 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@25a73de1{/api,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,035 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@260f2144{/jobs/job/kill,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,036 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@51827393{/stages/stage/kill,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,038 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7ec58feb{/metrics/json,null,AVAILABLE,@Spark}
2024-05-22 16:37:54,038 INFO cluster.StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
2024-05-22 16:37:54,302 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
2024-05-22 16:37:54,311 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
2024-05-22 16:37:54,311 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
2024-05-22 16:37:54,311 INFO scheduler.DAGScheduler: Parents of final stage: List()
2024-05-22 16:37:54,312 INFO scheduler.DAGScheduler: Missing parents: List()
2024-05-22 16:37:54,314 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
2024-05-22 16:37:54,349 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.0 KiB, free 408.9 MiB)
2024-05-22 16:37:54,365 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.3 KiB, free 408.9 MiB)
2024-05-22 16:37:54,367 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on mpi4spark003:36911 (size: 2.3 KiB, free: 408.9 MiB)
2024-05-22 16:37:54,368 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1429
2024-05-22 16:37:54,376 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
2024-05-22 16:37:54,377 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks resource profile 0
2024-05-22 16:38:09,388 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2024-05-22 16:38:24,387 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2024-05-22 16:38:39,387 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2024-05-22 16:38:54,388 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2024-05-22 16:39:09,387 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2024-05-22 16:39:24,387 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2024-05-22 16:39:39,387 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2024-05-22 16:39:54,387 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2024-05-22 16:40:09,387 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2024-05-22 16:40:24,387 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2024-05-22 16:40:39,387 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2024-05-22 16:40:54,387 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2024-05-22 16:41:09,387 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2024-05-22 16:41:24,388 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2024-05-22 16:41:39,387 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources