[Mvapich-discuss] Troubles running the examples

Panda, Dhabaleswar panda at cse.ohio-state.edu
Sun Sep 24 10:36:26 EDT 2023


Hi,

Thanks for your note. Sorry to hear that you are experiencing issues here.

I am cc'ing Kinan (one of the developers of the MPI4Spark project). He will follow up with you to see what could be going on here.

Thanks,

DK

________________________________________
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of Bengt Lennicke via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Sunday, September 24, 2023 10:21 AM
To: mvapich-discuss at lists.osu.edu
Subject: [Mvapich-discuss] Troubles running the examples

Hello,

I am currently trying to install mpi4spark-0.2-x86-bin and run the "GroupByTest" (section 6.2.1 in the user guide: http://hibd.cse.ohio-state.edu/static/media/hibd/mpi4spark/mpi4spark_user_guide.pdf).

I am running into the following error:

exec.log:
[ffmk-n3:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 6. MPI process died?
[ffmk-n3:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[ffmk-n3:mpispawn_0][child_handler] MPI process (rank: 0, pid: 2197) exited with status 130

app.log:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
23/09/24 13:28:29 INFO SparkContext: Running Spark version 3.3.0-SNAPSHOT
23/09/24 13:28:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/09/24 13:28:29 INFO ResourceUtils: ==============================================================
23/09/24 13:28:29 INFO ResourceUtils: No custom resources configured for spark.driver.
23/09/24 13:28:29 INFO ResourceUtils: ==============================================================
23/09/24 13:28:29 INFO SparkContext: Submitted application: GroupBy Test
23/09/24 13:28:29 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/09/24 13:28:29 INFO ResourceProfile: Limiting resource is cpu
23/09/24 13:28:29 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/09/24 13:28:29 INFO SecurityManager: Changing view acls to: bengt
23/09/24 13:28:29 INFO SecurityManager: Changing modify acls to: bengt
23/09/24 13:28:29 INFO SecurityManager: Changing view acls groups to:
23/09/24 13:28:29 INFO SecurityManager: Changing modify acls groups to:
23/09/24 13:28:29 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(bengt); groups with view permissions: Set(); users  with modify permissions: Set(bengt); groups with modify permissions: Set()
[ffmk-n3:mpi_rank_0][rdma_find_network_type] Unable to find the numa process is bound to. Disabling process placement aware hca mapping.
[ffmk-n3:mpi_rank_0][MPID_Init] [Performance Suggestion]: Application has requested for multi-thread capability. If allocating memory from different pthreads/OpenMP threads, please consider setting MV2_USE_ALIGNED_ALLOC=1 for improved performance.
Use MV2_USE_THREAD_WARNING=0 to suppress this error message.
23/09/24 13:28:29 INFO NioEventLoop: Starting MPI4Spark.
23/09/24 13:28:29 INFO Utils: Successfully started service 'sparkDriver' on port 33211.
23/09/24 13:28:29 INFO SparkEnv: Registering MapOutputTracker
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/bengt/tmp/mpi4spark-0.2-x86-bin/jars/spark-unsafe_2.12-3.3.0-SNAPSHOT.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
23/09/24 13:28:29 INFO SparkEnv: Registering BlockManagerMaster
23/09/24 13:28:29 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/09/24 13:28:29 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/09/24 13:28:29 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/09/24 13:28:29 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-25cd1d9f-cb83-4262-a938-0057ad98f451
23/09/24 13:28:29 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
23/09/24 13:28:29 INFO SparkEnv: Registering OutputCommitCoordinator
23/09/24 13:28:30 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/09/24 13:28:30 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://<ffmk3-url>:-1
23/09/24 13:28:30 INFO SparkContext: Added JAR file:/home/bengt/tmp/mpi4spark-0.2-x86-bin/examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar at spark://<ffmk3-url>:33211/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar with timestamp 1695562109112
23/09/24 13:28:30 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://ffmk-n3:7077...
23/09/24 13:28:30 WARN ChannelInitializer: Failed to initialize a channel. Closing: [id: 0x2430e197]
java.lang.NullPointerException
      at java.base/java.lang.String.contains(String.java:2036)
      at org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:221)
      at org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:194)
      at org.apache.spark.network.client.TransportClientFactory$1.initChannel(TransportClientFactory.java:302)
      at org.apache.spark.network.client.TransportClientFactory$1.initChannel(TransportClientFactory.java:299)
      at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129)
      at io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:112)
      at io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:943)
      at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:609)
      at io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46)
      at io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1463)
      at io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1115)
      at io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:650)
      at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:528)
      at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:443)
      at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:500)
      at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
      at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:559)
      at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
      at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      at java.base/java.lang.Thread.run(Thread.java:834)
23/09/24 13:28:30 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master ffmk-n3:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
      at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
      at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
      at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
      at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
      at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anon$1.run(StandaloneAppClient.scala:126)
      at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
      at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: Failed to connect to ffmk-n3/141.76.48.47:7077
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:316)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:246)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:258)
      at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
      ... 4 more
Caused by: io.netty.channel.StacklessClosedChannelException
      at io.netty.channel.AbstractChannel$AbstractUnsafe.ensureOpen(ChannelPromise)(Unknown Source)
23/09/24 13:28:50 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://ffmk-n3:7077...
23/09/24 13:28:50 WARN ChannelInitializer: Failed to initialize a channel. Closing: [id: 0x564d0fc7]
java.lang.NullPointerException
      at java.base/java.lang.String.contains(String.java:2036)
      at org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:221)
      at org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:194)
      at org.apache.spark.network.client.TransportClientFactory$1.initChannel(TransportClientFactory.java:302)
      at org.apache.spark.network.client.TransportClientFactory$1.initChannel(TransportClientFactory.java:299)
      at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129)
      at io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:112)
      at io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:943)
      at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:609)
      at io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46)
      at io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1463)
      at io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1115)
      at io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:650)
      at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:528)
      at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:443)
      at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:500)
      at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
      at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:559)
      at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
      at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      at java.base/java.lang.Thread.run(Thread.java:834)
23/09/24 13:28:50 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master ffmk-n3:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
      at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
      at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
      at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
      at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
      at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anon$1.run(StandaloneAppClient.scala:126)
      at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
      at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: Failed to connect to ffmk-n3/141.76.48.47:7077
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:316)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:246)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:258)
      at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
      ... 4 more
Caused by: io.netty.channel.StacklessClosedChannelException
      at io.netty.channel.AbstractChannel$AbstractUnsafe.ensureOpen(ChannelPromise)(Unknown Source)
23/09/24 13:29:10 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://ffmk-n3:7077...
23/09/24 13:29:10 WARN ChannelInitializer: Failed to initialize a channel. Closing: [id: 0x2ee6c860]
java.lang.NullPointerException
      at java.base/java.lang.String.contains(String.java:2036)
      at org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:221)
      at org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:194)
      at org.apache.spark.network.client.TransportClientFactory$1.initChannel(TransportClientFactory.java:302)
      at org.apache.spark.network.client.TransportClientFactory$1.initChannel(TransportClientFactory.java:299)
      at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129)
      at io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:112)
      at io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:943)
      at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:609)
      at io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46)
      at io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1463)
      at io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1115)
      at io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:650)
      at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:528)
      at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:443)
      at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:500)
      at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
      at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:559)
      at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
      at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      at java.base/java.lang.Thread.run(Thread.java:834)
23/09/24 13:29:10 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master ffmk-n3:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
      at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
      at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
      at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
      at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
      at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anon$1.run(StandaloneAppClient.scala:126)
      at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
      at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: Failed to connect to ffmk-n3/141.76.48.47:7077
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:316)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:246)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:258)
      at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
      ... 4 more
Caused by: io.netty.channel.StacklessClosedChannelException
      at io.netty.channel.AbstractChannel$AbstractUnsafe.ensureOpen(ChannelPromise)(Unknown Source)
23/09/24 13:29:30 ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
23/09/24 13:29:30 WARN StandaloneSchedulerBackend: Application ID is not initialized yet.
23/09/24 13:29:30 INFO SparkUI: Stopped Spark web UI at http://<ffmk3-url>:4040
23/09/24 13:29:30 INFO StandaloneSchedulerBackend: Shutting down all executors
23/09/24 13:29:30 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
23/09/24 13:29:30 WARN StandaloneAppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
23/09/24 13:29:30 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/09/24 13:29:30 INFO MemoryStore: MemoryStore cleared
23/09/24 13:29:30 INFO BlockManager: BlockManager stopped
23/09/24 13:29:30 INFO BlockManagerMaster: BlockManagerMaster stopped
23/09/24 13:29:30 WARN MetricsSystem: Stopping a MetricsSystem that is not running
23/09/24 13:29:30 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/09/24 13:29:30 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33659.
23/09/24 13:29:30 INFO NettyBlockTransferService: Server created on <ffmk3-url>:33659
23/09/24 13:29:30 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/09/24 13:29:30 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, <ffmk3-url>, 33659, None)
23/09/24 13:29:30 ERROR SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
      at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:79)
      at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:518)
      at org.apache.spark.SparkContext.<init>(SparkContext.scala:593)
      at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
      at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
      at scala.Option.getOrElse(Option.scala:189)
      at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
      at org.apache.spark.examples.GroupByTest$.main(GroupByTest.scala:32)
      at org.apache.spark.examples.GroupByTest.main(GroupByTest.scala)
      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.base/java.lang.reflect.Method.invoke(Method.java:566)
      at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
      at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
      at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
      at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
      at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
      at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/09/24 13:29:30 INFO SparkContext: SparkContext already stopped.
Exception in thread "main" java.lang.NullPointerException
      at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:79)
      at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:518)
      at org.apache.spark.SparkContext.<init>(SparkContext.scala:593)
      at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
      at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
      at scala.Option.getOrElse(Option.scala:189)
      at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
      at org.apache.spark.examples.GroupByTest$.main(GroupByTest.scala:32)
      at org.apache.spark.examples.GroupByTest.main(GroupByTest.scala)
      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.base/java.lang.reflect.Method.invoke(Method.java:566)
      at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
      at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
      at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
      at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
      at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
      at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/09/24 13:29:30 INFO SparkContext: Successfully stopped SparkContext
23/09/24 13:29:30 INFO ShutdownHookManager: Shutdown hook called
23/09/24 13:29:30 INFO ShutdownHookManager: Deleting directory /tmp/spark-ddf1dc85-4e42-4ff1-b6cd-b2de089b8bdb
23/09/24 13:29:30 INFO ShutdownHookManager: Deleting directory /tmp/spark-f5abcae1-4502-4b9d-8ba1-42184db05c21

So I have probably done something wrong during the setup. Here is how I set everything up.

I installed MVAPICH2 as follows:
wget https://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.3.7.tar.gz
gzip -dc mvapich2-2.3.7.tar.gz | tar -x
export LIBRARY_PATH="/home/bengt/nooby/baseline/install/lib"
export CPPFLAGS="-I /home/bengt/nooby/baseline/install/include"
export FFLAGS="-w -fallow-argument-mismatch -O2"
export LDFLAGS="-L/home/bengt/nooby/baseline/install/lib -L/home/bengt/nooby/baseline/install/lib64 -Wl,-rpath,/home/bengt/nooby/baseline/install/lib -Wl,-rpath,/home/bengt/nooby/baseline/install/lib64"
export LIBS="-libverbs"
cd mvapich2-2.3.7
./configure --with-ibverbs=/home/bengt/nooby/baseline/install --prefix=/home/bengt/nooby/baseline/install
make
make install

MVAPICH2 itself seems to work, since the command
./mpirun_rsh -np 2 ffmk-n3 ffmk-3 ../libexec/osu-micro-benchmarks/mpi/collective/osu_alltoall
completes without errors.
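As an extra sanity check, mpiname (which ships with MVAPICH2) prints the version together with the exact configure line used:
/home/bengt/nooby/baseline/install/bin/mpiname -a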

MVAPICH2-J is also required; I installed it following the user guide <http://mvapich.cse.ohio-state.edu/userguide/mv2j/>.
Since Java was missing, I installed Java 11:
wget https://download.java.net/openjdk/jdk11/ri/openjdk-11+28_linux-x64_bin.tar.gz
gzip -dc openjdk-11+28_linux-x64_bin.tar.gz | tar -x
I tried Java 19 first, which ran into an error; I then read online that only Java 8 and 11 work.
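To be sure which JDK gets picked up (path as in JAVA_HOME below):
/home/bengt/tmp/jdk-11/bin/java -version
which should report an OpenJDK 11 build.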

Ant was also missing:
wget https://dlcdn.apache.org//ant/binaries/apache-ant-1.10.14-bin.tar.gz
gzip -dc apache-ant-1.10.14-bin.tar.gz | tar -x

And I got mvapich2-j from here:
wget http://mvapich.cse.ohio-state.edu/download/mvapich/mv2j/mvapich2-j-2.3.7.tar.gz
gzip -dc mvapich2-j-2.3.7.tar.gz | tar -x

Running ant:
~/tmp/mvapich2-j-2.3.7/src/java$ $ANT_HOME/bin/ant
Buildfile: /home/bengt/tmp/mvapich2-j-2.3.7/src/java/build.xml

compile:
    [javac] /home/bengt/tmp/mvapich2-j-2.3.7/src/java/build.xml:18: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
    [javac] Compiling 24 source files

jars:
      [jar] Building jar: /home/bengt/tmp/mvapich2-j-2.3.7/lib/mvapich2-j.jar

clean:

all:

BUILD SUCCESSFUL
Total time: 1 second

Running make:
~/tmp/mvapich2-j-2.3.7/src/c$ make
/home/bengt/tmp/jdk-11/bin/javac -cp /home/bengt/tmp/mvapich2-j-2.3.7/lib/mvapich2-j.jar:. -h . /home/bengt/tmp/mvapich2-j-2.3.7/src/java/mpi/*.java
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
/home/bengt/nooby/baseline/install/bin/mpicc -O3 -o libmvapich2-j.so -shared -fPIC -Wl,-soname -Wl,-rpath -I/home/bengt/tmp/jdk-11/include -I/home/bengt/tmp/jdk-11/include/linux -I/home/bengt/nooby/baseline/install/include -L/usr/lib64  *.c
mpi_Group.c: In function ‘Java_mpi_Group_nativeSize’:
mpi_Group.c:150:9: warning: implicit declaration of function ‘println’; did you mean ‘printf’? [-Wimplicit-function-declaration]
  150 |         println("ERROR: nativeSize for Group failed.");
      |         ^~~~~~~
      |         printf
mpi_Intracomm.c: In function ‘Java_mpi_Intracomm_nativeSplit’:
mpi_Intracomm.c:424:9: warning: implicit declaration of function ‘println’; did you mean ‘printf’? [-Wimplicit-function-declaration]
  424 |         println("ERROR: Could not split communicator.");
      |         ^~~~~~~
      |         printf
mv libmvapich2-j.so /home/bengt/tmp/mvapich2-j-2.3.7/lib
/sbin/ldconfig -n /home/bengt/tmp/mvapich2-j-2.3.7/lib
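To make sure the freshly built JNI library resolves against my MVAPICH2 install, a quick ldd check:
ldd /home/bengt/tmp/mvapich2-j-2.3.7/lib/libmvapich2-j.so | grep -i mpi
The MPI shared library should resolve from /home/bengt/nooby/baseline/install/lib, with no "not found" entries.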

Then I get MPI4Spark:
wget http://hibd.cse.ohio-state.edu/download/hibd/mpi4spark-0.2-x86-bin.tar.gz
gzip -dc mpi4spark-0.2-x86-bin.tar.gz | tar -x

and copy the MVAPICH2-J jar into the Spark jars directory:
cp $MV2J_HOME/lib/mvapich2-j.jar $SPARK_HOME/jars
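and a quick ls to confirm it landed next to the other Spark jars:
ls -l /home/bengt/tmp/mpi4spark-0.2-x86-bin/jars/mvapich2-j.jar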

The environment variables used are:
export JAVA_HOME=/home/bengt/tmp/jdk-11
export MPILIB=/home/bengt/nooby/baseline/install
export MV2J_HOME=/home/bengt/tmp/mvapich2-j-2.3.7
export ANT_HOME=/home/bengt/tmp/apache-ant-1.10.14
export HADOOP_HOME=/home/bengt/tmp/hadoop-3.3.4
export SPARK_HOME=/home/bengt/tmp/mpi4spark-0.2-x86-bin
export MV2_ENABLE_AFFINITY=0
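All of these are set in the shell I launch everything from; a quick way to verify they are visible:
env | grep -E 'JAVA_HOME|MPILIB|MV2J_HOME|ANT_HOME|HADOOP_HOME|SPARK_HOME|MV2_ENABLE_AFFINITY'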


Then I follow the MPI4Spark guide for the GroupByTest benchmark.
The hostfile looks like this:
bengt@ffmk-n3:~/tmp/mpi4spark-0.2-x86-bin$ cat hostfile
ffmk-n3
As you can see, there is only one node, because I am trying to work out the installation process manually before writing a script that deploys the installation to multiple nodes.

app.sh:
bengt@ffmk-n3:~/tmp/mpi4spark-0.2-x86-bin$ cat app.sh
./bin/spark-submit --master spark://$1:7077 --class org.apache.spark.examples.GroupByTest examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar
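I invoke it with the master host as its only argument, so that $1 expands inside the --master URL:
./app.sh ffmk-n3
which turns into --master spark://ffmk-n3:7077.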

spark-env.sh:
     < default comments at the beginning >
export SPARK_USE_MPI=1
export JAVA_HOME=/home/bengt/tmp/jdk-11
export MPILIB=/home/bengt/nooby/baseline/install
export MV2J_HOME=/home/bengt/tmp/mvapich2-j-2.3.7
export ANT_HOME=/home/bengt/tmp/apache-ant-1.10.14
export HADOOP_HOME=/home/bengt/tmp/hadoop-3.3.4
export SPARK_HOME=/home/bengt/tmp/mpi4spark-0.2-x86-bin
export SPARK_NO_DAEMONIZE=1
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$MV2J_HOME
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MV2J_HOME/lib
export JAVA_BINARY=/home/bengt/tmp/jdk-11/bin/java
export SPARK_LIBRARY_PATH=$MV2J_HOME/lib
export WORK_DIR=$SPARK_HOME/exec-wdir

spark-defaults.conf:
bengt@ffmk-n3:~/tmp/mpi4spark-0.2-x86-bin$ cat conf/spark-defaults.conf
< default comments at the beginning >
spark.executor.extraJavaOptions -Djava.library.path=/home/bengt/tmp/mvapich2-j-2.3.7/lib
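For completeness: I did not set the driver-side counterpart. As far as I understand, that would be the standard Spark option in the same file:
spark.driver.extraJavaOptions -Djava.library.path=/home/bengt/tmp/mvapich2-j-2.3.7/lib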

After this I run:
./sbin/start-mpi4spark.sh

This results in the app.log and exec.log above.

Trying to figure out what is wrong, I also tried running the master and worker manually.
bengt@ffmk-n3:~/tmp/mpi4spark-0.2-x86-bin$ ./sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /home/bengt/tmp/mpi4spark-0.2-x86-bin/logs/spark-bengt-org.apache.spark.deploy.master.Master-1-ffmk-n3.out
Spark Command: /home/bengt/tmp/jdk-11/bin/java -cp /home/bengt/tmp/mpi4spark-0.2-x86-bin/conf/:/home/bengt/tmp/mpi4spark-0.2-x86-bin/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host <ffmk3-url> --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
23/09/24 14:09:23 INFO Master: Started daemon with process name: 2847 at ffmk-n3
23/09/24 14:09:23 INFO SignalUtils: Registering signal handler for TERM
23/09/24 14:09:23 INFO SignalUtils: Registering signal handler for HUP
23/09/24 14:09:23 INFO SignalUtils: Registering signal handler for INT
23/09/24 14:09:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/09/24 14:09:23 INFO SecurityManager: Changing view acls to: bengt
23/09/24 14:09:23 INFO SecurityManager: Changing modify acls to: bengt
23/09/24 14:09:23 INFO SecurityManager: Changing view acls groups to:
23/09/24 14:09:23 INFO SecurityManager: Changing modify acls groups to:
23/09/24 14:09:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(bengt); groups with view permissions: Set(); users  with modify permissions: Set(bengt); groups with modify permissions: Set()
[ffmk-n3:mpi_rank_0][MPID_Init] [Performance Suggestion]: Application has requested for multi-thread capability. If allocating memory from different pthreads/OpenMP threads, please consider setting MV2_USE_ALIGNED_ALLOC=1 for improved performance.
Use MV2_USE_THREAD_WARNING=0 to suppress this error message.
23/09/24 14:09:24 INFO NioEventLoop: Starting MPI4Spark.
23/09/24 14:09:24 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
23/09/24 14:09:24 INFO Master: Starting Spark master at spark://<ffmk3-u>:7077
23/09/24 14:09:24 INFO Master: Running Spark version 3.3.0-SNAPSHOT
23/09/24 14:09:24 INFO Utils: Successfully started service 'MasterUI' on port 8080.
23/09/24 14:09:24 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://<ffmk3-url>:-1
23/09/24 14:09:24 INFO Master: I have been elected leader! New state: ALIVE

When I try to telnet into the master:
bengt@ffmk-n3:~/tmp/mpi4spark-0.2-x86-bin$ telnet <ffmk3-url> 7077
Trying 141.76.48.47...
Connected to <ffmk3-url>.
Escape character is '^]'.
Connection closed by foreign host.
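To double-check whether anything is listening on port 7077 at all, one can also look at the socket directly (ss is part of iproute2):
ss -ltn | grep 7077
A LISTEN entry should show up for port 7077.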

And this appears in the output of the running master process:
23/09/24 14:11:43 WARN NioServerSocketChannel: Failed to create a new channel from an accepted socket.
java.lang.NullPointerException
      at java.base/java.lang.String.contains(String.java:2036)
      at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:184)
      at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:85)
      at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:802)
      at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:737)
      at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:661)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:552)
      at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
      at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      at java.base/java.lang.Thread.run(Thread.java:834)

And running a worker:
bengt@ffmk-n3:~/tmp/mpi4spark-0.2-x86-bin$ ./sbin/start-worker.sh spark://<ffmk3-url>:7077
starting org.apache.spark.deploy.worker.Worker, logging to /home/bengt/tmp/mpi4spark-0.2-x86-bin/logs/spark-bengt-org.apache.spark.deploy.worker.Worker-1-ffmk-n3.out
Spark Command: /home/bengt/tmp/jdk-11/bin/java -cp /home/bengt/tmp/mpi4spark-0.2-x86-bin/conf/:/home/bengt/tmp/mpi4spark-0.2-x86-bin/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://<ffmk3-url>:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
23/09/24 14:12:43 INFO Worker: Started daemon with process name: 2962 at ffmk-n3
23/09/24 14:12:43 INFO SignalUtils: Registering signal handler for TERM
23/09/24 14:12:43 INFO SignalUtils: Registering signal handler for HUP
23/09/24 14:12:43 INFO SignalUtils: Registering signal handler for INT
23/09/24 14:12:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/09/24 14:12:43 INFO SecurityManager: Changing view acls to: bengt
23/09/24 14:12:43 INFO SecurityManager: Changing modify acls to: bengt
23/09/24 14:12:43 INFO SecurityManager: Changing view acls groups to:
23/09/24 14:12:43 INFO SecurityManager: Changing modify acls groups to:
23/09/24 14:12:43 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(bengt); groups with view permissions: Set(); users  with modify permissions: Set(bengt); groups with modify permissions: Set()
[ffmk-n3:mpi_rank_0][MPID_Init] [Performance Suggestion]: Application has requested for multi-thread capability. If allocating memory from different pthreads/OpenMP threads, please consider setting MV2_USE_ALIGNED_ALLOC=1 for improved performance.
Use MV2_USE_THREAD_WARNING=0 to suppress this error message.
23/09/24 14:12:44 ERROR NioEventLoop: ERROR: MPI_Init_thread did not reutrn MPI.THREAD_MULTIPLE
23/09/24 14:12:44 INFO NioEventLoop: Starting MPI4Spark.
23/09/24 14:12:44 INFO Utils: Successfully started service 'sparkWorker' on port 39987.
23/09/24 14:12:44 INFO Worker: Worker decommissioning not enabled.
23/09/24 14:12:44 INFO Worker: Starting Spark worker 141.76.48.47:39987 with 8 cores, 14.5 GiB RAM
23/09/24 14:12:44 INFO Worker: Running Spark version 3.3.0-SNAPSHOT
23/09/24 14:12:44 INFO Worker: Spark home: /home/bengt/tmp/mpi4spark-0.2-x86-bin
23/09/24 14:12:44 INFO ResourceUtils: ==============================================================
23/09/24 14:12:44 INFO ResourceUtils: No custom resources configured for spark.worker.
23/09/24 14:12:44 INFO ResourceUtils: ==============================================================
23/09/24 14:12:44 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
23/09/24 14:12:44 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://<ffmk3-url>:-1
23/09/24 14:12:44 INFO Worker: Connecting to master <ffmk3-url>:7077...
23/09/24 14:12:45 WARN ChannelInitializer: Failed to initialize a channel. Closing: [id: 0x32d38e92]
java.lang.NullPointerException
      at java.base/java.lang.String.contains(String.java:2036)
      at org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:221)
      at org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:194)
      at org.apache.spark.network.client.TransportClientFactory$1.initChannel(TransportClientFactory.java:302)
      at org.apache.spark.network.client.TransportClientFactory$1.initChannel(TransportClientFactory.java:299)
      at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129)
      at io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:112)
      at io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:943)
      at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:609)
      at io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46)
      at io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1463)
      at io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1115)
      at io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:650)
      at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:528)
      at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:443)
      at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:500)
      at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
      at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:559)
      at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
      at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      at java.base/java.lang.Thread.run(Thread.java:834)
23/09/24 14:12:45 WARN Worker: Failed to connect to master <ffmk3-url>:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
      at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
      at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
      at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
      at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
      at org.apache.spark.deploy.worker.Worker$$anon$1.run(Worker.scala:311)
      at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
      at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: Failed to connect to <ffmk3-url>:7077
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:316)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:246)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:258)
      at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
      ... 4 more
Caused by: io.netty.channel.StacklessClosedChannelException
      at io.netty.channel.AbstractChannel$AbstractUnsafe.ensureOpen(ChannelPromise)(Unknown Source)


I don't know if the example with the manual master and worker helps at all.
The goal is to be able to run the GroupByTest.
Let me know if I left out any information that would be useful for you.

Thanks in advance for your time and effort!

Kind regards,
Bengt


Barkhausen Institut
www.barkhauseninstitut.org


