[Hadoop-RDMA-discuss] Problem with RDMA-Hadoop and Mellanox CX5

Mark Goddard mark at stackhpc.com
Wed Jun 28 05:24:29 EDT 2017


Hi Vittorio,

It sounds like we're experiencing similar issues. I'm using Mellanox
ConnectX-4 dual-port NICs with port 0 in IB mode and am unable to start HDFS
services. I've not tried running YARN.

Here's my email to this list on the issue:
http://mailman.cse.ohio-state.edu/pipermail/rdma-hadoop-discuss/2017-June/000095.html

Regards,
Mark

On 27 June 2017 at 16:09, Vittorio Rebecchi - A3Cube Inc. <vittorio at a3cube-inc.com> wrote:

> Hi Xiaoyi,
>
> Thanks for your attention.
>
> I'm sending, as attachments to this mail, our configuration and the logs from the
> clusters on which I run RDMA-Hadoop.
> I've managed to make terasort progress by removing all the
> optimizations I usually add. The logs were regenerated today to give a better
> picture of the problem.
>
> Let me describe in more detail the environment in which we use RDMA-Hadoop: we
> use neither HDFS nor Lustre as the filesystem, but our own distributed
> filesystem, "A3Cube Anima", which works really well with Hadoop through our
> specific plug-in. You will see its setup in the confs together with the
> RDMA-Hadoop configuration. We usually run terasort on our distributed
> filesystem without any problems.
> I'm trying to use RDMA-Hadoop for its map/reduce improvements over IB.
> Teragen works fine and creates the files to be processed by terasort. Then
> we run terasort with:
>     ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar terasort teraInput teraOutput
>
> The IB IPC communication in YARN and the other Hadoop parts works fine.
> Terasort completes the map phase but then, in the reduce phase, after a
> while (for example at "mapreduce.Job:  map 100% reduce 33%"), it won't go
> any further and the logs report the following lines:
>     2017-06-27 07:11:12,277 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed containers from NM context: [container_1498572173359_0002_01_000154, container_1498572173359_0002_01_000151]
>     2017-06-27 07:15:26,097 INFO org.apache.hadoop.mapred.HOMRShuffleHandler: RDMA Receiver 0 is turning off!!!
>
> What I have described happens on the 2-node ConnectX-3 cluster (i7-4770
> CPUs with 16 GB RAM). Logs1.tar.gz is the log of node1, logs2.tar.gz is the
> log of node2, etc.tar.gz has the conf files, and terasort.sh is the
> terasort script with some optimizations that break the Java communication
> after a while.
>
> I have attached terasort.sh with some optimizations we usually use with
> terasort because, during the reduce phase, the NodeManager crashes with the
> following output:
>
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007fb5c0600761, pid=18982, tid=0x00007fb5bcebd700
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
> # Problematic frame:
> # C  [librdmamrserver.so.1.1.0+0x7761] ucr_setup_ib_qp_init_params+0x21
>
> I have also attached the logs for this run; the archive is called logs_extended.tar.gz.
>
> Let me add that the same setup I have reported won't work on our
> ConnectX-5 cluster, because I think RDMA-Hadoop cannot connect to the proper
> Mellanox device in the system.
> It seems to be mandatory to specify which Mellanox device to use (mlx5_0
> or mlx5_1), even for the ib_write_bw testing command.
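>
> For illustration, here is roughly how other software selects one of several
> verbs devices by name with libibverbs (just a minimal sketch, not RDMA-Hadoop
> code); this explicit choice is what I suspect RDMA-Hadoop is missing on the
> dual-device ConnectX-5 nodes:
>
>     #include <string.h>
>     #include <infiniband/verbs.h>
>
>     /* Open the verbs device whose name matches the requested one,
>      * e.g. "mlx5_0". On a dual-port ConnectX-5 both mlx5_0 and mlx5_1
>      * show up in the device list, so blindly taking the first entry
>      * can pick the wrong (unconnected) device. */
>     static struct ibv_context *open_device_by_name(const char *name)
>     {
>         int num;
>         struct ibv_device **list = ibv_get_device_list(&num);
>         struct ibv_context *ctx = NULL;
>
>         if (!list)
>             return NULL;
>         for (int i = 0; i < num; i++) {
>             if (!strcmp(ibv_get_device_name(list[i]), name)) {
>                 ctx = ibv_open_device(list[i]);
>                 break;
>             }
>         }
>         ibv_free_device_list(list);
>         return ctx;
>     }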
>
> Do you have any hints on the map/reduce setup? Is there something to fix in
> the setup to allow terasort to complete its processing?
> Do you have any suggestions for getting RDMA-Hadoop working with Mellanox
> ConnectX-5 cards and for fixing the reduce behaviour? I have checked the
> configuration against your manual and everything seems correct.
>
> Thanks in advance.
>
> Bye
>
> -- Vittorio Rebecchi
>
> On 26/06/2017 19:48, Xiaoyi Lu wrote:
>
>> Hi, Vittorio,
>>
>> Thanks for your interest in our project. We locally tried to run some
>> benchmarks on our ConnectX-5 nodes and things run fine.
>>
>> Can you please send us your logs, confs, and exact commands? We can try
>> to reproduce this and get back to you.
>>
>> Thanks,
>> Xiaoyi
>>
>> On Jun 26, 2017, at 9:20 AM, vittorio at a3cube-inc.com wrote:
>>>
>>> Hi,
>>>
>>> My name is Vittorio Rebecchi and I'm testing RDMA-based Apache Hadoop
>>> on an 8-node cluster with one Mellanox ConnectX-5 card in each machine.
>>>
>>> The Mellanox ConnectX-5 cards have 2 ports, and they are mapped as 2
>>> independent devices (mlx5_0 and mlx5_1) on the OS by the Mellanox drivers.
>>> The cards work properly (tested with ib_write_lat and ib_write_bw) but I
>>> must specify which IB device to use (in ib_write_lat, for example, I must
>>> specify "-d mlx5_0").
>>>
>>> On my setup, currently, YARN starts but the NodeManager nodes report the
>>> following message:
>>>
>>> ctx error: ibv_poll_cq() failed: IBV_WC_SUCCESS != wc.status
>>> IBV_WC_SUCCESS != wc.status (12)
>>> ucr_probe_blocking return value -1
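>>>
>>> For reference, this is the usual libibverbs pattern for checking a
>>> completion's status (a minimal sketch, not RDMA-Hadoop's code). If the
>>> "(12)" above is the raw wc.status value, in the standard verbs enum it
>>> would correspond to IBV_WC_RETRY_EXC_ERR (transport retry counter exceeded):
>>>
>>>     #include <stdio.h>
>>>     #include <infiniband/verbs.h>
>>>
>>>     /* Poll one completion from a CQ and report any error status.
>>>      * ibv_wc_status_str() turns the numeric wc.status (e.g. 12)
>>>      * into a human-readable description. */
>>>     static int drain_one(struct ibv_cq *cq)
>>>     {
>>>         struct ibv_wc wc;
>>>         int n = ibv_poll_cq(cq, 1, &wc);
>>>
>>>         if (n < 0)
>>>             return -1;   /* polling the CQ itself failed */
>>>         if (n == 1 && wc.status != IBV_WC_SUCCESS)
>>>             fprintf(stderr, "wc error: %s (%d)\n",
>>>                     ibv_wc_status_str(wc.status), wc.status);
>>>         return n;
>>>     }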
>>>
>>> I tested the same installation base on two other nodes with ConnectX-3
>>> cards and RDMA-Hadoop works without showing that message.
>>> So I suppose this error is due to the fact that ConnectX-5 cards have 2
>>> ports that are exposed to applications as independent devices by the new
>>> Mellanox driver (4.0, the one that supports CX5), and RDMA-Hadoop cannot
>>> determine which device to use. In other software we must specify the device
>>> (and sometimes even the port) to use, such as "mlx5_0", to solve similar problems.
>>>
>>> Is there a way to specify, in the RDMA-based Hadoop (and plugin) setup, the
>>> proper IB device to use?
>>>
>>> Thanks.
>>>
>>> Vittorio Rebecchi
>>> _______________________________________________
>>> RDMA-Hadoop-discuss mailing list
>>> RDMA-Hadoop-discuss at cse.ohio-state.edu
>>> http://mailman.cse.ohio-state.edu/mailman/listinfo/rdma-hadoop-discuss
>>>
>>
>
>
> _______________________________________________
> RDMA-Hadoop-discuss mailing list
> RDMA-Hadoop-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/rdma-hadoop-discuss
>
>