[Hadoop-RDMA-discuss] Problem with RDMA-Hadoop and Mellanox CX5
Vittorio Rebecchi - A3Cube Inc.
vittorio at a3cube-inc.com
Tue Jun 27 11:09:09 EDT 2017
Hi Xiaoyi,
thanks for your attention.
I'm sending, as attachments to this mail, our configuration and logs from
the clusters on which I run RDMA-Hadoop.
I have managed to get further with terasort by removing all the
optimizations I usually add. The logs were regenerated today to give a
better picture of the problem.
Let me describe in more detail the environment in which we use RDMA-Hadoop:
we use neither HDFS nor Lustre as the filesystem but our own distributed
filesystem, "A3Cube Anima", which works very well with Hadoop through
our specific plug-in. You will see its setup in the confs together with
RDMA-Hadoop. We usually run terasort on our distributed filesystem
without any problems.
I'm trying to use RDMA-Hadoop for its improvements to map/reduce over IB.
Teragen works fine and creates the files for terasort to process.
Then we run terasort with:
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar terasort teraInput teraOutput
The IB IPC communication in YARN and the other Hadoop components works fine.
Terasort completes the map phase but, some time into the reduce phase,
for example at "mapreduce.Job: map 100% reduce 33%", it stops making
progress and the logs report the following lines:
2017-06-27 07:11:12,277 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed
completed containers from NM context:
[container_1498572173359_0002_01_000154,
container_1498572173359_0002_01_000151]
2017-06-27 07:15:26,097 INFO
org.apache.hadoop.mapred.HOMRShuffleHandler: RDMA Receiver 0 is turning
off!!!
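To see whether the other node logged the same shutdown, the NodeManager
logs can be filtered for the shuffle handler's messages. This is only a
sketch: the function name rdma_shuffle_events is made up for illustration,
the log file path is whatever your YARN log directory contains, and it
assumes each record sits on a single line in the real log file (the
records above are wrapped by the mail client).

```shell
# Hypothetical helper: print every record written by the RDMA shuffle
# handler class in a NodeManager log. $1 is the path to the log file.
rdma_shuffle_events() {
    # -F treats the class name as a fixed string, not a regular expression.
    grep -F 'org.apache.hadoop.mapred.HOMRShuffleHandler' "$1"
}
```

Calling rdma_shuffle_events on each node's NodeManager log prints the
matching records with their timestamps, which can then be compared with
the time at which the job stopped progressing.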
What I have described happens on the 2-node ConnectX-3 cluster (i7-4770
CPUs with 16 GB RAM). logs1.tar.gz is the log of node1, logs2.tar.gz is
the log of node2, etc.tar.gz has the conf files, and terasort.sh is our
terasort script with some optimizations that break Java communication
after a while.
I have added terasort.sh, with the optimizations we usually use for
terasort, because with it the nodemanager crashes during the reduce
phase with the following output:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fb5c0600761, pid=18982, tid=0x00007fb5bcebd700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build
1.8.0_131-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# C [librdmamrserver.so.1.1.0+0x7761] ucr_setup_ib_qp_init_params+0x21
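The "C" marker on the problematic frame denotes a native frame, so the
fault is inside librdmamrserver.so (in ucr_setup_ib_qp_init_params) and
not in Java code. With default JVM settings HotSpot normally also saves
the full report as hs_err_pid<pid>.log in the working directory of the
crashed process. A minimal sketch for pulling the frame out of such a
report follows; the function name problematic_frame is made up for
illustration and the report path is an assumption.

```shell
# Hypothetical helper: print the line that follows "# Problematic frame:"
# in a HotSpot crash report. $1 is the path to the report file.
problematic_frame() {
    # On the marker line, read the next line, strip its leading "# ",
    # print it and stop.
    awk '/^# Problematic frame:/ { getline; sub(/^# */, ""); print; exit }' "$1"
}
```

Running it on the report from each crashed nodemanager shows quickly
whether every crash lands in the same native function.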
I have attached the logs of this run as logs_terasort_extended.tar.gz.
Let me add that the same setup does not work on our ConnectX-5 cluster,
I think because RDMA-Hadoop cannot connect to the proper Mellanox device
in the system.
It appears to be mandatory to specify the Mellanox device to use (mlx5_0
or mlx5_1) even for the ib_write_bw test command.
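For reference, the devices and port states that the driver exposes can
be listed from sysfs before picking one for "-d". This is a sketch under
assumptions: the function name list_ib_ports is made up for
illustration, and it relies on the usual Linux layout
/sys/class/infiniband/<device>/ports/<port>/state.

```shell
# Hypothetical helper: print "<device> port <n>: <state>" for every RDMA
# port found under the given directory (default: /sys/class/infiniband).
list_ib_ports() {
    base="${1:-/sys/class/infiniband}"
    for dev in "$base"/*; do
        # Skip anything that does not look like a device directory.
        [ -d "$dev/ports" ] || continue
        for port in "$dev"/ports/*; do
            [ -r "$port/state" ] || continue
            printf '%s port %s: %s\n' "${dev##*/}" "${port##*/}" "$(cat "$port/state")"
        done
    done
}
```

A device reported with an active port is the one to pass to the test
tools, for example "ib_write_bw -d mlx5_0".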
Do you have any hints about the map/reduce setup? Is there something
to fix in the setup so that terasort can complete its processing?
Do you have any suggestions for enabling RDMA-Hadoop with
Mellanox ConnectX-5 cards and for fixing the reduce behaviour? I have
checked the configuration against your manual and everything seems correct.
Thanks in advance.
Bye
-- Vittorio Rebecchi
On 26/06/2017 19:48, Xiaoyi Lu wrote:
> Hi, Vittorio,
>
> Thanks for your interest in our project. We locally tried to run some benchmarks on our ConnectX-5 nodes and things run fine.
>
> Can you please send us your logs, confs, and exact commands? We can try to reproduce this and get back to you.
>
> Thanks,
> Xiaoyi
>
>> On Jun 26, 2017, at 9:20 AM, vittorio at a3cube-inc.com wrote:
>>
>> Hi,
>>
>> My name is Vittorio Rebecchi and I'm testing RDMA-based Apache Hadoop on an 8-node cluster with 1 Mellanox ConnectX-5 card in each machine.
>>
>> The Mellanox ConnectX-5 cards have 2 ports, and the Mellanox drivers map them to the OS as 2 independent devices (mlx5_0 and mlx5_1). The cards work properly (tested with ib_write_lat and ib_write_bw) but I must specify which IB device to use (in ib_write_lat, for example, I must specify "-d mlx5_0").
>>
>> On my setup, currently, YARN starts but the nodemanager nodes report the following message:
>>
>> ctx error: ibv_poll_cq() failed: IBV_WC_SUCCESS != wc.status
>> IBV_WC_SUCCESS != wc.status (12)
>> ucr_probe_blocking return value -1
>>
>> I tested the same installation on two other nodes with ConnectX-3 cards, and there RDMA-Hadoop works without showing that message.
>> So I suppose this error is due to the fact that ConnectX-5 cards have 2 ports that are exposed to applications as independent devices by the new Mellanox driver (4.0, the one that supports CX5), and RDMA-Hadoop cannot establish which device to use. In other software we must specify the device (and sometimes even the port) to use, such as "mlx5_0", to solve similar problems.
>>
>> Is there a way to specify, in the RDMA-based Hadoop (and plugin) setup, the proper IB device to use?
>>
>> Thanks.
>>
>> Vittorio Rebecchi
>> _______________________________________________
>> RDMA-Hadoop-discuss mailing list
>> RDMA-Hadoop-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/rdma-hadoop-discuss
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Quarantined Attachment.txt
URL: <http://mailman.cse.ohio-state.edu/pipermail/rdma-hadoop-discuss/attachments/20170627/8c875df8/attachment-0001.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs1.tar.gz
Type: application/gzip
Size: 132499 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/rdma-hadoop-discuss/attachments/20170627/8c875df8/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs2.tar.gz
Type: application/gzip
Size: 517508 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/rdma-hadoop-discuss/attachments/20170627/8c875df8/attachment-0005.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: terasort.sh
Type: application/x-shellscript
Size: 973 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/rdma-hadoop-discuss/attachments/20170627/8c875df8/attachment-0006.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs_terasort_extended.tar.gz
Type: application/gzip
Size: 226317 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/rdma-hadoop-discuss/attachments/20170627/8c875df8/attachment-0007.bin>