[Hadoop-RDMA] Troubles with deploying hadoop-rdma-0.9.8 over IB cluster

Alexander Frolov alexndr.frolov at gmail.com
Wed Feb 5 10:26:44 EST 2014


UPD: I forgot to attach the config files, so here they are:

frolo at A11:~/hadoop-rdma-0.9.8> cat conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
<name>hadoop.tmp.dir</name>
<value>/home/frolo/hadoop-rdma-0.9.8/tmp</value>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://A11:9000</value>
<description>URL of NameNode</description>
</property>

<property>
<name>hadoop.ib.enabled</name>
<value>true</value>
<description>Enable the RDMA feature over IB. Default value of
hadoop.ib.enabled is true.</description>
</property>

<property>
<name>hadoop.roce.enabled</name>
<value>false</value>
<description>Disable the RDMA feature over RoCE. Default value of
hadoop.roce.enabled is false.</description>
</property>

</configuration>
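
A side note on how I sanity-checked the NameNode address: fs.default.name
points at A11, so A11 must resolve to the interface I expect the DataNodes
to reach. This is only a sketch; that A11 maps to 10.10.1.11 is my
assumption based on the dfsadmin report further down:

# does A11 resolve to the address the DataNodes see (10.10.1.11 in my case)?
getent hosts A11
# is the IB port up? (standard OFED utility)
ibv_devinfo | grep -i state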

frolo at A11:~/hadoop-rdma-0.9.8> cat conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<!--
<property>
<name>dfs.name.dir</name>
<value>/home/frolo/hadoop-rdma-0.9.8/HadoopName</value>
<description>Path on the local filesystem where the NameNode stores the
namespace and transaction logs</description>
</property>
-->

<!--
<property>
<name>dfs.data.dir</name>
<value>/home/frolo/hadoop-rdma-0.9.8/HadoopName</value>
<description>List of paths on the local filesystem of a DataNode where it
should store its blocks</description>
</property>
-->

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

</configuration>
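
A note on the commented-out properties: with dfs.name.dir and dfs.data.dir
unset, both default to subdirectories of hadoop.tmp.dir, so everything ends
up under /home/frolo/hadoop-rdma-0.9.8/tmp. If I were to set them
explicitly, it would look roughly like this (the paths below are
hypothetical examples, not what I actually run with):

<property>
<name>dfs.name.dir</name>
<value>/home/frolo/hdfs/name</value>
<description>Example path only; where the NameNode keeps its image and
edit logs</description>
</property>

<property>
<name>dfs.data.dir</name>
<value>/home/frolo/hdfs/data</value>
<description>Example path only; comma-separated list if a DataNode has
several disks</description>
</property>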


frolo at A11:~/hadoop-rdma-0.9.8> cat conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
<name>mapred.job.tracker</name>
<value>A11:9001</value>
<description>Host or IP and port of JobTracker</description>
</property>

<property>
<name>mapred.system.dir</name>
<value>/home/frolo/hadoop-rdma-0.9.8/tmp</value>
<description>Path on HDFS where the MapReduce framework stores system
files</description>
</property>

</configuration>
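
One thing I am not sure about: the description says mapred.system.dir lives
on HDFS, yet I pointed it at the same local path as hadoop.tmp.dir. If it
really has to be an HDFS path, I assume the property would look more like
this (the path below is just an example):

<property>
<name>mapred.system.dir</name>
<value>/mapred/system</value>
<description>Example HDFS path where the MapReduce framework would store
system files</description>
</property>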

On Wed, Feb 5, 2014 at 7:22 PM, Alexander Frolov
<alexndr.frolov at gmail.com> wrote:

> Hello,
>
> I am trying to deploy Hadoop-RDMA on an 8-node IB (OFED-1.5.3-4.0.42)
> cluster and have run into the following problem (a.k.a. "File ... could
> only be replicated to 0 nodes, instead of 1"):
>
> frolo at A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfs -copyFromLocal
> ../pg132.txt /user/frolo/input/pg132.txt
> Warning: $HADOOP_HOME is deprecated.
>
> 14/02/05 19:06:30 WARN hdfs.DFSClient: DataStreamer Exception: java.lang.reflect.UndeclaredThrowableException
>   at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(Unknown Source)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Unknown Source)
>   at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
>   at org.apache.hadoop.hdfs.From.Code(Unknown Source)
>   at org.apache.hadoop.hdfs.From.F(Unknown Source)
>   at org.apache.hadoop.hdfs.From.F(Unknown Source)
>   at org.apache.hadoop.hdfs.The.run(Unknown Source)
> Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/frolo/input/pg132.txt could only be replicated to 0 nodes, instead of 1
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown Source)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
>   at org.apache.hadoop.ipc.rdma.madness.Code(Unknown Source)
>   at org.apache.hadoop.ipc.rdma.madness.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(Unknown Source)
>   at org.apache.hadoop.ipc.rdma.be.run(Unknown Source)
>   at org.apache.hadoop.ipc.rdma.RDMAClient.Code(Unknown Source)
>   at org.apache.hadoop.ipc.rdma.RDMAClient.call(Unknown Source)
>   at org.apache.hadoop.ipc.Tempest.invoke(Unknown Source)
>   ... 12 more
>
> 14/02/05 19:06:30 WARN hdfs.DFSClient: Error Recovery for null bad
> datanode[0] nodes == null
> 14/02/05 19:06:30 WARN hdfs.DFSClient: Could not get block locations.
> Source file "/user/frolo/input/pg132.txt" - Aborting...
> 14/02/05 19:06:30 INFO hdfs.DFSClient: exception in isClosed
>
>
> It seems that no data is transferred to the DataNodes when I start copying
> from the local filesystem to HDFS; as far as I understand, the error above
> means the NameNode could not find a single live DataNode to place a
> replica on. I tested the availability of the DataNodes:
>
> frolo at A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfsadmin -report
> Warning: $HADOOP_HOME is deprecated.
>
> Configured Capacity: 0 (0 KB)
> Present Capacity: 0 (0 KB)
> DFS Remaining: 0 (0 KB)
> DFS Used: 0 (0 KB)
> DFS Used%: �%
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
>
> -------------------------------------------------
> Datanodes available: 0 (4 total, 4 dead)
>
> Name: 10.10.1.13:50010
> Decommission Status : Normal
> Configured Capacity: 0 (0 KB)
> DFS Used: 0 (0 KB)
> Non DFS Used: 0 (0 KB)
> DFS Remaining: 0(0 KB)
> DFS Used%: 100%
> DFS Remaining%: 0%
> Last contact: Wed Feb 05 19:02:54 MSK 2014
>
>
> Name: 10.10.1.14:50010
> Decommission Status : Normal
> Configured Capacity: 0 (0 KB)
> DFS Used: 0 (0 KB)
> Non DFS Used: 0 (0 KB)
> DFS Remaining: 0(0 KB)
> DFS Used%: 100%
> DFS Remaining%: 0%
> Last contact: Wed Feb 05 19:02:54 MSK 2014
>
>
> Name: 10.10.1.16:50010
> Decommission Status : Normal
> Configured Capacity: 0 (0 KB)
> DFS Used: 0 (0 KB)
> Non DFS Used: 0 (0 KB)
> DFS Remaining: 0(0 KB)
> DFS Used%: 100%
> DFS Remaining%: 0%
> Last contact: Wed Feb 05 19:02:54 MSK 2014
>
>
> Name: 10.10.1.11:50010
> Decommission Status : Normal
> Configured Capacity: 0 (0 KB)
> DFS Used: 0 (0 KB)
> Non DFS Used: 0 (0 KB)
> DFS Remaining: 0(0 KB)
> DFS Used%: 100%
> DFS Remaining%: 0%
> Last contact: Wed Feb 05 19:02:55 MSK 2014
>
> I also tried mkdir in the HDFS filesystem, which succeeded. Restarting the
> Hadoop daemons has not produced any positive effect. Note that dfsadmin
> reports all four DataNodes as dead even though each one shows a recent
> "Last contact" time.
>
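> In case it is useful, here is roughly how one can check the DataNode logs
> for registration errors (the node addresses are taken from the dfsadmin
> report above; the log file names follow the stock Hadoop layout, so treat
> this as a sketch):
>
> # tail the DataNode log on every node listed in the dfsadmin report
> for ip in 10.10.1.11 10.10.1.13 10.10.1.14 10.10.1.16; do
>   ssh $ip 'tail -n 20 ~/hadoop-rdma-0.9.8/logs/hadoop-*-datanode-*.log'
> done
>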
> Could you please help me with this issue? Thank you.
>
> Best,
>    Alex
>