[Hadoop-RDMA] Troubles with deploying hadoop-rdma-0.9.8 over IB cluster
Alexander Frolov
alexndr.frolov at gmail.com
Wed Feb 5 11:46:43 EST 2014
It seems that I have solved my problem. The issue was related to the
configuration of hadoop.tmp.dir, which had been set to an NFS partition. By
default it points to /tmp, which is on the local filesystem. After removing
hadoop.tmp.dir from core-site.xml, the problem was solved.
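For reference, a minimal core-site.xml that works for me looks roughly like
this (hadoop.tmp.dir is simply left unset, so Hadoop falls back to its
default /tmp/hadoop-${user.name}, which is node-local; any other node-local
path would do just as well):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<!-- hadoop.tmp.dir intentionally not set: the default
     /tmp/hadoop-${user.name} is on the local filesystem of every node,
     which is what HDFS needs. An NFS path here breaks the DataNodes. -->

<property>
<name>fs.default.name</name>
<value>hdfs://A11:9000</value>
</property>

<property>
<name>hadoop.ib.enabled</name>
<value>true</value>
</property>

<property>
<name>hadoop.roce.enabled</name>
<value>false</value>
</property>

</configuration>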
Thank you.
On Wed, Feb 5, 2014 at 7:26 PM, Alexander Frolov
<alexndr.frolov at gmail.com> wrote:
> UPD: I just forgot to attach the config files:
>
> frolo at A11:~/hadoop-rdma-0.9.8> cat conf/core-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>
> <property>
> <name>hadoop.tmp.dir</name>
> <value>/home/frolo/hadoop-rdma-0.9.8/tmp</value>
> </property>
>
> <property>
> <name>fs.default.name</name>
> <value>hdfs://A11:9000</value>
> <description>URL of NameNode</description>
> </property>
>
> <property>
> <name>hadoop.ib.enabled</name>
> <value>true</value>
> <description>Enable the RDMA feature over IB. Default value of
> hadoop.ib.enabled is true.</description>
> </property>
>
> <property>
> <name>hadoop.roce.enabled</name>
> <value>false</value>
> <description>Disable the RDMA feature over RoCE. Default value of
> hadoop.roce.enabled is false.</description>
> </property>
>
> </configuration>
>
> frolo at A11:~/hadoop-rdma-0.9.8> cat conf/hdfs-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>
> <!--
> <property>
> <name>dfs.name.dir</name>
> <value>/home/frolo/hadoop-rdma-0.9.8/HadoopName</value>
> <description>Path on the local filesystem where the NameNode stores the
> namespace and transactions logs</description>
> </property>
> -->
>
> <!--
> <property>
> <name>dfs.data.dir</name>
> <value>/home/frolo/hadoop-rdma-0.9.8/HadoopName</value>
> <description>List of paths on the local filesystem of a DataNode where it
> should store its blocks</description>
> </property>
> -->
>
> <property>
> <name>dfs.replication</name>
> <value>1</value>
> </property>
>
> </configuration>
>
>
> frolo at A11:~/hadoop-rdma-0.9.8> cat conf/mapred-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>
> <property>
> <name>mapred.job.tracker</name>
> <value>A11:9001</value>
> <description>Host or IP and port of JobTracker</description>
> </property>
>
> <property>
> <name>mapred.system.dir</name>
> <value>/home/frolo/hadoop-rdma-0.9.8/tmp</value>
> <description>Path on the HDFS where the MapReduce framework stores system
> files</description>
> </property>
>
> </configuration>
>
> On Wed, Feb 5, 2014 at 7:22 PM, Alexander Frolov
> <alexndr.frolov at gmail.com> wrote:
>
>> Hello,
>>
>> I am trying to deploy Hadoop-RDMA on an 8-node IB (OFED-1.5.3-4.0.42)
>> cluster and have run into the following problem (a.k.a. "File ... could
>> only be replicated to 0 nodes, instead of 1"):
>>
>> frolo at A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfs -copyFromLocal
>> ../pg132.txt /user/frolo/input/pg132.txt
>> Warning: $HADOOP_HOME is deprecated.
>>
>> 14/02/05 19:06:30 WARN hdfs.DFSClient: DataStreamer Exception:
>> java.lang.reflect.UndeclaredThrowableException
>> at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(Unknown
>> Source)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Unknown
>> Source)
>> at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
>> at org.apache.hadoop.hdfs.From.Code(Unknown Source)
>> at org.apache.hadoop.hdfs.From.F(Unknown Source)
>> at org.apache.hadoop.hdfs.From.F(Unknown Source)
>> at org.apache.hadoop.hdfs.The.run(Unknown Source)
>> Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException:
>> File /user/frolo/input/pg132.txt could only be replicated to 0 nodes,
>> instead of 1
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown
>> Source)
>> at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown
>> Source)
>> at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
>> at org.apache.hadoop.ipc.rdma.madness.Code(Unknown Source)
>> at org.apache.hadoop.ipc.rdma.madness.run(Unknown Source)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(Unknown Source)
>> at org.apache.hadoop.ipc.rdma.be.run(Unknown Source)
>>
>> at org.apache.hadoop.ipc.rdma.RDMAClient.Code(Unknown Source)
>> at org.apache.hadoop.ipc.rdma.RDMAClient.call(Unknown Source)
>> at org.apache.hadoop.ipc.Tempest.invoke(Unknown Source)
>> ... 12 more
>>
>> 14/02/05 19:06:30 WARN hdfs.DFSClient: Error Recovery for null bad
>> datanode[0] nodes == null
>> 14/02/05 19:06:30 WARN hdfs.DFSClient: Could not get block locations.
>> Source file "/user/frolo/input/pg132.txt" - Aborting...
>> 14/02/05 19:06:30 INFO hdfs.DFSClient: exception in isClosed
>>
>>
>> It seems that no data is transferred to the DataNodes when I start copying
>> from the local filesystem to HDFS. I tested the availability of the DataNodes:
>>
>> frolo at A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfsadmin -report
>> Warning: $HADOOP_HOME is deprecated.
>>
>> Configured Capacity: 0 (0 KB)
>> Present Capacity: 0 (0 KB)
>> DFS Remaining: 0 (0 KB)
>> DFS Used: 0 (0 KB)
>> DFS Used%: NaN%
>> Under replicated blocks: 0
>> Blocks with corrupt replicas: 0
>> Missing blocks: 0
>>
>> -------------------------------------------------
>> Datanodes available: 0 (4 total, 4 dead)
>>
>> Name: 10.10.1.13:50010
>> Decommission Status : Normal
>> Configured Capacity: 0 (0 KB)
>> DFS Used: 0 (0 KB)
>> Non DFS Used: 0 (0 KB)
>> DFS Remaining: 0(0 KB)
>> DFS Used%: 100%
>> DFS Remaining%: 0%
>> Last contact: Wed Feb 05 19:02:54 MSK 2014
>>
>>
>> Name: 10.10.1.14:50010
>> Decommission Status : Normal
>> Configured Capacity: 0 (0 KB)
>> DFS Used: 0 (0 KB)
>> Non DFS Used: 0 (0 KB)
>> DFS Remaining: 0(0 KB)
>> DFS Used%: 100%
>> DFS Remaining%: 0%
>> Last contact: Wed Feb 05 19:02:54 MSK 2014
>>
>>
>> Name: 10.10.1.16:50010
>> Decommission Status : Normal
>> Configured Capacity: 0 (0 KB)
>> DFS Used: 0 (0 KB)
>> Non DFS Used: 0 (0 KB)
>> DFS Remaining: 0(0 KB)
>> DFS Used%: 100%
>> DFS Remaining%: 0%
>> Last contact: Wed Feb 05 19:02:54 MSK 2014
>>
>>
>> Name: 10.10.1.11:50010
>> Decommission Status : Normal
>> Configured Capacity: 0 (0 KB)
>> DFS Used: 0 (0 KB)
>> Non DFS Used: 0 (0 KB)
>> DFS Remaining: 0(0 KB)
>> DFS Used%: 100%
>> DFS Remaining%: 0%
>> Last contact: Wed Feb 05 19:02:55 MSK 2014
>>
>> I also tried to mkdir in the HDFS filesystem, which was successful.
>> Restarting the Hadoop daemons has not produced any positive effect.
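>>
>> In case it is useful, this is roughly how I have been checking the daemons
>> (log file names follow the usual hadoop-<user>-datanode-<host>.log pattern;
>> the host name A13 below is just an example):
>>
>> # verify that the DataNode JVM is actually running on a worker node
>> ssh A13 jps
>>
>> # look for registration or storage errors in the DataNode log
>> ssh A13 'tail -n 50 ~/hadoop-rdma-0.9.8/logs/hadoop-frolo-datanode-A13.log'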
>>
>> Could you please help me with this issue? Thank you.
>>
>> Best,
>> Alex
>>