Re: M/R vs hbase problem in production
Yes, I'm sure; the map stage is used to aggregate data for the reduce stage.
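
For reference, the pattern described above, aggregating in the map stage and then writing the combined rows to HBase from the reduce stage, would look roughly like the sketch below, using the HBase 0.90-era TableReducer API. This is a minimal sketch only; the count-style aggregation, class name, and column family are hypothetical.

// Minimal sketch (hypothetical names): sum the per-key partial counts
// produced by the map stage and write one Put per aggregated row to HBase.
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class AggregatingHBaseReducer
    extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {

  @Override
  protected void reduce(Text rowKey, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    for (LongWritable v : values) {
      sum += v.get();  // combine the partial aggregates from the mappers
    }
    Put put = new Put(Bytes.toBytes(rowKey.toString()));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(sum));
    context.write(new ImmutableBytesWritable(put.getRow()), put);
  }
}

Such a job would typically be wired up with TableMapReduceUtil.initTableReducerJob("mytable", AggregatingHBaseReducer.class, job), which configures TableOutputFormat so the reducer's Puts go straight to the region servers.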

On Mon, Aug 15, 2011 at 11:20 PM, Buttler, David <[EMAIL PROTECTED]> wrote:

> Are you sure you need to use a reducer to put rows into hbase?  You can
> save a lot of time if you can put the rows into hbase directly in the
> mappers.
>
> Dave
>
> -----Original Message-----
> From: Lior Schachter [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, August 14, 2011 9:32 AM
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: M/R vs hbase problem in production
>
> Hi,
>
> cluster details:
> hbase 0.90.2, 10 machines, 1 Gbps switch.
>
> use-case:
> An M/R job that inserts about 10 million rows into hbase in the reducer,
> followed by an M/R job that works with hdfs files.
> When the first job's maps finish, the second job's maps start and the region
> server crashes.
> Please note that when running the 2 jobs separately they both finish
> successfully.
>
> From our monitoring we see that when the 2 jobs run together the network
> load reaches our max bandwidth of 1 Gbps.
>
> In the region server log we see these exceptions:
> a.
> 2011-08-14 18:37:36,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@491fb2f4) from 10.11.87.73:33737: output error
> 2011-08-14 18:37:36,264 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 24 on 8041 caught: java.nio.channels.ClosedChannelException
>        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>        at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
>        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
>
> b.
> 2011-08-14 18:41:56,225 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-8181634225601608891_579246 java.io.EOFException
>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>        at java.io.DataInputStream.readLong(DataInputStream.java:399)
>        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:122)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2548)
>
> c.
> 2011-08-14 18:42:02,960 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #0 from primary datanode 10.11.87.72:50010
> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: blk_-8181634225601608891_579246 is already commited, storedBlock == null.
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:4877)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:501)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:740)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
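
For comparison, Dave's suggestion above, putting rows into HBase directly from the mappers, would look roughly like the sketch below: a map-only job that emits Puts through TableOutputFormat. Again a minimal sketch with hypothetical names; the tab-separated input format is an assumption.

// Minimal sketch (hypothetical names): parse one row per input line and
// write it to HBase from the map stage, with no reduce stage at all.
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DirectPutMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // Assumed input format: "rowkey<TAB>value" per line.
    String[] parts = line.toString().split("\t", 2);
    Put put = new Put(Bytes.toBytes(parts[0]));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(parts[1]));
    context.write(new ImmutableBytesWritable(put.getRow()), put);
  }
}

Configured with TableMapReduceUtil.initTableReducerJob("mytable", null, job) and job.setNumReduceTasks(0), this skips the shuffle and sort entirely, which is where the time saving comes from; as the reply at the top of the thread notes, it only applies when the rows do not need cross-record aggregation first.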