HBase user mailing list: HBase Thrift inserts bottlenecked somewhere -- but where?


Thread:
Dan Crosta 2013-03-01, 12:17
Asaf Mesika 2013-03-01, 14:13
Dan Crosta 2013-03-01, 14:17
Jean-Daniel Cryans 2013-03-01, 17:33
Varun Sharma 2013-03-01, 18:46
Dan Crosta 2013-03-01, 18:49
Ted Yu 2013-03-01, 18:52
Varun Sharma 2013-03-01, 19:01
Ted Yu 2013-03-02, 03:53
Dan Crosta 2013-03-02, 17:15
lars hofhansl 2013-03-02, 03:42
Dan Crosta 2013-03-02, 17:12
lars hofhansl 2013-03-02, 17:38 (expanded below)
Dan Crosta 2013-03-02, 18:47
Asaf Mesika 2013-03-02, 19:56
Ted Yu 2013-03-02, 20:02
lars hofhansl 2013-03-02, 20:50
Dan Crosta 2013-03-02, 22:29
Varun Sharma 2013-03-03, 11:08
Dan Crosta 2013-03-03, 13:53
Re: HBase Thrift inserts bottlenecked somewhere -- but where?
They [the WAL edits] are flushed to 3 nodes (but not sync'ed to disk on those replicas), so you'll eat 3 network RTTs.

I wrote a bit about this here: http://hadoop-hbase.blogspot.com/2012/05/hbase-hdfs-and-durable-sync.html

You can switch a column family to deferred log flush. In that case the edits are flushed to the 3 replicas asynchronously, within 1 or 2 seconds.
(And if I ever get to finish HBASE-7801, one will be able to control this per mutation.)
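In the 0.94-era Java API this is a table-level flag on HTableDescriptor; here is a minimal sketch of enabling it (the table name "metrics" is hypothetical, and the async flush interval is governed by hbase.regionserver.optionallogflushinterval, default 1000 ms):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableDeferredLogFlush {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        byte[] name = Bytes.toBytes("metrics");  // hypothetical table name
        admin.disableTable(name);                // take the table offline to alter it
        HTableDescriptor desc = admin.getTableDescriptor(name);
        desc.setDeferredLogFlush(true);          // WAL edits are group-flushed asynchronously
        admin.modifyTable(name, desc);
        admin.enableTable(name);
        admin.close();
    }
}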
-- Lars

________________________________
 From: Dan Crosta <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Saturday, March 2, 2013 10:47 AM
Subject: Re: HBase Thrift inserts bottlenecked somewhere -- but where?
 
On Mar 2, 2013, at 12:38 PM, lars hofhansl wrote:
> "That's only true from the HDFS perspective, right? Any given region is
> "owned" by 1 of the 6 regionservers at any given time, and writes are
> buffered to memory before being persisted to HDFS, right?"
>
> Only if you disabled the WAL, otherwise each change is written to the WAL first, and then committed to the memstore.
> So in that sense it's even worse: each edit is written twice to the FS, replicated 3 times, and all that on only 6 data nodes.

Are these writes synchronized somehow? Could there be a locking problem somewhere that wouldn't show up as utilization of disk or cpu?

What is the upshot of disabling the WAL -- I assume it means that if a RegionServer crashes, you lose any writes that it holds in memory but has not yet committed to HFiles?
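For reference, a minimal sketch of what disabling the WAL looks like on the client in the 0.94-era API, where it is set per mutation (the table, row, and column names here are hypothetical):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NoWalPut {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "metrics");
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        put.setWriteToWAL(false);  // edit goes only to the memstore; lost if the RS dies before flush
        table.put(put);
        table.close();
    }
}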
> 20k writes does seem a bit low.

I adjusted dfs.datanode.handler.count from 3 to 10 and now we're up to about 22-23k writes per second, but still no apparent contention for any of the basic system resources.
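For reference, that change corresponds to this property in hdfs-site.xml on each datanode (the default is 3):

<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>
  <!-- number of server threads handling requests on the DataNode -->
</property>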

Any other suggestions on things to try?

Thanks,
- Dan
Andrew Purtell 2013-03-05, 07:04