HBase, mail # user - HBase Thrift inserts bottlenecked somewhere -- but where?


Thread:
Dan Crosta 2013-03-01, 12:17
Asaf Mesika 2013-03-01, 14:13
Dan Crosta 2013-03-01, 14:17
Jean-Daniel Cryans 2013-03-01, 17:33
Varun Sharma 2013-03-01, 18:46
Dan Crosta 2013-03-01, 18:49
Ted Yu 2013-03-01, 18:52
Varun Sharma 2013-03-01, 19:01
Ted Yu 2013-03-02, 03:53
Dan Crosta 2013-03-02, 17:15
lars hofhansl 2013-03-02, 03:42
Dan Crosta 2013-03-02, 17:12
lars hofhansl 2013-03-02, 17:38
Dan Crosta 2013-03-02, 18:47
Asaf Mesika 2013-03-02, 19:56
Ted Yu 2013-03-02, 20:02
lars hofhansl 2013-03-02, 20:50
Dan Crosta 2013-03-02, 22:29
Varun Sharma 2013-03-03, 11:08
Re: HBase Thrift inserts bottlenecked somewhere -- but where?
lars hofhansl 2013-03-02, 20:56
They are flushed to 3 nodes (but not sync'ed to disk on those replicas), so you'll eat 3 network RTTs.

I wrote a bit about this here: http://hadoop-hbase.blogspot.com/2012/05/hbase-hdfs-and-durable-sync.html

You can switch a table to deferred log flush. In that case the edits are still flushed to the 3 replicas, but asynchronously, within 1 or 2 seconds.
(And if I ever get to finish HBASE-7801, one can control this per mutation.)
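
A minimal sketch of enabling that with the 0.94-era Java client API (the table name "mytable" is a placeholder; the table has to be disabled while the descriptor is modified):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class DeferredFlushExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Fetch the current descriptor, enable deferred log flush, and apply it.
    // "mytable" is a placeholder table name.
    HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
    desc.setDeferredLogFlush(true); // WAL edits get flushed in the background
    admin.disableTable("mytable");
    admin.modifyTable(Bytes.toBytes("mytable"), desc);
    admin.enableTable("mytable");
    admin.close();
  }
}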
-- Lars

________________________________
 From: Dan Crosta <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Saturday, March 2, 2013 10:47 AM
Subject: Re: HBase Thrift inserts bottlenecked somewhere -- but where?
 
On Mar 2, 2013, at 12:38 PM, lars hofhansl wrote:
> "That's only true from the HDFS perspective, right? Any given region is
> "owned" by 1 of the 6 regionservers at any given time, and writes are
> buffered to memory before being persisted to HDFS, right?"
>
> Only if you disabled the WAL, otherwise each change is written to the WAL first, and then committed to the memstore.
> So in that sense it's even worse. Each edit is written twice to the FS, replicated 3 times, and all that on only 6 data nodes.

Are these writes synchronized somehow? Could there be a locking problem somewhere that wouldn't show up as utilization of disk or cpu?

What is the upshot of disabling WAL -- I assume it means that if a RegionServer crashes, you lose any writes that it has in memory but has not yet committed to HFiles?
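
For reference, a minimal sketch of skipping the WAL per mutation with the 0.94-era client API (table, row, and column names here are placeholders):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NoWalPutExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
    // Skip the WAL: the edit lives only in the memstore until the next flush,
    // so a regionserver crash loses anything not yet persisted to HFiles.
    put.setWriteToWAL(false);
    table.put(put);
    table.close();
  }
}

(Going through Thrift, the Mutation struct carries an equivalent writeToWAL flag.)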
> 20k writes does seem a bit low.

I adjusted dfs.datanode.handler.count from 3 to 10 and now we're up to about 22-23k writes per second, but still no apparent contention for any of the basic system resources.
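
For anyone following along, that setting lives in hdfs-site.xml on each datanode and needs a datanode restart to take effect:

<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>
</property>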

Any other suggestions on things to try?

Thanks,
- Dan
Andrew Purtell 2013-03-05, 07:04