HBase, mail # user - HBase Thrift inserts bottlenecked somewhere -- but where?


Dan Crosta 2013-03-01, 12:17
Asaf Mesika 2013-03-01, 14:13
Dan Crosta 2013-03-01, 14:17
Jean-Daniel Cryans 2013-03-01, 17:33
Varun Sharma 2013-03-01, 18:46
Dan Crosta 2013-03-01, 18:49
Ted Yu 2013-03-01, 18:52
Varun Sharma 2013-03-01, 19:01
Ted Yu 2013-03-02, 03:53
Dan Crosta 2013-03-02, 17:15
lars hofhansl 2013-03-02, 03:42
Dan Crosta 2013-03-02, 17:12
lars hofhansl 2013-03-02, 17:38
Dan Crosta 2013-03-02, 18:47
Asaf Mesika 2013-03-02, 19:56
Ted Yu 2013-03-02, 20:02
lars hofhansl 2013-03-02, 20:50
Re: HBase Thrift inserts bottlenecked somewhere -- but where?
Dan Crosta 2013-03-02, 22:29
Hm. This could be part of the problem in our case. Unfortunately we don't have very good control over which rowkeys will come from which workers (we're not using map-reduce or anything like it that would give us that sort of control, at least not without some changes). But this is valuable information for future development; thanks for mentioning it.
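
A minimal sketch of the per-rowkey aggregation Asaf describes below, written against the 0.94-era HBase Java client rather than the Thrift gateway this thread goes through; the table name, column family, and the Cell helper are made up for illustration:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AggregatedPuts {

    // One incoming value from a worker: (rowkey, qualifier, value) for a fixed family.
    static class Cell {
        final String rowkey, qualifier, value;
        Cell(String r, String q, String v) { rowkey = r; qualifier = q; value = v; }
    }

    // Fold every cell that shares a rowkey into one Put, so the batch we send
    // contains exactly one Put per unique rowkey.
    static List<Put> aggregateByRowkey(List<Cell> cells, byte[] family) {
        Map<String, Put> putsByRow = new LinkedHashMap<String, Put>();
        for (Cell c : cells) {
            Put put = putsByRow.get(c.rowkey);
            if (put == null) {
                put = new Put(Bytes.toBytes(c.rowkey));
                putsByRow.put(c.rowkey, put);
            }
            put.add(family, Bytes.toBytes(c.qualifier), Bytes.toBytes(c.value));
        }
        return new ArrayList<Put>(putsByRow.values());
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "metrics");      // hypothetical table name
        byte[] family = Bytes.toBytes("d");              // hypothetical column family

        List<Cell> incoming = Arrays.asList(
            new Cell("row-1", "qa", "1"),
            new Cell("row-1", "qb", "2"),                // same rowkey, second column
            new Cell("row-2", "qa", "3"));

        table.put(aggregateByRowkey(incoming, family));  // one Put per unique rowkey
        table.close();
    }
}

Even without control over which rowkeys reach which worker, each worker can still do this grouping locally over whatever batch it is about to flush.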

On Mar 2, 2013, at 2:56 PM, Asaf Mesika wrote:

> Make sure you are not sending a lot of Puts with the same rowkey. This can
> cause contention on the region server side. We fixed that in our project by
> aggregating all the columns for the same rowkey into the same Put object, so
> that when sending a List of Puts, each Put has a unique rowkey.
>
> On Saturday, March 2, 2013, Dan Crosta wrote:
>
>> On Mar 2, 2013, at 12:38 PM, lars hofhansl wrote:
>>> "That's only true from the HDFS perspective, right? Any given region is
>>> "owned" by 1 of the 6 regionservers at any given time, and writes are
>>> buffered to memory before being persisted to HDFS, right?"
>>>
>>> Only if you disabled the WAL; otherwise each change is written to the
>>> WAL first and then committed to the memstore.
>>> So in that sense it's even worse: each edit is written twice to the FS,
>>> replicated 3 times, and all that on only 6 data nodes.
>>
>> Are these writes synchronized somehow? Could there be a locking problem
>> somewhere that wouldn't show up as utilization of disk or cpu?
>>
>> What is the upshot of disabling the WAL -- I assume it means that if a
>> RegionServer crashes, you lose any writes that it has in memory but has
>> not yet committed to HFiles?
>>
>>
>>> 20k writes does seem a bit low.
>>
>> I adjusted dfs.datanode.handler.count from 3 to 10, and now we're up to
>> about 22-23k writes per second, but we still see no apparent contention for
>> any of the basic system resources.
>>
>> Any other suggestions on things to try?
>>
>> Thanks,
>> - Dan
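
For reference, a minimal sketch of what disabling the WAL looks like from the client side in the 0.94-era Java API (table, family, and qualifier names are made up). It makes the trade-off above concrete: the edit skips the log, so a crashed region server loses anything not yet flushed to HFiles.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NoWalPut {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "metrics");      // hypothetical table name

        Put put = new Put(Bytes.toBytes("row-1"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes("v"));

        // Skip the write-ahead log for this edit (0.94-era API). The put then
        // lives only in the memstore until the region flushes an HFile, so a
        // crashed region server loses it.
        put.setWriteToWAL(false);

        table.put(put);
        table.close();
    }
}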
Varun Sharma 2013-03-03, 11:08
Dan Crosta 2013-03-03, 13:53
lars hofhansl 2013-03-02, 20:56
Andrew Purtell 2013-03-05, 07:04