Re: HBase Thrift inserts bottlenecked somewhere -- but where?
Fairly small -- row keys 32-48 bytes, column keys about the same, and values 50-100 bytes (with a few outliers that probably go up to 1k).

On Mar 3, 2013, at 6:08 AM, Varun Sharma wrote:

> What is the size of your writes ?
>
> On Sat, Mar 2, 2013 at 2:29 PM, Dan Crosta <[EMAIL PROTECTED]> wrote:
>
>> Hm. This could be part of the problem in our case. Unfortunately we don't
>> have very good control over which rowkeys will come from which workers
>> (we're not using map-reduce or anything like it where we have that sort of
>> control, at least not without some changes). But this is valuable
>> information for future development -- thanks for mentioning it.
>>
>> On Mar 2, 2013, at 2:56 PM, Asaf Mesika wrote:
>>
>>> Make sure you are not sending a lot of Puts with the same rowkey. This can
>>> cause contention on the region server side. We fixed that in our project
>>> by aggregating all the columns for the same rowkey into the same Put
>>> object, so when sending a List of Puts we made sure each Put had a unique
>>> rowkey.
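A minimal sketch of the aggregation described above, using the 0.94-era HBase
Java client for illustration (the Edit record, the collapseByRow helper, and
the column family name are hypothetical; the writes in this thread go through
Thrift, but the same grouping idea applies):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PutAggregator {
        // Hypothetical input record: one (rowkey, qualifier, value) per incoming write.
        static class Edit {
            byte[] row, qualifier, value;
        }

        // Collapse a batch of edits so each rowkey appears in exactly one Put.
        static List<Put> collapseByRow(List<Edit> edits, byte[] family) {
            Map<byte[], Put> putsByRow = new TreeMap<byte[], Put>(Bytes.BYTES_COMPARATOR);
            for (Edit e : edits) {
                Put put = putsByRow.get(e.row);
                if (put == null) {
                    put = new Put(e.row);
                    putsByRow.put(e.row, put);
                }
                put.add(family, e.qualifier, e.value); // all columns for a row accumulate in one Put
            }
            return new ArrayList<Put>(putsByRow.values());
        }
    }

    // Usage: table.put(PutAggregator.collapseByRow(edits, Bytes.toBytes("cf")));
    // One batch of Puts, and no two Puts in it share a rowkey.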
>>>
>>> On Saturday, March 2, 2013, Dan Crosta wrote:
>>>
>>>> On Mar 2, 2013, at 12:38 PM, lars hofhansl wrote:
>>>>> "That's only true from the HDFS perspective, right? Any given region is
>>>>> "owned" by 1 of the 6 regionservers at any given time, and writes are
>>>>> buffered to memory before being persisted to HDFS, right?"
>>>>>
>>>>> Only if you disabled the WAL; otherwise each change is written to the
>>>> WAL first and then committed to the memstore.
>>>>> So in that sense it's even worse: each edit is written twice to the FS and
>>>> replicated 3 times, and all that on only 6 data nodes.
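(Back-of-the-envelope from the numbers in this thread: ~20,000 edits/sec x 2
file-system writes per edit (WAL now, HFile at flush) x 3-way HDFS replication
is roughly 120,000 FS-level writes/sec, i.e. about 20,000/sec landing on each
of the 6 data nodes, before counting compaction re-writes.)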
>>>>
>>>> Are these writes synchronized somehow? Could there be a locking problem
>>>> somewhere that wouldn't show up as utilization of disk or cpu?
>>>>
>>>> What is the upshot of disabling the WAL -- I assume it means that if a
>>>> RegionServer crashes, you lose any writes it has in memory but not yet
>>>> committed to HFiles?
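For reference, a minimal sketch of skipping the WAL per mutation with the
0.94-era Java client (table and column names here are made up); the trade-off
is exactly as described above -- edits that exist only in the memstore are lost
if the region server dies before a flush:

    Put put = new Put(Bytes.toBytes("some-row"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    put.setWriteToWAL(false); // skip the WAL: faster, but unflushed edits vanish on a crash
    table.put(put);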
>>>>
>>>>
>>>>> 20k writes does seem a bit low.
>>>>
>>>> I adjusted dfs.datanode.handler.count from 3 to 10 and now we're up to
>>>> about 22-23k writes per second, but still no apparent contention for any
>>>> of the basic system resources.
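(For anyone following along: dfs.datanode.handler.count is set in
hdfs-site.xml on the data nodes, roughly as in the snippet below -- the value
of 10 matches what was tried here -- and typically requires a datanode restart
to take effect:

    <property>
      <name>dfs.datanode.handler.count</name>
      <value>10</value>
    </property>
)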
>>>>
>>>> Any other suggestions on things to try?
>>>>
>>>> Thanks,
>>>> - Dan
>>
>>