HBase, mail # user - Increment operations in hbase


Re: Increment operations in hbase
kiran 2013-01-13, 08:34
Also, the CF used for the increments has been set to IN_MEMORY, with a ROWCOL bloom filter.
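
For reference, a minimal sketch of setting those two column-family properties through the Java admin API of that era; the table name "mytable" and family name "counters" are placeholders, and the HBase shell's alter command achieves the same thing:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.regionserver.StoreFile;

    public class AlterCounterFamily {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Placeholder table/family names; replace with the real ones.
        String tableName = "mytable";
        HColumnDescriptor cf = new HColumnDescriptor("counters");
        cf.setInMemory(true);                              // cache this CF's blocks with in-memory priority
        cf.setBloomFilterType(StoreFile.BloomType.ROWCOL); // row+column bloom filter

        // Schema changes typically require the table to be disabled on 0.92.
        admin.disableTable(tableName);
        admin.modifyColumn(tableName, cf);
        admin.enableTable(tableName);
      }
    }
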
On Sun, Jan 13, 2013 at 1:17 PM, kiran <[EMAIL PROTECTED]> wrote:

> The idea was that, given a region server, I can get the HRegion and Store
> files in that region. In Store, there is a method incrementColumnValue, so
> I thought of using this method, as it may be a low-level implementation.
>
> Yes, gets are proving very costly for me. The other operation in addition
> to this is writing data into HBase on the region server, but that goes to
> a different table, not the one in which I need to increment values.
>
> I did profile using gets and puts across my cluster rather than directly
> using HTable.increment. I am running the daemon on each node, with batches
> of 1000 get actions and HTableUtil.bucketRSPut for the puts. Some nodes
> were able to complete in 10 seconds, while some took about 3 minutes to
> complete the 1000.
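
As a reference point, a rough sketch of the kind of batched get-then-put loop described above, with assumed family and qualifier names; the HTableUtil.bucketRSPut call mentioned here groups the puts by region server, which a plain put(List<Put>) approximates by batching per server:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchReadWrite {
      // Placeholder family/qualifier, for illustration only.
      private static final byte[] CF = Bytes.toBytes("d");
      private static final byte[] QUAL = Bytes.toBytes("count");

      /** Read one batch of rows and write derived values to a second table. */
      static void processBatch(HTable readTable, HTable writeTable,
                               List<byte[]> rowKeys) throws IOException {
        // Multi-get: the client groups the Gets per region server.
        List<Get> gets = new ArrayList<Get>(rowKeys.size());
        for (byte[] row : rowKeys) {
          gets.add(new Get(row).addColumn(CF, QUAL));
        }
        Result[] results = readTable.get(gets);

        // Build puts for the second table; the poster used
        // HTableUtil.bucketRSPut(writeTable, puts) to group puts by region
        // server, while a plain put(List<Put>) also batches them per server.
        List<Put> puts = new ArrayList<Put>(results.length);
        for (Result r : results) {
          if (r == null || r.isEmpty()) continue;
          long value = Bytes.toLong(r.getValue(CF, QUAL));   // assumes an 8-byte counter value
          Put p = new Put(r.getRow());
          p.add(CF, QUAL, Bytes.toBytes(value + 1));         // example transformation
          puts.add(p);
        }
        writeTable.put(puts);
        writeTable.flushCommits();
      }
    }
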
>
> What is surprising to me is that I precomputed the rows hosted on each
> node, started the daemon, and issued gets only for the rows on that node
> so that the data is local. Even in this case, a 3-minute worst case for
> 1000 actions is huge.
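
One way to precompute which rows are local to each node is to ask the client for every row's region location and bucket by host. A hedged sketch, assuming the client version exposes HTable.getRegionLocation and HRegionLocation.getHostname and that the row list fits in memory:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.hadoop.hbase.HRegionLocation;
    import org.apache.hadoop.hbase.client.HTable;

    public class RowLocality {
      /** Group row keys by the hostname of the region server currently hosting them. */
      static Map<String, List<byte[]>> bucketByHost(HTable table, List<byte[]> rowKeys)
          throws IOException {
        Map<String, List<byte[]>> byHost = new HashMap<String, List<byte[]>>();
        for (byte[] row : rowKeys) {
          // Uses the client's region cache after the first lookup for a region.
          HRegionLocation loc = table.getRegionLocation(row);
          String host = loc.getHostname();   // assumption: getHostname() exists on this client version
          List<byte[]> bucket = byHost.get(host);
          if (bucket == null) {
            bucket = new ArrayList<byte[]>();
            byHost.put(host, bucket);
          }
          bucket.add(row);
        }
        return byHost;
      }
    }
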
>
>
>
> On Sun, Jan 13, 2013 at 12:52 PM, Anoop John <[EMAIL PROTECTED]> wrote:
>
>> > Another alternative is to get store files for each row hosted in that
>> > node, operating directly on store files for each increment object??
>>
>> Sorry, I didn't get the idea. Can you explain, please?
>> Regarding support for Increments in the batch API: sorry, I was checking
>> the 0.94 code base. In 0.92 this support is not there. :(
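
For completeness, a sketch of what batched increments could look like on a 0.94 client, where, per the note above, HTable.batch accepts Increment actions; the family and qualifier names are placeholders:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Increment;
    import org.apache.hadoop.hbase.client.Row;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchIncrements {
      /** Send many Increments in one batch() call (0.94+, as noted above). */
      static void incrementAll(HTable table, List<byte[]> rowKeys) throws Exception {
        byte[] cf = Bytes.toBytes("d");        // placeholder family
        byte[] qual = Bytes.toBytes("count");  // placeholder qualifier

        List<Row> actions = new ArrayList<Row>(rowKeys.size());
        for (byte[] row : rowKeys) {
          Increment inc = new Increment(row);
          inc.addColumn(cf, qual, 1L);
          actions.add(inc);   // assumes a 0.94 client, where Increment is accepted as a batch action
        }
        Object[] results = new Object[actions.size()];
        table.batch(actions, results);   // the client groups actions per region server
      }
    }
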
>>
>> Have you done any profiling of the operation on the RS side? How many
>> HFiles per store, on average, at the time of this op, and how many CFs
>> does the table have? Gets seem to be costly for you? Is this bulk
>> increment op the only thing happening at this time, or are there other
>> concurrent ops? Is the block cache getting used? Have you checked metrics
>> like the cache hit ratio?
>>
>> -Anoop-
>>
>> On Sun, Jan 13, 2013 at 12:20 PM, kiran <[EMAIL PROTECTED]>
>> wrote:
>>
>> > I am using HBase 0.92.1 and the table is split evenly across 19 nodes,
>> > and I know the region splits on each node. I can construct Increment
>> > objects for each row hosted on that node according to the splits
>> > (30-50k approx. in 15 min per node) ...
>> >
>> > There is no batch increment support (the API says batch supports only
>> > get, put and delete) ... can I directly use HTable.increment for 30-50k
>> > Increment objects on each node, sequentially or multithreaded, and
>> > finish in 15 min?
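
A minimal multithreaded sketch of that approach, with placeholder table, family and qualifier names; each worker opens its own HTable, since HTable instances are not thread-safe, and on 0.92 each HTable.increment call is a separate RPC:

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Increment;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ParallelIncrements {
      public static void incrementAll(final List<byte[]> rowKeys, int threads)
          throws InterruptedException {
        final Configuration conf = HBaseConfiguration.create();
        final byte[] cf = Bytes.toBytes("d");        // placeholder family
        final byte[] qual = Bytes.toBytes("count");  // placeholder qualifier

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (rowKeys.size() + threads - 1) / threads;
        for (int t = 0; t < threads; t++) {
          final int from = Math.min(t * chunk, rowKeys.size());
          final int to = Math.min(from + chunk, rowKeys.size());
          pool.submit(new Runnable() {
            public void run() {
              try {
                // HTable is not thread-safe, so each worker opens its own instance.
                HTable table = new HTable(conf, "counts");   // placeholder table name
                for (byte[] row : rowKeys.subList(from, to)) {
                  Increment inc = new Increment(row);
                  inc.addColumn(cf, qual, 1L);
                  table.increment(inc);   // one RPC per Increment on 0.92
                }
                table.close();
              } catch (Exception e) {
                e.printStackTrace();
              }
            }
          });
        }
        pool.shutdown();
        pool.awaitTermination(15, TimeUnit.MINUTES);
      }
    }
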
>> >
>> > Another alternative is to get the store files for each row hosted on
>> > that node and operate directly on the store files for each increment
>> > object??
>> >
>> >
>> >
>> > On Sun, Jan 13, 2013 at 1:50 AM, Varun Sharma <[EMAIL PROTECTED]>
>> wrote:
>> >
>> > > IMHO, this seems too low: 1 million operations in 15 minutes
>> > > translates to roughly 1.1K increment operations per second
>> > > (1,000,000 / 900 s), which should be easy to support. Moreover, you
>> > > are running increments on different rows, so contention due to row
>> > > locks is also not likely to be a problem.
>> > >
>> > > On HBase 0.94.0, I have seen up to 1K increments per second on a
>> > > single row (note that this will be significantly slower than
>> > > incrementing individual rows because of contention, and it is also
>> > > limited to one node, the one which hosts the row). So I would assume
>> > > that throughput should be significantly higher for increments across
>> > > multiple rows. How many nodes are you using, and is the table
>> > > appropriately split across the nodes?
>> > >
>> > > On Sat, Jan 12, 2013 at 10:59 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > Can you tell us which version of HBase you are using?
>> > > >
>> > > > Thanks
>> > > >
>> > > > On Sat, Jan 12, 2013 at 10:57 AM, Asaf Mesika <[EMAIL PROTECTED]> wrote:
>> > > >
>> > > > > Most time is spent reading from the Store file and not on network
>> > > > > transfer time
Thank you
Kiran Sarvabhotla