Re: Increment operations in hbase
lars hofhansl 2013-01-14, 02:27
Did you change the HBase blocksize (in the column family)?
A large blocksize is good for scans, but detrimental to point access (Get/Increment/etc.).
Something's off in your cluster.
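(For reference, the blocksize is a per-column-family schema attribute. A minimal sketch of lowering it with the 0.92-era admin API; table name "mytable" and family "c" are placeholders, and the table has to be disabled for the schema change:)

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ShrinkBlocksize {
        public static void main(String[] args) throws Exception {
            HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
            // fetch the existing descriptor so other CF attributes are kept
            HTableDescriptor td = admin.getTableDescriptor(Bytes.toBytes("mytable"));
            HColumnDescriptor cf = td.getFamily(Bytes.toBytes("c"));
            cf.setBlocksize(8 * 1024); // default is 64 KB; smaller blocks favor point reads
            admin.disableTable("mytable");
            admin.modifyColumn("mytable", cf);
            admin.enableTable("mytable");
        }
    }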
From: kiran <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Sunday, January 13, 2013 12:34 AM
Subject: Re: Increment operations in hbase
Also, the CF for the increments has been set to IN_MEMORY, with a bloom filter enabled.
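(A quick way to double-check what the CF is actually set to, assuming the 0.92 client API; table "mytable" and family "c" are placeholders:)

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DescribeCf {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "mytable");
            HColumnDescriptor cf = table.getTableDescriptor().getFamily(Bytes.toBytes("c"));
            System.out.println("IN_MEMORY=" + cf.isInMemory()
                    + " BLOOMFILTER=" + cf.getBloomFilterType()
                    + " BLOCKSIZE=" + cf.getBlocksize());
            table.close();
        }
    }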
On Sun, Jan 13, 2013 at 1:17 PM, kiran <[EMAIL PROTECTED]> wrote:
> The idea was: given a region server, I can get the HRegion and Store files in
> that region. In Store, there is a method incrementColumnValue, hence I
> thought of using this method since it may be the low-level implementation.
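(The client-side counterpart reached through that low-level method is HTable.incrementColumnValue, one RPC per cell. A minimal sketch, assuming the 0.92 client API; table, CF and row names are placeholders:)

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SingleIncrement {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "mytable");
            // 0.92 signature: incrementColumnValue(row, family, qualifier, amount)
            long newValue = table.incrementColumnValue(
                    Bytes.toBytes("row1"), Bytes.toBytes("c"), Bytes.toBytes("q"), 1L);
            System.out.println("new value = " + newValue);
            table.close();
        }
    }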
> Yes, gets are proving very costly for me. The other operation in addition
> to this is writing data into hbase on the region server, but that goes into a
> different table, not the one in which I need to increment values.
> I did profile using gets and puts across my cluster rather than directly
> using HTable.increment. I am running the daemon on each node with 1000
> batched get actions, and using HTableUtil.bucketRSPut for the puts. Some
> nodes were able to complete in 10 seconds, while some were taking about
> 3 minutes to complete 1000 actions.
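(A minimal sketch of timing one such 1000-get batch, assuming the 0.92 client API; table name and row keys are placeholders:)

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TimedBatchGet {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "mytable");
            List<Get> gets = new ArrayList<Get>(1000);
            for (int i = 0; i < 1000; i++) {
                gets.add(new Get(Bytes.toBytes("row-" + i))); // placeholder keys
            }
            long start = System.currentTimeMillis();
            Result[] results = table.get(gets); // batched multi-get
            System.out.println(results.length + " gets in "
                    + (System.currentTimeMillis() - start) + " ms");
            table.close();
        }
    }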
> What is surprising to me is that I precomputed the rows hosted on each
> node before starting the daemon, and issued gets only for the rows on that
> node so that the data is local; even in this case, a 3-minute worst-case
> scenario for 1000 actions is huge.
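(A sketch of that precomputation - grouping row keys by the region server hosting them, via HTable.getRegionLocation; assuming the 0.92 client API, with placeholder names:)

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class GroupRowsByServer {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "mytable");
            String[] rows = { "row1", "row2", "row3" }; // placeholder keys
            Map<String, List<String>> byServer = new HashMap<String, List<String>>();
            for (String row : rows) {
                String host = table.getRegionLocation(Bytes.toBytes(row)).getHostname();
                List<String> bucket = byServer.get(host);
                if (bucket == null) {
                    bucket = new ArrayList<String>();
                    byServer.put(host, bucket);
                }
                bucket.add(row);
            }
            System.out.println(byServer); // each node's daemon then works only on its own bucket
            table.close();
        }
    }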
On Sun, Jan 13, 2013 at 12:52 PM, Anoop John <[EMAIL PROTECTED]> wrote:
>> >Another alternative is to get store files for each row hosted in that
>> >node and operate directly on store files for each increment object??
>> Sorry, I didn't get what the idea is. Can you explain please?
>> Regarding support for Increments in the batch API: sorry, I was checking the
>> 0.94 base. In 0.92 this support is not there. :(
>> Have you done any profiling of the operation on the RS side? How many HFiles
>> on average per store at the time of this op, and how many CFs for the table?
>> Are the gets what is costly for you? Is this bulk increment op the only thing
>> happening at this time, or are there other concurrent ops? Is the block cache
>> getting used? Have you checked metrics like the cache hit ratio?
>> On Sun, Jan 13, 2013 at 12:20 PM, kiran <[EMAIL PROTECTED]> wrote:
>> > I am using hbase 0.92.1 and the table is split evenly across 19 nodes, and I
>> > know the node region splits. I can construct increment objects for the rows
>> > hosted on each node according to the splits (30-50k approx in 15 min per
>> > ...
>> > there is no batch increment support (in the API it is stated that it supports
>> > only get, put and delete)... can I directly use HTable.increment for 30-50k
>> > increment objects on each node, sequentially or multithreaded, and finish
>> > within 15 min?
>> > Another alternative is to get store files for each row hosted in that
>> > node and operate directly on store files for each increment object??
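(A minimal sketch of the sequential-or-multithreaded HTable.increment question above, assuming the 0.92 client API; table/CF names, row keys and thread count are placeholders, and each worker opens its own HTable since HTable is not thread-safe:)

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Increment;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ParallelIncrements {
        public static void main(String[] args) throws Exception {
            final List<Increment> incs = buildIncrements();
            int threads = 8; // tune per node
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            int chunk = (incs.size() + threads - 1) / threads;
            for (int i = 0; i < incs.size(); i += chunk) {
                final List<Increment> slice =
                        incs.subList(i, Math.min(i + chunk, incs.size()));
                pool.submit(new Callable<Void>() {
                    public Void call() throws Exception {
                        HTable table = new HTable(HBaseConfiguration.create(), "mytable");
                        for (Increment inc : slice) {
                            table.increment(inc); // one RPC each; no batch API in 0.92
                        }
                        table.close();
                        return null;
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }

        private static List<Increment> buildIncrements() {
            List<Increment> incs = new ArrayList<Increment>();
            for (int i = 0; i < 1000; i++) { // stand-in for the 30-50k per-node increments
                Increment inc = new Increment(Bytes.toBytes("row-" + i));
                inc.addColumn(Bytes.toBytes("c"), Bytes.toBytes("q"), 1L);
                incs.add(inc);
            }
            return incs;
        }
    }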
>> > On Sun, Jan 13, 2013 at 1:50 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>> > > IMHO, this seems too low - 1 million operations in 15 minutes comes to
>> > > only about 1.1K increment operations per second, which should be easy to
>> > > support. Moreover, you are running increments on different rows, so
>> > > contention on row locks is also not likely to be a problem.
>> > >
>> > > On hbase 0.94.0, I have seen up to 1K increments per second on a single
>> > > row (note that this will be significantly slower than incrementing
>> > > individual rows because of contention, and also this would be limited to
>> > > 1 node, the one which hosts the row). So, I would assume that throughput
>> > > should be significantly higher for increments across multiple rows. How
>> > > many nodes are you using, and is the table appropriately split across
>> > > the nodes?