|
kiran
2013-01-12, 10:37
Anoop John
2013-01-12, 15:39
Asaf Mesika
2013-01-12, 18:57
Ted Yu
2013-01-12, 18:59
Varun Sharma
2013-01-12, 20:20
kiran
2013-01-13, 06:50
Anoop John
2013-01-13, 07:22
kiran
2013-01-13, 07:47
kiran
2013-01-13, 08:34
Ted Yu
2013-01-13, 15:35
lars hofhansl
2013-01-14, 02:27
|
-
Increment operations in hbase
kiran 2013-01-12, 10:37
Hi,

My use case is that I need to increment 1 million rows within 15 minutes. I tried two approaches, but neither of them yielded results.

I used HTable.increment, but it does not complete in the specified time. I also tried multi-threading, but it is very costly. As an alternative I implemented get and put, but that approach does not complete in 15 minutes either.

Can I use a lower-level implementation, such as "Store" or "HRegionServer", to increment 1 million rows? I know the table splits, the region servers serving them, and the rows that fall into each split. I suspect the main bottleneck with both approaches is network I/O rather than processing.

--
Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late
-
Re: Increment operations in hbase
Anoop John 2013-01-12, 15:39
Hi,

Can you try the HTable#batch() API? With it you can batch increments for many rows into a single RPC call, which might reduce the total time taken. Good luck.

-Anoop-
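The batching idea can be sketched independently of HBase. The snippet below is only an illustration, not the HBase client API: `IncrementOp`, `chunked`, `send_in_batches` and the `send_batch` callback are hypothetical names, and the batch size of 1000 is an assumption. It shows how grouping operations cuts the number of round trips.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for one increment: (row key, amount to add).
IncrementOp = Tuple[bytes, int]


def chunked(ops: Sequence[IncrementOp], batch_size: int) -> List[List[IncrementOp]]:
    """Split the operations into consecutive batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(ops[i:i + batch_size]) for i in range(0, len(ops), batch_size)]


def send_in_batches(
    ops: Sequence[IncrementOp],
    batch_size: int,
    send_batch: Callable[[List[IncrementOp]], None],
) -> int:
    """Call send_batch once per batch and return the number of round trips."""
    batches = chunked(ops, batch_size)
    for batch in batches:
        send_batch(batch)  # one simulated RPC per batch
    return len(batches)


# One increment of 1 for each of a million made-up row keys.
ops = [(b"row-%07d" % i, 1) for i in range(1_000_000)]
sent: List[List[IncrementOp]] = []  # records what the fake transport received
round_trips = send_in_batches(ops, 1000, sent.append)
print(round_trips)
```

With these assumed numbers, one million single-row calls become one thousand batched calls. Whether the real batch call accepts increments depends on the HBase version in use.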
-
Re: Increment operations in hbase
Asaf Mesika 2013-01-12, 18:57
Most of the time is spent reading from the store files, not on network transfer of the Increment objects.
-
Re: Increment operations in hbase
Ted Yu 2013-01-12, 18:59
Can you tell us which version of HBase you are using?

Thanks
-
Re: Increment operations in hbase
Varun Sharma 2013-01-12, 20:20
IMHO, this seems too low. 1 million operations in 15 minutes translates to roughly 1.1K increment operations per second, which should be easy to support. Moreover, you are running increments on different rows, so contention due to row locks is not likely to be a problem either.

On HBase 0.94.0, I have seen up to 1K increments per second on a single row (note that this is significantly slower than incrementing separate rows, because of contention, and it is limited to one node, the one that hosts the row). So I would assume that throughput should be significantly higher for increments across multiple rows. How many nodes are you using, and is the table appropriately split across the nodes?
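As a rough check on the arithmetic behind this estimate, the required rate can be computed directly. This is a minimal sketch. The `per_node_rate` helper and the node count of 10 in the usage line are hypothetical, chosen only for illustration, and the division assumes the load is spread evenly.

```python
TOTAL_INCREMENTS = 1_000_000
WINDOW_SECONDS = 15 * 60  # 15 minutes

# Cluster-wide rate needed to finish inside the window.
required_rate = TOTAL_INCREMENTS / WINDOW_SECONDS
print(f"{required_rate:.0f} increments/second")  # 1111 increments/second


def per_node_rate(total: int, seconds: int, nodes: int) -> float:
    """Rate each node must sustain if rows are spread evenly over the nodes."""
    return total / seconds / nodes


# Illustrative node count only; the real figure depends on the cluster.
print(f"{per_node_rate(TOTAL_INCREMENTS, WINDOW_SECONDS, 10):.1f} per node")
```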
-
Re: Increment operations in hbase
kiran 2013-01-13, 06:50
I am using HBase 0.92.1, and the table is split evenly across 19 nodes. I know the region splits on each node, so I can construct Increment objects for each row hosted on a node according to the splits (approximately 30-50k per node in 15 minutes).

There is no batch increment support (according to the API documentation, batch supports only get, put and delete). Can I directly use HTable.increment for the 30-50k Increment objects on each node, sequentially or multithreaded, and finish in 15 minutes?

Another alternative is to get the store files for each row hosted on that node and operate directly on the store files for each Increment object?
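Grouping rows by the region that hosts them, as described above, can be sketched with a binary search over the sorted region start keys. This is an illustration only, not HBase client code: the start keys and row keys are made up, and `region_index` and `group_by_region` are hypothetical helpers. It relies on a region covering the range from its start key (inclusive) to the next region's start key (exclusive).

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence


def region_index(start_keys: Sequence[bytes], row: bytes) -> int:
    """Index of the region whose [start, next start) range contains row.

    start_keys must be sorted and begin with b"" (the first region's start).
    """
    return bisect_right(start_keys, row) - 1


def group_by_region(start_keys: Sequence[bytes], rows: Sequence[bytes]) -> Dict[int, List[bytes]]:
    """Bucket row keys by the index of the region that hosts them."""
    groups: Dict[int, List[bytes]] = defaultdict(list)
    for row in rows:
        groups[region_index(start_keys, row)].append(row)
    return dict(groups)


# Made-up split points and row keys.
start_keys = [b"", b"g", b"n", b"t"]
rows = [b"apple", b"grape", b"melon", b"nut", b"zebra", b"g"]
groups = group_by_region(start_keys, rows)
for index in sorted(groups):
    print(index, groups[index])
```

Each bucket can then be handed to the worker running next to the region server that hosts that region.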
-
Re: Increment operations in hbase
Anoop John 2013-01-13, 07:22
> Another alternative is to get store files for each row hosted in that node
> operating directly on store files for each increment object ??

Sorry, I didn't get the idea. Can you explain, please?

Regarding support for Increments in the batch API: sorry, I was checking the 0.94 code base. In 0.92 this support is not there. :(

Have you done any profiling of the operation on the RS side? How many HFiles are there on average per store at the time of this operation, and how many CFs does the table have? Do Gets seem to be costly for you? Is this bulk increment the only operation happening at that time, or are there other concurrent operations? Is the block cache being used? Have you checked metrics such as the cache hit ratio?

-Anoop-
-
Re: Increment operations in hbase
kiran 2013-01-13, 07:47
The idea was that, given a region server, I can get the HRegion and the Store files in that region. Store has a method incrementColumnValue, so I thought of using it, as it may be a lower-level implementation.

Yes, gets are proving very costly for me. The only other operation running in addition to this is writing data into HBase on the region server, but that goes to a different table, not the one whose values I need to increment.

I did profile using gets and puts across my cluster rather than using HTable.increment directly. I am running a daemon on each node with batches of 1000 get actions, and using HTableUtil.bucketRSPut for the puts. Some nodes were able to complete a batch of 1000 in 10 seconds; others took about 3 minutes.

What surprises me is that I precomputed the rows hosted on each node, started the daemon there, and issued gets only for the rows on that node so that the data is local. Even so, a worst case of 3 minutes for 1000 actions is huge.
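To put the 10-second and 3-minute figures in context, the time budget per batch of 1000 can be worked out from numbers given in the thread (1 million rows, 19 nodes, 15 minutes). This is a back-of-the-envelope sketch. It assumes a perfectly even split and that each node processes its batches one after another, neither of which is confirmed above.

```python
TOTAL_ROWS = 1_000_000
NODES = 19
WINDOW_SECONDS = 15 * 60
BATCH_SIZE = 1000

rows_per_node = TOTAL_ROWS / NODES             # assumes an even split
batches_per_node = rows_per_node / BATCH_SIZE  # batches each node must run
budget_per_batch = WINDOW_SECONDS / batches_per_node


def total_minutes(seconds_per_batch: float) -> float:
    """Minutes one node needs if every batch takes seconds_per_batch."""
    return batches_per_node * seconds_per_batch / 60


print(f"budget per batch: {budget_per_batch:.1f} s")
print(f"at 10 s per batch: {total_minutes(10):.1f} min")
print(f"at 180 s per batch: {total_minutes(180):.1f} min")
```

Under these assumptions a node that needs 10 seconds per batch fits inside the window, while a node that needs 3 minutes per batch overshoots it by roughly an order of magnitude.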
-
Re: Increment operations in hbase
kiran 2013-01-13, 08:34
Also, the CF for the increments has been set to IN_MEMORY, with a ROWCOL bloom filter.
-
Re: Increment operations in hbase
Ted Yu 2013-01-13, 15:35
bq. issued gets only on the rows in that node so that data is local

Looks like implementing the logic in a coprocessor should help you.

Cheers
-
Re: Increment operations in hbase
lars hofhansl 2013-01-14, 02:27
Did you change the HBase block size (in the column family)? A large block size is good for scans but detrimental to point access (Get/Increment/etc.).

Something's off in your cluster.

-- Lars
|
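The cost of a large block size for point access can be illustrated with a simple read-amplification estimate. This is a rough model, not a measurement. It assumes an uncached point read loads one whole uncompressed block to return a single cell, and the 100-byte cell size and the 1 MB alternative block size are hypothetical values. 64 KB is HBase's default block size.

```python
DEFAULT_BLOCK_BYTES = 64 * 1024   # HBase's default HFile block size
LARGE_BLOCK_BYTES = 1024 * 1024   # hypothetical enlarged block size
CELL_BYTES = 100                  # hypothetical size of one counter cell


def read_amplification(block_bytes: int, cell_bytes: int) -> float:
    """Bytes loaded per byte actually needed by one uncached point read."""
    return block_bytes / cell_bytes


for block_bytes in (DEFAULT_BLOCK_BYTES, LARGE_BLOCK_BYTES):
    factor = read_amplification(block_bytes, CELL_BYTES)
    print(f"{block_bytes // 1024} KB block: about {factor:.0f}x the cell size")
```

In this model the larger block makes every uncached point read 16 times more expensive in bytes, which is the kind of effect the block size question is probing for.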