-
Reg: delete performance on HBase table
Manoj Babu 2012-12-05, 13:03
Hi All,
I have a question about delete performance in an HBase table.
I have 190 million rows in an Oracle table, and it took about 4 hours to delete them. If I had the same 190 million rows in an HBase table, approximately how long would HBase take to delete the rows (based on a row key range), and how does HBase handle deletes internally?
Thanks in advance!
Cheers!
Manoj.
-
Re: Reg: delete performance on HBase table
Jean-Marc Spaggiari 2012-12-05, 13:31
Hi Manoj,
Delete in HBase is like a put.
If you want to delete the entire table (drop), it will be very fast. My test table has 100M rows and it takes a few seconds to delete (one column family and one column only). But if you want to delete the rows one by one (like 190M rows out of more), then it's like doing 190M puts.
HTH.
JM
-
Re: Reg: delete performance on HBase table
Anoop John 2012-12-05, 14:17
Hi Manoj
Can you tell us more about your use case? Do you know the row key range that needs to be deleted (all the row keys)? Or do you want to delete a set of rows based on some condition?
Which version of HBase are you using? HBASE-6284 provided some performance improvement for deletes (batches of rows deleted). I have also provided a sample Endpoint implementation for bulk deletion (which might be useful for the second use case mentioned above) in JIRA HBASE-6942.
-Anoop-
-
Re: Reg: delete performance on HBase table
Doug Meil 2012-12-05, 15:46
Hi there,
You probably want to read this section of the RefGuide about deleting from HBase.
http://hbase.apache.org/book.html#perf.deleting
-
Re: Reg: delete performance on HBase table
Mohammad Tariq 2012-12-05, 16:34
Hello Manoj,
When a Delete command is issued, no data is actually deleted immediately. Instead, a tombstone marker is set, making the deleted cells effectively invisible. The tombstone markers are only removed during major compactions (which compact all store files into a single one), because in order to prove that a tombstone marker has no effect HBase needs to look at all cells. HBase periodically removes deleted cells during compactions.
HTH
Regards,
Mohammad Tariq
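The tombstone behaviour described above can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: the class `TombstoneStore` and all of its methods are hypothetical, it tracks one value per row with no versions or column families, and it is not how HBase is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: names and structure are hypothetical, not HBase internals.
class TombstoneStore {
    // Each "store file" is a sorted map of rowKey -> value.
    // A null value stands for a tombstone (delete marker).
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>();
    private TreeMap<String, String> memstore = new TreeMap<>();

    void put(String row, String value) {
        memstore.put(row, value);
    }

    // A delete is just another write: it records a marker and removes nothing.
    void delete(String row) {
        memstore.put(row, null);
    }

    // Persist the in-memory writes as a new store file.
    void flush() {
        storeFiles.add(memstore);
        memstore = new TreeMap<>();
    }

    // Reads consult the newest data first; a tombstone hides older values.
    String get(String row) {
        if (memstore.containsKey(row)) {
            return memstore.get(row);
        }
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            TreeMap<String, String> file = storeFiles.get(i);
            if (file.containsKey(row)) {
                return file.get(row);
            }
        }
        return null;
    }

    // Number of entries physically stored, counting values and tombstones.
    int physicalCells() {
        int n = memstore.size();
        for (TreeMap<String, String> file : storeFiles) {
            n += file.size();
        }
        return n;
    }

    // Major compaction: merge every file into one. Only here can tombstones
    // and the cells they hide be dropped, because all cells are visible at once.
    void majorCompact() {
        flush();
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : storeFiles) {
            merged.putAll(file); // newer files overwrite older entries
        }
        merged.values().removeIf(v -> v == null);
        storeFiles.clear();
        storeFiles.add(merged);
    }
}
```

In this model a delete first increases the amount of stored data, and space is only reclaimed once `majorCompact()` runs, which matches the point that deleting many rows costs about as much as writing them.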
-
Re: Reg: delete performance on HBase table
Leonid Fedotov 2012-12-05, 17:03
Do you want to delete just a subset of the rows, or delete the whole table?
If the whole table, then use "drop"; it should be almost instant. It is the same in Oracle: you can do "delete from table;" and it may take forever, or even fail with an error about the rollback segment being too small. If you use "drop table", it marks the table as deleted instantly and then deletes the actual data during the next major compaction.
Thank you!
Sincerely,
Leonid Fedotov
-
Re: Reg: delete performance on HBase table
Nick Dimiduk 2012-12-05, 18:14
On Wed, Dec 5, 2012 at 7:46 AM, Doug Meil wrote:
> You probably want to read this section on the RefGuide about deleting from
> HBase.
>
> http://hbase.apache.org/book.html#perf.deleting

So hold on. From the guide:

11.9.2. Delete RPC Behavior
> Be aware that htable.delete(Delete) doesn't use the writeBuffer. It will
> execute an RegionServer RPC with each invocation. For a large number of
> deletes, consider htable.delete(List).
>
> See
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29

So Deletes are like Puts, except they're not executed the same way. Indeed, HTable.put() is implemented using the write buffer while HTable.delete() makes a MutateRequest directly. What is the reason for this? Why is the semantic of Delete subtly different from Put?

For that matter, why not buffer all mutation operations? HTable.checkAndPut() and checkAndDelete() both make direct MutateRequest calls as well.

Thanks,
-n
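To make the guide's advice concrete, here is a minimal sketch of grouping row keys into batches so that each batch could be handed to one list-based delete call instead of one call per row. The helper class `DeleteBatcher` and its methods are hypothetical and use only the Java standard library; they do not call HBase, and the actual number of RegionServer RPCs per batch depends on how the client distributes the rows, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups items so each group can go into one list-based call.
class DeleteBatcher {
    // Split items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    // Number of client calls needed when each batch is sent with a single call.
    static long callsNeeded(long rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (rows + batchSize - 1) / batchSize; // ceiling division
    }
}
```

With one delete per call, 190 million rows mean 190 million client calls; with a batch size of 1,000 (an arbitrary illustrative value), `callsNeeded(190_000_000L, 1000)` gives 190,000 calls.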
-
Re: Reg: delete performance on HBase table
Manoj Babu 2012-12-06, 03:14
Team,
Thank you very much for the valuable information.

The HBase version I am using is:
HBase Version 0.90.3-cdh3u1, r

The use case is:
We are collecting information on where users spend time on our site (tracking user events), and we are also migrating historical data from the existing system. Based on this data, we need to populate metrics for the year, for example: Customer A hits option x n times and option y n times; Customer B hits option x1 n times and option y1 n times.

Earlier, we used Hadoop MapReduce to aggregate the whole year's data once every 2 to 4 days and emitted the results to an Oracle table using DBOutputFormat. Inserting 181 million rows took only 20 minutes with 20 reducers writing in parallel. Before populating the year table, we used to delete the existing 181 million rows for that year alone, but the delete ran for more than 3 hours without finishing, so we killed the session and did a truncate instead. We are still in the development stage, so we are planning to try HBase for this case, since deleting millions of rows takes too long in Oracle.

We need to delete rows based on the year only and cannot drop the table. In Oracle, too, truncate is extremely fast.

Cheers!
Manoj.
-
RE: Reg: delete performance on HBase table
Anoop Sam John 2012-12-06, 04:35
Hi Manoj
If I read you correctly, you want to aggregate some 3 or 4 days of data and then have that data deleted. Could you create a table for each period (one table per 4 days), aggregate, and then drop the table? Then use another table for the next 4 days?

Another option is TTL, which HBase provides.

-Anoop-
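The TTL option mentioned above expires cells by age rather than by an explicit delete. As a rough sketch of the idea only, assuming a TTL expressed in seconds and cell timestamps in milliseconds, the check looks like the following; the class and method names are hypothetical and this is not HBase code.

```java
// Hypothetical illustration of a time-to-live check; not HBase's implementation.
class TtlCheck {
    // A cell counts as expired once its age is strictly greater than the TTL.
    static boolean isExpired(long cellTimestampMs, long nowMs, long ttlSeconds) {
        long ageMs = nowMs - cellTimestampMs;
        return ageMs > ttlSeconds * 1000L;
    }
}
```

Because expiry is driven purely by the clock, this approach fits data that simply ages out. It does not help when rows must be removed on demand at the start of a job.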
-
Re: Reg: delete performance on HBase table
ramkrishna vasudevan 2012-12-06, 05:15
Generally, if the data is not used after some short duration, people tend to go with individual tables and then drop the table itself.

Regards
Ram
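One way to picture the table-per-period suggestion is a naming scheme that maps each day to the table for its period, so that an old period can be discarded by dropping a whole table. The sketch below is an assumption-laden illustration: the class `PeriodTables`, the `prefix_yyyyMMdd` naming pattern, and the fixed period length counted from an arbitrary start date are all hypothetical, and no HBase call is made.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical naming helper for one-table-per-period layouts.
class PeriodTables {
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE; // yyyyMMdd

    // First day of the fixed-length period that contains `day`, counted from `epoch`.
    static LocalDate periodStart(LocalDate epoch, LocalDate day, int periodDays) {
        if (periodDays <= 0) {
            throw new IllegalArgumentException("periodDays must be positive");
        }
        long offset = ChronoUnit.DAYS.between(epoch, day);
        long start = Math.floorDiv(offset, (long) periodDays) * periodDays;
        return epoch.plusDays(start);
    }

    // Table name for the period starting on `start`, e.g. events_20121205.
    static String tableFor(String prefix, LocalDate start) {
        return prefix + "_" + start.format(DAY);
    }
}
```

With 4-day periods counted from 2012-12-01, both 2012-12-05 and 2012-12-06 map to `events_20121205`, so everything written in that period lives in one table that can later be dropped as a unit.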
-
Re: Reg: delete performance on HBase table
Manoj Babu 2012-12-06, 06:44
Hi Anoop,
Suppose I run my job today to populate the latest counts for the current year: I will process the data from Jan 1st 2012 to the day before (Dec 5th 2012) and insert it into the table. The same table also contains the aggregated data of previous years, such as 2011 and 2010.
If I run the job tomorrow, I will process the data from Jan 1st 2012 to Dec 6th 2012, and before inserting into the table I will clean out all the rows for the year 2012.

TTL is a nice option, but I want to trigger the delete manually from my job.

Thank you.

Cheers!
Manoj.
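Deleting "all rows for the year 2012" by row key range presupposes that the year leads the row key. As a sketch under that assumption, with a hypothetical key layout such as `2012|customerA|optionX`, the start and stop keys of the range can be derived from the prefix as shown below. The class `YearKeyRange` is illustrative and does not call HBase; it relies only on the fact that HBase sorts row keys as unsigned bytes in lexicographic order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: computes the [start, stop) key range covering a key prefix.
class YearKeyRange {
    // Inclusive start row: the prefix bytes themselves.
    static byte[] startRow(String prefix) {
        return prefix.getBytes(StandardCharsets.UTF_8);
    }

    // Exclusive stop row: the smallest key greater than every key with the prefix.
    static byte[] stopRow(String prefix) {
        byte[] stop = prefix.getBytes(StandardCharsets.UTF_8);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                return Arrays.copyOf(stop, i + 1);
            }
        }
        return new byte[0]; // prefix is empty or all 0xFF: no upper bound
    }

    // Unsigned lexicographic comparison of two keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // True if start <= key < stop; an empty stop means unbounded.
    static boolean inRange(byte[] key, byte[] start, byte[] stop) {
        return compare(key, start) >= 0 && (stop.length == 0 || compare(key, stop) < 0);
    }
}
```

With the prefix `2012|`, the stop key becomes `2012}`, so keys for 2011 and 2013 fall outside the range. If the year is not the leading part of the key, the rows for one year are scattered across the table and a single range cannot select them.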