HBase user mailing list: Reg: delete performance on HBase table


Manoj Babu 2012-12-05, 13:13
Jean-Marc Spaggiari 2012-12-05, 13:31
Doug Meil 2012-12-05, 15:46
Nick Dimiduk 2012-12-05, 18:14
Re: Reg: delete performance on HBase table
Team,

Thank you very much for the valuable information.

The HBase version I am using is:
HBase Version 0.90.3-cdh3u1, r

Use case:
We are collecting information on where users spend time on our site
(tracking user events), and we are also migrating historical data from an
existing system. Based on this data we need to populate metrics for the
year, e.g. Customer A hits option x n times and option y n times; Customer B
hits option x1 n times and option y1 n times.
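The yearly metric described above is essentially a count grouped by customer and option. The following is a minimal in-memory sketch of that grouping; the `Event` record, its fields, and the method names are hypothetical, and the real job ran as a Hadoop MapReduce aggregation, which this does not reproduce.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class YearlyHitCounts {
    // Hypothetical event record: one tracked hit by a customer on a site option.
    record Event(String customer, String option, int year) {}

    // Counts hits per (customer, option) for a single year,
    // i.e. "Customer A hits option x n times".
    static Map<String, Map<String, Long>> countHits(List<Event> events, int year) {
        Map<String, Map<String, Long>> counts = new TreeMap<>();
        for (Event e : events) {
            if (e.year() != year) {
                continue; // only aggregate the requested year
            }
            counts.computeIfAbsent(e.customer(), k -> new TreeMap<>())
                  .merge(e.option(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("A", "x", 2012), new Event("A", "x", 2012),
            new Event("A", "y", 2012), new Event("B", "x1", 2012),
            new Event("B", "y1", 2011));
        System.out.println(countHits(events, 2012));
    }
}
```

Running `main` prints `{A={x=2, y=1}, B={x1=1}}`; the 2011 event is skipped because only 2012 is requested.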

Earlier, we used Hadoop MapReduce to aggregate the whole year's data once
every 2 to 4 days and emit it to an Oracle table using DBOutputFormat.
Inserting 181 million rows took only 20 minutes with 20 reducers writing in
parallel. Before populating the year table, we used to delete the existing
181 million rows for that year alone, but after more than 3 hours the delete
still had not finished, so we killed the session and did a truncate instead.
We are still in the development stage, so we are planning to try HBase for
this case, since deleting millions of rows in Oracle takes too long. We need
to delete rows based on the year only; we cannot drop the table. In Oracle,
truncate is extremely fast.
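The insert figures quoted above work out as follows. This sketch assumes the 20 minutes is wall-clock time for the whole job and that the rows were split evenly across the 20 reducers; neither detail is stated in the message.

```java
public class InsertThroughput {
    // Average rows per second for a job that wrote totalRows in the given minutes.
    static double rowsPerSecond(long totalRows, double minutes) {
        return totalRows / (minutes * 60.0);
    }

    public static void main(String[] args) {
        long rows = 181_000_000L; // rows inserted into the Oracle year table
        double minutes = 20.0;    // reported job duration
        int reducers = 20;        // reducers writing in parallel

        double total = rowsPerSecond(rows, minutes);
        // Assumes an even split of rows across reducers.
        double perReducer = total / reducers;
        System.out.printf("overall: %.0f rows/s, per reducer: %.0f rows/s%n", total, perReducer);
    }
}
```

It prints `overall: 150833 rows/s, per reducer: 7542 rows/s`.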

Cheers!
Manoj.

On Wed, Dec 5, 2012 at 11:44 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:

> On Wed, Dec 5, 2012 at 7:46 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
>
> > You probably want to read this section in the RefGuide about deleting
> > from HBase.
> >
> > http://hbase.apache.org/book.html#perf.deleting
>
>
> So hold on. From the guide:
>
> > 11.9.2. Delete RPC Behavior
> >
> > Be aware that htable.delete(Delete) doesn't use the writeBuffer. It will
> > execute a RegionServer RPC with each invocation. For a large number of
> > deletes, consider htable.delete(List).
> >
> > See
> > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29
>
>
> So Deletes are like Puts except they're not executed the same way. Indeed,
> HTable.put() is implemented using the write buffer while HTable.delete()
> makes a MutateRequest directly. What is the reason for this? Why are the
> semantics of Delete subtly different from Put?
>
> For that matter, why not buffer all mutation operations?
> HTable.checkAndPut(), checkAndDelete() both make direct MutateRequest calls
> as well.
>
> Thanks,
> -n
>
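The RPC difference discussed in the quoted guide text can be illustrated with a toy model. This is not HBase client code: `FakeServer`, `deleteOneByOne`, and `deleteInBatches` are hypothetical names, the model assumes every row lives on a single RegionServer, and the batch size of 1,000 is arbitrary. In a real cluster the client sends a batch to each RegionServer hosting the affected rows, so the actual RPC count also depends on how the rows are distributed.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteRpcModel {
    // Hypothetical stand-in for a RegionServer: it only counts how many
    // RPCs the client issues and how many row deletes they carry.
    static class FakeServer {
        int rpcs = 0;
        long rowsMutated = 0;

        void call(List<String> rowKeys) {
            rpcs++;
            rowsMutated += rowKeys.size();
        }
    }

    // Models delete(Delete) as described in the guide: one RPC per row.
    static void deleteOneByOne(FakeServer server, List<String> rowKeys) {
        for (String key : rowKeys) {
            server.call(List.of(key));
        }
    }

    // Models the delete(List) advice: rows are sent in batches of batchSize.
    static void deleteInBatches(FakeServer server, List<String> rowKeys, int batchSize) {
        List<String> batch = new ArrayList<>(batchSize);
        for (String key : rowKeys) {
            batch.add(key);
            if (batch.size() == batchSize) {
                server.call(batch);
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            server.call(batch); // flush the final partial batch
        }
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            keys.add("2012-row-" + i); // hypothetical row keys for one year
        }
        FakeServer single = new FakeServer();
        deleteOneByOne(single, keys);
        FakeServer batched = new FakeServer();
        deleteInBatches(batched, keys, 1_000);
        System.out.println(single.rpcs + " RPCs vs " + batched.rpcs + " RPCs");
    }
}
```

Running `main` prints `10000 RPCs vs 10 RPCs`: the same rows are deleted either way, but batching cuts the number of round trips.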
Anoop Sam John 2012-12-06, 04:35
Manoj Babu 2012-12-06, 06:44
ramkrishna vasudevan 2012-12-06, 05:15
Anoop John 2012-12-05, 14:17
Manoj Babu 2012-12-05, 13:03
Leonid Fedotov 2012-12-05, 17:03
Mohammad Tariq 2012-12-05, 16:34