HBase >> mail # user >> Counter and Coprocessor Musing


RE: Re:Re: Counter and Coprocessor Musing
Agree with Azury.
Ted: He is mentioning something different from HBASE-5982.
If the row count were maintained in another meta table, then reading the count from there would be much faster than AggregateImplementation's getRowNum, I think.
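(For reference, a client-side row count through the bundled aggregation coprocessor looks roughly like the sketch below. This assumes the 0.94-era client API, that AggregateImplementation is already loaded for the table, and hypothetical table/family names.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class RowCountExample {
  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();
    AggregationClient aggClient = new AggregationClient(conf);

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));  // give the scan a single column family

    // rowCount fans out to every region and scans it server-side, so it is
    // proportional to table size; this is the cost a pre-maintained counter avoids.
    long rows = aggClient.rowCount(Bytes.toBytes("my_table"),
        new LongColumnInterpreter(), scan);
    System.out.println("row count = " + rows);
  }
}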

For a specific use case, someone could build this using a CP (coprocessor). But a generic implementation might be difficult. How would we handle versioning? When a new version arrives for an existing row, we should not increment the count. TTLs would also have to be handled.
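A minimal sketch of the meta-table counter idea, assuming the 0.94-style RegionObserver API; the counter table and column names are made up, and this is deliberately the naive version that still has the versioning and TTL problems mentioned above:

import java.io.IOException;

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class RowCountObserver extends BaseRegionObserver {

  // Hypothetical meta table keyed by table name, holding one long counter per table.
  private static final byte[] COUNTER_TABLE = Bytes.toBytes("row_counts");
  private static final byte[] CF = Bytes.toBytes("c");
  private static final byte[] QUAL = Bytes.toBytes("rows");

  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> e,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    // Naive: bump the counter on every Put. This over-counts when the Put is
    // just a new version of an existing row, and it ignores deletes and TTL
    // expiry entirely; these are exactly the problems that make a generic
    // implementation hard.
    byte[] tableName = e.getEnvironment().getRegion().getTableDesc().getName();
    HTableInterface counters = e.getEnvironment().getTable(COUNTER_TABLE);
    try {
      counters.incrementColumnValue(tableName, CF, QUAL, 1L);
    } finally {
      counters.close();
    }
  }
}

Making this correct would mean checking whether the row already exists before incrementing (an extra read on the write path) and accounting for deletes and TTL expiry, which is why a generic implementation is difficult.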

-Anoop-
________________________________________
From: Azury [[EMAIL PROTECTED]]
Sent: Wednesday, December 12, 2012 9:40 AM
To: [EMAIL PROTECTED]
Subject: Re:Re: Counter and Coprocessor Musing

Hi Ted,
I think he wants table 'meta data', not something coprocessor-based, something like:
long rows = table.rows();

Just a guess, not sure about that.

At 2012-12-12 01:11:49,"Ted Yu" <[EMAIL PROTECTED]> wrote:
>Thanks for sharing your thoughts.
>
>Which HBase version are you currently using?
>Have you looked at AggregateImplementation, which is included in the hbase jar?
>A count operation (getRowNum) is in AggregateImplementation.
>
>It would be nice if you could tell us how far (in terms of response time)
>this aggregation lags behind your expectation.
>
>Also take a look at HBASE-5982 HBase Coprocessor Local Aggregation
>
>Cheers
>
>On Tue, Dec 11, 2012 at 6:50 AM, nicolas maillard <
>[EMAIL PROTECTED]> wrote:
>
>> Hi everyone
>>
>> While working with HBase and looking at what the tables and meta look like,
>> I have thought of a couple of things; maybe someone has insights.
>> My thoughts are around counting: it is a common database operation to count
>> the entries matching a given query, for example as a first check that
>> everything has been written, or to get a feel for a population.
>> I was wondering two things:
>> - Shouldn't HBase keep a table's total entry count in its metrics? This
>> would not take much space and often comes in handy. Granted, with a
>> coprocessor you could easily create a table with counters for all the other
>> tables in the system, but it would be nice to have as a standard.
>>
>> - I was also wondering whether every region could know the number of
>> entries it contains. Every region already knows the start and end key of
>> its entries. For a count over a given scan this would speed things up:
>> every region whose start and end keys fall inside the scan would just send
>> back its population count, and only a region that is wider than the scan
>> range would need to be scanned and counted (a rough sketch of this idea
>> follows the quoted message below).
>>
>> I'm wondering whether these thoughts are already implemented, whether I'm
>> missing something, or whether they simply would not be a good idea.
>> Alternatively, if this is not a definite no for some reason, could
>> coprocessors be used to implement them? Could I, with a coprocessor, write
>> into the metrics part, or, on a given scan, first check whether I have
>> already written the count somewhere for a region smaller than my scan,
>> instead of scanning and counting?
>>
>> Thanks for any thoughts you may have
>>
>>
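To illustrate the per-region counting idea quoted above, here is a rough, hypothetical sketch in plain client-side Java (not an existing HBase API): regions whose key range lies entirely inside the scan range return a pre-maintained count, and only the regions that straddle the scan boundaries would still need a real scan.

import java.util.Map;

import org.apache.hadoop.hbase.util.Bytes;

public class RegionCountPlanner {

  // True if [regionStart, regionEnd) lies entirely inside [scanStart, scanStop).
  // Empty byte arrays follow the HBase convention of "unbounded" start/end keys.
  static boolean fullyCovered(byte[] regionStart, byte[] regionEnd,
      byte[] scanStart, byte[] scanStop) {
    boolean startOk = Bytes.compareTo(regionStart, scanStart) >= 0;
    boolean endOk = scanStop.length == 0
        || (regionEnd.length != 0 && Bytes.compareTo(regionEnd, scanStop) <= 0);
    return startOk && endOk;
  }

  // Sum the stored counts of fully covered regions and report which regions
  // would still need to be scanned. Region boundaries and stored counts are
  // hypothetical inputs (name -> {startKey, endKey} and name -> row count).
  static long plan(Map<String, byte[][]> regionBoundaries,
      Map<String, Long> storedCounts, byte[] scanStart, byte[] scanStop) {
    long total = 0;
    for (Map.Entry<String, byte[][]> entry : regionBoundaries.entrySet()) {
      byte[][] bounds = entry.getValue();
      if (fullyCovered(bounds[0], bounds[1], scanStart, scanStop)) {
        total += storedCounts.get(entry.getKey());  // use the cached count, no scan
      } else {
        System.out.println("Region " + entry.getKey()
            + " overlaps a scan boundary and would still need scanning.");
      }
    }
    return total;
  }
}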