HBase user mailing list: Counter and Coprocessor Musing


RE: Re:Re: Counter and Coprocessor Musing
Agree with Azury.
Ted: he is describing something different from HBASE-5982.
If the row count is maintained in a separate meta table, then reading it from there will, I think, be much faster than AggregateImplementation's getRowNum.

For a specific use case, someone could build this with a coprocessor (CP), but a generic implementation might be difficult. How would we handle versioning? When a new version arrives for an existing row, the count should not be incremented. TTLs would also need to be handled.

-Anoop-
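Anoop's versioning and TTL concerns can be made concrete with a small in-memory model. The sketch below is hypothetical throughout: RowCountTracker, onPut, onDeleteRow, expire and rows() are illustrative names, not HBase or coprocessor APIs. It assumes the counter can tell whether a row already exists before each write; in a real RegionObserver that check would cost an extra read per Put. It also models TTL as a periodic sweep, because HBase is understood to filter expired cells at read time and drop them during compaction rather than signalling each expiry.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Hypothetical model (not an HBase API) of a table-level row counter that a
 * coprocessor could keep in a separate meta table. The map stands in for the
 * existence check the coprocessor would need before each write.
 */
class RowCountTracker {
    // row key -> timestamp of the newest version written so far
    private final Map<String, Long> newestVersion = new HashMap<>();
    // the value that would be stored in the meta table
    private long count = 0;

    /** Put hook: count the row only the first time it is written. */
    void onPut(String rowKey, long timestamp) {
        Long previous = newestVersion.get(rowKey);
        if (previous == null) {
            newestVersion.put(rowKey, timestamp);
            count++; // brand-new row
        } else {
            // new version of an existing row: keep the newest timestamp, count unchanged
            newestVersion.put(rowKey, Math.max(previous, timestamp));
        }
    }

    /** Whole-row delete hook: decrement only if the row was counted. */
    void onDeleteRow(String rowKey) {
        if (newestVersion.remove(rowKey) != null) {
            count--;
        }
    }

    /**
     * Simplified TTL sweep (an assumption): a row expires when its newest
     * version is older than ttlMillis. Real HBase TTLs are set per column
     * family and checked against each cell's timestamp.
     */
    void expire(long nowMillis, long ttlMillis) {
        Iterator<Map.Entry<String, Long>> it = newestVersion.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > ttlMillis) {
                it.remove();
                count--;
            }
        }
    }

    /** Constant-time lookup, in the spirit of long rows = table.rows(); */
    long rows() {
        return count;
    }
}
```

For example, writing row1 twice (two versions) and row2 once leaves rows() at 2; deleting row2 brings it down to 1. Partial deletes (single columns or versions) are deliberately left out, and they would complicate the existence check further.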
________________________________________
From: Azury [[EMAIL PROTECTED]]
Sent: Wednesday, December 12, 2012 9:40 AM
To: [EMAIL PROTECTED]
Subject: Re:Re: Counter and Coprocessor Musing

Hi Ted,
I think he wants the row count kept as table metadata, rather than
something coprocessor-based, such as: long rows = table.rows();

That's just a guess, though; I'm not sure about it.

At 2012-12-12 01:11:49,"Ted Yu" <[EMAIL PROTECTED]> wrote:
>Thanks for sharing your thoughts.
>
>Which HBase version are you currently using?
>Have you looked at AggregateImplementation, which is included in the HBase jar?
>A count operation (getRowNum) is in AggregateImplementation.
>
>It would be nice if you could tell us how far (in terms of
>response time) this aggregation falls short of your expectations.
>
>Also take a look at HBASE-5982 (HBase Coprocessor Local Aggregation).
>
>Cheers
>
>On Tue, Dec 11, 2012 at 6:50 AM, nicolas maillard <
>[EMAIL PROTECTED]> wrote:
>
>> Hi everyone,
>>
>> While working with HBase and looking at what the tables and meta look
>> like, I have thought of a couple of things; maybe someone has insights.
>> My thoughts are about counting. Counting the entries that match a given
>> query is a common database operation, for example as a first check that
>> everything was written, or sometimes to get a feel for a population.
>> I was wondering two things:
>> - Shouldn't HBase keep a table's total entry count in its metrics?
>> This would not take much space and often comes in handy. Granted, with a
>> coprocessor you could easily create a table with counters for all the
>> other tables in the system, but it would be nice to have as a standard.
>>
>> - I was also wondering whether every region could know the number of
>> entries it contains. Every region already knows the start and end key of
>> its entries. For a count over a given scan this would speed things up:
>> every region whose start and end keys both fall within the scan would
>> just send back its population count, and only a region that extends
>> beyond the scan range would need to be scanned and counted.
>>
>> I'm wondering whether these ideas are already implemented and I'm missing
>> something, or whether they would not be a good idea. Alternatively, if
>> this is not a definite no for some reason, could coprocessors be used to
>> implement these ideas? With a coprocessor, can I write into the metrics
>> part, or, on a given scan, first check whether I have already stored the
>> count somewhere for each region that lies entirely within my scan, instead
>> of scanning and counting?
>>
>> Thanks for any thoughts you may have.
>>
>>
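The per-region idea in the original message (regions lying entirely inside the scan range report a stored count, and only the edge regions are actually read) can be sketched as an in-memory Java model. This is not HBase code: RegionModel, RegionCountPlanner, storedCount and rowsVisited are hypothetical names. Each region is assumed to hold an accurate stored count, which is exactly where the versioning and TTL issues raised in the thread come in. Region boundaries follow HBase's convention of an inclusive start key, an exclusive end key, and an empty key meaning the start or end of the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical region: key range [startKey, endKey), its rows, and a stored count. */
class RegionModel {
    final String startKey; // inclusive; "" = start of table
    final String endKey;   // exclusive; "" = end of table
    final TreeSet<String> rowKeys = new TreeSet<>();
    long storedCount;      // assumed to be kept accurate on every write

    RegionModel(String startKey, String endKey, String... keys) {
        this.startKey = startKey;
        this.endKey = endKey;
        for (String k : keys) rowKeys.add(k);
        this.storedCount = rowKeys.size();
    }
}

class RegionCountPlanner {
    long rowsVisited = 0; // rows actually read, to compare against a full scan

    /** Counts rows in [scanStart, scanStop); an empty scanStop means "to end of table". */
    long count(List<RegionModel> regions, String scanStart, String scanStop) {
        long total = 0;
        for (RegionModel r : regions) {
            if (!overlaps(r, scanStart, scanStop)) continue;
            if (contains(r, scanStart, scanStop)) {
                total += r.storedCount; // fully covered: use the stored count, no scan
            } else {
                for (String key : r.rowKeys) { // edge region: scan only rows in range
                    if (inRange(key, scanStart, scanStop)) {
                        rowsVisited++;
                        total++;
                    }
                }
            }
        }
        return total;
    }

    static boolean inRange(String key, String start, String stop) {
        return key.compareTo(start) >= 0 && (stop.isEmpty() || key.compareTo(stop) < 0);
    }

    /** True if the region's whole key range lies inside [start, stop). */
    static boolean contains(RegionModel r, String start, String stop) {
        boolean startOk = r.startKey.compareTo(start) >= 0;
        boolean endOk = stop.isEmpty() || (!r.endKey.isEmpty() && r.endKey.compareTo(stop) <= 0);
        return startOk && endOk;
    }

    /** True if the region's key range intersects [start, stop). */
    static boolean overlaps(RegionModel r, String start, String stop) {
        boolean beforeStop = stop.isEmpty() || r.startKey.compareTo(stop) < 0;
        boolean afterStart = r.endKey.isEmpty() || r.endKey.compareTo(start) > 0;
        return beforeStop && afterStart;
    }
}
```

With regions ["", d), [d, h) and [h, "") holding keys a to c, d to g and h to j, a count from b up to (but not including) i returns 7: the middle region contributes its stored count of 4 without being read, and only 3 rows are visited in the two edge regions. A scan over the whole table reads no rows at all.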