Agree with Azury
Ted: he is describing something different from HBASE-5982.
If the row count is maintained in another meta table, then reading the count from there will be much faster than AggregateImplementation's getRowNum, I think.
For a specific use case, someone can build this with a CP. But a generic implementation might be difficult. How would we handle versioning? When a new version arrives for an existing row, we should not increment the count. TTLs would also need to be handled.
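To illustrate the versioning problem: a counter bumped on every put overcounts, so the hook has to check whether the row already exists. The sketch below is plain Java with no HBase dependency. RowCountTracker, put and deleteRow are hypothetical names standing in for a region and its observer hooks; this is not the coprocessor API. In a real coprocessor the existence check would cost an extra read per put, and TTL expiry and version limits are not handled here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not the HBase coprocessor API): an in-memory stand-in for
 * one region that keeps a row count which ignores new versions of existing rows.
 */
public class RowCountTracker {
    // row key -> (timestamp -> value); stands in for the cells stored in the region
    private final Map<String, TreeMap<Long, String>> rows = new HashMap<>();
    private long rowCount = 0;

    /** Analogue of a put hook: the count changes only when the row key is new. */
    public void put(String rowKey, long timestamp, String value) {
        TreeMap<Long, String> versions = rows.get(rowKey);
        if (versions == null) {
            versions = new TreeMap<>();
            rows.put(rowKey, versions);
            rowCount++; // first version of this row
        }
        versions.put(timestamp, value); // a later version leaves the count unchanged
    }

    /** Analogue of deleting a whole row; deleting a missing row is a no-op. */
    public void deleteRow(String rowKey) {
        if (rows.remove(rowKey) != null) {
            rowCount--;
        }
    }

    public long getRowCount() {
        return rowCount;
    }

    public static void main(String[] args) {
        RowCountTracker tracker = new RowCountTracker();
        tracker.put("row1", 1L, "a");
        tracker.put("row1", 2L, "b"); // new version of row1, not a new row
        tracker.put("row2", 1L, "c");
        tracker.deleteRow("row2");
        System.out.println(tracker.getRowCount()); // prints 1
    }
}
```

The design choice here is to pay for the existence check at write time so that reading the count stays a constant-time lookup.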
From: Azury [[EMAIL PROTECTED]]
Sent: Wednesday, December 12, 2012 9:40 AM
To: [EMAIL PROTECTED]
Subject: Re:Re: Counter and Coprocessor Musing
I think he wants table 'metadata', not something like a coprocessor,
such as: long rows = table.rows();
Just a guess; I'm not sure about that.
At 2012-12-12 01:11:49,"Ted Yu" <[EMAIL PROTECTED]> wrote:
>Thanks for sharing your thoughts.
>Which HBase version are you currently using?
>Have you looked at AggregateImplementation, which is included in the HBase jar?
>A count operation (getRowNum) is in AggregateImplementation.
>It would be nice if you could tell us how far (in terms of
>response time) this aggregation falls short of your expectations.
>Also take a look at HBASE-5982 (HBase Coprocessor Local Aggregation).
>On Tue, Dec 11, 2012 at 6:50 AM, nicolas maillard <
>[EMAIL PROTECTED]> wrote:
>> Hi everyone,
>> While working with HBase and looking at what the tables and meta look like,
>> I thought of a couple of things; maybe someone has insights.
>> My thoughts are around counting: it is a common database operation to
>> count the entries for a given query, for example as a first check that
>> everything was written, or sometimes to get a feel for a population.
>> I was wondering two things:
>> - Shouldn't HBase keep a table's total entry count in its metrics?
>> This would not take much space and often comes in handy. Granted, with a
>> coprocessor you could easily create a table with counters for all the other
>> tables in the system, but it would be nice to have as a standard.
>> - I was also wondering whether every region could know the number of entries
>> it contains. Every region already knows the start and end keys of its entries.
>> For a count on a given scan, this would speed things up: every region whose
>> start and end keys both fall inside the scan would just send back its
>> population count, and only a region that extends beyond the scan range would
>> need to be scanned and counted.
>> I'm wondering whether these ideas are already implemented and I'm missing
>> something, or whether they would not be a good idea. Alternatively, if this
>> is not a definite no for some reason, could coprocessors be used to
>> implement these ideas? Can I, with a coprocessor, write into the metrics, or,
>> on a given scan, first check whether, for a region smaller than my scan, I
>> have already written the count somewhere, before scanning and counting?
>> Thanks for any thoughts you may have.
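Nicolas's per-region count idea can be sketched in plain Java with no HBase dependency. RegionCountSketch, Region, storedCount and countRows are hypothetical names. Regions are modeled as [startKey, endKey) ranges with String keys (real HBase compares byte[] row keys, and the last region's empty end key is not modeled). Regions fully inside the scan contribute their cached count, regions outside it are skipped, and only partially covered regions are scanned. Keeping the cached count correct under new versions, deletes and TTLs is not addressed.

```java
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical sketch: count rows in a scan range [scanStart, scanStop) using a
 * cached per-region count, scanning only regions the range partially covers.
 */
public class RegionCountSketch {

    /** A region holding row keys in [startKey, endKey); keys compared as Strings. */
    static final class Region {
        final String startKey;
        final String endKey;
        final TreeSet<String> rowKeys = new TreeSet<>();

        Region(String startKey, String endKey, String... keys) {
            this.startKey = startKey;
            this.endKey = endKey;
            for (String k : keys) {
                rowKeys.add(k);
            }
        }

        /** The population count each region would keep under the proposal. */
        long storedCount() {
            return rowKeys.size();
        }
    }

    static long countRows(List<Region> regions, String scanStart, String scanStop) {
        long total = 0;
        for (Region r : regions) {
            boolean disjoint = r.endKey.compareTo(scanStart) <= 0
                    || r.startKey.compareTo(scanStop) >= 0;
            if (disjoint) {
                continue; // region lies entirely outside the scan
            }
            boolean fullyInside = r.startKey.compareTo(scanStart) >= 0
                    && r.endKey.compareTo(scanStop) <= 0;
            if (fullyInside) {
                total += r.storedCount(); // answered from the cached count, no scan
            } else {
                // Partial overlap: stands in for scanning just the overlapping rows.
                String from = r.startKey.compareTo(scanStart) >= 0 ? r.startKey : scanStart;
                String to = r.endKey.compareTo(scanStop) <= 0 ? r.endKey : scanStop;
                total += r.rowKeys.subSet(from, true, to, false).size();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region("a", "f", "b", "d"),
                new Region("f", "m", "g", "h", "k"),
                new Region("m", "t", "n", "p", "s"));
        // Only the first and last regions are partially covered and get scanned.
        System.out.println(countRows(regions, "c", "p")); // prints 5
    }
}
```

In this example the middle region is answered from its cached count; with many regions, most of a wide scan would be answered the same way and only the two boundary regions would need scanning.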