-
Counter and Coprocessor Musing
nicolas maillard 2012-12-11, 14:50
Hi everyone
While working with HBase and looking at what the tables and META look like, I have thought of a couple of things; maybe someone has insights. My thoughts are about counting: counting the entries that match a given query is a common database operation, for example as a first check that everything has been written, or to get a feel for the size of a population. I was wondering two things:
- Shouldn't HBase keep a table's total entry count in the metrics for that table? It would not take much space and would often come in handy. Granted, with a coprocessor you could easily create a table holding counters for all the other tables in the system, but it would be nice to have as a standard feature.
- Could every region also know the number of entries it contains? Every region already knows the start key and end key of its entries, so this would speed up a count over a given scan: every region whose start and end keys both fall inside the scan range would simply send back its entry count, and only a region that extends beyond the scan range would need to be scanned and counted.
Are these ideas already implemented, am I missing something, or would they simply not be a good idea? Alternatively, if this is not a definite no for some reason, could coprocessors be used to implement them? Can a coprocessor write to the metrics, or, for a given scan, first check whether a count has already been stored somewhere for each region that lies entirely within the scan, instead of scanning and counting?
Thanks for any thoughts you may have.
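The second idea above can be sketched in a few lines. This is only an illustration under stated assumptions: `RegionCountSketch` and `Region` are hypothetical names, not HBase classes; row keys are plain strings rather than byte arrays; and each region's cached count is assumed to be kept correct on every write, which is the hard part in practice.

```java
import java.util.List;

// Hypothetical sketch: RegionCountSketch and Region are illustration-only names,
// not HBase classes. Row keys are plain Strings here; HBase uses byte[] keys.
final class RegionCountSketch {

    /** A region covering row keys in [startKey, endKey). */
    static final class Region {
        final String startKey;
        final String endKey;
        final List<String> rowKeys;  // stands in for the rows stored in the region
        final long cachedCount;      // assumed to be kept up to date on every write

        Region(String startKey, String endKey, List<String> rowKeys) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.rowKeys = rowKeys;
            this.cachedCount = rowKeys.size();
        }
    }

    /** Counts rows with startRow <= key < stopRow. */
    static long count(List<Region> regions, String startRow, String stopRow) {
        long total = 0;
        for (Region r : regions) {
            // Region lies entirely before or after the scan range: contributes nothing.
            boolean disjoint = r.endKey.compareTo(startRow) <= 0
                    || r.startKey.compareTo(stopRow) >= 0;
            if (disjoint) {
                continue;
            }
            // Region lies entirely inside the scan range: use the cached count.
            boolean fullyCovered = r.startKey.compareTo(startRow) >= 0
                    && r.endKey.compareTo(stopRow) <= 0;
            if (fullyCovered) {
                total += r.cachedCount;
            } else {
                // Region straddles a scan boundary: fall back to scanning its rows.
                for (String key : r.rowKeys) {
                    if (key.compareTo(startRow) >= 0 && key.compareTo(stopRow) < 0) {
                        total++;
                    }
                }
            }
        }
        return total;
    }
}
```

With this layout, at most the two regions that straddle the scan boundaries are actually scanned; every region in between answers from its cached count.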
-
Re: Counter and Coprocessor Musing
Ted Yu 2012-12-11, 17:11
Thanks for sharing your thoughts.
Which HBase version are you currently using? Have you looked at AggregateImplementation, which is included in the hbase jar? It provides a count operation (getRowNum).
It would be nice if you could tell us how far (in terms of response time) this aggregation falls short of your expectations.
Also take a look at HBASE-5982, "HBase Coprocessor Local Aggregation".
Cheers
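As a rough mental model of a coprocessor-style count, each region counts its own rows in parallel and the client adds up the partial results. The sketch below simulates that pattern in plain Java; it is not the HBase API, `ParallelRowCountSketch` is a hypothetical name, and each "region" is just an in-memory list of row keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical simulation, not HBase code: each inner list plays the role of one region.
final class ParallelRowCountSketch {

    /** Counts rows region by region in parallel, then sums the partial counts. */
    static long rowCount(List<List<String>> regions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (final List<String> region : regions) {
                // "Server side": every region still visits all of its own rows.
                Callable<Long> task = () -> {
                    long n = 0;
                    for (String ignored : region) {
                        n++;
                    }
                    return n;
                };
                partials.add(pool.submit(task));
            }
            // "Client side": only one number per region has to be collected.
            long total = 0;
            for (Future<Long> partial : partials) {
                try {
                    total += partial.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the sketch is the cost profile: the work is spread across regions and little data reaches the client, but every row in range is still read once, which is why a stored count would answer faster.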
-
Re:Re: Counter and Coprocessor Musing
Azury 2012-12-12, 04:10
Hi Ted, I think he wants table-level 'meta data' rather than something like a coprocessor, for example: long rows = table.rows();
That is just my guess, though; I am not sure about it.
-
RE: Re:Re: Counter and Coprocessor Musing
Anoop Sam John 2012-12-12, 05:04
Agree with Azury. Ted: he is describing something different from HBASE-5982. If the row count were maintained in another meta table, then reading it from there would be much faster than the AggregateImplementation getRowNum, I think.
For a specific use case, someone can build this with a CP (coprocessor). But a generic implementation might be difficult. How would we handle versioning? When a new version arrives for an existing row, we should not increment the count. TTLs would also have to be handled.
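The versioning and TTL concerns can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not coprocessor code: `RowCounterSketch` is a hypothetical name, the "table" is an in-memory map from row key to the timestamp of its latest write, and the counter is updated only on writes, as a write hook would do.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, not HBase code: a counter that is maintained only on writes.
final class RowCounterSketch {
    private final Map<String, Long> latestWriteMillis = new HashMap<>();
    private final long ttlMillis;
    private long counter;

    RowCounterSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    private boolean isLive(String rowKey, long nowMillis) {
        Long written = latestWriteMillis.get(rowKey);
        return written != null && nowMillis - written < ttlMillis;
    }

    /** Write hook: needs an existence check so a new version of a live row is not counted twice. */
    void put(String rowKey, long nowMillis) {
        if (!isLive(rowKey, nowMillis)) {
            counter++;
        }
        latestWriteMillis.put(rowKey, nowMillis);
    }

    /** The count as maintained by the write hook. */
    long writeCounter() {
        return counter;
    }

    /** The true number of unexpired rows, found by checking every row. */
    long liveRows(long nowMillis) {
        long live = 0;
        for (String rowKey : latestWriteMillis.keySet()) {
            if (isLive(rowKey, nowMillis)) {
                live++;
            }
        }
        return live;
    }
}
```

Two costs show up even in this toy version: every write needs a read to decide whether the row already exists, and TTL expiry never passes through the write hook, so the stored counter drifts above the real row count until something reconciles it.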
-Anoop-