HBase >> mail # user >> feature request (count)


Re: feature request (count)
"Each HFile knows how many KV entries there are in it, but this does
not map in a general way to the number of rows, or the number of rows
with a specific column."

It would be nice to have an index like that; it would solve a lot of
issues for people migrating from MySQL.  I assume that without a
'count' feature, people resort to storing dataset elements in other
engines, which is not great, since you then need a non-HBase index to
stay consistent and authoritative for every dataset that requires
counts.

-Jack
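
The quoted point about KV entries versus rows can be illustrated with a small toy model. This is only a sketch: `KV`, `entries`, and the three helper functions are hypothetical names made up for illustration, and the list stands in for the contents of one store file rather than using any real HBase API. It assumes a single column family `cf` and shows that the entry count, the row count, and the count of rows holding a given column are three different numbers.

```python
from collections import namedtuple

# Toy stand-in for a KeyValue entry (hypothetical, not the HBase class).
KV = namedtuple("KV", ["row", "family", "qualifier", "timestamp"])

# Hypothetical contents of one store file for column family "cf".
entries = [
    KV("row1", "cf", "a", 1),
    KV("row1", "cf", "a", 2),  # newer version of the same cell
    KV("row1", "cf", "b", 1),
    KV("row2", "cf", "a", 1),
    KV("row3", "cf", "b", 1),
]


def kv_count(kvs):
    """Number of KV entries: the figure the file can know cheaply."""
    return len(kvs)


def row_count(kvs):
    """Number of distinct row keys: requires looking at every key."""
    return len({kv.row for kv in kvs})


def rows_with_column(kvs, family, qualifier):
    """Number of distinct rows that have at least one entry for the column."""
    return len(
        {kv.row for kv in kvs if kv.family == family and kv.qualifier == qualifier}
    )
```

With these sample entries, `kv_count(entries)` is 5, `row_count(entries)` is 3, and `rows_with_column(entries, "cf", "a")` is 2. In a real table a row's cells can also be spread across several files, which makes the mapping looser still.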
On Fri, Jun 3, 2011 at 3:24 PM, Ryan Rawson <[EMAIL PROTECTED]> wrote:
> This is a commonly requested feature, and it remains unimplemented
> because it is actually quite hard.  Each HFile knows how many KV
> entries there are in it, but this does not map in a general way to the
> number of rows, or the number of rows with a specific column. Keeping
> track of the row count as new rows are created is also not as easy as
> it seems - this is because a Put does not know if a row already exists
> or not.  Making it aware of that fact would require doing a get before
> a put - not cheap.
>
> -ryan
>
> On Fri, Jun 3, 2011 at 3:20 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
>> I have a feature request: there should be a native function called
>> 'count' that produces a count of rows based on a specific family filter,
>> that is internal to HBase and does not need to read cells off the
>> disk/cache.  Just count up the rows in the most efficient way
>> possible.  I realize that family definitions are part of the cells, so
>> it would be nice to have an index that lets a count put only a low
>> IO/CPU load on HBase (for example, enabling such an index in the
>> table schema would be how you turn it on for a specific family).
>>
>> Best,
>>
>> -Jack
>>
>
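
Ryan's point that a Put does not know whether the row already exists can be sketched with an in-memory toy. Everything here is an assumption made for illustration: `ToyTable`, `blind_put`, `checked_put`, and `replay` are hypothetical names, and a Python dict stands in for the store, so this is not the HBase client API. The sketch compares a counter bumped on every write with one that is only bumped after an existence check.

```python
class ToyTable:
    """In-memory stand-in for a table (hypothetical, not an HBase API)."""

    def __init__(self):
        self.rows = {}        # row key -> {column: value}
        self.reads = 0        # existence checks performed before writes
        self.row_counter = 0  # row count maintained at write time

    def blind_put(self, row, column, value):
        # Write without reading first: no extra lookup, but the write cannot
        # tell a new row from an update, so the counter overcounts.
        self.rows.setdefault(row, {})[column] = value
        self.row_counter += 1

    def checked_put(self, row, column, value):
        # Read before write: the counter stays exact, at the cost of one
        # lookup for every put.
        self.reads += 1
        if row not in self.rows:
            self.row_counter += 1
        self.rows.setdefault(row, {})[column] = value


def replay(writes, checked):
    """Apply writes to a fresh table; return (counter, actual rows, reads)."""
    table = ToyTable()
    put = table.checked_put if checked else table.blind_put
    for row, column, value in writes:
        put(row, column, value)
    return table.row_counter, len(table.rows), table.reads


# Four writes that touch only two distinct rows.
writes = [("r1", "a", 1), ("r1", "b", 2), ("r2", "a", 3), ("r1", "a", 4)]
```

Here `replay(writes, checked=False)` returns `(4, 2, 0)`: no reads, but the counter says 4 while only 2 rows exist. `replay(writes, checked=True)` returns `(2, 2, 4)`: the counter is right, but every put paid for a lookup, which is the cost described above as "not cheap".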