HBase >> mail # user >> Distinct counters and counting rows


Re: Distinct counters and counting rows
Hi David,

Have a look at coprocessors, which let you run custom code (Observers)
on get/put/delete actions on a table. You can easily implement the
counters with them.
Here is the description for coprocessors:
https://blogs.apache.org/hbase/entry/coprocessor_introduction
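
To make the idea concrete, here is a minimal sketch of the observer pattern in plain Java. It is not the HBase coprocessor API: PutObserver, UserTable and DistinctCounter are hypothetical names invented for illustration, and the "table" is just an in-memory set. In HBase itself the corresponding hooks live on RegionObserver (for example postPut), and the observer would have to determine for itself whether the row already existed.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for an Observer hook; NOT the HBase API. */
interface PutObserver {
    void postPut(String rowKey, boolean rowWasNew);
}

/** Toy in-memory "user" table keyed by user_id (assumption: fits in memory). */
class UserTable {
    private final Set<String> rows = ConcurrentHashMap.newKeySet();
    private final PutObserver observer;

    UserTable(PutObserver observer) {
        this.observer = observer;
    }

    void put(String userId) {
        // Set.add returns true only if the key was not already present.
        boolean rowWasNew = rows.add(userId);
        // Notify the observer after the write, as a post-put hook would.
        observer.postPut(userId, rowWasNew);
    }
}

/** Observer that keeps a running count of distinct row keys. */
class DistinctCounter implements PutObserver {
    private final AtomicLong count = new AtomicLong();

    @Override
    public void postPut(String rowKey, boolean rowWasNew) {
        // Only a put that created a new row changes the distinct count.
        if (rowWasNew) {
            count.incrementAndGet();
        }
    }

    long get() {
        return count.get();
    }
}

public class DistinctCounterDemo {
    public static void main(String[] args) {
        DistinctCounter counter = new DistinctCounter();
        UserTable users = new UserTable(counter);
        for (String id : new String[] {"u1", "u2", "u1", "u3", "u2"}) {
            users.put(id);
        }
        System.out.println(counter.get()); // prints 3
    }
}
```

The point of the sketch is that the counter is maintained on the write path, so reading the number of users becomes a single lookup instead of a scan over the whole table.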

HTH,
Anil
On Wed, May 30, 2012 at 12:17 AM, David Koch <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I am testing HBase for distinct counters - more concretely, counting
> unique users from a fairly large stream of user_ids. For some time to
> come the volume will be limited enough to use exact counting rather
> than approximation but already it's too big to hold the entire set of
> user_ids in memory.
>
> For now I am basically inserting all elements from the stream into a
> "user" table which has row key "user_id" so as to enforce the unique
> constraint.
>
> My question:
> a) Is there a way to get a quick (i.e. with small delay in a user
> interface) count of the size of the user table to return the number of
> users? Alternatively, is there a way to trigger an increment in
> another table (say "count") whenever a row was added to "user"? I
> guess this can be picked up eventually by the client application but I
> don't want this to delay the actual stream processing.
> b) I heard about Bloom filters in HBase but failed to understand if
> they are used for row keys as well. Are they? How do I activate it? I
> was looking to reduce the work-load of checking set membership for
> every user_id in the stream. If this is done by HBase internally even
> better.
> c) Eventually, I want to store distinct users by day and then do
> unions on different days to get the total number of unique users for a
> multi-day period. Is this likely to involve a MapReduce job or is there a
> more "light-weight" approach?
>
> Thank you,
>
> /David
>

--
Thanks & Regards,
Anil Gupta