Re: checkAnd...
Hi,
We have a simple HBase schema:
row key = subscriber id.
Column family A = counters - all kinds of aggregations.

Event records have a UUID. In some scenarios we might get duplicate
events, and we should not count the duplicates.

A possible solution is to keep event ids as qualifiers in another CF and
do a checkAndIncrement only if the event id is not found.

I understand how to utilize RegionObserver to solve the problem.

Any other suggestions ?

Thanks,
Lior.
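
The scheme described above (event ids kept next to the counters, increment only when the event id is new) can be sketched in plain Java. This is an in-memory model only, not HBase client code: the class name `DedupCounterSketch` and its methods are hypothetical, the `Set` stands in for the event-id column family, and the `Map` stands in for column family A. It assumes the check and the increment happen atomically for a single row, which is what a server-side checkAndIncrement would have to guarantee.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** In-memory model of one subscriber row (hypothetical, not an HBase API). */
final class DedupCounterSketch {
    // Models column family A: counter name -> aggregated value.
    private final Map<String, Long> counters = new HashMap<>();
    // Models the second column family: one entry per event id already counted.
    private final Set<String> seenEventIds = new HashSet<>();

    /**
     * Increments the counter only if the event id has not been seen before.
     * The method is synchronized so that the check and the increment form
     * one atomic step for this row.
     *
     * @return true if the event was counted, false if it was a duplicate
     */
    synchronized boolean checkAndIncrement(String eventId, String counter, long amount) {
        if (!seenEventIds.add(eventId)) {
            return false; // duplicate event: leave the counter untouched
        }
        counters.merge(counter, amount, Long::sum);
        return true;
    }

    /** Returns the current value of a counter, or 0 if it was never incremented. */
    synchronized long get(String counter) {
        return counters.getOrDefault(counter, 0L);
    }
}
```

Calling `checkAndIncrement` twice with the same event id counts the event once; the second call returns false and the counter stays unchanged.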
On Sun, Apr 28, 2013 at 10:55 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> Yep.
> You can write a RegionObserver that takes all event qualifiers whose
> timestamp falls outside a certain grace period, sums them up, adds the sum
> to the current value of the Count qualifier, and emits an updated Count
> qualifier.
> I wrote something very similar for us at Akamai and it improved throughput
> by 10x. I'm working on open sourcing it.
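
The folding step described in the reply above can be modeled in plain Java. This is a sketch under stated assumptions, not the Akamai implementation and not a real RegionObserver: the class `GracePeriodFoldSketch` and its methods are hypothetical, "outside the grace period" is read as "older than now minus the grace period", and folded event entries are assumed to be dropped once they have been added to the count.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** In-memory model of folding aged event qualifiers into a Count value (hypothetical). */
final class GracePeriodFoldSketch {
    // Event id -> {timestamp in millis, delta}. Models one qualifier per event.
    private final Map<String, long[]> pendingEvents = new HashMap<>();
    // Models the Count qualifier.
    private long count;

    /** Records an event; a second write with the same event id is ignored. */
    void recordEvent(String eventId, long timestampMillis, long delta) {
        pendingEvents.putIfAbsent(eventId, new long[] {timestampMillis, delta});
    }

    /**
     * Adds every event older than the grace period to the count and removes it
     * from the pending set. Returns the updated count.
     */
    long fold(long nowMillis, long gracePeriodMillis) {
        long cutoff = nowMillis - gracePeriodMillis;
        Iterator<Map.Entry<String, long[]>> it = pendingEvents.entrySet().iterator();
        while (it.hasNext()) {
            long[] event = it.next().getValue();
            if (event[0] <= cutoff) {
                count += event[1];
                it.remove();
            }
        }
        return count;
    }
}
```

In this model a duplicate is only detected while the first copy of the event is still pending, so the grace period has to be at least as long as the window in which duplicates can arrive.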
>
> On Saturday, April 27, 2013, Lior Schachter wrote:
>
> > Hi Ted,
> > Thanks for the prompt response.
> > I've already had a look at HRegionServer.checkAndPut and the implementation
> > looks quite straightforward.
> > That's why I was wondering why the other 2 methods are not available... or
> > planned (couldn't find a Jira).
> > Seems like useful functionality.
> >
> > Anyhow, I'm not allowed to make any source code modifications to the HBase
> > installation (in production), so I reckon I'll have to find a workaround.
> >
> > This is my use case:
> > Updating user counters by events.
> > We may get (in rare cases) duplicate events.
> > Should not count the duplicates.
> >
> > My initial thought was to have an event_id qualifier for each incoming
> > event (with value '1'). By checking whether event_id exists before
> > incrementing, I can avoid duplicates.
> > Without the checkAndIncrement functionality I must make 2 round trips for
> > each event (which doesn't make sense).
> >
> > Any ideas how I can solve this issue ?
> >
> > Thanks,
> > Lior
> >
> > On Sat, Apr 27, 2013 at 4:23 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > Take a look at the following method in HRegionServer:
> > >
> > >   public boolean checkAndPut(final byte[] regionName, final byte[] row,
> > >       final byte[] family, final byte[] qualifier, final byte[] value,
> > >       final Put put) throws IOException {
> > >
> > > You can create checkAndIncrement() in a similar way.
> > >
> > > Cheers
> > >
> > > On Sat, Apr 27, 2013 at 9:02 PM, Lior Schachter <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > > I want to increment a cell value only after checking a condition on
> > > > another cell. I could find checkAndPut/checkAndDelete on
> > > > HTableInterface. It seems that checkAndIncrement (and checkAndAppend)
> > > > are missing.
> > > >
> > > > Can you suggest a workaround for my use case? I'm working with version
> > > > 0.94.5.
> > > >
> > > > Thanks,
> > > > Lior
> > > >
> > >
> >
>