Re: checkAnd...
Hi,
We have a simple HBase schema:
row key = subscriber id.
Column family A = counters (all kinds of aggregations).

Event records have a UUID; in some scenarios we might get duplicate
events. We should not count the duplicates.

A possible solution was to keep event ids as qualifiers in another CF and
do a checkAndIncrement only if the event id can't be found.

I understand how a RegionObserver could be used to solve the problem.

Any other suggestions?

Thanks,
Lior.
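A minimal in-memory sketch of the schema proposed above, assuming a single JVM: plain Java collections stand in for the two column families, and no HBase classes are used (`SubscriberRow` and `incrementIfNewEvent` are hypothetical names). The `synchronized` method plays the role of a per-row lock, so the existence check, the event-id marker and the counter update form one atomic step.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for one subscriber row (row key = subscriber id). */
class SubscriberRow {
    // Stand-in for the event-id column family: one entry per event UUID already counted.
    private final Set<String> seenEventIds = new HashSet<>();
    // Stand-in for column family A: counter qualifier -> aggregated value.
    private final Map<String, Long> counters = new HashMap<>();

    /**
     * Counts the event only if its id has not been seen in this row before.
     * synchronized models a per-row lock, so check, mark and increment form one atomic step.
     * Returns true if the event was counted, false if it was a duplicate.
     */
    synchronized boolean incrementIfNewEvent(String eventId, String counter, long delta) {
        if (!seenEventIds.add(eventId)) {
            return false; // id already recorded: duplicate, skip the increment
        }
        counters.merge(counter, delta, Long::sum);
        return true;
    }

    synchronized long get(String counter) {
        return counters.getOrDefault(counter, 0L);
    }
}

class DedupDemo {
    public static void main(String[] args) {
        SubscriberRow row = new SubscriberRow();
        row.incrementIfNewEvent("uuid-1", "clicks", 1);
        row.incrementIfNewEvent("uuid-1", "clicks", 1); // duplicate delivery, ignored
        row.incrementIfNewEvent("uuid-2", "clicks", 1);
        System.out.println(row.get("clicks")); // prints 2
    }
}
```

The set of recorded event ids grows with every event, so a real deployment would need to prune it, for example with a TTL on that column family.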
On Sun, Apr 28, 2013 at 10:55 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> Yep.
> You can write a RegionObserver which takes all event qualifiers whose
> timestamp is older than a certain grace period, sums them up, adds the sum
> to the current value of the Count qualifier and emits an updated Count
> qualifier.
> I wrote something very similar for us at Akamai and it improved throughput
> by 10x. I'm working on open-sourcing it.
>
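A minimal in-memory sketch of the roll-up described above, assuming event qualifiers are written with idempotent puts (a duplicate event id overwrites the same cell instead of adding a new one) and reading "larger than a certain grace period" as "older than the grace period". `RollupRow`, `recordEvent` and `rollUp` are hypothetical names; in HBase this logic would sit in a RegionObserver hook, not a plain Java class.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical model of one row: pending event qualifiers plus a rolled-up Count. */
class RollupRow {
    /** One event qualifier: when it was written and the value to add to Count. */
    private static final class PendingEvent {
        final long timestampMs;
        final long value;

        PendingEvent(long timestampMs, long value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    // Event id -> pending event. Writing the same id twice overwrites, like a Put to one qualifier.
    private final Map<String, PendingEvent> pending = new HashMap<>();
    private long count = 0;

    void recordEvent(String eventId, long timestampMs, long value) {
        pending.put(eventId, new PendingEvent(timestampMs, value));
    }

    /** Sums every event older than the grace period into Count and drops its qualifier. */
    void rollUp(long nowMs, long gracePeriodMs) {
        Iterator<Map.Entry<String, PendingEvent>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            PendingEvent e = it.next().getValue();
            if (nowMs - e.timestampMs > gracePeriodMs) {
                count += e.value;
                it.remove();
            }
        }
    }

    long rolledUpCount() {
        return count;
    }

    /** Rolled-up Count plus the events still inside the grace period. */
    long total() {
        long sum = count;
        for (PendingEvent e : pending.values()) {
            sum += e.value;
        }
        return sum;
    }
}

class RollupDemo {
    public static void main(String[] args) {
        RollupRow row = new RollupRow();
        row.recordEvent("uuid-1", 1_000, 1);
        row.recordEvent("uuid-1", 1_500, 1); // duplicate: same qualifier, not counted twice
        row.recordEvent("uuid-2", 9_000, 1);
        row.rollUp(10_000, 5_000); // folds uuid-1 (age 8,500 ms) but not uuid-2 (age 1,000 ms)
        System.out.println(row.rolledUpCount() + " " + row.total()); // prints "1 2"
    }
}
```

The grace period is what makes the de-duplication hold: once an event has been folded into Count its qualifier is gone, so a duplicate arriving later would be counted again. The period therefore has to exceed the longest expected delay between an event and its duplicate.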
> On Saturday, April 27, 2013, Lior Schachter wrote:
>
> > Hi Ted,
> > Thanks for the prompt response.
> > I've already had a look at HRegionServer.checkAndPut, and the
> > implementation looks quite straightforward.
> > That's why I was wondering why the other two methods are not available
> > (or planned; I couldn't find a JIRA issue for them).
> > It seems like useful functionality.
> >
> > Anyhow, I'm not allowed to make any source code modifications to the
> > HBase installation (in production), so I reckon I'll have to find a
> > workaround.
> >
> > This is my use case:
> > Updating user counters from events.
> > In rare cases we may get duplicate events.
> > The duplicates must not be counted.
> >
> > My initial thought was to write an event_id qualifier for each incoming
> > event (with the value '1'). By checking whether the event_id exists
> > before incrementing, I can avoid duplicates.
> > Without checkAndIncrement I must make two round trips for each event
> > (which doesn't make sense).
> >
> > Any ideas on how I can solve this issue?
> >
> > Thanks,
> > Lior
> >
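Besides costing an extra round trip, checking and then incrementing in two separate calls is not atomic. The replay below is a deterministic, single-threaded model of two consumers handling the same duplicate event at the same time; it assumes duplicates can be processed concurrently, and the class and method names are hypothetical (no HBase API is involved).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical replay of the two-round-trip approach: a check call, then a separate write call. */
class TwoRoundTripReplay {
    private final Set<String> eventIds = new HashSet<>();
    private final Map<String, Long> counters = new HashMap<>();

    /** Round trip 1: does the event_id qualifier already exist? */
    boolean exists(String eventId) {
        return eventIds.contains(eventId);
    }

    /** Round trip 2: write the event_id marker and increment the counter. */
    void markAndIncrement(String eventId, String counter) {
        eventIds.add(eventId);
        counters.merge(counter, 1L, Long::sum);
    }

    /** Two consumers receive the same event; both checks run before either write. */
    long replayInterleavedDuplicate() {
        boolean seenByA = exists("uuid-1");
        boolean seenByB = exists("uuid-1");
        if (!seenByA) markAndIncrement("uuid-1", "clicks");
        if (!seenByB) markAndIncrement("uuid-1", "clicks");
        return counters.getOrDefault("clicks", 0L);
    }

    public static void main(String[] args) {
        // prints 2: the duplicate slipped through because both checks saw "absent"
        System.out.println(new TwoRoundTripReplay().replayInterleavedDuplicate());
    }
}
```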
> > On Sat, Apr 27, 2013 at 4:23 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > Take a look at the following method in HRegionServer:
> > >
> > >   public boolean checkAndPut(final byte[] regionName, final byte[] row,
> > >       final byte[] family, final byte[] qualifier, final byte[] value,
> > >       final Put put) throws IOException {
> > >
> > > You can create checkAndIncrement() in a similar way.
> > >
> > > Cheers
> > >
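A sketch of the semantics a checkAndIncrement() would presumably borrow from checkAndPut: compare one cell with an expected value (a null expected value meaning the cell must be absent, as the checkAndPut documentation describes) and apply the increment only on a match, all under the row lock. This is a single-JVM model with hypothetical names (`RowModel`, `checkAndIncrement`), not HRegionServer code; `synchronized` stands in for the region server's row lock.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-row model; synchronized stands in for the region server's row lock. */
class RowModel {
    // "family:qualifier" -> raw cell value
    private final Map<String, byte[]> cells = new HashMap<>();
    // "family:qualifier" -> counter value (kept in a separate map for simplicity)
    private final Map<String, Long> counters = new HashMap<>();

    synchronized void put(String column, byte[] value) {
        cells.put(column, value);
    }

    synchronized long counter(String column) {
        return counters.getOrDefault(column, 0L);
    }

    /**
     * Mirrors checkAndPut's compare step: expected == null means "checkColumn must be absent";
     * otherwise the stored bytes must equal expected. Increments only on a match.
     */
    synchronized boolean checkAndIncrement(String checkColumn, byte[] expected,
                                           String counterColumn, long amount) {
        byte[] actual = cells.get(checkColumn);
        boolean matches = (expected == null) ? actual == null : Arrays.equals(expected, actual);
        if (!matches) {
            return false;
        }
        counters.merge(counterColumn, amount, Long::sum);
        return true;
    }

    public static void main(String[] args) {
        RowModel row = new RowModel();
        System.out.println(row.checkAndIncrement("events:uuid-1", null, "A:clicks", 1)); // true
        row.put("events:uuid-1", new byte[] {1});
        System.out.println(row.checkAndIncrement("events:uuid-1", null, "A:clicks", 1)); // false
        System.out.println(row.counter("A:clicks")); // 1
    }
}
```

For the de-duplication use case in this thread, a check-and-increment on its own is not quite enough: the increment does not write the event_id marker, so the marker put and the increment would both need to happen under the same row lock.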
> > > On Sat, Apr 27, 2013 at 9:02 PM, Lior Schachter <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > > I want to increment a cell value only after checking a condition on
> > > > another cell. I found checkAndPut/checkAndDelete on HTableInterface,
> > > > but checkAndIncrement (and checkAndAppend) seem to be missing.
> > > >
> > > > Can you suggest a workaround for my use case? I'm working with version
> > > > 0.94.5.
> > > >
> > > > Thanks,
> > > > Lior
> > > >
> > >
> >
>