HBase >> mail # user >> Hbase delta load


Jignesh Patel 2013-03-21, 16:04
Andrew Purtell 2013-03-21, 16:20
Jignesh Patel 2013-03-21, 20:24
Re: Hbase delta load
So at a minimum you'd need to extend HBase to understand the semantics of
the user records, i.e. what equality means in this case. This could be done by
writing a coprocessor: code deployed server side and injected into query or
store processing, in effect a combination of stored procedures and
triggers. The coprocessor framework also provides plumbing for custom RPC
endpoints, so if existing HBase operations are not expressive enough you
can add your own.
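
As a rough illustration of "what equality means", here is a minimal sketch in plain Java, assuming that two user records count as the same user when a normalized key built from a few demographic fields matches. The class name, the method names, and the choice of fields are all hypothetical, and nothing here uses the HBase API; the same logic could in principle live inside a coprocessor or in the client that does the loading.

```java
import java.util.Locale;

/** Hypothetical sketch: defines "equality" for user records via a normalized match key. */
public class DemographicMatch {

    /** Lowercases, trims, and collapses internal whitespace so cosmetic differences do not matter. */
    public static String normalize(String value) {
        if (value == null) {
            return "";
        }
        return value.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    /** Builds the match key from assumed demographic fields; the field choice is illustrative only. */
    public static String matchKey(String firstName, String lastName,
                                  String birthDate, String postalCode) {
        return String.join("|",
                normalize(lastName),
                normalize(firstName),
                normalize(birthDate),
                normalize(postalCode).replace(" ", ""));
    }

    public static void main(String[] args) {
        String existing = matchKey("Jane", "Doe", "1980-01-31", "94105");
        String incoming = matchKey("  JANE ", "doe", "1980-01-31", "94105 ");
        // The two records differ only in case and whitespace, so their keys are equal.
        System.out.println(existing.equals(incoming));
    }
}
```

If such a key were used as (or stored alongside) the row key, the "does this user already exist" check would reduce to an existence test on that key. Real demographic matching usually needs fuzzier rules than exact equality on normalized strings.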

On Thursday, March 21, 2013, Jignesh Patel wrote:

> Delta:
> We are trying to bring two different databases in sync. In real time we
> insert data into 2 DBs (totally different formats).
> But at night we run a batch job that cross-checks them: if db2 (which is
> actually HBase) is missing a row or two, we insert it.
>
>
> Data Matching:
> We need to do user verification, i.e. when a new user is inserted we
> check his demographics and based on that conclude whether the user already
> exists.
>
> -Jignesh
>
>
> On Thu, Mar 21, 2013 at 12:20 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>
> > I think you may need to provide just a bit more information about your
> > use case. Could you define a bit more 'delta' and 'data matching'?
> >
> > In a sense, every bulk load is a delta: updates for insert into a
> > larger table, representing a set of changes as a batch.
> >
> > We could consider the existing HBase mechanisms for handling
> > multiversioning to be a simple "data matching functionality" via
> > simple existence testing by coordinate, although I know that is not
> > what you mean (but I don't know what you mean precisely).
> >
> > * - coordinate: { row, column family, qualifier, timestamp }
> >
> > On 3/21/13, Jignesh Patel <[EMAIL PROTECTED]> wrote:
> > > We have a requirement to support data matching while loading deltas to
> > > HBase.
> > > I see there is a utility to support bulk loading.
> > > http://hbase.apache.org/book/arch.bulk.load.html
> > >
> > > But is there any way to support daily delta loading?
> > > Is there any open sourced MDM software which can be integrated with
> > > HBase?
> > >
> > > Does HBase have any data matching functionality?
> > >
> > > -Jignesh
> > >
> >
>
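
The nightly cross-check described in the quoted message amounts to a set difference over row keys: whatever is in the first database but not in HBase is the delta to insert. Below is a minimal in-memory sketch in plain Java; the class and method names are hypothetical, and it assumes both key sets fit in memory, which a real job over large tables would likely replace with a sorted merge or per-key existence tests.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: finds rows that the nightly job would need to insert. */
public class DeltaCheck {

    /** Returns row keys present in the source but absent from the target, in sorted order. */
    public static Set<String> missingKeys(Set<String> sourceKeys, Set<String> targetKeys) {
        Set<String> missing = new TreeSet<>(sourceKeys);
        missing.removeAll(targetKeys);
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative keys only; in practice these would be read from the two databases.
        Set<String> db1 = new TreeSet<>(Arrays.asList("user-001", "user-002", "user-003"));
        Set<String> db2 = new TreeSet<>(Arrays.asList("user-001", "user-002"));
        System.out.println(missingKeys(db1, db2)); // prints [user-003]
    }
}
```

The returned keys are the rows to re-read from the first database and write to HBase; neither input set is modified.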
--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
Ted Yu 2013-03-21, 16:07