HBase, mail # user - Mixing Puts and Deletes in a single RPC


Re: Mixing Puts and Deletes in a single RPC
Jonathan Hsieh 2012-07-10, 03:02
Keith,

HBASE-3584 is a 0.94 feature, and we are strongly considering a 0.94
version for a future CDH4 update.  There is very little chance this
will get into a CDH3 release.

Jon.

On Thu, Jul 5, 2012 at 4:50 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> I'll let the Cloudera folks speak, but I had assumed CDH4 would include
> HBase 0.94.
>
> -- Lars
>
>
>
> ________________________________
>  From: Ted Yu <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Thursday, July 5, 2012 11:28 AM
> Subject: Re: Mixing Puts and Deletes in a single RPC
>
> Take a look at HBASE-3584: Allow atomic put/delete in one call
> It is in 0.94, meaning it is not even in CDH4.
>
> Cheers
>
> On Thu, Jul 5, 2012 at 11:19 AM, Keith Wyss <[EMAIL PROTECTED]>
> wrote:
>
> > Hi,
> >
> > My organization has been doing something zany to simulate atomic row
> > operations in HBase.
> >
> > We have a converter-object model for the writables that are populated in
> > an HBase table, and one of the governing assumptions
> > is that if you are dealing with an Object record, you read all the columns
> > that compose it out of HBase or a different data source.
> >
> > When we read lots of data in from a source system that we are trying to
> > mirror with HBase, if a column is null that means that whatever is
> > in HBase for that column is no longer valid. We have simulated what I
> > believe is now called an AtomicRowMutation by using a single Put
> > and populating it with blanks. The downside is the wasted space accrued by
> > the metadata for the blank columns.
> >
> > Atomicity is not of utmost importance to us, but performance is. My
> > approach has been to create a Put and Delete object for a record and
> > populate the Delete with the null columns. Then we call
> > HTable.batch(List<Row>) on a bunch of these. It is my impression that this
> > shouldn't appreciably increase network traffic as the RPC calls will be
> > bundled.
> >
> > Has anyone else addressed this problem? Does this seem like a reasonable
> > approach?
> > What sort of performance overhead should I expect?
> >
> > Also, I've seen some Jira tickets about making this an atomic operation in
> > its own right. Is that something that I can expect with CDH3U4?
> >
> > Thanks,
> >
> > Keith Wyss
> >
>

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]
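
For readers following the thread, the approach Keith describes (one Put for the
populated columns, one Delete for the null ones) might be sketched roughly as
below. This is only an illustration of the splitting step and deliberately has
no HBase dependency: RecordMutationSketch, RowPlan, and plan are hypothetical
names, and the record is assumed to be a simple map from column name to value.
The comments note where the real HBase client objects would come in.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecordMutationSketch {

    /** Hypothetical holder: columns to write and columns to clear for one row. */
    static final class RowPlan {
        final String rowKey;
        final Map<String, String> toPut = new LinkedHashMap<>();
        final List<String> toDelete = new ArrayList<>();

        RowPlan(String rowKey) {
            this.rowKey = rowKey;
        }
    }

    /**
     * Splits a record into non-null columns (to be written) and null columns
     * (to be deleted, since a null means the stored value is no longer valid).
     */
    static RowPlan plan(String rowKey, Map<String, String> record) {
        RowPlan p = new RowPlan(rowKey);
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (e.getValue() == null) {
                p.toDelete.add(e.getKey());
            } else {
                p.toPut.put(e.getKey(), e.getValue());
            }
        }
        return p;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("name", "widget");
        record.put("color", null); // null in the source system: clear this column
        record.put("price", "9.99");

        RowPlan p = plan("row-1", record);

        // With the HBase client, toPut would populate a Put and toDelete a
        // Delete for the same row key; both would then be added to the
        // List<Row> passed to HTable.batch(...), as described in the thread.
        System.out.println("put=" + p.toPut.keySet() + " delete=" + p.toDelete);
    }
}
```

Submitted through batch, the Put and the Delete for a row travel together but
are not applied atomically. On 0.94, HBASE-3584 allows both to be applied to a
single row in one atomic call instead.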