HBase >> mail # user >> Mixing Puts and Deletes in a single RPC


Re: Mixing Puts and Deletes in a single RPC
Keith,

HBASE-3584 is a 0.94 feature, and we are strongly considering a 0.94
version for a future CDH4 update.  There is very little chance this
will get into a CDH3 release.

Jon.

On Thu, Jul 5, 2012 at 4:50 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> I'll let the Cloudera folks speak, but I had assumed CDH4 would include
> HBase 0.94.
>
> -- Lars
>
>
>
> ________________________________
>  From: Ted Yu <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Thursday, July 5, 2012 11:28 AM
> Subject: Re: Mixing Puts and Deletes in a single RPC
>
> Take a look at HBASE-3584: Allow atomic put/delete in one call
> It is in 0.94, meaning it is not even in CDH4
>
> Cheers
>
> On Thu, Jul 5, 2012 at 11:19 AM, Keith Wyss <[EMAIL PROTECTED]>
> wrote:
>
> > Hi,
> >
> > My organization has been doing something zany to simulate atomic row
> > operations in HBase.
> >
> > We have a converter-object model for the writables that are populated in
> > an HBase table, and one of the governing assumptions is that if you are
> > dealing with an Object record, you read all the columns that compose it
> > out of HBase or a different data source.
> >
> > When we read lots of data in from a source system that we are trying to
> > mirror with HBase, if a column is null that means that whatever is
> > in HBase for that column is no longer valid. We have simulated what I
> > believe is now called an AtomicRowMutation by using a single Put
> > and populating it with blanks. The downside is the wasted space accrued
> > by the metadata for the blank columns.
> >
> > Atomicity is not of utmost importance to us, but performance is. My
> > approach has been to create a Put and Delete object for a record and
> > populate the Delete with the null columns. Then we call
> > HTable.batch(List<Row>) on a bunch of these. It is my impression that
> > this shouldn't appreciably increase network traffic as the RPC calls
> > will be bundled.
> >
> > Has anyone else addressed this problem? Does this seem like a reasonable
> > approach?
> > What sort of performance overhead should I expect?
> >
> > Also, I've seen some Jira tickets about making this an atomic operation
> > in its own right. Is that something that I can expect with CDH3U4?
> >
> > Thanks,
> >
> > Keith Wyss
> >
>

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]
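
For readers following the thread, the approach Keith describes (one Put for the non-null columns, one Delete for the null ones) can be sketched as below. This is only an illustration under stated assumptions: it uses plain Java collections rather than the HBase client, and the names RecordSplitter, Split, toPut and toDelete are hypothetical and do not appear anywhere in the thread or in HBase.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: splits one mirrored record into the columns that
 * should be written and the columns that should be deleted.
 * A null value means "whatever HBase holds for this column is no longer valid".
 */
final class RecordSplitter {

    /** Holds the two halves of a record; stands in for a Put/Delete pair. */
    static final class Split {
        final Map<String, String> toPut = new LinkedHashMap<>();
        final List<String> toDelete = new ArrayList<>();
    }

    /** Non-null columns go to the put side, null columns to the delete side. */
    static Split split(Map<String, String> record) {
        Split s = new Split();
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (e.getValue() == null) {
                s.toDelete.add(e.getKey());
            } else {
                s.toPut.put(e.getKey(), e.getValue());
            }
        }
        return s;
    }

    public static void main(String[] args) {
        // Made-up example record; column names are illustrative only.
        Map<String, String> record = new LinkedHashMap<>();
        record.put("name", "widget");
        record.put("color", null);   // source system says this column is gone
        record.put("price", "9.99");

        Split s = split(record);
        System.out.println("put=" + s.toPut + " delete=" + s.toDelete);
    }
}
```

In a real client the put side would presumably become a Put and the delete side a Delete for the same row, with both added to the list handed to HTable.batch(List<Row>) as described above. That pair is not applied atomically; the atomic variant is what HBASE-3584 adds in 0.94 (exposed there, as far as the 0.94 client API goes, through RowMutations and HTable.mutateRow, which is worth checking against your client version).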