HBase, mail # user - Mixing Puts and Deletes in a single RPC

Re: Mixing Puts and Deletes in a single RPC
Michael Segel 2012-07-10, 11:57
Regardless,
It's still a bad design.

On Jul 9, 2012, at 10:02 PM, Jonathan Hsieh wrote:

> Keith,
>
> The HBASE-3584 feature is a 0.94 feature, and we are strongly considering
> a 0.94 version for a future CDH4 update.  There is very little chance this
> will get into a CDH3 release.
>
> Jon.
>
> On Thu, Jul 5, 2012 at 4:50 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> I'll let the Cloudera folks speak, but I had assumed CDH4 would include
>> HBase 0.94.
>>
>> -- Lars
>>
>>
>>
>> ________________________________
>> From: Ted Yu <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Thursday, July 5, 2012 11:28 AM
>> Subject: Re: Mixing Puts and Deletes in a single RPC
>>
>> Take a look at HBASE-3584: Allow atomic put/delete in one call
>> It is in 0.94, meaning it is not even in CDH4.
>>
>> Cheers
>>
>> On Thu, Jul 5, 2012 at 11:19 AM, Keith Wyss <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Hi,
>>>
>>> My organization has been doing something zany to simulate atomic row
>>> operations in HBase.
>>>
>>> We have a converter-object model for the writables that are populated in
>>> an HBase table, and one of the governing assumptions is that if you are
>>> dealing with an Object record, you read all the columns that compose it
>>> out of HBase or a different data source.
>>>
>>> When we read lots of data in from a source system that we are trying to
>>> mirror with HBase, if a column is null that means that whatever is
>>> in HBase for that column is no longer valid. We have simulated what I
>>> believe is now called an AtomicRowMutation by using a single Put
>>> and populating it with blanks. The downside is the wasted space accrued
>>> by the metadata for the blank columns.
>>>
>>> Atomicity is not of utmost importance to us, but performance is. My
>>> approach has been to create a Put and Delete object for a record and
>>> populate the Delete with the null columns. Then we call
>>> HTable.batch(List<Row>) on a bunch of these. It is my impression that
>>> this shouldn't appreciably increase network traffic as the RPC calls
>>> will be bundled.
>>>
>>> Has anyone else addressed this problem? Does this seem like a reasonable
>>> approach?
>>> What sort of performance overhead should I expect?
>>>
>>> Also, I've seen some Jira tickets about making this an atomic operation
>>> in its own right. Is that something that I can expect with CDH3U4?
>>>
>>> Thanks,
>>>
>>> Keith Wyss
>>>
>>
>
>
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // [EMAIL PROTECTED]
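
For readers following the thread, the record-splitting step Keith describes
(non-null columns go into a Put, null columns go into a Delete) can be
sketched roughly as below. This is only a minimal model under stated
assumptions: it uses plain Java collections instead of the HBase client so
that it runs on its own, and the names RecordMutationSketch, RowChange and
split are hypothetical, not part of HBase or of Keith's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecordMutationSketch {

    /** Hypothetical holder: the columns to write and the columns to delete for one row. */
    static final class RowChange {
        final String rowKey;
        final Map<String, String> puts = new LinkedHashMap<>();
        final List<String> deletes = new ArrayList<>();

        RowChange(String rowKey) {
            this.rowKey = rowKey;
        }
    }

    /**
     * Splits one source record into columns to put and columns to delete.
     * A null value is taken to mean "whatever is stored for this column
     * is no longer valid", as in the original post.
     */
    static RowChange split(String rowKey, Map<String, String> record) {
        RowChange change = new RowChange(rowKey);
        for (Map.Entry<String, String> column : record.entrySet()) {
            if (column.getValue() == null) {
                change.deletes.add(column.getKey());
            } else {
                change.puts.put(column.getKey(), column.getValue());
            }
        }
        return change;
    }

    public static void main(String[] args) {
        // Made-up example record; the column names are illustrative only.
        Map<String, String> record = new LinkedHashMap<>();
        record.put("name", "alice");
        record.put("email", null);
        record.put("city", "Denver");

        RowChange change = split("row-1", record);
        System.out.println("put:    " + change.puts);
        System.out.println("delete: " + change.deletes);
    }
}
```

With the real client, the puts map would presumably populate a Put and the
deletes list a Delete for the same row key, and both would be submitted
together through HTable.batch(List<Row>) as described in the thread. That
path bundles the calls but does not make the pair atomic; the atomic variant
is what HBASE-3584 adds in 0.94.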