HBase user mailing list: How to config hbase0.94.2 to retain deleted data


yun peng 2012-10-21, 20:53
Michael Segel 2012-10-21, 23:34
lars hofhansl 2012-10-22, 00:23
Michael Segel 2012-10-22, 01:56
Michael Segel 2012-10-23, 04:18
lars hofhansl 2012-10-23, 05:22
Michael Segel 2012-10-23, 11:41
lars hofhansl 2012-10-23, 18:35
Re: How to config hbase0.94.2 to retain deleted data
Lars,

No, that is not what I am suggesting.

Perhaps I am missing something. Was the OP interested in cell deletes or in row deletes?

Two different issues.

On Oct 23, 2012, at 1:35 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> HBase has time range queries. You can say "give me the data as of time T" or "give me the data between X and Y". How far back you want to retain your data is specified via TTL and VERSIONS.
>
> But... If you delete the data at T+X (X>0), a query as of time T won't return anything, even though at T the data was still there.
>
> If you don't use TTL and/or VERSIONS in HBase you won't need this feature.
>
> If you do use these, you're doing so because you want to get to the older data. And if you delete stuff, chances are you want KEEP_DELETED_CELLS enabled.
> So within the boundaries specified by TTL/VERSIONS you can get to the data as of any time.
>
>
> By your logic nobody should use TTL/VERSIONS, which is nonsense.
>
>
>
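To make the as-of-time argument concrete, here is a toy, in-memory Java model of a single column. It is not HBase code, and its rules are simplifying assumptions: a delete marker at timestamp D hides every put at or below D; without KEEP_DELETED_CELLS that hiding applies to every read, even one "as of" a time before D; with KEEP_DELETED_CELLS it applies only to reads as of D or later; VERSIONS and TTL are reduced to a cap on retained puts and an expiry cutoff. In the real client an as-of read would be expressed as a time range on a Get or Scan (e.g. `Get.setTimeRange`); the names `AsOfModel`, `getAsOf` and `deleteColumn` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of ONE column (row:family:qualifier); NOT HBase code.
// Assumptions: a delete marker hides puts with ts <= marker ts; keepDeletedCells
// limits that hiding to reads as of the marker time or later; maxVersions caps
// retained puts; puts older than (now - ttl) count as expired.
class AsOfModel {
    static final class Put {
        final long ts;
        final String value;
        Put(long ts, String value) { this.ts = ts; this.value = value; }
    }

    final boolean keepDeletedCells;
    final int maxVersions;
    final long ttl;
    final List<Put> puts = new ArrayList<>();           // newest first
    final List<Long> deleteMarkers = new ArrayList<>(); // column delete markers

    AsOfModel(boolean keepDeletedCells, int maxVersions, long ttl) {
        this.keepDeletedCells = keepDeletedCells;
        this.maxVersions = maxVersions;
        this.ttl = ttl;
    }

    void put(long ts, String value) {
        puts.add(new Put(ts, value));
        puts.sort((a, b) -> Long.compare(b.ts, a.ts)); // keep newest first
        while (puts.size() > maxVersions) {
            puts.remove(puts.size() - 1);              // VERSIONS: drop the oldest put
        }
    }

    void deleteColumn(long ts) {
        deleteMarkers.add(ts);
    }

    // Is a put at putTs hidden for a read "as of" asOf?
    private boolean isMasked(long putTs, long asOf) {
        for (long d : deleteMarkers) {
            if (d >= putTs && (!keepDeletedCells || d <= asOf)) {
                return true;
            }
        }
        return false;
    }

    // Newest visible value with ts <= asOf, evaluated at wall-clock time now; null if none.
    String getAsOf(long asOf, long now) {
        for (Put p : puts) {
            if (p.ts > asOf) continue;          // outside the requested time range
            if (p.ts < now - ttl) continue;     // expired by TTL
            if (isMasked(p.ts, asOf)) continue; // hidden by a delete marker
            return p.value;
        }
        return null;
    }

    public static void main(String[] args) {
        for (boolean keep : new boolean[] {false, true}) {
            AsOfModel col = new AsOfModel(keep, 3, Long.MAX_VALUE); // effectively no TTL
            col.put(10, "v1");    // the data exists at T = 15 ...
            col.deleteColumn(20); // ... and is deleted at T + X = 20
            System.out.println("keep=" + keep
                    + " asOf(15)=" + col.getAsOf(15, 30)
                    + " asOf(25)=" + col.getAsOf(25, 30));
        }
    }
}
```

Running `main` prints `keep=false asOf(15)=null asOf(25)=null` and then `keep=true asOf(15)=v1 asOf(25)=null`: with the flag off, the read as of 15 returns nothing even though `v1` existed at 15, while a read as of 25 sees the delete in both configurations.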
> ________________________________
> From: Michael Segel <[EMAIL PROTECTED]>
> To: lars hofhansl <[EMAIL PROTECTED]>
> Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Tuesday, October 23, 2012 4:41 AM
> Subject: Re: How to config hbase0.94.2 to retain deleted data
>
> "Deleted cells are still subject to TTL and there will never be more than "maximum number of versions" deleted cells. A new "raw" scan options returns all deleted rows and the delete markers. "
>
> This is different from the idea suggested by the OP. Here deleted cells still get deleted; it's just that when compaction comes along, it's told to ignore them.
>
> So if I say a column can have 3 versions (cells), then each time I insert another value for that row:column key, I push that deleted cell down the stack. Do that enough times and it's gone.
>
> In theory, this feature would be useful if I wanted an OLTP implementation on top of HBase. It would allow the transaction to bridge a compaction cycle. However, that's pretty much it.
>
> This feature doesn't translate well beyond this.
>
> It also raises the following questions: how do I handle long (OLTP) transactions, timeouts, and isolation levels?
>
> If you look at this at the row level... definitely not a good idea. Think of fat clogging an artery.
>  
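Michael's point about deleted cells being pushed "down the stack" can be sketched with another toy model (again, not HBase code). It assumes KEEP_DELETED_CELLS is on, that deleted puts still occupy one of the VERSIONS slots, and that a "raw" read lists every retained put plus the delete markers, loosely mirroring the raw scan option quoted above. Markers are never purged in this sketch; when HBase actually drops them is outside its scope. `VersionStackModel` and `rawScan` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of one column with KEEP_DELETED_CELLS on; NOT HBase code.
// Assumption: deleted puts still take up VERSIONS slots, so newer puts evict them.
class VersionStackModel {
    final int maxVersions;
    // timestamp -> value, ordered newest first
    final TreeMap<Long, String> puts = new TreeMap<>(Comparator.<Long>reverseOrder());
    final List<Long> deleteMarkers = new ArrayList<>();

    VersionStackModel(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long ts, String value) {
        puts.put(ts, value);
        while (puts.size() > maxVersions) {
            puts.pollLastEntry(); // evict the oldest version, deleted or not
        }
    }

    void deleteColumn(long ts) {
        deleteMarkers.add(ts);
    }

    // Roughly what a "raw" read would show: retained puts (flagged if deleted) plus markers.
    List<String> rawScan() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : puts.entrySet()) {
            long ts = e.getKey();
            boolean deleted = deleteMarkers.stream().anyMatch(d -> d >= ts);
            out.add(e.getValue() + "@" + ts + (deleted ? " (deleted)" : ""));
        }
        for (long d : deleteMarkers) {
            out.add("DeleteColumn@" + d);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionStackModel col = new VersionStackModel(3); // VERSIONS => 3
        col.put(10, "v1");
        col.deleteColumn(15);                             // v1 is now a deleted cell
        col.put(20, "v2");
        col.put(30, "v3");
        System.out.println(col.rawScan());
        col.put(40, "v4");                                // pushes v1 out of the stack
        System.out.println(col.rawScan());
    }
}
```

The first raw read shows `[v3@30, v2@20, v1@10 (deleted), DeleteColumn@15]`; after the fourth put the deleted `v1` has been evicted and the read shows `[v4@40, v3@30, v2@20, DeleteColumn@15]`.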
> On Oct 23, 2012, at 12:22 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> http://hbase.apache.org/book/cf.keep.deleted.html
>>
>> Without it you cannot do correct as-of-time queries when it comes to deletes.
>>
>> -- Lars
>>
>> From: Michael Segel <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
>> Sent: Monday, October 22, 2012 9:18 PM
>> Subject: Re: How to config hbase0.94.2 to retain deleted data
>>
>>>
>>> Curious, why do you think this is better than using the keep-deleted-cells feature?
>>> (It might well be, just curious)
>>
>> Ok... so what exactly does this feature mean?
>>
>> Suppose I have 500 rows within a region. I set this feature to be true.
>> I do a massive delete and there are only 50 rows left standing.
>>
>> So if I do a count of the number of rows in the region, I see only 50, yet if I compact the table, it's still full.
>>
>> Granted, I'm talking about rows and not cells, but the idea is the same. IMHO you're asking for more headaches than you solve.
>>
>> KISS would suggest that moving deleted data into a different table would yield better performance in the long run.
>>
>>
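Michael's 500-row scenario can be reduced to a toy model (not HBase code) that separates what a row count sees from what the region still stores. It assumes one cell per row, that a delete only writes a marker, and that `majorCompact()` purges deleted rows only when KEEP_DELETED_CELLS is off; eviction by TTL or VERSIONS, which would eventually reclaim the space in the enabled case, is not modeled. `RegionRowModel` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the 500-row region scenario; NOT HBase code.
// Assumption: a delete only marks a row; majorCompact() physically removes
// deleted rows only when keepDeletedCells is false (TTL/VERSIONS not modeled).
class RegionRowModel {
    final boolean keepDeletedCells;
    final Map<String, Boolean> rows = new LinkedHashMap<>(); // row key -> deleted?

    RegionRowModel(boolean keepDeletedCells) {
        this.keepDeletedCells = keepDeletedCells;
    }

    void put(String row) {
        rows.put(row, false);
    }

    void delete(String row) {
        if (rows.containsKey(row)) {
            rows.put(row, true); // write a "marker", keep the data
        }
    }

    // What a normal scan / row count sees.
    long visibleRows() {
        return rows.values().stream().filter(deleted -> !deleted).count();
    }

    // What is still physically stored in the region.
    int storedRows() {
        return rows.size();
    }

    void majorCompact() {
        if (!keepDeletedCells) {
            rows.values().removeIf(deleted -> deleted);
        }
    }

    public static void main(String[] args) {
        for (boolean keep : new boolean[] {false, true}) {
            RegionRowModel region = new RegionRowModel(keep);
            for (int i = 0; i < 500; i++) region.put(String.format("row%03d", i));
            for (int i = 50; i < 500; i++) region.delete(String.format("row%03d", i));
            region.majorCompact();
            System.out.println("keep=" + keep + " visible=" + region.visibleRows()
                    + " stored=" + region.storedRows());
        }
    }
}
```

Both configurations report 50 visible rows (`visible=50`); after the major compaction the flag-off region stores 50 rows, while the flag-on region still stores all 500.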
>> On Oct 21, 2012, at 7:23 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>>
>>> That'd work too. It requires the regionservers to make remote updates to other regionservers, though. And you have to trap each and every change (Put, Delete, Increment, Append, RowMutations, etc.).
>>>
>>>
>>> Curious, why do you think this is better than using the keep-deleted-cells feature?
>>> (It might well be, just curious)
>>>
>>>
>>> -- Lars
>>>
>>>
>>>
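The alternative Michael proposes, and the cost Lars points out, can be sketched as a toy model in which two in-memory maps stand in for the live table and a hypothetical archive table (not HBase code). The assumption is that whatever value a mutation is about to overwrite or remove is copied to the archive first, so every mutation type has to pass through the same hook; in HBase that hook would presumably live in a RegionObserver coprocessor, and the archive write would be a remote call, possibly to another regionserver. Only Put, Delete and Increment are modeled; Append, RowMutations and the rest would need the same treatment. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of "move deleted/overwritten data into a different table"; NOT HBase code.
// live and archive stand in for two tables; archiveOld() is the trap every mutation needs.
class ArchiveOnWriteModel {
    final Map<String, Long> live = new HashMap<>(); // row -> current value
    final List<String> archive = new ArrayList<>(); // "row=oldValue" entries, oldest first

    // Copy the value about to be lost; in HBase this would be an extra (possibly remote) write.
    private void archiveOld(String row) {
        Long old = live.get(row);
        if (old != null) {
            archive.add(row + "=" + old);
        }
    }

    void put(String row, long value) {
        archiveOld(row);
        live.put(row, value);
    }

    void delete(String row) {
        archiveOld(row);
        live.remove(row);
    }

    void increment(String row, long delta) {
        archiveOld(row);                   // skipping this hook would silently lose history
        live.merge(row, delta, Long::sum);
    }

    public static void main(String[] args) {
        ArchiveOnWriteModel t = new ArchiveOnWriteModel();
        t.put("a", 1);
        t.increment("a", 4); // archives a=1
        t.delete("a");       // archives a=5
        System.out.println("live=" + t.live + " archive=" + t.archive);
    }
}
```

Running `main` prints `live={} archive=[a=1, a=5]`. The live table stays small, but every mutation path pays for an extra write, which is the trade-off being debated against KEEP_DELETED_CELLS.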
>>> ----- Original Message -----
>>> From: Michael Segel <[EMAIL PROTECTED]>
>>> To: [EMAIL PROTECTED]

lars hofhansl 2012-10-23, 18:47
Marcos Ortiz Valmaseda 2012-10-22, 02:12
lars hofhansl 2012-10-21, 23:04
yun peng 2012-10-22, 00:20
lars hofhansl 2012-10-22, 04:34
PG 2012-10-23, 22:01