HBase, mail # user - Can manually remove HFiles (similar to bulk import, but bulk remove)?


Re: Can manually remove HFiles (similar to bulk import, but bulk remove)?
Jonathan Hsieh 2012-07-10, 12:10
On Mon, Jul 9, 2012 at 1:05 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:

> Hey, this is closer!
>
> However, I think I'd want to avoid major compaction. In fact I was thinking
> about avoiding any compactions & splitting.
> ...

> So, you are saying that major compaction will look at max/min ts metainfo
> of the HFile and will remove the whole file based on ttl if necessary
> (without going through the file)? Can I tell it not to actually compact
> other HFiles (i.e. leave them as is, otherwise it would be not as easy to
> remove HFiles again in an hour)? I.e. looks like "delete only whole HFiles
> based on TTL" functionality is what I need here..
>

Off the top of my head, I don't know how "smart" the major compaction code
is with respect to TTLs.  I'm pretty sure it isn't smart enough to explicitly
ignore specific files.

> I fear that complexity with removing HFiles can be caused by (block) cache
> that may hold its information. Is that right? I'm actually OK with HBase to
> return me the data of files I "deleted" by removing HFiles: I will specify
> timerange on scans anyways (in this example to omit things older than 1
> week).
>
>
I'm not sure what the block cache eviction policy is when a single region
is closed, but it sounds like you are ok if stale data remains.

Sounds like you might want to try the close/delete/open advanced approach
on a test cluster to see if it meets your needs.
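
For the delete step, a minimal sketch of picking only the files that are
wholly past the TTL might look like the following. It is plain Python and
does not talk to HBase or HDFS. HFileInfo, select_wholly_expired, and the
file names are hypothetical, and it assumes you have already obtained each
file's max timestamp by some other means.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HFileInfo:
    """Hypothetical summary of one store file (not an HBase class)."""
    name: str
    max_timestamp_ms: int  # newest cell timestamp in the file, epoch millis


def select_wholly_expired(files: List[HFileInfo], now_ms: int, ttl_ms: int) -> List[str]:
    """Return names of files whose newest cell is already older than the TTL.

    A file qualifies only if its max timestamp is before the cutoff, so
    every cell in it is expired and the file can be dropped as a whole.
    """
    cutoff_ms = now_ms - ttl_ms
    return [f.name for f in files if f.max_timestamp_ms < cutoff_ms]


if __name__ == "__main__":
    one_hour_ms = 60 * 60 * 1000
    one_week_ms = 7 * 24 * one_hour_ms
    now_ms = 1_000_000_000_000  # arbitrary fixed "now" for the example

    files = [
        HFileInfo("hfile-a", now_ms - 8 * 24 * one_hour_ms),  # newest cell is 8 days old
        HFileInfo("hfile-b", now_ms - one_hour_ms),           # newest cell is 1 hour old
    ]
    print(select_wholly_expired(files, now_ms, one_week_ms))
```

Files that straddle the cutoff are left alone, which matches the "delete
only whole HFiles" idea: nothing is rewritten, so the same check can be
repeated an hour later.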

Jon.

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]