Re: Can manually remove HFiles (similar to bulk import, but bulk remove)?
Jonathan Hsieh 2012-07-10, 12:10
On Mon, Jul 9, 2012 at 1:05 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
> Hey, this is closer!
> However, I think I'd want to avoid major compaction. In fact I was thinking
> about avoiding any compactions & splitting.
> So, you are saying that major compaction will look at the max/min ts metainfo
> of the HFile and will remove the whole file based on TTL if necessary
> (without going through the file)? Can I tell it not to actually compact
> other HFiles (i.e. leave them as is; otherwise it would not be as easy to
> remove HFiles again in an hour)? I.e. it looks like "delete only whole HFiles
> based on TTL" functionality is what I need here.
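The "delete only whole HFiles based on TTL" idea can be sketched as a selection over per-file time-range metadata: a file whose newest cell is already older than the TTL cutoff is entirely expired, so it could be dropped without reading it. This is a minimal, self-contained illustration, not HBase's compaction code; `FileTimeRange` and `fullyExpired` are hypothetical names, and the sketch assumes each file's min/max timestamps are known up front.

```java
import java.util.ArrayList;
import java.util.List;

public class WholeFileTtlSketch {

    /** Hypothetical stand-in for an HFile's min/max timestamp metadata. */
    record FileTimeRange(String name, long minTs, long maxTs) {}

    /**
     * Returns the names of files whose newest cell is strictly older than
     * (now - ttl). Every cell in such a file is expired, so the whole file
     * could be removed without scanning its contents.
     */
    static List<String> fullyExpired(List<FileTimeRange> files, long nowMs, long ttlMs) {
        long cutoff = nowMs - ttlMs;
        List<String> expired = new ArrayList<>();
        for (FileTimeRange f : files) {
            if (f.maxTs() < cutoff) {
                expired.add(f.name());
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        long hour = 60L * 60 * 1000;
        long now = 10 * hour;
        List<FileTimeRange> files = List.of(
            new FileTimeRange("hfile-a", 0, 2 * hour),
            new FileTimeRange("hfile-b", 7 * hour, 9 * hour),
            new FileTimeRange("hfile-c", 8 * hour, 10 * hour));
        // TTL of 3 hours gives a cutoff of 7h; only hfile-a lies entirely before it.
        System.out.println(fullyExpired(files, now, 3 * hour));
    }
}
```

Files that straddle the cutoff (like `hfile-b` if the TTL were shorter) are left alone here; a real compaction would have to rewrite them to drop only their expired cells.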
Off the top of my head, I don't know how "smart" the major compaction code
is with respect to TTLs. I'm pretty sure it isn't smart enough to explicitly ignore
> I fear that the complexity of removing HFiles comes from the (block) cache,
> which may still hold their data. Is that right? I'm actually OK with HBase
> returning data from files I "deleted" by removing HFiles: I will specify a
> time range on scans anyway (in this example to omit things older than 1
I'm not sure what the block cache eviction policy is when a single region
is closed, but it sounds like you are ok if stale data remains.
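Why stale data can be harmless here: if every scan restricts its time range, cells older than the window are never returned, even if a removed file's blocks are still cached. The sketch below simulates that read-side filter in plain Java. `Cell` and `inRange` are hypothetical names; the half-open interval `minTs <= ts < maxTs` matches the semantics of HBase's `Scan.setTimeRange(minStamp, maxStamp)`, but this is an illustration of the filtering rule, not HBase client code.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TimeRangeScanSketch {

    /** Hypothetical cell: a value and its timestamp in milliseconds. */
    record Cell(String value, long ts) {}

    /** Keeps only cells with minTs <= ts < maxTs (a half-open interval). */
    static List<Cell> inRange(List<Cell> cells, long minTs, long maxTs) {
        return cells.stream()
            .filter(c -> c.ts() >= minTs && c.ts() < maxTs)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long hour = 60L * 60 * 1000;
        long now = 10 * hour;
        List<Cell> cells = List.of(
            // Imagine this one survives only in the block cache after its file was removed.
            new Cell("stale-from-removed-file", 2 * hour),
            new Cell("recent", 9 * hour));
        // Ask only for the last 3 hours; the stale cell falls outside the window.
        System.out.println(inRange(cells, now - 3 * hour, Long.MAX_VALUE));
    }
}
```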
Sounds like you might want to try the close/delete/open advanced approach
on a test cluster to see if it meets your needs.
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]