NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> Can manually remove HFiles (similar to bulk import, but bulk remove)?


Re: Can manually remove HFiles (similar to bulk import, but bulk remove)?
On Mon, Jul 9, 2012 at 1:05 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:

> Hey, this is closer!
>
> However, I think I'd want to avoid major compaction. In fact I was thinking
> about avoiding any compactions & splitting.
> ...

> So, you are saying that major compaction will look at the max/min ts metainfo
> of the HFile and will remove the whole file based on TTL if necessary
> (without going through the file)? Can I tell it not to actually compact
> other HFiles (i.e. leave them as is, otherwise it would not be as easy to
> remove HFiles again in an hour)? I.e. it looks like "delete only whole HFiles
> based on TTL" functionality is what I need here.
>
Off the top of my head, I don't know how "smart" the major compaction code
is wrt TTLs.  I'm pretty sure it isn't smart enough to explicitly ignore
specific files.
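The "delete only whole HFiles based on TTL" idea Alex describes can be sketched as a selection over per-file timestamp ranges: a file can be dropped as a unit only when even its newest cell is older than `now - ttl`. This is a minimal, self-contained model, not HBase's compaction code; the `FileInfo` class and `fullyExpired` method are hypothetical stand-ins for the min/max timestamp metadata mentioned in the thread, and the expiry rule (`maxTs < now - ttl`) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TtlFileSelector {
    // Hypothetical stand-in for per-file metadata (not an HBase class):
    // the oldest and newest cell timestamps stored in one file, in ms.
    static final class FileInfo {
        final String name;
        final long minTs;
        final long maxTs;

        FileInfo(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    // Returns the names of files whose newest cell is older than now - ttl.
    // Only those files are expired in their entirety, so they could be removed
    // whole without reading or rewriting their contents. Files that are only
    // partially expired are left untouched (no compaction is modeled here).
    static List<String> fullyExpired(List<FileInfo> files, long now, long ttlMillis) {
        long cutoff = now - ttlMillis;
        List<String> result = new ArrayList<>();
        for (FileInfo f : files) {
            if (f.maxTs < cutoff) {
                result.add(f.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long now = 200 * hour;
        long oneWeek = 168 * hour;
        List<FileInfo> files = List.of(
                new FileInfo("hfile-a", 0, 10 * hour),          // entirely older than the cutoff
                new FileInfo("hfile-b", 10 * hour, 100 * hour), // newest cell still within TTL
                new FileInfo("hfile-c", 150 * hour, 199 * hour));
        System.out.println(fullyExpired(files, now, oneWeek)); // prints [hfile-a]
    }
}
```

The design point is that the decision reads only two numbers per file, which is what makes "remove the whole file without going through it" cheap; whether HBase's own major compaction takes this shortcut is exactly what the reply above leaves open.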
> I fear that the complication with removing HFiles may come from the (block)
> cache, which may still hold their data. Is that right? I'm actually OK with
> HBase returning data from files I "deleted" by removing HFiles: I will specify
> a timerange on scans anyway (in this example, to omit things older than 1
> week).
>
>
I'm not sure what the block cache eviction policy is when a single region
is closed, but it sounds like you are ok if stale data remains.
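Alex's fallback (specifying a timerange on every scan) means stale cells that linger in the block cache after their file is removed are simply filtered out. The sketch below models that with a half-open window `[now - 7 days, now)`, mirroring the `[minStamp, maxStamp)` semantics documented for HBase's `Scan.setTimeRange`. It is a plain-Java simulation, not an HBase client call; the `Cell` class and `scan` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TimeRangeScan {
    // Hypothetical cell model (not an HBase class): a row key and its write timestamp in ms.
    static final class Cell {
        final String row;
        final long ts;

        Cell(String row, long ts) {
            this.row = row;
            this.ts = ts;
        }
    }

    // Keeps only cells with minTs <= ts < maxTs (a half-open range), so any
    // stale cell older than the window is dropped regardless of where it was read from.
    static List<String> scan(List<Cell> cells, long minTs, long maxTs) {
        List<String> rows = new ArrayList<>();
        for (Cell c : cells) {
            if (c.ts >= minTs && c.ts < maxTs) {
                rows.add(c.row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        long now = 30 * day;
        List<Cell> cells = List.of(
                new Cell("stale-row", 2 * day),  // e.g. still served from cache after its file was removed
                new Cell("fresh-row", 28 * day));
        // One-week window ending at "now".
        System.out.println(scan(cells, now - 7 * day, now)); // prints [fresh-row]
    }
}
```

Because the filter is applied at read time, correctness no longer depends on when (or whether) the cache evicts blocks from the removed files, which is why stale cached data is acceptable in this setup.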

Sounds like you might want to try the close/delete/open advanced approach
on a test cluster to see if it meets your needs.

Jon.

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]