HBase >> mail # dev >> Usage of block encoding in bulk loading

Anoop Sam John 2012-05-11, 17:18
Stack 2012-05-12, 04:38
RE: Usage of block encoding in bulk loading
Thanks Stack for your reply. I will work on this and give a patch soon...

Sent: Saturday, May 12, 2012 10:08 AM
Subject: Re: Usage of block encoding in bulk loading

On Fri, May 11, 2012 at 10:18 AM, Anoop Sam John <[EMAIL PROTECTED]> wrote:
> Hi Devs
>              When the data is bulk loaded using HFileOutputFormat, we are not using the block encoding and the HBase handled checksum features I think..  When the writer is created for making the HFile, I am not seeing any such info passing to the WriterBuilder.
> In HFileOutputFormat.getNewWriter(byte[] family, Configuration conf), we dont have these info and do not pass also to the writer... So those HFiles will not have these optimizations..
> Later in LoadIncrementalHFiles.copyHFileHalf(), where we physically divide one HFile(created by the MR) iff it can not belong to just one region, I can see we pass the datablock encoding details and checksum details to the new HFile writer. But this step wont happen normally I think..
> Correct me if my understanding is wrong pls...
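The gap Anoop describes is that getNewWriter() only receives the family bytes and the job Configuration, so any per-family data block encoding has to travel through the Configuration itself. A minimal sketch of that pattern (hypothetical class and method names, not the actual HBase API) would serialize each family's encoding into a single config string that getNewWriter() could later parse and hand to the writer builder:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: shows the "family=encoding" config round-trip that would
// let HFileOutputFormat.getNewWriter(family, conf) recover the data block
// encoding for each column family. Names here are illustrative.
public class EncodingConfigSketch {

    // Parse a serialized map, e.g. "cf1=PREFIX,cf2=FAST_DIFF".
    static Map<String, String> parseFamilyEncodings(String conf) {
        Map<String, String> encodings = new HashMap<>();
        if (conf == null || conf.isEmpty()) {
            return encodings;
        }
        for (String pair : conf.split(",")) {
            String[] kv = pair.split("=", 2);
            // Default to NONE when no encoding was specified for a family.
            encodings.put(kv[0], kv.length == 2 ? kv[1] : "NONE");
        }
        return encodings;
    }

    // Inside getNewWriter(family, conf), the encoding for the family
    // would be looked up and passed to the HFile writer builder.
    static String encodingFor(Map<String, String> encodings, String family) {
        return encodings.getOrDefault(family, "NONE");
    }

    public static void main(String[] args) {
        Map<String, String> enc =
            parseFamilyEncodings("cf1=PREFIX,cf2=FAST_DIFF");
        System.out.println(encodingFor(enc, "cf1"));  // PREFIX
        System.out.println(encodingFor(enc, "cf3"));  // NONE (unconfigured)
    }
}
```

The job-setup side (e.g. configureIncrementalLoad) would build the serialized string from the table's family descriptors, mirroring how LoadIncrementalHFiles.copyHFileHalf() already obtains the encoding when splitting a file across regions.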

Sounds plausible, Anoop.  Sounds like something worth fixing too?

Good on you,