RE: Usage of block encoding in bulk loading
Anoop Sam John 2012-05-13, 18:50
Thanks Stack for your reply. I will work on this and give a patch soon...
From: [EMAIL PROTECTED] [[EMAIL PROTECTED]] on behalf of Stack [[EMAIL PROTECTED]]
Sent: Saturday, May 12, 2012 10:08 AM
To: [EMAIL PROTECTED]
Subject: Re: Usage of block encoding in bulk loading
On Fri, May 11, 2012 at 10:18 AM, Anoop Sam John <[EMAIL PROTECTED]> wrote:
> Hi Devs
> When data is bulk loaded using HFileOutputFormat, I think we are not using the block encoding and HBase-handled checksum features. When the writer is created for making the HFile, I do not see any such info being passed to the WriterBuilder.
> In HFileOutputFormat.getNewWriter(byte[] family, Configuration conf), we do not have this info and so do not pass it to the writer. Those HFiles will therefore not have these optimizations.
> Later, in LoadIncrementalHFiles.copyHFileHalf(), where we physically split one HFile (created by the MR job) if and only if it cannot belong to just one region, I can see that we pass the data block encoding and checksum details to the new HFile writer. But I think this step will not normally happen.
> Please correct me if my understanding is wrong.
Sounds plausible Anoop. Sounds like something worth fixing too?
Good on you,
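
To make the gap described in the thread concrete, here is a minimal, self-contained sketch. It is not HBase code: the WriterBuilder class, the Encoding enum, and both helper methods are hypothetical stand-ins, assumed only for illustration. It shows that a writer built without the per-family settings silently falls back to defaults, while one that looks the settings up by family carries them through.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these types are real HBase classes.
class BulkLoadEncodingSketch {

    // Hypothetical stand-in for a data block encoding setting.
    enum Encoding { NONE, PREFIX, DIFF, FAST_DIFF }

    // Hypothetical builder: anything not set explicitly keeps its default.
    static final class WriterBuilder {
        private Encoding encoding = Encoding.NONE;
        private boolean hbaseChecksum = false;

        WriterBuilder withEncoding(Encoding e) { this.encoding = e; return this; }
        WriterBuilder withHBaseChecksum(boolean on) { this.hbaseChecksum = on; return this; }

        // Returns a description of the writer instead of a real writer.
        String build() {
            return "encoding=" + encoding + ", hbaseChecksum=" + hbaseChecksum;
        }
    }

    // Mirrors the reported behaviour: no family settings reach the builder.
    static String newWriterWithoutFamilySettings() {
        return new WriterBuilder().build();
    }

    // Mirrors the proposed behaviour: look up the family's settings and pass them on.
    static String newWriterWithFamilySettings(String family,
                                              Map<String, Encoding> encodingByFamily,
                                              boolean hbaseChecksum) {
        Encoding e = encodingByFamily.getOrDefault(family, Encoding.NONE);
        return new WriterBuilder().withEncoding(e).withHBaseChecksum(hbaseChecksum).build();
    }

    public static void main(String[] args) {
        Map<String, Encoding> encodingByFamily = new HashMap<>();
        encodingByFamily.put("cf1", Encoding.FAST_DIFF);

        System.out.println(newWriterWithoutFamilySettings());
        System.out.println(newWriterWithFamilySettings("cf1", encodingByFamily, true));
    }
}
```

In the real code base the per-family settings would presumably have to be carried to the MR task through the job Configuration; how that is done is left open here, since the thread does not specify it.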