HBase >> mail # dev >> Usage of block encoding in bulk loading
RE: Usage of block encoding in bulk loading
Thanks Stack for your reply. I will work on this and give a patch soon...

-Anoop-
________________________________________
From: [EMAIL PROTECTED] [[EMAIL PROTECTED]] on behalf of Stack [[EMAIL PROTECTED]]
Sent: Saturday, May 12, 2012 10:08 AM
To: [EMAIL PROTECTED]
Subject: Re: Usage of block encoding in bulk loading

On Fri, May 11, 2012 at 10:18 AM, Anoop Sam John <[EMAIL PROTECTED]> wrote:
> Hi Devs
> When data is bulk loaded using HFileOutputFormat, I think we are not using the block encoding or the HBase-handled checksum features. When the writer is created for the HFile, I don't see any such info being passed to the WriterBuilder.
> In HFileOutputFormat.getNewWriter(byte[] family, Configuration conf), we don't have this info and don't pass it to the writer either, so those HFiles will not have these optimizations.
>
> Later, in LoadIncrementalHFiles.copyHFileHalf(), where we physically split an HFile (created by the MR job) if it cannot belong to just one region, I can see that we pass the data block encoding and checksum details to the new HFile writer. But that step won't normally happen, I think.
>
> Please correct me if my understanding is wrong.
>

Sounds plausible Anoop.  Sounds like something worth fixing too?

Good on you,
St.Ack
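
[Editor's note] The gap Anoop describes is a configuration pass-through problem: the bulk-load writer factory builds HFile writers without consulting the job's data block encoding and checksum settings, while the HFile-splitting path in LoadIncrementalHFiles.copyHFileHalf() does consult them. The sketch below is a simplified, self-contained model of that pattern, not actual HBase source; the class and method names (FakeWriter, buggyBuildWriter, fixedBuildWriter) and the checksum key are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the gap discussed in the thread: a
// per-family writer factory that, like HFileOutputFormat.getNewWriter() at
// the time, forgets to thread the data-block-encoding and checksum settings
// from the job configuration into the writer it builds.
public class BulkLoadEncodingSketch {

    // Illustrative configuration keys, standing in for the real job settings.
    static final String ENCODING_KEY = "hfileoutputformat.datablock.encoding";
    static final String CHECKSUM_KEY = "hfile.hbase.checksum";

    // Minimal stand-in for an HFile writer: records only the two settings
    // the thread is concerned with.
    static class FakeWriter {
        final String encoding;        // e.g. "PREFIX", "NONE"
        final boolean hbaseChecksum;  // whether HBase-level checksums are on
        FakeWriter(String encoding, boolean hbaseChecksum) {
            this.encoding = encoding;
            this.hbaseChecksum = hbaseChecksum;
        }
    }

    // Mirrors the buggy path: the configured settings are never read,
    // so every bulk-loaded file falls back to the defaults.
    static FakeWriter buggyBuildWriter(Map<String, String> conf) {
        return new FakeWriter("NONE", false);
    }

    // Mirrors the proposed fix: read the job-level settings and pass them
    // through to the writer, the way copyHFileHalf() already does.
    static FakeWriter fixedBuildWriter(Map<String, String> conf) {
        String enc = conf.getOrDefault(ENCODING_KEY, "NONE");
        boolean checksum =
            Boolean.parseBoolean(conf.getOrDefault(CHECKSUM_KEY, "false"));
        return new FakeWriter(enc, checksum);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ENCODING_KEY, "PREFIX");
        conf.put(CHECKSUM_KEY, "true");

        // The buggy factory silently drops both settings; the fixed one
        // honors them.
        System.out.println("buggy: " + buggyBuildWriter(conf).encoding);
        System.out.println("fixed: " + fixedBuildWriter(conf).encoding);
    }
}
```

The point of the sketch is only that the settings must be read at writer-creation time; where the real fix reads them from (the table descriptor or the MR job Configuration) is a design choice for the patch.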