When you need to bulk load a huge amount of data into HBase at one time, it is better to write HFiles directly.
First, a MapReduce job writes the HFiles directly into HDFS. HBase provides utility classes for this, as well as the ImportTsv tool.
Then the LoadIncrementalHFiles tool loads these files into the regions managed by the region servers (RS).
Once these two steps are done, clients can read the data normally.
Loading this much data the normal way, through HTable#put(), would take a lot of time.
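
As a rough sketch of the two steps, the commands could look like the following. The table name, column mapping and HDFS paths are made-up placeholders, and the class names are the ones I remember from the 0.9x releases (newer versions may have moved them), so treat this as an illustration rather than the exact invocation for your cluster. It only prints the commands unless DRY_RUN=0 is set.

```shell
# Hypothetical values: replace with your own table, columns and paths.
TABLE="mytable"
COLUMNS="HBASE_ROW_KEY,cf:value"
INPUT_DIR="/user/me/input-tsv"
HFILE_DIR="/user/me/hfiles-out"

# Print each command; execute it only when DRY_RUN=0 is set.
run() {
  echo "+ $*"
  [ "${DRY_RUN:-1}" = "1" ] || "$@"
}

# Step 1: MapReduce job that writes HFiles into HDFS instead of doing puts.
run hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  "-Dimporttsv.columns=$COLUMNS" \
  "-Dimporttsv.bulk.output=$HFILE_DIR" \
  "$TABLE" "$INPUT_DIR"

# Step 2: hand the generated HFiles over to the regions of the table.
run hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  "$HFILE_DIR" "$TABLE"
```

After step 2 finishes, the rows should be visible to a normal scan or get from the hbase shell.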
From: Jerry Lam [[EMAIL PROTECTED]]
Sent: Wednesday, June 27, 2012 10:52 PM
To: [EMAIL PROTECTED]
Subject: Re: direct Hfile Read and Writes
I have used LoadIncrementalHFiles successfully in the past. Basically, once
you have written the HFile yourself, you can use LoadIncrementalHFiles to
merge it with the HFiles currently managed by HBase. Once it is loaded into
HBase, the records in the incremental HFile are accessible to clients.
On Wed, Jun 27, 2012 at 10:33 AM, shixing <[EMAIL PROTECTED]> wrote:
> 1. Since the data we might need would be distributed across regions, how
> would direct reading of HFiles be helpful?
> You can read the HFilePrettyPrinter source; it shows how to create an
> HFile.Reader and use it to read the HFile.
> Or you can run ./hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f
> hdfs://xxxx/xxx/hfile to print some info and have a look.
> 2. Any use case for direct writes of HFiles? If we write HFiles, will
> that data be accessible to the hbase shell?
> You can read the HFileOutputFormat source; it shows how to create an
> HFile.Writer and use it to write KVs directly to the HFile.
> If you want to read the data from the hbase shell, you should first load the
> HFile into the regionservers; for details on bulk load, see
> http://hbase.apache.org/book.html#arch.bulk.load .
> On Wed, Jun 27, 2012 at 6:49 PM, samar kumar <[EMAIL PROTECTED]> wrote:
> > Hi Hbase Users,
> > I have seen APIs supporting direct HFile reads and writes. I understand
> > it would create HFiles in the location specified, and it should be much
> > faster since we would skip all the lookups to ZK, the catalog table and
> > the RS, but can anyone point me to a particular case when we would like
> > to read/write directly?
> > 1. Since the data we might need would be distributed across regions, how
> > would direct reading of HFiles be helpful?
> > 2. Any use case for direct writes of HFiles? If we write HFiles, will
> > that data be accessible to the hbase shell?
> > Regards,
> > Samar
> Best wishes!
> My Friend~