Sever Fundatureanu 2012-07-26, 16:39
Sateesh Lakkarsu 2012-07-26, 16:47
Sever Fundatureanu 2012-07-26, 22:46
Re: Bulk loading disadvantages
Anil Gupta 2012-07-27, 03:40
Hi Sever,

That's a very interesting thing. Which Hadoop and HBase versions are you using? I am going to run bulk loads tomorrow. If you can tell me which directories in HDFS you compared with /hbase/$table, then I will try to check the same.

Best Regards,
Anil

On Jul 26, 2012, at 3:46 PM, Sever Fundatureanu <[EMAIL PROTECTED]> wrote:

> On Thu, Jul 26, 2012 at 6:47 PM, Sateesh Lakkarsu <[EMAIL PROTECTED]> wrote:
>>>
>>> For the bulkloading process, the HBase documentation mentions that in
>>> a 2nd stage "the appropriate Region Server adopts the HFile, moving it
>>> into its storage directory and making the data available to clients."
>>> But from my experience the files also remain in the original location
>>> from where they are "adopted". So I guess the data is actually copied
>>> into the HBase directory right? This means that, compared to the
>>> online importing, when bulk loading you essentially need twice the
>>> disk space on HDFS, right?
>>>
>>
>> Yes, if you are generating HFiles on one cluster and loading into a
>> separate hbase cluster. If they are co-located, its just a hdfs mv.
>
> Hmm, both the HFile generation and the HBase cluster runs on top of
> the same HDFS cluster. I did a "du" on both the source HDFS directory
> and the destination "/hbase" directory and I got the same sizes (+-
> few bytes). I deleted the source directory from HDFS and then scanned
> the table without any problems. Maybe there is a config parameter I'm
> missing?
>
> Sever
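
The "du" comparison described above can be turned into a small check. This is only a sketch of the reasoning in this thread, not part of HBase or Hadoop: the function name `classify_bulk_load`, its parameters, and the default tolerance are hypothetical, and the byte counts are assumed to come from a "du" on the source directory and on the table directory under /hbase, taken before and after the load.

```python
def classify_bulk_load(src_before, src_after, dst_before, dst_after, tolerance=4096):
    """Guess whether a bulk load moved or copied the HFiles.

    All sizes are in bytes, as reported by a "du" on HDFS before and
    after the load. `tolerance` (an assumed default) absorbs the
    "few bytes" of difference mentioned in the thread.
    """
    grown = dst_after - dst_before
    if abs(grown - src_before) > tolerance:
        # The destination did not grow by roughly the source size,
        # so these numbers do not describe a completed load.
        return "inconclusive"
    if src_after <= tolerance:
        # Source is (nearly) empty: the files were moved, no extra space used.
        return "moved"
    if abs(src_after - src_before) <= tolerance:
        # Source is unchanged: the data now exists twice on HDFS.
        return "copied"
    return "inconclusive"


# Made-up sizes for illustration only: source unchanged, destination grew.
print(classify_bulk_load(10_000_000, 10_000_003, 0, 10_000_001))
```

With sizes like the ones Sever reports (source and destination equal to within a few bytes after the load), this check reports "copied", which is the case where the data temporarily needs twice the disk space until the source directory is deleted.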
Bijeet Singh 2012-07-27, 06:17
Sever Fundatureanu 2012-07-27, 11:17
Sever Fundatureanu 2012-07-27, 13:46
Alex Baranau 2012-07-27, 14:01