HBase >> mail # user >> Bulk loading disadvantages


Sever Fundatureanu 2012-07-26, 16:39
Sateesh Lakkarsu 2012-07-26, 16:47
Sever Fundatureanu 2012-07-26, 22:46
Re: Bulk loading disadvantages
Hi Sever,

That's very interesting. Which Hadoop and HBase versions are you using? I am going to run bulk loads tomorrow. If you can tell me which directories in HDFS you compared with /hbase/$table, I will try to check the same.

Best Regards,
Anil

On Jul 26, 2012, at 3:46 PM, Sever Fundatureanu <[EMAIL PROTECTED]> wrote:

> On Thu, Jul 26, 2012 at 6:47 PM, Sateesh Lakkarsu <[EMAIL PROTECTED]> wrote:
>>>
>>>
>>> For the bulkloading process, the HBase documentation mentions that in
>>> a 2nd stage "the appropriate Region Server adopts the HFile, moving it
>>> into its storage directory and making the data available to clients."
>>> But from my experience the files also remain in the original location
>>> from where they are "adopted". So I guess the data is actually copied
>>> into the HBase directory right? This means that, compared to the
>>> online importing, when bulk loading you essentially need twice the
>>> disk space on HDFS, right?
>>>
>>
>> Yes, if you are generating HFiles on one cluster and loading into a
>> separate HBase cluster. If they are co-located, it's just an HDFS mv.
>
> Hmm, both the HFile generation and the HBase cluster run on top of
> the same HDFS cluster. I did a "du" on both the source HDFS directory
> and the destination "/hbase" directory and I got the same sizes (give
> or take a few bytes). I deleted the source directory from HDFS and then scanned
> the table without any problems. Maybe there is a config parameter I'm
> missing?
>
> Sever
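
The "du" check described above can be illustrated with a small sketch. It uses a local temporary directory as a stand-in for HDFS and does not call HBase or Hadoop at all; the names dir_size, adopt, demo, "staging" and "hfile-0" are hypothetical and chosen only for illustration. The point is what the sizes look like in each case: after a move the source directory is empty, while after a copy both directories report the same size, which is the situation Sever describes.

```python
import os
import shutil
import tempfile


def dir_size(path):
    """Total size in bytes of all regular files under path (a recursive du)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def adopt(src_dir, dst_dir, copy=False):
    """Bring every file in src_dir into dst_dir, either by copy or by rename."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if copy:
            shutil.copy2(src, dst)   # data is duplicated; source stays in place
        else:
            os.rename(src, dst)      # same filesystem: only the path changes


def demo(copy):
    """Return (source size, destination size) after adopting one 4096-byte file."""
    with tempfile.TemporaryDirectory() as base:
        staging = os.path.join(base, "staging")        # hypothetical source dir
        table = os.path.join(base, "hbase", "table")   # hypothetical table dir
        os.makedirs(staging)
        with open(os.path.join(staging, "hfile-0"), "wb") as f:
            f.write(b"x" * 4096)
        adopt(staging, table, copy=copy)
        return dir_size(staging), dir_size(table)


if __name__ == "__main__":
    print("move:", demo(copy=False))
    print("copy:", demo(copy=True))
```

If the source directory still reports its full size after the load, the files were duplicated rather than moved, so the peak disk usage is roughly twice the data size until the source is deleted.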
Bijeet Singh 2012-07-27, 06:17
Sever Fundatureanu 2012-07-27, 11:17
Sever Fundatureanu 2012-07-27, 13:46
Alex Baranau 2012-07-27, 14:01