HBase >> mail # user >> Unexpected Data insertion time and Data size explosion


kranthi reddy 2011-12-04, 14:19
Re: Unexpected Data insertion time and Data size explosion
May I ask whether you pre-split your table before loading?
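
For readers unfamiliar with pre-splitting: a new HBase table starts with a single region, so an unsplit table sends all early writes to one region server. A minimal sketch of computing evenly spaced split keys follows. It assumes row IDs are written as zero-padded decimal strings of a fixed width (the original post does not say how the String rowIDs are formatted), and the region count of 12 and the width of 10 are purely illustrative. The function name `split_keys` is hypothetical.

```python
def split_keys(max_row_id: int, num_regions: int, width: int) -> list[bytes]:
    """Return num_regions - 1 evenly spaced split keys.

    Keys are zero-padded so that byte-wise (lexicographic) order matches
    numeric order, which is the order HBase uses to sort row keys.
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    return [
        str(max_row_id * i // num_regions).zfill(width).encode("ascii")
        for i in range(1, num_regions)
    ]


# Assumed values: 1.3 billion rows (from the post), 12 regions, 10-digit keys.
keys = split_keys(1_300_000_000, 12, 10)
for key in keys:
    print(key.decode("ascii"))
```

Keys like these would be supplied as the split points when the table is created. If the row IDs are not zero-padded, they sort as strings ("10" before "9"), and the split points would have to be chosen from the actual key distribution instead.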

On Dec 4, 2011, at 6:19 AM, kranthi reddy <[EMAIL PROTECTED]> wrote:

> Hi all,
>
>    I am a newbie to HBase and Hadoop. I have set up a cluster of 4 machines
> and am trying to insert data. 3 of the machines are tasktrackers, with 4
> map tasks each.
>
>    My data consists of about 1.3 billion rows with 4 columns each (100GB
> txt file). The column structure is "rowID, word1, word2, word3".  My DFS
> replication in Hadoop and HBase is set to 3 each. I have only one column
> family, with 3 qualifiers, one for each field (word*).
>
>    I am using the SampleUploader present in the HBase distribution. To
> complete 40% of the insertion, it has taken around 21 hrs and it's still
> running. I have 12 map tasks running. I would like to know whether this
> insertion time is along expected lines. When I used Lucene, I was able
> to insert the entire data set in about 8 hours.
>
>    Also, there seems to be a huge explosion of data size here. With a
> replication factor of 3 for HBase, I was expecting the inserted table
> to be around 350-400GB (for the 100GB txt file I have: 300GB for
> replicating the data 3 times and 50+ GB for additional storage
> information). But even at 40% completion of data insertion, the space
> occupied is around 550GB (it looks like it might take around 1.2TB for
> a 100GB file). I have used a String for the rowID instead of a Long. Will
> that account for such a rapid increase in data storage?
>
> Regards,
> Kranthi
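
On the size question, part of the growth can be estimated from how HBase stores cells. Each cell is written as a KeyValue that repeats the row key, column family, and qualifier, plus a timestamp and length fields, so a row with 3 qualifiers stores the rowID three times. The sketch below is a rough back-of-the-envelope estimate, not a measurement. The byte lengths are assumptions: a 10-byte rowID, a 2-byte family name, 5-byte qualifiers such as "word1", and 21-byte values (roughly what is left of a 77-byte average input line after the rowID and separators). The function names are hypothetical.

```python
# Fixed per-cell bytes in the KeyValue layout:
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1)
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_size(row_len: int, family_len: int, qualifier_len: int,
                  value_len: int) -> int:
    """Approximate uncompressed size in bytes of one stored cell."""
    return FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


def table_bytes(rows: int, cells_per_row: int, cell_bytes: int,
                replication: int) -> int:
    """Approximate bytes on disk for the whole table, including DFS replicas."""
    return rows * cells_per_row * cell_bytes * replication


# All lengths below are assumptions, not figures from the original post.
cell = keyvalue_size(row_len=10, family_len=2, qualifier_len=5, value_len=21)
total = table_bytes(rows=1_300_000_000, cells_per_row=3,
                    cell_bytes=cell, replication=3)
print(f"{cell} bytes per cell, about {total / 1e9:.0f} GB on disk")
```

Under these assumptions the stored cells alone come to well over the 350-400GB expected in the post, before counting write-ahead logs, store files awaiting compaction, or block indexes. Whether the rowID is a String or a Long changes only the `row_len` term, so shorter row keys, family names, and qualifiers, or enabling compression on the column family, are the usual levers.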
kranthi reddy 2011-12-05, 05:23
Ulrich Staudinger 2011-12-05, 07:56
kranthi reddy 2011-12-05, 09:10
Ulrich Staudinger 2011-12-05, 15:13
kranthi reddy 2011-12-05, 16:33
kranthi reddy 2011-12-05, 17:26
Doug Meil 2011-12-05, 17:42
kranthi reddy 2011-12-19, 05:54