MapReduce, mail # user - Estimating disk space requirements


Panshul Whisper 2013-01-18, 12:11
Re: Estimating disk space requirements
Mirko Kämpf 2013-01-18, 12:44
Hi,

some comments are inside your message ...
2013/1/18 Panshul Whisper <[EMAIL PROTECTED]>

> Hello,
>
> I am trying to estimate how much disk space I need for my cluster.
>
> I have 24 million JSON documents of approx. 5 KB each.
> The JSON is to be stored in HBase with some identifying data in columns,
> and I also want to store the JSON for later retrieval based on the ID data
> as keys in HBase.
> I have my HDFS replication set to 3.
> Each node has Hadoop, HBase, and Ubuntu installed on it, so approx. 11
> GB is available for use on my 20 GB node.
>

11 GB is quite small  - or is there a typo?

The amount of raw data is about 115 GB (without additional key and metadata):

   nr of items      size of an item   Bytes          GB
   24 x 1.00E+006   5 x 1.02E+003     122880000000   114.4409179688

Depending on the amount of overhead, this could grow to about 200 GB. With
a replication factor of 3, that is 600 GB just for distributed storage.

And then you need some capacity to store intermediate processing data (20%
to 30% of the processed data is recommended).

So you should plan for a capacity of 1 TB, or even more if your dataset grows.
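
The arithmetic behind this estimate can be sketched in a few lines of Python. This is only a rough illustration, not a sizing tool: the function name and parameters are made up for this example, the overhead factor of 1.75 is an assumption chosen so that the raw data grows to roughly the 200 GB mentioned above, and the scratch space is assumed to be 30% of the replicated data.

```python
GIB = 1024 ** 3  # bytes per GiB, the unit used for the "GB" column above


def estimate_capacity(num_docs, doc_bytes, overhead_factor=1.75,
                      replication=3, scratch_fraction=0.3):
    """Return (raw, stored, replicated, total) sizes in GiB.

    overhead_factor and scratch_fraction are assumptions for illustration;
    measure them on a sample of your own data before relying on the result.
    """
    raw = num_docs * doc_bytes / GIB            # JSON payload only
    stored = raw * overhead_factor              # plus keys and metadata
    replicated = stored * replication           # HDFS replication factor
    total = replicated * (1 + scratch_fraction) # plus intermediate data
    return raw, stored, replicated, total


# 24 million documents of about 5 KB each, as in the original question.
raw, stored, replicated, total = estimate_capacity(24_000_000, 5 * 1024)
print(f"raw: {raw:.1f} GiB, stored: {stored:.1f} GiB, "
      f"replicated: {replicated:.1f} GiB, total: {total:.1f} GiB")
```

With these assumptions the total stays below 1 TB, which leaves some headroom for growth. Changing the overhead factor or the scratch fraction shows quickly how sensitive the estimate is to either guess.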
>
>

> I have not enabled HBase replication. Is the HDFS replication enough
> to keep the data safe and redundant?
>

The replication on the HDFS level is sufficient for keeping the data safe,
no need to replicate the HBase tables separately.
> How much total disk space will I need for the storage of the data?
>
>
> Please help me estimate this.
>
> Thank you so much.
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>

Best wishes
Mirko
Jean-Marc Spaggiari 2013-01-18, 12:36
Panshul Whisper 2013-01-18, 13:20
Jean-Marc Spaggiari 2013-01-18, 13:26
Panshul Whisper 2013-01-18, 13:55
Mohammad Tariq 2013-01-18, 14:37
Mohammad Tariq 2013-01-18, 22:35
Ted Dunning 2013-01-18, 23:36