

Re: Estimating disk space requirements
Hi,

Some comments are inline in your message ...
2013/1/18 Panshul Whisper <[EMAIL PROTECTED]>

> Hello,
>
> I am estimating how much disk space I need for my cluster.
>
> I have approx. 24 million JSON documents of about 5 KB each.
> The JSON is to be stored in HBase with some identifying data in columns,
> and I also want to store the JSON for later retrieval, using the ID data
> as keys in HBase.
> My HDFS replication is set to 3.
> Each node has Hadoop, HBase, and Ubuntu installed on it, so approx. 11
> GB is available for use on my 20 GB node.
>

11 GB is quite small - or is there a typo?

The amount of raw data is about 115 GB (without additional keys and
metadata):

  nr of items       24 x 1.00E+006
  size of an item   5 x 1.02E+003 bytes
  Bytes             122880000000
  GB                114.4409179688

Depending on the amount of overhead, this could be about 200 GB. Times 3
for replication, that is 600 GB just for distributed storage.

And then you need some capacity to store intermediate processing data;
20% to 30% of the processed data is recommended.

So you should plan for a capacity of 1 TB, or even more if your dataset grows.
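
The arithmetic above can be reproduced with a short Python sketch. This is only a rough estimate under stated assumptions: the function and parameter names are made up for illustration, the overhead factor of 1.75 is simply chosen so that about 115 GB becomes roughly 200 GB, the 30% scratch space is applied to the replicated volume, and 1 GB is counted as 1024**3 bytes.

```python
import math


def estimate_capacity_gb(n_docs, doc_bytes, overhead_factor,
                         replication, scratch_fraction):
    """Return (raw, stored, replicated, total) in GB, with 1 GB = 1024**3 bytes."""
    raw_gb = n_docs * doc_bytes / 1024**3          # payload only, no keys or metadata
    stored_gb = raw_gb * overhead_factor           # keys, metadata, storage overhead
    replicated_gb = stored_gb * replication        # HDFS replication factor
    total_gb = replicated_gb * (1 + scratch_fraction)  # room for intermediate data
    return raw_gb, stored_gb, replicated_gb, total_gb


raw, stored, replicated, total = estimate_capacity_gb(
    n_docs=24_000_000,
    doc_bytes=5 * 1024,     # approx. 5 KB per JSON document
    overhead_factor=1.75,   # assumption: ~115 GB grows to roughly 200 GB
    replication=3,          # HDFS replication factor from the question
    scratch_fraction=0.3,   # assumption: upper end of the 20% to 30% range
)

# Number of nodes needed if each one really offers only 11 GB of usable disk.
usable_gb_per_node = 11
nodes_needed = math.ceil(total / usable_gb_per_node)

print(f"raw data:        {raw:8.1f} GB")
print(f"with overhead:   {stored:8.1f} GB")
print(f"replicated (x3): {replicated:8.1f} GB")
print(f"with scratch:    {total:8.1f} GB")
print(f"nodes at {usable_gb_per_node} GB each: {nodes_needed}")
```

The result stays below 1 TB, so the 1 TB figure leaves some headroom for growth. The node count shows why 11 GB per node looks too small for this dataset.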
>
>

> I have no idea: if I have not enabled HBase replication, is the HDFS
> replication enough to keep the data safe and redundant?
>

The replication on the HDFS level is sufficient for keeping the data safe;
there is no need to replicate the HBase tables separately.
>  How much total disk space will I need for the storage of the data?
>
>
> Please help me estimate this.
>
> Thank you so much.
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>

Best wishes
Mirko