MapReduce >> mail # user >> Estimating disk space requirements


Panshul Whisper 2013-01-18, 12:11
Mirko Kämpf 2013-01-18, 12:44
Jean-Marc Spaggiari 2013-01-18, 12:36
Re: Estimating disk space requirements
Thank you for the replies,

So I take it that I should have at least 800 GB of total free space on
HDFS (the combined free space of all the nodes connected to the cluster). So
I can connect 20 nodes, each with 40 GB of HDD, to my cluster. Will
this be enough for the storage?
Please confirm.

Thanking You,
Regards,
Panshul.
On Fri, Jan 18, 2013 at 1:36 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> Hi Panshul,
>
> If you have 20 GB with a replication factor set to 3, you have only
> 6.6GB available, not 11GB. You need to divide the total space by the
> replication factor.
>
> Also, if you store your JSON in HBase, you need to add the key size
> to it. Whether your key is 4 bytes or 1024 bytes makes a difference.
>
> So roughly, 24 000 000 * 5 * 1024 bytes = 114 GB, and that is without
> including the key size. You don't have the space to store it. Even with
> a replication factor set to 5 you don't have the space.
>
> Now, you can add some compression, but even with a lucky factor of 50%
> you still don't have the space. You will need something like 90%
> compression factor to be able to store this data in your cluster.
>
> A 1 TB drive is now less than $100, so you might think about replacing
> your 20 GB drives with something bigger.
>
> To reply to your last question: for your data here, you will need AT
> LEAST 350 GB of overall storage. But that's a bare minimum. Don't go under
> 500 GB.
>
> IMHO
>
> JM
>
> 2013/1/18, Panshul Whisper <[EMAIL PROTECTED]>:
> > Hello,
> >
> > I am trying to estimate how much disk space I need for my cluster.
> >
> > I have 24 million JSON documents, approx. 5 KB each.
> > The JSON is to be stored in HBase with some identifying data in columns,
> > and I also want to store the JSON for later retrieval based on the ID data
> > as keys in HBase.
> > I have my HDFS replication set to 3.
> > Each node has Hadoop, HBase and Ubuntu installed on it, so approx. 11 GB
> > is available for use on my 20 GB node.
> >
> > I have no idea whether, if I have not enabled HBase replication, the HDFS
> > replication is enough to keep the data safe and redundant.
> > How much total disk space will I need to store the data?
> >
> > Please help me estimate this.
> >
> > Thank you so much.
> >
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101
> >
>

--
Regards,
Ouch Whisper
010101010101
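
The arithmetic in this message and in the quoted reply can be sketched in a few lines of Python. This is only a rough illustration, not anything posted in the thread: the function names are made up, sizes are treated as GiB (1024**3 bytes), and HBase key/column overhead, the OS footprint and temporary space are all ignored, so the real requirement will be higher.

```python
# Rough HDFS sizing sketch for the numbers discussed in this thread.
# Assumptions (not from the thread): the function names, the GiB convention,
# and ignoring HBase key overhead, OS footprint and temporary space.

GIB = 1024 ** 3


def raw_data_gib(doc_count, doc_size_kib):
    """Logical size of the documents before replication, in GiB."""
    return doc_count * doc_size_kib * 1024 / GIB


def required_raw_disk_gib(data_gib, replication=3, compression_ratio=1.0):
    """Disk consumed across the cluster after replication.

    compression_ratio is compressed size / original size (1.0 = no compression).
    """
    return data_gib * compression_ratio * replication


def usable_capacity_gib(node_count, free_gib_per_node, replication=3):
    """Logical data the cluster can hold: total free space / replication."""
    return node_count * free_gib_per_node / replication


data = raw_data_gib(24_000_000, 5)        # 24 million documents of about 5 KB
needed = required_raw_disk_gib(data)      # replication factor 3, uncompressed
proposed = usable_capacity_gib(20, 40)    # 20 nodes, 40 GB each, all of it free

print(f"logical data:         {data:.1f} GiB")
print(f"disk needed at 3x:    {needed:.1f} GiB")
print(f"usable on 20 x 40 GB: {proposed:.1f} GiB")
print("fits" if data <= proposed else "does not fit")
```

Note that usable_capacity_gib(20, 40) treats the whole 40 GB disk as free. The original post reports only about 11 GB free on a 20 GB node once Ubuntu, Hadoop and HBase are installed, so the free space per node should be reduced accordingly before relying on the result.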
Jean-Marc Spaggiari 2013-01-18, 13:26
Panshul Whisper 2013-01-18, 13:55
Mohammad Tariq 2013-01-18, 14:37
Mohammad Tariq 2013-01-18, 22:35
Ted Dunning 2013-01-18, 23:36