Panshul Whisper 2013-01-18, 12:11
Mirko Kämpf 2013-01-18, 12:44
Jean-Marc Spaggiari 2013-01-18, 12:36
Re: Estimating disk space requirements
Panshul Whisper 2013-01-18, 13:20
Thank you for the replies,
So I take it that I should have at least 800 GB of total free space on
HDFS (the combined free space of all the nodes connected to the cluster). So
I can connect 20 nodes with 40 GB of disk each to my cluster. Will
this be enough for the storage?
On Fri, Jan 18, 2013 at 1:36 PM, Jean-Marc Spaggiari <
[EMAIL PROTECTED]> wrote:
> Hi Panshul,
> If you have 20 GB with a replication factor set to 3, you have only
> 6.6GB available, not 11GB. You need to divide the total space by the
> replication factor.
> Also, if you store your JSON in HBase, you need to add the key size
> to it. Whether your key is 4 bytes or 1024 bytes makes a difference.
> So roughly, 24 000 000 * 5 * 1024 bytes = 114GB. You don't have the space to
> store it, and that is without including the key size. Even with a replication
> factor set to 5 you don't have the space.
> Now, you can add some compression, but even with a lucky factor of 50%
> you still don't have the space. You would need something like a 90%
> compression factor to be able to store this data in your cluster.
> A 1 TB drive is now less than $100, so you might think about replacing
> your 20 GB drives with something bigger.
> To reply to your last question: for your data here, you will need AT
> LEAST 350GB overall storage. But that's a bare minimum. Don't go under
> 2013/1/18, Panshul Whisper <[EMAIL PROTECTED]>:
> > Hello,
> > I was estimating how much disk space I need for my cluster.
> > I have 24 million JSON documents, approx. 5 KB each.
> > The JSON is to be stored in HBase with some identifying data in
> > and I also want to store the JSON for later retrieval based on the Id
> > as keys in HBase.
> > I have my HDFS replication set to 3.
> > Each node has Hadoop, HBase and Ubuntu installed on it, so approx 11 GB
> > is available for use on my 20 GB node.
> > I have no idea whether, without HBase replication enabled, the HDFS
> > replication is enough to keep the data safe and redundant.
> > How much total disk space will I need for storing the data?
> > Please help me estimate this.
> > Thank you so much.
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101
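
The arithmetic discussed in this thread can be sketched in a few lines of
Python. This is only a rough estimate under stated assumptions, not a sizing
tool: the function names are made up for illustration, "GB" is treated as
2^30 bytes (which is what makes 24 000 000 * 5 * 1024 come out at about 114),
and the 9 GB per-node overhead for the OS, Hadoop and HBase is carried over
from the "11 GB free on a 20 GB node" figure above. Key size, HBase storage
overhead and headroom for compactions are left out, so the real requirement
will be higher.

```python
GIB = 1024 ** 3  # assumption: "GB" in the thread means 2**30 bytes


def logical_bytes(doc_count, doc_size_bytes, key_size_bytes=0):
    """Size of the data before replication (documents plus optional row keys)."""
    return doc_count * (doc_size_bytes + key_size_bytes)


def required_raw_bytes(logical, replication=3, compression_ratio=1.0):
    """Raw HDFS space needed; compression_ratio = compressed size / original size."""
    return logical * compression_ratio * replication


def cluster_free_bytes(nodes, disk_gib_per_node, overhead_gib_per_node):
    """Combined free space across all nodes after per-node software overhead."""
    return nodes * (disk_gib_per_node - overhead_gib_per_node) * GIB


# Figures from the thread: 24 million documents of about 5 KB, replication 3.
logical = logical_bytes(24_000_000, 5 * 1024)
required = required_raw_bytes(logical, replication=3)

# Proposed cluster: 20 nodes with 40 GB each, assuming about 9 GB overhead per node.
free = cluster_free_bytes(nodes=20, disk_gib_per_node=40, overhead_gib_per_node=9)
fits = free >= required

print(f"logical data : {logical / GIB:.1f} GiB")
print(f"raw required : {required / GIB:.1f} GiB")
print(f"cluster free : {free / GIB:.1f} GiB")
print(f"fits         : {fits}")
```

Under these assumptions the 20 x 40 GB cluster holds the replicated data, but
with limited margin once keys and HBase overhead are added, which is in line
with treating 350 GB as a bare minimum.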
Jean-Marc Spaggiari 2013-01-18, 13:26
Panshul Whisper 2013-01-18, 13:55
Mohammad Tariq 2013-01-18, 14:37
Mohammad Tariq 2013-01-18, 22:35
Ted Dunning 2013-01-18, 23:36