Panshul Whisper 2013-01-18, 12:11
Mirko Kämpf 2013-01-18, 12:44
Jean-Marc Spaggiari 2013-01-18, 12:36
Panshul Whisper 2013-01-18, 13:20
Re: Estimating disk space requirements
Jean-Marc Spaggiari 2013-01-18, 13:26
20 nodes with 40 GB will do the work.
After that you will have to consider performance based on your access
pattern. But that's another story.
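
For reference, the arithmetic behind the numbers in this thread can be sketched as below. This is only a rough estimate under stated assumptions: the function names are made up for illustration, the whole 40 GB on each node is assumed to be free for HDFS (the OS and Hadoop install will take part of it), and key size, HBase per-cell overhead, compression, and working headroom are all ignored.

```python
GIB = 1024 ** 3  # bytes per GiB; the "114GB" quoted below is really GiB


def raw_bytes(num_docs, doc_bytes, key_bytes=0):
    """Logical data size before HDFS replication (hypothetical helper)."""
    return num_docs * (doc_bytes + key_bytes)


def replicated_bytes(raw, replication=3):
    """Physical disk space consumed once every block is stored `replication` times."""
    return raw * replication


def usable_bytes(nodes, bytes_per_node, replication=3):
    """Logical capacity of the cluster: total disk divided by the replication factor."""
    return nodes * bytes_per_node / replication


# Figures from the thread: 24 million documents of about 5 KB each, replication 3.
raw = raw_bytes(24_000_000, 5 * 1024)
needed = replicated_bytes(raw, replication=3)

# Proposed cluster: 20 nodes with 40 GB each (assumed fully available to HDFS).
usable = usable_bytes(nodes=20, bytes_per_node=40 * GIB, replication=3)

print(f"raw data:        {raw / GIB:6.1f} GiB")
print(f"with 3 replicas: {needed / GIB:6.1f} GiB")
print(f"cluster usable:  {usable / GIB:6.1f} GiB")
print("fits" if raw <= usable else "does not fit")
```

Passing a non-zero `key_bytes` shows how much the row key size shifts the result, which is the point made about 4-byte versus 1024-byte keys.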
2013/1/18, Panshul Whisper <[EMAIL PROTECTED]>:
> Thank you for the replies,
> So I take it that I should have at least 800 GB of total free space on
> HDFS.. (combined free space of all the nodes connected to the cluster). So
> I can connect 20 nodes having 40 GB of hdd on each node to my cluster. Will
> this be enough for the storage?
> Please confirm.
> Thanking You,
> On Fri, Jan 18, 2013 at 1:36 PM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>> Hi Panshul,
>> If you have 20 GB with a replication factor set to 3, you have only
>> 6.6GB available, not 11GB. You need to divide the total space by the
>> replication factor.
>> Also, if you store your JSON in HBase, you need to add the key size
>> to it. Whether your key is 4 bytes or 1024 bytes makes a difference.
>> So roughly, 24 000 000 * 5 * 1024 bytes = 114 GB, not including the key
>> size. You don't have the space to store it. Even with a replication
>> factor set to 5 you don't have the space.
>> Now, you can add some compression, but even with a lucky factor of 50%
>> you still don't have the space. You will need something like 90%
>> compression factor to be able to store this data in your cluster.
>> A 1 TB drive is now less than $100... So you might think about replacing
>> your 20 GB drives with something bigger.
>> To reply to your last question, for your data here, you will need AT
>> LEAST 350GB overall storage. But that's a bare minimum. Don't go under
>> 2013/1/18, Panshul Whisper <[EMAIL PROTECTED]>:
>> > Hello,
>> > I am estimating how much disk space I need for my cluster.
>> > I have approx. 24 million JSON documents of about 5 KB each.
>> > The JSON is to be stored in HBase with some identifying data in
>> > and I also want to store the JSON for later retrieval based on the Id
>> > as keys in HBase.
>> > I have my HDFS replication set to 3.
>> > Each node has Hadoop, HBase, and Ubuntu installed on it, so approx. 11 GB
>> > is available for use on my 20 GB node.
>> > I have not enabled HBase replication. Is the HDFS replication alone
>> > enough to keep the data safe and redundant?
>> > How much total disk space will I need to store the data?
>> > Please help me estimate this.
>> > Thank you so much.
>> > --
>> > Regards,
>> > Ouch Whisper
>> > 010101010101
> Ouch Whisper
Panshul Whisper 2013-01-18, 13:55
Mohammad Tariq 2013-01-18, 14:37
Mohammad Tariq 2013-01-18, 22:35
Ted Dunning 2013-01-18, 23:36