MapReduce, mail # user - Estimating disk space requirements


Panshul Whisper 2013-01-18, 12:11
Mirko Kämpf 2013-01-18, 12:44
Jean-Marc Spaggiari 2013-01-18, 12:36
Panshul Whisper 2013-01-18, 13:20
Re: Estimating disk space requirements
Jean-Marc Spaggiari 2013-01-18, 13:26
20 nodes with 40 GB each will do the job.
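
For reference, that sizing can be checked with a rough back-of-the-envelope
calculation, here as a Python sketch (the 24M x 5 KB and replication-3
figures come from the thread below; HBase key and storage overheads are
ignored):

    docs = 24 * 1000 * 1000        # 24 million JSON documents
    doc_kb = 5                     # ~5 KB per document
    replication = 3                # HDFS replication factor

    raw_gb = docs * doc_kb / 1024.0 / 1024.0   # ~114 GB of raw JSON
    needed_gb = raw_gb * replication           # ~343 GB on HDFS

    cluster_gb = 20 * 40                       # 20 nodes x 40 GB free
    print("need ~%.0f GB, have %d GB -> fits: %s"
          % (needed_gb, cluster_gb, needed_gb < cluster_gb))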

After that you will have to consider performance based on your access
pattern. But that's another story.

JM

2013/1/18, Panshul Whisper <[EMAIL PROTECTED]>:
> Thank you for the replies,
>
> So I take it that I should have at least 800 GB of total free space on
> HDFS (combined free space of all the nodes connected to the cluster). So
> I can connect 20 nodes having 40 GB of HDD on each node to my cluster. Will
> this be enough for the storage?
> Please confirm.
>
> Thanking You,
> Regards,
> Panshul.
>
>
> On Fri, Jan 18, 2013 at 1:36 PM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
>> Hi Panshul,
>>
>> If you have 20 GB with a replication factor set to 3, you have only
>> 6.6 GB available, not 11 GB. You need to divide the total space by the
>> replication factor.
>>
>> Also, if you store your JSON into HBase, you need to add the key size
>> to it. If your key is 4 bytes, or 1024 bytes, it makes a difference.
>>
>> So roughly, 24 000 000 * 5 * 1024 bytes = ~114 GB. You don't have the
>> space to store it, and that's without including the key size. Even with
>> a replication factor set to 5 you don't have the space.
>>
>> Now, you can add some compression, but even with a lucky factor of 50%
>> you still don't have the space. You will need something like a 90%
>> compression factor to be able to store this data in your cluster.
>>
>> A 1 TB drive is now less than $100... So you might think about replacing
>> your 20 GB drives with something bigger.
>> To reply to your last question: for your data here, you will need AT
>> LEAST 350 GB of overall storage (~114 GB * replication factor 3). But
>> that's a bare minimum. Don't go under 500 GB.
>>
>> IMHO
>>
>> JM
>>
>> 2013/1/18, Panshul Whisper <[EMAIL PROTECTED]>:
>> > Hello,
>> >
>> > I was estimating how much disk space I need for my cluster.
>> >
>> > I have 24 million JSON documents, approx. 5 KB each.
>> > The JSON is to be stored into HBase with some identifying data in
>> > columns, and I also want to store the JSON for later retrieval based
>> > on the ID data as keys in HBase.
>> > I have my HDFS replication set to 3.
>> > Each node has Hadoop, HBase and Ubuntu installed on it, so approx
>> > 11 GB is available for use on my 20 GB node.
>> >
>> > I have no idea if, without HBase replication enabled, the HDFS
>> > replication is enough to keep the data safe and redundant.
>> > How much total disk space will I need for the storage of the data?
>> >
>> > Please help me estimate this.
>> >
>> > Thank you so much.
>> >
>> > --
>> > Regards,
>> > Ouch Whisper
>> > 010101010101
>> >
>>
>
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>
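
To put numbers on the key-size and compression points quoted above, here
is a quick Python sketch (assumed figures, not measured; HBase's
per-KeyValue overhead for column family, qualifier and timestamp is not
counted, and compression is applied to key and value alike for
simplicity):

    docs = 24 * 1000 * 1000
    value_bytes = 5 * 1024
    replication = 3

    for key_bytes in (4, 1024):
        for keep in (1.0, 0.5, 0.1):       # no, 50%, 90% compression
            gb = (docs * (value_bytes + key_bytes) * keep * replication
                  / 1024.0 ** 3)
            print("key=%4d B, %2.0f%% compression -> ~%3.0f GB"
                  % (key_bytes, (1 - keep) * 100, gb))
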
Panshul Whisper 2013-01-18, 13:55
Mohammad Tariq 2013-01-18, 14:37
Mohammad Tariq 2013-01-18, 22:35
Ted Dunning 2013-01-18, 23:36