Re: Sane max storage size for DN
Hi Mohammad,

The amount of RAM on the NN is related to the number of blocks... so let's
do some math. :)  1 GB of RAM per 1M blocks seems to be the general rule.

I'll probably mess this up so someone check my math:

9 PB ~ 9,216 TB ~ 9,437,184 GB of data.  Let's put that in 128 MB blocks:
 according to kcalc that's 75,497,472 blocks of 128 MB each.
Unless I missed this by an order of magnitude (entirely possible... I've
been drinking since 6), that sounds like 76 GB of RAM (above OS requirements).
 128 GB should kick its ass; 256 GB seems like a waste of $$.

Hmm... That makes the NN sound extremely efficient.  Someone validate me or
kick me to the curb.

YMMV ;)
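
A quick sanity check of that arithmetic, as a minimal Python sketch (it assumes
the "1 GB of NameNode heap per 1 million blocks" rule of thumb quoted above and
a 128 MB block size; neither is a hard limit):

    # Rough NameNode heap estimate from the raw data volume.
    # Assumptions: ~1 GB of NN heap per 1M blocks, 128 MB HDFS block size.
    DATA_PB = 9
    BLOCK_SIZE_MB = 128

    data_mb = DATA_PB * 1024 * 1024 * 1024   # 9 PB expressed in MB
    blocks = data_mb // BLOCK_SIZE_MB        # number of 128 MB blocks
    heap_gb = blocks / 1_000_000             # 1 GB per 1M blocks

    print(f"{blocks:,} blocks -> ~{heap_gb:.1f} GB of NameNode heap")
    # prints: 75,497,472 blocks -> ~75.5 GB of NameNode heap

So the ~76 GB figure checks out; 128 GB on the NN leaves comfortable headroom
once OS and JVM overhead are added.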

On Wed, Dec 12, 2012 at 10:52 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Hello Michael,
>
>       It's an array. The actual size of the data could be somewhere around
> 9 PB (exclusive of replication), and we want to keep the number of DNs as low
> as possible. Computations are not too frequent, as I mentioned earlier.
> If I have 500 TB per DN, the number of DNs would be around 49. And, if the
> block size is 128 MB, the number of blocks would be 201,326,592. So, I was
> thinking of having 256 GB RAM for the NN. Does this make sense to you?
>
> Many thanks.
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Thu, Dec 13, 2012 at 12:28 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> 500 TB?
>>
>> How many nodes in the cluster? Is this attached storage or is it in an
>> array?
>>
>> I mean if you have 4 nodes for a total of 2PB, what happens when you lose
>> 1 node?
>>
>>
>> On Dec 12, 2012, at 9:02 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>
>> Hello list,
>>
>>           I don't know if this question makes any sense, but I would like
>> to ask: does it make sense to store 500 TB (or more) of data in a single DN?
>> If yes, then what should the specs of the other parameters be, *viz*. NN & DN
>> RAM, N/W, etc.? If no, what could be the alternative?
>>
>> Many thanks.
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>>
>>
>
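
For reference, the same back-of-the-envelope estimate can be parameterized by
replication factor and per-DataNode capacity, which is what the 49-DN / 256 GB
plan above hinges on. This is only a sketch under assumed inputs; the thread
doesn't spell out the exact replication factor or usable capacity per node:

    import math

    # Illustrative cluster sizing; the inputs below are assumptions, not
    # values confirmed in the thread.
    raw_data_tb = 9 * 1024      # ~9 PB of data before replication
    replication = 3             # HDFS default replication factor
    dn_capacity_tb = 500        # usable storage per DataNode
    block_size_mb = 128

    datanodes = math.ceil(raw_data_tb * replication / dn_capacity_tb)
    unique_blocks = raw_data_tb * 1024 * 1024 // block_size_mb
    nn_heap_gb = unique_blocks / 1_000_000   # 1 GB per 1M blocks rule of thumb

    print(f"DataNodes needed: ~{datanodes}")
    print(f"Unique blocks: {unique_blocks:,} -> ~{nn_heap_gb:.0f} GB of NN heap")

The heap rule of thumb is normally applied to unique blocks rather than to the
replicated raw capacity, which is largely why this lands near the ~76 GB
estimate rather than 256 GB.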

Further replies in this thread:
Mohammad Tariq 2012-12-13, 05:27
Hemanth Yamijala 2012-12-13, 14:51
Mohammad Tariq 2012-12-13, 15:18