MapReduce, mail # user - Re: Sane max storage size for DN


Re: Sane max storage size for DN
Hemanth Yamijala 2012-12-13, 14:51
This is a dated blog post, so it would help if someone with current HDFS
knowledge could validate it:
http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/

There is a bit about the RAM required for the Namenode and how to compute
it; you can look at the 'Namespace limitations' section.

Thanks
hemanth
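
A minimal sketch of the sizing rule at work in this thread, assuming the
usual rule of thumb of ~1 GB of NameNode heap per million blocks (the
constant and the helper below are illustrative, not from the blog post):

    # Rough NameNode heap estimate from data size and block size.
    # Assumes ~1 GB of heap per 1M blocks -- the thread's rule of thumb,
    # not an exact figure from the Yahoo blog post.
    def nn_heap_gb(data_tb, block_mb=128, gb_per_million_blocks=1.0):
        blocks = data_tb * 1024 * 1024 / block_mb  # TB -> MB -> block count
        return blocks / 1e6 * gb_per_million_blocks

    # 9 PB (~9,216 TB) of pre-replication data in 128 MB blocks:
    print(nn_heap_gb(9 * 1024))  # ~75.5 -> roughly 76 GB of heap

Note that the count is of logical blocks; as the correction in this thread
shows, multiplying by the replication factor is the easy mistake to make.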
On Thu, Dec 13, 2012 at 10:57 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Hello Chris,
>
>      Thank you so much for the valuable insights. I was actually using the
> same principle. I made a blunder and did the maths for the entire (9*3) PB.
>
> Seems I am higher than you, that too without drinking ;)
>
> Many thanks.
>
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Thu, Dec 13, 2012 at 10:38 AM, Chris Embree <[EMAIL PROTECTED]> wrote:
>
>> Hi Mohammed,
>>
>> The amount of RAM on the NN is related to the number of blocks... so
>> let's do some math. :)  1G of RAM to 1M blocks seems to be the general rule.
>>
>> I'll probably mess this up so someone check my math:
>>
>> 9 PB ~ 9,216 TB ~ 9,437,184 GB of data.  Let's put that in 128 MB blocks:
>> according to kcalc that's 75,497,472 blocks of 128 MB.
>> Unless I missed this by an order of magnitude (entirely possible... I've
>> been drinking since 6), that sounds like 76G of RAM (above OS requirements).
>> 128G should kick its ass; 256G seems like a waste of $$.
>>
>> Hmm... That makes the NN sound extremely efficient.  Someone validate me
>> or kick me to the curb.
>>
>> YMMV ;)
>>
>>
>> On Wed, Dec 12, 2012 at 10:52 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>
>>> Hello Michael,
>>>
>>>       It's an array. The actual size of the data could be somewhere
>>> around 9 PB (exclusive of replication) and we want to keep the number of
>>> DNs as low as possible. Computations are not too frequent, as I have
>>> specified earlier. If I have 500 TB per DN, the number of DNs would be
>>> around 49. And, if the block size is 128 MB, the number of blocks would
>>> be 201,326,592. So, I was thinking of having 256 GB RAM for the NN. Does
>>> this make sense to you?
>>>
>>> Many thanks.
>>>
>>> Regards,
>>>     Mohammad Tariq
>>>
>>>
>>>
>>> On Thu, Dec 13, 2012 at 12:28 AM, Michael Segel <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> 500 TB?
>>>>
>>>> How many nodes in the cluster? Is this attached storage or is it in an
>>>> array?
>>>>
>>>> I mean if you have 4 nodes for a total of 2PB, what happens when you
>>>> lose 1 node?
>>>>
>>>>
>>>> On Dec 12, 2012, at 9:02 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hello list,
>>>>
>>>>           I don't know if this question makes any sense, but I would
>>>> like to ask: does it make sense to store 500 TB (or more) of data in a
>>>> single DN? If yes, then what should be the spec of the other parameters,
>>>> *viz*. NN & DN RAM, N/W etc.? If no, what could be the alternative?
>>>>
>>>> Many thanks.
>>>>
>>>> Regards,
>>>>     Mohammad Tariq
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
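
For completeness, a quick check of the arithmetic traded above, under the
same assumptions (128 MB blocks, 3x replication, and 500 TB per DataNode,
all taken from the thread; the rule-of-thumb heap constant is assumed):

    BLOCK_MB = 128
    DATA_TB = 9 * 1024   # ~9 PB, exclusive of replication
    REPLICATION = 3
    DN_TB = 500          # assumed usable capacity per DataNode

    logical_blocks = DATA_TB * 1024 * 1024 // BLOCK_MB
    print(logical_blocks)  # 75497472 -> ~76 GB heap by the rule of thumb

    # Sizing on replicated blocks (the blunder admitted above) inflates
    # the estimate by the replication factor:
    print(logical_blocks * REPLICATION / 1e6)  # ~226M blocks -> ~226 GB

    # DataNodes needed to hold the raw, replicated bytes:
    print(DATA_TB * REPLICATION / DN_TB)  # ~55.3; the thread's ~49 rests
                                          # on slightly different inputs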