MapReduce, mail # user - Re: Sane max storage size for DN


Re: Sane max storage size for DN
Mohammad Tariq 2012-12-13, 15:18
Thank you so much Hemanth.

Regards,
    Mohammad Tariq

On Thu, Dec 13, 2012 at 8:21 PM, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:

> This is a dated blog post, so it would help if someone with current HDFS
> knowledge could validate it:
> http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/
>
> There is a bit about the RAM required for the Namenode and how to compute
> it: you can look at the 'Namespace limitations' section.
>
> Thanks
> hemanth
>
>
> On Thu, Dec 13, 2012 at 10:57 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
>> Hello Chris,
>>
>>      Thank you so much for the valuable insights. I was actually using
>> the same principle, but I made a blunder and did the maths for the entire
>> (9*3) PB.
>>
>> Seems my estimate came out higher than yours, and that too without drinking ;)
>>
>> Many thanks.
>>
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>>
>> On Thu, Dec 13, 2012 at 10:38 AM, Chris Embree <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Mohammad,
>>>
>>> The amount of RAM on the NN is related to the number of blocks... so
>>> let's do some math. :)  1G of RAM to 1M blocks seems to be the general rule.
>>>
>>> I'll probably mess this up so someone check my math:
>>>
>>> 9 PB ~ 9,216 TB ~ 9,437,184 GB of data.  Let's put that into 128 MB blocks:
>>>  according to kcalc, that's 75,497,472 blocks.
>>> Unless I missed this by an order of magnitude (entirely possible... I've
>>> been drinking since 6), that sounds like 76G of RAM (above OS requirements).
>>>  128G should kick its ass; 256G seems like a waste of $$.
>>>
>>> Hmm... That makes the NN sound extremely efficient.  Someone validate me
>>> or kick me to the curb.
>>>
>>> YMMV ;)
>>>
>>>
>>> On Wed, Dec 12, 2012 at 10:52 PM, Mohammad Tariq <[EMAIL PROTECTED]>wrote:
>>>
>>>> Hello Michael,
>>>>
>>>>       It's an array. The actual size of the data could be somewhere
>>>> around 9 PB (exclusive of replication), and we want to keep the number of
>>>> DNs as low as possible. Computations are not too frequent, as I have
>>>> specified earlier. If I have 500 TB per DN, the number of DNs would be
>>>> around 49. And, if the block size is 128 MB, the number of blocks would be
>>>> 201,326,592. So, I was thinking of having 256 GB RAM for the NN. Does this
>>>> make sense to you?
>>>>
>>>> Many thanks.
>>>>
>>>> Regards,
>>>>     Mohammad Tariq
>>>>
>>>>
>>>>
>>>> On Thu, Dec 13, 2012 at 12:28 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> 500 TB?
>>>>>
>>>>> How many nodes in the cluster? Is this attached storage or is it in an
>>>>> array?
>>>>>
>>>>> I mean, if you have 4 nodes for a total of 2 PB, what happens when you
>>>>> lose 1 node?
>>>>>
>>>>>
>>>>> On Dec 12, 2012, at 9:02 AM, Mohammad Tariq <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>
>>>>> Hello list,
>>>>>
>>>>>           I don't know if this question makes any sense, but I would
>>>>> like to ask: does it make sense to store 500 TB (or more) of data in a
>>>>> single DN? If yes, then what should the specs of the other parameters be,
>>>>> *viz*. NN & DN RAM, N/W etc.? If no, what could be the alternative?
>>>>>
>>>>> Many thanks.
>>>>>
>>>>> Regards,
>>>>>     Mohammad Tariq
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
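
For anyone redoing the arithmetic above, here is a minimal Python sketch of the back-of-envelope sizing discussed in this thread. It assumes the rule of thumb Chris quotes (roughly 1 GB of NameNode heap per 1 million blocks), 128 MB blocks, 3x replication and 500 TB per DataNode; the helper functions and their names are illustrative only, not part of any Hadoop API.

def block_count(data_tb, block_mb=128):
    # Convert TB -> MB (binary units, as used in the thread) and divide by the block size.
    return data_tb * 1024 * 1024 / block_mb

def namenode_heap_gb(blocks, blocks_per_gb=1_000_000):
    # Rule of thumb from the thread: ~1 GB of NameNode heap per 1M blocks.
    return blocks / blocks_per_gb

def datanode_count(data_tb, per_node_tb, replication=3):
    # DataNodes needed to hold the fully replicated data at per_node_tb each.
    return data_tb * replication / per_node_tb

data_tb = 9 * 1024                                          # 9 PB, exclusive of replication
blocks = block_count(data_tb)
print(f"blocks:       {blocks:,.0f}")                       # 75,497,472
print(f"NN heap (GB): {namenode_heap_gb(blocks):.1f}")      # ~75.5, i.e. the ~76G above
print(f"DataNodes:    {datanode_count(data_tb, 500):.0f}")  # ~55 at 500 TB per node

With 9 PB of unreplicated data this reproduces Chris's figures: about 75.5 million blocks and roughly 76 GB of NameNode heap. The naive DataNode count (about 55 nodes at 500 TB each, with 3x replication) differs somewhat from the ~49 mentioned earlier in the thread, which presumably rested on different assumptions.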