Re: Sane max storage size for DN
Chris Embree 2012-12-13, 05:08
Hi Mohammad,
The amount of RAM on the NN is related to the number of blocks... so let's do some math. :) 1G of RAM to 1M blocks seems to be the general rule.
I'll probably mess this up so someone check my math:
9 PB ~ 9,216 TB ~ 9,437,184 GB of data. Let's put that in 128 MB blocks: according to kcalc that's 75,497,472 blocks of 128 MB. Unless I missed this by an order of magnitude (entirely possible... I've been drinking since 6), that sounds like 76G of RAM (above OS requirements). 128G should kick its ass; 256G seems like a waste of $$.
Hmm... That makes the NN sound extremely efficient. Someone validate me or kick me to the curb.
YMMV ;)
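For anyone who wants to check the arithmetic above without kcalc, here is a minimal sketch. It assumes binary units (1 PB = 1024 TB, as in the figures above), assumes every block is full, and applies the 1G-per-1M-blocks rule of thumb as stated; lots of small files would raise the real block count. The function names are made up for illustration.

```python
# Back-of-the-envelope NameNode RAM estimate from the rule of thumb above.
# Assumptions: binary units, full 128 MB blocks, data counted without replication.
MB_PER_PB = 1024 ** 3  # 1 PB = 1024 TB = 1024^2 GB = 1024^3 MB


def block_count(data_pb, block_size_mb=128):
    """Number of full blocks needed to hold data_pb petabytes."""
    return data_pb * MB_PER_PB // block_size_mb


def namenode_ram_gb(blocks, gb_per_million_blocks=1.0):
    """RAM estimate using the '1G of RAM per 1M blocks' rule of thumb."""
    return blocks / 1_000_000 * gb_per_million_blocks


blocks = block_count(9)
print(blocks)                             # 75497472
print(round(namenode_ram_gb(blocks), 1))  # 75.5
```

So the 75,497,472 blocks and roughly 76G figures hold up under those assumptions.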
On Wed, Dec 12, 2012 at 10:52 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> Hello Michael,
>
> It's an array. The actual size of the data could be somewhere around 9PB (exclusive of replication) and we want to keep the number of DNs as low as possible. Computations are not too frequent, as I have specified earlier. If I have 500TB in 1 DN, the number of DNs would be around 49. And, if the block size is 128MB, the number of blocks would be 201326592. So, I was thinking of having 256GB RAM for the NN. Does this make sense to you?
>
> Many thanks.
>
> Regards,
> Mohammad Tariq
>
> On Thu, Dec 13, 2012 at 12:28 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> 500 TB?
>>
>> How many nodes in the cluster? Is this attached storage or is it in an array?
>>
>> I mean if you have 4 nodes for a total of 2PB, what happens when you lose 1 node?
>>
>> On Dec 12, 2012, at 9:02 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>
>> Hello list,
>>
>> I don't know if this question makes any sense, but I would like to ask, does it make sense to store 500TB (or more) data in a single DN? If yes, then what should be the spec of other parameters *viz*. NN & DN RAM, N/W etc? If no, what could be the alternative?
>>
>> Many thanks.
>>
>> Regards,
>> Mohammad Tariq
Re: Sane max storage size for DN
Mohammad Tariq 2012-12-13, 05:27
Hello Chris,
Thank you so much for the valuable insights. I was actually using the same principle, but I made the blunder of doing the maths for the entire (9*3) PB.
Seems I am higher than you, that too without drinking ;)
Many thanks. Regards, Mohammad Tariq
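A quick sketch of how much the choice of input changes the result, assuming binary units, full 128 MB blocks, and a replication factor of 3 (the "9*3" above). The helper names are hypothetical. It also works backwards from the 201326592 figure quoted earlier in the thread to the data volume it implies.

```python
# Compare block counts for logical vs. replicated data volume.
# Assumptions: binary units (1 PB = 1024^3 MB), full blocks, replication factor 3.
MB_PER_PB = 1024 ** 3


def blocks_for(data_pb, block_size_mb=128):
    """Full blocks needed for data_pb petabytes."""
    return data_pb * MB_PER_PB // block_size_mb


def data_pb_for(blocks, block_size_mb=128):
    """Data volume in PB implied by a given number of full blocks."""
    return blocks * block_size_mb / MB_PER_PB


logical = blocks_for(9)          # 9 PB, exclusive of replication
replicated = blocks_for(9 * 3)   # 27 PB, counting all three replicas
print(logical)                   # 75497472
print(replicated)                # 226492416
print(data_pb_for(201326592))    # 24.0
```

Under these assumptions, 201326592 blocks corresponds to 24 PB, which is in the range of the replicated volume rather than the 9 PB of actual data.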
Re: Sane max storage size for DN
Hemanth Yamijala 2012-12-13, 14:51
This is a dated blog post, so it would help if someone with current HDFS knowledge could validate it: http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/.
There is a bit about the RAM required for the Namenode and how to compute it; you can look at the 'Namespace limitations' section.
Thanks
hemanth
Re: Sane max storage size for DN
Mohammad Tariq 2012-12-13, 15:18
Thank you so much Hemanth.
Regards, Mohammad Tariq