Sane max storage size for DN
Mohammad Tariq 2012-12-12, 15:02
Hello list,
I don't know if this question makes any sense, but I would like to ask: does it make sense to store 500TB (or more) of data in a single DN? If yes, what should the specs of the other parameters be, viz. NN & DN RAM, N/W, etc.? If no, what could be the alternative?
Many thanks.
Regards, Mohammad Tariq
Re: Sane max storage size for DN
Ted Dunning 2012-12-12, 15:44
Yes, it does make sense, depending on how much compute each byte of data will require on average. With ordinary Hadoop, it is reasonable to have half a dozen 2TB drives per node. With specialized versions of Hadoop, considerably more can be supported.
From what you say, it sounds like you are suggesting that your name node get a part of a single drive, with the rest being shared with other virtual instances or with an OS partition. That would be a really bad idea for performance. Many Hadoop programs are I/O bound, so having more than one spindle is a good thing.
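To put the "half a dozen 2TB drives" figure in perspective, here is a rough sketch of how many such nodes 500TB would need. The function name, the decimal-terabyte arithmetic, and the replication factor of 3 (the usual HDFS default, not stated in the thread) are assumptions for illustration only.

```python
import math

def nodes_needed(data_tb, drives_per_node=6, drive_tb=2, replication=1):
    """Number of DataNodes needed to hold data_tb terabytes.

    Hypothetical helper: ignores OS overhead, temp space, and free-space headroom.
    """
    raw_tb = data_tb * replication              # total bytes on disk, in TB
    per_node_tb = drives_per_node * drive_tb    # 6 x 2TB = 12TB per node
    return math.ceil(raw_tb / per_node_tb)

print(nodes_needed(500))                 # single copy of the data
print(nodes_needed(500, replication=3))  # assuming 3 replicas
```

Under these assumptions, the amount of data the original question puts on one DN would be spread over dozens of ordinary nodes.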
Re: Sane max storage size for DN
Mohammad Tariq 2012-12-12, 15:52
Thank you so much for the valuable response, Ted.
No, there would be dedicated storage for NN as well.
Any tips on RAM & N/W?
Note that computations are not really frequent.
Thanks again.
Regards, Mohammad Tariq
Re: Sane max storage size for DN
Michael Segel 2012-12-12, 18:58
500 TB?
How many nodes in the cluster? Is this attached storage or is it in an array?
I mean if you have 4 nodes for a total of 2PB, what happens when you lose 1 node?
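The concern about losing a node can be made concrete with a back-of-the-envelope estimate: when a DN fails, HDFS has to re-create every replica that lived on it, so roughly the node's full contents must cross the network again. The sketch below gives an optimistic lower bound on that time; the function name and the 10 Gbit/s aggregate bandwidth are assumptions, not figures from the thread.

```python
def rereplication_hours(lost_bytes, aggregate_bits_per_sec):
    """Lower bound on the time to re-copy the data held by a failed DataNode.

    Hypothetical helper: assumes the full bandwidth is available for
    re-replication and ignores disk throughput and throttling.
    """
    seconds = lost_bytes * 8 / aggregate_bits_per_sec
    return seconds / 3600

lost = 500 * 10**12       # one 500TB node (decimal terabytes)
bandwidth = 10 * 10**9    # assumed 10 Gbit/s available in aggregate

hours = rereplication_hours(lost, bandwidth)
print(f"{hours:.1f} hours, about {hours / 24:.1f} days")
```

Even under these generous assumptions the cluster would be under-replicated for days, which is the risk of concentrating so much data on so few nodes.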
Re: Sane max storage size for DN
Mohammad Tariq 2012-12-13, 03:52
Hello Michael,
It's an array. The actual size of the data could be somewhere around 9PB (exclusive of replication), and we want to keep the number of DNs as low as possible. Computations are not too frequent, as I mentioned earlier. If I have 500TB in 1 DN, the number of DNs would be around 49. And, if the block size is 128MB, the number of blocks would be 201326592. So, I was thinking of having 256GB RAM for the NN. Does this make sense to you?
Many thanks.
Regards, Mohammad Tariq
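The arithmetic in the message above can be checked with a short sketch. Using binary units, the quoted figures (201326592 blocks, about 49 DNs) correspond to 24PB of raw storage, which would match roughly 8PB of data at 3 replicas rather than 9PB; that reading is an inference, not something the thread states. The helper names, the replication factor of 3, and the roughly 150 bytes of NameNode heap per block (a commonly quoted rule of thumb) are all assumptions.

```python
MIB, TIB, PIB = 2**20, 2**40, 2**50

def block_count(data_bytes, block_bytes=128 * MIB):
    """Number of blocks needed for data_bytes (ceiling division)."""
    return -(-data_bytes // block_bytes)

def datanodes(raw_bytes, per_node_bytes=500 * TIB):
    """Fractional number of DataNodes needed to hold raw_bytes."""
    return raw_bytes / per_node_bytes

def namenode_heap_gib(unique_blocks, bytes_per_object=150):
    """Very rough heap estimate; 150 bytes per block is an assumed rule of thumb."""
    return unique_blocks * bytes_per_object / 2**30

# Figures quoted in the message: 24 PiB of raw storage reproduces them.
print(block_count(24 * PIB))   # block replicas across the cluster
print(datanodes(24 * PIB))     # DataNodes at 500 TiB each

# The NameNode tracks unique blocks, not replicas.
print(block_count(8 * PIB))    # unique blocks if the data is 8 PiB
print(block_count(9 * PIB))    # unique blocks for the stated 9 PiB
print(round(namenode_heap_gib(block_count(9 * PIB)), 1))
```

The heap estimate covers block objects only; files and directories add to it, so the number of files matters as much as the total size.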