
Re: Using a hard drive instead of

If you are worried about the memory constraints of a Linux system, I'd say go with MapR and their CLDB.

I just did a quick look at Supermicro servers and found that on a 2U server, 768GB was the max.
So how many blocks can you store in that much memory? I only have 10 fingers and toes so I can't count that high. ;-)

Assuming that you use 64MB blocks, what's the max size?
Switching to 128MB blocks, what's the max size then?

From Tom White's blog: "Every file, directory and block in HDFS is represented as an object in the namenode’s memory, each of which occupies 150 bytes, as a rule of thumb. So 10 million files, each using a block, would use about 3 gigabytes of memory. Scaling up much beyond this level is a problem with current hardware. Certainly a billion files is not feasible."
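
As a rough check of that rule of thumb, here is a minimal sketch. It assumes each file contributes two namenode objects (the file itself plus its single block) and ignores directories; the function name is made up for illustration and is not part of Hadoop.

```python
BYTES_PER_OBJECT = 150  # rule-of-thumb figure from the quote above


def namenode_bytes(num_files, blocks_per_file=1):
    """Estimate namenode heap: one object per file plus one per block."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


# 10 million single-block files, reported in decimal gigabytes
print(namenode_bytes(10_000_000) / 1e9)
```

Under those assumptions the estimate comes out at the 3 gigabytes given in the quote.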

So 10 million blocks in 3GB. 1x10^7 * 200 = 2x10^9 blocks in 600GB of memory.

That's 2 billion blocks.
At 64MB that's 128 billion MB which, off the top of my head, is 128 PB?
(Ok, I'll admit that makes my head spin so someone may want to check my math....)
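
A quick way to check the arithmetic is a short script. This is only a sketch under the same assumptions as the estimate: 10 million single-block files per 3GB of namenode memory, 600GB of usable memory, every block full, decimal units, and logical data size (replication not counted). The function names are hypothetical.

```python
BLOCKS_PER_3GB = 10_000_000  # rule of thumb: 10 million single-block files in ~3 GB


def max_blocks(heap_gb):
    """Blocks a namenode could track with heap_gb of memory (rule of thumb)."""
    return BLOCKS_PER_3GB * heap_gb // 3


def addressable_pb(heap_gb, block_mb):
    """Logical data size in PB, taking 1 PB = 10**9 MB."""
    return max_blocks(heap_gb) * block_mb / 1e9


for block_mb in (64, 128):
    print(block_mb, "MB blocks:", max_blocks(600), "blocks,",
          addressable_pb(600, block_mb), "PB")
```

With 600GB this gives 2 billion blocks, which works out to 128 PB at 64MB blocks and 256 PB at 128MB blocks.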

The point is that it's possible to build out your control nodes with more than enough memory to handle the largest cluster you can build with Map/Reduce 1.x (pre-YARN).

I am skeptical of Federation.

Just Saying...

On Oct 17, 2012, at 5:37 PM, Colin Patrick McCabe <[EMAIL PROTECTED]> wrote:

> The direct answer to your question is to use this theoretical super-fast hard drive as Linux swap space.
> The better answer is to use federation or another solution if your needs exceed those servable by a single NameNode.
> Cheers.
> Colin
> On Oct 11, 2012 9:00 PM, "Mark Kerzner" <[EMAIL PROTECTED]> wrote:
> Hi,
> Imagine I have a very fast hard drive that I want to use for the NameNode. That is, I want the NameNode to store its blocks information on this hard drive instead of in memory.
> Why would I do it? Scalability (no federation needed), many files are not a problem, and warm fail-over is automatic. What would I need to change in the NameNode to tell it to use the hard drive?
> Thank you,
> Mark