We have a box that's a bit overpowered for just running our namenode and
jobtracker on a 10-node cluster, so, like you, we also wanted to make use
of the storage and processor resources of that node.
What we did is use LXC containers to segregate the different processes. LXC
is a very lightweight pseudo-virtualization platform for Linux (near 0
The key benefit of LXC, in this case, is that we can use Linux cgroups
(standard, simple config in LXC) to specify that the container/VM running
the namenode/jobtracker should have 10x the CPU and IO resources of the
container that runs a tasktracker/data node (though since LXC containers all
run under the same kernel, any "unused" resources are assigned to runnable
We run Cloudera Hadoop and deployed a slightly modified tasktracker
configuration on the shared box (fewer task slots so as not to overutilize
That tasktracker doesn't do as much work as the other dedicated nodes, but
it does a fair share, and the cgroup configurations (cpu.shares &
blkio.weight for the curious) ensure that the bulk processing doesn't
interfere with the critical namenode & jobtracker systems.
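
For illustration only, here is a minimal Python sketch of how a roughly 10:1
weighting could be expressed and what it means under contention. The
container names ("master", "worker") and the specific weight values are
assumptions, not our actual settings, and the `lxc.cgroup.*` key prefix
assumes the cgroup v1 style of LXC configuration.

```python
# Hypothetical weights: the only thing taken from the description above is
# the rough 10:1 ratio between the master and worker containers.
CONTAINERS = {
    "master": {"cpu.shares": 10240, "blkio.weight": 1000},  # namenode/jobtracker
    "worker": {"cpu.shares": 1024, "blkio.weight": 100},    # tasktracker/datanode
}


def lxc_config_lines(settings):
    """Render cgroup settings as LXC config lines (cgroup v1 key style assumed)."""
    return [f"lxc.cgroup.{key} = {value}" for key, value in sorted(settings.items())]


def cpu_fractions(shares, runnable):
    """Fraction of CPU each runnable container gets when all of them are busy.

    cpu.shares is a relative weight, so a container that is idle is simply
    left out of `runnable` and its share goes to whoever is still running.
    """
    total = sum(shares[name] for name in runnable)
    return {name: shares[name] / total for name in runnable}


SHARES = {name: cfg["cpu.shares"] for name, cfg in CONTAINERS.items()}

if __name__ == "__main__":
    for name, cfg in CONTAINERS.items():
        print(f"# {name}")
        print("\n".join(lxc_config_lines(cfg)))
    print(cpu_fractions(SHARES, ["master", "worker"]))
    print(cpu_fractions(SHARES, ["worker"]))
```

The second call to `cpu_fractions` shows the point about shared kernels: when
the master container is idle, the worker is not capped at its small share.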
From: Robert Dyer [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 14, 2013 11:23 PM
To: [EMAIL PROTECTED]
Subject: Re: About configuring cluster setup
You can; however, note that unless you also run a TaskTracker on that node
(a bad idea), any blocks that are replicated to this node won't be
available as local input to MapReduce tasks, so you are lowering the odds
of having data locality on those blocks.
On Tue, May 14, 2013 at 2:01 AM, Ramya S <[EMAIL PROTECTED]> wrote:
Can we configure one node as both NameNode and DataNode?