With 2 nodes and a replication factor of 2, a replica of each block is
present on each node. This makes it possible for a single node to do all
the work and still be data-local. That will probably happen if that
single node has the needed capacity. More nodes than the replication
factor are needed to force distribution of the processing.
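
To make that reasoning concrete, here is a rough Python sketch (not Hadoop
code). It assumes a simplified round-robin replica placement, which is not
HDFS's real placement policy, and the function names and node names are
made up for illustration. It only shows which nodes hold a replica of every
block, i.e. which nodes could run the whole job data-locally on their own.

```python
import math

def block_count(file_size_mb, block_size_mb):
    # Number of blocks needed to store the file.
    return math.ceil(file_size_mb / block_size_mb)

def place_replicas(num_blocks, nodes, replication):
    # ASSUMPTION: simplified round-robin placement, purely illustrative.
    # Real HDFS placement also considers racks and the writer's location.
    r = min(replication, len(nodes))
    return {
        b: {nodes[(b + i) % len(nodes)] for i in range(r)}
        for b in range(num_blocks)
    }

def nodes_local_to_all(placement, nodes):
    # Nodes holding a replica of every block: any one of them could
    # process the whole input without a remote read.
    return [n for n in nodes
            if all(n in holders for holders in placement.values())]

blocks = block_count(1024, 128)  # the 1 GB file with 128 MB blocks

two_nodes = ["node1", "node2"]
three_nodes = ["node1", "node2", "node3"]

local_two = nodes_local_to_all(place_replicas(blocks, two_nodes, 2), two_nodes)
local_three = nodes_local_to_all(place_replicas(blocks, three_nodes, 2), three_nodes)

print(blocks)       # 8
print(local_two)    # ['node1', 'node2']
print(local_three)  # []
```

With 2 nodes, either node alone is local to all 8 blocks, so nothing forces
the work onto the second node. With 3 nodes under this toy placement, no
single node holds every block. Real placement could still put many replicas
on one node, so more nodes than the replication factor is a necessary
condition here rather than a guarantee.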
On Jan 8, 2014 7:35 AM, "Ashish Jain" <[EMAIL PROTECTED]> wrote:
> I am sure that only one node is being used. I just now ran the job again
> and could see CPU usage going high on only one server, while the other
> server's CPU usage remains constant, which means the other node is not
> being used. Can someone help me debug this issue?
> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <[EMAIL PROTECTED]> wrote:
>> Hello All,
>> I have a 2-node Hadoop cluster running with a replication factor of 2. I
>> have a file of around 1 GB which, when copied to HDFS, is replicated to
>> both nodes. Looking at the block info, I can see the file has been
>> subdivided into 8 blocks, each of size 128 MB. I use this file as input
>> to run the word count program. Somehow I feel only one node is doing all
>> the work and the code is not distributed to the other node. How can I
>> make sure the code is distributed to both nodes? Also, is there a log or
>> GUI which can be used for this?
>> Please note I am using the latest stable release, that is, 2.2.0.