Understanding harpoon - help needed
I am doing some performance testing on Hadoop and have run into a
situation that I need help understanding.

My Hadoop cluster:
6 Datanodes, 1 Namenode, 2 Clients.

Replication factor = 3

2 clients write a file (put operation) whose size is 2 x the block size.
The storage capacity of dfs.data.dir on each Datanode is the same and
equals the block size. That means each Datanode stores a single block.
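
As a quick sanity check of this setup, here is a small sketch of the arithmetic. The variable names are my own and not Hadoop settings, and it assumes HDFS never puts two replicas of the same block on one Datanode.

```python
# Values taken from the cluster description above; names are illustrative only.
datanodes = 6
replication_factor = 3
blocks_in_file = 2          # file size is 2 x block size
blocks_per_datanode = 1     # each Datanode has room for exactly one block

# Every block is stored replication_factor times.
total_replicas = blocks_in_file * replication_factor

# Total number of block replicas the cluster can hold.
cluster_capacity = datanodes * blocks_per_datanode

print("replicas to store:", total_replicas)
print("cluster capacity :", cluster_capacity)
```

So the file should fill the cluster exactly: every Datanode ends up holding one replica, and each of the 2 blocks is available on 3 of the 6 Datanodes.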

Now, if the 2 clients read the file simultaneously (get operation),
will they read the 2 blocks from different Datanodes?
Or will they read from the same Datanodes?

Does the Namenode know which Datanode is busy and which one is idle?

What I am trying to find out is this:
Is it possible to decrease the read time by increasing the replication factor?

I have attached an image to help explain my question. Kindly take a
look. Thank you. If possible, please give references.