Dibyendu Karmakar 2013-04-11, 10:19
Re: UNDERSTANDING HADOOP PERFORMANCE
MARCOS MEDRADO RUBINELLI 2013-04-11, 11:14
dfs.namenode.handler.count and dfs.datanode.handler.count control how many concurrent threads the server will have to handle incoming requests. The default values should be fine for smaller clusters, but if you have a lot of simultaneous HDFS operations, you may see performance gains by increasing these numbers. Just make sure you have the memory to spare and adjust your heap sizes accordingly.
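For illustration, here is a minimal Python sketch that renders overrides for these two properties in the name/value layout used by hdfs-site.xml. The property names come from the discussion above; the values 64 and 10 are hypothetical placeholders, not recommendations, and the helper render_properties is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical values for illustration only; tune them for your own cluster
# and make sure the heap can accommodate the extra handler threads.
overrides = {
    "dfs.namenode.handler.count": 64,
    "dfs.datanode.handler.count": 10,
}


def render_properties(props):
    """Build a <configuration> element with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return root


config = render_properties(overrides)
if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
    ET.indent(config)
print(ET.tostring(config, encoding="unicode"))
```

The printed fragment would be merged into hdfs-site.xml on the relevant nodes, and the daemons restarted for the new handler counts to take effect.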
dfs.heartbeat.interval and dfs.blockreport.intervalMsec will affect performance in larger clusters. Every dfs.heartbeat.interval seconds, each datanode sends the namenode a message saying it is still alive; after dfs.namenode.stale.datanode.interval milliseconds without a heartbeat, the namenode marks that datanode as stale. Similarly, each datanode sends a list of all the blocks it has every dfs.blockreport.intervalMsec milliseconds. With the defaults you listed (3 seconds and 3600000 ms, i.e. one hour), a cluster of 30 machines means the namenode receives a heartbeat every 0.1 seconds on average and a block report every 2 minutes, which should be a negligible load and worth the extra reliability. If your block reports are taking too long, that's a sign that you have too many small files and should look into archiving or consolidating them somehow. Personally, I ran into trouble around 1 million blocks/datanode.
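The arithmetic behind those load figures can be sketched in a few lines of Python. This assumes the defaults quoted in the original question (3 seconds and 3600000 ms), a 30-node cluster, and datanodes whose messages are spread evenly across each interval; the helper mean_gap_seconds is made up for this example.

```python
# Defaults quoted in the original question.
heartbeat_interval_s = 3            # dfs.heartbeat.interval (seconds)
blockreport_interval_ms = 3600000   # dfs.blockreport.intervalMsec (milliseconds)
datanodes = 30                      # cluster size used in the example


def mean_gap_seconds(interval_s, nodes):
    """Average time between messages arriving at the namenode,
    assuming the nodes are spread evenly across the interval."""
    return interval_s / nodes


heartbeat_gap_s = mean_gap_seconds(heartbeat_interval_s, datanodes)
blockreport_gap_s = mean_gap_seconds(blockreport_interval_ms / 1000, datanodes)

print(f"heartbeat every {heartbeat_gap_s:.1f} s")
print(f"block report every {blockreport_gap_s / 60:.0f} min")
```

Changing datanodes or either interval shows how quickly the namenode's message rate grows with cluster size.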
dfs.namenode.decommission.interval is only used when removing datanodes from the cluster. You can safely ignore it.
On 11-04-2013 07:19, Dibyendu Karmakar wrote:
I am testing Hadoop performance. I have come across the following parameters:
3. dfs.heartbeat.interval (default: 3)
4. dfs.blockreport.intervalMsec (default: 3600000)
5. dfs.namenode.handler.count (default: 10)
6. dfs.datanode.handler.count (default: 3)
7. dfs.replication.interval (default: 3)
8. dfs.namenode.decommission.interval (default: 300)
I have successfully tested parameters 1 and 2. But the rest of the
parameters, starting from dfs.heartbeat.interval, are confusing me a lot.
If I increase those parameters, will Hadoop perform better
(considering read and write operations separately)?
Or do I have to decrease those parameters to make Hadoop perform better?
Can anyone please help? If possible, please explain
dfs.namenode.handler.count and dfs.datanode.handler.count, i.e. what
do these two parameters do?