Hadoop >> mail # user >> toward Rack-Awareness approach


toward Rack-Awareness approach
Hi Hadoopers,

I am currently running Hadoop 0.20.203 in production, with 600 TB of data in the cluster.
I am planning to enable rack awareness in production, but I have not
fully thought it through yet.

My plan and questions:

1. I have a script that resolves a datanode/tasktracker IP to a rack name.
2. Add topology.script.file.name to hdfs-site.xml and restart the cluster.
3. After the cluster comes back up, my questions start here:
    - Do I have to run the balancer, fsck, or some other command to get those
600 TB redistributed across the racks in one go?
    - I currently run the balancer for 2 hours every day. Can I keep this
routine and expect that at some point the data will be properly
redistributed and rack-aware?
    - How can we tell that the data in the cluster is now fully
rack-aware?
    - If I just add the script and run the balancer for 2 hours every day,
then until all of the data becomes rack-aware the cluster will hold a
      mix of existing data still on "default-rack" (not yet
rebalanced) and, probably, newly loaded data that is rack-aware.
      Is it OK to have default-rack and rack-specific data mixed together?

4. Any other thoughts?
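
For illustration, the script from step 1 might look like the minimal sketch below. Hadoop calls the topology script with one or more IPs or hostnames as arguments and reads one rack name per argument from standard output. The subnet prefixes and rack names here are hypothetical placeholders, not the real cluster layout, and unknown hosts fall back to "/default-rack". Note that topology.script.file.name is usually set in core-site.xml so that both the NameNode and the JobTracker pick it up; treat that placement as something to verify for your version.

```python
#!/usr/bin/env python
"""Sketch of a rack topology script (hypothetical mapping, for illustration)."""
import sys

# Rack reported for any host that matches no known subnet.
DEFAULT_RACK = "/default-rack"

# Hypothetical IP-prefix to rack mapping; replace with the real layout.
# The trailing dot keeps "10.1.1." from also matching "10.1.10.x".
RACK_BY_SUBNET = {
    "10.1.1.": "/dc1/rack1",
    "10.1.2.": "/dc1/rack2",
}


def resolve_rack(host):
    """Return the rack path for one IP, or DEFAULT_RACK if it is unknown."""
    for prefix, rack in RACK_BY_SUBNET.items():
        if host.startswith(prefix):
            return rack
    return DEFAULT_RACK


def resolve_all(hosts):
    """Resolve every host, preserving argument order (one rack per host)."""
    return [resolve_rack(h) for h in hosts]


if __name__ == "__main__":
    # Hadoop may pass several hosts in a single invocation.
    for rack in resolve_all(sys.argv[1:]):
        print(rack)
```

The script must be executable by the user running the daemons, and it has to print exactly one rack for each argument it receives.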

I hope this makes sense.

Thanks in advance
Patai