Hadoop >> mail # user >> toward Rack-Awareness approach

toward Rack-Awareness approach
Hi Hadoopers,

I am currently running Hadoop 0.20.203 in production, with 600 TB of data
in the cluster. I am planning to enable rack awareness in production, but
I have not thought it all the way through yet.

Plan/questions:

1. I have a script that resolves a datanode/tasktracker IP to a rack name.
2. Add topology.script.file.name to hdfs-site.xml and restart the cluster.
3. After the cluster comes back up, my questions start here:
    - Do I have to run the balancer, fsck, or some other command to get
      the 600 TB redistributed across the racks in one go?
    - I currently run the balancer for 2 hours every day. Can I keep this
      routine and expect that at some point the data will be evenly
      redistributed and placed according to rack location?
    - How can we tell when the data in the cluster is fully rack-aware?
    - If I just add the script and keep running the balancer for 2 hours
      every day, then until all the data is rack-aware the cluster will
      hold a mix: existing data (not yet rebalanced) still placed as if
      on "default-rack", and newly loaded data that is probably placed
      rack-aware. Is it OK to have default-rack and rack-specific data
      mixed together?

4. Any other thoughts?
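
For reference, a topology script of the kind step 1 describes might look
roughly like the sketch below. Hadoop invokes the script named by
topology.script.file.name with one or more addresses as arguments and
reads one rack path per argument from standard output. The
subnet-to-rack table, the rack names, and the helper name resolve_rack
are made up for illustration, and the sketch only handles IPv4
addresses; anything it does not recognise falls back to /default-rack.

```python
#!/usr/bin/env python
"""Sketch of a rack topology script; the mapping below is hypothetical."""
import sys

# Hypothetical table: first three octets of a node's IP -> rack path.
RACK_BY_SUBNET = {
    "10.1.1": "/dc1/rack1",
    "10.1.2": "/dc1/rack2",
}

# Rack path returned for any address that is not in the table.
DEFAULT_RACK = "/default-rack"


def resolve_rack(address):
    """Return the rack path for an IPv4 address, or the default rack."""
    # Drop the last dotted component to get the /24 prefix used as the key.
    subnet = address.rsplit(".", 1)[0]
    return RACK_BY_SUBNET.get(subnet, DEFAULT_RACK)


if __name__ == "__main__":
    # Print one rack path per argument, in the same order as the arguments.
    print(" ".join(resolve_rack(arg) for arg in sys.argv[1:]))
```

A table lookup keeps the script fast, which matters because it is called
whenever a node's rack has to be resolved. The script must be executable
by the user running the daemons.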

Hope this makes sense.

Thanks in advance
Patai