Hadoop, mail # user - hadoop infrastructure questions (production environment)


hadoop infrastructure questions (production environment)
Oleg Ruchovets 2011-02-08, 15:45
Hi, we are about to go into production and have some questions:

   We are using the 0.20_append version (as I understand it, this is an HBase 0.90
requirement).
   1) Currently we have to process 50GB of text files per day, and this can grow to
150GB.
          -- What is the best Hadoop file size for our load, and is there a
suggested disk block size for that file size?
          -- We have been using gz files, and I saw that 1 map task was assigned
to every file.
                  What is the best practice: to work with gz files and save
disk space, or to work with uncompressed files?
                  Let's say we want the performance benefits and disk
space is less critical.

   2) Currently, when we add a machine to the grid, we need to manually
maintain all files and configurations.
         Is it possible to auto-deploy Hadoop servers without having to
manually define each one on all nodes?
   3) Can we change masters without reinstalling the entire grid?
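
To make question 1 concrete, here is a rough sketch of the arithmetic behind the gz observation. It assumes that a gzip file cannot be split (which matches the one map task per file we saw) and that a splittable text file gets about one map task per block. The function name, the 50 x 1GB file layout, and the 64MB and 128MB block sizes are illustrative assumptions, not measurements from our grid.

```python
import math

def estimated_map_tasks(file_sizes_mb, block_size_mb=64, splittable=True):
    """Rough estimate of map tasks for a set of input files (hypothetical helper)."""
    if not splittable:
        # Assumption: a gzip file is not splittable, so each file gets one map task.
        return len(file_sizes_mb)
    # Assumption: splittable input gets roughly one map task per block-sized split.
    return sum(max(1, math.ceil(size / block_size_mb)) for size in file_sizes_mb)

# Hypothetical layout: 50GB per day arriving as 50 files of 1GB (1024MB) each.
daily_files_mb = [1024] * 50

print("gz files:          ", estimated_map_tasks(daily_files_mb, splittable=False))
print("plain, 64MB blocks: ", estimated_map_tasks(daily_files_mb, block_size_mb=64))
print("plain, 128MB blocks:", estimated_map_tasks(daily_files_mb, block_size_mb=128))
```

Under these assumptions the trade-off in question 1 is between fewer, longer map tasks with gz and many more parallel map tasks with uncompressed text.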

Thanks in advance,
 Oleg