Hadoop, mail # user - small files and number of mappers
small files and number of mappers
Marc Sturlese 2010-11-29, 23:26

Hey there,
I am doing some tests and wondering what the best practices are for dealing
with very small files (1 MB or even less) that are continuously being
generated.

I see that if I have hundreds of small files in HDFS, Hadoop will automatically
create A LOT of map tasks to consume them. Each map task takes 10
seconds or less... I don't know if it's possible to change the number of map
tasks from Java code using the new API (I know it can be done with the old
one). I would like to do something like NumMapTasksCalculatedByHadoop * 0.3.
That way, fewer map tasks would be instantiated and each would run for longer.
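
Another route I am looking at, instead of scaling the number Hadoop computes,
is packing many small files into each split with CombineTextInputFormat from
the new (org.apache.hadoop.mapreduce) API. Just a rough driver sketch of what
I mean; I am assuming a newer release that ships CombineTextInputFormat and
Job.getInstance, and the 128 MB cap plus the map-only identity job are
arbitrary placeholders (on older 0.20-era releases one would apparently have
to subclass CombineFileInputFormat instead):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CombineSmallFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "combine-small-files");
    job.setJarByClass(CombineSmallFiles.class);

    // One split (and hence one map task) per ~128 MB of small files,
    // instead of one map task per file.
    job.setInputFormatClass(CombineTextInputFormat.class);
    CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);

    // Map-only identity pass: keys/values come straight from the input format.
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

As far as I understand, CombineFileInputFormat tries to group blocks by node
and rack when it builds the combined splits, so each mapper gets enough input
to amortize its startup cost without giving up locality entirely.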

I have had a look at Hadoop archives as well, but I don't think they can help
me here.

Any advice or similar experience?
Thanks in advance.
--
View this message in context: http://lucene.472066.n3.nabble.com/small-files-and-number-of-mappers-tp1989598p1989598.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.