I am doing some tests and wondering what the best practices are for dealing
with very small files that are continuously being generated (1 MB or even smaller).
I see that if I have hundreds of small files in HDFS, Hadoop automatically
creates a LOT of map tasks to consume them, and each map task takes 10
seconds or less. I don't know if it's possible to change the number of map
tasks from Java code using the new API (I know it can be done with the old
one). I would like to do something like NumMapTasksCalculatedByHadoop * 0.3.
This way, fewer map tasks would be instantiated and each would be working
on a larger amount of data.
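One approach that gives fewer, longer-running mappers over many small files (rather than scaling the computed task count directly) is to use a combining input format, which packs several files into each split. The sketch below is just an illustration and assumes a reasonably recent Hadoop where the new-API CombineTextInputFormat is available (on older releases you would subclass CombineFileInputFormat yourself, and would use new Job(conf, ...) instead of Job.getInstance); the class name SmallFilesJob, the 128 MB cap, and the argument paths are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SmallFilesJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "small-files");
    job.setJarByClass(SmallFilesJob.class);

    // One split (and therefore one map task) now covers many small files,
    // instead of one map task per file.
    job.setInputFormatClass(CombineTextInputFormat.class);

    // Cap each combined split at 128 MB (value in bytes); tune to taste.
    // Without a cap, all files reachable from one node can collapse into a single split.
    CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);

    CombineTextInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Set mapper/reducer classes, output key/value types, etc. as usual here.

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The same cap can, as far as I know, also be set without code via the split-size configuration property (mapreduce.input.fileinputformat.split.maxsize in Hadoop 2, mapred.max.split.size in older versions), e.g. with a -D option on the command line.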
I have had a look at Hadoop Archives as well, but I don't think they can help me here.
Any advice or similar experience?
Thanks in advance.