Hadoop, mail # user - How to make zip files as Hadoop input


How to make zip files as Hadoop input
Mark Kerzner 2011-03-02, 05:45
Hi,

I have a bunch of zip files that I want to serve as input to a MapReduce
job. My initial design was to list them in a text file and then give this
list file as input. The list file would be read, and each line would be
handed off to a node to process, which would pick up the corresponding zip
file and work on it.
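
The list-file design above can be sketched as follows. This is only a sketch under assumptions: `ZipListFile`, `listZipFiles`, and `writeListFile` are hypothetical names, and it walks a local directory with `java.nio` rather than HDFS. On the Hadoop side, an input format such as `NLineInputFormat` could then hand each line of the list to a separate map task.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Hadoop): builds the "list file" of zip paths.
class ZipListFile {

    // Collects the absolute paths of all *.zip files directly inside inputDir,
    // sorted so the list file has a stable order.
    static List<String> listZipFiles(Path inputDir) throws IOException {
        List<String> paths = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir, "*.zip")) {
            for (Path p : stream) {
                paths.add(p.toAbsolutePath().toString());
            }
        }
        Collections.sort(paths);
        return paths;
    }

    // Writes one zip path per line; each line is meant to be one unit of work.
    static void writeListFile(Path inputDir, Path listFile) throws IOException {
        Files.write(listFile, listZipFiles(inputDir), StandardCharsets.UTF_8);
    }

    // Usage: java ZipListFile <directory-with-zips> <list-file-to-create>
    public static void main(String[] args) throws IOException {
        writeListFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The list file costs an extra step, but it keeps the job input tiny and leaves each task free to open its zip however it likes.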

But I feel that a better design is possible, and that the list file is
redundant. Can I just pass the directory containing the zip files as the job
input? If so, how do I make sure each node gets a whole file to process?
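
For the directory-as-input idea, the per-file work might look like the sketch below. It assumes each task receives one whole zip file as a raw byte stream, which in Hadoop would probably mean a custom input format that refuses to split files (for example by overriding `isSplitable` in a `FileInputFormat` subclass). `WholeZipReader` and `entrySizes` are hypothetical names, and the code uses only `java.util.zip`, not any Hadoop API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper (not part of Hadoop): what one task could do with one zip.
class WholeZipReader {

    // Reads every file entry of a zip stream from start to finish and returns
    // entry name -> uncompressed size in bytes, in archive order.
    static Map<String, Long> entrySizes(InputStream rawZip) throws IOException {
        Map<String, Long> sizes = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(rawZip)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no data
                }
                long total = 0;
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zip.read(buffer)) != -1) {
                    total += n;
                }
                sizes.put(entry.getName(), total);
            }
        }
        return sizes;
    }
}
```

Counting bytes stands in for the real processing; the point is that the archive is consumed sequentially by a single reader, which is why the file should not be split across tasks.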

Thank you,
Mark