I have a bunch of zip files that I want to serve as input to a MapReduce
job. My initial design was to list their paths in a text file and give that
list file to the job as input. Each line of the list would be handed to a
node, which would then fetch the corresponding zip file and process it.
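
To make the list-file design concrete, here is a rough local sketch of the per-line work I have in mind. It uses only Python's standard library and no Hadoop API. The names process_zip and process_list_file are made up for illustration, and counting entries is just a placeholder for the real processing. In the actual job each line would go to a different node; here the lines are simply handled one after another.

```python
import zipfile
from pathlib import Path


def process_zip(zip_path):
    """Open one zip archive and return (archive name, number of entries).

    Counting entries is a placeholder for the real per-file work.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return Path(zip_path).name, len(archive.namelist())


def process_list_file(list_path):
    """Treat every non-empty line of the list file as the path of one zip."""
    results = []
    with open(list_path, encoding="utf-8") as listing:
        for line in listing:
            zip_path = line.strip()
            if zip_path:  # skip blank lines
                results.append(process_zip(zip_path))
    return results
```
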
But I suspect a better design is possible and that the list file is redundant.
Can I just pass the directory containing the zip files as the job's input?
If so, how do I make sure each node gets one file to process?