map-red with many input paths
Koert Kuipers 2012-10-17, 00:25
Currently I run a map-reduce job that reads from a single path with a glob: "/data/*". I am considering replacing this one glob path with an explicit list of all the paths, so that I can check for _SUCCESS files in the subdirs and exclude the subdirs that don't have this file, to avoid reading from subdirs while data is still being written to them. There are hundreds of subdirectories in /data, and there will be thousands soon.

Is there a limit on how many paths I can include for a map-red job? Is there a smarter way to do this?

Thanks!
Koert
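For illustration, the filtering idea described above could be sketched roughly as follows. This is only a minimal sketch that walks a local directory with Python's standard library; on a real cluster the listing would go through the HDFS client instead. The helper names (`completed_subdirs`, `as_input_paths`, `demo`) are made up for this example and are not part of Hadoop. The comma-separated form is used because Hadoop's FileInputFormat can take its input paths as one comma-separated string.

```python
import tempfile
from pathlib import Path

# Marker file name taken from the question above.
SUCCESS_MARKER = "_SUCCESS"


def completed_subdirs(root):
    """Return the sorted subdirectories of root that contain a _SUCCESS file.

    Hypothetical helper: it reads the local filesystem, not HDFS.
    """
    return sorted(
        child
        for child in Path(root).iterdir()
        if child.is_dir() and (child / SUCCESS_MARKER).is_file()
    )


def as_input_paths(dirs):
    """Join directories into a single comma-separated input-path string."""
    return ",".join(str(d) for d in dirs)


def demo():
    """Build a throwaway tree with two finished and one unfinished subdir."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, done in [("a", True), ("b", False), ("c", True)]:
            (root / name).mkdir()
            if done:
                (root / name / SUCCESS_MARKER).touch()
        # Only the subdirs holding the marker should survive the filter.
        return [d.name for d in completed_subdirs(root)]
```

With this shape the job would be handed one string built from the filtered list rather than the "/data/*" glob, so a subdir that is still being written is simply left out until its marker appears.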
Lohit 2012-10-17, 00:46