I want to run the wordcount example on YARN with compression enabled,
using a dir with several files as input, but for that I must compress
the input first.
1 - Should I compress the whole dir or each file in the dir?
2 - Should I use gzip or bzip2?
3 - Do I need to set up any YARN configuration file?
4 - When the job is running, are the files decompressed before the
mappers run and compressed again after the reducers have executed?