MapReduce >> mail # user >> How to execute wordcount with compression?


How to execute wordcount with compression?
Hi,
I want to run the wordcount example on YARN with compression enabled. The
input is a directory with several files, so I must compress the input first.

dir1/file1.txt
dir1/file2.txt
dir1/file3.txt
dir1/file4.txt
dir1/file5.txt

1 - Should I compress the whole dir or each file in the dir?

2 - Should I use gzip or bzip2?

3 - Do I need to set up any YARN configuration file?

4 - While the job is running, are the files decompressed before the
mappers run and compressed again after the reducers have finished?
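
To make question 1 concrete, here is a minimal sketch of the per-file option, assuming each file is compressed on its own so that the job can still be pointed at a directory. Hadoop's text input formats pick the decompression codec from the file extension (.gz, .bz2), which is why the sketch keeps the original file name and only appends .gz. The helper name gzip_each_file and the output directory dir1_gz are hypothetical, and this is plain local Python run before uploading to HDFS, not a Hadoop API.

```python
import gzip
import shutil
from pathlib import Path


def gzip_each_file(input_dir, output_dir):
    """Write one .gz file per regular file in input_dir; return the new file names."""
    src = Path(input_dir)
    dst = Path(output_dir)
    dst.mkdir(parents=True, exist_ok=True)

    written = []
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue  # skip subdirectories; only plain files are compressed

        # Keep the original name and append .gz (file1.txt -> file1.txt.gz).
        target = dst / (path.name + ".gz")
        with path.open("rb") as fin, gzip.open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(target.name)
    return written


if __name__ == "__main__":
    # Hypothetical local paths matching the layout in this message.
    print(gzip_each_file("dir1", "dir1_gz"))
```

The resulting dir1_gz directory would then be uploaded to HDFS and used as the job input. One point that bears on question 2: a gzip file cannot be split, so each .gz file is read by a single mapper, while bzip2 files can be split across mappers.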

--
Thanks,