MapReduce >> mail # user >> How to execute wordcount with compression?


How to execute wordcount with compression?
Hi,
I want to run the wordcount example on YARN with compression enabled, on a
directory containing several files, so I first need to compress the input.

dir1/file1.txt
dir1/file2.txt
dir1/file3.txt
dir1/file4.txt
dir1/file5.txt
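For example, compressing each file individually (a local sketch; the file names and contents are just placeholders for my real input):

```shell
# Recreate a sample input directory (stand-in for my real dir1)
mkdir -p dir1
printf 'hello world\n' > dir1/file1.txt
printf 'foo bar baz\n' > dir1/file2.txt

# gzip each file individually: fileN.txt is replaced by fileN.txt.gz,
# so the directory keeps one compressed file per original file.
gzip dir1/*.txt

# bzip2 works the same way (fileN.txt -> fileN.txt.bz2):
#   bzip2 dir1/*.txt
```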

1 - Should I compress the whole dir or each file in the dir?

2 - Should I use gzip or bzip2?

3 - Do I need to setup any yarn configuration file?

4 - When the job is running, are the files decompressed before the mappers
run, and compressed again after the reducers have finished?
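For context, this is roughly what I am trying to run (a sketch; the jar path and output directory are placeholders, and I am not certain these are the right properties to set):

```shell
hadoop jar hadoop-mapreduce-examples.jar wordcount \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  -D mapreduce.map.output.compress=true \
  dir1 output
```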

--
Thanks,