Pig >> mail # user >> how to control the number of mappers?


I have a Pig script that is basically a map-only job:

raw = LOAD 'input.txt';

processed = FOREACH raw GENERATE convert_somehow($1, $2, ...);

STORE processed INTO 'output.txt';

I have many nodes in my cluster, so I want Pig to process the input with
more mappers, but it generates only 2 part-m-xxxxx files, i.e. it
uses only 2 mappers.

With a plain Hadoop job it's possible to influence the mapper count by
passing -Dmapred.min.split.size= on the command line. Would this also
work for Pig? The PARALLEL keyword only affects reducers.
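In other words, I'm wondering whether setting the properties inline would do it, e.g. (assuming Pig's `set` command forwards these to the underlying Hadoop job, and that the old mapred.* property names are the right ones):

```
-- sketch: assumes Pig's SET command passes these through to the Hadoop job
set mapred.min.split.size 1048576;     -- lower bound per split (1 MB)
set mapred.max.split.size 16777216;    -- upper bound per split (16 MB) -> more, smaller splits

raw = LOAD 'input.txt';
processed = FOREACH raw GENERATE convert_somehow($1, $2, ...);
STORE processed INTO 'output.txt';
```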
Thanks
Yang