Pig, mail # user - how to control the number of mappers?


Yang 2012-01-12, 02:12
I have a Pig script that basically does a map-only job:

raw = LOAD 'input.txt';

processed = FOREACH raw GENERATE convert_somehow($1,$2...);

STORE processed INTO 'output.txt';

I have many nodes in my cluster, so I want Pig to process the input with
more mappers, but it generates only 2 part-m-xxxxx files, i.e. it uses
only 2 mappers.

In a plain Hadoop job it's possible to pass a mapper count and
-Dmapred.min.split.size= ; would this also work for Pig? The PARALLEL
keyword only affects reducers.
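For a plain Hadoop job, the number of map tasks follows from how the input is cut into splits, and a map-only job writes one part-m-NNNNN file per map task. A minimal sketch, assuming the split rules of Hadoop's old org.apache.hadoop.mapred.FileInputFormat for a single splittable, uncompressed file: the mapper count (mapred.map.tasks) is only a hint that sets a goal size, and mapred.min.split.size is a lower bound on the split size. Pig's own input handling may differ (it can, for example, combine small splits), so treat this as an approximation. The function names and the 128 MB file / 64 MB block sizes used in the checks are hypothetical.

```python
# Hypothetical sketch of old-API (mapred) FileInputFormat split arithmetic.
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(goal_size, min_size, block_size):
    # max(minSize, min(goalSize, blockSize)), as in the old mapred API
    return max(min_size, min(goal_size, block_size))


def count_splits(file_size, block_size, map_tasks_hint=1, min_split_size=1):
    """Approximate split count for ONE splittable, non-empty file."""
    # The mapper-count hint only determines a goal size per split
    goal_size = file_size // max(map_tasks_hint, 1)
    # mapred.min.split.size is clamped to at least 1 byte
    split_size = compute_split_size(goal_size, max(min_split_size, 1), block_size)
    splits = 0
    remaining = file_size
    # Cut full-size splits while more than SPLIT_SLOP * split_size remains
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final split
    if remaining:
        splits += 1
    return splits


def part_files(num_map_tasks):
    # A map-only job writes one output file per map task
    return [f"part-m-{i:05d}" for i in range(num_map_tasks)]


MB = 1024 * 1024
print(count_splits(128 * MB, 64 * MB))  # one split per block
print(count_splits(128 * MB, 64 * MB, map_tasks_hint=8))  # hint shrinks splits
print(count_splits(128 * MB, 64 * MB, map_tasks_hint=8, min_split_size=128 * MB))
print(part_files(2))
```

Under this formula, raising mapred.min.split.size can only make splits larger (fewer mappers), while a larger mapper-count hint can shrink splits below the block size.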
Thanks
Yang