Pig, mail # user - how to control the number of mappers?


Re: how to control the number of mappers?
Yang 2012-01-17, 20:46
Thanks, but from http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#set
it looks like the parameters that can be 'set' are very limited, and they do
not include the min split size and mapper count that I want.

On Wed, Jan 11, 2012 at 9:52 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Yes, you can use the "set" keyword to set such properties in the script.
>
> On Jan 11, 2012, at 6:12 PM, Yang <[EMAIL PROTECTED]> wrote:
>
> > I have a Pig script that is basically a map-only job:
> >
> > raw = LOAD 'input.txt' ;
> >
> > processed = FOREACH raw GENERATE convert_somehow($1,$2...);
> >
> > store processed into 'output.txt';
> >
> >
> >
> > I have many nodes on my cluster, so I want Pig to process the input with
> > more mappers, but it generates only 2 part-m-xxxxx files, i.e. it is
> > using 2 mappers.
> >
> > In a Hadoop job it's possible to pass the mapper count and
> > -Dmapred.min.split.size= . Would this also work for Pig? The PARALLEL
> > keyword only works for reducers.
> >
> >
> > Thanks
> > Yang
>
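
For readers following the "set" suggestion quoted above, here is a minimal sketch of what it might look like. It assumes a Pig release newer than the r0.7.0 documentation linked in this message, one in which "set" passes arbitrary Hadoop properties through to the job. Only mapred.min.split.size comes from this thread. mapred.max.split.size and pig.splitCombination are assumptions about the Hadoop and Pig versions in use, and the byte value is purely illustrative.

```pig
-- Sketch only: assumes a Pig version whose SET accepts arbitrary properties.
-- Place these lines at the top of the script, before the LOAD statement.

-- Assumed Hadoop property: cap each input split at 32 MB (illustrative value),
-- so a large input file is divided into more splits, hence more mappers.
set mapred.max.split.size 33554432;

-- Assumed Pig property: stop Pig from combining small splits back into
-- larger ones, which would otherwise undo the smaller split size.
set pig.splitCombination false;

-- The rest of the script (LOAD / FOREACH / STORE) stays unchanged.
```

With the new-API FileInputFormat, the split size is computed as max(min split size, min(max split size, block size)). Under that rule, lowering the maximum is what increases the mapper count, while raising mapred.min.split.size can only reduce it. Whether a given loader follows this rule should be checked against the Pig version in use.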