Pig user mailing list: how to control the number of mappers?


Re: how to control the number of mappers?
Thanks, but from http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#set
it looks like the set of parameters that can be 'set' is very limited, and it
does not include the min split size or mapper count that I want.
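
The r0.7.0 page only documents a handful of keys, but if the Pig build in use
forwards arbitrary Hadoop properties given to "set" into the job configuration
(as the quoted reply below suggests), a script along these lines would be the
place to try it. This is an untested sketch: the property names are the two
mentioned in this thread and the byte/count values are arbitrary examples.

-- untested sketch: mapred.map.tasks is only a hint to Hadoop, and
-- mapred.min.split.size sets a lower bound on the input split size
set mapred.map.tasks '20';
set mapred.min.split.size '1048576';

raw = LOAD 'input.txt';
processed = FOREACH raw GENERATE convert_somehow($1, $2);
STORE processed INTO 'output.txt';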

On Wed, Jan 11, 2012 at 9:52 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Yes, you can use the "set" keyword to set such properties in the script.
>
> On Jan 11, 2012, at 6:12 PM, Yang <[EMAIL PROTECTED]> wrote:
>
> > I have a pig script  that does basically a map-only job:
> >
> > raw = LOAD 'input.txt' ;
> >
> > processed = FOREACH raw GENERATE convert_somehow($1,$2...);
> >
> > STORE processed INTO 'output.txt';
> >
> >
> >
> > I have many nodes on my cluster, so I want Pig to process the input with
> > more mappers, but it generates only 2 part-m-xxxxx files, i.e. it uses
> > only 2 mappers.
> >
> > For a Hadoop job it's possible to pass the mapper count and
> > -Dmapred.min.split.size= ; would this also work for Pig? The PARALLEL
> > keyword only works for reducers.
> >
> >
> > Thanks
> > Yang
>
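
The other route this thread points at is passing the same Hadoop properties on
the pig command line instead of inside the script. Whether they actually reach
the job configuration depends on the Pig version, so treat the following as an
untested sketch; "myscript.pig" stands in for the script above and the values
are arbitrary examples.

# untested sketch: pass Hadoop properties as -D options when launching Pig
pig -Dmapred.min.split.size=1048576 \
    -Dmapred.map.tasks=20 \
    myscript.pig

Either way, the number of map tasks ultimately comes from how the input is
split, while PARALLEL (or default_parallel) only controls the number of
reducers, as noted in the thread above.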