Re: how to control the number of mappers?
Weird.

I tried:

# head a.pg

set job.name 'blah';
SET mapred.map.tasks.speculative.execution false;
set mapred.min.split.size 10000;

set mapred.tasktracker.map.tasks.maximum 10000;
[root@]# pig a.pg
2012-01-17 16:19:18,407 [main] INFO  org.apache.pig.Main - Logging error messages to: /mnt/pig_1326835158407.log
2012-01-17 16:19:18,564 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://ec2-107-22-118-169.compute-1.amazonaws.com:8020/
2012-01-17 16:19:18,749 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: ec2-107-22-118-169.compute-1.amazonaws.com:8021
2012-01-17 16:19:18,858 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Unrecognized set key: mapred.map.tasks.speculative.execution
Details at logfile: /mnt/pig_1326835158407.log
Pig Stack Trace
---------------
ERROR 1000: Error during parsing. Unrecognized set key: mapred.map.tasks.speculative.execution

org.apache.pig.tools.pigscript.parser.ParseException: Unrecognized set key: mapred.map.tasks.speculative.execution
        at org.apache.pig.tools.grunt.GruntParser.processSet(GruntParser.java:459)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:429)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
        at org.apache.pig.Main.main(Main.java:397)
===============================================================================

So the job.name parameter is accepted, but the next one,
mapred.map.tasks.speculative.execution, was unrecognized, even though that is
the exact line I pasted from the docs page.
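
One workaround I may try (an untested sketch; it assumes the pig launcher
forwards -D Java properties that appear before the script name, which I
believe it does) is to pass the Hadoop property on the command line,
bypassing Grunt's set parser entirely:

[root@]# pig -Dmapred.map.tasks.speculative.execution=false a.pg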
On Tue, Jan 17, 2012 at 1:15 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> http://pig.apache.org/docs/r0.9.1/cmds.html#set
>
> "All Pig and Hadoop properties can be set, either in the Pig script or via
> the Grunt command line."
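>
> A minimal example in the documented form (Pig 0.9 syntax; using the same
> property as in your script):
>
>     grunt> set mapred.map.tasks.speculative.execution false;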
>
> On Tue, Jan 17, 2012 at 12:53 PM, Yang <[EMAIL PROTECTED]> wrote:
>
> > Prashant:
> >
> > I tried splitting the input files; yes, that worked, and multiple mappers
> > were indeed created.
> >
> > But then I would have to create a separate stage simply to split the input
> > files, so that is a bit cumbersome. It would be nice if there were some
> > control to directly limit the map input split size, etc.
> >
> > Thanks
> > Yang
> >
> > On Wed, Jan 11, 2012 at 7:46 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
> >
> > > By block size I mean the actual HDFS block size. Based on your
> > > requirement, it seems the input files are extremely small, so reducing
> > > the block size is not an option.
> > >
> > > Specifying "mapred.min.split.size" would not work for either plain
> > > Hadoop/Java MR or Pig: Hadoop picks the maximum of (minSplitSize,
> > > blockSize), so raising the minimum split size can only make splits
> > > larger (fewer mappers), never smaller.
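> > >
> > > A rough sketch of the split-size math (my reading of Hadoop 0.20's
> > > FileInputFormat; the numbers below are made up for illustration):
> > >
> > >   splitSize = max(minSplitSize, min(goalSize, blockSize))
> > >   where goalSize = totalInputSize / requested number of map tasks
> > >
> > > With blockSize = 64 MB and minSplitSize = 10 KB, splitSize stays at
> > > 64 MB, so a small minimum split size never produces more mappers.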
> > >
> > > Your job is more CPU-intensive than I/O-bound. I can think of splitting
> > > your input into multiple files (equal to the number of map tasks on your
> > > cluster) and turning off split combination (pig.splitCombination=false).
> > > Though this is generally a terrible MR practice!
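> > >
> > > Concretely, the second option might look like this in your script (the
> > > property name is the one above; an untested sketch):
> > >
> > >   set pig.splitCombination false;
> > >
> > > with the input pre-split into roughly one file per desired map task.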
> > >
> > > Another thing you could try is giving your map tasks more memory by
> > > setting "mapred.child.java.opts" to allow a larger heap.
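> > >
> > > For example (the 2 GB heap is just an illustration, not a
> > > recommendation):
> > >
> > >   set mapred.child.java.opts '-Xmx2048m';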
> > >
> > > Thanks,
> > > Prashant
> > >
> > >
> > > On Wed, Jan 11, 2012 at 6:27 PM, Yang <[EMAIL PROTECTED]> wrote:
> > >
> > > > Prashant:
> > > >
> > > > thanks.
> > > >
> > > > by "reducing the block size", do you mean split size ? ---- block
> size
> > > > is fixed on a hadoop hdfs.
> > > >
> > > > my application is not really data heavy, each line of input takes a
> > > > long while to process. as a result, the input size is small, but
> total
> > > > processing time is long, and the potential parallelism is high
> > > >
> > > > Yang
> > > >
> > > > On Wed, Jan 11, 2012 at 6:21 PM, Prashant Kommireddi