Pig >> mail # user >> Is there a way to limit the number of maps produced by HBaseStorage ?


Vincent Barat 2013-01-21, 13:27
Mohammad Tariq 2013-01-21, 13:43
Re: Is there a way to limit the number of maps produced by HBaseStorage ?
Hi Vincent,

You can restrict the number of concurrent map tasks per TaskTracker by
setting the parameter mapred.tasktracker.map.tasks.maximum to 1 or 2.

Thanks,
Nagamallikarjuna
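A sketch of what that setting might look like in mapred-site.xml (MRv1 property naming; note this is a cluster-level TaskTracker setting, applied per node, and it takes effect only after the TaskTrackers are restarted):

```xml
<!-- mapred-site.xml on each TaskTracker node -->
<!-- Caps the map slots per node; this affects ALL jobs on the cluster, -->
<!-- not just the Pig/HBaseStorage job in question. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
```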

On Mon, Jan 21, 2013 at 7:13 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Hello Vincent,
>
>          The number of map tasks for a job is primarily governed by the
> InputSplits and the InputFormat you are using. So setting it through a
> config parameter doesn't guarantee that your job would have the specified
> number of map tasks. However, you can give it a try by using "set
> mapred.map.tasks=n" in your PigLatin job.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Mon, Jan 21, 2013 at 6:57 PM, Vincent Barat <[EMAIL PROTECTED]
> >wrote:
>
> > Hi,
> >
> > We are using HBaseStorage intensively to load data from tables having
> > more than 100 regions.
> >
> > HBaseStorage generates one map per region, and since our cluster has 50
> > map slots, our Pig scripts end up starting 50 maps reading data from
> > HBase concurrently.
> >
> > The problem is that our HBase cluster has only 10 nodes, so the maps
> > overload it (5 intensive readers per node is too much to bear).
> >
> > So the question: is there a way to tell Pig to limit the number of maps
> > to a given maximum (e.g. 10)?
> > If not, how can I patch the code to do this?
> >
> > Thanks a lot for your help
> >
>
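Tariq's per-job suggestion could be sketched in Pig Latin as follows (table and column names are hypothetical; note that mapred.map.tasks is only a hint to the framework, and since HBaseStorage derives one InputSplit per region, the actual map count may still match the region count):

```pig
-- Hint the desired number of map tasks for this job only
-- (no cluster restart needed, but not guaranteed to be honored).
set mapred.map.tasks 10;

-- Hypothetical load from an HBase table 'users', column family 'info'
raw = LOAD 'hbase://users'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:*', '-loadKey true')
      AS (rowkey:bytearray, info:map[]);
```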

--
Thanks and Regards
Nagamallikarjuna