Re: mapred.map.tasks getting set, but not sure where
It also seems logical that launching 4000 map tasks on a 20-node cluster is going to carry a lot of overhead.  That said, 20 does not seem like the ideal number either, but I don't really know the internals of Cassandra that well.  You might want to post this question on the Cassandra list to see if they can help you identify a way to increase the number of map tasks.

--Bobby Evans
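
A minimal sketch of the kind of driver change being suggested here, assuming
the job uses Cassandra's 0.8/1.0-era ColumnFamilyInputFormat and ConfigHelper;
the keyspace, column family, and split-size values are placeholders, and the
method names may differ across Cassandra versions.  Shrinking the split size
is what increases the number of input splits, and therefore map tasks:

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SplitTuningDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // A smaller split size means more input splits, and therefore
            // more map tasks.  64k rows per split is only an example value.
            ConfigHelper.setInputSplitSize(conf, 64 * 1024);

            // "my_keyspace" / "my_cf" are placeholder names; the other
            // Cassandra connection settings are elided in this sketch.
            ConfigHelper.setInputColumnFamily(conf, "my_keyspace", "my_cf");

            Job job = new Job(conf, "split-tuning-sketch");
            job.setInputFormatClass(ColumnFamilyInputFormat.class);
            // ... mapper, reducer, and output setup elided ...

            // mapred.map.tasks is only a hint; the real map-task count is
            // whatever the input format's getSplits() returns.
            System.out.println("mapred.map.tasks = "
                    + job.getConfiguration().get("mapred.map.tasks"));
        }
    }
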

On 11/5/11 9:33 AM, "Brendan W." <[EMAIL PROTECTED]> wrote:

Yeah, that's my guess now too--somebody must have hacked the Cassandra
libs on me...I just wanted to see if there were other possibilities for
where that parameter was being set.

Thanks a lot for the help.

On Fri, Nov 4, 2011 at 2:11 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Could it just be that Cassandra has changed the way its splits are
> generated? Were the Cassandra client libs changed at any point? Have you
> looked at its input formats' sources?
>
> On 04-Nov-2011, at 10:05 PM, Brendan W. wrote:
>
> > Plain Java MR, using the Cassandra inputFormat to read out of Cassandra.
> >
> > Perhaps somebody hacked the inputFormat code on me...
> >
> > But what's weird is that the parameter mapred.map.tasks didn't appear in
> > the job confs before at all.  Now it does, with a value of 20 (happens to
> > be the # of machines in the cluster), and that's without the jobs or the
> > mapred-site.xml files themselves changing.
> >
> > The inputSplitSize is set explicitly in the jobs, and has not been
> > changed (except that I subsequently fiddled with it a little to see if it
> > affected the fact that I was getting 20 splits, and it didn't affect
> > that...it only changed the split size, not the number).
> >
> > After I submit the job, I get a message "TOTAL NUMBER OF SPLIT = 20"
> > before the list of input splits...it sort of looks like a hack, but I
> > can't find where it is.
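
A quick way to confirm what that job actually carries at submission time is
to print the relevant keys from its Configuration right before submitting.
This is only a sketch, assuming the classic pre-YARN property name
mapred.map.tasks and the 0.8/1.0-era Cassandra key cassandra.input.split.size
(the latter may differ by version):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public final class ConfCheck {
        // Call this right before job.submit() / job.waitForCompletion().
        static void dumpSplitSettings(Job job) {
            Configuration conf = job.getConfiguration();
            System.out.println("mapred.map.tasks           = "
                    + conf.get("mapred.map.tasks"));
            System.out.println("cassandra.input.split.size = "
                    + conf.get("cassandra.input.split.size"));
            // Configuration.toString() lists the resources (core-site.xml,
            // mapred-site.xml, job-specific files) the values came from.
            System.out.println("resources: " + conf);
        }
    }
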
> >
> > On Fri, Nov 4, 2011 at 11:58 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> >
> >> Brendan,
> >>
> >> Are these jobs (whose split behavior has changed) via Hive/etc. or plain
> >> Java MR?
> >>
> >> In case it's the former, do you have users using newer versions of them?
> >>
> >> On 04-Nov-2011, at 8:03 PM, Brendan W. wrote:
> >>
> >>> Hi,
> >>>
> >>> In the jobs running on my cluster of 20 machines, I used to run jobs
> >>> (via "hadoop jar ...") that would spawn around 4000 map tasks.  Now
> >>> when I run the same jobs, that number is 20; and I notice that in the
> >>> job configuration, the parameter mapred.map.tasks is set to 20, whereas
> >>> it never used to be present at all in the configuration file.
> >>>
> >>> Changing the input split size in the job doesn't affect this--I get
> >>> the split size I ask for, but the *number* of input splits is still
> >>> capped at 20--i.e., the job isn't reading all of my data.
> >>>
> >>> The mystery to me is where this parameter could be getting set.  It is
> >>> not present in the mapred-site.xml file in <hadoop home>/conf on any
> >>> machine in the cluster, and it is not being set in the job (I'm running
> >>> out of the same jar I always did; no updates).
> >>>
> >>> Is there *anywhere* else this parameter could possibly be getting set?
> >>> I've stopped and restarted map-reduce on the cluster with no
> >>> effect...it's getting re-read in from somewhere, but I can't figure out
> >>> where.
> >>>
> >>> Thanks a lot,
> >>>
> >>> Brendan
> >>
> >>
>
>
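
One blunt way to answer the "where else could it be getting set" question is
to dump the fully merged configuration the client would submit and diff it
against a dump from a known-good environment.  A minimal sketch using the old
org.apache.hadoop.mapred.JobConf API; the output file name is just an example:

    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.mapred.JobConf;

    public class DumpEffectiveConf {
        public static void main(String[] args) throws Exception {
            // JobConf() pulls in the *-site.xml files found on the classpath,
            // the same way "hadoop jar" does for a submitted job.
            JobConf conf = new JobConf();
            OutputStream out = new FileOutputStream("effective-conf.xml");
            conf.writeXml(out);   // every key/value after all resources merge
            out.close();
            System.out.println("mapred.map.tasks = "
                    + conf.get("mapred.map.tasks"));
        }
    }

The job.xml link on the JobTracker web UI shows the same merged view for a
job that has already been submitted.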
