Re: Increasing map reduce tasks will increase the CPU time, does this seem to be correct
imen Megdiche 2012-12-13, 12:55
Thank you for your explanations. I work in pseudo-distributed mode and
not on a cluster. Do your recommendations also apply in this mode, and
what can I do to make the execution time decrease as a function of the
number of map/reduce tasks, if that is possible?
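For reference, here is roughly how I vary the counts (a minimal sketch using
the classic JobConf API; the identity job and the argument order are just
placeholders for my real job):

    // Old org.apache.hadoop.mapred API (Hadoop 1.x era).
    // Runs an identity job so that only the task counts matter.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class TaskCountTest {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(TaskCountTest.class);
            conf.setJobName("task-count-test");
            conf.setMapperClass(IdentityMapper.class);
            conf.setReducerClass(IdentityReducer.class);
            conf.setOutputKeyClass(LongWritable.class);
            conf.setOutputValueClass(Text.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            conf.setNumMapTasks(Integer.parseInt(args[2]));    // only a hint to the framework
            conf.setNumReduceTasks(Integer.parseInt(args[3])); // honored exactly
            JobClient.runJob(conf);
        }
    }

I launch it with, e.g., hadoop jar test.jar TaskCountTest <in> <out> <maps> <reduces>.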
More generally, I do not understand how MapReduce can be more performant
for analysis than other systems such as data warehouses. For example, I
tested a simple query, "select sum(col1) from table1", with Hive: the
result takes on the order of 10 minutes with Hive versus on the order of
0.20 minutes with Oracle, for a data size of around 40 MB.
2012/12/13 Mohammad Tariq <[EMAIL PROTECTED]>
> Hello Imen,
> If you have a huge number of tasks, then the overhead of managing map
> and reduce task creation begins to dominate the total job execution time.
> Also, more tasks mean you need more free CPU slots. If the slots are not
> free, the data block of interest will be moved to some other node where
> free slots are available; that consumes time, and it also goes against the
> most basic principle of Hadoop, i.e. data locality. Note too that the total
> CPU time, summed over all tasks, will almost always grow as you add tasks,
> since each task pays a fixed startup cost; it is the wall-clock time of the
> job, not the total CPU time, that more parallelism can reduce. So, the
> number of maps and reduces should be raised keeping all these factors in
> mind, otherwise you may face performance issues.
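> To make this concrete, here is a rough sketch (using the newer
> org.apache.hadoop.mapreduce API; the exact knobs vary across Hadoop
> versions) showing that the number of maps follows the input splits rather
> than an explicit setting, while the number of reduces is set directly:
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.mapreduce.Job;
>     import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>
>     public class SplitSizeDemo {
>         public static void main(String[] args) throws Exception {
>             Job job = new Job(new Configuration(), "split-size-demo");
>             // Maps ~= input size / split size: a 4 MB max split on a
>             // 40 MB input gives about 10 maps, each paying its own
>             // startup cost.
>             FileInputFormat.setMaxInputSplitSize(job, 4L * 1024 * 1024);
>             // Reduces, unlike maps, are requested explicitly:
>             job.setNumReduceTasks(2);
>         }
>     }
>
> So on a small input, forcing more splits mostly adds startup cost without
> adding useful parallelism.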
> Mohammad Tariq
> On Thu, Dec 13, 2012 at 4:11 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>> If the number of maps or reducers your job launches is more than the
>> job queue/cluster capacity, the CPU time will increase.
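>> The capacity is the sum of the slot limits of your tasktracker(s), set in
>> mapred-site.xml (Hadoop 1.x property names; both default to 2), so on a
>> single-node setup it is quite small:
>>
>>     <property>
>>       <name>mapred.tasktracker.map.tasks.maximum</name>
>>       <value>2</value>
>>     </property>
>>     <property>
>>       <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>       <value>2</value>
>>     </property>
>>
>> Tasks beyond those limits just wait in the queue, so they add scheduling
>> and startup time, not parallelism.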
>> On Dec 13, 2012 4:02 PM, "imen Megdiche" <[EMAIL PROTECTED]> wrote:
>>> I am trying to increase the number of map and reduce tasks for a job, and
>>> even for the same data size, I noticed that the total CPU time increases,
>>> whereas I thought it would decrease. MapReduce is known for its computing
>>> performance, but I do not see this when I run these small tests.
>>> What do you think about this issue?