Home | About | Sematext search-lucene.com search-hadoop.com
Hive >> mail # user >> Increasing map reduce tasks will increase the CPU time, does this seem to be correct


imen Megdiche 2012-12-13, 10:31
Nitin Pawar 2012-12-13, 10:41
Mohammad Tariq 2012-12-13, 10:50
Re: Increasing map reduce tasks will increase the CPU time, does this seem to be correct
Thank you for your explanations. I work in pseudo-distributed mode, not on a
cluster. Do your recommendations also apply in this mode, and what can I do
to make the execution time improve as the number of map/reduce tasks grows,
if that is possible?
I don't understand, in general, how MapReduce can be more performant at
analysis than other systems like data warehouses. For example, I tested the
simple query "select sum(col1) from table1": with Hive the result took on
the order of 10 min, while with Oracle it took on the order of 0.20 min, for
a data size of about 40 MB.

Thank you.
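[Editorial aside, not part of the original message: the Hive-vs-Oracle gap above is mostly fixed job-startup cost, not scan speed. In 2012-era Hive, every query compiles to a MapReduce job that pays a roughly constant price (JVM launch, task scheduling) before it touches any data, so on a 40 MB table that price dwarfs the actual scan. A toy cost model, with all constants hypothetical, makes the point:]

```python
def job_time(data_mb, startup_s=30.0, tasks=2, per_task_s=10.0, scan_mb_per_s=50.0):
    """Toy cost model: fixed job startup + per-task overhead + scan time.

    All constants are hypothetical, chosen only to illustrate that the
    fixed overhead dominates when the input is small.
    """
    overhead = startup_s + tasks * per_task_s  # paid regardless of data size
    scan = data_mb / scan_mb_per_s             # the only part that scales with data
    return overhead + scan

small = job_time(40)       # 40 MB: fixed overhead is ~98% of the total
large = job_time(400_000)  # 400 GB: fixed overhead is under 1% of the total
print(f"40 MB:  {small:.1f}s")
print(f"400 GB: {large:.1f}s")
```

[The crossover point depends entirely on the assumed constants; the real lesson is that MapReduce amortizes its startup cost over large inputs, which a 40 MB table never lets it do.]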
2012/12/13 Mohammad Tariq <[EMAIL PROTECTED]>

> Hello Imen,
>
>       If you have a huge number of tasks, the overhead of creating and
> managing the map and reduce tasks begins to dominate the total job
> execution time. Also, more tasks means you need more free CPU slots. If no
> slots are free, the data block of interest will be moved to some other
> node where free slots are available; that move consumes time and also goes
> against the most basic principle of Hadoop, i.e. data locality. So the
> number of maps and reduces should be raised keeping all these factors in
> mind, otherwise you may face performance issues.
>
> HTH
>
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Thu, Dec 13, 2012 at 4:11 PM, Nitin Pawar <[EMAIL PROTECTED]>wrote:
>
>> If the number of maps or reducers your job launches is more than the
>> jobqueue/cluster capacity, CPU time will increase.
>> On Dec 13, 2012 4:02 PM, "imen Megdiche" <[EMAIL PROTECTED]> wrote:
>>
>>> Hello,
>>>
>>> I am trying to increase the number of map and reduce tasks for a job,
>>> and even for the same data size I noticed that the total CPU time
>>> increases, although I thought it would decrease. MapReduce is known for
>>> its computational performance, but I do not see that in these small
>>> tests.
>>>
>>> What do you think about this issue?
>>>
>>>
>
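[Editorial aside, not part of the original emails: the trade-off the thread describes can be sketched with a toy model, all constants hypothetical. Total CPU time grows with every added task because each one pays its own startup overhead, while wall-clock time improves with parallelism only until the free slots run out, after which tasks queue up in "waves":]

```python
import math

def cpu_and_wall(n_tasks, work_s=600.0, per_task_overhead_s=5.0, slots=4):
    """Toy model: total CPU seconds vs. wall-clock seconds for a job whose
    work is split evenly across n_tasks tasks running on `slots` CPU slots.
    All constants are hypothetical."""
    cpu = work_s + n_tasks * per_task_overhead_s   # every task pays overhead
    waves = math.ceil(n_tasks / slots)             # excess tasks wait for a slot
    wall = waves * (work_s / n_tasks + per_task_overhead_s)
    return cpu, wall

for n in (2, 4, 8, 16):
    cpu, wall = cpu_and_wall(n)
    print(f"{n:2d} tasks: cpu={cpu:6.0f}s  wall={wall:6.0f}s")
```

[With these numbers, wall time is lowest when the task count matches the slot count, and both CPU and wall time rise beyond that, matching Nitin's point about jobqueue/cluster capacity. On a pseudo-distributed node with only a few slots this ceiling is hit almost immediately; in Hadoop of that era the reducer count was set via mapred.reduce.tasks, while the map count follows the number of input splits.]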
Mohammad Tariq 2012-12-13, 13:59
imen Megdiche 2012-12-13, 14:21
Mohammad Tariq 2012-12-13, 14:23
imen Megdiche 2012-12-13, 14:28