MapReduce, mail # user - how to enhance job start up speed?


Re: how to enhance job start up speed?
Bertrand Dechoux 2012-08-13, 13:57
I am not sure I understand, and I guess I am not the only one.

1) What's a worker in your context? Only the logic inside your Mapper or
something else?
2) You should clarify your cases. You describe two cases, but both are
reported as overhead, so I assume there is a baseline? You compare Hadoop
with sequential, so is the sequential run not using Hadoop at all?
3) What is the size of the file?
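
To make the baseline question concrete, here is a minimal sketch of how the numbers could be reported, assuming "overhead" means the extra wall-clock time relative to the pure worker time. The class `OverheadBreakdown` and its methods are hypothetical and use only the JDK; `work` is a stand-in for the real worker logic.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: makes the baseline explicit by
// timing the worker separately from everything around it.
class OverheadBreakdown {

    // Overhead of a whole run relative to the pure worker time, in percent.
    static double overheadPercent(long totalNanos, long workerNanos) {
        if (workerNanos <= 0) {
            throw new IllegalArgumentException("workerNanos must be positive");
        }
        return (totalNanos - workerNanos) * 100.0 / workerNanos;
    }

    // Stand-in for the real per-line worker; replace with the actual logic.
    static int work(String line) {
        return line.length();
    }

    public static void main(String[] args) {
        // Assumption: in the real test these lines come from the input file.
        List<String> lines = Arrays.asList("first line", "second line", "third line");

        long start = System.nanoTime();
        long workerNanos = 0;
        long checksum = 0;
        for (String line : lines) {
            long t0 = System.nanoTime();
            checksum += work(line);                 // time only the worker call
            workerNanos += System.nanoTime() - t0;
        }
        long totalNanos = System.nanoTime() - start;

        System.out.printf("worker=%d ns, total=%d ns, checksum=%d%n",
                workerNanos, totalNanos, checksum);
        if (workerNanos > 0) {
            System.out.printf("overhead=%.1f%%%n",
                    overheadPercent(totalNanos, workerNanos));
        }
    }
}
```

The same kind of timers could be placed in the Mapper's setup and map methods. Timers inside the mapper do not see the JVM start-up or the job scheduling, so the total for the Hadoop case would have to be measured from outside the task.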

Bertrand

On Mon, Aug 13, 2012 at 1:51 PM, Matthias Kricke <
[EMAIL PROTECTED]> wrote:

> Hello all,
>
> I'm using CDH3u3.
> If I want to process one file, set to non-splittable, Hadoop starts one
> Mapper and no Reducer (that's OK for this test scenario). The Mapper
> goes through a configuration step where some variables for the worker
> inside the mapper are initialized.
> Now the Mapper gives me K,V pairs, which are lines of an input file. I
> process the V with the worker.
>
> When I compare the run time of Hadoop to the run time of the same process
> run sequentially, I get:
>
> worker time --> same in both cases
>
> case: mapper --> overhead of ~32% to the worker process (same for bigger
> chunk size)
> case: sequential --> overhead of ~15% to the worker process
>
> It shouldn't be that much slower: because the file is non-splittable, the
> mapper will be executed where the data is stored by HDFS, won't it?
> Where did those 17% go? How can I reduce this? Does Hadoop need the whole
> time for reading or streaming the data out of HDFS?
>
> I would appreciate your help,
>
> Greetings
> mk
>
>
--
Bertrand Dechoux