MapReduce, mail # user - issue with map running time


Kasi Subrahmanyam 2012-07-04, 12:02
Robert Evans 2012-07-06, 17:00
Phani 2012-07-07, 07:24
Manoj Babu 2012-07-09, 17:57
Re: issue with map running time
Karthik Kambatla 2012-07-09, 19:02
Hi Manoj,

It seems like a different issue.

Let me understand your case better. Is your input 656 files of 11 MB each?
In that case, MapReduce does create 656 map tasks. In general, an input
split is the data read from a single file, capped at the block size
(64 MB in your case). As your files are smaller than 64 MB, each file forms
its own split, and each split gets its own map task.
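
To make the arithmetic concrete, here is a rough sketch in Python (not
Hadoop code). The function name and the sample sizes other than yours are
made up for illustration, and it assumes splits never span files and
ignores the small slack the real implementation allows on the last split
of a file:

```python
import math

def estimate_map_tasks(file_sizes_mb, split_size_mb=64):
    """Rough estimate of map tasks, assuming one task per input split.

    Hypothetical helper: each file is cut into chunks of at most
    split_size_mb, and a split never contains data from two files.
    """
    return sum(math.ceil(size / split_size_mb) for size in file_sizes_mb)

# 656 files of 11 MB each: every file is smaller than one block,
# so every file becomes exactly one split.
print(estimate_map_tasks([11] * 656))

# The same ~7 GB stored as a single file would need far fewer splits.
print(estimate_map_tasks([11 * 656]))
```

The first call gives 656, matching the number of maps you saw; the second
shows that the count follows from the file layout rather than from the
total data volume.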

Hope that helps.
Karthik

On Mon, Jul 9, 2012 at 10:57 AM, Manoj Babu <[EMAIL PROTECTED]> wrote:

> Hi Bobby,
>
> I have faced a similar issue. In my job the block size is 64 MB, the
> number of maps created is 656, and the number of files uploaded to HDFS is
> 656, each 11 MB in size. I assume that small files cannot be grouped into
> a single split.
>
> Could you kindly clarify?
>
> Cheers!
> Manoj.
>
>
>
> On Fri, Jul 6, 2012 at 10:30 PM, Robert Evans <[EMAIL PROTECTED]> wrote:
>
>> How long a program takes to run depends on a lot of things.  It could be
>> a connectivity issue, or it could be that your program does a lot more
>> processing for some input records than for others, or it could be that some
>> of your records are a lot smaller so that more of them exist in a single
>> input split.  Without knowing what the code is doing it is hard to say
>> more than that.
>>
>> --Bobby Evans
>>
>> From: Kasi Subrahmanyam <[EMAIL PROTECTED]>
>> Reply-To: "[EMAIL PROTECTED]" <
>> [EMAIL PROTECTED]>
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Subject: issue with map running time
>>
>> Hi,
>>
>> I have a job with, let us say, 10 mappers running in parallel.
>> Some run fast, but a few of them take too long.
>> For example, a few mappers take 5 to 10 minutes, but others take
>> around 12 hours or more.
>> Can a difference in the data handled by the mappers cause such a
>> variation, or is it a connectivity issue?
>>
>> Note: The cluster we are using has multiple users running their jobs on
>> it.
>>
>> Thanks in advance.
>> Subbu
>>
>
>
Manoj Babu 2012-07-10, 06:57
Karthik Kambatla 2012-07-10, 08:39