MapReduce, mail # user - Map's number with NLineInputFormat


Re: Map's number with NLineInputFormat
yypvsxf19870706 2013-04-21, 04:52
Hi Harsh,

   Thank you for the suggestion. I did miss the call that sets the input format.
   Now it works.
Thanks

Regards
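
For readers following the thread: NLineInputFormat.setNumLinesPerSplit(...) only records a setting, and the job keeps its default input format unless one is set explicitly. With the new API that call is presumably job.setInputFormatClass(NLineInputFormat.class), which would be the setInputFormat*(…) call Harsh refers to below. Once it is in effect, the expected number of maps for one file is the line count divided by the lines per split, rounded up. A minimal sketch of that arithmetic in plain Java follows; the class NLineSplitSketch and its methods are hypothetical helpers for illustration, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT a Hadoop class: it only mimics the idea of
// handing each map task at most N consecutive lines of one input file.
final class NLineSplitSketch {

    // Expected number of splits (one map task per split) for a single file.
    static int expectedSplits(int totalLines, int linesPerSplit) {
        if (totalLines < 0 || linesPerSplit <= 0) {
            throw new IllegalArgumentException("totalLines >= 0 and linesPerSplit > 0 required");
        }
        // Ceiling division: a trailing partial group still needs its own split.
        return (totalLines + linesPerSplit - 1) / linesPerSplit;
    }

    // Groups the lines into consecutive chunks of at most linesPerSplit lines.
    static List<List<String>> group(List<String> lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit > 0 required");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // The case from this thread: 1000 rows, 10 lines per split.
        System.out.println(expectedSplits(1000, 10));
    }
}
```

For the 1000-row file and 10 lines per split described in this thread, the sketch gives 100, which matches the count the original poster expected.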

Sent from my iPhone

On 2013-4-21, at 1:04, Harsh J <[EMAIL PROTECTED]> wrote:

> Do you also set your desired input format class via the
> setInputFormat*(…) API?
>
> On Sat, Apr 20, 2013 at 6:48 AM, yypvsxf19870706
> <[EMAIL PROTECTED]> wrote:
>> Hi
>>   I thought it would be different when adopting NLineInputFormat.
>>   So here is my conclusion: the distribution of maps has nothing to do
>> with NLineInputFormat. NLineInputFormat decides the number of rows given
>> to each map, while the maps themselves are generated according to
>> split.size.
>>
>>    Have I got the point?
>>
>>
>> Regards
>>
>> Sent from my iPhone
>>
>> On 2013-4-20, at 8:39, "姚吉龙" <[EMAIL PROTECTED]> wrote:
>>
>> The number of maps is decided by the block size and your raw data.
>>
>> ―
>> Sent from Mailbox for iPhone
>>
>>
>> On Sat, Apr 20, 2013 at 12:30 AM, YouPeng Yang <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Hi All
>>>
>>> I use NLineInputFormat as the input format with the following
>>> code:
>>> NLineInputFormat.setNumLinesPerSplit(job, 10);
>>> NLineInputFormat.addInputPath(job, new Path(args[0]));
>>>
>>> My input file contains 1000 rows, so I thought it would distribute
>>> 100 (1000/10) maps. However, I got 4 maps.
>>>
>>> I'm confused by the number of maps that were distributed according to the
>>> running log [1].
>>> How are maps distributed when using NLineInputFormat?
>>>
>>>
>>> Regards
>>>
>>>
>>>
>>> [1]======================================================
>>> ....
>>> ....
>>> 2013-04-19 23:56:20,377 INFO  mapreduce.Job
>>> (Job.java:monitorAndPrintJob(1286)) - Job job_local_0001 running in uber
>>> mode : false
>>> 2013-04-19 23:56:20,377 INFO  mapreduce.Job
>>> (Job.java:monitorAndPrintJob(1293)) -  map 25% reduce 0%
>>> 2013-04-19 23:56:20,381 INFO  mapred.MapTask
>>> (MapTask.java:sortAndSpill(1597)) - Finished spill 0
>>> 2013-04-19 23:56:20,384 INFO  mapred.Task (Task.java:done(979)) -
>>> Task:attempt_local_0001_m_000001_0 is done. And is in the process of
>>> committing
>>> 2013-04-19 23:56:20,388 INFO  mapred.LocalJobRunner
>>> (LocalJobRunner.java:statusUpdate(501)) - map
>>> 2013-04-19 23:56:20,389 INFO  mapred.Task (Task.java:sendDone(1099)) -
>>> Task 'attempt_local_0001_m_000001_0' done.
>>> 2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner
>>> (LocalJobRunner.java:run(238)) - Finishing task:
>>> attempt_local_0001_m_000001_0
>>> 2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner
>>> (LocalJobRunner.java:run(213)) - Starting task:
>>> attempt_local_0001_m_000002_0
>>> 2013-04-19 23:56:20,391 INFO  mapred.Task (Task.java:initialize(565)) -
>>> Using ResourceCalculatorPlugin :
>>> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@36bf7916
>>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask
>>> (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
>>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(923)) -
>>> mapreduce.task.io.sort.mb: 100
>>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(924)) -
>>> soft limit at 83886080
>>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(925)) -
>>> bufstart = 0; bufvoid = 104857600
>>> 2013-04-19 23:56:20,487 INFO  mapred.MapTask (MapTask.java:<init>(926)) -
>>> kvstart = 26214396; length = 6553600
>>> 2013-04-19 23:56:20,515 INFO  mapred.LocalJobRunner
>>> (LocalJobRunner.java:statusUpdate(501)) -
>>> 2013-04-19 23:56:20,515 INFO  mapred.MapTask (MapTask.java:flush(1389)) -
>>> Starting flush of map output
>>> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1408)) -
>>> Spilling map output
>>> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1409)) -
>>> bufstart = 0; bufend = 336; bufvoid = 104857600
>>> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1411)) -
>>> kvstart = 26214396(104857584); kvend = 26214208(104856832); length =
>>> 189/6553600