MapReduce >> mail # user >> Map's number with NLineInputFormat


Re: Map's number with NLineInputFormat
Hi Harsh,

   Thank you for the suggestion. I did miss the statement that sets the input format.
   Now it works.
Thanks

Regards

Sent from my iPhone

On 2013-4-21, at 1:04, Harsh J <[EMAIL PROTECTED]> wrote:

> Are you also setting your desired input format class via the
> setInputFormat*(…) API?
>
> On Sat, Apr 20, 2013 at 6:48 AM, yypvsxf19870706
> <[EMAIL PROTECTED]> wrote:
>> Hi,
>>   I thought it would be different when using NLineInputFormat.
>>   So here is my conclusion: the distribution of maps has nothing to do with
>> NLineInputFormat. NLineInputFormat only decides the number of rows given to
>> each map, and the maps themselves are generated according to the split size.
>>
>>    Have I got the point?
>>
>>
>> Regards
>>
>> Sent from my iPhone
>>
>> On 2013-4-20, at 8:39, "姚吉龙" <[EMAIL PROTECTED]> wrote:
>>
>> The number of maps is decided by the block size and your raw data.
>>
>> ―
>> Sent from Mailbox for iPhone
>>
>>
>> On Sat, Apr 20, 2013 at 12:30 AM, YouPeng Yang <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Hi All,
>>>
>>> I use NLineInputFormat as the text input format with the following
>>> code:
>>> NLineInputFormat.setNumLinesPerSplit(job, 10);
>>> NLineInputFormat.addInputPath(job,new Path(args[0].toString()));
>>>
>>> My input file contains 1000 rows, so I thought it would distribute
>>> 100 (1000/10) maps. However, I got 4 maps.
>>>
>>> I'm confused by the number of maps that were distributed according to the
>>> running log [1].
>>> How are maps distributed when using NLineInputFormat?
>>>
>>>
>>> Regards
>>>
>>>
>>>
>>> [1]======================================================
>>> ....
>>> ....
>>> 2013-04-19 23:56:20,377 INFO  mapreduce.Job
>>> (Job.java:monitorAndPrintJob(1286)) - Job job_local_0001 running in uber
>>> mode : false
>>> 2013-04-19 23:56:20,377 INFO  mapreduce.Job
>>> (Job.java:monitorAndPrintJob(1293)) -  map 25% reduce 0%
>>> 2013-04-19 23:56:20,381 INFO  mapred.MapTask
>>> (MapTask.java:sortAndSpill(1597)) - Finished spill 0
>>> 2013-04-19 23:56:20,384 INFO  mapred.Task (Task.java:done(979)) -
>>> Task:attempt_local_0001_m_000001_0 is done. And is in the process of
>>> committing
>>> 2013-04-19 23:56:20,388 INFO  mapred.LocalJobRunner
>>> (LocalJobRunner.java:statusUpdate(501)) - map
>>> 2013-04-19 23:56:20,389 INFO  mapred.Task (Task.java:sendDone(1099)) -
>>> Task 'attempt_local_0001_m_000001_0' done.
>>> 2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner
>>> (LocalJobRunner.java:run(238)) - Finishing task:
>>> attempt_local_0001_m_000001_0
>>> 2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner
>>> (LocalJobRunner.java:run(213)) - Starting task:
>>> attempt_local_0001_m_000002_0
>>> 2013-04-19 23:56:20,391 INFO  mapred.Task (Task.java:initialize(565)) -
>>> Using ResourceCalculatorPlugin :
>>> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@36bf7916
>>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask
>>> (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
>>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(923)) -
>>> mapreduce.task.io.sort.mb: 100
>>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(924)) -
>>> soft limit at 83886080
>>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(925)) -
>>> bufstart = 0; bufvoid = 104857600
>>> 2013-04-19 23:56:20,487 INFO  mapred.MapTask (MapTask.java:<init>(926)) -
>>> kvstart = 26214396; length = 6553600
>>> 2013-04-19 23:56:20,515 INFO  mapred.LocalJobRunner
>>> (LocalJobRunner.java:statusUpdate(501)) -
>>> 2013-04-19 23:56:20,515 INFO  mapred.MapTask (MapTask.java:flush(1389)) -
>>> Starting flush of map output
>>> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1408)) -
>>> Spilling map output
>>> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1409)) -
>>> bufstart = 0; bufend = 336; bufvoid = 104857600
>>> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1411)) -
>>> kvstart = 26214396(104857584); kvend = 26214208(104856832); length =
>>> 189/6553600
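
A note on the numbers in this thread: as far as I understand it, NLineInputFormat.setNumLinesPerSplit(job, 10) only records the lines-per-split setting in the job configuration. The job keeps using its default input format (TextInputFormat, which splits by size rather than by line count) until the class is set explicitly, for example with job.setInputFormatClass(NLineInputFormat.class). That would be consistent with the 100 expected maps not appearing until the input format was set. Once the class is set, each split should hold at most N lines, so the expected map count is roughly the line count divided by N, rounded up. The sketch below only illustrates that arithmetic in plain Java with no Hadoop dependency. NLineSplitSketch and splitByLines are hypothetical names chosen for this example, not Hadoop APIs, and the sketch assumes a single input file.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative only: mimics line-count-based splitting without Hadoop.
 * The class and method names are hypothetical, not part of any Hadoop API.
 */
public class NLineSplitSketch {

    /** Returns {firstLine, lastLineExclusive} pairs, one per simulated split. */
    static List<int[]> splitByLines(int totalLines, int linesPerSplit) {
        if (totalLines < 0 || linesPerSplit <= 0) {
            throw new IllegalArgumentException(
                "totalLines must be >= 0 and linesPerSplit must be > 0");
        }
        List<int[]> splits = new ArrayList<>();
        for (int start = 0; start < totalLines; start += linesPerSplit) {
            // The last split may hold fewer lines than linesPerSplit.
            int end = Math.min(start + linesPerSplit, totalLines);
            splits.add(new int[] {start, end});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Figures from the original question: 1000 rows, 10 lines per split.
        List<int[]> splits = splitByLines(1000, 10);
        System.out.println("simulated splits (one map task each): " + splits.size());
    }
}
```

With the figures from the question (1000 rows, 10 lines per split) the sketch produces 100 simulated splits, which matches the map count the original poster expected.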