MapReduce, mail # dev - Confusion related to NLineInputFormat

Confusion related to NLineInputFormat
Darpan R 2013-05-06, 12:23
Hi guys,
I'm confused about NLineInputFormat.

I have written an MR job using NLineInputFormat, and the output looks fine.
But only 2 map tasks are running.

According to the documentation of NLineInputFormat:
If you want your mappers to receive a fixed number of lines of input, then
NLineInputFormat is the InputFormat to use. N refers to the number of lines
of input that each mapper receives.

I have a couple of files, each around 1 MB, stored on HDFS.
I've written the MR job, and in the driver I am setting
mapreduce.input.lineinputformat.linespermap to 20
(meaning I want each map task to process 20 lines).
I've also tried setting this value by calling
 NLineInputFormat.setNumLinesPerSplit(job, 20);

Both of my input files have exactly 1000 lines each, so 2000 lines in total.
According to this, 2000/20 = 100 map tasks should have been created. But
when I look at the counters I see that only 2 map tasks have run. I am not
sure if I've done something wrong.
Can anyone help me understand this better?
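
To make the arithmetic above explicit, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes NLineInputFormat cuts each input file into splits of N lines independently, with a shorter final split when the line count is not a multiple of N. The class and method names are hypothetical and used only for illustration.

```java
public class ExpectedSplits {

    // Hypothetical helper: number of splits produced for one file,
    // assuming one split per linesPerSplit lines plus a shorter last split.
    static int splitsForFile(int lineCount, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        // Ceiling division without floating point.
        return (lineCount + linesPerSplit - 1) / linesPerSplit;
    }

    // Hypothetical helper: total splits over all input files,
    // assuming files are split independently of each other.
    static int totalSplits(int[] lineCounts, int linesPerSplit) {
        int total = 0;
        for (int lineCount : lineCounts) {
            total += splitsForFile(lineCount, linesPerSplit);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] lineCounts = {1000, 1000}; // two files of 1000 lines each
        System.out.println(totalSplits(lineCounts, 20)); // prints 100
    }
}
```

Under these assumptions each file yields 50 splits, so 100 map tasks would be expected, compared with the 2 that the counters report.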

Thanks in advance.