Re: Suggestion for InputSplit and InputFormat - Split every line.
Have a look at the NLineInputFormat class in Hadoop. It is built to split the
input by number of lines, so each mapper receives a fixed number of lines.
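
To illustrate the idea, here is a minimal plain-Java sketch (not Hadoop code) of
what line-based splitting amounts to: lines are grouped into chunks of N, and
with N = 1 every line ends up in its own split, hence its own mapper. The class
name LineSplitSketch, the method splitByLines and the sample paths are made up
for this example. In a real job, as far as I know, you would select the format
with job.setInputFormatClass(NLineInputFormat.class) and, in the newer
org.apache.hadoop.mapreduce API, set the lines per split with
NLineInputFormat.setNumLinesPerSplit(job, 1). Please check the Javadoc for your
Hadoop version, since the old and new APIs differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: models line-based splitting without Hadoop.
class LineSplitSketch {

    // Groups the lines into consecutive chunks of at most linesPerSplit lines.
    // Each chunk stands in for one input split, i.e. the input of one mapper.
    static List<List<String>> splitByLines(List<String> lines, int linesPerSplit) {
        if (linesPerSplit < 1) {
            throw new IllegalArgumentException("linesPerSplit must be >= 1");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up stand-ins for the data source paths listed in the input file.
        List<String> paths = Arrays.asList(
                "/data/source-a", "/data/source-b", "/data/source-c");

        // One line per split: each path would be handled by its own mapper.
        List<List<String>> splits = splitByLines(paths, 1);
        for (int i = 0; i < splits.size(); i++) {
            System.out.println("split " + i + " -> " + splits.get(i));
        }
    }
}
```

With one line per split, a file of 10-50 lines would produce 10-50 splits, and
map() would be called once in each mapper.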

On Thu, Mar 15, 2012 at 6:13 PM, Deepak Nettem <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have this use case - I need to spawn as many mappers as the number of
> lines in a file in HDFS. This file isn't big (only 10-50 lines). Actually
> each line represents the path of another data source that the Mappers will
> work on. So each mapper will read 1 line, (the map() method will need to be
> called only once), and work on the data source.
>
> What's the best way to construct InputSplit, InputFormat and RecordReader
> to achieve this? I would appreciate any example code :)
>
> Best,
> Deepak
>

--
Thanks & Regards,
Anil Gupta