Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce >> mail # user >> one new bie question


Copy link to this message
-
Re: one new bie question
Hi Balson,

Have you tried NLineInputFormat<http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html>?
You can find example of NLineInputFormat here: http://goo.gl/aVzDr.
On Thu, May 9, 2013 at 2:53 PM, Balachandar R.A.
<[EMAIL PROTECTED]>wrote:

>
> Hello
>
> I would like to see the possibility of using map reduce framework for my
> following problem.
>
> I have a set of huge files. I would like to execute a binary over every
> input files. The binary needs to operate over the whole file and hence it
> is not possible to split the file in chunks. Let’s assume that I have six
> such files and have their names in a single text file. I need to write
> hadoop code to take this single file as input and every line in it should
> go to one map task. The map tasks shall execute the binary on this file and
> the file can be located in hdfs. No reduce tasks is needed and no output
> shall be emitted from the map tasks as well. The binary take care of
> creating output file in the specified location.
> Is there a way to tell hadoop to feed single line to a map task? I came
> across few examples wherein a set of files has been given and looks like
> the framework try to split the file, reads every line in the split,
> generates key/value pairs and send this pairs to single map task. In my
> situation, I want only one key value pair should be generated for one line
> and it should be given to a single map task. Thats it?
>
> For ex. Assume that this is my file <input.txt>
>
> myFirstInput.vlc
> mySecondInput.vlc
> myThirdInput.vlc
>
> Now, first map task should get a pair <1, myFirstInput.vlc>, the second
> gets a pair <2, mySecondInput.vlc> and so on.
>
> Can someone throw some light in to this problem? For me, it looks
> straightforward but could not find any pointers in the web.
>
>
>
>
>
>
>
> With thanks and regards
> Balson
>
>
>
>

--
Regards,
Ted Xu
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB