MapReduce, mail # dev - How to process part of a file in Hadoop?


How to process part of a file in Hadoop?
Suresh S 2014-02-07, 16:21
Dear Friends,

          I have a very large file in HDFS with 3000+ blocks.

I want to run a job with various input sizes, using the same file as the
input. Usually the number of tasks equals the number of blocks/splits.
Suppose a job with 2 tasks needs to process any two randomly chosen blocks
of the given input file.

How can I give a random set of HDFS blocks as the input of a job?
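
One way to approach this (a minimal sketch, assuming the new
org.apache.hadoop.mapreduce API; the property name "random.split.count" is
made up for this example, not a standard Hadoop setting) is a custom
InputFormat that overrides getSplits() and keeps only a random subset of
the splits, so the job launches one map task per sampled block:

import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class RandomSplitInputFormat extends TextInputFormat {

    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException {
        // Build the usual one-split-per-block list first.
        List<InputSplit> splits = super.getSplits(context);
        // "random.split.count" is a hypothetical property for this sketch.
        int wanted = context.getConfiguration().getInt("random.split.count", 2);
        // Shuffle so the retained splits are a random sample of the blocks.
        Collections.shuffle(splits);
        if (wanted < splits.size()) {
            splits = splits.subList(0, wanted);
        }
        return splits;
    }
}

The job would then select it with
job.setInputFormatClass(RandomSplitInputFormat.class) and set
random.split.count to the number of blocks to sample.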

Note: my aim is not to process the input file to produce some output.
I want to replicate individual blocks based on the load.

Regards,
S.Suresh,
Research Scholar,
Department of Computer Applications,
National Institute of Technology,
Tiruchirappalli - 620015.
+91-9941506562