I have a very large file in HDFS with 3000+ blocks.
I want to run jobs with various input sizes, always using the same file as the input. Normally the number of tasks equals the number of blocks/splits. Suppose a job with 2 tasks needs to process any two randomly chosen blocks of the given input file.
How can I give a random set of HDFS blocks as the input of a job?
Note: my aim is not to process the input file to produce some output. I want to replicate individual blocks based on the load.
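For reference, here is a rough sketch of what I have in mind, assuming the new-API `TextInputFormat` is subclassed so that `getSplits()` shuffles the per-block splits and keeps only a requested number. The class name `RandomSplitInputFormat` and the property `random.splits.count` are my own hypothetical names, not part of any Hadoop API:

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Random;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

/**
 * Hypothetical InputFormat that keeps only a random subset of the
 * file's splits, so a job with N tasks processes N random blocks.
 */
public class RandomSplitInputFormat extends TextInputFormat {

    // Hypothetical config key: how many random splits to keep.
    public static final String NUM_SPLITS_KEY = "random.splits.count";

    @Override
    public List<InputSplit> getSplits(JobContext job) throws IOException {
        // Default behavior yields roughly one split per HDFS block.
        List<InputSplit> splits = super.getSplits(job);
        int keep = job.getConfiguration().getInt(NUM_SPLITS_KEY, 2);
        Collections.shuffle(splits, new Random());
        return splits.subList(0, Math.min(keep, splits.size()));
    }
}
```

A job would then call `job.setInputFormatClass(RandomSplitInputFormat.class)` and set `random.splits.count` to the desired number of blocks, so each run processes a different random subset of the same input file.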