How to process part of a file in Hadoop?
Dear Friends,

I have a very large file in HDFS with 3000+ blocks.

I want to run a job with various input sizes, using the same file as the
input each time. Usually the number of tasks is equal to the number of
blocks/splits. Suppose a job with 2 tasks needs to process any two randomly
chosen blocks of the given input file.

How can I give a random set of HDFS blocks as the input of a job?

Note: my aim is not to process the input file to produce some output;
I want to replicate individual blocks based on the load.

*Regards*
*S.Suresh,*
*Research Scholar,*
*Department of Computer Applications,*
*National Institute of Technology,*
*Tiruchirappalli - 620015.*
*+91-9941506562*
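A minimal sketch of one possible approach to the question above: override getSplits() in a custom InputFormat so the job only sees a random subset of the splits (by default, one split per HDFS block). This assumes the new org.apache.hadoop.mapreduce API; the class name RandomSubsetInputFormat and the configuration key random.splits.count are made-up names for illustration, not something from the original thread.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// TextInputFormat that keeps only a random subset of the file's splits,
// so the job reads only some of the underlying HDFS blocks.
public class RandomSubsetInputFormat extends TextInputFormat {

    @Override
    public List<InputSplit> getSplits(JobContext job) throws IOException {
        // Build the usual split list first (one split per block by default).
        List<InputSplit> splits = super.getSplits(job);

        // Number of splits to keep; "random.splits.count" is a made-up key.
        int keep = job.getConfiguration().getInt("random.splits.count", 2);
        if (keep >= splits.size()) {
            return splits;
        }

        // Shuffle a copy of the list and keep the first 'keep' splits.
        List<InputSplit> shuffled = new ArrayList<InputSplit>(splits);
        Collections.shuffle(shuffled);
        return shuffled.subList(0, keep);
    }
}

A driver would then call job.setInputFormatClass(RandomSubsetInputFormat.class) and set random.splits.count to the desired number of blocks before submitting the job.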

 