Hadoop, mail # user - How to split a specified number of rows per Map


edward choi 2011-06-05, 11:04
Hi,

I am using HBase as a source of my MapReduce jobs.

I recently found out that TableInputFormat automatically splits the input
table so that each region of the table is assigned to a single map task.
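As a rough illustration of that default behavior, here is a plain-Java model of it: each region's [start key, end key) range becomes one split, so the number of map tasks equals the number of regions no matter how many rows each region holds. This is a simplified sketch, not HBase code. `Region`, `KeyRange` and `splitsPerRegion` are hypothetical names, and details such as scan start/stop rows and region locations are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model (NOT HBase code) of how TableInputFormat assigns work:
 * one input split per region, bounded by that region's start and end keys.
 * Region and KeyRange are hypothetical stand-ins for HBase's own classes.
 */
public class RegionSplitModel {

    /** Hypothetical region holding rows in [startKey, endKey); "" means unbounded. */
    record Region(String startKey, String endKey) {}

    /** Hypothetical split covering the row-key range [startRow, stopRow). */
    record KeyRange(String startRow, String stopRow) {}

    /** One split per region, so the number of mappers equals the number of regions. */
    static List<KeyRange> splitsPerRegion(List<Region> regions) {
        List<KeyRange> splits = new ArrayList<>();
        for (Region r : regions) {
            splits.add(new KeyRange(r.startKey(), r.endKey()));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region("", "row-500"),
                new Region("row-500", ""));
        // Two regions -> two splits -> two map tasks, regardless of row counts.
        System.out.println(splitsPerRegion(regions).size());
    }
}
```

Because split boundaries come from region boundaries, controlling the number of rows per mapper means computing different boundaries than the ones the regions provide.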

But what I want is to split the input table so that a user-specified
number of rows is assigned to each mapper.

For example, if I set a certain parameter to 100, each mapper would get
100 rows from the input table.
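A minimal sketch of the grouping such a parameter implies, assuming the table's row keys are already available in sorted order. In practice they would have to be read with an extra scan of the table, whereas region boundaries come for free from table metadata. `RowChunker`, `KeyRange` and `chunk` are hypothetical names, not part of HBase.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch (hypothetical, not HBase API) of grouping sorted row keys into
 * ranges of a fixed number of rows, one range per mapper.
 */
public class RowChunker {

    /** Range of row keys [startRow, stopRow); stopRow == null means "to the end of the table". */
    record KeyRange(String startRow, String stopRow) {}

    /**
     * Groups sorted row keys into ranges of at most rowsPerSplit rows.
     * Each range starts at the first key of its chunk and stops (exclusive)
     * at the first key of the next chunk.
     */
    static List<KeyRange> chunk(List<String> sortedKeys, int rowsPerSplit) {
        if (rowsPerSplit <= 0) {
            throw new IllegalArgumentException("rowsPerSplit must be positive");
        }
        List<KeyRange> ranges = new ArrayList<>();
        for (int i = 0; i < sortedKeys.size(); i += rowsPerSplit) {
            int next = i + rowsPerSplit;
            String stop = next < sortedKeys.size() ? sortedKeys.get(next) : null;
            ranges.add(new KeyRange(sortedKeys.get(i), stop));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            // Zero-padded so lexicographic order matches numeric order.
            keys.add(String.format("row-%03d", i));
        }
        // 250 rows at 100 rows per split -> 3 ranges (100, 100 and 50 rows).
        for (KeyRange r : chunk(keys, 100)) {
            System.out.println(r.startRow() + " .. " + r.stopRow());
        }
    }
}
```

The stop key is exclusive, which matches how an HBase `Scan` treats its stop row, so each range could in principle become the start/stop rows of one split returned by a custom `getSplits()`. The trade-off is the extra pass over the table needed to find the keys.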

Is there a built-in way to do this?
Or do I have to modify getSplits() in
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase?

Any answers or opinions would be much appreciated!

Ed