Hadoop >> mail # user >> How to split a specified number of rows per Map


How to split a specified number of rows per Map
Hi,

I am using HBase as a source of my MapReduce jobs.

I recently found out that TableInputFormat automatically splits the input
table so that each region of the table is assigned to a single map task.

What I want instead is to split the input table so that a user-specified
number of rows is assigned to each Mapper.

For example, if I set a certain parameter to 100, then each Mapper would get
100 rows from the input table.

Is there a method for this kind of operation?
Or do I have to modify the getSplits() of
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase?
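
To make the idea concrete, here is a rough sketch of the split boundaries I have in mind. It is plain Java with no HBase classes, and it assumes the sorted row keys are already known (in practice they would have to be collected first, for example with a scan). RowCountSplits, KeyRange, and splitByRowCount are hypothetical names for illustration only, and row keys are shown as Strings rather than byte[] for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: groups sorted row keys into
// ranges that each cover at most rowsPerSplit rows.
final class RowCountSplits {

    // Half-open key range [startRow, endRow). An empty endRow means
    // "up to the end of the table".
    public static final class KeyRange {
        public final String startRow;
        public final String endRow;

        public KeyRange(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }
    }

    // Assumes sortedRowKeys is in the same order the table stores them.
    public static List<KeyRange> splitByRowCount(List<String> sortedRowKeys,
                                                 int rowsPerSplit) {
        if (rowsPerSplit <= 0) {
            throw new IllegalArgumentException("rowsPerSplit must be positive");
        }
        List<KeyRange> ranges = new ArrayList<>();
        for (int i = 0; i < sortedRowKeys.size(); i += rowsPerSplit) {
            int next = i + rowsPerSplit;
            // The first key of the next chunk is the exclusive end of this one.
            String endRow = next < sortedRowKeys.size() ? sortedRowKeys.get(next) : "";
            ranges.add(new KeyRange(sortedRowKeys.get(i), endRow));
        }
        return ranges;
    }
}
```

With 250 rows and the parameter set to 100, this would give three ranges (100, 100, and 50 rows). Each range would then presumably become one input split inside a custom getSplits().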

Any answer or opinion will be much appreciated!!

Ed