HBase >> mail # user >> Map Reduce with multiple scans

Map Reduce with multiple scans
My rowkeys look something like this:

md5( date ) + md5( ip address )

So an example would be
md5( "2013-02-08") + md5( "")
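
To make the key layout concrete, here is a minimal sketch of how such a row key could be assembled. It assumes the md5 helper hashes the UTF-8 bytes of the string (Utils.md5 is not shown in this message, so that is a guess); the class name RowKeySketch and the 192.0.2.x addresses are purely illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class RowKeySketch {
    // MD5 digest of the UTF-8 bytes of s; an MD5 digest is always 16 bytes.
    public static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Row key = md5(date) followed by md5(ip): 32 bytes in total.
    public static byte[] rowKey(String date, String ip) {
        byte[] key = new byte[32];
        System.arraycopy(md5(date), 0, key, 0, 16);
        System.arraycopy(md5(ip), 0, key, 16, 16);
        return key;
    }

    public static void main(String[] args) {
        byte[] a = rowKey("2013-02-08", "192.0.2.1");
        byte[] b = rowKey("2013-02-08", "192.0.2.2");
        // Rows for the same date share their first 16 bytes.
        System.out.println(Arrays.equals(Arrays.copyOf(a, 16), Arrays.copyOf(b, 16))); // prints true
    }
}
```

Because the date hash comes first, all rows for one date are contiguous in the table, which is what makes a start/stop row scan per date possible.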

For any particular date I have several rows. Now I'd like to query
several different dates, for example "2013-01-01", "2013-02-01" and a
few others. Additionally, I'd like to perform these scans in a map
reduce job.

Currently my map reduce job looks like this:

Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ToyJob");
job.setJarByClass( PlayWithMapReduce.class );

byte[] md5Key = Utils.md5( "2013-01-07" );
int md5Length = 16;
int longLength = 8;

//append "0 0 0 0 0 0 0 0"
byte[] startRow = Bytes.padTail( md5Key, longLength );
byte[] endRow = Bytes.padTail( md5Key, longLength );
//last MD5 byte gets counted up to form the exclusive stop row
//(note: this wraps around to 0 if that byte is 0xFF)
endRow[md5Length-1]++;

Scan scan = new Scan( startRow, endRow );

Filter f = new SingleColumnValueFilter( Bytes.toBytes("CF"),
    Bytes.toBytes("creativeId"), CompareOp.EQUAL, Bytes.toBytes("100") );
scan.setFilter( f ); //without this call the filter is never applied

String tableName = "ToyDataTable";
TableMapReduceUtil.initTableMapperJob( tableName, scan, Mapper.class,
null, null, job);
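
One way to build the stop row for a key prefix that also copes with a trailing 0xFF byte is sketched below. This is an illustration in plain Java, not something taken from the HBase API; the class and method names (PrefixRangeSketch, stopRowForPrefix) are made up for the example. With it, the scan range for one date could be taken as md5Key (start, inclusive) up to stopRowForPrefix(md5Key) (stop, exclusive).

```java
import java.util.Arrays;

public class PrefixRangeSketch {
    // Returns the smallest key that sorts after every key starting with prefix,
    // suitable as an exclusive stop row. If the prefix consists only of 0xFF
    // bytes there is no such key, so an empty array is returned; HBase treats
    // an empty stop row as "scan to the end of the table".
    public static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                // Bytes after the incremented position were all 0xFF; drop them.
                return Arrays.copyOf(stop, i + 1);
            }
        }
        return new byte[0];
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix(new byte[] { 0x01, (byte) 0xFF });
        System.out.println(Arrays.toString(stop)); // prints [2]
    }
}
```

Carrying the increment into the previous byte avoids ending up with a stop row that sorts before the start row.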

This map reduce job works fine, but it runs only a single scan. What
do I have to do to pass multiple scans? Or do you have any other
suggestions on how to achieve that goal? The constraint is that it
must be possible to combine it with map