Re: Map Reduce with multiple scans
Nick Dimiduk 2013-02-26, 20:12
You want to run multiple scans so that you can filter the previous scan
results? Am I correct in my understanding of your objective?
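First, I suggest you use the PrefixFilter instead of constructing the
rowkey prefix manually. This looks something like:
byte[] md5Key = Utils.md5( "2013-01-07" );
Scan scan = new Scan(md5Key); // start row: the first key that can carry this prefix
scan.setFilter(new PrefixFilter(md5Key)); // return only rows whose key starts with md5Key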
Yes, that's a bit redundant, but setting the startkey explicitly will save
you some unnecessary processing.
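Here is a plain-Java sketch of the start-row/prefix relationship that uses no HBase classes. It assumes `Utils.md5` (the poster's helper, not shown in the thread) returns the 16-byte MD5 of the string's UTF-8 bytes. `RowKeySketch` and its methods are hypothetical names for illustration. Every key that begins with the prefix sorts at or after the prefix itself. Starting the scan at `md5Key` therefore skips the rows before it instead of running the filter over them. The `stopRowForPrefix` helper shows what a correct exclusive stop row for a prefix looks like if you do build one by hand.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class RowKeySketch {
    // Assumed stand-in for the poster's Utils.md5: MD5 of the UTF-8 bytes (16 bytes).
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Row key layout from the thread: md5(date) followed by md5(ip), 32 bytes in total.
    static byte[] rowKey(String date, String ip) {
        byte[] datePart = md5(date);
        byte[] ipPart = md5(ip);
        byte[] key = Arrays.copyOf(datePart, datePart.length + ipPart.length);
        System.arraycopy(ipPart, 0, key, datePart.length, ipPart.length);
        return key;
    }

    // What a prefix filter tests for each row: does the key begin with the prefix bytes?
    static boolean hasPrefix(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(key, prefix.length), prefix);
    }

    // Row keys sort lexicographically as unsigned bytes; a key that is a
    // prefix of a longer key sorts before it.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Exclusive stop row covering exactly the keys that start with `prefix`:
    // drop trailing 0xFF bytes, then increment the last remaining byte.
    // An empty result means "no upper bound". Unlike a bare endRow[i]++,
    // this never wraps 0xFF around to 0x00.
    static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return new byte[0];
    }

    public static void main(String[] args) {
        byte[] start = md5("2013-01-07");        // start row = the prefix itself
        byte[] stop = stopRowForPrefix(start);   // exclusive stop row
        byte[] key = rowKey("2013-01-07", "192.168.187.2");
        System.out.println(hasPrefix(key, start));
        System.out.println(compareUnsigned(key, start) >= 0 && compareUnsigned(key, stop) < 0);
    }
}
```

Both lines printed by `main` should be `true`. A key for a different date, such as `rowKey("2013-02-08", ...)`, fails `hasPrefix`.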
> This map reduce job works fine but this is just one scan job for this map
> reduce task. What do I have to do to pass multiple scans?
Do you mean processing on multiple dates? In that case, what you really
want is a full (unbounded) table scan. Since date is the first part of your
compound rowkey, there's no prefix and no need for a filter; just use
new Scan().
In general, you can use multiple filters in a given Scan (or Get). See the
FilterList documentation for details.
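FilterList combines its members under one of two operators. `FilterList.Operator.MUST_PASS_ALL` requires a row to pass every filter. `MUST_PASS_ONE` lets a row through if it passes any one. Because a FilterList is itself a Filter, lists can be nested. The sketch below models that composition in plain Java with `java.util.function.Predicate`, so it runs without HBase. It is not HBase code, and the class and method names are hypothetical. It assumes `Utils.md5` is MD5 over UTF-8 bytes and represents a row as its key plus a map of `"family:qualifier"` values. In HBase terms, `anyOf` over the `datePrefix` checks corresponds to a `MUST_PASS_ONE` list of PrefixFilters. That list would sit inside a `MUST_PASS_ALL` list together with the SingleColumnValueFilter.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class FilterCompositionSketch {
    // Simplified row: the 32-byte key plus column values keyed by "family:qualifier".
    static final class Row {
        final byte[] key;
        final Map<String, String> columns;

        Row(byte[] key, Map<String, String> columns) {
            this.key = key;
            this.columns = columns;
        }
    }

    // Assumed stand-in for the poster's Utils.md5: MD5 of the UTF-8 bytes.
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Row key layout from the thread: md5(date) + md5(ip).
    static byte[] rowKey(String date, String ip) {
        byte[] datePart = md5(date);
        byte[] ipPart = md5(ip);
        byte[] key = Arrays.copyOf(datePart, datePart.length + ipPart.length);
        System.arraycopy(ipPart, 0, key, datePart.length, ipPart.length);
        return key;
    }

    // Plays the role of a PrefixFilter on md5(date).
    static Predicate<Row> datePrefix(String date) {
        byte[] prefix = md5(date);
        return row -> row.key.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(row.key, prefix.length), prefix);
    }

    // Plays the role of SingleColumnValueFilter with CompareOp.EQUAL. The real filter
    // lets rows without the column through unless setFilterIfMissing(true) is set;
    // this model simply rejects them.
    static Predicate<Row> columnEquals(String column, String value) {
        return row -> value.equals(row.columns.get(column));
    }

    // MUST_PASS_ONE: the row passes if any member passes (logical OR).
    static Predicate<Row> anyOf(List<Predicate<Row>> filters) {
        return row -> filters.stream().anyMatch(f -> f.test(row));
    }

    // MUST_PASS_ALL: the row passes only if every member passes (logical AND).
    static Predicate<Row> allOf(List<Predicate<Row>> filters) {
        return row -> filters.stream().allMatch(f -> f.test(row));
    }

    // Several dates OR-ed together, AND-ed with the creativeId check from the thread.
    static Predicate<Row> exampleFilter() {
        Predicate<Row> dates = anyOf(List.of(datePrefix("2013-01-01"), datePrefix("2013-02-01")));
        return allOf(List.of(dates, columnEquals("CF:creativeId", "100")));
    }

    public static void main(String[] args) {
        Predicate<Row> filter = exampleFilter();
        Row keep = new Row(rowKey("2013-02-01", "192.168.187.2"), Map.of("CF:creativeId", "100"));
        Row skip = new Row(rowKey("2013-02-08", "192.168.187.2"), Map.of("CF:creativeId", "100"));
        System.out.println(filter.test(keep)); // true
        System.out.println(filter.test(skip)); // false: date is not in the OR list
    }
}
```

The model compares strings for brevity. The real filters compare raw bytes, so the value would be given as `Bytes.toBytes("100")` as in the original job.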
Does this help?
On Tue, Feb 26, 2013 at 5:41 AM, Paul van Hoven <[EMAIL PROTECTED]> wrote:
> My rowkeys look something like this:
> md5( date ) + md5( ip address )
> So an example would be
> md5( "2013-02-08") + md5( "192.168.187.2")
> For one particular date I have several rows. Now I'd like to query
> different dates, for example "2013-01-01", "2013-02-01" and some
> others. Additionally, I'd like to perform these scans in a map
> reduce job.
> Currently my map reduce job looks like this:
> Configuration config = HBaseConfiguration.create();
> Job job = new Job(config, "ToyJob");
> job.setJarByClass( PlayWithMapReduce.class );
> byte[] md5Key = Utils.md5( "2013-01-07" );
> int md5Length = 16;
> int longLength = 8;
> byte[] startRow = Bytes.padTail( md5Key, longLength ); // append "0 0 0 0 0 0 0 0"
> byte[] endRow = Bytes.padTail( md5Key, longLength );
> endRow[md5Length-1]++; // last byte of the date hash gets counted up
> Scan scan = new Scan( startRow, endRow );
> Filter f = new SingleColumnValueFilter( Bytes.toBytes("CF"),
> Bytes.toBytes("creativeId"), CompareOp.EQUAL, Bytes.toBytes("100") );
> scan.setFilter( f ); // without this call the filter is never applied
> String tableName = "ToyDataTable";
> TableMapReduceUtil.initTableMapperJob( tableName, scan, Mapper.class,
> null, null, job);
> This map reduce job works fine but this is just one scan job for this
> map reduce task. What do I have to do to pass multiple scans? Or do
> you have any other suggestions on how to achieve that goal? The
> constraint would be that it must be possible to combine it with map