Accumulo >> mail # user >> MapReduce mapper not seeing all rows


MapReduce mapper not seeing all rows
Hello,

I'm running a MapReduce job over a table using AccumuloRowInputFormat.
For debugging purposes I'm logging key.getRow() so I can see which rows
it finds as it progresses.

If I don't specify any ranges on the input format, it skips a significant
number of rows; that is, I don't see any log output indicating that it
traversed them.

To see if it was a visibility issue, I tried explicitly setting a range,
like this:

        AccumuloRowInputFormat.setRanges(job.getConfiguration(), ranges);

When doing that it does process the rows that it otherwise skips.
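
One way to pin down exactly which rows are skipped would be to diff the row
IDs logged by the two runs. This is a minimal sketch, assuming the logged
key.getRow() values from each run have been collected into plain lists of
strings. RowDiff and missingRows are hypothetical names used for
illustration only; they are not part of Accumulo.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Accumulo: compares row IDs logged by two runs.
class RowDiff {

    // Returns the rows seen by the run with explicit ranges that are absent
    // from the run without ranges, in sorted order.
    static SortedSet<String> missingRows(Collection<String> withoutRanges,
                                         Collection<String> withRanges) {
        SortedSet<String> missing = new TreeSet<>(withRanges);
        missing.removeAll(withoutRanges);
        return missing;
    }

    public static void main(String[] args) {
        // Made-up row IDs standing in for the logged key.getRow() values.
        List<String> noRanges = Arrays.asList("row_001", "row_003");
        List<String> ranged = Arrays.asList("row_001", "row_002", "row_003", "row_004");
        System.out.println(missingRows(noRanges, ranged));
    }
}
```

Sorting the missing rows may make it easier to see whether they cluster in
contiguous key ranges or are scattered across the table.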

The same TimestampFilter is being applied in both scenarios, no other
filters / iterators are being used.

Any thoughts on why, when run without the ranges specified, it isn't seeing
a significant portion of the data?

Thanks,

Mike