Re: MapReduce mapper not seeing all rows
Mike Hugo 2013-02-26, 20:31
Our row keys are a combination of two elements, like this:
When running without any ranges set, we're missing an entire prefix's worth
of rows, e.g. we don't get any rows that start with "foo".
When I tried running with a range set, I used a prefix range on "foo", and
it then found the rows starting with "foo".
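
To make "prefix range" concrete, here is a minimal plain-Java sketch of the idea: over sorted row keys, a prefix range on "foo" is the half-open interval from "foo" (inclusive) up to the next possible prefix "fop" (exclusive). This only illustrates the semantics and is not the Accumulo client API; the class name PrefixRangeSketch, its methods, and the sample row keys (including the made-up "_" separator) are assumptions for illustration and do not reflect the actual row keys in the table.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, for illustration only (not part of Accumulo).
final class PrefixRangeSketch {

    // Smallest string that sorts after every string starting with `prefix`:
    // bump the last character by one, e.g. "foo" -> "fop".
    // Assumes a non-empty prefix whose last char is not Character.MAX_VALUE.
    static String followingPrefix(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // Rows in [prefix, followingPrefix(prefix)), which are exactly the rows
    // that start with `prefix` under natural String ordering.
    static SortedSet<String> rowsWithPrefix(SortedSet<String> rows, String prefix) {
        return rows.subSet(prefix, followingPrefix(prefix));
    }

    public static void main(String[] args) {
        // Made-up row keys; the real key layout is not shown in this thread.
        SortedSet<String> rows = new TreeSet<>(
                Arrays.asList("bar_1", "foo_1", "foo_2", "fop_1", "zed_1"));
        System.out.println(rowsWithPrefix(rows, "foo")); // prints [foo_1, foo_2]
    }
}
```

A full-table scan should cover every such interval, so rows that show up only when the prefix range is set explicitly point at how the unrestricted job's input is being split or filtered rather than at the rows themselves.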
On Tue, Feb 26, 2013 at 2:28 PM, Billie Rinaldi <[EMAIL PROTECTED]> wrote:
> Have you noticed any pattern in the rows it seems to be missing? E.g.
> every other row, the last row in each tablet, etc.? When you set a range,
> what range did you set?
> On Tue, Feb 26, 2013 at 12:17 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:
>> I'm running a map reduce job over a table using AccumuloRowInputFormat.
>> For debugging purposes I'm logging the key.getRow() so I can see what rows
>> it's finding as it progresses.
>> If I don't specify any ranges on the input format, it skips a significant
>> number of rows; that is, I don't see any logging indicating that it
>> traversed them.
>> To see if it was a visibility issue, I tried explicitly setting a range,
>> like this:
>> AccumuloRowInputFormat.setRanges(job.getConfiguration(), ranges);
>> When doing that it does process the rows that it otherwise skips.
>> The same TimestampFilter is being applied in both scenarios, no other
>> filters / iterators are being used.
>> Any thoughts on why, when run without the ranges specified, it isn't
>> seeing a significant portion of the data?