Accumulo >> mail # user >> MapReduce mapper not seeing all rows


Re: MapReduce mapper not seeing all rows
Our row keys are a combination of two elements, like this:

foo/bar
foo/baz
foo/bee

eee/blah
eee/boo

When running without any ranges set, we're missing an entire prefix's worth
of rows, e.g. we don't get any rows that start with "foo".

When I tried running with the range set, I did a prefix range on "foo" and
it then found the rows starting with "foo".
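
For readers unfamiliar with prefix ranges, here is a minimal sketch of what "a prefix range on foo" selects from sorted row keys like the ones listed above. It uses only the Java standard library, not the Accumulo API: the class name PrefixRangeSketch and the helpers followingPrefix and rowsWithPrefix are hypothetical, and a TreeSet stands in for the table's sorted rows. It also assumes ASCII row keys, where Java's string ordering agrees with byte ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PrefixRangeSketch {

    // Exclusive upper bound for a prefix scan: bump the last character by one,
    // so "foo" becomes "fop". Assumes a non-empty prefix whose last character
    // is not Character.MAX_VALUE.
    public static String followingPrefix(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // Returns every row in [prefix, followingPrefix(prefix)), i.e. all rows
    // that start with the prefix, in sorted order.
    public static List<String> rowsWithPrefix(NavigableSet<String> sortedRows, String prefix) {
        return new ArrayList<>(
            sortedRows.subSet(prefix, true, followingPrefix(prefix), false));
    }

    public static void main(String[] args) {
        // Stand-in for the table's row keys (hypothetical sample data from this thread).
        NavigableSet<String> rows = new TreeSet<>(Arrays.asList(
            "foo/bar", "foo/baz", "foo/bee", "eee/blah", "eee/boo"));

        System.out.println(rowsWithPrefix(rows, "foo")); // [foo/bar, foo/baz, foo/bee]
        System.out.println(rowsWithPrefix(rows, "eee")); // [eee/blah, eee/boo]
    }
}
```

A job with no ranges set would be expected to visit all five rows in this model, which is why losing every "foo" row stands out.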
On Tue, Feb 26, 2013 at 2:28 PM, Billie Rinaldi <[EMAIL PROTECTED]> wrote:

> Have you noticed any pattern in the rows it seems to be missing?  E.g.
> every other row, the last row in each tablet, etc.?  When you set a range,
> what range did you set?
>
> Billie
>
>
>
> On Tue, Feb 26, 2013 at 12:17 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I'm running a map reduce job over a table using AccumuloRowInputFormat.
>>  For debugging purposes I'm logging the key.getRow() so I can see what rows
>> it's finding as it progresses.
>>
>> If I don't specify any ranges on the input format, it skips a significant
>> number of rows; that is, I don't see any log output indicating that it
>> traversed them.
>>
>> To see if it was a visibility issue, I tried explicitly setting a range,
>> like this:
>>
>>         AccumuloRowInputFormat.setRanges(job.getConfiguration(), ranges);
>>
>> When doing that it does process the rows that it otherwise skips.
>>
>> The same TimestampFilter is being applied in both scenarios, no other
>> filters / iterators are being used.
>>
>> Any thoughts on why, when run without the ranges specified, it isn't
>> seeing a significant portion of the data?
>>
>> Thanks,
>>
>> Mike
>>
>
>