Accumulo >> mail # user >> MapReduce mapper not seeing all rows


Re: MapReduce mapper not seeing all rows
On Tue, Feb 26, 2013 at 12:31 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:

> Our row keys are a combination of two elements, like this:
>
> foo/bar
> foo/baz
> foo/bee
>
> eee/blah
> eee/boo
>
> When running without any ranges set, we're missing an entire prefix's worth
> of rows, e.g. we don't get any rows that start with "foo".
>

That sounds like a clue, because Accumulo doesn't know about the format of
your row keys.  If it were dropping arbitrary rows, I would expect you to
see some foo-prefixed rows and not others.  Are there any other differences
in the two runs?  How is the TimestampFilter configured?
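
One way to check for the pattern described above is to tally the logged row IDs by prefix and compare the counts against what the table is expected to hold. The following is only a minimal sketch in plain Java: the class PrefixTally and its method are hypothetical helpers (not part of Accumulo), and it assumes the prefix is everything before the first "/" in the row ID, as in the sample rows quoted below.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Accumulo: tallies row IDs by the part
// before the first '/', matching the "foo/bar" layout used in this thread.
class PrefixTally {

    static Map<String, Integer> countByPrefix(List<String> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String row : rows) {
            int slash = row.indexOf('/');
            // Rows without a '/' are counted under the whole row ID.
            String prefix = slash < 0 ? row : row.substring(0, slash);
            counts.merge(prefix, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample data only: row IDs the mapper logged vs. rows known to be in the table.
        List<String> seen = Arrays.asList("eee/blah", "eee/boo");
        List<String> expected = Arrays.asList(
                "foo/bar", "foo/baz", "foo/bee", "eee/blah", "eee/boo");

        Map<String, Integer> seenCounts = countByPrefix(seen);
        for (Map.Entry<String, Integer> e : countByPrefix(expected).entrySet()) {
            int got = seenCounts.getOrDefault(e.getKey(), 0);
            System.out.println(e.getKey() + ": expected " + e.getValue() + ", seen " + got);
        }
    }
}
```

A prefix whose "seen" count is zero while others are complete points at something systematic, such as how splits or ranges are computed, rather than at rows being dropped at random.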

Billie

>
> When I tried running with the range set, I did a prefix range on "foo" and
> it then found the rows starting with "foo"
>
>
> On Tue, Feb 26, 2013 at 2:28 PM, Billie Rinaldi <[EMAIL PROTECTED]> wrote:
>
>> Have you noticed any pattern in the rows it seems to be missing?  E.g.
>> every other row, the last row in each tablet, etc.?  When you set a range,
>> what range did you set?
>>
>> Billie
>>
>>
>>
>> On Tue, Feb 26, 2013 at 12:17 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:
>>
>>> Hello,
>>>
>>> I'm running a map reduce job over a table using AccumuloRowInputFormat.
>>> For debugging purposes I'm logging the key.getRow() so I can see what rows
>>> it's finding as it progresses.
>>>
>>> If I don't specify any ranges on the input format, it skips a significant
>>> number of rows - that is, I don't see any logging indicating that it
>>> traversed them.
>>>
>>> To see if it was a visibility issue, I tried explicitly setting a range,
>>> like this:
>>>
>>>         AccumuloRowInputFormat.setRanges(job.getConfiguration(), ranges);
>>>
>>> When doing that it does process the rows that it otherwise skips.
>>>
>>> The same TimestampFilter is being applied in both scenarios, no other
>>> filters / iterators are being used.
>>>
>>> Any thoughts on why, when run without the ranges specified, it isn't
>>> seeing a significant portion of the data?
>>>
>>> Thanks,
>>>
>>> Mike
>>>
>>
>>
>