Re: MapReduce mapper not seeing all rows
Mike Hugo 2013-02-26, 23:05
Thanks Billie,
The TimestampFilter is configured with an end time:

    IteratorSetting timestampIterator = new IteratorSetting(1, "tsBefore", TimestampFilter.class);
    TimestampFilter.setEnd(timestampIterator, endTime, true);

We have validated that all the records we're interested in have a timestamp
that's less than the end time we're passing in. For example, the timestamp
being passed to the timestamp filter is 1361907184183, and a sample timestamp
on a record in the table is 1361849294237.

The only difference between the two runs is whether we set the ranges or not:

    AccumuloRowInputFormat.setRanges(job.getConfiguration(), ranges);

Running a scan from the Accumulo shell, we see that all the data is there. The
same is true when running a scan via the Java API (not MapReduce, just a
straight-up Scanner). But for some reason the Mapper never hits those rows.

Is there any other visibility-type issue I might be hitting? I don't think
there is, as the two MapReduce runs (one with a range, one without) are kicked
off the same way, with the same username/password, and by the same Unix user.

Any other thoughts? I'm sure we're missing something simple, but I can't
pinpoint it.

Thanks,
Mike

On Tue, Feb 26, 2013 at 4:45 PM, Billie Rinaldi <[EMAIL PROTECTED]> wrote:

> On Tue, Feb 26, 2013 at 12:31 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:
>
>> Our row keys are a combination of two elements, like this:
>>
>> foo/bar
>> foo/baz
>> foo/bee
>>
>> eee/blah
>> eee/boo
>>
>> When running without any ranges set, we're missing an entire prefix's
>> worth of rows; e.g., we don't get any rows that start with "foo".
>
> That sounds like a clue, because Accumulo doesn't know about the format of
> your row keys. If it were dropping arbitrary rows, I would expect you to
> see some foo-prefixed rows and not others. Are there any other differences
> between the two runs? How is the TimestampFilter configured?
>
> Billie
>
>> When I tried running with the range set, I did a prefix range on "foo",
>> and it then found the rows starting with "foo".
>>
>> On Tue, Feb 26, 2013 at 2:28 PM, Billie Rinaldi <[EMAIL PROTECTED]> wrote:
>>
>>> Have you noticed any pattern in the rows it seems to be missing? E.g.,
>>> every other row, the last row in each tablet, etc.? When you set a
>>> range, what range did you set?
>>>
>>> Billie
>>>
>>> On Tue, Feb 26, 2013 at 12:17 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm running a MapReduce job over a table using AccumuloRowInputFormat.
>>>> For debugging purposes I'm logging key.getRow() so I can see which
>>>> rows it's finding as it progresses.
>>>>
>>>> If I don't specify any ranges on the input format, it skips a
>>>> significant number of rows; that is, I don't see any logging
>>>> indicating that it traversed them.
>>>>
>>>> To see if it was a visibility issue, I tried explicitly setting a
>>>> range, like this:
>>>>
>>>>     AccumuloRowInputFormat.setRanges(job.getConfiguration(), ranges);
>>>>
>>>> When doing that, it does process the rows that it otherwise skips.
>>>>
>>>> The same TimestampFilter is being applied in both scenarios; no other
>>>> filters/iterators are being used.
>>>>
>>>> Any thoughts on why, when run without the ranges specified, it isn't
>>>> seeing a significant portion of the data?
>>>>
>>>> Thanks,
>>>>
>>>> Mike
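For anyone re-checking the reasoning in this thread, here is a minimal, dependency-free Java sketch of two checks discussed above. It is not Accumulo code, and the class RowChecks and its methods are hypothetical. The first check is the comparison a TimestampFilter configured with setEnd(..., endTime, true) is expected to make, assuming an inclusive end keeps entries whose timestamp is <= endTime. The second check tests whether a row falls inside a prefix range such as the one on "foo". This assumes the range runs from the prefix (inclusive) to the prefix with trailing 0xff bytes stripped and its last byte incremented (exclusive), with row IDs compared as unsigned bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch (not Accumulo code) of the timestamp and prefix-range checks. */
public class RowChecks {

    /** Assumed filter semantics: keep an entry if it is before endTime (or equal, when inclusive). */
    public static boolean passesEndTime(long timestamp, long endTime, boolean endInclusive) {
        return endInclusive ? timestamp <= endTime : timestamp < endTime;
    }

    /**
     * Smallest byte string greater than every string that starts with prefix:
     * strip trailing 0xff bytes, then increment the last remaining byte.
     * Returns null when no such bound exists (empty or all-0xff prefix).
     */
    public static byte[] followingPrefix(byte[] prefix) {
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xff) {
            i--;
        }
        if (i < 0) {
            return null; // range is unbounded above
        }
        byte[] end = Arrays.copyOf(prefix, i + 1);
        end[i]++; // safe: end[i] is not 0xff, so no carry is needed
        return end;
    }

    /** Lexicographic comparison treating bytes as unsigned. */
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int k = 0; k < n; k++) {
            int c = Integer.compare(a[k] & 0xff, b[k] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    /** True if row lies in [prefix, followingPrefix(prefix)). */
    public static boolean inPrefixRange(String row, String prefix) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        if (compareUnsigned(r, p) < 0) {
            return false;
        }
        byte[] end = followingPrefix(p);
        return end == null || compareUnsigned(r, end) < 0;
    }

    public static void main(String[] args) {
        long endTime = 1361907184183L;          // end time quoted in the thread
        long sampleTimestamp = 1361849294237L;  // sample record timestamp quoted in the thread
        System.out.println(passesEndTime(sampleTimestamp, endTime, true)); // true

        String[] rows = {"foo/bar", "foo/baz", "foo/bee", "eee/blah", "eee/boo"};
        for (String row : rows) {
            System.out.println(row + " in \"foo\" prefix range: " + inPrefixRange(row, "foo"));
        }
    }
}
```

Running main prints true for the sample timestamp. It then reports the foo/ rows as inside the "foo" prefix range (which ends just before "fop") and the eee/ rows as outside it. The sketch only confirms the arithmetic quoted in the thread. It makes no claim about how AccumuloRowInputFormat splits or scans the table.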