Re: Reset column iterator while using AccumuloRowInputFormat
Billie Rinaldi 2013-03-02, 21:05
On Tue, Feb 26, 2013 at 9:12 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:
> Is there a way to "reset" the column iterator back to the "beginning" when
> using the AccumuloRowInputFormat? We have a case in which we need to
> iterate over the columns for a row at least twice and it could be a large
> row that may not fit in memory.
>
> I think we can work around this by having a separate scanner used within
> the map method for this purpose. Other than that, is there a way to clone
> or copy or reset the column iterator such that we can iterate over it more
> than once?
>

Currently, no. It's not immediately obvious how we could change the InputFormat to accomplish this. The RecordReader creates a scanner, does the seeking/fetching for the InputSplit once in its initialize method, then iterates over the scanner, grouping together rows as appropriate. Going back to the beginning of a row would require us to seek the scanner again, and replace the old iterator with a new one. We could make a special RecordReader with a reset method, but I don't know how we could call the method. Interactions with the RecordReader are handled by the MapContext, and I don't know if you can use a custom MapContext. Maybe we could have an InputFormat that gives you a Scanner directly that you could reseek in the Mapper, but we'd have to spend some time thinking about it to make sure it would work.

Billie

> Thanks,
>
> Mike
>
> public void map(Text key, PeekingIterator<Map.Entry<Key, Value>> columnIterator, Context context) {
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
>
>     // reset column iterator back to the beginning
>
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
> }
>
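
For illustration only, here is a rough sketch of the "seek again and replace the old iterator with a new one" idea, written in plain Java so it stands alone. ResettableIterator is a hypothetical helper, not part of Accumulo or Hadoop. It assumes the caller can supply a function that opens a fresh iterator over the same row on demand; in a real Mapper that function would presumably create or reseek a separate Scanner restricted to the current row, as in the workaround Mike describes.

```java
import java.util.Iterator;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not an Accumulo class): an iterator that can be
 * restarted by asking its source for a brand-new underlying iterator.
 * Nothing is buffered, so a large row never has to fit in memory; the
 * cost is that every reset re-reads the data from the source.
 */
class ResettableIterator<T> implements Iterator<T> {
    // Opens a fresh iterator positioned at the beginning of the row.
    private final Supplier<Iterator<T>> source;
    // The iterator currently being consumed.
    private Iterator<T> current;

    ResettableIterator(Supplier<Iterator<T>> source) {
        this.source = source;
        this.current = source.get();
    }

    @Override
    public boolean hasNext() {
        return current.hasNext();
    }

    @Override
    public T next() {
        return current.next();
    }

    /** Discards the old iterator and starts over from the beginning. */
    void reset() {
        current = source.get();
    }
}
```

With a helper like this, the two loops in the quoted map method would be separated by a single reset() call. Whether opening a second scanner per row is acceptable depends on row sizes and on how many rows each map task handles.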