Accumulo, mail # user - Reset column iterator while using AccumuloRowInputFormat


Re: Reset column iterator while using AccumuloRowInputFormat
Billie Rinaldi 2013-03-02, 21:05
On Tue, Feb 26, 2013 at 9:12 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:

> Is there a way to "reset" the column iterator back to the "beginning" when
> using the AccumuloRowInputFormat?  We have a case in which we need to
> iterate over the columns for a row at least twice and it could be a large
> row that may not fit in memory.
>
> I think we can work around this by having a separate scanner used within
> the map method for this purpose.  Other than that, is there a way to clone
> or copy or reset the column iterator such that we can iterate over it more
> than once?
>

Currently, no.  It's not immediately obvious how we could change the
InputFormat to accomplish this.  The RecordReader creates a scanner, does
the seeking/fetching for the InputSplit once in its initialize method, then
iterates over the scanner, grouping together rows as appropriate.  Going
back to the beginning of a row would require us to seek the scanner again,
and replace the old iterator with a new one.  We could make a special
RecordReader with a reset method, but I don't know how we could call the
method.  Interactions with the RecordReader are handled by the MapContext,
and I don't know if you can use a custom MapContext.  Maybe we could have
an InputFormat that gives you a Scanner directly that you could reseek in
the Mapper, but we'd have to spend some time thinking about it to make sure
it would work.

Billie
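
The single-pass behavior described above can be illustrated with a toy model. This is only a sketch in plain Java, not Accumulo's actual RecordReader code: the class names, the String[] {row, column} cells, and the sample data are all made up for illustration. It shows that a per-row column iterator layered on a shared forward-only scan is empty on a second pass, because reading the row has already consumed the underlying scan.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Toy model of a single-pass scan grouped by row. Not Accumulo's actual code. */
public class SinglePassRows {

    /** Groups a sorted, forward-only stream of {row, column} pairs by row. */
    static final class RowGroupingReader {
        private final Iterator<String[]> scan; // stands in for the scanner's iterator
        private String[] peeked;               // one-cell lookahead

        RowGroupingReader(Iterator<String[]> scan) {
            this.scan = scan;
            this.peeked = scan.hasNext() ? scan.next() : null;
        }

        /** Returns an iterator over the next row's columns; it shares the underlying scan. */
        Iterator<String> nextRowColumns() {
            if (peeked == null) {
                throw new NoSuchElementException("scan is exhausted");
            }
            final String row = peeked[0];
            return new Iterator<String>() {
                @Override
                public boolean hasNext() {
                    return peeked != null && peeked[0].equals(row);
                }

                @Override
                public String next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    String column = peeked[1];
                    // Advancing here consumes the shared scan; nothing is buffered.
                    peeked = scan.hasNext() ? scan.next() : null;
                    return column;
                }
            };
        }
    }

    /** Counts the elements left in an iterator by draining it. */
    static int drain(Iterator<String> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    /** Returns {first-pass count, second-pass count} for the first row of a sample scan. */
    static int[] twoPassCounts() {
        List<String[]> cells = Arrays.asList(
            new String[] {"row1", "a"},
            new String[] {"row1", "b"},
            new String[] {"row2", "a"});
        RowGroupingReader reader = new RowGroupingReader(cells.iterator());
        Iterator<String> columns = reader.nextRowColumns();
        return new int[] {drain(columns), drain(columns)};
    }

    public static void main(String[] args) {
        int[] counts = twoPassCounts();
        System.out.println("first pass: " + counts[0] + ", second pass: " + counts[1]);
    }
}
```

In this model the only way to see row1's columns again is to start a new scan positioned at row1, which corresponds to the re-seek mentioned in the reply.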

> Thanks,
>
> Mike
>
> public void map(Text key, PeekingIterator<Map.Entry<Key, Value>>
> columnIterator, Context context) {
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
>
>     // reset column iterator back to the beginning
>
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
>
> }
>
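
The workaround Mike mentions (a separate scan per pass instead of a reset) might be structured roughly as follows. This is a minimal sketch in plain Java under stated assumptions, not a tested Accumulo mapper: openRowScan is a hypothetical hook, and the values are plain Longs rather than Key/Value entries. In a real mapper the hook would presumably create a Scanner with Connector.createScanner and restrict it to the current row with setRange(new Range(row)); how the mapper obtains its Connector is left out here.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Two passes over one row by opening a fresh scan per pass. Plain-Java stand-in. */
public class TwoPassRow {

    /**
     * Pass 1 totals the row's values; pass 2 counts the values above the row mean.
     * openRowScan is a hypothetical hook: each call must return a new iterator
     * positioned at the start of the row, so the row is never held in memory.
     */
    static int countAboveMean(Supplier<Iterator<Long>> openRowScan) {
        long sum = 0;
        long n = 0;
        Iterator<Long> first = openRowScan.get();
        while (first.hasNext()) {
            sum += first.next();
            n++;
        }
        if (n == 0) {
            return 0;
        }
        double mean = (double) sum / n;

        int above = 0;
        // The "reset": ask for a new scan instead of rewinding the old iterator.
        Iterator<Long> second = openRowScan.get();
        while (second.hasNext()) {
            if (second.next() > mean) {
                above++;
            }
        }
        return above;
    }

    public static void main(String[] args) {
        // An in-memory list stands in for the row; a real scan would stream from the table.
        List<Long> rowValues = Arrays.asList(1L, 2L, 9L);
        System.out.println(countAboveMean(rowValues::iterator));
    }
}
```

Each pass costs one extra seek, and if the table is being written to between passes, the two scans may not see identical data.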