Accumulo >> mail # user >> Reset column iterator while using AccumuloRowInputFormat


Re: Reset column iterator while using AccumuloRowInputFormat
On Tue, Feb 26, 2013 at 9:12 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:

> Is there a way to "reset" the column iterator back to the "beginning" when
> using the AccumuloRowInputFormat?  We have a case in which we need to
> iterate over the columns for a row at least twice and it could be a large
> row that may not fit in memory.
>
> I think we can work around this by having a separate scanner used within
> the map method for this purpose.  Other than that, is there a way to clone
> or copy or reset the column iterator such that we can iterate over it more
> than once?
>

Currently, no.  It's not immediately obvious how we could change the
InputFormat to accomplish this.  The RecordReader creates a scanner, does
the seeking/fetching for the InputSplit once in its initialize method, then
iterates over the scanner, grouping together rows as appropriate.  Going
back to the beginning of a row would require us to seek the scanner again,
and replace the old iterator with a new one.  We could make a special
RecordReader with a reset method, but I don't know how we could call the
method.  Interactions with the RecordReader are handled by the MapContext,
and I don't know if you can use a custom MapContext.  Maybe we could have
an InputFormat that gives you a Scanner directly that you could reseek in
the Mapper, but we'd have to spend some time thinking about it to make sure
it would work.
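
For what it's worth, here is a rough sketch of the two-pass workaround you describe (a separate scanner used inside the map method). It is not Accumulo code: RowSource is a hypothetical stand-in for a Scanner restricted to a single row, and TwoPassRow and countAboveMean are made-up names used only for illustration. The one property it relies on is that the source hands back a fresh iterator, starting at the first column, each time iterator() is called, so the row never has to be held in memory by the caller.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class TwoPassRow {

    // Hypothetical stand-in for a Scanner limited to one row. Each call to
    // iterator() starts again from the first column, which is the behavior
    // the workaround depends on. Backed by a List only to keep the sketch
    // self-contained; a real scanner would stream from the tablet servers.
    static final class RowSource implements Iterable<Map.Entry<String, Long>> {
        private final List<Map.Entry<String, Long>> columns;
        int seeks = 0; // how many times the row has been (re)read

        RowSource(List<Map.Entry<String, Long>> columns) {
            this.columns = columns;
        }

        @Override
        public Iterator<Map.Entry<String, Long>> iterator() {
            seeks++;
            return columns.iterator();
        }
    }

    // Pass 1 computes the mean of the values; pass 2 counts the values above
    // it. Neither pass buffers the row.
    static int countAboveMean(Iterable<Map.Entry<String, Long>> row) {
        long sum = 0;
        int n = 0;
        for (Map.Entry<String, Long> kv : row) { // first pass
            sum += kv.getValue();
            n++;
        }
        if (n == 0) {
            return 0;
        }
        double mean = (double) sum / n;

        int above = 0;
        for (Map.Entry<String, Long> kv : row) { // second pass: new iterator
            if (kv.getValue() > mean) {
                above++;
            }
        }
        return above;
    }

    public static void main(String[] args) {
        RowSource row = new RowSource(List.of(
                Map.entry("cq1", 1L),
                Map.entry("cq2", 2L),
                Map.entry("cq3", 9L)));
        System.out.println("above mean: " + countAboveMean(row)
                + ", seeks: " + row.seeks);
    }
}
```

In a real Mapper the analogous step would presumably be to create a Scanner, restrict it with setRange(new Range(key)) for the row passed to map, and loop over it twice. Connection setup is left out here, and each pass costs another round of reads against the tablet servers.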

Billie

> Thanks,
>
> Mike
>
> public void map(Text key, PeekingIterator<Map.Entry<Key, Value>>
> columnIterator, Context context) {
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
>
>     // reset column iterator back to the beginning
>
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
>
> }
>