Re: Reset column iterator while using AccumuloRowInputFormat
You could leverage the new TransformingIterator to seek and
iterate over the keys n times:

r1 cf:cq v
r2 cf:cq v
r3 cf:cq v

becomes:

pass1-r1 cf:cq v
pass1-r2 cf:cq v
pass1-r3 cf:cq v
pass2-r1 cf:cq v
pass2-r2 cf:cq v
pass2-r3 cf:cq v
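The key layout above can be illustrated outside Accumulo. This is a minimal in-memory sketch, not the TransformingIterator API: the names `PassPrefixSketch`, `expand`, and `originalKey` are hypothetical, and plain strings stand in for row/column keys. It shows why the prefix works: once every entry is copied under a pass prefix, sorted order delivers all of pass 1 before any of pass 2. The pass number is zero-padded here (an assumption beyond the mail's `pass1-`) so that more than nine passes still sort correctly.

```java
import java.util.Map;
import java.util.TreeMap;

public class PassPrefixSketch {
    // Returns n prefixed copies of every entry. Sorting on the new key puts all
    // of pass 1 before all of pass 2, so one forward scan sees the data n times.
    // Pass numbers are zero-padded to two digits (enough for up to 99 passes)
    // so that pass10 does not sort before pass02.
    static TreeMap<String, String> expand(Map<String, String> entries, int n) {
        TreeMap<String, String> out = new TreeMap<>();
        for (int pass = 1; pass <= n; pass++) {
            for (Map.Entry<String, String> e : entries.entrySet()) {
                out.put(String.format("pass%02d-%s", pass, e.getKey()), e.getValue());
            }
        }
        return out;
    }

    // Recovers the original key from a prefixed one ("pass01-r1 cf:cq" -> "r1 cf:cq").
    static String originalKey(String prefixedKey) {
        return prefixedKey.substring(prefixedKey.indexOf('-') + 1);
    }

    public static void main(String[] args) {
        Map<String, String> rows = new TreeMap<>();
        rows.put("r1 cf:cq", "v");
        rows.put("r2 cf:cq", "v");
        rows.put("r3 cf:cq", "v");
        // Prints pass01-r1 .. pass01-r3, then pass02-r1 .. pass02-r3.
        expand(rows, 2).forEach((k, v) -> System.out.println(k + " " + v));
    }
}
```

A consumer reading this stream would call `originalKey` to strip the prefix and treat a change in prefix as the start of the next pass.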

However, are you sure you need to iterate over the whole row twice?
There are strategies to internally intersect a row with itself (see
IntersectingIterator) that avoid this (at least, they avoid it from the
user's perspective).
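For context, IntersectingIterator works over a document-partitioned index (row = partition, column family = term, column qualifier = document id) and, as I understand it, keeps a separately seeking copy of its source per term, so it walks one row with several cursors at once rather than reading it twice. Below is a toy in-memory model of that leapfrog intersection; `SelfIntersectSketch`, `TermCursor`, and the `term|docId` string encoding are hypothetical stand-ins, not Accumulo classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SelfIntersectSketch {
    // A cursor over ONE shared sorted row, restricted to a single term.
    // Entries are "term|docId" strings (term ~ column family, docId ~ column
    // qualifier). Each cursor seeks on its own, like a per-term copy of the source.
    static final class TermCursor {
        private final List<String> row;
        private final String prefix;
        private int pos;

        TermCursor(List<String> row, String term) {
            this.row = row;
            this.prefix = term + "|";
            seek("");
        }

        // Jump to the first entry for this term whose docId >= target.
        void seek(String target) {
            int i = Collections.binarySearch(row, prefix + target);
            pos = i >= 0 ? i : -i - 1;
        }

        boolean hasTop() {
            return pos < row.size() && row.get(pos).startsWith(prefix);
        }

        String docId() {
            return row.get(pos).substring(prefix.length());
        }
    }

    // Leapfrog intersection: returns the docIds present under every term.
    static List<String> intersect(List<String> sortedRow, String... terms) {
        List<String> result = new ArrayList<>();
        if (terms.length == 0) return result;
        List<TermCursor> cursors = new ArrayList<>();
        for (String t : terms) cursors.add(new TermCursor(sortedRow, t));
        while (true) {
            // The largest current docId is a lower bound for the next match.
            String max = null;
            for (TermCursor c : cursors) {
                if (!c.hasTop()) return result; // one term exhausted: done
                if (max == null || c.docId().compareTo(max) > 0) max = c.docId();
            }
            boolean allEqual = true;
            for (TermCursor c : cursors) {
                if (!c.docId().equals(max)) {
                    c.seek(max); // skip ahead instead of stepping one entry at a time
                    allEqual = false;
                }
            }
            if (allEqual) {
                result.add(max);
                for (TermCursor c : cursors) c.seek(max + "\0"); // move past the match
            }
        }
    }

    public static void main(String[] args) {
        List<String> row = new ArrayList<>(Arrays.asList(
            "apple|doc1", "apple|doc3", "apple|doc4",
            "pear|doc2", "pear|doc3", "pear|doc4"));
        Collections.sort(row);
        System.out.println(intersect(row, "apple", "pear")); // prints [doc3, doc4]
    }
}
```

The point of the design is that each cursor only moves forward, by seeking, so the row is consumed in a single coordinated sweep instead of being reset and rescanned.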

If you don't need both passes in the same mapper, you could specify the
range twice in the AccumuloInputFormat's configuration (disable the
auto-adjust ranges feature so the two won't be collapsed into one), and
you'll get one mapper per range (though I'm pretty sure this gets you
nothing more than simply doing two actions in the same mapper before
moving on to the next key/value pair).
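The effect of the auto-adjust setting can be shown with a toy model. This is a sketch under simplifying assumptions, not Accumulo's split logic: `RangeSplitSketch`, `RowRange`, and `splits` are hypothetical, the table is assumed to be a single tablet, and auto-adjust is reduced to merging overlapping ranges (the real feature also clips ranges to tablet boundaries). Listing the same row range twice yields one split when merging is on and two when it is off.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RangeSplitSketch {
    // Hypothetical inclusive row range [start, end]; not Accumulo's Range class.
    record RowRange(String start, String end) {}

    // Toy model: with auto-adjust on, overlapping ranges are merged before
    // splits are computed; with it off, every configured range is its own split.
    static List<RowRange> splits(List<RowRange> ranges, boolean autoAdjust) {
        if (!autoAdjust) return new ArrayList<>(ranges);
        List<RowRange> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparing(RowRange::start));
        List<RowRange> merged = new ArrayList<>();
        for (RowRange r : sorted) {
            RowRange last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && r.start().compareTo(last.end()) <= 0) {
                // Overlaps the previous range: extend it instead of adding a split.
                String end = r.end().compareTo(last.end()) > 0 ? r.end() : last.end();
                merged.set(merged.size() - 1, new RowRange(last.start(), end));
            } else {
                merged.add(r);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        RowRange row = new RowRange("row42", "row42");
        List<RowRange> configured = List.of(row, row);
        System.out.println(splits(configured, true).size());  // prints 1 (one mapper)
        System.out.println(splits(configured, false).size()); // prints 2 (two mappers)
    }
}
```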

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Tue, Feb 26, 2013 at 9:12 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:
> Is there a way to "reset" the column iterator back to the "beginning" when
> using the AccumuloRowInputFormat?  We have a case in which we need to
> iterate over the columns for a row at least twice and it could be a large
> row that may not fit in memory.
>
> I think we can work around this by having a separate scanner used within the
> map method for this purpose.  Other than that, is there a way to clone or
> copy or reset the column iterator such that we can iterate over it more than
> once?
>
> Thanks,
>
> Mike
>
> public void map(Text key, PeekingIterator<Map.Entry<Key, Value>>
> columnIterator, Context context) {
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
>
>     // reset column iterator back to the beginning
>
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
>
> }
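Mike's fallback, a separate scanner inside `map`, amounts to getting a fresh iterator for each pass rather than rewinding the one `AccumuloRowInputFormat` hands over. A minimal sketch, assuming the row can be re-opened on demand: the re-openable source is modeled as a hypothetical `Supplier`, and the in-memory list in `main` stands in for a scanner restricted to the current row. The task itself (find the maximum value, then count how many columns hold it) is an illustrative example that needs two passes but only constant memory.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class TwoPassSketch {
    // Two streaming passes over one row using O(1) memory: pass 1 finds the
    // largest value, pass 2 counts how many columns hold it. Each pass asks
    // the (hypothetical) re-openable source for a fresh iterator; nothing is
    // rewound and the row is never buffered.
    static long[] maxAndCount(Supplier<Iterator<Map.Entry<String, Long>>> source) {
        long max = Long.MIN_VALUE;
        Iterator<Map.Entry<String, Long>> first = source.get();
        while (first.hasNext()) {
            max = Math.max(max, first.next().getValue());
        }
        long count = 0;
        Iterator<Map.Entry<String, Long>> second = source.get(); // new scan, same row
        while (second.hasNext()) {
            if (second.next().getValue() == max) count++;
        }
        return new long[] {max, count};
    }

    public static void main(String[] args) {
        // In-memory stand-in for "open a new scan over the current row".
        List<Map.Entry<String, Long>> row = List.of(
            Map.entry("cf:a", 3L), Map.entry("cf:b", 7L), Map.entry("cf:c", 7L));
        long[] r = maxAndCount(row::iterator);
        System.out.println("max=" + r[0] + " count=" + r[1]); // prints max=7 count=2
    }
}
```

The cost of this approach is a second read of the row from the tablet servers, which is the trade-off against buffering a row that may not fit in memory.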