Accumulo >> mail # user >> Using iterators to add columns to a row

Re: Using iterators to add columns to a row
Many thanks - this is exactly the kind of solution I was looking for. I think I prefer the buffered row approach, as I can add my columns at arbitrary points.

I looked at the whole row iterator, but I'm not keen on the serialization overhead.
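For reference, the buffered-row idea can be sketched without any Accumulo dependency: collect one row's columns in memory, merge in the extra columns, and re-sort before replaying, so an added column can land at an arbitrary position (e.g. a count column that sorts before the edge columns). This is my illustration rather than code from the thread, and the column names are hypothetical; a real iterator would buffer Key/Value pairs rather than strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BufferedRowSketch {
    // Buffer one row's column names, merge in extra columns, and restore
    // sorted order before replaying them -- an Accumulo-free sketch of the
    // "buffered row" approach mentioned above.
    static List<String> withExtraColumns(List<String> rowColumns, List<String> extras) {
        List<String> merged = new ArrayList<>(rowColumns);
        merged.addAll(extras);
        Collections.sort(merged); // keys must leave the iterator in sorted order
        return merged;
    }
}
```

Because the whole row is buffered, the inserted column may sort anywhere in the row, which is exactly what the end-of-row trick below cannot do.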


 From: Billie Rinaldi <[EMAIL PROTECTED]>
Sent: Thursday, 18 April 2013, 18:43
Subject: Re: Using iterators to add columns to a row
Writing mutations is not necessary in this case.  The iterator has the ability to change how the current row is seen, so you don't have to create a mutation to change the row -- you just have to create an extra key with the information you want.

One possibility is to write an iterator that passes through existing key/value pairs, counting them, until it gets to the end of the row, at which point it creates a new key/value pair and passes that along before continuing.  You'd have to make sure the count column name was chosen to sort at the end of the row.  (If that is not possible you could iterate over the entire row first, but that's more work for Accumulo.)  Below is an incomplete sketch of an end-of-row counting approach.

  public Key getTopKey() {
    if count key is ready
      return count key
    else
      return source's top key
  }

  public void next() throws IOException {
    if count key is ready
      reset count key to null
    else {
      call next() on source iterator
      if this is the start of a new row {
        prepare count key for previous row
        reset count
      }
      increment count
    }
  }

On Thu, Apr 18, 2013 at 9:28 AM, Peter Tillotson <[EMAIL PROTECTED]> wrote:
>Apologies in advance - this is some of my first Accumulo code, but I suspect there is a much better way to do this.
>Basically I'm trying to add an edge count column to each row of my table, so I get rows along the following lines
> - node1 {  to:count:3, to:node2:, to:node3:, to:node3:  }
>But on the client side I only need to write
> - node1 {  to:node2:, to:node3:, to:node3:  }
>I'd like to use the same approach to add indexes to separate column families, and combiners to aggregate.  
>Aside from the inefficiency of a BatchWriter for each mutation 
> - is this the correct approach? or
> - is there a simpler way to achieve this?
>Many thanks in advance
>Peter T

>--- code compiles but not tested ---
>import java.io.IOException;
>import java.nio.ByteBuffer;
>import java.util.Map;
>import org.apache.accumulo.core.client.BatchWriter;
>import org.apache.accumulo.core.client.Connector;
>import org.apache.accumulo.core.client.MutationsRejectedException;
>import org.apache.accumulo.core.data.Key;
>import org.apache.accumulo.core.data.Mutation;
>import org.apache.accumulo.core.data.Value;
>import org.apache.accumulo.core.iterators.IteratorEnvironment;
>import org.apache.accumulo.core.iterators.IteratorUtil.IteratorScope;
>import org.apache.accumulo.core.iterators.SortedKeyIterator;
>import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
>import org.apache.accumulo.core.security.thrift.AuthInfo;
>import org.apache.accumulo.server.client.HdfsZooInstance;
>import org.apache.hadoop.io.Text;
>public class EdgeCountIterator extends SortedKeyIterator
>{
>    private boolean isDisabled = false;
>    private Connector connector;
>    private Key currentRowStart = null;
>    private String tableId;
>    private int count = 0;
>    @Override
>    public void init(SortedKeyValueIterator<Key, Value> source, Map<String, String> options, IteratorEnvironment env) throws IOException
>    {
>        super.init(source, options, env);
>        if(env.getIteratorScope() == IteratorScope.scan)
>        {
>            isDisabled = true;
>            return;
>        }
>        String user = options.get("username");
>        String password = options.get("password");