Accumulo user mailing list - Using iterators to add columns to a row


Re: Using iterators to add columns to a row
Writing mutations is not necessary in this case.  The iterator has the
ability to change how the current row is seen, so you don't have to create
a mutation to change the row -- you just have to create an extra key with
the information you want.

One possibility is to write an iterator that passes through existing
key/value pairs, counting them, until it gets to the end of the row, at
which point it creates a new key/value pair and passes that along before
continuing.  You'd have to make sure the count column name is chosen to
sort at the end of the row.  (If that is not possible, you could iterate
over the entire row first, but that's more work for Accumulo.)  Below is an
incomplete sketch of an end-of-row counting approach.

  public boolean hasTop() {
    return true if count key is ready or source has a top key
  }
  public Key getTopKey() {
    if count key is ready
      return count key
    else
      return source's top key
  }
  public void next() throws IOException {
    if count key is ready
      reset count key to null   // source already points at the next row
    else {
      call next() on source iterator
      if source is exhausted or this is the start of a new row {
        prepare count key for previous row
        reset count
      }
      if source still has a top key
        increment count
    }
  }
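
To make that concrete, here is one way the sketch might be filled out as a
compilable iterator (untested, and still only a sketch: it does not handle
a re-seek into the middle of a row, so the counts are only trustworthy when
the iterator sees whole rows, e.g. during a full major compaction).  The
"~count" column family is a placeholder; its only requirement is that it
sorts after every other column family in the row, which "~" does for
lowercase names.

  import java.io.IOException;
  import java.util.Collection;

  import org.apache.accumulo.core.data.ByteSequence;
  import org.apache.accumulo.core.data.Key;
  import org.apache.accumulo.core.data.Range;
  import org.apache.accumulo.core.data.Value;
  import org.apache.accumulo.core.iterators.WrappingIterator;
  import org.apache.hadoop.io.Text;

  public class RowCountingIterator extends WrappingIterator {

    // Placeholder column; "~" sorts after lowercase letters, so this key
    // lands at the end of rows whose other families are lowercase.
    private static final Text COUNT_CF = new Text("~count");
    private static final Text COUNT_CQ = new Text("");

    private Key countKey = null;     // pending count entry, if any
    private Value countValue = null;
    private Text currentRow = null;  // row currently being counted
    private long count = 0;

    @Override
    public boolean hasTop() {
      return countKey != null || super.hasTop();
    }

    @Override
    public Key getTopKey() {
      return countKey == null ? super.getTopKey() : countKey;
    }

    @Override
    public Value getTopValue() {
      return countKey == null ? super.getTopValue() : countValue;
    }

    @Override
    public void seek(Range range, Collection<ByteSequence> families,
        boolean inclusive) throws IOException {
      countKey = null;
      countValue = null;
      super.seek(range, families, inclusive);
      if (super.hasTop()) {
        currentRow = super.getTopKey().getRow();
        count = 1;  // the key we are positioned on counts toward its row
      } else {
        currentRow = null;
        count = 0;
      }
    }

    @Override
    public void next() throws IOException {
      if (countKey != null) {
        // The count entry was just consumed; the source is already
        // positioned on the first key of the next row (or exhausted).
        countKey = null;
        countValue = null;
        return;
      }
      super.next();
      if (!super.hasTop()) {
        emitCount();  // end of data: flush the count for the last row
      } else if (!super.getTopKey().getRow().equals(currentRow)) {
        Text newRow = super.getTopKey().getRow();
        emitCount();  // row boundary: flush the count for the previous row
        currentRow = newRow;
        count = 1;
      } else {
        count++;
      }
    }

    private void emitCount() {
      if (currentRow != null) {
        countKey = new Key(currentRow, COUNT_CF, COUNT_CQ);
        countValue = new Value(Long.toString(count).getBytes());
      }
    }
  }

Holding the pending count entry in a field and returning it from
getTopKey() is what lets the iterator inject a key without writing any
mutations; it could then be attached at the minc/majc scopes only,
mirroring the scan-scope check in the code quoted below.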

Billie
On Thu, Apr 18, 2013 at 9:28 AM, Peter Tillotson <[EMAIL PROTECTED]> wrote:

>
> Apologies in advance - this is some of my first Accumulo code, and I
> suspect there is a much better way to do this.
>
> Basically I'm trying to add an edge count column to each row of my table,
> so I get rows along the following lines:
>  - node1 {  to:count:3, to:node2:, to:node3:, to:node3:  }
>
> But on the client side I only need to write:
>  - node1 {  to:node2:, to:node3:, to:node3:  }
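>
> (For concreteness, a client-side write of the plain edges might look
> something like this, using the 1.4-era API; the table name "graph" is a
> placeholder and an existing Connector is assumed:)
>
> import org.apache.accumulo.core.client.BatchWriter;
> import org.apache.accumulo.core.client.Connector;
> import org.apache.accumulo.core.data.Mutation;
> import org.apache.accumulo.core.data.Value;
> import org.apache.hadoop.io.Text;
>
> void writeEdges(Connector connector) throws Exception
> {
>     // Write only the edges; adding the count column is the iterator's job.
>     BatchWriter writer =
>         connector.createBatchWriter("graph", 1000000L, 1000L, 2);
>     Mutation m = new Mutation(new Text("node1"));
>     m.put(new Text("to"), new Text("node2"), new Value(new byte[0]));
>     m.put(new Text("to"), new Text("node3"), new Value(new byte[0]));
>     writer.addMutation(m);
>     writer.close();
> }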
>
> I'd like to use the same approach to add indexes to separate column
> families, and to use combiners to aggregate.
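>
> (A combiner could then be attached along these lines; again the table
> name "graph" and the count column are placeholders:)
>
> import java.util.Collections;
>
> import org.apache.accumulo.core.client.Connector;
> import org.apache.accumulo.core.client.IteratorSetting;
> import org.apache.accumulo.core.iterators.LongCombiner;
> import org.apache.accumulo.core.iterators.user.SummingCombiner;
>
> void attachCountCombiner(Connector connector) throws Exception
> {
>     // Sum any values written under the count column of table "graph".
>     IteratorSetting setting =
>         new IteratorSetting(10, "sumCounts", SummingCombiner.class);
>     SummingCombiner.setColumns(setting, Collections.singletonList(
>         new IteratorSetting.Column("to", "count")));
>     SummingCombiner.setEncodingType(setting, LongCombiner.Type.STRING);
>     connector.tableOperations().attachIterator("graph", setting);
> }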
>
> Aside from the inefficiency of a BatchWriter for each mutation:
>  - is this the correct approach? or
>  - is there a simpler way to achieve this?
>
> Many thanks in advance
>
> Peter T
>
> --- code compiles but not tested ---
>
> import java.io.IOException;
> import java.nio.ByteBuffer;
> import java.util.Map;
>
> import org.apache.accumulo.core.client.BatchWriter;
> import org.apache.accumulo.core.client.Connector;
> import org.apache.accumulo.core.client.MutationsRejectedException;
> import org.apache.accumulo.core.data.Key;
> import org.apache.accumulo.core.data.Mutation;
> import org.apache.accumulo.core.data.Value;
> import org.apache.accumulo.core.iterators.IteratorEnvironment;
> import org.apache.accumulo.core.iterators.IteratorUtil.IteratorScope;
> import org.apache.accumulo.core.iterators.SortedKeyIterator;
> import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
> import org.apache.accumulo.core.security.thrift.AuthInfo;
> import org.apache.accumulo.server.client.HdfsZooInstance;
> import org.apache.hadoop.io.Text;
>
> public class EdgeCountIterator extends SortedKeyIterator
> {
>     private boolean isDisabled = false;
>     private Connector connector;
>     private Key currentRowStart = null;
>     private String tableId;
>     private int count = 0;
>
>     @Override
>     public void init(SortedKeyValueIterator<Key, Value> source,
>             Map<String, String> options, IteratorEnvironment env)
>             throws IOException
>     {
>         super.init(source, options, env);
>         if(env.getIteratorScope() == IteratorScope.scan)
>         {
>             isDisabled = true;
>             return;
>         }
>
>         String user = options.get("username");
>         String password = options.get("password");
>         String instanceId = options.get("instance");
>         tableId = options.get("tableId");
>
>         AuthInfo authInfo = new AuthInfo();
>         authInfo.setUser(user);
>         authInfo.setPassword(password.getBytes());
>         authInfo.setInstanceId(instanceId);
>
>         try
>         {
>             connector = HdfsZooInstance.getInstance().getConnector(authInfo);
>         }
>         catch (Exception e)
>         {
>             throw new RuntimeException(e);