Accumulo, mail # user - How to remove entire row at the server side?


Re: How to remove entire row at the server side?
Terry P. 2013-11-06, 20:37
Hi William, many thanks for the explanation of scan time versus compaction
time. I'll look through the classes again and note where the remove versus
suppress wordings are used and open a ticket.

As mentioned, I only dabble in Java, but regardless, at this point I'm the
one who has to get this done. I've cobbled together my first attempt, but I
get the following error when I try to add it as a scan iterator for testing:

root@meta> setiter -class org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p 20 -scan -t itertest
2013-11-06 14:06:34,914 [shell.Shell] ERROR:
org.apache.accumulo.core.util.shell.ShellCommandException: Command could
not be initialized (Servers are unable to load
org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type
org.apache.accumulo.core.iterators.SortedKeyValueIterator)

Here's my source.  Note that the value stored in the expTs ColFam is in the
format "yyyyMMddHHmmssS", which I convert to a long for a direct comparison
to System.currentTimeMillis(). I only overrode the init and acceptRow
methods, hoping the others would work as-is from the base class.
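
For reference, the string-to-long conversion described above can be tried
on its own, outside of any iterator. This is only a sketch using the JDK:
the class and method names (ExpTsParseSketch, toEpochMillis) are
hypothetical, and pinning the time zone to UTC is an assumption, since the
ingest app may write local time instead.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class ExpTsParseSketch {
    // Pattern quoted in the message. The trailing S is the last field in
    // the pattern, so it takes all remaining digits as milliseconds.
    static final String EXP_TS_FORMAT = "yyyyMMddHHmmssS";

    // Converts a stored timestamp string to epoch milliseconds.
    // ASSUMPTION: stored timestamps are in UTC.
    static long toEpochMillis(String stored) {
        // SimpleDateFormat is not thread-safe, so build one per call.
        SimpleDateFormat df = new SimpleDateFormat(EXP_TS_FORMAT);
        df.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return df.parse(stored).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad timestamp: " + stored, e);
        }
    }

    public static void main(String[] args) {
        long expTs = toEpochMillis("20131104171412445");
        // Same comparison the filter makes against the current clock.
        boolean expired = expTs <= System.currentTimeMillis();
        System.out.println(expTs + " expired=" + expired);
    }
}
```

If the stored values are actually local time, the resulting long shifts by
the zone offset, which matters when rows expire within hours of NOW.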

One clarification: turns out expTs is the ColumnFamily, and the ingest app
does not assign a ColumnQualifier for expTs. So to amend my prior table
layout (including the datetime format):

Format: Key:CF:CQ:Value
abc:data:title:"My fantastic data"
abc:data:content:<bytedata>
abc:creTs::20130804171412445
abc:*expTs*::20131104171412445
... 6-8 more columns of data per row ...

where *expTs* is the ColumnFamily to determine if the entire row should be
removed based on whether its value is <= NOW.  If a row has not yet been
assigned an expiration date, expTs will not be set and the ColumnFamily
will not yet be present.  Seems like an odd choice to use distinct Column
Families, without Column Qualifiers, but that's how the ingest app was done.
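
The rule above (drop the row when expTs <= NOW, keep it when expTs is
absent) can be sketched without Accumulo by modelling a row as a plain
list of cells. Everything here is hypothetical scaffolding (RowExpirySketch,
Cell, keepRow), the UTC time zone is an assumption, and keeping rows with
an unreadable timestamp is a design choice rather than anything the ingest
app specifies. The point is that every cell in the row is examined,
because expTs is not the first column family in sort order.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.List;
import java.util.TimeZone;

public class RowExpirySketch {
    // One cell of a row, reduced to the parts the rule needs.
    static final class Cell {
        final String family;
        final String value;
        Cell(String family, String value) {
            this.family = family;
            this.value = value;
        }
    }

    // Returns true to keep the row, false to drop it.
    static boolean keepRow(List<Cell> row, long nowMillis) {
        SimpleDateFormat df = new SimpleDateFormat("yyyyMMddHHmmssS");
        df.setTimeZone(TimeZone.getTimeZone("UTC")); // ASSUMPTION: UTC
        for (Cell cell : row) {
            if (!"expTs".equals(cell.family)) {
                continue; // not the expiration column, keep looking
            }
            try {
                // Drop the row when expTs <= NOW.
                return df.parse(cell.value).getTime() > nowMillis;
            } catch (ParseException e) {
                return true; // unreadable timestamp: keep (assumption)
            }
        }
        return true; // no expTs column yet, so the row never expires
    }

    public static void main(String[] args) {
        List<Cell> row = Arrays.asList(
            new Cell("creTs", "20130804171412445"),
            new Cell("data", "My fantastic data"),
            new Cell("expTs", "20131104171412445"));
        System.out.println("keep=" + keepRow(row, System.currentTimeMillis()));
    }
}
```

In a RowFilter, the same walk would run over the iterator handed to
acceptRow instead of a List.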

I greatly appreciate any advice you can provide.

package com.esa.accumulo.iterators;

import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
import org.apache.accumulo.core.iterators.user.RowFilter;

/**
 * A filter that removes rows based on the column designated as the
 * "expiration timestamp" column family.
 *
 * It removes the row if the value in the expirationTimestamp column is
 * less than currentTime.
 *
 * TODO: The designation of the expirationTimestamp ColumnFamily and its
 * DateFormat is set in the iterator options when the iterator is applied
 * to the table. (For now it is hardcoded to match the format used in the
 * Solr-Accumulo plugin)
 */
public class ExpirationTimestampPurgeFilter extends RowFilter {
  private long currentTime;
  // TODO: make accumuloDateFormat settable via Iterator Options
  // Date Format for Expiration Timestamp ColumnFamily stored in Accumulo
  private String expTsDateFormat = "yyyyMMddHHmmssS";
  SimpleDateFormat df = new SimpleDateFormat(expTsDateFormat);

  // TODO: make expTs settable via Iterator Options
  // ColumnFamily containing Expiration Timestamp value (note ingest app
  // did NOT assign a ColumnQualifier, only a ColumnFamily)
  private String expTsColFam = "expTs";

  @Override
  public boolean acceptRow(SortedKeyValueIterator<Key, Value> rowIterator)
    throws IOException {
    // Walk every entry in the row: expTs sorts after creTs and data, so it
    // is never the first key the row iterator presents.
    while (rowIterator.hasTop()) {
      if (rowIterator.getTopKey().getColumnFamily().toString().equals(expTsColFam)) {
        try {
          Date expTsDate = df.parse(rowIterator.getTopValue().toString());
          if (expTsDate.getTime() < currentTime)
            return false;
        } catch (ParseException e) {
          // Unparseable timestamp: log it and keep the row
          e.printStackTrace();
        }
      }
      rowIterator.next();
    }
    return true;
  }

  @Override
  public void init(SortedKeyValueIterator<Key, Value> source,
      Map<String, String> options, IteratorEnvironment env)
      throws IOException {
    super.init(source, options, env);
    currentTime = System.currentTimeMillis();
  }

}