Accumulo >> mail # user >> How to remove entire row at the server side?


Copy link to this message
-
Re: How to remove entire row at the server side?
Is there a typo in the package name? The source declares "com.esa.accumulo.iterators",
but the setiter command uses "org.esa.accumulo.iterators".
On Wed, Nov 6, 2013 at 12:37 PM, Terry P. <[EMAIL PROTECTED]> wrote:

> Hi William, many thanks for the explanation of scan time versus compaction
> time. I'll look through the classes again, note where the "remove" versus
> "suppress" wording is used, and open a ticket.
>
> As mentioned, I only dabble in Java, but regardless, at this point I'm the
> one who has to get this done. I've cobbled together a first attempt, but I
> get the following error when I try to add it as a scan iterator for
> testing:
>
> root@meta> setiter -class org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p 20 -scan -t itertest
> 2013-11-06 14:06:34,914 [shell.Shell] ERROR: org.apache.accumulo.core.util.shell.ShellCommandException: Command could not be initialized (Servers are unable to load org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type org.apache.accumulo.core.iterators.SortedKeyValueIterator)
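The reply's diagnosis can be illustrated outside Accumulo. Going by the error text, the servers must load the class under the exact fully qualified name given to setiter and treat it as a SortedKeyValueIterator; a name whose package ("org") differs from the declared one ("com") fails at the loading step. The sketch below is a plain-Java approximation, not Accumulo's actual loading code: `IteratorStandIn` is a hypothetical stand-in for SortedKeyValueIterator, and `canLoadAs` is a hypothetical helper. Independently of the name, the jar holding the real class also has to be on the tablet servers' classpath.

```java
public class IteratorClassCheck {

    /** Hypothetical stand-in for org.apache.accumulo.core.iterators.SortedKeyValueIterator. */
    public interface IteratorStandIn {}

    /** Hypothetical stand-in for the filter class, compiled in this file's package. */
    public static class ExpirationTimestampPurgeFilter implements IteratorStandIn {}

    /**
     * Approximates the condition in the error message: can className be loaded,
     * and is the loaded class assignable to requiredType?
     */
    public static boolean canLoadAs(String className, Class<?> requiredType) {
        try {
            Class<?> loaded = Class.forName(className);
            return requiredType.isAssignableFrom(loaded);
        } catch (ClassNotFoundException e) {
            return false; // no class exists under that exact name
        }
    }

    public static void main(String[] args) {
        // The binary name the JVM actually knows the class by.
        String declaredName = ExpirationTimestampPurgeFilter.class.getName();
        // The name typed into setiter; no class was declared under it.
        String typedName = "org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter";

        System.out.println(declaredName + " -> " + canLoadAs(declaredName, IteratorStandIn.class));
        System.out.println(typedName + " -> " + canLoadAs(typedName, IteratorStandIn.class));
        // Loadable, but not the required type, so it is rejected as well.
        System.out.println("java.lang.String -> " + canLoadAs("java.lang.String", IteratorStandIn.class));
    }
}
```

Passing com.esa.accumulo.iterators.ExpirationTimestampPurgeFilter to setiter (or changing the package declaration to org) makes the two names agree.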
>
> Here's my source. Note that the value stored in the expTs ColumnFamily is in
> the format "yyyyMMddHHmmssS", which I convert to a long for a direct
> comparison with System.currentTimeMillis(). I only overrode the init and
> acceptRow methods, hoping the others would work as-is from the base class.
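The conversion described above can be sketched with the same SimpleDateFormat pattern the post uses. Two assumptions are worth calling out: the post does not say which time zone the ingest app writes expTs in, so the sketch takes the zone as a parameter (UTC in the test) instead of silently using the tablet server's default zone; and the "<= NOW" rule is taken from the table layout description. `ExpTsParser`, `toEpochMillis` and `isExpired` are hypothetical names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class ExpTsParser {
    // Pattern from the post. When parsing, adjacent numeric fields are split by
    // their pattern widths, and the final "S" takes the remaining millisecond digits.
    private static final String EXP_TS_FORMAT = "yyyyMMddHHmmssS";

    /** Parses an expTs value such as "20131104171412445" into epoch milliseconds. */
    public static long toEpochMillis(String expTs, TimeZone zone) throws ParseException {
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat df = new SimpleDateFormat(EXP_TS_FORMAT);
        df.setLenient(false); // reject out-of-range fields instead of rolling them over
        df.setTimeZone(zone); // assumption: the caller knows the zone the ingest app used
        return df.parse(expTs).getTime();
    }

    /** True when the expiration timestamp is at or before nowMillis ("<= NOW"). */
    public static boolean isExpired(String expTs, long nowMillis, TimeZone zone) throws ParseException {
        return toEpochMillis(expTs, zone) <= nowMillis;
    }

    public static void main(String[] args) throws ParseException {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        System.out.println(toEpochMillis("20131104171412445", utc));
        System.out.println(isExpired("20131104171412445", System.currentTimeMillis(), utc));
    }
}
```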
>
> One clarification: it turns out expTs is the ColumnFamily, and the ingest
> app does not assign a ColumnQualifier for expTs. So, to amend my prior table
> layout (including the datetime format):
>
>
> Format: Row:CF:CQ:Value
> abc:data:title:"My fantastic data"
> abc:data:content:<bytedata>
> abc:creTs::20130804171412445
> abc:*expTs*::20131104171412445
> ... 6-8 more columns of data per row ...
>
> where *expTs* is the ColumnFamily used to determine whether the entire row
> should be removed, based on whether its value is <= NOW. If a row has not
> yet been assigned an expiration date, expTs will not be set and the
> ColumnFamily will not be present. It seems like an odd choice to use
> distinct ColumnFamilies without ColumnQualifiers, but that's how the ingest
> app was done.
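Given this layout, one detail matters for acceptRow: within a row, entries are sorted by column family, so "creTs" and "data" come before "expTs", and the first key the row iterator is positioned on is usually not expTs. The decision therefore has to look through the row's entries until it finds expTs (with the real SortedKeyValueIterator, by looping on hasTop(), getTopKey() and next()). The sketch below models a row as a TreeMap keyed by a hypothetical "colFam:colQual" string so it runs without Accumulo; `RowExpirySketch` is a hypothetical name, keeping rows whose expTs cannot be parsed is an assumption, and the UTC zone in the test is an assumption as well.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Map;
import java.util.TimeZone;
import java.util.TreeMap;

public class RowExpirySketch {
    static final String EXP_TS_COL_FAM = "expTs";

    /**
     * Returns true to keep the row, false to drop it (the RowFilter.acceptRow contract).
     * Each map key is "colFam:colQual" (hypothetical encoding); TreeMap keeps them sorted.
     */
    public static boolean acceptRow(TreeMap<String, String> row, long nowMillis, TimeZone zone) {
        SimpleDateFormat df = new SimpleDateFormat("yyyyMMddHHmmssS");
        df.setLenient(false);
        df.setTimeZone(zone);

        // Walk every entry in the row; expTs is rarely the first one.
        for (Map.Entry<String, String> entry : row.entrySet()) {
            String key = entry.getKey();
            String colFam = key.substring(0, key.indexOf(':'));
            if (colFam.equals(EXP_TS_COL_FAM)) {
                try {
                    long expMillis = df.parse(entry.getValue()).getTime();
                    return expMillis > nowMillis; // keep only if not yet expired
                } catch (ParseException e) {
                    return true; // assumption: keep rows whose expTs cannot be parsed
                }
            }
        }
        return true; // no expTs column yet, so the row has no expiration date
    }

    public static void main(String[] args) {
        TreeMap<String, String> row = new TreeMap<>();
        row.put("data:title", "My fantastic data");
        row.put("creTs:", "20130804171412445");
        row.put("expTs:", "20131104171412445");

        System.out.println("first entry: " + row.firstKey());
        System.out.println("keep now? " + acceptRow(row, System.currentTimeMillis(), TimeZone.getTimeZone("UTC")));
    }
}
```

In the real filter, the same loop would read rowIterator.getTopKey().getColumnFamily() and rowIterator.getTopValue() on each pass and call rowIterator.next() until hasTop() returns false.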
>
> I greatly appreciate any advice you can provide.
>
> package com.esa.accumulo.iterators;
>
> import java.io.IOException;
> import java.text.ParseException;
> import java.text.SimpleDateFormat;
> import java.util.Date;
> import java.util.Map;
>
> import org.apache.accumulo.core.data.Key;
> import org.apache.accumulo.core.data.Value;
> import org.apache.accumulo.core.iterators.IteratorEnvironment;
> import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
> import org.apache.accumulo.core.iterators.user.RowFilter;
>
> /**
>  * A filter that removes rows based on the ColumnFamily designated as the
>  * "expiration timestamp" column.
>  *
>  * It removes the row if the value in the expiration timestamp column is at
>  * or before the current time.
>  *
>  * TODO: Set the expiration timestamp ColumnFamily and its DateFormat in the
>  * iterator options when the iterator is applied to the table. (For now they
>  * are hardcoded to match the format used in the Solr-Accumulo plugin.)
>  */
> public class ExpirationTimestampPurgeFilter extends RowFilter {
>   private long currentTime;
>   // TODO: make expTsDateFormat settable via Iterator Options
>   // Date Format for Expiration Timestamp ColumnFamily stored in Accumulo
>   private String expTsDateFormat = "yyyyMMddHHmmssS";
>   private SimpleDateFormat df = new SimpleDateFormat(expTsDateFormat);
>
>   // TODO: make expTs settable via Iterator Options
>   // ColumnFamily containing Expiration Timestamp value (note ingest app
>   // did NOT assign a ColumnQualifier, only a ColumnFamily)
>   private String expTsColFam = "expTs";
>
>   @Override
>   public boolean acceptRow(SortedKeyValueIterator<Key, Value> rowIterator)
>     throws IOException {
>
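>     // NOTE: getTopKey() is only the first key of the row. Keys sort by
>     // column family ("creTs" < "data" < "expTs"), so expTs is usually not
>     // first; the row's entries must be walked with hasTop()/next().
>     if (rowIterator.getTopKey().getColumnFamily().toString().equals(expTsColFam)) {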
>        Date expTsDate = null;
>        try {
>          expTsDate = df.parse(rowIterator.getTopValue().toString());