Accumulo >> mail # user >> How to remove entire row at the server side?

Re: How to remove entire row at the server side?
Hi Keith,
No, expTs won't actually be the first column -- that'll teach me to try
things with overly simplistic data!

There will be 10-12 column families in each row. I take it my simple
check of the column family name isn't enough?
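Keith's question comes down to key ordering. As far as I know, Accumulo sorts the entries of a row by column family, then column qualifier (then visibility and timestamp), so with the families in the layout quoted below, `creTs` and `data` sort ahead of `expTs`; with 10-12 families, others may too. The plain-Java sketch below is not Accumulo code: the `Cell` class and the method names are hypothetical stand-ins for Accumulo's `Key`/`Value`. It only contrasts checking the first cell of a row with scanning every cell.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RowOrderSketch {
    // Hypothetical stand-in for one Accumulo key/value pair (NOT the real Key class).
    static final class Cell {
        final String row, cf, cq, value;

        Cell(String row, String cf, String cq, String value) {
            this.row = row;
            this.cf = cf;
            this.cq = cq;
            this.value = value;
        }
    }

    // Models the sort order: row, then column family, then qualifier.
    // For ASCII names, String order matches Accumulo's byte-wise order.
    static final Comparator<Cell> KEY_ORDER = Comparator.comparing((Cell c) -> c.row)
            .thenComparing(c -> c.cf)
            .thenComparing(c -> c.cq);

    // The "simple" check: only looks at the first cell of the row.
    static boolean firstCellIsFamily(List<Cell> sortedRow, String family) {
        return !sortedRow.isEmpty() && sortedRow.get(0).cf.equals(family);
    }

    // The robust check: walks every cell of the row before answering.
    static boolean rowHasFamily(List<Cell> sortedRow, String family) {
        for (Cell c : sortedRow) {
            if (c.cf.equals(family)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Cell> row = new ArrayList<>();
        row.add(new Cell("abc", "expTs", "", "20131104171412445"));
        row.add(new Cell("abc", "data", "title", "My fantastic data"));
        row.add(new Cell("abc", "data", "content", "<bytedata>"));
        row.add(new Cell("abc", "creTs", "", "20130804171412445"));
        row.sort(KEY_ORDER);

        System.out.println(row.get(0).cf);                   // creTs
        System.out.println(firstCellIsFamily(row, "expTs")); // false
        System.out.println(rowHasFamily(row, "expTs"));      // true
    }
}
```

The point of the design: a row-level decision that depends on one family has to consume the whole row, because that family can sort anywhere within it.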

On Thursday, November 7, 2013, Keith Turner wrote:

> Your acceptRow function assumes that expTs will be the first column in
> the row. Is this always the case?
> On Wed, Nov 6, 2013 at 3:37 PM, Terry P. <[EMAIL PROTECTED]> wrote:
> Hi William, many thanks for the explanation of scan time versus compaction
> time. I'll look through the classes again, note where the "remove" versus
> "suppress" wording is used, and open a ticket.
> As mentioned, I only dabble in Java, but regardless, at this point I'm the
> one who has to get this done. I've cobbled together my first attempt, but I
> get the following error when I try to add it as a scan iterator for
> testing:
> root@meta> setiter -class org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p 20 -scan -t itertest
> 2013-11-06 14:06:34,914 [shell.Shell] ERROR: org.apache.accumulo.core.util.shell.ShellCommandException: Command could not be initialized (Servers are unable to load org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type org.apache.accumulo.core.iterators.SortedKeyValueIterator)
> Here's my source. Note that the value stored in the expTs ColFam is in
> the format "yyyyMMddHHmmssS", which I convert to a long for direct
> comparison with System.currentTimeMillis(). I only overrode the init and
> acceptRow methods, hoping the others would work as-is from the base class.
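The conversion described above can be sketched on its own, outside any iterator. This is a minimal plain-Java sketch: the class and method names are hypothetical, and it assumes the ingest app wrote the timestamps in the JVM's default time zone (if it used a fixed zone such as UTC, the formatter needs `setTimeZone`). The trailing `S` in the pattern picks up the millisecond digits, so `...12445` means 12 seconds and 445 ms.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class ExpTsParseSketch {
    // Pattern quoted in the thread; the trailing "S" takes the millisecond digits.
    static final String EXP_TS_FORMAT = "yyyyMMddHHmmssS";

    // Converts a stored expTs string such as "20131104171412445" to epoch millis.
    // ASSUMPTION: the value was written in the JVM's default time zone; call
    // df.setTimeZone(...) if the ingest app used a fixed zone such as UTC.
    static long toEpochMillis(String expTs) {
        SimpleDateFormat df = new SimpleDateFormat(EXP_TS_FORMAT);
        df.setLenient(false); // reject out-of-range fields such as month 13
        try {
            return df.parse(expTs).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable expTs value: " + expTs, e);
        }
    }

    // Expired when the expiration time is at or before "now" (the <= NOW rule).
    static boolean isExpired(String expTs, long nowMillis) {
        return toEpochMillis(expTs) <= nowMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(isExpired("20131104171412445", now)); // true
        System.out.println(isExpired("29991231235959999", now)); // false
    }
}
```

A new formatter is created per call here to keep the sketch simple; `SimpleDateFormat` is not thread-safe, so sharing one instance across threads would need care.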
> One clarification: it turns out expTs is the ColumnFamily, and the ingest
> app does not assign a ColumnQualifier for expTs. So, to amend my prior table
> layout (including the datetime format):
> Format: Key:CF:CQ:Value
> abc:data:title:"My fantastic data"
> abc:data:content:<bytedata>
> abc:creTs::20130804171412445
> abc:*expTs*::20131104171412445
> ... 6-8 more columns of data per row ...
> where *expTs* is the ColumnFamily that determines whether the entire row
> should be removed, based on whether its value is <= NOW. If a row has not
> yet been assigned an expiration date, expTs will not be set and the
> ColumnFamily will not be present. It seems like an odd choice to use
> distinct Column Families without Column Qualifiers, but that's how the
> ingest app was done.
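The rule just described (remove the row when expTs is present and <= NOW; keep it when expTs is absent) can be sketched in plain Java, independent of the Accumulo API. `RowExpirySketch`, `shouldRemoveRow`, and the (column family, value) pair representation are hypothetical. A real RowFilter would read the same information from the row iterator that `acceptRow` receives; as I understand the RowFilter contract, rows for which `acceptRow` returns true are kept, so it would return the negation of this decision. Because the whole row is scanned, it does not matter where expTs sorts.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.List;
import java.util.Map;

public class RowExpirySketch {
    // Hypothetical constants mirroring the layout in the thread.
    static final String EXP_TS_FAMILY = "expTs";
    static final String EXP_TS_FORMAT = "yyyyMMddHHmmssS";

    // cells: (column family, value) pairs for ONE row, in any order.
    // Returns true only if an expTs cell exists and its time is <= nowMillis;
    // a row without expTs has no expiration yet and is kept.
    static boolean shouldRemoveRow(List<Map.Entry<String, String>> cells, long nowMillis) {
        for (Map.Entry<String, String> cell : cells) {
            if (EXP_TS_FAMILY.equals(cell.getKey())) {
                return parse(cell.getValue()) <= nowMillis;
            }
        }
        return false; // expTs never seen: no expiration assigned yet
    }

    // ASSUMPTION: timestamps are in the JVM's default time zone.
    private static long parse(String expTs) {
        try {
            return new SimpleDateFormat(EXP_TS_FORMAT).parse(expTs).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable expTs value: " + expTs, e);
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        List<Map.Entry<String, String>> expired = List.of(
                Map.entry("creTs", "20130804171412445"),
                Map.entry("data", "My fantastic data"),
                Map.entry("expTs", "20131104171412445"));
        List<Map.Entry<String, String>> noExpiry = List.of(
                Map.entry("creTs", "20130804171412445"),
                Map.entry("data", "My fantastic data"));
        System.out.println(shouldRemoveRow(expired, now));  // true
        System.out.println(shouldRemoveRow(noExpiry, now)); // false
    }
}
```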
> I greatly appreciate any advice you can provide.
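> // NOTE: the setiter command names org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter,
> // but this file declares package com.esa.accumulo.iterators; the class name passed to
> // setiter must match the declared package plus class name for the servers to find it.
> package com.esa.accumulo.iterators;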
> import java.io.IOException;
> import java.text.ParseException;
> import java.text.SimpleDateFormat;
> import java.util.Date;
> import java.util.Map;
> import org.apache.accumulo.core.data.Key;
> import org.apache.accumulo.core.data.Value;
> import org.apache.accumulo.core.iterators.IteratorEnvironment;
> import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
> import org.apache.accumulo.core.iterators.user.RowFilter;
> /**
>  * A filter that removes rows based on the column designated as the
>  * "expiration timestamp" column family.
>  *
>  * It removes the row if the value in the expirationTimestamp column is
>  * less than currentTime.
>  *
>  * TODO: Make the expirationTimestamp ColumnFamily and its DateFormat
>  * settable via the iterator options when the iterator is applied to the
>  * table. (For now they are hardcoded to match the format used in the
>  * Solr-Accumulo plugin.)
>  */
> public class ExpirationTimestampPurgeFilter extends RowFilter {
>   private long currentTime;
>   // TODO: make expTsDateFormat settable via Iterator Options
>   // Date Format for Expiration Timestamp ColumnFamily stored in Accumulo
>   private String expTsDateFormat = "yyyyMMddHHmmssS";
>   private SimpleDateFormat df = new SimpleDateFormat(expTsDateFormat);
>   // TODO: make expTs settable via Iterator Options
>   // ColumnFamily containing Expiration Timestamp value (note ingest app
>   // did NOT assign a ColumnQualifier, only a ColumnFamily)