Accumulo >> mail # user >> Deleting many rows that match a given criterion


Re: Deleting many rows that match a given criterion
Hi Mike,
Did you wind up writing Java code to do this?  Did you go with a RowFilter?

I have a similar situation: I need to delete millions of rows daily,
and the deletion criterion is not in the row key.

Thanks in advance,
Terry

On Wed, Oct 23, 2013 at 4:21 PM, Mike Drob <[EMAIL PROTECTED]> wrote:

> Thanks for the feedback, Aru and Keith.
>
> I've had some more time to play around with this, and here are some
> additional observations.
>
> My existing process is very slow. I think this is because each deletemany
> command starts up a new scanner and BatchWriter, which creates a lot of
> RPC overhead. I didn't initially think that it would be a significant
> amount of data, but maybe I just had the wrong idea of what "significant"
> is in this case.
>
> I'm not sure the RowDeletingIterator would work in this case because I do
> use empty rows for other purposes. The RowFilter at compaction is a great
> option, except I had hoped to avoid writing actual Java code. Looking back
> at this, I might have to bite that bullet.
>
> Again, thanks both for the suggestions!
>
> Mike
>
>
> On Tue, Oct 22, 2013 at 12:04 PM, Keith Turner <[EMAIL PROTECTED]> wrote:
>
>> If it's a significant amount of data, you could create a class that
>> extends RowFilter and set it as a compaction iterator.
>>
>>
>> On Tue, Oct 22, 2013 at 11:45 AM, Mike Drob <[EMAIL PROTECTED]> wrote:
>>
>>> I'm attempting to delete all rows from a table that contain a specific
>>> word in the value of a specified column. My current process looks like:
>>>
>>> accumulo shell -e 'egrep .*EXPRESSION.* -np -t tab -c col' \
>>>   | awk 'BEGIN {print "table tab"}; {print "deletemany -f -np -r " $1}; END {print "exit"}' \
>>>   > rows.out
>>> accumulo shell -f rows.out
>>>
>>> I tried playing around with scan iterators and various options on
>>> deletemany and deleterows but wasn't able to find a more straightforward
>>> way to do this. Does anybody have any suggestions?
>>>
>>> Mike
>>>
>>
>>
>
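
For anyone weighing Keith's RowFilter suggestion, the keep-or-drop decision such a filter makes can be sketched in plain Java. This is only a stand-in under stated assumptions, not Accumulo code: in Accumulo the hook is RowFilter.acceptRow(SortedKeyValueIterator<Key,Value>), which returns true to keep a row, whereas here a row is modeled as a simple column-to-value map. The names RowFilterSketch and compact are hypothetical, and "col" and "EXPRESSION" are the placeholders from Mike's original command.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java stand-in for a compaction-time row filter; no Accumulo classes are used. */
final class RowFilterSketch {

    /**
     * Returns true to KEEP the row and false to drop it, the same convention
     * RowFilter.acceptRow uses. A row is modeled as column name -> value (an assumption).
     */
    static boolean acceptRow(Map<String, String> row, String column, String word) {
        String value = row.get(column);
        // Rows without the column, or whose value lacks the word, are kept.
        return value == null || !value.contains(word);
    }

    /** Models one compaction pass: copies only the rows that acceptRow keeps. */
    static Map<String, Map<String, String>> compact(
            Map<String, Map<String, String>> table, String column, String word) {
        Map<String, Map<String, String>> kept = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> e : table.entrySet()) {
            if (acceptRow(e.getValue(), column, word)) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = new TreeMap<>();
        table.put("row1", Map.of("col", "contains EXPRESSION here"));
        table.put("row2", Map.of("col", "harmless value"));
        table.put("row3", Map.of("other", "EXPRESSION in a different column"));
        // Only row1 has the word in the target column, so only row1 is dropped.
        System.out.println(compact(table, "col", "EXPRESSION").keySet()); // prints [row2, row3]
    }
}
```

The point of doing this at compaction time is that the rows are dropped server-side while the data is rewritten anyway, instead of issuing one deletemany per row from the client. Attaching an actual RowFilter subclass to the table and triggering the compaction is not shown here.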