Accumulo >> mail # user >> Deleting many rows that match a given criterion


Re: Deleting many rows that match a given criterion
Thanks for the feedback, Aru and Keith.

I've had some more time to play around with this, and here are some
additional observations.

My existing process is very slow. I think this is because each deletemany
command starts up a new scanner and batchwriter, which creates a lot of
RPC overhead. I didn't initially think that it would be a significant
amount of data, but maybe I just had the wrong idea of what "significant"
is in this case.

I'm not sure the RowDeletingIterator would work in this case because I do
use empty rows for other purposes. The RowFilter at compaction is a great
option, except I had hoped to avoid writing actual Java code. Looking back
at this, I might have to bite that bullet.
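
For anyone weighing the RowFilter route, the per-row decision itself is
small. Below is a minimal sketch in plain Java, not a real iterator: the
names RowMatchSketch and Cell, and the "col" / "EXPRESSION" values, are
hypothetical stand-ins chosen to mirror the shell example in this thread.
In an actual iterator the same loop would presumably live in a class
extending org.apache.accumulo.core.iterators.user.RowFilter, inside its
acceptRow(SortedKeyValueIterator<Key,Value>) override, walking the row
with hasTop(), getTopKey(), getTopValue() and next(). The sketch also
assumes a plain substring match is equivalent to the .*EXPRESSION.* regex,
which only holds when EXPRESSION is a literal word.

```java
import java.util.Arrays;
import java.util.List;

final class RowMatchSketch {

    /** Hypothetical stand-in for one key/value entry within a single row. */
    static final class Cell {
        final String family;
        final String value;

        Cell(String family, String value) {
            this.family = family;
            this.value = value;
        }
    }

    /**
     * Returns true to keep the row and false to drop it. A row is dropped
     * as soon as any entry in the given column family has a value that
     * contains the given word.
     */
    static boolean acceptRow(List<Cell> row, String family, String word) {
        for (Cell cell : row) {
            if (cell.family.equals(family) && cell.value.contains(word)) {
                return false; // one matching entry condemns the whole row
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Cell> matching = Arrays.asList(
                new Cell("col", "has EXPRESSION inside"),
                new Cell("other", "unrelated"));
        List<Cell> nonMatching = Arrays.asList(
                new Cell("other", "EXPRESSION in a different column"),
                new Cell("col", "plain value"));

        System.out.println("keep matching row?     " + acceptRow(matching, "col", "EXPRESSION"));
        System.out.println("keep non-matching row? " + acceptRow(nonMatching, "col", "EXPRESSION"));
    }
}
```

Because the filter sees every entry of a row before deciding, a match in
one column removes the row's other columns as well, which is the behavior
the deletemany -r loop was approximating one row at a time.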

Again, thanks both for the suggestions!

Mike
On Tue, Oct 22, 2013 at 12:04 PM, Keith Turner <[EMAIL PROTECTED]> wrote:

> If it's a significant amount of data, you could create a class that extends
> RowFilter and set it as a compaction iterator.
>
>
> On Tue, Oct 22, 2013 at 11:45 AM, Mike Drob <[EMAIL PROTECTED]> wrote:
>
>> I'm attempting to delete all rows from a table that contain a specific
>> word in the value of a specified column. My current process looks like:
>>
>> accumulo shell -e 'egrep .*EXPRESSION.* -np -t tab -c col' | awk 'BEGIN
>> {print "table tab"}; {print "deletemany -f -np -r " $1}; END {print "exit"}'
>> > rows.out
>> accumulo shell -f rows.out
>>
>> I tried playing around with scan iterators and various options on
>> deletemany and deleterows but wasn't able to find a more straightforward
>> way to do this. Does anybody have any suggestions?
>>
>> Mike
>>
>
>