Accumulo >> mail # user >> Deleting many rows that match a given criterion

Re: Deleting many rows that match a given criterion
Thanks for the feedback, Aru and Keith.

I've had some more time to play around with this, and here are some
additional observations.

My existing process is very slow. I think this is because each deletemany
command starts up a new Scanner and BatchWriter, which creates a lot of
RPC overhead. I didn't initially think that it would be a significant
amount of data, but maybe I just had the wrong idea of what "significant"
is in this case.

I'm not sure the RowDeletingIterator would work in this case because I do
use empty rows for other purposes. The RowFilter at compaction is a great
option, except I had hoped to avoid writing actual Java code. Looking back
at this, I might have to bite that bullet.
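
For reference, the per-row decision that a RowFilter subclass would need
to make can be sketched in plain Java. This is only a model under stated
assumptions: RowMatchSketch and Cell are hypothetical names, the row is
represented as an in-memory list rather than an Accumulo iterator, and the
match is a simple substring test. In a real iterator the same loop would
sit in RowFilter's acceptRow method, where returning false suppresses the
whole row.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the keep/drop decision; no Accumulo classes are used here.
class RowMatchSketch {

    // Hypothetical stand-in for one entry (column plus value) within a row.
    static final class Cell {
        final String column;
        final String value;

        Cell(String column, String value) {
            this.column = column;
            this.value = value;
        }
    }

    // Mirrors the acceptRow contract: true keeps the row, false drops the whole row.
    static boolean acceptRow(List<Cell> row, String column, String word) {
        for (Cell cell : row) {
            // Only the specified column is inspected; other columns never trigger a drop.
            if (cell.column.equals(column) && cell.value.contains(word)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Cell> hit = Arrays.asList(
                new Cell("other", "x"),
                new Cell("col", "some EXPRESSION here"));
        List<Cell> miss = Arrays.asList(
                new Cell("other", "EXPRESSION"),
                new Cell("col", "clean"));

        System.out.println(acceptRow(hit, "col", "EXPRESSION"));  // prints false (row dropped)
        System.out.println(acceptRow(miss, "col", "EXPRESSION")); // prints true (row kept)
    }
}
```

The second row is kept even though the word appears in it, because the
match is restricted to the "col" column. A row with no entries in that
column is always kept.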

Again, thanks both for the suggestions!

Mike
On Tue, Oct 22, 2013 at 12:04 PM, Keith Turner <[EMAIL PROTECTED]> wrote:

> If it's a significant amount of data, you could create a class that extends
> RowFilter and set it as a compaction iterator.
>
>
> On Tue, Oct 22, 2013 at 11:45 AM, Mike Drob <[EMAIL PROTECTED]> wrote:
>
>> I'm attempting to delete all rows from a table that contain a specific
>> word in the value of a specified column. My current process looks like:
>>
>> accumulo shell -e 'egrep .*EXPRESSION.* -np -t tab -c col' \
>>   | awk 'BEGIN {print "table tab"}; {print "deletemany -f -np -r " $1}; END {print "exit"}' \
>>   > rows.out
>> accumulo shell -f rows.out
>>
>> I tried playing around with scan iterators and various options on
>> deletemany and deleterows but wasn't able to find a more straightforward
>> way to do this. Does anybody have any suggestions?
>>
>> Mike
>>
>
>