Accumulo >> mail # user >> Deleting many rows that match a given criterion


Re: Deleting many rows that match a given criterion
Terry,

Yeah, a RowFilter + full compaction takes care of the issue. Note that
simply setting a RowFilter at scan time and expecting the data to be deleted
naturally might not work if your clients set varying fetch columns on their
scanners.
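
A minimal sketch of why the scan-time approach can miss rows, assuming that
fetch columns are applied before the filter runs, so the filter only sees the
columns the client asked for. This is plain Java with no Accumulo classes;
`Entry`, `compact`, `fetchColumn`, and the sample data are made up for
illustration. In Accumulo itself the per-row decision would go in the
`acceptRow` method of a class extending `RowFilter`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of row-level filtering. Nothing here is Accumulo API;
// all names and data are hypothetical.
final class RowFilterSketch {

    // One key/value entry, reduced to the fields this example needs.
    static final class Entry {
        final String row;
        final String column;
        final String value;

        Entry(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    // Decide on a whole row: reject it if the target column contains the word.
    static boolean acceptRow(List<Entry> rowEntries, String column, String word) {
        for (Entry e : rowEntries) {
            if (e.column.equals(column) && e.value.contains(word)) {
                return false;
            }
        }
        return true;
    }

    // Walk entries that are sorted by row and keep only the accepted rows.
    static List<Entry> compact(List<Entry> sorted, String column, String word) {
        List<Entry> kept = new ArrayList<>();
        int start = 0;
        while (start < sorted.size()) {
            String row = sorted.get(start).row;
            int end = start;
            while (end < sorted.size() && sorted.get(end).row.equals(row)) {
                end++;
            }
            List<Entry> rowEntries = sorted.subList(start, end);
            if (acceptRow(rowEntries, column, word)) {
                kept.addAll(rowEntries);
            }
            start = end;
        }
        return kept;
    }

    // Model a client that fetches one column (assumed to happen before the filter).
    static List<Entry> fetchColumn(List<Entry> sorted, String column) {
        List<Entry> visible = new ArrayList<>();
        for (Entry e : sorted) {
            if (e.column.equals(column)) {
                visible.add(e);
            }
        }
        return visible;
    }

    // Two rows; only r1 has the word in the target column "col".
    static List<Entry> sampleTable() {
        return Arrays.asList(
            new Entry("r1", "col", "has EXPRESSION inside"),
            new Entry("r1", "other", "x"),
            new Entry("r2", "col", "clean"),
            new Entry("r2", "other", "y"));
    }

    public static void main(String[] args) {
        List<Entry> table = sampleTable();
        // The filter sees every column: row r1 is rejected as a whole.
        for (Entry e : compact(table, "col", "EXPRESSION")) {
            System.out.println("full view keeps  " + e.row + " " + e.column);
        }
        // The filter sees only "other": it cannot find the word, so r1 survives.
        for (Entry e : compact(fetchColumn(table, "other"), "col", "EXPRESSION")) {
            System.out.println("fetch view keeps " + e.row + " " + e.column);
        }
    }
}
```

In this model the full view drops r1 entirely, while the restricted view still
returns r1's "other" entry because the filter never sees "col". A filter
applied during a full compaction works on whole rows, which is why that
combination is the dependable one.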

Mike
On Thu, Oct 31, 2013 at 5:11 PM, Terry P. <[EMAIL PROTECTED]> wrote:

> Hi Mike,
> Did you wind up writing Java code to do this? Did you go with a RowFilter?
>
> I have a similar circumstance where I need to delete millions of rows
> daily and the criteria for deletion is not in the rowkey.
>
> Thanks in advance,
> Terry
>
>
>
> On Wed, Oct 23, 2013 at 4:21 PM, Mike Drob <[EMAIL PROTECTED]> wrote:
>
>> Thanks for the feedback, Aru and Keith.
>>
>> I've had some more time to play around with this, and here are some
>> additional observations.
>>
>> My existing process is very slow. I think this is due to each deletemany
>> command starting up a new scanner and batchwriter, and creating a lot of
>> RPC overhead. I didn't initially think that it would be a significant
>> amount of data, but maybe I just had the wrong idea of what "significant"
>> is in this case.
>>
>> I'm not sure the RowDeletingIterator would work in this case because I do
>> use empty rows for other purposes. The RowFilter at compaction is a great
>> option, except I had hoped to avoid writing actual Java code. Looking back
>> at this, I might have to bite that bullet.
>>
>> Again, thanks both for the suggestions!
>>
>> Mike
>>
>>
>> On Tue, Oct 22, 2013 at 12:04 PM, Keith Turner <[EMAIL PROTECTED]> wrote:
>>
>>> If it's a significant amount of data, you could create a class that
>>> extends RowFilter and set it as a compaction iterator.
>>>
>>>
>>> On Tue, Oct 22, 2013 at 11:45 AM, Mike Drob <[EMAIL PROTECTED]> wrote:
>>>
>>>> I'm attempting to delete all rows from a table that contain a specific
>>>> word in the value of a specified column. My current process looks like:
>>>>
>>>> accumulo shell -e 'egrep .*EXPRESSION.* -np -t tab -c col' | awk 'BEGIN
>>>> {print "table tab"}; {print "deletemany -f -np -r " $1}; END {print "exit"}'
>>>> > rows.out
>>>> accumulo shell -f rows.out
>>>>
>>>> I tried playing around with scan iterators and various options on
>>>> deletemany and deleterows but wasn't able to find a more straightforward
>>>> way to do this. Does anybody have any suggestions?
>>>>
>>>> Mike
>>>>
>>>
>>>
>>
>