On Wed, Oct 9, 2013 at 4:21 PM, Eric Newton <[EMAIL PROTECTED]> wrote:
> They do different things.
>
> Deleting mutations marks each entry with a delete marker. Using the
> iterator marks a whole row with a single mutation.
>
> If you have a million entries in your row, the iterator is faster for
> the delete, but requires a seek to the start of the row for every
> read, so reads are slower.
>
> If your row has one entry, they are the same thing.
>
> Somewhere under N keys... the mutation path will be quite fast, and
> still preserve your reading speed. I'll just pull a number out of
> thin air... let's say a few thousand.
>
The iterator may still be useful even if rows have few columns, because a row can be deleted w/o reading it. W/ m.putDelete() you may need to read the row and insert a delete for each column. If you know which columns to delete, you can avoid the read.
Suppose I have 10M rows to delete, each with 10 unpredictable columns. With the iterator I can batch write 10M row deletion mutations. Without the iterator I do 10M seeks and 100M reads, then write 100M deletes.

> -Eric
>
> On Wed, Oct 9, 2013 at 4:01 PM, David Medinets <[EMAIL PROTECTED]>
> wrote:
> > Is there any reason to favor one approach over the other?
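
The 10M-row comparison above can be tallied in a short sketch. This is only back-of-the-envelope arithmetic in plain Java, not Accumulo API code: the class DeleteCost and its method names are made up for illustration, and it assumes one seek per row, one read per entry, and one delete per entry on the putDelete path, as in the example.

```java
// Hypothetical cost model for the two bulk-delete strategies discussed in
// this thread. It only counts operations; it does not talk to Accumulo.
public final class DeleteCost {

    /** Operation counts for one strategy. */
    public static final class Cost {
        public final long seeks;
        public final long reads;
        public final long writes;

        Cost(long seeks, long reads, long writes) {
            this.seeks = seeks;
            this.reads = reads;
            this.writes = writes;
        }

        @Override
        public String toString() {
            return "seeks=" + seeks + " reads=" + reads + " writes=" + writes;
        }
    }

    // Iterator path: one row-level delete mutation per row, and no need to
    // read the row first.
    public static Cost withIterator(long rows) {
        return new Cost(0L, 0L, rows);
    }

    // putDelete path when the columns are not known in advance: seek to each
    // row, read every entry, then write one delete per entry.
    public static Cost withPutDelete(long rows, long columnsPerRow) {
        long entries = Math.multiplyExact(rows, columnsPerRow);
        return new Cost(rows, entries, entries);
    }

    public static void main(String[] args) {
        long rows = 10_000_000L; // 10M rows, from the example above
        long columnsPerRow = 10L; // 10 unpredictable columns per row
        System.out.println("iterator:  " + withIterator(rows));
        System.out.println("putDelete: " + withPutDelete(rows, columnsPerRow));
    }
}
```

This counts only the work done at delete time. As Eric notes, the iterator shifts cost to reads, which need a seek to the start of the row, so the tally does not settle the choice for read-heavy tables.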
Keith Turner 2013-10-09, 22:28