Accumulo >> mail # user >> m.putDelete versus RowDeletingIterator?

David Medinets 2013-10-09, 20:01
Eric Newton 2013-10-09, 20:21
Re: m.putDelete versus RowDeletingIterator?
On Wed, Oct 9, 2013 at 4:21 PM, Eric Newton <[EMAIL PROTECTED]> wrote:

> They do different things.
> Deleting mutations marks each entry with a delete marker.  Using the
> iterator marks a whole row with a single mutation.
> If you have a million entries in your row, the iterator is faster for
> the delete, but requires a seek to the start of the row for every
> read, so reads are slower.
> If your row has one entry, they are the same thing.
> Somewhere under N keys... the mutation path will be quite fast, and
> still preserve your reading speed.  I'll just pull a number out of
> thin air... let's say a few thousand.

The iterator may still be useful even if rows have few columns, because a
row can be deleted without reading it. With m.putDelete() you may need to
read the row and insert a delete for each column. If you already know which
columns to delete, you can avoid the read.
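
To make the difference concrete, here is a minimal in-memory sketch in plain Java. It does not use the Accumulo API: the RowDeleteModel class, the TreeMap standing in for a table, and the DELETE_ROW marker string are all hypothetical, and timestamps and visibilities are ignored. It only illustrates that the per-column path has to scan the row to discover its columns, while the row-marker path writes one entry blindly and leaves the filtering to read time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, not Accumulo code: keys are "row\0column", kept sorted.
class RowDeleteModel {
    static final String DELETE_ROW = "DEL_ROW"; // stand-in for a row delete marker value
    final TreeMap<String, String> table = new TreeMap<>();
    int reads = 0;  // entries read while deleting
    int writes = 0; // entries written while deleting

    void put(String row, String col, String value) {
        table.put(row + "\0" + col, value);
    }

    // Per-column path: scan the row to learn its columns, then delete each one.
    void deletePerColumn(String row) {
        List<String> keys = new ArrayList<>(
            table.subMap(row + "\0", true, row + "\1", false).keySet());
        reads += keys.size();
        for (String k : keys) {
            table.remove(k); // models writing one delete per column
            writes++;
        }
    }

    // Row-marker path: one blind write; the empty column sorts first in the row.
    void deleteWithMarker(String row) {
        put(row, "", DELETE_ROW);
        writes++;
    }

    // Read path with the filter applied: look at the start of the row for a marker.
    List<String> readRow(String row) {
        List<String> out = new ArrayList<>();
        if (DELETE_ROW.equals(table.get(row + "\0"))) {
            return out; // the whole row is suppressed
        }
        for (Map.Entry<String, String> e :
                table.subMap(row + "\0", true, row + "\1", false).entrySet()) {
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        RowDeleteModel m = new RowDeleteModel();
        m.put("r1", "a", "1");
        m.put("r1", "b", "2");
        m.put("r2", "a", "3");
        m.deleteWithMarker("r1"); // no read needed
        m.deletePerColumn("r2");  // must scan r2 first
        System.out.println(m.readRow("r1").isEmpty() + " " + m.readRow("r2").isEmpty());
        System.out.println("reads=" + m.reads + " writes=" + m.writes);
    }
}
```

In this toy model the marker lookup in readRow plays the role of the extra seek to the start of the row that Eric mentions: the delete gets cheaper and every read pays a little for it.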

Suppose I have 10M rows to delete, each with 10 unpredictable columns.
With the iterator I can batch write 10M row deletion mutations. Without
the iterator I do 10M seeks and 100M reads, and write 100M deletes.
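
The arithmetic behind those numbers, as a small Java sketch. The DeleteCost class and its method names are made up for illustration; it only counts operations at delete time and says nothing about real latency or about the read-side cost of the iterator.

```java
import java.util.Arrays;

// Hypothetical cost model for the example above; it counts operations only.
class DeleteCost {
    // Operations when each row is deleted column by column: {seeks, reads, writes}.
    static long[] perColumn(long rows, long colsPerRow) {
        long seeks = rows;               // one seek to locate each row
        long reads = rows * colsPerRow;  // read every column to learn its name
        long writes = rows * colsPerRow; // one delete per column
        return new long[] {seeks, reads, writes};
    }

    // Operations when one row delete marker is written per row: {seeks, reads, writes}.
    static long[] rowMarker(long rows) {
        return new long[] {0L, 0L, rows}; // no seeks or reads at delete time
    }

    public static void main(String[] args) {
        long rows = 10_000_000L;
        long cols = 10L;
        System.out.println(Arrays.toString(perColumn(rows, cols))); // [10000000, 100000000, 100000000]
        System.out.println(Arrays.toString(rowMarker(rows)));       // [0, 0, 10000000]
    }
}
```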
> -Eric
> On Wed, Oct 9, 2013 at 4:01 PM, David Medinets <[EMAIL PROTECTED]>
> wrote:
> > Is there any reason to favor one approach over the other?