Accumulo >> mail # user >> Deleting rows from the Java API

Re: Deleting rows from the Java API
I would also add that a "small number of entries" in this case is probably
measured in the millions or tens of millions. If you need to delete more
entries than that, you might start looking into the iterator method.
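The "iterator method" is the approach in Billie's quoted reply: attach a majc-scope Filter, compact the table, then remove the filter. The sketch below is a toy in-memory model in plain Java, not Accumulo code; the class `MajcFilterModel`, its "row:family" string keys, and its methods are hypothetical. It only illustrates why the steps come in that order. A majc-only filter does not hide anything from scans. The compaction rewrites the data without the rejected entries, so no delete markers are written. The filter must then be removed, or later compactions would keep dropping matching data written afterwards. In the real API these steps correspond roughly to a subclass of `org.apache.accumulo.core.iterators.Filter` overriding `accept(Key, Value)`, `TableOperations.attachIterator` with `EnumSet.of(IteratorScope.majc)`, `TableOperations.compact`, and `TableOperations.removeIterator`. Check the javadocs for your Accumulo version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Predicate;

/**
 * Toy in-memory model (NOT Accumulo code) of the "iterator method":
 * attach a filter at major-compaction (majc) scope, compact, remove it.
 */
public class MajcFilterModel {

    // Hypothetical stored entries, "row:family" -> value, kept sorted.
    final TreeMap<String, String> data = new TreeMap<>();
    // Stand-in for a Filter attached only at majc scope (null = none attached).
    Predicate<String> majcFilter = null;

    void put(String row, String family, String value) {
        data.put(row + ":" + family, value);
    }

    /** Scans do not apply a majc-only filter, so nothing is hidden yet. */
    List<String> scan() {
        return new ArrayList<>(data.keySet());
    }

    /** Rewrites the data; entries the filter rejects are simply not kept. */
    void compact() {
        if (majcFilter != null) {
            data.keySet().removeIf(majcFilter.negate());
        }
    }

    public static void main(String[] args) {
        MajcFilterModel t = new MajcFilterModel();
        t.put("r1", "keep", "a");
        t.put("r1", "drop", "b");
        t.put("r2", "drop", "c");

        t.majcFilter = key -> !key.endsWith(":drop"); // 1. attach the filter
        System.out.println(t.scan().size());          // 3: scans are unaffected
        t.compact();                                   // 2. compact the table
        t.majcFilter = null;                           // 3. remove the filter
        t.put("r3", "drop", "d");                      // written after removal
        t.compact();                                   // later compactions keep it
        System.out.println(t.scan());                  // [r1:keep, r3:drop]
    }
}
```

Leaving the filter attached would make step 3's new entry disappear at the next compaction, which is why the filter is removed once the cleanup compaction has run.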

On Wed, May 9, 2012 at 11:01 AM, Billie J Rinaldi <[EMAIL PROTECTED]> wrote:

> On Wednesday, May 9, 2012 10:31:46 AM, "Sean Pines" <[EMAIL PROTECTED]>
> wrote:
> > I have a use case that involves removing a record from Accumulo
> > based on its row ID and column family.
> >
> > In the shell, I noticed the command "deletemany", which allows you to
> > specify a column family/column qualifier. Is there an equivalent of
> > this in the Java API?
> >
> > In the Java API, I noticed the method:
> > deleteRows(String tableName, org.apache.hadoop.io.Text start,
> > org.apache.hadoop.io.Text end)
> > Delete rows between (start, end]
> >
> > However, that only seems to work for deleting a range of row IDs.
> >
> > I would also imagine that deleting rows is costly; is there a better
> > way to approach something like this?
> > The workaround I have for now is to overwrite the row with an
> > empty string in the value field and ignore any entries that have an
> > empty value. However, this leaves lingering rows for each "delete",
> > and I'd like to avoid that if at all possible.
> >
> > Thanks!
> Connector provides a createBatchDeleter method. You can set the range and
> columns for a BatchDeleter just as you would for a Scanner. This is not
> an efficient operation (despite what the current javadocs for BatchDeleter
> say), but it works well if you're deleting a small number of entries: it
> scans for the affected key/value pairs, pulls them back to the client,
> then inserts a deletion entry for each one. The deleteRows method, on the
> other hand, is efficient because large ranges can simply be dropped. If
> you want to delete a lot of entries and deleteRows won't work for you,
> consider attaching a majc (major compaction) scope Filter that filters out
> what you don't want, compacting the table, and then removing the filter.
> Billie
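The cost difference between BatchDeleter and deleteRows described in the thread can be illustrated with a toy in-memory model in plain Java. The class `DeleteStrategiesModel` and its methods are hypothetical and this is not Accumulo code. `deleteRows` drops every row in the half-open range (start, end] as one sorted-range operation. The BatchDeleter-style `batchDelete` must find each matching entry and record one delete marker per entry, so its work grows with the number of entries. Restricting it to a row range plus one column family mirrors Sean's "row ID and column family" use case. In the real API that path would be `Connector.createBatchDeleter` followed by `setRanges`, `fetchColumnFamily`, and `delete()`. The exact `createBatchDeleter` parameters differ between Accumulo versions. Markers in this model are never purged, whereas Accumulo eventually discards deleted data during compactions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/**
 * Toy in-memory model (NOT Accumulo code) contrasting two deletion styles:
 *  - deleteRows: drop a sorted range of rows (start, end] in one step;
 *  - BatchDeleter: find every matching entry, then write one marker each.
 */
public class DeleteStrategiesModel {

    // row -> family -> qualifier -> value, sorted like Accumulo keys.
    final TreeMap<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();
    // Delete markers as [row, family, qualifier]; they hide entries from scans.
    final Set<List<String>> deleteMarkers = new HashSet<>();

    void put(String row, String fam, String qual, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(fam, f -> new TreeMap<>())
            .put(qual, value);
    }

    /** Visible entries, each as [row, family, qualifier], in sorted order. */
    List<List<String>> scan() {
        List<List<String>> out = new ArrayList<>();
        rows.forEach((row, fams) -> fams.forEach((fam, quals) -> quals.keySet().forEach(qual -> {
            List<String> k = List.of(row, fam, qual);
            if (!deleteMarkers.contains(k)) out.add(k);
        })));
        return out;
    }

    /** Range drop of rows r with start < r <= end; no per-entry work. */
    void deleteRows(String start, String end) {
        rows.subMap(start, false, end, true).clear();
    }

    /**
     * BatchDeleter-style delete over rows [firstRow, lastRow] in one family:
     * collect every matching entry, then add one marker per entry.
     * Returns the number of markers written.
     */
    int batchDelete(String firstRow, String lastRow, String fam) {
        List<List<String>> matches = new ArrayList<>();
        for (var r : rows.subMap(firstRow, true, lastRow, true).entrySet()) {
            TreeMap<String, String> quals = r.getValue().get(fam);
            if (quals == null) continue;
            for (String qual : quals.keySet()) {
                List<String> k = List.of(r.getKey(), fam, qual);
                if (!deleteMarkers.contains(k)) matches.add(k);
            }
        }
        deleteMarkers.addAll(matches);
        return matches.size();
    }

    public static void main(String[] args) {
        DeleteStrategiesModel t = new DeleteStrategiesModel();
        t.put("r1", "cf", "q1", "v");
        t.put("r1", "cf", "q2", "v");
        t.put("r1", "other", "q1", "v");
        t.put("r2", "cf", "q1", "v");
        t.put("r3", "cf", "q1", "v");

        System.out.println(t.batchDelete("r1", "r1", "cf")); // 2
        t.deleteRows("r1", "r2"); // start is exclusive, so only r2 is dropped
        System.out.println(t.scan()); // [[r1, other, q1], [r3, cf, q1]]
    }
}
```

Running `main` prints 2 (one marker per matching entry in row r1, family cf), then `[[r1, other, q1], [r3, cf, q1]]`. Row r1's other family survives because its entries were never marked, and r1 itself is untouched by `deleteRows("r1", "r2")` because the start bound is exclusive.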