HBase, mail # user - delete rows from hbase

Re: delete rows from hbase
Michael Segel 2012-06-20, 14:10


Just a couple of nits...

1) Please don't write your Mapper and Reducer classes as inner classes.
I don't know who started this... maybe it's easier as example code, but it really makes M/R code harder to learn. (Also harder to teach, but that's another story... ;-)

2) Looking at your code I saw this...
> public static class MyMapper extends
> TableMapper<ImmutableBytesWritable, Delete> {
> context.write(row, new Delete(row.get()));

Ok... while this code works, I have to ask why?

Wouldn't it be simpler to do the following? [Note: this code is an example, written from memory...]

Add a class variable HTable delTab...

Inside MyMapper add the following:

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    delTab = new HTable(context.getConfiguration(), "DELETE TABLE NAME GOES HERE");
}

Then in your TableMapper.map()

> @Override
>        public void map(ImmutableBytesWritable row, Result value, Context
> context) throws IOException, InterruptedException {
>            context.getCounter("amobee",
> "DeleteRowByCriteria.RowCounter").increment(1);
>            delTab.delete(new Delete(row.get()));  <=== This line changed to use the reference to the table where we want to delete rows.
>        }

Not much difference, except that you're not using the context.
You can test the solution.

It's a bit more general, because you could be selecting rows from one table and using that data to delete from another.

In terms of speed, it's relative.

If you want to batch the rows, you could. Then you'd want to keep a local counter and, every 100 rows, pass in a batch delete.

While I suspect there isn't much difference between using context.write() and just issuing an HTable.delete(), the latter is more generic: you can use the same code to delete from a single table or from different tables.
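The batching idea above is just buffer-and-flush. A minimal sketch of that logic, with the HBase call stubbed out (BATCH_SIZE and the flush counter are illustrative placeholders; in the real mapper, flush() would call delTab.delete(batch) on a List<Delete>):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {
    static final int BATCH_SIZE = 100;
    static final List<String> buffer = new ArrayList<>();
    static int flushes = 0;

    // Stand-in for delTab.delete(batch); in the mapper this would be the HBase call.
    static void flush() {
        if (!buffer.isEmpty()) {
            flushes++;
            buffer.clear();
        }
    }

    // Called once per mapped row: buffer the row key, flush every BATCH_SIZE rows.
    static void onRow(String rowKey) {
        buffer.add(rowKey);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 250; i++) {
            onRow("row-" + i);
        }
        flush(); // the final flush catches the remainder
        System.out.println("flushes=" + flushes + " remaining=" + buffer.size());
    }
}
```

In the real mapper, that final flush (and a delTab.close()) belongs in cleanup(), so the last partial batch isn't dropped when the task ends.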


On Jun 20, 2012, at 6:56 AM, Oleg Ruchovets wrote:

> Well, I changed my previous solution a bit. It works, but it is very slow!
> I think it is because I pass a SINGLE Delete object and not a LIST of Deletes.
> Is it possible to pass a List of Deletes through map() instead of a single Delete?
> import org.apache.commons.cli.*;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.filter.Filter;
> import org.apache.hadoop.hbase.filter.PrefixFilter;
> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> import org.apache.hadoop.hbase.mapreduce.TableMapper;
> import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.mapreduce.Job;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> import java.io.IOException;
> public class DeleteRowByCriteria {
>    final static Logger LOG = LoggerFactory.getLogger(DeleteRowByCriteria.class);
>    public static class MyMapper extends
> TableMapper<ImmutableBytesWritable, Delete> {
>        @Override
>        public void map(ImmutableBytesWritable row, Result value, Context
> context) throws IOException, InterruptedException {
>            context.getCounter("amobee",
> "DeleteRowByCriteria.RowCounter").increment(1);
>            context.write(row, new Delete(row.get()));
>        }
>    }
>    public static void main(String[] args) throws ClassNotFoundException,
> IOException, InterruptedException {
>        Configuration config = HBaseConfiguration.create();
>        config.setBoolean("mapred.map.tasks.speculative.execution" , false);
>        Job job = new Job(config, "DeleteRowByCriteria");
>        job.setJarByClass(DeleteRowByCriteria.class);
>        Options options = getOptions();
>        try {
>            AggregationContext aggregationContext = getAggregationContext(args, options);