Oleg Ruchovets 2012-06-18, 22:08
Hi, I need to delete rows from an HBase table by criteria. For example, I need to delete all rows whose keys start with "12345". I didn't find a way to set a row prefix for a delete operation. What is the best practice for deleting rows by criteria from an HBase table?
Thanks in advance. Oleg.
Re: delete rows from hbase
Jean-Daniel Cryans 2012-06-18, 22:18
In order to delete a row in HBase you need to know that it exists, so the way I'd go about this is to run an MR job that scans the table and emits a Delete for each row that matches the filter.
Hope this helps,
J-D
Re: delete rows from hbase
Oleg Ruchovets 2012-06-18, 23:13
OK, I see. Is it possible to do it in a single map/reduce job? The map phase would scan the required rows using a filter, and the reduce phase would take each row and delete it from the table. My question is whether it is possible to execute the deletes from reducers rather than from a single client. Also, can I use the same table as source and sink in the same job?
Re: delete rows from hbase
Jean-Daniel Cryans 2012-06-18, 23:18
On Mon, Jun 18, 2012 at 4:13 PM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:
> Ok, I see. Is it possible to do it using one map/reduce job? Map phase
> will scan required rows using filter. Reduce phase use this row and delete
> it from the table.
> My question is it possible to execute delete using
> reducers and not executing it from a single client.

Going through a reducer would be slower; I would emit from the map directly.

> Also can I use the
> same table as source and sink in the same job?
That's the whole point :)
J-D
Re: delete rows from hbase
Amitanand Aiyer 2012-06-18, 23:36
You could set up a scan with the criteria you want (start row, end row, KeyOnlyFilter, etc.) and issue a delete for each of the rows you get.
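The scan-then-delete pattern described above can be sketched without a cluster. This is only a model, not HBase code: a `TreeMap` stands in for the table because, like HBase, it keeps keys sorted, and the start row is inclusive while the stop row is exclusive. The class name `ScanThenDeleteModel` and the method `deleteRange` are made up for illustration, and `String` ordering is assumed to agree with byte ordering, which holds for plain ASCII keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model of "scan a key range, then delete what the scan returned".
final class ScanThenDeleteModel {

    /**
     * Deletes every key in [startRow, stopRow) and returns how many keys were removed.
     * The keys are collected first (the "scan") and removed afterwards (the "deletes"),
     * mirroring the two steps a client has to perform.
     */
    static int deleteRange(TreeMap<String, String> table, String startRow, String stopRow) {
        // Copy the matching keys so that removing them does not disturb the range view.
        List<String> matched =
                new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
        for (String rowKey : matched) {
            table.remove(rowKey);
        }
        return matched.size();
    }
}
```

With keys "12344", "12345", "12345a", "12346" and "2", calling `deleteRange(table, "12345", "12346")` removes only the two keys that start with "12345".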
Re: delete rows from hbase
shashwat shriparv 2012-06-19, 07:43
Try to implement something like this:
Class RegexStringComparator
--
∞ Shashwat Shriparv
Re: delete rows from hbase
Mohammad Tariq 2012-06-19, 10:46
You can use the HBase RowFilter to do that.
Regards,
Mohammad Tariq
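Both suggestions above (a RowFilter, possibly with a RegexStringComparator) come down to testing each row key against a pattern. The sketch below shows only that matching logic with `java.util.regex`; it does not call HBase. It assumes row keys are printable strings, and the class and method names are hypothetical. The prefix is anchored with `^` and quoted so that characters such as `.` are taken literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper: decides whether a row key starts with a given literal prefix.
final class RowKeyPrefixMatch {

    /** Builds a pattern that matches only at the start of the key and treats the prefix literally. */
    static Pattern prefixPattern(String prefix) {
        return Pattern.compile("^" + Pattern.quote(prefix));
    }

    /** Regex-based test: true if the key begins with the prefix. */
    static boolean regexMatches(String rowKey, String prefix) {
        return prefixPattern(prefix).matcher(rowKey).find();
    }

    /** Plain string test, which gives the same answer for a literal prefix. */
    static boolean plainMatches(String rowKey, String prefix) {
        return rowKey.startsWith(prefix);
    }
}
```

For a fixed prefix such as "12345" the regex buys nothing over a plain prefix test. As far as I know a row filter is evaluated against every row the scan visits, so it is usually worth restricting the scan with a start and stop row as well.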
Re: delete rows from hbase
Kevin O'dell 2012-06-19, 13:26
Oleg,
Here is some code that we used for deleting all rows with user name foo. It should be fairly portable to your situation:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDelete {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable t = new HTable(conf, "t");

    String user = "foo";

    byte[] startRow = Bytes.toBytes(user);
    byte[] stopRow = Bytes.toBytes(user);
    stopRow[stopRow.length - 1]++; // "fop": the stop row is exclusive
    Scan scan = new Scan(startRow, stopRow);
    ResultScanner sc = t.getScanner(scan);
    try {
      // Issue one Delete for every row the scan returns.
      for (Result r : sc) {
        t.delete(new Delete(r.getRow()));
      }
    } finally {
      sc.close();
      t.close();
    }
  }
}
/**
 * Start row: foo
 *   HBase begins matching at this key, byte by byte.
 * End row: foo
 *   HBase stops matching at the first match, because start == stop.
 * End row: fo[p] (p being 'o' + 1)
 *   HBase stops matching at the first key that does not start with "foo".
 */
-- Kevin O'Dell Customer Operations Engineer, Cloudera
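One edge case in the stop-row trick above: incrementing the last byte overflows if that byte is 0xFF. A minimal sketch of a more defensive computation follows, using only the JDK. The class name `PrefixStopRow` and the method `stopRowForPrefix` are hypothetical. It drops trailing 0xFF bytes before incrementing and returns an empty array when the prefix consists only of 0xFF bytes, on the assumption that an empty stop row is passed to the scan to mean "scan to the end of the table".

```java
import java.util.Arrays;

// Hypothetical helper: computes an exclusive stop row for a prefix scan.
final class PrefixStopRow {

    /**
     * Returns the smallest key that sorts after every key starting with the prefix.
     * Trailing 0xFF bytes cannot be incremented, so they are dropped first.
     * An empty result means there is no upper bound.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--; // skip bytes that would overflow when incremented
        }
        if (end == 0) {
            return new byte[0]; // prefix was empty or all 0xFF: no upper bound
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // e.g. "foo" becomes "fop"
        return stop;
    }
}
```

For the example in this thread, the bytes of "foo" yield the bytes of "fop", the same result as the one-line increment.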
Re: delete rows from hbase
Oleg Ruchovets 2012-06-19, 16:17
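Thank you all for the answers. I am trying to speed up my solution by using map/reduce over HBase.

Here is the code: I want to emit a Delete from the map function to delete the row, and I pass the same tableName to TableMapReduceUtil.initTableMapperJob and TableMapReduceUtil.initTableReducerJob.

Question: is it possible to pass a Delete as I did in the map function?

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DeleteRowByCriteria {
    final static Logger LOG = LoggerFactory.getLogger(DeleteRowByCriteria.class);

    public static class MyMapper extends TableMapper<ImmutableBytesWritable, Delete> {

        public String account;
        public String lifeDate;

        @Override
        public void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            // Emit one Delete per row returned by the filtered scan.
            context.write(row, new Delete(row.get()));
        }
    }

    public static void main(String[] args)
            throws ClassNotFoundException, IOException, InterruptedException {

        String tableName = args[0];
        String filterCriteria = args[1];

        Configuration config = HBaseConfiguration.create();
        Job job = new Job(config, "DeleteRowByCriteria");
        job.setJarByClass(DeleteRowByCriteria.class);

        try {
            Filter campaignIdFilter = new PrefixFilter(Bytes.toBytes(filterCriteria));
            Scan scan = new Scan();
            scan.setFilter(campaignIdFilter);
            scan.setCaching(500);
            scan.setCacheBlocks(false);

            // The same table is used as both source and sink.
            TableMapReduceUtil.initTableMapperJob(
                    tableName, scan, MyMapper.class, null, null, job);
            TableMapReduceUtil.initTableReducerJob(
                    tableName, null, job);
            job.setNumReduceTasks(0); // map-only job

            boolean b = job.waitForCompletion(true);
            if (!b) {
                throw new IOException("error with job!");
            }
        } catch (Exception e) {
            LOG.error(e.getMessage(), e);
        }
    }
}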
RE: delete rows from hbase
Anoop Sam John 2012-06-20, 08:38
Hi,
Has anyone tried an Endpoint implementation with which the delete can be done directly with the scan on the server side? In the samples posted so far I can see:
Client -> Server - Scan for certain rows (we want the row keys satisfying our criteria)
Client <- Server - Returns the Results
Client -> Server - Delete calls

Instead, using Endpoints we can make one call from client to server in which both the scan and the delete happen...
-Anoop-
Re: delete rows from hbase
Michael Segel 2012-06-20, 11:41
Hi,
The simple way to do this as a map/reduce is the following....
Use the HTable input and scan the records you want to delete.
Inside Mapper.setup(), create a connection to the HTable where you want to delete the records.
Inside Mapper.map(), for each iteration you will get a row which matched the scan that you set up in your ToolRunner. If the record matches the criteria that you want to delete, you just issue a delete command passing in that row key.
And voila! You are done.
No muss, no fuss, and no reducer.
It's that easy.
There is no output that you return to your client job except if you maybe want to keep count of the records that you deleted and that's an easy thing to do using dynamic counters.
HTH -Mike
Re: delete rows from hbase
Oleg Ruchovets 2012-06-20, 11:56
Well, I changed my previous solution a bit. It works, but it is very slow!

I think it is because I pass a SINGLE Delete object and not a LIST of Deletes.

Is it possible to pass a List of Deletes through map instead of a single Delete?

import org.apache.commons.cli.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;

public class DeleteRowByCriteria {
    final static Logger LOG = LoggerFactory.getLogger(DeleteRowByCriteria.class);

    public static class MyMapper extends TableMapper<ImmutableBytesWritable, Delete> {

        @Override
        public void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            // Count the row, then emit a Delete for it.
            context.getCounter("amobee", "DeleteRowByCriteria.RowCounter").increment(1);
            context.write(row, new Delete(row.get()));
        }
    }

    public static void main(String[] args)
            throws ClassNotFoundException, IOException, InterruptedException {

        Configuration config = HBaseConfiguration.create();
        config.setBoolean("mapred.map.tasks.speculative.execution", false);
        Job job = new Job(config, "DeleteRowByCriteria");
        job.setJarByClass(DeleteRowByCriteria.class);

        // getOptions() and getAggregationContext() are helpers that are not shown here.
        Options options = getOptions();
        try {
            AggregationContext aggregationContext = getAggregationContext(args, options);
            Filter campaignIdFilter =
                    new PrefixFilter(Bytes.toBytes(aggregationContext.getCampaignId()));
            Scan scan = new Scan();
            scan.setFilter(campaignIdFilter);
            scan.setCaching(20000);
            scan.setCacheBlocks(false);

            TableMapReduceUtil.initTableMapperJob(
                    aggregationContext.getCmltTableName(), scan, MyMapper.class, null, null, job);

            // Write the emitted Deletes back to the same table.
            job.setOutputFormatClass(TableOutputFormat.class);
            job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE,
                    aggregationContext.getCmltTableName());

            job.setNumReduceTasks(0); // map-only job

            boolean b = job.waitForCompletion(true);
            if (!b) {
                throw new IOException("error with job!");
            }
        } catch (Exception e) {
            LOG.error(e.getMessage(), e);
        }
    }
}
Re: delete rows from hbase
Michael Segel 2012-06-20, 14:10
Hi,
Ok...
Just a couple of nits...
1) Please don't write your Mapper and Reducer classes as inner classes. I don't know who started this... maybe it's easier as example code. But it really makes it harder to learn M/R code. (Also harder to teach, but that's another story... ;-)

2) Looking at your code I saw this...
> public static class MyMapper extends TableMapper<ImmutableBytesWritable, Delete> {
and
> context.write(row, new Delete(row.get()));
Ok... while this code works, I have to ask why?
Wouldn't it be simpler to do the following.... [Note this code is an example... written from memory...]
Add a class variable HTable delTab...
Inside MyMapper add the following:
@Override
protected void setup(Context context) throws IOException {
    delTab = new HTable(context.getConfiguration(), "DELETE TABLE NAME GOES HERE");
}

Then in your TableMapper.map()

> @Override
> public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
>     context.getCounter("amobee", "DeleteRowByCriteria.RowCounter").increment(1);
>     delTab.delete(new Delete(row.get()));   <=== This line changed to use the reference to the table where we want to delete rows.
> }
Not much difference except that you're not using the context. You can test the solution.
It's a bit more general because you could be selecting rows from one table and using that data to delete from another.

In terms of speed, it's relative.
If you want to batch the rows, you could. Then you'd want to put in a local counter and every 100 rows pass in a batch delete.
While I suspect there isn't much difference in using the Context.write and just issuing a HTable.delete(), it makes it more generic such that you can use the same code to delete from a single table or different tables. HTH
-Mike
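The batching idea above (collect rows and pass in a batch delete every 100 rows) can be sketched independently of HBase. This is a minimal sketch under stated assumptions, not the author's code: `DeleteBatcher` and its methods are hypothetical names, and the `sink` callback stands in for whatever actually performs the batch delete. In a mapper it would presumably wrap a call such as `HTable.delete(List<Delete>)`, with `add` called from `map()` and a final `flush` from `cleanup()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: buffers row keys and hands them to a sink in fixed-size batches.
final class DeleteBatcher {
    private final int batchSize;
    private final Consumer<List<byte[]>> sink;
    private final List<byte[]> pending = new ArrayList<>();

    DeleteBatcher(int batchSize, Consumer<List<byte[]>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one row key and flushes automatically once the batch is full. */
    void add(byte[] rowKey) {
        pending.add(rowKey);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered to the sink; call once more at the end for the remainder. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // hand over a copy so the buffer can be reused
        pending.clear();
    }
}
```

The final `flush` matters: without it, up to `batchSize - 1` rows at the end of each map task would never be deleted.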