Leif Wickland
2011-06-17, 20:43
Joey Echeverria
2011-06-17, 20:44
Stack
2011-06-17, 20:57
Ryan Rawson
2011-06-17, 21:33
Leif Wickland
2011-06-17, 23:12
Jean-Daniel Cryans
2011-06-20, 17:39
Leif Wickland
2011-06-21, 18:35
-
Is there a reason mapreduce.TableOutputFormat doesn't support Increment? Leif Wickland 2011-06-17, 20:43
I tried to use TableMapper and TableOutputFormat from org.apache.hadoop.hbase.mapreduce to write a map-reduce job which incremented some columns. I noticed that TableOutputFormat.write() doesn't support Increment, only Put and Delete.

Is there a reason that TableOutputFormat shouldn't support Increment?

I think adding support for Increment would only require adding a copy constructor to Increment and a few lines to TableOutputFormat. I'd be willing to give writing the patch a try if there's no objection.

Leif Wickland
-
Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment? Joey Echeverria 2011-06-17, 20:44
+1
-
Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment? Stack 2011-06-17, 20:57
Go for it!
St.Ack
-
Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment? Ryan Rawson 2011-06-17, 21:33
Watch out: increment is not idempotent, so you will have to somehow ensure that each map task runs exactly once, never more and never less. Job failures will ruin the data as well.

-ryan
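
To make the idempotency point concrete, here is a minimal sketch that replays the same task output more than once, as a retried or speculatively executed task might. It uses a plain in-memory map as a stand-in for a table; RetryDemo and its methods are hypothetical names for illustration and are not part of the HBase client API.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory stand-in for one table row; NOT the HBase client API. */
class RetryDemo {
    // Hypothetical row contents: qualifier -> counter value.
    private final Map<String, Long> cells = new HashMap<>();

    /** Increment-style write: adds to whatever is already stored. */
    void increment(String qualifier, long amount) {
        cells.merge(qualifier, amount, Long::sum);
    }

    /** Put-style write: replaces whatever is already stored. */
    void put(String qualifier, long value) {
        cells.put(qualifier, value);
    }

    long get(String qualifier) {
        return cells.getOrDefault(qualifier, 0L);
    }

    /** Applies the same increment 'attempts' times and returns the stored value. */
    static long replayIncrement(int attempts, long amount) {
        RetryDemo table = new RetryDemo();
        for (int i = 0; i < attempts; i++) {
            table.increment("hits", amount);
        }
        return table.get("hits");
    }

    /** Applies the same put 'attempts' times and returns the stored value. */
    static long replayPut(int attempts, long value) {
        RetryDemo table = new RetryDemo();
        for (int i = 0; i < attempts; i++) {
            table.put("hits", value);
        }
        return table.get("hits");
    }

    public static void main(String[] args) {
        System.out.println("increment, 1 attempt:  " + replayIncrement(1, 5)); // 5
        System.out.println("increment, 2 attempts: " + replayIncrement(2, 5)); // 10, double counted
        System.out.println("put, 2 attempts:       " + replayPut(2, 5));       // 5, replay is harmless
    }
}
```

Replaying a put leaves the stored value unchanged, while replaying an increment changes the result, which is why a retried task is a problem only for the latter.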
-
Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment? Leif Wickland 2011-06-17, 23:12
Interesting (and mildly terrifying) point, Ryan.

Is there a valid pattern for storing a sum in HBase and then using mapreduce to calculate an update to that sum based on incremental data updates?

It seems a cycle like the following would avoid double-increment problems, but would suffer from a monster race condition:

1. Mapreduce the updated values into aggregates (written to HDFS).
2. Mapreduce the aggregates with the existing values in HBase into new target values for HBase (but written to HDFS).
3. Mapreduce to write the new values to HBase.

Please tell me there's a better way.

Thanks,

Leif
-
Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment? Jean-Daniel Cryans 2011-06-20, 17:39
I think you could store deltas and roll them up later. You would have to store them under a qualifier that's unique for each job, so that failures and speculative execution (if enabled) only overwrite a value instead of incrementing something. At read time you would need to sum those columns together.

J-D
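
The delta-per-job idea can be sketched roughly as follows. This is an in-memory simulation only: DeltaRollupDemo, the "delta:" qualifier prefix, and the job ids are assumptions made for illustration, not anything defined by HBase or by this thread. Each job writes its delta under its own qualifier with put semantics, so a replayed write overwrites itself, and the reader sums the delta columns.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory stand-in for one table row; NOT the HBase client API. */
class DeltaRollupDemo {
    // Hypothetical row contents: qualifier ("delta:<jobId>") -> delta written by that job.
    private final Map<String, Long> row = new TreeMap<>();

    /** Put-style write keyed by job id: a retry of the same job overwrites its own cell. */
    void writeDelta(String jobId, long delta) {
        row.put("delta:" + jobId, delta);
    }

    /** Read-time aggregation: sum every delta column in the row. */
    long readTotal() {
        long total = 0L;
        for (long delta : row.values()) {
            total += delta;
        }
        return total;
    }

    public static void main(String[] args) {
        DeltaRollupDemo row = new DeltaRollupDemo();
        row.writeDelta("job-001", 5);
        row.writeDelta("job-001", 5); // retried or speculative attempt: same cell, same value
        row.writeDelta("job-002", 3);
        System.out.println(row.readTotal()); // 8
    }
}
```

The trade-off is that reads become more expensive as delta columns accumulate, which is presumably why the suggestion includes rolling them up later.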
-
Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment? Leif Wickland 2011-06-21, 18:35
My patch to add support for Increment to TableOutputFormat follows. (I did the svn diff in trunk/src/main/java/org/apache/hadoop/hbase.)

One point I was unsure about was whether I should duplicate the TimeRange in the Increment's copy constructor. TimeRange is immutable except for its Writable.readFields() implementation.

Index: mapreduce/TableOutputFormat.java
===================================================================
--- mapreduce/TableOutputFormat.java (revision 1138076)
+++ mapreduce/TableOutputFormat.java (working copy)
@@ -30,6 +30,7 @@
 import org.apache.hadoop.hbase.client.Delete;
 import org.apache.hadoop.hbase.client.HConnectionManager;
 import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Increment;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.zookeeper.ZKUtil;
 import org.apache.hadoop.io.Writable;
@@ -41,8 +42,8 @@
 /**
  * Convert Map/Reduce output and write it to an HBase table. The KEY is ignored
- * while the output value <u>must</u> be either a {@link Put} or a
- * {@link Delete} instance.
+ * while the output value <u>must</u> be a {@link Put},
+ * {@link Delete}, or {@link Increment} instance.
  *
  * @param <KEY> The type of the key. Ignored in this class.
  */
@@ -119,8 +120,9 @@
     public void write(KEY key, Writable value)
     throws IOException {
       if (value instanceof Put) this.table.put(new Put((Put)value));
+      else if (value instanceof Increment) this.table.increment(new Increment((Increment)value));
       else if (value instanceof Delete) this.table.delete(new Delete((Delete)value));
-      else throw new IOException("Pass a Delete or a Put");
+      else throw new IOException("Pass a Delete, Increment or a Put");
     }
   }
Index: client/Increment.java
===================================================================
--- client/Increment.java (revision 1138076)
+++ client/Increment.java (working copy)
@@ -101,6 +101,19 @@
     return this;
   }
 
+  /**
+   * Copy constructor. Creates an Increment operation cloned from the specified
+   * Increment.
+   * @param incrementToCopy increment to copy
+   */
+  public Increment(final Increment incrementToCopy) {
+    this.row = incrementToCopy.getRow();
+    this.lockId = incrementToCopy.getLockId();
+    this.writeToWAL = incrementToCopy.getWriteToWAL();
+    this.tr = incrementToCopy.getTimeRange();
+    this.familyMap.putAll(incrementToCopy.getFamilyMap());
+  }
+
   /* Accessors */
 
   /**
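
On the question of what a copy constructor needs to duplicate: an object that is never mutated after construction can usually be shared between the original and the copy, whereas mutable nested collections cannot. If the family map's values are themselves mutable maps, putAll() copies only the outer map and shares the inner ones. The sketch below illustrates that difference with plain Java collections; CopyDemo and its String keys are hypothetical stand-ins and do not use the HBase types.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrates shallow versus deep copying of a nested map; NOT the HBase Increment class. */
class CopyDemo {
    /** Shallow copy: a new outer map whose inner maps are shared with the source. */
    static Map<String, Map<String, Long>> shallowCopy(Map<String, Map<String, Long>> src) {
        Map<String, Map<String, Long>> out = new TreeMap<>();
        out.putAll(src);
        return out;
    }

    /** Deep copy: a new outer map and a fresh inner map per family. */
    static Map<String, Map<String, Long>> deepCopy(Map<String, Map<String, Long>> src) {
        Map<String, Map<String, Long>> out = new TreeMap<>();
        for (Map.Entry<String, Map<String, Long>> e : src.entrySet()) {
            out.put(e.getKey(), new TreeMap<>(e.getValue()));
        }
        return out;
    }

    /** Copies a nested map, mutates the original, and returns the value the copy then sees. */
    static long hitsSeenByCopy(boolean deep) {
        Map<String, Long> family = new TreeMap<>();
        family.put("hits", 1L);
        Map<String, Map<String, Long>> original = new TreeMap<>();
        original.put("cf", family);

        Map<String, Map<String, Long>> copy = deep ? deepCopy(original) : shallowCopy(original);
        original.get("cf").put("hits", 99L); // caller reuses and mutates the original afterwards
        return copy.get("cf").get("hits");
    }

    public static void main(String[] args) {
        System.out.println("shallow copy sees: " + hitsSeenByCopy(false)); // 99
        System.out.println("deep copy sees:    " + hitsSeenByCopy(true));  // 1
    }
}
```

Whether the shallow behavior matters in practice depends on whether callers reuse and mutate the original object after handing it to the output format.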