HBase, mail # dev - Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase


Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Weishung Chung 2011-05-18, 21:15
Thank you, I like the second option better to avoid the roundtrip to HBase.
I am trying it out now.

On Wed, May 18, 2011 at 10:03 AM, Alex Baranau <[EMAIL PROTECTED]> wrote:

> There are several options here. E.g.:
>
> 1) Given that you have the "original key" of the record, you can fetch the
> stored record key from HBase and use it to create a Put with updated (or
> new) cells.
>
> Currently you'll need to use a distributed scan for that; there's no
> analogue for the Get operation yet (see
> https://github.com/sematext/HBaseWD/issues/1).
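As a rough illustration of option 1 (not HBaseWD code), assume the one-byte prefix was assigned independently of the key, for example round-robin by write order, so it cannot be recomputed later. A `TreeMap` stands in for the HBase table, and `PrefixLookupSketch`, `withPrefix` and `findStoredKey` are hypothetical names. Finding the stored key means checking every bucket prefix, which is the work a distributed scan does across all buckets; once found, that key is reused for the update. Requires Java 9+ for `Arrays.compareUnsigned`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical sketch; a TreeMap stands in for the HBase table.
public class PrefixLookupSketch {

    // Prepends a one-byte bucket prefix to the original key.
    static byte[] withPrefix(byte prefix, byte[] originalKey) {
        byte[] key = new byte[originalKey.length + 1];
        key[0] = prefix;
        System.arraycopy(originalKey, 0, key, 1, originalKey.length);
        return key;
    }

    // Checks every bucket prefix for the original key (as a distributed scan
    // would cover all buckets); returns the stored key, or null if absent.
    static byte[] findStoredKey(TreeMap<byte[], String> table, byte[] originalKey, int buckets) {
        for (int b = 0; b < buckets; b++) {
            byte[] candidate = withPrefix((byte) b, originalKey);
            if (table.containsKey(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int buckets = 4;
        // Row keys compared as unsigned bytes, like HBase row keys.
        TreeMap<byte[], String> table = new TreeMap<>((x, y) -> Arrays.compareUnsigned(x, y));
        // Assumed round-robin writer: the prefix depends on write order, not on the key.
        String[] keys = {"a", "b", "c"};
        for (int i = 0; i < keys.length; i++) {
            table.put(withPrefix((byte) (i % buckets), keys[i].getBytes(StandardCharsets.UTF_8)), "v" + i);
        }
        byte[] stored = findStoredKey(table, "c".getBytes(StandardCharsets.UTF_8), buckets);
        // Reuse the stored key for the update; no delete is needed.
        table.put(stored, "v2-updated");
        System.out.println(stored[0] + " " + table.get(stored)); // prints "2 v2-updated"
    }
}
```

The cost of this approach is one lookup per bucket for every update, which is what option 2 avoids.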
>
> Note: if you use the RowKeyDistributorByOneBytePrefix included in the
> current lib, you first need to find out the real key of the stored record
> by fetching data from HBase. Alternatively, see the next option:
>
> 2) You can create your own RowKeyDistributor implementation that creates
> the "distributed key" from the original key value, so that later, when you
> have the original key and want to update the record, you can calculate the
> distributed key without a roundtrip to HBase.
>
> E.g., in your RowKeyDistributor implementation you can calculate a 1-byte
> hash of the original key (https://github.com/sematext/HBaseWD/issues/2).
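A minimal standalone sketch of option 2, assuming a fixed bucket count and using `Arrays.hashCode` reduced modulo the bucket count as the 1-byte hash (any deterministic hash would do; this is not necessarily what HBaseWD uses). `HashPrefixKeyDistributorSketch`, `toDistributedKey` and `toOriginalKey` are hypothetical names, and the class does not extend or implement HBaseWD's RowKeyDistributor types. Because the prefix depends only on the original key, an update can recompute the stored key without reading from HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical standalone class; not HBaseWD's RowKeyDistributor API.
public class HashPrefixKeyDistributorSketch {
    private final int buckets;

    public HashPrefixKeyDistributorSketch(int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in [1, 256] to fit in one byte");
        }
        this.buckets = buckets;
    }

    // Deterministic 1-byte prefix derived from the original key alone.
    public byte prefixFor(byte[] originalKey) {
        return (byte) Math.floorMod(Arrays.hashCode(originalKey), buckets);
    }

    // The same original key always yields the same distributed key.
    public byte[] toDistributedKey(byte[] originalKey) {
        byte[] key = new byte[originalKey.length + 1];
        key[0] = prefixFor(originalKey);
        System.arraycopy(originalKey, 0, key, 1, originalKey.length);
        return key;
    }

    // Strips the 1-byte prefix to recover the original key.
    public byte[] toOriginalKey(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }

    public static void main(String[] args) {
        HashPrefixKeyDistributorSketch d = new HashPrefixKeyDistributorSketch(16);
        byte[] original = "user-42".getBytes(StandardCharsets.UTF_8);
        byte[] atWrite = d.toDistributedKey(original);
        byte[] atUpdate = d.toDistributedKey(original); // recomputed later, no read needed
        System.out.println(Arrays.equals(atWrite, atUpdate)); // true
        System.out.println(new String(d.toOriginalKey(atWrite), StandardCharsets.UTF_8)); // user-42
    }
}
```

The trade-off versus a round-robin prefix is that distribution now depends on how evenly the hash spreads your keys, while reads of a key range still have to cover all buckets.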
>
>
>
> Either way, you don't need to delete a record to update some of its cells
> or to add new cells.
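A toy in-memory model (not HBase client code) of the behaviour described: writing to an existing row key adds or overwrites only the cells included in that write and leaves the row's other cells in place. It keeps only the latest value per cell and ignores HBase's versions and timestamps; `CellMergeSketch` is a hypothetical name. Requires Java 9+ for `Map.of`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model of row/cell updates; ignores HBase versions and timestamps.
public class CellMergeSketch {
    // row key -> (column -> latest value)
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Like a Put: merges the given cells into the row, creating the row if needed.
    public void put(String row, Map<String, String> cells) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).putAll(cells);
    }

    // Returns the row's cells, or an empty map if the row does not exist.
    public Map<String, String> get(String row) {
        return rows.getOrDefault(row, Map.of());
    }

    public static void main(String[] args) {
        CellMergeSketch table = new CellMergeSketch();
        table.put("k1", Map.of("cf:a", "1", "cf:b", "2"));
        // Overwrite one cell and add a new one, with no delete beforehand.
        table.put("k1", Map.of("cf:b", "20", "cf:c", "3"));
        System.out.println(table.get("k1")); // {cf:a=1, cf:b=20, cf:c=3}
    }
}
```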
>
> Please let me know if you have more Qs!
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <[EMAIL PROTECTED]> wrote:
>
> > I have another question. For overwriting, do I need to delete the
> > existing one before re-writing it?
> >
> > On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Yes, it's simple yet useful. I am integrating it. Thanks a lot :)
> > >
> > >
> > > On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
> > >
> > >> Thanks for the interest!
> > >>
> > >> We are using it in production. It is simple and hence quite stable.
> > >> Though some minor pieces are missing (like
> > >> https://github.com/sematext/HBaseWD/issues/1), this doesn't affect
> > >> stability or major functionality.
> > >>
> > >> Alex Baranau
> > >> ----
> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > >>
> > >> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <[EMAIL PROTECTED]>
> > >> wrote:
> > >>
> > >> > What's the status of this package? Is it mature enough?
> > >> > I am using it in my project. I tried out the write method yesterday
> > >> > and am going to incorporate it into the read method tomorrow.
> > >> >
> > >> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
> > >> >
> > >> > > > The start/end rows may be written twice.
> > >> > >
> > >> > > Yeah, I know. I meant that the size of the startRow+stopRow data is
> > >> > > "bearable" in the attribute value no matter how long the keys are,
> > >> > > since we are already OK with transferring them initially (i.e. we
> > >> > > should be OK with transferring twice as much).
> > >> > >
> > >> > > So, what about the sourceScan attribute value suggestion I mentioned?
> > >> > > If you can tell me why it isn't sufficient in your case, I'd have
> > >> > > more info to come up with a better suggestion ;)
> > >> > >
> > >> > > > It is Okay to keep all versions of your patch in the JIRA.
> > >> > > > Maybe the second should be named HBASE-3811-v2.patch
> > >> > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> > >> > >
> > >> > > np, can do that. I just thought the patches could be sorted by date
> > >> > > to find the final one (aka "convention over naming rules").
> > >> > >
> > >> > > Alex.
> > >> > >
> > >> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >> > >
> > >> > > > >> Though it might be ok, since we anyways "transfer" start/stop