HBase >> mail # dev >> Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase


Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Thank you, I like the second option better to avoid the roundtrip to HBase.
I am trying it out now.
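
A minimal sketch of what the second option could look like: the prefix byte is derived from a hash of the original key, so the stored key can be recomputed at update time without reading from HBase. This is plain Java with no HBaseWD or HBase dependency. The class name HashPrefixKeySketch, its method names, the bucket count, and the use of Arrays.hashCode as the hash are assumptions made for illustration, not the library's API; the same logic would presumably go into your own RowKeyDistributor implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not part of HBaseWD): prefixes each key with a
// 1-byte bucket id computed from the key itself.
class HashPrefixKeySketch {
    private final int buckets;

    HashPrefixKeySketch(int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in 1..256");
        }
        this.buckets = buckets;
    }

    // Deterministic bucket id in [0, buckets), stored in a single byte.
    byte prefixFor(byte[] originalKey) {
        return (byte) Math.floorMod(Arrays.hashCode(originalKey), buckets);
    }

    // Key to use for Put: one prefix byte followed by the original key.
    byte[] getDistributedKey(byte[] originalKey) {
        byte[] out = new byte[originalKey.length + 1];
        out[0] = prefixFor(originalKey);
        System.arraycopy(originalKey, 0, out, 1, originalKey.length);
        return out;
    }

    // Strips the prefix byte to recover the original key.
    byte[] getOriginalKey(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }

    public static void main(String[] args) {
        HashPrefixKeySketch distributor = new HashPrefixKeySketch(16);
        byte[] original = "event-000123".getBytes(StandardCharsets.UTF_8);

        // First write and later update compute the same stored key,
        // so the update needs no lookup of the stored key.
        byte[] onWrite = distributor.getDistributedKey(original);
        byte[] onUpdate = distributor.getDistributedKey(original);
        System.out.println(Arrays.equals(onWrite, onUpdate));
    }
}
```

The trade-off in this sketch is that the bucket count has to stay fixed once data is written, since changing it changes the prefix computed for existing keys.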

On Wed, May 18, 2011 at 10:03 AM, Alex Baranau <[EMAIL PROTECTED]> wrote:

> There are several options here. E.g.:
>
> 1) Given that you have the "original key" of the record, you can fetch the
> stored record key from HBase and use it to create a Put with updated (or
> new) cells.
>
> Currently you'll need to use a distributed scan for that; there is no
> analogue of the Get operation yet (see
> https://github.com/sematext/HBaseWD/issues/1).
>
> Note: if you use the RowKeyDistributorByOneBytePrefix included in the
> current lib, you first need to find out the real key of the stored record
> by fetching data from HBase. Alternatively, see the next option:
>
> 2) You can create your own RowKeyDistributor implementation that derives
> the "distributed key" from the original key value, so that later, when you
> have the original key and want to update the record, you can calculate the
> distributed key without a roundtrip to HBase.
>
> E.g. in your RowKeyDistributor implementation you can calculate a 1-byte
> hash of the original key (https://github.com/sematext/HBaseWD/issues/2).
>
>
>
> Either way, you don't need to delete the record to update some of its cells
> or to add new cells.
>
> Please let me know if you have more Qs!
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <[EMAIL PROTECTED]>
> wrote:
>
> > I have another question. For overwriting, do I need to delete the
> > existing one before re-writing it?
> >
> > On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Yes, it's simple yet useful. I am integrating it. Thanks a lot :)
> > >
> > >
> > > On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
> > >
> > >> Thanks for the interest!
> > >>
> > >> We are using it in production. It is simple and hence quite stable.
> > >> Though some minor pieces are missing (like
> > >> https://github.com/sematext/HBaseWD/issues/1), this doesn't affect
> > >> stability and/or major functionality.
> > >>
> > >> Alex Baranau
> > >> ----
> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > >>
> > >> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <[EMAIL PROTECTED]>
> > >> wrote:
> > >>
> > >> > What's the status on this package? Is it mature enough?
> > >> > I am using it in my project; I tried out the write method yesterday
> > >> > and am going to incorporate it into the read method tomorrow.
> > >> >
> > >> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
> > >> >
> > >> > > > The start/end rows may be written twice.
> > >> > >
> > >> > > Yeah, I know. I meant that the size of the startRow+stopRow data is
> > >> > > "bearable" in the attribute value no matter how long the keys are,
> > >> > > since we are already OK with transferring them initially (i.e. we
> > >> > > should be OK with transferring 2x more).
> > >> > >
> > >> > > So, what about the suggestion of the sourceScan attribute value I
> > >> > > mentioned? If you can tell me why it isn't sufficient in your case,
> > >> > > I'd have more info to think about a better suggestion ;)
> > >> > >
> > >> > > > It is Okay to keep all versions of your patch in the JIRA.
> > >> > > > Maybe the second should be named HBASE-3811-v2.patch
> > >> > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> > >> > >
> > >> > > np. Can do that. Just thought that they (patches) can be sorted by
> > >> > > date to find out the final one (aka "convention over naming-rules").
> > >> > >
> > >> > > Alex.
> > >> > >
> > >> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >> > >
> > >> > > > >> Though it might be ok, since we anyways "transfer" start/stop