HBase, mail # dev - Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase


Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Weishung Chung 2011-05-14, 15:17
Yes, it's simple yet useful. I am integrating it. Thanks a lot :)
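
The reflection idea discussed in the quoted thread below (detect whether Scan supports setAttribute() and, if it does, store the start and stop rows in a "sourceScan" attribute) could look roughly like the following. This is only a sketch under stated assumptions: ScanWithAttributes and LegacyScan are hypothetical stand-ins, not HBase classes, and the length-prefixed encoding of the two rows is an illustrative choice, not what HBaseWD actually does.

```java
import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class ScanAttributeProbe {

    /** Hypothetical stand-in for a Scan class that supports attributes. */
    public static class ScanWithAttributes {
        private final Map<String, byte[]> attributes = new HashMap<>();

        public void setAttribute(String name, byte[] value) {
            attributes.put(name, value);
        }

        public byte[] getAttribute(String name) {
            return attributes.get(name);
        }
    }

    /** Hypothetical stand-in for an older Scan class without attributes. */
    public static class LegacyScan {
    }

    /** Returns the public setAttribute(String, byte[]) method, or null if absent. */
    public static Method findSetAttribute(Class<?> scanClass) {
        try {
            return scanClass.getMethod("setAttribute", String.class, byte[].class);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    /** Assumed encoding: 4-byte length of startRow, then startRow, then stopRow. */
    public static byte[] encodeRows(byte[] startRow, byte[] stopRow) {
        ByteBuffer buf = ByteBuffer.allocate(4 + startRow.length + stopRow.length);
        buf.putInt(startRow.length).put(startRow).put(stopRow);
        return buf.array();
    }

    /** Sets the "sourceScan" attribute if the scan class supports it. */
    public static boolean tagIfSupported(Object scan, byte[] startRow, byte[] stopRow) {
        Method setAttribute = findSetAttribute(scan.getClass());
        if (setAttribute == null) {
            return false; // older Scan version: nothing to tag
        }
        try {
            setAttribute.invoke(scan, "sourceScan", encodeRows(startRow, stopRow));
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] start = {1, 2};
        byte[] stop = {3};
        System.out.println(tagIfSupported(new ScanWithAttributes(), start, stop));
        System.out.println(tagIfSupported(new LegacyScan(), start, stop));
    }
}
```

Because the attribute value repeats the start and stop rows, they are transferred twice, which is the overhead the thread below weighs against key length.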

On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:

> Thanks for the interest!
>
> We are using it in production. It is simple and hence quite stable. Though
> some minor pieces are missing (like
> https://github.com/sematext/HBaseWD/issues/1), this doesn't affect stability
> and/or major functionality.
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <[EMAIL PROTECTED]> wrote:
>
> > What's the status on this package? Is it mature enough?
> > I am using it in my project; I tried out the write method yesterday and am
> > going to incorporate it into the read method tomorrow.
> >
> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
> >
> > > > The start/end rows may be written twice.
> > >
> > > > Yeah, I know. I meant that the size of the startRow+stopRow data is
> > > > "bearable" in the attribute value no matter how long the keys are, since
> > > > we are already OK with transferring them initially (i.e. we should be OK
> > > > with transferring 2x more).
> > >
> > > > So, what about the suggestion for the sourceScan attribute value I
> > > > mentioned? If you can tell me why it isn't sufficient in your case, I'd
> > > > have more info to think about a better suggestion ;)
> > >
> > > > It is Okay to keep all versions of your patch in the JIRA.
> > > > > Maybe the second should be named HBASE-3811-v2.patch
> > > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> > >
> > > > np. Can do that. Just thought that they (patches) can be sorted by date to
> > > > find the final one (aka "convention over naming-rules").
> > >
> > > Alex.
> > >
> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >
> > > > > >> Though it might be ok, since we anyways "transfer" start/stop rows with
> > > > > >> Scan object.
> > > > In write() method, we now have:
> > > >     Bytes.writeByteArray(out, this.startRow);
> > > >     Bytes.writeByteArray(out, this.stopRow);
> > > > ...
> > > > >       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
> > > >         WritableUtils.writeString(out, attr.getKey());
> > > >         Bytes.writeByteArray(out, attr.getValue());
> > > >       }
> > > > The start/end rows may be written twice.
> > > >
> > > > > Of course, you have full control over how to generate the unique ID for
> > > > > the "sourceScan" attribute.
> > > >
> > > > > It is Okay to keep all versions of your patch in the JIRA. Maybe the
> > > > > second should be named HBASE-3811-v2.patch
> > > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> > > >
> > > > Thanks
> > > >
> > > >
> > > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> > Can you remove the first version ?
> > > >> Isn't it ok to keep it in JIRA issue?
> > > >>
> > > >>
> > > > >> > In HBaseWD, can you use reflection to detect whether Scan supports
> > > > >> > setAttribute()?
> > > > >> > If it does, can you encode start row and end row as "sourceScan"
> > > > >> > attribute?
> > > >>
> > > > >> Yeah, something like this is going to be implemented. Though I'd still
> > > > >> want to hear from the devs the story about Scan version.
> > > >>
> > > >>
> > > >> > One consideration is that start row or end row may be quite long.
> > > >>
> > > > >> Yeah, that was my thought too at first. Though it might be ok, since we
> > > > >> anyways "transfer" start/stop rows with Scan object.
> > > >>
> > > >> > What do you think ?
> > > >>
> > > > >> I'd love to hear from you whether this variant I mentioned is what we are
> > > > >> looking at here:
> > > >>
> > > >>
> > > > >> > From what I understand, you want to distinguish scans fired by the same
> > > > >> > distributed scan.
> > > >> > I.e. group scans which were fired by single distributed scan. If