HBase >> mail # dev >> Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase


Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Yes, it's simple yet useful. I am integrating it. Thanks a lot :)
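[Editor's note] For readers landing on this thread from the announcement: HBaseWD's core idea is to prefix monotonically increasing row keys with a small bucket byte so that sequential writes spread across regions. A minimal standalone sketch of that idea follows; the class and method names are illustrative and are not HBaseWD's actual API.

```java
import java.util.Arrays;

// Illustrative sketch of the "salted key" idea HBaseWD is built around:
// prefix each original key with one bucket byte derived from its hash, so
// sequential keys land in different key ranges (and hence regions).
public class OneBytePrefixDistributor {
    private final int bucketsCount;

    public OneBytePrefixDistributor(int bucketsCount) {
        this.bucketsCount = bucketsCount;
    }

    /** Prepend a deterministic one-byte bucket prefix to the original key. */
    public byte[] getDistributedKey(byte[] originalKey) {
        // Mask to keep the hash non-negative before taking the modulus.
        byte bucket = (byte) ((Arrays.hashCode(originalKey) & 0x7fffffff) % bucketsCount);
        byte[] result = new byte[originalKey.length + 1];
        result[0] = bucket;
        System.arraycopy(originalKey, 0, result, 1, originalKey.length);
        return result;
    }

    /** Strip the prefix to recover the original key, e.g. when reading back. */
    public byte[] getOriginalKey(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }
}
```

Reads then fan out one Scan per bucket prefix and merge the results, which is what the "distributed scan" discussed below refers to.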

On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:

> Thanks for the interest!
>
> We are using it in production. It is simple and hence quite stable. Though
> some minor pieces are missing (like
> https://github.com/sematext/HBaseWD/issues/1), this doesn't affect
> stability or major functionality.
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <[EMAIL PROTECTED]> wrote:
>
> > What's the status on this package? Is it mature enough?
> > I am using it in my project; I tried out the write method yesterday and
> > am going to incorporate it into the read method tomorrow.
> >
> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
> >
> > > > The start/end rows may be written twice.
> > >
> > > Yeah, I know. I meant that the size of the startRow+stopRow data in the
> > > attribute value is "bearable" no matter how long the keys are, since we
> > > are already OK with transferring them initially (i.e. we should be OK
> > > with transferring 2x more).
> > >
> > > So, what about the sourceScan attribute value suggestion I mentioned?
> > > If you can tell me why it isn't sufficient in your case, I'd have more
> > > info to think about a better suggestion ;)
> > >
> > > > It is Okay to keep all versions of your patch in the JIRA.
> > > > Maybe the second should be named HBASE-3811-v2.patch
> > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> > >
> > > No problem, can do that. I just thought that the patches can be sorted
> > > by date to find the final one (aka "convention over naming rules").
> > >
> > > Alex.
> > >
> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >
> > > > >> Though it might be ok, since we anyways "transfer" start/stop rows
> > > > >> with Scan object.
> > > > In the write() method, we now have:
> > > >     Bytes.writeByteArray(out, this.startRow);
> > > >     Bytes.writeByteArray(out, this.stopRow);
> > > > ...
> > > >       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
> > > >         WritableUtils.writeString(out, attr.getKey());
> > > >         Bytes.writeByteArray(out, attr.getValue());
> > > >       }
> > > > The start/end rows may be written twice.
> > > >
> > > > Of course, you have full control over how to generate the unique ID
> > > > for the "sourceScan" attribute.
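[Editor's note] The thread leaves the ID scheme open. One workable choice, purely an assumption on my part and not something the patch or HBaseWD specifies, is a random UUID generated once per distributed scan and stamped onto every Scan it fans out:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Illustrative scheme for a unique "sourceScan" attribute value: one random
// UUID per distributed scan, serialized as UTF-8 bytes so it fits the
// byte[]-valued attribute map discussed in the thread.
public class SourceScanId {
    public static byte[] newSourceScanId() {
        return UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

Each per-bucket Scan belonging to the same distributed scan would then carry identical attribute bytes, making the grouping trivial on the server side.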
> > > >
> > > > It is Okay to keep all versions of your patch in the JIRA. Maybe the
> > > > second should be named HBASE-3811-v2.patch
> > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> > > >
> > > > Thanks
> > > >
> > > >
> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> > Can you remove the first version?
> > > >> Isn't it OK to keep it in the JIRA issue?
> > > >>
> > > >>
> > > >> > In HBaseWD, can you use reflection to detect whether Scan supports
> > > >> > setAttribute()?
> > > >> > If it does, can you encode the start row and end row as a "sourceScan"
> > > >> > attribute?
> > > >>
> > > >> Yeah, something like this is going to be implemented. Though I'd still
> > > >> want to hear from the devs the story about the Scan version.
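[Editor's note] The reflection check suggested above needs nothing HBase-specific; a generic sketch follows. Since no HBase jar is assumed here, the helper is demonstrated against JDK classes; the `MethodProbe`/`hasMethod` names are mine, not from HBaseWD.

```java
import java.lang.reflect.Method;

// Runtime check for whether a class exposes a method with a given name and
// parameter types -- the trick suggested above for detecting
// Scan#setAttribute(String, byte[]) without compiling against a specific
// HBase version.
public class MethodProbe {
    public static boolean hasMethod(Class<?> clazz, String name, Class<?>... paramTypes) {
        try {
            Method m = clazz.getMethod(name, paramTypes);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

With an HBase jar on the classpath, the call would presumably look like `hasMethod(Scan.class, "setAttribute", String.class, byte[].class)`, falling back to the older behavior when it returns false.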
> > > >>
> > > >>
> > > >> > One consideration is that start row or end row may be quite long.
> > > >>
> > > >> Yeah, that was my thought too at first. Though it might be OK, since
> > > >> we "transfer" start/stop rows with the Scan object anyway.
> > > >>
> > > >> > What do you think ?
> > > >>
> > > >> I'd love to hear from you whether this variant I mentioned is what we
> > > >> are looking at here:
> > > >>
> > > >>
> > > >> > From what I understand, you want to distinguish scans fired by the
> > > >> > same distributed scan.
> > > >> > I.e. group scans which were fired by a single distributed scan. If