HBase >> mail # dev >> Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase


Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
What's the status of this package? Is it mature enough?
I am using it in my project: I tried out the write method yesterday and am
going to incorporate it into the read method tomorrow.
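For context, the core idea of HBaseWD is to prepend a small bucket prefix to otherwise sequential row keys so writes spread across regions instead of hot-spotting one. Below is a minimal stand-alone sketch of that idea; the class name and the hashing scheme are illustrative, not HBaseWD's actual API:

```java
import java.util.Arrays;

public class BucketPrefixSketch {
    private final int buckets;

    public BucketPrefixSketch(int buckets) {
        this.buckets = buckets;
    }

    // Prepend a one-byte bucket prefix derived deterministically from the
    // original key, so monotonically increasing keys land in `buckets`
    // distinct row-key ranges instead of hammering a single region.
    public byte[] distribute(byte[] originalKey) {
        byte prefix = (byte) Math.floorMod(Arrays.hashCode(originalKey), buckets);
        byte[] distributed = new byte[originalKey.length + 1];
        distributed[0] = prefix;
        System.arraycopy(originalKey, 0, distributed, 1, originalKey.length);
        return distributed;
    }

    // Recovering the original key is just dropping the prefix byte.
    public byte[] original(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }
}
```

The flip side is that a full-range read then needs one scan per bucket prefix, which is what the "distributed scan" discussed in the thread below refers to.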

On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:

> > The start/end rows may be written twice.
>
> Yeah, I know. I meant that the size of the startRow+stopRow data is
> "bearable" in the attribute value no matter how long the keys are, since we
> are already OK with transferring them initially (i.e. we should be OK with
> transferring 2x as much).
>
> So, what about the "sourceScan" attribute suggestion I mentioned? If you
> can tell me why it isn't sufficient in your case, I'd have more info to
> think about a better suggestion ;)
>
> > It is Okay to keep all versions of your patch in the JIRA. Maybe the second
> > should be named HBASE-3811-v2.patch
> > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
>
> No problem, can do that. I just thought the patches could be sorted by date
> to find the final one (aka "convention over naming rules").
>
> Alex.
>
> On Wed, May 11, 2011 at 11:13 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > >> Though it might be ok, since we anyways "transfer" start/stop rows
> > >> with the Scan object.
> > In the write() method, we now have:
> >     Bytes.writeByteArray(out, this.startRow);
> >     Bytes.writeByteArray(out, this.stopRow);
> > ...
> >       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
> >         WritableUtils.writeString(out, attr.getKey());
> >         Bytes.writeByteArray(out, attr.getValue());
> >       }
> > The start/end rows may be written twice.
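To make the doubling concrete, here is a stand-alone sketch (not HBase source; `writeByteArray` below is a simplified length-prefixed stand-in for HBase's `Bytes.writeByteArray`) comparing the serialized size with and without the start/stop rows repeated inside a "sourceScan" attribute:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ScanSizeSketch {
    // Simplified stand-in for Bytes.writeByteArray: 4-byte length + payload.
    static void writeByteArray(DataOutputStream out, byte[] b) throws IOException {
        out.writeInt(b.length);
        out.write(b);
    }

    // Serialize start/stop rows the way the Scan.write() snippet above does,
    // optionally adding a "sourceScan" attribute carrying both rows again.
    static int serializedSize(byte[] startRow, byte[] stopRow, boolean repeatInAttribute) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            writeByteArray(out, startRow);
            writeByteArray(out, stopRow);
            if (repeatInAttribute) {
                out.writeUTF("sourceScan");
                byte[] value = new byte[startRow.length + stopRow.length];
                System.arraycopy(startRow, 0, value, 0, startRow.length);
                System.arraycopy(stopRow, 0, value, startRow.length, stopRow.length);
                writeByteArray(out, value);
            }
            return buf.size();
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen with in-memory streams
        }
    }
}
```

For two 100-byte keys this gives 208 bytes without the attribute and 424 with it, i.e. the row keys are indeed written roughly twice, which is the trade-off being debated here.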
> >
> > Of course, you have full control over how to generate the unique ID for
> > "sourceScan" attribute.
> >
> > It is Okay to keep all versions of your patch in the JIRA. Maybe the second
> > should be named HBASE-3811-v2.patch
> > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> >
> > Thanks
> >
> >
> > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
> >
> >> > Can you remove the first version ?
> >> Isn't it ok to keep it in JIRA issue?
> >>
> >>
> >> > In HBaseWD, can you use reflection to detect whether Scan supports
> >> > setAttribute() ?
> >> > If it does, can you encode start row and end row as "sourceScan"
> >> > attribute ?
> >>
> >> Yeah, something like this is going to be implemented. Though I'd still
> >> want to hear from the devs about the Scan version story.
> >>
> >>
> >> > One consideration is that start row or end row may be quite long.
> >>
> >> Yeah, that was my thought too at first. Though it might be ok, since we
> >> "transfer" start/stop rows with the Scan object anyway.
> >>
> >> > What do you think ?
> >>
> >> I'd love to hear from you whether this variant I mentioned is what we
> >> are looking at here:
> >>
> >>
> >> > From what I understand, you want to distinguish scans fired by the same
> >> > distributed scan, i.e. group scans which were fired by a single
> >> > distributed scan. If that's what you want, the distributed scan can
> >> > generate a unique ID and set, say, a "sourceScan" attribute to its value.
> >> > This way we'll have <# of distinct "sourceScan" attribute values> =
> >> > <number of distributed scans invoked by the client side>, and two scans
> >> > on the server side will have the same "sourceScan" attribute iff they
> >> > "belong" to the same distributed scan.
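The grouping described in the quoted suggestion can be sketched with plain collections. FakeScan below is a stand-in for HBase's Scan (whose real setAttribute takes the same String/byte[] pair); all other names are mine:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class SourceScanGrouping {
    // Stand-in for a Scan carrying attributes, as HBase's setAttribute does.
    static class FakeScan {
        final Map<String, byte[]> attributes = new HashMap<>();
        void setAttribute(String name, byte[] value) { attributes.put(name, value); }
        byte[] getAttribute(String name) { return attributes.get(name); }
    }

    // Client side: one distributed scan generates a unique ID and tags every
    // per-bucket scan it fires with the same "sourceScan" attribute.
    static List<FakeScan> distributedScan(int buckets) {
        byte[] id = UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8);
        List<FakeScan> scans = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            FakeScan s = new FakeScan();
            s.setAttribute("sourceScan", id);
            scans.add(s);
        }
        return scans;
    }

    // Server side: scans belong to the same distributed scan iff their
    // "sourceScan" attribute values are equal.
    static Map<String, List<FakeScan>> groupBySource(List<FakeScan> scans) {
        Map<String, List<FakeScan>> groups = new HashMap<>();
        for (FakeScan s : scans) {
            String key = new String(s.getAttribute("sourceScan"), StandardCharsets.UTF_8);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
        }
        return groups;
    }
}
```

With this, the number of distinct "sourceScan" values equals the number of distributed scans the client invoked, exactly as described above.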
> >>
> >>
> >> Alex Baranau
> >> ----
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>
> >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >>
> >>> Alex:
> >>> Your second patch looks good.
> >>> Can you remove the first version ?
> >>>
> >>> In HBaseWD, can you use reflection to detect whether Scan supports
> >>> setAttribute() ?
> >>> If it does, can you encode start row and end row as "sourceScan"
> >>> attribute ?
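Ted's reflection idea could look roughly like the sketch below. OldScan/NewScan are hypothetical stand-ins for a pre- and post-attributes Scan class; a real check would pass org.apache.hadoop.hbase.client.Scan.class instead:

```java
public class ScanFeatureCheck {
    // Returns true iff the given class publicly exposes
    // setAttribute(String, byte[]), letting HBaseWD degrade gracefully on
    // older HBase clients whose Scan has no attribute support.
    static boolean supportsSetAttribute(Class<?> scanClass) {
        try {
            scanClass.getMethod("setAttribute", String.class, byte[].class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Hypothetical stand-ins for old and new Scan classes.
    public static class OldScan {}
    public static class NewScan {
        public void setAttribute(String name, byte[] value) {}
    }
}
```

The lookup happens once at startup, so the reflection cost is negligible compared to per-scan work.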
> >>>
> >>> One consideration is that start row or end row may be quite long.