HBase, mail # dev - Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase


Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Weishung Chung 2011-05-13, 07:45
What's the status of this package? Is it mature enough?
I am using it in my project. I tried out the write method yesterday and am going
to incorporate it into the read method tomorrow.

On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:

> > The start/end rows may be written twice.
>
> Yeah, I know. I meant that the size of the startRow+stopRow data is "bearable" in
> an attribute value no matter how long the keys are, since we are already OK with
> transferring them initially (i.e. we should be OK with transferring twice as
> much).
>
> So, what about the suggestion of the sourceScan attribute value I mentioned? If
> you can tell me why it isn't sufficient in your case, I'd have more info to
> think about a better suggestion ;)
>
> > It is Okay to keep all versions of your patch in the JIRA.
> > Maybe the second should be named HBASE-3811-v2.patch
> > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
>
> np. Can do that. Just thought that they (patches) can be sorted by date to
> find out the final one (aka "convention over naming-rules").
>
> Alex.
>
> On Wed, May 11, 2011 at 11:13 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > >> Though it might be ok, since we anyways "transfer" start/stop rows with
> > >> Scan object.
> > In write() method, we now have:
> >     Bytes.writeByteArray(out, this.startRow);
> >     Bytes.writeByteArray(out, this.stopRow);
> > ...
> >       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
> >         WritableUtils.writeString(out, attr.getKey());
> >         Bytes.writeByteArray(out, attr.getValue());
> >       }
> > The start/end rows may be written twice.
> >
> > Of course, you have full control over how to generate the unique ID for
> > "sourceScan" attribute.
> >
> > It is Okay to keep all versions of your patch in the JIRA. Maybe the second
> > should be named HBASE-3811-v2.patch
> > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> >
> > Thanks
> >
> >
> > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
> >
> >> > Can you remove the first version ?
> >> Isn't it ok to keep it in JIRA issue?
> >>
> >>
> >> > In HBaseWD, can you use reflection to detect whether Scan supports setAttribute() ?
> >> > If it does, can you encode start row and end row as "sourceScan" attribute ?
> >>
> >> Yeah, something like this is going to be implemented. Though I'd still want to
> >> hear from the devs the story about Scan version.
> >>
> >>
> >> > One consideration is that start row or end row may be quite long.
> >>
> >> Yeah, that was my thought too at first. Though it might be ok, since we
> >> anyways "transfer" start/stop rows with Scan object.
> >>
> >> > What do you think ?
> >>
> >> I'd love to hear from you whether this variant I mentioned is what we are
> >> looking for here:
> >>
> >>
> >> > From what I understand, you want to distinguish scans fired by the same
> >> > distributed scan. I.e. group scans which were fired by single distributed
> >> > scan. If that's what you want, distributed scan can generate unique ID and
> >> > set, say "sourceScan" attribute to its value. This way we'll have
> >> > <# of distinct "sourceScan" attribute values> = <number of distributed scans
> >> > invoked by client side> and two scans on server side will have the same
> >> > "sourceScan" attribute iff they "belong" to same distributed scan.
> >>
> >>
> >> Alex Baranau
> >> ----
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>
> >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >>
> >>> Alex:
> >>> Your second patch looks good.
> >>> Can you remove the first version ?
> >>>
> >>> In HBaseWD, can you use reflection to detect whether Scan supports
> >>> setAttribute() ?
> >>> If it does, can you encode start row and end row as "sourceScan"
> >>> attribute ?
> >>>
> >>> One consideration is that start row or end row may be quite long.
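
For reference, the two ideas discussed in the quoted messages (detecting setAttribute() support via reflection, and tagging all scans of one distributed scan with a shared "sourceScan" ID) could look roughly like the sketch below. It is only an illustration under assumptions: AttrScan and LegacyScan are hypothetical stand-ins for a Scan with and without attribute support, the setAttribute(String, byte[]) signature is inferred from the Map<String, byte[]> attributes shown in the write() snippet, and the UUID-based ID is just one possible way to generate a unique value. None of this is taken from HBaseWD or the HBASE-3811 patch.

```java
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class SourceScanSketch {
    static final String SOURCE_SCAN_ATTR = "sourceScan";

    // Hypothetical stand-in for a Scan version that supports attributes.
    public static class AttrScan {
        private final Map<String, byte[]> attributes = new HashMap<>();
        public void setAttribute(String name, byte[] value) { attributes.put(name, value); }
        public byte[] getAttribute(String name) { return attributes.get(name); }
    }

    // Hypothetical stand-in for an older Scan version without attributes.
    public static class LegacyScan { }

    // Reflection check: returns setAttribute(String, byte[]) if the class has it, else null.
    static Method findSetAttribute(Class<?> scanClass) {
        try {
            return scanClass.getMethod("setAttribute", String.class, byte[].class);
        } catch (NoSuchMethodException e) {
            return null; // this Scan version has no attribute support
        }
    }

    // One ID per distributed scan, generated on the client side (assumption: a UUID).
    static byte[] newSourceScanId() {
        return UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8);
    }

    // Tags the scan if it supports attributes; returns false and does nothing otherwise.
    static boolean tag(Object scan, byte[] sourceScanId) {
        Method setter = findSetAttribute(scan.getClass());
        if (setter == null) {
            return false;
        }
        try {
            setter.invoke(scan, SOURCE_SCAN_ATTR, sourceScanId);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("setAttribute could not be invoked", e);
        }
        return true;
    }

    // Server-side view: two scans belong to the same distributed scan iff their IDs match.
    static boolean sameDistributedScan(AttrScan a, AttrScan b) {
        byte[] idA = a.getAttribute(SOURCE_SCAN_ATTR);
        return idA != null && Arrays.equals(idA, b.getAttribute(SOURCE_SCAN_ATTR));
    }

    public static void main(String[] args) {
        byte[] id = newSourceScanId();
        AttrScan bucket0 = new AttrScan();
        AttrScan bucket1 = new AttrScan();
        tag(bucket0, id);
        tag(bucket1, id);
        AttrScan other = new AttrScan();
        tag(other, newSourceScanId());

        System.out.println(sameDistributedScan(bucket0, bucket1));
        System.out.println(sameDistributedScan(bucket0, other));
        System.out.println(tag(new LegacyScan(), id));
    }
}
```

Note that a fixed-size ID like this keeps the attribute small regardless of key length, whereas encoding the start and stop rows themselves into the attribute would serialize them a second time, which is the concern raised in the thread.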