Alex Baranau 2011-05-11, 20:01
Alex Baranau 2011-05-11, 20:41
Weishung Chung 2011-05-13, 07:45
Alex Baranau 2011-05-13, 20:12
Weishung Chung 2011-05-14, 15:17
Weishung Chung 2011-05-17, 22:19
Alex Baranau 2011-05-18, 15:03
Weishung Chung 2011-05-18, 21:15
Ted Yu 2011-05-18, 23:18
Weishung Chung 2011-05-19, 03:50
Alex Baranau 2011-05-19, 13:14
Weishung Chung 2011-05-19, 13:45
-
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Alex Baranau 2011-05-11, 20:01

> Can you remove the first version ?

Isn't it ok to keep it in the JIRA issue?

> In HBaseWD, can you use reflection to detect whether Scan supports setAttribute() ?
> If it does, can you encode start row and end row as "sourceScan" attribute ?

Yeah, something like this is going to be implemented. Though I'd still want to hear from the devs the story about the Scan version.

> One consideration is that start row or end row may be quite long.

Yeah, that was my thought too at first. Though it might be ok, since we "transfer" start/stop rows with the Scan object anyway.

> What do you think ?

I'd love to hear from you whether the variant I mentioned is what we are looking at here:

> From what I understand, you want to distinguish scans fired by the same distributed scan.
> I.e. group scans which were fired by single distributed scan. If that's what you want, distributed
> scan can generate unique ID and set, say "sourceScan" attribute to its value. This way we'll
> have <# of distinct "sourceScan" attribute values> = <number of distributed scans invoked by
> client side> and two scans on server side will have the same "sourceScan" attribute iff they
> "belong" to same distributed scan.

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Wed, May 11, 2011 at 5:15 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Alex:
> Your second patch looks good.
> Can you remove the first version ?
>
> In HBaseWD, can you use reflection to detect whether Scan supports setAttribute() ?
> If it does, can you encode start row and end row as "sourceScan" attribute ?
>
> One consideration is that start row or end row may be quite long.
> Ideally we should store hash code of source Scan object as "sourceScan" attribute.
> But Scan doesn't implement hashCode(). We can add it, that would require running
> all Scan related tests.
>
> What do you think ?
>
> Thanks
>
> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>
>> Sorry for the delay in response (public holidays here).
>>
>> This depends on what info you are looking for on server side.
>>
>> From what I understand, you want to distinguish scans fired by the same
>> distributed scan. I.e. group scans which were fired by single distributed
>> scan. If that's what you want, distributed scan can generate unique ID and
>> set, say "sourceScan" attribute to its value. This way we'll have <# of
>> distinct "sourceScan" attribute values> = <number of distributed scans
>> invoked by client side> and two scans on server side will have the same
>> "sourceScan" attribute iff they "belong" to same distributed scan.
>>
>> Is this what you are looking for?
>>
>> Alex Baranau
>>
>> P.S. attached patch for HBASE-3811 <https://issues.apache.org/jira/browse/HBASE-3811>.
>> P.S-2. should this conversation be moved to dev list?
>>
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>
>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>
>>> Alex:
>>> What type of identification should we put in the map of the Scan object ?
>>> I am thinking of using the Id of RowKeyDistributor. But the user can use
>>> same distributor on multiple scans.
>>>
>>> Please share your thought.
>>>
>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>>>
>>>> https://issues.apache.org/jira/browse/HBASE-3811
>>>>
>>>> Alex Baranau
>>>> ----
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>
>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>>>
>>>> > My plan was to make regions that have active scanners more stable -
>>>> > trying not to move them when balancing.
>>>> > I prefer second approach - adding custom attribute(s) to Scan so that
>>>> > the Scans created by the method below can be 'grouped'.
>>>> >
>>>> > If you can file a JIRA, that would be great.
>>>> >
>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
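
To make the idea discussed above concrete, here is a minimal Java sketch of tagging every scan of one distributed scan with the same unique "sourceScan" value, using reflection to check whether the scan class offers setAttribute(String, byte[]). This is not code from HBaseWD or from the HBASE-3811 patch: AttributedScan and PlainScan are hypothetical stand-ins for a Scan with and without attribute support, and using a random UUID as the ID is only one possible choice.

```java
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class SourceScanTagging {

    /** Hypothetical stand-in for a Scan that supports attributes (not the HBase class). */
    public static class AttributedScan {
        private final Map<String, byte[]> attributes = new HashMap<>();

        public void setAttribute(String name, byte[] value) {
            attributes.put(name, value);
        }

        public byte[] getAttribute(String name) {
            return attributes.get(name);
        }
    }

    /** Hypothetical stand-in for an older Scan class without setAttribute(). */
    public static class PlainScan {
    }

    /** Generates one ID per distributed scan; a random UUID is an assumption of this sketch. */
    static byte[] newDistributedScanId() {
        return UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Sets the same attribute value on every scan whose class exposes
     * setAttribute(String, byte[]); scans without that method are left untouched.
     * Returns the number of scans that were tagged.
     */
    static int tagScans(Object[] scans, String attrName, byte[] id) {
        int tagged = 0;
        for (Object scan : scans) {
            try {
                Method setter = scan.getClass().getMethod("setAttribute", String.class, byte[].class);
                setter.invoke(scan, attrName, id);
                tagged++;
            } catch (NoSuchMethodException e) {
                // Attribute support is not available on this class: skip silently.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("setAttribute exists but could not be invoked", e);
            }
        }
        return tagged;
    }

    public static void main(String[] args) {
        Object[] scans = {new AttributedScan(), new AttributedScan(), new PlainScan()};
        int tagged = tagScans(scans, "sourceScan", newDistributedScanId());
        System.out.println("tagged scans: " + tagged);
    }
}
```

With a fixed-size ID like this, the attribute stays small regardless of how long the start and stop rows are, and two scans carry the same "sourceScan" value only if they were tagged in the same call.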
-
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Alex Baranau 2011-05-11, 20:41

> The start/end rows may be written twice.

Yeah, I know. I meant that the size of the startRow+stopRow data is "bearable" in an attribute value no matter how long the keys are, since we are already OK with transferring them initially (i.e. we should be OK with transferring 2x more).

So, what about the suggestion for the sourceScan attribute value I mentioned? If you can tell me why it isn't sufficient in your case, I'd have more info to think about a better suggestion ;)

> It is Okay to keep all versions of your patch in the JIRA.
> Maybe the second should be named HBASE-3811-v2.patch<https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?

np. Can do that. Just thought that they (patches) can be sorted by date to find out the final one (aka "convention over naming-rules").

Alex.

On Wed, May 11, 2011 at 11:13 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> >> Though it might be ok, since we anyways "transfer" start/stop rows with Scan object.
> In write() method, we now have:
>     Bytes.writeByteArray(out, this.startRow);
>     Bytes.writeByteArray(out, this.stopRow);
>     ...
>     for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
>       WritableUtils.writeString(out, attr.getKey());
>       Bytes.writeByteArray(out, attr.getValue());
>     }
> The start/end rows may be written twice.
>
> Of course, you have full control over how to generate the unique ID for
> "sourceScan" attribute.
>
> It is Okay to keep all versions of your patch in the JIRA. Maybe the second
> should be named HBASE-3811-v2.patch<https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
>
> Thanks
-
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Weishung Chung 2011-05-13, 07:45

What's the status on this package? Is it mature enough?
I am using it in my project; I tried out the write method yesterday and am going to incorporate it into the read method tomorrow.
-
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Alex Baranau 2011-05-13, 20:12

Thanks for the interest!

We are using it in production. It is simple and hence quite stable. Though some minor pieces are missing (like https://github.com/sematext/HBaseWD/issues/1) this doesn't affect stability and/or major functionality.

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
-
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Weishung Chung 2011-05-14, 15:17

Yes, it's simple yet useful. I am integrating it. Thanks a lot :)
-
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Weishung Chung 2011-05-17, 22:19

I have another question. For overwriting, do I need to delete the existing one before re-writing it?
-
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Alex Baranau 2011-05-18, 15:03

There are several options here. E.g.:

1) Given that you have the "original key" of the record, you can fetch the stored record key from HBase and use it to create a Put with updated (or new) cells.

Currently you'll need to use a distributed scan for that; there's no analogue for the Get operation yet (see https://github.com/sematext/HBaseWD/issues/1).

Note: you need to first find out the real key of the stored record by fetching data from HBase if you use the RowKeyDistributorByOneBytePrefix included in the current lib. Alternatively, see the next option:

2) You can create your own RowKeyDistributor implementation which will create the "distributed key" based on the original key value, so that later, when you have the original key and want to update the record, you can calculate the distributed key without a roundtrip to HBase.

E.g. your RowKeyDistributor implementation can calculate a 1-byte hash of the original key (https://github.com/sematext/HBaseWD/issues/2).

Either way, you don't need to delete the record to update some of its cells or add new cells.

Please let me know if you have more Qs!

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
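
As an illustration of option 2, here is a minimal Java sketch of a distributor that derives a one-byte prefix from a hash of the original key, so the stored key can be recomputed from the original key alone. It is a standalone class and does not extend HBaseWD's own distributor type; the class name, the method names, the bucket count and the use of Arrays.hashCode are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class OneByteHashPrefixSketch {
    private final int buckets;

    /** buckets is the number of distinct prefix values; it must fit in one byte. */
    public OneByteHashPrefixSketch(int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in [1, 256]");
        }
        this.buckets = buckets;
    }

    /** Prepends a prefix byte computed deterministically from the original key. */
    public byte[] getDistributedKey(byte[] originalKey) {
        byte[] distributed = new byte[originalKey.length + 1];
        // floorMod keeps the bucket non-negative even for negative hash codes.
        distributed[0] = (byte) Math.floorMod(Arrays.hashCode(originalKey), buckets);
        System.arraycopy(originalKey, 0, distributed, 1, originalKey.length);
        return distributed;
    }

    /** Strips the prefix byte again. */
    public byte[] getOriginalKey(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }

    public static void main(String[] args) {
        OneByteHashPrefixSketch distributor = new OneByteHashPrefixSketch(32);
        byte[] original = "event-000123".getBytes(StandardCharsets.UTF_8);
        byte[] stored = distributor.getDistributedKey(original);
        System.out.println("prefix bucket: " + (stored[0] & 0xff));
    }
}
```

Because the prefix depends only on the original key, an update can build its Put against getDistributedKey(originalKey) directly, without first scanning to find where the record was stored. The trade-off is that keys are spread by hash value rather than in round-robin order.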
-
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Weishung Chung 2011-05-18, 21:15

Thank you, I like the second option better to avoid the roundtrip to HBase.
I am trying it out now.
Weishung Chung 2011-05-18, 21:15
-
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Ted Yu 2011-05-18, 23:18
Alex:
Can you summarize HBaseWD in your blog, including points 1 and 2 below?

Thanks

On Wed, May 18, 2011 at 8:03 AM, Alex Baranau <[EMAIL PROTECTED]> wrote:

> There are several options here. E.g.:
>
> 1) Given that you have the "original key" of the record, you can fetch the
> stored record key from HBase and use it to create a Put with updated (or new)
> cells.
>
> Currently you'll need to use a distributed scan for that; there is no
> analogue for the Get operation yet (see
> https://github.com/sematext/HBaseWD/issues/1).
>
> Note: if you use the RowKeyDistributorByOneBytePrefix included in the
> current lib, you first need to find out the real key of the stored record by
> fetching data from HBase. Alternatively, see the next option:
>
> 2) You can create your own RowKeyDistributor implementation which will
> create the "distributed key" based on the original key value, so that later,
> when you have the original key and want to update the record, you can
> calculate the distributed key without a roundtrip to HBase.
>
> E.g. in your RowKeyDistributor implementation you can calculate a 1-byte
> hash of the original key (https://github.com/sematext/HBaseWD/issues/2).
>
> Either way, you don't need to delete the record to update some of its cells
> or add new cells.
-
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Weishung Chung 2011-05-19, 03:50
I have another question about option 2. It seems like I need to handle the
distributed scan differently to read from start row to end row, assuming a
1-byte hash of the original key is used as the prefix, since the order of the
original key range is different from the order of the resulting distributed
key range.
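To make the ordering concern concrete, here is a minimal in-memory sketch. It is not HBaseWD code: the class name, the bucket count of 4, the single-character prefix and the use of a TreeMap as a stand-in for a sorted table are all assumptions made for illustration. It shows that hash-prefixed keys are stored out of their original order, and that a range read can be served by running one sub-scan per possible prefix and merging the results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Illustration only: a TreeMap plays the role of a table sorted by row key. */
final class DistributedScanSketch {
  static final int BUCKETS = 4; // hypothetical bucket count

  /** Hypothetical hash: sum of characters modulo BUCKETS, rendered as '0'..'3'. */
  static char prefixOf(String originalKey) {
    int sum = 0;
    for (char c : originalKey.toCharArray()) {
      sum += c;
    }
    return (char) ('0' + sum % BUCKETS);
  }

  static String distributedKey(String originalKey) {
    return prefixOf(originalKey) + originalKey;
  }

  /** Reads [startRow, stopRow) in original-key terms: one sub-scan per bucket, then merge. */
  static List<String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
    List<String> originalKeys = new ArrayList<>();
    for (int b = 0; b < BUCKETS; b++) {
      char prefix = (char) ('0' + b);
      // Each bucket holds its own sorted slice of the original key range.
      for (String key : table.subMap(prefix + startRow, prefix + stopRow).keySet()) {
        originalKeys.add(key.substring(1)); // strip the one-character prefix
      }
    }
    Collections.sort(originalKeys); // restore the original key order across buckets
    return originalKeys;
  }

  public static void main(String[] args) {
    TreeMap<String, String> table = new TreeMap<>();
    for (int i = 1; i <= 6; i++) {
      table.put(distributedKey("row0" + i), "value" + i);
    }
    System.out.println(table.keySet()); // stored order differs from row01..row06
    System.out.println(scan(table, "row02", "row05")); // [row02, row03, row04]
  }
}
```

The cost of this approach is one sub-scan per bucket for every range read, so the bucket count is a trade-off between write distribution and read fan-out.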
-
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Alex Baranau 2011-05-19, 13:14
Implemented RowKeyDistributorByHashPrefix. From README:

Another useful RowKeyDistributor is RowKeyDistributorByHashPrefix. Please see
the example below. It creates the "distributed key" based on the original key
value, so that later, when you have the original key and want to update the
record, you can calculate the distributed key without a roundtrip to HBase.

    AbstractRowKeyDistributor keyDistributor =
        new RowKeyDistributorByHashPrefix(
            new RowKeyDistributorByHashPrefix.OneByteSimpleHash(15));

You can use your own hashing logic here by implementing a simple interface:

    public static interface Hasher extends Parametrizable {
      byte[] getHashPrefix(byte[] originalKey);
      byte[][] getAllPossiblePrefixes();
    }

OneByteSimpleHash implements a very simple hash algorithm: the sum of all
bytes in the row key % maxBuckets. In the example above, 15 is the maxBuckets
count. You can use a bucket count of up to 255. Please use it wisely, as (the
same as with the one-byte prefix) the distributed scanner will instantiate
this number of scans under the hood.

With this row key hash-based distributor, you can find out the distributed
key (and use it to update the record) without a roundtrip to HBase. From the
unit test:

    // Testing simple get
    byte[] originalKey = new byte[] {123, 124, 122};
    Put put = new Put(keyDistributor.getDistributedKey(originalKey));
    put.add(CF, QUAL, Bytes.toBytes("some"));
    hTable.put(put);

    byte[] distributedKey = keyDistributor.getDistributedKey(originalKey);
    Result result = hTable.get(new Get(distributedKey));
    Assert.assertArrayEquals(originalKey,
        keyDistributor.getOriginalKey(result.getRow()));
    Assert.assertArrayEquals(Bytes.toBytes("some"), result.getValue(CF, QUAL));

NOTE: This feature is included in hbasewd-0.1.0-SNAPSHOT-2011.05.19.jar
(downloadable from https://github.com/sematext/HBaseWD)

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

P.S.
> Can you summarize HBaseWD in your blog
That is on my todo list! You pushed it higher to the top priority items ;)
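For readers who want to see the hashing rule in isolation, the following is a rough standalone sketch of the behaviour the README describes (sum of all bytes in the row key % maxBuckets, stored as a one-byte prefix). It has no HBase or HBaseWD dependency, and it is not the library's source: the class name HashPrefixSketch, the treatment of bytes as unsigned values and the range check on maxBuckets are assumptions; only the method names mirror the Hasher interface and the unit test quoted above.

```java
import java.util.Arrays;

/** Illustrative stand-in for a one-byte hash-prefix key distributor (not HBaseWD code). */
final class HashPrefixSketch {
  private final int maxBuckets;

  HashPrefixSketch(int maxBuckets) {
    if (maxBuckets < 1 || maxBuckets > 255) {
      throw new IllegalArgumentException("maxBuckets must be in [1, 255]");
    }
    this.maxBuckets = maxBuckets;
  }

  /** Sum of all key bytes modulo maxBuckets, returned as a one-byte prefix. */
  byte[] getHashPrefix(byte[] originalKey) {
    long sum = 0;
    for (byte b : originalKey) {
      sum += b & 0xff; // assumption: bytes are summed as unsigned values
    }
    return new byte[] {(byte) (sum % maxBuckets)};
  }

  /** Every prefix the hash can produce: 0 .. maxBuckets - 1. */
  byte[][] getAllPossiblePrefixes() {
    byte[][] prefixes = new byte[maxBuckets][];
    for (int i = 0; i < maxBuckets; i++) {
      prefixes[i] = new byte[] {(byte) i};
    }
    return prefixes;
  }

  /** Prepends the hash prefix, so the stored key is computable from the original key alone. */
  byte[] getDistributedKey(byte[] originalKey) {
    byte[] key = new byte[1 + originalKey.length];
    key[0] = getHashPrefix(originalKey)[0];
    System.arraycopy(originalKey, 0, key, 1, originalKey.length);
    return key;
  }

  /** Strips the one-byte prefix again. */
  byte[] getOriginalKey(byte[] distributedKey) {
    return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
  }

  public static void main(String[] args) {
    HashPrefixSketch distributor = new HashPrefixSketch(15);
    byte[] originalKey = {123, 124, 122};
    // 123 + 124 + 122 = 369 and 369 % 15 = 9, so the prefix byte is 9.
    System.out.println(Arrays.toString(distributor.getDistributedKey(originalKey)));
  }
}
```

Because the prefix depends only on the original key, an update can rebuild the stored key locally, which is what removes the extra read that a non-deterministic prefix would require.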
-
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
Weishung Chung 2011-05-19, 13:45
Awesome, I'm going to check it out and use it today. Thank you :)
|