Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase
> Can you remove the first version?

Isn't it OK to keep it in the JIRA issue?

> In HBaseWD, can you use reflection to detect whether Scan supports
> setAttribute()?
> If it does, can you encode start row and end row as "sourceScan" attribute?

Yeah, something like this is going to be implemented, though I'd still like to
hear from the devs the story about the Scan version.
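
The reflection check asked about above could look roughly like the sketch
below. It assumes the method has the signature setAttribute(String, byte[]);
OldScan and NewScan are hypothetical stand-ins for two versions of the Scan
class so that the example runs without HBase on the classpath. This is an
illustration only, not the code that went into HBaseWD.

```java
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a Scan version without attribute support.
class OldScan { }

// Hypothetical stand-in for a Scan version that has setAttribute().
class NewScan {
    String lastName;
    byte[] lastValue;

    public void setAttribute(String name, byte[] value) {
        lastName = name;
        lastValue = value;
    }
}

final class ScanAttributeProbe {
    /** Returns the public setAttribute(String, byte[]) method, or null if the class lacks it. */
    static Method findSetAttribute(Class<?> scanClass) {
        try {
            return scanClass.getMethod("setAttribute", String.class, byte[].class);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    /** Sets the attribute when the scan supports it; returns whether it was set. */
    static boolean trySetAttribute(Object scan, String name, byte[] value) {
        Method setter = findSetAttribute(scan.getClass());
        if (setter == null) {
            return false; // older Scan version: silently skip tagging
        }
        try {
            setter.invoke(scan, name, value);
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] id = "scan-1".getBytes(StandardCharsets.UTF_8);
        System.out.println(trySetAttribute(new OldScan(), "sourceScan", id)); // false
        System.out.println(trySetAttribute(new NewScan(), "sourceScan", id)); // true
    }
}
```

In real code the Method lookup would probably be done once and cached, since
the Scan class does not change at runtime.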

> One consideration is that start row or end row may be quite long.

Yeah, that was my thought too at first. It might be OK, though, since we
"transfer" the start/stop rows with the Scan object anyway.
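
For comparison, a fixed-size alternative to shipping the raw start/stop rows
in the attribute is to store a hash of them. The sketch below is only an
assumption about how that might be done (ScanRangeHash and rangeHash are
hypothetical names): it combines Arrays.hashCode of both rows into four bytes.
Note that two different distributed scans over the same range would get the
same value, and unrelated ranges can collide.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ScanRangeHash {
    /** Encodes a 32-bit hash of the (startRow, stopRow) pair as 4 big-endian bytes. */
    static byte[] rangeHash(byte[] startRow, byte[] stopRow) {
        int hash = 31 * Arrays.hashCode(startRow) + Arrays.hashCode(stopRow);
        return ByteBuffer.allocate(4).putInt(hash).array();
    }

    public static void main(String[] args) {
        byte[] longStart = new byte[1024]; // a "quite long" row key
        byte[] longStop = new byte[1024];
        longStop[0] = 1;
        // The attribute value stays 4 bytes no matter how long the rows are.
        System.out.println(rangeHash(longStart, longStop).length); // 4
    }
}
```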

> What do you think ?

I'd love to hear from you whether the variant I mentioned is what we are
looking for here:

> From what I understand, you want to distinguish scans fired by the same
> distributed scan. I.e. group scans which were fired by single distributed
> scan. If that's what you want, distributed scan can generate unique ID and
> set, say "sourceScan" attribute to its value. This way we'll have <# of
> distinct "sourceScan" attribute values> = <number of distributed scans
> invoked by client side> and two scans on server side will have the same
> "sourceScan" attribute iff they "belong" to same distributed scan.

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
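
The grouping scheme described in the message above (one unique ID per
distributed scan, shared by every scan it fires) can be sketched as follows.
Everything here is an assumption made for illustration: FakeScan is a
hypothetical stand-in for Scan with an attribute map, the ID is a random UUID,
and each bucket's scan is formed by putting a one-byte prefix in front of the
start and stop rows.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

final class DistributedScanTagging {
    static final String SOURCE_SCAN = "sourceScan";

    /** Hypothetical stand-in for Scan: a row range plus named attributes. */
    static final class FakeScan {
        final byte[] startRow;
        final byte[] stopRow;
        final Map<String, byte[]> attributes = new HashMap<>();

        FakeScan(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    /** Prepends the bucket byte to a row key. */
    static byte[] withPrefix(byte bucket, byte[] row) {
        byte[] out = new byte[row.length + 1];
        out[0] = bucket;
        System.arraycopy(row, 0, out, 1, row.length);
        return out;
    }

    /** Client side: one scan per bucket, all tagged with the same fresh ID. */
    static List<FakeScan> fanOut(byte[] startRow, byte[] stopRow, int buckets) {
        byte[] id = UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8);
        List<FakeScan> scans = new ArrayList<>();
        for (int b = 0; b < buckets; b++) {
            FakeScan scan = new FakeScan(withPrefix((byte) b, startRow), withPrefix((byte) b, stopRow));
            scan.attributes.put(SOURCE_SCAN, id);
            scans.add(scan);
        }
        return scans;
    }

    /** Server side: the number of distinct IDs equals the number of distributed scans. */
    static int countDistributedScans(List<FakeScan> seenOnServer) {
        Set<String> ids = new HashSet<>();
        for (FakeScan scan : seenOnServer) {
            byte[] id = scan.attributes.get(SOURCE_SCAN);
            if (id != null) {
                ids.add(new String(id, StandardCharsets.UTF_8));
            }
        }
        return ids.size();
    }

    public static void main(String[] args) {
        byte[] start = {10}, stop = {20};
        List<FakeScan> seen = new ArrayList<>(fanOut(start, stop, 4));
        seen.addAll(fanOut(start, stop, 4)); // same range, second distributed scan
        System.out.println(seen.size() + " scans, " + countDistributedScans(seen) + " distributed scans");
    }
}
```

Because the ID is generated per invocation rather than derived from the row
range or from the distributor, two distributed scans over the same range (or
using the same RowKeyDistributor) still end up in different groups.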

On Wed, May 11, 2011 at 5:15 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Alex:
> Your second patch looks good.
> Can you remove the first version?
>
> In HBaseWD, can you use reflection to detect whether Scan supports
> setAttribute()?
> If it does, can you encode start row and end row as "sourceScan" attribute?
>
> One consideration is that start row or end row may be quite long.
> Ideally we should store the hash code of the source Scan object as the
> "sourceScan" attribute. But Scan doesn't implement hashCode(). We could add
> it, but that would require running all Scan-related tests.
>
> What do you think?
>
> Thanks
>
>
> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>
>> Sorry for the delay in response (public holidays here).
>>
>> This depends on what info you are looking for on server side.
>>
>> From what I understand, you want to distinguish scans fired by the same
>> distributed scan. I.e. group scans which were fired by single distributed
>> scan. If that's what you want, distributed scan can generate unique ID and
>> set, say "sourceScan" attribute to its value. This way we'll have <# of
>> distinct "sourceScan" attribute values> = <number of distributed scans
>> invoked by client side> and two scans on server side will have the same
>> "sourceScan" attribute iff they "belong" to same distributed scan.
>>
>> Is this what you are looking for?
>>
>> Alex Baranau
>>
>> P.S. Attached a patch for HBASE-3811 <https://issues.apache.org/jira/browse/HBASE-3811>.
>> P.P.S. Should this conversation be moved to the dev list?
>>
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
>> HBase
>>
>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>
>>> Alex:
>>> What type of identification should we put in the map of the Scan object?
>>> I am thinking of using the ID of the RowKeyDistributor. But the user can
>>> use the same distributor on multiple scans.
>>>
>>> Please share your thoughts.
>>>
>>>
>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>>>
>>>> https://issues.apache.org/jira/browse/HBASE-3811
>>>>
>>>> Alex Baranau
>>>> ----
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
>>>> HBase
>>>>
>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>>>
>>>> > My plan was to make regions that have active scanners more stable -
>>>> > trying not to move them when balancing.
>>>> > I prefer second approach - adding custom attribute(s) to Scan so that
>>>> > the Scans created by the method below can be 'grouped'.
>>>> >
>>>> > If you can file a JIRA, that would be great.
>>>> >
>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <