HBase, mail # user - Custom Filter and SEEK_NEXT_USING_HINT issue


Re: Custom Filter and SEEK_NEXT_USING_HINT issue
Ted 2013-01-19, 13:16
In your original email you said the first key looked like the start key of a region. Can you verify that?

Thanks

On Jan 19, 2013, at 1:36 AM, Eugeny Morozov <[EMAIL PROTECTED]> wrote:

> Ted,
>
> that is correct.
> We are on HBase 0.92.x and use part of the patch from HBASE-6509.
>
> I use the filter as a custom filter: it lives in a separate jar file that is
> added to HBase's classpath. I did not patch HBase.
> Moreover, I do not use the protobuf descriptions that come with the filter in
> the patch. I have only two classes: FuzzyRowFilter itself and its test class.
>
> It works perfectly on a small dataset of about 100 rows (1 region). But when
> my dataset is more than 10 million rows (260 regions), it is somehow losing
> rows. I'm not sure, but it seems to me it is not the fault of the filter.
>
>
> On Sat, Jan 19, 2013 at 3:56 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>> To my knowledge CDH-4.1.2 is based on HBase 0.92.x
>>
>> Looks like you were using the patch from HBASE-6509, which was integrated
>> into trunk only.
>> Please confirm.
>>
>> Copying Alex who wrote the patch.
>>
>> Cheers
>>
>> On Fri, Jan 18, 2013 at 3:28 PM, Eugeny Morozov
>> <[EMAIL PROTECTED]> wrote:
>>
>>> Hi, folks!
>>>
>>> The HBase, Hadoop, etc. version is CDH-4.1.2.
>>>
>>> I'm using a custom FuzzyRowFilter, which I got from
>>> http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
>>> and suddenly, after quite a while, we found that it started losing data.
>>>
>>> Basically, the idea of FuzzyRowFilter is that it tries to find the key that
>>> has been provided, and if the current key does not match - but more keys
>>> exist in the table - it returns SEEK_NEXT_USING_HINT. In getNextKeyHint(...)
>>> it builds the required key. As I understand it, HBase should then
>>> fast-forward to that key, which should be similar to, or the same as, a Scan
>>> with setStartRow.
>>>
>>> I'm trying to find the key F7dt8QWPSIDw. It is definitely in HBase - I'm able
>>> to get it using Scan.setStartRow.
>>> For FuzzyRowFilter I'm using an empty Scan - I didn't specify a start row,
>>> stop row or anything related.
>>> This is what happens:
>>>
>>> Fzzy: AAAA1Q7iQ9JA
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: AQAAnA96rxTg
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: AgAADQWPSIDw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: AwAA-Q33Zb9Q
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: BAAAOg8oyu7A
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: BQAA9gqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: BgABZQ7iQ9JA
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: BwAAbgrpAojg
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: CAAAUQWPSIDw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: CQABVgqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: CgAAOQ7iQ9JA
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: CwAALwqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: DAAAMwWPSIDw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: DQAADgjqzsIQ
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: DgAAOgCcWv9g
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: DwAAKg7iQ9JA
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: EAAAugqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: EQAAJAqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: EgAABgIOMBgg
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: EwAAEwqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: FAAACQqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: FQAAIAqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: FgAAeAWPSIDw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: FwAAAw33Zb9Q
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: F7dt8QWPSIDw
>>>
>>> It's obvious that my FuzzyRowFilter knows what to search for, and every time
>>> it asks for the same key.
>>> The very first key, I suppose, is just the first key of the region where my
>>> key is located.
>>> The very last key is the key that is already bigger than what I'm trying
>>> to find - that's the reason why FuzzyRowFilter stopped there.
>>>
>>> Do you know of any issue with SEEK_NEXT_USING_HINT? I've searched, but
>>> without success.
>>> Do you have any idea how to explain these many attempts?
>>>
>>> Thanks in advance.
>>> --
>>> Evgeny Morozov
>>> Developer Grid Dynamics
>>> Skype: morozov.evgeny
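
For readers following the thread, the seek-hint behaviour described in the original message can be sketched as a small, HBase-free model. This is only an illustration under stated assumptions: the class and method names (SeekHintSketch, ExactKeyFilter, filterRow, nextKeyHint, scan) are hypothetical and are not the HBase Filter API, the toy filter matches one exact key rather than a fuzzy pattern, and keys are compared as Java strings instead of raw bytes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Simplified, HBase-free model of a scan that honours seek hints.
// All names are hypothetical; this is NOT the HBase Filter API.
class SeekHintSketch {

    enum ReturnCode { INCLUDE, SEEK_NEXT_USING_HINT, DONE }

    // Toy stand-in for a row filter that looks for one exact key.
    // Assumption: String ordering stands in for HBase's byte ordering.
    static final class ExactKeyFilter {
        private final String wanted;
        int evaluations = 0; // how many row keys the filter was shown

        ExactKeyFilter(String wanted) { this.wanted = wanted; }

        ReturnCode filterRow(String rowKey) {
            evaluations++;
            int cmp = rowKey.compareTo(wanted);
            if (cmp == 0) return ReturnCode.INCLUDE;
            if (cmp < 0) return ReturnCode.SEEK_NEXT_USING_HINT;
            return ReturnCode.DONE; // already past the wanted key
        }

        // The key the scanner is asked to jump to.
        String nextKeyHint() { return wanted; }
    }

    // Walks the sorted keys; on a hint, jumps to the first key >= hint
    // instead of visiting every key in between.
    static List<String> scan(TreeSet<String> table, ExactKeyFilter filter) {
        List<String> result = new ArrayList<>();
        String current = table.isEmpty() ? null : table.first();
        while (current != null) {
            switch (filter.filterRow(current)) {
                case INCLUDE:
                    result.add(current);
                    current = table.higher(current);
                    break;
                case SEEK_NEXT_USING_HINT:
                    current = table.ceiling(filter.nextKeyHint());
                    break;
                default:
                    return result;
            }
        }
        return result;
    }
}
```

In this model, scanning a sorted set that starts at AAAA1Q7iQ9JA and contains F7dt8QWPSIDw shows the filter only two keys: the first key of the set and the key the hint points at. That matches the expectation stated in the original message that a hint should behave much like starting a Scan at the hinted row. It does not attempt to reproduce the multi-region behaviour seen in the log.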