HBase >> mail # user >> Custom Filter and SEEK_NEXT_USING_HINT issue


Eugeny Morozov 2013-01-18, 23:28
Ted Yu 2013-01-18, 23:56
Eugeny Morozov 2013-01-19, 09:36
Ted 2013-01-19, 13:16
Eugeny Morozov 2013-01-20, 21:22
Re: Custom Filter and SEEK_NEXT_USING_HINT issue
If it's the same class and it's not a patch, then the first class loaded wins.

So if you have a class Foo and HBase has a class Foo, your code will never see the light of day.

Perhaps I'm stating the obvious, but it's something to think about when working with Hadoop.
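
One way to check which copy won is to ask the JVM where it actually loaded the class from. The following is a minimal sketch, not something from HBase or Hadoop: `ClassOrigin` and `locationOf` are hypothetical names, and the check only tells you something if it runs with the same classpath as the process you care about (for a filter, that is the region server).

```java
import java.security.CodeSource;

class ClassOrigin {
    /** Returns the jar or directory a class was loaded from, as reported by the JVM. */
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap class path (e.g. java.lang.String) report no code source.
        if (source == null || source.getLocation() == null) {
            return "bootstrap class path (no code source)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; defaults to this class for a quick smoke test.
        String name = args.length > 0 ? args[0] : ClassOrigin.class.getName();
        System.out.println(name + " loaded from " + locationOf(Class.forName(name)));
    }
}
```

If the printed location is not the jar you deployed, another copy of the class sits earlier on the classpath.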

On Jan 19, 2013, at 3:36 AM, Eugeny Morozov <[EMAIL PROTECTED]> wrote:

> Ted,
>
> that is correct.
> HBase 0.92.x and we use part of the patch 6509.
>
> I use the filter as a custom filter; it lives in a separate jar file and goes
> on HBase's classpath. I did not patch HBase.
> Moreover, I do not use the protobuf descriptions that come with the filter in
> the patch. I have only two classes: FuzzyRowFilter itself and its test class.
>
> And it works perfectly on a small dataset of about 100 rows (1 region). But when
> my dataset is more than 10 million rows (260 regions), it is somehow losing rows.
> I'm not sure, but it seems to me that it is not the fault of the filter.
>
>
> On Sat, Jan 19, 2013 at 3:56 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>> To my knowledge CDH-4.1.2 is based on HBase 0.92.x
>>
>> Looks like you were using the patch from HBASE-6509, which was integrated into
>> trunk only.
>> Please confirm.
>>
>> Copying Alex who wrote the patch.
>>
>> Cheers
>>
>> On Fri, Jan 18, 2013 at 3:28 PM, Eugeny Morozov
>> <[EMAIL PROTECTED]> wrote:
>>
>>> Hi, folks!
>>>
>>> HBase, Hadoop, etc version is CDH-4.1.2
>>>
>>> I'm using a custom FuzzyRowFilter, which I got from
>>> http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
>>> and suddenly, after quite some time, we found that it has started losing data.
>>>
>>> Basically, the idea of FuzzyRowFilter is that it tries to find the key that
>>> has been provided, and if the current key does not match but more keys exist
>>> in the table, it returns SEEK_NEXT_USING_HINT. In getNextKeyHint(...) it
>>> builds the required key. As I understand it, HBase will then fast-forward to
>>> that key, which should be similar or identical to running a Scan with
>>> setStartRow.
>>>
>>> I'm trying to find the key F7dt8QWPSIDw. It is definitely in HBase: I'm able
>>> to get it using Scan.setStartRow.
>>> For the FuzzyRowFilter I'm using an empty Scan: I didn't specify a start row,
>>> stop row, or anything related.
>>> This is what happens:
>>>
>>> Fzzy: AAAA1Q7iQ9JA
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: AQAAnA96rxTg
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: AgAADQWPSIDw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: AwAA-Q33Zb9Q
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: BAAAOg8oyu7A
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: BQAA9gqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: BgABZQ7iQ9JA
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: BwAAbgrpAojg
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: CAAAUQWPSIDw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: CQABVgqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: CgAAOQ7iQ9JA
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: CwAALwqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: DAAAMwWPSIDw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: DQAADgjqzsIQ
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: DgAAOgCcWv9g
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: DwAAKg7iQ9JA
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: EAAAugqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: EQAAJAqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: EgAABgIOMBgg
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: EwAAEwqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: FAAACQqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: FQAAIAqVQrTw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: FgAAeAWPSIDw
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: FwAAAw33Zb9Q
>>> Next fzzy: F7dtxwqVQ_Pw
>>> Fzzy: F7dt8QWPSIDw
>>>
>>> It's obvious that my FuzzyRowFilter knows what to search for, and every time
>>> it repeats its request.
>>> The very first key, I suppose, is just the first key of a region where my
>>> key is located.
>>> The very last key is the key that is already bigger than what I'm trying
>>> to find, which is why the FuzzyRowFilter stopped there.
>>>
>>> Do you know of any issue with SEEK_NEXT_USING_HINT? I've searched, but
>>> without success.
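
The seek described in the original message can be sketched without a cluster. The code below is only an illustration under two stated assumptions: the keys printed in the log are URL-safe Base64 renderings of binary row keys (suggested by the `-` and `_` characters, but not confirmed anywhere in the thread), and a hint-based seek lands on the first stored key that is greater than or equal to the hint in unsigned byte order. `SeekHintSketch` and its methods are hypothetical and do not use the HBase API.

```java
import java.util.Base64;
import java.util.Comparator;
import java.util.TreeSet;

class SeekHintSketch {
    // Unsigned lexicographic byte order, the order in which row keys are sorted.
    static final Comparator<byte[]> ROW_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    };

    // ASSUMPTION: the log prints binary keys as URL-safe Base64 without padding.
    static byte[] decode(String printed) {
        return Base64.getUrlDecoder().decode(printed);
    }

    static String encode(byte[] raw) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    /** Returns the printed form of the first stored key >= hint, or null if there is none. */
    static String landingKey(String[] printedRows, String printedHint) {
        TreeSet<byte[]> rows = new TreeSet<>(ROW_ORDER);
        for (String row : printedRows) {
            rows.add(decode(row));
        }
        byte[] landed = rows.ceiling(decode(printedHint));
        return landed == null ? null : encode(landed);
    }

    public static void main(String[] args) {
        // A few keys and the repeated hint, copied from the log above.
        String[] rows = {"AAAA1Q7iQ9JA", "FwAAAw33Zb9Q", "F7dt8QWPSIDw"};
        System.out.println(landingKey(rows, "F7dtxwqVQ_Pw"));
    }
}
```

Under those assumptions the decoded hint sorts after FwAAAw33Zb9Q and before F7dt8QWPSIDw, which matches the order of the keys in the log; comparing the printed strings directly would give a different order.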
Eugeny Morozov 2013-01-21, 08:16
ramkrishna vasudevan 2013-01-21, 08:56
Anoop Sam John 2013-01-21, 08:59
Eugeny Morozov 2013-01-21, 11:44