NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase user mailing list: Custom Filter and SEEK_NEXT_USING_HINT issue

Eugeny Morozov 2013-01-18, 23:28
Ted Yu 2013-01-18, 23:56
Eugeny Morozov 2013-01-19, 09:36
Ted 2013-01-19, 13:16
Eugeny Morozov 2013-01-20, 21:22
Michael Segel 2013-01-21, 00:22
Eugeny Morozov 2013-01-21, 08:16
ramkrishna vasudevan 2013-01-21, 08:56
Anoop Sam John 2013-01-21, 08:59

Re: Custom Filter and SEEK_NEXT_USING_HINT issue
Anoop, Ramkrishna

Thank you for the explanation! I've got it.
On Mon, Jan 21, 2013 at 12:59 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:

> > I suppose if the scanning process had started on all regions at once,
> > then I would find in the log files at least one value per region, but I
> > found one value per region only for the regions that reside before the
> > particular one.
>
> @Eugeny: FuzzyFilter, like any other filter, works on the server side. A
> scan from the client side is sequential, starting from the 1st region (the
> region with an empty start key, or the region that contains the start key
> you specified in your scan). From the client, a request goes to the region
> server (RS) to scan a region. Once that region is done, the client contacts
> the next region for its scan, and so on. There is no parallel scanning of
> multiple regions from the client side. [This is when using the HTable scan
> APIs.]
>
> When MR is used for scanning, we do parallel scans over all the regions,
> with one mapper per region. But a normal scan from the client side is
> sequential over the regions, not parallel.
>
> -Anoop-
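To make the contrast concrete, here is a toy, in-memory model of the two access patterns described above. It is not HBase code: `Region`, `ScanOrderDemo`, `scanSequential` and `scanPerRegion` are hypothetical names, and it assumes the client only wants the first matching row. The sequential version visits regions one after another in start-key order and stops at the first match; the per-region version submits one task per region up front, roughly the way MapReduce runs one mapper per region. It needs Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Toy model, not HBase code: a "region" is just a start key plus its sorted rows.
record Region(String startKey, List<String> rows) {}

class ScanOrderDemo {

    // Client-style scan: regions are contacted one after another in start-key
    // order, and the filter is applied to the rows inside each region (the
    // "server side"). Returns the first matching row, or null if none match.
    // Each visited region's start key is appended to visitLog.
    static String scanSequential(List<Region> regions, Predicate<String> filter,
                                 List<String> visitLog) {
        for (Region region : regions) {
            visitLog.add(region.startKey());
            for (String row : region.rows()) {
                if (filter.test(row)) {
                    return row; // regions after this one are never contacted
                }
            }
        }
        return null;
    }

    // MapReduce-style scan: one task per region, all submitted up front and
    // running concurrently. Results are collected in region order.
    static List<String> scanPerRegion(List<Region> regions, Predicate<String> filter) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Region region : regions) {
                futures.add(pool.submit(() -> {
                    List<String> hits = new ArrayList<>();
                    for (String row : region.rows()) {
                        if (filter.test(row)) {
                            hits.add(row);
                        }
                    }
                    return hits;
                }));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                try {
                    all.addAll(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while scanning", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("region scan failed", e);
                }
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }
}
```

With three regions starting at "", "b" and "c", and a filter that matches only row "b2", `scanSequential` records visits to the first two regions and never touches "c". That mirrors the pattern in the P.S. of the message below: activity in the regions up to the matching one, and none after it.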
> ________________________________________
> From: Eugeny Morozov [[EMAIL PROTECTED]]
> Sent: Monday, January 21, 2013 1:46 PM
> To: [EMAIL PROTECTED]
> Cc: Alex Baranau
> Subject: Re: Custom Filter and SEEK_NEXT_USING_HINT issue
>
> Finally, the mystery has been solved.
>
> A small remark before I explain everything.
>
> The situation with only one region is exactly the same:
> Fzzy: AAAA1Q7iQ9JA
> Next fzzy: F7dtxwqVQ_Pw  <-- the value I'm trying to find.
> Fzzy: F7dt8QWPSIDw
> Somehow FuzzyRowFilter has just skipped my value here.
>
>
> So, the explanation.
> In the javadoc for FuzzyRowFilter, a question mark is used as a placeholder
> for an unknown value. Of course, it's possible to use anything, including
> zero, instead of a question mark.
> For quite some time we used literals to encode our keys, literals like the
> ones you've already seen: AAAA1Q7iQ9JA or F7dt8QWPSIDw. But that is the
> Base64 form of just 8 bytes, which takes 1.5 times more space. So we decided
> to store the raw version: just byte[8]. Unfortunately, the '?' symbol (0x3F)
> sits right in the middle of the 7-bit ASCII range (see the ASCII table at
> http://www.asciitable.com/), which means that with FuzzyRowFilter we skip
> half of the values in some cases. At the same time, the question mark sorts
> before every letter that could be used in a key.
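The 1.5x figure is plain Base64 arithmetic: every 3 input bytes become 4 output characters, so an 8-byte key needs three 4-character groups, 12 characters with padding, which is 12 / 8 = 1.5 times the raw size. A quick check with the JDK's `java.util.Base64` (the `KeySizeDemo` class is only an illustrative wrapper):

```java
import java.util.Base64;

class KeySizeDemo {

    // Length of the padded, standard Base64 form of a raw key.
    static int encodedLength(byte[] rawKey) {
        return Base64.getEncoder().encodeToString(rawKey).length();
    }

    // Encoded size divided by raw size.
    static double overhead(byte[] rawKey) {
        return (double) encodedLength(rawKey) / rawKey.length;
    }

    public static void main(String[] args) {
        byte[] key = new byte[8]; // same length as the raw byte[8] keys
        System.out.println(encodedLength(key) + " chars, " + overhead(key) + "x");
    }
}
```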
>
> Although we have integration tests, it's just a coincidence that they
> didn't contain such an example.
>
> So, as advice: always use zero instead of a question mark at the fuzzy
> positions of a FuzzyRowFilter key.
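The skipping can be illustrated with a deliberately simplified model of a seek hint. This is not HBase's FuzzyRowFilter implementation: the `FuzzyHintDemo` class, its `scan` helper, and the two-byte key layout in the usage note are hypothetical. It assumes the mechanism the message implies, namely that the placeholder byte at a fuzzy (non-fixed) position of the template ends up in the seek hint. The mask follows the FuzzyRowFilter javadoc convention (0 = fixed byte, 1 = fuzzy byte), and rows are ordered as unsigned bytes, the way HBase orders row keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a fuzzy row scan with seek hints.
// NOT HBase's FuzzyRowFilter implementation.
class FuzzyHintDemo {

    // mask[i] == 0: byte i is fixed and must equal template[i];
    // mask[i] == 1: byte i is fuzzy, so any value matches.
    // Assumes every row has the same length as the template.
    static boolean matches(byte[] row, byte[] template, byte[] mask) {
        for (int i = 0; i < template.length; i++) {
            if (mask[i] == 0 && row[i] != template[i]) {
                return false;
            }
        }
        return true;
    }

    // Scans rows sorted in unsigned lexicographic order and returns the indices
    // of matching rows. On a non-matching row it seeks to the first row >= hint,
    // where the hint is the template copied verbatim: whatever placeholder byte
    // sits at a fuzzy position becomes part of the seek target.
    // Only faithful when the fixed bytes form a key prefix.
    static List<Integer> scan(List<byte[]> sortedRows, byte[] template, byte[] mask) {
        List<Integer> found = new ArrayList<>();
        int i = 0;
        while (i < sortedRows.size()) {
            byte[] row = sortedRows.get(i);
            if (matches(row, template, mask)) {
                found.add(i++);
                continue;
            }
            byte[] hint = template.clone();
            if (Arrays.compareUnsigned(hint, row) <= 0) {
                break; // the fixed prefix is already behind us
            }
            while (i < sortedRows.size()
                    && Arrays.compareUnsigned(sortedRows.get(i), hint) < 0) {
                i++; // rows below the hint are skipped without being tested
            }
        }
        return found;
    }
}
```

With rows {0x05, 0x00}, {0x10, 0x20}, {0x10, 0x50}, {0x11, 0x00} and mask {0, 1}, the template {0x10, '?'} finds only row 2: its hint {0x10, 0x3F} jumps over row 1 (0x20 < 0x3F) without testing it, even though row 1 matches. The template {0x10, 0x00} finds rows 1 and 2.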
>
> Thanks to everyone!
>
> P.S. But the question about the region scanning order still stands. I do
> not understand why, with FuzzyFilter, it goes from one region to another
> until it stops at the value. I suppose if the scanning process had started
> on all regions at once, then I would find in the log files at least one
> value per region, but I found one value per region only for the regions
> that reside before the particular one.
>
>
> On Mon, Jan 21, 2013 at 4:22 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
> > If it's the same class and it's not a patch, then the first class loaded
> > wins.
> >
> > So if you have a class Foo and HBase has a class Foo, your code will never
> > see the light of day.
> >
> > Perhaps I'm stating the obvious, but it's something to think about when
> > working with Hadoop.
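One quick way to check whether two copies of a class are competing is to ask a class loader for every class file with that name on its search path. The sketch below relies on the standard `ClassLoader.getResources` method; `ClassCopies` and its helpers are hypothetical names. With the usual parent-first delegation, the copy the loader finds first is typically the one that gets defined, so more than one result is a warning sign.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

class ClassCopies {

    // "org.example.Foo" -> "org/example/Foo.class", the class file's resource path.
    static String resourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Every jar or directory visible to the loader that contains this class
    // file, in the order the loader's getResources enumeration reports them.
    static List<URL> findCopies(String className, ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(resourcePath(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        for (String name : args) {
            List<URL> copies = findCopies(name, loader);
            System.out.println(name + " found " + copies.size() + " time(s)");
            for (URL url : copies) {
                System.out.println("  " + url);
            }
        }
    }
}
```

For example, assuming the compiled class is in the current directory and your installation provides the `hbase classpath` command: `java -cp "$(hbase classpath):." ClassCopies org.apache.hadoop.hbase.filter.FuzzyRowFilter`.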
> >
> > On Jan 19, 2013, at 3:36 AM, Eugeny Morozov <[EMAIL PROTECTED]> wrote:
> >
> > > Ted,
> > >
> > > that is correct.
> > > HBase 0.92.x, and we use part of patch 6509.
> > >
> > > I use the filter as a custom filter: it lives in a separate jar file that
> > > goes on HBase's classpath. I did not patch HBase.
> > > Moreover, I do not use the protobuf descriptions that come with the filter
> > > in the patch. I have only two classes: FuzzyRowFilter itself and its test
> > > class.
> > >
> >
Evgeny Morozov
Developer Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
[EMAIL PROTECTED]