HBase >> mail # user >> Understanding scan behaviour


Re: Understanding scan behaviour
See the javadoc of TimestampsFilter, which shows how you can narrow the scan:

 * Note: Use of this filter overrides any time range/time stamp
 * options specified using {@link org.apache.hadoop.hbase.client.Get#setTimeRange(long, long)},
 * {@link org.apache.hadoop.hbase.client.Scan#setTimeRange(long, long)},
 * {@link org.apache.hadoop.hbase.client.Get#setTimeStamp(long)},
 * or {@link org.apache.hadoop.hbase.client.Scan#setTimeStamp(long)}.

The answer to your second question is yes.
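A scan returns rows in sorted row-key order, starting at the first key that is greater than or equal to the start row, which is why an existing row "abc" always comes back before "abd". The Java sketch below models that with a `TreeMap`. It is an in-memory illustration, not HBase client code; it assumes ASCII row keys (so `String` ordering matches HBase's byte-wise ordering), and `StartRowSketch` and `firstRowFrom` are hypothetical names.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of "scan 'table', {STARTROW => ..., LIMIT => 1}".
// Assumption: row keys are ASCII strings, so String ordering matches
// HBase's byte-wise row ordering. Class and method names are hypothetical.
class StartRowSketch {

    // Returns the first row key >= startRow, or null if no such row exists.
    static String firstRowFrom(NavigableMap<String, String> table, String startRow) {
        Map.Entry<String, String> entry = table.ceilingEntry(startRow);
        return entry == null ? null : entry.getKey();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("abd", "v2");
        table.put("abc", "v1");
        table.put("abca", "v3");

        // An exact match is always the first row returned.
        System.out.println(firstRowFrom(table, "abc"));  // abc
        // Without an exact match, the next row in sort order is returned.
        System.out.println(firstRowFrom(table, "abcb")); // abd
    }
}
```

With LIMIT 1 only that first row comes back; without a limit or a stop row the scan keeps going into "abca" and "abd", so a client-side check on the key is still needed to know where the matching rows end.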

On Thu, Mar 28, 2013 at 10:17 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> Could the prefix filter lead to a full table scan? In other words, is
> PrefixFilter applied after fetching the rows?
>
> Another question I have: say I have row keys abc and abd, and I search for
> row "abc". Is it always guaranteed to be the first key returned in the
> scan results? If so, I can always put a condition in the client app.
>
> On Thu, Mar 28, 2013 at 9:15 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Take a look at the following in
> > hbase-server/src/main/ruby/shell/commands/scan.rb (trunk):
> >
> >   hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND
> >     (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
> >
> > Cheers
> >
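To make the shell filter expression above concrete, here is a small Java model of which cells it keeps: the row key must start with `row2`, the column qualifier must sort at or after `xyz`, and the cell timestamp must be exactly 123 or 456. This is a hypothetical in-memory sketch (the `Cell` class and `keep` method are not HBase APIs) that assumes ASCII keys and qualifiers; it says nothing about how the region server actually evaluates filters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Models the filter string
//   (PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter (123, 456))
// over an in-memory list of cells. Cell and keep() are hypothetical, not HBase classes.
class FilterExpressionSketch {

    static final class Cell {
        final String row;
        final String qualifier;
        final long timestamp;

        Cell(String row, String qualifier, long timestamp) {
            this.row = row;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + "/" + qualifier + "@" + timestamp;
        }
    }

    static final Set<Long> TIMESTAMPS = Set.of(123L, 456L);

    // True when the cell passes all three conditions joined by AND.
    static boolean keep(Cell c) {
        boolean prefixMatches = c.row.startsWith("row2");
        // Assumes ASCII qualifiers, so String order equals binary order.
        boolean qualifierMatches = c.qualifier.compareTo("xyz") >= 0;
        boolean timestampMatches = TIMESTAMPS.contains(c.timestamp);
        return prefixMatches && qualifierMatches && timestampMatches;
    }

    static List<Cell> apply(List<Cell> cells) {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : cells) {
            if (keep(c)) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell("row2a", "xyz", 123L),  // kept
                new Cell("row2a", "abc", 123L),  // qualifier sorts before xyz
                new Cell("row3", "zzz", 456L),   // wrong row prefix
                new Cell("row2b", "zzz", 789L)); // timestamp not listed
        System.out.println(apply(cells));        // [row2a/xyz@123]
    }
}
```

Per the TimestampsFilter javadoc quoted at the top of this message, any time range or timestamp set on the Get or Scan is overridden when this filter is used, so the timestamp set is the only time constraint modeled here.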
> > On Thu, Mar 28, 2013 at 9:02 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> >
> > > I see, then I misunderstood the behaviour. My keys are id + timestamp so
> > > that I can do a range-type search. So what I really want is to return a
> > > row where the id matches the prefix. Is there a way to do this without
> > > having to scan large amounts of data?
> > >
> > >
> > >
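One common way to avoid reading large amounts of data for an id prefix is to bound the scan itself instead of relying only on a filter: start at the prefix and stop at the smallest key that sorts after every key with that prefix. In the shell those bounds are STARTROW and STOPROW; in the Java client they are the scan's start and stop rows (`Scan#setStartRow` / `Scan#setStopRow`). The sketch below models that range with a `TreeMap` in unsigned byte order. The class, its methods, and the `user01|...` key layout are hypothetical stand-ins for Mohit's id + timestamp keys; HBase itself is not used.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of a prefix scan bounded by [prefix, stopRow), modeled in memory.
// Names and key layout are hypothetical; HBase itself is not used here.
class PrefixScanSketch {

    // Unsigned lexicographic byte comparison (the order HBase sorts row keys in).
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Smallest key greater than every key that starts with prefix: drop
    // trailing 0xFF bytes, then increment the last remaining byte.
    // Returns null if the prefix is all 0xFF (scan to the end of the table).
    static byte[] stopRowFor(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if ((prefix[i] & 0xFF) != 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return null;
    }

    // Visits only rows in [prefix, stopRow) instead of starting at the first row of the table.
    static List<String> prefixScan(NavigableMap<byte[], String> table, byte[] prefix) {
        byte[] stop = stopRowFor(prefix);
        NavigableMap<byte[], String> range = (stop == null)
                ? table.tailMap(prefix, true)
                : table.subMap(prefix, true, stop, false);
        List<String> keys = new ArrayList<>();
        for (byte[] key : range.keySet()) {
            keys.add(new String(key, StandardCharsets.ISO_8859_1));
        }
        return keys;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        NavigableMap<byte[], String> table = new TreeMap<>(PrefixScanSketch::compareRows);
        table.put(bytes("user01|1000"), "a");
        table.put(bytes("user01|2000"), "b");
        table.put(bytes("user02|1000"), "c");
        table.put(bytes("user00|3000"), "d");

        // Stop row for "user01" is "user02", so user00 and user02 rows are never visited.
        System.out.println(prefixScan(table, bytes("user01"))); // [user01|1000, user01|2000]
    }
}
```

PrefixFilter on its own does not move where the scan starts, so without a start row the scan begins at the first row of the table; setting the start row to the prefix (and, as sketched here, a stop row just past it) is what keeps the read confined to the matching rows.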
> > > On Thu, Mar 28, 2013 at 8:26 AM, Jean-Marc Spaggiari <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Mohit,
> > > >
> > > > "+" is ASCII code 43.
> > > > "9" is ASCII code 57.
> > > >
> > > > So "+9" comes after "++". If you don't have any row with the exact
> > > > key "+++++", HBase will return the first row that sorts after it. In
> > > > your case, that is +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF.
> > > >
> > > > JM
> > > >
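Jean-Marc's arithmetic generalizes: HBase compares row keys byte by byte as unsigned values, so a start row that does not exist lands on the next key in that order. The Java sketch below uses hypothetical names and shortened versions of row keys quoted in this thread to check both cases: '9' (57) sorts after '+' (43), and the escaped byte \xC1 (193) sorts after 'd' (100), which is consistent with STARTROW => 'sdw0' returning a row that begins with s\xC1.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;

// Shows why a missing start row lands on the next key in unsigned byte order.
// Keys are shortened versions of rows quoted in this thread; names are hypothetical.
class RowOrderSketch {

    // Unsigned lexicographic comparison: bytes are treated as 0..255, not -128..127.
    // A signed comparison would wrongly order 0xC1 (-63) before 'd' (100).
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Label of the first key >= startRow, as a scan with STARTROW and LIMIT 1 would pick.
    static String firstRowFrom(TreeMap<byte[], String> table, String startRow) {
        byte[] key = table.ceilingKey(startRow.getBytes(StandardCharsets.ISO_8859_1));
        return key == null ? null : table.get(key);
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> table = new TreeMap<>(RowOrderSketch::compareRows);
        table.put(new byte[] {'+', '9', 'h', 'C'}, "+9hC...");
        table.put(new byte[] {'s', (byte) 0xC1, (byte) 0xEA}, "s\\xC1\\xEA...");

        System.out.println(firstRowFrom(table, "+++++")); // +9hC...
        System.out.println(firstRowFrom(table, "sdw0"));  // s\xC1\xEA...
    }
}
```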
> > > > 2013/3/28 Mohit Anchlia <[EMAIL PROTECTED]>:
> > > > > My understanding is that the row key would start with +++++, for instance.
> > > > >
> > > > > On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari <
> > > > > [EMAIL PROTECTED]> wrote:
> > > > >
> > > > >> Hi Mohit,
> > > > >>
> > > > >> I see nothing wrong with the results below. What would you have
> > > > >> expected?
> > > > >>
> > > > >> JM
> > > > >>
> > > > >> 2013/3/28 Mohit Anchlia <[EMAIL PROTECTED]>:
> > > > >> > I am running version 0.92.1 and this is what happens.
> > > > >> >
> > > > >> >
> > > > >> > hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => 'sdw0'}
> > > > >> > ROW                                                  COLUMN+CELL
> > > > >> >  s\xC1\xEAR\xDF\xEA&\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x
> > > > >> > column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106,
> > > > >> > value=PAGE\x09\x091363056252990\x09\x09/
> > > > >> >  7F\xFF\xFE\xC2\xA3\x84Z\x7F
> > > > >> >
> > > > >> > 1 row(s) in 0.0450 seconds
> > > > >> > hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '------'}
> > > > >> > ROW                                                  COLUMN+CELL
> > > > >> >  -\xA1\xAF>r\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\
> > > > >> > column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714,
> > > > >> > value=PAGE\x09239923973\x091363384698919\x09/
> > > > >> >  xFF\xFE\xC2\x8F\xF0\xC1\xBF
> > > > >> >   row(s) in 0.0500 seconds
> > > > >> > hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '++++'}
> > > > >> > ROW                                                  COLUMN+CELL
> > > > >> >  +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF
> > > > >> > column=SID_T_MTX:\x00\x00<2, timestamp=1364404155426,
> > > > >> > value=PAGE\x09\x091364404145275\x09 \x09/