HBase >> mail # user >> Scan (Start Row, End Row) vs Scan (Row)


RE: Scan (Start Row, End Row) vs Scan (Row)
The best way to do this is as Friso describes, using the existing stopRow parameter in Scan.
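For illustration, here is a minimal sketch of computing that stopRow from a row-key prefix. It uses plain Java with no HBase classes; the class and method names (PrefixStopRow, stopRowForPrefix) are hypothetical, not part of the HBase API. Note that it is a variant of the carry rule Friso describes below: instead of setting a trailing 0xFF byte to 0, it drops trailing 0xFF bytes and increments the byte before them, which also gives an exclusive upper bound for the prefix. Returning an empty array when no upper bound exists assumes the usual HBase convention that an empty stop row means "scan to the end of the table".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of HBase.
final class PrefixStopRow {

    /**
     * Returns the smallest key that sorts after every key starting with
     * {@code prefix}, for use as an exclusive stop row. Returns an empty
     * array if the prefix is empty or consists only of 0xFF bytes, since
     * no finite upper bound exists in that case (assumption: the caller
     * treats an empty stop row as "no stop row").
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Trailing 0xFF bytes cannot be incremented, so skip past them.
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        // Copy the remaining bytes and increment the last one.
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "20100809041500_abd".getBytes(StandardCharsets.US_ASCII);
        byte[] stop = stopRowForPrefix(start);
        // prints 20100809041500_abe
        System.out.println(new String(stop, StandardCharsets.US_ASCII));
    }
}
```

The start and stop arrays would then be passed to the Scan(byte[], byte[]) constructor that Friso links to.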

There is another way to do it: startRow plus a filter.  A PrefixFilter could be used here.  Looking at the code, it seems that PrefixFilter exits early and stops the scan once it has passed the prefix.

If it does not, you can wrap any filter in a WhileMatchFilter.  Once the underlying filter rejects a row, the wrapping filter rejects everything after it, and the scan exits early.

JG

> -----Original Message-----
> From: Friso van Vollenhoven [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, January 20, 2011 12:45 AM
> To: <[EMAIL PROTECTED]>
> Subject: Re: Scan (Start Row, End Row) vs Scan (Row)
>
> Performing a scan with
>
> start row = 20100809041500_abd
> end row = 20100809041500_abe
>
> will give you just that. The end row is exclusive, so it will only return rows
> with VAR1 = abd. You need to compute the 'abe' yourself, though: take 'abd'
> and increase the rightmost byte by 1; if that byte is already at the max byte
> value, set it to 0 and increase the byte to its left by 1, and so on. There
> is no scan method that has 'starts with' semantics, AFAIK.
>
> See here:
> http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/client/Scan.html#Scan(byte%5B%5D,%20byte%5B%5D)
>
>
> Friso
>
>
>
>
> On 20 jan 2011, at 09:22, Shuja Rehman wrote:
>
> Hi
> Consider the following scenario.
>
> Row Key  Format = DATETIME_VAR1_VAR2 (where var1 and var2 have fixed
> lengths)
>
> and example data could be
>
> 20100809041500_abc_xyz
> 20100809041500_abc_xyw
> 20100809041500_abc_xyc
> *20100809041500_abd_xyz*
> 20100809041500_abd_xyw
> 20100809041500_abf_xyz
> ...
>
> Now, if I want to get only the rows whose key starts with
> 20100809041500_abd, is there any way to achieve this through a scan without
> using a filter? If I use scan(startrow, filter) with
> startrow="20100809041500_abd", it will scan the whole table from the start key
> to the end of the table. I want to scan just the part of the table that I
> require. Is there any method like
>
> scan(row)  where row="20100809041500_abd"  that returns just the
> following results?
>
> 20100809041500_abd_xyz
> 20100809041500_abd_xyw
>
> Kindly let me know whether this is achievable.
> Thanks
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>