HBase >> mail # user >> Re: HBase - Secondary Index


Re: HBase - Secondary Index
Hi Anoop,

Please find my reply inline.

Thanks,
Anil Gupta

On Sun, Dec 16, 2012 at 8:02 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:

> Hi Anil,
> During the scan, there is no need to fetch any index data to the client
> side, so there is no need to create a scanner on the index table at the
> client side. This happens at the server side.
>
>
> For the scan on the main table with conditions on timestamp and customer
> id, a scanner is created with Filters, just as it would be when there is
> no secondary index. So this scan from the client will go through all the
> regions in the main table.
Anil: Do you mean that if the table is spread across 50 region servers in a
60-node cluster, then we need to send a scan request to all 50 RS? Doesn't
that sound expensive? IMHO you were not doing this in your solution. Your
solution looked cleaner than this, since you knew exactly which node to
query when using the secondary index, thanks to the co-location of the
primary table region and the secondary index table region (made possible by
the static begin part of the secondary table rowkey). My problem is a little
more complicated because of one constraint: I cannot have a "static begin
part" in the rowkey of my secondary table.

> When it scans one particular region, say (x,y], on the main table, using the
> CP we can get the index table region object corresponding to this main
> table region from the RS.  There is no issue in creating the static part of
> the rowkey. You know 'x' is the region start key. Then the server side
> will create a scanner on the index region directly, and here we can specify
> the startkey: 'x' + <timestamp value> + <customer id>.  Using the results
> from the index scan, we will reseek on the main region to the exact
> rows where the data we are interested in is available. So there won't
> be a full region data scan happening.
>
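
The server-side lookup described above can be sketched in plain Java without the HBase API. This is only an illustration under stated assumptions: a TreeMap stands in for the sorted index region, the fixed-width string key layout, the sample data, and the names RegionIndexScanSketch, indexKey, lookup and sampleIndexRegion are all hypothetical and are not taken from the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class RegionIndexScanSketch {

    // Assumed index entry layout for this sketch:
    //   <region start key> + <timestamp> + <customer id> + <main table rowkey>
    // Every field has a fixed width so that string order follows field order.
    static String indexKey(String regionStart, String timestamp,
                           String customerId, String mainRow) {
        return regionStart + timestamp + customerId + mainRow;
    }

    /** Returns the main-table rowkeys whose index entries start with prefix. */
    static List<String> lookup(NavigableMap<String, String> indexRegion, String prefix) {
        List<String> mainRows = new ArrayList<>();
        // tailMap(prefix, true) plays the role of setting the scan start key.
        for (Map.Entry<String, String> e : indexRegion.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted order: once past the prefix, nothing else can match
            }
            mainRows.add(e.getValue());
        }
        return mainRows;
    }

    /** Made-up index entries for one region whose start key is regionStart. */
    static NavigableMap<String, String> sampleIndexRegion(String regionStart) {
        NavigableMap<String, String> region = new TreeMap<>();
        region.put(indexKey(regionStart, "20121214", "c001", "row17"), "row17");
        region.put(indexKey(regionStart, "20121214", "c002", "row03"), "row03");
        region.put(indexKey(regionStart, "20121215", "c001", "row42"), "row42");
        region.put(indexKey(regionStart, "20121216", "c001", "row08"), "row08");
        return region;
    }

    public static void main(String[] args) {
        String x = "r00"; // start key of the main region (x, y]
        NavigableMap<String, String> indexRegion = sampleIndexRegion(x);
        // Start key 'x' + <timestamp value> + <customer id>
        List<String> hits = lookup(indexRegion, x + "20121214" + "c001");
        // Each hit is a rowkey to seek to directly in the main region.
        System.out.println(hits);
    }
}
```

With the sample data, the lookup for timestamp 20121214 and customer c001 touches one index entry and yields the single main rowkey row17, so the main region is never scanned in full.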

> In the case where only the timestamp is given but no customer id, it
> is simple again. Create a scanner on the main table with only one
> filter. At the CP side, the scanner on the index region will be created
> with startkey 'x' + <timestamp value>.  When you create the scan
> object and set startRow on it, it need not be the full rowkey. It can be
> a part of the rowkey, like a prefix.
>
> Hope you got it now :)
>
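
The timestamp-only case, where the start row is just a prefix of the full index rowkey, might look like the sketch below. It uses byte-array keys ordered by unsigned byte comparison, which is how HBase orders rowkeys. The stop row computed here (the prefix with its last byte incremented) is an assumption added for the sketch, since the message above only mentions the start key; the class name, helper names and sample data are likewise hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class PrefixScanSketch {

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Lexicographic comparison treating each byte as unsigned. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    /**
     * Smallest key that sorts after every key starting with prefix:
     * drop trailing 0xFF bytes, then increment the last remaining byte.
     * Returns an empty array when no such key exists (scan to the end).
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) end--;
        if (end == 0) return new byte[0];
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    /** Scans [prefix, stopRowForPrefix(prefix)) and returns values in key order. */
    static List<String> scanPrefix(NavigableMap<byte[], String> region, byte[] prefix) {
        byte[] stop = stopRowForPrefix(prefix);
        NavigableMap<byte[], String> range = (stop.length == 0)
                ? region.tailMap(prefix, true)
                : region.subMap(prefix, true, stop, false);
        return new ArrayList<>(range.values());
    }

    /** Made-up index region: key = 'x' + timestamp + customer id, value = main rowkey. */
    static NavigableMap<byte[], String> sampleIndexRegion() {
        NavigableMap<byte[], String> region = new TreeMap<>(PrefixScanSketch::compareBytes);
        region.put(utf8("r00" + "20121214" + "c001"), "row17");
        region.put(utf8("r00" + "20121214" + "c002"), "row03");
        region.put(utf8("r00" + "20121215" + "c001"), "row42");
        return region;
    }

    public static void main(String[] args) {
        // Start row is only 'x' + <timestamp value>; no customer id is needed.
        System.out.println(scanPrefix(sampleIndexRegion(), utf8("r00" + "20121214")));
    }
}
```

Here the partial key r0020121214 matches both customers for that timestamp (main rowkeys row17 and row03) and stops before the entry for 20121215.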
Anil: I hope we are on the same page now. Thanks a lot for taking the time
to discuss this.

>
> -Anoop-
> ________________________________________
> From: anil gupta [[EMAIL PROTECTED]]
> Sent: Friday, December 14, 2012 11:31 PM
> To: [EMAIL PROTECTED]
> Subject: Re: HBase - Secondary Index
>
> On Fri, Dec 14, 2012 at 12:54 AM, Anoop Sam John <[EMAIL PROTECTED]>
> wrote:
>
> > Hi Anil,
> >
> > >1. In your presentation you mentioned that the region of the Primary
> > > Table and the region of the Secondary Table are always located on the
> > > same region server. How do you achieve it? By using the Primary table
> > > rowkey as a prefix of the rowkey of the Secondary Table? Will your
> > > implementation work if the rowkey of the primary table cannot be used
> > > as a prefix in the rowkey of the Secondary table (I have this
> > > limitation in my use case)?
> >
> > First of all, there will be the same number of regions in both the
> > primary and index tables. All the start/stop keys of the regions will
> > also be the same.
> > Suppose there are 2 regions on the main table, say for keys 0-10 and
> > 10-20. Then we will create 2 regions in the index table with the same
> > key ranges.
> > At the master balancing level it is easy to collocate these regions by
> > looking at the start and end keys.
> > The selection of the rowkey that will be used in the index table is
> > the key here.
> > What we will do is prefix all the rowkeys in the index table
> > with the start key of the region.
> > When an entry is added to the main table with rowkey 5, it will go to
> > the 1st region (0-10).
> > Now there will be an index region with range 0-10.  We will select this
> > region to store this index data.
>
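
The key selection described in the quoted message (regions 0-10 and 10-20, an entry with rowkey 5) can be illustrated with a small sketch. It assumes integer rowkeys as in that example, treats a region start key as inclusive, and uses a made-up index key layout of start key, indexed value, and main rowkey separated by '|'. The class and method names are hypothetical.

```java
import java.util.TreeSet;

final class ColocatedIndexKeySketch {

    // Start keys of the main-table regions. The index table is assumed to be
    // split on exactly the same keys, so each main region has a twin.
    private final TreeSet<Integer> regionStartKeys = new TreeSet<>();

    void addRegion(int startKey) {
        regionStartKeys.add(startKey);
    }

    /** Start key of the region whose range contains the given main rowkey. */
    int regionStartFor(int mainRowKey) {
        Integer start = regionStartKeys.floor(mainRowKey);
        if (start == null) {
            throw new IllegalArgumentException("no region covers rowkey " + mainRowKey);
        }
        return start;
    }

    /**
     * Index rowkey prefixed with the start key of the main region that holds
     * the row, so the index entry sorts into the index region with the same
     * start key.
     */
    String indexRowKey(int mainRowKey, String indexedValue) {
        return String.format("%02d|%s|%02d",
                regionStartFor(mainRowKey), indexedValue, mainRowKey);
    }

    public static void main(String[] args) {
        ColocatedIndexKeySketch sketch = new ColocatedIndexKeySketch();
        sketch.addRegion(0);  // region 0-10
        sketch.addRegion(10); // region 10-20
        System.out.println(sketch.indexRowKey(5, "20121214"));  // 00|20121214|05
        System.out.println(sketch.indexRowKey(12, "20121214")); // 10|20121214|12
    }
}
```

Because the index rowkey for main rowkey 5 begins with the start key of region 0-10, it is stored in the index region with that same range, which the balancer can keep on the same region server as the main region.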
Thanks & Regards,
Anil Gupta