
Re: HBase - Secondary Index
Hi Michael,

Please find my replies inline.


On Tue, Dec 18, 2012 at 1:02 AM, Michel Segel <[EMAIL PROTECTED]> wrote:

> Just a couple of questions...
> First, since you don't have any natural secondary indices, you can create
> one from a couple of choices. Keeping it simple, you choose an inverted
> table as your index.
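The inverted-table idea can be sketched in a few lines of Python. This is a minimal illustration, not the HBase client API. Plain dicts stand in for tables, and the names `build_inverted_index` and `lookup` are hypothetical. Each distinct value of the indexed column becomes one index row. Every matching main-table row id is stored as a column qualifier in that row, so finding all rows for a value is a single get.

```python
from collections import defaultdict

# Hypothetical stand-in for an HBase table: rowkey -> {qualifier: value}.
Table = dict


def build_inverted_index(main_table: Table, column: str) -> Table:
    """Create one index row per distinct value of `column`.

    Each matching main-table row id is stored as a column qualifier in that
    index row; the cell value itself is left empty.
    """
    index = defaultdict(dict)
    for row_id, columns in main_table.items():
        value = columns.get(column)
        if value is not None:
            index[value][row_id] = b""
    return dict(index)


def lookup(index: Table, value: str) -> list:
    """A single 'get' on the index row yields every row id for `value`."""
    return sorted(index.get(value, {}))


if __name__ == "__main__":
    main = {"r1": {"city": "LA"}, "r2": {"city": "NY"}, "r3": {"city": "LA"}}
    idx = build_inverted_index(main, "city")
    print(lookup(idx, "LA"))  # ['r1', 'r3']
```

A popular value produces one very wide index row, which is the concern raised in reason 1 of the reply.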
Reasons for not creating an inverted table:
1. There can be millions of columns corresponding to a single rowkey in my
secondary index, and that number may grow even larger in the future.
2. When using the secondary index, we also plan to filter on other
non-rowkey columns. For example, one row of the secondary table might look
like this:
   Rowkey:
     cf:PrimarytableRowKey=x
     cf:customerFirstName=xyz
     cf:customerAddress=123, Union Sq, LA
My primary table has around 50 columns; in the secondary table I duplicate
two of them so they can be used for filtering along with the secondary index.
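A rough sketch of that layout, again using dicts as hypothetical stand-ins for HBase rows. Each secondary row keeps the primary rowkey plus copies of a couple of primary columns, so a filter on those columns can be evaluated against the index row alone. The index rowkeys (`k1`, `k2` in the usage) and the helper names are made up for illustration.

```python
# Hypothetical in-memory model of the secondary-table rows described above.
FAMILY = "cf:"
DUPLICATED = ("customerFirstName", "customerAddress")


def secondary_row(primary_key: str, primary_row: dict) -> dict:
    """Build one secondary row: the primary rowkey plus copies of the filter columns."""
    row = {FAMILY + "PrimarytableRowKey": primary_key}
    for qualifier in DUPLICATED:
        row[FAMILY + qualifier] = primary_row[qualifier]
    return row


def matching_primary_keys(secondary: dict, **conditions: str) -> list:
    """Filter index rows on duplicated columns without reading the primary table."""
    hits = []
    for index_key in sorted(secondary):
        row = secondary[index_key]
        if all(row.get(FAMILY + q) == v for q, v in conditions.items()):
            hits.append(row[FAMILY + "PrimarytableRowKey"])
    return hits
```

For example, `matching_primary_keys({"k1": secondary_row("x", row_x)}, customerFirstName="xyz")` returns `["x"]` when `row_x["customerFirstName"]` is `"xyz"`; only then does the client need to touch the 50-column primary row.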

> In doing so, you have one column containing all of the row ids for a given
> value.
> This means that it is a simple get().
> My question is that since you don't have any formal SQL syntax, how are
> you doing this all server side?
As Anoop said, I am not doing the index data scan on the server side. I scan
the index table data back to the client and, from the client, do gets to
fetch the main table data.
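The client-side flow described here (scan the index table back to the client, then issue gets against the main table) can be modelled as follows. This assumes a hypothetical index rowkey of the form `<date>|<customer id>` and uses Python dicts, sorted on each scan, in place of HBase tables; `scan` mimics a Scan over [startRow, stopRow).

```python
import bisect


def scan(table: dict, start: str, stop: str) -> list:
    """Return (rowkey, row) pairs with start <= rowkey < stop, in rowkey order."""
    keys = sorted(table)
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    return [(k, table[k]) for k in keys[lo:hi]]


def client_side_lookup(index_table: dict, main_table: dict, start: str, stop: str) -> list:
    # Step 1: bring the matching index rows back to the client.
    index_rows = scan(index_table, start, stop)
    # Step 2: one get per referenced primary rowkey (batched Gets in a real client).
    results = []
    for _, row in index_rows:
        primary_key = row["PrimarytableRowKey"]
        if primary_key in main_table:
            results.append(main_table[primary_key])
    return results
```

Every index row makes a round trip to the client before the main-table gets are sent, which is the overhead the server-side approach discussed in the thread avoids.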

> Mike Segel
> On Dec 18, 2012, at 2:28 AM, anil gupta <[EMAIL PROTECTED]> wrote:
> > Hi Anoop,
> >
> > Please find my reply inline.
> >
> > Thanks,
> > Anil Gupta
> >
> > On Sun, Dec 16, 2012 at 8:02 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:
> >
> >> Hi Anil
> >> During the scan, there is no need to fetch any index data to the client
> >> side, so there is no need to create a scanner on the index table at the
> >> client. This happens on the server side.
> >
> >
> >>
> >> For the scan on the main table with conditions on timestamp and customer
> >> id, a scanner is created with Filters, just as in the normal case when
> >> there is no secondary index. So this scan from the client will go through
> >> all the regions in the main table.
> >
> >
> > Anil: Do you mean that if the table is spread across 50 region servers in
> > a 60-node cluster, then we need to send a scan request to all 50 RSs?
> > Doesn't that sound expensive? IMHO you were not doing this in your
> > solution. Your solution looked cleaner, since you knew exactly which node
> > to query when using the secondary index, thanks to the co-location of the
> > primary table region and the secondary index table region (due to the
> > static begin part of the secondary table rowkey). My problem is a little
> > more complicated due to this constraint: I cannot have a "static begin
> > part" in the rowkey of my secondary table.
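To make the fan-out question concrete, here is a small sketch of which regions a scan must visit. It assumes only that each region covers [its start key, the next region's start key) and that an empty stop row means "to the end of the table", as in HBase. Filters reduce what each region returns, not which regions are contacted, so a scan with no start or stop row reaches every region. `regions_for_scan` is a hypothetical helper.

```python
import bisect


def regions_for_scan(region_starts: list, start: bytes = b"", stop: bytes = b"") -> list:
    """Return the indices of regions a scan over [start, stop) has to visit.

    `region_starts` is the sorted list of region start keys, beginning with b"".
    Region i covers [region_starts[i], region_starts[i + 1]); the last region is
    open-ended. An empty `stop` means "scan to the end of the table".
    """
    first = bisect.bisect_right(region_starts, start) - 1
    if stop == b"":
        last = len(region_starts) - 1
    else:
        last = bisect.bisect_left(region_starts, stop) - 1
    return list(range(first, last + 1))
```

For region start keys `[b"", b"g", b"n", b"t"]`, a full scan returns `[0, 1, 2, 3]`, while a scan over `[b"h", b"m")` returns only `[1]`.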
> >
> >> When it scans one particular region, say [x, y), on the main table, the
> >> CP (coprocessor) can get the index table region object corresponding to
> >> this main table region from the RS. There is no issue in creating the
> >> static part of the rowkey: you know 'x' is the region start key. Then, on
> >> the server side, we create a scanner on the index region directly, and
> >> here we can specify the start key 'x' + <timestamp value> + <customer id>.
> >> Using the results from the index scan, we reseek on the main region to the
> >> exact rows that hold the data we are interested in. So no full region
> >> data scan happens.
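A minimal sketch of this server-side, co-located lookup. The key layout 'x' + <timestamp value> + <customer id> comes from the message; everything else is an assumption. The timestamp is encoded as 8 big-endian bytes, customer ids are fixed-width, and the main rowkey is appended to keep index entries unique. The index region is modelled as a sorted list of `(index_key, main_rowkey)` pairs, and a binary search into the main region's sorted keys stands in for a reseek. It shows the shape of the idea, not the coprocessor API; `index_prefix` and `co_located_lookup` are hypothetical names.

```python
import bisect
import struct


def index_prefix(region_start: bytes, ts: int, customer_id: bytes = b"") -> bytes:
    """Hypothetical index key prefix: region start key + 8-byte big-endian timestamp + customer id."""
    return region_start + struct.pack(">Q", ts) + customer_id


def co_located_lookup(index_region: list, main_keys: list, main_rows: dict, prefix: bytes) -> list:
    """Scan the index region from `prefix`, then seek the main region to each hit.

    index_region: sorted list of (index_key, main_rowkey) pairs.
    main_keys: sorted rowkeys of the co-located main region.
    Customer ids are assumed fixed-width, so a full prefix cannot match a longer id.
    """
    targets = []
    i = bisect.bisect_left(index_region, (prefix,))
    while i < len(index_region) and index_region[i][0].startswith(prefix):
        targets.append(index_region[i][1])
        i += 1
    results = []
    for rowkey in sorted(targets):  # a reseek only moves forward, so visit rows in order
        j = bisect.bisect_left(main_keys, rowkey)  # jump to the row instead of reading the region
        if j < len(main_keys) and main_keys[j] == rowkey:
            results.append((rowkey, main_rows[rowkey]))
    return results
```

Only the rows named by the index are read from the main region; rows that no index entry points to are never visited.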
> >
> >> In the case where only the timestamp is given but no customer id, it is
> >> simple again. Create a scanner on the main table with only one filter. On
> >> the CP side, the scanner on the index region is created with start key
> >> 'x' + <timestamp value>. When you create the Scan object and set its
> >> startRow, it need not be the full rowkey; it can be just part of the
> >> rowkey, i.e. a prefix.
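Because rows are sorted, a partial rowkey works as an inclusive start row, and a matching exclusive stop row can be derived from the prefix so the scan ends once the prefix no longer matches. This sketch uses hypothetical helpers: `prefix_stop_row` drops trailing 0xFF bytes and increments the last remaining byte, and `prefix_scan` applies the resulting range to a sorted key list. The 8-byte big-endian timestamp in the usage is an assumption.

```python
import bisect
import struct


def prefix_stop_row(prefix: bytes) -> bytes:
    """Exclusive stop row covering every key that starts with `prefix`.

    Trailing 0xFF bytes cannot be incremented, so they are dropped and the last
    remaining byte is incremented. b"" means "no stop row" (scan to the table end).
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def prefix_scan(sorted_keys: list, prefix: bytes) -> list:
    """Keys in [prefix, prefix_stop_row(prefix)), i.e. all keys starting with `prefix`."""
    stop = prefix_stop_row(prefix)
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = len(sorted_keys) if stop == b"" else bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


if __name__ == "__main__":
    # 'x' + <timestamp value>, using a hypothetical 8-byte big-endian timestamp.
    start_row = b"x" + struct.pack(">Q", 5)
    keys = sorted([start_row + b"c1", start_row + b"c2", b"x" + struct.pack(">Q", 6) + b"c1"])
    print(prefix_scan(keys, start_row) == keys[:2])  # True
```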
> >>
> >> Hope you got it now :)
> > Anil: I hope we are on the same page now. Thanks a lot for your valuable time.

Thanks & Regards,
Anil Gupta