Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
HBase, mail # user - Re: HBase - Secondary Index


+
anil gupta 2012-12-14, 08:41
+
Anoop Sam John 2012-12-14, 08:54
+
ramkrishna vasudevan 2012-12-14, 11:34
+
anil gupta 2012-12-14, 18:01
+
Anoop Sam John 2012-12-17, 04:02
+
anil gupta 2012-12-18, 08:28
+
Anoop Sam John 2012-12-18, 09:27
+
anil gupta 2012-12-19, 08:24
+
Michel Segel 2012-12-18, 09:02
+
Anoop Sam John 2012-12-18, 09:35
Copy link to this message
-
Re: HBase - Secondary Index
anil gupta 2012-12-19, 08:39
Hi Michael,

Please find my replies inline.

Thanks,
Anil

On Tue, Dec 18, 2012 at 1:02 AM, Michel Segel <[EMAIL PROTECTED]>wrote:

> Just a couple of questions...
>
> First, since you don't have any natural secondary indices, you can create
> one from a couple of choices. Keeping it simple, you choose an inverted
> table as your index.
>
Reasons for not creating a inverted table:
1. There can be millions of columns corresponding to a rowkey in my
secondary index. In future it can even grow more.
2. While using secondary index, we are also planning to have filtering on
the basis of other non-rowkey columns.
For example: 1 Row of Secondary table might look like this:
Rowkey: cf:PrimarytableRowKey=x, cf:customerFirstName=xyz,
cf:customerAddress=123, Union Sq, LA
My primary table has around 50 columns and in secondary table i duplicate
two columns to used along with secondary index for filtering.

>
> In doing so, you have one column containing all of the row ids for a given
> value.
> This means that it is a simple get().
>
> My question is that since you don't have any formal SQL syntax, how are
> you doing this all server side?
>
As Anoop said, I am not doing the index data scan at the server side. He
scan the index table data back to client and from client doing gets to get
the main table data.

>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Dec 18, 2012, at 2:28 AM, anil gupta <[EMAIL PROTECTED]> wrote:
>
> > Hi Anoop,
> >
> > Please find my reply inline.
> >
> > Thanks,
> > Anil Gupta
> >
> > On Sun, Dec 16, 2012 at 8:02 PM, Anoop Sam John <[EMAIL PROTECTED]>
> wrote:
> >
> >> Hi Anil
> >>                During the scan, there is no need to fetch any index data
> >> to client side. So there is no need to create any scanner on the index
> >> table at the client side. This happens at the server side.
> >
> >
> >>
> >> For the Scan on the main table with condition on timestamp and customer
> >> id, a scanner to be created with Filters. Yes like normal when there is
> no
> >> secondary index. So this scan from the client will go through all the
> >> regions in the main table.
> >
> >
> > Anil: Do you mean that if the table is spread across 50 region servers in
> > 60 node cluster then we need to send a scan request to all the 50 RS.
> > Right? Doesn't it sounds expensive? IMHO you were not doing this in your
> > solution. Your solution looked cleaner than this since you exactly knew
> > which Node you need to go to for querying while using secondary index due
> > to co-location(due to static begin part for secondary table rowkey) of
> > region of primary table and secondary index table. My problem is little
> > more complicated due to the constraints that: I cannot have a "static
> begin
> > part" in the rowkey of my secondary table.
> >
> > When it scans one particular region say (x,y] on the main table, using
> the
> >> CP we can get the index table region object corresponding to this main
> >> table region from the RS.  There is no issue in creating the static
> part of
> >> the rowkey. You know 'x' is the region start key. Then at the server
> side
> >> will create a scanner on the index region directly and here we can
> specify
> >> the startkey. 'x' + <timestamp value> + <customer id>..  Using the
> results
> >> from the index scan we will make reseek on the main region to the exact
> >> rows where the data what we are interested in is available. So there
> wont
> >> be a full region data scan happening.
> >
> >> When in the cases where only timestamp is there but no customer id, it
> >> will be simple again. Create a scanner on the main table with only one
> >> filter. At the CP side the scanner on the index region will get created
> >> with startkey as 'x' + <timestamp value>..    When you create the scan
> >> object and set startRow on that it need not be the full rowkey. It can
> be
> >> part of the rowkey also. Yes like prefix.
> >>
> >> Hope u got it now :)
> > Anil: I hope now we are on same page. Thanks a lot for your valuable time

Thanks & Regards,
Anil Gupta
+
Shengjie Min 2012-12-27, 11:23
+
Anoop Sam John 2012-12-27, 11:30
+
Shengjie Min 2012-12-27, 13:07
+
Anoop John 2012-12-27, 15:54
+
ramkrishna vasudevan 2012-12-27, 16:11
+
Shengjie Min 2012-12-27, 16:29
+
Anoop Sam John 2012-12-28, 03:33
+
Mohit Anchlia 2012-12-28, 03:42
+
Anoop Sam John 2012-12-28, 04:14
+
Shengjie Min 2012-12-28, 10:55
+
Adrien Mogenet 2013-01-06, 20:30
+
Anoop Sam John 2013-01-07, 03:48
+
Mohit Anchlia 2013-01-07, 04:17
+
Anoop Sam John 2013-01-07, 13:49
+
Michael Segel 2013-01-08, 14:33
+
lars hofhansl 2013-01-09, 00:30
+
Michel Segel 2013-01-09, 01:30
+
anil gupta 2013-01-09, 01:28
+
Anoop Sam John 2013-01-09, 03:22
+
ramkrishna vasudevan 2013-01-09, 04:11
+
Mohit Anchlia 2013-01-09, 01:50
+
Asaf Mesika 2013-01-08, 23:00
+
Mohit Anchlia 2013-01-06, 20:36
+
Adrien Mogenet 2013-01-06, 20:40
+
anil gupta 2013-01-06, 22:12
+
Anoop Sam John 2012-12-20, 03:33
+
Farah Karim 2012-12-25, 10:14
+
David Arthur 2012-12-20, 02:47
+
Anoop Sam John 2012-12-20, 03:44