HBase >> mail # user >> Re: HBase - Secondary Index


Re: HBase - Secondary Index
Hi Michael,

Please find my replies inline.

Thanks,
Anil

On Tue, Dec 18, 2012 at 1:02 AM, Michel Segel <[EMAIL PROTECTED]> wrote:

> Just a couple of questions...
>
> First, since you don't have any natural secondary indices, you can create
> one from a couple of choices. Keeping it simple, you choose an inverted
> table as your index.
>
Reasons for not creating an inverted table:
1. There can be millions of columns corresponding to a single rowkey in my
secondary index, and that number may grow further in the future.
2. While using the secondary index, we are also planning to filter on
other non-rowkey columns.
For example, one row of the secondary table might look like this:
Rowkey: cf:PrimarytableRowKey=x, cf:customerFirstName=xyz,
cf:customerAddress=123, Union Sq, LA
My primary table has around 50 columns, and in the secondary table I duplicate
two of them so they can be used along with the secondary index for filtering.
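
A minimal sketch of the secondary-table row described above, using plain Python dicts rather than the HBase client. The indexed column names (cf:txnTimestamp, cf:customerId), the "|" separator, and the choice to append the primary rowkey to keep index rowkeys unique are all assumptions made for illustration; only cf:PrimarytableRowKey, cf:customerFirstName and cf:customerAddress come from the example row.

```python
SEP = "|"  # hypothetical separator between rowkey parts


def make_index_row(primary_key, primary_row, indexed_cols, duplicated_cols):
    """Build one secondary-table row (rowkey, columns) for one primary row."""
    # Assumed layout: indexed values first, then the primary rowkey so that
    # two primary rows with the same indexed values get distinct index rows.
    parts = [str(primary_row[c]) for c in indexed_cols] + [primary_key]
    index_key = SEP.join(parts)

    # Pointer back to the primary table, plus the duplicated filter columns.
    columns = {"cf:PrimarytableRowKey": primary_key}
    for col in duplicated_cols:
        columns[col] = primary_row[col]
    return index_key, columns


# One primary row; the real table would have around 50 columns.
primary_row = {
    "cf:txnTimestamp": "20121214",   # hypothetical indexed column
    "cf:customerId": "c042",         # hypothetical indexed column
    "cf:customerFirstName": "xyz",
    "cf:customerAddress": "123, Union Sq, LA",
    "cf:amount": "10",               # hypothetical non-indexed column
}

index_key, index_cols = make_index_row(
    "x",
    primary_row,
    indexed_cols=["cf:txnTimestamp", "cf:customerId"],
    duplicated_cols=["cf:customerFirstName", "cf:customerAddress"],
)
```

Because each primary row gets its own index row, no single index rowkey accumulates millions of columns, which is the concern in point 1.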

>
> In doing so, you have one column containing all of the row ids for a given
> value.
> This means that it is a simple get().
>
> My question is that since you don't have any formal SQL syntax, how are
> you doing this all server side?
>
As Anoop said, I am not doing the index data scan at the server side. I
scan the index table, bring the index data back to the client, and then do
gets from the client to fetch the main table data.
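
The client-side flow can be sketched roughly as follows. This simulates both tables in memory (a sorted list for the index, a dict for the main table) and does not use the HBase API; the rowkey layout, the sample data, and the helper names scan_prefix and lookup are hypothetical.

```python
from bisect import bisect_left


def scan_prefix(sorted_rows, prefix):
    """Yield (rowkey, columns) for rows whose rowkey starts with prefix."""
    keys = [k for k, _ in sorted_rows]
    i = bisect_left(keys, prefix)  # first rowkey >= prefix
    while i < len(sorted_rows) and sorted_rows[i][0].startswith(prefix):
        yield sorted_rows[i]
        i += 1


def lookup(index_rows, main_table, prefix, first_name=None):
    """Step 1: scan the index on the client. Step 2: get the main rows."""
    primary_keys = []
    for _, cols in scan_prefix(index_rows, prefix):
        # Optional filter on a column duplicated into the index table.
        if first_name is not None and cols.get("cf:customerFirstName") != first_name:
            continue
        primary_keys.append(cols["cf:PrimarytableRowKey"])
    # One get per matching primary rowkey.
    return [main_table[k] for k in primary_keys]


# Hypothetical index rows, kept sorted by rowkey as a table scan would see them.
index_rows = [
    ("20121214|c042|x", {"cf:PrimarytableRowKey": "x", "cf:customerFirstName": "xyz"}),
    ("20121214|c042|y", {"cf:PrimarytableRowKey": "y", "cf:customerFirstName": "abc"}),
    ("20121215|c042|z", {"cf:PrimarytableRowKey": "z", "cf:customerFirstName": "xyz"}),
]
main_table = {
    "x": {"cf:amount": "10"},
    "y": {"cf:amount": "20"},
    "z": {"cf:amount": "30"},
}

all_matches = lookup(index_rows, main_table, "20121214|c042|")
filtered = lookup(index_rows, main_table, "20121214|c042|", first_name="xyz")
```

The cost of this approach is the extra round trip: index rows travel to the client before any main-table get is issued.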

>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Dec 18, 2012, at 2:28 AM, anil gupta <[EMAIL PROTECTED]> wrote:
>
> > Hi Anoop,
> >
> > Please find my reply inline.
> >
> > Thanks,
> > Anil Gupta
> >
> > On Sun, Dec 16, 2012 at 8:02 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:
> >
> >> Hi Anil
> >> During the scan, there is no need to fetch any index data to the
> >> client side, so there is no need to create a scanner on the index
> >> table at the client side. This happens at the server side.
> >>
> >> For the scan on the main table with conditions on timestamp and customer
> >> id, a scanner is created with Filters, just as it would be when there is
> >> no secondary index. So this scan from the client will go through all the
> >> regions in the main table.
> >
> >
> > Anil: Do you mean that if the table is spread across 50 region servers in a
> > 60-node cluster, then we need to send a scan request to all 50 RS?
> > Doesn't that sound expensive? IMHO you were not doing this in your
> > solution. Your solution looked cleaner than this, since you knew exactly
> > which node to go to when querying through the secondary index, due to
> > co-location (from the static begin part of the secondary table rowkey) of
> > the primary table region and the secondary index table region. My problem
> > is a little more complicated because of one constraint: I cannot have a
> > "static begin part" in the rowkey of my secondary table.
> >
> >> When it scans one particular region, say (x,y], on the main table, using
> >> the CP we can get the index table region object corresponding to this main
> >> table region from the RS. There is no issue in creating the static part of
> >> the rowkey: you know 'x' is the region start key. Then the server side
> >> will create a scanner on the index region directly, and here we can specify
> >> the startkey: 'x' + <timestamp value> + <customer id>. Using the results
> >> from the index scan, we will reseek on the main region to the exact
> >> rows where the data we are interested in is available. So there won't
> >> be a full region data scan happening.
> >
> >> In the cases where only the timestamp is given but no customer id, it
> >> is simple again. Create a scanner on the main table with only one
> >> filter. At the CP side, the scanner on the index region will get created
> >> with startkey 'x' + <timestamp value>. When you create the Scan
> >> object and set startRow on it, it need not be the full rowkey. It can
> >> also be part of the rowkey, like a prefix.
> >>
> >> Hope u got it now :)
> > Anil: I hope we are now on the same page. Thanks a lot for your valuable time.
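
The region-local approach Anoop describes in the quoted text can be sketched as below. This is an in-memory simulation, not coprocessor code: the Region class, region_local_lookup, the fixed-width timestamp and customer id fields, and the trailing main rowkey in each index rowkey are all assumptions made so that the start key 'x' + <timestamp value> + <customer id> works as a plain string prefix.

```python
from bisect import bisect_left


class Region:
    """A sorted run of (rowkey, value) pairs standing in for one region."""

    def __init__(self, rows):
        self.rows = sorted(rows, key=lambda r: r[0])
        self.keys = [k for k, _ in self.rows]

    def scan_from(self, start_key):
        """Yield rows with rowkey >= start_key, in rowkey order."""
        for i in range(bisect_left(self.keys, start_key), len(self.rows)):
            yield self.rows[i]

    def seek(self, key):
        """Jump straight to one row, as a reseek would; None if absent."""
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.rows[i][1]
        return None


def region_local_lookup(main_region, index_region, region_start, timestamp, customer_id=""):
    """Scan the co-located index region by prefix, then seek into the main region."""
    # The start key may be a partial rowkey: the customer id is optional.
    prefix = region_start + timestamp + customer_id
    hits = []
    for index_key, main_key in index_region.scan_from(prefix):
        if not index_key.startswith(prefix):
            break  # past the matching range, stop the index scan
        hits.append((main_key, main_region.seek(main_key)))
    return hits


# Main region with start key "m", and its co-located index region.
main_region = Region([
    ("m001", {"amount": 10}),
    ("m002", {"amount": 20}),
    ("m003", {"amount": 30}),
])
index_region = Region([
    ("m" + "20121214" + "c042" + "m001", "m001"),
    ("m" + "20121214" + "c099" + "m002", "m002"),
    ("m" + "20121215" + "c042" + "m003", "m003"),
])

by_ts_and_customer = region_local_lookup(main_region, index_region, "m", "20121214", "c042")
by_ts_only = region_local_lookup(main_region, index_region, "m", "20121214")
```

Only the index rows under the prefix are read, and each hit becomes a direct seek into the main region, so no full scan of the main region's data takes place.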

Thanks & Regards,
Anil Gupta