HBase, mail # user - Re: HBase - Secondary Index


Re: HBase - Secondary Index
ramkrishna vasudevan 2013-01-09, 04:11
As far as I can see, it is more about using the coprocessor framework in
this solution, which helps us a great deal in avoiding unnecessary RPC calls
when we go with region-level indexing.

Regards
Ram

On Wed, Jan 9, 2013 at 8:52 AM, Anoop Sam John <[EMAIL PROTECTED]> wrote:

> Totally agree with Lars. The design came out of our usage, data
> distribution style, etc.
> Also, we could not compromise on put performance. That is why the
> region-collocation-based, region-level indexing design came about :)
> Also, since the indexing and the index usage all happen on the server
> side, no change is needed in the client, whatever type of client you
> use: Java code, REST APIs, or anything else. MR-based parallel scans
> should also be comparatively easy, I feel, as absolutely no changes are
> needed on the client side. :)
>
> As Anil said, every approach has pros and cons, and you need to adopt
> the one that suits your usage. :)
>
> -Anoop-
> ________________________________________
> From: anil gupta [[EMAIL PROTECTED]]
> Sent: Wednesday, January 09, 2013 6:58 AM
> To: [EMAIL PROTECTED]; lars hofhansl
> Subject: Re: HBase - Secondary Index
>
> +1 on Lars's comment.
>
> Either the client gets the rowkey from the secondary table and then gets
> the real data from the primary table, ** OR ** it sends the request to all
> the RSs (or regions) hosting a region of the primary table.
>
> Anoop is using the latter mechanism. Both mechanisms have their pros and
> cons. IMO, there is no outright winner.
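
To make the two mechanisms Anil describes concrete, here is a toy in-memory sketch that counts round trips for each. It is not HBase client code: `Region`, `build`, `query_global` and `query_regional` are hypothetical names, a dict stands in for the secondary table, and "calls" is only a rough proxy for RPCs.

```python
from collections import defaultdict


class Region:
    """One region of the primary table plus a local index on one column."""

    def __init__(self):
        self.rows = {}                       # rowkey -> {column: value}
        self.local_index = defaultdict(set)  # value -> rowkeys in this region

    def put(self, rowkey, row, indexed_col):
        self.rows[rowkey] = row
        self.local_index[row[indexed_col]].add(rowkey)


def build(data, indexed_col, num_regions):
    """Split rows into contiguous key ranges and build both kinds of index."""
    regions = [Region() for _ in range(num_regions)]
    global_index = defaultdict(set)  # stands in for a separate secondary table
    for i, (rowkey, row) in enumerate(sorted(data.items())):
        region_id = i * num_regions // len(data)
        regions[region_id].put(rowkey, row, indexed_col)
        global_index[row[indexed_col]].add((region_id, rowkey))
    return regions, global_index


def query_global(regions, global_index, value):
    """Mechanism 1: read the secondary table, then get each row by key."""
    calls = 1  # one lookup against the secondary table
    hits = []
    for region_id, rowkey in sorted(global_index.get(value, ())):
        hits.append(regions[region_id].rows[rowkey])
        calls += 1  # one get per matching row
    return hits, calls


def query_regional(regions, value):
    """Mechanism 2: ask every region; each consults its own local index."""
    calls = 0
    hits = []
    for region in regions:
        calls += 1  # one request per region, whether or not it has matches
        for rowkey in sorted(region.local_index.get(value, ())):
            hits.append(region.rows[rowkey])
    return hits, calls
```

In this model a selective point query costs the global index two calls regardless of cluster size, while the regional index always costs one call per region. A query that matches many rows reverses the picture, which is the "no outright winner" point.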
>
> ~Anil Gupta
>
> On Tue, Jan 8, 2013 at 4:30 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > Different use cases.
> >
> >
> > For global point queries you want exactly what you said below.
> > For range scans across many rows you want Anoop's design. As usual, it
> > depends.
> >
> >
> > The tradeoff is bringing a lot of unnecessary data to the client vs.
> > having to contact each region (or at least each region server).
> >
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: Michael Segel <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: Tuesday, January 8, 2013 6:33 AM
> > Subject: Re: HBase - Secondary Index
> >
> > So if you're using an inverted table / index why on earth are you doing
> > it at the region level?
> >
> > I tried to explain this to others over 6 months ago, and it's not really
> > a good idea.
> >
> > You're overcomplicating this, and you will end up creating performance
> > bottlenecks when your secondary index is completely orthogonal to your
> > row key.
> >
> > To give you an example...
> >
> > Suppose you're CCCIS and you have a large database of auto insurance
> > claims that you've acquired over the years from your Pathways product.
> >
> > Your primary key would be a combination of the Insurance Company's ID and
> > their internal claim ID for the individual claim.
> > Your row would be all of the data associated to that claim.
> >
> > So now let's say you want to find the average cost to repair a front-end
> > collision of an S80 Volvo.
> > The make and model of the car would be orthogonal to the initial key.
> > This means that the result set containing insurance records for front-end
> > collisions of S80 Volvos would most likely be evenly distributed across
> > the cluster's regions.
> >
> > If you used a series of inverted tables, you would be able to use a
> > series of get()s to get the result set from each index and then find
> > their intersections. (Note that you could also put them in sort order so
> > that the intersections would be fairly straightforward to find.)
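
The intersection step Michael mentions can be sketched as a merge over sorted row-key lists. This is a minimal illustration, assuming each inverted table maps a column value to an ascending list of primary row keys; the function names and the `companyId:claimId` key layout in the usage note are hypothetical, and plain dicts stand in for the get()s against real index tables.

```python
def intersect_sorted(a, b):
    """Merge-style intersection of two ascending lists of row keys."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def lookup(inverted_tables, criteria):
    """Return the row keys that match every (column, value) pair.

    inverted_tables: {column: {value: ascending list of row keys}}
    criteria:        {column: value}
    """
    result = None
    for column, value in criteria.items():
        # One lookup per inverted table, standing in for a get().
        keys = inverted_tables[column].get(value, [])
        result = keys if result is None else intersect_sorted(result, keys)
    return result or []
```

For example, with `make`, `model` and `damage` tables holding `["A:1", "A:3", "B:2"]`, `["A:3", "B:2", "C:9"]` and `["A:1", "B:2"]` for Volvo, S80 and front respectively, `lookup` leaves only `"B:2"`, the one claim present in all three lists. Because the lists are sorted, each intersection is a single linear pass.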
> >
> > Doing this at the region level isn't so simple.
> >
> > So I have to ask again: why go through all this and overcomplicate things?
> >
> > Just saying...
> >
> > On Jan 7, 2013, at 7:49 AM, Anoop Sam John <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > > It is inverted index based on column(s) value(s)
> > > It will be region wise indexing. Can work when some one knows the