RE: HBase - Secondary Index
Totally agree with Lars.  The design came out of our usage pattern, data distribution style, etc.
We also could not compromise on put performance. That is why we arrived at the region-collocated, region-level indexing design :)
Also, since both index maintenance and index usage happen on the server side, no client-side change is needed whatever type of client you use: Java code, REST APIs, or anything else.  MR-based parallel scans should be comparatively easy as well, I feel, since absolutely no changes are needed on the client side.  :)

As Anil said, every approach has its pros and cons, and you need to adopt whichever one suits your usage. :)

From: anil gupta [[EMAIL PROTECTED]]
Sent: Wednesday, January 09, 2013 6:58 AM
To: [EMAIL PROTECTED]; lars hofhansl
Subject: Re: HBase - Secondary Index

+1 on Lars's comment.

Either the client gets the rowkey from the secondary table and then gets the
real data from the primary table, ** OR ** it sends the request to all the
RSs (or regions) hosting a region of the primary table.

Anoop is using the latter mechanism. Both mechanisms have their pros and
cons. IMO, there is no outright winner.

~Anil Gupta
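
The two options above can be compared with a small in-memory sketch. This is only a toy model under stated assumptions: `Region`, `Cluster`, `query_global` and `query_fanout` are hypothetical names, not HBase APIs, and a Python dict stands in for both the secondary table and the per-region index. It is meant to show where the work lands in each approach, not how either design is actually implemented.

```python
from collections import defaultdict


class Region:
    """Toy stand-in for one region of the primary table (not an HBase class)."""

    def __init__(self):
        self.rows = {}                        # rowkey -> {column: value}
        self.local_index = defaultdict(set)   # (column, value) -> rowkeys in THIS region

    def put(self, rowkey, row):
        # Data and its index entry are written in the same region.
        # Simplification: overwriting a row does not clean up old index entries.
        self.rows[rowkey] = row
        for col, val in row.items():
            self.local_index[(col, val)].add(rowkey)


class Cluster:
    def __init__(self, split_keys):
        # split_keys: sorted rowkey boundaries between regions (hypothetical).
        self.split_keys = split_keys
        self.regions = [Region() for _ in range(len(split_keys) + 1)]
        # Stands in for a separate secondary (inverted) table.
        self.global_index = defaultdict(set)

    def region_for(self, rowkey):
        for i, split in enumerate(self.split_keys):
            if rowkey < split:
                return self.regions[i]
        return self.regions[-1]

    def put(self, rowkey, row):
        self.region_for(rowkey).put(rowkey, row)
        for col, val in row.items():
            self.global_index[(col, val)].add(rowkey)

    def query_global(self, col, val):
        """Option 1: read rowkeys from the secondary table, then get the rows."""
        rowkeys = sorted(self.global_index.get((col, val), ()))
        # Counts primary-table regions only; the index lookup itself is extra.
        contacted = {id(self.region_for(k)) for k in rowkeys}
        rows = [(k, self.region_for(k).rows[k]) for k in rowkeys]
        return rows, len(contacted)

    def query_fanout(self, col, val):
        """Option 2: ask every region; each answers from its own local index."""
        rows = []
        for region in self.regions:
            for k in region.local_index.get((col, val), ()):
                rows.append((k, region.rows[k]))
        return sorted(rows, key=lambda r: r[0]), len(self.regions)
```

Both queries return the same rows. In this model the first touches only the regions that hold matches, after an extra index lookup, while the second always touches every region but needs no second table and no client-side join.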

On Tue, Jan 8, 2013 at 4:30 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Different use cases.
> For global point queries you want exactly what you said below.
> For range scans across many rows you want Anoop's design. As usual, it
> depends.
> The tradeoff is bringing a lot of unnecessary data to the client vs having
> to contact each region (or at least each region server).
> -- Lars
> ________________________________
>  From: Michael Segel <[EMAIL PROTECTED]>
> Sent: Tuesday, January 8, 2013 6:33 AM
> Subject: Re: HBase - Secondary Index
> So if you're using an inverted table / index why on earth are you doing it
> at the region level?
> I've tried to explain this to others over 6 months ago and it's not really
> a good idea.
> You're overcomplicating this and you will end up creating performance
> bottlenecks when your secondary index is completely orthogonal to your row
> key.
> To give you an example...
> Suppose you're CCCIS and you have a large database of auto insurance
> claims that you've acquired over the years from your Pathways product.
> Your primary key would be a combination of the Insurance Company's ID and
> their internal claim ID for the individual claim.
> Your row would be all of the data associated to that claim.
> So now let's say you want to find the average cost to repair a front end
> collision of an S80 Volvo.
> The make and model of the car would be orthogonal to the initial key. This
> means that the result set containing insurance records for Front End
> collisions of S80 Volvos would be most likely evenly distributed across the
> cluster's regions.
> If you used a series of inverted tables, you would be able to use a series
> of get()s to get the result set from each index and then find their
> intersections. (Note that you could also put them in sort order so that the
> intersections would be fairly straightforward to find.)
> Doing this at the region level isn't so simple.
> So I have to again ask why go through and overcomplicate things?
> Just saying...
> On Jan 7, 2013, at 7:49 AM, Anoop Sam John <[EMAIL PROTECTED]> wrote:
> > Hi,
> > It is an inverted index based on column value(s).
> > It is region-wise indexing. It works whether or not someone knows the
> > rowkey range.
> >
> > -Anoop-
> > ________________________________________
> > From: Mohit Anchlia [[EMAIL PROTECTED]]
> > Sent: Monday, January 07, 2013 9:47 AM
> > Subject: Re: HBase - Secondary Index
> >
> > Hi Anoop,
> >
> > Am I correct in understanding that this indexing mechanism is only
> > applicable when you know the row key? It's not an inverted index truly
> > based on the column value.
Thanks & Regards,
Anil Gupta
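
Michael's inverted-table example quoted above (front end collisions of S80 Volvos) can be sketched roughly as follows. This is a minimal illustration, assuming each inverted table maps an attribute value to a sorted list of primary row keys. The table names, row keys and repair costs are made up for the example, and plain dicts stand in for the get() calls against each index.

```python
def intersect_sorted(a, b):
    """Merge-intersect two ascending lists of row keys in one pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical inverted tables: attribute value -> sorted primary row keys.
# Row keys follow the "<insurance company id>|<claim id>" shape from the thread.
index_by_model = {"Volvo S80": ["acme|0007", "acme|0042", "zenith|0003"]}
index_by_damage = {"front end": ["acme|0042", "best|0019", "zenith|0003"]}

# Hypothetical primary table: row key -> claim data (illustrative numbers only).
claims = {
    "acme|0007": {"repair_cost": 1800.0},
    "acme|0042": {"repair_cost": 3200.0},
    "best|0019": {"repair_cost": 950.0},
    "zenith|0003": {"repair_cost": 4100.0},
}


def average_repair_cost(model, damage):
    # One lookup per inverted table, then intersect the sorted key lists.
    keys = intersect_sorted(
        index_by_model.get(model, []),
        index_by_damage.get(damage, []),
    )
    if not keys:
        return None
    # Fetch only the matching rows from the primary table.
    return sum(claims[k]["repair_cost"] for k in keys) / len(keys)
```

Because both key lists are kept in sort order, the intersection is a single linear merge, which is the point of the parenthetical note in the quoted message.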