HBase, mail # user - secondary index feature


Henning Blohm 2013-12-22, 10:11
Ted Yu 2013-12-22, 13:34
Pradeep Gollakota 2013-12-22, 15:53
Pradeep Gollakota 2013-12-22, 16:00
Ted Yu 2013-12-22, 16:09
Anoop John 2013-12-22, 16:41
Henning Blohm 2013-12-23, 11:13
lars hofhansl 2013-12-22, 22:37
Henning Blohm 2013-12-23, 11:47
James Taylor 2013-12-23, 18:01
Jesse Yates 2013-12-23, 19:10
Henning Blohm 2013-12-24, 11:18
Henning Blohm 2014-01-03, 09:41
Anoop John 2014-01-03, 09:52
rajeshbabu chintaguntla 2014-01-03, 10:19

Re: secondary index feature
Asaf Mesika 2014-01-03, 13:56
Are the regions scanned in parallel?

On Friday, January 3, 2014, rajeshbabu chintaguntla wrote:

>
> Here are some performance numbers with RLI.
>
> Region servers:  4
> Data per region: 2 GB
>
> Regions/RS | Total regions | Block size (KB) | Matching rows | Time taken (s)
> -----------+---------------+-----------------+---------------+---------------
>         50 |           200 |              64 |           199 |            102
>         50 |           200 |               8 |           199 |             35
>        100 |           400 |               8 |           350 |             95
>        200 |           800 |               8 |           353 |            153
>
> Without the secondary index, the scan takes hours.
>
>
> Thanks,
> Rajeshbabu
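
The following small Java sketch puts Rajeshbabu's numbers side by side by dividing each reported query time by the total number of regions. The figures are copied from the table above. The class and record names are illustrative only, and nothing here is part of HBase or HIndex.

```java
import java.util.List;

public class RliTimings {
    // One row of the RLI measurements reported above (4 region servers, 2 GB per region).
    record Run(int regionsPerRs, int totalRegions, int blockSizeKb, int matchingRows, int seconds) {
        // Wall-clock seconds for the whole query divided by the total number of regions.
        double secondsPerRegion() {
            return (double) seconds / totalRegions;
        }
    }

    static final List<Run> RUNS = List.of(
            new Run(50, 200, 64, 199, 102),
            new Run(50, 200, 8, 199, 35),
            new Run(100, 400, 8, 350, 95),
            new Run(200, 800, 8, 353, 153));

    public static void main(String[] args) {
        for (Run r : RUNS) {
            System.out.printf("%d regions, %d KB blocks: %.3f s per region%n",
                    r.totalRegions(), r.blockSizeKb(), r.secondsPerRegion());
        }
    }
}
```

For the 8 KB runs this comes to between about 0.17 and 0.24 seconds per region (35/200, 95/400, 153/800), while total time grows with the region count. These figures alone do not show whether the regions were scanned in parallel.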
> ________________________________________
> From: Anoop John [[EMAIL PROTECTED]]
> Sent: Friday, January 03, 2014 3:22 PM
> To: [EMAIL PROTECTED]
> Subject: Re: secondary index feature
>
> > Is there any data on how RLI (or in particular Phoenix) query throughput
> > correlates with the number of region servers assuming homogeneously
> > distributed data?
>
> Phoenix has yet to add RLI; right now it has global indexing only. Correct,
> James?
>
> The RLI implementation from Huawei (HIndex) has some numbers with respect to
> regions, but I doubt whether there are any for a large number of RSs. Do you
> have some data, Rajesh Babu?
>
> -Anoop-
>
> On Fri, Jan 3, 2014 at 3:11 PM, Henning Blohm <[EMAIL PROTECTED]> wrote:
>
> > Jesse, James, Lars,
> >
> > after looking around a bit, and in particular looking into Phoenix (which I
> > find very interesting), it seems that if you want secondary indexing on
> > HBase without adding other infrastructure, there is not much choice:
> > either go with a region-level (coprocessor-based) indexing feature
> > (Phoenix, Huawei; is IHBase dead?) or add an index table that stores
> > (index value, entity key) pairs.
> >
> > The main concern I have with region-level indexing (RLI) is that Gets
> > potentially have to visit all regions. Compared to global index tables,
> > this seems to flatten the read-scalability curve of the cluster. In our
> > case, we have a large data set (hence HBase) that will be queried (mostly
> > point-gets via an index) at a rate that grows roughly linearly with its size.
> >
> > Is there any data on how RLI (or in particular Phoenix) query throughput
> > correlates with the number of region servers assuming homogeneously
> > distributed data?
> >
> > Thanks,
> > Henning
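
Henning's concern can be illustrated with a toy, in-memory model. It is plain Java with no HBase client calls. With region-level indexing, a lookup by value has to consult every region's local index, and this sketch does that in parallel. A global index table stores (index value, entity key) pairs, so one range scan answers the same query. The class names, the NUL-separated key layout and the round-robin row placement are assumptions made for illustration. They are not how Phoenix, HIndex or IHBase actually lay out their data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IndexLookupModel {
    // Toy stand-in for one region: a region-local index from an indexed value
    // to the entity (row) keys stored in this region.
    static final class Region {
        final Map<String, Set<String>> localIndex = new HashMap<>();

        void put(String entityKey, String value) {
            localIndex.computeIfAbsent(value, v -> new TreeSet<>()).add(entityKey);
        }

        Set<String> lookup(String value) {
            return localIndex.getOrDefault(value, Set.of());
        }
    }

    // Hypothetical global index row key: index value, a NUL separator, then the entity key.
    // Assumes index values never contain '\0'.
    static String indexKey(String value, String entityKey) {
        return value + '\0' + entityKey;
    }

    // Region-level indexing: any region may hold matches, so every region is asked.
    // The requests are issued concurrently on the given pool and the results merged.
    static Set<String> rliLookup(List<Region> regions, String value, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Future<Set<String>>> pending = new ArrayList<>();
        for (Region region : regions) {
            pending.add(pool.submit(() -> region.lookup(value)));
        }
        Set<String> result = new TreeSet<>();
        for (Future<Set<String>> f : pending) {
            result.addAll(f.get());
        }
        return result;
    }

    // Global index table: all keys for one value are contiguous in sort order,
    // so one range scan starting at the value's prefix finds every entity key.
    static Set<String> globalLookup(NavigableSet<String> indexTable, String value) {
        String prefix = value + '\0';
        Set<String> result = new TreeSet<>();
        for (String key : indexTable.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break; // past the key range for this value
            }
            result.add(key.substring(prefix.length()));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<Region> regions = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            regions.add(new Region());
        }
        NavigableSet<String> indexTable = new TreeSet<>();
        String[][] rows = {{"row1", "red"}, {"row2", "blue"}, {"row3", "red"}, {"row4", "green"}, {"row5", "red"}};
        for (int i = 0; i < rows.length; i++) {
            // Rows are spread round-robin here; real HBase regions hold contiguous key ranges.
            regions.get(i % regions.size()).put(rows[i][0], rows[i][1]);
            indexTable.add(indexKey(rows[i][1], rows[i][0]));
        }
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        try {
            System.out.println("RLI, " + regions.size() + " regions asked: " + rliLookup(regions, "red", pool));
            System.out.println("Global index, one range scan: " + globalLookup(indexTable, "red"));
        } finally {
            pool.shutdown();
        }
    }
}
```

Both lookups return [row1, row3, row5]. The difference is that the RLI path asked all four regions, while the global path read one contiguous key range. The trade-off, which this model does not cover, is on the write side: each data write also requires a write to the index table, and that table may live on another region server.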
> >
> >
> >
> >
> > On 24.12.2013 12:18, Henning Blohm wrote:
> >
> >> All that sounds very promising. I will give it a try and let you know
> >> how things work out.
> >>
> >> Thanks,
> >> Henning
> >>
> >> On 12/23/2013 08:10 PM, Jesse Yates wrote:
> >>
> >>> The work that James is referencing grew out of the discussions Lars and
> >>> I had (which led to those blog posts). The solution we implemented is
> >>> designed to be generic, as James mentioned above, but was written with
> >>> all the hooks necessary for Phoenix to do some really fast updates (or
> >>> to skip updates when there is no change).
> >>>
> >>> You should be able to plug in your own simple index builder (there is an
> >>> example in the Phoenix codebase
> >>> <https://github.com/forcedotcom/phoenix/tree/master/src/main/java/com/salesforce/hbase/index/covered/example>)
> >>> to get a basic solution that supports the same transactional guarantees
> >>> as HBase (per row) plus data guarantees across the index rows. There are
> >>> more details in the presentations James linked.
> >>>
> >>> I'd love to see if your implementation can fit into the framework we
> >>> wrote; we would be happy to work with you to see whether it needs more
> >>> hooks or modifications. I have a feeling this is pretty much what you
> >>> guys will need.
> >>>
> >>> -Jesse
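
Jesse mentions hooks that let Phoenix skip index updates when nothing has changed. The general idea can be sketched as computing an index delta for each data-row update. When the indexed value changes, delete the stale index row and put the new one; when it does not change, write nothing. This is a hypothetical illustration in plain Java, not the Phoenix index builder API. IndexDelta, IndexOp and the key layout are made up here. The sketch also assumes the row's previous value is already known, for example because it was read from the data row before the update was applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class IndexDelta {
    // Hypothetical index mutation: delete or put a single index row key.
    record IndexOp(boolean delete, String indexRowKey) {}

    // Assumed index row key layout: indexed value, NUL separator, data row key.
    static String indexRowKey(String value, String rowKey) {
        return value + '\0' + rowKey;
    }

    // Index mutations needed when a data row's indexed value goes from oldValue to newValue.
    // A null oldValue means the row was not indexed before; a null newValue means the value is removed.
    static List<IndexOp> indexUpdates(String rowKey, String oldValue, String newValue) {
        if (Objects.equals(oldValue, newValue)) {
            return List.of(); // unchanged: skip the index write entirely
        }
        List<IndexOp> ops = new ArrayList<>();
        if (oldValue != null) {
            ops.add(new IndexOp(true, indexRowKey(oldValue, rowKey)));  // remove stale entry
        }
        if (newValue != null) {
            ops.add(new IndexOp(false, indexRowKey(newValue, rowKey))); // add current entry
        }
        return ops;
    }

    public static void main(String[] args) {
        System.out.println(indexUpdates("row1", "red", "red").size());  // prints 0
        System.out.println(indexUpdates("row1", "red", "blue").size()); // prints 2
    }
}
```

Skipping the no-op case matters because, in a global index, every index write may go to a different region server than the data write.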
> >>>
> >>>
> >>> On Mon, Dec 23, 2013 at 10:01 AM, James Taylor <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Henning,
> >>>> Jesse Yates wrote the back-end of our global secondary indexing system in
> >>>> Phoenix. He designed it as a separate, pluggable module with no Phoenix
> >>>> dependencies. Here's an overview of the feature:
> >>>> https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing. The

rajeshbabu chintaguntla 2014-01-03, 14:05
Anoop John 2014-01-03, 16:57
Henning Blohm 2014-01-03, 20:46
James Taylor 2014-01-03, 20:53
Henning Blohm 2014-01-03, 21:11
James Taylor 2014-01-03, 21:34
Henning Blohm 2014-01-04, 18:32
Anoop John 2014-01-03, 11:01
ramkrishna vasudevan 2014-01-03, 13:48
Ted Yu 2014-01-03, 14:02
Henning Blohm 2013-12-23, 19:28