HBase >> mail # user >> secondary index feature


rajeshbabu chintaguntla 2014-01-03, 10:19
Re: secondary index feature
Are the regions scanned in parallel?
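As a rough aid for reading the numbers quoted below, here is a minimal sketch that normalizes the reported times per region. The figures are copied from the quoted table; the function and field names are made up for illustration, and the ratios by themselves do not answer whether the regions are scanned in parallel.

```python
# Figures copied from the quoted table:
# (regions per RS, total regions, block size in KB, matching rows, seconds)
RUNS = [
    (50, 200, 64, 199, 102),
    (50, 200, 8, 199, 35),
    (100, 400, 8, 350, 95),
    (200, 800, 8, 353, 153),
]
GB_PER_REGION = 2  # "Data per region: 2 GB" in the quoted message


def summarize(run):
    """Derive per-region figures for one reported run (illustrative helper)."""
    regions_per_rs, total_regions, block_kb, rows, seconds = run
    return {
        "total_regions": total_regions,
        "block_kb": block_kb,
        "total_data_gb": total_regions * GB_PER_REGION,
        # Elapsed time divided by all regions in the table.
        "ms_per_region": 1000.0 * seconds / total_regions,
        # Elapsed time divided by the regions hosted on a single region server.
        "ms_per_region_on_one_rs": 1000.0 * seconds / regions_per_rs,
    }


if __name__ == "__main__":
    for run in RUNS:
        print(summarize(run))
```

With an 8 KB block size the elapsed time per region stays within the same order of magnitude (175 ms, 237.5 ms and 191.25 ms for 200, 400 and 800 regions), so the total time grows with the region count in these runs.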

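For context on the trade-off discussed further down in the quoted thread (region-level indexing versus a separate index table of (index value, entity key) pairs), here is a toy in-memory sketch. It does not use the HBase, Phoenix or HIndex APIs. All class and method names are hypothetical, rows are placed by hash rather than by row-key range, and index maintenance on updates is ignored. It only counts how many regions a point lookup by index value has to touch in each scheme.

```python
import zlib
from collections import defaultdict


class Region:
    """Toy stand-in for one region: its rows plus a region-local index."""

    def __init__(self):
        self.rows = {}                       # entity key -> indexed column value
        self.local_index = defaultdict(set)  # index value -> entity keys in this region


class ToyCluster:
    """Hypothetical model; real HBase splits regions by row-key range, not by hash."""

    def __init__(self, num_regions):
        self.regions = [Region() for _ in range(num_regions)]
        self.index_table = defaultdict(set)  # global index: index value -> entity keys

    def region_of(self, key):
        # Deterministic placement so the example is repeatable.
        return zlib.crc32(key.encode("utf-8")) % len(self.regions)

    def put(self, key, value):
        region = self.regions[self.region_of(key)]
        region.rows[key] = value
        region.local_index[value].add(key)   # RLI: index entry lives next to the row
        self.index_table[value].add(key)     # global: (index value, entity key) pair

    def get_via_local_index(self, value):
        """Every region must be asked, because any of them may hold a match."""
        keys = set()
        for region in self.regions:
            keys |= region.local_index.get(value, set())
        return keys, len(self.regions)

    def get_via_index_table(self, value):
        """One index lookup, then only the regions that hold matching rows."""
        keys = set(self.index_table.get(value, set()))
        data_regions = {self.region_of(k) for k in keys}
        return keys, 1 + len(data_regions)   # 1 = the index-table read


if __name__ == "__main__":
    cluster = ToyCluster(num_regions=16)
    for i in range(1000):
        cluster.put("row-%04d" % i, "value-%d" % (i % 500))
    print(cluster.get_via_local_index("value-7"))
    print(cluster.get_via_index_table("value-7"))
```

In this model the region-local lookup always touches all 16 regions, while the index-table lookup touches one index region plus at most one data region per matching row. Whether that difference matters in practice depends on how the region-local lookups are fanned out, which is what the question above is getting at.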
On Friday, January 3, 2014, rajeshbabu chintaguntla wrote:

>
> Here are some performance numbers with RLI.
>
> Number of region servers: 4
> Data per region: 2 GB
>
> Regions/RS | Total regions | Block size (KB) | Rows matching values | Time taken (sec)
>         50 |           200 |              64 |                  199 |              102
>         50 |           200 |               8 |                  199 |               35
>        100 |           400 |               8 |                  350 |               95
>        200 |           800 |               8 |                  353 |              153
>
> Without the secondary index, the scan takes hours.
>
>
> Thanks,
> Rajeshbabu
> ________________________________________
> From: Anoop John [[EMAIL PROTECTED]]
> Sent: Friday, January 03, 2014 3:22 PM
> To: [EMAIL PROTECTED]
> Subject: Re: secondary index feature
>
> > Is there any data on how RLI (or in particular Phoenix) query throughput
> > correlates with the number of region servers assuming homogeneously
> > distributed data?
>
> Phoenix has yet to add RLI. For now it has global indexing only. Correct,
> James?
>
> The RLI implementation from Huawei (HIndex) has some numbers with respect to
> regions, but I doubt there are any for a large number of region servers. Do
> you have some data, Rajesh Babu?
>
> -Anoop-
>
> On Fri, Jan 3, 2014 at 3:11 PM, Henning Blohm <[EMAIL PROTECTED]> wrote:
>
> > Jesse, James, Lars,
> >
> > after looking around a bit and in particular looking into Phoenix (which I
> > find very interesting), assuming that you want secondary indexing on
> > HBase without adding other infrastructure, there does not seem to be a lot
> > of choice really: either go with a region-level (and coprocessor-based)
> > indexing feature (Phoenix, Huawei, is IHBase dead?) or add an index table
> > to store (index value, entity key) pairs.
> >
> > The main concern I have with region-level indexing (RLI) is that Gets
> > potentially need to visit all regions. Compared to global index tables
> > this seems to flatten the read-scalability curve of the cluster. In our
> > case, we have a large data set (hence HBase) that will be queried (mostly
> > point-gets via an index) in some linear correlation with its size.
> >
> > Is there any data on how RLI (or in particular Phoenix) query throughput
> > correlates with the number of region servers assuming homogeneously
> > distributed data?
> >
> > Thanks,
> > Henning
> >
> >
> >
> >
> > On 24.12.2013 12:18, Henning Blohm wrote:
> >
> >>  All that sounds very promising. I will give it a try and let you know
> >> how things worked out.
> >>
> >> Thanks,
> >> Henning
> >>
> >> On 12/23/2013 08:10 PM, Jesse Yates wrote:
> >>
> >>> The work that James is referencing grew out of the discussions Lars and I
> >>> had (which led to those blog posts). The solution we implemented is
> >>> designed to be generic, as James mentioned above, but was written with
> >>> all the hooks necessary for Phoenix to do some really fast updates (or
> >>> skip updates in the case where there is no change).
> >>>
> >>> You should be able to plug in your own simple index builder (there is an
> >>> example in the phoenix codebase
> >>> <https://github.com/forcedotcom/phoenix/tree/master/src/main/java/com/salesforce/hbase/index/covered/example>)
> >>> to get a basic solution which supports the same transactional guarantees
> >>> as HBase (per row) + data guarantees across the index rows. There are
> >>> more details in the presentations James linked.
> >>>
> >>> I'd love to see if your implementation can fit into the framework we
> >>> wrote. We would be happy to work with you to see if it needs some more
> >>> hooks or modifications. I have a feeling this is pretty much what you
> >>> guys will need.
> >>> -Jesse
> >>>
> >>>
> >>> On Mon, Dec 23, 2013 at 10:01 AM, James Taylor <[EMAIL PROTECTED]>
> >>> wrote:
> >>>
> >>>> Henning,
> >>>> Jesse Yates wrote the back-end of our global secondary indexing system
> >>>> in Phoenix. He designed it as a separate, pluggable module with no
> >>>> Phoenix dependencies. Here's an overview of the feature:
> >>>> https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing. The