HBase, mail # user - Re: HBase - Secondary Index


RE: HBase - Secondary Index
Anoop Sam John 2012-12-28, 04:14
> Do you have link to that presentation?

http://hbtc2012.hadooper.cn/subject/track4TedYu4.pdf

-Anoop-

________________________________________
From: Mohit Anchlia [[EMAIL PROTECTED]]
Sent: Friday, December 28, 2012 9:12 AM
To: [EMAIL PROTECTED]
Subject: Re: HBase - Secondary Index

On Thu, Dec 27, 2012 at 7:33 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:

> Yes, as you say, as the number of rows to be returned grows, the latency
> grows too. Seeks within an HFile block are a somewhat expensive operation
> now (not much, but still). The new prefix trie encoding will be a huge
> bonus here; seeks will fly with it. [Ted also presented this at Hadoop
> China.] Thanks to Matt... :) I am trying to measure the scan performance
> with this new encoding, and to backport a simple patch to the 0.94 version
> just for testing... Yes, as the number of results to be returned grows,
> any index will perform worse, as per my study :)
>
> Do you have a link to that presentation?
> > btw, quick question: in your presentation, is the scale there seconds or
> > milliseconds? :)
>
> It is seconds. Don't consider the exact values. What matters is the %
> increase in latency :) Those were not high-end machines.
>
> -Anoop-
> ________________________________________
> From: Shengjie Min [[EMAIL PROTECTED]]
> Sent: Thursday, December 27, 2012 9:59 PM
> To: [EMAIL PROTECTED]
> Subject: Re: HBase - Secondary Index
>
> > Didn't follow you completely here. There won't be any get() happening.
> > Since we get the exact rowkey in a region from the index table, we can
> > seek to the exact position and return that row.
>
> Sorry, I misused "get()" here; I meant seeking. Yes, if only a small
> number of rows is returned, this works perfectly. As you said, you get
> the exact rowkey positions per region and simply seek to them. I was trying
> to work out the case where the number of result rows increases
> massively. In Anil's case, he wants to run a scan query against the
> secondary index (timestamp): "select all rows from timestamp1 to timestamp2"
> with no customerId provided. During that time period, he might have a big
> chunk of rows from different customerIds. The index table returns a lot of
> rowkey positions for different customerIds (I believe they are scattered
> across different regions), so you end up seeking to many different positions
> in different regions to return all the rows needed. According to your
> presentation, page 14, "Performance Test Results (Scan)": without an index,
> the time increases linearly as the number of result rows increases. With an
> index, on the other hand, the time spent climbs much more quickly.
>
> btw, quick question: in your presentation, is the scale there seconds or
> milliseconds? :)
>
> - Shengjie
>
>
> On 27 December 2012 15:54, Anoop John <[EMAIL PROTECTED]> wrote:
>
> > > how the massive number of get() is going to
> > > perform against the main table
> >
> > Didn't follow you completely here. There won't be any get() happening.
> > Since we get the exact rowkey in a region from the index table, we can
> > seek to the exact position and return that row.
> >
> > -Anoop-
> >
> > On Thu, Dec 27, 2012 at 6:37 PM, Shengjie Min <[EMAIL PROTECTED]>
> > wrote:
> >
> > > how the massive number of get() is going to
> > > perform against the main table
> > >
> >
>
>
>
> --
> All the best,
> Shengjie Min
>
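
To make the access pattern discussed in this thread concrete, here is a toy in-memory sketch in Python. It is not HBase code and uses no HBase API: the sorted lists, the "custNNNN|timestamp" rowkey layout, the fixed-size "regions", and every function name are assumptions made purely for illustration. It only shows why a timestamp range query through a secondary index turns into one seek per result row, spread over many regions, when the main table is keyed by customerId.

```python
import bisect


def build_tables(num_customers, events_per_customer):
    """Build a toy main table and a toy timestamp index (both hypothetical)."""
    main = []   # sorted rowkeys "custNNNN|timestamp", standing in for the main table
    index = []  # sorted (timestamp, rowkey) pairs, standing in for the index table
    for c in range(num_customers):
        for e in range(events_per_customer):
            ts = e * num_customers + c  # interleave customers over time
            rowkey = f"cust{c:04d}|{ts:08d}"
            main.append(rowkey)
            index.append((ts, rowkey))
    main.sort()
    index.sort()
    return main, index


def scan_via_index(main, index, ts_start, ts_end, rows_per_region):
    """Return rows with ts_start <= timestamp < ts_end, plus seek statistics."""
    # One contiguous range scan over the index table.
    lo = bisect.bisect_left(index, (ts_start, ""))
    hi = bisect.bisect_left(index, (ts_end, ""))
    rows = []
    seeks = 0
    regions = set()
    for _, rowkey in index[lo:hi]:
        # One seek into the main table per rowkey returned by the index.
        pos = bisect.bisect_left(main, rowkey)
        seeks += 1
        # Regions are modelled as fixed-size slices of the sorted main table.
        regions.add(pos // rows_per_region)
        rows.append(main[pos])
    return rows, seeks, len(regions)


if __name__ == "__main__":
    main_table, index_table = build_tables(num_customers=10, events_per_customer=100)
    for width in (10, 50, 500):
        _, seeks, regions = scan_via_index(main_table, index_table, 0, width, 100)
        print(f"time range width={width}: seeks={seeks}, regions touched={regions}")
```

In this model the number of seeks equals the number of result rows, and even a narrow time range touches every region, which matches the concern raised above. It says nothing about actual latency; real costs depend on block caching, encoding, and how the index is implemented.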