Re: HBase - Secondary Index
Anoop John 2012-12-06, 01:17
Thanks for taking a look, and for your concerns and comments.
@Jon: Yes, consistency is maintained now. Since the puts to both tables
happen in the same RS, this was a comparatively easy job. I can share more
granular details on how this is done in later mails.
Yes, as Jon says, when the cluster is really big, with 100s of RSs, the
scan needs to visit every region in every RS. But some regions may hold no
data matching the condition, and there the scan will end immediately. We
were also thinking about ways to add custom blooms, so that a special
bloom on the index region can tell whether a column value is present in
that region or not. We have not tried this yet or done the perf comparison.
We are now in the process of profiling and fine tuning. Our feeling is that
per-region indexing would still be better than a full table scan.
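To make the fan-out concrete, here is a toy sketch in plain Python. It is
not HBase code: the Region class, the indexed_scan function and the exact
set standing in for a bloom filter are all made up for illustration, and a
real bloom could also return false positives.

```python
from collections import defaultdict


class Region:
    """Toy stand-in for one table region plus its co-located index region."""

    def __init__(self, name):
        self.name = name
        self.rows = {}                   # row key -> indexed column value
        self.index = defaultdict(list)   # column value -> row keys
        self.seen_values = set()         # exact set standing in for a bloom

    def put(self, row_key, value):
        # Data and index entry are written together in the same region.
        old = self.rows.get(row_key)
        if old is not None:
            self.index[old].remove(row_key)   # drop the stale index entry
        self.rows[row_key] = value
        self.index[value].append(row_key)
        self.seen_values.add(value)           # never shrinks, like a bloom

    def might_contain(self, value):
        return value in self.seen_values

    def lookup(self, value):
        return sorted(self.index.get(value, []))


def indexed_scan(regions, value):
    """Visit every region, but read the index only where the filter passes."""
    hits, probed = [], 0
    for region in regions:
        if not region.might_contain(value):
            continue                          # this region ends immediately
        probed += 1
        hits.extend(region.lookup(value))
    return hits, probed


regions = [Region(f"r{i}") for i in range(4)]
regions[0].put("row-001", "blue")
regions[2].put("row-201", "red")
regions[2].put("row-202", "blue")

hits, probed = indexed_scan(regions, "blue")
print(hits, probed)
```

The scan still has to contact all four regions, but only the two that
might hold "blue" do any index reads. How much that saves on a real cluster
is exactly what the profiling would have to show.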
Also, one main reason for not taking the client-based approach, where there
is a globally ordered index, is put performance. In the past, when we built
a fully client-based solution (something like Lily), put performance did
not meet our goal, as one put was potentially making many puts. In our
customer's case, some tables have up to 5 indices.
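As rough arithmetic, a small Python sketch of the put counts seen by the
client. The function name is hypothetical, and it assumes that each index
entry costs one extra put in the client-based design and that the
region-side design hides the index writes from the client entirely.

```python
def puts_issued_by_client(num_indices, index_location):
    """Count the puts the client itself sends for one logical row put.

    Assumption: with a client-maintained index, every index needs its own
    put; with a region-side index, the client sends only the data put.
    """
    if index_location == "client":
        return 1 + num_indices
    if index_location == "region":
        return 1
    raise ValueError(f"unknown index location: {index_location!r}")


client_puts = puts_issued_by_client(5, "client")
region_puts = puts_issued_by_client(5, "region")
print(client_puts, region_puts)
```

With 5 indices the client-based design sends six puts per row where the
region-side design sends one. The index writes still happen in the second
case, only inside the RS next to the data.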
@Ted: It is yet to be used at production level. The apps using the
indexing are under way now.
Good to hear from everyone, so that we can also get more scenarios,
concerns and opinions from the experts. :)
On Thu, Dec 6, 2012 at 3:28 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
> re: "It seems everybody wants secondary indexes in HBase. The problem is
> that most folks don't agree what that actually means."
> Agreed. And often times when people say "secondary indexing" they may be
> talking about an access path on something that isn't represented in the
> key. So you'd not just need a "row key schema" (which is what we were
> talking about in HBASE-7221), but a schema for the columns too.
> On 12/5/12 4:03 PM, "lars hofhansl" <[EMAIL PROTECTED]> wrote:
> >I'd say "it depends".
> >It seems everybody wants secondary indexes in HBase. The problem is that
> >most folks don't agree what that actually means.
> >The most interesting problem and discussion point (IMHO) is that HBase
> >would need some form of schema description.
> >See also HBASE-7221. Assuming we had something like that, it could be a
> >building block for a simple built-in secondary index solution.
> >In the end you're probably right and we cannot prescribe a single
> >secondary index solution that will fit all use cases.
> >-- Lars
> >----- Original Message -----
> >From: Jonathan Hsieh <[EMAIL PROTECTED]>
> >To: [EMAIL PROTECTED]
> >Sent: Wednesday, December 5, 2012 11:23 AM
> >Subject: Re: HBase - Secondary Index
> >I personally feel that we should only add the primitive APIs necessary to
> >make this work (and if none are required, all the better!), and try to
> >keep secondary indexing work as a separate project on top of HBase,
> >because there are many possible valid architectures and implementations.
> >I have two question areas -- one touched upon in previous messages -- what
> >logging is done, and what consistency guarantees do we get between the
> >index and primary?
> >The other has to do with scalability. I'm not sure I interpreted this
> >correctly, but from slides 8 and 13, is the architecture such that each
> >primary table region has a corresponding index table region?
> >Is slide 14 a comparison of a full table scan vs the indexed lookups on 4
> >RSs? What happens if we go up to 20, or 100 RSs? If I'm right about
> >one index region per table region, I have a feeling this isn't going to
> >scale well with a large number of regions (since it would potentially have
> >to talk to each region and essentially every region server).
> >On Wed, Dec 5, 2012 at 9:54 AM, Anoop John <[EMAIL PROTECTED]> wrote: