NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase >> mail # dev >> A general question on maxVersion handling when we have Secondary index tables


RE: A general question on maxVersion handling when we have Secondary index tables
Regarding colocation of the main table regions and the index table regions:
that is pretty much necessary.

Regarding whether the secondary index feature should be supported externally
or in core: given what we have done so far, I think it can work like
security, meaning the secondary index can be shipped along with core. If we
base our implementation on coprocessors, the overall changes to the kernel
seem to be minimal. If we are OK with shipping the secondary index feature
with core, those kernel changes become inevitable, and at the same time they
are useful too.
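To make the colocation point concrete, here is a minimal Python sketch under stated assumptions: each index row key is prefixed with the start key of the main-table region that holds the primary row, and the index table is split at the same boundaries, so every index entry sorts into the index region paired with its primary row's region. The split points, the `|` separator, and the helper names are hypothetical and are not HBase APIs; a real layout would also need an encoding that sorts correctly for arbitrary split keys (for example, when one split key is a prefix of another).

```python
import bisect

# Hypothetical split points shared by the main table and the index table.
# Assumption: every primary row key sorts at or after "user000".
REGION_STARTS = ["user000", "user300", "user600"]


def region_start_for(key):
    """Return the start key of the region whose key range contains `key`."""
    idx = bisect.bisect_right(REGION_STARTS, key) - 1
    if idx < 0:
        raise ValueError(f"{key!r} sorts before the first region")
    return REGION_STARTS[idx]


def index_row_key(primary_row, indexed_value):
    """Illustrative index key layout: <region start>|<value>|<primary row>."""
    return f"{region_start_for(primary_row)}|{indexed_value}|{primary_row}"


if __name__ == "__main__":
    for row, state in [("user100", "NY"), ("user450", "CA"), ("user700", "CA")]:
        key = index_row_key(row, state)
        # Looked up against the same split points, the index entry lands in
        # the index region with the same start key as the primary row's region.
        print(key, region_start_for(key) == region_start_for(row))
```

Because the prefix ties each index entry to one main region, a split of the main table at some key can be mirrored by splitting the index table at the same key.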

Regards
Ram
> -----Original Message-----
> From: Ted Yu [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, August 29, 2012 9:50 PM
> To: [EMAIL PROTECTED]
> Subject: Re: A general question on maxVersion handling when we have Secondary index tables
>
> For the secondary index example based on the state portion of the address,
> I wonder if we can achieve comparable performance using a scan with a
> proper filter.
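Ted's question can be framed as a rough cost comparison. The toy Python model below (hypothetical data and helper names, not HBase code) counts rows touched by a full scan with a value filter versus an index lookup followed by one fetch per matching primary row. With only four distinct states, the index cuts rows touched by just 4x, and each remaining fetch is a random read rather than part of a sequential scan, which is the situation where a filtered scan might be competitive.

```python
from collections import defaultdict

STATES = ["CA", "NY", "TX", "WA"]


def build_rows(n):
    """Hypothetical user table: row key -> state, cycling through four states."""
    return {f"user{i:05d}": STATES[i % len(STATES)] for i in range(n)}


def scan_with_filter(rows, state):
    """Full-table scan with a value filter: every row is examined."""
    examined, hits = 0, []
    for key in sorted(rows):
        examined += 1
        if rows[key] == state:
            hits.append(key)
    return hits, examined


def build_index(rows):
    """Secondary index: state -> sorted list of primary row keys."""
    index = defaultdict(list)
    for key in sorted(rows):
        index[rows[key]].append(key)
    return index


def index_lookup(index, rows, state):
    """Read the index entry, then fetch (and re-check) each primary row."""
    keys = index.get(state, [])
    hits = [k for k in keys if rows[k] == state]
    return hits, len(keys)


if __name__ == "__main__":
    rows = build_rows(10_000)
    scan_hits, scan_cost = scan_with_filter(rows, "CA")
    idx_hits, idx_cost = index_lookup(build_index(rows), rows, "CA")
    assert scan_hits == idx_hits
    print("rows touched: scan", scan_cost, "index", idx_cost)
```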
>
> Cheers
>
> On Wed, Aug 29, 2012 at 9:11 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
>
> > Ted,
> >
> > Ram summarizes the concern succinctly. To answer the specific question:
> > it isn't about versions; it is about the case where a secondary index
> > entry can point to many, many primary rows. (Say we have a rowkey of
> > userid and we want a secondary index based on the state portion of the
> > user's address; we'll end up pointing to many, many primary rows.)
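The fan-out problem can be sketched in a few lines of Python. Assuming (hypothetically) a primary table keyed by user id and pre-split into four regions, the helper below counts how many distinct regions must be read to fetch the primary rows behind one index value. A low-selectivity value such as a state touches every region, whereas a highly selective value touches one. The split points and function names are illustrative only.

```python
import bisect

# Hypothetical split points of the primary table, whose row key is the user id.
SPLITS = ["user00000", "user02500", "user05000", "user07500"]


def region_of(row_key):
    """Start key of the primary-table region containing row_key."""
    idx = bisect.bisect_right(SPLITS, row_key) - 1
    if idx < 0:
        raise ValueError(f"{row_key!r} sorts before the first region")
    return SPLITS[idx]


def regions_touched(primary_keys):
    """Distinct regions a query must read after resolving one index value."""
    return {region_of(k) for k in primary_keys}


# Low selectivity: every fourth user lives in "CA".
ca_users = [f"user{i:05d}" for i in range(0, 10_000, 4)]
# High selectivity: the index value matches a single user.
one_user = ["user04242"]

if __name__ == "__main__":
    print(len(regions_touched(ca_users)), "regions vs", len(regions_touched(one_user)))
```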
> >
> > Jon.
> >
> >
> >
> > On Wed, Aug 29, 2012 at 7:15 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks for the detailed response, Jon.
> > >
> > > bq. it would mean that a query based on secondary index would
> > > potentially have to hit every region server that has a region in the
> > > primary table.
> > >
> > > Can you elaborate on the above a little bit?
> > > Is this because the secondary index would point us to more than one
> > > region in the data table because several versions are saved for the
> > > same row?
> > >
> > > My thinking was to ease management of simultaneous (data and index)
> > > region splits through region colocation.
> > >
> > > Cheers
> > >
> > > On Wed, Aug 29, 2012 at 6:47 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> > >
> > > > I'm more of a fan of having secondary indexes added as an external
> > > > feature (a coprocessor or a new client library on top of our current
> > > > client library) and focusing only on adding the APIs necessary to
> > > > make secondary indexes possible and correct on/in HBase. There are
> > > > many different use patterns and requirements, and one style of
> > > > secondary index will not be good for everything. Do we only care
> > > > about this working well for highly selective keys? What are the
> > > > possible indexes (column name, value, value prefix, everything our
> > > > filters support)? Do we care more about writes or reads, ACID
> > > > correctness or speed, etc.? Also, there are several questions about
> > > > how we handle other features in conjunction with secondary indexes:
> > > > replication, bulk load, and snapshots, to name a few.
> > > >
> > > > Maybe it makes sense to spend some time defining what we want to
> > > > index secondarily and what the user API to this external feature
> > > > would be. Then we could have different implementations under the
> > > > covers and allow users to swap implementations for the tradeoffs that
> > > > fit their use cases. It wouldn't be free to change, but hopefully
> > > > "easy" from a user point of view.
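One way to read the swappable-implementations idea, as a hypothetical Python sketch: application code depends only on a small index interface, and two toy implementations make opposite tradeoffs. The scan-based one adds no write overhead but reads every row on lookup; the inverted-map one pays on every write (including removing the stale entry on update) to make reads direct. The class and method names are illustrative and do not correspond to any HBase API.

```python
from abc import ABC, abstractmethod


class SecondaryIndex(ABC):
    """Hypothetical user-facing API: store a value per row, find rows by value."""

    @abstractmethod
    def put(self, row_key, value): ...

    @abstractmethod
    def lookup(self, value): ...


class ScanIndex(SecondaryIndex):
    """No index structure: writes are cheap, every lookup scans all rows."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, value):
        self.rows[row_key] = value

    def lookup(self, value):
        return sorted(k for k, v in self.rows.items() if v == value)


class InvertedIndex(SecondaryIndex):
    """Maintains value -> row keys on every write, so lookups are direct."""

    def __init__(self):
        self.rows = {}
        self.postings = {}

    def put(self, row_key, value):
        old = self.rows.get(row_key)
        if old is not None:
            self.postings[old].discard(row_key)  # drop the stale index entry
        self.rows[row_key] = value
        self.postings.setdefault(value, set()).add(row_key)

    def lookup(self, value):
        return sorted(self.postings.get(value, ()))


def load_and_query(index: SecondaryIndex):
    """Application code written only against the shared interface."""
    index.put("user1", "CA")
    index.put("user2", "NY")
    index.put("user3", "CA")
    index.put("user1", "WA")  # an update moves user1 out of "CA"
    return index.lookup("CA")


if __name__ == "__main__":
    for impl in (ScanIndex(), InvertedIndex()):
        print(type(impl).__name__, load_and_query(impl))
```

Swapping implementations changes only the constructor call; `load_and_query` is untouched, which is the property the user-facing API would need to preserve.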
> > > >
> > > > Personally, I tend to favor more of a Percolator-style
> > > > implementation: it is a client library built on top of HBase. This
> > > > approach seems more "HBase-style" with its emphasis on consistency
> > > > and atomicity, and seems to require only a few modifications to HBase
> > > > core. Sure, it's likely slower than my read of Jesse's proposal, but
> > > > it seems always always
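A heavily simplified, single-process Python sketch of the Percolator-style approach Jon mentions, applied to keeping a data cell and its index cells consistent: every write is first "prewritten" under a lock, then committed at one commit timestamp, with the first (primary) cell serving as the commit point. The `Store`/`Transaction` names, the cell naming, the in-memory dicts, and the `None` delete marker are hypothetical; there is no crash recovery, lock cleanup, or concurrency here, and none of this is an HBase API.

```python
import itertools


class Store:
    """In-memory stand-in for cells with Percolator-style data/lock/write columns."""

    def __init__(self):
        self.data = {}   # cell -> {start_ts: value}
        self.lock = {}   # cell -> (start_ts, primary_cell)
        self.write = {}  # cell -> {commit_ts: start_ts}
        self._clock = itertools.count(1)

    def timestamp(self):
        """Monotonic timestamp oracle (a single counter in this sketch)."""
        return next(self._clock)

    def read(self, cell, ts):
        """Snapshot read: the latest value committed at or before ts."""
        lock = self.lock.get(cell)
        if lock is not None and lock[0] <= ts:
            raise RuntimeError(f"{cell} is locked by a pending transaction")
        commits = [c for c in self.write.get(cell, {}) if c <= ts]
        if not commits:
            return None
        return self.data[cell][self.write[cell][max(commits)]]


class Transaction:
    def __init__(self, store):
        self.store = store
        self.start_ts = store.timestamp()
        self.writes = {}  # buffered until commit; insertion order is kept

    def set(self, cell, value):
        self.writes[cell] = value

    def get(self, cell):
        return self.store.read(cell, self.start_ts)

    def _prewrite(self, cell, value, primary):
        s = self.store
        newer = any(c >= self.start_ts for c in s.write.get(cell, {}))
        if newer or cell in s.lock:
            return False  # write-write conflict with another transaction
        s.lock[cell] = (self.start_ts, primary)
        s.data.setdefault(cell, {})[self.start_ts] = value
        return True

    def commit(self):
        if not self.writes:
            return True
        cells = list(self.writes)
        primary = cells[0]  # the primary cell is the commit point
        locked = []
        for cell in cells:
            if not self._prewrite(cell, self.writes[cell], primary):
                for c in locked:  # undo our own prewrites
                    del self.store.lock[c]
                    del self.store.data[c][self.start_ts]
                return False
            locked.append(cell)
        commit_ts = self.store.timestamp()
        # Primary first, then secondaries; single-threaded, so no lock re-check.
        for cell in cells:
            self.store.write.setdefault(cell, {})[commit_ts] = self.start_ts
            del self.store.lock[cell]
        return True


def demo():
    """Update a user's state and its index entries in one transaction."""
    store = Store()
    t1 = Transaction(store)
    t1.set("main/user1/state", "CA")
    t1.set("index/CA/user1", "")
    assert t1.commit()
    t2, t3 = Transaction(store), Transaction(store)  # overlapping transactions
    t2.set("main/user1/state", "NY")
    t2.set("index/NY/user1", "")
    t2.set("index/CA/user1", None)  # None acts as a delete marker here
    t3.set("main/user1/state", "WA")
    t3.set("index/WA/user1", "")
    return store, t2.commit(), t3.commit()


if __name__ == "__main__":
    store, ok2, ok3 = demo()
    print(ok2, ok3)  # the later committer hits a write-write conflict
```

Committing the primary cell is what makes the data row and its index rows visible together; in the full design, a reader that finds a leftover secondary lock consults the primary to roll the transaction forward or back, which this sketch omits.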