Re: secondary index feature
No worries, Henning. It's a little deceiving, because the coprocessors that
do the index maintenance are invoked on a per region basis. However, the
writes/puts that they do for the maintenance end up going over the wire if

Let me know if you have other questions. It'd be good to understand your
use case more to see if Phoenix is a good fit - we're definitely open to
collaborating. FYI, we're in the process of moving to Apache, so will keep
you posted once the transition is complete.
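
As an illustration of the global index behaviour described in the quoted message below, here is a minimal sketch in plain Python. It is not Phoenix code: dicts stand in for the data table and the index table, and the class name, method names, and the fallback to a data-table scan are all assumptions made for the example.

```python
# Toy model of a global, covered secondary index.
# Hypothetical names throughout; plain dicts stand in for HBase tables.

class GlobalIndexedTable:
    def __init__(self, indexed_col, covered_cols):
        self.indexed_col = indexed_col
        self.covered_cols = tuple(covered_cols)
        self.data = {}    # row key -> {column: value}
        self.index = {}   # (indexed value, row key) -> covered columns only

    def put(self, row_key, columns):
        # Write-time cost 1: look up the prior state of the row so that a
        # stale index row can be removed.
        prior = self.data.get(row_key)
        if prior is not None and self.indexed_col in prior:
            self.index.pop((prior[self.indexed_col], row_key), None)

        merged = dict(prior or {})
        merged.update(columns)
        self.data[row_key] = merged

        # Write-time cost 2: a second write, this time to the index table.
        if self.indexed_col in merged:
            covered = {c: merged[c] for c in self.covered_cols if c in merged}
            self.index[(merged[self.indexed_col], row_key)] = covered

    def query(self, value, wanted_cols):
        # The index is used only if it covers every requested column;
        # there is no join back to the data table.
        usable = set(wanted_cols) <= set(self.covered_cols) | {self.indexed_col}
        if usable:
            hits = sorted(
                ((rk, cov) for (v, rk), cov in self.index.items() if v == value),
                key=lambda item: item[0],
            )
            rows = []
            for rk, cov in hits:
                full = dict(cov)
                full[self.indexed_col] = value
                rows.append((rk, {c: full[c] for c in wanted_cols if c in full}))
            return "index", rows

        # Assumed fallback for the sketch: scan the data table instead.
        rows = [
            (rk, {c: row[c] for c in wanted_cols if c in row})
            for rk, row in sorted(self.data.items())
            if row.get(self.indexed_col) == value
        ]
        return "data", rows
```

A query for a covered column is answered from the index alone, while asking for an uncovered column bypasses the index entirely, which is the restriction on covered indexes mentioned in the thread.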


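For contrast, the region-level index (RLI) concern raised further down the thread, that a lookup by indexed value has to consult every region, can be sketched the same way. This is a toy model in plain Python under stated assumptions: regions are defined by hypothetical split keys, each region keeps its own small index, and `region_requests` is a made-up helper that just multiplies clients by regions.

```python
from bisect import bisect_right

# Toy model of region-level (local) indexes. Hypothetical names throughout.

class LocallyIndexedCluster:
    def __init__(self, split_keys):
        # N split keys partition the row-key space into N + 1 regions.
        self.split_keys = sorted(split_keys)
        self.local_indexes = [dict() for _ in range(len(self.split_keys) + 1)]

    def region_for(self, row_key):
        # Rows are placed by row key, not by the indexed value.
        return bisect_right(self.split_keys, row_key)

    def put(self, row_key, value):
        region_index = self.local_indexes[self.region_for(row_key)]
        region_index.setdefault(value, []).append(row_key)

    def lookup(self, value):
        # Matching rows can live in any region, so every region is asked.
        hits, regions_contacted = [], 0
        for region_index in self.local_indexes:
            regions_contacted += 1
            hits.extend(region_index.get(value, []))
        return sorted(hits), regions_contacted


def region_requests(concurrent_clients, num_regions):
    # Each client lookup fans out to all regions.
    return concurrent_clients * num_regions
```

With three split keys there are four regions, and a single lookup contacts all four even when only two of them hold matching rows. The multiplication in `region_requests` is the load growth the thread is asking about.
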
On Fri, Jan 3, 2014 at 1:11 PM, Henning Blohm <[EMAIL PROTECTED]> wrote:

> Hi James,
> this is a little embarrassing... I even browsed through the code and read
> it as implementing a region level index.
> But now at least I get the restrictions mentioned for using the covered
> indexes.
> Thanks for clarifying. Guess I need to browse the code a little harder ;-)
> Henning
> On 03.01.2014 21:53, James Taylor wrote:
>> Hi Henning,
>> Phoenix maintains a global index. It is essentially maintaining another
>> HBase table for you with a different row key (and a subset of your data
>> table columns that are "covered"). When an index is used by Phoenix, it is
>> *exactly* like querying a data table (that's what Phoenix does - it ends up
>> issuing a Phoenix query against a Phoenix table that happens to be an index
>> table).
>> The hit you take for a global index is at write time - we need to look up
>> the prior state of the rows being updated to do the index maintenance. Then
>> we need to do a write to the index table. The upside is that there's no hit
>> at read/query time (we don't yet attempt to join from the index table back
>> to the data table - if a query is using columns that aren't in the index,
>> it simply won't be used). More here:
>> https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing
>> Thanks,
>> James
>> On Fri, Jan 3, 2014 at 12:46 PM, Henning Blohm <[EMAIL PROTECTED]> wrote:
>>> When you scan in index order using RLI, it seems there is no
>>> alternative but to involve all regions - and essentially this should happen
>>> in parallel as otherwise you might not get what you wanted. Also, for a
>>> single Get, it seems (as Lars pointed out in
>>> https://issues.apache.org/jira/browse/HBASE-2038) that you have to consult
>>> all regions.
>>> When that parallelism is no problem (small number of servers) it will
>>> actually help single scan performance (regions can provide their share in
>>> parallel).
>>> A high number of concurrent client requests leads to the same number of
>>> requests on all regions and a corresponding multiple of connections to be
>>> maintained by the client.
>>> My assumption is that that will eventually lead to a scalability problem -
>>> when, say, having 100 region servers or so in place. I was wondering if
>>> anyone has experience with that.
>>> That will be perfectly acceptable for many use cases that benefit from the
>>> scan (and hence query) performance more than they suffer from the load
>>> problem. Other use cases have fewer requirements on scans and query
>>> flexibility but rather want to preserve the quality that a Get has fixed
>>> resource usage.
>>> Btw.: I was convinced that Phoenix is keeping indexes on the region level.
>>> Is that not so?
>>> Thanks,
>>> Henning
>>> On 03.01.2014 17:57, Anoop John wrote:
>>>> In the case of a normal HBase scan, as we know, regions will be scanned
>>>> sequentially. Phoenix has parallel scan implementations in it. When RLI is
>>>> used and we make use of the index completely at the server side, it is
>>>> independent of how the client scans. Sequential or parallel, using Java or
>>>> any other client layer or using a SQL layer like Phoenix, using MR or
>>>> not... the client side doesn't have to worry about this; the index usage
>>>> will be fully at the server end.
>>>> Yes when parallel scan is done on regions, RLI might perform much