HBase >> mail # user >> secondary index feature


Re: secondary index feature
No worries, Henning. It's a little deceptive, because the coprocessors that
do the index maintenance are invoked on a per-region basis. However, the
writes/puts that they do for the maintenance end up going over the wire if
necessary.

Let me know if you have other questions. It'd be good to understand your
use case more to see if Phoenix is a good fit - we're definitely open to
collaborating. FYI, we're in the process of moving to Apache, so will keep
you posted once the transition is complete.

Thanks,

James
On Fri, Jan 3, 2014 at 1:11 PM, Henning Blohm <[EMAIL PROTECTED]> wrote:

> Hi James,
>
> This is a little embarrassing... I even browsed through the code and read
> it as implementing a region-level index.
>
> But now at least I get the restrictions mentioned for using the covered
> indexes.
>
> Thanks for clarifying. Guess I need to browse the code a little harder ;-)
>
> Henning
>
>
> On 03.01.2014 21:53, James Taylor wrote:
>
>> Hi Henning,
>> Phoenix maintains a global index. It is essentially maintaining another
>> HBase table for you with a different row key (and a subset of your data
>> table columns that are "covered"). When an index is used by Phoenix, it is
>> *exactly* like querying a data table (that's what Phoenix does - it ends up
>> issuing a Phoenix query against a Phoenix table that happens to be an index
>> table).
>>
>> The hit you take for a global index is at write time - we need to look up
>> the prior state of the rows being updated to do the index maintenance. Then
>> we need to do a write to the index table. The upside is that there's no hit
>> at read/query time (we don't yet attempt to join from the index table back
>> to the data table - if a query is using columns that aren't in the index,
>> the index simply won't be used). More here:
>> https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing
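The write-time and read-time behaviour described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not Phoenix code: the class `GlobalIndexedTable`, its methods, and the sample column names are all hypothetical, and plain dictionaries stand in for the HBase data and index tables. It also falls back to scanning the data table when the index does not cover the query, which is an assumption of the sketch.

```python
class GlobalIndexedTable:
    """Toy model (hypothetical): a data table plus one global, covered index table."""

    def __init__(self, indexed_col, covered_cols):
        self.indexed_col = indexed_col
        self.covered_cols = set(covered_cols)
        self.data = {}   # row key -> {column: value}
        self.index = {}  # (indexed value, row key) -> covered column values

    def put(self, row_key, row):
        # Assumes every row carries a value for the indexed column.
        # Write-time cost 1: read the prior state of the row so that the
        # stale index entry can be removed.
        prior = self.data.get(row_key)
        if prior is not None:
            self.index.pop((prior[self.indexed_col], row_key), None)
        # Write-time cost 2: a second write, this time to the index table.
        self.data[row_key] = dict(row)
        covered = {c: v for c, v in row.items() if c in self.covered_cols}
        self.index[(row[self.indexed_col], row_key)] = covered

    def can_use_index(self, columns):
        # The index is usable only if it covers every requested column.
        return set(columns) <= self.covered_cols | {self.indexed_col}

    def query(self, value, columns):
        if self.can_use_index(columns):
            # Served from the index table alone; no join back to the data table.
            hits = sorted(
                (rk, {c: value if c == self.indexed_col else cov.get(c)
                      for c in columns})
                for (v, rk), cov in self.index.items() if v == value
            )
            return "index", hits
        # Not covered: the index is ignored and the data table is scanned.
        hits = sorted(
            (rk, {c: row.get(c) for c in columns})
            for rk, row in self.data.items() if row[self.indexed_col] == value
        )
        return "data table", hits


table = GlobalIndexedTable(indexed_col="city", covered_cols=["name"])
table.put("u1", {"city": "Berlin", "name": "Ann", "age": 30})
table.put("u2", {"city": "Berlin", "name": "Bob", "age": 41})
table.put("u1", {"city": "Paris", "name": "Ann", "age": 31})  # moves u1 in the index

covered_result = table.query("Berlin", ["name"])   # answered from the index
uncovered_result = table.query("Berlin", ["age"])  # "age" is not covered
```

In this model the second `put` for `u1` first removes the stale `("Berlin", "u1")` index entry, which is why the prior row state has to be read on every update.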
>>
>> Thanks,
>> James
>>
>>
>> On Fri, Jan 3, 2014 at 12:46 PM, Henning Blohm <[EMAIL PROTECTED]>
>> wrote:
>>
>>> When you scan in index order using RLI, there seems to be no
>>> alternative but to involve all regions - and essentially this should happen
>>> in parallel, as otherwise you might not get what you wanted. Also, for a
>>> single Get, it seems (as Lars pointed out in
>>> https://issues.apache.org/jira/browse/HBASE-2038) that you have to consult
>>> all regions.
>>>
>>> When that parallelism is no problem (small number of servers) it will
>>> actually help single scan performance (regions can provide their share in
>>> parallel).
>>>
>>> A high number of concurrent client requests leads to the same number of
>>> requests on every region, and a corresponding multiple of connections to
>>> be maintained by the client.
>>>
>>> My assumption is that this will eventually lead to a scalability problem -
>>> when, say, having 100 region servers or so in place. I was wondering if
>>> anyone has experience with that.
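The fan-out concern above can be made concrete with a toy model of a region-level index. This is a sketch under stated assumptions rather than HBase code: `Region`, `region_for`, `lookup_by_value`, and `total_region_requests` are hypothetical names, and the hash-style row placement is a simplification (HBase assigns rows to regions by sorted key ranges).

```python
from collections import defaultdict


class Region:
    """Toy region (hypothetical): holds rows plus an index over its own rows only."""

    def __init__(self):
        self.rows = {}                       # row key -> indexed value
        self.local_index = defaultdict(set)  # indexed value -> row keys here

    def put(self, row_key, value):
        old = self.rows.get(row_key)
        if old is not None:
            self.local_index[old].discard(row_key)
        self.rows[row_key] = value
        self.local_index[value].add(row_key)

    def lookup(self, value):
        return sorted(self.local_index.get(value, ()))


def region_for(row_key, num_regions):
    # Simplified placement for the sketch only.
    return sum(row_key.encode()) % num_regions


def lookup_by_value(regions, value):
    """An indexed value may live in any region, so every region is asked."""
    hits, requests = [], 0
    for region in regions:
        requests += 1
        hits.extend(region.lookup(value))
    return sorted(hits), requests


def total_region_requests(concurrent_clients, num_regions):
    # Each client lookup fans out to every region.
    return concurrent_clients * num_regions


regions = [Region() for _ in range(4)]
for key, city in [("u1", "Berlin"), ("u2", "Paris"), ("u3", "Berlin"), ("u4", "Rome")]:
    regions[region_for(key, len(regions))].put(key, city)

berlin_keys, requests_made = lookup_by_value(regions, "Berlin")
load = total_region_requests(concurrent_clients=1000, num_regions=100)
```

Even a lookup that matches nothing still costs one request per region in this model, which is the fixed-cost property of a plain Get that is lost.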
>>>
>>> That will be perfectly acceptable for many use cases that benefit from the
>>> scan (and hence query) performance more than they suffer from the load
>>> problem. Other use cases have fewer requirements on scans and query
>>> flexibility but rather want to preserve the quality that a Get has fixed
>>> resource usage.
>>>
>>> Btw.: I was convinced that Phoenix is keeping indexes on the region level.
>>> Is that not so?
>>>
>>> Thanks,
>>> Henning
>>>
>>>
>>> On 03.01.2014 17:57, Anoop John wrote:
>>>
>>>> In the case of a normal HBase scan, as we know, regions will be scanned
>>>> sequentially. Phoenix has parallel scan implementations in it. When RLI
>>>> is used and we make use of the index completely on the server side, it is
>>>> independent of how the client scans. Sequential or parallel, using Java
>>>> or any other client layer or an SQL layer like Phoenix, using MR or
>>>> not... the client side doesn't have to worry about this; the index usage
>>>> will be fully at the server end.
>>>>
>>>> Yes when parallel scan is done on regions, RLI might perform much