HBase >> mail # user >> How to query by rowKey-infix


Re: How to query by rowKey-infix
Actually, with coprocessors you can create a secondary index in short order.
Then your cost is going to be two fetches: one against the index and one against the base table. Trying to do a partial table scan will be more expensive.
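
To illustrate the two-fetch pattern, here is a minimal sketch in Python. It does not use the HBase API: `SortedTable` is a hypothetical in-memory stand-in for a table with sorted row keys, and `put_session` plays the role a coprocessor might play by writing the index entry alongside the row. The index key layout (date-userId-sessionId) and the fixed-width YYYYMMDD dates are assumptions made for the example.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Toy stand-in for an HBase table: rows kept in sorted key order."""

    def __init__(self):
        self._keys = []   # sorted row keys
        self._rows = {}   # row key -> value

    def put(self, key, value):
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


def put_session(primary, index, user_id, date, session_id, value):
    """Write the row and its index entry together (assumed coprocessor role)."""
    primary_key = f"{user_id}-{date}-{session_id}"
    index_key = f"{date}-{user_id}-{session_id}"  # date leads, so it is scannable
    primary.put(primary_key, value)
    index.put(index_key, primary_key)


def query_by_date_range(primary, index, start_date, stop_date):
    """Return rows with start_date <= date < stop_date using two fetches."""
    # Fetch 1: range scan over the date-leading index table.
    primary_keys = [pk for _, pk in index.scan(start_date, stop_date)]
    # Fetch 2: point lookups in the primary table.
    return [(pk, primary.get(pk)) for pk in primary_keys]
```

The second fetch is a set of independent point lookups, so it could be issued in parallel, as suggested further down the thread.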

On Jul 31, 2012, at 12:41 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:

> When deciding between a table scan vs secondary index, you should try to
> estimate what percent of the underlying data blocks will be used in the
> query.  By default, each block is 64KB.
>
> If each user's data is small and you are fitting multiple users per block,
> then you're going to need all the blocks, so a tablescan is better because
> it's simpler.  If each user has 1MB+ data then you will want to pick out
> the individual blocks relevant to each date.  The secondary index will help
> you go directly to those sparse blocks, but with a cost in complexity,
> consistency, and extra denormalized data that knocks primary data out of
> your block cache.
>
> If latency is not a concern, I would start with the table scan.  If that's
> too slow you add the secondary index, and if you still need it faster you
> do the primary key lookups in parallel as Jerry mentions.
>
> Matt
>
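
The block estimate Matt describes above can be roughed out with a few lines of arithmetic. This is only a back-of-the-envelope sketch under assumptions that are not in the thread: every user has rows in the queried date range, each user's matching rows form one contiguous run, and one extra block is added per run because a run may straddle a block boundary. The function name and parameters are hypothetical.

```python
BLOCK_SIZE = 64 * 1024  # default block size mentioned above, in bytes


def estimated_block_fraction(bytes_per_user, date_fraction, block_size=BLOCK_SIZE):
    """Roughly estimate the fraction of data blocks a date-range query touches.

    bytes_per_user: average stored bytes per user (assumed uniform).
    date_fraction:  share of each user's data inside the date range (0..1).
    """
    # A user smaller than one block still occupies (part of) one block.
    blocks_per_user = max(bytes_per_user / block_size, 1.0)
    matching_bytes = bytes_per_user * date_fraction
    # One contiguous run per user, plus one block for boundary straddling.
    touched_blocks = matching_bytes / block_size + 1
    return min(1.0, touched_blocks / blocks_per_user)
```

With 10 KB per user the estimate is 1.0 (every block is needed, so a table scan is the simpler choice); with 1 MB per user and 1% of the data in range it drops well below 10%, which is where an index starts to pay off.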
> On Tue, Jul 31, 2012 at 10:10 AM, Jerry Lam <[EMAIL PROTECTED]> wrote:
>
>> Hi Chris:
>>
>> I'm thinking about building a secondary index for primary key lookup, then
>> querying using the primary keys in parallel.
>>
>> I'm interested to see if there are other options too.
>>
>> Best Regards,
>>
>> Jerry
>>
>> On Tue, Jul 31, 2012 at 11:27 AM, Christian Schäfer <[EMAIL PROTECTED]> wrote:
>>
>>> Hello there,
>>>
>>> I designed a row key for queries that need best performance (~100 ms)
>>> which looks like this:
>>>
>>> userId-date-sessionId
>>>
>>> These queries (scans) are always based on a userId and sometimes
>>> also on a date.
>>> That's no problem with the key above.
>>>
>>> However, another kind of query is based on a given time range, where the
>>> leftmost userId is not given or known.
>>> In this case I need to get all rows covering the given time range, along
>>> with their dates, to create a daily report.
>>>
>>> As I can't set wildcards at the beginning of a left-based index for the
>>> scan, I only see the possibility of scanning the index of the whole table
>>> to collect the rowKeys that are inside the time range I'm interested in.
>>>
>>> Is there a more elegant way to collect rows within time range X?
>>> (Unfortunately, the date attribute is not equal to the timestamp that is
>>> stored by HBase automatically.)
>>>
>>> Could/should one maybe leverage some kind of row key caching to
>>> accelerate the collection process?
>>> Is that covered by the block cache?
>>>
>>> Thanks in advance for any advice.
>>>
>>> regards
>>> Chris
>>>
>>
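
The limitation described in the original question can be shown with a small Python sketch. It models the table as a plain sorted list of row keys rather than using the HBase API. The function names are hypothetical, and the example assumes that userId, date, and sessionId contain no hyphens and that dates are fixed-width YYYYMMDD strings.

```python
from bisect import bisect_left


def prefix_scan(sorted_keys, prefix):
    """Left-anchored query: binary-search straight to the matching key range."""
    start = bisect_left(sorted_keys, prefix)
    # "\uffff" sorts after any character used in these ASCII keys.
    stop = bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[start:stop]


def infix_date_scan(sorted_keys, start_date, stop_date):
    """Infix query: userId unknown, so every key has to be inspected."""
    matches = []
    for key in sorted_keys:
        _user_id, date, _session_id = key.split("-")
        if start_date <= date < stop_date:
            matches.append(key)
    return matches
```

A query on userId, or on userId plus date, only touches the matching slice of keys. A query on the date alone walks the entire key space, which is the full scan the question is trying to avoid.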