Re: sum, avg, count, etc...
Also, make sure that you're either setting a stop row on the scan, or
if you're using a filter, try wrapping it in a WhileMatchFilter.  This
tells the scanner it can stop as soon as the filter starts rejecting
rows.  Otherwise you get back just the data you expect, but the scan
still runs all the way to the end of the table, filtering out every
remaining row along the way.
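
To make the difference concrete, here is a minimal sketch in plain Java
that models a table as a sorted map and counts how many rows each
approach touches. It does not use the HBase client API: the ScanSketch
class, its method names, and every sample price other than 26.81 are
hypothetical, and the "msft$" stop key simply relies on '$' sorting
immediately after '#'. The stopOnFirstReject flag stands in for what
wrapping a filter in a WhileMatchFilter is meant to achieve.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical model of a scan over sorted row keys; not the HBase API.
final class ScanSketch {

    /** Rows the simulated scanner touched, rows it returned, and their price sum. */
    static final class Result {
        final int rowsExamined;
        final int rowsReturned;
        final double sum;

        Result(int rowsExamined, int rowsReturned, double sum) {
            this.rowsExamined = rowsExamined;
            this.rowsReturned = rowsReturned;
            this.sum = sum;
        }

        double average() {
            return rowsReturned == 0 ? Double.NaN : sum / rowsReturned;
        }
    }

    /**
     * Scan from startRow with a row filter and no stop row.
     * With stopOnFirstReject == false the scan runs to the end of the table;
     * with true it ends at the first rejected row (the while-match idea).
     */
    static Result scanWithFilter(TreeMap<String, Double> table, String startRow,
                                 Predicate<String> filter, boolean stopOnFirstReject) {
        int examined = 0;
        int returned = 0;
        double sum = 0.0;
        for (Map.Entry<String, Double> row : table.tailMap(startRow, true).entrySet()) {
            examined++;
            if (filter.test(row.getKey())) {
                returned++;
                sum += row.getValue();
            } else if (stopOnFirstReject) {
                break; // first rejection ends the scan
            }
        }
        return new Result(examined, returned, sum);
    }

    /** Scan the half-open key range [startRow, stopRow); nothing outside it is touched. */
    static Result scanWithStopRow(TreeMap<String, Double> table, String startRow, String stopRow) {
        int rows = 0;
        double sum = 0.0;
        for (double price : table.subMap(startRow, true, stopRow, false).values()) {
            rows++;
            sum += price;
        }
        return new Result(rows, rows, sum);
    }

    /** Keys follow the thread's layout; only the 26.81 price comes from the thread. */
    static TreeMap<String, Double> sampleTable() {
        TreeMap<String, Double> table = new TreeMap<>();
        table.put("msft#1319562974#NASDAQ", 26.81);
        table.put("msft#1319562975#NASDAQ", 26.83); // made-up price
        table.put("t#1319562974#NYSE", 29.10);      // made-up price
        table.put("yhoo#1319562974#NASDAQ", 16.50); // made-up price
        return table;
    }

    public static void main(String[] args) {
        TreeMap<String, Double> table = sampleTable();
        Predicate<String> isMsft = key -> key.startsWith("msft#");

        Result plain = scanWithFilter(table, "msft#", isMsft, false);
        Result whileMatch = scanWithFilter(table, "msft#", isMsft, true);
        Result stopRow = scanWithStopRow(table, "msft#", "msft$");

        System.out.println("plain filter examined " + plain.rowsExamined + " rows");
        System.out.println("while-match  examined " + whileMatch.rowsExamined + " rows");
        System.out.println("stop row     examined " + stopRow.rowsExamined + " rows");
        System.out.println("average msft price    " + stopRow.average());
    }
}
```

On the four sample keys, the plain filter examines all 4 rows, the
while-match variant examines 3 (it has to see the first rejected row
before it can stop), and the stop-row scan examines only the 2 msft
rows. All three return the same two rows, so the sum, count and average
are identical; only the amount of scanning differs.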

On Wed, Oct 26, 2011 at 6:18 AM, Doug Meil
<[EMAIL PROTECTED]> wrote:
> Hi there-
>
> First, make sure you aren't tripping on any of these issues:
>
> http://hbase.apache.org/book.html#perf.reading
>
> On 10/26/11 6:21 AM, "Rita" <[EMAIL PROTECTED]> wrote:
>
>>I am trying to do some simple statistics with my data, but it's taking longer
>>than expected.
>>
>>
>>
>>Here is how my data is structured in HBase.
>>
>>keys (symbol#epoch timestamp#exchange)
>>msft#1319562974#NASDAQ
>>t#1319562974#NYSE
>>yhoo#1319562974#NASDAQ
>>msft#1319562975#NASDAQ
>>
>>The values look like this (for instance, Microsoft):
>>...
>>price=26.81
>>open
>>close
>>...
>>
>>There are about 300 values per key.
>>
>>
>>So, for instance, if I want to calculate the average price of msft, I set up
>>a start and stop filter, and it is able to calculate it by tick. But it's
>>taking about 7 seconds to go through 500 keys. Is that normal? Is there a
>>faster way to calculate sum, avg, and count in HBase? Would I need to redo
>>my schema?
>>
>>tia
>>
>>--
>>--- Get your facts first, then you can distort them as you please.--
>
>