-
sum, avg, count, etc... Rita 2011-10-26, 10:21
I am trying to do some simple statistics with my data, but it's taking longer than expected.

Here is how my data is structured in HBase.

keys (symbol#epoch timestamp)
msft#1319562974#NASDAQ
t#1319562974#NYSE
yhoo#1319562974#NASDAQ
msft#1319562975#NASDAQ

The values look like this (for instance, Microsoft):
...
price=26.81
open
close
...

There are about 300 values per key.

So, for instance, if I want to calculate the avg price of msft, I set up a start and stop filter and it is able to calculate it by tick. But it's taking about 7 seconds to go through 500 keys. Is that normal? Is there a faster way to calculate sum, avg, count in HBase? Would I need to redo my schema?

tia

--
--- Get your facts first, then you can distort them as you please.--
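For readers following along, here is a rough sketch of the aggregation being described: a range over sorted row keys bounded by a start and a stop key, with sum, count and avg accumulated on the client. It is plain Java over a `TreeMap`, not the HBase client API; the class `TickStats`, the method `statsFor`, and every price other than 26.81 are made up for illustration. It assumes keys shaped like the ones above (symbol first, then `#`), so the stop key can be formed by swapping `#` for the next ASCII character, `$`.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: aggregates prices for one symbol over a sorted key range. */
final class TickStats {

    /** Sum and count for one key range; avg is derived from them. */
    static final class Stats {
        final double sum;
        final long count;

        Stats(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        double avg() {
            return count == 0 ? Double.NaN : sum / count;
        }
    }

    /**
     * Rows are held in a sorted map to mimic a table ordered by row key.
     * Only the keys in [symbol + "#", symbol + "$") are visited.
     */
    static Stats statsFor(TreeMap<String, Double> priceByRowKey, String symbol) {
        String startKey = symbol + "#"; // inclusive lower bound
        String stopKey = symbol + "$";  // exclusive upper bound: '$' sorts right after '#'
        double sum = 0;
        long count = 0;
        for (Map.Entry<String, Double> row
                : priceByRowKey.subMap(startKey, true, stopKey, false).entrySet()) {
            sum += row.getValue();
            count++;
        }
        return new Stats(sum, count);
    }

    public static void main(String[] args) {
        TreeMap<String, Double> rows = new TreeMap<>();
        rows.put("msft#1319562974#NASDAQ", 26.81);
        rows.put("msft#1319562975#NASDAQ", 26.83); // invented sample price
        rows.put("yhoo#1319562974#NASDAQ", 16.50); // invented sample price
        Stats s = statsFor(rows, "msft");
        System.out.println("count=" + s.count + " sum=" + s.sum + " avg=" + s.avg());
    }
}
```

The point of the bounded range is that rows for other symbols are never touched, so the cost depends on the number of ticks for one symbol rather than on the size of the table.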
-
Re: sum, avg, count, etc... Doug Meil 2011-10-26, 13:18
Hi there-

First, make sure you aren't tripping on any of these issues:

http://hbase.apache.org/book.html#perf.reading
-
Re: sum, avg, count, etc... Gary Helmling 2011-10-26, 18:49
Also, make sure that you're either setting a stop row on the scan, or, if you're using a filter, try wrapping it in a WhileMatchFilter. This tells the scanner it can stop as soon as the filter starts rejecting rows. Otherwise you can wind up getting back just the data you expect, but still scanning all the way to the end of the table, just filtering out all the remaining rows.
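To make the difference concrete, here is a small model of the two behaviours described above. It is only a sketch in plain Java: the "table" is a `TreeMap`, the filter is a `Predicate`, and the names `ScanModel`, `Outcome` and `scan` are invented for this example. It does not call the HBase client; the `stopAtFirstReject` flag merely imitates what the message says wrapping a filter in a WhileMatchFilter achieves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Hypothetical model of a filtered scan over rows sorted by key. */
final class ScanModel {

    /** Which keys matched, and how many rows the scan had to look at. */
    static final class Outcome {
        final List<String> matched = new ArrayList<>();
        int rowsExamined;
    }

    /**
     * Walks the sorted rows from startKey onward.
     * With stopAtFirstReject the walk ends at the first row the filter rejects;
     * without it the walk continues to the end of the table and only drops
     * the rows that do not match.
     */
    static Outcome scan(TreeMap<String, Double> table, String startKey,
                        Predicate<String> filter, boolean stopAtFirstReject) {
        Outcome out = new Outcome();
        for (String key : table.tailMap(startKey, true).keySet()) {
            out.rowsExamined++;
            if (filter.test(key)) {
                out.matched.add(key);
            } else if (stopAtFirstReject) {
                break; // nothing after this row can match, since keys are sorted
            }
        }
        return out;
    }
}
```

Both modes return the same matching rows; they differ only in `rowsExamined`, which is the cost that grows with table size when the scan has no stopping condition.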
-
Re: sum, avg, count, etc... Rita 2011-10-27, 00:27
Thanks for all of your responses.

The original file is a text file, and when I try to search it using grep it takes minutes. So taking 7 seconds ain't too bad.

Thanks again for your time and advice.
-
Re: sum, avg, count, etc... Rita 2011-10-29, 12:38
For the values,

...
price=26.81
open
close
...

does HBase do a full scan across all values, or does it have a constant-time lookup, O(1)?
-
Re: sum, avg, count, etc... Doug Meil 2011-10-29, 15:26
See...

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html

... for access to values. It's backed by a map.
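As a rough illustration of what "backed by a map" can mean for lookup cost, the sketch below counts key comparisons for a single lookup in a row of about 300 values. It is an assumption-laden model, not the HBase `Result` class: it uses `java.util.TreeMap` as a stand-in sorted map, and `RowLookupSketch`, `sampleRow`, `comparisonsFor` and the `fieldNNN` qualifiers are invented here. Under that assumption a lookup is logarithmic in the number of values, which is not strictly O(1) but is far from a full scan of the row.

```java
import java.util.Comparator;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model: one row's values held in a sorted map keyed by qualifier. */
final class RowLookupSketch {

    /** Number of key comparisons made since the last reset. */
    static final AtomicInteger comparisons = new AtomicInteger();

    /** String comparator that counts how often it is called. */
    static final Comparator<String> COUNTING = (a, b) -> {
        comparisons.incrementAndGet();
        return a.compareTo(b);
    };

    /** Builds a row with `fillers` made-up qualifiers plus a "price" qualifier. */
    static TreeMap<String, String> sampleRow(int fillers) {
        TreeMap<String, String> row = new TreeMap<>(COUNTING);
        for (int i = 0; i < fillers; i++) {
            row.put(String.format("field%03d", i), "x"); // placeholder values
        }
        row.put("price", "26.81");
        return row;
    }

    /** Looks up one qualifier and returns how many comparisons that lookup needed. */
    static int comparisonsFor(TreeMap<String, String> row, String qualifier) {
        comparisons.set(0);
        row.get(qualifier);
        return comparisons.get();
    }

    public static void main(String[] args) {
        TreeMap<String, String> row = sampleRow(299);
        System.out.println("values in row: " + row.size());
        System.out.println("comparisons to find price: " + comparisonsFor(row, "price"));
    }
}
```

A linear search over the same row would need up to 300 comparisons, so the gap widens as the number of values per key grows.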