|
techbuddy
2012-10-02, 00:03
Doug Meil
2012-10-02, 00:54
lars hofhansl
2012-10-02, 01:05
Andrew Purtell
2012-10-02, 07:08
Doug Meil
2012-10-02, 12:28
gordoslocos
2012-10-02, 12:46
Doug Meil
2012-10-02, 14:04
|
-
Re: HBase vs. HDFS
techbuddy
2012-10-02, 00:03
How did you verify that all the rows indeed reside on the same region server?
-
Re: HBase vs. HDFS
Doug Meil
2012-10-02, 00:54

Hi there-

Might want to start with this...

http://hbase.apache.org/book.html#perf.reading

... if you're using default scan caching (which is 1) that would explain a lot.

On 10/1/12 7:01 PM, "Juan P." <[EMAIL PROTECTED]> wrote:

>Hi guys,
>I'm trying to get familiarized with HBase and one thing I noticed is that
>reads seem to be very slow. I just tried doing a "scan 'my_table'" to get
>120K records and it took about 50 seconds to print it all out.
>
>In contrast "hadoop fs -cat my_file.csv" where my_file.csv has 120K lines
>completed in under a second.
>
>Is that possible? Am I missing something about HBase reads?
>
>Thanks,
>Joni
-
Re: HBase vs. HDFS
lars hofhansl
2012-10-02, 01:05

You probably executed 120k next() RPCs against your server, unless you enabled scanner caching.
(On a related note, we should probably not default this to 1, but to something more sensible, like 10 or 100.)

-- Lars
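The effect of scanner caching on the number of round trips can be estimated with simple arithmetic. The following is a minimal sketch in Python, assuming each next() call returns up to `caching` rows and ignoring the calls that open and close the scanner. The function name `next_rpcs` is made up for illustration, and the per-call cost is only what the reported 50 seconds would imply if round trips dominated the run time. In the Java client, the corresponding setting is `Scan.setCaching(int)`.

```python
import math


def next_rpcs(total_rows: int, caching: int) -> int:
    """Approximate number of next() round trips for a scan.

    Assumes every call returns up to `caching` rows; scanner open/close
    calls are ignored.
    """
    if caching < 1:
        raise ValueError("caching must be >= 1")
    return math.ceil(total_rows / caching)


ROWS = 120_000     # rows scanned in the original question
ELAPSED_S = 50.0   # reported wall-clock time with the default caching of 1

# Implied cost per call, ASSUMING round trips account for all 50 seconds.
per_call_ms = ELAPSED_S / next_rpcs(ROWS, 1) * 1000

for caching in (1, 10, 100):
    print(f"caching={caching:>3}: {next_rpcs(ROWS, caching):>7} next() calls")
print(f"implied cost per call at caching=1: {per_call_ms:.2f} ms")
```

With caching set to 100, the same scan needs about 1,200 calls instead of 120,000, which is consistent with the speedup reported later in the thread.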
-
Re: HBase vs. HDFS
Andrew Purtell
2012-10-02, 07:08

On Tue, Oct 2, 2012 at 9:05 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> You probably executed 120k next() RPCs against your server, unless you enabled scanner caching.
> (On a related note, we should probably not default this to 1, but something more sensible, like 10 or 100).

We use 100.

--
Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: HBase vs. HDFS
Doug Meil
2012-10-02, 12:28

Hi there,

Another thing to consider on top of the scan caching is that HBase is doing more in the process of scanning the table. See...

http://hbase.apache.org/book.html#conceptual.view

http://hbase.apache.org/book.html#regions.arch

... Specifically, processing the KeyValues, potentially merging rows between StoreFiles, and checking for un-flushed updates in the MemStore per CF.
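The extra work described above can be pictured as a merge of several sorted sources. The following is a toy model in Python, not HBase's actual implementation: the tuples, the `scan` function, and the sample data are all invented for illustration, and real KeyValues also carry column family, qualifier, and type information that is left out here.

```python
import heapq

# Each source is sorted by row key. Entries are (row_key, timestamp, value);
# a higher timestamp means a newer write. All data here is made up.
store_file_1 = [("row1", 1, "a"), ("row3", 1, "c")]
store_file_2 = [("row2", 2, "b"), ("row3", 2, "c2")]
memstore = [("row1", 3, "a3"), ("row4", 3, "d")]  # un-flushed updates


def scan(*sources):
    """Yield (row_key, value) in key order, newest version of each row only."""
    # Order by row key ascending, then timestamp descending, so the newest
    # version of a row is the first one seen for that row.
    merged = heapq.merge(*sources, key=lambda kv: (kv[0], -kv[1]))
    last_row = None
    for row, _ts, value in merged:
        if row != last_row:
            yield row, value
            last_row = row


rows = list(scan(store_file_1, store_file_2, memstore))
print(rows)
```

Even in this simplified form, every returned row requires comparisons across all sources, which a plain "hadoop fs -cat" of a single file never has to do.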
-
Re: HBase vs. HDFS
gordoslocos
2012-10-02, 12:46

Thank you all! Setting a cache size helped a great deal. It's still slower though.

I think it might be possible that the overhead of processing the data from the table might be the cause.

I guess if HBase adds an indirection to the HDFS then it makes sense that it'd be slower, right?
-
Re: HBase vs. HDFS
Doug Meil
2012-10-02, 14:04

If you take HBase out of it and think of it from the standpoint of two programs, one of which opens a file and writes the output to another file, and the other of which actually processes each row and then writes out results, the second one is going to be slower because it's doing more, ceteris paribus.

HBase is like the second program in your test.
|