HBase >> mail # user >> Re: HBase vs. HDFS


Re: HBase vs. HDFS

Hi there,

Another thing to consider, on top of the scan caching, is that HBase is
doing more work in the process of scanning the table.  See...

http://hbase.apache.org/book.html#conceptual.view

http://hbase.apache.org/book.html#regions.arch
... Specifically, it processes the KeyValues, potentially merges rows across
StoreFiles, and checks the MemStore of each column family (CF) for
un-flushed updates.
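
To make the merging step concrete, here is a minimal sketch in Python of a scan that combines several sorted sources and keeps only the newest version of each cell. It is a simplified model, not HBase's actual code: the tuple layout (row, column, timestamp, value), the function names, and the sample data are all assumptions made for illustration.

```python
import heapq

# Hypothetical cell layout for illustration: (row, column, timestamp, value).
# Real HBase KeyValues carry more fields than this.

def sort_key(cell):
    row, column, timestamp, _value = cell
    # Order by row, then column, then newest timestamp first.
    return (row, column, -timestamp)

def merged_scan(*sources):
    """Merge sorted cell streams, yielding the newest version of each (row, column)."""
    last_seen = None
    for cell in heapq.merge(*sources, key=sort_key):
        coordinate = (cell[0], cell[1])
        if coordinate != last_seen:
            # First cell for this coordinate is the newest one; older ones are skipped.
            last_seen = coordinate
            yield cell

# Two on-disk files plus one in-memory buffer, each already sorted by sort_key.
store_file_1 = [("row1", "cf:a", 1, "old"), ("row2", "cf:a", 1, "x")]
store_file_2 = [("row1", "cf:a", 2, "newer"), ("row3", "cf:a", 1, "y")]
memstore = [("row1", "cf:a", 3, "unflushed")]

result = list(merged_scan(store_file_1, store_file_2, memstore))
for cell in result:
    print(cell)
```

Even in this toy version, every cell from every source is compared before a row is returned, which a plain sequential file read never has to do.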

On 10/1/12 8:54 PM, "Doug Meil" <[EMAIL PROTECTED]> wrote:

>
>Hi there-
>
>Might want to start with this...
>
>http://hbase.apache.org/book.html#perf.reading
>
>... if you're using default scan caching (which is 1) that would explain a
>lot.
>
>
>
>
>On 10/1/12 7:01 PM, "Juan P." <[EMAIL PROTECTED]> wrote:
>
>>Hi guys,
>>I'm trying to get familiar with HBase and one thing I noticed is that
>>reads seem to be very slow. I just tried doing a "scan 'my_table'" to get
>>120K records and it took about 50 seconds to print it all out.
>>
>>In contrast, "hadoop fs -cat my_file.csv", where my_file.csv has 120K lines,
>>completed in under a second.
>>
>>Is that possible? Am I missing something about HBase reads?
>>
>>Thanks,
>>Joni
>
>
>
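
The scan-caching point in this thread can be put in rough numbers. The sketch below assumes a simple model in which the client makes one round trip per batch of `caching` rows and ignores any extra call needed to detect the end of the scan. The function name is hypothetical. The 120K rows and 50 seconds come from the original question, and the larger caching values are arbitrary examples.

```python
import math

def scan_round_trips(rows, caching):
    """Estimated client/server round trips for a scan, assuming one trip per batch."""
    return math.ceil(rows / caching)

ROWS = 120_000          # row count reported in the question
ELAPSED_SECONDS = 50    # scan time reported in the question

# A caching value of 1 is the default cited in the thread; 100 and 1000 are examples.
for caching in (1, 100, 1000):
    print(f"caching={caching}: {scan_round_trips(ROWS, caching)} round trips")

# Average time per row in the reported scan, in milliseconds.
per_row_ms = ELAPSED_SECONDS / ROWS * 1000
print(f"about {per_row_ms:.2f} ms per row")
```

Under this model a caching value of 1 means one round trip per row, so the reported 50 seconds works out to well under a millisecond per row. The total also includes the shell's time to print each row.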