HBase >> mail # user >> Re: HBase vs. HDFS


Re: HBase vs. HDFS

Hi there,

Another thing to consider on top of scan caching is that HBase is doing
more work in the process of scanning the table. See...

http://hbase.apache.org/book.html#conceptual.view

http://hbase.apache.org/book.html#regions.arch
... Specifically, it processes each KeyValue, potentially merges rows across
StoreFiles, and checks the MemStore of each column family (CF) for unflushed
updates.
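To make that extra work concrete, here is a minimal, hypothetical Java model of the merge a scanner performs conceptually. It does a k-way merge over several sorted sources (one MemStore plus several StoreFiles) and, when the same row/column appears in more than one source, keeps only the newest version. The class names, the simplified cell (row, column, timestamp, value), and the "newest timestamp wins" rule are assumptions for illustration. The real HBase read path also handles deletes, multiple versions, filters and TTLs, none of which is modeled here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ScanMergeSketch {

    // Hypothetical simplified cell; HBase's real KeyValue carries more fields.
    record Cell(String row, String column, long timestamp, String value) {}

    // Sort order: row asc, column asc, timestamp desc (newest first), so the
    // first cell seen for a given row/column is the newest one.
    static final Comparator<Cell> ORDER = Comparator
            .comparing(Cell::row)
            .thenComparing(Cell::column)
            .thenComparing(Comparator.<Cell>comparingLong(Cell::timestamp).reversed());

    // Cursor over one sorted source (standing in for a MemStore or a StoreFile).
    private static final class Cursor {
        final List<Cell> cells;
        int pos = 0;
        Cursor(List<Cell> cells) { this.cells = cells; }
        Cell peek() { return cells.get(pos); }
        boolean hasNext() { return pos < cells.size(); }
    }

    // K-way merge of already-sorted sources; for duplicate row/column keys
    // only the newest cell is emitted.
    static List<Cell> scan(List<List<Cell>> sortedSources) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> ORDER.compare(a.peek(), b.peek()));
        for (List<Cell> src : sortedSources) {
            Cursor c = new Cursor(src);
            if (c.hasNext()) heap.add(c);
        }
        List<Cell> out = new ArrayList<>();
        Cell last = null;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();          // source holding the smallest cell
            Cell cell = c.peek();
            c.pos++;
            if (c.hasNext()) heap.add(c);    // re-insert with its next cell
            boolean sameKey = last != null
                    && last.row().equals(cell.row())
                    && last.column().equals(cell.column());
            if (!sameKey) {                  // older duplicates are skipped
                out.add(cell);
                last = cell;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> memstore = List.of(new Cell("row2", "cf:a", 30, "new"));
        List<Cell> storeFile1 = List.of(
                new Cell("row1", "cf:a", 10, "x"),
                new Cell("row2", "cf:a", 10, "old"));
        List<Cell> storeFile2 = List.of(new Cell("row3", "cf:a", 20, "y"));
        for (Cell c : scan(List.of(memstore, storeFile1, storeFile2))) {
            System.out.println(c.row() + " " + c.column() + " " + c.value());
        }
    }
}
```

Even in this toy version every cell passes through a heap comparison and a duplicate check, which is work a plain `hadoop fs -cat` of a flat file never does.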

On 10/1/12 8:54 PM, "Doug Meil" <[EMAIL PROTECTED]> wrote:

>
>Hi there-
>
>Might want to start with this...
>
>http://hbase.apache.org/book.html#perf.reading
>
>... if you're using the default scan caching (which is 1), that would
>explain a lot.
>
>
>
>
>On 10/1/12 7:01 PM, "Juan P." <[EMAIL PROTECTED]> wrote:
>
>>Hi guys,
>>I'm trying to get familiar with HBase, and one thing I noticed is that
>>reads seem to be very slow. I just tried doing a "scan 'my_table'" to get
>>120K records, and it took about 50 seconds to print them all out.
>>
>>In contrast, "hadoop fs -cat my_file.csv", where my_file.csv has 120K
>>lines, completed in under a second.
>>
>>Is that possible? Am I missing something about HBase reads?
>>
>>Thanks,
>>Joni
>
>
>
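Doug's point about scan caching comes down to round trips. With caching set to N, the client fetches up to N rows per RPC to the RegionServer, so a full scan of R rows needs at least ceil(R / N) round trips. The sketch below only does that arithmetic for the 120K-row table from the original question. The caching values compared are illustrative, the class and method names are hypothetical, and no timings are implied.

```java
public class ScanCachingRoundTrips {

    // Minimum number of client->RegionServer round trips needed to pull
    // totalRows rows when each scanner RPC returns at most `caching` rows.
    // (A real scan may need an extra call to discover the end of the table.)
    static long minRoundTrips(long totalRows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be > 0");
        }
        return (totalRows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 120_000; // table size from the original question
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.println("caching=" + caching + " -> "
                    + minRoundTrips(rows, caching) + " round trips");
        }
    }
}
```

With the default of 1, that is 120,000 round trips versus 120 at a caching of 1000. In the Java client the corresponding setting is `Scan.setCaching(int)` on `org.apache.hadoop.hbase.client.Scan` (or the `hbase.client.scanner.caching` property). Larger values reduce round trips at the cost of more client and RegionServer memory per call.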