Re: HBase vs. HDFS

If you take HBase out of it and think of it from the standpoint of two
programs, one of which opens a file and writes its contents to another file,
and the other of which actually processes each row and then writes out the
results, the second one is going to be slower because it's doing more work,
ceteris paribus. HBase is like the second program in your test.
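To make the two-program comparison concrete, here is a toy sketch in Python. It assumes a comma-separated input whose first field acts as the row key; the function names and the `cf:colN` output format are hypothetical and only loosely imitate the HBase shell's scan output. Both functions read the same input, but the second splits every row into cells and formats each one, so it does strictly more work per row.

```python
import io


def copy_file(src: io.TextIOBase, dst: io.TextIOBase) -> None:
    """Program 1: copy the input to the output unchanged, like `hadoop fs -cat`."""
    for line in src:
        dst.write(line)


def process_rows(src: io.TextIOBase, dst: io.TextIOBase) -> int:
    """Program 2: split each row into cells and format every cell before writing.

    Returns the number of cells written.
    """
    cells = 0
    for line in src:
        row_key, *values = line.rstrip("\n").split(",")
        for i, value in enumerate(values):
            # Hypothetical "cf:colN" column names, loosely imitating shell scan output.
            dst.write(f" {row_key}  column=cf:col{i}, value={value}\n")
            cells += 1
    return cells


data = "row1,a,b\nrow2,c,d\n"
raw, formatted = io.StringIO(), io.StringIO()
copy_file(io.StringIO(data), raw)
print(raw.getvalue(), end="")
print(process_rows(io.StringIO(data), formatted), "cells")
print(formatted.getvalue(), end="")
```

The real HBase read path does far more per row than this (see the KeyValue merging described further down the thread), but even this toy shows why the second program cannot be as cheap as a byte-for-byte copy.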
On 10/2/12 8:46 AM, "gordoslocos" <[EMAIL PROTECTED]> wrote:

>Thank you all! Setting a cache size helped a great deal. It's still
>slower, though.
>
>I think the overhead of processing the data from the table might be
>the cause.
>
>I guess if HBase adds a layer of indirection on top of HDFS, then it makes
>sense that it'd be slower, right?
>
>On 02/10/2012, at 09:28, Doug Meil <[EMAIL PROTECTED]> wrote:
>
>>
>> Hi there,
>>
>> Another thing to consider on top of scan caching is that HBase is
>> doing more work in the process of scanning the table. See...
>>
>> http://hbase.apache.org/book.html#conceptual.view
>>
>> http://hbase.apache.org/book.html#regions.arch
>>
>>
>> ... Specifically, processing the KeyValues, potentially merging rows
>> between StoreFiles, and checking for unflushed updates in the MemStore
>> per column family (CF).
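The merging described above can be sketched as a k-way merge of sorted sources. This is a simplified model, not HBase's actual read path: it assumes each source (the MemStore and each StoreFile) is already sorted, keys cells by row, column and timestamp, keeps only the newest version of each cell, and ignores delete markers, multiple versions, per-family stores and block caching. All names are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Optional, Tuple

# Simplified cell: (row, column, timestamp, value).
KeyValue = Tuple[str, str, int, str]


def cell_order(kv: KeyValue) -> Tuple[str, str, int]:
    row, column, timestamp, _ = kv
    # Sort by row, then column, then timestamp descending, so the newest
    # version of each cell comes first (roughly how HBase orders cells).
    return (row, column, -timestamp)


def merged_scan(sources: Iterable[List[KeyValue]]) -> Dict[str, Dict[str, str]]:
    """Merge sorted sources (MemStore plus StoreFiles) into rows,
    keeping only the newest version of each cell."""
    rows: Dict[str, Dict[str, str]] = {}
    last_cell: Optional[Tuple[str, str]] = None
    for row, column, _, value in heapq.merge(*sources, key=cell_order):
        if (row, column) == last_cell:
            continue  # older version of a cell already taken from a newer source
        last_cell = (row, column)
        rows.setdefault(row, {})[column] = value
    return rows


store_file_1 = sorted([("r1", "cf:a", 1, "old"), ("r2", "cf:a", 1, "x")], key=cell_order)
store_file_2 = sorted([("r1", "cf:b", 2, "y")], key=cell_order)
memstore = sorted([("r1", "cf:a", 3, "new")], key=cell_order)  # not yet flushed

print(merged_scan([memstore, store_file_1, store_file_2]))
```

This prints `{'r1': {'cf:a': 'new', 'cf:b': 'y'}, 'r2': {'cf:a': 'x'}}`: the MemStore's unflushed value shadows the older one in the first StoreFile, and row `r1` is assembled from two different files. That per-cell comparison is work a plain `hadoop fs -cat` never does.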
>>
>>
>>
>> On 10/1/12 8:54 PM, "Doug Meil" <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> Hi there-
>>>
>>> Might want to start with this…
>>>
>>> http://hbase.apache.org/book.html#perf.reading
>>>
>>> … if you're using the default scan caching (which is 1), that would
>>> explain a lot.
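A rough model of why a caching value of 1 hurts: with caching N, each scanner round trip to the RegionServer returns up to N rows, so scanning R rows takes about ceil(R / N) RPCs. The toy scanner below (a hypothetical class, not the HBase client) only counts those simulated round trips; it ignores opening and closing the scanner, region boundaries, and any extra call a real client may make to detect the end of the scan. In the Java client the setting is `Scan.setCaching(int)`, with the `hbase.client.scanner.caching` configuration property as the default. For scale, 50 seconds over 120K rows works out to roughly 0.42 ms per row, which is consistent with paying about one round trip per row.

```python
import math
from collections import deque
from typing import Deque, Iterator, List


class ToyScanner:
    """Hypothetical client-side scanner that fetches `caching` rows per simulated RPC."""

    def __init__(self, table: List[str], caching: int) -> None:
        if caching < 1:
            raise ValueError("caching must be at least 1")
        self._table = table        # stands in for the rows held by the RegionServer
        self._caching = caching
        self._pos = 0              # index of the next row the "server" will send
        self._buffer: Deque[str] = deque()
        self.rpcs = 0              # simulated round trips made so far

    def _fetch_batch(self) -> None:
        # One simulated round trip: the "server" returns up to `caching` rows.
        self.rpcs += 1
        batch = self._table[self._pos:self._pos + self._caching]
        self._pos += len(batch)
        self._buffer.extend(batch)

    def __iter__(self) -> Iterator[str]:
        while True:
            if not self._buffer:
                if self._pos >= len(self._table):
                    return  # no rows left on the "server"
                self._fetch_batch()
            yield self._buffer.popleft()


if __name__ == "__main__":
    rows = [f"row{i:06d}" for i in range(120_000)]
    for caching in (1, 100, 1000):
        scanner = ToyScanner(rows, caching)
        count = sum(1 for _ in scanner)
        print(f"caching={caching}: {count} rows, {scanner.rpcs} RPCs "
              f"(ceil estimate {math.ceil(len(rows) / caching)})")
```

With 120K rows this reports 120,000 round trips at caching 1, 1,200 at 100 and 120 at 1,000. Larger values trade fewer round trips for more client memory per batch, so the right setting depends on row size.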
>>>
>>>
>>>
>>>
>>> On 10/1/12 7:01 PM, "Juan P." <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi guys,
>>>> I'm trying to get familiar with HBase, and one thing I noticed is
>>>> that reads seem to be very slow. I just tried doing a "scan 'my_table'"
>>>> to get 120K records, and it took about 50 seconds to print them all out.
>>>>
>>>> In contrast, "hadoop fs -cat my_file.csv", where my_file.csv has 120K
>>>> lines, completed in under a second.
>>>>
>>>> Is that possible? Am I missing something about HBase reads?
>>>>
>>>> Thanks,
>>>> Joni
>>
>>
>