

yonghu 2013-08-20, 15:45
Jeff Kolesky 2013-08-20, 16:02
yonghu 2013-08-21, 08:08
Re: Does HBase supports parallel table scan if I use MapReduce
Have a look at Phoenix (https://github.com/forcedotcom/phoenix), a SQL
skin over HBase. It does parallel scans and has no map/reduce
dependencies. Instead, it compiles your SQL into native HBase calls.
Thanks,
James
@JamesPlusPlus
http://phoenix-hbase.blogspot.com

On Aug 21, 2013, at 1:08 AM, yonghu <[EMAIL PROTECTED]> wrote:

> Thanks. So scanning the table from a plain Java client, without
> MapReduce, will be considerably slower.
>
> Yong
>
>
> On Tue, Aug 20, 2013 at 6:02 PM, Jeff Kolesky <[EMAIL PROTECTED]> wrote:
>
>> The scan will be broken up into multiple map tasks, each of which will run
>> over a single split of the table (look at TableInputFormat to see how it is
>> done).  The map tasks will run in parallel.
>>
>> Jeff
>>
>>
>> On Tue, Aug 20, 2013 at 8:45 AM, yonghu <[EMAIL PROTECTED]> wrote:
>>
>>> Hello,
>>>
>>> I know that with the default scan API, HBase scans a table serially,
>>> because it needs to guarantee the order of the returned tuples. My
>>> question: if I use MapReduce to read the HBase table and write the
>>> results directly to HDFS instead of returning them to the client, is
>>> the HBase scan still serial, or can it run in parallel in this case?
>>>
>>> Thanks!
>>>
>>> Yong
>>>
>>
>>
>>
>> --
>> *Jeff Kolesky*
>> Chief Software Architect
>> *Opower*
>>
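
To make Jeff's description concrete, here is a minimal sketch of the idea of split-based parallel scanning. It is a conceptual model only and does not use the HBase API: a TreeMap stands in for the table, a thread pool stands in for the map tasks, and the names ParallelScanSketch, scanSplit and parallelScan are hypothetical. Each split covers a contiguous row-key range, so rows stay sorted inside a split even though the splits are scanned concurrently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual model only: a TreeMap stands in for an HBase table and plain
// threads stand in for map tasks. Nothing here calls the HBase API.
class ParallelScanSketch {

    // Hypothetical helper: scan one split, i.e. the row keys in
    // [startKey, stopKey). A null stopKey means "to the end of the table".
    static List<String> scanSplit(NavigableMap<String, String> table,
                                  String startKey, String stopKey) {
        NavigableMap<String, String> range = (stopKey == null)
                ? table.tailMap(startKey, true)
                : table.subMap(startKey, true, stopKey, false);
        return new ArrayList<>(range.keySet());
    }

    // Scan every split concurrently. splitStarts holds the start key of each
    // split in sorted order; each split ends where the next one begins.
    // The table must not be modified while the scans are running.
    static List<List<String>> parallelScan(NavigableMap<String, String> table,
                                           List<String> splitStarts) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, splitStarts.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < splitStarts.size(); i++) {
                final String start = splitStarts.get(i);
                final String stop = (i + 1 < splitStarts.size())
                        ? splitStarts.get(i + 1) : null;
                Callable<List<String>> task = () -> scanSplit(table, start, stop);
                futures.add(pool.submit(task));
            }
            // Tasks may finish in any order; results are collected per split.
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel scan failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String row : Arrays.asList("a1", "a2", "b1", "c1", "c2")) {
            table.put(row, "value");
        }
        // Three splits: [a, b), [b, c), [c, end of table)
        System.out.println(parallelScan(table, Arrays.asList("a", "b", "c")));
    }
}
```

In a real job the split boundaries come from TableInputFormat rather than from a hand-written list, and each map task writes its own output, so a single globally ordered result stream like the one the default client scan returns is not part of the picture.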