Re: speeding up rowcount
MapReduce support in HBase inherently provides parallelism such that
each Region is given to one mapper.
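
To make the idea concrete, here is a minimal sketch in plain Java, not HBase code. It assumes a sorted in-memory TreeMap stands in for the table and that a list of hypothetical split keys stands in for region boundaries; the class and method names (ParallelRowCount, countRows, countRange) are made up for illustration. Each key range is counted by its own task, roughly the way each Region gets its own mapper, and the partial counts are summed. Since only the total matters, the ranges do not need to be visited in rowkey order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelRowCount {

    // Counts all rows by splitting the key space at splitKeys (assumed sorted
    // ascending) and counting each range in its own task. The split keys are
    // a hypothetical stand-in for region boundaries.
    static long countRows(TreeMap<String, String> table,
                          List<String> splitKeys,
                          int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            String start = null; // null means "from the first row"
            for (int i = 0; i <= splitKeys.size(); i++) {
                final String from = start;
                // null means "through the last row"
                final String to = i < splitKeys.size() ? splitKeys.get(i) : null;
                Callable<Long> task = () -> countRange(table, from, to);
                parts.add(pool.submit(task));
                start = to;
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // wait for each range and add its count
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // Counts rows with from <= key < to. The table is only read here, never
    // modified, so concurrent calls on the same TreeMap are safe.
    static long countRange(TreeMap<String, String> table, String from, String to) {
        SortedMap<String, String> view;
        if (from == null && to == null) {
            view = table;
        } else if (from == null) {
            view = table.headMap(to);
        } else if (to == null) {
            view = table.tailMap(from);
        } else {
            view = table.subMap(from, to);
        }
        return view.size();
    }
}
```

With 1000 rows and three split keys this runs four range counts and returns 1000. In a real job the number of ranges, and so the available parallelism, would follow the number of regions in the table rather than a hand-picked list.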

Himanshu

On Sun, Oct 9, 2011 at 6:44 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> Be aware that the contract for a scan is to return all rows sorted by rowkey, hence it cannot scan regions in parallel by default. I have not played much with HBase and MapReduce, but if order is not important you can split the scan into multiple scans.
>
>
> ----- Original Message -----
> From: Tom Goren <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc:
> Sent: Sunday, October 9, 2011 8:07 AM
> Subject: Re: speeding up rowcount
>
> lol - i just ran a rowcount via mapreduce, and it took 6 hours for 7.5
> million rows...
>
> On Sun, Oct 9, 2011 at 7:50 AM, Rita <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I have been doing a rowcount via mapreduce and it's taking about 4-5 hours
>> to count 500 million rows in a table. I was wondering if there are any
>> mapreduce tunings I can do so it will go much faster.
>>
>> I have a 10-node cluster, each node with 8 CPUs and 64GB of memory. Any
>> tuning advice would be much appreciated.
>>
>>
>> --
>> --- Get your facts first, then you can distort them as you please.--
>>
>
>