

Re: speeding up rowcount
Opened, https://issues.apache.org/jira/browse/HBASE-4702
Please edit to your liking.
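
For illustration, here is a rough sketch of the "split the scan into multiple scans" idea from the quoted thread, for the case where row order does not matter. It does not use the HBase API: the sorted list ROW_KEYS stands in for a table, and the names count_range, split_ranges and parallel_count, as well as the split points, are all made up for this example. In HBase terms each (start, stop) pair would roughly correspond to one scan bounded by a start row and a stop row.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a table: a sorted list of row keys (hypothetical data, not HBase).
ROW_KEYS = sorted(f"row-{i:06d}" for i in range(10_000))


def count_range(keys, start, stop):
    """Count keys in [start, stop); stop=None means 'to the end of the table'."""
    lo = bisect_left(keys, start)
    hi = len(keys) if stop is None else bisect_left(keys, stop)
    return hi - lo


def split_ranges(boundaries):
    """Turn sorted split points into contiguous, non-overlapping (start, stop) ranges."""
    starts = [""] + list(boundaries)   # "" sorts before every key
    stops = list(boundaries) + [None]  # None marks the open-ended last range
    return list(zip(starts, stops))


def parallel_count(keys, boundaries, workers=4):
    """Count each sub-range independently, then sum the partial counts."""
    ranges = split_ranges(boundaries)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda r: count_range(keys, r[0], r[1]), ranges)
        return sum(partials)


if __name__ == "__main__":
    splits = ["row-002500", "row-005000", "row-007500"]  # assumed split points
    print(parallel_count(ROW_KEYS, splits))
```

Because the ranges are contiguous and do not overlap, the partial counts can be summed in any order, which is why giving up the sorted-order guarantee makes the work parallelizable. The threads here only show the structure; they are not meant as a performance claim.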
On Sun, Oct 9, 2011 at 9:05 PM, Himanshu Vashishtha <[EMAIL PROTECTED]> wrote:

> MapReduce support in HBase inherently provides parallelism such that
> each Region is given to one mapper.
>
> Himanshu
>
> On Sun, Oct 9, 2011 at 6:44 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> > Be aware that the contract for a scan is to return all rows sorted by
> > rowkey, hence it cannot scan regions in parallel by default. I have not
> > played much with HBase and MapReduce, but if order is not important you can
> > split the scan into multiple scans.
> >
> >
> > ----- Original Message -----
> > From: Tom Goren <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Cc:
> > Sent: Sunday, October 9, 2011 8:07 AM
> > Subject: Re: speeding up rowcount
> >
> > lol - i just ran a rowcount via mapreduce, and it took 6 hours for 7.5
> > million rows...
> >
> > On Sun, Oct 9, 2011 at 7:50 AM, Rita <[EMAIL PROTECTED]> wrote:
> >
> >> Hi,
> >>
> >> I have been doing a rowcount via mapreduce and it's taking about 4-5 hours
> >> to count 500 million rows in a table. I was wondering if there are any
> >> MapReduce tunings I can do so it will go much faster.
> >>
> >> I have a 10-node cluster, each node with 8 CPUs and 64GB of memory. Any
> >> tuning advice would be much appreciated.
> >>
> >>
> >> --
> >> --- Get your facts first, then you can distort them as you please.--
> >>
> >
> >
>

--
--- Get your facts first, then you can distort them as you please.--