HBase >> mail # user >> speeding up rowcount


Re: speeding up rowcount
Ha. You are overestimating my Java, Ted. I am no programmer, just an ignorant
consumer of great technologies.
On Sat, Oct 29, 2011 at 10:46 AM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Thanks Rita for logging the JIRA.
>
> Do you want to provide a patch ?
>
> On Sat, Oct 29, 2011 at 7:29 AM, Rita <[EMAIL PROTECTED]> wrote:
>
> > Opened: https://issues.apache.org/jira/browse/HBASE-4702
> >
> >
> > Please edit to your liking.
> >
> >
> > On Sun, Oct 9, 2011 at 9:05 PM, Himanshu Vashishtha <[EMAIL PROTECTED]> wrote:
> >
> > > MapReduce support in HBase inherently provides parallelism such that
> > > each Region is given to one mapper.
> > >
> > > Himanshu
> > >
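[Editor's note: a minimal sketch of such a job, loosely mirroring the RowCounter
that ships with HBase. TableInputFormat creates one input split per region, which
is where the one-mapper-per-region parallelism comes from. The class name and
comments are illustrative, not from this thread.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class SimpleRowCounter {
      // One mapper instance runs per region; each one just bumps a counter.
      static class CountMapper extends TableMapper<ImmutableBytesWritable, Result> {
        enum Counters { ROWS }
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
          ctx.getCounter(Counters.ROWS).increment(1);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "simple-rowcount");
        job.setJarByClass(SimpleRowCounter.class);
        Scan scan = new Scan();
        scan.setFilter(new FirstKeyOnlyFilter()); // one KeyValue per row is enough to count
        scan.setCacheBlocks(false);               // a full scan shouldn't churn the block cache
        TableMapReduceUtil.initTableMapperJob(args[0], scan, CountMapper.class,
            ImmutableBytesWritable.class, Result.class, job);
        job.setOutputFormatClass(NullOutputFormat.class); // nothing to write, just counters
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The final count comes out as the ROWS counter in the job's counter output.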
> > > On Sun, Oct 9, 2011 at 6:44 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> > > > Be aware that the contract for a scan is to return all rows sorted by
> > > > rowkey, hence it cannot scan regions in parallel by default. I have not
> > > > played much with HBase and MapReduce, but if order is not important you
> > > > can split the scan into multiple scans.
> > > >
> > > >
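[Editor's note: a rough sketch of that split-the-scan idea, using region start
keys as the split points. This is plain client-side scanning, no MapReduce; the
class name, pool size, and caching value are illustrative assumptions.]

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

    public class ParallelRowCount {
      public static void main(String[] args) throws Exception {
        final Configuration conf = HBaseConfiguration.create();
        final String table = args[0];
        // Region boundaries make natural split points for the sub-scans.
        byte[][] startKeys = new HTable(conf, table).getStartKeys();
        ExecutorService pool = Executors.newFixedThreadPool(8); // pool size is arbitrary
        List<Future<Long>> parts = new ArrayList<Future<Long>>();
        for (int i = 0; i < startKeys.length; i++) {
          final byte[] start = startKeys[i];
          final byte[] stop = (i + 1 < startKeys.length)
              ? startKeys[i + 1] : HConstants.EMPTY_END_ROW;
          parts.add(pool.submit(new Callable<Long>() {
            public Long call() throws Exception {
              HTable t = new HTable(conf, table); // HTable is not thread-safe; one per thread
              try {
                Scan scan = new Scan(start, stop);
                scan.setFilter(new FirstKeyOnlyFilter()); // count rows, don't fetch them
                scan.setCaching(1000);                    // batch rows per RPC
                long n = 0;
                ResultScanner rs = t.getScanner(scan);
                for (Result r : rs) n++;
                rs.close();
                return n;
              } finally {
                t.close();
              }
            }
          }));
        }
        long total = 0;
        for (Future<Long> f : parts) total += f.get();
        pool.shutdown();
        System.out.println("rows: " + total);
      }
    }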
> > > > ----- Original Message -----
> > > > From: Tom Goren <[EMAIL PROTECTED]>
> > > > To: [EMAIL PROTECTED]
> > > > Cc:
> > > > Sent: Sunday, October 9, 2011 8:07 AM
> > > > Subject: Re: speeding up rowcount
> > > >
> > > > lol - I just ran a rowcount via mapreduce, and it took 6 hours for
> > > > 7.5 million rows...
> > > >
> > > > On Sun, Oct 9, 2011 at 7:50 AM, Rita <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I have been doing a rowcount via mapreduce and it's taking about 4-5
> > > >> hours to count 500 million rows in a table. I was wondering if there
> > > >> are any MapReduce tunings I can do so it will go much faster.
> > > >>
> > > >> I have a 10-node cluster, each node with 8 CPUs and 64GB of memory.
> > > >> Any tuning advice would be much appreciated.
> > > >>
> > > >>
> > > >> --
> > > >> --- Get your facts first, then you can distort them as you please.--
> > > >>
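[Editor's note: the scan-level settings that usually dominate a count like this
are scanner caching (the client default in this era of HBase was 1 row per RPC),
block caching, and how much data each row ships back. A sketch; the caching value
of 1000 is an arbitrary example, not a recommendation from this thread.]

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

    public class CountingScan {
      /** Builds a Scan tuned for counting rows rather than reading them. */
      public static Scan create() {
        Scan scan = new Scan();
        scan.setCaching(1000);                    // rows per RPC round trip; default was 1
        scan.setCacheBlocks(false);               // a full scan would evict hot data otherwise
        scan.setFilter(new FirstKeyOnlyFilter()); // ship one KeyValue per row, not whole rows
        return scan;
      }
    }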
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
> >
>

--
--- Get your facts first, then you can distort them as you please.--