HBase >> mail # user >> speeding up rowcount


Re: speeding up rowcount
Since RowCounter uses FirstKeyOnlyFilter, could we give it a default Scan
cache value of 500 or so?

Himanshu
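
To make the effect of the Scan cache value concrete, here is a rough back-of-the-envelope sketch in plain Java (no HBase dependency). It assumes each scanner next() call is one round trip that returns up to `caching` rows, and it ignores the per-region scanner open/close calls and the final empty fetch. The class and method names are made up for illustration, and the caching values 1 and 100 are only comparison points; the 500 million rows, the 4-5 hours and the suggested value of 500 come from this thread.

```java
public class ScanCachingEstimate {

    // Approximate number of scanner next() round trips needed to fetch
    // `rows` rows when each call returns up to `caching` rows.
    // Uses ceiling division; per-region scanner setup is ignored.
    static long rpcCount(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching > 0");
        }
        return (rows + caching - 1) / caching;
    }

    // Average throughput in rows per second for a job that counts
    // `rows` rows in `hours` hours, rounded to the nearest row.
    static long rowsPerSecond(long rows, double hours) {
        if (hours <= 0) {
            throw new IllegalArgumentException("hours must be > 0");
        }
        return Math.round(rows / (hours * 3600.0));
    }

    public static void main(String[] args) {
        long rows = 500_000_000L; // table size reported in the thread

        // Compare a few hypothetical caching values.
        for (int caching : new int[] {1, 100, 500}) {
            System.out.println("caching=" + caching + " -> about "
                    + rpcCount(rows, caching) + " next() calls");
        }

        // Throughput implied by the reported 4-5 hour runtime.
        System.out.println("4 hours -> about " + rowsPerSecond(rows, 4.0) + " rows/sec");
        System.out.println("5 hours -> about " + rowsPerSecond(rows, 5.0) + " rows/sec");
    }
}
```

Under these assumptions, going from a caching value of 1 to 500 cuts the number of round trips for the whole table from about 500 million to about 1 million, which is why a larger value should help a job that reads only the first key of each row.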

On Sun, Oct 9, 2011 at 9:44 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> Excellent question.
> There seems to be a bug in RowCounter.
>
> In TableInputFormat:
>        if (conf.get(SCAN_CACHEDROWS) != null) {
>          scan.setCaching(Integer.parseInt(conf.get(SCAN_CACHEDROWS)));
>        }
> But I don't see SCAN_CACHEDROWS in either TableMapReduceUtil or RowCounter.
>
> Mind filing a bug?
>
> On Sun, Oct 9, 2011 at 8:30 AM, Rita <[EMAIL PROTECTED]> wrote:
>
>> Thanks for the responses.
>>
>> Where do I set the high Scan cache values?
>>
>>
>> On Sun, Oct 9, 2011 at 11:19 AM, Himanshu Vashishtha <
>> [EMAIL PROTECTED]> wrote:
>>
>> > Since a MapReduce job is a separate process, try a high Scan cache value.
>> >
>> > http://hbase.apache.org/book.html#perf.hbase.client.caching
>> >
>> > Himanshu
>> >
>> > On Sun, Oct 9, 2011 at 9:09 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> > > I guess your hbase.hregion.max.filesize is quite high.
>> > > If possible, lower its value so that you have smaller regions.
>> > >
>> > > On Sun, Oct 9, 2011 at 7:50 AM, Rita <[EMAIL PROTECTED]> wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> I have been doing a rowcount via MapReduce and it's taking about 4-5
>> > >> hours to count 500 million rows in a table. I was wondering if there
>> > >> are any MapReduce tunings I can do so it will go much faster.
>> > >>
>> > >> I have a 10-node cluster, each node with 8 CPUs and 64 GB of memory.
>> > >> Any tuning advice would be much appreciated.
>> > >>
>> > >>
>> > >> --
>> > >> --- Get your facts first, then you can distort them as you please.--
>> > >>
>> > >
>> >
>>
>>
>>
>> --
>> --- Get your facts first, then you can distort them as you please.--
>>
>