HBase, mail # user - speeding up rowcount


Re: speeding up rowcount
Himanshu Vashishtha 2011-10-09, 16:26
Since RowCounter uses FirstKeyOnlyFilter, each row it returns is small, so
could we give it a default Scan caching value of 500 or so?
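
To put rough numbers on this, here is a minimal sketch of how the caching value changes the number of scanner round trips for a table the size of the one in this thread. It assumes each scanner call returns up to `caching` rows and ignores per-region scanner open/close overhead; the class and method names are hypothetical and not part of HBase.

```java
/** Back-of-the-envelope estimate of scanner round trips for a full-table row count. */
final class ScanRpcEstimate {

    /**
     * Number of scanner calls needed to fetch totalRows when each call
     * returns up to `caching` rows (simplifying assumption, see above).
     */
    static long roundTrips(long totalRows, int caching) {
        if (totalRows < 0 || caching <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and caching > 0");
        }
        return (totalRows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 500_000_000L; // table size mentioned in the original question
        for (int caching : new int[] {1, 100, 500}) {
            System.out.printf("caching=%d -> %d round trips%n",
                    caching, roundTrips(rows, caching));
        }
    }
}
```

Under these assumptions, going from a caching value of 1 to 500 cuts the round trips from 500 million to 1 million, which is why a higher value looks attractive for a job that only needs the first key of each row.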

Himanshu

On Sun, Oct 9, 2011 at 9:44 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> Excellent question.
> There seems to be a bug in RowCounter.
>
> In TableInputFormat:
>        if (conf.get(SCAN_CACHEDROWS) != null) {
>          scan.setCaching(Integer.parseInt(conf.get(SCAN_CACHEDROWS)));
>        }
> But I don't see SCAN_CACHEDROWS in either TableMapReduceUtil or RowCounter.
>
> Mind filing a bug?
>
> On Sun, Oct 9, 2011 at 8:30 AM, Rita <[EMAIL PROTECTED]> wrote:
>
>> Thanks for the responses.
>>
>> Where do I set the high Scan cache values?
>>
>>
>> On Sun, Oct 9, 2011 at 11:19 AM, Himanshu Vashishtha <[EMAIL PROTECTED]> wrote:
>>
>> > Since a MapReduce job runs as a separate process, try a high Scan cache value.
>> >
>> > http://hbase.apache.org/book.html#perf.hbase.client.caching
>> >
>> > Himanshu
>> >
>> > On Sun, Oct 9, 2011 at 9:09 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> > > I guess your hbase.hregion.max.filesize is quite high.
>> > > If possible, lower its value so that you have smaller regions.
>> > >
>> > > On Sun, Oct 9, 2011 at 7:50 AM, Rita <[EMAIL PROTECTED]> wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> I have been doing a rowcount via MapReduce and it's taking about 4-5 hours
>> > >> to count 500 million rows in a table. I was wondering if there are any
>> > >> MapReduce tunings I can do so it will go much faster.
>> > >>
>> > >> I have a 10-node cluster, each node with 8 CPUs and 64 GB of memory. Any
>> > >> tuning advice would be much appreciated.
>> > >>
>> > >>
>> > >> --
>> > >> --- Get your facts first, then you can distort them as you please.--
>> > >>
>> > >
>> >
>>
>>
>>
>> --
>> --- Get your facts first, then you can distort them as you please.--
>>
>
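
The TableInputFormat check quoted above only applies caching when the key is present, so a job that never sets it gets no override. A minimal sketch of the "use the configured value, otherwise fall back to 500" idea follows. It uses `java.util.Properties` as a stand-in for the Hadoop `Configuration` object, the key string is an assumption (the thread shows only the constant name `SCAN_CACHEDROWS`), and the class and method names are hypothetical; this is not the actual RowCounter code or patch.

```java
import java.util.Properties;

/** Sketch of resolving a scan caching value with a RowCounter-style default. */
final class ScanCachingConfig {

    // Assumed key string; the thread only names the constant, not its value.
    static final String SCAN_CACHEDROWS = "hbase.mapreduce.scan.cachedrows";

    // Default proposed in this thread for FirstKeyOnlyFilter-based counting.
    static final int DEFAULT_ROWCOUNT_CACHING = 500;

    /** Returns the configured caching value, or the default when the key is unset. */
    static int resolveCaching(Properties conf) {
        String configured = conf.getProperty(SCAN_CACHEDROWS);
        if (configured == null) {
            return DEFAULT_ROWCOUNT_CACHING;
        }
        return Integer.parseInt(configured.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("unset -> " + resolveCaching(conf));

        conf.setProperty(SCAN_CACHEDROWS, "1000");
        System.out.println("set   -> " + resolveCaching(conf));
    }
}
```

In a real job the resolved value would be passed to `scan.setCaching(...)`, as in the quoted snippet; an explicit setting still wins over the default.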