HBase, mail # dev - What is the best way to get the total row count?


Re: What is the best way to get the total row count?
shashwat shriparv 2012-07-09, 17:56
One more option I would suggest: while putting the data into HBase, just
maintain a count somewhere.
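
A minimal sketch of that idea, assuming an in-memory stand-in for the table.
The CountedTable class and its methods are hypothetical and are not HBase
API. Because a put to an existing row key is an overwrite, the sketch only
bumps the counter for a key it has not seen before, and it decrements on
delete so the count does not drift.

```python
class CountedTable:
    """Hypothetical in-memory stand-in for a table plus a row counter."""

    def __init__(self):
        self._rows = {}      # row key -> {column: value}
        self.row_count = 0   # maintained at write time

    def put(self, row_key, columns):
        # Only a brand-new row key changes the count; an overwrite does not.
        if row_key not in self._rows:
            self._rows[row_key] = {}
            self.row_count += 1
        self._rows[row_key].update(columns)

    def delete(self, row_key):
        # Deleting an existing row must decrement, or the count drifts.
        if self._rows.pop(row_key, None) is not None:
            self.row_count -= 1


table = CountedTable()
table.put("row1", {"cf:a": "1"})
table.put("row2", {"cf:a": "2"})
table.put("row1", {"cf:b": "3"})   # overwrite of row1, count unchanged
table.delete("row2")
print(table.row_count)
```

On a real cluster the counter would more likely live in a cell of its own
and be updated with an atomic increment, and the writer would need some way
to know whether a key is new. Both of those are left out of this sketch.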

On Mon, Jul 9, 2012 at 11:25 PM, shashwat shriparv <
[EMAIL PROTECTED]> wrote:

> From the HBase shell help for 'count':
>
> Count the number of rows in a table. This operation may take a LONG
> time (Run '$HADOOP_HOME/bin/hadoop jar hbase.jar rowcount' to run a
> counting mapreduce job). Current count is shown every 1000 rows by
> default. Count interval may be optionally specified. Examples:
>
>     hbase> count 't1'
>     hbase> count 't1', 100000
>
>
> On Mon, Jul 9, 2012 at 11:01 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>
>> If you need the exact count the best way is still RowCounter, maybe
>> set a bigger scanner caching?
>>
>> Another option that works if you only need an estimate is using the
>> reported number of KVs per region and then summing them up. Look at
>> any of your region servers' web ui and on the right you'll see the
>> count per region.
>>
>> J-D
>>
>> On Mon, Jul 9, 2012 at 3:41 AM, Gopinathan A <[EMAIL PROTECTED]>
>> wrote:
>> > Hi,
>> >
>> >
>> >
>> > What is the best way to get the total row count?
>> >
>> >
>> >
>> > I tried the following things:
>> >
>> > a>     count 'tablename' at the shell prompt: helpful only with a very
>> > small number of records.
>> >
>> > b>     Running the RowCounter job: it took almost 8 hours to get the row
>> > count of 2 TB of data on a 3-node cluster (16-core system, 48 GB RAM).
>> >
>> > c>     Using AggregationClient: disk I/O is very high (system wait is
>> > 65-70%, load factor is almost 110), which makes the server unresponsive
>> > and brings the clients down (due to RPC timeout exceptions).
>> >
>> > Thanks & Regards,
>> >
>> > Gopinathan A
>>
>
>
>
> --
>
>
> ∞
> Shashwat Shriparv
>
>
>
--

Shashwat Shriparv
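
The estimate J-D describes in the quoted message (summing the KV counts
reported per region) gives a total number of KeyValues, not rows. A rough
sketch of turning that into a row estimate follows, assuming the per-region
numbers were read off the region server web UI by hand and that the average
number of KVs per row is known. The function name, the region names, the
figures and the avg_kvs_per_row parameter are all made up for illustration;
the last one is an assumption not stated in the thread.

```python
def estimate_row_count(kvs_per_region, avg_kvs_per_row):
    """Estimate rows from per-region KeyValue counts.

    kvs_per_region:  dict of region name -> reported KV count
    avg_kvs_per_row: assumed average number of KVs stored per row
    """
    if avg_kvs_per_row <= 0:
        raise ValueError("avg_kvs_per_row must be positive")
    total_kvs = sum(kvs_per_region.values())
    return round(total_kvs / avg_kvs_per_row)


# Made-up figures, as if copied from the region servers' web UI.
reported = {
    "region-a": 1_200_000,
    "region-b": 950_000,
    "region-c": 1_050_000,
}
estimate = estimate_row_count(reported, avg_kvs_per_row=4)
print(estimate)
```

This is only as good as the assumed KVs-per-row figure, so it suits tables
whose rows all have roughly the same number of columns.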