HBase >> mail # dev >> What is the best way to get the total row count?

Re: What is the best way to get the total row count?
One more option I would suggest: while putting the data into Hadoop, just
maintain a count somewhere.
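
To make the idea concrete, here is a minimal sketch of keeping a count alongside the writes. It is an in-memory stand-in, not HBase client code: the class CountingStore and its methods are hypothetical names made up for illustration. One assumption worth stating: an HBase put overwrites an existing row silently, so the count is only exact if the writer knows whether the row key is new. The sketch checks that explicitly.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory stand-in for a table plus a separately kept row counter. */
final class CountingStore {
    private final ConcurrentMap<String, String> rows = new ConcurrentHashMap<>();
    private final AtomicLong rowCount = new AtomicLong();

    /** Writes a row and bumps the counter only when the row key was not present before. */
    void put(String rowKey, String value) {
        if (rows.put(rowKey, value) == null) {
            rowCount.incrementAndGet();
        }
    }

    /** Removes a row and lowers the counter only when the row actually existed. */
    void delete(String rowKey) {
        if (rows.remove(rowKey) != null) {
            rowCount.decrementAndGet();
        }
    }

    /** Returns the maintained count without scanning any rows. */
    long rowCount() {
        return rowCount.get();
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        store.put("row1", "a");
        store.put("row2", "b");
        store.put("row1", "c"); // overwrite: must not be counted twice
        store.delete("row2");
        System.out.println(store.rowCount()); // prints 1
    }
}
```

In a real table the counter would have to live somewhere durable, for example in a dedicated counter cell updated with an atomic increment, and the "is this row new" check has a cost of its own. If each row key is written exactly once, the check can be dropped.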

On Mon, Jul 9, 2012 at 11:25 PM, shashwat shriparv <
[EMAIL PROTECTED]> wrote:

> Count the number of rows in a table. This operation may take a LONG
> time (Run '$HADOOP_HOME/bin/hadoop jar hbase.jar rowcount' to run a
> counting mapreduce job). Current count is shown every 1000 rows by
> default. Count interval may be optionally specified. Examples:
>
>     hbase> count 't1'
>     hbase> count 't1', 100000
>
>
> On Mon, Jul 9, 2012 at 11:01 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>
>> If you need the exact count the best way is still RowCounter, maybe
>> set a bigger scanner caching?
>>
>> Another option that works if you only need an estimate is using the
>> reported number of KVs per region and then summing them up. Look at
>> any of your region servers' web ui and on the right you'll see the
>> count per region.
>>
>> J-D
>>
>> On Mon, Jul 9, 2012 at 3:41 AM, Gopinathan A <[EMAIL PROTECTED]>
>> wrote:
>> > Hi,
>> >
>> >
>> >
>> > What is the best way to get the total row count?
>> >
>> >
>> >
>> > I tried the following:
>> >
>> > a>     count 'tablename' in the shell prompt: helpful only with a very
>> > small number of records.
>> >
>> > b>     Running the RowCounter job: it took almost 8 hours to get the row
>> > count of 2 TB of data on a 3-node cluster (16-core system, 48 GB RAM).
>> >
>> > c>     Using AggregationClient: disk I/O is very high (system wait is
>> > 65-70%, load factor is almost 110), which makes the servers unresponsive
>> > and brings the clients down (due to RPC timeout exceptions).
>> >
>> > Thanks & Regards,
>> >
>> > Gopinathan A
>>
>
>
>
> --
>
>
> ∞
> Shashwat Shriparv
>
>
>
--

Shashwat Shriparv