HBase >> mail # user >> Speeding up the row count


Re: Speeding up the row count
Hi,

You might want to take a look at the RowCounter MapReduce job:

http://hbase.apache.org/book/ops_mgt.html#rowcounter

JM
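RowCounter is a MapReduce job, so each region is scanned by its own map task and the per-task counts are summed through a job counter. That replaces one client walking all 1000851 rows in sequence. The linked section describes invoking it as `hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>`. The sketch below is only a plain-JDK simulation of that fan-out/sum pattern. It assumes hypothetical in-memory "regions" (lists of row keys), and the class and method names are illustrative. It uses no HBase or Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRowCountSketch {

    /**
     * Counts rows region by region on a thread pool and sums the partial
     * counts. Each inner list stands in for one region's row keys
     * (hypothetical in-memory data, not a real HBase region).
     */
    static long countRows(List<List<String>> regions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<String> region : regions) {
                // One task per region, analogous to one map task per region split.
                partials.add(pool.submit(() -> {
                    long n = 0;
                    for (String ignored : region) {
                        n++; // every row key seen in the region is one row
                    }
                    return n;
                }));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                try {
                    total += partial.get(); // add this region's partial count
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while counting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("region count failed", e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> regions = List.of(
                List.of("P0001", "P0002", "P0003"),
                List.of("P0004", "P0005"),
                List.of("P0006"));
        System.out.println("row count is " + countRows(regions, 3)); // prints "row count is 6"
    }
}
```

The design point is that the total only becomes available once the slowest partition finishes. Splitting a large table across more regions therefore spreads the scan work, while a table held in a single region gains nothing from the fan-out.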
2013/4/17 Omkar Joshi <[EMAIL PROTECTED]>

> Hi,
>
> I have two tables: CUSTOMERS (60000+ rows) and PRODUCTS (1000851 rows).
>
> The table structures are:
>
> CUSTOMERS
>     rowkey:        CUSTOMER_ID
>     column family: CUSTOMER_INFO
>     columns:       NAME, EMAIL, ADDRESS, MOBILE
>
> PRODUCTS
>     rowkey:        PRODUCT_ID
>     column family: PRODUCT_INFO
>     columns:       NAME, CATEGORY, GROUP, COMPANY, COST, COLOR
>
> I'm trying to get the row count for each table using the following snippet:
> .
> .
> .
> hbaseCRUD.getTableCount(args[1], "CUSTOMER_INFO","NAME");
> .
> .
> hbaseCRUD.getTableCount(args[1], "PRODUCT_INFO","NAME");
>
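> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public long getTableCount(String tableName, String columnFamilyName,
>         String columnName) {
>     // config is the HBase Configuration held by the enclosing class
>     AggregationClient aggregationClient = new AggregationClient(config);
>     Scan scan = new Scan();
>     scan.addFamily(Bytes.toBytes(columnFamilyName));
>     if (columnName != null && !columnName.isEmpty()) {
>         // Narrow the scan to one column; rows that lack this column
>         // return no cells and are not counted.
>         scan.addColumn(Bytes.toBytes(columnFamilyName),
>                 Bytes.toBytes(columnName));
>     }
>
>     long rowCount = 0;
>     try {
>         // Requires the AggregateImplementation endpoint on the table;
>         // each region counts its rows and the client sums the results.
>         rowCount = aggregationClient.rowCount(Bytes.toBytes(tableName),
>                 null, scan);
>     } catch (Throwable e) {
>         // rowCount() is declared to throw Throwable; on failure this
>         // method prints the error and returns 0.
>         e.printStackTrace();
>     }
>     System.out.println("row count is " + rowCount);
>
>     return rowCount;
> }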
>
> For CUSTOMERS the response time is acceptable, but for PRODUCTS the call
> times out (even count in the shell reports "1000851 row(s) in 258.9220 seconds").
>
> What needs to be done to get a quick response? Is there an approach other
> than AggregationClient, or a way to tweak the Scan in the above snippet?
>
> Regards,
> Omkar Joshi
>
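One likely contributor to the 258.9220 seconds reported for the shell count is scanner caching. The shell's help for `count` in HBase releases of that period says the default cache size is 10 rows, and it suggests raising it for small rows, e.g. `count 'PRODUCTS', CACHE => 1000`. The sketch below is a rough estimate under stated assumptions. It assumes the default of 10 was in effect and ignores scanner open/close calls and region boundaries, so the number of scanner round trips is approximately the row count divided by the caching value, rounded up. The class and method names are hypothetical; only the row count and timing come from the thread.

```java
public class ScanRoundTripEstimate {

    /** Approximate number of scanner next() round trips: ceil(rows / caching). */
    static long roundTrips(long rows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        return (rows + caching - 1) / caching; // integer ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_851L;    // PRODUCTS row count from the thread
        double seconds = 258.9220; // shell count time from the thread

        System.out.printf("rows/s: %.0f%n", rows / seconds);
        for (int caching : new int[] {10, 100, 1000}) {
            System.out.printf("caching=%d -> ~%d round trips%n",
                    caching, roundTrips(rows, caching));
        }
        // Implied average cost per round trip, assuming caching = 10 applied.
        System.out.printf("ms per round trip at caching=10: %.2f%n",
                seconds * 1000 / roundTrips(rows, 10));
    }
}
```

This works out to roughly 3865 rows/s, with about 100086, 10009 and 1001 round trips at caching 10, 100 and 1000. If per-call latency dominates, raising the caching value cuts the number of calls by the same factor. For scans issued directly from a Java client, the corresponding setting is `Scan.setCaching(int)`.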