HBase, mail # user - Speeding up the row count


Omkar Joshi 2013-04-17, 09:47
Re: Speeding up the row count
Jean-Marc Spaggiari 2013-04-17, 11:06
Hi,

You might want to take a look at RowCounter:

http://hbase.apache.org/book/ops_mgt.html#rowcounter

JM
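In its default setup, RowCounter runs as a MapReduce job with one map task per region. AggregationClient likewise sends one coprocessor call per region and adds up the partial results. Both spread the count across the cluster instead of pulling every row through a single client. Below is a minimal sketch of that split-and-sum idea in plain Java. It assumes an in-memory sorted key set stands in for the table, and the region start keys are hypothetical. No HBase API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RegionParallelCount {

    // Counts the row keys in [startKey, endKey); a null endKey means "up to the end of the table".
    public static long countRange(NavigableSet<String> rows, String startKey, String endKey) {
        NavigableSet<String> slice = (endKey == null)
                ? rows.tailSet(startKey, true)
                : rows.subSet(startKey, true, endKey, false);
        return slice.size();
    }

    // Submits one counting task per hypothetical region and sums the partial counts.
    public static long parallelCount(NavigableSet<String> rows, List<String> regionStartKeys,
                                     int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int i = 0; i < regionStartKeys.size(); i++) {
                String start = regionStartKeys.get(i);
                String end = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : null;
                partials.add(pool.submit(() -> countRange(rows, start, end)));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                try {
                    total += partial.get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("a partial count failed", e);
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(String.format("PRODUCT_%05d", i));
        }
        // Hypothetical region boundaries; the first region starts at the empty key.
        List<String> regionStartKeys = List.of("", "PRODUCT_00250", "PRODUCT_00500", "PRODUCT_00750");
        System.out.println("row count is " + parallelCount(rows, regionStartKeys, 4));
    }
}
```

With a real table, the partial counts would be computed on the region servers. The thread pool here only mimics that fan-out, so the speedup depends on how many regions the table has and how they are spread over the servers.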
2013/4/17 Omkar Joshi <[EMAIL PROTECTED]>

> Hi,
>
> I have two tables: CUSTOMERS (60000+ rows) and PRODUCTS (1000851 rows).
>
> The table structures are:
>
> CUSTOMERS
>     rowkey        : CUSTOMER_ID
>     column family : CUSTOMER_INFO
>     columns       : NAME, EMAIL, ADDRESS, MOBILE
>
> PRODUCTS
>     rowkey        : PRODUCT_ID
>     column family : PRODUCT_INFO
>     columns       : NAME, CATEGORY, GROUP, COMPANY, COST, COLOR
>
> I'm trying to get the row count for each table using the following snippet:
> .
> .
> .
> hbaseCRUD.getTableCount(args[1], "CUSTOMER_INFO","NAME");
> .
> .
> hbaseCRUD.getTableCount(args[1], "PRODUCT_INFO","NAME");
>
> public long getTableCount(String tableName, String columnFamilyName,
>         String columnName) {
>     AggregationClient aggregationClient = new AggregationClient(config);
>     Scan scan = new Scan();
>     scan.addFamily(Bytes.toBytes(columnFamilyName));
>     if (columnName != null && !columnName.isEmpty()) {
>         // Narrow the scan to a single column of the family
>         scan.addColumn(Bytes.toBytes(columnFamilyName),
>                 Bytes.toBytes(columnName));
>     }
>
>     long rowCount = 0;
>     try {
>         // Counts rows region by region via the aggregation coprocessor
>         rowCount = aggregationClient.rowCount(Bytes.toBytes(tableName),
>                 null, scan);
>     } catch (Throwable e) {
>         // rowCount() declares Throwable; print it and return 0
>         e.printStackTrace();
>     }
>     System.out.println("row count is " + rowCount);
>
>     return rowCount;
> }
>
> For CUSTOMERS the response time is acceptable, but for PRODUCTS the call
> times out (even in the shell, count reports "1000851 row(s) in 258.9220 seconds").
>
> What can be done to get a faster response? Is there an approach other than
> AggregationClient, or a way to tweak the Scan in the above snippet?
>
> Regards,
> Omkar Joshi
>
>
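One Scan-side tweak is to cut how many cells each row sends back. A count only needs to see one cell per row. Restricting the scan to a single column, as getTableCount already does with NAME, achieves that. HBase's FirstKeyOnlyFilter does too, since it returns only the first cell of each row. The sketch below simulates that effect in plain Java. It uses a hypothetical Cell class and PRODUCTS-like rows with six columns, and it is not HBase's filter implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstKeyOnlyCount {

    // One cell (row key, column qualifier, value) of a simulated table, sorted by row then column.
    public static final class Cell {
        public final String row;
        public final String column;
        public final String value;

        public Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    // Simulated "server side": keep only the first cell of each row, so one cell per row is shipped.
    public static List<Cell> firstCellPerRow(List<Cell> sortedCells) {
        List<Cell> out = new ArrayList<>();
        String previousRow = null;
        for (Cell c : sortedCells) {
            if (!c.row.equals(previousRow)) {
                out.add(c);
                previousRow = c.row;
            }
        }
        return out;
    }

    // Simulated "client side": with one cell per row, the row count is the number of cells received.
    public static long countRows(List<Cell> sortedCells) {
        return firstCellPerRow(sortedCells).size();
    }

    public static void main(String[] args) {
        String[] columns = {"CATEGORY", "COLOR", "COMPANY", "COST", "GROUP", "NAME"};
        List<Cell> cells = new ArrayList<>();
        for (int r = 0; r < 3; r++) {
            for (String col : columns) {
                cells.add(new Cell("PRODUCT_" + r, col, "v"));
            }
        }
        // Prints: 18 cells scanned, 3 returned, row count 3
        System.out.println(cells.size() + " cells scanned, "
                + firstCellPerRow(cells).size() + " returned, row count " + countRows(cells));
    }
}
```

Because the snippet in the question already narrows the scan to NAME, this tweak mainly matters for counts that scan the whole family. Spreading the work across regions is a separate lever.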
Vedad Kirlic 2013-04-17, 18:52
Omkar Joshi 2013-04-19, 07:33
Ted Yu 2013-04-19, 09:29
Omkar Joshi 2013-04-19, 09:55
Ted Yu 2013-04-19, 13:55
Omkar Joshi 2013-04-22, 06:39
lars hofhansl 2013-04-19, 18:10
James Taylor 2013-04-19, 15:59