HBase >> mail # user >> full table scan

Re: full table scan
How many regions does your table have?
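For context on why the region count matters: TableInputFormat creates one input split, and therefore one map task, per region of the table. A table that has never split has a single region, so the MapReduce job runs a single mapper and scans about as fast as a single client would. The sketch below is plain Java with no HBase dependency. The class name, method name and row keys are hypothetical, and it only mimics that one-split-per-region rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: one key range ("split") per region,
// mirroring how TableInputFormat assigns one map task per region.
public class RegionSplits {

    // A half-open key range [startKey, endKey); "" means unbounded.
    public record KeyRange(String startKey, String endKey) {}

    // regionStartKeys must be sorted; the first region always starts at "".
    public static List<KeyRange> splitsFor(List<String> regionStartKeys) {
        List<KeyRange> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            // The last region extends to the end of the table.
            String end = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : "";
            splits.add(new KeyRange(start, end));
        }
        return splits;
    }

    public static void main(String[] args) {
        // A table that has never split: one region, one map task.
        System.out.println(splitsFor(List.of("")).size() + " map task(s)"); // prints "1 map task(s)"
        // The same table spread over four regions (hypothetical boundaries).
        System.out.println(splitsFor(List.of("", "row075000", "row150000", "row225000")).size()
                + " map task(s)"); // prints "4 map task(s)"
    }
}
```

Having three region servers does not help on its own: if all 300,000 rows sit in one region, only the server hosting that region does the scanning.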

On Mon, Jun 6, 2011 at 4:48 AM, Andreas Reiter <[EMAIL PROTECTED]> wrote:
> Hello everybody,
>
> I'm trying to scan my HBase table for reporting purposes.
> The cluster has 4 servers:
>  - server1: namenode, secondary namenode, jobtracker, hbase master, zookeeper1
>  - server2: datanode, tasktracker, hbase regionserver, zookeeper2
>  - server3: datanode, tasktracker, hbase regionserver, zookeeper3
>  - server4: datanode, tasktracker, hbase regionserver
> Everything seems to work properly.
> Versions:
>  - hadoop-0.20.2-CDH3B4
>  - hbase-0.90.1-CDH3B4
>  - zookeeper-3.3.2-CDH3B4
>
>
> At the moment, our HBase table has 300,000 entries.
>
> If I do a table scan through the HBase API (currently without a filter),
> ResultScanner scanner = table.getScanner(...);
>
> it takes about 60 seconds to process, which is actually okay, because all
> records are processed by only one thread, sequentially.
> BUT it takes approximately the same time if I do the scan in a MapReduce
> job using TableInputFormat.
>
> I'm definitely doing something wrong, because the processing time goes up
> in direct proportion to the number of rows.
> In my understanding, the big advantage of Hadoop/HBase is that huge numbers
> of entries can be processed in parallel, and very fast.
>
> 300k entries are not much; we expect this many to be added to our cluster
> every hour, but the processing time keeps increasing, which is not
> acceptable.
>
> Does anyone have an idea what I'm doing wrong?
>
> Best regards,
> Andre
>
>
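The expectation in the question, that a MapReduce scan runs in parallel, only holds when the rows are spread over several regions, because each region becomes one task. A minimal in-memory sketch of that idea, assuming no HBase at all: a TreeMap stands in for the sorted table, the region boundaries are hypothetical, and an ExecutorService runs one scanning task per region, roughly the way TableInputFormat runs one mapper per region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerRegionScan {

    // Counts the rows of an in-memory "table", scanning each region in its own task.
    // regionStartKeys must be sorted, non-empty, and start with "" (the first region).
    public static long countRows(NavigableMap<String, String> table, List<String> regionStartKeys) {
        ExecutorService pool = Executors.newFixedThreadPool(regionStartKeys.size());
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < regionStartKeys.size(); i++) {
                String start = regionStartKeys.get(i);
                boolean last = i + 1 == regionStartKeys.size();
                // Rows in [start, nextStart), or [start, end of table) for the last region.
                NavigableMap<String, String> region = last
                        ? table.tailMap(start, true)
                        : table.subMap(start, true, regionStartKeys.get(i + 1), false);
                // Each task plays the role of one map task scanning one region row by row.
                results.add(pool.submit(() -> {
                    long rows = 0;
                    for (String ignored : region.keySet()) {
                        rows++;
                    }
                    return rows;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) {
                try {
                    total += f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while scanning", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("region scan failed", e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 300_000; i++) {
            table.put(String.format("row%06d", i), "value" + i);
        }
        // One region: a single task does all the work, just like a sequential scan.
        System.out.println(countRows(table, List.of(""))); // prints 300000
        // Four regions: four tasks can scan their key ranges concurrently.
        System.out.println(countRows(table,
                List.of("", "row075000", "row150000", "row225000"))); // prints 300000
    }
}
```

With a single region the pool has one thread and the work is effectively sequential, which would be consistent with the MapReduce job taking about as long as the single-threaded client. More regions, whether from the table splitting as it grows or from pre-splitting it when it is created, are what give the job more tasks to run in parallel.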

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434