HBase, mail # user - full table scan


Re: full table scan
Joey Echeverria 2011-06-06, 13:10
How many regions does your table have?
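
The region count matters because TableInputFormat creates one input split, and therefore one map task, per region. A table that still fits in a single region is scanned by a single mapper, so the job cannot be faster than the client-side scan. Below is a rough back-of-the-envelope sketch of that relationship, using the 300000 rows and 60 seconds from the original message. The class and method names, the even row distribution, and the figure of 6 map slots are all assumptions for illustration, and job startup overhead is ignored.

```java
public class ScanParallelism {
    // TableInputFormat yields one input split (one map task) per region.
    static int mapTasks(int regions) {
        return regions;
    }

    // Rough scan-time estimate in seconds. Assumes rows are spread evenly
    // across regions and each map task scans at the single-client rate.
    static double estimateSeconds(long rows, double rowsPerSecond,
                                  int regions, int mapSlots) {
        int tasks = mapTasks(regions);
        int waves = (tasks + mapSlots - 1) / mapSlots; // ceil(tasks / mapSlots)
        double rowsPerTask = (double) rows / tasks;
        return waves * rowsPerTask / rowsPerSecond;
    }

    public static void main(String[] args) {
        long rows = 300000L;          // table size from the original message
        double rate = rows / 60.0;    // observed single-threaded scan rate
        int mapSlots = 6;             // hypothetical: 3 tasktrackers x 2 slots
        for (int regions : new int[] {1, 3, 6, 12}) {
            System.out.printf("regions=%d -> ~%.0f s%n",
                    regions, estimateSeconds(rows, rate, regions, mapSlots));
        }
    }
}
```

With one region the estimate equals the 60 seconds of the sequential scan; the estimate only drops once the table has split into several regions spread over the regionservers.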

On Mon, Jun 6, 2011 at 4:48 AM, Andreas Reiter <[EMAIL PROTECTED]> wrote:
> hello everybody
>
> i'm trying to scan my hbase table for reporting purposes
> the cluster has 4 servers:
>  - server1: namenode, secondary namenode, jobtracker, hbase master,
> zookeeper1
>  - server2: datanode, tasktracker, hbase regionserver, zookeeper2
>  - server3: datanode, tasktracker, hbase regionserver, zookeeper3
>  - server4: datanode, tasktracker, hbase regionserver
> everything seems to work properly
> versions:
>  - hadoop-0.20.2-CDH3B4
>  - hbase-0.90.1-CDH3B4
>  - zookeeper-3.3.2-CDH3B4
>
>
> at the moment our hbase table has 300000 entries
>
> if i do a table scan through the hbase api (at the moment without a filter):
> ResultScanner scanner = table.getScanner(...);
>
> it takes about 60 seconds to process, which is actually okay, because all
> records are processed sequentially by only one thread
> BUT it takes approximately the same time if i do the scan in a MapReduce job
> using TableInputFormat
>
> i'm definitely doing something wrong, because the processing time goes
> up in direct proportion to the number of rows.
> in my understanding, the big advantage of hadoop/hbase is that huge numbers
> of entries can be processed in parallel and very fast
>
> 300k entries are not much; we expect this number to be added hourly to
> our cluster, but the processing time keeps increasing, which is not
> acceptable
>
> does anyone have an idea what i'm doing wrong?
>
> best regards
> andre
>
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434