HBase >> mail # dev >> Scan performance on a big table as combination of multiple logic tables


Re: Scan performance on a big table as combination of multiple logic tables
Out of curiosity, what do you perceive as the benefit of having only one
table?  Are there reasons that you think one table would perform better
than a few?

If you're splitting data within a table because you'd otherwise have
millions of tables, I understand that and would concur with Vladimir's
approach below.  However, if you're really looking at 10 tables versus one
table, HBase seems built to make exactly that work well, so you wouldn't
have to write all sorts of application-level code to do what HBase
already does.

thanks,
Jacques

On Wed, Feb 15, 2012 at 1:57 PM, Pan, Thomas <[EMAIL PROTECTED]> wrote:

>
> Since HBase is tailored to handle one table very well, we are thinking of
> putting multiple tables into one big table but in different column family
> sets. Our use case is a full table scan with single column value filters.
> As records from different "logical tables" are in different column
> families, could we speed up the scan by first checking the column family
> referenced by these single column value filters before really going
> through all the underlying K-V pairs? It would be great if the HBase code
> is already coded that way.
>
>
> $0.02,
> Thomas
>
>
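
To make the question above concrete, here is a minimal sketch of the idea in Python. It is a toy in-memory model, not HBase code: the table layout, the function name scan_with_value_filter, and the sample rows are all assumptions made up for illustration. It only shows why restricting a scan to the column family named by the value filter touches fewer cells than walking every family. In the real Java client the closest analogue is probably adding the family to the Scan (Scan.addFamily) alongside a SingleColumnValueFilter, but whether HBase skips other families on its own is exactly what the thread is asking, and this sketch does not answer that.

```python
from typing import Dict, List, Tuple

# Toy model (assumption): row key -> column family -> qualifier -> value.
Row = Dict[str, Dict[str, str]]
Table = Dict[str, Row]

# Two hypothetical "logical tables" sharing one physical table,
# each stored in its own column family.
EXAMPLE_TABLE: Table = {
    "order#1": {"orders": {"status": "open", "total": "10"}},
    "order#2": {"orders": {"status": "closed", "total": "25"}},
    "user#1": {"users": {"name": "ann", "status": "open"}},
    "user#2": {"users": {"name": "bob", "status": "closed"}},
}


def scan_with_value_filter(
    table: Table,
    family: str,
    qualifier: str,
    value: str,
    restrict_family: bool,
) -> Tuple[List[str], int]:
    """Full scan with a single-column value filter.

    Returns (matching row keys, number of cells examined).
    With restrict_family=True only the filter's family is read per row;
    otherwise every family of every row is walked.
    """
    matches: List[str] = []
    cells_examined = 0
    for row_key in sorted(table):
        row = table[row_key]
        families = [family] if restrict_family else sorted(row)
        hit = False
        for fam in families:
            for qual, val in row.get(fam, {}).items():
                cells_examined += 1
                if fam == family and qual == qualifier and val == value:
                    hit = True
        if hit:
            matches.append(row_key)
    return matches, cells_examined


if __name__ == "__main__":
    for restrict in (False, True):
        rows, cells = scan_with_value_filter(
            EXAMPLE_TABLE, "orders", "status", "open", restrict
        )
        print(f"restrict_family={restrict}: rows={rows}, cells examined={cells}")
```

Both calls return the same matching rows; only the number of cells examined differs. In this toy model the saving comes purely from never reading the families that belong to the other logical table.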