HBase >> mail # user >> more tables or more rows


Re: more tables or more rows
Hello sir,

    It is absolutely fine to have as many tables as we like. My point
was that if we have a large number of tables, it might add some
overhead in locating the user regions, as there will be a huge amount
of mapping from "user tables" to "region servers". The client will
also have to cache more of that mapping information, consuming
additional memory. So, I suggested having a small number of large
tables rather than a large number of small tables, if the data is
similar.
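
To make that concrete, here is a minimal sketch of the single-table
layout, assuming a hypothetical table "events" with one column family
"d" and row keys of the form <sourceId>|<recordKey>. It is written
against the newer HBase Java client (Connection/Table) rather than the
0.94-era HTable API, so adjust the calls for older versions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleTableWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("events"))) {

            // One table serves all 30+ sources; the row key encodes the source:
            //   <sourceId>|<recordKey>
            String sourceId = "source_07";          // hypothetical source name
            String recordKey = "2012-08-07T12:00";  // hypothetical per-record key
            byte[] rowKey = Bytes.toBytes(sourceId + "|" + recordKey);

            Put put = new Put(rowKey);
            put.addColumn(Bytes.toBytes("d"),        // column family (assumed)
                          Bytes.toBytes("payload"),  // qualifier (assumed)
                          Bytes.toBytes("some value"));
            table.put(put);
        }
    }
}

With this layout all 30+ sources share one set of regions, so the
table-to-region mapping and the client's region cache stay small.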

Regards,
    Mohammad Tariq
On Tue, Aug 7, 2012 at 5:30 PM, Eric Czech <[EMAIL PROTECTED]> wrote:
> Thanks Mohammad,
>
> By saying the major purpose is to host very large tables (implying a
> smaller number of them), are you referring to anything other than the
> memstores per column family taking up sizable portions of physical memory?
>  Are there other components or design aspects that make using large numbers
> of tables inadvisable?
>
> On Sun, Aug 5, 2012 at 5:55 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>> Hello sir,
>>
>>       Going for a single table covering all 30+ sources would be a better
>> choice, if the data from all the sources is not very different. Since you
>> are considering HBase as your data store, it wouldn't be wise to have
>> several small tables. The major purpose of HBase is to host very large
>> tables that may go beyond billions of rows and millions of columns.
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>> On Mon, Aug 6, 2012 at 3:18 AM, Eric Czech <[EMAIL PROTECTED]> wrote:
>>> I need to support data that comes from 30+ sources and the structure
>>> of that data is consistent across all the sources, but what I'm not
>>> clear on is whether or not I should use 30+ tables with roughly the
>>> same format or 1 table where the row key reflects the source.
>>>
>>> Anybody have a strong argument one way or the other?
>>>
>>> Thanks!
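
On the read side of the single-table option, pulling back one source's
data is just a prefix scan over the shared table. A minimal sketch,
reusing the hypothetical "events" table and the "source_07|" key prefix
from the write example above (again with the newer Connection/Table
client API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleSourceScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("events"))) {

            // All rows for one source share the "source_07|" prefix, so the
            // scan only touches the part of the key space holding that source.
            Scan scan = new Scan();
            scan.setRowPrefixFilter(Bytes.toBytes("source_07|"));

            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}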