Hive >> mail # dev >> Looking at the columns table


Edward Capriolo 2012-04-11, 14:53
Re: Looking at the columns table
Hey Ed,

Your thinking is correct and has been implemented in
https://issues.apache.org/jira/browse/HIVE-2246

Time to upgrade to 0.8 :)

Thanks,
Ashutosh

On Wed, Apr 11, 2012 at 07:53, Edward Capriolo <[EMAIL PROTECTED]> wrote:

> Hey all. Our metastore in MySQL is fairly large, over 12GB. All of that
> storage is in the columns table. It seems that each column is stored
> once for each partition/storage descriptor, as a one-to-many relationship.
>
> In our case all the partitions have the same column definition. My
> thinking: should the relationship from columns to partition/storage
> descriptor be many-to-many? That way we store each column only once,
> and the current columns table can reference the primary key of that
> column. This should bring the size of this table down
> drastically.
>
> Since every other table in the metastore is so small, this huge columns
> table looks like the only scalability choke point we have.
>
> Edward
>
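
To make the size argument in the quoted message concrete, here is a rough sketch in Python that compares the two layouts by counting column rows. It is only an illustration under stated assumptions: the function names, the partition names, and the three-column schema are all hypothetical, and this is not the actual metastore schema or the change made in HIVE-2246.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (column name, column type); illustrative only


def rows_per_descriptor(partitions: Dict[str, List[Column]]) -> int:
    """Layout described in the original message: every partition/storage
    descriptor stores its own copy of each column."""
    return sum(len(cols) for cols in partitions.values())


def rows_shared(partitions: Dict[str, List[Column]]) -> Tuple[int, Dict[str, int]]:
    """Proposed layout: each distinct column list is stored once, and each
    descriptor only keeps a reference (an integer id) to it."""
    column_sets: Dict[Tuple[Column, ...], int] = {}
    descriptor_to_set: Dict[str, int] = {}
    for name, cols in partitions.items():
        key = tuple(cols)
        # Reuse the id if this exact column list was seen before,
        # otherwise assign the next free id.
        set_id = column_sets.setdefault(key, len(column_sets))
        descriptor_to_set[name] = set_id
    column_rows = sum(len(key) for key in column_sets)
    return column_rows, descriptor_to_set


# Hypothetical table: 1000 partitions that all share one 3-column schema.
schema: List[Column] = [("id", "bigint"), ("ts", "string"), ("payload", "string")]
partitions = {f"dt=day{i:04d}": list(schema) for i in range(1000)}

copied = rows_per_descriptor(partitions)
shared, refs = rows_shared(partitions)
print(copied, shared)
```

With identical column definitions across partitions, the first count grows with the number of partitions while the second stays at the size of a single schema, which is the reduction the original message is after. Partitions whose schema differs would simply get their own column-list id.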