Hive >> mail # dev >> Looking at the columns table


Re: Looking at the columns table
Hey Ed,

Your thinking is correct and has been implemented in
https://issues.apache.org/jira/browse/HIVE-2246

Time to upgrade to 0.8 :)

Thanks,
Ashutosh

On Wed, Apr 11, 2012 at 07:53, Edward Capriolo <[EMAIL PROTECTED]> wrote:

> Hey all. Our metastore in MySQL is fairly large, over 12GB. All the
> storage is in the columns table. It seems that each column is stored
> once per partition/storage descriptor, as a one-to-many relationship.
>
> In our case all the partitions have the same column definition. My
> thinking: should the relationship from columns to partition/storage
> descriptor be many-to-many? That way we store each column only once,
> and the current columns table can reference the primary key of that
> column. This should bring the size of this table down drastically.
>
> Since every other table in the metastore is so small, this huge columns
> table looks like the only scalability choke point we have.
>
> Edward
>
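
To make the size argument in Edward's message concrete, here is a minimal sketch in Python with an in-memory SQLite database. It compares a layout that copies every column row for each storage descriptor against one where storage descriptors point at a shared column list. All table and function names (columns_per_sd, column_sets, set_columns, sds, build_metastores, row_counts) are hypothetical and chosen for illustration only; this is not the actual Hive metastore schema, nor necessarily how HIVE-2246 implements the change.

```python
import sqlite3


def build_metastores(num_partitions, columns):
    """Populate both hypothetical layouts for one table whose partitions
    all share the same column definition."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Layout A (duplicated): each storage descriptor owns its own column rows.
    cur.execute(
        "CREATE TABLE columns_per_sd (sd_id INTEGER, name TEXT, type TEXT, idx INTEGER)"
    )

    # Layout B (shared): a column list is stored once and referenced by id.
    cur.execute("CREATE TABLE column_sets (set_id INTEGER PRIMARY KEY)")
    cur.execute(
        "CREATE TABLE set_columns (set_id INTEGER, name TEXT, type TEXT, idx INTEGER)"
    )
    cur.execute("CREATE TABLE sds (sd_id INTEGER PRIMARY KEY, set_id INTEGER)")

    # The single shared column list for layout B.
    cur.execute("INSERT INTO column_sets (set_id) VALUES (1)")
    cur.executemany(
        "INSERT INTO set_columns VALUES (1, ?, ?, ?)",
        [(name, ctype, i) for i, (name, ctype) in enumerate(columns)],
    )

    for sd_id in range(num_partitions):
        # Layout A repeats every column row for every partition.
        cur.executemany(
            "INSERT INTO columns_per_sd VALUES (?, ?, ?, ?)",
            [(sd_id, name, ctype, i) for i, (name, ctype) in enumerate(columns)],
        )
        # Layout B stores only a reference to the shared column list.
        cur.execute("INSERT INTO sds VALUES (?, ?)", (sd_id, 1))

    conn.commit()
    return conn


def row_counts(conn):
    """Return (column rows in layout A, column rows in layout B)."""
    cur = conn.cursor()
    duplicated = cur.execute("SELECT COUNT(*) FROM columns_per_sd").fetchone()[0]
    shared = cur.execute("SELECT COUNT(*) FROM set_columns").fetchone()[0]
    return duplicated, shared


def columns_for_sd(conn, sd_id):
    """Look up a partition's columns through the shared list (layout B)."""
    rows = conn.execute(
        "SELECT c.name, c.type FROM sds s "
        "JOIN set_columns c ON c.set_id = s.set_id "
        "WHERE s.sd_id = ? ORDER BY c.idx",
        (sd_id,),
    ).fetchall()
    return rows
```

In this sketch, column storage in the duplicated layout grows with partitions times columns, while the shared layout grows with the number of distinct column lists plus one small reference row per storage descriptor. A partition's columns are still recoverable with a single join, as columns_for_sd shows.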