Hive >> mail # user >> Creating partition with __HIVE_DEFAULT_PARTITION__ value


Re: Creating partition with __HIVE_DEFAULT_PARTITION__ value
So MSCK REPAIR TABLE plus dynamic-partitioning semantics looks like it fits
the bill for you.
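A rough sketch of that staging-table approach (all table and column names here are hypothetical, and it assumes dynamic partitioning is enabled):

```sql
-- tmp_events is the distcp'd staging table, events is the real target,
-- dt is the partition column (all hypothetical names).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Discover the staging table's partitions from the directories distcp created.
MSCK REPAIR TABLE tmp_events;

-- Copy into the target, letting Hive derive each row's partition from dt.
INSERT OVERWRITE TABLE events PARTITION (dt)
SELECT * FROM tmp_events;
```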

Yeah, 300K partitions. That's getting up there on the scale of things with
Hive, I'd say, and close to over-partitioning. For archival purposes maybe
older data doesn't need such a fine-grained partition? Something to think
about anyway.
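For example, one way to coarsen old data might look like this (a hedged sketch with hypothetical names; it assumes a daily string partition dt and a separate archive table partitioned by month with a matching column layout):

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- events_archive is a hypothetical table partitioned by month instead of day.
-- substr('2012-11-30', 1, 7) = '2012-11', so each daily partition folds into
-- its month; the dynamic partition column must come last in the SELECT.
INSERT OVERWRITE TABLE events_archive PARTITION (month)
SELECT e.*, substr(e.dt, 1, 7) AS month
FROM events e
WHERE e.dt < '2013-01-01';
```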

anyhoo, glad you found a solution!

On Tue, Sep 24, 2013 at 3:07 AM, Ivan Kruglov <[EMAIL PROTECTED]> wrote:

> Hi everyone,
>
> Thank you for your answers.
>
> On 24.09.2013, at 0:36, Stephen Sprague <[EMAIL PROTECTED]> wrote:
>
> If it's any help, I've done this kind of thing frequently:
>
> 1. create the table on the new cluster.
>
> 2. distcp the data right into the hdfs directory where the table resides
> on the new cluster - no temp storage required.
>
> 3. run this Hive command:   MSCK REPAIR TABLE <table>;   -- this command
> will create your partitions for you - it's pretty slick that way.
>
>
> Let us know how it goes.
>
>
> I was thinking about this case, but it doesn't work well for me. The thing
> is that the table is giant. It's about 200TB and has about 300K partitions,
> so MSCK REPAIR TABLE takes forever to complete, and I would need to run it
> every day (I'm doing incremental-like distcp-ing). However, I can go
> another way: I can distcp data into a temporary table which has a small
> number of partitions, run MSCK against it and then do something like "INSERT
> OVERWRITE TABLE target_table PARTITION(….) SELECT * FROM tmp_table". It
> should work.
>
>
>
> On Mon, Sep 23, 2013 at 10:46 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>
>> Did you try ALTER TABLE table ADD IF NOT EXISTS PARTITION
>> (partition=NULL);
>>
>> If that does not work you will need to create a dynamic partition type
>> query that will create the dummy partition. File a jira if the above syntax
>> does not work. There should be SOME way to create the default partition by
>> hand.
>>
>
> Yes, I've tried it and it doesn't work either. I will file a ticket.
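If the ALTER TABLE syntax can't express it, a dynamic-partition insert of a NULL value should land in the default partition, since Hive maps NULL dynamic partition values to __HIVE_DEFAULT_PARTITION__. A hedged sketch with hypothetical names (and keeping the thread's `partition` as the key name, which may need quoting since it is a keyword):

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Inserting a row whose partition value is NULL forces Hive to create the
-- partition=__HIVE_DEFAULT_PARTITION__ directory for this table.
-- some_table and col1 are hypothetical.
INSERT INTO TABLE some_table PARTITION (partition)
SELECT col1, CAST(NULL AS STRING) AS partition
FROM some_table LIMIT 1;
```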
>
>
>>
>> On Mon, Sep 23, 2013 at 10:48 AM, Ivan Kruglov <[EMAIL PROTECTED]> wrote:
>>
>>> Hello to everyone,
>>>
>>> I'm working on the task of syncing data between two tables which have
>>> similar structure (read the same set of partitions). The tables are in
>>> different data centers and one table is a backup copy of another one. I'm
>>> trying to achieve this goal through distcp-ing data into target DC in
>>> temporary folder, recreating all needed partitions in target table and
>>> moving files from temporary place to final place. But I'm stuck on issue of
>>> creating partitions with value ' __HIVE_DEFAULT_PARTITION__'
>>>
>>> So, my question is: Is it possible in hive to manually create partition
>>> with '__HIVE_DEFAULT_PARTITION__' value?
>>>
>>> None of these ways work:
>>> ALTER TABLE table ADD IF NOT EXISTS PARTITION (partition=);
>>> ALTER TABLE table ADD IF NOT EXISTS PARTITION (partition='');
>>> ALTER TABLE table ADD IF NOT EXISTS PARTITION
>>> (partition='__HIVE_DEFAULT_PARTITION__');
>>>
>>> Thank you.
>>> Ivan Kruglov.
>>
>>
>>
>
>