Hive, mail # user - Automating the partition creation process


Re: Automating the partition creation process
Sadananda Hegde 2013-01-30, 01:49
Thanks, Edward.  I can probably create all previous days' partitions ahead
of time and then use Dean's logic to create new partitions on a daily
basis. I will probably end up having a few empty partitions; I need to make
sure that does not cause any confusion.
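The pre-creation approach can be scripted. Below is a minimal sketch in Python (not from the thread; the `sales` table name and the year/month/day layout are taken from the question later in this thread) that emits one ADD PARTITION statement per day in a date range. `IF NOT EXISTS` makes the statements safe to re-run on reasonably recent Hive versions.

```python
from datetime import date, timedelta

def precreate_partition_ddl(table, start, end):
    """Generate ALTER TABLE ... ADD IF NOT EXISTS PARTITION statements
    for every day in the inclusive range [start, end]."""
    stmts = []
    d = start
    while d <= end:
        stmts.append(
            "ALTER TABLE {t} ADD IF NOT EXISTS PARTITION "
            "(year={y}, month={m}, day={dd});".format(
                t=table, y=d.year, m=d.month, dd=d.day))
        d += timedelta(days=1)
    return stmts

# Emit DDL for three days; pipe the output into `hive -f` or similar.
for stmt in precreate_partition_ddl("sales", date(2013, 1, 21), date(2013, 1, 23)):
    print(stmt)
```

The emitted statements could then be fed to the Hive CLI in one batch, e.g. from a daily cron job that always pre-creates the next N days.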

Thanks,
Sadu

On Tue, Jan 29, 2013 at 7:21 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:

> You can also just create all your partitions ahead of time. They will not
> do any harm if empty. (unless you have an older version and hit this...
> http://issues.apache.org/jira/browse/HIVE-1007 )
>
>
> On Tue, Jan 29, 2013 at 8:17 PM, Mark Grover <[EMAIL PROTECTED]> wrote:
>
>> Hi Sadananda,
>> Sorry to hear that.
>>
>> It got committed, don't worry about the "ABORTED". Here is the commit on
>> the trunk:
>>
>> https://github.com/apache/hive/commit/523f47c3b6e7cb7b6b7b7801c66406e116af6dbc
>>
>> However, there is no Apache Hive release with that patch in it.
>>
>> You have two options:
>> 1. Download the patch, rebuild hive and use it
>> 2. Find a hacky way to recover your partitions when they are empty and
>> populate them later.
>>
>> Sorry for the inconvenience.
>>
>> Mark
>>
>> On Tue, Jan 29, 2013 at 5:09 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks Mark,
>>>
>>> The recover-partitions feature will satisfy my needs, but the MSCK REPAIR
>>> TABLE <tablename> option is not working for me. It does not give any
>>> error, but it does not add any partitions either.  It looks like it adds
>>> partitions only when the sub-folder is empty, not
>>> when the sub-folder has data files. I see a fix for this issue here.
>>>
>>> https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>>
>>> But it's probably not committed yet, since the final result says
>>> "ABORTED".
>>>
>>> Thanks,
>>> Sadu
>>>
>>> On Mon, Jan 28, 2013 at 10:47 PM, Mark Grover <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> Sadananda,
>>>> See if this helps:
>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
>>>>
>>>>
>>>> On Mon, Jan 28, 2013 at 8:05 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> My Hive table is partitioned by year, month and day, and is defined as
>>>>> an external table. The scheduled M/R jobs correctly load the HDFS files
>>>>> into the daily sub-folders, i.e. into
>>>>> <hivetable>/year=yyyy/month=mm/day=dd/ folders.
>>>>> The M/R job has some business logic for determining the values of year,
>>>>> month and day, so one run might create and load files into multiple
>>>>> sub-folders (multiple days). I am able to query the table after adding
>>>>> partitions using ALTER TABLE ADD PARTITION statements. But how do I automate
>>>>> the partition creation step? Basically, a script needs to identify the
>>>>> sub-folders created by the M/R job and create the corresponding ALTER TABLE ADD
>>>>> PARTITION statements.
>>>>>
>>>>> For example, say the M/R job loads files into the following 3
>>>>> sub-folders
>>>>>
>>>>> /user/hive/warehouse/sales/year=2013/month=1/day=21
>>>>> /user/hive/warehouse/sales/year=2013/month=1/day=22
>>>>> /user/hive/warehouse/sales/year=2013/month=1/day=23
>>>>>
>>>>> Then it should create 3 alter table statements
>>>>>
>>>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=21);
>>>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=22);
>>>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=23);
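[The folder-to-DDL mapping described above can be scripted. A minimal sketch in Python, assuming the partition paths are obtained elsewhere, e.g. by parsing `hadoop fs -ls -R` output; the regex and function name are illustrative, not from the thread:]

```python
import re

# Match day-level partition sub-folders like .../year=2013/month=1/day=21
PART_RE = re.compile(r"year=(\d+)/month=(\d+)/day=(\d+)/?$")

def ddl_from_paths(table, paths):
    """Turn partition sub-folder paths produced by the M/R job into the
    corresponding ALTER TABLE ... ADD PARTITION statements."""
    stmts = []
    for p in paths:
        m = PART_RE.search(p)
        if m is None:
            continue  # skip paths that are not day-level partition folders
        y, mo, d = m.groups()
        stmts.append(
            "ALTER TABLE %s ADD PARTITION (year=%s, month=%s, day=%s);"
            % (table, y, mo, d))
    return stmts

paths = [
    "/user/hive/warehouse/sales/year=2013/month=1/day=21",
    "/user/hive/warehouse/sales/year=2013/month=1/day=22",
    "/user/hive/warehouse/sales/year=2013/month=1/day=23",
]
for stmt in ddl_from_paths("sales", paths):
    print(stmt)
```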
>>>>>
>>>>> I thought of changing the M/R jobs to load all files into the same
>>>>> folder, first loading the files into a non-partitioned table, and then
>>>>> loading the partitioned table from the non-partitioned one (using dynamic
>>>>> partitioning); but I would prefer to avoid that extra step if possible
>>>>> (esp. since the data is already in the correct sub-folders).
>>>>>
>>>>> Any help would be greatly appreciated.
>>>>>
>>>>> Regards,
>>>>> Sadu
>>>>>
>>>>>
>>>>>