Hive >> mail # user >> Automating the partition creation process


Sadananda Hegde 2013-01-29, 04:05
Mark Grover 2013-01-29, 04:47
Dean Wampler 2013-01-29, 16:37
Sadananda Hegde 2013-01-30, 01:44
Sadananda Hegde 2013-01-30, 01:09
Re: Automating the partition creation process
Hi Sadananda,
Sorry to hear that.

It got committed; don't worry about the "ABORTED" status. Here is the commit on
the trunk:
https://github.com/apache/hive/commit/523f47c3b6e7cb7b6b7b7801c66406e116af6dbc

However, there is no Apache Hive release with that patch in it.

You have two options:
1. Download the patch, rebuild Hive and use it
2. Find a hacky way to recover your partitions when they are empty and
populate them later.
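For option 2, one rough sketch is to generate the ALTER TABLE statements straight from the partition directory names. The `hadoop fs -ls` pipeline in the comment is illustrative only; here two fixed example paths stand in for its output so the transformation itself is visible:

```shell
# Turn partition directories into ALTER TABLE statements.
# In real use the path list would come from something like:
#   hadoop fs -ls -R /user/hive/warehouse/sales | awk '{print $NF}' | grep 'day='
# (adjust to your layout). Example paths stand in for that listing here.
paths='/user/hive/warehouse/sales/year=2013/month=1/day=21
/user/hive/warehouse/sales/year=2013/month=1/day=22'

# IF NOT EXISTS makes the generated script safe to re-run after each M/R job.
echo "$paths" | sed -E 's|.*/year=([0-9]+)/month=([0-9]+)/day=([0-9]+)$|ALTER TABLE sales ADD IF NOT EXISTS PARTITION (year=\1, month=\2, day=\3);|'
```

The emitted statements could then be written to a file and run with `hive -f`, or passed inline via `hive -e`.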

Sorry for the inconvenience.

Mark

On Tue, Jan 29, 2013 at 5:09 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:

> Thanks Mark,
>
> The recover-partitions feature will satisfy my needs, but the MSCK REPAIR
> TABLE <tablename> option is not working for me. It does not give any error,
> but it does not add any partitions either. It looks like it adds partitions
> only when the sub-folder is empty, not when the sub-folder has data files.
> I see a fix for this issue here.
>
> https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>
> But it's probably not committed yet, since the final result says "ABORTED".
>
> Thanks,
> Sadu
>
> On Mon, Jan 28, 2013 at 10:47 PM, Mark Grover <[EMAIL PROTECTED]
> > wrote:
>
>> Sadananda,
>> See if this helps:
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
>>
>>
>> On Mon, Jan 28, 2013 at 8:05 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:
>>
>>> Hello,
>>>
>>> My hive table is partitioned by year, month and day. I have defined it
>>> as external table. The M/R job correctly loads the files into the daily
>>> subfolders. The hdfs files will be loaded to
>>> <hivetable>/year=yyyy/month=mm/day=dd/ folders by the scheduled M/R jobs.
>>> The M/R job has some business logic in determining the values for year,
>>> month and day, so one run might create/load files into multiple
>>> sub-folders (multiple days). I am able to query the tables after adding
>>> partitions using ALTER TABLE ADD PARTITION statement. But how do I automate
>>> the partition creation step? Basically this script needs to identify the
>>> subfolders created by the M/R job and create corresponding ALTER TABLE ADD
>>> PARTITION statements.
>>>
>>> For example, say the M/R job loads files into the following 3 sub-folders
>>>
>>> /user/hive/warehouse/sales/year=2013/month=1/day=21
>>> /user/hive/warehouse/sales/year=2013/month=1/day=22
>>> /user/hive/warehouse/sales/year=2013/month=1/day=23
>>>
>>> Then it should create 3 alter table statements
>>>
>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=21);
>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=22);
>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=23);
>>>
>>> I thought of changing the M/R jobs to load all files into the same
>>> folder, first loading the files into a non-partitioned table and then
>>> loading the partitioned table from the non-partitioned one (using dynamic
>>> partitions); but I would prefer to avoid that extra step if possible
>>> (esp. since the data is already in the correct sub-folders).
>>>
>>> Any help would be greatly appreciated.
>>>
>>> Regards,
>>> Sadu
>>>
>>>
>>>
>>
>>
>
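For completeness, the staging-table alternative mentioned in the original question (load into a non-partitioned table, then repartition with dynamic partitions) might look roughly like the sketch below. All table and column names (`sales_staging`, `col1`, `col2`) are hypothetical, and running the generated script still requires a live Hive installation:

```shell
# Sketch of the dynamic-partition route: write the HiveQL to a file
# that could later be executed on a real cluster with:
#   hive -f load_sales.hql
# Table and column names are made up for illustration.
cat > load_sales.hql <<'EOF'
-- Allow dynamic values for every partition column.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Repartition from the flat staging table; the partition columns
-- must come last in the SELECT list.
INSERT OVERWRITE TABLE sales PARTITION (year, month, day)
SELECT col1, col2, year, month, day
FROM sales_staging;
EOF

echo "wrote load_sales.hql"
```

As the thread notes, this costs an extra copy of the data compared to registering the already-placed sub-folders directly.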
Edward Capriolo 2013-01-30, 01:21
Sadananda Hegde 2013-01-30, 01:49
Dean Wampler 2013-01-30, 02:05
abhishek 2013-01-29, 04:47