Hive, mail # user - Automating the partition creation process


Sadananda Hegde 2013-01-29, 04:05
Mark Grover 2013-01-29, 04:47
Dean Wampler 2013-01-29, 16:37
Sadananda Hegde 2013-01-30, 01:44
Sadananda Hegde 2013-01-30, 01:09
Mark Grover 2013-01-30, 01:17
Edward Capriolo 2013-01-30, 01:21
Sadananda Hegde 2013-01-30, 01:49
Dean Wampler 2013-01-30, 02:05
Re: Automating the partition creation process
abhishek 2013-01-29, 04:47
Sadananda,

Take a look at Oozie workflows.

Regards
Abhishek

On Jan 28, 2013, at 11:05 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:

> Hello,
>  
> My Hive table is partitioned by year, month and day, and I have defined it as an external table. Scheduled M/R jobs correctly load the HDFS files into the <hivetable>/year=yyyy/month=mm/day=dd/ folders. The M/R job uses some business logic to determine the values for year, month and day, so one run might create and load files into multiple subfolders (multiple days). I am able to query the table after adding partitions with ALTER TABLE ADD PARTITION statements. But how do I automate the partition creation step? Basically, the script needs to identify the subfolders created by the M/R job and generate the corresponding ALTER TABLE ADD PARTITION statements.
>  
> For example, say the M/R job loads files into the following 3 sub-folders
>  
> /user/hive/warehouse/sales/year=2013/month=1/day=21
> /user/hive/warehouse/sales/year=2013/month=1/day=22
> /user/hive/warehouse/sales/year=2013/month=1/day=23
>  
> Then it should create 3 alter table statements
>  
> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=21);
> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=22);
> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=23);
>  
> I thought of changing the M/R jobs to load all files into the same folder, then loading those files into a non-partitioned table first and populating the partitioned table from it (using dynamic partitioning), but I would prefer to avoid that extra step if possible (especially since the data is already in the correct subfolders).
>  
> Any help would be greatly appreciated.
>  
> Regards,
> Sadu
>  
>
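
For illustration, the script described in the question could be sketched roughly as follows. This is a minimal Python sketch, not something posted in the thread: the function names, the `keys` parameter, and the rule for quoting non-numeric values are all assumptions made here. It also assumes the list of HDFS paths has already been obtained somehow (for example from a recursive directory listing) and only covers turning those paths into statements.

```python
import re

# Matches one Hive-style "key=value" path component, e.g. "year=2013".
_PART = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.+)$")


def partition_spec(path, table_root):
    """Return [(key, value), ...] for the leading key=value directories below table_root."""
    root = table_root.rstrip("/")
    if not path.startswith(root + "/"):
        return []
    spec = []
    for component in path[len(root) + 1:].strip("/").split("/"):
        match = _PART.match(component)
        if match is None:
            break  # stop at the first non-partition component, e.g. a data file
        spec.append((match.group(1), match.group(2)))
    return spec


def add_partition_statements(table, table_root, paths, keys=("year", "month", "day")):
    """Build one ALTER TABLE ... ADD PARTITION statement per distinct full partition."""
    seen = set()
    statements = []
    for path in paths:
        spec = tuple(partition_spec(path, table_root))
        # Skip paths that do not name a complete partition (e.g. ".../year=2013").
        if tuple(k for k, _ in spec) != tuple(keys) or spec in seen:
            continue
        seen.add(spec)
        # Assumption: numeric values are emitted bare, anything else is single-quoted.
        cols = ", ".join(
            "{}={}".format(k, v if v.isdigit() else "'{}'".format(v)) for k, v in spec
        )
        statements.append("ALTER TABLE {} ADD PARTITION ({});".format(table, cols))
    return statements


if __name__ == "__main__":
    example_paths = [
        "/user/hive/warehouse/sales/year=2013/month=1/day=21",
        "/user/hive/warehouse/sales/year=2013/month=1/day=22",
        "/user/hive/warehouse/sales/year=2013/month=1/day=23",
    ]
    for stmt in add_partition_statements("sales", "/user/hive/warehouse/sales", example_paths):
        print(stmt)
```

The printed statements could then be written to a file and passed to Hive by whatever scheduler runs the M/R job. Re-running the script over paths whose partitions already exist would produce statements that Hive rejects as duplicates, so a real job would need to track or tolerate that.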